This post is also available as a PDF, a Jupyter Notebook and as a `.py` file. Continue reading “80. Grouping unlabelled data with k-means clustering”
Principal component analysis (PCA) may be used for two purposes: Continue reading “79. Reducing data complexity, and eliminating covariance, with principal component analysis”
The last of our machine learning methods that we will look at in this introduction is neural networks.
Neural networks power much of modern image and voice recognition. They can cope with highly complex data, but often require large amounts of data to train well. There are many parameters that can be changed, so fine-tuning a neural net can require extensive work. We will not go into all the ways they may be fine-tuned here, but just look at a simple example. Continue reading “73. Machine learning: neural networks”
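As a rough sketch of the kind of simple example the post describes, the following fits scikit-learn's `MLPClassifier` (a basic feed-forward neural network) to synthetic data. The data, layer sizes and iteration count here are illustrative assumptions, not taken from the original post; note too that neural networks are sensitive to feature scaling, so the inputs are standardised first.

```python
# Illustrative sketch only: synthetic data and hyperparameters are assumptions,
# not the post's actual example.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 samples, 10 features
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# Neural nets train poorly on unscaled features, so standardise first
X_scaled = StandardScaler().fit_transform(X_demo)

# A small two-hidden-layer network
model = MLPClassifier(hidden_layer_sizes=(20, 10), max_iter=1000, random_state=0)
model.fit(X_scaled, y_demo)

print(f"training accuracy: {model.score(X_scaled, y_demo):.2f}")
```

In practice `hidden_layer_sizes`, the learning rate and the regularisation strength are among the many parameters that would be tuned.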
Random forest is a versatile machine learning method based on decision trees. One useful feature of random forests is that it is easy to obtain the relative importance of features. This may be used to help better understand what drives classification, and may also be used to reduce the feature set used with minimal reduction in accuracy.
Once again we will re-use our logistic regression model, and replace the model function with the following three lines:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=10000, random_state=0, n_jobs=-1)
model.fit(X, y)

Continue reading “72. Machine Learning: Random Forests”
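To illustrate the feature-importance point made above, here is a self-contained sketch on synthetic data (the dataset, feature count and the smaller `n_estimators` are assumptions for speed, not the post's own data): after fitting, the fraction of importance each feature contributes is available via the fitted model's `feature_importances_` attribute, and the values sum to 1.

```python
# Illustrative sketch: synthetic stand-in data, not the post's dataset
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 200 samples, 5 features, of which 3 carry signal
X_demo, y_demo = make_classification(
    n_samples=200, n_features=5, n_informative=3,
    n_redundant=0, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
model.fit(X_demo, y_demo)

# Relative importance of each feature (the values sum to 1)
for i, imp in enumerate(model.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```

Features with near-zero importance are candidates for removal from the feature set with minimal loss of accuracy, as the excerpt suggests.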