Below is an index of posts by topic area.
Python basics
Introduction, and installing Python for healthcare modelling (video on installing and using the Spyder code editor and runner).
Sorting and sub-grouping dictionary items with itemgetter and groupby
if, else, elif, while, and logical operators; else after while
List comprehensions – one line loops (more examples here).
try … except (where code might fail)
Lambda functions (one line functions), and map/filter/reduce
Accessing date and time, and timing code
Saving Python objects to disk with pickle
NumPy and Pandas
Converting between NumPy and Pandas
Reading and writing CSV files using NumPy and Pandas
Applying user-defined functions to NumPy and Pandas
Adding more data to NumPy arrays and Pandas dataframes
Using Pandas to merge or lookup data
Sorting and ranking with Pandas
Using masks to filter data, and perform search and replace, in NumPy and Pandas
Summarising data by groups in Pandas using pivot_tables and groupby
Reshaping Pandas data with stack, unstack, pivot and melt
Subgrouping data in Pandas with groupby
Iterating through columns and rows in NumPy and Pandas
Removing duplicate data in NumPy and Pandas
Setting width and number of decimal places in NumPy print output
Using NumPy to generate random numbers, or shuffle arrays
Using ‘pop’ to remove a Pandas DataFrame column and transfer to new variable
Saving intact Pandas DataFrames using ‘pickle’
Matplotlib for plotting charts
Simple xy line charts, and simple save to file
Scatter plot, and adding titles to axes
Pie charts, and adding a title
Histograms (and obtaining histogram data with NumPy)
3D wireframe and surface plots
Common modifications to charts
Adding contour lines to a heatmap
Statistics
Linear regression with scipy.stats
Linear regression with scikit-learn
One sample t-test and Wilcoxon signed rank test
t-tests for testing the difference between two groups of data
Multi-comparison with Tukey’s test and the Holm-Bonferroni method
Multiple comparison of non-normally distributed data with the Kruskal-Wallis test
Confidence Interval for a single proportion
Clinical pathway simulation with SimPy
A simple bed occupancy model (object-based)
A hospital bed occupancy model with queuing for a limited number of beds (object-based)
Machine Learning with scikit-learn
Splitting data into training and test sets
Using logistic regression to diagnose breast cancer
Adding standard diagnostic performance metrics to a machine learning diagnosis model
How do you know if you have gathered enough data? By using learning rates.
Working with ordinal and categorical data
Choosing between models with stratified k-fold validation
Optimising scikit-learn machine learning models with grid search or randomized search
Visualising accuracy and error in a classification model with a confusion matrix
Reducing data complexity, and eliminating covariance, with principal component analysis
Grouping unlabelled data with k-means clustering
Linear regression with scikit-learn
Random Forests regression (suitable for more complex data sets than linear regression)
Worked machine learning example (for HSMA course)
Simple machine learning model to predict emergency department (ED) breaches of the four-hour target
Oversampling to correct for imbalanced data using naive sampling or SMOTE
Natural language processing
Pre-processing data: tokenization, stemming, and removal of stop words
Pre-processing data: tokenization, stemming, and removal of stop words (compressed code)
POS (Parts of Speech) tagging – labelling words as nouns, verbs, adjectives, etc.
Using free text for classification – ‘Bag of Words’
Topic modelling (dividing documents into topic groups) with Gensim
TensorFlow text-based classification – from raw text to prediction
Machine learning with TensorFlow
Installing and using TensorFlow in Anaconda
Splitting a data set into training and test sets using Pandas DataFrame methods
Image recognition with TensorFlow
TensorFlow text-based classification – from raw text to prediction
Regression analysis with TensorFlow
Some common (and hopefully useful) algorithms
The travelling community nurse problem (aka the Travelling Salesman Problem)
Genetic algorithms 1. A simple genetic algorithm
Exploring the best possible trade-off between competing objectives: identifying the Pareto Front
Crowding distances: selecting solutions when too many multi-objective solutions exist
Genetic Algorithms 2 – a multiple objective genetic algorithm (NSGA-II) (Code Only)
Interactive charts with HoloViews and Bokeh
A basic example of creating an interactive plot with HoloViews and Bokeh
Miscellaneous Python
Speed up Python by 1,000 times or more using numba!
Passing arguments to Python from the command line (or other programs)
Parallel processing functions and loops with dask ‘delayed’ method
Other resources
Open Data Sets for Machine Learning