New: See also the notebooks using Titanic survival to teach classification with machine learning. These cover the essentials of machine learning classification, and include logistic regression, Random Forest, PyTorch and TensorFlow models.

See here:

Below is an index of posts by topic area. To the right is a search box.

Python basics

Introduction, and installing Python for healthcare modelling (video on installing and using the Spyder code editor and runner).


Nested Lists




Sorting and sub-grouping dictionary items with itemgetter and groupby


math module

Variable Types

Random numbers and sequences

if, else, elif, while, and logical operators; else after while

loops and iterating

List comprehensions – one line loops (more examples here).

try … except (where code might fail)

Decimal places in output

Read from and write to files


Automatically passing unpacked lists or tuples to a function (or why do you see * before lists and tuples)

Lambda functions (one line functions), and map/filter/reduce

Accessing date and time, and timing code

Brief examples of applying lambda functions to lists, and filtering lists with list comprehensions, map and filter

Saving Python objects to disk with pickle
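
As a quick taste of one topic in this section, the `*` seen before lists and tuples unpacks them into separate positional arguments. The sketch below is illustrative only: `total_cost` and its parameters are hypothetical and are not taken from the linked post.

```python
def total_cost(patients, cost_per_patient, overhead):
    """Hypothetical example function taking three positional arguments."""
    return patients * cost_per_patient + overhead


params = [100, 50, 200]  # could equally be a tuple

# Passing each element by hand...
explicit = total_cost(params[0], params[1], params[2])

# ...is equivalent to unpacking the whole list with *
unpacked = total_cost(*params)

print(explicit, unpacked)
```

The list must contain exactly as many items as the function has required positional parameters, otherwise Python raises a `TypeError`.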

NumPy and Pandas

NumPy and Pandas

NumPy basics: building an array from lists, basic statistics, converting to booleans, referencing the array, and taking slices

Pandas basics: building a dataframe from lists, and retrieving data from the dataframe using row and column index references

Pandas: basic statistics

Converting between NumPy and Pandas

Array maths in NumPy

Reading and writing CSV files using NumPy and Pandas

Applying user-defined functions to NumPy and Pandas

Adding more data to NumPy arrays and Pandas dataframes

Using Pandas to merge or lookup data

Sorting and ranking with Pandas

Using masks to filter data, and perform search and replace, in NumPy and Pandas

Summarising data by groups in Pandas using pivot_table and groupby

Reshaping Pandas data with stack, unstack, pivot and melt

Subgrouping data in Pandas with groupby

Iterating through columns and rows in NumPy and Pandas

Removing duplicate data in NumPy and Pandas

Setting width and number of decimal places in NumPy print output

Using NumPy to generate random numbers, or shuffle arrays

Using ‘pop’ to remove a Pandas DataFrame column and transfer to new variable

Saving intact Pandas DataFrames using ‘pickle’

Matplotlib for plotting charts

Simple xy line charts, and simple save to file

Scatter plot, and adding titles to axes

Bar charts

Pie charts, and adding a title

Histograms (and obtaining histogram data with NumPy)


Violin plots

3D wireframe and surface plots

Common modifications to charts

A simple heatmap

Adding contour lines to a heatmap

Creating a grid of subplots

Adding error bars to charts

Adding shaded areas to charts

Statistics


Linear regression with scipy.stats

Linear regression with scikit-learn

One sample t-test and Wilcoxon signed rank test

t-tests for testing the difference between two groups of data

Mann Whitney U-test

Analysis of variance (ANOVA)

Multi-comparison with Tukey’s test and the Holm-Bonferroni method

Multiple comparison of non-normally distributed data with the Kruskal-Wallis test

Confidence Interval for a single proportion

Chi-squared test

Fisher’s exact test

Distribution fitting to data

Clinical pathway simulation with SimPy

A simple bed occupancy model

A simple bed occupancy model (object-based)

A hospital bed occupancy model with queuing for a limited number of beds (object based)

An emergency department model in SimPy, with patient prioritisation and capacity limited by doctor availability (object based)

Generating log normal samples from provided arithmetic mean and standard deviation of original population
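
For the log normal item above, the underlying conversion can be sketched as follows. This is a minimal illustration using only the standard library, not necessarily the approach in the linked post; the function names are hypothetical. It relies on the standard identities sigma² = ln(1 + sd²/mean²) and mu = ln(mean) − sigma²/2, where mu and sigma describe the normal distribution of the logged values.

```python
import math
import random
import statistics


def lognormal_params(mean, sd):
    """Convert an arithmetic mean and SD to mu and sigma of the logged values."""
    sigma_sq = math.log(1 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma_sq / 2
    return mu, math.sqrt(sigma_sq)


def lognormal_samples(mean, sd, n, seed=None):
    """Draw n log normal samples whose arithmetic mean and SD match the inputs."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, sd)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


samples = lognormal_samples(mean=10, sd=5, n=20_000, seed=42)

# The sample statistics should land close to the requested mean and SD
print(statistics.mean(samples), statistics.stdev(samples))
```

NumPy's `Generator.lognormal` takes the same two log-scale parameters (`mean` and `sigma`), so the conversion step carries over unchanged.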

Machine Learning with scikit-learn

New: See also the notebooks using Titanic survival to teach classification with machine learning. These cover the essentials of machine learning classification, and include logistic regression, Random Forest, PyTorch and TensorFlow models. See here:

The iris data set

Splitting data into training and test sets

Feature Scaling

A short function to replace (impute) missing numerical data in Pandas DataFrames with median of column values

Using logistic regression to diagnose breast cancer

Adding standard diagnostic performance metrics to an ML diagnosis model

How do you know if you have gathered enough data? By using learning curves.

Working with ordinal and categorical data

Support Vector machines

Random Forests

Neural networks

Choosing between models with stratified k-fold validation

Optimising scikit-learn machine learning models with grid search or randomized search

Visualising accuracy and error in a classification model with a confusion matrix

Changing sensitivity of machine learning algorithms and plotting a receiver operating characteristic (ROC) curve

Reducing data complexity, and eliminating covariance, with principal component analysis

Feature selection 1 (univariate statistical selection)

Feature selection 2 (model selection; forward selection)

Feature selection 3 (model selection; backwards selection)

Feature expansion

Grouping unlabelled data with k-means clustering

Linear regression with scikit-learn

Random Forests regression (suitable for more complex data sets than linear regression)

Worked machine learning example (for HSMA course)

Simple machine learning model to predict emergency department (ED) breaches of the four-hour target

Oversampling to correct for imbalanced data using naive sampling or SMOTE

Natural language processing

Pre-processing data: tokenization, stemming, and removal of stop words

Pre-processing data: tokenization, stemming, and removal of stop words (compressed code)

POS (Parts of Speech) tagging – labelling words as nouns, verbs, adjectives, etc.

Using free text for classification – ‘Bag of Words’

Topic modelling (dividing documents into topic groups) with Gensim

Converting text to numbers

TensorFlow text-based classification – from raw text to prediction

Machine learning with TensorFlow

Installing and using TensorFlow in Anaconda

Splitting data set into training and test sets using Pandas DataFrames methods

Image recognition with TensorFlow

TensorFlow text-based classification – from raw text to prediction

Regression analysis with TensorFlow

Save and load model weights

Some common (and hopefully useful) algorithms

The travelling community nurse problem (aka the Travelling Salesman Problem)

Genetic algorithms 1. A simple genetic algorithm

Exploring the best possible trade-off between competing objectives: identifying the Pareto Front

Crowding distances: selecting solutions when too many multi-objective solutions exist

Genetic Algorithms 2 – a multiple objective genetic algorithm (NSGA-II) (Code Only)
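
To make the Pareto front idea in this section concrete: a solution is on the front if no other solution is at least as good on every objective and strictly better on at least one. The sketch below is a simple pure-Python illustration, assuming higher scores are better on all objectives; the function names and data are hypothetical and it is not the code from the linked posts.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere.

    Assumes higher scores are better on every objective.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores on two competing objectives
scores = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
front = pareto_front(scores)
print(front)
```

This compares every pair of points, which is fine for small populations; larger problems usually call for a vectorised or sorted approach.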

Interactive charts with HoloViews and Bokeh

A basic example of creating an interactive plot with HoloViews and Bokeh

Miscellaneous Python

Parallel processing across CPU cores

Function decorators

Speed up Python by 1,000 times or more using numba!

Passing arguments to Python from the command line (or other programs)

Parallel processing functions and loops with the Dask ‘delayed’ method

Design Patterns

Other resources

Open Data Sets for Machine Learning

Open data travel distances and times for all England lower super output areas (LSOA) to all acute hospitals

Just for fun

A game of pong