New: See also the notebooks using Titanic survival to teach classification with machine learning. These cover the essentials of machine learning classification, and include logistic regression, Random Forest, PyTorch and TensorFlow models.

See here: https://pythonhealthcare.org/titanic-survival/

Below is an index of posts by topic area.

# Python basics

Introduction, and installing Python for healthcare modelling (video on installing and using the Spyder code editor and runner).

Sorting and sub-grouping dictionary items with itemgetter and groupby

if, else, elif, while, and logical operators; else after while

List comprehensions – one-line loops (more examples here).

try … except (where code might fail)

Lambda functions (one-line functions), and map/filter/reduce

Accessing date and time, and timing code

Saving Python objects to disk with pickle

# NumPy and Pandas

Converting between NumPy and Pandas

Reading and writing CSV files using NumPy and Pandas

Applying user-defined functions to NumPy and Pandas

Adding more data to NumPy arrays and Pandas dataframes

Using Pandas to merge or lookup data

Sorting and ranking with Pandas

Using masks to filter data, and perform search and replace, in NumPy and Pandas

Summarising data by groups in Pandas using pivot_table and groupby

Reshaping Pandas data with stack, unstack, pivot and melt

Subgrouping data in Pandas with groupby

Iterating through columns and rows in NumPy and Pandas

Removing duplicate data in NumPy and Pandas

Setting width and number of decimal places in NumPy print output

Using NumPy to generate random numbers, or shuffle arrays

Using ‘pop’ to remove a Pandas DataFrame column and transfer to new variable

Saving intact Pandas DataFrames using ‘pickle’

# Matplotlib for plotting charts

Simple xy line charts, and simple save to file

Scatter plot, and adding titles to axes

Pie charts, and adding a title

Histograms (and obtaining histogram data with NumPy)

3D wireframe and surface plots

Common modifications to charts

Adding contour lines to a heatmap

# Statistics

Linear regression with scipy.stats

Linear regression with scikit-learn

One sample t-test and Wilcoxon signed rank test

t-tests for testing the difference between two groups of data

Multi-comparison with Tukey’s test and the Holm-Bonferroni method

Multiple comparison of non-normally distributed data with the Kruskal-Wallis test

Confidence Interval for a single proportion

# Clinical pathway simulation with SimPy

A simple bed occupancy model (object-based)

A hospital bed occupancy model with queuing for a limited number of beds (object-based)

# Machine learning with scikit-learn

New: See also the notebooks using Titanic survival to teach classification with machine learning. These cover the essentials of machine learning classification, and include logistic regression, Random Forest, PyTorch and TensorFlow models. See here: https://pythonhealthcare.org/titanic-survival/

Splitting data into training and test sets

Using logistic regression to diagnose breast cancer

Adding standard diagnostic performance metrics to a machine learning diagnosis model

How do you know if you have gathered enough data? By using learning curves.

Working with ordinal and categorical data

Choosing between models with stratified k-fold validation

Optimising scikit-learn machine learning models with grid search or randomized search

Visualising accuracy and error in a classification model with a confusion matrix

Reducing data complexity, and eliminating covariance, with principal component analysis

Feature selection 1 (univariate statistical selection)

Feature selection 2 (model selection; forward selection)

Feature selection 3 (model selection; backwards selection)

Grouping unlabelled data with k-means clustering

Linear regression with scikit-learn

Random Forest regression (suitable for more complex data sets than linear regression)

Worked machine learning example (for HSMA course)

Simple machine learning model to predict emergency department (ED) breaches of the four-hour target

Oversampling to correct for imbalanced data using naive sampling or SMOTE

# Natural language processing

Pre-processing data: tokenization, stemming, and removal of stop words

Pre-processing data: tokenization, stemming, and removal of stop words (compressed code)

POS (Parts of Speech) tagging – labelling words as nouns, verbs, adjectives, etc.

Using free text for classification – ‘Bag of Words’

Topic modelling (dividing documents into topic groups) with Gensim

TensorFlow text-based classification – from raw text to prediction

# Machine learning with TensorFlow

Installing and using TensorFlow in Anaconda

Splitting a data set into training and test sets using Pandas DataFrame methods

Image recognition with TensorFlow

TensorFlow text-based classification – from raw text to prediction

Regression analysis with TensorFlow

# Some common (and hopefully useful) algorithms

The travelling community nurse problem (aka the Travelling Salesman Problem)

Genetic algorithms 1. A simple genetic algorithm

Exploring the best possible trade-off between competing objectives: identifying the Pareto Front

Crowding distances: selecting solutions when too many multi-objective solutions exist

Genetic algorithms 2. A multi-objective genetic algorithm (NSGA-II) (code only)

# Interactive charts with HoloViews and Bokeh

A basic example of creating an interactive plot with HoloViews and Bokeh

# Miscellaneous Python

Parallel processing across CPU cores

Speed up Python by 1,000 times or more using Numba

Passing arguments to Python from the command line (or other programs)

Parallel processing of functions and loops with the Dask ‘delayed’ method

# Other resources

Open Data Sets for Machine Learning