The log normal distribution is frequently a useful distribution for mimicking process times in healthcare pathways (or many other non-automated processes). The distribution has a right skew which may frequently occur when some clinical process step has some additional complexity to it compared to the ‘usual’ case.
To sample from a log normal distribution we need to convert the mean and standard deviation that was calculated from the original non-logged population into the mu and sigma of the underlying log normal population.
(For maximum computation effuiciency, when calling the function repeatedly using the same mean and standard deviation, you may wish to split this into two functions – one to calculate mu and sigma which needs only calling once, and the other to sample from the log normal distribution given mu and sigma).
For more on the maths see:
import numpy as np def generate_lognormal_samples(mean, stdev, n=1): """ Returns n samples taken from a lognormal distribution, based on mean and standard deviation calaculated from the original non-logged population. Converts mean and standard deviation to underlying lognormal distribution mu and sigma based on calculations desribed at: https://blogs.sas.com/content/iml/2014/06/04/simulate-lognormal-data- with-specified-mean-and-variance.html Returns a numpy array of floats if n > 1, otherwise return a float """ # Calculate mu and sigma of underlying lognormal distribution phi = (stdev ** 2 + mean ** 2) ** 0.5 mu = np.log(mean ** 2 / phi) sigma = (np.log(phi ** 2 / mean ** 2)) ** 0.5 # Generate lognormal population generated_pop = np.random.lognormal(mu, sigma , n) # Convert single sample (if n=1) to a float, otherwise leave as array generated_pop = \ generated_pop if len(generated_pop) == 1 else generated_pop return generated_pop
Test the function
We will generate a population of 100,000 samples with a given mean and standard deviation (these would be calculated on the non-logged population), and test the resulting generated population has the same mean and standard deviation.
mean = 10 stdev = 10 generated_pop = generate_lognormal_samples(mean, stdev, 100000) print ('Mean:', generated_pop.mean()) print ('Standard deviation:', generated_pop.std()) Out: Mean: 10.043105926813356 Standard deviation: 9.99527575740651
Plot a histogram of the generated population:
import matplotlib.pyplot as plt %matplotlib inline bins = np.arange(0,51,1) plt.hist(generated_pop, bins=bins) plt.show()
Generating a single sample
The function will return a single number if no
n is given in the function call:
print (generate_lognormal_samples(mean, stdev)) Out: 6.999376449335125