
# Category: Statistics

# 58. Statistics: Chi-squared test

This post is also available as a PDF and as a Jupyter Notebook.

See https://en.wikipedia.org/wiki/Chi-squared_test

Without further qualification, ‘chi-squared test’ is often used as shorthand for Pearson’s chi-squared test. The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.

Example chi-squared test for categorical data: Continue reading “58. Statistics: Chi-squared test”
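The full worked example is behind the link; as a minimal sketch, `scipy.stats.chi2_contingency` runs Pearson’s chi-squared test on a table of observed counts (the counts below are made up for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical observed frequencies: two groups across three categories
observed = np.array([[30, 14, 34],
                     [39, 6, 3]])

# chi2_contingency returns the test statistic, the p-value,
# the degrees of freedom and the table of expected frequencies
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f'chi-square: {chi2:.3f}, p: {p:.4f}, degrees of freedom: {dof}')
```

A small p-value indicates that the observed frequencies differ significantly from those expected if the two groups shared the same category distribution.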

# 59. Statistics: Fisher’s exact test

This post is also available as a PDF and as a Jupyter Notebook.

Fisher’s exact test is similar to the chi-squared test, but is suitable for small sample sizes. As a rule, it should be used if at least 20% of the expected frequencies are less than 5, or if any expected frequency is zero. Although in practice it is employed when sample sizes are small, it is valid for all sample sizes.

For example, consider a group of 16 people who may choose tennis or football. Among the 16 there are six boys and ten girls. The tennis group has one boy and eight girls; the football group has five boys and two girls. Does the proportion of boys and girls differ between the two sports? Continue reading “59. Statistics: Fisher’s exact test”
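The tennis/football example above can be sketched with `scipy.stats.fisher_exact` (the post’s own code is behind the link):

```python
from scipy import stats

# Contingency table from the example:
#            boys  girls
# tennis       1     8
# football     5     2
table = [[1, 8], [5, 2]]

# Two-sided test by default; returns the sample odds ratio and the p-value
odds_ratio, p = stats.fisher_exact(table)
print(f'odds ratio: {odds_ratio:.3f}, p: {p:.4f}')  # p ≈ 0.035 for this table
```

With p below 0.05, the split of boys and girls differs significantly between the two sports.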

# 57. Statistics: Confidence Interval for a single proportion

This post is also available as a PDF and as a Jupyter Notebook.

We can use statsmodels to calculate the confidence interval of the proportion of given ’successes’ from a number of trials. This may be the frequency of occurrence of a gene, the intention to vote in a particular way, etc. Continue reading “57. Statistics: Confidence Interval for a single proportion”
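A minimal sketch using `statsmodels.stats.proportion.proportion_confint`, with hypothetical counts (45 successes out of 100 trials):

```python
from statsmodels.stats.proportion import proportion_confint

# 95% confidence interval for a proportion of 45 successes in 100 trials,
# using the normal approximation (other methods, e.g. 'wilson', are available)
lower, upper = proportion_confint(count=45, nobs=100, alpha=0.05, method='normal')
print(f'95% CI for the proportion: {lower:.3f} to {upper:.3f}')
```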

# 56. Statistics: Multiple comparison of non-normally distributed data with the Kruskal-Wallis test

This post is also available as a PDF and as a Jupyter Notebook.

For data that is not normally distributed, the equivalent of the (one-way) ANOVA test is the Kruskal-Wallis test. This tests whether all groups are likely to come from the same population. Continue reading “56. Statistics: Multiple comparison of non-normally distributed data with the Kruskal-Wallis test”
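A minimal sketch with `scipy.stats.kruskal`, using made-up measurements for three groups:

```python
from scipy import stats

# Hypothetical measurements (need not be normally distributed)
group_a = [12, 15, 14, 11, 39, 42]
group_b = [13, 14, 16, 15, 11, 12]
group_c = [25, 28, 31, 27, 26, 30]

# The Kruskal-Wallis H-test compares group ranks rather than raw values
h, p = stats.kruskal(group_a, group_b, group_c)
print(f'H statistic: {h:.3f}, p: {p:.4f}')
```

A small p-value suggests that at least one group is drawn from a different population.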

# 55. Statistics: Multi-comparison with Tukey’s test and the Holm-Bonferroni method

This post is also available as a PDF and as a Jupyter Notebook.

If an ANOVA test has identified that not all groups belong to the same population, further methods may be used to identify which groups are significantly different from each other.

Below are two commonly used methods: Tukey’s and Holm-Bonferroni.

These two methods assume that data is approximately normally distributed. Continue reading “55. Statistics: Multi-comparison with Tukey’s test and the Holm-Bonferroni method”
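Both methods are available in statsmodels. A sketch with hypothetical data: `pairwise_tukeyhsd` performs all pairwise comparisons directly, while `multipletests(method='holm')` adjusts a set of raw p-values (here invented, as if from three pairwise t-tests):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multitest import multipletests

# Hypothetical measurements and their group labels
values = np.array([25., 30., 28., 36., 29.,
                   45., 55., 29., 56., 40.,
                   22., 25., 24., 32., 24.])
groups = np.array(['a'] * 5 + ['b'] * 5 + ['c'] * 5)

# Tukey's HSD: all pairwise comparisons at a family-wise error rate of 0.05
tukey = pairwise_tukeyhsd(values, groups, alpha=0.05)
print(tukey)

# Holm-Bonferroni: adjust raw pairwise p-values (hypothetical values)
raw_p = [0.01, 0.04, 0.30]
reject, p_adjusted, _, _ = multipletests(raw_p, alpha=0.05, method='holm')
print(reject, p_adjusted)
```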

# 54. Statistics: Analysis of variance (ANOVA)

This post is also available as a PDF and as a Jupyter Notebook.

One-way analysis of variance (ANOVA) tests whether multiple groups all belong to the same population.

If a conclusion is reached that the groups do not all belong to the same population, further tests may be utilised to identify the differences. Continue reading “54. Statistics: Analysis of variance (ANOVA)”
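A minimal sketch with `scipy.stats.f_oneway`, using made-up measurements for three groups:

```python
from scipy import stats

# Hypothetical measurements for three groups
group_a = [25, 30, 28, 36, 29]
group_b = [45, 55, 29, 56, 40]
group_c = [22, 25, 24, 32, 24]

# One-way ANOVA: compares between-group to within-group variance
f_stat, p = stats.f_oneway(group_a, group_b, group_c)
print(f'F: {f_stat:.3f}, p: {p:.4f}')
```

A small p-value indicates that not all groups share the same population mean, at which point a multi-comparison test (such as Tukey’s, above) can locate the differences.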