20. NumPy and Pandas

For much of healthcare data analytics and modelling we will use NumPy and Pandas as the key containers of our data. These libraries allow efficient manipulation and analysis of very large data sets. They are very considerably faster than using a ’pure Python’ approach.

NumPy and Pandas are distributed with all main scientific Python distributions.

NumPy

NumPy is a library for supporting work with large multi-dimensional arrays, with many mathematical functions. NumPy is compatible with many other libraries such as Pandas (see below), MatPlotLib (for plotting), and many Python maths, stats and optimisation libraries.

When we import NumPy as as library it is standard practice to use the as statement which allows it to be referenced with a shorter name. We will import as np:

import numpy as np

Pandas

Pandas is a library that allows manipulation of large arrays of data. Data may be indexed and manipulated based on index. Data is readily pivoted, reshaped, grouped, merged.

We will use pandas alongside numpy. Generally, NumPy is faster for mathemetical functions, but Pandas is more powerful for data manipulation.

As with numpy, we import with a shortened name:

import pandas as pd

 

One thought on “20. NumPy and Pandas

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s