29. Sorting and ranking with Pandas

Sorting

Pandas allows easy and flexible sorting.

As usual, let’s first build a dataframe:

import pandas as pd
df = pd.DataFrame()

names = ['Gandolf','Gimli','Frodo','Legolas','Bilbo']
types = ['Wizard','Dwarf','Hobbit','Elf','Hobbit']
magic = [10, 1, 4, 6, 4]
aggression = [7, 10, 2, 5, 1]
stealth = [8, 2, 5, 10, 5]

df['names'] = names
df['type'] = types
df['magic_power'] = magic
df['aggression'] = aggression
df['stealth'] = stealth

And now let’s sort first by magic power (highest first) and then by aggression (lowest first).

new_df = df.sort_values(['magic_power','aggression'], ascending=[False,True])
print (new_df)

OUT:

     names    type  magic_power  aggression  stealth
0  Gandolf  Wizard           10           7        8
3  Legolas     Elf            6           5       10
4    Bilbo  Hobbit            4           1        5
2    Frodo  Hobbit            4           2        5
1    Gimli   Dwarf            1          10        2

Usually it is fine to use the default sorting method. Sometimes, though, you may wish to do a series of sequential sorts, where each sort keeps the previous order among rows that have equal values. In that case use a stable sort by passing kind='mergesort' as one of the arguments. Note that pandas only applies the kind argument when sorting on a single column or label.
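As a rough sketch of the sequential approach described above, the example below sorts by the secondary key first and then by the primary key with a stable sort. The variable names sequential and combined are just for illustration, and the dataframe is a cut-down copy of the one built earlier.

```python
import pandas as pd

df = pd.DataFrame({
    'names': ['Gandolf', 'Gimli', 'Frodo', 'Legolas', 'Bilbo'],
    'magic_power': [10, 1, 4, 6, 4],
    'aggression': [7, 10, 2, 5, 1],
})

# Step 1: sort by the secondary key (aggression, lowest first)
step_1 = df.sort_values('aggression', kind='mergesort')

# Step 2: sort by the primary key (magic power, highest first).
# Because mergesort is stable, rows with equal magic power keep
# the aggression order established in step 1.
sequential = step_1.sort_values('magic_power', ascending=False, kind='mergesort')

# For comparison: the same result from a single multi-column sort
combined = df.sort_values(['magic_power', 'aggression'], ascending=[False, True])

print(sequential)
print(sequential.equals(combined))
```

With this data the two approaches should give the same row order, so the sequential form is mainly useful when the sorts happen at different points in your code.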

We can use sort_index to sort by the index field. Let’s sort our new dataframe by reverse index order:

print (new_df.sort_index(ascending=False))

OUT:

     names    type  magic_power  aggression  stealth
4    Bilbo  Hobbit            4           1        5
3  Legolas     Elf            6           5       10
2    Frodo  Hobbit            4           2        5
1    Gimli   Dwarf            1          10        2
0  Gandolf  Wizard           10           7        8

Ranking

Pandas allows easy ranking of dataframes by a single column. Where two values are identical, each receives the average of the ranks they would cover. Notice that ranking is ascending by default: the smallest value gets rank 1, so a higher value gets a higher rank number.

print (df['magic_power'].rank())

OUT:

0    5.0
1    1.0
2    2.5
3    4.0
4    2.5
Name: magic_power, dtype: float64
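If the averaged ranks for ties are not what you want, rank takes method and ascending arguments. The short sketch below shows a few of the options using the same magic power values as above; the variable names are only illustrative.

```python
import pandas as pd

magic = pd.Series([10, 1, 4, 6, 4], name='magic_power')

# Default: tied values share the average of the ranks they cover
average_rank = magic.rank()

# 'min': tied values all take the lowest rank in the tied group
min_rank = magic.rank(method='min')

# 'dense': like 'min', but ranks increase by 1 with no gaps after ties
dense_rank = magic.rank(method='dense')

# Reverse the direction so the largest value gets rank 1
descending_rank = magic.rank(ascending=False)

print(pd.DataFrame({
    'magic_power': magic,
    'average': average_rank,
    'min': min_rank,
    'dense': dense_rank,
    'descending': descending_rank,
}))
```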

Pandas does not offer a direct method for ranking using multiple columns. One way would be to sort the dataframe, reset the index with df.reset_index() and compare the index values to the original table.
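Here is one possible sketch of that sort-and-reset approach. The column name multi_rank and the intermediate variable names are made up for this example. It ranks by magic power (highest first) and breaks ties by aggression (lowest first).

```python
import pandas as pd

df = pd.DataFrame({
    'names': ['Gandolf', 'Gimli', 'Frodo', 'Legolas', 'Bilbo'],
    'magic_power': [10, 1, 4, 6, 4],
    'aggression': [7, 10, 2, 5, 1],
})

# Sort by both columns, then reset the index. reset_index() moves the
# original row labels into a column called 'index', and the new index
# (0, 1, 2, ...) is each row's position in the sorted order.
ordered = df.sort_values(['magic_power', 'aggression'], ascending=[False, True])
ordered = ordered.reset_index()

# Build a series of ranks (position + 1), labelled by the original index
ranks = pd.Series(list(range(1, len(ordered) + 1)), index=ordered['index'].to_numpy())

# Assignment aligns on the index, so each rank lands on its original row
df['multi_rank'] = ranks

print(df)
```

One limitation of this sketch: rows that are identical in every sort column still receive different ranks, decided by their order after sorting, rather than a shared rank.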
