Sort Excel data using Python

Sharing is caring!

Last Updated on July 14, 2022 by Jay

Sorting table is a common task in Excel. We sort a table in order to help us view or use the data easier. However, sorting in Excel can be very slow when your data is huge or contain calculations. So I’m going to show you how to sort an Excel data table using Python, with speed and efficiency guaranteed!

This tutorial is part of the “Integrate Python with Excel” series, you can find the table of content here for easier navigation.

Prepare a dataframe for demo

Since we are using Python to process data in an Excel file, almost by default, we’ll use the pandas library. And we’ll load an example Excel file. There are 4 columns in the file, ID appears to be numbers, Cusomter and purchase are string, and Date is a date.

import pandas as pd

df = pd.read_excel('sort_eg.xlsx')

>>> df
    ID      Customer            purchase       Date
0  101  Forrest Gump        Dragon Ball  2020-08-12
1  102     Mary Jane          Evangelion 2020-01-01
2  103  Harry Porter        Kill la Kill 2020-08-01
3  104     Jean Grey        Dragon Ball  1999-01-01
4  105     Mary Jane          Evangelion 2019-12-31
5  106  Harry Porter  Ghost in the Shell 2020-01-01
6  107     Jean Grey          Evangelion 2018-04-01
7  108     Mary Jane        Demon Slayer 2021-01-01

pandas sorting methods

There are two main sorting methods in pandas. We’ll take a look at each one.

.sort_index() used mostly to sort by index or column

There are several worth noting arguments:

  • axis: 0 means sort by index, 1 means sort by columns. Default is 0.
  • ascending: True means sort by ascending order, False means sort by descending order.
  • inplace: if True, the resulting dataframe replaces the original, default is False.

.sort_values() used mostly to sort by any column

The arguments are similar to the .sort_index() method, except we can now specify which column to sort by:

  • by: the column(s) to sort by. Can take either a string or a list of strings.
  • other arguments refer to above method.

Sort table by columns

Sometimes we want to show the columns in certain order (alphabetical, increase/decreasing, etc). We can use the .sort_index() method, with axis = 1. Note the below output that the table is now sorted by column names in an alphabetical order. However, note that this resulting dataframe does not replace the original df due to inplace=False by default.

>>> df.sort_index(axis=1)
       Customer       Date   ID            purchase
0  Forrest Gump 2020-08-12  101        Dragon Ball 
1     Mary Jane 2020-01-01  102          Evangelion
2  Harry Porter 2020-08-01  103        Kill la Kill
3     Jean Grey 1999-01-01  104        Dragon Ball 
4     Mary Jane 2019-12-31  105          Evangelion
5  Harry Porter 2020-01-01  106  Ghost in the Shell
6     Jean Grey 2018-04-01  107          Evangelion
7     Mary Jane 2021-01-01  108        Demon Slayer

Sort table by index

We can also sort the table in ascending or descending order.

>>> df.sort_index(ascending=False)
    ID      Customer            purchase       Date
7  108     Mary Jane        Demon Slayer 2021-01-01
6  107     Jean Grey          Evangelion 2018-04-01
5  106  Harry Porter  Ghost in the Shell 2020-01-01
4  105     Mary Jane          Evangelion 2019-12-31
3  104     Jean Grey        Dragon Ball  1999-01-01
2  103  Harry Porter        Kill la Kill 2020-08-01
1  102     Mary Jane          Evangelion 2020-01-01
0  101  Forrest Gump        Dragon Ball  2020-08-12

Sort by a specified column

We’ve seen how to sort by index, now let’s see how to sort by an individual column. Let’s sort the table by purchasing date. By default, ascending order is used, so we’ll see earlier dates come first. Of course, we can reverse the table by specifying ascending=False.

>>> df.sort_values(by='Date')
    ID      Customer            purchase       Date
3  104     Jean Grey        Dragon Ball  1999-01-01
6  107     Jean Grey          Evangelion 2018-04-01
4  105     Mary Jane          Evangelion 2019-12-31
1  102     Mary Jane          Evangelion 2020-01-01
5  106  Harry Porter  Ghost in the Shell 2020-01-01
2  103  Harry Porter        Kill la Kill 2020-08-01
0  101  Forrest Gump        Dragon Ball  2020-08-12
7  108     Mary Jane        Demon Slayer 2021-01-01

Sort by multiple columns

We can also sort by multiple columns. In the below example, Customer’s names are sorted first, then within each Customer, “purchase” is then sorted again. For example, for Harry Porter, “Ghost in the Shell” comes before “Kill la Kill” since the letter G comes from the letter K.

>>> df.sort_values(by=['Customer','purchase'])
    ID      Customer            purchase       Date
0  101  Forrest Gump        Dragon Ball  2020-08-12
5  106  Harry Porter  Ghost in the Shell 2020-01-01
2  103  Harry Porter        Kill la Kill 2020-08-01
3  104     Jean Grey        Dragon Ball  1999-01-01
6  107     Jean Grey          Evangelion 2018-04-01
7  108     Mary Jane        Demon Slayer 2021-01-01
1  102     Mary Jane          Evangelion 2020-01-01
4  105     Mary Jane          Evangelion 2019-12-31

Leave a Reply

Your email address will not be published. Required fields are marked *