Last Updated on July 14, 2022 by Jay
Sorting table is a common task in Excel. We sort a table in order to help us view or use the data easier. However, sorting in Excel can be very slow when your data is huge or contain calculations. So I’m going to show you how to sort an Excel data table using Python, with speed and efficiency guaranteed!
This tutorial is part of the “Integrate Python with Excel” series, you can find the table of content here for easier navigation.
Prepare a dataframe for demo
Since we are using Python to process data in an Excel file, almost by default, we’ll use the pandas
library. And we’ll load an example Excel file. There are 4 columns in the file, ID appears to be numbers, Cusomter and purchase are string, and Date is a date.
import pandas as pd
df = pd.read_excel('sort_eg.xlsx')
>>> df
ID Customer purchase Date
0 101 Forrest Gump Dragon Ball 2020-08-12
1 102 Mary Jane Evangelion 2020-01-01
2 103 Harry Porter Kill la Kill 2020-08-01
3 104 Jean Grey Dragon Ball 1999-01-01
4 105 Mary Jane Evangelion 2019-12-31
5 106 Harry Porter Ghost in the Shell 2020-01-01
6 107 Jean Grey Evangelion 2018-04-01
7 108 Mary Jane Demon Slayer 2021-01-01
pandas sorting methods
There are two main sorting methods in pandas
. We’ll take a look at each one.
.sort_index()
used mostly to sort by index or column
There are several worth noting arguments:
axis
: 0 means sort by index, 1 means sort by columns. Default is 0.ascending
: True means sort by ascending order, False means sort by descending order.inplace
: if True, the resulting dataframe replaces the original, default is False.
.sort_values()
used mostly to sort by any column
The arguments are similar to the .sort_index() method, except we can now specify which column to sort by:
by
: the column(s) to sort by. Can take either a string or a list of strings.- other arguments refer to above method.
Sort table by columns
Sometimes we want to show the columns in certain order (alphabetical, increase/decreasing, etc). We can use the .sort_index()
method, with axis = 1
. Note the below output that the table is now sorted by column names in an alphabetical order. However, note that this resulting dataframe does not replace the original df due to inplace=False
by default.
>>> df.sort_index(axis=1)
Customer Date ID purchase
0 Forrest Gump 2020-08-12 101 Dragon Ball
1 Mary Jane 2020-01-01 102 Evangelion
2 Harry Porter 2020-08-01 103 Kill la Kill
3 Jean Grey 1999-01-01 104 Dragon Ball
4 Mary Jane 2019-12-31 105 Evangelion
5 Harry Porter 2020-01-01 106 Ghost in the Shell
6 Jean Grey 2018-04-01 107 Evangelion
7 Mary Jane 2021-01-01 108 Demon Slayer
Sort table by index
We can also sort the table in ascending or descending order.
>>> df.sort_index(ascending=False)
ID Customer purchase Date
7 108 Mary Jane Demon Slayer 2021-01-01
6 107 Jean Grey Evangelion 2018-04-01
5 106 Harry Porter Ghost in the Shell 2020-01-01
4 105 Mary Jane Evangelion 2019-12-31
3 104 Jean Grey Dragon Ball 1999-01-01
2 103 Harry Porter Kill la Kill 2020-08-01
1 102 Mary Jane Evangelion 2020-01-01
0 101 Forrest Gump Dragon Ball 2020-08-12
Sort by a specified column
We’ve seen how to sort by index, now let’s see how to sort by an individual column. Let’s sort the table by purchasing date. By default, ascending order is used, so we’ll see earlier dates come first. Of course, we can reverse the table by specifying ascending=False
.
>>> df.sort_values(by='Date')
ID Customer purchase Date
3 104 Jean Grey Dragon Ball 1999-01-01
6 107 Jean Grey Evangelion 2018-04-01
4 105 Mary Jane Evangelion 2019-12-31
1 102 Mary Jane Evangelion 2020-01-01
5 106 Harry Porter Ghost in the Shell 2020-01-01
2 103 Harry Porter Kill la Kill 2020-08-01
0 101 Forrest Gump Dragon Ball 2020-08-12
7 108 Mary Jane Demon Slayer 2021-01-01
Sort by multiple columns
We can also sort by multiple columns. In the below example, Customer’s names are sorted first, then within each Customer, “purchase” is then sorted again. For example, for Harry Porter, “Ghost in the Shell” comes before “Kill la Kill” since the letter G comes from the letter K.
>>> df.sort_values(by=['Customer','purchase'])
ID Customer purchase Date
0 101 Forrest Gump Dragon Ball 2020-08-12
5 106 Harry Porter Ghost in the Shell 2020-01-01
2 103 Harry Porter Kill la Kill 2020-08-01
3 104 Jean Grey Dragon Ball 1999-01-01
6 107 Jean Grey Evangelion 2018-04-01
7 108 Mary Jane Demon Slayer 2021-01-01
1 102 Mary Jane Evangelion 2020-01-01
4 105 Mary Jane Evangelion 2019-12-31