Last Updated on January 31, 2022 by Jay
In this tutorial, we’ll learn how to change column order in a pandas dataframe. Some might think that the order of the data columns doesn’t matter too much as long as the data is there. Although this is true while working with data inside Python/pandas, once we export results into Excel, column order becomes important. We want to make sure the column order follows a logical sense so the audience can understand the results easily.
Let’s start by creating a sample dataset using pandas
and numpy
libraries.
The below dataframe construction uses a dictionary comprehension technique, aka “one line for-loop” to quickly create a dataframe with a size of 10 rows X 5 columns. To learn more about this neat looping technique, check out this article.
import pandas as pd
import numpy as np
df = pd.DataFrame(
{f'col_{str(i)}': np.random.rand(5) for i in range(1,6)}
)
>>> df
col_1 col_2 col_3 col_4 col_5
0 0.182444 0.329306 0.356124 0.573469 0.377775
1 0.166817 0.800108 0.876973 0.617122 0.370871
2 0.121286 0.308944 0.611156 0.059454 0.589542
3 0.365968 0.973306 0.515767 0.797248 0.512052
4 0.139206 0.415332 0.836222 0.486267 0.779764
Change column order by pandas dataframe indexing
Remember that we can reference multiple columns by passing a list of column names into the dataframe? For example df[ [ 'col_1', 'col_2', 'col_3' ] ]
will give us a dataframe that contains three columns, namely “col_1”, “col_2”, and “col_3”. The best part is that the column order of the resulting dataframe will be exactly the order that you typed those names in. In the below example, pay attention to the order of the list inside df[]
, as well as the column order of the resulting dataframe.
>>> df[['col_2','col_1','col_5','col_3','col_4']]
col_2 col_1 col_5 col_3 col_4
0 0.329306 0.182444 0.377775 0.356124 0.573469
1 0.800108 0.166817 0.370871 0.876973 0.617122
2 0.308944 0.121286 0.589542 0.611156 0.059454
3 0.973306 0.365968 0.512052 0.515767 0.797248
4 0.415332 0.139206 0.779764 0.836222 0.486267
Reversing columns by using df.columns
df.columns return the column labels of the dataframe. This is also a convenient way to examine what columns are available inside a dataframe.
>>> df.columns
Index(['col_1', 'col_2', 'col_3', 'col_4', 'col_5'], dtype='object')
We can reverse the dataframe column order by playing with the .columns
attribute. When we have a list or list-type object, we can reverse it simply with [::-1]
. Then we can pass the reversed column names into the dataframe to reverse the columns of a dataframe.
>>> df.columns[::-1]
Index(['col_5', 'col_4', 'col_3', 'col_2', 'col_1'], dtype='object')
>>> df[df.columns[::-1]]
col_5 col_4 col_3 col_2 col_1
0 0.377775 0.573469 0.356124 0.329306 0.182444
1 0.370871 0.617122 0.876973 0.800108 0.166817
2 0.589542 0.059454 0.611156 0.308944 0.121286
3 0.512052 0.797248 0.515767 0.973306 0.365968
4 0.779764 0.486267 0.836222 0.415332 0.139206
Change column order by using pandas method .reindex()
The df.reindex()
method can shuffle around both rows or columns.
To change the column order: include the columns
argument in the method. Note this method doesn’t have an inplace
argument, so we can’t modify the dataframe inplace.
>>> df.reindex(columns = ['col_2','col_1','col_4','col_3','col_5'])
col_2 col_1 col_4 col_3 col_5
0 0.329306 0.182444 0.573469 0.356124 0.377775
1 0.800108 0.166817 0.617122 0.876973 0.370871
2 0.308944 0.121286 0.059454 0.611156 0.589542
3 0.973306 0.365968 0.797248 0.515767 0.512052
4 0.415332 0.139206 0.486267 0.836222 0.779764
On a side note, to change order for rows, simply include the index
argument.
>>> df2.re>>> df.reindex(index=[4,3,2,1,0])
col_1 col_2 col_3 col_4 col_5
4 0.139206 0.415332 0.836222 0.486267 0.779764
3 0.365968 0.973306 0.515767 0.797248 0.512052
2 0.121286 0.308944 0.611156 0.059454 0.589542
1 0.166817 0.800108 0.876973 0.617122 0.370871
0 0.182444 0.329306 0.356124 0.573469 0.377775
Sorting column order by .sort_index()
The .sort_index()
method also can sort either columns or rows of a dataframe.
- Argument axis = 1 will sort by columns
- Argument axis = 0 will sort by rows/index
Let’s create a df2 with un-ordered columns:
df2 = df[['col_2','col_1','col_5','col_3','col_4']]
>>> df2
col_2 col_1 col_5 col_3 col_4
0 0.329306 0.182444 0.377775 0.356124 0.573469
1 0.800108 0.166817 0.370871 0.876973 0.617122
2 0.308944 0.121286 0.589542 0.611156 0.059454
3 0.973306 0.365968 0.512052 0.515767 0.797248
4 0.415332 0.139206 0.779764 0.836222 0.486267
To sort the column by order:
>>> df2.sort_index(axis = 1, ascending=True)
col_1 col_2 col_3 col_4 col_5
0 0.182444 0.329306 0.356124 0.573469 0.377775
1 0.166817 0.800108 0.876973 0.617122 0.370871
2 0.121286 0.308944 0.611156 0.059454 0.589542
3 0.365968 0.973306 0.515767 0.797248 0.512052
4 0.139206 0.415332 0.836222 0.486267 0.779764
The argument ascending=True
is optional, and the default is True. We can also sort the columns by descending order:
>>> df2.sort_index(axis = 1, ascending=False)
col_5 col_4 col_3 col_2 col_1
0 0.377775 0.573469 0.356124 0.329306 0.182444
1 0.370871 0.617122 0.876973 0.800108 0.166817
2 0.589542 0.059454 0.611156 0.308944 0.121286
3 0.512052 0.797248 0.515767 0.973306 0.365968
4 0.779764 0.486267 0.836222 0.415332 0.139206