Last Updated on July 14, 2022 by Jay
There are many reasons to rename pandas dataframe columns. For example, maybe you want the column names to be more descriptive, or maybe you want to shorten the names. This tutorial will walk through how to change names in a dataframe.
This tutorial is part of the “Integrate Python with Excel” series, you can find the table of content here for easier navigation.
Prepare a dataframe for demo
The pandas library provides a convenient way to read data from a web page, so we’ll load up a table from Wikipedia – the S&P 500 companies list.
import pandas from pd
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]
>>> df.head()
Symbol Security ... CIK Founded
0 MMM 3M Company ... 66740 1902
1 ABT Abbott Laboratories ... 1800 1888
2 ABBV AbbVie Inc. ... 1551152 2013 (1888)
3 ABMD ABIOMED Inc ... 815094 1981
4 ACN Accenture plc ... 1467373 1989
[5 rows x 9 columns]
It appears there are 9 columns in total. Some of them are hidden as represented by the “…” in the above display. Let’s see what columns this table got. A screenshot of the actual Wikipedia page is also included below for reference.
>>> df.columns
Index(['Symbol', 'Security', 'SEC filings', 'GICS Sector', 'GICS Sub Industry',
'Headquarters Location', 'Date first added', 'CIK', 'Founded'],
dtype='object')
Let’s make some modifications to the dataframe. First, we are going to remove some unwanted columns. We don’t need the following columns so let’s drop them: SEC filings, GICS Sub Industry, Date first added, and CIK.
df.drop(['SEC filings', 'GICS Sub Industry', 'Date first added', 'CIK'], axis=1, inplace=True)
>>> df.head()
Symbol Security ... Headquarters Location Founded
0 MMM 3M Company ... St. Paul, Minnesota 1902
1 ABT Abbott Laboratories ... North Chicago, Illinois 1888
2 ABBV AbbVie Inc. ... North Chicago, Illinois 2013 (1888)
3 ABMD ABIOMED Inc ... Danvers, Massachusetts 1981
4 ACN Accenture plc ... Dublin, Ireland 1989
[5 rows x 5 columns]
After dropping the columns, we can check the df.head()
to confirm the drop was successful – now there are only 5 columns. We are left with the following columns:
>>> df.columns
Index(['Symbol', 'Security', 'GICS Sector', 'Headquarters Location',
'Founded'],
dtype='object')
And I think some of the names are not intuitive for me, so I’m going to rename the following:
- Security -> Company Name
- GICS Sector -> Sector
- Headquarters Location -> HQ Location
- Founded -> Founded Yr
Just like most things in pandas, there are several ways to rename columns. We will take a look at some methods and talk about which one is better under different scenarios.
Rename pandas columns with rename()
method
This is my personal favorite because of its readability is probably the best out of the three methods. We can rename either index (rows) or columns using this method, and we need to tell pandas what we are changing (i.e. columns or rows), so there’s no confusion. We also need to tell pandas the column names before and after the change, which improves readability.
df1 = df.rename(columns = {'Security': 'Company Name',
'GICS Sector': 'Sector',
'Headquarters Location': 'HQ Location',
'Founded': 'Founded Yr'},
)
As shown above, we pass in an argument columns, which is a dictionary containing the before and after column names. I chose not to overwrite the original dataframe (inplace=False
by default) because I want to keep the original dataframe for other demonstrations. Note that we only need to pass in the columns we plan to change names for.
Rename pandas columns with set_axis()
method
This method is different from rename()
, as set_axis()
only requires the final column names. But we have to pass in names for every column that we want to keep.
df2 = df.set_axis(['Symbol', 'Company Name', 'Sector', 'HQ Location', 'Founded Yr'], axis =1)
Rename pandas columns by modifyihng df.columns
attribute
df.columns
returns a list of column names for the given dataframe.
>>> df.columns
Index(['Symbol', 'Security', 'GICS Sector', 'Headquarters Location',
'Founded'],
dtype='object')
We can easily change the above column names by re-assigning it to a new list-like object:
>>> df.columns = ['Symbol','Company Name','Sector', 'HQ Location', 'Founded Yr']
Check the updated column names:
>>> df.columns
Index(['Symbol', 'Company Name', 'Sector', 'HQ Location', 'Founded Yr'], dtype='object')
Note that this method is similar to the set_axis()
method in the sense that we need to pass in names for every columns we want to keep.
Which approach to use?
The “pay attention to detail” readers probably already know the answer here.
.rename()
method requires us to pass only the columns need changing.set_axis()
anddf.columns
require us to pass all the column names
In other words, use:
.rename()
when you need to change just a few of the columns. E.g. your table might have 100 columns and you are changing just 3 of them. The only downside is that you have to know the original column names before the name change..set_axis()
ordf.columns
when your table doesn’t have too many columns because you have to give every column a new name! But the upside is that you don’t need to know the original column names beforehand, you can just make the change.