Rename pandas dataframe columns

Sharing is caring!

Last Updated on July 14, 2022 by Jay

There are many reasons to rename pandas dataframe columns. For example, maybe you want the column names to be more descriptive, or maybe you want to shorten the names. This tutorial will walk through how to change names in a dataframe.

This tutorial is part of the “Integrate Python with Excel” series, you can find the table of content here for easier navigation.

Prepare a dataframe for demo

The pandas library provides a convenient way to read data from a web page, so we’ll load up a table from Wikipedia – the S&P 500 companies list.

import pandas from pd
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]

>>> df.head()
  Symbol             Security  ...      CIK      Founded
0    MMM           3M Company  ...    66740         1902
1    ABT  Abbott Laboratories  ...     1800         1888
2   ABBV          AbbVie Inc.  ...  1551152  2013 (1888)
3   ABMD          ABIOMED Inc  ...   815094         1981
4    ACN        Accenture plc  ...  1467373         1989

[5 rows x 9 columns]

It appears there are 9 columns in total. Some of them are hidden as represented by the “…” in the above display. Let’s see what columns this table got. A screenshot of the actual Wikipedia page is also included below for reference.

>>> df.columns
Index(['Symbol', 'Security', 'SEC filings', 'GICS Sector', 'GICS Sub Industry',
       'Headquarters Location', 'Date first added', 'CIK', 'Founded'],
      dtype='object')
Wikipedia SP 500 company list

Let’s make some modifications to the dataframe. First, we are going to remove some unwanted columns. We don’t need the following columns so let’s drop them: SEC filings, GICS Sub Industry, Date first added, and CIK.

df.drop(['SEC filings', 'GICS Sub Industry', 'Date first added', 'CIK'], axis=1, inplace=True)

>>> df.head()
  Symbol             Security  ...    Headquarters Location      Founded
0    MMM           3M Company  ...      St. Paul, Minnesota         1902
1    ABT  Abbott Laboratories  ...  North Chicago, Illinois         1888
2   ABBV          AbbVie Inc.  ...  North Chicago, Illinois  2013 (1888)
3   ABMD          ABIOMED Inc  ...   Danvers, Massachusetts         1981
4    ACN        Accenture plc  ...          Dublin, Ireland         1989

[5 rows x 5 columns]

After dropping the columns, we can check the df.head() to confirm the drop was successful – now there are only 5 columns. We are left with the following columns:

>>> df.columns
Index(['Symbol', 'Security', 'GICS Sector', 'Headquarters Location',
       'Founded'],
      dtype='object')

And I think some of the names are not intuitive for me, so I’m going to rename the following:

  • Security -> Company Name
  • GICS Sector -> Sector
  • Headquarters Location -> HQ Location
  • Founded -> Founded Yr

Just like most things in pandas, there are several ways to rename columns. We will take a look at some methods and talk about which one is better under different scenarios.

Rename pandas columns with rename() method

This is my personal favorite because of its readability is probably the best out of the three methods. We can rename either index (rows) or columns using this method, and we need to tell pandas what we are changing (i.e. columns or rows), so there’s no confusion. We also need to tell pandas the column names before and after the change, which improves readability.

df1 = df.rename(columns = {'Security': 'Company Name',
                     'GICS Sector': 'Sector',
                     'Headquarters Location': 'HQ Location',
                     'Founded': 'Founded Yr'},
          )          

As shown above, we pass in an argument columns, which is a dictionary containing the before and after column names. I chose not to overwrite the original dataframe (inplace=False by default) because I want to keep the original dataframe for other demonstrations. Note that we only need to pass in the columns we plan to change names for.

Rename pandas columns with set_axis() method

This method is different from rename(), as set_axis() only requires the final column names. But we have to pass in names for every column that we want to keep.

df2 = df.set_axis(['Symbol', 'Company Name', 'Sector', 'HQ Location', 'Founded Yr'], axis =1)

Rename pandas columns by modifyihng df.columns attribute

df.columns returns a list of column names for the given dataframe.

>>> df.columns
Index(['Symbol', 'Security', 'GICS Sector', 'Headquarters Location',
       'Founded'],
      dtype='object')

We can easily change the above column names by re-assigning it to a new list-like object:

>>> df.columns = ['Symbol','Company Name','Sector', 'HQ Location', 'Founded Yr']

Check the updated column names:

>>> df.columns
Index(['Symbol', 'Company Name', 'Sector', 'HQ Location', 'Founded Yr'], dtype='object')

Note that this method is similar to the set_axis() method in the sense that we need to pass in names for every columns we want to keep.

Which approach to use?

The “pay attention to detail” readers probably already know the answer here.

  • .rename() method requires us to pass only the columns need changing
  • .set_axis() and df.columns require us to pass all the column names

In other words, use:

  • .rename() when you need to change just a few of the columns. E.g. your table might have 100 columns and you are changing just 3 of them. The only downside is that you have to know the original column names before the name change.
  • .set_axis() or df.columns when your table doesn’t have too many columns because you have to give every column a new name! But the upside is that you don’t need to know the original column names beforehand, you can just make the change.

Leave a Reply

Your email address will not be published. Required fields are marked *