Last Updated on July 14, 2022 by Jay
In this tutorial, we’ll learn how to drop/delete columns from a pandas dataframe. We are going to walk through three methods to achieve this. Depending on the situation, one method might be better than the other when used properly.
This article is part of the “Integrate Python with Excel” series. You can find the table of contents here for easier navigation.
Preparing a dataframe
We’ll start off by creating a dataframe to demonstrate how to delete columns. Feel free to download this sample Excel file to follow along.
import pandas as pd

df = pd.read_excel('users.xlsx', index_col=0)

>>> df
              Country      City  Gender  Age
User Name
Forrest Gump      USA  New York       M   50
Mary Jane      CANADA   Toronto       F   30
Harry Porter       UK    London       M   10
Jean Grey       CHINA  Shanghai       F   30
Jean Grey      CANADA  Montreal       F   30
Mary Jane      CANADA   Toronto       F   30
Delete columns from dataframe with the .drop() pandas method
Similar to deleting rows, we can also delete columns using .drop(). The only difference is that we need to pass the argument axis=1 to the method. A few notes about the .drop() method:
- To delete a single column: pass in the column name (string)
- To delete multiple columns: pass in a list of the names for the columns to be deleted
- If you want to overwrite the original dataframe, include inplace=True
df.drop('Country', axis=1)                          # delete a single column
df.drop(['Country', 'City'], axis=1)                # delete multiple columns
df.drop(['Country', 'City'], axis=1, inplace=True)  # overwrite the original dataframe
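To see the difference between the default behavior and inplace=True, here is a minimal sketch. It assumes a small dataframe built by hand (three rows copied from the sample table above) instead of reading users.xlsx, so it can run without the file. It also uses the columns= keyword, which pandas accepts as another way of writing axis=1.

```python
import pandas as pd

# Hand-built stand-in for users.xlsx (an assumption so the example runs on its own)
df = pd.DataFrame(
    {
        "Country": ["USA", "CANADA", "UK"],
        "City": ["New York", "Toronto", "London"],
        "Gender": ["M", "F", "M"],
        "Age": [50, 30, 10],
    },
    index=pd.Index(["Forrest Gump", "Mary Jane", "Harry Porter"], name="User Name"),
)

# By default .drop() returns a new dataframe and leaves df untouched
trimmed = df.drop(["Country", "City"], axis=1)
print(list(trimmed.columns))  # ['Gender', 'Age']
print(list(df.columns))       # ['Country', 'City', 'Gender', 'Age']

# columns= is an equivalent way to write axis=1
same = df.drop(columns=["Country", "City"])
print(trimmed.equals(same))   # True

# With inplace=True the method modifies df itself and returns None
result = df.drop("Country", axis=1, inplace=True)
print(result)                 # None
print(list(df.columns))       # ['City', 'Gender', 'Age']
```

Because the in-place call returns None, writing df = df.drop(..., inplace=True) would replace the dataframe with None, so use either re-assignment or inplace=True, not both.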
Delete columns from pandas dataframe with del
del is a keyword in Python, which can be used to delete an object. We can use it to delete a column from a dataframe.
Note that when using del, the column is removed from the dataframe itself, which means the original dataframe is updated to reflect the deletion.
del df['Country']

>>> df
      User Name      City  Gender  Age
0  Forrest Gump  New York       M   50
1     Mary Jane   Toronto       F   30
2  Harry Porter    London       M   10
3     Jean Grey  Shanghai       F   30
4     Jean Grey  Montreal       F   30
5     Mary Jane   Toronto       F   30
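If we want to remove several columns with del, one option is a short loop. The sketch below assumes a hand-built dataframe (three rows from the sample data, with User Name as a regular column) and a hypothetical list named to_remove. The membership check is there because del raises a KeyError when the column does not exist.

```python
import pandas as pd

# Hand-built stand-in for the sample data (an assumption so the example runs on its own)
df = pd.DataFrame(
    {
        "User Name": ["Forrest Gump", "Mary Jane", "Harry Porter"],
        "Country": ["USA", "CANADA", "UK"],
        "City": ["New York", "Toronto", "London"],
        "Gender": ["M", "F", "M"],
        "Age": [50, 30, 10],
    }
)

# Hypothetical list of columns to remove; "Salary" does not exist in df
to_remove = ["Country", "Age", "Salary"]

for col in to_remove:
    if col in df.columns:  # skip names that are not in the dataframe
        del df[col]        # removes the column from df itself

print(list(df.columns))    # ['User Name', 'City', 'Gender']
```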
Delete columns from pandas dataframe with Re-assignment method
I also call this the square bracket method. It is not a true delete method, but rather a re-assignment operation. However, the end result is the same as a deletion.
Consider our original dataframe with User Name loaded as a regular column (that is, read without index_col=0), so that it has 5 columns, namely:
User Name, Country, City, Gender, Age
Let’s say we want to delete the Country and Age columns. Instead of deleting them, we create a new dataframe with only User Name, City and Gender in it, effectively “deleting” the other two columns. Then, we assign the newly created dataframe back to the original variable to complete the “delete operation”. Note the double square brackets in the code.
df = df[['User Name', 'City', 'Gender']]
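The double square brackets matter: the inner pair is a Python list of column names, and the outer pair is the dataframe selection. Here is a small sketch of the difference, assuming a hand-built dataframe with User Name as a regular column. The last part shows one possible way to build the list of columns to keep when it is easier to name the ones we don't want; the names unwanted and keep are just illustrative.

```python
import pandas as pd

# Hand-built stand-in for the sample data (an assumption so the example runs on its own)
df = pd.DataFrame(
    {
        "User Name": ["Forrest Gump", "Mary Jane", "Harry Porter"],
        "Country": ["USA", "CANADA", "UK"],
        "City": ["New York", "Toronto", "London"],
        "Gender": ["M", "F", "M"],
        "Age": [50, 30, 10],
    }
)

# Single brackets with one name give a Series, not a dataframe
print(type(df["City"]).__name__)    # Series

# Double brackets (a list of names) give a dataframe
print(type(df[["City"]]).__name__)  # DataFrame

# Build the keep-list from the columns we do not want (illustrative names)
unwanted = ["Country", "Age"]
keep = [col for col in df.columns if col not in unwanted]

# Re-assign the selection back to df to "delete" the other columns
df = df[keep]
print(list(df.columns))             # ['User Name', 'City', 'Gender']
```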
Which method to use??
You must be thinking “okay so you told me three methods, which one should I use??”. The answer is always: it depends. Below are some tips that I’ve been using to determine which method to use.
.drop() method
- Works best when we have many columns and need to drop only a few. In this case, we only need to list the columns to drop.
- However, we need to remember to include the inplace=True argument if we want to overwrite the original dataframe.
del keyword
- Works best when we need to drop only 1 or 2 columns. This method is the simplest and shortest code to write.
- However, del removes one column per target, so dropping several columns takes a loop or repeated del statements, which is more cumbersome than the .drop() method.
Re-assignment method
- Works best when the dataframe has only a few columns, or when the dataframe has many columns but we are keeping only a few of them.
- If we need to keep many columns, we’ll have to type all the column names that we plan to keep, which could be a lot of typing.
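Whichever method we pick, the resulting dataframe should be the same. The sketch below checks this on a hand-built dataframe (an assumption, so the example runs without users.xlsx) by removing Country and Age with each of the three methods and comparing the results with .equals(). The del version works on a copy so that the original stays available for the other two.

```python
import pandas as pd

# Hand-built stand-in for the sample data (an assumption so the example runs on its own)
original = pd.DataFrame(
    {
        "User Name": ["Forrest Gump", "Mary Jane", "Harry Porter"],
        "Country": ["USA", "CANADA", "UK"],
        "City": ["New York", "Toronto", "London"],
        "Gender": ["M", "F", "M"],
        "Age": [50, 30, 10],
    }
)

# Method 1: .drop() returns a new dataframe
via_drop = original.drop(["Country", "Age"], axis=1)

# Method 2: del changes the dataframe itself, so work on a copy
via_del = original.copy()
for col in ["Country", "Age"]:
    del via_del[col]

# Method 3: re-assignment, keeping only the columns we want
via_reassign = original[["User Name", "City", "Gender"]]

print(via_drop.equals(via_del))       # True
print(via_drop.equals(via_reassign))  # True
print(len(original.columns))          # 5, the original is untouched
```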