Last Updated on July 14, 2022 by Jay
Deleting rows is a common task in Excel, in this tutorial, we’ll learn a few techniques to delete rows from a pandas dataframe.
This article is part of the “Integrate Python with Excel” series, you can find the table of content here for easier navigation.
Prepare a dataframe
We’ll use the table below to demonstrate how to delete rows. Feel free to download the sample file to walk through the example.
import pandas as pd
df = pd.read_excel('users.xlsx', index_col=0)
>>> df
Country City Gender Age
User Name
Forrest Gump USA New York M 50
Mary Jane CANADA Toronto F 30
Harry Porter UK London M 10
Jean Grey CHINA Shanghai F 30
Jean Grey CANADA Montreal F 30
Mary Jane CANADA Toronto F 30
Notice the index_col=0
from the above code? If we leave that argument blank, the index will be a 0-based index. By specifying the index_col=0
, we ask pandas to use the first column (User Name) as the index.
The .drop()
method
Let’s delete the 3rd row (Harry Porter) from the dataframe. pandas provides a convenient method .drop()
to delete rows. The important arguments for drop() method are listed below, note there are other arguments but we will only cover the following:
label
: single label or a list of labels, these can be either row or column labelsaxis
: default is 0, which means index (i.e. rows). If set to 1, it refers to columnsinplace
: this is to tell if pandas should overwrite the original dataframe
Delete rows by name
>>> df.drop('Harry Porter')
Country City Gender Age
User Name
Forrest Gump USA New York M 50
Mary Jane CANADA Toronto F 30
Jean Grey CHINA Shanghai F 30
Jean Grey CANADA Montreal F 30
Mary Jane CANADA Toronto F 30
We skipped the axis argument, which means leaving it as the default value 0, or rows. So we are deleting a row with the index value “Harry Porter”. Also, note that the .drop()
method also returns the resulting dataframe. Now is the interesting part, let’s see the dataframe df
. It didn’t change! This is because we left out the inplace
argument. Most of the pandas methods will return a copy of the dataframe, instead of modifying the original dataframe, unless we specify inplace=True
.
>>> df
Country City Gender Age
User Name
Forrest Gump USA New York M 50
Mary Jane CANADA Toronto F 30
Harry Porter UK London M 10
Jean Grey CHINA Shanghai F 30
Jean Grey CANADA Montreal F 30
Mary Jane CANADA Toronto F 30
Use the following 2 methods if you want to overwrite the original dataframe df:
- assign the resulting dataframe back to the original
df
- place
inplace=True
inside thedrop()
method
## The following 2 lines of code will give the same result
df = df.drop('Harry Porter')
df.drop('Harry Porter', inplace=True)
Delete rows by position
We can also use the row (index) position to delete rows. Let’s delete rows 1 and 3, which are Forrest Gump and Harry Porter. In the resulting dataframe, we should see only Mary Jane and Jean Grey.
>>> df = df.drop(df.index [ [ 0,2 ] ])
>>> df
Country City Gender Age
User Name
Mary Jane CANADA Toronto F 30
Jean Grey CHINA Shanghai F 30
Jean Grey CANADA Montreal F 30
Mary Jane CANADA Toronto F 30
The boolean index
Boolean index is basically a list of boolean values (True or False). We can use boolean index to filter rows easily, here we can also use it to delete rows conveniently. This time we’ll delete the rows with “Jean Grey” from the dataframe, and assign the result into a new dataframe
>>> df_new = df[df.index != "Jean Grey"]
>>> df_new
Country City Gender Age
User Name
Forrest Gump USA New York M 50
Mary Jane CANADA Toronto F 30
Harry Porter UK London M 10
Mary Jane CANADA Toronto F 30
Next chapter we’ll look at how to delete columns using a few different approaches.