Last Updated on July 14, 2022 by Jay
One common task in Excel is to insert rows into a sheet. When working with data in Python, we can also insert rows into an equivalent dataframe.
This article is part of the “Integrate Python with Excel” series; you can find the table of contents here for easier navigation.
Add rows into a dataframe
So here’s the bad news: pandas has no function for inserting a row. We can’t just right-click on a row in our imaginary sheet and choose .insert() (pandas does have DataFrame.insert(), but it inserts columns, not rows). The built-in methods don’t let us insert a row at a specific location; they only let us add a row (or rows) at the end of a dataframe. There are two methods: append and concat. They work very similarly, so I’ll just go through append, since I use it more often. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the append examples below need an older pandas; on pandas 2.0 or later, use pd.concat() instead. Let’s look at some code.
import pandas as pd
df = pd.DataFrame({'col 1': [1, 2, 3, 4, 5],
                   'col 2': [6, 7, 8, 9, 10],
                   'col 3': [11, 12, 13, 14, 15],
                   })
>>> df
   col 1  col 2  col 3
0      1      6     11
1      2      7     12
2      3      8     13
3      4      9     14
4      5     10     15
I’ve just created a 5×3 dataframe. Now if I want to add a row to it, I can use append(), which accepts a DataFrame, a Series, or a dictionary (a Series or dictionary also requires ignore_index=True, unless the Series has a name). To make the new row easy to spot, let’s add a row of 100s.
#append a DataFrame
row_to_add = pd.DataFrame({'col 1': [100],
                           'col 2': [100],
                           'col 3': [100],
                           })
df_final = df.append(row_to_add)
>>> df_final
   col 1  col 2  col 3
0      1      6     11
1      2      7     12
2      3      8     13
3      4      9     14
4      5     10     15
0    100    100    100
Notice how the newly added row has an index value of 0, which duplicates the index of the original first row? Now there’s a problem: two rows share the index 0. If we select index 0, we get both rows, the original first row and the newly added one. In most cases, this is probably not what you intended.
>>> df_final.loc[0]
   col 1  col 2  col 3
0      1      6     11
0    100    100    100
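On pandas 2.0 and later, where append() no longer exists, pd.concat() produces the same duplicate-label situation by default. The sketch below assumes the same df and row_to_add as above (redefined so the snippet runs on its own). It also shows concat()’s verify_integrity parameter, which raises a ValueError instead of silently creating duplicate labels.

```python
import pandas as pd

df = pd.DataFrame({'col 1': [1, 2, 3, 4, 5],
                   'col 2': [6, 7, 8, 9, 10],
                   'col 3': [11, 12, 13, 14, 15]})
row_to_add = pd.DataFrame({'col 1': [100], 'col 2': [100], 'col 3': [100]})

# Without ignore_index, concat keeps each input's original labels,
# so the new row keeps its label 0 and clashes with the first row.
df_final = pd.concat([df, row_to_add])
print(df_final.loc[0])           # a two-row DataFrame: both rows labelled 0
print(df_final.index.is_unique)  # False

# verify_integrity=True makes concat raise instead of producing duplicates.
try:
    pd.concat([df, row_to_add], verify_integrity=True)
except ValueError as err:
    print("concat refused:", err)
```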
It’s easy to fix this problem: we just need to pass another argument, ignore_index=True, to append(). This relabels the result from 0 to n-1, which has the same effect as calling reset_index(drop=True) on the result. Now the newly added row should be at index 5.
row_to_add = pd.DataFrame({'col 1': [100],
                           'col 2': [100],
                           'col 3': [100],
                           })
df_final = df.append(row_to_add, ignore_index=True)
>>> df_final
   col 1  col 2  col 3
0      1      6     11
1      2      7     12
2      3      8     13
3      4      9     14
4      5     10     15
5    100    100    100
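For pandas 2.0 or later, here is a minimal sketch of the same append-with-ignore_index step using pd.concat(). Unlike append(), concat() does not accept a plain dictionary, so the dictionary row (a hypothetical row_dict) is wrapped in a list to build a one-row DataFrame first. The last lines check that ignore_index=True gives the same result as keeping the old labels and calling reset_index(drop=True) afterwards.

```python
import pandas as pd

df = pd.DataFrame({'col 1': [1, 2, 3, 4, 5],
                   'col 2': [6, 7, 8, 9, 10],
                   'col 3': [11, 12, 13, 14, 15]})

# concat() only takes DataFrame/Series objects, so wrap the dict
# in a list to turn it into a one-row DataFrame.
row_dict = {'col 1': 100, 'col 2': 100, 'col 3': 100}
row_df = pd.DataFrame([row_dict])

# ignore_index=True relabels the result 0..n-1, so the new row gets label 5.
df_final = pd.concat([df, row_df], ignore_index=True)
print(df_final)

# Same result as concatenating first and dropping the old labels afterwards.
same = pd.concat([df, row_df]).reset_index(drop=True)
print(df_final.equals(same))  # True
```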
You might say this isn’t what you want: you want to add the row in the middle, just after the 3rd row of the original dataframe. So it’s time for some customization.
Mimic how we insert rows in Excel
In Excel, when we insert a row into a table, we are effectively just shifting everything below it down by one row (the same goes for inserting multiple rows). Technically speaking, we are “splitting” the original table into two parts, then placing the new row between them. A graphical presentation looks like the following.
We can mimic the above technique, and perform the same “insert” operation in Python. Going back to our hypothetical picky requirement – insert the row after the 3rd row, i.e. index 2.
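We will create two new dataframes, df_part_1 and df_part_2, which contain rows 1-3 (index 0-2) and rows 4-5 (index 3-4), respectively. Note that .loc slices by label and includes both endpoints, so df.loc[0:2] returns three rows. Then we’ll use the append() method to glue them together with row_to_add.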
df_part_1 = df.loc[0:2]
df_part_2 = df.loc[3:4]
df_final = df_part_1.append(row_to_add, ignore_index=True)
df_final = df_final.append(df_part_2, ignore_index=True)
>>> df_final
   col 1  col 2  col 3
0      1      6     11
1      2      7     12
2      3      8     13
3    100    100    100
4      4      9     14
5      5     10     15
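The same split-and-glue idea can be sketched with a single pd.concat() call, which also works on pandas 2.0 or later. This version splits with .iloc instead of .loc: positional slices exclude the end point and don’t rely on the dataframe having the default 0, 1, 2, … labels. The second part illustrates the Excel “insert multiple rows” case; rows_to_add is a hypothetical two-row frame, and the only change is that the middle piece has more rows.

```python
import pandas as pd

df = pd.DataFrame({'col 1': [1, 2, 3, 4, 5],
                   'col 2': [6, 7, 8, 9, 10],
                   'col 3': [11, 12, 13, 14, 15]})
row_to_add = pd.DataFrame({'col 1': [100], 'col 2': [100], 'col 3': [100]})

# Split by position: iloc[:3] is the first three rows, iloc[3:] is the rest.
df_part_1 = df.iloc[:3]
df_part_2 = df.iloc[3:]

# Glue part 1, the new row and part 2 together in one call.
df_final = pd.concat([df_part_1, row_to_add, df_part_2], ignore_index=True)
print(df_final)

# Inserting several rows works the same way: the middle piece just has more rows.
rows_to_add = pd.DataFrame({'col 1': [100, 200],
                            'col 2': [100, 200],
                            'col 3': [100, 200]})
df_multi = pd.concat([df.iloc[:3], rows_to_add, df.iloc[3:]], ignore_index=True)
print(df_multi)
```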
There you go, we just added the new row of 100s after the 3rd row. Most of the time, we would turn the above into a function so that the code is reusable. See below for a simple example. Note that you should handle cases where the user passes a row_num < 0 or a row_num larger than the length of the original dataframe; this example simply clamps row_num into range.
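def insert_row(row_num, orig_df, row_to_add):
    # Insert row_to_add after the row at position row_num.
    # Clamp row_num so negative or too-large values stay in range.
    row_num = min(max(0, row_num), len(orig_df))
    # iloc slices by position (end-exclusive), so this works for any index labels.
    df_part_1 = orig_df.iloc[:row_num + 1]
    df_part_2 = orig_df.iloc[row_num + 1:]
    df_final = df_part_1.append(row_to_add, ignore_index=True)
    df_final = df_final.append(df_part_2, ignore_index=True)
    return df_final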
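For pandas 2.0 or later, here is a minimal sketch of a concat-based variant. The name insert_rows and its argument order are hypothetical, not a pandas API. It uses a slightly different convention from insert_row: pos is the position the first new row will occupy, so pos=0 inserts at the very top (which insert_row cannot do) and pos=len(orig_df) appends at the bottom. Out-of-range values are clamped. For in-range values, insert_rows(df, row_num + 1, rows) matches insert_row(row_num, df, rows).

```python
import pandas as pd


def insert_rows(orig_df: pd.DataFrame, pos: int, rows: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of orig_df with `rows` inserted
    so that the first new row sits at position `pos`.

    pos=0 inserts at the top; pos=len(orig_df) appends at the bottom.
    Out-of-range positions are clamped instead of raising an error.
    """
    pos = min(max(0, pos), len(orig_df))
    # Positional split, then glue the pieces back together with fresh labels.
    return pd.concat([orig_df.iloc[:pos], rows, orig_df.iloc[pos:]],
                     ignore_index=True)


df = pd.DataFrame({'col 1': [1, 2, 3, 4, 5],
                   'col 2': [6, 7, 8, 9, 10],
                   'col 3': [11, 12, 13, 14, 15]})
row_to_add = pd.DataFrame({'col 1': [100], 'col 2': [100], 'col 3': [100]})

print(insert_rows(df, 3, row_to_add))   # new row after the 3rd row
print(insert_rows(df, 0, row_to_add))   # new row at the very top
print(insert_rows(df, 99, row_to_add))  # clamped: new row appended at the end
```

Returning a new dataframe instead of modifying orig_df in place mirrors how concat() itself behaves, so the original df stays untouched.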
In the next chapter, we’ll talk about how to insert columns into a pandas dataframe.