Last Updated on July 14, 2022 by Jay
Today we’ll learn how to replicate a common Excel task – find and replace data in Python.
This tutorial is part of the “Integrate Python with Excel” series, you can find the table of content here for easier navigation.
Prepare a dataframe for demo
Feel free to download the sample Excel file here to follow along. Let’s load the Excel file data into Python, to do that we’ll use the pandas
library, which is the standard for data analysis in Python.
import pandas as pd
df = pd.read_excel('find_replace.xlsx')
>>> df
ID Pilot Unit Side
0 0 Ayanami Rei Unit 00 Ally
1 1 Shiji Ikari Unit 01 Ally
2 2 Asuka Langley Sohryu Unit 02 Ally
3 3 Toji Suzuhara Unit 03 Ally
4 4 Kaworu Nagisa Unit 04 Ally
5 5 Mari Makinami Unit 05 Ally
6 6 Kaworu Nagisa Mark. 06 Ally
7 7 Unknown Unit 07 NaN
8 8 Mari Makinami Unit 08 Ally
9 9 Ayanami Rei Mark. 09 Ally
10 10 NaN NaN NaN
11 13 Shiji Ikari Unit 13 Ally
I’ll demonstrate two ways to find and replace data in Python. The first one is what I call “direct replace”, and the second one is “conditional replace”.
Direct replace with .replace()
method
As the name suggests, this method will find matching data and replace it with something else. Below are some of the arguments we can pass into the .replace()
method:
to_replace
: the data to be replacedvalue
: the new valuesinplace
: whether or not to replace the original dataframe- Note that there are other arguments you can use, but I’m not going through them. For a complete list of arguments, check out pandas official documentation
Replace All
In Excel, we can press Ctrl + H and Replace All values, let’s replicate that here. We replace the name “Ayanami Rei” with “Yui Ikari” everywhere inside the dataframe. Note for the below code, inside the parentheses I’ve included the argument names, but it’s not really required. You can simply write df.replace('Ayanami Rei', 'Yui Ikari')
and it would still work. The values in Pilot column, row 0 and 9 have been updated.
df.replace(to_replace = 'Ayanami Rei', value = 'Yui Ikari')
ID Pilot Unit Side
0 0 Yui Ikari Unit 00 Ally
1 1 Shiji Ikari Unit 01 Ally
2 2 Asuka Langley Sohryu Unit 02 Ally
3 3 Toji Suzuhara Unit 03 Ally
4 4 Kaworu Nagisa Unit 04 Ally
5 5 Mari Makinami Unit 05 Ally
6 6 Kaworu Nagisa Mark. 06 Ally
7 7 Unknown Unit 07 NaN
8 8 Mari Makinami Unit 08 Ally
9 9 Yui Ikari Mark. 09 Ally
10 10 NaN NaN NaN
11 13 Shiji Ikari Unit 13 Ally
Conditional replace with filtering
This method solves one problem that the direct replace can’t solve, which is when we want to replace data based on some conditions other than the value of data itself. Here’s an example: for the pilot “Kaworu Nagisa” (line number 4 and 6), we want to change his Side from “Ally” to “Enemy”. We can’t use the previous method (or Ctrl + H) because that will replace all “Ally” to “Enemy”. To tackle this problem, we need to filter the dataframe first, and the condition to be met is when Pilot == ‘Kaworu Nagisa”.
The Excel way of doing this would be using formulae, something like =if(A1="Kaworu Nagisa", "Enemy", Ally")
, then drag down for all the rows. However, this is not efficient because we are basically changing all rows, when we just need to modify two of them.
Here’s the Python solution, and I hope you still remember how to filter a dataframe. So let’s begin by filtering the dataframe first.
>>> df.loc[df['Pilot'] == 'Kaworu Nagisa']
ID Pilot Unit Side
4 4 Kaworu Nagisa Unit 04 Ally
6 6 Kaworu Nagisa Mark. 06 Ally
The above line of code returns the records (4 and 6) where the condition Pilot == ‘Kaworu Nagisa’ is True
. Once we have this sub-dataset, we can feel free to modify anything on these two records, so let’s change the Side from “Ally” to “Enemy”. If you want to still use .replace()
here, be my guest. However, I’m a lazy person, I’ll do it another way. Remember when we filter, we can actually select specific columns? So we’ll select only the Side column for the records that meet our criteria, then assign value “Enemy” directly in that column. And btw, this is the more Pythonic way to write code.
df.loc[df['Pilot'] == 'Kaworu Nagisa', 'Side'] = 'Enemy'
>>> df
ID Pilot Unit Side
0 0 Ayanami Rei Unit 00 Ally
1 1 Shiji Ikari Unit 01 Ally
2 2 Asuka Langley Sohryu Unit 02 Ally
3 3 Toji Suzuhara Unit 03 Ally
4 4 Kaworu Nagisa Unit 04 Enemy
5 5 Mari Makinami Unit 05 Ally
6 6 Kaworu Nagisa Mark. 06 Enemy
7 7 Unknown Unit 07 NaN
8 8 Mari Makinami Unit 08 Ally
9 9 Ayanami Rei Mark. 09 Ally
10 10 NaN NaN NaN
11 13 Shiji Ikari Unit 13 Ally