Find and replace Excel data using Python

Sharing is caring!

Last Updated on July 14, 2022 by Jay

Today we’ll learn how to replicate a common Excel task – find and replace data in Python.

This tutorial is part of the “Integrate Python with Excel” series, you can find the table of content here for easier navigation.

Prepare a dataframe for demo

Feel free to download the sample Excel file here to follow along. Let’s load the Excel file data into Python, to do that we’ll use the pandas library, which is the standard for data analysis in Python.

import pandas as pd

df = pd.read_excel('find_replace.xlsx')

>>> df
    ID                 Pilot      Unit  Side
0    0           Ayanami Rei   Unit 00  Ally
1    1           Shiji Ikari   Unit 01  Ally
2    2  Asuka Langley Sohryu   Unit 02  Ally
3    3         Toji Suzuhara   Unit 03  Ally
4    4         Kaworu Nagisa   Unit 04  Ally
5    5         Mari Makinami   Unit 05  Ally
6    6         Kaworu Nagisa  Mark. 06  Ally
7    7               Unknown   Unit 07   NaN
8    8         Mari Makinami   Unit 08  Ally
9    9           Ayanami Rei  Mark. 09  Ally
10  10                   NaN       NaN   NaN
11  13           Shiji Ikari   Unit 13  Ally

I’ll demonstrate two ways to find and replace data in Python. The first one is what I call “direct replace”, and the second one is “conditional replace”.

Direct replace with .replace() method

As the name suggests, this method will find matching data and replace it with something else. Below are some of the arguments we can pass into the .replace() method:

  • to_replace: the data to be replaced
  • value: the new values
  • inplace: whether or not to replace the original dataframe
  • Note that there are other arguments you can use, but I’m not going through them. For a complete list of arguments, check out pandas official documentation

Replace All

In Excel, we can press Ctrl + H and Replace All values, let’s replicate that here. We replace the name “Ayanami Rei” with “Yui Ikari” everywhere inside the dataframe. Note for the below code, inside the parentheses I’ve included the argument names, but it’s not really required. You can simply write df.replace('Ayanami Rei', 'Yui Ikari') and it would still work. The values in Pilot column, row 0 and 9 have been updated.

df.replace(to_replace = 'Ayanami Rei', value = 'Yui Ikari')

    ID                 Pilot      Unit  Side
0    0             Yui Ikari   Unit 00  Ally
1    1           Shiji Ikari   Unit 01  Ally
2    2  Asuka Langley Sohryu   Unit 02  Ally
3    3         Toji Suzuhara   Unit 03  Ally
4    4         Kaworu Nagisa   Unit 04  Ally
5    5         Mari Makinami   Unit 05  Ally
6    6         Kaworu Nagisa  Mark. 06  Ally
7    7               Unknown   Unit 07   NaN
8    8         Mari Makinami   Unit 08  Ally
9    9             Yui Ikari  Mark. 09  Ally
10  10                   NaN       NaN   NaN
11  13           Shiji Ikari   Unit 13  Ally

Conditional replace with filtering

This method solves one problem that the direct replace can’t solve, which is when we want to replace data based on some conditions other than the value of data itself. Here’s an example: for the pilot “Kaworu Nagisa” (line number 4 and 6), we want to change his Side from “Ally” to “Enemy”. We can’t use the previous method (or Ctrl + H) because that will replace all “Ally” to “Enemy”. To tackle this problem, we need to filter the dataframe first, and the condition to be met is when Pilot == ‘Kaworu Nagisa”.

The Excel way of doing this would be using formulae, something like =if(A1="Kaworu Nagisa", "Enemy", Ally"), then drag down for all the rows. However, this is not efficient because we are basically changing all rows, when we just need to modify two of them.

Here’s the Python solution, and I hope you still remember how to filter a dataframe. So let’s begin by filtering the dataframe first.

>>> df.loc[df['Pilot'] == 'Kaworu Nagisa']
   ID          Pilot      Unit  Side
4   4  Kaworu Nagisa   Unit 04  Ally
6   6  Kaworu Nagisa  Mark. 06  Ally

The above line of code returns the records (4 and 6) where the condition Pilot == ‘Kaworu Nagisa’ is True. Once we have this sub-dataset, we can feel free to modify anything on these two records, so let’s change the Side from “Ally” to “Enemy”. If you want to still use .replace() here, be my guest. However, I’m a lazy person, I’ll do it another way. Remember when we filter, we can actually select specific columns? So we’ll select only the Side column for the records that meet our criteria, then assign value “Enemy” directly in that column. And btw, this is the more Pythonic way to write code.

df.loc[df['Pilot'] == 'Kaworu Nagisa', 'Side'] = 'Enemy'

>>> df
    ID                 Pilot      Unit   Side
0    0           Ayanami Rei   Unit 00   Ally
1    1           Shiji Ikari   Unit 01   Ally
2    2  Asuka Langley Sohryu   Unit 02   Ally
3    3         Toji Suzuhara   Unit 03   Ally
4    4         Kaworu Nagisa   Unit 04  Enemy
5    5         Mari Makinami   Unit 05   Ally
6    6         Kaworu Nagisa  Mark. 06  Enemy
7    7               Unknown   Unit 07    NaN
8    8         Mari Makinami   Unit 08   Ally
9    9           Ayanami Rei  Mark. 09   Ally
10  10                   NaN       NaN    NaN
11  13           Shiji Ikari   Unit 13   Ally

Leave a Reply

Your email address will not be published. Required fields are marked *