How to Remove Whitespace in Pandas

Sharing is caring!

Last Updated on March 5, 2022 by Jay

We often need to remove whitespace from data using pandas; this tutorial will show you how with several examples.

Take a look at the following dataframe, it’s actually not uncommon to receive data like the below – there could be whitespace either in front, after, or even inside the values.

import pandas as pd

df = pd.DataFrame({
                   'country': ['   USA     ','India','Germ any','Fra nce', '    Canada    ',
                               'Nether land','UK','Singapore', 'Australia', 
                               'Canada            '],
                  })

df
              country
0            USA     
1               India
2            Germ any
3             Fra nce
4          Canada    
5         Nether land
6                  UK
7           Singapore
8           Australia
9  Canada            

df['country'].values
array(['   USA     ', 'India', 'Germ any', 'Fra nce', '    Canada    ',
       'Nether land', 'UK', 'Singapore', 'Australia',
       'Canada            '], dtype=object)

These whitespace are annoying because to a computer, ” Canada ” and “Canada” are not the same thanks to the whitespace!

'Canada' in df['country'].values
False

'    Canada    ' in df['country'].values
True

Remove Leading and Trailing Whitespace in Pandas

The easiest way to remove leading & trailing whitespace in pandas is to use the str accessor which gives pandas capability to use String functions.

Both leading and trailing whitespace

Note in the below example, we leave the argument blank in the strip() method on purpose. This will remove all whitespace before and after String values.

df['clean'] = df['country'].str.strip()

df
              country        clean
0            USA               USA
1               India        India
2            Germ any     Germ any
3             Fra nce      Fra nce
4          Canada           Canada
5         Nether land  Nether land
6                  UK           UK
7           Singapore    Singapore
8           Australia    Australia
9  Canada                   Canada

df['clean'].values
array(['USA', 'India', 'Germ any', 'Fra nce', 'Canada', 'Nether land',
       'UK', 'Singapore', 'Australia', 'Canada'], dtype=object)

Leading Whitespace Only

In some instances, we might want to remove only the leading whitespace (left of the String values). We can do this by using the lstrip() (left strip) method. Note the trailing whitespace is untouched.

df['left_clean'] = df['country'].str.lstrip()
df['left_clean'].values
array(['USA     ', 
       'India', 
       'Germ any', 
       'Fra    nce', 
       'Canada    ',
       'Nether land', 
       'UK', 
       'Singapore', 
       'Australia',
       'Canada            '], dtype=object)

Trailing Whitespace Only

We can also remove only the trailing whitespace, and leave the leading whitespace intact. Simply call the rstrip() method:

df['right_clean'] = df['country'].str.rstrip()
df['right_clean'].values
array(['   USA', 
       'India', 
       'Germ any', 
       'Fra    nce', 
       '    Canada',
       'Nether land', 
       'UK', 
       'Singapore', 
       'Australia', 
       'Canada'],
      dtype=object)

Remove Unwanted Whitespace Inside A String

Note in the above ‘cleaned’ data, there is still some whitespace inside String values, such as “Germ any”, and “Nether land”. We also want to remove those to make our country names spell correctly.

We can’t use the strip() method because it removes either leading or trailing characters/whitespace only. To remove whitespace in the middle of a String value, we need to use replace().

In fact, replace() can remove all the whitespace from a String! The syntax looks like this:

replace(old_value, new_value, count)

  • old_value – we pass in the whitespace ‘ ‘
  • new_value – we pass in an empty string ”
  • count – leave blank to replace ALL occurrences
df['clean_2'] = df['country'].str.replace(' ','')

df['clean_2'].values
array(['USA', 'India', 'Germany', 'France', 'Canada', 'Netherland', 'UK',
       'Singapore', 'Australia', 'Canada'], dtype=object)

Additional Resources

How to Filter A Pandas Dataframe By A List of Values

9 Examples on How to Filter Dataframe with Pandas Query() Method

How to Filter Pandas Dataframe Using Boolean Index

How to Filter Pandas Dataframe with OR, AND, NOT

SUMIF and COUNTIF in Pandas

Leave a Reply

Your email address will not be published. Required fields are marked *