Last Updated on March 5, 2022 by Jay
We often need to remove whitespace from data using pandas; this tutorial will show you how with several examples.
Take a look at the following dataframe, it’s actually not uncommon to receive data like the below – there could be whitespace either in front, after, or even inside the values.
import pandas as pd
df = pd.DataFrame({
'country': [' USA ','India','Germ any','Fra nce', ' Canada ',
'Nether land','UK','Singapore', 'Australia',
'Canada '],
})
df
country
0 USA
1 India
2 Germ any
3 Fra nce
4 Canada
5 Nether land
6 UK
7 Singapore
8 Australia
9 Canada
df['country'].values
array([' USA ', 'India', 'Germ any', 'Fra nce', ' Canada ',
'Nether land', 'UK', 'Singapore', 'Australia',
'Canada '], dtype=object)
These whitespace are annoying because to a computer, ” Canada ” and “Canada” are not the same thanks to the whitespace!
'Canada' in df['country'].values
False
' Canada ' in df['country'].values
True
Remove Leading and Trailing Whitespace in Pandas
The easiest way to remove leading & trailing whitespace in pandas is to use the str accessor which gives pandas capability to use String functions.
Both leading and trailing whitespace
Note in the below example, we leave the argument blank in the strip() method on purpose. This will remove all whitespace before and after String values.
df['clean'] = df['country'].str.strip()
df
country clean
0 USA USA
1 India India
2 Germ any Germ any
3 Fra nce Fra nce
4 Canada Canada
5 Nether land Nether land
6 UK UK
7 Singapore Singapore
8 Australia Australia
9 Canada Canada
df['clean'].values
array(['USA', 'India', 'Germ any', 'Fra nce', 'Canada', 'Nether land',
'UK', 'Singapore', 'Australia', 'Canada'], dtype=object)
Leading Whitespace Only
In some instances, we might want to remove only the leading whitespace (left of the String values). We can do this by using the lstrip() (left strip) method. Note the trailing whitespace is untouched.
df['left_clean'] = df['country'].str.lstrip()
df['left_clean'].values
array(['USA ',
'India',
'Germ any',
'Fra nce',
'Canada ',
'Nether land',
'UK',
'Singapore',
'Australia',
'Canada '], dtype=object)
Trailing Whitespace Only
We can also remove only the trailing whitespace, and leave the leading whitespace intact. Simply call the rstrip() method:
df['right_clean'] = df['country'].str.rstrip()
df['right_clean'].values
array([' USA',
'India',
'Germ any',
'Fra nce',
' Canada',
'Nether land',
'UK',
'Singapore',
'Australia',
'Canada'],
dtype=object)
Remove Unwanted Whitespace Inside A String
Note in the above ‘cleaned’ data, there is still some whitespace inside String values, such as “Germ any”, and “Nether land”. We also want to remove those to make our country names spell correctly.
We can’t use the strip() method because it removes either leading or trailing characters/whitespace only. To remove whitespace in the middle of a String value, we need to use replace().
In fact, replace() can remove all the whitespace from a String! The syntax looks like this:
replace(old_value, new_value, count)
- old_value – we pass in the whitespace ‘ ‘
- new_value – we pass in an empty string ”
- count – leave blank to replace ALL occurrences
df['clean_2'] = df['country'].str.replace(' ','')
df['clean_2'].values
array(['USA', 'India', 'Germany', 'France', 'Canada', 'Netherland', 'UK',
'Singapore', 'Australia', 'Canada'], dtype=object)
Additional Resources
How to Filter A Pandas Dataframe By A List of Values
9 Examples on How to Filter Dataframe with Pandas Query() Method
How to Filter Pandas Dataframe Using Boolean Index