Last Updated on March 4, 2022 by Jay
When doing queries we often need to filter a pandas dataframe by a list of values instead of a single value.
Sample Dataset
import numpy as np
import pandas as pd
df = pd.DataFrame({'date':pd.date_range(start='2021-12-01', periods=10, freq='MS'),
                   'country':['USA','India','Germany','France','Canada','Netherland',
                              'UK','Singapore', 'Australia', 'Canada'],
                   'a': np.random.randint(10, size=10),
                   'b': np.random.randint(10, size=10)})
df
        date     country  a  b
0 2021-12-01         USA  8  7
1 2022-01-01       India  7  7
2 2022-02-01     Germany  7  4
3 2022-03-01      France  2  6
4 2022-04-01      Canada  3  9
5 2022-05-01  Netherland  3  4
6 2022-06-01          UK  2  7
7 2022-07-01   Singapore  3  9
8 2022-08-01   Australia  7  6
9 2022-09-01      Canada  1  4Filter By Using A Boolean Index
A boolean index is essentially a list of True and False values. This method gives the most flexibility and control.
Let’s filter data to have records with country = Canada or USA, note we need to use the bitwise OR — “|” between the two criteria.
To help understand what a boolean index is, see the below example, which returns a list/Series of True or False values.
(df['country'] == 'Canada') | (df['country'] == 'USA')
0     True
1    False
2    False
3    False
4     True
5    False
6    False
7    False
8    False
9     True
Name: country, dtype: boolThen, we pass this Series of True and False values into the pandas dataframe to filter:
df[(df['country'] == 'Canada') | (df['country'] == 'USA')]
        date country  a  b
0 2021-12-01     USA  8  7
4 2022-04-01  Canada  3  9
9 2022-09-01  Canada  1  4However, when we have a long list, the above method becomes awkward to use – we need to write the criteria many times to include all items in the list. This is where the isin() method becomes handy.
Filter By Using Pandas isin() Method On A List
In Python we can check if an item is in a list by using the in keyword:
'Canada' in ['Canada', 'USA', 'India']
TrueHowever, this doesn’t work in pandas. Fortunately, there’s the isin() method. We just need to pass in the list of values we want to filter by:
df[df['country'].isin(['Canada', 'USA', 'India'])]
        date country  a  b
0 2021-12-01     USA  8  7
1 2022-01-01   India  7  7
4 2022-04-01  Canada  3  9
9 2022-09-01  Canada  1  4The isin() method works on numbers too:
df[df['a'].isin([2,3,4,5])]
        date     country  a  b
3 2022-03-01      France  2  6
4 2022-04-01      Canada  3  9
5 2022-05-01  Netherland  3  4
6 2022-06-01          UK  2  7
7 2022-07-01   Singapore  3  9Filter By Using Pandas query() Method
The pandas query() method takes a String expression as the filter criteria. For example in the case of a single value:
df.query("country == 'Canada'")
        date country  a  b
4 2022-04-01  Canada  3  9
9 2022-09-01  Canada  1  4For multiple values, we can either the normal logical operators or the bitwise operators. The following examples return the same result:
df.query("country == 'Canada' | country == 'USA'")
df.query("country == 'Canada' or country == 'USA'")
        date country  a  b
0 2021-12-01     USA  8  7
4 2022-04-01  Canada  3  9
9 2022-09-01  Canada  1  4Finally, we can’t use the isin as a part of the String expression. We have to use the keyword in:
df.query("country in ['Canada','USA','India']")
        date country  a  b
0 2021-12-01     USA  8  7
1 2022-01-01   India  7  7
4 2022-04-01  Canada  3  9
9 2022-09-01  Canada  1  4
df.query("a in [2,3,4,5]")
        date     country  a  b
3 2022-03-01      France  2  6
4 2022-04-01      Canada  3  9
5 2022-05-01  Netherland  3  4
6 2022-06-01          UK  2  7
7 2022-07-01   Singapore  3  9
Additional Resources
9 Examples on How to Filter Dataframe with Pandas Query() Method
How to Filter Pandas Dataframe Using Boolean Index