How to Filter A Pandas Dataframe By A List of Values

Sharing is caring!

Last Updated on March 4, 2022 by Jay

When doing queries we often need to filter a pandas dataframe by a list of values instead of a single value.

Sample Dataset

import numpy as np
import pandas as pd

df = pd.DataFrame({'date':pd.date_range(start='2021-12-01', periods=10, freq='MS'),
                   'country':['USA','India','Germany','France','Canada','Netherland',
                              'UK','Singapore', 'Australia', 'Canada'],
                   'a': np.random.randint(10, size=10),
                   'b': np.random.randint(10, size=10)})

df
        date     country  a  b
0 2021-12-01         USA  8  7
1 2022-01-01       India  7  7
2 2022-02-01     Germany  7  4
3 2022-03-01      France  2  6
4 2022-04-01      Canada  3  9
5 2022-05-01  Netherland  3  4
6 2022-06-01          UK  2  7
7 2022-07-01   Singapore  3  9
8 2022-08-01   Australia  7  6
9 2022-09-01      Canada  1  4

Filter By Using A Boolean Index

A boolean index is essentially a list of True and False values. This method gives the most flexibility and control.

Let’s filter data to have records with country = Canada or USA, note we need to use the bitwise OR — “|” between the two criteria.

To help understand what a boolean index is, see the below example, which returns a list/Series of True or False values.

(df['country'] == 'Canada') | (df['country'] == 'USA')
0     True
1    False
2    False
3    False
4     True
5    False
6    False
7    False
8    False
9     True
Name: country, dtype: bool

Then, we pass this Series of True and False values into the pandas dataframe to filter:

df[(df['country'] == 'Canada') | (df['country'] == 'USA')]
        date country  a  b
0 2021-12-01     USA  8  7
4 2022-04-01  Canada  3  9
9 2022-09-01  Canada  1  4

However, when we have a long list, the above method becomes awkward to use – we need to write the criteria many times to include all items in the list. This is where the isin() method becomes handy.

Filter By Using Pandas isin() Method On A List

In Python we can check if an item is in a list by using the in keyword:

'Canada' in ['Canada', 'USA', 'India']
True

However, this doesn’t work in pandas. Fortunately, there’s the isin() method. We just need to pass in the list of values we want to filter by:

df[df['country'].isin(['Canada', 'USA', 'India'])]
        date country  a  b
0 2021-12-01     USA  8  7
1 2022-01-01   India  7  7
4 2022-04-01  Canada  3  9
9 2022-09-01  Canada  1  4

The isin() method works on numbers too:

df[df['a'].isin([2,3,4,5])]

        date     country  a  b
3 2022-03-01      France  2  6
4 2022-04-01      Canada  3  9
5 2022-05-01  Netherland  3  4
6 2022-06-01          UK  2  7
7 2022-07-01   Singapore  3  9

Filter By Using Pandas query() Method

The pandas query() method takes a String expression as the filter criteria. For example in the case of a single value:

df.query("country == 'Canada'")
        date country  a  b
4 2022-04-01  Canada  3  9
9 2022-09-01  Canada  1  4

For multiple values, we can either the normal logical operators or the bitwise operators. The following examples return the same result:

df.query("country == 'Canada' | country == 'USA'")
df.query("country == 'Canada' or country == 'USA'")
        date country  a  b
0 2021-12-01     USA  8  7
4 2022-04-01  Canada  3  9
9 2022-09-01  Canada  1  4

Finally, we can’t use the isin as a part of the String expression. We have to use the keyword in:

df.query("country in ['Canada','USA','India']")
        date country  a  b
0 2021-12-01     USA  8  7
1 2022-01-01   India  7  7
4 2022-04-01  Canada  3  9
9 2022-09-01  Canada  1  4


df.query("a in [2,3,4,5]")
        date     country  a  b
3 2022-03-01      France  2  6
4 2022-04-01      Canada  3  9
5 2022-05-01  Netherland  3  4
6 2022-06-01          UK  2  7
7 2022-07-01   Singapore  3  9

Additional Resources

9 Examples on How to Filter Dataframe with Pandas Query() Method

How to Filter Pandas Dataframe Using Boolean Index

How to Filter Pandas Dataframe with OR, AND, NOT

SUMIF and COUNTIF in Pandas

Leave a Reply

Your email address will not be published. Required fields are marked *