How to Filter Dataframe With Pandas Query Method – With Examples

Sharing is caring!

Last Updated on March 4, 2022 by Jay

One way to filter a dataframe is by using the pandas query() method and let’s take a look at it with 9 examples.

Sample Dataset

import numpy as np
import pandas as pd

df = pd.DataFrame({'date':pd.date_range(start='2021-12-01', periods=10, freq='MS'),
                   'country':['USA','India','Germany','France','Canada','Netherland',
                              'UK','Singapore', 'Australia', 'Canada'],
                   'a': np.random.randint(10, size=10),
                   'b': np.random.randint(10, size=10)})

df
        date     country  a  b
0 2021-12-01         USA  8  7
1 2022-01-01       India  7  7
2 2022-02-01     Germany  7  4
3 2022-03-01      France  2  6
4 2022-04-01      Canada  3  9
5 2022-05-01  Netherland  3  4
6 2022-06-01          UK  2  7
7 2022-07-01   Singapore  3  9
8 2022-08-01   Australia  7  6
9 2022-09-01      Canada  1  4

Syntax

The high-level syntax is straightforward: df.query(“expression”). Note the expression is going to be a String and it’s very versatile.

Example 1 – Numeric Comparison

We can use the column names directly inside the String expression, like “a>3” below will give us all rows where values in column a are greater than 3.

Note this is the same as df[df[‘a’] > 3].

df.query('a > 3')
        date    country  a  b
0 2021-12-01        USA  8  7
1 2022-01-01      India  7  7
2 2022-02-01    Germany  7  4
8 2022-08-01  Australia  7  6

Example 2 – Matching String Values

To use String values inside the expression, we just need to use a different quote. If your outer quotation mark is a double “” quote, then use single quote ” for the String values and vice versa.

df.query("country == 'Canada' or b == 6")
        date    country  a  b
3 2022-03-01     France  2  6
4 2022-04-01     Canada  3  9
8 2022-08-01  Australia  7  6
9 2022-09-01     Canada  1  4

#alternative
df.query('country == "Canada" or b == 6')

Example 3 – Include Built-in Functions

Some Python built-in functions also work in the String expression we pass into the pandas query() method see below examples. All countries other than USA and Canada are returned:

df.query("not (country in ['USA', 'Canada'])")
        date     country  a  b
1 2022-01-01       India  7  7
2 2022-02-01     Germany  7  4
3 2022-03-01      France  2  6
5 2022-05-01  Netherland  3  4
6 2022-06-01          UK  2  7
7 2022-07-01   Singapore  3  9
8 2022-08-01   Australia  7  6

Example 4 – Use Calculations In String Expression

The String expression also allows calculations. In the below example we can find all records where column a has an even number.

df.query("a % 2 == 0")
        date country  a  b
0 2021-12-01     USA  8  7
3 2022-03-01  France  2  6
6 2022-06-01      UK  2  7

Example 5 – Compare Columns

We can also compare columns directly. The below example returns all instances where values in a greater than values in b.

df.query("a > b")
        date    country  a  b
0 2021-12-01        USA  8  7
2 2022-02-01    Germany  7  4
8 2022-08-01  Australia  7  6

Example 6 – Working With Dates

The pandas dt accessor still works even inside the String expression!

df.query("date.dt.month >= 8")
        date    country  a  b
0 2021-12-01        USA  8  7
8 2022-08-01  Australia  7  6
9 2022-09-01     Canada  1  4

Example 7 – String Comparison

We can compare String values directly in Python, for example:

'canada' > 'c'
True

The same also applies to the query() method. We can find all rows where country names start with a letter after “M”:

df.query("country > 'M'")

        date     country  a  b
0 2021-12-01         USA  8  7
5 2022-05-01  Netherland  3  4
6 2022-06-01          UK  2  7
7 2022-07-01   Singapore  3  9

Example 8 – Passing Variable Into Query()

Yes, it’s possible to pass variables directly into the query() method. It would suck if we can only pass in hardcoded values. To pass in a variable, simply add the @ symbol in front of the variable name. Let’s create some variables and see:

start_date = '2022-02-01'
end_date = '2022-04-01'
num = 5

df.query("a >= @num")
        date    country  a  b
0 2021-12-01        USA  8  7
1 2022-01-01      India  7  7
2 2022-02-01    Germany  7  4
8 2022-08-01  Australia  7  6

Example 9 – Dates In String Type Also Work!

Thanks to the String comparison we showed earlier, we can compare dates even when they are in String format. This expression ‘2022-03-03’ > ‘2022-03-02’ will return True in Python.

df.query("date >= @start_date and date <= @end_date")
        date  country  a  b
2 2022-02-01  Germany  7  4
3 2022-03-01   France  2  6
4 2022-04-01   Canada  3  9

Hopefully, these 9 examples of using pandas query() method give you more ideas on how to filter a dataframe. Of course, it’s also possible to filter a dataframe by using the boolean index, which works the same as the query() method. It all comes down to your personal preference on how you want to write the code.

Additional Resource

Filter A Pandas Dataframe Using Boolean Index

Filter A Pandas Dataframe With Conditions

Replicate SUMIF and COUNTIF in Pandas

Leave a Reply

Your email address will not be published. Required fields are marked *