Pandas idxmax method – How To Get First Row In Dataframe Based On Criteria (With Examples)

Sharing is caring!

Last Updated on July 14, 2022 by Jay

The pandas idxmax() method can make some operations really simple. For example, getting the first row in a dataframe based on criteria. This tutorial will walk through how to use the idxmax method.

What Is Pandas idxmax

The idxmax() method returns the index of the first occurrence of maximum values on an axis. The below example helps illustrate it:

We have test scores for 4 students with IDs 0,1,2,3 as represented by the dataframe index.

import pandas as pd
df= pd.DataFrame({'Math':[50,60,80,100],'English':[99,88,77,66], 'CS':[80,60,90,100]})

   Math  English   CS
0    50       99   80
1    60       88   60
2    80       77   90
3   100       66  100

idxmax() will help find the maximum test scores of the dataframe. By default, axis=0:

  • Student 3 has the highest Math test score
  • Student 0 has the highest English test score
  • Also, student 3 has the highest CS test score
Math       3
English    0
CS         3
dtype: int64

We can also set axis=1 to find the subject with the highest score for each student. It gets interesting here:

  • Student 3 has perfect scores (100) for both Math and CS
  • idxmax() returns only Math – the first occurrence!
0    English
1    English
2         CS
3       Math
dtype: object

Get First Row In Dataframe Based On Criteria

Now we know, idxmax returns the index of the first occurrence of the maximum values of a dataframe. We can use this feature to help find the first row in a dataframe based on certain criteria.

Consider the following example with (imaginary) SPY stock price for 6 days. We want to find the first row / date when the price goes above $400.

df = pd.DataFrame({'SPY':[355,390,395,420,380, 430]}, index=pd.date_range('2022-05-07', '2022-05-12'))

2022-05-07  355
2022-05-08  390
2022-05-09  395
2022-05-10  420
2022-05-11  380
2022-05-12  430

Let’s break it down by steps, we first do a “filter” on the prices, and check for whether the price is greater than 400. The result of this operation is a boolean index.

df['SPY'] > 400
2022-05-07    False
2022-05-08    False
2022-05-09    False
2022-05-10     True
2022-05-11    False
2022-05-12     True
Freq: D, Name: SPY, dtype: bool

We know for boolean values, True means 1, False means 0. Basically the above looks like this with just 0s and 1s.

(df['SPY'] > 400)*1
2022-05-07    0
2022-05-08    0
2022-05-09    0
2022-05-10    1
2022-05-11    0
2022-05-12    1
Freq: D, Name: SPY, dtype: int32

Now we can apply idxmax to the above:

  • The value 1 will be the maximum value here
  • The first occurrence of the value 1 is on 2022-05-10
  • idxmax returns that index
(df['SPY'] > 400).idxmax()
Timestamp('2022-05-10 00:00:00', freq='D')

Additional Resources

Filter a pandas dataframe (think Excel filters but more powerful)

How to Filter Dataframe With Pandas Query Method – With Examples

How To Find The Closest Value To An Input In Pandas

Leave a Reply

Your email address will not be published. Required fields are marked *