How To Find The Closest Value To An Input In Pandas

Sharing is caring!

Last Updated on May 8, 2022 by Jay

This tutorial will show you how to find the closest value to a given input in pandas. Sometimes we try to filter a dataframe with a value, and the value doesn’t exist. Then we’ll receive an empty dataframe. Instead, we are interested in finding the closest value to that input in the dataframe.

Here’s a simple dataset we’ll use to demonstrate the technique. We have the SPY stock (imaginary) prices for 5 days.

import pandas as pd

df = pd.DataFrame({'SPY':[355,390,360,380,450]}, index=pd.date_range('2022-05-07', '2022-05-11'))

2022-05-07  355
2022-05-08  390
2022-05-09  360
2022-05-10  380
2022-05-11  450

Find The Closest Value To An Input In Pandas

Let’s say we want to find the row that’s closest to a price of $386.

By eyeballing the dataframe, we notice two values are close to the given input: 390 and 380. Obviously, the final answer is 390 since it’s closer to 386 than 380 is.

In this situation, we can’t use filters like greater than “>” or less than “<” since we don’t know whether the matching value is above or below the given input 386.

The Process

  1. Calculate the differences between each value and the input
  2. Make sure to take the absolute value on the differences to help with ranking, since there could be both positive and negative numbers
  3. Rank the result from step 2 above, the record with the lowest absolute difference is the record that’s closest to the input value

The code below shows results up to step 2:

find_it = 386
(df['SPY'] - find_it).abs()
2022-05-07    31
2022-05-08     4
2022-05-09    26
2022-05-10     6
2022-05-11    64

Next, what we could do (but not going to do) is use sort_values() on the dataframe, then find the first (lowest) entry. But there’s a better way to do this.

pandas argsort() method

Introducing the argsort() pandas method, which returns the integer indices that would sort the values. For example:

(df['SPY'] - find_it).abs().argsort()
2022-05-07    1
2022-05-08    3
2022-05-09    2
2022-05-10    0
2022-05-11    4

This might look a bit confusing at first, especially when you look at the ranking with the dates column. The following graph will help you understand:

  1. On the right-hand side, the original dataframe (or the absolute difference dataframe, since their index is the same) has a number index 0,1,2,3,4.
  2. On the left-hand side, ignore the index/date column. The argsort() returns the number index in an order
  3. If we apply this order to the original dataframe, as the below lines suggest, then we can sort the dataframe
    • Value 4 (2022-05-08) row should go to the first place
    • Value 6 (2022-05-10) row should go to the second place
    • And Value 64 (2022-05-11) row should go to the last place.
find the closest value with pandas argsort()
find the closest value with pandas argsort()

We can then use the .iloc[] property to re-organize the dataframe:

df.iloc[(df['SPY'] - find_it).abs().argsort(),:]
2022-05-08  390
2022-05-10  380
2022-05-09  360
2022-05-07  355
2022-05-11  450

If we just want to get the nearest value instead of the whole sorted dataframe:

df.iloc[(df['SPY'] - find_it).abs().argsort()[0],:]
SPY    390
Name: 2022-05-08 00:00:00, dtype: int64

Additional Resources

Sort Excel data using Python

How To Truncate Dataframe In Pandas

Leave a Reply

Your email address will not be published. Required fields are marked *