Last Updated on May 8, 2022 by Jay

This tutorial will show you how to find the closest value to a given input in pandas. Sometimes we try to filter a dataframe with a value, and the value doesn’t exist. Then we’ll receive an empty dataframe. Instead, we are interested in finding the closest value to that input in the dataframe.

Here’s a simple dataset we’ll use to demonstrate the technique. We have the SPY stock (imaginary) prices for 5 days.

```
import pandas as pd
df = pd.DataFrame({'SPY':[355,390,360,380,450]}, index=pd.date_range('2022-05-07', '2022-05-11'))
df
SPY
2022-05-07 355
2022-05-08 390
2022-05-09 360
2022-05-10 380
2022-05-11 450
```

## Find The Closest Value To An Input In Pandas

Let’s say we want to find the row that’s closest to a price of $386.

By eyeballing the dataframe, we notice two values are close to the given input: 390 and 380. Obviously, the final answer is 390 since it’s closer to 386 than 380 is.

In this situation, we can’t use filters like greater than “>” or less than “<” since we don’t know whether the matching value is above or below the given input 386.

### The Process

- Calculate the differences between each value and the input
- Make sure to take the absolute value on the differences to help with ranking, since there could be both positive and negative numbers
- Rank the result from step 2 above, the record with the lowest absolute difference is the record that’s closest to the input value

The code below shows results up to step 2:

```
find_it = 386
(df['SPY'] - find_it).abs()
2022-05-07 31
2022-05-08 4
2022-05-09 26
2022-05-10 6
2022-05-11 64
```

Next, what we could do (but not going to do) is use **sort_values()** on the dataframe, then find the first (lowest) entry. But there’s a better way to do this.

### pandas argsort() method

Introducing the **argsort()** pandas method, which returns the **integer indices** that would sort the values. For example:

```
(df['SPY'] - find_it).abs().argsort()
2022-05-07 1
2022-05-08 3
2022-05-09 2
2022-05-10 0
2022-05-11 4
```

This might look a bit confusing at first, especially when you look at the ranking with the dates column. The following graph will help you understand:

- On the right-hand side, the original dataframe (or the absolute difference dataframe, since their index is the same) has a number index 0,1,2,3,4.
- On the left-hand side,
**ignore the index/date**column. The**argsort()**returns the number index**in an order** - If we apply this order to the original dataframe, as the below lines suggest, then we can sort the dataframe
- Value 4 (2022-05-08) row should go to the first place
- Value 6 (2022-05-10) row should go to the second place
- …
- And Value 64 (2022-05-11) row should go to the last place.

We can then use the .iloc[] property to re-organize the dataframe:

```
df.iloc[(df['SPY'] - find_it).abs().argsort(),:]
SPY
2022-05-08 390
2022-05-10 380
2022-05-09 360
2022-05-07 355
2022-05-11 450
```

If we just want to get the nearest value instead of the whole sorted dataframe:

```
df.iloc[(df['SPY'] - find_it).abs().argsort()[0],:]
SPY 390
Name: 2022-05-08 00:00:00, dtype: int64
```