Last Updated on July 14, 2022 by Jay

Sometimes we want to calculate the difference between rows in a pandas dataframe, instead of looping through the rows, we can use the **dataframe.diff()** method.

For Excel users, it’s tempting to use a loop to calculate the difference between rows, because that’s how it’s done in Excel. However, **pandas **provides a much simpler solution.

We’ll use the below sample dataframe for the illustration. Imagine we have the prices for two stocks: SPY and TSLA.

```
import pandas
df = pd.DataFrame({'SPY':[400,405,420,450,500],
'TSLA':[200,400,800,700,1000]},
index=pd.date_range("2022-04-20", "2022-04-25"))
df
SPY TSLA
2022-04-18 400 200
2022-04-19 405 400
2022-04-20 420 800
2022-04-21 450 700
2022-04-22 500 1000
```

## Pandas diff() Syntax

```
DataFrame.diff(periods = 1,
axis = 0)
```

## Calculate The Difference Between Rows In Pandas Dataframe

We can calculate the daily difference of the stocks without looping through the rows. The argument **periods **controls the periods to shift for calculating the differences between rows. The default value is 1.

The below example calculates daily differences in stock prices. The first row is NaN because there are no previous values to calculate. Starting from the second row, it basically takes values from the second row of the original dataframe, and subtracts the values from the first row of the original dataframe. E.g. 405-400 =5, and 400-200=200.

```
df.diff()
SPY TSLA
2022-04-18 NaN NaN
2022-04-19 5.0 200.0
2022-04-20 15.0 400.0
2022-04-21 30.0 -100.0
2022-04-22 50.0 300.0
```

For the same reasoning, we can calculate the difference between rows backward by setting the **periods **to a negative number. This is quite convenient as we don’t have to reverse the order of the data.

```
df.diff(periods = -1)
SPY TSLA
2022-04-18 -5.0 -200.0
2022-04-19 -15.0 -400.0
2022-04-20 -30.0 100.0
2022-04-21 -50.0 -300.0
2022-04-22 NaN NaN
```

We can also calculate the difference between non-consecutive rows by setting the periods to a number other than 1.

```
df.diff(periods = 2)
SPY TSLA
2022-04-18 NaN NaN
2022-04-19 NaN NaN
2022-04-20 20.0 600.0
2022-04-21 45.0 300.0
2022-04-22 80.0 200.0
```

To help visualize the above example, we can first **shift()** the columns down by 2 rows, and then perform a subtraction. The below code replicates the **diff(periods=2)** above.

```
df[['SPY_2','TSLA_2']] = df[['SPY','TSLA']].shift(2)
SPY TSLA SPY_2 TSLA_2
2022-04-18 400 200 NaN NaN
2022-04-19 405 400 NaN NaN
2022-04-20 420 800 400.0 200.0
2022-04-21 450 700 405.0 400.0
2022-04-22 500 1000 420.0 800.0
df['SPY_diff'] = df['SPY'] - df['SPY_2']
df['TSLA_diff'] = df['TSLA'] - df['TSLA_2']
SPY TSLA SPY_2 TSLA_2 SPY_diff TSLA_diff
2022-04-18 400 200 NaN NaN NaN NaN
2022-04-19 405 400 NaN NaN NaN NaN
2022-04-20 420 800 400.0 200.0 20.0 600.0
2022-04-21 450 700 405.0 400.0 45.0 300.0
2022-04-22 500 1000 420.0 800.0 80.0 200.0
```

## Calculate The Difference Between Columns

It’s also possible to calculate the difference between columns of a dataframe by setting the **axis **argument to 1 (or ‘columns’). The **axis **argument in pandas usually has a default value of 0 (i.e. rows).

```
df.diff(periods =1, axis = 1)
SPY TSLA
2022-04-18 NaN -200
2022-04-19 NaN -5
2022-04-20 NaN 380
2022-04-21 NaN 250
2022-04-22 NaN 500
```