Last Updated on March 2, 2022 by Jay
We often need to calculate the number of days between two dates when doing data analysis in Python and pandas.
Python Datetime
When we need to work with data associated with date or time, we usually use the datetime class. It provides many convenient functions to handle both date and time data. Datetime is a Python built-in library, so no installation is required.
import datetime as dt
tdy = dt.datetime.today()
eoy = dt.datetime(2022,12,31)
print(tdy)
print(eoy)
2022-03-02 19:06:22.248996
2022-12-31 00:00:00
We can use simple math operators directly on datetime objects. The result is a timedelta object, which has an attribute called days. Note below we can chain the calculation and calling the attribute.
eoy - tdy
datetime.timedelta(days=303, seconds=17617, microseconds=751004)
(eoy - tdy).days
303
With this basic understanding, now let’s move on to handle data in a pandas dataframe.
Calculate Number of Days Between Dates in Python Pandas
Sample Dataset
We have a simple dataset with just two columns, date has several different dates, and date2 has Dec 01, 2022. We are going to calculate the difference between all dates for the two columns.
import pandas as pd
df = pd.DataFrame({'date':['2021-12-01', '2022-01-01','2022-02-01','2022-03-01','2022-04-01','2022-05-01'],
'date2':['2022-12-01', '2022-12-01','2022-12-01','2022-12-01','2022-12-01','2022-12-01']})
date date2
0 2021-12-01 2022-12-01
1 2022-01-01 2022-12-01
2 2022-02-01 2022-12-01
3 2022-03-01 2022-12-01
4 2022-04-01 2022-12-01
5 2022-05-01 2022-12-01
Note how we created the two columns, the data in both columns are String at the moment.
We need to first convert these String values into datetime data type for easy processing later.
df['date'] = pd.to_datetime(df['date'])
df['date2'] = pd.to_datetime(df['date2'])
Remember we can access the datetime values inside a dataframe column using the dt accessor? We can further add a date attribute after dt to get just the date portion, then use basic math operators to calculate the difference.
df['diff'] = df['date2'].dt.date - df['date'].dt.date
df
date date2 diff
0 2021-12-01 2022-12-01 365 days
1 2022-01-01 2022-12-01 334 days
2 2022-02-01 2022-12-01 303 days
3 2022-03-01 2022-12-01 275 days
4 2022-04-01 2022-12-01 244 days
5 2022-05-01 2022-12-01 214 days
There’s even an easier way to do this, we don’t need to use the dt accessor at all if we are already using datetime dtype columns. The diff column now contains timedelta objects.
Then, to remove the word “days” and keep only the number part of the diff column, we can use the dt.days attribute.
df['diff'] = df['date2'] - df['date']
df
date date2 diff
0 2021-12-01 2022-12-01 365 days
1 2022-01-01 2022-12-01 334 days
2 2022-02-01 2022-12-01 303 days
3 2022-03-01 2022-12-01 275 days
4 2022-04-01 2022-12-01 244 days
5 2022-05-01 2022-12-01 214 days
print(df['diff'].dtype)
timedelta64[ns]
df['diff'].dt.days
0 365
1 334
2 303
3 275
4 244
5 214
Name: diff, dtype: int64
Additional Resources
How to Convert Column to Datetime in Pandas
This was helpful, thanks!