How to Calculate Days Between Dates in Python

Sharing is caring!

Last Updated on March 2, 2022 by Jay

We often need to calculate the number of days between two dates when doing data analysis in Python and pandas.

Python Datetime

When we need to work with data associated with date or time, we usually use the datetime class. It provides many convenient functions to handle both date and time data. Datetime is a Python built-in library, so no installation is required.

import datetime as dt

tdy = dt.datetime.today()
eoy = dt.datetime(2022,12,31)

print(tdy)
print(eoy)

2022-03-02 19:06:22.248996
2022-12-31 00:00:00

We can use simple math operators directly on datetime objects. The result is a timedelta object, which has an attribute called days. Note below we can chain the calculation and calling the attribute.

eoy - tdy
datetime.timedelta(days=303, seconds=17617, microseconds=751004)

(eoy - tdy).days
303

With this basic understanding, now let’s move on to handle data in a pandas dataframe.

Calculate Number of Days Between Dates in Python Pandas

Sample Dataset

We have a simple dataset with just two columns, date has several different dates, and date2 has Dec 01, 2022. We are going to calculate the difference between all dates for the two columns.

import pandas as pd

df = pd.DataFrame({'date':['2021-12-01', '2022-01-01','2022-02-01','2022-03-01','2022-04-01','2022-05-01'],
                   'date2':['2022-12-01', '2022-12-01','2022-12-01','2022-12-01','2022-12-01','2022-12-01']})

	      date	     date2
0	2021-12-01	2022-12-01
1	2022-01-01	2022-12-01
2	2022-02-01	2022-12-01
3	2022-03-01	2022-12-01
4	2022-04-01	2022-12-01
5	2022-05-01	2022-12-01

Note how we created the two columns, the data in both columns are String at the moment.

We need to first convert these String values into datetime data type for easy processing later.

df['date'] = pd.to_datetime(df['date'])
df['date2'] = pd.to_datetime(df['date2'])

Remember we can access the datetime values inside a dataframe column using the dt accessor? We can further add a date attribute after dt to get just the date portion, then use basic math operators to calculate the difference.

df['diff'] = df['date2'].dt.date - df['date'].dt.date
df

	      date           date2	    diff
0	2021-12-01	2022-12-01	365 days
1	2022-01-01	2022-12-01	334 days
2	2022-02-01	2022-12-01	303 days
3	2022-03-01	2022-12-01	275 days
4	2022-04-01	2022-12-01	244 days
5	2022-05-01	2022-12-01	214 days

There’s even an easier way to do this, we don’t need to use the dt accessor at all if we are already using datetime dtype columns. The diff column now contains timedelta objects.

Then, to remove the word “days” and keep only the number part of the diff column, we can use the dt.days attribute.

df['diff'] = df['date2'] - df['date']
df

	      date           date2	    diff
0	2021-12-01	2022-12-01	365 days
1	2022-01-01	2022-12-01	334 days
2	2022-02-01	2022-12-01	303 days
3	2022-03-01	2022-12-01	275 days
4	2022-04-01	2022-12-01	244 days
5	2022-05-01	2022-12-01	214 days

print(df['diff'].dtype)
timedelta64[ns]

df['diff'].dt.days
0    365
1    334
2    303
3    275
4    244
5    214
Name: diff, dtype: int64

Additional Resources

How to Convert Column to Datetime in Pandas

How to Convert Datetime to String in Pandas

Python Datetime Format

One comment

Leave a Reply

Your email address will not be published. Required fields are marked *