How to Filter Pandas Dataframe by Date

Sharing is caring!

Last Updated on March 1, 2022 by Jay

There are a few ways to filter a pandas dataframe by date. Some methods are easier while others are more flexible. Let’s take a look at them.

Sample Dataset

We’ll use a hypothetical stock price dataframe that contains the date, and stock open, high, low, close prices for individual dates.

df = pd.DataFrame({'date':['2021-12-01', '2022-01-01','2022-02-01','2022-03-01','2022-04-01','2022-05-01'],
                   'open':[100, 101, 102, 103, 104, 105],
                   'high':[1000, 1001, 1002, 1003, 1004, 1005],
                   'low':[0, 1,2,3,4,5],
                   'close':[200, 201,202,203,204,205]})


	      date	open	high	low	close
0	2021-12-01	100	1000	0	200
1	2022-01-01	101	1001	1	201
2	2022-02-01	102	1002	2	202
3	2022-03-01	103	1003	3	203
4	2022-04-01	104	1004	4	204
5	2022-05-01	105	1005	5	205

Note we used text values when creating the date column. Therefore it’s a String data type, we can check it to confirm. A dtype of ‘O’ means Object or simply String value.

df['date'].dtype
dtype('O')

Method 1: Filter dataframe by date string value

I find this method funny while convenient. You can use logical comparison (greater than, less than, etc) with string values. The below filters the dataframe by selecting dates after ‘2022-03-01’.

# Days after (not including) 20222-03-01 
df[df['date'] > '2022-03-01']

	      date	open	high	low	close
4	2022-04-01	104	1004	4	204
5	2022-05-01	105	1005	5	205

We can also filter the dataframe by providing a date range using the bitwise and & operator. Note that we have to use parathesis around the two conditions, or it will throw an error. The below filters the dataframe by selecting dates between 2022-02-01 and 2022-04-01 inclusive.

# Days between 2022-02-01 and 2022-04-01 inclusive
df[(df['date'] >= '2022-02-01') & (df['date'] <= '2022-04-01')]


              date	open	high	low	close
2	2022-02-01	102	1002	2	202
3	2022-03-01	103	1003	3	203
4	2022-04-01	104	1004	4	204

Using partial dates

Warning – using partial dates for comparison is somewhat confusing. I do not recommend this approach until you know the caveats.

To filter all dates after the year 2021, we can do so by just passing in the string value ‘2022’ into the comparison. Note, > actually means >= greater than and equal to in this case… Therefore, only the first record which is in 2021 got dropped.

df[df['date'] > '2022']
              date	open	high	low	close
1	2022-01-01	101	1001	1	201
2	2022-02-01	102	1002	2	202
3	2022-03-01	103	1003	3	203
4	2022-04-01	104	1004	4	204
5	2022-05-01	105	1005	5	205


df[df['date'] >= '2022']
	      date	open	high	low	close
1	2022-01-01	101	1001	1	201
2	2022-02-01	102	1002	2	202
3	2022-03-01	103	1003	3	203
4	2022-04-01	104	1004	4	204
5	2022-05-01	105	1005	5	205

We can take this further by adding the month into the string value. However, this also can cause confusion sometimes. Note the subtle difference in the below code and results.

I would say the first one is correct, because 2022-02-01 is not > 2022-02-01 technically, so it shouldn’t be included in the selection.

#correct selection
df[df['date'] > '2022-02-01']
	      date	open	high	low	close
3	2022-03-01	103	1003	3	203
4	2022-04-01	104	1004	4	204
5	2022-05-01	105	1005	5	205


#incorrect selection
df[df['date'] > '2022-02']
	      date	open	high	low	close
2	2022-02-01	102	1002	2	202
3	2022-03-01	103	1003	3	203
4	2022-04-01	104	1004	4	204
5	2022-05-01	105	1005	5	205

Method 2: Filter dataframe by datetime

While filtering dataframe by “string” date value is convenient, it lacks flexibility in many cases. To gain full control of the date data, we should consider converting the column into a datetime date type first. We are going to overwrite the original “string” date with the datetime values by assigning the new data back to the date column.

df['date'] = pd.to_datetime(df['date'])

print(df['date'].dtype)
datetime64[ns]

Now our date is a datetime, and we can access each of the year, month, day easily by using the .dt accessor. For example, we can access the month of each date using dt.month. This way, all dates with month > 2 (Feb) are selected.

Similarly, we can access the year and days by calling dt.year and dt.day, respectively.

In addition to filtering with .dt accessors, we can still use string values. They work the same as before:

df[(df['date'] >= '2022-02-01') & (df['date'] <= '2022-04-01')]

	      date	open	high	low	close
2	2022-02-01	102	1002	2	202
3	2022-03-01	103	1003	3	203
4	2022-04-01	104	1004	4	204

Additional Resources

How to Convert Column to Datetime in Pandas

How to Fix ValueError: [day,month,year] is missing

One comment

Leave a Reply

Your email address will not be published. Required fields are marked *