Last Updated on July 14, 2022 by Jay
Sometimes we want to truncate a dataframe to remove excess data. We can do this by calling the pandas truncate() method.
Pandas truncate() Syntax
DataFrame.truncate(before=None,
                   after=None,
                   axis=None,
                   copy=True)
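The before and after arguments are index labels: truncate() keeps the rows whose index values fall between before and after (both bounds inclusive) and removes the rest. The axis argument selects whether rows (the default) or columns are truncated, and copy controls whether a copy of the truncated section is returned.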
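Because before and after are compared against the index rather than matched exactly, they do not have to be existing labels. A minimal sketch, assuming a small hypothetical dataframe df_gaps whose integer index has gaps:

```python
import pandas as pd

# Hypothetical example data: an integer index with gaps (0, 5, 10, 15, 20)
df_gaps = pd.DataFrame({'a': [1, 2, 3, 4, 5]}, index=[0, 5, 10, 15, 20])

# 3 and 12 are not index labels; truncate() keeps every row whose
# label falls in the closed interval [3, 12], i.e. labels 5 and 10
result = df_gaps.truncate(before=3, after=12)
print(result)
```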
Example
For example, take the simple dataframe below. We want to remove the top 2 rows and the bottom 3 rows.
- before=2 removes rows with index values less than 2, i.e. 0 and 1
- after=6 removes rows with index values greater than 6, i.e. 7, 8 and 9
import pandas as pd
df = pd.DataFrame({'a':range(10,20), 'b':range(20,30)})
a b
0 10 20
1 11 21
2 12 22
3 13 23
4 14 24
5 15 25
6 16 26
7 17 27
8 18 28
9 19 29
df.truncate(before=2, after=6)
a b
2 12 22
3 13 23
4 14 24
5 15 25
6 16 26
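Both bounds are optional, and each one is inclusive. The sketch below recreates the dataframe from this example so it runs on its own, truncates from one side at a time, and checks that with the default RangeIndex the two-sided call matches the label slice df.loc[2:6]:

```python
import pandas as pd

df = pd.DataFrame({'a': range(10, 20), 'b': range(20, 30)})

# Lower bound only: keep index labels 7, 8 and 9 (7 itself is included)
tail = df.truncate(before=7)

# Upper bound only: keep index labels 0, 1 and 2 (2 itself is included)
head = df.truncate(after=2)

# With the default RangeIndex, labels and positions coincide, so the
# two-sided call matches a label slice with loc
same_as_loc = df.truncate(before=2, after=6).equals(df.loc[2:6])
print(same_as_loc)
```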
Truncate Dataframe With Time-series Data In Pandas
Since truncate() works on the index, it's very convenient for time-series data. In the example below, let's drop all rows after 2022-04-25.
df = pd.DataFrame({'a': range(10, 20), 'b': range(20, 30), 'c': range(30, 40)},
                  index=pd.date_range('2022-04-21', '2022-04-30'))
a b c
2022-04-21 10 20 30
2022-04-22 11 21 31
2022-04-23 12 22 32
2022-04-24 13 23 33
2022-04-25 14 24 34
2022-04-26 15 25 35
2022-04-27 16 26 36
2022-04-28 17 27 37
2022-04-29 18 28 38
2022-04-30 19 29 39
df.truncate(after='2022-04-25')
a b c
2022-04-21 10 20 30
2022-04-22 11 21 31
2022-04-23 12 22 32
2022-04-24 13 23 33
2022-04-25 14 24 34
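before and after can also be combined to keep a window of dates, and pd.Timestamp objects work as bounds just like date strings. A minimal sketch using the same time-series data (rebuilt here with ten dates starting on 2022-04-21 so it runs on its own); it also checks that a window whose start is later than its end is rejected with a ValueError:

```python
import pandas as pd

df = pd.DataFrame({'a': range(10, 20), 'b': range(20, 30), 'c': range(30, 40)},
                  index=pd.date_range('2022-04-21', '2022-04-30'))

# Keep 2022-04-23 through 2022-04-26; both endpoints are included
window = df.truncate(before='2022-04-23', after='2022-04-26')

# pd.Timestamp objects work as bounds just like date strings
same_window = df.truncate(before=pd.Timestamp('2022-04-23'),
                          after=pd.Timestamp('2022-04-26'))

# A window whose start is later than its end is rejected
try:
    df.truncate(before='2022-04-26', after='2022-04-23')
    reversed_rejected = False
except ValueError:
    reversed_rejected = True

print(window)
```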
Truncate Dataframe Columns
We can also remove excess columns by setting the argument axis=1. In that case, before and after refer to column labels:
df.truncate(before='b', axis=1)
b c
2022-04-21 20 30
2022-04-22 21 31
2022-04-23 22 32
2022-04-24 23 33
2022-04-25 24 34
2022-04-26 25 35
2022-04-27 26 36
2022-04-28 27 37
2022-04-29 28 38
2022-04-30 29 39
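Both bounds work on columns too, and axis='columns' is an equivalent spelling of axis=1. A short sketch on the same data (rebuilt so it runs on its own), keeping columns 'a' through 'b':

```python
import pandas as pd

df = pd.DataFrame({'a': range(10, 20), 'b': range(20, 30), 'c': range(30, 40)},
                  index=pd.date_range('2022-04-21', '2022-04-30'))

# With axis=1, before/after are column labels: keep columns 'a' through 'b'
ab = df.truncate(before='a', after='b', axis=1)

# axis='columns' selects the same axis by name
ab_named = df.truncate(before='a', after='b', axis='columns')

print(list(ab.columns))
```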
A Sorted Index Is Required
There's one caveat when using truncate(): the dataframe index must be sorted. Let's shuffle the index and then apply truncate() again. The truncate() call below raises ValueError: truncate requires a sorted index
df2 = pd.concat([df.iloc[-5:, :], df.iloc[:5, :]])
a b c
2022-04-26 15 25 35
2022-04-27 16 26 36
2022-04-28 17 27 37
2022-04-29 18 28 38
2022-04-30 19 29 39
2022-04-21 10 20 30
2022-04-22 11 21 31
2022-04-23 12 22 32
2022-04-24 13 23 33
2022-04-25 14 24 34
df2.truncate(after='2022-04-28')
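One way around the error is to sort the index with sort_index() before truncating. A sketch that rebuilds df2 from the same data, confirms the error, and then applies the fix:

```python
import pandas as pd

df = pd.DataFrame({'a': range(10, 20), 'b': range(20, 30), 'c': range(30, 40)},
                  index=pd.date_range('2022-04-21', '2022-04-30'))
df2 = pd.concat([df.iloc[-5:, :], df.iloc[:5, :]])

# truncate() refuses an index that is not sorted
try:
    df2.truncate(after='2022-04-28')
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)

# Sorting the index first restores the order truncate() expects
fixed = df2.sort_index().truncate(after='2022-04-28')
print(fixed)
```

Note that sort_index() reorders the rows, so the result comes back in date order rather than in df2's original order.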
Truncate vs loc/iloc
The loc and iloc indexers can produce results similar to truncate(), as the example below shows:
df.loc[df.index<='2022-04-25']
a b c
2022-04-21 10 20 30
2022-04-22 11 21 31
2022-04-23 12 22 32
2022-04-24 13 23 33
2022-04-25 14 24 34
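On a sorted index, a plain label slice with loc also gives the same rows as truncate(). The sketch below rebuilds the sorted time-series df so it runs on its own and compares the three approaches; all three should select the five rows up to and including 2022-04-25:

```python
import pandas as pd

df = pd.DataFrame({'a': range(10, 20), 'b': range(20, 30), 'c': range(30, 40)},
                  index=pd.date_range('2022-04-21', '2022-04-30'))

# Label slice: the end label '2022-04-25' is included
by_slice = df.loc[:'2022-04-25']

# truncate() with only an upper bound
by_truncate = df.truncate(after='2022-04-25')

# Boolean mask, as in the example above
by_mask = df.loc[df.index <= '2022-04-25']

print(by_slice.equals(by_truncate), by_mask.equals(by_truncate))
```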
However, note that filtering with loc/iloc and a boolean mask also works on unsorted dataframes, whereas truncate() requires a sorted index. This makes loc and iloc more robust in some cases, and the rows keep their original order:
df2.loc[df2.index<='2022-04-28']
a b c
2022-04-26 15 25 35
2022-04-27 16 26 36
2022-04-28 17 27 37
2022-04-21 10 20 30
2022-04-22 11 21 31
2022-04-23 12 22 32
2022-04-24 13 23 33
2022-04-25 14 24 34
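To mimic both bounds of truncate() on an unsorted dataframe, the two comparisons can be combined into one boolean mask. truncate_unsorted below is a hypothetical helper written for illustration, not a pandas function; like the loc example above, it keeps the original row order instead of sorting:

```python
import numpy as np
import pandas as pd


def truncate_unsorted(frame, before=None, after=None):
    """Hypothetical helper (not part of pandas): keep rows whose index
    label lies in [before, after] without requiring a sorted index."""
    # Start with every row selected
    mask = np.ones(len(frame), dtype=bool)
    if before is not None:
        mask &= frame.index >= before  # drop rows before the lower bound
    if after is not None:
        mask &= frame.index <= after   # drop rows after the upper bound
    # Boolean selection keeps the original (unsorted) row order
    return frame.loc[mask]


df = pd.DataFrame({'a': range(10, 20), 'b': range(20, 30), 'c': range(30, 40)},
                  index=pd.date_range('2022-04-21', '2022-04-30'))
df2 = pd.concat([df.iloc[-5:, :], df.iloc[:5, :]])

result = truncate_unsorted(df2, before='2022-04-23', after='2022-04-28')
print(result)
```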
Additional Resources
How to Filter Dataframe With Pandas Query Method – With Examples