How To Truncate Dataframe In Pandas

Last Updated on July 14, 2022 by Jay

Sometimes we want to truncate a dataframe to remove excess data. We can do this by calling the pandas truncate() method.

Pandas truncate() Syntax

DataFrame.truncate(before=None,
                   after=None,
                   axis=None,
                   copy=True)

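The before and after arguments control which rows to remove from the dataframe based on index values: rows with an index value less than before or greater than after are dropped, and the rows at both boundary values are kept.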

Example

Suppose we have the simple dataframe below and want to remove the top 2 rows and the bottom 3 rows.

  • before=2 removes rows with index values before 2, i.e. 0 and 1
  • after=6 removes rows with index values after 6, i.e. 7, 8 and 9

import pandas as pd
df = pd.DataFrame({'a':range(10,20), 'b':range(20,30)})
    a   b
0  10  20
1  11  21
2  12  22
3  13  23
4  14  24
5  15  25
6  16  26
7  17  27
8  18  28
9  19  29

df.truncate(before=2, after=6)
    a   b
2  12  22
3  13  23
4  14  24
5  15  25
6  16  26
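As a minimal sketch of the behaviour described above, the snippet below applies each bound on its own and then both together. The variable names (head_trimmed, tail_trimmed, both) are only illustrative. It also shows that truncate() returns a new dataframe and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({'a': range(10, 20), 'b': range(20, 30)})

# Only `before`: drop rows whose index value is less than 2
head_trimmed = df.truncate(before=2)

# Only `after`: drop rows whose index value is greater than 6
tail_trimmed = df.truncate(after=6)

# Both bounds are inclusive, so index values 2 and 6 are kept
both = df.truncate(before=2, after=6)
print(both.index.tolist())  # [2, 3, 4, 5, 6]

# The original dataframe still has all 10 rows
print(len(df))  # 10
```

Omitting one of the two arguments simply leaves that end of the dataframe alone.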

Truncate Dataframe With Time-series Data In Pandas

Because truncate() works on the index, it is very convenient for time-series data. In the example below, we drop all rows after 2022-04-25.

df = pd.DataFrame({'a':range(10,20), 'b':range(20,30),'c':range(30,40)},
                   index=pd.date_range('2022-04-21','2022-04-30'))
             a   b   c
2022-04-21  10  20  30
2022-04-22  11  21  31
2022-04-23  12  22  32
2022-04-24  13  23  33
2022-04-25  14  24  34
2022-04-26  15  25  35
2022-04-27  16  26  36
2022-04-28  17  27  37
2022-04-29  18  28  38
2022-04-30  19  29  39

df.truncate(after='2022-04-25')
             a   b   c
2022-04-21  10  20  30
2022-04-22  11  21  31
2022-04-23  12  22  32
2022-04-24  13  23  33
2022-04-25  14  24  34
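The same idea extends to cutting a date window out of the middle of a time series. The sketch below rebuilds the example dataframe and passes pd.Timestamp objects for both bounds. Using Timestamp values rather than strings is a choice made here for explicitness, and the name window is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'a': range(10, 20), 'b': range(20, 30), 'c': range(30, 40)},
    index=pd.date_range('2022-04-21', '2022-04-30'),
)

# Keep only the rows from 2022-04-23 through 2022-04-25 (both inclusive)
window = df.truncate(
    before=pd.Timestamp('2022-04-23'),
    after=pd.Timestamp('2022-04-25'),
)
print(window['a'].tolist())  # [12, 13, 14]
```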

Truncate Dataframe Columns

We can also remove excess columns by setting the argument axis=1:

df.truncate(before='b',axis=1)
             b   c
2022-04-21  20  30
2022-04-22  21  31
2022-04-23  22  32
2022-04-24  23  33
2022-04-25  24  34
2022-04-26  25  35
2022-04-27  26  36
2022-04-28  27  37
2022-04-29  28  38
2022-04-30  29  39
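Both bounds can be combined on the column axis as well. The sketch below uses a small hypothetical dataframe called wide with four columns, and keeps only the columns from 'b' through 'c'. Note that the sorted-index requirement applies to the column labels here, so this assumes the columns are already in sorted order.

```python
import pandas as pd

# Hypothetical four-column dataframe; column labels are in sorted order
wide = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]})

# axis='columns' is equivalent to axis=1
middle = wide.truncate(before='b', after='c', axis='columns')
print(middle.columns.tolist())  # ['b', 'c']
```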

A Sorted Index Is Required

There’s one caveat when using truncate(): the dataframe index must be sorted first. Let’s shuffle the index and then apply truncate() again. The following code results in ValueError: truncate requires a sorted index.

df2=pd.concat([df.iloc[-5:,:], df.iloc[:5,:]])
             a   b   c
2022-04-26  15  25  35
2022-04-27  16  26  36
2022-04-28  17  27  37
2022-04-29  18  28  38
2022-04-30  19  29  39
2022-04-21  10  20  30
2022-04-22  11  21  31
2022-04-23  12  22  32
2022-04-24  13  23  33
2022-04-25  14  24  34

df2.truncate(after='2022-04-28')
ValueError: truncate requires a sorted index
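One possible way around this error is to sort the index before truncating. The sketch below rebuilds df2, confirms that its index is not sorted, catches the ValueError, and then calls sort_index() first. The name fixed is illustrative, and sorting is only appropriate if the original row order does not need to be preserved.

```python
import pandas as pd

df = pd.DataFrame(
    {'a': range(10, 20), 'b': range(20, 30), 'c': range(30, 40)},
    index=pd.date_range('2022-04-21', '2022-04-30'),
)

# Move the last 5 rows to the top so the index is no longer sorted
df2 = pd.concat([df.iloc[-5:, :], df.iloc[:5, :]])
print(df2.index.is_monotonic_increasing)  # False

try:
    df2.truncate(after='2022-04-28')
except ValueError as err:
    print(err)

# Sorting the index first makes truncate() usable again
fixed = df2.sort_index().truncate(after='2022-04-28')
print(len(fixed))  # 8
```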

Truncate vs loc/iloc

The loc and iloc indexers can select the same rows as truncate(), as the example below shows:

df.loc[df.index<='2022-04-25']
             a   b   c
2022-04-21  10  20  30
2022-04-22  11  21  31
2022-04-23  12  22  32
2022-04-24  13  23  33
2022-04-25  14  24  34

However, note that loc and iloc also work on dataframes with an unsorted index, whereas truncate() requires a sorted one, which makes loc and iloc more robust in certain cases.

df2.loc[df2.index<='2022-04-28']
             a   b   c
2022-04-26  15  25  35
2022-04-27  16  26  36
2022-04-28  17  27  37
2022-04-21  10  20  30
2022-04-22  11  21  31
2022-04-23  12  22  32
2022-04-24  13  23  33
2022-04-25  14  24  34
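To make the comparison concrete, the sketch below selects the same five rows from the sorted dataframe in three ways: with truncate(), with a label slice through loc, and with a positional slice through iloc. The variable names are illustrative. A label slice with loc includes its end point, while a positional slice with iloc excludes it.

```python
import pandas as pd

df = pd.DataFrame(
    {'a': range(10, 20), 'b': range(20, 30), 'c': range(30, 40)},
    index=pd.date_range('2022-04-21', '2022-04-30'),
)

by_truncate = df.truncate(after='2022-04-25')
by_loc = df.loc[:'2022-04-25']  # label slice, end point included
by_iloc = df.iloc[:5]           # positional slice, end point excluded

print(by_truncate.equals(by_loc))   # True
print(by_truncate.equals(by_iloc))  # True
```

The iloc version depends on knowing row positions in advance, so it fits best when the cut point is a row count rather than an index value.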

Additional Resources

How to Filter Dataframe With Pandas Query Method – With Examples

How to Filter Pandas Dataframe by Date

Filter a pandas dataframe – OR, AND, NOT
