Last Updated on July 14, 2022 by Jay
Sometimes we might need to turn a pandas dataframe from a wide format to a long format. This can be done easily with the melt method. We’ll show how to use it with a simple example in this tutorial.
Sample Dataset
Consider the following sample dataset – we have a table containing sales data from 4 countries over the first 6 months. Then our goal is to convert the “wide” format into a “long” format like the above image shows.
import pandas as pd
import numpy as np
## for replicating exact numbers
np.random.seed(0)
## create a dataframe for sales
sales = pd.DataFrame({
'country': ['Canada','UK','USA','Australia'],
'Jan': np.random.randint(1,1000,size=(4)),
'Feb': np.random.randint(1,1000,size=(4)),
'Mar': np.random.randint(1,1000,size=(4)),
'Apr': np.random.randint(1,1000,size=(4)),
'May': np.random.randint(1,1000,size=(4)),
'Jun': np.random.randint(1,1000,size=(4)),
})
print(sales)
country Jan Feb Mar Apr May Jun
0 Canada 685 836 10 805 601 487
1 UK 560 764 724 600 397 552
2 USA 630 708 278 71 315 88
3 Australia 193 360 755 473 706 175
Pandas Melt() Syntax
First, note that the syntax is pd.melt:
pd.melt(dataframe, id_vars=None, value_vars=None, var_name=None, value_name=’value’, col_level=None, ignore_index=True)
- id_vars: a list/tuple of column names
- This is to specify which columns we want to use as the identifier variables
- value_vars: list/tuple of column names
- The columns we want to unpivot. Leaving this blank means using all columns except the id_vars
- var_name: string
- Column name for the “variable” column
- value_name: string
- Column name for the “value” column
Convert Pandas DataFrame from Wide to Long Form
We will use the ‘country’ column as the identifier variable id_vars. In the first line of code, by leaving value_vars blank we are really saying: use all columns other than ‘country’. Therefore, it’s equivalent to the 2nd line of code below.
pd.melt(sales, id_vars = 'country')
## below is equivalent
pd.melt(sales, id_vars = 'country', value_vars=['Jan','Feb','Mar','Apr','May','Jun'])
The results look good. However, notice a minor problem in the column header – the “variable” and “value” columns are not very descriptive. We want to change them to “Month” and “Sales” respectively.
Update Resulting DataFrame Column Names
We can achieve this by using the df.rename() method. However, there’s an easy way for that by utilizing the var_name and value_name arguments inside the melt() method.
pd.melt(sales, id_vars = 'country',var_name='Month', value_name ='Sales')