Turn Pandas DataFrame From Wide to Long Format

Sharing is caring!

Last Updated on July 14, 2022 by Jay

Sometimes we might need to turn a pandas dataframe from a wide format to a long format. This can be done easily with the melt method. We’ll show how to use it with a simple example in this tutorial.

convert a pandas dataframe from wide to long format

Sample Dataset

Consider the following sample dataset – we have a table containing sales data from 4 countries over the first 6 months. Then our goal is to convert the “wide” format into a “long” format like the above image shows.

import pandas as pd
import numpy as np

## for replicating exact numbers
np.random.seed(0)

## create a dataframe for sales
sales = pd.DataFrame({
    'country': ['Canada','UK','USA','Australia'],
    'Jan': np.random.randint(1,1000,size=(4)),
    'Feb': np.random.randint(1,1000,size=(4)),
    'Mar': np.random.randint(1,1000,size=(4)),
    'Apr': np.random.randint(1,1000,size=(4)),
    'May': np.random.randint(1,1000,size=(4)),
    'Jun': np.random.randint(1,1000,size=(4)),
})

print(sales)
     country  Jan  Feb  Mar  Apr  May  Jun
0     Canada  685  836   10  805  601  487
1         UK  560  764  724  600  397  552
2        USA  630  708  278   71  315   88
3  Australia  193  360  755  473  706  175

Pandas Melt() Syntax

First, note that the syntax is pd.melt:

pd.melt(dataframe, id_vars=None, value_vars=None, var_name=None, value_name=’value’, col_level=None, ignore_index=True)

  • id_vars: a list/tuple of column names
    • This is to specify which columns we want to use as the identifier variables
  • value_vars: list/tuple of column names
    • The columns we want to unpivot. Leaving this blank means using all columns except the id_vars
  • var_name: string
    • Column name for the “variable” column
  • value_name: string
    • Column name for the “value” column

Convert Pandas DataFrame from Wide to Long Form

We will use the ‘country’ column as the identifier variable id_vars. In the first line of code, by leaving value_vars blank we are really saying: use all columns other than ‘country’. Therefore, it’s equivalent to the 2nd line of code below.

pd.melt(sales, id_vars = 'country')

## below is equivalent
pd.melt(sales, id_vars = 'country', value_vars=['Jan','Feb','Mar','Apr','May','Jun'])
pandas melt

The results look good. However, notice a minor problem in the column header – the “variable” and “value” columns are not very descriptive. We want to change them to “Month” and “Sales” respectively.

Update Resulting DataFrame Column Names

We can achieve this by using the df.rename() method. However, there’s an easy way for that by utilizing the var_name and value_name arguments inside the melt() method.

pd.melt(sales, id_vars = 'country',var_name='Month', value_name ='Sales')
pandas melt update column names

Additional Resources

Rename pandas dataframe columns

Leave a Reply

Your email address will not be published. Required fields are marked *