How To Split An Excel File Into Multiple Files Using Python

Sharing is caring!

Last Updated on July 14, 2022 by Jay

I will show you how to split an Excel file into multiple files using Python in this short tutorial. Splitting an Excel file is a common task, and pretty easy to do by hand. However, if the file contains lots of data and many categories, this task will become repetitive, which means we need a solution to automate it!

Library

We need two libraries for this exercise, pandas and openpyxl. Type the following into the command prompt to install them:

pip install pandas openpyxl

The pandas library is used for working with data (filtering to be specific in our exercise), and openpyxl is used to create new Excel files.

Sample Excel File

For convenience, you can download the sample data from this github repo:

https://raw.githubusercontent.com/pythoninoffice/blog_example_code/main/split_excel_file/master_file.csv

Or just read it directly into a pandas dataframe:

df = pd.read_csv(https://raw.githubusercontent.com/pythoninoffice/blog_example_code/main/split_excel_file/master_file.csv)

df.head()
   order_id  sales_amount sale_product    country
0         1           756          RAM     Canada
1         2           760    Harddrive  Australia
2         3           763       Laptop  Australia
3         4           707      Monitor  Australia
4         5           577    Harddrive  Australia

The dataset contains sales information for an imaginary computer shop – the sales amount, the product, and the country. Our task is to split the data into different files based on the sale_product column.

The underlying mechanism is simple:

  1. First, we read the data into Python/pandas.
  2. Second, apply a filter to group data into different categories.
  3. Last but not least, save the groups of data into different Excel files.

Filter Data

It’s easy to filter data in a pandas dataframe. There are several ways but we’ll use the most straightforward one.

Let’s say we want to filter on data by selecting all Laptop sales.

df.loc[df['sale_product'] == 'Laptop']
    order_id  sales_amount sale_product    country
2          3           763       Laptop  Australia
7          8           881       Laptop         UK
9         10           738       Laptop        USA
10        11           765       Laptop  Australia
13        14           974       Laptop     Canada
16        17           956       Laptop  Australia
26        27           984       Laptop  Australia
28        29           608       Laptop         UK

Finding Categories

Next, we need to extract the categories from the data. They are basically values inside the sale_product column. We can simply return all the unique values from that column.

df['sale_product'].unique()
array(['RAM', 'Harddrive', 'Laptop', 'Monitor', 'GPU'], dtype=object)

Split An Excel Into Multiple Tabs

As shown above, the unique values from sale_product column are inside an array, which means we can loop through it to retrieve each value, such as “Laptop”, “Harddrive”, etc. Then, we can use those values as the filter condition to split the dataset. Finally, we can save each dataset into a separate tab within the same Excel file.

#creat new sheets

with pd.ExcelWriter('new_file.xlsx') as new_file:
    for p in df['sale_product'].unique():
        df.loc[df['sale_product'] == p].to_excel(new_file, sheet_name = p, index=False)
Split An Excel File Into Multiple Tabs
Split An Excel File Into Multiple Tabs

Split An Exel Into Multiple File Using Python

In case we need to split the data into different Excel files (instead of tabs). We can modify the above code a little bit, and just output data from each category into its own files.

#create new files

for p in df['sale_product'].unique():
    df.loc[df['sale_product'] == p].to_excel(f'{p}_sales.xlsx', index=False)
Split An Excel File Into Multiple Files
Split An Excel File Into Multiple Files

Additional Resources

Filter a pandas dataframe (think Excel filters but more powerful)

How to Filter Pandas Dataframe by Date

Filter Dataframe With Pandas Query Method – With Examples

How to Filter A Pandas Dataframe By A List of Values

Save data to an Excel file using Python

Leave a Reply

Your email address will not be published. Required fields are marked *