Use Python to combine multiple Excel files into one master spreadsheet

In this short tutorial, I’ll show you how to use Python to combine multiple Excel files into one master spreadsheet. Imagine that you have dozens of Excel files with the same data fields, and your job is to aggregate sheets from those files. Manually doing this job is super inefficient, and Python will save you a lot of time in the long run, so let’s all work smarter!

Note that this article talks about appending Excel files with the same format/data fields. Merging multiple dataset is a different task.

If you are new to Python, this series Integrate Python with Excel offers some tips on how to use Python to supercharge your Excel spreadsheets.

The workflow

To solve the problem, we’ll need to follow the below work flow:

  1. Identify the files we need to combine
  2. Get data from the file
  3. Move data from step 2) to a master dataset (we will call it “dataframe”)
  4. Report 2-3 for the number of files
  5. Save the master dataset into an Excel spreadsheet

Import libraries

Alright, let’s see how to code the above work flow in Python. For this exercise, we’ll need to use two Python libraries: os and pandas. If you want to follow along, feel free to grab the source code and files used in this tutorial from here. Although you can combine as many Excel files as you wish, we’ll use three files to demonstrate the process.

If you need help with installing Python or libraries, here’s a guide on how to do that.

os library gives a way of using operating system dependent functionalities. Such as manipulating folder and file paths. We use this library to get all the Excel file names, including their paths.

pandas library is the gold standard for data analysis and manipulation. It is fast, powerful, and flexible. We use this library to load Excel data into Python, manipulate data, and recreate the master spreadsheet.

We’ll start by importing these two libraries. Then find the current working directory, as well as all the file names within the directory.

import os
import pandas as pd
cwd = os.path.abspath('') 
files = os.listdir(cwd) 
python code showing current working directory
Getting current working directory and files within it

The variable cwd shows the path to the current working directory, and the variable files is a list of all the file names within the current working directory. Notice there are non-Excel files, and we don’t want to open those, so we’ll handle that soon.

Combine multiple Excel files into one spreadsheet

Next, we create an empty dataframe df for storing the data for master spreadsheet. We loop through all the files within the current working directory, but only process the Excel files whose name ends with “.xlsx”. This is done by this line of code
if file.endswith('.xlsx'):

pd.read_excel() will read Excel data into Python and store it as a pandas DataFrame object. Be aware that this method reads only the first tab/sheet of the Excel file by default. If your Excel file contains more than 1 sheet, continue reading to the next section.


df.append() will append/combine data from one file to another. Think about copying a block of data from one Excel file and pasting it into another. Instead of opening up Excel, data is stored inside your computer’s memory.

df = pd.DataFrame()
for file in files:
     if file.endswith('.xlsx'):
         df = df.append(pd.read_excel(file), ignore_index=True) 
df.head()

The above code does the following:

  1. Loop through all the files in the current working directory, determine if a file is Excel by checking the file name ends with “.xlsx”.
  2. If yes, read the file content (data), and append/add it to the master dataframe variable called df.
  3. Save the master dataframe into an Excel spreadsheet.

We can examine the master dataframe by checking df.head(), which shows the first 5 rows of the data.

python code showing the first 5 rows of data
Checking the first 5 rows of data in the dataframe

Seems good! Just another quick check to make sure we have loaded everything in the DataFrame. df.shape will show us the dimension (36 rows, 5 columns) of the data:

python code showing the size of the master dataframe

Everything looks good, so let’s output the data back into Excel. The last line df.to_excel() will do that.

Combine multiple sheets from the same Excel file

I talked about the two techniques to read multiple sheets from the same Excel file, so I won’t repeat it. However, I’ll walk through an example here with a slightly different setting.

We have 2 files each contains a number of sheets. We don’t know how many sheets are in each file, but we know the format is the same for all sheets. Our goal is to aggregate all sheets into one spreadsheet (and one file).

The workflow is similar:

  1. Get all Excel files
  2. Loop through the Excel files
  3. For each file, loop through all sheets
  4. Read each sheet into a dataframe, then combine all dataframes together.
df_total = pd.DataFrame()
for file in files:  # loop through Excel files
    if file.endswith('.xlsx'):
        excel_file = pd.ExcelFile(file)
        sheets = excel_file.sheet_names
        for sheet in sheets: # loop through sheets inside an Excel file
            df = excel_file.parse(sheet_name = sheet)
            df_total = df_total.append(df)
df_total.to_excel('combined_file.xlsx')

Putting it all together

Below is the full code put together. 10 lines of code will help you combine all your Excel files or sheets into one master spreadsheet. Enjoy!

import os
import pandas as pd
cwd = os.path.abspath('') 
files = os.listdir(cwd)  

## Method 1 gets the first sheet of a given file
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        df = df.append(pd.read_excel(file), ignore_index=True) 
df.head() 
df.to_excel('total_sales.xlsx')



## Method 2 gets all sheets of a given file
df_total = pd.DataFrame()
for file in files:                         # loop through Excel files
    if file.endswith('.xlsx'):
        excel_file = pd.ExcelFile(file)
        sheets = excel_file.sheet_names
        for sheet in sheets:               # loop through sheets inside an Excel file
            df = excel_file.parse(sheet_name = sheet)
            df_total = df_total.append(df)
df_total.to_excel('combined_file.xlsx')

14 comments

  1. Hi, is there a way to combine 2 different data frames from 2 different excels/csvs into different sheets of a single workbook?

  2. Hi Jay I followed the step in combining multiple excel files with multiple sheets .I am getting empty combined file .do i need to mention the sheet names ? what i am missing.

    Thanks in advance

  3. Hi Jay, I have an excel file which has sheets formatted differently, I am trying to make it so that these sheets are read and combined into another excel file in which I want the sheet to be continuous flowing together after formatting has been made the same for them….

  4. Hi Jay,

    I have a process as follows

    use 2 separate csv files in a folder and run a power query in a separate excel to filter results by date.

    then next step is to use a combination of macro buttons in an excel to filter the combined data.

    Next the process is to cut and paste separate ranges to three separate excel files.

    Tedious much…. any help here would be greatly appreciated

    1. Hi Rick,

      Thanks for dropping by!

      I was in a similar situation a few years ago, that we had a combination of PQ, VBA, and Excel formulas. The process worked fine but it was tedious and slow (takes 2-3 days to update everything). Then we coded that process in Python and now it takes 30 mins.

      I’m confident that the process you outlined can be automated with Python. But I’ll need more details before I know how to help.

  5. Hi Jay, I have multiple excel files with multiple sheets in it and also I need to have one master sheet with combining only one sheet from each excel file and also need to eliminate few rows in it. how do I do this?
    can you please help me?

    1. Hi Supriya,

      Thanks for stopping by.

      For your problem, you can use the example in the tutorial with some tweaks.
      1. When you read the Excel file into Python, make sure to include a sheet_name argument to specify which sheet you want from the file.
      2. You’ll have to do some filtering to eliminate a few rows, but it really depends on what you need to remove.

      Hope that helps. You can also send me further questions or your code to my email address: pythoninoffice@gmail.com.

  6. Hi Jay,

    I have multiple excel files with multiple sheets in them. I have to map the data from all these excel files into one master excel file based on particular cells.

    For example: Getting data from Cell B5 in sheet1 in all excel workbooks to cell I5 and onwards in the master excel file.
    Then later on getting data from Cell B7 from sheet 2 into Cell H5 and onwards in the master excel file and so on.

    I am stuck with this! Hope you can help!

    1. Hi Sarthak,
      Thanks for your question. For your task I would suggest using a library like openpyxl/xlsxwriter/xlwings.
      In fact, we have a tutorial (still in progress) that touches on a similar task: copying data from excel files and then combining them into one file. Stay tuned and I’ll drop you a message once that’s available!

      Regards,
      Jay

Leave a Reply

Your email address will not be published.