Combine Excel Sheets Using Python

Sharing is caring!

Last Updated on July 14, 2022 by Jay

In this tutorial, we’ll combine sheets from different Excel files into a single workbook but still separate sheets using Python.

You guys asked for this tutorial, and I delivered 🙂

However, just a word of caution – having too many sheets inside one Excel is a nightmare. Try to keep only relevant sheets inside the same Excel file. Also, please don’t use Excel as a database. It’s okay to store a small amount of data inside Excel. I do that all the time too. But if you have a large amount of data (e.g.,> 100MB), please consider an actual database. Personally, I prefer SQLite since it’s powerful and lightweight.

Sample Datasets

Gather All Files

Typically, if we need to access many files at once, I like to put them all inside the same folder, then we can easily read all files from that folder into Python. This is similar to Excel (PowerQuery)’s way of “Get Data From Folder”. This blog focuses on integrating Python and Excel, but if you want to learn Power Query or Excel in General, leave a comment down below!

Excel PowerQuery/Get Data From Folder
Excel PowerQuery/Get Data From Folder

.Get All Files From Given Folder

We can use the os.listdir() method to get all file paths from a given folder, process just the Excel files, and pass on the other file types. Since I already have a tutorial for that, check out the details here.

Just a quick note on the code below, in case your Excel files are in other formats, e.g., “.xlsm” or “.xls”, make sure to add those into the if statement using or operator.

import pandas as pd
import os

url = ''
files = os.listdir(url)
for f in files:
    if f.endswith('.xlsx'):
        ## extract sheets       

.Get All Files Using An Input File

One of my other favorite ways to get file names is using an input file. Basically, we enter all potential file paths into another file. Either .txt or Excel is fine. I prefer Excel since it gives our data an excellent structure (i.e., table) to work with. The below shows how it works.

  • Use pandas to read the “input” file
  • Process information stored in the input file
  • Start working with individual files
Excel as an input for file paths
Excel as an input for file paths

The code looks something like this:

import pandas as pd

files = pd.read_excel('input.xlsx')
files_to_use = files[files['Include'] == 'Y']['File path'].tolist()
for f in files_to_use:
    df = pd.read_excel(f)

Note that we can include a column inside the input file to control the behavior of our program. This way, with a simple flip of a button, we can choose which files to load or to pass. Well, there’s no button here, but you get the idea – change to “Y” or “N”, and don’t forget to save the input file!

Because I set “N” for the second file with the above example, the program will skip it and only loads the first file.

Extract All Sheets

We know how to get all the file names. Now it’s time to extract information from each file.

Another word of caution – this tutorial uses only the pandas library to combine excel sheets, so it will NOT preserve any formulas or formats. Everything will be treated as hard-code values by the Python library. If you want to preserve formulas, you have to use other libraries such as xlwings, openpyxl or xlsxwriter.

I have a full series of tutorials on how to automate Excel using xlsxwriter library.

Unless we use all CSV files, we are unlikely to know how many sheets are in each workbook. We can use either pd.read_excel or pd.ExcelFile to read all sheets from a given spreadsheet, here I’m going to use the latter to do so.

excel= pd.ExcelFile('C:\\Users\\jay\\Desktop\\PythonInOffice\\combine_excel_sheets_python\\sales_1.xlsx')

for s in excel.sheet_names:
    print(excel.parse(s))
pandas read multiple Excel sheets
pandas read multiple Excel sheets

Create And Save The Output File

It’s easy to use pandas to create Excel files. In this tutorial, I discussed the two different ways of saving Excel files.

Basically, we need to create an ExcelWriter object, then write each dataframe as a single/separate sheet into the workbook. My current preferred approach is as follow:

  1. Create an empty dictionary (or list if you are lazy)
  2. Once I finish reading a dataframe (sheet), I store it inside the dictionary. Yes, you can store any object into a dictionary… Repeast this step for as many time as needed
  3. When I’m ready to save the Excel file, I’ll use a context manager and loop through all items of the dictionary to save all sheets into an Excel file

The code looks like this:

df_dict = {}

### Code here that prepares many dataframes
### Code here that adds each datafram into the above dictionary
### df_dict is a dictionary of dataframes

with pd.ExcelWriter('output.xlsx') as writer:
    for k in df_dict.keys():
        df_dict[k].to_excel(writer, sheet_name = k, index = False)

This approach is better if we need to manage many dataframes (sheets). Especially when our code gets too long, it’s easy to get lost sometimes. It also makes debugging more manageable because we save to Excel at one spot instead of many different places in the code.

Putting It All Together – Combine Multiple Sheets Into One Excel Workbook In Python

I’m going to use the os.listdir() approach for the complete code. Feel free to try the input file approach if that fits your needs better.

Also note, in the below sample code, I extract item names from individual sheets using this line df_name = df['Item'][0]. Your Excel files may not have the same structure, so you’ll have to tweak it a little bit to fit your files.

import pandas as pd
import os

url = r'C:\Users\jay\Desktop\PythonInOffice\combine_excel_sheets_python'

files = os.listdir(url)
df_dict = {}
for f in files:
    if f.endswith('.xlsx') and ('input' not in f):
        excel = pd.ExcelFile(f)
        sheets = excel.sheet_names
        for s in sheets:
            df = excel.parse(s)
            df_name = df['Item'][0]
            df_dict[df_name] = df
        
with pd.ExcelWriter('output.xlsx') as writer:
    for k in df_dict.keys():
        df_dict[k].to_excel(writer, sheet_name = k, index=False)

Here you go, with a few lines of code, we can combine multiple Excel sheets into one workbook using Python.

Leave a Reply

Your email address will not be published. Required fields are marked *