Last Updated on October 22, 2022 by Jay
This tutorial will walk through the best way I know of to add new data to a master Excel file using Python. There are several ways to add data to a master Excel file, I think this is the best way because it can preserve the master data formats and everything inside the file.
Install Library
For this project, we’ll use the xlwings Python library. In my opinion, xlwings is the best Python library for manipulating Excel files.
pip install xlwings
Sample Files
Feel free to download the sample Excel files for this project.
The master file will have some pre-set formats and cell background colors, and we would like to keep those.
The new data files are plain data file, can be either Excel or CSV format.
Working With The Master Excel File
We can use xlwings to open Excel files. As soon as we execute the following code, our Excel should open with the master file.
import xlwings as xw
master_wb = xw.Book('master_file.xlsx')
Three observations from the master file:
- There are two tabs, both with data
- Each tab has its own formatting and background colors
- We want to add new data to the last row of the existing data for each tab. In the above screenshot, we want to add new data starting row 5 for the “Canada” tab
Getting Excel Sheets
Getting all Excel sheets is as easy as calling the sheets attribute as shown below. Note it returns a Sheets object, which is a collection of Excel sheets, we can use indexing to access each individual sheet. To get the sheet names, simply call the .name attribute.
master_sheets = master_wb.sheets
master_sheets
Sheets([<Sheet [master_file.xlsx]Canada>, <Sheet [master_file.xlsx]USA>])
master_sheets[0].name
'Canada'
Figuring Out Where To Put New Data
We will put new data starting from the last (empty) row of the master file. By eyeballing it, we know it should be row 5 (for Canada tab). So how to find the last row of data in Excel? It’s by pressing Ctrl + down arrow. If you select cell A1 and press ctrl + down arrow, your cursor will move down to A4. The next row (row 5) is our destination.
To get A4 programmatically:
master_sheets[0].range('A1').end('down').address
'$A$4'
master_sheets[0].range('A1').end('down').row
4
Open & Read New Data File
We’ll open the new data file like we did with the master file. To get data from all the (non-empty) rows and columns. Use the .expand() method to expand the range selection. Note we expand from cell A2, as row 1 is the header row which we don’t need.
newdata_wb = xw.Book(r"new_data.xlsx")
data_range = newdata_wb.sheets[0].range('A2').expand()
<Range [new_data.xlsx]Sheet1!$A$2:$G$6>
new_data_raw = newdata_wb.sheets[0].range('A2').expand().value
new_data_raw
[['10/15/2022', 10.0, 'Waterloo', 'Canada', 'Event 10', 10.1, 1.0],
['10/16/2022', 11.0, 'St Louis', 'USA', 'Event 11', 10.2, 1.01],
['10/17/2022', 12.0, 'Boston', 'USA', 'Event 12', 10.3, 1.02],
['10/19/2022', 13.0, 'Ottawa', 'Canada', 'Event 13', 10.4, 111.11],
['4/1/2022', 5.0, 'Austin', 'USA', 'Event 5', 11.11, 1.23]]
So we got a list of lists. Both countries are inside the lists, let’s separate them and remove the country from each sub-lists. Using Canada as an example. Here we are using a list comprehension so we can avoid long loops. If you need help understanding list comprehension, read the following:
[i[0:3] + i[4:] for i in new_data_raw if i[3] == 'Canada']
[['10/15/2022', 10.0, 'Waterloo', 'Event 10', 10.1, 1.0],
['10/19/2022', 13.0, 'Ottawa', 'Event 13', 10.4, 111.11]]
Transfer Data To Master File
Let’s write the complete code for the steps up to this point:
import xlwings as xw
master_wb = xw.Book('master_file.xlsx')
newdata_wb = xw.Book("new_data.xlsx")
master_sheets = master_wb.sheets
new_data_raw = newdata_wb.sheets[0].range('A2').expand().value
## loop through each sheet in master file, move data over to the appropriate tab
for s in master_sheets:
temp_data = [i[0:3] + i[4:] for i in new_data_raw if i[3] == s.name]
newrow = master_wb.sheets[s.name].range('A1').end('down').row + 1
master_wb.sheets[s.name].range(newrow, 1).value = temp_data
After running the above code, our master file will look like the following:
The new data that we added doesn’t have consistent formatting! Let’s fix that.
Transfer Format To New Data Range
We’ll copy the cell format from row 2 and paste formats only to the rest of the data range. We’ll do so by modifying the above loop to the following:
for s in master_sheets:
temp_data = [i[0:3] + i[4:] for i in new_data_raw if i[3] == s.name]
newrow = master_wb.sheets[s.name].range('A1').end('down').row + 1
master_wb.sheets[s.name].range(newrow, 1).value = temp_data
master_wb.sheets[s.name].range('A2').expand('right').copy()
master_wb.sheets[s.name].range('A2').expand().paste(paste='formats')
As we see the copy/paste will also copy the cell background color. However the Date isn’t exactly how we want it to look like. We recognize the integer number in A5 and A6 are basically Excel’s date values, so we can change their format like the below:
master_wb.sheets[0].range('A5:A6').number_format = 'mm/dd/yyyy'
Hello, Jay. Thanks for sharing all of this helpful information. I would like to make a request. Could you please post on how to use Python to create something like this: VBA-Multiple-Find-and-Replace by DVN-Anakin on Github. Preferably, the one without the backlogs, which can actually replace every word in the Word document. And it also works while taking into account
that the Excel file is big in size.
I am a beginner and it was very helpful. Thank you.