Last Updated on June 11, 2022 by Jay
This tutorial will walk through how to read multiple Excel sheets (within the same file) in the R programming language. I will present two ways here. One is what might be familiar to Python programmers, the other is in R fashion.
Library
The R library we’ll use is openxlsx. Type the following inside the R console to install it:
> install.packages('openxlsx')
Tips on Using R library
To import an R library into our code, use library():
> library(openxlsx)
Once the library is imported, usually we can call functions without referring to the library name. For example, we can call read.xlsx(), which is a function in the openxlsx library to read Excel files. However, sometimes there might be functions with duplicated names when we import multiple libraries at once.
To specify which library & function to call, use the double colon :: symbol like the below:
openxlsx::read.xlsx()
Read Multiple Excel Sheets in R (Python style)
Coming from a Python background, this is what I naturally did when first started using R:
- Find all tab names inside the Excel file
- Do a loop to extract data from each tab
The code looks like this:
url <- "C:/Users/jay/Desktop/PythonInOffice/combine_excel_r/sales_1.xlsx"
sheet_names <- getSheetNames(url)
> sheet_names
[1] "Sheet 1" "Sheet 2" "Sheet 3"
for (s in sheet_names){
temp <- read.xlsx(url, sheet=s)
}
Indeed, the above code works perfectly fine. However, there’s a neater way to do this, and it’s good to learn how to use different R build-in functions.
Read Multiple Excel Sheets in R
The R function apply(x, function) applies a function to each item inside variable x. It works similar to the map() in Python.
As shown in the above R code, we can create a function that takes an argument x, which is going to be the sheet name. Then we use apply() to run each item inside sheet_names through that function. The returned result is a list that contains three dataframes.
lapply(sheet_names, function(x){read.xlsx(url, sheet=x)})
[[1]]
ID Customer Item Purchase.Price Purchase.Date
1 1 Cust_1 Water 3 43910
2 2 Cust_2 Water 3 43910
3 3 Cust_3 Water 3 43910
4 4 Cust_4 Water 3 43910
5 5 Cust_5 Water 3 43910
6 6 Cust_6 Water 3 43910
[[2]]
ID Customer Item Purchase.Price Purchase.Date
1 1 Cust_4 Coffee 15 43912
2 2 Cust_8 Coffee 15 43912
3 3 Cust_2 Coffee 15 43912
4 4 Cust_6 Coffee 15 43912
5 5 Cust_10 Coffee 15 43912
[[3]]
ID Customer Item Purchase.Price Purchase.Date
1 1 Cust_1 Pasta 5 43912
2 2 Cust_5 Pasta 5 43912
3 3 Cust_9 Pasta 5 43912
4 4 Cust_3 Pasta 5 43912
5 5 Cust_7 Pasta 5 43912
As a fun exercise, I want to replicate the above no-loop fashion in Python. The following gives the same results with a little help from the lambda function. It’s also interesting to see how similar the syntaxes are between R and Python!
import pandas as pd
url = r'C:\Users\jay\Desktop\PythonInOffice\combine_excel_r\sales_1.xlsx'
sheet_names = pd.ExcelFile(url).sheet_names
list(map(lambda x: pd.read_excel(url, sheet_name=x), sheet_names))