How to Read Multiple Excel Sheets in R

Sharing is caring!

Last Updated on June 11, 2022 by Jay

This tutorial will walk through how to read multiple Excel sheets (within the same file) in the R programming language. I will present two ways here. One is what might be familiar to Python programmers, the other is in R fashion.

Library

The R library we’ll use is openxlsx. Type the following inside the R console to install it:

> install.packages('openxlsx')

Tips on Using R library

To import an R library into our code, use library():

> library(openxlsx)

Once the library is imported, usually we can call functions without referring to the library name. For example, we can call read.xlsx(), which is a function in the openxlsx library to read Excel files. However, sometimes there might be functions with duplicated names when we import multiple libraries at once.

To specify which library & function to call, use the double colon :: symbol like the below:

openxlsx::read.xlsx()

Read Multiple Excel Sheets in R (Python style)

Coming from a Python background, this is what I naturally did when first started using R:

  1. Find all tab names inside the Excel file
  2. Do a loop to extract data from each tab

The code looks like this:

url <- "C:/Users/jay/Desktop/PythonInOffice/combine_excel_r/sales_1.xlsx"
sheet_names <- getSheetNames(url)

> sheet_names
[1] "Sheet 1" "Sheet 2" "Sheet 3"

for (s in sheet_names){
 temp <- read.xlsx(url, sheet=s)
}

Indeed, the above code works perfectly fine. However, there’s a neater way to do this, and it’s good to learn how to use different R build-in functions.

Read Multiple Excel Sheets in R

The R function apply(x, function) applies a function to each item inside variable x. It works similar to the map() in Python.

As shown in the above R code, we can create a function that takes an argument x, which is going to be the sheet name. Then we use apply() to run each item inside sheet_names through that function. The returned result is a list that contains three dataframes.

lapply(sheet_names, function(x){read.xlsx(url, sheet=x)})

[[1]]
  ID Customer  Item Purchase.Price Purchase.Date
1  1   Cust_1 Water              3         43910
2  2   Cust_2 Water              3         43910
3  3   Cust_3 Water              3         43910
4  4   Cust_4 Water              3         43910
5  5   Cust_5 Water              3         43910
6  6   Cust_6 Water              3         43910

[[2]]
  ID Customer   Item Purchase.Price Purchase.Date
1  1   Cust_4 Coffee             15         43912
2  2   Cust_8 Coffee             15         43912
3  3   Cust_2 Coffee             15         43912
4  4   Cust_6 Coffee             15         43912
5  5  Cust_10 Coffee             15         43912

[[3]]
  ID Customer  Item Purchase.Price Purchase.Date
1  1   Cust_1 Pasta              5         43912
2  2   Cust_5 Pasta              5         43912
3  3   Cust_9 Pasta              5         43912
4  4   Cust_3 Pasta              5         43912
5  5   Cust_7 Pasta              5         43912

As a fun exercise, I want to replicate the above no-loop fashion in Python. The following gives the same results with a little help from the lambda function. It’s also interesting to see how similar the syntaxes are between R and Python!

import pandas as pd
url = r'C:\Users\jay\Desktop\PythonInOffice\combine_excel_r\sales_1.xlsx'
sheet_names = pd.ExcelFile(url).sheet_names
list(map(lambda x: pd.read_excel(url, sheet_name=x), sheet_names))

Additional Resources

Run Python Code In R

How to use Python lambda, map and filter functions

Leave a Reply

Your email address will not be published. Required fields are marked *