Read multiple Excel sheets with Python pandas

Sharing is caring!

Last Updated on July 14, 2022 by Jay

In the previous post, we touched on how to read an Excel file into Python. Here we’ll attempt to read multiple Excel sheets (from the same file) with Python pandas. We can do this in two ways: use pd.read_excel() method, with the optional argument sheet_name; the alternative is to create a pd.ExcelFile object, then parse data from that object.

pd.read_excel() method

In the below example:

  • Select sheets to read by index: sheet_name = [0,1,2] means the first three sheets.
  • Select sheets to read by name: sheet_name = ['User_info', 'compound']. This method requires you to know the sheet names in advance.
  • Select all sheets: sheet_name = None.
import pandas as pd
df = pd.read_excel('users.xlsx', sheet_name = [0,1,2])
df = pd.read_excel('users.xlsx', sheet_name = ['User_info','compound'])
df = pd.read_excel('users.xlsx', sheet_name = None) # read all sheets

We will read all sheets from the sample Excel file, then use that dataframe for the examples going forward.

The df returns a dictionary of dataframes. The keys of the dictionary contain sheet names, and values of the dictionary contain sheet content.

>>> df.keys()
dict_keys(['User_info', 'purchase', 'compound', 'header_row5'])

>>> df.values()
dict_values([      User Name Country      City Gender  Age
0  Forrest Gump     USA  New York      M   50
1     Mary Jane  CANADA   Tornoto      F   30
2  Harry Porter      UK    London      M   20
3     Jean Grey   CHINA  Shanghai      F   30,     

ID      Customer            purchase       Date
0  101  Forrest Gump        Dragon Ball  2020-08-12
1  102     Mary Jane          Evangelion 2020-01-01
2  103  Harry Porter        Kill la Kill 2020-08-01
3  104     Jean Grey        Dragon Ball  1999-01-01
4  105     Mary Jane          Evangelion 2019-12-31
5  106  Harry Porter  Ghost in the Shell 2020-01-01
6  107     Jean Grey          Evangelion 2018-04-01,

....
]

To obtain data from a specific sheet, simply reference the key in the dictionary. For example, df['header_row5'] returns the sheet in which data starts from row 5.

>>> df['header_row5']

   Unnamed: 0    Unnamed: 1          Unnamed: 2           Unnamed: 3
0         NaN           NaN                 NaN                  NaN
1         NaN           NaN                 NaN                  NaN
2         NaN           NaN                 NaN                  NaN
3          ID      Customer            purchase                 Date
4         101  Forrest Gump        Dragon Ball   2020-08-12 00:00:00
5         102     Mary Jane          Evangelion  2020-01-01 00:00:00
6         103  Harry Porter        Kill la Kill  2020-08-01 00:00:00
7         104     Jean Grey        Dragon Ball   1999-01-01 00:00:00
8         105     Mary Jane          Evangelion  2019-12-31 00:00:00
9         106  Harry Porter  Ghost in the Shell  2020-01-01 00:00:00
10        107     Jean Grey          Evangelion  2018-04-01 00:00:00

pd.ExcelFile()

With this approach, we create a pd.ExcelFile object to represent the Excel file. We do not need to specify which sheets to read when using this method. Note that the previous read_excel() method returns a dataframe or a dictionary of dataframes; whereas pd.ExcelFile() returns a reference object to the Excel file.

f = pd.ExcelFile('users.xlsx')

>>> f
<pandas.io.excel._base.ExcelFile object at 0x00000138DAE66670>

To get sheet names, we can all the sheet_names attribute from the ExcelFile object, which returns a list of the sheet names (string).


>>> f.sheet_names
['User_info', 'purchase', 'compound', 'header_row5']

To get data from a sheet, we can use the parse() method, and provide the sheet name.

>>> f.parse(sheet_name = 'User_info')
      User Name Country      City Gender  Age
0  Forrest Gump     USA  New York      M   50
1     Mary Jane  CANADA   Tornoto      F   30
2  Harry Porter      UK    London      M   20
3     Jean Grey   CHINA  Shanghai      F   30

One thing to note is that the pd.ExcelFile.parse() method is equivalent to the pd.read_excel() method, so that means you can pass in the same arguments used in read_excel().

Moving on…

We have learned how to read data from Excel or CSV files, next we’ll cover how to save a dataframe back into an Excel (or CSV) file.

One comment

Leave a Reply

Your email address will not be published. Required fields are marked *