Last Updated on July 14, 2022 by Jay
Sometimes we want to access just certain data in an Excel file in Python instead of reading the whole spreadsheet/table into pandas. This tutorial will show you how to use the openpyxl library to access Excel data in Python.
openpyxl is a third-party Python library we can download by using pip. Open up a command prompt, then type:
pip install openpyxl
We’ll use a simple Excel file for this demonstration, so there’s no need to download anything. Below screenshots show what’s inside the example Excel spreadsheet with 2 tabs:
- Sheet1 contains some date, String, and number values
- calc contains just one value 3.14
Access Sheets of An Excel File Using Python openpyxl
We can use Python to read an Excel file using the openpyxl.load_workbook() method. wb is the variable name we chose to represent the Excel workbook. Upon checking wb.sheetnames, we can see there are two sheets in the file.
from openpyxl import load_workbook wb = load_workbook('book1.xlsx') wb.sheetnames ['Sheet1', 'calc']
wb.active will always give the first sheet (index = 0). We can also access other sheets by passing individual sheet names into the wb, think about the wb is like a dictionary of worksheets, and we can use the sheet names (key) to access each sheet (value).
ws = wb.active ws <Worksheet "Sheet1"> wb['calc'] <Worksheet "calc">
Another way to access individual sheets is by calling the worksheets attribute, then using the index to pick a sheet.
wb.worksheets [<Worksheet "Sheet1">, <Worksheet "calc">] wb.worksheets <Worksheet "calc">
Access Data of an Excel Sheet
Accessing One Cell
Let’s access some data points inside the Sheet1 (or the variable ws in Python). We can use the same Excel column-row notation to specify which cell we want to get. For example, ws[‘B3’] returns a Cell object that represents cell B3 of the Sheet1. To get data inside that cell, we call the .value attribute. Conveniently, we can also use the .data_type attribute to check what data is inside a cell.
ws['B3'] <Cell 'Sheet1'.B3> ws['B3'].value datetime.datetime(2022, 3, 1, 0, 0) ws['B3'].data_type 'd' ws['C6'].value 'Watermelon' ws['D8'].value 600
You can also access cells using the cell(row, column) method. Although this approach seems rather redundant compared to the Excel’s column-row notation, it provides an easier way to access cells programmatically.
Also note that cell (3,3) refers to cell C3, where the index starts from 1 instead of 0 in this expression.
ws.cell(row=3, column=3).value 'Apple'
Accessing Many Cells
We can reference columns, rows, or ranges using the same notations as in Excel. For example:
- ws[‘B’] – returns column B
- ws – returns row 3, again the index starts from 1
- Then, ws[‘B2:D8’] – returns range B2 to D8
Note the below code returns a tuple of Cell objects, we need to call the .value attribute to get the actual values inside those cells.
ws['B'] (<Cell 'Sheet1'.B1>, <Cell 'Sheet1'.B2>, <Cell 'Sheet1'.B3>, <Cell 'Sheet1'.B4>, <Cell 'Sheet1'.B5>, <Cell 'Sheet1'.B6>, <Cell 'Sheet1'.B7>, <Cell 'Sheet1'.B8>) ws (<Cell 'Sheet1'.A3>, <Cell 'Sheet1'.B3>, <Cell 'Sheet1'.C3>, <Cell 'Sheet1'.D3>) for row in ws['B2:D8']: for cell in row: print(cell.value) Date Item Sales 2022-03-01 00:00:00 Apple 100 2022-03-02 00:00:00 Banana 200 2022-03-03 00:00:00 Orange 300 2022-03-04 00:00:00 Watermelon 400 2022-03-05 00:00:00 Cherry 500 2022-03-06 00:00:00 Peach 600
How to Use Python to Read Excel Formula
How to Work with Excel Named Range in Python
Accessing Many Cells: what about using Defined Names (range of cells) of Excel to access for instance a table identified with Defined Names?
Thanks for your comments!
Here’s how to work with named ranges:
This is a new series I’m working on, so more openpyxl tutorials are coming. Please stay tuned 🙂