Last Updated on July 14, 2022 by Jay
Sometimes we want to access just certain data in an Excel file in Python instead of reading the whole spreadsheet/table into pandas. This tutorial will show you how to use the openpyxl library to access Excel data in Python.
Library
openpyxl is a third-party Python library we can download by using pip. Open up a command prompt, then type:
pip install openpyxl
We’ll use a simple Excel file for this demonstration, so there’s no need to download anything. Below screenshots show what’s inside the example Excel spreadsheet with 2 tabs:
- Sheet1 contains some date, String, and number values
- calc contains just one value 3.14
Access Sheets of An Excel File Using Python openpyxl
We can use Python to read an Excel file using the openpyxl.load_workbook() method. wb is the variable name we chose to represent the Excel workbook. Upon checking wb.sheetnames, we can see there are two sheets in the file.
from openpyxl import load_workbook
wb = load_workbook('book1.xlsx')
wb.sheetnames
['Sheet1', 'calc']
wb.active will always give the first sheet (index = 0). We can also access other sheets by passing individual sheet names into the wb, think about the wb is like a dictionary of worksheets, and we can use the sheet names (key) to access each sheet (value).
ws = wb.active
ws
<Worksheet "Sheet1">
wb['calc']
<Worksheet "calc">
Another way to access individual sheets is by calling the worksheets attribute, then using the index to pick a sheet.
wb.worksheets
[<Worksheet "Sheet1">, <Worksheet "calc">]
wb.worksheets[1]
<Worksheet "calc">
Access Data of an Excel Sheet
Accessing One Cell
Let’s access some data points inside the Sheet1 (or the variable ws in Python). We can use the same Excel column-row notation to specify which cell we want to get. For example, ws[‘B3’] returns a Cell object that represents cell B3 of the Sheet1. To get data inside that cell, we call the .value attribute. Conveniently, we can also use the .data_type attribute to check what data is inside a cell.
ws['B3']
<Cell 'Sheet1'.B3>
ws['B3'].value
datetime.datetime(2022, 3, 1, 0, 0)
ws['B3'].data_type
'd'
ws['C6'].value
'Watermelon'
ws['D8'].value
600
You can also access cells using the cell(row, column) method. Although this approach seems rather redundant compared to the Excel’s column-row notation, it provides an easier way to access cells programmatically.
Also note that cell (3,3) refers to cell C3, where the index starts from 1 instead of 0 in this expression.
ws.cell(row=3, column=3).value
'Apple'
Accessing Many Cells
We can reference columns, rows, or ranges using the same notations as in Excel. For example:
- ws[‘B’] – returns column B
- ws[3] – returns row 3, again the index starts from 1
- Then, ws[‘B2:D8’] – returns range B2 to D8
Note the below code returns a tuple of Cell objects, we need to call the .value attribute to get the actual values inside those cells.
ws['B']
(<Cell 'Sheet1'.B1>,
<Cell 'Sheet1'.B2>,
<Cell 'Sheet1'.B3>,
<Cell 'Sheet1'.B4>,
<Cell 'Sheet1'.B5>,
<Cell 'Sheet1'.B6>,
<Cell 'Sheet1'.B7>,
<Cell 'Sheet1'.B8>)
ws[3]
(<Cell 'Sheet1'.A3>,
<Cell 'Sheet1'.B3>,
<Cell 'Sheet1'.C3>,
<Cell 'Sheet1'.D3>)
for row in ws['B2:D8']:
for cell in row:
print(cell.value)
Date
Item
Sales
2022-03-01 00:00:00
Apple
100
2022-03-02 00:00:00
Banana
200
2022-03-03 00:00:00
Orange
300
2022-03-04 00:00:00
Watermelon
400
2022-03-05 00:00:00
Cherry
500
2022-03-06 00:00:00
Peach
600
Additional Resources
How to Use Python to Read Excel Formula
Accessing Many Cells: what about using Defined Names (range of cells) of Excel to access for instance a table identified with Defined Names?
Hi Ben,
Thanks for your comments!
Here’s how to work with named ranges:
https://pythoninoffice.com/how-to-work-with-excel-named-range-in-python/
This is a new series I’m working on, so more openpyxl tutorials are coming. Please stay tuned 🙂
How to get the Month or Year from the Cell of datetime?