Last Updated on February 25, 2022 by Jay
If you are trying to use pandas to read a password encrypted Excel file, and got the following message:
xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document
Then you’ve come to the right place! This message means we are trying to read a password-encrypted file without providing the password. In this short tutorial, I’m going to show you how to read a password-encrypted Excel file into pandas.
The best solution I’ve found so far is the msoffcrypto library. Despite its name, this library has nothing to do with cryptocurrency…
We’ll use pip to install it, note the installation name msoffcrypto-tool:
pip install msoffcrypto-tool
Read Encrypted Excel File Directly Into Pandas
The msoffcrypto library has a method load_key() to prepare password for the Excel file. Then the decrypt() method unlocks the file and saves it to disk.
Since we want to read the encrypted Excel file directly into pandas, saving to disk will be inefficient. Therefore we can write the file content temporarily to an in-memory buffer (RAM). To do this, we’ll need to use the io library.
import msoffcrypto import io import pandas as pd temp = io.BytesIO()
The io.BytesIO() allows writing contents to an in-memory buff (RAM). This helps process the file much faster than writing to the disk.
In our example, the password is ‘1234’, make sure to replace that with your own password when you test this code. The below code snippet opens the encrypted Excel file “secret.xlsx” in binary format using ‘rb’ mode, unlocks the file, then saves the content (an Excel file) to an in-memory buffer (RAM) location called temp.
with open('secret.xlsx', 'rb') as f: excel = msoffcrypto.OfficeFile(f) excel.load_key('1234') excel.decrypt(temp)
Now, this temp is our target Excel file with password stripped. We can proceed with the normal pd.read_excel() to read it!
Putting It Together
Here you have it, a short and sweet script for reading encrypted Excel files directly into pandas, note we have not modified the original Excel file nor created unnecessary files on disk during the process!
import msoffcrypto import io import pandas as pd temp = io.BytesIO() with open('secret.xlsx', 'rb') as f: excel = msoffcrypto.OfficeFile(f) excel.load_key('1234') excel.decrypt(temp) df = pd.read_excel(temp) del temp