Read Encrypted Excel File Using Pandas

Sharing is caring!

Last Updated on February 25, 2022 by Jay

If you are trying to use pandas to read a password encrypted Excel file, and got the following message:

xlrd.biffh.XLRDError: Can't find workbook in OLE2 compound document

Then you’ve come to the right place! This message means we are trying to read a password-encrypted file without providing the password. In this short tutorial, I’m going to show you how to read a password-encrypted Excel file into pandas.

Library

The best solution I’ve found so far is the msoffcrypto library. Despite its name, this library has nothing to do with cryptocurrency…

We’ll use pip to install it, note the installation name msoffcrypto-tool:

pip install msoffcrypto-tool

Read Encrypted Excel File Directly Into Pandas

The msoffcrypto library has a method load_key() to prepare password for the Excel file. Then the decrypt() method unlocks the file and saves it to disk.

Since we want to read the encrypted Excel file directly into pandas, saving to disk will be inefficient. Therefore we can write the file content temporarily to an in-memory buffer (RAM). To do this, we’ll need to use the io library.

import msoffcrypto
import io
import pandas as pd

temp = io.BytesIO()

The io.BytesIO() allows writing contents to an in-memory buff (RAM). This helps process the file much faster than writing to the disk.

A secret message inside an Excel file

In our example, the password is ‘1234’, make sure to replace that with your own password when you test this code. The below code snippet opens the encrypted Excel file “secret.xlsx” in binary format using ‘rb’ mode, unlocks the file, then saves the content (an Excel file) to an in-memory buffer (RAM) location called temp.

with open('secret.xlsx', 'rb') as f:
    excel = msoffcrypto.OfficeFile(f)
    excel.load_key('1234')
    excel.decrypt(temp)

Now, this temp is our target Excel file with password stripped. We can proceed with the normal pd.read_excel() to read it!

Read Excel file using pandas

Putting It Together

Here you have it, a short and sweet script for reading encrypted Excel files directly into pandas, note we have not modified the original Excel file nor created unnecessary files on disk during the process!

import msoffcrypto
import io
import pandas as pd

temp = io.BytesIO()

with open('secret.xlsx', 'rb') as f:
    excel = msoffcrypto.OfficeFile(f)
    excel.load_key('1234')
    excel.decrypt(temp)

df = pd.read_excel(temp)
del temp

7 comments

    1. Hi Fransicso,

      It sounds like you didn’t install the library successfully. In the command line type “pip list” to confirm what libraries you currently have installed, then try pip install msoffcrypto-tool again.

      Hope that helps.

  1. Hey thank you for this post. However i get the error FileFormatError: Unencrypted document or unsupported file format, even though it is a xlsx file?

  2. Hello thank you for your post. However i get the error of “FileFormatError: Unencrypted document or unsupported file format”, even though the file i’m trying to read is xlsx?

  3. Hi,
    I always get this, even tho I know the password is correct:

    InvalidKeyError: The file could not be decrypted with this password

    Thanks

  4. NASCA + SD Digital Rights Management software loaded on my pc. I am not able to open excel files using python before this Software, I am able to open any excel file using python. However after the NASCA Load, and after the excel file was opened, saved and encrypted using NASCA, can someone help to open these kind of file using python.

Leave a Reply

Your email address will not be published. Required fields are marked *