Last Updated on July 21, 2022 by Jay
Sometimes we might need to read data from a Google Sheet into pandas. Let’s take a look at how to do that.
It’s surprisingly simple, but first we need to “publish” the Google Sheet.
Share And Publish A Google Sheet
First, let’s publish the Google Sheet to the web. On the Google Sheet window, head to File -> Share -> Publish to web.
Then, the following window will pop up, and we’ll select either Excel or CSV from the dropdown under “Web page.” We also have the option to publish either the entire document or just publish specific tabs within the document.
I will publish the entire workbook, so I’ll keep the Entire Document option. Then click on the Publish button.
Google then will generate a link to our spreadsheet and will display that link in the next window. Here is the link if you want to try it:
https://docs.google.com/spreadsheets/d/e/2PACX-1vQzorMfVQKIcsBiBTC3DdCf36PYnwWsYSb0tJl_GcbbiWPFbwrwt_Cg8XraVqaRdFRt0MLvVDfM3DyM/pub?output=xlsx
Note how the above link ends with “output=xlsx”. If we change that from xlsx to csv, it will also work and download the .csv file instead of the .xlsx file.
Technically this is the link to the Excel/CSV file. If we simply enter this link into a browser, the file will be downloaded to our computer. Since we have published this workbook, anyone with the URL can access this file.
Read Google Sheet Into Pandas
So far we have done the file preparation, it’s time to bring the data into pandas, and it’s surprisingly simple! One thing to note is that we have to match the pandas method with the file type, see the below example:
import pandas as pd
excel = pd.read_excel('https://docs.google.com/spreadsheets/d/e/2PACX-1vQzorMfVQKIcsBiBTC3DdCf36PYnwWsYSb0tJl_GcbbiWPFbwrwt_Cg8XraVqaRdFRt0MLvVDfM3DyM/pub?output=xlsx')
csv = pd.read_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vQzorMfVQKIcsBiBTC3DdCf36PYnwWsYSb0tJl_GcbbiWPFbwrwt_Cg8XraVqaRdFRt0MLvVDfM3DyM/pub?output=csv')
The tables we retrieved from the two methods are different. This is because the spreadsheet uses a formula to randomly generate numbers (=RAND()) in the some_data column. So this is expected!