Last Updated on July 14, 2022 by Jay
Magic happens when you use Excel and Python together.
Let me put this out first: I love Excel, and I love Python. One is a great software, simple user interface, easy to understand, and used by billions. The other is a powerful and flexible programming language, also supported by a vast community. Python is not replacing Excel, but we can use them together.
Let’s start the series by looking at the similarities and differences between Excel and Python. For easy navigation, you can find the table of content for this series.
Interface
- Excel: workbooks, sheets, ranges, cells
- Python: IDE (code)
There are really not many similarities between Excel’s graphical user interface and Python’s IDE. However, the secret weapon of Python is its massive and powerful library bank. There are several libraries for interacting with Excel files. I will introduce one of the most important – pandas
. This is not the animal panda, but it’s one of the most powerful and widely used Python libraries for data analysis. The name pandas
was derived from “panel data”.
Take a look at the below comparison. On the left, I have an Excel sheet named “User_info”. On the right, I use Python to read the entire sheet as a pandas DataFrame
, then name it “user_info”, which contains the exact same data as the Excel sheet on the left. Note these grids actually don’t exist in the Python IDE, I drew them to help you visualize the table. A pandas DataFrame
is essentially a 2-dimensional table with data, similar to Excel, where you have rows. If you know the programming language R, congratulations – you already know most of the pandas Dataframe.
Excel: workbooks, sheets
Python: pandas DataFrame
You can have multiple workbooks or sheets on the Excel side. Whereas you can have various dataframes containing the same Excel data in the Python environment.
To use Python and Excel together, we have to change our mindset. Instead of staring at our spreadsheet, we need to mentally visualize the data table. If you can do that, you will have no problem using Python to control Excel.
Calculation
Take a look at the below examples to calculate the area of a square. In Excel, we use cells and formulas, and in Python, we use variables and formulas. Also, do you think the logic looks almost the same?
Excel
Python
Each Python variable is like a “cell” that contains some data. You can reference those “cells” by typing out the variable name in Python.
Multiple calculations
We often need to calculate many items at once. For example, to calculate compound interest factor each year for 10 years. We could do the following. **NOTE** The Python example below, looping is not the right way to do things in pandas, but I used a loop intentionally to show the concept.
Excel
Most likely, an Excel user would first set up a formula in cell B3=B2*(1+2%), then drag down to fill column B.
Python
This is a good example of a loop. The current factor = previous factor * 1.02. And the calculation is performed within a pandas dataframe
The point I’m trying to make is, no matter what kind of calculation, the underlying logic is the same between Excel and Python. With the help of a table-like pandas DataFrame
, if you already know how to use Excel, I’m confident that you will be able to learn Python quickly!
Accessing data
We saw from the previous examples, accessing data in Excel requires us to click on (reference) cells, which is done in a graphical user interface. Whereas for Python, we need to access to data by typing out the code, i.e. programmatically. You can get the value of a cell by referencing is using the “=B1” formula. You can also get the value of any “cell” in a dataframe by typing out a command.
“A graphical user interface is a lot easier to work with”, you might think Python is hard to use from this perspective, but think again. Imagine that you need to apply the same format or formula to hundreds of different files, would you do it manually 100 times for each file, or write a 4-line code and let the computer do the tedious work for you? I think the answer is clear.
What about VBA and PowerQuery/Pivot?
Let’s see some facts:
- VBA is slow and not really maintained by Microsoft since a decade ago.
- The new toys PowerQuery/Pivot seem fast at first, but wait until you apply many data transformations on a large dataset. It’s a nightmare.
Having said that, VBA is still useful in some cases. I will cover some topics on VBA in later articles.
Now we’ve compared Excel and Python, next we’ll dive into the details to learn how to use Python to control Excel!
One comment