Last Updated on July 14, 2022 by Jay
Excel makes plotting a graph very easy. So does Python! Today we’ll take a quick look at how to plot graphs in Python.
This tutorial is part of the “Integrate Python with Excel” series. You can find the table of contents here for easier navigation.
Excel makes pretty graphs, so why bother using Python?
We are in the Internet age. Everything is online, and the Internet is arguably the largest public database out there. One thing that makes Python a stronger plotting tool than Excel is that we can easily pull data from the Internet and then plot it, all in Python. If we need some online data and want to plot it in Excel, what do we do? Maybe download it to our laptop, then graph it. Or maybe use clunky VBA or Power Query to get the data, then graph it. If you have done either before, I’m sure it was not a good experience. That’s why we should use Python for seamless and painless data extraction, manipulation, and plotting!
Prepare a dataframe for demo
You don’t believe getting data from the Internet is easy using Python? Let’s take a look. We’ll use Johns Hopkins University’s COVID-19 database to plot the confirmed cases over time for this tutorial. Their daily updated global COVID confirmed cases file can be found here: https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv.
We’ll use the pandas library to process the data, and we’ll need just 1 line of code to load the data into Python in a table-like format.
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
>>> df
     Province/State      Country/Region         Lat  ...  9/1/20  9/2/20  9/3/20
0               NaN         Afghanistan   33.939110  ...   38196   38243   38288
1               NaN             Albania   41.153300  ...    9606    9728    9844
2               NaN             Algeria   28.033900  ...   44833   45158   45469
3               NaN             Andorra   42.506300  ...    1184    1199    1199
4               NaN              Angola  -11.202700  ...    2729    2777    2805
..              ...                 ...         ...  ...     ...     ...     ...
261             NaN  West Bank and Gaza   31.952200  ...   23281   23875   24471
262             NaN      Western Sahara   24.215500  ...      10      10      10
263             NaN               Yemen   15.552727  ...    1962    1976    1979
264             NaN              Zambia  -13.133897  ...   12381   12415   12523
265             NaN            Zimbabwe  -19.015438  ...    6559    6638    6678

[266 rows x 230 columns]
There are many countries in the reported data. To make this tutorial easy to follow, we’ll just look at the global confirmed numbers. If you want to focus on a specific country, simply apply a filter to the dataframe for your desired country.
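As a minimal sketch of such a country filter, the snippet below builds a tiny stand-in dataframe instead of downloading the real file. The column names match the output shown above (plus the Long column, one of the four geographic columns in the JHU file), but the rows and all the numbers are made up for illustration.

```python
import pandas as pd

# Made-up rows that mimic the layout of the JHU file (values are NOT real data).
sample = pd.DataFrame({
    "Province/State": [None, "Alberta", "Ontario"],
    "Country/Region": ["Albania", "Canada", "Canada"],
    "Lat": [41.0, 54.0, 51.0],
    "Long": [20.0, -116.0, -85.0],
    "9/2/20": [100, 200, 300],
    "9/3/20": [110, 220, 330],
})

# Build a boolean mask on the country column and keep only the matching rows.
canada = sample[sample["Country/Region"] == "Canada"]
print(canada)
```

Some countries in the file are reported by province or state, so a filter like this can return several rows for one country.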
Since the first 4 columns are just geographical information, we can get rid of them and focus on the daily numbers only.
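# Keep only the daily count columns (drop the first 4 geographic columns)
df = df.iloc[:, 4:]
# Sum each date column over all rows to get the global total per day
global_num = df.sum()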
>>> global_num
1/22/20         555
1/23/20         654
1/24/20         941
1/25/20        1434
1/26/20        2118
             ...
8/30/20    25222709
8/31/20    25484767
9/1/20     25749642
9/2/20     26031410
9/3/20     26304856
Length: 226, dtype: int64
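To see what those two lines do on a small scale, here is a sketch on a made-up two-row dataframe with the same column layout (the values are invented, not real data). The last step, converting the date labels from text into real dates with pd.to_datetime, is an optional extra that the tutorial itself does not do; it assumes the labels follow the month/day/two-digit-year pattern seen above.

```python
import pandas as pd

# Made-up rows: four geographic columns first, then one column per date.
sample = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["A", "B"],
    "Lat": [0.0, 1.0],
    "Long": [0.0, 1.0],
    "1/22/20": [1, 10],
    "1/23/20": [2, 20],
})

daily = sample.iloc[:, 4:]  # all rows, columns from position 4 onward
totals = daily.sum()        # adds up each column; the result is indexed by the column labels

# Optional (an assumption, not part of the tutorial): parse the text labels as dates.
totals.index = pd.to_datetime(totals.index, format="%m/%d/%y")
print(totals)
```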
Now we have a 1-dimensional table (a pandas Series): the dates and the confirmed COVID cases on each corresponding date. We’ll use this to plot the global COVID cases over time. pandas relies on another library called matplotlib for plotting, so we’ll have to import that as well. Otherwise, your pandas plot won’t show up. If you haven’t already, pip install it first. By convention, we import matplotlib.pyplot as plt.
pip install matplotlib
pandas provides a convenient way to plot graphs directly from a dataframe, so all we need is dataframe.plot(). But we have to remember to let matplotlib display the plot after we draw it, and that is the magic word plt.show().
import matplotlib.pyplot as plt
global_num.plot()
plt.show()
Quite impressive already, considering we only used 2 lines of code (including the magic word). We didn’t even tell pandas which column is the x-axis and which one is the y-axis! By default, pandas puts the index (here, the dates) on the x-axis and the values on the y-axis. We’ll talk about how to make prettier graphs in the next couple of chapters.
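If you want to check for yourself what pandas put on the y-axis, here is a rough sketch using a made-up three-point Series shaped like global_num. It assumes the non-interactive Agg backend and saves the figure into memory, so it runs even without a display; in an interactive session you would simply call plt.show() as above. The axis labels are illustrative choices, not something the tutorial sets.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample shaped like global_num: date strings as the index, counts as values.
sample = pd.Series([555, 654, 941], index=["1/22/20", "1/23/20", "1/24/20"])

ax = sample.plot()               # returns the matplotlib Axes the line was drawn on
ax.set_xlabel("Date")            # illustrative labels
ax.set_ylabel("Confirmed cases")

# The Series values are what ends up on the y-axis.
plotted_values = list(ax.get_lines()[0].get_ydata())
print(plotted_values)

# Write the figure as PNG bytes into memory instead of opening a window.
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
png_size = buffer.getbuffer().nbytes
plt.close(ax.figure)
```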