Last Updated on March 24, 2022 by Jay
This tutorial will teach you how to create a bar chart race using the Python matplotlib library, which is an awesome plotting library that’s capable of doing some incredible things. A lot of people use matplotlib for plotting static charts, but we can also create animated plots and add interactivity to our chart!
You like Dragon Ball? Obsessed with the characters’ power levels? Let’s create some fun visualizations!
Once you understand the techniques in this tutorial, you’ll be able to create any type of animation using matplotlib.
Library
We’ll need matplotlib for plotting, and pandas for some data manipulation. To install them, type the following into a command prompt/terminal window.
pip install matplotlib pandas
Create Animated Plot Using Matplotlib
I highly recommend you read the following tutorial first to learn how to create a simple animated plot using matplotlib:
How to Create Animation with Matplotlib
Once you understand the concept, you’ll be able to get through the rest of this tutorial quickly.
Create A Bar Chart Race In Python Matplotlib
Actually, we are not limited to just a “bar chart” race. Using the theory discussed above, we can create animation with any type of plot: line, bubble, pie, etc. For demonstration purposes, we’ll stick with a bar chart.
Step 1 – Create A Static Chart
Let’s load some data and start plotting. We have some Dragon Ball Power Level (PL) data.
Think about each row as a “snapshot” of the PL data for each character at a given point (frame) – note the PLs are cumulative down the rows.
%matplotlib notebook
import matplotlib.pyplot as plt
import pandas as pd
db = pd.read_csv('https://raw.githubusercontent.com/pythoninoffice/pythonio_examples/main/matplotlib_bar_chart_race/dragon_ball_pl.csv')
db.head()
   Yamcha  Krilin  Goku  Master Shen  Chiaotzu  Tien Shinhan
0     978     347   446          715       621           723
1    1375     954  1118         1659       656           988
2    1641    1784  1637         2235      1300          1701
3    2472    2160  1737         2778      1651          1889
4    3346    2926  2197         3103      1842          2006
We’ll take the first row and create a bar chart from it. We’ll go with a horizontal bar chart, but you can also create a vertical bar chart or even a line chart. A few notable arguments for the ax.barh() method:
- y: the y coordinates of the bars. i.e. the positions of the bars from bottom (position 0) to top (position n)
- tick_label: the tick labels of the bars, this is going to be the character names
- width: the widths of the bars, this is the PL values from our dataset
one_row = db.iloc[0]
one_row_ascending = one_row.sort_values()
characters = db.columns
fig, ax = plt.subplots(figsize=(10,10))
ax.barh(y = range(len(characters)),
        tick_label = one_row_ascending.index,
        width = one_row_ascending.values,
        align='center',
        color = plt.cm.Set1(range(len(characters))))
Ordering the PLs – Ascending vs Descending Order
Take a look at the below two screenshots. They are produced using almost the same code, except that the left picture used data sorted by “Ascending order”; whereas the right picture used data sorted by “Descending order”.
The key takeaway here is – the color of the bars stayed the same! This is because in both charts, we set y = range(len(characters)), which means the y coordinates never changed. However, to create the bar chart race animation, we need the color to stick with the characters, not with the order of the bars.
Note the above screenshots are not incorrect!
In the “Ascending order” case, our dataset starts with the lowest PL, which is Krilin. The y coordinates argument is a range, which translates to 0, 1, 2, 3, 4, 5. That’s why Krilin is in position 0 and Yamcha (highest PL) is in position 5. Anyway, this shows that the sort_values() and range(len(characters)) combination doesn’t give us what we need for a bar chart race plot.
one_row_ascending
Krilin 347
Goku 446
Chiaotzu 621
Master Shen 715
Tien Shinhan 723
Yamcha 978
Name: 0, dtype: int64
range(len(characters))
range(0, 6)
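To make the point concrete, here is a small sketch that lists which bar position (and therefore which color index) each character receives under both sort orders. It only uses the first row of PLs shown above, typed in by hand, and the helper name color_slots is made up for this illustration.

```python
import pandas as pd

# First row of the PL data shown above, typed in by hand
one_row = pd.Series(
    {"Yamcha": 978, "Krilin": 347, "Goku": 446,
     "Master Shen": 715, "Chiaotzu": 621, "Tien Shinhan": 723}
)

def color_slots(sorted_row):
    """Hypothetical helper: map each character to the bar position it gets
    when y = range(len(row)) is paired with an already-sorted row.
    The color index is the same number, because colors are assigned by position."""
    return {name: pos for pos, name in enumerate(sorted_row.index)}

ascending = color_slots(one_row.sort_values())
descending = color_slots(one_row.sort_values(ascending=False))

for name in one_row.index:
    print(f"{name:<13} ascending -> slot {ascending[name]}, "
          f"descending -> slot {descending[name]}")
```

Each character lands in a different slot depending on the sort order, so its color changes too. That is the behavior we want to get rid of.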
The df.rank() Method
The rank() method ranks data along a given axis (down the rows or across the columns). Instead of sorting and re-arranging the data, this method returns the rank of each data point. Let’s take a look at the following example:
one_row
Yamcha 978
Krilin 347
Goku 446
Master Shen 715
Chiaotzu 621
Tien Shinhan 723
Name: 0, dtype: int64
one_row.rank()
Yamcha 6.0
Krilin 1.0
Goku 2.0
Master Shen 4.0
Chiaotzu 3.0
Tien Shinhan 5.0
Name: 0, dtype: float64
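rank() also works on a whole dataframe. Here is a short sketch, assuming the first three rows from db.head() are typed in by hand (db_small is a name made up for this example). Passing axis=1 ranks the characters against each other within each row, which is the direction a bar chart race needs.

```python
import pandas as pd

# First three rows of the PL data shown earlier, typed in by hand
db_small = pd.DataFrame(
    {"Yamcha": [978, 1375, 1641],
     "Krilin": [347, 954, 1784],
     "Goku": [446, 1118, 1637],
     "Master Shen": [715, 1659, 2235],
     "Chiaotzu": [621, 656, 1300],
     "Tien Shinhan": [723, 988, 1701]}
)

# axis=1: compare the values across the columns of each row
ranks = db_small.rank(axis=1)
print(ranks)
```

The column order never changes, only the rank numbers do. That is why colors assigned by column position stay with the same character.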
We are going to re-plot the previous chart using the data ranks as the y coordinates. Note we didn’t use sort_values() and it’s now in descending order. The highest PL is on the top, so this is exactly what we wanted.
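The re-plot described above could look something like the sketch below. It reuses the same barh() arguments as the earlier static chart, with the first row of PLs typed in by hand so it runs on its own. The Agg backend line is only there so it works as a plain script; leave it out inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# First row of the PL data shown above, typed in by hand
one_row = pd.Series(
    {"Yamcha": 978, "Krilin": 347, "Goku": 446,
     "Master Shen": 715, "Chiaotzu": 621, "Tien Shinhan": 723}
)

fig, ax = plt.subplots(figsize=(10, 10))
bars = ax.barh(
    y=one_row.rank().values,                 # ranks 1..6 decide the vertical position
    tick_label=one_row.index,                # names stay in their original column order
    width=one_row.values,                    # bar length is the PL itself
    align="center",
    color=plt.cm.Set1(range(len(one_row))),  # color i always belongs to column i
)

# Bar centers in column order: each bar sits at its character's rank
centers = [round(bar.get_y() + bar.get_height() / 2, 6) for bar in bars]
print(dict(zip(one_row.index, centers)))
```

Because the data is never re-ordered, the i-th color always goes to the i-th column, and only the y position moves.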
Step 2 – Create Many Static Charts
Let’s use what we learned so far and create a few more charts (snapshots) using the first few rows of data:
db.head(3)
   Yamcha  Krilin  Goku  Master Shen  Chiaotzu  Tien Shinhan
0     978     347   446          715       621           723
1    1375     954  1118         1659       656           988
2    1641    1784  1637         2235      1300          1701
num = 3
fig, axs = plt.subplots(nrows = 1, ncols = num, figsize = (10, 5), tight_layout = True)
for i, ax in enumerate(axs):
    ax.barh(y=db.iloc[i].rank(),
            tick_label = db.iloc[i].index,
            width = db.iloc[i].values,
            color = plt.cm.Set1(range(6)))
    ax.set_title(f'{i}-th row', fontsize='larger')
    [spine.set_visible(False) for spine in ax.spines.values()] # remove chart outlines
Please excuse the improper use of ‘-th’ here. You see, with the help of rank() method, we are able to change character positions as their PL changes while maintaining the same colors for them. This is the foundation of the bar chart race animation.
Step 3 – Matplotlib Animation Function
If you get this far, I’m going to assume that you already read the other tutorial that explains how to create a simple animation using matplotlib: How to Create Animation with Matplotlib. You’ll need to understand the concept there first.
from matplotlib.animation import FuncAnimation
def update(i):
    ax.clear()
    ax.set_facecolor(plt.cm.Greys(0.2))
    [spine.set_visible(False) for spine in ax.spines.values()]
    hbars = ax.barh(y = db.iloc[i].rank().values,
                    tick_label=db.iloc[i].index,
                    width = db.iloc[i].values,
                    height = 0.8,
                    color = plt.cm.Set1(range(11))
                    )
    ax.set_title(f'Frame: {i}')
    #ax.bar_label(hbars, fmt='%.2d')

fig,ax = plt.subplots(#figsize=(10,7),
                      facecolor = plt.cm.Greys(0.2),
                      dpi = 150,
                      tight_layout=True
                      )
data_anime = FuncAnimation(
    fig = fig,
    func = update,
    frames= len(db),
    interval=300
)
WARNING: DO NOT STARE AT THE FOLLOWING CHART FOR TOO LONG AS IT MIGHT MAKE YOU SICK!
The above code will generate an animated plot. It works as expected and colors stick with each character as their rankings change. However, the below chart with abrupt movements makes my stomach sick…
There are two reasons the above chart sucks:
- There aren’t enough frames in the animation. As you might know, the higher the FPS (frames per second), the smoother the animation.
- The abrupt movements/transitions happen because the PL ranks jump by whole numbers as we move from one frame to the next (i.e. from one row of data to the next). The bar y coordinates are always whole-number ranks: 1, 2, 3, 4, 5, 6. To create smoother transitions, we’ll need ranks with decimal points, e.g. 1.5 or 2.3, in our dataset.
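To see how big the jumps are, here is a quick sketch that compares the ranks of the first two rows of the dataset (typed in by hand from the db.head() output). The names db_small and jump are made up for this example.

```python
import pandas as pd

# First two rows of the PL data, typed in by hand
db_small = pd.DataFrame(
    {"Yamcha": [978, 1375],
     "Krilin": [347, 954],
     "Goku": [446, 1118],
     "Master Shen": [715, 1659],
     "Chiaotzu": [621, 656],
     "Tien Shinhan": [723, 988]}
)

ranks = db_small.rank(axis=1)

# How far each bar moves between frame 0 and frame 1
jump = ranks.iloc[1] - ranks.iloc[0]
print(jump)
```

Every bar moves by a whole number of positions in a single frame, and some move by two. With nothing drawn in between, the bars appear to teleport.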
Step 4 – Smoother Transitions Between Static Charts
Step 4.1 – Create Dummy Frames And Ranks
Let’s tackle the first problem – not enough frames. Our original dataframe has only 21 rows of data, i.e. 21 frames. We are going to increase the number of rows by a factor of 10.
Because the data is already cumulative, we’ll insert some empty rows between each row of data.
We’ll do this by “cheating” with the index numbers. First, re-number the original dataframe’s index so that its rows sit at every 10th position (0, 10, 20, ..., 200). Then create an empty dataframe containing only NaN values, with index labels from 0 to 209, skipping every 10th position. Finally, combine the two dataframes and call sort_index().
Voilà – we have “inserted” empty rows between each row of the original dataframe!
Also, don’t forget to create a rank_df for this expanded dataset! Indeed, we want to rank the dataframe with NaN values.
db.index = range(0,21*10,10)
print(list(db.index))
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200]
import numpy as np  # needed for np.nan
row_nums = [i for i in range(0,210) if i % 10 != 0 ]
empty = pd.DataFrame(np.nan, index= row_nums, columns = db.columns)
expand_df = pd.concat([db, empty]).sort_index()
expand_df
Yamcha Krilin Goku Master Shen Chiaotzu Tien Shinhan
0 978.0 347.0 446.0 715.0 621.0 723.0
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN NaN
10 1375.0 954.0 1118.0 1659.0 656.0 988.0
11 NaN NaN NaN NaN NaN NaN
12 NaN NaN NaN NaN NaN NaN
13 NaN NaN NaN NaN NaN NaN
14 NaN NaN NaN NaN NaN NaN
15 NaN NaN NaN NaN NaN NaN
16 NaN NaN NaN NaN NaN NaN
17 NaN NaN NaN NaN NaN NaN
18 NaN NaN NaN NaN NaN NaN
19 NaN NaN NaN NaN NaN NaN
20 1641.0 1784.0 1637.0 2235.0 1300.0 1701.0
................
rank_df = expand_df.rank(axis=1)
rank_df.head(11)
Yamcha Krilin Goku Master Shen Chiaotzu Tien Shinhan
0 6.0 1.0 2.0 4.0 3.0 5.0
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN NaN
10 5.0 2.0 4.0 6.0 1.0 3.0
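As a side note, the index trick from Step 4.1 can probably be shortened with reindex(), which is a different pandas method from the concat approach used in this tutorial. The sketch below builds both versions from the first three rows of the data (typed in by hand) and compares them. The names db_small, spaced, by_concat and by_reindex are made up for this example.

```python
import numpy as np
import pandas as pd

# First three rows of the PL data, typed in by hand
db_small = pd.DataFrame(
    {"Yamcha": [978, 1375, 1641],
     "Krilin": [347, 954, 1784],
     "Goku": [446, 1118, 1637],
     "Master Shen": [715, 1659, 2235],
     "Chiaotzu": [621, 656, 1300],
     "Tien Shinhan": [723, 988, 1701]}
)
steps = 10  # expansion factor, same as in the tutorial

# Space the original rows out to index 0, 10, 20
spaced = db_small.copy()
spaced.index = range(0, len(spaced) * steps, steps)

# Tutorial approach: build an all-NaN dataframe for the gaps, concat, then sort
row_nums = [i for i in range(len(spaced) * steps) if i % steps != 0]
empty = pd.DataFrame(np.nan, index=row_nums, columns=spaced.columns)
by_concat = pd.concat([spaced, empty]).sort_index()

# Alternative: ask for every index label and let pandas fill the gaps with NaN
by_reindex = spaced.reindex(range(len(spaced) * steps))

print(by_concat.shape, by_reindex.shape)
print(by_concat.equals(by_reindex))
```

reindex() fills any label missing from the original index with NaN, so there is no need to build the empty dataframe by hand.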
Step 4.2 – Fill Missing Values By Linear Interpolation
Then we fill these NaN values by interpolating between the existing values, i.e. the original values on every 10th row. As shown below, the 0th and 10th rows contain original values from our dataset, and rows 1-9 are linearly interpolated between them.
expand_df = expand_df.interpolate()
expand_df.head(11)
Yamcha Krilin Goku Master Shen Chiaotzu Tien Shinhan
0 978.0 347.0 446.0 715.0 621.0 723.0
1 1017.7 407.7 513.2 809.4 624.5 749.5
2 1057.4 468.4 580.4 903.8 628.0 776.0
3 1097.1 529.1 647.6 998.2 631.5 802.5
4 1136.8 589.8 714.8 1092.6 635.0 829.0
5 1176.5 650.5 782.0 1187.0 638.5 855.5
6 1216.2 711.2 849.2 1281.4 642.0 882.0
7 1255.9 771.9 916.4 1375.8 645.5 908.5
8 1295.6 832.6 983.6 1470.2 649.0 935.0
9 1335.3 893.3 1050.8 1564.6 652.5 961.5
10 1375.0 954.0 1118.0 1659.0 656.0 988.0
rank_df = rank_df.interpolate()
rank_df.head(11)
Yamcha Krilin Goku Master Shen Chiaotzu Tien Shinhan
0 6.0 1.0 2.0 4.0 3.0 5.0
1 5.9 1.1 2.2 4.2 2.8 4.8
2 5.8 1.2 2.4 4.4 2.6 4.6
3 5.7 1.3 2.6 4.6 2.4 4.4
4 5.6 1.4 2.8 4.8 2.2 4.2
5 5.5 1.5 3.0 5.0 2.0 4.0
6 5.4 1.6 3.2 5.2 1.8 3.8
7 5.3 1.7 3.4 5.4 1.6 3.6
8 5.2 1.8 3.6 5.6 1.4 3.4
9 5.1 1.9 3.8 5.8 1.2 3.2
10 5.0 2.0 4.0 6.0 1.0 3.0
See – now we have decimal “ranks” which will be used as the y-coordinates of the chart. These decimal ranks are essential for making smooth transitions because matplotlib will draw the bars between, let’s say positions 1 and 2. Then with a lot more frames to work with, the animation will look much smoother.
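If you’re curious what interpolate() does under the hood in this case, it’s just equal-sized steps between two known values. Here is a small sketch that checks this for Yamcha’s PL between the first two original rows (978 and 1375), comparing pandas against the formula start + k * (end - start) / steps. The variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Yamcha's PL in the first two original rows (index 0 and 10 after expansion)
start, end = 978.0, 1375.0
steps = 10

# One column with known values at both ends and NaN in between
col = pd.Series(np.nan, index=range(steps + 1))
col.iloc[0], col.iloc[-1] = start, end
interpolated = col.interpolate()  # default method is linear

# The same values computed by hand
by_hand = [start + k * (end - start) / steps for k in range(steps + 1)]

print(interpolated.round(1).tolist())
print(np.allclose(interpolated, by_hand))
```

Each step adds (1375 - 978) / 10 = 39.7, which matches the Yamcha column in the interpolated table above. The ranks are interpolated the same way, which is where values like 5.9 and 5.8 come from.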
Let’s re-draw the first 3 rows of PL using the below code:
num = 3
fig, axs = plt.subplots(nrows = 1, ncols = num, figsize = (10, 5), tight_layout = True)
for i, ax in enumerate(axs):
    ax.barh(y=rank_df.iloc[i].values,
            tick_label = expand_df.iloc[i].index,
            width = expand_df.iloc[i].values,
            color = plt.cm.Set1(range(6)))
    ax.set_title(f'{i}-th row', fontsize='larger')
    [spine.set_visible(False) for spine in ax.spines.values()] # remove chart outlines
Now we can kind of “see” that Master Shen is taking over the 2nd place from Tien Shinhan at the 3rd frame above!
Smooth Bar Chart Race Using Python Matplotlib
We’ll use the two newly created dataframes to feed PL data for the animation:
- rank_df contains just the ranks of characters, each row is a snapshot/frame
- expand_df contains the actual PL values, each row is a snapshot/frame
- interval argument inside the FuncAnimation() is to control how fast/slow the animation plays
def update(i):
    ax.clear()
    ax.set_facecolor(plt.cm.Greys(0.2))
    [spine.set_visible(False) for spine in ax.spines.values()]
    hbars = ax.barh(y = rank_df.iloc[i].values,
                    tick_label=expand_df.iloc[i].index,
                    width = expand_df.iloc[i].values,
                    height = 0.8,
                    color = plt.cm.Set1(range(11))
                    )
    ax.set_title(f'Frame: {i}')
    ax.bar_label(hbars, fmt='%.2d')

fig,ax = plt.subplots(#figsize=(10,7),
                      facecolor = plt.cm.Greys(0.2),
                      dpi = 150,
                      tight_layout=True
                      )
data_anime = FuncAnimation(
    fig = fig,
    func = update,
    frames= len(expand_df),
    interval=100
)
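To get a feel for the interval argument, here is a rough back-of-the-envelope sketch. It assumes the 21 original rows and the 10x expansion from Step 4.1, and treats interval as the delay between frames in milliseconds. Real playback can be slower if drawing a frame takes longer than the interval, so treat the numbers as a best case.

```python
n_rows = 21        # rows in the original dataset, per the tutorial
steps = 10         # expansion factor used in Step 4.1
interval_ms = 100  # interval passed to FuncAnimation

frames = n_rows * steps                   # same as len(expand_df)
duration_s = frames * interval_ms / 1000  # one full pass through the data
fps = 1000 / interval_ms                  # nominal frames per second

print(f"{frames} frames at {interval_ms} ms each: "
      f"about {duration_s:.0f} s per pass, {fps:.0f} fps")
```

Lowering interval speeds up the race. If it then feels too fast, adding more interpolated rows in Step 4.1 stretches it back out while keeping the motion smooth.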
The Bar Chart Race Library
Someone actually made a bar_chart_race library that automates the entire process of creating such a plot. In the next tutorial I’ll cover a variation of the bar_chart_race library. Now go create some fun animations yourself!
This is pretty neat. Thanks for sharing.
Many applications might not want to show the cumulative total but the current value and how it changes over time. So in terms of your dragon ball powerlevel example, the objective would be to show who has received how many points at any given day and not who has received the most points in total over the time frame.
How would you change the update-procedure to account for that?
Thanks for answering. Great work!
Best
Alex