How to Create the Bar Chart Race Plot in Python

Last Updated on March 24, 2022 by Jay

This tutorial will teach you how to create a bar chart race using the Python matplotlib library, which is an awesome plotting library that’s capable of doing some incredible things. A lot of people use matplotlib for plotting static charts, but we can also create animated plots and add interactivity to our chart!

Do you like Dragon Ball? Obsessed with the characters’ power levels? Let’s create a fun visualization!

Dragon Ball Power Level Bar Chart Race Made with Matplotlib

Once you understand the techniques in this tutorial, you’ll be able to create any type of animation using matplotlib.

Library

We’ll need matplotlib for plotting, and pandas for some data manipulation. To install them, type the following into a command prompt/terminal window.

pip install matplotlib pandas

Create Animated Plot Using Matplotlib

I highly recommend you read the following tutorial first to learn how to create a simple animated plot using matplotlib:

How to Create Animation with Matplotlib

Once you understand the concept, you’ll be able to get through the rest of this tutorial quickly.

Create A Bar Chart Race In Python Matplotlib

Actually, we are not limited to just a “bar chart” race. Using the theory discussed above, we can create animation with any type of plot: line, bubble, pie, etc. For demonstration purposes, we’ll stick with a bar chart.

Step 1 – Create A Static Chart

Let’s load some data and start plotting. We have some Dragon Ball Power Level (PL) data.

Think about each row as a “snapshot” of the PL data for each character at a given point (frame) – note the PLs are cumulative down the rows.

%matplotlib notebook

import matplotlib.pyplot as plt
import pandas as pd


db = pd.read_csv('https://raw.githubusercontent.com/pythoninoffice/pythonio_examples/main/matplotlib_bar_chart_race/dragon_ball_pl.csv')

db.head()
   Yamcha  Krilin  Goku  Master Shen  Chiaotzu  Tien Shinhan
0     978     347   446          715       621           723
1    1375     954  1118         1659       656           988
2    1641    1784  1637         2235      1300          1701
3    2472    2160  1737         2778      1651          1889
4    3346    2926  2197         3103      1842          2006

We’ll take the first row and create a bar chart from it. We’ll go with a horizontal bar chart, but you can also create a vertical bar chart or even a line chart. A few notable arguments for the ax.barh() method:

  • y: the y coordinates of the bars. i.e. the positions of the bars from bottom (position 0) to top (position n)
  • tick_label: the tick labels of the bars, this is going to be the character names
  • width: the widths of the bars, this is the PL values from our dataset

one_row = db.iloc[0]
one_row_ascending = one_row.sort_values()
characters = db.columns

fig, ax = plt.subplots(figsize=(10,10))
ax.barh(y = range(len(characters)), 
        tick_label = one_row_ascending.index,
        width = one_row_ascending.values, 
        align='center',
        color = plt.cm.Set1(range(len(characters))))

Ordering the PLs – Ascending vs Descending Order

Take a look at the below two screenshots. They are produced using almost the same code, except that the left picture used data sorted by “Ascending order”; whereas the right picture used data sorted by “Descending order”.

The key takeaway here is – the color of the bars stayed the same! This is because in both charts, we set y = range(len(characters)), which means the y coordinates never changed. However, to create the bar chart race animation, we need the color to stick with the characters, not with the order of the bars.

Ascending order dataset
Descending order dataset

Note the above screenshots are not incorrect!

In the “Ascending order” case, our dataset starts with the lowest PL, which is Krilin. The y coordinates argument is a range that translates to 0, 1, 2, 3, 4, 5. That’s why Krilin is in position 0 and Yamcha (highest PL) is in position 5. Anyway, this shows that the sort_values() and range(len(characters)) combination doesn’t give us what we need for a bar chart race plot.

one_row_ascending
Krilin          347
Goku            446
Chiaotzu        621
Master Shen     715
Tien Shinhan    723
Yamcha          978
Name: 0, dtype: int64

range(len(characters))
range(0, 6)
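
The effect described above can be reproduced without the screenshots. The following is a minimal sketch, assuming the first row of the dataset is typed in by hand (values copied from the db.head() output) and a non-interactive Agg backend so it also runs outside a notebook. The names ax_asc, ax_desc, bottom_asc and bottom_desc are made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no notebook is needed
import matplotlib.pyplot as plt
import pandas as pd

# First row of the dataset, copied from the db.head() output
one_row = pd.Series(
    {"Yamcha": 978, "Krilin": 347, "Goku": 446,
     "Master Shen": 715, "Chiaotzu": 621, "Tien Shinhan": 723}
)
colors = plt.cm.Set1(range(len(one_row)))

fig, (ax_asc, ax_desc) = plt.subplots(ncols=2, figsize=(12, 5), tight_layout=True)
for ax, ascending in ((ax_asc, True), (ax_desc, False)):
    ordered = one_row.sort_values(ascending=ascending)
    ax.barh(y=range(len(ordered)),       # positions are always 0..5
            tick_label=ordered.index,
            width=ordered.values,
            color=colors)                # colors are assigned by position
    ax.set_title("Ascending" if ascending else "Descending")

# The bottom bar (position 0) of each chart
bottom_asc = ax_asc.patches[0]    # Krilin, PL 347
bottom_desc = ax_desc.patches[0]  # Yamcha, PL 978
print(bottom_asc.get_facecolor() == bottom_desc.get_facecolor())
```

The two bottom bars belong to different characters but share one color, which is exactly the problem: the color follows the bar position, not the character.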

The df.rank() Method

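The rank() method ranks data along a given axis (axis=0 ranks down each column, axis=1 ranks across each row; a Series such as one_row has only one axis). Instead of sorting and re-arranging the data, this method returns the rank of each data point, with 1 being the smallest value. Let’s take a look at the following example: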

one_row
Yamcha          978
Krilin          347
Goku            446
Master Shen     715
Chiaotzu        621
Tien Shinhan    723
Name: 0, dtype: int64

one_row.rank()
Yamcha          6.0
Krilin          1.0
Goku            2.0
Master Shen     4.0
Chiaotzu        3.0
Tien Shinhan    5.0
Name: 0, dtype: float64

We are going to re-plot the previous chart using the data ranks as the y coordinates. Note we didn’t use sort_values() and it’s now in descending order. The highest PL is on the top, so this is exactly what we wanted.
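
The re-plot could look something like the sketch below. It assumes the first row is typed in by hand (values copied from the one_row output above) and uses the Agg backend so it runs outside a notebook; the variable name bars is only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no notebook is needed
import matplotlib.pyplot as plt
import pandas as pd

# First row of the dataset, in its original (unsorted) column order
one_row = pd.Series(
    {"Yamcha": 978, "Krilin": 347, "Goku": 446,
     "Master Shen": 715, "Chiaotzu": 621, "Tien Shinhan": 723}
)

fig, ax = plt.subplots(figsize=(10, 10))
bars = ax.barh(y=one_row.rank(),          # ranks 1..6 become the bar positions
               tick_label=one_row.index,  # labels stay in column order
               width=one_row.values,
               align="center",
               color=plt.cm.Set1(range(len(one_row))))  # color follows column order
```

Because the data is never re-ordered, each character keeps its column position and therefore its color; only the y coordinate changes with the rank.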

ranked plot

Step 2 – Create Many Static Charts

Let’s use what we learned so far and create a few more charts (snapshots) using the first few rows of data:

db.head(3)
   Yamcha  Krilin  Goku  Master Shen  Chiaotzu  Tien Shinhan
0     978     347   446          715       621           723
1    1375     954  1118         1659       656           988
2    1641    1784  1637         2235      1300          1701

num = 3
fig, axs = plt.subplots(nrows = 1, ncols = num, figsize = (10, 5), tight_layout = True)
for i, ax in enumerate(axs):
    ax.barh(y=db.iloc[i].rank(),
            tick_label = db.iloc[i].index,
            width = db.iloc[i].values,
            color = plt.cm.Set1(range(6)))
    ax.set_title(f'{i}-th row', fontsize='larger')
    [spine.set_visible(False) for spine in ax.spines.values()]  # remove chart outlines

Please excuse the improper use of ‘-th’ here. You see, with the help of the rank() method, we are able to change character positions as their PLs change while keeping the same color for each character. This is the foundation of the bar chart race animation.
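
To see the position changes as numbers rather than bars, here is a small sketch that ranks the first three rows in one call. It assumes the rows are typed in by hand from the db.head(3) output; db3 is a made-up name for this illustration.

```python
import pandas as pd

# First three rows of the dataset, copied from the db.head(3) output
db3 = pd.DataFrame({
    "Yamcha": [978, 1375, 1641],
    "Krilin": [347, 954, 1784],
    "Goku": [446, 1118, 1637],
    "Master Shen": [715, 1659, 2235],
    "Chiaotzu": [621, 656, 1300],
    "Tien Shinhan": [723, 988, 1701],
})

# axis=1 ranks the characters against each other within each row (snapshot)
ranks = db3.rank(axis=1)
print(ranks)
```

Each row of ranks holds the y coordinates for one chart, while the column order (and so the color of each character) never changes.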

more ranked plots

Step 3 – Matplotlib Animation Function

If you get this far, I’m going to assume that you already read the other tutorial that explains how to create a simple animation using matplotlib: How to Create Animation with Matplotlib. You’ll need to understand the concept there first.

from matplotlib.animation import FuncAnimation

def update(i):
    ax.clear()
    ax.set_facecolor(plt.cm.Greys(0.2))
    [spine.set_visible(False) for spine in ax.spines.values()]
    hbars = ax.barh(y = db.iloc[i].rank().values,
           tick_label=db.iloc[i].index,
           width = db.iloc[i].values,
           height = 0.8,
           color = plt.cm.Set1(range(len(db.columns)))  # one color per character
           )
    ax.set_title(f'Frame: {i}')
    #ax.bar_label(hbars, fmt='%.2d')
    

fig,ax = plt.subplots(#figsize=(10,7),
                      facecolor = plt.cm.Greys(0.2),
                      dpi = 150,
                      tight_layout=True
                     )

data_anime = FuncAnimation(
    fig = fig,
    func = update,
    frames= len(db),
    interval=300
)

WARNING: DO NOT STARE AT THE FOLLOWING CHART FOR TOO LONG AS IT MIGHT MAKE YOU SICK!

The above code will generate an animated plot. It works as expected and colors stick with each character as their rankings change. However, the below chart with abrupt movements makes my stomach sick…

animated plot by matplotlib

There are two reasons the above chart sucks:

  1. There aren’t enough frames in the animation. As you might know, the higher the FPS (frames per second), the smoother the animation looks.
  2. The abrupt movements/transitions happen because the PL ranks jump as we move from one frame to the next (or one row of data to the next). The bar y coordinates always take whole-number values: 1, 2, 3, 4, 5, 6. To create smoother transitions, we’ll need some ranks with decimal points, e.g. 1.5 or 2.3, in our dataset.
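
As a rough back-of-the-envelope check on the first point: the interval argument of FuncAnimation() is the delay between frames in milliseconds, so the nominal frame rate is 1000 divided by the interval. The helper frames_per_second below is a hypothetical name for this sketch, and 100 ms is just an assumed example of a shorter interval. The real frame rate can be lower if drawing a frame takes longer than the interval.

```python
def frames_per_second(interval_ms):
    """Nominal frame rate for a FuncAnimation interval given in milliseconds."""
    return 1000 / interval_ms

print(round(frames_per_second(300), 2))  # interval used in the code above -> 3.33
print(frames_per_second(100))            # an assumed shorter interval -> 10.0
```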

Step 4 – Smoother Transitions Between Static Charts

Step 4.1 – Create Dummy Frames And Ranks

Let’s tackle the first problem – not enough frames. Our original dataframe has only 21 rows of data, i.e. 21 frames. We are going to increase the number of rows tenfold, to 210.

Because the data is already cumulative, we’ll insert some empty rows between each row of data.

We’ll do this by “cheating” with the index numbers. First, change the original dataframe’s index so that its rows sit at every 10th position: 0, 10, 20, …, 200. Then create an empty dataframe containing only NaN values, with index values from 0 to 209, except that we skip every 10th position. Finally, combine the two dataframes and call sort_index().

Voilà – we have “inserted” empty rows between each row of the original dataframe!

Also, don’t forget to create a rank_df for this expanded dataset! Yes, we rank the dataframe while it still contains the NaN values, so the empty rows get NaN ranks that we can fill in later.

import numpy as np  # needed for np.nan below

db.index = range(0,21*10,10)
print(list(db.index))
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200]

row_nums = [i for i in range(0,210) if i % 10 != 0 ]
empty = pd.DataFrame(np.nan, index= row_nums, columns = db.columns)

expand_df = pd.concat([db, empty]).sort_index()
expand_df
    Yamcha  Krilin    Goku  Master Shen  Chiaotzu  Tien Shinhan
0    978.0   347.0   446.0        715.0     621.0         723.0
1      NaN     NaN     NaN          NaN       NaN           NaN
2      NaN     NaN     NaN          NaN       NaN           NaN
3      NaN     NaN     NaN          NaN       NaN           NaN
4      NaN     NaN     NaN          NaN       NaN           NaN
5      NaN     NaN     NaN          NaN       NaN           NaN
6      NaN     NaN     NaN          NaN       NaN           NaN
7      NaN     NaN     NaN          NaN       NaN           NaN
8      NaN     NaN     NaN          NaN       NaN           NaN
9      NaN     NaN     NaN          NaN       NaN           NaN
10  1375.0   954.0  1118.0       1659.0     656.0         988.0
11     NaN     NaN     NaN          NaN       NaN           NaN
12     NaN     NaN     NaN          NaN       NaN           NaN
13     NaN     NaN     NaN          NaN       NaN           NaN
14     NaN     NaN     NaN          NaN       NaN           NaN
15     NaN     NaN     NaN          NaN       NaN           NaN
16     NaN     NaN     NaN          NaN       NaN           NaN
17     NaN     NaN     NaN          NaN       NaN           NaN
18     NaN     NaN     NaN          NaN       NaN           NaN
19     NaN     NaN     NaN          NaN       NaN           NaN
20  1641.0  1784.0  1637.0       2235.0    1300.0        1701.0
................
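
The same “insert empty rows” result can also be sketched with DataFrame.reindex(). This is a swapped-in alternative to the concat() and sort_index() approach above, not the method used in this tutorial. The sketch assumes only the first three rows of the dataset, typed in by hand, and the names db3 and step are made up for illustration.

```python
import pandas as pd

# First three rows of the dataset, copied from the db.head(3) output
db3 = pd.DataFrame({
    "Yamcha": [978, 1375, 1641],
    "Krilin": [347, 954, 1784],
    "Goku": [446, 1118, 1637],
    "Master Shen": [715, 1659, 2235],
    "Chiaotzu": [621, 656, 1300],
    "Tien Shinhan": [723, 988, 1701],
})

step = 10
# Move the real rows to index 0, 10, 20
db3.index = range(0, len(db3) * step, step)

# reindex() keeps rows whose labels already exist and fills every new label with NaN
expand_df = db3.reindex(range(0, len(db3) * step))
print(expand_df.head(11))
```

With three input rows this gives 30 rows (index 0 to 29), mirroring how the 21-row dataset becomes 210 rows above.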

rank_df = expand_df.rank(axis=1)
rank_df.head(11)
    Yamcha  Krilin  Goku  Master Shen  Chiaotzu  Tien Shinhan
0      6.0     1.0   2.0          4.0       3.0           5.0
1      NaN     NaN   NaN          NaN       NaN           NaN
2      NaN     NaN   NaN          NaN       NaN           NaN
3      NaN     NaN   NaN          NaN       NaN           NaN
4      NaN     NaN   NaN          NaN       NaN           NaN
5      NaN     NaN   NaN          NaN       NaN           NaN
6      NaN     NaN   NaN          NaN       NaN           NaN
7      NaN     NaN   NaN          NaN       NaN           NaN
8      NaN     NaN   NaN          NaN       NaN           NaN
9      NaN     NaN   NaN          NaN       NaN           NaN
10     5.0     2.0   4.0          6.0       1.0           3.0

Step 4.2 – Fill Missing Values By Linear Interpolation

Then we fill these NaN values by interpolating from the existing values, i.e. the original values on every 10th row. As shown below, the 0th and 10th rows contain original values from our dataset. Rows 1-9 are linearly interpolated values between the 0th and 10th rows.

expand_df = expand_df.interpolate()
expand_df.head(11)
    Yamcha  Krilin    Goku  Master Shen  Chiaotzu  Tien Shinhan
0    978.0   347.0   446.0        715.0     621.0         723.0
1   1017.7   407.7   513.2        809.4     624.5         749.5
2   1057.4   468.4   580.4        903.8     628.0         776.0
3   1097.1   529.1   647.6        998.2     631.5         802.5
4   1136.8   589.8   714.8       1092.6     635.0         829.0
5   1176.5   650.5   782.0       1187.0     638.5         855.5
6   1216.2   711.2   849.2       1281.4     642.0         882.0
7   1255.9   771.9   916.4       1375.8     645.5         908.5
8   1295.6   832.6   983.6       1470.2     649.0         935.0
9   1335.3   893.3  1050.8       1564.6     652.5         961.5
10  1375.0   954.0  1118.0       1659.0     656.0         988.0
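
To make the arithmetic behind these numbers explicit, here is a small sketch that recomputes Yamcha’s column by hand and compares it with interpolate(). The variable names (start, end, steps, manual, filled) are made up for this illustration; the two PL values come from rows 0 and 10 above.

```python
import numpy as np
import pandas as pd

start, end, steps = 978.0, 1375.0, 10  # Yamcha's PL at rows 0 and 10

# Linear interpolation by hand: move 1/10 of the gap per row
manual = [start + (end - start) * k / steps for k in range(steps + 1)]

# The same column as pandas sees it: two known values with 9 NaNs in between
series = pd.Series([start] + [np.nan] * (steps - 1) + [end])
filled = series.interpolate()  # default method is linear

print([round(v, 1) for v in manual[:3]])
print(np.allclose(manual, filled.values))
```

The gap between the two snapshots is 397, so each in-between row adds 39.7, which matches the 1017.7 and 1057.4 shown in rows 1 and 2 of the table.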

rank_df = rank_df.interpolate()
rank_df.head(11)
    Yamcha  Krilin  Goku  Master Shen  Chiaotzu  Tien Shinhan
0      6.0     1.0   2.0          4.0       3.0           5.0
1      5.9     1.1   2.2          4.2       2.8           4.8
2      5.8     1.2   2.4          4.4       2.6           4.6
3      5.7     1.3   2.6          4.6       2.4           4.4
4      5.6     1.4   2.8          4.8       2.2           4.2
5      5.5     1.5   3.0          5.0       2.0           4.0
6      5.4     1.6   3.2          5.2       1.8           3.8
7      5.3     1.7   3.4          5.4       1.6           3.6
8      5.2     1.8   3.6          5.6       1.4           3.4
9      5.1     1.9   3.8          5.8       1.2           3.2
10     5.0     2.0   4.0          6.0       1.0           3.0

See – now we have decimal “ranks” which will be used as the y-coordinates of the chart. These decimal ranks are essential for making smooth transitions because matplotlib will draw the bars between, let’s say positions 1 and 2. Then with a lot more frames to work with, the animation will look much smoother.
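
One detail worth spelling out: the order of operations matters. The sketch below compares ranking first and then interpolating the ranks (as done above) with interpolating the PLs first and then ranking. It assumes only rows 0 and 10 of the expanded data, typed in by hand; the names two_rows, expanded, smooth_ranks and stepped_ranks are made up for illustration.

```python
import pandas as pd

# Rows 0 and 10 of the expanded dataset (the first two original rows)
two_rows = pd.DataFrame(
    {"Yamcha": [978, 1375], "Krilin": [347, 954], "Goku": [446, 1118],
     "Master Shen": [715, 1659], "Chiaotzu": [621, 656],
     "Tien Shinhan": [723, 988]},
    index=[0, 10],
)
expanded = two_rows.reindex(range(0, 11))  # rows 1-9 are NaN

# Rank while the NaN rows are still there, then interpolate the ranks
smooth_ranks = expanded.rank(axis=1).interpolate()

# For comparison only: interpolate the PLs first, then rank each row
stepped_ranks = expanded.interpolate().rank(axis=1)

print(smooth_ranks.loc[5].tolist())
print(stepped_ranks.loc[5].tolist())
```

Ranking the interpolated PLs always yields whole numbers, so the bars would still jump from one position to the next. Interpolating the ranks themselves is what produces in-between positions such as 5.5.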

Let’s re-draw the first 3 rows of PL using the below code:

num = 3
fig, axs = plt.subplots(nrows = 1, ncols = num, figsize = (10, 5), tight_layout = True)
for i, ax in enumerate(axs):
    ax.barh(y=rank_df.iloc[i].values,
            tick_label = expand_df.iloc[i].index,
            width = expand_df.iloc[i].values,
            color = plt.cm.Set1(range(6)))
    ax.set_title(f'{i}-th row', fontsize='larger')
    [spine.set_visible(False) for spine in ax.spines.values()]  # remove chart outlines

smoother transition incoming

Now we can kind of “see” that Master Shen is closing in on Tien Shinhan’s 2nd place by the 3rd frame above!

Smooth Bar Chart Race Using Python Matplotlib

We’ll use the two newly created dataframes to feed PL data for the animation:

  • rank_df contains just the ranks of characters, each row is a snapshot/frame
  • expand_df contains the actual PL values, each row is a snapshot/frame
  • the interval argument of FuncAnimation() sets the delay between frames in milliseconds, which controls how fast/slow the animation plays

def update(i):
    ax.clear()
    ax.set_facecolor(plt.cm.Greys(0.2))
    [spine.set_visible(False) for spine in ax.spines.values()]
    hbars = ax.barh(y = rank_df.iloc[i].values,
           tick_label=expand_df.iloc[i].index,
           width = expand_df.iloc[i].values,
           height = 0.8,
           color = plt.cm.Set1(range(len(expand_df.columns)))  # one color per character
           )
    ax.set_title(f'Frame: {i}')
    ax.bar_label(hbars, fmt='%.2d')
    

fig,ax = plt.subplots(#figsize=(10,7),
                      facecolor = plt.cm.Greys(0.2),
                      dpi = 150,
                      tight_layout=True
                     )


data_anime = FuncAnimation(
    fig = fig,
    func = update,
    frames= len(expand_df),
    interval=100
)
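
For readers who want to try the whole pipeline without downloading the CSV, here is a condensed, self-contained sketch. It is not the exact code from this tutorial: it assumes only the first three rows of the dataset (typed in by hand), builds the empty rows with reindex() instead of concat(), stops at the last real row, and uses the Agg backend so it runs as a plain script. The names db3, step and full_index are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no notebook is needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.animation import FuncAnimation

# First three rows of the dataset, copied from the db.head(3) output
db3 = pd.DataFrame({
    "Yamcha": [978, 1375, 1641],
    "Krilin": [347, 954, 1784],
    "Goku": [446, 1118, 1637],
    "Master Shen": [715, 1659, 2235],
    "Chiaotzu": [621, 656, 1300],
    "Tien Shinhan": [723, 988, 1701],
})

step = 10
db3.index = range(0, len(db3) * step, step)        # real rows at 0, 10, 20
full_index = range(0, (len(db3) - 1) * step + 1)   # 0..20, ends on the last real row

# Rank the rows that still contain NaN, then interpolate PLs and ranks separately
expand_df = db3.reindex(full_index)
rank_df = expand_df.rank(axis=1).interpolate()
expand_df = expand_df.interpolate()

colors = plt.cm.Set1(range(len(expand_df.columns)))  # one fixed color per character
fig, ax = plt.subplots(dpi=100, tight_layout=True)

def update(i):
    """Redraw the chart for frame i."""
    ax.clear()
    hbars = ax.barh(y=rank_df.iloc[i].values,       # decimal ranks give smooth moves
                    tick_label=expand_df.columns,
                    width=expand_df.iloc[i].values,
                    height=0.8,
                    color=colors)
    ax.bar_label(hbars, fmt="%d")                    # needs matplotlib 3.4 or newer
    ax.set_title(f"Frame: {i}")

data_anime = FuncAnimation(fig=fig, func=update,
                           frames=len(expand_df), interval=100)
```

In a plain script nothing is displayed by default with this backend; calling data_anime.save("race.gif", writer="pillow") is one way to write the result to a file, where race.gif is just an example file name.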

The Bar Chart Race Library

Someone actually made a bar_chart_race library that automates the entire process of creating such a plot. In the next tutorial I’ll cover a variation of the bar_chart_race library. Now go create some fun animations yourself!

Additional Resources

How to Create Animation with Matplotlib

Insert rows into a dataframe

One comment

  1. This is pretty neat. Thanks for sharing.

    Many applications might not want to show the cumulative total but the current value and how it changes over time. So in terms of your dragon ball powerlevel example, the objective would be to show who has received how many points at any given day and not who has received the most points in total over the time frame.

    How would you change the update-procedure to account for that?

    Thanks for answering. Great work!

    Best
    Alex
