Last Updated on July 14, 2022 by Jay
In this tutorial, we’ll use Python Dash to create an interactive web application that will update graphs based on user input.
We are going to make a simple data visualization app for historical COVID-19 cases in each US state, and we’ll go through the following steps:
- Make a web app with a simple layout
- Get data for the web app
- Make the web app interactive
What is Dash?
Dash is a Python library for data visualization on the web. The library is developed and maintained by the same team that created plotly, so sometimes you might hear people call it “plotly dash”. Like its sibling plotly.py, dash is written on top of plotly.js and React.js. This means we can create beautiful data visualization web apps (thanks, JS!) with our favorite language: Python.
The three tools from the plotly family each serve a different purpose, and they integrate really well together. We’ll use either plotly.graph_objects or plotly.express to make a graph. Then we’ll use dash to create the web server, arrange the layout for display, and add interactivity between users and the app.
| Tool | Use case |
|---|---|
| plotly.express | Quick charting, exploratory data analysis |
| plotly.graph_objects | Full customization and control of the plotly library |
| dash | Interactive charts on the web (this tutorial) |
Why a Web App?
Imagine we’ve created an awesome data visualization in Python and are excited to share it with friends or colleagues. Do we just take a screenshot and send it to them? Our visualization is fully interactive, so sending a static picture makes no sense. Do we send them the Python script instead? But what if they don’t know how to run Python?
The solution is a web app, which is really just a dynamic website that people can interact with. The advantage of a web app is that we can put the visualization on the Internet, so people can access it from anywhere, on almost any device.
We’ll use pip to install dash and plotly, plus pandas, which we’ll use later to load the data.
pip install dash plotly pandas
Covid Data Visualization
The screenshot below shows what the final product looks like:
To start, let’s import both plotly and dash.
import dash
from dash import html
from dash import dcc
import plotly.express as px
The html module provides classes for writing HTML components. The dcc module (“Dash Core Components”) is used to model dynamic content such as interactive graphs, dropdowns, etc.
A dash web application has two major parts: the app layout and the app’s interactive features. We’ll first create the layout for this web app. Then, in the next part, we’ll show how to add interactivity between the app and its users.
Dash App Layout
First of all, we need a Dash object. Then we define the layout, which describes what the application will look like. This is where HTML knowledge helps, because we are basically creating HTML components using Python. Each HTML element shows up on the page in the same order we write it in the Python code.
Using the html module, we can create HTML components such as divs, headings, tables, etc.
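app = dash.Dash(__name__)
# Components render on the page in the order they are listed
app.layout = html.Div([
    html.Div('Hello World From Dash.'),
    html.H1('H1 tag here'),
    html.Div(dcc.Dropdown(id='dropdown',
                          options=[{'label': 'california', 'value': 'california'},
                                   {'label': 'illinois', 'value': 'illinois'},
                                   {'label': 'new york', 'value': 'new york'}])),
])

if __name__ == '__main__':
    # debug=True enables hot reloading; older Dash versions call this app.run_server
    app.run(debug=True)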
The dcc.Dropdown component creates a dropdown box with three values (california, illinois, new york).
Finally, we call app.run (app.run_server in older Dash versions). This method creates and starts a Flask web server in the backend.
To start this web app, press F5 if your IDE supports it. Alternatively, type python followed by the script name on the command line (for example, python app.py if the script is saved as app.py).
Either way works. After a moment, we should see a message like “Dash is running on http://127.0.0.1:8050/”. The address 127.0.0.1 is also known as localhost. We can copy and paste it into a web browser, or replace the IP address with localhost (http://localhost:8050/); both work.
Setting debug=True makes the coding process easier. Every time we change the code and save it, the app reloads automatically, so we don’t have to stop and relaunch it. Just refresh the page to see the change reflected in the app.
For now, the app is just a static page. Whenever we need to stop the web app, go to the console and press Ctrl+C. You might need to press it a few times to stop the web server.
Get Data
For this part, we are going to make a chart using plotly to show covid case counts over time. First, we need to get the data from the Johns Hopkins University covid GitHub repository: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv
This file holds cumulative confirmed case counts for each US county, with one column per date.
The source data is too big to display on GitHub. However, if you click on “View Raw”, it will take you to the following URL. https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv
This is essentially a CSV file, which means we can use pandas to read the data into Python and then aggregate it by state:
import pandas as pd
df = pd.read_csv(r'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')
# Sum the county rows into one row per state
df_state_lvl = df.groupby('Province_State', as_index=False).sum(numeric_only=True)
The groupby function aggregates the data by state. The argument as_index=False keeps Province_State as a regular column instead of making it the new index. Then sum(numeric_only=True) adds up the numeric columns for each state and skips text columns such as the county names. Numeric ID columns such as FIPS get summed too, but we’ll drop them in the next step.
The resulting table gives us only state-level case counts.
There’s still a problem: the case counts for each date are still in separate columns, which will make plotting difficult.
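You can see what the groupby step does without downloading the full file. Here is a sketch on a tiny made-up table in the same layout: county rows, a text column, and date columns named like the JHU ones.

```python
import pandas as pd

# Made-up county-level rows in the same layout as the JHU file
df = pd.DataFrame({
    'Admin2': ['Alameda', 'Los Angeles', 'Cook'],
    'Province_State': ['California', 'California', 'Illinois'],
    '1/22/20': [0, 1, 0],
    '1/23/20': [1, 3, 2],
})

# Sum counties into states; as_index=False keeps Province_State as a column,
# numeric_only=True skips text columns such as Admin2
df_state_lvl = df.groupby('Province_State', as_index=False).sum(numeric_only=True)
print(df_state_lvl)
```

The two California counties collapse into one row. Notice that the result still has one column per date, which is exactly the problem described above.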
We need the case counts in a single column, with the corresponding dates in another column. Something like this:
It’s essentially unpivoting the header row of the original dataframe, so that the dates and case counts run down vertically instead of across horizontally.
The pandas melt function does exactly that. We pass in the state-level dataframe and keep the Province_State column as the identifier (id_vars). Then we unpivot the date columns by passing them (as a list) to the value_vars argument.
df_melt = df_state_lvl.melt(id_vars=['Province_State'], value_vars=df_state_lvl.columns[df_state_lvl.columns.str[-2:].isin(['20', '21', '22', '23'])])
The expression df_state_lvl.columns[df_state_lvl.columns.str[-2:].isin(['20', '21', '22', '23'])] looks at the last two characters of each column name. Date columns are named like 1/22/20. If a name ends in 20, 21, 22, or 23 (the years 2020 to 2023 covered by the dataset), we know it’s a date column that contains case counts. We keep only the date columns and drop everything else.
The unpivoted table will look like the screenshot above. The variable column contains the date, and the value column contains the cumulative case count on that date.
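Here is a sketch of the melt step on a tiny made-up state-level table, so you can see the long format it produces. The FIPS column stands in for the leftover numeric ID columns, which the year filter drops.

```python
import pandas as pd

# Made-up state-level table: one row per state, one column per date
df_state_lvl = pd.DataFrame({
    'Province_State': ['California', 'Illinois'],
    'FIPS': [6, 17],  # numeric non-date column that should be dropped
    '12/31/20': [100, 50],
    '1/1/21': [120, 55],
})

# Date column names end in a two-digit year
is_date = df_state_lvl.columns.str[-2:].isin(['20', '21', '22', '23'])
df_melt = df_state_lvl.melt(id_vars=['Province_State'],
                            value_vars=df_state_lvl.columns[is_date])
print(df_melt)
```

Each (state, date) pair becomes one row, with the date in variable and the count in value.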
We’re done preparing the data; next, let’s plot it.
Make A Plotly Chart
We are going to display just one state for now, otherwise, the chart will be very crowded.
fig = px.line(df_melt.loc[df_melt['Province_State'] == 'California'], x='variable', y = 'value')
fig.show()
The chart looks good. Next, let’s add all the state names to the dropdown in the dash app layout. We can find all the state names with the unique method. Then we build a list of dictionaries, each holding a label and value pair, and pass that list to the dcc.Dropdown component.
ops = df_melt['Province_State'].unique()
labels = [{'label': i, 'value': i} for i in ops]
fig = px.line(df_melt.loc[df_melt['Province_State'] == 'California'], x='variable', y='value')
app.layout = html.Div([html.Div('Hello world from dash updated.'),
                       html.H1('H1 tag here'),
                       # value='California' pre-selects the state shown in the initial chart
                       html.Div(dcc.Dropdown(id='dropdown', options=labels, value='California')),
                       dcc.Graph(id='fig1', figure=fig)])
Link The Dropdown Values With The Graph
As of now, there’s no connection between the dcc.Dropdown and the graph. Our goal is to link them together so that when we select a state in the dropdown, the graph updates accordingly.
This is done with dash callback functions. Dash calls them automatically whenever a user changes an input, and they use that input to update some property of another component. In our case, this means users can interact with the graph by changing the dropdown value.
Implementing a callback function in dash is simple. We need to:
- Write a function to update a component (graph, text, etc.)
- Use the app.callback decorator to connect that function with the component we want to update
First, we need to import two other objects, Input and Output. Note that here they come from dash.dependencies rather than just dash. In Dash 2.0 and later, from dash import Input, Output works as well.
from dash.dependencies import Input, Output
The Input object refers to what the user is going to change: in this case, the state name in the dropdown box. The Output object refers to what should be updated, which is the graph.
@app.callback(Output('fig1', 'figure'),
              Input('dropdown', 'value'))
def update_graph(state):
    df_state = df_melt.loc[df_melt['Province_State'] == state]
    fig = px.line(df_state, x='variable', y='value', title=f'{state} cumulative case counts')
    return fig
For the decorator @app.callback():
- The first argument is the Output. The first argument of Output is the ID of the element to update. Going back to our web page, that’s the graph, which we gave the ID fig1 earlier. The second argument is the property to update, figure.
- The second argument of the decorator is the Input. As with the Output, the first argument of Input is the element ID, dropdown, and the second is the property to watch, value.
Immediately after the decorator, we write a function that does the actual update. The function name doesn’t really matter; I’m calling it update_graph, but you can call it anything you want.
The function needs one argument for each Input in the callback decorator, in the same order as the Inputs. In our case, we only have one input, so the function takes just one argument. The argument’s name doesn’t matter either, but I’m going to call it state, because that’s what the data is.
Basically, this function filters the data for the given state, then builds a new figure using data for just that state. We also add a title, so the state name in the title changes along with the dropdown. At the end of the function, we must return the updated data, which here is a plotly figure object.
How does the callback mechanism work?
- When we select Illinois from the dropdown, that value is passed into the callback as the Input, and from there into the state argument of the callback function.
- The callback function update_graph does its work. We can write a callback function to update pretty much anything on the web page; it just happens to be the chart in our example.
- Once the callback function finishes, it returns the new data, a figure object, to the callback’s Output, which then updates the graph 'fig1' on the web page.
That completes the full cycle of the interactive feature of dash.
Now that you know how to make interactive data visualizations, what will you create next?
Putting It All Together
import dash
from dash import html, dcc
import pandas as pd
import plotly.express as px
from dash.dependencies import Input, Output

# Cumulative confirmed cases per US county, one column per date
df = pd.read_csv(r'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')
# Sum counties into states, skipping text columns
df_state_lvl = df.groupby('Province_State', as_index=False).sum(numeric_only=True)
# Unpivot the date columns (named like 1/22/20) into variable/value columns
df_melt = df_state_lvl.melt(id_vars=['Province_State'], value_vars=df_state_lvl.columns[df_state_lvl.columns.str[-2:].isin(['20', '21', '22', '23'])])

ops = df_melt['Province_State'].unique()
labels = [{'label': i, 'value': i} for i in ops]
fig = px.line(df_melt.loc[df_melt['Province_State'] == 'California'], x='variable', y='value')

app = dash.Dash(__name__)
app.layout = html.Div([html.Div('Hello world from dash updated.'),
                       html.H1('H1 tag here'),
                       html.Div(dcc.Dropdown(id='dropdown', options=labels, value='California')),
                       dcc.Graph(id='fig1', figure=fig)])


@app.callback(Output('fig1', 'figure'),
              Input('dropdown', 'value'))
def update_graph(state):
    # Redraw the chart using only the selected state's rows
    df_state = df_melt.loc[df_melt['Province_State'] == state]
    fig = px.line(df_state, x='variable', y='value', title=f'{state} cumulative case counts')
    return fig


if __name__ == '__main__':
    # Older Dash versions use app.run_server(debug=True) instead
    app.run(debug=True)