Last Updated on July 14, 2022 by Jay
248,306 Python lovers from 214 countries visited this site in 2021 and let’s visualize the traffic data in this streamlit tutorial!
Wow, I’m excited! Thank you everyone for your support! I started posting Python tutorials in 2020, and it has been a fun journey in the past 2 years. I’ve learned a lot myself and at the same time helped hundreds of thousands of people around the world. To be honest, I feel that I’m making more impact with my tutorials than I do at my day job.
Data
The data we are using today comes directly from Google Analytics, which is a tool from Google that tracks website traffic. Rest assured, no personally identifiable information was recorded.
The below is what a portion of the data looks like on a given day:
There are country, date, number of users, number of sessions, and average session duration. Most fields are self-explanatory, so I’ll just clarify two fields:
- Session – A group of user interactions with the website. For example, a single session can contain multiple page views. And a single user can open multiple sessions.
- Avg Session Duration – Measured in seconds, how long (on average) did the visitors stay on the site on a given day.
Taking the highlighted United Kingdom row as an example: on 01/01/2021, 10 visitors from the UK spent about 5 minutes (318.83 seconds) on average on the site learning their favorite subject – Python.
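As a quick sanity check on that figure, the seconds-to-minutes conversion can be sketched in plain Python. The helper name to_min_sec is made up for this illustration and is not part of the tutorial's code.

```python
def to_min_sec(seconds):
    """Split a duration in seconds into whole minutes and leftover seconds."""
    minutes, secs = divmod(seconds, 60)
    return int(minutes), round(secs, 2)

# Average session duration for the UK row on 01/01/2021
uk_minutes, uk_seconds = to_min_sec(318.83)
print(f"{uk_minutes} min {uk_seconds} s")
```

So 318.83 seconds is a little over 5 minutes, which matches the "about 5 minutes" reading above.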
Required Libraries
We need three libraries for building this project: pandas, plotly, and streamlit.
pip install pandas plotly streamlit
Introduction To Streamlit
Streamlit advertises itself as an “open-source app framework for Machine Learning and Data Science teams.” However, anyone can use it to make a web-based app really fast and easily with pure Python.
Although streamlit runs on top of Tornado (a Python web server and framework), we almost never need to worry about either front-end or back-end stuff.
Similar to the plotly library, with streamlit we just describe what we want to show declaratively, then the elements will show up on the web page. If you have experience with the plotly dash framework, I’d say that streamlit is the easier / lazier version of dash. Sounds good? Let’s get started with our first streamlit app!
Hello World
It takes just two lines of code to make a streamlit web app! This is pretty awesome.
import streamlit as st
st.title('hello world')
To run this app, first, save it as a .py file. In my case, I saved it as “analytics.py”.
Then in a command prompt/console window, instead of typing python ..., use streamlit run ...
streamlit run analytics.py
If you forgot and still used python ..., do not worry, it’s going to remind you that streamlit run is the correct command to use!
Shortly, a browser window should pop up with the hello world title displayed on the page. If no window opens automatically, simply copy/paste the URL from the console window into a browser.
Interactive Coding Environment For Streamlit
For the best coding experience with streamlit, I highly recommend rearranging your web app and IDE windows so they sit side by side. See the example below. The reason for this is the interactive coding experience, which you’ll see next.
Let’s add some exciting data to the web page. I’ll write a function with pandas to read the Google Analytics data into Python.
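import pandas as pd

def load_data():
    # Read the Google Analytics export and parse the Date column as datetime
    df = pd.read_excel(r'C:\Users\jay\Desktop\PythonInOffice\2021_traffic_overview\google_data.xlsx',
                       parse_dates=['Date'])
    # Sessions * average seconds per session = total seconds; divide by 3600 for hours
    df['Hours Spent'] = df['Sessions'] * df['Avg. Session Duration'] / 3600
    return df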
- The Date column contains string data. The easiest way to convert it into a more useful datetime object is the parse_dates argument of the read_excel function.
- We can calculate the total hours spent on the site by multiplying the number of sessions by the average session duration (in seconds), then dividing by 3600.
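To try the two steps above without the Excel file, here is a minimal sketch on a tiny hand-made DataFrame. The rows are made-up values that only mimic the column names used in this tutorial, and pd.to_datetime stands in for what parse_dates does at load time.

```python
import pandas as pd

# Made-up sample rows that mimic the column names used in this tutorial
df = pd.DataFrame({
    "Country": ["United Kingdom", "United States"],
    "Date": ["2021-01-01", "2021-01-01"],
    "Sessions": [12, 30],
    "Avg. Session Duration": [300.0, 120.0],
})

# Convert the Date strings to datetime, as parse_dates does at load time
df["Date"] = pd.to_datetime(df["Date"])

# sessions * average seconds per session = total seconds; / 3600 -> hours
df["Hours Spent"] = df["Sessions"] * df["Avg. Session Duration"] / 3600
print(df[["Country", "Hours Spent"]])
```

Both sample rows work out to one hour: 12 sessions of 300 seconds and 30 sessions of 120 seconds are each 3600 seconds in total.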
Next, we’ll call this function to load data into our streamlit app. Then, display the dataframe on the web page by calling st.dataframe().
data = load_data()
st.dataframe(data)
Save the new changes, and watch the web page update with a notification “Source file changed.” Click on the “Always rerun” button here. Shortly after, the web page will update again and display the dataframe.
Note, in the top right corner of the screen there’s a “hamburger” icon, which is the Settings menu for streamlit. If you already clicked on the “Always rerun” button just now, you’ll see the “Run on save” box is already checked. Make sure this is checked. Then optionally, check “Wide mode” for a wider display. You can also change the background color/theme, but I like the dark theme so I will keep it as is.
Sidebar In Streamlit
Sidebars are very easy to create with streamlit. Basically, whenever we want to place something on the sidebar, just call st.sidebar, followed by another method for displaying the item (text, chart, dataframe, etc.).
num_visitors = data['Users'].sum()
hours_spent = data['Hours Spent'].sum()
st.sidebar.write('2021 website traffic')
st.sidebar.metric('Total visitors', f'{num_visitors:,}',100)
st.sidebar.metric('Total hours spent', f'{hours_spent:,.2f}',-100)
I want to display two metrics on the sidebar: the total number of visitors in 2021 and the total hours spent on the site. We can calculate those two metrics easily, then call st.sidebar.metric() to display them on the sidebar. The third argument (the delta, here 100 and -100) is for display purposes only, since I don’t have other data to compare with.
Here’s a pro tip: passing numbers through an f-string to convert them into text makes it easier to style number formats, like adding special signs or thousands separators, controlling the number of decimal places, etc.
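A few format specs in action, as a small sketch. The visitor total is the figure from the top of this post; the hours and share values are made up purely to show the formatting.

```python
num_visitors = 248306      # total visitors quoted at the top of this post
hours_spent = 1234.5678    # made-up value, for illustration only
share = 0.4567             # made-up value, for illustration only

visitors_text = f"{num_visitors:,}"   # thousands separator
hours_text = f"{hours_spent:,.2f}"    # separator plus two decimals
share_text = f"{share:.1%}"           # percentage with one decimal

print(visitors_text, hours_text, share_text)
```

The results are plain strings, which is exactly what st.sidebar.metric() displays as the value.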
The web page with the above newly added code looks like this, pretty neat, right?
Improve Streamlit Performance
By now you might have noticed (if you are watching the video tutorial above), every time we change the source and save, the app tries to reload, and it takes some time to do that.
This is because whenever we change something and the app reloads, the load_data function also needs to re-run, and because it reads data from Excel into Python, it’s slow.
We don’t really need to reload the data every time the app refreshes. It would be nice if we could skip the data reload and just refresh the page to show the changes we made, right?
That’s where the cache function/decorator can help. By adding the decorator @st.cache right above our function, the first time we run this function, streamlit will store the results in a local cache, or memory.
The next time the app refreshes, streamlit knows that it can skip running this function and instead just read the output from the local cache.
Sounds all good! However, if we want to skip running the data load, we need to avoid changing any of the following:
- Name of the function
- Code inside the function body
- Parameters/arguments of the function
So basically, we can’t touch the function if we want to skip running it! If you change something for the function it has to be re-run at least once.
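With this @st.cache feature, our app should refresh much faster. Note that newer streamlit releases deprecate st.cache in favor of st.cache_data, so use that decorator instead if you are on a recent version.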
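To get a feel for the idea without running a streamlit app, here is a rough analogy using functools.lru_cache from the Python standard library. This is not how streamlit implements its cache; it only illustrates that a cached function's body is skipped when it is called again with the same arguments. The function name slow_load and the file names are placeholders.

```python
from functools import lru_cache

call_count = 0  # tracks how often the function body actually runs

@lru_cache(maxsize=None)
def slow_load(path):
    """Stand-in for an expensive data load."""
    global call_count
    call_count += 1
    return f"data from {path}"

first = slow_load("google_data.xlsx")   # body runs, result is stored
second = slow_load("google_data.xlsx")  # same argument: served from the cache
third = slow_load("other_file.xlsx")    # new argument: body runs again

print(call_count)
```

Three calls, but the body only ran for the two distinct arguments.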
Interactive Streamlit
On the main body of the web page, we’ll add a “multiselect box” to show all the available countries from the data. By calling the unique() method on the Country column, we can get all the unique country names.
country_select = st.multiselect('What countries do you want to look at?',data['Country'].unique())
Save the above code, and we have a dropdown-style box that actually can take on multiple values at once.
You don’t want to miss what happens next…it’s a game-changer!
Multiselect box
Let’s display the multiselect box values on the web page. Also, upon checking the object’s type, it’s a list.
st.write(country_select)
st.write(type(country_select))
Now, edit values inside the multiselect box by selecting and removing countries.
Do you see the list on the webpage also changes as you edit the menu?
This is super awesome considering we haven’t really done much coding yet. With literally one line of code (the st.multiselect), we get this kind of interactive feature already!
Not just the multiselect box, every streamlit input widget behaves like this! This means with one line of code, we allow users to pass input into the app.
I know it’s not an apples-to-apples comparison, but I can’t stop comparing this to plotly dash, where we have to write not only the app layout, but also the callback functions to achieve the exact same thing.
Add A Plotly Chart On Streamlit
streamlit is compatible with many charting libraries, but my favorite is still plotly – easy to use and fully interactive, and I think the charts look much cooler than static ones.
We’ll add a daily visitor count line chart using plotly. It’s fairly easy to calculate the worldwide daily visitor count: simply groupby the Date column, and sum up the number of visitors.
A friendly reminder on plotly chart – plotly doesn’t automatically sort the data for us so we have to make sure the data is properly sorted before making the plot. Otherwise, the plot will be messed up. If you see a messed-up and out of order plot, sort it first. Usually, that will fix the problem.
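The aggregate-then-sort step can be tried on its own with a few made-up rows. The values below are invented for illustration and are deliberately out of date order.

```python
import pandas as pd

# Made-up rows, deliberately out of date order
df = pd.DataFrame({
    "Country": ["India", "United States", "India", "United States"],
    "Date": pd.to_datetime(["2021-01-02", "2021-01-01", "2021-01-01", "2021-01-02"]),
    "Users": [5, 10, 7, 3],
})

# Sum users per day across all countries, then sort by date for plotting
daily = (
    df.groupby("Date", as_index=False)["Users"]
      .sum()
      .sort_values("Date")
)
print(daily)
```

groupby already returns its keys in sorted order by default, so the explicit sort_values here is a safety net; it matters most when the data is filtered rather than grouped.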
Use st.plotly_chart to display a plotly figure object. use_container_width = True ensures that the plot can stretch when space permits so we can have a better view.
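import plotly.express as px

# numeric_only=True keeps the text columns (e.g. Country) out of the sum
daily_visitor = data.groupby('Date', as_index=False).sum(numeric_only=True)
daily_visitor_fig = px.line(daily_visitor, x='Date', y='Users', title='Total visitor count 2021')
st.plotly_chart(daily_visitor_fig, use_container_width=True)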
Save the code and the web page should automatically refresh. Then we’ll see the chart below, which is a fully functional and interactive plotly chart.
Link User Input And Chart Update (callback)
Note our plotly chart currently is “static” and can only show the worldwide daily visitor count. We’ll link it up with a user input – the multiselect box – such that we can control what countries to show on the plot using the multiselect box.
By default, when nothing is selected in the multiselect box, or when len(country_select) is 0, we’ll show the worldwide daily count.
Otherwise, we can take the names in the country_select list and pass them into the pandas isin method for filtering. The end result is we keep only those countries in the country_select list and drop everything else.
Note here after applying the isin filter, I had to sort the data by Date in order for plotly to make a chart properly.
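The filter-and-sort step can be checked in isolation with a few made-up rows. Here country_select is hard-coded to stand in for whatever the multiselect box would return.

```python
import pandas as pd

# Made-up rows for illustration
df = pd.DataFrame({
    "Country": ["India", "Canada", "India", "United States"],
    "Date": pd.to_datetime(["2021-01-02", "2021-01-01", "2021-01-01", "2021-01-01"]),
    "Users": [5, 2, 7, 10],
})

country_select = ["India", "Canada"]  # stand-in for the multiselect value

# Keep only rows whose Country is in the selection, then order by date
filtered = df.loc[df["Country"].isin(country_select)].sort_values("Date")
print(filtered)
```

The United States row is dropped, and the remaining rows come out in date order, ready to be handed to px.line.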
if len(country_select) == 0:
    # Nothing selected: show the worldwide daily total
    daily_visitor = data.groupby('Date', as_index=False).sum(numeric_only=True)
    daily_visitor_fig = px.line(daily_visitor, x='Date', y='Users', title='Total visitor count 2021')
else:
    # Keep only the selected countries and sort by date before plotting
    daily_visitor = data.loc[data['Country'].isin(country_select)].sort_values('Date')
    daily_visitor_fig = px.line(daily_visitor, x='Date', y='Users', color='Country', title=f'{country_select} visitor count 2021')
st.plotly_chart(daily_visitor_fig, use_container_width=True)
Save the code. After the app refreshes, we get this:
Streamlit Column Layout
Next, I want to add another chart “total hours spent”, and show it with the total visitors count side-by-side.
Creating simple layouts in streamlit is also a breeze! The st.columns() method will insert containers laid out as side-by-side columns for us. I’m so thankful that I don’t have to worry about all the <div> and CSS styling.
Let’s make two containers side-by-side (2 columns).
col1, col2 = st.columns(2)
To place an element into a container/column, simply use a context manager. Below we place the total visitor count chart into “column 1”.
with col1:  ## visitor count
    if len(country_select) == 0:
        daily_visitor = data.groupby('Date', as_index=False).sum(numeric_only=True)
        daily_visitor_fig = px.line(daily_visitor, x='Date', y='Users', title='Total visitor count 2021')
    else:
        daily_visitor = data.loc[data['Country'].isin(country_select)].sort_values('Date')
        daily_visitor_fig = px.line(daily_visitor, x='Date', y='Users', color='Country', title=f'{country_select} visitor count 2021')
    st.plotly_chart(daily_visitor_fig, use_container_width=True)
Then, “column 2” has the total hours spent chart. The chart is fairly simple to make – copy/paste the total visitor chart code, then replace the column name “Users” with “Hours Spent”.
If you want, feel free to make a function to do this so you don’t write the same code multiple times.
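One possible shape for such a function is sketched below. The name daily_totals and its parameters are my own invention for this illustration; it only prepares the DataFrame, leaving the px.line and st.plotly_chart calls to the caller. The sample rows are made up.

```python
import pandas as pd

def daily_totals(data, countries, metric):
    """Return the rows to plot for one metric.

    With no countries selected, sum the metric per day worldwide;
    otherwise keep only the selected countries, sorted by date.
    """
    if len(countries) == 0:
        return data.groupby("Date", as_index=False)[metric].sum()
    return data.loc[data["Country"].isin(countries)].sort_values("Date")

# Made-up sample data
sample = pd.DataFrame({
    "Country": ["India", "Canada", "India"],
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02"]),
    "Users": [7, 2, 5],
    "Hours Spent": [1.5, 0.5, 1.0],
})

worldwide_users = daily_totals(sample, [], "Users")
india_hours = daily_totals(sample, ["India"], "Hours Spent")
print(worldwide_users)
print(india_hours)
```

With a helper like this, each column would only need to call it with "Users" or "Hours Spent" and pass the result to px.line.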
with col2:  ## hours spent
    if len(country_select) == 0:
        daily_visitor = data.groupby('Date', as_index=False).sum(numeric_only=True)
        daily_visitor_fig = px.line(daily_visitor, x='Date', y='Hours Spent', title='Total hours spent 2021')
    else:
        daily_visitor = data.loc[data['Country'].isin(country_select)].sort_values('Date')
        daily_visitor_fig = px.line(daily_visitor, x='Date', y='Hours Spent', color='Country', title=f'{country_select} hours spent 2021')
    st.plotly_chart(daily_visitor_fig, use_container_width=True)
Save the code and reload, then we get this:
Slider As An Input
I got curious and wanted to know how many countries are in the data. So,
st.write(len(data['Country'].unique()))
A whopping 214 countries!! (plz insert confetti here, of course in your mind)
I really appreciate you guys taking the time to read what I have to share, it means a lot to me. Knowing that I can help hundreds of thousands of people on a global scale is what motivates me the most to continue this journey. I will do my best to continue to deliver high-quality tutorials for you guys & girls❤️!
Back to the tutorial.
We’ll add a slider to the web app, which basically allows users to input integer values by sliding a widget. It does the same job as a simple text box, but looks much cooler.
top_country = st.slider('select number of top countries to look at:')
Since we also assigned the slider to a variable named top_country, whenever we move the slider to set a new value, the top_country variable will change accordingly.
Next, let’s add a bar chart to show the top countries in terms of the number of visitors.
This time we aggregate the visitor count by using groupby on the Country column. Then we can rank and find the largest countries with the nlargest method. The slider value top_country allows us to control the number of countries we want to display on the chart.
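Here is the ranking step on its own, using a handful of made-up rows. top_country is hard-coded to stand in for the slider value.

```python
import pandas as pd

# Made-up rows for illustration
df = pd.DataFrame({
    "Country": ["India", "United States", "Canada", "India", "United States"],
    "Users": [7, 10, 2, 5, 4],
})

top_country = 2  # stand-in for the value the slider would return

# Total users per country, then keep the top_country largest totals
totals = df.groupby("Country")["Users"].sum()
top_n = totals.nlargest(top_country)
print(top_n)
```

The result is a Series indexed by country name and sorted from largest to smallest, which is why the bar chart code can use its index for the x axis.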
top_n_countries = data.groupby('Country').agg({'Users':'sum'})['Users'].nlargest(top_country)
top_n_fig = px.bar(top_n_countries, x=top_n_countries.index, y='Users', color=top_n_countries.index)
st.plotly_chart(top_n_fig, use_container_width = True)
Not much surprise here to be honest, as both the US and India are large English-speaking countries population-wise.
There’s no way to display all countries in this bar chart. I mean technically there is a way, but we won’t be able to see anything as 214 countries have to squeeze inside this little chart. So why not plot all the countries on a map?!
Plotly Choropleth Map
Here comes the choropleth map, a kind of map composed of colored areas. It’s usually used to represent some quantity variables on a map. Just perfect for our use case – we’ll use colors to represent the number of visitors from each country.
Again we can use plotly to make the chart. A few notes on the arguments inside the px.choropleth method:
- locations indicates which column (in the data) holds the geographic information; in our case it’s the Country column
- locationmode can be one of three values: ‘ISO-3’, ‘USA-states’, or ‘country names’
  - ISO-3 is a type of country code consisting of three letters; this wiki page has a list for most countries
  - USA-states requires two-letter state abbreviations
  - country names just requires the normal country names
- color really means the quantity/number of visitors, since we use different colors to represent quantity in a choropleth map
- color_continuous_scale = px.colors.sequential.Rainbow is a built-in color grading scheme. This link has all the built-in schemes to choose from: https://plotly.com/python/builtin-colorscales/
# numeric_only=True skips the Date column, which cannot be summed
data_by_country = data.groupby('Country', as_index=False).sum(numeric_only=True)
map_fig = px.choropleth(data_by_country, locations='Country',
                        locationmode='country names',
                        color='Users',
                        color_continuous_scale=px.colors.sequential.Rainbow)
st.plotly_chart(map_fig, use_container_width=True)
I like colorful maps, and I like the fact that we have a visitor from Antarctica even more!
Code Block
Since the streamlit framework is built for machine learning and data science teams, they made sure code can be shared easily on the web app.
With another simple one-liner, we can display code on the web app. We can even choose the language argument, I’m guessing probably for different syntax highlighting.
st.code(code, language='python')
Please find the full code of the tutorial below. Now go and make your own awesome data visualization dashboard/web apps!
code = r"""
import streamlit as st
import pandas as pd
import plotly.express as px

@st.cache
def load_data():
    df = pd.read_excel(r'C:\Users\jay\Desktop\PythonInOffice\2021_traffic_overview\google_data.xlsx',
                       parse_dates=['Date'])
    df['Hours Spent'] = df['Sessions'] * df['Avg. Session Duration'] / 3600
    return df

st.title('pythoninoffice.com 2021 traffic overview')
data = load_data()
st.dataframe(data)

## side bar
num_visitors = data['Users'].sum()
hours_spent = data['Hours Spent'].sum()
st.sidebar.write('2021 website traffic')
st.sidebar.metric('Total visitors', f'{num_visitors:,}', 100)
st.sidebar.metric('Total hours spent', f'{hours_spent:,.2f}', -100)

## main body
country_select = st.multiselect('What countries do you want to look at?', data['Country'].unique())
#st.write(country_select)
#st.write(type(country_select))
col1, col2 = st.columns(2)
with col1:  ## visitor count
    if len(country_select) == 0:
        daily_visitor = data.groupby('Date', as_index=False).sum(numeric_only=True)
        daily_visitor_fig = px.line(daily_visitor, x='Date', y='Users', title='Total visitor count 2021')
    else:
        daily_visitor = data.loc[data['Country'].isin(country_select)].sort_values('Date')
        daily_visitor_fig = px.line(daily_visitor, x='Date', y='Users', color='Country', title=f'{country_select} visitor count 2021')
    st.plotly_chart(daily_visitor_fig, use_container_width=True)
with col2:  ## hours spent
    if len(country_select) == 0:
        daily_visitor = data.groupby('Date', as_index=False).sum(numeric_only=True)
        daily_visitor_fig = px.line(daily_visitor, x='Date', y='Hours Spent', title='Total hours spent 2021')
    else:
        daily_visitor = data.loc[data['Country'].isin(country_select)].sort_values('Date')
        daily_visitor_fig = px.line(daily_visitor, x='Date', y='Hours Spent', color='Country', title=f'{country_select} hours spent 2021')
    st.plotly_chart(daily_visitor_fig, use_container_width=True)

## slider
top_country = st.slider('select number of top countries to look at:')

## top n largest
top_n_countries = data.groupby('Country').agg({'Users':'sum'})['Users'].nlargest(top_country)
top_n_fig = px.bar(top_n_countries, x=top_n_countries.index, y='Users', color=top_n_countries.index)
st.plotly_chart(top_n_fig, use_container_width=True)

## choropleth map
data_by_country = data.groupby('Country', as_index=False).sum(numeric_only=True)
map_fig = px.choropleth(data_by_country, locations='Country',
                        locationmode='country names',
                        color='Users',
                        color_continuous_scale=px.colors.sequential.Rainbow)
st.plotly_chart(map_fig, use_container_width=True)"""