248,306 Visitors In 2021 – A Streamlit Tutorial

Last Updated on July 14, 2022 by Jay

248,306 Python lovers from 214 countries visited this site in 2021. Let’s visualize the traffic data in this streamlit tutorial!

Wow, I’m excited! Thank you everyone for your support! I started posting Python tutorials in 2020, and it has been a fun journey in the past 2 years. I’ve learned a lot myself and at the same time helped hundreds of thousands of people around the world. To be honest, I feel that I’m making more impact with my tutorials than I do at my day job.

Data

The data we are using today comes directly from Google Analytics, which is a tool from Google that tracks website traffic. Rest assured, no personally identifiable information was recorded.

Below is what a portion of the data looks like on a given day:

The fields are country, date, number of users, number of sessions, and average session duration. Most fields are self-explanatory, so I’ll just clarify two of them:

  • Session – A group of user interactions with the website. For example, a single session can contain multiple page views. And a single user can open multiple sessions.
  • Avg Session Duration – Measured in seconds, how long (on average) did the visitors stay on the site on a given day.

Taking the highlighted United Kingdom row as an example: on 01/01/2021, 10 visitors from the UK spent about 5 minutes (318.83 seconds) on average on the site learning their favorite subject, Python.
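
If you want to double-check the "about 5 minutes" figure, here is a small sketch in plain Python. The variable names are my own and only the 318.83 seconds value comes from the data above.

```python
# Average session duration for the UK row, in seconds (from the example above).
avg_session_seconds = 318.83

# Split into whole minutes and leftover seconds.
minutes, seconds = divmod(avg_session_seconds, 60)

# Build a readable label, rounding the leftover seconds.
label = f"{int(minutes)} min {seconds:.0f} s"
print(label)
```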

Required Libraries

We need three libraries for building this project: pandas, plotly, and streamlit.

pip install pandas plotly streamlit

Introduction To Streamlit

Streamlit advertises itself as an “open-source app framework for Machine Learning and Data Science teams.” However, anyone can use it to make a web-based app really fast and easily with pure Python.

Although streamlit is built on top of Tornado (a Python web framework and server), we almost never need to worry about either front-end or back-end stuff.

Similar to the plotly library, with streamlit we just describe what we want to show declaratively, then the elements will show up on the web page. If you have experience with the plotly dash framework, I’d say that streamlit is the easier / lazier version of dash. Sounds good? Let’s get started with our first streamlit app!

Hello World

It takes just two lines of code to make a streamlit web app! This is pretty awesome.

import streamlit as st
st.title('hello world')

To run this app, first, save it as a .py file. In my case, I saved it as “analytics.py”.

Then in a command prompt/console window, instead of typing python ..., use streamlit run ...

streamlit run analytics.py

If you forgot and still used python ..., do not worry, it’s going to remind you that streamlit run is the correct command to use!

Run a streamlit app

Shortly after, a browser window should pop up with “hello world” displayed on the page. If no window opens automatically, simply copy/paste the URL from the console window into a browser.

Streamlit app URL

Interactive Coding Environment For Streamlit

For the best coding experience with streamlit, I highly recommend rearranging your web app and IDE windows so that they sit side by side. See the example below. The reason for this is the interactive coding experience, which you’ll see next.

Streamlit app & code side by side.

Let’s add some exciting data to the web page. I’ll write a function with pandas to read the Google Analytics data into Python.

import pandas as pd

def load_data():
    # Read the Google Analytics export; parse_dates converts the Date strings to datetimes
    df = pd.read_excel(r'C:\Users\jay\Desktop\PythonInOffice\2021_traffic_overview\google_data.xlsx',
                       parse_dates=['Date'])
    # Total hours = number of sessions x average session duration (seconds) / 3600
    df['Hours Spent']  = df['Sessions'] * df['Avg. Session Duration'] / 3600
    return df

  • The Date column contains string data. The easiest way to convert it into a more useful datetime object is the parse_dates argument of the read_excel function.
  • We can calculate the total hours spent on the site by multiplying the number of sessions by the average session duration (in seconds), then dividing by 3600.
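
To try the two points above without the Excel file, here is a minimal sketch. It assumes a tiny made-up table with the same column names, and it uses pd.read_csv on an in-memory string as a stand-in for read_excel (both accept parse_dates).

```python
import io
import pandas as pd

# Made-up rows in the same shape as the Google Analytics export.
csv_text = """Country,Date,Users,Sessions,Avg. Session Duration
United Kingdom,2021-01-01,10,12,300.0
India,2021-01-01,40,50,144.0
"""

# parse_dates turns the Date strings into datetime values while loading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Sessions x average seconds per session, converted from seconds to hours.
df["Hours Spent"] = df["Sessions"] * df["Avg. Session Duration"] / 3600

print(df.dtypes["Date"])
print(df["Hours Spent"].tolist())
```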

Next, we’ll call this function to load data into our streamlit app. Then, display the dataframe on the web page by calling st.dataframe().

data = load_data()
st.dataframe(data)

Save the new changes, and watch the web page update with a notification: “Source file changed.” Click on the “Always rerun” button. Shortly after, the web page will update again and display the dataframe.

Streamlit asks for rerun
Streamlit refreshes with a dataframe

Note that in the top right corner of the screen there’s a “hamburger” icon, which opens the Settings menu for streamlit. If you already clicked the “Always rerun” button just now, the “Run on save” box will already be checked. Make sure it is. Then, optionally, check “Wide mode” for a wider display. You can also change the background color/theme, but I like the dark theme so I will keep it as is.

Streamlit settings

Sidebar In Streamlit

Sidebars are very easy to create with streamlit. Whenever we want to place something on the sidebar, just call st.sidebar followed by the method for displaying the item (text, chart, dataframe, etc.).

num_visitors = data['Users'].sum()
hours_spent = data['Hours Spent'].sum()
st.sidebar.write('2021 website traffic')
st.sidebar.metric('Total visitors', f'{num_visitors:,}',100)
st.sidebar.metric('Total hours spent', f'{hours_spent:,.2f}',-100)

I want to display two metrics on the sidebar: the total number of visitors in 2021 and the total hours spent on the site. We can calculate those two metrics easily, then use st.sidebar.metric() to display them on the sidebar. The third argument of metric() is the delta shown under the value. The 100 and -100 here are for display purposes only, since I don’t have other data to compare with.

Here’s a pro tip: passing numbers through an f-string to convert them into text makes it easy to style number formats, like adding thousands separators, special signs, or a fixed number of decimal places.

The web page with the above newly added code looks like this, pretty neat, right?

Streamlit app with sidebar
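
The f-string tip from this section can be checked in plain Python. This is just a sketch: the visitor total is the one quoted at the top of the post, while the hours figure and the percentage are made-up values for illustration.

```python
num_visitors = 248306      # visitor total quoted at the top of the post
hours_spent = 12345.6789   # made-up number, for illustration only
growth = 0.1234            # made-up ratio, for illustration only

visitors_text = f"{num_visitors:,}"    # thousands separator
hours_text = f"{hours_spent:,.2f}"     # separator plus two decimals
growth_text = f"{growth:.1%}"          # ratio shown as a percentage

print(visitors_text, hours_text, growth_text)
```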

Improve Streamlit Performance

By now you might have noticed (if you are watching the video tutorial above), every time we change the source and save, the app tries to reload, and it takes some time to do that.

This is because whenever we change something and the app reloads, the load_data function also needs to re-run, and because it reads data from Excel into Python, it’s slow.

We don’t really need to reload the data every time the app refreshes. It would be nice if we could skip the data reload and just refresh the page to show the changes we made, right?

That’s where the cache decorator can help. By adding @st.cache right above our function, the first time we run the function, streamlit stores the result in a local in-memory cache. Note that newer Streamlit releases deprecate st.cache in favor of st.cache_data.

The next time when the app refreshes, streamlit knows that it can skip running this function and instead, just reads the output from the local cache.

Sounds good! However, if we want to skip re-running the data load, we must not change any of the following:

  • Name of the function
  • Code inside the function body
  • Parameters/arguments of the function

So basically, we can’t touch the function if we want to skip running it! If you change something for the function it has to be re-run at least once.

With this @st.cache feature, our app should refresh much faster.
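
To get a feel for what caching does, here is a small analogy using functools.lru_cache from the Python standard library. To be clear, this is not Streamlit's caching (lru_cache only looks at the arguments, while streamlit also tracks the function's code), and load_data_stub and the file names are made up for the sketch.

```python
import functools

call_count = 0  # counts how many times the function body actually runs

@functools.lru_cache(maxsize=None)
def load_data_stub(path):
    """Stand-in for a slow loader; nothing is really read from disk."""
    global call_count
    call_count += 1
    return f"data from {path}"

first = load_data_stub("google_data.xlsx")    # body runs, result is stored
second = load_data_stub("google_data.xlsx")   # same argument: served from the cache
third = load_data_stub("other_data.xlsx")     # new argument: body runs again

print(call_count)
```

The second call never touches the function body, which is the same reason the cached app refreshes faster.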

Interactive Streamlit

On the main body of the web page, we’ll add a “multiselect box” to show all the available countries from the data. By calling the unique() method on the Country column (a pandas Series), we can get all the unique country names.

country_select = st.multiselect('What countries do you want to look at?',data['Country'].unique())

Save the above code, and we have a dropdown-style box that actually can take on multiple values at once.

You don’t want to miss what happens next…it’s a game-changer!

Multiselect box

Let’s display the multiselect box values on the web page. Also upon checking the object’s type, it’s a list.

st.write(country_select)
st.write(type(country_select))

Now, edit values inside the multiselect box by selecting and removing countries.

Do you see the list on the webpage also changes as you edit the menu???

This is super awesome considering we haven’t really done much coding yet. With literally one line of code (the st.multiselect), we get this kind of interactive feature already!

Not just the multiselect box, every streamlit component behaves like this! This means with one line of code, we allow users to pass input into the app.

I know it’s not an apples-to-apples comparison, but I can’t help comparing this to plotly dash, where we have to write not only the app layout but also the callback functions to achieve the same thing.

streamlit interactive multiselect menu

Add A Plotly Chart On Streamlit

streamlit is compatible with many charting libraries, but my favorite is still plotly – easy to use and fully interactive, and I think they look much cooler than static charts.

We’ll add a daily visitor count line chart using plotly. It’s fairly easy to calculate the worldwide daily visitor count, simply groupby the Date column, and sum up the number of visitors.

A friendly reminder on plotly chart – plotly doesn’t automatically sort the data for us so we have to make sure the data is properly sorted before making the plot. Otherwise, the plot will be messed up. If you see a messed-up and out of order plot, sort it first. Usually, that will fix the problem.
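
Here is a quick sketch of the "sort it first" advice on a tiny made-up table. The column names match the tutorial data, but the numbers are invented.

```python
import pandas as pd

# Made-up daily totals, deliberately out of order.
daily = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-03", "2021-01-01", "2021-01-02"]),
    "Users": [30, 10, 20],
})

# Check whether the dates are already in ascending order.
was_sorted = bool(daily["Date"].is_monotonic_increasing)

# Sort by date so a line chart connects the points left to right.
daily_sorted = daily.sort_values("Date").reset_index(drop=True)

print(was_sorted)
print(daily_sorted)
```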

Use st.plotly_chart to display a plotly figure object. use_container_width = True ensures that the plot can stretch when space permits so we can have a better view.

import plotly.express as px

daily_visitor = data.groupby('Date', as_index=False).sum()
daily_visitor_fig = px.line(daily_visitor, x= 'Date', y ='Users', title='Total visitor count 2021')
st.plotly_chart(daily_visitor_fig, use_container_width = True)

Save the code, the web page should automatically refresh, then we’ll see the below chart, which is a fully functional and interactive plotly chart.

A plotly chart in streamlit

Link User Input And Chart Update (callback)

Note our plotly chart currently is “static” and only can show worldwide daily visitor count. We’ll link it up with a user input – the multiselect box, such that we can control what countries to show on the plot using the multiselect box.

By default, when nothing is selected in the multiselect box, or when the len(country_select) is 0, then we’ll show the worldwide daily count.

Otherwise, we can take the names in the country_select list, pass them into the pandas isin method for filtering. The end result is we keep only those countries in the country_select list and drop everything else.
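
To see what the isin filter keeps, here is a minimal sketch on a made-up table. In the app, country_select comes from the multiselect box; here I just hard-code a list.

```python
import pandas as pd

# Made-up rows, for illustration only.
data = pd.DataFrame({
    "Country": ["India", "Canada", "Brazil", "India"],
    "Users": [5, 2, 3, 4],
})

country_select = ["India", "Brazil"]  # stands in for the multiselect value

# isin returns True for rows whose Country is in the list.
mask = data["Country"].isin(country_select)

# Keep only the matching rows.
kept = data.loc[mask]

print(mask.tolist())
print(kept)
```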

Note here after applying the isin filter, I had to sort the data by Date in order for plotly to make a chart properly.

if len(country_select) == 0:
    daily_visitor = data.groupby('Date', as_index=False).sum()
    daily_visitor_fig = px.line(daily_visitor, x= 'Date', y ='Users', title='Total visitor count 2021')
else:
    daily_visitor = data.loc[data['Country'].isin(country_select)].sort_values('Date')
    daily_visitor_fig = px.line(daily_visitor, x='Date', y='Users', color='Country', title=f'{country_select} visitor count 2021')
st.plotly_chart(daily_visitor_fig, use_container_width = True)

Save the code, after the app refreshes, we get this:

user input updates plotly chart on streamlit

Streamlit Column Layout

Next, I want to add another chart “total hours spent”, and show it with the total visitors count side-by-side.

Creating simple layouts in streamlit is also a breeze! The st.columns() method will insert containers laid out as side-by-side columns for us. I’m so thankful that I don’t have to worry about all the <div> and CSS styling.

Let’s make two containers side-by-side (2 columns).

col1, col2 = st.columns(2)

To place an element into a container/column, simply use a context manager. Below we place the total visitor count chart into “column 1”.

with col1: ##visitor count
    if len(country_select) == 0:
        daily_visitor = data.groupby('Date', as_index=False).sum()
        daily_visitor_fig = px.line(daily_visitor, x= 'Date', y ='Users', title='Total visitor count 2021')
    else:
        daily_visitor = data.loc[data['Country'].isin(country_select)].sort_values('Date')
        daily_visitor_fig = px.line(daily_visitor, x='Date', y='Users', color='Country', title=f'{country_select} visitor count 2021')
    st.plotly_chart(daily_visitor_fig, use_container_width = True)

Then, “column 2” has the total hours spent chart. The chart is fairly simple to make: copy/paste the total visitor chart code, then replace the column name “Users” with “Hours Spent”.

If you want, feel free to make a function to do this so you don’t write the same code multiple times.

with col2: ##hours spent
    if len(country_select) == 0:
        daily_visitor = data.groupby('Date', as_index=False).sum()
        daily_visitor_fig = px.line(daily_visitor, x= 'Date', y ='Hours Spent', title='Total hours spent 2021')
    else:
        daily_visitor = data.loc[data['Country'].isin(country_select)].sort_values('Date')
        daily_visitor_fig = px.line(daily_visitor, x='Date', y='Hours Spent', color='Country', title=f'{country_select} hours spent 2021')
    st.plotly_chart(daily_visitor_fig, use_container_width = True)

Save the code and reload, then we get this:

streamlit layout – columns
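
As mentioned earlier, the chart logic repeated in the two columns could be folded into one function. Below is one possible sketch. The helper name daily_metric and the sample data are my own inventions, and I only prepare the data and the px.line arguments here so the snippet runs without streamlit or plotly.

```python
import pandas as pd

def daily_metric(data, country_select, metric):
    """Return (frame, px_line_kwargs) for one chart.

    Hypothetical helper, not part of the original app.
    """
    if len(country_select) == 0:
        # Nothing selected: one worldwide total per day.
        frame = data.groupby("Date", as_index=False)[metric].sum()
        kwargs = {"x": "Date", "y": metric, "title": f"Total {metric} 2021"}
    else:
        # Keep the chosen countries and sort so the lines draw in date order.
        frame = data.loc[data["Country"].isin(country_select)].sort_values("Date")
        kwargs = {"x": "Date", "y": metric, "color": "Country",
                  "title": f"{country_select} {metric} 2021"}
    return frame, kwargs

# Tiny made-up dataset for a quick check.
sample = pd.DataFrame({
    "Country": ["India", "Canada", "India", "Canada"],
    "Date": pd.to_datetime(["2021-01-02", "2021-01-02", "2021-01-01", "2021-01-01"]),
    "Users": [5, 2, 4, 1],
    "Hours Spent": [1.0, 0.5, 2.0, 0.25],
})

world_frame, world_kwargs = daily_metric(sample, [], "Users")
india_frame, india_kwargs = daily_metric(sample, ["India"], "Hours Spent")
print(world_frame)
print(india_frame)
```

In the app, each column would then shrink to something like frame, kwargs = daily_metric(data, country_select, 'Users') followed by st.plotly_chart(px.line(frame, **kwargs), use_container_width=True).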

Slider As An Input

I got curious and wanted to know how many countries are in the data. So,

st.write(len(data['Country'].unique()))

A whopping 214 countries!! (plz insert confetti here, of course in your mind)

I really appreciate you guys taking the time to read what I have to share, it means a lot to me. Knowing that I can help hundreds of thousands of people on a global scale is what motivates me the most to continue this journey. I will do my best to continue to deliver high-quality tutorials for you guys & girls❤️!

Back to the tutorial.

We’ll add a slider to the web app, which lets users pick a number by dragging a handle. It does the same job as a simple text box but looks much cooler. Called with only a label, st.slider ranges from 0 to 100 and starts at 0.

top_country = st.slider('select number of top countries to look at:')

Since we assigned the slider to a variable named top_country, whenever we move the slider to set a new value, the value of top_country changes accordingly.

Next, let’s add a bar chart to show the top countries in terms of the number of visitors.

This time we aggregate the visitor count by using groupby on Country column. Then we can rank and find the largest countries with the nlargest method. The slider value top_country allows us to control the number of countries we want to display on the chart.
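
The ranking step can be tried on a small made-up table first. Here top_country is hard-coded to stand in for the slider value, and the numbers are invented.

```python
import pandas as pd

# Made-up visitor rows, for illustration only.
data = pd.DataFrame({
    "Country": ["United States", "India", "Canada", "India", "United States", "Brazil"],
    "Users": [50, 40, 10, 30, 25, 15],
})

top_country = 2  # stands in for the slider value

# Total visitors per country.
totals = data.groupby("Country")["Users"].sum()

# Keep the top_country largest totals, biggest first.
top_n = totals.nlargest(top_country)

# With the slider at 0, nlargest returns an empty Series, so the chart is empty.
none_selected = totals.nlargest(0)

print(top_n)
```

The app code below does the same thing on the real data, with top_country coming from the slider.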

top_n_countries = data.groupby('Country').agg({'Users':'sum'})['Users'].nlargest(top_country)
top_n_fig = px.bar(top_n_countries, x=top_n_countries.index, y='Users', color=top_n_countries.index)
st.plotly_chart(top_n_fig, use_container_width = True)

streamlit slider controls a plotly bar chart for country ranking

Not much surprise here to be honest, as both the US and India are large English-speaking countries population-wise.

There’s no way to display all countries in this bar chart. I mean technically there is a way, but we won’t be able to see anything as 214 countries have to squeeze inside this little chart. So why not plot all the countries on a map?!

Plotly Choropleth Map

Here comes the choropleth map, a kind of map composed of colored areas. It’s usually used to represent some quantity variables on a map. Just perfect for our use case – we’ll use colors to represent the number of visitors from each country.

Again we can use plotly to make the chart. A few notes on the arguments inside px.choropleth method:

  • locations indicates which column in the data holds the geographic information; in our case it’s the Country column
  • locationmode can be one of three values: ‘ISO-3’, ‘USA-states’, or ‘country names’
  • color is the column whose values drive the coloring; here it is the number of visitors (Users), since a choropleth map uses different colors to represent quantity
  • color_continuous_scale = px.colors.sequential.Rainbow is a built-in color scale. This link has all the built-in scales to choose from: https://plotly.com/python/builtin-colorscales/

# numeric_only=True keeps the Date column out of the sum (newer pandas refuses to sum datetimes)
data_by_country = data.groupby('Country', as_index=False).sum(numeric_only=True)
map_fig = px.choropleth(data_by_country, locations = 'Country',
                        locationmode = 'country names',
                        color = 'Users',
                        color_continuous_scale = px.colors.sequential.Rainbow)
st.plotly_chart(map_fig, use_container_width = True)

I like colorful maps, and I like the fact that we have a visitor from Antarctica even more!

Code Block

Since the streamlit framework is built for machine learning and data science teams, they made sure code can be shared easily on the web app.

With another simple one-liner, we can display code on the web app. The language argument sets which language is used for syntax highlighting.

st.code(code, language='python')

Please find the full code of the tutorial below. Now go and make your own awesome data visualization dashboard/web apps!

code = r"""
import streamlit as st
import pandas as pd
import plotly.express as px

@st.cache
def load_data():
    df = pd.read_excel(r'C:\Users\jay\Desktop\PythonInOffice\2021_traffic_overview\google_data.xlsx',
                       parse_dates=['Date'])
    df['Hours Spent']  = df['Sessions'] * df['Avg. Session Duration'] / 3600
    return df


st.title('pythoninoffice.com 2021 traffic overview')
data = load_data()
st.dataframe(data)

## side bar
num_visitors = data['Users'].sum()
hours_spent = data['Hours Spent'].sum()
st.sidebar.write('2021 website traffic')
st.sidebar.metric('Total visitors', f'{num_visitors:,}',100)
st.sidebar.metric('Total hours spent', f'{hours_spent:,.2f}',-100)

## main body
country_select = st.multiselect('What countries do you want to look at?',data['Country'].unique())
#st.write(country_select)
#st.write(type(country_select))
col1, col2 = st.columns(2)

with col1: ##visitor count
    if len(country_select) == 0:
        daily_visitor = data.groupby('Date', as_index=False).sum()
        daily_visitor_fig = px.line(daily_visitor, x= 'Date', y ='Users', title='Total visitor count 2021')
    else:
        daily_visitor = data.loc[data['Country'].isin(country_select)].sort_values('Date')
        daily_visitor_fig = px.line(daily_visitor, x='Date', y='Users', color='Country', title=f'{country_select} visitor count 2021')
    st.plotly_chart(daily_visitor_fig, use_container_width = True)

with col2: ##hours spent
    if len(country_select) == 0:
        daily_visitor = data.groupby('Date', as_index=False).sum()
        daily_visitor_fig = px.line(daily_visitor, x= 'Date', y ='Hours Spent', title='Total hours spent 2021')
    else:
        daily_visitor = data.loc[data['Country'].isin(country_select)].sort_values('Date')
        daily_visitor_fig = px.line(daily_visitor, x='Date', y='Hours Spent', color='Country', title=f'{country_select} hours spent 2021')
    st.plotly_chart(daily_visitor_fig, use_container_width = True)

##slider
top_country = st.slider('select number of top countries to look at:')

##top n largest
top_n_countries = data.groupby('Country').agg({'Users':'sum'})['Users'].nlargest(top_country)
top_n_fig = px.bar(top_n_countries, x=top_n_countries.index, y='Users', color=top_n_countries.index)
st.plotly_chart(top_n_fig, use_container_width = True)

##choropleth map
data_by_country = data.groupby('Country', as_index=False).sum(numeric_only=True)
map_fig = px.choropleth(data_by_country, locations = 'Country',
                        locationmode = 'country names',
                        color = 'Users',
                        color_continuous_scale = px.colors.sequential.Rainbow)
st.plotly_chart(map_fig, use_container_width = True)"""
