Last Updated on July 14, 2022 by Jay
In this tutorial, we’ll learn exploratory data analysis (EDA) with a Python library called
dtale. I first came across this library back in 2019. It is now one of my go-to libraries for exploratory data analysis (EDA). You can use this library to almost replace Excel entirely because it’s got a spreadsheet look & feel, plus all the powerful stuff that Python offers.
pip install dtale
Dtale in IDLE
Dtale works fine in either the traditional IDE or a notebook-style environment. Regardless of our choice of IDE, it’s very likely that we’ll eventually need to use the “full-screen mode” as opposed to a tiny window embedded in the notebook.
Let’s load up some data in IDLE and check out this awesome library. We are going to use the gapminder dataset.
import plotly.express as px import dtale df=px.data.gapminder() d = dtale.show(df)
After running the dtale.show(df), you should see something like this in the output:
Essentially dtale creates a flask server with the address shown above. NOTE that you’ll have a different URL than mine since the first part DESKTOP-T70R6NO is just the computer name.
To open the data in dtale, we can either:
- Copy & Paste the URL into a browser
Dtale in Jupyter Notebook (NOT recommended)
We’ll use the same code to load the gapminder dataset and fire up dtale UI. Note that with Jupyter, we can simply type “d” to show the dtale UI inside the notebook. Of course, if we have a large dataset, we’ll probably want to use the full-screen instead of the small window inside the notebook. So
d.open_browser() will open the dtale UI from another browser page.
“Spreadsheet in a browser”
The dtale user interface doesn’t only look like a spreadsheet inside a browser, but it also has a lot of the functionalities of a spreadsheet program.
There are two ways to access the menu. First is by clicking on the arrow button on the top left corner.
The second way is by hovering the mouse slightly above the column names, then the menu will appear above the column header.
dtale supports “editing in place”, meaning that we can just double click on a cell and we are in editing mode. The changes we make in the cells will be automatically saved & reflected in the dataframe. Just note that pressing Ctrl+Z will not undo the data changes.
dtale UI has a feature that’s similar to Excel’s conditional formatting. Head to the menu Highlight-> Highlight Range.
Then, we can select a column (or all columns) and set criteria. In this case, I’m setting the gdpPercap to be > 5000, then highlight in green. Click on Apply. We’ll see all the eligible cells are now highlighted in green!
Operation on columns
By clicking on a column name, another menu will appear and we can do a lot of common tasks on the column, for example, sort, filter, hide, delete, rename, etc.
We can also click on the Describe to show statistics just for the selected column. We’ll get metrics such as data count, max/min, average, etc, and a nice box plot!
We can even create simple calculated columns in the dtale UI. Head to the main menu (by clicking on the arrow key) then select Dataframe Functions.
Let’s try to create a new column to display the total GDP. We’ll name this column as “GDP”, then have the Numeric selected. In the Operation section, choose “Multiple”, then the first Col as pop, and second Col as gpdPercap. At the bottom of the screen, it also shows a preview of the code to help us understand what we are about to do.
Hit “Create” button, then we’ll see the GDP column appear in our spreadsheet.
That was just a simple example, and as you see there are many other things we can do in the menu, feel free to play around with it!
Head to menu Actions -> Summarize Data, we can find several ways to summarize data, including the pivot table.
The GroupBy is very useful, it’s essentially the same as the
pandas.groupby function, which I covered in detail in this tutorial.
Let’s find out the average age for each continent by using the GroupBy feature. We’ll GroupBy the column “continent” and “year”. Then choose “By Function” with Func: Mean, and the value “lifeExp”. Again, we can see the actual Python code at the bottom of the screen to get a sense what it’s trying to do under the hood. Because I still want to keep my original dataframe, so I’ll choose the New Instance output.
Next, we are going to use the Pivot table function to reshape the data in the same way. As we can see, the Python code is at the bottom of the window to help us understand what’s going on. Click on the “Execute” button to create the new dataframe.
In the previous example where we created new data instances for GroupBy and Pivot. Now go back to the GroupBy and Pivot dataframe webpages, you’ll see something like the below in their URL. The numbers represent a different dataframe instance in the Python environment.
To see how many active instances we currently have, go to the dtale menu, and click on “Instances”. A table should pop up with all the active instances like the below screenshot.
Note that even if we close a browser page for a given instance, it will still exist in the Python environment (Flask server) and we can always bring it back up by clicking on the “Preview” button.
Think about each instance being a separate dataframe that we create in Python. So it’s important to keep in mind which dataframe we want to use before performing any action. In our case, there are 5 active dataframes. If we want to work with the original dataset, we need to open the #1 instance.
To delete an instance, press the recycle bin icon in front of an instance.
Let’s start with instance #1, i.e. the original dataset. We’ll plot the data points to get a better sense of them. Head to the menu Visualize -> Charts, then a new webpage will show up with all the charting controls. If you pay a little attention to the chart, you’ll find out that dtale uses plotly as the charting library.
We’ll draw a line chart to show the average life expectancy of each continent over the years. We’ll have the following settings in the charting page:
- X: year
- Y: lifeExp
- Group: continent
- Aggregation: Mean
You’ll notice that as we input values, the chart starts to load & update to reflect user inputs. This charting system is likely based on
plotly dash. I didn’t look at the
dtale source code, but I’m 99.420% certain this is
There are many other plot types, so feel free to play with other ones!
Head to the menu Visualize – Correlations, then we can create this correlation matrix with just a few clicks.
Now that we we have a pivot table, and we can export the table into a csv file. Head to the menu, hover over the dtale icon, Export -> csv.
- All-in-one suite, EDA, interact with data, charting, data editing
- A combination of pandas_profiling, sweetviz, pandasgui
- By not means a lightweight library. Its extensive functionalities require a lot of libraries
- Can be slow to load
- steeper learning cuve