How to Make a WordCloud in Python

Last Updated on July 14, 2022 by Jay

This tutorial shows how to make a wordcloud in Python. A wordcloud is a type of visualization for text data. The image below is a wordcloud: some words are bigger and bolder, while others are smaller. Usually, the more often a word appears in the data, the bigger it appears in the visualization.
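
To make the link between word frequency and word size concrete, here is a minimal sketch that counts words using only Python's standard library. The sample sentence and variable names are made up for illustration. The wordcloud library does its own tokenizing and filtering, so its counts may differ.

```python
from collections import Counter

# Hypothetical sample text, not the tutorial's data
sample_text = "energy vehicle energy year vehicle energy"

# Split on whitespace and count how often each word appears
word_counts = Counter(sample_text.lower().split())

# The most frequent words are the ones a wordcloud would draw largest
for word, count in word_counts.most_common():
    print(word, count)
```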

In the following wordcloud, the top three keywords are: “vehicle”, “energy” and “year”. Let’s make it now.

[Image: wordcloud python]

Libraries

Install the following libraries using pip:

pip install wordcloud numpy matplotlib pillow

Wordcloud in Python

The text data is an excerpt from Tesla’s 2021 impact report that describes the company’s goals. For your convenience, I saved a copy of the text and the source code for this tutorial in this GitHub repository: https://github.com/pythoninoffice/blog_example_code/blob/main/wordcloud.ipynb

from wordcloud import WordCloud
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

text_data = '......' # see link to the source code

The wordcloud library is quite easy to use: it creates a wordcloud visualization in one line of Python code (not counting the code to display it).

Note that plt.axis("off") in the code below hides the axes. This is optional and purely cosmetic.

Also note that plt.imshow() is what draws the wordcloud onto the figure. In a Jupyter notebook that is enough to display it; in a standalone script, you also need to call plt.show() afterwards.

wc = WordCloud().generate(text_data)
plt.axis('off')
plt.imshow(wc)

The color and position of each word are randomized each time we run WordCloud().generate(). Below are a few examples:

[Image: wordcloud examples]

Special shapes

To spice up the wordcloud, we can organize the words into any shape instead of just a rectangle.

I suggest using a black and white image for the best result, because it needs no extra processing. I found an image of the Apple logo, but you are free to use whatever image you want.

We’ll use the Pillow library to read the image into Python. To a computer, an image is just a matrix of integers ranging from 0 to 255. The numpy library conveniently converts a Pillow image object into an np.array object. Note that each [255, 255, 255] in the output below is the RGB color value of one pixel: [0, 0, 0] represents black, and [255, 255, 255] represents white.

# img_url is the path to your image file
img_mask = np.array(Image.open(img_url))
img_mask
array([[[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],
       
       ...,
   
       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]]], dtype=uint8)
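
To see this conversion on something small enough to read in full, the sketch below builds a 2x2 white image in memory, paints one pixel black, and converts it to an array. The image is made up for illustration and is not the logo used in this tutorial.

```python
import numpy as np
from PIL import Image

# Create a tiny 2x2 RGB image filled with white
tiny_img = Image.new("RGB", (2, 2), color=(255, 255, 255))

# Paint the top-left pixel black; putpixel takes (x, y) coordinates
tiny_img.putpixel((0, 0), (0, 0, 0))

# Convert to a numpy array with shape (height, width, channels)
pixels = np.array(tiny_img)
print(pixels.shape)   # (2, 2, 3)
print(pixels.dtype)   # uint8
print(pixels[0, 0])   # [0 0 0]
print(pixels[1, 1])   # [255 255 255]
```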
[Image: apple logo black and white image]

In the image above, the apple shape is black and the background is white, which is exactly what we want. The white area is the “mask”: the wordcloud library will not draw anything there, and will instead arrange the words inside the apple logo shape.
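
If your image is not purely black and white, for example because of gray pixels along the edges, one option is to threshold it first so that every pixel is either 0 or 255. Here is a minimal sketch on a made-up 3x3 grayscale array. The cutoff of 128 is an arbitrary assumption that you may need to tune for your own image.

```python
import numpy as np

# Hypothetical grayscale pixel values (0 = black, 255 = white)
gray_pixels = np.array(
    [[250, 240, 30],
     [200, 10, 20],
     [255, 128, 0]],
    dtype=np.uint8,
)

# Assumed cutoff: anything brighter becomes white, the rest becomes black
threshold = 128
clean_mask = np.where(gray_pixels > threshold, 255, 0).astype(np.uint8)
print(clean_mask)
```

After thresholding, only pure white (255) entries remain as the masked-out area, and the black (0) entries form the shape the words are drawn in.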

# Note: when a mask is given, width and height are ignored and the mask's shape is used
wc = WordCloud(width=1600, height=1600, mask=img_mask, background_color='white').generate(text_data)

plt.figure(figsize=[10,10])
plt.axis("off")
plt.imshow(wc)
[Image: wordcloud-python]

We can also add an outline (contour) around the shape if you think it isn’t obvious enough. Simply pass the contour_width and contour_color arguments to the WordCloud() constructor:

wc = WordCloud(width=1600, height=1600, mask=img_mask, background_color='white',
               contour_width=1,
               contour_color='red'
              ).generate(text_data)

plt.figure(figsize=[10,10])
plt.axis("off")
plt.imshow(wc)
[Image: wordcloud-python-with-contour]
