How To Generate Random Data In Python

Last Updated on July 14, 2022 by Jay

This tutorial shows how to easily generate random and unique data in Python using a library called faker.

Install Library

First, let’s start off by installing the library using pip.

pip install faker

Generate Random Data In Python

To generate random data with the Python faker library, all we need is a Faker object. It lets us generate random names, addresses, and even (fake, of course) credit card numbers. With an add-on provider, it can also generate airline information!

from faker import Faker
fake = Faker()

fake.name()
'Charles Morgan'

fake.address()
'0541 Robert Rapids Apt. 512\nNorth Tracey, MO 57795'

fake.credit_card_number()
'376541772271895'

Reproducible Random Data

Note that each time we run the above code, we’ll get different results due to the library’s random nature. So you’ll get different names when running the code on your end.

Like many random generators, faker accepts a seed so that other people can reproduce the results. Run the two lines below to reproduce the result shown (the exact name may differ between faker versions):

Faker.seed(0)
fake.name()

'Norma Fisher'

Random And Unique Data

The Faker object has an attribute .unique, which we can use to help generate unique data for the lifetime of a Faker instance.

Let’s test this. The code below first creates a list of 10,000 random names using a list comprehension, then converts the list into a set, which removes any duplicate values. The set still contains 10,000 elements, so all 10,000 generated names are unique.

rand_names = [fake.unique.name() for i in range(10000)]
unique_num = len(set(rand_names))
unique_num
10000

Foreign Random Data

Faker can generate not only English data but also data in other languages and locales. By default, the locale in faker is set to US English (en_US). We can check that by reading the .locales attribute.

fake.locales
['en_US']

To use multiple locales in the random generator, we just need to pass a list of locales to the Faker() constructor.

from faker import Faker

locale_list = ['de_DE', 'cs_CZ', 'en_US', 'fr_FR', 'it_IT', 'ja_JP', 'hi_IN', 'zh_CN']
fake2 = Faker(locale_list)
[fake2.name() for i in range(20)]

['鈴木 治',
 'Dipl.-Ing. Daniela Schottin B.Eng.',
 'Victor du Meunier',
 'ईश लोदी',
 'Tabea Zobel MBA.',
 'हुसैन अग्रवाल',
 'Scott Henson',
 'Samantha Ramos',
 'Nicoletta Gottardi-Mascagni',
 'Balthasar Bauer B.Sc.',
 '李鑫',
 'Mariusz Drub-Henk',
 'Brian Wagner',
 'Adamo Porzio',
 'Augustin Maréchal',
 'Charles de Roy',
 'Jared Harris',
 '遠藤 修平',
 'Dana Němcová',
 'Brunhild Scholtz']

What Kind Of Random Data Is Available?

So how do we find out what kinds of random data faker can generate? The answer is that it’s quite a long list, which we can see by calling fake.__dir__() (or, equivalently, dir(fake)). There are ~300 entries, so take your time and see if you find anything interesting!

fake.__dir__()

Extended Random Data

Although faker already provides a wide variety of random data, a few cool dudes on the Internet went above and beyond by extending what Faker can generate. These extensions are called “providers” and act as add-ons to the base Faker library, so we need to install them as additional libraries before we can use them. Below are a few interesting ones:

| Provider name | Description | Library name |
| --- | --- | --- |
| Airtravel | Airport names, airport codes, and flights. | faker_airtravel |
| Microservice | Fake microservice names | faker_microservice |
| Music | Music genres, subgenres, and instruments. | faker_music |
| Vehicle | Fake vehicle information includes Year Make Model | faker_vehicle |

Let’s take a look at the faker_airtravel specifically and see how it works. Again, we use pip to install it.

pip install faker_airtravel

First, we need to use the fake.add_provider() method to add the provider to the Faker object. Then we can call the .airport_object() method, which doesn’t exist in the base Faker library.

from faker import Faker
from faker_airtravel import AirTravelProvider

fake = Faker()
fake.add_provider(AirTravelProvider)

fake.airport_object()
{'airport': "Nice-Cote d'Azur airport",
 'iata': 'NCE',
 'icao': 'LFMN',
 'city': 'Nice',
 'state': "Provence-alpes-cote d'Azur",
 'country': 'France'}

In order to find what random data is available in the AirTravelProvider object, we can use the dir trick again:

dir(AirTravelProvider)

Generate A Random Pandas Dataset

Let’s generate some random data for flight passengers using the faker and faker_airtravel libraries. It’s a pretty convenient way to generate random data!

import pandas as pd
from faker import Faker
from faker_airtravel import AirTravelProvider

fake = Faker()  # instantiate the Faker object
fake.add_provider(AirTravelProvider)  # add the additional provider

df = pd.DataFrame({
    'passenger_name': [fake.name() for i in range(20)],
    'home_address': [fake.address() for i in range(20)],
    'id': [fake.ssn() for i in range(20)],
    'profession': [fake.job() for i in range(20)],
    'airline': [fake.airline() for i in range(20)],
    'boarding_time': [fake.date_time() for i in range(20)],
    'origin': [fake.city() for i in range(20)],
    'destination': [fake.city() for i in range(20)],
})

[Image: Random Data Generated In Python]

Additional Resources

Looping with list comprehension
