Last Updated on July 14, 2022 by Jay
This tutorial shows how to easily generate random and unique data in Python using a library called faker.
Install Library
First, let’s start off by installing the library using pip.
pip install faker
Generate Random Data In Python
Then, to generate random data using the Python faker library, all we need is a Faker object, which will let us generate random names, addresses, and even (of course fake) credit card numbers and airline information!
from faker import Faker
fake = Faker()
fake.name()
'Charles Morgan'
fake.address()
'0541 Robert Rapids Apt. 512\nNorth Tracey, MO 57795'
fake.credit_card_number()
'376541772271895'
Reproducible Random Data
Note that each time we run the above code, we’ll get different results due to the library’s random nature. So you’ll get different names when running the code on your end.
Like many random generators, Faker accepts a seed so that other people can reproduce the results. Run the two lines below to reproduce the following result (the exact output can differ between faker versions):
Faker.seed(0)
fake.name()
'Norma Fisher'
Random And Unique Data
The Faker object has an attribute .unique, which we can use to help generate unique data for the lifetime of a Faker instance.
Let’s test this. The code below first creates a list of 10,000 random names using a list comprehension, then converts the list into a set, which removes any duplicate values. Since the set still contains 10,000 items, all of the generated names are unique.
rand_names = [fake.unique.name() for i in range(10000)]
unique_num = len(set(rand_names))
unique_num
10000
Foreign Random Data
Faker can generate not only English data but also data in other languages and locales. By default, the locale in faker is set to be US/English. We can check that by calling the .locales attribute.
fake.locales
['en_US']
In order to add multiple locales in the random generator, we just need to pass a locale list into the Faker() constructor.
from faker import Faker
locale_list = ['de_DE', 'cs_CZ', 'en_US', 'fr_FR', 'it_IT', 'ja_JP', 'hi_IN', 'zh_CN']
fake2 = Faker(locale_list)
[fake2.name() for i in range(20)]
['鈴木 治',
'Dipl.-Ing. Daniela Schottin B.Eng.',
'Victor du Meunier',
'ईश लोदी',
'Tabea Zobel MBA.',
'हुसैन अग्रवाल',
'Scott Henson',
'Samantha Ramos',
'Nicoletta Gottardi-Mascagni',
'Balthasar Bauer B.Sc.',
'李鑫',
'Mariusz Drub-Henk',
'Brian Wagner',
'Adamo Porzio',
'Augustin Maréchal',
'Charles de Roy',
'Jared Harris',
'遠藤 修平',
'Dana Němcová',
'Brunhild Scholtz']
What Kind Of Random Data Is Available?
So how do we find out what kinds of random data faker can generate? It’s quite a long list, which we can see by calling dir() on the Faker object (this is the same as calling fake.__dir__()). There are around 300 entries, so take your time and see if you find anything interesting!
dir(fake)
Extended Random Data
Although faker already provides a wide variety of random data, a few folks on the Internet went above and beyond by extending what Faker can generate. These extensions are called “providers” and act as add-ons to the base Faker library, and each one has to be installed as a separate library. Below are a few interesting ones:
Provider name | Description | Library Name |
---|---|---|
Airtravel | Airport names, airport codes, and flights. | faker_airtravel |
Microservice | Fake microservice names | faker_microservice |
Music | Music genres, subgenres, and instruments. | faker_music |
Vehicle | Fake vehicle information, including year, make, and model. | faker_vehicle |
Let’s take a look at the faker_airtravel specifically and see how it works. Again, we use pip to install it.
pip install faker_airtravel
First, we need to call the add_provider() method on the Faker object to register the provider. Then we are able to call the .airport_object() method, which doesn’t exist in the base Faker library.
from faker import Faker
from faker_airtravel import AirTravelProvider
fake = Faker()
fake.add_provider(AirTravelProvider)
fake.airport_object()
{'airport': "Nice-Cote d'Azur airport",
'iata': 'NCE',
'icao': 'LFMN',
'city': 'Nice',
'state': "Provence-alpes-cote d'Azur",
'country': 'France'}
In order to find what random data is available in the AirTravelProvider object, we can use the dir trick again:
dir(AirTravelProvider)
Generate A Random Pandas Dataset
Let’s generate some random data for flight passengers using the faker and faker_airtravel libraries. It’s a pretty convenient way to generate random data!
import pandas as pd
from faker import Faker
from faker_airtravel import AirTravelProvider
fake = Faker()  # instantiate the Faker object
fake.add_provider(AirTravelProvider)  # add the additional provider
df = pd.DataFrame({
    'passenger_name': [fake.name() for i in range(20)],
    'home_address': [fake.address() for i in range(20)],
    'id': [fake.ssn() for i in range(20)],
    'profession': [fake.job() for i in range(20)],
    'airline': [fake.airline() for i in range(20)],
    'boarding_time': [fake.date_time() for i in range(20)],
    'origin': [fake.city() for i in range(20)],
    'destination': [fake.city() for i in range(20)],
})