When coding in any programming language, you often need sample data to test functionality, and creating a sample dataset by hand is time-consuming and tedious. This tutorial explains how to quickly create a dummy dataset in Python using the Faker library.
Faker is a Python library that generates random fake data, which makes it easy to create dummy data. It can be installed with the pip command, as shown below:
pip install faker
Let's now look at some examples of this library before creating the dummy dataset. The code below will randomly return a fake name, address, and text:
Example:
from faker import Faker
fake = Faker()
print(fake.name())
print(fake.address())
print(fake.text())
Program Output:
Eric Miles
65822 Grant Center
Maciasport, UT 21639
Writer most movie politics you hit one. Store machine ahead push yourself key give. Me seek ago practice visit list. Feeling religious general market car least water past.
This code gives a different result each time it runs. Now let us see how to create sample data for a dummy dataset using Python.
Create Dummy Dataset Using Python
Example:
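from faker import Faker
import pandas as pd
fake = Faker()
# Generate 50 fake profiles; each profile is a dict mapping field names to values
data = [fake.profile() for _ in range(50)]
# Each dict becomes one row of the DataFrame
data = pd.DataFrame(data)
# Show the first five rows
print(data.head())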
Program Output:
                                     job                company          ssn  \
0                        Event organiser             Morton PLC  261-12-2709
1                 Operational researcher          Moore-Johnson  376-90-4804
2          Clinical molecular geneticist  Dunn, Hill and Brooks  364-15-5960
3  Product/process development scientist          Juarez-French  891-08-3314
4             Research scientist (maths)        Berger and Sons  316-28-2550

                                       residence           current_location  \
0    3631 Timothy Falls\nAlexanderport, LA 24142  (-66.2562555, 142.468486)
1         8178 Tran Lodge\nChapmanland, KY 26585   (-83.5505865, 74.518738)
2        9487 King Estates\nFoleyhaven, SD 68166  (-54.688459, -159.129415)
3  90156 Gomez Drives\nLake Tamarastad, ME 26521  (-40.2652345, -99.965372)
4               PSC 4460, Box 2545\nAPO AE 22327    (-53.296416, 87.066028)

   blood_group                                            website  \
0           A-  [https://baker.net/, https://www.burgess.com/,...
1           A+          [https://lawson.com/, https://baker.biz/]
2           A-  [http://ho.org/, https://espinoza.com/, http:/...
3           A+                                  [http://day.org/]
4          AB-                         [https://www.sanchez.com/]

          username                name  sex  \
0          david32         Mark Bishop    M
1       jasonbauer  Christopher Parker    M
2  andersoncarolyn        Timothy Best    M
3        timothy09      Audrey Schultz    F
4     tuckerrhonda    William Crawford    M

                                             address  \
0     19411 Don Shores\nNorth Margaretside, MT 79078
1  8357 Graves Oval Apt. 941\nSouth Valerie, ME 2...
2  3649 Ayers Ridge Suite 085\nEast Amanda, LA 63091
3           39363 Gardner Rue\nWest Amanda, DE 47391
4  90501 Michelle Mission\nWest Natashabury, FL 1...

                mail   birthdate
0  [email protected]  1981-06-02
1  [email protected]  1952-08-15
2  [email protected]  1990-07-30
3  [email protected]  1912-05-11
4  [email protected]  2001-12-24
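In the above example code, each call to the fake.profile() method returns a dictionary of 13 randomly generated fields, so the resulting DataFrame has 50 rows and 13 columns, and the values differ on every run.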