When writing code in any programming language, functionality often needs to be tested with sample data, and creating a sample dataset by hand is time-consuming and tedious. This tutorial explains how to quickly create a dummy dataset in Python using the Faker library.

The Faker library generates random fake data, which makes it easy to create dummy data in Python. It can be installed with the pip command, as shown below:

pip install faker

Let's look at a few examples of this library before creating the dummy dataset. The code below generates a random fake name, address, and block of text:


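from faker import Faker
fake = Faker()
# Print a random name, postal address, and block of filler text, one per line
print(fake.name(), fake.address(), fake.text(), sep="\n")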
Program Output:
Eric Miles
65822 Grant Center
Maciasport, UT 21639
Writer most movie politics you hit one. Store machine ahead push yourself key give. Me seek ago practice visit list.
Feeling religious general market car least water past.

Because the values are generated randomly, this code gives a different result every time it runs. Now let us see how to create sample data for a dummy dataset using Python.

Create Dummy Dataset Using Python


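from faker import Faker
import pandas as pd

fake = Faker()
# Build 50 fake profiles; each profile is a dict of personal details
data = [fake.profile() for _ in range(50)]
# Convert the list of dicts into a DataFrame, one row per profile
data = pd.DataFrame(data)
print(data.head())  # show the first five rows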
Program Output:
                                     job                company          ssn  \
0                        Event organiser             Morton PLC  261-12-2709   
1                 Operational researcher          Moore-Johnson  376-90-4804   
2          Clinical molecular geneticist  Dunn, Hill and Brooks  364-15-5960   
3  Product/process development scientist          Juarez-French  891-08-3314   
4             Research scientist (maths)        Berger and Sons  316-28-2550   

                                       residence           current_location  \
0    3631 Timothy Falls\nAlexanderport, LA 24142  (-66.2562555, 142.468486)   
1         8178 Tran Lodge\nChapmanland, KY 26585   (-83.5505865, 74.518738)   
2        9487 King Estates\nFoleyhaven, SD 68166  (-54.688459, -159.129415)   
3  90156 Gomez Drives\nLake Tamarastad, ME 26521  (-40.2652345, -99.965372)   
4               PSC 4460, Box 2545\nAPO AE 22327    (-53.296416, 87.066028)   

  blood_group                                            website  \
0          A-  [https://baker.net/, https://www.burgess.com/,...   
1          A+          [https://lawson.com/, https://baker.biz/]   
2          A-  [http://ho.org/, https://espinoza.com/, http:/...   
3          A+                                  [http://day.org/]   
4         AB-                         [https://www.sanchez.com/]   

          username                name sex  \
0          david32         Mark Bishop   M   
1       jasonbauer  Christopher Parker   M   
2  andersoncarolyn        Timothy Best   M   
3        timothy09      Audrey Schultz   F   
4     tuckerrhonda    William Crawford   M   

                                             address  \
0     19411 Don Shores\nNorth Margaretside, MT 79078   
1  8357 Graves Oval Apt. 941\nSouth Valerie, ME 2...   
2  3649 Ayers Ridge Suite 085\nEast Amanda, LA 63091   
3           39363 Gardner Rue\nWest Amanda, DE 47391   
4  90501 Michelle Mission\nWest Natashabury, FL 1...   

                             mail   birthdate  
0              ubonilla@yahoo.com  1981-06-02  
1          frankfischer@gmail.com  1952-08-15  
2        ginarobinson@hotmail.com  1990-07-30  
3        michaelwatkins@gmail.com  1912-05-11  
4  williamsonlawrence@hotmail.com  2001-12-24

In the above example code, each call to fake.profile() returns a dictionary of random dummy data with 13 fields, which become the 13 columns of the DataFrame. The values differ on every run.