![]() Normally, we’d do an analysis of the table to determine which columns are PII and which are not, but in this case, I’d like to be able to generate arbitrary amounts of data for this schema so I’ll need to create a generating function for each column. Once installed, we can start masking individual columns. ![]() Refer to the Faker documentation for more details on how to install Faker, but in short you can run: As you can see, the table contains a variety of sensitive data including names, SSNs, birthdates, and salary information. Our ‘production’ data has the following schema. Our goal will be to generate a new dataset, our synthetic dataset, that looks and feels just like the original data. The data we will use is a table of employees at a fictitious company. We obviously won’t use real data in this article we’ll use data that is already fake but we will pretend it is real. This article, however, will focus entirely on the Python flavor of Faker. It is also available in a variety of other languages such as perl, ruby, and C#. ![]() ![]() What is Fakerįaker is a python package that generates fake data. To accomplish this, we’ll use Faker, a popular python library for creating fake data. In this article we’ll look at a variety of ways to populate your dev/staging environments with high quality synthetic data that is similar to your production data. Restricting access to high quality data with which to build and test leads to a variety of issues, including making it more difficult to find bugs. New regulations around data privacy and an increasing awareness of the importance of protecting sensitive data is pushing companies to lock down access to their production data. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |