How to mask, subset and generate test data in only one workflow

by Ekobit October 02, 2015
BizDataX features - mask, subset, generate test data

Almost all Test Data Management (TDM) vendors advertise data masking, data subsetting and synthetic data generation capabilities in their product portfolios today. What stands out is that many of them offer these key features as separate products. This raises questions about two important aspects of such products: the flexibility to cover all real-world scenarios, and licensing policies. Let’s focus on real-world scenarios and leave licensing policies aside.

The BizDataX TDM solution integrates data masking, data subsetting and synthetic data generation within the same design environment. The basic concept behind BizDataX is that you can use all of these techniques in the same workflow, subsetting, masking and generating test data as your test data provisioning scenario requires. This flexibility is built into the core of our TDM platform and proves to be a key benefit in many real-world scenarios.

Let’s take a look at a scenario where test data needs to be generated from a banking application database. A common business requirement is to omit all customers whose salary stands out from the usual salary range, because it would be easy to work out that such salaries belong to top management or other high-profile customers.

The usual approach would encompass the following activities:

  1. Find all customers whose salary is greater than a designated threshold
  2. When masking data, skip the records related to the customers found in 1.
  3. Mask names and addresses, and generate fake accounts for each customer

If you have a TDM solution that consists of separate modules for data masking, subsetting and synthetic data generation, your only option is to:

  1. Create a subset by finding and eliminating all records that satisfy given criteria – Data Subsetting Application
  2. Mask data from the data subset – Data Masking Application
  3. Generate synthetic data –  Synthetic Data Generator

This approach gives you no flexibility when mapping requirements to implementation procedures. The implementation of the requirements is scattered across several applications, which creates maintainability, transparency and performance issues.

On the other hand, what if you could implement all the requirements within one workflow, organizing the activities the way a business analyst or requirements manager would expect them to be organized?

BizDataX Designer and the Microsoft Visual Studio Workflow Editor provide a workflow implementation environment that lets you combine all the data transformation methods and algorithms in a single workflow or in a series of workflows, as the business scenarios require.

In our example scenario, here is what the workflow could look like (image on the left):

Step 1: Find all customers whose salaries exceed the expected maximum salary amount and save them in temporary storage.
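To make Step 1 concrete, here is a minimal Python sketch of the logic. It is not BizDataX code (BizDataX workflows are built in the designer); the field names `customer_id` and `salary`, the sample rows and the threshold value are all assumptions made for illustration.

```python
from decimal import Decimal

# Hypothetical sample rows standing in for the 'customers' table.
customers = [
    {"customer_id": 1, "name": "Ana Horvat", "salary": Decimal("1200.00")},
    {"customer_id": 2, "name": "Ivan Novak", "salary": Decimal("25000.00")},
    {"customer_id": 3, "name": "Marko Kovac", "salary": Decimal("1850.50")},
]

# Assumed threshold; the real value would come from the business requirement.
MAX_EXPECTED_SALARY = Decimal("10000.00")


def find_outlier_customer_ids(rows, threshold):
    """Return the IDs of customers whose salary exceeds the threshold."""
    return {row["customer_id"] for row in rows if row["salary"] > threshold}


# The set plays the role of the temporary storage read by the later steps.
suppressed_ids = find_outlier_customer_ids(customers, MAX_EXPECTED_SALARY)
```

Keeping only the customer IDs is enough for the later steps, which just need to check whether a record belongs to a flagged customer.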

Step 2: Mask all data in the ‘customers’ and ‘customer_account’ tables, but skip the customers discovered in Step 1. Also, generate fake accounts for each masked customer.




Here is the workflow explained in more detail. The ‘Mask customer’ part of the workflow iterates through the customer table record by record. If the current record meets the condition set up in Step 1, it is suppressed (i.e. removed from further processing); otherwise (the ‘Default’ branch), its fields are masked in the ‘Masking block’.
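The record-by-record logic of this part can be sketched in Python as follows. This only illustrates the suppress-or-mask decision and is not the BizDataX implementation; the replacement lists, the `pick` helper and the field names are hypothetical.

```python
import hashlib

# Hypothetical replacement values; a real project would use larger dictionaries.
FAKE_NAMES = ["Alex Smith", "Sam Jones", "Robin Brown", "Jamie Taylor"]
FAKE_STREETS = ["Oak Street", "Hill Road", "Lake Avenue", "Park Lane"]


def pick(options, key, salt):
    """Pick an option deterministically from a hash of the salt and key."""
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    return options[int.from_bytes(digest[:4], "big") % len(options)]


def mask_customers(rows, suppressed_ids):
    """Yield masked copies of the rows, skipping suppressed customers."""
    for row in rows:
        if row["customer_id"] in suppressed_ids:
            continue  # condition met: suppress the record
        masked = dict(row)  # 'Default' branch: mask the sensitive fields
        masked["name"] = pick(FAKE_NAMES, row["customer_id"], "name")
        masked["address"] = pick(FAKE_STREETS, row["customer_id"], "street")
        yield masked


customers = [
    {"customer_id": 1, "name": "Ana Horvat", "address": "Ilica 1, Zagreb"},
    {"customer_id": 2, "name": "Ivan Novak", "address": "Riva 5, Split"},
    {"customer_id": 3, "name": "Marko Kovac", "address": "Korzo 9, Rijeka"},
]
suppressed_ids = {2}  # assumed result of Step 1

masked_customers = list(mask_customers(customers, suppressed_ids))
```

Picking the replacement from a hash of the customer ID means the same customer always receives the same fake values across runs. That is a design choice of this sketch, not a statement about how BizDataX masks data.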


The ‘Mask customer_account’ part of the workflow iterates through the ‘customer_account’ table, checking the ‘Condition’ activity and again suppressing all records that meet the criteria from Step 1. The ‘Default’ branch generates synthetic account numbers for all other ‘customer_account’ records.
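A comparable Python sketch for the account records is shown below. Again, this is an illustration under assumptions rather than BizDataX code: the 10-digit account number format, the seeded random generator and the field names are invented for the example.

```python
import random


def synthetic_account_number(rng, length=10):
    """Build a random numeric account number (assumed format: 10 digits)."""
    return "".join(rng.choice("0123456789") for _ in range(length))


def mask_customer_accounts(rows, suppressed_ids, seed=0):
    """Drop accounts of suppressed customers and replace the remaining numbers."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same output
    used = set()
    result = []
    for row in rows:
        if row["customer_id"] in suppressed_ids:
            continue  # condition met: suppress the record
        number = synthetic_account_number(rng)
        while number in used:  # keep the synthetic numbers unique
            number = synthetic_account_number(rng)
        used.add(number)
        result.append({**row, "account_number": number})  # 'Default' branch
    return result


customer_accounts = [
    {"customer_id": 1, "account_number": "HR-0001"},
    {"customer_id": 2, "account_number": "HR-0002"},
    {"customer_id": 3, "account_number": "HR-0003"},
    {"customer_id": 3, "account_number": "HR-0004"},
]
suppressed_ids = {2}  # assumed result of Step 1

masked_accounts = mask_customer_accounts(customer_accounts, suppressed_ids)
```

Because both parts of the workflow check the same set of flagged customer IDs, no account of a suppressed customer survives in the output.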


Implementing business requirements with BizDataX workflows is much easier and more traceable than with non-workflow-based tools that offer data transformation capabilities in separate modules. Additionally, BizDataX can optimize data access and achieve better performance.