Data Masking – Who Does It Help?
The Latin saying “Exempla utiliora sunt quam praecepta” (“Examples are more useful than rules”) also applies to masking sensitive data. Here, the examples are real, often sensitive data, and the rules are synthetic data generated to replace real data in non-production environments. Production data is full of varied real-life examples, so it’s of great value when it can be used in non-production environments. However, under current legislation and directives on personal data protection, using unmasked production data there is generally no longer permissible. Masked data is the next best option.
What is well-masked data, and who needs it?
Preparing test data from scratch or modifying existing data by hand takes valuable time, and neither approach tends to yield a good enough result. This is where well-masked data comes into the picture. Besides hiding sensitive values, good test data serves another purpose: it offers a broader perspective on the system being developed or maintained, so that everyone involved has a sound understanding of it. Such data is structured, consistent, and complete. These qualities, together with referential integrity, are what make test data reliable.
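One way to keep referential integrity while masking is to replace each identifier deterministically, so that the same input always maps to the same masked value in every table. The following is a minimal sketch of that idea in Python, not a production-grade implementation. The table layouts, the field names, the `SECRET_KEY` value, and the helpers `mask_value` and `orphaned_loans` are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would be stored securely.
SECRET_KEY = b"demo-only-key"


def mask_value(value: str, prefix: str = "CUST") -> str:
    """Return a deterministic pseudonym: equal inputs give equal outputs."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:10]}"


# Two hypothetical tables linked by customer_id.
customers = [
    {"customer_id": "8801", "name": "Ana Kovac"},
    {"customer_id": "8802", "name": "Ivan Horvat"},
]
loans = [
    {"loan_id": "L-1", "customer_id": "8801", "amount": 12000},
    {"loan_id": "L-2", "customer_id": "8802", "amount": 5000},
    {"loan_id": "L-3", "customer_id": "8801", "amount": 700},
]

# Mask the key the same way in both tables, and mask the name as well.
masked_customers = [
    {
        **c,
        "customer_id": mask_value(c["customer_id"]),
        "name": mask_value(c["name"], prefix="NAME"),
    }
    for c in customers
]
masked_loans = [
    {**loan, "customer_id": mask_value(loan["customer_id"])} for loan in loans
]


def orphaned_loans(customer_rows, loan_rows):
    """Return the loans whose customer_id matches no customer row."""
    known = {c["customer_id"] for c in customer_rows}
    return [loan for loan in loan_rows if loan["customer_id"] not in known]
```

Because the mapping is deterministic, `orphaned_loans(masked_customers, masked_loans)` returns an empty list: every masked loan still points to a masked customer, even though the original identifiers no longer appear in the data.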
Using data that’s as realistic as possible is critical to everyone involved in the software development and maintenance process. Here are some of the key players.
Developers are the first cog in the process: the testers who come before all other testers. Meaningful data is useful to them as well, because synthetic data is often not good enough to cover edge cases. What they need is data with the structure, complexity, and referential integrity of production data. Using such data reduces the time required for testing. Without it, simulating real-world circumstances can be extremely challenging, particularly when debugging complex issues that depend on large data sets.
The next stage in the process is handled by testers. When preparing the data they’ll use, they try to make it applicable to as many scenarios as possible. For more marginal, complex, or extensive scenarios, however, they use production data when they have the opportunity, in the hope of finding as many flaws as possible. This is because real scenarios are found in production: the data there has referential integrity, a clear structure, sufficient volume, and a sufficient level of complexity. Finding edge cases is usually not an easy task.
One example of an edge case is loan approval testing. Before the system can offer a loan to a user, it must check that a whole set of preconditions has been met. The required user data is often spread across different systems, and for some edge cases it’s difficult and time-consuming to set up satisfactory test data, especially if you need to involve the people responsible for those other systems, which can be tricky to coordinate in larger organizations.
Working with the right data also sparks ideas for new test cases (the exploratory testing approach). With sensitive data masked, this approach can still be used without hindrance.
One of the main tasks of business analysts in the software development and maintenance process is to understand what application users want and then explain these findings to developers in the best possible way. Meaningful data makes individual functionalities easier to understand, which in turn makes it easier to propose improvements to existing functionalities and devise new ones. When proposing an improvement or something new, an analyst may use production data as an example to describe it. A production data set with sensitive data masked serves the same purpose almost as well.
Functional specifications and work items that business analysts write for developers can include masked data as examples that illustrate the expected behavior.
Every change or upgrade to existing system functionality requires an analysis of the current state, that is, how the system works at that time. Such an analysis is usually done by a business analyst, who then makes a proposal, which may be revised. Meaningful data greatly facilitates this work, but it’s often spread across several systems. For example, adapting the rules of a credit scoring engine to a new provision of law may involve real estate values that are appraised in one system and credit history that is assessed in another.
Consultants and Educators
Consultants and educators must be able to present the system, application, or functionality to existing and potential customers and users. They do this best by preparing a good demo with as many representative examples as possible. Such demos can be built on production data in which sensitive data has been masked. The target users will identify most readily with what they see and are likely to recognize how the presented data is helpful.
Masked data can be used for onboarding. Thanks to it, new team members – developers, testers, analysts … in fact, everyone who works on a product – can become more productive in a shorter period of time because the interval required for getting to know the product becomes shorter. Meaningful data facilitates understanding, and the system stops being so abstract.
Preparing non-production environments is usually the task of an administrator. The preparation of masked data based on production data, which will be used by various teams including developers, testers, and others, can be centralized by having a team that defines how the data is masked, then masks that data and distributes the sets as needed. Such an arrangement greatly contributes to flexibility.
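A centralized arrangement like this can be pictured as a single shared set of masking rules that every data set passes through before distribution. The sketch below illustrates the idea under stated assumptions: the field names, the `MASKING_RULES` table, and the helper functions are hypothetical and do not come from any particular masking tool.

```python
from typing import Any, Callable, Dict, List


def mask_email(value: str) -> str:
    """Keep the first character and the domain, hide the rest."""
    local, _, domain = value.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"


def keep_last_four(value: str) -> str:
    """Replace everything except the last four characters with asterisks."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def redact(value: str) -> str:
    """Replace the value entirely."""
    return "REDACTED"


# One central definition of how each sensitive field is masked (hypothetical).
MASKING_RULES: Dict[str, Callable[[str], str]] = {
    "name": redact,
    "email": mask_email,
    "iban": keep_last_four,
}


def mask_record(record: Dict[str, Any], rules: Dict[str, Callable[[str], str]]) -> Dict[str, Any]:
    """Apply the matching rule to each field; leave other fields unchanged."""
    return {
        field: rules[field](value) if field in rules else value
        for field, value in record.items()
    }


def mask_dataset(records: List[Dict[str, Any]], rules: Dict[str, Callable[[str], str]]) -> List[Dict[str, Any]]:
    """Mask every record with the same rule set."""
    return [mask_record(record, rules) for record in records]
```

Keeping the rules in one place means that developers, testers, and analysts all receive data masked the same way, and a change to a rule is made once rather than in every team’s copy.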
How can team leaders, managers, business owners, and other stakeholders implement data masking?
Once you’ve established appropriate data masking practices in your workplace, your job is half done. When processes are well defined, when the right tools and approaches are used, and when employees work directly on a product, everyone benefits. A critical element in establishing such practices is risk management. The more employees have access to sensitive data, the greater the chance that something goes awry, which is why poor risk management is one of the most common causes of project failure.
Team leaders and managers can play a key role in their organizations. If they can properly design the procedures for preparing and using masked data, if their teams can truly understand the data – including where, specifically, certain information is stored and how it affects processes – then the benefits to the organization are many.
In short, properly masked data brings many benefits. It increases team productivity, helps deliver quality outputs faster, and lowers costs. It also supports compliance with the GDPR and other regulations, which is verified through inspections and other controls. Where requirements aren’t met, sanctions and penalties are imposed.