The importance of specification in defining a data masking project

by Zoran Šantek, October 06, 2020

A specification is needed in a data masking project to translate an idea into working program code; in other words, to make that idea a reality. Time and money spent on the specification are an investment that pays off quickly in all later stages. A well-made specification repays itself many times over in further development and use, that is, in project maintenance. While a specification is a legal requirement in some industries, such as construction, the IT sector still has no comparable regulation. This makes it hard to ensure that a well-made specification exists before a project moves to the next stage. Although some consider writing a specification “boring”, it is nonetheless necessary for defining the desired state.

Specification as part of the masking project

The specification in a masking project establishes a shared way of thinking about the data the user wishes to anonymize. Even though the scope of the specification is narrow, since it covers data only, in practice we still run into challenges with content, type, format, and other aspects of the data. We usually deal with two types of users: those who believe they have control over their data (usually in solutions developed internally) and those who are aware that, for the most part, they do not know their data structure. Both are right to an extent. Those who have control over their data often find data in places where they weren’t expecting it, for historical or other reasons: a specific field is no longer used, or a table with sensitive data has been left over after a data transfer or similar circumstances.

On the other hand, those who are aware that they don’t know their data often do know where their data is located within the system, simply because they use that data daily. This applies to essential data within the system, such as the data on users and their main relationships. Both types need a structured approach to data analysis, which is most conveniently provided through a specification. In most cases, anonymization serves as a moment for taking stock of a system: a period in which the data is analyzed more carefully and users themselves become aware of specific challenges regarding their data.

Tools as assistance

Regardless of how well we know our data and how good the existing documentation is, a fresh perspective often reveals that some data isn’t what we expected it to be, or that there is data of which we weren’t aware. To help clients, tools are used to analyze data on a generic level and to provide guidelines that can be accepted or discarded upon further analysis. Ekobit has developed a tool, BDX Discovery, that analyzes sensitive data in a database. The tool provides information on the discovered data, and an additional “hand inspection” determines whether actual sensitive data is being dealt with. A “hand inspection” takes multiple parameters into account, such as the number of records in a table, hit rate, and data dispersion, as well as insight into the data sample itself. Some data may look sensitive by value without being logically sensitive; these are mostly various codebooks or data the application needs in order to run. The tool allows the discovered data sensitivity to be used to generate a specification that the user can further analyze and correct, depending on the user’s additional knowledge. For more complex analysis of data and its interdependencies, other tools are recommended. It is often necessary to hold workshops with the client’s teams to prepare a specification.
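To make the “hand inspection” parameters more concrete, here is a minimal Python sketch of how record count, hit rate, and data dispersion could be computed for a sample of column values. It is an illustration only and does not describe how BDX Discovery works internally; the email pattern, the function names, the thresholds, and the sample values are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for illustration: a very simple email shape.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class ColumnProfile:
    record_count: int  # number of non-empty values inspected
    hit_rate: float    # share of non-empty values that match the pattern
    dispersion: float  # distinct matching values / all matching values


def profile_column(values, pattern=EMAIL_PATTERN):
    """Compute simple inspection metrics for one column sample."""
    non_empty = [v for v in values if v not in (None, "")]
    hits = [v for v in non_empty if pattern.match(str(v))]
    record_count = len(non_empty)
    hit_rate = len(hits) / record_count if record_count else 0.0
    dispersion = len(set(hits)) / len(hits) if hits else 0.0
    return ColumnProfile(record_count, hit_rate, dispersion)


def needs_hand_inspection(profile, min_records=3, min_hit_rate=0.5):
    """Assumed thresholds; real values depend on the system being analyzed."""
    return profile.record_count >= min_records and profile.hit_rate >= min_hit_rate


# Made-up samples: a customer column and a codebook-like column.
customer_emails = ["ana@example.com", "ivan@example.com", "n/a", "ana@example.com"]
codebook_values = ["support@example.com"] * 4

print(profile_column(customer_emails))
print(profile_column(codebook_values))
```

In this sketch the codebook-like column has a perfect hit rate but very low dispersion, which is the kind of signal that suggests data that looks sensitive by value but may not be sensitive logically. The final decision would still rest on looking at the data sample itself.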

Client involvement as a necessity

An engaged client is a client who is willing to participate in all the required stages of specification, implementation, and project testing. Primarily, a client’s involvement in the anonymization process should be about defining the right specification. If the specification isn’t done well, the end result of the entire process might be ruined. To have a specification done well in practice, it’s necessary to involve multiple teams on the client side, because no single person knows everything about an entire system. With more people being familiar with data anonymization processes than ever before, it’s now easier to have the end product fit the desired solution.

The specification as a prerequisite for adequate anonymization

The specification’s goal is to define the masking precisely enough that the implementer doesn’t have to go looking through the database. In practice, it’s challenging to get to that level of specification; however, insisting on it leads to a specification that, to a good extent, allows such an approach. With a well-made specification that includes as exact a masking definition as possible, implementation is far more straightforward, and testing and later maintenance of the project also become far more manageable and effective.
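As an illustration of what an exact masking definition might look like, the following Python sketch records one rule per column and checks that every column known to be sensitive is covered. The table names, column names, and masking method names are hypothetical and are not taken from BizDataX or any real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskingRule:
    table: str
    column: str
    data_type: str
    method: str     # hypothetical method name, e.g. "replace_from_list"
    note: str = ""  # context the implementer would otherwise have to look up


# Hypothetical specification entries for illustration.
SPECIFICATION = [
    MaskingRule("customer", "first_name", "varchar(50)", "replace_from_list"),
    MaskingRule("customer", "email", "varchar(100)", "generate_email",
                note="must stay unique per customer"),
]


def missing_rules(spec, sensitive_columns):
    """Return sensitive (table, column) pairs that have no masking rule yet."""
    covered = {(rule.table, rule.column) for rule in spec}
    return sorted(set(sensitive_columns) - covered)


# Columns assumed to have been flagged as sensitive during analysis,
# including a leftover table that nobody uses anymore.
flagged = [
    ("customer", "first_name"),
    ("customer", "email"),
    ("archive_customer", "email"),
]

print(missing_rules(SPECIFICATION, flagged))
```

A structure like this gives the implementer the table, column, type, and method in one place, and the coverage check makes gaps in the specification visible before implementation starts.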

The success story of a data masking project

Creating a specification is necessary for defining the desired state. In anonymization, the specification is also needed to make clients aware of their data. With good preparation and a finished specification, the masking implementation becomes easier, along with testing and maintenance. Using data discovery tools while making a specification increases the probability of finding places containing sensitive data that weren’t previously known because they were used rarely or not at all. An engaged client is necessary for the specification process because the client best knows their data and its influence on the system’s functions. A well-made specification ensures that other project phases are done well, and that is an investment that soon provides additional value.
