BizDataX version 3.7 released

This release focuses on a significantly improved sensitive data discovery module. Here are the two major improvements we added.
1. Better sensitive data discovery process
The discovery process consists of two parts: first finding all potentially sensitive data, and then pruning the results by deciding whether each finding is really sensitive.
Finding all potentially sensitive information is an automatic process. We extended our discoverers with various algorithms for finding national identification numbers, addresses, names and so on. We also implemented discoverer arithmetic, which lets you combine existing discoverers to define new cases, e.g. a telephone number that consists of 9 digits AND is not a social security number. Discoverers are also smarter: they now take column metadata into account. For example, the credit card discoverer checks the control digit in the column value, but also recognizes columns with ‘card’ or ‘credit’ in their name.
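Discoverer arithmetic can be pictured as composing predicates. The sketch below is a minimal illustration under stated assumptions, not the BizDataX API: it treats a discoverer as a function from a column value to a boolean, and the names `and_`, `not_`, `nine_digits` and `looks_like_ssn` are hypothetical. The SSN rule is a simplified US-style format check chosen only for illustration.

```python
from typing import Callable

# Hypothetical model: a discoverer is a predicate over one column value.
# This is an illustration, not the BizDataX API.
Discoverer = Callable[[str], bool]


def and_(a: Discoverer, b: Discoverer) -> Discoverer:
    """Combined discoverer matches only if both inputs match."""
    return lambda value: a(value) and b(value)


def not_(a: Discoverer) -> Discoverer:
    """Combined discoverer matches when the wrapped one does not."""
    return lambda value: not a(value)


def nine_digits(value: str) -> bool:
    """True if the value consists of exactly nine ASCII digits."""
    return len(value) == 9 and value.isascii() and value.isdigit()


def looks_like_ssn(value: str) -> bool:
    """Simplified US-style SSN format check (assumption for illustration):
    area 001-899 except 666, group not 00, serial not 0000."""
    if not nine_digits(value):
        return False
    area, group, serial = int(value[:3]), int(value[3:5]), int(value[5:])
    return 0 < area < 900 and area != 666 and group != 0 and serial != 0


# "Nine digits AND NOT a social security number"
phone_number = and_(nine_digits, not_(looks_like_ssn))
```

Here `phone_number("912345678")` matches (area 912 is outside the assumed SSN range), while `phone_number("123456789")` does not, because that value also passes the SSN check.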
All of this functionality is extensible: during implementation, new discoverers can easily be added according to your custom rules.
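A custom discoverer that uses both the column value and the column metadata might look roughly like the sketch below. The `Column` and `Finding` types, the `register` decorator and the `DISCOVERERS` registry are hypothetical stand-ins for whatever extension mechanism the product actually provides. The control-digit check is the standard Luhn (mod 10) algorithm used by payment card numbers; the accepted length range of 13 to 19 digits is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Column:
    """Hypothetical column metadata handed to each discoverer."""
    table: str
    name: str


@dataclass(frozen=True)
class Finding:
    value_match: bool  # the value itself satisfies the rule
    name_hint: bool    # the column name suggests sensitive content


Discoverer = Callable[[Column, str], Finding]

# Hypothetical registry of custom discoverers, keyed by name.
DISCOVERERS: Dict[str, Discoverer] = {}


def register(name: str) -> Callable[[Discoverer], Discoverer]:
    """Decorator that adds a discoverer to the registry under `name`."""
    def decorator(fn: Discoverer) -> Discoverer:
        DISCOVERERS[name] = fn
        return fn
    return decorator


def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) control digit check; spaces and dashes are ignored."""
    digits = re.sub(r"[ -]", "", number)
    if not (13 <= len(digits) <= 19 and digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk right to left, doubling every second digit; subtract 9 if > 9.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@register("credit_card")
def credit_card(column: Column, value: str) -> Finding:
    """Report the value check and the column name hint as separate signals."""
    name = column.name.lower()
    return Finding(
        value_match=luhn_valid(value),
        name_hint="card" in name or "credit" in name,
    )
```

Keeping the value match and the name hint as separate flags lets a reviewer see why a column was flagged, instead of collapsing both signals into a single score.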
When it comes to analyzing findings and deciding whether they are sensitive, there is now much more context to support the decision. The user sees the hit rate (the percentage of sampled values from a column that satisfy the discoverer condition) and can request a sample of the data. She also sees the table size, the number of null values and the sample size, can filter the findings by various criteria, and so on.
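The hit rate and the accompanying statistics can be illustrated with a short sketch. It assumes a discoverer is a boolean predicate over a value, that the sample is drawn uniformly at random, and that nulls are excluded from the hit rate denominator and reported separately; `profile_column` and `ColumnStats` are hypothetical names, and the product may define these figures differently.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ColumnStats:
    sample_size: int
    null_count: int
    hit_count: int

    @property
    def hit_rate(self) -> float:
        """Percentage of non-null sampled values that satisfy the discoverer
        (excluding nulls from the denominator is an assumption)."""
        non_null = self.sample_size - self.null_count
        return 100.0 * self.hit_count / non_null if non_null else 0.0


def profile_column(
    values: Sequence[Optional[str]],
    matches: Callable[[str], bool],
    sample_size: int,
    seed: int = 0,
) -> ColumnStats:
    """Draw a uniform random sample and count nulls and discoverer hits."""
    rng = random.Random(seed)
    sample = rng.sample(list(values), min(sample_size, len(values)))
    nulls = sum(1 for v in sample if v is None)
    hits = sum(1 for v in sample if v is not None and matches(v))
    return ColumnStats(len(sample), nulls, hits)
```

For example, profiling `["1234", "abcd", None, "5678"]` with `str.isdigit` yields one null, two hits and a hit rate of about 66.7%.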


2. Better sample data
To search for sensitive data, or to mask existing data, BizDataX relies on multiple lookup lists containing representative data from various categories. For discovery, we use these lists as search terms; for masking, we use them as replacement data. They include first names, last names, addresses, companies and so on.
In this version, we significantly improved our lists for Croatia, Germany, United States, Switzerland, and Austria.
For example, with a lower-quality first name list, even a genuine Customer.FirstName column would get a low hit rate. This results in a large number of columns that have to be checked manually (to confirm whether they really contain first names or just values that look like them). The improved name discoverer achieves a hit rate above 95% on sensitive columns, which removes the need for this manual work.
We also enabled users to compile their own lists and use them for discovery and masking. For example, if the BizDataX first name list doesn’t achieve a satisfying hit rate on your Customer.FirstName column (or its equivalent), you can take the 1,000 most frequent names from your own database and use those instead.
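Compiling a custom lookup list from your own data and checking its hit rate could be sketched as follows. The normalization (trimming and case folding), the choice to skip empty values, and the helper names `build_lookup_list` and `list_hit_rate` are assumptions made for illustration, not part of the product.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set


def _normalize(value: str) -> str:
    """Assumed normalization: trim whitespace and fold case."""
    return value.strip().casefold()


def build_lookup_list(values: Iterable[Optional[str]], top_n: int = 1000) -> List[str]:
    """Return the top_n most frequent non-empty values, normalized.
    Ties keep the order in which values were first encountered."""
    counts = Counter(_normalize(v) for v in values if v and v.strip())
    return [value for value, _ in counts.most_common(top_n)]


def list_hit_rate(values: Iterable[Optional[str]], lookup: Set[str]) -> float:
    """Percentage of non-empty values found in the lookup list."""
    non_empty = [_normalize(v) for v in values if v and v.strip()]
    if not non_empty:
        return 0.0
    return 100.0 * sum(v in lookup for v in non_empty) / len(non_empty)
```

In practice, you might build the list from a sample of Customer.FirstName and then compare its hit rate on the column with that of the built-in list.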
Conclusion
The new functionality will further help BizDataX users be more productive with the tool and spend less time on manual work. Our goal is to make the tool as user-friendly as possible and to keep the data masking process simple, without requiring extensive technical knowledge.
Let us know if you have any questions; we will be glad to answer them and help you protect your sensitive data.