10 types of personal data companies must anonymize (and how)

by Anamarija Zegnal June 30, 2023

Privacy and personal data protection have become one of the most pressing issues in everyday life. Transparency and trust are the most valuable standards in the relationship between users and the parties that collect, handle and use their confidential information. Still, moral principles alone are not enough to ensure consistent and thorough implementation across all fields of application. As a result, several major regulations and laws have been passed and put into force to ensure uniform protection of personal data. Some of the most significant include:

  1. General Data Protection Regulation (GDPR) – regulates the processing and protection of personal data in the European Union.
  2. Health Insurance Portability and Accountability Act (HIPAA) – a federal law in the United States that regulates the protection of individuals’ medical information.
  3. Payment Card Industry Data Security Standard (PCI DSS) – a security standard that governs the processing of credit card payments. It requires that certain types of personal data be masked or encrypted to protect individuals’ privacy.
  4. California Consumer Privacy Act (CCPA) – a California law that gives California residents certain rights over their personal information.

Each country has its own laws and acts that additionally regulate this domain. For example, in Croatia the Personal Data Protection Act was adopted in 2018 to align with the EU’s General Data Protection Regulation (GDPR). It’s important for organizations to understand the regulations that apply to them and to take steps to follow the relevant requirements. 

The requirement of most interest here is the obligation of organizations to mask or anonymize data to protect individuals’ privacy and identity. The main questions you may ask yourself are:

  1. What data should you mask?  
  2. How to find that data?  
  3. How to mask it once you have found it? 

What data should you mask? 

Different regulations apply depending on the location and field in which your organization operates, and each carries its own obligations regarding masking personal data. It’s important to note that the specific data that needs to be masked may vary depending on the context and purpose of processing.

10 common types of personal data you must mask and anonymize 

These regulations share many of the data types that need to be masked. They can be categorized as:

  1. Names: personal names, including first names and last names.
  2. Identification numbers: any identification numbers, such as social security numbers, passport numbers, driver’s license numbers or the OIB (Osobni Identifikacijski Broj, the Croatian personal identification number).
  3. Geolocation data: data that shows a person’s location, such as GPS coordinates or a postal address.
  4. Telephone numbers: certain digits of a telephone number should not be visible or accessible to those who do not have a legitimate reason to access it.
  5. Financial data: all financial data, such as credit card numbers, PANs (Primary Account Numbers) or service codes, should be masked to prevent unauthorized access, theft or misuse.
  6. Email addresses: both the local part and the domain.
  7. IP addresses: these are also considered personal data.
  8. Dates: such as birthdates, admission dates or card expiration dates.
  9. Health data: diagnoses or health insurance numbers, which are a big part of an individual’s identity.
  10. Biometric data: such as fingerprints or facial recognition data.
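To make partial masking of telephone numbers, PANs and IP addresses more concrete, here is a minimal Python sketch. The function names, the choice to keep the last four digits and the /24 truncation are illustrative assumptions; how much may stay visible is set by the regulation that applies to you, not by this code.

```python
import ipaddress


def mask_digits(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Mask every digit except the last `keep_last`; separators are kept."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - keep_last, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def truncate_ipv4(address: str) -> str:
    """Zero the host part of an IPv4 address, keeping only its /24 network."""
    network = ipaddress.ip_network(f"{address}/24", strict=False)
    return str(network.network_address)


print(mask_digits("4111 1111 1111 1111"))  # **** **** **** 1111
print(mask_digits("+385 91 234 5678"))     # +*** ** *** 5678
print(truncate_ipv4("192.168.17.45"))      # 192.168.17.0
```

Partial masking like this hides values on screen or in logs. It does not by itself make a data set anonymous.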


How to find sensitive personal data? 

To ensure personal data protection and privacy compliance, organizations must find sensitive data through data discovery. This involves finding where sensitive data is stored, who can access it, and how it is used. Organizations have several techniques and tools available for data discovery, including manual searching, data classification, and automated scanning tools. 

Manual searching 

Manual searching for sensitive data is the most resource-intensive technique. It involves analyzing the data by hand, using methods that can be faulty or not applicable to a specific use case. Because a person does the work, it is also prone to human error.

It often includes predefined approaches, such as searching for specific keywords and data structures, which can lead to oversights. It depends on highly skilled personnel familiar with the data and database. It also requires a detailed and time-consuming analysis beforehand.  

The recommendation is to use manual searching only when other methods prove inefficient or when this method is specifically required by third parties (for example, your client). 

Data classification 

Data classification is a method used to identify and categorize data based on its sensitivity. It relies on marking sensitive data with predefined data labels indicating the sensitivity level.  

It also requires an extensive database analysis using methods like manual searching, pattern matching, regular expressions, keyword searches etc.  
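As a rough illustration of how keyword searches and regular expressions can feed labelling, the sketch below assigns a label to a column from its name and a few sample values. The rules, label names and the `classify_column` function are hypothetical and far from complete; a real rule set would be much larger.

```python
import re
from typing import Sequence

# Hypothetical rules: (label, column-name keywords, pattern for whole values).
RULES = [
    ("email", {"email", "mail"}, re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")),
    ("ip_address", {"ip"}, re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")),
    ("phone", {"phone", "tel", "mobile"}, re.compile(r"\+?[\d\s()/-]{7,20}")),
]


def classify_column(name: str, sample_values: Sequence[str]) -> str:
    """Label a column by keyword match on its name, then by value patterns."""
    # Keyword search on name tokens, e.g. "customer_email" -> {"customer", "email"}.
    tokens = set(re.split(r"[^a-z0-9]+", name.lower()))
    for label, keywords, _ in RULES:
        if tokens & keywords:
            return f"SENSITIVE/{label}"
    # Pattern matching: every non-empty sample must match the rule's pattern.
    values = [v for v in sample_values if v]
    for label, _, pattern in RULES:
        if values and all(pattern.fullmatch(v) for v in values):
            return f"SENSITIVE/{label}"
    return "UNCLASSIFIED"


print(classify_column("customer_email", []))             # SENSITIVE/email
print(classify_column("addr", ["10.0.0.1", "10.0.0.2"]))  # SENSITIVE/ip_address
print(classify_column("created", ["2023-06-30"]))         # SENSITIVE/phone
```

The last call shows how easily simple rules mislabel data: a date made of digits and hyphens satisfies the loose phone pattern.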

The labelling can be done manually or by using an AI specifically designed for this purpose.  

The downsides of this technique are subjectivity and inconsistency in labelling. It requires constant maintenance, and incorrect labels can needlessly restrict access to data.

Automated scanning tools 

Automated scanning tools avoid many of the risks of the methods described above. They are fast and easily customizable, and they produce fewer errors and omissions.

An example of an automated scanning tool is BizDataX, a data masking solution that provides production-quality test data. Using production data and personal information in non-production environments (development, testing) involves many risks. BizDataX simplifies data masking and anonymization by cloning production data or extracting only a subset of it, which makes regulatory compliance much easier to achieve. BizDataX implements the following functionalities:

  1. Data modelling,  
  2. Data subsetting,  
  3. Workflow editor,  
  4. Data masking,  
  5. Synthetic data generation. 


Finding sensitive data with BizDataX 

BizDataX automates locating sensitive data using metadata inspection, data sampling, and various discovery rules and algorithms. The process called “Sensitive data discovery” checks multiple systems, databases, tables and records and produces discovery findings that can be classified as sensitive or not.  
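The general idea of metadata inspection plus data sampling plus a discovery rule can be sketched in a few lines of Python. This is not BizDataX code or its API: the `discover_columns` function, the 80% threshold and the use of SQLite with a Luhn checksum rule for card numbers are all assumptions chosen to keep the example self-contained.

```python
import sqlite3


def luhn_valid(value: str) -> bool:
    """Discovery rule: does the value look like a card number (Luhn checksum)?"""
    compact = value.replace(" ", "").replace("-", "")
    if not (compact.isascii() and compact.isdigit() and 13 <= len(compact) <= 19):
        return False
    total = 0
    for i, ch in enumerate(reversed(compact)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0


def discover_columns(conn, table, rule, sample_size=100, threshold=0.8):
    """Return (column, hit ratio) for columns whose sampled values satisfy `rule`.

    Table and column names are assumed to be trusted here.
    """
    # Metadata inspection: list the table's columns.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    findings = []
    for col in columns:
        # Data sampling: a random sample of non-null values.
        rows = conn.execute(
            f'SELECT "{col}" FROM "{table}" WHERE "{col}" IS NOT NULL '
            "ORDER BY RANDOM() LIMIT ?",
            (sample_size,),
        ).fetchall()
        values = [str(r[0]) for r in rows]
        if values:
            ratio = sum(rule(v) for v in values) / len(values)
            if ratio >= threshold:
                findings.append((col, ratio))
    return findings


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (id INTEGER, holder TEXT, card TEXT)")
    conn.executemany(
        "INSERT INTO payments VALUES (?, ?, ?)",
        [
            (1, "Ana", "4111 1111 1111 1111"),   # well-known test card numbers
            (2, "Ivan", "5555555555554444"),
            (3, "Mia", "4012888888881881"),
        ],
    )
    return discover_columns(conn, "payments", luhn_valid)


print(demo())  # [('card', 1.0)]
```

Using a hit ratio rather than a single match keeps one stray value from flagging or clearing a whole column. The finding still needs a human decision on whether it is truly sensitive.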

For this purpose, BizDataX uses so-called discoverers. They are used for finding specific values or types of values within the chosen data source, environment, schema and/or table and identifying them as sensitive data. They are grouped by country or usage, but there are several discoverers that have general uses when more specific discoverers are not needed. 

Discoverers that BizDataX offers are: 

  1. Categorized by country – discover data specific to Croatia, Switzerland and the USA. Additional variables can be defined to target the sensitive data in even more detail.

    • Discoverers for Croatia: include BBAN, JMBG, OIB and others
    • Discoverers for Switzerland: for example, AHV number
    • Discoverers for the USA: for example, USA Social security number

  2. Categorized by purpose – discover specific data used in certain functions: Human data, Geographical data, Financial data and Legal entities.

    • Discoverers for Human data: include First names, Last names, Emails and others
    • Discoverers for Geographical data: such as City (from a random or specific country)
    • Discoverers for Financial data: various types, including SWIFT and IBAN (International Bank Account Number)
    • Discoverers for Legal entities: for example, Companies (from a random or specific country)

  3. Categorized as General – discover custom values and allow you to specify what sensitive data you want to find (for example, XML, Keywords, Regular expressions, etc.).
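To give a feel for what a country-specific discoverer has to check, here is a hedged Python sketch for the Croatian OIB. It combines a regular expression for 11-digit candidates with the OIB check digit, which follows ISO 7064 MOD 11,10. The function names are made up for this example and say nothing about how BizDataX implements its own OIB discoverer.

```python
import re

# Candidate pattern: exactly 11 consecutive digits as a standalone token.
OIB_CANDIDATE = re.compile(r"\b\d{11}\b")


def oib_check_digit(first_ten: str) -> int:
    """Compute the OIB check digit (ISO 7064 MOD 11,10) for ten digits."""
    remainder = 10
    for ch in first_ten:
        remainder = (remainder + int(ch)) % 10
        if remainder == 0:
            remainder = 10
        remainder = (remainder * 2) % 11
    return (11 - remainder) % 10


def is_valid_oib(value: str) -> bool:
    """True if `value` is 11 ASCII digits with a correct check digit."""
    if len(value) != 11 or not (value.isascii() and value.isdigit()):
        return False
    return oib_check_digit(value[:10]) == int(value[10])


def find_oibs(text: str) -> list:
    """Return 11-digit tokens in free text that pass the checksum."""
    return [m for m in OIB_CANDIDATE.findall(text) if is_valid_oib(m)]


print(find_oibs("Customer 69435151530, order 12345678901"))
# ['69435151530']
```

The checksum is what separates a dedicated discoverer from a plain regular expression: the order number above has the right shape but fails validation.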


How to mask sensitive personal data? 

Once you’ve identified sensitive data, it should be masked and anonymized by replacing it with unique, corresponding data that keeps its quality and semantics when used. This can be done by:

  1. writing your own scripts that are run every time you need to use the data, or
  2. using ready-to-use automated masking tools.

Writing scripts 

Writing scripts for masking sensitive data involves creating custom code that replaces data marked as sensitive. It is a relatively complex process because it includes a lot of preparation and post-implementation steps, such as

  • choosing the masking technique,  
  • developing masking logic,  
  • testing the final solution and  
  • constantly updating it and maintaining it to be applicable when new data is added.  

Also, appropriate security measures must be implemented to preserve data integrity.
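A hand-written masking script of this kind might look like the following Python sketch. It uses keyed hashing (HMAC) so that the same original value always maps to the same replacement, which keeps joins between tables intact. The column names, the tiny name list and the `mask_rows` function are assumptions made for illustration only.

```python
import hashlib
import hmac

# Hypothetical replacement pool; a real script would need a far larger one.
FIRST_NAMES = ["Ana", "Ivan", "Marija", "Luka", "Petra", "Marko"]


def keyed_digest(key: bytes, value: str) -> bytes:
    """Keyed hash of the original value; deterministic for a given key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


def mask_first_name(key: bytes, original: str) -> str:
    """Pick a replacement name; the same input always gets the same output."""
    index = int.from_bytes(keyed_digest(key, original)[:8], "big")
    return FIRST_NAMES[index % len(FIRST_NAMES)]


def mask_email(key: bytes, original: str) -> str:
    """Replace the whole address, local part and domain alike."""
    token = keyed_digest(key, original.lower()).hex()[:12]
    return f"user_{token}@example.com"


def mask_rows(rows, key: bytes):
    """Return masked copies of the rows; the input rows are left untouched."""
    masked = []
    for row in rows:
        new_row = dict(row)
        new_row["first_name"] = mask_first_name(key, row["first_name"])
        new_row["email"] = mask_email(key, row["email"])
        masked.append(new_row)
    return masked


rows = [
    {"id": 1, "first_name": "Anamarija", "email": "a@corp.hr"},
    {"id": 2, "first_name": "Anamarija", "email": "b@corp.hr"},
]
for row in mask_rows(rows, key=b"demo-key-keep-this-secret"):
    print(row)
```

Anyone holding the key can reproduce the mapping, so the key itself has to be protected. Even this small sketch would need testing and upkeep every time the schema or the data changes.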

Because of this, it’s very time-consuming and requires employees to understand data structures thoroughly. Our opinion on this technique? To quote our data masking expert: 

72% of scripts are not even functional. DIY data masking is typically poorly documented and difficult to maintain.

You shouldn’t be on your own and forced to take care of everything.

Automated tools for masking personal data 

Automated tools can also be used to mask the sensitive data that was found. Finding two separate tools, one for discovery and one for masking, that work together and deliver the needed results can be tiring. Some tools, however, integrate both functionalities in one place, which makes the work easier and faster.

The already mentioned BizDataX is an example of a tool that integrates both features. After the sensitive data is discovered, it can be masked and anonymized using either of the two BizDataX products: BizDataX Portal or BizDataX Designer.

BizDataX Portal is a web application for managing data masking projects. Its main goal is to enable different stakeholders to collaborate on discovering sensitive data in their databases, agreeing on how this data will be masked and executing masking algorithms. In the newest version, masking can be done in the Portal through an intuitive wizard and configured through the Portal UI (User Interface).

BizDataX Designer is an add-in for Microsoft Visual Studio that can be used for simple and advanced masking. It is integrated with Visual Studio debugger and several other features that support rapid workflow implementation.  

Masking sensitive personal data with BizDataX 

Data is masked by generating data that replaces the original. To generate data of various content and types, BizDataX uses so-called Generators, which create the data used in the masking process. Generators offer a way to create realistic and diverse data sets for testing, research, and other applications.

Various Generators are implemented and cover almost all types of data you must anonymize to be compliant with regulations:  

  1. Name generators: including First name and Last name.
  2. Contact information generators: including Emails and Phone numbers.
  3. Financial data generators: IBAN, PAN (Primary Account Number), Credit card numbers and others.
  4. Date and number generators: date in range, number in range, etc.
  5. Personal identification generators: such as OIB, JMBG and similar.
  6. Miscellaneous generators: for example, Places, Banks, Companies and so on.

“Almost all” is not “all”. So, BizDataX also supports the creation of custom Generators to cater to your needs. 
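As an illustration of what a generator does, the Python sketch below produces a date in a range and a card-like number that passes the Luhn checksum, so the generated value still looks valid to the application under test. The function names, the seeded random source and the default prefix are assumptions for this example, not the BizDataX Generator interface.

```python
import random
from datetime import date, timedelta


def date_in_range(rng: random.Random, start: date, end: date) -> date:
    """Generate a date between `start` and `end`, both inclusive."""
    return start + timedelta(days=rng.randint(0, (end - start).days))


def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit to append to `payload`."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # rightmost payload digit is doubled
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return (10 - total % 10) % 10


def card_number(rng: random.Random, prefix: str = "4", length: int = 16) -> str:
    """Generate a random digit string with a valid Luhn check digit."""
    body_len = length - len(prefix) - 1
    payload = prefix + "".join(str(rng.randint(0, 9)) for _ in range(body_len))
    return payload + str(luhn_check_digit(payload))


rng = random.Random(2023)  # fixed seed so test data sets can be reproduced
print(date_in_range(rng, date(1950, 1, 1), date(2005, 12, 31)))
print(card_number(rng))
```

A custom generator for a format not covered out of the box would follow the same shape: produce a random body, then compute whatever checksum the format requires.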


Protecting data privacy by finding and masking sensitive personal data 

Personal data protection is a crucial part of the digital world. Legal obligations exist to protect sensitive information from unauthorized access or disclosure, and organizations need suitable tools to meet them. Regulations differ from country to country, but it is possible to identify common data types that pose a risk to personal data and privacy, such as names, identification numbers and financial data.

BizDataX is a reliable and efficient product for discovering and masking sensitive data, offering a variety of out-of-the-box discoverers and generators and an extension mechanism for specific cases. With high performance and preserved referential integrity, BizDataX can handle large databases and complex data relationships. BizDataX is an excellent solution for organizations that must ensure data privacy and security while maintaining data usability.  

Want to check out how this works? You can get hands-on experience with a free Proof of Concept of the tool that many companies in banking, insurance and other industries use. Contact us, and we will schedule a quick call to see if the tool fits your needs.
