BizDataX version 3.5 released

by Ekobit February 08, 2018

We have just completed BizDataX version 3.5, and we are excited to announce that this release is all about performance.

What could be so exciting about performance? Well, performance is the most important non-functional requirement of an enterprise data masking solution. It goes like this. If you have to update 2 billion records in a relational database, and you want it done within, say, an 8-hour window during the night, you have to process records at around 70,000 records per second (r/s). Sounds good enough? What if we told you that it could be done at around 300,000 r/s? You would have more control and could do it during the day, in just under 2 hours.

Processing speed – 1 billion records per hour

The way BizDataX works has not changed. We still read data from the database into BizDataX’s application memory, apply the masking rules to change the data, and finally write the changed data back into the database. The big difference comes from two things that are new in version 3.5:

• updating of multiple tables in parallel
• using bulk database operations
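
To make the two ideas concrete, here is a minimal sketch in Python. It is not BizDataX code: the `mask` rule, the table and column names, and the use of SQLite with a thread pool are all assumptions chosen so the example runs anywhere. Each worker reads only the key and the column to be masked, changes the values in memory, and writes them back with a single bulk call, while several tables are handled in parallel.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def mask(value: str) -> str:
    """Hypothetical masking rule: keep the first character, star out the rest."""
    return value[:1] + "*" * (len(value) - 1)


def mask_table(db_path: str, table: str, column: str) -> int:
    """Read one column, mask it in memory, write it back with one bulk call."""
    # Each worker opens its own connection. Table and column names are
    # trusted constants in this sketch, so they are formatted into the SQL.
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        # Read only the key and the column that needs masking.
        rows = conn.execute(f"SELECT id, {column} FROM {table}").fetchall()
        changed = [(mask(value), row_id) for row_id, value in rows]
        # Bulk operation: one executemany call instead of one call per row.
        conn.executemany(f"UPDATE {table} SET {column} = ? WHERE id = ?", changed)
        conn.commit()
        return len(changed)
    finally:
        conn.close()


def build_demo_db(db_path: str) -> None:
    """Create two small, independent tables with made-up data."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, surname TEXT)")
    conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Alice",), ("Bob",)])
    conn.executemany("INSERT INTO employees (surname) VALUES (?)", [("Novak",)])
    conn.commit()
    conn.close()


def run_demo():
    """Mask both tables in parallel and return (row counts, masked names)."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        build_demo_db(db_path)
        jobs = [("customers", "name"), ("employees", "surname")]
        # One worker per table; pool.map returns results in job order.
        with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
            counts = list(pool.map(lambda job: mask_table(db_path, *job), jobs))
        conn = sqlite3.connect(db_path)
        names = [r[0] for r in conn.execute("SELECT name FROM customers ORDER BY id")]
        conn.close()
        return counts, names


if __name__ == "__main__":
    print(run_demo())
```

SQLite serializes writers, so this sketch only illustrates the structure and will not show a speedup. On a server database, the gain would depend on the factors discussed at the end of this article.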

These two additions complement what we already had, such as reading and updating only the columns that need to be anonymized and using parallel writers for a single table, to mention a few. Lab results are very cool, but we prefer to share real-world measurements taken at a client’s site.

This may not be the nicest image, and the designers will probably criticize us for publishing it, but this one is for the technology enthusiasts, the people who will recognize that it is an actual screenshot of the native OS performance counters. The numbers are what matters here, because 250,000 to 300,000 r/s works out to around 1 billion records per hour.

Days become hours, hours become minutes

Let us do the math for a database containing 2 billion records:

a. If a solution or tool were limited to writing into a single table at an expected speed of around 3,000 r/s, the complete database would be processed in about 185 hours, or almost 8 days.
b. If a solution or tool could do it for 10 independent tables in parallel, the total speed could be around 30,000 r/s, and the complete database could be processed in about 18.5 hours.
c. With the BizDataX speed of around 300,000 r/s, as measured at a client’s site, the complete database gets processed in less than 2 hours.
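
The arithmetic in the three cases above can be checked with a few lines of Python. The function names are ours and are used only for illustration. The rates and the record count are the ones quoted in this article.

```python
def hours_to_process(records: int, records_per_second: float) -> float:
    """Hours needed to process `records` at a steady rate."""
    return records / records_per_second / 3600


def records_per_hour(records_per_second: float) -> float:
    """Convert a rate in records per second to records per hour."""
    return records_per_second * 3600


def required_rate(records: int, hours: float) -> float:
    """Records per second needed to finish within `hours`."""
    return records / (hours * 3600)


TOTAL_RECORDS = 2_000_000_000

if __name__ == "__main__":
    for label, rate in [("a", 3_000), ("b", 30_000), ("c", 300_000)]:
        hours = hours_to_process(TOTAL_RECORDS, rate)
        print(f"{label}. {rate} r/s: {hours:.1f} hours")
    print(f"300000 r/s is {records_per_hour(300_000):.0f} records per hour")
    print(f"8-hour window needs {required_rate(TOTAL_RECORDS, 8):.0f} r/s")
```

The same helpers can be reused with your own record counts and maintenance windows to estimate whether a given speed is enough.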

Would it work for your company’s databases?

The experienced among you might say that actual performance varies depending on factors like hardware, network, database technology, database configuration, triggers, constraints, and indices. That is true. Still, this real-world measurement shows that figures like these are attainable and that BizDataX can integrate smoothly with the most demanding scenarios.
