Believing your data is “probably” secure means it’s not secure at all

No matter how careful you think you’re being with private data, maths boffins can probably prove you wrong

Many companies are being caught out when their “probabilistic” data security measures prove inadequate in the face of a breach, CSIRO’s Data61 arm has noted in arguing for a more concrete, numbers-based method for evaluating the effectiveness of corporate security policies.

As organisations like Medicare have found out to great embarrassment, data scientists have derived many ways of reverse-engineering even anonymised data sets.

And with CSIRO experiments showing that any individual can be identified from as few as six Facebook attributes that were “super easy” to aggregate, Dali Kaafar, research group leader for information security and privacy at CSIRO’s Data61 and chief scientist of the Optus Macquarie University Cyber Security Hub, says too many companies are judging their privacy protections to be adequate in “an ad hoc way”.

“All too often organisations have been handling this issue of data privacy in an ad hoc way,” he told CSO Australia. “They say ‘the data is probably secure’ and convince themselves it’s OK without necessarily having proof of that. But if you’re saying it’s ‘probably’ secure, it probably isn’t. And we shouldn’t be playing this game of probabilistically securing things.”

Warnings about the integrity of privacy protections are timely, given the conclusion of Privacy Awareness Week (PAW) 2018 ahead of next week’s implementation of the European Union’s General Data Protection Regulation (GDPR).

That regulation will require companies to have a much clearer view of their data holdings, imposing strict penalties on companies that fail to adequately secure their customers’ personally identifiable information (PII). But with researchers like Kaafar already showing that many conventional security methods can be worked around using data science techniques – and offering methods for protecting the privacy of research conducted on confidential data – he believes companies should add more mathematical rigour to their protections so they can categorically attest to their strength.

Even obfuscation techniques – such as burying real data in a flood of similar but counterfeit data records so as to confuse potential data thieves – had proven less effective than companies would like to believe: “the more noise you add, the more the noise will tell you about the basic signal,” Kaafar explained.
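A toy sketch illustrates why noise alone is a weak shield: if enough noisy copies of a value are released, simple averaging cancels the noise and recovers the original signal. The figures below are invented purely for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical scenario: a sensitive figure (say, a salary) is "obfuscated"
# by releasing many independently noised copies rather than the raw value.
true_value = 72_000
noisy_releases = [true_value + random.gauss(0, 5_000) for _ in range(1_000)]

# Averaging the releases cancels the zero-mean noise, recovering the signal.
estimate = sum(noisy_releases) / len(noisy_releases)
print(round(estimate))  # lands close to 72000
```

The more noisy copies an attacker can collect, the tighter the estimate becomes, which is exactly the failure mode Kaafar describes.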

Kaafar’s team has developed an algorithm-based tool, called R4, that draws on a number of data sources to evaluate the risk of reidentification from data sets. By using mathematical constructs to evaluate the likely risk of exposure if a data set is published, he says, companies can better evaluate the real risk of their data-exchange practices – and attest to their having managed data in a “provably private way”.
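One simple, widely used proxy for reidentification risk is the share of records whose combination of quasi-identifiers is unique in the data set (k-anonymity with k = 1). The sketch below is illustrative only – it is not the R4 tool, and the records are fabricated.

```python
from collections import Counter

# Fabricated records: postcode, birth year and gender are classic
# quasi-identifiers that can single a person out when combined.
records = [
    {"postcode": "2000", "birth_year": 1985, "gender": "F"},
    {"postcode": "2000", "birth_year": 1985, "gender": "F"},
    {"postcode": "2010", "birth_year": 1990, "gender": "M"},
    {"postcode": "2600", "birth_year": 1972, "gender": "F"},
]

def reidentification_risk(rows, quasi_identifiers):
    """Fraction of rows whose quasi-identifier combination is unique."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in rows]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(rows)

risk = reidentification_risk(records, ["postcode", "birth_year", "gender"])
print(risk)  # 0.5 -- half of these records are uniquely identifiable
```

A score like this gives a concrete number to reason about before publishing a data set, rather than a gut feeling that it is “probably” safe.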

A Confidential Computing platform, developed in-house by the analytics team, uses homomorphic encryption and secure multiparty computation, allowing analytical tools to learn across multiple data sets without requiring that the data be published in raw format.

Such data dumps have proven disastrous for organisations such as the Australian Red Cross Blood Service, which in late 2016 discovered that a third-party IT service provider had published a 1.74GB database backup, containing 1.3 million records, in “an insecure environment”. Similar mistakes saw cloud-data ‘buckets’, variously containing information about nearly 50,000 government employees and a raft of Australian Broadcasting Corporation users, left open for public access.

“This is worse than making the headlines,” Kaafar said. “This is making the headlines while being convinced that you’ve done every single thing right. We have many organisations thinking they are doing the right thing, and ending up in situations where they are not. We end up having this trade-off between utility and privacy.”

Evaluating data protections using mathematics may offer a better degree of governance, but PAW 2018 offers a timely reminder about the importance of privacy as an operating construct – shared and practiced by all employees – as well.

For its part, identity management provider Ping Identity offered seven tips for improving overall privacy practices. These included avoiding password reuse; creating secure passwords; using two-factor authentication; encouraging users to be careful about the data and files they share online; monitoring the permissions granted to social-media platforms; being sceptical of unsolicited phone calls and other approaches from untrusted third parties; and never revealing passwords over the phone.

“Consumers are increasingly concerned about how their personal data is used and shared,” APAC chief technology officer Mark Perry said in a statement. “It’s become a critical competitive requirement that leading brands not only provide privacy and consent options, but also make these options user friendly. If the customer can’t easily find or use them, they might as well not exist.”

“Having a customer identity and access management solution in place can play a critical role in ensuring customer confidence, as well as compliance with privacy regulations across all the jurisdictions in which a business operates.”