Data61 finds way to collaborate on big data analysis while maintaining privacy
- 20 September, 2016 10:07
Big data analytics has been on top of everyone’s buzzword bingo cards for some time now. And while many organisations have their analytics house in order internally, the challenge of combining their data with that of other parties while maintaining privacy has remained unsolved.
Data61 is part of the CSIRO, with a specific focus on digital technology. Stephen Hardy is the co-founder of Data61’s N1Analytics platform. This system uses homomorphic encryption, which allows data to be analysed and manipulated without the need to decrypt it.
Hardy and his team initially started looking at the analysis of personal genome data. This data is highly sensitive and can be used to determine an individual’s current and future health, as well as revealing other personal information.
“The challenge was how could you analyse that data without disclosing the person’s genome and still get interesting medical insights from it”.
But when the project started three years ago, they saw wider applications, as their platform can be used with any large dataset. After speaking with a number of businesses they found this was a common problem – how do you analyse data you can’t actually see?
The technology at the heart of solving this problem is homomorphic encryption, a mathematical technique for carrying out calculations on encrypted numbers. The concept is not new – it’s been used for about 40 years, says Hardy. But making it scalable and usable for business is a new field.
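To make "calculations on encrypted numbers" concrete, here is a minimal Python sketch of the Paillier scheme, a well-known additively homomorphic cryptosystem. The article does not say which scheme N1Analytics uses, so treat this purely as an illustration: the function names, the toy primes and the sample figures are all assumptions, and keys this small offer no real security.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    # Toy primes chosen for readability (an assumption for this sketch);
    # real deployments use primes of 1024 bits or more.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n, m):
    # Paillier encryption with generator g = n + 1 and fresh randomness r.
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    return (c1 * c2) % (n * n)

def scale_encrypted(n, c, k):
    # Raising a ciphertext to k multiplies the plaintext by k.
    return pow(c, k, n * n)

n, priv = keygen()
c_a = encrypt(n, 52000)   # hypothetical value held by one party
c_b = encrypt(n, 61000)   # hypothetical value held by another party
c_sum = add_encrypted(n, c_a, c_b)  # computed without decrypting either input
total = decrypt(n, priv, c_sum)
```

Only the holder of the private key can read `total`; whoever performs the addition sees nothing but ciphertexts. Results must stay below `n`, which is one reason real keys are so much larger.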
“We’re trying to build a practical, well engineered solution that lets you solve the problem of two organisations doing analysis on a problem jointly without putting it in the one spot or revealing the data to each other”.
Traditionally, two or more datasets would be joined into one table with everyone having access to the data. This new approach maintains the privacy of data as everything remains encrypted.
The more data that is brought together, says Hardy, the more likely it becomes that a single individual can be singled out from it. Even if you remove identifiers such as names and addresses, once enough attributes are collected it becomes possible to isolate individuals.
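The re-identification effect Hardy describes can be sketched in a few lines of Python. The records below are made up for illustration and contain no names or addresses, and the helper `unique_fraction` is a hypothetical name. It measures what share of records can be singled out by a given combination of attributes.

```python
from collections import Counter

# Made-up, already "de-identified" records (no names or addresses).
records = [
    {"postcode": "2000", "birth_year": 1980, "sex": "F"},
    {"postcode": "2000", "birth_year": 1980, "sex": "M"},
    {"postcode": "2000", "birth_year": 1975, "sex": "F"},
    {"postcode": "2010", "birth_year": 1980, "sex": "F"},
    {"postcode": "2010", "birth_year": 1975, "sex": "M"},
    {"postcode": "2010", "birth_year": 1975, "sex": "M"},
]

def unique_fraction(rows, attributes):
    # Share of rows whose combination of the given attributes
    # appears exactly once, i.e. rows that can be isolated.
    keys = [tuple(row[a] for a in attributes) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(rows)

for attrs in (["postcode"],
              ["postcode", "birth_year"],
              ["postcode", "birth_year", "sex"]):
    print(attrs, round(unique_fraction(records, attrs), 2))
```

In this toy table, postcode alone isolates nobody, postcode plus birth year isolates a third of the records, and all three attributes together isolate two thirds of them.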
Hardy has already been able to show this new technology to companies who have used it on their own internal data. The next step is to expand its application to external datasets – something that he says is being worked on.
A significant element of the application’s next phase is getting regulatory and legal clarity on how the platform fits with the Australian Privacy Principles and other applicable laws. Hardy says this is expected to happen but that the regulatory side of things isn’t moving as quickly as the technology – something we’ve seen in many areas of technology. However, the N1Analytics platform is designed specifically to protect the data privacy of individuals, so he expects these issues to be overcome.
“We think that what we do is well supported by the Australian Privacy Principles and companies can use it safely and securely from an information point of view and data privacy point of view. It removes the risk of re-identification because the data is encrypted and fully de-identified in that way”.
Although N1Analytics started with applications in health, Hardy says there is broader application in finance and telecommunications, where datasets from different sources can be brought together for analysis without compromising the privacy of individuals.
For example, companies could potentially perform real-time credit analysis without looking at actual personally identifiable data from financial institutions. When a customer looks to enter a new contract for a mobile phone, the telco wouldn't see the financial data and the bank wouldn't see the telco data.
As well as benefiting companies that want to keep commercially valuable data private, the approach benefits private citizens. In the past, multi-organisational analysis required data to be centralised. This created a risk, as a single breach could bring threat actors a rich reward. But by keeping data fragmented in encrypted repositories during analysis, personal data is better protected.
Hardy and the N1Analytics team from Data61 are still in the early stages of commercialising their platform. But they are working with companies to get the technology right and with the OAIC and other agencies to navigate the regulatory and legal challenges.