Google researchers have revealed a machine learning algorithm that is helping the company weed out Play store apps that unnecessarily collect sensitive user data.
Some apps, like Google Maps or Pokemon Go, add value by knowing your location. Many other apps don’t, yet some still request the Android location permission.
Google wants you to know that it can catch developers who exploit the fact that most people simply accept whatever Android permissions an app requests. It does this with an algorithm that flags which apps should be given a once-over by the team of human app reviewers it started using in 2015.
The algorithm brings “peer group analysis” to apps on the Play store. Google says its algorithm will help it detect “potentially harmful apps” on the store, a category that covers everything from backdoors to phishing and ransomware.
Peer group analysis is used by investors to compare the financial performance of similar companies that, for example, operate in the same geography and sector.
Applied to Android apps, the algorithm allows Google to run large scale analysis on the behavior of similar apps on the Play store by grouping them to discover anomalies, such as whether one app stands out from its peers in terms of data collection.
“Our approach uses deep learning of vector embeddings to identify peer groups of apps with similar functionality, using app metadata, such as text descriptions, and user metrics, such as installs,” wrote Google’s security team.
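As a rough illustration of the grouping step, the sketch below picks an app's peers by comparing embedding vectors with cosine similarity. It is not Google's implementation: the app names, the three-dimensional vectors, and the `peer_group` helper are all invented for the example, and real embeddings would be learned from descriptions and install metrics rather than written by hand.

```python
import math

# Hypothetical, hand-written embeddings. In the approach Google describes,
# vectors like these would be learned from app metadata and user metrics.
EMBEDDINGS = {
    "flashlight_a": [0.90, 0.10, 0.00],
    "flashlight_b": [0.80, 0.20, 0.10],
    "flashlight_c": [0.85, 0.15, 0.05],
    "maps_a": [0.10, 0.90, 0.20],
    "maps_b": [0.00, 0.80, 0.30],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def peer_group(target, embeddings, k=2):
    """Return the k apps whose embeddings are most similar to the target's."""
    scores = [
        (cosine(embeddings[target], vec), name)
        for name, vec in embeddings.items()
        if name != target
    ]
    scores.sort(reverse=True)  # most similar first
    return [name for _, name in scores[:k]]


flashlight_peers = peer_group("flashlight_a", EMBEDDINGS, k=2)
maps_peers = peer_group("maps_a", EMBEDDINGS, k=1)
```

With these toy vectors, the flashlight apps end up as each other's peers and the two mapping apps pair off, without anyone assigning a fixed store category.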
It’s not clear when Google started using the algorithm but it is influenced by several recent research papers. In a 2015 paper, researchers at Columbia University and Google applied machine learning to “software peer group analysis” to “identify least privilege violation and rank software based on the severity of the violation”.
In theory Android’s permission system should make developers follow the principle of least privilege, meaning an app only has access to resources it needs for a legitimate purpose. However, in the real world developers face more work by following the principle and receive few rewards for doing so.
At the same time, developers can benefit by asking for extra privileges, and often face little resistance from users who tend to grant requests without fully understanding their implications. The researchers argued that machine learning could help “discover users expectations for intended software behavior, and thereby help set security policy”.
With more than 2 million apps on the Play store, any attempt to manually categorize apps into peer groups would fail. And as Google notes, apps are constantly changing, which defies the use of fixed categories such as tools, productivity, and games.
After organizing apps into peer groups, Google uses these clusters to spot “anomalous, potentially harmful signals related to privacy and security, from each app’s requested permissions and its observed behaviors”.
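One simple way to picture this anomaly check is to ask how common each of an app's requested permissions is among its peers. The sketch below is an assumption-laden toy, not Google's method: the app names, the `rare_permissions` helper, and the 0.3 threshold are invented, and it looks only at requested permissions, not observed behavior.

```python
# Hypothetical permission sets for a peer group of flashlight apps.
PERMISSIONS = {
    "flashlight_x": {"CAMERA", "ACCESS_FINE_LOCATION", "READ_CONTACTS"},
    "flashlight_a": {"CAMERA"},
    "flashlight_b": {"CAMERA"},
    "flashlight_c": {"CAMERA", "ACCESS_FINE_LOCATION"},
    "flashlight_d": {"CAMERA"},
}


def rare_permissions(app, peers, permissions, threshold=0.3):
    """List the app's permissions requested by fewer than `threshold` of its peers.

    Returns (permission, share_of_peers) pairs; the threshold is an
    illustrative choice, not a value taken from Google.
    """
    if not peers:
        return []
    flagged = []
    for perm in sorted(permissions[app]):
        share = sum(perm in permissions[p] for p in peers) / len(peers)
        if share < threshold:
            flagged.append((perm, share))
    return flagged


peers = ["flashlight_a", "flashlight_b", "flashlight_c", "flashlight_d"]
flagged = rare_permissions("flashlight_x", peers, PERMISSIONS)
```

Here the camera permission passes because every peer requests it, while location (requested by one peer in four) and contacts (requested by none) are flagged as candidates for human review.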
In a warning to developers, it notes that this information is used to decide “which apps to promote and determine which apps deserve a more careful look by our security and privacy experts”.