Machine learning: what it is, and what it isn’t

Author: Bryan Gale, VP Product Marketing, Cylance

In 1959, an IBM employee by the name of Arthur Samuel programmed a computer to play checkers against him. Over time, the program was able to collect data, strategise and win a game all by itself. And thus, machine learning was born.

More than half a century on, there have been huge developments in artificial intelligence and machine learning capabilities for mass consumption – from computer games and self-driving cars to facial recognition and cancer research.

In the world of cybersecurity, it is only recently that artificial intelligence and machine learning have made their mark by predicting and proactively stopping attacks before they start. With the number of hacks, attacks and breaches growing every year, they have arrived in the nick of time.

But in the last 12 months, we have seen almost every antivirus vendor and his sales engineer tout the latest in ‘revolutionary’ artificial intelligence-powered endpoint protection. And the noise is deafening. So, what is machine learning really and, more importantly, what isn’t it when it comes to endpoint security?

What machine learning isn’t

Let’s get one thing straight: machine learning techniques are not signatures, nor do they use signatures to operate. A signature is a set of instructions written by a human or, at best, written by machines once they are given a set of rules created by a human. Signatures cannot strategise, generalise, or make decisions that lie outside of the set ‘rule book’.

Legacy AV products that rely on signatures cannot identify malware that has not already been seen and reported. In other words, a ‘Patient Zero’ must first get infected for that malware to be discovered. Once the malware is identified, more time elapses as a signature is generated (akin to a mugshot being uploaded to a police database) before this information is sent to the antivirus product’s knowledge base to protect other customers. It’s like locking the stable door after a thief has stolen your prized racehorse.
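To make the limitation concrete, here is a minimal sketch of signature matching in Python. It assumes the simplest possible kind of signature, a hash of the whole file; real products also use byte patterns and hand-written rules. The sample bytes and the names SIGNATURE_DB and matches_signature are invented for illustration.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of samples that have
# already been reported as malware (assumption: signature = whole-file hash).
KNOWN_BAD_SAMPLE = b"example-malicious-payload-v1"
SIGNATURE_DB = {hashlib.sha256(KNOWN_BAD_SAMPLE).hexdigest()}


def matches_signature(file_bytes: bytes) -> bool:
    """Return True only if this exact file is already in the database."""
    return hashlib.sha256(file_bytes).hexdigest() in SIGNATURE_DB


# The sample that infected 'Patient Zero' is now caught...
known_detected = matches_signature(KNOWN_BAD_SAMPLE)

# ...but a variant that differs by a single byte is not.
variant = KNOWN_BAD_SAMPLE + b"!"
variant_detected = matches_signature(variant)

print(known_detected, variant_detected)  # True False
```

The lookup can only confirm what has been seen before; it has no way to generalise to the altered file.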

In recent months, antivirus companies have begun to realise the limitations of legacy signature-based software and started offering new ‘machine learning-enabled’ solutions to their customers. In truth, most are just re-badged, glorified, signature-based products. Here are four telltale signs:

1. A human wrote the rules that make all the decisions – perhaps a fancy signature but still a signature, which can leave gaps or can be easily circumvented.
2. A human makes the ultimate decision – at best these products offer a decision support system.
3. Machine-initiated actions do not automatically follow an automated analysis – if the system’s efficacy is not high enough to act without human command, the system has failed to achieve true autonomy and true artificial intelligence/machine learning.
4. Decisions are high-latency – if decisions take too long, it’s a clue that artificial intelligence/machine learning is probably not doing the work.

What machine learning is

Unlike products that react after the damage has been done, true machine learning techniques do not use signatures to identify what is or is not malicious. This means they can block malicious software in milliseconds, before it has a chance to execute, even if it has never been seen before.

Think of it like teaching a child: rather than feeding the child all the answers, you teach them how to think for themselves. Machine learning takes its cue from how the human brain learns, but it can weigh millions more data points than a person could, allowing it to manage the overwhelming volume, variety and velocity of today’s cyberthreats.

Developers begin by feeding millions of malicious and non-malicious samples into a supercomputer that, over time and through supervised learning, begins to understand the nature and intentions of each file and trains itself to discern good files from bad. The resulting models do not have to verify findings with the cloud or wait for humans to determine a course of action once a breach has been identified. They can identify, decide, and act autonomously without human intervention. And with years of training, the machine learning engine has the potential to reach 99.99 per cent efficacy – highly accurate in its own right and even more so when compared to the 50-60 per cent accuracy offered by signature-based endpoint security.
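The idea of supervised learning can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any vendor’s engine works: the two features, the six training values and the nearest-centroid method are all invented for illustration, and a production system would train a far richer model on millions of samples.

```python
from statistics import mean

# Toy labelled training set (values invented for illustration).
# Hypothetical features per file: [byte entropy scaled to 0-1,
# share of imported functions considered risky].
TRAINING = [
    ([0.92, 0.80], "malicious"),
    ([0.88, 0.70], "malicious"),
    ([0.95, 0.90], "malicious"),
    ([0.40, 0.10], "benign"),
    ([0.35, 0.20], "benign"),
    ([0.50, 0.05], "benign"),
]


def train(samples):
    """Learn one centroid (mean feature vector) per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def classify(model, features):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))


model = train(TRAINING)

# Neither file below appears in the training set, yet each gets a verdict.
print(classify(model, [0.90, 0.75]))  # malicious
print(classify(model, [0.42, 0.15]))  # benign
```

Unlike a signature lookup, the trained model returns a verdict for files it has never seen, and it does so locally, with no database query or human in the loop.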

By design, it is difficult to unmask the false claims made by cybersecurity vendors. Here are five simple questions to ask yourself when purchasing machine learning endpoint protection:

1. Does the machine learning capability work without requiring a patient zero or sacrificial lamb?
2. How extensive is the machine learning math model and how many years has it been tested in the real world?
3. Do the cyber-prevention capabilities prevent threats from executing?
4. Does the machine learning capability work both in connected and disconnected environments?
5. Can the protection work in milliseconds, with little impact on CPU usage?

The lesson to be learned here is this: don’t get sucked in by the buzzword bingo. Yes, machine learning is transforming the endpoint security game. But not all products are created equal, and most are not as they seem.

Bryan Gale is the Vice President of Product Marketing at Cylance. Bryan has over twenty years’ experience in the industry, with previous roles at McAfee, Webroot Software, and Oracle.