When it comes to security analytics, too much data still isn’t enough

Open-access cloud architectures improve security by enabling trawling through data lakes

Security specialists are taking new approaches to the problem of ‘alert fatigue’ by increasingly feeding large, aggregate data sets into specialised artificial intelligence (AI) engines that are learning to discern good and bad behaviour with a high degree of accuracy.

The addition of AI capabilities has found great currency in the security analytics world, where masses of security tools generate log events faster than most security teams can keep up.

Even with filtering in place, Cisco’s 2017 Security Capabilities Benchmark Study found that 44 percent of security operations managers see more than 5000 security alerts every day. Organisational constraints mean just 56 percent of those alerts can be investigated; half of the investigated alerts are deemed legitimate, and just 46 percent of those legitimate alerts are remediated.
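Taken at face value, those percentages compound into a stark funnel. The back-of-the-envelope sketch below, in Python, simply multiplies the figures cited above; it is illustrative arithmetic only, not part of the study itself.

    # Rough alert-funnel arithmetic using the percentages cited above
    # (illustrative only; the study reports aggregate survey figures, not a per-team pipeline).
    daily_alerts = 5000                   # alerts seen per day by many security operations teams
    investigated = daily_alerts * 0.56    # share that organisational constraints allow to be investigated
    legitimate   = investigated * 0.50    # share of investigated alerts deemed legitimate
    remediated   = legitimate * 0.46      # share of legitimate alerts actually remediated

    print(f"Investigated: {investigated:.0f}")   # ~2800
    print(f"Legitimate:   {legitimate:.0f}")     # ~1400
    print(f"Remediated:   {remediated:.0f}")     # ~644

In other words, on those figures, fewer than 650 of 5000 daily alerts end up being remediated.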

These figures highlight the persistent challenges that security managers face on a day-to-day basis – and reinforce the case for better technology to help security teams stay afloat amid the torrent of security alerts.

Ironically, one answer lies in accumulating even more data – building massive cloud-based ‘data lakes’ of security log information that is fed into AI engines to teach them what normal behaviour looks like. This includes monitoring the behaviour of staff using desktops, laptops and mobile devices, as well as that of systems generating machine-to-machine (M2M) traffic with no human involvement.

Once behavioural baselines are established, it becomes relatively straightforward to spot deviations from the norm that may well indicate a new form of ransomware or other attack. There is great value in deriving these baselines from cloud-based threat-intelligence services – of which every security vendor has its own version – but Bill Smith, senior vice president of worldwide field operations with LogRhythm, believes this is definitely a case where there is never enough data.
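To make the idea concrete, here is a minimal sketch in Python of that baseline-and-deviation approach – learn a per-host norm for some activity metric from historical logs, then flag hosts that stray far from it. The hosts, metric and threshold are hypothetical, and this is not a description of any vendor’s actual engine.

    # Minimal baseline-and-deviation sketch (illustrative only, not any vendor's engine).
    # `history` maps each host to a series of, say, daily outbound-connection counts.
    from statistics import mean, stdev

    def build_baselines(history):
        """Learn a simple per-host baseline: mean and standard deviation of the metric."""
        return {host: (mean(counts), stdev(counts))
                for host, counts in history.items() if len(counts) >= 2}

    def flag_anomalies(baselines, today, threshold=3.0):
        """Flag hosts whose current count sits more than `threshold` standard deviations from their norm."""
        alerts = []
        for host, count in today.items():
            if host not in baselines:
                continue  # no baseline yet; a real system would treat new hosts separately
            mu, sigma = baselines[host]
            if sigma > 0 and abs(count - mu) / sigma > threshold:
                alerts.append((host, count, mu))
        return alerts

    history = {"host-a": [110, 95, 102, 120, 98], "host-b": [40, 55, 47, 52, 45]}
    today = {"host-a": 105, "host-b": 900}   # host-b suddenly spikes
    print(flag_anomalies(build_baselines(history), today))   # [('host-b', 900, 47.8)]

Real engines model far more dimensions than a single count, which is precisely why the size of the training data set matters – the point Smith makes below.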

“We’ve already done behavioural and anomaly detection [on-premises] but we believe there is a plateau that we can move to by expanding the data set,” he explains.

“There are certain types of behaviours that only emerge over time with larger data sets that we can’t necessarily collect on-premises or locally. This means that a cloud-based service provides a really high-fidelity look at the data that local systems just won’t be able to do.”

LogRhythm, for one, is extending its on-premises AI Engine correlation tools to the cloud to build upon their success in enabling high-granularity analysis of onsite data. Its cloud-based analytics engine, Cloud AI, is due out in the third quarter of this year and will aggregate massive volumes of data into a data lake – which will eventually become a crucial source of behavioural data to feed further security analysis and enforcement.

Cloud AI’s role as a large-scale data source will see its core Elasticsearch database steadily opened up for use by others in their own behavioural analyses. This includes the use of application programming interfaces (APIs) to allow online users to access data “from other locations for whatever purposes and uses that they want,” Smith says. “Companies like us need to provide more integration access bidirectionally, and we’re getting demand from people wanting to use our system as the back-end data lake so they can get into it with different applications. We’re doing all these things to help people more efficiently respond to threats.”
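As an illustration of what that kind of programmatic access could look like, the sketch below queries an Elasticsearch-backed store through its standard REST _search endpoint using Python’s requests library. The cluster address, index pattern and field names are hypothetical placeholders, not details of Cloud AI’s actual interface.

    # Hypothetical query against an Elasticsearch-backed security data lake via the
    # standard _search REST API; the URL, index pattern and field names are placeholders.
    import requests

    ES_URL = "https://datalake.example.com:9200"   # placeholder cluster address
    INDEX = "security-logs-*"                      # placeholder index pattern

    query = {
        "size": 100,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"event.action": "authentication_failure"}},
                    {"range": {"@timestamp": {"gte": "now-24h"}}}
                ]
            }
        }
    }

    resp = requests.post(f"{ES_URL}/{INDEX}/_search", json=query, timeout=30)
    resp.raise_for_status()
    for hit in resp.json()["hits"]["hits"]:
        source = hit["_source"]
        print(source.get("host"), source.get("@timestamp"))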

That response – which needs to be both accurate and timely – lies at the core of the analytics value chain, which is being strengthened considerably as the migration to the cloud enables a unified industry response based on consistent and comprehensive data.

Extensive use of cloud-based services not only provides a richer data set for analytics engines to chew through, but also paves the way for broad peer-based benchmarking and comparison within and between industries. This will make analytics a key driver for sharing security best practices, as well as helping shorten response times for other businesses in an industry that has been targeted by new attacks.

Smarter analysis speeds response

The addition of AI-based machine learning is equally transformational, with Gartner recently predicting that investment in machine-learning tools for IT service management will more than triple by 2020 – and that 10 percent of penetration tests will be conducted by machine learning-based systems.

Machine-learning engines can rapidly adapt to changes in the security climate, such as those caused by the recent WannaCry and Petya ransomware outbreaks.

When such attacks emerge, the initial element of surprise rapidly gives way to forensic analysis that quickly isolates characteristic aspects of their behaviour. This, in turn, provides fodder for learning engines to quickly pick up and propagate characteristic behaviours – for example, beaconing on the network, the sudden transmission of large amounts of traffic, and accessing new Internet sites or known malicious sites.
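To give a sense of how such behaviours translate into checks an engine can propagate, the simplified sketch below applies three rough heuristics to per-host connection records. The thresholds and indicator list are arbitrary placeholders; production engines learn these patterns statistically rather than hard-coding them.

    # Simplified, hypothetical checks for the behaviours described above: beaconing at
    # regular intervals, unusually large outbound transfers, and contact with known-bad
    # destinations. Thresholds and the indicator list are arbitrary placeholders.
    from statistics import pstdev

    KNOWN_BAD = {"203.0.113.7", "198.51.100.23"}   # placeholder threat-intelligence indicators

    def looks_like_beaconing(timestamps, jitter_tolerance=2.0):
        """Regular, low-jitter callbacks: near-constant gaps between successive connections."""
        if len(timestamps) < 5:
            return False
        gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
        return pstdev(gaps) < jitter_tolerance

    def suspicious(host_events):
        """Return the characteristic behaviours a host exhibits, if any."""
        reasons = []
        if looks_like_beaconing(sorted(e["time"] for e in host_events)):
            reasons.append("beaconing")
        if sum(e["bytes_out"] for e in host_events) > 500_000_000:   # ~500 MB outbound
            reasons.append("large outbound transfer")
        if any(e["dest_ip"] in KNOWN_BAD for e in host_events):
            reasons.append("known-bad destination")
        return reasons

    events = [{"time": t, "bytes_out": 1200, "dest_ip": "203.0.113.7"} for t in range(0, 300, 60)]
    print(suspicious(events))   # ['beaconing', 'known-bad destination']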

“Networks have behaviours,” says Smith, adding that these behavioural signatures can be broadcast widely through the cloud, helping companies actually anticipate and stop infections they haven’t even seen yet.

When both WannaCry and Petya began spreading, analytics providers were quick to point out that their customers had not been hit because their systems were able to spot the ransomware’s anomalous behaviour immediately. In this way, the analytics response is not only about detecting an anomaly on the corporate network – but about heading off infection so that there are no anomalies to detect in the first place.

This capability will become the norm as cybersecurity attacks grow fiercer and more frequent. Automated assistance will help practitioners keep up by using intelligent analysis – and not just standard big-data analytics techniques – to sift more rapidly through large volumes of alerts.

The end result, says Smith, is a much lower volume of false negatives and higher specificity for new attacks that need real attention. “The bane of behavioural monitoring is alarm fatigue,” he explains, “but we are getting to a place where we can really provide a much more granular, high-fidelity look.”

“The result for the customer will be quality alerts, with really actionable alarms where they can drill down to get very specific information about why it’s alarmed and what to do about it. The way you do that is by providing much better data and analysis – and that’s what Cloud AI is going to allow us to do.”
