Information overload: Finding signals in the noise
- 30 May, 2014 02:38
Signal-to-noise ratios are hard to manage. As a security professional, you want the threat data, you want the attack notifications and alerts, and you need intelligence. But when there's too much coming in, those alerts and notifications fall by the wayside. They're easily dismissed and ignored.
After all, if a device is generating 60 alerts a day - and for the first few weeks none of them amount to anything - as new alerts from that device arrive, they're eventually going to be dismissed.
This happens because the IT / InfoSec department has other things to worry about, and there isn't enough time (or people) to deal with a flood of alerts. It's possible the device generating the alerts will be properly tuned and configured later, but that depends on the staff's workload.
It's a defeatist argument, but it's also reality.
IT / InfoSec teams live with shrinking resources, including budgets and staffing, so they have to focus on the tasks that keep the business running, as well as help the business grow and profit. More to the point, most security purchases and deployments center on compliance, otherwise known as "checking the box."
The check-box mentality is why there are 60 alerts a day to begin with. The appliance wasn't tuned right, it was installed to meet a compliance or regulation requirement, and the vendor promised a "set it and forget it" approach.
Yet, the default rules from the vendor aren't cutting it when deployed in a live production environment. So the organization is flooded with false positives, leading to more noise than signal.
Another painful reality is the fact that some of these ignored alerts are valid warnings. It's a reminder that stats such as those from the 2010 Verizon Business Data Breach Investigations Report, which said that 86 percent of victims had evidence of a breach sitting in their logs, aren't always fluff.
A more recent example of this problem would be the Target breach. While the breach was happening, SOC (security operations center) employees were fed alerts from a FireEye device. However, while those warnings were investigated, they were eventually dismissed as noise.
"With the benefit of hindsight, we are investigating whether, if different judgments had been made, the outcome may have been different," Target's spokesperson, Molly Snyder, told the New York Times.
"Like any large company, each week at Target there are a vast number of technical events that take place and are logged. Through our investigation, we learned that after these criminals entered our network, a small amount of their activity was logged and surfaced to our team. That activity was evaluated and acted upon. Based on their interpretation and evaluation of that activity, the team determined that it did not warrant immediate follow-up."
Was it a mistake? It's easy to call it one now, but at the time, the InfoSec team at Target was just following its daily routine. They checked the alert, determined it wasn't a high priority, and moved on to other things.
This happens daily at organizations across the globe, but the difference is that in hindsight, the public knows what happened to Target due to this oversight, so it's easy to single them out.
"So [in] some of these very high profile breaches, the product was able to identify that the breach was occurring, but the people's intelligence wasn't able to respond because they got so many alerts. They got so much information that it was difficult," commented FireEye's Dave DeWalt.
DeWalt is correct in that information overload is a burden for IT / InfoSec teams, but threat intelligence is a problem too. Most of the threat intelligence feeds available on the market aren't intelligence at all; they're aggregated reports on malware and spam, rogue IP addresses, and vulnerabilities that can't be tied to a given environment.
They're a general overview of the threat landscape, and a good source of data to have, but they can't protect an individual business on their own. But that's what they're promoted to do, which isn't realistic.
Data (intelligence) for as far as the log can read
The problem isn't data. Organizations have tons of data, but the signal-to-noise ratio is too low. So valid data, or threat intelligence, is missed, dismissed as the output of a noisy appliance or an overly sensitive alert.
It's a problem when information exists without the means to process it in a way that's meaningful to the organization. The little links between incidents, which on the surface look like random, meaningless threats, are often what cause the largest problems.
So what's an example of an alert that might be serious, but ignored because it happens so frequently?
"The detection of an opportunistic Trojan, which happens to include a keylogger (e.g., the Zeus Trojan), occurs at a high frequency and may be considered to have low business risk to an organization (AKA - a noisy detection) because the presumed motivation of the attacker is to steal a user's credentials to personal accounts (e.g., shopping, personal banking)," explained Oliver Tavakoli, the CTO of Vectra Networks.
"However, the same host may be used to login to IT systems or customer-owned systems, as in the case of an employee at Fazio Mechanical logging into a outside vendor support website at Target, thus resulting in the compromise of business-critical account credentials."
Likewise, Tavakoli added, the detection of spamming malware is also a frequent occurrence, but it is treated more as a nuisance.
However, an active spam bot that's been observed sending mail at high enough rates can cause the organization's external IP address to be blacklisted. Thus, employees could be prevented from sending legitimate communications, resulting in a hit to performance, the organization's reputation, and its bottom line.
Organizations collect all kinds of data that they've no clear intention of using, such as NetFlow or syslog for every session encountered by a given network appliance, login data from various application servers, or syslog messages for deny rules on the firewall.
The same is true for syslog messages for each URL that's blocked, which almost always ends with the assumption that a user violated policy, not that the event was triggered by a bot or remote actor.
"Many of these pieces of information are collected and dumped into repositories with some hope (usually unrequited) that someone in the org will find use for them in terms of regular operational cadence. More often than not, they're only looked at during post-breach forensics," Tavakoli said.
So the trick isn't to collect as much data as possible; it's to collect the right data at the right time. Tavakoli suggested a four-step strategy to accomplish this, including prioritizing the data that the organization intends to monitor.
The first step is to know where the organization's most important assets are located, how they're accessed, and what's required to protect them. After that, identify the primary attack surfaces, and consider the likelihood of a breach.
For example, in some cases, certain employees' machines could be an attack surface, or in other cases, contractors pose an outsized risk. Likewise, other examples include guest wireless networks, or Internet-facing portals that access internal systems or accounts.
From there, it's important to monitor those systems for anomalous behavior, especially as they communicate, not just among themselves (such as the application server talking to the database), but with other systems as well.
If these systems are part of the identified attack surface, any alert registered between them should be considered and investigated. This means tuning these security systems to send alerts on abnormal traffic only, so a clear understanding of the baseline is required.
Finally, it isn't enough to monitor inbound traffic for anomalies. Outbound traffic needs to be monitored as well, because this is where command and control communications take place, and ultimately, exfiltration activity can be observed.
Other types of observable anomalies
Reconnaissance can be fast (and noticeable), or slow and stealthy, Tavakoli explained, offering examples of other types of observable anomalies.
"Slow reconnaissance can be detected when a host on your network is contacting a large number of internal IP addresses that have not been active in the recent past. This type of scan occurs over longer periods (e.g. hours or days) than port scans (e.g. minutes or hours). Effective detection requires ignoring contact with systems that do not respond to the scanning host, but which are otherwise active."
In addition, he added, brute-force password activity can be detected by scrutinizing server logs for any server that is integrated into the organization's IdAM (identity and access management) infrastructure (e.g. Microsoft Active Directory) or by observing network behavior across a number of different protocols (e.g. SMB, Kerberos, RDP, VNC, SSH, HTTP, HTTPS).
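A minimal sketch of the log-based approach, in Python, might count failed logins per source inside a sliding time window. The event format (timestamp in seconds, source, success flag), the function name, and the limits of 10 failures in 300 seconds are hypothetical; real authentication logs would first need to be parsed into this shape.

```python
from collections import defaultdict, deque

def find_bruteforce_sources(events, max_failures=10, window_seconds=300):
    """Flag sources with more than `max_failures` failed logins inside any
    span of `window_seconds`.

    events: iterable of (timestamp_seconds, source, success) tuples,
            sorted by timestamp. Hypothetical, pre-parsed log format.
    """
    recent = defaultdict(deque)  # source -> timestamps of recent failures
    flagged = set()
    for ts, source, success in events:
        if success:
            continue
        window = recent[source]
        window.append(ts)
        # Discard failures that have aged out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_failures:
            flagged.add(source)
    return flagged
```

The sliding window is a design choice: a user who mistypes a password a few times a day stays below the limit, while a rapid burst of failures from one source does not.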
Finally, exfiltration can be exceedingly difficult to detect on a smaller scale, but it's best to have a strategy for detecting large-scale exfiltrations.
"Look for patterns of behavior where a host on your network acquires large amounts of data from internal servers and subsequently sends a significant amount of data to an external system," Tavakoli said.
"Also, look for significant outbound flows of data from hosts on unusual channels (e.g. 10Gb outbound via FTP). The higher the volume of data acquired and sent increases the business risk and priority for investigation. This type of alert should be of the highest priority since it represents the last step of the attack chain and your last chance to prevent or mitigate data loss."
Doing more with less
Given that most organizations have a good deal of security infrastructure already in place, plenty can be done to better tune their devices and filter out the noise. Again, the key is to identify the most critical assets, and what it would take to attack them.
"For example, if your crown jewel is your Oracle database, you should have a well-established baseline for which hosts connect to it, the queries they perform on a regular basis and the amount of data transacted as a result of those queries," Tavakoli explained.
"A baseline provides an immediate intuitive reaction of whether a report showing 100,000 hosts connecting to the Oracle database performing 5 million queries in one day is normal or anomalous. You may be able to use products and technology you have (e.g, NetFlow analyzers), or you may need to evaluate new technology to accomplish this."
A good starting point is to practice good security hygiene by following the Critical Security Controls for Effective Cyber Defense, published by the Council on Cyber Security. The controls themselves can help protect against a number of common attacks, and increase the number of actionable signals generated by the monitoring systems.
"For example, you can identify the exfiltration behavior of hosts, which represent a key part of your attack surface, by baselining the amount of data they send to the outside and setting a trigger to alert when that baseline is exceeded by a significant amount," Tavakoli said.
To put that in perspective, if a host that normally sends 20 MB of data on average is observed sending 2 GB in one day, this should raise alerts for possible exfiltration.
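The arithmetic behind that example can be sketched in a few lines of Python. The function name, the use of a simple mean as the baseline, and the alert factor of 10 are assumptions for illustration; 2 GB is taken as 2,000 MB.

```python
from statistics import mean

def exceeds_baseline(history_mb, today_mb, factor=10):
    """Compare today's outbound volume with the average of past days.

    history_mb: outbound megabytes per day for previous days.
    today_mb: outbound megabytes observed today.
    Returns (should_alert, ratio_to_baseline).
    """
    baseline = mean(history_mb)
    # A host with no prior outbound traffic has no meaningful ratio.
    ratio = today_mb / baseline if baseline else float("inf")
    return ratio > factor, ratio

# A host averaging 20 MB a day that suddenly sends 2,000 MB.
alert, ratio = exceeds_baseline([18, 22, 20, 19, 21], 2000)
print(alert, ratio)  # True 100.0
```

At 100 times the baseline, the host clears any reasonable alert factor, which is why the example in the text is treated as a likely exfiltration signal rather than normal variation.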
"The more these processes are automated and are performed in real time, the more the alerts will help mitigate attacks before they have done extensive damage."
So are there streams of data that can be safely ignored? Some can, but they are few and far between. The answer actually depends on the organization, as no two are alike.
"While there are very few security events that can be safely ignored, evaluate software that can help you automate the triage to separate opportunistic botnet activity from targeted attack activity. Ensure you can do this in near-real-time as finding this information out a day later exposes you to significant risk," Tavakoli said.