230 million patients. 3,300 hospitals. 900,000 healthcare professionals. 98 percent of U.S. pharmacies. More than 700 different electronic health record platforms. 764 million medication histories. 6.5 billion transactions processed last year alone.
"We've definitely had an opportunity to become experts in Big Data," said Paul Calatayud, CISO at Surescripts.
Surescripts is the country's largest health information network, storing and protecting one of the most valuable information treasure troves on the planet.
Calatayud said that the company first began looking at Big Data analytics to spot fraudulent activity by patients or doctors. Two years ago, this was done using spreadsheets and pivot tables, he said, and it took about a year to move to Hadoop for the data storage and Splunk for the analytics.
Then, six months ago, Surescripts began using the same approach for internal security, processing incident and log data.
Logs and incident reports
Surescripts might be operating on a bigger scale than most companies, but enterprises of any size are dealing with a flood of data from firewalls, networks, email systems, individual workstations, servers, and other devices.
The information comes in fast, in large volumes, and in a wide variety of formats -- the classic definition of Big Data.
Traditional data management systems would quickly run into scalability issues.
However, just having all the log and incident data in one place doesn't improve security if there's too much information there for people to manage, said Marcin Kleczynski, founder and CEO at Malwarebytes.
Big Data analytics helps companies process all this information, prioritize the most significant threats, and weed out random noise and false alerts. At least, that's the idea.
"Lots of mysterious black-box technologies are offered for this," said Mike Lloyd, CTO at security analytics company RedSeal. They include genetic algorithms, machine learning, and artificial intelligence.
"What they have in common is that they are poorly understood, but powerful, and this makes them very appealing as silver bullets," he said. "But artificial intelligence has been frustratingly hard to build computers just aren't all that smart."
This is the cutting edge of security technology, where automation is balanced with human expertise: people who know which questions to ask to get useful answers.
"Data mountains need data mountaineers," he said. "The data won't analyze itself. Simply buying a big data warehouse and layering some Hadoop technologies on top isn't going to bring about enlightenment."
However, the automation technologies are evolving. They correlate more feeds, and are increasingly able to look at events from different perspectives.
For example, risk management, incident response and forensics ask different questions of the data, he said, and different technological approaches are being developed to meet these needs.
And that's just the start of what Big Data can do to improve security, said Jerry Irvine, CIO at Prescient Solutions.
Once a threat is spotted and confirmed anywhere on the network, it can be automatically detected and blocked everywhere else. The threat information can also be shared with industry peers, or the security community as a whole.
"This could be one of the first times that security professionals and security solutions are able to react to these cyberrisks as quickly as the cybercriminals can create them," Irvine said.
Vendors such as Resilient Systems already pull together internal logs with data from external threat intelligence feeds.
"This can obviously be a huge benefit," said Resilient CEO John Bruce. "The added context helps you organize better prevention, detect attacks faster, and in particular, orchestrate a much more efficient response."
For example, if particular IP addresses or malware have been spotted in other attacks, it may provide clues about what else might be going on, and help distinguish significant targeted threats from random, opportunistic attacks.
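As a rough illustration of the idea of matching internal logs against an external threat feed, here is a minimal Python sketch. It is not Resilient Systems' implementation; the `LogEvent` type, the `flag_known_indicators` helper, the feed layout and the sample data are all hypothetical, and the IP addresses come from ranges reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """One simplified internal log entry (hypothetical shape)."""
    host: str
    remote_ip: str


def flag_known_indicators(events, threat_feed):
    """Return (event, note) pairs for events whose remote IP is in the feed.

    threat_feed is assumed to be a dict mapping an IP address to a short
    note about where that address was seen before.
    """
    hits = []
    for event in events:
        note = threat_feed.get(event.remote_ip)
        if note is not None:
            hits.append((event, note))
    return hits


# Hypothetical sample data, using documentation-only address ranges.
threat_feed = {"203.0.113.7": "seen in an earlier targeted campaign"}
events = [
    LogEvent(host="ws-014", remote_ip="198.51.100.20"),
    LogEvent(host="db-02", remote_ip="203.0.113.7"),
]

hits = flag_known_indicators(events, threat_feed)
for event, note in hits:
    print(f"{event.host} contacted {event.remote_ip}: {note}")
```

In this toy example only the connection from `db-02` is flagged, and the note from the feed travels with the alert, which is the kind of added context that could help an analyst separate a targeted threat from opportunistic noise.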
"It's through techniques like this that we'll see the true value of mining data," he said.
The right analysis of the security data can also help uncover subtle patterns that could be indicators of stealthy attackers.
This is particularly important for advanced persistent threats. Instead of just looking to hijack one computer to add to a botnet, or to steal one user's bank login, these attackers can spend weeks -- or months -- burrowing into a company's systems in order to go after the most valuable assets.
These cybercriminals have learned how to evade traditional approaches that use standard rules, signatures and sandboxing, said Muddu Sudhakar, co-founder and CEO at Caspida, an analytics firm acquired by Splunk earlier this month.
But as the criminals reconnoiter systems, move laterally and escalate privileges, they don't stay completely invisible.
"They leave behind telltale signals in the network and activity logs," he said.
The right analytics can look past the noise and spot these signals, whether the attackers are criminals, agents of foreign governments, or even internal actors.
Users behaving badly
Surescripts began looking at user behaviors and credentials three months ago.
"That's where things move more into unstructured data," said Surescripts' Calatayud.
To get the data analyzed, Calatayud is looking at an analytics platform from Gurucul, which specializes in identity access intelligence and user behavior analytics.
"They can slice up the data to specifically address my use cases," he said. "And it allows us to leverage industry expertise rather than trying to build up core competencies and strategies that might not become directly revenue opportunities.
According to Verizon, attackers use compromised user credentials more frequently than any other weapon in their arsenals. But it can be hard to tell if a particular user account is used legitimately by the actual employee, or by an intruder.
This is especially the case when the malicious behavior is only slightly out of the norm for that account, and when the enterprise is large, and there are a lot of accounts to monitor.
"Hackers are using good user credentials as a way to infiltrate organizations and a lot of the products on the market right now would not catch that," said Eric Schou, director of product marketing for HP's enterprise security products groups.
To help with this, HP released a user behavior analytics product this spring at the RSA conference.
The technology can identify typical user behavior patterns in order to spot unusual behaviors, but can also be programmed with rules based on access policies for particular groups of employees.
For example, he said, if a user from the marketing department is logging into Oracle Financials, that could be in violation of a policy and send up a red flag.
He warned, however, that employees will often do unusual things for legitimate reasons. "It might just be behavior that's out of the norm, but not malicious."
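The policy-rule side of this approach can be sketched in a few lines of Python. This is an illustrative assumption, not HP's product logic: the `ALLOWED_APPS` table, the `policy_violations` helper, and the user, department and application names are all made up for the example.

```python
# Hypothetical access policy: which applications each department may use.
ALLOWED_APPS = {
    "marketing": {"crm", "email"},
    "finance": {"oracle_financials", "email"},
}


def policy_violations(logins, department_of):
    """Return (user, department, app) for logins outside the user's policy.

    logins is a list of (user, app) pairs; department_of maps a user to a
    department name. Users with no known department are always flagged.
    """
    flagged = []
    for user, app in logins:
        dept = department_of.get(user)
        allowed = ALLOWED_APPS.get(dept, set())
        if app not in allowed:
            flagged.append((user, dept, app))
    return flagged


# Hypothetical sample data.
department_of = {"alice": "marketing", "bob": "finance"}
logins = [("alice", "oracle_financials"), ("bob", "oracle_financials")]

violations = policy_violations(logins, department_of)
for user, dept, app in violations:
    print(f"review: {user} ({dept}) logged into {app}")
```

Here only the marketing user's login to the financial system is flagged. The output is worded as something to review rather than as an alarm, because an out-of-policy login may still have a legitimate explanation.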
It's not just people who can behave badly. Similar technology can be used to identify normal behaviors of individual endpoints, and recognize when they're doing something suspicious.
"Previous efforts in security analytics were unable to meaningfully represent the expected and normal behavior for connected endpoints," said Bryan Doerr, CEO at Observable Networks, one of the vendors offering this technology.
That challenge has only been getting harder, he added, as the number of endpoints has grown, along with the amount of data they generate and that companies can now store.
"Our big idea was to use the data avalanche as inputs to a modeling process," he said. "We use all this rich data about endpoints to maintain models of their behavior, so that we can recognize when they do things they should not do."
This is all new territory, said Sriram Ramachandran, CEO at analytics vendor Niara.
"It has been a challenge to make the whole thing work seamlessly as a product," he said, and it's been holding back adoption.
"Getting all the information and putting it into elastic storage -- you can do that very quickly," he said. "If you Google Alan Turing, you will get millions of results. But the panel on the right, with the summary -- that requires machine learning."