Another Look at Log Files

Marcus Ranum architected the first commercial firewall in 1990. He founded Network Flight Recorder Security, the company responsible for the first network forensics tool. And last summer at the Usenix conference, during a course he was teaching on log file analysis, he said that if nobody is ever going to look at your log files, then you might as well not bother keeping any logs at all.

For those who don't know him, Ranum is one of those security professionals who loves to be inflammatory. He maintains a webpage advertising an "ultimately secure intrusion prevention system"; it's a wire cutter. His most recent book, The Myth of Homeland Security, lambastes the government's "knee-jerk security at any price" response to 9/11. Ranum says that what we have bought for "nosebleed prices" are nothing more than "feel-good" security measures.

So when Ranum says that most log files are useless, and that most organizations are better off deleting their files and saving the disk space for something that's actually productive, he isn't really arguing for widespread file erasure. Instead, he's rightly sounding the alarm that organizations need to do a better job of working with the security tools they already have, such as log files.

This is especially true today because log files are increasingly being used in discovery and are now admissible in court. As a result, information that your organization is collecting could be used by (or against) you. In addition, you might be spending money duplicating efforts to mine for data that your log files have already collected for you. It is important for CSOs to understand the wealth of information being collected in their log files, and how they can use that information to its fullest.

A log file is simply a file that makes a record, or a log, of some actions that a computer has performed. Logs on Unix systems tend to be stored in ASCII text files, with one line per log entry. Logs on Windows systems can be stored either in ASCII log files or as "events" inside the Windows Event database managed by the operating system. On either system, log entries typically contain a date and time, the program responsible for creating the log message and a short text description of what actually happened. Log files are traditionally used for debugging. While the information in these files can be corrupted, the majority of the files are accurate.
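To make the structure of a log entry concrete, here is a minimal Python sketch that splits one Unix-style line into its date and time, program name and description. The line layout, the sample entry and the name parse_entry are assumptions chosen for illustration; real systems vary in how they format their logs.

```python
import re

# Assumed layout: "Mon DD HH:MM:SS host program[pid]: message"
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<message>.*)$"
)

def parse_entry(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = SYSLOG_LINE.match(line)
    return match.groupdict() if match else None

# A made-up entry, not taken from any real system.
sample = "Oct  3 14:07:21 mailhost sendmail[2214]: connection from unknown host"
entry = parse_entry(sample)
print(entry["timestamp"], "|", entry["program"], "|", entry["message"])
```

Lines that do not fit the assumed layout come back as None, which is a reminder of how little structure many log files actually guarantee.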

Traditional Use Gets Expanded

The most common "official" uses for log files include billing, utilization analysis and incident management. For example, a Web-hosting provider might have a program that processes the log files from its Web server to determine how many gigabytes each of its customers transmitted in a month so that they can be billed appropriately. If a hacker starts probing a script for vulnerabilities, those repeated probe attempts will likewise show up in the logs.
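The billing case can be sketched in a few lines of Python. This example assumes a hypothetical log layout in which each line begins with the customer's site name and ends with the number of bytes sent (with "-" meaning that nothing was sent); the site names and entries are invented, and a real Web server's format would need its own parsing.

```python
from collections import defaultdict

def bytes_per_customer(lines):
    """Sum the bytes-sent field (assumed to be last) per site (assumed to be first)."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        site, size = fields[0], fields[-1]
        if size.isdigit():  # "-" means no bytes were sent
            totals[site] += int(size)
    return dict(totals)

# Made-up entries for two hypothetical customers.
sample_log = [
    'alpha.example 192.0.2.10 - - [03/Oct/2004:14:07:21 -0400] "GET /index.html HTTP/1.0" 200 5120',
    'beta.example 192.0.2.44 - - [03/Oct/2004:14:07:25 -0400] "GET /logo.gif HTTP/1.0" 200 2048',
    'alpha.example 192.0.2.10 - - [03/Oct/2004:14:08:02 -0400] "GET /big.zip HTTP/1.0" 200 1048576',
    'beta.example 192.0.2.44 - - [03/Oct/2004:14:09:40 -0400] "GET /missing HTTP/1.0" 404 -',
]

usage = bytes_per_customer(sample_log)
for site, total in sorted(usage.items()):
    print(f"{site}: {total / 1e9:.6f} GB")
```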

But the real use of log files, in practice, is for debugging. The overwhelming majority of the log messages that I have seen were not designed to satisfy any set of functional requirements; they were written by a programmer who was trying to understand why his or her program wasn't working properly. As a result, log files frequently contain cryptic information that isn't documented and was designed to be interpreted by human eyes, not by automated software.

The level of detail that will show up in some log files can be astonishing, and the files frequently contain considerable volumes of personal information. Mail logs contain a detailed list of which users sent e-mail to whom, and when. Other logs reveal when e-mail was downloaded to desktops or laptops, which you can use to find out when people were actually working and when they were slacking off. Servers that hand out network addresses using the Dynamic Host Configuration Protocol (DHCP) record the hardware MAC address of every Ethernet card they see. In a wireless network, DHCP servers can also record where in your organization a laptop was seen. Such information can be incredibly useful in an internal investigation, whether you are trying to recover from a break-in, trace the source of a harassing e-mail or gather information that will be used to terminate a problem employee.

On the other hand, information in log files can be wrong. One kind of error happens when the information that's recorded doesn't mean what you think it means. Tina may have stopped checking her e-mail because she wasn't getting any work done and was on a deadline, not because she was taking a three-hour lunch. Another kind of error is more insidious: Log entries can be deleted, modified, or even maliciously created in an attempt to eliminate evidence or deflect suspicion to an innocent party.

Protect Files from Malicious Attack

It's certainly true that the vast majority of log entries are absolutely honest and correct. But it's also true, as Ranum implies, that most log entries are never examined by a human being. If you are going to use a log for an investigation, you can't assume that the log is true simply because most of the other records on your computer are true and correct. If a crime really has taken place, it's quite possible that the bad guy has intentionally corrupted the logs.

One way to minimize the chances of log files being maliciously modified (and to increase the chances that your logs will hold up in court) is to store them on a special "log server" to which access is generally restricted. Setting up a system such as this is easy with Unix; the industry standard "syslog" logging utility has supported remote logging since the early 1980s. You can either set up a Unix workstation with a lot of disk space as your log server, or you can purchase a specially built logging appliance. In the Windows world, there are a number of remote logging systems available.
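As a rough illustration of remote logging, the Python sketch below sends one message to a log server using the standard library's SysLogHandler, which speaks the syslog protocol over UDP by default. So that the example can run on a single machine, a socket on the loopback interface stands in for the log server; the logger name, the "demo-app" tag and the message text are all assumptions. In a real deployment the address would be that of your restricted log server.

```python
import logging
import logging.handlers
import socket

# Stand-in for the log server: a UDP socket on the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
receiver.settimeout(2.0)
server_address = receiver.getsockname()

# The application side: route log records to the remote address.
handler = logging.handlers.SysLogHandler(address=server_address)
handler.setFormatter(logging.Formatter("demo-app: %(message)s"))
logger = logging.getLogger("remote_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.warning("three failed logins for user tina")

# The "server" side: read the datagram that arrived over the network.
datagram, _ = receiver.recvfrom(4096)
received = datagram.decode("utf-8", errors="replace")
print(repr(received))

logger.removeHandler(handler)
handler.close()
receiver.close()
```

Because the record leaves the machine as soon as it is written, an intruder who later gains control of the originating system cannot quietly rewrite it there.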

Determine Your Intentions

You must decide ahead of time whether you intend to use your logs as video recorders or burglar alarms. If your logs are primarily a recording system, then you will consult them only when evidence of wrongdoing arises from some other channel: for example, if your CEO receives a death threat. In that case, you'd examine all of the logs at your disposal to see who was in the building, where they were located, and what they were doing.

If you want to use your logs as an alarm system, you'll need to have a person or an automated process that regularly scans the logs for something noteworthy. This can be a challenge, because you often don't know exactly what you are looking for. The ideal log analysis tool would alert you to unauthorized or unusual activity. But how does a computer know what's unauthorized or unusual?

Accuracy Is the Goal

Not surprisingly, designing systems that can automatically recognize the unusual has been a hot area of research for many years. It is an easy problem to state, but solutions are difficult to implement well. As with all recognition problems, the issue is accuracy: Make the system very sensitive, and you will get many false positives; make the system less sensitive, and important events will slip right through the cracks.
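The sensitivity trade-off can be shown with a toy example. The Python sketch below assumes a hypothetical alarm that fires when a source produces at least some threshold number of failed logins in an hour; the observations, including which ones were really attacks, are invented purely to show the effect of moving the threshold.

```python
# Made-up data: (source, failed logins in one hour, was it really an attack?)
observations = [
    ("192.0.2.10", 2, False),
    ("192.0.2.11", 4, False),
    ("192.0.2.12", 6, True),
    ("192.0.2.13", 9, False),
    ("192.0.2.14", 40, True),
]

def evaluate(threshold, data=observations):
    """Return (false alarms, missed attacks) when alerting at or above threshold."""
    false_positives = sum(1 for _, n, attack in data if n >= threshold and not attack)
    missed = sum(1 for _, n, attack in data if n < threshold and attack)
    return false_positives, missed

sensitive = evaluate(3)   # low threshold: alarms fire easily
relaxed = evaluate(10)    # high threshold: alarms fire rarely
print("sensitive:", sensitive)
print("relaxed:", relaxed)
```

With this data the sensitive setting raises two false alarms and misses nothing, while the relaxed setting raises no false alarms but lets one attack through.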

My favorite technique to analyze log files is to have filters that do not recognize unusual events, but instead recognize usual ones. Run these "negative" filters, and everything that's left over is what you should focus your analysis on. If you see too many "normal" events that should have been filtered out, you respond by writing yet another filter. This is also the approach that Ranum recommends.
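A negative filter of this kind can be sketched in Python as a list of patterns for routine entries, with everything unmatched passed through for review. The patterns, the sample entries and the name leftover are assumptions made for illustration; the patterns that count as "normal" will differ for every site.

```python
import re

# Hypothetical patterns describing entries already known to be routine.
KNOWN_NORMAL = [
    re.compile(r"sshd\[\d+\]: Accepted publickey for \w+"),
    re.compile(r"CRON\[\d+\]: \(\w+\) CMD "),
    re.compile(r"dhcpd: DHCPACK on "),
]

def leftover(lines, normal_patterns=KNOWN_NORMAL):
    """Return only the lines that none of the 'normal' patterns match."""
    return [
        line for line in lines
        if not any(pattern.search(line) for pattern in normal_patterns)
    ]

# Made-up log entries.
sample = [
    "Oct  3 02:00:01 web1 CRON[311]: (root) CMD (run-parts /etc/cron.daily)",
    "Oct  3 02:14:09 web1 sshd[4120]: Accepted publickey for tina from 192.0.2.10",
    "Oct  3 02:15:33 web1 sshd[4122]: Failed password for root from 203.0.113.7",
    "Oct  3 02:16:00 web1 dhcpd: DHCPACK on 10.0.0.23 to 00:11:22:33:44:55",
    "Oct  3 02:17:48 web1 kernel: eth0: promiscuous mode enabled",
]

unusual = leftover(sample)
for line in unusual:
    print(line)
```

Here only the failed root login and the kernel message survive the filters. When a routine entry slips through, the remedy is to append one more pattern to the list.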

Vigilance Is Key

Making a negative filter work in practice, though, is hard: Even when it works properly, you end up with a system that is constantly trying to attract your attention. Put the reports on a webpage, and you'll forget to check them. Arrange to have them sent by e-mail, and you'll quickly learn to hit the delete button after a fast and insufficient scan. Ultimately, the only solution for this problem is vigilance, proactive auditing and penetration testing. If the auditor's attempts to break into your network aren't picked up by your logging system, then it's time to revamp the system, replace the people who are using it, or delete all of your logs and use the disk space for something more productive.

Simson Garfinkel, CISSP, is a technology writer based in the Boston area.