When a go-to guy takes a holiday

The knowledge gap is noticeable when a key staffer takes a break

I'm sure many of you will agree that any operation with a single point of failure is primed for disaster. Sadly, my department has one.

My senior security engineer is the go-to person for anything to do with firewalls, the virtual private network and certain other critical aspects of our security infrastructure. There's no one else. That makes me uncomfortable, but whenever I've raised it with upper management, I've been told there's simply no budget for more engineers.

So, how about cross-training one or more of the network engineers to administer the firewalls and VPN concentrators? Nope; the network manager doesn't have enough people to handle network-related work, let alone the added burden of firewall, VPN and SecurID administration.

Working without a net

So although I know the danger of relying on one person to hold all the knowledge about any part of my department's operations, I have my very own single point of failure. Naturally, no one can work all the time, and my guy has been working his tail off for the past six months. When enough was finally enough, he asked to take a couple of days off to be with his family. I checked the calendar for upcoming infrastructure changes that might need his attention, and the coast looked clear. I let him take three days off. (Actually, I'm not even going to charge him vacation time; he works so hard that I gave him the days as comp time.)

On Wednesday, I received the first call. The manager of our mobility project needed a VPN set up between us and a service provider. This project enables our field service engineers to use BlackBerry smart phones to access the customer relationship management application on the internal network. Not surprisingly, the setup was needed immediately. Time to roll up my sleeves and get to work.

I've had hands-on experience at different points in my career, but I hadn't touched a Unix console or a firewall in at least a year. As a manager, I spend most of my time on project management, budget issues, personnel problems, policy writing and attending meetings. I simply don't have time for hands-on operational things, and I'm a bit rusty. But with my single point of failure unavailable, I had to make time, rusty or not.

I logged into our partner VPN firewall and attempted to configure the VPN tunnel using the parameters provided by the service provider. Sounds easy enough. But soon I was pulling my hair out trying to figure out why the tunnel wasn't coming up. I was almost bald when I realized what the problem was: The service provider's Cisco PIX firewall and my company's Juniper NetScreen firewall just don't speak the same language. This is a well-documented issue, but there's no easy fix, and it took me a while to figure out that the solution lay with what is called the "proxy ID," which essentially defines which local and remote networks are to be tunneled. As soon as I configured the proxy ID properly, the tunnel came up, and I was able to pass the intended traffic between three servers on our internal network and several resources on our partner's network.
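The proxy ID fix is easier to see with a toy model. The Python sketch below is a simplification, not how either vendor actually implements IKE: it assumes phase 2 negotiation succeeds only when each peer's local network exactly matches the other peer's remote network, which is roughly how a PIX checks a proposal against its crypto access list. It also assumes, as is commonly reported for this interoperability problem, that a route-based NetScreen VPN with no proxy ID configured proposes 0.0.0.0/0 in both directions. The class and function names and all the addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

Net = ipaddress.IPv4Network


@dataclass(frozen=True)
class ProxyId:
    """Hypothetical model of the traffic selector one peer proposes."""
    local: Net
    remote: Net


def phase2_ok(a: ProxyId, b: ProxyId) -> bool:
    # Simplified assumption: exact mirror match only, no subset/superset
    # narrowing. Each side's local network must be the other side's remote.
    return a.local == b.remote and a.remote == b.local


# Partner's PIX: crypto ACL covers the partner network to our servers
# (hypothetical documentation/private addresses).
pix = ProxyId(local=Net("198.51.100.0/24"), remote=Net("10.1.1.0/29"))

# NetScreen with no proxy ID set: assumed to propose "any to any".
netscreen_default = ProxyId(local=Net("0.0.0.0/0"), remote=Net("0.0.0.0/0"))

# NetScreen with the proxy ID set to mirror the PIX's access list.
netscreen_fixed = ProxyId(local=Net("10.1.1.0/29"), remote=Net("198.51.100.0/24"))

print(phase2_ok(netscreen_default, pix))  # False: selectors don't match
print(phase2_ok(netscreen_fixed, pix))    # True: tunnel can come up
```

Setting the proxy ID to mirror the partner's access list, rather than asking the partner to widen theirs, also keeps the tunnel limited to the servers that actually need it.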

That same day, I received a call from the network operations center about another VPN problem. Our suppliers were having trouble using a portal we had set up for them to access some of our internal applications. The portal is built on a Juniper SSL VPN concentrator, with RSA SecurID tokens used for two-factor authentication, CA Netegrity for single sign-on, and Microsoft Active Directory for identifying authorization levels.

Troubleshooting this problem took me several hours. First, I checked the SecurID logs, which indicated that the users were properly authenticating. The SSL VPN logs indicated that users' log-ons had been successful. Nonetheless, we couldn't tell whether the authentication traffic was actually reaching all the back-end resources, and on that question the logs told us very little.

I deployed a Snort sensor on the network segment that hosts the supplier portal infrastructure. The network team configured the appropriate switch span ports to feed the sensor, and I monitored the traffic for signs of activity. That showed me that the SSL VPN concentrator wasn't sending properly formatted packets to the Web portal. This was odd, since the logs seemed to indicate that sessions had been successful. I ended up rebooting the SSL VPN concentrator, which fixed the problem. Then I opened a support case; I'll let my security engineer handle that when he gets back.

Oh, how I wish my single point of failure never needed a holiday. But my days in the trenches showed me that he certainly deserved one.
