Cleaning dirty data is not just a matter of mastering the technical challenges. It requires making sure your staff is working closely with the business every step of the way.
In the early hours of March 20, 2003, British soldiers, sailors and airmen joined US forces in the invasion of Iraq and the toppling of Saddam Hussein. Thus far, they have played a vital role in rebuilding Basra and the critical Persian Gulf port of Umm Qasr. Massive shipments of military materiel were essential to their success, and basically, anything that wasn't a vehicle, live ammunition or fresh provisions (which have different supply lines) began its journey to the Gulf from England's military warehouses. In the few weeks prior to the invasion of Iraq, these depots sent 3,169 six-metre shipping containers to the Gulf by ship or air, along with almost 22,000 one-metre pallets.
Getting these shipments to the Gulf was a logistical nightmare, and it would have been far more fraught had the British defence ministry not embarked four years ago on a £6 million effort to pull together three separate supply chains. This involved reconciling some 850 different information systems and integrating three inventory management systems and 15 remote systems.
The biggest foe in this massive integration effort was not Saddam Hussein, but dirty or disparate data. To one system, stock number 99 000 1111 was a 24-hour, cold-climate ration pack. To another system, the same number referred to an electronic radio valve. And had hungry troops been sent radio valves instead of rations, the invasion and rebuilding of Iraq wouldn't have gone very far.
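To make the stock-number problem concrete, here is a minimal sketch of how such collisions might be flagged before two catalogues are merged. It is an illustration only, not the ministry's actual method: the function name, the system names, the space-stripping rule and the second stock number (99 000 2222, "tent peg") are all assumptions invented for the example.

```python
from collections import defaultdict


def find_conflicting_stock_numbers(catalogues):
    """Return stock numbers that carry different descriptions in different systems.

    `catalogues` maps a (hypothetical) system name to a dict of
    stock number -> item description.
    """
    seen = defaultdict(dict)
    for system, items in catalogues.items():
        for stock_number, description in items.items():
            # Assumption: spacing inside a stock number is not significant.
            key = stock_number.replace(" ", "")
            # Compare descriptions case-insensitively, ignoring stray whitespace.
            seen[key][system] = description.strip().lower()
    return {
        number: by_system
        for number, by_system in seen.items()
        if len(set(by_system.values())) > 1
    }


# Illustrative data: the first entry mirrors the clash described above,
# the second is a made-up item that both systems agree on.
catalogues = {
    "system_a": {
        "99 000 1111": "24-hour cold-climate ration pack",
        "99 000 2222": "Tent peg",
    },
    "system_b": {
        "990001111": "Electronic radio valve",
        "99 000 2222": "tent peg",
    },
}

conflicts = find_conflicting_stock_numbers(catalogues)
for number, by_system in conflicts.items():
    print(number, by_system)
```

A check like this only surfaces the clash. Deciding which description is right, and what the other item should be renumbered to, still needs someone from the business side who knows the stock.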
Dirty data has long been a CIO's bugbear. But in today's wired world, the costs and consequences of inaccurate information are rising exponentially. Muddled mailing lists are one thing; missing military materiel is quite another. Throw in the complications arising from merging different data sets, as in the aftermath of a merger or acquisition, and the difficulties of data cleansing multiply. For this article, we interviewed seasoned data-cleaning veterans from organizations as diverse as the British Ministry of Defence, the US Census Bureau and Cendant, a real estate and hospitality conglomerate. Despite that diversity, the lessons they learned share two common themes: how to surmount the technical challenges of cleaning data, and how to align IT staff with the business side to ensure that the task gets done right.