States and Localities Share Disaster Recovery Lessons Learned

Last year's onslaught of natural disasters put organizations' COOP plans to the test.

Whatever their opinions may have been a few years ago, no city officials in Bastrop, Texas, would now view disaster recovery spending as a burden on their IT budget.

That's because they've discovered how quickly and unexpectedly disaster can hit. Last Labor Day, a 34,370-acre wildfire spread through the eastern side of this small city, destroying 1,688 structures and menacing the populace for 30 days before federal, state and local emergency crews were finally able to bring it under control.

"Everyone saw how close disaster came, and we consider ourselves very, very lucky to have come out of it relatively unscathed," admits Kevin Unger, Bastrop's IT director, who had deployed a new telecom system and a backup and recovery solution two months before the fire struck.

Natural disasters are nothing new for states and localities, but 2011 proved to be one for the record books as the United States experienced an unprecedented number of blizzards, tornadoes, hurricanes, floods, earthquakes and raging wildfires. Many of the events were beyond the scope of what anyone had ever experienced.

While many states and localities had continuity of operations (COOP) plans in place, those plans often didn't stand up in the face of extreme weather. Even experienced IT officials had to modify their plans as the disasters played out, and some had to abandon their contingency planning altogether.

Effective Communication

Hired in 2010 to modernize the city of Bastrop's outdated IT infrastructure, Unger had COOP in mind when he insisted on spending scarce resources to incorporate layers of redundancy, capacity, flexibility and data recovery capabilities into the new architecture.

The IT infrastructure includes a high-speed fiber-optic network linking the city's seven government locations, an Avaya Voice over IP phone system and two new HP servers outfitted with VMware virtualization software. Unger also purchased two Quantum DXi4500 appliances for backup.

"My concern was that we were not able to easily replicate and move our entire data set from one facility to the next," Unger says. "If I had a disaster at, say, the police department, I wanted to be able to literally push a button and bring everything up over at city hall with no interruption of services to our citizens."

While the flames never threatened those resources, the VoIP system proved to be vital because it facilitated communications among the hundreds of emergency personnel who came in and set up a command post in Bastrop's convention center.

"The VoIP system has so much fluidity to it that I could continue city operations and maintain service to our citizens while allocating portions of our telecommunications pathways from our phone servers to the convention center," Unger explains. "Agencies ended up using our phone system to communicate, direct the emergency response and operate a public call center. That really was crucial to keeping this disaster from being a lot worse than it might have been."

Worse and Worst

September was a tough month in Vermont too, as folks grappled with the aftermath of Hurricane Irene. The state backed up its data in preparation. "Vermont was not expected to take a direct hit, and we were mostly expecting wind," says Kris Rowley, the state's chief information security officer.

$52 billion: Estimated financial cost of natural disasters to the United States in 2011

SOURCE: National Oceanic and Atmospheric Administration

Instead, the state got rain — lots of it. More than 14 inches fell in the first 12 hours of the storm, turning even tiny streams into raging rivers and setting off sudden mudslides. "We had a pretty robust disaster preparedness plan with all kinds of contingencies in place, but nothing was going to prepare us for this," Rowley says.

A river soon burst out of its banks near one of the state's largest data centers in Waterbury. Following the DR plan, IT officials ordered a soft shutdown of the operation and had trucks ready to haul away the servers and equipment to a safer location.

"Unfortunately, the water came up so fast that the trucks were washed away, so there went Plan A," recalls Rowley. When water suddenly began pouring into the building, personnel were reduced "to cutting wires and ripping things out of the machines so they could physically carry as much equipment as they could to the next floor up."

Much of the equipment ended up destroyed, and efforts to salvage it were derailed even further when a heavy construction crew scooped up the left-behind machines and transported them to the landfill. "That's been a lesson learned for us — communications and contingency plans need to extend to any cleanup effort," Rowley says. "We ended up having to go dumpster diving once we realized what was happening to make sure sensitive data didn't fall into the wrong hands. We were even on eBay trying to replace parts and equipment that had been destroyed due to the floodwaters."

All turned out well in the end. In seven days, the Department of Information and Innovation (DII), working with the IT staff at the Vermont Agency of Human Services, was able to fully reconstruct the lost data centers at DII's main building in Montpelier and move in 77 displaced personnel. "It was pure chaos, but we somehow worked our way through it," Rowley says.

North Dakota also experienced historic flooding during the summer, with heavy rains and broken levees conspiring to threaten the capital city of Bismarck and completely inundate several small towns. Although the state's resources weren't directly affected, state IT personnel spent much of their time working to make the state's centralized network available to clients and help them carry out and modify their disaster recovery plans.

"Virtually all organizations had a DR plan they were working; however, in some instances, the projected flood levels exceeded their scenario planning," explains Duane Schell, director of the state telecommunications division. "In those cases, we worked with the entities to adjust their plans on the fly a little bit to make it workable and help them identify alternate secondary locations that could accommodate such historic flood levels."

"For the most part, we all worked together very well, and we were fortunate the flood levels did not reach the worst-case scenarios," he adds. "The event did, however, expose weaknesses for a few entities' existing DR plans, reinforcing the need for continual review and improvement."

Onward and Upward

Implementing a successful disaster recovery and continuity of operations plan should be an ongoing, iterative process. Those who weathered natural disasters in 2011 are now determining the most important lessons learned and incorporating them into their disaster recovery plans.

For North Dakota, one problem that came to light during last summer's floods was how many core IT and communications experts were unexpectedly detained by the need to save their own homes and move their families to safety, making them unavailable for the recovery effort. As a result, explains Dan Sipes, director of administrative services for the state IT department, "we are reviewing our documentation to ensure that critical processes and procedures are documented so we could have other staff members step into a critical role if we end up having similar coverage gaps in the future."

Bastrop, Texas, is looking closely at adding more geographic diversity to its backup sites, which are currently located in two different buildings in the city. "If the fire had come through the city, we would have been in real trouble," Unger says. He's exploring remote backup, possibly utilizing the cloud or even going out of state, as a way of ensuring that his data can be easily removed and secured well beyond the threat of a local calamity.

Rowley notes that numerous changes and upgrades to Vermont's DR and COOP plans are now on the table, including incorporating more virtualization and cloud-based backup and storage, looking at alternative recovery sites, and merging agencies and consolidating operations.

"Quite frankly, there are a lot of things that we are considering now that we didn't before," Rowley says. "We're still in the process of recovering, but in the meantime, we're jotting down notes, rethinking things and looking forward to doing this better the next time. When you have a disaster of this magnitude, it changes not only the physical landscape of the state, but also the IT landscape."

Start Strong

Disaster recovery encompasses much more than having the right technology. "As people get into it, they realize that effective DR or business continuity is driven more by the right policies, procedures and relationships," says John Morency, a vice president for systems, security and risk at Gartner. "It's really more of an execution issue, a cultural issue, and each organization is going to deal with different challenges."

Organizations should heed the following advice:

  • Don't insist on perfection. A good DR plan involves an iterative process. Start with some simple steps based on a likely scenario, and then do tabletop exercises to tweak the plan as requirements dictate and more funding becomes available.
  • Always think ahead. Apply lessons from other people's disasters. Imagine all the ways disasters might play themselves out, develop workable contingency plans, and establish relationships with DR experts and counterparts and other emergency personnel before disaster strikes.
  • Have a deep bench. Key personnel are sometimes directly impacted during a major crisis and unable to perform their jobs. For this reason, cross-train IT staff, document key duties and processes, and have two or three backups established and ready to step into that vacant role when necessary to provide needed expertise.
  • Seek peace of mind. At a minimum, government should put good backup procedures in place. By utilizing data vaulting or remote storage clouds, for example, IT officials will at least know their critical data is safe during a disaster, even if they don't yet have automatic procedures and technologies available for application failover and recovery.
  • Keep communications open. Employees and citizens will clamor for information during a disaster, so leverage web portals and social networking sites to keep everyone informed.
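
The "deep bench" advice above lends itself to a simple, repeatable check. The following is a minimal sketch in Python, not a method described by any of the agencies quoted here: the `Role` structure, the `coverage_gaps` function, the staff names and the two-backup threshold are all hypothetical, chosen only to illustrate how a roster could be reviewed for coverage gaps before a disaster.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List

# Assumption: the advice mentions "two or three backups" per key role,
# so this sketch uses two as an illustrative minimum.
MIN_BACKUPS = 2


@dataclass
class Role:
    """A critical IT role (hypothetical structure for illustration)."""
    name: str
    primary: str
    backups: List[str] = field(default_factory=list)
    documented: bool = False  # are the role's key procedures written down?


def coverage_gaps(
    roles: List[Role],
    unavailable: FrozenSet[str] = frozenset(),
    min_backups: int = MIN_BACKUPS,
) -> Dict[str, List[str]]:
    """Return {role name: [problems]} for roles without a deep enough bench.

    `unavailable` models staff who cannot report during a crisis.
    """
    gaps: Dict[str, List[str]] = {}
    for role in roles:
        problems = []
        available = [p for p in role.backups if p not in unavailable]
        if len(available) < min_backups:
            problems.append(
                f"only {len(available)} of {min_backups} backups available"
            )
        if not role.documented:
            problems.append("procedures not documented")
        if problems:
            gaps[role.name] = problems
    return gaps


# Illustrative roster only; names and roles are made up.
roles = [
    Role("network", "alice", ["bob", "carol"], documented=True),
    Role("telecom", "dave", ["bob"], documented=True),
    Role("backup-recovery", "erin", ["frank", "carol"], documented=False),
]

if __name__ == "__main__":
    for name, problems in coverage_gaps(roles, frozenset({"carol"})).items():
        print(f"{name}: {'; '.join(problems)}")
```

Running a check like this during a tabletop exercise, with different staff marked unavailable each time, is one way to surface the kind of coverage gaps North Dakota described.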
Apr 12 2012