No matter what measures you take, sometimes circumstances will occur that are beyond your control. Maybe someone decided to aim a massive DDoS attack at your servers to make a statement. Maybe a natural disaster took things offline. Or maybe your load balancers went haywire.
Whatever the case, the unthinkable has happened. Your data center was brought to its knees. Clients were unable to access your services for an extended period of time. By the time you bring things back online, the damage has been done.
Don’t panic. At this stage, all you can do is recover, figure out what happened, and determine how to avoid a similar outage in the future. Let’s talk about that.
Step 1: Take Stock of the Damage
First things first: you need to estimate how much business and data were lost as a result of the downtime, which critical systems were impacted, and which clients suffered the greatest losses. All of this information is essential. To that end, in addition to examining your application infrastructure, carefully evaluate the following systems:
- Hardware (servers, desktops, routers, switches, wireless devices, etc.)
- Data storage devices
- Transit routes
- Building systems such as air conditioning and power infrastructure
Throughout this process, communicate regularly with your client base through a public channel such as Twitter, and let them know you’re investigating the cause of the outage.
Step 2: Determine How the Disaster Happened
Once you’ve figured out which systems were damaged, it’s time to get down to brass tacks and investigate how the outage happened in the first place. In some cases, the answer is open and shut: it’s usually obvious, for example, how a tornado might knock a data center offline. In others, such as a DDoS attack or a load balancer failure, you’ll need to dig through logs, monitoring data, and the timeline of events to pin down the root cause.
Step 3: Notify Your Clients and Vendors as Needed
Once you’ve determined which systems have been impacted, get in touch with any clients and vendors who’ve been directly affected. Take stock of the damage they’ve suffered, and figure out how to restore their connectivity as soon as possible. Depending on how your service level agreement (SLA) is worded, you may also want to look into compensation, such as service credits.
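If your SLA ties compensation to measured availability, the arithmetic is simple enough to script. The sketch below is a minimal illustration assuming a tiered credit schedule; the tier thresholds, credit percentages, and function names (`availability_pct`, `credit_pct`, `service_credit`) are hypothetical placeholders, not terms from any real agreement.

```python
# Hypothetical credit schedule: (minimum availability %, credit % of monthly fee).
# These numbers are placeholders; substitute the tiers from your own SLA.
CREDIT_TIERS = [
    (99.9, 0.0),   # SLA met: no credit
    (99.0, 10.0),  # 99.0% <= availability < 99.9%
    (95.0, 25.0),  # 95.0% <= availability < 99.0%
    (0.0, 100.0),  # below 95%: full month credited
]

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def availability_pct(downtime_minutes: float, period_minutes: float) -> float:
    """Percentage of the billing period during which the service was up."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    # Clamp downtime into [0, period] so bad input can't yield >100% or <0%.
    downtime = min(max(downtime_minutes, 0.0), period_minutes)
    return 100.0 * (period_minutes - downtime) / period_minutes


def credit_pct(availability: float) -> float:
    """Return the credit for the first (highest) tier whose floor is met."""
    for floor, credit in CREDIT_TIERS:
        if availability >= floor:
            return credit
    return CREDIT_TIERS[-1][1]


def service_credit(monthly_fee: float, downtime_minutes: float,
                   period_minutes: float = MINUTES_PER_30_DAY_MONTH) -> float:
    """Dollar amount owed to one client for the billing period."""
    avail = availability_pct(downtime_minutes, period_minutes)
    return round(monthly_fee * credit_pct(avail) / 100.0, 2)


if __name__ == "__main__":
    # Example: a 6-hour outage (360 minutes) in a 30-day month.
    avail = availability_pct(360, MINUTES_PER_30_DAY_MONTH)
    print(f"availability: {avail:.3f}%")
    print(f"credit on a $2,000 monthly fee: ${service_credit(2000, 360):.2f}")
```

Under these placeholder tiers, a six-hour outage leaves about 99.167% availability, which falls in the 10% tier and yields a $200.00 credit on a $2,000 monthly fee. Running the same calculation per client also gives you a quick ranking of who suffered the greatest losses.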
Step 4: Get Things Up and Running Again
This one’s pretty self-explanatory. Replace any systems that were lost or damaged in the disaster, and get yourself back in business as quickly as possible. The longer your facility is out of commission, the greater the cost.
Step 5: Re-Evaluate Your Disaster Recovery Plan
Last but certainly not least, take a close, careful look at your disaster recovery plan. Was it sufficient for this particular crisis? Did you do everything you could to keep your data center up and running? What could you do differently if something like this happened again?
A DR plan and the policies that surround it should be reviewed and updated regularly, especially in the wake of a disaster. And while disasters beyond your control may occasionally occur, there’s almost always a way to mitigate their impact. It’s just a matter of being properly prepared.
Tim Mullahy is the General Manager at Liberty Center One. Liberty Center One is a new breed of data center located in Royal Oak, MI. Liberty can host any customer solution regardless of space, power, or networking/bandwidth requirements.