I have written extensively about disaster recovery (DR) plans, and I was pretty sure that I had covered everything. From the concept of moving DR to the cloud, including the issues to consider, to CIO tips, and from choosing the right DR software to the people perspective, I thought my words on DR were complete. Wrong. Here is an overall check of your DR plans, straight from the field, assuming that you already have your business impact analysis.
Do you have a written plan? Does that sound obvious? Don’t be so sure. I have seen medium-sized businesses whose “complete DR plans” exist only in “somebody’s mind,” to be implemented when something goes wrong. It does not matter if your IT manager or CIO has been with your company since day one. When things go wrong, people forget and confuse things. There is no substitute for a written DR plan. Make sure it is written down and available at any time.
Do you have a DR team? Make sure you have one, and choose one contact person for communication from each department. If your company is geographically distributed and your other offices are comparably large, you can choose one contact person for each office instead. Make sure that the way these contacts communicate, both with IT and with their own departments, is clearly defined.
As an executive, you are right to expect that when disaster strikes, everybody will make their way to the corporate offices to bring your business back to normal. This may not be the case, though. Some personnel may be missing, others may be emotionally shaken, and there may be many other people-related factors. If the size of your IT department permits, make sure that every role has a failover, either another staff member or a consultant. (I recommend reading my article “Disaster Recovery? Don’t Forget Your Staff!” for all people-related factors and possible risks.)
Did I assess my risks from various perspectives? To start with, what would you do if the affected site is completely inaccessible, say because an earthquake has reduced it to rubble or a flood has made the building unreachable? Whom will you contact in such an emergency, both internally and externally? Once you have informed everybody and gathered your IT team, what communication infrastructure do you need to bring up to begin working: e-mail, call center, VPN, remote desktop, instant messaging? What information will the employees need to run the business at the minimum acceptable level? How can you make this information available? What will you do about access controls? Who will be given VPN rights, who will be permitted into the DR site, and who will manage the various online functions of the business? You need to make sure that the DR plan covers every scenario and every step needed to keep the business running after disaster strikes.
Does my on-premises data center conform to best practices? Again, you may think that everything about the data center is well thought out. Awfully wrong! Did you know that one of the largest companies I worked for (a holding company) had its data center on the B2 basement floor, and that it was flooded after heavy rain? The data center used to be on the 6th floor, but the owner’s daughter wasn’t happy with “the computers being in a high-valued floor” and ordered it to be moved to B2. Nobody could object, and the data center was moved. You may be surprised to find that there is only one generator (no redundancy), no smoke detector (or, just as bad, an untested one), no fire suppression system (FM-200 or a similar non-water-based agent), and so on. If you do not have planned maintenance in place, maintenance and tests of these items may easily be forgotten. Make sure that your DR plans link to these maintenance procedures, and make sure that the procedures are actually carried out. And if your company is in a single-floor building, make sure that your data center has a raised floor.
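One way to keep those maintenance tests from being forgotten is to record the last test date and the required interval for each item, then flag anything overdue. The following is a minimal sketch of that idea in Python; the item names, dates and intervals are made-up examples, not recommendations.

```python
from datetime import date, timedelta

# Hypothetical maintenance log: item -> (date of last test, required interval in days).
CHECKS = {
    "generator load test": (date(2024, 1, 10), 30),
    "smoke detector test": (date(2024, 2, 15), 90),
    "fire suppression inspection": (date(2023, 6, 1), 180),
}

def overdue(checks, today):
    """Return the names of items whose last test is older than their interval, sorted by name."""
    return sorted(
        name
        for name, (last_tested, interval_days) in checks.items()
        if today - last_tested > timedelta(days=interval_days)
    )

print(overdue(CHECKS, date(2024, 3, 1)))
```

With the sample dates above, the generator and the fire suppression system are reported as overdue, while the smoke detector is still within its interval.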
Where are my backups and how do I access them? If you have tape backups, make sure that they are removed daily from your office and stored in a secure, easily accessible location. Also make sure that at least three employees who live in different parts of the city have keys to access the backups. Restoring from tape can give you faster recovery times, since a tape library often delivers more throughput than a Wide Area Network (WAN) or Metropolitan Area Network (MAN) link. If your company can afford it, make sure that you also back up your data remotely to another location.
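To put the tape-versus-network comparison into numbers, here is a rough back-of-the-envelope sketch in Python. The data volume and the throughput figures are assumptions chosen only for illustration; plug in the values for your own tape drives and links.

```python
def restore_hours(data_gb, throughput_mb_per_s):
    """Hours needed to move data_gb gigabytes at a sustained throughput in MB/s."""
    seconds = (data_gb * 1000) / throughput_mb_per_s  # decimal units: 1 GB = 1000 MB
    return seconds / 3600

# Assumed figures, for illustration only.
DATA_GB = 2000          # 2 TB of data to restore
TAPE_MB_PER_S = 150     # assumed sustained tape drive throughput
WAN_MBIT_PER_S = 100    # assumed WAN link speed in megabits per second

tape_hours = restore_hours(DATA_GB, TAPE_MB_PER_S)
wan_hours = restore_hours(DATA_GB, WAN_MBIT_PER_S / 8)  # 8 bits per byte

print(f"Tape: {tape_hours:.1f} h, WAN: {wan_hours:.1f} h")
# prints "Tape: 3.7 h, WAN: 44.4 h"
```

The sketch ignores the time it takes to fetch the tapes from off-site storage and to load them, so add that to the tape figure when you compare the two.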
How can my customers reach my company? In a disaster scenario, customers will be in a panic, and most of them will not be able to remain calm. In this case, it is better to have a call center, and better still to forward the company’s main line to it. Prepare a list of questions that helps the call center staff classify which calls are critical, and train them on how to answer those critical call types. For my clients, I recommend using virtual switchboard solutions. In one of my DR projects, I had the company forward its main lines to the virtual switchboard and then mapped the call center staff’s mobile phones to the menu options in the welcome message. So, when a customer calls the company and dials a specific menu item, the relevant call center staff member’s mobile phone rings (of course there are more technical details to this: virtual voicemail boxes, waiting lists, etc.).
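The menu-to-mobile mapping can be pictured as a simple lookup table. The sketch below is in Python and is not tied to any particular switchboard product; the menu topics, phone numbers and the voicemail fallback are all hypothetical.

```python
from typing import Tuple

# Hypothetical menu: digit pressed by the caller -> (topic, staff mobile number).
MENU = {
    "1": ("Order status", "+1-555-0101"),
    "2": ("Billing", "+1-555-0102"),
    "3": ("Technical support", "+1-555-0103"),
}

# Assumed fallback when the caller presses an unmapped key.
FALLBACK = ("General voicemail", "voicemail")

def route_call(digit: str) -> Tuple[str, str]:
    """Return (topic, destination) for the pressed digit, or the fallback."""
    return MENU.get(digit, FALLBACK)

print(route_call("2"))  # ('Billing', '+1-555-0102')
```

Keeping the mapping in one table like this means a staff change during the disaster only requires updating a single entry.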
Are my DR test scenarios realistic? Did I really test them? Did I improve them? The DR test scenarios should be as realistic as possible. This is not just about the technical issues, but also about people’s reactions. How will your DR team behave when they are under pressure? How will anxiety affect their ability to carry out the plans? Did you really test the scenario, or did you just “go through the steps”? How hard did you push the limits, for example by working with your telecom partner? What did you learn from the test, and how did you incorporate those lessons into your plans to improve them?
Take the time to evaluate your DR plan once again by asking these questions. Once you have answered them all, your plan will be as complete as it can be. But do not forget that there is no such thing as being too prepared. Review, evaluate, improve, train and test all the time.
Do you feel I left out anything? Help us improve in the comments!