In today’s world, a failure is not something that only your employees handle and recover from. Anyone connected to your company bears the consequences in some way: your business partners, your customers, and any outside party that deals with you all take a hit. At best, the consequence of a failure is the revenue lost while you cannot conduct business. Lost customers, lost confidence, financial and legal proceedings, SLA penalties, and recovery and repair costs can amount to much more than the lost revenue.
In a true business recovery scenario, we have two pillars: availability and recovery. Today, the expectation for availability is close to 100%. In the retail sector, even 99.99% availability can become an issue to talk about. That means corporations need to keep the business available for as much of the time as possible and define time-to-recovery goals of a couple of hours; in the highly competitive retail sector, this comes down to minutes.
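To make the availability figures above concrete, here is a minimal Python sketch that converts an availability target into a yearly downtime budget. The function name and the non-leap-year assumption are mine, chosen only for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes/year")
```

At 99.99%, the budget works out to roughly 52.6 minutes per year, which is why a recovery goal measured in hours can already be too slow for a retailer.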
The other pillar is recovery. Some corporations consider onsite recovery only (they have their own reasons), but this is not something I would recommend. Onsite recovery is faster and more convenient, but when a disaster strikes the site itself, local recovery is not an option. This leads us to offsite or cloud recovery, which involves one or more geographically diverse locations that can be brought into use as quickly as possible. Furthermore, such sites can be reached with nothing more than a computer and a working Internet connection, which makes access more robust.
So how can we start questioning our disaster recovery plan? It is best to start with a complete inventory of IT assets, covering business processes first and foremost. What are the ERP systems? What is the backbone of the e-mail systems, including archiving? Which functions face outside organizations (extranets, ERP access, etc.)?
Once the inventory is complete, these items should be ranked by priority: which asset will be recovered first? Two simple questions are enough to get started: is it related to business continuity, and does it generate revenue? Further questions will help clarify the priorities. What is critical to run the internal processes? What is important for legal compliance? What data is critical for your customers and suppliers? The answers to such questions will help to create the recovery sequence.
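The ranking questions above can be sketched as a small scoring routine. This is only an illustration: the `Asset` fields, the weights, and the sample inventory are assumptions of mine, not a standard method, and a real plan would weigh these questions with the business owners.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    business_continuity: bool          # needed to keep the business running?
    generates_revenue: bool            # does it bring in revenue directly?
    legal_compliance: bool = False     # required for legal compliance?
    customer_or_supplier_data: bool = False  # holds data critical to partners?


def priority_score(asset: Asset) -> int:
    # Illustrative weights (an assumption): the two primary questions dominate,
    # the follow-up questions act as tie-breakers.
    return (4 * asset.business_continuity
            + 4 * asset.generates_revenue
            + 2 * asset.legal_compliance
            + 1 * asset.customer_or_supplier_data)


def recovery_sequence(assets):
    """Return assets ordered from first-to-recover to last-to-recover."""
    return sorted(assets, key=priority_score, reverse=True)


# Hypothetical inventory, for illustration only.
inventory = [
    Asset("HR candidate database", False, False),
    Asset("Email", True, False, legal_compliance=True, customer_or_supplier_data=True),
    Asset("ERP", True, True, legal_compliance=True, customer_or_supplier_data=True),
]

for position, asset in enumerate(recovery_sequence(inventory), start=1):
    print(position, asset.name, priority_score(asset))
```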
Then the time to recovery and recovery point objectives should be defined. Which applications will be recovered, and by when? Most probably the email application will be near the top of the list, because it is a vital link in the information exchange both inside and outside the company. Therefore the time to recovery of this application should be as short as possible. In contrast, the database used by the human resources department to store candidate data is not a mission-critical application (unless the company is a human resources consultancy) and thus can sit further down the recovery list with a longer recovery time. At this point the tolerance for losing data is also important. How much data, and what type of data, your company can afford to lose will shape the recovery point objectives and recovery times. Don’t fall into the trap of thinking all data is equally important: your ERP databases are more important than the draft reports in the shared folders.
Once this is complete, the existing recovery tools should be analyzed to see if they fit the requirements. If your time to recovery is 6 hours for the Exchange server and you can afford to lose 12 hours of email, but your existing backup system takes backups only once a day (a 24-hour cycle), then you could lose up to 24 hours of email, and you have to rework either your requirements or your backup infrastructure. And if you decide in the end that the disaster recovery tools at hand do not meet the business requirements, choose one that does and immediately arrange serious training.
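The Exchange example above boils down to two comparisons, sketched here in Python. RPO (recovery point objective) is the amount of data you can afford to lose and RTO (recovery time objective) is the time to recovery. The function names are mine, and the 4-hour restore estimate is a hypothetical figure that would have to come from an actual restore test.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # If the failure hits just before the next backup runs,
    # everything written since the previous backup is lost.
    return backup_interval


def requirement_gaps(rpo, rto, backup_interval, estimated_restore_time):
    """Return the list of objectives the current backup setup cannot meet."""
    gaps = []
    if worst_case_data_loss(backup_interval) > rpo:
        gaps.append("RPO")
    if estimated_restore_time > rto:
        gaps.append("RTO")
    return gaps


gaps = requirement_gaps(
    rpo=timedelta(hours=12),                    # can afford to lose 12 hours of email
    rto=timedelta(hours=6),                     # must be back within 6 hours
    backup_interval=timedelta(hours=24),        # one backup per day
    estimated_restore_time=timedelta(hours=4),  # hypothetical, measure this in a test
)
print(gaps)  # ['RPO']
```

With a daily backup the RPO is missed; either the backup interval has to drop to 12 hours or less, or the business has to accept a larger data loss.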
In most companies, disaster recovery is believed to be the backup administrators’ job alone. This is one of the most mistaken beliefs I have heard. Disaster recovery tasks can be coordinated by the backup administrators; that is fine. But every member of the IT team has to have a responsibility and assigned tasks in the recovery process. While the backup administrator restores the Exchange database, the mail administrator has to check consistency and mount the database, the desktop administrator has to check Outlook connectivity, and the IT manager has to test the system by sending and receiving email and checking that the contacts, tasks and calendar items are properly restored and in place. To achieve this, the IT staff has to be “clustered” to provide “duplication” and “redundancy”, the same way the systems are.
The next steps are testing and practicing. Anyone who has felt a cold sweat running down their spine on discovering that a backup is corrupt knows the importance of testing the backups. I agree that restoring every single backup to confirm it is healthy is not a practical way to go. However, I do not agree with leaving unused the data verification and automatic recoverability checking options that come with the backup tools. Backup administrators can make random checks every two weeks, say, to see if the backups are OK and recoverable (restoring a random database table or recovering random files from the file server are simple, quick checks that can be done easily). After that, IT staff should begin practicing recovery from simulated disasters. Every simulation will help to reveal what has been overlooked, improve the scenarios further, and ensure that the IT staff is comfortable with the recovery process.
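A random spot check like the one described above could look roughly like this in Python. It is a sketch under stated assumptions: the restore itself is done with your backup tool into a scratch directory (`restored_dir` here), and the sampled files have not changed since the backup was taken. In practice, comparing against checksums recorded at backup time is more reliable than comparing against the live files. All function and directory names are hypothetical.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pick_sample(source_dir: Path, sample_size: int, seed=None) -> list:
    """Choose a random set of files to restore and verify."""
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    rng = random.Random(seed)
    return rng.sample(files, min(sample_size, len(files)))


def failed_restores(source_dir: Path, restored_dir: Path, sample) -> list:
    """Return the sampled files whose restored copy is missing or differs."""
    failures = []
    for original in sample:
        restored = restored_dir / original.relative_to(source_dir)
        if not restored.is_file() or sha256_of(restored) != sha256_of(original):
            failures.append(original)
    return failures
```

A typical run would call `pick_sample` on the file share, restore exactly those files with the backup tool, and then treat any non-empty result from `failed_restores` as a reason to investigate before a real disaster forces the question.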
Walking through these steps carefully will ensure that the IT department has met its responsibility to the greatest extent possible and that the disaster recovery plan is in line with the business requirements. Appropriate tools, trained staff and practice will prove invaluable in the face of a disaster.
How do you evaluate your disaster recovery plans? Anything you would add to these points? Has your business faced any disaster? Share with us in the comments below!
- Featured image: www.flagshipnetworks.com