We all know that we should have a Disaster Recovery / Business Continuity Plan, yet most of us have worked in businesses where it is a convenient afterthought. Even where businesses have one, active testing is often patchy at best. Many aspire to test at least once a year but fail to meet that target, and even those that do often brush a lot of things under the carpet.
For many years the key concern has been a major fire, followed by lesser concerns about flooding, terrorist attacks and other major disasters. Statistics suggest that in the UK the typical rate of major fires is around one per hundred years of a data centre's operations. This is actually a very high rate. In practice, however, the major incidents that most often disrupt operations tend to have more mundane causes, such as loss of power from the grid, failure of a major network switch within the data centre, or loss of the telecommunications links coming into it.
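To see why one fire per hundred years counts as a high rate, here is a rough worked calculation. It assumes, as a simplification not stated in the statistic itself, that the chance of a major fire is a constant 1% in any given year and independent from year to year; the 10- and 25-year horizons are illustrative choices.

```latex
\begin{aligned}
P(\text{at least one major fire in } n \text{ years}) &= 1 - (1 - p)^{n}, \qquad p = 0.01 \\
n = 10: \quad 1 - 0.99^{10} &\approx 1 - 0.904 = 0.096 \approx 10\% \\
n = 25: \quad 1 - 0.99^{25} &\approx 1 - 0.778 = 0.222 \approx 22\%
\end{aligned}
```

On these assumptions, roughly one data centre in five would expect a major fire over a 25-year working life, before counting any of the more frequent, mundane incidents.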
Many businesses have been content to make minimal investment in preparations and accept the risk. They have mostly got away with this, despite urban myths about the high percentage of businesses that go under after suffering a major incident. However, if you have ever lived through such an incident yourself, you would not want to do so again.
This complacency looks increasingly out of place as enterprises go digital. For one thing, operations become impossible to deliver when systems fail. For another, the increasing frequency of "Cyber Attacks" means that the old cosy assumptions are no longer valid: not only may operations be disrupted, but valuable information or IPR may be stolen and an enterprise's reputation destroyed along with customer confidence.
The increasing pace of change inherent in modern digital business, based on Agile and DevOps styles of continuous change, also means that an annual test is laughable: recovery plans will never be up to date while annual-refresh thinking continues to dominate. This is further exacerbated by the use of multiple SaaS, PaaS and IaaS services. Although each one may increase the theoretical resilience of the enterprise's systems, it also complicates the inter-dependencies between them.
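The growth in inter-dependencies can be made concrete with a small sketch. It assumes the enterprise keeps a map of which services each component directly relies on; every service name below is hypothetical. A breadth-first walk over the inverted map then shows which business services would be hit if a single SaaS, PaaS or IaaS supplier failed. This illustrates the idea and is not a recommended tool.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it directly relies on.
DEPENDS_ON: dict[str, list[str]] = {
    "online-orders": ["payments-saas", "web-app"],
    "web-app": ["iaas-compute", "customer-db"],
    "customer-db": ["iaas-storage"],
    "payments-saas": [],
    "email-campaigns": ["crm-saas"],
    "crm-saas": ["identity-saas"],
    "iaas-compute": [],
    "iaas-storage": [],
    "identity-saas": [],
}


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the map so we can walk from a failed supplier to its dependants.
    dependants: dict[str, list[str]] = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependants.setdefault(dep, []).append(name)

    # Breadth-first search; `seen` also guards against cycles in the map.
    seen: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependants.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    for supplier in ("iaas-storage", "identity-saas"):
        print(supplier, "->", sorted(impacted_by(supplier, DEPENDS_ON)))
```

The answer is only as good as the map, which is why the map has to be revised every time a service is adopted or changed.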
Business and IT Management Teams need to actively engage in preparing for major disasters and incidents. This means several things need to be addressed:
- capturing all changes to the systems and process landscape, especially adoption of SaaS services, so that the current architecture is documented, understood, risk assessed and continuously reflected in recovery plans;
- regular incremental testing of recovery plans to address changes to the systems landscape;
- running scenario "war games" to evaluate responses to different types of threat, taking into account that, under Murphy's Law, key people may be unavailable when a major incident occurs;
- regular review of the major 3rd party services that the enterprise relies upon, assessing the suitability of their response capabilities and their likely behaviour in an incident;
- media training of all senior executives and managers who may be called upon to represent the enterprise in the event of an incident, taking into account that some of them may have been incapacitated by the incident or away from the business.
Not many of us work in enterprises where all this happens, but most of us need this now.