Developing your IT recovery plan (ITSAP.40.004)

Unplanned outages, cyber attacks, and natural disasters can happen at any time. Your organization may lose information or experience downtime that disrupts or stops critical business functions. Unplanned downtime is expensive and could have a lasting impact on your business. To ensure continued operations with minimal downtime, your organization should have an IT recovery plan as part of your overall business continuity approach. The IT recovery plan should identify critical data, applications, and processes, and define how your organization will recover IT services that support business operations, products, and services.


Your IT recovery plan should clearly identify and document what needs to be recovered, when, where, and by whom.

In general, there are 3 types of plans you should consider developing for your business. These plans take into consideration major events that could cause an unplanned outage and require a recovery response.

  • Incident response plan: Event-focused plan, specific to a security incident like a cyber attack affecting an organization
  • Business continuity plan: Specific plan to quickly resume only the most critical operations, as defined by a business impact analysis, in the event of a disaster
  • Disaster recovery plan: Holistic plan to return your organization to full operations after a disaster

Know your business disruption tolerance

To develop an effective recovery plan, you should tailor it to address the impact an incident would have on your organization. Your plan should also specify the level of disruption your organization is willing to accept if an incident occurs. There are 3 key measures to consider in your plan:

  • Maximum tolerable downtime: The total length of time that a process can be unavailable without causing significant harm to your business
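  • Recovery point objective: The maximum amount of data loss, measured as the time between the last usable copy of your data and the incident, that is tolerable to your organization
  • Recovery time objective: The target time within which a system must be restored, at the level of service needed to meet the system owner’s minimum expectations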
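These measures are related, so it can help to check them against each other and against your current back-up schedule. The following is a minimal Python sketch, assuming the common convention that the recovery time objective (RTO) must fit within the maximum tolerable downtime (MTD) and that, in the worst case, you lose everything written since the last back-up. The class, function names, and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTargets:
    """Disruption tolerance for one business process (all values in hours)."""
    mtd_hours: float  # maximum tolerable downtime
    rto_hours: float  # recovery time objective
    rpo_hours: float  # recovery point objective


def check_targets(t: RecoveryTargets, backup_interval_hours: float,
                  expected_restore_hours: float) -> list[str]:
    """Return a list of problems; an empty list means the plan fits the targets."""
    problems = []
    # Assumption: the RTO must fit inside the maximum tolerable downtime.
    if t.rto_hours > t.mtd_hours:
        problems.append("RTO exceeds maximum tolerable downtime")
    # Worst case: everything written since the last back-up is lost.
    if backup_interval_hours > t.rpo_hours:
        problems.append("backup interval exceeds RPO")
    # The restore procedure must finish within the RTO.
    if expected_restore_hours > t.rto_hours:
        problems.append("expected restore time exceeds RTO")
    return problems
```

For example, with an MTD of 24 hours, an RTO of 8 hours, an RPO of 4 hours, and nightly back-ups, `check_targets(RecoveryTargets(24, 8, 4), backup_interval_hours=24, expected_restore_hours=6)` returns `["backup interval exceeds RPO"]`: the restore is fast enough, but daily back-ups could lose up to 24 hours of data.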

Identify your critical business functions, applications, and data

Your plan should identify your organization’s critical data, applications, and functions. Critical data may include financial records, proprietary assets, and personal data.

Critical applications are the systems that run your key business functions and are essential to your business. These are the systems that must be restored immediately to maintain business continuity in the event of an unplanned outage.

To identify critical business functions, applications, and data, you should conduct a risk assessment to identify threats and vulnerabilities. Run through specific scenarios (such as a cyber attack, significant power outage, or natural disaster) to identify key participants and stakeholders. Reviewing these scenarios will also help you address significant risks, develop mitigation strategies, and estimate the time and effort needed for recovery.

Conduct a business impact analysis (BIA) to predict how disruptions or incidents will harm your operations, business processes and systems, and finances. During your BIA, you should also assess the data that you collect and the applications that you use to determine their criticality and choose priorities for immediate recovery.
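Once your BIA gives you an estimated impact and a maximum tolerable downtime for each data set or application, you can rank them for recovery. The sketch below is one possible approach, not a prescribed method: it assumes hypothetical tier cut-offs of 4 and 24 hours and, within a tier, restores the asset with the highest estimated hourly impact first. The names and figures are illustrative; adjust the tiers and inputs to match your own analysis.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    hourly_impact: float  # estimated cost of each hour of outage, from your BIA
    mtd_hours: float      # maximum tolerable downtime, from your BIA


def recovery_tier(asset: Asset) -> int:
    """Hypothetical cut-offs: tier 1 within 4 hours, tier 2 within 24 hours, else tier 3."""
    if asset.mtd_hours <= 4:
        return 1
    if asset.mtd_hours <= 24:
        return 2
    return 3


def prioritize(assets: list[Asset]) -> list[tuple[int, str]]:
    """Sort by tier, then by highest hourly impact within a tier."""
    ranked = sorted(assets, key=lambda a: (recovery_tier(a), -a.hourly_impact))
    return [(recovery_tier(a), a.name) for a in ranked]
```

With the hypothetical assets `payroll` (500 per hour, MTD 72 hours), `orders_db` (2000, 2), `website` (800, 2), and `email` (300, 12), `prioritize` returns `[(1, "orders_db"), (1, "website"), (2, "email"), (3, "payroll")]`.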

Create your IT recovery plan

Complete the following steps when creating your organization’s IT recovery plan.

  1. Identify stakeholders, including clients, vendors, business owners, systems owners, and managers
  2. Identify your response team members, as well as their roles and responsibilities
  3. Take inventory of all your hardware and software assets
  4. Identify and prioritize critical business functions, applications, and data
  5. Set clear recovery objectives
  6. Define back-up and recovery strategies
  7. Test your plan regularly
  8. Develop a communications plan to inform key stakeholders
  9. Develop a training program for employees to ensure that everyone is aware of their roles, responsibilities, and the order of operations during an unplanned outage
  10. If required, engage with managed service providers to identify areas in which they can assist you with your recovery efforts
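Several of these steps come together when you decide the order in which systems are restored, since a system cannot be recovered before the systems it depends on. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) to produce an order that respects dependencies and, when several systems are ready at the same time, restores the one with the shortest RTO first. The function name, system names, dependencies, and RTO values are hypothetical.

```python
from graphlib import TopologicalSorter


def recovery_order(depends_on: dict[str, set[str]],
                   rto_hours: dict[str, float]) -> list[str]:
    """Restore every system after its dependencies, shortest RTO first among ready systems."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    order: list[str] = []
    ready: list[str] = []
    while sorter.is_active():
        # Add systems whose dependencies have all been restored.
        ready.extend(sorter.get_ready())
        # Systems without a stated RTO are restored last among those ready.
        ready.sort(key=lambda s: rto_hours.get(s, float("inf")))
        next_system = ready.pop(0)
        order.append(next_system)
        sorter.done(next_system)
    return order
```

For example, if an ERP system depends on a database and the network, the database depends on storage and the network, and email depends only on the network, then with RTOs of 1 (network), 2 (storage), 4 (database), 8 (ERP), and 24 (email) hours the function returns `["network", "storage", "database", "erp", "email"]`. Email becomes ready early but waits because its RTO is longest.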

Choose your recovery strategy

There are several options to consider when implementing your recovery strategy. Choose the strategy that best meets your business needs and security requirements.

Hot, warm, or cold site

  • Hot site
    • back-up site with the same servers and equipment as your primary site
    • functions the same as your primary site and is always kept running in case of downtime
    • data synchronization occurs within minutes to hours, reducing the risk of data loss
  • Warm site
    • back-up site with network connectivity and some equipment installed
    • requires setup to function at the full capacity of your primary site
    • data synchronization occurs less frequently, which can result in some data loss
  • Cold site
    • back-up site with little to no equipment
    • requires more time and resources to set up and restore business operations
    • data synchronization can be a difficult and lengthy process as servers need to be migrated from your primary site, resulting in a higher risk of data loss
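Choosing between site types is a trade-off between cost and how quickly the site can take over. A minimal sketch, assuming you have your own estimates of activation time, synchronization interval, and cost for each option, is to pick the cheapest site that meets both your RTO and your RPO. The names and the figures in the example are hypothetical and are not typical values for any provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteOption:
    name: str
    activation_hours: float     # your estimate of time to take over operations
    sync_interval_hours: float  # how often data is synchronized to the site
    annual_cost: float          # relative cost, in any consistent unit


def cheapest_site(options: list[SiteOption], rto_hours: float,
                  rpo_hours: float) -> Optional[SiteOption]:
    """Return the cheapest option that meets both the RTO and the RPO, or None."""
    suitable = [o for o in options
                if o.activation_hours <= rto_hours
                and o.sync_interval_hours <= rpo_hours]
    return min(suitable, key=lambda o: o.annual_cost, default=None)
```

With illustrative options of a hot site (0.5-hour activation, 15-minute synchronization, cost 100), a warm site (24 hours, 24 hours, cost 40), and a cold site (168 hours, 168 hours, cost 10), an RTO of 48 hours and an RPO of 24 hours selects the warm site, while an RTO of 8 hours and an RPO of 4 hours requires the hot site.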

Storage replication

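Storage replication copies your data from one location to another over a storage area network (SAN), local area network (LAN), or wide area network (WAN). When data is copied in real time, so that each write is completed at both locations, it is referred to as synchronous replication. You can also use asynchronous replication, which creates copies of data according to a defined schedule. With asynchronous replication, changes made since the last scheduled copy can be lost if the primary location fails.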
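The practical difference between the two modes is what can be lost when the primary location fails. The toy model below, written in Python with hypothetical class and method names, keeps data in two in-memory dictionaries: in synchronous mode each write reaches the replica before `write` returns, while in asynchronous mode writes wait for the next scheduled copy. It illustrates the concept only and does not replicate real storage.

```python
class Replicator:
    """Toy model of replication between a primary and a replica location."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary: dict[str, str] = {}
        self.replica: dict[str, str] = {}
        self.pending: dict[str, str] = {}  # writes not yet copied (asynchronous mode)

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        if self.synchronous:
            # Synchronous: the replica is updated before the write is acknowledged.
            self.replica[key] = value
        else:
            # Asynchronous: the write waits for the next scheduled copy.
            self.pending[key] = value

    def scheduled_copy(self) -> None:
        """Copy all pending writes to the replica, as a scheduled job would."""
        self.replica.update(self.pending)
        self.pending.clear()

    def at_risk(self) -> list[str]:
        """Keys whose latest value would be lost if the primary failed now."""
        return list(self.pending)
```

After two writes, a synchronous `Replicator` reports nothing at risk, while an asynchronous one reports both keys until `scheduled_copy()` runs. The length of the schedule therefore sets the worst-case data loss, which you can compare with your recovery point objective.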

Disk mirroring

Disk mirroring replicates data on 2 or more hard disk drives. If your main system experiences unplanned downtime, disk mirroring can automatically switch your critical data to a standby server or network. If you are unable to restore your systems, you can use the mirror copy. It is important that the mirrored copy is backed up to a separate server or location that is unaffected by the outage.
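The following toy model, with hypothetical names, shows the idea behind mirroring (similar to RAID 1): each write goes to every healthy disk, and reads automatically fall back to a mirror when a disk fails. It is an in-memory illustration, not a storage driver.

```python
class MirroredVolume:
    """Toy mirror (similar to RAID 1): each block is written to every healthy disk."""

    def __init__(self, disk_count: int = 2) -> None:
        if disk_count < 2:
            raise ValueError("mirroring needs at least 2 disks")
        self.disks: list[dict[int, bytes]] = [{} for _ in range(disk_count)]
        self.failed: set[int] = set()

    def write(self, block: int, data: bytes) -> None:
        # The same data is written to every disk that has not failed.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def fail_disk(self, index: int) -> None:
        """Simulate a disk failure."""
        self.failed.add(index)

    def read(self, block: int) -> bytes:
        # Read from the first healthy disk, failing over to a mirror automatically.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise OSError("all mirrored disks have failed")
```

Because every write, including an accidental deletion or corrupted data, is copied to every disk, a mirror protects against disk failure but does not replace a separate back-up.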

Cloud vs. on-premises recovery

With a cloud-based recovery platform, you can connect easily from anywhere with a variety of devices. You can back up your data frequently, and it can be less expensive than purchasing and maintaining an on-premises platform because you pay for the space you need as you need it. Using the cloud can also reduce or eliminate the need for a separate offsite recovery site.
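The cost difference comes from how capacity is paid for. An on-premises platform is typically sized for your peak needs up front, while a pay-as-you-go cloud service bills for what you use each month. The sketch below compares the two over a planning period; the function names, prices, and growth figures are hypothetical placeholders, not real vendor pricing.

```python
def on_premises_cost(capacity_tb: float, hardware_per_tb: float,
                     upkeep_per_year: float, years: int) -> float:
    """Up-front purchase sized for peak capacity, plus yearly maintenance."""
    return capacity_tb * hardware_per_tb + upkeep_per_year * years


def cloud_cost(monthly_usage_tb: list[float], price_per_tb_month: float) -> float:
    """Pay-as-you-go: each month is billed only for the storage used that month."""
    return sum(tb * price_per_tb_month for tb in monthly_usage_tb)


# Hypothetical usage: 2 TB in year 1, 3 TB in year 2, 4 TB in year 3.
usage = [2] * 12 + [3] * 12 + [4] * 12
```

With these hypothetical figures, `cloud_cost(usage, 20)` gives 2160, compared with `on_premises_cost(4, 300, 500, 3)` at 2700. Different figures can easily reverse the result, so use your own quotes.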

Test your IT recovery plan

Testing is critical because it helps you identify inconsistencies and areas of the plan that need revision. Where possible, use a test environment to avoid business interruptions. Some example test strategies include:

  • Checklist: Read through and explain the steps of the recovery plan
  • Walkthrough: Walk through the steps without enacting them
  • Simulation: Use a simulated incident or disaster to familiarize the recovery team with their roles and responsibilities
  • Parallel test: Set up and test recovery systems to see if they can perform operations to support key processes, while your main systems stay in full production mode
  • Cutover test: Set up your recovery systems to assume all your business operations, and disconnect your primary systems. This type of test causes business interruptions and requires additional planning
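One way to find inconsistencies during a parallel test is to confirm that restored data matches the original. The sketch below compares SHA-256 checksums of every file in a source directory with its restored copy and reports files that are missing or different. The function names and directory layout are assumptions; in practice, compare against a snapshot taken at the same recovery point, since live data keeps changing.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_restore(source_dir: Path, restored_dir: Path) -> dict[str, list[str]]:
    """Report files under source_dir that are missing or different in restored_dir."""
    report: dict[str, list[str]] = {"missing": [], "mismatched": []}
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            report["missing"].append(rel.as_posix())
        elif sha256_of(src) != sha256_of(restored):
            report["mismatched"].append(rel.as_posix())
    return report
```

An empty `missing` list and an empty `mismatched` list suggest the restore is complete and intact; any entries point to the parts of the plan that need revision.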
