Overview

Recovery plans define the complete failover procedure: which resources are protected, in what order they recover, what health checks confirm readiness, and what automation scripts run at each stage. A well-designed recovery plan is the foundation of a reliable DR strategy with predictable RTO.

Prerequisites

  • Sites registered and replication link verified (see Replication Configuration)
  • Administrator credentials on both sites
  • Instances and volumes to protect must exist in the project

Creating a Recovery Plan

Open Recovery Plans

Navigate to Disaster Recovery → Recovery Plans → Create Plan.

Define plan parameters

Field | Description
Plan Name | Descriptive label identifying the workload tier (e.g., prod-database-dr)
Primary Site | Source site for replication
DR Site | Target site for recovery
RPO Target | Maximum acceptable data loss (e.g., 5 minutes)
RTO Target | Maximum acceptable recovery time (e.g., 30 minutes)
Failover Trigger | Manual or Automatic
Consistency Mode | Crash-consistent or Application-consistent
Replication Mode | Asynchronous or Synchronous
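
The fields above can be modeled and sanity-checked before they are entered in the UI. The following is a minimal sketch, not an XDR API: the class name `PlanParameters`, its attribute names, the lowercase value strings, and the rule that the two sites must differ are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta

# Allowed values follow the table above; the lowercase spellings are hypothetical.
FAILOVER_TRIGGERS = {"manual", "automatic"}
CONSISTENCY_MODES = {"crash-consistent", "application-consistent"}
REPLICATION_MODES = {"asynchronous", "synchronous"}


@dataclass(frozen=True)
class PlanParameters:
    """Hypothetical local model of the plan form; not an XDR API object."""

    plan_name: str
    primary_site: str
    dr_site: str
    rpo_target: timedelta
    rto_target: timedelta
    failover_trigger: str = "manual"
    consistency_mode: str = "crash-consistent"
    replication_mode: str = "asynchronous"

    def problems(self) -> list:
        """Return human-readable problems; an empty list means the values look sane."""
        found = []
        if not self.plan_name.strip():
            found.append("plan name is empty")
        if self.primary_site == self.dr_site:
            # Assumption: a plan replicates between two distinct sites.
            found.append("primary and DR site must differ")
        if self.rpo_target <= timedelta(0):
            found.append("RPO target must be positive")
        if self.rto_target <= timedelta(0):
            found.append("RTO target must be positive")
        if self.failover_trigger not in FAILOVER_TRIGGERS:
            found.append("unknown failover trigger")
        if self.consistency_mode not in CONSISTENCY_MODES:
            found.append("unknown consistency mode")
        if self.replication_mode not in REPLICATION_MODES:
            found.append("unknown replication mode")
        return found
```

For example, `PlanParameters("prod-database-dr", "site-a", "site-b", timedelta(minutes=5), timedelta(minutes=30)).problems()` returns an empty list, where `site-a` and `site-b` are placeholder site names.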

Add resource groups

Organize protected resources into ordered recovery groups. Resources within a group recover in parallel; groups recover sequentially.

Group | Resources | Recovery Order
Group 1 | Database instances | 1 (first to recover)
Group 2 | Application servers | 2 (start after databases are healthy)
Group 3 | Load balancers / frontends | 3 (start after app tier is healthy)

Model recovery groups on the actual application dependency chain. Starting an application server before its database is ready causes service errors and may require manual intervention during a real failover.
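
The ordering rule (parallel within a group, sequential across groups) can be illustrated with a short sketch. This is not how XDR is implemented internally; `recover_in_order` and the `recover_one` callback are hypothetical names, and the callback stands in for whatever actually recovers a single resource.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# An ordered plan: each entry is (group name, resources in that group).
Plan = Sequence[Tuple[str, Sequence[str]]]


def recover_in_order(plan: Plan, recover_one: Callable[[str], bool]) -> List[str]:
    """Recover groups one after another; resources inside a group run in parallel.

    recover_one is a hypothetical callback that recovers a single resource and
    returns True on success. Returns the names of the groups that fully recovered.
    """
    recovered_groups: List[str] = []
    for group_name, resources in plan:
        # Parallel within the group: one worker per resource.
        with ThreadPoolExecutor(max_workers=max(1, len(resources))) as pool:
            results = list(pool.map(recover_one, resources))
        # Sequential across groups: do not start dependents of a failed group.
        if not all(results):
            break
        recovered_groups.append(group_name)
    return recovered_groups
```

With a plan such as `[("databases", ["db-1", "db-2"]), ("app-servers", ["app-1"]), ("frontends", ["lb-1"])]`, a failure while recovering `app-1` stops the run before `lb-1` is touched, which mirrors the dependency chain described above.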

Configure automation hooks

Add pre/post scripts to each resource group:

Hook Type | Trigger | Example Use
Pre-Failover | Before group starts recovering | Notify on-call; update DNS TTL
Post-Recover | After group is running | Run health check; update service registry
Pre-Failback | Before reversing replication | Drain connections from DR instances
Post-Failback | After primary site is restored | Re-enable scheduled jobs
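
To make the hook timing concrete, here is a small sketch of how hooks of each type might be attached to a group and run around its recovery. `HookRegistry`, `fail_over_group`, and the lowercase hook type strings are hypothetical and only mirror the table above; XDR's own hook mechanism is configured in the UI and may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hook types from the table above, written as lowercase identifiers (an assumption).
HOOK_TYPES = ("pre-failover", "post-recover", "pre-failback", "post-failback")


class HookRegistry:
    """Hypothetical in-memory registry of per-group hook scripts."""

    def __init__(self) -> None:
        self._hooks: Dict[Tuple[str, str], List[Callable[[], None]]] = defaultdict(list)

    def add(self, group: str, hook_type: str, hook: Callable[[], None]) -> None:
        if hook_type not in HOOK_TYPES:
            raise ValueError(f"unknown hook type: {hook_type}")
        self._hooks[(group, hook_type)].append(hook)

    def run(self, group: str, hook_type: str) -> int:
        """Run the group's hooks of one type in registration order; return how many ran."""
        hooks = self._hooks.get((group, hook_type), [])
        for hook in hooks:
            hook()
        return len(hooks)


def fail_over_group(registry: HookRegistry, group: str, recover: Callable[[], None]) -> None:
    """Run Pre-Failover hooks, then the recovery itself, then Post-Recover hooks."""
    registry.run(group, "pre-failover")
    recover()
    registry.run(group, "post-recover")
```

Keeping hooks keyed by both group and type means a Pre-Failback hook registered for the database group never fires during a failover of that group.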

Set health check criteria

Define what “recovered” means for each resource group:
  • HTTP health check: URL and expected response code
  • TCP port check: host and port number
  • Script: custom validation command (exit 0 = healthy)

A recovery group advances to the next group only when all health checks in the current group pass. This prevents cascading failures where dependent services start before their dependencies are ready.
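
The three check types can be approximated with the Python standard library, which is useful for rehearsing a check before adding it to a plan. This is a sketch under assumptions: the function names, default timeouts, and the `group_is_healthy` gate are hypothetical, and XDR's built-in checks may treat redirects, timeouts, or retries differently.

```python
import socket
import subprocess
import urllib.error
import urllib.request
from typing import Callable, Iterable, Sequence


def http_check(url: str, expected_status: int = 200, timeout: float = 5.0) -> bool:
    """Healthy when the URL answers with the expected response code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == expected_status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; compare their code as well.
        return err.code == expected_status
    except (OSError, ValueError):
        # Connection failures, timeouts, and malformed URLs count as unhealthy.
        return False


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Healthy when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def script_check(command: Sequence[str], timeout: float = 30.0) -> bool:
    """Healthy when the validation command exits with status 0."""
    try:
        result = subprocess.run(command, timeout=timeout, capture_output=True)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def group_is_healthy(checks: Iterable[Callable[[], bool]]) -> bool:
    """A group may advance only when every one of its checks passes."""
    return all(check() for check in checks)
```

A group gate could then be written as `group_is_healthy([lambda: tcp_check("db-1.example.internal", 5432)])`, where the host name and port are placeholders.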

Activate the plan

Click Activate. XDR begins replicating all protected resources to the DR site; initial sync time depends on data volume.

Once the plan is activated, its status shows ACTIVE, and initial sync progress is visible in the replication dashboard.

Managing Existing Plans

Navigate to Disaster Recovery → Recovery Plans to see all plans with their current status and replication lag.

Available actions per plan:
  • Edit — update RPO/RTO targets, add/remove resources, modify health checks
  • Deactivate — pause replication without deleting the plan
  • Delete — permanently remove the plan (stops replication)
  • Failover — initiate failover (see Failover)
  • Test Failover — run an isolated DR test without cutting over production traffic

Consistency Modes

Mode | How It Works | RPO Accuracy | Overhead
Crash-consistent | Replicates data as written, like a power failure at the recovery point | May require fsck on recovery; databases may need recovery | Minimal
Application-consistent | Coordinates with the XAVS Guest Agent to quiesce writes before snapshot (includes VSS provider for Windows) | Application-clean recovery point; no database recovery needed | XAVS Guest Agent round-trip per snapshot interval

Use application-consistent mode for databases and transactional workloads. Crash-consistent mode is suitable for stateless compute instances where data integrity depends on the application rather than the storage layer.

Recovery Point Retention

XDR retains a configurable number of recovery points, allowing historical restore targets during failover. Configure retention settings from Disaster Recovery → Recovery Plans → [Plan] → Retention:

Retention Setting | Behavior
Count | Number of recovery points to retain (older points are pruned)
Interval | Minimum time between recovery points
Maximum age | Absolute oldest recovery point to retain

Increasing recovery point retention consumes additional storage on the DR site. Each recovery point is an incremental snapshot, so for high-change workloads, deep retention can accumulate significant storage overhead.
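
The effect of the three retention settings can be previewed with a small calculation. The sketch below assumes that Count and Maximum age apply together (a point is kept only if it satisfies both) and that Interval gates when a new point may be taken; the source does not state how XDR combines them, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


def select_retained(
    points: Sequence[datetime], now: datetime, count: int, max_age: timedelta
) -> List[datetime]:
    """Return the recovery points that survive pruning, oldest first.

    Assumption: a point is kept only if it is within max_age AND among the
    newest `count` points.
    """
    fresh = [p for p in points if now - p <= max_age]
    fresh.sort(reverse=True)  # newest first
    return sorted(fresh[:count])


def should_create_point(
    last_point: Optional[datetime], now: datetime, interval: timedelta
) -> bool:
    """A new recovery point is due once `interval` has passed since the last one."""
    return last_point is None or now - last_point >= interval
```

For a workload with points every 10 minutes, a Count of 3 and a Maximum age of 25 minutes would leave only the three newest points, so the oldest available restore target is 20 minutes back.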

Next Steps

DR Automation

Configure runbook scripts and automatic failover triggers

Monitoring

Monitor plan replication health and RPO adherence

Compliance

Generate RPO/RTO compliance reports from plan history

XDR User Guide — Protection Plans

User-facing protection plan management