Recovery Plans

Overview

Recovery plans define the complete failover procedure — which resources are protected, in what order they recover, what health checks confirm readiness, and what automation scripts run at each stage. A well-designed recovery plan is the foundation of a reliable DR strategy with predictable RTO.

Prerequisites

Sites registered and replication link verified (see Replication Configuration)
Administrator credentials on both sites
Instances and volumes to protect already exist in the project

Creating a Recovery Plan

Open Recovery Plans

Navigate to Disaster Recovery → Recovery Plans → Create Plan.

Define plan parameters

Field	Description
Plan Name	Descriptive label identifying the workload tier (e.g., `prod-database-dr`)
Primary Site	Source site for replication
DR Site	Target site for recovery
RPO Target	Maximum acceptable data loss (e.g., `5 minutes`)
RTO Target	Maximum acceptable recovery time (e.g., `30 minutes`)
Failover Trigger	`Manual` or `Automatic`
Consistency Mode	`Crash-consistent` or `Application-consistent`
Replication Mode	`Asynchronous` or `Synchronous`
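
The parameters above can be captured as a small data structure and sanity-checked before a plan is submitted. The following is a minimal sketch, not an XDR API: the class, the field names, the site names in the usage note, and the rule that the two sites must differ are all assumptions made for illustration. Only the allowed values are taken from the table.

```python
from dataclasses import dataclass
from datetime import timedelta

# Allowed values come from the parameter table; everything else is hypothetical.
FAILOVER_TRIGGERS = {"Manual", "Automatic"}
CONSISTENCY_MODES = {"Crash-consistent", "Application-consistent"}
REPLICATION_MODES = {"Asynchronous", "Synchronous"}


@dataclass(frozen=True)
class PlanParameters:
    """Illustrative container for the fields in the parameter table."""

    plan_name: str
    primary_site: str
    dr_site: str
    rpo_target: timedelta
    rto_target: timedelta
    failover_trigger: str = "Manual"
    consistency_mode: str = "Crash-consistent"
    replication_mode: str = "Asynchronous"

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the draft looks sane."""
        problems = []
        if not self.plan_name.strip():
            problems.append("plan name is empty")
        # Assumption: a plan cannot replicate a site to itself.
        if self.primary_site == self.dr_site:
            problems.append("primary and DR site must differ")
        if self.rpo_target <= timedelta(0):
            problems.append("RPO target must be positive")
        if self.rto_target <= timedelta(0):
            problems.append("RTO target must be positive")
        if self.failover_trigger not in FAILOVER_TRIGGERS:
            problems.append(f"unknown failover trigger: {self.failover_trigger}")
        if self.consistency_mode not in CONSISTENCY_MODES:
            problems.append(f"unknown consistency mode: {self.consistency_mode}")
        if self.replication_mode not in REPLICATION_MODES:
            problems.append(f"unknown replication mode: {self.replication_mode}")
        return problems
```

For example, `PlanParameters("prod-database-dr", "site-a", "site-b", timedelta(minutes=5), timedelta(minutes=30)).validate()` returns an empty list, while a draft with identical sites or a zero RPO returns one message per problem.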

Add resource groups

Organize protected resources into ordered recovery groups. Resources within a group recover in parallel; groups recover sequentially.

Group	Resources	Recovery Order
Group 1	Database instances	1 — first to recover
Group 2	Application servers	2 — start after databases healthy
Group 3	Load balancers / frontends	3 — start after app tier healthy

Model recovery groups on the actual application dependency chain. Starting an application server before its database is ready causes service errors and may require manual intervention during a real failover.
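
The ordering rule described above (parallel within a group, sequential between groups) can be sketched as follows. This is an illustration of the scheduling idea only, not how XDR implements it: `recover_in_order`, the `recover` callback, and the `is_healthy` gate are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def recover_in_order(
    groups: Sequence[Sequence[str]],
    recover: Callable[[str], None],
    is_healthy: Callable[[Sequence[str]], bool],
) -> list[str]:
    """Recover groups one after another; resources inside a group run in parallel.

    Returns the resources belonging to groups that passed their health gate.
    Stops at the first unhealthy group so dependent tiers are never started.
    """
    healthy: list[str] = []
    for group in groups:
        if not group:
            continue  # nothing to recover in an empty group
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            # list() waits for every resource and re-raises any exception.
            list(pool.map(recover, group))
        if not is_healthy(group):
            break
        healthy.extend(group)
    return healthy
```

With groups `[["db-1", "db-2"], ["app-1"], ["lb-1"]]`, the two database instances are started together, and `app-1` is started only after the database group reports healthy. If the application group fails its gate, `lb-1` is never started.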

Configure automation hooks

Add pre/post scripts to each resource group:

Hook Type	Trigger	Example Use
Pre-Failover	Before group starts recovering	Notify on-call; update DNS TTL
Post-Recover	After group is running	Run health check; update service registry
Pre-Failback	Before reversing replication	Drain connections from DR instances
Post-Failback	After primary site is restored	Re-enable scheduled jobs
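
To make the hook timing concrete, the sketch below models hooks as callbacks attached to a resource group and fired around the recovery step. It is a simplified model under stated assumptions: `HookRegistry`, `recover_group`, and the lowercase hook identifiers are hypothetical and do not describe XDR's script interface.

```python
from collections import defaultdict
from typing import Callable

# Hook types from the table, written as lowercase identifiers (an assumption).
HOOK_TYPES = ("pre-failover", "post-recover", "pre-failback", "post-failback")


class HookRegistry:
    """Stores callbacks per (group, hook type) and runs them in registration order."""

    def __init__(self) -> None:
        self._hooks = defaultdict(list)

    def register(self, group: str, hook_type: str, fn: Callable[[str], None]) -> None:
        if hook_type not in HOOK_TYPES:
            raise ValueError(f"unknown hook type: {hook_type}")
        self._hooks[(group, hook_type)].append(fn)

    def run(self, group: str, hook_type: str) -> int:
        """Run every hook registered for the group; return how many ran."""
        hooks = self._hooks.get((group, hook_type), [])
        for fn in hooks:
            fn(group)
        return len(hooks)


def recover_group(group: str, registry: HookRegistry, start: Callable[[str], None]) -> None:
    """Fire pre-failover hooks, start the group, then fire post-recover hooks."""
    registry.run(group, "pre-failover")
    start(group)
    registry.run(group, "post-recover")
```

A pre-failover hook registered for the database group (for example, one that notifies on-call) runs before any database instance starts, and a post-recover hook runs once the group is up.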

Set health check criteria

Define what “recovered” means for each resource group:

HTTP health check — URL and expected response code
TCP port check — host and port number
Script — custom validation command (exit 0 = healthy)

A recovery group advances to the next group only when all health checks in the current group pass. This prevents cascading failures where dependent services start before their dependencies are ready.
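
The three check types and the "all checks must pass" gate can be approximated with the Python standard library. This is a sketch of the idea rather than XDR's health check engine: the function names and the timeout defaults are assumptions.

```python
import socket
import subprocess
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_check(url: str, expected_status: int = 200, timeout: float = 5.0) -> bool:
    """HTTP health check: the URL must answer with the expected status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == expected_status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still available.
        return err.code == expected_status
    except (urllib.error.URLError, OSError):
        return False


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """TCP port check: a connection to host:port must succeed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def script_check(command: list[str], timeout: float = 30.0) -> bool:
    """Script check: the command must exit with status 0."""
    try:
        return subprocess.run(command, timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def group_is_healthy(checks: Iterable[Callable[[], bool]]) -> bool:
    """Gate for advancing to the next group: every check must pass."""
    return all(check() for check in checks)
```

A group's checks can be bundled as zero-argument callables, for example `group_is_healthy([lambda: tcp_check("10.0.0.5", 5432), lambda: script_check(["/opt/dr/verify.sh"])])`, where the address and script path are placeholders.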

Activate the plan

Click Activate. XDR begins replicating all protected resources to the DR site. Initial sync time depends on data volume.

Once the plan is activated, its status shows ACTIVE, and initial sync progress is visible in the replication dashboard.

Managing Existing Plans

Navigate to Disaster Recovery → Recovery Plans to see all plans with their current status and replication lag.

Available actions per plan:

Edit — update RPO/RTO targets, add/remove resources, modify health checks
Deactivate — pause replication without deleting the plan
Delete — permanently remove the plan (stops replication)
Failover — initiate failover (see Failover)
Test Failover — run an isolated DR test without cutting over production traffic

Consistency Modes

Mode	How It Works	Recovery Behavior	Overhead
Crash-consistent	Replicates data as written — like a power failure at the recovery point	May require fsck on recovery; databases may need recovery	Minimal
Application-consistent	Coordinates with the XAVS Guest Agent to quiesce writes before snapshot (includes VSS provider for Windows)	Application-clean recovery point; no database recovery needed	XAVS Guest Agent round-trip per snapshot interval

Use application-consistent mode for databases and transactional workloads. Crash-consistent mode is suitable for stateless compute instances where data integrity depends on the application rather than the storage layer.

Recovery Point Retention

XDR retains a configurable number of recovery points, so an earlier point in time can be chosen as the restore target during failover. Configure retention settings from Disaster Recovery → Recovery Plans → [Plan] → Retention:

Retention Setting	Behavior
Count	Number of recovery points to retain (older points are pruned)
Interval	Minimum time between recovery points
Maximum age	Oldest age a recovery point may reach before it is pruned

Increasing recovery point retention consumes additional storage on the DR site. Each recovery point is an incremental snapshot — for high-change workloads, deep retention can accumulate significant storage overhead.
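
The interaction between the Count and Maximum age settings can be illustrated with a small pruning function. How XDR combines the two settings is not specified here, so the sketch assumes both limits apply at once (a point is kept only if it is within the age limit and among the newest `max_count` points). The function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def prune_recovery_points(
    points: list[datetime],
    now: datetime,
    max_count: Optional[int] = None,
    max_age: Optional[timedelta] = None,
) -> list[datetime]:
    """Return the recovery points to keep, newest first.

    Assumption: both limits are enforced together; None disables a limit.
    """
    kept = sorted(points, reverse=True)
    if max_age is not None:
        # Drop anything older than the age limit.
        kept = [p for p in kept if now - p <= max_age]
    if max_count is not None:
        # Keep only the newest max_count points.
        kept = kept[:max_count]
    return kept
```

With ten recovery points taken five minutes apart, a count of 4 combined with a 30-minute maximum age keeps the four newest points. With only a 10-minute maximum age, the three points taken within the last ten minutes are kept.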

Next Steps

DR Automation

Configure runbook scripts and automatic failover triggers

Monitoring

Monitor plan replication health and RPO adherence

Compliance

Generate RPO/RTO compliance reports from plan history

XDR User Guide — Protection Plans

User-facing protection plan management
