Overview
Failover switches protected workloads from the primary site to the DR site. Initiate failover when a primary site failure is confirmed and recovery at the primary site is not possible within the RTO window.
Prerequisites
- A protection plan in ACTIVE replication status
- Confirmation that the primary site is unavailable, cross-referenced with XIMP monitoring
- DR site confirmed healthy (navigate to Disaster Recovery → Sites)
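The prerequisites above can be treated as a pre-flight gate that blocks failover until every condition holds. This is a minimal Python sketch, assuming you have already gathered the three values from the dashboard and XIMP monitoring; `PreflightState` and `preflight_problems` are hypothetical names, not part of the product or its CLI.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreflightState:
    """Hypothetical snapshot of what the operator has checked before failover."""
    plan_replication_status: str    # e.g. "ACTIVE", as shown for the protection plan
    primary_outage_confirmed: bool  # dashboard and XIMP monitoring both agree
    dr_site_healthy: bool           # from Disaster Recovery → Sites


def preflight_problems(state: PreflightState) -> List[str]:
    """Return every unmet prerequisite; an empty list means failover may proceed."""
    problems: List[str] = []
    if state.plan_replication_status != "ACTIVE":
        problems.append(
            f"protection plan replication status is {state.plan_replication_status}, not ACTIVE"
        )
    if not state.primary_outage_confirmed:
        problems.append("primary site outage is not confirmed (cross-check XIMP monitoring)")
    if not state.dr_site_healthy:
        problems.append("DR site is not reported healthy")
    return problems


if __name__ == "__main__":
    state = PreflightState("ACTIVE", primary_outage_confirmed=True, dr_site_healthy=False)
    for problem in preflight_problems(state):
        print("blocked:", problem)
```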
Failover Procedure
Confirm primary site status
Navigate to Project → Disaster Recovery → Sites and verify the primary site
health indicator shows Unreachable or Failed. Cross-reference with the
XIMP monitoring portal for independent confirmation.
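The decision rule from the Overview (a confirmed failure, and no primary recovery within the RTO window) can be sketched as follows. It assumes the dashboard health string and an XIMP outage flag are already available, and that the RTO clock starts when the outage began; `should_fail_over` and its parameters are illustrative, not a product API.

```python
from datetime import timedelta
from typing import Optional

# Dashboard health values that the procedure treats as a primary site failure.
FAILED_STATES = {"Unreachable", "Failed"}


def should_fail_over(
    dashboard_health: str,
    ximp_confirms_outage: bool,
    outage_elapsed: timedelta,
    estimated_primary_recovery: Optional[timedelta],
    rto: timedelta,
) -> bool:
    """Hypothetical rule: fail over only if both sources confirm the outage
    and the primary cannot be back within the RTO window.

    Assumes the RTO is measured from the start of the outage, so the primary
    misses it when elapsed time plus remaining recovery time exceeds the RTO.
    """
    confirmed = dashboard_health in FAILED_STATES and ximp_confirms_outage
    if not confirmed:
        return False
    if estimated_primary_recovery is None:
        # No credible estimate: treat the primary as unable to meet the RTO.
        return True
    return outage_elapsed + estimated_primary_recovery > rto
```

Treating an unknown recovery estimate as a missed RTO is a conservative choice; adjust it to match your organization's escalation policy.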
Initiate failover
Navigate to Project → Disaster Recovery → Protection Plans, select the
affected plan, and click Failover. In the failover dialog, set the options below, then confirm.
| Option | Description |
|---|---|
| Latest Recovery Point | Use the most recent replicated snapshot |
| Specific Recovery Point | Select a point-in-time snapshot from the recovery point list |
| Test Mode | Bring up workloads in isolation without cutting over production traffic |
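To illustrate the difference between the two recovery point options, the hypothetical helper below picks the newest snapshot for Latest Recovery Point, or the newest snapshot at or before a chosen time for Specific Recovery Point (for example, the last snapshot taken before a known data corruption). It models recovery points as plain timestamps; the product's recovery point list may carry more metadata.

```python
from datetime import datetime
from typing import Optional, Sequence


def select_recovery_point(
    points: Sequence[datetime], target: Optional[datetime] = None
) -> datetime:
    """Pick a recovery point from the available snapshot timestamps.

    target is None  -> like "Latest Recovery Point": the newest snapshot.
    target is given -> like "Specific Recovery Point": the newest snapshot
                       taken at or before target.
    """
    if not points:
        raise ValueError("no recovery points available")
    if target is None:
        return max(points)
    candidates = [p for p in points if p <= target]
    if not candidates:
        raise ValueError(f"no recovery point at or before {target.isoformat()}")
    return max(candidates)
```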
Selecting Latest Recovery Point uses data from the last successful
replication cycle. Any writes to the primary site since that cycle will be
lost permanently. Review the current replication lag before confirming.
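As a rough estimate of what will be lost, the exposure is the interval between the last successful replication cycle and the moment the primary site stopped accepting writes. The sketch below assumes you know both timestamps (for example, from the replication status and from monitoring); `data_loss_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def data_loss_window(last_replication: datetime, primary_failure: datetime) -> timedelta:
    """Interval of primary-site writes that never reached the DR site.

    Writes made after the last successful replication cycle are not in the
    latest recovery point, so failing over to it discards them. Returns zero
    if replication completed at or after the failure time.
    """
    return max(primary_failure - last_replication, timedelta(0))


if __name__ == "__main__":
    window = data_loss_window(datetime(2024, 5, 1, 9, 55), datetime(2024, 5, 1, 10, 7))
    print(f"writes from the last {window} before the failure will be lost")
```

For a last successful replication at 09:55 and a primary failure at 10:07, the function returns a 12-minute window.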
Monitor recovery progress
The DR Runbook executes automatically in the configured priority order. Track
progress in Disaster Recovery → Failover Status, where each resource shows one of the following statuses:
| Status | Meaning |
|---|---|
| Pending | Waiting for dependencies to recover first |
| Recovering | Instance starting on DR site |
| Validated | Recovery script confirmed service is available |
| Failed | Recovery step encountered an error; review the event log |
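The runbook's internal scheduler is not documented here, but the Pending status implies dependency-aware ordering. The following sketch shows one way such ordering could work, using Kahn's topological sort with a priority heap: a resource becomes eligible only after all of its dependencies have recovered, and among eligible resources the lowest priority number goes first. The integer priority scale, the dependency map, and `recovery_order` are assumptions for illustration.

```python
import heapq
from typing import Dict, List, Set, Tuple


def recovery_order(priority: Dict[str, int], depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return a start order that respects dependencies, then priority.

    A resource is eligible only after all of its dependencies are recovered
    (until then it would show as Pending). Among eligible resources, the
    lowest priority number starts first; ties are broken by name.
    """
    # Unrecovered dependencies per resource, and the reverse edges.
    waiting: Dict[str, Set[str]] = {r: set(depends_on.get(r, set())) for r in priority}
    dependents: Dict[str, List[str]] = {r: [] for r in priority}
    for resource, deps in waiting.items():
        for dep in deps:
            if dep not in priority:
                raise ValueError(f"{resource} depends on unknown resource {dep}")
            dependents[dep].append(resource)

    ready: List[Tuple[int, str]] = [(priority[r], r) for r, deps in waiting.items() if not deps]
    heapq.heapify(ready)
    order: List[str] = []
    while ready:
        _, resource = heapq.heappop(ready)
        order.append(resource)  # a real runbook would recover and validate it here
        for child in dependents[resource]:
            waiting[child].discard(resource)
            if not waiting[child]:
                heapq.heappush(ready, (priority[child], child))

    if len(order) != len(priority):
        stuck = sorted(r for r in priority if r not in order)
        raise ValueError(f"dependency cycle; never eligible: {stuck}")
    return order
```

With `db` and `cache` at priorities 1 and 2, `app` depending on both, and `web` depending on `app`, the order is `db`, `cache`, `app`, `web`.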
Post-Failover Checklist
After failover completes, perform these steps:
Validate application services
Run application-level health checks against the DR site endpoints. Verify
databases are consistent, application tiers are connected, and external services
can reach the DR site.
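Application-level checks are specific to each service, but a simple first pass is to probe each DR endpoint over HTTP and list the ones that do not answer with a 2xx status. This Python sketch uses only the standard library; the endpoint URLs are placeholders, and a passing HTTP check does not prove database consistency or tier connectivity, which still need their own checks.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, List


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if the endpoint answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError (4xx/5xx), timeouts, and refused connections are
        # OSError subclasses; ValueError covers malformed URLs; HTTPException
        # covers malformed responses.
        return False


def failing_endpoints(
    urls: Iterable[str], probe: Callable[[str], bool] = http_ok
) -> List[str]:
    """Return the endpoints whose probe failed, in input order."""
    return [url for url in urls if not probe(url)]


if __name__ == "__main__":
    # Placeholder DR endpoints; replace with your application's health URLs.
    dr_endpoints = [
        "https://app.dr.example.com/healthz",
        "https://api.dr.example.com/healthz",
    ]
    failed = failing_endpoints(dr_endpoints)
    print("all endpoints healthy" if not failed else f"failing: {failed}")
```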
Update DNS and load balancers
Route production traffic to DR site IP addresses. Update:
- External DNS A/CNAME records
- Load balancer pools and health checks
- Any hardcoded IP references in application configuration
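Two parts of this step lend themselves to a quick script: confirming that a public hostname now resolves only to DR addresses, and finding primary-site IP literals left in configuration files. The sketch below assumes the primary site uses known CIDR ranges; the ranges, hostnames, and addresses are placeholders, and the function names are hypothetical. Resolver caching and record TTLs mean an updated record may take time to appear.

```python
import ipaddress
import re
import socket
from typing import Callable, Iterable, List, Set, Tuple

# Candidate IPv4 literals; each match is validated with ipaddress below.
IPV4_LITERAL = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_primary_ips(config_text: str, primary_cidrs: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, ip) for each IPv4 literal inside a primary-site range."""
    networks = [ipaddress.ip_network(cidr) for cidr in primary_cidrs]
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        for literal in IPV4_LITERAL.findall(line):
            try:
                addr = ipaddress.ip_address(literal)
            except ValueError:
                continue  # e.g. "999.1.1.1" matches the pattern but is not an address
            if any(addr in net for net in networks):
                hits.append((lineno, literal))
    return hits


def system_resolve(hostname: str) -> Set[str]:
    """All addresses the local resolver returns for hostname."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def dns_points_to_dr(
    hostname: str,
    dr_addresses: Set[str],
    resolve: Callable[[str], Set[str]] = system_resolve,
) -> bool:
    """True if hostname resolves, and only to DR site addresses."""
    addresses = resolve(hostname)
    return bool(addresses) and addresses <= dr_addresses
```

For example, `find_primary_ips("db_host = 10.0.1.15\n", ["10.0.1.0/24"])` returns `[(1, "10.0.1.15")]`.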
Notify stakeholders
Communicate the failover event and DR site endpoints to:
- Operations and on-call teams
- Business stakeholders and affected service owners
- Partners or customers if external connectivity has changed
Begin planning failback
Once the primary site issue is resolved, plan the failback operation. See
Failback for the full procedure.
Next Steps
- Failback: return workloads to the primary site after it has been restored
- Protection Plans: review and update protection plans after the failover event
- Troubleshooting: diagnose failover stuck states and recovery script failures
- XDR Admin - DR Automation: configure automatic failover triggers to reduce response time (administrator)