Overview
Failback returns workloads to the primary site after it has been restored following a failover event. Before initiating failback, confirm that the primary site is fully operational and that any data created on the DR site during the failover period has been synchronized back.
Failback reverses the replication direction: data flows from the DR site back to
the primary site. The time required depends on the amount of changed data accumulated
during the failover period. Allow replication to fully synchronize before cutting over.
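For planning purposes, the dependence of sync time on changed data can be approximated with simple arithmetic. The sketch below is a rough estimate only. The function name and the efficiency factor (the fraction of nominal link bandwidth that replication actually achieves) are assumptions for illustration, not values reported by XDR.

```python
def estimate_sync_seconds(changed_bytes: int,
                          link_bits_per_second: float,
                          efficiency: float = 0.7) -> float:
    """Rough time to replicate changed_bytes over a link.

    efficiency is an ASSUMED fraction of nominal bandwidth that is usable
    in practice; measure your own link rather than trusting the default.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link_bits_per_second must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return changed_bytes * 8 / (link_bits_per_second * efficiency)


# Example: 500 GB of changed data over a 1 Gbit/s link at 70% efficiency.
seconds = estimate_sync_seconds(500 * 10**9, 1 * 10**9)
print(f"{seconds / 3600:.1f} hours")  # prints "1.6 hours"
```

An estimate like this helps decide how far ahead of the maintenance window reverse replication should start. The actual duration also depends on factors the formula ignores, such as writes that continue on the DR site during the sync.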
Prerequisites
- Primary site confirmed healthy — all services operational, storage accessible
- Network connectivity between primary and DR sites restored
- No production traffic changes are needed until failback is complete
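The connectivity prerequisite can be spot-checked from a host on the DR site before opening the dashboard. The following is a minimal sketch using only the Python standard library. The hostnames and ports are placeholders, and a successful TCP connection shows only that a port is reachable, not that the service behind it is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(endpoints, probe=can_connect):
    """Return the (host, port) pairs that failed the probe."""
    return [(host, port) for host, port in endpoints if not probe(host, port)]


if __name__ == "__main__":
    # Placeholder endpoints; substitute the primary-site addresses you rely on.
    checks = [("primary-storage.example.internal", 443),
              ("primary-api.example.internal", 443)]
    failed = preflight(checks)
    print("all reachable" if not failed else f"unreachable: {failed}")
```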
Failback Procedure
Verify primary site is available
Navigate to Disaster Recovery → Sites and confirm the primary site status
has returned to Healthy. If available, run a connectivity test from the DR site
by clicking Test Connectivity on the site entry.
Reverse replication
Select the protection plan and click Reverse Replication. XDR syncs
changed data from the DR site back to the primary site.
Monitor sync progress in the plan status panel. The replication_lag field
shows how much data remains to be transferred.
Schedule the failback window
Coordinate with application owners and stakeholders to schedule a maintenance
window for the actual failback cutover. During the cutover:
- Application connections to the DR site are briefly interrupted
- Instances stop on the DR site and restart on the primary site
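Because the cutover should not begin until reverse replication has caught up, it can help to automate the wait on replication_lag rather than watching the status panel. The sketch below shows a generic polling loop. How the replication_lag value is read from XDR is not covered here, so get_lag is a hypothetical caller-supplied function, and the lag is assumed to be a number that reaches the threshold when sync is complete.

```python
import time
from typing import Callable


def wait_for_sync(get_lag: Callable[[], float],
                  threshold: float = 0.0,
                  interval_s: float = 30.0,
                  timeout_s: float = 6 * 3600,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_lag() until it is <= threshold.

    Returns True once the lag reaches the threshold, or False if the
    next poll would fall after the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_lag() <= threshold:
            return True
        if time.monotonic() + interval_s > deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated readings stand in for a real replication_lag source.
    readings = iter([300.0, 120.0, 0.0])
    synced = wait_for_sync(lambda: next(readings), interval_s=0.0)
    print("sync complete" if synced else "timed out")
```

Injecting get_lag and sleep keeps the loop independent of any particular API and makes it easy to test without waiting in real time.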
Execute failback
Once sync is complete and the maintenance window begins, click Failback.
The runbook executes in reverse priority order:
- Services stop on the DR site
- Final delta sync to primary site
- Instances start on the primary site
- Health checks validate service availability
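The ordering of these phases matters: each one depends on the previous one having succeeded. As an illustration only, the sequence can be modeled as an ordered list of steps that halts at the first failure. This is a simplified sketch, not XDR's runbook engine. The phase actions are placeholders, and stopping on failure is an assumption about sensible behavior rather than documented XDR behavior.

```python
from typing import Callable, List, Tuple

Phase = Tuple[str, Callable[[], bool]]


def run_phases(phases: List[Phase]) -> List[str]:
    """Run phases in order and stop at the first one that returns False.

    Returns the names of the phases that completed successfully.
    """
    completed = []
    for name, action in phases:
        if not action():
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder actions; each would wrap a real operation in practice.
    phases = [
        ("stop services on DR site", lambda: True),
        ("final delta sync to primary", lambda: True),
        ("start instances on primary", lambda: True),
        ("run health checks", lambda: True),
    ]
    print(run_phases(phases))
```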
Post-Failback Checklist
After failback completes, restore normal operations:
Update DNS and load balancers
Revert DNS records and load balancer configurations back to primary site IP
addresses. Verify traffic is flowing to the primary site.
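One way to confirm that DNS has been reverted is to resolve each service hostname and compare the result against the known primary site addresses. The sketch below uses the standard library resolver. The hostnames and IP addresses are placeholders, and results reflect whatever the local resolver returns, which may be a cached answer until the record's TTL expires.

```python
import socket
from typing import Callable, Dict, Iterable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def stale_records(hostnames: Iterable[str],
                  primary_ips: Iterable[str],
                  resolve: Callable[[str], Set[str]] = resolve_ipv4
                  ) -> Dict[str, Set[str]]:
    """Return hostnames that resolve to any address outside primary_ips."""
    primary = set(primary_ips)
    stale = {}
    for name in hostnames:
        ips = resolve(name)
        if not ips or not ips <= primary:
            stale[name] = ips
    return stale


if __name__ == "__main__":
    # Placeholder names and addresses; replace with your own records.
    print(stale_records(["app.example.com"], {"203.0.113.10"}))
```

An empty result means every hostname resolves only to primary site addresses. It does not by itself prove that load balancers are routing traffic there, so a traffic check is still needed.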
Validate application services
Run application-level health checks against the primary site endpoints. Confirm
data integrity and service connectivity.
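Application-level checks can be scripted so they run the same way after every failback. The following is a minimal sketch that treats any 2xx response as healthy. The URLs are placeholders, and it assumes each application exposes an HTTP health endpoint. It does not verify data integrity, which needs application-specific checks.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to url yields a 2xx status within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError, so it lands here too.
        return False


def failing_endpoints(urls: Iterable[str],
                      check: Callable[[str], bool] = is_healthy) -> List[str]:
    """Return the URLs that did not pass the health check."""
    return [url for url in urls if not check(url)]


if __name__ == "__main__":
    # Placeholder endpoints; substitute your primary site health URLs.
    endpoints = ["https://app.example.com/healthz",
                 "https://api.example.com/healthz"]
    failed = failing_endpoints(endpoints)
    print("all healthy" if not failed else f"failing: {failed}")
```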
Verify DR protection is active
Confirm the protection plan is replicating from the primary site back to the DR site.
The plan should return to normal ACTIVE status with lag within the RPO target.
Next Steps
DR Testing
Run quarterly DR tests to keep failback procedures current and validated
Protection Plans
Review and update protection plans based on incident learnings
Troubleshooting
Diagnose failback synchronization issues
XDR Admin — Compliance
Generate post-incident RPO/RTO compliance reports (administrator)