Overview

Failback returns workloads to the primary site after it has been restored following a failover event. Before initiating failback, confirm the primary site is fully operational and any data created on the DR site during the failover period has been synchronized back.
Failback reverses the replication direction — data flows from the DR site back to the primary site. The time required depends on the amount of changed data accumulated during the failover period. Allow replication to fully synchronize before cutting over.
Prerequisites
  • Primary site confirmed healthy — all services operational, storage accessible
  • Network connectivity between primary and DR sites restored
  • Change freeze in place: no production traffic changes until failback is complete

Failback Procedure

Verify primary site is available

Navigate to Disaster Recovery → Sites and confirm the primary site status has returned to Healthy. If available, run a connectivity test from the DR site by clicking Test Connectivity on the site entry.
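
If you script this check instead of using the console, a polling loop like the sketch below works. The REST endpoint, response fields, and site ID are illustrative assumptions, not documented XDR API surface.

```python
import time

import requests

XDR_API = "https://xdr.example.com/api/v1"     # hypothetical API base URL
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder credentials
PRIMARY_SITE_ID = "site-primary"               # assumed site identifier

def wait_for_healthy(site_id: str, timeout_s: int = 600) -> bool:
    """Poll the assumed site-status endpoint until it reports Healthy or we time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = requests.get(f"{XDR_API}/sites/{site_id}", headers=HEADERS, timeout=10)
        resp.raise_for_status()
        status = resp.json().get("status")
        print(f"{site_id}: {status}")
        if status == "Healthy":
            return True
        time.sleep(30)  # re-check every 30 seconds
    return False

if __name__ == "__main__":
    if not wait_for_healthy(PRIMARY_SITE_ID):
        raise SystemExit("Primary site did not return to Healthy in time")
```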

Reverse replication

Select the protection plan and click Reverse Replication. XDR syncs changed data from the DR site back to the primary site. Monitor sync progress in the plan status panel; the replication_lag field shows how much data remains to be transferred.
Allow replication to fully synchronize before initiating failback. The sync duration depends on how much data changed during the failover period. For active production workloads, this may take hours.
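
As an alternative to watching the status panel, the sketch below polls replication_lag until it reaches zero before proceeding. The endpoint path and the field's unit (bytes remaining) are assumptions; only the field name comes from this page.

```python
import time

import requests

XDR_API = "https://xdr.example.com/api/v1"     # hypothetical API base URL
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder credentials
PLAN_ID = "plan-001"                           # assumed protection plan ID

def wait_for_full_sync(plan_id: str, poll_s: int = 60) -> None:
    """Block until replication_lag (assumed: bytes left to transfer) reaches zero."""
    while True:
        plan = requests.get(f"{XDR_API}/plans/{plan_id}",
                            headers=HEADERS, timeout=10).json()
        lag = plan["replication_lag"]
        print(f"replication_lag: {lag / 1e9:.2f} GB remaining")
        if lag == 0:
            return
        time.sleep(poll_s)

wait_for_full_sync(PLAN_ID)
```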

Schedule the failback window

Coordinate with application owners and stakeholders to schedule a maintenance window for the actual failback cutover. During the cutover:
  • Application connections to the DR site are briefly interrupted
  • Instances stop on the DR site and restart on the primary site
Typical failback cutover time is 10–30 minutes, depending on the number of instances and the complexity of the recovery runbook.

Execute failback

Once sync is complete and the maintenance window begins, click Failback. The runbook executes in reverse priority order:
  1. Services stop on the DR site
  2. Final delta sync to primary site
  3. Instances start on the primary site
  4. Health checks validate service availability
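
For scripted cutovers, a sketch like the following triggers the failback and follows the runbook steps to completion. The /failback and /jobs endpoints and the job and step field names are assumptions mirroring the console behavior described above.

```python
import time

import requests

XDR_API = "https://xdr.example.com/api/v1"     # hypothetical API base URL
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder credentials
PLAN_ID = "plan-001"                           # assumed protection plan ID

# Trigger failback (the API action corresponding to the Failback button).
job = requests.post(f"{XDR_API}/plans/{PLAN_ID}/failback",
                    headers=HEADERS, timeout=10).json()

# Follow the runbook: stop DR services, final delta sync, start on primary, health checks.
while True:
    status = requests.get(f"{XDR_API}/jobs/{job['id']}",
                          headers=HEADERS, timeout=10).json()
    for step in status["steps"]:
        print(f"{step['name']}: {step['state']}")
    if status["state"] in ("completed", "failed"):
        break
    time.sleep(30)

if status["state"] == "failed":
    raise SystemExit("Failback runbook failed; investigate before retrying")
```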

Verify and re-protect

Confirm workloads are running on the primary site. Navigate to Protection Plans and verify the plan is back in Active replication status, now replicating from the primary site to the DR site.
Success criteria: the plan shows the primary site as the source, with replication lag within the RPO target.
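
The same verification can be scripted. The response fields below (status, source_site, replication_lag) are assumed shapes for illustration.

```python
import requests

XDR_API = "https://xdr.example.com/api/v1"     # hypothetical API base URL
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder credentials
PLAN_ID = "plan-001"                           # assumed protection plan ID
PRIMARY_SITE_ID = "site-primary"               # assumed site identifier

plan = requests.get(f"{XDR_API}/plans/{PLAN_ID}", headers=HEADERS, timeout=10).json()

# Assumed response fields: status, source_site, replication_lag.
assert plan["status"] == "ACTIVE", f"unexpected plan status: {plan['status']}"
assert plan["source_site"] == PRIMARY_SITE_ID, "replication source is not the primary site"
print(f"Re-protection verified; replication_lag = {plan['replication_lag']}")
# Compare the lag against your RPO target before closing out the failback.
```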

Post-Failback Checklist

After failback completes, restore normal operations:

Update DNS and load balancers

Revert DNS records and load balancer configurations back to primary site IP addresses. Verify traffic is flowing to the primary site.
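
To confirm the DNS revert has propagated from a client's point of view, a resolution check like this works with the standard library; the hostname and virtual IP are placeholders.

```python
import socket

APP_HOSTNAME = "app.example.com"  # placeholder application hostname
PRIMARY_VIP = "10.0.1.25"         # placeholder primary-site virtual IP

# Resolve the hostname and confirm it now points at the primary site.
resolved = {info[4][0] for info in socket.getaddrinfo(APP_HOSTNAME, 443)}
print(f"{APP_HOSTNAME} resolves to: {sorted(resolved)}")
if PRIMARY_VIP not in resolved:
    raise SystemExit("DNS still resolves to the DR site; check record updates and TTLs")
```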

Validate application services

Run application-level health checks against the primary site endpoints. Confirm data integrity and service connectivity.
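
A minimal sketch of such a check, assuming HTTP health endpoints (the URLs are placeholders; substitute your application's real probes):

```python
import requests

# Placeholder endpoints; substitute your application's real health-check URLs.
ENDPOINTS = [
    "https://app.example.com/healthz",
    "https://api.example.com/healthz",
]

failures = []
for url in ENDPOINTS:
    try:
        resp = requests.get(url, timeout=5)
        print(f"{url}: HTTP {resp.status_code}")
        if resp.status_code != 200:
            failures.append(url)
    except requests.RequestException as exc:
        print(f"{url}: ERROR {exc}")
        failures.append(url)

if failures:
    raise SystemExit(f"Health checks failed for: {', '.join(failures)}")
print("All application health checks passed")
```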

Verify DR protection is active

Confirm the protection plan is replicating from the primary site back to the DR site. The plan should return to normal ACTIVE status with lag within RPO target.

Document the incident

Record the failover and failback timeline, data loss (if any), actual RTO achieved, and any issues encountered during the recovery. Update the DR runbook if procedures need to be adjusted.

Next Steps

DR Testing

Run quarterly DR tests to keep failback procedures current and validated

Protection Plans

Review and update protection plans based on incident learnings

Troubleshooting

Diagnose failback synchronization issues

XDR Admin — Compliance

Generate post-incident RPO/RTO compliance reports (administrator)