Overview
This page covers the most common issues encountered when using XDR: replication lag that threatens RPO targets, failover operations stuck on specific resources, and DR test instances that cannot be reached for validation.
Prerequisites
- An active Xloud account with project access and XDR plan access
- For site connectivity and replication configuration issues, contact your administrator. Your administrator can configure this through XDeploy.
Common Issues
Replication lag exceeding RPO target
Cause: Network bandwidth between sites is insufficient for the current change rate, or the source workload is writing data faster than replication can transfer it.
Diagnosis: Navigate to Disaster Recovery → Protection Plans → [Plan] and review the replication lag and throughput metrics displayed in the plan status panel.
Resolution:
- Increase network bandwidth allocation for replication traffic (contact your administrator). Your administrator can configure this through XDeploy.
- Switch to a larger replication window that permits more transfer time
- Review the change rate of protected workloads — peak write periods may cause temporary lag spikes that resolve during quieter periods
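
The relationship between change rate, link bandwidth, and lag can be sketched with simple arithmetic. The following is a minimal illustration, not an XDR tool: the function names are hypothetical, and the input figures are assumed to be read by hand from the plan status panel.

```python
from typing import Optional


def backlog_growth_mb_per_s(change_rate_mb_s: float, link_mbit_s: float) -> float:
    """Net backlog growth in MB/s. A positive value means lag is increasing."""
    link_mb_s = link_mbit_s / 8  # convert megabits/s to megabytes/s
    return change_rate_mb_s - link_mb_s


def seconds_to_drain(backlog_mb: float, change_rate_mb_s: float,
                     link_mbit_s: float) -> Optional[float]:
    """Time to clear an existing backlog, or None if the link cannot keep up."""
    growth = backlog_growth_mb_per_s(change_rate_mb_s, link_mbit_s)
    if growth >= 0:
        return None  # backlog is flat or growing; lag will not resolve on its own
    return backlog_mb / -growth


# Peak period: 20 MB/s of changes over a 100 Mbit/s link (12.5 MB/s).
peak_growth = backlog_growth_mb_per_s(20, 100)
# Quiet period: 2 MB/s of changes, with 6300 MB already queued.
quiet_drain = seconds_to_drain(6300, 2, 100)
```

With these assumed figures, the backlog grows by 7.5 MB/s during the peak and drains in 600 seconds once writes drop to 2 MB/s. If `seconds_to_drain` returns `None` even for off-peak rates, the lag is unlikely to resolve without more bandwidth or a larger replication window.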
Failover stuck on a specific resource
Cause: A dependency is not yet recovered, a pre/post script failed, or the DR site lacks sufficient capacity for the recovering instance.
Diagnosis: Navigate to Disaster Recovery → Failover Status and expand the stuck resource entry. Review the event log for error messages and timestamps.
Common causes and resolutions:
| Cause | Resolution |
|---|---|
| Pre-recovery script returned non-zero exit code | Review script output in the log; fix the script |
| Insufficient quota on DR project | Check with administrator to increase quota |
| Dependency resource not yet recovered | Wait for the dependency to complete; check priority ordering |
| DR site capacity insufficient | Contact administrator to add capacity |
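
When checking priority ordering, it can help to confirm that every dependency is scheduled before the resources that need it. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The resource names, the dependency map, and the convention that a lower priority number recovers earlier are all assumptions for illustration; XDR's actual plan format may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each resource maps to the resources it depends on.
dependencies = {
    "app-server": {"database", "cache"},
    "database": {"storage-volume"},
    "cache": set(),
    "storage-volume": set(),
}


def recovery_order(deps):
    """Return one order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


def priority_violations(deps, priority):
    """List (resource, dependency) pairs where the dependency is not recovered first.

    Assumes a lower priority number means earlier recovery.
    """
    return sorted(
        (resource, dep)
        for resource, needed in deps.items()
        for dep in needed
        if priority[dep] >= priority[resource]
    )


# Hypothetical priorities: the database is scheduled after the app server.
bad_priorities = {"storage-volume": 1, "cache": 1, "app-server": 2, "database": 3}
violations = priority_violations(dependencies, bad_priorities)
```

Here `violations` contains the pair `("app-server", "database")`, which is the kind of ordering that leaves a resource waiting on a dependency that has not started recovering.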
DR test instances not accessible
Cause: The isolated test network has no route to the validation host, or security group rules block the required ports in the test environment.
Resolution:
- Use console access to reach test instances that have no network connectivity: navigate to Disaster Recovery → Test Sessions → Console
- Verify the test security groups match production configuration within the isolation boundary by reviewing the security group assignments in Disaster Recovery → Test Sessions → [Instance] → Security Groups
- Confirm the test network allows communication between test instances by reviewing the network topology in Disaster Recovery → Test Sessions → Network
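
Comparing security group rules by eye is error-prone, so one option is to note the rules from both environments and compare them as sets. This is a minimal sketch under stated assumptions: the `Rule` fields and sample values are hypothetical and are assumed to be transcribed manually from the dashboard, not fetched from any XDR API.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical representation of one security group rule."""
    direction: str  # "ingress" or "egress"
    protocol: str   # e.g. "tcp"
    port: int
    remote: str     # CIDR range or group name


def missing_rules(production, test):
    """Rules present in production but absent from the test environment."""
    return sorted(set(production) - set(test))


production_rules = [
    Rule("ingress", "tcp", 22, "10.0.0.0/24"),
    Rule("ingress", "tcp", 443, "10.0.0.0/24"),
]
test_rules = [
    Rule("ingress", "tcp", 22, "10.0.0.0/24"),
]

gaps = missing_rules(production_rules, test_rules)
```

In this example `gaps` holds the single port 443 ingress rule, which would explain a validation host failing to reach a test instance over HTTPS.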
Failback synchronization not completing
Cause: The reverse replication sync is stalled due to network issues between the DR and primary sites, or a large amount of data was written to the DR site during the failover period.
Diagnosis: Navigate to Disaster Recovery → Protection Plans → [Plan] and review the reverse sync progress and replication lag metrics. Check the replication link statistics in Disaster Recovery → Sites → Replication Links → [Link].
Resolution:
- Verify network connectivity between DR and primary sites
- Check that firewall rules permit replication traffic in both directions
- If sync is making progress but slowly, allow more time — large datasets take proportionally longer
- If throughput is near-zero, check for network path issues or firewall changes that occurred during the failover period
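
To judge whether a slow sync just needs more time, a rough estimate of the remaining duration can be derived from the remaining data and the observed throughput. The helper below is a hypothetical sketch, assuming both values are read from the dashboard and that throughput stays roughly constant; it treats 1 GB as 1024 MB.

```python
from typing import Optional


def estimate_sync_seconds(remaining_gb: float, throughput_mb_s: float) -> Optional[float]:
    """Estimated seconds until reverse sync completes, or None if it is stalled."""
    if throughput_mb_s <= 0:
        return None  # no progress; investigate the network path instead of waiting
    return remaining_gb * 1024 / throughput_mb_s


eta = estimate_sync_seconds(500, 50)      # 500 GB remaining at 50 MB/s
stalled = estimate_sync_seconds(500, 0)   # zero throughput
```

For 500 GB at 50 MB/s the estimate is 10240 seconds, roughly 2.8 hours, and doubling the remaining data doubles the estimate. A `None` result corresponds to the zero-throughput case, where waiting will not help.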
Diagnostics Reference
All diagnostic operations are performed through the XDR Dashboard:

| Issue | Dashboard Location |
|---|---|
| Replication lag | Disaster Recovery → Protection Plans → [Plan] — replication lag panel |
| Failover stuck | Disaster Recovery → Failover Status → [Resource] — event log |
| Site connectivity | Disaster Recovery → Sites → [Site] — Test Connectivity button |
| Test resources | Disaster Recovery → Test Sessions → [Instance] — IP and status |
| Link throughput | Disaster Recovery → Sites → Replication Links → [Link] — throughput statistics |
When to Contact Your Administrator
Contact your DR administrator or support@xloud.tech if any of the following persist. Your administrator can make the required configuration changes through XDeploy.
- Replication lag has exceeded the RPO target for more than 30 minutes
- Failover is stuck and the event log shows an unresolvable error
- Site connectivity tests fail consistently
- Failback synchronization shows zero throughput for more than 10 minutes
Next Steps
XDR Admin — Troubleshooting
Administrator-level DR diagnostics — site registration, replication links
Protection Plans
Review and adjust plan configuration based on troubleshooting findings
DR Testing
Run DR tests after resolving issues to validate recovery still works
Support
Contact Xloud support for issues requiring platform-level investigation