# Troubleshooting

> Diagnose common XDR user-facing issues: replication lag exceeding RPO targets, failover stuck states, and DR test instance access problems.

## Overview

This page covers the most common issues encountered when using XDR: replication lag that threatens RPO targets, failover operations stuck on specific resources, and DR test instances that cannot be reached for validation.

**Prerequisites**

* An active Xloud account with project access and XDR plan access
* For site connectivity and replication configuration issues, contact your administrator. Your administrator can configure this through [XDeploy](/deployment).

***

## Common Issues

### Replication lag exceeds the RPO target

**Cause**: Network bandwidth between sites is insufficient for the current change rate, or the source workload is writing data faster than replication can transfer it.

**Diagnosis**: Navigate to **Disaster Recovery → Protection Plans → \[Plan]** and review the replication lag and throughput metrics displayed in the plan status panel.

**Resolution**:

* Increase network bandwidth allocation for replication traffic (contact your administrator). Your administrator can configure this through [XDeploy](/deployment).
* Switch to a larger replication window that permits more transfer time
* Review the change rate of protected workloads. Peak write periods may cause temporary lag spikes that resolve during quieter periods

If replication lag consistently exceeds the RPO target, data loss beyond the target threshold is possible in a failover scenario. Escalate to your storage administrator immediately; do not wait for an actual disaster event.

### Failover stuck on a resource

**Cause**: A dependency is not yet recovered, a pre/post script failed, or the DR site lacks sufficient capacity for the recovering instance.
**Diagnosis**: Navigate to **Disaster Recovery → Failover Status** and expand the stuck resource entry. Review the event log for error messages and timestamps.

**Common causes and resolutions**:

| Cause | Resolution |
| ----------------------------------------------- | ------------------------------------------------------------ |
| Pre-recovery script returned non-zero exit code | Review script output in the log; fix the script |
| Insufficient quota on DR project | Check with administrator to increase quota |
| Dependency resource not yet recovered | Wait for the dependency to complete; check priority ordering |
| DR site capacity insufficient | Contact administrator to add capacity |

### DR test instances cannot be reached

**Cause**: The isolated test network has no route to the validation host, or security group rules block the required ports in the test environment.

**Resolution**:

1. Use console access to reach test instances without network connectivity: navigate to **Disaster Recovery → Test Sessions → Console**
2. Verify that the test security groups match the production configuration within the isolation boundary by reviewing the security group assignments in **Disaster Recovery → Test Sessions → \[Instance] → Security Groups**
3. Confirm that the test network allows communication between test instances by reviewing the network topology in **Disaster Recovery → Test Sessions → Network**

### Failback synchronization stalled

**Cause**: The reverse replication sync is stalled due to network issues between the DR and primary sites, or a large amount of data was written to the DR site during the failover period.

**Diagnosis**: Navigate to **Disaster Recovery → Protection Plans → \[Plan]** and review the reverse sync progress and replication lag metrics. Check the replication link statistics in **Disaster Recovery → Sites → Replication Links → \[Link]**.
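To judge whether a slow reverse sync is simply large or is actually stalled, it can help to turn the dashboard readings into a rough time estimate. The Python sketch below is an illustration only and is not part of XDR. The function name and its inputs (remaining data in GB, link throughput and ongoing write rate in Mbit/s) are assumptions, and the values would be read by hand from the panels named above.

```python
from typing import Optional


def estimate_sync_hours(remaining_gb: float,
                        throughput_mbps: float,
                        change_rate_mbps: float = 0.0) -> Optional[float]:
    """Roughly estimate the hours left until a reverse sync completes.

    All inputs are hypothetical manual readings from the dashboard;
    XDR may label or expose these figures differently.
    Returns None when throughput does not exceed the ongoing write
    rate, because the sync cannot finish at those rates.
    """
    # Only the bandwidth left over after new writes reduces the backlog.
    effective_mbps = throughput_mbps - change_rate_mbps
    if effective_mbps <= 0:
        return None
    # Decimal units: 1 GB = 8000 megabits.
    remaining_megabits = remaining_gb * 8 * 1000
    seconds = remaining_megabits / effective_mbps
    return seconds / 3600
```

For example, 450 GB remaining over a link sustaining 1000 Mbit/s with no new writes works out to about one hour. A `None` result means throughput is not keeping ahead of writes, so waiting longer is unlikely to help.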
**Resolution**:

* Verify network connectivity between DR and primary sites
* Check that firewall rules permit replication traffic in both directions
* If sync is making progress but slowly, allow more time, since large datasets take proportionally longer
* If throughput is near zero, check for network path issues or firewall changes that occurred during the failover period

***

## Diagnostics Reference

All diagnostic operations are performed through the XDR Dashboard:

| Issue | Dashboard Location |
| ----------------- | ----------------------------------------------------------------------------------- |
| Replication lag | **Disaster Recovery → Protection Plans → \[Plan]**: replication lag panel |
| Failover stuck | **Disaster Recovery → Failover Status → \[Resource]**: event log |
| Site connectivity | **Disaster Recovery → Sites → \[Site]**: Test Connectivity button |
| Test resources | **Disaster Recovery → Test Sessions → \[Instance]**: IP and status |
| Link throughput | **Disaster Recovery → Sites → Replication Links → \[Link]**: throughput statistics |

***

## When to Contact Your Administrator

Contact your DR administrator or [support@xloud.tech](mailto:support@xloud.tech) if any of the following persist. Your administrator can configure site and replication settings through [XDeploy](/deployment).

* Replication lag has exceeded the RPO target for more than 30 minutes
* Failover is stuck and the event log shows an unresolvable error
* Site connectivity tests fail consistently
* Failback synchronization shows zero throughput for more than 10 minutes

***

## Next Steps

* Administrator-level DR diagnostics: site registration, replication links
* Review and adjust plan configuration based on troubleshooting findings
* Run DR tests after resolving issues to validate recovery still works
* Contact Xloud support for issues requiring platform-level investigation