# Failover

> Execute XDR failover to switch protected workloads from the primary site to the DR site after a confirmed primary site failure.

## Overview

Failover switches protected workloads from the primary site to the DR site. Initiate
failover only when a primary site failure is confirmed and the primary site cannot be
restored within the workload's recovery time objective (RTO).

<Warning>
  Failover is a disruptive, high-impact operation. Confirm that the primary site is
  genuinely unavailable before proceeding: an unnecessary failover requires a full
  failback cycle to restore normal operations.
</Warning>

<Note>
  **Prerequisites**

  * A protection plan with replication status `ACTIVE`
  * Confirmation that the primary site is unavailable, cross-referenced with XIMP monitoring
  * A healthy DR site (check **Project → Disaster Recovery → Sites**)
</Note>
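
The prerequisites above and the failover criterion from the overview can be combined into a single go/no-go check. The sketch below is a hypothetical helper, not part of XDR: the `PreflightState` fields are assumptions standing in for values you read by hand from the Dashboard and XIMP, and the RTO comparison assumes you have an estimate of how long the primary site still needs to be restored.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class PreflightState:
    # Hypothetical fields: fill these in manually from the Dashboard and XIMP.
    replication_status: str               # protection plan replication status
    primary_unavailable_confirmed: bool   # confirmed by more than one source
    dr_site_healthy: bool                 # from Project -> Disaster Recovery -> Sites
    outage_elapsed: timedelta             # time since the primary site failed
    primary_restore_estimate: timedelta   # estimated remaining time to restore primary
    rto: timedelta                        # recovery time objective for the workload


def failover_blockers(state: PreflightState) -> list[str]:
    """Return the reasons failover should NOT proceed; an empty list means go."""
    blockers = []
    if state.replication_status != "ACTIVE":
        blockers.append(f"replication status is {state.replication_status}, not ACTIVE")
    if not state.primary_unavailable_confirmed:
        blockers.append("primary site failure is not independently confirmed")
    if not state.dr_site_healthy:
        blockers.append("DR site is not healthy")
    # Fail over only if waiting for the primary site would exceed the RTO.
    if state.outage_elapsed + state.primary_restore_estimate <= state.rto:
        blockers.append("primary site is expected to recover within the RTO")
    return blockers
```

An empty list means every prerequisite holds and the primary site is not expected back within the RTO; each string returned names a condition to resolve before failing over.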

***

## Failover Procedure

<Tabs>
  <Tab title="Dashboard" icon="gauge">
    <Steps titleSize="h3">
      <Step title="Confirm primary site status" icon="activity">
        Navigate to **Project → Disaster Recovery → Sites** and verify the primary site
        health indicator shows **Unreachable** or **Failed**. Cross-reference with the
        XIMP monitoring portal for independent confirmation.

        <Tip>
          Do not rely on a single monitoring source. A network partition may make the
          primary site appear unreachable from the DR site while it is actually still
          operational. Verify from multiple vantage points before proceeding.
        </Tip>
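
        One way to apply this advice is to treat the primary site as failed only when every
        vantage point agrees. The sketch below is a hypothetical helper, not an XDR or XIMP
        feature: it assumes you have collected one reachability result per vantage point
        (for example the DR site, XIMP, and an external probe) and requires a minimum number
        of observers before it reports a confirmed failure.

        ```python
        def primary_failure_confirmed(observations: dict[str, bool], min_observers: int = 3) -> bool:
            """Decide whether the primary site is down across independent vantage points.

            observations maps a vantage point name to True if it could reach the
            primary site and False if it could not.
            """
            if len(observations) < min_observers:
                # Too few independent sources to rule out a local network problem.
                return False
            # If any vantage point still reaches the primary site, suspect a
            # network partition rather than a site failure.
            return not any(observations.values())
        ```

        For example, `{"dr-site": False, "ximp": True, "external": False}` returns `False`,
        because XIMP can still reach the primary site.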
      </Step>

      <Step title="Initiate failover" icon="zap">
        Navigate to **Project → Disaster Recovery → Protection Plans**, select the
        affected plan, and click **Failover**. In the failover dialog, choose from the
        options below, then confirm.

        | Option                      | Description                                                             |
        | --------------------------- | ----------------------------------------------------------------------- |
        | **Latest Recovery Point**   | Use the most recent replicated snapshot                                 |
        | **Specific Recovery Point** | Select a point-in-time snapshot from the recovery point list            |
        | **Test Mode**               | Bring up workloads in isolation without cutting over production traffic |
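
        When you choose **Specific Recovery Point**, the usual goal is the newest snapshot
        taken before a known-bad moment, such as the start of data corruption. The sketch
        below shows that selection as a binary search over sorted snapshot timestamps; the
        timestamp list is hypothetical input copied from the recovery point list, not an
        XDR API.

        ```python
        from bisect import bisect_left
        from datetime import datetime
        from typing import Optional


        def latest_point_before(recovery_points: list[datetime], cutoff: datetime) -> Optional[datetime]:
            """Return the newest recovery point taken strictly before cutoff, or None."""
            points = sorted(recovery_points)
            # bisect_left returns the index of the first point >= cutoff, so every
            # point before that index was taken strictly earlier than cutoff.
            index = bisect_left(points, cutoff)
            return points[index - 1] if index > 0 else None
        ```

        The comparison is strict because a snapshot taken at exactly the cutoff may already
        contain the bad writes.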

        <Danger>
          Selecting **Latest Recovery Point** uses data from the last successful
          replication cycle. Any writes to the primary site since that cycle will be
          lost permanently. Review the current replication lag before confirming.
        </Danger>
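
        The potential data loss from **Latest Recovery Point** is the time between the last
        successful replication cycle and the moment the primary site stopped accepting writes.
        The sketch below computes that window and compares it with a recovery point objective
        (RPO), assuming your plan defines one; both timestamps and the RPO value are inputs
        you supply from the Dashboard and your own incident records.

        ```python
        from datetime import datetime, timedelta, timezone


        def data_loss_window(last_replicated: datetime, primary_failed: datetime) -> timedelta:
            """Return the interval whose writes never reached the DR site."""
            if primary_failed < last_replicated:
                raise ValueError("primary failure time precedes the last replication cycle")
            return primary_failed - last_replicated


        def exceeds_rpo(window: timedelta, rpo: timedelta) -> bool:
            """True if the expected data loss is larger than the recovery point objective."""
            return window > rpo


        # Hypothetical example values.
        last_cycle = datetime(2024, 5, 1, 10, 15, tzinfo=timezone.utc)
        failed_at = datetime(2024, 5, 1, 10, 22, tzinfo=timezone.utc)
        window = data_loss_window(last_cycle, failed_at)
        print(window)                                     # 0:07:00
        print(exceeds_rpo(window, timedelta(minutes=5)))  # True
        ```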
      </Step>

      <Step title="Monitor recovery progress" icon="activity">
        The DR Runbook executes automatically in the configured priority order. Track
        progress in **Disaster Recovery → Failover Status**, where each resource shows one
        of the following statuses:

        | Status         | Meaning                                                  |
        | -------------- | -------------------------------------------------------- |
        | **Pending**    | Waiting for dependencies to recover first                |
        | **Recovering** | Instance is starting on the DR site                      |
        | **Validated**  | Recovery script confirmed the service is available       |
        | **Failed**     | Recovery step encountered an error; review the event log |
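
        **Pending** resources are waiting on the resources they depend on. To reason about the
        order you should expect, the sketch below groups resources into recovery waves with
        Python's `graphlib.TopologicalSorter` and orders each wave by priority. The dependency
        map and priority numbers are hypothetical inputs mirroring your runbook configuration;
        this is a planning model, not XDR's internal scheduler.

        ```python
        from graphlib import TopologicalSorter


        def recovery_waves(depends_on: dict[str, set[str]], priority: dict[str, int]) -> list[list[str]]:
            """Group resources into waves; a resource starts only after its dependencies.

            depends_on maps each resource to the resources it needs first.
            Lower priority numbers recover first within a wave (an assumption).
            """
            sorter = TopologicalSorter(depends_on)
            sorter.prepare()  # raises graphlib.CycleError on circular dependencies
            waves = []
            while sorter.is_active():
                # Everything returned by get_ready() has all dependencies satisfied.
                ready = sorted(sorter.get_ready(), key=lambda r: (priority.get(r, 0), r))
                waves.append(ready)
                sorter.done(*ready)
            return waves
        ```

        For example, with `app` depending on `db`, `web` depending on `app`, and `cache`
        standalone, priorities `db=1` and `cache=2` give the waves
        `[["db", "cache"], ["app"], ["web"]]`. A resource stuck in **Pending** usually
        points to a dependency in an earlier wave that is still recovering or has failed.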
      </Step>

      <Step title="Verify workloads on DR site" icon="circle-check">
        Confirm application-level availability by accessing services through the DR
        site endpoints. Update DNS or load balancer configurations to route traffic
        to the DR site.

        <Check>Protected workloads are running on the DR site and serving traffic.</Check>
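
        A quick way to confirm application-level availability is to request a health URL on
        each DR site endpoint. The sketch below uses only the Python standard library; the
        endpoint URLs are placeholders for your own services, and treating any 2xx response
        as healthy is an assumption you may want to tighten.

        ```python
        import http.client
        import urllib.error
        import urllib.request


        def failing_endpoints(urls: list[str], timeout: float = 5.0) -> dict[str, str]:
            """Return {url: reason} for each endpoint that does not answer with a 2xx status."""
            failures: dict[str, str] = {}
            for url in urls:
                try:
                    with urllib.request.urlopen(url, timeout=timeout) as response:
                        if not 200 <= response.status < 300:
                            failures[url] = f"HTTP {response.status}"
                except urllib.error.HTTPError as exc:
                    # urlopen raises HTTPError for 4xx and 5xx responses.
                    failures[url] = f"HTTP {exc.code}"
                except (OSError, http.client.HTTPException) as exc:
                    # URLError, timeouts, and refused connections are all OSError subclasses.
                    failures[url] = f"{type(exc).__name__}: {exc}"
            return failures
        ```

        Call it with your DR health URLs, for example
        `failing_endpoints(["https://app.dr.example.com/healthz"])`; an empty dict means
        every endpoint returned a 2xx status.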
      </Step>
    </Steps>
  </Tab>

  <Tab title="CLI" icon="terminal">
    <Info>
      XDR disaster recovery operations are available only through the XDR Dashboard;
      there is no CLI for DR operations. Use the **Dashboard** tab above for the
      complete failover procedure.
    </Info>
  </Tab>
</Tabs>

***

## Post-Failover Checklist

After failover completes, perform these steps:

<Steps titleSize="h3">
  <Step title="Validate application services" icon="circle-check">
    Run application-level health checks against the DR site endpoints. Verify that
    databases are consistent, application tiers can reach each other, and external
    services can reach the DR site.
  </Step>
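
    To check that application tiers are connected, you can test TCP reachability for each
    dependency from the host that needs it (for example, from an application server to its
    database port). The sketch below is a hypothetical helper; the hosts and ports you pass
    are placeholders, and a successful TCP connection shows only that the port is open, not
    that the service behind it is healthy.

    ```python
    import socket


    def unreachable_dependencies(
        deps: list[tuple[str, int]], timeout: float = 3.0
    ) -> list[tuple[str, int, str]]:
        """Attempt a TCP connection to each (host, port) pair; return the ones that fail."""
        failures = []
        for host, port in deps:
            try:
                # create_connection resolves host and tries each returned address in turn.
                with socket.create_connection((host, port), timeout=timeout):
                    pass  # the connection opened; close it immediately
            except OSError as exc:
                # Covers refused connections, timeouts, and DNS resolution failures.
                failures.append((host, port, str(exc)))
        return failures
    ```

    Run it on each application server with that server's dependencies, for example
    `[("db.dr.internal", 5432)]` as placeholder input; every entry returned is a dependency
    that host cannot reach.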

  <Step title="Update DNS and load balancers" icon="globe">
    Route production traffic to DR site IP addresses. Update:

    * External DNS A/CNAME records
    * Load balancer pools and health checks
    * Any hardcoded IP references in application configuration
  </Step>
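
    After updating records, confirm that the names your clients use now resolve only to DR
    site addresses. The sketch below uses `socket.getaddrinfo` on the host running it, so
    the results reflect that host's resolver, including cached records whose TTL has not yet
    expired. The host name and DR address set are placeholders; include IPv6 addresses in
    the set if you publish AAAA records.

    ```python
    import socket


    def stale_addresses(hostname: str, dr_addresses: set[str]) -> set[str]:
        """Return addresses hostname resolves to that are not in the DR site set."""
        # type=SOCK_STREAM avoids duplicate entries for UDP and raw sockets;
        # the set below removes any duplicates that remain.
        infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
        resolved = {info[4][0] for info in infos}
        return resolved - dr_addresses
    ```

    An empty set means every resolved address belongs to the DR site; anything returned is a
    record that still points elsewhere, such as the primary site.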

  <Step title="Notify stakeholders" icon="send">
    Communicate the failover event and DR site endpoints to:

    * Operations and on-call teams
    * Business stakeholders and affected service owners
    * Partners or customers if external connectivity has changed
  </Step>

  <Step title="Begin planning failback" icon="rotate-ccw">
    Once the primary site issue is resolved, plan the failback operation. See
    [Failback](/services/disaster-recovery/user-guide/failback) for the full procedure.
  </Step>
</Steps>

***

## Next Steps

<CardGroup cols={2}>
  <Card title="Failback" href="/services/disaster-recovery/user-guide/failback" color="#197560">
    Return workloads to the primary site after it has been restored
  </Card>

  <Card title="Protection Plans" href="/services/disaster-recovery/user-guide/protection-plans" color="#197560">
    Review and update protection plans after the failover event
  </Card>

  <Card title="Troubleshooting" href="/services/disaster-recovery/user-guide/troubleshooting" color="#197560">
    Diagnose failover stuck states and recovery script failures
  </Card>

  <Card title="XDR Admin — DR Automation" href="/services/disaster-recovery/admin-guide/dr-automation" color="#197560">
    Configure automatic failover triggers to reduce response time (administrator)
  </Card>
</CardGroup>
