> ## Documentation Index > Fetch the complete documentation index at: https://docs.xloud.tech/llms.txt > Use this file to discover all available pages before exploring further. # Instance High Availability > Automatically detect and recover failed instances and compute hosts with Xloud Instance HA — zero-touch failover for mission-critical workloads. Protect your workloads from compute host failures with automatic detection and recovery. Xloud Instance HA continuously monitors compute nodes and instances, triggering evacuation and restart workflows the moment a fault is detected — without manual intervention. Product details on xloud.tech ***

Instance High Availability

Understand protection segments, instance protection policies, and how to monitor recovery workflows for your running workloads. Configure failover segments, host and instance monitors, notification drivers, and integrate Instance HA with your compute cluster. Complete command reference for managing failover segments, hosts, and recovery notifications using the openstack CLI. Xloud Compute provides the hypervisor layer that Instance HA monitors and manages during host failover events. ***

Key Capabilities

IPMI and SSH-based monitors detect unreachable hosts in seconds and immediately trigger evacuation of all protected instances. Failed instances are automatically restarted on healthy hosts within the same protection segment, respecting affinity rules. Designate standby compute hosts that remain idle until a failover event occurs — guaranteeing resource availability for recovery. Group hosts and instances into logical fault domains. Each segment has its own recovery policy, monitors, and notification targets. Integrate with IPMI, SSH, and custom notification sources to receive precise fault signals from infrastructure monitoring tools. Every recovery event is logged with timestamps, affected instances, and resolution outcomes — fully queryable via the Dashboard and CLI. ***

How It Works

```mermaid theme={null} graph TD A[Compute Host] -->|Heartbeat monitored| B[Host Monitor] B -->|Fault detected| C[Notification Engine] C -->|Triggers recovery| D[Instance HA Engine] D -->|Evacuates instances| E[Healthy Compute Hosts] D -->|Logs event| F[Recovery Audit Log] E -->|Instances restarted| G[Protected Instances Active] ``` ***

Platform Resilience

**Xloud-Developed** — These resilience capabilities are developed by Xloud and ship with XAVS. Automatic instance restart on host failure. Configurable per-instance priority. Failover segments for per-group recovery policies. Requires XAVS. 9-phase automated recovery playbook. Target recovery time: 7-13 minutes. Sequential service startup with health gates between each phase. Requires XAVS. Three-tier autoheal daemon with dependency-aware restart ordering. Circuit breaker pattern prevents restart loops. Exponential backoff. Requires XAVS. Pre-configured alert rules across 13 groups covering storage, database, message queue, compute, networking, containers, APIs, system resources, disk, memory, security, and capacity. Predictive alerts for capacity forecasting. Requires XAVS. L3 high availability and DHCP high availability with sub-3-second failover. Automatic ARP gratuitous announcements for fast VIP convergence. Requires XAVS. Per-service container upgrades with 2-10 second swap time. Canary deployment (first node only). Image tag rollback mechanism. Previous images cached locally. Requires XAVS. ***

Related Services

The hypervisor layer monitored and managed by Instance HA Rebalances workloads after recovery to restore cluster efficiency Persistent volumes that survive host failover when using shared storage