Skip to main content

Overview

Xloud Instance HA provides automated detection and recovery of failed compute hosts and instances. Administrators configure the failover infrastructure — segments, host monitors, instance monitors, and notification drivers — that power zero-touch recovery for project workloads. This guide covers the full administrative lifecycle, from initial deployment to tuning recovery policies for production.
All operations in this guide require administrator privileges. Changes to segment configuration and monitor settings affect active recovery workflows immediately.

Open XDeploy Configuration

Log in to XDeploy (https://xdeploy.<your-domain>) and navigate to Configuration.

Enable Host HA

Select the Advance Features tab. Toggle Enable Host HA to Yes.This automatically enables the underlying services (enable_masakari and enable_hacluster) with no manual file editing required.

Save and deploy

Click Save Configuration, then navigate to Operations and run a deploy or reconfigure action.
Instance HA services start automatically across all registered compute hosts.

In This Guide

Architecture

Component diagram, service roles, and the detection-to-recovery data flow.

Failover Segments

Create and manage failover segments that define recovery scope and host groupings.

Host Monitors

Deploy and configure host monitors for host-level failure detection.

Instance Monitors

Configure instance-level monitors for guest OS and process failure detection.

Notification Drivers

Configure IPMI, libvirt, and custom notification drivers for failure signaling.

Recovery Methods

Configure and tune recovery methods — rescheduler, reserved host, and auto-evacuate.

Engine Configuration

Configure the recovery engine — timeouts, retry limits, and notification handling.

Security

Harden the Instance HA service account and restrict API access via RBAC.

Troubleshooting

Diagnose monitor failures, recovery timeouts, and notification processing errors.

Architecture Summary

ServiceRole
APIREST API for segment, host, and notification management
EngineProcesses notifications and orchestrates recovery workflows
Host MonitorDetects host failures via IPMI/libvirt probing
Instance MonitorDetects instance failures via XAVS Guest Agent probing

Next Steps

Instance HA User Guide

Enable protection on instances and monitor recovery events from an operator perspective.

Instance HA Overview

Service overview and getting started with Instance HA.