Documentation Index
Fetch the complete documentation index at: https://docs.xloud.tech/llms.txt
Use this file to discover all available pages before exploring further.
Overview
A recovery workflow is the ordered sequence of actions the Instance HA engine takes after a host failure notification is received. The workflow covers instance evacuation, restart on a healthy host, and post-recovery status reporting. The Dashboard provides real-time tracking of each VM evacuation through the Recovery Progress tab and a consolidated VM Moves page.Prerequisites
- Instance HA protection enabled on your instances
- At least one failover segment configured with registered hosts
Recovery Workflow Stages
Recovery Progress — Real-Time Tracking
When a recovery is in progress, the Dashboard provides real-time tracking of each individual VM evacuation through the Recovery Progress tab on the notification detail page.- Dashboard
- CLI
Open the notification detail
Navigate to Instance HA > Notifications. Click a notification UUID
to open the detail page.
View the Recovery Progress tab
Click the Recovery Progress tab. This tab shows:Summary card at the top:
VM Evacuations table below the summary:
VM evacuation status values:
| Field | Description |
|---|---|
| Notification Status | Current status as a colored tag |
| Total VMs | Total number of VMs being evacuated |
| Succeeded | Count of successfully recovered VMs (green) |
| Failed | Count of failed evacuations (red, if any) |
| Progress | Circular progress indicator showing completion |
| Column | Description |
|---|---|
| VM Name | Instance name (falls back to UUID if no name) |
| Source Host | The failed compute host |
| Destination Host | Target recovery host (“Pending” if not yet assigned) |
| Type | Evacuation type (typically evacuation) |
| Status | Evacuation status with icon |
| Start Time | When the evacuation started |
| End Time | When the evacuation completed |
| Message | Error message if the evacuation failed (red text) |
| Status | Color | Icon | Meaning |
|---|---|---|---|
| Pending | Grey | Clock | Evacuation queued, not yet started |
| Running | Blue | Loading spinner | Evacuation in progress |
| Succeeded | Green | Check circle | VM successfully recovered |
| Failed | Red | Close circle | Evacuation failed — manual intervention needed |
VM Moves — Consolidated View
The VM Moves page provides a single view of all VM evacuations across all notifications, making it easy to review recovery history.- Dashboard
- CLI
Review the VM moves list
The page displays all VM evacuations from recent notifications (up to the
last 50 notifications), sorted by start time.
Use the Refresh button to reload the latest data.
| Column | Description |
|---|---|
| VM Name | Instance name (falls back to UUID) |
| Instance ID | VM UUID (copyable, truncated display) |
| Notification | Parent notification UUID (copyable, truncated display) |
| Source Host | The failed compute host |
| Destination Host | Target recovery host, or - if pending |
| Type | Evacuation type (typically evacuation) |
| Status | Colored tag with icon (Succeeded/Failed/Running/Pending) |
| Start Time | When the evacuation started (default sort, descending) |
| End Time | When the evacuation completed, or - |
| Message | Error message if failed (red text) |
The VM Moves page is read-only — it provides a consolidated view for
monitoring and auditing. No actions are available on individual VM moves.
Recovery Methods in Detail
auto — Evacuate to Any Host
auto — Evacuate to Any Host
The
auto method selects the healthiest available host in the segment based on
current vCPU and memory availability. Instances are distributed across multiple
target hosts if no single host has sufficient capacity for all evacuees.Characteristics:- No pre-reserved capacity required
- Recovery succeeds as long as aggregate free capacity in the segment is sufficient
- Most flexible option for mixed workloads
auto_priority — Priority-Based Selection
auto_priority — Priority-Based Selection
Similar to
auto, but uses priority-based host selection. The engine evaluates
hosts based on configured priority attributes and selects the highest-priority
available host for each evacuation.Characteristics:- Allows administrators to influence target host selection
- Still best-effort — no guaranteed standby capacity
- Useful when certain hosts are preferred targets
reserved_host — Dedicated Standby
reserved_host — Dedicated Standby
One or more hosts in the segment are designated as reserved standby nodes. These
hosts remain idle until a failover event occurs, ensuring guaranteed capacity for
recovery.Characteristics:
- Guaranteed recovery capacity regardless of current cluster load
- Reserved hosts do not accept regular instance scheduling
- Higher infrastructure cost (idle nodes consume resources)
rh_priority — Prefer Reserved, Fall Back
rh_priority — Prefer Reserved, Fall Back
The engine attempts recovery to reserved hosts first. If reserved hosts are full,
it falls back to the
auto behaviour and selects any available host in the segment.Characteristics:- Balances guaranteed capacity for high-priority workloads with flexibility
- Works well in mixed segments that contain both critical and standard workloads
- Requires at least one reserved host in the segment
Instance State During Recovery
| Phase | Instance Status | Description |
|---|---|---|
| Normal operation | ACTIVE | Instance running on original host |
| Fault detected | UNKNOWN | Host unreachable; compute service cannot confirm instance state |
| Evacuation in progress | MIGRATING | Instance being moved to target host |
| Restarting | BUILD | Instance starting up on target host |
| Recovery complete | ACTIVE | Instance fully operational on new host |
| Recovery failed | ERROR | Manual intervention required |
Notification Status Reference
Every recovery event creates a notification record. The notificationstatus field
tracks progress through the workflow.
| Status | Color | Meaning |
|---|---|---|
new | Blue | Fault notification received; recovery not yet started |
running | Orange | Recovery workflow in progress |
finished | Green | All instances recovered successfully |
error | Red | Recovery encountered errors |
failed | Red | Recovery failed completely |
ignored | Grey | Notification was de-duplicated or the segment was disabled |
Recovery Time Expectations
Recovery time depends on several factors:| Factor | Typical Impact |
|---|---|
| Host monitor detection timeout | 30-120 seconds to declare host unreachable |
| Instance count on failed host | Each instance adds 30-120 seconds to total recovery time |
| Instance disk size (shared storage) | Minimal — shared storage volumes are reattached, not copied |
| Target host boot overhead | Constant per instance — determined by instance flavor and image |
Next Steps
Monitoring Status
View notifications, hosts, and VM moves in the Dashboard
Troubleshooting
Resolve stuck or failed recovery workflows
Protection Segments
Create segments and manage host registrations
Instance HA Admin Guide
Configure recovery policies, monitors, and engine settings