> ## Documentation Index > Fetch the complete documentation index at: https://docs.xloud.tech/llms.txt > Use this file to discover all available pages before exploring further. # Instance HA Troubleshooting — User Guide > Resolve Instance HA issues — stuck notifications, failed recoveries, and manual evacuation procedures. ## Overview This page covers common Instance HA issues encountered by project users — instances that did not recover, notifications stuck in error or running states, and protection settings that are not visible in the Dashboard. For platform-level issues such as monitor failures or engine misconfiguration, refer to the [Instance HA Admin Troubleshooting](/services/instance-ha/admin-guide/troubleshooting) guide. **Prerequisites** * Project access to the Xloud Dashboard or CLI * Knowledge of the affected instance IDs and the compute host involved *** ## Common Issues **Cause**: Instance HA protection may not be enabled for the instance, the compute host may not be registered in any segment, or the segment may be disabled. **Resolution**: Confirm the segment is enabled and the host is registered: ```bash title="Check segment status" theme={null} openstack segment list ``` ```bash title="List hosts in segment" theme={null} openstack segment host list ``` If the failed host is missing from the segment, contact your administrator to register it. Your administrator can configure this through [XDeploy](/deployment). If the segment is disabled (`enabled: False`), your administrator must re-enable it. After the root cause is resolved, manually evacuate the instance to restore service: ```bash title="Manual evacuation" theme={null} openstack server evacuate --host ``` **Cause**: Automatic recovery failed. Common causes: insufficient capacity on remaining hosts, an instance stuck in `ERROR` state that the engine cannot recover, or a network issue during evacuation. **Resolution**: ```bash title="Show notification detail" theme={null} openstack notification show ``` Review the payload for specific failure information. Then check instance state: ```bash title="Show instance status" theme={null} openstack server show -f value -c status ``` If the instance is in `ERROR` state, attempt a manual reset and evacuation: ```bash title="Reset instance state" theme={null} openstack server set --state active ``` ```bash title="Manually evacuate" theme={null} openstack server evacuate --host ``` Contact your administrator if the instance cannot be recovered through the above steps. Your administrator can configure this through [XDeploy](/deployment). **Cause**: The recovery engine is waiting for the target host to accept the instance, a workflow step has timed out, or the target host is under load. **Resolution**: Check how long the notification has been in `running` state: ```bash title="Show notification timestamps" theme={null} openstack notification show \ -f value -c generated_time -c status ``` If the notification has been `running` for more than 10 minutes, contact your administrator. They can inspect the Instance HA engine logs and reset the workflow if it has genuinely stalled. Your administrator can configure this through [XDeploy](/deployment). **Cause**: No failover segment has been created for your environment, or the segments that exist have not been made accessible to your project. **Resolution**: Contact your Xloud administrator and request that a failover segment be created and that the compute hosts used by your project be registered. Your administrator can configure this through [XDeploy](/deployment). See the [Instance HA Admin Guide — Failover Segments](/services/instance-ha/admin-guide/failover-segments) for the configuration steps. **Cause**: The compute service cannot confirm the instance state because the host is unreachable. This is expected immediately after a host fault is detected. **Resolution**: Wait for the recovery workflow to complete. The instance transitions from `UNKNOWN` → `MIGRATING` → `BUILD` → `ACTIVE` as recovery proceeds. If the instance remains `UNKNOWN` for more than 5 minutes without a recovery notification being created, verify that the failed host is registered in an enabled segment and that the host monitor can reach the Instance HA notification endpoint. **Cause**: The `auto` recovery method placed the instance on any available host rather than a specific preferred host. This is expected behaviour for the `auto` method. **Resolution**: If your workloads require guaranteed placement on specific hosts, ask your administrator to configure a segment with `reserved_host` or `rh_priority` recovery method and designate the preferred host as a reserved standby. *** ## Manual Recovery Procedure If automatic recovery fails, use the following procedure to restore service manually. ```bash title="Find instances on the failed host" theme={null} openstack server list \ --host \ -f table -c ID -c Name -c Status ``` ```bash title="Reset instance state" theme={null} openstack server set --state active ``` ```bash title="Evacuate to a specific host" theme={null} openstack server evacuate --host ``` Omit `--host` to let the scheduler choose any available host in the segment: ```bash title="Evacuate to any available host" theme={null} openstack server evacuate ``` ```bash title="Confirm instance is ACTIVE" theme={null} openstack server show \ -f value -c status -c "OS-EXT-SRV-ATTR:host" ``` Instance shows `ACTIVE` and the host field reflects the new compute node. *** ## Next Steps Track live and historical recovery notifications. Understand recovery methods and workflow stages. Administrator-level troubleshooting for monitors, engine failures, and capacity issues. Manage and recover instances manually using core compute operations.