> ## Documentation Index > Fetch the complete documentation index at: https://docs.xloud.tech/llms.txt > Use this file to discover all available pages before exploring further. # Instance Monitors > Configure Xloud Instance HA instance-level monitors — guest heartbeat detection, notification types, and per-instance fault handling independent of host state. ## Overview Instance monitors detect failures at the guest level — independent of whether the underlying compute host is healthy. When an instance stops responding to heartbeat checks, an instance-level fault notification is generated and the recovery engine attempts to restart the affected instance. This complements host monitoring by handling scenarios such as OS crashes, guest kernel panics, and runaway processes that consume all available resources without taking down the host. **Prerequisites** * Administrator privileges * Instance HA service deployed and running * XAVS Guest Agent or heartbeat capability enabled in the instance image *** ## Instance Monitor Architecture ```mermaid theme={null} graph TD subgraph Compute Host IM[Instance Monitor Daemon
masakari-instancemonitor] IM -->|Poll each instance| G1[Guest 1 — heartbeat OK] IM -->|Poll each instance| G2[Guest 2 — heartbeat FAIL] end G2 -->|Instance fault| NE[Notification Engine] NE -->|COMPUTE_INSTANCE notification| RE[Recovery Engine] RE -->|Restart instance| NOVA[Xloud Compute API] ``` The instance monitor runs on each compute host and monitors all running instances. It operates independently of the host monitor — both can run simultaneously. *** ## Notification Types Instance HA distinguishes between host-level and instance-level faults using the notification `type` field. | Type | Source | Trigger | | ------------------ | ---------------- | --------------------------------------------- | | `COMPUTE_HOST` | Host Monitor | Host becomes unreachable (IPMI / SSH timeout) | | `COMPUTE_INSTANCE` | Instance Monitor | Guest heartbeat stops responding | | `COMPUTE_PROCESS` | Process Monitor | Critical compute process (nova-compute) dies | *** ## View Instance-Level Notifications Navigate to **Instance-HA > Notifications (admin view)**. Filter by **Type: COMPUTE\_INSTANCE** to display only instance-level fault events. Each row shows the affected instance UUID, the source host, and the current recovery status. ```bash title="List instance-level notifications" theme={null} openstack notification list --type COMPUTE_INSTANCE ``` ```bash title="Show instance notification details" theme={null} openstack notification show ``` The detail view includes the `source_host_uuid`, `payload` (with instance UUID), `generated_time`, and `status`. *** ## Configure the Instance Monitor The instance monitor daemon runs on each compute host. Configure it via the Instance HA configuration overlay. Key instance monitor parameters: | Section | Parameter | Default | Description | | -------------------- | ---------------------------------- | ------- | ------------------------------------------------------- | | `[instance_failure]` | `recover_ignoring_error_instances` | `False` | Attempt recovery for instances already in `ERROR` state | | `[instance_failure]` | `recover_instance_failure_method` | `auto` | Recovery method for instance-level faults | | `[DEFAULT]` | `instance_check_interval` | `30` | Seconds between instance heartbeat polls | In XDeploy, navigate to **Advanced Configuration**. In the **Service Tree**, select **masakari**. Select or create `instance-ha.conf` in the Code Editor. Add or modify the instance monitor parameters: ```ini title="Instance monitor settings in XDeploy Advanced Configuration" theme={null} [instance_failure] recover_ignoring_error_instances = False recover_instance_failure_method = auto [DEFAULT] instance_check_interval = 30 ``` Click **Save Current File**. Navigate to **Operations** and run a **reconfigure** action. The instance monitor restarts automatically on each compute host with the updated parameters. Instance monitor is running on all compute hosts with the new configuration. Edit the configuration file directly and restart the instance monitor on each compute host: ```ini title="/etc/xavs/instance-ha/instance-ha.conf" theme={null} [instance_failure] recover_ignoring_error_instances = False recover_instance_failure_method = auto [DEFAULT] instance_check_interval = 30 ``` ```bash title="Restart instance monitor on each compute host" theme={null} docker restart masakari_instancemonitor ``` *** ## Enable Guest Heartbeat in Instances Instance-level detection requires the instance image to have the `masakari-instancemonitor` XAVS Guest Agent or a compatible heartbeat mechanism installed. The XAVS Guest Agent includes a VSS provider for Windows application-consistent snapshots. Check whether the agent is running inside an instance: ```bash title="Verify XAVS Guest Agent inside instance (SSH)" theme={null} systemctl status masakari-processmonitor ``` For instances using the standard Xloud images, the guest heartbeat is enabled by default. For custom images, install the `python3-masakari` package and enable the `masakari-processmonitor` service at boot. *** ## Difference Between Host and Instance Recovery | Scenario | Monitor Used | Recovery Scope | | ------------------------------------------- | ---------------- | ------------------------------------------------ | | Physical host failure, OS crash, power loss | Host Monitor | All instances on the failed host are evacuated | | Single guest OS crash, kernel panic | Instance Monitor | Only the crashed instance is restarted | | nova-compute process dies on a healthy host | Process Monitor | nova-compute restarted; instances remain on host | *** ## Validation Navigate to **Instance-HA > Notifications (admin view)** and confirm that instance-level notifications appear and transition to `finished` status when instance faults are detected and resolved. ```bash title="Check instance monitor service" theme={null} docker ps --filter name=masakari_instancemonitor ``` ```bash title="View instance monitor logs" theme={null} docker logs -f masakari_instancemonitor ``` Instance monitor is running on all compute hosts and logs confirm active polling. *** ## Next Steps Configure the notification driver that routes fault events to the recovery engine. Configure IPMI and SSH host-level monitors for your compute nodes. Select and configure the recovery method for each failover segment. Diagnose monitor failures, notification delivery issues, and recovery errors.