# Host Monitors

> Configure Xloud Instance HA host monitors: IPMI out-of-band and SSH in-band monitoring for compute hosts, with timing parameters and credential management.

## Overview

Host monitors are the detection layer of Instance HA. They continuously poll registered compute hosts and emit fault notifications when a host becomes unreachable. Xloud Instance HA supports two monitor types: IPMI (out-of-band, recommended for production) and SSH (in-band, for environments without IPMI access). This page covers configuration for both types and the timing parameters that control detection sensitivity.

Misconfigured monitor credentials or unreachable IPMI endpoints will cause false-negative detections: the monitor reports success even when the host has failed. Validate all monitor connections before enabling production workloads.

***

## Monitor Types

**IPMI (out-of-band).** Uses out-of-band management hardware. Detects failures even when the host OS, kernel, or all network interfaces are completely unresponsive.

**SSH (in-band).** Attempts an SSH TCP connection to the host. Simpler to set up but dependent on the host network stack, so it may miss failures that leave the SSH port reachable.

***

## IPMI Host Monitor

The IPMI monitor uses the host's `control_attributes` JSON field, set when registering the host in a segment.
### Configure IPMI Credentials

When adding a host to a segment, set the **Control Attributes** field to a JSON object with the IPMI endpoint:

```json title="IPMI control attributes"
{
  "host": "192.168.10.11",
  "username": "admin",
  "password": "ipmi-password"
}
```

| Key        | Description                                    |
| ---------- | ---------------------------------------------- |
| `host`     | IPMI management IP address                     |
| `username` | IPMI user with chassis status read permissions |
| `password` | IPMI user password                             |

```bash title="Register host with IPMI credentials"
openstack segment host create \
  --type COMPUTE \
  --control_attributes '{"host": "192.168.10.11", "username": "admin", "password": "ipmi-password"}' \
  --on_maintenance False \
```

IPMI credentials are stored in the Instance HA database. Restrict database access to the Instance HA service account only. Consider using Xloud Key Management to manage IPMI secrets and inject them via a custom notification driver.

### Validate IPMI Connectivity

Before registering hosts, validate IPMI access from the controller node:

```bash title="Test IPMI connectivity"
ipmitool -I lanplus \
  -H <ipmi-host> \
  -U <ipmi-username> \
  -P <ipmi-password> \
  chassis status
```

Expected output: a `System Power` line reporting `on` confirms that IPMI access is working.

Ensure UDP port 623 is permitted between the Instance HA controller and all IPMI management interfaces. IPMI uses the RMCP+ protocol over UDP 623 by default.

***

## SSH Host Monitor

The SSH monitor attempts a TCP connection to port 22. It uses the SSH key configured for the Instance HA service account; no password authentication is used.

### Deploy SSH Keys

The Instance HA host monitor generates an SSH key pair at service startup.
Locate the public key on the controller node:

```bash title="Find service public key"
cat /etc/xavs/instance-ha/id_rsa.pub
```

Append the public key to the `authorized_keys` file of the user the monitor will connect as (typically `root` or a dedicated service account):

```bash title="Deploy key to compute host"
ssh-copy-id -i /etc/xavs/instance-ha/id_rsa.pub root@<compute-host>
```

When registering the host in the segment, use the IP address only, with no credentials:

```bash title="Register host for SSH monitoring"
openstack segment host create \
  --type COMPUTE \
  --control_attributes '{"host": "10.0.1.72"}' \
```

Then test the connection from the controller:

```bash title="Test SSH from controller"
ssh -i /etc/xavs/instance-ha/id_rsa root@<compute-host> hostname
```

The SSH connection should succeed without a password prompt.

***

## Timing Parameters

Adjust monitoring sensitivity through the parameters below:

| Section          | Parameter                          | Default | Description                                                                                                                |
| ---------------- | ---------------------------------- | ------- | -------------------------------------------------------------------------------------------------------------------------- |
| `[DEFAULT]`      | `wait_period_after_service_update` | `180`   | Seconds to wait after a host enters maintenance before triggering recovery; prevents false alarms during planned restarts |
| `[DEFAULT]`      | `long_rpc_timeout`                 | `300`   | Maximum seconds to wait for a Compute RPC call to complete before declaring it failed                                      |
| `[host_failure]` | `host_failure_recovery_interval`   | `17`    | Seconds between recovery retry attempts when the first evacuation attempt fails                                            |
| `[host_failure]` | `ignore_lease_seconds`             | `0`     | Seconds after host boot to suppress fault notifications; set to 60-120 to avoid startup noise                              |

In XDeploy, navigate to **Advanced Configuration**. In the **Service Tree**, select **masakari**. Select or create `instance-ha.conf` in the Code Editor.
Add or modify the timing parameters:

```ini title="Host monitor timing in XDeploy Advanced Configuration"
[DEFAULT]
wait_period_after_service_update = 180
long_rpc_timeout = 300

[host_failure]
host_failure_recovery_interval = 17
ignore_lease_seconds = 60
```

Click **Save Current File**. Navigate to **Operations** and run a **reconfigure** action. The host monitor service restarts automatically with the updated parameters. Confirm that the host monitor is running with the new timing configuration.

Alternatively, edit the configuration file directly and restart the host monitor container:

```bash title="Open Instance HA configuration"
vi /etc/xavs/instance-ha/instance-ha.conf
```

```ini title="Example timing configuration"
[DEFAULT]
wait_period_after_service_update = 180
long_rpc_timeout = 300

[host_failure]
host_failure_recovery_interval = 17
ignore_lease_seconds = 60
```

```bash title="Restart host monitor"
docker restart masakari_hostmonitor
```

***

## Monitor Health Check

Verify the host monitor is running and detecting hosts correctly:

```bash title="Check monitor service status"
docker ps --filter name=masakari_hostmonitor
```

```bash title="View monitor logs"
docker logs -f masakari_hostmonitor
```

Look for log entries confirming successful polls:

```
INFO masakari.hostmonitor: Host compute-01 is ALIVE
INFO masakari.hostmonitor: Host compute-02 is ALIVE
```

A repeated `UNREACHABLE` log for a running host indicates a credential or network configuration issue, not a genuine host failure.

***

## Next Steps

- Configure guest-level instance heartbeat monitoring for per-instance fault detection.
- Register and manage compute hosts within protection segments.
- Tune recovery engine timing and retry parameters.
- Secure IPMI credentials and restrict access to Instance HA APIs.
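To complement the Monitor Health Check section, here is a minimal sketch for summarizing how often each host is reported `UNREACHABLE` in the monitor logs. It assumes that unreachable hosts are logged in the same shape as the `ALIVE` entries shown in that section (`Host <name> is UNREACHABLE`); the exact message format is an assumption and may differ in your release. The function name `count_unreachable` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count UNREACHABLE log entries per host.
# Reads host monitor log lines on stdin and prints "<host> <count>" pairs.
# ASSUMPTION: lines look like
#   INFO masakari.hostmonitor: Host <name> is UNREACHABLE
count_unreachable() {
  awk '
    / Host [^ ]+ is UNREACHABLE/ {
      # The host name is the field that follows the literal word "Host".
      for (i = 1; i < NF; i++) {
        if ($i == "Host") { count[$(i + 1)]++; break }
      }
    }
    END { for (h in count) print h, count[h] }
  ' | sort
}
```

One possible use is to pipe the container logs through the function, for example `docker logs masakari_hostmonitor 2>&1 | count_unreachable`. A host that is known to be running but shows a steadily growing count is a candidate for the credential or network checks described earlier.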