Overview

Instance monitors detect failures at the guest level — independent of whether the underlying compute host is healthy. When an instance stops responding to heartbeat checks, an instance-level fault notification is generated and the recovery engine attempts to restart the affected instance. This complements host monitoring by handling scenarios such as OS crashes, guest kernel panics, and runaway processes that consume all available resources without taking down the host.
Prerequisites
  • Administrator privileges
  • Instance HA service deployed and running
  • XAVS Guest Agent or heartbeat capability enabled in the instance image

Instance Monitor Architecture

The instance monitor runs on each compute host and monitors all running instances. It operates independently of the host monitor — both can run simultaneously.
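
The detection step can be pictured with a minimal sketch. It assumes the monitor keeps a last-heartbeat timestamp per instance and treats an instance as failed once that timestamp is older than a timeout. The function name `find_unresponsive`, the `timeout` parameter, and the data layout are hypothetical and are not taken from the Instance HA code.

```python
from typing import Dict, List


def find_unresponsive(last_heartbeat: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Return the instance IDs whose last heartbeat is older than `timeout` seconds.

    Illustrative only: the real monitor's internals may differ.
    """
    return sorted(
        instance_id
        for instance_id, seen_at in last_heartbeat.items()
        if now - seen_at > timeout
    )


# Example: two instances checked at t=100 with an assumed 30-second timeout.
heartbeats = {"instance-a": 95.0, "instance-b": 60.0}
print(find_unresponsive(heartbeats, now=100.0, timeout=30.0))
```

In this example only `instance-b` is reported, because its last heartbeat is 40 seconds old while `instance-a` responded 5 seconds ago.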

Notification Types

Instance HA distinguishes between host-level and instance-level faults using the notification type field.
| Type | Source | Trigger |
|---|---|---|
| COMPUTE_HOST | Host Monitor | Host becomes unreachable (IPMI / SSH timeout) |
| COMPUTE_INSTANCE | Instance Monitor | Guest heartbeat stops responding |
| COMPUTE_PROCESS | Process Monitor | Critical compute process (nova-compute) dies |

View Instance-Level Notifications

Navigate to Admin → Compute → Instance HA → Notifications. Filter by Type: COMPUTE_INSTANCE to display only instance-level fault events. Each row shows the affected instance UUID, the source host, and the current recovery status.

Configure the Instance Monitor

The instance monitor daemon runs on each compute host. Configure it via the Instance HA configuration overlay.
/etc/xavs/instance-ha/instance-ha.conf

Key instance monitor parameters:
| Section | Parameter | Default | Description |
|---|---|---|---|
| [instance_failure] | recover_ignoring_error_instances | False | Attempt recovery for instances already in ERROR state |
| [instance_failure] | recover_instance_failure_method | auto | Recovery method for instance-level faults |
| [DEFAULT] | instance_check_interval | 30 | Seconds between instance heartbeat polls |
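
Before applying a change, it can help to confirm locally that the file parses and that the three parameters resolve to the expected values. The sketch below uses Python's standard `configparser` purely as a sanity check; it is an assumption that the file is plain INI, and the service itself may load its configuration differently. The function name `load_instance_monitor_settings` is hypothetical.

```python
import configparser

# Sample content mirroring the documented defaults.
SAMPLE = """
[DEFAULT]
instance_check_interval = 30

[instance_failure]
recover_ignoring_error_instances = False
recover_instance_failure_method = auto
"""


def load_instance_monitor_settings(text: str) -> dict:
    """Parse INI text and return the instance monitor parameters.

    Missing options fall back to the documented defaults.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = "instance_failure"
    return {
        "recover_ignoring_error_instances": parser.getboolean(
            section, "recover_ignoring_error_instances", fallback=False
        ),
        "recover_instance_failure_method": parser.get(
            section, "recover_instance_failure_method", fallback="auto"
        ),
        "instance_check_interval": parser.getint(
            "DEFAULT", "instance_check_interval", fallback=30
        ),
    }


print(load_instance_monitor_settings(SAMPLE))
```

A malformed boolean or a non-numeric interval raises `ValueError` at parse time, which surfaces typos before the reconfigure action runs.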

Open Advanced Configuration

In XDeploy, navigate to Advanced Configuration. In the Service Tree, select masakari.

Edit instance monitor parameters

Select or create instance-ha.conf in the Code Editor. Add or modify the instance monitor parameters:
Instance monitor settings in XDeploy Advanced Configuration
[instance_failure]
recover_ignoring_error_instances = False
recover_instance_failure_method = auto

[DEFAULT]
instance_check_interval = 30
Click Save Current File.

Apply changes

Navigate to Operations and run a reconfigure action. The instance monitor restarts automatically on each compute host with the updated parameters.
When the action completes, the instance monitor should be running on all compute hosts with the new configuration.

Enable Guest Heartbeat in Instances

Instance-level detection requires the XAVS Guest Agent (masakari-instancemonitor) or a compatible heartbeat mechanism to be installed in the instance image. The XAVS Guest Agent also includes a VSS provider for Windows application-consistent snapshots.

Check whether the agent is running inside an instance:
Verify XAVS Guest Agent inside instance (SSH)
systemctl status masakari-processmonitor
For instances using the standard Xloud images, the guest heartbeat is enabled by default. For custom images, install the python3-masakari package and enable the masakari-processmonitor service at boot.

Difference Between Host and Instance Recovery

| Scenario | Monitor Used | Recovery Scope |
|---|---|---|
| Physical host failure, OS crash, power loss | Host Monitor | All instances on the failed host are evacuated |
| Single guest OS crash, kernel panic | Instance Monitor | Only the crashed instance is restarted |
| nova-compute process dies on a healthy host | Process Monitor | nova-compute restarted; instances remain on host |
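
The relationship between notification type, monitor, and recovery scope can be summarized as a small lookup. This is only an illustration of the rows above, not the recovery engine's actual dispatch logic; the names `NotificationType`, `RECOVERY_SCOPE`, and `describe_recovery` are hypothetical.

```python
from enum import Enum


class NotificationType(str, Enum):
    COMPUTE_HOST = "COMPUTE_HOST"
    COMPUTE_INSTANCE = "COMPUTE_INSTANCE"
    COMPUTE_PROCESS = "COMPUTE_PROCESS"


# (monitor, recovery scope) per notification type, as described in this section.
RECOVERY_SCOPE = {
    NotificationType.COMPUTE_HOST: ("Host Monitor", "evacuate all instances on the failed host"),
    NotificationType.COMPUTE_INSTANCE: ("Instance Monitor", "restart only the crashed instance"),
    NotificationType.COMPUTE_PROCESS: ("Process Monitor", "restart nova-compute; instances remain on host"),
}


def describe_recovery(notification_type: str) -> str:
    """Return a one-line summary; raises ValueError for an unknown type."""
    kind = NotificationType(notification_type)
    monitor, scope = RECOVERY_SCOPE[kind]
    return f"{monitor}: {scope}"


print(describe_recovery("COMPUTE_INSTANCE"))
```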

Validation

Navigate to Admin → Compute → Instance HA → Notifications and confirm that instance-level notifications appear and transition to finished status when instance faults are detected and resolved.
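
If the notification list is exported for scripted checks, the same validation can be expressed as a filter. The sketch below assumes each notification is a mapping with `uuid`, `type`, and `status` keys; that record layout and the `running` status value in the sample data are assumptions, and only the `COMPUTE_INSTANCE` type and the `finished` status come from this page.

```python
from typing import Iterable, List, Mapping


def unfinished_instance_notifications(notifications: Iterable[Mapping[str, str]]) -> List[str]:
    """Return the UUIDs of instance-level notifications not yet in 'finished' status."""
    return [
        n["uuid"]
        for n in notifications
        if n["type"] == "COMPUTE_INSTANCE" and n["status"] != "finished"
    ]


# Hypothetical sample records for illustration.
sample = [
    {"uuid": "n-1", "type": "COMPUTE_INSTANCE", "status": "finished"},
    {"uuid": "n-2", "type": "COMPUTE_INSTANCE", "status": "running"},
    {"uuid": "n-3", "type": "COMPUTE_HOST", "status": "running"},
]
print(unfinished_instance_notifications(sample))
```

An empty result means every instance-level notification in the list has reached finished status; host-level and process-level notifications are ignored by this check.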

Next Steps

Notification Drivers

Configure the notification driver that routes fault events to the recovery engine.

Host Monitors

Configure IPMI and SSH host-level monitors for your compute nodes.

Recovery Methods

Select and configure the recovery method for each failover segment.

Troubleshooting

Diagnose monitor failures, notification delivery issues, and recovery errors.