Instance Monitors

Overview

Instance monitors detect failures at the guest level — independent of whether the underlying compute host is healthy. When an instance stops responding to heartbeat checks, an instance-level fault notification is generated and the recovery engine attempts to restart the affected instance. This complements host monitoring by handling scenarios such as OS crashes, guest kernel panics, and runaway processes that consume all available resources without taking down the host.

Prerequisites

Administrator privileges
Instance HA service deployed and running
XAVS Guest Agent or heartbeat capability enabled in the instance image

Instance Monitor Architecture

The instance monitor runs on each compute host and monitors all running instances. It operates independently of the host monitor — both can run simultaneously.
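As a rough illustration of heartbeat-based detection, the sketch below flags instances whose last heartbeat reply is older than an allowed window. It is not the monitor's actual implementation: `Heartbeat`, `find_unresponsive`, and the `missed_polls` threshold are hypothetical names chosen for this example. Only the 30-second interval mirrors the documented default of `instance_check_interval`.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    instance_uuid: str
    last_seen: float  # Unix timestamp of the last heartbeat reply


def find_unresponsive(heartbeats, now, check_interval=30, missed_polls=3):
    """Return UUIDs whose last heartbeat is older than the allowed window.

    missed_polls is a hypothetical tolerance, not a documented parameter.
    """
    window = check_interval * missed_polls
    return [hb.instance_uuid for hb in heartbeats if now - hb.last_seen > window]


if __name__ == "__main__":
    now = 1_000.0
    sample = [
        Heartbeat("instance-a", last_seen=990.0),  # replied 10 s ago
        Heartbeat("instance-b", last_seen=850.0),  # silent for 150 s
    ]
    print(find_unresponsive(sample, now))  # ['instance-b']
```

Tolerating a few missed polls before reporting a fault is one way to avoid restarting an instance over a single delayed reply.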

Notification Types

Instance HA distinguishes between host-level and instance-level faults using the notification type field.

Type	Source	Trigger
`COMPUTE_HOST`	Host Monitor	Host becomes unreachable (IPMI / SSH timeout)
`COMPUTE_INSTANCE`	Instance Monitor	Guest heartbeat stops responding
`COMPUTE_PROCESS`	Process Monitor	Critical compute process (nova-compute) dies

View Instance-Level Notifications

Dashboard

Navigate to Instance-HA > Notifications (admin view). Filter by Type: COMPUTE_INSTANCE to display only instance-level fault events. Each row shows the affected instance UUID, the source host, and the current recovery status.

CLI

List instance-level notifications

openstack notification list --type COMPUTE_INSTANCE

Show instance notification details

openstack notification show <notification-uuid>

The detail view includes the source_host_uuid, payload (with instance UUID), generated_time, and status.
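For scripted checks, the notification list can be post-processed once it is available as JSON. The sketch below assumes the list was exported as JSON (for example with the OpenStack client's generic `-f json` output option) and that each entry exposes `type`, `status`, and a `payload` object containing `instance_uuid`. Those key names and the sample values are assumptions for illustration; verify them against the output of your deployment.

```python
import json


def instance_faults(raw_json):
    """Yield (instance UUID, status) for each COMPUTE_INSTANCE notification."""
    for item in json.loads(raw_json):
        if item.get("type") != "COMPUTE_INSTANCE":
            continue
        # Assumed layout: payload is an object holding the instance UUID.
        payload = item.get("payload") or {}
        yield payload.get("instance_uuid"), item.get("status")


# Placeholder data, not real command output.
SAMPLE = json.dumps([
    {"type": "COMPUTE_HOST", "status": "finished", "payload": {}},
    {"type": "COMPUTE_INSTANCE", "status": "running",
     "payload": {"instance_uuid": "example-instance-uuid"}},
])

if __name__ == "__main__":
    for uuid, status in instance_faults(SAMPLE):
        print(uuid, status)
```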

Configure the Instance Monitor

The instance monitor daemon runs on each compute host. Configure it via the Instance HA configuration overlay.

/etc/xavs/instance-ha/instance-ha.conf

Key instance monitor parameters:

Section	Parameter	Default	Description
`[instance_failure]`	`recover_ignoring_error_instances`	`False`	Attempt recovery for instances already in `ERROR` state
`[instance_failure]`	`recover_instance_failure_method`	`auto`	Recovery method for instance-level faults
`[DEFAULT]`	`instance_check_interval`	`30`	Seconds between instance heartbeat polls
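Before applying a change, it can help to sanity-check the file contents. The following is a minimal sketch that parses the three parameters listed above with Python's standard `configparser` and falls back to the documented defaults. The function name `load_monitor_settings` and the positive-interval check are assumptions of this example, not part of the product.

```python
import configparser

SAMPLE = """\
[DEFAULT]
instance_check_interval = 30

[instance_failure]
recover_ignoring_error_instances = False
recover_instance_failure_method = auto
"""


def load_monitor_settings(text):
    """Parse instance monitor settings, using the documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    interval = parser.getint("DEFAULT", "instance_check_interval", fallback=30)
    if interval <= 0:
        raise ValueError("instance_check_interval must be a positive number of seconds")
    return {
        "instance_check_interval": interval,
        "recover_ignoring_error_instances": parser.getboolean(
            "instance_failure", "recover_ignoring_error_instances", fallback=False
        ),
        "recover_instance_failure_method": parser.get(
            "instance_failure", "recover_instance_failure_method", fallback="auto"
        ),
    }


if __name__ == "__main__":
    print(load_monitor_settings(SAMPLE))
```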

XDeploy

Open Advanced Configuration

In XDeploy, navigate to Advanced Configuration. In the Service Tree, select masakari.

Edit instance monitor parameters

Select or create instance-ha.conf in the Code Editor. Add or modify the instance monitor parameters:

Instance monitor settings in XDeploy Advanced Configuration

[instance_failure]
recover_ignoring_error_instances = False
recover_instance_failure_method = auto

[DEFAULT]
instance_check_interval = 30

Click Save Current File.

Apply changes

Navigate to Operations and run a reconfigure action. The instance monitor restarts automatically on each compute host with the updated parameters.

Instance monitor is running on all compute hosts with the new configuration.

CLI

Edit the configuration file directly, then restart the instance monitor on each compute host:

/etc/xavs/instance-ha/instance-ha.conf

[instance_failure]
recover_ignoring_error_instances = False
recover_instance_failure_method = auto

[DEFAULT]
instance_check_interval = 30

Restart instance monitor on each compute host

docker restart masakari_instancemonitor
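Because the restart has to be repeated on every compute host, a small wrapper can reduce manual work. The sketch below assumes SSH access from an admin node to each compute host; the host names and the helper functions `restart_command` and `restart_all` are hypothetical. It defaults to a dry run that only prints the commands it would execute.

```python
import shlex
import subprocess

CONTAINER = "masakari_instancemonitor"


def restart_command(host):
    """Build the SSH command that restarts the monitor container on one host."""
    return ["ssh", host, "docker", "restart", CONTAINER]


def restart_all(hosts, dry_run=True):
    """Restart the monitor on each host; return {host: exit code or None}."""
    results = {}
    for host in hosts:
        cmd = restart_command(host)
        if dry_run:
            # Print the command instead of running it.
            print(shlex.join(cmd))
            results[host] = None
            continue
        results[host] = subprocess.run(cmd, check=False).returncode
    return results


if __name__ == "__main__":
    # Hypothetical host names; replace with your compute hosts.
    restart_all(["compute-01", "compute-02"])
```

Collecting exit codes instead of stopping at the first failure lets you see which hosts still need attention after a single pass.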

Enable Guest Heartbeat in Instances

Instance-level detection requires the instance image to have the masakari-instancemonitor XAVS Guest Agent or a compatible heartbeat mechanism installed. The XAVS Guest Agent includes a VSS provider for Windows application-consistent snapshots. Check whether the agent is running inside an instance:

Verify XAVS Guest Agent inside instance (SSH)

systemctl status masakari-processmonitor

For instances using the standard Xloud images, the guest heartbeat is enabled by default. For custom images, install the python3-masakari package and enable the masakari-processmonitor service at boot.

Difference Between Host and Instance Recovery

Scenario	Monitor Used	Recovery Scope
Physical host failure, host OS crash, power loss	Host Monitor	All instances on the failed host are evacuated
Single guest OS crash, kernel panic	Instance Monitor	Only the crashed instance is restarted
nova-compute process dies on a healthy host	Process Monitor	nova-compute restarted; instances remain on host
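The relationship between notification type and recovery scope can be summarized as a simple lookup. This is only an illustrative sketch built from the tables on this page; the `RECOVERY_SCOPE` mapping and `recovery_scope` function are hypothetical and do not reflect how the recovery engine is implemented.

```python
# Notification type -> recovery scope, as described in the tables on this page.
RECOVERY_SCOPE = {
    "COMPUTE_HOST": "evacuate all instances on the failed host",
    "COMPUTE_INSTANCE": "restart only the affected instance",
    "COMPUTE_PROCESS": "restart the compute process; instances remain on the host",
}


def recovery_scope(notification_type):
    """Return the recovery scope for a notification type, or raise ValueError."""
    try:
        return RECOVERY_SCOPE[notification_type]
    except KeyError:
        raise ValueError(f"unknown notification type: {notification_type!r}") from None


if __name__ == "__main__":
    print(recovery_scope("COMPUTE_INSTANCE"))
```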

Validation

Dashboard

Navigate to Instance-HA > Notifications (admin view) and confirm that instance-level notifications appear and transition to finished status when instance faults are detected and resolved.

CLI

Check instance monitor service

docker ps --filter name=masakari_instancemonitor

View instance monitor logs

docker logs -f masakari_instancemonitor

Instance monitor is running on all compute hosts and logs confirm active polling.
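The container check can also be automated. The sketch below parses output produced by `docker ps` with a custom format string and reports whether the monitor container is up. The `|` separator and the function name `monitor_is_up` are choices made for this example; the sample strings are placeholders, not captured output.

```python
# Assumed input: output of
#   docker ps --filter name=masakari_instancemonitor --format '{{.Names}}|{{.Status}}'


def monitor_is_up(ps_output, name="masakari_instancemonitor"):
    """Return True if a container with exactly this name reports an 'Up' status."""
    for line in ps_output.splitlines():
        container, _, status = line.partition("|")
        # The name filter matches substrings, so compare the full name here.
        if container.strip() == name and status.strip().startswith("Up"):
            return True
    return False


if __name__ == "__main__":
    print(monitor_is_up("masakari_instancemonitor|Up 2 hours"))
    print(monitor_is_up("masakari_instancemonitor|Restarting (1) 5 seconds ago"))
```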

Next Steps

Notification Drivers

Configure the notification driver that routes fault events to the recovery engine.

Host Monitors

Configure IPMI and SSH host-level monitors for your compute nodes.

Recovery Methods

Select and configure the recovery method for each failover segment.

Troubleshooting

Diagnose monitor failures, notification delivery issues, and recovery errors.
