Overview
Instance monitors detect failures at the guest level — independent of whether the underlying compute host is healthy. When an instance stops responding to heartbeat checks, an instance-level fault notification is generated and the recovery engine attempts to restart the affected instance. This complements host monitoring by handling scenarios such as OS crashes, guest kernel panics, and runaway processes that consume all available resources without taking down the host.Prerequisites
- Administrator privileges
- Instance HA service deployed and running
- XAVS Guest Agent or heartbeat capability enabled in the instance image
Instance Monitor Architecture
The instance monitor runs on each compute host and monitors all running instances. It operates independently of the host monitor — both can run simultaneously.Notification Types
Instance HA distinguishes between host-level and instance-level faults using the notificationtype field.
| Type | Source | Trigger |
|---|---|---|
COMPUTE_HOST | Host Monitor | Host becomes unreachable (IPMI / SSH timeout) |
COMPUTE_INSTANCE | Instance Monitor | Guest heartbeat stops responding |
COMPUTE_PROCESS | Process Monitor | Critical compute process (nova-compute) dies |
View Instance-Level Notifications
- Dashboard
- CLI
Navigate to Admin → Compute → Instance HA → Notifications.Filter by Type: COMPUTE_INSTANCE to display only instance-level fault events.
Each row shows the affected instance UUID, the source host, and the current recovery
status.
Configure the Instance Monitor
The instance monitor daemon runs on each compute host. Configure it via the Instance HA configuration overlay.etc
xavs
instance-ha
instance-ha.conf
| Section | Parameter | Default | Description |
|---|---|---|---|
[instance_failure] | recover_ignoring_error_instances | False | Attempt recovery for instances already in ERROR state |
[instance_failure] | recover_instance_failure_method | auto | Recovery method for instance-level faults |
[DEFAULT] | instance_check_interval | 30 | Seconds between instance heartbeat polls |
- XDeploy
- CLI
Open Advanced Configuration
In XDeploy, navigate to Advanced Configuration. In the Service Tree,
select masakari.
Edit instance monitor parameters
Select or create Click Save Current File.
instance-ha.conf in the Code Editor. Add or modify the
instance monitor parameters:Instance monitor settings in XDeploy Advanced Configuration
Enable Guest Heartbeat in Instances
Instance-level detection requires the instance image to have themasakari-instancemonitor
XAVS Guest Agent or a compatible heartbeat mechanism installed. The XAVS Guest Agent includes a VSS provider for Windows application-consistent snapshots. Check whether the agent is running inside an instance:
Verify XAVS Guest Agent inside instance (SSH)
Difference Between Host and Instance Recovery
| Scenario | Monitor Used | Recovery Scope |
|---|---|---|
| Physical host failure, OS crash, power loss | Host Monitor | All instances on the failed host are evacuated |
| Single guest OS crash, kernel panic | Instance Monitor | Only the crashed instance is restarted |
| nova-compute process dies on a healthy host | Process Monitor | nova-compute restarted; instances remain on host |
Validation
- Dashboard
- CLI
Navigate to Admin → Compute → Instance HA → Notifications and confirm that
instance-level notifications appear and transition to
finished status when
instance faults are detected and resolved.Next Steps
Notification Drivers
Configure the notification driver that routes fault events to the recovery engine.
Host Monitors
Configure IPMI and SSH host-level monitors for your compute nodes.
Recovery Methods
Select and configure the recovery method for each failover segment.
Troubleshooting
Diagnose monitor failures, notification delivery issues, and recovery errors.