Overview
The Instance HA engine processes fault notifications and orchestrates the recovery workflow. Its timing and behaviour parameters determine how quickly recovery begins, how many retries are attempted, and how edge-case scenarios (instances in ERROR state, short-lived faults) are handled. This page documents all key configuration parameters and their production recommendations.Configuration File Location
etc
xavs
instance-ha
instance-ha.conf — Primary configuration file
Restart Instance HA engine
Core Parameters
DEFAULT Section
| Parameter | Default | Description |
|---|---|---|
host | <hostname> | Service identifier used for distributed locking |
long_rpc_timeout | 300 | Max seconds to wait for a Compute RPC call |
wait_period_after_service_update | 180 | Seconds to wait after a host service update before triggering recovery — prevents false alarms from planned restarts |
notification_service_endpoint | — | External webhook endpoint for incoming notifications |
[host_failure] Section
| Parameter | Default | Description |
|---|---|---|
host_failure_recovery_interval | 17 | Seconds between recovery retry attempts |
ignore_lease_seconds | 0 | Seconds after host boot to suppress failure notifications |
evacuate_all_instances | True | Evacuate all instances from failed host, not just those with HA protection |
[instance_failure] Section
| Parameter | Default | Description |
|---|---|---|
recover_ignoring_error_instances | False | Attempt recovery for instances already in ERROR state |
recover_instance_failure_method | auto | Recovery method for instance-level faults |
Example Production Configuration
- XDeploy
- CLI
Enable Host HA
In XDeploy, navigate to Configuration → Advance Features and toggle
Enable Host HA to Yes. Click Save Configuration.
Database connection strings (
[database]) and Xloud Identity credentials
([keystone_authtoken]) are auto-managed by XDeploy. Do not edit these
sections manually — they are generated during deployment and kept in sync
with the cluster identity service automatically.Customize engine parameters (optional)
To tune timing or behaviour parameters beyond the defaults, open
Advanced Configuration in XDeploy. In the Service Tree, select
masakari and open (or create) Click Save Current File.
instance-ha.conf.Edit the parameters in the Code Editor:Engine parameters in XDeploy Advanced Configuration
Timing Tuning Guidance
Reduce false positives from planned restarts
Reduce false positives from planned restarts
Increase This adds up to 2 minutes of tolerance for hosts coming back online after a reboot
before Instance HA declares them permanently failed.
wait_period_after_service_update and ignore_lease_seconds to prevent
recovery from triggering during planned host reboots:Recommended for environments with frequent planned maintenance
Faster recovery for latency-sensitive workloads
Faster recovery for latency-sensitive workloads
Reduce retry intervals for faster recovery at the cost of increased sensitivity
to transient network partitions:
Faster recovery (higher false-positive risk)
Enable recovery for instances in ERROR state
Enable recovery for instances in ERROR state
For environments where instances frequently enter This setting is disabled by default because attempting to evacuate an instance
that is in
ERROR state due to transient
issues, enable recovery for error-state instances:Recover ERROR-state instances
ERROR due to a configuration issue (rather than a host failure)
may repeatedly fail and generate noise in the notification log.Verify Engine Configuration
View active configuration
Check engine logs for configuration errors
Validation
- Dashboard
- CLI
Navigate to Admin → Compute → Instance HA → Notifications after a test event.
Verify that recovery workflow timing aligns with the configured parameters.
Next Steps
Recovery Methods
Configure how instances are evacuated after fault detection.
Security
Manage service credentials and RBAC policies for Instance HA.
Troubleshooting
Diagnose engine startup failures and notification processing issues.
Notification Drivers
Configure the notification driver that feeds fault events to the engine.