Skip to main content

Overview

Notification drivers are the bridge between external monitoring systems and the Instance HA recovery engine. They receive fault signals in various formats, translate them into structured Instance HA notifications, and route them to the Recovery Engine. Xloud Instance HA ships with the NovaNotificationDriver as the default. Custom drivers allow integration with existing monitoring infrastructure such as Prometheus Alertmanager or Nagios.

Built-in Drivers

DriverSourceProtocolWhen to Use
NovaNotificationDriverXloud Compute message busAMQPDefault — all standard deployments
TaskFlowDriverTaskFlow workflow engineInternal RPCAdvanced workflow orchestration
Custom webhook driverThird-party tools (Prometheus, Nagios)HTTP POSTEnvironments with existing monitoring infrastructure

NovaNotificationDriver (Default)

The NovaNotificationDriver is enabled by default in all Xloud Instance HA deployments. It subscribes to the Xloud Compute AMQP message bus and listens for compute.host.error and compute.instance.error notification events.

How it works

When a compute host enters a failure state, the Compute service publishes an error notification on the AMQP message bus. The NovaNotificationDriver receives this message, extracts the affected host information, and creates an Instance HA notification record to trigger the recovery workflow.This driver requires no additional configuration beyond what is provided by the standard XDeploy deployment.
Check driver configuration
grep -i notification_driver \
  /etc/xavs/instance-ha/instance-ha.conf
Expected output:
notification_drivers = nova_notification
Confirm AMQP connectivity
docker logs masakari_engine | grep -i "notification"

Webhook Notification Driver

For environments that use Prometheus, Nagios, or other external monitoring tools as the primary fault detection system, Instance HA exposes an HTTP notification endpoint that accepts structured fault payloads.

Endpoint

POST /v1/notifications
Authorization: Bearer <token>
Content-Type: application/json

Payload Format

Host fault notification payload
{
  "hostname": "compute-01.xloud.local",
  "type": "COMPUTE_HOST",
  "payload": {
    "event": "STOPPED",
    "cluster_status": "OFFLINE",
    "host_status": "NORMAL"
  }
}
FieldTypeDescription
hostnamestringThe compute hostname as registered in the segment
typestringCOMPUTE_HOST, COMPUTE_INSTANCE, or COMPUTE_PROCESS
payload.eventstringSTOPPED or STARTED
payload.cluster_statusstringONLINE or OFFLINE
payload.host_statusstringNORMAL or UNKNOWN

Example: Prometheus Alertmanager Webhook

Configure an Alertmanager receiver that calls the Instance HA notification endpoint:
alertmanager.yml — webhook receiver
receivers:
  - name: "instance-ha-webhook"
    webhook_configs:
      - url: "http://<instance-ha-api>:15868/v1/notifications"
        http_config:
          bearer_token: "<service-token>"
The Instance HA API uses Xloud Identity token authentication. Generate a service token for the alertmanager integration using a dedicated service account with the admin role. Do not use personal user tokens in production.

TaskFlowDriver

The TaskFlow driver enables advanced workflow orchestration for recovery actions. It is used internally when the default recovery workflow requires multi-step sequencing with retry and rollback support. This driver operates transparently alongside the NovaNotificationDriver and does not require separate configuration in standard deployments. To customize the TaskFlow task pipeline, implement the BaseTask interface and register the plugin in the configuration.

Open Advanced Configuration

In XDeploy, navigate to Advanced Configuration. In the Service Tree, select masakari.

Edit workflow targets

Select or create instance-ha.conf in the Code Editor. Add the custom workflow targets:
TaskFlow workflow in XDeploy Advanced Configuration
[recovery_workflow_on_stop]
targets = disableComputeNodeTask, PrepareHAEnabled, EvacuateHost
Click Save Current File.

Apply changes

Navigate to Operations and run a reconfigure action. The recovery engine restarts with the updated workflow pipeline.
Engine logs confirm the custom TaskFlow targets are loaded.

Validation

Navigate to Admin → Compute → Instance HA → Notifications.Simulate a notification by creating one manually (test environments only):
  • Click Create Notification (admin view)
  • Set type to COMPUTE_HOST, hostname to a registered host, event to STOPPED
  • Confirm the notification appears and transitions to running
Notification is received, logged, and triggers the recovery workflow.

Next Steps

Recovery Methods

Configure how instances are evacuated after a notification triggers recovery.

Instance Monitors

Configure guest-level monitoring independent of the notification driver.

Engine Configuration

Tune recovery engine timing, retries, and workflow task ordering.

Security

Secure the notification API endpoint and service account credentials.