Notification Drivers

Overview

Notification drivers are the bridge between external monitoring systems and the Instance HA recovery engine. They receive fault signals in various formats, translate them into structured Instance HA notifications, and route them to the Recovery Engine. Xloud Instance HA ships with the NovaNotificationDriver as the default. Custom drivers allow integration with existing monitoring infrastructure such as Prometheus Alertmanager or Nagios.

Built-in Drivers

Driver	Source	Protocol	When to Use
`NovaNotificationDriver`	Xloud Compute message bus	AMQP	Default — all standard deployments
`TaskFlowDriver`	TaskFlow workflow engine	Internal RPC	Advanced workflow orchestration
Custom webhook driver	Third-party tools (Prometheus, Nagios)	HTTP POST	Environments with existing monitoring infrastructure

NovaNotificationDriver (Default)

The NovaNotificationDriver is enabled by default in all Xloud Instance HA deployments. It subscribes to the Xloud Compute AMQP message bus and listens for compute.host.error and compute.instance.error notification events.

How it works

When a compute host enters a failure state, the Compute service publishes an error notification on the AMQP message bus. The NovaNotificationDriver receives this message, extracts the affected host information, and creates an Instance HA notification record to trigger the recovery workflow.

This driver requires no additional configuration beyond what is provided by the standard XDeploy deployment.
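The receive, extract, and create steps described here can be sketched in a few lines of Python. This is an illustration only, not the driver's actual code: the message shape (`event_type`, `payload["host"]`), the function name `to_ha_notification`, the mapping from event type to notification type, and the fixed `STOPPED` event are all assumptions.

```python
# Illustrative sketch; the real NovaNotificationDriver internals are not
# shown in this guide, and every field name below is an assumption.
from typing import Optional

# Event types named in this section, mapped to assumed notification types.
EVENT_TO_TYPE = {
    "compute.host.error": "COMPUTE_HOST",
    "compute.instance.error": "COMPUTE_INSTANCE",
}


def to_ha_notification(message: dict) -> Optional[dict]:
    """Translate a bus message into a notification record, or None if irrelevant."""
    ha_type = EVENT_TO_TYPE.get(message.get("event_type"))
    if ha_type is None:
        return None  # not an error event this sketch handles
    # Assumption: the affected host is carried under payload["host"].
    host = message.get("payload", {}).get("host")
    if not host:
        return None
    # Assumption: an error event is reported as a STOPPED event.
    return {"hostname": host, "type": ha_type, "payload": {"event": "STOPPED"}}
```

Messages with any other event type are ignored, which mirrors the driver listening only for the two error events.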

Verify the driver is active

Check driver configuration

grep -i notification_driver \
  /etc/xavs/instance-ha/instance-ha.conf

Expected output:

notification_drivers = nova_notification

Confirm AMQP connectivity

docker logs masakari_engine 2>&1 | grep -i "notification"

Webhook Notification Driver

For environments that use Prometheus, Nagios, or other external monitoring tools as the primary fault detection system, Instance HA exposes an HTTP notification endpoint that accepts structured fault payloads.

Endpoint

POST /v1/notifications
Authorization: Bearer <token>
Content-Type: application/json
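As a minimal sketch, the request above could be assembled with Python's standard library. The helper name `build_notification_request` and the base URL used in the usage note are hypothetical; the function only builds the request and sends nothing.

```python
import json
import urllib.request


def build_notification_request(
    base_url: str, token: str, notification: dict
) -> urllib.request.Request:
    """Build (but do not send) the POST request for /v1/notifications."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/notifications",
        data=json.dumps(notification).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(req, timeout=10)` with a base URL such as `http://<instance-ha-api>:15868`. Keeping construction separate from sending makes the headers and body easy to inspect first.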

Payload Format

Host fault notification payload

{
  "hostname": "compute-01.xloud.local",
  "type": "COMPUTE_HOST",
  "payload": {
    "event": "STOPPED",
    "cluster_status": "OFFLINE",
    "host_status": "NORMAL"
  }
}

Field	Type	Description
`hostname`	string	The compute hostname as registered in the segment
`type`	string	`COMPUTE_HOST`, `COMPUTE_INSTANCE`, or `COMPUTE_PROCESS`
`payload.event`	string	`STOPPED` or `STARTED`
`payload.cluster_status`	string	`ONLINE` or `OFFLINE`
`payload.host_status`	string	`NORMAL` or `UNKNOWN`
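The allowed values in the table can be checked before a payload is sent. The sketch below is a client-side check, not part of the Instance HA API. The name `validate_notification` is hypothetical, and treating all three payload fields as required for every notification type is an assumption.

```python
# Allowed values copied from the field table above.
ALLOWED_TYPES = {"COMPUTE_HOST", "COMPUTE_INSTANCE", "COMPUTE_PROCESS"}
ALLOWED_PAYLOAD = {
    "event": {"STOPPED", "STARTED"},
    "cluster_status": {"ONLINE", "OFFLINE"},
    "host_status": {"NORMAL", "UNKNOWN"},
}


def _one_of(value, allowed: set) -> bool:
    """True when value is a string listed in allowed."""
    return isinstance(value, str) and value in allowed


def validate_notification(body: dict) -> list:
    """Return a list of problems; an empty list means the body matches the table."""
    problems = []
    hostname = body.get("hostname")
    if not isinstance(hostname, str) or not hostname:
        problems.append("hostname must be a non-empty string")
    if not _one_of(body.get("type"), ALLOWED_TYPES):
        problems.append("type must be one of " + ", ".join(sorted(ALLOWED_TYPES)))
    payload = body.get("payload")
    if not isinstance(payload, dict):
        problems.append("payload must be an object")
        return problems
    # Assumption: every payload field is required.
    for field, allowed in ALLOWED_PAYLOAD.items():
        if not _one_of(payload.get(field), allowed):
            problems.append(
                f"payload.{field} must be one of " + ", ".join(sorted(allowed))
            )
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.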

Example: Prometheus Alertmanager Webhook

Configure an Alertmanager receiver that calls the Instance HA notification endpoint:

alertmanager.yml — webhook receiver

receivers:
  - name: "instance-ha-webhook"
    webhook_configs:
      - url: "http://<instance-ha-api>:15868/v1/notifications"
        http_config:
          bearer_token: "<service-token>"

The Instance HA API uses Xloud Identity token authentication. Generate a service token for the Alertmanager integration from a dedicated service account that holds the admin role. Do not use personal user tokens in production.
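Alertmanager's webhook receiver posts its own JSON envelope (a top-level `alerts` list whose entries carry `status` and `labels`) rather than the payload format documented above. If the endpoint in your deployment accepts only the documented format, a small relay between the two may be needed; whether that applies is something to verify. The sketch below shows one possible translation. The function names, the use of the `instance` label as the hostname, the firing/resolved mapping, and the fixed `host_status` are all assumptions.

```python
def alert_to_notification(alert: dict) -> dict:
    """Map one Alertmanager alert to a COMPUTE_HOST notification body."""
    # Assumption: the 'instance' label holds the compute hostname, optionally
    # followed by an exporter port such as ':9100' (IPv6 literals not handled).
    hostname = alert["labels"]["instance"].rsplit(":", 1)[0]
    firing = alert.get("status") == "firing"
    return {
        "hostname": hostname,
        "type": "COMPUTE_HOST",
        "payload": {
            # Assumption: firing means the host stopped, resolved means it is back.
            "event": "STOPPED" if firing else "STARTED",
            "cluster_status": "OFFLINE" if firing else "ONLINE",
            # Assumption: mirrors the example payload on this page.
            "host_status": "NORMAL",
        },
    }


def translate_webhook(body: dict) -> list:
    """Translate every alert that carries an 'instance' label; skip the rest."""
    return [
        alert_to_notification(alert)
        for alert in body.get("alerts", [])
        if "instance" in alert.get("labels", {})
    ]
```

Each returned dictionary can then be posted to the notification endpoint individually, since one Alertmanager request may group several alerts.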

TaskFlowDriver

The TaskFlow driver enables advanced workflow orchestration for recovery actions. It is used internally when the default recovery workflow requires multi-step sequencing with retry and rollback support. This driver operates transparently alongside the NovaNotificationDriver and does not require separate configuration in standard deployments. To customize the TaskFlow task pipeline, implement the BaseTask interface and register the plugin in the configuration.
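To make "multi-step sequencing with retry and rollback" concrete, here is a minimal, library-free sketch of the idea. It is not the product's BaseTask interface and does not use the TaskFlow library: `SketchTask` and `run_pipeline` are hypothetical names, and the retry policy (a fixed retry count per task, reverting only the tasks that already completed) is an assumption.

```python
class SketchTask:
    """Hypothetical stand-in for a workflow task; not the product's BaseTask."""

    def execute(self, context: dict) -> None:
        raise NotImplementedError

    def revert(self, context: dict) -> None:
        """Undo execute(); the default does nothing."""


def run_pipeline(tasks, context: dict, retries: int = 1) -> bool:
    """Run tasks in order, retrying each one up to `retries` extra times.

    If a task still fails after its last attempt, every task that already
    completed is reverted in reverse order and False is returned.
    """
    done = []
    for task in tasks:
        for attempt in range(retries + 1):
            try:
                task.execute(context)
                done.append(task)
                break  # task succeeded; move on to the next one
            except Exception:
                if attempt == retries:
                    for finished in reversed(done):
                        finished.revert(context)
                    return False
    return True
```

Reverting in reverse order means later steps are undone before the earlier steps they depend on.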

XDeploy

Open Advanced Configuration

In XDeploy, navigate to Advanced Configuration. In the Service Tree, select masakari.

Edit workflow targets

Select or create instance-ha.conf in the Code Editor. Add the custom workflow targets:

TaskFlow workflow in XDeploy Advanced Configuration

[recovery_workflow_on_stop]
targets = disableComputeNodeTask, PrepareHAEnabled, EvacuateHost

Click Save Current File.

Apply changes

Navigate to Operations and run a reconfigure action. The recovery engine restarts with the updated workflow pipeline.

Engine logs confirm the custom TaskFlow targets are loaded.

CLI

Edit the configuration file directly and restart the engine container:

/etc/xavs/instance-ha/instance-ha.conf

[recovery_workflow_on_stop]
targets = disableComputeNodeTask, PrepareHAEnabled, EvacuateHost

Restart recovery engine

docker restart masakari_engine
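When editing the file by hand, it can help to confirm that the `targets` line parses as intended. The sketch below reads the section with Python's `configparser`. The helper name `load_targets` is hypothetical, and it assumes the engine splits the value on commas, which this guide does not state.

```python
import configparser


def load_targets(conf_text: str, section: str = "recovery_workflow_on_stop") -> list:
    """Return the ordered task names from the section's 'targets' option."""
    # interpolation=None keeps any '%' in the file literal; strict=False
    # tolerates duplicate keys that a hand-edited file may contain.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    raw = parser.get(section, "targets", fallback="")
    # Assumption: task names are comma-separated.
    return [name.strip() for name in raw.split(",") if name.strip()]
```

For example, `load_targets(open("/etc/xavs/instance-ha/instance-ha.conf").read())` returns the task names in the order they are listed, or an empty list if the section or option is missing.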

Validation

Dashboard

Navigate to Instance-HA > Notifications (admin view). Simulate a notification by creating one manually (test environments only):

Click Create Notification (admin view)
Set type to COMPUTE_HOST, hostname to a registered host, event to STOPPED
Confirm the notification appears and transitions to running

Notification is received, logged, and triggers the recovery workflow.

CLI

Create a test notification (test environments only)

openstack notification create \
  --hostname compute-01.xloud.local \
  --type COMPUTE_HOST \
  --payload '{"event": "STOPPED", "cluster_status": "OFFLINE", "host_status": "NORMAL"}'

Monitor notification status

openstack notification list --status new

Creating test notifications in production triggers real recovery workflows. Only use this in isolated test environments.

Next Steps

Recovery Methods

Configure how instances are evacuated after a notification triggers recovery.

Instance Monitors

Configure guest-level monitoring independent of the notification driver.

Engine Configuration

Tune recovery engine timing, retries, and workflow task ordering.

Security

Secure the notification API endpoint and service account credentials.
