Optimization Troubleshooting — User Guide

Audit completes but produces an empty action plan

Cause: The cluster is already optimally placed for the selected goal. The strategy found no hosts below the utilization threshold, so no migrations are recommended.

Resolution: Check current host utilization to confirm whether consolidation is genuinely needed:

Check per-host utilization

openstack hypervisor list --long
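The raw listing can be hard to compare at a glance. The sketch below converts per-host counters into utilization percentages. It assumes you have already reduced the listing to comma-separated rows of the form `host,vcpus_used,vcpus,mem_used_mb,mem_mb`. That row layout, the function name `host_utilization`, and the sample hosts are illustrative assumptions, not something the CLI produces by default.

```shell
# Hypothetical helper: reads "host,vcpus_used,vcpus,mem_used_mb,mem_mb" rows
# on stdin and prints "host cpu=<pct>% mem=<pct>%" for each row.
host_utilization() {
  awk -F',' '
    # Force numeric comparison so header or malformed rows are skipped
    # and a zero total never causes a division by zero.
    ($3 + 0) > 0 && ($5 + 0) > 0 {
      printf "%s cpu=%d%% mem=%d%%\n", $1, ($2 * 100) / $3, ($4 * 100) / $5
    }
  '
}

# Sample rows (made up for illustration).
printf '%s\n' \
  'compute-01,4,32,8192,65536' \
  'compute-02,24,32,49152,65536' | host_utilization
```

Percentages are truncated to whole numbers, which is enough to spot hosts that sit far below or above the rest.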

If all hosts show healthy, even utilization, this is expected behaviour and no action is needed.

If hosts appear imbalanced but no plan was generated, the strategy threshold may be too conservative:

Create audit with lower threshold

watcher audit create \
  --goal server_consolidation \
  --parameter threshold=0.1 \
  --name lower-threshold-audit

The default consolidation threshold is 0.2 (20%). Lowering it to 0.1 (10%) means more hosts qualify as underutilized and are included in the migration plan.

Audit stuck in ONGOING state

Cause: The Decision Engine is waiting for metric data from a slow or unavailable data source (Prometheus or Telemetry).

Resolution:

Check audit status and duration

watcher audit show <audit-uuid> \
  -f value -c state -c created_at

If the audit has been ONGOING for more than 5 minutes, contact your administrator to check Decision Engine and data source connectivity. Your administrator can configure this through XDeploy.

For non-telemetry goals (e.g., server_consolidation, zone_migration), audits should complete within 30–90 seconds. Longer durations indicate a data collection issue.
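The durations above can be turned into a simple decision rule. The following is a minimal sketch, assuming you have already worked out how many seconds have passed since the audit's `created_at` value. The function name `classify_audit_duration` and its output labels are hypothetical, and the 90-second bound applies only to the non-telemetry goals mentioned above.

```shell
# Hypothetical helper: classify an audit from its state and elapsed seconds,
# using the durations quoted in this guide (90 s typical upper bound,
# 300 s point at which to contact an administrator).
classify_audit_duration() {
  state="$1"
  elapsed="$2"
  if [ "$state" != "ONGOING" ]; then
    echo "not-ongoing"      # nothing to diagnose for this issue
  elif [ "$elapsed" -le 90 ]; then
    echo "normal"           # within the expected completion window
  elif [ "$elapsed" -le 300 ]; then
    echo "slow"             # likely a data collection issue; keep watching
  else
    echo "escalate"         # over 5 minutes: contact your administrator
  fi
}

classify_audit_duration ONGOING 420
```

With the sample arguments shown, the audit has been running for 7 minutes, so the function prints `escalate`.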

Action fails with migration error

Cause: A live migration failed, commonly due to insufficient memory on the target host, a CPU model incompatibility between source and destination hosts, or a storage connectivity issue.

Resolution:

Show failed action details

watcher action show <action-uuid> -f json

Review the fault field for the specific migration error. Common errors:

Error	Cause	Fix
`No valid host found`	Target host has insufficient capacity	Add compute capacity or adjust plan
`CPU compatibility`	CPU model mismatch between hosts	Configure `cpu_mode=custom` on all hosts
`Disk not found`	Instance uses local disk (not shared storage)	Verify instance uses shared storage backend

After resolving the root cause, create a new audit to generate a fresh plan.
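If you triage failed actions often, the error table can be encoded as a small lookup. This is a sketch only: the function name `suggest_fix` is hypothetical, and it matches on the short error strings from the table, whereas the full fault text in your deployment may be worded differently.

```shell
# Hypothetical helper: map a fault message to the fix listed in the table.
# Matching is by substring, so the surrounding fault text does not matter.
suggest_fix() {
  case "$1" in
    *"No valid host found"*) echo "Add compute capacity or adjust plan" ;;
    *"CPU compatibility"*)   echo "Configure cpu_mode=custom on all hosts" ;;
    *"Disk not found"*)      echo "Verify instance uses shared storage backend" ;;
    *)                       echo "Unrecognized error: review the full fault text" ;;
  esac
}

# Sample fault text (made up for illustration).
suggest_fix "Live migration aborted: No valid host found for instance"
```

Anything that does not match one of the three known strings falls through to the last branch, so an unfamiliar fault is never silently mapped to the wrong fix.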

Action plan shows CANCELLED state

Cause: A previous action in the plan failed, causing the Applier to halt and cancel all remaining actions automatically.

Resolution: Review the failed action to identify the root cause:

List actions and find the failed one

watcher action list \
  --action-plan <action-plan-uuid> \
  -f table -c uuid -c action_type -c state

Fix the root cause (capacity, CPU compatibility, storage), then run a new audit to generate an updated plan reflecting the current cluster state.
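In a long plan, the failed action is easier to find by filtering than by reading the table. The sketch below assumes the listing has been emitted as space-separated `uuid action_type state` rows (for example with `-f value` in place of `-f table`, the same formatter the audit status command in this guide uses). The function name `failed_actions` and the sample rows are illustrative.

```shell
# Hypothetical helper: reads "uuid action_type state" rows on stdin
# and prints the uuid and type of every action in the FAILED state.
failed_actions() {
  awk '$3 == "FAILED" { print $1, $2 }'
}

# Sample rows (made up for illustration).
printf '%s\n' \
  'a1 migrate SUCCEEDED' \
  'b2 migrate FAILED' \
  'c3 migrate CANCELLED' | failed_actions
```

The uuid it prints can then be passed to `watcher action show` to read the fault details for that action.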

Optimizations revert after execution

Cause: Another process (the compute scheduler placing new instances, auto-scaling, or manual migrations) is placing instances back on hosts that were just emptied by the optimization.

Resolution: Coordinate with team members performing manual migrations during optimization windows. Consider applying compute host aggregates or availability zone constraints to prevent the scheduler from re-populating hosts that were intentionally consolidated.

Run an Audit

Create a new audit after resolving the issue.

Audit History

Review past audits to identify recurring patterns.

Admin Troubleshooting

Platform-level diagnostics for Decision Engine and data source failures.

Compute Admin Guide

Verify shared storage and live migration capability for optimization actions.
