Overview
This page covers common Resource Optimizer issues encountered by operators: audits that produce empty plans, audits stuck in ONGOING, migrations that fail during execution, and plans that revert after completion. For platform-level issues such as Decision Engine failures or data source connectivity, see the Admin Troubleshooting guide.
Common Issues
Audit completes but produces an empty action plan
Cause: The cluster is already optimally placed for the selected goal. The strategy found no hosts below the utilization threshold, so no migrations are recommended.
Resolution:
Check current per-host utilization to confirm whether consolidation is genuinely needed.
If all hosts show healthy, even utilization, this is expected behaviour. No action is needed.
If hosts appear imbalanced but no plan was generated, the strategy threshold may be too conservative. Create a new audit with a lower threshold.
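As a rough aid for judging "healthy, even utilization" versus "imbalanced", the sketch below compares the busiest and least busy hosts. The host names, the sample percentages, and the `max_spread` cutoff of 25 points are all illustrative assumptions, not values used by Resource Optimizer.

```python
def utilization_spread(cpu_percent_by_host):
    """Difference between the busiest and the least busy host, in percentage points."""
    values = cpu_percent_by_host.values()
    return max(values) - min(values)


def looks_imbalanced(cpu_percent_by_host, max_spread=25.0):
    """True when the spread exceeds max_spread (an assumed, illustrative cutoff)."""
    return utilization_spread(cpu_percent_by_host) > max_spread


# Hypothetical per-host CPU utilization, e.g. copied from a monitoring dashboard.
even_cluster = {"compute-01": 55.0, "compute-02": 60.0, "compute-03": 58.0}
uneven_cluster = {"compute-01": 62.0, "compute-02": 58.0, "compute-03": 12.0}

print(looks_imbalanced(even_cluster))    # empty plan is expected here
print(looks_imbalanced(uneven_cluster))  # worth re-running with an adjusted threshold
```

If the second case applies but the plan is still empty, that points at the strategy threshold rather than at the cluster.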
Audit stuck in ONGOING state
Cause: The Decision Engine is waiting for metric data from a slow or unavailable data source (Prometheus or Telemetry).
Resolution: Check the audit status and how long it has been running. If the audit has been ONGOING for more than 5 minutes, contact your administrator to check Decision Engine and data source connectivity. Your administrator can configure this through XDeploy.
For non-telemetry goals (e.g., server_consolidation, zone_migration), audits should complete within 30–90 seconds. Longer durations indicate a data collection issue.
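The timing guidance above can be expressed as a small check. This is a minimal sketch: the function name, the return labels, and the use of the audit's creation time as its start time are assumptions; only the 5-minute and 90-second limits and the two goal names come from this page.

```python
from datetime import datetime, timedelta, timezone

ESCALATE_AFTER = timedelta(minutes=5)               # contact an administrator
EXPECTED_MAX_NON_TELEMETRY = timedelta(seconds=90)  # upper end of 30-90 seconds
NON_TELEMETRY_GOALS = {"server_consolidation", "zone_migration"}


def assess_audit(state, goal, created_at, now):
    """Classify an audit by how long it has been in the ONGOING state."""
    if state != "ONGOING":
        return "not ongoing"
    elapsed = now - created_at
    if elapsed > ESCALATE_AFTER:
        return "escalate"
    if goal in NON_TELEMETRY_GOALS and elapsed > EXPECTED_MAX_NON_TELEMETRY:
        return "slow"
    return "within expected range"


# Hypothetical audit created three minutes ago.
now = datetime(2024, 1, 1, 12, 3, 0, tzinfo=timezone.utc)
created = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(assess_audit("ONGOING", "server_consolidation", created, now))
```

A "slow" result suggests a data collection issue even before the 5-minute escalation point is reached.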
Action fails with migration error
Cause: A live migration failed, commonly due to insufficient memory on the target host, a CPU model incompatibility between source and destination hosts, or a storage connectivity issue.
Resolution: Show the failed action's details and review the fault field for the specific migration error. Common errors:

| Error | Cause | Fix |
|---|---|---|
| No valid host found | Target host has insufficient capacity | Add compute capacity or adjust plan |
| CPU compatibility | CPU model mismatch between hosts | Configure cpu_mode=custom on all hosts |
| Disk not found | Instance uses local disk (not shared storage) | Verify instance uses shared storage backend |

After resolving the root cause, create a new audit to generate a fresh plan.
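When triaging many failed actions, the error table can be applied mechanically. The sketch below assumes the fault text contains the error phrase as a case-insensitive substring; real fault messages may be worded differently, and the sample fault string is hypothetical.

```python
# (error phrase, cause, fix) taken from the table on this page.
KNOWN_FAULTS = [
    ("No valid host found", "Target host has insufficient capacity",
     "Add compute capacity or adjust plan"),
    ("CPU compatibility", "CPU model mismatch between hosts",
     "Configure cpu_mode=custom on all hosts"),
    ("Disk not found", "Instance uses local disk (not shared storage)",
     "Verify instance uses shared storage backend"),
]


def diagnose(fault):
    """Return the matching table row as a dict, or None if the fault is unknown."""
    text = fault.lower()
    for phrase, cause, fix in KNOWN_FAULTS:
        if phrase.lower() in text:
            return {"error": phrase, "cause": cause, "fix": fix}
    return None


# Hypothetical fault text copied from a failed action.
print(diagnose("Live migration failed: No valid host found for instance"))
```

A result of `None` means the fault is not one of the common errors listed here and needs manual review.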
Action plan shows CANCELLED state
Cause: A previous action in the plan failed, causing the Applier to halt and cancel all remaining actions automatically.
Resolution: List the plan's actions, find the failed one, and review it to identify the root cause. Fix the root cause (capacity, CPU compatibility, storage), then run a new audit to generate an updated plan reflecting the current cluster state.
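Once the plan's actions are listed, locating the failure is a simple scan. In this sketch the `uuid` and `state` keys and the `FAILED` and `SUCCEEDED` state names are assumptions about how the listing is represented; only `CANCELLED` is named on this page.

```python
def split_plan(actions):
    """Return (first failed action or None, uuids of cancelled actions).

    `actions` is a list of dicts in execution order.
    """
    failed = next((a for a in actions if a["state"] == "FAILED"), None)
    cancelled = [a["uuid"] for a in actions if a["state"] == "CANCELLED"]
    return failed, cancelled


# Hypothetical action listing for a cancelled plan.
actions = [
    {"uuid": "a-1", "state": "SUCCEEDED"},
    {"uuid": "a-2", "state": "FAILED"},
    {"uuid": "a-3", "state": "CANCELLED"},
    {"uuid": "a-4", "state": "CANCELLED"},
]
failed, cancelled = split_plan(actions)
print(failed["uuid"], cancelled)
```

The failed action is the one to investigate; the cancelled ones never ran and will be recomputed by the next audit.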
Optimizations revert after execution
Cause: Another process (the compute scheduler placing new instances, auto-scaling, or manual migrations) is placing instances back on hosts that were just emptied by the optimization.
Resolution: Coordinate with team members performing manual migrations during optimization windows. Consider applying compute host aggregates or availability zone constraints to prevent the scheduler from re-populating hosts that were intentionally consolidated.
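To confirm that a revert is happening, compare the hosts the plan emptied against current instance placement. This is an illustrative sketch; the host names, instance names, and the shape of the placement mapping are assumptions.

```python
def repopulated_hosts(emptied_hosts, instances_by_host):
    """Return emptied hosts that currently have at least one instance, sorted by name."""
    return sorted(h for h in emptied_hosts if instances_by_host.get(h))


# Hypothetical state some time after the action plan completed.
emptied = {"compute-03", "compute-04"}
placement = {
    "compute-01": ["vm-a", "vm-b"],
    "compute-03": ["vm-new"],  # placed here after consolidation
    "compute-04": [],
}
print(repopulated_hosts(emptied, placement))
```

Any host in the result is a candidate for aggregate or availability zone constraints.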
Diagnostic Commands
List all audits with states
Show full audit detail
List action plans with states
Show individual action failures
Next Steps
Run an Audit
Create a new audit after resolving the issue.
Audit History
Review past audits to identify recurring patterns.
Admin Troubleshooting
Platform-level diagnostics for Decision Engine and data source failures.
Compute Admin Guide
Verify shared storage and live migration capability for optimization actions.