Overview

This guide covers the most common operational issues encountered in Xloud Compute environments. Each section provides the diagnostic commands, root cause analysis, and resolution steps needed to restore normal operation.
Administrator Access Required — This operation requires the admin role. Contact your Xloud administrator if you do not have sufficient permissions.
Prerequisites
  • Admin credentials sourced from admin-openrc.sh
  • openstack CLI installed and configured
  • SSH access to compute nodes for log inspection when needed
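The credential prerequisite can be sanity-checked before running any of the commands in this guide. A minimal sketch, assuming `admin-openrc.sh` sits in the current directory (the real authentication check, `openstack token issue`, is shown as a comment because it needs a live cloud):

```shell
# Load admin credentials if the file is present, then confirm the
# expected OS_* variables are set before running any commands.
[ -f admin-openrc.sh ] && . ./admin-openrc.sh
if [ -n "${OS_USERNAME:-}" ] && [ -n "${OS_AUTH_URL:-}" ]; then
  echo "credentials loaded for $OS_USERNAME"
  # openstack token issue   # verifies the credentials actually authenticate
else
  echo "OS_* variables missing; source admin-openrc.sh first" >&2
fi
```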

Common Issues

Instance Stuck in BUILD

Cause: The scheduler placed the instance on a host, but the Compute Agent failed to complete provisioning. Common causes include image download failure, network misconfiguration, and storage attachment failure.

Diagnosis:
Check instance event log
openstack server event list <instance-id>
Identify the target host
openstack server show <instance-id> \
  -f value -c OS-EXT-SRV-ATTR:host
Check Compute Agent logs on the target host (via XDeploy terminal)
journalctl -u nova-compute --since "1 hour ago" | grep <instance-id>
Common causes and resolutions:
| Symptom in Event Log | Resolution |
| --- | --- |
| Image download failed | Verify Xloud Image Service reachability from the compute node |
| Quota exceeded on host | Check host capacity with openstack hypervisor show <host> |
| Network interface allocation failed | Verify network agent status on the host |
| Volume attachment failed | Check Xloud Block Storage service health |
If the instance is permanently stuck, force-delete it with openstack server delete --force <instance-id> and re-launch on a healthy host. Verify the target host is up and enabled before retrying.
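The "verify the target host is up and enabled" step can be scripted. A minimal sketch, assuming a hypothetical host name; the Status/State pair normally comes from `openstack compute service list --service nova-compute --host "$HOST" -f value -c Status -c State`, and a sample value stands in here so the logic runs without a live cloud:

```shell
# Gate a relaunch on the target host being enabled and up.
HOST="compute-01"            # hypothetical host name for illustration
host_state="enabled up"      # in practice: host_state=$(openstack compute service list ...)
case "$host_state" in
  "enabled up") echo "$HOST healthy; safe to retry launch" ;;
  *)            echo "$HOST degraded ($host_state); choose another host" ;;
esac
```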

Instance in ERROR State

Cause: A fatal error occurred during instance creation, a running operation, or hypervisor interaction. The fault details are stored in the instance record.

Diagnosis:
Show error fault details
openstack server show <instance-id> | grep -A5 fault
View full instance event log
openstack server event list <instance-id>
Resolution:

If the error is recoverable (for example, a temporary network partition that has since resolved), rebuild the instance from its original image:
Rebuild instance from original image
openstack server rebuild <instance-id> --image <original-image-id>
If the error is caused by a host-level hardware failure, migrate the instance to a healthy host before attempting a rebuild. See Live Migration for instructions.
Rebuilding an instance replaces the root disk. Any data written to the root disk after initial provisioning will be lost. Ensure the instance owner has backed up root disk data before issuing a rebuild.
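One way to honor the warning above is to snapshot the root disk immediately before the rebuild. A minimal sketch, with a hypothetical instance name; the openstack commands are commented out because they need a live cloud:

```shell
# Snapshot the root disk before rebuilding so its data stays recoverable.
INSTANCE="web-01"                                    # hypothetical instance name
SNAP_NAME="${INSTANCE}-pre-rebuild-$(date +%Y%m%d%H%M%S)"
echo "snapshot name: $SNAP_NAME"
# openstack server image create --name "$SNAP_NAME" "$INSTANCE"
# openstack server rebuild "$INSTANCE" --image <original-image-id>
```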

Live Migration Fails

Cause: CPU compatibility mismatch, insufficient destination capacity, or a network timeout during the migration data transfer.

Diagnosis:
Check migration status and error message
openstack server migration list --server <instance-id>
Show detailed migration information
openstack server migration show <instance-id> <migration-id>
Common errors and resolutions:
| Error Message | Root Cause | Resolution |
| --- | --- | --- |
| guest CPU doesn't match specification: missing features | CPU microarchitecture difference between hosts | Configure a common CPU baseline model on all hosts via XDeploy under Compute → Advanced Settings → CPU Compatibility |
| No valid host found | Destination host lacks capacity or is disabled | Check destination host capacity with openstack hypervisor show <host>; verify the host is enabled and up |
| Connection timeout | Network disruption on migration network | Verify network connectivity between compute nodes; check firewall rules on the management interface |
| Block migration disk copy failed | Insufficient free disk on destination | Check available disk with openstack hypervisor show <host> |
See Live Migration for a full walkthrough of the migration procedure.
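The error-to-resolution mapping in the table above can be sketched as a small triage helper. The message strings and resolutions mirror the table; the function name is hypothetical:

```shell
# Map a migration error message (from `openstack server migration show`)
# to the matching resolution from the table above.
classify_migration_error() {
  case "$1" in
    *"CPU doesn't match"*)  echo "set a common CPU baseline in XDeploy" ;;
    *"No valid host"*)      echo "check destination capacity and state" ;;
    *"timeout"*)            echo "check migration-network connectivity" ;;
    *"disk copy failed"*)   echo "check destination free disk" ;;
    *)                      echo "unrecognized error; inspect host logs" ;;
  esac
}
classify_migration_error "No valid host found"
```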

No Valid Host Found

Cause: All compute hosts were eliminated by the scheduler filter chain; no eligible host satisfies the combined instance requirements.

Diagnosis:
Check cluster-wide capacity
openstack hypervisor stats show
List all hosts with capacity details
openstack hypervisor list --long
Common causes:
| Cause | Resolution |
| --- | --- |
| All hosts at vCPU or RAM capacity | Scale out the cluster or increase over-commit ratios via XDeploy |
| Availability zone constraint too restrictive | Verify target AZ has active hosts with openstack availability zone list --long |
| Host aggregate metadata mismatch | Verify flavor extra specs match aggregate metadata keys on target hosts |
| Server group anti-affinity exhausted | Group has used all distinct hosts; scale out or remove the anti-affinity constraint |
| Flavor requires PCI device not available | Verify PCI passthrough devices are configured on target hosts |
If openstack hypervisor stats show reports available capacity but scheduling still fails, the Placement service inventory may be out of sync with actual host state. Trigger a resource reconciliation through XDeploy under Compute → Diagnostics → Reconcile Inventory.
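A quick headroom calculation from the fields `openstack hypervisor stats show` reports (vcpus, vcpus_used, memory_mb, memory_mb_used) can be sketched as follows; sample numbers stand in for live output:

```shell
# Compute free cluster capacity from hypervisor stats fields.
vcpus=128;        vcpus_used=120       # sample values; parse from stats output
memory_mb=524288; memory_mb_used=498688
free_vcpus=$((vcpus - vcpus_used))
free_ram_gb=$(( (memory_mb - memory_mb_used) / 1024 ))
echo "free: ${free_vcpus} vCPUs, ${free_ram_gb} GiB RAM"
```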

Cannot Access Instance Console

Cause: The VNC or SPICE console proxy service is not running, the firewall is blocking the console port, or the console token has expired.

Diagnosis:
Verify console proxy service status
openstack compute service list | grep consoleauth
Check all compute services for degraded state
openstack compute service list
Resolution:
  1. Verify ports 6080 (VNC), 6082 (SPICE), and 6083 (serial) are open in your firewall rules from the administrator’s workstation to the controller node.
  2. If the console proxy service is down, restart it through XDeploy under Compute → Services → Console Proxy.
  3. If the connection is refused immediately after generating a URL, the token may have expired. Generate a new console URL:
Generate a fresh console URL
openstack console url show --novnc <instance-id>
Console tokens expire after a short period. If the browser reports an authentication error when accessing the console URL, always generate a new URL rather than refreshing the page.
See Console Access for firewall port requirements and proxy configuration details.
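The token-expiry advice can be made mechanical by regenerating the URL whenever the cached one is older than the token lifetime. A minimal sketch; the 600-second TTL is an assumption for illustration (your deployment's token lifetime may differ), and sample timestamps stand in for `$(date +%s)`:

```shell
# Regenerate the console URL when the cached one may have expired.
TOKEN_TTL=600                 # assumed token lifetime in seconds
issued_at=1700000000          # epoch seconds when the URL was generated
now=1700000700                # in practice: now=$(date +%s)
if [ $((now - issued_at)) -ge "$TOKEN_TTL" ]; then
  echo "token likely expired; run: openstack console url show --novnc <instance-id>"
else
  echo "cached URL still fresh"
fi
```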

Quota Exceeded

Cause: The project has reached its allocation limit for instances, vCPUs, or RAM. New instance creation and resize operations are blocked until the quota is increased or existing resources are released.

Diagnosis:
Show current quota usage
openstack quota show --compute <project-id>
List instances consuming quota in the project
openstack server list \
  --project <project-id> \
  --all-projects \
  --long
Resolution:

Option 1: Increase the project quota:
Increase quota for instances, vCPUs, and RAM
openstack quota set \
  --instances 50 \
  --cores 100 \
  --ram 204800 \
  <project-id>
Option 2: Free capacity by removing unused instances.

Coordinate with the project owner to identify and delete instances that are no longer in use. Do not delete instances without explicit confirmation from the project owner.
Before increasing quotas, verify the cluster has sufficient physical capacity with openstack hypervisor stats show. See Quota Management for quota adjustment procedures.
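Whether a pending request fits the remaining headroom can be checked with simple arithmetic on the limits and usage that `openstack quota show` reports. A minimal sketch with sample numbers (the RAM figures are in MiB, matching the quota set example above):

```shell
# Compare a requested launch against remaining quota headroom.
cores_limit=100;  cores_in_use=92       # sample values from quota output
ram_limit=204800; ram_in_use=180224     # MiB
req_cores=8;      req_ram=16384         # resources the new instance needs
if [ $((cores_in_use + req_cores)) -le "$cores_limit" ] &&
   [ $((ram_in_use + req_ram)) -le "$ram_limit" ]; then
  echo "request fits within quota"
else
  echo "quota increase or cleanup needed"
fi
```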

Next Steps

Compute Hosts

Monitor and manage hypervisor host health to prevent scheduling failures.

Live Migration

Move instances off degraded hosts before performing maintenance.

Admin Guide

Return to the Compute Administration Guide index.