Overview

This guide covers the most common operational issues encountered in Xloud Compute environments. Each section provides the diagnostic commands, root cause analysis, and resolution steps needed to restore normal operation.
Administrator Access Required — This operation requires the admin role. Contact your Xloud administrator if you do not have sufficient permissions.
Prerequisites
  • Admin credentials sourced from admin-openrc.sh
  • openstack CLI installed and configured
  • SSH access to compute nodes for log inspection when needed
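The credential prerequisite can be sanity-checked before running any of the commands in this guide. A minimal sketch, assuming `admin-openrc.sh` sits in the current directory (the real authentication check, `openstack token issue`, is shown as a comment because it needs a live cloud):

```shell
# Load admin credentials if the file is present, then confirm the
# expected OS_* variables are set before running any commands.
[ -f admin-openrc.sh ] && . ./admin-openrc.sh
if [ -n "${OS_USERNAME:-}" ] && [ -n "${OS_AUTH_URL:-}" ]; then
  echo "credentials loaded for $OS_USERNAME"
  # openstack token issue   # verifies the credentials actually authenticate
else
  echo "OS_* variables missing; source admin-openrc.sh first" >&2
fi
```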

Common Issues

Instance Stuck in BUILD

Cause: The scheduler placed the instance on a host, but the Compute Agent failed to complete provisioning. Common causes include image download failure, network misconfiguration, and storage attachment failure.

Diagnosis:
Check instance event log
openstack server event list <instance-id>
Identify the target host
openstack server show <instance-id> \
  -f value -c OS-EXT-SRV-ATTR:host
Check Compute Agent logs on the target host (via XDeploy terminal)
journalctl -u nova-compute --since "1 hour ago" | grep <instance-id>
Common causes and resolutions:
| Symptom in Event Log | Resolution |
| --- | --- |
| Image download failed | Verify Xloud Image Service reachability from the compute node |
| Quota exceeded on host | Check host capacity with openstack hypervisor show <host> |
| Network interface allocation failed | Verify network agent status on the host |
| Volume attachment failed | Check Xloud Block Storage service health |
If the instance is permanently stuck, force-delete it with openstack server delete --force <instance-id> and re-launch on a healthy host. Verify the target host is up and enabled before retrying.
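The "verify the target host is up and enabled" step can be scripted. A minimal sketch, assuming a hypothetical host name; the Status/State pair normally comes from `openstack compute service list --service nova-compute --host "$HOST" -f value -c Status -c State`, and a sample value stands in here so the logic runs without a live cloud:

```shell
# Gate a relaunch on the target host being enabled and up.
HOST="compute-01"            # hypothetical host name for illustration
host_state="enabled up"      # in practice: host_state=$(openstack compute service list ...)
case "$host_state" in
  "enabled up") echo "$HOST healthy; safe to retry launch" ;;
  *)            echo "$HOST degraded ($host_state); choose another host" ;;
esac
```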

Instance in ERROR State

Cause: A fatal error occurred during instance creation, a running operation, or hypervisor interaction. The fault details are stored in the instance record.

Diagnosis:
Show error fault details
openstack server show <instance-id> | grep -A5 fault
View full instance event log
openstack server event list <instance-id>
Resolution:

If the error is recoverable (for example, a temporary network partition that has since resolved), rebuild the instance from its original image:
Rebuild instance from original image
openstack server rebuild <instance-id> --image <original-image-id>
If the error is caused by a host-level hardware failure, migrate the instance to a healthy host before attempting a rebuild. See Live Migration for instructions.
Rebuilding an instance replaces the root disk. Any data written to the root disk after initial provisioning will be lost. Ensure the instance owner has backed up root disk data before issuing a rebuild.
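One way to honor the warning above is to snapshot the root disk immediately before the rebuild. A minimal sketch, with a hypothetical instance name; the openstack commands are commented out because they need a live cloud:

```shell
# Snapshot the root disk before rebuilding so its data stays recoverable.
INSTANCE="web-01"                                    # hypothetical instance name
SNAP_NAME="${INSTANCE}-pre-rebuild-$(date +%Y%m%d%H%M%S)"
echo "snapshot name: $SNAP_NAME"
# openstack server image create --name "$SNAP_NAME" "$INSTANCE"
# openstack server rebuild "$INSTANCE" --image <original-image-id>
```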

Live Migration Fails

Cause: CPU compatibility mismatch, insufficient destination capacity, or a network timeout during the migration data transfer.

Diagnosis:
Check migration status and error message
openstack server migration list --server <instance-id>
Show detailed migration information
openstack server migration show <instance-id> <migration-id>
Common errors and resolutions:
| Error Message | Root Cause | Resolution |
| --- | --- | --- |
| guest CPU doesn't match specification: missing features | CPU microarchitecture difference between hosts | Configure a common CPU baseline model on all hosts via XDeploy under Compute → Advanced Settings → CPU Compatibility |
| No valid host found | Destination host lacks capacity or is disabled | Check destination host capacity with openstack hypervisor show <host>; verify the host is enabled and up |
| Connection timeout | Network disruption on migration network | Verify network connectivity between compute nodes; check firewall rules on the management interface |
| Block migration disk copy failed | Insufficient free disk on destination | Check available disk with openstack hypervisor show <host> |
See Live Migration for a full walkthrough of the migration procedure.
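The error-to-resolution mapping in the table above can be sketched as a small triage helper. The message strings and resolutions mirror the table; the function name is hypothetical:

```shell
# Map a migration error message (from `openstack server migration show`)
# to the matching resolution from the table above.
classify_migration_error() {
  case "$1" in
    *"CPU doesn't match"*)  echo "set a common CPU baseline in XDeploy" ;;
    *"No valid host"*)      echo "check destination capacity and state" ;;
    *"timeout"*)            echo "check migration-network connectivity" ;;
    *"disk copy failed"*)   echo "check destination free disk" ;;
    *)                      echo "unrecognized error; inspect host logs" ;;
  esac
}
classify_migration_error "No valid host found"
```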

No Valid Host Found

Cause: All compute hosts were eliminated by the scheduler filter chain; no eligible host satisfies the combined instance requirements.

Diagnosis:
Check cluster-wide capacity
openstack hypervisor stats show
List all hosts with capacity details
openstack hypervisor list --long
Common causes:
| Cause | Resolution |
| --- | --- |
| All hosts at vCPU or RAM capacity | Scale out the cluster or increase over-commit ratios via XDeploy |
| Availability zone constraint too restrictive | Verify target AZ has active hosts with openstack availability zone list --long |
| Host aggregate metadata mismatch | Verify flavor extra specs match aggregate metadata keys on target hosts |
| Server group anti-affinity exhausted | Group has used all distinct hosts; scale out or remove the anti-affinity constraint |
| Flavor requires PCI device not available | Verify PCI passthrough devices are configured on target hosts |
If openstack hypervisor stats show reports available capacity but scheduling still fails, the Placement service inventory may be out of sync with actual host state. Trigger a resource reconciliation through XDeploy under Compute → Diagnostics → Reconcile Inventory.
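A quick headroom calculation from the fields `openstack hypervisor stats show` reports (vcpus, vcpus_used, memory_mb, memory_mb_used) can be sketched as follows; sample numbers stand in for live output:

```shell
# Compute free cluster capacity from hypervisor stats fields.
vcpus=128;        vcpus_used=120       # sample values; parse from stats output
memory_mb=524288; memory_mb_used=498688
free_vcpus=$((vcpus - vcpus_used))
free_ram_gb=$(( (memory_mb - memory_mb_used) / 1024 ))
echo "free: ${free_vcpus} vCPUs, ${free_ram_gb} GiB RAM"
```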

Cannot Access Instance Console

Cause: The VNC or SPICE console proxy service is not running, the firewall is blocking the console port, or the console token has expired.

Diagnosis:
Verify console proxy service status
openstack compute service list | grep consoleauth
Check all compute services for degraded state
openstack compute service list
Resolution:
  1. Verify ports 6080 (VNC), 6082 (SPICE), and 6083 (serial) are open in your firewall rules from the administrator’s workstation to the controller node.
  2. If the console proxy service is down, restart it through XDeploy under Compute → Services → Console Proxy.
  3. If the connection is refused immediately after generating a URL, the token may have expired. Generate a new console URL:
Generate a fresh console URL
openstack console url show --novnc <instance-id>
Console tokens expire after a short period. If the browser reports an authentication error when accessing the console URL, always generate a new URL rather than refreshing the page.
See Console Access for firewall port requirements and proxy configuration details.
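The token-expiry advice can be made mechanical by regenerating the URL whenever the cached one is older than the token lifetime. A minimal sketch; the 600-second TTL is an assumption for illustration (your deployment's token lifetime may differ), and sample timestamps stand in for `$(date +%s)`:

```shell
# Regenerate the console URL when the cached one may have expired.
TOKEN_TTL=600                 # assumed token lifetime in seconds
issued_at=1700000000          # epoch seconds when the URL was generated
now=1700000700                # in practice: now=$(date +%s)
if [ $((now - issued_at)) -ge "$TOKEN_TTL" ]; then
  echo "token likely expired; run: openstack console url show --novnc <instance-id>"
else
  echo "cached URL still fresh"
fi
```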

Quota Exceeded

Cause: The project has reached its allocation limit for instances, vCPUs, or RAM. New instance creation and resize operations are blocked until the quota is increased or existing resources are released.

Diagnosis:
Show current quota usage
openstack quota show --compute <project-id>
List instances consuming quota in the project
openstack server list \
  --project <project-id> \
  --all-projects \
  --long
Resolution:

Option 1: Increase the project quota:
Increase quota for instances, vCPUs, and RAM
openstack quota set \
  --instances 50 \
  --cores 100 \
  --ram 204800 \
  <project-id>
Option 2: Free capacity by removing unused instances.

Coordinate with the project owner to identify and delete instances that are no longer in use. Do not delete instances without explicit confirmation from the project owner.
Before increasing quotas, verify the cluster has sufficient physical capacity with openstack hypervisor stats show. See Quota Management for quota adjustment procedures.
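Whether a pending request fits the remaining headroom can be checked with simple arithmetic on the limits and usage that `openstack quota show` reports. A minimal sketch with sample numbers (the RAM figures are in MiB, matching the quota set example above):

```shell
# Compare a requested launch against remaining quota headroom.
cores_limit=100;  cores_in_use=92       # sample values from quota output
ram_limit=204800; ram_in_use=180224     # MiB
req_cores=8;      req_ram=16384         # resources the new instance needs
if [ $((cores_in_use + req_cores)) -le "$cores_limit" ] &&
   [ $((ram_in_use + req_ram)) -le "$ram_limit" ]; then
  echo "request fits within quota"
else
  echo "quota increase or cleanup needed"
fi
```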

Next Steps

Compute Hosts

Monitor and manage hypervisor host health to prevent scheduling failures.

Live Migration

Move instances off degraded hosts before performing maintenance.

Admin Guide

Return to the Compute Administration Guide index.