Overview
This guide covers service-level troubleshooting for Xloud Block Storage administrators. It addresses issues with the volume service, scheduler, backend connectivity, and data operations that end users cannot see or resolve.
Before troubleshooting
- Authenticate with admin credentials: source admin-openrc.sh
- Access service logs via XDeploy for detailed error messages
- For critical production issues, contact Xloud Support with the affected volume IDs and log excerpts
Service Health Checks
Run these checks first to establish the overall service state:
- Check all volume service states
- List all backend pools and capacity
- Check API endpoint health
All services should show state = up and status = enabled. Any service showing down requires immediate investigation.
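The three health checks can be sketched as small shell helpers around the standard OpenStack client. This is a sketch, not the guide's original commands. It assumes admin credentials are already sourced, that the Block Storage service is registered with service type volumev3, and that the client exposes the column titles used below. The function names are hypothetical.

```shell
# Sketch only: assumes "source admin-openrc.sh" has already been run.

# Print every volume service that is not both enabled and up.
# Output columns: binary host status state
unhealthy_volume_services() {
  openstack volume service list -f value -c Binary -c Host -c Status -c State |
    awk '$3 != "enabled" || $4 != "up" { print }'
}

# Print each backend pool with its total and allocated capacity.
# "Capacity" and "Allocated" are assumed column titles of the --long view.
backend_pool_capacity() {
  openstack volume backend pool list --long -f value -c Name -c Capacity -c Allocated
}

# Print scheme://host:port/ of the public Block Storage endpoint.
# Assumes the service type is "volumev3".
volume_api_base_url() {
  openstack endpoint list --service volumev3 --interface public -f value -c URL |
    head -n 1 |
    sed -E 's#^(https?://[^/]+).*#\1/#'
}

# Print the HTTP status code returned by the API root (000 means no response).
volume_api_status() {
  curl -s -o /dev/null -m 5 -w '%{http_code}\n' "$(volume_api_base_url)"
}
```

An empty result from unhealthy_volume_services means every service reports enabled and up. For volume_api_status, any HTTP status other than 000 indicates the API process answered the request.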
Volume Service Issues
Volume service is 'down'
Symptom: openstack volume service list shows one or more services with state down.
Cause: The volume service container has stopped, lost message queue connectivity, or the storage backend driver failed to initialize.
Resolution:
- Access the affected node through XDeploy and check the volume service container status (on the storage node).
- Review container logs for initialization errors: access logs via XDeploy → Logs → cinder-volume on the affected node.
- Check for common causes:
  - Message queue (RabbitMQ) connectivity lost: check network and RabbitMQ status
  - Database connection failure: verify MariaDB/Galera cluster health
  - Backend driver error (keyring file missing, pool name wrong): review driver-specific log entries
- After resolving the root cause, restart the volume service via XDeploy.
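The container status check on the storage node could look like the sketch below. It rests on two assumptions that this guide does not confirm: that the node runs its services under Docker, and that the volume service container name contains cinder_volume. Override CONTAINER_FILTER if your deployment names it differently. The function names are hypothetical.

```shell
# Assumptions: Docker is the container runtime and the container name
# contains "cinder_volume". Both are placeholders, not confirmed by this guide.
CONTAINER_FILTER="${CONTAINER_FILTER:-cinder_volume}"

# Print "<name> <status>" for matching containers, including stopped ones.
volume_container_status() {
  docker ps -a --filter "name=${CONTAINER_FILTER}" --format '{{.Names}} {{.Status}}'
}

# Succeed only if at least one matching container reports an "Up ..." status.
volume_container_is_up() {
  volume_container_status | awk '$2 == "Up" { found = 1 } END { exit !found }'
}

# Usage:
#   volume_container_status
#   volume_container_is_up || echo "volume service container is not running"
```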
Scheduler fails to place volumes — 'No valid backend'
Symptom: Volume creation fails with “No valid host was found” or scheduler filter messages appear in the logs.
Cause: All backends were eliminated by the scheduler filters. Common causes:
- All backends are at capacity
- The requested volume type’s volume_backend_name does not match any active backend
- The requested availability zone has no active backend
Resolution:
- Verify backend capacity.
- Verify volume type extra specs: confirm that volume_backend_name in the type’s extra specs matches the name column in the backend pool list.
Backend Connectivity Issues
Backend reporting zero or unknown capacity
Symptom: openstack volume backend pool list shows free_capacity_gb = 0 or the backend pool does not appear in the list.
Cause: The volume service cannot connect to the storage cluster to query capacity.
Resolution:
- Verify storage cluster health from the storage administration interface.
- Verify the authentication keyring file is present on the volume service node (check the keyring file on the storage node).
- Verify the pool name matches the configured backend by listing the storage pools.
- Restart the volume service via XDeploy after resolving connectivity issues.
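The keyring and pool checks can be sketched as follows. The sketch assumes a Ceph RBD backend, which the keyring and RBD references in this guide suggest but do not state. The client name cinder, the pool name volumes, and the keyring path are placeholders; take the real values from your backend configuration. The function names are hypothetical.

```shell
# Assumptions (placeholders, not from this guide): Ceph RBD backend,
# client "cinder", pool "volumes", keyring under /etc/ceph.
CEPH_CLIENT="${CEPH_CLIENT:-cinder}"
CEPH_POOL="${CEPH_POOL:-volumes}"
KEYRING="${KEYRING:-/etc/ceph/ceph.client.${CEPH_CLIENT}.keyring}"

# Succeed if the keyring file exists, is non-empty, and is readable.
keyring_ok() {
  [ -s "$1" ] && [ -r "$1" ]
}

# Succeed if the named pool appears in the cluster's pool list.
# Requires that the client key is allowed to list pools.
pool_exists() {
  ceph --id "$CEPH_CLIENT" --keyring "$KEYRING" osd pool ls | grep -qxF -- "$1"
}

# Usage:
#   keyring_ok "$KEYRING" || echo "keyring missing or unreadable: $KEYRING"
#   pool_exists "$CEPH_POOL" || echo "pool not found: $CEPH_POOL"
```

The pool match is exact, so a configured pool name that differs from the cluster's name by a typo or a suffix is reported as missing.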
Volume creation succeeds but attachment fails
Symptom: Volumes reach available status but fail when attaching to instances. The error typically references connection initialization or the iSCSI/RBD target.
Cause: The compute node cannot connect to the storage backend to initialize the volume attachment. Common causes:
- Missing storage client package on the compute node
- Authentication keyring not present on the compute node
- Network routing between compute and storage nodes is blocked
Resolution:
- Verify the storage client package is installed on the compute node (e.g., librbd-dev, ceph-common, or iSCSI initiator packages)
- Verify the keyring file is present on the compute node
- Test connectivity from the compute node to the storage cluster monitors
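A connectivity test from the compute node to the monitors might look like this sketch. It assumes a Ceph backend whose monitors listen on the standard ports 3300 (msgr2) and 6789 (msgr1), and an nc build that supports -z and -w. The function name and the monitor host names in the usage line are hypothetical.

```shell
# Assumptions: Ceph monitors on ports 3300 and 6789; nc supports
# -z (connect without sending data) and -w (timeout in seconds).

# Probe every given monitor host on both ports.
# Returns non-zero if any probe fails.
probe_monitors() {
  failed=0
  for host in "$@"; do
    for port in 3300 6789; do
      if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
        echo "ok   $host:$port"
      else
        echo "FAIL $host:$port"
        failed=1
      fi
    done
  done
  return "$failed"
}

# Usage (placeholder host names):
#   probe_monitors mon1.example.internal mon2.example.internal
```

A cluster may enable only one of the two protocols, so a single failed port is not conclusive on its own. A monitor that fails on both ports points to a routing or firewall problem.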
Data Operation Issues
Volume migration stuck in 'migrating' status
Symptom: Volume status remains migrating for more than 30 minutes with no completion.
Resolution:
- Check the migration status.
- Check volume service logs on both source and destination nodes via XDeploy for migration-related errors.
- If permanently stuck and data integrity has been verified, reset the volume state (admin only).
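The status check and the guarded reset can be sketched as below, using the openstack volume show and openstack volume set --state commands listed in the reference table. The function names are hypothetical, and the default target state available is an assumption; pass in-use for a volume that was attached when the migration started.

```shell
# Print the migration_status field of a volume (visible to admins).
migration_status() {
  openstack volume show "$1" -f value -c migration_status
}

# Reset the volume state only if a migration is still reported in progress.
# $1 = volume ID, $2 = target state (assumed default: available).
reset_if_still_migrating() {
  vol="$1"
  target="${2:-available}"
  current=$(migration_status "$vol")
  if [ "$current" != "migrating" ]; then
    echo "migration_status is '$current'; leaving $vol untouched"
    return 1
  fi
  openstack volume set --state "$target" "$vol"
}

# Usage:
#   migration_status <volume-id>
#   reset_if_still_migrating <volume-id> available
```

The guard avoids resetting a volume whose migration finished between the diagnosis and the reset. It does not verify data integrity, which still has to be confirmed by hand first.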
Snapshot stuck in 'deleting' state
Symptom: A snapshot remains in deleting state for an extended period.
Cause: The backend could not complete the deletion, typically because dependent volumes still reference the snapshot, or the storage cluster is degraded.
Resolution:
- Check for volumes created from the snapshot: list dependent volumes and look for volumes with snapshot_id matching the snapshot.
- Delete dependent volumes first, then retry the snapshot deletion.
- If the storage cluster is degraded, restore cluster health before retrying.
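Finding dependent volumes can be sketched as a loop over all volumes. This sketch compares the snapshot_id field reported by openstack volume show, which is where Block Storage records the snapshot a volume was created from. Adjust the field name if your deployment exposes the relationship differently. The function name is hypothetical.

```shell
# Print the IDs of volumes whose snapshot_id matches the given snapshot.
# Makes one "volume show" call per volume: slow on large deployments, but it
# does not depend on list columns that vary between client versions.
volumes_from_snapshot() {
  snap="$1"
  openstack volume list --all-projects -f value -c ID |
    while read -r vol; do
      src=$(openstack volume show "$vol" -f value -c snapshot_id </dev/null)
      if [ "$src" = "$snap" ]; then
        echo "$vol"
      fi
    done
  return 0
}

# Usage:
#   volumes_from_snapshot <snapshot-id>
```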
Encryption errors on volume attach
Symptom: Encrypted volume attachment fails with a key management or dm-crypt error.
Diagnosis:
- Verify the Key Management service is running and accessible from the compute node (check key manager connectivity).
- Confirm the compute service on the affected node can reach the Key Management service API (network path, port 9311).
- Check compute service logs on the affected node via XDeploy for messages containing barbican, secret, or crypt.
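The connectivity and log checks can be sketched as two helpers, run on the compute node. The sketch assumes curl is installed there. The URL is a placeholder; only port 9311 comes from this guide. The function names are hypothetical.

```shell
# Placeholder URL: replace the host with your Key Management endpoint.
KEY_MANAGER_URL="${KEY_MANAGER_URL:-http://key-manager.example.internal:9311/}"

# Succeed if the endpoint answers with any HTTP status within 5 seconds.
# curl reports 000 when no HTTP response was received.
key_manager_reachable() {
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$1")
  [ -n "$code" ] && [ "$code" != "000" ]
}

# Keep only log lines mentioning key management or encryption keywords.
# Reads log text from stdin so it works with any exported log file.
filter_crypt_errors() {
  grep -iE 'barbican|secret|crypt'
}

# Usage:
#   key_manager_reachable "$KEY_MANAGER_URL" || echo "key manager unreachable"
#   filter_crypt_errors < exported-compute.log
```

A reachable endpoint only proves the network path and the API process. Authentication and secret access failures still show up only in the compute service logs.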
Recovering Orphaned Volumes
Volumes can become orphaned (in-use with no valid attachment) when compute instances are force-deleted without detaching their volumes first. To recover them:
- Find orphaned volumes (in-use with no valid instance)
- Check the attached instance
- Reset the orphaned volume to available
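The recovery steps can be sketched as a helper that lists in-use volumes whose attached instance no longer exists. It assumes that openstack volume show -f json prints the attachments list with one key per line, which is the client's usual indented JSON output. The function name is hypothetical. The reset itself is left as a manual step so that each volume can be reviewed first.

```shell
# Print IDs of in-use volumes with no attachment to an existing instance.
find_orphaned_volumes() {
  openstack volume list --all-projects --status in-use -f value -c ID |
    while read -r vol; do
      # Pull server_id values out of the indented JSON attachments list.
      servers=$(openstack volume show "$vol" -f json -c attachments </dev/null |
        sed -n 's/.*"server_id": *"\([^"]*\)".*/\1/p')
      orphaned=1
      for srv in $servers; do
        # "server show" fails when the instance no longer exists.
        if openstack server show "$srv" -f value -c id >/dev/null 2>&1 </dev/null; then
          orphaned=0
        fi
      done
      if [ "$orphaned" -eq 1 ]; then
        echo "$vol"
      fi
    done
  return 0
}

# Usage:
#   find_orphaned_volumes
#   openstack volume set --state available <volume-id>   # after manual review
```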
Diagnostic Commands Reference
| Command | Purpose |
|---|---|
| openstack volume service list | Check all service states |
| openstack volume backend pool list --long | Verify backend capacity |
| openstack volume list --all-projects --status error | Find volumes in error state |
| openstack volume snapshot list --all-projects | Audit all snapshots |
| openstack quota list --detail | Check quota usage across projects |
| openstack volume show <id> -c migration_status | Check migration state |
| openstack volume set --state <state> <id> | Force-reset volume state (admin) |
Next Steps
- User Troubleshooting: Common issues from the user perspective
- Storage Backends: Review backend configuration and connectivity requirements
- Architecture: Understand service components to narrow down failure domains
- Contact Support: Open a support ticket for unresolved production issues