Overview

This page covers the most common Kubernetes cluster issues encountered by project users, with targeted diagnostics and resolution steps. For platform-level issues such as driver configuration failures or quota enforcement problems, refer to the Kubernetes Admin Troubleshooting guide.

Common Issues

Cause: Node provisioning is delayed, commonly due to insufficient compute quota, an unavailable node image, or a network configuration issue during node bootstrap.
Resolution:
Show cluster failure reason
openstack coe cluster show prod-cluster-01 \
  -f value -c status_reason
Check compute quota:
Check project quota
openstack quota show --detail
Verify the node image exists:
Verify image
openstack image show fedora-coreos-39
If resources are insufficient, request a quota increase from your administrator. If the image is missing, ask your administrator to upload the required image.
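While waiting on a quota or image fix, it can help to poll the cluster instead of re-running the show command by hand. The following is a minimal sketch, not a platform-provided tool: `classify_status` and `wait_for_cluster` are hypothetical helper names, and the mapping assumes the cluster status follows the usual `<ACTION>_IN_PROGRESS` / `_COMPLETE` / `_FAILED` naming.

```shell
# Map a cluster status string to a coarse state.
# Assumption: statuses end in _IN_PROGRESS, _COMPLETE, or _FAILED.
classify_status() {
  case "$1" in
    *_IN_PROGRESS) echo "pending" ;;
    *_COMPLETE)    echo "complete" ;;
    *_FAILED)      echo "failed" ;;
    *)             echo "unknown" ;;
  esac
}

# Poll a cluster until it leaves the pending state (hypothetical helper).
# Usage: wait_for_cluster <cluster-name> [interval-seconds]
wait_for_cluster() {
  local name="$1" interval="${2:-30}" status state
  while true; do
    status="$(openstack coe cluster show "$name" -f value -c status)" || return 2
    state="$(classify_status "$status")"
    case "$state" in
      pending)
        sleep "$interval"
        ;;
      complete)
        echo "$name: $status"
        return 0
        ;;
      *)
        # Failed or unrecognized: print the reason so it can be acted on.
        echo "$name: $status" >&2
        openstack coe cluster show "$name" -f value -c status_reason >&2
        return 1
        ;;
    esac
  done
}
```

For example, `wait_for_cluster prod-cluster-01 60` checks once a minute and prints the failure reason if the cluster ends in a failed state.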

Cause: The cluster API server endpoint is unreachable. The master load balancer floating IP may not have been allocated, or a security group rule is blocking port 6443.
Resolution:
Show API server endpoint
openstack coe cluster show prod-cluster-01 \
  -f value -c api_address
Verify the API address is a reachable floating IP:
Test API server connectivity
curl -sk https://<api-address>:6443/healthz
Expected: ok
If the endpoint is unreachable, check security groups for the master nodes:
List cluster security groups
openstack security group list | grep prod-cluster-01
Ensure inbound TCP port 6443 is permitted from your management network.
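The endpoint check can be wrapped so it works whether the address is pasted as a bare IP or as a full URL. This is a sketch under assumptions: `healthz_url` and `check_api` are hypothetical helpers, the address is assumed to be an IPv4 address or hostname, and the possibility that `api_address` already includes a scheme and port is an assumption about the output rather than something this page states.

```shell
# Build the health-check URL from an API address.
# Assumption: the input is either a bare IPv4/hostname or a URL such as
# https://203.0.113.10:6443, so any scheme, path, and port are stripped.
healthz_url() {
  local host="$1"
  host="${host#*://}"   # drop a leading scheme, if present
  host="${host%%/*}"    # drop any path
  host="${host%%:*}"    # drop any port (not IPv6-safe)
  echo "https://${host}:6443/healthz"
}

# Probe the endpoint with short timeouts (hypothetical helper).
# -k skips certificate verification, matching the manual check.
check_api() {
  local url
  url="$(healthz_url "$1")"
  if curl -sk --connect-timeout 5 --max-time 10 "$url" | grep -qx 'ok'; then
    echo "API server healthy: $url"
  else
    echo "API server unreachable or unhealthy: $url" >&2
    return 1
  fi
}
```

A possible invocation is `check_api "$(openstack coe cluster show prod-cluster-01 -f value -c api_address)"`. The timeouts make a blocked port fail in seconds rather than hanging.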

Cause: The container network interface plugin has not initialized, the node is still bootstrapping, or the node has run out of resources.
Resolution:
Check node conditions
kubectl describe node <node-name>
Look for NetworkPlugin, DiskPressure, MemoryPressure, or PIDPressure conditions in the output.
For CNI failures, check node logs via the instance console:
Access node console
openstack console url show <node-instance-id>
Review the bootstrap logs for CNI installation errors. If the CNI plugin did not install correctly, the node may need to be replaced (scale down then back up).
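When several nodes are affected, describing each one by hand gets tedious. The sketch below filters the node list down to nodes that are not Ready and describes only those. `not_ready_nodes` and `describe_not_ready` are hypothetical helpers; the filter assumes the default column layout of `kubectl get nodes --no-headers`, where the second column is the status.

```shell
# Print the names of nodes whose STATUS column does not start with "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# A cordoned node ("Ready,SchedulingDisabled") still counts as Ready here.
not_ready_nodes() {
  awk 'NF >= 2 && $2 !~ /^Ready/ { print $1 }'
}

# Describe every node that is not Ready (hypothetical helper).
describe_not_ready() {
  kubectl get nodes --no-headers | not_ready_nodes | while read -r node; do
    echo "=== $node ==="
    kubectl describe node "$node"
  done
}
```

Running `describe_not_ready` with a working kubeconfig prints one describe block per unhealthy node, which keeps the condition list to review short.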

Cause: A node replacement failed during the rolling upgrade, commonly due to quota exhaustion or an image pull failure on the replacement node.
Resolution:
Check upgrade status and reason
openstack coe cluster show prod-cluster-01 \
  -f value -c status -c status_reason
Identify the failure cause from status_reason. Common causes:
  • Quota exhausted: Free up compute quota, then retry the upgrade command
  • Image unavailable: Verify the target template’s image exists and is accessible
After resolving the root cause, retry the upgrade:
Retry upgrade
openstack coe cluster upgrade prod-cluster-01 k8s-1.30-prod
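The two common causes can be told apart with a rough keyword match on the reason text. This is only a heuristic sketch: `classify_reason` is a hypothetical helper, and the keywords are assumptions, since the exact wording of status_reason varies between platform versions.

```shell
# Guess the failure category from a status_reason string.
# Assumption: quota failures mention "quota" and image failures mention
# "image"; anything else needs manual reading.
classify_reason() {
  local reason
  reason="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$reason" in
    *quota*) echo "quota" ;;
    *image*) echo "image" ;;
    *)       echo "other" ;;
  esac
}
```

For example, `classify_reason "$(openstack coe cluster show prod-cluster-01 -f value -c status_reason)"` prints `quota`, `image`, or `other`. Treat `other` as a prompt to read the full message rather than as a diagnosis.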

Cause: The volume driver (cinder) is not configured in the cluster template, or the storage class is missing from the cluster.
Resolution:
Check storage classes
kubectl get storageclass
If no storage classes exist, verify that the cluster template was created with --volume-driver cinder:
Show template volume driver
openstack coe cluster template show k8s-1.29-prod \
  -f value -c volume_driver
If the volume driver is missing, the cluster must be recreated from a corrected template. Ask your administrator to update the platform template; this can be configured through XDeploy.
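The storage class check can be reduced to a single word for use in scripts. The sketch below is an illustration, not part of the platform tooling: `storageclass_state` is a hypothetical helper, and it assumes kubectl marks the default class with the literal text `(default)` after its name. The "no-default" case goes slightly beyond the cause listed above: a claim that does not name a storage class generally stays Pending when no default class exists.

```shell
# Summarize `kubectl get storageclass --no-headers` output read from stdin.
# Prints "none" (no classes), "no-default" (classes exist but none is
# marked default), or "ok".
storageclass_state() {
  awk '
    NF > 0        { total++ }
    /\(default\)/ { has_default = 1 }
    END {
      if (total == 0)        print "none"
      else if (!has_default) print "no-default"
      else                   print "ok"
    }'
}
```

A possible use is `kubectl get storageclass --no-headers 2>/dev/null | storageclass_state`. A result of `none` points at the template's volume driver, as described above.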

Cause: The cluster CA has been rotated since you downloaded the kubeconfig, or the kubeconfig references an expired certificate.
Resolution: Refresh your kubeconfig:
Re-download kubeconfig
openstack coe cluster config prod-cluster-01 \
  --dir ~/.kube \
  --force
Set kubeconfig
export KUBECONFIG=~/.kube/config
Verify connectivity
kubectl get nodes
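Because --force overwrites existing files in the target directory, it may be worth keeping a copy of the current kubeconfig first, especially if it also holds contexts for other clusters. A minimal sketch, assuming the file lives at ~/.kube/config; `backup_kubeconfig` is a hypothetical helper.

```shell
# Copy an existing kubeconfig aside before it is overwritten.
# Prints the backup path, or nothing if there was no file to back up.
backup_kubeconfig() {
  local src="${1:-$HOME/.kube/config}" dest
  [ -f "$src" ] || return 0
  dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
  # -p preserves the mode, so a private kubeconfig stays private.
  cp -p "$src" "$dest" && echo "$dest"
}
```

Chained as `backup_kubeconfig && openstack coe cluster config prod-cluster-01 --dir ~/.kube --force`, the download runs only if the copy succeeded or there was nothing to copy.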

Diagnostic Commands Reference

Show cluster full detail
openstack coe cluster show prod-cluster-01 -f json
List all clusters and their statuses
openstack coe cluster list
Check kubectl cluster connectivity
kubectl cluster-info
Show all system pods
kubectl get pods -n kube-system
Describe a specific node
kubectl describe node <node-name>
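When escalating to an administrator, the output of the commands above is easier to share as a set of files. The following sketch collects them into a directory; `capture` and `collect_diagnostics` are hypothetical helpers, and the file names are arbitrary choices.

```shell
# Run one command and save its stdout and stderr to <dir>/<label>.txt.
# A failing command is recorded in the file rather than aborting the run.
capture() {
  local dir="$1" label="$2"
  shift 2
  "$@" > "$dir/$label.txt" 2>&1 || echo "exit status: $?" >> "$dir/$label.txt"
}

# Collect the reference diagnostics for one cluster (hypothetical helper).
# Usage: collect_diagnostics <cluster-name> [output-dir]
collect_diagnostics() {
  local cluster="$1" dir="${2:-./diag-$(date +%Y%m%d%H%M%S)}"
  mkdir -p "$dir" || return 1
  capture "$dir" cluster-show openstack coe cluster show "$cluster" -f json
  capture "$dir" cluster-list openstack coe cluster list
  capture "$dir" cluster-info kubectl cluster-info
  capture "$dir" system-pods  kubectl get pods -n kube-system
  echo "$dir"
}
```

For example, `collect_diagnostics prod-cluster-01` prints the directory it wrote to. Review the files before sharing them, since cluster details may include addresses you do not want to distribute widely.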

Next Steps

Deploy Cluster

Re-deploy a cluster after resolving provisioning issues.

Access Cluster

Reconfigure kubectl connectivity after certificate or endpoint changes.

Kubernetes Admin Troubleshooting

Platform-level diagnostics for driver and quota issues.

Cluster Upgrades

Resume or retry failed version upgrades.