Overview
This page covers the most common Kubernetes cluster issues encountered by project users, with targeted diagnostics and resolution steps. For platform-level issues such as driver configuration failures or quota enforcement problems, refer to the Kubernetes Admin Troubleshooting guide.

Common Issues
Cluster stuck in CREATE_IN_PROGRESS
Cause: Node provisioning is delayed, commonly due to insufficient compute quota, an unavailable node image, or a network configuration issue during node bootstrap.

Resolution:
1. Show the cluster failure reason.
2. Check your project's compute quota. If resources are insufficient, request a quota increase from your administrator.
3. Verify the node image exists. If the image is missing, ask your administrator to upload the required image.
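The three checks above can be run from the CLI. The exact commands depend on your platform; the following is a sketch assuming an OpenStack Magnum-style deployment, with `mycluster` and the image name as placeholder values:

```shell
# Show the failure reason recorded on the cluster (placeholder cluster name).
openstack coe cluster show mycluster -c status -c status_reason

# Check the project's compute quota usage against its limits.
openstack limits show --absolute

# Verify the node image referenced by the cluster template exists
# (placeholder image name).
openstack image show fedora-coreos-38
```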
kubectl connection refused or timeout
Cause: The cluster API server endpoint is unreachable. The master load balancer floating IP may not have been allocated, or a security group rule is blocking port 6443.

Resolution:
1. Show the API server endpoint and verify the address is a reachable floating IP.
2. Test API server connectivity. Expected response: ok.
3. If the endpoint is unreachable, list the security groups for the master nodes and ensure inbound TCP port 6443 is permitted from your management network.
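These steps might look like the following, assuming a Magnum-style CLI; the cluster name and IP address are placeholders:

```shell
# Show the API server endpoint recorded on the cluster.
openstack coe cluster show mycluster -c api_address

# Test connectivity; a healthy API server's health endpoint returns "ok".
curl -k https://192.0.2.10:6443/healthz

# List security groups to locate the rules applied to the master nodes.
openstack security group list
```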
Nodes in NotReady state
Cause: The container network interface (CNI) plugin has not initialized, the node is still bootstrapping, or the node has run out of resources.

Resolution:
1. Check node conditions and look for NetworkPlugin, DiskPressure, MemoryPressure, or PIDPressure conditions in the output.
2. Review the bootstrap logs for CNI installation errors. For CNI failures, check node logs via the instance console.
3. If the CNI plugin did not install correctly, the node may need to be replaced (scale the cluster down, then back up).
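A sketch of these checks, with `worker-0` and the instance name as placeholders (the console command assumes an OpenStack-backed node):

```shell
# Check node conditions; the Conditions section shows DiskPressure,
# MemoryPressure, and PIDPressure status.
kubectl describe node worker-0

# Dump the instance console log to inspect bootstrap/CNI errors.
openstack console log show mycluster-node-0 --lines 100
```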
Cluster upgrade fails mid-way
Cause: A node replacement failed during the rolling upgrade, commonly due to quota exhaustion or an image pull failure on the replacement node.

Resolution:
1. Check the upgrade status and identify the failure cause from status_reason. Common causes:
   - Quota exhausted: free up compute quota, then retry the upgrade command.
   - Image unavailable: verify the target template's image exists and is accessible.
2. Retry the upgrade once the underlying cause is resolved.
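Assuming a Magnum-style CLI, with placeholder cluster and template names, the check and retry might be:

```shell
# Check the upgrade status and the recorded failure reason.
openstack coe cluster show mycluster -c status -c status_reason

# Retry the upgrade against the target template once the cause is fixed.
openstack coe cluster upgrade mycluster k8s-v1.28-template
```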
Persistent volume claims not binding
Cause: The volume driver (cinder) is not configured in the cluster template, or the storage class is missing from the cluster.

Resolution:
1. Check the cluster's storage classes.
2. If no storage classes exist, verify the cluster template has --volume-driver cinder.
3. If the volume driver is missing, the cluster must be recreated from a corrected template. Contact your administrator to update the platform template. Your administrator can configure this through XDeploy.
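The storage-class and template checks can be sketched as follows, assuming a Magnum-style CLI and a placeholder template name:

```shell
# Check which storage classes the cluster exposes.
kubectl get storageclass

# Show the volume driver configured on the cluster template.
openstack coe cluster template show k8s-template -c volume_driver
```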
kubectl shows certificate verification error
Cause: The cluster CA has been rotated since you downloaded the kubeconfig, or the kubeconfig references an expired certificate.

Resolution: Refresh your kubeconfig:
1. Re-download the kubeconfig.
2. Point your KUBECONFIG environment variable at the new file.
3. Verify connectivity.
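A refresh sequence might look like this, assuming a Magnum-style CLI; the cluster name and directory are placeholders:

```shell
# Re-download credentials; this writes a fresh kubeconfig into the
# target directory, overwriting any stale one.
openstack coe cluster config mycluster --dir ~/clusters/mycluster --force

# Point kubectl at the refreshed kubeconfig.
export KUBECONFIG=~/clusters/mycluster/config

# Verify connectivity with the new credentials.
kubectl get nodes
```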
Diagnostic Commands Reference
Show cluster full detail
List all clusters and their statuses
Check kubectl cluster connectivity
Show all system pods
Describe a specific node
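For quick reference, the labelled diagnostics above correspond to commands along these lines (assuming a Magnum-style CLI; cluster and node names are placeholders):

```shell
# Show cluster full detail.
openstack coe cluster show mycluster

# List all clusters and their statuses.
openstack coe cluster list

# Check kubectl cluster connectivity.
kubectl cluster-info

# Show all system pods.
kubectl get pods -n kube-system

# Describe a specific node.
kubectl describe node worker-0
```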
Next Steps
Deploy Cluster
Re-deploy a cluster after resolving provisioning issues.
Access Cluster
Reconfigure kubectl connectivity after certificate or endpoint changes.
Kubernetes Admin Troubleshooting
Platform-level diagnostics for driver and quota issues.
Cluster Upgrades
Resume or retry failed version upgrades.