# Kubernetes Troubleshooting: User Guide

> Resolve common Xloud K8SaaS issues: clusters stuck in CREATE_IN_PROGRESS, kubectl connection failures, nodes in NotReady state, and upgrade failures.

## Overview

This page covers the most common Kubernetes cluster issues encountered by project users, with targeted diagnostics and resolution steps. For platform-level issues such as driver configuration failures or quota enforcement problems, refer to the [Kubernetes Admin Troubleshooting](/services/kubernetes/admin-guide/troubleshooting) guide.

***

## Common Issues

### Cluster stuck in `CREATE_IN_PROGRESS`

**Cause**: Node provisioning is delayed, commonly due to insufficient compute quota, an unavailable node image, or a network configuration issue during node bootstrap.

**Resolution**:

```bash title="Show cluster failure reason" theme={null}
openstack coe cluster show prod-cluster-01 \
  -f value -c status_reason
```

Check compute quota:

```bash title="Check project quota" theme={null}
openstack quota show --detail
```

Verify the node image exists:

```bash title="Verify image" theme={null}
openstack image show fedora-coreos-39
```

If resources are insufficient, request a quota increase from your administrator. If the image is missing, ask your administrator to upload the required image.

### kubectl connection failures

**Cause**: The cluster API server endpoint is unreachable. The master load balancer floating IP may not have been allocated, or a security group rule is blocking port 6443.

**Resolution**:

```bash title="Show API server endpoint" theme={null}
openstack coe cluster show prod-cluster-01 \
  -f value -c api_address
```

Verify the API address is a reachable floating IP, substituting the IP from `api_address` for `<floating-ip>`:

```bash title="Test API server connectivity" theme={null}
curl -sk https://<floating-ip>:6443/healthz
```

Expected: `ok`

If the endpoint is unreachable, check security groups for the master nodes:

```bash title="List cluster security groups" theme={null}
openstack security group list | grep prod-cluster-01
```

Ensure inbound TCP port 6443 is permitted from your management network.

### Nodes in `NotReady` state

**Cause**: The container network interface (CNI) plugin has not initialized, the node is still bootstrapping, or the node has run out of resources.

**Resolution**:

```bash title="Check node conditions" theme={null}
kubectl describe node <node-name>
```

Look for `NetworkPlugin`, `DiskPressure`, `MemoryPressure`, or `PIDPressure` conditions in the output.

For CNI failures, check node logs via the instance console:

```bash title="Access node console" theme={null}
openstack console url show <server>
```

Review the bootstrap logs for CNI installation errors. If the CNI plugin did not install correctly, the node may need to be replaced (scale down, then back up).

### Upgrade failures

**Cause**: A node replacement failed during the rolling upgrade, commonly due to quota exhaustion or an image pull failure on the replacement node.

**Resolution**:

```bash title="Check upgrade status and reason" theme={null}
openstack coe cluster show prod-cluster-01 \
  -f value -c status -c status_reason
```

Identify the failure cause from `status_reason`.

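When this check runs inside a script, the `status_reason` text can be sorted into the two causes this section names (quota exhaustion and image problems). The following is a minimal sketch, not part of the platform tooling: `classify_status_reason` is a hypothetical helper, and the keyword patterns are assumptions, since the exact wording of `status_reason` messages is not specified in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a status_reason string to a coarse category.
# The keywords "quota" and "image" are assumed patterns; real messages
# may be worded differently, so treat "other" as "read the full text".
classify_status_reason() {
  local reason
  # Lowercase the input so matching is case-insensitive.
  reason=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$reason" in
    *quota*) echo "quota" ;;
    *image*) echo "image" ;;
    "")      echo "none" ;;
    *)       echo "other" ;;
  esac
}

# Example with a made-up reason string (not real CLI output):
classify_status_reason "Quota exceeded for cores"   # prints: quota

# Possible use against a live cluster (requires the openstack CLI):
# classify_status_reason "$(openstack coe cluster show prod-cluster-01 \
#   -f value -c status_reason)"
```

A result of `other` means the reason matched neither keyword, in which case the full `status_reason` text should be read directly.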
Common causes:

* **Quota exhausted**: Free up compute quota, then retry the upgrade command
* **Image unavailable**: Verify the target template's image exists and is accessible

After resolving the root cause, retry the upgrade:

```bash title="Retry upgrade" theme={null}
openstack coe cluster upgrade prod-cluster-01 k8s-1.30-prod
```

### Missing storage class or volume driver

**Cause**: The volume driver (`cinder`) is not configured in the cluster template, or the storage class is missing from the cluster.

**Resolution**:

```bash title="Check storage classes" theme={null}
kubectl get storageclass
```

If no storage classes exist, verify the cluster template has `--volume-driver cinder`:

```bash title="Show template volume driver" theme={null}
openstack coe cluster template show k8s-1.29-prod \
  -f value -c volume_driver
```

If the volume driver is missing, the cluster must be recreated from a corrected template. Contact your administrator to update the platform template. Your administrator can configure this through [XDeploy](/deployment).

### Stale or expired kubeconfig certificate

**Cause**: The cluster CA has been rotated since you downloaded the kubeconfig, or the kubeconfig references an expired certificate.

**Resolution**: Refresh your kubeconfig:

```bash title="Re-download kubeconfig" theme={null}
openstack coe cluster config prod-cluster-01 \
  --dir ~/.kube \
  --force
```

```bash title="Set kubeconfig" theme={null}
export KUBECONFIG=~/.kube/config
```

```bash title="Verify connectivity" theme={null}
kubectl get nodes
```

***

## Diagnostic Commands Reference

```bash title="Show cluster full detail" theme={null}
openstack coe cluster show prod-cluster-01 -f json
```

```bash title="List all clusters and their statuses" theme={null}
openstack coe cluster list
```

```bash title="Check kubectl cluster connectivity" theme={null}
kubectl cluster-info
```

```bash title="Show all system pods" theme={null}
kubectl get pods -n kube-system
```

```bash title="Describe a specific node" theme={null}
kubectl describe node <node-name>
```

***

## Next Steps

* Re-deploy a cluster after resolving provisioning issues.
* Reconfigure kubectl connectivity after certificate or endpoint changes.
* Platform-level diagnostics for driver and quota issues.
* Resume or retry failed version upgrades.