Overview

This page covers the most common Kubernetes cluster issues encountered by project users, with targeted diagnostics and resolution steps. For platform-level issues such as driver configuration failures or quota enforcement problems, refer to the Kubernetes Admin Troubleshooting guide.

Common Issues

Cause: Node provisioning is delayed, commonly due to insufficient compute quota, an unavailable node image, or a network configuration issue during node bootstrap.
Resolution:
Show cluster failure reason
openstack coe cluster show prod-cluster-01 \
  -f value -c status_reason
Check compute quota:
Check project quota
openstack quota show --detail
Verify the node image exists:
Verify image
openstack image show fedora-coreos-39
If resources are insufficient, request a quota increase from your administrator. If the image is missing, ask your administrator to upload the required image.
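Whether to keep waiting or start investigating can be scripted. The following is a minimal sketch, assuming the cluster reports Magnum-style status values such as CREATE_IN_PROGRESS, CREATE_COMPLETE, and CREATE_FAILED. The helper names `classify_status` and `wait_for_cluster`, the default of 40 attempts, and the 30-second poll interval are illustrative choices, not part of the platform.

```shell
# Map a cluster status string to one of: pending, ready, failed, unknown.
# Assumes Magnum-style names (<OPERATION>_IN_PROGRESS, _COMPLETE, _FAILED).
classify_status() {
  case "$1" in
    *_IN_PROGRESS) echo pending ;;
    *_COMPLETE)    echo ready ;;
    *_FAILED)      echo failed ;;
    *)             echo unknown ;;
  esac
}

# Poll the cluster until it completes, fails, or the attempts run out.
# Usage: wait_for_cluster <cluster-name> [max-attempts]
wait_for_cluster() {
  cluster="$1"
  attempts="${2:-40}"
  while [ "$attempts" -gt 0 ]; do
    status=$(openstack coe cluster show "$cluster" -f value -c status)
    case "$(classify_status "$status")" in
      ready)
        echo "$cluster: $status"
        return 0 ;;
      failed)
        # Print the failure reason so the quota and image checks can follow.
        echo "$cluster: $status" >&2
        openstack coe cluster show "$cluster" -f value -c status_reason >&2
        return 1 ;;
    esac
    attempts=$((attempts - 1))
    sleep 30
  done
  echo "$cluster: not complete after polling" >&2
  return 2
}
```

For example, `wait_for_cluster prod-cluster-01 20` would poll for roughly ten minutes before giving up with exit code 2.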
Cause: The cluster API server endpoint is unreachable. The master load balancer floating IP may not have been allocated, or a security group rule is blocking port 6443.
Resolution:
Show API server endpoint
openstack coe cluster show prod-cluster-01 \
  -f value -c api_address
Verify the API address is a reachable floating IP:
Test API server connectivity
curl -sk https://<api-address>:6443/healthz
Expected: ok
If the endpoint is unreachable, check security groups for the master nodes:
List cluster security groups
openstack security group list | grep prod-cluster-01
Ensure inbound TCP port 6443 is permitted from your management network.
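The two checks above can be chained into one probe. This is a sketch only: depending on the deployment, `api_address` may come back as a bare address or as a full URL, so the hypothetical helper `healthz_url` normalizes either form before the request is made. The function names and the 10-second timeout are assumptions for illustration.

```shell
# Build the health-check URL from the value of api_address.
# Accepts a bare IPv4 address or hostname, or a full URL.
# IPv6 literals are not handled in this sketch.
healthz_url() {
  addr="$1"
  addr="${addr#https://}"   # drop the scheme if present
  addr="${addr#http://}"
  addr="${addr%%/*}"        # drop any path
  addr="${addr%%:*}"        # drop any port
  printf 'https://%s:6443/healthz\n' "$addr"
}

# Probe the API server and print the response body ("ok" when healthy).
# A non-zero exit status means the endpoint could not be reached.
# Usage: check_api <cluster-name>
check_api() {
  addr=$(openstack coe cluster show "$1" -f value -c api_address) || return 1
  curl -sk --max-time 10 "$(healthz_url "$addr")"
}
```

Running `check_api prod-cluster-01` and seeing no output with a non-zero exit status points toward the floating IP or security group causes described above.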
Cause: The container network interface plugin has not initialized, the node is still bootstrapping, or the node has run out of resources.
Resolution:
Check node conditions
kubectl describe node <node-name>
Look for NetworkPlugin, DiskPressure, MemoryPressure, or PIDPressure conditions in the output.
For CNI failures, check node logs via the instance console:
Access node console
openstack console url show <node-instance-id>
Review the bootstrap logs for CNI installation errors. If the CNI plugin did not install correctly, the node may need to be replaced (scale down then back up).
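Scanning `kubectl describe` output by eye gets tedious across many nodes. As a rough sketch, the node conditions can be printed in a compact `Type=Status` form and filtered down to the ones that indicate a problem. The function names `node_conditions` and `flag_conditions` are hypothetical. The filter assumes the usual convention that `Ready` should be `True` and every other condition should be `False`.

```shell
# Print each node condition as Type=Status, one per line.
# Usage: node_conditions <node-name>
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Read Type=Status lines on stdin and keep only the problematic ones:
# Ready must be True; any other condition must be False.
flag_conditions() {
  awk -F= '
    NF == 0       { next }
    $1 == "Ready" { if ($2 != "True") print; next }
    $2 != "False" { print }
  '
}
```

Used together as `node_conditions <node-name> | flag_conditions`, an empty result suggests the node conditions are healthy, while a line such as `DiskPressure=True` narrows down which resource to investigate.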
Cause: A node replacement failed during the rolling upgrade, commonly due to quota exhaustion or an image pull failure on the replacement node.
Resolution:
Check upgrade status and reason
openstack coe cluster show prod-cluster-01 \
  -f value -c status -c status_reason
Identify the failure cause from status_reason. Common causes:
  • Quota exhausted: Free up compute quota, then retry the upgrade command
  • Image unavailable: Verify the target template’s image exists and is accessible
After resolving the root cause, retry the upgrade:
Retry upgrade
openstack coe cluster upgrade prod-cluster-01 k8s-1.30-prod
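Retrying while an upgrade is still running can muddy the diagnosis, so a small guard may help. The sketch below assumes a failed upgrade leaves the cluster in the Magnum-style `UPDATE_FAILED` status; confirm the value your platform actually reports before relying on it. `should_retry_upgrade` and `retry_upgrade` are illustrative names.

```shell
# Succeed only when the given status indicates a failed update.
# Assumption: a failed upgrade is reported as UPDATE_FAILED.
should_retry_upgrade() {
  case "$1" in
    UPDATE_FAILED) return 0 ;;
    *)             return 1 ;;
  esac
}

# Re-issue the upgrade only if the cluster is in a failed update state.
# Usage: retry_upgrade <cluster-name> <target-template>
retry_upgrade() {
  status=$(openstack coe cluster show "$1" -f value -c status)
  if should_retry_upgrade "$status"; then
    openstack coe cluster upgrade "$1" "$2"
  else
    echo "Not retrying: $1 is in state '$status'" >&2
    return 1
  fi
}
```

For example, `retry_upgrade prod-cluster-01 k8s-1.30-prod` would re-issue the upgrade only after the cluster has settled into the failed state.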
Cause: The volume driver (cinder) is not configured in the cluster template, or the storage class is missing from the cluster.
Resolution:
Check storage classes
kubectl get storageclass
If no storage classes exist, verify the cluster template has --volume-driver cinder:
Show template volume driver
openstack coe cluster template show k8s-1.29-prod \
  -f value -c volume_driver
If the volume driver is missing, the cluster must be recreated from a corrected template. Contact your administrator to update the platform template. Your administrator can configure this through XDeploy.
Cause: The cluster CA has been rotated since you downloaded the kubeconfig, or the kubeconfig references an expired certificate.
Resolution: Refresh your kubeconfig:
Re-download kubeconfig
openstack coe cluster config prod-cluster-01 \
  --dir ~/.kube \
  --force
Set kubeconfig
export KUBECONFIG=~/.kube/config
Verify connectivity
kubectl get nodes
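To tell an expired certificate apart from a rotated CA, the client certificate's expiry can be checked before re-downloading. This sketch assumes the config command wrote a `cert.pem` file into the same directory as the kubeconfig and that `openssl` is installed; both are assumptions to verify locally. It detects expiry only, so a rotated CA still requires the re-download shown above. The function names are hypothetical.

```shell
# Return 0 if the certificate file is readable and has not expired.
# Usage: cert_is_valid <path-to-pem>
cert_is_valid() {
  [ -r "$1" ] && openssl x509 -in "$1" -noout -checkend 0 >/dev/null 2>&1
}

# Re-download the kubeconfig only when the client certificate is
# missing or expired. Assumes <kube-dir>/cert.pem is the client certificate.
# Usage: refresh_if_expired <cluster-name> <kube-dir>
refresh_if_expired() {
  if cert_is_valid "$2/cert.pem"; then
    echo "client certificate in $2 has not expired"
  else
    openstack coe cluster config "$1" --dir "$2" --force
  fi
}
```

If `refresh_if_expired prod-cluster-01 ~/.kube` reports an unexpired certificate but `kubectl get nodes` still fails with a certificate error, CA rotation is the more likely cause.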

Diagnostic Commands Reference

Show cluster full detail
openstack coe cluster show prod-cluster-01 -f json
List all clusters and their statuses
openstack coe cluster list
Check kubectl cluster connectivity
kubectl cluster-info
Show all system pods
kubectl get pods -n kube-system
Describe a specific node
kubectl describe node <node-name>

Next Steps

Deploy Cluster

Re-deploy a cluster after resolving provisioning issues.

Access Cluster

Reconfigure kubectl connectivity after certificate or endpoint changes.

Kubernetes Admin Troubleshooting

Platform-level diagnostics for driver and quota issues.

Cluster Upgrades

Resume or retry failed version upgrades.