Kubernetes Troubleshooting — User Guide

Cluster stuck in CREATE_IN_PROGRESS

Cause: Node provisioning is delayed, commonly due to insufficient compute quota, an unavailable node image, or a network configuration issue during node bootstrap.

Resolution:

Show cluster failure reason

openstack coe cluster show prod-cluster-01 \
  -f value -c status_reason

Check compute quota:

Check project quota

openstack quota show --detail

Verify the node image exists:

Verify image

openstack image show fedora-coreos-39

If resources are insufficient, request a quota increase from your administrator. If the image is missing, ask your administrator to upload the required image.
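After the quota or image problem is fixed, it can help to watch the cluster until it leaves CREATE_IN_PROGRESS rather than re-running the show command by hand. The following is a minimal sketch, not a platform-provided tool: the function name wait_for_cluster, the 30-second interval, and the 40-attempt limit are illustrative assumptions.

```shell
# Poll a cluster until it leaves CREATE_IN_PROGRESS, then print its status.
# wait_for_cluster and its default interval/attempt values are hypothetical;
# only the "openstack coe cluster show" call comes from the guide above.
wait_for_cluster() {
  cluster="$1"
  interval="${2:-30}"   # seconds between polls (assumed default)
  attempts="${3:-40}"   # number of polls before giving up (assumed default)
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(openstack coe cluster show "$cluster" -f value -c status)
    if [ "$status" != "CREATE_IN_PROGRESS" ]; then
      echo "$status"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for $cluster" >&2
  return 1
}

# Example call (not executed here):
#   wait_for_cluster prod-cluster-01
```

If the printed status is a failure state, the status_reason field shown earlier is the place to look next.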

kubectl connection refused or timeout

Cause: The cluster API server endpoint is unreachable. The master load balancer floating IP may not have been allocated, or a security group rule is blocking port 6443.

Resolution:

Show API server endpoint

openstack coe cluster show prod-cluster-01 \
  -f value -c api_address

Verify the API address is a reachable floating IP:

Test API server connectivity

curl -sk https://<api-address>:6443/healthz

Expected: ok

If the endpoint is unreachable, check the security groups for the master nodes:

List cluster security groups

openstack security group list | grep prod-cluster-01

Ensure inbound TCP port 6443 is permitted from your management network.
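The connectivity test above can be wrapped in a small helper that gives a clear pass or fail. This is a sketch under assumptions: healthz_url and check_api are hypothetical names, and the five-second timeout is arbitrary. Because some deployments may report api_address as a full URL rather than a bare IP, the helper accepts either form.

```shell
# Build the health-check URL from either a bare address or a full https URL.
# healthz_url and check_api are illustrative helpers, not part of any CLI.
healthz_url() {
  addr="$1"
  case "$addr" in
    https://*) echo "${addr%/}/healthz" ;;          # already a URL: append the path
    *)         echo "https://${addr}:6443/healthz" ;;  # bare IP or hostname
  esac
}

# Probe the API server and report whether it answered "ok".
check_api() {
  # -s silent, -k skip TLS verification, -m 5 give up after five seconds
  body=$(curl -sk -m 5 "$(healthz_url "$1")") || body=""
  if [ "$body" = "ok" ]; then
    echo "reachable"
  else
    echo "unreachable"
    return 1
  fi
}

# Example call (not executed here):
#   check_api "$(openstack coe cluster show prod-cluster-01 -f value -c api_address)"
```

An "unreachable" result does not distinguish a missing floating IP from a blocked port, so the security group check is still needed.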

Nodes in NotReady state

Cause: The container network interface plugin has not initialized, the node is still bootstrapping, or the node has run out of resources.

Resolution:

Check node conditions

kubectl describe node <node-name>

Look for NetworkPlugin, DiskPressure, MemoryPressure, or PIDPressure conditions in the output.

For CNI failures, check node logs via the instance console:

Access node console

openstack console url show <node-instance-id>

Review the bootstrap logs for CNI installation errors. If the CNI plugin did not install correctly, the node may need to be replaced by scaling the cluster down and then back up.
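On larger clusters it is quicker to list only the affected nodes before describing each one. A minimal sketch, assuming the default column layout of kubectl get nodes (name first, status second); the function name not_ready_nodes is hypothetical.

```shell
# Print the names of nodes whose STATUS column does not start with "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are treated as ready here.
# not_ready_nodes is an illustrative helper name.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }'
}

# Example: describe every node that is not ready (not executed here):
#   for n in $(not_ready_nodes); do kubectl describe node "$n"; done
```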

Cluster upgrade fails mid-way

Cause: A node replacement failed during the rolling upgrade, commonly due to quota exhaustion or an image pull failure on the replacement node.

Resolution:

Check upgrade status and reason

openstack coe cluster show prod-cluster-01 \
  -f value -c status -c status_reason

Identify the failure cause from status_reason. Common causes:

Quota exhausted: Free up compute quota, then retry the upgrade command
Image unavailable: Verify the target template’s image exists and is accessible

After resolving the root cause, retry the upgrade:

Retry upgrade

openstack coe cluster upgrade prod-cluster-01 k8s-1.30-prod
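To avoid re-issuing the upgrade while an earlier attempt is still running, the retry can be guarded by a status check. This is a sketch, not a documented workflow: retry_upgrade_if_failed is a hypothetical name, and it assumes that failed operations report a status ending in _FAILED.

```shell
# Retry the upgrade only when the cluster reports a failed state.
# retry_upgrade_if_failed is an illustrative helper; the "*_FAILED" pattern
# is an assumption about how failure states are named.
retry_upgrade_if_failed() {
  cluster="$1"
  template="$2"
  status=$(openstack coe cluster show "$cluster" -f value -c status)
  case "$status" in
    *_FAILED)
      openstack coe cluster upgrade "$cluster" "$template"
      ;;
    *)
      echo "skipping retry: $cluster is in state $status" >&2
      return 1
      ;;
  esac
}

# Example call (not executed here):
#   retry_upgrade_if_failed prod-cluster-01 k8s-1.30-prod
```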

Persistent volume claims not binding

Cause: The volume driver (cinder) is not configured in the cluster template, or the storage class is missing from the cluster.

Resolution:

Check storage classes

kubectl get storageclass

If no storage classes exist, verify the cluster template has --volume-driver cinder:

Show template volume driver

openstack coe cluster template show k8s-1.29-prod \
  -f value -c volume_driver

If the volume driver is missing, the cluster must be recreated from a corrected template. Ask your administrator to update the platform template, which they can do through XDeploy.
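Before involving an administrator, it can be useful to confirm which claims are actually stuck. A minimal sketch, assuming the default column layout of kubectl get pvc with all namespaces (namespace, name, status); pending_pvcs is a hypothetical helper name.

```shell
# List claims that are still Pending, printed as namespace/name.
# pending_pvcs is an illustrative helper name.
pending_pvcs() {
  kubectl get pvc --all-namespaces --no-headers |
    awk '$3 == "Pending" { print $1 "/" $2 }'
}

# Example: show events for one pending claim (not executed here):
#   kubectl describe pvc <claim-name> -n <namespace>
```

The events at the end of the describe output usually state whether the claim is waiting on a storage class or on the provisioner.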

kubectl shows certificate verification error

Cause: The cluster CA has been rotated since you downloaded the kubeconfig, or the kubeconfig references an expired certificate.

Resolution: Refresh your kubeconfig:

Re-download kubeconfig

openstack coe cluster config prod-cluster-01 \
  --dir ~/.kube \
  --force

Set kubeconfig

export KUBECONFIG=~/.kube/config

Verify connectivity

kubectl get nodes
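To tell an expired certificate apart from a rotated CA, the client certificate's validity can be checked with openssl. This is a sketch under assumptions: cert_is_valid is a hypothetical helper, and the certificate is assumed to be available as a PEM file whose location depends on how your kubeconfig was generated.

```shell
# Return success if the PEM certificate at the given path has not expired.
# cert_is_valid is an illustrative helper; the caller supplies the path.
cert_is_valid() {
  cert="$1"
  # -checkend 0 exits non-zero when the certificate is already expired
  openssl x509 -checkend 0 -noout -in "$cert" >/dev/null 2>&1
}

# Example call (not executed here; the path is a placeholder):
#   cert_is_valid <path-to-client-cert.pem> && echo "valid" || echo "expired or unreadable"
```

A certificate that is still valid but rejected by the server points to CA rotation, which the kubeconfig refresh above also resolves.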

Next Steps

Deploy Cluster

Re-deploy a cluster after resolving provisioning issues.

Access Cluster

Reconfigure kubectl connectivity after certificate or endpoint changes.

Kubernetes Admin Troubleshooting

Platform-level diagnostics for driver and quota issues.

Cluster Upgrades

Resume or retry failed version upgrades.
