Kubernetes Admin Troubleshooting

Clusters fail to create across projects

Cause: The K8SaaS Conductor cannot reach the Orchestration service, or the cluster template references an image or flavor that does not exist.

Resolution:

Check Conductor logs

docker logs -f magnum_conductor

Look for ConnectionError, NotFound, or AuthenticationRequired messages.
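On a busy Conductor the full log can be long, so it may help to filter it down to the three error signatures named above. This is a minimal sketch, assuming the same container name as the command above; the function name `conductor_errors` is hypothetical.

```shell
# Hypothetical helper: keep only log lines that mention the error
# signatures listed above. Reads log text on stdin.
conductor_errors() {
  grep -E 'ConnectionError|NotFound|AuthenticationRequired'
}

# Example use. docker logs replays the container's stderr on its own
# stderr, so merge the two streams before filtering:
#   docker logs --since 1h magnum_conductor 2>&1 | conductor_errors
```

Dropping `-f` in favor of `--since` makes the command return instead of following the log, which is usually what you want when searching rather than watching.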

Verify Orchestration service is healthy

openstack stack list

Verify node image exists

openstack image show fedora-coreos-39

If the image is missing, upload it and ask project teams to retry cluster creation.
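The cause above also mentions a missing flavor, so it can be useful to check everything a template references in one pass. The sketch below assumes the cluster template exposes `image_id`, `flavor_id`, and `master_flavor_id` fields; the function name `check_template_refs` is hypothetical.

```shell
# Hypothetical helper: report any image or flavor referenced by a cluster
# template that cannot be found. Returns 1 if something is missing.
# Assumption: the template has image_id, flavor_id, master_flavor_id fields.
check_template_refs() {
  local template="$1" image flavor master_flavor f status=0
  image="$(openstack coe cluster template show "$template" -f value -c image_id)"
  flavor="$(openstack coe cluster template show "$template" -f value -c flavor_id)"
  master_flavor="$(openstack coe cluster template show "$template" -f value -c master_flavor_id)"

  if ! openstack image show "$image" >/dev/null 2>&1; then
    echo "missing image: $image"
    status=1
  fi
  for f in "$flavor" "$master_flavor"; do
    # Skip flavors the template does not set.
    if [ -z "$f" ] || [ "$f" = "None" ]; then continue; fi
    if ! openstack flavor show "$f" >/dev/null 2>&1; then
      echo "missing flavor: $f"
      status=1
    fi
  done
  return "$status"
}
```

Run it as `check_template_refs <template-name>`; no output and a zero exit status mean every referenced image and flavor was found.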

Heat stack in FAILED state

Cause: The Orchestration template failed during resource creation, typically because of quota exhaustion, a dependency failure (load balancer or DNS), or a template rendering error.

Resolution:

Find the stack ID for a cluster

openstack coe cluster show <cluster-name> \
  -f value -c stack_id

Show failed stack events

openstack stack event list <stack-id> \
  --nested-depth 3 \
  | grep -i fail

Address the root cause (quota, network, service availability) and then delete the failed cluster before retrying:

Delete failed cluster

openstack coe cluster delete <cluster-name>
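The stack lookup and the event filter in this section can be chained so that only the cluster name is needed. This is a minimal sketch using the same commands as above; the function name `cluster_failed_events` is hypothetical.

```shell
# Hypothetical helper: resolve a cluster's stack ID, then print only the
# failed events from that stack and its nested stacks.
cluster_failed_events() {
  local cluster="$1" stack_id
  stack_id="$(openstack coe cluster show "$cluster" -f value -c stack_id)"
  if [ -z "$stack_id" ] || [ "$stack_id" = "None" ]; then
    echo "no stack recorded for cluster: $cluster" >&2
    return 1
  fi
  openstack stack event list "$stack_id" --nested-depth 3 | grep -i fail
}
```

Running `cluster_failed_events <cluster-name>` before deleting the cluster keeps a record of the failure reason, since the events are no longer available once the stack is gone.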

Certificate errors after node replacement

Cause: Replaced nodes received new TLS certificates that do not match the cluster CA recorded in the K8SaaS database.

Resolution: Rotate the cluster CA to regenerate consistent certificates:

Rotate cluster CA

openstack coe ca rotate <cluster-name>

Notify all project users to refresh their kubeconfig after the rotation completes.
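For project users, refreshing a kubeconfig usually means regenerating it with the client. The sketch below assumes that `openstack coe cluster config` with its `--dir` and `--force` options is available in your client and that it writes a file named `config` into the target directory; the function name and the default directory layout are hypothetical.

```shell
# Hypothetical helper: regenerate a cluster's kubeconfig into a directory
# and print the export line for it.
# Assumption: the client writes a file named "config" into --dir.
refresh_kubeconfig() {
  local cluster="$1" dir="${2:-$HOME/.kube/$1}"
  mkdir -p "$dir" || return 1
  # --force overwrites files left over from before the rotation.
  openstack coe cluster config "$cluster" --dir "$dir" --force >/dev/null || return 1
  echo "export KUBECONFIG=$dir/config"
}
```

A user could then run `eval "$(refresh_kubeconfig <cluster-name>)"` followed by `kubectl get nodes` to confirm that the new certificates are accepted.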

DELETE_FAILED — cluster cannot be removed

Cause: The underlying Heat stack has resources in a failed state that prevent cleanup, or a resource dependency is blocking deletion.

Resolution:

Show stack deletion error

openstack stack show <stack-id> -f value -c stack_status_reason

List the stack resources to identify the blocking resource (e.g., a floating IP still attached to a deleted VM), delete that resource manually, and then delete the stack again:

List stack resources

openstack stack resource list <stack-id>

Delete the Heat stack without the confirmation prompt

openstack stack delete --yes <stack-id>

After manual cleanup, delete the cluster record from K8SaaS:

Force delete cluster record

openstack coe cluster delete --force <cluster-name>
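When a stack has many resources, it can be quicker to list only the ones that failed. This is a minimal sketch, assuming the `resource_name`, `resource_type`, and `resource_status` columns of `openstack stack resource list`; the function name `failed_stack_resources` is hypothetical.

```shell
# Hypothetical helper: print name, type, and status of stack resources
# whose status ends in _FAILED, including resources in nested stacks.
failed_stack_resources() {
  local stack_id="$1"
  openstack stack resource list "$stack_id" --nested-depth 3 \
    -f value -c resource_name -c resource_type -c resource_status \
    | awk '$NF ~ /_FAILED$/'
}
```

The resource type in the output (for example a floating IP or load balancer type) indicates which service's CLI is needed for the manual cleanup.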

Conductor not processing cluster tasks

Cause: The Conductor is overloaded, has lost database connectivity, or crashed due to an unhandled exception.

Resolution:

Check Conductor status and logs

docker ps --filter name=magnum_conductor
docker logs --tail 50 magnum_conductor

Restart the Conductor if it shows as unhealthy or has no recent log output:

Restart Conductor

docker restart magnum_conductor
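The restart condition above can be scripted for use in a periodic check. The sketch below treats "unhealthy" as "container state is not running" and "no recent log output" as "no log lines in the last 10 minutes"; both thresholds are assumptions, and the function name `restart_if_stalled` is hypothetical.

```shell
# Hypothetical helper: restart the container only when it is not running
# or has logged nothing within the given window (default: 10m).
restart_if_stalled() {
  local name="${1:-magnum_conductor}" window="${2:-10m}" state recent
  state="$(docker inspect --format '{{.State.Status}}' "$name" 2>/dev/null)"
  # Count recent log lines from both stdout and stderr of the container.
  recent="$(docker logs --since "$window" "$name" 2>&1 | wc -l | tr -d ' ')"
  if [ "$state" != "running" ] || [ "$recent" -eq 0 ]; then
    echo "restarting $name (state=${state:-unknown}, recent_lines=$recent)"
    docker restart "$name" >/dev/null
  else
    echo "$name looks healthy (state=$state, recent_lines=$recent)"
  fi
}
```

An idle Conductor may legitimately log nothing for a while, so a longer window such as `restart_if_stalled magnum_conductor 1h` may be a safer choice on quiet deployments.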

Increase the worker count if the Conductor is consistently behind:

/etc/xavs/kubernetes/kubernetes.conf

[DEFAULT]
workers = 4
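Before changing the value, it may help to confirm what is currently configured. This is a minimal sketch of an INI lookup, assuming the file path and the `[DEFAULT]` section shown above; the function name `ini_get` is hypothetical.

```shell
# Hypothetical helper: print the value of KEY in [SECTION] of an INI file.
ini_get() {
  local file="$1" section="$2" key="$3"
  awk -F '=' -v section="[$section]" -v key="$key" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    /^[ \t]*[#;]/ { next }                      # skip comment lines
    /^[ \t]*\[/   { current = trim($0); next }  # remember the current section
    current == section && index($0, "=") > 0 && trim($1) == key {
      # Take everything after the first "=" so values may contain "=".
      print trim(substr($0, index($0, "=") + 1))
      exit
    }
  ' "$file"
}
```

For example, `ini_get /etc/xavs/kubernetes/kubernetes.conf DEFAULT workers` prints the current worker count, which can be compared against the CPU count reported by `nproc`. Services typically read this file at startup, so a changed value most likely takes effect only after the Conductor is restarted.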

Next Steps

Monitoring

Monitor all clusters for failed and stuck lifecycle states.

Certificates

Resolve certificate errors with CA rotation.

Quotas

Resolve quota-related cluster creation failures.

User Troubleshooting

User-facing guide for individual cluster access and health issues.
