Overview

This guide covers administrator-level troubleshooting for the K8SaaS platform, from Conductor startup failures and Heat stack errors to certificate issues and quota enforcement problems. For user-facing issues such as individual cluster access failures, see the Kubernetes User Troubleshooting guide.

Common Issues

Cause: The K8SaaS Conductor cannot reach the Orchestration service, or the cluster template references an image or flavor that does not exist.
Resolution:
Check Conductor logs
docker logs -f magnum_conductor
Look for ConnectionError, NotFound, or AuthenticationRequired messages.
Verify Orchestration service is healthy
openstack stack list
Verify node image exists
openstack image show fedora-coreos-39
If the image is missing, upload it and ask project teams to retry cluster creation.
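When the Conductor log is long, it can help to count how often each of the three message types appears before reading individual entries. The following is a minimal sketch, not part of the platform tooling: `conductor_error_summary` is a hypothetical helper name, and it only pattern-matches the three strings listed above.

```shell
# Sketch: summarize how often each error keyword appears in log text on stdin.
# conductor_error_summary is a hypothetical helper, not a K8SaaS command.
conductor_error_summary() {
  # -o prints only the matched keyword, one per line, so it can be counted.
  grep -E -o 'ConnectionError|NotFound|AuthenticationRequired' \
    | sort \
    | uniq -c \
    | awk '{print $2, $1}'
}
```

A possible invocation is `docker logs --tail 500 magnum_conductor 2>&1 | conductor_error_summary`. The `2>&1` is included because container logs may be written to stderr; the tail length of 500 is an arbitrary choice.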
Cause: The Orchestration template failed during resource creation because of quota exhaustion, a dependency failure (LB or DNS), or a template rendering error.
Resolution:
Find the stack ID for a cluster
openstack coe cluster show <cluster-name> \
  -f value -c stack_id
Show failed stack events
openstack stack event list <stack-id> \
  --nested-depth 3 \
  | grep -i fail
Address the root cause (quota, network, service availability) and then delete the failed cluster before retrying:
Delete failed cluster
openstack coe cluster delete <cluster-name>
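The stack lookup and the event listing in this section can be chained so that only the cluster name is needed. This is a rough sketch using the same commands shown in this section; `cluster_failed_events` is a hypothetical wrapper name, and it assumes the `openstack` CLI is already authenticated with admin credentials.

```shell
# Sketch: print failed Orchestration events for a cluster, given its name.
# cluster_failed_events is a hypothetical wrapper, not a platform command.
cluster_failed_events() {
  cfe_cluster="$1"
  # Resolve the Heat stack behind the cluster.
  cfe_stack="$(openstack coe cluster show "$cfe_cluster" -f value -c stack_id)" || return 1
  if [ -z "$cfe_stack" ]; then
    echo "no stack_id recorded for $cfe_cluster" >&2
    return 1
  fi
  # Same event query as in this section, restricted to lines mentioning a failure.
  openstack stack event list "$cfe_stack" --nested-depth 3 | grep -i fail
}
```

For example, `cluster_failed_events <cluster-name>` prints only the failing events. The function returns a non-zero status when nothing matches, which can be useful in scripts.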
Cause: Replaced nodes received new TLS certificates that do not match the cluster CA recorded in the K8SaaS database.
Resolution: Rotate the cluster CA to regenerate consistent certificates:
Rotate cluster CA
openstack coe ca rotate <cluster-name>
Notify all project users to refresh their kubeconfig after the rotation completes.
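One way to tell when the rotation has finished is to poll the cluster status before notifying users. The sketch below rests on an assumption that is not confirmed by this guide: that the cluster reports a status ending in `_IN_PROGRESS` while the rotation runs. The function name, polling interval, and retry count are all hypothetical.

```shell
# Sketch: wait until a cluster leaves any *_IN_PROGRESS status.
# Assumption: CA rotation moves the cluster through an *_IN_PROGRESS status.
# Arguments: cluster name, seconds between polls (default 15), max polls (default 40).
wait_for_cluster_settled() {
  wfc_cluster="$1"
  wfc_interval="${2:-15}"
  wfc_tries="${3:-40}"
  while [ "$wfc_tries" -gt 0 ]; do
    wfc_status="$(openstack coe cluster show "$wfc_cluster" -f value -c status)" || return 2
    case "$wfc_status" in
      *_IN_PROGRESS) sleep "$wfc_interval" ;;
      *) echo "$wfc_status"; return 0 ;;
    esac
    wfc_tries=$((wfc_tries - 1))
  done
  echo "timed out waiting for $wfc_cluster" >&2
  return 1
}
```

Running `wait_for_cluster_settled <cluster-name>` after the rotate command prints the final status once the cluster settles. Check that the printed status indicates success before asking users to refresh their kubeconfig.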
Cause: The underlying Heat stack has resources in a failed state that prevent cleanup, or a resource dependency is blocking deletion.
Resolution:
Show stack deletion error
openstack stack show <stack-id> -f value -c stack_status_reason
Identify the blocking resource (for example, a floating IP still attached to a deleted VM), delete it manually, and then force-delete the stack:
List stack resources
openstack stack resource list <stack-id>
Force-delete the Heat stack
openstack stack delete --yes <stack-id>
After manual cleanup, delete the cluster record from K8SaaS:
Force delete cluster record
openstack coe cluster delete --force <cluster-name>
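On a large stack the resource list can be long, so a filter that keeps only resources in a failed state may make the blocking resource easier to spot. This is a small sketch under one assumption: that the status column holds values such as `DELETE_FAILED`, which end in `_FAILED`. `failed_stack_resources` is a hypothetical helper name.

```shell
# Sketch: keep only resources whose status (last column) ends in _FAILED.
# Reads whitespace-separated rows on stdin; prints "<name> <status>".
# failed_stack_resources is a hypothetical helper, not a platform command.
failed_stack_resources() {
  awk '$NF ~ /_FAILED$/ {print $1, $NF}'
}
```

It could be fed with `openstack stack resource list <stack-id> -f value -c resource_name -c resource_status | failed_stack_resources`. The two `-c` selections keep the resource name first and the status last, which is the layout the filter expects.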
Cause: The Conductor is overloaded, has lost database connectivity, or crashed due to an unhandled exception.
Resolution:
Check Conductor status and logs
docker ps --filter name=magnum_conductor
docker logs --tail 50 magnum_conductor
Restart the Conductor if it shows as unhealthy or has no recent log output:
Restart Conductor
docker restart magnum_conductor
Increase the worker count if the Conductor consistently falls behind on cluster operations:
/etc/xavs/kubernetes/kubernetes.conf
[DEFAULT]
workers = 4
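The restart criterion described above (container not healthy, or no recent log output) can be turned into a rough check. This is a sketch, not a supported health probe: it treats any container state other than `running` as unhealthy, and the 10-minute window is an assumed threshold. `conductor_needs_restart` is a hypothetical function name.

```shell
# Sketch: succeed (exit 0) when the Conductor container looks stalled.
# Assumptions: "not running" counts as unhealthy; 10m of silence counts as stalled.
conductor_needs_restart() {
  cnr_window="${1:-10m}"
  cnr_state="$(docker inspect --format '{{.State.Status}}' magnum_conductor 2>/dev/null)"
  if [ "$cnr_state" != "running" ]; then
    echo "container state: ${cnr_state:-unknown}"
    return 0
  fi
  # Count log lines written inside the window (stdout and stderr combined).
  cnr_recent="$(docker logs --since "$cnr_window" magnum_conductor 2>&1 | wc -l | tr -d ' ')"
  if [ "$cnr_recent" -eq 0 ]; then
    echo "no log output in the last $cnr_window"
    return 0
  fi
  return 1
}
```

A possible use is `conductor_needs_restart && docker restart magnum_conductor`. A quiet platform can legitimately produce no log output for a while, so treat a positive result as a prompt to investigate, not as proof of a fault.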

Diagnostic Commands Reference

Check all K8SaaS container statuses
docker ps --filter name=magnum
List all clusters across all projects
openstack coe cluster list --all
Show cluster with full detail
openstack coe cluster show <cluster-name> -f json
Show Orchestration stack events
openstack stack event list <stack-id> --nested-depth 2
Check K8SaaS API logs
docker logs --tail 100 magnum_api
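For a quick overview across projects, the cluster list can be reduced to a count per lifecycle status. The following is a minimal sketch; `cluster_status_summary` is a hypothetical helper name, and it assumes the status is the last column of each input row.

```shell
# Sketch: count clusters per status from "<name> <status>" rows on stdin.
# cluster_status_summary is a hypothetical helper, not a platform command.
cluster_status_summary() {
  # NF skips blank lines; $NF is the status column.
  awk 'NF {count[$NF]++} END {for (s in count) print s, count[s]}' | sort
}
```

It could be combined with the listing command above as `openstack coe cluster list --all -f value -c name -c status | cluster_status_summary`. A non-zero count for any status ending in `_FAILED` points to clusters worth inspecting with the commands in this section.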

Next Steps

Monitoring

Monitor all clusters for failed and stuck lifecycle states.

Certificates

Resolve certificate errors with CA rotation.

Quotas

Resolve quota-related cluster creation failures.

User Troubleshooting

User-facing guide for individual cluster access and health issues.