> ## Documentation Index > Fetch the complete documentation index at: https://docs.xloud.tech/llms.txt > Use this file to discover all available pages before exploring further. # Kubernetes Admin Troubleshooting > Diagnose and resolve Xloud K8SaaS platform issues — Conductor failures, Heat stack errors, certificate problems, and cross-project cluster failures. ## Overview This guide covers administrator-level troubleshooting for the K8SaaS platform — from Conductor startup failures and Heat stack errors to certificate issues and quota enforcement problems. For user-facing issues such as individual cluster access failures, see the [Kubernetes User Troubleshooting](/services/kubernetes/user-guide/troubleshooting) guide. *** ## Common Issues **Cause**: The K8SaaS Conductor cannot reach the Orchestration service, or the cluster template references an image or flavor that does not exist. **Resolution**: ```bash title="Check Conductor logs" theme={null} docker logs -f magnum_conductor ``` Look for `ConnectionError`, `NotFound`, or `AuthenticationRequired` messages. ```bash title="Verify Orchestration service is healthy" theme={null} openstack stack list ``` ```bash title="Verify node image exists" theme={null} openstack image show fedora-coreos-39 ``` If the image is missing, upload it and ask project teams to retry cluster creation. **Cause**: The Orchestration template failed during resource creation — quota exhaustion, a dependency failure (LB or DNS), or a template rendering error. **Resolution**: ```bash title="Show failed stack events" theme={null} openstack stack event list \ --nested-depth 3 \ | grep -i fail ``` ```bash title="Find the stack name for a cluster" theme={null} openstack coe cluster show \ -f value -c stack_id ``` Address the root cause (quota, network, service availability) and then delete the failed cluster before retrying: ```bash title="Delete failed cluster" theme={null} openstack coe cluster delete ``` **Cause**: Replaced nodes received new TLS certificates that do not match the cluster CA recorded in the K8SaaS database. **Resolution**: Rotate the cluster CA to regenerate consistent certificates: ```bash title="Rotate cluster CA" theme={null} openstack coe ca rotate ``` Notify all project users to refresh their kubeconfig after the rotation completes. **Cause**: The underlying Heat stack has resources in a failed state that prevent cleanup, or a resource dependency is blocking deletion. **Resolution**: ```bash title="Show stack deletion error" theme={null} openstack stack show -f value -c stack_status_reason ``` Manually delete the blocking resource (e.g., a floating IP still attached to a deleted VM): ```bash title="List stack resources" theme={null} openstack stack resource list ``` ```bash title="Force-delete the Heat stack" theme={null} openstack stack delete --yes ``` After manual cleanup, delete the cluster record from K8SaaS: ```bash title="Force delete cluster record" theme={null} openstack coe cluster delete --force ``` **Cause**: The Conductor is overloaded, has lost database connectivity, or crashed due to an unhandled exception. **Resolution**: ```bash title="Check Conductor status and logs" theme={null} docker ps --filter name=magnum_conductor docker logs --tail 50 magnum_conductor ``` Restart the Conductor if it shows as unhealthy or has no recent log output: ```bash title="Restart Conductor" theme={null} docker restart magnum_conductor ``` Increase the worker count if the Conductor is consistently behind: ```ini title="/etc/xavs/kubernetes/kubernetes.conf" theme={null} [DEFAULT] workers = 4 ``` *** ## Diagnostic Commands Reference ```bash title="Check all K8SaaS container statuses" theme={null} docker ps --filter name=magnum ``` ```bash title="List all clusters across all projects" theme={null} openstack coe cluster list --all ``` ```bash title="Show cluster with full detail" theme={null} openstack coe cluster show -f json ``` ```bash title="Show Orchestration stack events" theme={null} openstack stack event list --nested-depth 2 ``` ```bash title="Check K8SaaS API logs" theme={null} docker logs --tail 100 magnum_api ``` *** ## Next Steps Monitor all clusters for failed and stuck lifecycle states. Resolve certificate errors with CA rotation. Resolve quota-related cluster creation failures. User-facing guide for individual cluster access and health issues.