
Overview

Admin-level Orchestration issues differ from user-facing stack failures. They typically involve the service itself — engine workers not starting, the API becoming unreachable, trust or stack domain misconfiguration, or a resource plugin failing to load. Use the service log files and openstack orchestration service list as primary diagnostic tools.
Administrator Access Required: The operations on this page require the admin role. Contact your Xloud administrator if you do not have sufficient permissions.

Diagnostic Reference

Symptoms: Stacks remain in CREATE_IN_PROGRESS indefinitely. No events appear in openstack stack event list. openstack orchestration service list shows engine workers as down.
Diagnosis:
Check service status
openstack orchestration service list
Check engine container logs (XDeploy/XAVS deployment)
docker logs heat_engine --tail=100
Check message queue connectivity
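docker exec heat_engine python3 -c "
import kombu
# Replace the placeholder URL with the credentials and host from transport_url
conn = kombu.Connection('amqp://user:pass@rabbitmq/', connect_timeout=5)
# max_retries makes the check fail fast instead of retrying forever
conn.ensure_connection(max_retries=3)
print('RabbitMQ connection OK')
conn.release()
"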
Common causes and resolutions:
Cause                                 Resolution
Engine container exited on startup    Check docker logs heat_engine for the error. Common causes: database connection failure, misconfigured heat.conf
RabbitMQ connection refused           Verify RabbitMQ is running: docker ps | grep rabbit. Check transport_url in engine configuration
Database migration not applied        Run docker exec heat_engine heat-manage db_sync to apply pending migrations
Stack domain not configured           Check stack_domain_admin and stack_domain_admin_password in heat.conf
Restart the engine:
Restart engine container
docker restart heat_engine
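The service status check above can also be scripted so that down engines are reported by host. The following is a minimal sketch, assuming the output of openstack orchestration service list -f json carries Hostname, Binary, Engine ID, and Status fields. Key casing may differ between client versions, so the keys are normalized. The function name down_engines and the sample rows are hypothetical.

```python
import json


def down_engines(service_list_json):
    """Return 'hostname/engine-id' for every heat-engine row whose status is not 'up'."""
    down = []
    for row in json.loads(service_list_json):
        # Normalize keys so 'Engine ID' and 'engine_id' are treated alike
        # (assumption: key casing can differ between client versions).
        norm = {key.lower().replace(" ", "_"): value for key, value in row.items()}
        if norm.get("binary") != "heat-engine":
            continue
        if str(norm.get("status", "")).lower() != "up":
            down.append(f"{norm.get('hostname')}/{norm.get('engine_id')}")
    return down


# Made-up sample rows shaped like the assumed JSON output.
SAMPLE = json.dumps([
    {"Hostname": "ctl1", "Binary": "heat-engine", "Engine ID": "e1", "Status": "up"},
    {"Hostname": "ctl2", "Binary": "heat-engine", "Engine ID": "e2", "Status": "down"},
])

if __name__ == "__main__":
    for engine in down_engines(SAMPLE):
        print("engine down:", engine)
```

To use it against a live deployment, pass the captured JSON output of the service list command to down_engines in place of SAMPLE.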
Symptoms: Dashboard shows Orchestration as unavailable. CLI commands return 503 Service Unavailable or connection refused on port 8004.
Diagnosis:
Check API container status
docker ps --filter name=heat_api
docker logs heat_api --tail=50
Test API endpoint directly
curl -s http://localhost:8004/
Check HAProxy backend health
echo "show stat" | socat stdio /var/run/haproxy/admin.sock | grep heat
Common causes and resolutions:
Cause                                       Resolution
API container not running                   docker start heat_api
Keystone endpoint not registered            Verify: openstack endpoint list | grep orchestration
SSL certificate expired (if TLS enabled)    Renew certificate and restart API container
HAProxy backend marked DOWN                 Check network connectivity between HAProxy and the API container; restart the API
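The HAProxy check above pipes show stat through grep. A structured parse makes it easier to see which individual server is marked DOWN. This sketch assumes the CSV layout that show stat prints (a header line beginning with "# pxname,svname," and a status column). The function name down_backends and the trimmed sample columns are illustrative only.

```python
import csv
import io


def down_backends(show_stat_csv, name_filter="heat"):
    """Return (proxy, server, status) for servers marked DOWN in matching proxies."""
    text = show_stat_csv.lstrip()
    if text.startswith("# "):
        text = text[2:]  # drop the comment marker so the header parses as CSV
    result = []
    for row in csv.DictReader(io.StringIO(text)):
        proxy = row.get("pxname") or ""
        server = row.get("svname") or ""
        status = row.get("status") or ""
        if name_filter not in proxy:
            continue
        if server in ("FRONTEND", "BACKEND"):
            continue  # aggregate rows, not individual servers
        if status.startswith("DOWN"):
            result.append((proxy, server, status))
    return result


# Made-up sample with most columns trimmed for readability.
SAMPLE = """# pxname,svname,status,weight,
heat_api,FRONTEND,OPEN,,
heat_api,ctl1,UP,1,
heat_api,ctl2,DOWN,1,
heat_api,BACKEND,UP,2,
nova_api,ctl1,DOWN,1,
"""

if __name__ == "__main__":
    for proxy, server, status in down_backends(SAMPLE):
        print(f"{proxy}/{server}: {status}")
```

Only rows whose status starts with DOWN are reported, which matches the "HAProxy backend marked DOWN" cause in the table. Other states such as MAINT are ignored by this sketch.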
Symptoms: Stacks containing WaitCondition or auto-scaling resources fail with errors mentioning StackDomainUser or TrustActionMismatch. Users cannot create stacks that require credential delegation.
Diagnosis:
Verify stack domain exists
openstack domain list | grep heat
Verify stack domain admin user
openstack user list --domain heat
Test stack domain admin credentials
openstack --os-username heat_domain_admin \
          --os-user-domain-name heat \
          --os-password <password> \
          token issue
Common causes and resolutions:
Cause                                                Resolution
heat domain does not exist                           Re-run xavs-ansible deploy -t heat to recreate the domain
Stack domain admin password incorrect                Update heat_domain_admin_password in passwords.yml and redeploy
stack_domain_admin setting missing from heat.conf    Verify XDeploy configuration and redeploy
Xloud Identity service unreachable from engine       Check network connectivity between the engine container and port 5000
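A missing or empty stack domain setting in heat.conf can be detected before redeploying. The sketch below assumes the options live in the [DEFAULT] section of an INI-style heat.conf and checks only the two options named in this guide. The function name and the sample file content are hypothetical.

```python
import configparser

# Options named in this guide; extend the tuple if your deployment needs more.
REQUIRED = ("stack_domain_admin", "stack_domain_admin_password")


def missing_stack_domain_options(conf_text, required=REQUIRED):
    """Return the required [DEFAULT] options that are absent or empty."""
    # interpolation=None keeps '%' in passwords literal; strict=False tolerates
    # repeated options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    defaults = parser.defaults()
    return [name for name in required if not defaults.get(name, "").strip()]


# Made-up sample: the password option is present but empty.
SAMPLE = """
[DEFAULT]
stack_domain_admin = heat_domain_admin
stack_domain_admin_password =

[database]
max_pool_size = 20
"""

if __name__ == "__main__":
    for name in missing_stack_domain_options(SAMPLE):
        print("missing or empty:", name)
```

The check only confirms that the options are set. Whether the password is correct still has to be verified with the token issue test above.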
Symptoms: Specific resource types consistently fail with InvalidTemplateVersion or ResourceTypeUnavailable. The engine log shows import errors.
Diagnosis:
List available resource types
openstack orchestration resource type list
Show resource type schema
openstack orchestration resource type show Xloud::Compute::Server
Check engine log for plugin errors
docker logs heat_engine 2>&1 | grep -i "plugin\|resource_type\|ImportError"
Common causes and resolutions:
Cause                                    Resolution
Dependent service not enabled            Some resource types require specific services. Xloud::Networking::FloatingIP requires networking; verify the service is enabled
Plugin version mismatch after upgrade    Restart the engine after upgrades: docker restart heat_engine
Custom plugin missing                    If using custom resource plugins, verify the plugin file exists in the engine’s plugin directory and has correct permissions
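When the engine log is long, it can help to pull out the names of the modules that failed to import. This sketch applies the same keywords as the grep command above to captured log text and then extracts module names from Python's standard "No module named '...'" message. The function name and the sample log lines are made up for illustration and do not reproduce real engine output.

```python
import re

# Same keywords as the grep filter used in the diagnosis step.
PATTERN = re.compile(r"plugin|resource_type|ImportError", re.IGNORECASE)
# Standard wording of Python's ImportError / ModuleNotFoundError message.
MODULE = re.compile(r"No module named '([^']+)'")


def plugin_errors(log_text):
    """Return (matching log lines, sorted unique names of missing modules)."""
    lines = log_text.splitlines()
    matches = [line for line in lines if PATTERN.search(line)]
    modules = sorted({name for line in lines for name in MODULE.findall(line)})
    return matches, modules


# Made-up log excerpt.
SAMPLE = """\
2024-01-01 10:00:00.000 7 INFO heat.engine.service [-] Starting engine
2024-01-01 10:00:01.000 7 ERROR heat.common.plugin_loader [-] Failed to import module custom_res
2024-01-01 10:00:01.001 7 ERROR heat.common.plugin_loader ModuleNotFoundError: No module named 'custom_lib'
2024-01-01 10:00:02.000 7 WARNING heat.engine.environment [-] resource_type mapping skipped
"""

if __name__ == "__main__":
    matched, missing = plugin_errors(SAMPLE)
    print(len(matched), "matching lines; missing modules:", missing)
```

Save the output of docker logs heat_engine 2>&1 to a file and pass its contents to plugin_errors to run the same check on a real log.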
Symptoms: Stacks with many resources (100+) frequently time out or take much longer than expected. Engine workers appear idle despite stacks being queued.
Diagnosis:
Count engine workers reporting up
openstack orchestration service list | grep heat-engine | grep -cw up
Check message queue depth
docker exec rabbitmq rabbitmqctl list_queues name messages
Resolutions:
Action                           Setting
Increase engine workers          heat_engine_workers: 8 (or higher)
Increase RPC timeout             heat_rpc_response_timeout: 300
Increase database pool           heat_db_max_pool_size: 20
Verify convergence mode is on    heat_convergence_engine: true
Apply changes by updating globals and redeploying:
Redeploy with new settings
xavs-ansible deploy -t heat
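To judge whether more engine workers are needed, the queue depth check above can be reduced to the queues that are actually backing up. This is a minimal sketch that assumes rabbitmqctl list_queues name messages prints one whitespace-separated name and count per queue, possibly preceded by banner lines. The threshold of 100 messages, the function name, and the sample queue names are arbitrary assumptions.

```python
def deep_queues(list_queues_output, threshold=100):
    """Return (queue, depth) pairs at or above threshold, deepest first."""
    result = []
    for line in list_queues_output.splitlines():
        parts = line.split()
        # Banner and header lines do not have the 'name count' shape; skip them.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        depth = int(parts[1])
        if depth >= threshold:
            result.append((parts[0], depth))
    return sorted(result, key=lambda item: item[1], reverse=True)


# Made-up sample output.
SAMPLE = """Timeout: 60.0 seconds ...
Listing queues for vhost / ...
name\tmessages
engine\t250
heat-engine-listener\t0
engine_worker\t1200
"""

if __name__ == "__main__":
    for name, depth in deep_queues(SAMPLE):
        print(f"{name}: {depth} messages waiting")
```

A queue that stays deep while engine workers appear idle suggests the worker count or RPC timeout settings in the table above are worth revisiting.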

Log Locations

Service                         Log Path
Orchestration Engine            docker logs heat_engine or /var/log/kolla/heat/heat-engine.log
Orchestration API               docker logs heat_api or /var/log/kolla/heat/heat-api.log
CloudFormation-compatible API   docker logs heat_api_cfn or /var/log/kolla/heat/heat-api-cfn.log

Next Steps

Configuration

Review and update service configuration through XDeploy

Scaling the Service

Add engine workers to resolve throughput and timeout issues

Security

Diagnose stack domain and trust authorization problems

User Troubleshooting

Stack-level diagnostics for CREATE_FAILED and template errors