> ## Documentation Index > Fetch the complete documentation index at: https://docs.xloud.tech/llms.txt > Use this file to discover all available pages before exploring further. # Auto-Scaling with Xloud Orchestration > Configure auto-scaling groups with Heat templates. Scale instances based on Prometheus metrics and webhooks. ## Overview Xloud Orchestration provides native auto-scaling through three coordinated resource types: a scaling group, a scaling policy, and an alarm trigger. When a metric threshold is breached — such as CPU utilization exceeding 80% — the alarm fires a webhook that activates the scaling policy, which adds or removes instances from the group. Prometheus is the monitoring backend for alarm-driven scaling. It replaces legacy telemetry stacks (Ceilometer/Aodh) and provides a more reliable, standards-based metrics pipeline with support for multi-dimensional labels, alerting rules, and long-retention storage. **Prerequisites** * Xloud Orchestration enabled in your project * A compatible image and flavor for scaled instances * Prometheus deployed and scraping compute metrics ([see Prometheus integration](/integrations/prometheus)) * Basic familiarity with [orchestration templates](/services/orchestration/template-guide) *** ## Architecture ```mermaid theme={null} graph LR PROM[Prometheus
Metrics] -->|Evaluate alert rules| ALERT[Alertmanager] ALERT -->|POST to signal URL| POLICY[Scaling Policy] POLICY -->|Scale out / in| GROUP[Auto-Scaling Group] GROUP -->|Launch instance| NOVA[Compute] GROUP -->|Terminate instance| NOVA style POLICY fill:#197560,color:#fff style GROUP fill:#197560,color:#fff style PROM fill:#145C4C,color:#fff ``` ### Scaling Components | Component | Resource Type | Purpose | | ---------------------- | ---------------------------- | -------------------------------------------------------------------- | | **Auto-Scaling Group** | `OS::Heat::AutoScalingGroup` | Manages a pool of identical instances with min/max size constraints | | **Scaling Policy** | `OS::Heat::ScalingPolicy` | Defines how to adjust the group — add N, remove N, or set exact size | | **Signal URL** | Produced by `ScalingPolicy` | Webhook endpoint activated by Prometheus Alertmanager or manual POST | | **Prometheus** | External | Scrapes instance metrics, evaluates alert rules, fires webhooks | *** ## Use Cases | Use Case | Description | Trigger | | ---------------------------------- | ------------------------------------------------------------------------ | -------------------- | | **Elastic web / application tier** | Scale web server instances based on HTTP request rate or CPU utilization | Prometheus alert | | **CI/CD build farm** | Add worker nodes during active builds, shrink on idle | Schedule or webhook | | **Batch processing cluster** | Provision compute nodes for heavy batch jobs, release when complete | Manual or scheduled | | **Dev/test resource pools** | Automatically scale out environments for short-lived test runs | On-demand webhook | | **Disaster recovery warm pool** | Maintain standby instances that scale out during failover events | Alertmanager webhook | *** ## Orchestration Templates ### Static Cluster Template Use this template when you need a fixed number of instances deployed as a named group. Each instance is declared as a discrete resource — suitable for small, stable clusters. ```yaml title="static-cluster.yaml" theme={null} heat_template_version: 2016-10-14 description: Static 3-node compute cluster parameters: image: type: string default: Ubuntu-22.04 flavor: type: string default: m1.small network: type: string default: private key_name: type: string resources: vm1: type: OS::Nova::Server properties: name: cluster-node-1 image: { get_param: image } flavor: { get_param: flavor } key_name: { get_param: key_name } networks: - network: { get_param: network } vm2: type: OS::Nova::Server properties: name: cluster-node-2 image: { get_param: image } flavor: { get_param: flavor } key_name: { get_param: key_name } networks: - network: { get_param: network } vm3: type: OS::Nova::Server properties: name: cluster-node-3 image: { get_param: image } flavor: { get_param: flavor } key_name: { get_param: key_name } networks: - network: { get_param: network } outputs: vm1_ip: value: { get_attr: [vm1, first_address] } vm2_ip: value: { get_attr: [vm2, first_address] } vm3_ip: value: { get_attr: [vm3, first_address] } ``` ### Auto-Scaling Template This template creates a web tier that scales between 1 and 10 instances. The scaling policy signal URLs are exposed as stack outputs and can be wired into Prometheus Alertmanager webhook receivers. ```yaml title="autoscaling-stack.yaml" theme={null} heat_template_version: 2016-10-14 description: > Auto-scaling web tier with scale-out and scale-in policies. Signal URLs are consumed by Prometheus Alertmanager webhook receivers. parameters: image: type: string label: Instance Image flavor: type: string label: Instance Flavor default: m1.small key_name: type: string label: Key Pair network: type: string label: Network min_size: type: number default: 1 constraints: - range: { min: 1, max: 10 } max_size: type: number default: 10 constraints: - range: { min: 2, max: 20 } resources: # Auto-scaling group — manages the instance pool web_asg: type: OS::Heat::AutoScalingGroup properties: min_size: { get_param: min_size } max_size: { get_param: max_size } desired_capacity: { get_param: min_size } resource: type: OS::Nova::Server properties: image: { get_param: image } flavor: { get_param: flavor } key_name: { get_param: key_name } networks: - network: { get_param: network } user_data: | #!/bin/bash apt-get update -y apt-get install -y nginx systemctl enable --now nginx # Scale-out policy — add 1 instance per trigger scale_out_policy: type: OS::Heat::ScalingPolicy properties: auto_scaling_group_id: { get_resource: web_asg } adjustment_type: change_in_capacity scaling_adjustment: 1 cooldown: 60 # Scale-in policy — remove 1 instance per trigger scale_in_policy: type: OS::Heat::ScalingPolicy properties: auto_scaling_group_id: { get_resource: web_asg } adjustment_type: change_in_capacity scaling_adjustment: -1 cooldown: 120 outputs: scale_out_url: description: Webhook URL to trigger scale-out (wire into Alertmanager) value: { get_attr: [scale_out_policy, signal_url] } scale_in_url: description: Webhook URL to trigger scale-in (wire into Alertmanager) value: { get_attr: [scale_in_policy, signal_url] } current_size: description: Current instance count in the scaling group value: { get_attr: [web_asg, current_size] } ``` *** ## Adjustment Types | `adjustment_type` | Behavior | Example | | ---------------------------- | --------------------------------------------------- | ------------------------------------------------ | | `change_in_capacity` | Add or remove N instances relative to current count | `scaling_adjustment: 2` adds 2 instances | | `exact_capacity` | Set the group to exactly N instances | `scaling_adjustment: 5` sets group size to 5 | | `percent_change_in_capacity` | Change capacity by a percentage of current size | `scaling_adjustment: 25` adds 25% more instances | *** ## Prometheus Integration Prometheus Alertmanager delivers scaling signals by sending an HTTP POST to the policy signal URL. Configure a webhook receiver in your Alertmanager configuration: ```yaml title="alertmanager.yml" theme={null} route: receiver: "default" routes: - match: alertname: "HighCpuUsage" receiver: "scale-out" - match: alertname: "LowCpuUsage" receiver: "scale-in" receivers: - name: "scale-out" webhook_configs: - url: "" send_resolved: false - name: "scale-in" webhook_configs: - url: "" send_resolved: false ``` A matching Prometheus alert rule that fires when average CPU exceeds 80% for 2 minutes: ```yaml title="alert-rules.yml" theme={null} groups: - name: autoscaling rules: - alert: HighCpuUsage expr: avg(rate(node_cpu_seconds_total{mode!="idle"}[2m])) by (job) > 0.80 for: 2m labels: severity: warning annotations: summary: "CPU usage above 80% — triggering scale-out" - alert: LowCpuUsage expr: avg(rate(node_cpu_seconds_total{mode!="idle"}[10m])) by (job) < 0.20 for: 10m labels: severity: info annotations: summary: "CPU usage below 20% — triggering scale-in" ``` Use longer evaluation windows (5–10 minutes) for scale-in rules to avoid prematurely terminating instances during short idle periods. Scale-out rules can use shorter windows (1–2 minutes) to respond faster to load spikes. *** ## Deploy and Trigger Scaling Navigate to **Orchestration > Stacks** and click **Create Stack**. In the **Prepare Template** step, upload `autoscaling-stack.yaml`. In the **Orchestration Information** step, fill in the parameters: | Parameter | Example Value | Description | | ---------- | -------------- | ------------------------------- | | `image` | `Ubuntu-22.04` | Base image for scaled instances | | `flavor` | `m1.small` | Instance size | | `key_name` | `my-keypair` | SSH key pair for access | | `network` | `private` | Network for instances | | `min_size` | `1` | Minimum instance count | | `max_size` | `10` | Maximum instance count | Click **Confirm**. Stack reaches **Create Complete**. The scaling group shows the initial instance count. Open the stack detail page and select the **Detail** tab (Outputs card). Copy the values for `scale_out_url` and `scale_in_url` — these are used as Alertmanager webhook receiver URLs. To test scaling without waiting for an alert, send an HTTP POST to the signal URL: ```bash title="Trigger scale-out via webhook" theme={null} curl -X POST "" ``` The scaling group adds one instance. Check the **Stack Resources** tab to confirm the new member. Return to **Orchestration > Stacks** and open the stack. The **Stack Resources** tab shows the current group resources. The **Stack Events** tab shows scaling events in real time as Prometheus alerts fire and Alertmanager posts to the signal URLs. ```bash title="Load credentials" theme={null} source openrc.sh ``` ```bash title="Create the auto-scaling stack" theme={null} openstack stack create \ --template autoscaling-stack.yaml \ --parameter image=Ubuntu-22.04 \ --parameter key_name=my-keypair \ --parameter network=private \ --parameter min_size=1 \ --parameter max_size=10 \ --wait \ web-asg-stack ``` ```bash title="Get scaling webhook URLs" theme={null} openstack stack output show web-asg-stack scale_out_url -c output_value -f value openstack stack output show web-asg-stack scale_in_url -c output_value -f value ``` Store these URLs in your Alertmanager webhook receiver configuration. ```bash title="Signal scale-out" theme={null} SCALE_OUT_URL=$(openstack stack output show web-asg-stack scale_out_url \ -c output_value -f value) curl -X POST "$SCALE_OUT_URL" ``` ```bash title="Verify group size increased" theme={null} openstack stack resource list web-asg-stack ``` The auto-scaling group shows one additional member instance. ```bash title="Show current instance count" theme={null} openstack stack output show web-asg-stack current_size -c output_value -f value ``` *** ## Cooldown Periods Cooldown prevents rapid successive scaling events from destabilizing your workload. The `cooldown` value is specified in seconds per scaling policy. | Scenario | Recommended Scale-Out Cooldown | Recommended Scale-In Cooldown | | --------------------------------------------- | ------------------------------ | ----------------------------- | | Fast-booting instances (cloud image, no init) | 30–60 s | 60–90 s | | Instances with cloud-init provisioning | 90–120 s | 120–180 s | | Instances requiring application warm-up | 120–180 s | 180–300 s | Setting cooldown too low on scale-in can cause thrashing — where instances are terminated before the remaining group has stabilized under the new load distribution. Use a scale-in cooldown at least twice the scale-out cooldown. *** ## Troubleshooting **Cause**: Insufficient quota, unavailable flavor, or invalid image name. **Resolution**: ```bash title="Check stack events for the error message" theme={null} openstack stack event list web-asg-stack --nested-depth 5 ``` Review the `resource_status_reason` field. Common causes: * Compute quota exceeded — check with `openstack quota show` * Image not found — verify with `openstack image list` * Flavor not available in the target availability zone **Cause**: The signal URL contains a temporary token that has expired, or the URL was copied incorrectly. **Resolution**: Retrieve a fresh signal URL from the stack output: ```bash title="Refresh signal URL" theme={null} openstack stack output show web-asg-stack scale_out_url -c output_value -f value ``` Signal URLs are valid as long as the stack exists. Update your Alertmanager config with the current URL after any stack update. **Cause**: Alertmanager is not reaching the signal URL, or the Prometheus alert is not firing. **Resolution**: 1. Verify Alertmanager is running: `curl http://:9093/-/healthy` 2. Check alert state in Prometheus UI under **Alerts** 3. Confirm the webhook receiver URL in Alertmanager config matches the stack output 4. Test manually: `curl -X POST ""` — if this works, the stack is healthy **Cause**: Compute capacity exhausted on available hosts, or image boot failure. **Resolution**: ```bash title="List instances in the scaling group" theme={null} openstack stack resource list web-asg-stack --nested-depth 2 ``` Identify failed instances and check their events: ```bash title="Check instance events" theme={null} openstack server event list ``` *** ## Next Steps Learn intrinsic functions and conditions used in scaling templates Update, suspend, and manage the auto-scaling stack lifecycle Configure Prometheus scrape targets and alert rules for scaling triggers Front auto-scaling groups with a load balancer for traffic distribution