# Capacity Planning

> Monitor XSDS cluster utilization, maintain safe capacity headroom, and plan storage expansion before capacity constraints affect performance or availability.

## Overview

Maintaining adequate free capacity in an XSDS cluster is critical for both performance and data safety. At high utilization, I/O performance degrades significantly. The cluster may also be unable to complete recovery operations after OSD failures, because re-replicating the lost data requires free space on the surviving OSDs. This page covers monitoring, thresholds, and expansion procedures.

**Administrator Access Required**: This operation requires the `admin` role. Contact your Xloud administrator if you do not have sufficient permissions.

**Prerequisites**

* Administrator credentials with the `admin` role
* SSH access to a cluster management node
* Access to **XDeploy** (`https://connect.`) for node provisioning

***

## Capacity Thresholds

| Utilization | Status    | Action Required                           |
| ----------- | --------- | ----------------------------------------- |
| \< 60%      | Healthy   | Monitor routinely                         |
| 60–70%      | Watch     | Begin planning expansion                  |
| 70–80%      | Warning   | Initiate expansion and order hardware     |
| 80–85%      | Critical  | Accelerate expansion; act immediately     |
| > 85%       | Emergency | Risk of degraded I/O and recovery failure |

Above 85% utilization, the fullest OSDs approach their full limits. The cluster may refuse writes and may be unable to complete data recovery after OSD failures. Maintain a minimum of 30% free capacity headroom.

***

## Monitoring Utilization

Navigate to **XDeploy → Storage → Capacity** for a graphical overview of per-pool and cluster-wide utilization with trend projections.
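The utilization bands in the Capacity Thresholds table can also be checked from a shell. The following is a minimal sketch that maps a utilization percentage to its status. `capacity_status` is a hypothetical helper, not an XSDS or Ceph command. The commented usage line assumes that `jq` is installed and that `ceph df -f json` reports `stats.total_used_raw_ratio` as a fraction between 0 and 1.

```bash
#!/usr/bin/env bash
# Sketch: classify a utilization percentage using the Capacity Thresholds table.
# capacity_status is a hypothetical helper, not an XSDS or Ceph command.
# Band edges are treated as half-open: exactly 70% is Warning, exactly 80% is Critical.
capacity_status() {
  awk -v p="$1" 'BEGIN {
    if (p < 60)       print "Healthy"
    else if (p < 70)  print "Watch"
    else if (p < 80)  print "Warning"
    else if (p <= 85) print "Critical"
    else              print "Emergency"
  }'
}

# Example (assumes jq is installed and that `ceph df -f json` exposes
# stats.total_used_raw_ratio as a 0-1 fraction):
#   capacity_status "$(ceph df -f json | jq '.stats.total_used_raw_ratio * 100')"
```

The table does not say which band an exact boundary value such as 70% belongs to. The sketch assigns each boundary to the more severe band, so an alert fires no later than the table implies.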
```bash title="Cluster-wide capacity summary" theme={null}
ceph df
```

```bash title="Per-pool capacity" theme={null}
ceph df detail
```

```bash title="Per-OSD utilization" theme={null}
ceph osd df tree
```

```bash title="PG autoscale status" theme={null}
ceph osd pool autoscale-status
```

Key metrics to monitor:

* **Used %**: Alert at 70%. Treat 80% as critical and act immediately
* **PG distribution**: An uneven PG distribution makes some OSDs fill faster and carry more load than others. Compare the `%USE` and `PGS` columns in `ceph osd df tree`
* **Recovery I/O**: Active recovery competes with client I/O, so schedule OSD additions during low-traffic windows where possible

***

## Capacity Calculations

For a pool with replication factor `n`, usable capacity = raw capacity / `n`.

| Raw Capacity | Replication Factor | Usable Capacity |
| ------------ | ------------------ | --------------- |
| 100 TB       | 3 (default)        | \~33 TB         |
| 100 TB       | 2                  | \~50 TB         |

Account for the 30% headroom recommendation:

* 100 TB raw, factor 3 = \~33 TB usable
* 30% headroom = \~10 TB reserved
* Effective usable = \~23 TB

For an erasure code profile `k+m`, usable capacity = raw capacity × `k/(k+m)`.

| Profile | Raw Overhead ((k+m)/k) | Usable from 100 TB |
| ------- | ---------------------- | ------------------ |
| 4+2     | 1.5×                   | \~67 TB            |
| 6+2     | 1.33×                  | \~75 TB            |
| 8+3     | 1.375×                 | \~73 TB            |

Snapshots consume additional capacity in proportion to the data changed after each snapshot is taken. With daily snapshots, each retained snapshot holds roughly one day of changed data. A volume with 10% daily churn therefore needs about 10% of its size in snapshot space for every snapshot kept.

Factor snapshot retention into capacity planning. Consider a 10 TB pool with 7-day retention and 10% daily churn. In the worst case, where each day's changes touch different data, it needs approximately 7 TB of additional snapshot space (10 TB × 10% × 7). This figure is usable capacity. It consumes raw capacity at the pool's replication or erasure-coding overhead.

***

## Expanding the Cluster

Navigate to **XDeploy → Infrastructure → Nodes → Add Node** and register the new storage node. XDeploy configures the OS, installs storage packages, and joins the node to the cluster.

Add at least 3 OSDs per expansion batch to keep data evenly distributed across the cluster. Adding a single OSD may cause temporary imbalance.
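The Capacity Calculations formulas can be combined to size an expansion batch. The following is a minimal sketch. It assumes the 30% headroom model above, where headroom is reserved from usable capacity, and OSDs of uniform size. The function names are hypothetical helpers and do not correspond to any XSDS or Ceph command. The sketch ignores per-OSD metadata overhead and uneven data distribution, so treat its results as a lower bound.

```bash
#!/usr/bin/env bash
# Sketch: size an expansion batch from the Capacity Calculations formulas.
# All function names are hypothetical helpers, not XSDS or Ceph commands.
# OVERHEAD is the raw-to-usable factor: n for n-way replication, (k+m)/k for EC.

# Effective usable TB after protection overhead and the headroom reserve.
effective_usable_tb() {  # RAW_TB OVERHEAD HEADROOM_PCT
  awk -v raw="$1" -v ov="$2" -v hr="$3" \
    'BEGIN { printf "%.1f\n", raw / ov * (1 - hr / 100) }'
}

# Raw TB required to store DATA_TB of client data while keeping the headroom.
raw_tb_required() {  # DATA_TB OVERHEAD HEADROOM_PCT
  awk -v data="$1" -v ov="$2" -v hr="$3" \
    'BEGIN { printf "%.1f\n", data * ov / (1 - hr / 100) }'
}

# OSDs needed to add ADD_TB of raw capacity, never fewer than 3 per batch.
osds_to_add() {  # ADD_TB OSD_TB
  awk -v add="$1" -v osd="$2" 'BEGIN {
    if (add <= 0) { print 0; exit }
    n = int(add / osd)
    if (n * osd < add) n++   # round up to whole OSDs
    if (n < 3) n = 3         # batch minimum from the expansion guidance
    print n
  }'
}
```

For example, `effective_usable_tb 100 3 30` prints `23.3`, which matches the headroom example above. Suppose a 100 TB raw cluster with 3× replication must hold 40 TB of data. `raw_tb_required 40 3 30` prints `171.4`, so about 71.4 TB of raw capacity must be added. With illustrative 8 TB OSDs, `osds_to_add 71.4 8` prints `9`.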
```bash title="Confirm new OSDs are up and in" theme={null}
ceph osd tree
```

New OSDs should show a `STATUS` of `up` and a non-zero `REWEIGHT` (meaning they are `in`). The cluster begins rebalancing data automatically once the OSDs are registered.

```bash title="Watch recovery progress" theme={null}
watch ceph status
```

Rebalancing is complete when `ceph status` shows `HEALTH_OK` with no active recovery operations. Rebalancing speed depends on the amount of data to move, the cluster size, and network bandwidth.

Recovery I/O competes with client I/O. If client performance suffers during rebalancing, throttle recovery. On releases that use the mClock scheduler (the default since Ceph Quincy), select the client-favoring profile if it is not already active:

```bash title="Throttle recovery I/O" theme={null}
# Prioritize client I/O over recovery and backfill (mClock scheduler)
ceph config set osd osd_mclock_profile high_client_ops
```

On releases that use the older `wpq` scheduler, lower `osd_max_backfills` and `osd_recovery_max_active` instead. After rebalancing completes, revert the change, for example with `ceph config rm osd osd_mclock_profile`.

When rebalancing finishes, the cluster returns to `HEALTH_OK` with data distributed across all OSDs, including the new ones.

***

## Capacity Trend Monitoring

Configure XIMP alerts to notify administrators before capacity reaches critical thresholds:

| Alert             | Threshold       | XIMP Metric                   |
| ----------------- | --------------- | ----------------------------- |
| Capacity Warning  | Pool used > 70% | `xloud_storage_pool_used_pct` |
| Capacity Critical | Pool used > 80% | `xloud_storage_pool_used_pct` |
| OSD Near Full     | OSD used > 85%  | `xloud_storage_osd_used_pct`  |

Navigate to **Monitoring → Alerting → Alert Rules** in the XIMP portal and create rules that use metrics from the `xloud_storage` namespace.

***

## Next Steps

* Add OSDs and manage cluster health during expansion
* Configure XIMP alerts for capacity and health thresholds
* Add new tiers when expanding with different device classes
* Diagnose capacity-related HEALTH\_WARN states