Overview
Maintaining adequate free capacity in an XSDS cluster is critical for both performance and data safety. At high utilization, the cluster cannot complete recovery operations after OSD failures, and I/O performance degrades significantly. This page covers monitoring, thresholds, and expansion procedures.

Prerequisites
- Administrator credentials with the `admin` role
- SSH access to a cluster management node
- Access to XDeploy (https://connect.<your-domain>) for node provisioning
Capacity Thresholds
| Utilization | Status | Action Required |
|---|---|---|
| < 60% | Healthy | Monitor routinely |
| 60–70% | Watch | Begin planning expansion |
| 70–80% | Warning | Initiate expansion — order hardware |
| 80–85% | Critical | Accelerate expansion — immediate action |
| > 85% | Emergency | Risk of degraded I/O and recovery failure |
Monitoring Utilization
Dashboard
Navigate to XDeploy → Storage → Capacity for a graphical capacity overview showing per-pool and cluster-wide utilization with trend projections.

CLI
Utilization can also be checked from a cluster management node with the storage CLI.
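Assuming the cluster exposes standard Ceph tooling (this page invokes `ceph status` during expansion), for example:

```shell
# Assumes standard Ceph CLI tooling on the management node.
# Cluster-wide and per-pool utilization:
ceph df

# Per-OSD utilization, including variance across OSDs:
ceph osd df
```

In upstream Ceph, `ceph df` reports a %USED column per pool that maps directly onto the thresholds in the table above.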
Capacity Calculations
Replicated pools
For a pool with replication factor `n`, usable capacity = raw capacity / `n`.

| Raw Capacity | Replication Factor | Usable Capacity |
|---|---|---|
| 100 TB | 3 (default) | ~33 TB |
| 100 TB | 2 | ~50 TB |

Account for the 30% headroom recommendation:
- 100 TB raw, factor 3 = ~33 TB usable
- 30% headroom = ~10 TB reserved
- Effective usable = ~23 TB
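As a quick check, the arithmetic above can be scripted; the inputs are the worked example's (100 TB raw, factor 3, 30% headroom):

```shell
# Usable and effective capacity for a replicated pool (values in TB).
raw_tb=100
replicas=3
awk -v raw="$raw_tb" -v n="$replicas" 'BEGIN {
  usable = raw / n            # raw capacity divided by replica count
  headroom = usable * 0.30    # 30% reserved as recovery headroom
  printf "usable=%.1f TB headroom=%.1f TB effective=%.1f TB\n",
         usable, headroom, usable - headroom
}'
# prints: usable=33.3 TB headroom=10.0 TB effective=23.3 TB
```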
Erasure-coded pools
For an erasure code profile `k+m`, usable capacity = raw capacity × k/(k+m).

| Profile | Overhead | Usable from 100 TB |
|---|---|---|
| 4+2 | 1.5× | ~67 TB |
| 6+2 | 1.33× | ~75 TB |
| 8+3 | 1.375× | ~73 TB |
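The same calculation for erasure-coded pools, shown here for the 4+2 profile from the table:

```shell
# Usable capacity for an erasure-coded pool with profile k+m (values in TB).
raw_tb=100
k=4
m=2
awk -v raw="$raw_tb" -v k="$k" -v m="$m" 'BEGIN {
  # Overhead is (k+m)/k; usable capacity is raw * k/(k+m).
  printf "overhead=%.2fx usable=%.1f TB\n", (k + m) / k, raw * k / (k + m)
}'
# prints: overhead=1.50x usable=66.7 TB
```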
Snapshot space
Snapshots consume incremental capacity proportional to the change rate after the snapshot is taken. A volume with 10% daily churn accumulates approximately 10% of its size in snapshot data per day per snapshot retained.

Factor snapshot retention into capacity planning. For 7-day retention on a 10-TB pool with 10% daily churn, approximately 7 TB of additional snapshot space is required.
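This rough estimator (pool size × daily churn × days retained, assuming a constant churn rate) reproduces the example:

```shell
# Snapshot space estimate: pool size x daily churn x days retained (TB).
pool_tb=10
daily_churn=0.10
retention_days=7
awk -v p="$pool_tb" -v c="$daily_churn" -v d="$retention_days" \
  'BEGIN { printf "snapshot space ~ %.1f TB\n", p * c * d }'
# prints: snapshot space ~ 7.0 TB
```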
Expanding the Cluster
Deploy new OSD node via XDeploy
Navigate to XDeploy → Infrastructure → Nodes → Add Node and register the
new storage node. XDeploy configures the OS, installs storage packages, and
joins the node to the cluster.
Verify OSD integration
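Assuming standard Ceph tooling (consistent with the `ceph status` reference below), OSD state can be inspected with:

```shell
# List all OSDs with their placement in the CRUSH tree and up/down, in/out state.
ceph osd tree

# Compact summary: total OSDs, how many are up, how many are in.
ceph osd stat
```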
Confirm new OSDs are `up` and `in`. The cluster begins rebalancing data automatically once OSDs are registered.

Monitor rebalancing
Watch recovery progress
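For example, assuming the cluster exposes standard Ceph commands:

```shell
# Re-run cluster status every 10 seconds; recovery/backfill activity
# appears in the output until rebalancing completes.
watch -n 10 ceph status

# Per-OSD fill levels, useful for confirming data is flowing onto new OSDs.
ceph osd df
```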
Rebalancing is complete when `ceph status` shows `HEALTH_OK` with no active recovery operations. Rebalancing speed depends on cluster size and network bandwidth.

Recovery I/O competes with client I/O. If client performance is impacted during rebalancing, throttle recovery:
Throttle recovery I/O
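A sketch of one way to do this, assuming recovery tuning follows upstream Ceph option names (verify the option names and defaults for your release before applying):

```shell
# Reduce concurrent backfill/recovery work per OSD so client I/O takes priority.
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1

# After rebalancing completes, remove the overrides to restore defaults.
ceph config rm osd osd_max_backfills
ceph config rm osd osd_recovery_max_active
```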
The cluster returns to `HEALTH_OK` with data distributed across all OSDs, including the new ones.

Capacity Trend Monitoring
Configure XIMP alerts to proactively notify administrators before capacity reaches critical thresholds:

| Alert | Threshold | XIMP Metric |
|---|---|---|
| Capacity Warning | Pool used > 70% | xloud_storage_pool_used_pct |
| Capacity Critical | Pool used > 80% | xloud_storage_pool_used_pct |
| OSD Near Full | OSD used > 85% | xloud_storage_osd_used_pct |
All capacity metrics are published under the xloud_storage metric namespace.
Next Steps
- Cluster Management: Add OSDs and manage cluster health during expansion
- Monitoring: Configure XIMP alerts for capacity and health thresholds
- Storage Tiers: Add new tiers when expanding with different device classes
- Troubleshooting: Diagnose capacity-related HEALTH_WARN states