Overview

Maintaining adequate free capacity in an XSDS cluster is critical for both performance and data safety. At high utilization, the cluster cannot complete recovery operations after OSD failures, and I/O performance degrades significantly. This page covers monitoring, thresholds, and expansion procedures.
Administrator Access Required — This operation requires the admin role. Contact your Xloud administrator if you do not have sufficient permissions.
Prerequisites
  • Administrator credentials with the admin role
  • SSH access to a cluster management node
  • Access to XDeploy (https://connect.<your-domain>) for node provisioning

Capacity Thresholds

| Utilization | Status | Action Required |
| --- | --- | --- |
| < 60% | Healthy | Monitor routinely |
| 60–70% | Watch | Begin planning expansion |
| 70–80% | Warning | Initiate expansion: order hardware |
| 80–85% | Critical | Accelerate expansion: immediate action |
| > 85% | Emergency | Risk of degraded I/O and recovery failure |
Above 85% utilization, the cluster may refuse writes and cannot complete data recovery after OSD failures. Maintain a minimum of 30% free capacity headroom.
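For scripting or alert glue, the bands in the table above translate into a small helper. This is an illustrative sketch; only the band names and cut-offs come from the table, the function name and signature are not part of XSDS:

```python
def capacity_status(utilization_pct: float) -> str:
    """Map cluster utilization (percent) to the capacity status bands."""
    if utilization_pct < 60:
        return "Healthy"
    if utilization_pct < 70:
        return "Watch"
    if utilization_pct < 80:
        return "Warning"
    if utilization_pct <= 85:
        return "Critical"
    return "Emergency"
```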

Monitoring Utilization

Navigate to XDeploy → Storage → Capacity for a graphical capacity overview showing per-pool and cluster-wide utilization with trend projections.

Capacity Calculations

For a pool with replication factor n, usable capacity = raw capacity / n.
| Raw Capacity | Replication Factor | Usable Capacity |
| --- | --- | --- |
| 100 TB | 3 (default) | ~33 TB |
| 100 TB | 2 | ~50 TB |
Account for the 30% headroom recommendation:
  • 100 TB raw, factor 3 = ~33 TB usable
  • 30% headroom = ~10 TB reserved
  • Effective usable = ~23 TB
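The worked example above can be expressed as a small calculation. A sketch only; the function name and default headroom parameter are illustrative, with the 30% default taken from the recommendation above:

```python
def usable_capacity_tb(raw_tb: float, replication_factor: int,
                       headroom: float = 0.30) -> tuple[float, float]:
    """Return (usable, effective_usable) in TB for a replicated pool.

    usable = raw / replication_factor; effective_usable additionally
    reserves the recommended free-capacity headroom (30% by default).
    """
    usable = raw_tb / replication_factor
    effective = usable * (1 - headroom)
    return usable, effective
```

For 100 TB raw at factor 3, this yields ~33.3 TB usable and ~23.3 TB effective, matching the example.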
For an erasure code profile k+m, usable capacity = raw capacity × k/(k+m).
| Profile | Overhead | Usable from 100 TB |
| --- | --- | --- |
| 4+2 | 1.5× | ~67 TB |
| 6+2 | 1.33× | ~75 TB |
| 8+3 | 1.375× | ~73 TB |
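The erasure-code formula reduces to a one-liner; the helper below is an illustrative sketch, not part of any XSDS tooling:

```python
def ec_usable_tb(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity for an erasure-coded pool with profile k+m.

    usable = raw * k / (k + m), per the formula above.
    """
    return raw_tb * k / (k + m)
```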
Snapshots consume incremental capacity proportional to the rate of change after each snapshot is taken. A volume with 10% daily churn accumulates approximately 10% of its size in snapshot data per day for each snapshot retained. Factor snapshot retention into capacity planning: with 7-day retention on a 10-TB pool at 10% daily churn, expect approximately 7 TB of additional snapshot space.
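The snapshot estimate follows the same pattern (an illustrative sketch; this simple linear model assumes non-overlapping churn, as in the example above):

```python
def snapshot_space_tb(pool_tb: float, daily_churn: float,
                      retention_days: int) -> float:
    """Approximate extra capacity consumed by retained snapshots.

    Assumes each day's churn (as a fraction, e.g. 0.10 for 10%)
    is retained once per snapshot for the full retention window.
    """
    return pool_tb * daily_churn * retention_days
```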

Expanding the Cluster

Deploy new OSD node via XDeploy

Navigate to XDeploy → Infrastructure → Nodes → Add Node and register the new storage node. XDeploy configures the OS, installs storage packages, and joins the node to the cluster.
Add at least 3 OSDs per expansion batch to ensure balanced data distribution across the cluster. Adding a single OSD may cause temporary imbalance.

Verify OSD integration

Confirm new OSDs are up and in
ceph osd tree
New OSDs should report both up and in. The cluster begins rebalancing data automatically once OSDs are registered.

Monitor rebalancing

Watch recovery progress
watch ceph status
Rebalancing completes when ceph status shows HEALTH_OK with no active recovery operations. Rebalancing speed depends on cluster size and network bandwidth.
Recovery I/O competes with client I/O. If client performance is impacted during rebalancing, throttle recovery:
Throttle recovery I/O
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_sleep 0.1
Cluster returns to HEALTH_OK with data distributed across all OSDs including new ones.
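The completion check can be scripted against the JSON form of ceph status. A hedged sketch: the JSON field names (health.status, pgmap.pgs_by_state, state_name) match upstream Ceph releases but should be verified against your XSDS version, and the function names here are illustrative:

```python
import json
import subprocess

def rebalance_complete(status: dict) -> bool:
    """True when health is HEALTH_OK and no PGs are recovering or backfilling.

    `status` is the parsed output of `ceph status --format json`.
    """
    if status.get("health", {}).get("status") != "HEALTH_OK":
        return False
    pgs = status.get("pgmap", {}).get("pgs_by_state", [])
    busy = ("recovering", "backfilling", "remapped")
    return not any(s in pg.get("state_name", "")
                   for pg in pgs for s in busy)

def poll_cluster() -> bool:
    """Query the live cluster once and evaluate rebalance state."""
    raw = subprocess.check_output(["ceph", "status", "--format", "json"])
    return rebalance_complete(json.loads(raw))
```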

Capacity Trend Monitoring

Configure XIMP alerts to proactively notify administrators before capacity reaches critical thresholds:
| Alert | Threshold | XIMP Metric |
| --- | --- | --- |
| Capacity Warning | Pool used > 70% | xloud_storage_pool_used_pct |
| Capacity Critical | Pool used > 80% | xloud_storage_pool_used_pct |
| OSD Near Full | OSD used > 85% | xloud_storage_osd_used_pct |
Navigate to Monitoring → Alerting → Alert Rules in the XIMP portal and create rules sourcing from the xloud_storage metric namespace.

Next Steps

Cluster Management

Add OSDs and manage cluster health during expansion

Monitoring

Configure XIMP alerts for capacity and health thresholds

Storage Tiers

Add new tiers when expanding with different device classes

Troubleshooting

Diagnose capacity-related HEALTH_WARN states