Overview

Maintaining adequate free capacity in an XSDS cluster is critical for both performance and data safety. At high utilization, the cluster cannot complete recovery operations after OSD failures, and I/O performance degrades significantly. This page covers monitoring, thresholds, and expansion procedures.
Administrator Access Required — This operation requires the admin role. Contact your Xloud administrator if you do not have sufficient permissions.
Prerequisites
  • Administrator credentials with the admin role
  • SSH access to a cluster management node
  • Access to XDeploy (https://connect.<your-domain>) for node provisioning

Capacity Thresholds

| Utilization | Status | Action Required |
| --- | --- | --- |
| < 60% | Healthy | Monitor routinely |
| 60–70% | Watch | Begin planning expansion |
| 70–80% | Warning | Initiate expansion: order hardware |
| 80–85% | Critical | Accelerate expansion: immediate action |
| > 85% | Emergency | Risk of degraded I/O and recovery failure |
Above 85% utilization, the cluster may refuse writes and cannot complete data recovery after OSD failures. Maintain a minimum of 30% free capacity headroom.
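For scripting or alert glue, the bands in the table above translate into a small helper. This is an illustrative sketch; only the band names and cut-offs come from the table, the function name and signature are not part of XSDS:

```python
def capacity_status(utilization_pct: float) -> str:
    """Map cluster utilization (percent) to the capacity status bands."""
    if utilization_pct < 60:
        return "Healthy"
    if utilization_pct < 70:
        return "Watch"
    if utilization_pct < 80:
        return "Warning"
    if utilization_pct <= 85:
        return "Critical"
    return "Emergency"
```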

Monitoring Utilization

Navigate to XDeploy → Storage → Capacity for a graphical capacity overview showing per-pool and cluster-wide utilization with trend projections.

Capacity Calculations

For a pool with replication factor n, usable capacity = raw capacity / n.
| Raw Capacity | Replication Factor | Usable Capacity |
| --- | --- | --- |
| 100 TB | 3 (default) | ~33 TB |
| 100 TB | 2 | ~50 TB |
Account for the 30% headroom recommendation:
  • 100 TB raw, factor 3 = ~33 TB usable
  • 30% headroom = ~10 TB reserved
  • Effective usable = ~23 TB
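The worked example above can be expressed as a small calculation. A sketch only; the function name and default headroom parameter are illustrative, with the 30% default taken from the recommendation above:

```python
def usable_capacity_tb(raw_tb: float, replication_factor: int,
                       headroom: float = 0.30) -> tuple[float, float]:
    """Return (usable, effective_usable) in TB for a replicated pool.

    usable = raw / replication_factor; effective_usable additionally
    reserves the recommended free-capacity headroom (30% by default).
    """
    usable = raw_tb / replication_factor
    effective = usable * (1 - headroom)
    return usable, effective
```

For 100 TB raw at factor 3, this yields ~33.3 TB usable and ~23.3 TB effective, matching the example.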
For an erasure code profile k+m, usable capacity = raw capacity × k/(k+m).
| Profile | Overhead | Usable from 100 TB |
| --- | --- | --- |
| 4+2 | 1.5× | ~67 TB |
| 6+2 | 1.33× | ~75 TB |
| 8+3 | 1.375× | ~73 TB |
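The erasure-code formula reduces to a one-liner; the helper below is an illustrative sketch, not part of any XSDS tooling:

```python
def ec_usable_tb(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity for an erasure-coded pool with profile k+m.

    usable = raw * k / (k + m), per the formula above.
    """
    return raw_tb * k / (k + m)
```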
Snapshots consume incremental capacity proportional to the rate of change after each snapshot is taken. A volume with 10% daily churn accumulates approximately 10% of its size in snapshot data per day for each snapshot retained. Factor snapshot retention into capacity planning: with 7-day retention on a 10-TB pool at 10% daily churn, expect approximately 7 TB of additional snapshot space.
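The snapshot estimate follows the same pattern (an illustrative sketch; this simple linear model assumes non-overlapping churn, as in the example above):

```python
def snapshot_space_tb(pool_tb: float, daily_churn: float,
                      retention_days: int) -> float:
    """Approximate extra capacity consumed by retained snapshots.

    Assumes each day's churn (as a fraction, e.g. 0.10 for 10%)
    is retained once per snapshot for the full retention window.
    """
    return pool_tb * daily_churn * retention_days
```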

Expanding the Cluster

Deploy new OSD node via XDeploy

Navigate to XDeploy → Infrastructure → Nodes → Add Node and register the new storage node. XDeploy configures the OS, installs storage packages, and joins the node to the cluster.
Add at least 3 OSDs per expansion batch to ensure balanced data distribution across the cluster. Adding a single OSD may cause temporary imbalance.

Verify OSD integration

Confirm new OSDs are up and in
ceph osd tree
New OSDs should report both up and in. The cluster begins rebalancing data automatically once OSDs are registered.

Monitor rebalancing

Watch recovery progress
watch ceph status
Rebalancing completes when ceph status shows HEALTH_OK with no active recovery operations. Rebalancing speed depends on cluster size and network bandwidth.
Recovery I/O competes with client I/O. If client performance is impacted during rebalancing, throttle recovery:
Throttle recovery I/O
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_sleep 0.1
Cluster returns to HEALTH_OK with data distributed across all OSDs including new ones.
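The completion check can be scripted against the JSON form of ceph status. A hedged sketch: the JSON field names (health.status, pgmap.pgs_by_state, state_name) match upstream Ceph releases but should be verified against your XSDS version, and the function names here are illustrative:

```python
import json
import subprocess

def rebalance_complete(status: dict) -> bool:
    """True when health is HEALTH_OK and no PGs are recovering or backfilling.

    `status` is the parsed output of `ceph status --format json`.
    """
    if status.get("health", {}).get("status") != "HEALTH_OK":
        return False
    pgs = status.get("pgmap", {}).get("pgs_by_state", [])
    busy = ("recovering", "backfilling", "remapped")
    return not any(s in pg.get("state_name", "")
                   for pg in pgs for s in busy)

def poll_cluster() -> bool:
    """Query the live cluster once and evaluate rebalance state."""
    raw = subprocess.check_output(["ceph", "status", "--format", "json"])
    return rebalance_complete(json.loads(raw))
```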

Capacity Trend Monitoring

Configure XIMP alerts to proactively notify administrators before capacity reaches critical thresholds:
| Alert | Threshold | XIMP Metric |
| --- | --- | --- |
| Capacity Warning | Pool used > 70% | xloud_storage_pool_used_pct |
| Capacity Critical | Pool used > 80% | xloud_storage_pool_used_pct |
| OSD Near Full | OSD used > 85% | xloud_storage_osd_used_pct |
Navigate to Monitoring → Alerting → Alert Rules in the XIMP portal and create rules sourcing from the xloud_storage metric namespace.

Next Steps

Cluster Management

Add OSDs and manage cluster health during expansion

Monitoring

Configure XIMP alerts for capacity and health thresholds

Storage Tiers

Add new tiers when expanding with different device classes

Troubleshooting

Diagnose capacity-related HEALTH_WARN states