Overview
XSDS cluster health and performance data is exported to XIMP for centralized monitoring and alerting. This page covers the key metrics to monitor, recommended alert thresholds, and how to configure the integration.
Prerequisites
- Administrator credentials with the admin role
- XIMP deployed and accessible (see XIMP Admin Guide)
- Metric scrape target configured for the XSDS cluster metrics endpoint
Key Metrics and Alert Thresholds
| Metric | Namespace | Alert Threshold | Action |
|---|---|---|---|
| Cluster health | xloud_storage_health | HEALTH_WARN | Investigate immediately |
| OSD down count | xloud_storage_osd_down | > 0 | Replace or recover failed OSD |
| Pool used % | xloud_storage_pool_used_pct | > 70% | Plan capacity expansion |
| Recovery I/O rate | xloud_storage_recovery_bytes_sec | Sustained > 200 MB/s | Consider I/O throttling |
| PG inconsistent count | xloud_storage_pg_inconsistent | > 0 | Run `ceph health detail` and repair |
| Replication lag (RGW) | xloud_storage_rgw_sync_lag_sec | Sustained > 30s | Check network bandwidth to RGW |
| OSD apply latency | xloud_storage_osd_apply_latency_ms | > 20 ms | Investigate OSD or disk health |
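The thresholds in the table can be checked against a scrape of the metrics endpoint. The following is a minimal Python sketch, not an XIMP feature. It assumes the endpoint serves Prometheus-style text exposition (`name{labels} value` lines), that `xloud_storage_health` is encoded numerically as 0 = OK, 1 = WARN, 2 = ERR, and that "200 MB/s" means 200 × 10^6 bytes per second; none of these are confirmed on this page. The function names are illustrative. It compares instantaneous values only, so the "sustained" conditions would still need a time window in a real alert rule.

```python
import math
import re

# Limits copied from the table above; a value strictly above the limit is a breach.
# ASSUMPTION: xloud_storage_health is numeric (0 = OK, 1 = WARN, 2 = ERR).
# ASSUMPTION: 200 MB/s is interpreted as 200 * 10^6 bytes per second.
THRESHOLDS = {
    "xloud_storage_health": (0, "Investigate immediately"),
    "xloud_storage_osd_down": (0, "Replace or recover failed OSD"),
    "xloud_storage_pool_used_pct": (70, "Plan capacity expansion"),
    "xloud_storage_recovery_bytes_sec": (200 * 1000 * 1000, "Consider I/O throttling"),
    "xloud_storage_pg_inconsistent": (0, "Run ceph health detail and repair"),
    "xloud_storage_rgw_sync_lag_sec": (30, "Check network bandwidth to RGW"),
    "xloud_storage_osd_apply_latency_ms": (20, "Investigate OSD or disk health"),
}

# Simplified sample-line pattern: metric name, optional {labels}, then the value.
# It does not handle '}' characters inside quoted label values.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_samples(text):
    """Return {metric_name: highest value seen across all label sets}."""
    worst = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        try:
            value = float(match.group(3))
        except ValueError:
            continue
        if math.isnan(value):
            continue
        name = match.group(1)
        worst[name] = max(value, worst.get(name, value))
    return worst


def breaches(samples):
    """Return (metric, value, action) for every threshold that is exceeded."""
    found = []
    for name, (limit, action) in THRESHOLDS.items():
        value = samples.get(name)
        if value is not None and value > limit:
            found.append((name, value, action))
    return found


if __name__ == "__main__":
    # Made-up scrape output for illustration only.
    example = """\
xloud_storage_health 0
xloud_storage_osd_down 1
xloud_storage_pool_used_pct{pool="volumes"} 64.5
xloud_storage_pool_used_pct{pool="images"} 82
"""
    for name, value, action in breaches(parse_samples(example)):
        print(f"{name} = {value}: {action}")
```

Taking the highest value per metric name means a single full pool or slow OSD is enough to trigger the check, which matches how the per-pool and per-OSD rows in the table are meant to be read.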
Configuring XIMP Integration
Verify metrics endpoint
The XSDS cluster exposes metrics on the management node. Verify the endpoint is reachable:
Check metrics endpoint
Port 9283 is the default metrics exporter port deployed by XDeploy.
Add scrape target in XIMP
Navigate to Monitoring → Administration → Scrape Targets → Add Target:
| Field | Value |
|---|---|
| URL | http://<MGMT_NODE_IP>:9283/metrics |
| Scrape Interval | 60s (storage metrics don’t need sub-minute resolution) |
| Labels | service=xsds, cluster=<CLUSTER_NAME> |
Or via CLI:
Add XSDS scrape target
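Before registering the target, it can help to confirm that the URL in the table actually serves XSDS metrics. The sketch below is a minimal Python example using only the standard library. The function names, the dictionary shape returned by `target_definition`, and the check for lines starting with `xloud_storage_` (taken from the Namespace column earlier on this page) are illustrative assumptions, not part of XSDS or XIMP. The address `192.0.2.10` is a documentation placeholder for `<MGMT_NODE_IP>`.

```python
import http.client
import urllib.error
import urllib.request

DEFAULT_PORT = 9283  # default exporter port named on this page


def metrics_url(host, port=DEFAULT_PORT):
    """Build the scrape URL in the form shown in the table."""
    return f"http://{host}:{port}/metrics"


def target_definition(host, cluster_name, port=DEFAULT_PORT):
    """Collect the table's field values in one place (hypothetical shape)."""
    return {
        "url": metrics_url(host, port),
        "scrape_interval": "60s",
        "labels": {"service": "xsds", "cluster": cluster_name},
    }


def looks_like_metrics(body):
    """True if any line starts with the xloud_storage_ prefix (assumed prefix)."""
    return any(line.startswith("xloud_storage_") for line in body.splitlines())


def probe(url, timeout=5.0):
    """Return True if the URL answers HTTP 200 with XSDS-looking metrics."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and looks_like_metrics(body)
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, DNS failures, and HTTP errors.
        return False


if __name__ == "__main__":
    # Placeholder values; replace with the real management node and cluster name.
    target = target_definition("192.0.2.10", "example-cluster")
    print(target)
    # To test reachability from a host that can reach the management node:
    # print(probe(target["url"]))
```

A `False` result from `probe` does not distinguish between a network problem and an exporter that is running but returns no XSDS metrics, so a failed check is a prompt to investigate rather than a diagnosis.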
Built-In Dashboards
The XIMP portal includes pre-built XSDS dashboards. Navigate to Monitoring → Dashboards and search for “XSDS” or “Storage”:
| Dashboard | Shows |
|---|---|
| XSDS Cluster Overview | Health state, OSD counts, capacity, recovery activity |
| XSDS Pool Utilization | Per-pool used %, available bytes, PG counts |
| XSDS OSD Performance | Per-OSD latency, IOPS, throughput |
| XSDS Recovery | Active recovery operations, estimated completion time |
Cluster CLI Health Check
For quick health checks without opening the XIMP portal, use the management CLI directly from a cluster node:
- Quick health overview
- OSD performance snapshot
- Pool I/O statistics (5-second window)
- Active slow requests
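A quick health overview can also be scripted. The sketch below is a minimal Python example that assumes the management CLI is the Ceph CLI, as the `ceph health detail` reference in the thresholds table suggests, and that `ceph status --format json` returns a `health` object with `status` and `checks` fields. If your XSDS release wraps the CLI under a different name, adjust the command accordingly. The function names are illustrative.

```python
import json
import subprocess


def read_status():
    """Run the CLI on a cluster node and parse its JSON output.

    ASSUMPTION: the management CLI is `ceph` and supports --format json.
    """
    result = subprocess.run(
        ["ceph", "status", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def summarize_health(status):
    """Turn the parsed status into a few human-readable lines."""
    health = status.get("health", {})
    lines = [f"cluster health: {health.get('status', 'UNKNOWN')}"]
    for code, detail in sorted(health.get("checks", {}).items()):
        severity = detail.get("severity", "?")
        message = detail.get("summary", {}).get("message", "")
        lines.append(f"  {code} [{severity}] {message}")
    return lines


if __name__ == "__main__":
    # Hand-written sample in the assumed shape, so the sketch runs without a cluster.
    sample = {
        "health": {
            "status": "HEALTH_WARN",
            "checks": {
                "OSD_DOWN": {
                    "severity": "HEALTH_WARN",
                    "summary": {"message": "1 osds down"},
                }
            },
        }
    }
    for line in summarize_health(sample):
        print(line)
    # On a cluster node, replace `sample` with read_status().
```

Keeping the parsing separate from the subprocess call makes the summary logic easy to test against saved output, and it keeps the cluster-specific command in one place.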
Next Steps
XIMP Admin Guide
Configure the monitoring platform that collects and displays XSDS metrics
Capacity Planning
Use utilization metrics to plan cluster expansion before thresholds are reached
Troubleshooting
Diagnose the issues surfaced by monitoring alerts
XIMP Alert Rules
Configure notification channels for storage health alerts