Overview

XSDS cluster health and performance data is exported to XIMP for centralized monitoring and alerting. This page covers the key metrics to monitor, recommended alert thresholds, and how to configure the integration.
Administrator Access Required — This operation requires the admin role. Contact your Xloud administrator if you do not have sufficient permissions.
Prerequisites
  • Administrator credentials with the admin role
  • XIMP deployed and accessible (see XIMP Admin Guide)
  • Metric scrape target configured for the XSDS cluster metrics endpoint

Key Metrics and Alert Thresholds

| Metric | Namespace | Alert Threshold | Action |
|---|---|---|---|
| Cluster health | xloud_storage_health | HEALTH_WARN | Investigate immediately |
| OSD down count | xloud_storage_osd_down | > 0 | Replace or recover failed OSD |
| Pool used % | xloud_storage_pool_used_pct | > 70% | Plan capacity expansion |
| Recovery I/O rate | xloud_storage_recovery_bytes_sec | Sustained > 200 MB/s | Consider I/O throttling |
| PG inconsistent count | xloud_storage_pg_inconsistent | > 0 | Run ceph health detail and repair |
| Replication lag (RGW) | xloud_storage_rgw_sync_lag_sec | Sustained > 30s | Check network bandwidth to RGW |
| OSD apply latency | xloud_storage_osd_apply_latency_ms | > 20 ms | Investigate OSD or disk health |
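
The numeric thresholds in the table can be expressed as a small lookup and checked against a single reading. This is a minimal sketch, not part of XIMP: the `THRESHOLDS` dictionary and the `breached` function are hypothetical names, the 200 MB/s limit assumes decimal megabytes, and the "sustained" qualifier and the non-numeric cluster health state are not modeled.

```python
# Thresholds transcribed from the table above (numeric rows only).
# Assumption: 200 MB/s is interpreted as 200 * 10^6 bytes per second.
THRESHOLDS = {
    "xloud_storage_osd_down": 0,
    "xloud_storage_pool_used_pct": 70,
    "xloud_storage_recovery_bytes_sec": 200 * 1000 * 1000,
    "xloud_storage_pg_inconsistent": 0,
    "xloud_storage_rgw_sync_lag_sec": 30,
    "xloud_storage_osd_apply_latency_ms": 20,
}


def breached(sample):
    """Return the sorted metric names whose value exceeds its threshold.

    `sample` maps metric names to one numeric reading each. Metrics that
    are not listed in THRESHOLDS are ignored.
    """
    return sorted(
        name
        for name, value in sample.items()
        if name in THRESHOLDS and value > THRESHOLDS[name]
    )


# Illustrative readings, not real cluster data.
sample = {
    "xloud_storage_osd_down": 0,
    "xloud_storage_pool_used_pct": 82.5,
    "xloud_storage_osd_apply_latency_ms": 12,
}
print(breached(sample))
```

A check like this only looks at one point in time. The alert rules configured in XIMP remain the place to express conditions that must hold over a period.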

Configuring XIMP Integration

Verify metrics endpoint

The XSDS cluster exposes metrics on the management node. Verify the endpoint is reachable:
Check metrics endpoint
curl -sS http://<MGMT_NODE_IP>:9283/metrics | head -20
Port 9283 is the default port of the metrics exporter that XDeploy deploys.
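
To confirm that the XSDS metrics are present in the response, the output can be filtered by name. The sketch below assumes the endpoint serves the Prometheus text exposition format (one `name{labels} value` sample per line, comments starting with `#`) and that samples carry no trailing timestamp. The page does not state either of these. `parse_metrics`, the sample payload, and the `pool` label are hypothetical.

```python
def parse_metrics(text, prefix="xloud_storage_"):
    """Collect (series, value) pairs for sample lines that start with `prefix`.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. The value is
    taken as the text after the last space, so label values that contain
    spaces do not break the split.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        if series.startswith(prefix):
            samples.append((series, float(value)))
    return samples


# Illustrative payload, not captured from a real cluster.
SAMPLE = """\
# HELP xloud_storage_osd_down Number of OSDs that are down
# TYPE xloud_storage_osd_down gauge
xloud_storage_osd_down 0
xloud_storage_pool_used_pct{pool="volumes"} 41.5
process_cpu_seconds_total 12.3
"""

for series, value in parse_metrics(SAMPLE):
    print(series, value)
```

In practice the text passed to `parse_metrics` would be the body returned by the curl command above. An empty result suggests the exporter is reachable but is not publishing the expected metric names.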

Add scrape target in XIMP

Navigate to Monitoring → Administration → Scrape Targets → Add Target:
| Field | Value |
|---|---|
| URL | http://<MGMT_NODE_IP>:9283/metrics |
| Scrape Interval | 60s (storage metrics don’t need sub-minute resolution) |
| Labels | service=xsds, cluster=<CLUSTER_NAME> |
Or via CLI:
Add XSDS scrape target
ximp target add \
  --url http://<MGMT_NODE_IP>:9283/metrics \
  --interval 60s \
  --label service=xsds \
  --label cluster=<CLUSTER_NAME>
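
When several clusters need scrape targets, the command line can be generated rather than typed by hand. The following is a minimal sketch that only assembles the command string from the flags shown above. It does not run anything, and the function name `target_add_command`, the IP address, and the cluster name are illustrative assumptions.

```python
import shlex


def target_add_command(mgmt_ip, cluster, interval="60s", port=9283):
    """Build the `ximp target add` command line for one XSDS cluster.

    Only the flags that appear in the CLI example are used. The result is
    shell-quoted so it can be pasted into a terminal.
    """
    args = [
        "ximp", "target", "add",
        "--url", f"http://{mgmt_ip}:{port}/metrics",
        "--interval", interval,
        "--label", "service=xsds",
        "--label", f"cluster={cluster}",
    ]
    return shlex.join(args)


# Hypothetical management node address and cluster name.
print(target_add_command("10.0.0.5", "prod-storage"))
```

Reviewing the generated command before running it keeps the scrape interval and labels consistent across clusters.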

Create alert rules

Navigate to Monitoring → Alerting → Alert Rules and create rules for each threshold in the table above.
Example alert rule for pool utilization:
alert-storage-capacity.yaml
name: xsds-pool-capacity-warning
metric: xloud_storage_pool_used_pct
condition: ">"
threshold: 70
evaluation_period: 10m
severity: warning
notification_channels:
  - ops-email
Alert rules appear in the Active Rules list and evaluate against live storage metrics.
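
Because one rule is needed per threshold, the rule files can be generated from a short list. The sketch below renders rules in the same field layout as the example above. Only `xsds-pool-capacity-warning` comes from this page. The other rule names are hypothetical, and reusing the `10m` evaluation period, `warning` severity, and `ops-email` channel for every rule is an assumption to adjust per metric.

```python
# (rule name, metric, threshold) taken from the thresholds table.
# Rule names other than the first are hypothetical.
RULES = [
    ("xsds-pool-capacity-warning", "xloud_storage_pool_used_pct", 70),
    ("xsds-osd-down", "xloud_storage_osd_down", 0),
    ("xsds-osd-apply-latency", "xloud_storage_osd_apply_latency_ms", 20),
]


def render_rule(name, metric, threshold, period="10m",
                severity="warning", channels=("ops-email",)):
    """Render one alert rule as text, mirroring the example file's fields."""
    lines = [
        f"name: {name}",
        f"metric: {metric}",
        'condition: ">"',
        f"threshold: {threshold}",
        f"evaluation_period: {period}",
        f"severity: {severity}",
        "notification_channels:",
    ]
    lines += [f"  - {channel}" for channel in channels]
    return "\n".join(lines) + "\n"


for rule in RULES:
    print(render_rule(*rule))
```

Rendering the first entry reproduces the example rule, which gives a quick check that the generated files match the expected layout.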

Built-In Dashboards

The XIMP portal includes pre-built XSDS dashboards. Navigate to Monitoring → Dashboards and search for “XSDS” or “Storage”:
| Dashboard | Shows |
|---|---|
| XSDS Cluster Overview | Health state, OSD counts, capacity, recovery activity |
| XSDS Pool Utilization | Per-pool used %, available bytes, PG counts |
| XSDS OSD Performance | Per-OSD latency, IOPS, throughput |
| XSDS Recovery | Active recovery operations, estimated completion time |
Pin the “XSDS Cluster Overview” dashboard to your XIMP home screen for constant visibility during on-call rotations.

Cluster CLI Health Check

For quick health checks without opening the XIMP portal, use the management CLI directly from a cluster node:
Quick health overview
ceph status
OSD performance snapshot
ceph osd perf
Pool I/O statistics (5-second window)
ceph osd pool stats
Active slow requests
ceph health detail | grep -i slow
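
For scripted checks, `ceph status --format json` produces machine-readable output that is easier to inspect than the text view. The sketch below pulls out the overall health state and the names of active health checks. It assumes the `health.status` and `health.checks` layout used by recent Ceph releases, which may differ on the version XSDS ships. `summarize_health` and the trimmed sample document are illustrative.

```python
import json


def summarize_health(status_json):
    """Return (overall status, sorted active check names) from status JSON.

    Missing fields fall back to "UNKNOWN" and an empty list, so unexpected
    output does not raise a KeyError.
    """
    health = json.loads(status_json).get("health", {})
    status = health.get("status", "UNKNOWN")
    checks = sorted(health.get("checks", {}))
    return status, checks


# Trimmed, illustrative document, not output from a real cluster.
SAMPLE_STATUS = """
{
  "health": {
    "status": "HEALTH_WARN",
    "checks": {
      "OSD_DOWN": {
        "severity": "HEALTH_WARN",
        "summary": {"message": "1 osds down"}
      }
    }
  }
}
"""

status, checks = summarize_health(SAMPLE_STATUS)
print(status, checks)
```

On a cluster node the input would come from running `ceph status --format json` and passing its standard output to the function.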

Next Steps

  • XIMP Admin Guide: Configure the monitoring platform that collects and displays XSDS metrics
  • Capacity Planning: Use utilization metrics to plan cluster expansion before thresholds are reached
  • Troubleshooting: Diagnose the issues surfaced by monitoring alerts
  • XIMP Alert Rules: Configure notification channels for storage health alerts