Monitoring

Overview

XSDS cluster health and performance data is exported to XIMP for centralized monitoring and alerting. This page covers the key metrics to monitor, recommended alert thresholds, and how to configure the integration.

Administrator Access Required — This operation requires the admin role. Contact your Xloud administrator if you do not have sufficient permissions.

Prerequisites

Administrator credentials with the admin role
XIMP deployed and accessible (see XIMP Admin Guide)
Metric scrape target configured for the XSDS cluster metrics endpoint

Key Metrics and Alert Thresholds

Metric	Namespace	Alert Threshold	Action
Cluster health	`xloud_storage_health`	`HEALTH_WARN` or `HEALTH_ERR`	Investigate immediately
OSD `down` count	`xloud_storage_osd_down`	> 0	Replace or recover failed OSD
Pool used %	`xloud_storage_pool_used_pct`	> 70%	Plan capacity expansion
Recovery I/O rate	`xloud_storage_recovery_bytes_sec`	Sustained > 200 MB/s	Consider I/O throttling
PG `inconsistent` count	`xloud_storage_pg_inconsistent`	> 0	Run `ceph health detail` and repair
Replication lag (RGW)	`xloud_storage_rgw_sync_lag_sec`	Sustained > 30s	Check network bandwidth between RGW sites
OSD apply latency	`xloud_storage_osd_apply_latency_ms`	> 20 ms	Investigate OSD or disk health
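Before any alert rules exist, you can spot-check the numeric thresholds in the table against a one-off metrics snapshot. The following is a minimal shell/awk sketch and is not part of XSDS or XIMP. The helper name `check_thresholds` is hypothetical. It assumes the metrics are exposed under the names in the Namespace column in Prometheus text format (`name{labels} value`, with no spaces inside label values). It also reads 200 MB/s as 200 MiB/s and compares single samples, so the "Sustained" qualifiers are not modelled. The non-numeric cluster health row is left out.

```shell
# Hypothetical helper: flag samples above the numeric thresholds from the
# Key Metrics table. Assumes Prometheus text format, one sample per line:
#   metric_name{optional="labels"} value [timestamp]
# "# HELP" / "# TYPE" lines are skipped. "Sustained" conditions are NOT
# modelled: every sample is compared on its own.
# Exit status: 0 = no breaches, 1 = at least one breach.
check_thresholds() {
  awk '
    BEGIN {
      # metric name -> threshold (alert when value > threshold)
      limit["xloud_storage_osd_down"]             = 0
      limit["xloud_storage_pool_used_pct"]        = 70
      limit["xloud_storage_recovery_bytes_sec"]   = 200 * 1024 * 1024  # assumed: MiB/s
      limit["xloud_storage_pg_inconsistent"]      = 0
      limit["xloud_storage_rgw_sync_lag_sec"]     = 30
      limit["xloud_storage_osd_apply_latency_ms"] = 20
      breaches = 0
    }
    /^#/ || NF < 2 { next }
    {
      name = $1
      sub(/[{].*/, "", name)   # strip the label set, if any
      if ((name in limit) && ($2 + 0) > limit[name]) {
        print "ALERT " $1 " = " $2 " (threshold " limit[name] ")"
        breaches++
      }
    }
    END { exit (breaches > 0) }
  '
}

# Example (assumes the names above are present in the scraped payload):
#   curl -s "http://<MGMT_NODE_IP>:9283/metrics" | check_thresholds
```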

Configuring XIMP Integration

Verify metrics endpoint

The XSDS cluster exposes metrics on the management node. Verify the endpoint is reachable:

Check metrics endpoint

curl -s http://<MGMT_NODE_IP>:9283/metrics | head -20

Port 9283 is the default metrics exporter port deployed by XDeploy.
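You can go one step beyond reading the first 20 lines by checking the response for Prometheus-style `# TYPE` lines, one per metric family. This sketch assumes the exporter on port 9283 serves the Prometheus text exposition format. `summarize_metrics` is a hypothetical helper name. A non-zero exit status means no metric families were found, which is what an HTML error page or an empty reply would produce.

```shell
# Hypothetical helper: summarise a Prometheus-format metrics payload on stdin.
# Counts metric families (one "# TYPE" line each) and returns non-zero when
# there are none, e.g. for an empty reply or an HTML error page.
summarize_metrics() {
  awk '
    $1 == "#" && $2 == "TYPE" { families++ }
    END {
      printf "metric families: %d\n", families
      exit (families == 0)
    }
  '
}

# Example (requires network access to the management node):
#   curl -sSf --max-time 10 "http://<MGMT_NODE_IP>:9283/metrics" | summarize_metrics
```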

Add scrape target in XIMP

Navigate to Monitoring → Administration → Scrape Targets → Add Target:

Field	Value
URL	`http://<MGMT_NODE_IP>:9283/metrics`
Scrape Interval	`60s` (storage metrics don’t need sub-minute resolution)
Labels	`service=xsds`, `cluster=<CLUSTER_NAME>`

Or via CLI:

Add XSDS scrape target

ximp target add \
  --url http://<MGMT_NODE_IP>:9283/metrics \
  --interval 60s \
  --label service=xsds \
  --label cluster=prod-storage
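If several XSDS clusters report to the same XIMP instance, you can repeat the CLI command once per cluster with a different `cluster` label. The sketch below wraps it in a loop and uses only the `ximp target add` flags shown above. The helper name `add_xsds_targets`, the `name=ip` argument format, and the `DRY_RUN` switch are assumptions for illustration. By default the script prints the commands so you can review them before anything is registered.

```shell
# Hypothetical helper: register one XSDS scrape target per cluster in XIMP.
# Each argument is a "<cluster_name>=<mgmt_node_ip>" pair. Only the
# ximp target add flags documented on this page are used. With DRY_RUN=1
# (the default) the commands are printed for review instead of executed.
add_xsds_targets() {
  local pair name ip
  for pair in "$@"; do
    name=${pair%%=*}
    ip=${pair#*=}
    # Reject entries without "=" or with an empty name or address.
    if [ "$name" = "$pair" ] || [ -z "$name" ] || [ -z "$ip" ]; then
      echo "skipping malformed entry: $pair" >&2
      continue
    fi
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "ximp target add --url http://${ip}:9283/metrics --interval 60s --label service=xsds --label cluster=${name}"
    else
      ximp target add \
        --url "http://${ip}:9283/metrics" \
        --interval 60s \
        --label service=xsds \
        --label "cluster=${name}" || return 1
    fi
  done
  return 0
}

# Example (cluster names and addresses are placeholders):
#   add_xsds_targets prod-storage=<MGMT_NODE_IP> dr-storage=<DR_MGMT_NODE_IP>
#   DRY_RUN=0 add_xsds_targets prod-storage=<MGMT_NODE_IP>
```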

Create alert rules

Navigate to Monitoring → Alerting → Alert Rules and create a rule for each threshold in the table above. Example alert rule for pool utilization:

alert-storage-capacity.yaml

name: xsds-pool-capacity-warning
metric: xloud_storage_pool_used_pct
condition: ">"
threshold: 70
evaluation_period: 10m
severity: warning
notification_channels:
  - ops-email

Alert rules appear in the Active Rules list and are evaluated against live storage metrics.
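The example rule covers only pool utilization. The sketch below prints one rule for each numeric row of the Key Metrics table, using the same field layout as `alert-storage-capacity.yaml`. The rule names and severities are assumptions extrapolated from that single example, not product defaults. The same applies to reusing `evaluation_period: 10m` and the `ops-email` channel for every rule. `evaluation_period` is where you would tune the "Sustained" conditions. The recovery threshold is written in bytes per second, assuming 200 MB/s means 200 MiB/s. This page does not document a command for importing rule files, so treat the output as a reference when filling in the Alert Rules form.

```shell
# Hypothetical helper: print one alert rule per numeric threshold in the
# Key Metrics table, in the field layout of alert-storage-capacity.yaml.
# Names, severities, the 10m period and the ops-email channel are assumptions.
emit_rule() {
  # $1 = rule name, $2 = metric, $3 = threshold, $4 = severity
  cat <<EOF
---
name: $1
metric: $2
condition: ">"
threshold: $3
evaluation_period: 10m
severity: $4
notification_channels:
  - ops-email
EOF
}

emit_xsds_rules() {
  emit_rule xsds-osd-down              xloud_storage_osd_down             0         critical
  emit_rule xsds-pool-capacity-warning xloud_storage_pool_used_pct        70        warning
  emit_rule xsds-recovery-rate-high    xloud_storage_recovery_bytes_sec   209715200 warning  # 200 MiB/s (assumed unit)
  emit_rule xsds-pg-inconsistent       xloud_storage_pg_inconsistent      0         critical
  emit_rule xsds-rgw-sync-lag          xloud_storage_rgw_sync_lag_sec     30        warning
  emit_rule xsds-osd-apply-latency     xloud_storage_osd_apply_latency_ms 20        warning
}

# Example:
#   emit_xsds_rules > xsds-alert-rules.yaml
```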

Built-In Dashboards

The XIMP portal includes pre-built XSDS dashboards. Navigate to Monitoring → Dashboards and search for “XSDS” or “Storage”:

Dashboard	Shows
XSDS Cluster Overview	Health state, OSD counts, capacity, recovery activity
XSDS Pool Utilization	Per-pool used %, available bytes, PG counts
XSDS OSD Performance	Per-OSD latency, IOPS, throughput
XSDS Recovery	Active recovery operations, estimated completion time

Pin the “XSDS Cluster Overview” dashboard to your XIMP home screen for constant visibility during on-call rotations.

Cluster CLI Health Check

For quick health checks without opening the XIMP portal, use the management CLI directly from a cluster node:

Quick health overview

ceph status
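When a cron job or external checker needs a pass/fail signal instead of the full `ceph status` report, you can map the one-line output of `ceph health` (for example `HEALTH_OK` or `HEALTH_WARN 1 osds down`) to an exit code. This sketch uses the Nagios plugin convention (0 OK, 1 warning, 2 critical, 3 unknown). That convention and the helper name `health_to_exit` are illustrative choices, not XSDS requirements.

```shell
# Hypothetical helper: map the first word of `ceph health` output (stdin) to a
# Nagios-style exit code: 0 = OK, 1 = warning, 2 = critical, 3 = unknown.
# The health state itself is echoed so it can be logged.
health_to_exit() {
  read -r state _rest || [ -n "$state" ] || return 3
  echo "$state"
  case "$state" in
    HEALTH_OK)   return 0 ;;
    HEALTH_WARN) return 1 ;;
    HEALTH_ERR)  return 2 ;;
    *)           return 3 ;;
  esac
}

# Example, run on a cluster node with admin credentials:
#   ceph health | health_to_exit
#   echo "exit code: $?"
```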

OSD performance snapshot

ceph osd perf
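To compare `ceph osd perf` against the 20 ms apply-latency threshold from the metrics table, you can filter its table output with awk. This sketch assumes the plain-text layout of recent Ceph releases: a header such as `osd  commit_latency(ms)  apply_latency(ms)` followed by one row per OSD. Because column order may differ between releases, the sketch locates the apply-latency column from the header. The helper name `slow_osds` is hypothetical.

```shell
# Hypothetical helper: list OSDs whose apply latency exceeds a limit in ms
# (default 20, the OSD apply latency threshold from the metrics table).
# Assumes `ceph osd perf` plain-text output: one header row, then one row per
# OSD. The apply-latency column is found from the header, not hard-coded.
# Exit status: 0 = none over the limit, 1 = at least one over, 2 = no column.
slow_osds() {
  awk -v limit="${1:-20}" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i ~ /^apply_latency/) col = i
      next
    }
    col && ($col + 0) > (limit + 0) {
      print "osd." $1 " apply_latency=" $col "ms"
      found = 1
    }
    END {
      if (!col) { print "apply_latency column not found" > "/dev/stderr"; exit 2 }
      exit (found ? 1 : 0)
    }
  '
}

# Example, run on a cluster node:
#   ceph osd perf | slow_osds        # default 20 ms
#   ceph osd perf | slow_osds 50
```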

Pool I/O statistics (5-second window)

ceph osd pool stats

Active slow requests

ceph health detail | grep -i slow
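For the PG `inconsistent` alert, the metrics table says to run `ceph health detail` and repair. The sketch below extracts the PG IDs from the `pg <pgid> is ...inconsistent...` lines in that output. It prints one `ceph pg repair` command per PG rather than running it, so you can inspect each PG first (for example with `rados list-inconsistent-obj <pgid>`). It assumes the detail lines look like `pg 2.5 is active+clean+inconsistent, acting [1,2,3]`, as in upstream Ceph. The helper name `inconsistent_pg_repairs` is hypothetical.

```shell
# Hypothetical helper: read `ceph health detail` output on stdin and print a
# `ceph pg repair <pgid>` command for each PG reported as inconsistent.
# Commands are printed, not executed, so each PG can be reviewed first.
inconsistent_pg_repairs() {
  awk '$1 == "pg" && $4 ~ /inconsistent/ { print "ceph pg repair " $2 }' | sort -u
}

# Example, run on a cluster node:
#   ceph health detail | inconsistent_pg_repairs
```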

Next Steps

XIMP Admin Guide

Configure the monitoring platform that collects and displays XSDS metrics

Capacity Planning

Use utilization metrics to plan cluster expansion before thresholds are reached

Troubleshooting

Diagnose the issues surfaced by monitoring alerts

XIMP Alert Rules

Configure notification channels for storage health alerts
