# Monitoring

> Monitor XSDS cluster health, OSD status, and I/O performance through XIMP integration: key metrics, alert thresholds, and observability configuration.

## Overview

XSDS cluster health and performance data is exported to XIMP for centralized monitoring and alerting. This page covers the key metrics to monitor, recommended alert thresholds, and how to configure the integration.

**Administrator Access Required**: This operation requires the `admin` role. Contact your Xloud administrator if you do not have sufficient permissions.

**Prerequisites**

* Administrator credentials with the `admin` role
* XIMP deployed and accessible (see [XIMP Admin Guide](/services/monitoring/admin-guide))
* Metric scrape target configured for the XSDS cluster metrics endpoint

***

## Key Metrics and Alert Thresholds

| Metric                  | Metric name                          | Alert Threshold      | Action                              |
| ----------------------- | ------------------------------------ | -------------------- | ----------------------------------- |
| Cluster health          | `xloud_storage_health`               | `HEALTH_WARN`        | Investigate immediately             |
| OSD `down` count        | `xloud_storage_osd_down`             | > 0                  | Replace or recover failed OSD       |
| Pool used %             | `xloud_storage_pool_used_pct`        | > 70%                | Plan capacity expansion             |
| Recovery I/O rate       | `xloud_storage_recovery_bytes_sec`   | Sustained > 200 MB/s | Consider I/O throttling             |
| PG `inconsistent` count | `xloud_storage_pg_inconsistent`      | > 0                  | Run `ceph health detail` and repair |
| Replication lag (RGW)   | `xloud_storage_rgw_sync_lag_sec`     | Sustained > 30s      | Check network bandwidth to RGW      |
| OSD apply latency       | `xloud_storage_osd_apply_latency_ms` | > 20 ms              | Investigate OSD or disk health      |

***

## Configuring XIMP Integration

The XSDS cluster exposes metrics on the management node.
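This page does not state the response format of the metrics endpoint. Assuming it is the Prometheus text exposition format (one `name{labels} value` sample per line, with `#` marking comment lines), a threshold from the Key Metrics and Alert Thresholds table can be spot-checked from the shell. The sketch below is illustrative only: `metrics_above` is a hypothetical helper, not part of XIMP or XSDS, and it reads metrics text from standard input so that it works with `curl` output or a saved file.

```bash
# Hypothetical helper: print every sample of one metric whose value exceeds a threshold.
# Assumption: stdin is Prometheus text exposition format: name{labels} value [timestamp]
metrics_above() {
  metric="$1"
  threshold="$2"
  awk -v m="$metric" -v t="$threshold" '
    /^#/ || NF == 0 { next }                  # skip comment lines and blank lines
    {
      name = $0
      sub(/[{ ].*/, "", name)                 # metric name ends at "{" or the first space
      if (name != m) next
      rest = $0
      if (index(rest, "}") > 0)
        sub(/^.*[}][ \t]*/, "", rest)         # drop everything up to the end of the label set
      else
        sub(/^[^ \t]+[ \t]+/, "", rest)       # no labels: drop the metric name
      split(rest, parts, /[ \t]+/)            # parts[1] is the value, parts[2] an optional timestamp
      if (parts[1] + 0 > t + 0) print $0
    }
  '
}

# Example usage (not executed here), piping the endpoint output into the helper:
#   curl -s http://:9283/metrics | metrics_above xloud_storage_pool_used_pct 70
```

The helper prints matching samples unchanged, so any labels on the offending series remain visible. Label names in real output depend on the exporter and are not documented on this page.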
Verify the endpoint is reachable:

```bash title="Check metrics endpoint"
curl http://:9283/metrics | head -20
```

Port `9283` is the default metrics exporter port deployed by XDeploy.

Navigate to **Monitoring → Administration → Scrape Targets → Add Target**:

| Field               | Value                                                    |
| ------------------- | -------------------------------------------------------- |
| **URL**             | `http://:9283/metrics`                                   |
| **Scrape Interval** | `60s` (storage metrics don't need sub-minute resolution) |
| **Labels**          | `service=xsds`, `cluster=`                               |

Or via CLI:

```bash title="Add XSDS scrape target"
ximp target add \
  --url http://:9283/metrics \
  --interval 60s \
  --label service=xsds \
  --label cluster=prod-storage
```

Navigate to **Monitoring → Alerting → Alert Rules** and create a rule for each threshold in the Key Metrics and Alert Thresholds table. Example alert rule for pool utilization:

```yaml title="alert-storage-capacity.yaml"
name: xsds-pool-capacity-warning
metric: xloud_storage_pool_used_pct
condition: ">"
threshold: 70
evaluation_period: 10m
severity: warning
notification_channels:
  - ops-email
```

Alert rules appear in the Active Rules list and evaluate against live storage metrics.

***

## Built-In Dashboards

The XIMP portal includes pre-built XSDS dashboards. Navigate to **Monitoring → Dashboards** and search for "XSDS" or "Storage":

| Dashboard                 | Shows                                                 |
| ------------------------- | ----------------------------------------------------- |
| **XSDS Cluster Overview** | Health state, OSD counts, capacity, recovery activity |
| **XSDS Pool Utilization** | Per-pool used %, available bytes, PG counts           |
| **XSDS OSD Performance**  | Per-OSD latency, IOPS, throughput                     |
| **XSDS Recovery**         | Active recovery operations, estimated completion time |

Pin the "XSDS Cluster Overview" dashboard to your XIMP home screen for constant visibility during on-call rotations.
***

## Cluster CLI Health Check

For quick health checks without opening the XIMP portal, use the management CLI directly from a cluster node:

```bash title="Quick health overview"
ceph status
```

```bash title="OSD performance snapshot"
ceph osd perf
```

```bash title="Pool I/O statistics (5-second window)"
ceph osd pool stats
```

```bash title="Active slow requests"
ceph health detail | grep -i slow
```

***

## Next Steps

* Configure the monitoring platform that collects and displays XSDS metrics
* Use utilization metrics to plan cluster expansion before thresholds are reached
* Diagnose the issues surfaced by monitoring alerts
* Configure notification channels for storage health alerts
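The checks in the Cluster CLI Health Check section can also be scripted, for example from a cron job on a cluster node. The sketch below assumes that `ceph health` prints a status word (`HEALTH_OK`, `HEALTH_WARN`, or `HEALTH_ERR`) at the start of its output. `health_exit_code` is a hypothetical helper, and the exit-code mapping is an illustrative convention rather than anything defined by XSDS or XIMP.

```bash
# Hypothetical helper: map the leading status word of `ceph health` output to an exit code.
# Illustrative convention: 0 = HEALTH_OK, 1 = HEALTH_WARN, 2 = HEALTH_ERR, 3 = unrecognized.
health_exit_code() {
  case "$1" in
    HEALTH_OK*)   return 0 ;;
    HEALTH_WARN*) return 1 ;;
    HEALTH_ERR*)  return 2 ;;
    *)            return 3 ;;   # empty or unexpected output, e.g. the CLI could not reach the cluster
  esac
}

# Example usage on a cluster node (not executed here):
#   health_exit_code "$(ceph health)" || echo "cluster needs attention"
```

Taking the status text as an argument, rather than calling `ceph` inside the function, keeps the mapping testable on machines without cluster access.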