# Object Storage Monitoring

> Monitor Xloud Object Storage cluster health: track capacity utilization, proxy request metrics, replication latency, and quarantined object counts.

## Overview

Effective monitoring of the object storage cluster ensures early detection of capacity constraints, performance degradation, and data integrity issues. This guide covers the key metrics and commands for ongoing operational visibility.

**Administrator Access Required.** This operation requires the `admin` role. Contact your Xloud administrator if you do not have sufficient permissions.

***

## Cluster Capacity

```bash title="Storage capacity across all nodes"
xavs-storage-recon --diskusage --verbose
```

Capacity thresholds:

| Metric                  | Warning | Critical | Action                       |
| ----------------------- | ------- | -------- | ---------------------------- |
| Node capacity used      | 70%     | 85%      | Plan capacity expansion      |
| Single drive capacity   | 80%     | 90%      | Add drives or rebalance ring |
| Cluster-wide free space | < 20%   | < 10%    | Immediate expansion required |

When any storage node exceeds 85% capacity, the ring rebalancer may be unable to place new replicas, causing `507 Insufficient Storage` errors for writes. Plan capacity expansion before reaching 70% utilization.

```bash title="Disk usage summary per node"
xavs-storage-recon --diskusage
```

The output shows each node's total capacity, used bytes, and percentage utilized. Identify outliers: nodes significantly above the cluster average indicate uneven data distribution, which may require ring weight adjustments.
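
The node-level thresholds in the capacity table can be applied mechanically once you have a used-capacity percentage for each node. The sketch below is illustrative only: `classify_node_usage` is a hypothetical helper, not part of `xavs-storage-recon`, and the sample percentages are made up. It also assumes a value exactly at a threshold counts as having reached it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a node's used-capacity percentage (integer)
# to the severity levels from the "Node capacity used" row of the table.
classify_node_usage() {
  local used_pct="$1"
  if [ "$used_pct" -ge 85 ]; then
    echo "critical"   # new replicas may fail to place; writes may return 507
  elif [ "$used_pct" -ge 70 ]; then
    echo "warning"    # plan capacity expansion
  else
    echo "ok"
  fi
}

# Sample values for illustration; real percentages would be read from
# the per-node disk usage output.
for pct in 42 70 91; do
  printf '%s%% -> %s\n' "$pct" "$(classify_node_usage "$pct")"
done
```

A check like this could run on a schedule and feed a notification channel, so that a node crossing the warning level is noticed well before writes begin to fail.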
***

## Proxy Metrics

The proxy-server exposes metrics on the recon middleware endpoint. The host portion of the URLs below is omitted; insert the address of the node you want to query before `:6000`.

```bash title="Check proxy load"
curl -s http://:6000/recon/load
```

```bash title="Check proxy memory"
curl -s http://:6000/recon/mem
```

```bash title="Check proxy async pending updates"
curl -s http://:6000/recon/async
```

Monitor these proxy-level metrics:

| Metric            | Description                         | Alert Threshold                  |
| ----------------- | ----------------------------------- | -------------------------------- |
| **Request rate**  | Requests per second per proxy node  | Baseline + 3× standard deviation |
| **Error rate**    | 4xx and 5xx responses as % of total | > 5% 5xx errors                  |
| **GET latency**   | p95 response time for object reads  | > 500ms p95                      |
| **PUT latency**   | p95 response time for object writes | > 1000ms p95                     |
| **Async pending** | Container/account updates queued    | > 1000 pending                   |

***

## Replication Health

```bash title="Replication status across all nodes"
xavs-storage-recon --replication
```

```bash title="Check for quarantined (corrupted) objects"
xavs-storage-recon --quarantined
```

```bash title="Verify ring file consistency across nodes"
xavs-storage-recon --md5
```

Replication health alerts:

| Condition                   | Severity | Response                                  |
| --------------------------- | -------- | ----------------------------------------- |
| `replication_time` > 300s   | Warning  | Investigate slow nodes                    |
| `replication_last` > 600s   | Critical | Check replicator daemon status            |
| Quarantine count increasing | Critical | Check drive health, replace failed drives |
| MD5 mismatch                | Critical | Redistribute ring files immediately       |

***

## Integration with XIMP

For continuous monitoring, connect the object storage recon endpoint to XIMP (Xloud Infrastructure Monitoring Platform):

```yaml title="Prometheus scrape config for object storage"
scrape_configs:
  - job_name: 'xavs-object-storage-recon'
    static_configs:
      - targets: [':6000', ':6000']
    metrics_path: '/recon/metrics'
```

Configure alerting rules in XIMP for the critical thresholds above. Set up notification channels so the on-call team can respond promptly to 507 storage errors and quarantine count spikes.

***

## Next Steps

- Deep-dive into replication health and quarantine management
- Expand capacity by adding drives and rebalancing rings
- Respond to monitoring alerts and diagnose failures
- Set limits to prevent individual projects from consuming all capacity