Overview
The replicator daemon continuously ensures each object has the configured number of replicas across different zones, detecting and repairing divergence. Monitoring replication health is essential for maintaining data durability guarantees.
Monitor Replication
- Cluster-wide status
- Single node check
- Overall cluster health
Check replication status across all nodes
Verbose replication check
| Metric | Healthy Value | Concern Threshold |
|---|---|---|
| `replication_time` | < 60 seconds | > 300 seconds |
| `replication_last` | Recent timestamp | Older than 600 seconds |
| `object_count` | Consistent across replicas | Divergence > 1% |
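The concern thresholds in the table can be checked mechanically once the metrics have been collected. The following is a minimal sketch, not an XDeploy API: the `ReplicationSample` type, the `classify` function, and the way divergence is computed (spread between the largest and smallest replica counts, relative to the largest) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReplicationSample:
    """One node's replication metrics (hypothetical container type)."""
    replication_time: float      # seconds taken by the last replication pass
    replication_last_age: float  # seconds since the last pass completed
    object_counts: list[int]     # object count reported by each replica


def count_divergence(counts: list[int]) -> float:
    """Relative spread between the largest and smallest replica counts.

    This definition of divergence is an assumption for the sketch.
    """
    if not counts or max(counts) == 0:
        return 0.0
    return (max(counts) - min(counts)) / max(counts)


def classify(sample: ReplicationSample) -> list[str]:
    """Return the metrics that crossed a concern threshold from the table."""
    concerns = []
    if sample.replication_time > 300:
        concerns.append("replication_time")
    if sample.replication_last_age > 600:
        concerns.append("replication_last")
    if count_divergence(sample.object_counts) > 0.01:
        concerns.append("object_count")
    return concerns
```

An empty result means no concern threshold was crossed. A `replication_time` between 60 and 300 seconds is above the healthy value but is not flagged by this sketch.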
Quarantined Objects
The auditor daemon detects data corruption (bit-rot, write errors) through checksum verification. Corrupted objects are moved to a quarantine directory and excluded from reads, so a healthy replica is served instead.
Check quarantine counts by node
View quarantined objects on a node (SSH to node)
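When working directly on a node, the quarantine directory can also be counted with a short script. This is a sketch under stated assumptions: it presumes each device is mounted under a common root and keeps quarantined objects in `<root>/<device>/quarantined/objects`. That layout and the `quarantine_counts` helper are hypothetical, so adjust the path to match the actual deployment.

```python
from pathlib import Path


def quarantine_counts(root):
    """Map device name to the number of entries in its quarantine directory.

    Assumes the layout <root>/<device>/quarantined/objects (an assumption,
    not confirmed by this guide). Devices without that directory are skipped.
    """
    counts = {}
    for device in sorted(Path(root).iterdir()):
        qdir = device / "quarantined" / "objects"
        if qdir.is_dir():
            counts[device.name] = sum(1 for _ in qdir.iterdir())
    return counts
```

A device whose count keeps growing between runs is a candidate for the drive checks listed in this section.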
- Check drive health with `smartctl` or the hardware vendor tool
- Replace failing drives and add replacement devices to the ring
- Remove the degraded device from the ring to allow data to drain
Replication Configuration
Key replication parameters configurable through XDeploy:
| Parameter | Description | Default |
|---|---|---|
| `concurrency` | Number of parallel replication threads per daemon | 1 |
| `interval` | Seconds between replication passes | 30 |
| `node_timeout` | Seconds before marking a replica push as failed | 10 |
Increase `concurrency` during off-peak hours to accelerate replication after large ring changes:
Temporarily increase replication concurrency (XDeploy config)
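The idea of a temporary override can be sketched in code. The example below is illustrative only: `ReplicatorSettings`, `off_peak_override`, and `render` are hypothetical names, the `key = value` output format is an assumption rather than the XDeploy config syntax, and the value 4 in the usage note is arbitrary. The defaults are the ones listed in the parameter table.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ReplicatorSettings:
    """Replication parameters with the defaults from the table."""
    concurrency: int = 1    # parallel replication threads per daemon
    interval: int = 30      # seconds between replication passes
    node_timeout: int = 10  # seconds before a replica push is marked failed


def off_peak_override(base, concurrency):
    """Return a copy of base with a higher concurrency; base is unchanged."""
    if concurrency < base.concurrency:
        raise ValueError("override should not lower concurrency")
    return replace(base, concurrency=concurrency)


def render(settings):
    """Render settings as 'key = value' lines (format is an assumption)."""
    return "\n".join(f"{key} = {value}" for key, value in asdict(settings).items())
```

For example, `render(off_peak_override(ReplicatorSettings(), 4))` produces the three settings with `concurrency = 4`. Keeping the base settings immutable makes it straightforward to restore the defaults once replication has caught up.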
Next Steps
- Ring Management: Add or remove drives that affect replication targets
- Monitoring: Set up capacity and replication health monitoring
- Admin Troubleshooting: Diagnose replication failures and high-latency nodes
- Storage Policies: Review replication factors for each storage policy