Object Storage Replication

Overview

The replicator daemon continuously ensures each object has the configured number of replicas across different zones, detecting and repairing divergence. Monitoring replication health is essential for maintaining data durability guarantees.

Administrator Access Required: The operations on this page require the admin role. Contact your Xloud administrator if you do not have sufficient permissions.

Monitor Replication


Check replication status across all nodes

xavs-storage-recon --replication

Verbose replication check

xavs-storage-recon --replication --verbose

Key replication metrics:

Metric	Healthy Value	Concern Threshold
`replication_time`	< 60 seconds	> 300 seconds
`replication_last`	Recent timestamp	Older than 600 seconds
`object_count`	Consistent across replicas	Divergence > 1%
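
The thresholds in the table can be turned into simple checks. The following is a minimal sketch, not part of the Xloud tooling: the function names are hypothetical, the inputs are plain integers that you have already read from the recon output (parsing that output is not shown, since its format is not described here), the `watch` label for values between the healthy and concern thresholds is an assumption, and divergence is assumed to be measured relative to the larger of two replica counts.

```shell
#!/bin/sh
# Classify replication metrics against the thresholds in the table above.
# All function names are hypothetical; inputs are integers supplied by the caller.

# replication_time in seconds: < 60 is healthy, > 300 is a concern.
classify_replication_time() {
    if [ "$1" -lt 60 ]; then
        echo healthy
    elif [ "$1" -gt 300 ]; then
        echo concern
    else
        echo watch   # assumed label for the range the table leaves undefined
    fi
}

# Age of replication_last in seconds: older than 600 is a concern.
classify_replication_age() {
    if [ "$1" -gt 600 ]; then
        echo concern
    else
        echo healthy
    fi
}

# Object counts on two replicas: divergence above 1% is a concern.
# Integer math: (max - min) / max > 1%  is the same as  (max - min) * 100 > max.
classify_object_divergence() {
    a=$1
    b=$2
    if [ "$a" -ge "$b" ]; then max=$a; min=$b; else max=$b; min=$a; fi
    if [ $(( (max - min) * 100 )) -gt "$max" ]; then
        echo concern
    else
        echo healthy
    fi
}
```

For example, `classify_replication_time 45` prints `healthy`, and `classify_object_divergence 1000 980` prints `concern` because the counts differ by 2%.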

Check replication on a specific storage node

xavs-storage-recon --replication -v <storage-node-ip>

Comprehensive cluster health check

xavs-storage-recon --all

Disk usage across all nodes

xavs-storage-recon --diskusage

Check for quarantined objects

xavs-storage-recon --quarantined

Quarantined Objects

The auditor daemon detects data corruption (bit rot, write errors) through checksum verification. Corrupted objects are moved to a quarantine directory and excluded from reads; a healthy replica is served instead.

A high quarantine count indicates data corruption, potentially caused by drive failures, bit rot, or network errors during replication. Investigate and replace affected drives promptly.

Check quarantine counts by node

xavs-storage-recon --quarantined --verbose

View quarantined objects on a node (SSH to node)

ls /var/lib/xavs-object-storage/quarantined/

If quarantine counts are high on a specific node:

Check drive health with smartctl or the hardware vendor tool
Remove the degraded device from the ring to allow data to drain
Replace the failing drive and add the replacement device to the ring
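
This page does not define what counts as a "high" quarantine count, so it can help to make the limit explicit when checking a node. The sketch below is one possible approach, run on the storage node itself: the function names and the limit are hypothetical, and it assumes a `find` that supports `-mindepth` and `-maxdepth` (GNU and BSD both do). It counts only the top-level entries of the directory passed in.

```shell
#!/bin/sh
# Count top-level entries in a quarantine directory.
# Function names and the limit value are illustrative assumptions.
quarantine_count() {
    find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Succeeds (exit 0) when the count is above the limit,
# fails with 1 when it is not, and with 2 when the directory is missing.
quarantine_exceeds() {
    dir=$1
    limit=$2
    [ -d "$dir" ] || return 2
    count=$(quarantine_count "$dir")
    [ "$count" -gt "$limit" ]
}
```

A call such as `quarantine_exceeds /var/lib/xavs-object-storage/quarantined/ 10 && echo "investigate drives"` would flag a node once more than ten entries are present; choose the limit to suit your cluster.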

Replication Configuration

Key replication parameters configurable through XDeploy:

Parameter	Description	Default
`concurrency`	Number of parallel replication threads per daemon	1
`interval`	Seconds between replication passes	30
`node_timeout`	Seconds before marking a replica push as failed	10

Increase concurrency temporarily during off-peak hours to speed up replication after large ring changes, then restore the previous value once replication has caught up:

Temporarily increase replication concurrency (XDeploy config)

# Edit object storage configuration → replicator section
# Set concurrency = 4, then deploy
xavs-ansible deploy -t swift

Next Steps

Ring Management

Add or remove drives that affect replication targets

Monitoring

Set up capacity and replication health monitoring

Admin Troubleshooting

Diagnose replication failures and high-latency nodes

Storage Policies

Review replication factors for each storage policy