Overview
This guide covers platform-level object storage issues that require administrator access. For user-facing issues such as 403 access errors or upload timeouts, see the Object Storage Troubleshooting guide.
Diagnostic Checklist
- Overall cluster health
- Disk usage across all nodes
- Replication status
- Ring file consistency
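The four checks above can be gathered in one pass. The sketch below assumes the platform is OpenStack Swift and that the `swift-recon` CLI is reachable from the admin host. The mapping of checklist items to flags and the optional `prefix` wrapper (for example `docker exec` into one of the Swift containers) are assumptions to adapt to your deployment.

```python
import subprocess

# Assumed mapping from checklist item to a swift-recon flag.
CHECKS = {
    "Overall cluster health": "--all",
    "Disk usage across all nodes": "--diskusage",
    "Replication status": "--replication",
    "Ring file consistency": "--md5",
}


def build_command(check, prefix=()):
    """Return the argv list for one checklist item.

    `prefix` is a hypothetical wrapper, e.g. ("docker", "exec", "swift-proxy"),
    for deployments where swift-recon only exists inside a container.
    """
    return [*prefix, "swift-recon", CHECKS[check]]


def run_checklist(prefix=()):
    """Run each check in order and return {check name: exit code}."""
    results = {}
    for check in CHECKS:
        argv = build_command(check, prefix)
        print(f"== {check}: {' '.join(argv)}")
        results[check] = subprocess.run(argv, check=False).returncode
    return results
```

Calling `run_checklist()` on a host with `swift-recon` on the PATH prints each report in checklist order. A non-zero exit code only means the command itself failed, not that the cluster is unhealthy.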
Platform Issues
507 Insufficient Storage on object PUT
Cause: One or more storage nodes targeted by the ring have insufficient free space to accept the write.
Diagnosis: Check disk usage per node.
Resolution:
- Identify nodes or drives above 85% utilization
- Expand storage capacity by adding new drives (see Ring Management)
- Alternatively, rebalance the ring to shift weight toward nodes with available space:
Adjust weight on high-capacity node
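As a minimal sketch of the 85% check, the helper below filters per-drive usage figures and lists the fullest drives first. The `Drive` record, the node and device names, and the byte counts are hypothetical. Real figures would come from the disk usage check.

```python
from dataclasses import dataclass

THRESHOLD = 0.85  # utilization above which this guide suggests acting


@dataclass
class Drive:
    node: str
    device: str
    used_bytes: int
    size_bytes: int

    @property
    def utilization(self) -> float:
        return self.used_bytes / self.size_bytes


def drives_over_threshold(drives, threshold=THRESHOLD):
    """Return drives above `threshold`, fullest first."""
    hot = [d for d in drives if d.utilization > threshold]
    return sorted(hot, key=lambda d: d.utilization, reverse=True)


# Hypothetical sample data for illustration only.
drives = [
    Drive("storage-01", "sdb", 920, 1000),
    Drive("storage-01", "sdc", 610, 1000),
    Drive("storage-02", "sdb", 870, 1000),
    Drive("storage-03", "sdb", 400, 2000),
]

for d in drives_over_threshold(drives):
    print(f"{d.node}/{d.device}: {d.utilization:.0%} used")
# prints:
# storage-01/sdb: 92% used
# storage-02/sdb: 87% used
```

In a stock OpenStack Swift deployment, weight changes are usually made with `swift-ring-builder <builder file> set_weight <search value> <weight>` followed by `rebalance`. The builder file and device search value depend on your ring and are not shown here.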
Slow reads or high proxy latency
Cause: Replication traffic competing with foreground I/O, or a storage node with degraded drives experiencing high read latency.
Diagnosis: Check replication load. If a specific node shows high replication_time, inspect that node's disk I/O (SSH to the storage node required).
Resolution:
- Consider throttling the replicator with --concurrency 1 during peak hours
- If a specific drive is degraded, reduce its ring weight to shift load away
- Replace drives showing high latency or recurring I/O errors in dmesg
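One way to decide which node "shows high replication_time" is to compare each node against the cluster median. The sketch below does this. The `factor` of 2.0, the node names, and the sample values are assumptions, and the values are taken in whatever unit the replication check reports.

```python
from statistics import median


def slow_replicators(replication_time, factor=2.0):
    """Return nodes whose replication_time exceeds `factor` x the cluster median.

    `replication_time` maps node name -> last reported replication_time.
    Results are ordered slowest first.
    """
    if not replication_time:
        return []
    baseline = median(replication_time.values())
    return sorted(
        (node for node, t in replication_time.items() if t > factor * baseline),
        key=lambda node: replication_time[node],
        reverse=True,
    )


# Hypothetical per-node values for illustration only.
sample = {
    "storage-01": 4.1,
    "storage-02": 3.8,
    "storage-03": 19.5,
    "storage-04": 4.4,
}
print(slow_replicators(sample))  # prints ['storage-03']
```

A median baseline is used rather than a fixed limit so that one slow node does not shift the reference point. Nodes returned here are the candidates for the disk I/O inspection described above.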
Ring inconsistency between nodes
Cause: The updated ring file was not distributed to all nodes after a rebalance.
Diagnosis: Check the MD5 of the ring files on all nodes.
Resolution: Nodes with mismatched MD5 hashes have stale ring files. Redistribute the ring by copying the ring files to each affected node, then restart the object-server and replicator on that node.
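The MD5 comparison can be sketched as follows. `md5_of_file` hashes a ring file locally, and `stale_nodes` compares hashes that have already been collected per node. How the hashes are collected (SSH, a recon endpoint, or another mechanism) is left open, and the node names and hash strings in the usage example are hypothetical.

```python
import hashlib
from collections import Counter


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_nodes(md5_by_node, reference=None):
    """Return nodes whose ring MD5 differs from `reference`, sorted by name.

    If no reference is given, the most common hash is used as a fallback.
    """
    if not md5_by_node:
        return []
    if reference is None:
        reference = Counter(md5_by_node.values()).most_common(1)[0][0]
    return sorted(n for n, h in md5_by_node.items() if h != reference)


# Hypothetical collected hashes for illustration only.
collected = {
    "storage-01": "aaa111",
    "storage-02": "aaa111",
    "storage-03": "bbb222",
}
print(stale_nodes(collected, reference="aaa111"))  # prints ['storage-03']
```

Passing `reference` explicitly, computed with `md5_of_file` on the host where the ring was rebalanced, is the safer choice. The majority fallback gives the wrong answer when most nodes have not yet received the new ring.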
High quarantine count on a specific node
Cause: Drive failure, bit-rot, or network errors during replication causing data corruption detected by the auditor.
Diagnosis: Check quarantine counts, then check drive health (SSH to the node).
Resolution:
- If the drive is failing, set its ring weight to 0 and rebalance to drain data
- Replace the physical drive
- Add the replacement drive to the ring and rebalance
- The replicator will restore the quarantined objects from healthy replicas
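To pick out the node worth inspecting first, quarantine counts can be totalled per node and ranked. The sketch below assumes counts are already available as a per-node mapping by item type. That input shape, the `limit` of 10, and the sample numbers are all hypothetical, since this guide does not define what counts as "high".

```python
def high_quarantine_nodes(counts, limit=10):
    """Return (node, total) pairs whose quarantined total exceeds `limit`.

    `counts` maps node name -> {item type: quarantined count}.
    Results are ordered highest total first.
    """
    totals = {node: sum(per_type.values()) for node, per_type in counts.items()}
    flagged = [(node, total) for node, total in totals.items() if total > limit]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data for illustration only.
counts = {
    "storage-01": {"objects": 2, "containers": 0, "accounts": 0},
    "storage-02": {"objects": 148, "containers": 3, "accounts": 0},
    "storage-03": {"objects": 0, "containers": 0, "accounts": 0},
}
print(high_quarantine_nodes(counts))  # prints [('storage-02', 151)]
```

A node that stands out from its peers, as `storage-02` does here, is the one to check for drive health before starting the drain and replace steps above.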
Proxy service returns 503 Service Unavailable
Log Locations
| Component | Log Command |
|---|---|
| Proxy server | docker logs swift-proxy |
| Object server | docker logs swift-object |
| Container server | docker logs swift-container |
| Account server | docker logs swift-account |
| Replicator | docker logs swift-object-replicator |
Next Steps
- Object Storage Troubleshooting (User): user-facing issues such as 403 errors, upload timeouts, and versioning failures
- Ring Management: add capacity and redistribute data after failures
- Replication: monitor and restore data durability
- Monitoring: proactively catch issues before they become outages