Overview
Xloud Object Storage uses a three-tier, fully distributed architecture. Proxy nodes handle all API requests and authentication. Storage nodes persist object data on local drives. The consistent hash ring maps every object to its target storage locations without a central metadata server. This design eliminates single points of failure and enables horizontal scaling of each tier independently.

Prerequisites
- Familiarity with Xloud Object Storage storage policies
- Admin access to review cluster topology
Cluster Topology
Proxy nodes are fully stateless — they hold only ring files (updated via ring distribution). All persistent state lives on storage nodes. Adding proxy nodes scales API throughput without touching the storage tier.
Component Descriptions
Proxy Server
The proxy server is the single entry point for all client requests (Swift and S3 API). It performs:
- Token validation via Keystone auth middleware
- Ring lookups to identify target storage nodes for each request
- Parallel writes to all replica nodes for PUT operations
- Read fan-out and quorum resolution for GET operations
- Transparent S3 API translation via the s3api middleware
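The parallel-write-plus-quorum behaviour described above can be sketched as follows. This is a simplified illustration, not the proxy's actual code: `quorum_put` and `put_fn` are hypothetical names, and the real proxy streams data and can return as soon as a quorum responds rather than waiting for every replica.

```python
from concurrent.futures import ThreadPoolExecutor

def quorum_put(replica_urls, data, put_fn):
    """Write an object to all replicas in parallel and report success
    once a write quorum confirms. put_fn(url, data) -> bool stands in
    for the real HTTP PUT to a storage node."""
    quorum = (len(replica_urls) // 2) + 1
    with ThreadPoolExecutor(max_workers=len(replica_urls)) as pool:
        oks = list(pool.map(lambda url: put_fn(url, data), replica_urls))
    # Simplified: waits for every replica, then checks the quorum.
    return sum(oks) >= quorum
```

With three replicas the quorum is two, so a write succeeds even if one storage node is down.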
Account Server
The account server manages project-level metadata:
- Tracks all containers belonging to a project
- Stores account-level statistics (bytes used, object count, container count)
- Enforces quota limits in conjunction with the proxy
- Served from the account ring — one partition per account
Container Server
The container server manages container-level metadata:
- Lists all objects within a container (object listings)
- Stores container-level statistics and custom metadata headers
- Container records are replicated across the container ring
- Object listings are eventually consistent — updates propagate asynchronously via the updater
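The updater's role in that eventual consistency can be sketched as a retry queue. The names `drain_updates` and `send_update` are hypothetical, and the real updater persists its queue on disk between passes:

```python
def drain_updates(pending, send_update):
    """One updater pass: try each queued container/account update and
    keep the failures for the next cycle. send_update(update) -> bool
    stands in for the real HTTP request to a container/account server.
    Listings stay eventually consistent until this queue drains."""
    return [update for update in pending if not send_update(update)]
```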
Object Server
The object server handles the actual object data:
- Stores objects on local XFS or ext4 filesystems
- Each object stored as a file at a path derived from its MD5 hash
- Handles PUT, GET, DELETE, HEAD, and COPY operations
- Writes metadata (content-type, custom headers) as extended file attributes
- Generates a unique transaction ID for every operation
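The hash-derived on-disk layout can be illustrated with a sketch like the one below. The directory structure and the `object_disk_path` name are assumptions for illustration; the real implementation also salts the hash with a per-cluster prefix and suffix.

```python
import hashlib

def object_disk_path(device_root, partition, account, container, obj, timestamp):
    """Derive an illustrative on-disk path from the object's MD5 hash.
    The hash's last characters form a 'suffix' directory that groups
    many objects together for cheap partition-level hash comparisons."""
    name = f"/{account}/{container}/{obj}"
    digest = hashlib.md5(name.encode()).hexdigest()
    suffix = digest[-3:]
    return f"{device_root}/objects/{partition}/{suffix}/{digest}/{timestamp}.data"
```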
Replication Engine
The replication engine runs continuously on every storage node to maintain the configured replica count:
- Object replicator: Compares local partition hashes with remote nodes; pushes missing objects via rsync or direct HTTP
- Container replicator: Synchronizes container database records across ring replicas
- Account replicator: Synchronizes account database records
- Replication is partition-based — the ring divides the hash space into partitions, and data is replicated to each partition's primary nodes (or to handoff nodes when a primary is unavailable)
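The object replicator's hash comparison can be sketched as below. `suffixes_to_sync` is a hypothetical name, and the real replicator also handles handoff nodes, deletions (tombstones), and the choice between rsync and direct HTTP transfer:

```python
def suffixes_to_sync(local_hashes, remote_hashes):
    """Compare per-suffix content hashes for one partition between the
    local node and a remote replica, returning the suffix directories
    whose contents the local node should push to the remote."""
    return sorted(suffix for suffix, digest in local_hashes.items()
                  if remote_hashes.get(suffix) != digest)
```

Comparing a handful of suffix hashes instead of listing every object keeps each replication pass cheap.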
Background Services
Additional background services maintain cluster health:
| Service | Function |
|---|---|
| Auditor | Reads every stored object and verifies checksum integrity. Quarantines corrupted objects. |
| Updater | Processes failed container and account update queues asynchronously. Resolves eventually-consistent listings. |
| Expirer | Deletes objects that have reached their X-Delete-At or X-Delete-After expiry timestamp. |
| Reconstructor | EC-specific: reconstructs missing or corrupted EC fragments from surviving shards. |
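As a small example of the expirer's check (the function name is hypothetical; a relative X-Delete-After header is typically converted to an absolute X-Delete-At timestamp at write time):

```python
def is_expired(headers, now):
    """Return True if the object's X-Delete-At timestamp (epoch
    seconds) has passed, i.e. the expirer should delete the object."""
    delete_at = headers.get("X-Delete-At")
    return delete_at is not None and int(delete_at) <= now
```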
S3 API Middleware
The s3api middleware translates S3-format requests into Swift internal requests transparently:
- Mounted in the proxy pipeline before the auth middleware
- Translates S3 bucket operations to Swift container operations
- Translates S3 object operations to Swift object operations
- Handles S3 authentication (HMAC-SHA256 signature v4)
- Translates S3 ACLs to Swift ACL headers
- Supports multipart upload via Swift dynamic large objects
- Supports object versioning via Swift versioning middleware
S3 and Swift APIs share the same underlying storage. An object uploaded via S3 API is immediately accessible via the Swift API using the same account/container/object path structure, and vice versa.
Consistent Hash Ring
The consistent hash ring is the core distribution mechanism. It determines which storage nodes hold each object without any central directory server.

Ring mechanics:

| Concept | Description |
|---|---|
| Partition power | 2^partition_power partitions in the ring (typically 2^18 = 262,144) |
| Partition | A slice of the hash space. Every object maps to exactly one partition. |
| Device | A physical drive with assigned weight (capacity proportion) |
| Weight | Determines what fraction of partitions a device receives |
| Replica count | How many distinct devices hold each partition’s data |
| Zone | Fault domain grouping — ring enforces replicas land in distinct zones |
| Region | Geographic grouping — for geo-redundant deployments across data centers |
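The partition mapping in the table above can be sketched in a few lines. This is a simplified illustration: the real ring salts the MD5 input with a cluster-wide prefix and suffix, and then looks partitions up in a precomputed partition-to-device table.

```python
import hashlib

PART_POWER = 18  # 2^18 = 262,144 partitions, as in the table above

def object_partition(account, container, obj, part_power=PART_POWER):
    """Map an object to a ring partition by MD5-hashing its full path
    and keeping the top part_power bits of the digest."""
    name = f"/{account}/{container}/{obj}"
    digest = hashlib.md5(name.encode()).digest()
    top32 = int.from_bytes(digest[:4], "big")  # first 4 bytes as an int
    return top32 >> (32 - part_power)
```

Because the mapping is a pure function of the object path and the ring parameters, any proxy node can compute it locally — no central directory lookup is needed.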
Object Request Flow
The flow differs for writes (PUT), reads (GET), and S3 API requests. For a write (PUT):
The proxy writes to all replicas in parallel. It returns success to the client once a write quorum (default: (replicas // 2) + 1) confirms the write.

Replication Zones and Fault Domains
Storage nodes are grouped into zones for fault domain separation. The ring builder enforces replica placement across distinct zones.

| Zone | Typical Mapping | Failures Isolated |
|---|---|---|
| Zone 1 | Rack 1 / PDU A / Switch A | Any single rack failure |
| Zone 2 | Rack 2 / PDU B / Switch B | Any single rack failure |
| Zone 3 | Rack 3 / PDU C / Switch C | Any single rack failure |
Capacity Planning
Replication Overhead
3-replica policy: usable capacity = raw capacity ÷ 3.
For 60 TB raw storage across 10 nodes, usable capacity = 20 TB.
EC Overhead
8+4 EC policy: usable capacity = raw capacity × (8 ÷ 12) = 66.7%.
For 60 TB raw storage, usable capacity = 40 TB — 2× more efficient than 3-replica.
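The worked figures above all follow from one formula; here is a small sketch (the function name is illustrative):

```python
def usable_tb(raw_tb, replicas=None, data_frags=None, parity_frags=None):
    """Usable capacity for a replication policy (raw / replicas) or an
    EC policy (raw * data / (data + parity))."""
    if replicas is not None:
        return raw_tb / replicas
    return raw_tb * data_frags / (data_frags + parity_frags)
```

For 60 TB raw, this gives 20 TB under 3-replica and 40 TB under 8+4 EC — matching the numbers above.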
Minimum Nodes per Policy
Replication (3×): minimum 3 nodes in 3 distinct zones.
EC 8+4: minimum 12 nodes. EC 4+2: minimum 6 nodes.
Ring Rebalancing
Adding nodes triggers a ring rebalance. Set min_part_hours (minimum 1 hour) to limit partition moves per cycle and prevent rebalance storms during rapid node additions.

Next Steps
Storage Policies
Configure replication, EC, and multi-tier storage policies
Ring Management
Add drives, adjust weights, and distribute updated rings
Replication
Monitor replication health and manage quarantined objects
Monitoring
Track cluster capacity and proxy request metrics