# Troubleshooting

> Diagnose and resolve common XSDS user-facing issues — stuck volumes, snapshot failures, object storage performance, and access errors.

## Overview

This page covers common issues encountered when using XSDS block, object, and shared
file storage — including diagnosis steps and resolutions for each scenario.

<Note>
  **Prerequisites**

  * Access to the Xloud Dashboard and CLI (`openstack` CLI authenticated)
  * For advanced diagnostics, contact your storage administrator, who can make cluster-level configuration changes through [XDeploy](/deployment).
</Note>

***

## Common Issues

<AccordionGroup>
  <Accordion title="Volume stuck in 'creating' status" icon="clock">
    **Cause**: The storage scheduler could not place the volume on a suitable backend,
    or the backend is temporarily unavailable.

    **Diagnosis**:

    ```bash title="Check volume status and fault message" theme={null}
    openstack volume show <VOLUME_NAME> -c status -c fault
    ```

    Common causes and resolutions:

    | Cause                                         | Resolution                                                                                                                    |
    | --------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
    | Insufficient capacity in the target pool      | Choose a different volume type, or ask your administrator to expand capacity through [XDeploy](/deployment).                  |
    | Volume type references an unavailable backend | Try a different volume type; contact your administrator if the issue persists.                                                |
    | Storage service temporarily unhealthy         | Wait 2–3 minutes and check the status again; contact your administrator if the volume is still in `creating` after 5 minutes. |
    | Quota exceeded                                | Check the quota with `openstack quota show --volume` and request an increase from your administrator.                         |

    Contact your storage administrator if the volume remains in the `creating` state
    for more than 5 minutes.
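
    The wait-and-recheck guidance above can be scripted. The following is a minimal
    sketch, assuming the `openstack` CLI is authenticated in the current shell. The
    helper names `get_volume_status` and `wait_for_volume` and the 15-second polling
    interval are illustrative choices, not part of XSDS.

    ```python title="Poll a volume until it leaves 'creating' (sketch)" theme={null}
    import subprocess
    import time


    def get_volume_status(volume):
        """Return the volume's current status string via the openstack CLI."""
        result = subprocess.run(
            ["openstack", "volume", "show", volume, "-f", "value", "-c", "status"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()


    def wait_for_volume(volume, timeout=300, interval=15,
                        get_status=get_volume_status, sleep=time.sleep):
        """Poll until the volume leaves 'creating' or `timeout` seconds elapse.

        Returns the last observed status; 'creating' means the timeout was reached.
        """
        waited = 0
        status = get_status(volume)
        while status == "creating" and waited < timeout:
            sleep(interval)
            waited += interval
            status = get_status(volume)
        return status
    ```

    If `wait_for_volume("<VOLUME_NAME>")` still returns `creating` after the default
    300 seconds, that matches the 5-minute threshold above for contacting your
    storage administrator.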
  </Accordion>

  <Accordion title="Volume stuck in 'deleting' status" icon="trash">
    **Cause**: A snapshot derived from this volume still exists, or there is an
    ongoing operation that holds a lock on the volume.

    **Diagnosis**:

    ```bash title="List snapshots from this volume" theme={null}
    openstack volume snapshot list --volume <VOLUME_NAME_OR_ID>
    ```

    **Resolution**:

    1. Delete all snapshots that were created from this volume:
       ```bash title="Delete child snapshot" theme={null}
       openstack volume snapshot delete <SNAPSHOT_ID>
       ```
    2. After all child snapshots are deleted, retry the volume deletion:
       ```bash title="Retry volume deletion" theme={null}
       openstack volume delete <VOLUME_NAME_OR_ID>
       ```
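
    The two steps above can be combined into one script. This is a sketch under
    stated assumptions: the `openstack` CLI is authenticated, and the helper names
    `run_cli` and `delete_volume_with_snapshots` are hypothetical. Snapshot deletion
    is irreversible, so review the snapshot list before running anything like this.

    ```python title="Delete child snapshots, then the volume (sketch)" theme={null}
    import subprocess
    import time


    def run_cli(args):
        """Run an openstack CLI command and return its stdout."""
        result = subprocess.run(["openstack", *args],
                                capture_output=True, text=True, check=True)
        return result.stdout


    def delete_volume_with_snapshots(volume, run=run_cli, sleep=time.sleep,
                                     retries=20, interval=5):
        """Delete every snapshot of `volume`, wait for them to go, delete the volume.

        Returns the list of snapshot IDs that were deleted.
        """
        list_args = ["volume", "snapshot", "list", "--volume", volume,
                     "-f", "value", "-c", "ID"]
        snapshot_ids = run(list_args).split()
        for snapshot_id in snapshot_ids:
            run(["volume", "snapshot", "delete", snapshot_id])
        # Snapshot deletion is asynchronous, so poll until none remain
        for _ in range(retries):
            if not run(list_args).split():
                break
            sleep(interval)
        else:
            raise TimeoutError("snapshots still present; volume was not deleted")
        run(["volume", "delete", volume])
        return snapshot_ids
    ```

    The polling loop matters because a volume delete issued while its snapshots are
    still being removed is likely to be rejected.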
  </Accordion>

  <Accordion title="Snapshot creation fails" icon="x-circle">
    **Cause**: The source volume is attached and has in-flight I/O, or the snapshot
    quota for the project has been reached.

    **Diagnosis**:

    ```bash title="Check project snapshot quota" theme={null}
    openstack quota show --volume
    ```

    Look for `snapshots` in the output. If `used >= limit`, request a quota increase.

    **Resolution**:

    * For quota exceeded: ask your administrator to increase the snapshot quota through [XDeploy](/deployment).
    * For consistency issues: flush application writes before taking a snapshot of
      a database volume (see [Snapshots — Consistency](/services/sds/user-guide/snapshots))

    <Warning>
      Crash-consistent snapshots capture the on-disk state at the moment of the snapshot
      request. For databases and stateful applications, coordinate with application-level
      freeze/thaw procedures to ensure data integrity.
    </Warning>
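
    One way to approximate a freeze/thaw procedure at the filesystem level is to wrap
    the snapshot request in `fsfreeze`. The sketch below assumes it runs as root
    inside the instance, that the `openstack` CLI is available there, and that the
    volume is mounted on a data filesystem (not the root filesystem). The helper
    names are hypothetical. It also assumes the backend captures the point-in-time
    image when the request is accepted; if that does not hold in your environment,
    wait for the snapshot to reach `available` before thawing. It does not replace
    database-level flush or backup-mode commands.

    ```python title="Snapshot with filesystem freeze/thaw (sketch)" theme={null}
    import subprocess
    from contextlib import contextmanager


    def run_command(args):
        """Run a command, raising CalledProcessError on a non-zero exit."""
        subprocess.run(args, check=True)


    @contextmanager
    def frozen_filesystem(mountpoint, run=run_command):
        """Suspend writes to `mountpoint` and always thaw afterwards."""
        run(["fsfreeze", "--freeze", mountpoint])
        try:
            yield
        finally:
            # Thaw even if the snapshot request fails, so writes never stay blocked
            run(["fsfreeze", "--unfreeze", mountpoint])


    def consistent_snapshot(volume, mountpoint, snapshot_name, run=run_command):
        """Request a snapshot of an attached volume while its filesystem is frozen."""
        with frozen_filesystem(mountpoint, run=run):
            # --force permits snapshotting a volume that is attached (in-use)
            run(["openstack", "volume", "snapshot", "create",
                 "--volume", volume, "--force", snapshot_name])
    ```

    The `finally` clause is the important design choice: a frozen filesystem blocks
    every writer, so the thaw must run regardless of whether the snapshot succeeded.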
  </Accordion>

  <Accordion title="Object storage — poor upload/download performance" icon="gauge">
    **Cause**: Large numbers of small objects, high latency between the client and
    the gateway, or network routing through the public internet for intra-cluster traffic.

    **Resolution**:

    * Use multi-part upload for objects larger than 100 MB:
      ```python title="boto3 multi-part upload" theme={null}
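      import boto3
      from boto3.s3.transfer import TransferConfig

      # Replace <S3_ENDPOINT_URL> with your regional endpoint; credentials are
      # read from the standard boto3 configuration (environment or config files)
      s3 = boto3.client('s3', endpoint_url='<S3_ENDPOINT_URL>')

      s3.upload_file(
          'large_file.tar.gz', 'my-bucket', 'large_file.tar.gz',
          Config=TransferConfig(
              multipart_threshold=100 * 1024 * 1024,  # use multi-part above 100 MB
              multipart_chunksize=50 * 1024 * 1024    # upload in 50 MB parts
          )
      )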
      ```
    * For small-object workloads, batch objects into larger archives where the
      application permits
    * Verify network path to the storage endpoint — avoid routing through the public
      internet for intra-cluster traffic
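
    For the small-object case, batching can be as simple as packing a directory into
    one archive before upload. The following is a minimal sketch using only the
    Python standard library; the function name `bundle_directory` is illustrative,
    and the archive should be written outside the source directory so it does not
    include itself.

    ```python title="Bundle small files into one archive (sketch)" theme={null}
    import tarfile
    from pathlib import Path


    def bundle_directory(source_dir, archive_path):
        """Pack every file under `source_dir` into one gzip-compressed tar archive.

        Returns the number of files added.
        """
        source = Path(source_dir)
        count = 0
        with tarfile.open(archive_path, "w:gz") as archive:
            for path in sorted(source.rglob("*")):
                if path.is_file():
                    # Store paths relative to source_dir so the archive is relocatable
                    archive.add(path, arcname=str(path.relative_to(source)))
                    count += 1
        return count
    ```

    The resulting archive is then uploaded as a single object, which replaces many
    per-object requests with one transfer. The trade-off is that individual files
    can no longer be fetched without downloading the archive.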

    <Tip>
      Use the S3 API endpoint local to your region for lowest latency. Check
      **Project → Object Store → Endpoints** in the Dashboard for your regional endpoint URL.
    </Tip>
  </Accordion>

  <Accordion title="NFS mount fails or is inaccessible" icon="network">
    **Cause**: Firewall rules blocking NFS traffic, incorrect export path, or the
    NFS gateway service is unhealthy.

    **Diagnosis**:

    ```bash title="List exports offered by the NFS gateway" theme={null}
    showmount -e <gateway-ip>
    ```

    ```bash title="List RPC services registered on the gateway" theme={null}
    rpcinfo -p <gateway-ip>
    ```

    **Resolution**:

    * Ensure security group rules on the client instance permit outbound traffic to
      the NFS gateway on ports 111 (portmapper) and 2049 (NFS)
    * Verify the export path matches exactly what was provided in the Dashboard
    * If `showmount` hangs, the NFS gateway may be temporarily unavailable — contact
      your storage administrator
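
    If `showmount` and `rpcinfo` are not installed on the client, basic TCP
    reachability of the two ports can be checked from Python. This is a sketch, not
    a full NFS health check: the function names are illustrative, and a successful
    connection only shows that the port accepts TCP connections from this client.

    ```python title="Check TCP reachability of NFS ports (sketch)" theme={null}
    import socket


    def tcp_port_open(host, port, timeout=3.0):
        """Return True if a TCP connection to host:port succeeds within `timeout`."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # Covers refused connections, timeouts, and unreachable hosts
            return False


    def check_nfs_ports(host, ports=(111, 2049)):
        """Map each NFS-related port to whether it accepted a TCP connection."""
        return {port: tcp_port_open(host, port) for port in ports}
    ```

    A `False` result for either port from `check_nfs_ports("<gateway-ip>")` points to
    a security group rule, a firewall, or an unavailable gateway; it does not
    distinguish between them.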

    <Note>
      NFS port 2049 must be open in the security group applied to client instances.
      Navigate to **Project → Network → Security Groups** and verify the rule exists.
    </Note>
  </Accordion>

  <Accordion title="S3 authentication error (403 Forbidden)" icon="lock">
    **Cause**: The access key or secret key is incorrect, expired, or belongs to a
    different project.

    **Resolution**:

    1. Verify credentials in the Dashboard under **Project → Object Store → Access Keys**
    2. If the key was deleted or lost, generate a new key pair:
       * Navigate to **Project → Object Store → Access Keys → Create Key**
       * Update all applications and configuration files that still use the old key
    3. Confirm the endpoint URL matches your region:
       ```bash title="Verify S3 endpoint" theme={null}
       openstack catalog show object-store
       ```
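
    The error code that accompanies a 403 response often narrows down which cause
    applies. The sketch below maps common S3 error codes to likely causes; it assumes
    the XSDS gateway returns the standard S3 codes, and the names `S3_403_HINTS` and
    `explain_s3_403` are hypothetical.

    ```python title="Interpret S3 403 error codes (sketch)" theme={null}
    # Hints keyed by standard S3 error codes that accompany HTTP 403 responses
    S3_403_HINTS = {
        "InvalidAccessKeyId":
            "Access key not recognized: check for typos, a deleted key, "
            "or an endpoint in the wrong region.",
        "SignatureDoesNotMatch":
            "Secret key does not match the access key: re-copy the secret.",
        "AccessDenied":
            "Credentials are valid but lack permission: the bucket may "
            "belong to a different project.",
        "RequestTimeTooSkewed":
            "Client clock differs too much from the server: sync the clock.",
    }


    def explain_s3_403(error_code):
        """Return a troubleshooting hint for an S3 error code."""
        return S3_403_HINTS.get(
            error_code,
            "Unrecognized error code: verify the key pair and endpoint URL.",
        )
    ```

    With boto3, the code is available as `exc.response["Error"]["Code"]` on a caught
    `botocore.exceptions.ClientError`. Requests that return no response body, such as
    `head_bucket`, may report only the numeric status instead of a named code.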
  </Accordion>
</AccordionGroup>

***

## Diagnostics Reference

| Issue                      | First Diagnostic Command                         |
| -------------------------- | ------------------------------------------------ |
| Volume stuck in `creating` | `openstack volume show <VOL> -c status -c fault` |
| Quota exceeded             | `openstack quota show --volume`                  |
| Volume stuck in `deleting` | `openstack volume snapshot list --volume <VOL>`  |
| S3 403 Forbidden error     | `openstack catalog show object-store`            |
| NFS mount fails            | `showmount -e <gateway-ip>`                      |

***

## When to Contact Support

Contact [support@xloud.tech](mailto:support@xloud.tech) if:

* A volume has been stuck in `creating` or `deleting` state for more than 10 minutes
* The storage administrator cannot resolve the issue from the cluster admin CLI
* You observe data inconsistency after a snapshot restore
* Object storage bucket contents are missing unexpectedly

<Tip>
  When opening a support ticket, include the output of
  `openstack volume show <VOLUME_ID>` or `openstack volume snapshot show <SNAPSHOT_ID>` —
  the `fault` and `migration_status` fields are particularly useful for diagnosis.
</Tip>

***

## Next Steps

<CardGroup cols={2}>
  <Card title="XSDS Admin Troubleshooting" href="/services/sds/admin-guide/troubleshooting" color="#197560">
    Cluster-level diagnostics for storage administrators — OSD failures, slow requests
  </Card>

  <Card title="Data Protection" href="/services/sds/user-guide/data-protection" color="#197560">
    Configure replication and erasure coding to reduce exposure to hardware failures
  </Card>

  <Card title="Snapshots" href="/services/sds/user-guide/snapshots" color="#197560">
    Best practices for creating consistent snapshots to minimize recovery risk
  </Card>

  <Card title="Support" href="mailto:support@xloud.tech" color="#197560">
    Contact Xloud support for issues that require cluster-level investigation
  </Card>
</CardGroup>
