Overview
This page covers administrator-level XIMP troubleshooting. For user-facing issues such as alert delivery failures and missing metrics on dashboards, see the XIMP User Guide Troubleshooting page.
Prerequisites
- Administrator credentials with the admin role
- Access to XIMP CLI and management interfaces
Common Issues
High cardinality causing metric store performance issues
Cause: Metric labels with unbounded values (e.g., request IDs, user IDs, or
ephemeral container names) create millions of unique metric series, degrading
query performance and consuming excessive storage.
Diagnosis: List the highest-cardinality metric series: ximp metric cardinality top --limit 20
Resolution: Drop or relabel the high-cardinality labels in the scrape configuration, then apply the relabel configuration.
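To illustrate why a single unbounded label is enough to cause the problem, here is a minimal Python sketch that counts unique series before and after a label is dropped. It does not use any XIMP API; the metric name, the label names, and the helper functions are all hypothetical, and the model (one series per distinct metric name and label set) is the usual definition of cardinality rather than a confirmed detail of the XIMP metric store.

```python
from collections import defaultdict


def series_count(samples, drop_labels=()):
    """Count unique series: one per distinct (metric name, label set) pair."""
    seen = set()
    for name, labels in samples:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        seen.add((name, kept))
    return len(seen)


def label_cardinality(samples):
    """Return the number of distinct values observed for each label name."""
    values = defaultdict(set)
    for _, labels in samples:
        for key, value in labels.items():
            values[key].add(value)
    return {key: len(vals) for key, vals in values.items()}


# Hypothetical data: 1,000 requests over 2 methods, each with a unique request_id.
samples = [
    ("http_requests_total", {"method": ("GET", "POST")[i % 2], "request_id": f"req-{i}"})
    for i in range(1000)
]

print(series_count(samples))                              # 1000
print(series_count(samples, drop_labels={"request_id"}))  # 2
print(label_cardinality(samples))                         # {'method': 2, 'request_id': 1000}
```

The per-label counts show which label to drop: any label whose distinct-value count grows with traffic rather than with the size of the deployment is a candidate.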
Log ingestion backlog
Cause: Log volume exceeds the collector’s processing capacity, causing a write
backlog and delayed delivery to the search index.
Diagnosis: Check the ingestion queue depth: ximp log ingest-status
Resolution:
- Reduce log verbosity on high-volume services (set the log level to WARNING instead of DEBUG). Example: reduce the Nova log level
- Increase the log collector worker count in the XIMP configuration: navigate to Monitoring → Administration → Collector Settings → Log Workers
- Add a second log collector node through XDeploy for horizontal scaling
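When choosing a value for Log Workers, a rough capacity calculation can help. The Python sketch below assumes that throughput scales linearly with worker count and that per-worker throughput is known from measurement; both are assumptions, and the function names and all rates are hypothetical rather than XIMP settings.

```python
import math


def workers_needed(ingest_rate, per_worker_rate, headroom=0.2):
    """Smallest worker count whose combined capacity covers ingest plus headroom."""
    return math.ceil(ingest_rate * (1 + headroom) / per_worker_rate)


def drain_seconds(backlog, ingest_rate, per_worker_rate, workers):
    """Seconds to clear the backlog, or None if capacity does not exceed ingest."""
    surplus = workers * per_worker_rate - ingest_rate
    if surplus <= 0:
        return None  # the backlog keeps growing at this worker count
    return backlog / surplus


# Hypothetical numbers: 12,000 lines/s arriving, 2,500 lines/s per worker,
# and 1.8 million lines already queued.
print(workers_needed(12_000, 2_500))               # 6
print(drain_seconds(1_800_000, 12_000, 2_500, 4))  # None
print(drain_seconds(1_800_000, 12_000, 2_500, 6))  # 600.0
```

If the required worker count exceeds what one node can run, that points to the horizontal-scaling option rather than a higher worker setting.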
Scrape target in DOWN state
Cause: The scrape target is unreachable because of a blocking firewall, a stopped
service, or an authentication failure.
Diagnosis: Check the health of the specific target: ximp target health --target <URL>
Common causes:
| Symptom | Cause | Resolution |
|---|---|---|
| Connection refused | Service not running on target port | Verify service is running; check port |
| Timeout | Firewall blocking | Add inbound rule for XIMP collector IP |
| 401 Unauthorized | Invalid auth credentials | Update auth config in target definition |
| 503 Service Unavailable | Service overloaded | Review service health; reduce scrape frequency |
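The table above can also be applied from the collector host when the CLI output is not conclusive. The following Python sketch probes a target once using only the standard library and maps the outcome to one of the table's symptoms. It is an independent check under stated assumptions, not how XIMP itself scrapes: the function names are hypothetical, and it must be run from a machine with the same network path as the collector to be meaningful.

```python
import socket
import urllib.error
import urllib.request

# Rows copied from the table above: symptom -> (cause, resolution).
COMMON_CAUSES = {
    "Connection refused": ("Service not running on target port",
                           "Verify service is running; check port"),
    "Timeout": ("Firewall blocking",
                "Add inbound rule for XIMP collector IP"),
    "401 Unauthorized": ("Invalid auth credentials",
                         "Update auth config in target definition"),
    "503 Service Unavailable": ("Service overloaded",
                                "Review service health; reduce scrape frequency"),
}


def to_symptom(outcome):
    """Translate a probe outcome (HTTP status or exception) into a table symptom."""
    if isinstance(outcome, urllib.error.HTTPError):
        outcome = outcome.code
    elif isinstance(outcome, urllib.error.URLError):
        outcome = outcome.reason  # the underlying socket-level error
    if isinstance(outcome, int):
        return {401: "401 Unauthorized", 503: "503 Service Unavailable"}.get(outcome)
    if isinstance(outcome, ConnectionRefusedError):
        return "Connection refused"
    if isinstance(outcome, (socket.timeout, TimeoutError)):
        return "Timeout"
    return None  # healthy, or a failure the table does not list


def probe(url, timeout=5.0):
    """Fetch the URL once; return the HTTP status or the exception, never raise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except (urllib.error.URLError, OSError) as err:
        return err
```

Usage would look like `COMMON_CAUSES.get(to_symptom(probe(url)))`, which returns the cause and resolution pair for a listed symptom and `None` otherwise.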
XIMP metric store disk full
Cause: Metric volume has exceeded the allocated storage for the metric store.
This can occur from high cardinality, insufficient retention management, or
unexpected metric bursts.
Diagnosis: Check metric store disk usage: ximp storage status
Resolution (in order of preference):
- Reduce raw metric retention to free space immediately, for example to 15 days in an emergency
- Identify and drop high-cardinality series (see above)
- Expand storage on the metric store node through XDeploy
- Add a second metric store node for horizontal capacity
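To judge how much space a retention change or a cardinality reduction might recover, a back-of-the-envelope estimate is often enough. The Python sketch below multiplies series count, samples per series, and bytes per sample. Every input is an assumption: the 30-day starting retention, the scrape interval, and especially the bytes-per-sample figure, which depends on the store's compression and is not documented here.

```python
def raw_storage_bytes(active_series, scrape_interval_s, retention_days, bytes_per_sample):
    """Rough raw-metric footprint: series x samples kept per series x bytes per sample."""
    samples_per_series = retention_days * 86_400 // scrape_interval_s
    return active_series * samples_per_series * bytes_per_sample


# Hypothetical deployment: 2 million series scraped every 30 s, 2 bytes per sample.
before = raw_storage_bytes(2_000_000, 30, retention_days=30, bytes_per_sample=2)
after = raw_storage_bytes(2_000_000, 30, retention_days=15, bytes_per_sample=2)

print(before)          # 345600000000 (about 346 GB)
print(after)           # 172800000000 (about 173 GB)
print(before - after)  # 172800000000
```

The estimate is linear in both retention and series count, so halving either one halves the footprint. Actual freed space may lag the change if the store reclaims disk only during compaction or cleanup cycles.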
Dashboard shows 'No data' for a metric
Cause: The scrape target is down, the agent is offline, or the metric name has
changed after a software update.
Diagnosis:
- Check target health: ximp target health --target <URL>
- Verify the agent is active: ximp agent list --node <HOSTNAME>
- Search for the metric by prefix to find renamed metrics
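The prefix search in the last step can be sketched in Python as follows, assuming a list of current metric names has already been exported by some means. The metric names and the `rename_candidates` helper are hypothetical, and the rule of matching on the first two underscore-separated parts is only one reasonable heuristic.

```python
def rename_candidates(expected, available, min_parts=2):
    """List available names sharing the first `min_parts` name parts with `expected`."""
    if expected in available:
        return []  # the metric still exists, so a rename is not the cause
    prefix = "_".join(expected.split("_")[:min_parts])
    return sorted(name for name in available if name.startswith(prefix))


# Hypothetical names: the dashboard queries a metric that no longer exists.
available = {
    "node_cpu_utilization_ratio",
    "node_cpu_seconds_total",
    "node_memory_bytes",
    "http_requests_total",
}

print(rename_candidates("node_cpu_usage_percent", available))
# ['node_cpu_seconds_total', 'node_cpu_utilization_ratio']
```

An empty result for a metric that is also missing from the list suggests the cause is a down target or an offline agent rather than a rename.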
Diagnostics Reference
| Issue | Diagnostic Command |
|---|---|
| Cardinality | ximp metric cardinality top --limit 20 |
| Log backlog | ximp log ingest-status |
| Target DOWN | ximp target health --verbose |
| Storage usage | ximp storage status |
| Agent offline | ximp agent list --status offline |
| Top log emitters | ximp log stats top-emitters --last 1h |
Next Steps
Agent Configuration
Review and fix agent configuration that may be causing issues
Retention Policies
Adjust retention settings to address storage pressure
Metric Endpoints
Review and fix scrape target configurations
User Guide Troubleshooting
User-facing issues — alerts not firing, log delays