## Overview
This guide covers infrastructure-level networking issues that require administrator access to diagnose and resolve: agent failures, VXLAN tunnel connectivity, HA router failover, and MTU configuration across the physical underlay.

## Prerequisites
- Admin credentials sourced from `admin-openrc.sh`
- SSH access to compute and network nodes
- XDeploy access for agent restarts
## Diagnostic Quick Reference
- Overview of agent health across the cluster
- List agents by host
- Show all routers with HA state
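The checks above map to standard OpenStack CLI commands. A sketch, assuming admin credentials are sourced (`compute01` is a placeholder host name):

```shell
# Overview of agent health across the cluster; the Alive column
# shows ":-)" for healthy agents and "XXX" for dead ones
openstack network agent list

# List agents on a specific host (compute01 is a placeholder)
openstack network agent list --host compute01

# List all routers; --long adds the HA and Distributed columns
openstack router list --long
```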
## Common Issues
### Agent is down or showing stale heartbeat
Cause: The agent process has crashed, the host is unreachable, or the message
bus is not delivering heartbeats.

Resolution:
- Identify the affected host:
List agents with heartbeat timestamps
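A sketch of the lookup, assuming the standard OpenStack client; the `last_heartbeat_at` field name follows the OpenStack SDK and may vary by release (`<agent-id>` comes from the list output):

```shell
# Dead or stale agents show "XXX" in the Alive column
openstack network agent list

# Inspect a single agent's last heartbeat timestamp
openstack network agent show <agent-id> -c last_heartbeat_at
```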
- SSH to the affected host and check the agent service:
Check the L2 agent (Linux bridge) and the agent container logs
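A sketch of both checks, assuming a containerized deployment; `neutron_linuxbridge_agent` is a typical container name and is an assumption — adjust for your environment:

```shell
# Check that the L2 agent container is running
# (neutron_linuxbridge_agent is an assumed container name)
docker ps --filter name=neutron_linuxbridge_agent

# Tail the agent logs for crashes or RPC/heartbeat errors
docker logs --tail 100 neutron_linuxbridge_agent
```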
- Restart the agent via XDeploy:
Redeploy networking agents
### VXLAN tunnel failures between compute nodes
Cause: MTU mismatch, firewall blocking UDP 4789, or misconfigured tunnel
endpoint IPs.

Resolution:
- Verify UDP 4789 (VXLAN) is reachable between compute nodes:
Test VXLAN port reachability
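A sketch of the probe (`10.0.0.12` is a placeholder tunnel endpoint IP). Note that UDP probes are best-effort: VXLAN does not answer empty datagrams, so a firewall check is a useful companion:

```shell
# From one compute node, probe UDP 4789 on a peer's tunnel endpoint
# (success is inferred from the absence of an ICMP port-unreachable)
nc -u -z -w 3 10.0.0.12 4789; echo "exit: $?"

# Confirm no local firewall rule drops VXLAN traffic
iptables -L -n | grep 4789
```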
- Confirm tunnel endpoint IPs:
Show L2 agent configuration including tunnel IP
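A sketch, assuming the standard OpenStack client; the displayed column name (`configuration`) may vary by client release:

```shell
# Show the L2 agent's configuration, including its tunnel endpoint
# (<agent-id> from "openstack network agent list")
openstack network agent show <agent-id> -c configuration
```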
The `Configuration` field shows `tunnel_types` and `local_ip`.

- Verify the physical interface MTU accommodates VXLAN overhead:
Check physical interface MTU

For VXLAN, the physical MTU must be at least 1550 to carry 1500-byte tenant frames with 50-byte encapsulation overhead.
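A sketch of the check (`eth1` and `10.0.0.12` are placeholders for the underlay interface and a peer's underlay IP):

```shell
# Check the MTU of the underlay interface (eth1 is a placeholder)
ip link show eth1 | grep -o 'mtu [0-9]*'

# Verify the 1550-byte path end to end: with DF set, -s 1522 plus
# 28 bytes of IP/ICMP headers produces a 1550-byte packet
ping -M do -s 1522 -c 3 10.0.0.12
```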
### MTU mismatch causing application-level failures
Cause: Tenant network MTU exceeds the physical network capacity after
VXLAN encapsulation overhead.

Resolution:
- Set the correct MTU on the affected tenant network:
Update network MTU
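A sketch using the standard OpenStack client (`tenant-net` is a placeholder network name):

```shell
# Lower the tenant network MTU to fit the VXLAN-encapsulated path
openstack network set --mtu 1450 tenant-net

# Confirm the new value
openstack network show tenant-net -c mtu
```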
- The DHCP agent automatically pushes the updated MTU to new instances via
DHCP option 26. Existing instances need a manual update or DHCP renewal:
Set MTU on Linux guest
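A sketch of both options inside the guest (`eth0` is a placeholder interface name; `dhclient` assumes an ISC DHCP client):

```shell
# Set the MTU directly (eth0 is a placeholder) ...
ip link set dev eth0 mtu 1450

# ... or force a DHCP renewal to pick up option 26
dhclient -r eth0 && dhclient eth0
```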
MTU recommendations: VXLAN networks = 1450, VLAN networks = 1500, jumbo-frame VLAN = up to 9000 (requires end-to-end switch support).

### HA router not recovering after L3 agent failure
Cause: VRRP failover completed but the new master has not programmed
floating IP NAT rules, or the failover did not complete.

Resolution:
- Check the HA state across L3 agents:
Show router HA status
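A sketch of both checks (`router1` is a placeholder; the `ha` field is visible to admin, and `--router` filtering availability may depend on your client release):

```shell
# Show the router's HA flag and current status
openstack router show router1 -c ha -c status

# List the L3 agents hosting the router; with --long, the HA State
# column should show exactly one "active"
openstack network agent list --router router1 --long
```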
- List L3 agents for the router and confirm one is `active`:

List L3 agents for the router

- If stuck, trigger rescheduling by toggling admin state:
Reschedule the HA router
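A sketch of the toggle (`router1` is a placeholder):

```shell
# Toggle admin state to force the router to be rescheduled
openstack router set --disable router1
openstack router set --enable router1
```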
- Check L3 agent logs on network nodes for VRRP negotiation errors:
View L3 agent logs
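A sketch, using the container name from the log-locations table below:

```shell
# On each network node, look for VRRP/keepalived negotiation errors
docker logs --tail 200 neutron_l3_agent 2>&1 | grep -iE 'vrrp|keepalived'
```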
### Provider network ports not binding
Cause: Physical network mapping misconfiguration or the L2 agent on the
compute node does not have the bridge mapped.

Resolution:
- Verify the bridge mapping on the affected compute node:
Check bridge mapping in L2 agent config
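A sketch for a containerized Open vSwitch deployment; the config path is the typical default and is an assumption:

```shell
# On the compute node: inspect the agent's bridge mappings
# (config path is an assumed default; adjust for your deployment)
docker exec neutron_openvswitch_agent \
  grep bridge_mappings /etc/neutron/plugins/ml2/openvswitch_agent.ini
```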
- Confirm the bridge exists on the host:
Check bridge on compute node
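A sketch using `ovs-vsctl` (`br-ex` is a placeholder bridge name):

```shell
# List OVS bridges and confirm the mapped bridge exists
ovs-vsctl list-br
ovs-vsctl br-exists br-ex && echo "bridge present"
```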
- If the bridge is missing, redeploy the networking configuration:
Redeploy networking
## Log Locations
| Service | Log Location |
|---|---|
| Networking API | `docker logs neutron_server` |
| L2 Agent (SDN) | `docker logs neutron_openvswitch_agent` |
| L3 Agent | `docker logs neutron_l3_agent` |
| DHCP Agent | `docker logs neutron_dhcp_agent` |
| Metadata Agent | `docker logs neutron_metadata_agent` |
## Next Steps

- **Network Agent Management**: manage agent enable/disable state and monitor health.
- **L3 Router Configuration**: configure HA and DVR to prevent the issues described in this guide.
- **Provider Networks**: verify provider network configuration if port bindings are failing.
- **User Troubleshooting**: tenant-facing connectivity and floating IP troubleshooting.