
Overview

This guide covers infrastructure-level networking issues that require administrator access to diagnose and resolve — agent failures, VXLAN tunnel connectivity, HA router failover, and MTU configuration across the physical underlay.
Administrator Access Required — This operation requires the admin role. Contact your Xloud administrator if you do not have sufficient permissions.
Prerequisites
  • Admin credentials sourced from admin-openrc.sh
  • SSH access to compute and network nodes
  • XDeploy access for agent restarts

Diagnostic Quick Reference

Overview agent health across the cluster
openstack network agent list --long
List agents by host
openstack network agent list -f value -c Host -c Binary -c Alive | sort
Show all routers with HA state
openstack router list --long -f json | grep -E '"ID"|"HA"|"Status"'

Common Issues

Agent shows as down (Alive = XXX)

Cause: The agent process has crashed, the host is unreachable, or the message bus is not delivering heartbeats.

Resolution:
  1. Identify the affected host:
    List agents with heartbeat timestamps
    openstack network agent list --long
    
  2. SSH to the affected host and check the agent service:
    Check the L2 agent service (Linux bridge deployments)
    sudo systemctl status neutron-linuxbridge-agent
    
    Check the L2 agent container logs (OVS/containerized deployments)
    sudo docker logs neutron_openvswitch_agent --tail 100
    
  3. Restart the agent via XDeploy:
    Redeploy networking agents
    xavs-ansible deploy --tags neutron
    
After restarting, allow up to 30 seconds for the agent to re-register and send a heartbeat. Verify with openstack network agent list --long.
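The re-registration check can be scripted rather than run by hand. A minimal sketch, assuming the `:-)`/`XXX` markers Neutron uses in the Alive column (the host name is a placeholder for your affected node):

```shell
# Poll until the agent on a given host reports alive.
# Neutron renders the Alive column as ":-)" (alive) or "XXX" (dead).
wait_for_agent() {
  local host="$1" tries="${2:-6}"
  local i
  for i in $(seq "$tries"); do
    if openstack network agent list --host "$host" -f value -c Alive | grep -q ':-)'; then
      echo "agent on $host is alive"
      return 0
    fi
    sleep 5
  done
  echo "agent on $host still down after $((tries * 5))s" >&2
  return 1
}
```

Six tries at 5-second intervals covers the 30-second re-registration window mentioned above.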
VXLAN tunnel traffic not passing between compute nodes

Cause: MTU mismatch, firewall blocking UDP 4789, or misconfigured tunnel endpoint IPs.

Resolution:
  1. Verify UDP 4789 (VXLAN) is reachable between compute nodes:
    Test VXLAN port reachability
    nc -uvz <remote-compute-ip> 4789
    
    A UDP probe reports failure only when an ICMP port-unreachable comes back; if the result is ambiguous, confirm with tcpdump on the remote host.
    
  2. Confirm tunnel endpoint IPs:
    Show L2 agent configuration including tunnel IP
    openstack network agent list --agent-type open-vswitch --long
    
    The Configuration field shows tunnel_types and local_ip.
  3. Verify the physical interface MTU accommodates VXLAN overhead:
    Check physical interface MTU
    ip link show eth0 | grep mtu
    
    For VXLAN, the physical MTU must be at least 1550 to carry 1500-byte tenant frames with 50-byte encapsulation overhead.
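The arithmetic, and a way to validate the underlay end to end, can be sketched as follows (the 50-byte figure assumes VXLAN over IPv4; `<remote-compute-ip>` is a placeholder):

```shell
# Required physical MTU = tenant MTU + VXLAN-over-IPv4 encapsulation overhead:
# outer IP (20) + UDP (8) + VXLAN (8) + inner Ethernet (14) = 50 bytes.
TENANT_MTU=1500
VXLAN_OVERHEAD=50
PHYS_MTU=$((TENANT_MTU + VXLAN_OVERHEAD))
echo "required physical MTU: $PHYS_MTU"

# Validate with a non-fragmentable ICMP probe between compute nodes.
# Payload = target MTU minus the probe's own 28-byte IP + ICMP headers.
ICMP_PAYLOAD=$((PHYS_MTU - 28))
echo "validate with: ping -M do -s $ICMP_PAYLOAD <remote-compute-ip>"
```

If the full-size probe fails while smaller ones succeed, a device along the underlay path is still at 1500.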
Large packets dropped on tenant networks (MTU mismatch)

Cause: Tenant network MTU exceeds the physical network capacity after VXLAN encapsulation overhead.

Resolution:
  1. Set the correct MTU on the affected tenant network:
    Update network MTU
    openstack network set app-network --mtu 1450
    
  2. The DHCP agent automatically pushes the updated MTU to new instances via DHCP option 26. Existing instances need a manual update or DHCP renewal:
    Set MTU on Linux guest
    sudo ip link set eth0 mtu 1450
    
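To confirm the guest is actually using the new MTU end to end, a DF-bit ping sized to fill the tenant MTU exactly works. A sketch, assuming the 1450-byte VXLAN MTU from the step above (`<gateway-ip>` is a placeholder):

```shell
# Inside the guest: probe with a packet that fills the tenant MTU exactly.
# 28 bytes = the probe's own 20-byte IP header + 8-byte ICMP header.
TENANT_MTU=1450
ICMP_PAYLOAD=$((TENANT_MTU - 28))
echo "run: ping -M do -c 3 -s $ICMP_PAYLOAD <gateway-ip>"
```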
MTU recommendations: VXLAN networks = 1450, VLAN networks = 1500, jumbo-frame VLAN = up to 9000 (requires switch support end-to-end).
Floating IPs unreachable after HA router failover

Cause: VRRP failover completed but the new master has not programmed floating IP NAT rules, or the failover did not complete.

Resolution:
  1. Check the HA state across L3 agents:
    Show router HA status
    openstack router show ha-router -f json | grep -E "ha|status"
    
  2. List the L3 agents hosting the router — confirm exactly one reports an active HA state:
    List L3 agents for the router with HA state
    openstack network agent list --router ha-router --long
    
  3. If stuck, trigger rescheduling by toggling admin state:
    Reschedule the HA router
    openstack router set ha-router --disable
    openstack router set ha-router --enable
    
  4. Check L3 agent logs on network nodes for VRRP negotiation errors:
    View L3 agent logs
    sudo docker logs neutron_l3_agent --tail 200
    
With the default VRRP advertisement interval (2 seconds in Neutron), keepalived declares the master dead only after several missed advertisements, so a failover can mean a 10–30 second outage before the standby takes over. Tune the VRRP timers in XDeploy if faster failover is required.
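The relevant knob lives in Neutron's L3 agent configuration and would be delivered through your XDeploy config-override mechanism. A sketch of the option (the override file location depends on your XDeploy layout, so verify it before applying):

```ini
[DEFAULT]
# VRRP advertisement interval in seconds (Neutron default: 2).
# Lower values detect master failure sooner at the cost of more VRRP traffic.
ha_vrrp_advert_int = 1
```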
Port binding failures on a compute node

Cause: Physical network mapping misconfiguration or the L2 agent on the compute node does not have the bridge mapped.

Resolution:
  1. Verify the bridge mapping on the affected compute node:
    Check bridge mapping in L2 agent config
    openstack network agent show <l2-agent-id> -f json | grep bridge_mappings
    
  2. Confirm the bridge exists on the host:
    Check bridge on compute node
    ssh xloud@<compute-node> "ip link show br-ex"
    
  3. If the bridge is missing, redeploy the networking configuration:
    Redeploy networking
    xavs-ansible deploy --tags neutron
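After redeploying, you can confirm the affected ports now bind successfully. A small sketch (the port ID argument is whichever port was failing; `binding_failed` is the value Neutron reports for an unbound VIF):

```shell
# Report whether a port's VIF bound successfully.
check_binding() {
  local port_id="$1" vif_type
  vif_type=$(openstack port show "$port_id" -f value -c binding_vif_type)
  if [ "$vif_type" = "binding_failed" ]; then
    echo "port $port_id: binding failed"
    return 1
  fi
  echo "port $port_id: bound ($vif_type)"
}
```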
    

Log Locations

Service           Log Location
Networking API    docker logs neutron_server
L2 Agent (SDN)    docker logs neutron_openvswitch_agent
L3 Agent          docker logs neutron_l3_agent
DHCP Agent        docker logs neutron_dhcp_agent
Metadata Agent    docker logs neutron_metadata_agent

Next Steps

Network Agent Management

Manage agent enable/disable state and monitor health

L3 Router Configuration

Configure HA and DVR to prevent the issues described in this guide

Provider Networks

Verify provider network configuration if port bindings are failing

User Troubleshooting

Tenant-facing connectivity and floating IP troubleshooting