Linux bonding: active-backup of LACP (DLA / 802.3ad) trunks on linked but non-stacked switches

Today I’m trying to configure a couple of servers, each with two LACP trunks going to separate switches on our network. I was hoping that if I made a single 802.3ad bond containing all the interfaces, it would automatically work in active-backup mode across the two trunks and give me switch redundancy.

The Linux bonding driver appears to do exactly this, so I set up my bond as follows:

auto bond0
iface bond0 inet static
	address 192.168.0.30
	netmask 255.255.255.0
	network 192.168.0.0
	broadcast 192.168.0.255
	gateway 192.168.0.1
	bond_slaves eth0 eth1 eth2 eth3
	bond_mode 802.3ad
	bond_miimon 100
	bond_downdelay 200
	bond_updelay 200
	bond_lacp_rate 1
	bond_xmit_hash_policy layer2+3

Then reload the network configuration:

# invoke-rc.d networking reload

All initially appears good. /proc/net/bonding/bond0 shows two separate aggregator IDs, 1 and 3, with aggregator 1 active:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
		Aggregator ID: 1
		Number of ports: 2
		Actor Key: 17
		Partner Key: 386
		Partner Mac Address: 00:21:f7:0e:c1:00

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:6e:96:19:9d:f0
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:6e:96:19:9d:f1
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:6e:96:19:9d:f2
Aggregator ID: 3
Slave queue ID: 0

Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:6e:96:19:9d:f3
Aggregator ID: 3
Slave queue ID: 0
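When testing failover, it helps to reduce the status file above to one line per slave plus the active aggregator. This is a minimal sketch that assumes the field layout shown in the output above; other kernel versions may add fields. `bond_aggregators` is a hypothetical helper name and is not part of any bonding tool.

```shell
#!/bin/bash
# Summarise 802.3ad aggregator membership from the bonding status file.
# Usage: bond_aggregators [status-file]  (defaults to /proc/net/bonding/bond0)
# Prints "active <id>" for the active aggregator, then "<slave> <id>" per slave.
bond_aggregators() {
	local status=${1:-/proc/net/bonding/bond0}
	awk '
		# The first "Aggregator ID:" after "Active Aggregator Info:" is the active one.
		/^Active Aggregator Info:/ { in_active = 1; next }
		in_active && /Aggregator ID:/ { print "active " $NF; in_active = 0; next }
		# Remember the current slave, then report its aggregator ID.
		/^Slave Interface:/ { slave = $NF; next }
		slave != "" && /^Aggregator ID:/ { print slave " " $NF; slave = "" }
	' "$status"
}
```

Run with no argument, it reads /proc/net/bonding/bond0. On the output above it should print `active 1`, then `eth0 1`, `eth1 1`, `eth2 3` and `eth3 3`. If the driver reports no active aggregator, the `active` line is simply missing.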

If I bring down both interfaces in aggregator 1, the bond switches over to aggregator ID 3 and everything still works:

# ifconfig eth0 down; ifconfig eth1 down

But it all goes bad once I bring those interfaces back up; the machine disappears off the network.

# ifconfig eth0 up; ifconfig eth1 up

The issue appears to be that both trunks now report link up. Both trunks also use the same MAC address (that of the first adapter). So once traffic has passed through both switches, each switch has that MAC in its switching table.
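You can confirm that both trunks present the same MAC by listing each slave's current address next to the bond's. This is a minimal sketch that assumes the standard sysfs layout, where /sys/class/net/bond0/bonding/slaves lists the enslaved interfaces. `show_slave_macs` and the `SYSFS` override are hypothetical names; the override exists only so the function can be pointed at a copy of the tree.

```shell
#!/bin/bash
# Print the bond's MAC and each slave's current MAC, one "<name> <mac>" per line.
# SYSFS is a hypothetical override for testing; it defaults to /sys/class/net.
show_slave_macs() {
	local bond=${1:-bond0} sysfs=${SYSFS:-/sys/class/net}
	local slave
	local -a slaves
	echo "$bond $(cat "$sysfs/$bond/address")"
	# The bonding driver lists the slaves space-separated on a single line.
	read -r -a slaves < "$sysfs/$bond/bonding/slaves"
	for slave in "${slaves[@]}"; do
		echo "$slave $(cat "$sysfs/$slave/address")"
	done
}
```

On the four-slave bond above, every line would be expected to show the same address. That address is what both switches end up learning.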

I couldn’t find a proper workaround for this, and eventually found a Stack Exchange post describing the same issue. Apparently, if the switches can be linked with vPC (Virtual Port Channel) or MLAG (Multi-Chassis Link Aggregation) this can work, but otherwise it can’t.

What I’ve done in the end is a poor man’s workaround. A script checks the status of the bond and swaps the slave interfaces over when the aggregator becomes inactive. It looks like this (on Debian):

auto bond0
iface bond0 inet static
	hwaddress <mac address>
	address 192.168.0.30
	netmask 255.255.255.0
	network 192.168.0.0
	broadcast 192.168.0.255
	gateway 192.168.0.1
	bond_slaves eth0 eth1
	bond_mode 802.3ad
	bond_miimon 100
	bond_downdelay 200
	bond_updelay 200
	bond_lacp_rate 1
	bond_xmit_hash_policy layer2+3

Pin the bond to eth0’s MAC address so it stays the same whichever pair of interfaces is currently enslaved. Otherwise the bond takes the MAC of its first slave:

# mac=$(cat /sys/class/net/eth0/address); sed -i "s/<mac address>/$mac/" /etc/network/interfaces

Reload the network configuration and check that this simpler setup works as expected:

# invoke-rc.d networking reload
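One way to check that the reload picked up the pinned address is to compare bond0's current MAC with the `hwaddress` line in /etc/network/interfaces. This is a sketch that assumes the file contains a single `hwaddress` line; only the first one found is used. `check_pinned_mac` and the `IFACES`/`SYSFS` overrides are hypothetical names added for illustration and testing.

```shell
#!/bin/bash
# Compare bond0's live MAC with the one pinned by "hwaddress" in the interfaces file.
# IFACES and SYSFS are hypothetical overrides; they default to the real locations.
check_pinned_mac() {
	local ifaces=${IFACES:-/etc/network/interfaces} sysfs=${SYSFS:-/sys/class/net}
	local pinned current
	# Last field handles both "hwaddress XX:..." and "hwaddress ether XX:...".
	pinned=$(awk '$1 == "hwaddress" { print tolower($NF); exit }' "$ifaces")
	current=$(tr 'A-F' 'a-f' < "$sysfs/bond0/address")
	if [ -n "$pinned" ] && [ "$pinned" = "$current" ]; then
		echo "bond0 is using pinned MAC $current"
	else
		echo "bond0 MAC $current does not match pinned MAC ${pinned:-<none>}" >&2
		return 1
	fi
}
```

A non-zero return after the reload would suggest the placeholder was never substituted or the `hwaddress` line was ignored.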

Now create a script that checks the status of the bond. If it reports no active aggregator, the script swaps the slaves over to the other trunk and reloads the network configuration:

# vi /usr/local/bin/lacp_switch.sh
#!/bin/bash
# Swap bond0 between the eth0/eth1 trunk (switch1) and the eth2/eth3 trunk
# (switch2) whenever the bonding driver reports no active aggregator.
if grep -q 'bond bond0 has no active aggregator' /proc/net/bonding/bond0; then
	# Exactly one line (bond_slaves) should name the current slaves.
	if [ "$(grep -c 'eth2' /etc/network/interfaces)" -eq 1 ]; then
		echo "$(date +'%T %x') : Changing bond0 slaves to eth0 & eth1 on switch1"
		sed -i 's/eth2/eth0/;s/eth3/eth1/' /etc/network/interfaces
	elif [ "$(grep -c 'eth0' /etc/network/interfaces)" -eq 1 ]; then
		echo "$(date +'%T %x') : Changing bond0 slaves to eth2 & eth3 on switch2"
		sed -i 's/eth0/eth2/;s/eth1/eth3/' /etc/network/interfaces
	else
		echo "$(date +'%T %x') : Unknown configuration"
		exit 1
	fi
	/etc/init.d/networking reload
fi

Make it executable and schedule it to run every 6 seconds:

# chmod 700 /usr/local/bin/lacp_switch.sh
# echo -e "SHELL=/bin/bash\n* * * * * root for i in {1..10}; do /usr/local/bin/lacp_switch.sh >> /var/log/lacp_switch_check 2>&1 & sleep 6; done" > /etc/cron.d/lacp_switch_check
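Each iteration of the cron loop starts the script in the background with `&`. A slow `networking reload` could therefore overlap with the next check, which might still see no active aggregator and swap the slaves straight back. Below is a minimal sketch of a guard against that. It assumes util-linux `flock(1)` is installed. `lock_or_skip` and the lock path /var/lock/lacp_switch.lock are hypothetical choices, not part of the script above.

```shell
#!/bin/bash
# Hypothetical guard for the top of /usr/local/bin/lacp_switch.sh.
# Opens the lock file on fd 9 in the current shell and takes a non-blocking
# exclusive lock; the lock is held until the script exits and fd 9 closes.
lock_or_skip() {
	local lock=${1:-/var/lock/lacp_switch.lock}
	exec 9>"$lock" || return 1
	if ! flock -n 9; then
		echo "$(date +'%T %x') : Previous check still running, skipping"
		return 1
	fi
}

# Usage inside the script, before the aggregator check:
# lock_or_skip || exit 0
```

Alternatively, `flock -n /var/lock/lacp_switch.lock /usr/local/bin/lacp_switch.sh` in the cron line does the same job without editing the script.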

This works, but I’m not happy with it. If anybody knows a better way to do this, please do tell!
