
WSFC and iscsitarget: “does not have the inquiry data (SCSI page 83h VPD descriptor) that is required by failover clustering”

Last week, whilst trying to get to grips with SQL Server AlwaysOn Failover Cluster Instances, I set up a simple iSCSI target using the “iscsitarget” package as per the Debian docs. However, when trying to validate the cluster in WSFC (Windows Server Failover Clustering), the disk checks failed with:

“does not have the inquiry data (SCSI page 83h VPD descriptor) that is required by failover clustering”

This appears to be because the SCSI ID, which the cluster manager requires in order to control volume ownership, is supplied by iscsitarget in a format that WSFC does not support.

I failed to find a workaround for this and instead switched to using “tgt” to serve the iSCSI targets. I was pushed for time and couldn’t find a straightforward guide, so I’m documenting my steps here.

1) Install tgt:

# apt-get install tgt

2) Enable and start tgt:

# systemctl enable tgt.service
# systemctl start tgt.service

3) Create the iSCSI target(s) and add their backing stores:

# tgtadm --lld iscsi --op new --mode target --tid 1 --targetname iqn.2001-04.com.example:storage.lun1
# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 --backing-store /dev/sdb1
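
To double-check that the target and LUN were created, the current configuration can be listed with tgtadm’s show operation:

# tgtadm --lld iscsi --op show --mode target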

4) Bind the target so it accepts connections from any initiator address, and add a CHAP user account bound to it:

# tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
# tgtadm --lld iscsi --op new --mode account --user mssql --password secret
# tgtadm --lld iscsi --op bind --mode account --tid 1 --user mssql
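
If you’d rather not accept connections from every initiator, the target can instead be bound to a specific initiator address rather than ALL (192.168.0.50 below is just a placeholder for the cluster node’s IP):

# tgtadm --lld iscsi --op bind --mode target --tid 1 -I 192.168.0.50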

5) Dump the running config out into a configuration file so it survives restarts:

# tgt-admin --dump > /etc/tgt/conf.d/default.conf
# sed -i -e 's/PLEASE_CORRECT_THE_PASSWORD/secret/' /etc/tgt/conf.d/default.conf
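
For reference, tgt-admin masks the CHAP secret in the dump as PLEASE_CORRECT_THE_PASSWORD, which is what the sed above fixes; after that the file should end up looking something like the block below (exact directives vary by tgt version). On Debian the stock /etc/tgt/targets.conf includes /etc/tgt/conf.d/*.conf, so this file should be picked up automatically.

<target iqn.2001-04.com.example:storage.lun1>
	backing-store /dev/sdb1
	incominguser mssql secret
	initiator-address ALL
</target>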

6) Restart tgt to ensure the configuration file is picked up:

# systemctl restart tgt.service
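
After the restart, the target, its LUN and the account binding should still be listed, confirming that the configuration file was read:

# tgt-admin --show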

Linux bonding and active-backup of LACP / 802.3ad (dynamic link aggregation) trunks on linked but non-stacked switches

Today I’m trying to configure a couple of servers, each with 2 LACP trunks going to separate switches on our network. I was hoping that if I made a single 802.3ad bond with all four interfaces, it’d automatically work in active-backup fashion across the 2 trunks and give me switch redundancy.

It would appear that the Linux bonding driver does do this, so (with the ifenslave package installed to handle the bond_* options in /etc/network/interfaces) I set up my bond as follows:

auto bond0
iface bond0 inet static
	address 192.168.0.30
	netmask 255.255.255.0
	network 192.168.0.0
	broadcast 192.168.0.255
	gateway 192.168.0.1
	bond_slaves eth0 eth1 eth2 eth3
	bond_mode 802.3ad
	bond_miimon 100
	bond_downdelay 200
	bond_updelay 200       
	bond_lacp_rate 1  
	bond_xmit_hash_policy layer2+3

Then reload the network configuration:

# invoke-rc.d networking reload

All initially appears good: /proc/net/bonding/bond0 shows two separate aggregator IDs, 1 and 3, with aggregator ID 1 active:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
		Aggregator ID: 1
		Number of ports: 2
		Actor Key: 17
		Partner Key: 386
		Partner Mac Address: 00:21:f7:0e:c1:00

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:6e:96:19:9d:f0
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:6e:96:19:9d:f1
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:6e:96:19:9d:f2
Aggregator ID: 3
Slave queue ID: 0

Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:6e:96:19:9d:f3
Aggregator ID: 3
Slave queue ID: 0

If I bring down both interfaces on aggregator 1 then the above switches to aggregator ID 3 and all seems good.

# ifconfig eth0 down; ifconfig eth1 down

But it all goes bad once I bring those interfaces back up; the machine disappears off the network.

# ifconfig eth0 up; ifconfig eth1 up

The issue appears to be that the link status on both trunks is up, and since both trunks use the same MAC address (that of the first slave interface), once traffic has passed through both switches they each have that MAC in their forwarding tables.

I couldn’t find any proper workaround for this, but eventually found a Stack Exchange post describing the same issue. Apparently it can work if the switches support being linked with vPC (Virtual Port Channel) or MLAG (Multi-Chassis Link Aggregation), but not otherwise.

What I’ve done in the end is a poor man’s workaround that simply involves checking the status of the bond and switching the slave interfaces whenever the active aggregator disappears. It looks like this (on Debian):

auto bond0
iface bond0 inet static
	hwaddress <mac address>
	address 192.168.0.30
	netmask 255.255.255.0
	network 192.168.0.0
	broadcast 192.168.0.255
	gateway 192.168.0.1
	bond_slaves eth0 eth1
	bond_mode 802.3ad
	bond_miimon 100
	bond_downdelay 200
	bond_updelay 200       
	bond_lacp_rate 1  
	bond_xmit_hash_policy layer2+3

Pin the bond’s MAC address to eth0’s permanent address, rather than letting it take the MAC of whichever slave is enslaved first (which would change when the script below swaps the slave set):

# mac=$(cat /sys/class/net/eth0/address); sed -i "s/<mac address>/$mac/" /etc/network/interfaces            

Reload our network configuration and check this simple configuration works as we expect:

# invoke-rc.d networking reload
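
A quick way to confirm the two-slave bond has come up as intended is to check that it reports an active aggregator and is carrying eth0’s MAC address, e.g.:

# grep -E 'MII Status|Aggregator ID' /proc/net/bonding/bond0
# ip link show bond0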

Now create a script that checks the status of the bond and, if it reports no active aggregator, switches the slave interfaces and reloads the network configuration:

# vi /usr/local/bin/lacp_switch.sh
#!/bin/bash
# If bond0 has lost its active aggregator, swap the slave pair in
# /etc/network/interfaces over to the trunk on the other switch and reload.
if [ $(grep -c 'bond bond0 has no active aggregator' /proc/net/bonding/bond0) -eq 1 ]; then
	if [ $(grep -c 'eth2' /etc/network/interfaces) -eq 1 ]; then
		# Currently using eth2/eth3 (switch2); fail back to switch1.
		echo "$(date +'%T %x') : Changing bond0 slaves to eth0 & eth1 on switch1"
		sed -i 's/eth2/eth0/;s/eth3/eth1/' /etc/network/interfaces
	elif [ $(grep -c 'eth0' /etc/network/interfaces) -eq 1 ]; then
		# Currently using eth0/eth1 (switch1); fail over to switch2.
		echo "$(date +'%T %x') : Changing bond0 slaves to eth2 & eth3 on switch2"
		sed -i 's/eth0/eth2/;s/eth1/eth3/' /etc/network/interfaces
	else
		echo "$(date +'%T %x') : Unknown configuration"
		exit 1
	fi
	/etc/init.d/networking reload
fi
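
Before scheduling it, it’s worth running the script once by hand (via bash, as it isn’t executable yet) to check that it stays quiet while the active aggregator is healthy:

# bash /usr/local/bin/lacp_switch.sh; echo "exit code: $?"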

Make it executable and schedule it via cron to run every 6 seconds (cron only fires once a minute, so the job loops 10 times with a 6-second sleep):

# chmod 700 /usr/local/bin/lacp_switch.sh	
# echo -e "SHELL=/bin/bash\n* * * * * root for i in {1..10}; do /usr/local/bin/lacp_switch.sh >> /var/log/lacp_switch_check 2>&1 & sleep 6; done" > /etc/cron.d/lacp_switch_check
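
To watch the workaround kick in, tail its log while taking down the active trunk’s interfaces as in the earlier test:

# tail -f /var/log/lacp_switch_check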

This works, but I’m not happy with it. If somebody knows a better way to do the above, please do tell!

Latest megacli 8.07.14 emits message “Configure Adapter Failed” on Perc5i (LSI MegaSAS 8408E)

This morning I came to reconfigure a RAID array on an old Dell Perc5i with the LSI MegaRAID CLI tool megacli. Whilst displaying information and removing an old logical drive (LD) appeared to work fine, I was greeted with the following when trying to add a new one:

# megacli -CfgSpanAdd -r10 -Array0[10:2,10:3] Array1[10:6,10:7] WB RA Direct CachedBadBBU -a0

Adapter 0: Configure Adapter Failed

Exit Code: 0x03

Balls. “Exit Code: 0x03” is supposed to mean “Input parameters are invalid” (ref), but after a few moments of head-scratching and checking my parameters I realised that couldn’t be it. The megacli package I’m using comes from the Debian repository at hwraid.le-vert.net and has always worked in the past. A quick check of their homepage reveals a news item on their front page stating:

2014/01/26 — I just updated megacli to release 8.07.14. Despite it seems to works for me, I’d really appreciate some feedbacks, especially if you’re running a 32 bits system. Please drop me a mail !

So something in the new version isn’t 100% compatible with the Perc5i. I’ll send them an email, but I needed to get the adapter configured and wasn’t too keen to trudge off to the server room. After a quick Google search I managed to find a rather old v4.00.16 RPM package in an archive here, and pulled out the amd64 binary with rpm2cpio:

$ wget http://docs.avagotech.com/docs-and-downloads/legacy-raid-controllers/legacy-raid-controllers-common-files/4-00-16_Linux_MegaCli.zip
$ unzip 4-00-16_Linux_MegaCli.zip
$ unzip MegaCliLin.zip
$ rpm2cpio MegaCli-4.00.16-1.i386.rpm | cpio -idmv
$ cd opt/MegaRAID/MegaCli/
# ./MegaCli64 -CfgSpanAdd -r10 -Array0[10:2,10:3] Array1[10:6,10:7] WB RA Direct CachedBadBBU -a0

Adapter 0: Created VD 1

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

Success!
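
As a final check, the newly created virtual drive can be listed with the same old binary, e.g.:

# ./MegaCli64 -LDInfo -Lall -a0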

Edit: 2014/10/16 12:30 The package here is much more up to date and also works for me with the Perc5i; however, it’s not a particularly trustworthy link, and I had to do a little more fiddling to get it to run on Wheezy:

# apt-get install libsysfs2
# ln -s /lib/x86_64-linux-gnu/libsysfs.so.2.0.1 /lib/x86_64-linux-gnu/libsysfs.so.2.0.2
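
With the library symlink in place, a quick way to confirm the newer binary runs and can see the controller is to ask it for the adapter count (assuming the package installs the tool as megacli on the PATH; adjust the name to whatever it actually provides):

# megacli -adpCount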