WSFC and iscsitarget: “does not have the inquiry data (SCSI page 83h VPD descriptor) that is required by failover clustering”

Last week, whilst trying to get to grips with SQL Server AlwaysOn Failover Clusters, I set up a simple iSCSI target using the “iscsitarget” package as per the Debian docs. However, when I tried to validate the cluster in WSFC (Windows Server Failover Clustering), the disk checks failed with:

“does not have the inquiry data (SCSI page 83h VPD descriptor) that is required by failover clustering”

This appears to be because iscsitarget supplies the SCSI ID, which the cluster manager needs in order to control volume ownership, in a format that WSFC does not support.

I failed to find a workaround for this and instead switched to using “tgt” to serve the iSCSI targets. I was pushed for time and couldn’t find a straightforward guide, so I’m documenting my steps here.

1) Install tgt:

# apt-get install tgt

2) Enable and start tgt:

# systemctl enable tgt.service
# systemctl start tgt.service

3) Create the iSCSI target(s) and add their backing stores:

# tgtadm --lld iscsi --op new --mode target --tid 1 --targetname iqn.2001-04.com.example:storage.lun1
# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 --backing-store /dev/sdb1

4) Allow all initiators to connect to the target, and bind a user account to it for CHAP authentication:

# tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
# tgtadm --lld iscsi --op new --mode account --user mssql --password secret
# tgtadm --lld iscsi --op bind --mode account --tid 1 --user mssql

5) Dump the config out into a configuration file:

# tgt-admin --dump > /etc/tgt/conf.d/default.conf
# sed -i -e 's/PLEASE_CORRECT_THE_PASSWORD/secret/' /etc/tgt/conf.d/default.conf

6) Restart tgt to ensure the configuration is picked up:

# systemctl restart tgt.service
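
For reference, the dumped file should look roughly like the stanza printed by the sketch below. The layout follows the targets.conf format that tgt-admin reads (a target block containing backing-store and incominguser directives). The helper names render_target and has_placeholder are made up for illustration, and the exact output of --dump may differ slightly between tgt versions.

```shell
#!/bin/bash
# Sketch only: render_target and has_placeholder are hypothetical helper names.

# Print a targets.conf-style stanza for one target with one backing store
# and one CHAP account.
render_target() {
    local iqn="$1" device="$2" user="$3" password="$4"
    printf '<target %s>\n' "$iqn"
    printf '    backing-store %s\n' "$device"
    printf '    incominguser %s %s\n' "$user" "$password"
    printf '</target>\n'
}

# Succeed if the given config file still contains the placeholder that
# tgt-admin --dump writes in place of the real password.
has_placeholder() {
    grep -q 'PLEASE_CORRECT_THE_PASSWORD' "$1"
}
```

Running "render_target iqn.2001-04.com.example:storage.lun1 /dev/sdb1 mssql secret" prints the stanza for the target created above, and "has_placeholder /etc/tgt/conf.d/default.conf" is a quick way to confirm the sed in step 5 actually replaced the password before restarting.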

Linux bonding and active-backup of LACP DLA / 802.3ad adapters on linked, but non-stacked switches.

Today I’m trying to configure a couple of servers each with 2 LACP trunks going to separate switches on our network. I was hoping that if I made a single 802.3ad bond with all the interfaces it’d automatically work in active-backup mode with the 2 trunks and give me switch redundancy.

It would appear that the Linux bonding driver does do this, so I set up my bond as follows:

auto bond0
iface bond0 inet static
	address 192.168.0.30
	netmask 255.255.255.0
	network 192.168.0.0
	broadcast 192.168.0.255
	gateway 192.168.0.1
	bond_slaves eth0 eth1 eth2 eth3
	bond_mode 802.3ad
	bond_miimon 100
	bond_downdelay 200
	bond_updelay 200
	bond_lacp_rate 1
	bond_xmit_hash_policy layer2+3

# invoke-rc.d networking reload

… it all initially looks good, and I can see 2 separate aggregator IDs, 1 and 3, with aggregator ID 1 active:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
		Aggregator ID: 1
		Number of ports: 2
		Actor Key: 17
		Partner Key: 386
		Partner Mac Address: 00:21:f7:0e:c1:00

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:6e:96:19:9d:f0
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:6e:96:19:9d:f1
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:6e:96:19:9d:f2
Aggregator ID: 3
Slave queue ID: 0

Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 24:6e:96:19:9d:f3
Aggregator ID: 3
Slave queue ID: 0

If I bring down both interfaces on aggregator 1 then the above switches to aggregator ID 3 and all seems good.

# ifconfig eth0 down; ifconfig eth1 down
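
To watch which slaves belong to the active aggregator while running tests like this, the bonding status file can be boiled down to one line per slave. The sketch below assumes the /proc/net/bonding layout shown earlier (an "Active Aggregator Info" section followed by one "Slave Interface" section per slave); bond_aggregators is a made-up helper name, and other driver versions may format the file differently.

```shell
#!/bin/bash
# Sketch only: bond_aggregators is a hypothetical helper name.
# Prints "<slave> <aggregator id> <active|inactive>" for each slave listed
# in a bonding status file (defaults to bond0).
bond_aggregators() {
    local status_file="${1:-/proc/net/bonding/bond0}"
    awk '
        # The first "Aggregator ID" after this header is the active one.
        /^Active Aggregator Info:/ { in_active = 1; next }
        # Every later "Aggregator ID" belongs to the most recent slave.
        /^Slave Interface:/        { slave = $3; in_active = 0; next }
        /Aggregator ID:/ {
            if (in_active) {
                active = $NF; in_active = 0
            } else if (slave != "") {
                state = (($NF "") == (active "")) ? "active" : "inactive"
                print slave, $NF, state
            }
        }
    ' "$status_file"
}
```

Something like "watch -n1 'bash -c \"source ./bond_aggregators.sh; bond_aggregators\"'" (the file name is an assumption) then gives a compact view of the failover as interfaces go down and come back.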

But it all goes bad once I bring those interfaces back up; the machine disappears off the network.

# ifconfig eth0 up; ifconfig eth1 up

The issue appears to be that the link status on both trunks is up and the same MAC address (that of the first adapter) is used on each trunk, so once traffic has passed through both switches they both have that MAC in their switching tables.

I couldn’t find any proper workaround for this, and eventually found a Stack Exchange post outlining the same issue. Apparently if the switches can be linked with vPC (Virtual Port Channel) or MLAG (Multi-Chassis Link Aggregation) then it can work, but otherwise not.

What I’ve done in the end is a poor man’s workaround that simply involves checking the status of the bond, and switching the interfaces when the aggregator becomes inactive. It looks like this (on Debian):

auto bond0
iface bond0 inet static
	hwaddress <mac address>
	address 192.168.0.30
	netmask 255.255.255.0
	network 192.168.0.0
	broadcast 192.168.0.255
	gateway 192.168.0.1
	bond_slaves eth0 eth1
	bond_mode 802.3ad
	bond_miimon 100
	bond_downdelay 200
	bond_updelay 200
	bond_lacp_rate 1
	bond_xmit_hash_policy layer2+3

Set the bond to always use the MAC address of eth0, rather than that of whichever interface happens to be enslaved first:

# mac=$(cat /sys/class/net/eth0/address); sed -i "s/<mac address>/$mac/" /etc/network/interfaces            

Reload our network configuration and check this simple configuration works as we expect:

# invoke-rc.d networking reload

Now create a script to check the status of the bond, and if it shows no active aggregator then switch the interfaces and reload network configuration:

# vi /usr/local/bin/lacp_switch.sh
#!/bin/bash
# Swap bond0 between the two trunks when the current one has no active aggregator.
if grep -q 'bond bond0 has no active aggregator' /proc/net/bonding/bond0; then
	# The slave names are expected on exactly one line (bond_slaves) of the config.
	if [ "$(grep -c 'eth2' /etc/network/interfaces)" -eq 1 ]; then
		echo "$(date +'%T %x') : Changing bond0 slaves to eth0 & eth1 on switch1"
		sed -i 's/eth2/eth0/;s/eth3/eth1/' /etc/network/interfaces
	elif [ "$(grep -c 'eth0' /etc/network/interfaces)" -eq 1 ]; then
		echo "$(date +'%T %x') : Changing bond0 slaves to eth2 & eth3 on switch2"
		sed -i 's/eth0/eth2/;s/eth1/eth3/' /etc/network/interfaces
	else
		echo "$(date +'%T %x') : Unknown configuration"
		exit 1
	fi
	# Apply the rewritten configuration.
	/etc/init.d/networking reload
fi

Make it executable and schedule it to run every 6 seconds:

# chmod 700 /usr/local/bin/lacp_switch.sh	
# echo -e "SHELL=/bin/bash\n* * * * * root for i in {1..10}; do /usr/local/bin/lacp_switch.sh >> /var/log/lacp_switch_check 2>&1 & sleep 6; done" > /etc/cron.d/lacp_switch_check
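
Because the cron job edits /etc/network/interfaces in place, it may be worth previewing the substitution first. The sketch below applies the same eth0/eth1 and eth2/eth3 swap to a file and prints the result to stdout without modifying anything; swap_preview is a made-up helper name and is not part of the script itself.

```shell
#!/bin/bash
# Sketch only: swap_preview is a hypothetical helper name.
# Prints the given interfaces file with the bond slaves swapped to the
# other trunk, leaving the file itself untouched.
swap_preview() {
    local cfg="${1:-/etc/network/interfaces}"
    if [ "$(grep -c 'eth2' "$cfg")" -eq 1 ]; then
        sed 's/eth2/eth0/;s/eth3/eth1/' "$cfg"
    elif [ "$(grep -c 'eth0' "$cfg")" -eq 1 ]; then
        sed 's/eth0/eth2/;s/eth1/eth3/' "$cfg"
    else
        echo "Unknown configuration" >&2
        return 1
    fi
}
```

Piping the output into diff, for example "swap_preview | diff /etc/network/interfaces -", should show only the bond_slaves line changing. If anything else changes, or the helper reports an unknown configuration, the real script would misbehave in the same way.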

This works, but I’m not happy with it. If somebody knows a better way to do the above, please do tell!

samba_4.1.17+dfsg-2+deb8u1 root share results in NT_STATUS_ACCESS_DENIED on subdirectories on Debian Jessie

Recently I saw a Debian Jessie server start returning “NT_STATUS_ACCESS_DENIED” whenever a user tried to access a subdirectory from a root share. A quick dig through the Debian bug tracker revealed this bug report, so we’ll see it fixed in an update at some point.

However, there’s no telling when the update will actually arrive, so the question is what to do in the meantime. One option is to replicate the mount point elsewhere and share that instead, e.g. after doing the below we could just set “path=/mnt/root”:

# mkdir /mnt/root
# mount -o rbind / /mnt/root  

The other option is to apply the patch supplied in the upstream bug report to the existing Debian package; the only issue here is that we have to tread carefully so as not to break the packaging system. The Debian packaging system is very much an unknown to me, but this is how I go about applying such a patch. (Disclaimer: follow this advice at your own peril.)

First we need to make sure we have the tools for building packages:

$ sudo apt-get install build-essential devscripts

Then get the source and the upstream patch:

$ cd /tmp
$ sudo apt-get update
$ wget -O samba.patch 'https://attachments.samba.org/attachment.cgi?id=11742'
$ apt-get source samba
$ cd samba-4.1.17+dfsg

To prepare a proper patch we’d use quilt:

$ sudo apt-get install quilt
$ export QUILT_PATCHES=debian/patches
$ export QUILT_REFRESH_ARGS="-p ab --no-timestamps --no-index"
$ quilt push -a
$ quilt new bug_812429_share_of_root_no_longer_works.patch 
$ quilt add source3/smbd/vfs.c
$ patch -p1 < ../samba.patch
$ quilt refresh
$ quilt pop -a 

Alternatively, as we only really care about building the binary package, we can take a shortcut and just apply the patch on top of the source we’ve downloaded:

$ patch -p1 < ../samba.patch
$ dpkg-source --commit

Now we want to make sure our package doesn’t get overwritten until an actual update appears. We can bump the version number to enforce this:

$ debchange --increment
    * Add bug_812429_share_of_root_no_longer_works.patch
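
The reason the bump works is that the new version has to sort after the one currently installed but before whatever Debian eventually ships. A minimal sketch of that check using dpkg's own version ordering follows; version_between is a made-up helper name, and the "deb8u2" version is only an assumption about what a future official update might be called.

```shell
#!/bin/bash
# Sketch only: version_between is a hypothetical helper name.
# Succeeds if version $2 sorts strictly after $1 and strictly before $3
# according to dpkg's version comparison rules.
version_between() {
    dpkg --compare-versions "$2" gt "$1" && dpkg --compare-versions "$2" lt "$3"
}

installed='4.1.17+dfsg-2+deb8u1'
patched='4.1.17+dfsg-2+deb8u1.1'
next_update='4.1.17+dfsg-2+deb8u2'   # assumed version of a future official update

if version_between "$installed" "$patched" "$next_update"; then
    echo "patched package supersedes $installed and yields to $next_update"
fi
```

If the check fails for the version debchange picked, apt would either ignore the rebuilt packages or hold on to them after the official fix lands.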

Now build our package(s):

$ dpkg-buildpackage -us -uc

We only really need the changes in “samba-libs_4.1.17+dfsg-2+deb8u1.1_amd64.deb”, but because we’ve bumped the version number we need to apply all the rebuilt packages that depend on samba-libs:

$ su -
# dpkg -i samba-libs_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i python-samba_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i libsmbclient_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i samba-common_4.1.17+dfsg-2+deb8u1.1_all.deb
# dpkg -i samba-common-bin_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i samba_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i samba-dsdb-modules_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i samba-vfs-modules_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i smbclient_4.1.17+dfsg-2+deb8u1.1_amd64.deb  

Hopefully this will suffice: only once the Debian apt repository carries a newer version will “apt-get upgrade” overwrite our patched package.