samba_4.1.17+dfsg-2+deb8u1 root share results in NT_STATUS_ACCESS_DENIED on subdirectories on Debian Jessie

Recently I saw a Debian Jessie server start returning “NT_STATUS_ACCESS_DENIED” whenever a user tried to access a subdirectory from a root share. A quick dig through the Debian bug tracker revealed this bug report so we’ll see it fixed in an update at some point.

However there’s no telling when the update will actually come; so the question is what to do in the meantime? One option is to replicate the mount point elsewhere and share that, e.g after doing the below we could just set “path=/mnt/root”

# mkdir /mnt/root
# mount -o rbind / /mnt/root  

The other option is to apply the patch supplied in the upstream bug report to the existing Debian package; the only issue here is we have to tread carefully so as not to break the packaging system. The Debian packaging system is very much an unknown to me, but this is how I go about applying such a patch (Disclaimer: Follow this advice at your own peril)

First we need to make sure we have the tools for building packages:

$ sudo apt-get install build-essential devscripts

Then get the source and the upstream patch:

$ cd /tmp
$ sudo apt-get update
$ wget -O samba.patch https://attachments.samba.org/attachment.cgi?id=11742
$ apt-get source samba
$ cd samba-4.1.17+dfsg

To prepare a patch proper we’d use quilt

$ sudo apt-get install quilt
$ export QUILT_PATCHES=debian/patches
$ export QUILT_REFRESH_ARGS="-p ab --no-timestamps --no-index"
$ quilt push -a
$ quilt new bug_812429_share_of_root_no_longer_works.patch 
$ quilt add source3/smbd/vfs.c
$ patch -p1 < ../samba.patch
$ quilt refresh
$ quilt pop -a 

Or alternatively as we only really care about making the binary package we can take a shortcut and just apply the patch on top of the source we’ve downloaded:

$ patch -p1 < ../samba.patch
$ dpkg-source --commit

Now we want to make sure our package doesn’t get overwritten until an actual update appears, we can bump the version number to enforce this:

$ debchange --increment
    * Add bug_812429_share_of_root_no_longer_works.patch

Now build our package(s):

$ dpkg-buildpackage -us -uc

We only really need the changes in “samba-libs_4.1.17+dfsg-2+deb8u1.1_amd64.deb”, but because we’ve bumped the version number we need to apply all the rebuilt packages that depend on samba-libs:

$ su -
# dpkg -i samba-libs_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i python-samba_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i libsmbclient_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i samba-common_4.1.17+dfsg-2+deb8u1.1_all.deb
# dpkg -i samba-common-bin_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i samba_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i samba-dsdb-modules_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i samba-vfs-modules_4.1.17+dfsg-2+deb8u1.1_amd64.deb
# dpkg -i smbclient_4.1.17+dfsg-2+deb8u1.1_amd64.deb  

Hopefully this will suffice, and once the Debian apt repository is updated, and only then will “apt-get upgrade” overwrite our patched package.

Windows psql and utf8 client_encoding issues

Prior to pg 9.1, you could connect to any database with psql regardless of encoding and you’d get the server encoding as your client encoding unless you set it:

C:\>chcp 1252
Active code page: 1252

C:\>"C:\Program Files\PostgreSQL\9.0\bin\psql" -U glyn -d TEST -h pgtest
psql (9.0.22, server 9.4.4)
WARNING: psql version 9.0, server version 9.4.
         Some psql features might not work.
SSL connection (cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.

TEST=> show client_encoding;
 client_encoding
-----------------
 LATIN1
(1 row)

That wasn’t quite right; the client_encoding is a lie. On a modern psql version that’s quite rightly prevented:

C:\>"C:\Program Files (x86)\pgAdmin III\1.20\psql" -U glyn -d TEST -h pgtest
psql: FATAL:  conversion between WIN1252 and LATIN1 is not supported

This is not an issue if you want to connect to a utf8 database, but the issue I had this morning was connecting to a latin1 database with psql from a Windows client (something I do rarely). If I set the codepage to utf8 to match client encoding, I got a “Not enough memory.” error:

C:\>chcp 65001
Active code page: 65001

C:\>set PGCLIENTENCODING=UTF8

C:\>"C:\Program Files (x86)\pgAdmin III\1.20\psql" -U glyn -d TEST -h pgtest
psql (9.4.0, server 9.4.4)
WARNING: Console code page (65001) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 25
6, compression: off)
Type "help" for help.

TEST=> show client_encoding;
Not enough memory.

I could set the codepage to 1252, but that would mean my setting for client_encoding would be a lie, and if I were to then revert to set client_encoding=’WIN1252′ I’d have come full circle and be back at the “FATAL: conversion between WIN1252 and LATIN1 is not supported” error message.

A quick google revealed these bug reports with no solutions. Another dig at the docs revealed the following passage:

pager

Controls use of a pager program for query and psql help output. If the environment variable PAGER is set, the output is piped to the specified program. Otherwise a platform-dependent default (such as more) is used.

So how does more behave?

C:\>chcp 65001
Active code page: 65001

C:\>more
Not enough memory.

Bingo! So if I turn the pager off the error should go:

C:\>chcp 65001
Active code page: 65001

C:\>set PGCLIENTENCODING=UTF8
C:\>"C:\Program Files (x86)\pgAdmin III\1.20\psql" -U glyn -d TEST -h pgtest
WARNING: Console code page (65001) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 25
6, compression: off)
Type "help" for help.

TEST=> \pset pager off
Pager usage is off.
TEST=> show client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)

Cluprit found; quite embarrassing that the source of such a verbose error as “Not enough memory.” being Microsoft didn’t occour sooner. So lets try a different pager (sourced from)

C:\>chcp 65001
Active code page: 65001

C:\>set PGCLIENTENCODING=UTF8
C:\>set PAGER="C:\Program Files (x86)\gnuwin32\bin\less"
C:\>set LESS=--quit-at-eof

C:\>"C:\Program Files (x86)\pgAdmin III\1.20\psql" -U glyn -d TEST -h pgtest
psql (9.4.0, server 9.4.4)
WARNING: Console code page (65001) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 25
6, compression: off)
Type "help" for help.

TEST=> show client_encoding;

 client_encoding
-----------------
 UTF8
(1 row)

(END)

Being cavalier with slony nodes and pg_dump/pg_restore

It’s generally a bad idea to do logical dump/restores of slony nodes, and for this reason slony provides the CLONE PREPARE/CLONE FINISH action commands to clone subscriber nodes.

In this instance though, I’ve a test environment where I’d stopped the slons, dumped out and dropped a subscriber database and then gone on to do some other testing with that postgres cluster. Sometime later I want to do some more testing with the slony cluster; I never dropped this node from the slony config, but in the meantime I’ve changed the postgres version from 9.0 to 9.4 and recreated the postgres cluster with initdb. Schema wise nothing has changed with the old nodes.

What follows is some fudge to make the node resume with replication, it’s neither recommended nor guaranteed to work. Copy this process at your peril.

After recompiling the same version of slony I had before (2.2.4) and installing the binaries, I’ve just restored the old subscriber database dump into the new postgres 9.4 cluster and updated my extensions.

So what’s holding me off just restarting the slons now? Firstly slony stores the oid of each table participating in replication and these will surely have changed now. The following query produces a long list of tables where this is true:

[postgres]
TEST=# SELECT s.tab_set, s.tab_reloid, s.tab_nspname, s.tab_relname,
c.oid, n.nspname, c.relname
FROM _test_replication.sl_table s
JOIN pg_catalog.pg_class c ON c.oid = s.tab_reloid
JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
WHERE (s.tab_reloid, s.tab_nspname, s.tab_relname) <> (c.oid, n.nspname, c.relname);
[/postgres]

The fix here is pretty simple, I could run REPAIR CONFIG against the node, but instead I just hit the updatereloid(<set id>, <node id>) function directly:

[postgres]
TEST=# SELECT s.set_id, s.set_comment,
CASE _test_replication.updatereloid(s.set_id,1) WHEN 1 THEN ‘OK’ ELSE ‘FAILED’ END
FROM _test_replication.sl_set s;
[/postgres]

For clarity, my resurrected node is ID 1, and was a subscriber at the time it was destroyed.

Secondly slony stores the transaction ID for each replication event against the node, and if my resurrected node has a lower current xid then new events are going to look like they’re in the past compared to events that have already been generated against the node. I can see the current xid of my new postgresql cluster with the following query:

[postgres]
TEST=# SELECT txid_current();
txid_current
————–
25765
[/postgres]

And I can compare this to slony events generated against the node in its old incarnation as follows:

[postgres]
TEST=# SELECT max(txid_snapshot_xmax(ev_snapshot)), txid_current(), txid_current()-max(txid_snapshot_xmax(ev_snapshot)) AS delta
FROM _test_replication.sl_event where ev_origin = 1;

max | txid_current | delta
——-+————–+——–
89004 | 25767 | -63237
[/postgres]

So transaction ID wise my node is in the past, to bump that up I need about 63237 transactions to happen, if that were a lot higher I’d have to think of another way (perhaps by stopping all the slons and updating all values of ev_snapshot for node 1 on all nodes), but I can easily generate 60k transactions by hitting txid_current():

$ while [ 1 ]; do psql -U glyn -d TEST -tAc "SELECT txid_current();"; done
25768
25769
25770

<snip>

89002
89003
89004
89005
89006
^C
$ 

I can now restart my slons and replication should continue, and so far all appears well.