Category Archives: PostgreSQL

Windows psql and utf8 client_encoding issues

Prior to pg 9.1, you could connect to any database with psql regardless of encoding and you’d get the server encoding as your client encoding unless you set it:

C:\>chcp 1252
Active code page: 1252

C:\>"C:\Program Files\PostgreSQL\9.0\bin\psql" -U glyn -d TEST -h pgtest
psql (9.0.22, server 9.4.4)
WARNING: psql version 9.0, server version 9.4.
         Some psql features might not work.
SSL connection (cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.

TEST=> show client_encoding;
 client_encoding
-----------------
 LATIN1
(1 row)

That wasn’t quite right; the client_encoding is a lie. On a modern psql version that’s quite rightly prevented:

C:\>"C:\Program Files (x86)\pgAdmin III\1.20\psql" -U glyn -d TEST -h pgtest
psql: FATAL:  conversion between WIN1252 and LATIN1 is not supported

This is not an issue if you want to connect to a utf8 database, but the issue I had this morning was connecting to a latin1 database with psql from a Windows client (something I do rarely). If I set the codepage to utf8 to match client encoding, I got a “Not enough memory.” error:

C:\>chcp 65001
Active code page: 65001

C:\>set PGCLIENTENCODING=UTF8

C:\>"C:\Program Files (x86)\pgAdmin III\1.20\psql" -U glyn -d TEST -h pgtest
psql (9.4.0, server 9.4.4)
WARNING: Console code page (65001) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 25
6, compression: off)
Type "help" for help.

TEST=> show client_encoding;
Not enough memory.

I could set the codepage to 1252, but that would mean my setting for client_encoding would be a lie, and if I were to then revert to set client_encoding=’WIN1252′ I’d have come full circle and be back at the “FATAL: conversion between WIN1252 and LATIN1 is not supported” error message.

A quick google revealed these bug reports with no solutions. Another dig at the docs revealed the following passage:

pager

Controls use of a pager program for query and psql help output. If the environment variable PAGER is set, the output is piped to the specified program. Otherwise a platform-dependent default (such as more) is used.

So how does more behave?

C:\>chcp 65001
Active code page: 65001

C:\>more
Not enough memory.

Bingo! So if I turn the pager off the error should go:

C:\>chcp 65001
Active code page: 65001

C:\>set PGCLIENTENCODING=UTF8
C:\>"C:\Program Files (x86)\pgAdmin III\1.20\psql" -U glyn -d TEST -h pgtest
WARNING: Console code page (65001) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 25
6, compression: off)
Type "help" for help.

TEST=> \pset pager off
Pager usage is off.
TEST=> show client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)

Cluprit found; quite embarrassing that the source of such a verbose error as “Not enough memory.” being Microsoft didn’t occour sooner. So lets try a different pager (sourced from)

C:\>chcp 65001
Active code page: 65001

C:\>set PGCLIENTENCODING=UTF8
C:\>set PAGER="C:\Program Files (x86)\gnuwin32\bin\less"
C:\>set LESS=--quit-at-eof

C:\>"C:\Program Files (x86)\pgAdmin III\1.20\psql" -U glyn -d TEST -h pgtest
psql (9.4.0, server 9.4.4)
WARNING: Console code page (65001) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 25
6, compression: off)
Type "help" for help.

TEST=> show client_encoding;

 client_encoding
-----------------
 UTF8
(1 row)

(END)

Being cavalier with slony nodes and pg_dump/pg_restore

It’s generally a bad idea to do logical dump/restores of slony nodes, and for this reason slony provides the CLONE PREPARE/CLONE FINISH action commands to clone subscriber nodes.

In this instance though, I’ve a test environment where I’d stopped the slons, dumped out and dropped a subscriber database and then gone on to do some other testing with that postgres cluster. Sometime later I want to do some more testing with the slony cluster; I never dropped this node from the slony config, but in the meantime I’ve changed the postgres version from 9.0 to 9.4 and recreated the postgres cluster with initdb. Schema wise nothing has changed with the old nodes.

What follows is some fudge to make the node resume with replication, it’s neither recommended nor guaranteed to work. Copy this process at your peril.

After recompiling the same version of slony I had before (2.2.4) and installing the binaries, I’ve just restored the old subscriber database dump into the new postgres 9.4 cluster and updated my extensions.

So what’s holding me off just restarting the slons now? Firstly slony stores the oid of each table participating in replication and these will surely have changed now. The following query produces a long list of tables where this is true:

[postgres]
TEST=# SELECT s.tab_set, s.tab_reloid, s.tab_nspname, s.tab_relname,
c.oid, n.nspname, c.relname
FROM _test_replication.sl_table s
JOIN pg_catalog.pg_class c ON c.oid = s.tab_reloid
JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
WHERE (s.tab_reloid, s.tab_nspname, s.tab_relname) <> (c.oid, n.nspname, c.relname);
[/postgres]

The fix here is pretty simple, I could run REPAIR CONFIG against the node, but instead I just hit the updatereloid(<set id>, <node id>) function directly:

[postgres]
TEST=# SELECT s.set_id, s.set_comment,
CASE _test_replication.updatereloid(s.set_id,1) WHEN 1 THEN ‘OK’ ELSE ‘FAILED’ END
FROM _test_replication.sl_set s;
[/postgres]

For clarity, my resurrected node is ID 1, and was a subscriber at the time it was destroyed.

Secondly slony stores the transaction ID for each replication event against the node, and if my resurrected node has a lower current xid then new events are going to look like they’re in the past compared to events that have already been generated against the node. I can see the current xid of my new postgresql cluster with the following query:

[postgres]
TEST=# SELECT txid_current();
txid_current
————–
25765
[/postgres]

And I can compare this to slony events generated against the node in its old incarnation as follows:

[postgres]
TEST=# SELECT max(txid_snapshot_xmax(ev_snapshot)), txid_current(), txid_current()-max(txid_snapshot_xmax(ev_snapshot)) AS delta
FROM _test_replication.sl_event where ev_origin = 1;

max | txid_current | delta
——-+————–+——–
89004 | 25767 | -63237
[/postgres]

So transaction ID wise my node is in the past, to bump that up I need about 63237 transactions to happen, if that were a lot higher I’d have to think of another way (perhaps by stopping all the slons and updating all values of ev_snapshot for node 1 on all nodes), but I can easily generate 60k transactions by hitting txid_current():

$ while [ 1 ]; do psql -U glyn -d TEST -tAc "SELECT txid_current();"; done
25768
25769
25770

<snip>

89002
89003
89004
89005
89006
^C
$ 

I can now restart my slons and replication should continue, and so far all appears well.

Wanting hstore style operators in jsonb – revisited

A couple of weeks ago I wrote about wanting a jsonb delete operator in 9.4, and yesterday evening I decided to have a go at writing some functions in C.

In the end all I actually did yesterday was make a mess and read a lot of existing code, but persisting this evening I’ve managed to put together some functions that appear to work. I’m not confident to say they’re efficient (or even correct; and they certainly shouldn’t be put on production systems), but I thought it’d be useful to benchmark them.

I’ve also added in a concatenate operator after reading Matthew Schinckel’s post.

First install the C shared library:

# make install 

Then install the functions and operators, these functions are named jsonb_delete and jsonb_concat:

[postgres]
test=# \i jsonb_opx.sql
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
[/postgres]

Then install the SQL versions for comparison, these functions are named jsonb_delete_left and jsonb_concat_left:

[postgres]
test=# \i jsonb_opx_sql_comparison.sql
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
[/postgres]

Test 1 – deleteing a key

This is actually an SQL wrapper to the C function for (jsonb, text[]), breaking that up
to have a separate version would be more efficient, but that should be a trivial task.

[postgres]
test=# \timing on
Timing is on.
test=# SELECT ‘{"a": 1, "b": 2, "c": 3}’::jsonb – ‘b’::text;
?column?
——————
{"a": 1, "c": 3}
(1 row)

Time: 7.099 ms
[/postgres]

The above is hitting the C function; from this point onwards I’ll just hit the functions directly:

[postgres]
test=# SELECT jsonb_delete(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘b’);
jsonb_delete
——————
{"a": 1, "c": 3}
(1 row)
Time: 6.220 ms
[/postgres]

Now the original SQL version:

[postgres]
test=# SELECT jsonb_delete_left(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘b’);
jsonb_delete_left
——————-
{"a": 1, "c": 3}
(1 row)

Time: 14.570 ms
[/postgres]

Now to benchmark for a large quantity of rows:

[postgres]
test=# EXPLAIN ANALYZE SELECT jsonb_delete((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb, ‘b’)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————-
Function Scan on generate_series x (cost=0.00..300.00 rows=1000 width=4) (actual time=10.407..263.489 rows=10000 loops=1)
Planning time: 0.335 ms
Execution time: 290.192 ms
(3 rows)

Time: 293.254 ms
test=# EXPLAIN ANALYZE SELECT jsonb_delete_left((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb, ‘b’)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————-
Function Scan on generate_series x (cost=0.00..300.00 rows=1000 width=4) (actual time=15.165..767.706 rows=10000 loops=1)
Planning time: 0.785 ms
Execution time: 803.258 ms
(3 rows)

Time: 809.088 ms
[/postgres]

Whilst processing 1 row really doesn’t show any improvement (the timings for both varied in the 2~10ms range), with 10,000 rows the C version is just over twice as quick.

If these times stick out as particularly dire, it’s probably just because the machine I’m testing on is very old.

Test 2 – deleting multiple keys

[postgres]
test=# SELECT jsonb_delete(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ARRAY[‘a’,’b’]);
jsonb_delete
————–
{"c": 3}
(1 row)

Time: 3.482 ms

test=# SELECT jsonb_delete_left(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ARRAY[‘a’,’b’]);
jsonb_delete_left
——————-
{"c": 3}
(1 row)

Time: 3.613 ms
[/postgres]

[postgres]
test=# EXPLAIN ANALYZE SELECT jsonb_delete((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb, ARRAY[‘a’,’b’])
FROM generate_series(1,10000) x;
QUERY PLAN
————————————————————————————————————————–
Function Scan on generate_series x (cost=0.00..52.50 rows=1000 width=4) (actual time=5.805..177.507 rows=10000 loops=1)
Planning time: 1.646 ms
Execution time: 209.137 ms
(3 rows)

Time: 213.507 ms

test=# EXPLAIN ANALYZE SELECT jsonb_delete_left((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb, ARRAY[‘a’,’b’])
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————-
Function Scan on generate_series x (cost=0.00..300.00 rows=1000 width=4) (actual time=15.805..757.500 rows=10000 loops=1)
Planning time: 0.595 ms
Execution time: 789.272 ms
(3 rows)

Time: 793.229 ms
[/postgres]

Results are similar; we’re essentially hitting the same C function.

Test 3 – Deleting matching jsonb key/value pairs

The C version of this function essentially loops round the left jsonb value looking up keys
in the right jsonb value. If it finds a matching key it does a string based comparison on
the values (treating nested jsonb as a string too) and if the values match as well then the key/value pair is removed.

[postgres]
test=# SELECT jsonb_delete(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘{"a": 4, "b": 2}’::jsonb);
jsonb_delete
——————
{"a": 1, "c": 3}
(1 row)

Time: 3.114 ms
test=# SELECT jsonb_delete_left(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘{"a": 4, "b": 2}’::jsonb);
jsonb_delete_left
——————-
{"a": 1, "c": 3}
(1 row)

Time: 6.899 ms
[/postgres]

[postgres]
test=# EXPLAIN ANALYZE SELECT jsonb_delete((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb,
(‘{"a":’ || x || ‘, "d":’ || x*2 || ‘, "c":’ || x*10 || ‘}’)::jsonb)
FROM generate_series(1,10000) x;
QUERY PLAN
————————————————————————————————————————–
Function Scan on generate_series x (cost=0.00..92.50 rows=1000 width=4) (actual time=8.452..238.210 rows=10000 loops=1)
Planning time: 0.428 ms
Execution time: 266.358 ms
(3 rows)

Time: 270.161 ms

test=# EXPLAIN ANALYZE SELECT jsonb_delete_left((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb,
(‘{"a":’ || x || ‘, "d":’ || x*2 || ‘, "c":’ || x*10 || ‘}’)::jsonb)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————–
Function Scan on generate_series x (cost=0.00..340.00 rows=1000 width=4) (actual time=11.833..1206.990 rows=10000 loops=1)
Planning time: 0.759 ms
Execution time: 1248.481 ms
(3 rows)

Time: 1253.392 ms
[/postgres]

There’s a bigger improvement here; it’s about 4.5 times quicker.

Test 4 – concatenation

The C function for this is essentially a cut n shut job on both jsonb values, blindly
pushing all the values onto the return value and leaving the lower level jsonb api
to do the deduplication:

[postgres]
test=# SELECT jsonb_concat(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘{"a": 4, "d": 4, "z": 26}’::jsonb);
jsonb_concat
——————————————-
{"a": 4, "b": 2, "c": 3, "d": 4, "z": 26}
(1 row)

Time: 3.028 ms

test=# SELECT jsonb_concat_left(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘{"a": 4, "d": 4, "z": 26}’::jsonb);
jsonb_concat_left
——————————————-
{"a": 4, "b": 2, "c": 3, "d": 4, "z": 26}
(1 row)

Time: 4.731 ms
[/postgres]

Again nothing to get excited about. Testing on a larger quantity of rows shows a similar improvement to the jsonb -jsonb delete operator/function above:

[postgres]
test=# EXPLAIN ANALYZE SELECT jsonb_concat((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb,
(‘{"a":’ || x || ‘, "d":’ || x*2 || ‘, "c":’ || x*10 || ‘}’)::jsonb)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————
Function Scan on generate_series x (cost=0.00..92.50 rows=1000 width=4) (actual time=12.481..255.910 rows=10000 loops=1)
Planning time: 0.599 ms
Execution time: 285.357 ms
(3 rows)

Time: 288.615 ms

test=# EXPLAIN ANALYZE SELECT jsonb_concat_left((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb,
(‘{"a":’ || x || ‘, "d":’ || x*2 || ‘, "c":’ || x*10 || ‘}’)::jsonb)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————–
Function Scan on generate_series x (cost=0.00..340.00 rows=1000 width=4) (actual time=13.931..1051.100 rows=10000 loops=1)
Planning time: 5.160 ms
Execution time: 1091.596 ms
(3 rows)

Time: 1103.165 ms
[/postgres]

So in conclusion the results are nothing earth shattering, but there is a small improvement. Essentially all these functions are doing is iterating around the jsonb and building new return values; it’d be nice to see what someone more familiar with the jsonb internals at a lower level could come up with.