Monthly Archives: December 2014

PostgreSQL 9.4 released

It looks like PostgreSQL 9.4 was released last Thursday. I’ve been keeping an eye on 9.4 and watching some of the chat about new features, although I’ve been too buried in work to pay much attention. Today, however, is my first day off for Christmas, so I’ve finally got some time to look into it.

The most interesting features to me are jsonb and Logical Decoding, so those are what I’m going to look at, but there’s plenty more, and you can read about it here.

jsonb

The new jsonb data type stores JSON data internally in a binary form, which makes it possible to index the keys and values within. Previous versions had a json data type, but all that does is enforce valid JSON; the data is still stored as text. Whilst it was possible to do lookups on key-value data in previous versions using the hstore type (provided by the hstore module), with JSON seemingly ubiquitous in applications these days, jsonb means we can just let devs store their data straight into the database and still be able to do fast lookups and searches.
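
As a quick illustration of the difference, casting the same string to json keeps it verbatim, whereas jsonb normalises it on the way in (whitespace isn’t preserved and duplicate keys keep only the last value):

[postgres]
TEST=# SELECT '{"a":1, "a":2}'::json AS as_json, '{"a":1, "a":2}'::jsonb AS as_jsonb;
    as_json     | as_jsonb
----------------+----------
 {"a":1, "a":2} | {"a": 2}
(1 row)
[/postgres]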

At work we get quite a lot of variable callback data from web APIs, or serialized data from application objects, and that tends to end up stored as text. The ability to look up that data via a GIN index will be invaluable. I assume even XML storage should become easier, as there are plenty of pre-cooked ways to convert XML to JSON.

Let’s create a quick test table:

[postgres]
CREATE TABLE jsonb_test(
id integer PRIMARY KEY,
data jsonb
);
CREATE INDEX jsonb_test_data ON jsonb_test USING gin(data);

-- Obviously this data is ridiculous, but we need enough rows for postgres to prefer an index over a seq scan.
INSERT INTO jsonb_test
SELECT i, ('{"name": "Person' || i || '","age" : ' || i || ',"address": {"add1": "'
|| i || ' Clarence Street","city": "Lancaster","postcode": "LA13BG"}}')::jsonb
FROM generate_series(1,100000) i;
[/postgres]

Now if we query on the data column we should see the jsonb_test_data index being used:

[postgres]
TEST=# SELECT * FROM jsonb_test
WHERE data @> '{"address": {"add1": "2300 Clarence Street"}}';
  id  |                                                            data
------+-----------------------------------------------------------------------------------------------------------------------------
 2300 | {"age": 2300, "name": "Person2300", "address": {"add1": "2300 Clarence Street", "city": "Lancaster", "postcode": "LA13BG"}}
(1 row)

Time: 10.811 ms

TEST=# EXPLAIN SELECT * FROM jsonb_test
WHERE data @> '{"address": {"add1": "2300 Clarence Street"}}';
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Bitmap Heap Scan on jsonb_test  (cost=1040.83..1395.09 rows=107 width=147)
   Recheck Cond: (data @> '{"address": {"add1": "2300 Clarence Street"}}'::jsonb)
   ->  Bitmap Index Scan on jsonb_test_data  (cost=0.00..1040.80 rows=107 width=0)
         Index Cond: (data @> '{"address": {"add1": "2300 Clarence Street"}}'::jsonb)
(4 rows)
[/postgres]
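
The containment operator @> used above isn’t the only one supported by the default GIN operator class; the key-existence operators ?, ?| and ?& are too, and the -> / ->> operators extract values (extraction isn’t index-assisted, but it’s handy once we’ve found our rows). A couple of quick examples against the same table:

[postgres]
-- existence: rows whose top level contains the key "age"
SELECT count(*) FROM jsonb_test WHERE data ? 'age';

-- extraction: -> yields jsonb, ->> yields text
SELECT data->'address'->>'city' AS city FROM jsonb_test WHERE id = 2300;
[/postgres]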

Logical Decoding

Whilst Logical Decoding isn’t really in a state to be put into active duty right away, it is pretty special: it allows postgres to supply a stream of changes (or partial changes) in a user-defined format. This is similar to what we’ve been doing for ages with trigger-based replication such as Slony and Londiste, but instead of all the overhead and clunkiness of log triggers, the changes are read directly from the WAL in much the same way as streaming binary replication. The uses don’t end at master-slave replication either; multi-master and selective replication with per-table granularity, auditing, online upgrades and cache invalidation are just some of the possibilities.

Logical Decoding uses the concept of “replication slots”, each of which represents a stream of changes logged for a particular consumer, and we can have as many replication slots as we like. The great thing about replication slots is that once they’re created, all WAL files required by the slot are retained, and they aren’t just for Logical Decoding; Streaming Replication can make use of them too, so we no longer have to balance wal_keep_segments or rely on archive_command. Replication slots aren’t a magic bullet though; if a replication slot isn’t being consumed, PostgreSQL will eat disk space as it retains WAL files for the slot/consumer.
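
One way to keep an eye on that is to compare each slot’s restart_lsn with the current WAL write position; a quick sketch using the 9.4-era function names (these became pg_wal_lsn_diff and pg_current_wal_lsn in PostgreSQL 10):

[postgres]
-- approximate WAL (in bytes) being held back by each slot
SELECT slot_name, active,
       pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;
[/postgres]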

I mentioned earlier that Logical Decoding allows changes to be supplied in a “user defined format”; this is provided by an output plugin, a shared library that needs to be custom-written as required, and it’s in this output plugin that the format, and any restrictions on what data we want, are controlled. The one exception is the data used for identifying old rows from updates or deletes, which is defined before it is written to the WAL, and has to be set on a per-table basis with ALTER TABLE … REPLICA IDENTITY.
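
For example, to have the old version of the entire row logged for updates and deletes on our earlier test table (the default identity is just the primary key), we could run something like:

[postgres]
-- log all old column values for UPDATE/DELETE on this table
ALTER TABLE jsonb_test REPLICA IDENTITY FULL;
[/postgres]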

There’s a “test_decoding” plugin supplied as a contrib module that we can use for testing, and that’s what I’m going to have a quick look at now.

The first thing we have to do is set wal_level to logical and make sure max_replication_slots is greater than zero, with something like this in postgresql.conf:
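
[postgres]
# postgresql.conf - both settings need a restart to take effect
wal_level = logical
max_replication_slots = 1
[/postgres]

Once we’ve done that and restarted PostgreSQL we’re ready to start playing, and we can create our first replication slot: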

[postgres]
TEST=# SELECT * FROM pg_create_logical_replication_slot('test_replication_slot', 'test_decoding');
       slot_name        | xlog_position
------------------------+---------------
 test_replication_slot  | 0/56436390
(1 row)
[/postgres]

We should now be able to see our replication slot in the pg_replication_slots view:

[postgres]
TEST=# SELECT * FROM pg_replication_slots;
       slot_name        |    plugin     | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
------------------------+---------------+-----------+--------+----------+--------+------+--------------+-------------
 test_replication_slot  | test_decoding | logical   |  16422 | TEST     | f      |      |      1135904 | 0/56436358
(1 row)
[/postgres]

To peek at any changes without consuming them, we can use the pg_logical_slot_peek_changes function:

[postgres]
TEST=# \x
Expanded display is on.
TEST=# SELECT * FROM pg_logical_slot_peek_changes('test_replication_slot', NULL, NULL);

-[ RECORD 1 ]------------------------------------------------------------------
location | 0/56436450
xid      | 1135906
data     | BEGIN 1135906
-[ RECORD 2 ]------------------------------------------------------------------
location | 0/56436450
xid      | 1135906
data     | table _test_replication.sl_components: UPDATE: co_actor:'local_sync' co_pid[integer]:20814 co_node[integer]:0 co_connection_pid[integer]:20831 co_activity:'thread main loop' co_starttime[timestamp with time zone]:'2014-12-22 16:00:48+00' co_event[bigint]:null co_eventtype:'n/a'
-[ RECORD 3 ]------------------------------------------------------------------
location | 0/56436518
xid      | 1135906
data     | COMMIT 1135906

< snip >
[/postgres]

… and I’ll snip my output there at 3 rows; I use this machine for Slony testing, so we’re already seeing all of the Slony chatter here, but you should be able to see the capture of an update to the “_test_replication.sl_components” table (this could be any table – I just happened to call my Slony cluster “test_replication” too). If you create some activity on your database, you should start to see some output. Notice that the output is the actual changes to the table, not a capture of the SQL statement that caused them; we can use this change information to build SQL if we want, or some other form of DML for another system.

To actually consume the queue we can call pg_logical_slot_get_changes:

[postgres]
TEST=# SELECT * FROM pg_logical_slot_get_changes('test_replication_slot', NULL, NULL);
[/postgres]

This outputs the same as the above, but once we’ve called it the changes are classed as consumed, regardless of whether the caller actually applied them, and will not be output again (nor is their WAL retained). One thing that would be useful here would be the ability to pull the changes, apply them, then confirm them as applied before they’re marked as consumed; I guess this could be achieved by first calling pg_logical_slot_peek_changes, applying the changes, and then calling pg_logical_slot_get_changes passing the latest LSN seen from the peek.
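
A rough sketch of that pattern, using the location of the final COMMIT record from the peek output above (the second argument, upto_lsn, limits how far get_changes will consume):

[postgres]
-- peek first, apply the changes elsewhere...
SELECT * FROM pg_logical_slot_peek_changes('test_replication_slot', NULL, NULL);

-- ...then consume only up to the LSN we know we have applied
SELECT * FROM pg_logical_slot_get_changes('test_replication_slot', '0/56436518', NULL);
[/postgres]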

In addition to the SQL functions, the pg_recvlogical binary is provided to pull data over the streaming replication protocol, with something like:

# pg_recvlogical -U postgres -d TEST --slot test_replication_slot --start -f -

For this, as with streaming replication, we need to set max_wal_senders greater than zero.

Once we’re finished with our test, we should drop the replication slot:

[postgres]
TEST=# SELECT pg_drop_replication_slot('test_replication_slot');
[/postgres]

Apparently the one thing Logical Decoding can’t do is output DDL, and I’m guessing this is due to other complexities that need to be overcome rather than by design. All exciting!

Logging information on long running transactions in PostgreSQL

PostgreSQL has some runtime parameters we can use to monitor long-running queries; we can set log_min_duration_statement to log any statement that takes longer than the value in milliseconds we set. But what can we do if we have an application that keeps a long-running transaction open but is mostly just ‘<IDLE> in transaction’? Perhaps it’s pinning that transaction open whilst it performs some long-running action elsewhere? We can use a monitoring tool like Nagios to show long-running transactions, but the activity they actually do won’t be logged by postgres unless they’re blocking somewhere. What comes to mind here is distributed transaction managers playing up.
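
Before reaching for the logs, we can always eyeball such sessions ad hoc via pg_stat_activity; a quick sketch (xact_start is null for sessions without an open transaction):

[postgres]
-- sessions with a transaction open for more than a second
SELECT pid, now() - xact_start AS xact_age, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '1 second'
ORDER BY xact_start;
[/postgres]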

For example, if we set log_min_duration_statement to 0, postgres will log every query we issue, and we’ll see a message like this in the log:

[postgres]
LOG: duration: 3.475 ms statement: SELECT * FROM pg_stat_database where datname='TEST'
[/postgres]

We can set statement_timeout to forcefully prevent applications running queries that take longer than we would like; any statement that exceeds statement_timeout in milliseconds will be cancelled. For example:

[postgres]
TEST=# set statement_timeout = '2ms';
SET
TEST=# SELECT * FROM pg_stat_database where datname='TEST';
ERROR: canceling statement due to statement timeout
[/postgres]

Obviously setting such a low value is not much use, but a typical use case would be a busy web application where the user is likely to have gone away after 10 seconds anyway, in which case we could alter the application’s user to have a statement_timeout of around 10 seconds.
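
Something like this would do it, assuming a hypothetical webapp role used by the application (the setting applies to new sessions for that role):

[postgres]
-- 'webapp' is just an example role name
ALTER ROLE webapp SET statement_timeout = '10s';
[/postgres]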

To log lock waits we can set log_lock_waits to on, in which case postgres will log details of any query still waiting on a lock when it checks for deadlock after the deadlock_timeout interval (default 1 second). We’ll see a message something like this in the log, depending on the type of lock:

[postgres]
LOG: process 16532 still waiting for ShareLock on transaction 311823400 after 1000.147 ms
STATEMENT: UPDATE balls SET description = 'TEST' WHERE id = 5;
[/postgres]

The above tells us that after 1 second our statement was still waiting on a lock. This is not to be confused with deadlock; the deadlock_timeout parameter factors in only because postgres may as well output this information whilst it is doing the work to check for deadlock anyway. Actual deadlock would look something like this:

[postgres]
LOG: process 4465 detected deadlock while waiting for ShareLock on transaction 311824216 after 1000.158 ms
STATEMENT: UPDATE balls SET description = 'TEST' WHERE id = 5;
ERROR: deadlock detected
DETAIL: Process 4465 waits for ShareLock on transaction 311824216; blocked by process 16532.
Process 16532 waits for ShareLock on transaction 311824207; blocked by process 4465.
Process 4465: UPDATE balls SET name = 'TEST' WHERE id = 5;
Process 16532: UPDATE balls SET name = 'TEST2' WHERE id = 6;
[/postgres]

Whilst all of the above is useful for tracking down problems, the one thing we can’t do with runtime parameters is time out or log long-running transactions, and perhaps also log the statements that persist only for short periods within the transaction. In most cases we’d be using some sort of monitoring platform like Nagios, Cacti or Munin to monitor transaction times along with everything else (including the query times and locks mentioned above), but sometimes it’d be nice to see that information in the postgres logs too.

Generally, if some application is keeping long-running or idle transactions open, then it’s the application that needs to be fixed. But when that’s out of your hands you can feel a bit stuck whilst you wait for that to happen, so I’ve written a pl/pgsql function that can be scheduled to log some information, and even cancel/terminate sessions. The function can be downloaded here or here

For example if we want to log information on long running transactions, we could run the following:

[postgres]
TEST=# SELECT public.pg_log_long_xact('1s');
NOTICE: long_xact pid: 4465 duration: 431.325747 ms user: glyn application: psql client: [local] statement: <IDLE> in transaction
NOTICE: long_xact pid: 16532 duration: 327.438302 ms user: glyn application: psql client: [local] statement: UPDATE balls SET description = 'TEST' WHERE id = 5;
 pg_log_long_xact
------------------
(0 rows)
[/postgres]

If we want to see if either of these is waiting on the other:

[postgres]
TEST=# SELECT public.pg_log_long_xact('1s', true);
NOTICE: long_xact pid: 4465 duration: 471.885373 ms user: glyn application: psql client: [local] statement: <IDLE> in transaction
NOTICE: long_xact pid: 16532 duration: 367.997928 ms user: glyn application: psql client: [local] statement: UPDATE balls SET description = 'TEST' WHERE id = 5;
NOTICE: long_xact waiter pid: 16532 blocker detail is; pid: 4465 duration: 471.885373 ms relation: any (public.balls (RowExclusiveLock)) lock type: transaction id 311824482 user: glyn application: psql client: [local] statement: <IDLE> in transaction
 pg_log_long_xact
------------------
(0 rows)
[/postgres]

We can set the level it raises messages at (in hindsight the default should probably have been LOG), or have it output the details as text:

[postgres]
TEST=# \t on
Showing only tuples.
TEST=# SELECT public.pg_log_long_xact('1s', true, 'text');
 long_xact pid: 4465 duration: 574.477076 ms user: glyn application: psql client: [local] statement: <IDLE> in transaction
 long_xact pid: 16532 duration: 470.589631 ms user: glyn application: psql client: [local] statement: UPDATE balls SET description = 'TEST' WHERE id = 5;
 long_xact waiter pid: 16532 blocker detail is; pid: 4465 duration: 574.477076 ms relation: any (public.balls (RowExclusiveLock)) lock type: transaction id 311824482 user: glyn application: psql client: [local] statement: <IDLE> in transaction
[/postgres]

If we’re really having issues, we can make the function start cancelling statements on each run, starting with those that have been blocking for the longest, otherwise those that have been running for the longest duration. As of 9.3 there’s now a lock_timeout parameter that will abort any statement waiting on a lock for longer than the specified number of milliseconds, which is much better; for comparison, it looks something like this (assuming another session already holds a conflicting lock on the row):
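
[postgres]
TEST=# SET lock_timeout = '2s';
SET
TEST=# UPDATE balls SET description = 'TEST' WHERE id = 5;
ERROR: canceling statement due to lock timeout
[/postgres]

But even post-9.3 the above function can be useful; the difference is that it will attempt to abort the blocking transaction rather than the waiting statement, or if there is no locking involved, just the longest transaction (obviously not to be used in haste):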

[postgres]
TEST=# SELECT public.pg_log_long_xact('1s', true, 'text', '10 minutes');
 long_xact pid: 4465 duration: 895.57988 ms user: glyn application: psql client: [local] statement: <IDLE> in transaction
 long_xact unable to cancel backend with pid: 4465
 long_xact pid: 16532 duration: 791.692435 ms user: glyn application: psql client: [local] statement: UPDATE balls SET description = 'TEST' WHERE id = 5;
 long_xact waiter pid: 16532 blocker detail is; pid: 4465 duration: 895.57988 ms relation: any (public.balls (RowExclusiveLock)) lock type: transaction id 311824482 user: glyn application: psql client: [local] statement: <IDLE> in transaction
[/postgres]

In this case it can’t cancel the blocking statement, because the session is idle in transaction so there’s no statement to cancel, but we can pass another timeout after which it will start terminating backends:

[postgres]
TEST=# SELECT public.pg_log_long_xact('1s', true, 'text', '10 minutes', '15 minutes');
 long_xact pid: 4465 duration: 1026.90736 ms user: glyn application: psql client: [local] statement: <IDLE> in transaction
 long_xact terminated backend with pid: 4465
 long_xact pid: 16532 duration: 923.019915 ms user: glyn application: psql client: [local] statement: UPDATE balls SET description = 'TEST' WHERE id = 5;
[/postgres]

On our next run it has “resolved” the issue; I say “resolved” because all it’s done is remove the blocking session:

[postgres]
TEST=# \t off
TEST=# SELECT public.pg_log_long_xact('1s', true, 'text', '10 minutes', '15 minutes');
 pg_log_long_xact
------------------
(0 rows)
[/postgres]

By default the function only cancels/terminates the longest-running blocker, or else the longest-running transaction; we can be more forceful than that by passing an extra parameter telling it to cancel the statements/terminate the backends of all long-running transactions it finds (I’ve lowered the timeouts purely because I had to restart the transactions/statements):

[postgres]
TEST=# \t off
TEST=# SELECT public.pg_log_long_xact('1s', true, 'text', '2 minutes', '3 minutes', true);
 long_xact pid: 19065 duration: 187.279089 ms user: glyn application: psql client: [local] statement: <IDLE> in transaction
 long_xact terminated backend with pid: 19065
 long_xact pid: 16532 duration: 184.251569 ms user: glyn application: psql client: [local] statement: UPDATE balls SET description = 'TEST' WHERE id = 5;
 long_xact cancelled backend with pid: 16532
(4 rows)
[/postgres]

What would be nice to see, in addition to the lock_timeout parameter, would be a blocker_timeout parameter that functions just like lock_timeout but aborts the blocking session instead. Obviously such a parameter could cause more havoc if the offending application just keeps retrying the blocking operation, or worse, it aborts the restructure you’ve been waiting hours for. But I think in some instances it could be useful; set to a large enough value, we could kill the offending blocking transaction and let all the others complete.

Update 30/09/2016 – Postgres 9.6 adds idle_in_transaction_session_timeout which allows us to automatically terminate any transactions that sit idle for longer than the value in milliseconds we set.
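
For example, something like this in a 9.6 session (the parameter accepts the usual time units):

[postgres]
-- terminate any session idle in an open transaction for over 10 minutes
SET idle_in_transaction_session_timeout = '10min';
[/postgres]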