database dumping ground

A couple of weeks ago I wrote about wanting a jsonb delete operator in 9.4, and yesterday evening I decided to have a go at writing some functions in C.

In the end all I actually did yesterday was make a mess and read a lot of existing code, but persisting this evening I’ve managed to put together some functions that appear to work. I’m not confident to say they’re efficient (or even correct; and they certainly shouldn’t be put on production systems), but I thought it’d be useful to benchmark them.

I’ve also added in a concatenate operator after reading Matthew Schinckel’s post.

First install the C shared library:

# make install

Then install the functions and operators, these functions are named jsonb_delete and jsonb_concat:

[postgres]
test=# \i jsonb_opx.sql
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
[/postgres]

Then install the SQL versions for comparison, these functions are named jsonb_delete_left and jsonb_concat_left:

[postgres]
test=# \i jsonb_opx_sql_comparison.sql
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
[/postgres]

Test 1 – deleteing a key

This is actually an SQL wrapper to the C function for (jsonb, text[]), breaking that up
to have a separate version would be more efficient, but that should be a trivial task.

[postgres]
test=# \timing on
Timing is on.
test=# SELECT ‘{"a": 1, "b": 2, "c": 3}’::jsonb – ‘b’::text;
?column?
——————
{"a": 1, "c": 3}
(1 row)

Time: 7.099 ms
[/postgres]

The above is hitting the C function; from this point onwards I’ll just hit the functions directly:

[postgres]
test=# SELECT jsonb_delete(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘b’);
jsonb_delete
——————
{"a": 1, "c": 3}
(1 row)
Time: 6.220 ms
[/postgres]

Now the original SQL version:

[postgres]
test=# SELECT jsonb_delete_left(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘b’);
jsonb_delete_left
——————-
{"a": 1, "c": 3}
(1 row)

Time: 14.570 ms
[/postgres]

Now to benchmark for a large quantity of rows:

[postgres]
test=# EXPLAIN ANALYZE SELECT jsonb_delete((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb, ‘b’)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————-
Function Scan on generate_series x (cost=0.00..300.00 rows=1000 width=4) (actual time=10.407..263.489 rows=10000 loops=1)
Planning time: 0.335 ms
Execution time: 290.192 ms
(3 rows)

Time: 293.254 ms
test=# EXPLAIN ANALYZE SELECT jsonb_delete_left((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb, ‘b’)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————-
Function Scan on generate_series x (cost=0.00..300.00 rows=1000 width=4) (actual time=15.165..767.706 rows=10000 loops=1)
Planning time: 0.785 ms
Execution time: 803.258 ms
(3 rows)

Time: 809.088 ms
[/postgres]

Whilst processing 1 row really doesn’t show any improvement (the timings for both varied in the 2~10ms range), with 10,000 rows the C version is just over twice as quick.

If these times stick out as particularly dire, it’s probably just because the machine I’m testing on is very old.

Test 2 – deleting multiple keys

[postgres]
test=# SELECT jsonb_delete(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ARRAY[‘a’,’b’]);
jsonb_delete
————–
{"c": 3}
(1 row)

Time: 3.482 ms

test=# SELECT jsonb_delete_left(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ARRAY[‘a’,’b’]);
jsonb_delete_left
——————-
{"c": 3}
(1 row)

Time: 3.613 ms
[/postgres]

[postgres]
test=# EXPLAIN ANALYZE SELECT jsonb_delete((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb, ARRAY[‘a’,’b’])
FROM generate_series(1,10000) x;
QUERY PLAN
————————————————————————————————————————–
Function Scan on generate_series x (cost=0.00..52.50 rows=1000 width=4) (actual time=5.805..177.507 rows=10000 loops=1)
Planning time: 1.646 ms
Execution time: 209.137 ms
(3 rows)

Time: 213.507 ms

test=# EXPLAIN ANALYZE SELECT jsonb_delete_left((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb, ARRAY[‘a’,’b’])
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————-
Function Scan on generate_series x (cost=0.00..300.00 rows=1000 width=4) (actual time=15.805..757.500 rows=10000 loops=1)
Planning time: 0.595 ms
Execution time: 789.272 ms
(3 rows)

Time: 793.229 ms
[/postgres]

Results are similar; we’re essentially hitting the same C function.

Test 3 – Deleting matching jsonb key/value pairs

The C version of this function essentially loops round the left jsonb value looking up keys
in the right jsonb value. If it finds a matching key it does a string based comparison on
the values (treating nested jsonb as a string too) and if the values match as well then the key/value pair is removed.

[postgres]
test=# SELECT jsonb_delete(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘{"a": 4, "b": 2}’::jsonb);
jsonb_delete
——————
{"a": 1, "c": 3}
(1 row)

Time: 3.114 ms
test=# SELECT jsonb_delete_left(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘{"a": 4, "b": 2}’::jsonb);
jsonb_delete_left
——————-
{"a": 1, "c": 3}
(1 row)

Time: 6.899 ms
[/postgres]

[postgres]
test=# EXPLAIN ANALYZE SELECT jsonb_delete((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb,
(‘{"a":’ || x || ‘, "d":’ || x*2 || ‘, "c":’ || x*10 || ‘}’)::jsonb)
FROM generate_series(1,10000) x;
QUERY PLAN
————————————————————————————————————————–
Function Scan on generate_series x (cost=0.00..92.50 rows=1000 width=4) (actual time=8.452..238.210 rows=10000 loops=1)
Planning time: 0.428 ms
Execution time: 266.358 ms
(3 rows)

Time: 270.161 ms

test=# EXPLAIN ANALYZE SELECT jsonb_delete_left((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb,
(‘{"a":’ || x || ‘, "d":’ || x*2 || ‘, "c":’ || x*10 || ‘}’)::jsonb)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————–
Function Scan on generate_series x (cost=0.00..340.00 rows=1000 width=4) (actual time=11.833..1206.990 rows=10000 loops=1)
Planning time: 0.759 ms
Execution time: 1248.481 ms
(3 rows)

Time: 1253.392 ms
[/postgres]

There’s a bigger improvement here; it’s about 4.5 times quicker.

Test 4 – concatenation

The C function for this is essentially a cut n shut job on both jsonb values, blindly
pushing all the values onto the return value and leaving the lower level jsonb api
to do the deduplication:

[postgres]
test=# SELECT jsonb_concat(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘{"a": 4, "d": 4, "z": 26}’::jsonb);
jsonb_concat
——————————————-
{"a": 4, "b": 2, "c": 3, "d": 4, "z": 26}
(1 row)

Time: 3.028 ms

test=# SELECT jsonb_concat_left(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘{"a": 4, "d": 4, "z": 26}’::jsonb);
jsonb_concat_left
——————————————-
{"a": 4, "b": 2, "c": 3, "d": 4, "z": 26}
(1 row)

Time: 4.731 ms
[/postgres]

Again nothing to get excited about. Testing on a larger quantity of rows shows a similar improvement to the jsonb -jsonb delete operator/function above:

[postgres]
test=# EXPLAIN ANALYZE SELECT jsonb_concat((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb,
(‘{"a":’ || x || ‘, "d":’ || x*2 || ‘, "c":’ || x*10 || ‘}’)::jsonb)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————
Function Scan on generate_series x (cost=0.00..92.50 rows=1000 width=4) (actual time=12.481..255.910 rows=10000 loops=1)
Planning time: 0.599 ms
Execution time: 285.357 ms
(3 rows)

Time: 288.615 ms

test=# EXPLAIN ANALYZE SELECT jsonb_concat_left((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb,
(‘{"a":’ || x || ‘, "d":’ || x*2 || ‘, "c":’ || x*10 || ‘}’)::jsonb)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————–
Function Scan on generate_series x (cost=0.00..340.00 rows=1000 width=4) (actual time=13.931..1051.100 rows=10000 loops=1)
Planning time: 5.160 ms
Execution time: 1091.596 ms
(3 rows)

Time: 1103.165 ms
[/postgres]

So in conclusion the results are nothing earth shattering, but there is a small improvement. Essentially all these functions are doing is iterating around the jsonb and building new return values; it’d be nice to see what someone more familiar with the jsonb internals at a lower level could come up with.

On 15/01/2015 Pavel Stehule wrote about implementing a dead simple history table using the hstore type.

On Friday evening I wanted to copy this almost line for line switching the hstore type for jsonb , but I counldn’t really see how to replicate the update part so simply without creating a delete operator. Once that operator has been created it appears to work:

[postgres highlight=”6,7,14,23,31,32,35″]
CREATE TABLE test(a int, b int, c int);

CREATE TABLE history(
event_time timestamp(2),
executed_by text,
origin_value jsonb,
new_value jsonb
);

CREATE OR REPLACE FUNCTION history_insert()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO history(event_time, executed_by, new_value)
VALUES(CURRENT_TIMESTAMP, SESSION_USER, row_to_json(NEW)::jsonb);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION history_delete()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO history(event_time, executed_by, origin_value)
VALUES(CURRENT_TIMESTAMP, SESSION_USER, row_to_json(OLD)::jsonb);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION history_update()
RETURNS TRIGGER AS $$
DECLARE
js_new jsonb := row_to_json(NEW)::jsonb;
js_old jsonb := row_to_json(OLD)::jsonb;
BEGIN
INSERT INTO history(event_time, executed_by, origin_value, new_value)
VALUES(CURRENT_TIMESTAMP, SESSION_USER, js_old – js_new, js_new – js_old);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_history_insert AFTER INSERT ON test
FOR EACH ROW EXECUTE PROCEDURE history_insert();

CREATE TRIGGER test_history_delete AFTER DELETE ON test
FOR EACH ROW EXECUTE PROCEDURE history_delete();

CREATE TRIGGER test_history_update AFTER UPDATE ON test
FOR EACH ROW EXECUTE PROCEDURE history_update();
[/postgres]

Result:

[postgres]
INSERT INTO test VALUES(1000, 1001, 1002);
UPDATE test SET a = 10, c = 20;
DELETE FROM test;

postgres=# table history;
event_time | executed_by | origin_value | new_value
————————+————-+——————————-+———————————–
2015-01-19 13:12:54.26 | glyn | | {"a": 1000, "b": 1001, "c": 1002}
2015-01-19 13:12:54.31 | glyn | {"a": 1000, "c": 1002} | {"a": 10, "c": 20}
2015-01-19 13:12:54.94 | glyn | {"a": 10, "b": 1001, "c": 20} |
(3 rows)
[/postgres]

Tested on PostgreSQL 9.4

… getting the wrong answer fast is not an improvement over the right answer slow

More pi

Wanting hstore style operators in jsonb – revisited

Copying Pavel Stehule’s simple history table but with the jsonb type