Category Archives: C

C Programming

Wanting hstore style operators in jsonb – revisited

A couple of weeks ago I wrote about wanting a jsonb delete operator in 9.4, and yesterday evening I decided to have a go at writing some functions in C.

In the end all I actually did yesterday was make a mess and read a lot of existing code, but persisting this evening I’ve managed to put together some functions that appear to work. I’m not confident to say they’re efficient (or even correct; and they certainly shouldn’t be put on production systems), but I thought it’d be useful to benchmark them.

I’ve also added in a concatenate operator after reading Matthew Schinckel’s post.

First install the C shared library:

# make install 

Then install the functions and operators, these functions are named jsonb_delete and jsonb_concat:

[postgres]
test=# \i jsonb_opx.sql
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
[/postgres]

Then install the SQL versions for comparison, these functions are named jsonb_delete_left and jsonb_concat_left:

[postgres]
test=# \i jsonb_opx_sql_comparison.sql
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
[/postgres]

Test 1 – deleteing a key

This is actually an SQL wrapper to the C function for (jsonb, text[]), breaking that up
to have a separate version would be more efficient, but that should be a trivial task.

[postgres]
test=# \timing on
Timing is on.
test=# SELECT ‘{"a": 1, "b": 2, "c": 3}’::jsonb – ‘b’::text;
?column?
——————
{"a": 1, "c": 3}
(1 row)

Time: 7.099 ms
[/postgres]

The above is hitting the C function; from this point onwards I’ll just hit the functions directly:

[postgres]
test=# SELECT jsonb_delete(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘b’);
jsonb_delete
——————
{"a": 1, "c": 3}
(1 row)
Time: 6.220 ms
[/postgres]

Now the original SQL version:

[postgres]
test=# SELECT jsonb_delete_left(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘b’);
jsonb_delete_left
——————-
{"a": 1, "c": 3}
(1 row)

Time: 14.570 ms
[/postgres]

Now to benchmark for a large quantity of rows:

[postgres]
test=# EXPLAIN ANALYZE SELECT jsonb_delete((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb, ‘b’)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————-
Function Scan on generate_series x (cost=0.00..300.00 rows=1000 width=4) (actual time=10.407..263.489 rows=10000 loops=1)
Planning time: 0.335 ms
Execution time: 290.192 ms
(3 rows)

Time: 293.254 ms
test=# EXPLAIN ANALYZE SELECT jsonb_delete_left((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb, ‘b’)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————-
Function Scan on generate_series x (cost=0.00..300.00 rows=1000 width=4) (actual time=15.165..767.706 rows=10000 loops=1)
Planning time: 0.785 ms
Execution time: 803.258 ms
(3 rows)

Time: 809.088 ms
[/postgres]

Whilst processing 1 row really doesn’t show any improvement (the timings for both varied in the 2~10ms range), with 10,000 rows the C version is just over twice as quick.

If these times stick out as particularly dire, it’s probably just because the machine I’m testing on is very old.

Test 2 – deleting multiple keys

[postgres]
test=# SELECT jsonb_delete(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ARRAY[‘a’,’b’]);
jsonb_delete
————–
{"c": 3}
(1 row)

Time: 3.482 ms

test=# SELECT jsonb_delete_left(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ARRAY[‘a’,’b’]);
jsonb_delete_left
——————-
{"c": 3}
(1 row)

Time: 3.613 ms
[/postgres]

[postgres]
test=# EXPLAIN ANALYZE SELECT jsonb_delete((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb, ARRAY[‘a’,’b’])
FROM generate_series(1,10000) x;
QUERY PLAN
————————————————————————————————————————–
Function Scan on generate_series x (cost=0.00..52.50 rows=1000 width=4) (actual time=5.805..177.507 rows=10000 loops=1)
Planning time: 1.646 ms
Execution time: 209.137 ms
(3 rows)

Time: 213.507 ms

test=# EXPLAIN ANALYZE SELECT jsonb_delete_left((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb, ARRAY[‘a’,’b’])
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————-
Function Scan on generate_series x (cost=0.00..300.00 rows=1000 width=4) (actual time=15.805..757.500 rows=10000 loops=1)
Planning time: 0.595 ms
Execution time: 789.272 ms
(3 rows)

Time: 793.229 ms
[/postgres]

Results are similar; we’re essentially hitting the same C function.

Test 3 – Deleting matching jsonb key/value pairs

The C version of this function essentially loops round the left jsonb value looking up keys
in the right jsonb value. If it finds a matching key it does a string based comparison on
the values (treating nested jsonb as a string too) and if the values match as well then the key/value pair is removed.

[postgres]
test=# SELECT jsonb_delete(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘{"a": 4, "b": 2}’::jsonb);
jsonb_delete
——————
{"a": 1, "c": 3}
(1 row)

Time: 3.114 ms
test=# SELECT jsonb_delete_left(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘{"a": 4, "b": 2}’::jsonb);
jsonb_delete_left
——————-
{"a": 1, "c": 3}
(1 row)

Time: 6.899 ms
[/postgres]

[postgres]
test=# EXPLAIN ANALYZE SELECT jsonb_delete((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb,
(‘{"a":’ || x || ‘, "d":’ || x*2 || ‘, "c":’ || x*10 || ‘}’)::jsonb)
FROM generate_series(1,10000) x;
QUERY PLAN
————————————————————————————————————————–
Function Scan on generate_series x (cost=0.00..92.50 rows=1000 width=4) (actual time=8.452..238.210 rows=10000 loops=1)
Planning time: 0.428 ms
Execution time: 266.358 ms
(3 rows)

Time: 270.161 ms

test=# EXPLAIN ANALYZE SELECT jsonb_delete_left((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb,
(‘{"a":’ || x || ‘, "d":’ || x*2 || ‘, "c":’ || x*10 || ‘}’)::jsonb)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————–
Function Scan on generate_series x (cost=0.00..340.00 rows=1000 width=4) (actual time=11.833..1206.990 rows=10000 loops=1)
Planning time: 0.759 ms
Execution time: 1248.481 ms
(3 rows)

Time: 1253.392 ms
[/postgres]

There’s a bigger improvement here; it’s about 4.5 times quicker.

Test 4 – concatenation

The C function for this is essentially a cut n shut job on both jsonb values, blindly
pushing all the values onto the return value and leaving the lower level jsonb api
to do the deduplication:

[postgres]
test=# SELECT jsonb_concat(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘{"a": 4, "d": 4, "z": 26}’::jsonb);
jsonb_concat
——————————————-
{"a": 4, "b": 2, "c": 3, "d": 4, "z": 26}
(1 row)

Time: 3.028 ms

test=# SELECT jsonb_concat_left(‘{"a": 1, "b": 2, "c": 3}’::jsonb, ‘{"a": 4, "d": 4, "z": 26}’::jsonb);
jsonb_concat_left
——————————————-
{"a": 4, "b": 2, "c": 3, "d": 4, "z": 26}
(1 row)

Time: 4.731 ms
[/postgres]

Again nothing to get excited about. Testing on a larger quantity of rows shows a similar improvement to the jsonb -jsonb delete operator/function above:

[postgres]
test=# EXPLAIN ANALYZE SELECT jsonb_concat((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb,
(‘{"a":’ || x || ‘, "d":’ || x*2 || ‘, "c":’ || x*10 || ‘}’)::jsonb)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————
Function Scan on generate_series x (cost=0.00..92.50 rows=1000 width=4) (actual time=12.481..255.910 rows=10000 loops=1)
Planning time: 0.599 ms
Execution time: 285.357 ms
(3 rows)

Time: 288.615 ms

test=# EXPLAIN ANALYZE SELECT jsonb_concat_left((‘{"a":’ || x || ‘, "b":’ || x*2 || ‘, "c":’ || x*x || ‘}’)::jsonb,
(‘{"a":’ || x || ‘, "d":’ || x*2 || ‘, "c":’ || x*10 || ‘}’)::jsonb)
FROM generate_series(1,10000) x;
QUERY PLAN
—————————————————————————————————————————–
Function Scan on generate_series x (cost=0.00..340.00 rows=1000 width=4) (actual time=13.931..1051.100 rows=10000 loops=1)
Planning time: 5.160 ms
Execution time: 1091.596 ms
(3 rows)

Time: 1103.165 ms
[/postgres]

So in conclusion the results are nothing earth shattering, but there is a small improvement. Essentially all these functions are doing is iterating around the jsonb and building new return values; it’d be nice to see what someone more familiar with the jsonb internals at a lower level could come up with.