More pi

Today the server this site runs off got a well needed upgrade in the form of a Raspberry pi model 2B. I’m normally not one to rush out and buy the latest new gadget; my policy is if it aint broke don’t fix it. However wordpress had been taking it’s toll on the trusty old model B, and the timing of the 2B was perfect as I’d even started considering getting proper hosting again.

Raspberry pi model B and B2 side by side.

Raspberry pi model B2 and B side by side.

I’m still amazed at how fast technology gets smaller and cheaper these days. It seems like only yesterday I was marvelling at the qube2:

I thought the Qube2 was an amazing little server.

I thought the Qube2 was an amazing little server.

Wanting hstore style operators in jsonb – revisited

A couple of weeks ago I wrote about wanting a jsonb delete operator in 9.4, and yesterday evening I decided to have a go at writing some functions in C.

In the end all I actually did yesterday was make a mess and read a lot of existing code, but persisting this evening I’ve managed to put together some functions that appear to work. I’m not confident to say they’re efficient (or even correct; and they certainly shouldn’t be put on production systems), but I thought it’d be useful to benchmark them.

I’ve also added in a concatenate operator after reading Matthew Schinckel’s post.

First install the C shared library:

# make install 

Then install the functions and operators, these functions are named jsonb_delete and jsonb_concat:

test=# \i jsonb_opx.sql
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT
CREATE FUNCTION
COMMENT
CREATE OPERATOR
COMMENT

Then install the SQL versions for comparison, these functions are named jsonb_delete_left and jsonb_concat_left:

test=# \i jsonb_opx_sql_comparison.sql
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION

Test 1 – deleteing a key

This is actually an SQL wrapper to the C function for (jsonb, text[]), breaking that up
to have a separate version would be more efficient, but that should be a trivial task.

test=# \timing on
Timing is on.
test=# SELECT '{"a": 1, "b": 2, "c": 3}'::jsonb - 'b'::text;
     ?column?     
------------------
 {"a": 1, "c": 3}
(1 row)

Time: 7.099 ms

The above is hitting the C function; from this point onwards I’ll just hit the functions directly:

test=# SELECT jsonb_delete('{"a": 1, "b": 2, "c": 3}'::jsonb, 'b');
   jsonb_delete   
------------------
 {"a": 1, "c": 3}
(1 row)
Time: 6.220 ms

Now the original SQL version:

test=# SELECT jsonb_delete_left('{"a": 1, "b": 2, "c": 3}'::jsonb, 'b');
 jsonb_delete_left 
-------------------
 {"a": 1, "c": 3}
(1 row)

Time: 14.570 ms

Now to benchmark for a large quantity of rows:

test=# EXPLAIN ANALYZE SELECT jsonb_delete(('{"a":' || x || ', "b":' || x*2 || ', "c":' || x*x || '}')::jsonb, 'b')
FROM generate_series(1,10000) x;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series x  (cost=0.00..300.00 rows=1000 width=4) (actual time=10.407..263.489 rows=10000 loops=1)
 Planning time: 0.335 ms
 Execution time: 290.192 ms
(3 rows)

Time: 293.254 ms
test=# EXPLAIN ANALYZE SELECT jsonb_delete_left(('{"a":' || x || ', "b":' || x*2 || ', "c":' || x*x || '}')::jsonb, 'b')
FROM generate_series(1,10000) x;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series x  (cost=0.00..300.00 rows=1000 width=4) (actual time=15.165..767.706 rows=10000 loops=1)
 Planning time: 0.785 ms
 Execution time: 803.258 ms
(3 rows)

Time: 809.088 ms

Whilst processing 1 row really doesn’t show any improvement (the timings for both varied in the 2~10ms range), with 10,000 rows the C version is just over twice as quick.

If these times stick out as particularly dire, it’s probably just because the machine I’m testing on is very old.

Test 2 – deleting multiple keys

test=# SELECT jsonb_delete('{"a": 1, "b": 2, "c": 3}'::jsonb, ARRAY['a','b']);
 jsonb_delete 
--------------
 {"c": 3}
(1 row)

Time: 3.482 ms

test=# SELECT jsonb_delete_left('{"a": 1, "b": 2, "c": 3}'::jsonb, ARRAY['a','b']);
 jsonb_delete_left 
-------------------
 {"c": 3}
(1 row)

Time: 3.613 ms
test=# EXPLAIN ANALYZE SELECT jsonb_delete(('{"a":' || x || ', "b":' || x*2 || ', "c":' || x*x || '}')::jsonb, ARRAY['a','b'])
FROM generate_series(1,10000) x;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series x  (cost=0.00..52.50 rows=1000 width=4) (actual time=5.805..177.507 rows=10000 loops=1)
 Planning time: 1.646 ms
 Execution time: 209.137 ms
(3 rows)

Time: 213.507 ms

test=# EXPLAIN ANALYZE SELECT jsonb_delete_left(('{"a":' || x || ', "b":' || x*2 || ', "c":' || x*x || '}')::jsonb, ARRAY['a','b'])
FROM generate_series(1,10000) x;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series x  (cost=0.00..300.00 rows=1000 width=4) (actual time=15.805..757.500 rows=10000 loops=1)
 Planning time: 0.595 ms
 Execution time: 789.272 ms
(3 rows)

Time: 793.229 ms

Results are similar; we’re essentially hitting the same C function.

Test 3 – Deleting matching jsonb key/value pairs

The C version of this function essentially loops round the left jsonb value looking up keys
in the right jsonb value. If it finds a matching key it does a string based comparison on
the values (treating nested jsonb as a string too) and if the values match as well then the key/value pair is removed.

test=# SELECT jsonb_delete('{"a": 1, "b": 2, "c": 3}'::jsonb, '{"a": 4, "b": 2}'::jsonb);
   jsonb_delete   
------------------
 {"a": 1, "c": 3}
(1 row)

Time: 3.114 ms
test=# SELECT jsonb_delete_left('{"a": 1, "b": 2, "c": 3}'::jsonb, '{"a": 4, "b": 2}'::jsonb);
 jsonb_delete_left 
-------------------
 {"a": 1, "c": 3}
(1 row)

Time: 6.899 ms
test=# EXPLAIN ANALYZE SELECT jsonb_delete(('{"a":' || x || ', "b":' || x*2 || ', "c":' || x*x || '}')::jsonb,
('{"a":' || x || ', "d":' || x*2 || ', "c":' || x*10 || '}')::jsonb)
FROM generate_series(1,10000) x;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series x  (cost=0.00..92.50 rows=1000 width=4) (actual time=8.452..238.210 rows=10000 loops=1)
 Planning time: 0.428 ms
 Execution time: 266.358 ms
(3 rows)

Time: 270.161 ms

test=# EXPLAIN ANALYZE SELECT jsonb_delete_left(('{"a":' || x || ', "b":' || x*2 || ', "c":' || x*x || '}')::jsonb,
('{"a":' || x || ', "d":' || x*2 || ', "c":' || x*10 || '}')::jsonb)
FROM generate_series(1,10000) x;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series x  (cost=0.00..340.00 rows=1000 width=4) (actual time=11.833..1206.990 rows=10000 loops=1)
 Planning time: 0.759 ms
 Execution time: 1248.481 ms
(3 rows)

Time: 1253.392 ms

There’s a bigger improvement here; it’s about 4.5 times quicker.

Test 4 – concatenation

The C function for this is essentially a cut n shut job on both jsonb values, blindly
pushing all the values onto the return value and leaving the lower level jsonb api
to do the deduplication:

test=# SELECT jsonb_concat('{"a": 1, "b": 2, "c": 3}'::jsonb, '{"a": 4, "d": 4, "z": 26}'::jsonb);
               jsonb_concat                
-------------------------------------------
 {"a": 4, "b": 2, "c": 3, "d": 4, "z": 26}
(1 row)

Time: 3.028 ms

test=# SELECT jsonb_concat_left('{"a": 1, "b": 2, "c": 3}'::jsonb, '{"a": 4, "d": 4, "z": 26}'::jsonb);
             jsonb_concat_left             
-------------------------------------------
 {"a": 4, "b": 2, "c": 3, "d": 4, "z": 26}
(1 row)

Time: 4.731 ms

Again nothing to get excited about. Testing on a larger quantity of rows shows a similar improvement to the jsonb -jsonb delete operator/function above:

test=# EXPLAIN ANALYZE SELECT jsonb_concat(('{"a":' || x || ', "b":' || x*2 || ', "c":' || x*x || '}')::jsonb,
('{"a":' || x || ', "d":' || x*2 || ', "c":' || x*10 || '}')::jsonb)
FROM generate_series(1,10000) x;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series x  (cost=0.00..92.50 rows=1000 width=4) (actual time=12.481..255.910 rows=10000 loops=1)
 Planning time: 0.599 ms
 Execution time: 285.357 ms
(3 rows)

Time: 288.615 ms

test=# EXPLAIN ANALYZE SELECT jsonb_concat_left(('{"a":' || x || ', "b":' || x*2 || ', "c":' || x*x || '}')::jsonb,
('{"a":' || x || ', "d":' || x*2 || ', "c":' || x*10 || '}')::jsonb)
FROM generate_series(1,10000) x;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series x  (cost=0.00..340.00 rows=1000 width=4) (actual time=13.931..1051.100 rows=10000 loops=1)
 Planning time: 5.160 ms
 Execution time: 1091.596 ms
(3 rows)

Time: 1103.165 ms

So in conclusion the results are nothing earth shattering, but there is a small improvement. Essentially all these functions are doing is iterating around the jsonb and building new return values; it’d be nice to see what someone more familiar with the jsonb internals at a lower level could come up with.

Copying Pavel Stehule’s simple history table but with the jsonb type

On 15/01/2015 Pavel Stehule wrote about implementing a dead simple history table using the hstore type.

On Friday evening I wanted to copy this almost line for line switching the hstore type for jsonb , but I counldn’t really see how to replicate the update part so simply without creating a delete operator. Once that operator has been created it appears to work:

CREATE TABLE test(a int, b int, c int);

CREATE TABLE history(
  event_time timestamp(2),
  executed_by text, 
  origin_value jsonb,
  new_value jsonb
);

CREATE OR REPLACE FUNCTION history_insert()
RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO history(event_time, executed_by, new_value)
     VALUES(CURRENT_TIMESTAMP, SESSION_USER, row_to_json(NEW)::jsonb);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION history_delete()
RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO history(event_time, executed_by, origin_value)
     VALUES(CURRENT_TIMESTAMP, SESSION_USER, row_to_json(OLD)::jsonb);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION history_update()
RETURNS TRIGGER AS $$
DECLARE
  js_new jsonb := row_to_json(NEW)::jsonb;
  js_old jsonb := row_to_json(OLD)::jsonb;
BEGIN
  INSERT INTO history(event_time, executed_by, origin_value, new_value)
     VALUES(CURRENT_TIMESTAMP, SESSION_USER, js_old - js_new, js_new - js_old);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_history_insert AFTER INSERT ON test
  FOR EACH ROW EXECUTE PROCEDURE history_insert();

CREATE TRIGGER test_history_delete AFTER DELETE ON test
  FOR EACH ROW EXECUTE PROCEDURE history_delete();

CREATE TRIGGER test_history_update AFTER UPDATE ON test
  FOR EACH ROW EXECUTE PROCEDURE history_update();

Result:

INSERT INTO test VALUES(1000, 1001, 1002);
UPDATE test SET a = 10, c = 20;
DELETE FROM test;

postgres=# table history;
       event_time       | executed_by |         origin_value          |             new_value             
------------------------+-------------+-------------------------------+-----------------------------------
 2015-01-19 13:12:54.26 | glyn        |                               | {"a": 1000, "b": 1001, "c": 1002}
 2015-01-19 13:12:54.31 | glyn        | {"a": 1000, "c": 1002}        | {"a": 10, "c": 20}
 2015-01-19 13:12:54.94 | glyn        | {"a": 10, "b": 1001, "c": 20} | 
(3 rows)

Tested on PostgreSQL 9.4