A few months ago I bashed together a pl/pgsql function for importing Royal Mail PAF
data into PostgreSQL and at the same time I also put together another function to perform an update from their “changes” files.
Today I came to actually try and use that function, and it’s not much of a surprise that it needed a little more work; at the time I was working off a fresh import and a small set of changes, whilst looking (with squinty eyes) at the fairly long winded docs. The changes files from RM are supplied as a journal of changes, which I can only assume were intended to be read and applied sequentially; this means we have to rely on some of the smarts SQL provides us with to apply them in bulk. The docs even try to dissuade the use of the changes files by saying “Applying Changes files regularly is ‘high maintenance’, because applying thousands of changes is a very long & involved process compared to buying full refreshes”, however I don’t want to do full refreshes as that’s likely to cause more interruption, and besides it just seems a bit of a cop out to me.
So after another quick skim over the docs, I made a couple of corrections to the update function where I’d complicated things by replacing blank key columns with nulls, and added in an update procedure for the Alias and Postzon files (which are only supplied as a full refresh), this time using postgis to generate the latitude and longitude and save running my horrible perl script.
Then I had a go at running the update:
TEST=# BEGIN; TEST=# SELECT update_pc_paf('Y14M04', '/tmp'); NOTICE: 2014-05-22 19:40:00.161582+01: Import starting for edition Y14M04 with data root /tmp/Y14M04_CHANGES/ NOTICE: 2014-05-22 19:40:00.165233+01: Begin staging Changes1 file NOTICE: 2014-05-22 19:40:00.439189+01: Done staging Changes1 file NOTICE: 2014-05-22 19:40:00.439379+01: Preparing to update localities NOTICE: 2014-05-22 19:40:00.487656+01: Prepared 15 records for update on localities NOTICE: 2014-05-22 19:40:00.489481+01: Removed 0 records from localities NOTICE: 2014-05-22 19:40:00.504522+01: Created 15 records in localities NOTICE: 2014-05-22 19:40:00.504651+01: Preparing to update thoroughfares NOTICE: 2014-05-22 19:40:00.54741+01: Prepared 377 records for update on thoroughfares NOTICE: 2014-05-22 19:40:00.548924+01: Removed 0 records from thoroughfares NOTICE: 2014-05-22 19:40:00.589353+01: Created 377 records in thoroughfares NOTICE: 2014-05-22 19:40:00.589477+01: Preparing to update thoroughfare_descriptor NOTICE: 2014-05-22 19:40:00.631469+01: Prepared 0 records for update on thoroughfare_descriptor NOTICE: 2014-05-22 19:40:00.632967+01: Removed 0 records from thoroughfare_descriptor NOTICE: 2014-05-22 19:40:00.633275+01: Created 0 records in thoroughfare_descriptor NOTICE: 2014-05-22 19:40:00.633444+01: Preparing to update building_names NOTICE: 2014-05-22 19:40:00.700856+01: Prepared 7947 records for update on building_names NOTICE: 2014-05-22 19:40:00.704261+01: Removed 0 records from building_names NOTICE: 2014-05-22 19:40:01.155068+01: Created 7947 records in building_names NOTICE: 2014-05-22 19:40:01.155178+01: Preparing to update sub_building_names NOTICE: 2014-05-22 19:40:01.223108+01: Prepared 7798 records for update on sub_building_names NOTICE: 2014-05-22 19:40:01.226335+01: Removed 0 records from sub_building_names NOTICE: 2014-05-22 19:40:01.669071+01: Created 7798 records in sub_building_names NOTICE: 2014-05-22 19:40:01.675826+01: Begin staging Changes2 file NOTICE: 2014-05-22 19:40:02.59217+01: Done staging Changes2 file NOTICE: 2014-05-22 19:40:02.592303+01: Preparing to update mainfile NOTICE: 2014-05-22 19:40:05.07166+01: Prepared 216438 records for update on mainfile NOTICE: 2014-05-22 19:41:41.439761+01: Removed 87291 records from mainfile NOTICE: 2014-05-22 19:42:54.976222+01: Created 129147 records in mainfile NOTICE: 2014-05-22 19:42:54.97639+01: Preparing to update organisations NOTICE: 2014-05-22 19:42:55.609607+01: Prepared 65158 records for update on organisations NOTICE: 2014-05-22 19:43:07.778516+01: Removed 32840 records from organisations NOTICE: 2014-05-22 19:43:10.141383+01: Created 32318 records in organisations NOTICE: 2014-05-22 19:43:10.18821+01: Begin staging WChanges file NOTICE: 2014-05-22 19:43:10.218245+01: Done staging WChanges file NOTICE: 2014-05-22 19:43:10.218346+01: Preparing to update welsh mainfile NOTICE: 2014-05-22 19:43:10.311915+01: Prepared 8395 records for update on welsh mainfile NOTICE: 2014-05-22 19:43:20.631738+01: Removed 3098 records from welsh mainfile NOTICE: 2014-05-22 19:43:23.616801+01: Created 5149 records in welsh mainfile NOTICE: 2014-05-22 19:43:23.61696+01: Preparing to update welsh organisations NOTICE: 2014-05-22 19:43:23.624131+01: Prepared 0 records for update on welsh organisations NOTICE: 2014-05-22 19:43:23.624836+01: Removed 0 records from welsh organisations NOTICE: 2014-05-22 19:43:23.625512+01: Created 0 records in welsh organisations NOTICE: 2014-05-22 19:43:23.634001+01: Begin staging postzon NOTICE: 2014-05-22 19:43:29.097317+01: Done staging postzon file, now staging with lat long data NOTICE: 2014-05-22 19:44:41.996743+01: Done staging postzon with lat long data, updating NOTICE: 2014-05-22 19:44:57.195802+01: Inserted 3594 new records for update on postzon_100m NOTICE: 2014-05-22 19:46:13.242174+01: Updated 212398 records for update on postzon_100m NOTICE: 2014-05-22 19:46:24.038271+01: Deleted 3991 records for update on postzon_100m NOTICE: 2014-05-22 19:46:24.808325+01: Begin staging alias file NOTICE: 2014-05-22 19:46:32.810242+01: Done staging alias file NOTICE: 2014-05-22 19:46:34.021948+01: Done staging counties , updating NOTICE: 2014-05-22 19:46:34.023405+01: Inserted 0 new records for update on counties NOTICE: 2014-05-22 19:46:34.024632+01: Updated 0 records for update on counties NOTICE: 2014-05-22 19:46:34.025273+01: Deleted 0 records for update on counties NOTICE: 2014-05-22 19:46:38.25488+01: Done staging county_alias , updating NOTICE: 2014-05-22 19:46:41.711214+01: Inserted 3594 new records for update on county_alias NOTICE: 2014-05-22 19:46:46.227681+01: Updated 12 records for update on county_alias NOTICE: 2014-05-22 19:46:53.812323+01: Deleted 3991 records for update on county_alias NOTICE: 2014-05-22 19:46:53.812483+01: Completed update_pc_paf --------------- t (1 row)
My schema for the PAF data has deferrable foreign keys, and in order to apply the updates in chunks I’ve set them as deferred in the above function; I only find out if I’ve violated them when they’re checked on commit:
TEST=# COMMIT;
…. And it just sits there forever! I go and do something else, but when I come back it’s still sitting there; what did I forget?
Well we can make a pretty good guess that we’re waiting whilst the deferred constraints are checked, but the constraint checks don’t take that long when they’re not deferred so why is it taking so long?
Well it turns out I’ve got indexes on the satellite tables that are referenced by the foreign keys on my main address table, but no indexes for those keys on the actual main address table. It’s apparent that when the deferred constraint is checked it needs to look up the row first; something that wouldn’t have been required if the constraint were immediate. I add some indexes, and try again:
TEST=# COMMIT; Time: 281979.437 ms
It worked, and we didn’t violate any constraints!
The new script (named paf_postgresql_import_postgis.plpgsql) is in the git repository at postgresql/geographic_data (mirrored also on github glynastill/geographic_data).