Marcos Dione: adding-columns-from-osm-to-postgis-with-osmium

Datetime:2016-08-23 02:18:22          Topic: Python  PostgreSQL           Share

My latest Europe import was quite eventful. First, I run out of space several times during the import itself, at indexing time. The good thing is that, if you manage to reclaim some space, and reading a little of source code [1], you can replay the missing queries by hand and stop cursing. To be fair, osm2pgsql currently uses a lot of space in slim+flat-nodes mode: three tables, planet_osm_node , planet_osm_way and planet_osm_relation ; and one file, the flat nodes one. Those are not deleted until the whole process has finished, but they're actually not needed after the processing phase. I started working on fixing that .

But that was not the most difficult part. The most difficult part was that I forgot, somehow, to add a column to the file . Elevation , my own style, renders different icons for different types of castles (and forts too), just like the Historic Place map of the Hiking and Bridle map [2]. So today I sat down and tried to figure out how to reparse the OSM extract I used for the import to add this info.

The first step is to add the column to the tables. But first, which tables should be impacted? Well, the line I should have added to the import style is this:

node,way   castle_type  text         polygon

That says that this applies to nodes and ways. If the element is a way, polygon will try to convert it to a polygon and put it in the planet_osm_polygon table; if it's a node, it ends in the planet_osm_point table. So we just add the column to those tables:

ALTER TABLE planet_osm_point   ADD COLUMN castle_type text;
ALTER TABLE planet_osm_polygon ADD COLUMN castle_type text;

Now how to process the extract? Enter pyosmium . It's a Python binding for the osmium library with a stream-like type of processing à la expat for processing XML. The interface is quite simple: one subclasses osmium.SimpleHandler , defines the element type handlers ( node() , way() and/or relation() ) and that's it! Here's the full code of the simple Python script I did:

#! /usr/bin/python3

import osmium
import psycopg2

conn= psycopg2.connect ('dbname=gis')
cur= conn.cursor ()

class CastleTypes (osmium.SimpleHandler):

    def process (self, thing, table):
        if 'castle_type' in thing.tags:
                name= thing.tags['name']
            # osmium/boost do not raise a KeyError here!
            # SystemError: <Boost.Python.function object at 0x1329cd0> returned a result with an error set
            except (KeyError, SystemError):
                name= ''
            print (table,, name)

            cur.execute ('''UPDATE '''+table+
                         ''' SET castle_type = %s
                            WHERE osm_id = %s''',

    def node (self, n):
        self.process (n, 'planet_osm_point')

    def way (self, w):
        self.process (w, 'planet_osm_polygon')

    relation= way  # handle them the same way (*honk*)

ct= CastleTypes ()
ct.apply_file ('europe-latest.osm.pbf')

The only strange part of the API is that it doesn't seem to raise a KeyError when the tag does not exist, but a SystemError . I'll try to figure this out later. Also interesting is the big amount of unnamed elements with this tag that exist in the DB.

[1] I would love for GitHub to recognize something like and be directed to that method, because #Lxxx gets old pretty quick.

[2] I just noticed how much more complete those maps are. more ideas to use :)

About List