OSM, PostGIS and Docker: an approach for automatic processing

In the summer of 2019 Michael Marz started to extract the most important items from OpenStreetMap and published those extracts as geopackages on his webpage. Back then I looked at his code and promised to create a Docker-based workflow for it. Half a year later I would like to present my very first running Docker container.

The workflow of Michael’s scripts is straightforward:

  • install PostGIS on an Ubuntu machine
  • download the OSM extract of your choice (he likes global)
  • cut it into pieces to save some space in the DB
  • load the data into PostGIS
  • export data into a geopackage
  • drop table
  • take the next piece

[Figure: railways as tagged in OSM]

The main advantage of a Docker container: three lines of code and you can run it on every computer with Docker installed…

Docker: Start

As I like it simple, I start with a plain Ubuntu installation and have not chosen one of the many PostGIS Docker containers that are out there. Docker's instructions for dockerizing PostgreSQL were a great starting point, yet I wanted to use the latest PostgreSQL and PostGIS versions. Please save the following code as Dockerfile:

# use of an ubuntu base for simplicity and transparency

FROM ubuntu:18.04
# MAINTAINER is deprecated; a LABEL carries the same information
LABEL maintainer="Riccardo Klinger <riccardo.klinger@gmail.com>"

# getting postgres
RUN apt-get update && apt-get -y install wget gnupg2
RUN wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -

# Add PostgreSQL's repository. It contains the most recent stable release
#     of PostgreSQL, ``12``.
RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ bionic-pgdg main" > /etc/apt/sources.list.d/pgdg.list

# Install software-properties-common and PostgreSQL 12
#  and some other packages for ftp
RUN apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y \
  software-properties-common \
  postgresql-12 \
  postgresql-client-12 \
  postgresql-contrib-12 \
  postgresql-12-postgis-3 \
  postgresql-12-postgis-3-scripts \
  aptitude  \
  unzip \
  openssh-client \
  openssh-server \
  sshpass \
  && aptitude update \
  && aptitude install -y nano axel wput screen p7zip-full osmium-tool \
  vnstat gdal-bin

As you can see, there are plenty of packages involved. But in the end, I mainly followed Michael’s step-by-step instructions.

The Docker File System

The next step includes some file system work, as I create a base folder to hold the downloaded files, the pbf extracts, and the resulting geopackages:

WORKDIR /home/osmdata
RUN mkdir /home/osmdata/pbf
RUN mkdir /home/osmdata/gpkg
COPY ./import_osm.sh /home/osmdata/pbf
COPY ./mapping.yml /home/osmdata/pbf
RUN ["chown" , "-R", "postgres:postgres", "/home/osmdata"]
RUN ["chmod", "+x", "/home/osmdata/pbf/import_osm.sh"] 

The last line makes the shell script, which was copied from the local host into the image, executable. This shell script will do all the work once the machine is up and running with the PostGIS database.

As the folders are in place now, I download the pbf file of interest (in this case I use OSM data of the city of Bremen, as it is just a small dataset). Furthermore, I install imposm, as I need it to push data to PostGIS once the tag-filtered pbfs are there:

## download imposm3
RUN wget https://github.com/omniscale/imposm3/releases/download/v0.10.0/imposm-0.10.0-linux-x86-64.tar.gz -P /home/osmdata &&\
    tar -xf /home/osmdata/imposm-0.10.0-linux-x86-64.tar.gz &&\
    cp -R /home/osmdata/imposm-0.10.0-linux-x86-64/* /home/osmdata/pbf/ &&\
    rm -R /home/osmdata/imposm-0.10.0-linux-x86-64

# Get the data and push it to the database
RUN axel -n 3 -a -v https://download.geofabrik.de/europe/germany/bremen-latest.osm.pbf &&\
    osmium tags-filter bremen-latest.osm.pbf nwr/aerialway -o pbf/bremen-latest_aerialway.osm.pbf &&\
    osmium tags-filter bremen-latest.osm.pbf nwr/aeroway -o pbf/bremen-latest_aeroway.osm.pbf &&\
    [...]
    osmium tags-filter bremen-latest.osm.pbf nwr/water -o pbf/bremen-latest_water.osm.pbf
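
The filter lines above all follow one pattern: one OSM key in, one pbf file out. As a minimal sketch of that pattern, the following shell snippet prints one osmium command per key. The function name filter_commands and the KEYS variable are made up for illustration, and the key list contains only the three keys visible above; the elided ones are not guessed.

```shell
#!/bin/sh
# Sketch: print one "osmium tags-filter" command per OSM key.
# KEYS holds only the three keys visible in the Dockerfile above;
# extend it with the keys you actually need.
SOURCE_PBF="bremen-latest.osm.pbf"
KEYS="aerialway aeroway water"

filter_commands() {
    base="${SOURCE_PBF%.osm.pbf}"        # strips the suffix: bremen-latest
    for key in $KEYS; do
        echo "osmium tags-filter $SOURCE_PBF nwr/$key -o pbf/${base}_${key}.osm.pbf"
    done
}

filter_commands
```

Piping the output into sh would run the commands one after another, which may be easier to maintain than a long chained RUN instruction.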

From PostgreSQL to PostGIS

As the file system and the data are in place now, I “install” PostGIS and some other extensions. Furthermore, I expose port 5432 and configure PostgreSQL to accept connections from outside addresses:

# switch USER
USER postgres

# Adjust PostgreSQL configuration so that remote connections to the
# database are possible.
RUN echo "host all  all    0.0.0.0/0  md5" >> /etc/postgresql/12/main/pg_hba.conf

# And add ``listen_addresses`` to ``/etc/postgresql/12/main/postgresql.conf``
RUN echo "listen_addresses='*'" >> /etc/postgresql/12/main/postgresql.conf

# Expose the PostgreSQL port
EXPOSE 5432

# Create a PostgreSQL role named ``osmdata`` with ``osmdata`` as the password and
# then create a database `osmdata` owned by the ``osmdata`` role and add
# the postgis extension

RUN    /etc/init.d/postgresql start &&\
    psql --command "CREATE USER osmdata WITH SUPERUSER PASSWORD 'osmdata';" &&\
    createdb -O osmdata osmdata &&\
    psql -d osmdata --command "CREATE EXTENSION IF NOT EXISTS postgis;" &&\
    psql -d osmdata --command "CREATE EXTENSION IF NOT EXISTS postgis_topology;" &&\
    psql -d osmdata --command "CREATE EXTENSION hstore;" &&\
    psql -d osmdata --command "CREATE SCHEMA import;"

The next lines are taken from the instructions for dockerizing PostgreSQL mentioned above:

# Add VOLUMEs to allow backup of config, logs and databases
VOLUME  ["/etc/postgresql", "/var/log/postgresql", "/var/lib/postgresql", "/home/osmdata/gpkg"]

# Set the default command to run when starting the container
CMD ["/usr/lib/postgresql/12/bin/postgres", "-D", "/var/lib/postgresql/12/main", "-c", "config_file=/etc/postgresql/12/main/postgresql.conf"]

I’ve added the gpkg folder as this will be the folder where all the exports will be stored. Later on we will mount this folder to the host to have access to these files.

Build and Run the Container

This Dockerfile is used to build the Docker image in the current path with the tag osmgis:

docker build -t osmgis .

This will take some time as all the above steps are executed. If something goes wrong, you can rerun this line: Docker reuses the cached layers up to the last successful step of your Dockerfile and continues from there. The non-cached build takes about 3 min from start to end.

Once the build process finishes, it’s time to run the osmgis container:

docker run --volume aLocalPath:/home/osmdata/gpkg --rm -P -p 0.0.0.0:55432:5432 -d --name osmgis osmgis

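Please note: the PostGIS database is now reachable from the host on port 55432. The volume option with aLocalPath gives you access to the exported files after the shell script has been executed. aLocalPath has to be an absolute path on the host; otherwise Docker treats it as the name of a volume.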
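
To check the connection from the host, a small sketch like the following may help. It only assembles a connection URI from the values used in the Dockerfile and the docker run line (role, password and database osmdata, port 55432). The helper name db_uri and the variable names are made up for this example, and it assumes the client runs on the Docker host itself.

```shell
#!/bin/sh
# Connection settings taken from the Dockerfile and the docker run line.
DB_HOST="localhost"      # assumption: the client runs on the Docker host
DB_PORT=55432            # host port mapped to 5432 in the container
DB_USER="osmdata"
DB_PASS="osmdata"
DB_NAME="osmdata"

# Assemble a PostgreSQL connection URI from the settings above.
db_uri() {
    echo "postgresql://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/${DB_NAME}"
}

db_uri

# With a PostgreSQL client installed on the host, this should report
# the PostGIS version of the container:
#   psql "$(db_uri)" -c "SELECT postgis_full_version();"
```

The same host, port and credentials should also work in the PostGIS connection dialog of a desktop GIS.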

Importing Data and Extracting GeoPackages

Once the container is up and running I execute the copied shell script import_osm.sh. This script does all the magic. The main points:

  • iterate through the pbf files
  • load each pbf into PostGIS using imposm3 and a predefined mapping
  • extract one geopackage with points, lines and polygons from the tables using ogr2ogr
  • drop the three tables to free some space for the next piece
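
The four steps can be outlined roughly as follows. This is a hypothetical sketch, not Michael's actual import_osm.sh: the table names osm_points, osm_lines and osm_polygons are assumptions (the real names depend on mapping.yml), and the helper functions run and process_pbf are made up for illustration. By default the sketch only prints the commands; set DRY_RUN=0 inside the container to execute them.

```shell
#!/bin/sh
# Hypothetical outline of import_osm.sh, not the original script.
# Assumed names: tables osm_points, osm_lines and osm_polygons in the
# "import" schema; the real names depend on mapping.yml.
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the commands
PBF_DIR="${PBF_DIR:-/home/osmdata/pbf}"
GPKG_DIR="${GPKG_DIR:-/home/osmdata/gpkg}"
IMPOSM_CONN="postgis://osmdata:osmdata@localhost/osmdata"
OGR_CONN="PG:dbname=osmdata user=osmdata password=osmdata host=localhost"
TABLES="osm_points osm_lines osm_polygons"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

process_pbf() {
    pbf="$1"
    gpkg="$GPKG_DIR/$(basename "$pbf" .osm.pbf).gpkg"

    # 1. Load the filtered extract into PostGIS with the predefined mapping.
    run "$PBF_DIR/imposm" import -connection "$IMPOSM_CONN" \
        -mapping "$PBF_DIR/mapping.yml" -read "$pbf" -write -overwritecache

    # 2. Export the tables as layers of a single geopackage.
    #    The first call creates the file, later calls update it.
    update_flag=""
    for table in $TABLES; do
        run ogr2ogr -f GPKG $update_flag "$gpkg" "$OGR_CONN" \
            "import.$table" -nln "${table#osm_}"
        update_flag="-update"
    done

    # 3. Drop the tables to free space for the next piece.
    for table in $TABLES; do
        run psql -d osmdata -c "DROP TABLE IF EXISTS import.$table;"
    done
}

# Iterate through all filtered extracts.
for pbf in "$PBF_DIR"/*.osm.pbf; do
    [ -e "$pbf" ] || continue            # skip when no file matches
    process_pbf "$pbf"
done
```

Writing all three layers into one geopackage per OSM key keeps the number of output files equal to the number of filtered extracts.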

Once this is done, we find the geopackages at aLocalPath. If you would like to post the data to an FTP server right away, just start the script with three parameters and off you go. The plain export runs with:

docker exec -it osmgis /home/osmdata/pbf/import_osm.sh

If you want to push the gpkgs to an FTP server, use:

docker exec -it osmgis /home/osmdata/pbf/import_osm.sh USERNAME FTPSERVER password
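
How the three parameters might be handled can be sketched as follows. The parameter order USERNAME FTPSERVER password is taken from the call above; the helper ftp_url is hypothetical, and uploading with wput (which the Dockerfile installs) is an assumption about the original script, so the sketch only prints the upload commands.

```shell
#!/bin/sh
# Sketch of the argument handling, assuming the parameter order
# USERNAME FTPSERVER PASSWORD. ftp_url is a hypothetical helper.
ftp_url() {
    case "$#" in
        0) return 1 ;;                   # no upload requested
        3) echo "ftp://$1:$3@$2/" ;;     # user:password@server
        *) echo "usage: import_osm.sh [USERNAME FTPSERVER PASSWORD]" >&2
           return 2 ;;
    esac
}

# Print one upload command per geopackage when credentials were passed.
# Whether the original script uses wput is an assumption.
if url="$(ftp_url "$@")"; then
    for gpkg in /home/osmdata/gpkg/*.gpkg; do
        [ -e "$gpkg" ] || continue       # skip when no file matches
        echo "wput $gpkg $url"           # drop the echo to upload
    done
fi
```

Keep in mind that a password passed on the command line is visible in the process list, and that special characters in it would need URL encoding.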

If the extract was successful and you don’t need the container any more, you might want to shut down the container:

docker stop osmgis

You can download all the files directly from Michael's repository on GitHub.

Comment by Michel Stuyts:

If you want to copy the resulting gpkgs from your docker container to the host machine you can run:

docker cp osmgis:/home/osmdata/gpkg/. .
