In the summer of 2019 Michael Marz started to extract the most important items from OpenStreetMap and published these extracts as GeoPackages on his webpage. Back then I looked at his code and promised to create a Docker-based workflow for it. Half a year later, I would like to present my very first running Docker container.
The workflow of Michael’s scripts is straightforward:
- install PostGIS on an Ubuntu machine
- download the OSM extract of your choice (he prefers the global one)
- cut it into pieces to save some space in the database
- load the data into PostGIS
- export the data into a GeoPackage
- drop the tables
- take the next piece
The main advantage of a Docker container: three lines of code and it runs on every computer with Docker installed.
Docker: Start
As I like to keep things simple, I started from a plain Ubuntu image instead of choosing one of the many PostGIS Docker images out there. Docker’s own instructions for dockerizing PostgreSQL were a great starting point, but I wanted to use the latest PostgreSQL and PostGIS versions. Save the following code as Dockerfile; the snippets in the next sections are appended to the same file:
# use of an ubuntu base for simplicity and transparency
FROM ubuntu:18.04
LABEL maintainer="Riccardo Klinger <riccardo.klinger@gmail.com>"

# getting postgres
RUN apt-get update && apt-get -y install wget gnupg2
RUN wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -

# Add PostgreSQL's repository. It contains the most recent stable release
# of PostgreSQL, ``12``.
RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ bionic-pgdg main" > /etc/apt/sources.list.d/pgdg.list

# Install software-properties-common, PostgreSQL 12 with PostGIS 3 and the
# tools for downloading, filtering, converting and uploading (ftp) the data.
# apt-get update shares the layer with the install so a stale cached index is never reused.
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    software-properties-common \
    postgresql-12 \
    postgresql-client-12 \
    postgresql-contrib-12 \
    postgresql-12-postgis-3 \
    postgresql-12-postgis-3-scripts \
    aptitude \
    unzip \
    openssh-client \
    openssh-server \
    sshpass \
    && aptitude update \
    && aptitude install -y nano axel wput screen p7zip-full osmium-tool \
    vnstat gdal-bin
As you can see, there are plenty of packages involved, but in the end I mainly followed Michael’s step-by-step instructions.
The Docker File System
The next step is some file system work: I create a base folder to hold the downloaded files, the pbf extracts and the resulting GeoPackages, and copy the import script and the imposm mapping into it:
WORKDIR /home/osmdata
RUN mkdir /home/osmdata/pbf
RUN mkdir /home/osmdata/gpkg
COPY ./import_osm.sh /home/osmdata/pbf
COPY ./mapping.yml /home/osmdata/pbf
RUN ["chown", "-R", "postgres:postgres", "/home/osmdata"]
RUN ["chmod", "+x", "/home/osmdata/pbf/import_osm.sh"]
In the last line I make the shell script executable; it was copied into the image from the local host. This script does all the work once the container is up and running with the PostGIS database.
With the folders in place, I install imposm, which pushes the data to PostGIS once the filtered pbfs exist. Then I download the pbf file of interest (in this case the OSM data of the city of Bremen, as it is a small dataset) and let osmium split it into one pbf per OSM key:
## download imposm3
RUN wget https://github.com/omniscale/imposm3/releases/download/v0.10.0/imposm-0.10.0-linux-x86-64.tar.gz -P /home/osmdata &&\
    tar -xf /home/osmdata/imposm-0.10.0-linux-x86-64.tar.gz &&\
    cp -R /home/osmdata/imposm-0.10.0-linux-x86-64/* /home/osmdata/pbf/ &&\
    rm -R /home/osmdata/imposm-0.10.0-linux-x86-64

# Get the data and split it into one pbf per OSM key
RUN axel -n 3 -a -v https://download.geofabrik.de/europe/germany/bremen-latest.osm.pbf &&\
    osmium tags-filter bremen-latest.osm.pbf nwr/aerialway -o pbf/bremen-latest_aerialway.osm.pbf &&\
    osmium tags-filter bremen-latest.osm.pbf nwr/aeroway -o pbf/bremen-latest_aeroway.osm.pbf &&\
    [...]
    osmium tags-filter bremen-latest.osm.pbf nwr/water -o pbf/bremen-latest_water.osm.pbf
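The chain of osmium calls above repeats the same pattern for every OSM key. As a sketch that is not part of the original Dockerfile, the pattern can be wrapped in a small bash function that takes the keys as arguments. Since the full key list is elided above, the keys in the usage line are only examples, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: one osmium tags-filter call per OSM key instead of a long && chain.
# filtered_name and split_by_keys are hypothetical helper names.
set -euo pipefail

# Output path for one key, following the naming used in the Dockerfile:
# bremen-latest.osm.pbf + aeroway -> pbf/bremen-latest_aeroway.osm.pbf
# (relative to the working directory /home/osmdata)
filtered_name() {
  local base
  base="$(basename "$1" .osm.pbf)"
  printf 'pbf/%s_%s.osm.pbf\n' "$base" "$2"
}

# Keep all nodes, ways and relations (nwr/) carrying each key in its own file.
split_by_keys() {
  local src="$1" key
  shift
  for key in "$@"; do
    osmium tags-filter "$src" "nwr/$key" -o "$(filtered_name "$src" "$key")"
  done
}
```

Usage, run from /home/osmdata: split_by_keys bremen-latest.osm.pbf aerialway aeroway water. The same loop could also live in a single RUN instruction; either way the key list is kept in one place instead of being spread over many near-identical lines.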
From PostgreSQL to PostGIS
With the file system and the data in place, I “install” PostGIS and some other extensions. I also expose port 5432 and let PostgreSQL listen on all addresses so that connections from outside the container are accepted:
# switch USER
USER postgres

# Adjust PostgreSQL configuration so that remote connections to the
# database are possible.
RUN echo "host all all 0.0.0.0/0 md5" >> /etc/postgresql/12/main/pg_hba.conf

# And add ``listen_addresses`` to ``/etc/postgresql/12/main/postgresql.conf``
RUN echo "listen_addresses='*'" >> /etc/postgresql/12/main/postgresql.conf

# Expose the PostgreSQL port
EXPOSE 5432

# Create a PostgreSQL role named ``osmdata`` with ``osmdata`` as the password and
# then create a database `osmdata` owned by the ``osmdata`` role and add
# the postgis extension
RUN /etc/init.d/postgresql start &&\
    psql --command "CREATE USER osmdata WITH SUPERUSER PASSWORD 'osmdata';" &&\
    createdb -O osmdata osmdata &&\
    psql -d osmdata --command "CREATE EXTENSION IF NOT EXISTS postgis;" &&\
    psql -d osmdata --command "CREATE EXTENSION IF NOT EXISTS postgis_topology;" &&\
    psql -d osmdata --command "CREATE EXTENSION hstore;" &&\
    psql -d osmdata --command "CREATE SCHEMA import;"
The next lines are taken from Docker’s PostgreSQL instructions mentioned above:
# Add VOLUMEs to allow backup of config, logs and databases
VOLUME ["/etc/postgresql", "/var/log/postgresql", "/var/lib/postgresql", "/home/osmdata/gpkg"]

# Set the default command to run when starting the container
CMD ["/usr/lib/postgresql/12/bin/postgres", "-D", "/var/lib/postgresql/12/main", "-c", "config_file=/etc/postgresql/12/main/postgresql.conf"]
I’ve added the gpkg folder because all exports will be stored there. Later on we mount this folder to the host to access these files.
Build and Run the Container
Run the following in the folder containing the Dockerfile, import_osm.sh and mapping.yml to build the image with the tag osmgis (Docker image names must be lowercase):
docker build -t osmgis .
This takes some time, as all the steps above are executed. If something goes wrong, you can rerun this line and Docker will reuse the cached layers up to the last successful step of your Dockerfile and continue from there. An uncached build takes about 3 minutes from start to end.
Once the build process finishes, it’s time to run the osmgis container:
docker run --volume aLocalPath:/home/osmdata/gpkg --rm -P -p 0.0.0.0:55432:5432 -d --name osmgis osmgis
You can now reach the PostGIS database from the host on port 55432. Replace aLocalPath with an absolute path on the host (a plain name would create a named Docker volume instead): the volume option makes the exported files available there once the shell script has run.
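Before running the import, it can help to check that the database answers from the host. A minimal sketch, assuming psql and pg_isready are installed on the host; port 55432 and the osmdata/osmdata credentials are taken from the docker run command and the Dockerfile above, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: wait for PostGIS on the published host port, then print its version.
# Assumes psql and pg_isready on the host; credentials osmdata/osmdata and
# port 55432 mirror the Dockerfile and the docker run command.
set -euo pipefail

# libpq connection URI for the osmdata database; host and port are optional.
conn_uri() {
  local host="${1:-localhost}" port="${2:-55432}"
  printf 'postgresql://osmdata:osmdata@%s:%s/osmdata\n' "$host" "$port"
}

# Poll once per second, up to N tries (default 30), until connections are accepted.
wait_for_db() {
  local tries="${1:-30}"
  until pg_isready -q -d "$(conn_uri)"; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      echo "PostGIS did not come up in time" >&2
      return 1
    fi
    sleep 1
  done
}

# Print the PostGIS build information, which fails if the extension is missing.
check_postgis() {
  psql "$(conn_uri)" -c "SELECT postgis_full_version();"
}
```

Usage: wait_for_db && check_postgis. Polling with pg_isready avoids racing the PostgreSQL start-up, since docker run -d returns before the server inside the container accepts connections.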
Importing Data and Extracting GeoPackages
Once the container is up and running, I execute the copied shell script import_osm.sh, which does all the magic. Its main steps:
- iterate through the pbf files
- load each pbf into PostGIS using imposm3 and a predefined mapping
- extract one GeoPackage with points, lines and polygons from the tables using ogr2ogr
- drop the three tables to make room for the next piece
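Michael’s import_osm.sh is not reproduced here. The following is only a hypothetical sketch of the loop described above. It assumes that mapping.yml makes imposm write the tables osm_points, osm_lines and osm_polygons into the import schema (the real table names depend on the mapping), that the imposm binary sits in /home/osmdata/pbf as copied by the Dockerfile, and that the osmdata/osmdata credentials are used:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the import_osm.sh loop, NOT Michael's actual script.
# Assumed: table names osm_points/osm_lines/osm_polygons (set by mapping.yml),
# imposm binary in /home/osmdata/pbf, credentials osmdata/osmdata.
set -euo pipefail
shopt -s nullglob

PBF_DIR=/home/osmdata/pbf
GPKG_DIR=/home/osmdata/gpkg
IMPOSM_CONN="postgis://osmdata:osmdata@localhost/osmdata"
OGR_CONN="PG:host=localhost dbname=osmdata user=osmdata password=osmdata"
LAYERS="import.osm_points import.osm_lines import.osm_polygons"

# bremen-latest_water.osm.pbf -> /home/osmdata/gpkg/bremen-latest_water.gpkg
gpkg_name() {
  printf '%s/%s.gpkg\n' "$GPKG_DIR" "$(basename "$1" .osm.pbf)"
}

process_pbf() {
  local pbf="$1" out
  out="$(gpkg_name "$pbf")"
  # 1) load one filtered pbf into the import schema using the mapping
  "$PBF_DIR/imposm" import -connection "$IMPOSM_CONN" \
    -mapping "$PBF_DIR/mapping.yml" -read "$pbf" -write -overwritecache
  # 2) export points, lines and polygons as layers of one GeoPackage
  rm -f "$out"
  # $LAYERS is unquoted on purpose so it splits into three layer arguments
  ogr2ogr -f GPKG "$out" "$OGR_CONN" $LAYERS
  # 3) drop the tables to make room for the next piece
  psql -d osmdata -c "DROP TABLE IF EXISTS ${LAYERS// /, };"
}

# Process every pbf extract in PBF_DIR, one after another.
run_all() {
  local pbf
  for pbf in "$PBF_DIR"/*.osm.pbf; do
    process_pbf "$pbf"
  done
}
```

Inside the container, source the file and call run_all. Dropping the tables after each export keeps only one piece in the database at any time, which is the reason for splitting the extract in the first place.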
To run it, call:
docker exec -it osmgis /home/osmdata/pbf/import_osm.sh
Once it has finished, the GeoPackages are in aLocalPath. If you would like to push them to an FTP server right away, just start the script with three parameters (user name, FTP server and password) and off you go:
docker exec -it osmgis /home/osmdata/pbf/import_osm.sh USERNAME FTPSERVER password
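How the script handles its three arguments is not shown here. A possible sketch uses wput, which the Dockerfile installs; uploading into the FTP root directory and the function names are assumptions:

```shell
#!/usr/bin/env bash
# Sketch of an optional FTP upload (an assumption about how the three
# arguments USERNAME FTPSERVER password could be used). Uses wput from
# the Dockerfile; the upload target (server root) is assumed.
set -euo pipefail

GPKG_DIR=/home/osmdata/gpkg

# Build the wput target URL; special characters in the password would
# need URL-encoding, which this sketch does not do.
ftp_url() {
  local user="$1" server="$2" pass="$3"
  printf 'ftp://%s:%s@%s/\n' "$user" "$pass" "$server"
}

# Upload every GeoPackage from inside GPKG_DIR so only file names are sent.
upload_gpkgs() {
  local url
  url="$(ftp_url "$1" "$2" "$3")"
  (
    shopt -s nullglob
    cd "$GPKG_DIR"
    for gpkg in *.gpkg; do
      wput "$gpkg" "$url"
    done
  )
}

# Upload only when exactly three arguments were given.
maybe_upload() {
  if [ "$#" -eq 3 ]; then
    upload_gpkgs "$@"
  else
    echo "no FTP credentials given, files stay in $GPKG_DIR" >&2
  fi
}
```

Called as maybe_upload "$@" at the end of the import script, it uploads only when user name, server and password are all given, and otherwise leaves the files in the mounted folder.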
If the extraction was successful and you don’t need the container anymore, shut it down (thanks to the --rm option it is removed as well):
docker stop osmgis
You can download all the files directly from Michael’s repository on GitHub.
If you want to copy the resulting gpkgs from your Docker container to the host machine, you can run:
docker cp osmgis:/home/osmdata/gpkg/. .
Does Docker allow you to set up your infrastructure without disturbing your existing OS? Are portability and isolation its only strengths, or is there more to it?