What’s it about
As I already used ETL tools in large-scale IT environments at banks I was interested in “What can I do with them in terms of spatial data?”. As I searched the web I found those two candidates: FME and the open source tool talend. Both are somehow different but I wanted to solve two problems:- safe an excel file in three different spatial formats (shapefile, KML and geoJSON)
- load data from a google spreadsheet, transform it into a geoJSON and put it onto a server using ftp
FME
FME is a product from Safe Software and is somehow the flagship when it comes to ETL processes, data manipulation and process monitoring. The installation is a no-brainer but this does definitely not come for free. An indicator for the price is set at the cloud provider. Despite this price: If you need software like FME for your studies, just drop them a line as they have an academic grant program. For short-term tests they offer a 30 day trial. The software itself can be used in three-way: Desktop version, Server version or Cloud version. You’ll get support for over 350 file formats. The first task was done quite easy as you simply load the excel with a file reader process and attach three different file writers to it.
The help section of this software is really great and you will find plenty of video tutorials for this baby made by FME.
For the second task I needed some more magic from the so-called Transformer Gallery. There all the fancy processes and tasks are stored like string manipulation, spatial queries and even an IP geocoder or the What3WordsEncoder. After choosing the Google Sheet reader I needed to replace some strings and format my lat/lon columns and defined the lat/lon columns for the Geometry. To work on with it I needed to store it on my hdd. I loaded it back again from this file and saved it again as geoJSON. That was the easy task. The hard one was to get it on the server so a potential webmap application could work with the data. Therefore I needed to use the System Caller Transformer. The normal FTPCaller was not working with our sftp digital-geography server (…). I ended up with calling WinSCP on my Windows PC with some custom commands. Furthermore I needed to start this last process with a so-called creator. I was not able to find out to start a follow-up process if the “storing process” was successful or not. In the end my project looks quite clean:

I like the clean interface and the supporting docs quite a lot and the Data Inspector will support your daily work a lot.

Talend Spatial
Talend is a java based platform independent data quality and integration tool. It is somehow “open source”. I was using talend’s Data Integration Tool to work on my problems. As this is a Java application make sure you have a proper JDK installation. Then there is no installation of talend at all as it runs in this Java sandbox… The core of talend is not designed to work with spatial data. But there is this nice github page for a spatial extension. Together with the forum it is a good starting point.The startup of the program takes a while and the interface is quite slow on my laptop (Core i7, 8GB, SSD) and the suggested download of additional external libraries failed often. I skipped this as talend tells from a look at you processes if a library is missing and asks you to install it prior running your project. The installation of the spatial components is done in 5 minutes:
Once I got set up everything I found out that I need to load the xls input file for my conversion three times…


I like the multipurpose approach of the tool but the program was a bit buggy (changing language to only English made it crash) and a bit slow from the interface. Nevertheless it is a good alternative if you need to take care of multiple also non spatial ETL tasks.
For script automation: There is a great article in the talend help center.
Other Software
Of course there are alternatives:- ArcGIS Data Interoperability (which has support for FME processes)
- Geokettle as a smaller ETL tool but also open source
- …
In case you want to add some tools to the “Other Software” section, there is also hale studio open source. It’s designed specifically for work with complex data structures and fully supports standards like GML, CityGML and INSPIRE, but also databases and services.
THank you Thorsten for the addition. HALE is already mentioned in the stackexchange article I linked to. But for the record: http://www.esdi-community.eu/projects/hale
And you should point out, that you’re the Manager at HALE đ