What’s it about
As I already used ETL tools in large-scale IT environments at banks I was interested in “What can I do with them in terms of spatial data?”. As I searched the web I found those two candidates: FME and the open source tool talend. Both are somehow different but I wanted to solve two problems:- safe an excel file in three different spatial formats (shapefile, KML and geoJSON)
- load data from a google spreadsheet, transform it into a geoJSON and put it onto a server using ftp
FME
FME is a product from Safe Software and is somehow the flagship when it comes to ETL processes, data manipulation and process monitoring. The installation is a no-brainer but this does definitely not come for free. An indicator for the price is set at the cloud provider. Despite this price: If you need software like FME for your studies, just drop them a line as they have an academic grant program. For short-term tests they offer a 30 day trial. The software itself can be used in three-way: Desktop version, Server version or Cloud version. You’ll get support for over 350 file formats. The first task was done quite easy as you simply load the excel with a file reader process and attach three different file writers to it.As you simply need to define the lat/lon columns everything else is done by the program. Take care of string formatting a bit and decide which columns you really need and off you go. One read process, three writers. clean and simple.
The help section of this software is really great and you will find plenty of video tutorials for this baby made by FME.
For the second task I needed some more magic from the so-called Transformer Gallery. There all the fancy processes and tasks are stored like string manipulation, spatial queries and even an IP geocoder or the What3WordsEncoder. After choosing the Google Sheet reader I needed to replace some strings and format my lat/lon columns and defined the lat/lon columns for the Geometry. To work on with it I needed to store it on my hdd. I loaded it back again from this file and saved it again as geoJSON. That was the easy task. The hard one was to get it on the server so a potential webmap application could work with the data. Therefore I needed to use the System Caller Transformer. The normal FTPCaller was not working with our sftp digital-geography server (…). I ended up with calling WinSCP on my Windows PC with some custom commands. Furthermore I needed to start this last process with a so-called creator. I was not able to find out to start a follow-up process if the “storing process” was successful or not. In the end my project looks quite clean:
The whole thing took me approx. 6 seconds to run. If you want to run tasks periodically you should take a look at FME Server…
I like the clean interface and the supporting docs quite a lot and the Data Inspector will support your daily work a lot.
Off course I was not able to explore the whole power of FME but I think it is a powerful tool. I will definitely play around with it for a while.
Talend Spatial
Talend is a java based platform independent data quality and integration tool. It is somehow “open source”. I was using talend’s Data Integration Tool to work on my problems. As this is a Java application make sure you have a proper JDK installation. Then there is no installation of talend at all as it runs in this Java sandbox… The core of talend is not designed to work with spatial data. But there is this nice github page for a spatial extension. Together with the forum it is a good starting point.The startup of the program takes a while and the interface is quite slow on my laptop (Core i7, 8GB, SSD) and the suggested download of additional external libraries failed often. I skipped this as talend tells from a look at you processes if a library is missing and asks you to install it prior running your project. The installation of the spatial components is done in 5 minutes:
Once I got set up everything I found out that I need to load the xls input file for my conversion three times…
As it is done it FME I also need to define each column in terms of the format but to define the geometry I needed to embed this process s2DPointReplacer. Not very intuitive compared to FME but maybe it’s the burden of the multi-purpose tool.
The other process of using a google spreadsheet for upload to a server as geoJSON was easier compared to FME in terms of running a FTP task but I was too lazy to figure out how to connect with GDRIVE, get the file and so on. I simply made a filefetch of the published google spreadsheet, stored it locally, made the decimal separator replacement and the geometry extraction. The whole process was finished after 3-4 seconds. So it was a bit faster than FME ;-).
I like the multipurpose approach of the tool but the program was a bit buggy (changing language to only English made it crash) and a bit slow from the interface. Nevertheless it is a good alternative if you need to take care of multiple also non spatial ETL tasks.
For script automation: There is a great article in the talend help center.
Other Software
Of course there are alternatives:- ArcGIS Data Interoperability (which has support for FME processes)
- Geokettle as a smaller ETL tool but also open source
- …
In case you want to add some tools to the “Other Software” section, there is also hale studio open source. It’s designed specifically for work with complex data structures and fully supports standards like GML, CityGML and INSPIRE, but also databases and services.
THank you Thorsten for the addition. HALE is already mentioned in the stackexchange article I linked to. But for the record: http://www.esdi-community.eu/projects/hale
And you should point out, that you’re the Manager at HALE 😉