Thorsten Schaeff has been studying Computer Science and Media at the Media University in Stuttgart and the Institute of Art, Design and Technology in Dublin. For the past six months heâs been interning with the Maps for Work Team in London, researching best practice architectures for working with big geo data in the cloud.
Googleâs Cloud Platform offers a great set of tools for creating easily scalable applications in the cloud. In this post, Iâll explore some of the special challenges of working with geospatial data in a cloud environment, and how Googleâs Cloud can help. Iâve found that there arenât many options to do this, especially when dealing with complicated operations like geofencing
with multiple complex polygons. You can find the complete code for my approach on GitHub
Geofencing is the procedure of identifying if a location lies within a certain fence, e.g. neighborhood boundaries, school attendance zones or even the outline of a shop in a mall. Itâs particularly useful in mobile applications that need to apply this extra context to someoneâs exact location. This process isnât actually as straight forward as youâd hope and, depending on the complexity of your fences, can include some intense calculations and if your app gets a lot of use, you need to make sure this doesnât impact performance.
In order to simplify this problem this blogpost outlines the process of creating a scalable but affordable geofencing API on Googleâs App Engine.
And the best part? Itâs completely free to start playing around.Geofencing a route through NYC against 342 different polygons that resulted from converting the NYC neighbourhood tabulation areas into single-part polygons.
Getting startedTo kick things off you can work through the Java Backend API Tutorial
. This uses Apache Maven
to manage and build the project.
If you want to dive right in you can download the finished geofencing API from my GitHub account
The ArchitectureThe requirements are:
- Storing complex fences (represented as polygons) and some metadata like a name and a description. For this I use Cloud Datastore, a scalable, fully managed, schemaless database for storing non-relational data. You can even use this to store and serve GeoJSON to your frontend.
- Indexing these polygons for fast querying in a spatial index. I use an STR-Tree (part of JTS) which I store as a Java Object in memcache for fast access.
- Serving results to multiple platforms through HTTP requests. To achieve this I use Google Cloud Endpoints, a set of tools and libraries that allow you to generate APIs and client libraries.
Thatâs all you need to get started - so letâs start cooking!
Creating the ProjectTo set up the project simply use Apache Maven and follow the instructions here
. This creates the correct folder structure, sets up the routing in the web.xml file for use with Googleâs API Explorer and creates a Java file with a sample endpoint.
Adding additional LibrariesIâm using the Java Topology Suite
to find out which polygon a certain latitude-longitude-pair is in. JTS is an open source library that offers a nice set of geometric functions.
To include this library into your build path you simply add the JTS Maven dependency
to the pom.xml file in your projectâs root directory.
In addition Iâm using the GSON
library to handle JSON within the Java backend. You can basically use any JSON library you want to. If you want to use GSON import this dependency
Writing your EndpointsAdding Fences to Cloud DatastoreFor the sake of convenience youâre only storing the fencesâ vertices and some metadata. To send and receive data through the endpoints you need an object model which you need to create in a little Java Bean called MyFence.java.
Now you need to create an endpoint called add. This endpoint expects a string for the group name, a boolean indicating whether to rebuild the spatial index, and a JSON object representing the fenceâs object model. From this App Engine creates a new fence and writes it to Cloud Datastore.
Retrieving a List of our FencesFor some use cases it makes sense to fetch all the fences at once in the beginning, therefore you want to have an endpoint to list all fences from a certain group.
Cloud Datastore uses internal indexes
to speed up queries. If you deploy the API directly to App Engine youâre probably going to get an error message, saying that the Datastore query youâre using needs an index. The App Engine Development server can auto-generate the indexes
, therefore Iâd recommend testing all your endpoints on the development server before pushing it to App Engine.
Getting a Fenceâs Metadata by its IDWhen querying the fences you only return the ids of the fences in the result, therefore you need an endpoint to retrieve the metadata that corresponds to a fenceâs id.
Building the Spatial IndexTo speed up the geofencing queries you put the fences in an STR tree. The JTS library does most of the heavy lifting here, so you only need to fetch all your fences from the Datastore, create a polygon object for each one and add the polygonâs bounding box to the index.
You then build the index and write it to memcache for fast read access.
Testing a pointYou want to have an endpoint to test any latitude-longitude-pair against all your fences. This is the actual geofencing part. Thatâs so you will be able to know, if the point falls into any of your fences and if so, it should return the ids of the fences the point is in.
For this you first need to retrieve the index from memcache. Then query the index with the bounding box of the point which returns a list of polygons. Since the index only tests against the bounding boxes of the polygons, you need to iterate through the list and test if the point actually lies within the polygon.
Querying for a Polylines or PolygonsThe process of testing for a point can easily be adapted to test the fences against polylines and polygons. In the case of polylines you query the index with the polylineâs bounding box and then test if the polyline actually crosses the returned fences.
When testing for a polygon you want to get back all fences that are either completely or partly contained in the polygon. Therefore you test if the returned fences are within the polygon or are not disjoint. For some use cases you only want to return fences that are completely contained within the polygon. In that case you want to delete the not disjoint test in the if clause.
Testing & Deploying to App EngineTo test or deploy your API simply follow the steps in the âUsing Apache Mavenâ tutorial
Scalability & PricingThe beauty of this is, since itâs running on App Engine, Googleâs platform as a service offering, it scales automatically and you only pay for what you use.
If you want to insure best performance and great scalability you should consider to switch from the free and shared memcache to a dedicated memcache
for your application. This guarantees enough capacity for your spatial index and therefore ensures enough space even for a large amount of complex fences.
Thatâs it - thatâs all you need to create a fast and scalable geofencing API.
Preview: Processing Big Spatial Data in the Cloud with Dataflow
In my next post I will show you how I geofenced more than 340 million NYC Taxi locations
in the cloud using Googleâs new Big Data tool called Cloud Dataflow