
How to Perform Spatial and Geo-Location Based Querying Efficiently


Spatial and location-based features show up in many applications. With the growing cost of on-demand services such as the Google Places API, which provides accurate data on venues and geographical places for business logic, it is crucial to understand and implement efficient methods to cache and search geo positions and related information.

Database frameworks we can use to store geo positions without much overhead

There are many sophisticated systems that can store geo positions and shapes, such as Esri's GIS platform, PostGIS (the spatial extension for PostgreSQL) and Amazon Aurora. Beyond those, there are two frameworks that can handle geo operations while offering other useful features as well.

Elasticsearch - Elasticsearch is a distributed search and analytics engine that provides many different search capabilities quickly and resiliently. Apart from its spatial functions, it also offers other optimised search capabilities that can power application features.

GeoFire - GeoFire is a library that runs on top of Firebase. Although it is very limited in spatial functions, it can be combined with the rest of Firebase's feature set for your startup project. In other words, if you are already using Firebase and only need basic spatial queries, GeoFire is worth trying.

Elasticsearch Geospatial Analysis

Elasticsearch has many geospatial capabilities and applications. Broadly, they can be classified as geospatial mapping, ingest, query, aggregate and visualize.

Geospatial Mapping - represents points or geographical shapes on a 2D plane, such as lines, circles, polygons and multi-polygons. For this you have to explicitly map a geo data field in the index before indexing such data.

Ingest - pipelines can clean, transform and augment your raw data into a structured form so that it can be indexed by Elasticsearch.
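As one concrete example, Elasticsearch ships with a `circle` ingest processor that approximates circle shapes as polygons at index time, which newer `geo_shape` indexes require. A minimal sketch of such a pipeline body, written here as a Python dict for illustration (the description and field name are assumptions, not from this article):

```python
# Sketch of an ingest pipeline body using Elasticsearch's "circle" processor,
# which converts circle geo_shapes into polygon approximations at index time.
# The field name "location" and the description are illustrative.
circle_pipeline = {
    "description": "Translate circle geo_shapes into polygons",
    "processors": [
        {
            "circle": {
                "field": "location",     # the geo_shape field to convert
                "error_distance": 10,    # max gap (m) between circle and polygon edge
                "shape_type": "geo_shape",
            }
        }
    ],
}
# This body would be sent with, e.g.: PUT _ingest/pipeline/polygonize-circles
```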

Query - covers querying this spatial data with location-driven arguments. Commonly used relations are intersects, within, contains and disjoint. This is a very important part, since it lets us keep track of all the geo points and shapes when querying and caching.

Aggregate - summarizes your data into meaningful metrics, statistics or analytics. There are many ways to aggregate data in Elasticsearch, and aggregations are very useful when you work with a large data set while keeping computation to a minimum.
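For instance, Elasticsearch's `geo_distance` aggregation can bucket documents into distance rings around a point. A sketch of such a request body as a Python dict (the field name, origin and ranges are illustrative):

```python
# Sketch: bucket documents into distance rings around an origin point using
# Elasticsearch's geo_distance aggregation. All concrete values are illustrative.
rings_aggregation = {
    "size": 0,  # return only the aggregation buckets, not the matching hits
    "aggs": {
        "gyms_by_distance": {
            "geo_distance": {
                "field": "location",
                "origin": {"lat": 50.8481, "lon": 4.3531},
                "unit": "m",
                "ranges": [
                    {"to": 1000},                # within 1 km
                    {"from": 1000, "to": 5000},  # 1-5 km
                    {"from": 5000},              # beyond 5 km
                ],
            }
        }
    },
}
```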

Caching Google Places API data in Elasticsearch

Before caching, let's think about why we need to cache at all. Although the Google Places API was free when it launched, Google has raised its prices considerably in recent years, so relying on Google for every nearby-places lookup drives up application costs.

Let's think about a way to cache the Google data. As an example, take the business case of a “find sports gyms within a 5 km radius of a lat/lon location” query. We can invoke the Google Places API to get the nearest places by sending the latitude, longitude and radius, with the place type set to gyms.

So every time the phone reports an updated location, if the user has moved more than some displacement threshold, say 100 m, we have to call the API to fetch the nearest gyms. Imagine 100k users logging in to the app in a day: every time they report a location, the backend would have to ask the Google Places API for the nearest gyms.
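The displacement check itself is cheap to do with the haversine formula; a minimal sketch (the 100 m threshold and function names are illustrative assumptions):

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in metres."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def should_refetch(last, current, threshold_m=100.0):
    """Only hit the places lookup when the user moved more than threshold_m."""
    return haversine_m(*last, *current) > threshold_m
```

A move of roughly 0.001 degrees of longitude at the equator (about 111 m) would trigger a refetch, while smaller jitter would not.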

Simple method to cache nearest locations

For caching nearest locations, we can follow this simple method.


  1. Check whether the small nearest-radius (R) circle is within a cached large-radius (2R) area.
  2. If yes,
    1. Fetch the locations from the cached Elasticsearch DB.
  3. If not,
    1. Fetch the places for a large-radius (2R) circle, centred on the provided latitude and longitude, from Google.
    2. Cache the result in Elasticsearch.
    3. Fetch the locations within radius R from the DB.
  4. Send the response with the locations.
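The geometric check in step 1 needs no database round trip: a circle of radius R around the user lies inside a cached circle of radius 2R exactly when the distance between the two centres is at most 2R − R = R. A sketch of that decision (function names are illustrative):

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in metres."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def covered_by_cache(user, cached_center, r_m):
    """True when the R circle around `user` fits inside the 2R cached circle.

    Containment holds iff distance(user, cached_center) + R <= 2R,
    i.e. iff the centre distance is at most R.
    """
    return haversine_m(*user, *cached_center) <= r_m
```

So a cached area counts as a hit whenever the user is within R of its centre, which keeps the hit test a single distance comparison.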

This is a simple method to cache the data without using any crawlers or background operations.

With this simplicity, there are some drawbacks.

  1. There is an overhead on the first fetch for a given area.
  2. When cached areas intersect, already-cached data may be written to the DB again.

If your data doesn't have to be lightning fast on the first request for an area (until it is cached), this can be a very easy caching method. If you increase the caching radius from 2R to 3R and beyond, data delivery gets faster on average, since more places will already be cached. But each cache miss then adds more overhead to the initial request, and we should try to minimize that for a better client experience.

How to implement it

First you need to create two indexes in Elasticsearch:

  1. One to store the places information.
  2. One to store the large-radius (2R) cached areas.

Since geo points and shapes need explicit mappings, you should create the indexes before pushing any data to the DB.

# places
PUT /places
{
  "mappings": {
    "_doc": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}

# locations
PUT /locations
{
  "mappings": {
    "doc": {
      "properties": {
        "location": {
          "type": "geo_shape",
          "tree": "quadtree",
          "precision": "10m"
        }
      }
    }
  }
}

Then you can build queries based on the radius to retrieve the large-radius cached shapes and the small-radius places from Elasticsearch.

GET /locations/_search
{
    "query":{
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "geo_shape": {
                    "location": {
                        "shape": {
                            "type": "circle",
                            "coordinates" : [104.000, 1.0],
                            "radius" : "1000m"
                        },
                        "relation": "contains"
                    }
                }
            }
        }
    }
}

GET /places/_search
{
    "query": {
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "geo_distance": {
                    "distance": "1000m",
                    "location": {
                        "lat": 50.84808079999999,
                        "lon": 4.353067999999999
                    }
                }
            }
        }
    }
}

You can implement the logic in your preferred language and tune the radii of the cached areas according to your requirements.
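Tying the pieces together, the cache-or-fetch flow might look like the sketch below. It assumes a client object exposing a `search` method (in the style of the official `elasticsearch` Python client) plus two hypothetical helpers, `fetch_from_google` and `index_places`, which are not defined here; all names are illustrative:

```python
def find_nearby_gyms(es, lat, lon, radius_m):
    """Sketch of the cache-or-fetch flow; `es` is any client with .search()."""
    # 1. Is our R circle inside an already-cached 2R area?
    cached = es.search(index="locations", body={
        "query": {"bool": {"filter": {"geo_shape": {"location": {
            "shape": {"type": "circle",
                      "coordinates": [lon, lat],
                      "radius": f"{radius_m}m"},
            "relation": "contains"}}}}}
    })
    if cached["hits"]["total"]["value"] == 0:
        # 2. Cache miss: fetch a 2R circle from Google Places and index it.
        places = fetch_from_google(lat, lon, 2 * radius_m)   # hypothetical helper
        index_places(es, lat, lon, 2 * radius_m, places)     # hypothetical helper
    # 3. Either way, answer from the cache with a plain geo_distance query.
    hits = es.search(index="places", body={
        "query": {"bool": {"filter": {"geo_distance": {
            "distance": f"{radius_m}m",
            "location": {"lat": lat, "lon": lon}}}}}
    })
    return [h["_source"] for h in hits["hits"]["hits"]]
```

The same two request bodies shown earlier are reused here, so the only extra logic is the cache-miss branch.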

Considering the pros and cons, this could be the easiest way to implement location caching with a simple cache-miss logic.


Sandaruwan Nanayakkara


Chief Executive Officer

Sandaruwan is a visionary technology leader and the Chief Technology Officer at Zegates. Driven by his passion for innovation, he has dedicated his professional journey to establishing the premier software service company in Sri Lanka. Through strategic growth and a steadfast commitment to technological advancement, Sandaruwan has been instrumental in the company's expansion, consistently pushing the boundaries of what's possible in the tech industry.