8.3. Working With Geospatial Data

Documentation

VoltDB Home » Documentation » Guide to Performance and Customization

8.3. Working With Geospatial Data

In addition to the classes, methods, and functions to insert and extract geospatial data from the database, VoltDB provides other SQL functions to help you manipulate the data. The functions fall into three main categories:

  • Converting to and from WKT representations:

    ASTEXT()
    POLYGONFROMTEXT()
    POINTFROMTEXT()
    VALIDPOLYGONFROMTEXT()
  • Performing geospatial calculations:

    AREA()
    CENTROID()
    CONTAINS()
    DISTANCE()
    DWITHIN()
    LATITUDE()
    LONGITUDE()
  • Analyzing the structure of a region:

    MAKEVALIDPOLYGON()
    ISVALID()
    ISINVALIDREASON()
    NUMINTERIORRINGS()
    NUMPOINTS()

The following sections provide examples of using these functions on both locations and regions.

8.3.1. Working With Locations

For geospatial locations, the data is often available as numeric values — longitude and latitude — rather than as WKT. In this case, you need to convert the numeric data to WKT before converting and inserting it as a GEOGRAPHY_POINT value.

For example, The VoltDB Tutorial uses data from the US Geographic Names Information Service (GNIS) to create a database of geographic locations. The original source data also includes the longitude and latitude of those locations. So it is easy to modify the database schema to add a location for each town:

CREATE TABLE towns (
   town VARCHAR(64),
   state VARCHAR(2),
   state_num TINYINT NOT NULL,
   county VARCHAR(64),
   county_num SMALLINT NOT NULL,
   location GEOGRAPHY_POINT,
   elevation INTEGER
);

However, the incoming data includes two floating point values rather than a GEOGRAPHY_POINT value or WKT. One solution is to create a simple stored procedure to perform the conversion to WKT and insert the record using the POINTFROMTEXT() function:

public class InsertTown extends VoltProcedure {

  public final SQLStmt insertrec = new SQLStmt(
      "INSERT INTO TOWNS VALUES (?,?,?,?,?,POINTFROMTEXT(?),?);"
  );

  public VoltTable[] run(   String t, String s, byte sn,
                            String c, short cn,
                            double latitude, double longitude,
                            long e)
      throws VoltAbortException {
          String wkt = "POINT( "  + 
               String.valueOf(longitude) + " " +
               String.valueOf(latitude) + ")";
          voltQueueSQL( insertrec, t,s,sn, c, cn, wkt, e);
          return voltExecuteSQL();
      }
}

Once the data is imported into the database, it is possible to use the geospatial functions to perform meaningful queries on the locations, such as determining which town is closest to a specific location (such as a cell phone user):

SELECT town, state FROM TOWNS 
   ORDER BY DISTANCE(location,CAST(? AS GEOGRAPHY_POINT)) 
   ASC LIMIT 1;

Or which town is furthest north:

SELECT town, state FROM TOWNS ORDER BY LATITUDE(location) DESC LIMIT 1;

8.3.2. Working With Regions

The textual representation for regions, or polygons, are not as easily constructed as geographic points. Therefore if you do not have region data already in WKT, your client application will need to generate WKT from whatever source data you are using.

Once you have the WKT representation, you can insert the data using a simple stored procedure similar to the example given above for locations. Since the data is already in WKT, you can even define the stored procedure using a CREATE PROCEDURE AS statement. The following example defines a table for storing information about the names and regions of national parks. It also defines the insert procedure for ingesting records from existing WKT data:

CREATE TABLE parks (
   park VARCHAR(64),
   park_code VARCHAR(2),
   border GEOGRAPHY
);
CREATE PROCEDURE InsertPark AS
   INSERT INTO parks VALUES (?,?, POLYGONFROMTEXT(?) );

As mentioned before, VoltDB does not validate the structure of the GEOGRAPHY polygon on input. So, if you are not positive the WKT representation meets the rules for a valid polygon, you should use the ISVALID() function on the GEOGRAPHY value before or after insertion to verify that your data is correct. For example, the following SQL statement uses the ISVALID() and ISINVALIDREASON() functions to report on all invalid park regions and the reason for the exception:

SELECT park, park_code, ISINVALIDREASON(border) 
  FROM Parks WHERE NOT ISVALID(border) ORDER BY park;

Alternately, you can use the VALIDPOLGYONFROMTEXT() function which combines the POLYGONFROMTEXT() and ISVALID() functions into a single function, ensuring that only valid polygons are generated. The preceding InsertPark can be rewritten to validate the incoming data like so:

CREATE PROCEDURE InsertPark AS
   INSERT INTO parks VALUES (?,?, VALIDPOLYGONFROMTEXT(?) );

Of course, the rewritten procedure will take incrementally longer because it performs both the conversion and validation. However, it performs these functions in a single step. The VALIDPOLYGONFROMTEXT() function will also correct simple errors in the WKT input. Specifically, it will correct any rings where the vertices are listed in the wrong direction.

Once you know your GEOGRAPHY data is valid, you can use the geospatial SQL functions to perform meaningful queries on the data. (If the polygons are not valid, the geospatial functions will not generate an error but will also not produce meaningful results.) The functions that perform calculations on GEOGRAPHY values are:

  • AREA() — the area of a region

  • CENTROID() — the geographic center point of a region

  • CONTAINS() — Whether a region contains a given point

  • DISTANCE() — distance between a point and a region (or between two points)

For example, the following SQL queries determine the three largest parks, what parks are closest to a given town, and what towns are contained with the region of a given park:

SELECT park, AREA(border) FROM Parks 
   ORDER BY AREA(border) DESC LIMIT 3;

SELECT p.park, DISTANCE(p.border,t.location)
   FROM parks AS P, towns AS T WHERE t.town=? 
   ORDER BY DISTANCE(p.border,t.location) ASC LIMIT 5;

SELECT t.town FROM parks AS P, towns AS T 
   WHERE p.park=? AND CONTAINS(p.border,t.location);