Working With GeoSpatial Data In Snowflake

Author: Mahima Khandelwal


This blog aims to provide detailed information about GeoSpatial data and its handling in Snowflake.


GeoSpatial data files majorly come with .Geotiff extension, which can further be converted to GeoJson. We can then load this GeoJson data into the Snowflake’s variant column. To make the analysis easier, we can flatten this data to Geospatial data type.



What is GeoTiff?

A GeoTIFF is a standard .tif or image file format that includes additional spatial (georeferencing) information embedded in the .tif file as tags.



An image of Los Angeles from Planet’s February 2018 Basemap. ©2018 Planet Labs Inc, CC BY-SA 4.0.


GeoJSON

GeoJSON is a way of representing geographic data. It is a JSON based format designed to represent the geographical features with their non-spatial attributes.


Sample GeoJson data with single value:



GeoJson properties may vary based on images.


Setting up Snowflake

You can create a free trial Snowflake account here - Snowflake Trial. Free trial provides 400 credits.


GeoJson to Snowflake - Raw Layer

We will be working with sample GeoJson data and using snowflake’s internal stage to store data files. You can load data into the internal stage using the put command in snowsql.


Steps:

1. Create file format and internal stage


Below script creates a file format of type ‘Json’ for GeoJson data with gzip compression and an internal stage with the created file format.


create or replace file format JsonFormat

type = json,strip_outer_array = true,

compression = gzip;


create stage GeoJsonStage

file_format = JsonFormat;


2. Load data to internal stage


Put command is used to load data from the local system to the internal stage we created in the previous step. You will need snowsql to execute this command.


put file:///POC/AllFiles/geotiff-1k-values-1k-rows_Modified.geojson @GeoJsonStage


3. List stage


View the loaded file using the list command in Snowflake.


list @GeoJsonStage;


4. Create raw table


Create a table with a variant column to load GeoJson - Semi-structured data.


create table Geo1Values1MRows(GeoValues variant);


5. Load data


Copy into command lets you load the data from the staged file into an existing table.


copy into Geo1Values1MRows

from @geojsonstage/geotiff-1-value-1M-rows_Modified.geojson.gz

file_format = JsonFormat;


Loaded data looks like:



Here we have one variant column with json dump. This is our landing zone which is a faithful representation of source data.


Snowflake Raw Layer to Staging Layer- Flatten Data

Now that we have data loaded in a raw table let us flatten it with our column of interest to get meaningful insights.


In the above example, we will be flattening our data based on properties, latitude, longitude, and coordinates.


1. Create a table with the required columns


We need a table with 3 integer columns for properties-value, latitude & longitude and a Geospatial data type - GEOGRAPHY column for coordinates.


create table Geo1Values1MRows_Modified(

Val number,

LNG number(15, 6),

LAT number(15, 6),

Coordinates geography

)


2. Load data


Insert data in the staging table, using select query. We use colon notation(:) between the variant column name and any first-level element.


insert into

Geo1Values1MRows_Modified

select

Geovalues:properties.value as value,

Geovalues:geometry.coordinates [0] as LNG,

Geovalues:geometry.coordinates [1] as LAT,

TO_GEOGRAPHY(Geovalues:geometry) as Coordinates

from

Geo1Values1MRows;


Querying GeoSpatial Data in Snowflake

Once the data is loaded in our staging table let us query a select statement to fetch all the results.



We can also query on latitude and longitude to get details of single and/or multiple coordinates.


1. Latitude & longitude - Single record.


Query on single latitude and longitude:


select * from

Geo1Values1MRows_Modified

where

lng = 4.341056

and

lat = 52.181202;



2. Latitude & longitude - Multiple records.


Below query results all records between the specified latitude and longitude.


select * from

Geo1Values1MRows_Modified

where

(

lng < 4.341078

and lng > 4.00000

)

and (

lat > 52.179936

and lat < 52.18125

);



Let’s see how can we play along with geospatial data type in snowflake -


1. St_within: Returns true if the 1st geography object is fully contained by the 2nd geography object.

select * from

Geo1Values1MRows_Modified

where

st_within(

coordinates,

st_makepoint(4.440488898317344, 52.18029904071257)

);




2. St_dwithin : Returns true if the minimum geodesic distance between two points is within the specified distance.


select * from

Geo1Values1MRows_Modified

where

st_dwithin(

coordinates,

st_makepoint(4.440488898317344, 52.18029904071257),

200

);



References
  1. https://quickstarts.snowflake.com/guide/getting_started_with_geospatial_geography/index.html?index=..%2F..index#4

  2. https://docs.snowflake.com/en/sql-reference/data-types-geospatial.html

  3. https://docs.snowflake.com/en/user-guide/querying-semistructured.html




451 views0 comments

Recent Posts

See All