Geophile: Converting and importing shapefiles for Compose PostgreSQL and MongoDB

Datetime:2016-08-23 02:18:04          Topic: PostgreSQL  MongoDB           Share

Geophile is a series dedicated to looking at geographical data, its features and its uses. In this article, we discuss how to convert shapefiles into SQL and GeoJSON and import them into Compose PostgreSQL and MongoDB.

When viewing geographical data on GIS software, you are likely viewing data produced by a shapefile. A shapefile is a vector data format used for storing data that references geographical objects. These files must be converted into a data format that your database can read before it is stored and queried.

Here, we will discuss several tools that will enable you to convert shapefiles to SQL and GeoJSON to work with your Compose PostgresSQL and MongoDB databases.

A shapefile is commonly downloaded as a single .zip file that, once unzipped, contains three mandatory files with the prefixes .shp , .dbf , and .shx . The .shp file contains the geography data, which includes points, lines, and polygons. The .dbf file (or dBase table) contains non-geographic features and attributes that describes the data. And the .shx file contains indices of the record sets in the .shp file for quicker lookups. Other files can be included within the shapefile, which can be found here .

Converting to SQL and importing into Compose PostgreSQL

Importing shapefiles into your Compose PostgreSQL deployment is made easier by the command-line tool shp2psql that comes with the PostGIS extension. To use the tool, you must add the PostGIS, fuzzystrmatch and postgis tiger geocoder extensions to your Compose PostgreSQL database. This can be done in the Compose-UI by clicking on your database > browser > extensions. You can also connect to your Compose PostgreSQL deployment via the terminal and add the extensions via the SQL command:

CREATE EXTENSION extension_name;

Now, we just need to download a shapefile that we want to use. Here, I am using the Washington State Counties TIGER shapefile from the Washington State Office of Financial Management (WSOFM). This data contains the various shapes that make up the different counties in Washington State.

Using the command line tool shp2psql , we will convert the shapefile to a .sql file so that we can import it into our database. shp2psql provides us with many options to convert our data into any required format, which can be viewed on this cheatsheet .

For this demo, we convert the shapefile using the following command:

shp2pgsql -s 4269 -g geom_4269 -I -W "latin1" "county10" county

Let's break down what we did. We start with the shp2pgsql command, then add -s to indicate the spacial reference system of the source we are using. The US Census and WSOFM uses NAD83, which corresponds to the North American geodetic network, and has the EPSG projection code and PostGIS SRID number 4269. We should include this number to tell the program that our shapefile uses this EPSG code. The -g sets the geometry or geography column name in the table created by shp2pgsql , which I've named geom_4269. -I creates a spatial index on the geometry column, which can be included now, or you can index your table later. -W sets the encoding to Latin1 (or ISO-8859-1) based on what the .dbf file is encoded in. The file name "county10" refers to the file we are importing, and county is the name we are assigning the table once it's created.

The output is dumped into a .sql file and can be immediately imported into any Compose PostgreSQL database. You can also save the file to a location on your computer and import it later. I have provided the commands to run both below.

Convert and saved on your computer

shp2pgsql -s 4269 -g geom_4269 -I -W "latin1" "county10" county > county.sql

Convert and imported to Compose PostgreSQL

shp2pgsql -s 4269 -g geom_4269 -I -W "latin1" "county10" county | psql "sslmode= require host=aws-us-east-1-portal.99.dbplayer.com port=9675309 dbname=shapefile_demo user=admin"

Converting to GeoJSON and importing into Compose MongoDB

Converting shapefiles into GeoJSON is done using a command-line tool called ogr2ogr , from the GDAL library , which can be downloaded here . The tool allows you to convert shapefiles into many different formats. Here, we will convert our shapefile into GeoJSON so that MongoDB, or any database that can read GeoJSON, is able about to read it. However, if you wanted to convert your GeoJSON file into a shapefile, the tool can do that as well.

In order to convert our shapefile to GeoJSON, we run the following in our terminal:

ogr2ogr -f "GeoJSON" growth.json "county10.shp" -t_srs EPSG:4326 -lco COORDINATE_PRECISION=7

We will explain this command as well. We start with ogr2ogr and add -f to indicate the file type that we are converting our data into - GeoJSON. The filename after is the name of the output file. The file in quotations is our source file - the count10.shp shapefile. -t_srs with EPSG:4326 is reprojecting the file from EPSG 4269 to EPSG 4326 because MongoDB only support this geospatial reference currently. The -lco switch (layer creation option) allows us to control other features of the document we are creating. There are several options, but we will use the COORDINATE_PRECISION option that sets the number of decimals after the point to seven for a more accurate reading. If you do not set COORDINATE_PRECISION to a value, your GeoJSON will not come out as expected and it will set your longitude and latitude coordinates to numbers like 28377973 , which is not valid.

Once our data has been exported into a JSON file, we can clean the data up a little by removing the following lines at the top of the file:

{ "type": "FeatureCollection", "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } }, "features": [...]}

We will keep all of the other generated data within the array. This allows us import separate documents into our collection versus storing all our data inside a single document. After that, we can import our data with the following command:

mongoimport -h aws-us-east-1-portal.99.dblayer.com:8675309 -p mypassword -u someone -d shapefile_demo county.json --jsonArray

We tell MongoDB to import from our host with our username and password credentials. We also provide the database name shapefile_demo with the JSON file we want to import. Then we tell MongoDB to save each item in the JSON array as a single document. This takes care of the issue of saving the entire JSON file as one document.

For more a more in-depth discussion about GDAL's ogr2ogr, read this .

All together now

So, we've take shapefiles and changed them into formats that can be imported into your Compose Postgres and MongoDB deployments. All you need to do is use a GIS software package to view your data. You can use a number of popular GIS software packages like OpenJUMP or QGIS to connect to you Compose deployments and view the counties.

Next time we will look at what EPSG codes are and what they mean for your geographic data.





About List