How to Convert ESRI Geospatial Data Into JSON or PBF Formats

It’s been a while since I’ve written a tutorial, but this seems like as good a topic as any to get back into it (especially with the recent announcement of the OPEN Government Data Act).

In this post I’ll cover how to convert ESRI geospatial databases and shapefiles into more open formats such as JSON or PBF.

Why?

ESRI, for the unfamiliar, is a Geographic Information Systems (GIS) software company that creates geospatial products such as ArcGIS. It is also the developer of the shapefile and geodatabase formats. All three are widely used in the geospatial community, but ESRI’s shapefiles and geodatabases weren’t built for the modern web. To use the data stored within them for web-based maps, they need to be converted. That’s where JSON and PBFs come in.

Converting ESRI Geospatial Data to JSON

JSON is a popular, open file format for storing and transporting data. Web-based maps often use GeoJSON, an open standard built on JSON for encoding geographic features. We’ll first convert our ESRI data to GeoJSON.

To convert the data, you can use either ogr2ogr or Python.

ogr2ogr

ogr2ogr is a great, simple option for converting ESRI data into GeoJSON data. On Windows, the easiest way to run it is through the OSGeo4W Shell.

Once you have the shell installed, all you need to do is open it, navigate to where your geodatabase or shapefile is located, and run:

ogr2ogr -f "GeoJSON" "outputFile.json" "inputFile.shp"

If you’re working with a Geodatabase (.gdb), you may want to run ogrinfo on the gdb first to see what layers are stored within it.

ogrinfo exampleDB.gdb

Once you know what layers you want, you can run:

ogr2ogr -f "GeoJSON" "outputFile.json" "exampleDB.gdb" "layerName"

After ogr2ogr runs, you should have a GeoJSON file ready to use.
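
Before moving on, I find it worth a quick sanity check that the output really is a FeatureCollection with the features you expect. Here's a minimal sketch using only Python's standard library; the function name summarize_geojson is my own, and it assumes the file is small enough to load into memory.

```python
import json
from collections import Counter


def summarize_geojson(path):
    """Return (feature_count, Counter of geometry types) for a GeoJSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if data.get("type") != "FeatureCollection":
        raise ValueError(f"expected a FeatureCollection, got {data.get('type')!r}")
    features = data.get("features", [])
    # A feature's geometry may be null, so guard before reading its type.
    geometry_types = Counter(
        (feature.get("geometry") or {}).get("type", "None") for feature in features
    )
    return len(features), geometry_types
```

Calling summarize_geojson("outputFile.json") gives you the feature count and a tally of geometry types (Point, Polygon, and so on), which is usually enough to spot an empty or half-written export.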

Python

If you prefer to use Python to convert your shapefile, something like this should do the trick:

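import json
from osgeo import ogr  # GDAL's Python bindings; the bare "import ogr" form is deprecated

# Open the data source read-only (0); the driver accepts a .shp file or a folder of shapefiles
driver = ogr.GetDriverByName('ESRI Shapefile')
shp_path = r'./data/example'
data_source = driver.Open(shp_path, 0)
if data_source is None:
    raise FileNotFoundError(f'Could not open {shp_path}')

# Empty GeoJSON FeatureCollection to fill
fc = {
    'type': 'FeatureCollection',
    'features': []
}

# Export every feature in the first layer as a GeoJSON-style dict
lyr = data_source.GetLayer(0)
for feature in lyr:
    fc['features'].append(feature.ExportToJson(as_object=True))

with open('outputData.json', 'w') as f:
    json.dump(fc, f)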

Converting GeoJSON to PBF files

The data is now in GeoJSON format and ready to use. Depending on how much data you have, though, the file may be quite large.

To reduce the file size and make it manageable for displaying features on a map, Tippecanoe and mbutil come to the rescue. They’ll help shrink the data and split it into a {z}/{x}/{y} tile scheme for web-based maps to use.

Tippecanoe

Tippecanoe is a great open source tool that lets you convert GeoJSON into MBTiles (a file format for storing tilesets).

Windows Installation

Tippecanoe wasn’t made to run on Windows machines, so if you’re using a Windows machine you’ll need to use one of three options:

Using Tippecanoe

Tippecanoe works by splitting your data up into tiles at zoom levels, which can range from 0 to 22. Each zoom level corresponds to a given level of precision (from about 32,000 ft at zoom 0 to less than 1 ft at the highest zoom levels).
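
To get a feel for how zoom levels carve up the map, here's a small sketch of the standard Web Mercator tile math: at zoom z the world is a grid of 2^z by 2^z tiles, and each point falls into exactly one of them. The function names are my own, and I'm assuming the XYZ numbering most web maps use (y counted from the top).

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile containing a WGS84 longitude/latitude at a zoom level."""
    # Web Mercator is only defined up to about 85.0511 degrees of latitude.
    lat = max(min(lat, 85.0511), -85.0511)
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or the latitude limit does not fall off the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_path(lon, lat, zoom):
    """Build the relative {z}/{x}/{y}.pbf path for the tile containing a point."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return f"{zoom}/{x}/{y}.pbf"
```

Each extra zoom level doubles the grid in both directions, which is why bumping the max zoom by one can noticeably increase the number of tiles produced.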

To find out what level of precision to use, I like to first run:

tippecanoe -o out.mbtiles -zg --drop-densest-as-needed in.geojson

The -zg flag has Tippecanoe choose a max zoom level that should reflect the precision of the original data. The --drop-densest-as-needed flag has Tippecanoe drop the densest (least visible) features at each zoom level so tiles stay under 500 KB.

Once the above command runs, I check which max zoom level Tippecanoe chose, then run it again with the zoom level set 1 higher. For example, if Tippecanoe set the max zoom level to 10, I will run:

tippecanoe -o out.mbtiles -z11 --drop-densest-as-needed in.geojson
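
If you'd rather not dig around for the zoom level Tippecanoe picked, here's a rough sketch for reading it back out of the file. An MBTiles file is a SQLite database, and the MBTiles spec defines a metadata table of name/value pairs. I'm assuming a maxzoom row was written there (Tippecanoe normally records one), and the function names are my own.

```python
import sqlite3


def read_mbtiles_metadata(path):
    """Return the name/value pairs from an MBTiles file's metadata table."""
    connection = sqlite3.connect(path)
    try:
        rows = connection.execute("SELECT name, value FROM metadata").fetchall()
    finally:
        connection.close()
    return dict(rows)


def max_zoom(path):
    """Return maxzoom as an int, or None if the writer did not record it."""
    value = read_mbtiles_metadata(path).get("maxzoom")
    return int(value) if value is not None else None
```

Running max_zoom("out.mbtiles") after the -zg pass tells you what to add 1 to for the next run.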

I then try mapping the data to see how it looks. If the output still doesn’t look the way I want it to, I’ll start dropping feature attributes. Depending on how many attributes I want to drop, I will either exclude specific attributes by name with -x:

tippecanoe -o out.mbtiles -z11 --drop-densest-as-needed -x "attributeToExclude" in.geojson

or keep only the attributes I name with -y:

tippecanoe -o out.mbtiles -z11 --drop-densest-as-needed -y "attributeToInclude" in.geojson

Dropping feature attributes can help cut down the size of each feature, allowing more features to fit within a given tile. If you just want polygons and lines, without any feature attributes, you can run:

tippecanoe -o out.mbtiles -z11 --drop-densest-as-needed -X in.geojson

The above will result in the most performant vector tiles (at the cost of losing all your feature attributes).
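
To decide which attributes to pass to -x or -y, it helps to see what attributes the GeoJSON actually contains. Here's a minimal sketch that tallies attribute names across all features; count_attributes is a name I made up, and it assumes the file fits in memory.

```python
import json
from collections import Counter


def count_attributes(path):
    """Count how many features carry each attribute (property) name."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    counts = Counter()
    for feature in data.get("features", []):
        # "properties" may be null in valid GeoJSON, so fall back to an empty dict.
        counts.update((feature.get("properties") or {}).keys())
    return counts
```

Something like count_attributes("in.geojson").most_common() lists every attribute name with the number of features that have it, so you can pick out the ones your map never uses.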

Another good option for cutting down on file size is to cluster points. Tippecanoe lets you do this in a number of ways. As above, I usually start with:

tippecanoe -o out.mbtiles -zg --cluster-densest-as-needed in.geojson

Depending on how the result turns out, I then change the zoom level or use --cluster-distance to get more accurate results.

mbutil

Now that we have an mbtiles file, we need to unpack it into pbf files (protocol buffer binary format) to be able to actually use it on the web. To do so, we can use mbutil.

mb-util --image_format=pbf exampleFolder/example.mbtiles outFolder

Installation

You should be able to install mbutil using its global installation steps. If they don’t work, though, you can run mbutil with Python from within the cloned git repository:

python mb-util --image_format=pbf ../exampleFolder/example.mbtiles ../outFolder

Output

Your pbf files will be output in a {z}/{x}/{y} folder format. Keep in mind that the pbf files are gzipped, so if you want to serve them locally, your server needs to send them with the Content-Encoding: gzip header set.
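
For local testing, one way to get that header is a tiny static server built on Python's standard library. This is only a sketch under my own assumptions: the class and function names are made up, it labels every successful .pbf response as gzip-encoded without checking the file contents, and it isn't meant for production use.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class GzippedTileHandler(SimpleHTTPRequestHandler):
    """Static file handler that labels .pbf responses as gzip-encoded."""

    def send_response(self, code, message=None):
        # Remember the status so error pages are not mislabeled as gzip.
        self._status = code
        super().send_response(code, message)

    def end_headers(self):
        # Strip any query string before checking the file extension.
        is_tile = self.path.split("?", 1)[0].endswith(".pbf")
        if is_tile and getattr(self, "_status", None) == 200:
            self.send_header("Content-Encoding", "gzip")
        super().end_headers()


def make_server(directory, port=8000):
    """Create (but do not start) a local server rooted at the tile folder."""
    handler = functools.partial(GzippedTileHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# To run it against the mbutil output folder:
#     make_server("outFolder").serve_forever()
```

With the server running, a tile would be available at a URL like http://127.0.0.1:8000/{z}/{x}/{y}.pbf. If your map page is served from a different origin, you will likely also need a CORS header, which this sketch leaves out.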

Uploading to S3

With your files converted, you need somewhere to store them. If that storage space is S3, here’s how to upload the files:

aws s3 cp --recursive --content-encoding 'gzip' outFolder s3://your-bucket/outFolder

If you want to set the ACL on upload (so you don’t have to do it later), you can add the --acl parameter:

aws s3 cp --recursive --content-encoding 'gzip' --acl public-read outFolder s3://your-bucket/outFolder

 
