How to Convert ESRI Geospatial Data Into JSON or PBF Formats

It’s been a while since I’ve written a tutorial, but this seems like as good a topic as any to get back into it (especially with the recent announcement of the OPEN Government Data Act).

In this post I’ll cover how to convert ESRI geospatial databases and shapefiles into more open formats such as JSON or PBF.

Why?

ESRI, for the unfamiliar, is a Geographic Information Systems (GIS) software company that creates geospatial products such as ArcGIS. They are also the developers of the shapefile and geodatabase formats. All three are widely used in the geospatial community, but shapefiles and geodatabases weren’t built for the modern web. To use the data stored within them on web-based maps, it needs to be converted. That’s where JSON and PBF come in.

Converting ESRI Geospatial Data to JSON

JSON is a popular, open file format for storing and transporting data. Web-based maps often use an open standard built on JSON called GeoJSON, so we’ll first convert our ESRI data to GeoJSON.
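
To give you a sense of what we’re converting to, here’s a minimal GeoJSON FeatureCollection (a single point feature with one property; the coordinates and name are just an example):

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": { "type": "Point", "coordinates": [-87.6298, 41.8781] },
      "properties": { "name": "Chicago" }
    }
  ]
}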

To convert the data, you can use either ogr2ogr or Python.

ogr2ogr

ogr2ogr is a great, simple option for converting ESRI data into GeoJSON data. The easiest way to run it is using OSGeo4W Shell.

Once you have the shell installed, all you need to do is open it, navigate to where your geodatabase or shapefile is located, and run:

ogr2ogr -f "GeoJSON" "outputFile.json" "inputFile.shp"

If you’re working with a Geodatabase (.gdb), you may want to run ogrinfo on the gdb first to see what layers are stored within it.

ogrinfo exampleDB.gdb

Once you know what layers you want, you can run:

ogr2ogr -f "GeoJSON" "outputFile.json" "exampleDB.gdb" "layerName"

After ogr2ogr runs, you should have a GeoJSON file ready to use.
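
One thing to watch for: web maps generally expect coordinates in WGS84 (EPSG:4326). If your shapefile is in a projected coordinate system, you can have ogr2ogr reproject it during the conversion with the -t_srs flag:

ogr2ogr -f "GeoJSON" -t_srs EPSG:4326 "outputFile.json" "inputFile.shp"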

Python

If you prefer to use Python to convert your shapefile, something like this should do the trick:

import json

# In newer GDAL builds the Python bindings live under the osgeo package;
# the old bare "import ogr" still works on legacy installs.
from osgeo import ogr

# Open the shapefile read-only (0 = read, 1 = write).
# The path can point to a .shp file or to a folder of shapefiles.
driver = ogr.GetDriverByName('ESRI Shapefile')
shp_path = r'./data/example'
data_source = driver.Open(shp_path, 0)

# Build an empty GeoJSON FeatureCollection to fill in
fc = {
    'type': 'FeatureCollection',
    'features': []
}

# Export every feature in the first layer as a GeoJSON feature
lyr = data_source.GetLayer(0)
for feature in lyr:
    fc['features'].append(feature.ExportToJson(as_object=True))

# Write the collection out to disk
with open('outputData.json', 'w') as f:
    json.dump(fc, f)

Converting GeoJSON to PBF files

With the data now in GeoJSON format, it’s ready to use. Depending on how much data you have, though, the file size might be quite large.

To cut the file size down and make the features manageable to display on a map, Tippecanoe and mbutil come to the rescue. They’ll shrink the data and slice it into a {z}/{x}/{y} tile schema that web-based maps can use.

Tippecanoe

Tippecanoe is a great open source tool that lets you convert GeoJSON into MBTiles (a file format for storing tilesets).

Windows Installation

Tippecanoe wasn’t made to run natively on Windows, so if you’re using a Windows machine you’ll need to run it through something like the Windows Subsystem for Linux (WSL), a Docker container, or a Linux virtual machine.

Using Tippecanoe

Tippecanoe works by splitting your data up into tiles at zoom levels, which can range from 0 to 22. Each zoom level corresponds to a given level of precision (from 32,000 feet to smaller than 1 ft).

To find out what level of precision to use, I like to first run:

tippecanoe -o out.mbtiles -zg --drop-densest-as-needed in.geojson

The -zg flag will have Tippecanoe choose a max zoom level that should reflect the precision of the original data. The --drop-densest-as-needed flag will have Tippecanoe drop the densest (least visible) features at each zoom level so tiles stay under 500 KB.

Once the above command runs, I like to check what max zoom level Tippecanoe chose, then run it again with the zoom level set one higher. For example, if Tippecanoe chose a max zoom level of 10, I will run:

tippecanoe -o out.mbtiles -z11 --drop-densest-as-needed in.geojson
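
If you want to confirm the max zoom Tippecanoe picked, an MBTiles file is just a SQLite database, so (assuming you have the sqlite3 command available) you can read it from the metadata table:

sqlite3 out.mbtiles "SELECT value FROM metadata WHERE name = 'maxzoom';"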

I then try mapping the data to see how it looks. If the output still doesn’t look the way I want it to, I’ll start dropping feature attributes. Depending on how many attributes I want to drop, I will use either:

tippecanoe -o out.mbtiles -z11 --drop-densest-as-needed -x "attributeToExclude" in.geojson

or

tippecanoe -o out.mbtiles -z11 --drop-densest-as-needed -y "attributeToInclude" in.geojson

Dropping feature attributes can help cut down the size of each feature, allowing more features to fit within a given tile. If you just want polygons and lines, without any feature attributes, you can run:

tippecanoe -o out.mbtiles -z11 --drop-densest-as-needed -X in.geojson

The above will result in the most performant vector tiles (at the cost of losing all your feature attributes).

Another good option for cutting down on file size can be to cluster points. Tippecanoe allows you to do this a number of ways. As above, I usually run:

tippecanoe -o out.mbtiles -zg --cluster-densest-as-needed in.geojson

Depending on how the result turns out, I then change the zoom level or use --cluster-distance to get more accurate results.
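
For example, to merge points that fall within roughly ten pixels of each other (10 is just a placeholder value to tune for your data), something like this should work:

tippecanoe -o out.mbtiles -zg --cluster-distance=10 in.geojson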

mbutil

Now that we have an MBTiles file, we need to unpack it into PBF (protocol buffer binary format) files to be able to actually use it on the web. To do so, we can use mbutil.

mb-util --image_format=pbf exampleFolder/example.mbtiles outFolder

Installation

You should be able to install mbutil using its global installation steps. If those don’t work, though, you can run mbutil with Python from within the cloned git repository:

python mb-util --image_format=pbf ../exampleFolder/example.mbtiles ../outFolder

Output

Your PBF files will be output in a {z}/{x}/{y} folder structure. Keep in mind that the PBF files are gzipped, so if you want to serve them locally you’ll need to serve them with the Content-Encoding: gzip header set.
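
If you just want to preview the tiles locally, here’s a minimal sketch of a static file server that sets that header for .pbf responses (it assumes Python 3.7+; outFolder and port 8000 are placeholders):

#!/usr/bin/env python3
# Minimal local tile server sketch: serves ./outFolder and adds
# Content-Encoding: gzip to .pbf responses so the browser decompresses them.
import functools
import http.server

class PBFHandler(http.server.SimpleHTTPRequestHandler):
    def end_headers(self):
        if self.path.endswith('.pbf'):
            self.send_header('Content-Encoding', 'gzip')
        super().end_headers()

handler = functools.partial(PBFHandler, directory='outFolder')
http.server.HTTPServer(('localhost', 8000), handler).serve_forever()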

Uploading to S3

With your files converted, you need somewhere to store them. If that storage space is S3, here’s how to upload the files:

aws s3 cp --recursive --content-encoding 'gzip' outFolder s3://your-bucket/outFolder

If you want to set the ACL on upload (so you don’t have to do it later), you can add the --acl parameter:

aws s3 cp --recursive --content-encoding 'gzip' --acl public-read outFolder s3://your-bucket/outFolder
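
You may also want to set an explicit content type so browsers and map libraries treat the tiles as binary protobuf data (aws s3 cp supports --content-type; application/x-protobuf is a common choice, not a requirement):

aws s3 cp --recursive --content-encoding 'gzip' --content-type 'application/x-protobuf' --acl public-read outFolder s3://your-bucket/outFolder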


A Quick-Start Guide to the MEAN Stack (on a Mac)

Want to get a web application up and running quickly? Consider using the MEAN stack – MongoDB, ExpressJS, AngularJS, and Node.js. In this quick-start guide I’ll walk you through the process of getting everything set up properly with a MEAN stack on your Mac, then we’ll jump right in to developing a simple web application.

What are the technologies in the MEAN stack?

The MEAN stack consists of:

  • MongoDB – a NoSQL database. Consists of a set of Collections, which hold a set of Documents. Each Document is a set of key-value pairs, and is analogous to a JSON object. One key difference between a Document and a JSON object is that a Document is in BSON format, which is a binary form of JSON.
  • ExpressJS – a Node.js web application framework. Provides a robust set of features for building single-page, multi-page, and hybrid web applications.
  • AngularJS – a Javascript UI framework for web apps. It lets you extend HTML’s syntax and works very well for single-page applications.
  • Node.js – a server-side Javascript runtime environment. Node.js uses an event-driven, non-blocking I/O model based on the V8 Javascript engine.

Set-up

Install Homebrew

Before installing the MEAN stack on your Mac, you’ll want to ensure Homebrew is installed. Homebrew is a package manager for Mac that will make installing everything in the MEAN stack much simpler. First, check if you have Homebrew installed by opening up the terminal and typing:

brew help

If it is installed, proceed to install Node and npm (node package manager). Otherwise, install it by running this code:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Then run:

brew doctor

Finally, you may have to set up the permissions of your /usr/local folder by running this code:

sudo chgrp -R staff /usr/local
sudo chmod -R g+w /usr/local
sudo chgrp -R staff /Library/Caches/Homebrew
sudo chmod -R g+w /Library/Caches/Homebrew/

The code above assumes your user’s group is staff. You can check your usergroup by typing this in the terminal:

id

It should print out a long list of different variables, with the 2nd and 3rd ones being gid=xx(staff) groups=xx(staff). If your group is different from ‘staff’, then modify the provided code accordingly with your group name.

Install Node and npm

Run this:

brew update
brew install node

Check that it succeeded using:

node -v
npm -v

Install MongoDB

brew install mongodb

Next, create the data directory for the mongod process to write data:

sudo mkdir -p /data/db

Then update the permissions of the directory:

sudo chgrp -R staff /data
sudo chmod -R g+w /data

Check MongoDB is working

Run:

mongod --httpinterface

The --httpinterface flag enables mongod’s HTTP interface (it’s disabled by default) so you can access it in your browser. Navigate to http://localhost:28017/. You should see a web interface that provides overview, clients, dbtop, and writelock % rows, as well as the log.

Go back to your terminal and hit control + c to shut down the server.

Install Bower

Next, install Bower, which you’ll use to manage your front-end packages:

npm install -g bower

Install Grunt

Up next is Grunt, which is a task runner that helps you automate your development process (minification, compilation, unit testing, etc.):

npm install -g grunt-cli

Install Yeoman

We’re not done installing programs yet. Let’s set you up with Yeoman, a generator that provides scaffolding for projects:

npm install -g yo

Install MEAN.js

You’re finally ready to install MEAN.js, a generator that helps you create a MEAN.js application, a CRUD (Create, Read, Update and Delete) module, Angular module, and other AngularJS/Express entities.

npm install -g generator-meanjs

Start creating your first application

Now that you have everything set up, let’s start working on a simple application. First, create a new project folder. My folder is located at /Users/ryanwolniak/Development/MEAN/p1.

cd /Users/ryanwolniak/Development/MEAN/p1

If you’re unfamiliar with how to create a new directory, you’ll want to run a command similar to this first:

mkdir -p /Users/ryanwolniak/Development/MEAN/p1

With your project directory created, generate your application with yo:

yo meanjs

You will be asked a number of questions. For the Mean.js version I chose 0.4.1. Press enter when asked for a folder, then call your project whatever you like. I called mine p1. For the description, describe it however you wish. Add simple keywords if you like, and put your name down as the author of the application. When asked if you want to generate a CRUD module and a chat example, type yes for both. Then wait for the files to be generated.

What to do if the generator fails

If the generator fails, try running the following code:

npm config set registry http://registry.npmjs.org/ --global
export NODE_TLS_REJECT_UNAUTHORIZED=0

Then ensure that you have accepted the XCode license by running:

sudo xcodebuild -license

Scroll all the way to the bottom of the license and type ‘agree’.

If prompted, close and reopen your terminal. Then install nvm:

curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.29.0/install.sh | bash

Switch your node version to v0.12.7:

nvm install 0.12.7
nvm use 0.12.7

If prompted, close and reopen your terminal.

After running the above code, delete the mean folder that was created from the first time you ran ‘yo meanjs’ (it should be in the directory you ran ‘yo meanjs’ from). Then retry the generator:

yo meanjs

After that, you are almost ready to run your application. Change directories to the directory you generated the files at:

cd /Users/ryanwolniak/Development/MEAN/p1/mean

Then install the ruby gem sass:

sudo gem install sass

Run your application

Let’s start your application up!

First, start MongoDB:

mongod &

Note: the & symbol will allow mongod to run in the background so that you can start up your application. Speaking of which, run:

grunt

Your application should now be up and running. Navigate to http://localhost:3000/

You should see a screen similar to this one:

MEAN.JS Sample Application

Stopping your application

Want to end your application? If your terminal is still up, hit ‘control + c’. That should stop the application. But mongod will still be running in the background. To stop mongod, first run ‘ps’ to find a list of processes running on your machine:

ps

Then run:

kill 59178

where 59178 is the process id (PID). If it’s still hung, hit ‘control + c’.
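
If you’d rather not scan through the ps output for the PID, pgrep (included on macOS) can look it up for you:

kill $(pgrep mongod)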

Questions?

Since things sometimes vary from machine to machine, feel free to reach out if you’re having trouble getting any of this working via email or in the comments.

Getting Started with SparkSQL on the Hortonworks Sandbox I: Installing Zeppelin

SparkSQL is a module for Spark which allows you to run SQL queries over the Spark engine. It works alongside Hive, in the sense that it reuses the Hive frontend and metastore to give you complete compatibility with any of your Hive UDFs, data, or queries. The advantage of SparkSQL over classic Hive (which used MapReduce to run queries) is speed and integration with Spark.

Before diving into SparkSQL, it’s worth noting that Hive on Apache Tez, whose integration came out of Hortonworks’ Stinger Initiative, greatly improved Hive’s performance on interactive queries. This slide deck by Hortonworks even suggests that Hive on Tez outperforms SparkSQL for short running queries, ETL, large joins and aggregates, and resource utilization. However, SparkSQL still makes for a great way to explore your data on your cluster. With APIs in Java, Python, and Scala, as well as a web-based interpreter called Zeppelin, it offers a multitude of options for running queries and even visualizing data.

To start you off easy, we’ll begin this guide by installing Zeppelin so that we can use the web-based interpreter. If you’re familiar with R, think of Zeppelin as a bit like RStudio. It allows for data ingestion, discovery, analytics, and visualization.

Install Zeppelin

Download Zeppelin

Let’s start by downloading the Zeppelin files. To do this, we’ll clone it from its Github repository:

git clone https://github.com/apache/incubator-zeppelin.git

Install Maven

In order to build Zeppelin, you’ll need Maven on your sandbox. To install it, first download it:

wget http://apache.cs.utah.edu/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz

Extract it:

tar xvf apache-maven-3.3.3-bin.tar.gz

Move it:

mv apache-maven-3.3.3 /usr/local/apache-maven

Add the environment variables:

export M2_HOME=/usr/local/apache-maven
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
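
Note that the exports above only last for your current shell session. For the ‘source ~/.bashrc’ step below to do anything useful on future logins, append the same lines to your ~/.bashrc first, for example:

cat >> ~/.bashrc <<'EOF'
export M2_HOME=/usr/local/apache-maven
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
EOF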

Run this command:

source ~/.bashrc

Then verify that Maven works by running this command:

mvn -version

Build Zeppelin

With Maven installed, you’re now ready to build Zeppelin:

mvn clean install -DskipTests -Pspark-1.3 -Dspark.version=1.3.1 -Phadoop-2.6 -Pyarn

Prepare for it to take about 15-20 minutes to build. Let it run.

When the build finishes you should get a screen that looks like this:

Zeppelin Maven Build

Next, create the zeppelin-env.sh file by copying the zeppelin-env.sh.template:

cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh

The above code assumes you are in the directory that you downloaded zeppelin to.

Configure Zeppelin

Next, edit the zeppelin-env.sh file:

vi conf/zeppelin-env.sh

Hit i to enter edit/insert mode. Add these lines at the end of the file:

export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_PORT=10008
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.0.0-2557"

Note: the -Dhdp.version value should match the HDP version that you are running. If you are running the 2.3 version of the sandbox, your version will be the same as mine. To figure out your version in other cases, run this command:

hadoop version

You should get something that looks like this:

Hadoop version

The version you’ll want to type in comes after the first three numbers (2.7.1).
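
If you’d rather not pick the suffix out by eye, a quick way to grab it (assuming the version string looks like 2.7.1.2.3.0.0-2557, as it does on the 2.3 sandbox) is:

hadoop version | head -n 1 | cut -d. -f4-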

Once you know your version, go back and edit the zeppelin-env.sh file as instructed. Hit escape, then ‘:wq’ to save and quit.

Next, copy the hive-site.xml file to the conf folder:

cp /etc/hive/conf/hive-site.xml conf

The above code assumes you are in the directory you downloaded Zeppelin to.

Switch to the hdfs user next:

su hdfs

Then create a directory in HDFS for zeppelin:

hdfs dfs -mkdir /user/zeppelin
hdfs dfs -chown zeppelin:hdfs /user/zeppelin

You’re almost ready to start Zeppelin. But first you need to change your port forwarding settings on your sandbox.

Add Port Forwarding

Power off your sandbox, then navigate to Machine > Settings while your Hortonworks sandbox is selected in the VirtualBox Manager. Click on Network once in the settings. You should be in NAT mode. Click on Advanced > Port Forwarding.

Port Forwarding

Next, add a rule by clicking the green plus sign. Call the rule zeppelin, and give it the Host IP 127.0.0.1, the Host Port 10008, and the Guest Port 10008.

Zeppelin Port

Click OK twice, then start your sandbox back up.

Start Zeppelin

Ready to start Zeppelin? Navigate to wherever you downloaded Zeppelin (the incubator-zeppelin folder), then type this code into your command line:

bin/zeppelin-daemon.sh start

Congratulations! You should now have Zeppelin up and running on port 10008, i.e. 127.0.0.1:10008.

If you want to stop it, run this code:

bin/zeppelin-daemon.sh stop

With Zeppelin up and running, it’s time to start exploring SparkSQL. Check out Part II of this guide for an introduction to SparkSQL.

Setting up IPython Notebook with Apache Spark on the Hortonworks Sandbox

Apache Spark, if you haven’t heard of it, is a fast, in-memory data processing engine that can be used with data on Hadoop. It offers excellent performance and can handle tasks such as batch processing, streaming, interactive queries and machine learning.

Spark offers APIs in Java, Scala, and Python. Today, I’ll be covering how to get an IPython notebook set up on your Hortonworks sandbox so you can use it for ad-hoc queries, data exploration, analysis, and visualization over your data.

Let’s get started

Setup

First, make sure your sandbox’s network adapter is in NAT mode.

While your sandbox is powered off, navigate to the settings of your Hortonworks sandbox by clicking Machine > Settings while ‘Hortonworks Sandbox with HDP 2.3_1’ is highlighted.

Navigate to Network once in the settings. Ensure that your NAT network adapter is turned on, and that it is the only network adapter turned on:

Enable NAT Adapter

The default settings should be fine.

With the proper network adapter enabled, start up your machine.

Install IPython

Next, use yum to install a number of necessary dependencies:

yum install nano centos-release-SCL zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libpng-devel libjpg-devel atlas-devel

Make sure you include all of them. When you’re prompted with a y/N question, be sure to type y.

Next, install the development tools dependency for Python 2.7:

yum groupinstall "Development tools"

Then, install Python 2.7:

yum install python27

Since there are now multiple Python versions on your Hortonworks sandbox, you need to switch to Python 2.7:

source /opt/rh/python27/enable

Don’t worry, the Python version will switch back to its default the next time you power off your sandbox.

Next, download ez_setup.py, which installs easy_install. You’ll then use easy_install to install pip. It’s a bit of jumping through hoops, but these tools will get you what you need. Start by navigating to a test directory, or any directory you feel comfortable downloading files to. I chose to create a directory called /test_dev/:

mkdir /test_dev/
cd /test_dev/

Next, download ez_setup.py:

wget http://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py

Run ez_setup.py:

python ez_setup.py

Now install pip:

easy_install-2.7 pip

With pip installed, you can now install some of the packages you’ll want to use with IPython. You want to install the following packages:

numpy scipy pandas scikit-learn tornado pyzmq pygments matplotlib jinja2 jsonschema

You can install them by running this code, followed by the packages you want to install:

pip install

For example:

pip install scipy pandas scikit-learn

I recommend installing only a few at a time, as you may run into issues if you try to install them all at once. The first couple of packages take a good amount of time to install.

Install IPython Notebook next:

pip install "ipython[notebook]"

Next, create an IPython profile for Spark:

ipython profile create pyspark

Before we continue any further, you should set up port forwarding for port 8889 on your sandbox.

Setting up port forwarding

Power off your sandbox. Go into the settings of your Hortonworks sandbox, then navigate to Network:

Network settings

Click Port Forwarding. Then click the plus sign to add a new rule. Call the rule ipython, give it the host IP 127.0.0.1, host port 8889, and guest port 8889. Leave the guest IP blank. It should look like this when you are done:

Port Forwarding in VirtualBox

Click OK, then OK again.

Start your sandbox back up.

Create a shell script to launch IPython Notebook

To make things easier for you to launch IPython Notebook, let’s create a shell script. Navigate to the folder you’d like to launch IPython Notebook from. For me that was the /test_dev/ directory created earlier:

cd /test_dev/

Launch nano to create the shell script:

nano start_ipython_notebook.sh

Enter this code in nano:

#!/bin/bash
source /opt/rh/python27/enable
IPYTHON_OPTS="notebook --port 8889 --notebook-dir=u'/usr/hdp/2.3.0.0-2557/spark/' --ip='*' --no-browser" pyspark

Type control + o to write/save the file. Then control + x to quit.

Run it using this code:

sh start_ipython_notebook.sh

It should look like this:

IPython Notebook running on Hortonworks sandbox

Congratulations! You’re now running IPython Notebook on the Hortonworks sandbox in conjunction with Spark.

Using your new IPython Notebook

To start exploring with your new notebook, go to this web address: 127.0.0.1:8889

That’s it for this guide. In the next guide I’ll cover some basics of IPython Notebook and how you can get started using it with Spark.

Credits

Credit goes out to Hortonworks for writing their own guide, which I used as a basis of knowledge for this post. Since their guide was outdated at the time of writing, this post has updates and modifications which ensure a seamless installation of IPython Notebook with Apache Spark on Hortonworks Sandbox 2.3 and 2.3.1.

How to Integrate SQuirreL with Phoenix

If you’re looking to use a client GUI, or graphical user interface, to interact with Phoenix, you might want to give SQuirreL a try. SQuirreL is a graphical Java program that lets you see the structure of a database, browse the data in tables, and perform SQL queries.

Installing SQuirreL

The first thing you’ll want to do is install SQuirreL. To do this, go here and select the appropriate download for your operating system.

Once you have downloaded the file, open a new terminal window. Navigate to wherever you downloaded the file to, and run this code:

java -jar squirrel-sql-3.6-MACOSX-install.jar

You may have to modify that code slightly based on the name of the file you downloaded (version and operating system).

Once the installer dialog pops up, follow the instructions to install SQuirreL onto your system. You can choose to select optional installs if you like. For now, I chose to just do the base and standard install:

Installing SQuirrelSQL

SQuirrelSQL is now installed on your local machine, but we’re not done yet. We still need to set it up so we can use it with Phoenix.

Configuring SQuirrelSQL for Phoenix

Before we get started, make sure you have your Hortonworks sandbox VM up and running. You should have already followed this guide to set up Phoenix on your sandbox.

Step 1 – Move Phoenix client jar to SQuirreL lib directory

In your VM, navigate to the folder where Phoenix is installed. For me that folder is /usr/hdp/2.3.0.0-2557/phoenix. You need to move the phoenix client jar from that folder to your local machine. If you have a shared folder set up between your vm and your computer, run code similar to this to copy it to your local machine:

cp /usr/hdp/2.3.0.0-2557/phoenix/phoenix-4.4.0.2.3.0.0-2557-client.jar /media/sf_hdp_shared_folder/

Once you’ve got the file on your local machine, add it to SQuirrelSQL’s lib directory. On a Mac, you do this by navigating to Applications then right clicking on SQuirrelSQL and clicking ‘Show Package Contents’:

Finding SQuirreL's lib folder

From there, navigate to Contents > Resources > Java > lib. Copy the phoenix client jar to that directory:

Adding the phoenix jar to SQuirreL's lib folder

Step 2 – Switch your VM’s network adapter to Host-only

To make things easier for us, we’re going to switch the VM’s network adapter to Host-only. This gets rid of some bugs that can pop up if you try to connect to the VM when it’s using a NAT adapter. If your VM’s network adapter is already in Host-only mode, skip ahead to step 3.

To switch to Host-only mode, first power off your Hortonworks sandbox. Open up the VirtualBox Manager, click on your Hortonworks Sandbox, and select Settings > Network. Disable your NAT adapter, or any other adapters, if they are enabled. Then, with an empty adapter spot click Enable Network Adapter. Attach it to ‘Host-only Adapter’, then select any of the available options for its Name. Leave the Advanced Settings as they are:

Host-only network settings

Troubleshooting – What if there is no option for a Host-only Adapter after the Name field?

In this case, you need to edit your VirtualBox Manager settings. Close out of the Hortonworks settings and go to VirtualBox > Preferences > Network. Select the Host-only Networks tab, then click the plus icon:

Adding a new host-only network

A new Host-only network should now be available for you to select for your Hortonworks sandbox. Make sure you go back into the settings for your Hortonworks sandbox and switch your VM’s network adapter to your new Host-only adapter. When you have done that, move on to Step 3.

Step 3 – Add hortonworks.hbase.vm to your hosts file

If you haven’t already updated your local machine’s hosts file with the ip address of your sandbox, we’re going to do that in this step. If you have already done that, skip ahead to step 4.

To add hortonworks.hbase.vm to your hosts file, open up a terminal window. On a Mac, type this code:

sudo nano /etc/hosts

You’ll be asked to enter your password, then nano (a text editor) will open. Add this line at the end of your hosts file:

192.168.56.101 hortonworks.hbase.vm

This is what my hosts file currently looks like:

Hosts file

Some important notes

  • The ip address you enter may be different than mine. To check it, type this code in your Hortonworks sandbox:
    ifconfig
  • You can call your sandbox whatever you like in the hosts file. I chose to go with hortonworks.hbase.vm, but you can call it sandbox.hortonworks.com if you like. Just remember what you called it because we’ll be using that name later.

Step 4 – Add Phoenix driver to SQuirreL

Open up SQuirrel, click the Drivers tab on the left side of the window, and click the plus button to create a new driver. Enter this information into the driver creation window:

  • Name: Phoenix
  • Example URL: jdbc:phoenix:hortonworks.hbase.vm:2181:/hbase-unsecure
  • Website URL: [Blank. Do not write anything here]
  • Class Name: org.apache.phoenix.jdbc.PhoenixDriver

It should look like this:

Phoenix Driver Settings

Click OK. You should get a message which reads “Driver class org.apache.phoenix.jdbc.PhoenixDriver successfully registered for driver definition: Phoenix”

Step 4.5 – Ensure your sandbox and HBase are running

Before continuing, make sure both your Hortonworks sandbox and HBase are running. Recall that you can check if HBase is running in Ambari at your-sandbox-ip:8080. If you’re having trouble starting HBase or getting warnings that don’t disappear after a minute or two, check out the ‘Starting HBase on the sandbox’ section of this guide.
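
If you want to rule out Phoenix itself as the culprit, you can also connect from inside the sandbox using the sqlline client that ships with Phoenix (the path below assumes the same 2.3 sandbox layout used earlier):

/usr/hdp/2.3.0.0-2557/phoenix/bin/sqlline.py localhost:2181:/hbase-unsecure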

Step 5 – Create an Alias

Switch to the Aliases tab and click the plus button to create a new alias. Enter this information in the alias creation window:

  • Name – Alias name (ex: hortonworksSandbox, whatever you want)
  • Driver – Phoenix
  • URL – This should be auto-populated when you select your driver with jdbc:phoenix:hortonworks.hbase.vm:2181:/hbase-unsecure
  • User Name – Whatever you like (ex: admin)
  • Password – Whatever you like (ex: admin)

It should look like this:

Phoenix Alias

Once you’ve filled out the above information, click Test then select Connect. A box should pop up which says “Connection successful”. Click OK then OK again to create the alias.

Step 6 – Connect

We’re almost done! Double click on your newly created alias and click Connect. You should now be successfully connected and able to run SQL queries.

For a very short guide to using SQuirreL, check out this link.