If you’re looking to get started with Apache Phoenix, the open source SQL skin for HBase, the first thing you’ll want to do is install it. This guide will show you how to do that on the Hortonworks virtual sandbox.
If you’re running your setup on a machine that isn’t the Hortonworks sandbox, the installation guide over on the Phoenix website should help. Hortonworks also has an installation guide for both unsecure and secure Hadoop clusters. In this guide we’ll be setting up Phoenix on an unsecure cluster (the sandbox).
What is Apache Phoenix?
Before we start, let’s talk briefly about what Phoenix is and what it can do for you. As previously mentioned, Phoenix is an open source SQL skin for HBase. This means that it takes your SQL queries and transforms them into a series of HBase scans. The transformations are all done under the hood. For the most part, you can run SQL queries over HBase as if you were merely using a relational database like MySQL or SQLite.
What are some use cases for Phoenix?
Phoenix can be used for a few different use cases:
- At Sift Science, they use Phoenix for ad-hoc queries and exposing data insights.
- At Alibaba, they use Phoenix for queries over a large dataset with a relatively small result (10,000 records or so), or for complicated queries over a large dataset with a large result (millions of records).
- At eBay, they use Phoenix for path or flow analysis, as well as for real-time data trends.
To see more use cases, go here.
Where to learn more
Ready to get started? We’re going to use yum (Yellowdog Updater, Modified), an open-source package management utility, to install Phoenix. To start the installation, run:
yum install phoenix
Possible installation errors (and their fixes)
There’s a good chance that command will fail if you haven’t used yum on this machine before.
If it fails with the error message Couldn’t resolve host mirrorlist.centos.org, the issue most likely stems from your network adapter. Check your network adapter settings: Machine > Settings > Network. You should have a network adapter enabled that is attached to NAT. Make sure no other network adapters are enabled. If you don’t have a NAT adapter enabled, power off your machine. Once it’s powered off you can return to the same Machine > Settings > Network menu to add or enable a NAT adapter. The default settings should be fine.
If you receive the message Error: Cannot retrieve metalink for repository: epel, run this command in your VM:
sudo sed -i "s/mirrorlist=https/mirrorlist=http/" /etc/yum.repos.d/epel.repo
This updates the EPEL repository configuration so that yum fetches the mirror list over http instead of https.
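If you’d like to see what that substitution does before pointing it at the real file, here is a minimal sketch that runs the same sed expression against a throwaway file. The sample mirrorlist line is only an assumption about what a typical epel.repo entry looks like; the one on your sandbox may differ.

```shell
# Write an assumed, illustrative mirrorlist line to a scratch file.
scratch=$(mktemp)
printf '%s\n' 'mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-6&arch=$basearch' > "$scratch"

# Same expression as the real command, but without -i, so the file is left
# alone and the rewritten line is captured instead.
fixed=$(sed "s/mirrorlist=https/mirrorlist=http/" "$scratch")
echo "$fixed"

rm -f "$scratch"
```

Only the scheme at the start of the mirrorlist value changes; the rest of the line is left as it was.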
Installing Phoenix with Yum
With the above fixes in place, you should be ready to install Phoenix with yum. Run this code:
yum install phoenix
Once the installation finishes, find your Phoenix core jar file. For me it was located at /usr/hdp/18.104.22.168-2557/phoenix/lib/phoenix-core-22.214.171.124.3.0.0-2557.jar. Link the Phoenix core jar file to the HBase Master and Region servers. Here was the code I used to link it:
ln -sf /usr/hdp/126.96.36.199-2557/phoenix/lib/phoenix-core-188.8.131.52.3.0.0-2557.jar /usr/hdp/184.108.40.206-2557/hbase/lib/phoenix.jar
Change the version numbers if you have a different version of HDP (the Hortonworks Data Platform) or Phoenix.
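If you aren’t sure which version numbers apply to your sandbox, you can search for the jar instead of typing the path by hand. The sketch below is one way to do that; the function name find_phoenix_core_jar is made up for this example, and the exclusion of -tests and -sources jars is an assumption about what else might be sitting in the same folder.

```shell
# Print the first phoenix-core jar found under the given root (default: /usr/hdp).
# The function name is hypothetical; skipping *-tests.jar and *-sources.jar
# assumes such companion jars may be present next to the core jar.
find_phoenix_core_jar() {
    find "${1:-/usr/hdp}" -type f -name 'phoenix-core-*.jar' \
        ! -name '*-tests.jar' ! -name '*-sources.jar' 2>/dev/null \
        | sort | head -n 1
}
```

Running find_phoenix_core_jar with no arguments prints the path to use as the first argument of the ln -sf command above. If more than one HDP version is installed, it simply prints the first match in sorted order, so double-check that it is the one you want.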
Edit the hbase-site.xml
The next step is editing the hbase-site.xml. Run this:
Again, change version numbers in that code as necessary. Now that you’re in vi, a Linux text editor, hit i to change from command mode to insert mode. Insert mode lets you make changes to the text in the file, while command mode lets you run commands on the file, such as saving or quitting. Place this code between the opening and closing configuration tags:
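<property>
  <name>hbase.defaults.for.version.skip</name>
  <value>true</value>
</property>
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>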
The file should look like this when you’re done:
Save the file by pressing Esc to change from insert mode back to command mode, then type :wq and press Enter to save and quit.
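After saving, it can be worth confirming that both property names actually made it into the file. Here is a small sketch of such a check; the function name has_hbase_property is hypothetical, and it only looks for the name element, not the value.

```shell
# Succeeds (exit status 0) if FILE contains a <name>PROPERTY</name> element.
# Usage: has_hbase_property FILE PROPERTY
# grep -F treats the dotted property name as a literal string, not a regex.
has_hbase_property() {
    grep -qF "<name>$2</name>" "$1"
}
```

Call it once per property with the path to the hbase-site.xml you just edited, for example with hbase.regionserver.wal.codec as the second argument, and check the exit status or chain it with && echo found.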
If HBase isn’t running yet, you need to start it. If it’s already running, you need to restart it so that the changes take effect.
Log into Ambari in your browser at 127.0.0.1:8080 with username/password admin/admin. If that doesn’t work, check which IP address to use by typing this code in your terminal:
Once you’re logged in, start HBase by clicking HBase in the left panel, then Service Actions > Start.
Give it a minute or two to start if you get a red alert when it first starts up. If the alert persists, you may have to stop another service to free up memory on your sandbox. I chose to stop MapReduce2 for now. You can always enable it later.
Phoenix should now be installed and ready for use.
Testing your new Phoenix installation
To test your new Phoenix installation, navigate to Phoenix’s bin folder:
Let’s run the sqlline.py program:
python sqlline.py localhost:2181:/hbase-unsecure
It may take a minute or two to start up. If it hangs for too long, check Ambari to make sure HBase is still running.
Once the program starts, enter these commands:
create table test (mykey integer not null primary key, mycolumn varchar);
upsert into test values (1,'Hello');
upsert into test values (2,'World!');
select * from test;
The first command creates a table called test with an integer (numeric) key and a varchar (text) column. The next two commands insert rows into the table. The final command selects all rows from the table and prints them to the screen.
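If you’d rather not type the statements interactively, sqlline.py can also be given a SQL file as a second argument. The sketch below just writes the same four statements to a file; the file name phoenix_smoke_test.sql is made up for this example, and the sqlline.py call is left as a comment because it needs a running HBase.

```shell
# Hypothetical file name; the .sql extension marks it as a script file.
sql_file="${TMPDIR:-/tmp}/phoenix_smoke_test.sql"

# Quoted heredoc delimiter so the shell writes the statements verbatim.
cat > "$sql_file" <<'EOF'
create table test (mykey integer not null primary key, mycolumn varchar);
upsert into test values (1,'Hello');
upsert into test values (2,'World!');
select * from test;
EOF

# From Phoenix's bin folder, pass the file after the connection string:
# python sqlline.py localhost:2181:/hbase-unsecure "$sql_file"
```

Note that running the file a second time would hit the create table statement again, so treat this as a one-off smoke test.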
That’s it for now! You’ve successfully integrated Apache Phoenix with HBase and used it to create a simple table. If you’d like to use a GUI to interact with Phoenix, go check out this guide. To dive deeper into Phoenix, check out the quick start guide, or the FAQ. And as always, if you have any questions feel free to reach out to me.