How to Start Accumulo on the Hortonworks Sandbox

The Hortonworks sandbox is a great virtual environment for learning about technologies in the Hadoop ecosystem. It comes bundled with the ability to start and stop services such as MapReduce, Hive, HBase, Kafka, Spark and more in just a few clicks.

Unfortunately, installing and starting Accumulo on the Hortonworks sandbox is a little trickier than that. Luckily for you, all you have to do is follow the steps in this guide and you’ll be up and running Accumulo in no time.

Setting up

Before we get started, make sure your Hortonworks sandbox has the proper network adapter settings.

While your sandbox is powered off, navigate to the settings of your Hortonworks sandbox by clicking Machine > Settings while ‘Hortonworks Sandbox with HDP 2.3_1’ is highlighted. Navigate to Network once in the settings and ensure that your NAT network adapter is turned on, and that it is the only network adapter turned on:

Enable NAT Adapter

The default settings should be fine.

Once you’ve enabled the correct network adapter, start up your virtual machine.

Installing Accumulo

Since Accumulo doesn’t come bundled with the Hortonworks sandbox you’ll have to install it. Run this code:

yum install accumulo

It will install Accumulo under a directory similar to /usr/hdp/

If the install code fails, run this command, then try again:

sudo sed -i "s/mirrorlist=https/mirrorlist=http/" /etc/yum.repos.d/epel.repo

Switch to Host-only adapter

Before we go any further, switch to your host-only adapter on your Hortonworks virtual machine. This sets things up properly in case you later decide to connect to your Accumulo instance remotely.

First, power down your VM.

Navigate to your Hortonworks virtual machine settings: click the Hortonworks sandbox in the VirtualBox Manager, then click Machine > Settings. Navigate to Network and ensure the NAT network adapter is unchecked.

Ensure NAT is turned off

Next, turn on your Host-only network adapter:

Host-only network adapter

The default settings should be fine. If you were able to follow these steps, move on to Copy a Configuration Example.

If you don’t see an option for a Host-only network adapter after Name, navigate to VirtualBox > Preferences > Network > Host-only Networks:

Adding a new host-only network

Click the plus sign to add a new Host-only network. Then go back and ensure that your Hortonworks virtual machine settings (Machine > Settings > Network) are correct: NAT should be unchecked, and a new network adapter should be enabled with Host-only and vboxnet0 selected. Refer to the screenshots/steps above if you’re still lost, or reach out to me to ask.

Copy a configuration example

Next, you’ll want to copy the files from one of the configuration examples provided by Accumulo to Accumulo’s config directory:

cp /usr/hdp/* /usr/hdp/

You can choose different sizes ranging from 512MB to 3GB based on your available memory. I chose 1GB since my sandbox system is on the smaller side.


Next, open the copied environment settings file (the one that sets JAVA_HOME) in a text editor. I’ll use vi:

vi /usr/hdp/

Press i to enter insert/edit mode in vi.

Edit the line about your JAVA_HOME to be:

test -z "$JAVA_HOME" && export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64

The above code is most likely sandwiched inside an if … else statement. Don’t modify the if … else statement itself; just modify the line that looks like the line I gave you. The important thing that’s changed in the line I gave you is the path. Everything else should be as-is.
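
To see why only the path matters, here is a minimal sketch of how the test -z … && export … idiom behaves: the export runs only when the variable is still empty. DEMO_HOME and both paths are hypothetical stand-ins, not settings from the sandbox.

```shell
#!/bin/sh
# Start from a clean slate so the first line below has something to do.
unset DEMO_HOME

# DEMO_HOME is empty, so test -z succeeds and the export runs.
test -z "$DEMO_HOME" && export DEMO_HOME=/opt/first

# DEMO_HOME now has a value, so test -z fails and this export is skipped.
test -z "$DEMO_HOME" && export DEMO_HOME=/opt/second

echo "$DEMO_HOME"
```

In other words, a value that is already exported in your environment takes precedence over the path written in the file.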

Next edit ZOOKEEPER_HOME to be:

test -z "$ZOOKEEPER_HOME" && export ZOOKEEPER_HOME=/usr/hdp/

Then, find the line with HADOOP_PREFIX and change it to:

test -z "$HADOOP_PREFIX" && export HADOOP_PREFIX=/usr/hdp/

Finally, uncomment this line:


Press the escape key, then type ‘:wq’ to save and exit vi.

Edit accumulo-site.xml

Open up accumulo-site.xml in vi:

vi /usr/hdp/

Press i to enter edit/insert mode.

Change instance.secret property’s value to be hadoop:

  A secret unique to a given instance that all servers must know in order to communicate with one another. Change it before initialization. To change it later use ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new [newpasswd], and then update this file.

Then scroll down and change to hadoop:

Press escape then ‘:wq’ to save and exit

Edit gc, masters, monitor, slaves, and tracers files

To ensure you are able to use your Accumulo instance with a client program (Java) you need to replace ‘localhost’ in all of the following files with your sandbox’s IP address:

  • gc
  • masters
  • monitor
  • slaves
  • tracers

The files are located in your accumulo conf folder:

cd /usr/hdp/
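
If you would rather not edit the five files by hand, the replacement can be sketched as a small loop. This is only an illustration under stated assumptions: the function name replace_localhost, the temporary demo directory, and the IP address 192.168.56.101 are all hypothetical, so substitute your own conf folder and IP address.

```shell
#!/bin/sh
# replace_localhost CONF_DIR IP
# Rewrites every 'localhost' to IP in the gc, masters, monitor, slaves and tracers files.
replace_localhost() {
    conf_dir=$1
    ip=$2
    for name in gc masters monitor slaves tracers; do
        # Skip files that are missing rather than failing on them.
        [ -f "$conf_dir/$name" ] || continue
        # Write to a temporary file first, then swap it into place.
        sed "s/localhost/$ip/g" "$conf_dir/$name" > "$conf_dir/$name.tmp" &&
            mv "$conf_dir/$name.tmp" "$conf_dir/$name"
    done
}

# Demo on throwaway copies so nothing real is touched.
demo_dir=$(mktemp -d)
for name in gc masters monitor slaves tracers; do
    echo localhost > "$demo_dir/$name"
done
replace_localhost "$demo_dir" 192.168.56.101
cat "$demo_dir/masters"
```

Trying it on a scratch directory first, as the demo does, makes it easy to check the result before pointing it at the real conf folder.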

Recall that you can figure out what your ip address is by typing this in your terminal:


Edit Accumulo User Properties

Now you need to change the accumulo user properties. Edit your password file:

vi /etc/passwd

Press i to edit. Scroll all the way to the bottom and edit the accumulo line to read:


Don’t worry if the third entry isn’t 496. The important thing is to change the 4th entry to 501 and the 6th entry to /home/accumulo. Press escape then ‘:wq’ to save and exit.
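
Entries in /etc/passwd are colon-separated, so the 4th entry is the group ID and the 6th is the home directory. As a rough sketch of the same edit done with awk, the snippet below changes only those two fields. The sample line is hypothetical and your real entry will have different values.

```shell
#!/bin/sh
# fix_accumulo_entry: reads passwd-style lines on stdin and, for the accumulo
# user only, sets field 4 (group ID) to 501 and field 6 (home) to /home/accumulo.
fix_accumulo_entry() {
    awk -F: 'BEGIN { OFS = ":" }
             $1 == "accumulo" { $4 = 501; $6 = "/home/accumulo" }
             { print }'
}

# Hypothetical sample entry, used here instead of the real /etc/passwd.
sample='accumulo:x:496:496:Accumulo:/var/lib/accumulo:/bin/bash'
result=$(printf '%s\n' "$sample" | fix_accumulo_entry)
echo "$result"
```

The sketch only prints the modified line; it does not write anything back to /etc/passwd.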

Create a home directory for accumulo

The next step is to create a home directory for the Accumulo data on both your local filesystem and the Hadoop filesystem. Think of this like a development directory. Create it using this code:

mkdir -p /home/accumulo/data


hadoop fs -mkdir -p /home/accumulo/data

Change the ownership:

chown -R accumulo:hadoop /home/accumulo/data
sudo -u hdfs hadoop fs -chown -R accumulo:hadoop /home/accumulo/data

Initialize Accumulo

Now that you have everything set up, it’s time to initialize Accumulo. Run the following lines of code in your Hortonworks sandbox:

su - accumulo
. /usr/hdp/
accumulo init

Once you run accumulo init a few messages will come up on your screen, followed by a message asking you to give accumulo an instance name. I kept it simple and chose accumulo-instance as mine, but choose whatever you like.

Next, enter the password from earlier: hadoop

Change file permissions of Accumulo folder on HDFS

In order for Accumulo to actually be able to run, we need to change the file permissions of the Accumulo folder. To do that, switch away from the accumulo user by typing ‘su’ into your Hortonworks sandbox terminal.

You will be prompted for a password. Type hadoop.

Now run this code:

sudo -u hdfs hadoop fs -chmod 777 /accumulo

Run Accumulo

If you followed all the above steps, you should now be ready to run Accumulo. Enter this code into your terminal to run the start-all shell script:


You’re done! Accumulo should now be successfully running on your VM. To check, go to this web address:

If you notice that your instance name is null the first time you load that page, simply reload the page. It should then display properly.

If you just wanted to get Accumulo up and running, congratulations! You’ve successfully completed this guide.

As a bonus, here’s how to stop Accumulo.

Stopping Accumulo

Use this code to stop Accumulo:


Starting Accumulo Back Up

Want to start Accumulo back up?


Hope you enjoyed reading, and, as always, feel free to reach out with questions. In my next guide I’ll show you how to connect to Accumulo remotely.

The simple, quick start guide to using SQuirreL

SQuirreL, if you haven’t heard of it yet, is a graphical Java program (GUI) that lets you see the structure of your Phoenix database, browse the data in your tables, and run SQL queries over the data.

In the last big data guide, I helped you set up SQuirreL with Phoenix. If you’re reading this, I’ll assume you’re using Hortonworks sandbox and already have SQuirreL connected to Phoenix. In this guide, we’re going to delve into two SQuirreL basics that will quickly let you start viewing and transforming your data: viewing table information and running queries.

Viewing basic table information

To see information about your table such as the row count, columns, primary key, and indexes, you’ll want to be in the Objects view:


Select Table > YourTableName and you’ll be able to access some stats about your table.

Pretty simple right?

Let’s get to the good stuff: running SQL queries.

Running SQL Queries

SQuirreL allows you to run single SQL queries or batches of SQL queries. To do so, select the SQL tab, enter the queries you’d like to run, and click the icon that looks like a running person:


SQuirreL will run your queries and send the output to the screen, along with the time they took to run. There are also options to store the results of your SQL queries in a table or file:

Store results in table or file

Knowing how to do just these two things will allow you to run most of the SQL queries you might want. If you’re going for something more complex, you might find some of SQuirreL’s other features useful. Similarly, you might want to read up more about Phoenix and its features to understand how to optimize your queries and schemas.

How to Integrate SQuirreL with Phoenix

If you’re looking to use a client GUI, or graphical user interface, to interact with Phoenix, you might want to give SQuirreL a try. SQuirreL is a graphical Java program that lets you see the structure of a database, browse the data in tables, and perform SQL queries.

Installing SQuirreL

The first thing you’ll want to do is install SQuirreL. To do this, go here and select the appropriate download for your operating system.

Once you have downloaded the file, open a new terminal window. Navigate to wherever you downloaded the file to, and run this code:

java -jar squirrel-sql-3.6-MACOSX-install.jar

You may have to modify that code slightly based on the name of the file you downloaded (version and operating system).

Once the installer dialog pops up, follow the instructions to install SQuirreL onto your system. You can choose to select optional installs if you like. For now, I chose to just do the base and standard install:

Installing SQuirrelSQL

SQuirrelSQL is now installed on your local machine, but we’re not done yet. We still need to set it up so we can use it with Phoenix.

Configuring SQuirrelSQL for Phoenix

Before we get started, make sure you have your Hortonworks sandbox VM up and running. You should have already followed this guide to set up Phoenix on your sandbox.

Step 1 – Move Phoenix client jar to SQuirreL lib directory

In your VM, navigate to the folder where Phoenix is installed. For me that folder is /usr/hdp/ You need to copy the Phoenix client jar from that folder to your local machine. If you have a shared folder set up between your VM and your computer, run code similar to this to copy it to your local machine:

cp /usr/hdp/ /media/sf_hdp_shared_folder/

Once you’ve got the file on your local machine, add it to SQuirrelSQL’s lib directory. On a Mac, you do this by navigating to Applications then right clicking on SQuirrelSQL and clicking ‘Show Package Contents’:

Finding SQuirreL's lib folder

From there, navigate to Contents > Resources > Java > lib. Copy the phoenix client jar to that directory:

Adding the phoenix jar to SQuirreL's lib folder

Step 2 – Switch your VM’s network adapter to Host-only

To make things easier for us, we’re going to switch the VM’s network adapter to Host-only. This gets rid of some bugs that can pop up if you try to connect to the VM when it’s using a NAT adapter. If your VM’s network adapter is already in Host-only mode, skip ahead to step 3.

To switch to Host-only mode, first power off your Hortonworks sandbox. Open up the VirtualBox Manager, click on your Hortonworks Sandbox, and select Settings > Network. Disable your NAT adapter, or any other adapters, if they are enabled. Then, with an empty adapter spot click Enable Network Adapter. Attach it to ‘Host-only Adapter’, then select any of the available options for its Name. Leave the Advanced Settings as they are:

Host-only network settings

Troubleshooting – What if there is no option for a Host-only Adapter after the Name field?

In this case, you need to edit your VirtualBox Manager settings. Close out of the Hortonworks settings and go to VirtualBox > Preferences > Network. Select the Host-only Networks tab, then click the plus icon:

Adding a new host-only network

A new Host-only network should now be available for you to select for your Hortonworks sandbox. Make sure you go back into the settings for your Hortonworks sandbox and switch your VM’s network adapter to your new Host-only adapter. When you have done that, move on to Step 3.

Step 3 – Add hortonworks.hbase.vm to your hosts file

If you haven’t already updated your local machine’s hosts file with the ip address of your sandbox, we’re going to do that in this step. If you have already done that, skip ahead to step 4.

To add hortonworks.hbase.vm to your hosts file, open up a terminal window. On a Mac, type this code:

sudo nano /etc/hosts

You’ll be asked to enter your password, then nano (a text editor) will open. Add this line at the end of your hosts file: hortonworks.hbase.vm

This is what my hosts file currently looks like:

Hosts file

Some important notes

  • The ip address you enter may be different than mine. To check it, type this code in your Hortonworks sandbox:
  • You can call your sandbox whatever you like in the hosts file. I chose to go with hortonworks.hbase.vm, but any name works. Just remember what you called it because we’ll be using that name later.
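
A hosts entry is simply an IP address followed by a name. As a hedged sketch, the helper below appends such an entry only if the name is not already listed, which keeps repeated runs from piling up duplicates. The function name add_host_entry, the scratch file, and the address 192.168.56.101 are assumptions for illustration; use your sandbox’s actual IP address.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME
# Appends "IP NAME" to FILE unless NAME already appears in it as a whole word.
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    # -F treats the name as plain text, so its dots are not regex wildcards.
    if grep -qwF -- "$name" "$file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Demo on a scratch file rather than the real /etc/hosts.
demo_hosts=$(mktemp)
echo '127.0.0.1 localhost' > "$demo_hosts"
add_host_entry "$demo_hosts" 192.168.56.101 hortonworks.hbase.vm
add_host_entry "$demo_hosts" 192.168.56.101 hortonworks.hbase.vm
cat "$demo_hosts"
```

Running it against the real /etc/hosts would need sudo, just like the nano command above.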

Step 4 – Add Phoenix driver to SQuirreL

Open up SQuirreL, click the Drivers tab on the left side of the window, and click the plus button to create a new driver. Enter this information into the driver creation window:

  • Name: Phoenix
  • Example URL: jdbc:phoenix:hortonworks.hbase.vm:2181:/hbase-unsecure
  • Website URL: [Blank. Do not write anything here]
  • Class Name: org.apache.phoenix.jdbc.PhoenixDriver

It should look like this:

Phoenix Driver Settings

Click OK. You should get a message which reads “Driver class org.apache.phoenix.jdbc.PhoenixDriver successfully registered for driver definition: Phoenix”
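
If you named your sandbox differently in your hosts file, the example URL needs to change to match. The URL is made of three parts after the jdbc:phoenix prefix: the ZooKeeper host, the ZooKeeper port, and the HBase root znode. A small sketch of how the pieces fit together (the function name phoenix_jdbc_url is hypothetical):

```shell
#!/bin/sh
# phoenix_jdbc_url HOST PORT ZNODE
# Assembles a connection string of the form jdbc:phoenix:HOST:PORT:ZNODE.
phoenix_jdbc_url() {
    printf 'jdbc:phoenix:%s:%s:%s\n' "$1" "$2" "$3"
}

# Same values as the driver settings above.
url=$(phoenix_jdbc_url hortonworks.hbase.vm 2181 /hbase-unsecure)
echo "$url"
```

Swapping the first argument for your own host name gives the URL to use in both the driver and the alias.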

Step 4.5 – Ensure your sandbox and HBase are running

Before continuing, make sure both your Hortonworks sandbox and HBase are running. Recall that you can check if HBase is running in Ambari at your-sandbox-ip:8080. If you’re having trouble starting HBase or getting warnings that don’t disappear after a minute or two, check out the ‘Starting HBase on the sandbox’ section of this guide.

Step 5 – Create an Alias

Switch to the Aliases tab and click the plus button to create a new alias. Enter this information in the alias creation window:

  • Name – Alias name (ex: hortonworksSandbox, whatever you want)
  • Driver – Phoenix
  • URL – This should be auto-populated when you select your driver with jdbc:phoenix:hortonworks.hbase.vm:2181:/hbase-unsecure
  • User Name – Whatever you like (ex: admin)
  • Password – Whatever you like (ex: admin)

It should look like this:

Phoenix Alias

Once you’ve filled out the above information, click Test then select Connect. A box should pop up which says “Connection successful”. Click OK then OK again to create the alias.

Step 6 – Connect

We’re almost done! Double click on your newly created alias and click Connect. You should now be successfully connected and able to run SQL queries.

For a very short guide to using SQuirreL, check out this link.

How to Integrate Apache Phoenix with HBase

If you’re looking to get started with using Apache Phoenix, the open source SQL skin for HBase, the first thing you’ll want to do is install it. This guide will show you how to do that on the Hortonworks virtual sandbox.

If you’re running your setup on a machine that isn’t the Hortonworks sandbox, the installation guide over on the Phoenix website should help. Hortonworks also has an installation guide for both unsecure and secure hadoop clusters. In this guide we’ll be setting up Phoenix on an unsecure cluster (sandbox).

What is Apache Phoenix?

Before we start, let’s talk briefly about what Phoenix is and what it can do for you. As previously mentioned, Phoenix is an open source SQL skin for HBase. This means that it takes your SQL queries and transforms them into a series of HBase scans. The transformations are all done under the hood. For the most part, you can run SQL queries over HBase as if you were merely using a relational database like MySQL or SQLite.

What are some use cases for Phoenix?

Phoenix can be used for a few different use cases:

  • At SiftScience, they use Phoenix for ad-hoc queries and exposing data insights.
  • At Alibaba, they use Phoenix for queries where there is a large dataset with a relatively small result (10,000 records or so), or for complicated queries over a large dataset with a large result (millions of records).
  • At eBay, they use Phoenix for Path or Flow analysis, as well as for real time data trends.

To see more use cases, go here.

Where to learn more

If you’re inclined to learn more about Phoenix before we get started, check out the FAQ, learn about which SQL statements are supported (a lot), or simply check out the project home page.

Installing Phoenix

Ready to get started? We’re going to be using the open-source, package management utility yum (Yellowdog Updater, Modified) to install Phoenix. To start the installation run:

yum install phoenix

Possible installation errors (and their fixes)

There’s a good chance that code will fail if you haven’t used yum before.

If it fails with the error message Couldn’t resolve host, the issue most likely stems from your network adapter. Check your network adapter settings: Machine > Settings > Network. You should have a network adapter enabled that is attached to NAT. Make sure no other network adapters are enabled. If you don’t have a NAT adapter enabled, power off your machine. Once it’s powered off you can return to the same Machine > Settings > Network menu to add or enable a NAT adapter. The default settings should be fine:

Enable NAT Adapter

If you receive the message Error: Cannot retrieve metalink for repository: epel, you will have to run this code in your VM:

sudo sed -i "s/mirrorlist=https/mirrorlist=http/" /etc/yum.repos.d/epel.repo

It will update the EPEL repository file so that yum fetches the mirror list over http instead of https.
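
To see what that substitution does before touching the real file, you can try it on a scratch copy. The sketch below uses a made-up repository entry (the mirror URL is a placeholder, not the real EPEL address) and prints the result instead of editing in place:

```shell
#!/bin/sh
# Build a throwaway file that imitates one entry of a yum .repo file.
demo_repo=$(mktemp)
cat > "$demo_repo" <<'EOF'
[epel]
name=Example repository entry
mirrorlist=https://mirrors.example.org/metalink?repo=epel-6&arch=x86_64
EOF

# Same substitution as above, without -i, so the result goes to stdout.
fixed=$(sed "s/mirrorlist=https/mirrorlist=http/" "$demo_repo")
echo "$fixed"
```

Only the mirrorlist line changes; the section header and the name line pass through untouched.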

Installing Phoenix with Yum

With the above fixes in place, you should be ready to install Phoenix with yum. Run this code:

yum install phoenix

Once the installation finishes, find your Phoenix core jar file. For me it was located at /usr/hdp/ Link the Phoenix core jar file to the HBase Master and Region servers. Here was the code I used to link it:

ln -sf /usr/hdp/ /usr/hdp/

Change the version numbers if you have different versions of hadoop (hdp) or phoenix.

Edit the hbase-site.xml

The next step is editing the hbase-site.xml. Run this:

vi /usr/hdp/

Again, change version numbers in that code as necessary. Now that you’re in vi, a Linux text editor, hit i to change from command mode to insert mode. Insert mode lets you make changes to the text in the file, while command mode lets you run actions on the file. Place this code between the two configuration tags:


The file should look like this when you’re done:

hbase-site.xml file

Notice that ‘:wq’ will allow you to save the file and exit

Save the file by pressing ‘ESC’ to change from Insert Mode to Command mode, then hit ‘:wq’ to save and quit.

Start HBase

If HBase isn’t running yet, you need to start it. Similarly, if HBase is already running, you need to restart it.

Log into Ambari in your browser (it runs on port 8080 of your sandbox’s IP address) with username/password admin/admin. If that doesn’t work, check which IP address to use by typing this code in your terminal:


Once you’re logged in, start HBase by clicking HBase on the left panel

Starting HBase in Ambari

then Service Actions > Start:

HBase tab in Ambari

Give it a minute or two to start if you get a red alert when it first starts up. If the alert persists, you may have to stop another service to free up memory on your sandbox. I chose to stop MapReduce2 for now. You can always enable it later.

Phoenix should now be installed and ready for use.

Testing your new Phoenix installation

To test your new Phoenix installation, navigate to phoenix’s bin folder:

cd /usr/hdp/

Let’s run the program:

python localhost:2181:/hbase-unsecure

It may take a minute or two to start up. If it hangs for too long go check Ambari to make sure HBase is still running.

Once the program starts, enter these commands:

create table test (mykey integer not null primary key, mycolumn varchar);
upsert into test values (1,'Hello');
upsert into test values (2,'World!');
select * from test;

The first command creates a table called test with an integer (numeric) key and a varchar (text) column. The next two commands insert rows into the table. The final command selects all rows from the table and prints them to the screen:

Phoenix results

That’s it for now! You’ve successfully integrated Apache Phoenix with HBase and used it to create a simple table. If you’d like to use a GUI to interact with Phoenix, go check out this guide. To dive deeper into Phoenix, check out the quick start guide, or the FAQ. And as always, if you have any questions feel free to reach out to me.

HBase Development in Java

If you want to learn how to do some simple HBase operations in Java, this guide is for you.

Some important notes before we get started:

  • I’ll be using HBase’s 2.0.0 API. There are some differences between this API and other, older versions of the API. At the time of this writing, not many tutorials or guides exist for the 2.0 version of the API. So if you run into issues trying to get something to work, looking at the API docs is probably your best bet.
  • To test the code you’ll be developing, you’ll want to use the Hortonworks Sandbox.
  • If you haven’t configured Eclipse for big data development, give this guide a look. It will cover how to set up your IDE with Maven, connect to Github, and set up a shared folder between your computer and the Hortonworks sandbox.

Ready to begin?

Starting HBase on the sandbox

The first thing you’ll want to do is start HBase on your sandbox. If you haven’t already, fire up your virtual machine and point your browser to Ambari on port 8080 of your sandbox. If that doesn’t work, check your sandbox’s IP address using this code in your sandbox:


Log in to Ambari using the username ‘admin’ and password ‘admin’.

Once you reach your dashboard, click on HBase. Then click Service Actions > Start.

If your virtual machine doesn’t have enough RAM available, you might have to stop some other running services to ensure that all the HBase servers are able to start. In my case, I stopped MapReduce2. You can always go back and enable it again later. Also, if you see any alerts when MapReduce2 is turned off, you can click Service Actions > Turn On Maintenance Mode.

Creating a table and inserting data into it

Now it’s time to start developing in Eclipse.

Open Eclipse and create a new Java project. If you don’t have Maven set up yet, give this guide a look. Once you’ve created your Java project, right-click it and hit Configure > Convert to Maven Project.

If you read the code here, check out the comments, and add the necessary jars to your pom.xml file (hbase-client 1.1.1) it should be fairly simple to get the code running. You’ll have to uncomment a line in the main method in order to insert records. Export it like you did in the MapReduce tutorial, upload it to your VM, and run it using this code:

yarn jar BasicTableTransactions.jar BasicTableTransactions

Don’t forget to change the permissions of your uploaded jar before trying to run it:

chmod 777 BasicTableTransactions.jar

The code will create a table for you and insert a few records into it.

You can check that the code works by opening up your Hortonworks sandbox and typing:

hbase shell

Once the shell starts, you can view your table, count the number of records in it, and perform scans over the data:

describe 'business_data'
count 'business_data'
scan 'business_data',{COLUMNS => 'artist_data'}
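
If you expect to repeat these checks, one option is to keep them in a script file rather than retyping them. The sketch below only writes the commands to a temporary file (the variable name check_script is a hypothetical choice) and adds an exit so the shell would close once it finishes:

```shell
#!/bin/sh
# Save the three checks, plus an exit, to a throwaway script file.
check_script=$(mktemp)
cat > "$check_script" <<'EOF'
describe 'business_data'
count 'business_data'
scan 'business_data',{COLUMNS => 'artist_data'}
exit
EOF

# Show what was written.
cat "$check_script"
```

Passing that file’s path to hbase shell as an argument should then run the commands in order on the sandbox.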

For further clarification about the code on Github or the code mentioned above, feel free to reach out to me.

Schema design

If you want to dip your feet into creating well designed schemas in HBase, give this article a look. It’s a good introduction that covers some important design considerations. For further reading, try HBase’s suggestions in the schema section of their reference guide.

Remotely connecting to HBase

Being able to debug your code is a very important part of software development. One of the issues I ran into with testing my HBase code was a lack of clear direction as to where the log files were stored. When testing my code I tried to print logs to ensure that the program was running as I expected, but I was unable to figure out where the log files were being printed. From trying to find the job in the yarn application master (port 16010), which it seems HBase jobs don’t show up in, to reading through all the logs I could get my hands on, my log statements just weren’t showing up. If you know how or where to access the logs Java sends, please let me know and I’ll update this guide accordingly.

Back to connecting to HBase remotely.

Follow this guide. It includes source code and instructions on setting up your sandbox correctly. I skipped step 4 and started HBase via Ambari instead (port 8080). Once you switch to a host-only network adapter, the IP address you’ll use to log in will likely change. To check what it is, use this code in your Hortonworks sandbox VM:


In step 5, the author is talking about the hosts file of your local machine, not your sandbox.

Once you’re done with the guide, run your code. You should be able to successfully connect to HBase.

Customizing log events / print statements

It’s nice to use System.out.println statements to try to track what’s going on in your code, but it’s better to use log events. Log events will let you track what’s going on at a much more granular level than simple print statements would.

Ready to make the switch to log events?

The first thing you need to do is find your log4j configuration file. For me, this was located at

/Users/myname/Development/Hadoop/hadoop-2.7.1/share/hadoop/tools/sls/sample-conf/

If that’s too long and complicated for you, you could just make your own and place it wherever you like. Here’s what is in my file:

# Define the root logger with appender X:

log4j.rootLogger = INFO, consoleAppender

## Set the appender named X to be a console appender:
log4j.appender.consoleAppender = org.apache.log4j.ConsoleAppender

# Define the layout for console Appender appender
log4j.appender.consoleAppender.layout = org.apache.log4j.PatternLayout
log4j.appender.consoleAppender.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

Notice that I set the level of the logger to be INFO. This allows the logger to ignore DEBUG events instead of printing them to screen. If you would like to see debug events, simply change INFO to DEBUG.
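
If you flip between INFO and DEBUG often, the change can be scripted. The following is a minimal sketch, assuming the rootLogger line is formatted like the one above. The function name set_root_level and the scratch file are hypothetical.

```shell
#!/bin/sh
# set_root_level FILE LEVEL
# Rewrites the level on the log4j.rootLogger line and keeps the appender
# list that follows the comma.
set_root_level() {
    file=$1
    level=$2
    sed "s/^\(log4j\.rootLogger *= *\)[A-Z]*/\1$level/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Demo on a scratch file holding just the rootLogger line.
demo_props=$(mktemp)
echo 'log4j.rootLogger = INFO, consoleAppender' > "$demo_props"
set_root_level "$demo_props" DEBUG
cat "$demo_props"
```

Calling it again with INFO switches the logger back.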

Now that you have your file set up, it’s time to use it. Create a new logger using this code:

private static final Log LOG = LogFactory.getLog(BasicTableTransactions.class);

The above code should be put within your class, outside of any methods. Now, put this code in your main method:

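//configure log4j so it can run
Properties props = new Properties();
props.load(new FileInputStream("/Users/myname/Development/Hadoop/hadoop-2.7.1/share/hadoop/tools/sls/sample-conf/"));
//hand the loaded settings to log4j (org.apache.log4j.PropertyConfigurator)
PropertyConfigurator.configure(props);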

Replace the path I have up there with whatever path you have your file in.

We’re almost done. Now all you have to do is log events. To do this, place this code wherever you want to log something:

LOG.info("Here's a log statement I want to send to the console");

If you want to change the level of the log statement (more on logging levels and their hierarchy here), change ‘info’ to whichever level you prefer. You can see how my code is using logging here and here.

Deleting all records from an HBase table

Say you ran the code above, accidentally inserted a ton of near-duplicate rows, and want to restart with an empty table. To do that, log onto your Hortonworks sandbox’s shell and type the following to launch the HBase shell:

hbase shell

Now, run this code:

truncate 'business_data'

The above code will remove your table and create a new one with the same settings. If you want to remove your table altogether, use:

disable 'business_data'
drop 'business_data'

That’s it for the guide for now. Stay tuned for updates.