How to Start Accumulo on the Hortonworks Sandbox

The Hortonworks sandbox is a great virtual environment for learning about technologies in the Hadoop ecosystem. It comes bundled with the ability to start and stop services such as MapReduce, Hive, HBase, Kafka, Spark and more in just a few clicks.

Unfortunately, installing and starting Accumulo on the Hortonworks sandbox is a little trickier than that. Luckily for you, all you have to do is follow the steps in this guide and you'll have Accumulo up and running in no time.

Setting up

Before we get started, make sure your Hortonworks sandbox has the proper network adapter settings.

While your sandbox is powered off, open its settings by clicking Machine > Settings while 'Hortonworks Sandbox with HDP 2.3_1' is highlighted. Once in the settings, navigate to Network and ensure that your NAT network adapter is turned on, and that it is the only network adapter turned on:

[Screenshot: Enable NAT adapter]

The default settings should be fine.
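If you prefer the command line, you can make the same change from your host machine with VBoxManage while the VM is powered off. This is just a sketch; I'm assuming the VM is named exactly as it appears in the VirtualBox manager:

# Assumes the VM name matches the title shown in the VirtualBox manager
VBoxManage modifyvm "Hortonworks Sandbox with HDP 2.3_1" --nic1 nat
# Disable the second adapter so NAT is the only one enabled
VBoxManage modifyvm "Hortonworks Sandbox with HDP 2.3_1" --nic2 none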

Once you’re enabled the correct network adapter, start up your virtual machine.

Installing Accumulo

Since Accumulo doesn’t come bundled with the Hortonworks sandbox you’ll have to install it. Run this code:

yum install accumulo

It will install Accumulo under a directory similar to /usr/hdp/2.3.0.0-2557/accumulo.

If the install code fails, run this command, then try again:

sudo sed -i "s/mirrorlist=https/mirrorlist=http/" /etc/yum.repos.d/epel.repo
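Once the install succeeds, you can confirm where Accumulo landed. The version string in the path may differ from the one I use throughout this guide:

ls -d /usr/hdp/*/accumulo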

Switch to Host-only adapter

Before we go any further, switch to the host-only adapter on your Hortonworks virtual machine. This will let you set things up properly in case you later decide you want to connect to your Accumulo instance remotely.

First, power down your VM.

Navigate to your Hortonworks virtual machine settings: click the Hortonworks sandbox in the VirtualBox manager, then click Machine > Settings. Navigate to Network and ensure the NAT network adapter is unchecked.

[Screenshot: Ensure NAT is turned off]

Next, turn on your Host-only network adapter:

[Screenshot: Host-only network adapter]

The default settings should be fine. If you were able to follow these steps, move on to Copy a Configuration Example.

If you don’t see an option for a Host-only network adapter after Name, navigate to VirtualBox > Preferences > Network > Host-only Networks:

[Screenshot: Adding a new host-only network]

Click the plus sign to add a new Host-only network. Then go back and ensure that your Hortonworks virtual machine settings (Machine > Settings > Network) are correct. That is, ensure NAT is unchecked, and create a new network adapter for which you select Host-only and vboxnet0. Refer to the screenshots/steps above if you're still lost, or reach out to me to ask.
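As before, the same changes can be scripted from the host with VBoxManage if you prefer (VM name assumed, and the VM must be powered off):

# Creates a host-only interface (vboxnet0) if you don't have one yet
VBoxManage hostonlyif create
# Turn off NAT and attach adapter 1 to the host-only network instead
VBoxManage modifyvm "Hortonworks Sandbox with HDP 2.3_1" --nic1 hostonly --hostonlyadapter1 vboxnet0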

Copy a configuration example

Next, you’ll want to copy the files from one of the configuration examples provided by Accumulo to Accumulo’s config directory:

cp /usr/hdp/2.3.0.0-2557/accumulo/conf/examples/1GB/standalone/* /usr/hdp/2.3.0.0-2557/accumulo/conf

You can choose different sizes ranging from 512MB to 3GB based on your available memory. I chose 1GB since my sandbox system is on the smaller side.
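Not sure which sizes are available on your install? List the examples directory:

ls /usr/hdp/2.3.0.0-2557/accumulo/conf/examples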

Edit accumulo-env.sh

Next, open accumulo-env.sh in a text editor. I’ll use vi:

vi /usr/hdp/2.3.0.0-2557/accumulo/conf/accumulo-env.sh

Press i to enter insert/edit mode in vi.

Edit the line about your JAVA_HOME to be:

test -z "$JAVA_HOME" && export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64

The line above is most likely sandwiched inside an if … else statement. Don't modify the if … else statement itself; just modify the line that matches the one I gave you. The important change in my version is the path. Everything else should stay as-is.

Next edit ZOOKEEPER_HOME to be:

test -z "$ZOOKEEPER_HOME" && export ZOOKEEPER_HOME=/usr/hdp/2.3.0.0-2557/zookeeper

Then, find the line with HADOOP_PREFIX and change it to:

test -z "$HADOOP_PREFIX" && export HADOOP_PREFIX=/usr/hdp/2.3.0.0-2557/hadoop

Finally, uncomment this line:

export ACCUMULO_MONITOR_BIND_ALL="true"

Press the escape key, then type ‘:wq’ to save and exit vi.
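If you'd rather script these edits than make them in vi, one approach is to append overriding exports to the end of accumulo-env.sh. This is just a sketch; it relies on the fact that the script runs top to bottom, so assignments appended at the end win over the guarded defaults above them:

# Append overriding exports; later assignments take precedence when the script is sourced
CONF=/usr/hdp/2.3.0.0-2557/accumulo/conf/accumulo-env.sh
cat >> "$CONF" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
export ZOOKEEPER_HOME=/usr/hdp/2.3.0.0-2557/zookeeper
export HADOOP_PREFIX=/usr/hdp/2.3.0.0-2557/hadoop
export ACCUMULO_MONITOR_BIND_ALL="true"
EOF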

Edit accumulo-site.xml

Open up accumulo-site.xml in vi:

vi /usr/hdp/2.3.0.0-2557/accumulo/conf/accumulo-site.xml

Press i to enter edit/insert mode.

Change the instance.secret property's value to hadoop:

  <property>
    <name>instance.secret</name>
    <value>hadoop</value>
    <description>A secret unique to a given instance that all servers must know in order to communicate with one another. Change it before initialization. To change it later use ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new [newpasswd], and then update this file.</description>
  </property>

Then scroll down and change trace.token.property.password to hadoop:

  <property>
    <name>trace.token.property.password</name>
    <value>hadoop</value>
  </property>

Press escape, then type ':wq' to save and exit.
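As a quick sanity check, this should print each property's name line followed by its value line:

grep -A 1 -E 'instance.secret|trace.token.property.password' /usr/hdp/2.3.0.0-2557/accumulo/conf/accumulo-site.xml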

Edit gc, masters, monitor, slaves, and tracers files

To ensure you are able to use your Accumulo instance with a client program (Java) you need to replace ‘localhost’ in all of the following files with your sandbox’s IP address:

  • gc
  • masters
  • monitor
  • slaves
  • tracers

The files are located in your accumulo conf folder:

cd /usr/hdp/2.3.0.0-2557/accumulo/conf

Recall that you can find out your IP address by typing this in your terminal:

ifconfig
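If you'd rather script the replacement than open each file by hand, a loop like this should do the trick (the IP below is just an example; substitute whatever ifconfig reported):

# Replace 'localhost' with your sandbox's IP in each host file
cd /usr/hdp/2.3.0.0-2557/accumulo/conf
for f in gc masters monitor slaves tracers; do
  sed -i 's/localhost/192.168.56.101/' "$f"
done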

Edit Accumulo User Properties

Now you need to change the accumulo user properties. Edit your password file:

vi /etc/passwd

Press i to edit. Scroll all the way to the bottom and edit the accumulo line to read:

accumulo:x:496:501:accumulo:/home/accumulo:/bin/bash

Don’t worry if the third entry isn’t 496. The important thing is to change the 4th entry to 501 and the 6th entry to /home/accumulo. Press escape then ‘:wq’ to save and exit.

Create a home directory for accumulo

The next step we’ll be doing is creating a home directory for the accumulo data to reside in on your local filesystem and hadoop filesystem. Think of this like a development directory. Create it using this code:

mkdir -p /home/accumulo/data
hadoop fs -mkdir -p /home/accumulo/data

Change the permissions:

chown -R accumulo:hadoop /home/accumulo/data
sudo -u hdfs hadoop fs -chown -R accumulo:hadoop /home/accumulo/data
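You can verify the HDFS directory picked up the right owner and group:

hadoop fs -ls /home/accumulo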

Initialize Accumulo

Now that you have everything set up, it’s time to initialize Accumulo. Run the following lines of code in your Hortonworks sandbox:

su - accumulo
. /usr/hdp/2.3.0.0-2557/accumulo/conf/accumulo-env.sh
accumulo init

Once you run accumulo init, a few messages will come up on your screen, followed by a prompt asking you to give Accumulo an instance name. I kept it simple and chose accumulo-instance as mine, but choose whatever you like.

Next, enter the password you set earlier in accumulo-site.xml: hadoop
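Depending on your Accumulo version, init may also accept the instance name and password as flags, which is handy if you want to script this step. A hedged example using the same values as above:

# May not be supported on every version; falls back to the interactive prompts if not
accumulo init --instance-name accumulo-instance --password hadoop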

Change file permissions of Accumulo folder on HDFS

In order for Accumulo to actually run, we need to change the file permissions of the Accumulo folder on HDFS. To do that, switch back to the root user by typing 'su' into your Hortonworks sandbox terminal.

You will be prompted for a password. Type hadoop.

Now run this code:

sudo -u hdfs hadoop fs -chmod 777 /accumulo
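To confirm the permissions took, list the root of HDFS and look for drwxrwxrwx next to /accumulo:

hadoop fs -ls /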

Run Accumulo

If you followed all the above steps, you should now be ready to run Accumulo. Enter this code into your terminal to run the start-all shell script:

/usr/hdp/2.3.0.0-2557/accumulo/bin/start-all.sh

You’re done! Accumulo should now be successfully running on your VM. To check, go to this web address: http://192.168.56.101:50095/.

If you notice that your instance name is null the first time you load that page, simply reload the page. It should then display properly.
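If the page still won't load at all, first make sure the Accumulo processes are actually running before debugging the network side:

ps aux | grep -i accumulo | grep -v grep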

If you just wanted to get Accumulo up and running, congratulations! You’ve successfully completed this guide.

As a bonus, here’s how to stop Accumulo.

Stopping Accumulo

Use this code to stop Accumulo:

/usr/hdp/2.3.0.0-2557/accumulo/bin/stop-all.sh

Starting Accumulo Back Up

Want to start Accumulo back up?

/usr/hdp/2.3.0.0-2557/accumulo/bin/start-all.sh

Hope you enjoyed reading, and, as always, feel free to reach out with questions. In my next guide I’ll show you how to connect to Accumulo remotely.
