Connecting to an Accumulo Instance Remotely via Java Client Code

In my last guide, I showed you how to properly set up Accumulo on a Hortonworks sandbox. This time, I’ll be showing you how to remotely connect to that Accumulo instance via a Java client.

Setting up

If you haven’t already set up Eclipse for big data development, go look at this post. It covers how to set up Maven, as well as how to connect to GitHub (if you wish).

Before we get started, create a new Java project in Eclipse, then convert it to a Maven project. Recall that to convert to a Maven project you can right-click the project, then click Configure > Convert to Maven Project.

Edit the pom.xml file

To run the code you’ll be using, you need to add the Accumulo client library as a dependency in your pom.xml file. Double-click your pom.xml file, then click the Dependencies tab.

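Click ‘Add…’ then fill out the dependency information. For accumulo-core, the Group Id is org.apache.accumulo and the Artifact Id is accumulo-core; pick the Version that matches the Accumulo release on your sandbox.

[Screenshot: Accumulo-core dependency]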

You can leave the scope as Compile.

Click Done.

Add your sandbox IP to your hosts file

To make it easier to connect to your sandbox, we’ll add the IP address to your hosts file. To do this on a Mac, open up your terminal and type:

sudo nano /etc/hosts

Add a line that looks similar to this (may vary based on your sandbox IP address):

192.168.56.101 sandbox.hortonworks.com

[Screenshot: Hosts file]

Save and exit.
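To confirm the new hosts entry is picked up by the JVM, a quick check like the one below can help. It is only a sketch: the class name HostCheck is made up for this example, and the default hostname is the one from the hosts line above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

// Hypothetical helper, not part of the original guide.
class HostCheck {

    /** Returns the IP address the name resolves to, or empty if it does not resolve. */
    static Optional<String> resolve(String hostname) {
        try {
            return Optional.of(InetAddress.getByName(hostname).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Default to the sandbox hostname added to /etc/hosts above.
        String host = args.length > 0 ? args[0] : "sandbox.hortonworks.com";
        Optional<String> ip = resolve(host);
        System.out.println(ip.map(a -> host + " resolves to " + a)
                             .orElse(host + " does not resolve; check /etc/hosts"));
    }
}
```

If the hostname does not resolve, recheck the line you added to /etc/hosts before moving on, since the client code connects by hostname.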

Edit the Authorizations of the root user

Since the code in this guide covers an Accumulo concept known as Authorizations, you’ll need to give your root user the proper authorizations in order to use it.
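If Authorizations are new to you, the idea can be pictured with a deliberately simplified model: every entry carries a visibility label, and a user only sees entries whose label is covered by the authorizations they hold. The sketch below is plain Java with made-up names (VisibilitySketch, canRead); it is not how Accumulo implements the check, and real visibility labels can be boolean expressions rather than a single word.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Simplified illustration only; Accumulo's real visibility expressions
// also support combining labels with & and |.
class VisibilitySketch {

    /** An entry is readable if it has no label, or the user holds its single label. */
    static boolean canRead(String entryLabel, Set<String> userAuths) {
        return entryLabel.isEmpty() || userAuths.contains(entryLabel);
    }

    public static void main(String[] args) {
        Set<String> noAuths = Collections.emptySet();
        Set<String> withPublic = new HashSet<>(Arrays.asList("public"));

        System.out.println("no auths, label 'public': " + canRead("public", noAuths));
        System.out.println("auth 'public', label 'public': " + canRead("public", withPublic));
        System.out.println("no auths, unlabeled entry: " + canRead("", noAuths));
    }
}
```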

To give your root user the “public” authorization, which lets it read Accumulo entries labeled with the “public” visibility, first power on your sandbox and start up Accumulo:

/usr/hdp/2.3.0.0-2557/accumulo/bin/start-all.sh

Log on to the accumulo shell:

accumulo shell

Enter your password (hadoop).

Then run this command to grant the root user the ‘public’ authorization, so it can read any entries labeled ‘public’:

setauths -s public -u root

Finally, quit out of the accumulo shell:

quit

Create a log4j.properties file

This step is optional, but if you skip it you’ll have to remove a few lines of code from the Java program you create. A log4j.properties file lets you use a Logger in your Java program, which is useful for debugging.

See Customizing Log/Print Statements in the HBase guide for information on how to create the log4j.properties file.
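If you don’t have that guide handy, the sketch below writes a minimal console-only configuration to disk. It assumes the log4j 1.x properties format, and the file in the HBase guide may well differ; the class name Log4jConfigSketch and the default output path are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: writes an assumed minimal log4j 1.x configuration.
class Log4jConfigSketch {

    // Log INFO and above to the console with a timestamp, level, and logger name.
    static final String CONFIG = String.join("\n",
        "log4j.rootLogger=INFO, stdout",
        "log4j.appender.stdout=org.apache.log4j.ConsoleAppender",
        "log4j.appender.stdout.layout=org.apache.log4j.PatternLayout",
        "log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{1} - %m%n",
        "");

    /** Parses the text as a Java properties file, as a basic sanity check. */
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "log4j.properties");
        Files.write(out, CONFIG.getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out.toAbsolutePath()
            + " (root logger: " + parse(CONFIG).getProperty("log4j.rootLogger") + ")");
    }
}
```

Run it from the project root, or pass a path as the first argument, so the file ends up where your program expects to load it.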

Create the Java class

Now that everything is set up, create the class you’ll be using to connect to Accumulo. I called my class AccumuloConnection, but call it whatever you like.

Next, add this code to your class.

Examine the comments of the code to understand how it works. Some notes about the code:

  • It uses a Logger to keep track of system messages as well as any message the developer wants to print
  • It sets up the Logger configuration using the configuration file we created earlier
  • It connects to Accumulo and ZooKeeper, then tries to create a table if the table doesn’t already exist
  • Next, it writes a mutation (row) to the server using a BatchWriter
    • Multiple mutations could be written using the BatchWriter, but for this example we just write one
  • Next, a scanner is created
    • Authorizations for the scanner are specified
    • A range to scan over is given
    • A column family is specified to further narrow down the search
  • Finally, the entries the scanner returns are iterated through and printed to the console
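The steps above can be sketched as follows. Treat this as a minimal sketch under stated assumptions, not a definitive listing: it assumes the Accumulo 1.x client API (ZooKeeperInstance and Connector) from accumulo-core, with log4j 1.x on the classpath. The instance name, ZooKeeper port, table name, row ID, column names, value, and log4j.properties path are all placeholders you will need to adapt to your sandbox.

```java
import java.util.Map;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.accumulo.core.security.ColumnVisibility;
import org.apache.hadoop.io.Text;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

public class AccumuloConnection {
    private static final Logger LOG = Logger.getLogger(AccumuloConnection.class);

    public static void main(String[] args) throws Exception {
        // Assumed location of the logging configuration created earlier.
        PropertyConfigurator.configure("log4j.properties");

        // Placeholders: replace with your own instance name and table.
        String instanceName = "your-instance-name";
        String zooKeepers = "sandbox.hortonworks.com:2181"; // assumes the default ZooKeeper port
        String table = "example_table";

        // Connect through ZooKeeper as root, using the sandbox password from this guide.
        Instance instance = new ZooKeeperInstance(instanceName, zooKeepers);
        Connector conn = instance.getConnector("root", new PasswordToken("hadoop"));

        // Create the table only if it does not exist yet.
        if (!conn.tableOperations().exists(table)) {
            conn.tableOperations().create(table);
        }

        // Write one mutation (row) labeled with the 'public' visibility.
        BatchWriter writer = conn.createBatchWriter(table, new BatchWriterConfig());
        try {
            Mutation mutation = new Mutation("row1");
            mutation.put("cf", "cq", new ColumnVisibility("public"), "hello");
            writer.addMutation(mutation);
        } finally {
            writer.close(); // flushes pending mutations to the server
        }

        // Scan with the 'public' authorization, limited to a row range and one column family.
        Scanner scanner = conn.createScanner(table, new Authorizations("public"));
        try {
            scanner.setRange(new Range("row1", "row2"));
            scanner.fetchColumnFamily(new Text("cf"));
            for (Map.Entry<Key, Value> entry : scanner) {
                LOG.info(entry.getKey() + " -> " + entry.getValue());
            }
        } finally {
            scanner.close();
        }
    }
}
```

If you skipped the log4j.properties step, drop the PropertyConfigurator line and swap the LOG.info call for System.out.println. If the root user does not hold the ‘public’ authorization, expect createScanner to fail with a security error rather than return entries.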

Hope you enjoyed learning about Accumulo development using Java. If you have any questions, feel free to reach out in the comments or by email.
