HBase Development in Java

If you want to learn how to do some simple HBase operations in Java, this guide is for you.

Some important notes before we get started:

  • I’ll be using HBase’s 2.0.0 API. There are some differences between this API and older versions of the API. At the time of this writing, not many tutorials or guides exist for the 2.0 version of the API, so if you run into issues trying to get something to work, looking at the API docs is probably your best bet.
  • To test the code you’ll be developing, you’ll want to use the Hortonworks Sandbox.
  • If you haven’t configured Eclipse for big data development, give this guide a look. It covers how to set up your IDE with Maven, connect to GitHub, and set up a shared folder between your computer and the Hortonworks sandbox.

Ready to begin?

Starting HBase on the sandbox

The first thing you’ll want to do is start HBase on your sandbox. If you haven’t already, fire up your virtual machine and point your browser to http://127.0.0.1:8080/. If that doesn’t work, check your sandbox’s IP address by running this command in your sandbox:

ifconfig

Log in to Ambari using the username ‘admin’ and password ‘admin’.

Once you reach your dashboard, click on HBase. Then click Service Actions > Start.

If your virtual machine doesn’t have enough RAM available, you might have to stop some other running services so that all the HBase servers are able to start. In my case, I stopped MapReduce2. You can always go back and enable it again later. Also, if you see any alerts while MapReduce2 is turned off, you can click Service Actions > Turn On Maintenance Mode.

Creating a table and inserting data into it

Now it’s time to start developing in Eclipse.

Open Eclipse and create a new Java project. If you don’t have Maven set up yet, give this guide a look. Once you’ve created your Java project, right-click it and hit Configure > Convert to Maven Project.

Read through the code here and its comments, then add the necessary dependency (hbase-client 1.1.1) to your pom.xml file. From there it should be fairly simple to get the code running. You’ll have to uncomment a line in the main method in order to insert records. Export the project as a jar like you did in the MapReduce tutorial, upload it to your VM, and run it with this command:

yarn jar BasicTableTransactions.jar BasicTableTransactions

Don’t forget to change the permissions of your uploaded jar before trying to run it:

chmod 777 BasicTableTransactions.jar

The code will create a table for you and insert a few records into it.

You can check that the code works by opening up your Hortonworks sandbox and typing:

hbase shell

Once the shell starts, you can view your table, count the number of records in it, and perform scans over the data:

describe 'business_data'
count 'business_data'
scan 'business_data',{COLUMNS => 'artist_data'}

For further clarification about the code on GitHub or the commands mentioned above, feel free to reach out to me.

Schema design

If you want to dip your feet into creating well-designed schemas in HBase, give this article a look. It’s a good introduction that covers some important design considerations. For further reading, try HBase’s suggestions in the schema section of their reference guide.

Remotely connecting to HBase

Being able to debug your code is a very important part of software development. One of the issues I ran into while testing my HBase code was a lack of clear direction as to where the log files were stored. I tried to print logs to confirm that the program was running as I expected, but I couldn’t figure out where the log files were being written. I looked for the job in the YARN application master (port 16010), where HBase jobs don’t seem to show up, and read through every log I could get my hands on, but my log statements just weren’t there. If you know how or where to access the logs Java sends, please let me know and I’ll update this guide accordingly.

Back to connecting to HBase remotely.

Follow this guide. It includes source code and instructions on setting up your sandbox correctly. I skipped step 4 and started HBase via Ambari instead (port 8080). Once you switch to a host-only network adapter, the IP address you use to log in will likely change. To check what it is, run this command in your Hortonworks sandbox VM:

ifconfig

In step 5, the author is talking about the hosts file of your local machine, not your sandbox.

Once you’re done with the guide, run your code. You should be able to successfully connect to HBase.

Customizing log events / print statements

It’s nice to use System.out.println statements to try to track what’s going on in your code, but it’s better to use log events. Log events will let you track what’s going on at a much more granular level than simple print statements would.

Ready to make the switch to log events?

The first thing you need to do is find your log4j.properties file. For me, this was located at

/Users/myname/Development/Hadoop/hadoop-2.7.1/share/hadoop/tools/sls/sample-conf/log4j.properties

If that path is too long and complicated for you, you can make your own file and place it wherever you like. Here’s what is in my log4j.properties file:

# Define the root logger with level INFO and an appender named consoleAppender:
log4j.rootLogger = INFO, consoleAppender

# Set the appender named consoleAppender to be a console appender:
log4j.appender.consoleAppender = org.apache.log4j.ConsoleAppender

# Define the layout for consoleAppender:
log4j.appender.consoleAppender.layout = org.apache.log4j.PatternLayout
log4j.appender.consoleAppender.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

Notice that I set the level of the logger to INFO. This makes the logger ignore DEBUG events instead of printing them to the screen. If you would like to see debug events, simply change INFO to DEBUG.
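To make the level threshold a bit more concrete, here is a small sketch of the rule log4j 1.x applies: an event is printed only if its level is at or above the logger’s level. It doesn’t use log4j itself. The class name LevelThresholdSketch and its methods are made up for illustration, and the enum simply mirrors log4j’s level ordering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of log4j: it only models the threshold rule.
final class LevelThresholdSketch {

    // log4j 1.x levels, ordered from least to most severe.
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR, FATAL }

    // An event is printed only if its level is at or above the threshold.
    static boolean isPrinted(Level threshold, Level event) {
        return event.compareTo(threshold) >= 0;
    }

    // Lists every level that would be printed for the given threshold.
    static List<Level> printedLevels(Level threshold) {
        List<Level> printed = new ArrayList<>();
        for (Level level : Level.values()) {
            if (isPrinted(threshold, level)) {
                printed.add(level);
            }
        }
        return printed;
    }

    public static void main(String[] args) {
        System.out.println("rootLogger = INFO  prints " + printedLevels(Level.INFO));
        System.out.println("rootLogger = DEBUG prints " + printedLevels(Level.DEBUG));
    }
}
```

With the threshold at INFO, the first line lists INFO, WARN, ERROR, and FATAL. Switching to DEBUG adds DEBUG to that list, which is why LOG.debug calls stay silent under the configuration above.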

Now that you have your log4j.properties file set up, it’s time to use it. Create a new logger using this code:

private static final Log LOG = LogFactory.getLog(BasicTableTransactions.class);

The above code should be put within your class, outside of any methods. Now, put this code in your main method:

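// Configure log4j from the properties file; try-with-resources closes the stream.
// Needs java.io.FileInputStream, java.io.InputStream, java.util.Properties,
// and org.apache.log4j.PropertyConfigurator.
Properties props = new Properties();
try (InputStream in = new FileInputStream("/Users/myname/Development/Hadoop/hadoop-2.7.1/share/hadoop/tools/sls/sample-conf/log4j.properties")) {
    props.load(in);
}
PropertyConfigurator.configure(props);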

Replace the path I have up there with whatever path you have your log4j.properties file in.
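If your log statements don’t show up, it can help to check that the file at that path is really being read and that the root logger level is what you expect. Here is a minimal sketch of such a check using only java.util.Properties, so it runs without log4j on the classpath. The class name Log4jPropertiesCheck and the methods loadFile, parse, and rootLevel are my own names for illustration, not part of log4j or HBase.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper for sanity-checking a log4j.properties file.
final class Log4jPropertiesCheck {

    // Reads a properties file; try-with-resources closes the stream even on failure.
    static Properties loadFile(Path path) {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(path)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + path, e);
        }
        return props;
    }

    // Parses properties from a string, handy for trying out a config inline.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the level part of log4j.rootLogger (the text before the first comma),
    // or null if the key is missing.
    static String rootLevel(Properties props) {
        String value = props.getProperty("log4j.rootLogger");
        if (value == null) {
            return null;
        }
        return value.split(",")[0].trim();
    }

    public static void main(String[] args) {
        // Pass the path to your own log4j.properties as the first argument;
        // without one, the sketch falls back to an inline sample.
        Properties props = args.length > 0
                ? loadFile(Paths.get(args[0]))
                : parse("log4j.rootLogger = INFO, consoleAppender\n");
        System.out.println("Root logger level: " + rootLevel(props));
    }
}
```

If the printed level is null, the file probably isn’t the one you think it is, or it doesn’t define log4j.rootLogger at all.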

We’re almost done. Now all you have to do is log events. To do this, place this code wherever you want to log something:

LOG.info("Here's a log statement I want to send to the console");

If you want to change the level of the log statement (more on logging levels and their hierarchy here), change ‘info’ to whichever level you prefer. You can see how my code is using logging here and here.

Deleting all records from an HBase table

Say you ran the code above, accidentally inserted a ton of near-duplicate rows, and want to restart with an empty table. To do that, log onto your Hortonworks sandbox’s shell and type the following to launch the HBase shell:

hbase shell

Now, run this command:

truncate 'business_data'

The truncate command disables and drops your table, then recreates it with the same settings, leaving it empty. If you want to remove your table altogether, use:

disable 'business_data'
drop 'business_data'

That’s it for the guide for now. Stay tuned for updates.
