Getting Started With Apache Giraph on CDH 5.1.2

Wondering how to set up a working version of Apache Giraph on CDH 5.1.2? This guide will get you started.

Building Giraph

1. Clone Giraph from GitHub: ‘git clone’ followed by the repository URL

2. Modify the hadoop_2 profile in the pom.xml contained in the giraph folder you just cloned

  • Change the hadoop.version to read ‘2.3.0-cdh5.1.2’

3. Compile, package and install Giraph: ‘mvn -Phadoop_2 -fae -DskipTests clean install’

  • Giraph is now located in giraph/giraph-core/target
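Before starting the long Maven build, it can help to confirm that the edit from step 2 actually landed. The following is a minimal sketch, not part of the original steps: the helper name pom_has_cdh_version is hypothetical, and it assumes the version is stored as a <hadoop.version> property element with no surrounding whitespace inside the tag.

```shell
# Hypothetical helper: succeed only if the given pom.xml contains the
# CDH 5.1.2 Hadoop version as a <hadoop.version> property (assumed layout).
pom_has_cdh_version() {
  grep -qF '<hadoop.version>2.3.0-cdh5.1.2</hadoop.version>' "$1"
}

# Example use from inside the cloned giraph folder:
#   pom_has_cdh_version pom.xml && mvn -Phadoop_2 -fae -DskipTests clean install
```

Note that this only checks that the string appears somewhere in the file; it does not verify that it sits inside the hadoop_2 profile.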

Running an Example

Now that you’ve built Giraph, it’s time to run an example.

Create a simple graph text file to use as input. For example:
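One possible input file is sketched below. It assumes the example you run reads Giraph's JSON adjacency format (the one parsed by JsonLongDoubleFloatDoubleVertexInputFormat); that choice is an assumption, and other examples may expect a different layout. The vertex ids and weights are purely illustrative.

```shell
# Write a five-vertex sample graph to tiny_graph.txt.
# Each line is: [vertex_id, vertex_value, [[target_id, edge_weight], ...]]
# (layout assumed to match JsonLongDoubleFloatDoubleVertexInputFormat).
cat > tiny_graph.txt <<'EOF'
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
EOF
```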


I called the graph tiny_graph.txt. Next, create a shell script to take care of running the example:

#remove everything from the folder called giraph/output in the hadoop file system
hadoop fs -rm -r giraph/output/*
#remove all text files from the giraph/input folder
hadoop fs -rm giraph/input/*.txt
#put the 2nd argument to this script (located at /path/) into the hdfs folder giraph/input
hadoop fs -put /path/$2 giraph/input/
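#change path and ClusterURL:Port as necessary. $1 = name of example to run. $3 = num workers
#note: -vif and -vof each expect a fully qualified format class name right after them; supply the ones your example needs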
hadoop jar /path/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.3.0-cdh5.1.2-jar-with-dependencies.jar org.apache.giraph.GiraphRunner -D"-Xms10240m -Xmx15360m" -D mapred.job.tracker="ClusterURL:Port" -D giraph.zkList="ClusterURL:Port" org.apache.giraph.examples.$1 -vif -vip giraph/input/$2 -vof -op giraph/output/lcc -w $3 -ca giraph.SplitMasterWorker=false
rm -f part-m-00001

That’s it! Save the script under any name you like (YourScript.sh here is just a placeholder) and run it with ‘sh YourScript.sh ExampleName InputFileName.txt NumWorkers’. Thank you to Abdul Quamar, who wrote the shell script mine is based on.
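Because the script relies on three positional arguments, a missing or misplaced one produces confusing Hadoop errors rather than a clear message. Here is a minimal sketch of a guard you could put at the top of the script; the function name check_args is hypothetical and not part of the original script.

```shell
# Hypothetical helper: validate the three positional arguments
# (example name, input file name, number of workers).
check_args() {
  if [ "$#" -ne 3 ]; then
    echo "usage: sh <script> ExampleName InputFileName.txt NumWorkers" >&2
    return 1
  fi
  # The third argument must consist of digits only.
  case "$3" in
    ''|*[!0-9]*)
      echo "NumWorkers must be a positive integer, got: $3" >&2
      return 1
      ;;
  esac
  # Reject zero workers.
  if [ "$3" -eq 0 ]; then
    echo "NumWorkers must be at least 1" >&2
    return 1
  fi
  return 0
}

# At the top of the script you would call it as:
#   check_args "$@" || exit 1
```

The guard only checks the shape of the arguments; it does not confirm that the example class or the input file actually exists.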
