Wondering how to set up a working version of Apache Giraph on CDH 5.1.2? This guide will get you started.
Building Giraph
1. Clone Giraph from GitHub: ‘git clone https://github.com/apache/giraph.git’
2. Modify the hadoop_2 profile in the pom.xml contained in the giraph folder you just cloned
- Change the hadoop.version to read ‘2.3.0-cdh5.1.2’
3. Compile, package and install Giraph: ‘mvn -Phadoop_2 -fae -DskipTests clean install’
- The Giraph core build is now located in giraph/giraph-core/target, and the examples jar used below is in giraph/giraph-examples/target
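If you would rather script step 2 than edit the pom.xml by hand, something like the following may work. It is only a sketch: set_hadoop_version is a hypothetical helper name, and it assumes the hadoop_2 profile starts at a line containing <id>hadoop_2</id> and sets hadoop.version on a single line inside that profile. Check the result with a diff before building.

```shell
#!/bin/sh
# Hypothetical helper: rewrite <hadoop.version> inside the hadoop_2 profile only.
# Assumes the profile begins at a line containing <id>hadoop_2</id>, ends at the
# next </profile>, and that the <hadoop.version> element sits on one line.
set_hadoop_version() {
    pom="$1"
    version="$2"
    sed "/<id>hadoop_2<\/id>/,/<\/profile>/ s|<hadoop.version>[^<]*</hadoop.version>|<hadoop.version>${version}</hadoop.version>|" \
        "$pom" > "$pom.tmp" && mv "$pom.tmp" "$pom"
}

# Intended use, run from the directory that contains the cloned giraph folder:
#   set_hadoop_version giraph/pom.xml 2.3.0-cdh5.1.2
#   (cd giraph && mvn -Phadoop_2 -fae -DskipTests clean install)
```

The sed range keeps the change confined to the hadoop_2 profile, so the hadoop.version values in other profiles are left alone.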
Running an Example
Now that you’ve built Giraph, it’s time to run an example.
Create a simple graph text file to use as input, with one vertex per line in the form [vertexId,vertexValue,[[targetVertexId,edgeValue],...]]. For example:
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
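One way to create that file without opening an editor is a here-document. This is a minimal sketch: it writes the file into the current directory, and the sanity check at the end is my own addition that only counts lines that look like a bracketed record.

```shell
#!/bin/sh
# Write the five-vertex example graph, one JSON vertex record per line.
# tiny_graph.txt is simply the file name used in this guide; any name works.
cat > tiny_graph.txt <<'EOF'
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
EOF

# Sanity check: count the lines that start with "[" and end with "]".
vertex_count=$(grep -c '^\[.*\]$' tiny_graph.txt)
echo "tiny_graph.txt contains $vertex_count vertex records"
```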
I called the graph tiny_graph.txt. Next, create a shell script to take care of running the example:
#remove everything from the folder called giraph/output in the hadoop file system
hadoop fs -rm -r giraph/output/*
#remove all text files from the giraph/input folder
hadoop fs -rm giraph/input/*.txt
#put the 2nd argument to this script (located at /path/) into the hdfs folder giraph/input
hadoop fs -put /path/$2 giraph/input/
#change path and ClusterURL:Port as necessary. $1 = name of example to run. $3 = num workers
hadoop jar /path/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.3.0-cdh5.1.2-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  -D mapred.child.java.opts="-Xms10240m -Xmx15360m" \
  -D mapred.job.tracker="ClusterURL:Port" \
  -D giraph.zkList="ClusterURL:Port" \
  org.apache.giraph.examples.$1 \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip giraph/input/$2 \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op giraph/output/lcc \
  -w $3 \
  -ca giraph.SplitMasterWorker=false
#remove the local file part-m-00001, if one exists
rm -f part-m-00001
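The script above relies on three positional arguments and starts deleting HDFS files right away, so a guard at the top can save you from a mistyped call. The following is a sketch only: check_args is a hypothetical function name and is not part of the original script.

```shell
#!/bin/sh
# Hypothetical guard for the top of the run script: verify the three
# positional arguments before any HDFS command runs.
check_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: sh nameOfScript.sh ExampleName InputFileName.txt NumWorkers" >&2
        return 1
    fi
    # $3 is passed to the -w option, so require digits only and reject 0.
    case "$3" in
        ''|*[!0-9]*|0)
            echo "NumWorkers must be a positive integer" >&2
            return 1
            ;;
    esac
    return 0
}

# In the script itself this would be called as:
#   check_args "$@" || exit 1
```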
That’s it! You should now be able to run it with ‘sh nameOfScript.sh ExampleName InputFileName.txt NumWorkers’. Thank you to Abdul Quamar, who wrote the shell script mine is based on.
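Once a job finishes, you will probably want to look at the result. Here is a small sketch for that, assuming the output lands under giraph/output/lcc (the -op path in the script) in files named like part-m-00001; show_giraph_output is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the job's output from HDFS.
# giraph/output/lcc is the -op path used by the run script; the part-m-*
# pattern is an assumption based on the part-m-00001 name the script mentions.
show_giraph_output() {
    # The pattern is quoted so that HDFS, not the local shell, expands it.
    hadoop fs -cat "giraph/output/lcc/part-m-*"
}

# Intended use after a run:
#   show_giraph_output
```

As a concrete call, assuming your Giraph checkout includes the SimpleShortestPathsComputation class in org.apache.giraph.examples, a run on the sample graph with one worker would be ‘sh nameOfScript.sh SimpleShortestPathsComputation tiny_graph.txt 1’.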