Getting Started With Apache Giraph on CDH 5.1.2

Wondering how to set up a working version of Apache Giraph on CDH 5.1.2? This guide will get you started.

Building Giraph

1. Clone Giraph from GitHub: ‘git clone’

2. Modify the hadoop_2 profile in the pom.xml contained in the giraph folder you just cloned

  • Change the hadoop.version to read ‘2.3.0-cdh5.1.2’

3. Compile, package and install Giraph: ‘mvn -Phadoop_2 -fae -DskipTests clean install’

  • Giraph is now located in giraph/giraph-core/target
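If you want to double-check the pom.xml edit before kicking off the build, something like the following might help. It's only a sketch: pom_has_version is a made-up helper name, and it assumes the property is written as a <hadoop.version> element in pom.xml. It also only checks that the version appears somewhere in the file, not that it sits inside the hadoop_2 profile.

```shell
#!/bin/sh
# Hypothetical helper (not part of Giraph): succeeds if the given pom file
# contains a <hadoop.version> element set to the expected version string.
pom_has_version() {
    pom_file=$1
    expected=$2
    # -F treats the pattern as a fixed string, so the dots are not regex wildcards
    grep -Fq "<hadoop.version>${expected}</hadoop.version>" "$pom_file"
}

# Example use from inside the cloned giraph folder (commented out on purpose):
# pom_has_version pom.xml 2.3.0-cdh5.1.2 && mvn -Phadoop_2 -fae -DskipTests clean install
```

If the check fails, go back to step 2 before running Maven.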

Running an Example

Now that you’ve built Giraph, it’s time to run an example.

Create a simple graph text file to use as input. For example:


I called the graph tiny_graph.txt. Next, create a shell script to take care of running the example:

# Remove everything from the giraph/output folder in HDFS
hadoop fs -rm -r giraph/output/*
# Remove all text files from the giraph/input folder
hadoop fs -rm giraph/input/*.txt
# Put the 2nd argument to this script (located at /path/) into the HDFS folder giraph/input
hadoop fs -put "/path/$2" giraph/input/
# Change path and ClusterURL:Port as necessary. $1 = name of example to run. $2 = input file name. $3 = number of workers
hadoop jar /path/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.3.0-cdh5.1.2-jar-with-dependencies.jar org.apache.giraph.GiraphRunner -D"-Xms10240m -Xmx15360m" -D mapred.job.tracker="ClusterURL:Port" -D giraph.zkList="ClusterURL:Port" org.apache.giraph.examples.$1 -vif -vip giraph/input/$2 -vof -op giraph/output/lcc -w $3 -ca giraph.SplitMasterWorker=false
# Clean up a stray local output file, if one exists
rm -f part-m-00001

That’s it! You should now be able to run it with ‘sh YourScript.sh ExampleName InputFileName.txt NumWorkers’, where YourScript.sh is whatever you named the shell script. Thank you to Abdul Quamar, who wrote the shell script mine is based on.
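Since the script depends on three positional arguments, it can be handy to validate them before any of the hadoop commands run. Here is a rough sketch of how that might look. check_args is a name I made up for illustration, and the rule that NumWorkers must be a whole number is my assumption, not something Giraph requires of the script.

```shell
#!/bin/sh
# Hypothetical helper: checks the three arguments the run script expects
# ($1 = example name, $2 = input file name, $3 = number of workers).
check_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: sh <script> ExampleName InputFileName.txt NumWorkers" >&2
        return 1
    fi
    case $3 in
        # Reject an empty value or anything containing a non-digit
        ''|*[!0-9]*)
            echo "NumWorkers must be a whole number, got: $3" >&2
            return 1
            ;;
    esac
    return 0
}

# At the top of the run script you could then write (commented out on purpose):
# check_args "$@" || exit 1
```

With this in place, a missing or mistyped argument stops the script before it deletes anything from giraph/output.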

A Beginner’s Guide to Generating SSH Keys for Github

Hard at work

A mouse sure comes in handy when using bash

I’m currently in the process of setting up an account on a RHEL (Red Hat Enterprise Linux) server (accessed via PuTTY on a Windows machine) to communicate with GitHub. There’s a great guide for generating the necessary SSH keys over at GitHub. If you know your way around Linux, give it a look. It should be pretty straightforward to follow. If Linux is a bit new to you though, you may run into some questions and issues with the guide. This post covers fixes for two common issues you may hit while following it.

Problem 1: Could not open a connection to your authentication agent

  • This error comes about as a result of attempting to run ‘ssh-add ~/.ssh/id_rsa’
    • To fix: run ‘eval $(ssh-agent)’, which starts the agent and loads its environment variables into your current shell. Then run ‘ssh-add ~/.ssh/id_rsa’ again and you’re good to go.
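The fix above can be wrapped up so it only starts an agent when your shell isn’t already pointed at one. This is just a sketch: start_agent_if_needed is a made-up name, and it assumes that a non-empty SSH_AUTH_SOCK means an agent is reachable, which won’t be true if the agent has since died.

```shell
#!/bin/sh
# Hypothetical helper: start ssh-agent only when this shell has no agent socket set.
start_agent_if_needed() {
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        # ssh-agent -s prints Bourne-style export commands; eval applies them here
        eval "$(ssh-agent -s)" >/dev/null
    fi
}

# Typical use (commented out on purpose):
# start_agent_if_needed && ssh-add ~/.ssh/id_rsa
```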

Problem 2: clip: command not found

  • Less of a problem than an inconvenience. To work around not being able to run ‘clip < ~/.ssh/’, you can instead simply run ‘cat ~/.ssh/’. Highlight the output in PuTTY, which copies it to your clipboard. Now paste it into GitHub where the tutorial tells you to. You may have to remove some whitespace, but once you do you’re set.

Hope these tips help!

Interview with Thomson Nguyen

Last month I had the chance to talk to Thomson Nguyen, co-founder & CEO at Framed Data, about what got him to where he is today, startups, big data, traveling, and more. Here’s what I learned:

Thomson Nguyen

About Thomson

Thomson spent his undergraduate years at Berkeley. He started as a Bioengineering major before moving on to a Pure Mathematics major with an English minor, or “unemployable math,” as he lightly calls it. This consisted of a good deal of logic, set theory, and other pure math concepts. While his classes might not have been geared towards solving real-world problems, those formative years built a foundation for his later work with computers. Before he got to that point though, he went off to Cambridge for a Master’s degree in computational biology.

Travel (and how to pay for it)

While at Cambridge, Thomson knew that he wanted to travel and explore Europe. One problem though: travel gets expensive.

To pay for his adventures, Thomson started setting up fish and chips websites for local sellers. These owners were often unfamiliar with the internet, but wanted to have a presence on Google Maps, Yelp, and the web.

How do you get started when you don’t have any existing work to show clients?

Thomson made his first website for a fish and chips owner for free. He wowed the owner, and from there his business started taking off. Able to show off his previous work to other fish and chips sellers, he could justify charging for the sites. In 2 months of contract work, he made enough money to travel around Europe for 6 months. As a current college student myself, I’m going to be taking this advice and laterally shifting it to the photography business to fund my upcoming trip abroad. More on that in another blog post though.

Data Science

After graduate school, Thomson went to New York to work at a hedge fund over the summer. The path didn’t seem right though, so when it came time to find a job Thomson went to NYU to research machine learning. There, he learned to derive insights from data via statistical methods. In mid-2011, he went to work for Lookout, a security company which helps keep users’ mobile phones safe and private and also offers solutions for government and enterprise. When Thomson was at Lookout, the idea was to look at Android app data and figure out which apps were malicious. With Android’s rich ontology, apps are already helpfully grouped into specific subcategories. For example, say the average Flashlight application is 70 KB and the average number of permissions needed is 0. If an app in that subcategory is significantly larger than that, or asks for more permissions, given a specific distribution curve, then it should be flagged for review.
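To make the flagging idea concrete, here is a toy sketch of that kind of rule. To be clear, this is my own illustration and not Lookout’s actual method: flag_outliers is a made-up name, the input format (name, size in KB, permission count) is hypothetical, and the standard deviation and cutoff are numbers you would have to pick yourself. Only the 70 KB average and the 0 permissions come from the example above.

```shell
#!/bin/sh
# Hypothetical input on stdin: one app per line as "name size_kb permission_count".
# Arguments: baseline mean size, baseline standard deviation,
#            baseline permission count, z-score cutoff.
# Prints the name of every app that is more than `limit` standard deviations
# larger than the baseline, or that asks for more permissions than the baseline.
flag_outliers() {
    awk -v mean="$1" -v sd="$2" -v perms="$3" -v limit="$4" '
        {
            z = ($2 - mean) / sd
            if (z > limit || $3 > perms) print $1
        }'
}

# Example use (apps.txt is a hypothetical file; 15 and 3 are assumed values):
# flag_outliers 70 15 0 3 < apps.txt
```

In practice the baseline would come from the real distribution of apps in the subcategory rather than hand-picked numbers.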

Thomson enjoyed his time at Lookout, but an opportunity came to take a job at Causes that he couldn’t resist. Here, he was a data scientist lead. His day-to-day experiences here consisted of one-on-one meetings with people, product management, and coding. In this exercise in management, he learned how to scale up a team and products. This is the sort of thing that academia doesn’t teach.

Framed Data

After a year and a half, Thomson left Causes to start working on his own company: Framed Data. Framed Data is a predictive analytics application that takes in user data and predicts when they’re going to leave your application, and why they’re leaving. It helps users figure out how they can improve their application, as well as knowing when to reach out to high-risk users.

The idea wasn’t always this polished though. The original idea for the company was to take data scientists’ models and productionize them. The concept was good, and it got Framed Data into Y Combinator Winter 2014. From there, Framed Data pivoted to provide a marketing automation service which helps applications retain their users.

Since getting into Y Combinator, Thomson mentioned his job has transitioned from coding to hiring. Now much of his day-to-day activities are non-technical, and mainly involve sales, business development, customer support, and management.

Data Science Tips and Advice

With his experience in data science, Thomson expanded upon a number of different ways for someone interested in data science to get more involved. One aspect he mentioned was General Assembly courses, which teach you how to become a data scientist over the course of 12 weeks. These types of courses tend to be better suited for non-technical people, though technical people can still pick up a lot from them too. In the course Thomson taught, 95% of the participants graduated from the course with jobs.

Another option for learning more about data science is Kaggle. Here, you compete with teams from around the world on one of the many different data science problems available. It’s a great way to build your data science portfolio on Github. Side tip: your portfolio should consist of code AND plain English talking about the different trends you found. One of the benefits (and also perhaps a drawback) of Kaggle is that it’s graded on a quantitative scale. You are optimizing for a predetermined number. This makes it great for competitions, but in the real world things aren’t always so objective.

Closing Thoughts

Thomson ended our discussion with some closing thoughts on start-ups, career paths, and locations.

Big Companies vs Startups

In his opinion, big companies can destroy creativity. A six-figure salary for a young person starting their first job makes life too comfortable. Once you’re being compensated at that level, you’re not going to want to take the risk of starting your own company.

As an employee at a startup, you’re going to have high visibility to everyone else. You’ll be able to interact with the investors, VPs, CEOs, etc. on a much more regular basis than you would at a big company like Google. Lastly, if you want to work at a startup, consider moving out west. It’s a great place to live, and Framed Data is hiring.


HackTech Highlights

In my last post, I talked a little bit about my travels in Venice and Santa Monica. Today, I’ll be talking about my experiences at HackTech, the hackathon put on by the students of Caltech.

Let the Hacking Begin

Around 7 on Friday night, 1/24/14, we arrived at the venue we’d be hacking in: a convention center in the middle of Santa Monica Place. I couldn’t have asked for a better location.

Our (pretty awesome) team, consisting of Britt, Craig, Kunal, and me, grabbed a table with other UMD hackers and started setting up our development environment. I updated to a fresh version of Ubuntu and installed Sublime Text. With some help from Craig and Kunal, and some command line magic, I was ready to start hacking.

Our idea, which didn’t have a name at this point, was to build a web application using eBay’s API which could find the average price of items. The user would enter a search term, select the category, and our application would find the average price for said item. This could be used to allow a buyer to find out how much they should be paying for an item, or for a seller to figure out how much to sell the item for. We didn’t want to stop there though, so we decided to build in another feature which would search for items that were under-priced and allow buyers to find great deals on eBay.

Kunal and I would familiarize ourselves with the eBay API and build the nitty-gritty of the back-end, while Craig and Britt would work on the gorgeous front-end. We chose Python/Django for our back-end and were excited to start. But first, I needed to familiarize myself with Python/Django.

After some quick tutorials in Python I was ready to begin. Having never used Python before, I was glad I had Craig and Kunal around to offer advice. One of the things I love about hackathons is how much learning can take place in such a short period of time. I went from knowing nothing about Python to coding the piece of our backend used to find underpriced items. It was a great learning experience, about both Python and eBay’s API (which was very easy to use, thanks eBay!).

While Kunal and I worked on the back-end, Craig and Britt were busy making an amazing front-end. Craig’s skills in both back-end development and front-end development proved crucial in making our application a success.

On Saturday, a name was decided upon for our hack: Dat Price. Special thank you to for the free domain name.

Speaking of successes, we presented our application to the judges on Sunday and won eBay’s prize! It was an exciting moment, and I’m honored we were selected to win. Thanks eBay!

Alexis Ohanian and me

Me and a really chill dude

Other notable events

Alexis Ohanian showed up

I got to meet him! He could only stay for a little while, but he was really excited about the hacks going on and took the time to take pictures with everyone (or at least as many people as he could) before his manager told him he had to get going.

Free In-N-Out Burger was given out

It was delicious.

A lot of great tech companies and start-ups were there

Some of my favorites included Pebble, Whisper, Firebase, Fitbit, Namecheap, Dropbox, Mitek, Lob, and eBay, Inc.


I’d like to thank all the organizers of Hacktech for putting on a great hackathon, and the sponsors for providing the funding to make it happen.

I’d also like to give a shout out to my team. I enjoyed hacking with all of you and would gladly do so again.


California Highlights: Winter 2014

This weekend I had the pleasure of attending HackTech, a hackathon put on by the wonderful students of Caltech in an exhibition space at Santa Monica Place.

As I sit here at the airport awaiting my flight home, I thought I’d share some of my experiences in California with you.

Hackathon-specific post coming shortly (when I’m less sleep deprived).


California

My flight landed at LAX, and I exchanged the cold and snow in Baltimore for weather in the mid-seventies.


I had a $25 credit to Uber, so I requested a driver from my phone and gave the service a try. For those unaware, Uber is a car service company which has been making headlines for competing with the taxi industry. They offer three levels of service: uberX, Black Car, and SUV. I chose uberX, which is their car-sharing service that lets everyday people sign up to drive customers in their own car. It’s essentially a crowd-sourced taxi service.

My driver arrived at LAX, and after brief confusion about where he was picking me up (apparently drivers aren’t allowed to pick up from the airport in California), we were on our way to the hostel.

I’d definitely recommend giving Uber a try if you haven’t already. If you want to help me out, use this referral link. You’ll get a $20 credit to Uber, and I’ll get one too. We both win!

Venice Beach Hostel

I soon met up with Kunal and checked into Venice Beach Hostel. Our third team member, Craig, had his flight canceled on account of the snow back in Philadelphia. Unfortunately, he wouldn’t be arriving until the next day.

Our room was nicer than I’d expected; it was spacious with two bunk beds, a private bathroom, and a high ceiling. For my first hostel experience, I was impressed.

Venice Boardwalk

Our adventure begins


The day of exploration.

We woke up early and hit up the Venice Boardwalk.

After a bit, we broke away from the shops and street vendors to walk along the beach with the warm sand beneath our toes.

We soon arrived at a mostly deserted Santa Monica Pier, where we discovered people aren’t too keen on waking up early on the West Coast.

 Venice Pier

Ocean from the pier

We also discovered you probably shouldn’t eat the fish you catch in the ocean.

After the pier, we were off to the city of Santa Monica.

Santa Monica

It had a more upscale vibe than Venice. There were more shops, high-end restaurants, and even an outdoor mall.

We hit up a Thai place for lunch, and I tried Red Curry and Thai Tea for the first time. I’m happy to say they were both delicious.

Thai food

Red curry with brown rice, potato wontons and Thai tea.

Santa Monica

Santa Monica, which translates to Saint Monica for you English speakers out there












Craig arrived at night, and after some lengthy brainstorming our hackathon idea was hatched (to be revealed shortly).


Beach. All day.


Lamb wrap with sweet potato fries

Nom nom nom

At night, we hit up a local Greek restaurant. I ordered a lamb wrap with sweet potato fries. The wrap was good, and I loved the way they did the sweet potato fries (some places add salt to their sweet potato fries, which distracts from the flavor. These were sweet and made the way sweet potato fries should be).


Fried Bananas and Coconut Ice Cream

Thai Dessert

We checked out of our hostel and booked an uberX ride to Santa Monica.

Our group split up to explore, then met back up again for a light snack before lunch. We hit up our new favorite Thai place, and I ordered fried bananas and coconut ice cream (another first for me). The ice cream was alright, but the fried bananas were delicious.

Then, we were off to Whisper, who were giving a tech talk and free lunch to 25 HackTech attendees. Unaware of what Whisper was, but enticed by the chance to meet a local startup (and free lunch), we headed down Ocean Avenue. We soon arrived at the address of Whisper’s office: a place off Ocean Avenue (I’ve always wanted an excuse to say that). If you’ve ever seen The Social Network, their place was a bit like the setup in the movie: a house with conference rooms and workspaces set up inside. The best part was the backyard though. It had a pool, basketball courts, a hammock (that was quickly taken advantage of by yours truly), and a guest house. We ate for a bit, met our fourth team member Britt, then listened to co-founder Michael Heyward talk about his company and how it got started. I enjoyed the talk, and really liked how approachable he was. If you’re reading this, thanks for having us, Whisper.

Then, we were off to HackTech. But, that’s a story for another post.

Here’s your reward for reading until the end: a picture of Bumblebee and Optimus Prime at Venice Beach.

Bumblebee and Optimus Prime