Last month I had a project for my advanced databases course: implementing a simple iterative UV matrix decomposition algorithm using Hadoop. This post is just about the first step, getting Hadoop set up and working, not the algorithm itself. I’m gonna talk about that in a future post.
For those of you noobs like me who never tried Hadoop on anything beyond Apache’s boring copy & paste WordCount, it can be a real pain in the butt to run anything more complicated, simply because it’s one of the worst-documented, most badly written pieces of software ever to implement a really nice idea!
One good idea is to compile a Hadoop version from scratch, rather than use the pre-compiled libraries. This way you should be able to debug and run your Hadoop applications easily. For that, take a look at this video tutorial.
In this post, however, I list very simple steps to quickly get Hadoop running on a Mac. So here we go:
1. Download Hadoop
2. Set Environment Variables
- JAVA_HOME: needs to point to your Java installation directory. To find it, first check which Java binary is on your path by running which java, then run readlink on the path it prints. Note that sometimes readlink just returns another symlink, which you need to follow again. Keep going until you reach the real binary, whose path ends in bin/java; JAVA_HOME is the directory that contains that bin directory.
- HADOOP_HOME: needs to point to the bin directory of your downloaded Hadoop version. In our case, set it to /Users/user_name/Downloads/hadoop-*/bin, where hadoop-* is your Hadoop version.
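The two variables above can be set with something like the following. Treat it as a minimal sketch, not the one true way: resolve_link and java_home_from_binary are helper names I made up for illustration, and it assumes JAVA_HOME should be the directory that contains bin/java (two levels above the real binary). It uses plain readlink in a loop because that follows one symlink at a time, just like doing it by hand. HADOOP_HOME follows this post's convention of pointing at the bin directory, and hadoop-* is left as a wildcard since your version will differ.

```shell
#!/bin/sh
# Follow a chain of symlinks by hand until we reach something that is not a symlink.
# Note: plain sh has no local variables, so target/link/real are globals.
resolve_link() {
    target="$1"
    while [ -L "$target" ]; do
        link="$(readlink "$target")"
        case "$link" in
            /*) target="$link" ;;                       # absolute link target
            *)  target="$(dirname "$target")/$link" ;;  # relative to the symlink's own directory
        esac
    done
    printf '%s\n' "$target"
}

# Assumption: JAVA_HOME is the directory containing bin/java,
# i.e. two levels above the resolved binary.
java_home_from_binary() {
    real="$(resolve_link "$1")"
    dirname "$(dirname "$real")"
}

# Only set JAVA_HOME if a java binary is actually on the PATH.
if command -v java >/dev/null 2>&1; then
    JAVA_HOME="$(java_home_from_binary "$(command -v java)")"
    export JAVA_HOME
fi

# HADOOP_HOME: this post's convention is the bin directory of the download.
# If several hadoop-* directories match, the last one in glob order wins.
for dir in "$HOME"/Downloads/hadoop-*/bin; do
    if [ -d "$dir" ]; then
        HADOOP_HOME="$dir"
        export HADOOP_HOME
    fi
done
```

Run echo "$JAVA_HOME" and echo "$HADOOP_HOME" afterwards to check what you got. If the symlink chase lands somewhere odd, macOS also ships /usr/libexec/java_home, which prints a Java home directory directly and may be the simpler route. To keep the variables around between terminal sessions, put the export lines in your shell's startup file.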