In his famous book, The God Delusion, Richard Dawkins draws a comparison between two hypotheses for our existence in this universe:

  • Either there is a complex intelligent entity, a God, who created this universe and fine-tuned it to be suitable for life.
  • Or, through some natural-selection-driven evolution, we can trace a path of infinite regress leading back to utter simplicity that explains the origin of life, something modern science has yet to offer an explanation for.

Dawkins’ Razor would be that the simpler hypothesis is always better, since the more complicated one (God, in his opinion) introduces more questions than it answers, a major argument discussed by many atheists.

However, what I fail to understand is their complete denial of the idea of God, even one that itself represents some step of the infinite regress. I am not myself a follower of such logic, but I wonder why the idea of God is so instantly tabooed (I wouldn’t say refuted; I strongly think it couldn’t be, and won’t be) by atheists. In their convoluted logic, why wouldn’t they consider the idea that our life can be traced back (regresses) to ever-simpler forms, until at some point it gives way to the creator? Who said this has to be the end of the story? As much as they don’t know exactly how life started, they also wouldn’t know how God came to be. Why the persistence on keeping God off the table?

Now I know that Darwin’s, and later Dawkins’, idea of infinite regress claims that each step is more complicated than the one preceding it. And I’m asking: why can’t a complicated entity create something simpler, in the same sense that an artist creates his art, or an engineer creates a marvel of construction?

SVD is probably the single most successful technique used for recommendations in collaborative filtering. It was used in the Netflix Prize contest, adding recommender systems to the very long list of problems it can solve. It’s not an easy solution though: not easy to understand at first, and not easy to implement either.

A simpler form of the decomposition is called UV, and it’s widely known among students of data mining and information retrieval as a simple algorithm that does a lot of heavy lifting in a variety of problems. This post is my attempt at building a scalable version of the simple UV algorithm described in Jeff Ullman’s book Mining of Massive Datasets, chapter 9. For a class project, I used Hadoop to decompose a 2.5 GB Netflix dataset on a cluster of 88 machines. The source code is available here, as well as a serial version.
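
For reference, the core of the UV algorithm is a single closed-form update, applied to one element at a time and swept over U and V until the RMSE stops improving. As I read chapter 9 (so treat this as my paraphrase, not gospel), with everything else held fixed, the optimal value for an element u_{rs} of U is

    u_{rs} = \frac{\sum_j v_{sj} \left( m_{rj} - \sum_{k \neq s} u_{rk} v_{kj} \right)}{\sum_j v_{sj}^2}

where j ranges only over the columns in which the rating m_{rj} is actually present. The update for an element of V is symmetric, and the Hadoop exercise is really about computing those sums at scale.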

One thing to note about working on a serious Hadoop project: Hadoop can easily drive you crazy. You go through so much trouble (you actually manage to pull a few hairs out) just setting up your distributed cluster on 88 nodes and running your WordCount program. Then development is horrifically slow because of all the unexplained bugs, the inability to debug properly, and the sluggish performance of a cluster that’s been cursed by Hadoop. But then, right before you get the epiphany that programming is just not your thing and you’d rather chase your dream of being a professional WoW player, you fix a bug, rename a file, delete a certain log, or eliminate some other really dumb excuse for a fault-tolerant framework not to work, and everything just works beautifully!

A hidden treasure of obfuscated C code!! beau.ti.ful!

www.ioccc.org/2006/hamre/hint.text

Interesting, on how we think and more.

How fast do you think? – The Two Systems that define us

Your computer’s pseudo-randomness not good enough for your lottery application? Try this. It gives you true randomness by analyzing atmospheric noise. Neat idea, huh!

I wonder how they actually do it. If some important encryption technique is based on this service, how secure is it? Can’t their sensors be somewhat manipulated to provide a predictable stream of not-so-random-anymore numbers?
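
If you want to poke at the service itself, it exposes a dead-simple plain-text HTTP interface; the parameters below are from their documented integer generator, as best I recall it, so double-check them on the site:

> curl "http://www.random.org/integers/?num=5&min=1&max=100&col=1&base=10&format=plain&rnd=new"

That should print five random integers between 1 and 100, one per line.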

RANDOM.ORG – True Random Number Service

Last month I had a project for my advanced databases course: implementing a simple iterative UV matrix decomposition algorithm using Hadoop. This post is just about the first step, setting up Hadoop, not the algorithm itself. I’m gonna talk about that in a future post.

For those of you noobs like me who never tried Hadoop on anything more complicated than Apache’s boring copy & paste WordCount, it can be a real pain in the butt to run anything bigger, simply because it’s one of the worst-documented, most badly written pieces of software ever to implement a really nice idea!

One good idea is to compile Hadoop from scratch rather than use the pre-compiled libraries. This way you should be able to debug and run your Hadoop applications easily. For that, take a look at this video tutorial.

In this post however, I list very simple steps to quickly get Hadoop running on a Mac. So here we go:

1. Download Hadoop

You can get the latest stable version from here: http://www.apache.org/dyn/closer.cgi/hadoop/common/ or you can go to the archive to find older versions. Sometimes you need a specific version for compatibility; some tutorials recommend a Hadoop version earlier than 0.21.0 so that it works with the infamous Hadoop Eclipse plugin. I tried the plugin; it’s definitely not worth it. You’re better off either compiling your own Hadoop or just following these steps if you don’t need Eclipse’s step debugging.
Let’s assume you downloaded a version and decompressed it into your ~/Downloads directory.
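
Just to make that concrete, the whole step boils down to something like the following (the version number and archive URL are only examples; grab the exact link from the pages above):

> cd ~/Downloads
> curl -O http://archive.apache.org/dist/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
> tar xzf hadoop-0.20.2.tar.gz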

2. Setting Environment Variables

You need two things set. Either export them temporarily, or put the exports in your .bashrc or .bash_profile:
  • JAVA_HOME: needs to point to the directory of your Java binaries. To find it, first see which Java binary is in your path by running which java, then run readlink on the path it prints. Note that sometimes readlink just returns another symlink, which you need to follow again; do that till you reach the actual directory containing the JRE’s java binary (see the sketch right after this list).
  • HADOOP_HOME: needs to point to the bin directory of your downloaded Hadoop version. In our case, set it to ~/Downloads/hadoop-*/bin, where hadoop-* is your Hadoop version.
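
Here’s roughly what the hunt for JAVA_HOME looks like on my machine, followed by the two exports (these paths and the Hadoop version are examples; yours will almost certainly differ, so follow your own symlink chain):

> which java
/usr/bin/java
> readlink /usr/bin/java
/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/java

Then, in your ~/.bash_profile:

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands
export HADOOP_HOME=~/Downloads/hadoop-0.20.2/bin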

3. Configuration and Local SSH

Follow Apache’s pseudo-distributed operation steps for configurations and passphraseless SSH; they’re standard. Here.
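
If you just want the SSH part without digging through that page, this is the usual passphraseless setup (quoted from memory of the Apache guide, so double-check against it):

> ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
> cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
> ssh localhost

On a Mac you also need Remote Login enabled under System Preferences → Sharing, otherwise the ssh to localhost will just be refused.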

4. Run it!

First, you need to tell Hadoop to set up its file system by running a format command:
> bin/hadoop namenode -format
Then start the daemons:
> bin/start-all.sh
You should see a few lines appear indicating the starting of the namenode, the datanode, the secondarynamenode, the jobtracker, and the tasktracker.
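
A quick sanity check is to run jps (it ships with the JDK) and make sure the daemons actually stayed up:

> jps

If NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker are all listed (plus Jps itself), you’re in business. The exact process names can vary a bit between Hadoop versions.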

5. Monitor Your Jobs

You can use Hadoop’s graphical administration interface for the jobtracker. By default it’s at http://localhost:50030/jobtracker.jsp, where you can see all your running, failed, and retired jobs.

such a blessing! 🙂

Laughing Quadruplets – The Next Day (by Steve Mathias)

Tyrannosaurus is the best!

How Animals Eat Their Food (by MisterEpicMann)

TheLastPsychiatrist strikes again, on how the system sends misleading signals to women (at least that’s what I gathered from his mind-boggling-metaphor-infested writing!). He argues that women are picking the wrong fights with the system, striving for the trappings of power, promotions, management, and workplace prestige, not for any real reason but just for the sake of it. Instead, they should fight for real chances to change the unbalanced workforce, and that, in his opinion, is all about how much money you make, how long you work, and other material benefits.

One quote I especially liked:
“It’s so great that Americans will still vote for a white guy even if he’s a little black”

Hilarious gifs for moments in a developer’s life! It’s just awesome to use moments from House; the guy has such facial expressions!!