Hadoop ported to R (and it's trivial) (revolution-computing.com)
17 points by Anon84 on Nov 16, 2009 | 7 comments


Misleading title at best as Hadoop is a framework for the management and execution of map/reduce while the article demonstrates a map/reduce operation in R. I guess Hadoop has been ported to Clojure as well since I can just say:

(apply + (range 10))

And get an answer of 45. Hadoop must have been ported to MySQL as well:

SELECT type, SUM(price) FROM products GROUP BY type

It's not that I mind people pointing out the obvious (map/reduce style constructs have been around forever), but I'm accustomed to a port meaning something being migrated from one platform to another with a comparable feature set.
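To make the point concrete, here is a minimal Python sketch of the same two examples: the sum over a range, and a group-by-type sum of prices. The `products` rows are made-up illustration data, not anything from the article. Nothing here is distributed, which is exactly the point.

```python
from collections import defaultdict
from functools import reduce
from operator import add

# Equivalent of (apply + (range 10)): a reduce over an in-memory sequence.
total = reduce(add, range(10))

# Hypothetical rows standing in for the `products` table (illustration only).
products = [
    ("book", 12.0),
    ("book", 8.0),
    ("toy", 5.0),
]

# Equivalent of SELECT type, SUM(price) ... GROUP BY type:
# treat each row as a (key, value) pair, then sum the values per key.
sums = defaultdict(float)
for kind, price in products:
    sums[kind] += price

print(total)       # 45
print(dict(sums))  # {'book': 20.0, 'toy': 5.0}
```

Everything runs in one process on one machine, with no scheduling, storage, or failure handling, so calling it a "port" of Hadoop would be a stretch.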


What a stupid article. "It's not quite that simple, of course".

So the distributed petabyte-scale filesystem that scales to thousands of nodes, the recent work on append which guarantees fresh writes will be visible to all hosts, the efforts at map-locality which will run your map function on the host where the data split is located, the compression layers to improve I/O throughput...

Trivial?
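Even a toy version of just the map-locality piece takes some thought. The sketch below is a hypothetical greedy assignment, not Hadoop's actual scheduler: the function name `assign_splits`, the one-slot-per-host assumption, and the sample hosts are all invented for illustration.

```python
def assign_splits(split_hosts, free_hosts):
    """Toy locality-aware assignment (hypothetical, not Hadoop's scheduler).

    split_hosts: dict mapping split id -> set of hosts holding a replica.
    free_hosts: list of hosts, assumed to have one free map slot each.
    Returns a dict mapping split id -> (host, is_local).
    """
    free = list(free_hosts)
    assignment = {}
    remote = []
    # First pass: give each split a free host that already stores its data.
    for split, hosts in split_hosts.items():
        local = next((h for h in free if h in hosts), None)
        if local is not None:
            free.remove(local)
            assignment[split] = (local, True)
        else:
            remote.append(split)
    # Second pass: leftover splits take any free host and read remotely.
    for split in remote:
        if not free:
            break  # no slots left; a real system would queue the task
        assignment[split] = (free.pop(0), False)
    return assignment


splits = {"s0": {"a", "b"}, "s1": {"a"}, "s2": {"c"}}
plan = assign_splits(splits, ["a", "b", "d"])
print(plan)  # {'s0': ('a', True), 's1': ('b', False), 's2': ('d', False)}
```

Note that the greedy pass hands host `a` to `s0`, so `s1` ends up reading remotely even though putting `s0` on `b` and `s1` on `a` would have kept both local. That is before replication, failures, stragglers, or slot counts enter the picture.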


I'm sure they could bang that out in an afternoon. A weekend, at most.

</sarcasm>

I really wish people would actually read the MapReduce paper before trying to talk about what it is.


amen, the MapReduce paper is all about the distributed system they built around the core algorithm.

this article would be akin to somebody 5 years ago saying "hey guys, it's easy to implement Google's search engine in 5 lines of R ... here's the calculation for PageRank, the rest is just detailz!"


In addition to the points below about Hadoop's implementation of MapReduce, Hadoop consists of many subprojects which perform complex tasks: HDFS, a reliable, petabyte-scale distributed file system; HBase, a clone of Google's BigTable; Pig and Hive, which provide higher-level syntax, columnar storage in HDFS, and a persistent metadata repository; and ZooKeeper, a coordination service for distributed systems.

I'm not impressed by how REvolution's new management is approaching marketing.


sorry to sound trollish, but the sad part is that this sort of marketing-speak masquerading as technical blogging might actually be able to earn companies 'street cred' with customers. their potential clients might be like "oh wow you guys really CAN implement Google's famous MapReduce or Yahoo's Hadoop in a few lines of R, and you even wrote about it on a techie blog, we'll buy it!"


"It's not quite that simple, of course"

No, it isn't, or else every language with map and reduce functions would have a trivial, one-liner version of Hadoop.

Does anyone have any familiarity with the MapReduce package?



