10x speedup in write performance in Riak Innostore based on keyname (basho.com)
63 points by LiveTheDream on May 22, 2011 | hide | past | favorite | 11 comments


Be careful with this trick. Although it appears to work well for Riak, it can cause lots of problems for BigTable style stores, like HBase, since it bottlenecks all the writes through one node.

See: http://ikaisays.com/2011/01/25/app-engine-datastore-tip-mono...

HBase (and I've heard BigTable) work best with purely random row keys - like the author of this post was using initially.


It should be noted that Riak uses consistent hashing (where you route a key based on the Murmur or FNV hash of its md5 checksum) and virtual nodes. That means that even if keys are next to each other, they will get routed to different virtual nodes. And even if a single virtual node is "hot", virtual nodes do not map 1-to-1 onto physical nodes.
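A toy sketch of that idea (consistent hashing with virtual nodes) in Python, with hypothetical node names standing in for Riak's actual Erlang ring implementation:

```python
import bisect
import hashlib

def ring_position(s):
    # Position on the ring: first 8 bytes of the MD5 digest as an integer
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each physical node claims many virtual positions on the ring
        self.ring = sorted((ring_position(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or past the key's position
        i = bisect.bisect(self.points, ring_position(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
# Adjacent keys hash to unrelated ring positions, so sequential keys
# spread across all physical nodes instead of piling onto one:
owners = {ring.node_for(f"user:{i}") for i in range(100)}
```

Because the key's ring position comes from a hash rather than the key itself, `user:1` and `user:2` land nowhere near each other, which is why the monotonic-key hotspot problem doesn't arise the way it does with token ranges.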

BigTable uses token ranges, which allows for range queries but makes it vulnerable to this kind of situation. The trick shouldn't be needed with BigTable or HBase anyway, since they use LSMs instead of a conventional B-tree (which InnoDB -- the engine Innostore wraps -- is): all writes and updates are strictly sequential, so this kind of "trick" is not needed.

[Disclaimer: Voldemort developer here, we use consistent hashing, virtual nodes and -- by default -- a log structured B+Tree from BerkeleyDB Java Edition]


Yeah, I realized that the first time I was looking at someone else's HBase and they had a primary key of timestamp.toString.reverse. ;)
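The `timestamp.toString.reverse` trick being alluded to, sketched in Python (illustrative values, not the original code):

```python
# Five sequential timestamps -- as row keys, these would all sort into
# the same region of a range-partitioned store and hotspot one node.
ts = [1306000000 + i for i in range(5)]

# Reversing the string moves the fast-changing digit to the front,
# so consecutive timestamps get wildly different key prefixes.
keys = [str(t)[::-1] for t in ts]
```

The cost, of course, is that you lose the ability to do efficient range scans over time, which is often the whole reason the timestamp was the key in the first place.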


This is also a best practice in CouchDB: as long as you use the UUIDs that Couch generates for you (via the /_uuids API endpoint), you'll get keys designed to minimize the work the B-tree has to do to insert them.


There are basically two common types of UUIDs: version 1 and version 4.

UUID1 is generated from the MAC address of the machine plus a timestamp plus random bits, while UUID4 is completely random. Sometimes you want one, sometimes the other.

You can try these in python as:

    >>> import uuid
    >>> uuid.uuid1()
    >>> uuid.uuid4()
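The difference matters for key ordering. Version-1 UUIDs embed a 60-bit timestamp, so successive values carry increasing time fields, while version-4 values have no relationship to insertion order. (Note that CouchDB's "sequential" UUIDs use their own scheme; this just illustrates the two standard versions.)

```python
import uuid

# Version 1: embeds a timestamp (100 ns ticks since 1582) in the .time field,
# so values generated in sequence are time-ordered within a process.
a, b = uuid.uuid1(), uuid.uuid1()

# Version 4: 122 random bits, no ordering relationship at all.
c, d = uuid.uuid4(), uuid.uuid4()
```

Time-ordered keys cluster inserts near the rightmost leaf of a B-tree (good for InnoDB-style storage); random keys scatter inserts across the whole tree but spread load evenly in a range-partitioned store.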


IMHO, for this whole class of problems -- needing to log a big amount of data over time, for years -- the way to go is not Riak, nor Redis, nor <put your preferred DB name here>, but simply writing to files in append-only mode (and, when you can, using fixed-size records for fast access later).

There are good reasons, IMHO, for writing a small networked C server to do this work.
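A minimal sketch of the append-only, fixed-size-record approach (in Python rather than C, with a made-up 64-byte record layout):

```python
import os
import struct
import tempfile

# 64-byte record: timestamp (f64), value (i32), zero-padded 52-byte payload
REC = struct.Struct("<di52s")

def append(path, ts, val, msg):
    with open(path, "ab") as f:               # append-only: writes land at EOF
        f.write(REC.pack(ts, val, msg.encode()[:52]))

def read_nth(path, n):
    with open(path, "rb") as f:
        f.seek(n * REC.size)                  # fixed size => O(1) random access
        ts, val, raw = REC.unpack(f.read(REC.size))
        return ts, val, raw.rstrip(b"\x00").decode()

path = os.path.join(tempfile.mkdtemp(), "events.log")
for i in range(3):
    append(path, 1306000000.0 + i, i, f"event-{i}")
```

The fixed record size is what makes the "fast access later" part work: record `n` lives at byte offset `n * 64`, so no index is needed.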


> There are good reasons IMHO for writing a small networked C server doing this

Right, because:

https://github.com/cloudera/flume

https://github.com/facebook/scribe/wiki

http://sna-projects.com/kafka/

http://www.freebsd.org/cgi/man.cgi?query=syslogd&sektion...

... don't exist?

(Formulation courtesy of abhay, plug for Kafka mine)


If there is something already great at doing this, sure, no need. I haven't checked, however, so I can't speak to these specific projects.


Well, yes. Exactly.

Big ups to the Homo Sapiens posse.

- Lil' B


I've recently discovered that when you know almost nothing about the problem someone is trying to solve it is both easy and attractive to speculate about how much better your solution is than the one reached by the people who have to solve it. Full time. For money. I see you have discovered the same. Great minds think alike?

- Lil' B


Good point... but another good one, related to yours, is that people often do everything to avoid relaxing the specifications, and end up with complex designs.



