Your jemalloc case is using 1564672 kB, which means you only have 764 hugepages. ...

antirez · on Nov 3, 2014

This was my first thought as well, but actually this is not what is happening AFAIK, and the performance hit is likely due to inefficient huge page allocation. There are reasons I believe this, but I'm actually checking in a more systematic way right now before saying random things.

EDIT: you were exactly right. This is what happens: there are 50 clients in the benchmark, with many queued requests, and since the benchmark is designed to touch all the keys evenly, every client served in a given event loop cycle has a big chance of hitting a page fault. This seemed unrealistic to me, since I saw the spike in a single event loop cycle, but that is how it actually works. Thanks!
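
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes 2048 kB huge pages (the usual x86-64 size, not stated in the thread), one key access per client per event loop cycle, and accesses spread uniformly over the pages. The function names are made up for illustration only.

```python
# Figure quoted in the thread.
ANON_HUGE_KB = 1564672
# Assumption: 2 MB (2048 kB) huge pages, the common x86-64 default.
HUGE_PAGE_KB = 2048
# Number of benchmark clients mentioned in the thread.
CLIENTS = 50


def huge_page_count(total_kb, page_kb=HUGE_PAGE_KB):
    """Number of whole huge pages backing total_kb of memory."""
    return total_kb // page_kb


def expected_distinct_pages(pages, accesses):
    """Expected number of distinct pages hit by `accesses` uniform random
    accesses over `pages` pages (standard occupancy formula)."""
    return pages * (1 - (1 - 1 / pages) ** accesses)


pages = huge_page_count(ANON_HUGE_KB)
touched = expected_distinct_pages(pages, CLIENTS)

print(pages)              # 764
print(round(touched, 1))  # close to one distinct huge page per client
```

Under these assumptions the 50 clients land on nearly 50 different huge pages in a single cycle, so if those pages still have to be faulted in, almost every client pays for a fault in that same cycle.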