While I really liked the main idea of the Lambda Architecture, I think this article is answering my doubts I had. It is really more reasonable to have another stream processing pipeline for treating historical data, rather than separate hadoop or other map-reduce framework. In case you need to reprocess the data, you instantiate another pipeline and stream that historical data through it. Clever!