Cloudera Impala

If you have a strong background in either databases or distributed systems, and fancy working on such an exciting technology, send me a note!

It’s great to finally be able to say something about what I’ve been working at Cloudera for nearly a year. At StrataConf / Hadoop World in New York a couple of weeks ago we announced Cloudera Impala. Impala is a distributed query execution engine that understands a subset of SQL, and critically runs over HDFS and HBase as storage managers. It’s very similar in functionality to Apache Hive, but it is much, much, much (anecdotally up to 100x) faster.
Continue reading

On some subtleties of Paxos

There’s one particular aspect of the Paxos protocol that gives readers of this blog – and for some time, me! – some difficulty. This short post tries to clear up some confusion on a part of the protocol that is poorly explained in pretty much every major description.
Continue reading