Some miscellanea


I wrote an FAQ on The CAP Theorem. The aim is to definitively settle some of the common misconceptions around CAP so as to help prevent its invocation in useless places. If someone says they got around CAP, refer them to the FAQ. It should be a pretty simple introduction to the theorem as well. I think that CAP itself is a pretty uninteresting result, but it does at least shine a light on tradeoffs implicit in distributed systems. I have a couple of residual thoughts about failures rather than partitions that I might write up at some point in the future, but otherwise I hope the FAQ helps move the conversation on.

Impala and aggregation trees

I also wrote a quick answer on Quora about Impala’s execution trees, and how deeper trees help do aggregation more effectively. There’s a lot more of interest to write about planning, partitioning and scheduling queries.

Not so HotOS

Matt Welsh talks about what he wishes systems researchers would work on. I partially agree: there are few papers in the HotOS 2013 program that pass the “if this were true, a lot of our assumptions would be wrong” test. I don’t think that IO is an unworthy topic however, but the main problem with improving IO interfaces is inertia rather than want for better ideas. I hope that SSDs and / or flash-as-RAM will be forcing functions here. Otherwise his big topics are all worthy research challenges, but I have always seen HotOS as a venue for new ideas, not new topics – despite its name. If it really were intended to be a venue for the fashionable areas of study in systems it would have much less potential value.

Building distributed systems

Andy Gross (Basho VP Eng) gave a closing keynote calling for more reusable primitives in distributed systems. I couldn’t agree more (and have been gently complaining about this for years to anyone who would listen), although I think doing this right is not straightforward and requires a lot of careful thought about RPC mechanisms, libraries vs. services and much more. I have some thoughts on this that I want to wrap up in a blog post sometime soon.

Highly available transactions

Peter Bailis, a Berkeley PhD candidate, has some solid work on Highly Available Transactions, based on exploring the space of consistency models available to databases and finding that there are demonstrably useful consistency guarantees that can be made when availability is kept high. This is the right way to move on from all the CAP arguments – given that we are stuck with certain impossibility results, let’s map out the vast terrain that we can conquer.

CAP confusion: Problems with Partition Tolerance

Over on the Cloudera blog I’ve written an article that should be of interest to readers of this blog.

I’m no great fan of the ubiquity of the CAP theorem – it’s a solid impossibility result which appeals to the theorist in me, but it doesn’t capture every fundamental tension in a distributed system. For example: we make our systems distributed across more than one machine usually for reasons of performance and to eliminate a single point of failure. Neither of these motivations are captured verbatim by the CAP theorem. There’s more to designing distributed systems!

In this, I agree with Stonebraker; it’s the erroneous representation of ‘partition tolerance’ that I found very strange. I’ve been a good deal more forceful in private about this than I have in public 🙂

GFS Retrospective in ACM Queue

This is a really great article. Sean Quinlan talks very openly and critically about the design of the Google File System given ten years of use (ten years!).

What’s interesting is that the general sentiment seems to be that the concessions that GFS made for performance and simplicity (single master, loose consistency model) have turned out to probably be net bad decisions, although they probably weren’t at the time.

There are scaling issues with GFS – the well known many-small-files problem that also plagues HDFS, and a similar huge-files problem. Both increase the amount of metadata that a master has to maintain, and both therefore degrade performance due to some linear costs on metadata operations.

A replacement for GFS is in the works – a complete re-do, not just incremental updates to the design. This will involve, at least, a rethink of metadata storage and clearly a more fault-tolerant and scalable design. Multiple appenders, also a bugbear of HDFS, may be serialised through a single writer process, which would help the consistency issues.

“Our user base has definitely migrated from being a MapReduce-based world to more of an interactive world that relies on things such as BigTable. Gmail is an obvious example of that. Videos aren’t quite as bad where GFS is concerned because you get to stream data, meaning you can buffer. Still, trying to build an interactive database on top of a file system that was designed from the start to support more batch-oriented operations has certainly proved to be a pain point.”

One of those great articles where every paragraph is rich with insight. Worth reading, probably twice.


What do you know, StackOverflow has its useful moments.

The question in, er, question relates to an issue I had been having myself with the behaviour of lexical scoping in Python, but had been able to work around sufficiently easily to not devote time to finding a proper solution. This in itself is an unfortunate truth of work; there isn’t enough time to investigate every interesting problem thoroughly. StackOverflow might just turn out to help with that niche: I presume that over time it’s going to evolve into a Not-So-Frequently-Asked-But-Still-Interesting-Questions repository. I’ve put some effort myself into answering some questions about basic computer science and distributed systems. The reputation farming I can do without, the badges are quite a neat feature – they encourage participation passively. The problem is that there really is a blind-leading-the-blind feel to some question answers, especially in the areas of data structures and the ever popular and yet heavily abused ‘big-O’ notation. If people speak authoritatively enough then they will be taken as authoritative, garner reputation points which serve as a feedback loop, amplifying their authority on future occasions. Unfortunately, there seem to be more people willing to upvote correct sounding answers than those who know whether an answer is actually correct. Time will tell if that is a general truth or just an early adoption issue. I suspect, alas, that it might be the latter.