Paper Trail

Computer Systems, Distributed Algorithms and Databases

Category: Distributed systems

Distributed systems theory for the distributed systems engineer

Gwen Shapira, SA superstar and now full-time engineer at Cloudera, asked a question on Twitter that got me thinking. My response of old might have been “well, here’s the FLP paper, and here’s the Paxos paper, and here’s the Byzantine generals paper…”, and I’d have prescribed a laundry list of primary source material which would […]

Some miscellanea

CAP FAQ I wrote an FAQ on The CAP Theorem. The aim is to definitively settle some of the common misconceptions around CAP so as to help prevent its invocation in useless places. If someone says they got around CAP, refer them to the FAQ. It should be a pretty simple introduction to the theorem […]

Cloudera Impala

If you have a strong background in either databases or distributed systems, and fancy working on such an exciting technology, send me a note! It’s great to finally be able to say something about what I’ve been working at Cloudera for nearly a year. At StrataConf / Hadoop World in New York a couple of […]

On some subtleties of Paxos

There’s one particular aspect of the Paxos protocol that gives readers of this blog – and for some time, me! – some difficulty. This short post tries to clear up some confusion on a part of the protocol that is poorly explained in pretty much every major description.

FLP and CAP aren’t the same thing

An interesting question came up on Quora this last week. Roughly speaking, the question asked how, if at all, the FLP theorem and the CAP theorem were related. I’d thought idly about exactly the same question myself before. Both theorems concern the impossibility of solving fairly similar fundamental distributed systems problems in what appear to […]

The Theorem That Will Not Go Away

The CAP theorem gets another airing. I think the article makes a point worth making again, and makes it fairly well – that CAP is really about P=> ~(C & A). A couple of things I want to call out though, after a rollicking discussion on Hacker News. “For a distributed (i.e., multi-node) system to […]

CAP confusion: Problems with Partition Tolerance

Over on the Cloudera blog I’ve written an article that should be of interest to readers of this blog. I’m no great fan of the ubiquity of the CAP theorem – it’s a solid impossibility result which appeals to the theorist in me, but it doesn’t capture every fundamental tension in a distributed system. For […]

GFS Retrospective in ACM Queue

This is a really great article. Sean Quinlan talks very openly and critically about the design of the Google File System given ten years of use (ten years!). What’s interesting is that the general sentiment seems to be that the concessions that GFS made for performance and simplicity (single master, loose consistency model) have turned […]

Barbara Liskov's Turing Award, and Byzantine Fault Tolerance

Barbara Liskov has just been announced as the recipient of the 2008 Turing Award, which is one of the most important prizes in computer science, and can be thought of as our field’s equivalent to the various Nobel Prizes. Professor Liskov is a worthy recipient of the award, even if judged alone by her citation […]

OSDI '08: FlightPath: Obedience vs. Choice in Cooperative Services

This is one of my favourite papers from OSDI ’08 (yes, still doing a few reviews, trying to get to five or so before SOSP…). FlightPath is a system developed by some folks mainly at UT Austin for peer-to-peer streaming in dynamic networks. This is a reasonably challenging problem in itself, although one that’s seen […]