Exactly-once or not, atomic broadcast is still impossible in Kafka – or anywhere


Update: Jay responded on Twitter, which you can read here.

I read an article recently by Jay Kreps about a feature for delivering messages ‘exactly-once’ within the Kafka framework. Everyone’s excited, and for good reason. But there’s been a bit of a side story about what exactly ‘exactly-once’ means, and what Kafka can actually do.

In the article, Jay identifies the safety and liveness properties of atomic broadcast as a pretty good definition for the set of properties that Kafka is going after with their new exactly-once feature, and then starts to address claims by naysayers that atomic broadcast is impossible.

For this note, I’m not going to address whether or not exactly-once is an implementation of atomic broadcast. I also believe that exactly-once is a powerful feature that’s been impressively realised by Confluent and the Kafka community; nothing here is a criticism of that effort or the feature itself. But the article makes some claims about impossibility that are, at best, a bit shaky – and, well, impossibility’s kind of my jam. Jay posted his article with a tweet saying he couldn’t ‘resist a good argument’. I’m responding in that spirit.

In particular, the article makes the claim that atomic broadcast is ‘solvable’ (and later that consensus is as well…), which is wrong. What follows is why, and why that matters.

I have since left the pub. So let’s begin.

Continue reading

Make any algorithm lock-free with this one crazy trick

Lock-free algorithms often operate by having several versions of a data structure in use at one time. The general pattern is that you can prepare an update to a data structure, and then use a machine primitive to atomically install the update by changing a pointer. This means that all subsequent readers will follow the pointer to its new location – for example, to a new node in a linked-list – but this pattern can’t do anything about readers that have already followed the old pointer value, and are traversing the previous version of the data structure.

Continue reading

Distributed systems theory for the distributed systems engineer

Gwen Shapira, SA superstar and now full-time engineer at Cloudera, asked a question on Twitter that got me thinking.

My response of old might have been “well, here’s the FLP paper, and here’s the Paxos paper, and here’s the Byzantine generals paper…”, and I’d have prescribed a laundry list of primary source material which would have taken at least six months to get through if you rushed. But I’ve come to thinking that recommending a ton of theoretical papers is often precisely the wrong way to go about learning distributed systems theory (unless you are in a PhD program). Papers are usually deep, usually complex, and require both serious study, and usually significant experience to glean their important contributions and to place them in context. What good is requiring that level of expertise of engineers?

And yet, unfortunately, there’s a paucity of good ‘bridge’ material that summarises, distills and contextualises the important results and ideas in distributed systems theory; particularly material that does so without condescending. Considering that gap lead me to another interesting question:

What distributed systems theory should a distributed systems engineer know?

A little theory is, in this case, not such a dangerous thing. So I tried to come up with a list of what I consider the basic concepts that are applicable to my every-day job as a distributed systems engineer; what I consider ‘table stakes’ for distributed systems engineers competent enough to design a new system. Let me know what you think I missed!

Continue reading

The Elephant was a Trojan Horse: On the Death of Map-Reduce at Google

Note: this is a personal blog post, and doesn’t reflect the views of my employers at Cloudera

Map-Reduce is on its way out. But we shouldn’t measure its importance in the number of bytes it crunches, but the fundamental shift in data processing architectures it helped popularise.

This morning, at their I/O Conference, Google revealed that they’re not using Map-Reduce to process data internally at all any more.

We shouldn’t be surprised. The writing has been on the wall for Map-Reduce for some time. The truth is that Map-Reduce as a processing paradigm continues to be severely restrictive, and is no more than a subset of richer processing systems.

Continue reading

Paper notes: MemC3, a better Memcached

MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing

Fan and Andersen, NSDI 2013

The big idea: This is a paper about choosing your data structures and algorithms carefully. By paying careful attention to the workload and functional requirements, the authors reimplement memcached to achieve a) better concurrency and b) better space efficiency. Specifically, they introduce a variant of cuckoo hashing that is highly amenable to concurrent workloads, and integrate the venerable CLOCK cache eviction algorithm with the hash table for space-efficient approximate LRU.

Continue reading