Paper Trail

Computer Systems, Distributed Algorithms and Databases

Category: Uncategorized

Make any algorithm lock-free with this one crazy trick

Lock-free algorithms often operate by having several versions of a data structure in use at one time. The general pattern is that you can prepare an update to a data structure, and then use a machine primitive to atomically install the update by changing a pointer. This means that all subsequent readers will follow the […]
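The prepare-then-install pattern described above can be sketched in a few lines of Java. This is only an illustration under assumptions: the class name `CopyOnWriteList` and its methods are hypothetical, and `AtomicReference.compareAndSet` stands in for the "machine primitive" the excerpt mentions. It is not taken from the post itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical example class: a list whose updates are installed by swapping a pointer.
class CopyOnWriteList {
    // The single shared pointer to the current (immutable) version.
    private final AtomicReference<List<Integer>> current =
            new AtomicReference<>(Collections.<Integer>emptyList());

    // Readers just follow the pointer; the version they get never changes under them.
    List<Integer> snapshot() {
        return current.get();
    }

    // Writers prepare a new version privately, then try to install it atomically.
    void add(int value) {
        while (true) {
            List<Integer> old = current.get();
            List<Integer> updated = new ArrayList<>(old);
            updated.add(value);
            // compareAndSet succeeds only if no other writer installed a version
            // in the meantime; otherwise retry against the newer version.
            if (current.compareAndSet(old, Collections.unmodifiableList(updated))) {
                return;
            }
        }
    }
}
```

A reader that called `snapshot()` before an `add` keeps seeing the old version, while later readers see the new one. The cost of this simple sketch is a full copy per update.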

The Elephant was a Trojan Horse: On the Death of Map-Reduce at Google

Note: this is a personal blog post, and doesn’t reflect the views of my employers at Cloudera. Map-Reduce is on its way out. But we shouldn’t measure its importance by the number of bytes it crunches, but by the fundamental shift in data processing architectures it helped popularise. This morning, at their I/O Conference, Google revealed […]

ByteArrayOutputStream is really, really slow sometimes in JDK6

TLDR: Yesterday I mentioned on Twitter that I’d found a bad performance problem when writing to a large ByteArrayOutputStream in Java. After some digging, it appears to be the case that there’s a bad bug in JDK6 that doesn’t affect correctness, but does cause performance to nosedive when a ByteArrayOutputStream gets large. This post explains […]

On Raft, briefly

Raft is a new-ish consensus implementation whose great benefit, to my mind, is its applicability to real systems. We briefly discussed it internally at Cloudera, and I thought I’d share what I contributed, below. There’s an underlying theme here regarding the role of distributed systems research in practitioners’ daily work, and how the act […]


Reasoning about Knowledge Toward a Cloud Computing Research Agenda (2009) – “One of the LADIS attendees commented at some point that Byzantine Consensus could be used to improve Chubby, making it tolerant of faults that could disrupt it as currently implemented. But for our keynote speakers, enhancing Chubby to tolerate such faults turns out to […]

Something a bit different: translations of classic mathematical texts (!)

During his retirement, my father has been able to spend much time indulging his love of mathematics. This included, amongst other impressive endeavours, attending Cambridge at a more advanced age than average to take (and pass!) the Part III of the Mathematical Tripos, often considered one of the hardest taught courses in maths in the […]

EuroSys 2012 blog notes

EuroSys 2012 was last week – one of the premier European systems conferences. Over at the Cambridge System Research Group’s blog, various people from the group have written notes on the papers presented. They’re very well-written summaries, and worth checking out for an overview of the research presented. Day 1 Day 2 Day 3

Should I take a systems reading course?

A smart student asked me a couple of days ago whether I thought taking a 2xx-level reading course in operating systems was a good idea. The student, understandably, was unsure whether talking about these systems was as valuable as actually building them, and also whether, since his primary interest is in ‘distributed’ systems, he stood […]

I’m talking at Strata Conference 2012

I’ll be giving a talk at this year’s Strata Conference in Santa Clara, on February 29th. My talk is called Monitoring Apache Hadoop – A Big Data Problem?. I’d be lying if I said that every slide was fully realised at this point, but you can read the abstract to see what I’ve committed myself […]

How consistent is eventual consistency?

This page, from the ‘PBS’ team at Berkeley’s AMPLab, is quite interesting. It allows you to tweak the parameters of a Dynamo-style system, then runs a series of Monte Carlo simulations to estimate the likelihood that a read issued after a write returns stale data. Since the Dynamo paper appeared and really popularised eventual consistency, the […]
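To make the idea of estimating staleness by simulation concrete, here is a rough Java sketch. It is not the PBS team's code or model: the exponential latencies with mean 1.0, the class name `StalenessSketch`, and the parameters `n`, `r`, `w` (replicas, read quorum, write quorum) and `t` (delay between write commit and read) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: Monte Carlo estimate of stale reads in a Dynamo-style quorum system.
class StalenessSketch {
    // Assumed latency model: exponentially distributed with the given mean.
    static double exp(Random rng, double mean) {
        return -mean * Math.log(1.0 - rng.nextDouble());
    }

    // Fraction of trials in which a read started t after commit sees no fresh replica.
    static double staleProbability(int n, int r, int w, double t, int trials, long seed) {
        Random rng = new Random(seed);
        int stale = 0;
        for (int trial = 0; trial < trials; trial++) {
            double[] writeArrival = new double[n]; // when each replica applies the write
            double[] ackArrival = new double[n];   // when the coordinator hears its ack
            for (int i = 0; i < n; i++) {
                writeArrival[i] = exp(rng, 1.0);
                ackArrival[i] = writeArrival[i] + exp(rng, 1.0);
            }
            double[] sortedAcks = ackArrival.clone();
            Arrays.sort(sortedAcks);
            double commit = sortedAcks[w - 1];     // write commits on the w-th ack
            double readStart = commit + t;

            double[] responseTime = new double[n];
            boolean[] fresh = new boolean[n];
            Integer[] order = new Integer[n];
            for (int i = 0; i < n; i++) {
                double requestDelay = exp(rng, 1.0);
                // A replica answers with the new value only if the write reached it first.
                fresh[i] = writeArrival[i] <= readStart + requestDelay;
                responseTime[i] = requestDelay + exp(rng, 1.0);
                order[i] = i;
            }
            // The read coordinator uses the first r responses to arrive.
            Arrays.sort(order, (a, b) -> Double.compare(responseTime[a], responseTime[b]));
            boolean anyFresh = false;
            for (int k = 0; k < r; k++) {
                anyFresh |= fresh[order[k]];
            }
            if (!anyFresh) {
                stale++;
            }
        }
        return (double) stale / trials;
    }
}
```

With strict quorums (`r + w > n`) every read quorum overlaps the replicas that acknowledged the write, so this sketch reports no stale reads. With partial quorums such as `n = 3, r = 1, w = 1`, the estimate depends on the assumed latencies and on `t`.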