“Well, I’m back.”

Astute readers may have noticed that this blog was unavailable for the past three-and-a-half months. The short reason for this was that the computer on which Paper Trail was hosted was on a boat in the Atlantic and it is surprisingly hard to get a good internet connection out there, let alone a power supply to the shipping crate it was in.

The long reason was that I have now moved across the ocean, from Cambridge to San Francisco, where I am living in order that the commute to Cloudera be a little shorter than 24 hours. My possessions finally arrived on Thursday – over three months after we shipped them – and we are now getting everything in order, including getting this blog back online. I’m now hosted at Bluehost, and have transferred all the old posts over to the new WordPress installation. I don’t yet have access to the images, so unfortunately those will have to wait, but most other things are in order. Please excuse the dust as I find out which links are broken.

The old address – http://hnr.dnsalias.net/wordpress – will still work, but the correct link is now our very own domain: http://the-paper-trail.org/. Hopefully we will start showing up in Google again when the crawlers do their thing. Please update your RSS readers – those of you who are still following and haven’t deleted my feed in disgust. Does anyone even use RSS readers anymore?

I am in the process of deciding what to write about for the next few posts. I still have the remainder of the theory of computation posts to write, which would be fun to do but is a bit of a departure from the systems focus. I never really fully explained Byzantine Fault Tolerance – at least, I never got as far as describing Zyzzyva and other modern systems. At the same time, some interesting stuff in systems research has happened since the blog went quiet – Google released the Go programming language, which is intriguing for writing user-space systems software. SOSP 2009 happened, with some very cool papers which I really want to write about. And I’ve been busy myself – I was recently made a committer on the Apache ZooKeeper project, which is a distributed coordination system written by some engineers and researchers at Yahoo!, and is very cool. My largest contribution was a patch for ‘observers’ – which are learners, in Paxos terminology – which help maintain the read performance of the cluster as the number of clients scales.

So, lots going on, plenty to write about, and some exciting possibilities coming down the queue. Good to be back.

Diagrams, and the state of the union

By popular request, I’ve started retrospectively adding some diagrams to articles that really need them. First to get the treatment has been two-phase commit – by far the most popular article on this blog. The Dynamo article will be the next up, then 3PC and maybe the GFS and BigTable entries.

I’m using an old version of OmniGraffle, which came installed on my PowerBook G4. I recently replaced the power supply board in the G4 (which involved some hair-raising open Mac surgery) and am delighting in having all these great applications at my fingertips again. OmniGraffle makes diagrams for things like this so very easy. The effort expended in producing a diagram is far less than that for writing 1000 words, so if the old adage is true this is a very efficient way of producing content.

Although Real Work is consuming a lot of my time at the moment, I’ve been pretty good at finding time here and there to write for this blog. I’ve got two outward-facing goals here. The first is to make available some clear explanations of basic distributed systems theory and practice. I think there’s a niche for good work here – I don’t think any textbooks I have read treat practice in a sufficiently theoretical way, and the theory textbooks can be too abstract to be accessible. Therefore I’ve been writing articles like the tour of FLP impossibility, the aforementioned introduction to two-phase and three-phase commit and the discussion of consensus in the context of lossy links. Continuing in this vein, I have plans to talk about Paxos, failure detectors, distributed spanner construction and some more simple, fundamental distributed algorithms such as leader election.

My second ‘public-facing’ goal is to survey some of the more interesting (and occasionally less interesting) systems research, with a particular emphasis on real systems that exist and work. Hence the GFS, BigTable and PNUTS articles, and the recent series on OSDI papers. Part of my day job is being familiar with OSDI, NSDI, SOSP, HotOS, Mobi* etc. conferences and workshops, and by writing the articles I get the chance to consolidate my understanding, which is highly useful.

I’d be very interested to hear, by mail or by comment, if there are any particular topics that you’d like me to cover. I suspect the imminent article on Paxos will be popular (executive summary: it’s not that hard, especially if you already understand 3PC), but otherwise it’s hard to gauge what people are looking forward to reading on this blog. I would even enjoy picking up the series on algorithms that I started and then abandoned. So help me out!