Consensus Protocols: Two-Phase Commit

Posted: November 27th, 2008 | Author: | Filed under: computer science, Distributed systems, Essay | Tags: , , | 15 Comments »

For the next few articles here, I’m going to write about one of the most fundamental concepts in distributed computing – of equal importance to the theory and practice communities. The consensus problem is the problem of getting a set of nodes in a distributed system to agree on something – it might be a value, a course of action or a decision. Achieving consensus allows a distributed system to act as a single entity, with every individual node aware of and in agreement with the actions of the whole of the network.

 For example, some possible uses of consensus are:

  • deciding whether or not to commit a transaction to a database
  • synchronising clocks by agreeing on the current time
  • agreeing to move to the next stage of a distributed algorithm (this is the famous replicated state machine approach)
  • electing a leader node to coordinate some higher-level protocol

Such a simple-sounding problem has surprisingly been at the core particularly of theoretical distributed systems research for over twenty years. How come? As I see it, the answers are threefold.

Read the rest of this entry »


BigTable: Google's Distributed Data Store

Posted: October 29th, 2008 | Author: | Filed under: computer science, Distributed systems, Paper Walkthrough | Tags: , , , | 2 Comments »

Although GFS provides Google with reliable, scalable distributed file storage, it does not provide any facility for structuring the data contained in the files beyond a hierarchical directory structure and meaningful file names. It’s well known that more expressive solutions are required for large data sets. Google’s terabytes upon terabytes of data that they retrieve from web crawlers, amongst many other sources, need organising, so that client applications can quickly perform lookups and updates at a finer granularity than the file level.

So they built BigTable, wrote it up, and published it in OSDI 2006. The paper is here, and my walkthrough follows.

Read the rest of this entry »


Yahoo's PNUTS

Posted: October 12th, 2008 | Author: | Filed under: computer science, Distributed systems, Paper Walkthrough | Tags: , , , | 1 Comment »

In these politically charged times, it’s important for written media to give equal coverage to all major parties so as not to appear biased or to be endorsing one particular group. With that in mind, we at Paper Trail are happy to devote significant programming time to all the major distributed systems players.

This, therefore, is a party political broadcast on behalf of the Yahoo Party.

PNUTS: Yahoo!’s Hosted Data Serving Platform

(Please note, that’s the first and last time in this article that I’ll be using the exclamation mark in Yahoo’s name, it looks funny.)

As you might expect from the company that runs Flickr, Yahoo have need for a large scale distributed data store. In particular, they need a system that runs in many geographical locations in order to optimise response times for users from any region, while at the same time coordinating data across the entire system. As ever, the system must exhibit high availability and fault tolerance, scalability and good latency properties.

These, of course, are not new or unique requirements. We’ve seen already that Amazon’s Dynamo, and Google’s BigTable/GFS stack offer similar services. Any business that has a web-based product that requires storing and updating data for thousands of users has a need for a system like Dynamo. Many can’t afford the engineering time required to develop their own tuned solution, so settle for well-understood RDBMS-based stacks. However, as readers of this blog will know, RDBMSs can be almost too strict in terms of how data are managed, sacrificing responsiveness and throughput for correctness. This is a tradeoff that many systems are willing to explore.

PNUTS is Yahoo’s entry into this space. As usual, it occupies the grey areas somewhere between a straight-forward distributed hash-table and a fully-featured relational database. They published details in the conference on Very Large DataBases (VLDB) in 2008. Read on to find out what design decisions they made…

(The paper is here, and playing along at home is as ever encouraged).

Read the rest of this entry »


The Google File System

Posted: October 1st, 2008 | Author: | Filed under: computer science, Distributed systems, Operating systems, Paper Walkthrough | Tags: , , , , | 3 Comments »

It’s been a little while since my last technically meaty update. One system that I’ve been looking at a fair bit recently is Hadoop, which is an open-source implementation of Google’s MapReduce. For me, the interesting part is the large-scale distributed filesystem on which it runs called HDFS. It’s well known that HDFS is based heavily on its Google equivalent.

In 2003 Google published a paper on their Google File System (GFS) at SOSP, the Symposium on Operating Systems Principles. This is the same venue at which Amazon published their Dynamo work, albeit four years earlier. One of the lecturers in my group tells me that SOSP is a venue where “interesting” is rated highly as a criterion for acceptance, over other more staid conferences. So what, if anything, was interesting about GFS? Read on for some details…
Read the rest of this entry »


Consistency and availability in Amazon's Dynamo

Posted: August 26th, 2008 | Author: | Filed under: computer science, Distributed systems, Paper Walkthrough | Tags: , , | 12 Comments »

There is a continuing and welcome trend amongst large, modern technology companies like Google, Yahoo and Amazon to publish details of their systems at academic conferences. One of the problems that researchers at universities have is making a convincing case that their ideas would work well in the real world, since no matter how many assumptions are made there really is no substitute for field testing, and the infrastructure, workloads and data just aren’t available to do that effectively. However, companies have infrastructure to burn and a genuine use-case with genuine users. Using their experience and data to discover what does and doesn’t work, and what is and is not really important provides an invaluable feedback loop to researchers.

More than that, large systems are built from a set of independent ideas. Most academic papers leave the construction of a practical real-world system as an exercise for the reader. Synthesising a set of disparate techniques often throws up lots of gotchas which no papers directly address. Companies with businesses to run have a much greater incentive to build a robust system that works.

At 2007′s Symposium on Operating Systems Principles (SOSP), Amazon presented a paper about one of their real-world systems: “Dynamo: Amazon’s Highly Available Key-value Store”. It wound up winning, I think, the audience prize for best paper. In this post, I was planning to describe Dynamo ‘inside-out’, based on a reading group mandated close reading of the paper. However, trying to lucidly explain a dense 12 page paper leads to many more than 12 pages of explanation. So instead, I want to focus on one particular aspect of Dynamo which I think is the most interesting.

Read the rest of this entry »


In Defense Of Computer Science

Posted: January 30th, 2008 | Author: | Filed under: computer science, Essay | Tags: , | 2 Comments »

Computer science, as an academic discipline, has been the subject of a great deal of scrutiny lately. Of particular interest has been the worth of a CS degree to us, the fee-paying consumers who apparently want nothing more than to transform their college dollars into CV-ready bullet points that will smooth the path to the cubicle job of our dreams.Many of the arguments in these recent exchanges have been predictable. Java is used as a placeholder for all that is wrong with the subject, because teaching it as an introductory language sacrifices pedagogical merit for practical applicability. Proponents of its use retort that they’ll never use type theory anyhow, so what use is it learning it in the first place? The reply is that if they knew about it, they’d get a job where they had to know about it, and so on.

What has surprised me most is that no-one has stood up and defended computer science on its own merits: as an academic subject of some breathtaking beauty and profundity.

Read the rest of this entry »