OSDI '08: FlightPath: Obedience vs. Choice in Cooperative Services

This is one of my favourite papers from OSDI ’08 (yes, still doing a few reviews, trying to get to five or so before SOSP…). FlightPath is a system developed by some folks mainly at UT Austin for peer-to-peer streaming in dynamic networks. This is a reasonably challenging problem in itself, although one that’s seen a good deal of work before. However, the really cool thing about this paper is that they treat participants in the network as potentially rational agents. Since Lamport’s seminal work on the Byzantine generals problem, it’s been standard practice to assign one of two behaviour modes to members of distributed systems: either you’re alturistic, which means that you do exactly what the protocol tells you to do, no matter what the cost to yourself, or Byzantine, which means that you do whatever you like, again no matter what the cost to yourself.

It was realised recently that this is a false dichotomy: there’s a whole class of behaviour that’s not captured by these two extremes. Rational agents participate in a protocol as long as it is worth their while to do so. At its most simple, this means that rational agents will not incur a cost unless they expect to recoup a benefit that is worth equal to or more than the original cost to them. This gave rise to the Byzantine-Alturistic-Rational (BAR) model, due to the same UTA group, which can be used to more realistically model the performance of peer-to-peer protocols.
Continue reading

OSDI '08: Corey, an operating system for many cores

Just before Christmas, the systems community held one of its premier conferences – Operating Systems Design and Implementation (OSDI ’08). This biannual conference showcases some of the best research in operating systems, networks, distributed systems and software technology from the past couple of years.

Although I wasn’t lucky enough to go, I did grab a copy of the proceedings and had a read through a bunch of the papers that interested me. I plan to post summaries of a few to this blog. I see people ask repeatedly on various forums (fora?) “what’s new in computer science?”. No-one seems to give a satisfactory answer, for a number of reasons. Hopefully I can redress some of the balance here, at least in the systems world.

Without further ado, I’ll get stuck in to one of the OSDI papers: Corey: an operating system for many cores by Boyd-Wickizer et al from a combination of MIT, Fudan University, MSR Asia and Xi’an Jiaotong University (12 authors!). Download the paper and play along at home, as usual.
Continue reading

BigTable: Google's Distributed Data Store

Although GFS provides Google with reliable, scalable distributed file storage, it does not provide any facility for structuring the data contained in the files beyond a hierarchical directory structure and meaningful file names. It’s well known that more expressive solutions are required for large data sets. Google’s terabytes upon terabytes of data that they retrieve from web crawlers, amongst many other sources, need organising, so that client applications can quickly perform lookups and updates at a finer granularity than the file level.

So they built BigTable, wrote it up, and published it in OSDI 2006. The paper is here, and my walkthrough follows.

Continue reading

Yahoo's PNUTS

In these politically charged times, it’s important for written media to give equal coverage to all major parties so as not to appear biased or to be endorsing one particular group. With that in mind, we at Paper Trail are happy to devote significant programming time to all the major distributed systems players.

This, therefore, is a party political broadcast on behalf of the Yahoo Party.

PNUTS: Yahoo!’s Hosted Data Serving Platform

(Please note, that’s the first and last time in this article that I’ll be using the exclamation mark in Yahoo’s name, it looks funny.)

As you might expect from the company that runs Flickr, Yahoo have need for a large scale distributed data store. In particular, they need a system that runs in many geographical locations in order to optimise response times for users from any region, while at the same time coordinating data across the entire system. As ever, the system must exhibit high availability and fault tolerance, scalability and good latency properties.

These, of course, are not new or unique requirements. We’ve seen already that Amazon’s Dynamo, and Google’s BigTable/GFS stack offer similar services. Any business that has a web-based product that requires storing and updating data for thousands of users has a need for a system like Dynamo. Many can’t afford the engineering time required to develop their own tuned solution, so settle for well-understood RDBMS-based stacks. However, as readers of this blog will know, RDBMSs can be almost too strict in terms of how data are managed, sacrificing responsiveness and throughput for correctness. This is a tradeoff that many systems are willing to explore.

PNUTS is Yahoo’s entry into this space. As usual, it occupies the grey areas somewhere between a straight-forward distributed hash-table and a fully-featured relational database. They published details in the conference on Very Large DataBases (VLDB) in 2008. Read on to find out what design decisions they made…

(The paper is here, and playing along at home is as ever encouraged).

Continue reading