Subverting Sinfonia

Recently I presented a paper from SOSP ’07 at a reading group: Sinfonia, a transactional distributed shared memory system. The idea is that you’re given the abstraction of several nodes, each with its own contiguous memory address space. You can perform transactions across more than one node at once by locking the set of locations you want to write, then committing or aborting. The paper is quite cute: the main contribution is to restrict the set of transactions (the paper calls them minitransactions) to those that can be described in a single message, which can therefore be piggybacked onto the first phase of a two-phase commit, saving a bunch of message latencies.
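To make that concrete, here is a minimal, single-process sketch of the minitransaction shape: compare items, read items and write items, all decided in one round of two-phase commit. Everything here (MemoryNode, run_minitransaction and so on) is my own illustration rather than the Py-Concerto API, and locking is simplified to one transaction in flight at a time.

```python
# Toy, single-process model of a Sinfonia-style minitransaction.
# These names are illustrative, not the Py-Concerto API.

class MemoryNode:
    """A memory node exposing a flat address space, modelled as a dict."""
    def __init__(self):
        self.memory = {}
        self.locked = set()

    def prepare(self, compares, reads, writes):
        """Phase one of 2PC: lock the touched addresses, check the
        compare items, and return (vote, read results) in one step."""
        addrs = ({a for a, _ in compares} | set(reads)
                 | {a for a, _ in writes})
        if addrs & self.locked:
            return False, {}                     # lock conflict: vote no
        if any(self.memory.get(a) != v for a, v in compares):
            return False, {}                     # a compare failed: vote no
        self.locked |= addrs
        return True, {a: self.memory.get(a) for a in reads}

    def finish(self, writes, commit):
        """Phase two: apply the writes on commit, then drop the locks."""
        if commit:
            for a, v in writes:
                self.memory[a] = v
        self.locked.clear()      # toy version: one transaction at a time

def run_minitransaction(actions):
    """actions: {node: (compares, reads, writes)}. The whole transaction
    fits in the prepare message, so it rides on a single 2PC round."""
    votes = {n: n.prepare(c, r, w) for n, (c, r, w) in actions.items()}
    commit = all(ok for ok, _ in votes.values())
    for n, (_, _, w) in actions.items():
        n.finish(w, commit)
    results = {}
    for ok, vals in votes.values():
        results.update(vals)
    return commit, results if commit else {}
```

A write to two nodes is then run_minitransaction({a: ([], [], [(0, "x")]), b: ([], [], [(0, "y")])}), and the compare items give you optimistic concurrency: a transaction can be made conditional on values read earlier.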

The idea was sufficiently simple, and I was sufficiently idle that day, that I decided to implement a Sinfonia-like system in Python in a couple of hours. The main point was to test my hypothesis that, at their core, systems presented in research papers often aren’t complex to understand or implement (which says nothing against their cleverness or value as research contributions). I started at midday, and had my first minitransactions working across multiple memory nodes at around 2:30pm.

Of course, the resultant code was rubbish, so since then (a week or so ago) I’ve done some tidying up. What results is certainly not top-quality Python by any stretch, but it’s a bit more workable. A lot of Sinfonia’s features are missing. The most egregious is the lack of a watchdog to shepherd incomplete 2PCs to their conclusion if the client dies. There is no robustness for memory nodes either: the expectation is that they are robust enough never to fail-stop, yet there is no persistent logging, no self-monitoring and restart, and so on. There is also precious little error checking, and bugs are doubtless rife. But it’s a start, right?

The code is downloadable from Google Code – the project is called Py-Concerto. I hope to convert this prototype into a much more robust Java version soon; at that point I’ll stop work on the Python code, but not for a week or two yet.

What kind of stuff can you build on Concerto? There’s a paper here, from the same guys who did the original Sinfonia work, on a scalable distributed B-tree. In general, distributed data structures are a good fit. You could build a reasonably neat overlay network on top of Concerto, using memory locations for pointers and metadata, or a simple distributed linked list – there’s a sketch of the latter below.
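To give a flavour of the linked-list idea, here is a hedged sketch built on the toy run_minitransaction above; the cell layout and addresses are invented for illustration. Each cell is a (value, next_address) pair, and an insert commits only if the predecessor’s next pointer hasn’t moved in the meantime, retrying otherwise. A real version would spread cells across several memory nodes by naming more than one node in the transaction.

```python
# Hypothetical linked-list insert over minitransactions. Cells are
# (value, next_address) pairs; a next_address of None ends the list.

def insert_after(node, pred_addr, new_addr, value):
    """Splice a new cell in at new_addr, just after pred_addr."""
    while True:
        # Read the predecessor cell in its own minitransaction.
        _, reads = run_minitransaction({node: ([], [pred_addr], [])})
        pred_value, pred_next = reads[pred_addr]
        # Commit only if the predecessor is still as we read it.
        ok, _ = run_minitransaction({node: (
            [(pred_addr, (pred_value, pred_next))],    # compare: unchanged?
            [],
            [(new_addr, (value, pred_next)),           # write the new cell
             (pred_addr, (pred_value, new_addr))])})   # relink predecessor
        if ok:
            return
        # Someone raced us between the read and the commit: retry.

node = MemoryNode()
node.memory[0] = ("head", None)     # seed a head cell at address 0
insert_after(node, 0, 1, "first")   # list is now head -> first
```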

The code comes with a shell that you can use to talk directly to a Concerto server – instructions are in the Python scripts themselves.

In the next few days I’ll add logging and the watchdog process, so watch the svn repo and let me know if you do anything interesting with this.

Binomial Heaps

(The Python code for this article is available here)

The standard binary heaps that everyone learns as part of a first algorithms course are very cool. They give guaranteed O(n \log n) sorting cost, can be stored compactly in memory, since they’re complete binary trees that pack into an array, and allow for very fast implementations of priority queues. However, there are a couple of operations that we might be interested in that binary heaps don’t give us, at least not cheaply.
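The compact storage is worth a line of code: because the tree is complete, the node at index i of the array has its children at indices 2i+1 and 2i+2 (zero-based), so no pointers are needed. A tiny check, with names of my own:

```python
# A binary heap lives in a flat list: the children of index i sit at
# 2*i + 1 and 2*i + 2, so the heap property is a pure index check.
def is_min_heap(a):
    return all(a[i] <= a[c]
               for i in range(len(a))
               for c in (2 * i + 1, 2 * i + 2)
               if c < len(a))

assert is_min_heap([1, 3, 2, 7, 4])
assert not is_min_heap([3, 1, 2])
```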

In particular, we might be concerned with merging two heaps together. Say, for example, that we’re shutting down a processor with its own priority queue of schedulable processes, and we want to merge its workload into another processor’s queue. One way to do this would be to insert every item from the first processor’s queue into the receiving processor’s queue. However, that costs O(n \log n) time for n repeated inserts, or O(n) if we rebuild the receiving heap from scratch – either way, we touch every element. We’d like to be able to do better.
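For reference, here is that naive approach with Python’s standard heapq module (the module is real; the merge strategies are just the ones described above):

```python
import heapq

def naive_merge(dst, src):
    """Push every element of src into dst: n pushes at O(log n) each."""
    for item in src:
        heapq.heappush(dst, item)

def rebuild_merge(h1, h2):
    """Concatenate and re-heapify: O(n), but still touches everything."""
    merged = h1 + h2
    heapq.heapify(merged)
    return merged
```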

Step forward binomial heaps. Binomial heaps are rather different to binary heaps, although they share a few details. Binomial heaps allow us to merge two heaps together in O(\log n) time, in return for some extra cost when finding the minimum. However, extracting the minimum still takes O(\log n), the same as for a binary heap.
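The linked code has the full data structure; below is a minimal sketch of just the merge, with names of my own. A binomial heap is a list of binomial trees with at most one tree of each order, so merging works like binary addition: whenever two trees share an order, link them into a single tree of the next order up and carry.

```python
# Sketch of a binomial heap merge. A heap is a list of trees, at most
# one per order, each root holding the smallest key in its tree.

class Tree:
    """A binomial tree of order k: 2**k nodes, k children at the root."""
    def __init__(self, key):
        self.key, self.order, self.children = key, 0, []

def link(a, b):
    """Join two trees of equal order into one of the next order up,
    keeping the smaller key at the root."""
    if b.key < a.key:
        a, b = b, a
    a.children.append(b)
    a.order += 1
    return a

def merge(h1, h2):
    """Binary addition over trees: O(log n) trees, O(log n) links."""
    by_order = {}
    for t in h1 + h2:
        while t.order in by_order:            # carry: link equal orders
            t = link(by_order.pop(t.order), t)
        by_order[t.order] = t
    return [by_order[k] for k in sorted(by_order)]
```

Finding the minimum is the promised extra cost: it scans the O(\log n) roots, min(t.key for t in heap), rather than reading a single array slot.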
