It’s a sign of how down-trodden the Software Transactional Memory (STM) effort must have become that the article (sorry, ACM subscription required) published in a recent CACM might just as accurately have been called “STM: Not as bad as the worst possible case”. The authors present a series of experiments demonstrating that highly concurrent STM code beats sequential, single-threaded code. You’d hope that this had long ago become a given, but all it really demonstrates is that STM allows some parallelism. And this weak lower bound got a whole article.
Another conclusion from the article is that STM performs best when there is little contention for transactions between threads. Again, that should really be a given – all reasonable concurrency primitives have high throughput when there is little contention but high parallelism. A lot of work has gone into making this common case very fast for locking; see e.g. the biased locking schemes in the Hotspot JVM.
Bryan Cantrill (previously of Fishworks, now of Joyent) rips on transactional memory more eloquently than I ever could. STM is a declarative solution to thread safety, which I like, but really no more declarative than synchronised blocks – and Cantrill points out the elephant in the room that the CACM article seemed to ignore: doing IO inside transactions is hugely problematic (because how, precisely, do you roll back a network packet?).
A recent paper at SOSP 2009, Operating System Transactions, attacked this problem – although not from the viewpoint of STM, but to provide atomicity and isolation in situations where bugs arise from the gap between a read and the writes that depend on it (Time Of Check To Time Of Use, or TOCTTOU). Perhaps there’s an overlap between this paper and STM approaches, but it’s not clear whether the workloads inside an operating system’s system-call layer are general enough to map onto typical user-space STM work.
The second paper from OSDI that I’ll mention here is one I’ll treat only briefly – partly because it’s a bit lightweight compared to some of the others, and partly because I’m writing in a hurry. CuriOS: Improving Reliability Through Operating System Structure attacks the problem of recovering from errors in microkernel operating systems.
Just before Christmas, the systems community held one of its premier conferences – Operating Systems Design and Implementation (OSDI ’08). This biennial conference showcases some of the best research in operating systems, networks, distributed systems and software technology from the past couple of years.
Although I wasn’t lucky enough to go, I did grab a copy of the proceedings and had a read through a bunch of the papers that interested me. I plan to post summaries of a few to this blog. I see people ask repeatedly on various forums (fora?) “what’s new in computer science?”, and no-one seems to give a satisfactory answer, for a number of reasons. Hopefully I can redress the balance somewhat here, at least in the systems world.
Without further ado, I’ll get stuck in to one of the OSDI papers: Corey: an operating system for many cores by Boyd-Wickizer et al. from a combination of MIT, Fudan University, MSR Asia and Xi’an Jiaotong University (12 authors!). Download the paper and play along at home, as usual.
It’s been a little while since my last technically meaty update. One system that I’ve been looking at a fair bit recently is Hadoop, which is an open-source implementation of Google’s MapReduce. For me, the interesting part is the large-scale distributed filesystem on which it runs, called HDFS. It’s well known that HDFS is based heavily on its Google equivalent.
In 2003 Google published a paper on their Google File System (GFS) at SOSP, the Symposium on Operating Systems Principles. This is the same venue at which Amazon would publish their Dynamo work, albeit four years later. One of the lecturers in my group tells me that SOSP is a venue where “interesting” is rated highly as a criterion for acceptance, over other more staid conferences. So what, if anything, was interesting about GFS? Read on for some details…
So, despite ostensibly being a ‘systems’ guy, I haven’t spent too much time in my life getting hands-on with the Linux kernel. I’ve written tiny toy operating-system-like projects before, but haven’t done much open-heart surgery on real-life code.
I think this should change, so in my very limited spare time I’m doing some very simple projects to teach myself more about the layout of the Linux kernel code. That way, if someone asks me in a job interview whether I’m comfortable hacking at the kernel level, I can answer ‘yes’ with far more conviction. (I would probably answer positively anyhow, because I’m arrogant enough to think it’s not beyond me, but there’s a lot of metaphorical difference between having the book on your shelf and having read it 🙂 ).