I’ll be giving a talk at this year’s Strata Conference in Santa Clara, on February 29th. My talk is called “Monitoring Apache Hadoop – A Big Data Problem?”. I’d be lying if I said that every slide was fully realised at this point, but you can read the abstract to see what I’ve committed myself to. The general idea is that building large-scale, shared-nothing distributed systems is at most half the problem in making them a reality. Managing these systems day-to-day requires understanding and analysing a serious amount of data, so there’s a nice cycle here: you might be able to use the very data processing systems you’re trying to understand in order to understand them. I’ll try to tie the whole thing together with a discussion of failure. The thesis is that partial failure in distributed systems is both to blame for the incidents we’re trying to understand and the reason that understanding them is so difficult. I believe this is true in a very fundamental sense, so I’ll make that case and also talk about what is to be done.
(And if I’m not a big enough draw – perish the thought – there are many, many other interesting sessions. In particular, Josh will be talking about Crunch, and Sarah will be giving both introductory and advanced Hadoop classes. I work with the two of them, and they are fantastic speakers!)