
What do we learn from this paper by Google folk Pinheiro, Weber, and Barroso?
We learn that the famed Bigtable is good for gathering time-series statistical data about servers. It’s probably better than what the rest of the world does with rrdtool, which consolidates older samples into coarser averages, since here there is no loss of granularity over time. We hear again about MapReduce and about Sawzall. It’s all well and good, and pretty impressive.
Furthermore, we read that:
“We conclude that it is unlikely that SMART data alone can be effectively used to build models that predict failures of individual drives.”
Is it good science? Yeah, it’s fine: a negative result is still a result, and we can learn important things from failures.
But then we read:
“Failure rates are known to be highly correlated with drive models, manufacturers and vintages. Our results do not contradict this fact. For example, Figure 2 changes significantly when we normalize failure rates per each drive model. Most age-related results are impacted by drive vintages. However, in this paper, we do not show a breakdown of drives per manufacturer, model, or vintage due to the proprietary nature of these data.”
And this, my friends, is not good science at all.
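
To see why the missing breakdown matters, here is a toy sketch in Python. Every number in it is made up for illustration and none comes from the paper; the models "A" and "B" and the helper names are hypothetical. It assumes two drive models with perfectly flat failure rates (2% and 8% a year) and a fleet where young drives are mostly model A and old drives mostly model B.

```python
# Hypothetical fleet, invented for illustration only (not data from the paper):
# (model, age_in_years) -> (drives_in_service, failures_in_that_year)
fleet = {
    ("A", 1): (9000, 180),  # model A fails at 2% regardless of age
    ("B", 1): (1000, 80),   # model B fails at 8% regardless of age
    ("A", 3): (1000, 20),
    ("B", 3): (9000, 720),
}


def pooled_afr(fleet):
    """Annualized failure rate per age, with all models lumped together."""
    totals = {}
    for (_model, age), (drives, failures) in fleet.items():
        d, f = totals.get(age, (0, 0))
        totals[age] = (d + drives, f + failures)
    return {age: f / d for age, (d, f) in totals.items()}


def per_model_afr(fleet):
    """Annualized failure rate per (model, age), i.e. normalized per model."""
    return {key: failures / drives for key, (drives, failures) in fleet.items()}


if __name__ == "__main__":
    print("pooled:", pooled_afr(fleet))
    print("per model:", per_model_afr(fleet))
```

In this made-up fleet the pooled rate climbs from 2.6% at age one to 7.4% at age three, although neither model gets any worse with age. The whole "aging" effect comes from the model mix, and that is exactly the kind of thing a reader cannot check without the per-model breakdown.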
