February 2007 Archives

Google disk failures paper


What do we learn from this paper by Google folk Pinheiro, Weber, and Barroso?

We learn that the famed Bigtable is good for gathering time-series statistical data about servers. It’s probably better than what the rest of the world does with rrdtool, since there is no loss of granularity over time. We hear again about MapReduce and about Sawzall. It’s all well and good, and pretty impressive.
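To make the granularity point concrete, here is a minimal sketch of what averaging consolidation in a round-robin archive does to old samples. It is not rrdtool's code; the `consolidate` function, its parameters, and the temperature readings are all made up for illustration.

```python
from statistics import mean

def consolidate(samples, keep_recent, bucket):
    """Keep the last `keep_recent` samples as they are and average
    everything older in groups of `bucket` samples (hypothetical helper)."""
    split = max(len(samples) - keep_recent, 0)
    old, recent = samples[:split], samples[split:]
    averaged = [mean(old[i:i + bucket]) for i in range(0, len(old), bucket)]
    return averaged, recent

# Made-up drive temperature readings, oldest first, with one spike to 75.
temps = [40, 41, 40, 75, 41, 40, 42, 41, 40, 41, 43, 42]
averaged, recent = consolidate(temps, keep_recent=4, bucket=4)
print(averaged)  # old data, one value per 4 samples
print(recent)    # recent data at full resolution
```

Once the spike ages into an averaged bucket, it can no longer be told apart from a mildly warm period. A store that keeps every raw sample does not have this problem.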

Furthermore, we read:

We conclude that it is unlikely that SMART data alone can be effectively used to build models that predict failures of individual drives.

Is it good science? Yes, it’s fine: a negative result is still a result, and we can learn important things from failures.

But then we read:

Failure rates are known to be highly correlated with drive models, manufacturers and vintages. Our results do not contradict this fact. For example, Figure 2 changes significantly when we normalize failure rates per each drive model. Most age-related results are impacted by drive vintages. However, in this paper, we do not show a breakdown of drives per manufacturer, model, or vintage due to the proprietary nature of these data.

And this, my friends, is not good science at all.
