Reliability

Reliability is commonly measured with two metrics: Mean Time Between Failures (MTBF) and Mean Time To Recover (MTTR).

MTBF

\text{MTBF} = \frac{ \text{Total Elapsed Time} - \text{Total Down Time} }{ \text{Failure Count} }

Note that the numerator excludes downtime, such as time spent on maintenance or repair. In other words:

\text{MTBF} = \frac{ \text{Total Operating Time} }{ \text{Failure Count} }

It measures how long the system typically runs before a failure interrupts it. We want a high MTBF.
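A minimal Python sketch of the MTBF formula follows. The function name `mtbf`, the choice of hours as the time unit, and the sample figures (1,000 hours observed, 20 hours of downtime, 4 failures) are illustrative assumptions, not taken from any particular monitoring tool.

```python
def mtbf(total_elapsed_hours: float, total_down_hours: float, failure_count: int) -> float:
    """Mean Time Between Failures: operating time divided by failure count.

    Hypothetical helper; all time arguments are assumed to be in hours.
    """
    if failure_count <= 0:
        # With no failures the ratio is undefined, so refuse to guess.
        raise ValueError("failure_count must be positive")
    if not 0 <= total_down_hours <= total_elapsed_hours:
        raise ValueError("downtime must be between 0 and the elapsed time")
    # Exclude downtime (maintenance, repair) so only operating time counts.
    total_operating_hours = total_elapsed_hours - total_down_hours
    return total_operating_hours / failure_count


# Example: 1,000 hours observed, 20 hours down, 4 failures.
print(mtbf(1000, 20, 4))  # 245.0
```

Here (1000 - 20) / 4 = 245 hours of operation, on average, between consecutive failures.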

MTTR

\text{MTTR} = \frac{ \text{Total Maintenance Time} }{ \text{Repair Count} }

It measures how effective the repair mechanism is: the lower the MTTR, the sooner a failed system is back in service. We want a low MTTR.
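The MTTR formula can be sketched the same way. As before, the function name `mttr`, the hour unit, and the sample figures (20 hours of maintenance over 4 repairs) are hypothetical choices made for illustration.

```python
def mttr(total_maintenance_hours: float, repair_count: int) -> float:
    """Mean Time To Recover: total maintenance time divided by repair count.

    Hypothetical helper; time is assumed to be in hours.
    """
    if repair_count <= 0:
        # No repairs means there is nothing to average over.
        raise ValueError("repair_count must be positive")
    if total_maintenance_hours < 0:
        raise ValueError("total_maintenance_hours must be non-negative")
    return total_maintenance_hours / repair_count


# Example: 20 hours of maintenance spread over 4 repairs.
print(mttr(20, 4))  # 5.0
```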

We can define availability as a function of MTBF and MTTR:

\text{Availability} = \frac{ \text{MTBF} }{ \text{MTBF} + \text{MTTR} }

This form is actionable because it separates the two levers that determine availability: to improve it, we can make failures rarer (raise MTBF) or make repairs faster (lower MTTR).
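A short Python sketch makes the two levers concrete. The helper name `availability` and the inputs (a hypothetical MTBF of 245 hours and MTTR of 5 hours) are assumptions for illustration; the function simply evaluates the formula above.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTBF / (MTBF + MTTR).

    Hypothetical helper; both arguments must use the same time unit.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("need MTBF > 0 and MTTR >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)


baseline = availability(245, 5)         # 245 / 250
rarer_failures = availability(490, 5)   # lever 1: double MTBF
faster_repair = availability(245, 2.5)  # lever 2: halve MTTR

print(f"{baseline:.4f}")        # 0.9800
print(f"{rarer_failures:.4f}")  # 0.9899
print(f"{faster_repair:.4f}")   # 0.9899
```

Both levers give the same result here because the formula can be rewritten as 1 / (1 + MTTR / MTBF): availability depends only on the ratio of repair time to operating time, so doubling MTBF and halving MTTR are equivalent.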
