[svlug] More links about ssds

John Conover conover at rahul.net
Thu Jun 2 11:38:28 PDT 2016

Since SSD memory elements are semiconductor based, it is probably a
reasonable assumption that failures are ergodic, (including
write/erase cycles.)

MTBF does not mean how long a device would be expected to last.

MTBF means that given sufficiently many devices with a constant
failure rate, about 63% would have failed by the MTBF, and about 37%
would still be running; half would have failed by about 0.69 times
the MTBF, (this is NOT true if failures are non-ergodic.)

If a system depends on multiple devices to function, (NOT a redundant
array; RAID with mirroring or parity, for example, which can tolerate
at least a single failure,) doubling the number of devices reduces the
system MTBF by a factor of two, (if failures are ergodic.)
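
As a rough sketch of that arithmetic, assuming independent devices
with constant failure rates and a system that fails as soon as any one
device fails, the failure rates (1 / MTBF) add. The function name and
the 1,000,000 hour figure are made up for illustration:

```python
def series_mtbf(device_mtbfs):
    """MTBF of a system that fails when any one of its devices fails.

    Assumes independent devices with constant failure rates, so the
    per-device rates (1 / MTBF) simply add.
    """
    if not device_mtbfs:
        raise ValueError("need at least one device")
    total_rate = sum(1.0 / mtbf for mtbf in device_mtbfs)
    return 1.0 / total_rate


# Hypothetical per-device MTBF in hours (illustrative only).
DEVICE_MTBF_HOURS = 1_000_000.0

one = series_mtbf([DEVICE_MTBF_HOURS])
two = series_mtbf([DEVICE_MTBF_HOURS] * 2)
four = series_mtbf([DEVICE_MTBF_HOURS] * 4)

# Each doubling of the device count halves the system MTBF.
print(one, two, four)
```

With identical devices this reduces to the device MTBF divided by the
number of devices; the sum form also covers mixed devices.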


BTW, further, the MTBF is a function of operating temperature for
semiconductor devices, (usually specified at 105F ambient for consumer
products, i.e., a commodity PC, which means about 70C junction
temperature for a "reasonable" design.)

Increasing the device temperature 18C decreases the MTBF by a factor
of two, meaning whatever is going to fail, will fail in half the time
for a specific device.
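
A minimal sketch of that rule of thumb, taking the factor of two per
18C at face value and using the 70C junction temperature above as the
rating point. The function name and the 1,000,000 hour rating are
hypothetical:

```python
def derated_mtbf(rated_mtbf, rated_temp_c, actual_temp_c, halving_step_c=18.0):
    """Scale an MTBF for temperature.

    Halves the MTBF for every halving_step_c degrees C above the
    rating temperature, and doubles it for every step below.
    """
    steps = (actual_temp_c - rated_temp_c) / halving_step_c
    return rated_mtbf * 2.0 ** (-steps)


# Hypothetical rating: 1,000,000 hours at 70C junction temperature.
RATED_MTBF_HOURS = 1_000_000.0
RATED_TEMP_C = 70.0

hot = derated_mtbf(RATED_MTBF_HOURS, RATED_TEMP_C, 88.0)   # 18C hotter
cool = derated_mtbf(RATED_MTBF_HOURS, RATED_TEMP_C, 52.0)  # 18C cooler

print(hot, cool)  # 500000.0 2000000.0
```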

Sarah Newman writes:
> I get the impression it's more likely for SSDs with the same age and
> workload to fail concurrently than with hard drives, though I don't
> know how serious that risk is in practice if the SSDs you're using
> aren't designed to self-destruct after a certain number of wear
> cycles.
> It doesn't hurt to mix manufacturers and/or hours of use within a
> single array. Unevenly distributed parity (if using parity-based
> RAID) is another approach to reduce the possibility of having
> multiple drives fail concurrently:
> http://www.cs.yale.edu/homes/mahesh/papers/eurosys10-diffraid.pdf


John Conover, conover at rahul.net, http://www.johncon.com/
