[svlug] Should dust be busted?

John Conover conover at rahul.net
Sat Mar 6 16:00:51 PST 2010


Joel Williams writes:
> 
> Heat is a big circuit killer.

From the Arrhenius equation, as a rule of thumb, whatever is going to
fail in a circuit will fail in half the time for each 10C increase in
temperature; each increase of 10C decreases MTBF by a factor of 2.

Note that MTBF decreases *_exponentially_* with increasing
temperature.

The general consumer electronics system spec is a 105F maximum
ambient temperature for an MTBF of about 2K hours (roughly 8 hours a
day for one year). Each 20F drop is roughly 10C, so at 8 hours of
usage a day that is about 2 years MTBF at 85F and about 4 years at
65F, which is close to 72F (and half a year at 125F).
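
As a rough illustration of the arithmetic above, here is a minimal
Python sketch of the halve-per-10C rule. The function names are made
up for this example, and the 2K hour / 105F reference point is just the
consumer spec quoted above; it is a sketch of the rule of thumb, not a
full Arrhenius model with an activation energy.

```python
def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def scaled_mtbf_hours(spec_mtbf_hours, spec_temp_f, ambient_temp_f):
    """Scale an MTBF spec to another ambient temperature.

    Assumes the rule of thumb that MTBF halves for each 10C increase
    (and doubles for each 10C decrease) relative to the spec point.
    """
    delta_c = f_to_c(ambient_temp_f) - f_to_c(spec_temp_f)
    return spec_mtbf_hours / (2.0 ** (delta_c / 10.0))


def hours_to_years(hours, hours_per_day=8.0):
    """Convert operating hours to calendar years at a given daily usage."""
    return hours / (hours_per_day * 365.0)


if __name__ == "__main__":
    # Consumer spec from the text: about 2K hours at 105F ambient.
    for temp_f in (125, 105, 85, 65):
        hours = scaled_mtbf_hours(2000.0, 105.0, temp_f)
        years = hours_to_years(hours)
        print(f"{temp_f}F: {hours:.0f} hours, {years:.2f} years at 8 h/day")
```

The exact conversion lands somewhat below the rounded figures in the
text, since 2K hours at 8 hours a day is a bit less than a full year;
the doubling pattern per 20F step is the same.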

MTBF does not mean that most systems will last that long. It means
half of the systems will have failed by that time, (thus the term
"mean.")

     John

BTW, laptops, cellphones (and other portable consumer gadgets), TVs,
etc., are usually designed using the consumer MTBF spec. PCs are
designed using a spec that depends on their anticipated ASP (which
could be consumer, or industrial at 125F, or even military at 155F)
for 2K hours MTBF. Note that an industrial PC in a 65F data center
would have a HW MTBF of about 8 years at 8 hours per day, or almost 3
years running 24/7.
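
The industrial PC figure can be reproduced with the same rounding used
in this post: 2K hours is treated as about one year at 8 hours a day,
and each 20F below the spec temperature is treated as one doubling.
The sketch below assumes exactly that rounding; the function and table
names are hypothetical, chosen only for illustration.

```python
# Maximum ambient temperature (F) for each spec class named in the post.
SPEC_TEMP_F = {"consumer": 105, "industrial": 125, "military": 155}


def mtbf_years_at_8h(spec_class, ambient_temp_f):
    """MTBF in years at 8 hours/day, using the post's rounding.

    Assumptions: 2K hours is about 1 year at 8 hours/day, and MTBF
    doubles for each 20F the ambient sits below the spec temperature.
    """
    doublings = (SPEC_TEMP_F[spec_class] - ambient_temp_f) / 20.0
    return 1.0 * 2.0 ** doublings


def to_24_7_years(years_at_8h):
    """Spread the same operating hours over 24 hours/day instead of 8."""
    return years_at_8h * 8.0 / 24.0


if __name__ == "__main__":
    years = mtbf_years_at_8h("industrial", 65)
    print(f"industrial PC at 65F: {years:.1f} years at 8 h/day, "
          f"{to_24_7_years(years):.2f} years at 24/7")
```

From 125F down to 65F is three 20F steps, so the factor is 2*2*2 = 8,
and dividing by 3 for round-the-clock use gives the "almost 3 years."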

The HW designers in most reputable companies are required to hit the
MTBF requirements (it's part of the design spec). An MTBF that is too
short means more support/returns than anticipated (and the company
loses money), and an MTBF that is too long means higher manufacturing
costs than necessary (and, ditto).

Most corporate QA departments run accelerated testing on 1000
pre-production units (usually above the maximum ambient temperature to
force failures, so that reliability problems can be identified within
several months, pre-production) to measure, to a 98% confidence level,
that the MTBF is within +/- 3% of the spec before the product is
released to manufacturing. Ongoing MTBF testing (on a sample basis per
manufacturing lot) continues through the life of the product to verify
that the accelerated testing was reliable, i.e., that the statistics
of the initial MTBF were ergodic (meaning that measuring the failure
rate of a few units over a long time gives the same result as
measuring a lot of units over a short time) and that the accelerated
higher temperature failure rate fits the Arrhenius equation.
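
To get a feel for why the sample size matters so much, here is a
minimal Python sketch. It assumes the relative error of an MTBF
estimate shrinks as 1/sqrt(n) for n units; that scaling is an
assumption of this example (it happens to give about 3% for 1000
units), not a statement of the procedure any particular QA department
uses, and the function names are hypothetical.

```python
import math


def relative_error(n_units):
    """Approximate relative error of an MTBF estimate from n units.

    Assumption: the error scales as 1/sqrt(n).
    """
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    return 1.0 / math.sqrt(n_units)


def units_needed(target_error):
    """Smallest n whose 1/sqrt(n) error is at or below target_error."""
    if target_error <= 0:
        raise ValueError("target_error must be positive")
    return math.ceil(1.0 / target_error ** 2)


if __name__ == "__main__":
    for n in (6, 100, 1000):
        print(f"{n:5d} units: +/- {100.0 * relative_error(n):.1f}%")
    print(f"units for +/- 3%: {units_needed(0.03)}")
```

Under the same assumption, a handful of units gives an error of tens
of percent, which is why small field samples say very little.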

Note, also, that making an assessment of MTBF based on 3 failed
industrial PC units out of 6 in a 65F data center in 14 months (which
means a +/- 40% error in MTBF assessment, to a 98% confidence level,
because of the small sample size of 6) would not necessarily indicate
an abnormally high failure rate, even though the measured MTBF of the
PCs is less than half the 3 year MTBF spec. Similarly, if the PCs
lasted 4 years with no failures (even though half, 3, should have
failed by 3 years), nothing could be said about any abnormal failure
rate. Both of these would be seen 12.5% of the time if the test was
repeated enough times and the MTBF of the PCs really was 3 years.

-- 

John Conover, conover at rahul.net, http://www.johncon.com/



