Similar but Different – The Tale of Transient and Permanent Faults
When determining whether an IC is safe from random hardware faults by applying safety metrics such as PMHF, SPFM, and LFM, engineers must analyze both transient and permanent faults. This paper highlights the fundamental differences between permanent and transient faults in digital circuits, and why this distinction is important in the context of the ISO 26262:2018 functional safety standard.

Introduction
Are you trying to decide whether your design is safe from random hardware faults, and working out safety metrics such as the single point fault metric (SPFM), latent fault metric (LFM), and probabilistic metric for hardware failure (PMHF)? If so, you are undoubtedly weighing both transient and permanent faults. Both need to be analyzed, and the analysis shows that they differ considerably. The objective of this paper is to highlight the fundamental differences between permanent and transient faults in digital circuits, and why this distinction is important in the context of the ISO 26262:2018 functional safety standard.
What are they and where do they come from?
In an integrated circuit, faults arise from a variety of sources: electromagnetic interference (EMI), radiation, electromigration, shocks, vibrations, and more. In some cases, it is important to know the specific source so that targeted measures can be taken. When this is not the case, it is usually reasonable to abstract the faults to bit flips (transients) and stuck-at faults (permanents). Importantly, this abstraction is allowed by ISO 26262.
Figure 1: Permanent versus transient fault comparison.
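To make the bit-flip versus stuck-at abstraction concrete, here is a minimal sketch in Python. The `FaultyRegister` class, its 8-bit width, and its method names are hypothetical and serve only as an illustration; they do not come from ISO 26262 or from any fault-injection tool. The sketch assumes that a transient fault corrupts the stored value once and is cleared by the next write, while a stuck-at fault forces a bit to a fixed level on every read.

```python
class FaultyRegister:
    """Hypothetical 8-bit register used only to contrast the two fault types."""

    def __init__(self, stuck_bit=None, stuck_value=0):
        self.value = 0
        self.stuck_bit = stuck_bit      # index of a permanently stuck bit, or None
        self.stuck_value = stuck_value  # level the stuck bit is forced to (0 or 1)

    def write(self, data):
        # A fresh write replaces the stored value, erasing any earlier bit flip.
        self.value = data & 0xFF

    def upset(self, bit):
        # Transient fault: flip one stored bit once.
        self.value ^= 1 << bit

    def read(self):
        # Permanent fault: the stuck bit is forced on every read.
        result = self.value
        if self.stuck_bit is not None:
            mask = 1 << self.stuck_bit
            result = (result | mask) if self.stuck_value else (result & ~mask)
        return result


# Transient: the error is visible until the register is rewritten.
transient = FaultyRegister()
transient.write(0b1010)
transient.upset(0)
transient_before_rewrite = transient.read()
transient.write(0b1010)
transient_after_rewrite = transient.read()

# Permanent: rewriting the register does not remove the error.
permanent = FaultyRegister(stuck_bit=0, stuck_value=1)
permanent.write(0b1010)
permanent_before_rewrite = permanent.read()
permanent.write(0b1010)
permanent_after_rewrite = permanent.read()
```

In this sketch the transient error disappears after the second write, whereas the stuck-at error is still present, which is the behavioral difference the abstraction is meant to capture.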
How often do they occur (base failure rate)?
For integrated circuits, the base failure rate (λ) is the basis for ISO 26262:2018 metric calculations. It is also the means to normalize across the different technologies and fault types within an integrated circuit. For example, I/O logic, mixed-signal logic, digital logic, dense memories, and high-speed memories have different inherent failure rates. When base failure rates are calculated for each technology, they are expressed in units of failures in time (FIT), where 1 FIT = 1 failure per billion device hours of operation.
The base failure rate for a chip is a summation of the base failure rate for each technology and fault type (n-combinations) in that chip:
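As a rough illustration of this summation, the sketch below adds per-combination base failure rates and converts the total from FIT to failures per hour. Writing λ_i for the base failure rate of the i-th technology and fault-type combination (a notation assumed here), the chip-level rate is λ_chip = λ_1 + λ_2 + ... + λ_n. The dictionary keys and all FIT values are hypothetical placeholders chosen for easy arithmetic, not data from any real device or from the standard.

```python
FIT_TO_FAILURES_PER_HOUR = 1e-9  # 1 FIT = 1 failure per 10^9 device hours

# Hypothetical base failure rates in FIT, keyed by (technology, fault type).
# These numbers are placeholders for illustration only.
base_failure_rates_fit = {
    ("digital logic", "permanent"): 10.0,
    ("digital logic", "transient"): 50.0,
    ("dense memory", "permanent"): 5.0,
    ("dense memory", "transient"): 200.0,
}


def chip_base_failure_rate(rates_fit):
    """Sum the base failure rates of all n technology/fault-type combinations."""
    return sum(rates_fit.values())


chip_fit = chip_base_failure_rate(base_failure_rates_fit)
chip_failures_per_hour = chip_fit * FIT_TO_FAILURES_PER_HOUR

print(f"Chip base failure rate: {chip_fit} FIT")
print(f"Equivalent: {chip_failures_per_hour:.3e} failures per device hour")
```

Keeping the rates keyed by both technology and fault type makes it straightforward to subtotal transient and permanent contributions separately if they need to be analyzed apart.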
Functional Safety Oct 18, 2023 pdf