Similar but Different: The Tale of Transient and Permanent Faults

When applying safety metrics such as PMHF, SPFM, and LFM to determine whether an IC is safe from random hardware faults, engineers must analyze both transient and permanent faults. This paper highlights the fundamental differences between permanent and transient faults in digital circuits, and why this distinction is important in the context of the ISO 26262:2018 functional safety standard.

Functional Safety

Jake Wiltgen, Charles Battikha

Last Updated Oct 2023

Introduction

Are you trying to decide whether your design is safe from random hardware faults, and working through safety metrics such as the single point fault metric (SPFM), latent fault metric (LFM), and probabilistic metric for hardware failure (PMHF)? If so, you are undoubtedly weighing both transient and permanent faults. Both need to be analyzed, and doing so reveals that they differ considerably. The objective of this paper is to highlight the fundamental differences between permanent and transient faults in digital circuits, and why this distinction is important in the context of the ISO 26262:2018 functional safety standard.
What are they and where do they come from?
In an integrated circuit, faults arise from a variety of sources: electromagnetic interference (EMI), radiation, electromigration, shock, vibration, and more. In some cases, it is important to know the specific source so that targeted measures can be taken. When it is not, it is usually reasonable to abstract the faults to bit flips (transients) and stuck-at faults (permanents). Importantly, this abstraction is allowed by ISO 26262.
Figure 1: Permanent versus transient fault comparison.
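The bit-flip and stuck-at abstractions can be illustrated with a minimal Python sketch. It is not taken from the paper: the register model, the function names, and the written values are all hypothetical, and the register is assumed to be fully rewritten every cycle, so a bit flip only survives until the next write.

```python
def inject_bit_flip(value: int, bit: int) -> int:
    """Transient fault model: invert one bit of the stored value, once."""
    return value ^ (1 << bit)


def apply_stuck_at(value: int, bit: int, stuck_value: int) -> int:
    """Permanent fault model: force one bit to 0 or 1 on every access."""
    if stuck_value:
        return value | (1 << bit)
    return value & ~(1 << bit)


def run_register(writes, fault=None, bit=0, stuck_value=0, flip_cycle=0):
    """Write one value per cycle into a hypothetical register and read it back.

    fault is None (fault-free), "transient" (bit flip in flip_cycle only),
    or "permanent" (stuck-at on every cycle).
    """
    reads = []
    for cycle, data in enumerate(writes):
        stored = data
        if fault == "transient" and cycle == flip_cycle:
            stored = inject_bit_flip(stored, bit)
        elif fault == "permanent":
            stored = apply_stuck_at(stored, bit, stuck_value)
        reads.append(stored)
    return reads


def mismatch_cycles(golden, faulty):
    """Return the cycles in which the faulty run differs from the fault-free run."""
    return [i for i, (g, f) in enumerate(zip(golden, faulty)) if g != f]


writes = [0b0000, 0b0101, 0b0011, 0b0110]  # illustrative data only
golden = run_register(writes)
transient = run_register(writes, fault="transient", bit=1, flip_cycle=1)
permanent = run_register(writes, fault="permanent", bit=1, stuck_value=0)

print(mismatch_cycles(golden, transient))  # [1]
print(mismatch_cycles(golden, permanent))  # [2, 3]
```

In this toy model the transient corrupts a single cycle and then disappears, while the stuck-at-0 fault corrupts every cycle in which the written data drives the affected bit to 1. In cycles 0 and 1 the written bit is already 0, so the permanent fault is present but not visible.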
How often do they occur (base failure rate)?
For integrated circuits, the base failure rate (λ) is the basis for ISO 26262:2018 metric calculations. It is also the means to normalize across the different technologies and fault types within an integrated circuit. For example, I/O logic, mixed-signal logic, digital logic, dense memories, and high-speed memories have different inherent failure rates. When base failure rates are calculated for each technology, they are expressed in units of failures in time (FIT), where 1 FIT = 1 failure per billion device-hours of operation.
The base failure rate for a chip is the sum of the base failure rates of each technology and fault type combination (n combinations) in that chip.
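The summation and the FIT unit can be made concrete with a short Python sketch. The FIT values below are invented purely for illustration and do not come from the paper or from any technology data; the dictionary layout and function names are likewise assumptions.

```python
DEVICE_HOURS_PER_FIT = 1e9  # 1 FIT = 1 failure per billion device-hours

# Hypothetical base failure rates in FIT, keyed by (technology, fault type).
# These numbers are placeholders, not measured or published values.
base_failure_rates_fit = {
    ("digital logic", "permanent"): 2.0,
    ("digital logic", "transient"): 10.0,
    ("memory", "permanent"): 1.5,
    ("memory", "transient"): 50.0,
    ("I/O", "permanent"): 0.5,
}


def chip_base_failure_rate(rates):
    """Sum the base failure rate over every technology and fault type combination."""
    return sum(rates.values())


def base_failure_rate_by_fault_type(rates, fault_type):
    """Sum only the combinations that match one fault type."""
    return sum(fit for (_, kind), fit in rates.items() if kind == fault_type)


def fit_to_failures_per_hour(fit):
    """Convert FIT into failures per device-hour."""
    return fit / DEVICE_HOURS_PER_FIT


total_fit = chip_base_failure_rate(base_failure_rates_fit)
permanent_fit = base_failure_rate_by_fault_type(base_failure_rates_fit, "permanent")
transient_fit = base_failure_rate_by_fault_type(base_failure_rates_fit, "transient")

print(total_fit)      # 64.0
print(permanent_fit)  # 4.0
print(transient_fit)  # 60.0
print(fit_to_failures_per_hour(total_fit))
```

Keeping the permanent and transient contributions separate, as in this sketch, makes it easy to report each fault type on its own as well as the chip total.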
Download Paper
- Similar but Different – The Tale of Transient and Permanent Faults
  Functional Safety Oct 18, 2023 Charles Battikha pdf
