
Why Hardware Emulation Is Necessary to Verify Deep Learning Designs

Verification Horizons, December 2019 - Tom Fitzpatrick, Editor

Jean-Marie Brunet - Mentor, A Siemens Business, and Lauro Rizzatti - Verification Expert

INTRODUCTION

There is no doubt that computers have changed our lives forever. Still, as much as computers outperform humans at complex tasks, such as solving intricate mathematical equations almost instantly, they may underperform at tasks humans find easy, such as image identification. Anyone in the world can identify a picture of a cat in no time at all. The most powerful PC in the world may take hours to arrive at the same answer.

The problem lies with the traditional central-processing-unit (CPU) von Neumann architecture. Devised to overcome the inflexibility of early computers that were hardwired to perform a single task, the stored-program computer, credited to von Neumann, gained the flexibility to execute any program at the expense of lower performance.

Figure 1 - For many years, computers were underperformers when tasked to solve problems that humans can do easily (Source: Wikipedia)


Limitations of the stored-program computer, compounded by the limited data available for analysis and inadequate algorithms to perform the analysis, conspired to delay the implementation of artificial intelligence (AI) and its sub-classes, machine learning (ML) and deep learning (DL), for decades.

The turning point came around the beginning of this decade, when the error rate of DL-based image recognition started to decrease; in 2015 it crossed below the human error rate, as shown in figure 1. The human error rate in image recognition is slightly above 5%. Today DL is widely successful in image and video recognition.

CONVOLUTIONAL NEURAL NETWORKS

DL is built on convolutional neural networks (CNNs), artificial neural networks loosely modeled on the brain's network of neurons. They consist of huge arrays of billions, or even trillions, of simple arithmetic operators, chiefly multipliers and adders, that are tightly interconnected.

It is not the vast number of operators that makes a CNN complex. Rather, its complexity stems from the way they are layered, arranged, and interconnected. The largest CNN designs reach several billion ASIC-equivalent gates; simpler CNN designs start at hundreds of millions of gates.
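To see where such gate counts originate, the multiply-accumulate (MAC) count of a single convolutional layer can be estimated from its dimensions. A minimal sketch, with hypothetical layer dimensions chosen only for illustration:

```python
# Estimate multiply-accumulate (MAC) operations in one convolutional layer.
# Every output pixel in every output channel needs k_h * k_w * in_ch MACs.
def conv_macs(out_h, out_w, out_ch, in_ch, k_h, k_w):
    return out_h * out_w * out_ch * in_ch * k_h * k_w

# A hypothetical 3x3 convolution over a 224x224 feature map, 64 -> 128 channels:
macs = conv_macs(224, 224, 128, 64, 3, 3)
print(f"{macs:,}")  # about 3.7 billion MACs for this single layer
```

Stacking dozens of such layers, each backed by hardware multipliers and adders, is what pushes CNN accelerators into the billions of equivalent gates.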

CNN designs are data-path oriented with limited control logic. Still, they present significant challenges to designers, who must implement fast and efficient data transfers between layered arithmetic operators and data memories.

DL designs “learn” what they are supposed to do via a training process that configures the multitude of weights and biases of the multipliers and adders. The more layers, the deeper the learning. The complexity of the task is exacerbated by training back-propagation algorithms aimed at refining the weights and biases.
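The training loop described above can be sketched at its smallest scale: a single artificial neuron whose weight and bias are refined by gradient back-propagation. The target function and learning rate below are arbitrary choices for illustration, not how production frameworks operate:

```python
# Toy back-propagation: refine one neuron's weight w and bias b so that
# y = w*x + b approximates a target function (here, arbitrarily, y = 2x + 1).
data = [(x, 2 * x + 1) for x in range(-5, 6)]

w, b, lr = 0.0, 0.0, 0.01    # initial weight, bias, learning rate
for _ in range(2000):
    for x, target in data:
        y = w * x + b        # forward pass
        err = y - target     # prediction error
        w -= lr * err * x    # gradient step for the weight
        b -= lr * err        # gradient step for the bias

print(round(w, 2), round(b, 2))  # converges toward 2.0 and 1.0
```

Real CNN training performs the same weight refinement across billions of such operators at once, which is exactly what makes the computational load so large.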

Once trained and configured, a DL design is deployed in the inference mode. See Figure 2.

Figure 2 - DL designs learn through a training process that configures their weights and biases; once trained and configured, the design is deployed in inference mode. (Source: NVIDIA)


CLASSICAL PROGRAMMING VERSUS MACHINE LEARNING

It is interesting to look at the input information fed into, and the output information generated from, a CPU, and compare it to a CNN in training versus a CNN in inference. See Table 1 below.

Table 1 - This matrix compares in/out information between Von Neuman and DL processing. (Source: Mentor, a Siemens Business)


CNN PROCESSING POWER, COST, ACCURACY

The computational demand of learning and inference operations is imposing. DL training calls for intensive processing. While DL training must be performed only once, a trained neural network must perform inference in potentially hundreds of thousands of applications, serving millions of users whose scores of requests must be answered quickly. This massive computational load requires processing resources that scale in performance, power consumption, and size at a price point that can be justified economically.

In general, digital computing can be carried out using floating-point or fixed-point numerical representations. It can also be executed at different levels of accuracy, measured by the number of bits used to hold the data. While CNN learning requires the higher accuracy provided by 16-bit or 32-bit floating-point math, CNN inference can benefit from 16-bit or 8-bit fixed point, or even lower precision. Several analyses performed in the past few years show that 8-bit calculations lead to quality of results comparable to those obtained with 16-bit or 32-bit floating-point calculations.

A rule of thumb captures the interdependency between precision and silicon requirements: reducing an arithmetic calculation by one bit halves the silicon area and power needed to perform it. There is, however, a lower limit below which precision cannot drop without breaking the learned results.
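The precision trade-off above can be illustrated with a minimal symmetric quantization sketch: float weights stored as signed 8-bit integers plus one scale factor recover values close to the originals, with worst-case error bounded by half the scale. The symmetric, per-tensor scheme and the sample weights are assumptions for illustration only:

```python
# Minimal symmetric 8-bit quantization of a list of float weights.
def quantize(weights, bits=8):
    qmax = 2 ** (bits - 1) - 1                  # 127 for signed 8-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]     # stored as small integers
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.82, -0.44, 0.07, -1.23, 0.55]      # hypothetical trained weights
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_err)  # rounding error never exceeds scale / 2
```

Dropping `bits` in the sketch shows the rule of thumb's limit: each removed bit doubles the scale, and with it the worst-case error, until the learned results break.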

WHERE DEEP LEARNING OCCURS

CNN computation can occur in four locations: in datacenters, at the edge of datacenters, on desktops, and in embedded applications. Depending on whether the task is learning or inference, some locations are better suited than others.

Training in a datacenter requires massive computational throughput to perform large tasks and achieve a high quality of results. On the downside, it consumes more power, occupies a large footprint, and costs orders of magnitude more than edge DL processing.

Conversely, DL computing at the edge is less demanding in processing throughput, but provides shorter latencies and accommodates constrained power budgets in smaller footprints. DL computing on desktops and in embedded applications shares this profile with edge computing.

Ultimately, execution speed, power consumption, accuracy, size, and cost are driven by the application. Most applications favor lower pricing over higher precision of calculation.

DETERMINISTIC VERSUS STATISTICAL DESIGNS

Unlike a processor design, or any other semiconductor design performing a function whose output is rigorous and deterministic, AI/ML/DL designs produce responses that are only statistically correct. They implement complex algorithms that process input data and generate responses that typically are correct within a percentage of error. For instance, an image-recognition algorithm running on a CNN design may identify a picture of a cat within a margin of error; the smaller the error, the more accurate the design.

This poses a challenge to the design verification team.
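In practice the challenge shows up as a shift from exact-match checking to tolerance checking: a device under test (DUT) is accepted when its results stay within an agreed margin of a reference model, rather than matching it bit for bit. A minimal sketch of such a check, with an arbitrarily chosen 5% mismatch threshold:

```python
# Accept a DL accelerator's classification results if the mismatch rate
# against a reference model stays within an agreed tolerance, rather than
# requiring bit-exact agreement.
def within_tolerance(dut_labels, ref_labels, max_mismatch_rate=0.05):
    mismatches = sum(d != r for d, r in zip(dut_labels, ref_labels))
    return mismatches / len(ref_labels) <= max_mismatch_rate

ref = ["cat", "dog", "cat", "bird", "cat", "dog", "bird", "cat", "dog", "cat"]
dut = ["cat", "dog", "cat", "bird", "dog", "dog", "bird", "cat", "dog", "cat"]
print(within_tolerance(dut, ref))  # one mismatch in ten exceeds 5% -> False
```

The verification plan then hinges on choosing the tolerance and the reference, which is a methodology decision rather than a purely technical one.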

HARDWARE EMULATION FOR DL DESIGN VERIFICATION

Currently, there are four distinctive groups of companies developing silicon for AI and, more specifically, for DL acceleration.

The first group includes established semiconductor companies, such as Intel, AMD, NVIDIA, IBM, Qualcomm, and Xilinx. With a few exceptions, most have roots in CPU and application-processor development. The exceptions are NVIDIA, which leads the AI field with advanced graphics processing units (GPUs), and Xilinx (and Intel), which bet on the success of field-programmable gate arrays (FPGAs), pushing the boundaries of FPGA technology.

Second, several large system/software companies, such as Google, Amazon, Microsoft, Facebook, and Apple, are entering the silicon arena. While they started from scratch with no history of designing silicon, they have deep pockets that help them recruit the talent to fulfill the objective. They design application-specific integrated circuits (ASICs) and systems-on-chips (SoCs) targeting CNNs and explore bold new architectures.

Third, several intellectual property (IP) companies are actively developing IP consisting of advanced processing cores that offer huge computational capabilities. Among them are Arm, CEVA, Cadence, Synopsys, Imagination, Achronix, FlexLogic, and many others.

Fourth, a plethora of startups with ambitious goals and substantial funding are populating the field. They include Wave Computing, Graphcore, Cerebras, Habana, Mythic, and SambaNova, all creating ad-hoc ASICs with some level of programmability.

Regardless of the type of silicon they develop, the designs share characteristics that together impose a unique set of challenges to design verification teams. See Figure 3.

Figure 3 - Typical SoC designs for AI and DL acceleration share many characteristics that challenge design verification teams, including open-source software stacks organized into drivers, OS, firmware, frameworks, algorithms, and performance benchmarks. (Source: Lauro Rizzatti)


Exhaustive Hardware Verification

As stated earlier, AI designs are among the largest, reaching two to four billion equivalent gates. A critical aspect of these designs is memory access and its bandwidth. Efficient memory accesses make or break a design.

Further, high computational power and low power consumption are mandatory. A popular metric that measures both and allows designs to be compared is TOPS/watt (tera-operations per second per watt).

All call for thorough and exhaustive hardware verification, supported by high debug productivity.
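The TOPS/watt metric mentioned above is simple arithmetic; the figures below are hypothetical, chosen only to show the computation:

```python
# TOPS/watt: tera-operations per second divided by power draw in watts.
# All figures below are hypothetical examples, not real product data.
def tops_per_watt(ops_per_second, watts):
    return (ops_per_second / 1e12) / watts

# A hypothetical accelerator doing 100 trillion ops/s at 75 W:
print(tops_per_watt(100e12, 75))  # ~1.33 TOPS/watt
```

Measuring the two inputs accurately on a real design, sustained operations per second under a realistic workload and the power drawn while doing so, is where emulation earns its keep.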

Comprehensive Software Validation

While the hardware is the foundation of CNN designs, software plays a critical role, adding another dimension to their complexity. Designs are built on sophisticated open-source software harnesses and frameworks created and sponsored by some of the big companies and better-known universities. The software is organized in a stack that includes drivers, OS, firmware, frameworks, algorithms, and, finally, performance benchmarks, typically MLPerf and DAWNBench. See figure 3 above.

The ultimate goal is to train a CNN before committing to silicon, a task that requires plenty of processing power. Reaching this objective accelerates time-to-market, critical in this hyper-competitive market, while achieving quality of results. The task falls to the design verification/validation team, who can meet the challenge via hardware emulation.

HARDWARE EMULATION REQUIREMENTS FOR CNN VERIFICATION

One of the prominent hardware emulation suppliers, Mentor, a Siemens Business, has identified three main requirements for successfully deploying hardware emulation for design verification of CNN accelerators. The requirements are:

1. Scalability

CNN accelerators differ when developed for data centers versus edge installations.

In data centers, accelerators must possess high computing power and large memory bandwidth, at the expense of a large footprint and significant power consumption.

At the edge, low latency, low-power consumption, and low footprint top the list of design attributes.

To succeed in the verification task, an emulator must provide a wide range of capacity and upward scalability to accommodate next generations of designs. To be specific, the emulation platform should offer a capacity range from tens of millions of gates to ten billion gates.

As designs scale up, the emulation platform ought to scale upward in capacity without compromising the performance necessary to execute AI frameworks and process MLPerf or DAWNBench benchmarks within the allocated verification schedule.

2. Virtualization

For a few years now, deploying emulation in virtual mode has been increasing, replacing the in-circuit emulation (ICE) mode due to multiple advantages. These include simpler deployment without the hardware dependencies typical of ICE, easier and deterministic design debug, high debug accuracy, the ability to perform several verification tasks not possible in ICE (power estimation, low-power verification, DFT verification, hardware/software debug, and more), and remote access for multiple concurrent users. The virtual mode moves the emulator out of the lab and into a data center, making it an enterprise resource.

For Mentor users, these advantages are augmented by the benefits of Veloce® VirtuaLAB, which boosts emulation platform productivity by an order of magnitude, enabling more testing on shorter schedules.

Given the critical nature of verifying AI designs and their software stacks, virtualization allows for increasing levels of abstraction and ensures full design visibility and controllability to execute any verification suite in the search for design bugs. It also enables precise measurements of important system behavior parameters.

Virtualization enables the execution of a broad range of AI frameworks and of any performance benchmarks. These tasks would not be possible in ICE mode.

3. Determinism

AI silicon is one component of a complex system that includes software and algorithms, as noted earlier. Development and optimization of all three require concurrent work. It is crucial to guarantee that hardware design mapping into the emulator happens exactly the same way for every iteration when benchmarks, software, and algorithms are run on it.

This is possible only by creating a deterministic emulation environment.

Design compilation must complete successfully and quickly, avoiding trial and error, and must produce the same results from one compile to the next: compilation must be a deterministic process.

Emulation of frameworks, benchmarks, and tests must run in a consistent order: execution must be a deterministic process. Debug must be deterministic as well; problems found in emulation must be reproducible, calling for a repeatable debug methodology.

CONCLUSION

AI/ML/DL designs are posing new challenges to design verification teams. Only hardware emulation can confront and tame those challenges.

© Mentor, a Siemens Business, All rights reserved www.mentor.com
