Purging CXL Cache Coherency Dilemmas

Verification Horizons - Tom Fitzpatrick, Editor

Verification Horizons - March 2021, by Nikhil Jain and Gaurav Manocha, Siemens EDA

OVERVIEW

The massive growth in the production and consumption of data, particularly unstructured data such as images, digitized speech, and video, has driven an enormous increase in the use of accelerators. The growing trend toward heterogeneous computing in the data center means that, increasingly, different processors and co-processors must work together efficiently while sharing memory and using caches for data sharing. Sharing memory through caches brings a formidable technical challenge known as coherency, which Compute Express Link (CXL) addresses.

WHAT IS CXL?

CXL is a technology that enables high-bandwidth, low-latency connectivity between the host processor and devices such as accelerators, memory buffers, and smart I/O devices. CXL is based on the PCI Express® (PCIe®) 5.0 physical layer infrastructure; that is, it uses PCIe electricals and standard PCIe form factors for the add-in card. Leveraging the PCIe 5.0 infrastructure makes it easy for devices and platforms to adopt CXL without designing and validating the PHY, channel, any channel extension devices such as retimers, or the upper layers of PCIe, including the software stack. It is designed to address growing high-performance computational workloads by supporting heterogeneous processing and memory systems, with applications in Artificial Intelligence, Machine Learning, communication systems, and High-Performance Computing, by enabling coherency and memory semantics.

CXL supports dynamic multiplexing between a rich set of protocols that includes I/O (CXL.io, based on PCIe), caching (CXL.cache), and memory (CXL.memory) semantics. CXL.io protocol is used for functions such as device discovery, configuration, initialization, I/O virtualization, and direct memory access (DMA) using non-coherent load-store, producer-consumer semantics. CXL.cache enables a device to cache data from the host memory, employing a simple request and response protocol. The host processor manages the coherency of data cached at the device utilizing snoop messages. CXL.memory allows a host processor to access the memory attached to a CXL device. CXL.memory transactions are simple memory load and store transactions that run downstream from the host processor. CXL maintains a unified, coherent memory space between the CPU (host processor) and any memory on the attached CXL device, allowing both the CPU and device to share resources for higher performance and reduced software stack complexity.

Figure 1 - CXL Protocol Stack


WHY CXL?: CXL VS CCIX

There are many host-to-device and device-to-device high-speed cache-coherent interconnect standards, such as Gen-Z, OpenCAPI (Open Coherent Accelerator Processor Interface), and CCIX (Cache Coherent Interconnect for Accelerators). Different companies developed these interfaces to target heterogeneous computing and coherency challenges, so several groups have evidently been working to solve similar problems.

Cache Coherent Interconnect for Accelerators (CCIX) is an industry-standard specification to enable coherent interconnect technologies between general-purpose processors and acceleration devices for efficient heterogeneous computing. CCIX was created in 2016 by a consortium that included AMD, Arm, Huawei, IBM, Mellanox, Qualcomm, and Xilinx.

Compute Express Link (CXL) is an open standard interconnect for high-speed central processing unit (CPU)-to-device and CPU-to-memory connections, designed to accelerate next-generation data center performance. The CXL specification's founding promoter members included Alibaba Group, Cisco Systems, Dell EMC, Facebook, Google, Hewlett Packard Enterprise (HPE), Huawei, Intel, and Microsoft.

Both CXL and CCIX target the same problem. The major difference between them is that CXL is a master-slave architecture where the CPU is in charge, and the other devices are all subservient, while CCIX allows peer-to-peer connections with no CPU.

Some shakeout or convergence is needed to move things forward. The Compute Express Link and Gen-Z Consortiums have already announced a memorandum of understanding (MoU) describing a mutual collaboration plan between the two organizations.

WHY IS CACHE COHERENCY REQUIRED?

For higher performance in a multiprocessor system, each processor usually has its own cache. Cache coherence refers to keeping the data in these caches consistent.

Since each core has its own cache, the copy of the data in that cache may not always be the most up-to-date version. For example, imagine a dual-core processor where each core has brought a block of memory into its private cache, and then one core writes a value to a specific location. When the second core attempts to read that value from its cache, it won't have the most recent version unless its cache entry is invalidated. So a coherence policy is needed to invalidate or update the entry in the second core's cache; otherwise, the stale entry leads to incorrect data and invalid results.
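The dual-core scenario above can be reproduced with a toy model. This is only an illustrative sketch: the `PrivateCache` class, the address `0x40`, and the write-through behavior are assumptions made for the example and do not come from any CXL or QVIP API.

```python
# Toy model (hypothetical): two cores with private caches and NO coherence
# protocol. All names and values here are illustrative assumptions.
memory = {0x40: 1}


class PrivateCache:
    def __init__(self):
        self.lines = {}  # address -> locally cached value

    def read(self, addr):
        # On a miss, fill from memory; afterwards serve the local copy.
        if addr not in self.lines:
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write-through for simplicity: update the local copy and memory.
        self.lines[addr] = value
        memory[addr] = value

    def invalidate(self, addr):
        # Drop the local copy so the next read refills from memory.
        self.lines.pop(addr, None)


core0, core1 = PrivateCache(), PrivateCache()
core0.read(0x40)
core1.read(0x40)             # both cores now hold the value 1
core0.write(0x40, 2)         # core0 updates the location

stale = core1.read(0x40)     # core1 still sees its old private copy
core1.invalidate(0x40)       # what a coherence policy would trigger
fresh = core1.read(0x40)     # refilled from memory after invalidation
```

Without the `invalidate` call, `core1` keeps returning the old value, which is the incorrect-data hazard that a coherence protocol exists to prevent.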

There are various cache coherence protocols for multiprocessor systems. One of the most common is MESI, an invalidation-based protocol named after the four states that a cache block can have:

  • Modified: The cache block is dirty with respect to the shared levels of the memory hierarchy. The core that owns the cache with the Modified data can make further changes at will.
  • Exclusive: The cache block is clean with respect to the shared levels of the memory hierarchy, and no other cache holds a copy. If the owning core wants to write to the data, it can change the state to Modified without consulting any other cores.
  • Shared: The cache block is clean with respect to the shared levels of the memory hierarchy, and other caches may hold copies. The block is read-only. A core may read a block in the Shared state; to write, it must first obtain exclusive ownership of the block, which invalidates the copies held by other caches.
  • Invalid: The cache block is not present in the cache, or its contents are not valid.

State transitions are controlled by memory accesses and bus snooping activity. When several caches share specific data and a processor modifies the value of that shared data, the change must be propagated to all the other caches that hold a copy. The notification of a data change can be done by bus snooping. If a transaction modifying a shared cache block appears on the bus, all the snoopers check whether their caches hold a copy of the shared block. If they do, that cache block must be invalidated or flushed to ensure cache coherency.
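The transitions described above can be written as a small next-state function. This is a textbook-style MESI sketch under simplifying assumptions: the event names and the `others_have_copy` flag are invented for the example, and write-backs and bus transactions are not modeled.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Event(Enum):
    LOCAL_READ = "local read"
    LOCAL_WRITE = "local write"
    SNOOP_READ = "snooped read from another cache"
    SNOOP_WRITE = "snooped write / ownership request from another cache"


def next_state(state, event, others_have_copy=False):
    """Return the next MESI state of one cache line (simplified model)."""
    if event is Event.LOCAL_READ:
        if state is State.INVALID:
            # A read miss fills the line as Shared if another cache
            # already holds it, otherwise as Exclusive.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state  # read hits never change the state
    if event is Event.LOCAL_WRITE:
        # A write always ends in Modified; from Shared or Invalid the
        # other copies are assumed to be invalidated first.
        return State.MODIFIED
    if event is Event.SNOOP_READ:
        # Another cache reads the line: valid copies drop to Shared.
        return State.INVALID if state is State.INVALID else State.SHARED
    # SNOOP_WRITE: another cache takes ownership, so this copy is dropped.
    return State.INVALID
```

For example, `next_state(State.EXCLUSIVE, Event.LOCAL_WRITE)` yields `State.MODIFIED` without any bus activity, while `next_state(State.MODIFIED, Event.SNOOP_WRITE)` yields `State.INVALID`.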

Figure 2 is the state-transition diagram for this protocol and shows how the cache states transition on receiving commands from the local and remote processor.

Figure 2 - MESI Transitions


VERIFICATION GOALS TO ADDRESS CACHE COHERENCY CHALLENGES

Coherency management is required because multiple copies of the same data exist in different caches throughout the system. Since data in each cache can be modified locally, the risk of using invalid data is high. Therefore, it is essential to provide a mechanism that manages when and how changes can be made. Cache coherent systems are high-risk design elements: they are challenging to design and even more challenging to verify. In the end, you need a way to sign off confidently that your system is cache coherent. This is a key verification challenge.

Another challenge in verifying CXL cache-based designs is that the CXL specification provides a vast range of request types, response types, and possible cache state combinations. Every combination and permutation must be verified thoroughly. Although the specification defines the logical behavior of activity on the bus, the sequencing and timing of accesses to shared cache lines must also be verified accurately.

Verifying a multi-core cache coherent system requires the capabilities listed below.

  1. Verification Plan with Stimulus Generation - A detailed verification plan is a requirement for a complex environment; it captures the verification requirements that the design relies on.
    The plan must then be turned into stimulus generation that achieves the plan's intentions. A VIP whose purpose is to mimic the core/device behavior must create stimulus that accounts for the protocol rules, cache line states, and any design-specific constraints when generating transactions. This relieves the user of having to know the protocol details in order to create legal (or illegal) transactions.
  2. Cache Checking - Another verification goal is to catch any illegal activity on the bus and ensure that each device complies with the specification. Checking must also be done at the cache level to confirm that communication with the cache complies with the CXL specification.
  3. Debug Mechanism - Once a suspicious event is caught, the user needs to reach its root cause quickly and efficiently. Less debug time ultimately leads to a shorter turnaround time for any system.
  4. Coverage Completeness - Coverage helps ensure the completeness of the verification plan and the verification space. It relieves the verification team of the burden of creating the thousands of scenarios that would otherwise be necessary, and it reduces and focuses the test writing effort down to filling the coverage holes.

HOW QUESTA VERIFICATION IP HELPS ADDRESS THE ABOVE VERIFICATION CHALLENGES

Intelligent modeling allows the QVIP to mimic both host and device behavior when the DUT is at the other end of the CXL interconnect. QVIP can also be hooked up as a passive component to monitor the bus, providing verification capabilities such as a checker, coverage, and a logger.

As can be seen from Figure 3, CXL QVIP can act as a host or a device, or can be hooked up as a passive device (on the bus or attached to the CXL component) for analysis purposes.

Figure 3 - CXL QVIP Environment


1) Verification Plan with Stimulus Generation

The complexity of verifying CXL-based designs requires using QVIP to model the variety of CXL hosts and devices in the system. This relieves the user of having to know the protocol details in order to create legal (or illegal) transactions.

QVIP provides a comprehensive verification plan covering all the complex and simple scenarios required to verify a cache coherent system. QVIP's role is to mimic all CXL-compliant components, which helps create stimulus that takes into account the protocol rules, cache line states, and any design-specific constraints when generating transactions.

Figure 4 - Detailed Verification Plan with QVIP


Questa VIP comes with built-in sequences that users can run directly to complete their verification plan or use as building blocks for their own scenarios. The sequences let the user execute transactions such as D2H/H2D requests and H2D/D2H responses as their scenario requires.

Figure 5 - D2H Sequence Flow example


LOAD/STORE APIs

In most cases, users want to define or generate a scenario at a higher level: instead of thinking about individual D2H requests (Read, Write, Eviction), they only need to perform Load and Store operations on a cache line.

QVIP therefore provides high-level sequences that break down into lower-level sequences of D2H/H2D requests depending on the cache line state, also taking the cache line's bias state into account.

Figure 6 - Load/Store APIs Usage


Suppose the user wants to perform a cacheable write using the store API. QVIP first checks the cache line state; say the state is Invalid for the provided address. The Device QVIP then automatically executes the lower-level transactions: gaining exclusive ownership of the line using an RdOwn D2H request, modifying the cache line, and evicting the cache line to memory. On the other hand, if QVIP is configured as the host, then upon receiving the RdOwn D2H request it automatically invalidates the cache line in all caches using H2D snoop commands.

Figure 7 - Store Operation if QVIP Is a Device


As shown in Figure 7, for the device QVIP, a user only needs to execute a store operation without taking care of the lower-level transactions. The API itself, based on the cache line state, executes the lower-level transactions and updates the cache internally. Similarly, if QVIP is the host, it automatically executes snoop requests based on the D2H requests and the cache line state.
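The expansion of a high-level store into lower-level steps can be sketched as follows. This is not the QVIP API: the function name `store_steps`, the state strings, and the step labels are hypothetical, and the handling of the Shared, Exclusive, and Modified states is an assumption extrapolated from the Invalid case described in the text.

```python
def store_steps(line_state):
    """Return the lower-level steps a device-side store might expand into.

    Hypothetical sketch: only the Invalid case follows the article's
    description; the other states are assumed for illustration.
    """
    steps = []
    if line_state in ("Invalid", "Shared"):
        # The device does not yet own the line, so it first requests
        # exclusive ownership (RdOwn is the D2H request named in the text).
        steps.append("RdOwn")
    # With ownership in hand, the line can be modified locally ...
    steps.append("modify cache line")
    # ... and later written back to memory.
    steps.append("evict cache line to memory")
    return steps


# The user-facing call stays the same regardless of the line state;
# only the expansion differs.
for state in ("Invalid", "Exclusive"):
    print(state, "->", store_steps(state))
```

The point of the abstraction is that the test writer calls one store operation, and the state-dependent decision about whether an ownership request is needed happens underneath.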

AUTOMATIC RESPONDER AND DCOH ENGINE

QVIP also provides an automatic responder that responds to D2H/H2D requests with the appropriate D2H/H2D response, based on the cache line state and the request received. Therefore, the user doesn't need to take care of the environment's response.

The DCOH engine automatically completes the device's request if the cache line address lies in device-attached memory and the line is in device bias. Otherwise, it forwards the appropriate D2H request onto the host's bus.
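The DCOH routing decision described above reduces to a two-part condition. The following sketch assumes a single contiguous device-memory window and tracks device-bias lines in a set; the class name, field names, and return strings are all hypothetical and only illustrate the rule.

```python
from dataclasses import dataclass


@dataclass
class DcohModel:
    """Hypothetical model of the DCOH routing rule (not a QVIP class)."""
    dev_mem_base: int          # start of device-attached memory (assumed)
    dev_mem_size: int          # size of that window in bytes (assumed)
    device_bias_lines: set     # addresses currently in device bias

    def in_device_memory(self, addr):
        return self.dev_mem_base <= addr < self.dev_mem_base + self.dev_mem_size

    def route(self, addr):
        # Complete locally only when BOTH conditions hold: the address is
        # in device-attached memory AND the line is in device bias.
        if self.in_device_memory(addr) and addr in self.device_bias_lines:
            return "complete locally"
        # Otherwise the request must go to the host.
        return "forward D2H request to host"


dcoh = DcohModel(dev_mem_base=0x1000, dev_mem_size=0x1000,
                 device_bias_lines={0x1040})
local = dcoh.route(0x1040)       # device memory, device bias
host_bias = dcoh.route(0x1080)   # device memory, but not device bias
host_mem = dcoh.route(0x8000)    # not device-attached memory
```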

2) Cache Checking using QVIP

In a multi-core multi-device environment, it is essential to verify each host and each device individually to ensure that they comply with the specification. QVIP host and device agents with enabled checker components help achieve this aspect.

Assertions: Each agent has its own mirror cache model, which mimics the cache at the other end and updates its local copy by monitoring the bus. The cache model changes its cache line states according to the MESI protocol and the transactions it observes.

The QVIP checker fires an assertion if any illegal activity or transaction occurs on the bus.

Figure 8 - Assertion Message


As shown in Figure 8, an assertion message provides full information about the violation, with a proper message format, error tagging, and the information required to debug it.

Cache predictor: QVIP provides a separate cache component that helps maintain full system coherency. It provides data integrity checks and checks the correctness of the cache by predicting which transactions should be executed on the bus given the current state of the line. This predictor mirrors the cache model and holds all the data about the cache line states and their related metadata.

Before executing any cache transaction on the bus, the user can communicate with this predictor regarding the type of command that can be executed on any cache line. The predictor returns the list of commands that can be executed on a given cache line address based on its state.

Figure 9 - QVIP Cache Predictor


As shown in Figure 9, the cache predictor can communicate with devices and suggest which D2H request must be executed, based on the cache line state and cache line biasing. This offloads the DCOH engine and reduces latency, as the device already knows which request to execute before the DCOH processes it.

The cache predictor takes input from the bus and the host, and flags an error if the set of D2H and H2D requests executed on the bus is not the correct one.
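The query-and-check behavior of the predictor described above can be sketched as a lookup from line state to permitted commands. The class below is hypothetical and is not the QVIP component; in particular, the `ALLOWED` table is a made-up simplification for illustration and is not taken from the CXL specification, which defines the actual legality rules.

```python
# Illustrative legality table (ASSUMPTION, not the CXL specification):
# MESI state letter -> D2H requests this sketch treats as permitted.
ALLOWED = {
    "I": ["RdShared", "RdOwn"],
    "S": ["RdOwn", "CleanEvict"],
    "E": ["CleanEvict"],
    "M": ["DirtyEvict"],
}


class CachePredictorSketch:
    """Hypothetical predictor: tracks line states and vets commands."""

    def __init__(self):
        self.line_state = {}  # address -> state letter; absent means Invalid

    def allowed_commands(self, addr):
        # Query step: which commands may be issued on this line right now?
        return list(ALLOWED[self.line_state.get(addr, "I")])

    def check(self, addr, command):
        # Check step: raise if an observed command is not in the allowed set.
        state = self.line_state.get(addr, "I")
        if command not in ALLOWED[state]:
            raise ValueError(
                f"{command} not permitted on line {addr:#x} in state {state}"
            )


predictor = CachePredictorSketch()
predictor.line_state[0x80] = "M"
options = predictor.allowed_commands(0x80)   # query before issuing
predictor.check(0x80, "DirtyEvict")          # passes silently
```

A testbench could call `allowed_commands` before driving a request and `check` on every request it observes, so that the same table serves both stimulus selection and error detection.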

3) Debug Mechanism

Whenever a cache system gets caught in some unpredictable conditions or some illegal activity happens, the turnaround time to debug the behavior is crucial in the verification cycle.

QVIP comes with the following debug features, which help the user identify whether the behavior is correct and so shorten the debug turnaround and the overall verification cycle.

Loggers: QVIP provides a cache-enabled logger that records all the bus information on the CXL interconnect. This cache logger can be used to debug the traffic at a particular time from both directions, i.e. from both host and device, so the user can quickly reach the desired timestamp and observe the traffic at that point.

Figure 10 - Logger Snapshot


As shown in Figure 10, logger instances from both device and host provide the necessary information required to debug and verify the behavior at any timestamp.
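One way to use two such logs together is to merge them by time and look at a window around the timestamp of interest. The sketch below assumes each log is already sorted by time; the `LogEntry` fields and the sample messages are invented for the example and do not reflect the actual QVIP logger format.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    time_ns: int   # timestamp (unit assumed for the example)
    source: str    # "host" or "device"
    message: str   # free-form description of the transaction


def merged_window(host_log, device_log, start_ns, end_ns):
    """Merge two time-sorted logs and keep entries inside [start_ns, end_ns]."""
    merged = heapq.merge(host_log, device_log, key=lambda e: e.time_ns)
    return [e for e in merged if start_ns <= e.time_ns <= end_ns]


# Made-up sample traffic, for illustration only.
host_log = [
    LogEntry(10, "host", "H2D response sent"),
    LogEntry(30, "host", "H2D snoop sent"),
]
device_log = [
    LogEntry(20, "device", "D2H request sent"),
    LogEntry(40, "device", "D2H response sent"),
]

window = merged_window(host_log, device_log, start_ns=15, end_ns=35)
```

Viewing both directions in a single time-ordered list makes it easier to see which request a given snoop or response belongs to.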

DEBUG MESSAGES

QVIP also provides debug messages. When enabled, they print traffic and transaction information to a transcript or shell as a first step of debugging, without the need to look into the loggers.

Figure 11 - Debug Message


As seen in Figure 11, all debug information about the cache transaction can be obtained directly on the shell by using debug messages.

4) Coverage

As previously mentioned, the vast verification space associated with CXL cache designs presents a key verification challenge. Defining all the complex scenarios requires significant investment. Yet, this is insufficient. The CXL verification solution must also enable you to measure and ensure the verification space's completeness.

CXL QVIP defines all the coverage points required to attain verification productivity. This relieves the verification team of the burden of creating the thousands of scenarios that would otherwise be necessary, and it reduces and focuses the test writing effort down to filling the coverage holes. What's needed is an executable verification plan that hierarchically correlates to the CXL specification sections. It must also provide an easy way to differentiate between high and low importance coverage items.
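The idea of coverage holes can be illustrated with a cross of request type and cache line state. This sketch is not how QVIP implements coverage: the `REQUESTS` list is a small assumed subset, and a real coverage model would also exclude combinations that the protocol makes illegal.

```python
from itertools import product

# Assumed, deliberately small coverage space for illustration.
REQUESTS = ["RdShared", "RdOwn", "DirtyEvict"]
STATES = ["M", "E", "S", "I"]
ALL_BINS = set(product(REQUESTS, STATES))


def coverage_holes(observed):
    """Return the (request, state) bins that no test has hit yet."""
    return sorted(ALL_BINS - set(observed))


def coverage_percent(observed):
    """Return the percentage of bins hit, ignoring unknown pairs."""
    hit = set(observed) & ALL_BINS
    return 100.0 * len(hit) / len(ALL_BINS)


# Pairs seen so far by the existing tests (made-up sample data).
observed = [("RdOwn", "I"), ("RdShared", "I"), ("DirtyEvict", "M")]
holes = coverage_holes(observed)
percent = coverage_percent(observed)
```

New tests then only need to target the pairs listed in `holes`, which is the focusing effect described above.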

Figure 12 - Coverage Map Example


SUMMARY

For verifying cache coherent systems, a verification plan is essential but not sufficient. Creating and managing thousands of individual test cases is not feasible within realistic schedule constraints. There is a clear need for a wide range of pre-defined stimuli to ensure that you can achieve high coverage against your compliance goals.

The major requirements for CXL cache verification are:

  1. Verification plan with stimulus: A variety of stimulus is required to achieve verification completeness goals
  2. Checks: Complete protocol checking with cache coherency checking is required at every stage of verification
  3. Debug Mechanism: An efficient debug mechanism is required to reduce the verification cycle time
  4. Coverage: A detailed coverage map is required for verification completeness

Questa VIP helps meet all of the above verification challenges: it provides a verification plan with a vast pre-defined stimulus library, along with loggers, cache predictors, and debug messages. All of the QVIP components mentioned above should be incorporated into the environment to maximize verification productivity and design quality.

Contact your local Siemens EDA representative to find out more about our Questa Verification IP solutions for CXL 1.1, CXL 2.0, PCIe5, our upcoming CXL 3.0 and PCIe6 support, and other protocols in the extensive QVIP portfolio.

ABOUT THE AUTHORS

Nikhil Jain is a lead member of the Consulting Staff on the Questa Verification IP team at Siemens EDA, specializing in the development of CXL, as well as Memory and Ethernet verification IP. He received his B. Tech. degree in Electronics and Communication from GGSIPU University, Delhi in 2007.

Gaurav Manocha is a member of the Consulting Staff on the Questa Verification IP team at Siemens EDA, specializing in the development of PCIe, NVMe, and CXL verification IP. He received his B. Tech. degree in Electronics and Communication from The NorthCap University (NCU), formerly ITM University, Gurgaon in 2013.
