by David Jones, Xtreme EDA
SystemVerilog offers an exciting new environment in which to construct testbenches. Language features support constrained random generation, object-oriented programming, assertions, coverage, and more. Verification engineers new to this environment may not know where to start or how to use these features. This paper presents a complete testbench for verifying a rock, scissors, paper arbitration module, based on a methodology developed at Mentor Graphics and XtremeEDA, aimed at building effective verification environments with minimal complexity. Due to the simplicity of the device under test (DUT) we can present the complete testbench here, as it totals fewer than 300 lines of code.
The first section of this paper briefly describes the motivations behind the major verification features of SystemVerilog. The paper then proceeds to present a high-level verification environment and describe the components that comprise it.
MODERN VERIFICATION: CONSTRAINED RANDOM COVERAGE-DRIVEN VERIFICATION
A typical test environment developed in Verilog enables directed test cases: each test case sets up a scenario in the DUT, runs a (hopefully) problematic input and verifies that the DUT responds correctly. Directed test cases are good at finding expected bugs, but will not find more complex bugs resulting from the interaction of different features. As devices get more complex, finding these bugs becomes critical to success. SystemVerilog has four main verification features to support advanced randomized testbenches:
- Object-oriented programming features allow the user to represent complex data types and to abstract away low level operations on these types. Typically, SystemVerilog verification proceeds at the transaction level, where the fundamental objects are entire transactions (bus cycles, packets, etc.) rather than signal transitions.
- Constrained randomization selects random values for data transactions. Constraints are used to ensure that the selection of values is both relevant (e.g. Ethernet payload size between 46 and 1500 bytes) and useful.
- Functional coverage tabulates events occurring within the DUT and testbench to allow the verification engineer to determine if important functions inside the DUT have been exercised under all relevant scenarios. For example, a scheduler block may take an exceptional action if all output FIFOs are full. Functional coverage can be used to verify that all output FIFOs are indeed full at some point in the simulation, preparatory to verifying that the scheduler performs correctly in this case.
- Temporal assertions verify low-level aspects of communication protocols. Communication between functional blocks in a design typically must follow a well defined protocol over time. Such protocols include not only standards such as PCI and Ethernet, but also the timing details of any proprietary internal interfaces.
TYPES OF REUSE
Before examining a methodology for creating reusable testbenches, we should first look at what we are trying to achieve with "reuse". We can identify at least three types of reuse: reuse of verification components, reuse of test cases, and reuse of testbenches. Each type of reuse places conditions on the resulting code.
When one thinks of a reusable testbench, the first thing that comes to mind is reusable verification components, typically associated with a signaling protocol such as PCI or SPI-4.2. Indeed, reusable verification components have spawned a whole industry of verification intellectual property (VIP). Verification component reuse requires that the domain of possible transactions (read, write, etc.) is well defined.
In addition to verification components, individual test cases can be reused. For example, a PCI core is verified using an environment consisting of PCI drivers and monitors, as well as a suite of test cases. Although it is common to package the drivers and monitors for reuse, one can also package the test cases to create a complete test suite. Doing this effectively requires that the PCI test engineer define and abstract the protocol operations on the application side of the PCI bus (e.g. a PCI-to-Wishbone bridge would reflect transactions on a Wishbone bus). To use the test suite, customers must implement the application-side protocol in a manner specific to their designs.
Finally, proper architecture of testbenches can enhance reuse possibilities, either related to a single design, or across multiple designs. For a single design, proper architecture at the block level allows the testbench to be used as-is to cover the block within the full-chip testbench. Across multiple designs, proper architecture minimizes re-work to test successive devices.
This paper will provide the complete testbench for a rock-paper-scissors (RPS) arbiter. The rules for RPS itself are described at http://www.worldrps.com. The example DUT referees a version between two digital logic players. The pin interface for each player is shown below.
There are three inputs that normally must be low. Upon receipt of the go signal (active high for one clock), the player must assert one of the three lines for one clock. The DUT will then determine which wins, and increment the appropriate score. Play proceeds until one score meets a limit, which is configured through a configuration interface:
The timing of the protocol between a player and an arbiter is shown in Figure 1.
Figure 1: Player protocol
Figure 2: Typical Verification Environment
ELEMENTS OF A REUSABLE TEST ENVIRONMENT
The elements of a typical reusable test environment are shown in Figure 2. Drivers, monitors, test cases and scoreboards will be familiar to seasoned Verilog testbench designers. SystemVerilog allows the verification engineer to better model transactions and define the lines of communication between the components. To this end, the elements of a reusable SystemVerilog testbench also include the dynamic data objects, as well as standardized communication channels.
DYNAMIC DATA MODEL
Modern verification is done as much as possible at the transaction level. A transaction is a logical unit of work, such as a burst cycle on a bus, or a packet sent over an interface. Transaction-level modeling concentrates on the interactions of transactions upon the DUT without worrying about the pin-level representation of the transactions. Verilog testbenches cannot model data transactions very well as Verilog's only real data type is the vector of bits. In contrast, SystemVerilog classes can represent complex transactions in an organized manner. Transaction objects are passed among testbench components by reference, improving performance.
Transactions are best modeled using SystemVerilog classes. A transaction object must contain all information required: operation type, address, data, etc. Depending on the transaction it may also include the time at which the transaction was issued.
In addition to defining the data, SystemVerilog's object-oriented features allow the designer to define a functional interface to the objects through a set of public methods (tasks and functions). Manipulation of class instances through the methods is preferred over direct access to the data items. The methods allow decoupling of the data representation and the behavior of the object, so that new fields can be added, or the implementation of certain behaviors modified without impacting the verification components that work with the objects. The following operations are representative of what can be done with methods:
- Make a copy of an object.
- Compare an object with another.
- Create a string representation of the object for use in a text messaging system.
- Pack or unpack the fields of an object into a stream of bytes.
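The operations listed above can be illustrated with a minimal sketch. The class bus_txn_c, its fields, and the method names copy(), compare() and pack() are hypothetical and chosen only for illustration; they are not taken from the RPS testbench or from any library.

```systemverilog
package txn_methods_sketch_pkg;

  // Hypothetical bus transaction; field names and widths are illustrative.
  class bus_txn_c;
    rand bit        is_write;
    rand bit [15:0] addr;
    rand bit [7:0]  data;

    // Make a copy of this object.
    function bus_txn_c copy();
      bus_txn_c c;
      c = new();
      c.is_write = is_write;
      c.addr     = addr;
      c.data     = data;
      return c;
    endfunction

    // Compare this object with another, field by field.
    function bit compare(bus_txn_c rhs);
      if (rhs == null) return 0;
      return (is_write == rhs.is_write) &&
             (addr     == rhs.addr)     &&
             (data     == rhs.data);
    endfunction

    // String representation for use in text messages.
    function string toString();
      return $sformatf("%s addr=%04h data=%02h",
                       is_write ? "WR" : "RD", addr, data);
    endfunction

    // Pack the fields into a stream of bytes (one possible layout).
    function void pack(ref byte unsigned bytes[$]);
      bytes.delete();
      bytes.push_back({7'b0, is_write});
      bytes.push_back(addr[15:8]);
      bytes.push_back(addr[7:0]);
      bytes.push_back(data);
    endfunction
  endclass

endpackage

module txn_methods_demo;
  import txn_methods_sketch_pkg::*;

  initial begin
    bus_txn_c a, b;
    byte unsigned bytes[$];

    a = new();
    if (!a.randomize()) $error("randomize failed");
    b = a.copy();

    $display("a: %s", a.toString());
    $display("copy matches: %0b", a.compare(b));
    a.pack(bytes);
    $display("packed into %0d bytes", bytes.size());
  end
endmodule
```

Because the components using bus_txn_c only call its methods, a field could later be added to the class without touching those components; only copy(), compare(), toString() and pack() would need to change.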
Finally, class definitions allow the specification of random fields and randomization constraints. Constraints permit automatic generation of legal transactions where the specific components of the transaction are randomly generated. Care must be taken when applying constraints. Although some constraints (such as Ethernet frame size) are desired to conform to a protocol specification, it is useful to disable these constraints to test the DUT behavior outside of the specification. Besides constraints applied for correctness, one may also apply constraints to bias stimulus generation towards interesting cases, e.g. generating high-speed streams of small packets to stress a DUT having a minimum per-packet overhead. Often these constraints conflict with one another. To deal with this, either group the constraints into separate constraint blocks and use constraint_mode() to selectively disable them, or create subclasses of a base class for each desired group of constraints. Both approaches are useful in practice.
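The first approach, separate constraint blocks disabled selectively with constraint_mode(), can be sketched as follows. The class eth_frame_c, its single field, and the weights in the bias constraint are assumptions made for illustration; only the 46 to 1500 byte payload range comes from the discussion above.

```systemverilog
// Hypothetical Ethernet-style frame; names and weights are illustrative.
class eth_frame_c;
  rand int unsigned payload_len;

  // Correctness constraint: legal payload size per the protocol.
  constraint legal_size_c { payload_len inside {[46:1500]}; }

  // Bias constraint: favor small payloads to stress per-packet overhead.
  constraint small_bias_c {
    payload_len dist { [46:64] := 8, [65:1500] :/ 2 };
  }
endclass

module constraint_mode_demo;
  initial begin
    eth_frame_c f;
    f = new();

    // Both constraint blocks active: a legal, small-biased payload length.
    if (!f.randomize()) $error("randomize failed");
    $display("legal frame: payload_len=%0d", f.payload_len);

    // Disable both blocks so that out-of-spec (runt) frames can be generated.
    f.legal_size_c.constraint_mode(0);
    f.small_bias_c.constraint_mode(0);
    if (!f.randomize() with { payload_len < 46; })
      $error("randomize failed");
    $display("runt frame: payload_len=%0d", f.payload_len);
  end
endmodule
```

Both blocks must be switched off before requesting a runt frame, since the bias block also restricts the payload length to the legal range and would otherwise conflict with the inline constraint.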
As an example, here is the definition of the transactions for our RPS example. The definitions are placed in a package (rps_pkg) so that they may be used anywhere in the testbench.
Class rps_c models a single move by one player. Its only component is the choice of play (rock, paper or scissors) which we represent by an enumerated type. The rps field is random, enabling generation of random plays. Class rps_mon_c models a transaction recovered by a monitor. In addition to the play, it conveys an "ok" status (a player may decline to make a move when required) as well as the score. Both of these classes have a toString() method that returns a string representation of the data values.
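The description above could be realized along the following lines. This is a minimal sketch written from the prose description, not a definitive listing: the names rps_pkg, rps_t, rps_c, rps_mon_c, rps and toString() follow the text, while the field names ok and score, their types, and the string formats are assumptions.

```systemverilog
package rps_pkg;

  // Choice of play, represented as an enumerated type.
  typedef enum {ROCK, PAPER, SCISSORS} rps_t;

  // A single move by one player; rps is random to allow random plays.
  class rps_c;
    rand rps_t rps;

    virtual function string toString();
      return rps.name();
    endfunction
  endclass

  // A transaction recovered by a monitor.
  // Field names and types below are assumptions.
  class rps_mon_c;
    rps_t        rps;    // play observed on the pins
    bit          ok;     // 1 if the player made a legal move when required
    int unsigned score;  // score sampled after the play

    virtual function string toString();
      return $sformatf("%s ok=%0b score=%0d", rps.name(), ok, score);
    endfunction
  endclass

endpackage

module rps_pkg_demo;
  import rps_pkg::*;

  initial begin
    rps_c     play;
    rps_mon_c seen;

    play = new();
    if (!play.randomize()) $error("randomize failed");
    $display("generated play: %s", play.toString());

    seen = new();
    seen.rps   = play.rps;
    seen.ok    = 1'b1;
    seen.score = 1;
    $display("monitored: %s", seen.toString());
  end
endmodule
```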
Before discussing the static elements of the testbench, it is useful to discuss the techniques used to connect them together. Our methodology uses SystemVerilog interfaces for both pin-level and transaction-level interconnect.
SystemVerilog interfaces encapsulate both signal definitions and task/function definitions inside a construct that can be instantiated much like a module. We use modports to document the various functional aspects inside an interface. Pin-level (physical) interfaces are defined through the signals contained within interfaces, and transaction-level interfaces are defined using tasks and functions. The pin-level interfaces to our DUT have already been described.
For a verification component to be reusable, the functional interfaces through which it generates or accepts transactions must be well defined and standardized. Mentor Graphics has developed SystemVerilog standardized interconnects based on the OSCI SystemC TLM standard transports. Each transport is type parameterized for the transaction type and optionally the response type.
- The TLM FIFO interface supports unidirectional blocking and non-blocking data transfers. This transport is used where the source does not care about completions (e.g. transmitting ATM cells).
- The TLM request/response channel supports two independent FIFO interfaces, one for requests, and one for responses. This transport is used where a response is required to a request, and requests and responses may overlap in time (e.g. PCI Express). Each FIFO may block independently.
- The TLM transport channel supports a serialized request-response mechanism. A semaphore is used to ensure that only one request may be outstanding in the channel at any given time. The transmitting component blocks until a response is received.
In addition to the above channels inspired by SystemC, we have developed an analysis port interface. An analysis port is a non-blocking communication channel that can be connected to more than one sink. Each sink component is presented with a transaction using a non-blocking void function call. In contrast to the other transports, the analysis port functions correctly with zero, one, or more than one sink connected.
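The analysis port idea can be sketched with classes as shown below. This illustrates the concept only and is not the interface-based implementation used in our TLM library: the names sink_c, analysis_port_c, printer_c, connect() and write() are hypothetical.

```systemverilog
package analysis_sketch_pkg;

  // Hypothetical sink base class: each sink implements a void write().
  virtual class sink_c #(type T = int);
    pure virtual function void write(T t);
  endclass

  // Hypothetical analysis port: broadcasts to however many sinks exist.
  class analysis_port_c #(type T = int);
    local sink_c #(T) sinks[$];

    function void connect(sink_c #(T) s);
      sinks.push_back(s);
    endfunction

    // A void function cannot consume time, so the caller never blocks.
    function void write(T t);
      foreach (sinks[i]) sinks[i].write(t);
    endfunction
  endclass

  // Example sink that simply prints what it receives.
  class printer_c extends sink_c #(int);
    string name;

    function new(string name);
      this.name = name;
    endfunction

    virtual function void write(int t);
      $display("%s received %0d", name, t);
    endfunction
  endclass

endpackage

module analysis_port_demo;
  import analysis_sketch_pkg::*;

  initial begin
    analysis_port_c #(int) ap;
    printer_c sink_a, sink_b;

    ap = new();
    ap.write(1);            // zero sinks connected: nothing happens

    sink_a = new("sink_a");
    sink_b = new("sink_b");
    ap.connect(sink_a);
    ap.connect(sink_b);
    ap.write(2);            // both sinks are presented with the transaction
  end
endmodule
```

Using a function rather than a task for write() is what enforces the non-blocking property: the compiler rejects any attempt to wait inside it.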
The above interconnect schemes handle transactional communication. Since transactions can be common to multiple devices or environments, they are good candidates for reuse. However, testbench-specific communication (e.g. between a scoreboard and a test controller) is often necessary. Ad hoc methods, such as signals (good for boolean indications), events and hierarchical task calls, can be used where required.
Referring back to Figure 2, a test environment will have the following types of devices.
A stimulus generator creates the transactions that are sent into the DUT. A directed stimulus generator uses imperative code to create individual transactions. A better approach is to use a random stimulus generator, which randomizes the data properties of a class instance. The stimulus generator usually connects to a blocking transport, such as a TLM FIFO. Here is the stimulus generator for our RPS example:
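A minimal sketch of such a generator is given below, under stated assumptions rather than as a definitive implementation. A bounded SystemVerilog mailbox stands in for the TLM FIFO, whose library API is not shown in this paper; the module, package and variable names are hypothetical; and a small consumer process is included only so that the sketch terminates.

```systemverilog
package rps_gen_sketch_pkg;
  typedef enum {ROCK, PAPER, SCISSORS} rps_t;

  class rps_c;
    rand rps_t rps;
  endclass
endpackage

module rps_gen_sketch;
  import rps_gen_sketch_pkg::*;

  // Assumption: a bounded mailbox plays the role of the TLM FIFO.
  mailbox #(rps_c) fifo = new(2);

  // Random stimulus generator: create, randomize, send, repeat.
  initial begin
    rps_c t;
    forever begin
      t = new();
      if (!t.randomize()) $error("randomize failed");
      fifo.put(t);   // blocks once the bounded FIFO is full
    end
  end

  // Stand-in consumer (a driver would sit here in a real testbench).
  initial begin
    rps_c t;
    repeat (5) begin
      #10 fifo.get(t);
      $display("consumed play: %s", t.rps.name());
    end
    $finish;
  end
endmodule
```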
The code above creates a new rps_c transaction object, randomizes it, and sends it to the transport, in this case a TLM FIFO. The FIFO must have a finite size such that this generator will eventually block. A more complex generator may support being started/stopped from the test case.
Drivers convert transactions into lower-layer transactions or pin activity. The typical driver accepts transactions from a blocking transport such as a FIFO and either creates transactions for a lower-level protocol and passes them on to another transport, or implements the transaction as pin-level activity. Some drivers, such as bus drivers, may also need to obtain a response. These drivers will connect to the request/response or transport channels. Pin-level drivers for synchronous protocols should use non-blocking assignments to avoid race conditions.
Here is the example driver:
The driver must conform to the device protocol. It cannot set any of the rock/paper/scissors bits until the arbiter asserts the go signal. At that point, the pins are driven based on the transaction. At all other times the player's pins must be low. This driver uses try_get() so that it won't block. It is effectively a synchronous circuit in itself. Non-blocking assignments are used to avoid race conditions with the DUT.
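The driver behavior described above can be sketched as follows. The sketch makes several assumptions: the pin names clk, go, rock, paper and scissors are inferred from the protocol description; a mailbox local to the driver stands in for the TLM FIFO; and the small test module exists only to exercise the driver once.

```systemverilog
package rps_drv_sketch_pkg;
  typedef enum {ROCK, PAPER, SCISSORS} rps_t;

  class rps_c;
    rand rps_t rps;
  endclass
endpackage

// Pin names are assumptions based on the protocol description.
module rps_driver_sketch (
  input  logic clk,
  input  logic go,
  output logic rock,
  output logic paper,
  output logic scissors
);
  import rps_drv_sketch_pkg::*;

  // Assumption: a mailbox stands in for the TLM FIFO feeding the driver.
  mailbox #(rps_c) fifo = new(4);

  always @(posedge clk) begin
    rps_c t;

    // Default: all player pins low.
    rock     <= 1'b0;
    paper    <= 1'b0;
    scissors <= 1'b0;

    // Only play in response to go; try_get() never blocks.
    if (go) begin
      if (fifo.try_get(t)) begin
        case (t.rps)
          ROCK:     rock     <= 1'b1;
          PAPER:    paper    <= 1'b1;
          SCISSORS: scissors <= 1'b1;
        endcase
      end
    end
  end
endmodule

module rps_driver_tb;
  import rps_drv_sketch_pkg::*;

  logic clk = 0, go = 0;
  logic rock, paper, scissors;

  always #5 clk = ~clk;

  rps_driver_sketch drv (.*);

  initial begin
    rps_c t;
    t = new();
    t.rps = PAPER;
    drv.fifo.put(t);

    @(posedge clk) go <= 1'b1;   // arbiter pulses go for one clock
    @(posedge clk) go <= 1'b0;
    @(posedge clk)
      $display("rock=%b paper=%b scissors=%b", rock, paper, scissors);
    $finish;
  end
endmodule
```

If the FIFO is empty when go arrives, this sketch leaves all pins low, which corresponds to a player declining to make a move.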
Monitors convert pin-level activity or lower layer transactions into higher-layer transactions. In our methodology, monitors connect to analysis ports, which guarantees that the act of issuing a transaction is non-blocking, thereby avoiding the monitor missing subsequent pin-level activity. Pin-level monitors also incorporate assertions to verify the temporal properties of the protocols they are monitoring. Typically, the assertions check that the signals are well-defined (not Z or X), and that each transaction conforms to some legal cycle and follows all of the conditions imposed upon it. Basically, the assertions verify all properties of the protocol independent of the data. Monitors are located on the output path from the DUT as expected, but they are also useful on the input path, to verify that the drivers are working correctly, to provide proper transactions for the scoreboard, and to collect coverage.
Our example monitor code for one player follows:
The first part of the monitor recovers transactions from the pin-level activity. We need to sample the player inputs one clock after "go" is sampled high, and we need to sample the updated scores one clock after that.
On the first clock, we decode the pin activity into one of ROCK/PAPER/SCISSORS. Only legal bit patterns are accepted; all others will result in an illegal transaction. On the second clock we sample the score and send the transaction to the analysis port.
The other important part of a monitor is the protocol assertions. We use assertions to verify that:
- There are no Z/X meta-values in the plays or score.
- Exactly one player input is high one clock after "go".
- No player input is high at any other time.
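The three checks above can be written as concurrent assertions along the following lines. This is a sketch under assumptions: the signal names and the 8-bit score width are hypothetical, reset handling is omitted, and the small test module drives one legal play only to exercise the properties.

```systemverilog
// Signal names and the score width are assumptions.
module rps_player_checks (
  input logic       clk,
  input logic       go,
  input logic       rock,
  input logic       paper,
  input logic       scissors,
  input logic [7:0] score
);
  // No Z/X values in the plays or score.
  a_known: assert property (@(posedge clk)
    !$isunknown({go, rock, paper, scissors, score}));

  // Exactly one player input is high one clock after go.
  a_one_play: assert property (@(posedge clk)
    go |=> $onehot({rock, paper, scissors}));

  // No player input is high at any other time.
  a_idle: assert property (@(posedge clk)
    !go |=> ({rock, paper, scissors} == 3'b000));
endmodule

module rps_checks_tb;
  logic       clk = 0, go = 0;
  logic       rock = 0, paper = 0, scissors = 0;
  logic [7:0] score = 0;

  always #5 clk = ~clk;

  rps_player_checks chk (.*);

  // One legal play: go for one clock, then paper for one clock.
  initial begin
    @(posedge clk) go <= 1'b1;
    @(posedge clk) begin
      go    <= 1'b0;
      paper <= 1'b1;
    end
    @(posedge clk) paper <= 1'b0;
    repeat (2) @(posedge clk);
    $finish;
  end
endmodule
```

Note that a_one_play would fail for a player that declines to move; whether that should be an assertion failure or merely a transaction with a cleared "ok" status is a design decision for the monitor.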
A scoreboard is a component that performs complex data checks. A typical scoreboard may include the following components:
- A database of transactions received to date. The format of this database depends on the requirements. For example, a router scoreboard may require that each port maintain an ordered queue of expected packets.
- A behavioral model of the DUT. This model must implement any required data manipulation functions of the DUT. This model is usually simpler than the DUT since it operates at the transactional level rather than the pin level, and need not be synthesizable.
- A collection of data checks. Each data check runs when the database has received sufficient data to do so. Where possible, the data checks should be written to verify the behavior of the DUT without using a behavioral model, as it is likely that a behavioral model will contain the same conceptual errors (although not necessarily implementation errors) as the DUT. For example, a Reed-Solomon encoder should be verified by attempting to decode with a behavioral decoder. If the input has not been corrupted by error injection, then the decoder should be able to confirm zero errors. However, use of checks alone is not always possible; for example, an image processing circuit is often verified against a behavioral model simply because image aesthetics are too difficult to capture in a data check set.
Our example scoreboard connects to the two monitors, one for each player. A transaction is expected at each monitor at the same time. The "database" consists of the expected scores for each player. Due to the simplicity of the DUT, the verification is performed using a behavioral model. The function wins_over() determines who wins given a pair of rps_t items obtained from two transaction objects. We collect two transactions, update the scores and compare against the scores obtained from the transactions. The scoreboard also has logic to determine when the game is over, at which point the test is done.
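The behavioral model at the heart of this scoreboard can be sketched as shown below. The function name wins_over() and the type rps_t come from the description above; the function's signature and return type, the variable names, and the fixed pair of plays used in the demonstration are assumptions.

```systemverilog
package rps_sb_sketch_pkg;
  typedef enum {ROCK, PAPER, SCISSORS} rps_t;

  // Assumed signature: returns 1 when play a beats play b.
  function automatic bit wins_over(rps_t a, rps_t b);
    return (a == ROCK     && b == SCISSORS) ||
           (a == SCISSORS && b == PAPER)    ||
           (a == PAPER    && b == ROCK);
  endfunction
endpackage

module rps_sb_sketch;
  import rps_sb_sketch_pkg::*;

  // The "database": expected scores for each player.
  int unsigned exp_score1, exp_score2;

  initial begin
    rps_t p1, p2;

    // In the testbench these would come from the two monitor transactions.
    p1 = PAPER;
    p2 = ROCK;

    // Update the expected scores; a tie changes neither score.
    if (wins_over(p1, p2))      exp_score1++;
    else if (wins_over(p2, p1)) exp_score2++;

    // A real scoreboard would now compare against the monitored scores.
    $display("expected scores: player1=%0d player2=%0d",
             exp_score1, exp_score2);
  end
endmodule
```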
Coverage is the component that brings "closure" to the testbench. Coverage is required to ensure that all interesting cases have been tested. This is required in a constrained-random environment because one cannot guarantee that any given random testcase will test all interesting aspects of DUT operation. Instead, a few testcases often end up testing most of the DUT, and specially constrained testcases will be required to test the remaining corner conditions. Candidates for coverage include input and output transactions, state machines inside the DUT, corner cases for FIFOs, etc. Cross coverage (coverage of all combinations of two otherwise independent events) is useful for verifying that functional blocks operate in all modes. Any suspected problem areas within the DUT can also be covered.
The one coverage item that comes to mind in our example is all possibilities of rock/scissors/paper from both players. This is an example of a cross-coverage item since the coverage data is the Cartesian cross-product of more than one data source. We have chosen to integrate coverage into the scoreboard since the data comes from the same transactions upon which the scoreboard operates.
The covergroup declaration defines what we are to cover. It is then necessary to actually instantiate the covergroup which is done immediately below. This covergroup is set up with an explicit sampling event, which is executed once the scoreboard has obtained the two transactions to cover.
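A cross-coverage item of this kind can be sketched as follows. The covergroup and variable names are hypothetical, and the sketch triggers sampling with an explicit call to sample(); a named event attached to the covergroup would be an equally valid way to sample on demand.

```systemverilog
module rps_cov_sketch;
  typedef enum {ROCK, PAPER, SCISSORS} rps_t;

  // Plays from the two players, as recovered from their transactions.
  rps_t p1, p2;

  // Declaration: what is to be covered.
  covergroup rps_cover;
    cp1: coverpoint p1;
    cp2: coverpoint p2;
    both: cross cp1, cp2;   // all nine combinations of plays
  endgroup

  // Instantiation of the covergroup.
  rps_cover cov = new();

  initial begin
    // Stand-in for the scoreboard: sample once per pair of transactions.
    for (int i = 0; i < 3; i++) begin
      for (int j = 0; j < 3; j++) begin
        p1 = rps_t'(i);
        p2 = rps_t'(j);
        cov.sample();
      end
    end
    $display("coverage: %0.1f%%", cov.get_inst_coverage());
  end
endmodule
```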
The testcase is where any test-specific configuration is performed. A testcase must configure the DUT as well as any random stimulus generators. The testcase should also manage the test termination conditions.
After bringing the DUT out of reset, our example testcase performs DUT configuration: it fixes the score limit at 20. It could also potentially randomize this item. The testcase then waits until the scoreboard claims the test is over, after which it displays the verdict.
There will be a different testcase file for each test scenario. It is often the job of the simulation compile/run script to select a test case to run. Alternatively, all testcases may be compiled into a single environment, such that the test to run can be selected at run time. Techniques for doing this are beyond the scope of this paper.
The top-level file instantiates the DUT and all other components. This does not differ in construction from a typical Verilog top-level file, except for possibly the instantiation and use of interfaces. The "gen_fifo" is a standard component from our TLM library.
This paper has presented a basic SystemVerilog testbench using constrained random, coverage-driven, assertion-based techniques. We used a SystemVerilog version of the SystemC TLM library to manage the interconnect. Although this example looks complex, the test environment itself weighs in at fewer than 300 lines of code, and illustrates the basic concepts and roles of each component. The reader can use this testbench as a template for more complex designs.