The need for intelligent verification is the outcome of two decades of pre-silicon verification practice. Intelligent testbench automation, which is a supplement to intelligent verification, is a step closer to achieving more confidence in a design with minimal engineering effort. Applications today demand diverse functionality, which results in complex to very complex designs. Current verification approaches alone are not enough to achieve first-pass success in pre-silicon verification. A unique approach is needed that not only verifies the design faster but also achieves consistent results. An intelligent testbench with automation is the answer to the limitations of today's manual verification approach.
ASICs today demand high-bandwidth operation, which in turn demands high bandwidth on the system memory bus, such as a DRAM interface bus. It is imperative that a comprehensive verification plan cover performance and power along with functional features. Having a large number of variables makes verification more complex, but covering them adds confidence in ASIC/SoC completeness for the end user's application. To achieve high system performance in any ASIC/SoC, DRAM bus bandwidth utilization is equally important.
High bandwidth on DRAM means fewer idle DRAM cycles. Manually finding coverage holes in the verification of a DRAM bus is a tedious process. This article proposes a unique verification component that helps find these holes in an intelligent manner, and it discusses potential solutions and advantages over other verification approaches. Additionally, it proposes a second intelligent component that helps simulate real-world fault/error cases without waiting for a chip to be fabricated and tested, and it discusses achieving seamless portability across all memory sub-systems.
There has always been room for growth in pre-silicon verification. The driving force behind this growth is first-time verification and performance success. ASIC/SoC designs are becoming complex, and first-time functional and performance success requires rigorous pre-synthesis simulation. Today, achieving the targeted performance is as important as meeting the functional specification. A major performance bottleneck is a poorly performing memory sub-system. Most of the data ports on a chip deal with memory sub-systems, so it is imperative that memory sub-systems perform at the targeted bandwidth. This article proposes a unique verification component that helps design/verification engineers find coverage holes in the verification of DRAM buses with minimal effort, so that the memory controller can be fixed to serve the system requests and the targeted bandwidth can be achieved. We call this component the scheduler, as it "predicts" the request place holder and assists the design engineer in fixing the memory controller "to schedule" the system requests. The scheduler requires certain inputs to predict the system request place holder on the DRAM bus and functions independently of the end application. It is connected to the system interface at one end and the DRAM interface at the other, but it does not mimic the memory controller's scheduling. The scheduler is described in detail in later sections.
The other component the article proposes is the memory handler. This component arose from the question of how to verify memory controllers (MC) along with DFI, in a polished and organized manner during pre-silicon verification, in the presence of real-time faults/errors on the DRAM bus. Real-time faults are those generated during boot-up, such as gate or leveling training failure, data transfer failure, etc. Real-time errors are CRC, parity, ECC, etc.
An argument can be made that these errors can be injected via a "DFI model", provided the model offers enough hook-ups to exercise these errors/faults on both interfaces, i.e., DFI and PHY. However, models are not available with this support; they perform only the intended function. Nor is DFI necessarily a separate component: some IP vendors provide a combined solution for MC and DFI, in which case exercising the faults/errors becomes arduous. Hence, a unique component is required that can help exercise these errors regardless of the memory sub-system combination. A later section describes the memory handler in detail.
The two components under discussion are presented in the context of a DDR4 memory sub-system and have been used as intelligent verification components in the Arastu Systems DDR4 DRAM Memory Controller verification environment. The article also discusses the results achieved using these components. Both components have been developed using SystemVerilog and UVM.
The primary function of the scheduler is to report holes on a DRAM bus. Secondarily, it also reports utilization of the DRAM bus and of the system bus, which is an AXI4 system interface in this case.
The term "scheduler" refers to schedule-based prediction: the component predicts which DRAM command could have been issued during an idle DRAM cycle. The scheduler not only consists of components similar to those in the RTL but also includes an additional component to check system performance. A desired performance can be achieved in multiple ways:
- Through finding holes by opening the wave dump
- By mimicking the exact behavior of the RTL, i.e., using an RTL model
- Using a schedule-based predictor
Opening the waveform and finding coverage holes is a tedious process. Each time the RTL scheduling is fixed, a design/verification engineer has to open the waveform again to view the fix before moving on to find the next hole. A better approach would be to plug an RTL model in parallel with the RTL, connected to the system interface at one end and the DRAM bus interface at the other. The problem with this approach is that it becomes difficult to find holes using the model if the application of the memory sub-system changes: both the model and the RTL would have to be changed to accommodate the new application. Mimicking the exact behavior of the RTL in servicing data requests would in itself be a new task.
A much easier approach is to have an error alert raised on the DRAM bus at each such occasion, independent of the application. The third approach, which uses a schedule-based predictor, serves this purpose. The scheduler takes certain inputs, such as:
- Write memory controller latency
- Number of data bytes to wait from system end for write request processing
- Read memory controller latency
- DFI latency
- Number of DFI phases
- Merge requests
- Starvation threshold
Along with the above inputs, the scheduler also considers the DDR4 JEDEC-defined inter-bank, intra-bank and other timings. Separate DRAM request queues and latency counters are managed inside the scheduler. Whenever the latency counters and the respective DDR4 timings are satisfied but no request is found on the DRAM bus, an error alert reports that "such request could have been exercised." With the help of the scheduler, bus utilization was improved by more than 20% during verification of an Arastu Systems DDR4 DRAM Memory Controller for a packet buffer application. The configuration under consideration was four AXI4 masters, each operating at a frequency of 500 MHz, with DDR4-1600 as the memory configuration.
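The hole check can be illustrated with a small Python model. This is only a sketch under simplifying assumptions, not the SystemVerilog/UVM implementation described in this article: the names `PendingRequest`, `find_holes` and `ready_cycle` are hypothetical, and all latency counters and DDR4 timings are collapsed into a single "earliest issue cycle" per request.

```python
from dataclasses import dataclass


@dataclass
class PendingRequest:
    """A system request waiting to be issued on the DRAM bus (hypothetical model)."""
    name: str
    ready_cycle: int  # first cycle at which latency counters and DDR4 timings are satisfied


def find_holes(pending, bus_activity):
    """Return (cycle, request name) pairs for idle cycles where a request was eligible.

    bus_activity maps a DRAM cycle to the name of the request issued on it;
    cycles missing from the map are treated as idle.
    """
    holes = []
    issued = set()
    last_cycle = max(bus_activity) if bus_activity else -1
    for cycle in range(last_cycle + 1):
        if cycle in bus_activity:
            issued.add(bus_activity[cycle])
            continue
        # Idle cycle: flag the oldest request that could have been exercised here.
        eligible = [r for r in pending if r.name not in issued and r.ready_cycle <= cycle]
        if eligible:
            oldest = min(eligible, key=lambda r: r.ready_cycle)
            holes.append((cycle, oldest.name))
    return holes


requests = [PendingRequest("WR0", ready_cycle=2), PendingRequest("RD1", ready_cycle=3)]
# Assumed observation: the controller issued WR0 at cycle 2 and RD1 only at cycle 6.
observed = {2: "WR0", 6: "RD1"}
for cycle, name in find_holes(requests, observed):
    print(f"cycle {cycle}: idle, but {name} could have been exercised")
```

In this toy trace, cycles 3 to 5 are reported as holes because RD1 was already eligible while the bus stayed idle.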
The scheduler also offers a few additional features that assist in improving DRAM bus utilization:
- Request merging
- Data hazard
- Starvation threshold
If two or more system requests target the same location, then they can be merged and predicted as a single DRAM command. This helps decrease redundant DRAM commands and increase DRAM bus efficiency.
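As a rough illustration of request merging, the Python sketch below collapses back-to-back requests of the same kind to the same address into one predicted command. The function name `merge_requests` and the restriction to adjacent, same-kind requests are assumptions made for this example; the article does not specify the merge policy in that detail.

```python
def merge_requests(requests):
    """Collapse adjacent requests of the same kind to the same address.

    requests is a list of (kind, address) tuples in arrival order, where kind
    is "RD" or "WR". Only adjacent duplicates are merged (an assumption of this
    sketch) so that read/write ordering on an address is left untouched.
    """
    merged = []
    for kind, address in requests:
        if merged and merged[-1] == (kind, address):
            continue  # same kind and location as the previous command: merge
        merged.append((kind, address))
    return merged


incoming = [("RD", 0x100), ("RD", 0x100), ("WR", 0x100), ("RD", 0x200)]
predicted = merge_requests(incoming)
print(f"{len(incoming)} system requests -> {len(predicted)} predicted DRAM commands")
```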
If multiple write/read system requests target the same location, then their order of execution must be maintained. The data hazard logic of the scheduler watches for these types of requests.
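One way to picture the data hazard check is to compare, per address, the order in which requests arrived with the order in which commands appeared on the DRAM bus. The Python sketch below does this for a finished trace; the helper names and the (ID, address) trace format are hypothetical, and it assumes every request has already been issued.

```python
from collections import defaultdict


def per_address_order(commands):
    """Group command IDs by target address, keeping their order of appearance."""
    order = defaultdict(list)
    for cmd_id, address in commands:
        order[address].append(cmd_id)
    return dict(order)


def find_hazards(system_requests, dram_commands):
    """Return addresses whose DRAM execution order differs from the request order."""
    expected = per_address_order(system_requests)
    observed = per_address_order(dram_commands)
    return sorted(addr for addr in expected if observed.get(addr, []) != expected[addr])


system_side = [("W0", 0x40), ("R1", 0x40), ("R2", 0x80)]
# Reordering across addresses is allowed; swapping W0 and R1 on 0x40 is not.
dram_side = [("R2", 0x80), ("R1", 0x40), ("W0", 0x40)]
for addr in find_hazards(system_side, dram_side):
    print(f"data hazard: execution order changed for address {addr:#x}")
```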
With QoS, the master with the lowest priority sometimes suffers a longer latency for a requested operation. The starvation threshold is the cap on the worst-case latency. If a request accepted from the system interface crosses the starvation threshold, the scheduler raises an alert identifying that request. The starvation threshold value can be configured by the user.
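The starvation check amounts to a latency comparison per request. A minimal Python sketch follows, assuming latencies are counted in clock cycles from acceptance on the system interface; the function name, the dictionaries and the threshold value of 1000 are illustrative only.

```python
def starved_requests(accepted, completed, threshold, now):
    """Return IDs of requests whose latency exceeds the starvation threshold.

    accepted and completed map a request ID to a cycle number. A request that
    is still outstanding is measured up to the current cycle `now`.
    """
    starved = []
    for req_id, start in accepted.items():
        end = completed.get(req_id, now)
        if end - start > threshold:
            starved.append(req_id)
    return starved


accepted = {"M0_RD": 10, "M3_WR": 12}
completed = {"M0_RD": 300}  # M3_WR (lowest-priority master) is still waiting
for req_id in starved_requests(accepted, completed, threshold=1000, now=1500):
    print(f"starvation threshold crossed for request {req_id}")
```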
This scheduler is used in Arastu Systems' DDR4 DRAM memory sub-system verification. At the end of simulation, it generates a DRAM bus utilization report, which contains the maximum, minimum and average latency for write requests and for read requests. It also generates a system bus utilization report.
Following is a sample report from the scheduler:
# Performance Report for AXI Master:0
# Read Request Completion Time (clocks)
# Average : 1004.44
# Minimum : 282.00
# Maximum : 1405.00
# Write Request Completion Time (clocks)
# Average : 674.11
# Minimum : 263.00
# Maximum : 1185.00
# Read Data Transfers
# Actual : 4.39 MB
# Possible : 6.19 MB
# Bus Utilization(%) : 70.95
# Write Data Transfer
# Actual : 4.49 MB
# Possible : 6.19 MB
# Bus Utilization(%) : 72.53
Note: All latency values are in terms of the system clock. The above report is for a single AXI master for illustration purposes. The scheduler displays reports for all the AXI masters in the system, and it can be easily configured to support N AXI masters.
# Performance Report for DIMM: 0
# Read Latency (clocks)
# Average : 1497.19
# Minimum : 8.00
# Maximum : 2546.00
# Write Latency (clocks)
# Average : 841.72
# Minimum : 4.00
# Maximum : 2186.00
# Transfer Rate (MTps)
# Average : 1419.10
# Maximum Possible : 1600.00
# DRAM bus utilization (%) : 88.69
Note: All latency values are in terms of PHY clocks.
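The DRAM bus utilization figure in the DIMM report is consistent with a simple ratio of the average transfer rate to the maximum possible transfer rate. The Python lines below recompute it from the printed values; the formula is inferred from the report and is an assumption, not a statement of how the scheduler computes it internally.

```python
def utilization_percent(actual, possible):
    """Utilization as a percentage of the maximum possible value."""
    return 100.0 * actual / possible


# Values taken from the DIMM report above (MTps).
dram_utilization = utilization_percent(1419.10, 1600.00)
print(f"DRAM bus utilization (%) : {dram_utilization:.2f}")
```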
The idea behind designing a memory handler is to exercise real-time bus faults/errors, which are generally observed during board bring-up or run time usage. Some of the bus faults are as follows:
- Failure in gate training
- Failure in leveling training, etc.
Bus errors include:
- CRC error
- Parity error
- ECC error
- RD data timeout error, etc.
Implementing and verifying these faults/errors during simulation will give confidence in the memory controller's/DFI's error handling mechanism. The memory handler mimics the bus faults during simulation. It is designed using SystemVerilog and UVM and is utilized to verify Arastu Systems DDR4 DRAM Memory Controllers.
As shown below, the memory handler is located between the DFI and the DIMM. At both ends, it is connected to DDR4 DIMM interfaces.
The memory handler provides a rich set of callback hook-ups to insert faults/errors on both sides, i.e., towards the DFI and towards the DIMM. All the callback hook-ups reflect the live status of the received signal, and the user can alter the value of the signal in order to insert a fault/error.
Once training has been enabled via MRS, the READ command has been issued, and the DIMM is toggling DQS, an error can be inserted into gate training. The DQS value is reflected in the callback, and the verification engineer can alter it to either high impedance or logic low. As a result, DQS does not toggle at end "A", and training fails. The user can repeat this multiple times. By inserting this error, one can verify whether the DFI raises an interrupt for training failure. The error injection and handling verification discussed here can be applied to other trainings as well.
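The callback mechanism can be pictured with the following Python model of a pass-through component. It is a sketch only: the class `MemoryHandlerModel`, its methods and the sampled DQS list are hypothetical stand-ins for the UVM callbacks and signal-level interfaces of the real component.

```python
class MemoryHandlerModel:
    """Pass-through between the DIMM side and end "A" (DFI side), with callbacks."""

    def __init__(self):
        self._callbacks = {}

    def register_callback(self, signal, callback):
        """Attach a user callback that sees, and may alter, the live signal value."""
        self._callbacks[signal] = callback

    def forward(self, signal, value):
        """Forward one sampled value, letting a registered callback alter it."""
        callback = self._callbacks.get(signal)
        return callback(value) if callback else value


def force_dqs_low(_live_value):
    """Fault injection: hold DQS at logic low regardless of what the DIMM drives."""
    return 0


def toggles(samples):
    """Count value changes between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


handler = MemoryHandlerModel()
dqs_from_dimm = [0, 1, 0, 1, 0, 1]
clean = [handler.forward("DQS", v) for v in dqs_from_dimm]
handler.register_callback("DQS", force_dqs_low)
faulty = [handler.forward("DQS", v) for v in dqs_from_dimm]
print(f"DQS toggles at end A: {toggles(clean)} without fault, {toggles(faulty)} with fault")
```

With the callback registered, end "A" sees no DQS toggling, which is the condition that should make gate training fail.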
Similarly, the memory controller's error handling mechanism can be verified for errors like CRC, parity, ECC, read data timeout, etc.
In the case of a CRC error, the memory handler receives all the data with CRC from interface "A", and the user can alter either the data or the CRC value to insert a CRC error on interface "B". The DIMM asserts "ALERT_n" when it recognizes the CRC error. When the memory controller sees the "ALERT_n" assertion, it suspends request service and initiates its error handling logic. As per the DDR4 JEDEC specification, the memory controller shall re-transmit all the commands transmitted in a window once "ALERT_n" is received. The scheduler keeps watch on the commands repeated after "ALERT_n" is received; if the RTL misses re-transmitting any command, the scheduler raises an error alert. Multiple CRC errors were inserted within a few milliseconds using the memory handler to validate the interrupt generation logic of the Arastu Systems DDR4 DRAM Memory Controller.
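The two halves of this flow, corrupting a burst so that the CRC no longer matches and then checking the replayed commands, can be sketched in Python. The CRC here is a generic bitwise CRC-8 with polynomial 0x07 used as a stand-in; the exact DDR4 bit mapping of data to CRC is not modeled, and the helper names and command strings are hypothetical.

```python
def crc8(data, poly=0x07):
    """Bitwise CRC-8 over a bytes object (initial value 0, no reflection)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def inject_crc_error(data, bit):
    """Flip one data bit while leaving the accompanying CRC untouched."""
    corrupted = bytearray(data)
    corrupted[bit // 8] ^= 1 << (bit % 8)
    return bytes(corrupted)


def missing_retransmissions(window_commands, replayed_commands):
    """Commands sent in the error window that were not repeated after ALERT_n."""
    return [cmd for cmd in window_commands if cmd not in replayed_commands]


burst = bytes(range(8))
crc_from_a = crc8(burst)                      # CRC as received on interface "A"
burst_on_b = inject_crc_error(burst, bit=5)   # altered data driven on interface "B"
alert_n_asserted = crc8(burst_on_b) != crc_from_a
print("ALERT_n asserted by DIMM model:", alert_n_asserted)

window = ["WR bank0", "WR bank1", "WR bank2"]
replayed = ["WR bank0", "WR bank2"]
for cmd in missing_retransmissions(window, replayed):
    print(f"error alert: {cmd} was not re-transmitted after ALERT_n")
```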
One can also use the memory handler to check the response from the DRAM memory controller to the system interface when read data from the DIMM is missing. If the DQS lines are not toggled on "A", no data is received by the DFI, and hence none reaches the memory controller. Eventually, the timer inside the RTL times out, and the RTL shall report the unavailability of data by responding over the AXI interface.
All the above mentioned cases were exercised during verification of the Arastu Systems DDR4 DRAM Memory Controller. Additionally, other errors like parity, ECC, etc. were also included during the verification process.
At times, bugs are found only at a late stage of execution. The very objective of pre-synthesis simulation is to exercise as many scenarios as possible without compromising on cost and performance. The proposed verification components, the schedule-based predictor and the memory handler, satisfy this objective. As described, the scheduler predicts the DDR4 DRAM command place holder in case of idle DRAM cycle(s), and the memory handler helps verify the error handling mechanism of the DFI and the memory controller. Both components were used in verification of the Arastu Systems DDR4 DRAM Memory Controller, where they helped achieve a more than 20% improvement in bus utilization for packet buffer applications.