With ever more emphasis within the electronics industry on high performance and shorter time to market, the need for high-confidence, high-quality, end-to-end verification is becoming more and more important. This is especially true for the heavily optimized arithmetic datapath blocks at the heart of modern compute and neural network applications. Missed bugs can delay projects, damage reputations and, at the very least, are likely to sap performance.
Because physical hardware is fixed once manufactured, it is extremely hard, if not impossible, to correct bugs and design faults after release. As such, verifying hardware IP well before the move to full silicon tape-out is an established and widespread practice. A hardware verification engineer has several tools at their disposal, including:
- Simulation: Compiling the hardware design into a software model and applying stimuli to test various aspects of the expected functionality.
- Emulation: Compiling the hardware design onto an emulation device (an FPGA or similar), and much as with simulation, applying stimuli, but often at higher throughput and capacity than the software model could achieve.
- Formal Verification: A set of systematic techniques to prove mathematically that the hardware design functions according to its specification.
Simulation and emulation are widely used and well understood. Skilled engineers can test designs thoroughly and often find many bugs by carefully choosing stimuli that exercise deep and complex aspects of the functionality. Despite this, however, simulation can never fully prove the absence of bugs in a system: it can only demonstrate for certain that no bug was present for the particular stimuli used, and provide statistical evidence for the likelihood of having missed other vulnerabilities.
Interpreting that evidence can be an art in its own right! As all experienced verification managers are aware, the quality of the test is highly dependent on the quality of the chosen stimuli, and by extension, on the quality of the verification engineer. For high-performance applications, it would be ideal to get more confidence than simulation and emulation alone can provide. These are nonetheless the two primary methods of hardware IP verification, even for high-end applications where missed bugs can significantly hurt performance.
Of the three methods mentioned above, only formal verification has the potential to show the total absence of particular types of vulnerabilities, as opposed to simply finding ones that are there. As Edsger W. Dijkstra noted at the NATO Rome Conference on software engineering techniques as far back as 1969, “Testing shows the presence, not the absence of bugs”. Formal verification, by contrast, has the potential to fully demonstrate the integrity and security of a system.
However, for many designs, formal verification suffers from scalability problems. In particular, when the state space of a system is especially large, or the nature of the design is mathematically complex, formal verification tools can struggle to give a conclusive answer as to whether or not a bug or vulnerability exists. In such cases, we say the proof fails to converge.
For this reason, despite the potential for full confidence, engineers often don’t consider formal verification a viable avenue for verifying their components, expecting their designs to be too complex to yield useful results.
In this article, we will explore a particular class of potential bugs which are intractable to current formal verification methodologies, namely, bugs resulting from the interaction of arithmetic and control logic. This class of bug has the potential to cause serious and systemic problems with an electronic device.
At one end of the severity scale, this could lead to an occasional breakdown of mathematical calculations and the corresponding failure of the hardware to give meaningful output. At the other, perhaps more serious end, it could result in catastrophic and systemic issues that compromise the reliability of an IP between its inputs and outputs, ultimately rendering the device useless. Possibly the most famous hardware bug in history, the FDIV bug in mid-1990s Pentium processors, demonstrates that when targeting high-performance applications, reliability and correctness are some of the costliest aspects of design to get wrong.
EXAMPLES OF COMPONENTS WITH ARITHMETIC AND CONTROL
Although often considered separately, almost all arithmetic or data transformation components also contain some amount of control logic. At the simplest end, a component may have a multi-stage pipeline through which data progresses, and some set of rules about how that progression occurs. Common examples are fixed pipelines with interrupts, external register enables and handshake protocols.
For complex components, there could be data-dependent feedback, there could be resource contention, and the results of calculations could appear out-of-order, compared to the order they were input. In the out-of-order case, it is usual to provide an accompanying sideband identifier, to associate the correct output packet with each input packet.
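The sideband mechanism described above can be sketched in a few lines. This is a toy illustration, not taken from the article: the tags, operands and results are invented, and the "calculation" is simply squaring.

```python
# Toy sketch of sideband identifiers for out-of-order completion: each input
# is tagged, results may complete in any order, and the tag re-associates
# each output packet with the input packet that produced it.

inputs = {0: 12, 1: 7, 2: 31}                # tag -> input operand
completions = [(2, 961), (0, 144), (1, 49)]  # (tag, result), out of order

# Re-associate each result with its originating input via the sideband tag.
results = {tag: value for tag, value in completions}
in_order = [results[tag] for tag in sorted(inputs)]
assert in_order == [144, 49, 961]            # 12**2, 7**2, 31**2
```

Without the tag, a consumer observing only the output stream would have no way to tell which input each result belongs to.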
In these components, as well as in simple pipelines, there are two broad sources of arithmetic error. The first is a fundamental issue with the algorithm or the implementation of that algorithm. Here, the bug means that for certain inputs (or possibly all inputs), a particular calculation in the hardware is fundamentally broken. If such an input arrives, the output will always be wrong. For this reason, these bugs will present themselves even if the transaction is presented immediately after reset, with no stalling or other transactions in flight.
The second source of error is in the data control. In this case, the hardware is in place to correctly calculate the result for any input. If the control sequence is simple (perhaps the transaction is presented straight after reset, with no other transactions to interfere), the result may still be correct.
However, if the same transaction is presented at a different point, perhaps after many thousands of cycles with a number of other transactions already in-flight, the control bug may cause a different, possibly even junk, result to be output instead of the correct answer.
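A toy behavioral model makes the distinction concrete. The pipeline below (entirely hypothetical, not from the article) squares its input over two stages. The datapath is fine, but stage 1 captures new input even while stalled, so a transaction presented with no stalls completes correctly while the same transaction presented under a stall is lost.

```python
# Toy illustration of a state-dependent control bug: a two-stage pipeline
# that squares its input. The arithmetic is correct, but stalling corrupts
# the in-flight transaction.

class BuggyPipeline:
    def __init__(self):
        self.stage1 = None  # holds x after stage 1
        self.stage2 = None  # holds x*x after stage 2

    def tick(self, data_in, stall):
        if stall:
            # Registers should hold their values while stalled...
            # BUG: ...but stage 1 captures new input anyway, losing the
            # transaction that was in flight.
            self.stage1 = data_in
            return None
        result = self.stage2
        self.stage2 = None if self.stage1 is None else self.stage1 * self.stage1
        self.stage1 = data_in
        return result

def run(stalls):
    """Feed the value 3 into the pipeline, then drain, collecting outputs."""
    pipe = BuggyPipeline()
    inputs = [3, None, None, None]
    outputs = []
    for x, stall in zip(inputs, stalls):
        out = pipe.tick(x, stall)
        if out is not None:
            outputs.append(out)
    return outputs
```

With no stalls, `run([False] * 4)` yields `[9]`; with a stall on the second cycle, `run([False, True, False, False])` yields nothing at all, because the in-flight value was overwritten. An algorithm bug, by contrast, would give a wrong answer in both runs.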
The second type of bug is much harder to find, because it may only present itself when the device is in a particularly deep and hard-to-reach state. Such bugs are extremely difficult to hit in simulation and are not easily covered by the capabilities of current arithmetic formal verification tools.
What's more, while the first type of bug is a serious issue, the second type of bug can have far more concerning consequences from a performance perspective. Where hardware is incorrect for specific known inputs, it may be possible to produce a software workaround which performs a different operation only in those infrequent cases. While not ideal, this produces a known and fixed reduction in performance for the given operation.
Where a bug is state-based, unpredictable and could occur on any input, there could be no reasonable workaround which will consistently produce the correct answer. In this case, the only solution may be completely bypassing the problem component and emulating the operation much less efficiently on other hardware. This reduces performance both by slowing down the given operation and by tying up other components which would otherwise be processing different operations.
This second type of bug also differs from more commonly tested control properties, which tend to concern the flow of data rather than its value. Even for arithmetic components, it is common to write, and perhaps successfully prove, properties about handshake protocols, transaction counts and the movement of data. But these properties won’t catch the corruption of data values based on state, which is generally too hard for existing tools to cope with.
LIMITATIONS IN CURRENT ARITHMETIC HARDWARE VERIFICATION METHODOLOGIES
Arithmetic hardware designs are some of the most complex and difficult to verify. Such components are often pipelined over many hardware cycles and may involve data-dependent feedback.
As mentioned above, simulation methodologies are widely used to verify all kinds of components, and are often used to verify mathematical hardware. Such methodologies have many advantages, but cannot be used to fully demonstrate the absence of bugs and vulnerabilities.
Formal tools exist which can prove the correctness of arithmetic hardware, but they generally suffer from one of a few limitations:
- High Effort Requirement: Tools built on theorem provers and proof checkers require a manual proof to be written by the user. This work requires significant expertise and time investment, with no guarantee of convergence.
- Finite Time-Window: Some tools can only prove output correctness over finite time windows, primarily proving the correctness of the first output after reset. As such, they demonstrate that the design can function correctly, rather than that it always functions correctly.
- Low Arithmetic Support: Model checkers and property checkers can prove properties automatically and over infinite time windows, but generally struggle with any mathematical content.
For this reason, obtaining a full, formal proof of the correctness of a mathematical component for all time is currently out of reach for anything except a complete hand proof, carried out at high effort and potentially high risk by an expert.
OUR ABSTRACTION METHODOLOGY
At Imagination Technologies, we have developed an abstraction methodology which can be used with Siemens EDA Questa Formal, as well as other major formal property checkers. The technique allows the mathematical aspects of the component to be removed without weakening the completeness of the proof.
The key problem we’re looking to solve is whether a given device is consistent. That is: will the result of a given calculation be the same no matter when it is performed? If we can demonstrate consistency of this type, we can be sure that the output of an operation can only be influenced by the inputs intended for that calculation, and that we can never get junk data out or, potentially worse, data from another program’s calculation.
If we combine such a proof with a proof that the device always functions correctly immediately out of reset, we will have proved that the result is correct for all inputs and for all time. To check the consistency aspect, we do not need to demonstrate that the result is correct in its own right, only that it is the same as the result produced at a different point in time. With this in mind, we set up the following equivalence proof, shown in Figure 1 below.
Figure 1 - Equivalence Flow
We take two identical copies of the design we are verifying and feed them the same transaction. In one copy of the design, we restrict the control flow to the simplest possible behavior. In the other, we allow the device to be driven with any legal inputs for any number of cycles and input the symbolic transaction at any point in time, while allowing any number of transactions to be simultaneously in-flight, and any stalling behavior. We then compare the results of these two copies, which should be the same.
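The equivalence flow can be sketched in simulation terms. A formal tool explores every legal schedule symbolically; the harness below merely samples random schedules. `Design`, `first_result` and the cubing operation are all hypothetical stand-ins, not from the article: one copy is driven in the simplest possible way, the other with arbitrary idle cycles and stalling, and the results are compared.

```python
import random

# Simulation-flavored sketch of the equivalence flow: compare a copy of the
# design driven with the simplest control against a copy driven with an
# arbitrary schedule. `Design` is a toy stallable pipeline that cubes input.

class Design:
    """Two-stage pipeline computing x**3, with correct stall handling."""
    def __init__(self):
        self.s1 = None  # stage-1 register
        self.s2 = None  # stage-2 register

    def tick(self, data_in, stall):
        if stall:
            return None  # all registers hold while stalled
        out = self.s2
        self.s2 = None if self.s1 is None else self.s1 ** 3
        self.s1 = data_in
        return out

def first_result(pre_idle, stalls, value):
    """Reset, idle for `pre_idle` cycles, inject `value`, then drain."""
    d = Design()
    for _ in range(pre_idle):
        d.tick(None, False)
    d.tick(value, False)
    for stall in stalls + [False] * 3:  # guaranteed drain cycles
        out = d.tick(None, stall)
        if out is not None:
            return out

rng = random.Random(1)
for _ in range(100):
    value = rng.randrange(100)
    reference = first_result(0, [], value)             # simplest control
    schedule = [rng.random() < 0.5 for _ in range(8)]  # arbitrary stalling
    assert first_result(rng.randrange(5), schedule, value) == reference
```

Here every sampled schedule agrees with the reference run. A control bug like the one discussed earlier would make some schedule disagree, which is exactly the disagreement the formal equivalence check is set up to detect exhaustively.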
This setup does not immediately allow the existing collection of commercially available formal tools to complete a full proof. The existence of complex mathematical components still means that the problem is too hard for them to solve, in general. At this point we make our key observation which forms the basis of our abstraction technique:
“It does not matter what calculation is being performed, only that it is consistent with the same calculation performed at a different time”.
With this in mind, we make the following abstraction. “The output of any piece of combinational logic may be abstracted as being only deterministic in its inputs, with no assumptions being made on specifically how the output is created from those inputs.” We call this the determinism abstraction. In this context, a piece of logic is deterministic if it will always produce the same output if presented with the same input.
By replacing combinational blocks within a device with the assumption that the given sub-block is deterministic, we can massively simplify the problem of proving that the overall device is consistent. It’s worth noting that combinational logic is naturally deterministic since it has no stored state.
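One way to model the determinism abstraction in software is with an uninterpreted function: any mapping at all, provided equal inputs always yield equal outputs. The class below is an illustrative sketch (the name and the memoized-random-table approach are our invention for this example, not part of any tool's API).

```python
import random

# Sketch of the determinism abstraction: the combinational datapath is
# replaced by an uninterpreted function. A memoized table of random values
# satisfies the only property we keep -- same inputs, same output.

class UninterpretedFn:
    def __init__(self, seed=0):
        self.table = {}
        self.rng = random.Random(seed)

    def __call__(self, *inputs):
        # Deterministic: the first call for a given input tuple fixes the
        # output for all later calls with the same inputs.
        if inputs not in self.table:
            self.table[inputs] = self.rng.getrandbits(32)
        return self.table[inputs]

f = UninterpretedFn()
assert f(3, 5) == f(3, 5)  # same inputs, same output, always
# f(3, 5) need not equal 15 or 8 -- the calculation itself is abstracted away.
```

In the equivalence setup, both copies of the design share the same abstracted function, so consistency can be checked without any reasoning about the arithmetic itself, which is precisely what lets non-arithmetic property checkers converge.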
To test the capability of the methodology, we took a selection of designs of interest and attempted to verify consistency as described above. We performed the verification both with and without the determinism abstraction to demonstrate the difference in convergence rates.
Table 1 below shows the speed of the results (DA denotes determinism abstraction).
Table 1 - Results for designs 1-3 in a leading property checker
We also performed a trial with the technique on several more substantial components and obtained the results noted in Table 2.
Table 2 - Results for designs 4 & 5 in a leading property checker
With the increasing focus on high performance, and the importance that places on mathematical hardware, certainty in verification is of ever-increasing importance. What we have described here is a methodology with more than a single stage. First, we independently verify the maths blocks, testing the multipliers and adders in isolation using industry-standard datapath formal tools, such as Mentor’s SLEC. This gives us confidence that the sub-modules are correct before we use our suggested methodology to verify the entire datapath design.
Formal has the potential to add robustness to the verification process but suffers from scalability issues. By dividing the problem of verifying arithmetic hardware into one of verifying correctness, and one of verifying consistency, we can leverage the great strengths of different aspects of formal. However, there are still significant hurdles to overcome. The determinism abstraction provides a bridge to allow otherwise non-arithmetic tools to cope well with mathematics and push out the boundary of what can be verified with formal.