by Ram Narayan, Oracle Labs
Project RAPID is a hardware-software co-design initiative in Oracle Labs that uses a heterogeneous hardware architecture combined with architecture-conscious software to improve the energy efficiency of database-processing systems. This article, adapted from a paper we presented at DVCon 2014, describes how formal methods went from being used opportunistically to a central place in the verification methodology of the RAPID SoC. Though not without a learning curve, formal did help the Oracle team achieve its verification goals of finishing on schedule and achieving first pass silicon success.
The initial RAPID verification plan largely relied on a constrained-random simulation environment developed using UVM. Most newly designed units had their own verification environment to thoroughly verify the unit prior to integration into the RAPID SoC. The SoC verification was largely used to verify the interaction between the different units under various operating modes.
EVOLVING USE OF FORMAL STRATEGIES
1) Connectivity Verification
At the outset of the project, there were no specific plans to use formal verification on RAPID. We did not have any infrastructure in place for running formal tools, and neither did we have anyone on the team with any noteworthy experience using these tools. But, early on, we decided to explore the use of formal methods to verify the connectivity at the SoC level focusing on interrupt and Design for Test (DFT) signals.
Our goal was to catch trivial design errors through formal methods without having to rely on lengthy and in some cases, random SoC simulations. Given our modest expectations at the outset, we would have been satisfied if we just verified these SoC connectivity checks with formal tools.
With immediate success using formal for the connectivity checks, we became more ambitious, deciding to explore the use of formal to verify some of the custom IP being designed for RAPID.
2) Assurance Targets
Some of the IP units, (Event Count Monitor (ECM), System Interrupts Controller (SIC), Least Recently Used Arbiter (LRU) and SRAM Parity Error Register (SPE)) were good candidates for formal verification. That’s because it was very reasonable to expect to be able to prove the complete functionality of these units formally. We decided to target these units with the Assurance strategy. In addition to proving these units with formal verification, we decided to add a few SoC simulation sanity tests to demonstrate the operation of these units in a broader scope. These simulations were largely to verify the access to these units as specified in the RAPID register map.
The SRAM controller (SCR) interconnect protocol was more complex and would have made it harder to get thorough formal proofs. The logic beyond those interfaces, however, made a great target for the Assurance strategy. We decided to verify the core logic of the SCR with formal methods using an Assurance strategy and resort to simulation to verify the interconnect logic using SoC tests.
3) Bug Hunting Targets
Spurred by the the success of applying, we considered applying formal methods in a bug hunting mode for units that were already being verified with simulation. Unlike Assurance, Bug Hunting doesn’t attempt to prove the entire functionality, but rather targets specific areas where simulation is not providing enough confidence that all corner case bugs have been discovered.
OUR EXPERIENCES WITH FORMAL
A. SoC Connectivity
SoC Connectivity checks were written to verify the correct connectivity between critical SoC signals like interrupts, events and other control/datapath signals. These checks are trivial to define and are of high value. Proving these connections saved us significant cycles in SoC simulations.
SoC Connectivity checking also included Boundary Scan (BSR) connectivity tests to prove drive, sample and high impedance properties of each I/O cell. The RAPID Nodewatcher functionality was also targeted with formal verification to verify the connectivity of thousands of internal signals to a selectable set of I/O pins. These are conditional connectivity checks based on the configuration of the Test Data Registers (TDR). Some of these SoC checks went beyond just the point-to-point connection between SoC events and verified the correct configurability and functioning of certain global functions on the SoC.
To make the SoC Connectivity proofs easier for the formal tool, we specified directives to the formal tool to black-box most of the units in the SoC. This reduced the time to prove the connectivity between these units significantly. In the absence of these black-box directives, the formal tool would have had to justify the generation of ‘1’ and ‘0’ at the source of the connections.
B. IP Assurance
As an example of our process for IP block assurance, we shall discuss the flow for the System Interrupt Controller. The SIC is the global interrupt router on RAPID. It routes interrupts from N different sources to M different destinations. The enablement and clearing of interrupts is managed through programmable control registers. The status of the interrupts is also available through interrupt status registers. These control and status registers are accessible through a system interconnect. The incoming interrupt from each source can be configured to be rising edge, falling edge, active level high or active level low. This behavior can be configured through the control registers. Fig. 1 shows a generic block diagram of the SIC. This diagram is typical of most units in an SoC.
Figure 1: System Interrupt Controller Block Diagram.
|
 |
We can partition the unit verification into two areas.
1) Register Access Verification
This is an attempt to answer two questions:
- Are the control registers being written to correctly through the system interconnect?
- Are the status registers being read from correctly through the system interconnect?
This verification requires the system interconnect interface to be constrained to ensure that the formal tool only generates legal transactions. We took advantage of vendor provided constraints to constrain these interfaces. The verification IP also included checks to ensure that the SIC adhered to the protocols of the interconnect.
We developed some sequences and properties to be able to write to and read from the registers based on the interconnect protocol. These sequences accounted for any wait states in the protocols and did not constrain the response latencies from the slave at all. We used these properties to prove that a write to each address specified in the architectural spec for the unit caused the appropriate control register in the design to receive the data that was written. Reserved bits were masked from the comparison.
Similar properties were used to ensure that the data in the status registers were read correctly. The status registers were constrained to hold a stable value during the read protocol to prevent the hardware from writing to them and causing the read properties to fail.
In the SoC context, we added simulation tests to ensure the correct channeling of requests through the SoC system interconnect fabric to these registers. This was done more for SoC verification and less for the unit verification. While these interconnect protocol properties were easy to implement for some interconnects like the one in the SIC, it was not a trivial approach for more complex protocols. In those situations, we relied on simulation to verify the correct functioning of the registers.
Fig. 2 shows the formal testbench for the SIC unit.
Figure 2: System Interrupt Controller Formal Testbench.
|
 |
2) IP Core Logic Verification
Core logic verification is an attempt to answer these questions:
- Are the control registers being correctly used by the unit as per the design specification?
- Is the design being reset correctly?
- Are the inputs being interpreted correctly?
- Are the outputs being generated appropriately?
- Are the status registers being updated correctly?
The Register Access Verification established controllability and observability of the registers in the unit from its interface. The IP core logic verification could now safely use the control registers as inputs to properties on the rest of the logic they drive. In addition to these registers, we chose a few internal nodes in the design as observation and control points in our properties. These points gave us additional controllability and observability to the design and reduced the complexity of the cones of logic being analyzed around them. We proved the correctness (observability) of these points prior to enjoying the benefits of using them (controllability) for other properties. This approach made it easier to write properties on the entire unit without any compromise on the efficacy of the overall unit verification.
While defining the set of properties to verify the core logic, we had to constrain the control registers to allow only legal values as defined in the architectural spec. These constraints were promoted to the RTL simulation environment to ensure that they were not violated in the system tests that used this unit at the SoC level. If the other inputs to the design needed to be constrained, assumptions were added and promoted accordingly. Additional assertions around internal nodes were used as needed for convenience with the same discipline as when using the control registers for properties. Exhaustive properties were written to ensure the correctness of the primary outputs of the unit and the correct updates to the status registers. To be thorough in our verification, we also added checks to verify the reset state of some key registers as specified in the micro architecture.
C. IP Bug Hunting
Our objective in this strategy is clearly to drive to verification closure by finding the remaining bugs in the design as fast as possible. Emphasis is less on complete proofs and thorough verification of the unit through a set of formal proofs. In this approach, bounded proofs are tolerable.
Such a strategy always complements simulation or is complemented by simulation. Traditional simulation-based code and functional coverage become the metrics for verification closure. Formal verification is just a catalyst to accelerate to closure. On RAPID we applied the Bug Hunting strategy in three different situations.
1) Fuse Controller
The FUSE unit has a deep state machine and hence we did not target this with the IP Assurance strategy. Besides, most of this unit was verified in the early stage of the project before formal verification had made its mark on RAPID.
The FUSE unit was largely verified through its system interconnect interface using SoC level simulation. Through those simulations, we had verified the ability to program and read the fuse array. The BISR interface to the FUSE was yet to be verified. Our goal was to iron out this access mechanism to the FUSE unit prior to the BIST and BISR interface being designed. We just wrote two properties to verify that fuse array read and write requests through the BISR interface would be completed within the expected number of cycles. The read/write operations take over 500/1000 clock cycles respectively. We were a little skeptical about the applicability of formal verification to explore the state space of the controller to these depths. These two assertions proved highly valuable by highlighting a few bugs in the unit which could very likely have been missed in probabilistic SoC level simulation. The formal tool highlighted issues in the state machine being able to handle back-to-back reads/writes. These failures occurred around 1000 cycles after reset. Once the bugs were fixed, we were able to get unbounded proofs for these properties. The run times for these proofs were very reasonable as well (less than 2 hours). It would have required us an elaborate random stimulus generator for the BISR interface to probably find these bugs. The SoC environment does not lend the controllability for such random stimulus. At the very least, this formal effort saved us the longer debug times in the simulation environment.
2) Clock Controller Unit
The CCU controls the clock generation to the various units on the RAPID SoC. Towards the end of the project, we found a bug using random SoC simulations in the CCU with one of the clock modes the SoC was expected to operate in. The challenge was to ensure that this bug did not occur in any of the other clock modes. Doing that in simulation would have been impractical. We decided to explore the use of formal verification to give us that confidence. We described the property that the bug would have violated and ran formal model checking on the CCU. Through this effort we were able to confirm the bug that was found in simulation and prove that this bug only occurred in the clock mode that was reported in simulation. That proof gave us confidence in the health of the design and expanded our options to fix or ignore the bug.
3) MEMORY INTERFACE SUBSYSTEM
The Memory Interface Subsystem (MIS) includes a localized Memory Access Arbiter (MAA) and its interface (MSYS) to the system interconnect. The units in this subsystem were already being verified in a unified UVM- based unit level environment. We decided to accelerate the verification closure of these units by using formal verification for finding the remaining bugs. We asserted some general invariant properties about the design. We also implemented properties to assert some corner cases. In order to increase our chances to find bugs, we verified the MAA and the MSYS in separate formal environments. Fig. 3 shows the formal testbenches for the units in the MIS.
Like in the previous units, we used vendor IP to constrain the system interconnect. We strictly followed the Assume- Guarantee formal proof methodology in this situation. The assumptions that were made to verify properties for the MSYS unit became targets for the MAA verification. This bug hunting exercise revealed a few design issues including a critical one that potentially saved us a respin of silicon. Although we did not get unbounded proofs for all these properties, we were able to achieve our goals of driving to verification closure on these units.
Figure 3: MAA and MSYS Formal Testbench.
|
 |
Subsequent to the initial bug hunting effort, the verifica- tion methodology for this unit has evolved into using formal methods for performing early bug hunting for all the new design features designed. This has proven to be a very valuable exercise with bugs being found within minutes after the designer has implemented the design.
This is as productive as it can get when the design and verification responsibilities are split between different teams.
RESULTS: FINDING BUGS FAST
Table 1 shows the results of applying formal verification on the different units on RAPID. The table also shares some attributes about the designs as reported by the formal tool. The number of assertions reflects the number of unique assertions eventually run for the unit. These assertions were broken down into multiple tests for each unit. Our environment enables a definition of a hierarchy of tests for a unit and automatically handles the dynamic scheduling of these formal jobs on the server farm. The run times reflect the total amount of time taken to run all the tests for each unit.
Table 1: Formal Verification Results on RAPID |
---|
Unit |
Strategy |
Registers |
Assertions |
Run Time |
Bugs |
---|
SIC |
Assurance |
~4200 |
~3000 |
3m |
5 |
ECM |
Assurance |
~5500 |
~1500 |
5m |
15 |
LRU |
Assurance |
~75 |
~60 |
120m |
2 |
SRC |
Partial Assurance |
~2500 |
~1500 |
600m |
39 |
FUSE |
Bug Hunting |
~4500 |
2 |
110m |
4 |
MAA |
Bug Hunting |
~6700 |
~2000 |
600m |
6 |
MSYS |
Bug Hunting |
~9500 |
~100 |
120m |
2 |
SPE |
Assurance |
~350 |
~150 |
2m |
8 |
Table 2: Formal Verification Time Savings on RAPID |
---|
|
Time Without Formal |
Time With Formal |
---|
Unit Verification |
4-5 weeks/unit with Simulation with Functional Coverage |
1-2 weeks/unit with Formal |
Connectivity Verification |
2-3 weeks of SoC Simulation with Functional Coverage |
<1 week with Formal |
While most of the bugs manifested as firings of the properties targeted, some of them were revealed due to vacuous proofs or design check violations that were reported by the formal tool as a by-product of the model checking. The inability of the formal tool to justify the antecedent of the property could be due to an undriven signal or some other condition that would prevent the property from being proven or fired. Such failures may be trickier to debug because of a lack of a waveform to describe the failure. From our experience, it is important to not ignore them. They are likely masking a bug. It would be beneficial to identify some of these bugs earlier concurrently with the design process using automatic formal checks.
Each of the units targeted for IP Assurance was verified well within the original time allocated for those units in the original project plan, which planned only for simulation. These units were qualified for SoC integration well ahead of their schedule. The role that SoC tests played in the thorough coverage of these units was also reduced considerably. For example, with the formal verification of the SIC and the SoC interrupt connectivity, it was no longer necessary to verify the correct functioning of every interrupt on the SoC level. Such tests would have consumed MxN tests at significant lower SoC simulation performance to cover the entire matrix of N interrupt generators and M destinations. Instead of these exhaustive tests, just a few SoC simulation tests were written to demonstrate the correct system behavior. This was more a system level task. These tests were not expected to find any bugs and they did not. The schedule savings in the RAPID SoC verification plan is estimated as shown in Table 2.
Other benefits of applying formal verification included a more structured and efficient way of fostering collaboration between design and verification teams; also, since formal tools tend to quickly find counterexamples to disprove properties, a useful means of doing early bug hunting.
CONCLUSION
Formal verification is a highly effective and efficient approach to finding bugs. Simulation is the only means available to compute functional coverage towards verification closure. In this project we attempted to strike a balance between the two methodologies and to operate within the strengths of each approach towards meeting the projects goals. We demonstrated that closer collaboration between the design and verification teams during the pre-implementation phase is essential for maximizing the applicability of formal methods. The most significant accomplishment to me is the shift in the outlook of the design team towards formal. According to one designer whose unit was targeted with Bug Hunting, “I was initially skeptical about what formal could do. From what I have seen, I want to target my next design first with formal and find most of the bugs.”
Verification is widely accepted as the long pole in the project schedule and SoC design teams are tasked with verifying increasingly complex designs. Under these circumstances, the time savings and improved design quality that formal verification brings are welcome benefits. We plan to continue to push the boundaries of what is covered by formal in future projects.
Back to Top