You are about to go into a planning meeting for a new project when you get a call from one of your company’s Customer Advocate Managers: a product that you worked on just started shipping in volume, and there are a growing number of field reports that the system randomly freezes-up after a few weeks in operation. While a soft reset “works,” it will run anywhere from 5 to 10 days before it has to be rebooted again.
The highly variable MTBF suggests the problem is not a clock-domain crossing (CDC) issue (after all, crashes from CDC issues would have never made it out of the pre-production lab). The unfortunate conclusion is that the design’s logic itself has a problem. Somehow the code reviews, linting, and constrained-random simulations missed case(s) where the system could deadlock. The only people happy about this are your competitors’ sales reps…
To find a way out you need to figure out how your design is getting stuck in the first place. Let’s start by reviewing the two common flavors of deadlock.
Scenario A – A design gets into a state that it can never escape and there is no sequence of inputs you can give it to dislodge it from that state. This is the classical definition of deadlock. There is no way out no matter what you do, like a raccoon in a trap.
Let’s illustrate Scenario A with this simple, three state FSM example:
Figure 1 - A finite state machine example of Scenario A
Starting from IDLE, there is an increment state (INCR) where you can advance the counter by the value of the “in” vector; and then an “increment by 2X” state (INCR_2X) to multiply this input by two and add it to the counter.
We can stay in IDLE as long as we want; or we can start incrementing the counter by one; or we can go to increment by 2X and start incrementing the counter by two times the input. And we transition from the 2X state back to IDLE when the count overflows and goes back to zero.
As highlighted in the code example above, the trouble happens when we end up in the “increment by 2X” state. Observe that if we enter this state when the count is an odd number, and since we’re only adding even numbers to it, even when it rolls over there’s no way it can ever go back to being zero. Once we reach this point, there’s no input which can get us out of the state — this is a true deadlock.
Scenario B – This is when your design gets into a state that it can actually escape, but even if the escape route is obvious the system might tend to hang up there. Picture a koala slumbering in her favorite eucalyptus tree: why would she want to hang out anywhere else? The larger point is that regardless of whether there is an obvious or obscure escape route, this form of “deadlock” is not always a design issue that needs to be fixed.
Back to the corresponding code and state diagram example::
Figure 2 - Code and state diagram for a deadlocked system
As per the highlighted line of code for the IDLE state above, the question is whether it is possible to deadlock in IDLE. If the DUT was a sleepy koala it would say “yes,” because as long as the input is zero, the system will stay in the IDLE loop forever. (This is a very simple example to illustrate this case — this isn’t necessarily a bug, but it is similar to many such counter examples found for this type of deadlock.)
Detecting which of these deadlock scenarios is occurring is extremely hard because it involves multiple things happening at just the right time — and it can be a very long time before the right conditions manifest themselves.
A SOLUTION JUST OUT OF REACH
First let’s review why it is nearly impossible to detect system deadlock with even well constructed, constrained-random RTL simulations. Specifically, simulation fails to deal with the above deadlock scenarios in at least three ways:
- First of all, simulation cannot directly tell if the system is deadlocked — you can only observe that nothing has happened for a long time. The question is, when should you get worried that nothing is happening? How long is too long?
- The second problem is that simulation can’t differentiate between situations where your system is truly locked up (Scenario A, trapped raccoon) from a situation where the right stimulus hasn’t come along to take the design out of the current stuck state (Scenario B, cozy koala).
- Third, simulation is dependent on the engineer knowing the right stimulus to trigger a particular problem. This is the same challenge that simulation has for all properties you’re trying to verify, but it is particularly difficult with deadlock because it is not only hard to trigger in simulation (as it requires a number of specific interactions) but also difficult to observe and know that it has even happened.
FORMAL TO THE RESCUE?
Because it employs an exhaustive, mathematical analysis, formal verification is uniquely qualified to detect the risk of a design going into deadlock. Traditional formal-based approaches use two foundational formal analysis concepts: liveness
and safety. Liveness properties are used to specify that something good will eventually happen. Safety properties state that something bad will never happen.
Figure 3 - Liveness vs. Safety Failures
Unfortunately, there is a big caveat to manually employing these techniques with a mix of properties and constraints: you really have to know what you are doing. And even then, it can be a tedious and error-prone manual process often involving many lengthy iterations. For example, let’s unpack the “safety” analysis approach. Three steps are required:
- Writing cover properties (e.g., in SVA) to find a trace where the system remains in some specific state for N cycles:
cover property (@posedge clk)
design_state == `SOME_SPECIFIC_STATE [*30];
- Test if that trace is indeed a deadlock scenario:
assert property (@posedge clk)
design_state == `SOME_SPECIFIC_STATE;
- If the deadlock is not found, modify the cover property in (1)
Clearly, when you have to repeat this third step over and over — which is typically the case — it’s clear that the manual application of safety properties to find deadlock is impractical. (Granted, it’s a little better if you know that your system has to respond within a certain number of cycles. In this case, applying safety properties has a decent chance of working.)
Conversely, when there is no clear spec for when the system should time out – or that time out is a very long time (seconds) in the formal or even simulation world – the initial course of action is to check for deadlock using liveness properties. The catch is that with either SVA or PSL liveness properties, you indeed will find a lot of Scenario B (koala) deadlocks; but be forewarned that you will have to also do a lot of iterations constraining away all Scenario B (koala) cases you will find before you eventually unearth the Scenario A (raccoon) case. Bottom-line: it’s also painful to iteratively apply liveness properties.
The good news is that there is an academic formal property specification language that can be employed for verifying Scenario A (raccoon) case: Computational-Tree Logic (CTL). The bad news this is an academic language that is not user-friendly for engineers familiar with SVA and PSL. Indeed, there is no mainstream commercial support for this language today.
A different academic language: Linear-Temporal Logic (LTL) – addresses the Scenario B (koala) case. It is actually a subset of SVA and PSL, where historically PSL and SVA grew out of this language; and there is commercial support for it. However, as noted above, regardless of whether your liveness property was expressed in LTL, SVA, or PSL directly, you still have to manually create flows that involve running formal multiple times, extracting information from the results, constraining their design, and re-running it over-and-over again.
Bottom-line: even though formal verification engineers can use liveness and safety properties to try to verify the absence of deadlock, they must manually cobble a lot of things together, which is a painful and unreliable exercise.
Fortunately, a new development has changed the picture.
NEW FORMAL-BASED AUTOMATION
Recently Questa® PropCheck has enhanced the Scenario B / LTL / koala analysis by running alongside it new formal algorithms for Scenario A / CTL / raccoon analysis, which leverage standard SVA or PSL properties to capture the deadlock specification. It automates what was formerly too esoteric for practical purposes, making full deadlock analysis readily available to everyday, professional formal users. Because all the things that used to be done manually are now built-in, PropCheck replaces tedious, error prone approaches. This is a unique capability that is not available in any other tools today.
Another advantage of this formal-based solution is that you can start writing properties and running the formal tools as soon as the RTL is ready. There is no big UVM testbench or infrastructure that you need to put in place before you can start verification. If you’re worried about deadlock and you have complicated interactions, you should write those properties upfront as part of the design process, then you are really ahead of the game.
EXAMPLE WORK FLOW
Referring back to the code example shown when we first introduced the scenarios above, let’s input the following two SVA properties into Questa® PropCheck:
A_IDLE: assert property (s_eventually st != IDLE);
A_INCR_2X: assert property (s_eventually st != INCR_2X);
The first property asks, “Can we deadlock in the IDLE state?”
The second property is saying, “Can we deadlock in the INCR_2X state?”
We run the tool, and the SVA properties fire, indicating the engines have identified some issues.
Referring to the first line of the screen shot segment in Figure 4 below: presuming for a moment that you keep that input at zero, the DUT will never leave the IDLE state. But Questa® PropCheck runs a deadlock check by performing the Scenario A / CTL / trapped raccoon analysis and displays the green checkmark to tell you, “Nope, you’re all good because whatever stimulus you use here to get stuck, I can extend that stimulus with additional inputs to get you unstuck.” So by combining the traditional SVA liveness analysis with the “CTL-style” deadlock checking (under-the-hood), you can differentiate between these two very different cases: the unmotivated koala and the trapped raccoon.
Figure 4 - Example Scenario A and B results in Questa® PropCheck
Referring to the second line in the Figure 4 output example, the test to see if we’re deadlocked in the “increment 2X” state comes back with a red “X” to tell you, "Yes, here you are really stuck. There is no stimulus I can provide to break out of this loop."
Additionally, a counterexample waveform is generated to show the circuit getting stuck in the INCR_2X state. It makes the input non-zero in the IDLE state to go to the INCR state, and then it adds one to make the counter odd before setting the input value to two to cause the transition to the INCR_2X state. Then when you enter INCR_2X, the count becomes odd, and the design will be stuck in that state forever. The waveform in Figure 5 below indicates that if you provide the same inputs, you’re never going to leave this state. Note that the waveform just shows one possible choice of inputs inside the loop – by definition, no matter what inputs are provided, the DUT is never leaving that state. Hence, this is the counterexample you can get by applying the Scenario A / raccoon / “CTL style” deadlock liveness to generate an unequivocal deadlock situation.
Figure 5 - Simple FSM — Scenario A Deadlock Waveform
Referring to Figure 6, and going back to our req-ack example with a “raccoon” deadlock: here is a situation where you made a request and you haven’t had an acknowledgment, and then the design goes into a loop.
Figure 6 - Scenario A (Trapped Raccoon) Example
This is useful information. Since we found our counterexample using this Scenario A (raccoon) analysis, we found a real bug in the design, and we have something to go off and debug! However, a confusing and somewhat counterintuitive aspect is that if we get a proof that there is no such counterexample, that doesn’t mean that our system is deadlock free. Rephrasing, while we can prove that there’s no such counterexample like this, the system might still have deadlock.
THERE IS A LITTLE BIT MORE TO IT...
It’s important to realize that the Scenario A (raccoon) analysis is actually about bug hunting and not formal proofs, per se. This analysis makes it much easier to find deadlocks because it allows us to directly target what a real deadlock in the system is. If we can find one of these, we have found a real bug. On the other hand, with Scenario A proven to be clean, it means that the Scenario B (koala) analysis should be reporting escape routes, because if it is not, by definition, it means the animal is actually trapped and you will get an error from the Scenario A (raccoon) analysis in the first place.
The larger point is that it’s critical to understand all the escape routes the koala analysis uncovers; and thus, by process of elimination, you should remove or constrain away the escape routes until you are certain all such paths are either ok or trouble.
The most common escape route in any DUT is reset. If the reset signal hasn’t been constrained, and you can always assert the reset signal and to bring the state machine back to IDLE, then strictly speaking the system is never going to have a true deadlock because you can just hit reset and the algorithm’s going to say, “Oh, yeah. If I get stuck in this state, I hit reset and I'm out of it.”
Clearly you don’t want your customers to constantly have to reset their systems because it’s the only way out of a deadlock. Hence, it’s essential to constrain away the reset case so you can enable the analysis to find “real” escape routes or deadlocks.
Peeling the onion further, look at the other escape routes reported by the tool to determine their legitimacy. For example, maybe the koala escaped by enabling the scan chain, scanning out everything, and scanning in some other states, etc. By identifying all these omissions of constraints and then updating your constraints, you can close off all the undesirable escape routes one-by-one.
This eventually ends up at a point where either you’ve removed one of the these illegal escape paths and discovered a true deadlock situation, or you’ve constrained away all of the Scenario B (koala) counterexamples, in which case there is no deadlock because there’s no ability to loop. In summary, it is important to use both Scenario A and Scenario B analyses together to make sure you’ve properly verified the system.
Back to our example DUT: Figure 7 below shows the waveform from the Scenario B (koala) situation. Note that the design is in IDLE initially, and as long as the input is kept at zero, it stays in the idle state. Indeed, this first part is the counterexample you would have got from a traditional formal tool.
Figure 7 - Simple FSM - Scenario B Deadlock Waveform
But with the addition of the Scenario A analysis under-the-hood, you can compute the escape waveform and show that if you assert the input to one, then the design moves out of IDLE and on to some other state.
In general, by looking at these waveforms, you can quickly identify the constraints which were missing; and by looking at the transitions where it escaped, determine whether or not it really is a deadlock free system or not.
In practice, the formal analysis finds the deadlock traps relatively fast. Additionally, there usually aren’t that many constraints that are missing. There might be some obvious ones like a reset, a scan chain, or maybe the formal tool can generate some sort of illegal error condition and cause a reset. But, once you’ve constrained away the behaviors that you don’t want the analysis to consider, the tool will be single-mindedly focused on finding true system deadlock.
WHEN CRITTERS RUN FREE
In summary, the risk of a design going into deadlock is not something you can effectively verify with simulation. You could try to do it with traditional formal liveness and safety properties, but that is a painful and error prone process. Fortunately, the LTL and CTL algorithms have been embedded in Mentor’s PropCheck formal property checking tool to help you detect traps and uncover hidden escape routes. This innovative, unique solution allows verification engineers to employ traditional SVA syntax and quickly differentiate between your DUT resting in a tree or trapped in a cage.
Back to Top