Question regarding the generate block

Hi All,

module tb;
  generate
    genvar i;
    for(i=0; i<10; i=i+1) begin
      initial begin 
        #1ns;
        haha(i);
      end
    end
  endgenerate

  task haha(int i);
      // #10ns;
      $display("haha_%0d", i);
  endtask
endmodule

Regarding the code snippet above, when I don’t use #10ns in the task, the final output is haha_0…haha_9; but after enabling #10ns, it changes to haha_9…haha_9.
Could you please explain the reason for this in detail?

BR

The unexpected behavior is caused by task lifetime. Tasks declared in a module are static by default unless declared as automatic, which means all calls to the task share the same storage for its arguments and local variables.

Multiple initial blocks are generated using a for loop, and each one calls the same task “haha(i)” after a small delay.

The task haha is static, so its argument “i” is also stored in a shared memory location. When the simulation reaches time 1ns, all initial blocks call the task at almost the same time. Each call assigns a new value to the same shared variable “i”.

With the #10ns delay in place, each invocation is suspended before it reaches the $display statement. During this waiting period, the other task calls run and overwrite the same i variable. By the time the delay finishes at 11ns, the value of i has already been overwritten multiple times, and the last assigned value, 9, remains. As a result, all task calls print the same value.

My recommendation: if you expect each call to print its own value and want to keep the #10ns delay, declare your task with the 'automatic' keyword so that the i variable is allocated a separate memory location in each call, and each call prints the value it was invoked with.
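As a minimal sketch of that recommendation, here is the original testbench with only the task declaration changed. The module and task names are the ones from the question; nothing else is assumed.

```systemverilog
module tb;
  generate
    genvar i;
    for (i = 0; i < 10; i = i + 1) begin
      initial begin
        #1ns;
        haha(i);
      end
    end
  endgenerate

  // 'automatic' gives every call its own copy of the argument i,
  // so a concurrent call can no longer overwrite it during the delay.
  task automatic haha(int i);
    #10ns;
    $display("haha_%0d", i);
  endtask
endmodule
```

With this change each value from 0 to 9 should be printed exactly once. The order in which the ten lines appear is still not guaranteed, because the ten initial blocks remain concurrent processes.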

For more on this topic, see: SystemVerilog Tasks - Verification Guide

Thanks for your reply.
I understand the concept of a static lifetime that you mentioned, but if I remove the delay from the task, the final output doesn’t behave like a static lifetime — it prints haha_0…haha_9. Why is that?

After giving it some thought, I think I might know the reason.

This has nothing to do with the delay in the initial block; let’s remove the delay from it.

The generate loop creates 10 initial blocks, one for each value of the genvar i from 0 to 9.

When there is no delay in the task, all 10 initial blocks start at 0 ns, and each one calls the task and passes its genvar i to the formal argument i. Since that argument is a static variable, a race condition is triggered. It’s possible that the output is printed immediately after i is assigned 0; it’s also possible that it’s printed only after i has been assigned 0, 1, and so on up to 9 in sequence. In short, multiple print results may occur.

When there is a 10ns delay in the task, a race condition for the formal argument i is similarly triggered at 0 ns. However, once 0 ns has elapsed, the outcome of the race condition for i has already been determined; it could be any number from 0 to 9. Since the display statement executes at 10 ns, every call prints that same value (some number from 0 to 9).

I’m not sure if I’ve understood this correctly, so I hope @dave_59 can clarify it for us.

You’re correct. It’s worth noting that different tools and even different versions of the same tool may select different execution orderings for the initial blocks. However, the same tool will consistently use the same ordering. Consequently, the race condition becomes exceedingly difficult to detect.

You said: “the outcome of the race condition for i has already been determined–it could be any number from 0 to 9.”

Theoretically true, but in practice this value is almost always 9 — and not by chance. It’s actually a deterministic outcome.

Here’s why: the simulator schedules all 10 initial blocks at T=0 on a single thread, in elaboration order. So haha(0) is called and i=0 is written, then immediately haha(1) is called and i=1 overwrites it, and so on until haha(9) writes i=9. At that point all blocks have entered their #10ns wait, and nothing touches i during that time. When they all wake up at T=10ns, they read deterministically 9. It’s not a random race condition outcome — it’s the last writer winning in a predictable order.

You touched on this but didn’t fully explain it. The reason it works correctly without a delay isn’t luck — it’s this: even with a static task, when there’s no delay, the assignment to i and the $display happen “within the same delta cycle”, uninterrupted.

Nothing can slip in between the assignment and the display call. The moment you add a delay, a 10ns window opens between those two operations, and every other call overwrites i during that window.
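To make that window visible, here is a small sketch that models the static task's argument as an ordinary module-level variable. The names tb_model and shared_i are hypothetical stand-ins introduced only for illustration; they are not something the simulator actually creates.

```systemverilog
module tb_model;
  // Hypothetical stand-in for the single storage location that a
  // static task's argument i occupies.
  int shared_i;

  initial begin
    shared_i = 0;                    // like the argument copy-in of haha(0)
    #10ns;                           // process suspends; others may write shared_i
    $display("haha_%0d", shared_i);  // reads whatever was written last
  end

  initial begin
    shared_i = 1;                    // like the argument copy-in of haha(1)
    #10ns;
    $display("haha_%0d", shared_i);
  end
endmodule
```

Both writes happen at 0 ns and both reads happen at 10 ns, so the two lines print the same value: whichever process wrote last. Which one that is depends on the order the simulator picks for the two initial blocks.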

So your core diagnosis (static task, shared i, race condition) is spot on. The only thing worth refining is that “any number from 0 to 9” almost always manifests as 9 in practice, and the reason the no-delay case works correctly is slightly different from what you described.

Technically, this is not correct. There’s no assurance that the order of elaboration leads to the order of execution. In fact, some implementations have opted for the reverse order in some cases. As more people experiment with multicore execution, it might seem like other statements in one process are inserted between the statements of another process.

There are two key points here:

1. Regarding the execution order of initial blocks: There is no doubt that the order in which parallel initial blocks enter the Active Region is indeterminate.

2. Regarding the execution order of statements within an initial block: As Dave pointed out, there is no guarantee that once an initial block begins execution, the statements within it will execute atomically until they are blocked.

We may have observed seemingly consistent results no matter how many times we run the simulation, but this might simply be because our simulators use the same strategy for handling race conditions. However, as Dave noted, different simulators may produce different results.
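If the goal is output that does not depend on how a simulator orders concurrent processes, one option is to remove the concurrency altogether. The sketch below is an assumption about what the original poster might want, not a drop-in replacement: tb_ordered is a hypothetical module name, and the timing differs from the generate version because the calls now run one after another.

```systemverilog
module tb_ordered;
  // A single process issues the calls in order, so there is
  // no race between processes to resolve.
  initial begin
    for (int i = 0; i < 10; i++) begin
      haha(i);  // each call returns before the next one starts
    end
  end

  task automatic haha(int i);
    #10ns;
    $display("haha_%0d", i);
  endtask
endmodule
```

Here the values are printed in order from 0 to 9, one every 10 ns, rather than all at the same simulation time.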

Thanks Dave.