Working of fork join_none

I don't understand why this code produces the output below. As I understand fork...join_none, it should create 5 parallel threads. The two statements inside the fork should also execute in parallel, so when $display executes, j should not yet have a value.

module tb;

integer i,j;

initial
  for(i=0;i<5;i++)
    fork
      j = i;
      $display("Value of j is %d at time=%d \n", j, $time);
    join_none

endmodule

Output:

Value of j is 5 at time=0
Value of j is 5 at time=0
Value of j is 5 at time=0
Value of j is 5 at time=0
Value of j is 5 at time=0

In reply to smukerji:

Two problems with your code. First, it creates 10 parallel threads. Each statement inside a fork becomes a thread. You have two statements in a loop that iterates 5 times, so 10 threads. You probably meant to surround those two statements with a begin/end block. Then the fork sees that as a single block statement.

But the second, bigger problem is that you have one set of variables i and j shared across all the threads. Even if you fixed the first problem, SystemVerilog won't begin execution of any fork'ed child thread until the parent thread blocks or, in this case, the parent thread completes. By then the loop has finished and i is 5, so the value of i and j will be 5 for all threads.
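To make the two points above concrete, here is a minimal sketch (not from the original post) that applies only the begin/end fix and keeps i and j as shared static variables. The extra "parent" $display is an addition for illustration. Going by the scheduling rule described above, the parent line should print first, and the five children should then all report 5.

```systemverilog
module tb;

  integer i, j;  // one shared copy of each, visible to every thread

  initial begin
    for (i = 0; i < 5; i++)
      fork
        begin  // begin/end makes this ONE child thread per iteration (5 total)
          j = i;  // runs only after the parent blocks or finishes, when i is already 5
          $display("Value of j is %0d at time=%0d", j, $time);
        end
      join_none
    // Illustration only: the parent is still running here; no child has started yet.
    $display("parent: loop finished, i=%0d", i);
  end

endmodule
```

So begin/end reduces the thread count from 10 to 5, but it does not change which value the children see.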

What you need to do is change j to be an automatic variable so there is one copy per thread.

module tb;

initial
  for(int i=0;i<5;i++) begin
     automatic int j = i; // a new j for each entry into this block
     fork
       $display("Value of j is %d at time=%d \n", j, $time); 
     join_none
  end
endmodule
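A commonly used variant of the same idea, sketched here as an alternative rather than as part of the original reply, declares the automatic variable inside the fork itself. Declarations in a fork block are initialized by the parent when the fork is entered, before any child is spawned, so each child still gets its own j. Each value from 0 to 4 should appear once; the order in which the children run is not something to rely on.

```systemverilog
module tb;

  initial
    for (int i = 0; i < 5; i++)
      fork
        // Declaration, not a thread: initialized by the parent on each entry to the fork
        automatic int j = i;
        // The only statement in the fork, so one child thread per iteration
        $display("Value of j is %0d at time=%0d", j, $time);
      join_none

endmodule
```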

In reply to dave_59:

Thanks. So, as I understand it, this creates 5 parallel threads for the $display statements and 5 parallel threads for j=i, all running at the same simulation time. Why is only the last value (5) displayed and not the others? Is this a kind of race condition?

My other question is this: j gets the value of i at the same time the $display statement executes, so ideally $display should show j as x. To simplify, here is a smaller example.

module tb;
integer i,j;

initial
  fork
    i=5;
    j=i;
    $display("value of j is %d", j);
  join_none
endmodule

Here, i=5, j=i and $display all execute concurrently, not sequentially, yet the output shows "value of j is 5". All the statements execute in the active region. I would have expected j to be 5 only if the statements were enclosed in begin...end. Why does fork...join_none behave this way?
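For comparison, a minimal sketch of the begin...end form mentioned above. With three separate statements in the fork, the three threads are all scheduled in the same time step and the language does not fix their relative order, so seeing 5 there likely just reflects the order this simulator happened to choose. Wrapping the statements in begin...end turns them into a single thread whose statements run in sequence, so 5 is the only possible result.

```systemverilog
module tb;

  integer i, j;  // 4-state variables, so both start as x

  initial
    fork
      begin  // a single child thread; its statements execute in source order
        i = 5;
        j = i;
        $display("value of j is %0d", j);  // j is always 5 here
      end
    join_none

endmodule
```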

In reply to dave_59:

When I ran your code with the automatic j on EDA Playground with VCS, I got:
Value of j is 4 at time= 0

Value of j is 4 at time= 0

Value of j is 4 at time= 0

Value of j is 4 at time= 0

Value of j is 4 at time= 0

In reply to smukerji:
The behavior of that version of VCS is incorrect. It is not creating a new allocation of j for each iteration. Try a different simulator.

In reply to smukerji:

Hi,

Replying to the first code example. (Dave, please correct me if my understanding of this part is wrong.)


module tb;

integer i,j;

initial
for(i=0;i<5;i++)
fork
j = i; 
$display("Value of j is %d at time=%d \n", j, $time); 
join_none

endmodule 

If you think of the same block as digital logic and draw it as a diagram (using gates), you will see that j=i is a plain assignment. Since there is no DFF (no clk), the simulator will collapse the j signal and remap j to i, since eventually j and i are the same. So every time, we will get the value of j as i. The simulator will create 5 separate blocks at the start, each containing only a print block with i/j (both the same) as input, all running in parallel (forked).