Write code to execute task A in parallel 10 times, and call task B only after task A has finished executing 10 times

Hello, I am writing code to do the following:
Execute task A in parallel 10 times, and call task B only after task A has finished executing 10 times.
The way I thought of going about this is to have a for loop, with each iteration executing in parallel. But I do not get the expected output.

 
module fork_join;
  
  event a;
  task A(input int i);
    $display("Task A. Time = %0t, i = %0d", $time, i);
  endtask
  
  task B(i);
    $display("Task B. Time = %0t, i = %0d", $time, i);
  endtask
  
  initial begin
    for(int j = 0; j < 10; j++) begin
      fork
        A(j);
      join_none
    end
    ->a;
    
    @(a);
    B(2);
  end
    
endmodule

Now when I run this code, I get the following output:

Task A. Time = 0, i = 10

repeated 10 times. But what I should be seeing is the value of i having the values 0 to 9 with each iteration of the for loop executed in parallel. How can I go about this?
Moreover, I see time = 0 being displayed 10 times even without the fork join. I realize this is because they are under one initial block which executes at time 0. But how can I actually see whether they execute in parallel or not if time = 0 prints for all iterations with and without the fork?

In reply to vk7715:
Actually, I am puzzled. I wrote the modified model below.
What is weird is that the task is called twice, with j==0 and then j==1,
but it executes with the values 1 and 2.
I used breakpoints and step-into to verify this.
Why?

 
module m;
  // event a;
  bit [0:9] done;
  task automatic A(input int i);
    $display("Task A. Time = %0t, i = %0d", $time, i);
    done[i] = 1;
  endtask

  task automatic B(i);
    $display("Task B. Time = %0t, i = %0d", $time, i);
  endtask

  task automatic t();
    for (int j = 0; j <= 1; j++) begin
      #10; 
      fork
        A(j);
      join_none
    end
  endtask

  initial begin
    t();
    $display("sent t()");
    wait (done == 10'b00_0000_0011);
    $display("done=%b", done);

    // @(a);
    B(2);
  end

endmodule


Sim results
run -all
# Task A. Time = 10, i = 1
# sent t()
# Task A. Time = 20, i = 2
# exit

In reply to vk7715:

fork/join_none does not start any processes until the parent process suspends or terminates.



module fork_join;
  
  event a;
  task A(input int i);
    $display("Task A. Time = %0t, i = %0d", $time, i);
  endtask
  
  task B(i);
    $display("Task B. Time = %0t, i = %0d", $time, i);
  endtask
  
  initial begin
    for(int j = 0; j < 10; j++) begin
      fork
        A(j);
      join_none
      #0;
    end
    ->a;
    
    @(a);
    B(2);
  end
    
endmodule


In reply to rag123:
For some reason task B is not executed.
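A likely explanation, as a minimal sketch: in the code above, `->a` fires before the same process reaches `@(a)`, so the edge-sensitive wait never sees a trigger and blocks forever. The module name below is hypothetical, and `wait (a.triggered)` is swapped in only to illustrate the difference. It is level-sensitive and stays true for the rest of the time step in which the event was triggered.

```systemverilog
module event_race_sketch;  // hypothetical name, for illustration only
  event a;

  initial begin
    ->a;                 // the trigger fires here, before anyone is waiting on it
    wait (a.triggered);  // level-sensitive: still true for the rest of this time step
    // @(a);             // edge-sensitive: would block forever, the trigger has already passed
    $display("Task B would be called here. Time = %0t", $time);
  end
endmodule
```

This only addresses the missed event. It does not by itself guarantee that every forked call of task A has finished.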

This works


module m;
  // event a;
  bit [9:0] done;
  task automatic A(input int i);
    $display("Task A. Time = %0t, i = %0d", $time, i);
    done[i] = 1;
  endtask

  task automatic B(i);
    $display("Task B. Time = %0t, i = %0d", $time, i);
  endtask

  /* task automatic t();
    for (int j = 0; j <= 1; j++) begin
      #10; 
      fork
        A(j);
      join_none
    end
  endtask */ 
  task automatic t();
    for (int j = 0; j <= 9; j++) begin
      int k = j;
      // #10;
      fork
        begin
          $display("j %0d k %0d ",j,k);
          A(k);
        end
      join_none
    end
  endtask

  initial begin
    t();
    $display("sent t()");
    // wait (done == 10'b00_0000_0011);
    @ (done == 10'b11_1111_1111);
    $display("done=%b", done);

    // @(a);
    B(2);
  end

endmodule

In reply to ben@SystemVerilog.us:

Hi Ben, thanks for your answer. But how would I truly know if all the 10 iterations executed in parallel at the same time? I ask this because since the task is called inside an initial block which takes a simulation time of 0, wouldn’t all 10 iterations complete in time 0 irrespective of the fork join_none?

I also noticed that the output of the second code you sent is:

# sent t()
# j 10 k 9 
# Task A. Time = 0, i = 9
# j 10 k 8 
# Task A. Time = 0, i = 8
# j 10 k 7 
# Task A. Time = 0, i = 7
# j 10 k 6 
# Task A. Time = 0, i = 6
# j 10 k 5 
# Task A. Time = 0, i = 5
# j 10 k 4 
# Task A. Time = 0, i = 4
# j 10 k 3 
# Task A. Time = 0, i = 3
# j 10 k 2 
# Task A. Time = 0, i = 2
# j 10 k 1 
# Task A. Time = 0, i = 1
# j 10 k 0 
# Task A. Time = 0, i = 0
# done=1111111111
# Task B. Time = 0, i = 0
# exit

Can you please explain why the values of i are 9 down to 0 when the loop is incrementing in an ascending fashion? Also, why is i = 0 when task B is called even though I pass the argument 2?

In reply to rag123:

Hi Rag, can you explain the purpose of the #0 after the join_none? If it's present, I see the task correctly printing values 0 to 9, but without it I see the value always being 10. What is the #0 doing?

In reply to vk7715:
On “Can you please explain why the values of i are 9 down to 0 when the loop is incrementing in an ascending fashion?”


 task automatic t();
    for (int j = 0; j <= 9; j++) begin
      int k = j;
      // #10;
      fork
        begin
          $display("j %0d k %0d ",j,k);
          A(k);
        end
      join_none
    end
  endtask

Per rag123’s response, “fork/join_none does not start any processes until the parent process suspends or terminates”. Thus, the A(k) calls are queued by the simulator, from 0 to 9, until t() ends and the initial block reaches the @ (done == 10'b11_1111_1111);
After that, the simulator processes the queued A(k) calls. I believe it is up to the tool to decide in what order it pulls jobs off the queue. It looks like this one did so in LIFO fashion.

On “But how would I truly know if all the 10 iterations executed in parallel at the same time? I ask this because since the task is called inside an initial block which takes a simulation time of 0, wouldn’t all 10 iterations complete in time 0 irrespective of the fork join_none?”
The tasks did complete at time 0. If you put a #3 just after the initial begin, the time would be 3.

On task B, the formal argument needs to be declared as an int. Without a type it defaults to a single bit, so the value 2 is truncated to 0.
I made the changes on EDA Playground:


module m;
  // event a;
  bit [9:0] done;
  task automatic A(input int i);
    $display("Task A. Time = %0t, i = %0d", $time, i);
    done[i] = 1;
  endtask

  task automatic B(int p);
   // int w; 
   // w=p; 
    $display("Task B. Time = %0t, p = %0d", $time, p);
  endtask

  /* task automatic t();
    for (int j = 0; j <= 1; j++) begin
      #10; 
      fork
        A(j);
      join_none
    end
  endtask */ 
  task automatic t();
    for (int j = 0; j <= 9; j++) begin
      int k = j;
      // #10;
      fork
        begin
          $display("j %0d k %0d ",j,k);
          A(k);
        end
      join_none
    end
  endtask

  initial begin
    #3; 
    t();
    $display("sent t()");
    // wait (done == 10'b00_0000_0011);
    @ (done == 10'b11_1111_1111);
    $display("done=%b", done);

    // @(a);
    B(2);
  end

endmodule
//
# sent t()
# j 10 k 9 
# Task A. Time = 3, i = 9
# j 10 k 8 
# Task A. Time = 3, i = 8
# j 10 k 7 
# Task A. Time = 3, i = 7
# j 10 k 6 
# Task A. Time = 3, i = 6
# j 10 k 5 
# Task A. Time = 3, i = 5
# j 10 k 4 
# Task A. Time = 3, i = 4
# j 10 k 3 
# Task A. Time = 3, i = 3
# j 10 k 2 
# Task A. Time = 3, i = 2
# j 10 k 1 
# Task A. Time = 3, i = 1
# j 10 k 0 
# Task A. Time = 3, i = 0
# done=1111111111
# Task B. Time = 3, p = 2
# exit 

In reply to ben@SystemVerilog.us:

I believe you can use wait fork instead of creating a vector. Just keep in mind that wait fork blocks the calling process until all of its immediate child subprocesses have completed, as described in LRM section 9.6.1, Wait fork statement.


   repeat(10) begin // this could be a for loop with the automatic k = i
      fork
        A(); //call A
      join_none
   end
   wait fork; // wait for all spawned tasks to complete
   B(); // execute B

For example

module m();
 
  task automatic A();
    int unsigned w; 
    w = $urandom_range(1, 20);
    #w;
    $display("Task A. w= %0d Time = %0t", w, $realtime);
  endtask
 
  task automatic B();
    int unsigned w; 
    w = $urandom_range(1, 10);
    #w;
    $display("Task B. w = %0d Time = %0t", w, $realtime);
  endtask
 

  task automatic t();
    repeat(10) begin
      fork
        A();
      join_none
    end
    wait fork;
    B();
  endtask
 
  initial begin
    t();
  end
 
endmodule

# KERNEL: Task A. w= 1 Time = 1
# KERNEL: Task A. w= 2 Time = 2
# KERNEL: Task A. w= 3 Time = 3
# KERNEL: Task A. w= 5 Time = 5
# KERNEL: Task A. w= 5 Time = 5
# KERNEL: Task A. w= 10 Time = 10
# KERNEL: Task A. w= 12 Time = 12
# KERNEL: Task A. w= 14 Time = 14
# KERNEL: Task A. w= 18 Time = 18
# KERNEL: Task A. w= 18 Time = 18
# KERNEL: Task B. w = 9 Time = 27
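
Combining wait fork with the per-iteration copy of the loop index discussed earlier in the thread gives roughly the following. This is a minimal sketch, not a definitive implementation: the module name and the random delay inside A are assumptions added for illustration, so that the overlap between the ten calls shows up in the printed times.

```systemverilog
module fork_wait_sketch;  // hypothetical name, for illustration only

  task automatic A(input int i);
    #($urandom_range(1, 20));  // assumed random delay, only to make the overlap visible
    $display("Task A. Time = %0t, i = %0d", $time, i);
  endtask

  task automatic B(input int p);
    $display("Task B. Time = %0t, p = %0d", $time, p);
  endtask

  initial begin
    for (int j = 0; j < 10; j++) begin
      automatic int k = j;  // each iteration gets its own copy of the index
      fork
        A(k);
      join_none
    end
    wait fork;  // blocks until all 10 child processes have completed
    B(2);       // runs only after every A(k) has finished
  end
endmodule
```

Because the delays are random, the order of the Task A lines varies from run to run, but Task B should always print last.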

If you need finer control over each process, look at LRM section 9.7, Fine-grain process control.
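
As a rough sketch of what that can look like, each child can store its own handle with process::self() and the parent can then await() the handles one by one. The module name, the handle array p, and the random delay are assumptions for illustration. The wait for a non-null handle is there because the children do not start until the parent blocks.

```systemverilog
module process_ctrl_sketch;  // hypothetical name, for illustration only
  process p[10];             // one handle per spawned child

  task automatic A(input int i);
    #($urandom_range(1, 20));  // assumed random delay, for illustration
    $display("Task A. Time = %0t, i = %0d", $time, i);
  endtask

  initial begin
    for (int j = 0; j < 10; j++) begin
      automatic int k = j;  // per-iteration copy of the loop index
      fork
        begin
          p[k] = process::self();  // child records its own handle
          A(k);
        end
      join_none
    end
    foreach (p[i]) begin
      wait (p[i] != null);  // child has started and stored its handle
      p[i].await();         // block until that child finishes
    end
    $display("All calls of task A finished. Time = %0t", $time);
  end
endmodule
```

With the handles available, individual children could also be inspected with status() or stopped with kill(), which wait fork alone does not offer.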

HTH,
-R

In reply to rgarcia07:

Many thanks for your comments about the wait fork; I like it.

On a separate thought, I figured out why the “j” in the following code is 10 and why I need the “k”.


task automatic t();
    for (int j = 0; j <= 9; j++) begin
      int k = j;
      // #10;
      fork
        begin
          $display("j %0d k %0d ",j,k);
          A(j); // A(k);
        end
      join_none
    end
  endtask

It’s because “j” is a single variable shared by all the forked processes, and the A(j) task calls are only queued while the loop runs.
By the time they get processed, the loop has finished, so they use the current value of “j”, which is now 10.
One would have thought that each call would use the value “j” had in the loop iteration that
spawned it.

In reply to vk7715:

Try the following code. I added an automatic int i because, in your code, the combination of fork and for lets the loop run through to its last iteration before any of the forked tasks start.

module fork_join;
 
  event a;
  task A(input int i);
    $display("Task A. Time = %0t, i = %0d", $time, i);
  endtask
 
  task B(input int i); // typed formal; an untyped one defaults to a single bit
    $display("Task B. Time = %0t, i = %0d", $time, i);
  endtask
 
  initial begin
       for(int j = 0; j < 10; j++) begin
      automatic int i=j;
      fork
        A(i);
      join_none
    end
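    // "->a; @(a);" would block forever: the trigger fires before the wait.
    wait fork; // wait for all forked calls of A to complete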
    B(2);
  end
 
endmodule

In reply to vk7715:

Can you please explain why the values of i are 9 down to 0 when the loop is incrementing in an ascending fashion?

I think this depends on how the simulator schedules the processes.
I get different output for the same code with the Mentor and Synopsys tools.

Questa Sim-64, Version 2021.3 linux_x86_64 Jul 13 2021:
the values of i are 9 down to 0

Synopsys, Compiler version S-2021.09; Runtime version S-2021.09; Oct 31 08:58 2022:
the values of i are 0 to 9
