How to fork-join threads spawned by a looped fork/join_none?

I would like to fork-join a set of parallel threads that are spawned by fork/join_none inside a loop.
With the code below, I got the following results:

Round 1 result: the fork/join completed only after all the threads had completed.
Round 2 result: the fork/join completes immediately!

My goal: to control the number of threads being spawned using a parameter, and to ensure that all the threads complete before the next piece of logic executes.
Please help me achieve the Round 1 result with the Round 2 setup.

Feel free to suggest any other method to achieve the same result.

module top();
  parameter NUM_OF_IDS = 8;
  
  initial begin
    $timeformat(-9, 0, " ns", 10);
    //---------------- 
    // ROUND - 1
    //----------------
    /*
    $display("%t - Round 1\n",$time);
    fork
       nb_rand_delay(1);
       nb_rand_delay(2);
       nb_rand_delay(3);
       nb_rand_delay(4);
       nb_rand_delay(5);
       nb_rand_delay(6);
    join
    */


    //----------------
    // ROUND - 2
    //----------------
    $display("\n%t - Round 2\n",$realtime);
  
    fork
       repeat(NUM_OF_IDS)begin 
         int j;            // static by default here, so it keeps counting across iterations
         automatic int i;  // fresh copy for each iteration
         i = j++;
         fork nb_rand_delay(i); join_none
       end 
    join
  
    $display("\n%t - Completed All\n",$time);
  end
  
  // Task to create a random delay 
  task automatic nb_rand_delay(int ID);
    int rand_delay;
      begin
        std::randomize(rand_delay) with {rand_delay inside {[2:10]};};
        #(rand_delay * 1us); 
        $display("%t - Completed Thread ID : %0d, Rand_Delay: %0dus",$time,ID,rand_delay);
      end
  endtask
  
endmodule

Round - 1 Result :

#       0 ns - Round 1
#
#    2000 ns - Completed Thread ID : 1, Rand_Delay: 2us
#    3000 ns - Completed Thread ID : 5, Rand_Delay: 3us
#    3000 ns - Completed Thread ID : 6, Rand_Delay: 3us
#    5000 ns - Completed Thread ID : 4, Rand_Delay: 5us
#    6000 ns - Completed Thread ID : 2, Rand_Delay: 6us
#    8000 ns - Completed Thread ID : 3, Rand_Delay: 8us
#
#    8000 ns - Completed All // Completed at the end of last thread ID : #3

Round - 2 Result (with the Round - 1 code commented out):

#       0 ns - Round 2
#
#
#       0 ns - Completed All // <= Concludes prematurely 
#
#    2000 ns - Completed Thread ID : 1, Rand_Delay: 2us
#    2000 ns - Completed Thread ID : 0, Rand_Delay: 2us
#    4000 ns - Completed Thread ID : 2, Rand_Delay: 4us
#    6000 ns - Completed Thread ID : 6, Rand_Delay: 6us
#    6000 ns - Completed Thread ID : 3, Rand_Delay: 6us
#    9000 ns - Completed Thread ID : 7, Rand_Delay: 9us
#    9000 ns - Completed Thread ID : 4, Rand_Delay: 9us
#   10000 ns - Completed Thread ID : 5, Rand_Delay: 10us

In reply to rshrig:

    //----------------
    // ROUND - 2
    //----------------
    $display("\n%t - Round 2\n",$realtime);
    begin
      static int j = 0;  // declarations must come before statements in a block
      repeat(NUM_OF_IDS) begin
        automatic int i;
        i = j++;
        fork nb_rand_delay(i); join_none
      end
    end
    wait fork;
    $display("\n%t - Completed All\n",$time);

wait fork blocks until all immediate child processes of the current process have completed.
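
As a self-contained illustration of that behaviour, here is a minimal sketch. The module name wait_fork_demo, the constant N, and the fixed delays are assumptions made for this example and are not from the original post. Each loop iteration spawns one thread with join_none; because those threads are immediate children of the initial process, the wait fork should not return until the slowest one has finished.

```systemverilog
module wait_fork_demo;              // hypothetical module name
  timeunit 1ns; timeprecision 1ns;
  localparam int N = 4;             // hypothetical thread count

  initial begin
    for (int k = 0; k < N; k++) begin
      automatic int id = k;         // per-iteration copy seen by the spawned thread
      fork
        begin
          #((id + 1) * 1ns);        // fixed delay, assumed for illustration
          $display("%0t: thread %0d done", $time, id);
        end
      join_none                     // spawn and continue without waiting
    end
    wait fork;                      // blocks until all N immediate children finish
    $display("%0t: all threads done", $time);
  end
endmodule
```

The "all threads done" message is printed only after the last "thread ... done" message.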

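Hi Dave, along with wait fork, one more change is needed in his code: the guarding fork/join that he wrapped around the repeat loop has to be removed. Otherwise the loop runs in its own child process, the spawned threads are no longer immediate children of the initial block, and we still get the same premature "Completed All" print that he pointed out. The final code would look like this: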

    $display("\n%t - Round 2\n",$realtime);

    // fork   // avoid using a guarding fork here
    repeat(NUM_OF_IDS) begin
      int j;
      automatic int i;
      i = j++;
      fork nb_rand_delay(i); join_none
    end
    // join
    wait fork;
    $display("\n%t - Completed All\n",$time);
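
If an enclosing fork/join is still wanted, one commonly used idiom is to move the wait fork inside it, within a begin/end block. The block then runs as its own process, so the wait fork applies only to the threads spawned there and not to any other children the caller may already have running. The following is a sketch under assumptions: it reuses nb_rand_delay and NUM_OF_IDS from the question, while the task name run_all_ids and the block label isolation are hypothetical.

```systemverilog
module top;
  timeunit 1ns; timeprecision 1ns;
  parameter int NUM_OF_IDS = 8;

  // Hypothetical wrapper task: spawns NUM_OF_IDS threads and returns
  // only after all of them have completed.
  task automatic run_all_ids();
    fork
      begin : isolation             // own process, so wait fork sees only its children
        for (int k = 0; k < NUM_OF_IDS; k++) begin
          automatic int id = k;     // per-iteration copy for the spawned thread
          fork
            nb_rand_delay(id);
          join_none
        end
        wait fork;                  // waits for the threads spawned in this block
      end
    join
  endtask

  // Random-delay task from the question (return value of randomize discarded)
  task automatic nb_rand_delay(int ID);
    int rand_delay;
    void'(std::randomize(rand_delay) with { rand_delay inside {[2:10]}; });
    #(rand_delay * 1us);
    $display("%t - Completed Thread ID : %0d, Rand_Delay: %0dus", $time, ID, rand_delay);
  endtask

  initial begin
    $timeformat(-9, 0, " ns", 10);
    run_all_ids();
    $display("\n%t - Completed All\n", $time);
  end
endmodule
```

With this arrangement the "Completed All" message should appear after the last thread's message, as in Round 1, and the number of threads is still controlled by NUM_OF_IDS.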