How to fork-join looped fork join_none threads?

I would like to fork-join a set of parallel threads that are spawned in a loop with fork join_none.
With the code below, these were my results:

Round - 1 result: the fork join completed only after all the threads had completed.
Round - 2 result: the fork join completes immediately!

My goal: to control the number of threads spawned using a parameter, and to ensure all the threads complete before the next execution logic runs.
Please help me achieve the Round - 1 result with the Round - 2 setup.
Feel free to suggest any other method that achieves the same result.

module top();
  parameter NUM_OF_IDS = 8;
  
  initial begin
    $timeformat(-9, 0, " ns", 10);
    //---------------- 
    // ROUND - 1
    //----------------
    /*
    $display("%t - Round 1\n",$time);
    fork
       nb_rand_delay(1);
       nb_rand_delay(2);
       nb_rand_delay(3);
       nb_rand_delay(4);
       nb_rand_delay(5);
       nb_rand_delay(6);
    join
    */


    //----------------
    // ROUND - 2
    //----------------
    $display("\n%t - Round 2\n",$realtime);
  
    fork
       repeat(NUM_OF_IDS)begin 
                   int j; 
         automatic int i;
         i = j++;
         fork nb_rand_delay(i); join_none
       end 
    join
  
    $display("\n%t - Completed All\n",$time);
  end
  
  // Task to create a random delay 
  task automatic nb_rand_delay(int ID);
    int rand_delay;
      begin
        std::randomize(rand_delay) with {rand_delay inside {[2:10]};};
        #(rand_delay * 1us); 
        $display("%t - Completed Thread ID : %0d, Rand_Delay: %0dus",$time,ID,rand_delay);
      end
  endtask
  
endmodule

Round - 1 Result :

#       0 ns - Round 1
#
#    2000 ns - Completed Thread ID : 1, Rand_Delay: 2us
#    3000 ns - Completed Thread ID : 5, Rand_Delay: 3us
#    3000 ns - Completed Thread ID : 6, Rand_Delay: 3us
#    5000 ns - Completed Thread ID : 4, Rand_Delay: 5us
#    6000 ns - Completed Thread ID : 2, Rand_Delay: 6us
#    8000 ns - Completed Thread ID : 3, Rand_Delay: 8us
#
#    8000 ns - Completed All // Completed at the end of last thread ID : #3

Round - 2 Result (with the Round - 1 code commented out) :

#       0 ns - Round 2
#
#
#       0 ns - Completed All // <= Concludes prematurely 
#
#    2000 ns - Completed Thread ID : 1, Rand_Delay: 2us
#    2000 ns - Completed Thread ID : 0, Rand_Delay: 2us
#    4000 ns - Completed Thread ID : 2, Rand_Delay: 4us
#    6000 ns - Completed Thread ID : 6, Rand_Delay: 6us
#    6000 ns - Completed Thread ID : 3, Rand_Delay: 6us
#    9000 ns - Completed Thread ID : 7, Rand_Delay: 9us
#    9000 ns - Completed Thread ID : 4, Rand_Delay: 9us
#   10000 ns - Completed Thread ID : 5, Rand_Delay: 10us

In reply to rshrig:

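    //----------------
    // ROUND - 2
    //----------------
    // Declarations must precede statements, so j belongs at the top of
    // the enclosing begin block.
    int j = 0;
    $display("\n%t - Round 2\n",$realtime);
    repeat(NUM_OF_IDS) begin
      automatic int i;   // fresh copy per iteration, captured by the thread
      i = j++;
      fork nb_rand_delay(i); join_none
    end
    wait fork;           // blocks until all spawned threads have finished
    $display("\n%t - Completed All\n",$time);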

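wait fork blocks until all child processes spawned by the current process have completed. In the original Round - 2 code, the outer fork/join has a single branch, the repeat loop, and that branch finishes as soon as it has spawned the join_none threads. The join therefore returns at time 0 without waiting for them.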
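
For reference, here is a minimal sketch of the whole testbench with this fix applied. It is one possible arrangement, not the only one. It makes a few assumptions that are not in the original post: the spawn loop is wrapped in a fork begin ... end join guard so that wait fork only sees the threads created inside that block, a for loop with a per-iteration automatic copy (named id here) replaces the repeat/j++ counter, the timeunit declaration is added so the 1us literal scales predictably, and the return value of std::randomize is checked.

```systemverilog
module top;
  timeunit 1ns;                 // assumption: 1 ns time unit, matching $timeformat
  timeprecision 1ns;

  parameter int NUM_OF_IDS = 8;

  initial begin
    $timeformat(-9, 0, " ns", 10);
    $display("\n%t - Round 2\n", $time);

    // Guard block: the begin...end branch runs as its own process, so the
    // wait fork below only waits for threads spawned inside this block.
    fork
      begin
        for (int i = 0; i < NUM_OF_IDS; i++) begin
          automatic int id = i;   // per-iteration copy captured by the thread
          fork
            nb_rand_delay(id);
          join_none
        end
        wait fork;                // blocks until every spawned thread is done
      end
    join

    $display("\n%t - Completed All\n", $time);
  end

  // Task to create a random delay of 2 to 10 us
  task automatic nb_rand_delay(int id);
    int rand_delay;
    if (!std::randomize(rand_delay) with { rand_delay inside {[2:10]}; })
      $error("randomize failed for thread %0d", id);
    #(rand_delay * 1us);
    $display("%t - Completed Thread ID : %0d, Rand_Delay: %0dus",
             $time, id, rand_delay);
  endtask
endmodule
```

The guard block makes no difference in this small example, since the initial block spawns nothing else. It becomes useful once the same process has other background threads running that wait fork should not block on.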