Fork join_none in for loop with multiple threads sharing semaphore

I have three questions about the code below:

**Q1:** Why do the three fork … join_none threads always run in the same order (get key, a=2 → get key, a=1 → get key, a=0)? How does the for loop expand into three parts inside the fork?

**Q2:** Is there a way to vary the order in which the three threads get the semaphore, so that all 6 cases (3 × 2 × 1 = 6) are covered?

**Q3:** In the log at time 0, one thread (a=2) got the semaphore, so why are a=0 and a=1 still printed? My understanding is that once one thread (a=2) holds the semaphore, the other threads should not start, so a=1 and a=0 should not be printed by the other two threads.

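program tb;
   semaphore key = new(1);   // one key: only one thread may hold it at a time

   initial begin
      for (int i = 0; i < 3; i++) begin
         fork
            automatic int l = i;   // per-iteration copy of the loop index
            process_it(l);
         join_none
      end
      wait fork;                   // block until all three spawned threads finish
      #10 $finish;
   end

   task automatic process_it(int a);
      $display($time,,"a=%0d", a); // runs before the thread can block on the key
      key.get(1);                  // blocks here if the key is already taken
      $display($time,,"Process_it: get key, a=%0d", a);
      #2;
      $display($time,,"Process_it: put key back, a=%0d", a);
      key.put(1);
   endtask
endprogram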

log file:

ncsim> run
0 a=2
0 Process_it: get key, a=2
0 a=1
0 a=0

2 Process_it: put key back, a=2
2 Process_it: get key, a=1
4 Process_it: put key back, a=1
4 Process_it: get key, a=0
6 Process_it: put key back, a=0
Simulation complete via $finish(1) at time 16 NS + 1

Hi mlsxdx,
In VCS I get the following output during simulation:

               0 a=0
               0 Process_it: get key, a=0
               0 a=1
               0 a=2
               2 Process_it: put key back, a=0
               2 Process_it: get key, a=1
               4 Process_it: put key back, a=1
               4 Process_it: get key, a=2
               6 Process_it: put key back, a=2

In reply to mlsxdx:

A1: When you have parallel threads, you cannot depend on their order of execution; it is not deterministic. Each fork … join_none in the loop spawns one process, and none of them starts running until the parent thread blocks at wait fork, so all three are ready at time 0 and the simulator is free to pick any order. In practice, the same version of the same tool will keep choosing the same order, which keeps results stable during debugging.
A2: You cannot choose the thread ordering unless you write code, such as an arbiter, that controls it.
A3: a=1 and a=0 are printed because the other threads do not block until they reach the key.get(1) statement, and the first $display comes before that.
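
As an illustration of A2, here is a minimal sketch of one way such an arbiter could look. It is not from the original post: the names grant_order and grant_idx are hypothetical, and it assumes it is acceptable for each thread to wait for its turn before requesting the key. The array is shuffled once, and each thread blocks until its ID comes up, so the order in which the key is taken follows the shuffled array rather than the simulator's scheduling order.

```systemverilog
program tb;
   semaphore key = new(1);

   // Hypothetical arbiter state (an assumption, not part of the original code):
   // grant_order[] lists thread IDs in the order they may take the key,
   // grant_idx is the position of the thread whose turn it currently is.
   int grant_order[3] = '{0, 1, 2};
   int grant_idx = 0;

   initial begin
      grant_order.shuffle();   // pick one of the 3 x 2 x 1 = 6 permutations
      $display("grant order: %0d %0d %0d",
               grant_order[0], grant_order[1], grant_order[2]);
      for (int i = 0; i < 3; i++) begin
         fork
            automatic int l = i;
            process_it(l);
         join_none
      end
      wait fork;
      #10 $finish;
   end

   task automatic process_it(int a);
      $display($time,,"a=%0d", a);
      wait (grant_order[grant_idx] == a);   // block until it is this thread's turn
      key.get(1);
      $display($time,,"Process_it: get key, a=%0d", a);
      #2;
      $display($time,,"Process_it: put key back, a=%0d", a);
      key.put(1);
      grant_idx++;                          // hand the turn to the next thread
   endtask
endprogram
```

The permutation chosen by shuffle() depends on the random seed, so covering all six cases means running with different seeds; how the seed is set is simulator-specific. The order of the initial a=… messages at time 0 is still up to the simulator, since those $display calls run before any thread waits.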