In reply to mseyunni:
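Just use fork/join instead of fork/join_none followed by wait fork in the inner loop. A wait fork inside the inner loop would also block on the outer join_none threads, because wait fork waits for every child process spawned by the current process, not just the ones from the most recent fork. With fork/join, each inner iteration finishes its own threads, and the single wait fork at the end of the outer iteration then waits only for the outer threads.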
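// thread_x, thread_outer_*, thread_inner_* are placeholder tasks;
// outer_cond and inner_cond stand in for your loop conditions.
while (outer_cond) begin
  thread_x();
  fork                      // outer threads keep running in the background
    thread_outer_1();
    thread_outer_2();
    thread_outer_3();
  join_none
  while (inner_cond) begin
    fork                    // blocks until all three inner threads finish
      thread_inner_1();
      thread_inner_2();
      thread_inner_3();
    join
  end
  wait fork;                // only the outer join_none threads remain to wait for
end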