In reply to AnPham:
As an HDL, Verilog was designed as if its code could run in parallel, even though it was implemented as a single-threaded program. This allows much more flexibility in optimizing the code, as well as an eventual parallel implementation.
But a semaphore also provides FIFO access to shared variables for threads. It keeps a queue of waiting threads, so each thread waiting for access gets it in the order it asked.
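
Here is a minimal sketch of that queueing behaviour, assuming a SystemVerilog simulator. The module name, the worker task, and the shared_count variable are made up for illustration. The requests are staggered by one time unit so the arrival order is well defined; the semaphore only preserves the order in which processes arrive at its queue, so the keys should be handed out to workers 0, 1, 2 in that order.

```systemverilog
module sem_fifo_demo;
  // One key: only one process can hold the semaphore at a time.
  semaphore sem = new(1);
  int shared_count = 0;  // hypothetical shared variable

  task automatic worker(int id);
    sem.get(1);          // blocks until a key is available
    shared_count++;      // safe: only the key holder gets here
    $display("[%0t] worker %0d has the key, shared_count=%0d",
             $time, id, shared_count);
    #10;                 // hold the key for a while
    sem.put(1);          // return the key, waking the next waiter
  endtask

  initial begin
    fork
      worker(0);         // requests the key at time 0
      #1 worker(1);      // requests at time 1, has to wait
      #2 worker(2);      // requests at time 2, waits behind worker 1
    join
    $finish;
  end
endmodule
```

If all three workers called get() in the same time step, the grant order would depend on how the simulator scheduled them, not on the semaphore.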