Avoiding program blocks

Hi,
I have read here that there is no need to use program blocks, as the clocking block alone can take care of the races that the program block is aimed at.
If I understand correctly, this refers to the fact that the default timing of a clocking block is to sample inputs with a skew of #1step and drive outputs with a skew of #0, which means that the inputs are sampled in the Preponed region while the outputs are driven in the Re-NBA region. On the other hand, since signal evaluation in all modules (TB and DUT) is basically done in the Active region, there can be no race between when the signals are driven and when they are evaluated. Is this understanding correct, or is it missing some basic point?
I also have another question. If we’re supposed to solve the design/testbench races by clocking blocks, this means that all modules should contain clocking blocks for their ports. Doesn’t this lead to more code verbosity, compared to the simplicity of using program blocks?
Thank you

In reply to Farhad:

Using clocking blocks is the most robust way of avoiding race conditions between the testbench and the design. They can certainly help when dealing with mixed gate-level timing and RTL design styles in ways that program blocks cannot.
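
As a rough illustration of the clocking block approach, here is a minimal sketch. The interface, module, and signal names (bus_if, tb_driver, req, data, ack) are hypothetical and only serve the example, and the DUT connection is omitted. The skews written out are the defaults, shown explicitly for clarity.

```systemverilog
// Hypothetical interface; all names here are illustrative assumptions.
interface bus_if (input logic clk);
  logic       req;
  logic [7:0] data;
  logic       ack;

  // Testbench-side view of the bus.
  // Inputs are sampled with #1step skew (the value just before the edge),
  // outputs are driven with #0 skew (after the design has evaluated).
  clocking cb @(posedge clk);
    default input #1step output #0;
    input  ack;
    output req, data;
  endclocking
endinterface

// Plain module used as the testbench driver; no program block needed.
module tb_driver (bus_if bus);
  initial begin
    @(bus.cb);                // wait for the clocking event
    bus.cb.req  <= 1'b1;      // synchronous drive through the clocking block
    bus.cb.data <= 8'hA5;
    wait (bus.cb.ack);        // waits on the sampled value of ack
    @(bus.cb);
    bus.cb.req  <= 1'b0;
  end
endmodule
```

Note that only the testbench side needs the clocking block here; the design itself stays ordinary RTL.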

But there are simpler ways of avoiding race conditions that RTL designers use all the time between their blocks, namely non-blocking assignments. Another methodology is to shift the stimulus away from the active edge of the clock, for example by driving on the opposite edge or by delaying it.
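
Both of these simpler options can be sketched in a few lines. This is only an illustration under assumed names (tb, d_nba, d_neg, q_nba, q_neg), with two plain flops standing in for the DUT.

```systemverilog
module tb;
  logic clk = 1'b0;
  logic d_nba, d_neg;   // stimulus signals (hypothetical names)
  logic q_nba, q_neg;   // flop outputs standing in for a DUT

  always #5 clk = ~clk;

  // Stand-in for the design: two flops clocked on the rising edge.
  always_ff @(posedge clk) begin
    q_nba <= d_nba;
    q_neg <= d_neg;
  end

  // Option 1: drive on the active edge with non-blocking assignments.
  // The update to d_nba happens in the NBA region, after the flop has
  // already read the old value in the Active region.
  initial begin
    d_nba <= 1'b0;
    @(posedge clk);
    d_nba <= 1'b1;
  end

  // Option 2: drive on the opposite edge, so the stimulus is stable
  // well before the next rising edge, even with a blocking assignment.
  initial begin
    d_neg = 1'b0;
    @(negedge clk);
    d_neg = 1'b1;
  end

  initial #40 $finish;
endmodule
```

Had option 1 used a blocking assignment (d_nba = 1'b1) at the rising edge, the order of the flop's read and the testbench's write would be up to the simulator, which is exactly the race being avoided.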