I think we need more details. Does this frontdoor sequence consume time, or does it execute in zero time?
How about adding some log messages to those threads, so we can see when each call starts and finishes? Similar start/end messages in your frontdoor sequence would help too.
Most revealing would be to see the order in which those m_atomic statements are executed in each thread, relative to one another. That is probably easiest to do from the debugger.
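For the log messages, something along these lines might do. This is only a rough sketch: the names `fd_debug_pkg`, `logged_frontdoor` and `logged_write` are made up for illustration, and the comment in `body()` stands in for whatever bus traffic your real frontdoor generates.

```systemverilog
package fd_debug_pkg;
  import uvm_pkg::*;
  `include "uvm_macros.svh"

  // Hypothetical frontdoor with start/end messages around the access.
  class logged_frontdoor extends uvm_reg_frontdoor;
    `uvm_object_utils(logged_frontdoor)

    function new(string name = "logged_frontdoor");
      super.new(name);
    endfunction

    virtual task body();
      `uvm_info("FD", $sformatf("start @%0t: %s", $time,
                                rw_info.convert2string()), UVM_LOW)
      // ... your existing frontdoor bus access goes here ...
      `uvm_info("FD", $sformatf("end   @%0t", $time), UVM_LOW)
    endtask
  endclass

  // Hypothetical wrapper to call from each thread instead of rg.write().
  // 'tag' identifies the calling thread in the log.
  task automatic logged_write(uvm_reg rg, uvm_reg_data_t value, string tag);
    uvm_status_e status;
    `uvm_info(tag, $sformatf("write start @%0t: %s", $time,
                             rg.get_full_name()), UVM_LOW)
    rg.write(status, value);
    `uvm_info(tag, $sformatf("write done  @%0t: status=%s", $time,
                             status.name()), UVM_LOW)
  endtask
endpackage
```

If the start and end messages from the frontdoor carry the same timestamp, the sequence is executing in zero time. The interleaving of the per-thread messages should show which call reaches the frontdoor first.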