memory_order

Defined in header `<stdatomic.h>`
enum memory_order { memory_order_relaxed, memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel, memory_order_seq_cst };		(since C11)

memory_order specifies how regular, non-atomic memory accesses are to be ordered around an atomic operation. Absent any constraints on a multi-core system, when multiple threads simultaneously read and write to several variables, one thread can observe the values change in an order different from the order another thread wrote them. Indeed, the apparent order of changes can even differ among multiple reader threads. Some similar effects can occur even on uniprocessor systems due to compiler transformations allowed by the memory model.

The default behavior of all atomic operations in the language and the library provides for sequentially consistent ordering (see discussion below). That default can hurt performance, but the library's atomic operations can be given an additional memory_order argument to specify the exact constraints, beyond atomicity, that the compiler and processor must enforce for that operation.

Defined in header `<atomic>`
Value	Explanation
`memory_order_relaxed`	Relaxed operation: there are no synchronization or ordering constraints, only atomicity is required of this operation.
`memory_order_consume`	A load operation with this memory order performs a consume operation on the affected memory location: no reads in the current thread dependent on the value currently loaded can be reordered before this load. This ensures that writes to data-dependent variables in other threads that release the same atomic variable are visible in the current thread. On most platforms, this affects compiler optimizations only.
`memory_order_acquire`	A load operation with this memory order performs the acquire operation on the affected memory location: no memory accesses in the current thread can be reordered before this load. This ensures that all writes in other threads that release the same atomic variable are visible in the current thread.
`memory_order_release`	A store operation with this memory order performs the release operation: no memory accesses in the current thread can be reordered after this store. This ensures that all writes in the current thread are visible in other threads that acquire the same atomic variable and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic.
`memory_order_acq_rel`	A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory accesses in the current thread can be reordered before this load, and no memory accesses in the current thread can be reordered after this store. It is ensured that all writes in other threads that release the same atomic variable are visible before the modification and the modification is visible in other threads that acquire the same atomic variable.
`memory_order_seq_cst`	Any operation with this memory order is both an acquire operation and a release operation, plus a single total order exists in which all threads observe all modifications (see below) in the same order.

[edit] Relaxed ordering

Atomic operations tagged memory_order_relaxed are not synchronization operations, they do not order memory. They only guarantee atomicity and modification order consistency.

For example, with x and y initially zero,

// Thread 1:
r1 = atomic_load_explicit(y, memory_order_relaxed); // A
atomic_store_explicit(x, r1, memory_order_relaxed); // B
// Thread 2:
r2 = atomic_load_explicit(x, memory_order_relaxed); // C
atomic_store_explicit(y, 42, memory_order_relaxed); // D

is allowed to produce r1 == r2 == 42 because, although A is sequenced-before B within thread 1 and C is sequenced before D within thread 2, nothing prevents D from appearing before A in the modification order of y, and B from appearing before C in the modification order of x. The side-effect of D on y could be visible to the load A in Thread 1 while the side effect of B on x could be visible to the load C in Thread 2.

Typical use for relaxed memory ordering is incrementing counters, such as the reference counters , since this only requires atomicity, but not ordering or synchronization (note that decrementing the shared_ptr counters requires acquire-release synchronization with the destructor)

[edit] Release-Consume ordering

If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_consume, all memory writes (non-atomic and relaxed atomic) that are dependency-ordered-before the atomic store from the point of view of thread A, become visible side-effects within those operations in thread B into which the load operation carries dependency, that is, once the atomic load is completed, those operators and functions in thread B that use the value obtained from the load are guaranteed to see what thread A wrote to memory.

The synchronization is established only between the threads releasing and consuming the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.

On all mainstream CPUs other than DEC Alpha, dependency ordering is automatic, no additional CPU instructions are issued for this synchronization mode, only certain compiler optimizations are affected (e.g. the compiler is prohibited from performing speculative loads on the objects that are involved in the dependency chain).

Typical use cases for this ordering involve read access to rarely written concurrent data structures (routing tables, configuration, security policies, firewall rules, etc) and publisher-subscriber situations with pointer-mediated publication, that is, when the producer publishes a pointer through which the consumer can access information: there is no need to make everything else the producer wrote to memory visible to the consumer (which may be an expensive operation on weakly-ordered architectures). An example of such scenario is rcu_dereference.

Note that currently (2/2015) no known production compilers track dependency chains: consume operations are lifted to acquire operations.

[edit] Release sequence

If some atomic is store-released and several other threads perform read-modify-write operations on that atomic, a "release sequence" is formed: all threads that perform the read-modify-writes to the same atomic synchronize with the first thread and each other even if they have no memory_order_release semantics. This makes single producer - multiple consumers situations possible without imposing unnecessary synchronization between individual consumer threads.

[edit] Release-Acquire ordering

If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B, that is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.

The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.

On strongly-ordered systems (x86, SPARC TSO, IBM mainframe), release-acquire ordering is automatic for the majority of operations. No additional CPU instructions are issued for this synchronization mode, only certain compiler optimizations are affected (e.g. the compiler is prohibited from moving non-atomic stores past the atomic store-release or perform non-atomic loads earlier than the atomic load-acquire). On weakly-ordered systems (ARM, Itanium, PowerPC), special CPU load or memory fence instructions have to be used.

Mutual exclusion locks (such as mutexes or atomic spinlock) are an example of release-acquire synchronization: when the lock is released by thread A and acquired by thread B, everything that took place in the critical section (before the release) in the context of thread A has to be visible to thread B (after the acquire) which is executing the same critical section.

[edit] Sequentially-consistent ordering

Atomic operations tagged memory_order_seq_cst not only order memory the same way as release/acquire ordering (everything that happened-before a store in one thread becomes a visible side effect in the thread that did a load), but also establish a single total modification order of all atomic operations that are so tagged.

Formally,

Each memory_order_seq_cst operation B that loads from atomic variable M, observes one of the following:

the result of the last operation A that modified M, which appears before B in the single total order
OR, if there was such an A, B may observe the result of some modification on M that is not memory_order_seq_cst and does not happen-before A
OR, if there wasn't such an A, B may observe the result of some unrelated modification of M that is not memory_order_seq_cst

If there was a memory_order_seq_cst std::atomic_thread_fence operation X sequenced-before B, then B observes one of the following:

the last memory_order_seq_cst modification of M that appears before X in the single total order
some unrelated modification of M that appears later in M's modification order

For a pair of atomic operations on M called A and B, where A writes and B reads M's value, if there are two memory_order_seq_cst std::atomic_thread_fences X and Y, and if A is sequenced-before X, Y is sequenced-before B, and X appears before Y in the Single Total Order, then B observes either:

the effect of A
some unrelated modification of M that appears after A in M's modification order

For a pair of atomic modifications of M called A and B, B occurs after A in M's modification order if

there is a memory_order_seq_cst std::atomic_thread_fence X such that A is sequenced-before X and X appears before B in the Single Total Order
or, there is a memory_order_seq_cst std::atomic_thread_fence Y such that Y is sequenced-before B and A appears before Y in the Single Total Order
or, there are memory_order_seq_cst std::atomic_thread_fences X and Y such that A is sequenced-before X, Y is sequenced-before B, and X appears before Y in the Single Total Order.

Note that the means that:

1) as soon as atomic operations that are not tagged memory_order_seq_cst enter the picture, the sequential consistency is lost

2) the sequentially-consistent fences are only establishing total ordering for the fences themselves, not for the atomic operations in the general case (sequenced-before is not a cross-thread relationship, unlike happens-before)

Sequential ordering may be necessary for multiple producer-multiple consumer situations where all consumers must observe the actions of all producers occurring in the same order.

Total sequential ordering requires a full memory fence CPU instruction on all multi-core systems. This may become a performance bottleneck since it forces the affected memory accesses to propagate to every core.

[edit] Relationship with volatile

Within a thread of execution, accesses (reads and writes) to all volatile objects are guaranteed to not be reordered relative to each other, but this order is not guaranteed to be observed by another thread, since volatile access does not establish inter-thread synchronization.

In addition, volatile accesses are not atomic (concurrent read and write is a data race) and do not order memory (non-volatile memory accesses may be freely reordered around the volatile access).

One notable exception is Visual Studio, where, with default settings, every volatile write has release semantics and every volatile read has acquire semantics (MSDN), and thus volatiles may be used for inter-thread synchronization. Standard volatile semantics are not applicable to multithreaded programming, although they are sufficient for e.g. communication with a signal handler.

[edit] Examples

[edit] References

C11 standard (ISO/IEC 9899:2011):

7.17.1/4 memory_order (p: 273)

7.17.3 Order and consistency (p: 275-277)

[edit] See also

C++ documentation for memory order

[edit] External links

MOESI protocol

x86-TSO: A Rigorous and Usable Programmer’s Model for x86 Multiprocessors P. Sewell et. al., 2010
A Tutorial Introduction to the ARM and POWER Relaxed Memory Models P. Sewell et al, 2012
MESIF: A Two-Hop Cache Coherency Protocol for Point-to-Point Interconnects J.R. Goodman, H.H.J. Hum, 2009

Language
headers
Type support
Dynamic memory management
Error handling
Program utilities
Variadic function support
Date and time utilities
Strings library
Algorithms
Numerics
Input/output support
Localization support
Thread support (C11)
Atomic operations (C11)
Technical Specifications

Types
memory_order
atomic_flag
Macros
ATOMIC_***_LOCK_FREE
ATOMIC_FLAG_INIT
ATOMIC_VAR_INIT
kill_dependency
Functions
atomic_flag_test_and_set
atomic_flag_clear
atomic_init
atomic_is_lock_free
atomic_store
atomic_load
atomic_exchange
atomic_compare_exchange
atomic_fetch_add
atomic_fetch_sub
atomic_fetch_or
atomic_fetch_xor
atomic_fetch_and
atomic_thread_fence
atomic_signal_fence