Tree - rpms/qemu-kvm - CentOS Git server

cryptospore / rpms / qemu-kvm

Forked from rpms/qemu-kvm 2 years ago

Blame SOURCES/kvm-add-a-header-file-for-atomic-operations.patch


		0a122b	`From c5386144fbf09f628148101bc674e2421cdd16e3 Mon Sep 17 00:00:00 2001`
		0a122b	`Message-Id: <c5386144fbf09f628148101bc674e2421cdd16e3.1387382496.git.minovotn@redhat.com>`
		0a122b	`From: Nigel Croxon <ncroxon@redhat.com>`
		0a122b	`Date: Thu, 14 Nov 2013 22:52:37 +0100`
		0a122b	`Subject: [PATCH 01/46] add a header file for atomic operations`
		0a122b
		0a122b	`RH-Author: Nigel Croxon <ncroxon@redhat.com>`
		0a122b	`Message-id: <1384469598-13137-2-git-send-email-ncroxon@redhat.com>`
		0a122b	`Patchwork-id: 55686`
		0a122b	`O-Subject: [RHEL7.0 PATCH 01/42] add a header file for atomic operations`
		0a122b	`Bugzilla: 1011720`
		0a122b	`RH-Acked-by: Orit Wasserman <owasserm@redhat.com>`
		0a122b	`RH-Acked-by: Amit Shah <amit.shah@redhat.com>`
		0a122b	`RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>`
		0a122b
		0a122b	`Bugzilla: 1011720`
		0a122b	`https://bugzilla.redhat.com/show_bug.cgi?id=1011720`
		0a122b
		0a122b	`>From commit ID:`
		0a122b	`commit 5444e768ee1abe6e021bece19a9a932351f88c88`
		0a122b	`Author: Paolo Bonzini <pbonzini@redhat.com>`
		0a122b	`Date: Mon May 13 13:29:47 2013 +0200`
		0a122b
		0a122b	`add a header file for atomic operations`
		0a122b
		0a122b	`We're already using them in several places, but __sync builtins are just`
		0a122b	`too ugly to type, and do not provide seqcst load/store operations.`
		0a122b
		0a122b	`Reviewed-by: Richard Henderson <rth@twiddle.net>`
		0a122b	`Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>`
		0a122b	`---`
		0a122b	`docs/atomics.txt | 352 ++++++++++++++++++++++++++++++++++++++++++++++`
		0a122b	`hw/display/qxl.c | 3 +-`
		0a122b	`hw/virtio/vhost.c | 9 +-`
		0a122b	`include/qemu/atomic.h | 198 ++++++++++++++++++++++----`
		0a122b	`migration.c | 3 +-`
		0a122b	`tests/test-thread-pool.c | 8 +-`
		0a122b	`6 files changed, 529 insertions(+), 44 deletions(-)`
		0a122b	`create mode 100644 docs/atomics.txt`
		0a122b
		0a122b	`Signed-off-by: Michal Novotny <minovotn@redhat.com>`
		0a122b	`---`
		0a122b	`docs/atomics.txt | 352 +++++++++++++++++++++++++++++++++++++++++++++++`
		0a122b	`hw/display/qxl.c | 3 +-`
		0a122b	`hw/virtio/vhost.c | 9 +-`
		0a122b	`include/qemu/atomic.h | 198 +++++++++++++++++++++-----`
		0a122b	`migration.c | 3 +-`
		0a122b	`tests/test-thread-pool.c | 8 +-`
		0a122b	`6 files changed, 529 insertions(+), 44 deletions(-)`
		0a122b	`create mode 100644 docs/atomics.txt`
		0a122b
		0a122b	`diff --git a/docs/atomics.txt b/docs/atomics.txt`
		0a122b	`new file mode 100644`
		0a122b	`index 0000000..6f2997b`
		0a122b	`--- /dev/null`
		0a122b	`+++ b/docs/atomics.txt`
		0a122b	`@@ -0,0 +1,352 @@`
		0a122b	`+CPUs perform independent memory operations effectively in random order,`
		0a122b	`+but this can be a problem for CPU-CPU interaction (including interactions`
		0a122b	`+between QEMU and the guest). Multi-threaded programs use various tools`
		0a122b	`+to instruct the compiler and the CPU to restrict the order to something`
		0a122b	`+that is consistent with the expectations of the programmer.`
		0a122b	`+`
		0a122b	`+The most basic tool is locking. Mutexes, condition variables and`
		0a122b	`+semaphores are used in QEMU, and should be the default approach to`
		0a122b	`+synchronization. Anything else is considerably harder, but it's`
		0a122b	`+also justified more often than one would like. The two tools that`
		0a122b	`+are provided by qemu/atomic.h are memory barriers and atomic operations.`
		0a122b	`+`
		0a122b	`+Macros defined by qemu/atomic.h fall in three camps:`
		0a122b	`+`
		0a122b	`+- compiler barriers: barrier();`
		0a122b	`+`
		0a122b	`+- weak atomic access and manual memory barriers: atomic_read(),`
		0a122b	`+ atomic_set(), smp_rmb(), smp_wmb(), smp_mb(), smp_read_barrier_depends();`
		0a122b	`+`
		0a122b	`+- sequentially consistent atomic access: everything else.`
		0a122b	`+`
		0a122b	`+`
		0a122b	`+COMPILER MEMORY BARRIER`
		0a122b	`+=======================`
		0a122b	`+`
		0a122b	`+barrier() prevents the compiler from moving the memory accesses either`
		0a122b	`+side of it to the other side. The compiler barrier has no direct effect`
		0a122b	`+on the CPU, which may then reorder things however it wishes.`
		0a122b	`+`
		0a122b	`+barrier() is mostly used within qemu/atomic.h itself. On some`
		0a122b	`+architectures, CPU guarantees are strong enough that blocking compiler`
		0a122b	`+optimizations already ensures the correct order of execution. In this`
		0a122b	`+case, qemu/atomic.h will reduce stronger memory barriers to simple`
		0a122b	`+compiler barriers.`
		0a122b	`+`
		0a122b	`+Still, barrier() can be useful when writing code that can be interrupted`
		0a122b	`+by signal handlers.`
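The signal-handler case can be sketched as follows. This is not part of the patch: the flag pending, the handler on_signal() and the function wait_for_signal() are hypothetical names, and barrier() is spelled with __asm__ __volatile__ on the assumption that the code is built with GCC or Clang. Portable code would more commonly declare the flag as volatile sig_atomic_t; the plain int is used here only to show what the compiler barrier contributes.

```c
#include <assert.h>
#include <signal.h>

/* Compiler barrier, same idea as barrier() in qemu/atomic.h (GCC/Clang only). */
#define barrier() ({ __asm__ __volatile__("" ::: "memory"); (void)0; })

/* Hypothetical flag shared between normal code and the signal handler. */
static int pending;

static void on_signal(int sig)
{
    (void)sig;
    pending = 1;
}

/* Installs the handler, raises SIGINT on the calling thread and waits
 * until the handler's store has been observed.  Returns the flag value. */
int wait_for_signal(void)
{
    void (*prev)(int);

    pending = 0;
    prev = signal(SIGINT, on_signal);
    assert(prev != SIG_ERR);

    raise(SIGINT);
    while (!pending) {
        /* Forces 'pending' to be reloaded from memory on every iteration;
         * without it the compiler may hoist the load out of the loop. */
        barrier();
    }

    signal(SIGINT, prev);   /* restore the previous disposition */
    return pending;
}
```

No CPU barrier is needed in this sketch because the handler runs on the same thread as the loop; only the compiler has to be kept from caching the flag.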
		0a122b	`+`
		0a122b	`+`
		0a122b	`+SEQUENTIALLY CONSISTENT ATOMIC ACCESS`
		0a122b	`+=====================================`
		0a122b	`+`
		0a122b	`+Most of the operations in the qemu/atomic.h header ensure *sequential`
		0a122b	`+consistency*, where "the result of any execution is the same as if the`
		0a122b	`+operations of all the processors were executed in some sequential order,`
		0a122b	`+and the operations of each individual processor appear in this sequence`
		0a122b	`+in the order specified by its program".`
		0a122b	`+`
		0a122b	`+qemu/atomic.h provides the following set of atomic read-modify-write`
		0a122b	`+operations:`
		0a122b	`+`
		0a122b	`+ void atomic_inc(ptr)`
		0a122b	`+ void atomic_dec(ptr)`
		0a122b	`+ void atomic_add(ptr, val)`
		0a122b	`+ void atomic_sub(ptr, val)`
		0a122b	`+ void atomic_and(ptr, val)`
		0a122b	`+ void atomic_or(ptr, val)`
		0a122b	`+`
		0a122b	`+ typeof(*ptr) atomic_fetch_inc(ptr)`
		0a122b	`+ typeof(*ptr) atomic_fetch_dec(ptr)`
		0a122b	`+ typeof(*ptr) atomic_fetch_add(ptr, val)`
		0a122b	`+ typeof(*ptr) atomic_fetch_sub(ptr, val)`
		0a122b	`+ typeof(*ptr) atomic_fetch_and(ptr, val)`
		0a122b	`+ typeof(*ptr) atomic_fetch_or(ptr, val)`
		0a122b	`+ typeof(*ptr) atomic_xchg(ptr, val)`
		0a122b	`+ typeof(*ptr) atomic_cmpxchg(ptr, old, new)`
		0a122b	`+`
		0a122b	`+all of which return the old value of *ptr. These operations are`
		0a122b	`+polymorphic; they operate on any type that is as wide as an int.`
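A small sketch of how the returned old value is typically used. The two macro definitions are the ones this patch adds to qemu/atomic.h; the helper functions try_claim() and take_ticket() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Short names as introduced by the patch, on top of the GCC __sync builtins. */
#define atomic_fetch_inc(ptr)  __sync_fetch_and_add(ptr, 1)
#define atomic_cmpxchg         __sync_val_compare_and_swap

/* Returns 1 for exactly one caller: the one that moves *state from 0 to 1.
 * atomic_cmpxchg() returns the old value, so success means "old value was 0". */
int try_claim(int *state)
{
    assert(state != NULL);
    return atomic_cmpxchg(state, 0, 1) == 0;
}

/* Hands out consecutive ticket numbers: the returned value is the counter
 * as it was before the increment. */
int take_ticket(int *next)
{
    assert(next != NULL);
    return atomic_fetch_inc(next);
}
```

Comparing the result of atomic_cmpxchg() against the expected old value, rather than against the new one, is what tells the caller whether the swap took place.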
		0a122b	`+`
		0a122b	`+Sequentially consistent loads and stores can be done using:`
		0a122b	`+`
		0a122b	`+ atomic_fetch_add(ptr, 0) for loads`
		0a122b	`+ atomic_xchg(ptr, val) for stores`
		0a122b	`+`
		0a122b	`+However, they are quite expensive on some platforms, notably POWER and`
		0a122b	`+ARM. Therefore, qemu/atomic.h provides two primitives with slightly`
		0a122b	`+weaker constraints:`
		0a122b	`+`
		0a122b	`+ typeof(*ptr) atomic_mb_read(ptr)`
		0a122b	`+ void atomic_mb_set(ptr, val)`
		0a122b	`+`
		0a122b	`+The semantics of these primitives map to Java volatile variables,`
		0a122b	`+and are strongly related to memory barriers as used in the Linux`
		0a122b	`+kernel (see below).`
		0a122b	`+`
		0a122b	`+As long as you use atomic_mb_read and atomic_mb_set, accesses cannot`
		0a122b	`+be reordered with each other, and it is also not possible to reorder`
		0a122b	`+"normal" accesses around them.`
		0a122b	`+`
		0a122b	`+However, and this is the important difference between`
		0a122b	`+atomic_mb_read/atomic_mb_set and sequential consistency, it is important`
		0a122b	`+for both threads to access the same volatile variable. It is not the`
		0a122b	`+case that everything visible to thread A when it writes volatile field f`
		0a122b	`+becomes visible to thread B after it reads volatile field g. The store`
		0a122b	`+and load have to "match" (i.e., be performed on the same volatile`
		0a122b	`+field) to achieve the right semantics.`
		0a122b	`+`
		0a122b	`+`
		0a122b	`+These operations operate on any type that is as wide as an int or smaller.`
		0a122b	`+`
		0a122b	`+`
		0a122b	`+WEAK ATOMIC ACCESS AND MANUAL MEMORY BARRIERS`
		0a122b	`+=============================================`
		0a122b	`+`
		0a122b	`+Compared to sequentially consistent atomic access, programming with`
		0a122b	`+weaker consistency models can be considerably more complicated.`
		0a122b	`+In general, if the algorithm you are writing includes both writes`
		0a122b	`+and reads on the same side, it is simpler to use sequentially`
		0a122b	`+consistent primitives.`
		0a122b	`+`
		0a122b	`+When using this model, variables are accessed with atomic_read() and`
		0a122b	`+atomic_set(), and restrictions on the ordering of accesses are enforced`
		0a122b	`+using the smp_rmb(), smp_wmb(), smp_mb() and smp_read_barrier_depends()`
		0a122b	`+memory barriers.`
		0a122b	`+`
		0a122b	`+atomic_read() and atomic_set() prevent the compiler from using`
		0a122b	`+optimizations that might otherwise optimize accesses out of existence`
		0a122b	`+on the one hand, or that might create unsolicited accesses on the other.`
		0a122b	`+In general this should not have any effect, because the same compiler`
		0a122b	`+barriers are already implied by memory barriers. However, it is useful`
		0a122b	`+to do so, because it tells readers which variables are shared with`
		0a122b	`+other threads, and which are local to the current thread or protected`
		0a122b	`+by other, more mundane means.`
		0a122b	`+`
		0a122b	`+Memory barriers control the order of references to shared memory.`
		0a122b	`+They come in four kinds:`
		0a122b	`+`
		0a122b	`+- smp_rmb() guarantees that all the LOAD operations specified before`
		0a122b	`+ the barrier will appear to happen before all the LOAD operations`
		0a122b	`+ specified after the barrier with respect to the other components of`
		0a122b	`+ the system.`
		0a122b	`+`
		0a122b	`+ In other words, smp_rmb() puts a partial ordering on loads, but is not`
		0a122b	`+ required to have any effect on stores.`
		0a122b	`+`
		0a122b	`+- smp_wmb() guarantees that all the STORE operations specified before`
		0a122b	`+ the barrier will appear to happen before all the STORE operations`
		0a122b	`+ specified after the barrier with respect to the other components of`
		0a122b	`+ the system.`
		0a122b	`+`
		0a122b	`+ In other words, smp_wmb() puts a partial ordering on stores, but is not`
		0a122b	`+ required to have any effect on loads.`
		0a122b	`+`
		0a122b	`+- smp_mb() guarantees that all the LOAD and STORE operations specified`
		0a122b	`+ before the barrier will appear to happen before all the LOAD and`
		0a122b	`+ STORE operations specified after the barrier with respect to the other`
		0a122b	`+ components of the system.`
		0a122b	`+`
		0a122b	`+ smp_mb() puts a partial ordering on both loads and stores. It is`
		0a122b	`+ stronger than both a read and a write memory barrier; it implies both`
		0a122b	`+ smp_rmb() and smp_wmb(), but it also prevents STOREs coming before the`
		0a122b	`+ barrier from overtaking LOADs coming after the barrier and vice versa.`
		0a122b	`+`
		0a122b	`+- smp_read_barrier_depends() is a weaker kind of read barrier. On`
		0a122b	`+ most processors, whenever two loads are performed such that the`
		0a122b	`+ second depends on the result of the first (e.g., the first load`
		0a122b	`+ retrieves the address to which the second load will be directed),`
		0a122b	`+ the processor will guarantee that the first LOAD will appear to happen`
		0a122b	`+ before the second with respect to the other components of the system.`
		0a122b	`+ However, this is not always true---for example, it was not true on`
		0a122b	`+ Alpha processors. Whenever this kind of access happens to shared`
		0a122b	`+ memory (that is not protected by a lock), a read barrier is needed,`
		0a122b	`+ and smp_read_barrier_depends() can be used instead of smp_rmb().`
		0a122b	`+`
		0a122b	`+ Note that the first load really has to have a _data_ dependency and not`
		0a122b	`+ a control dependency. If the address for the second load is dependent`
		0a122b	`+ on the first load, but the dependency is through a conditional rather`
		0a122b	`+ than actually loading the address itself, then it's a _control_`
		0a122b	`+ dependency and a full read barrier or better is required.`
		0a122b	`+`
		0a122b	`+`
		0a122b	`+This is the set of barriers that is required between two atomic_read()`
		0a122b	`+and atomic_set() operations to achieve sequential consistency:`
		0a122b	`+`
		0a122b	`+                    |               2nd operation             |`
		0a122b	`+                    |-----------------------------------------|`
		0a122b	`+     1st operation  | (after last) | atomic_read | atomic_set |`
		0a122b	`+     ---------------+--------------+-------------+------------|`
		0a122b	`+     (before first) |              | none        | smp_wmb()  |`
		0a122b	`+     ---------------+--------------+-------------+------------|`
		0a122b	`+     atomic_read    | smp_rmb()    | smp_rmb()*  | **         |`
		0a122b	`+     ---------------+--------------+-------------+------------|`
		0a122b	`+     atomic_set     | none         | smp_mb()*** | smp_wmb()  |`
		0a122b	`+     ---------------+--------------+-------------+------------|`
		0a122b	`+`
		0a122b	`+ * Or smp_read_barrier_depends().`
		0a122b	`+`
		0a122b	`+ ** This requires a load-store barrier. How to achieve this varies`
		0a122b	`+ depending on the machine, but in practice smp_rmb()+smp_wmb()`
		0a122b	`+ should have the desired effect. For example, on PowerPC the`
		0a122b	`+ lwsync instruction is a combined load-load, load-store and`
		0a122b	`+ store-store barrier.`
		0a122b	`+`
		0a122b	`+ *** This requires a store-load barrier. On most machines, the only`
		0a122b	`+ way to achieve this is a full barrier.`
		0a122b	`+`
		0a122b	`+`
		0a122b	`+You can see that the two possible definitions of atomic_mb_read()`
		0a122b	`+and atomic_mb_set() are the following:`
		0a122b	`+`
		0a122b	`+ 1) atomic_mb_read(p) = atomic_read(p); smp_rmb()`
		0a122b	`+ atomic_mb_set(p, v) = smp_wmb(); atomic_set(p, v); smp_mb()`
		0a122b	`+`
		0a122b	`+ 2) atomic_mb_read(p) = smp_mb(); atomic_read(p); smp_rmb()`
		0a122b	`+ atomic_mb_set(p, v) = smp_wmb(); atomic_set(p, v);`
		0a122b	`+`
		0a122b	`+Usually the former is used, because smp_mb() is expensive and a program`
		0a122b	`+normally has more reads than writes. Therefore it makes more sense to`
		0a122b	`+make atomic_mb_set() the more expensive operation.`
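The first definition can be tried out with a sketch like the one below. Several things here are assumptions rather than the patch's code: every barrier is mapped to __sync_synchronize() (the generic fallback, stronger than necessary on most hosts), atomic_read()/atomic_set() are spelled with a volatile-qualified pointee, and the names payload, ready, producer() and consume_payload() are hypothetical. POSIX threads and GCC or Clang are assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Conservative stand-ins: a full barrier everywhere. */
#define smp_mb()   __sync_synchronize()
#define smp_wmb()  __sync_synchronize()
#define smp_rmb()  __sync_synchronize()

#define atomic_read(ptr)    (*(volatile __typeof__(*(ptr)) *)(ptr))
#define atomic_set(ptr, i)  ((void)(*(volatile __typeof__(*(ptr)) *)(ptr) = (i)))

/* Definition 1) from the text: the store pays for the full barrier. */
#define atomic_mb_read(ptr) ({                      \
    __typeof__(*(ptr)) _val = atomic_read(ptr);     \
    smp_rmb();                                      \
    _val;                                           \
})
#define atomic_mb_set(ptr, i) do {                  \
    smp_wmb();                                      \
    atomic_set(ptr, i);                             \
    smp_mb();                                       \
} while (0)

static int payload;   /* ordinary data, published through 'ready' */
static int ready;     /* accessed only with atomic_mb_read/atomic_mb_set */

static void *producer(void *opaque)
{
    (void)opaque;
    payload = 42;               /* "normal" store ...                      */
    atomic_mb_set(&ready, 1);   /* ... cannot be reordered after this one  */
    return NULL;
}

/* Starts the producer, waits for the flag and returns the payload seen. */
int consume_payload(void)
{
    pthread_t tid;
    int rc, seen;

    payload = 0;
    ready = 0;
    rc = pthread_create(&tid, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;

    while (!atomic_mb_read(&ready)) {
        /* spin until the matching atomic_mb_set() becomes visible */
    }
    seen = payload;             /* ordered after the read of 'ready' */

    pthread_join(tid, NULL);
    return seen;
}
```

Both threads go through the same variable, ready, which is the "matching" requirement described earlier for these primitives.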
		0a122b	`+`
		0a122b	`+There are two common cases in which atomic_mb_read and atomic_mb_set`
		0a122b	`+generate too many memory barriers, and thus it can be useful to manually`
		0a122b	`+place barriers instead:`
		0a122b	`+`
		0a122b	`+- when a data structure has one thread that is always a writer`
		0a122b	`+ and one thread that is always a reader, manual placement of`
		0a122b	`+ memory barriers makes the write side faster. Furthermore,`
		0a122b	`+ correctness is easy to check for in this case using the "pairing"`
		0a122b	`+ trick that is explained below:`
		0a122b	`+`
		0a122b	`+         thread 1                                thread 1`
		0a122b	`+         -------------------------               ------------------------`
		0a122b	`+         (other writes)`
		0a122b	`+                                                 smp_wmb()`
		0a122b	`+         atomic_mb_set(&a, x)                    atomic_set(&a, x)`
		0a122b	`+                                                 smp_wmb()`
		0a122b	`+         atomic_mb_set(&b, y)                    atomic_set(&b, y)`
		0a122b	`+`
		0a122b	`+                                       =>`
		0a122b	`+         thread 2                                thread 2`
		0a122b	`+         -------------------------               ------------------------`
		0a122b	`+         y = atomic_mb_read(&b)                  y = atomic_read(&b)`
		0a122b	`+                                                 smp_rmb()`
		0a122b	`+         x = atomic_mb_read(&a)                  x = atomic_read(&a)`
		0a122b	`+                                                 smp_rmb()`
		0a122b	`+`
		0a122b	`+- sometimes, a thread is accessing many variables that are otherwise`
		0a122b	`+ unrelated to each other (for example because, apart from the current`
		0a122b	`+ thread, exactly one other thread will read or write each of these`
		0a122b	`+ variables). In this case, it is possible to "hoist" the implicit`
		0a122b	`+ barriers provided by atomic_mb_read() and atomic_mb_set() outside`
		0a122b	`+ a loop. For example, the above definition of atomic_mb_read() gives`
		0a122b	`+ the following transformation:`
		0a122b	`+`
		0a122b	`+       n = 0;                                  n = 0;`
		0a122b	`+       for (i = 0; i < 10; i++)        =>      for (i = 0; i < 10; i++)`
		0a122b	`+         n += atomic_mb_read(&a[i]);             n += atomic_read(&a[i]);`
		0a122b	`+                                               smp_rmb();`
		0a122b	`+`
		0a122b	`+    Similarly, atomic_mb_set() can be transformed as follows:`
		0a122b	`+`
		0a122b	`+                                               smp_wmb();`
		0a122b	`+       for (i = 0; i < 10; i++)        =>      for (i = 0; i < 10; i++)`
		0a122b	`+         atomic_mb_set(&a[i], false);            atomic_set(&a[i], false);`
		0a122b	`+                                               smp_mb();`
		0a122b	`+`
		0a122b	`+`
		0a122b	`+The two tricks can be combined. In this case, splitting a loop in`
		0a122b	`+two lets you hoist the barriers out of the loops _and_ eliminate the`
		0a122b	`+expensive smp_mb():`
		0a122b	`+`
		0a122b	`+                                                 smp_wmb();`
		0a122b	`+    for (i = 0; i < 10; i++) {            =>     for (i = 0; i < 10; i++)`
		0a122b	`+      atomic_mb_set(&a[i], false);                 atomic_set(&a[i], false);`
		0a122b	`+      atomic_mb_set(&b[i], false);               smp_wmb();`
		0a122b	`+    }                                            for (i = 0; i < 10; i++)`
		0a122b	`+                                                   atomic_set(&b[i], false);`
		0a122b	`+                                                 smp_mb();`
		0a122b	`+`
		0a122b	`+    The other thread can still use atomic_mb_read()/atomic_mb_set().`
		0a122b	`+`
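The read-side hoisting can be written out as two functions for comparison. This is only a sketch: smp_rmb() is mapped to __sync_synchronize() as a conservative assumption, atomic_read() is spelled with a volatile-qualified pointee, and the function names sum_with_mb_read() and sum_hoisted() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define smp_rmb()         __sync_synchronize()   /* conservative stand-in */
#define atomic_read(ptr)  (*(volatile __typeof__(*(ptr)) *)(ptr))
#define atomic_mb_read(ptr) ({                      \
    __typeof__(*(ptr)) _val = atomic_read(ptr);     \
    smp_rmb();                                      \
    _val;                                           \
})

/* Before: one read barrier per element (n barriers in total). */
int sum_with_mb_read(const int *a, size_t n)
{
    int total = 0;

    assert(a != NULL);
    for (size_t i = 0; i < n; i++) {
        total += atomic_mb_read(&a[i]);
    }
    return total;
}

/* After: weak reads inside the loop, a single barrier hoisted after it. */
int sum_hoisted(const int *a, size_t n)
{
    int total = 0;

    assert(a != NULL);
    for (size_t i = 0; i < n; i++) {
        total += atomic_read(&a[i]);
    }
    smp_rmb();
    return total;
}
```

Both functions compute the same sum; the hoisted version issues one barrier instead of n, which is only valid when the elements are unrelated to each other in the sense described above.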
		0a122b	`+`
		0a122b	`+Memory barrier pairing`
		0a122b	`+----------------------`
		0a122b	`+`
		0a122b	`+A useful rule of thumb is that memory barriers should always, or almost`
		0a122b	`+always, be paired with another barrier. In the case of QEMU, however,`
		0a122b	`+note that the other barrier may actually be in a driver that runs in`
		0a122b	`+the guest!`
		0a122b	`+`
		0a122b	`+For the purposes of pairing, smp_read_barrier_depends() and smp_rmb()`
		0a122b	`+both count as read barriers. A read barrier shall pair with a write`
		0a122b	`+barrier or a full barrier; a write barrier shall pair with a read`
		0a122b	`+barrier or a full barrier. A full barrier can pair with anything.`
		0a122b	`+For example:`
		0a122b	`+`
		0a122b	`+       thread 1             thread 2`
		0a122b	`+       ===============      ===============`
		0a122b	`+       a = 1;`
		0a122b	`+       smp_wmb();`
		0a122b	`+       b = 2;               x = b;`
		0a122b	`+                            smp_rmb();`
		0a122b	`+                            y = a;`
		0a122b	`+`
		0a122b	`+Note that the "writing" thread is accessing the variables in the`
		0a122b	`+opposite order from the "reading" thread. This is expected: stores`
		0a122b	`+before the write barrier will normally match the loads after the`
		0a122b	`+read barrier, and vice versa. The same is true for more than 2`
		0a122b	`+accesses and for data dependency barriers:`
		0a122b	`+`
		0a122b	`+       thread 1             thread 2`
		0a122b	`+       ===============      ===============`
		0a122b	`+       b[2] = 1;`
		0a122b	`+       smp_wmb();`
		0a122b	`+       x->i = 2;`
		0a122b	`+       smp_wmb();`
		0a122b	`+       a = x;               x = a;`
		0a122b	`+                            smp_read_barrier_depends();`
		0a122b	`+                            y = x->i;`
		0a122b	`+                            smp_read_barrier_depends();`
		0a122b	`+                            z = b[y];`
		0a122b	`+`
		0a122b	`+smp_wmb() also pairs with atomic_mb_read(), and smp_rmb() also pairs`
		0a122b	`+with atomic_mb_set().`
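The pairing of a write barrier with a data dependency barrier can be sketched as a pointer being published from one thread to another. The barrier macros below use the __atomic_thread_fence() forms that the patch falls back to on hosts without a dedicated definition; the volatile-pointee spelling of atomic_read()/atomic_set() and the names struct item, slot, shared, writer() and read_published_item() are assumptions made for this example. POSIX threads and a compiler with the __atomic builtins are assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define smp_wmb()                   __atomic_thread_fence(__ATOMIC_RELEASE)
#define smp_read_barrier_depends()  __atomic_thread_fence(__ATOMIC_CONSUME)

#define atomic_read(ptr)    (*(volatile __typeof__(*(ptr)) *)(ptr))
#define atomic_set(ptr, i)  ((void)(*(volatile __typeof__(*(ptr)) *)(ptr) = (i)))

struct item {
    int i;
};

static struct item slot;       /* object being published */
static struct item *shared;    /* NULL until the writer publishes 'slot' */

static void *writer(void *opaque)
{
    (void)opaque;
    slot.i = 2;                 /* initialize the object ...            */
    smp_wmb();                  /* ... before the pointer is visible    */
    atomic_set(&shared, &slot);
    return NULL;
}

/* Waits for the pointer, then follows it.  Returns the field value seen. */
int read_published_item(void)
{
    pthread_t tid;
    struct item *x;
    int rc, y;

    slot.i = 0;
    shared = NULL;
    rc = pthread_create(&tid, NULL, writer, NULL);
    assert(rc == 0);
    (void)rc;

    while ((x = atomic_read(&shared)) == NULL) {
        /* spin until the writer has published the pointer */
    }
    smp_read_barrier_depends(); /* pairs with the writer's smp_wmb() */
    y = x->i;                   /* data-dependent load through x */

    pthread_join(tid, NULL);
    return y;
}
```

The writer stores the field and then the pointer; the reader loads the pointer and then the field, which is the opposite order, as the pairing rule predicts.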
		0a122b	`+`
		0a122b	`+`
		0a122b	`+COMPARISON WITH LINUX KERNEL MEMORY BARRIERS`
		0a122b	`+============================================`
		0a122b	`+`
		0a122b	`+Here is a list of differences between Linux kernel atomic operations`
		0a122b	`+and memory barriers, and the equivalents in QEMU:`
		0a122b	`+`
		0a122b	`+- atomic operations in Linux are always on a 32-bit int type and`
		0a122b	`+ use a boxed atomic_t type; atomic operations in QEMU are polymorphic`
		0a122b	`+ and use normal C types.`
		0a122b	`+`
		0a122b	`+- atomic_read and atomic_set in Linux give no guarantee at all;`
		0a122b	`+ atomic_read and atomic_set in QEMU include a compiler barrier`
		0a122b	`+ (similar to the ACCESS_ONCE macro in Linux).`
		0a122b	`+`
		0a122b	`+- most atomic read-modify-write operations in Linux return void;`
		0a122b	`+ in QEMU, all of them return the old value of the variable.`
		0a122b	`+`
		0a122b	`+- different atomic read-modify-write operations in Linux imply`
		0a122b	`+ a different set of memory barriers; in QEMU, all of them enforce`
		0a122b	`+ sequential consistency, which means they imply full memory barriers`
		0a122b	`+ before and after the operation.`
		0a122b	`+`
		0a122b	`+- Linux does not have an equivalent of atomic_mb_read() and`
		0a122b	`+ atomic_mb_set(). In particular, note that set_mb() is a little`
		0a122b	`+ weaker than atomic_mb_set().`
		0a122b	`+`
		0a122b	`+`
		0a122b	`+SOURCES`
		0a122b	`+=======`
		0a122b	`+`
		0a122b	`+* Documentation/memory-barriers.txt from the Linux kernel`
		0a122b	`+`
		0a122b	`+* "The JSR-133 Cookbook for Compiler Writers", available at`
		0a122b	`+ http://g.oswego.edu/dl/jmm/cookbook.html`
		0a122b	`diff --git a/hw/display/qxl.c b/hw/display/qxl.c`
		0a122b	`index ea985d2..830b3c5 100644`
		0a122b	`--- a/hw/display/qxl.c`
		0a122b	`+++ b/hw/display/qxl.c`
		0a122b	`@@ -23,6 +23,7 @@`
		0a122b	`#include "qemu-common.h"`
		0a122b	`#include "qemu/timer.h"`
		0a122b	`#include "qemu/queue.h"`
		0a122b	`+#include "qemu/atomic.h"`
		0a122b	`#include "monitor/monitor.h"`
		0a122b	`#include "sysemu/sysemu.h"`
		0a122b	`#include "trace.h"`
		0a122b	`@@ -1726,7 +1727,7 @@ static void qxl_send_events(PCIQXLDevice *d, uint32_t events)`
		0a122b	`trace_qxl_send_events_vm_stopped(d->id, events);`
		0a122b	`return;`
		0a122b	`}`
		0a122b	`- old_pending = __sync_fetch_and_or(&d->ram->int_pending, le_events);`
		0a122b	`+ old_pending = atomic_fetch_or(&d->ram->int_pending, le_events);`
		0a122b	`if ((old_pending & le_events) == le_events) {`
		0a122b	`return;`
		0a122b	`}`
		0a122b	`diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c`
		0a122b	`index 0dabf26..54aa569 100644`
		0a122b	`--- a/hw/virtio/vhost.c`
		0a122b	`+++ b/hw/virtio/vhost.c`
		0a122b	`@@ -16,6 +16,7 @@`
		0a122b	`#include <sys/ioctl.h>`
		0a122b	`#include "hw/virtio/vhost.h"`
		0a122b	`#include "hw/hw.h"`
		0a122b	`+#include "qemu/atomic.h"`
		0a122b	`#include "qemu/range.h"`
		0a122b	`#include <linux/vhost.h>`
		0a122b	`#include "exec/address-spaces.h"`
		0a122b	`@@ -47,11 +48,9 @@ static void vhost_dev_sync_region(struct vhost_dev *dev,`
		0a122b	`addr += VHOST_LOG_CHUNK;`
		0a122b	`continue;`
		0a122b	`}`
		0a122b	`- /* Data must be read atomically. We don't really`
		0a122b	`- * need the barrier semantics of __sync`
		0a122b	`- * builtins, but it's easier to use them than`
		0a122b	`- * roll our own. */`
		0a122b	`- log = __sync_fetch_and_and(from, 0);`
		0a122b	`+ /* Data must be read atomically. We don't really need barrier semantics`
		0a122b	`+ * but it's easier to use atomic_* than roll our own. */`
		0a122b	`+ log = atomic_xchg(from, 0);`
		0a122b	`while ((bit = sizeof(log) > sizeof(int) ?`
		0a122b	`ffsll(log) : ffs(log))) {`
		0a122b	`hwaddr page_addr;`
		0a122b	`diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h`
		0a122b	`index 10becb6..0aa8913 100644`
		0a122b	`--- a/include/qemu/atomic.h`
		0a122b	`+++ b/include/qemu/atomic.h`
		0a122b	`@@ -1,68 +1,202 @@`
		0a122b	`-#ifndef __QEMU_BARRIER_H`
		0a122b	`-#define __QEMU_BARRIER_H 1`
		0a122b	`+/*`
		0a122b	`+ * Simple interface for atomic operations.`
		0a122b	`+ *`
		0a122b	`+ * Copyright (C) 2013 Red Hat, Inc.`
		0a122b	`+ *`
		0a122b	`+ * Author: Paolo Bonzini <pbonzini@redhat.com>`
		0a122b	`+ *`
		0a122b	`+ * This work is licensed under the terms of the GNU GPL, version 2 or later.`
		0a122b	`+ * See the COPYING file in the top-level directory.`
		0a122b	`+ *`
		0a122b	`+ */`
		0a122b
		0a122b	`-/* Compiler barrier */`
		0a122b	`-#define barrier() asm volatile("" ::: "memory")`
		0a122b	`+#ifndef __QEMU_ATOMIC_H`
		0a122b	`+#define __QEMU_ATOMIC_H 1`
		0a122b
		0a122b	`-#if defined(__i386__)`
		0a122b	`+#include "qemu/compiler.h"`
		0a122b
		0a122b	`-#include "qemu/compiler.h" /* QEMU_GNUC_PREREQ */`
		0a122b	`+/* For C11 atomic ops */`
		0a122b
		0a122b	`-/*`
		0a122b	`- * Because of the strongly ordered x86 storage model, wmb() and rmb() are nops`
		0a122b	`- * on x86(well, a compiler barrier only). Well, at least as long as`
		0a122b	`- * qemu doesn't do accesses to write-combining memory or non-temporal`
		0a122b	`- * load/stores from C code.`
		0a122b	`- */`
		0a122b	`-#define smp_wmb() barrier()`
		0a122b	`-#define smp_rmb() barrier()`
		0a122b	`+/* Compiler barrier */`
		0a122b	`+#define barrier() ({ asm volatile("" ::: "memory"); (void)0; })`
		0a122b	`+`
		0a122b	`+#ifndef __ATOMIC_RELAXED`
		0a122b
		0a122b	`/*`
		0a122b	`- * We use GCC builtin if it's available, as that can use`
		0a122b	`- * mfence on 32 bit as well, e.g. if built with -march=pentium-m.`
		0a122b	`- * However, on i386, there seem to be known bugs as recently as 4.3.`
		0a122b	`- * */`
		0a122b	`-#if QEMU_GNUC_PREREQ(4, 4)`
		0a122b	`-#define smp_mb() __sync_synchronize()`
		0a122b	`+ * We use GCC builtin if it's available, as that can use mfence on`
		0a122b	`+ * 32-bit as well, e.g. if built with -march=pentium-m. However, on`
		0a122b	`+ * i386 the spec is buggy, and the implementation followed it until`
		0a122b	`+ * 4.3 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793).`
		0a122b	`+ */`
		0a122b	`+#if defined(__i386__) || defined(__x86_64__)`
		0a122b	`+#if !QEMU_GNUC_PREREQ(4, 4)`
		0a122b	`+#if defined __x86_64__`
		0a122b	`+#define smp_mb() ({ asm volatile("mfence" ::: "memory"); (void)0; })`
		0a122b	`#else`
		0a122b	`-#define smp_mb() asm volatile("lock; addl $0,0(%%esp) " ::: "memory")`
		0a122b	`+#define smp_mb() ({ asm volatile("lock; addl $0,0(%%esp) " ::: "memory"); (void)0; })`
		0a122b	`+#endif`
		0a122b	`+#endif`
		0a122b	`+#endif`
		0a122b	`+`
		0a122b	`+`
		0a122b	`+#ifdef __alpha__`
		0a122b	`+#define smp_read_barrier_depends() asm volatile("mb":::"memory")`
		0a122b	`#endif`
		0a122b
		0a122b	`-#elif defined(__x86_64__)`
		0a122b	`+#if defined(__i386__) || defined(__x86_64__) || defined(__s390x__)`
		0a122b
		0a122b	`+/*`
		0a122b	`+ * Because of the strongly ordered storage model, wmb() and rmb() are nops`
		0a122b	`+ * here (a compiler barrier only). QEMU doesn't do accesses to write-combining`
		0a122b	`+ * memory or non-temporal load/stores from C code.`
		0a122b	`+ */`
		0a122b	`#define smp_wmb() barrier()`
		0a122b	`#define smp_rmb() barrier()`
		0a122b	`-#define smp_mb() asm volatile("mfence" ::: "memory")`
		0a122b	`+`
		0a122b	`+/*`
		0a122b	`+ * __sync_lock_test_and_set() is documented to be an acquire barrier only,`
		0a122b	`+ * but it is a full barrier at the hardware level. Add a compiler barrier`
		0a122b	`+ * to make it a full barrier also at the compiler level.`
		0a122b	`+ */`
		0a122b	`+#define atomic_xchg(ptr, i) (barrier(), __sync_lock_test_and_set(ptr, i))`
		0a122b	`+`
		0a122b	`+/*`
		0a122b	`+ * Load/store with Java volatile semantics.`
		0a122b	`+ */`
		0a122b	`+#define atomic_mb_set(ptr, i) ((void)atomic_xchg(ptr, i))`
		0a122b
		0a122b	`#elif defined(_ARCH_PPC)`
		0a122b
		0a122b	`/*`
		0a122b	`* We use an eieio() for wmb() on powerpc. This assumes we don't`
		0a122b	`* need to order cacheable and non-cacheable stores with respect to`
		0a122b	`- * each other`
		0a122b	`+ * each other.`
		0a122b	`+ *`
		0a122b	`+ * smp_mb has the same problem as on x86 for not-very-new GCC`
		0a122b	`+ * (http://patchwork.ozlabs.org/patch/126184/, Nov 2011).`
		0a122b	`*/`
		0a122b	`-#define smp_wmb() asm volatile("eieio" ::: "memory")`
		0a122b	`-`
		0a122b	`+#define smp_wmb() ({ asm volatile("eieio" ::: "memory"); (void)0; })`
		0a122b	`#if defined(__powerpc64__)`
		0a122b	`-#define smp_rmb() asm volatile("lwsync" ::: "memory")`
		0a122b	`+#define smp_rmb() ({ asm volatile("lwsync" ::: "memory"); (void)0; })`
		0a122b	`#else`
		0a122b	`-#define smp_rmb() asm volatile("sync" ::: "memory")`
		0a122b	`+#define smp_rmb() ({ asm volatile("sync" ::: "memory"); (void)0; })`
		0a122b	`#endif`
		0a122b	`+#define smp_mb() ({ asm volatile("sync" ::: "memory"); (void)0; })`
		0a122b
		0a122b	`-#define smp_mb() asm volatile("sync" ::: "memory")`
		0a122b	`+#endif /* _ARCH_PPC */`
		0a122b
		0a122b	`-#else`
		0a122b	`+#endif /* C11 atomics */`
		0a122b
		0a122b	`/*`
		0a122b	`* For (host) platforms we don't have explicit barrier definitions`
		0a122b	`* for, we use the gcc __sync_synchronize() primitive to generate a`
		0a122b	`* full barrier. This should be safe on all platforms, though it may`
		0a122b	`- * be overkill for wmb() and rmb().`
		0a122b	`+ * be overkill for smp_wmb() and smp_rmb().`
		0a122b	`*/`
		0a122b	`+#ifndef smp_mb`
		0a122b	`+#define smp_mb() __sync_synchronize()`
		0a122b	`+#endif`
		0a122b	`+`
		0a122b	`+#ifndef smp_wmb`
		0a122b	`+#ifdef __ATOMIC_RELEASE`
		0a122b	`+#define smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)`
		0a122b	`+#else`
		0a122b	`#define smp_wmb() __sync_synchronize()`
		0a122b	`-#define smp_mb() __sync_synchronize()`
		0a122b	`+#endif`
		0a122b	`+#endif`
		0a122b	`+`
		0a122b	`+#ifndef smp_rmb`
		0a122b	`+#ifdef __ATOMIC_ACQUIRE`
		0a122b	`+#define smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)`
		0a122b	`+#else`
		0a122b	`#define smp_rmb() __sync_synchronize()`
		0a122b	`+#endif`
		0a122b	`+#endif`
		0a122b	`+`
		0a122b	`+#ifndef smp_read_barrier_depends`
		0a122b	`+#ifdef __ATOMIC_CONSUME`
		0a122b	`+#define smp_read_barrier_depends() __atomic_thread_fence(__ATOMIC_CONSUME)`
		0a122b	`+#else`
		0a122b	`+#define smp_read_barrier_depends() barrier()`
		0a122b	`+#endif`
		0a122b	`+#endif`
		0a122b
		0a122b	`+#ifndef atomic_read`
		0a122b	`+#define atomic_read(ptr) (*(__typeof__(*ptr) *volatile) (ptr))`
		0a122b	`#endif`
		0a122b
		0a122b	`+#ifndef atomic_set`
		0a122b	`+#define atomic_set(ptr, i) ((*(__typeof__(*ptr) *volatile) (ptr)) = (i))`
		0a122b	`+#endif`
		0a122b	`+`
		0a122b	`+/* These have the same semantics as Java volatile variables.`
		0a122b	`+ * See http://gee.cs.oswego.edu/dl/jmm/cookbook.html:`
		0a122b	`+ * "1. Issue a StoreStore barrier (wmb) before each volatile store."`
		0a122b	`+ * 2. Issue a StoreLoad barrier after each volatile store.`
		0a122b	`+ * Note that you could instead issue one before each volatile load, but`
		0a122b	`+ * this would be slower for typical programs using volatiles in which`
		0a122b	`+ * reads greatly outnumber writes. Alternatively, if available, you`
		0a122b	`+ * can implement volatile store as an atomic instruction (for example`
		0a122b	`+ * XCHG on x86) and omit the barrier. This may be more efficient if`
		0a122b	`+ * atomic instructions are cheaper than StoreLoad barriers.`
		0a122b	`+ * 3. Issue LoadLoad and LoadStore barriers after each volatile load."`
		0a122b	`+ *`
		0a122b	`+ * If you prefer to think in terms of "pairing" of memory barriers,`
		0a122b	`+ * an atomic_mb_read pairs with an atomic_mb_set.`
		0a122b	`+ *`
		0a122b	`+ * And for the few ia64 lovers that exist, an atomic_mb_read is a ld.acq,`
		0a122b	`+ * while an atomic_mb_set is a st.rel followed by a memory barrier.`
		0a122b	`+ *`
		0a122b	`+ * These are a bit weaker than __atomic_load/store with __ATOMIC_SEQ_CST`
		0a122b	`+ * (see docs/atomics.txt), and I'm not sure that __ATOMIC_ACQ_REL is enough.`
		0a122b	`+ * Just always use the barriers manually by the rules above.`
		0a122b	`+ */`
		0a122b	`+#ifndef atomic_mb_read`
		0a122b	`+#define atomic_mb_read(ptr) ({ \`
		0a122b	`+ typeof(*ptr) _val = atomic_read(ptr); \`
		0a122b	`+ smp_rmb(); \`
		0a122b	`+ _val; \`
		0a122b	`+})`
		0a122b	`+#endif`
		0a122b	`+`
		0a122b	`+#ifndef atomic_mb_set`
		0a122b	`+#define atomic_mb_set(ptr, i) do { \`
		0a122b	`+ smp_wmb(); \`
		0a122b	`+ atomic_set(ptr, i); \`
		0a122b	`+ smp_mb(); \`
		0a122b	`+} while (0)`
		0a122b	`+#endif`
		0a122b	`+`
		0a122b	`+#ifndef atomic_xchg`
		0a122b	`+#ifdef __ATOMIC_SEQ_CST`
		0a122b	`+#define atomic_xchg(ptr, i) ({ \`
		0a122b	`+ typeof(*ptr) _new = (i), _old; \`
		0a122b	`+ __atomic_exchange(ptr, &_new, &_old, __ATOMIC_SEQ_CST); \`
		0a122b	`+ _old; \`
		0a122b	`+})`
		0a122b	`+#elif defined __clang__`
		0a122b	`+#define atomic_xchg(ptr, i) __sync_exchange(ptr, i)`
		0a122b	`+#else`
		0a122b	`+/* __sync_lock_test_and_set() is documented to be an acquire barrier only. */`
		0a122b	`+#define atomic_xchg(ptr, i) (smp_mb(), __sync_lock_test_and_set(ptr, i))`
		0a122b	`+#endif`
		0a122b	`+#endif`
		0a122b	`+`
		0a122b	`+/* Provide shorter names for GCC atomic builtins. */`
		0a122b	`+#define atomic_fetch_inc(ptr) __sync_fetch_and_add(ptr, 1)`
		0a122b	`+#define atomic_fetch_dec(ptr) __sync_fetch_and_add(ptr, -1)`
		0a122b	`+#define atomic_fetch_add __sync_fetch_and_add`
		0a122b	`+#define atomic_fetch_sub __sync_fetch_and_sub`
		0a122b	`+#define atomic_fetch_and __sync_fetch_and_and`
		0a122b	`+#define atomic_fetch_or __sync_fetch_and_or`
		0a122b	`+#define atomic_cmpxchg __sync_val_compare_and_swap`
		0a122b	`+`
		0a122b	`+/* And even shorter names that return void. */`
		0a122b	`+#define atomic_inc(ptr) ((void) __sync_fetch_and_add(ptr, 1))`
		0a122b	`+#define atomic_dec(ptr) ((void) __sync_fetch_and_add(ptr, -1))`
		0a122b	`+#define atomic_add(ptr, n) ((void) __sync_fetch_and_add(ptr, n))`
		0a122b	`+#define atomic_sub(ptr, n) ((void) __sync_fetch_and_sub(ptr, n))`
		0a122b	`+#define atomic_and(ptr, n) ((void) __sync_fetch_and_and(ptr, n))`
		0a122b	`+#define atomic_or(ptr, n) ((void) __sync_fetch_and_or(ptr, n))`
		0a122b	`+`
		0a122b	`#endif`
		0a122b	`diff --git a/migration.c b/migration.c`
		0a122b	`index 46c633a..d91e702 100644`
		0a122b	`--- a/migration.c`
		0a122b	`+++ b/migration.c`
		0a122b	`@@ -291,8 +291,7 @@ static void migrate_fd_cleanup(void *opaque)`
		0a122b
		0a122b	`static void migrate_finish_set_state(MigrationState *s, int new_state)`
		0a122b	`{`
		0a122b	`- if (__sync_val_compare_and_swap(&s->state, MIG_STATE_ACTIVE,`
		0a122b	`- new_state) == new_state) {`
		0a122b	`+ if (atomic_cmpxchg(&s->state, MIG_STATE_ACTIVE, new_state) == new_state) {`
		0a122b	`trace_migrate_set_state(new_state);`
		0a122b	`}`
		0a122b	`}`
		0a122b	`diff --git a/tests/test-thread-pool.c b/tests/test-thread-pool.c`
		0a122b	`index 22915aa..b62338f 100644`
		0a122b	`--- a/tests/test-thread-pool.c`
		0a122b	`+++ b/tests/test-thread-pool.c`
		0a122b	`@@ -17,15 +17,15 @@ typedef struct {`
		0a122b	`static int worker_cb(void *opaque)`
		0a122b	`{`
		0a122b	`WorkerTestData *data = opaque;`
		0a122b	`- return __sync_fetch_and_add(&data->n, 1);`
		0a122b	`+ return atomic_fetch_inc(&data->n);`
		0a122b	`}`
		0a122b
		0a122b	`static int long_cb(void *opaque)`
		0a122b	`{`
		0a122b	`WorkerTestData *data = opaque;`
		0a122b	`- __sync_fetch_and_add(&data->n, 1);`
		0a122b	`+ atomic_inc(&data->n);`
		0a122b	`g_usleep(2000000);`
		0a122b	`- __sync_fetch_and_add(&data->n, 1);`
		0a122b	`+ atomic_inc(&data->n);`
		0a122b	`return 0;`
		0a122b	`}`
		0a122b
		0a122b	`@@ -169,7 +169,7 @@ static void test_cancel(void)`
		0a122b	`/* Cancel the jobs that haven't been started yet. */`
		0a122b	`num_canceled = 0;`
		0a122b	`for (i = 0; i < 100; i++) {`
		0a122b	`- if (__sync_val_compare_and_swap(&data[i].n, 0, 3) == 0) {`
		0a122b	`+ if (atomic_cmpxchg(&data[i].n, 0, 3) == 0) {`
		0a122b	`data[i].ret = -ECANCELED;`
		0a122b	`bdrv_aio_cancel(data[i].aiocb);`
		0a122b	`active--;`
		0a122b	`--`
		0a122b	`1.7.11.7`
		0a122b
