From 2fa2ea233bec906b682fc82376649a1a6e18e9df Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Tue, 2 Nov 2021 18:33:07 -0700
Subject: [PATCH] Add LLL_MUTEX_READ_LOCK [BZ #28537]

The CAS instruction is expensive.  From the x86 CPU's point of view,
getting a cache line for writing is more expensive than getting it for
reading.  See Appendix A.2 ("Spinlock") in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

The full compare and swap grabs the cache line in exclusive state and
causes excessive cache-line bouncing.

Add LLL_MUTEX_READ_LOCK to do an atomic load and skip the CAS in the
spinlock loop when the compare would likely fail, reducing cache-line
bouncing on contended locks.

Reviewed-by: Szabolcs Nagy
(cherry picked from commit d672a98a1af106bd68deb15576710cd61363f7a6)
---
 nptl/pthread_mutex_lock.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
index 60ada70d..eb4d8baa 100644
--- a/nptl/pthread_mutex_lock.c
+++ b/nptl/pthread_mutex_lock.c
@@ -56,6 +56,11 @@
 #define FORCE_ELISION(m, s)
 #endif
 
+#ifndef LLL_MUTEX_READ_LOCK
+# define LLL_MUTEX_READ_LOCK(mutex) \
+  atomic_load_relaxed (&(mutex)->__data.__lock)
+#endif
+
 static int __pthread_mutex_lock_full (pthread_mutex_t *mutex)
      __attribute_noinline__;
 
@@ -136,6 +141,8 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
 	      break;
 	    }
 	  atomic_spin_nop ();
+	  if (LLL_MUTEX_READ_LOCK (mutex) != 0)
+	    continue;
 	}
       while (LLL_MUTEX_TRYLOCK (mutex) != 0);
 
-- 
GitLab
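
For readers outside glibc: the change implements the classic
test-and-test-and-set spin pattern.  The sketch below is a minimal,
self-contained rendering of that pattern in portable C11, not glibc's
actual code; it uses <stdatomic.h> rather than glibc's internal
atomic_load_relaxed and LLL_MUTEX_* macros, and the names spin_lock_t,
spin_lock and spin_unlock are hypothetical, not glibc APIs.

#include <stdatomic.h>

typedef struct { atomic_int lock; } spin_lock_t;  /* hypothetical type */

static void
spin_lock (spin_lock_t *l)
{
  for (;;)
    {
      /* Spin on a plain relaxed load first.  Even a failed CAS pulls the
	 cache line in exclusive state, so avoid issuing one while the
	 lock is observed to be held.  A real implementation would also
	 insert a pause hint here (cf. atomic_spin_nop in the patch).  */
      if (atomic_load_explicit (&l->lock, memory_order_relaxed) != 0)
	continue;
      /* Lock was observed free: attempt to take it with a CAS.  */
      int expected = 0;
      if (atomic_compare_exchange_weak_explicit (&l->lock, &expected, 1,
						 memory_order_acquire,
						 memory_order_relaxed))
	return;
    }
}

static void
spin_unlock (spin_lock_t *l)
{
  /* Release store: publishes the critical section to the next owner.  */
  atomic_store_explicit (&l->lock, 0, memory_order_release);
}

Gating the CAS behind a load that has observed the lock free is exactly
what LLL_MUTEX_READ_LOCK adds to the glibc loop: contended waiters read
a shared cache line instead of bouncing it in exclusive state.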