Mark Wielaard 04e90e
commit 530df882b8f60ecacaf2b9b8a719f7ea1c1d1650
Author: Julian Seward <jseward@acm.org>
Date:   Fri Nov 12 12:13:45 2021 +0100

    Bug 444399 - disInstr(arm64): unhandled instruction 0xC87F2D89 (LD{,A}XP and ST{,L}XP).
    
    This is unfortunately a big and complex patch, to implement LD{,A}XP and
    ST{,L}XP.  These were omitted from the original AArch64 v8.0 implementation
    for unknown reasons.
    
    (Background) The patch is made significantly more complex because for AArch64
    we actually have two implementations of the underlying
    Load-Linked/Store-Conditional (LL/SC) machinery: a "primary" implementation,
    which translates LL/SC more or less directly into IR and re-emits them at the
    back end, and a "fallback" implementation that implements LL/SC "manually", by
    taking advantage of the fact that Valgrind serialises thread execution, so we
    can "implement" LL/SC by simulating a reservation using fields LLSC_* in the
    guest state, and invalidating the reservation at every thread switch.
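
A minimal sketch of the fallback idea, for a single 64-bit location. The names
LLSCState, sim_ll64, sim_sc64 and sim_thread_switch are hypothetical stand-ins
for the guest_LLSC_* fields and the generated IR, not code from this patch. The
compare-and-swap uses the GCC/Clang __atomic_compare_exchange_n builtin, which
is an assumption about the compiler rather than what VEX emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the guest_LLSC_* guest state fields. */
typedef struct {
    uint64_t size;       /* transaction size in bytes; 0 means "none" */
    uint64_t addr;       /* address used by the last LL */
    uint64_t data_lo64;  /* data observed by the last LL */
    uint64_t data_hi64;  /* upper half, only used for 16-byte transactions */
} LLSCState;

/* LL: do the load first, then record the simulated reservation. */
uint64_t sim_ll64(LLSCState *st, const uint64_t *ea) {
    uint64_t v = *ea;
    st->data_lo64 = v;
    st->data_hi64 = 0;
    st->addr = (uint64_t)(uintptr_t)ea;
    st->size = 8;
    return v;
}

/* SC: returns 0 on success and 1 on failure, like the Ws status register. */
uint32_t sim_sc64(LLSCState *st, uint64_t *ea, uint64_t new_val) {
    uint64_t size = st->size;
    st->size = 0;                        /* the transaction is always consumed */
    if (size != 8) return 1;             /* no or wrong-size transaction */
    if (st->addr != (uint64_t)(uintptr_t)ea) return 1;  /* address mismatch */
    if (*ea != st->data_lo64) return 1;  /* memory changed since the LL */
    uint64_t expd = st->data_lo64;
    /* Try to CAS the new value in; fail if the old value was not expected. */
    if (!__atomic_compare_exchange_n(ea, &expd, new_val, false,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
        return 1;
    return 0;
}

/* A thread switch invalidates any outstanding reservation. */
void sim_thread_switch(LLSCState *st) {
    st->size = 0;
}
```

An SC with no preceding LL, or one separated from its LL by a simulated thread
switch, fails in this model and leaves memory untouched.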
    
    (Background) The fallback scheme is needed because the primary scheme is in
    violation of the ARMv8 semantics in that it can (easily) introduce extra
    memory references between the LL and SC.  On some hardware that causes the
    reservation to always fail, so the simulated program winds up looping
    forever.
    
    For these instructions, big picture:
    
    * for the primary implementation, we take advantage of the fact that
      IRStmt_LLSC allows 128-bit (I128) transactions to be represented.  Hence we
      bundle up the two 64-bit data elements into an I128 (or vice versa) and
      present a single I128-typed IRStmt_LLSC in the IR.  In the backend, those are
      re-emitted as LDXP/STXP respectively.  For LL/SC on 32-bit register pairs,
      that bundling produces a single 64-bit item, and so the existing LL/SC
      backend machinery handles it.  The effect is that a doubleword 32-bit LL/SC
      in the front end translates into a single 64-bit LL/SC in the back end.
      Overall, though, the implementation is straightforward.
    
    * for the fallback implementation, it is necessary to extend the guest state
      field `guest_LLSC_DATA` to represent a 128-bit transaction, by splitting it
      into _DATA_LO64 and _DATA_HI64.  Then, the implementation is an exact
      analogue of the fallback implementation for single-word LL/SC.  It takes
      advantage of the fact that the backend already supports 128-bit CAS, as
      fixed in bug 445354.  As with the primary implementation, doubleword 32-bit
      LL/SC is bundled into a single 64-bit transaction.
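
The bundling in both bullets can be sketched in plain C. This is an
illustration only: pair32_to_u64, u64_lo32, u64_hi32, Pair128 and pair64_to_128
are hypothetical names, and the sketch assumes a little-endian guest, where Rt1
lives at the lower address and so forms the least significant half.

```c
#include <assert.h>
#include <stdint.h>

/* 2 x 32-bit case: merge the pair into one 64-bit item (Rt2 high, Rt1 low). */
uint64_t pair32_to_u64(uint32_t rt1, uint32_t rt2) {
    return ((uint64_t)rt2 << 32) | (uint64_t)rt1;
}

/* Split a loaded 64-bit item back into the two 32-bit elements. */
uint32_t u64_lo32(uint64_t v) { return (uint32_t)v; }          /* goes to Rt1 */
uint32_t u64_hi32(uint64_t v) { return (uint32_t)(v >> 32); }  /* goes to Rt2 */

/* 2 x 64-bit case: a 128-bit value modelled as two 64-bit halves. */
typedef struct {
    uint64_t lo64;  /* Rt1, lower address */
    uint64_t hi64;  /* Rt2, higher address */
} Pair128;

Pair128 pair64_to_128(uint64_t rt1, uint64_t rt2) {
    Pair128 p = { rt1, rt2 };
    return p;
}
```

Splitting a merged value with u64_lo32 and u64_hi32 returns the original Rt1
and Rt2, which is the round trip the front end relies on.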
    
    Detailed changes:
    
    * new arm64 guest state fields LLSC_DATA_LO64/LLSC_DATA_HI64 to replace
      guest_LLSC_DATA
    
    * (ridealong fix) arm64 front end: a fix to a minor and harmless decoding bug
      for the single-word LDX/STX case.
    
    * arm64 front end: IR generation for LD{,A}XP/ST{,L}XP: tedious and
      longwinded, but per comments above, an exact(ish) analogue of the
      single-word case
    
    * arm64 backend: new insns ARM64Instr_LdrEXP / ARM64Instr_StrEXP to wrap up 2
      x 64 exclusive loads/stores.  Per comments above, there's no need to handle
      the 2 x 32 case.
    
    * arm64 isel: translate I128-typed IRStmt_LLSC into the above two insns
    
    * arm64 isel: some auxiliary bits and pieces needed to handle I128 values;
      this is standard doubleword isel stuff
    
    * arm64 isel: (ridealong fix): Ist_CAS: check for endianness of the CAS!
    
    * arm64 isel: (ridealong) a couple of formatting fixes
    
    * IR infrastructure: add support for I128 constants, done the same as V128
      constants
    
    * memcheck: handle shadow loads and stores for I128 values
    
    * testcase: memcheck/tests/atomic_incs.c: on arm64, also test 128-bit atomic
      addition, to check we really have atomicity right
    
    * testcase: new test none/tests/arm64/ldxp_stxp.c, tests operation but not
      atomicity.  (Smoke test).
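
As a rough illustration of the front-end decode for the pair instructions, the
sketch below extracts the same bit fields as the new code in
dis_ARM64_load_store. PairExclFields, bits and decode_pair_exclusive are
hypothetical names used only here; the field positions follow the encoding
table in the patch (bit 30 is the 1-bit element size, bit 22 selects load,
bit 15 selects acquire/release).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
typedef struct {
    bool     elem_is_64;     /* bit 30: 1 = 2x64-bit, 0 = 2x32-bit */
    bool     is_load;        /* bit 22 */
    bool     is_acq_or_rel;  /* bit 15 */
    unsigned ss;             /* bits 20:16, status register (stores) */
    unsigned tt2;            /* bits 14:10 */
    unsigned nn;             /* bits 9:5, base register */
    unsigned tt1;            /* bits 4:0 */
} PairExclFields;

/* Extract insn[hi:lo]; only used here with fields narrower than 32 bits. */
static inline uint32_t bits(uint32_t insn, unsigned hi, unsigned lo) {
    return (insn >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Returns true if insn is LD{,A}XP or ST{,L}XP and fills in *out. */
bool decode_pair_exclusive(uint32_t insn, PairExclFields *out) {
    if (bits(insn, 31, 31) != 1) return false;
    if (bits(insn, 29, 24) != 0x08) return false;         /* 001000 */
    if ((bits(insn, 23, 21) & 0x5) != 0x1) return false;  /* 0x1 pattern */
    out->elem_is_64    = bits(insn, 30, 30) == 1;
    out->is_load       = bits(insn, 22, 22) == 1;
    out->is_acq_or_rel = bits(insn, 15, 15) == 1;
    out->ss            = bits(insn, 20, 16);
    out->tt2           = bits(insn, 14, 10);
    out->nn            = bits(insn, 9, 5);
    out->tt1           = bits(insn, 4, 0);
    /* Loads must have the status field set to 11111. */
    if (out->is_load && out->ss != 31) return false;
    return true;
}
```

Under this sketch the instruction from the bug title, 0xC87F2D89, decodes as a
64-bit LDXP with Rt1 = 9, Rt2 = 11 and base register 12.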
diff --git a/VEX/priv/guest_arm64_toIR.c b/VEX/priv/guest_arm64_toIR.c
index 12a1c5978..ee018c6a9 100644
--- a/VEX/priv/guest_arm64_toIR.c
+++ b/VEX/priv/guest_arm64_toIR.c
@@ -1184,9 +1184,10 @@ static IRExpr* narrowFrom64 ( IRType dstTy, IRExpr* e )
 #define OFFB_CMSTART  offsetof(VexGuestARM64State,guest_CMSTART)
 #define OFFB_CMLEN    offsetof(VexGuestARM64State,guest_CMLEN)
 
-#define OFFB_LLSC_SIZE offsetof(VexGuestARM64State,guest_LLSC_SIZE)
-#define OFFB_LLSC_ADDR offsetof(VexGuestARM64State,guest_LLSC_ADDR)
-#define OFFB_LLSC_DATA offsetof(VexGuestARM64State,guest_LLSC_DATA)
+#define OFFB_LLSC_SIZE      offsetof(VexGuestARM64State,guest_LLSC_SIZE)
+#define OFFB_LLSC_ADDR      offsetof(VexGuestARM64State,guest_LLSC_ADDR)
+#define OFFB_LLSC_DATA_LO64 offsetof(VexGuestARM64State,guest_LLSC_DATA_LO64)
+#define OFFB_LLSC_DATA_HI64 offsetof(VexGuestARM64State,guest_LLSC_DATA_HI64)
 
 
 /* ---------------- Integer registers ---------------- */
@@ -6652,7 +6653,7 @@ Bool dis_ARM64_load_store(/*MB_OUT*/DisResult* dres, UInt insn,
         (coregrind/m_scheduler/scheduler.c, run_thread_for_a_while()
          has to do this bit)
    */   
-   if (INSN(29,23) == BITS7(0,0,1,0,0,0,0)
+   if (INSN(29,24) == BITS6(0,0,1,0,0,0)
        && (INSN(23,21) & BITS3(1,0,1)) == BITS3(0,0,0)
        && INSN(14,10) == BITS5(1,1,1,1,1)) {
       UInt szBlg2     = INSN(31,30);
@@ -6678,7 +6679,8 @@ Bool dis_ARM64_load_store(/*MB_OUT*/DisResult* dres, UInt insn,
             // if it faults.
             IRTemp loaded_data64 = newTemp(Ity_I64);
             assign(loaded_data64, widenUto64(ty, loadLE(ty, mkexpr(ea))));
-            stmt( IRStmt_Put( OFFB_LLSC_DATA, mkexpr(loaded_data64) ));
+            stmt( IRStmt_Put( OFFB_LLSC_DATA_LO64, mkexpr(loaded_data64) ));
+            stmt( IRStmt_Put( OFFB_LLSC_DATA_HI64, mkU64(0) ));
             stmt( IRStmt_Put( OFFB_LLSC_ADDR, mkexpr(ea) ));
             stmt( IRStmt_Put( OFFB_LLSC_SIZE, mkU64(szB) ));
             putIReg64orZR(tt, mkexpr(loaded_data64));
@@ -6729,7 +6731,7 @@ Bool dis_ARM64_load_store(/*MB_OUT*/DisResult* dres, UInt insn,
             ));
             // Fail if the data doesn't match the LL data
             IRTemp llsc_data64 = newTemp(Ity_I64);
-            assign(llsc_data64, IRExpr_Get(OFFB_LLSC_DATA, Ity_I64));
+            assign(llsc_data64, IRExpr_Get(OFFB_LLSC_DATA_LO64, Ity_I64));
             stmt( IRStmt_Exit(
                       binop(Iop_CmpNE64, widenUto64(ty, loadLE(ty, mkexpr(ea))),
                                          mkexpr(llsc_data64)),
@@ -6771,6 +6773,257 @@ Bool dis_ARM64_load_store(/*MB_OUT*/DisResult* dres, UInt insn,
       /* else fall through */
    }
 
+   /* -------------------- LD{,A}XP -------------------- */
+   /* -------------------- ST{,L}XP -------------------- */
+   /* 31 30 29     23  20    15 14  9  4
+       1 sz 001000 011 11111 0  t2  n  t1   LDXP  Rt1, Rt2, [Xn|SP]
+       1 sz 001000 011 11111 1  t2  n  t1   LDAXP Rt1, Rt2, [Xn|SP]
+       1 sz 001000 001 s     0  t2  n  t1   STXP  Ws, Rt1, Rt2, [Xn|SP]
+       1 sz 001000 001 s     1  t2  n  t1   STLXP Ws, Rt1, Rt2, [Xn|SP]
+   */
+   /* See just above, "LD{,A}X{R,RH,RB} / ST{,L}X{R,RH,RB}", for detailed
+      comments about this implementation.  Note the 'sz' field here is only 1
+      bit; above, it is 2 bits, and has a different encoding.
+   */
+   if (INSN(31,31) == 1
+       && INSN(29,24) == BITS6(0,0,1,0,0,0)
+       && (INSN(23,21) & BITS3(1,0,1)) == BITS3(0,0,1)) {
+      Bool elemIs64   = INSN(30,30) == 1;
+      Bool isLD       = INSN(22,22) == 1;
+      Bool isAcqOrRel = INSN(15,15) == 1;
+      UInt ss         = INSN(20,16);
+      UInt tt2        = INSN(14,10);
+      UInt nn         = INSN(9,5);
+      UInt tt1        = INSN(4,0);
+
+      UInt   elemSzB = elemIs64 ? 8 : 4;
+      UInt   fullSzB = 2 * elemSzB;
+      IRType elemTy  = integerIRTypeOfSize(elemSzB);
+      IRType fullTy  = integerIRTypeOfSize(fullSzB);
+
+      IRTemp ea = newTemp(Ity_I64);
+      assign(ea, getIReg64orSP(nn));
+      /* FIXME generate check that ea is 2*elemSzB-aligned */
+
+      if (isLD && ss == BITS5(1,1,1,1,1)) {
+         if (abiinfo->guest__use_fallback_LLSC) {
+            // Fallback implementation of LL.
+            // Do the load first so we don't update any guest state if it
+            // faults.  Assumes little-endian guest.
+            if (fullTy == Ity_I64) {
+               vassert(elemSzB == 4);
+               IRTemp loaded_data64 = newTemp(Ity_I64);
+               assign(loaded_data64, loadLE(fullTy, mkexpr(ea)));
+               stmt( IRStmt_Put( OFFB_LLSC_DATA_LO64, mkexpr(loaded_data64) ));
+               stmt( IRStmt_Put( OFFB_LLSC_DATA_HI64, mkU64(0) ));
+               stmt( IRStmt_Put( OFFB_LLSC_ADDR, mkexpr(ea) ));
+               stmt( IRStmt_Put( OFFB_LLSC_SIZE, mkU64(8) ));
+               putIReg64orZR(tt1, unop(Iop_32Uto64,
+                                       unop(Iop_64to32,
+                                            mkexpr(loaded_data64))));
+               putIReg64orZR(tt2, unop(Iop_32Uto64,
+                                       unop(Iop_64HIto32,
+                                            mkexpr(loaded_data64))));
+            } else {
+               vassert(elemSzB == 8 && fullTy == Ity_I128);
+               IRTemp loaded_data128 = newTemp(Ity_I128);
+               // Hack: do the load as V128 rather than I128 so as to avoid
+               // having to implement I128 loads in the arm64 back end.
+               assign(loaded_data128, unop(Iop_ReinterpV128asI128,
+                                           loadLE(Ity_V128, mkexpr(ea))));
+               IRTemp loaded_data_lo64 = newTemp(Ity_I64);
+               IRTemp loaded_data_hi64 = newTemp(Ity_I64);
+               assign(loaded_data_lo64, unop(Iop_128to64,
+                                             mkexpr(loaded_data128)));
+               assign(loaded_data_hi64, unop(Iop_128HIto64,
+                                             mkexpr(loaded_data128)));
+               stmt( IRStmt_Put( OFFB_LLSC_DATA_LO64,
+                                 mkexpr(loaded_data_lo64) ));
+               stmt( IRStmt_Put( OFFB_LLSC_DATA_HI64,
+                                 mkexpr(loaded_data_hi64) ));
+               stmt( IRStmt_Put( OFFB_LLSC_ADDR, mkexpr(ea) ));
+               stmt( IRStmt_Put( OFFB_LLSC_SIZE, mkU64(16) ));
+               putIReg64orZR(tt1, mkexpr(loaded_data_lo64));
+               putIReg64orZR(tt2, mkexpr(loaded_data_hi64));
+            }
+         } else {
+            // Non-fallback implementation of LL.
+            IRTemp res = newTemp(fullTy); // I64 or I128
+            stmt(IRStmt_LLSC(Iend_LE, res, mkexpr(ea), NULL/*LL*/));
+            // Assuming a little-endian guest here.  Rt1 goes at the lower
+            // address, so it must live in the least significant half of `res`.
+            IROp opGetLO = fullTy == Ity_I128 ? Iop_128to64   : Iop_64to32;
+            IROp opGetHI = fullTy == Ity_I128 ? Iop_128HIto64 : Iop_64HIto32;
+            putIReg64orZR(tt1, widenUto64(elemTy, unop(opGetLO, mkexpr(res))));
+            putIReg64orZR(tt2, widenUto64(elemTy, unop(opGetHI, mkexpr(res))));
+         }
+         if (isAcqOrRel) {
+            stmt(IRStmt_MBE(Imbe_Fence));
+         }
+         DIP("ld%sxp %s, %s, [%s] %s\n",
+             isAcqOrRel ? (isLD ? "a" : "l") : "",
+             nameIRegOrZR(elemSzB == 8, tt1),
+             nameIRegOrZR(elemSzB == 8, tt2),
+             nameIReg64orSP(nn),
+             abiinfo->guest__use_fallback_LLSC
+                ? "(fallback implementation)" : "");
+         return True;
+      }
+      if (!isLD) {
+         if (isAcqOrRel) {
+            stmt(IRStmt_MBE(Imbe_Fence));
+         }
+         if (abiinfo->guest__use_fallback_LLSC) {
+            // Fallback implementation of SC.
+            // This is really ugly, since we don't have any way to do
+            // proper if-then-else.  First, set up as if the SC failed,
+            // and jump forwards if it really has failed.
+
+            // Continuation address
+            IRConst* nia = IRConst_U64(guest_PC_curr_instr + 4);
+
+            // "the SC failed".  Any non-zero value means failure.
+            putIReg64orZR(ss, mkU64(1));
+
+            IRTemp tmp_LLsize = newTemp(Ity_I64);
+            assign(tmp_LLsize, IRExpr_Get(OFFB_LLSC_SIZE, Ity_I64));
+            stmt( IRStmt_Put( OFFB_LLSC_SIZE, mkU64(0) // "no transaction"
+            ));
+            // Fail if no or wrong-size transaction
+            vassert((fullSzB == 8 && fullTy == Ity_I64)
+                    || (fullSzB == 16 && fullTy == Ity_I128));
+            stmt( IRStmt_Exit(
+                     binop(Iop_CmpNE64, mkexpr(tmp_LLsize), mkU64(fullSzB)),
+                     Ijk_Boring, nia, OFFB_PC
+            ));
+            // Fail if the address doesn't match the LL address
+            stmt( IRStmt_Exit(
+                      binop(Iop_CmpNE64, mkexpr(ea),
+                                         IRExpr_Get(OFFB_LLSC_ADDR, Ity_I64)),
+                      Ijk_Boring, nia, OFFB_PC
+            ));
+            // The data to be stored.
+            IRTemp store_data = newTemp(fullTy);
+            if (fullTy == Ity_I64) {
+               assign(store_data,
+                      binop(Iop_32HLto64,
+                            narrowFrom64(Ity_I32, getIReg64orZR(tt2)),
+                            narrowFrom64(Ity_I32, getIReg64orZR(tt1))));
+            } else {
+               assign(store_data,
+                      binop(Iop_64HLto128,
+                            getIReg64orZR(tt2), getIReg64orZR(tt1)));
+            }
+
+            if (fullTy == Ity_I64) {
+               // 64 bit (2x32 bit) path
+               // Fail if the data in memory doesn't match the data stashed by
+               // the LL.
+               IRTemp llsc_data_lo64 = newTemp(Ity_I64);
+               assign(llsc_data_lo64,
+                      IRExpr_Get(OFFB_LLSC_DATA_LO64, Ity_I64));
+               stmt( IRStmt_Exit(
+                         binop(Iop_CmpNE64, loadLE(Ity_I64, mkexpr(ea)),
+                                            mkexpr(llsc_data_lo64)),
+                      Ijk_Boring, nia, OFFB_PC
+               ));
+               // Try to CAS the new value in.
+               IRTemp old = newTemp(Ity_I64);
+               IRTemp expd = newTemp(Ity_I64);
+               assign(expd, mkexpr(llsc_data_lo64));
+               stmt( IRStmt_CAS(mkIRCAS(/*oldHi*/IRTemp_INVALID, old,
+                                        Iend_LE, mkexpr(ea),
+                                        /*expdHi*/NULL, mkexpr(expd),
+                                        /*dataHi*/NULL, mkexpr(store_data)
+               )));
+               // Fail if the CAS failed (viz, old != expd)
+               stmt( IRStmt_Exit(
+                         binop(Iop_CmpNE64, mkexpr(old), mkexpr(expd)),
+                         Ijk_Boring, nia, OFFB_PC
+               ));
+            } else {
+               // 128 bit (2x64 bit) path
+               // Fail if the data in memory doesn't match the data stashed by
+               // the LL.
+               IRTemp llsc_data_lo64 = newTemp(Ity_I64);
+               assign(llsc_data_lo64,
+                      IRExpr_Get(OFFB_LLSC_DATA_LO64, Ity_I64));
+               IRTemp llsc_data_hi64 = newTemp(Ity_I64);
+               assign(llsc_data_hi64,
+                      IRExpr_Get(OFFB_LLSC_DATA_HI64, Ity_I64));
+               IRTemp data_at_ea = newTemp(Ity_I128);
+               assign(data_at_ea,
+                      unop(Iop_ReinterpV128asI128,
+                           loadLE(Ity_V128, mkexpr(ea))));
+               stmt( IRStmt_Exit(
+                        binop(Iop_CmpNE64,
+                              unop(Iop_128to64, mkexpr(data_at_ea)),
+                              mkexpr(llsc_data_lo64)),
+                        Ijk_Boring, nia, OFFB_PC
+               ));
+               stmt( IRStmt_Exit(
+                        binop(Iop_CmpNE64,
+                              unop(Iop_128HIto64, mkexpr(data_at_ea)),
+                              mkexpr(llsc_data_hi64)),
+                        Ijk_Boring, nia, OFFB_PC
+               ));
+               // Try to CAS the new value in.
+               IRTemp old_lo64 = newTemp(Ity_I64);
+               IRTemp old_hi64 = newTemp(Ity_I64);
+               IRTemp expd_lo64 = newTemp(Ity_I64);
+               IRTemp expd_hi64 = newTemp(Ity_I64);
+               IRTemp store_data_lo64 = newTemp(Ity_I64);
+               IRTemp store_data_hi64 = newTemp(Ity_I64);
+               assign(expd_lo64, mkexpr(llsc_data_lo64));
+               assign(expd_hi64, mkexpr(llsc_data_hi64));
+               assign(store_data_lo64, unop(Iop_128to64, mkexpr(store_data)));
+               assign(store_data_hi64, unop(Iop_128HIto64, mkexpr(store_data)));
+               stmt( IRStmt_CAS(mkIRCAS(old_hi64, old_lo64,
+                                        Iend_LE, mkexpr(ea),
+                                        mkexpr(expd_hi64), mkexpr(expd_lo64),
+                                        mkexpr(store_data_hi64),
+                                        mkexpr(store_data_lo64)
+               )));
+               // Fail if the CAS failed (viz, old != expd)
+               stmt( IRStmt_Exit(
+                        binop(Iop_CmpNE64, mkexpr(old_lo64), mkexpr(expd_lo64)),
+                        Ijk_Boring, nia, OFFB_PC
+               ));
+               stmt( IRStmt_Exit(
+                        binop(Iop_CmpNE64, mkexpr(old_hi64), mkexpr(expd_hi64)),
+                        Ijk_Boring, nia, OFFB_PC
+               ));
+            }
+            // Otherwise we succeeded (!)
+            putIReg64orZR(ss, mkU64(0));
+         } else {
+            // Non-fallback implementation of SC.
+            IRTemp  res     = newTemp(Ity_I1);
+            IRExpr* dataLO  = narrowFrom64(elemTy, getIReg64orZR(tt1));
+            IRExpr* dataHI  = narrowFrom64(elemTy, getIReg64orZR(tt2));
+            IROp    opMerge = fullTy == Ity_I128 ? Iop_64HLto128 : Iop_32HLto64;
+            IRExpr* data    = binop(opMerge, dataHI, dataLO);
+            // Assuming a little-endian guest here.  Rt1 goes at the lower
+            // address, so it must live in the least significant half of `data`.
+            stmt(IRStmt_LLSC(Iend_LE, res, mkexpr(ea), data));
+            /* IR semantics: res is 1 if store succeeds, 0 if it fails.
+               Need to set rS to 1 on failure, 0 on success. */
+            putIReg64orZR(ss, binop(Iop_Xor64, unop(Iop_1Uto64, mkexpr(res)),
+                                               mkU64(1)));
+         }
+         DIP("st%sxp %s, %s, %s, [%s] %s\n",
+             isAcqOrRel ? (isLD ? "a" : "l") : "",
+             nameIRegOrZR(False, ss),
+             nameIRegOrZR(elemSzB == 8, tt1),
+             nameIRegOrZR(elemSzB == 8, tt2),
+             nameIReg64orSP(nn),
+             abiinfo->guest__use_fallback_LLSC
+                ? "(fallback implementation)" : "");
+         return True;
+      }
+      /* else fall through */
+   }
+
    /* ------------------ LDA{R,RH,RB} ------------------ */
    /* ------------------ STL{R,RH,RB} ------------------ */
    /* 31 29     23  20      14    9 4
diff --git a/VEX/priv/host_arm64_defs.c b/VEX/priv/host_arm64_defs.c
index 5657bcab9..b65e27db4 100644
--- a/VEX/priv/host_arm64_defs.c
+++ b/VEX/priv/host_arm64_defs.c
@@ -1059,6 +1059,16 @@ ARM64Instr* ARM64Instr_StrEX ( Int szB ) {
    vassert(szB == 8 || szB == 4 || szB == 2 || szB == 1);
    return i;
 }
+ARM64Instr* ARM64Instr_LdrEXP ( void ) {
+   ARM64Instr* i = LibVEX_Alloc_inline(sizeof(ARM64Instr));
+   i->tag        = ARM64in_LdrEXP;
+   return i;
+}
+ARM64Instr* ARM64Instr_StrEXP ( void ) {
+   ARM64Instr* i = LibVEX_Alloc_inline(sizeof(ARM64Instr));
+   i->tag        = ARM64in_StrEXP;
+   return i;
+}
 ARM64Instr* ARM64Instr_CAS ( Int szB ) {
    ARM64Instr* i = LibVEX_Alloc_inline(sizeof(ARM64Instr));
    i->tag             = ARM64in_CAS;
@@ -1699,12 +1709,19 @@ void ppARM64Instr ( const ARM64Instr* i ) {
                     sz, i->ARM64in.StrEX.szB == 8 ? 'x' : 'w');
          return;
       }
+      case ARM64in_LdrEXP:
+         vex_printf("ldxp   x2, x3, [x4]");
+         return;
+      case ARM64in_StrEXP:
+         vex_printf("stxp   w0, x2, x3, [x4]");
+         return;
       case ARM64in_CAS: {
          vex_printf("x1 = cas(%dbit)(x3, x5 -> x7)", 8 * i->ARM64in.CAS.szB);
          return;
       }
       case ARM64in_CASP: {
-         vex_printf("x0,x1 = casp(%dbit)(x2, x4,x5 -> x6,x7)", 8 * i->ARM64in.CASP.szB);
+         vex_printf("x0,x1 = casp(2x%dbit)(x2, x4,x5 -> x6,x7)",
+                    8 * i->ARM64in.CASP.szB);
          return;
       }
       case ARM64in_MFence:
@@ -2253,6 +2270,17 @@ void getRegUsage_ARM64Instr ( HRegUsage* u, const ARM64Instr* i, Bool mode64 )
          addHRegUse(u, HRmWrite, hregARM64_X0());
          addHRegUse(u, HRmRead, hregARM64_X2());
          return;
+      case ARM64in_LdrEXP:
+         addHRegUse(u, HRmRead, hregARM64_X4());
+         addHRegUse(u, HRmWrite, hregARM64_X2());
+         addHRegUse(u, HRmWrite, hregARM64_X3());
+         return;
+      case ARM64in_StrEXP:
+         addHRegUse(u, HRmRead, hregARM64_X4());
+         addHRegUse(u, HRmWrite, hregARM64_X0());
+         addHRegUse(u, HRmRead, hregARM64_X2());
+         addHRegUse(u, HRmRead, hregARM64_X3());
+         return;
       case ARM64in_CAS:
          addHRegUse(u, HRmRead, hregARM64_X3());
          addHRegUse(u, HRmRead, hregARM64_X5());
@@ -2571,6 +2599,10 @@ void mapRegs_ARM64Instr ( HRegRemap* m, ARM64Instr* i, Bool mode64 )
          return;
       case ARM64in_StrEX:
          return;
+      case ARM64in_LdrEXP:
+         return;
+      case ARM64in_StrEXP:
+         return;
       case ARM64in_CAS:
          return;
       case ARM64in_CASP:
@@ -4167,6 +4199,16 @@ Int emit_ARM64Instr ( /*MB_MOD*/Bool* is_profInc,
          }
          goto bad;
       }
+      case ARM64in_LdrEXP: {
+         // 820C7FC8   ldxp x2, x3, [x4]
+         *p++ = 0xC87F0C82;
+         goto done;
+      }
+      case ARM64in_StrEXP: {
+         // 820C20C8   stxp w0, x2, x3, [x4]
+         *p++ = 0xC8200C82;
+         goto done;
+      }
       case ARM64in_CAS: {
          /* This isn't simple.  For an explanation see the comment in
             host_arm64_defs.h on the definition of ARM64Instr case CAS.
Mark Wielaard 04e90e
diff --git a/VEX/priv/host_arm64_defs.h b/VEX/priv/host_arm64_defs.h
index 01fb5708e..dc686dff7 100644
--- a/VEX/priv/host_arm64_defs.h
+++ b/VEX/priv/host_arm64_defs.h
@@ -509,8 +509,10 @@ typedef
       ARM64in_AddToSP,     /* move SP by small, signed constant */
       ARM64in_FromSP,      /* move SP to integer register */
       ARM64in_Mul,
-      ARM64in_LdrEX,
-      ARM64in_StrEX,
+      ARM64in_LdrEX,       /* load exclusive, single register */
+      ARM64in_StrEX,       /* store exclusive, single register */
+      ARM64in_LdrEXP,      /* load exclusive, register pair, 2x64-bit only */
+      ARM64in_StrEXP,      /* store exclusive, register pair, 2x64-bit only */
       ARM64in_CAS,
       ARM64in_CASP,
       ARM64in_MFence,
@@ -719,6 +721,12 @@ typedef
          struct {
             Int  szB; /* 1, 2, 4 or 8 */
          } StrEX;
+         /* LDXP x2, x3, [x4].  This is 2x64-bit only. */
+         struct {
+         } LdrEXP;
+         /* STXP w0, x2, x3, [x4].  This is 2x64-bit only. */
+         struct {
+         } StrEXP;
          /* x1 = CAS(x3(addr), x5(expected) -> x7(new)),
             and trashes x8
             where x1[8*szB-1 : 0] == x5[8*szB-1 : 0] indicates success,
@@ -1037,6 +1045,8 @@ extern ARM64Instr* ARM64Instr_Mul     ( HReg dst, HReg argL, HReg argR,
                                         ARM64MulOp op );
 extern ARM64Instr* ARM64Instr_LdrEX   ( Int szB );
 extern ARM64Instr* ARM64Instr_StrEX   ( Int szB );
+extern ARM64Instr* ARM64Instr_LdrEXP  ( void );
+extern ARM64Instr* ARM64Instr_StrEXP  ( void );
 extern ARM64Instr* ARM64Instr_CAS     ( Int szB );
 extern ARM64Instr* ARM64Instr_CASP    ( Int szB );
 extern ARM64Instr* ARM64Instr_MFence  ( void );
diff --git a/VEX/priv/host_arm64_isel.c b/VEX/priv/host_arm64_isel.c
index 4b1d8c846..094e7e74b 100644
--- a/VEX/priv/host_arm64_isel.c
+++ b/VEX/priv/host_arm64_isel.c
@@ -196,9 +196,9 @@ static HReg        iselCondCode_R        ( ISelEnv* env, IRExpr* e );
 static HReg        iselIntExpr_R_wrk     ( ISelEnv* env, IRExpr* e );
 static HReg        iselIntExpr_R         ( ISelEnv* env, IRExpr* e );
 
-static void        iselInt128Expr_wrk    ( /*OUT*/HReg* rHi, HReg* rLo, 
+static void        iselInt128Expr_wrk    ( /*OUT*/HReg* rHi, /*OUT*/HReg* rLo,
                                            ISelEnv* env, IRExpr* e );
-static void        iselInt128Expr        ( /*OUT*/HReg* rHi, HReg* rLo, 
+static void        iselInt128Expr        ( /*OUT*/HReg* rHi, /*OUT*/HReg* rLo,
                                            ISelEnv* env, IRExpr* e );
 
 static HReg        iselDblExpr_wrk        ( ISelEnv* env, IRExpr* e );
@@ -1759,9 +1759,12 @@ static HReg iselIntExpr_R_wrk ( ISelEnv* env, IRExpr* e )
 
       /* AND/OR/XOR(e1, e2) (for any e1, e2) */
       switch (e->Iex.Binop.op) {
-         case Iop_And64: case Iop_And32: lop = ARM64lo_AND; goto log_binop;
-         case Iop_Or64:  case Iop_Or32:  case Iop_Or16: lop = ARM64lo_OR;  goto log_binop;
-         case Iop_Xor64: case Iop_Xor32: lop = ARM64lo_XOR; goto log_binop;
+         case Iop_And64: case Iop_And32:
+            lop = ARM64lo_AND; goto log_binop;
+         case Iop_Or64:  case Iop_Or32:  case Iop_Or16:
+            lop = ARM64lo_OR;  goto log_binop;
+         case Iop_Xor64: case Iop_Xor32:
+            lop = ARM64lo_XOR; goto log_binop;
          log_binop: {
             HReg      dst  = newVRegI(env);
             HReg      argL = iselIntExpr_R(env, e->Iex.Binop.arg1);
@@ -2013,6 +2016,11 @@ static HReg iselIntExpr_R_wrk ( ISelEnv* env, IRExpr* e )
             iselInt128Expr(&rHi,&rLo, env, e->Iex.Unop.arg);
             return rHi; /* and abandon rLo */
          }
+         case Iop_128to64: {
+            HReg rHi, rLo;
+            iselInt128Expr(&rHi,&rLo, env, e->Iex.Unop.arg);
+            return rLo; /* and abandon rHi */
+         }
          case Iop_8Sto32: case Iop_8Sto64: {
             IRExpr* arg = e->Iex.Unop.arg;
             HReg    src = iselIntExpr_R(env, arg);
@@ -2185,13 +2193,19 @@ static HReg iselIntExpr_R_wrk ( ISelEnv* env, IRExpr* e )
             }
             return dst;
          }
+         case Iop_64HIto32: {
+            HReg dst = newVRegI(env);
+            HReg src = iselIntExpr_R(env, e->Iex.Unop.arg);
+            addInstr(env, ARM64Instr_Shift(dst, src, ARM64RI6_I6(32),
+                                           ARM64sh_SHR));
+            return dst;
+         }
          case Iop_64to32:
          case Iop_64to16:
          case Iop_64to8:
          case Iop_32to16:
             /* These are no-ops. */
             return iselIntExpr_R(env, e->Iex.Unop.arg);
-
          default:
             break;
       }
@@ -2335,6 +2349,43 @@ static void iselInt128Expr_wrk ( HReg* rHi, HReg* rLo,
    vassert(e);
    vassert(typeOfIRExpr(env->type_env,e) == Ity_I128);
 
+   /* --------- TEMP --------- */
+   if (e->tag == Iex_RdTmp) {
+      lookupIRTempPair(rHi, rLo, env, e->Iex.RdTmp.tmp);
+      return;
+   }
+
+   /* --------- CONST --------- */
+   if (e->tag == Iex_Const) {
+      IRConst* c = e->Iex.Const.con;
+      vassert(c->tag == Ico_U128);
+      if (c->Ico.U128 == 0) {
+         // The only case we need to handle (so far)
+         HReg zero = newVRegI(env);
+         addInstr(env, ARM64Instr_Imm64(zero, 0));
+         *rHi = *rLo = zero;
+         return;
+      }
+   }
+
+   /* --------- UNARY ops --------- */
+   if (e->tag == Iex_Unop) {
+      switch (e->Iex.Unop.op) {
+         case Iop_ReinterpV128asI128: {
+            HReg dstHi = newVRegI(env);
+            HReg dstLo = newVRegI(env);
+            HReg src    = iselV128Expr(env, e->Iex.Unop.arg);
+            addInstr(env, ARM64Instr_VXfromQ(dstHi, src, 1));
+            addInstr(env, ARM64Instr_VXfromQ(dstLo, src, 0));
+            *rHi = dstHi;
+            *rLo = dstLo;
+            return;
+         }
+         default:
+            break;
+      }
+   }
+
    /* --------- BINARY ops --------- */
    if (e->tag == Iex_Binop) {
       switch (e->Iex.Binop.op) {
@@ -4086,6 +4137,14 @@ static void iselStmt ( ISelEnv* env, IRStmt* stmt )
          addInstr(env, ARM64Instr_VMov(8/*yes, really*/, dst, src));
          return;
       }
+      if (ty == Ity_I128) {
+         HReg rHi, rLo, dstHi, dstLo;
+         iselInt128Expr(&rHi,&rLo, env, stmt->Ist.WrTmp.data);
+         lookupIRTempPair( &dstHi, &dstLo, env, tmp);
+         addInstr(env, ARM64Instr_MovI(dstHi, rHi));
+         addInstr(env, ARM64Instr_MovI(dstLo, rLo));
+         return;
+      }
       if (ty == Ity_V128) {
          HReg src = iselV128Expr(env, stmt->Ist.WrTmp.data);
          HReg dst = lookupIRTemp(env, tmp);
@@ -4183,42 +4242,67 @@ static void iselStmt ( ISelEnv* env, IRStmt* stmt )
          /* LL */
          IRTemp res = stmt->Ist.LLSC.result;
          IRType ty  = typeOfIRTemp(env->type_env, res);
-         if (ty == Ity_I64 || ty == Ity_I32 
+         if (ty == Ity_I128 || ty == Ity_I64 || ty == Ity_I32
              || ty == Ity_I16 || ty == Ity_I8) {
             Int  szB   = 0;
-            HReg r_dst = lookupIRTemp(env, res);
             HReg raddr = iselIntExpr_R(env, stmt->Ist.LLSC.addr);
             switch (ty) {
-               case Ity_I8:  szB = 1; break;
-               case Ity_I16: szB = 2; break;
-               case Ity_I32: szB = 4; break;
-               case Ity_I64: szB = 8; break;
-               default:      vassert(0);
+               case Ity_I8:   szB = 1;  break;
+               case Ity_I16:  szB = 2;  break;
+               case Ity_I32:  szB = 4;  break;
+               case Ity_I64:  szB = 8;  break;
+               case Ity_I128: szB = 16; break;
+               default:       vassert(0);
+            }
+            if (szB == 16) {
+               HReg r_dstMSword = INVALID_HREG;
+               HReg r_dstLSword = INVALID_HREG;
+               lookupIRTempPair(&r_dstMSword, &r_dstLSword, env, res);
+               addInstr(env, ARM64Instr_MovI(hregARM64_X4(), raddr));
+               addInstr(env, ARM64Instr_LdrEXP());
+               addInstr(env, ARM64Instr_MovI(r_dstLSword, hregARM64_X2()));
+               addInstr(env, ARM64Instr_MovI(r_dstMSword, hregARM64_X3()));
+            } else {
+               vassert(szB != 0);
+               HReg r_dst = lookupIRTemp(env, res);
+               addInstr(env, ARM64Instr_MovI(hregARM64_X4(), raddr));
+               addInstr(env, ARM64Instr_LdrEX(szB));
+               addInstr(env, ARM64Instr_MovI(r_dst, hregARM64_X2()));
             }
-            addInstr(env, ARM64Instr_MovI(hregARM64_X4(), raddr));
-            addInstr(env, ARM64Instr_LdrEX(szB));
-            addInstr(env, ARM64Instr_MovI(r_dst, hregARM64_X2()));
             return;
          }
          goto stmt_fail;
       } else {
          /* SC */
          IRType tyd = typeOfIRExpr(env->type_env, stmt->Ist.LLSC.storedata);
-         if (tyd == Ity_I64 || tyd == Ity_I32
+         if (tyd == Ity_I128 || tyd == Ity_I64 || tyd == Ity_I32
              || tyd == Ity_I16 || tyd == Ity_I8) {
             Int  szB = 0;
-            HReg rD  = iselIntExpr_R(env, stmt->Ist.LLSC.storedata);
             HReg rA  = iselIntExpr_R(env, stmt->Ist.LLSC.addr);
             switch (tyd) {
-               case Ity_I8:  szB = 1; break;
-               case Ity_I16: szB = 2; break;
-               case Ity_I32: szB = 4; break;
-               case Ity_I64: szB = 8; break;
-               default:      vassert(0);
+               case Ity_I8:   szB = 1; break;
+               case Ity_I16:  szB = 2; break;
+               case Ity_I32:  szB = 4; break;
+               case Ity_I64:  szB = 8; break;
+               case Ity_I128: szB = 16; break;
+               default:       vassert(0);
+            }
+            if (szB == 16) {
+               HReg rD_MSword = INVALID_HREG;
+               HReg rD_LSword = INVALID_HREG;
+               iselInt128Expr(&rD_MSword,
+                              &rD_LSword, env, stmt->Ist.LLSC.storedata);
+               addInstr(env, ARM64Instr_MovI(hregARM64_X2(), rD_LSword));
+               addInstr(env, ARM64Instr_MovI(hregARM64_X3(), rD_MSword));
+               addInstr(env, ARM64Instr_MovI(hregARM64_X4(), rA));
+               addInstr(env, ARM64Instr_StrEXP());
+            } else {
+               vassert(szB != 0);
+               HReg rD  = iselIntExpr_R(env, stmt->Ist.LLSC.storedata);
+               addInstr(env, ARM64Instr_MovI(hregARM64_X2(), rD));
+               addInstr(env, ARM64Instr_MovI(hregARM64_X4(), rA));
+               addInstr(env, ARM64Instr_StrEX(szB));
             }
-            addInstr(env, ARM64Instr_MovI(hregARM64_X2(), rD));
-            addInstr(env, ARM64Instr_MovI(hregARM64_X4(), rA));
-            addInstr(env, ARM64Instr_StrEX(szB));
          } else {
             goto stmt_fail;
          }
@@ -4243,10 +4327,10 @@ static void iselStmt ( ISelEnv* env, IRStmt* stmt )
 
    /* --------- ACAS --------- */
    case Ist_CAS: {
-      if (stmt->Ist.CAS.details->oldHi == IRTemp_INVALID) {
+      IRCAS* cas = stmt->Ist.CAS.details;
+      if (cas->oldHi == IRTemp_INVALID && cas->end == Iend_LE) {
          /* "normal" singleton CAS */
          UChar  sz;
-         IRCAS* cas = stmt->Ist.CAS.details;
          IRType ty  = typeOfIRExpr(env->type_env, cas->dataLo);
          switch (ty) { 
             case Ity_I64: sz = 8; break;
@@ -4281,10 +4365,9 @@ static void iselStmt ( ISelEnv* env, IRStmt* stmt )
          addInstr(env, ARM64Instr_MovI(rOld, rResult));
          return;
       }
-      else {
+      if (cas->oldHi != IRTemp_INVALID && cas->end == Iend_LE) {
          /* Paired register CAS, i.e. CASP */
          UChar  sz;
-         IRCAS* cas = stmt->Ist.CAS.details;
          IRType ty  = typeOfIRExpr(env->type_env, cas->dataLo);
          switch (ty) {
             case Ity_I64: sz = 8; break;
diff --git a/VEX/priv/ir_defs.c b/VEX/priv/ir_defs.c
index 25566c41c..2d82c41a1 100644
--- a/VEX/priv/ir_defs.c
+++ b/VEX/priv/ir_defs.c
@@ -76,6 +76,7 @@ void ppIRConst ( const IRConst* con )
       case Ico_U16:  vex_printf( "0x%x:I16",     (UInt)(con->Ico.U16)); break;
       case Ico_U32:  vex_printf( "0x%x:I32",     (UInt)(con->Ico.U32)); break;
       case Ico_U64:  vex_printf( "0x%llx:I64",   (ULong)(con->Ico.U64)); break;
+      case Ico_U128: vex_printf( "I128{0x%04x}", (UInt)(con->Ico.U128)); break;
       case Ico_F32:  u.f32 = con->Ico.F32;
                      vex_printf( "F32{0x%x}",   u.i32);
                      break;
@@ -2266,6 +2267,13 @@ IRConst* IRConst_U64 ( ULong u64 )
    c->Ico.U64 = u64;
    return c;
 }
+IRConst* IRConst_U128 ( UShort con )
+{
+   IRConst* c  = LibVEX_Alloc_inline(sizeof(IRConst));
+   c->tag      = Ico_U128;
+   c->Ico.U128 = con;
+   return c;
+}
 IRConst* IRConst_F32 ( Float f32 )
 {
    IRConst* c = LibVEX_Alloc_inline(sizeof(IRConst));
@@ -4230,6 +4238,7 @@ IRType typeOfIRConst ( const IRConst* con )
       case Ico_U16:   return Ity_I16;
       case Ico_U32:   return Ity_I32;
       case Ico_U64:   return Ity_I64;
+      case Ico_U128:  return Ity_I128;
       case Ico_F32:   return Ity_F32;
       case Ico_F32i:  return Ity_F32;
       case Ico_F64:   return Ity_F64;
@@ -5129,7 +5138,7 @@ void tcStmt ( const IRSB* bb, const IRStmt* stmt, IRType gWordTy )
          tyRes = typeOfIRTemp(tyenv, stmt->Ist.LLSC.result);
          if (stmt->Ist.LLSC.storedata == NULL) {
             /* it's a LL */
-            if (tyRes != Ity_I64 && tyRes != Ity_I32
+            if (tyRes != Ity_I128 && tyRes != Ity_I64 && tyRes != Ity_I32
                 && tyRes != Ity_I16 && tyRes != Ity_I8)
                sanityCheckFail(bb,stmt,"Ist.LLSC(LL).result :: bogus");
          } else {
@@ -5137,7 +5146,7 @@ void tcStmt ( const IRSB* bb, const IRStmt* stmt, IRType gWordTy )
             if (tyRes != Ity_I1)
                sanityCheckFail(bb,stmt,"Ist.LLSC(SC).result: not :: Ity_I1");
             tyData = typeOfIRExpr(tyenv, stmt->Ist.LLSC.storedata);
-            if (tyData != Ity_I64 && tyData != Ity_I32
+            if (tyData != Ity_I128 && tyData != Ity_I64 && tyData != Ity_I32
                 && tyData != Ity_I16 && tyData != Ity_I8)
                sanityCheckFail(bb,stmt,
                                "Ist.LLSC(SC).result :: storedata bogus");
@@ -5385,6 +5394,7 @@ Int sizeofIRType ( IRType ty )
 IRType integerIRTypeOfSize ( Int szB )
 {
    switch (szB) {
+      case 16: return Ity_I128;
       case 8: return Ity_I64;
       case 4: return Ity_I32;
       case 2: return Ity_I16;
diff --git a/VEX/pub/libvex_guest_arm64.h b/VEX/pub/libvex_guest_arm64.h
index 39b6ecdc2..91d06bd75 100644
--- a/VEX/pub/libvex_guest_arm64.h
+++ b/VEX/pub/libvex_guest_arm64.h
@@ -157,14 +157,18 @@ typedef
          note of bits 23 and 22. */
       UInt  guest_FPCR;
 
-      /* Fallback LL/SC support.  See bugs 344524 and 369459. */
-      ULong guest_LLSC_SIZE; // 0==no current transaction, else 1,2,4 or 8.
+      /* Fallback LL/SC support.  See bugs 344524 and 369459.  _LO64 and _HI64
+         contain the original contents of _ADDR+0 .. _ADDR+15, but only _SIZE
+         number of bytes of it.  The remaining 16-_SIZE bytes of them must be
+         zero. */
+      ULong guest_LLSC_SIZE; // 0==no current transaction, else 1,2,4,8 or 16.
       ULong guest_LLSC_ADDR; // Address of transaction.
-      ULong guest_LLSC_DATA; // Original value at _ADDR, zero-extended.
+      ULong guest_LLSC_DATA_LO64; // Original value at _ADDR+0.
+      ULong guest_LLSC_DATA_HI64; // Original value at _ADDR+8.
 
       /* Padding to make it have an 16-aligned size */
       /* UInt  pad_end_0; */
-      ULong pad_end_1;
+      /* ULong pad_end_1; */
    }
    VexGuestARM64State;
 
diff --git a/VEX/pub/libvex_ir.h b/VEX/pub/libvex_ir.h
index deaa044c1..85805bb69 100644
--- a/VEX/pub/libvex_ir.h
+++ b/VEX/pub/libvex_ir.h
@@ -269,6 +269,8 @@ typedef
       Ico_U16, 
       Ico_U32, 
       Ico_U64,
+      Ico_U128,  /* 128-bit restricted integer constant,
+                    same encoding scheme as V128 */
       Ico_F32,   /* 32-bit IEEE754 floating */
       Ico_F32i,  /* 32-bit unsigned int to be interpreted literally
                     as a IEEE754 single value. */
@@ -295,6 +297,7 @@ typedef
          UShort U16;
          UInt   U32;
          ULong  U64;
+         UShort U128;
          Float  F32;
          UInt   F32i;
          Double F64;
@@ -311,6 +314,7 @@ extern IRConst* IRConst_U8   ( UChar );
 extern IRConst* IRConst_U16  ( UShort );
 extern IRConst* IRConst_U32  ( UInt );
 extern IRConst* IRConst_U64  ( ULong );
+extern IRConst* IRConst_U128 ( UShort );
 extern IRConst* IRConst_F32  ( Float );
 extern IRConst* IRConst_F32i ( UInt );
 extern IRConst* IRConst_F64  ( Double );
diff --git a/memcheck/mc_machine.c b/memcheck/mc_machine.c
index 919c7fae8..176c8e5cb 100644
--- a/memcheck/mc_machine.c
+++ b/memcheck/mc_machine.c
@@ -1115,9 +1115,10 @@ static Int get_otrack_shadow_offset_wrk ( Int offset, Int szB )
    if (o == GOF(CMSTART) && sz == 8) return -1; // untracked
    if (o == GOF(CMLEN)   && sz == 8) return -1; // untracked
 
-   if (o == GOF(LLSC_SIZE) && sz == 8) return -1; // untracked
-   if (o == GOF(LLSC_ADDR) && sz == 8) return o;
-   if (o == GOF(LLSC_DATA) && sz == 8) return o;
+   if (o == GOF(LLSC_SIZE)      && sz == 8) return -1; // untracked
+   if (o == GOF(LLSC_ADDR)      && sz == 8) return o;
+   if (o == GOF(LLSC_DATA_LO64) && sz == 8) return o;
+   if (o == GOF(LLSC_DATA_HI64) && sz == 8) return o;
 
    VG_(printf)("MC_(get_otrack_shadow_offset)(arm64)(off=%d,sz=%d)\n",
                offset,szB);
diff --git a/memcheck/mc_translate.c b/memcheck/mc_translate.c
index c6fd2653f..72ccb3c8c 100644
--- a/memcheck/mc_translate.c
+++ b/memcheck/mc_translate.c
@@ -5497,8 +5497,11 @@ IRAtom* expr2vbits_Load_WRK ( MCEnv* mce,
       the address (shadow) to 'defined' following the test. */
    complainIfUndefined( mce, addr, guard );
 
-   /* Now cook up a call to the relevant helper function, to read the
-      data V bits from shadow memory. */
+   /* Now cook up a call to the relevant helper function, to read the data V
+      bits from shadow memory.  Note that I128 loads are done by pretending
+      we're doing a V128 load, and then converting the resulting V128 vbits
+      word to an I128, right at the end of this function -- see `castedToI128`
+      below.  (It's only a minor hack :-) This pertains to bug 444399. */
    ty = shadowTypeV(ty);
 
    void*        helper           = NULL;
@@ -5511,6 +5514,7 @@ IRAtom* expr2vbits_Load_WRK ( MCEnv* mce,
                         hname = "MC_(helperc_LOADV256le)";
                         ret_via_outparam = True;
                         break;
+         case Ity_I128: // fallthrough.  See comment above.
          case Ity_V128: helper = &MC_(helperc_LOADV128le);
                         hname = "MC_(helperc_LOADV128le)";
                         ret_via_outparam = True;
@@ -5576,7 +5580,7 @@ IRAtom* expr2vbits_Load_WRK ( MCEnv* mce,
 
    /* We need to have a place to park the V bits we're just about to
       read. */
-   IRTemp datavbits = newTemp(mce, ty, VSh);
+   IRTemp datavbits = newTemp(mce, ty == Ity_I128 ? Ity_V128 : ty, VSh);
 
    /* Here's the call. */
    IRDirty* di;
@@ -5603,7 +5607,14 @@ IRAtom* expr2vbits_Load_WRK ( MCEnv* mce,
    }
    stmt( 'V', mce, IRStmt_Dirty(di) );
 
-   return mkexpr(datavbits);
+   if (ty == Ity_I128) {
+      IRAtom* castedToI128
+         = assignNew('V', mce, Ity_I128,
+                     unop(Iop_ReinterpV128asI128, mkexpr(datavbits)));
+      return castedToI128;
+   } else {
+      return mkexpr(datavbits);
+   }
 }
 
 
@@ -5631,6 +5642,7 @@ IRAtom* expr2vbits_Load ( MCEnv* mce,
       case Ity_I16:
       case Ity_I32:
       case Ity_I64:
+      case Ity_I128:
       case Ity_V128:
       case Ity_V256:
          return expr2vbits_Load_WRK(mce, end, ty, addr, bias, guard);
@@ -5928,6 +5940,7 @@ void do_shadow_Store ( MCEnv* mce,
                         c = IRConst_V256(V_BITS32_DEFINED); break;
          case Ity_V128: // V128 weirdness -- used twice
                         c = IRConst_V128(V_BITS16_DEFINED); break;
+         case Ity_I128: c = IRConst_U128(V_BITS16_DEFINED); break;
          case Ity_I64:  c = IRConst_U64 (V_BITS64_DEFINED); break;
          case Ity_I32:  c = IRConst_U32 (V_BITS32_DEFINED); break;
          case Ity_I16:  c = IRConst_U16 (V_BITS16_DEFINED); break;
@@ -5948,6 +5961,7 @@ void do_shadow_Store ( MCEnv* mce,
       switch (ty) {
          case Ity_V256: /* we'll use the helper four times */
          case Ity_V128: /* we'll use the helper twice */
+         case Ity_I128: /* we'll use the helper twice */
          case Ity_I64: helper = &MC_(helperc_STOREV64le);
                        hname = "MC_(helperc_STOREV64le)";
                        break;
@@ -6051,9 +6065,9 @@ void do_shadow_Store ( MCEnv* mce,
Mark Wielaard 04e90e
       stmt( 'V', mce, IRStmt_Dirty(diQ3) );
Mark Wielaard 04e90e
 
Mark Wielaard 04e90e
    } 
Mark Wielaard 04e90e
-   else if (UNLIKELY(ty == Ity_V128)) {
Mark Wielaard 04e90e
+   else if (UNLIKELY(ty == Ity_V128 || ty == Ity_I128)) {
Mark Wielaard 04e90e
 
Mark Wielaard 04e90e
-      /* V128-bit case */
Mark Wielaard 04e90e
+      /* V128/I128-bit case */
       /* See comment in next clause re 64-bit regparms */
       /* also, need to be careful about endianness */
 
@@ -6062,6 +6076,7 @@ void do_shadow_Store ( MCEnv* mce,
       IRAtom  *addrLo64, *addrHi64;
       IRAtom  *vdataLo64, *vdataHi64;
       IRAtom  *eBiasLo64, *eBiasHi64;
+      IROp    opGetLO64,  opGetHI64;
 
       if (end == Iend_LE) {
          offLo64 = 0;
@@ -6071,9 +6086,17 @@ void do_shadow_Store ( MCEnv* mce,
          offHi64 = 0;
       }
 
+      if (ty == Ity_V128) {
+         opGetLO64 = Iop_V128to64;
+         opGetHI64 = Iop_V128HIto64;
+      } else {
+         opGetLO64 = Iop_128to64;
+         opGetHI64 = Iop_128HIto64;
+      }
+
       eBiasLo64 = tyAddr==Ity_I32 ? mkU32(bias+offLo64) : mkU64(bias+offLo64);
       addrLo64  = assignNew('V', mce, tyAddr, binop(mkAdd, addr, eBiasLo64) );
-      vdataLo64 = assignNew('V', mce, Ity_I64, unop(Iop_V128to64, vdata));
+      vdataLo64 = assignNew('V', mce, Ity_I64, unop(opGetLO64, vdata));
       diLo64    = unsafeIRDirty_0_N( 
                      1/*regparms*/, 
                      hname, VG_(fnptr_to_fnentry)( helper ), 
@@ -6081,7 +6104,7 @@ void do_shadow_Store ( MCEnv* mce,
                   );
       eBiasHi64 = tyAddr==Ity_I32 ? mkU32(bias+offHi64) : mkU64(bias+offHi64);
       addrHi64  = assignNew('V', mce, tyAddr, binop(mkAdd, addr, eBiasHi64) );
-      vdataHi64 = assignNew('V', mce, Ity_I64, unop(Iop_V128HIto64, vdata));
+      vdataHi64 = assignNew('V', mce, Ity_I64, unop(opGetHI64, vdata));
       diHi64    = unsafeIRDirty_0_N( 
                      1/*regparms*/, 
                      hname, VG_(fnptr_to_fnentry)( helper ), 
@@ -6888,7 +6911,7 @@ static void do_shadow_LLSC ( MCEnv*    mce,
       /* Just treat this as a normal load, followed by an assignment of
          the value to .result. */
       /* Stay sane */
-      tl_assert(resTy == Ity_I64 || resTy == Ity_I32
+      tl_assert(resTy == Ity_I128 || resTy == Ity_I64 || resTy == Ity_I32
                 || resTy == Ity_I16 || resTy == Ity_I8);
       assign( 'V', mce, resTmp,
                    expr2vbits_Load(
@@ -6899,7 +6922,7 @@ static void do_shadow_LLSC ( MCEnv*    mce,
       /* Stay sane */
       IRType dataTy = typeOfIRExpr(mce->sb->tyenv,
                                    stStoredata);
-      tl_assert(dataTy == Ity_I64 || dataTy == Ity_I32
+      tl_assert(dataTy == Ity_I128 || dataTy == Ity_I64 || dataTy == Ity_I32
                 || dataTy == Ity_I16 || dataTy == Ity_I8);
       do_shadow_Store( mce, stEnd,
                             stAddr, 0/* addr bias */,
@@ -7684,7 +7707,7 @@ static void schemeS ( MCEnv* mce, IRStmt* st )
                = typeOfIRTemp(mce->sb->tyenv, st->Ist.LLSC.result);
             IRExpr* vanillaLoad
                = IRExpr_Load(st->Ist.LLSC.end, resTy, st->Ist.LLSC.addr);
-            tl_assert(resTy == Ity_I64 || resTy == Ity_I32
+            tl_assert(resTy == Ity_I128 || resTy == Ity_I64 || resTy == Ity_I32
                       || resTy == Ity_I16 || resTy == Ity_I8);
             assign( 'B', mce, findShadowTmpB(mce, st->Ist.LLSC.result),
                               schemeE(mce, vanillaLoad));
diff --git a/memcheck/tests/Makefile.am b/memcheck/tests/Makefile.am
index 449710020..2b43ef7d7 100644
--- a/memcheck/tests/Makefile.am
+++ b/memcheck/tests/Makefile.am
@@ -90,6 +90,7 @@ EXTRA_DIST = \
 	addressable.stderr.exp addressable.stdout.exp addressable.vgtest \
 	atomic_incs.stderr.exp atomic_incs.vgtest \
 	atomic_incs.stdout.exp-32bit atomic_incs.stdout.exp-64bit \
+	atomic_incs.stdout.exp-64bit-and-128bit \
 	badaddrvalue.stderr.exp \
 	badaddrvalue.stdout.exp badaddrvalue.vgtest \
         exit_on_first_error.stderr.exp \
diff --git a/memcheck/tests/atomic_incs.c b/memcheck/tests/atomic_incs.c
index f931750f4..1c738c530 100644
--- a/memcheck/tests/atomic_incs.c
+++ b/memcheck/tests/atomic_incs.c
@@ -22,6 +22,17 @@
 #define NNN 3456987
 
 #define IS_8_ALIGNED(_ptr)   (0 == (((unsigned long)(_ptr)) & 7))
+#define IS_16_ALIGNED(_ptr)  (0 == (((unsigned long)(_ptr)) & 15))
+
+// U128 from libvex_basictypes.h is a 4-x-UInt array, which is a bit
+// inconvenient, hence:
+typedef
+   struct {
+      // assuming little-endianness
+      unsigned long long int lo64;
+      unsigned long long int hi64;
+   }
+   MyU128;
 
 
 __attribute__((noinline)) void atomic_add_8bit ( char* p, int n ) 
@@ -712,6 +723,40 @@ __attribute__((noinline)) void atomic_add_64bit ( long long int* p, int n )
 #endif
 }
 
+__attribute__((noinline)) void atomic_add_128bit ( MyU128* p,
+                                                   unsigned long long int n )
+{
+#if defined(VGA_x86) || defined(VGA_ppc32) || defined(VGA_mips32) \
+    || defined (VGA_nanomips) || defined(VGA_mips64) \
+    || defined(VGA_amd64) \
+    || defined(VGA_ppc64be) || defined(VGA_ppc64le) \
+    || defined(VGA_arm) \
+    || defined(VGA_s390x)
+   /* do nothing; is not supported */
+#elif defined(VGA_arm64)
+   unsigned long long int block[3]
+      = { (unsigned long long int)p, (unsigned long long int)n,
+          0xFFFFFFFFFFFFFFFFULL};
+   do {
+      __asm__ __volatile__(
+         "mov   x5, %0"             "\n\t" // &block[0]
+         "ldr   x9, [x5, #0]"       "\n\t" // p
+         "ldr   x10, [x5, #8]"      "\n\t" // n
+         "ldxp  x7, x8, [x9]"       "\n\t"
+         "adds  x7, x7, x10"        "\n\t"
+         "adc   x8, x8, xzr"        "\n\t"
+         "stxp  w4, x7, x8, [x9]"   "\n\t"
+         "str   x4, [x5, #16]"      "\n\t"
+         : /*out*/
+         : /*in*/ "r"(&block[0])
+         : /*trash*/ "memory", "cc", "x5", "x7", "x8", "x9", "x10", "x4"
+      );
+   } while (block[2] != 0);
+#else
+# error "Unsupported arch"
+#endif
+}
+
 int main ( int argc, char** argv )
 {
    int    i, status;
@@ -720,8 +765,12 @@ int main ( int argc, char** argv )
    short* p16;
    int*   p32;
    long long int* p64;
+   MyU128*  p128;
    pid_t  child, p2;
 
+   assert(sizeof(MyU128) == 16);
+   assert(sysconf(_SC_PAGESIZE) >= 4096);
+
    printf("parent, pre-fork\n");
 
    page = mmap( 0, sysconf(_SC_PAGESIZE),
@@ -736,11 +785,13 @@ int main ( int argc, char** argv )
    p16 = (short*)(page+256);
    p32 = (int*)(page+512);
    p64 = (long long int*)(page+768);
+   p128 = (MyU128*)(page+1024);
 
    assert( IS_8_ALIGNED(p8) );
    assert( IS_8_ALIGNED(p16) );
    assert( IS_8_ALIGNED(p32) );
    assert( IS_8_ALIGNED(p64) );
+   assert( IS_16_ALIGNED(p128) );
 
    memset(page, 0, 1024);
 
@@ -748,6 +799,7 @@ int main ( int argc, char** argv )
    *p16 = 0;
    *p32 = 0;
    *p64 = 0;
+   p128->lo64 = p128->hi64 = 0;
 
    child = fork();
    if (child == -1) {
@@ -763,6 +815,7 @@ int main ( int argc, char** argv )
          atomic_add_16bit(p16, 1);
          atomic_add_32bit(p32, 1);
          atomic_add_64bit(p64, 98765 ); /* ensure we hit the upper 32 bits */
+         atomic_add_128bit(p128, 0x1000000013374771ULL); // ditto re upper 64
       }
       return 1;
       /* NOTREACHED */
@@ -778,6 +831,7 @@ int main ( int argc, char** argv )
       atomic_add_16bit(p16, 1);
       atomic_add_32bit(p32, 1);
       atomic_add_64bit(p64, 98765 ); /* ensure we hit the upper 32 bits */
+      atomic_add_128bit(p128, 0x1000000013374771ULL); // ditto re upper 64
    }
 
    p2 = waitpid(child, &status, 0);
@@ -788,11 +842,17 @@ int main ( int argc, char** argv )
 
    printf("FINAL VALUES:  8 bit %d,  16 bit %d,  32 bit %d,  64 bit %lld\n",
           (int)(*(signed char*)p8), (int)(*p16), *p32, *p64 );
+   printf("               128 bit 0x%016llx:0x%016llx\n",
+          p128->hi64, p128->lo64);
 
    if (-74 == (int)(*(signed char*)p8) 
        && 32694 == (int)(*p16) 
        && 6913974 == *p32
-       && (0LL == *p64 || 682858642110LL == *p64)) {
+       && (0LL == *p64 || 682858642110LL == *p64)
+       && ((0 == p128->hi64 && 0 == p128->lo64)
+           || (0x00000000000697fb == p128->hi64
+               && 0x6007eb426316d956ULL == p128->lo64))
+      ) {
       printf("PASS\n");
    } else {
       printf("FAIL -- see source code for expected values\n");
diff --git a/memcheck/tests/atomic_incs.stdout.exp-32bit b/memcheck/tests/atomic_incs.stdout.exp-32bit
index c5b8781e5..55e5044b5 100644
--- a/memcheck/tests/atomic_incs.stdout.exp-32bit
+++ b/memcheck/tests/atomic_incs.stdout.exp-32bit
@@ -3,5 +3,6 @@ child
 parent, pre-fork
 parent
 FINAL VALUES:  8 bit -74,  16 bit 32694,  32 bit 6913974,  64 bit 0
+               128 bit 0x0000000000000000:0x0000000000000000
 PASS
 parent exits
diff --git a/memcheck/tests/atomic_incs.stdout.exp-64bit b/memcheck/tests/atomic_incs.stdout.exp-64bit
index 82405c520..ca2f4fc97 100644
--- a/memcheck/tests/atomic_incs.stdout.exp-64bit
+++ b/memcheck/tests/atomic_incs.stdout.exp-64bit
@@ -3,5 +3,6 @@ child
 parent, pre-fork
 parent
 FINAL VALUES:  8 bit -74,  16 bit 32694,  32 bit 6913974,  64 bit 682858642110
+               128 bit 0x0000000000000000:0x0000000000000000
 PASS
 parent exits
diff --git a/memcheck/tests/atomic_incs.stdout.exp-64bit-and-128bit b/memcheck/tests/atomic_incs.stdout.exp-64bit-and-128bit
new file mode 100644
index 000000000..ef6580917
--- /dev/null
+++ b/memcheck/tests/atomic_incs.stdout.exp-64bit-and-128bit
@@ -0,0 +1,8 @@
+parent, pre-fork
+child
+parent, pre-fork
+parent
+FINAL VALUES:  8 bit -74,  16 bit 32694,  32 bit 6913974,  64 bit 682858642110
+               128 bit 0x00000000000697fb:0x6007eb426316d956
+PASS
+parent exits
diff --git a/none/tests/arm64/Makefile.am b/none/tests/arm64/Makefile.am
index 00cbfa52c..9efb49b27 100644
--- a/none/tests/arm64/Makefile.am
+++ b/none/tests/arm64/Makefile.am
@@ -12,7 +12,10 @@ EXTRA_DIST = \
 	atomics_v81.stdout.exp atomics_v81.stderr.exp atomics_v81.vgtest \
 	simd_v81.stdout.exp simd_v81.stderr.exp simd_v81.vgtest \
         fmadd_sub.stdout.exp fmadd_sub.stderr.exp fmadd_sub.vgtest \
-	fp_and_simd_v82.stdout.exp fp_and_simd_v82.stderr.exp fp_and_simd_v82.vgtest
+	fp_and_simd_v82.stdout.exp fp_and_simd_v82.stderr.exp \
+	fp_and_simd_v82.vgtest \
+	ldxp_stxp.stdout.exp ldxp_stxp.stderr.exp \
+	ldxp_stxp_basisimpl.vgtest ldxp_stxp_fallbackimpl.vgtest
 
 check_PROGRAMS = \
 	allexec \
@@ -20,7 +23,8 @@ check_PROGRAMS = \
 	fp_and_simd \
 	integer \
 	memory \
-	fmadd_sub
+	fmadd_sub \
+	ldxp_stxp
 
 if BUILD_ARMV8_CRC_TESTS
   check_PROGRAMS += crc32
diff --git a/none/tests/arm64/ldxp_stxp.c b/none/tests/arm64/ldxp_stxp.c
new file mode 100644
index 000000000..b5f6ea121
--- /dev/null
+++ b/none/tests/arm64/ldxp_stxp.c
@@ -0,0 +1,93 @@
+
+/* Note, this is only a basic smoke test of LD{A}XP and ST{L}XP.  Their
+   atomicity properties are tested by memcheck/tests/atomic_incs.c. */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <malloc.h>
+#include <assert.h>
+
+typedef  unsigned int            UInt;
+typedef  unsigned long long int  ULong;
+
+
+void initBlock ( ULong* block )
+{
+   block[0] = 0x0001020304050607ULL;
+   block[1] = 0x1011121314151617ULL;
+   block[2] = 0x2021222324252627ULL;
+   block[3] = 0x3031323334353637ULL;
+   block[4] = 0x4041424344454647ULL;
+   block[5] = 0x5051525354555657ULL;
+}
+
+void printBlock ( const char* who,
+                  ULong* block, ULong rt1contents, ULong rt2contents,
+                  UInt zeroIfSuccess )
+{
+   printf("Block %s (%s)\n", who, zeroIfSuccess == 0 ? "success" : "FAILURE" );
+   for (int i = 0; i < 6; i++) {
+      printf("0x%016llx\n", block[i]);
+   }
+   printf("0x%016llx rt1contents\n", rt1contents);
+   printf("0x%016llx rt2contents\n", rt2contents);
+   printf("\n");
+}
+
+int main ( void )
+{
+   ULong* block = memalign(16, 6 * sizeof(ULong));
+   assert(block);
+
+   ULong rt1in, rt2in, rt1out, rt2out;
+   UInt scRes;
+
+   // Do ldxp then stxp with x-registers
+   initBlock(block);
+   rt1in  = 0x5555666677778888ULL;
+   rt2in  = 0xAAAA9999BBBB0000ULL;
+   rt1out = 0x1111222233334444ULL;
+   rt2out = 0xFFFFEEEEDDDDCCCCULL;
+   scRes  = 0x55555555;
+   __asm__ __volatile__(
+      "ldxp %1, %2, [%5]"       "\n\t"
+      "stxp %w0, %3, %4, [%5]"  "\n\t"
+      : /*OUT*/
+        "=&r"(scRes),  // %0
+        "=&r"(rt1out), // %1
+        "=&r"(rt2out)  // %2
+      : /*IN*/
+        "r"(rt1in),    // %3
+        "r"(rt2in),    // %4
+        "r"(&block[2]) // %5
+      : /*TRASH*/
+        "memory","cc"
+   );
+   printBlock("after ldxp/stxp 2x64-bit", block, rt1out, rt2out, scRes);
+
+   // Do ldxp then stxp with w-registers
+   initBlock(block);
+   rt1in  = 0x5555666677778888ULL;
+   rt2in  = 0xAAAA9999BBBB0000ULL;
+   rt1out = 0x1111222233334444ULL;
+   rt2out = 0xFFFFEEEEDDDDCCCCULL;
+   scRes  = 0x55555555;
+   __asm__ __volatile__(
+      "ldxp %w1, %w2, [%5]"       "\n\t"
+      "stxp %w0, %w3, %w4, [%5]"  "\n\t"
+      : /*OUT*/
+        "=&r"(scRes),  // %0
+        "=&r"(rt1out), // %1
+        "=&r"(rt2out)  // %2
+      : /*IN*/
+        "r"(rt1in),    // %3
+        "r"(rt2in),    // %4
+        "r"(&block[2]) // %5
+      : /*TRASH*/
+        "memory","cc"
+   );
+   printBlock("after ldxp/stxp 2x32-bit", block, rt1out, rt2out, scRes);
+
+   free(block);
+   return 0;
+}
diff --git a/none/tests/arm64/ldxp_stxp_basisimpl.stderr.exp b/none/tests/arm64/ldxp_stxp_basisimpl.stderr.exp
new file mode 100644
index 000000000..e69de29bb
diff --git a/none/tests/arm64/ldxp_stxp_basisimpl.stdout.exp b/none/tests/arm64/ldxp_stxp_basisimpl.stdout.exp
new file mode 100644
index 000000000..f269ecdcc
--- /dev/null
+++ b/none/tests/arm64/ldxp_stxp_basisimpl.stdout.exp
@@ -0,0 +1,20 @@
+Block after ldxp/stxp 2x64-bit (success)
+0x0001020304050607
+0x1011121314151617
+0x5555666677778888
+0xaaaa9999bbbb0000
+0x4041424344454647
+0x5051525354555657
+0x2021222324252627 rt1contents
+0x3031323334353637 rt2contents
+
+Block after ldxp/stxp 2x32-bit (success)
+0x0001020304050607
+0x1011121314151617
+0xbbbb000077778888
+0x3031323334353637
+0x4041424344454647
+0x5051525354555657
+0x0000000024252627 rt1contents
+0x0000000020212223 rt2contents
+
diff --git a/none/tests/arm64/ldxp_stxp_basisimpl.vgtest b/none/tests/arm64/ldxp_stxp_basisimpl.vgtest
new file mode 100644
index 000000000..29133729a
--- /dev/null
+++ b/none/tests/arm64/ldxp_stxp_basisimpl.vgtest
@@ -0,0 +1,2 @@
+prog: ldxp_stxp
+vgopts: -q
diff --git a/none/tests/arm64/ldxp_stxp_fallbackimpl.stderr.exp b/none/tests/arm64/ldxp_stxp_fallbackimpl.stderr.exp
new file mode 100644
index 000000000..e69de29bb
diff --git a/none/tests/arm64/ldxp_stxp_fallbackimpl.stdout.exp b/none/tests/arm64/ldxp_stxp_fallbackimpl.stdout.exp
new file mode 100644
index 000000000..f269ecdcc
--- /dev/null
+++ b/none/tests/arm64/ldxp_stxp_fallbackimpl.stdout.exp
@@ -0,0 +1,20 @@
+Block after ldxp/stxp 2x64-bit (success)
+0x0001020304050607
+0x1011121314151617
+0x5555666677778888
+0xaaaa9999bbbb0000
+0x4041424344454647
+0x5051525354555657
+0x2021222324252627 rt1contents
+0x3031323334353637 rt2contents
+
+Block after ldxp/stxp 2x32-bit (success)
+0x0001020304050607
+0x1011121314151617
+0xbbbb000077778888
+0x3031323334353637
+0x4041424344454647
+0x5051525354555657
+0x0000000024252627 rt1contents
+0x0000000020212223 rt2contents
+
diff --git a/none/tests/arm64/ldxp_stxp_fallbackimpl.vgtest b/none/tests/arm64/ldxp_stxp_fallbackimpl.vgtest
new file mode 100644
index 000000000..474282a03
--- /dev/null
+++ b/none/tests/arm64/ldxp_stxp_fallbackimpl.vgtest
@@ -0,0 +1,2 @@
+prog: ldxp_stxp
+vgopts: -q --sim-hints=fallback-llsc
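As a sanity check on the value recorded in atomic_incs.stdout.exp-64bit-and-128bit, the following is a minimal, non-atomic sketch in portable C of the add-with-carry that the ldxp/adds/adc/stxp loop in atomic_add_128bit performs. The names U128Pair, add_128bit_model and expected_total are hypothetical and not part of the patch. It assumes the parent and the child each perform NNN = 3456987 increments, which is what the 32-bit and 64-bit expected values suggest.

```c
#include <stdint.h>

/* Hypothetical stand-in for the MyU128 struct in atomic_incs.c
   (low half first, matching the little-endian assumption there). */
typedef struct { uint64_t lo64; uint64_t hi64; } U128Pair;

/* Non-atomic model of one ldxp/adds/adc/stxp round: add n to the low
   half and propagate the carry-out into the high half. */
void add_128bit_model(U128Pair *p, uint64_t n)
{
   uint64_t lo    = p->lo64 + n;       /* adds x7, x7, x10 */
   uint64_t carry = lo < p->lo64;      /* 1 if the low half wrapped */
   p->lo64  = lo;
   p->hi64 += carry;                   /* adc  x8, x8, xzr */
}

/* Apply the increment 'count' times, starting from zero. */
U128Pair expected_total(uint64_t count, uint64_t n)
{
   U128Pair acc = { 0, 0 };
   for (uint64_t i = 0; i < count; i++)
      add_128bit_model(&acc, n);
   return acc;
}
```

Under those assumptions, expected_total(2 * 3456987, 0x1000000013374771ULL) yields hi64 = 0x00000000000697fb and lo64 = 0x6007eb426316d956, the pair the test accepts as a PASS on arm64.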
commit 0d38ca5dd6b446c70738031132d41f09de0f7a8a
Author: Julian Seward <jseward@acm.org>
Date:   Fri Nov 12 13:08:45 2021 +0100

    Bug 444399 - disInstr(arm64): unhandled instruction 0xC87F2D89 (LD{,A}XP and ST{,L}XP).  FOLLOWUP FIX.
    
    This is an attempt to un-break 'make dist', as broken by the main commit for
    this bug, which was 530df882b8f60ecacaf2b9b8a719f7ea1c1d1650.

diff --git a/none/tests/arm64/Makefile.am b/none/tests/arm64/Makefile.am
index 9efb49b27..4a06f0996 100644
--- a/none/tests/arm64/Makefile.am
+++ b/none/tests/arm64/Makefile.am
@@ -14,8 +14,10 @@ EXTRA_DIST = \
         fmadd_sub.stdout.exp fmadd_sub.stderr.exp fmadd_sub.vgtest \
 	fp_and_simd_v82.stdout.exp fp_and_simd_v82.stderr.exp \
 	fp_and_simd_v82.vgtest \
-	ldxp_stxp.stdout.exp ldxp_stxp.stderr.exp \
-	ldxp_stxp_basisimpl.vgtest ldxp_stxp_fallbackimpl.vgtest
+	ldxp_stxp_basisimpl.stdout.exp ldxp_stxp_basisimpl.stderr.exp \
+	ldxp_stxp_basisimpl.vgtest \
+	ldxp_stxp_fallbackimpl.stdout.exp ldxp_stxp_fallbackimpl.stderr.exp \
+	ldxp_stxp_fallbackimpl.vgtest
 
 check_PROGRAMS = \
 	allexec \