SOURCES/bz2024658-1-totemsrp-Switch-totempg-buffers-at-the-right-time.patch

From e7a82370a7b5d3ca342d5e42e25763fa2c938739 Mon Sep 17 00:00:00 2001
From: Jan Friesse <jfriesse@redhat.com>
Date: Tue, 26 Oct 2021 18:17:59 +0200
Subject: [PATCH] totemsrp: Switch totempg buffers at the right time

Commit 92e0f9c7bb9b4b6a0da8d64bdf3b2e47ae55b1cc added switching of
totempg buffers in the sync phase. But because the buffers were
switched too early, there was a problem when delivering recovered
messages (messages got corrupted and/or lost). The solution is to
switch the buffers after the recovered messages have been delivered.

I think it is worth describing the complete history with reproducers
so it doesn't get lost.

It all started with 402638929e5045ef520a7339696c687fbed0b31b (more info
about the original problem is described in
https://bugzilla.redhat.com/show_bug.cgi?id=820821). That patch
solved a problem which can be reproduced with the following reproducer:
- 2 nodes
- Both nodes running corosync and testcpg
- Pause node 1 (SIGSTOP of corosync)
- On node 1, send some messages with testcpg
  (it's not answering, but this doesn't matter; simply hitting the
  ENTER key a few times is enough)
- Wait till node 2 detects that node 1 left
- Unpause node 1 (SIGCONT of corosync)

and on node 1 the newly mcasted cpg messages got sent before the sync
barrier, so node 2 logs "Unknown node -> we will not deliver message".
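
For illustration only, a minimal shell sketch of these steps, assuming
corosync and testcpg are already running on both nodes (pidof and the
node numbering are illustrative, not part of the original reproducer):
```
# Node 1: pause corosync (SIGSTOP).
kill -STOP "$(pidof corosync)"

# Node 1: in the still-running testcpg, press ENTER a few times to
# queue some cpg messages (testcpg gets no answers while corosync is
# paused).

# Node 2: wait until corosync logs that node 1 left the membership.

# Node 1: unpause corosync (SIGCONT); the queued messages get mcasted.
kill -CONT "$(pidof corosync)"

# Before commit 402638929e5045ef520a7339696c687fbed0b31b, node 2 then
# logged "Unknown node -> we will not deliver message".
```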

The solution was to add switching of the totemsrp new messages buffer.

That patch was not enough, so a new one
(92e0f9c7bb9b4b6a0da8d64bdf3b2e47ae55b1cc) was created. The reproducer
of the problem was similar, just cpgverify was used instead of testcpg.
Occasionally, when node 1 was unpaused, it hung in the sync phase
because there was a partial message in the totempg buffers. The new
sync message had a different fragment continuation (frag cont), so it
was thrown away and never delivered.

After many years a problem was found which is solved by this patch
(the original issue is described in
https://github.com/corosync/corosync/issues/660).
The reproducer is more complex:
- 2 nodes
- Node 1 is rate-limited (a script used on the hypervisor side):
  ```
  iface=tapXXXX
  # ~0.1MB/s in bit/s
  rate=838856
  # 1mb/s
  burst=1048576
  tc qdisc add dev $iface root handle 1: htb default 1
  tc class add dev $iface parent 1: classid 1:1 htb rate ${rate}bps \
    burst ${burst}b
  tc qdisc add dev $iface handle ffff: ingress
  tc filter add dev $iface parent ffff: prio 50 basic police rate \
    ${rate}bps burst ${burst}b mtu 64kb "drop"
  ```
- Node 2 is running corosync and cpgverify
- Node 1 keeps restarting corosync and running cpgverify in a cycle:
  - Console 1: while true; do corosync; sleep 20; \
      kill $(pidof corosync); sleep 20; done
  - Console 2: while true; do ./cpgverify; done

And from time to time (usually reproduced in less than 5 minutes)
cpgverify reports a corrupted message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
---
 exec/totemsrp.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/exec/totemsrp.c b/exec/totemsrp.c
index d24b11fa..fd71771b 100644
--- a/exec/totemsrp.c
+++ b/exec/totemsrp.c
@@ -1989,13 +1989,27 @@ static void memb_state_operational_enter (struct totemsrp_instance *instance)
 		trans_memb_list_totemip, instance->my_trans_memb_entries,
 		left_list, instance->my_left_memb_entries,
 		0, 0, &instance->my_ring_id);
+	/*
+	 * Switch new totemsrp messages queue. Messages sent from now on are stored
+	 * in different queue so synchronization messages are delivered first. Totempg
+	 * buffers will be switched later.
+	 */
 	instance->waiting_trans_ack = 1;
-	instance->totemsrp_waiting_trans_ack_cb_fn (1);
 
 // TODO we need to filter to ensure we only deliver those
 // messages which are part of instance->my_deliver_memb
 	messages_deliver_to_app (instance, 1, instance->old_ring_state_high_seq_received);
 
+	/*
+	 * Switch totempg buffers. This used to be right after
+	 *   instance->waiting_trans_ack = 1;
+	 * line. This was causing problem, because there may be not yet
+	 * processed parts of messages in totempg buffers.
+	 * So when buffers were switched and recovered messages
+	 * got delivered it was not possible to assemble them.
+	 */
+	instance->totemsrp_waiting_trans_ack_cb_fn (1);
+
 	instance->my_aru = aru_save;
 
 	/*
-- 
2.27.0