From e7a82370a7b5d3ca342d5e42e25763fa2c938739 Mon Sep 17 00:00:00 2001
From: Jan Friesse <jfriesse@redhat.com>
Date: Tue, 26 Oct 2021 18:17:59 +0200
Subject: [PATCH] totemsrp: Switch totempg buffers at the right time
Commit 92e0f9c7bb9b4b6a0da8d64bdf3b2e47ae55b1cc added switching of
totempg buffers in the sync phase. But because the buffers were switched
too early, there was a problem when delivering recovered messages
(messages got corrupted and/or lost). The solution is to switch the
buffers after the recovered messages have been delivered.

I think it is worth describing the complete history, with reproducers,
so it doesn't get lost.
It all started with 402638929e5045ef520a7339696c687fbed0b31b (more info
about the original problem is in
https://bugzilla.redhat.com/show_bug.cgi?id=820821). That patch
solves a problem which can be reproduced with the following reproducer:
- 2 nodes
- Both nodes running corosync and testcpg
- Pause node 1 (SIGSTOP of corosync)
- On node 1, send some messages with testcpg
  (it's not answering, but this doesn't matter; simply hitting the ENTER
  key a few times is enough)
- Wait till node 2 detects that node 1 left
- Unpause node 1 (SIGCONT of corosync)
After that, the cpg messages newly mcasted on node 1 got sent before the
sync barrier, so node 2 logs "Unknown node -> we will not deliver message".
The solution was to add switching of the totemsrp new messages buffer.
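
A rough sketch of the pause/resume part of this reproducer, meant to be
run on node 1. The pause_and_resume helper and the 30 second value in the
usage comment are illustrative assumptions, not part of corosync; the
pause only has to be long enough for node 2 to detect that node 1 left.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not shipped with corosync):
# freeze a process with SIGSTOP, wait, then resume it with SIGCONT.
#
# Possible usage on node 1:
#   pause_and_resume "$(pidof corosync)" 30
pause_and_resume() {
    pid=$1
    pause_secs=$2

    # Pause the process; give up if the signal cannot be delivered.
    kill -STOP "$pid" || return 1

    # While the process is paused: hit ENTER a few times in testcpg on
    # node 1 and wait until node 2 reports that node 1 left.
    sleep "$pause_secs"

    # Unpause the process.
    kill -CONT "$pid"
}
```

The function only sends the two signals; sending messages with testcpg
and watching the log on node 2 still has to be done by hand.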
The first patch was not enough, so a new one
(92e0f9c7bb9b4b6a0da8d64bdf3b2e47ae55b1cc) was created. The reproducer of
the problem was similar, just cpgverify was used instead of testcpg.
Occasionally, when node 1 was unpaused, it hung in the sync phase because
there was a partial message in the totempg buffers. The new sync message
had a different frag cont, so it was thrown away and never delivered.
After many years, the problem solved by this patch was found
(the original issue is described in
https://github.com/corosync/corosync/issues/660).
The reproducer is more complex:
- 2 nodes
- Node 1 is rate-limited (using this script on the hypervisor side):
```
iface=tapXXXX
# ~0.1MB/s in bit/s
rate=838856
# 1mb/s
burst=1048576
tc qdisc add dev $iface root handle 1: htb default 1
tc class add dev $iface parent 1: classid 1:1 htb rate ${rate}bps \
burst ${burst}b
tc qdisc add dev $iface handle ffff: ingress
tc filter add dev $iface parent ffff: prio 50 basic police rate \
${rate}bps burst ${burst}b mtu 64kb "drop"
```
- Node 2 is running corosync and cpgverify
- Node 1 keeps restarting corosync and running cpgverify in a loop
  - Console 1: while true; do corosync; sleep 20; \
      kill $(pidof corosync); sleep 20; done
  - Console 2: while true; do ./cpgverify;done
From time to time (usually reproduced in less than 5 minutes)
cpgverify reports a corrupted message.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
---
exec/totemsrp.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/exec/totemsrp.c b/exec/totemsrp.c
index d24b11fa..fd71771b 100644
--- a/exec/totemsrp.c
+++ b/exec/totemsrp.c
@@ -1989,13 +1989,27 @@ static void memb_state_operational_enter (struct totemsrp_instance *instance)
trans_memb_list_totemip, instance->my_trans_memb_entries,
left_list, instance->my_left_memb_entries,
0, 0, &instance->my_ring_id);
+ /*
+ * Switch new totemsrp messages queue. Messages sent from now on are stored
+ * in different queue so synchronization messages are delivered first. Totempg
+ * buffers will be switched later.
+ */
instance->waiting_trans_ack = 1;
- instance->totemsrp_waiting_trans_ack_cb_fn (1);
// TODO we need to filter to ensure we only deliver those
// messages which are part of instance->my_deliver_memb
messages_deliver_to_app (instance, 1, instance->old_ring_state_high_seq_received);
+ /*
+ * Switch totempg buffers. This used to be right after
+ * instance->waiting_trans_ack = 1;
+ * line. This was causing problem, because there may be not yet
+ * processed parts of messages in totempg buffers.
+ * So when buffers were switched and recovered messages
+ * got delivered it was not possible to assemble them.
+ */
+ instance->totemsrp_waiting_trans_ack_cb_fn (1);
+
instance->my_aru = aru_save;
/*
--
2.27.0