Blame SOURCES/004-g_source_remove.patch

From 45617b727e280cac384a28ae3d96145e066e6197 Mon Sep 17 00:00:00 2001
From: Reid Wahl <nrwahl@protonmail.com>
Date: Fri, 3 Feb 2023 12:08:57 -0800
Subject: [PATCH] Fix: fencer: Prevent double g_source_remove of op_timer_one

QE observed a rarely reproducible core dump in the fencer during
Pacemaker shutdown, in which we try to g_source_remove() an op timer
that's already been removed.

free_stonith_remote_op_list()
-> g_hash_table_destroy()
-> g_hash_table_remove_all_nodes()
-> clear_remote_op_timers()
-> g_source_remove()
-> crm_glib_handler()
-> "Source ID 190 was not found when attempting to remove it"

The likely cause is that request_peer_fencing() doesn't set
op->op_timer_one to 0 after calling g_source_remove() on it, so if that
op is still in the stonith_remote_op_list at shutdown with the same
timer, clear_remote_op_timers() tries to remove the source for
op_timer_one again.

There are only five locations that call g_source_remove() on a
remote_fencing_op_t timer.
* Three of them are in clear_remote_op_timers(), which first 0-checks
  the timer and then sets it to 0 after g_source_remove().
* One is in remote_op_query_timeout(), which does the same.
* The last is the one we fix here in request_peer_fencing().
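
The check, remove, zero pattern described in the list above can be
sketched in isolation. This is a minimal illustration, not Pacemaker
code: mock_source_add(), mock_source_remove(), clear_timer(), and
failed_removals are hypothetical stand-ins, and mock_source_remove()
only imitates how g_source_remove() reports failure when the ID is no
longer registered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GLib's source registry (not Pacemaker code). */
#define MAX_SOURCES 8
static bool source_live[MAX_SOURCES + 1];   /* index is the source ID; 0 is unused */
static unsigned int failed_removals = 0;    /* counts removals of unknown IDs */

/* Register a mock source and return its nonzero ID, or 0 if the table is full. */
static unsigned int
mock_source_add(void)
{
    for (unsigned int id = 1; id <= MAX_SOURCES; id++) {
        if (!source_live[id]) {
            source_live[id] = true;
            return id;
        }
    }
    return 0;
}

/* Imitates g_source_remove(): returns false if the ID is not registered. */
static bool
mock_source_remove(unsigned int id)
{
    if ((id == 0) || (id > MAX_SOURCES) || !source_live[id]) {
        failed_removals++;
        return false;
    }
    source_live[id] = false;
    return true;
}

/* Guarded removal: 0-check the timer, remove it, then reset the field to 0. */
static void
clear_timer(unsigned int *timer)
{
    if (*timer != 0) {
        mock_source_remove(*timer);
        *timer = 0;     /* a later call sees 0 and skips the removal */
    }
}
```

Calling clear_timer() twice on the same field is harmless, because the
second call sees 0 and does nothing. Calling mock_source_remove() twice
with a stale ID records one failed removal, which corresponds to the
double-removal path in the trace.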

I don't know all the conditions of QE's test scenario at this point.
What I do know:
* have-watchdog=true
* stonith-watchdog-timeout=10
* no explicit topology
* fence agent script is missing for the configured fence device
* requested fencing of one node
* cluster shutdown

Fixes RHBZ2166967

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
---
 daemons/fenced/fenced_remote.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/daemons/fenced/fenced_remote.c b/daemons/fenced/fenced_remote.c
index d61b5bd..b7426ff 100644
--- a/daemons/fenced/fenced_remote.c
+++ b/daemons/fenced/fenced_remote.c
@@ -1825,6 +1825,7 @@ request_peer_fencing(remote_fencing_op_t *op, peer_device_info_t *peer)
         op->state = st_exec;
         if (op->op_timer_one) {
             g_source_remove(op->op_timer_one);
+            op->op_timer_one = 0;
         }
 
         if (!((stonith_watchdog_timeout_ms > 0)
-- 
2.31.1