Blob Blame History Raw
From ca320beac25f82c0c555799e647a47975a333c28 Mon Sep 17 00:00:00 2001
From: Jan Friesse <jfriesse@redhat.com>
Date: Tue, 10 Mar 2020 17:49:27 +0100
Subject: [PATCH] votequorum: set wfa status only on startup

Previously reload of configuration with enabled wait_for_all result in
set of wait_for_all_status which set cluster_is_quorate to 0 but didn't
inform the quorum service so votequorum and quorum information may get
out of sync.

Example is 1 node cluster, which is extended to 3 nodes. Quorum service
reports cluster as a quorate (incorrect) and votequorum as not-quorate
(correct). Similar behavior happens when extending cluster in general,
but some configurations are less incorrect (3->4).

Discussed solution was to inform quorum service but that would mean
every reload would cause loss of quorum until all nodes would be seen
again.

Such behaviour is consistent but seems to be a bit too strict.

Proposed solution sets wait_for_all_status only on startup and
doesn't touch it during reload.

This solution fulfills requirement of "cluster will be quorate for
the first time only after all nodes have been visible at least
once at the same time." because node clears wait_for_all_status only
after it sees all other nodes or joins cluster which is quorate. It also
solves problem with extending cluster, because when cluster becomes
unquorate (1->3) wait_for_all_status is set.

Added assert is only for ensure that I haven't missed any case when
quorate cluster may become unquorate.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
---
 exec/votequorum.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/exec/votequorum.c b/exec/votequorum.c
index b152425..fb9f1cd 100644
--- a/exec/votequorum.c
+++ b/exec/votequorum.c
@@ -1009,7 +1009,7 @@ static void are_we_quorate(unsigned int total_votes)
 				   "Waiting for all cluster members. "
 				   "Current votes: %d expected_votes: %d",
 				   total_votes, us->expected_votes);
-			cluster_is_quorate = 0;
+			assert(!cluster_is_quorate);
 			return;
 		}
 		update_wait_for_all_status(0);
@@ -1547,7 +1547,9 @@ static char *votequorum_readconfig(int runtime)
 	update_ev_barrier(us->expected_votes);
 	update_two_node();
 	if (wait_for_all) {
-		update_wait_for_all_status(1);
+		if (!runtime) {
+			update_wait_for_all_status(1);
+		}
 	} else if (wait_for_all_autoset && wait_for_all_status) {
 		/*
 		 * Reset wait for all status for consistency when wfa is auto-unset by 2node.
-- 
1.8.3.1