Blame SOURCES/0019-mdmon-wait-for-previous-mdmon-to-exit-during-takeove.patch

From d2e11da4b7fd0453e942f43e4196dc63b3dbd708 Mon Sep 17 00:00:00 2001
From: Pawel Baldysiak <pawel.baldysiak@intel.com>
Date: Fri, 22 Feb 2019 13:30:27 +0100
Subject: [RHEL7.7 PATCH 19/24] mdmon: wait for previous mdmon to exit during
 takeover

Since the patch c76242c5("mdmon: get safe mode delay file descriptor
early"), safe_mode_delay is set properly by initrd mdmon.  But in some
cases with filesystem traffic since the very start of the system, it
might take a while to transition to clean state.  Because the new
mdmon does not wait for the old one to exit, it might happen that the
new one switches safe_mode_delay back to seconds before the old one
exits.  As a result, two mdmons are running concurrently on the same
array.

Wait for the old mdmon to exit by pinging it with SIGUSR1 signal, just
in case it is sleeping.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
---
 mdmon.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/mdmon.c b/mdmon.c
index 0955fcc..ff985d2 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -171,6 +171,7 @@ static void try_kill_monitor(pid_t pid, char *devname, int sock)
 	int fd;
 	int n;
 	long fl;
+	int rv;
 
 	/* first rule of survival... don't off yourself */
 	if (pid == getpid())
@@ -201,9 +202,16 @@ static void try_kill_monitor(pid_t pid, char *devname, int sock)
 	fl &= ~O_NONBLOCK;
 	fcntl(sock, F_SETFL, fl);
 	n = read(sock, buf, 100);
-	/* Ignore result, it is just the wait that
-	 * matters
-	 */
+
+	/* If there is I/O going on it might take some time to get to
+	 * clean state. Wait for monitor to exit fully to avoid races.
+	 * Ping it with SIGUSR1 in case it is sleeping */
+	for (n = 0; n < 25; n++) {
+		rv = kill(pid, SIGUSR1);
+		if (rv < 0)
+			break;
+		usleep(200000);
+	}
 }
 
 void remove_pidfile(char *devname)
-- 
2.7.5
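The loop added to try_kill_monitor() retries at most 25 times with a 200000 us sleep, so it waits roughly 5 seconds at most and stops early once kill() fails. The following is a minimal standalone sketch of that bounded-wait pattern, not code from mdmon. The names wait_for_exit() and demo_reaped_child() are hypothetical. Unlike the patch, which breaks on any kill() failure, the sketch assumes it is useful to tell ESRCH (process gone) apart from other errors, and it takes the signal as a parameter so it can be tried with signal 0, which only probes for existence. SIGUSR1 would terminate a process that has not installed a handler for it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of mdmon: signal `pid` up to `attempts`
 * times, sleeping `interval_ms` milliseconds after each successful try.
 * Returns 1 once the process is gone (kill() fails with ESRCH),
 * 0 if it still exists after the last attempt, -1 on any other error.
 */
int wait_for_exit(pid_t pid, int sig, int attempts, long interval_ms)
{
	struct timespec ts;
	int i;

	assert(attempts > 0 && interval_ms >= 0);
	ts.tv_sec = interval_ms / 1000;
	ts.tv_nsec = (interval_ms % 1000) * 1000000L;

	for (i = 0; i < attempts; i++) {
		if (kill(pid, sig) < 0)
			return errno == ESRCH ? 1 : -1;
		nanosleep(&ts, NULL);
	}
	return 0;
}

/*
 * Usage example: fork a child that exits at once, reap it, then probe
 * its pid with the same 25 x 200 ms budget the patch uses. This assumes
 * the pid is not reused in the short window after waitpid().
 */
int demo_reaped_child(void)
{
	pid_t child = fork();

	if (child < 0)
		return -1;
	if (child == 0)
		_exit(0);
	if (waitpid(child, NULL, 0) != child)
		return -1;
	/* Signal 0 only checks for existence; nothing is delivered. */
	return wait_for_exit(child, 0, 25, 200);
}
```

In mdmon the old monitor is not a child of the new one, so no waitpid() is involved there. The sketch reaps the child first only because an unreaped child still accepts signals.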