Blame SOURCES/0019-mdmon-wait-for-previous-mdmon-to-exit-during-takeove.patch

From d2e11da4b7fd0453e942f43e4196dc63b3dbd708 Mon Sep 17 00:00:00 2001
From: Pawel Baldysiak <pawel.baldysiak@intel.com>
Date: Fri, 22 Feb 2019 13:30:27 +0100
Subject: [RHEL7.7 PATCH 19/21] mdmon: wait for previous mdmon to exit during
 takeover

Since the patch c76242c5("mdmon: get safe mode delay file descriptor
early"), safe_mode_delay is set properly by initrd mdmon.  But in some
cases with filesystem traffic since the very start of the system, it
might take a while to transition to clean state.  Because the new
mdmon does not wait for the old one to exit, it might happen that the
new one switches safe_mode_delay back to seconds before the old one exits.
As a result, two mdmons are running concurrently on the same array.

Wait for the old mdmon to exit by pinging it with SIGUSR1 signal, just
in case it is sleeping.

Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
---
 mdmon.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/mdmon.c b/mdmon.c
index 0955fcc..ff985d2 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -171,6 +171,7 @@ static void try_kill_monitor(pid_t pid, char *devname, int sock)
 	int fd;
 	int n;
 	long fl;
+	int rv;
 
 	/* first rule of survival... don't off yourself */
 	if (pid == getpid())
@@ -201,9 +202,16 @@ static void try_kill_monitor(pid_t pid, char *devname, int sock)
 	fl &= ~O_NONBLOCK;
 	fcntl(sock, F_SETFL, fl);
 	n = read(sock, buf, 100);
-	/* Ignore result, it is just the wait that
-	 * matters
-	 */
+
+	/* If there is I/O going on it might take some time to get to
+	 * clean state. Wait for monitor to exit fully to avoid races.
+	 * Ping it with SIGUSR1 in case it is sleeping */
+	for (n = 0; n < 25; n++) {
+		rv = kill(pid, SIGUSR1);
+		if (rv < 0)
+			break;
+		usleep(200000);
+	}
 }
 
 void remove_pidfile(char *devname)
-- 
2.7.5
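The wait loop added to try_kill_monitor() can be tried outside mdmon with a small standalone sketch. It is not mdmon code: wait_for_exit(), spawn_sleeper(), sleep_ms() and on_usr1() are hypothetical names introduced here for illustration, and the sketch assumes a POSIX system where ignoring SIGCHLD makes exited children get reaped automatically (as on Linux). Like the patch, it treats any kill() failure as "the process is gone" and pings with SIGUSR1 so that a sleeping target wakes up early.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Sleep for `ms` milliseconds; returns early if a signal handler runs. */
static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/*
 * Hypothetical helper mirroring the loop in the patch: ping `pid` with
 * SIGUSR1 up to `max_tries` times, pausing `interval_ms` between pings.
 * Returns 0 as soon as kill() fails (usually ESRCH, i.e. the process has
 * exited), or -1 if the process is still there after the last try.
 */
static int wait_for_exit(pid_t pid, int max_tries, long interval_ms)
{
	int i;

	assert(max_tries > 0 && interval_ms >= 0);

	/* first rule of survival: never signal ourselves */
	if (pid == getpid())
		return -1;

	for (i = 0; i < max_tries; i++) {
		if (kill(pid, SIGUSR1) < 0)
			return 0;
		sleep_ms(interval_ms);
	}
	return -1;
}

/* No-op handler: SIGUSR1 only needs to interrupt the sleep. */
static void on_usr1(int sig)
{
	(void)sig;
}

/*
 * Demo only: fork a child that sleeps for up to `lifetime_ms` and exits,
 * waking early when pinged with SIGUSR1. Returns the child's pid.
 */
static pid_t spawn_sleeper(long lifetime_ms)
{
	struct sigaction sa;
	pid_t pid;

	/* Install the handler before fork() so the child inherits it. */
	sa.sa_handler = on_usr1;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(SIGUSR1, &sa, NULL);

	/* Auto-reap children: kill() still succeeds on an unreaped zombie. */
	signal(SIGCHLD, SIG_IGN);

	pid = fork();
	if (pid == 0) {
		sleep_ms(lifetime_ms);
		_exit(0);
	}
	return pid;
}
```

Calling wait_for_exit(spawn_sleeper(5000), 25, 200) uses the same bounds as the patch (25 pings, 200 ms apart, about 5 seconds in total). One caveat the sketch makes visible: kill() keeps succeeding on a zombie, so the approach only detects exit once the target has been reaped by its parent.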