From 63902857b98c37c8ac4b837bb01d006b327a4532 Mon Sep 17 00:00:00 2001
From: Heming Zhao <heming.zhao@suse.com>
Date: Tue, 21 Jun 2022 00:10:40 +0800
Subject: [PATCH 22/83] mdadm/super1: restore commit 45a87c2f31335 to fix
 clustered slot issue

Commit 9d67f6496c71 ("mdadm:check the nodes when operate clustered
array") modified the assignment logic for st->nodes in write_bitmap1(),
which introduced a bitmap slot issue:

load_super1() doesn't set up supertype.nodes, so a spare disk ends up
with only one slot of bitmap info. The kernel's md_bitmap_load_sb()
then reads wrong bitmap slot data.

There are two ways to fix this issue:

1> Revert the related code of commit 9d67f6496c71 and restore the code
   from the former commit 45a87c2f31335 ("super1: add more checks for
   NodeNumUpdate option").
   Under the current code logic st->nodes can be 0 or 1, e.g. when
   adding a spare disk nothing initializes st->nodes, so its value is
   zero.

2> Keep 9d67f6496c71 and add ->nodes handling to load_super1(), so that
   load_super1() sets st->nodes when the bitmap is
   BITMAP_MAJOR_CLUSTERED.
   Under the current mdadm code logic load_super1() is called many
   times, so any new code in it makes mdadm run longer. In addition, I
   prefer to keep clustered code from spreading into every corner as
   much as possible.

So I used method <1> to fix this issue.
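
As a rough illustration of the check order that method <1> restores,
here is a standalone C sketch. It is not the mdadm code:
classify_nodes(), check_examples() and the NODES_* names are
hypothetical, both counts are taken in host byte order (the real code
compares against the little-endian bms->nodes), and the last case only
stands in for code that lies outside the hunk in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical outcome labels for this sketch; mdadm defines no such enum. */
enum nodes_action {
	NODES_SKIP,	/* --nodes not given: skip the node-count update */
	NODES_SHRINK,	/* fewer nodes requested: no space check needed */
	NODES_OTHER,	/* anything else: handled by code outside the hunk */
};

/*
 * Mirror the order of the restored checks on st->nodes.
 * Returns -EINVAL for an invalid node count, otherwise a nodes_action.
 */
static int classify_nodes(uint32_t st_nodes, uint32_t bms_nodes)
{
	if (st_nodes == 1)
		return -EINVAL;		/* cluster-md needs at least two nodes */
	if (st_nodes == 0)
		return NODES_SKIP;	/* e.g. adding a spare disk */
	if (st_nodes < bms_nodes)
		return NODES_SHRINK;
	return NODES_OTHER;
}

/* Usage: the spare-disk case (st->nodes == 0) against a 4-node bitmap. */
void check_examples(void)
{
	assert(classify_nodes(0, 4) == NODES_SKIP);
	assert(classify_nodes(1, 4) == -EINVAL);
}
```

In the spare-disk case from the trigger steps st->nodes is 0, so the
sketch takes the NODES_SKIP path, which corresponds to the "break" that
this patch restores.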
How to trigger:

dd if=/dev/zero bs=1M count=1 oflag=direct of=/dev/sda
dd if=/dev/zero bs=1M count=1 oflag=direct of=/dev/sdb
dd if=/dev/zero bs=1M count=1 oflag=direct of=/dev/sdc
mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda /dev/sdb
mdadm -a /dev/md0 /dev/sdc
mdadm /dev/md0 --fail /dev/sda
mdadm /dev/md0 --remove /dev/sda
mdadm -Ss
mdadm -A /dev/md0 /dev/sdb /dev/sdc

Output of "mdadm -X /dev/sdc" with the current code
(correct output should show info for 4 slots by default):
```
        Filename : /dev/sdc
           Magic : 6d746962
         Version : 5
            UUID : a74642f8:a6b1fba8:58e1f8db:cfe7b082
          Events : 29
  Events Cleared : 0
           State : OK
       Chunksize : 64 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 306176 (299.00 MiB 313.52 MB)
          Bitmap : 5 bits (chunks), 5 dirty (100.0%)
```

Later mdadm operations then make the kernel print error messages
(triggered here by "mdadm -A /dev/md0 /dev/sdb /dev/sdc"):
```
kernel: md0: invalid bitmap file superblock: bad magic
kernel: md_bitmap_copy_from_slot can't get bitmap from slot 1
kernel: md-cluster: Could not gather bitmaps from slot 1
kernel: md0: invalid bitmap file superblock: bad magic
kernel: md_bitmap_copy_from_slot can't get bitmap from slot 2
kernel: md-cluster: Could not gather bitmaps from slot 2
kernel: md0: invalid bitmap file superblock: bad magic
kernel: md_bitmap_copy_from_slot can't get bitmap from slot 3
kernel: md-cluster: Could not gather bitmaps from slot 3
kernel: md-cluster: failed to gather all resyn infos
kernel: md0: detected capacity change from 0 to 612352
```

Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Jes Sorensen <jsorensen@fb.com>
---
 super1.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/super1.c b/super1.c
index e3e2f954..3a0c69fd 100644
--- a/super1.c
+++ b/super1.c
@@ -2674,7 +2674,17 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
 		}
 
 		if (bms->version == BITMAP_MAJOR_CLUSTERED) {
-			if (__cpu_to_le32(st->nodes) < bms->nodes) {
+			if (st->nodes == 1) {
+				/* the parameter for nodes is not valid */
+				pr_err("Warning: cluster-md at least needs two nodes\n");
+				return -EINVAL;
+			} else if (st->nodes == 0) {
+				/*
+				 * parameter "--nodes" is not specified, (eg, add a disk to
+				 * clustered raid)
+				 */
+				break;
+			} else if (__cpu_to_le32(st->nodes) < bms->nodes) {
 				/*
 				 * Since the nodes num is not increased, no
 				 * need to check the space enough or not,
-- 
2.38.1