ppc64: tackle SRCU hang issue
Resolves: bz2158296
Upstream: RHEL-only
On PowerPC platforms, the following hang is observed:
Welcome to
Red Hat Enterprise Linux 9.2 Beta (Plow) dracut-057-13.git20220816.el9 (Initramfs)
!
[ 1.631210] systemd[1]: Hostname set to <ibm-p9z-18-lp11.virt.pnr.lab.eng.rdu2.redhat.com>.
[-- MARK -- Mon Sep 26 01:45:00 2022]
[ 243.681283] INFO: task systemd:1 blocked for more than 122 seconds.
[ 243.681303] Not tainted 5.14.0-167.el9.ppc64le #1
[ 243.681315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.681329] task:systemd state:D stack: 0 pid: 1 ppid: 0 flags:0x00042000
[ 243.681349] Call Trace:
[ 243.681356] [c00000001a603640] [c00000004f990100] 0xc00000004f990100 (unreliable)
[ 243.681378] [c00000001a603830] [c00000001001e9cc] __switch_to+0x12c/0x220
[ 243.681400] [c00000001a603890] [c000000010ec5b40] __schedule+0x230/0x720
[ 243.681418] [c00000001a603950] [c000000010ec6090] schedule+0x60/0x110
[ 243.681435] [c00000001a603980] [c000000010ecd948] schedule_timeout+0x168/0x1c0
[ 243.681454] [c00000001a603a60] [c000000010ec7214] __wait_for_common+0x134/0x360
[ 243.681473] [c00000001a603b00] [c00000001017c98c] __flush_work.isra.0+0x1dc/0x3d0
[ 243.681493] [c00000001a603ba0] [c0000000105cbd88] fsnotify_wait_marks_destroyed+0x28/0x40
[ 243.681512] [c00000001a603bc0] [c0000000105cb800] fsnotify_destroy_group+0x60/0x150
[ 243.681531] [c00000001a603c30] [c0000000105cf640] inotify_release+0x30/0xa0
[ 243.681548] [c00000001a603ca0] [c00000001054fad8] __fput+0xc8/0x350
[ 243.681565] [c00000001a603cf0] [c000000010183174] task_work_run+0xe4/0x160
[ 243.681583] [c00000001a603d40] [c000000010021874] do_notify_resume+0x134/0x140
[ 243.681602] [c00000001a603d70] [c000000010030168] interrupt_exit_user_prepare_main+0x198/0x270
[ 243.681622] [c00000001a603de0] [c0000000100305ac] syscall_exit_prepare+0x6c/0x180
[ 243.681641] [c00000001a603e10] [c00000001000bff4] system_call_vectored_common+0xf4/0x278
[ 243.681661] --- interrupt: 3000 at 0x7fffb3015ba4
[ 243.681673] NIP: 00007fffb3015ba4 LR: 0000000000000000 CTR: 0000000000000000
[ 243.681687] REGS: c00000001a603e80 TRAP: 3000 Not tainted (5.14.0-167.el9.ppc64le)
[ 243.681703] MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 42044440 XER: 00000000
[ 243.681737] IRQMASK: 0
[ 243.681737] GPR00: 0000000000000006 00007fffd24a31a0 00007fffb3127200 0000000000000000
[ 243.681737] GPR04: 0000000000000002 000000000000000a 0000000000000000 0000000000000000
[ 243.681737] GPR08: 0000010009ea2d40 0000000000000000 0000000000000000 0000000000000000
[ 243.681737] GPR12: 0000000000000000 00007fffb3834bc0 0000000000000000 0000000000000000
[ 243.681737] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 243.681737] GPR20: 000000012c74ddf0 000000000000000e 000000000017cd3f 0000000000000000
[ 243.681737] GPR24: 00007fffd24a3570 0000000000000005 0000010009eb5490 0000010009ea24e0
[ 243.681737] GPR28: 0000010009ea2900 0000010009eb4850 0000010009ea2d70 00007fffb382dd98
[ 243.681896] NIP [00007fffb3015ba4] 0x7fffb3015ba4
[ 243.681907] LR [0000000000000000] 0x0
[ 243.681917] --- interrupt: 3000
[ 243.681928] INFO: task kworker/u16:1:34 blocked for more than 122 seconds.
[ 243.681941] Not tainted 5.14.0-167.el9.ppc64le #1
[ 243.681951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.681964] task:kworker/u16:1 state:D stack: 0 pid: 34 ppid: 2 flags:0x00000800
[ 243.681982] Workqueue: events_unbound fsnotify_mark_destroy_workfn
[ 243.681998] Call Trace:
[ 243.682005] [c00000001a9336d0] [c00000004f990100] 0xc00000004f990100 (unreliable)
[ 243.682023] [c00000001a9338c0] [c00000001001e9cc] __switch_to+0x12c/0x220
[ 243.682042] [c00000001a933920] [c000000010ec5b40] __schedule+0x230/0x720
[ 243.682059] [c00000001a9339e0] [c000000010ec6090] schedule+0x60/0x110
[ 243.682075] [c00000001a933a10] [c000000010ecd948] schedule_timeout+0x168/0x1c0
[ 243.682094] [c00000001a933af0] [c000000010ec7214] __wait_for_common+0x134/0x360
[ 243.682113] [c00000001a933b90] [c000000010213370] __synchronize_srcu.part.0+0xa0/0xe0
[ 243.682132] [c00000001a933c00] [c0000000105cc154] fsnotify_mark_destroy_workfn+0xc4/0x1a0
[ 243.682151] [c00000001a933c70] [c00000001017acb8] process_one_work+0x298/0x580
[ 243.682169] [c00000001a933d10] [c00000001017b048] worker_thread+0xa8/0x630
[ 243.682185] [c00000001a933da0] [c000000010188348] kthread+0x1b8/0x1c0
[ 243.682203] [c00000001a933e10] [c00000001000cd64] ret_from_kernel_thread+0x5c/0x64
[ 366.561279] INFO: task systemd:1 blocked for more than 245 seconds.
The proper fix belongs in the kernel, but since the SRCU patch [1] will
not be merged into mainline in the near future, a userspace workaround
is needed to overcome this test blocker.
The workaround is to pass the kernel parameter "srcutree.big_cpu_lim=0", so
that SRCU always uses the srcu_node array.
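As a rough illustration only (not the code changed by this patch), the
workaround amounts to making sure the parameter ends up on the kernel command
line on ppc64 machines. The sketch below assumes a hypothetical helper,
add_srcu_workaround(), and a plain whitespace-separated command line; only the
parameter name "srcutree.big_cpu_lim=0" comes from this commit.

```python
# Hypothetical helper, for illustration; names other than the kernel
# parameter itself are assumptions and not part of this patch.
SRCU_PARAM = "srcutree.big_cpu_lim"
SRCU_WORKAROUND = SRCU_PARAM + "=0"


def add_srcu_workaround(cmdline: str, arch: str) -> str:
    """Return cmdline with the SRCU workaround appended on ppc64/ppc64le.

    Assumes parameters are separated by whitespace and contain no quoted
    values with embedded spaces.
    """
    if not arch.startswith("ppc64"):
        # Other architectures are left untouched.
        return cmdline
    # Drop any existing setting so exactly one value of the parameter remains.
    params = [p for p in cmdline.split() if not p.startswith(SRCU_PARAM + "=")]
    params.append(SRCU_WORKAROUND)
    return " ".join(params)
```

Calling add_srcu_workaround("root=/dev/sda1 ro", "ppc64le") yields the same
command line with "srcutree.big_cpu_lim=0" appended, while an x86_64 command
line is returned unchanged.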
[1]: https://lore.kernel.org/rcu/20221026032716.78674-1-kernelfans@gmail.com/T/#m6534975507c2abca497a94d81c7abbfea1d0978d
Signed-off-by: Pingfan Liu <piliu@redhat.com>