From 04dce26bd12888b924425beefa449a07b683021a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Renaud=20M=C3=A9trich?= <rmetrich@redhat.com>
Date: Fri, 13 Apr 2018 09:24:30 +0200
Subject: [PATCH] [kernel] Disable gathering /proc/timer* statistics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Disable gathering /proc/timer* statistics by default; a new option,
'kernel.with-timer', enables gathering them.

If /proc/timer_list is huge, the kernel has trouble processing all the
timers, because it needs to spin in a tight loop inside the kernel to
do so.

We tried to fix this on the kernel side by adding touch_nmi_watchdog()
to silence softlockups and cond_resched() to fix the RCU stall issue,
but with such a huge number of timers the RHEL7 kernel still hangs.
It can be reproduced to some extent on an upstream kernel (however,
there will be workqueue lockups).

We came to the conclusion that reading /proc/timer_list should be
disabled in sosreport. Since /proc/timer_stats is tied to
/proc/timer_list, both are disabled at the same time.

Resolves: #1268

Signed-off-by: Renaud Métrich <rmetrich@redhat.com>
Signed-off-by: Bryn M. Reeves <bmr@redhat.com>
---
 sos/plugins/kernel.py | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
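
Note: the option gating introduced by this patch can be illustrated with
a small standalone sketch. FakePlugin and collected_specs() below are
hypothetical stand-ins written for this note; they only mimic the
option_list / get_option() / add_copy_spec() names used in the diff and
are not the real sos Plugin implementation.

```python
# Illustrative sketch only: FakePlugin is a hypothetical stand-in,
# not the real sos Plugin class.
class FakePlugin:
    # Tuples of (name, description, speed, default), as in the patch.
    option_list = [
        ("with-timer", "gather /proc/timer* statistics", "slow", False)
    ]

    def __init__(self, overrides=None):
        # Start from the declared defaults, then apply user overrides.
        self.options = {name: default
                        for name, _desc, _speed, default in self.option_list}
        self.options.update(overrides or {})
        self.copy_specs = []

    def get_option(self, name):
        return self.options[name]

    def add_copy_spec(self, spec):
        self.copy_specs.append(spec)

    def setup(self):
        # Always collected.
        self.add_copy_spec("/proc/softirqs")
        # Collected only when the user opts in.
        if self.get_option("with-timer"):
            self.add_copy_spec("/proc/timer*")


def collected_specs(overrides=None):
    """Run setup() and return the list of paths queued for copying."""
    plugin = FakePlugin(overrides)
    plugin.setup()
    return plugin.copy_specs
```

With the default (False), collected_specs() leaves /proc/timer* out;
passing {"with-timer": True} adds it back, which is the behaviour the
new 'kernel.with-timer' option is meant to provide.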

diff --git a/sos/plugins/kernel.py b/sos/plugins/kernel.py
index 97ef7862..6c2f509c 100644
--- a/sos/plugins/kernel.py
+++ b/sos/plugins/kernel.py
@@ -27,6 +27,10 @@ class Kernel(Plugin, RedHatPlugin, DebianPlugin, UbuntuPlugin):
 
     sys_module = '/sys/module'
 
+    option_list = [
+        ("with-timer", "gather /proc/timer* statistics", "slow", False)
+    ]
+
     def setup(self):
         # compat
         self.add_cmd_output("uname -a", root_symlink="uname")
@@ -83,7 +87,6 @@ class Kernel(Plugin, RedHatPlugin, DebianPlugin, UbuntuPlugin):
             "/proc/driver",
             "/proc/sys/kernel/tainted",
             "/proc/softirqs",
-            "/proc/timer*",
             "/proc/lock*",
             "/proc/misc",
             "/var/log/dmesg",
@@ -92,4 +95,9 @@ class Kernel(Plugin, RedHatPlugin, DebianPlugin, UbuntuPlugin):
             clocksource_path + "current_clocksource"
         ])
 
+        if self.get_option("with-timer"):
+            # This can be very slow, depending on the number of timers,
+            # and may also cause softlockups
+            self.add_copy_spec("/proc/timer*")
+
 # vim: set et ts=4 sw=4 :
-- 
2.13.6