Blame SOURCES/gdb-rhbz1210135-internal-error-linux_nat_post_attach_wait.patch

  NOTE: This patch has been forward-ported to RHEL-7.2.  It is originally
  from RHEL-6.7.

  Message-ID: <54E37CE7.50703@redhat.com>
  Date: Tue, 17 Feb 2015 17:39:51 +0000
  From: Pedro Alves <palves@redhat.com>
  To: Sergio Durigan Junior <sergiodj@redhat.com>
  Subject: [debug-list] [PATCH] RH BZ #1162264 - gdb/linux-nat.c:1411:
   internal-error:
   linux_nat_post_attach_wait: Assertion `pid == new_pid' failed.

  Hi.

  Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1162264

  So I spent a few more hours today trying to reproduce the
  EACCES, to no avail.  Also, unfortunately, none of the attach
  bugs exposed by the attach-many-short-lived-threads.exp test
  can explain this.

  It seems to me that the best we can do is cope with
  the error, like in the patch below.

  Note that the backtrace at

   https://bugzilla.redhat.com/show_bug.cgi?id=1162264#c3 :

  shows that this triggers for the main thread already:

  ...
  #6  0x000000000044fd2e in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x1d84368,
  ...

  (note "first=1").

  For upstream, I think linux_nat_attach should be adjusted to work
  like gdbserver -- that is, leave the initial waitpid to the main
  wait code, like all other events, instead of synchronously
  doing waitpid(PID).  That'll get rid of linux_nat_post_attach_wait
  altogether.  But that's too invasive for a bug fix.

  >From 072c61aeb9adc64e1eb45c120061b85fbf6f4d25 Mon Sep 17 00:00:00 2001
  From: Pedro Alves <palves@redhat.com>
  Date: Tue, 17 Feb 2015 17:11:05 +0000
  Subject: [PATCH] RH BZ #1162264 - gdb/linux-nat.c:1411: internal-error:
   linux_nat_post_attach_wait: Assertion `pid == new_pid' failed.

  According to BZ #1162264, it can happen that we manage to attach to a
  process, but then waitpid on it fails with EACCES.  That's unexpected,
  and gdb hits an assertion.  But given this is an error that is out of
  our control, we should handle it gracefully.  I wasn't able to
  reproduce the EACCES, but hacking in the error, like:

  |  --- a/gdb/linux-nat.c
  |  +++ b/gdb/linux-nat.c
  |  @@ -1409,7 +1409,7 @@ linux_nat_post_attach_wait (ptid_t ptid, int first, int *cloned,
  | 	   *cloned = 1;
  | 	 }
  | 
  |  -  if (new_pid != pid)
  |  +  if (new_pid != pid || 1)
  | 	 {
  | 	   int saved_errno = errno;
  | 
  |  @@ -1423,6 +1423,7 @@ linux_nat_post_attach_wait (ptid_t ptid, int first, int *cloned,
  | 	   ptrace (PTRACE_DETACH, pid, 0, 0);
  | 
  | 	   errno = saved_errno;
  |  +      errno = EACCES;
  | 	   perror_with_name (_("waitpid"));
  | 	 }

  ... I could confirm that the error handling works properly.  In the
  EACCES case, we get:

   (gdb) attach 1202
   Attaching to process 1202
   Unable to attach: waitpid: Permission denied.
   (gdb) info inferiors
     Num  Description       Executable
   * 1    <null>
   (gdb)

  No test because the conditions that lead to the waitpid error are
  unknown.
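As a rough illustration of the errno handling described above (this is not code from the patch): a cleanup call made between the failing call and the error report can overwrite errno, so the value is captured first and restored before the message is built. The names `cleanup_that_clobbers_errno` and `report_after_cleanup` are hypothetical, and the cleanup only stands in for the PTRACE_DETACH call.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a cleanup step such as
   ptrace (PTRACE_DETACH, ...).  Like any call that can fail, it may
   overwrite errno; here it always does.  */
static void
cleanup_that_clobbers_errno (void)
{
  errno = ESRCH;
}

/* Build "<what>: <error text>" in BUF for the error that was current on
   entry, running the cleanup in between.  Returns the errno value that
   was reported.  */
int
report_after_cleanup (const char *what, char *buf, size_t len)
{
  int saved_errno = errno;      /* Capture before anything can change it.  */

  assert (buf != NULL && len > 0);

  cleanup_that_clobbers_errno ();

  errno = saved_errno;          /* Restore so the report names the real cause.  */
  snprintf (buf, len, "%s: %s", what, strerror (errno));
  return saved_errno;
}
```

Without the save and restore, the message in this sketch would describe the cleanup's ESRCH instead of the original failure.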

  gdb/ChangeLog:
  2015-02-17  Pedro Alves  <palves@redhat.com>

	  * linux-nat.c: Include "exceptions.h".
	  (linux_nat_post_attach_wait): If waitpid returns an unexpected
	  result, detach and error out instead of asserting.
	  (linux_nat_attach): Wrap linux_nat_post_attach_wait in TRY_CATCH.
	  Mourn inferior and rethrow in case of error while waiting for the
	  initial stop.
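The catch, clean up, and report pattern in the last ChangeLog entry can be approximated in plain C with setjmp/longjmp. This is only a sketch under stated assumptions: it does not use gdb's TRY_CATCH macro, every name in it (`throw_error`, `post_attach_wait`, `attach_sketch`, `mourned`) is hypothetical, and it returns -1 where the real code rethrows with error ().

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

static jmp_buf catcher;          /* Where a thrown error lands.  */
static const char *err_message;  /* Message of the error in flight.  */
int mourned;                     /* Nonzero once the cleanup has run.  */

/* Hypothetical stand-in for gdb's error (): record the message and
   unwind to the catcher.  */
static void
throw_error (const char *msg)
{
  err_message = msg;
  longjmp (catcher, 1);
}

/* Hypothetical stand-in for the post-attach wait.  Throws when FAIL is
   nonzero, otherwise returns a made-up wait status.  */
static int
post_attach_wait (int fail)
{
  if (fail)
    throw_error ("waitpid: Permission denied");
  return 0x137f;
}

/* Returns the wait status on success.  On failure, runs the cleanup,
   writes the prefixed message to BUF and returns -1.  */
int
attach_sketch (int fail, char *buf, size_t len)
{
  volatile int status = 0;

  assert (buf != NULL && len > 0);

  if (setjmp (catcher) == 0)
    status = post_attach_wait (fail);
  else
    {
      mourned = 1;   /* Stands in for target_mourn_inferior ().  */
      snprintf (buf, len, "Unable to attach: %s", err_message);
      return -1;
    }

  return status;
}
```

The point of the wrapper is that the cleanup runs before the error reaches the caller, so no half-attached state is left behind.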
---
 gdb/linux-nat.c | 34 +++++++++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

Index: gdb-7.6.1/gdb/linux-nat.c
===================================================================
--- gdb-7.6.1.orig/gdb/linux-nat.c
+++ gdb-7.6.1/gdb/linux-nat.c
@@ -1397,7 +1397,22 @@ linux_nat_post_attach_wait (ptid_t ptid,
       *cloned = 1;
     }
 
-  gdb_assert (pid == new_pid);
+  if (new_pid != pid)
+    {
+      int saved_errno = errno;
+
+      /* Unexpected waitpid result.  EACCES has been observed on RHEL
+	 6.5 (RH BZ #1162264).  This is most likely a kernel bug, thus
+	 out of our control, so treat it as invalid input.  The LWP's
+	 state is indeterminate at this point, so best we can do is
+	 error out, otherwise we'd probably end up wedged later on.
+
+	 In case we're still attached.  */
+      ptrace (PTRACE_DETACH, pid, 0, 0);
+
+      errno = saved_errno;
+      perror_with_name (_("waitpid"));
+    }
 
   if (!WIFSTOPPED (status))
     {
@@ -1621,7 +1636,7 @@ static void
 linux_nat_attach (struct target_ops *ops, char *args, int from_tty)
 {
   struct lwp_info *lp;
-  int status;
+  int status = 0;
   ptid_t ptid;
   volatile struct gdb_exception ex;
 
@@ -1659,8 +1674,19 @@ linux_nat_attach (struct target_ops *ops
   /* Add the initial process as the first LWP to the list.  */
   lp = add_initial_lwp (ptid);
 
-  status = linux_nat_post_attach_wait (lp->ptid, 1, &lp->cloned,
-				       &lp->signalled);
+  TRY_CATCH (ex, RETURN_MASK_ERROR)
+    {
+      status = linux_nat_post_attach_wait (lp->ptid, 1, &lp->cloned,
+					   &lp->signalled);
+    }
+  if (ex.reason < 0)
+    {
+      target_terminal_ours ();
+      target_mourn_inferior ();
+
+      error (_("Unable to attach: %s"), ex.message);
+    }
+
   if (!WIFSTOPPED (status))
     {
       if (WIFEXITED (status))