NOTE: This patch has been forward-ported to RHEL-7.2. It originally
comes from RHEL-6.7.
Message-ID: <54E37CE7.50703@redhat.com>
Date: Tue, 17 Feb 2015 17:39:51 +0000
From: Pedro Alves <palves@redhat.com>
To: Sergio Durigan Junior <sergiodj@redhat.com>
Subject: [debug-list] [PATCH] RH BZ #1162264 - gdb/linux-nat.c:1411:
internal-error:
linux_nat_post_attach_wait: Assertion `pid == new_pid' failed.
Hi.
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1162264
So I spent a few more hours today trying to reproduce the
EACCES, to no avail. Also, unfortunately, none of the attach
bugs exposed by attach-many-short-lived-threads.exp test
can explain this.
It seems to me that the best we can really do is cope with
the error, as in the patch below.
Note that the backtrace at
https://bugzilla.redhat.com/show_bug.cgi?id=1162264#c3
shows that this triggers for the main thread already:
...
#6 0x000000000044fd2e in linux_nat_post_attach_wait (ptid=..., first=1, cloned=0x1d84368,
...
(note "first=1").
For upstream, I think linux_nat_attach should be adjusted to work
like gdbserver -- that is, leave the initial waitpid to the main
wait code, like all other events, instead of synchronously
doing waitpid(PID). That'll get rid of linux_nat_post_attach_wait
altogether. But that's too invasive for a bug fix.
>From 072c61aeb9adc64e1eb45c120061b85fbf6f4d25 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Tue, 17 Feb 2015 17:11:05 +0000
Subject: [PATCH] RH BZ #1162264 - gdb/linux-nat.c:1411: internal-error:
linux_nat_post_attach_wait: Assertion `pid == new_pid' failed.
According to BZ #1162264, it can happen that we manage to attach to a
process, but then waitpid on it fails with EACCES. That's unexpected,
and gdb hits an assertion. But given this is an error that is out of
our control, we should handle it gracefully. I wasn't able to
reproduce the EACCES, but hacking in the error, like:
| --- a/gdb/linux-nat.c
| +++ b/gdb/linux-nat.c
| @@ -1409,7 +1409,7 @@ linux_nat_post_attach_wait (ptid_t ptid, int first, int *cloned,
| *cloned = 1;
| }
|
| - if (new_pid != pid)
| + if (new_pid != pid || 1)
| {
| int saved_errno = errno;
|
| @@ -1423,6 +1423,7 @@ linux_nat_post_attach_wait (ptid_t ptid, int first, int *cloned,
| ptrace (PTRACE_DETACH, pid, 0, 0);
|
| errno = saved_errno;
| + errno = EACCES;
| perror_with_name (_("waitpid"));
| }
... I could confirm that the error handling works properly. In the
EACCES case, we get:
(gdb) attach 1202
Attaching to process 1202
Unable to attach: waitpid: Permission denied.
(gdb) info inferiors
Num Description Executable
* 1 <null>
(gdb)
No test because the conditions that lead to the waitpid error are
unknown.
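For illustration only, the errno handling in the new error path can be
sketched as a standalone C fragment. This is not the gdb code:
post_attach_wait_sketch, the wait_fn/detach_fn callback types and the
fake_* helpers are hypothetical stand-ins for
linux_nat_post_attach_wait, waitpid and ptrace (PTRACE_DETACH), so the
sketch runs without a real tracee. It assumes the detach call may
clobber errno, which is why errno is saved and restored around it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-ins for waitpid and ptrace (PTRACE_DETACH).  */
typedef pid_t (*wait_fn) (pid_t pid, int *status);
typedef void (*detach_fn) (pid_t pid);

/* Wait for the initial stop of PID.  Return 0 on success.  On an
   unexpected wait result, detach (in case we are still attached),
   restore errno as left by the wait call, and return it (or -1 if
   the wait call did not set errno).  */
static int
post_attach_wait_sketch (pid_t pid, wait_fn do_wait, detach_fn do_detach,
			 int *status)
{
  pid_t new_pid;

  assert (status != NULL);

  errno = 0;
  new_pid = do_wait (pid, status);
  if (new_pid != pid)
    {
      int saved_errno = errno;

      /* The detach may itself fail and overwrite errno.  */
      do_detach (pid);

      errno = saved_errno;
      return saved_errno != 0 ? saved_errno : -1;
    }

  return 0;
}

/* Fake wait that fails the way RH BZ #1162264 describes.  */
static pid_t
fake_wait_eacces (pid_t pid, int *status)
{
  (void) pid;
  (void) status;
  errno = EACCES;
  return -1;
}

/* Fake wait that succeeds and reports PID.  */
static pid_t
fake_wait_ok (pid_t pid, int *status)
{
  *status = 0;
  return pid;
}

/* Fake detach that counts calls and deliberately clobbers errno.  */
static int detach_calls;

static void
fake_detach (pid_t pid)
{
  (void) pid;
  detach_calls++;
  errno = ESRCH;
}
```

With fake_wait_eacces the sketch calls the detach callback once and
returns EACCES. In gdb the equivalent step is perror_with_name, which
throws an error instead of returning a code.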
gdb/ChangeLog:
2015-02-17 Pedro Alves <palves@redhat.com>
* linux-nat.c: Include "exceptions.h".
(linux_nat_post_attach_wait): If waitpid returns an unexpected
result, detach and error out instead of asserting.
(linux_nat_attach): Wrap linux_nat_post_attach_wait in TRY_CATCH.
Mourn inferior and rethrow in case of error while waiting for the
initial stop.
---
gdb/linux-nat.c | 34 +++++++++++++++++++++++++++++++---
1 file changed, 31 insertions(+), 3 deletions(-)
Index: gdb-7.6.1/gdb/linux-nat.c
===================================================================
--- gdb-7.6.1.orig/gdb/linux-nat.c
+++ gdb-7.6.1/gdb/linux-nat.c
@@ -1397,7 +1397,22 @@ linux_nat_post_attach_wait (ptid_t ptid,
*cloned = 1;
}
- gdb_assert (pid == new_pid);
+ if (new_pid != pid)
+ {
+ int saved_errno = errno;
+
+ /* Unexpected waitpid result. EACCES has been observed on RHEL
+ 6.5 (RH BZ #1162264). This is most likely a kernel bug, thus
+ out of our control, so treat it as invalid input. The LWP's
+ state is indeterminate at this point, so best we can do is
+ error out, otherwise we'd probably end up wedged later on.
+
+ In case we're still attached. */
+ ptrace (PTRACE_DETACH, pid, 0, 0);
+
+ errno = saved_errno;
+ perror_with_name (_("waitpid"));
+ }
if (!WIFSTOPPED (status))
{
@@ -1621,7 +1636,7 @@ static void
linux_nat_attach (struct target_ops *ops, char *args, int from_tty)
{
struct lwp_info *lp;
- int status;
+ int status = 0;
ptid_t ptid;
volatile struct gdb_exception ex;
@@ -1659,8 +1674,19 @@ linux_nat_attach (struct target_ops *ops
/* Add the initial process as the first LWP to the list. */
lp = add_initial_lwp (ptid);
- status = linux_nat_post_attach_wait (lp->ptid, 1, &lp->cloned,
- &lp->signalled);
+ TRY_CATCH (ex, RETURN_MASK_ERROR)
+ {
+ status = linux_nat_post_attach_wait (lp->ptid, 1, &lp->cloned,
+ &lp->signalled);
+ }
+ if (ex.reason < 0)
+ {
+ target_terminal_ours ();
+ target_mourn_inferior ();
+
+ error (_("Unable to attach: %s"), ex.message);
+ }
+
if (!WIFSTOPPED (status))
{
if (WIFEXITED (status))