commit 711a322a235d4c8177713f11aa59156603b94aeb
Author: Zack Weinberg <zackw@panix.com>
Date:   Mon Mar 11 10:59:27 2019 -0400

    Use a proper C tokenizer to implement the obsolete typedefs test.
    
    The test for obsolete typedefs in installed headers was implemented
    using grep, and could therefore get false positives on e.g. “ulong”
    in a comment.  It was also scanning all of the headers included by
    our headers, and therefore testing headers we don’t control, e.g.
    Linux kernel headers.
    
    This patch splits the obsolete-typedef test from
    scripts/check-installed-headers.sh to a separate program,
    scripts/check-obsolete-constructs.py.  Because the new program is
    written in Python, it is feasible to make it tokenize C accurately
    enough to avoid false positives on the contents of comments and
    strings.  It also only examines $(headers) in each subdirectory--all
    the headers we install, but not any external dependencies of those
    headers.  Headers whose installed name starts with finclude/ are
    ignored, on the assumption that they contain Fortran.
    
    It is also feasible to make the new test understand the difference
    between _defining_ the obsolete typedefs and _using_ the obsolete
    typedefs, which means posix/{bits,sys}/types.h no longer need to be
    exempted.  This uncovered an actual bug in bits/types.h: __quad_t and
    __u_quad_t were being used to define __S64_TYPE, __U64_TYPE,
    __SQUAD_TYPE and __UQUAD_TYPE.  These are changed to __int64_t and
    __uint64_t respectively.  This is a safe change, despite the comments
    in bits/types.h claiming a difference between __quad_t and __int64_t,
    because those comments are incorrect.  In all current ABIs, both
    __quad_t and __int64_t are ‘long’ when ‘long’ is a 64-bit type, and
    ‘long long’ when ‘long’ is a 32-bit type, and similarly for __u_quad_t
    and __uint64_t.  (Changing the types to be what the comments say they
    are would be an ABI break, as it affects C++ name mangling.)  This
    patch includes a minimal change to make the comments not completely
    wrong.
    
    sys/types.h was defining the legacy BSD u_intN_t typedefs using a
    construct that was not necessarily consistent with how the C99 uintN_t
    typedefs are defined, and is also too complicated for the new script to
    understand (it lexes C relatively accurately, but it does not attempt
    to expand preprocessor macros, nor does it do any actual parsing).
    This patch cuts all of that out and uses bits/types.h's __uintN_t typedefs
    to define u_intN_t instead.  This is verified to not change the ABI on
    any supported architecture, via the c++-types test, which means u_intN_t
    and uintN_t were, in fact, consistent on all supported architectures.
    
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
    
            * scripts/check-obsolete-constructs.py: New test script.
            * scripts/check-installed-headers.sh: Remove tests for
            obsolete typedefs, superseded by check-obsolete-constructs.py.
            * Rules: Run scripts/check-obsolete-constructs.py over $(headers)
            as a special test.  Update commentary.
            * posix/bits/types.h (__SQUAD_TYPE, __S64_TYPE): Define as __int64_t.
            (__UQUAD_TYPE, __U64_TYPE): Define as __uint64_t.
            Update commentary.
            * posix/sys/types.h (__u_intN_t): Remove.
            (u_int8_t): Typedef using __uint8_t.
            (u_int16_t): Typedef using __uint16_t.
            (u_int32_t): Typedef using __uint32_t.
            (u_int64_t): Typedef using __uint64_t.
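
Illustration (not part of the upstream commit): the commit message's point
about grep false positives can be sketched in a few lines of Python.  The
patterns below are deliberately simplified stand-ins, not the regexes used by
scripts/check-obsolete-constructs.py, and the names grep_style, token_style
and SAMPLE are hypothetical.  A line-oriented search flags an obsolete name
inside a comment or string, whereas a scan over identifier tokens does not.

```python
import re

# Simplified stand-in for the obsolete-name pattern (an assumption made
# for this sketch; the patch's own regex also covers u_intNN_t).
OBSOLETE = re.compile(
    r"(?:__)?(?:quad_t|u(?:short|int|long|_(?:char|short|int|long|quad_t)))")

# Minimal token pattern: comments and strings are tried first, so any
# identifier-like text inside them never becomes an IDENT token.
TOKEN = re.compile(r"""
    (?P<COMMENT> /\*.*?\*/ | //[^\n]* )
  | (?P<STRING>  "(?:[^"\\\n]|\\.)*" )
  | (?P<IDENT>   [A-Za-z_][A-Za-z_0-9]* )
  | (?P<OTHER>   . )
""", re.DOTALL | re.VERBOSE)

def grep_style(source):
    """Line-oriented search, comparable to grep -E; returns 1-based line numbers."""
    line_re = re.compile(r"\b" + OBSOLETE.pattern + r"\b")
    return [n for n, line in enumerate(source.splitlines(), 1)
            if line_re.search(line)]

def token_style(source):
    """Return only the IDENT tokens whose entire text is an obsolete name."""
    return [m.group() for m in TOKEN.finditer(source)
            if m.lastgroup == "IDENT" and OBSOLETE.fullmatch(m.group())]

SAMPLE = '''/* formerly declared as ulong */
const char *note = "u_int is obsolete";
typedef unsigned long size_type;
u_short port;
'''

print(grep_style(SAMPLE))   # lines 1 and 2 are false positives: [1, 2, 4]
print(token_style(SAMPLE))  # only the real use is reported: ['u_short']
```

The ordering of the alternatives in TOKEN matters for the same reason the
patch's own comments give for PP_TOKEN_RE_: comments and strings have to be
consumed as whole tokens before identifiers are considered.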

Conflicts:
	Rules
	  (textual conflicts due to lack of check-wrapper-headers test.)
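
Illustration (not part of the upstream commit): the rule that sys/types.h may
define u_intN_t only in terms of the matching __uintN_t ("N must agree") can
be sketched as follows.  This is a simplified restatement using regular
expressions, not the string-slicing logic the patch itself uses; the function
name permitted_u_int_typedef and its list-of-words input are assumptions made
for the example.

```python
import re

def permitted_u_int_typedef(words):
    """Return True if WORDS spells 'typedef __uintN_t u_intN_t' with equal N.

    WORDS is a hypothetical, pre-filtered list of the identifier tokens of
    one typedef declaration, without whitespace, comments, or the ';'.
    """
    if len(words) != 3 or words[0] != "typedef":
        return False
    private = re.fullmatch(r"__uint([0-9]+)_t", words[1])
    public = re.fullmatch(r"u_int([0-9]+)_t", words[2])
    # Both names must have the expected shape and carry the same width.
    return bool(private and public and private.group(1) == public.group(1))

print(permitted_u_int_typedef(["typedef", "__uint32_t", "u_int32_t"]))        # True
print(permitted_u_int_typedef(["typedef", "__uint32_t", "u_int64_t"]))        # False
print(permitted_u_int_typedef(["typedef", "unsigned", "int", "u_int32_t"]))   # False
```

The third call shows why the old open-coded definitions in sys/types.h had to
be rewritten: a definition spelled with basic types is not one of the forms a
checker of this kind accepts for u_intN_t.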

diff --git a/Rules b/Rules
index 5abb7270aa8e24aa..a07dbb8d978b5769 100644
--- a/Rules
+++ b/Rules
@@ -82,7 +82,8 @@ $(common-objpfx)dummy.c:
 common-generated += dummy.o dummy.c
 
 ifneq "$(headers)" ""
-# Special test of all the installed headers in this directory.
+# Test that all of the headers installed by this directory can be compiled
+# in isolation.
 tests-special += $(objpfx)check-installed-headers-c.out
 libof-check-installed-headers-c := testsuite
 $(objpfx)check-installed-headers-c.out: \
@@ -93,6 +94,8 @@ $(objpfx)check-installed-headers-c.out: \
 	$(evaluate-test)
 
 ifneq "$(CXX)" ""
+# If a C++ compiler is available, also test that they can be compiled
+# in isolation as C++.
 tests-special += $(objpfx)check-installed-headers-cxx.out
 libof-check-installed-headers-cxx := testsuite
 $(objpfx)check-installed-headers-cxx.out: \
@@ -101,8 +104,19 @@ $(objpfx)check-installed-headers-cxx.out: \
 	  "$(CXX) $(filter-out -std=%,$(CXXFLAGS)) -D_ISOMAC $(+includes)" \
 	  $(headers) > $@; \
 	$(evaluate-test)
-endif
-endif
+endif # $(CXX)
+
+# Test that none of the headers installed by this directory use certain
+# obsolete constructs (e.g. legacy BSD typedefs superseded by stdint.h).
+# This script does not need $(py-env).
+tests-special += $(objpfx)check-obsolete-constructs.out
+libof-check-obsolete-constructs := testsuite
+$(objpfx)check-obsolete-constructs.out: \
+    $(..)scripts/check-obsolete-constructs.py $(headers)
+	$(PYTHON) $^ > $@ 2>&1; \
+	$(evaluate-test)
+
+endif # $(headers)
 
 # This makes all the auxiliary and test programs.
 
diff --git a/posix/bits/types.h b/posix/bits/types.h
index 5e22ce41bf4c29b3..64f344c6e7897491 100644
--- a/posix/bits/types.h
+++ b/posix/bits/types.h
@@ -86,7 +86,7 @@ __extension__ typedef unsigned long long int __uintmax_t;
 	32		-- "natural" 32-bit type (always int)
 	64		-- "natural" 64-bit type (long or long long)
 	LONG32		-- 32-bit type, traditionally long
-	QUAD		-- 64-bit type, always long long
+	QUAD		-- 64-bit type, traditionally long long
 	WORD		-- natural type of __WORDSIZE bits (int or long)
 	LONGWORD	-- type of __WORDSIZE bits, traditionally long
 
@@ -112,14 +112,14 @@ __extension__ typedef unsigned long long int __uintmax_t;
 #define __SLONGWORD_TYPE	long int
 #define __ULONGWORD_TYPE	unsigned long int
 #if __WORDSIZE == 32
-# define __SQUAD_TYPE		__quad_t
-# define __UQUAD_TYPE		__u_quad_t
+# define __SQUAD_TYPE		__int64_t
+# define __UQUAD_TYPE		__uint64_t
 # define __SWORD_TYPE		int
 # define __UWORD_TYPE		unsigned int
 # define __SLONG32_TYPE		long int
 # define __ULONG32_TYPE		unsigned long int
-# define __S64_TYPE		__quad_t
-# define __U64_TYPE		__u_quad_t
+# define __S64_TYPE		__int64_t
+# define __U64_TYPE		__uint64_t
 /* We want __extension__ before typedef's that use nonstandard base types
    such as `long long' in C89 mode.  */
 # define __STD_TYPE		__extension__ typedef
diff --git a/posix/sys/types.h b/posix/sys/types.h
index db524d6cd13f0379..47eff1a7b1a91c81 100644
--- a/posix/sys/types.h
+++ b/posix/sys/types.h
@@ -154,37 +154,20 @@ typedef unsigned int uint;
 
 #include <bits/stdint-intn.h>
 
-#if !__GNUC_PREREQ (2, 7)
-
 /* These were defined by ISO C without the first `_'.  */
-typedef	unsigned char u_int8_t;
-typedef	unsigned short int u_int16_t;
-typedef	unsigned int u_int32_t;
-# if __WORDSIZE == 64
-typedef unsigned long int u_int64_t;
-# else
-__extension__ typedef unsigned long long int u_int64_t;
-# endif
-
-typedef int register_t;
-
-#else
-
-/* For GCC 2.7 and later, we can use specific type-size attributes.  */
-# define __u_intN_t(N, MODE) \
-  typedef unsigned int u_int##N##_t __attribute__ ((__mode__ (MODE)))
-
-__u_intN_t (8, __QI__);
-__u_intN_t (16, __HI__);
-__u_intN_t (32, __SI__);
-__u_intN_t (64, __DI__);
+typedef __uint8_t u_int8_t;
+typedef __uint16_t u_int16_t;
+typedef __uint32_t u_int32_t;
+typedef __uint64_t u_int64_t;
 
+#if __GNUC_PREREQ (2, 7)
 typedef int register_t __attribute__ ((__mode__ (__word__)));
-
+#else
+typedef int register_t;
+#endif
 
 /* Some code from BIND tests this macro to see if the types above are
    defined.  */
-#endif
 #define __BIT_TYPES_DEFINED__	1
 
 
diff --git a/scripts/check-installed-headers.sh b/scripts/check-installed-headers.sh
index 7a1969b43a144ebb..c2aeea5aabcc7ffd 100644
--- a/scripts/check-installed-headers.sh
+++ b/scripts/check-installed-headers.sh
@@ -16,11 +16,9 @@
 # License along with the GNU C Library; if not, see
 # <http://www.gnu.org/licenses/>.
 
-# Check installed headers for cleanliness.  For each header, confirm
-# that it's possible to compile a file that includes that header and
-# does nothing else, in several different compilation modes.  Also,
-# scan the header for a set of obsolete typedefs that should no longer
-# appear.
+# For each installed header, confirm that it's possible to compile a
+# file that includes that header and does nothing else, in several
+# different compilation modes.
 
 # These compilation switches assume GCC or compatible, which is probably
 # fine since we also assume that when _building_ glibc.
@@ -31,13 +29,6 @@ cxx_modes="-std=c++98 -std=gnu++98 -std=c++11 -std=gnu++11"
 # These are probably the most commonly used three.
 lib_modes="-D_DEFAULT_SOURCE=1 -D_GNU_SOURCE=1 -D_XOPEN_SOURCE=700"
 
-# sys/types.h+bits/types.h have to define the obsolete types.
-# rpc(svc)/* have the obsolete types too deeply embedded in their API
-# to remove.
-skip_obsolete_type_check='*/sys/types.h|*/bits/types.h|*/rpc/*|*/rpcsvc/*'
-obsolete_type_re=\
-'\<((__)?(quad_t|u(short|int|long|_(char|short|int([0-9]+_t)?|long|quad_t))))\>'
-
 if [ $# -lt 3 ]; then
     echo "usage: $0 c|c++ \"compile command\" header header header..." >&2
     exit 2
@@ -46,14 +37,10 @@ case "$1" in
     (c)
         lang_modes="$c_modes"
         cih_test_c=$(mktemp ${TMPDIR-/tmp}/cih_test_XXXXXX.c)
-        already="$skip_obsolete_type_check"
     ;;
     (c++)
         lang_modes="$cxx_modes"
         cih_test_c=$(mktemp ${TMPDIR-/tmp}/cih_test_XXXXXX.cc)
-        # The obsolete-type check can be skipped for C++; it is
-        # sufficient to do it for C.
-        already="*"
     ;;
     (*)
         echo "usage: $0 c|c++ \"compile command\" header header header..." >&2
@@ -155,22 +142,8 @@ $expanded_lib_mode
 int avoid_empty_translation_unit;
 EOF
             if $cc_cmd -fsyntax-only $lang_mode "$cih_test_c" 2>&1
-            then
-                includes=$($cc_cmd -fsyntax-only -H $lang_mode \
-                              "$cih_test_c" 2>&1 | sed -ne 's/^[.][.]* //p')
-                for h in $includes; do
-                    # Don't repeat work.
-                    eval 'case "$h" in ('"$already"') continue;; esac'
-
-                    if grep -qE "$obsolete_type_re" "$h"; then
-                        echo "*** Obsolete types detected:"
-                        grep -HE "$obsolete_type_re" "$h"
-                        failed=1
-                    fi
-                    already="$already|$h"
-                done
-            else
-                failed=1
+            then :
+            else failed=1
             fi
         done
     done
diff --git a/scripts/check-obsolete-constructs.py b/scripts/check-obsolete-constructs.py
new file mode 100755
index 0000000000000000..ce5c72251f4d7cc0
--- /dev/null
+++ b/scripts/check-obsolete-constructs.py
@@ -0,0 +1,466 @@
+#! /usr/bin/python3
+# Copyright (C) 2019 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <http://www.gnu.org/licenses/>.
+
+"""Verifies that installed headers do not use any obsolete constructs:
+ * legacy BSD typedefs superseded by <stdint.h>:
+   ushort uint ulong u_char u_short u_int u_long u_intNN_t quad_t u_quad_t
+   (sys/types.h is allowed to _define_ these types, but not to use them
+    to define anything else).
+"""
+
+import argparse
+import collections
+import re
+import sys
+
+# Simplified lexical analyzer for C preprocessing tokens.
+# Does not implement trigraphs.
+# Does not implement backslash-newline in the middle of any lexical
+#   item other than a string literal.
+# Does not implement universal-character-names in identifiers.
+# Treats prefixed strings (e.g. L"...") as two tokens (L and "...")
+# Accepts non-ASCII characters only within comments and strings.
+
+# Caution: The order of the outermost alternation matters.
+# STRING must be before BAD_STRING, CHARCONST before BAD_CHARCONST,
+# BLOCK_COMMENT before BAD_BLOCK_COM before PUNCTUATOR, and OTHER must
+# be last.
+# Caution: There should be no capturing groups other than the named
+# captures in the outermost alternation.
+
+# For reference, these are all of the C punctuators as of C11:
+#   [ ] ( ) { } , ; ? ~
+#   ! != * *= / /= ^ ^= = ==
+#   # ##
+#   % %= %> %: %:%:
+#   & &= &&
+#   | |= ||
+#   + += ++
+#   - -= -- ->
+#   . ...
+#   : :>
+#   < <% <: << <<= <=
+#   > >= >> >>=
+
+# The BAD_* tokens are not part of the official definition of pp-tokens;
+# they match unclosed strings, character constants, and block comments,
+# so that the regex engine doesn't have to backtrack all the way to the
+# beginning of a broken construct and then emit dozens of junk tokens.
+
+PP_TOKEN_RE_ = re.compile(r"""
+    (?P<STRING>        \"(?:[^\"\\\r\n]|\\(?:[\r\n -~]|\r\n))*\")
+   |(?P<BAD_STRING>    \"(?:[^\"\\\r\n]|\\[ -~])*)
+   |(?P<CHARCONST>     \'(?:[^\'\\\r\n]|\\(?:[\r\n -~]|\r\n))*\')
+   |(?P<BAD_CHARCONST> \'(?:[^\'\\\r\n]|\\[ -~])*)
+   |(?P<BLOCK_COMMENT> /\*(?:\*(?!/)|[^*])*\*/)
+   |(?P<BAD_BLOCK_COM> /\*(?:\*(?!/)|[^*])*\*?)
+   |(?P<LINE_COMMENT>  //[^\r\n]*)
+   |(?P<IDENT>         [_a-zA-Z][_a-zA-Z0-9]*)
+   |(?P<PP_NUMBER>     \.?[0-9](?:[0-9a-df-oq-zA-DF-OQ-Z_.]|[eEpP][+-]?)*)
+   |(?P<PUNCTUATOR>
+       [,;?~(){}\[\]]
+     | [!*/^=]=?
+     | \#\#?
+     | %(?:[=>]|:(?:%:)?)?
+     | &[=&]?
+     |\|[=|]?
+     |\+[=+]?
+     | -[=->]?
+     |\.(?:\.\.)?
+     | :>?
+     | <(?:[%:]|<(?:=|<=?)?)?
+     | >(?:=|>=?)?)
+   |(?P<ESCNL>         \\(?:\r\n|\r|\n))
+   |(?P<WHITESPACE>    [ \t\n\r\v\f]+)
+   |(?P<OTHER>         .)
+""", re.DOTALL | re.VERBOSE)
+
+HEADER_NAME_RE_ = re.compile(r"""
+    < [^>\r\n]+ >
+  | " [^"\r\n]+ "
+""", re.DOTALL | re.VERBOSE)
+
+ENDLINE_RE_ = re.compile(r"""\r\n|\r|\n""")
+
+# based on the sample code in the Python re documentation
+Token_ = collections.namedtuple("Token", (
+    "kind", "text", "line", "column", "context"))
+Token_.__doc__ = """
+   One C preprocessing token, comment, or chunk of whitespace.
+   'kind' identifies the token type, which will be one of:
+       STRING, CHARCONST, BLOCK_COMMENT, LINE_COMMENT, IDENT,
+       PP_NUMBER, PUNCTUATOR, ESCNL, WHITESPACE, HEADER_NAME,
+       or OTHER.  The BAD_* alternatives in PP_TOKEN_RE_ are
+       handled within tokenize_c, below.
+
+   'text' is the sequence of source characters making up the token;
+       no decoding whatsoever is performed.
+
+   'line' and 'column' give the position of the first character of the
+      token within the source file.  They are both 1-based.
+
+   'context' indicates whether or not this token occurred within a
+      preprocessing directive; it will be None for running text,
+      '<null>' for the leading '#' of a directive line (because '#'
+      all by itself on a line is a "null directive"), or the name of
+      the directive for tokens within a directive line, starting with
+      the IDENT for the name itself.
+"""
+
+def tokenize_c(file_contents, reporter):
+    """Yield a series of Token objects, one for each preprocessing
+       token, comment, or chunk of whitespace within FILE_CONTENTS.
+       The REPORTER object is expected to have one method,
+       reporter.error(token, message), which will be called to
+       indicate a lexical error at the position of TOKEN.
+       If MESSAGE contains the four-character sequence '{!r}', that
+       is expected to be replaced by repr(token.text).
+    """
+
+    Token = Token_
+    PP_TOKEN_RE = PP_TOKEN_RE_
+    ENDLINE_RE = ENDLINE_RE_
+    HEADER_NAME_RE = HEADER_NAME_RE_
+
+    line_num = 1
+    line_start = 0
+    pos = 0
+    limit = len(file_contents)
+    directive = None
+    at_bol = True
+    while pos < limit:
+        if directive == "include":
+            mo = HEADER_NAME_RE.match(file_contents, pos)
+            if mo:
+                kind = "HEADER_NAME"
+                directive = "after_include"
+            else:
+                mo = PP_TOKEN_RE.match(file_contents, pos)
+                kind = mo.lastgroup
+                if kind != "WHITESPACE":
+                    directive = "after_include"
+        else:
+            mo = PP_TOKEN_RE.match(file_contents, pos)
+            kind = mo.lastgroup
+
+        text = mo.group()
+        line = line_num
+        column = mo.start() - line_start
+        adj_line_start = 0
+        # only these kinds can contain a newline
+        if kind in ("WHITESPACE", "BLOCK_COMMENT", "LINE_COMMENT",
+                    "STRING", "CHARCONST", "BAD_BLOCK_COM", "ESCNL"):
+            for tmo in ENDLINE_RE.finditer(text):
+                line_num += 1
+                adj_line_start = tmo.end()
+            if adj_line_start:
+                line_start = mo.start() + adj_line_start
+
+        # Track whether or not we are scanning a preprocessing directive.
+        if kind == "LINE_COMMENT" or (kind == "WHITESPACE" and adj_line_start):
+            at_bol = True
+            directive = None
+        else:
+            if kind == "PUNCTUATOR" and text == "#" and at_bol:
+                directive = "<null>"
+            elif kind == "IDENT" and directive == "<null>":
+                directive = text
+            at_bol = False
+
+        # Report ill-formed tokens and rewrite them as their well-formed
+        # equivalents, so downstream processing doesn't have to know about them.
+        # (Rewriting instead of discarding provides better error recovery.)
+        if kind == "BAD_BLOCK_COM":
+            reporter.error(Token("BAD_BLOCK_COM", "", line, column+1, ""),
+                           "unclosed block comment")
+            text += "*/"
+            kind = "BLOCK_COMMENT"
+        elif kind == "BAD_STRING":
+            reporter.error(Token("BAD_STRING", "", line, column+1, ""),
+                           "unclosed string")
+            text += "\""
+            kind = "STRING"
+        elif kind == "BAD_CHARCONST":
+            reporter.error(Token("BAD_CHARCONST", "", line, column+1, ""),
+                           "unclosed char constant")
+            text += "'"
+            kind = "CHARCONST"
+
+        tok = Token(kind, text, line, column+1,
+                    "include" if directive == "after_include" else directive)
+        # Do not complain about OTHER tokens inside macro definitions.
+        # $ and @ appear in macros defined by headers intended to be
+        # included from assembly language, e.g. sysdeps/mips/sys/asm.h.
+        if kind == "OTHER" and directive != "define":
+            reporter.error(tok, "stray {!r} in program")
+
+        yield tok
+        pos = mo.end()
+
+#
+# Base and generic classes for individual checks.
+#
+
+class ConstructChecker:
+    """Scan a stream of C preprocessing tokens and possibly report
+       problems with them.  The REPORTER object passed to __init__ has
+       one method, reporter.error(token, message), which should be
+       called to indicate a problem detected at the position of TOKEN.
+       If MESSAGE contains the four-character sequence '{!r}' then that
+       will be replaced with a textual representation of TOKEN.
+    """
+    def __init__(self, reporter):
+        self.reporter = reporter
+
+    def examine(self, tok):
+        """Called once for each token in a header file.
+           Call self.reporter.error if a problem is detected.
+        """
+        raise NotImplementedError
+
+    def eof(self):
+        """Called once at the end of the stream.  Subclasses need only
+           override this if it might have something to do."""
+        pass
+
+class NoCheck(ConstructChecker):
+    """Generic checker class which doesn't do anything.  Substitute this
+       class for a real checker when a particular check should be skipped
+       for some file."""
+
+    def examine(self, tok):
+        pass
+
+#
+# Check for obsolete type names.
+#
+
+# The obsolete type names we're looking for:
+OBSOLETE_TYPE_RE_ = re.compile(r"""\A
+  (__)?
+  (   quad_t
+    | u(?: short | int | long
+         | _(?: char | short | int(?:[0-9]+_t)? | long | quad_t )))
+\Z""", re.VERBOSE)
+
+class ObsoleteNotAllowed(ConstructChecker):
+    """Don't allow any use of the obsolete typedefs."""
+    def examine(self, tok):
+        if OBSOLETE_TYPE_RE_.match(tok.text):
+            self.reporter.error(tok, "use of {!r}")
+
+class ObsoletePrivateDefinitionsAllowed(ConstructChecker):
+    """Allow definitions of the private versions of the
+       obsolete typedefs; that is, 'typedef [anything] __obsolete;'
+    """
+    def __init__(self, reporter):
+        super().__init__(reporter)
+        self.in_typedef = False
+        self.prev_token = None
+
+    def examine(self, tok):
+        # bits/types.h hides 'typedef' in a macro sometimes.
+        if (tok.kind == "IDENT"
+            and tok.text in ("typedef", "__STD_TYPE")
+            and tok.context is None):
+            self.in_typedef = True
+        elif tok.kind == "PUNCTUATOR" and tok.text == ";" and self.in_typedef:
+            self.in_typedef = False
+            if self.prev_token.kind == "IDENT":
+                m = OBSOLETE_TYPE_RE_.match(self.prev_token.text)
+                if m and m.group(1) != "__":
+                    self.reporter.error(self.prev_token, "use of {!r}")
+            self.prev_token = None
+        else:
+            self._check_prev()
+
+        self.prev_token = tok
+
+    def eof(self):
+        self._check_prev()
+
+    def _check_prev(self):
+        if (self.prev_token is not None
+            and self.prev_token.kind == "IDENT"
+            and OBSOLETE_TYPE_RE_.match(self.prev_token.text)):
+            self.reporter.error(self.prev_token, "use of {!r}")
+
+class ObsoletePublicDefinitionsAllowed(ConstructChecker):
+    """Allow definitions of the public versions of the obsolete
+       typedefs.  Only specific forms of definition are allowed:
+
+           typedef __obsolete obsolete;  // identifiers must agree
+           typedef __uintN_t u_intN_t;   // N must agree
+           typedef unsigned long int ulong;
+           typedef unsigned short int ushort;
+           typedef unsigned int uint;
+    """
+    def __init__(self, reporter):
+        super().__init__(reporter)
+        self.typedef_tokens = []
+
+    def examine(self, tok):
+        if tok.kind in ("WHITESPACE", "BLOCK_COMMENT",
+                        "LINE_COMMENT", "NL", "ESCNL"):
+            pass
+
+        elif (tok.kind == "IDENT" and tok.text == "typedef"
+              and tok.context is None):
+            if self.typedef_tokens:
+                self.reporter.error(tok, "typedef inside typedef")
+                self._reset()
+            self.typedef_tokens.append(tok)
+
+        elif tok.kind == "PUNCTUATOR" and tok.text == ";":
+            self._finish()
+
+        elif self.typedef_tokens:
+            self.typedef_tokens.append(tok)
+
+    def eof(self):
+        self._reset()
+
+    def _reset(self):
+        while self.typedef_tokens:
+            tok = self.typedef_tokens.pop(0)
+            if tok.kind == "IDENT" and OBSOLETE_TYPE_RE_.match(tok.text):
+                self.reporter.error(tok, "use of {!r}")
+
+    def _finish(self):
+        if not self.typedef_tokens: return
+        if self.typedef_tokens[-1].kind == "IDENT":
+            m = OBSOLETE_TYPE_RE_.match(self.typedef_tokens[-1].text)
+            if m:
+                if self._permissible_public_definition(m):
+                    self.typedef_tokens.clear()
+        self._reset()
+
+    def _permissible_public_definition(self, m):
+        if m.group(1) == "__": return False
+        name = m.group(2)
+        toks = self.typedef_tokens
+        ntok = len(toks)
+        if ntok == 3 and toks[1].kind == "IDENT":
+            defn = toks[1].text
+            n = OBSOLETE_TYPE_RE_.match(defn)
+            if n and n.group(1) == "__" and n.group(2) == name:
+                return True
+
+            if (name[:5] == "u_int" and name[-2:] == "_t"
+                and defn[:6] == "__uint" and defn[-2:] == "_t"
+                and name[5:-2] == defn[6:-2]):
+                return True
+
+            return False
+
+        if (name == "ulong" and ntok == 5
+            and toks[1].kind == "IDENT" and toks[1].text == "unsigned"
+            and toks[2].kind == "IDENT" and toks[2].text == "long"
+            and toks[3].kind == "IDENT" and toks[3].text == "int"):
+            return True
+
+        if (name == "ushort" and ntok == 5
+            and toks[1].kind == "IDENT" and toks[1].text == "unsigned"
+            and toks[2].kind == "IDENT" and toks[2].text == "short"
+            and toks[3].kind == "IDENT" and toks[3].text == "int"):
+            return True
+
+        if (name == "uint" and ntok == 4
+            and toks[1].kind == "IDENT" and toks[1].text == "unsigned"
+            and toks[2].kind == "IDENT" and toks[2].text == "int"):
+            return True
+
+        return False
+
+def ObsoleteTypedefChecker(reporter, fname):
+    """Factory: produce an instance of the appropriate
+       obsolete-typedef checker for FNAME."""
+
+    # The obsolete rpc/ and rpcsvc/ headers are allowed to use the
+    # obsolete types, because it would be more trouble than it's
+    # worth to remove them from headers that we intend to stop
+    # installing eventually anyway.
+    if (fname.startswith("rpc/")
+        or fname.startswith("rpcsvc/")
+        or "/rpc/" in fname
+        or "/rpcsvc/" in fname):
+        return NoCheck(reporter)
+
+    # bits/types.h is allowed to define the __-versions of the
+    # obsolete types.
+    if (fname == "bits/types.h"
+        or fname.endswith("/bits/types.h")):
+        return ObsoletePrivateDefinitionsAllowed(reporter)
+
+    # sys/types.h is allowed to use the __-versions of the
+    # obsolete types, but only to define the unprefixed versions.
+    if (fname == "sys/types.h"
+        or fname.endswith("/sys/types.h")):
+        return ObsoletePublicDefinitionsAllowed(reporter)
+
+    return ObsoleteNotAllowed(reporter)
+
+#
+# Master control
+#
+
+class HeaderChecker:
+    """Perform all of the checks on each header.  This is also the
+       "reporter" object expected by tokenize_c and ConstructChecker.
+    """
+    def __init__(self):
+        self.fname = None
+        self.status = 0
+
+    def error(self, tok, message):
+        self.status = 1
+        if '{!r}' in message:
+            message = message.format(tok.text)
+        sys.stderr.write("{}:{}:{}: error: {}\n".format(
+            self.fname, tok.line, tok.column, message))
+
+    def check(self, fname):
+        self.fname = fname
+        try:
+            with open(fname, "rt") as fp:
+                contents = fp.read()
+        except OSError as e:
+            sys.stderr.write("{}: {}\n".format(fname, e.strerror))
+            self.status = 1
+            return
+
+        typedef_checker = ObsoleteTypedefChecker(self, self.fname)
+
+        for tok in tokenize_c(contents, self):
+            typedef_checker.examine(tok)
+
+def main():
+    ap = argparse.ArgumentParser(description=__doc__)
+    ap.add_argument("headers", metavar="header", nargs="+",
+                    help="one or more headers to scan for obsolete constructs")
+    args = ap.parse_args()
+
+    checker = HeaderChecker()
+    for fname in args.headers:
+        # Headers whose installed name begins with "finclude/" contain
+        # Fortran, not C, and this program should completely ignore them.
+        if not (fname.startswith("finclude/") or "/finclude/" in fname):
+            checker.check(fname)
+    sys.exit(checker.status)
+
+main()