|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
From d72ba890c8d8ac800c9d00a1f542deca11551f33 Mon Sep 17 00:00:00 2001
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
From: Karl Williamson <khw@cpan.org>
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
Date: Tue, 13 Feb 2018 07:03:43 -0700
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
Subject: utf8.c: Don't dump malformation past first NUL
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
When a UTF-8 string contains a malformation, the bytes are dumped out as
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
a debugging aid. One should exercise caution, however, and not dump out
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
bytes that are actually past the end of the string. Commit 99a765e9e37
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
from 2016 added the capability to signal to the dumping routines that
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
we're not sure where the string ends, and to dump the minimal possible.
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
It occurred to me that an additional safety measure can be easily added,
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
which this commit does. And that is, in the dumping routines to stop at
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
the first NUL. All strings automatically get a traiing NUL added, even
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
if they contain embedded NULs. A NUL can never be part of a
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
malformation, and so its presence likely signals the end of the string.
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
---
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
utf8.c | 16 ++++++++++++++--
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
1 file changed, 14 insertions(+), 2 deletions(-)
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
diff --git a/utf8.c b/utf8.c
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
index a3d5f61b64..61346f0cb6 100644
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
--- a/utf8.c
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+++ b/utf8.c
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
@@ -810,7 +810,7 @@ Perl__byte_dump_string(pTHX_ const U8 * s, const STRLEN len, const bool format)
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
PERL_STATIC_INLINE char *
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
- /* How many bytes to print */
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ /* Max number of bytes to print */
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
STRLEN print_len,
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
/* Which one is the non-continuation */
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
@@ -826,6 +826,8 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
? "immediately"
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
: Perl_form(aTHX_ "%d bytes",
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
(int) non_cont_byte_pos);
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ const U8 * x = s + non_cont_byte_pos;
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ const U8 * e = s + print_len;
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
PERL_ARGS_ASSERT_UNEXPECTED_NON_CONTINUATION_TEXT;
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
@@ -833,10 +835,20 @@ S_unexpected_non_continuation_text(pTHX_ const U8 * const s,
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
* calculated, it's likely faster to pass it; verify under DEBUGGING */
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
assert(expect_len == UTF8SKIP(s));
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ /* As a defensive coding measure, don't output anything past a NUL. Such
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ * bytes shouldn't be in the middle of a malformation, and could mark the
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ * end of the allocated string, and what comes after is undefined */
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ for (; x < e; x++) {
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ if (*x == '\0') {
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ x++; /* Output this particular NUL */
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ break;
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ }
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ }
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
return Perl_form(aTHX_ "%s: %s (unexpected non-continuation byte 0x%02x,"
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
" %s after start byte 0x%02x; need %d bytes, got %d)",
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
malformed_text,
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
- _byte_dump_string(s, print_len, 0),
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
+ _byte_dump_string(s, x - s, 0),
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
*(s + non_cont_byte_pos),
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
where,
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
*s,
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
--
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
2.11.0
|
|
![](https://seccdn.libravatar.org/avatar/eca4d0658e293ac828929c05684948cb411ba891fa2ad2130aafa91b664e89ad?s=16&d=retro) |
243a19 |
|