Blame SOURCES/00359-CVE-2021-23336.patch

From 976a4010aa4e450855dce5fa4c865bcbdc86cccd Mon Sep 17 00:00:00 2001
From: Charalampos Stratakis <cstratak@redhat.com>
Date: Fri, 16 Apr 2021 18:02:00 +0200
Subject: [PATCH] CVE-2021-23336: Add `separator` argument to parse_qs; warn
 with default
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Partially backports https://bugs.python.org/issue42967 : [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

Backported from the python3 branch.
However, this solution is different from the upstream solution in Python 3.

Based on the downstream solution for python 3.6.13 by Petr Viktorin.

An optional argument `separator` is added to specify the separator.
It is recommended to set it to '&' or ';' to match the application or proxy in use.
The default can be set with an env variable or a config file.
If neither the argument, the env var, nor the config file specifies a separator, "&" is used
but a warning is raised if parse_qs is used on input that contains ';'.
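
The lookup order described above can be sketched as follows. This is a
minimal illustration in plain Python, not the code from this patch:
`split_fields` and its `config_value` parameter (a stand-in for the value
read from the config file) are hypothetical names, validation of bad
separator values is omitted, and only the raw field splitting is shown.

```python
import os
import warnings

ENVVAR = 'PYTHON_URLLIB_QS_SEPARATOR'  # variable name used by this patch


def split_fields(qs, separator=None, config_value=None):
    """Hypothetical helper: split a query string into raw fields.

    Lookup order: explicit argument, then environment variable, then
    config_value (assumed to hold the config file setting, if any).
    """
    if separator is None:
        configured = os.environ.get(ENVVAR)
        if configured is None:
            configured = config_value
        if configured == 'legacy':
            # Old behaviour: both '&' and ';' separate fields.
            return [f for part in qs.split('&') for f in part.split(';')]
        if configured is None:
            # Nothing configured: use '&', but warn on ambiguous input.
            if ';' in qs:
                warnings.warn("';' found in query string but only '&' "
                              "is used as separator",
                              RuntimeWarning, stacklevel=2)
            configured = '&'
        separator = configured
    return qs.split(separator)


# An explicit argument bypasses the environment and the config value.
print(split_fields('a=1;b=2', separator=';'))  # ['a=1', 'b=2']
```

Passing the separator explicitly at each call site keeps the result
independent of the process environment, which is why it is the
recommended option.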

Co-authors of the downstream change:
Co-authored-by: Petr Viktorin <pviktori@redhat.com>
Co-authors of the upstream change (who do not necessarily agree with this):
Co-authored-by: Adam Goldschmidt <adamgold7@gmail.com>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Éric Araujo <merwok@netwok.org>
---
 Doc/library/cgi.rst       |   5 +-
 Doc/library/urlparse.rst  |  15 ++-
 Lib/cgi.py                |  34 +++---
 Lib/test/test_cgi.py      |  59 ++++++++++-
 Lib/test/test_urlparse.py | 210 +++++++++++++++++++++++++++++++++++++-
 Lib/urlparse.py           |  78 +++++++++++++-
 6 files changed, 369 insertions(+), 32 deletions(-)

diff --git a/Doc/library/cgi.rst b/Doc/library/cgi.rst
index ecd62c8c019..a96cd38717b 100644
--- a/Doc/library/cgi.rst
+++ b/Doc/library/cgi.rst
@@ -285,10 +285,10 @@ These are useful if you want more control, or if you want to employ some of the
 algorithms implemented in this module in other circumstances.
 
 
-.. function:: parse(fp[, environ[, keep_blank_values[, strict_parsing]]])
+.. function:: parse(fp[, environ[, keep_blank_values[, strict_parsing[, separator]]]])
 
    Parse a query in the environment or from a file (the file defaults to
-   ``sys.stdin`` and environment defaults to ``os.environ``).  The *keep_blank_values* and *strict_parsing* parameters are
+   ``sys.stdin`` and environment defaults to ``os.environ``).  The *keep_blank_values*, *strict_parsing* and *separator* parameters are
    passed to :func:`urlparse.parse_qs` unchanged.
 
 
@@ -316,7 +316,6 @@ algorithms implemented in this module in other circumstances.
    Note that this does not parse nested multipart parts --- use
    :class:`FieldStorage` for that.
 
-
 .. function:: parse_header(string)
 
    Parse a MIME header (such as :mailheader:`Content-Type`) into a main value and a
diff --git a/Doc/library/urlparse.rst b/Doc/library/urlparse.rst
index 0989c88c302..97d1119257c 100644
--- a/Doc/library/urlparse.rst
+++ b/Doc/library/urlparse.rst
@@ -136,7 +136,7 @@ The :mod:`urlparse` module defines the following functions:
       now raise :exc:`ValueError`.
 
 
-.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing[, max_num_fields]]])
+.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing[, max_num_fields[, separator]]]])
 
    Parse a query string given as a string argument (data of type
    :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a
@@ -157,6 +157,15 @@ The :mod:`urlparse` module defines the following functions:
    read. If set, then throws a :exc:`ValueError` if there are more than
    *max_num_fields* fields read.
 
+   The optional argument *separator* is the symbol to use for separating the
+   query arguments. It is recommended to set it to ``'&'`` or ``';'``.
+   It defaults to ``'&'``; a warning is raised if this default is used on input containing ``';'``.
+   This default may be changed with the following environment variable settings:
+
+   - ``PYTHON_URLLIB_QS_SEPARATOR='&'``: use only ``&`` as separator, without warning (as in Python 3.6.13+ or 3.10)
+   - ``PYTHON_URLLIB_QS_SEPARATOR=';'``: use only ``;`` as separator
+   - ``PYTHON_URLLIB_QS_SEPARATOR=legacy``: use both ``&`` and ``;`` (as in previous versions of Python)
+
    Use the :func:`urllib.urlencode` function to convert such dictionaries into
    query strings.
 
@@ -186,6 +195,9 @@ The :mod:`urlparse` module defines the following functions:
    read. If set, then throws a :exc:`ValueError` if there are more than
    *max_num_fields* fields read.
 
+   The optional argument *separator* is the symbol to use for separating the
+   query arguments. It works as in :py:func:`parse_qs`.
+
    Use the :func:`urllib.urlencode` function to convert such lists of pairs into
    query strings.
 
@@ -195,6 +207,7 @@ The :mod:`urlparse` module defines the following functions:
    .. versionchanged:: 2.7.16
       Added *max_num_fields* parameter.
 
+
 .. function:: urlunparse(parts)
 
    Construct a URL from a tuple as returned by ``urlparse()``. The *parts* argument
diff --git a/Lib/cgi.py b/Lib/cgi.py
index 5b903e03477..1421f2d90e0 100755
--- a/Lib/cgi.py
+++ b/Lib/cgi.py
@@ -121,7 +121,8 @@ log = initlog           # The current logging function
 # 0 ==> unlimited input
 maxlen = 0
 
-def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
+def parse(fp=None, environ=os.environ, keep_blank_values=0,
+          strict_parsing=0, separator=None):
     """Parse a query in the environment or from a file (default stdin)
 
         Arguments, all optional:
@@ -140,6 +141,8 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
         strict_parsing: flag indicating what to do with parsing errors.
             If false (the default), errors are silently ignored.
             If true, errors raise a ValueError exception.
+
+        separator: str. The symbol to use for separating the query arguments.
     """
     if fp is None:
         fp = sys.stdin
@@ -171,25 +174,26 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
         else:
             qs = ""
         environ['QUERY_STRING'] = qs    # XXX Shouldn't, really
-    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing)
+    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing, separator=separator)
 
 
 # parse query string function called from urlparse,
 # this is done in order to maintain backward compatibility.
 
-def parse_qs(qs, keep_blank_values=0, strict_parsing=0):
+def parse_qs(qs, keep_blank_values=0, strict_parsing=0, separator=None):
     """Parse a query given as a string argument."""
     warn("cgi.parse_qs is deprecated, use urlparse.parse_qs instead",
          PendingDeprecationWarning, 2)
-    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing)
+    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing,
+                             separator=separator)
 
 
-def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
+def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None, separator=None):
     """Parse a query given as a string argument."""
     warn("cgi.parse_qsl is deprecated, use urlparse.parse_qsl instead",
          PendingDeprecationWarning, 2)
     return urlparse.parse_qsl(qs, keep_blank_values, strict_parsing,
-                              max_num_fields)
+                              max_num_fields, separator=separator)
 
 def parse_multipart(fp, pdict):
     """Parse multipart input.
@@ -288,7 +292,6 @@ def parse_multipart(fp, pdict):
 
     return partdict
 
-
 def _parseparam(s):
     while s[:1] == ';':
         s = s[1:]
@@ -395,7 +398,7 @@ class FieldStorage:
 
     def __init__(self, fp=None, headers=None, outerboundary="",
                  environ=os.environ, keep_blank_values=0, strict_parsing=0,
-                 max_num_fields=None):
+                 max_num_fields=None, separator=None):
         """Constructor.  Read multipart/* until last part.
 
         Arguments, all optional:
@@ -430,6 +433,7 @@ class FieldStorage:
         self.keep_blank_values = keep_blank_values
         self.strict_parsing = strict_parsing
         self.max_num_fields = max_num_fields
+        self.separator = separator
         if 'REQUEST_METHOD' in environ:
             method = environ['REQUEST_METHOD'].upper()
         self.qs_on_post = None
@@ -613,7 +617,8 @@ class FieldStorage:
         if self.qs_on_post:
             qs += '&' + self.qs_on_post
         query = urlparse.parse_qsl(qs, self.keep_blank_values,
-                                   self.strict_parsing, self.max_num_fields)
+                                   self.strict_parsing, self.max_num_fields,
+                                   self.separator)
         self.list = [MiniFieldStorage(key, value) for key, value in query]
         self.skip_lines()
 
@@ -629,7 +634,8 @@ class FieldStorage:
             query = urlparse.parse_qsl(self.qs_on_post,
                                        self.keep_blank_values,
                                        self.strict_parsing,
-                                       self.max_num_fields)
+                                       self.max_num_fields,
+                                       self.separator)
             self.list.extend(MiniFieldStorage(key, value)
                              for key, value in query)
             FieldStorageClass = None
@@ -649,7 +655,8 @@ class FieldStorage:
             headers = rfc822.Message(self.fp)
             part = klass(self.fp, headers, ib,
                          environ, keep_blank_values, strict_parsing,
-                         max_num_fields)
+                         max_num_fields,
+                         separator=self.separator)
 
             if max_num_fields is not None:
                 max_num_fields -= 1
@@ -817,10 +824,11 @@ class FormContentDict(UserDict.UserDict):
     form.dict == {key: [val, val, ...], ...}
 
     """
-    def __init__(self, environ=os.environ, keep_blank_values=0, strict_parsing=0):
+    def __init__(self, environ=os.environ, keep_blank_values=0, strict_parsing=0, separator=None):
         self.dict = self.data = parse(environ=environ,
                                       keep_blank_values=keep_blank_values,
-                                      strict_parsing=strict_parsing)
+                                      strict_parsing=strict_parsing,
+                                      separator=separator)
         self.query_string = environ['QUERY_STRING']
 
 
diff --git a/Lib/test/test_cgi.py b/Lib/test/test_cgi.py
index 743c2afbd4c..9956ea9d4e8 100644
--- a/Lib/test/test_cgi.py
+++ b/Lib/test/test_cgi.py
@@ -61,12 +61,9 @@ parse_strict_test_cases = [
     ("", ValueError("bad query field: ''")),
     ("&", ValueError("bad query field: ''")),
     ("&&", ValueError("bad query field: ''")),
-    (";", ValueError("bad query field: ''")),
-    (";&;", ValueError("bad query field: ''")),
     # Should the next few really be valid?
     ("=", {}),
     ("=&=", {}),
-    ("=;=", {}),
     # This rest seem to make sense
     ("=a", {'': ['a']}),
     ("&=a", ValueError("bad query field: ''")),
@@ -81,8 +78,6 @@ parse_strict_test_cases = [
     ("a=a+b&b=b+c", {'a': ['a b'], 'b': ['b c']}),
     ("a=a+b&a=b+a", {'a': ['a b', 'b a']}),
     ("x=1&y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
-    ("x=1;y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
-    ("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
     ("Hbc5161168c542333633315dee1182227:key_store_seqid=400006&cuyer=r&view=bustomer&order_id=0bb2e248638833d48cb7fed300000f1b&expire=964546263&lobale=en-US&kid=130003.300038&ss=env",
      {'Hbc5161168c542333633315dee1182227:key_store_seqid': ['400006'],
       'cuyer': ['r'],
@@ -177,6 +172,60 @@ class CgiTests(unittest.TestCase):
                         self.assertItemsEqual(sd.items(),
                                                 first_second_elts(expect.items()))
 
+    def test_separator(self):
+        parse_semicolon = [
+            ("x=1;y=2.0", {'x': ['1'], 'y': ['2.0']}),
+            ("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
+            (";", ValueError("bad query field: ''")),
+            (";;", ValueError("bad query field: ''")),
+            ("=;a", ValueError("bad query field: 'a'")),
+            (";b=a", ValueError("bad query field: ''")),
+            ("b;=a", ValueError("bad query field: 'b'")),
+            ("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
+            ("a=a+b;a=b+a", {'a': ['a b', 'b a']}),
+        ]
+        for orig, expect in parse_semicolon:
+            env = {'QUERY_STRING': orig}
+            fcd = cgi.FormContentDict(env, separator=';')
+            sd = cgi.SvFormContentDict(env, separator=';')
+            fs = cgi.FieldStorage(environ=env, separator=';')
+            if isinstance(expect, dict):
+                # test dict interface
+                self.assertEqual(len(expect), len(fcd))
+                self.assertItemsEqual(expect.keys(), fcd.keys())
+                self.assertItemsEqual(expect.values(), fcd.values())
+                self.assertItemsEqual(expect.items(), fcd.items())
+                self.assertEqual(fcd.get("nonexistent field", "default"), "default")
+                self.assertEqual(len(sd), len(fs))
+                self.assertItemsEqual(sd.keys(), fs.keys())
+                self.assertEqual(fs.getvalue("nonexistent field", "default"), "default")
+                # test individual fields
+                for key in expect.keys():
+                    expect_val = expect[key]
+                    self.assertTrue(fcd.has_key(key))
+                    self.assertItemsEqual(fcd[key], expect[key])
+                    self.assertEqual(fcd.get(key, "default"), fcd[key])
+                    self.assertTrue(fs.has_key(key))
+                    if len(expect_val) > 1:
+                        single_value = 0
+                    else:
+                        single_value = 1
+                    try:
+                        val = sd[key]
+                    except IndexError:
+                        self.assertFalse(single_value)
+                        self.assertEqual(fs.getvalue(key), expect_val)
+                    else:
+                        self.assertTrue(single_value)
+                        self.assertEqual(val, expect_val[0])
+                        self.assertEqual(fs.getvalue(key), expect_val[0])
+                    self.assertItemsEqual(sd.getlist(key), expect_val)
+                    if single_value:
+                        self.assertItemsEqual(sd.values(),
+                                                first_elts(expect.values()))
+                        self.assertItemsEqual(sd.items(),
+                                                first_second_elts(expect.items()))
+
     def test_weird_formcontentdict(self):
         # Test the weird FormContentDict classes
         env = {'QUERY_STRING': "x=1&y=2.0&z=2-3.%2b0&1=1abc"}
diff --git a/Lib/test/test_urlparse.py b/Lib/test/test_urlparse.py
index 86c4a0595c4..21875bb2991 100644
--- a/Lib/test/test_urlparse.py
+++ b/Lib/test/test_urlparse.py
@@ -3,6 +3,12 @@ import sys
 import unicodedata
 import unittest
 import urlparse
+from test.support import EnvironmentVarGuard
+from warnings import catch_warnings, filterwarnings
+import tempfile
+import contextlib
+import os.path
+import shutil
 
 RFC1808_BASE = "http://a/b/c/d;p?q#f"
 RFC2396_BASE = "http://a/b/c/d;p?q"
@@ -24,16 +30,29 @@ parse_qsl_test_cases = [
     ("&a=b", [('a', 'b')]),
     ("a=a+b&b=b+c", [('a', 'a b'), ('b', 'b c')]),
     ("a=1&a=2", [('a', '1'), ('a', '2')]),
+]
+
+parse_qsl_test_cases_semicolon = [
     (";", []),
     (";;", []),
     (";a=b", [('a', 'b')]),
     ("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
     ("a=1;a=2", [('a', '1'), ('a', '2')]),
-    (b";", []),
-    (b";;", []),
-    (b";a=b", [(b'a', b'b')]),
-    (b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
-    (b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
+]
+
+parse_qsl_test_cases_legacy = [
+    ("a=1;a=2&a=3", [('a', '1'), ('a', '2'), ('a', '3')]),
+    ("a=1;b=2&c=3", [('a', '1'), ('b', '2'), ('c', '3')]),
+    ("a=1&b=2&c=3;", [('a', '1'), ('b', '2'), ('c', '3')]),
+]
+
+parse_qsl_test_cases_warn = [
+    (";a=b", [(';a', 'b')]),
+    ("a=a+b;b=b+c", [('a', 'a b;b=b c')]),
+    (b";a=b", [(b';a', b'b')]),
+    (b"a=a+b;b=b+c", [(b'a', b'a b;b=b c')]),
+    ("a=1;a=2&a=3", [('a', '1;a=2'), ('a', '3')]),
+    (b"a=1;a=2&a=3", [(b'a', b'1;a=2'), (b'a', b'3')]),
 ]
 
 parse_qs_test_cases = [
@@ -57,6 +76,9 @@ parse_qs_test_cases = [
     (b"&a=b", {b'a': [b'b']}),
     (b"a=a+b&b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
     (b"a=1&a=2", {b'a': [b'1', b'2']}),
+]
+
+parse_qs_test_cases_semicolon = [
     (";", {}),
     (";;", {}),
     (";a=b", {'a': ['b']}),
@@ -69,6 +91,24 @@ parse_qs_test_cases = [
     (b"a=1;a=2", {b'a': [b'1', b'2']}),
 ]
 
+parse_qs_test_cases_legacy = [
+    ("a=1;a=2&a=3", {'a': ['1', '2', '3']}),
+    ("a=1;b=2&c=3", {'a': ['1'], 'b': ['2'], 'c': ['3']}),
+    ("a=1&b=2&c=3;", {'a': ['1'], 'b': ['2'], 'c': ['3']}),
+    (b"a=1;a=2&a=3", {b'a': [b'1', b'2', b'3']}),
+    (b"a=1;b=2&c=3", {b'a': [b'1'], b'b': [b'2'], b'c': [b'3']}),
+    (b"a=1&b=2&c=3;", {b'a': [b'1'], b'b': [b'2'], b'c': [b'3']}),
+]
+
+parse_qs_test_cases_warn = [
+    (";a=b", {';a': ['b']}),
+    ("a=a+b;b=b+c", {'a': ['a b;b=b c']}),
+    (b";a=b", {b';a': [b'b']}),
+    (b"a=a+b;b=b+c", {b'a':[ b'a b;b=b c']}),
+    ("a=1;a=2&a=3", {'a': ['1;a=2', '3']}),
+    (b"a=1;a=2&a=3", {b'a': [b'1;a=2', b'3']}),
+]
+
 class UrlParseTestCase(unittest.TestCase):
 
     def checkRoundtrips(self, url, parsed, split):
@@ -141,6 +181,40 @@ class UrlParseTestCase(unittest.TestCase):
             self.assertEqual(result, expect_without_blanks,
                     "Error parsing %r" % orig)
 
+    def test_qs_default_warn(self):
+        for orig, expect in parse_qs_test_cases_warn:
+            with catch_warnings(record=True) as w:
+                filterwarnings(action='always',
+                                        category=urlparse._QueryStringSeparatorWarning)
+                result = urlparse.parse_qs(orig, keep_blank_values=True)
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 1)
+            self.assertEqual(w[0].category, urlparse._QueryStringSeparatorWarning)
+
+    def test_qsl_default_warn(self):
+        for orig, expect in parse_qsl_test_cases_warn:
+            with catch_warnings(record=True) as w:
+                filterwarnings(action='always',
+                               category=urlparse._QueryStringSeparatorWarning)
+                result = urlparse.parse_qsl(orig, keep_blank_values=True)
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 1)
+            self.assertEqual(w[0].category, urlparse._QueryStringSeparatorWarning)
+
+    def test_default_qs_no_warnings(self):
+        for orig, expect in parse_qs_test_cases:
+            with catch_warnings(record=True) as w:
+                result = urlparse.parse_qs(orig, keep_blank_values=True)
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_default_qsl_no_warnings(self):
+        for orig, expect in parse_qsl_test_cases:
+            with catch_warnings(record=True) as w:
+                result = urlparse.parse_qsl(orig, keep_blank_values=True)
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
     def test_roundtrips(self):
         testcases = [
             ('file:///tmp/junk.txt',
@@ -626,6 +700,132 @@ class UrlParseTestCase(unittest.TestCase):
         self.assertEqual(urlparse.urlparse("http://www.python.org:80"),
                 ('http','www.python.org:80','','','',''))
 
+    def test_parse_qs_separator_bytes(self):
+        expected = {b'a': [b'1'], b'b': [b'2']}
+
+        result = urlparse.parse_qs(b'a=1;b=2', separator=b';')
+        self.assertEqual(result, expected)
+        result = urlparse.parse_qs(b'a=1;b=2', separator=';')
+        self.assertEqual(result, expected)
+        result = urlparse.parse_qs('a=1;b=2', separator=';')
+        self.assertEqual(result, {'a': ['1'], 'b': ['2']})
+
+    @contextlib.contextmanager
+    def _qsl_sep_config(self, sep):
+        """Context for the given parse_qsl default separator configured in config file"""
+        old_filename = urlparse._QS_SEPARATOR_CONFIG_FILENAME
+        urlparse._default_qs_separator = None
+        try:
+            tmpdirname = tempfile.mkdtemp()
+            filename = os.path.join(tmpdirname, 'conf.cfg')
+            with open(filename, 'w') as file:
+                file.write('[parse_qs]\n')
+                file.write('PYTHON_URLLIB_QS_SEPARATOR = {}'.format(sep))
+            urlparse._QS_SEPARATOR_CONFIG_FILENAME = filename
+            yield
+        finally:
+            urlparse._QS_SEPARATOR_CONFIG_FILENAME = old_filename
+            urlparse._default_qs_separator = None
+            shutil.rmtree(tmpdirname)
+
+    def test_parse_qs_separator_semicolon(self):
+        for orig, expect in parse_qs_test_cases_semicolon:
+            result = urlparse.parse_qs(orig, separator=';')
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = ';'
+                result = urlparse.parse_qs(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+            with self._qsl_sep_config(';'), catch_warnings(record=True) as w:
+                result = urlparse.parse_qs(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_parse_qsl_separator_semicolon(self):
+        for orig, expect in parse_qsl_test_cases_semicolon:
+            result = urlparse.parse_qsl(orig, separator=';')
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = ';'
+                result = urlparse.parse_qsl(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+            with self._qsl_sep_config(';'), catch_warnings(record=True) as w:
+                result = urlparse.parse_qsl(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_parse_qs_separator_legacy(self):
+        for orig, expect in parse_qs_test_cases_legacy:
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = 'legacy'
+                result = urlparse.parse_qs(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+            with self._qsl_sep_config('legacy'), catch_warnings(record=True) as w:
+                result = urlparse.parse_qs(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_parse_qsl_separator_legacy(self):
+        for orig, expect in parse_qsl_test_cases_legacy:
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = 'legacy'
+                result = urlparse.parse_qsl(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+            with self._qsl_sep_config('legacy'), catch_warnings(record=True) as w:
+                result = urlparse.parse_qsl(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_parse_qs_separator_bad_value_env_or_config(self):
+        for bad_sep in '', 'abc', 'safe', '&;', 'SEP':
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = bad_sep
+                with self.assertRaises(ValueError):
+                    urlparse.parse_qsl('a=1;b=2')
+            with self._qsl_sep_config('bad_sep'), catch_warnings(record=True) as w:
+                with self.assertRaises(ValueError):
+                    urlparse.parse_qsl('a=1;b=2')
+
+    def test_parse_qs_separator_bad_value_arg(self):
+        for bad_sep in True, {}, '':
+            with self.assertRaises(ValueError):
+                urlparse.parse_qsl('a=1;b=2', separator=bad_sep)
+
+    def test_parse_qs_separator_num_fields(self):
+        for qs, sep in (
+            ('a&b&c', '&'),
+            ('a;b;c', ';'),
+            ('a&b;c', 'legacy'),
+        ):
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                if sep != 'legacy':
+                    with self.assertRaises(ValueError):
+                        urlparse.parse_qsl(qs, separator=sep, max_num_fields=2)
+                if sep:
+                    environ['PYTHON_URLLIB_QS_SEPARATOR'] = sep
+                with self.assertRaises(ValueError):
+                    urlparse.parse_qsl(qs, max_num_fields=2)
+
+    def test_parse_qs_separator_priority(self):
+        # env variable trumps config file
+        with self._qsl_sep_config('~'), EnvironmentVarGuard() as environ:
+            environ['PYTHON_URLLIB_QS_SEPARATOR'] = '!'
+            result = urlparse.parse_qs('a=1!b=2~c=3')
+            self.assertEqual(result, {'a': ['1'], 'b': ['2~c=3']})
+        # argument trumps config file
+        with self._qsl_sep_config('~'):
+            result = urlparse.parse_qs('a=1$b=2~c=3', separator='$')
+            self.assertEqual(result, {'a': ['1'], 'b': ['2~c=3']})
+        # argument trumps env variable
+        with EnvironmentVarGuard() as environ:
+            environ['PYTHON_URLLIB_QS_SEPARATOR'] = '~'
+            result = urlparse.parse_qs('a=1$b=2~c=3', separator='$')
+            self.assertEqual(result, {'a': ['1'], 'b': ['2~c=3']})
+
     def test_urlsplit_normalization(self):
         # Certain characters should never occur in the netloc,
         # including under normalization.
diff --git a/Lib/urlparse.py b/Lib/urlparse.py
3309fe
index 798b467b605..69504d8fd93 100644
3309fe
--- a/Lib/urlparse.py
3309fe
+++ b/Lib/urlparse.py
3309fe
@@ -29,6 +29,7 @@ test_urlparse.py provides a good indicator of parsing behavior.
3309fe
 """
3309fe
 
3309fe
 import re
3309fe
+import os
3309fe
 
3309fe
 __all__ = ["urlparse", "urlunparse", "urljoin", "urldefrag",
3309fe
            "urlsplit", "urlunsplit", "parse_qs", "parse_qsl"]
3309fe
@@ -382,7 +383,8 @@ def unquote(s):
3309fe
             append(item)
3309fe
     return ''.join(res)
3309fe
 
3309fe
-def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
3309fe
+def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None,
3309fe
+             separator=None):
3309fe
     """Parse a query given as a string argument.
3309fe
 
3309fe
         Arguments:
3309fe
@@ -405,14 +407,23 @@ def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
3309fe
     """
3309fe
     dict = {}
3309fe
     for name, value in parse_qsl(qs, keep_blank_values, strict_parsing,
3309fe
-                                 max_num_fields):
+                                 max_num_fields, separator):
         if name in dict:
             dict[name].append(value)
         else:
             dict[name] = [value]
     return dict
 
-def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
+class _QueryStringSeparatorWarning(RuntimeWarning):
+    """Warning for using default `separator` in parse_qs or parse_qsl"""
+
+# The default "separator" for parse_qsl can be specified in a config file.
+# It's cached after first read.
+_QS_SEPARATOR_CONFIG_FILENAME = '/etc/python/urllib.cfg'
+_default_qs_separator = None
+
+def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None,
+              separator=None):
     """Parse a query given as a string argument.
 
     Arguments:
@@ -434,15 +445,72 @@ def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
 
     Returns a list, as G-d intended.
     """
+
+    if (not separator or (not isinstance(separator, (str, bytes)))) and separator is not None:
+        raise ValueError("Separator must be of type string or bytes.")
+
+    # Used when both "&" and ";" act as separators. (Need a non-string value.)
+    _legacy = object()
+
+    if separator is None:
+        global _default_qs_separator
+        separator = _default_qs_separator
+        envvar_name = 'PYTHON_URLLIB_QS_SEPARATOR'
+        if separator is None:
+            # Set default separator from environment variable
+            separator = os.environ.get(envvar_name)
+            config_source = 'environment variable'
+        if separator is None:
+            # Set default separator from the configuration file
+            try:
+                file = open(_QS_SEPARATOR_CONFIG_FILENAME)
+            except EnvironmentError:
+                pass
+            else:
+                with file:
+                    import ConfigParser
+                    config = ConfigParser.ConfigParser()
+                    config.readfp(file)
+                    separator = config.get('parse_qs', envvar_name)
+                    _default_qs_separator = separator
+                config_source = _QS_SEPARATOR_CONFIG_FILENAME
+        if separator is None:
+            # The default is '&', but warn if not specified explicitly
+            if ';' in qs:
+                from warnings import warn
+                warn("The default separator of urlparse.parse_qsl and "
+                    + "parse_qs was changed to '&' to avoid a web cache "
+                    + "poisoning issue (CVE-2021-23336). "
+                    + "By default, semicolons no longer act as query field "
+                    + "separators. "
+                    + "See https://access.redhat.com/articles/5860431 for "
+                    + "more details.",
+                    _QueryStringSeparatorWarning, stacklevel=2)
+            separator = '&'
+        elif separator == 'legacy':
+            separator = _legacy
+        elif len(separator) != 1:
+            raise ValueError(
+                '{} (from {}) must contain '.format(envvar_name, config_source)
+                + '1 character, or "legacy". See '
+                + 'https://access.redhat.com/articles/5860431 for more details.'
+            )
+
     # If max_num_fields is defined then check that the number of fields
     # is less than max_num_fields. This prevents a memory exhaustion DOS
     # attack via post bodies with many fields.
     if max_num_fields is not None:
-        num_fields = 1 + qs.count('&') + qs.count(';')
+        if separator is _legacy:
+            num_fields = 1 + qs.count('&') + qs.count(';')
+        else:
+            num_fields = 1 + qs.count(separator)
         if max_num_fields < num_fields:
             raise ValueError('Max number of fields exceeded')
 
-    pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
+    if separator is _legacy:
+        pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
+    else:
+        pairs = [s1 for s1 in qs.split(separator)]
     r = []
     for name_value in pairs:
         if not name_value and not strict_parsing:
-- 
2.30.2