SOURCES/00359-CVE-2021-23336.patch

From 976a4010aa4e450855dce5fa4c865bcbdc86cccd Mon Sep 17 00:00:00 2001
From: Charalampos Stratakis <cstratak@redhat.com>
Date: Fri, 16 Apr 2021 18:02:00 +0200
Subject: [PATCH] CVE-2021-23336: Add `separator` argument to parse_qs; warn
 with default
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Partially backports https://bugs.python.org/issue42967 : [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().

Backported from the python3 branch.
However, this solution differs from the upstream solution in Python 3.

Based on the downstream solution for Python 3.6.13 by Petr Viktorin.

An optional argument, `separator`, is added to specify the separator.
It is recommended to set it to '&' or ';' to match the application or proxy in use.
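
The following sketch shows why the separator has to match what sits in front of Python. It is a simplified splitter written for Python 3 and uses `urllib.parse.unquote_plus`, whereas this patch targets the Python 2.7 `urlparse` module. `split_pairs` is a hypothetical helper and is not the patched `parse_qsl()`. The expected splits mirror test cases that this patch adds to Lib/test/test_urlparse.py.

```python
import re
from urllib.parse import unquote_plus


def split_pairs(qs, separator="&"):
    """Hypothetical helper: split a query string into (name, value) pairs.

    separator is "&", ";" or the string "legacy" (split on both).
    Percent-escapes and "+" are decoded. Fields without "=" are skipped.
    This is a simplification, not the patched parse_qsl().
    """
    if separator == "legacy":
        fields = re.split("[&;]", qs)
    else:
        fields = qs.split(separator)
    pairs = []
    for field in fields:
        if "=" not in field:
            continue  # e.g. the empty field after a trailing ";"
        name, value = field.split("=", 1)
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


# A cache or proxy that splits only on "&" and a backend that also
# splits on ";" disagree about what the same request contains.
qs = "a=1;a=2&a=3"
print(split_pairs(qs, "&"))       # [('a', '1;a=2'), ('a', '3')]
print(split_pairs(qs, "legacy"))  # [('a', '1'), ('a', '2'), ('a', '3')]
```

If a cache builds its key from the '&'-only view while the application parses the legacy view, a request can carry a parameter that the cache key does not reflect. Roughly, that mismatch is the cache-poisoning issue referenced above.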
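The default can be set with the PYTHON_URLLIB_QS_SEPARATOR environment variable or in the [parse_qs] section of a config file (/etc/python/urllib.cfg).
If none of the argument, the environment variable, or the config file specifies a separator, '&' is used,
but a warning is raised if parse_qs is used on input that contains ';'.
The argument takes precedence over the environment variable, which takes precedence over the config file.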
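
The lookup order described above can be sketched as follows. This is a hedged Python 3 approximation, not the patch's code. `resolve_separator`, `warn_if_ambiguous` and `QueryStringSeparatorWarning` are hypothetical names. The patch itself uses the private `urlparse._QueryStringSeparatorWarning` and caches the default in `urlparse._default_qs_separator`. Here the config file contents are passed in as a string instead of being read from /etc/python/urllib.cfg. The rule that a configured value must be a single character or the word `legacy` is inferred from the tests, not from code shown here.

```python
import configparser
import os
import warnings

ENVVAR = "PYTHON_URLLIB_QS_SEPARATOR"


class QueryStringSeparatorWarning(RuntimeWarning):
    """Hypothetical stand-in for urlparse._QueryStringSeparatorWarning."""


def _check_configured(value, source):
    # Assumption inferred from the tests: one character or "legacy" is accepted.
    if value == "legacy" or len(value) == 1:
        return value
    raise ValueError("invalid separator %r in %s" % (value, source))


def resolve_separator(separator=None, environ=None, config_text=None):
    """Return (separator, is_implicit_default).

    Precedence: explicit argument, then the environment variable,
    then the [parse_qs] section of the config file, then "&".
    """
    if separator is not None:
        # Same check as the patch: empty or non-string values are rejected.
        if not separator or not isinstance(separator, (str, bytes)):
            raise ValueError("Separator must be of type string or bytes.")
        return separator, False
    environ = os.environ if environ is None else environ
    if ENVVAR in environ:
        return _check_configured(environ[ENVVAR], "environment variable"), False
    if config_text is not None:
        parser = configparser.ConfigParser()
        parser.read_string(config_text)
        if parser.has_option("parse_qs", ENVVAR):
            value = parser.get("parse_qs", ENVVAR)
            return _check_configured(value, "config file"), False
    return "&", True


def warn_if_ambiguous(qs, is_implicit_default):
    """Warn only when the implicit "&" default meets input containing ";"."""
    semicolon = b";" if isinstance(qs, bytes) else ";"
    if is_implicit_default and semicolon in qs:
        warnings.warn("query string %r contains ';' but is split on '&' only" % (qs,),
                      QueryStringSeparatorWarning, stacklevel=2)


# Usage: the environment variable wins over the config file.
cfg = "[parse_qs]\nPYTHON_URLLIB_QS_SEPARATOR = ~\n"
print(resolve_separator(environ={ENVVAR: "!"}, config_text=cfg))  # ('!', False)
```

Returning an "implicit default" flag keeps the warning decision separate from the lookup itself. In this sketch, an explicitly configured '&' never warns, which matches the documented behavior of `PYTHON_URLLIB_QS_SEPARATOR='&'`.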

Co-authors of the downstream change:
Co-authored-by: Petr Viktorin <pviktori@redhat.com>
Co-authors of the upstream change (who do not necessarily agree with this):
Co-authored-by: Adam Goldschmidt <adamgold7@gmail.com>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Éric Araujo <merwok@netwok.org>
---
 Doc/library/cgi.rst       |   5 +-
 Doc/library/urlparse.rst  |  15 ++-
 Lib/cgi.py                |  34 +++---
 Lib/test/test_cgi.py      |  59 ++++++++++-
 Lib/test/test_urlparse.py | 210 +++++++++++++++++++++++++++++++++++++-
 Lib/urlparse.py           |  78 +++++++++++++-
 6 files changed, 369 insertions(+), 32 deletions(-)

diff --git a/Doc/library/cgi.rst b/Doc/library/cgi.rst
index ecd62c8c019..a96cd38717b 100644
--- a/Doc/library/cgi.rst
+++ b/Doc/library/cgi.rst
@@ -285,10 +285,10 @@ These are useful if you want more control, or if you want to employ some of the
 algorithms implemented in this module in other circumstances.
 
 
-.. function:: parse(fp[, environ[, keep_blank_values[, strict_parsing]]])
+.. function:: parse(fp[, environ[, keep_blank_values[, strict_parsing[, separator]]]])
 
    Parse a query in the environment or from a file (the file defaults to
-   ``sys.stdin`` and environment defaults to ``os.environ``).  The *keep_blank_values* and *strict_parsing* parameters are
+   ``sys.stdin`` and environment defaults to ``os.environ``).  The *keep_blank_values*, *strict_parsing* and *separator* parameters are
    passed to :func:`urlparse.parse_qs` unchanged.
 
 
@@ -316,7 +316,6 @@ algorithms implemented in this module in other circumstances.
    Note that this does not parse nested multipart parts --- use
    :class:`FieldStorage` for that.
 
-
 .. function:: parse_header(string)
 
    Parse a MIME header (such as :mailheader:`Content-Type`) into a main value and a
diff --git a/Doc/library/urlparse.rst b/Doc/library/urlparse.rst
index 0989c88c302..97d1119257c 100644
--- a/Doc/library/urlparse.rst
+++ b/Doc/library/urlparse.rst
@@ -136,7 +136,7 @@ The :mod:`urlparse` module defines the following functions:
       now raise :exc:`ValueError`.
 
 
-.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing[, max_num_fields]]])
+.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing[, max_num_fields[, separator]]]])
 
    Parse a query string given as a string argument (data of type
    :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a
@@ -157,6 +157,15 @@ The :mod:`urlparse` module defines the following functions:
    read. If set, then throws a :exc:`ValueError` if there are more than
    *max_num_fields* fields read.
 
+   The optional argument *separator* is the symbol to use for separating the
+   query arguments. It is recommended to set it to ``'&'`` or ``';'``.
+   It defaults to ``'&'``; a warning is raised if this default is used.
+   This default may be changed with the following environment variable settings:
+
+   - ``PYTHON_URLLIB_QS_SEPARATOR='&'``: use only ``&`` as separator, without warning (as in Python 3.6.13+ or 3.10)
+   - ``PYTHON_URLLIB_QS_SEPARATOR=';'``: use only ``;`` as separator
+   - ``PYTHON_URLLIB_QS_SEPARATOR=legacy``: use both ``&`` and ``;`` (as in previous versions of Python)
+
    Use the :func:`urllib.urlencode` function to convert such dictionaries into
    query strings.
 
@@ -186,6 +195,9 @@ The :mod:`urlparse` module defines the following functions:
    read. If set, then throws a :exc:`ValueError` if there are more than
    *max_num_fields* fields read.
 
+   The optional argument *separator* is the symbol to use for separating the
+   query arguments. It works as in :py:func:`parse_qs`.
+
    Use the :func:`urllib.urlencode` function to convert such lists of pairs into
    query strings.
 
@@ -195,6 +207,7 @@ The :mod:`urlparse` module defines the following functions:
    .. versionchanged:: 2.7.16
       Added *max_num_fields* parameter.
 
+
 .. function:: urlunparse(parts)
 
    Construct a URL from a tuple as returned by ``urlparse()``. The *parts* argument
diff --git a/Lib/cgi.py b/Lib/cgi.py
index 5b903e03477..1421f2d90e0 100755
--- a/Lib/cgi.py
+++ b/Lib/cgi.py
@@ -121,7 +121,8 @@ log = initlog           # The current logging function
 # 0 ==> unlimited input
 maxlen = 0
 
-def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
+def parse(fp=None, environ=os.environ, keep_blank_values=0,
+          strict_parsing=0, separator=None):
     """Parse a query in the environment or from a file (default stdin)
 
         Arguments, all optional:
@@ -140,6 +141,8 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
         strict_parsing: flag indicating what to do with parsing errors.
             If false (the default), errors are silently ignored.
             If true, errors raise a ValueError exception.
+
+        separator: str. The symbol to use for separating the query arguments.
     """
     if fp is None:
         fp = sys.stdin
@@ -171,25 +174,26 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
         else:
             qs = ""
         environ['QUERY_STRING'] = qs    # XXX Shouldn't, really
-    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing)
+    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing, separator=separator)
 
 
 # parse query string function called from urlparse,
 # this is done in order to maintain backward compatibility.
 
-def parse_qs(qs, keep_blank_values=0, strict_parsing=0):
+def parse_qs(qs, keep_blank_values=0, strict_parsing=0, separator=None):
     """Parse a query given as a string argument."""
     warn("cgi.parse_qs is deprecated, use urlparse.parse_qs instead",
          PendingDeprecationWarning, 2)
-    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing)
+    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing,
+                             separator=separator)
 
 
-def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
+def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None, separator=None):
     """Parse a query given as a string argument."""
     warn("cgi.parse_qsl is deprecated, use urlparse.parse_qsl instead",
          PendingDeprecationWarning, 2)
     return urlparse.parse_qsl(qs, keep_blank_values, strict_parsing,
-                              max_num_fields)
+                              max_num_fields, separator=separator)
 
 def parse_multipart(fp, pdict):
     """Parse multipart input.
@@ -288,7 +292,6 @@ def parse_multipart(fp, pdict):
 
     return partdict
 
-
 def _parseparam(s):
     while s[:1] == ';':
         s = s[1:]
@@ -395,7 +398,7 @@ class FieldStorage:
 
     def __init__(self, fp=None, headers=None, outerboundary="",
                  environ=os.environ, keep_blank_values=0, strict_parsing=0,
-                 max_num_fields=None):
+                 max_num_fields=None, separator=None):
         """Constructor.  Read multipart/* until last part.
 
         Arguments, all optional:
@@ -430,6 +433,7 @@ class FieldStorage:
         self.keep_blank_values = keep_blank_values
         self.strict_parsing = strict_parsing
         self.max_num_fields = max_num_fields
+        self.separator = separator
         if 'REQUEST_METHOD' in environ:
             method = environ['REQUEST_METHOD'].upper()
         self.qs_on_post = None
@@ -613,7 +617,8 @@ class FieldStorage:
         if self.qs_on_post:
             qs += '&' + self.qs_on_post
         query = urlparse.parse_qsl(qs, self.keep_blank_values,
-                                   self.strict_parsing, self.max_num_fields)
+                                   self.strict_parsing, self.max_num_fields,
+                                   self.separator)
         self.list = [MiniFieldStorage(key, value) for key, value in query]
         self.skip_lines()
 
@@ -629,7 +634,8 @@ class FieldStorage:
             query = urlparse.parse_qsl(self.qs_on_post,
                                        self.keep_blank_values,
                                        self.strict_parsing,
-                                       self.max_num_fields)
+                                       self.max_num_fields,
+                                       self.separator)
             self.list.extend(MiniFieldStorage(key, value)
                              for key, value in query)
             FieldStorageClass = None
@@ -649,7 +655,8 @@ class FieldStorage:
             headers = rfc822.Message(self.fp)
             part = klass(self.fp, headers, ib,
                          environ, keep_blank_values, strict_parsing,
-                         max_num_fields)
+                         max_num_fields,
+                         separator=self.separator)
 
             if max_num_fields is not None:
                 max_num_fields -= 1
@@ -817,10 +824,11 @@ class FormContentDict(UserDict.UserDict):
     form.dict == {key: [val, val, ...], ...}
 
     """
-    def __init__(self, environ=os.environ, keep_blank_values=0, strict_parsing=0):
+    def __init__(self, environ=os.environ, keep_blank_values=0, strict_parsing=0, separator=None):
         self.dict = self.data = parse(environ=environ,
                                       keep_blank_values=keep_blank_values,
-                                      strict_parsing=strict_parsing)
+                                      strict_parsing=strict_parsing,
+                                      separator=separator)
         self.query_string = environ['QUERY_STRING']
 
 
diff --git a/Lib/test/test_cgi.py b/Lib/test/test_cgi.py
index 743c2afbd4c..9956ea9d4e8 100644
--- a/Lib/test/test_cgi.py
+++ b/Lib/test/test_cgi.py
@@ -61,12 +61,9 @@ parse_strict_test_cases = [
     ("", ValueError("bad query field: ''")),
     ("&", ValueError("bad query field: ''")),
     ("&&", ValueError("bad query field: ''")),
-    (";", ValueError("bad query field: ''")),
-    (";&;", ValueError("bad query field: ''")),
     # Should the next few really be valid?
     ("=", {}),
     ("=&=", {}),
-    ("=;=", {}),
     # This rest seem to make sense
     ("=a", {'': ['a']}),
     ("&=a", ValueError("bad query field: ''")),
@@ -81,8 +78,6 @@ parse_strict_test_cases = [
     ("a=a+b&b=b+c", {'a': ['a b'], 'b': ['b c']}),
     ("a=a+b&a=b+a", {'a': ['a b', 'b a']}),
     ("x=1&y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
-    ("x=1;y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
-    ("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
     ("Hbc5161168c542333633315dee1182227:key_store_seqid=400006&cuyer=r&view=bustomer&order_id=0bb2e248638833d48cb7fed300000f1b&expire=964546263&lobale=en-US&kid=130003.300038&ss=env",
      {'Hbc5161168c542333633315dee1182227:key_store_seqid': ['400006'],
       'cuyer': ['r'],
@@ -177,6 +172,60 @@ class CgiTests(unittest.TestCase):
                         self.assertItemsEqual(sd.items(),
                                                 first_second_elts(expect.items()))
 
+    def test_separator(self):
+        parse_semicolon = [
+            ("x=1;y=2.0", {'x': ['1'], 'y': ['2.0']}),
+            ("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
+            (";", ValueError("bad query field: ''")),
+            (";;", ValueError("bad query field: ''")),
+            ("=;a", ValueError("bad query field: 'a'")),
+            (";b=a", ValueError("bad query field: ''")),
+            ("b;=a", ValueError("bad query field: 'b'")),
+            ("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
+            ("a=a+b;a=b+a", {'a': ['a b', 'b a']}),
+        ]
+        for orig, expect in parse_semicolon:
+            env = {'QUERY_STRING': orig}
+            fcd = cgi.FormContentDict(env, separator=';')
+            sd = cgi.SvFormContentDict(env, separator=';')
+            fs = cgi.FieldStorage(environ=env, separator=';')
+            if isinstance(expect, dict):
+                # test dict interface
+                self.assertEqual(len(expect), len(fcd))
+                self.assertItemsEqual(expect.keys(), fcd.keys())
+                self.assertItemsEqual(expect.values(), fcd.values())
+                self.assertItemsEqual(expect.items(), fcd.items())
+                self.assertEqual(fcd.get("nonexistent field", "default"), "default")
+                self.assertEqual(len(sd), len(fs))
+                self.assertItemsEqual(sd.keys(), fs.keys())
+                self.assertEqual(fs.getvalue("nonexistent field", "default"), "default")
+                # test individual fields
+                for key in expect.keys():
+                    expect_val = expect[key]
+                    self.assertTrue(fcd.has_key(key))
+                    self.assertItemsEqual(fcd[key], expect[key])
+                    self.assertEqual(fcd.get(key, "default"), fcd[key])
+                    self.assertTrue(fs.has_key(key))
+                    if len(expect_val) > 1:
+                        single_value = 0
+                    else:
+                        single_value = 1
+                    try:
+                        val = sd[key]
+                    except IndexError:
+                        self.assertFalse(single_value)
+                        self.assertEqual(fs.getvalue(key), expect_val)
+                    else:
+                        self.assertTrue(single_value)
+                        self.assertEqual(val, expect_val[0])
+                        self.assertEqual(fs.getvalue(key), expect_val[0])
+                    self.assertItemsEqual(sd.getlist(key), expect_val)
+                    if single_value:
+                        self.assertItemsEqual(sd.values(),
+                                                first_elts(expect.values()))
+                        self.assertItemsEqual(sd.items(),
+                                                first_second_elts(expect.items()))
+
     def test_weird_formcontentdict(self):
         # Test the weird FormContentDict classes
         env = {'QUERY_STRING': "x=1&y=2.0&z=2-3.%2b0&1=1abc"}
diff --git a/Lib/test/test_urlparse.py b/Lib/test/test_urlparse.py
index 86c4a0595c4..21875bb2991 100644
--- a/Lib/test/test_urlparse.py
+++ b/Lib/test/test_urlparse.py
@@ -3,6 +3,12 @@ import sys
 import unicodedata
 import unittest
 import urlparse
+from test.support import EnvironmentVarGuard
+from warnings import catch_warnings, filterwarnings
+import tempfile
+import contextlib
+import os.path
+import shutil
 
 RFC1808_BASE = "http://a/b/c/d;p?q#f"
 RFC2396_BASE = "http://a/b/c/d;p?q"
@@ -24,16 +30,29 @@ parse_qsl_test_cases = [
     ("&a=b", [('a', 'b')]),
     ("a=a+b&b=b+c", [('a', 'a b'), ('b', 'b c')]),
     ("a=1&a=2", [('a', '1'), ('a', '2')]),
+]
+
+parse_qsl_test_cases_semicolon = [
     (";", []),
     (";;", []),
     (";a=b", [('a', 'b')]),
     ("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
     ("a=1;a=2", [('a', '1'), ('a', '2')]),
-    (b";", []),
-    (b";;", []),
-    (b";a=b", [(b'a', b'b')]),
-    (b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
-    (b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
+]
+
+parse_qsl_test_cases_legacy = [
+    ("a=1;a=2&a=3", [('a', '1'), ('a', '2'), ('a', '3')]),
+    ("a=1;b=2&c=3", [('a', '1'), ('b', '2'), ('c', '3')]),
+    ("a=1&b=2&c=3;", [('a', '1'), ('b', '2'), ('c', '3')]),
+]
+
+parse_qsl_test_cases_warn = [
+    (";a=b", [(';a', 'b')]),
+    ("a=a+b;b=b+c", [('a', 'a b;b=b c')]),
+    (b";a=b", [(b';a', b'b')]),
+    (b"a=a+b;b=b+c", [(b'a', b'a b;b=b c')]),
+    ("a=1;a=2&a=3", [('a', '1;a=2'), ('a', '3')]),
+    (b"a=1;a=2&a=3", [(b'a', b'1;a=2'), (b'a', b'3')]),
 ]
 
 parse_qs_test_cases = [
@@ -57,6 +76,9 @@ parse_qs_test_cases = [
     (b"&a=b", {b'a': [b'b']}),
     (b"a=a+b&b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
     (b"a=1&a=2", {b'a': [b'1', b'2']}),
+]
+
+parse_qs_test_cases_semicolon = [
     (";", {}),
     (";;", {}),
     (";a=b", {'a': ['b']}),
@@ -69,6 +91,24 @@ parse_qs_test_cases = [
     (b"a=1;a=2", {b'a': [b'1', b'2']}),
 ]
 
+parse_qs_test_cases_legacy = [
+    ("a=1;a=2&a=3", {'a': ['1', '2', '3']}),
+    ("a=1;b=2&c=3", {'a': ['1'], 'b': ['2'], 'c': ['3']}),
+    ("a=1&b=2&c=3;", {'a': ['1'], 'b': ['2'], 'c': ['3']}),
+    (b"a=1;a=2&a=3", {b'a': [b'1', b'2', b'3']}),
+    (b"a=1;b=2&c=3", {b'a': [b'1'], b'b': [b'2'], b'c': [b'3']}),
+    (b"a=1&b=2&c=3;", {b'a': [b'1'], b'b': [b'2'], b'c': [b'3']}),
+]
+
+parse_qs_test_cases_warn = [
+    (";a=b", {';a': ['b']}),
+    ("a=a+b;b=b+c", {'a': ['a b;b=b c']}),
+    (b";a=b", {b';a': [b'b']}),
+    (b"a=a+b;b=b+c", {b'a':[ b'a b;b=b c']}),
+    ("a=1;a=2&a=3", {'a': ['1;a=2', '3']}),
+    (b"a=1;a=2&a=3", {b'a': [b'1;a=2', b'3']}),
+]
+
 class UrlParseTestCase(unittest.TestCase):
 
     def checkRoundtrips(self, url, parsed, split):
@@ -141,6 +181,40 @@ class UrlParseTestCase(unittest.TestCase):
             self.assertEqual(result, expect_without_blanks,
                     "Error parsing %r" % orig)
 
+    def test_qs_default_warn(self):
+        for orig, expect in parse_qs_test_cases_warn:
+            with catch_warnings(record=True) as w:
+                filterwarnings(action='always',
+                                        category=urlparse._QueryStringSeparatorWarning)
+                result = urlparse.parse_qs(orig, keep_blank_values=True)
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 1)
+            self.assertEqual(w[0].category, urlparse._QueryStringSeparatorWarning)
+
+    def test_qsl_default_warn(self):
+        for orig, expect in parse_qsl_test_cases_warn:
+            with catch_warnings(record=True) as w:
+                filterwarnings(action='always',
+                               category=urlparse._QueryStringSeparatorWarning)
+                result = urlparse.parse_qsl(orig, keep_blank_values=True)
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 1)
+            self.assertEqual(w[0].category, urlparse._QueryStringSeparatorWarning)
+
+    def test_default_qs_no_warnings(self):
+        for orig, expect in parse_qs_test_cases:
+            with catch_warnings(record=True) as w:
+                result = urlparse.parse_qs(orig, keep_blank_values=True)
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_default_qsl_no_warnings(self):
+        for orig, expect in parse_qsl_test_cases:
+            with catch_warnings(record=True) as w:
+                result = urlparse.parse_qsl(orig, keep_blank_values=True)
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
     def test_roundtrips(self):
         testcases = [
             ('file:///tmp/junk.txt',
@@ -626,6 +700,132 @@ class UrlParseTestCase(unittest.TestCase):
         self.assertEqual(urlparse.urlparse("http://www.python.org:80"),
                 ('http','www.python.org:80','','','',''))
 
+    def test_parse_qs_separator_bytes(self):
+        expected = {b'a': [b'1'], b'b': [b'2']}
+
+        result = urlparse.parse_qs(b'a=1;b=2', separator=b';')
+        self.assertEqual(result, expected)
+        result = urlparse.parse_qs(b'a=1;b=2', separator=';')
+        self.assertEqual(result, expected)
+        result = urlparse.parse_qs('a=1;b=2', separator=';')
+        self.assertEqual(result, {'a': ['1'], 'b': ['2']})
+
+    @contextlib.contextmanager
+    def _qsl_sep_config(self, sep):
+        """Context for the given parse_qsl default separator configured in config file"""
+        old_filename = urlparse._QS_SEPARATOR_CONFIG_FILENAME
+        urlparse._default_qs_separator = None
+        try:
+            tmpdirname = tempfile.mkdtemp()
+            filename = os.path.join(tmpdirname, 'conf.cfg')
+            with open(filename, 'w') as file:
+                file.write('[parse_qs]\n')
+                file.write('PYTHON_URLLIB_QS_SEPARATOR = {}'.format(sep))
+            urlparse._QS_SEPARATOR_CONFIG_FILENAME = filename
+            yield
+        finally:
+            urlparse._QS_SEPARATOR_CONFIG_FILENAME = old_filename
+            urlparse._default_qs_separator = None
+            shutil.rmtree(tmpdirname)
+
+    def test_parse_qs_separator_semicolon(self):
+        for orig, expect in parse_qs_test_cases_semicolon:
+            result = urlparse.parse_qs(orig, separator=';')
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = ';'
+                result = urlparse.parse_qs(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+            with self._qsl_sep_config(';'), catch_warnings(record=True) as w:
+                result = urlparse.parse_qs(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_parse_qsl_separator_semicolon(self):
+        for orig, expect in parse_qsl_test_cases_semicolon:
+            result = urlparse.parse_qsl(orig, separator=';')
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = ';'
+                result = urlparse.parse_qsl(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+            with self._qsl_sep_config(';'), catch_warnings(record=True) as w:
+                result = urlparse.parse_qsl(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_parse_qs_separator_legacy(self):
+        for orig, expect in parse_qs_test_cases_legacy:
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = 'legacy'
+                result = urlparse.parse_qs(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+            with self._qsl_sep_config('legacy'), catch_warnings(record=True) as w:
+                result = urlparse.parse_qs(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_parse_qsl_separator_legacy(self):
+        for orig, expect in parse_qsl_test_cases_legacy:
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = 'legacy'
+                result = urlparse.parse_qsl(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+            with self._qsl_sep_config('legacy'), catch_warnings(record=True) as w:
+                result = urlparse.parse_qsl(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_parse_qs_separator_bad_value_env_or_config(self):
+        for bad_sep in '', 'abc', 'safe', '&;', 'SEP':
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = bad_sep
+                with self.assertRaises(ValueError):
+                    urlparse.parse_qsl('a=1;b=2')
+            with self._qsl_sep_config('bad_sep'), catch_warnings(record=True) as w:
+                with self.assertRaises(ValueError):
+                    urlparse.parse_qsl('a=1;b=2')
+
+    def test_parse_qs_separator_bad_value_arg(self):
+        for bad_sep in True, {}, '':
+            with self.assertRaises(ValueError):
+                urlparse.parse_qsl('a=1;b=2', separator=bad_sep)
+
+    def test_parse_qs_separator_num_fields(self):
+        for qs, sep in (
+            ('a&b&c', '&'),
+            ('a;b;c', ';'),
+            ('a&b;c', 'legacy'),
+        ):
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                if sep != 'legacy':
+                    with self.assertRaises(ValueError):
+                        urlparse.parse_qsl(qs, separator=sep, max_num_fields=2)
+                if sep:
+                    environ['PYTHON_URLLIB_QS_SEPARATOR'] = sep
+                with self.assertRaises(ValueError):
+                    urlparse.parse_qsl(qs, max_num_fields=2)
+
+    def test_parse_qs_separator_priority(self):
+        # env variable trumps config file
+        with self._qsl_sep_config('~'), EnvironmentVarGuard() as environ:
+            environ['PYTHON_URLLIB_QS_SEPARATOR'] = '!'
+            result = urlparse.parse_qs('a=1!b=2~c=3')
+            self.assertEqual(result, {'a': ['1'], 'b': ['2~c=3']})
+        # argument trumps config file
+        with self._qsl_sep_config('~'):
+            result = urlparse.parse_qs('a=1$b=2~c=3', separator='$')
+            self.assertEqual(result, {'a': ['1'], 'b': ['2~c=3']})
+        # argument trumps env variable
+        with EnvironmentVarGuard() as environ:
+            environ['PYTHON_URLLIB_QS_SEPARATOR'] = '~'
+            result = urlparse.parse_qs('a=1$b=2~c=3', separator='$')
+            self.assertEqual(result, {'a': ['1'], 'b': ['2~c=3']})
+
     def test_urlsplit_normalization(self):
         # Certain characters should never occur in the netloc,
         # including under normalization.
diff --git a/Lib/urlparse.py b/Lib/urlparse.py
index 798b467b605..69504d8fd93 100644
--- a/Lib/urlparse.py
+++ b/Lib/urlparse.py
@@ -29,6 +29,7 @@ test_urlparse.py provides a good indicator of parsing behavior.
 """
 
 import re
+import os
 
 __all__ = ["urlparse", "urlunparse", "urljoin", "urldefrag",
            "urlsplit", "urlunsplit", "parse_qs", "parse_qsl"]
@@ -382,7 +383,8 @@ def unquote(s):
             append(item)
     return ''.join(res)
 
-def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
+def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None,
+             separator=None):
     """Parse a query given as a string argument.
 
         Arguments:
@@ -405,14 +407,23 @@ def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
     """
     dict = {}
     for name, value in parse_qsl(qs, keep_blank_values, strict_parsing,
-                                 max_num_fields):
+                                 max_num_fields, separator):
         if name in dict:
             dict[name].append(value)
         else:
             dict[name] = [value]
     return dict
 
-def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
+class _QueryStringSeparatorWarning(RuntimeWarning):
+    """Warning for using default `separator` in parse_qs or parse_qsl"""
+
+# The default "separator" for parse_qsl can be specified in a config file.
+# It's cached after first read.
+_QS_SEPARATOR_CONFIG_FILENAME = '/etc/python/urllib.cfg'
+_default_qs_separator = None
+
+def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None,
+              separator=None):
     """Parse a query given as a string argument.
 
     Arguments:
@@ -434,15 +445,72 @@ def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
 
     Returns a list, as G-d intended.
     """
+
+    if (not separator or (not isinstance(separator, (str, bytes)))) and separator is not None:
+        raise ValueError("Separator must be of type string or bytes.")
+
+    # Used when both "&" and ";" act as separators. (Need a non-string value.)
+    _legacy = object()
+
+    if separator is None:
+        global _default_qs_separator
+        separator = _default_qs_separator
+        envvar_name = 'PYTHON_URLLIB_QS_SEPARATOR'
+        if separator is None:
+            # Set default separator from environment variable
+            separator = os.environ.get(envvar_name)
+            config_source = 'environment variable'
+        if separator is None:
+            # Set default separator from the configuration file
+            try:
+                file = open(_QS_SEPARATOR_CONFIG_FILENAME)
+            except EnvironmentError:
+                pass
+            else:
+                with file:
+                    import ConfigParser
+                    config = ConfigParser.ConfigParser()
+                    config.readfp(file)
+                    separator = config.get('parse_qs', envvar_name)
+                    _default_qs_separator = separator
+                config_source = _QS_SEPARATOR_CONFIG_FILENAME
+        if separator is None:
+            # The default is '&', but warn if not specified explicitly
+            if ';' in qs:
+                from warnings import warn
+                warn("The default separator of urlparse.parse_qsl and "
+                    + "parse_qs was changed to '&' to avoid a web cache "
+                    + "poisoning issue (CVE-2021-23336). "
+                    + "By default, semicolons no longer act as query field "
+                    + "separators. "
+                    + "See https://access.redhat.com/articles/5860431 for "
+                    + "more details.",
+                    _QueryStringSeparatorWarning, stacklevel=2)
+            separator = '&'
+        elif separator == 'legacy':
+            separator = _legacy
+        elif len(separator) != 1:
+            raise ValueError(
+                '{} (from {}) must contain '.format(envvar_name, config_source)
+                + '1 character, or "legacy". See '
+                + 'https://access.redhat.com/articles/5860431 for more details.'
+            )
+
     # If max_num_fields is defined then check that the number of fields
     # is less than max_num_fields. This prevents a memory exhaustion DOS
     # attack via post bodies with many fields.
     if max_num_fields is not None:
-        num_fields = 1 + qs.count('&') + qs.count(';')
+        if separator is _legacy:
+            num_fields = 1 + qs.count('&') + qs.count(';')
+        else:
+            num_fields = 1 + qs.count(separator)
         if max_num_fields < num_fields:
             raise ValueError('Max number of fields exceeded')
 
-    pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
+    if separator is _legacy:
+        pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
+    else:
+        pairs = [s1 for s1 in qs.split(separator)]
     r = []
     for name_value in pairs:
         if not name_value and not strict_parsing:
-- 
2.30.2
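Once a separator has been chosen, the patched `parse_qsl` changes two steps. It changes how `max_num_fields` counts fields, and it changes how the query is split into `name=value` chunks. A minimal standalone Python 3 sketch of just those two steps follows. `split_pairs` and the `LEGACY` sentinel are hypothetical names; the patch itself uses a local `_legacy = object()`. Unquoting, blank-value handling and `strict_parsing` are left out.

```python
# Hypothetical sentinel mirroring the patch's local `_legacy` object:
# it selects the old behaviour of splitting on both '&' and ';'.
LEGACY = object()


def split_pairs(qs, separator, max_num_fields=None):
    """Return the raw 'name=value' chunks of `qs`, split the way the patched
    parse_qsl splits them before unquoting. `separator` is a string or LEGACY."""
    if max_num_fields is not None:
        # Count fields with the same separator(s) used for splitting, so the
        # limit matches the number of chunks actually produced.
        if separator is LEGACY:
            num_fields = 1 + qs.count('&') + qs.count(';')
        else:
            num_fields = 1 + qs.count(separator)
        if max_num_fields < num_fields:
            raise ValueError('Max number of fields exceeded')

    if separator is LEGACY:
        return [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
    return qs.split(separator)
```

For example, `split_pairs('a=1;b=2&c=3', '&')` gives `['a=1;b=2', 'c=3']`, while `split_pairs('a=1;b=2&c=3', LEGACY)` gives `['a=1', 'b=2', 'c=3']`. Two components that split the same query differently see different parameters. That is why the commit message recommends setting the separator to match the application or proxy in use.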