From 976a4010aa4e450855dce5fa4c865bcbdc86cccd Mon Sep 17 00:00:00 2001
From: Charalampos Stratakis <cstratak@redhat.com>
Date: Fri, 16 Apr 2021 18:02:00 +0200
Subject: [PATCH] CVE-2021-23336: Add `separator` argument to parse_qs; warn
 with default
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Partially backports https://bugs.python.org/issue42967 : [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().
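
The cache-poisoning risk is a parser differential. Before this change,
parse_qsl() split on both '&' and ';', while many proxies and caches split
only on '&'. Here is a minimal sketch of that mismatch. split_pairs() is a
hypothetical helper, not urlparse's implementation, and it skips
percent-decoding and '+' handling.

```python
def split_pairs(qs, separators):
    """Split qs into (name, value) pairs on every character in `separators`.

    Hypothetical helper for illustration only: no percent-decoding,
    no '+' -> ' ' conversion, and empty fields are dropped.
    """
    fields = [qs]
    for sep in separators:
        # Re-split every chunk produced so far on the next separator.
        fields = [piece for chunk in fields for piece in chunk.split(sep)]
    pairs = []
    for field in fields:
        if not field:
            continue
        name, _, value = field.partition('=')
        pairs.append((name, value))
    return pairs


query = "utm_content=1;callback=evil"

# A cache or proxy that only treats '&' as a separator sees one parameter...
cache_view = split_pairs(query, "&")
# ...while the pre-patch Python 2 default ('&' and ';') sees two.
app_view = split_pairs(query, "&;")

print(cache_view)  # [('utm_content', '1;callback=evil')]
print(app_view)    # [('utm_content', '1'), ('callback', 'evil')]
```

Suppose a cache leaves utm_* parameters out of its cache key. In that case, a
response rendered for callback=evil could be stored under the key of an
innocent-looking URL.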

Backported from the python3 branch.
However, this solution is different from the upstream solution in Python 3.

Based on the downstream solution for Python 3.6.13 by Petr Viktorin.

An optional `separator` argument is added to specify the separator.
It is recommended to set it to '&' or ';' to match the application or proxy in use.
The default can be set with the PYTHON_URLLIB_QS_SEPARATOR environment variable
or in the /etc/python/urllib.cfg config file.
If none of the argument, the environment variable or the config file specifies
a separator, "&" is used, but a warning is raised if parse_qs is used on input
that contains ';'.
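
Below is a rough sketch of that resolution order. It assumes the
PYTHON_URLLIB_QS_SEPARATOR variable and the [parse_qs] section of
/etc/python/urllib.cfg that appear elsewhere in this patch.
resolve_separator() and LEGACY are hypothetical names, and a plain
RuntimeWarning stands in for urlparse._QueryStringSeparatorWarning. The rule
that "legacy" or any single character is accepted is inferred from the tests
in this patch, which use '!' and '~' and reject values such as 'abc' or '&;'.

```python
import os
import warnings

try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import ConfigParser  # Python 2

ENVVAR = 'PYTHON_URLLIB_QS_SEPARATOR'
CONFIG_SECTION = 'parse_qs'
LEGACY = object()  # hypothetical sentinel meaning "split on both '&' and ';'"


def resolve_separator(qs, separator=None, environ=None,
                      config_path='/etc/python/urllib.cfg'):
    """Pick a separator: argument > environment variable > config file > '&'.

    qs is assumed to be text; the warning check below does not handle bytes.
    """
    if separator is not None:
        # An explicit argument always wins, but it must be a non-empty string.
        if not separator or not isinstance(separator, (str, bytes)):
            raise ValueError("Separator must be of type string or bytes.")
        return separator

    environ = os.environ if environ is None else environ
    value, source = environ.get(ENVVAR), 'environment variable'

    if value is None and config_path:
        parser = ConfigParser()
        # read() skips missing files and returns the list it actually parsed.
        if parser.read(config_path) and parser.has_option(CONFIG_SECTION, ENVVAR):
            value, source = parser.get(CONFIG_SECTION, ENVVAR), config_path

    if value is None:
        # Nothing configured: use '&', but warn when the input contains ';',
        # since a legacy parser would split it differently.
        if ';' in qs:
            warnings.warn("query string contains ';' but only '&' is used as "
                          "separator; set %s to choose explicitly" % ENVVAR,
                          RuntimeWarning, stacklevel=2)
        return '&'

    if value == 'legacy':
        return LEGACY
    if len(value) != 1:
        raise ValueError('%s (from %s) must be one character or "legacy"'
                         % (ENVVAR, source))
    return value
```

This sketch looks the value up again on every call. The patched urlparse
instead caches the configured default in _default_qs_separator after the
first lookup.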

Co-authors of the downstream change:
Co-authored-by: Petr Viktorin <pviktori@redhat.com>
Co-authors of the upstream change (who do not necessarily agree with this):
Co-authored-by: Adam Goldschmidt <adamgold7@gmail.com>
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Éric Araujo <merwok@netwok.org>
---
 Doc/library/cgi.rst       |   5 +-
 Doc/library/urlparse.rst  |  15 ++-
 Lib/cgi.py                |  34 +++---
 Lib/test/test_cgi.py      |  59 ++++++++++-
 Lib/test/test_urlparse.py | 210 +++++++++++++++++++++++++++++++++++++-
 Lib/urlparse.py           |  78 +++++++++++++-
 6 files changed, 369 insertions(+), 32 deletions(-)

diff --git a/Doc/library/cgi.rst b/Doc/library/cgi.rst
index ecd62c8c019..a96cd38717b 100644
--- a/Doc/library/cgi.rst
+++ b/Doc/library/cgi.rst
@@ -285,10 +285,10 @@ These are useful if you want more control, or if you want to employ some of the
 algorithms implemented in this module in other circumstances.
 
 
-.. function:: parse(fp[, environ[, keep_blank_values[, strict_parsing]]])
+.. function:: parse(fp[, environ[, keep_blank_values[, strict_parsing[, separator]]]])
 
    Parse a query in the environment or from a file (the file defaults to
-   ``sys.stdin`` and environment defaults to ``os.environ``).  The *keep_blank_values* and *strict_parsing* parameters are
+   ``sys.stdin`` and environment defaults to ``os.environ``).  The *keep_blank_values*, *strict_parsing* and *separator* parameters are
    passed to :func:`urlparse.parse_qs` unchanged.
 
 
@@ -316,7 +316,6 @@ algorithms implemented in this module in other circumstances.
    Note that this does not parse nested multipart parts --- use
    :class:`FieldStorage` for that.
 
-
 .. function:: parse_header(string)
 
    Parse a MIME header (such as :mailheader:`Content-Type`) into a main value and a
diff --git a/Doc/library/urlparse.rst b/Doc/library/urlparse.rst
index 0989c88c302..97d1119257c 100644
--- a/Doc/library/urlparse.rst
+++ b/Doc/library/urlparse.rst
@@ -136,7 +136,7 @@ The :mod:`urlparse` module defines the following functions:
       now raise :exc:`ValueError`.
 
 
-.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing[, max_num_fields]]])
+.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing[, max_num_fields[, separator]]]])
 
    Parse a query string given as a string argument (data of type
    :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a
@@ -157,6 +157,15 @@ The :mod:`urlparse` module defines the following functions:
    read. If set, then throws a :exc:`ValueError` if there are more than
    *max_num_fields* fields read.
 
+   The optional argument *separator* is the symbol to use for separating the
+   query arguments. It is recommended to set it to ``'&'`` or ``';'``.
+   It defaults to ``'&'``; a warning is raised if this default is used.
+   This default may be changed with the following environment variable settings:
+
+   - ``PYTHON_URLLIB_QS_SEPARATOR='&'``: use only ``&`` as separator, without warning (as in Python 3.6.13+ or 3.10)
+   - ``PYTHON_URLLIB_QS_SEPARATOR=';'``: use only ``;`` as separator
+   - ``PYTHON_URLLIB_QS_SEPARATOR=legacy``: use both ``&`` and ``;`` (as in previous versions of Python)
+
    Use the :func:`urllib.urlencode` function to convert such dictionaries into
    query strings.
 
@@ -186,6 +195,9 @@ The :mod:`urlparse` module defines the following functions:
    read. If set, then throws a :exc:`ValueError` if there are more than
    *max_num_fields* fields read.
 
+   The optional argument *separator* is the symbol to use for separating the
+   query arguments. It works as in :py:func:`parse_qs`.
+
    Use the :func:`urllib.urlencode` function to convert such lists of pairs into
    query strings.
 
@@ -195,6 +207,7 @@ The :mod:`urlparse` module defines the following functions:
    .. versionchanged:: 2.7.16
       Added *max_num_fields* parameter.
 
+
 .. function:: urlunparse(parts)
 
    Construct a URL from a tuple as returned by ``urlparse()``. The *parts* argument
diff --git a/Lib/cgi.py b/Lib/cgi.py
index 5b903e03477..1421f2d90e0 100755
--- a/Lib/cgi.py
+++ b/Lib/cgi.py
@@ -121,7 +121,8 @@ log = initlog           # The current logging function
 # 0 ==> unlimited input
 maxlen = 0
 
-def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
+def parse(fp=None, environ=os.environ, keep_blank_values=0,
+          strict_parsing=0, separator=None):
     """Parse a query in the environment or from a file (default stdin)
 
         Arguments, all optional:
@@ -140,6 +141,8 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
         strict_parsing: flag indicating what to do with parsing errors.
             If false (the default), errors are silently ignored.
             If true, errors raise a ValueError exception.
+
+        separator: str. The symbol to use for separating the query arguments.
     """
     if fp is None:
         fp = sys.stdin
@@ -171,25 +174,26 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
         else:
             qs = ""
         environ['QUERY_STRING'] = qs    # XXX Shouldn't, really
-    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing)
+    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing, separator=separator)
 
 
 # parse query string function called from urlparse,
 # this is done in order to maintain backward compatibility.
 
-def parse_qs(qs, keep_blank_values=0, strict_parsing=0):
+def parse_qs(qs, keep_blank_values=0, strict_parsing=0, separator=None):
     """Parse a query given as a string argument."""
     warn("cgi.parse_qs is deprecated, use urlparse.parse_qs instead",
          PendingDeprecationWarning, 2)
-    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing)
+    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing,
+                             separator=separator)
 
 
-def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
+def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None, separator=None):
     """Parse a query given as a string argument."""
     warn("cgi.parse_qsl is deprecated, use urlparse.parse_qsl instead",
          PendingDeprecationWarning, 2)
     return urlparse.parse_qsl(qs, keep_blank_values, strict_parsing,
-                              max_num_fields)
+                              max_num_fields, separator=separator)
 
 def parse_multipart(fp, pdict):
     """Parse multipart input.
@@ -288,7 +292,6 @@ def parse_multipart(fp, pdict):
 
     return partdict
 
-
 def _parseparam(s):
     while s[:1] == ';':
         s = s[1:]
@@ -395,7 +398,7 @@ class FieldStorage:
 
     def __init__(self, fp=None, headers=None, outerboundary="",
                  environ=os.environ, keep_blank_values=0, strict_parsing=0,
-                 max_num_fields=None):
+                 max_num_fields=None, separator=None):
         """Constructor.  Read multipart/* until last part.
 
         Arguments, all optional:
@@ -430,6 +433,7 @@ class FieldStorage:
         self.keep_blank_values = keep_blank_values
         self.strict_parsing = strict_parsing
         self.max_num_fields = max_num_fields
+        self.separator = separator
         if 'REQUEST_METHOD' in environ:
             method = environ['REQUEST_METHOD'].upper()
         self.qs_on_post = None
@@ -613,7 +617,8 @@ class FieldStorage:
         if self.qs_on_post:
             qs += '&' + self.qs_on_post
         query = urlparse.parse_qsl(qs, self.keep_blank_values,
-                                   self.strict_parsing, self.max_num_fields)
+                                   self.strict_parsing, self.max_num_fields,
+                                   self.separator)
         self.list = [MiniFieldStorage(key, value) for key, value in query]
         self.skip_lines()
 
@@ -629,7 +634,8 @@ class FieldStorage:
             query = urlparse.parse_qsl(self.qs_on_post,
                                        self.keep_blank_values,
                                        self.strict_parsing,
-                                       self.max_num_fields)
+                                       self.max_num_fields,
+                                       self.separator)
             self.list.extend(MiniFieldStorage(key, value)
                              for key, value in query)
             FieldStorageClass = None
@@ -649,7 +655,8 @@ class FieldStorage:
             headers = rfc822.Message(self.fp)
             part = klass(self.fp, headers, ib,
                          environ, keep_blank_values, strict_parsing,
-                         max_num_fields)
+                         max_num_fields,
+                         separator=self.separator)
 
             if max_num_fields is not None:
                 max_num_fields -= 1
@@ -817,10 +824,11 @@ class FormContentDict(UserDict.UserDict):
     form.dict == {key: [val, val, ...], ...}
 
     """
-    def __init__(self, environ=os.environ, keep_blank_values=0, strict_parsing=0):
+    def __init__(self, environ=os.environ, keep_blank_values=0, strict_parsing=0, separator=None):
         self.dict = self.data = parse(environ=environ,
                                       keep_blank_values=keep_blank_values,
-                                      strict_parsing=strict_parsing)
+                                      strict_parsing=strict_parsing,
+                                      separator=separator)
         self.query_string = environ['QUERY_STRING']
 
 
diff --git a/Lib/test/test_cgi.py b/Lib/test/test_cgi.py
index 743c2afbd4c..9956ea9d4e8 100644
--- a/Lib/test/test_cgi.py
+++ b/Lib/test/test_cgi.py
@@ -61,12 +61,9 @@ parse_strict_test_cases = [
     ("", ValueError("bad query field: ''")),
     ("&", ValueError("bad query field: ''")),
     ("&&", ValueError("bad query field: ''")),
-    (";", ValueError("bad query field: ''")),
-    (";&;", ValueError("bad query field: ''")),
     # Should the next few really be valid?
     ("=", {}),
     ("=&=", {}),
-    ("=;=", {}),
     # This rest seem to make sense
     ("=a", {'': ['a']}),
     ("&=a", ValueError("bad query field: ''")),
@@ -81,8 +78,6 @@ parse_strict_test_cases = [
     ("a=a+b&b=b+c", {'a': ['a b'], 'b': ['b c']}),
     ("a=a+b&a=b+a", {'a': ['a b', 'b a']}),
     ("x=1&y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
-    ("x=1;y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
-    ("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
     ("Hbc5161168c542333633315dee1182227:key_store_seqid=400006&cuyer=r&view=bustomer&order_id=0bb2e248638833d48cb7fed300000f1b&expire=964546263&lobale=en-US&kid=130003.300038&ss=env",
      {'Hbc5161168c542333633315dee1182227:key_store_seqid': ['400006'],
       'cuyer': ['r'],
@@ -177,6 +172,60 @@ class CgiTests(unittest.TestCase):
                         self.assertItemsEqual(sd.items(),
                                                 first_second_elts(expect.items()))
 
+    def test_separator(self):
+        parse_semicolon = [
+            ("x=1;y=2.0", {'x': ['1'], 'y': ['2.0']}),
+            ("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
+            (";", ValueError("bad query field: ''")),
+            (";;", ValueError("bad query field: ''")),
+            ("=;a", ValueError("bad query field: 'a'")),
+            (";b=a", ValueError("bad query field: ''")),
+            ("b;=a", ValueError("bad query field: 'b'")),
+            ("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
+            ("a=a+b;a=b+a", {'a': ['a b', 'b a']}),
+        ]
+        for orig, expect in parse_semicolon:
+            env = {'QUERY_STRING': orig}
+            fcd = cgi.FormContentDict(env, separator=';')
+            sd = cgi.SvFormContentDict(env, separator=';')
+            fs = cgi.FieldStorage(environ=env, separator=';')
+            if isinstance(expect, dict):
+                # test dict interface
+                self.assertEqual(len(expect), len(fcd))
+                self.assertItemsEqual(expect.keys(), fcd.keys())
+                self.assertItemsEqual(expect.values(), fcd.values())
+                self.assertItemsEqual(expect.items(), fcd.items())
+                self.assertEqual(fcd.get("nonexistent field", "default"), "default")
+                self.assertEqual(len(sd), len(fs))
+                self.assertItemsEqual(sd.keys(), fs.keys())
+                self.assertEqual(fs.getvalue("nonexistent field", "default"), "default")
+                # test individual fields
+                for key in expect.keys():
+                    expect_val = expect[key]
+                    self.assertTrue(fcd.has_key(key))
+                    self.assertItemsEqual(fcd[key], expect[key])
+                    self.assertEqual(fcd.get(key, "default"), fcd[key])
+                    self.assertTrue(fs.has_key(key))
+                    if len(expect_val) > 1:
+                        single_value = 0
+                    else:
+                        single_value = 1
+                    try:
+                        val = sd[key]
+                    except IndexError:
+                        self.assertFalse(single_value)
+                        self.assertEqual(fs.getvalue(key), expect_val)
+                    else:
+                        self.assertTrue(single_value)
+                        self.assertEqual(val, expect_val[0])
+                        self.assertEqual(fs.getvalue(key), expect_val[0])
+                    self.assertItemsEqual(sd.getlist(key), expect_val)
+                    if single_value:
+                        self.assertItemsEqual(sd.values(),
+                                                first_elts(expect.values()))
+                        self.assertItemsEqual(sd.items(),
+                                                first_second_elts(expect.items()))
+
     def test_weird_formcontentdict(self):
         # Test the weird FormContentDict classes
         env = {'QUERY_STRING': "x=1&y=2.0&z=2-3.%2b0&1=1abc"}
diff --git a/Lib/test/test_urlparse.py b/Lib/test/test_urlparse.py
index 86c4a0595c4..21875bb2991 100644
--- a/Lib/test/test_urlparse.py
+++ b/Lib/test/test_urlparse.py
@@ -3,6 +3,12 @@ import sys
 import unicodedata
 import unittest
 import urlparse
+from test.support import EnvironmentVarGuard
+from warnings import catch_warnings, filterwarnings
+import tempfile
+import contextlib
+import os.path
+import shutil
 
 RFC1808_BASE = "http://a/b/c/d;p?q#f"
 RFC2396_BASE = "http://a/b/c/d;p?q"
@@ -24,16 +30,29 @@ parse_qsl_test_cases = [
     ("&a=b", [('a', 'b')]),
     ("a=a+b&b=b+c", [('a', 'a b'), ('b', 'b c')]),
     ("a=1&a=2", [('a', '1'), ('a', '2')]),
+]
+
+parse_qsl_test_cases_semicolon = [
     (";", []),
     (";;", []),
     (";a=b", [('a', 'b')]),
     ("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
     ("a=1;a=2", [('a', '1'), ('a', '2')]),
-    (b";", []),
-    (b";;", []),
-    (b";a=b", [(b'a', b'b')]),
-    (b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
-    (b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
+]
+
+parse_qsl_test_cases_legacy = [
+    ("a=1;a=2&a=3", [('a', '1'), ('a', '2'), ('a', '3')]),
+    ("a=1;b=2&c=3", [('a', '1'), ('b', '2'), ('c', '3')]),
+    ("a=1&b=2&c=3;", [('a', '1'), ('b', '2'), ('c', '3')]),
+]
+
+parse_qsl_test_cases_warn = [
+    (";a=b", [(';a', 'b')]),
+    ("a=a+b;b=b+c", [('a', 'a b;b=b c')]),
+    (b";a=b", [(b';a', b'b')]),
+    (b"a=a+b;b=b+c", [(b'a', b'a b;b=b c')]),
+    ("a=1;a=2&a=3", [('a', '1;a=2'), ('a', '3')]),
+    (b"a=1;a=2&a=3", [(b'a', b'1;a=2'), (b'a', b'3')]),
 ]
 
 parse_qs_test_cases = [
@@ -57,6 +76,9 @@ parse_qs_test_cases = [
     (b"&a=b", {b'a': [b'b']}),
     (b"a=a+b&b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
     (b"a=1&a=2", {b'a': [b'1', b'2']}),
+]
+
+parse_qs_test_cases_semicolon = [
     (";", {}),
     (";;", {}),
     (";a=b", {'a': ['b']}),
@@ -69,6 +91,24 @@ parse_qs_test_cases = [
     (b"a=1;a=2", {b'a': [b'1', b'2']}),
 ]
 
+parse_qs_test_cases_legacy = [
+    ("a=1;a=2&a=3", {'a': ['1', '2', '3']}),
+    ("a=1;b=2&c=3", {'a': ['1'], 'b': ['2'], 'c': ['3']}),
+    ("a=1&b=2&c=3;", {'a': ['1'], 'b': ['2'], 'c': ['3']}),
+    (b"a=1;a=2&a=3", {b'a': [b'1', b'2', b'3']}),
+    (b"a=1;b=2&c=3", {b'a': [b'1'], b'b': [b'2'], b'c': [b'3']}),
+    (b"a=1&b=2&c=3;", {b'a': [b'1'], b'b': [b'2'], b'c': [b'3']}),
+]
+
+parse_qs_test_cases_warn = [
+    (";a=b", {';a': ['b']}),
+    ("a=a+b;b=b+c", {'a': ['a b;b=b c']}),
+    (b";a=b", {b';a': [b'b']}),
+    (b"a=a+b;b=b+c", {b'a':[ b'a b;b=b c']}),
+    ("a=1;a=2&a=3", {'a': ['1;a=2', '3']}),
+    (b"a=1;a=2&a=3", {b'a': [b'1;a=2', b'3']}),
+]
+
 class UrlParseTestCase(unittest.TestCase):
 
     def checkRoundtrips(self, url, parsed, split):
@@ -141,6 +181,40 @@ class UrlParseTestCase(unittest.TestCase):
             self.assertEqual(result, expect_without_blanks,
                     "Error parsing %r" % orig)
 
+    def test_qs_default_warn(self):
+        for orig, expect in parse_qs_test_cases_warn:
+            with catch_warnings(record=True) as w:
+                filterwarnings(action='always',
+                                        category=urlparse._QueryStringSeparatorWarning)
+                result = urlparse.parse_qs(orig, keep_blank_values=True)
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 1)
+            self.assertEqual(w[0].category, urlparse._QueryStringSeparatorWarning)
+
+    def test_qsl_default_warn(self):
+        for orig, expect in parse_qsl_test_cases_warn:
+            with catch_warnings(record=True) as w:
+                filterwarnings(action='always',
+                               category=urlparse._QueryStringSeparatorWarning)
+                result = urlparse.parse_qsl(orig, keep_blank_values=True)
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 1)
+            self.assertEqual(w[0].category, urlparse._QueryStringSeparatorWarning)
+
+    def test_default_qs_no_warnings(self):
+        for orig, expect in parse_qs_test_cases:
+            with catch_warnings(record=True) as w:
+                result = urlparse.parse_qs(orig, keep_blank_values=True)
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_default_qsl_no_warnings(self):
+        for orig, expect in parse_qsl_test_cases:
+            with catch_warnings(record=True) as w:
+                result = urlparse.parse_qsl(orig, keep_blank_values=True)
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
     def test_roundtrips(self):
         testcases = [
             ('file:///tmp/junk.txt',
@@ -626,6 +700,132 @@ class UrlParseTestCase(unittest.TestCase):
         self.assertEqual(urlparse.urlparse("http://www.python.org:80"),
                 ('http','www.python.org:80','','','',''))
 
+    def test_parse_qs_separator_bytes(self):
+        expected = {b'a': [b'1'], b'b': [b'2']}
+
+        result = urlparse.parse_qs(b'a=1;b=2', separator=b';')
+        self.assertEqual(result, expected)
+        result = urlparse.parse_qs(b'a=1;b=2', separator=';')
+        self.assertEqual(result, expected)
+        result = urlparse.parse_qs('a=1;b=2', separator=';')
+        self.assertEqual(result, {'a': ['1'], 'b': ['2']})
+
+    @contextlib.contextmanager
+    def _qsl_sep_config(self, sep):
+        """Context for the given parse_qsl default separator configured in config file"""
+        old_filename = urlparse._QS_SEPARATOR_CONFIG_FILENAME
+        urlparse._default_qs_separator = None
+        try:
+            tmpdirname = tempfile.mkdtemp()
+            filename = os.path.join(tmpdirname, 'conf.cfg')
+            with open(filename, 'w') as file:
+                file.write('[parse_qs]\n')
+                file.write('PYTHON_URLLIB_QS_SEPARATOR = {}'.format(sep))
+            urlparse._QS_SEPARATOR_CONFIG_FILENAME = filename
+            yield
+        finally:
+            urlparse._QS_SEPARATOR_CONFIG_FILENAME = old_filename
+            urlparse._default_qs_separator = None
+            shutil.rmtree(tmpdirname)
+
+    def test_parse_qs_separator_semicolon(self):
+        for orig, expect in parse_qs_test_cases_semicolon:
+            result = urlparse.parse_qs(orig, separator=';')
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = ';'
+                result = urlparse.parse_qs(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+            with self._qsl_sep_config(';'), catch_warnings(record=True) as w:
+                result = urlparse.parse_qs(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_parse_qsl_separator_semicolon(self):
+        for orig, expect in parse_qsl_test_cases_semicolon:
+            result = urlparse.parse_qsl(orig, separator=';')
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = ';'
+                result = urlparse.parse_qsl(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+            with self._qsl_sep_config(';'), catch_warnings(record=True) as w:
+                result = urlparse.parse_qsl(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_parse_qs_separator_legacy(self):
+        for orig, expect in parse_qs_test_cases_legacy:
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = 'legacy'
+                result = urlparse.parse_qs(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+            with self._qsl_sep_config('legacy'), catch_warnings(record=True) as w:
+                result = urlparse.parse_qs(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_parse_qsl_separator_legacy(self):
+        for orig, expect in parse_qsl_test_cases_legacy:
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = 'legacy'
+                result = urlparse.parse_qsl(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+            with self._qsl_sep_config('legacy'), catch_warnings(record=True) as w:
+                result = urlparse.parse_qsl(orig)
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
+            self.assertEqual(len(w), 0)
+
+    def test_parse_qs_separator_bad_value_env_or_config(self):
+        for bad_sep in '', 'abc', 'safe', '&;', 'SEP':
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = bad_sep
+                with self.assertRaises(ValueError):
+                    urlparse.parse_qsl('a=1;b=2')
+            with self._qsl_sep_config('bad_sep'), catch_warnings(record=True) as w:
+                with self.assertRaises(ValueError):
+                    urlparse.parse_qsl('a=1;b=2')
+
+    def test_parse_qs_separator_bad_value_arg(self):
+        for bad_sep in True, {}, '':
+            with self.assertRaises(ValueError):
+                urlparse.parse_qsl('a=1;b=2', separator=bad_sep)
+
+    def test_parse_qs_separator_num_fields(self):
+        for qs, sep in (
+            ('a&b&c', '&'),
+            ('a;b;c', ';'),
+            ('a&b;c', 'legacy'),
+        ):
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
+                if sep != 'legacy':
+                    with self.assertRaises(ValueError):
+                        urlparse.parse_qsl(qs, separator=sep, max_num_fields=2)
+                if sep:
+                    environ['PYTHON_URLLIB_QS_SEPARATOR'] = sep
+                with self.assertRaises(ValueError):
+                    urlparse.parse_qsl(qs, max_num_fields=2)
+
+    def test_parse_qs_separator_priority(self):
+        # env variable trumps config file
+        with self._qsl_sep_config('~'), EnvironmentVarGuard() as environ:
+            environ['PYTHON_URLLIB_QS_SEPARATOR'] = '!'
+            result = urlparse.parse_qs('a=1!b=2~c=3')
+            self.assertEqual(result, {'a': ['1'], 'b': ['2~c=3']})
+        # argument trumps config file
+        with self._qsl_sep_config('~'):
+            result = urlparse.parse_qs('a=1$b=2~c=3', separator='$')
+            self.assertEqual(result, {'a': ['1'], 'b': ['2~c=3']})
+        # argument trumps env variable
+        with EnvironmentVarGuard() as environ:
+            environ['PYTHON_URLLIB_QS_SEPARATOR'] = '~'
+            result = urlparse.parse_qs('a=1$b=2~c=3', separator='$')
+            self.assertEqual(result, {'a': ['1'], 'b': ['2~c=3']})
+
     def test_urlsplit_normalization(self):
         # Certain characters should never occur in the netloc,
         # including under normalization.
diff --git a/Lib/urlparse.py b/Lib/urlparse.py
index 798b467b605..69504d8fd93 100644
--- a/Lib/urlparse.py
+++ b/Lib/urlparse.py
@@ -29,6 +29,7 @@ test_urlparse.py provides a good indicator of parsing behavior.
 """
 
 import re
+import os
 
 __all__ = ["urlparse", "urlunparse", "urljoin", "urldefrag",
            "urlsplit", "urlunsplit", "parse_qs", "parse_qsl"]
@@ -382,7 +383,8 @@ def unquote(s):
             append(item)
     return ''.join(res)
 
-def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
+def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None,
+             separator=None):
     """Parse a query given as a string argument.
 
         Arguments:
@@ -405,14 +407,23 @@ def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
     """
     dict = {}
     for name, value in parse_qsl(qs, keep_blank_values, strict_parsing,
-                                 max_num_fields):
468aad
+                                 max_num_fields, separator):
468aad
         if name in dict:
468aad
             dict[name].append(value)
468aad
         else:
468aad
             dict[name] = [value]
468aad
     return dict
468aad
 
468aad
-def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
468aad
+class _QueryStringSeparatorWarning(RuntimeWarning):
468aad
+    """Warning for using default `separator` in parse_qs or parse_qsl"""
468aad
+
468aad
+# The default "separator" for parse_qsl can be specified in a config file.
468aad
+# It's cached after first read.
468aad
+_QS_SEPARATOR_CONFIG_FILENAME = '/etc/python/urllib.cfg'
468aad
+_default_qs_separator = None
468aad
+
468aad
+def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None,
468aad
+              separator=None):
468aad
     """Parse a query given as a string argument.
468aad
 
468aad
     Arguments:
468aad
@@ -434,15 +445,72 @@ def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
 
     Returns a list, as G-d intended.
     """
+
+    if (not separator or (not isinstance(separator, (str, bytes)))) and separator is not None:
+        raise ValueError("Separator must be of type string or bytes.")
+
+    # Used when both "&" and ";" act as separators. (Need a non-string value.)
+    _legacy = object()
+
+    if separator is None:
+        global _default_qs_separator
+        separator = _default_qs_separator
+        envvar_name = 'PYTHON_URLLIB_QS_SEPARATOR'
+        if separator is None:
+            # Set default separator from environment variable
+            separator = os.environ.get(envvar_name)
+            config_source = 'environment variable'
+        if separator is None:
+            # Set default separator from the configuration file
+            try:
+                file = open(_QS_SEPARATOR_CONFIG_FILENAME)
+            except EnvironmentError:
+                pass
+            else:
+                with file:
+                    import ConfigParser
+                    config = ConfigParser.ConfigParser()
+                    config.readfp(file)
+                    separator = config.get('parse_qs', envvar_name)
+                    _default_qs_separator = separator
+                config_source = _QS_SEPARATOR_CONFIG_FILENAME
+        if separator is None:
+            # The default is '&', but warn if not specified explicitly
+            if ';' in qs:
+                from warnings import warn
+                warn("The default separator of urlparse.parse_qsl and "
+                    + "parse_qs was changed to '&' to avoid a web cache "
+                    + "poisoning issue (CVE-2021-23336). "
+                    + "By default, semicolons no longer act as query field "
+                    + "separators. "
+                    + "See https://access.redhat.com/articles/5860431 for "
+                    + "more details.",
+                    _QueryStringSeparatorWarning, stacklevel=2)
+            separator = '&'
+        elif separator == 'legacy':
+            separator = _legacy
+        elif len(separator) != 1:
+            raise ValueError(
+                '{} (from {}) must contain '.format(envvar_name, config_source)
+                + '1 character, or "legacy". See '
+                + 'https://access.redhat.com/articles/5860431 for more details.'
+            )
+
     # If max_num_fields is defined then check that the number of fields
     # is less than max_num_fields. This prevents a memory exhaustion DOS
     # attack via post bodies with many fields.
     if max_num_fields is not None:
-        num_fields = 1 + qs.count('&') + qs.count(';')
+        if separator is _legacy:
+            num_fields = 1 + qs.count('&') + qs.count(';')
+        else:
+            num_fields = 1 + qs.count(separator)
         if max_num_fields < num_fields:
             raise ValueError('Max number of fields exceeded')
 
-    pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
+    if separator is _legacy:
+        pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
+    else:
+        pairs = [s1 for s1 in qs.split(separator)]
     r = []
     for name_value in pairs:
         if not name_value and not strict_parsing:
-- 
2.30.2
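The last hunk makes both the `max_num_fields` guard and the actual splitting depend on the resolved separator, with a non-string sentinel selecting the old "split on both `&` and `;`" behaviour. The following is a minimal standalone sketch of that logic, assuming the separator has already been resolved. The names `LEGACY` and `split_pairs()` are hypothetical; the patch uses a local sentinel called `_legacy` inside `parse_qsl()`.

```python
# LEGACY and split_pairs() are hypothetical names; the patch uses a local
# sentinel called _legacy inside parse_qsl().
LEGACY = object()


def split_pairs(qs, separator, max_num_fields=None):
    """Split qs into 'name=value' chunks the way the patched parse_qsl does."""
    if max_num_fields is not None:
        # Count fields with the same rule used for splitting, so the
        # memory-exhaustion guard agrees with what is actually parsed.
        if separator is LEGACY:
            num_fields = 1 + qs.count('&') + qs.count(';')
        else:
            num_fields = 1 + qs.count(separator)
        if max_num_fields < num_fields:
            raise ValueError('Max number of fields exceeded')
    if separator is LEGACY:
        # Legacy mode: both '&' and ';' separate fields.
        return [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
    return qs.split(separator)
```

With `'&'`, the query `a=1&b=2;c=3` yields two chunks (`b=2;c=3` stays one field), while legacy mode yields three. A proxy and an application that disagree on this can end up with different views of the same query, which is the kind of mismatch the commit message's advice to "match the application or proxy in use" is meant to avoid. The sentinel is an `object()` rather than a string because an explicit `separator` argument is used verbatim, so any string, including `'legacy'`, could legitimately be a caller's separator.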