Blame SOURCES/00359-CVE-2021-23336.patch

88175c
From 976a4010aa4e450855dce5fa4c865bcbdc86cccd Mon Sep 17 00:00:00 2001
88175c
From: Charalampos Stratakis <cstratak@redhat.com>
88175c
Date: Fri, 16 Apr 2021 18:02:00 +0200
88175c
Subject: [PATCH] CVE-2021-23336: Add `separator` argument to parse_qs; warn
88175c
 with default
88175c
MIME-Version: 1.0
88175c
Content-Type: text/plain; charset=UTF-8
88175c
Content-Transfer-Encoding: 8bit
88175c
88175c
Partially backports https://bugs.python.org/issue42967 : [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl().
88175c
88175c
Backported from the python3 branch.
88175c
However, this solution is different than the upstream solution in Python 3.
88175c
88175c
Based on the downstream solution for python 3.6.13 by Petr Viktorin.
88175c
88175c
An optional argument separator is added to specify the separator.
88175c
It is recommended to set it to '&' or ';' to match the application or proxy in use.
88175c
The default can be set with an env variable or a config file.
88175c
If neither the argument, the env var nor the config file specifies a separator, "&" is used
88175c
but a warning is raised if parse_qs is used on input that contains ';'.
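
As a rough illustration of the behaviour described above (not the code from this patch), the following Python sketch resolves the separator in the order argument, environment variable, config file, and then splits the query string. The helper names resolve_separator and split_fields and the LEGACY marker are hypothetical and exist only for this example; the real change lives in urlparse.parse_qsl.

```python
import warnings

# Hypothetical marker for this sketch: split on both '&' and ';'.
LEGACY = "legacy"


def resolve_separator(argument=None, env=None, config=None):
    """Pick the separator: argument first, then env var, then config file.

    Returns None when nothing is configured, so the caller can fall back
    to '&' and warn if needed.
    """
    for candidate in (argument, env, config):
        if candidate is not None:
            return candidate
    return None


def split_fields(qs, separator=None):
    """Split a query string into raw 'name=value' fields."""
    if separator is None:
        # Nothing configured: use '&', but warn when ';' is present,
        # because older versions would have split on it as well.
        if ";" in qs:
            warnings.warn("';' is not treated as a separator by default",
                          RuntimeWarning, stacklevel=2)
        separator = "&"
    if separator == LEGACY:
        # Old behaviour: both '&' and ';' separate fields.
        return [field for part in qs.split("&") for field in part.split(";")]
    return qs.split(separator)
```

With no separator configured, split_fields("a=1;a=2&a=3") keeps "a=1;a=2" as a single field and emits one warning, which mirrors the "warn" test cases added to test_urlparse.py in this patch.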
88175c
88175c
Co-authors of the downstream change:
88175c
Co-authored-by: Petr Viktorin <pviktori@redhat.com>
88175c
Co-authors of the upstream change (who do not necessarily agree with this):
88175c
Co-authored-by: Adam Goldschmidt <adamgold7@gmail.com>
88175c
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
88175c
Co-authored-by: Éric Araujo <merwok@netwok.org>
88175c
---
88175c
 Doc/library/cgi.rst       |   5 +-
88175c
 Doc/library/urlparse.rst  |  15 ++-
88175c
 Lib/cgi.py                |  34 +++---
88175c
 Lib/test/test_cgi.py      |  59 ++++++++++-
88175c
 Lib/test/test_urlparse.py | 210 +++++++++++++++++++++++++++++++++++++-
88175c
 Lib/urlparse.py           |  78 +++++++++++++-
88175c
 6 files changed, 369 insertions(+), 32 deletions(-)
88175c
88175c
diff --git a/Doc/library/cgi.rst b/Doc/library/cgi.rst
88175c
index ecd62c8c019..a96cd38717b 100644
88175c
--- a/Doc/library/cgi.rst
88175c
+++ b/Doc/library/cgi.rst
88175c
@@ -285,10 +285,10 @@ These are useful if you want more control, or if you want to employ some of the
88175c
 algorithms implemented in this module in other circumstances.
88175c
 
88175c
 
88175c
-.. function:: parse(fp[, environ[, keep_blank_values[, strict_parsing]]])
88175c
+.. function:: parse(fp[, environ[, keep_blank_values[, strict_parsing[, separator]]]])
88175c
 
88175c
    Parse a query in the environment or from a file (the file defaults to
88175c
-   ``sys.stdin`` and environment defaults to ``os.environ``).  The *keep_blank_values* and *strict_parsing* parameters are
88175c
+   ``sys.stdin`` and environment defaults to ``os.environ``).  The *keep_blank_values*, *strict_parsing* and *separator* parameters are
88175c
    passed to :func:`urlparse.parse_qs` unchanged.
88175c
 
88175c
 
88175c
@@ -316,7 +316,6 @@ algorithms implemented in this module in other circumstances.
88175c
    Note that this does not parse nested multipart parts --- use
88175c
    :class:`FieldStorage` for that.
88175c
 
88175c
-
88175c
 .. function:: parse_header(string)
88175c
 
88175c
    Parse a MIME header (such as :mailheader:`Content-Type`) into a main value and a
88175c
diff --git a/Doc/library/urlparse.rst b/Doc/library/urlparse.rst
88175c
index 0989c88c302..97d1119257c 100644
88175c
--- a/Doc/library/urlparse.rst
88175c
+++ b/Doc/library/urlparse.rst
88175c
@@ -136,7 +136,7 @@ The :mod:`urlparse` module defines the following functions:
88175c
       now raise :exc:`ValueError`.
88175c
 
88175c
 
88175c
-.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing[, max_num_fields]]])
88175c
+.. function:: parse_qs(qs[, keep_blank_values[, strict_parsing[, max_num_fields[, separator]]]])
88175c
 
88175c
    Parse a query string given as a string argument (data of type
88175c
    :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a
88175c
@@ -157,6 +157,15 @@ The :mod:`urlparse` module defines the following functions:
88175c
    read. If set, then throws a :exc:`ValueError` if there are more than
88175c
    *max_num_fields* fields read.
88175c
 
88175c
+   The optional argument *separator* is the symbol to use for separating the
88175c
+   query arguments. It is recommended to set it to ``'&'`` or ``';'``.
88175c
+   It defaults to ``'&'``; a warning is raised if this default is used on input that contains ``';'``.
88175c
+   This default may be changed with the following environment variable settings:
88175c
+
88175c
+   - ``PYTHON_URLLIB_QS_SEPARATOR='&'``: use only ``&`` as separator, without warning (as in Python 3.6.13+ or 3.10)
88175c
+   - ``PYTHON_URLLIB_QS_SEPARATOR=';'``: use only ``;`` as separator
88175c
+   - ``PYTHON_URLLIB_QS_SEPARATOR=legacy``: use both ``&`` and ``;`` (as in previous versions of Python)
88175c
+
88175c
    Use the :func:`urllib.urlencode` function to convert such dictionaries into
88175c
    query strings.
88175c
 
88175c
@@ -186,6 +195,9 @@ The :mod:`urlparse` module defines the following functions:
88175c
    read. If set, then throws a :exc:`ValueError` if there are more than
88175c
    *max_num_fields* fields read.
88175c
 
88175c
+   The optional argument *separator* is the symbol to use for separating the
88175c
+   query arguments. It works as in :py:func:`parse_qs`.
88175c
+
88175c
    Use the :func:`urllib.urlencode` function to convert such lists of pairs into
88175c
    query strings.
88175c
 
88175c
@@ -195,6 +207,7 @@ The :mod:`urlparse` module defines the following functions:
88175c
    .. versionchanged:: 2.7.16
88175c
       Added *max_num_fields* parameter.
88175c
 
88175c
+
88175c
 .. function:: urlunparse(parts)
88175c
 
88175c
    Construct a URL from a tuple as returned by ``urlparse()``. The *parts* argument
88175c
diff --git a/Lib/cgi.py b/Lib/cgi.py
88175c
index 5b903e03477..1421f2d90e0 100755
88175c
--- a/Lib/cgi.py
88175c
+++ b/Lib/cgi.py
88175c
@@ -121,7 +121,8 @@ log = initlog           # The current logging function
88175c
 # 0 ==> unlimited input
88175c
 maxlen = 0
88175c
 
88175c
-def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
88175c
+def parse(fp=None, environ=os.environ, keep_blank_values=0,
88175c
+          strict_parsing=0, separator=None):
88175c
     """Parse a query in the environment or from a file (default stdin)
88175c
 
88175c
         Arguments, all optional:
88175c
@@ -140,6 +141,8 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
88175c
         strict_parsing: flag indicating what to do with parsing errors.
88175c
             If false (the default), errors are silently ignored.
88175c
             If true, errors raise a ValueError exception.
88175c
+
88175c
+        separator: str. The symbol to use for separating the query arguments.
88175c
     """
88175c
     if fp is None:
88175c
         fp = sys.stdin
88175c
@@ -171,25 +174,26 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
88175c
         else:
88175c
             qs = ""
88175c
         environ['QUERY_STRING'] = qs    # XXX Shouldn't, really
88175c
-    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing)
88175c
+    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing, separator=separator)
88175c
 
88175c
 
88175c
 # parse query string function called from urlparse,
88175c
 # this is done in order to maintain backward compatibility.
88175c
 
88175c
-def parse_qs(qs, keep_blank_values=0, strict_parsing=0):
88175c
+def parse_qs(qs, keep_blank_values=0, strict_parsing=0, separator=None):
88175c
     """Parse a query given as a string argument."""
88175c
     warn("cgi.parse_qs is deprecated, use urlparse.parse_qs instead",
88175c
          PendingDeprecationWarning, 2)
88175c
-    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing)
88175c
+    return urlparse.parse_qs(qs, keep_blank_values, strict_parsing,
88175c
+                             separator=separator)
88175c
 
88175c
 
88175c
-def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
88175c
+def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None, separator=None):
88175c
     """Parse a query given as a string argument."""
88175c
     warn("cgi.parse_qsl is deprecated, use urlparse.parse_qsl instead",
88175c
          PendingDeprecationWarning, 2)
88175c
     return urlparse.parse_qsl(qs, keep_blank_values, strict_parsing,
88175c
-                              max_num_fields)
88175c
+                              max_num_fields, separator=separator)
88175c
 
88175c
 def parse_multipart(fp, pdict):
88175c
     """Parse multipart input.
88175c
@@ -288,7 +292,6 @@ def parse_multipart(fp, pdict):
88175c
 
88175c
     return partdict
88175c
 
88175c
-
88175c
 def _parseparam(s):
88175c
     while s[:1] == ';':
88175c
         s = s[1:]
88175c
@@ -395,7 +398,7 @@ class FieldStorage:
88175c
 
88175c
     def __init__(self, fp=None, headers=None, outerboundary="",
88175c
                  environ=os.environ, keep_blank_values=0, strict_parsing=0,
88175c
-                 max_num_fields=None):
88175c
+                 max_num_fields=None, separator=None):
88175c
         """Constructor.  Read multipart/* until last part.
88175c
 
88175c
         Arguments, all optional:
88175c
@@ -430,6 +433,7 @@ class FieldStorage:
88175c
         self.keep_blank_values = keep_blank_values
88175c
         self.strict_parsing = strict_parsing
88175c
         self.max_num_fields = max_num_fields
88175c
+        self.separator = separator
88175c
         if 'REQUEST_METHOD' in environ:
88175c
             method = environ['REQUEST_METHOD'].upper()
88175c
         self.qs_on_post = None
88175c
@@ -613,7 +617,8 @@ class FieldStorage:
88175c
         if self.qs_on_post:
88175c
             qs += '&' + self.qs_on_post
88175c
         query = urlparse.parse_qsl(qs, self.keep_blank_values,
88175c
-                                   self.strict_parsing, self.max_num_fields)
88175c
+                                   self.strict_parsing, self.max_num_fields,
88175c
+                                   self.separator)
88175c
         self.list = [MiniFieldStorage(key, value) for key, value in query]
88175c
         self.skip_lines()
88175c
 
88175c
@@ -629,7 +634,8 @@ class FieldStorage:
88175c
             query = urlparse.parse_qsl(self.qs_on_post,
88175c
                                        self.keep_blank_values,
88175c
                                        self.strict_parsing,
88175c
-                                       self.max_num_fields)
88175c
+                                       self.max_num_fields,
88175c
+                                       self.separator)
88175c
             self.list.extend(MiniFieldStorage(key, value)
88175c
                              for key, value in query)
88175c
             FieldStorageClass = None
88175c
@@ -649,7 +655,8 @@ class FieldStorage:
88175c
             headers = rfc822.Message(self.fp)
88175c
             part = klass(self.fp, headers, ib,
88175c
                          environ, keep_blank_values, strict_parsing,
88175c
-                         max_num_fields)
88175c
+                         max_num_fields,
88175c
+                         separator=self.separator)
88175c
 
88175c
             if max_num_fields is not None:
88175c
                 max_num_fields -= 1
88175c
@@ -817,10 +824,11 @@ class FormContentDict(UserDict.UserDict):
88175c
     form.dict == {key: [val, val, ...], ...}
88175c
 
88175c
     """
88175c
-    def __init__(self, environ=os.environ, keep_blank_values=0, strict_parsing=0):
88175c
+    def __init__(self, environ=os.environ, keep_blank_values=0, strict_parsing=0, separator=None):
88175c
         self.dict = self.data = parse(environ=environ,
88175c
                                       keep_blank_values=keep_blank_values,
88175c
-                                      strict_parsing=strict_parsing)
88175c
+                                      strict_parsing=strict_parsing,
88175c
+                                      separator=separator)
88175c
         self.query_string = environ['QUERY_STRING']
88175c
 
88175c
 
88175c
diff --git a/Lib/test/test_cgi.py b/Lib/test/test_cgi.py
88175c
index 743c2afbd4c..9956ea9d4e8 100644
88175c
--- a/Lib/test/test_cgi.py
88175c
+++ b/Lib/test/test_cgi.py
88175c
@@ -61,12 +61,9 @@ parse_strict_test_cases = [
88175c
     ("", ValueError("bad query field: ''")),
88175c
     ("&", ValueError("bad query field: ''")),
88175c
     ("&&", ValueError("bad query field: ''")),
88175c
-    (";", ValueError("bad query field: ''")),
88175c
-    (";&;", ValueError("bad query field: ''")),
88175c
     # Should the next few really be valid?
88175c
     ("=", {}),
88175c
     ("=&=", {}),
88175c
-    ("=;=", {}),
88175c
     # This rest seem to make sense
88175c
     ("=a", {'': ['a']}),
88175c
     ("&=a", ValueError("bad query field: ''")),
88175c
@@ -81,8 +78,6 @@ parse_strict_test_cases = [
88175c
     ("a=a+b&b=b+c", {'a': ['a b'], 'b': ['b c']}),
88175c
     ("a=a+b&a=b+a", {'a': ['a b', 'b a']}),
88175c
     ("x=1&y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
88175c
-    ("x=1;y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
88175c
-    ("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
88175c
     ("Hbc5161168c542333633315dee1182227:key_store_seqid=400006&cuyer=r&view=bustomer&order_id=0bb2e248638833d48cb7fed300000f1b&expire=964546263&lobale=en-US&kid=130003.300038&ss=env",
88175c
      {'Hbc5161168c542333633315dee1182227:key_store_seqid': ['400006'],
88175c
       'cuyer': ['r'],
88175c
@@ -177,6 +172,60 @@ class CgiTests(unittest.TestCase):
88175c
                         self.assertItemsEqual(sd.items(),
88175c
                                                 first_second_elts(expect.items()))
88175c
 
88175c
+    def test_separator(self):
88175c
+        parse_semicolon = [
88175c
+            ("x=1;y=2.0", {'x': ['1'], 'y': ['2.0']}),
88175c
+            ("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
88175c
+            (";", ValueError("bad query field: ''")),
88175c
+            (";;", ValueError("bad query field: ''")),
88175c
+            ("=;a", ValueError("bad query field: 'a'")),
88175c
+            (";b=a", ValueError("bad query field: ''")),
88175c
+            ("b;=a", ValueError("bad query field: 'b'")),
88175c
+            ("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
88175c
+            ("a=a+b;a=b+a", {'a': ['a b', 'b a']}),
88175c
+        ]
88175c
+        for orig, expect in parse_semicolon:
88175c
+            env = {'QUERY_STRING': orig}
88175c
+            fcd = cgi.FormContentDict(env, separator=';')
88175c
+            sd = cgi.SvFormContentDict(env, separator=';')
88175c
+            fs = cgi.FieldStorage(environ=env, separator=';')
88175c
+            if isinstance(expect, dict):
88175c
+                # test dict interface
88175c
+                self.assertEqual(len(expect), len(fcd))
88175c
+                self.assertItemsEqual(expect.keys(), fcd.keys())
88175c
+                self.assertItemsEqual(expect.values(), fcd.values())
88175c
+                self.assertItemsEqual(expect.items(), fcd.items())
88175c
+                self.assertEqual(fcd.get("nonexistent field", "default"), "default")
88175c
+                self.assertEqual(len(sd), len(fs))
88175c
+                self.assertItemsEqual(sd.keys(), fs.keys())
88175c
+                self.assertEqual(fs.getvalue("nonexistent field", "default"), "default")
88175c
+                # test individual fields
88175c
+                for key in expect.keys():
88175c
+                    expect_val = expect[key]
88175c
+                    self.assertTrue(fcd.has_key(key))
88175c
+                    self.assertItemsEqual(fcd[key], expect[key])
88175c
+                    self.assertEqual(fcd.get(key, "default"), fcd[key])
88175c
+                    self.assertTrue(fs.has_key(key))
88175c
+                    if len(expect_val) > 1:
88175c
+                        single_value = 0
88175c
+                    else:
88175c
+                        single_value = 1
88175c
+                    try:
88175c
+                        val = sd[key]
88175c
+                    except IndexError:
88175c
+                        self.assertFalse(single_value)
88175c
+                        self.assertEqual(fs.getvalue(key), expect_val)
88175c
+                    else:
88175c
+                        self.assertTrue(single_value)
88175c
+                        self.assertEqual(val, expect_val[0])
88175c
+                        self.assertEqual(fs.getvalue(key), expect_val[0])
88175c
+                    self.assertItemsEqual(sd.getlist(key), expect_val)
88175c
+                    if single_value:
88175c
+                        self.assertItemsEqual(sd.values(),
88175c
+                                                first_elts(expect.values()))
88175c
+                        self.assertItemsEqual(sd.items(),
88175c
+                                                first_second_elts(expect.items()))
88175c
+
88175c
     def test_weird_formcontentdict(self):
88175c
         # Test the weird FormContentDict classes
88175c
         env = {'QUERY_STRING': "x=1&y=2.0&z=2-3.%2b0&1=1abc"}
88175c
diff --git a/Lib/test/test_urlparse.py b/Lib/test/test_urlparse.py
88175c
index 86c4a0595c4..21875bb2991 100644
88175c
--- a/Lib/test/test_urlparse.py
88175c
+++ b/Lib/test/test_urlparse.py
88175c
@@ -3,6 +3,12 @@ import sys
88175c
 import unicodedata
88175c
 import unittest
88175c
 import urlparse
88175c
+from test.support import EnvironmentVarGuard
88175c
+from warnings import catch_warnings, filterwarnings
88175c
+import tempfile
88175c
+import contextlib
88175c
+import os.path
88175c
+import shutil
88175c
 
88175c
 RFC1808_BASE = "http://a/b/c/d;p?q#f"
88175c
 RFC2396_BASE = "http://a/b/c/d;p?q"
88175c
@@ -24,16 +30,29 @@ parse_qsl_test_cases = [
88175c
     ("&a=b", [('a', 'b')]),
88175c
     ("a=a+b&b=b+c", [('a', 'a b'), ('b', 'b c')]),
88175c
     ("a=1&a=2", [('a', '1'), ('a', '2')]),
88175c
+]
88175c
+
88175c
+parse_qsl_test_cases_semicolon = [
88175c
     (";", []),
88175c
     (";;", []),
88175c
     (";a=b", [('a', 'b')]),
88175c
     ("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
88175c
     ("a=1;a=2", [('a', '1'), ('a', '2')]),
88175c
-    (b";", []),
88175c
-    (b";;", []),
88175c
-    (b";a=b", [(b'a', b'b')]),
88175c
-    (b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
88175c
-    (b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
88175c
+]
88175c
+
88175c
+parse_qsl_test_cases_legacy = [
88175c
+    ("a=1;a=2&a=3", [('a', '1'), ('a', '2'), ('a', '3')]),
88175c
+    ("a=1;b=2&c=3", [('a', '1'), ('b', '2'), ('c', '3')]),
88175c
+    ("a=1&b=2&c=3;", [('a', '1'), ('b', '2'), ('c', '3')]),
88175c
+]
88175c
+
88175c
+parse_qsl_test_cases_warn = [
88175c
+    (";a=b", [(';a', 'b')]),
88175c
+    ("a=a+b;b=b+c", [('a', 'a b;b=b c')]),
88175c
+    (b";a=b", [(b';a', b'b')]),
88175c
+    (b"a=a+b;b=b+c", [(b'a', b'a b;b=b c')]),
88175c
+    ("a=1;a=2&a=3", [('a', '1;a=2'), ('a', '3')]),
88175c
+    (b"a=1;a=2&a=3", [(b'a', b'1;a=2'), (b'a', b'3')]),
88175c
 ]
88175c
 
88175c
 parse_qs_test_cases = [
88175c
@@ -57,6 +76,9 @@ parse_qs_test_cases = [
88175c
     (b"&a=b", {b'a': [b'b']}),
88175c
     (b"a=a+b&b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
88175c
     (b"a=1&a=2", {b'a': [b'1', b'2']}),
88175c
+]
88175c
+
88175c
+parse_qs_test_cases_semicolon = [
88175c
     (";", {}),
88175c
     (";;", {}),
88175c
     (";a=b", {'a': ['b']}),
88175c
@@ -69,6 +91,24 @@ parse_qs_test_cases = [
88175c
     (b"a=1;a=2", {b'a': [b'1', b'2']}),
88175c
 ]
88175c
 
88175c
+parse_qs_test_cases_legacy = [
88175c
+    ("a=1;a=2&a=3", {'a': ['1', '2', '3']}),
88175c
+    ("a=1;b=2&c=3", {'a': ['1'], 'b': ['2'], 'c': ['3']}),
88175c
+    ("a=1&b=2&c=3;", {'a': ['1'], 'b': ['2'], 'c': ['3']}),
88175c
+    (b"a=1;a=2&a=3", {b'a': [b'1', b'2', b'3']}),
88175c
+    (b"a=1;b=2&c=3", {b'a': [b'1'], b'b': [b'2'], b'c': [b'3']}),
88175c
+    (b"a=1&b=2&c=3;", {b'a': [b'1'], b'b': [b'2'], b'c': [b'3']}),
88175c
+]
88175c
+
88175c
+parse_qs_test_cases_warn = [
88175c
+    (";a=b", {';a': ['b']}),
88175c
+    ("a=a+b;b=b+c", {'a': ['a b;b=b c']}),
88175c
+    (b";a=b", {b';a': [b'b']}),
88175c
+    (b"a=a+b;b=b+c", {b'a': [b'a b;b=b c']}),
88175c
+    ("a=1;a=2&a=3", {'a': ['1;a=2', '3']}),
88175c
+    (b"a=1;a=2&a=3", {b'a': [b'1;a=2', b'3']}),
88175c
+]
88175c
+
88175c
 class UrlParseTestCase(unittest.TestCase):
88175c
 
88175c
     def checkRoundtrips(self, url, parsed, split):
88175c
@@ -141,6 +181,40 @@ class UrlParseTestCase(unittest.TestCase):
88175c
             self.assertEqual(result, expect_without_blanks,
88175c
                     "Error parsing %r" % orig)
88175c
 
88175c
+    def test_qs_default_warn(self):
88175c
+        for orig, expect in parse_qs_test_cases_warn:
88175c
+            with catch_warnings(record=True) as w:
88175c
+                filterwarnings(action='always',
88175c
+                                        category=urlparse._QueryStringSeparatorWarning)
88175c
+                result = urlparse.parse_qs(orig, keep_blank_values=True)
88175c
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            self.assertEqual(len(w), 1)
88175c
+            self.assertEqual(w[0].category, urlparse._QueryStringSeparatorWarning)
88175c
+
88175c
+    def test_qsl_default_warn(self):
88175c
+        for orig, expect in parse_qsl_test_cases_warn:
88175c
+            with catch_warnings(record=True) as w:
88175c
+                filterwarnings(action='always',
88175c
+                               category=urlparse._QueryStringSeparatorWarning)
88175c
+                result = urlparse.parse_qsl(orig, keep_blank_values=True)
88175c
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            self.assertEqual(len(w), 1)
88175c
+            self.assertEqual(w[0].category, urlparse._QueryStringSeparatorWarning)
88175c
+
88175c
+    def test_default_qs_no_warnings(self):
88175c
+        for orig, expect in parse_qs_test_cases:
88175c
+            with catch_warnings(record=True) as w:
88175c
+                result = urlparse.parse_qs(orig, keep_blank_values=True)
88175c
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            self.assertEqual(len(w), 0)
88175c
+
88175c
+    def test_default_qsl_no_warnings(self):
88175c
+        for orig, expect in parse_qsl_test_cases:
88175c
+            with catch_warnings(record=True) as w:
88175c
+                result = urlparse.parse_qsl(orig, keep_blank_values=True)
88175c
+                self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            self.assertEqual(len(w), 0)
88175c
+
88175c
     def test_roundtrips(self):
88175c
         testcases = [
88175c
             ('file:///tmp/junk.txt',
88175c
@@ -626,6 +700,132 @@ class UrlParseTestCase(unittest.TestCase):
88175c
         self.assertEqual(urlparse.urlparse("http://www.python.org:80"),
88175c
                 ('http','www.python.org:80','','','',''))
88175c
 
88175c
+    def test_parse_qs_separator_bytes(self):
88175c
+        expected = {b'a': [b'1'], b'b': [b'2']}
88175c
+
88175c
+        result = urlparse.parse_qs(b'a=1;b=2', separator=b';')
88175c
+        self.assertEqual(result, expected)
88175c
+        result = urlparse.parse_qs(b'a=1;b=2', separator=';')
88175c
+        self.assertEqual(result, expected)
88175c
+        result = urlparse.parse_qs('a=1;b=2', separator=';')
88175c
+        self.assertEqual(result, {'a': ['1'], 'b': ['2']})
88175c
+
88175c
+    @contextlib.contextmanager
88175c
+    def _qsl_sep_config(self, sep):
88175c
+        """Context for the given parse_qsl default separator configured in config file"""
88175c
+        old_filename = urlparse._QS_SEPARATOR_CONFIG_FILENAME
88175c
+        urlparse._default_qs_separator = None
88175c
+        try:
88175c
+            tmpdirname = tempfile.mkdtemp()
88175c
+            filename = os.path.join(tmpdirname, 'conf.cfg')
88175c
+            with open(filename, 'w') as file:
88175c
+                file.write('[parse_qs]\n')
88175c
+                file.write('PYTHON_URLLIB_QS_SEPARATOR = {}'.format(sep))
88175c
+            urlparse._QS_SEPARATOR_CONFIG_FILENAME = filename
88175c
+            yield
88175c
+        finally:
88175c
+            urlparse._QS_SEPARATOR_CONFIG_FILENAME = old_filename
88175c
+            urlparse._default_qs_separator = None
88175c
+            shutil.rmtree(tmpdirname)
88175c
+
88175c
+    def test_parse_qs_separator_semicolon(self):
88175c
+        for orig, expect in parse_qs_test_cases_semicolon:
88175c
+            result = urlparse.parse_qs(orig, separator=';')
88175c
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
88175c
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = ';'
88175c
+                result = urlparse.parse_qs(orig)
88175c
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            self.assertEqual(len(w), 0)
88175c
+            with self._qsl_sep_config(';'), catch_warnings(record=True) as w:
88175c
+                result = urlparse.parse_qs(orig)
88175c
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            self.assertEqual(len(w), 0)
88175c
+
88175c
+    def test_parse_qsl_separator_semicolon(self):
88175c
+        for orig, expect in parse_qsl_test_cases_semicolon:
88175c
+            result = urlparse.parse_qsl(orig, separator=';')
88175c
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
88175c
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = ';'
88175c
+                result = urlparse.parse_qsl(orig)
88175c
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            self.assertEqual(len(w), 0)
88175c
+            with self._qsl_sep_config(';'), catch_warnings(record=True) as w:
88175c
+                result = urlparse.parse_qsl(orig)
88175c
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            self.assertEqual(len(w), 0)
88175c
+
88175c
+    def test_parse_qs_separator_legacy(self):
88175c
+        for orig, expect in parse_qs_test_cases_legacy:
88175c
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
88175c
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = 'legacy'
88175c
+                result = urlparse.parse_qs(orig)
88175c
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            self.assertEqual(len(w), 0)
88175c
+            with self._qsl_sep_config('legacy'), catch_warnings(record=True) as w:
88175c
+                result = urlparse.parse_qs(orig)
88175c
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            self.assertEqual(len(w), 0)
88175c
+
88175c
+    def test_parse_qsl_separator_legacy(self):
88175c
+        for orig, expect in parse_qsl_test_cases_legacy:
88175c
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
88175c
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = 'legacy'
88175c
+                result = urlparse.parse_qsl(orig)
88175c
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            self.assertEqual(len(w), 0)
88175c
+            with self._qsl_sep_config('legacy'), catch_warnings(record=True) as w:
88175c
+                result = urlparse.parse_qsl(orig)
88175c
+            self.assertEqual(result, expect, "Error parsing %r" % orig)
88175c
+            self.assertEqual(len(w), 0)
88175c
+
88175c
+    def test_parse_qs_separator_bad_value_env_or_config(self):
88175c
+        for bad_sep in '', 'abc', 'safe', '&;', 'SEP':
88175c
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
88175c
+                environ['PYTHON_URLLIB_QS_SEPARATOR'] = bad_sep
88175c
+                with self.assertRaises(ValueError):
88175c
+                    urlparse.parse_qsl('a=1;b=2')
88175c
+            with self._qsl_sep_config(bad_sep), catch_warnings(record=True) as w:
88175c
+                with self.assertRaises(ValueError):
88175c
+                    urlparse.parse_qsl('a=1;b=2')
88175c
+
88175c
+    def test_parse_qs_separator_bad_value_arg(self):
88175c
+        for bad_sep in True, {}, '':
88175c
+            with self.assertRaises(ValueError):
88175c
+                urlparse.parse_qsl('a=1;b=2', separator=bad_sep)
88175c
+
88175c
+    def test_parse_qs_separator_num_fields(self):
88175c
+        for qs, sep in (
88175c
+            ('a&b&c', '&'),
88175c
+            ('a;b;c', ';'),
88175c
+            ('a&b;c', 'legacy'),
88175c
+        ):
88175c
+            with EnvironmentVarGuard() as environ, catch_warnings(record=True) as w:
88175c
+                if sep != 'legacy':
88175c
+                    with self.assertRaises(ValueError):
88175c
+                        urlparse.parse_qsl(qs, separator=sep, max_num_fields=2)
88175c
+                if sep:
88175c
+                    environ['PYTHON_URLLIB_QS_SEPARATOR'] = sep
88175c
+                with self.assertRaises(ValueError):
88175c
+                    urlparse.parse_qsl(qs, max_num_fields=2)
88175c
+
88175c
+    def test_parse_qs_separator_priority(self):
88175c
+        # env variable trumps config file
88175c
+        with self._qsl_sep_config('~'), EnvironmentVarGuard() as environ:
88175c
+            environ['PYTHON_URLLIB_QS_SEPARATOR'] = '!'
88175c
+            result = urlparse.parse_qs('a=1!b=2~c=3')
88175c
+            self.assertEqual(result, {'a': ['1'], 'b': ['2~c=3']})
88175c
+        # argument trumps config file
88175c
+        with self._qsl_sep_config('~'):
88175c
+            result = urlparse.parse_qs('a=1$b=2~c=3', separator='$')
88175c
+            self.assertEqual(result, {'a': ['1'], 'b': ['2~c=3']})
88175c
+        # argument trumps env variable
88175c
+        with EnvironmentVarGuard() as environ:
88175c
+            environ['PYTHON_URLLIB_QS_SEPARATOR'] = '~'
88175c
+            result = urlparse.parse_qs('a=1$b=2~c=3', separator='$')
88175c
+            self.assertEqual(result, {'a': ['1'], 'b': ['2~c=3']})
88175c
+
88175c
     def test_urlsplit_normalization(self):
88175c
         # Certain characters should never occur in the netloc,
88175c
         # including under normalization.
88175c
diff --git a/Lib/urlparse.py b/Lib/urlparse.py
88175c
index 798b467b605..69504d8fd93 100644
88175c
--- a/Lib/urlparse.py
88175c
+++ b/Lib/urlparse.py
88175c
@@ -29,6 +29,7 @@ test_urlparse.py provides a good indicator of parsing behavior.
88175c
 """
88175c
 
88175c
 import re
88175c
+import os
88175c
 
88175c
 __all__ = ["urlparse", "urlunparse", "urljoin", "urldefrag",
88175c
            "urlsplit", "urlunsplit", "parse_qs", "parse_qsl"]
88175c
@@ -382,7 +383,8 @@ def unquote(s):
88175c
             append(item)
88175c
     return ''.join(res)
88175c
 
88175c
-def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
88175c
+def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None,
88175c
+             separator=None):
88175c
     """Parse a query given as a string argument.
88175c
 
88175c
         Arguments:
88175c
@@ -405,14 +407,23 @@ def parse_qs(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
88175c
     """
88175c
     dict = {}
88175c
     for name, value in parse_qsl(qs, keep_blank_values, strict_parsing,
88175c
-                                 max_num_fields):
+                                 max_num_fields, separator):
         if name in dict:
             dict[name].append(value)
         else:
             dict[name] = [value]
     return dict
 
-def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
+class _QueryStringSeparatorWarning(RuntimeWarning):
+    """Warning for using default `separator` in parse_qs or parse_qsl"""
+
+# The default "separator" for parse_qsl can be specified in a config file.
+# It's cached after first read.
+_QS_SEPARATOR_CONFIG_FILENAME = '/etc/python/urllib.cfg'
+_default_qs_separator = None
+
+def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None,
+              separator=None):
     """Parse a query given as a string argument.
 
     Arguments:
@@ -434,15 +445,72 @@ def parse_qsl(qs, keep_blank_values=0, strict_parsing=0, max_num_fields=None):
 
     Returns a list, as G-d intended.
     """
+
+    if separator is not None and (not separator or not isinstance(separator, (str, bytes))):
+        raise ValueError("Separator must be of type string or bytes.")
+
+    # Used when both "&" and ";" act as separators. (Need a non-string value.)
+    _legacy = object()
+
+    if separator is None:
+        global _default_qs_separator
+        separator = _default_qs_separator
+        envvar_name = 'PYTHON_URLLIB_QS_SEPARATOR'
+        if separator is None:
+            # Set default separator from environment variable
+            separator = os.environ.get(envvar_name)
+            config_source = 'environment variable'
+        if separator is None:
+            # Set default separator from the configuration file
+            try:
+                file = open(_QS_SEPARATOR_CONFIG_FILENAME)
+            except EnvironmentError:
+                pass
+            else:
+                with file:
+                    import ConfigParser
+                    config = ConfigParser.ConfigParser()
+                    config.readfp(file)
+                    separator = config.get('parse_qs', envvar_name)
+                    _default_qs_separator = separator
+                config_source = _QS_SEPARATOR_CONFIG_FILENAME
+        if separator is None:
+            # The default is '&', but warn if not specified explicitly
+            if ';' in qs:
+                from warnings import warn
+                warn("The default separator of urlparse.parse_qsl and "
+                    + "parse_qs was changed to '&' to avoid a web cache "
+                    + "poisoning issue (CVE-2021-23336). "
+                    + "By default, semicolons no longer act as query field "
+                    + "separators. "
+                    + "See https://access.redhat.com/articles/5860431 for "
+                    + "more details.",
+                    _QueryStringSeparatorWarning, stacklevel=2)
+            separator = '&'
+        elif separator == 'legacy':
+            separator = _legacy
+        elif len(separator) != 1:
+            raise ValueError(
+                '{} (from {}) must contain '.format(envvar_name, config_source)
+                + '1 character, or "legacy". See '
+                + 'https://access.redhat.com/articles/5860431 for more details.'
+            )
+
     # If max_num_fields is defined then check that the number of fields
     # is less than max_num_fields. This prevents a memory exhaustion DOS
     # attack via post bodies with many fields.
     if max_num_fields is not None:
-        num_fields = 1 + qs.count('&') + qs.count(';')
+        if separator is _legacy:
+            num_fields = 1 + qs.count('&') + qs.count(';')
+        else:
+            num_fields = 1 + qs.count(separator)
         if max_num_fields < num_fields:
             raise ValueError('Max number of fields exceeded')
 
-    pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
+    if separator is _legacy:
+        pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
+    else:
+        pairs = qs.split(separator)
     r = []
     for name_value in pairs:
         if not name_value and not strict_parsing:
-- 
2.30.2