From 84920f898315d09a57a3f1067433eaeb7de5e830 Mon Sep 17 00:00:00 2001
Message-Id: <84920f898315d09a57a3f1067433eaeb7de5e830.1554884444.git.pmatilai@redhat.com>
From: Panu Matilainen <pmatilai@redhat.com>
Date: Fri, 22 Feb 2019 19:44:16 +0200
Subject: [PATCH] In Python 3, return all our string data as surrogate-escaped
 utf-8 strings

In the almost ten years of rpm sort of supporting Python 3 bindings, quite
obviously nobody has actually tried to use them. There's a major mismatch
between what the header API outputs (bytes) and what all the other APIs
accept (strings), resulting in hysterical TypeErrors all over the place,
including but not limited to labelCompare() (RhBug:1631292). Also a huge
number of other places have been returning strings and silently assuming
utf-8 through use of Py_BuildValue("s", ...), which will just irrevocably
fail when non-utf8 data is encountered.
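
A minimal Python-level illustration of that failure mode follows. It is
an analogue, not the C code path itself. On Python 3, Py_BuildValue("s", ...)
decodes the C string as UTF-8 with strict error handling, so a strict
bytes.decode() behaves roughly the same way on non-utf8 input. The helper
name strict_utf8() is hypothetical. The sample string is borrowed from
the test case added in tests/rpmpython.at.

```python
# Bytes as they might come out of an rpm header: "älämölö" encoded as
# ISO-8859-1, i.e. data that is not valid UTF-8 (sample taken from the
# test case in this patch).
raw = "älämölö".encode("iso-8859-1")


def strict_utf8(data: bytes) -> str:
    # Hypothetical helper: a strict UTF-8 decode, roughly what
    # Py_BuildValue("s", ...) does with a C string on Python 3.
    return data.decode("utf-8")


try:
    text = strict_utf8(raw)
except UnicodeDecodeError as exc:
    # No string is produced at all; the caller only gets an exception.
    text = None
    print(f"strict decode failed at byte {exc.start}: {exc.reason}")
```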

The politically Python 3-correct solution would be declaring all our data
as bytes with unspecified encoding - that's exactly what it historically is.
However, doing so would by definition break every single rpm script people
have developed on Python 2. And when 99% of the rpm content in the world
actually is utf-8 encoded even if it doesn't say so (and in recent times
packages even advertise themselves as utf-8 encoded), the bytes-only route
seems a wee bit too draconian, even to this grumpy old fella.

Instead, route all our string returns through a single helper macro
which on Python 2 just does what we always did, but in Python 3 converts
the data to surrogate-escaped utf-8 strings. This makes stuff "just work"
out of the box pretty much everywhere even with Python 3 (including
our own test-suite!), while still making it possible to handle the
non-utf8 case. Handling the non-utf8 case is a bit uglier but still
possible, which is exactly how you want corner-cases to be. There might
be some uses for retrieving raw byte data from the header, but worrying
about such an API is a case for some other rainy day; for now we mostly
only care that stuff works again.
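
A Python-level sketch follows, covering only the Python 3 side. It shows
what the Python 3 branch of utf8FromString() amounts to, and how a caller
can still handle the non-utf8 case, mirroring the "non-utf8 data in
header" test in this patch. The function names utf8_from_bytes() and
bytes_from_utf8() are made up for illustration and are not part of the
rpm bindings. The sample data is assumed to be ISO-8859-1.

```python
def utf8_from_bytes(data: bytes) -> str:
    # Hypothetical Python equivalent of the Python 3 branch of the
    # utf8FromString() macro, which calls
    # PyUnicode_DecodeUTF8(_s, strlen(_s), "surrogateescape").
    return data.decode("utf-8", "surrogateescape")


def bytes_from_utf8(text: str) -> bytes:
    # Inverse operation: turns escaped surrogates back into the raw bytes.
    return text.encode("utf-8", "surrogateescape")


# The common case: valid UTF-8 comes back as an ordinary string.
plain = utf8_from_bytes("älämölö".encode("utf-8"))

# The corner case: ISO-8859-1 data still decodes instead of failing.
# Each byte that is not valid UTF-8 becomes a lone surrogate
# (byte 0xNN becomes U+DCNN), while the ASCII bytes decode normally.
raw = "älämölö".encode("iso-8859-1")
escaped = utf8_from_bytes(raw)

# A caller that knows the real encoding recovers the original bytes
# and decodes them itself, just like the test case does.
recovered = bytes_from_utf8(escaped).decode("iso-8859-1")
print(plain, recovered)
```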

Also add test-cases for mixed data source labelCompare() and
non-utf8 insert to + retrieve from header.
---
 python/header-py.c     |  2 +-
 python/rpmds-py.c      |  8 ++++----
 python/rpmfd-py.c      |  6 +++---
 python/rpmfi-py.c      | 24 ++++++++++++------------
 python/rpmfiles-py.c   | 26 +++++++++++++-------------
 python/rpmkeyring-py.c |  2 +-
 python/rpmmacro-py.c   |  2 +-
 python/rpmmodule.c     |  2 +-
 python/rpmps-py.c      |  8 ++++----
 python/rpmstrpool-py.c |  2 +-
 python/rpmsystem-py.h  |  7 +++++++
 python/rpmtd-py.c      |  2 +-
 python/rpmte-py.c      | 16 ++++++++--------
 python/rpmts-py.c      | 11 ++++++-----
 python/spec-py.c       |  8 ++++----
 tests/local.at         |  1 +
 tests/rpmpython.at     | 34 ++++++++++++++++++++++++++++++++++
 17 files changed, 102 insertions(+), 59 deletions(-)

diff --git a/python/header-py.c b/python/header-py.c
index c9d54e869..93c241cb7 100644
--- a/python/header-py.c
+++ b/python/header-py.c
@@ -231,7 +231,7 @@ static PyObject * hdrFormat(hdrObject * s, PyObject * args, PyObject * kwds)
 	return NULL;
     }
 
-    result = Py_BuildValue("s", r);
+    result = utf8FromString(r);
     free(r);
 
     return result;
diff --git a/python/rpmds-py.c b/python/rpmds-py.c
index 39b26628e..ecc9af9d5 100644
--- a/python/rpmds-py.c
+++ b/python/rpmds-py.c
@@ -31,19 +31,19 @@ rpmds_Ix(rpmdsObject * s)
 static PyObject *
 rpmds_DNEVR(rpmdsObject * s)
 {
-    return Py_BuildValue("s", rpmdsDNEVR(s->ds));
+    return utf8FromString(rpmdsDNEVR(s->ds));
 }
 
 static PyObject *
 rpmds_N(rpmdsObject * s)
 {
-    return Py_BuildValue("s", rpmdsN(s->ds));
+    return utf8FromString(rpmdsN(s->ds));
 }
 
 static PyObject *
 rpmds_EVR(rpmdsObject * s)
 {
-    return Py_BuildValue("s", rpmdsEVR(s->ds));
+    return utf8FromString(rpmdsEVR(s->ds));
 }
 
 static PyObject *
@@ -261,7 +261,7 @@ rpmds_subscript(rpmdsObject * s, PyObject * key)
 
     ix = (int) PyInt_AsLong(key);
     rpmdsSetIx(s->ds, ix);
-    return Py_BuildValue("s", rpmdsDNEVR(s->ds));
+    return utf8FromString(rpmdsDNEVR(s->ds));
 }
 
 static PyMappingMethods rpmds_as_mapping = {
diff --git a/python/rpmfd-py.c b/python/rpmfd-py.c
index 85fb0cd24..4b05cce5f 100644
--- a/python/rpmfd-py.c
+++ b/python/rpmfd-py.c
@@ -327,17 +327,17 @@ static PyObject *rpmfd_get_closed(rpmfdObject *s)
 static PyObject *rpmfd_get_name(rpmfdObject *s)
 {
     /* XXX: rpm returns non-paths with [mumble], python files use <mumble> */
-    return Py_BuildValue("s", Fdescr(s->fd));
+    return utf8FromString(Fdescr(s->fd));
 }
 
 static PyObject *rpmfd_get_mode(rpmfdObject *s)
 {
-    return Py_BuildValue("s", s->mode);
+    return utf8FromString(s->mode);
 }
 
 static PyObject *rpmfd_get_flags(rpmfdObject *s)
 {
-    return Py_BuildValue("s", s->flags);
+    return utf8FromString(s->flags);
 }
 
 static PyGetSetDef rpmfd_getseters[] = {
diff --git a/python/rpmfi-py.c b/python/rpmfi-py.c
index 8d2f926d0..db405c231 100644
--- a/python/rpmfi-py.c
+++ b/python/rpmfi-py.c
@@ -41,19 +41,19 @@ rpmfi_DX(rpmfiObject * s, PyObject * unused)
 static PyObject *
 rpmfi_BN(rpmfiObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmfiBN(s->fi));
+    return utf8FromString(rpmfiBN(s->fi));
 }
 
 static PyObject *
 rpmfi_DN(rpmfiObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmfiDN(s->fi));
+    return utf8FromString(rpmfiDN(s->fi));
 }
 
 static PyObject *
 rpmfi_FN(rpmfiObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmfiFN(s->fi));
+    return utf8FromString(rpmfiFN(s->fi));
 }
 
 static PyObject *
@@ -98,7 +98,7 @@ rpmfi_Digest(rpmfiObject * s, PyObject * unused)
 {
     char *digest = rpmfiFDigestHex(s->fi, NULL);
     if (digest) {
-	PyObject *dig = Py_BuildValue("s", digest);
+	PyObject *dig = utf8FromString(digest);
 	free(digest);
 	return dig;
     } else {
@@ -109,7 +109,7 @@ rpmfi_Digest(rpmfiObject * s, PyObject * unused)
 static PyObject *
 rpmfi_FLink(rpmfiObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmfiFLink(s->fi));
+    return utf8FromString(rpmfiFLink(s->fi));
 }
 
 static PyObject *
@@ -133,13 +133,13 @@ rpmfi_FMtime(rpmfiObject * s, PyObject * unused)
 static PyObject *
 rpmfi_FUser(rpmfiObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmfiFUser(s->fi));
+    return utf8FromString(rpmfiFUser(s->fi));
 }
 
 static PyObject *
 rpmfi_FGroup(rpmfiObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmfiFGroup(s->fi));
+    return utf8FromString(rpmfiFGroup(s->fi));
 }
 
 static PyObject *
@@ -155,7 +155,7 @@ rpmfi_FClass(rpmfiObject * s, PyObject * unused)
 
     if ((FClass = rpmfiFClass(s->fi)) == NULL)
 	FClass = "";
-    return Py_BuildValue("s", FClass);
+    return utf8FromString(FClass);
 }
 
 static PyObject *
@@ -208,7 +208,7 @@ rpmfi_iternext(rpmfiObject * s)
 	    Py_INCREF(Py_None);
 	    PyTuple_SET_ITEM(result, 0, Py_None);
 	} else
-	    PyTuple_SET_ITEM(result,  0, Py_BuildValue("s", FN));
+	    PyTuple_SET_ITEM(result,  0, utf8FromString(FN));
 	PyTuple_SET_ITEM(result,  1, PyLong_FromLongLong(FSize));
 	PyTuple_SET_ITEM(result,  2, PyInt_FromLong(FMode));
 	PyTuple_SET_ITEM(result,  3, PyInt_FromLong(FMtime));
@@ -222,12 +222,12 @@ rpmfi_iternext(rpmfiObject * s)
 	    Py_INCREF(Py_None);
 	    PyTuple_SET_ITEM(result, 10, Py_None);
 	} else
-	    PyTuple_SET_ITEM(result, 10, Py_BuildValue("s", FUser));
+	    PyTuple_SET_ITEM(result, 10, utf8FromString(FUser));
 	if (FGroup == NULL) {
 	    Py_INCREF(Py_None);
 	    PyTuple_SET_ITEM(result, 11, Py_None);
 	} else
-	    PyTuple_SET_ITEM(result, 11, Py_BuildValue("s", FGroup));
+	    PyTuple_SET_ITEM(result, 11, utf8FromString(FGroup));
 	PyTuple_SET_ITEM(result, 12, rpmfi_Digest(s, NULL));
 
     } else
@@ -313,7 +313,7 @@ rpmfi_subscript(rpmfiObject * s, PyObject * key)
 
     ix = (int) PyInt_AsLong(key);
     rpmfiSetFX(s->fi, ix);
-    return Py_BuildValue("s", rpmfiFN(s->fi));
+    return utf8FromString(rpmfiFN(s->fi));
 }
 
 static PyMappingMethods rpmfi_as_mapping = {
diff --git a/python/rpmfiles-py.c b/python/rpmfiles-py.c
index bc07dbeaf..557246cae 100644
--- a/python/rpmfiles-py.c
+++ b/python/rpmfiles-py.c
@@ -41,37 +41,37 @@ static PyObject *rpmfile_dx(rpmfileObject *s)
 static PyObject *rpmfile_name(rpmfileObject *s)
 {
     char * fn = rpmfilesFN(s->files, s->ix);
-    PyObject *o = Py_BuildValue("s", fn);
+    PyObject *o = utf8FromString(fn);
     free(fn);
     return o;
 }
 
 static PyObject *rpmfile_basename(rpmfileObject *s)
 {
-    return Py_BuildValue("s", rpmfilesBN(s->files, s->ix));
+    return utf8FromString(rpmfilesBN(s->files, s->ix));
 }
 
 static PyObject *rpmfile_dirname(rpmfileObject *s)
 {
-    return Py_BuildValue("s", rpmfilesDN(s->files, rpmfilesDI(s->files, s->ix)));
+    return utf8FromString(rpmfilesDN(s->files, rpmfilesDI(s->files, s->ix)));
 }
 
 static PyObject *rpmfile_orig_name(rpmfileObject *s)
 {
     char * fn = rpmfilesOFN(s->files, s->ix);
-    PyObject *o = Py_BuildValue("s", fn);
+    PyObject *o = utf8FromString(fn);
     free(fn);
     return o;
 }
 
 static PyObject *rpmfile_orig_basename(rpmfileObject *s)
 {
-    return Py_BuildValue("s", rpmfilesOBN(s->files, s->ix));
+    return utf8FromString(rpmfilesOBN(s->files, s->ix));
 }
 
 static PyObject *rpmfile_orig_dirname(rpmfileObject *s)
 {
-    return Py_BuildValue("s", rpmfilesODN(s->files, rpmfilesODI(s->files, s->ix)));
+    return utf8FromString(rpmfilesODN(s->files, rpmfilesODI(s->files, s->ix)));
 }
 static PyObject *rpmfile_mode(rpmfileObject *s)
 {
@@ -105,17 +105,17 @@ static PyObject *rpmfile_nlink(rpmfileObject *s)
 
 static PyObject *rpmfile_linkto(rpmfileObject *s)
 {
-    return Py_BuildValue("s", rpmfilesFLink(s->files, s->ix));
+    return utf8FromString(rpmfilesFLink(s->files, s->ix));
 }
 
 static PyObject *rpmfile_user(rpmfileObject *s)
 {
-    return Py_BuildValue("s", rpmfilesFUser(s->files, s->ix));
+    return utf8FromString(rpmfilesFUser(s->files, s->ix));
 }
 
 static PyObject *rpmfile_group(rpmfileObject *s)
 {
-    return Py_BuildValue("s", rpmfilesFGroup(s->files, s->ix));
+    return utf8FromString(rpmfilesFGroup(s->files, s->ix));
 }
 
 static PyObject *rpmfile_fflags(rpmfileObject *s)
@@ -145,7 +145,7 @@ static PyObject *rpmfile_digest(rpmfileObject *s)
 						  NULL, &diglen);
     if (digest) {
 	char * hex = pgpHexStr(digest, diglen);
-	PyObject *o = Py_BuildValue("s", hex);
+	PyObject *o = utf8FromString(hex);
 	free(hex);
 	return o;
     }
@@ -154,17 +154,17 @@ static PyObject *rpmfile_digest(rpmfileObject *s)
 
 static PyObject *rpmfile_class(rpmfileObject *s)
 {
-    return Py_BuildValue("s", rpmfilesFClass(s->files, s->ix));
+    return utf8FromString(rpmfilesFClass(s->files, s->ix));
 }
 
 static PyObject *rpmfile_caps(rpmfileObject *s)
 {
-    return Py_BuildValue("s", rpmfilesFCaps(s->files, s->ix));
+    return utf8FromString(rpmfilesFCaps(s->files, s->ix));
 }
 
 static PyObject *rpmfile_langs(rpmfileObject *s)
 {
-    return Py_BuildValue("s", rpmfilesFLangs(s->files, s->ix));
+    return utf8FromString(rpmfilesFLangs(s->files, s->ix));
 }
 
 static PyObject *rpmfile_links(rpmfileObject *s)
diff --git a/python/rpmkeyring-py.c b/python/rpmkeyring-py.c
index d5f131e42..8968e0513 100644
--- a/python/rpmkeyring-py.c
+++ b/python/rpmkeyring-py.c
@@ -38,7 +38,7 @@ static PyObject *rpmPubkey_new(PyTypeObject *subtype,
 static PyObject * rpmPubkey_Base64(rpmPubkeyObject *s)
 {
     char *b64 = rpmPubkeyBase64(s->pubkey);
-    PyObject *res = Py_BuildValue("s", b64);
+    PyObject *res = utf8FromString(b64);
     free(b64);
     return res;
 }
diff --git a/python/rpmmacro-py.c b/python/rpmmacro-py.c
index 3cb1a51f5..d8a365547 100644
--- a/python/rpmmacro-py.c
+++ b/python/rpmmacro-py.c
@@ -52,7 +52,7 @@ rpmmacro_ExpandMacro(PyObject * self, PyObject * args, PyObject * kwds)
 	if (rpmExpandMacros(NULL, macro, &str, 0) < 0)
 	    PyErr_SetString(pyrpmError, "error expanding macro");
 	else
-	    res = Py_BuildValue("s", str);
+	    res = utf8FromString(str);
 	free(str);
     }
     return res;
diff --git a/python/rpmmodule.c b/python/rpmmodule.c
index 3faad23c7..05032edc7 100644
--- a/python/rpmmodule.c
+++ b/python/rpmmodule.c
@@ -237,7 +237,7 @@ static void addRpmTags(PyObject *module)
 
 	PyModule_AddIntConstant(module, tagname, tagval);
 	pyval = PyInt_FromLong(tagval);
-	pyname = Py_BuildValue("s", shortname);
+	pyname = utf8FromString(shortname);
 	PyDict_SetItem(dict, pyval, pyname);
 	Py_DECREF(pyval);
 	Py_DECREF(pyname);
diff --git a/python/rpmps-py.c b/python/rpmps-py.c
index bdc899a60..902b2ae63 100644
--- a/python/rpmps-py.c
+++ b/python/rpmps-py.c
@@ -18,12 +18,12 @@ static PyObject *rpmprob_get_type(rpmProblemObject *s, void *closure)
 
 static PyObject *rpmprob_get_pkgnevr(rpmProblemObject *s, void *closure)
 {
-    return Py_BuildValue("s", rpmProblemGetPkgNEVR(s->prob));
+    return utf8FromString(rpmProblemGetPkgNEVR(s->prob));
 }
 
 static PyObject *rpmprob_get_altnevr(rpmProblemObject *s, void *closure)
 {
-    return Py_BuildValue("s", rpmProblemGetAltNEVR(s->prob));
+    return utf8FromString(rpmProblemGetAltNEVR(s->prob));
 }
 
 static PyObject *rpmprob_get_key(rpmProblemObject *s, void *closure)
@@ -38,7 +38,7 @@ static PyObject *rpmprob_get_key(rpmProblemObject *s, void *closure)
 
 static PyObject *rpmprob_get_str(rpmProblemObject *s, void *closure)
 {
-    return Py_BuildValue("s", rpmProblemGetStr(s->prob));
+    return utf8FromString(rpmProblemGetStr(s->prob));
 }
 
 static PyObject *rpmprob_get_num(rpmProblemObject *s, void *closure)
@@ -59,7 +59,7 @@ static PyGetSetDef rpmprob_getseters[] = {
 static PyObject *rpmprob_str(rpmProblemObject *s)
 {
     char *str = rpmProblemString(s->prob);
-    PyObject *res = Py_BuildValue("s", str);
+    PyObject *res = utf8FromString(str);
     free(str);
     return res;
 }
diff --git a/python/rpmstrpool-py.c b/python/rpmstrpool-py.c
index 356bd1de5..a56e2b540 100644
--- a/python/rpmstrpool-py.c
+++ b/python/rpmstrpool-py.c
@@ -44,7 +44,7 @@ static PyObject *strpool_id2str(rpmstrPoolObject *s, PyObject *item)
 	const char *str = rpmstrPoolStr(s->pool, id);
 
 	if (str)
-	    ret = PyBytes_FromString(str);
+	    ret = utf8FromString(str);
 	else 
 	    PyErr_SetObject(PyExc_KeyError, item);
     }
diff --git a/python/rpmsystem-py.h b/python/rpmsystem-py.h
index 955d60cd3..87c750571 100644
--- a/python/rpmsystem-py.h
+++ b/python/rpmsystem-py.h
@@ -19,4 +19,11 @@
 #define PyInt_AsSsize_t PyLong_AsSsize_t
 #endif
 
+/* In Python 3, we return all strings as surrogate-escaped utf-8 */
+#if PY_MAJOR_VERSION >= 3
+#define utf8FromString(_s) PyUnicode_DecodeUTF8(_s, strlen(_s), "surrogateescape")
+#else
+#define utf8FromString(_s) PyBytes_FromString(_s)
+#endif
+
 #endif	/* H_SYSTEM_PYTHON */
diff --git a/python/rpmtd-py.c b/python/rpmtd-py.c
index 247c7502a..23ca10517 100644
--- a/python/rpmtd-py.c
+++ b/python/rpmtd-py.c
@@ -17,7 +17,7 @@ PyObject * rpmtd_ItemAsPyobj(rpmtd td, rpmTagClass tclass)
 
     switch (tclass) {
     case RPM_STRING_CLASS:
-	res = PyBytes_FromString(rpmtdGetString(td));
+	res = utf8FromString(rpmtdGetString(td));
 	break;
     case RPM_NUMERIC_CLASS:
 	res = PyLong_FromLongLong(rpmtdGetNumber(td));
diff --git a/python/rpmte-py.c b/python/rpmte-py.c
index 99ff2f496..2b3745754 100644
--- a/python/rpmte-py.c
+++ b/python/rpmte-py.c
@@ -54,49 +54,49 @@ rpmte_TEType(rpmteObject * s, PyObject * unused)
 static PyObject *
 rpmte_N(rpmteObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmteN(s->te));
+    return utf8FromString(rpmteN(s->te));
 }
 
 static PyObject *
 rpmte_E(rpmteObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmteE(s->te));
+    return utf8FromString(rpmteE(s->te));
 }
 
 static PyObject *
 rpmte_V(rpmteObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmteV(s->te));
+    return utf8FromString(rpmteV(s->te));
 }
 
 static PyObject *
 rpmte_R(rpmteObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmteR(s->te));
+    return utf8FromString(rpmteR(s->te));
 }
 
 static PyObject *
 rpmte_A(rpmteObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmteA(s->te));
+    return utf8FromString(rpmteA(s->te));
 }
 
 static PyObject *
 rpmte_O(rpmteObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmteO(s->te));
+    return utf8FromString(rpmteO(s->te));
 }
 
 static PyObject *
 rpmte_NEVR(rpmteObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmteNEVR(s->te));
+    return utf8FromString(rpmteNEVR(s->te));
 }
 
 static PyObject *
 rpmte_NEVRA(rpmteObject * s, PyObject * unused)
 {
-    return Py_BuildValue("s", rpmteNEVRA(s->te));
+    return utf8FromString(rpmteNEVRA(s->te));
 }
 
 static PyObject *
diff --git a/python/rpmts-py.c b/python/rpmts-py.c
index 1ddfc9a1e..96e3bb28e 100644
--- a/python/rpmts-py.c
+++ b/python/rpmts-py.c
@@ -230,8 +230,9 @@ rpmts_SolveCallback(rpmts ts, rpmds ds, const void * data)
 
     PyEval_RestoreThread(cbInfo->_save);
 
-    args = Py_BuildValue("(Oissi)", cbInfo->tso,
-		rpmdsTagN(ds), rpmdsN(ds), rpmdsEVR(ds), rpmdsFlags(ds));
+    args = Py_BuildValue("(OiNNi)", cbInfo->tso,
+		rpmdsTagN(ds), utf8FromString(rpmdsN(ds)),
+		utf8FromString(rpmdsEVR(ds)), rpmdsFlags(ds));
     result = PyEval_CallObject(cbInfo->cb, args);
     Py_DECREF(args);
 
@@ -409,7 +410,7 @@ rpmts_HdrCheck(rpmtsObject * s, PyObject *obj)
     rpmrc = headerCheck(s->ts, uh, uc, &msg);
     Py_END_ALLOW_THREADS;
 
-    return Py_BuildValue("(is)", rpmrc, msg);
+    return Py_BuildValue("(iN)", rpmrc, utf8FromString(msg));
 }
 
 static PyObject *
@@ -500,7 +501,7 @@ rpmtsCallback(const void * hd, const rpmCallbackType what,
     /* Synthesize a python object for callback (if necessary). */
     if (pkgObj == NULL) {
 	if (h) {
-	    pkgObj = Py_BuildValue("s", headerGetString(h, RPMTAG_NAME));
+	    pkgObj = utf8FromString(headerGetString(h, RPMTAG_NAME));
 	} else {
 	    pkgObj = Py_None;
 	    Py_INCREF(pkgObj);
@@ -845,7 +846,7 @@ static PyObject *rpmts_get_tid(rpmtsObject *s, void *closure)
 
 static PyObject *rpmts_get_rootDir(rpmtsObject *s, void *closure)
 {
-    return Py_BuildValue("s", rpmtsRootDir(s->ts));
+    return utf8FromString(rpmtsRootDir(s->ts));
 }
 
 static int rpmts_set_scriptFd(rpmtsObject *s, PyObject *value, void *closure)
diff --git a/python/spec-py.c b/python/spec-py.c
index 4efdbf4bf..70b796531 100644
--- a/python/spec-py.c
+++ b/python/spec-py.c
@@ -57,7 +57,7 @@ static PyObject *pkgGetSection(rpmSpecPkg pkg, int section)
 {
     char *sect = rpmSpecPkgGetSection(pkg, section);
     if (sect != NULL) {
-        PyObject *ps = PyBytes_FromString(sect);
+        PyObject *ps = utf8FromString(sect);
         free(sect);
         if (ps != NULL)
             return ps;
@@ -158,7 +158,7 @@ static PyObject * getSection(rpmSpec spec, int section)
 {
     const char *sect = rpmSpecGetSection(spec, section);
     if (sect) {
-	return Py_BuildValue("s", sect);
+	return utf8FromString(sect);
     }
     Py_RETURN_NONE;
 }
@@ -208,8 +208,8 @@ static PyObject * spec_get_sources(specObject *s, void *closure)
 
     rpmSpecSrcIter iter = rpmSpecSrcIterInit(s->spec);
     while ((source = rpmSpecSrcIterNext(iter)) != NULL) {
-	PyObject *srcUrl = Py_BuildValue("(sii)",
-				rpmSpecSrcFilename(source, 1),
+	PyObject *srcUrl = Py_BuildValue("(Nii)",
+				utf8FromString(rpmSpecSrcFilename(source, 1)),
 				rpmSpecSrcNum(source),
 				rpmSpecSrcFlags(source)); 
         if (!srcUrl) {
diff --git a/tests/local.at b/tests/local.at
index 02ead66c9..42eef1c75 100644
--- a/tests/local.at
+++ b/tests/local.at
@@ -10,6 +10,7 @@ rm -rf "${abs_builddir}"/testing`rpm --eval '%_dbpath'`/*
 
 m4_define([RPMPY_RUN],[[
 cat << EOF > test.py
+# coding=utf-8
 import rpm, sys
 dbpath=rpm.expandMacro('%_dbpath')
 rpm.addMacro('_dbpath', '${abs_builddir}/testing%s' % dbpath)
diff --git a/tests/rpmpython.at b/tests/rpmpython.at
index ff77f868c..58f3e84a6 100644
--- a/tests/rpmpython.at
+++ b/tests/rpmpython.at
@@ -106,6 +106,25 @@ None
 'rpm.hdr' object has no attribute '__foo__']
 )
 
+RPMPY_TEST([non-utf8 data in header],[
+str = u'älämölö'
+enc = 'iso-8859-1'
+b = str.encode(enc)
+h = rpm.hdr()
+h['group'] = b
+d = h['group']
+try:
+    # python 3
+    t = bytes(d, 'utf-8', 'surrogateescape')
+except TypeError:
+    # python 2
+    t = bytes(d)
+res = t.decode(enc)
+myprint(str == res)
+],
+[True]
+)
+
 RPMPY_TEST([invalid header data],[
 h1 = rpm.hdr()
 h1['basenames'] = ['bing', 'bang', 'bong']
@@ -125,6 +144,21 @@ for h in [h1, h2]:
 /opt/bing,/opt/bang,/flopt/bong]
 )
 
+RPMPY_TEST([labelCompare],[
+v = '1.0'
+r = '1'
+e = 3
+h = rpm.hdr()
+h['name'] = 'testpkg'
+h['version'] = v
+h['release'] = r
+h['epoch'] = e
+myprint(rpm.labelCompare((str(h['epoch']), h['version'], h['release']),
+			 (str(e), v, r)))
+],
+[0]
+)
+
 RPMPY_TEST([vfyflags API],[
 ts = rpm.ts()
 dlv = ts.getVfyFlags()
-- 
2.20.1