dcavalca / rpms / dnf

Forked from rpms/dnf 2 years ago
Clone

Blame SOURCES/0001-Base.reset-plug-temporary-leak-of-libsolv-s-page-fil.patch

a58eb0
From 5ce5ed1ea08ad6e198c1c1642c4d9ea2db6eab86 Mon Sep 17 00:00:00 2001
a58eb0
From: Laszlo Ersek <lersek@redhat.com>
a58eb0
Date: Sun, 24 Apr 2022 09:08:28 +0200
a58eb0
Subject: [PATCH] Base.reset: plug (temporary) leak of libsolv's page file
a58eb0
 descriptors
a58eb0
a58eb0
Consider the following call paths (mixed Python and C), extending from
a58eb0
livecd-creator down to libsolv:
a58eb0
a58eb0
  main                                                [livecd-tools/tools/livecd-creator]
a58eb0
    install()                                         [livecd-tools/imgcreate/creator.py]
a58eb0
      fill_sack()                                     [dnf/dnf/base.py]
a58eb0
        _add_repo_to_sack()                           [dnf/dnf/base.py]
a58eb0
          load_repo()                                 [libdnf/python/hawkey/sack-py.cpp]
a58eb0
            dnf_sack_load_repo()                      [libdnf/libdnf/dnf-sack.cpp]
a58eb0
              write_main()                            [libdnf/libdnf/dnf-sack.cpp]
a58eb0
                repo_add_solv()                       [libsolv/src/repo_solv.c]
a58eb0
                  repopagestore_read_or_setup_pages() [libsolv/src/repopage.c]
a58eb0
                    dup()
a58eb0
              write_ext()                             [libdnf/libdnf/dnf-sack.cpp]
a58eb0
                repo_add_solv()                       [libsolv/src/repo_solv.c]
a58eb0
                  repopagestore_read_or_setup_pages() [libsolv/src/repopage.c]
a58eb0
                    dup()
a58eb0
a58eb0
The dup() calls create the following file descriptors (output from
a58eb0
"lsof"):
a58eb0
a58eb0
> COMMAND  PID USER FD  TYPE DEVICE SIZE/OFF   NODE NAME
a58eb0
> python3 6500 root  7r  REG    8,1 25320727 395438 /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf/fedora.solv (deleted)
a58eb0
> python3 6500 root  8r  REG    8,1 52531426 395450 /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf/fedora-filenames.solvx
a58eb0
a58eb0
These file descriptors are *owned* by the DnfSack object (which is derived
a58eb0
from GObject), as follows:
a58eb0
a58eb0
  sack->priv->pool->repos[1]->repodata[1]->store.pagefd = 7
a58eb0
  sack->priv->pool->repos[1]->repodata[2]->store.pagefd = 8
a58eb0
  ^     ^     ^     ^         ^            ^      ^
a58eb0
  |     |     |     |         |            |      |
a58eb0
  |     |     |     |         |            |      int
a58eb0
  |     |     |     |         |            Repopagestore [libsolv/src/repopage.h]
a58eb0
  |     |     |     |         Repodata                   [libsolv/src/repodata.h]
a58eb0
  |     |     |     struct s_Repo                        [libsolv/src/repo.h]
a58eb0
  |     |     struct s_Pool (aka Pool)                   [libsolv/src/pool.h]
a58eb0
  |     DnfSackPrivate                                   [libdnf/libdnf/dnf-sack.cpp]
a58eb0
  DnfSack                                                [libdnf/libdnf/dnf-sack.h]
a58eb0
a58eb0
The file descriptors are *supposed* to be closed on the following call
a58eb0
path:
a58eb0
a58eb0
  main                                         [livecd-tools/tools/livecd-creator]
a58eb0
    install()                                  [livecd-tools/imgcreate/creator.py]
a58eb0
      close()                                  [livecd-tools/imgcreate/dnfinst.py]
a58eb0
        close()                                [dnf/dnf/base.py]
a58eb0
          reset()                              [dnf/dnf/base.py]
a58eb0
            _sack = None
a58eb0
            _goal = None
a58eb0
            _transaction = None
a58eb0
              ...
a58eb0
                dnf_sack_finalize()            [libdnf/libdnf/dnf-sack.cpp]
a58eb0
                  pool_free()                  [libsolv/src/pool.c]
a58eb0
                    pool_freeallrepos()        [libsolv/src/pool.c]
a58eb0
                      repo_freedata()          [libsolv/src/repo.c]
a58eb0
                        repodata_freedata()    [libsolv/src/repodata.c]
a58eb0
                          repopagestore_free() [libsolv/src/repopage.c]
a58eb0
                            close()
a58eb0
a58eb0
Namely, when dnf.Base.reset() [dnf/dnf/base.py] is called with (sack=True,
a58eb0
goal=True), the reference counts of the objects pointed to by the "_sack",
a58eb0
"_goal" and "_transaction" fields are supposed to reach zero, and then, as
a58eb0
part of the DnfSack object's finalization, the libsolv file descriptors
a58eb0
are supposed to be closed.
a58eb0
a58eb0
Now, while this *may* happen immediately in dnf.Base.reset(), it may as
a58eb0
well not. The reason is that there is a multitude of *circular references*
a58eb0
between DnfSack and the packages that it contains. When dnf.Base.reset()
a58eb0
is entered, we have the following picture:
a58eb0
a58eb0
     _sack                   _goal
a58eb0
        |                      |
a58eb0
        v                      v
a58eb0
   +----------------+      +-------------+
a58eb0
   | DnfSack object | <--- | Goal object |
a58eb0
   +----------------+      +-------------+
a58eb0
     |^    |^    |^
a58eb0
     ||    ||    ||
a58eb0
     ||    ||    ||
a58eb0
  +--||----||----||---+
a58eb0
  |  v|    v|    v|   | <-- _transaction
a58eb0
  | Pkg1  Pkg2  PkgN  |
a58eb0
  |                   |
a58eb0
  | Transaction oject |
a58eb0
  +-------------------+
a58eb0
a58eb0
That is, the reference count of the DnfSack object is (1 + 1 + N), where N
a58eb0
is the number of packages in the transaction. Details:
a58eb0
a58eb0
(a) The first reference comes from the "_sack" field, established like
a58eb0
    this:
a58eb0
a58eb0
      main                       [livecd-tools/tools/livecd-creator]
a58eb0
        install()                [livecd-tools/imgcreate/creator.py]
a58eb0
          fill_sack()            [dnf/dnf/base.py]
a58eb0
            _build_sack()        [dnf/dnf/sack.py]
a58eb0
              Sack()
a58eb0
                sack_init()      [libdnf/python/hawkey/sack-py.cpp]
a58eb0
                  dnf_sack_new() [libdnf/libdnf/dnf-sack.cpp]
a58eb0
a58eb0
(b) The second reference on the DnfSack object comes from "_goal":
a58eb0
a58eb0
      main                        [livecd-tools/tools/livecd-creator]
a58eb0
        install()                 [livecd-tools/imgcreate/creator.py]
a58eb0
          fill_sack()             [dnf/dnf/base.py]
a58eb0
             _goal = Goal(_sack)
a58eb0
               goal_init()        [libdnf/python/hawkey/goal-py.cpp]
a58eb0
                 Py_INCREF(_sack)
a58eb0
a58eb0
(c) Then there is one reference to "_sack" *per package* in the
a58eb0
    transaction:
a58eb0
a58eb0
      main                                  [livecd-tools/tools/livecd-creator]
a58eb0
        install()                           [livecd-tools/imgcreate/creator.py]
a58eb0
          runInstall()                      [livecd-tools/imgcreate/dnfinst.py]
a58eb0
            resolve()                       [dnf/dnf/base.py]
a58eb0
              _goal2transaction()           [dnf/dnf/base.py]
a58eb0
                list_installs()             [libdnf/python/hawkey/goal-py.cpp]
a58eb0
                  list_generic()            [libdnf/python/hawkey/goal-py.cpp]
a58eb0
                    packagelist_to_pylist() [libdnf/python/hawkey/iutil-py.cpp]
a58eb0
                      new_package()         [libdnf/python/hawkey/sack-py.cpp]
a58eb0
                        Py_BuildValue()
a58eb0
                ts.add_install()
a58eb0
a58eb0
    list_installs() creates a list of packages that need to be installed
a58eb0
    by DNF. Inside the loop in packagelist_to_pylist(), which constructs
a58eb0
    the elements of that list, Py_BuildValue() is called with the "O"
a58eb0
    format specifier, and that increases the reference count on "_sack".
a58eb0
a58eb0
    Subsequently, in the _goal2transaction() method, we iterate over the
a58eb0
    package list created by list_installs(), and add each package to the
a58eb0
    transaction (ts.add_install()). After _goal2transaction() returns,
a58eb0
    this transaction is assigned to "self._transaction" in resolve(). This
a58eb0
    is where the last N (back-)references on the DnfSack object come from.
a58eb0
a58eb0
(d) Now, to quote the defintion of the DnfSack object
a58eb0
    ("libdnf/docs/hawkey/tutorial-py.rst"):
a58eb0
a58eb0
> *Sack* is an abstraction for a collection of packages.
a58eb0
a58eb0
    That's why the DnfSack object references all the Pkg1 through PkgN
a58eb0
    packages.
a58eb0
a58eb0
So, when the dnf.Base.reset() method completes, the picture changes like
a58eb0
this:
a58eb0
a58eb0
     _sack                     _goal
a58eb0
        |                        |
a58eb0
   -- [CUT] --              -- [CUT] --
a58eb0
        |                        |
a58eb0
        v                |       v
a58eb0
   +----------------+   [C]  +-------------+
a58eb0
   | DnfSack object | <-[U]- | Goal object |
a58eb0
   +----------------+   [T]  +-------------+
a58eb0
     |^    |^    |^      |
a58eb0
     ||    ||    ||
a58eb0
     ||    ||    ||         |
a58eb0
  +--||----||----||---+    [C]
a58eb0
  |  v|    v|    v|   | <--[U]-- _transaction
a58eb0
  | Pkg1  Pkg2  PkgN  |    [T]
a58eb0
  |                   |     |
a58eb0
  | Transaction oject |
a58eb0
  +-------------------+
a58eb0
a58eb0
and we are left with N reference cycles (one between each package and the
a58eb0
same DnfSack object).
a58eb0
a58eb0
This set of cycles can only be cleaned up by Python's generational garbage
a58eb0
collector <https://stackify.com/python-garbage-collection/>. The GC will
a58eb0
collect the DnfSack object, and consequently close the libsolv page file
a58eb0
descriptors via dnf_sack_finalize() -- but garbage collection will happen
a58eb0
*only eventually*, unpredictably.
a58eb0
a58eb0
This means that the dnf.Base.reset() method breaks its interface contract:
a58eb0
a58eb0
> Make the Base object forget about various things.
a58eb0
a58eb0
because the libsolv file descriptors can (and frequently do, in practice)
a58eb0
survive dnf.Base.reset().
a58eb0
a58eb0
In general, as long as the garbage collector only tracks process-private
a58eb0
memory blocks, there's nothing wrong; however, file descriptors are
a58eb0
visible to the kernel. When dnf.Base.reset() *temporarily* leaks file
a58eb0
descriptors as explained above, then immediately subsequent operations
a58eb0
that depend on those file descriptors having been closed, can fail.
a58eb0
a58eb0
An example is livecd-creator's unmounting of:
a58eb0
a58eb0
  /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf
a58eb0
a58eb0
which the kernel refuses, due to libsolv's still open file descriptors
a58eb0
pointing into that filesystem:
a58eb0
a58eb0
> umount: /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf: target
a58eb0
> is busy.
a58eb0
> Unable to unmount /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf
a58eb0
> normally, using lazy unmount
a58eb0
a58eb0
(Unfortunately, the whole lazy umount idea is misguided in livecd-tools;
a58eb0
it's a misfeature that should be removed, as it permits the corruption of
a58eb0
the loop-backed filesystem. Now that the real bug is being fixed in DNF,
a58eb0
lazy umount is not needed as a (broken) workaround in livecd-tools. But
a58eb0
that's a separate patch for livecd-tools:
a58eb0
<https://github.com/livecd-tools/livecd-tools/pull/227>.)
a58eb0
a58eb0
Plug the fd leak by forcing a garbage collection in dnf.Base.reset()
a58eb0
whenever we cut the "_sack", "_goal" and "_transaction" links -- that is,
a58eb0
when the "sack" and "goal" parameters are True.
a58eb0
a58eb0
Note that precisely due to the unpredictable behavior of the garbage
a58eb0
collector, reproducing the bug may prove elusive. In order to reproduce it
a58eb0
deterministically, through usage with livecd-creator, disabling automatic
a58eb0
garbage collection with the following patch (for livecd-tools) is
a58eb0
sufficient:
a58eb0
a58eb0
> diff --git a/tools/livecd-creator b/tools/livecd-creator
a58eb0
> index 291de10cbbf9..8d2c740c238b 100755
a58eb0
> --- a/tools/livecd-creator
a58eb0
> +++ b/tools/livecd-creator
a58eb0
> @@ -31,6 +31,8 @@ from dnf.exceptions import Error as DnfBaseError
a58eb0
>  import imgcreate
a58eb0
>  from imgcreate.errors import KickstartError
a58eb0
>
a58eb0
> +import gc
a58eb0
> +
a58eb0
>  class Usage(Exception):
a58eb0
>      def __init__(self, msg = None, no_error = False):
a58eb0
>          Exception.__init__(self, msg, no_error)
a58eb0
> @@ -261,5 +263,6 @@ def do_nss_libs_hack():
a58eb0
>      return hack
a58eb0
>
a58eb0
>  if __name__ == "__main__":
a58eb0
> +    gc.disable()
a58eb0
>      hack = do_nss_libs_hack()
a58eb0
>      sys.exit(main())
a58eb0
a58eb0
Also note that you need to use livecd-tools at git commit 4afde9352e82 or
a58eb0
later, for this fix to make any difference: said commit fixes a different
a58eb0
(independent) bug in livecd-tools that produces identical symptoms, but
a58eb0
from a different origin. In other words, if you don't have commit
a58eb0
4afde9352e82 in your livecd-tools install, then said bug in livecd-tools
a58eb0
will mask this DNF fix.
a58eb0
a58eb0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
a58eb0
---
a58eb0
 dnf/base.py | 41 +++++++++++++++++++++++++++++++++++++++++
a58eb0
 1 file changed, 41 insertions(+)
a58eb0
a58eb0
diff --git a/dnf/base.py b/dnf/base.py
a58eb0
index caace028..520574b4 100644
a58eb0
--- a/dnf/base.py
a58eb0
+++ b/dnf/base.py
a58eb0
@@ -72,6 +72,7 @@ import dnf.transaction
a58eb0
 import dnf.util
a58eb0
 import dnf.yum.rpmtrans
a58eb0
 import functools
a58eb0
+import gc
a58eb0
 import hawkey
a58eb0
 import itertools
a58eb0
 import logging
a58eb0
@@ -569,6 +570,46 @@ class Base(object):
a58eb0
             self._comps_trans = dnf.comps.TransactionBunch()
a58eb0
             self._transaction = None
a58eb0
         self._update_security_filters = []
a58eb0
+        if sack and goal:
a58eb0
+            # We've just done this, above:
a58eb0
+            #
a58eb0
+            #      _sack                     _goal
a58eb0
+            #         |                        |
a58eb0
+            #    -- [CUT] --              -- [CUT] --
a58eb0
+            #         |                        |
a58eb0
+            #         v                |       v
a58eb0
+            #    +----------------+   [C]  +-------------+
a58eb0
+            #    | DnfSack object | <-[U]- | Goal object |
a58eb0
+            #    +----------------+   [T]  +-------------+
a58eb0
+            #      |^    |^    |^      |
a58eb0
+            #      ||    ||    ||
a58eb0
+            #      ||    ||    ||         |
a58eb0
+            #   +--||----||----||---+    [C]
a58eb0
+            #   |  v|    v|    v|   | <--[U]-- _transaction
a58eb0
+            #   | Pkg1  Pkg2  PkgN  |    [T]
a58eb0
+            #   |                   |     |
a58eb0
+            #   | Transaction oject |
a58eb0
+            #   +-------------------+
a58eb0
+            #
a58eb0
+            # At this point, the DnfSack object would be released only
a58eb0
+            # eventually, by Python's generational garbage collector, due to the
a58eb0
+            # cyclic references DnfSack<->Pkg1 ... DnfSack<->PkgN.
a58eb0
+            #
a58eb0
+            # The delayed release is a problem: the DnfSack object may
a58eb0
+            # (indirectly) own "page file" file descriptors in libsolv, via
a58eb0
+            # libdnf. For example,
a58eb0
+            #
a58eb0
+            #   sack->priv->pool->repos[1]->repodata[1]->store.pagefd = 7
a58eb0
+            #   sack->priv->pool->repos[1]->repodata[2]->store.pagefd = 8
a58eb0
+            #
a58eb0
+            # These file descriptors are closed when the DnfSack object is
a58eb0
+            # eventually released, that is, when dnf_sack_finalize() (in libdnf)
a58eb0
+            # calls pool_free() (in libsolv).
a58eb0
+            #
a58eb0
+            # We need that to happen right now, as callers may want to unmount
a58eb0
+            # the filesystems which those file descriptors refer to immediately
a58eb0
+            # after reset() returns. Therefore, force a garbage collection here.
a58eb0
+            gc.collect()
a58eb0
 
a58eb0
     def _closeRpmDB(self):
a58eb0
         """Closes down the instances of rpmdb that could be open."""
a58eb0
-- 
a58eb0
2.35.1
a58eb0