richardphibel / rpms / dnf

Forked from rpms/dnf 2 years ago
Clone

Blame SOURCES/0023-Base.reset-plug-temporary-leak-of-libsolv-s-page-fil.patch

52b19a
From 88a6289a4f72b11253c01a5a5d834b74d5abb6c3 Mon Sep 17 00:00:00 2001
52b19a
From: Laszlo Ersek <lersek@redhat.com>
52b19a
Date: Sun, 24 Apr 2022 09:08:28 +0200
52b19a
Subject: [PATCH] Base.reset: plug (temporary) leak of libsolv's page file
52b19a
 descriptors
52b19a
52b19a
Consider the following call paths (mixed Python and C), extending from
52b19a
livecd-creator down to libsolv:
52b19a
52b19a
  main                                                [livecd-tools/tools/livecd-creator]
52b19a
    install()                                         [livecd-tools/imgcreate/creator.py]
52b19a
      fill_sack()                                     [dnf/dnf/base.py]
52b19a
        _add_repo_to_sack()                           [dnf/dnf/base.py]
52b19a
          load_repo()                                 [libdnf/python/hawkey/sack-py.cpp]
52b19a
            dnf_sack_load_repo()                      [libdnf/libdnf/dnf-sack.cpp]
52b19a
              write_main()                            [libdnf/libdnf/dnf-sack.cpp]
52b19a
                repo_add_solv()                       [libsolv/src/repo_solv.c]
52b19a
                  repopagestore_read_or_setup_pages() [libsolv/src/repopage.c]
52b19a
                    dup()
52b19a
              write_ext()                             [libdnf/libdnf/dnf-sack.cpp]
52b19a
                repo_add_solv()                       [libsolv/src/repo_solv.c]
52b19a
                  repopagestore_read_or_setup_pages() [libsolv/src/repopage.c]
52b19a
                    dup()
52b19a
52b19a
The dup() calls create the following file descriptors (output from
52b19a
"lsof"):
52b19a
52b19a
> COMMAND  PID USER FD  TYPE DEVICE SIZE/OFF   NODE NAME
52b19a
> python3 6500 root  7r  REG    8,1 25320727 395438 /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf/fedora.solv (deleted)
52b19a
> python3 6500 root  8r  REG    8,1 52531426 395450 /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf/fedora-filenames.solvx
52b19a
52b19a
These file descriptors are *owned* by the DnfSack object (which is derived
52b19a
from GObject), as follows:
52b19a
52b19a
  sack->priv->pool->repos[1]->repodata[1]->store.pagefd = 7
52b19a
  sack->priv->pool->repos[1]->repodata[2]->store.pagefd = 8
52b19a
  ^     ^     ^     ^         ^            ^      ^
52b19a
  |     |     |     |         |            |      |
52b19a
  |     |     |     |         |            |      int
52b19a
  |     |     |     |         |            Repopagestore [libsolv/src/repopage.h]
52b19a
  |     |     |     |         Repodata                   [libsolv/src/repodata.h]
52b19a
  |     |     |     struct s_Repo                        [libsolv/src/repo.h]
52b19a
  |     |     struct s_Pool (aka Pool)                   [libsolv/src/pool.h]
52b19a
  |     DnfSackPrivate                                   [libdnf/libdnf/dnf-sack.cpp]
52b19a
  DnfSack                                                [libdnf/libdnf/dnf-sack.h]
52b19a
52b19a
The file descriptors are *supposed* to be closed on the following call
52b19a
path:
52b19a
52b19a
  main                                         [livecd-tools/tools/livecd-creator]
52b19a
    install()                                  [livecd-tools/imgcreate/creator.py]
52b19a
      close()                                  [livecd-tools/imgcreate/dnfinst.py]
52b19a
        close()                                [dnf/dnf/base.py]
52b19a
          reset()                              [dnf/dnf/base.py]
52b19a
            _sack = None
52b19a
            _goal = None
52b19a
            _transaction = None
52b19a
              ...
52b19a
                dnf_sack_finalize()            [libdnf/libdnf/dnf-sack.cpp]
52b19a
                  pool_free()                  [libsolv/src/pool.c]
52b19a
                    pool_freeallrepos()        [libsolv/src/pool.c]
52b19a
                      repo_freedata()          [libsolv/src/repo.c]
52b19a
                        repodata_freedata()    [libsolv/src/repodata.c]
52b19a
                          repopagestore_free() [libsolv/src/repopage.c]
52b19a
                            close()
52b19a
52b19a
Namely, when dnf.Base.reset() [dnf/dnf/base.py] is called with (sack=True,
52b19a
goal=True), the reference counts of the objects pointed to by the "_sack",
52b19a
"_goal" and "_transaction" fields are supposed to reach zero, and then, as
52b19a
part of the DnfSack object's finalization, the libsolv file descriptors
52b19a
are supposed to be closed.
52b19a
52b19a
Now, while this *may* happen immediately in dnf.Base.reset(), it may as
52b19a
well not. The reason is that there is a multitude of *circular references*
52b19a
between DnfSack and the packages that it contains. When dnf.Base.reset()
52b19a
is entered, we have the following picture:
52b19a
52b19a
     _sack                   _goal
52b19a
        |                      |
52b19a
        v                      v
52b19a
   +----------------+      +-------------+
52b19a
   | DnfSack object | <--- | Goal object |
52b19a
   +----------------+      +-------------+
52b19a
     |^    |^    |^
52b19a
     ||    ||    ||
52b19a
     ||    ||    ||
52b19a
  +--||----||----||---+
52b19a
  |  v|    v|    v|   | <-- _transaction
52b19a
  | Pkg1  Pkg2  PkgN  |
52b19a
  |                   |
52b19a
  | Transaction oject |
52b19a
  +-------------------+
52b19a
52b19a
That is, the reference count of the DnfSack object is (1 + 1 + N), where N
52b19a
is the number of packages in the transaction. Details:
52b19a
52b19a
(a) The first reference comes from the "_sack" field, established like
52b19a
    this:
52b19a
52b19a
      main                       [livecd-tools/tools/livecd-creator]
52b19a
        install()                [livecd-tools/imgcreate/creator.py]
52b19a
          fill_sack()            [dnf/dnf/base.py]
52b19a
            _build_sack()        [dnf/dnf/sack.py]
52b19a
              Sack()
52b19a
                sack_init()      [libdnf/python/hawkey/sack-py.cpp]
52b19a
                  dnf_sack_new() [libdnf/libdnf/dnf-sack.cpp]
52b19a
52b19a
(b) The second reference on the DnfSack object comes from "_goal":
52b19a
52b19a
      main                        [livecd-tools/tools/livecd-creator]
52b19a
        install()                 [livecd-tools/imgcreate/creator.py]
52b19a
          fill_sack()             [dnf/dnf/base.py]
52b19a
             _goal = Goal(_sack)
52b19a
               goal_init()        [libdnf/python/hawkey/goal-py.cpp]
52b19a
                 Py_INCREF(_sack)
52b19a
52b19a
(c) Then there is one reference to "_sack" *per package* in the
52b19a
    transaction:
52b19a
52b19a
      main                                  [livecd-tools/tools/livecd-creator]
52b19a
        install()                           [livecd-tools/imgcreate/creator.py]
52b19a
          runInstall()                      [livecd-tools/imgcreate/dnfinst.py]
52b19a
            resolve()                       [dnf/dnf/base.py]
52b19a
              _goal2transaction()           [dnf/dnf/base.py]
52b19a
                list_installs()             [libdnf/python/hawkey/goal-py.cpp]
52b19a
                  list_generic()            [libdnf/python/hawkey/goal-py.cpp]
52b19a
                    packagelist_to_pylist() [libdnf/python/hawkey/iutil-py.cpp]
52b19a
                      new_package()         [libdnf/python/hawkey/sack-py.cpp]
52b19a
                        Py_BuildValue()
52b19a
                ts.add_install()
52b19a
52b19a
    list_installs() creates a list of packages that need to be installed
52b19a
    by DNF. Inside the loop in packagelist_to_pylist(), which constructs
52b19a
    the elements of that list, Py_BuildValue() is called with the "O"
52b19a
    format specifier, and that increases the reference count on "_sack".
52b19a
52b19a
    Subsequently, in the _goal2transaction() method, we iterate over the
52b19a
    package list created by list_installs(), and add each package to the
52b19a
    transaction (ts.add_install()). After _goal2transaction() returns,
52b19a
    this transaction is assigned to "self._transaction" in resolve(). This
52b19a
    is where the last N (back-)references on the DnfSack object come from.
52b19a
52b19a
(d) Now, to quote the defintion of the DnfSack object
52b19a
    ("libdnf/docs/hawkey/tutorial-py.rst"):
52b19a
52b19a
> *Sack* is an abstraction for a collection of packages.
52b19a
52b19a
    That's why the DnfSack object references all the Pkg1 through PkgN
52b19a
    packages.
52b19a
52b19a
So, when the dnf.Base.reset() method completes, the picture changes like
52b19a
this:
52b19a
52b19a
     _sack                     _goal
52b19a
        |                        |
52b19a
   -- [CUT] --              -- [CUT] --
52b19a
        |                        |
52b19a
        v                |       v
52b19a
   +----------------+   [C]  +-------------+
52b19a
   | DnfSack object | <-[U]- | Goal object |
52b19a
   +----------------+   [T]  +-------------+
52b19a
     |^    |^    |^      |
52b19a
     ||    ||    ||
52b19a
     ||    ||    ||         |
52b19a
  +--||----||----||---+    [C]
52b19a
  |  v|    v|    v|   | <--[U]-- _transaction
52b19a
  | Pkg1  Pkg2  PkgN  |    [T]
52b19a
  |                   |     |
52b19a
  | Transaction oject |
52b19a
  +-------------------+
52b19a
52b19a
and we are left with N reference cycles (one between each package and the
52b19a
same DnfSack object).
52b19a
52b19a
This set of cycles can only be cleaned up by Python's generational garbage
52b19a
collector <https://stackify.com/python-garbage-collection/>. The GC will
52b19a
collect the DnfSack object, and consequently close the libsolv page file
52b19a
descriptors via dnf_sack_finalize() -- but garbage collection will happen
52b19a
*only eventually*, unpredictably.
52b19a
52b19a
This means that the dnf.Base.reset() method breaks its interface contract:
52b19a
52b19a
> Make the Base object forget about various things.
52b19a
52b19a
because the libsolv file descriptors can (and frequently do, in practice)
52b19a
survive dnf.Base.reset().
52b19a
52b19a
In general, as long as the garbage collector only tracks process-private
52b19a
memory blocks, there's nothing wrong; however, file descriptors are
52b19a
visible to the kernel. When dnf.Base.reset() *temporarily* leaks file
52b19a
descriptors as explained above, then immediately subsequent operations
52b19a
that depend on those file descriptors having been closed, can fail.
52b19a
52b19a
An example is livecd-creator's unmounting of:
52b19a
52b19a
  /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf
52b19a
52b19a
which the kernel refuses, due to libsolv's still open file descriptors
52b19a
pointing into that filesystem:
52b19a
52b19a
> umount: /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf: target
52b19a
> is busy.
52b19a
> Unable to unmount /var/tmp/imgcreate-mytcghah/install_root/var/cache/dnf
52b19a
> normally, using lazy unmount
52b19a
52b19a
(Unfortunately, the whole lazy umount idea is misguided in livecd-tools;
52b19a
it's a misfeature that should be removed, as it permits the corruption of
52b19a
the loop-backed filesystem. Now that the real bug is being fixed in DNF,
52b19a
lazy umount is not needed as a (broken) workaround in livecd-tools. But
52b19a
that's a separate patch for livecd-tools:
52b19a
<https://github.com/livecd-tools/livecd-tools/pull/227>.)
52b19a
52b19a
Plug the fd leak by forcing a garbage collection in dnf.Base.reset()
52b19a
whenever we cut the "_sack", "_goal" and "_transaction" links -- that is,
52b19a
when the "sack" and "goal" parameters are True.
52b19a
52b19a
Note that precisely due to the unpredictable behavior of the garbage
52b19a
collector, reproducing the bug may prove elusive. In order to reproduce it
52b19a
deterministically, through usage with livecd-creator, disabling automatic
52b19a
garbage collection with the following patch (for livecd-tools) is
52b19a
sufficient:
52b19a
52b19a
> diff --git a/tools/livecd-creator b/tools/livecd-creator
52b19a
> index 291de10cbbf9..8d2c740c238b 100755
52b19a
> --- a/tools/livecd-creator
52b19a
> +++ b/tools/livecd-creator
52b19a
> @@ -31,6 +31,8 @@ from dnf.exceptions import Error as DnfBaseError
52b19a
>  import imgcreate
52b19a
>  from imgcreate.errors import KickstartError
52b19a
>
52b19a
> +import gc
52b19a
> +
52b19a
>  class Usage(Exception):
52b19a
>      def __init__(self, msg = None, no_error = False):
52b19a
>          Exception.__init__(self, msg, no_error)
52b19a
> @@ -261,5 +263,6 @@ def do_nss_libs_hack():
52b19a
>      return hack
52b19a
>
52b19a
>  if __name__ == "__main__":
52b19a
> +    gc.disable()
52b19a
>      hack = do_nss_libs_hack()
52b19a
>      sys.exit(main())
52b19a
52b19a
Also note that you need to use livecd-tools at git commit 4afde9352e82 or
52b19a
later, for this fix to make any difference: said commit fixes a different
52b19a
(independent) bug in livecd-tools that produces identical symptoms, but
52b19a
from a different origin. In other words, if you don't have commit
52b19a
4afde9352e82 in your livecd-tools install, then said bug in livecd-tools
52b19a
will mask this DNF fix.
52b19a
52b19a
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
52b19a
(cherry picked from commit 5ce5ed1ea08ad6e198c1c1642c4d9ea2db6eab86)
52b19a
---
52b19a
 dnf/base.py | 41 +++++++++++++++++++++++++++++++++++++++++
52b19a
 1 file changed, 41 insertions(+)
52b19a
52b19a
diff --git a/dnf/base.py b/dnf/base.py
52b19a
index babca31d..852fcdd8 100644
52b19a
--- a/dnf/base.py
52b19a
+++ b/dnf/base.py
52b19a
@@ -72,6 +72,7 @@ import dnf.transaction
52b19a
 import dnf.util
52b19a
 import dnf.yum.rpmtrans
52b19a
 import functools
52b19a
+import gc
52b19a
 import hawkey
52b19a
 import itertools
52b19a
 import logging
52b19a
@@ -568,6 +569,46 @@ class Base(object):
52b19a
             self._comps_trans = dnf.comps.TransactionBunch()
52b19a
             self._transaction = None
52b19a
         self._update_security_filters = []
52b19a
+        if sack and goal:
52b19a
+            # We've just done this, above:
52b19a
+            #
52b19a
+            #      _sack                     _goal
52b19a
+            #         |                        |
52b19a
+            #    -- [CUT] --              -- [CUT] --
52b19a
+            #         |                        |
52b19a
+            #         v                |       v
52b19a
+            #    +----------------+   [C]  +-------------+
52b19a
+            #    | DnfSack object | <-[U]- | Goal object |
52b19a
+            #    +----------------+   [T]  +-------------+
52b19a
+            #      |^    |^    |^      |
52b19a
+            #      ||    ||    ||
52b19a
+            #      ||    ||    ||         |
52b19a
+            #   +--||----||----||---+    [C]
52b19a
+            #   |  v|    v|    v|   | <--[U]-- _transaction
52b19a
+            #   | Pkg1  Pkg2  PkgN  |    [T]
52b19a
+            #   |                   |     |
52b19a
+            #   | Transaction oject |
52b19a
+            #   +-------------------+
52b19a
+            #
52b19a
+            # At this point, the DnfSack object would be released only
52b19a
+            # eventually, by Python's generational garbage collector, due to the
52b19a
+            # cyclic references DnfSack<->Pkg1 ... DnfSack<->PkgN.
52b19a
+            #
52b19a
+            # The delayed release is a problem: the DnfSack object may
52b19a
+            # (indirectly) own "page file" file descriptors in libsolv, via
52b19a
+            # libdnf. For example,
52b19a
+            #
52b19a
+            #   sack->priv->pool->repos[1]->repodata[1]->store.pagefd = 7
52b19a
+            #   sack->priv->pool->repos[1]->repodata[2]->store.pagefd = 8
52b19a
+            #
52b19a
+            # These file descriptors are closed when the DnfSack object is
52b19a
+            # eventually released, that is, when dnf_sack_finalize() (in libdnf)
52b19a
+            # calls pool_free() (in libsolv).
52b19a
+            #
52b19a
+            # We need that to happen right now, as callers may want to unmount
52b19a
+            # the filesystems which those file descriptors refer to immediately
52b19a
+            # after reset() returns. Therefore, force a garbage collection here.
52b19a
+            gc.collect()
52b19a
 
52b19a
     def _closeRpmDB(self):
52b19a
         """Closes down the instances of rpmdb that could be open."""
52b19a
-- 
52b19a
2.35.1
52b19a