Blob Blame History Raw
From be3448ed5d9d59752cff4df8325ee67eb7d41531 Mon Sep 17 00:00:00 2001
From: Xavi Hernandez <xhernandez@redhat.com>
Date: Mon, 19 Jul 2021 06:56:18 +0200
Subject: [PATCH 592/610] md-cache: fix integer signedness mismatch

md-cache uses a mechanism based on a generation number to detect
modifications made by other clients to the entries and invalidate
the cached data.

This generation number is a 32 bit integer. When it overflows,
special management is done to avoid problems. This overflow condition
is tracked with a single bit.

For many fops, when they are received, the overflow bit and the
current generation number are recorded in a single 64-bit value
which is used later in the cbk.

This is the problematic function:

    uint64_t
    __mdc_get_generation(xlator_t *this, struct md_cache *mdc)
    {
        uint64_t gen = 0, rollover;
        struct mdc_conf *conf = NULL;

        conf = this->private;

        gen = GF_ATOMIC_INC(conf->generation);
        if (gen == 0) {
            gf_log("MDC", GF_LOG_NOTICE, "%p Reset 1", mdc);
            mdc->gen_rollover = !mdc->gen_rollover;
            gen = GF_ATOMIC_INC(conf->generation);
            mdc->ia_time = 0;
            mdc->generation = 0;
            mdc->invalidation_time = gen - 1;
        }

        rollover = mdc->gen_rollover;
        gen |= (rollover << 32);
        return gen;
    }

'conf->generation' is declared as an atomic signed 32-bit integer,
and 'gen' is an unsigned 64-bit value. When 'gen' is assigned from
a signed int, the sign bit is extended to fill the high 32 bits of
'gen'. If the counter has overflown the maximum signed positive
value, it will become negative (sign bit = 1).

In this case, when 'rollover' is later combined with 'gen', all the
high bits remain at '1'.

This value is used later in 'mdc_inode_iatt_set_validate' during
callback processing. The overflow condition and generation numbers
from when the operation was received are recovered this way:

    rollover = incident_time >> 32;
    incident_time = (incident_time & 0xffffffff);

('incident_time' is the saved value from '__mdc_get_generation').

So here rollover will be 0xffffffff, when it's expected to be 0
or 1 only. When this is compared later with the cached overflow
bit, it doesn't match, which prevents updating the cached info.

This is bad in general, but it's even worse when an entry is not
cached and 'rollover' is 0xffffffff the first time. When md-cache
doesn't have cached data it assumes it's everything 0. This causes
a mismatch, which sends an invalidation request to the kernel, but
since the 'rollover' doesn't match, the cached data is not updated.
So the next time the cached data is checked, it will also send an
invalidation to the kernel, indefinitely.

This patch fixes two things:

1. The 'generation' field is made unsigned to avoid sign extension.
2. Invalidation requests are only sent if we already had valid cached
   data. Otherwise it doesn't make sense to send an invalidation.

Upstream patch:
> Upstream-patch-link: https://github.com/gluster/glusterfs/pull/2619
> Fixes: #2617
> Change-Id: Ie40e68288cf143e1bc1a40f46da98f51bb2d6864
> Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>

BUG: 1904137
Change-Id: Ie40e68288cf143e1bc1a40f46da98f51bb2d6864
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Reviewed-on: https://code.engineering.redhat.com/gerrit/c/rhs-glusterfs/+/279188
Tested-by: RHGS Build Bot <nigelb@redhat.com>
Reviewed-by: Sunil Kumar Heggodu Gopala Acharya <sheggodu@redhat.com>
---
 xlators/performance/md-cache/src/md-cache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xlators/performance/md-cache/src/md-cache.c b/xlators/performance/md-cache/src/md-cache.c
index bbbee3b..e0256d6 100644
--- a/xlators/performance/md-cache/src/md-cache.c
+++ b/xlators/performance/md-cache/src/md-cache.c
@@ -79,7 +79,7 @@ struct mdc_conf {
     gf_boolean_t cache_statfs;
     struct mdc_statfs_cache statfs_cache;
     char *mdc_xattr_str;
-    gf_atomic_int32_t generation;
+    gf_atomic_uint32_t generation;
 };
 
 struct mdc_local;
@@ -537,7 +537,7 @@ mdc_inode_iatt_set_validate(xlator_t *this, inode_t *inode, struct iatt *prebuf,
             (iatt->ia_mtime_nsec != mdc->md_mtime_nsec) ||
             (iatt->ia_ctime != mdc->md_ctime) ||
             (iatt->ia_ctime_nsec != mdc->md_ctime_nsec)) {
-            if (conf->global_invalidation &&
+            if (conf->global_invalidation && mdc->valid &&
                 (!prebuf || (prebuf->ia_mtime != mdc->md_mtime) ||
                  (prebuf->ia_mtime_nsec != mdc->md_mtime_nsec) ||
                  (prebuf->ia_ctime != mdc->md_ctime) ||
-- 
1.8.3.1