Blob Blame History Raw
From 8217d00a0a54457961e7ec7d3afb24e953923c7d Mon Sep 17 00:00:00 2001
From: Ashish Pandey <aspandey@redhat.com>
Date: Tue, 13 Mar 2018 14:03:20 +0530
Subject: [PATCH 198/201] cluster/ec: Change default read policy to gfid-hash

Problem:
Whenever we read data from file over NFS, NFS reads
more data then requested and caches it. Based on the
stat information it makes sure that the cached/pre-read
data is valid or not.

Consider 4 + 2 EC volume and all the bricks are on
differnt nodes.

In EC, with round-robin read policy, reads are sent on
different set of data bricks. This way, it balances the
read fops to go on all the bricks and avoid heating UP
(overloading) same set of bricks.

Due to small difference in clock speed, it is possible
that we get minor difference for atime, mtime or ctime
for different bricks. That might cause a different stat
returned to NFS based on which NFS will discard
cached/pre-read data which is actually not changed and
could be used.

Solution:
Change read policy for EC as gfid-hash. That will force
all the read to go to same set of bricks.

>Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
>BUG: 1554743
>Signed-off-by: Ashish Pandey <aspandey@redhat.com>

upstream patch: https://review.gluster.org/#/c/19703/

Change-Id: I43e95717980ca52c228fdcb7863c58bd4d14151c
BUG: 1559084
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://code.engineering.redhat.com/gerrit/133746
Tested-by: RHGS Build Bot <nigelb@redhat.com>
Reviewed-by: Sunil Kumar Heggodu Gopala Acharya <sheggodu@redhat.com>
---
 tests/basic/ec/ec-read-policy.t | 7 +++----
 xlators/cluster/ec/src/ec.c     | 2 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tests/basic/ec/ec-read-policy.t b/tests/basic/ec/ec-read-policy.t
index e4390aa..fe6fe65 100644
--- a/tests/basic/ec/ec-read-policy.t
+++ b/tests/basic/ec/ec-read-policy.t
@@ -20,10 +20,9 @@ TEST $CLI volume start $V0
 TEST glusterfs --direct-io-mode=yes --entry-timeout=0 --attribute-timeout=0 -s $H0 --volfile-id $V0 $M0
 EXPECT_WITHIN $CHILD_UP_TIMEOUT "6" ec_child_up_count $V0 0
 #TEST volume operations work fine
-EXPECT "round-robin" mount_get_option_value $M0 $V0-disperse-0 read-policy
-TEST $CLI volume set $V0 disperse.read-policy gfid-hash
-EXPECT_WITHIN $CONFIG_UPDATE_TIMEOUT "gfid-hash" mount_get_option_value $M0 $V0-disperse-0 read-policy
-TEST $CLI volume reset $V0 disperse.read-policy
+
+EXPECT "gfid-hash" mount_get_option_value $M0 $V0-disperse-0 read-policy
+TEST $CLI volume set $V0 disperse.read-policy round-robin
 EXPECT_WITHIN $CONFIG_UPDATE_TIMEOUT "round-robin" mount_get_option_value $M0 $V0-disperse-0 read-policy
 
 #TEST if the option gives the intended behavior. The way we perform this test
diff --git a/xlators/cluster/ec/src/ec.c b/xlators/cluster/ec/src/ec.c
index 13ce7fb..bfdca64 100644
--- a/xlators/cluster/ec/src/ec.c
+++ b/xlators/cluster/ec/src/ec.c
@@ -1447,7 +1447,7 @@ struct volume_options options[] =
     { .key = {"read-policy" },
       .type = GF_OPTION_TYPE_STR,
       .value = {"round-robin", "gfid-hash"},
-      .default_value = "round-robin",
+      .default_value = "gfid-hash",
       .description = "inode-read fops happen only on 'k' number of bricks in"
               " n=k+m disperse subvolume. 'round-robin' selects the read"
               " subvolume using round-robin algo. 'gfid-hash' selects read"
-- 
1.8.3.1