From 737d077a44899f6222822408c400fcd91939ca5b Mon Sep 17 00:00:00 2001
From: Kotresh HR <khiremat@redhat.com>
Date: Thu, 12 Jul 2018 04:31:01 -0400
Subject: [PATCH 321/325] geo-rep: Fix symlink rename syncing issue
d1681e
Problem:
  Geo-rep sometimes fails to sync the rename of a symlink
  if the I/O is as follows:
d1681e
  1. touch file1
  2. ln -s "./file1" sym_400
  3. mv sym_400 renamed_sym_400
  4. mkdir sym_400
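
  The four steps can be replayed locally with Python's standard `os`
  module. This is a minimal sketch, assuming a POSIX system: a temporary
  directory stands in for the master mount (a hypothetical setup, no
  Gluster volume or geo-rep session is involved), and `reproduce_io` is
  an illustrative helper name. It shows the final master namespace:
  'sym_400' is now a directory and the symlink survives only as
  'renamed_sym_400'.

```python
import os
import tempfile


def reproduce_io(root):
    """Replay the four reproduction steps inside the directory `root`.

    `root` is a hypothetical stand-in for a master mount point; no
    Gluster volume or geo-rep session is involved.
    """
    def path(name):
        return os.path.join(root, name)

    # 1. touch file1
    with open(path("file1"), "w"):
        pass
    # 2. ln -s "./file1" sym_400
    os.symlink("./file1", path("sym_400"))
    # 3. mv sym_400 renamed_sym_400
    os.rename(path("sym_400"), path("renamed_sym_400"))
    # 4. mkdir sym_400 (the old symlink name is reused for a directory)
    os.mkdir(path("sym_400"))


with tempfile.TemporaryDirectory() as root:
    reproduce_io(root)
    renamed = os.path.join(root, "renamed_sym_400")
    print(os.path.islink(renamed), os.readlink(renamed))  # True ./file1
    print(os.path.isdir(os.path.join(root, "sym_400")))  # True
```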
d1681e
  As a result, the file 'renamed_sym_400' fails to sync to the slave.
d1681e
Cause:
  Assume there are three distribute subvolumes (brick1, brick2, brick3).
  The changelogs are recorded as follows for the above I/O pattern.
  Note that the MKDIR is recorded on all bricks.
d1681e
  1. brick1:
     -------
d1681e
     CREATE file1
     SYMLINK sym_400
     RENAME sym_400 renamed_sym_400
     MKDIR sym_400
d1681e
  2. brick2:
     -------
d1681e
     MKDIR sym_400
d1681e
  3. brick3:
     -------
d1681e
     MKDIR sym_400
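
  The per-brick changelogs above can be modelled as plain Python data.
  This is a sketch under stated assumptions: placing file1 and sym_400
  on brick1 is taken from the example, the rule "MKDIR goes to every
  brick, every other entry operation goes to one brick" is a
  simplification for illustration, and `record_changelogs` is a
  hypothetical helper, not part of geo-rep.

```python
BRICKS = ["brick1", "brick2", "brick3"]


def record_changelogs(ops, hashed_brick):
    """Return a per-brick list of recorded entry operations.

    MKDIR is appended to every brick's changelog because directories
    exist on all distribute subvolumes; every other operation goes only
    to `hashed_brick` (an assumption matching the example above).
    """
    changelogs = {brick: [] for brick in BRICKS}
    for op in ops:
        if op[0] == "MKDIR":
            for brick in BRICKS:
                changelogs[brick].append(op)
        else:
            changelogs[hashed_brick].append(op)
    return changelogs


ops = [
    ("CREATE", "file1"),
    ("SYMLINK", "sym_400"),
    ("RENAME", "sym_400", "renamed_sym_400"),
    ("MKDIR", "sym_400"),
]
for brick, log in record_changelogs(ops, "brick1").items():
    print(brick, [" ".join(op) for op in log])
```

  Only brick1 holds the ordered CREATE, SYMLINK, RENAME, MKDIR sequence;
  brick2 and brick3 each see a lone MKDIR that they can replay at any
  time relative to brick1.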
d1681e
  The operations on 'brick1' should be processed sequentially. But
  since MKDIR is recorded on all the bricks, 'brick2' and 'brick3'
  processed MKDIR before 'brick1' did, causing out-of-order syncing:
  the directory sym_400 was created on the slave first.
d1681e
  Then 'brick1' processed its changelog:
d1681e
     CREATE file1 -> succeeds
     SYMLINK sym_400 -> no longer present on the master, ignored
     RENAME sym_400 renamed_sym_400
            While processing RENAME, if the source ('sym_400') is not
            present on the slave, the destination ('renamed_sym_400')
            is created directly. But geo-rep stats only the name
            'sym_400' to confirm the source file's presence. In this
            race, since the name 'sym_400' is present as a directory,
            geo-rep does not create the destination.
            Hence RENAME is ignored.
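
  The failure can be sketched with a toy model of the slave namespace,
  a dict mapping each name to a (gfid, kind, link target) tuple. The
  gfid strings are made up for illustration, and `rename_name_only` is
  a hypothetical, simplified model of the pre-fix RENAME branch: it
  decides whether the source exists by name alone, and the gfid-confirmed
  rename of the real code is reduced to a single comparison.

```python
def rename_name_only(slave, gfid, src, dst, link):
    """Pre-fix RENAME handling in a toy slave namespace (hypothetical).

    `slave` maps name -> (gfid, kind, link_target). As in the old
    `lstat(entry)` test, only the *name* of the source is checked.
    """
    if src not in slave:
        # Source looks absent: create the destination directly from the
        # recorded entry (assumed here to be a symlink).
        slave[dst] = (gfid, "symlink", link)
    elif slave[src][0] == gfid:
        # Simplified stand-in for a gfid-confirmed rename.
        slave[dst] = slave.pop(src)
    # Otherwise the name exists with another gfid and nothing happens:
    # the RENAME is silently dropped.


slave = {}
# brick2/brick3 replay MKDIR sym_400 before brick1 starts.
slave["sym_400"] = ("gfid-dir", "dir", None)
# brick1: CREATE file1 succeeds.
slave["file1"] = ("gfid-file1", "file", None)
# brick1: SYMLINK sym_400 is ignored (no longer present on the master).
# brick1: RENAME sym_400 -> renamed_sym_400
rename_name_only(slave, "gfid-sym", "sym_400", "renamed_sym_400", "./file1")
print(sorted(slave))  # ['file1', 'sym_400']: renamed_sym_400 is missing
```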
d1681e
Fix:
  The fix is to not rely only on a stat of the source name during
  RENAME. Geo-rep should stat the name and, if the name is present,
  check that its gfid is the same as the entry's gfid. Only then can
  it conclude that the source is present.
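
  The check can be sketched in the same toy style. `matching_gfid`
  below is a hypothetical stand-in for the `matching_disk_gfid(gfid,
  entry)` call in the patch: it reads from an in-memory dict (name ->
  (gfid, kind, link target)) instead of the slave mount, and the gfid
  strings are invented. It treats the source as present only when the
  name exists and carries the entry's gfid, so the race now creates
  'renamed_sym_400' while an ordinary rename still moves the entry.

```python
def matching_gfid(slave, gfid, name):
    """True only if `name` exists on the slave AND carries `gfid`."""
    entry = slave.get(name)
    return entry is not None and entry[0] == gfid


def rename_with_gfid_check(slave, gfid, src, dst, link):
    """Post-fix RENAME handling in a toy namespace (hypothetical)."""
    if not matching_gfid(slave, gfid, src):
        # Either the name is missing or it belongs to another object,
        # so the source is treated as absent and the destination is
        # created directly (assumed to be a symlink here).
        slave[dst] = (gfid, "symlink", link)
    else:
        slave[dst] = slave.pop(src)


# Race from the Cause section: sym_400 is already a directory.
slave = {
    "sym_400": ("gfid-dir", "dir", None),
    "file1": ("gfid-file1", "file", None),
}
rename_with_gfid_check(slave, "gfid-sym", "sym_400", "renamed_sym_400", "./file1")
print(sorted(slave))  # ['file1', 'renamed_sym_400', 'sym_400']

# Ordinary case: the source name carries the expected gfid.
normal = {"a": ("gfid-a", "file", None)}
rename_with_gfid_check(normal, "gfid-a", "a", "b", None)
print(sorted(normal))  # ['b']
```

  Checking the name alone fails in the race above; checking the gfid
  alone could be fooled by a hardlink that keeps the gfid alive under a
  different name, which is why both conditions are combined.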
d1681e
>upstream patch : https://review.gluster.org/#/c/20496/
d1681e
Backport of:
 > BUG: 1600405
 > Change-Id: I9fbec4f13ca6a182798a7f81b356fe2003aff969
 > Signed-off-by: Kotresh HR <khiremat@redhat.com>
d1681e
BUG: 1601314
Change-Id: I9fbec4f13ca6a182798a7f81b356fe2003aff969
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: https://code.engineering.redhat.com/gerrit/144104
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Tested-by: RHGS Build Bot <nigelb@redhat.com>
---
 geo-replication/syncdaemon/resource.py | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
d1681e
diff --git a/geo-replication/syncdaemon/resource.py b/geo-replication/syncdaemon/resource.py
index 00e62b7..0d5462a 100644
--- a/geo-replication/syncdaemon/resource.py
+++ b/geo-replication/syncdaemon/resource.py
@@ -674,8 +674,14 @@ class Server(object):
                     collect_failure(e, EEXIST)
             elif op == 'RENAME':
                 en = e['entry1']
-                st = lstat(entry)
-                if isinstance(st, int):
+                # The matching disk gfid check validates two things
+                #  1. Validates name is present, returns false otherwise
+                #  2. Validates gfid is same, returns false otherwise
+                # So both validations are necessary to decide src doesn't
+                # exist. We can't rely on only gfid stat as hardlink could
+                # be present and we can't rely only on name as name could
+                # exist with different gfid.
+                if not matching_disk_gfid(gfid, entry):
                     if e['stat'] and not stat.S_ISDIR(e['stat']['mode']):
                         if stat.S_ISLNK(e['stat']['mode']) and \
                            e['link'] is not None:
@@ -699,6 +705,7 @@ class Server(object):
                                                     [ENOENT, EEXIST], [ESTALE])
                                 collect_failure(e, cmd_ret)
                 else:
+                    st = lstat(entry)
                     st1 = lstat(en)
                     if isinstance(st1, int):
                         rename_with_disk_gfid_confirmation(gfid, entry, en)
-- 
1.8.3.1