From a47d863ea4501d3d0daceacb194c9f900cefe1a7 Mon Sep 17 00:00:00 2001
From: Kotresh HR <khiremat@redhat.com>
Date: Mon, 13 Nov 2017 05:27:50 -0500
Subject: [PATCH 119/128] geo-rep: Fix data sync issue during hardlink, rename

Problem:
Data is not synced to the slave if the master sees the
following I/O:

1. echo "test_data" > f1
2. ln f1 f2
3. mv f2 f3
4. unlink f1

On the master, 'f3' exists with the data "test_data", but on
the slave 'f3' exists only as a zero-byte file without a
backend gfid link.
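
The expected end state on the master can be checked on any local
POSIX filesystem. The sketch below is only an illustration: the
helper name replay_io is hypothetical, no gluster volume or geo-rep
session is involved, and the inode number is used as a loose
stand-in for the gfid.

```python
import os
import tempfile


def replay_io(workdir):
    """Replay the four steps from the problem description in workdir.

    Returns (contents of f3, inode of f1 before unlink, inode of f3).
    """
    f1 = os.path.join(workdir, "f1")
    f2 = os.path.join(workdir, "f2")
    f3 = os.path.join(workdir, "f3")

    with open(f1, "w") as fh:      # 1. echo "test_data" > f1
        fh.write("test_data\n")
    ino_f1 = os.lstat(f1).st_ino
    os.link(f1, f2)                # 2. ln f1 f2
    os.rename(f2, f3)              # 3. mv f2 f3
    os.unlink(f1)                  # 4. unlink f1

    with open(f3) as fh:
        data = fh.read()
    return data, ino_f1, os.lstat(f3).st_ino


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        data, ino_f1, ino_f3 = replay_io(d)
        # f3 should be the only name left, still backed by f1's inode.
        print(repr(data), ino_f1 == ino_f3, sorted(os.listdir(d)))
```

Since 'f3' is just another name for the file originally created as
'f1', it keeps the data after 'f1' is unlinked. That is the state
the slave is expected to reach.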

Cause:
On the master, 'f2' no longer exists, so the hardlink is
skipped during processing. Later, when the rename is synced,
the source ('f2') doesn't exist on the slave, so dst ('f3')
is created with the same gfid. In this case the create
succeeds, but the backend gfid is not linked to 'f3' because
'f1' already exists with the same gfid. Once 'f1' is
unlinked, rsync fails with ENOENT as the backend gfid is not
linked to 'f3'.

Fix:
When processing a rename, if src doesn't exist on the slave,
don't blindly create dst with the same gfid. Check the gfid
first: if it already exists on the slave, create a hardlink
instead of doing a mknod.
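
The decision described above can be sketched in isolation as
follows. This is a simplified illustration, not the code in
resource.py: sync_rename_dst is a hypothetical helper, a plain
directory stands in for the gfid access path (pfx), and the errno
handling that the patch delegates to errno_wrap is left out.

```python
import os
import tempfile


def sync_rename_dst(pfx, gfid, dst):
    """Decide how to materialise dst when the rename source is missing.

    pfx/gfid is a stand-in for the backend gfid path. Returns "link" if
    dst was hardlinked to an existing gfid entry, or "create" if no entry
    with that gfid exists and a fresh one would have to be created.
    """
    slink = os.path.join(pfx, gfid)
    try:
        os.lstat(slink)
    except FileNotFoundError:
        # gfid is not present: the caller would take the mknod path.
        return "create"
    # gfid is already present: add dst as another name for it instead
    # of creating a second entry with the same gfid.
    os.link(slink, dst)
    return "link"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        pfx = os.path.join(d, "gfid_store")  # assumed layout, demo only
        os.mkdir(pfx)
        with open(os.path.join(pfx, "gfid-f1"), "w") as fh:
            fh.write("test_data\n")
        print(sync_rename_dst(pfx, "gfid-f1", os.path.join(d, "f3")))
        print(sync_rename_dst(pfx, "gfid-unknown", os.path.join(d, "f4")))
```

With the hardlink in place, dst shares the existing entry's data, so
a later unlink of the other name does not leave dst without a backing
file.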

Thanks Aravinda for helping in RCA :)

Upstream Reference:
> Patch: https://review.gluster.org/18731
> BUG: 1512483

Change-Id: I5af4f99798ed1bcb297598a4bc796b701d1e0130
BUG: 1512496
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: https://code.engineering.redhat.com/gerrit/126728
Tested-by: RHGS Build Bot <nigelb@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
---
 geo-replication/syncdaemon/resource.py | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/geo-replication/syncdaemon/resource.py b/geo-replication/syncdaemon/resource.py
index 22aaf85..5ad5b97 100644
--- a/geo-replication/syncdaemon/resource.py
+++ b/geo-replication/syncdaemon/resource.py
@@ -814,8 +814,17 @@ class Server(object):
                             elif not matching_disk_gfid(gfid, en):
                                 collect_failure(e, EEXIST, True)
                         else:
-                            (pg, bname) = entry2pb(en)
-                            blob = entry_pack_reg_stat(gfid, bname, e['stat'])
+                            slink = os.path.join(pfx, gfid)
+                            st = lstat(slink)
+                            # don't create multiple entries with same gfid
+                            if isinstance(st, int):
+                                (pg, bname) = entry2pb(en)
+                                blob = entry_pack_reg_stat(gfid, bname,
+                                                           e['stat'])
+                            else:
+                                cmd_ret = errno_wrap(os.link, [slink, en],
+                                                    [ENOENT, EEXIST], [ESTALE])
+                                collect_failure(e, cmd_ret)
                 else:
                     st1 = lstat(en)
                     if isinstance(st1, int):
-- 
1.8.3.1