Blob Blame Raw
From 06e1fe32e47b98efaa3598629fb59e5f7791e28d Mon Sep 17 00:00:00 2001
From: Thierry Bordaz <tbordaz@redhat.com>
Date: Wed, 27 Nov 2019 14:04:14 +0100
Subject: [PATCH] Ticket 50745: ns-slapd hangs during CleanAllRUV tests

Bug Description:
	The hang condition:
		- is not systematic
		- occurs in rare case, for example here during the deletion of a replica.
		- a thread is waiting for a dblock that an other thread "forgot" to
		  release.
		- have always existed, at least since 1.4.0 but likely since 1.2.x

	When deleting a replica, the replica is retrieved from
	mapping tree structure (mtnode).
	The replica is also retrieved through the mapping tree
	when writing updates to the changelog.

	When deleting the replica, mapping tree structure is cleared
	after the changelog is deleted (that can take some cycles).
	There is a window where an update can retrieve the replica,
	from the not yet cleared MT, while the changelog being removed.

	At the end, the update will update the changelog that is
	currently removed and keeps an unfree lock in the DB.

Fix description:
	Ideally mapping tree should be protected by a lock but it
	is not done systematically (e.g.  slapi_get_mapping_tree_node).
	Using a lock looks an overkill and can probably introduce
	deadlock and performance hit.
	The idea of the fix is to reduce the window, moving the
	mapping tree clear before the changelog removal.

https://pagure.io/389-ds-base/issue/50745

Reviewed by: Mark Reynolds, Ludwig Krispenz
---
 ldap/servers/plugins/replication/repl5_replica_config.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/ldap/servers/plugins/replication/repl5_replica_config.c b/ldap/servers/plugins/replication/repl5_replica_config.c
index 79b257564..02b36f6ad 100644
--- a/ldap/servers/plugins/replication/repl5_replica_config.c
+++ b/ldap/servers/plugins/replication/repl5_replica_config.c
@@ -757,6 +757,10 @@ replica_config_delete(Slapi_PBlock *pb __attribute__((unused)),
     if (mtnode_ext->replica) {
         /* remove object from the hash */
         r = (Replica *)object_get_data(mtnode_ext->replica);
+        mtnode_ext->replica = NULL;  /* moving it before deleting the CL because
+                                      * deletion can take some time giving the opportunity
+                                      * to an operation to start while CL is deleted
+                                      */
         PR_ASSERT(r);
         /* The changelog for this replica is no longer valid, so we should remove it. */
         slapi_log_err(SLAPI_LOG_WARNING, repl_plugin_name, "replica_config_delete - "
@@ -765,7 +769,6 @@ replica_config_delete(Slapi_PBlock *pb __attribute__((unused)),
                       slapi_sdn_get_dn(replica_get_root(r)));
         cl5DeleteDBSync(r);
         replica_delete_by_name(replica_get_name(r));
-        mtnode_ext->replica = NULL;
     }
 
     PR_Unlock(s_configLock);
-- 
2.21.1