From 743848de574b660972f457c28c02cbb19c8aa439 Mon Sep 17 00:00:00 2001
From: "T.kabe" <kabe@>
Date: Fri, 3 Mar 2017 17:06:44 +0900
Subject: [PATCH 4/4] vfs: Lazily remove mounts on unlinked files and directories.

[upstream commit 8ed936b5671bfb33d89bc60bdcc7cf0470ba52fe]
[upstream commit 7af1364ffa64db61e386628594836e13d2ef04b5]

commit 8ed936b5671bfb33d89bc60bdcc7cf0470ba52fe
Author: Eric W. Biederman <ebiederman@twitter.com>
Date:   Tue Oct 1 18:33:48 2013 -0700

    vfs: Lazily remove mounts on unlinked files and directories.

    With the introduction of mount namespaces and bind mounts it became
    possible to access files and directories that on some paths are mount
    points but are not mount points on other paths.  It is very confusing
    when rm -rf somedir returns -EBUSY simply because somedir is mounted
    somewhere else.  With the addition of user namespaces allowing
    unprivileged mounts this condition has gone from annoying to allowing
    a DoS attack on other users in the system.

    The possibility for mischief is removed by updating the vfs to support
    rename, unlink and rmdir on a dentry that is a mountpoint and by
    lazily unmounting mountpoints on deleted dentries.

    In particular this change allows rename, unlink and rmdir system calls
    on a dentry without a mountpoint in the current mount namespace to
    succeed, and it allows rename, unlink, and rmdir performed on a
    distributed filesystem to update the vfs cache even when there is a
    mount in some namespace on the original dentry.

    There are two common patterns of maintaining mounts: Mounts on trusted
    paths with the parent directory of the mount point and all ancestor
    directories up to / owned by root and modifiable only by root
    (i.e. /media/xxx, /dev, /dev/pts, /proc, /sys, /sys/fs/cgroup/{cpu,
    cpuacct, ...}, /usr, /usr/local).  Mounts on unprivileged directories
    maintained by fusermount.

    In the case of mounts in trusted directories owned by root and
    modifiable only by root the current parent directory permissions are
    sufficient to ensure a mount point on a trusted path is not removed
    or renamed by anyone other than root, even if there is a context
    where there are no mount points to prevent this.

    In the case of mounts in directories owned by less privileged users
    races with users modifying the path of a mount point are already a
    danger.  fusermount already uses a combination of chdir,
    /proc/<pid>/fd/NNN, and UMOUNT_NOFOLLOW to prevent these races.  The
    removal of global rename, unlink, and rmdir protection really adds
    nothing new to consider, only a widening of the attack window, and
    fusermount is already safe against unprivileged users modifying the
    directory simultaneously.
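
The UMOUNT_NOFOLLOW part of that defence can be sketched from userspace. This is a minimal illustration, not fusermount's actual code: `unmount_nofollow` is a hypothetical helper name, and the only real interfaces assumed are umount2(2) and its UMOUNT_NOFOLLOW flag on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

#ifndef UMOUNT_NOFOLLOW
#define UMOUNT_NOFOLLOW 0x00000008	/* value from the Linux UAPI headers */
#endif

/*
 * Hypothetical helper, not taken from fusermount: unmount `path` while
 * refusing to follow a symlink in its final component, so a user who
 * swaps the mount point for a symlink cannot redirect the unmount.
 * Returns 0 on success or a negative errno value on failure.
 */
static int unmount_nofollow(const char *path)
{
	assert(path != NULL);

	if (umount2(path, UMOUNT_NOFOLLOW) == -1)
		return -errno;
	return 0;
}
```

An unprivileged caller gets -EPERM from this helper, and a privileged caller passing a path that is not a mount point gets -EINVAL; both are ordinary umount2(2) failures rather than signs of a race.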

    In principle for perfect userspace programs returning -EBUSY for
    unlink, rmdir, and rename of dentries that have mounts in the local
    namespace is actually unnecessary.  Unfortunately not all userspace
    programs are perfect so retaining -EBUSY for unlink, rmdir and rename
    of dentries that have mounts in the current mount namespace plays an
    important role of maintaining consistency with historical behavior and
    making imperfect userspace applications hard to exploit.

    v2: Remove spurious old_dentry.
    v3: Optimized shrink_submounts_and_drop
        Removed unused afs label
    v4: Simplified the changes to check_submounts_and_drop
        Do not rename check_submounts_and_drop to shrink_submounts_and_drop
        Document why we need atomicity in check_submounts_and_drop
        Rely on the parent inode mutex to make d_revalidate and d_invalidate
        an atomic unit.
    v5: Refcount the mountpoint to detach in case of simultaneous
        renames.

    Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 7af1364ffa64db61e386628594836e13d2ef04b5
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Fri Oct 4 19:15:13 2013 -0700

    vfs: Don't allow overwriting mounts in the current mount namespace

    In preparation for allowing mountpoints to be renamed and unlinked
    in remote filesystems and in other mount namespaces, test if there is
    a mount on a dentry in the local mount namespace before allowing it to
    be renamed or unlinked.

    The primary motivation here is old versions of fusermount unmount,
    which is not safe if a path can be renamed or unlinked while it is
    verifying the mount is safe to unmount.  More recent versions are simpler
    and safer by simply using UMOUNT_NOFOLLOW when unmounting a mount
    in a directory owned by an arbitrary user.

    Miklos Szeredi <miklos@szeredi.hu> reports this approach is good
    enough to remove concerns about new kernels mixed with old versions
    of fusermount.

    A secondary motivation for restrictions here is that removing empty
    directories that have non-empty mount points on them appears to
    violate the rule that rmdir can not remove non-empty directories.  As
    Linus Torvalds pointed out this is useful for programs (like git) that
    test if a directory is empty with rmdir.
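
The rmdir-based emptiness test mentioned above can be sketched as follows. This illustrates the idiom only and is not git's actual code: `remove_if_empty` and `demo_remove_if_empty` are hypothetical names, and the scratch directory under /tmp is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: use rmdir(2) itself as the emptiness test.
 * Returns 1 if `path` was empty and has been removed, 0 if it was kept
 * because it is non-empty or busy (for example a local mount point),
 * and -1 on any other error.
 */
static int remove_if_empty(const char *path)
{
	assert(path != NULL);

	if (rmdir(path) == 0)
		return 1;
	if (errno == ENOTEMPTY || errno == EEXIST || errno == EBUSY)
		return 0;
	return -1;
}

/* Usage: returns 0 if the probe behaves as described on a scratch dir. */
static int demo_remove_if_empty(void)
{
	char dir[] = "/tmp/rmdir-probe-XXXXXX";
	char file[sizeof(dir) + 8];
	FILE *fp;

	if (mkdtemp(dir) == NULL)
		return -1;
	snprintf(file, sizeof(file), "%s/entry", dir);
	fp = fopen(file, "w");
	if (fp == NULL)
		return -1;
	fclose(fp);

	if (remove_if_empty(dir) != 0)	/* non-empty: must be kept */
		return -1;
	if (unlink(file) != 0)
		return -1;
	if (remove_if_empty(dir) != 1)	/* now empty: must be removed */
		return -1;
	return 0;
}
```

A program written this way relies on rmdir failing for a directory that still has visible contents, which is why the patch keeps returning -EBUSY for mount points in the caller's own mount namespace.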

    Therefore this patch arranges to enforce the existing mount point
    semantics for the local mount namespace.

    v2: Rewrote the test to be a drop in replacement for d_mountpoint
    v3: Use bool instead of int as the return type of is_local_mountpoint

    Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
    Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
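The fast-path/slow-path split used by is_local_mountpoint() in this patch can be modelled in userspace. The sketch below is not kernel code: the `toy_*` types and function are hypothetical stand-ins for struct dentry, struct mount and struct mnt_namespace, and the namespace_sem locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dentry; `mounted` models d_mountpoint(). */
struct toy_dentry {
	bool mounted;	/* set when any namespace has a mount on this dentry */
};

/* Hypothetical stand-in for struct mount, linked per namespace. */
struct toy_mount {
	const struct toy_dentry *mountpoint;	/* dentry this mount covers */
	const struct toy_mount *next;		/* next mount in the namespace */
};

/* Hypothetical stand-in for struct mnt_namespace. */
struct toy_namespace {
	const struct toy_mount *mounts;
};

static bool toy_is_local_mountpoint(const struct toy_namespace *ns,
				    const struct toy_dentry *dentry)
{
	const struct toy_mount *mnt;

	assert(ns != NULL && dentry != NULL);

	/* Fast path: most dentries are not mount points anywhere. */
	if (!dentry->mounted)
		return false;

	/* Slow path: is one of this namespace's mounts on the dentry? */
	for (mnt = ns->mounts; mnt != NULL; mnt = mnt->next)
		if (mnt->mountpoint == dentry)
			return true;
	return false;
}
```

In this model a dentry that is mounted only in some other namespace has `mounted` set but fails the slow-path walk, which is the case where the patch lets rename, unlink and rmdir proceed and detaches the remote mounts lazily.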
 fs/dcache.c    | 69 +++++++++++++++++++++++++++++++---------------------------
 fs/mount.h     |  9 ++++++++
 fs/namei.c     | 16 +++++++++-----
 fs/namespace.c | 35 +++++++++++++++++++++++++++++
 4 files changed, 91 insertions(+), 38 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 5dabe0e..a3e9e7a 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1285,36 +1285,38 @@ void shrink_dcache_parent(struct dentry *parent)
 }
 EXPORT_SYMBOL(shrink_dcache_parent);
 
-static enum d_walk_ret check_and_collect(void *_data, struct dentry *dentry)
+struct detach_data {
+	struct select_data select;
+	struct dentry *mountpoint;
+};
+static enum d_walk_ret detach_and_collect(void *_data, struct dentry *dentry)
 {
-	struct select_data *data = _data;
-
-	if (d_mountpoint(dentry)) {
-		data->found = -EBUSY;
-		return D_WALK_QUIT;
-	}
+	struct detach_data *data = _data;
 
-	return select_collect(_data, dentry);
-}
+ 	if (d_mountpoint(dentry)) {
+		__dget_dlock(dentry);
+		data->mountpoint = dentry;
+ 		return D_WALK_QUIT;
+ 	}
+	return select_collect(&data->select, dentry);
+ }
 
 static void check_and_drop(void *_data)
 {
-	struct select_data *data = _data;
+	struct detach_data *data = _data;
 
-	if (d_mountpoint(data->start))
-		data->found = -EBUSY;
-	if (!data->found)
-		__d_drop(data->start);
+	if (!data->mountpoint && !data->select.found)
+		__d_drop(data->select.start);
 }
 
 /**
- * check_submounts_and_drop - prune dcache, check for submounts and drop
+ * check_submounts_and_drop - detach submounts, prune dcache, and drop
  *
- * All done as a single atomic operation relative to has_unlinked_ancestor().
- * Returns 0 if successfully unhashed @parent.  If there were submounts then
- * return -EBUSY.
+ * The final d_drop is done as an atomic operation relative to
+ * rename_lock ensuring there are no races with d_set_mounted.  This
+ * ensures there are no unhashed dentries on the path to a mountpoint.
  *
- * @dentry: dentry to prune and drop
+ * @dentry: dentry to detach, prune and drop
  */
 int check_submounts_and_drop(struct dentry *dentry)
 {
@@ -1327,19 +1329,24 @@ int check_submounts_and_drop(struct dentry *dentry)
 	}
 
 	for (;;) {
-		struct select_data data;
+		struct detach_data data;
 
-		INIT_LIST_HEAD(&data.dispose);
-		data.start = dentry;
-		data.found = 0;
+		data.mountpoint = NULL;
+		INIT_LIST_HEAD(&data.select.dispose);
+		data.select.start = dentry;
+		data.select.found = 0;
 
-		d_walk(dentry, &data, check_and_collect, check_and_drop);
-		ret = data.found;
+		d_walk(dentry, &data, detach_and_collect, check_and_drop);
 
-		if (!list_empty(&data.dispose))
-			shrink_dentry_list(&data.dispose);
+		if (data.select.found)
+			shrink_dentry_list(&data.select.dispose);
 
-		if (ret <= 0)
+		if (data.mountpoint) {
+			detach_mounts(data.mountpoint);
+			dput(data.mountpoint);
+		}
+
+		if (!data.mountpoint && !data.select.found)
 			break;
 
 		cond_resched();
@@ -2554,10 +2561,8 @@ static struct dentry *__d_unalias(struct inode *inode,
 		goto out_err;
 	m2 = &alias->d_parent->d_inode->i_mutex;
 out_unalias:
-	if (likely(!d_mountpoint(alias))) {
-		__d_move(alias, dentry, false);
-		ret = alias;
-	}
+	__d_move(alias, dentry, false);
+	ret = alias;
 out_err:
 	spin_unlock(&inode->i_lock);
 	if (m2)
diff --git a/fs/mount.h b/fs/mount.h
index 9959119..a373c86 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -107,3 +107,12 @@ struct proc_mounts {
 #define proc_mounts(p) (container_of((p), struct proc_mounts, m))
 
 extern const struct seq_operations mounts_op;
+
+extern bool __is_local_mountpoint(struct dentry *dentry);
+static inline bool is_local_mountpoint(struct dentry *dentry)
+{
+	if (!d_mountpoint(dentry))
+		return false;
+
+	return __is_local_mountpoint(dentry);
+}
diff --git a/fs/namei.c b/fs/namei.c
index 872e5e5..ef70aa8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3691,8 +3691,8 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
 	mutex_lock(&dentry->d_inode->i_mutex);
 
 	error = -EBUSY;
-	if (d_mountpoint(dentry))
-		goto out;
+ 	if (is_local_mountpoint(dentry))
+ 		goto out;
 
 	error = security_inode_rmdir(dir, dentry);
 	if (error)
@@ -3705,6 +3705,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
 
 	dentry->d_inode->i_flags |= S_DEAD;
 	dont_mount(dentry);
+	detach_mounts(dentry);
 
 out:
 	mutex_unlock(&dentry->d_inode->i_mutex);
@@ -3806,7 +3807,7 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegate
 		return -EPERM;
 
 	mutex_lock(&target->i_mutex);
-	if (d_mountpoint(dentry))
+	if (is_local_mountpoint(dentry))
 		error = -EBUSY;
 	else {
 		error = security_inode_unlink(dir, dentry);
@@ -3815,8 +3816,10 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegate
 			if (error)
 				goto out;
 			error = dir->i_op->unlink(dir, dentry);
-			if (!error)
+			if (!error) {
 				dont_mount(dentry);
+				detach_mounts(dentry);
+			}
 		}
 	}
 out:
@@ -4254,8 +4257,8 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		mutex_lock(&target->i_mutex);
 
 	error = -EBUSY;
-	if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
-		goto out;
+ 	if (is_local_mountpoint(old_dentry) || is_local_mountpoint(new_dentry))
+ 		goto out;
 
 	if (max_links && new_dir != old_dir) {
 		error = -EMLINK;
@@ -4292,6 +4295,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		if (is_dir)
 			target->i_flags |= S_DEAD;
 		dont_mount(new_dentry);
+		detach_mounts(new_dentry);
 	}
 	if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) {
 		if (!(flags & RENAME_EXCHANGE))
diff --git a/fs/namespace.c b/fs/namespace.c
index e48fed3..d633562 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -625,6 +625,41 @@ static struct mountpoint *lookup_mountpoint(struct dentry *dentry)
 	return NULL;
 }
 
+/*
+ * __is_local_mountpoint - Test to see if dentry is a mountpoint in the
+ *                         current mount namespace.
+ *
+ * The common case is dentries are not mountpoints at all and that
+ * test is handled inline.  For the slow case when we are actually
+ * dealing with a mountpoint of some kind, walk through all of the
+ * mounts in the current mount namespace and test to see if the dentry
+ * is a mountpoint.
+ *
+ * The mount_hashtable is not usable in the context because we
+ * need to identify all mounts that may be in the current mount
+ * namespace not just a mount that happens to have some specified
+ * parent mount.
+ */
+bool __is_local_mountpoint(struct dentry *dentry)
+{
+	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
+	struct mount *mnt;
+	bool is_covered = false;
+
+	if (!d_mountpoint(dentry))
+		goto out;
+
+	down_read(&namespace_sem);
+	list_for_each_entry(mnt, &ns->list, mnt_list) {
+		is_covered = (mnt->mnt_mountpoint == dentry);
+		if (is_covered)
+			break;
+	}
+	up_read(&namespace_sem);
+out:
+	return is_covered;
+}
+
 static struct mountpoint *new_mountpoint(struct dentry *dentry)
 {
 	struct list_head *chain = mountpoint_hashtable + hash(NULL, dentry);
-- 
1.8.3.1