<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/xattr.c, branch linux-6.0.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.0.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.0.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2022-12-31T12:25:42+00:00</updated>
<entry>
<title>fs: don't audit the capability check in simple_xattr_list()</title>
<updated>2022-12-31T12:25:42+00:00</updated>
<author>
<name>Ondrej Mosnacek</name>
<email>omosnace@redhat.com</email>
</author>
<published>2022-11-03T15:12:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7107fb84607de7448de5cb7f1526eb14ec35495f'/>
<id>urn:sha1:7107fb84607de7448de5cb7f1526eb14ec35495f</id>
<content type='text'>
[ Upstream commit e7eda157c4071cd1e69f4b1687b0fbe1ae5e6f46 ]

The check being unconditional may lead to unwanted denials reported by
LSMs when a process has the capability granted by DAC, but denied by an
LSM. In the case of SELinux such denials are a problem, since they can't
be effectively filtered out via the policy and when not silenced, they
produce noise that may hide a true problem or an attack.

Checking for the capability only if any trusted xattr is actually
present wouldn't really address the issue, since calling listxattr(2) on
such node on its own doesn't indicate an explicit attempt to see the
trusted xattrs. Additionally, it could potentially leak the presence of
trusted xattrs to an unprivileged user if they can check for the denials
(e.g. through dmesg).

Therefore, it's best (and simplest) to keep the check unconditional and
instead use ns_capable_noaudit() that will silence any associated LSM
denials.

Fixes: 38f38657444d ("xattr: extract simple_xattr code from tmpfs")
Reported-by: Martin Pitt &lt;mpitt@redhat.com&gt;
Suggested-by: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
Signed-off-by: Ondrej Mosnacek &lt;omosnace@redhat.com&gt;
Reviewed-by: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
Reviewed-by: Paul Moore &lt;paul@paul-moore.com&gt;
Signed-off-by: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>acl: move idmapped mount fixup into vfs_{g,s}etxattr()</title>
<updated>2022-07-15T20:08:59+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2022-07-06T16:30:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0c5fd887d2bb47aa37aa9fb1eb1d1d2abac62972'/>
<id>urn:sha1:0c5fd887d2bb47aa37aa9fb1eb1d1d2abac62972</id>
<content type='text'>
This cycle we added support for mounting overlayfs on top of idmapped mounts.
Recently I've started looking into potential corner cases when trying to add
additional tests and I noticed that reporting for POSIX ACLs is currently wrong
when using idmapped layers with overlayfs mounted on top of it.

I'm going to give a rather detailed explanation to both the origin of the
problem and the solution.

Let's assume the user creates the following directory layout and they have a
rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would
expect files on your host system to be owned. For example, ~/.bashrc for your
regular user would be owned by 1000:1000 and /root/.bashrc would be owned by
0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs
filesystem.

The user chooses to set POSIX ACLs using the setfacl binary granting the user
with uid 4 read, write, and execute permissions for their .bashrc file:

        setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc

Now they to expose the whole rootfs to a container using an idmapped mount. So
they first create:

        mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap}
        mkdir -pv /vol/contpool/ctrover/{over,work}
        chown 10000000:10000000 /vol/contpool/ctrover/{over,work}

The user now creates an idmapped mount for the rootfs:

        mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \
                                      /var/lib/lxc/c2/rootfs \
                                      /vol/contpool/lowermap

This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at
/vol/contpool/lowermap/home/ubuntu/.bashrc.

Assume the user wants to expose these idmapped mounts through an overlayfs
mount to a container.

       mount -t overlay overlay                      \
             -o lowerdir=/vol/contpool/lowermap,     \
                upperdir=/vol/contpool/overmap/over, \
                workdir=/vol/contpool/overmap/work   \
             /vol/contpool/merge

The user can do this in two ways:

(1) Mount overlayfs in the initial user namespace and expose it to the
    container.
(2) Mount overlayfs on top of the idmapped mounts inside of the container's
    user namespace.

Let's assume the user chooses the (1) option and mounts overlayfs on the host
and then changes into a container which uses the idmapping 0:10000000:65536
which is the same used for the two idmapped mounts.

Now the user tries to retrieve the POSIX ACLs using the getfacl command

        getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc

and to their surprise they see:

        # file: vol/contpool/merge/home/ubuntu/.bashrc
        # owner: 1000
        # group: 1000
        user::rw-
        user:4294967295:rwx
        group::r--
        mask::rwx
        other::r--

indicating the the uid wasn't correctly translated according to the idmapped
mount. The problem is how we currently translate POSIX ACLs. Let's inspect the
callchain in this example:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                  |&gt; vfs_getxattr()
                  |  -&gt; __vfs_getxattr()
                  |     -&gt; handler-&gt;get == ovl_posix_acl_xattr_get()
                  |        -&gt; ovl_xattr_get()
                  |           -&gt; vfs_getxattr()
                  |              -&gt; __vfs_getxattr()
                  |                 -&gt; handler-&gt;get() /* lower filesystem callback */
                  |&gt; posix_acl_fix_xattr_to_user()
                     {
                              4 = make_kuid(&amp;init_user_ns, 4);
                              4 = mapped_kuid_fs(&amp;init_user_ns /* no idmapped mount */, 4);
                              /* FAILURE */
                             -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                     }

If the user chooses to use option (2) and mounts overlayfs on top of idmapped
mounts inside the container things don't look that much better:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:10000000:65536

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                  |&gt; vfs_getxattr()
                  |  -&gt; __vfs_getxattr()
                  |     -&gt; handler-&gt;get == ovl_posix_acl_xattr_get()
                  |        -&gt; ovl_xattr_get()
                  |           -&gt; vfs_getxattr()
                  |              -&gt; __vfs_getxattr()
                  |                 -&gt; handler-&gt;get() /* lower filesystem callback */
                  |&gt; posix_acl_fix_xattr_to_user()
                     {
                              4 = make_kuid(&amp;init_user_ns, 4);
                              4 = mapped_kuid_fs(&amp;init_user_ns, 4);
                              /* FAILURE */
                             -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4);
                     }

As is easily seen the problem arises because the idmapping of the lower mount
isn't taken into account as all of this happens in do_gexattr(). But
do_getxattr() is always called on an overlayfs mount and inode and thus cannot
possible take the idmapping of the lower layers into account.

This problem is similar for fscaps but there the translation happens as part of
vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain:

        setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc

The expected outcome here is that we'll receive the cap_net_raw capability as
we are able to map the uid associated with the fscap to 0 within our container.
IOW, we want to see 0 as the result of the idmapping translations.

If the user chooses option (1) we get the following callchain for fscaps:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                   -&gt; vfs_getxattr()
                      -&gt; xattr_getsecurity()
                         -&gt; security_inode_getsecurity()                                       ________________________________
                            -&gt; cap_inode_getsecurity()                                         |                              |
                               {                                                               V                              |
                                        10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000);                     |
                                        10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                  |
                                               /* Expected result is 0 and thus that we own the fscap. */                     |
                                               0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);            |
                               }                                                                                              |
                               -&gt; vfs_getxattr_alloc()                                                                        |
                                  -&gt; handler-&gt;get == ovl_other_xattr_get()                                                    |
                                     -&gt; vfs_getxattr()                                                                        |
                                        -&gt; xattr_getsecurity()                                                                |
                                           -&gt; security_inode_getsecurity()                                                    |
                                              -&gt; cap_inode_getsecurity()                                                      |
                                                 {                                                                            |
                                                                0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);               |
                                                         10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); |
                                                         10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000);    |
                                                         |____________________________________________________________________|
                                                 }
                                                 -&gt; vfs_getxattr_alloc()
                                                    -&gt; handler-&gt;get == /* lower filesystem callback */

And if the user chooses option (2) we get:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:10000000:65536

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                   -&gt; vfs_getxattr()
                      -&gt; xattr_getsecurity()
                         -&gt; security_inode_getsecurity()                                                _______________________________
                            -&gt; cap_inode_getsecurity()                                                  |                             |
                               {                                                                        V                             |
                                       10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0);                           |
                                       10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000);                           |
                                               /* Expected result is 0 and thus that we own the fscap. */                             |
                                              0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000);                     |
                               }                                                                                                      |
                               -&gt; vfs_getxattr_alloc()                                                                                |
                                  -&gt; handler-&gt;get == ovl_other_xattr_get()                                                            |
                                    |-&gt; vfs_getxattr()                                                                                |
                                        -&gt; xattr_getsecurity()                                                                        |
                                           -&gt; security_inode_getsecurity()                                                            |
                                              -&gt; cap_inode_getsecurity()                                                              |
                                                 {                                                                                    |
                                                                 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0);                      |
                                                          10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0);        |
                                                                 0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); |
                                                                 |____________________________________________________________________|
                                                 }
                                                 -&gt; vfs_getxattr_alloc()
                                                    -&gt; handler-&gt;get == /* lower filesystem callback */

We can see how the translation happens correctly in those cases as the
conversion happens within the vfs_getxattr() helper.

For POSIX ACLs we need to do something similar. However, in contrast to fscaps
we cannot apply the fix directly to the kernel internal posix acl data
structure as this would alter the cached values and would also require a rework
of how we currently deal with POSIX ACLs in general which almost never take the
filesystem idmapping into account (the noteable exception being FUSE but even
there the implementation is special) and instead retrieve the raw values based
on the initial idmapping.

The correct values are then generated right before returning to userspace. The
fix for this is to move taking the mount's idmapping into account directly in
vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user().

To this end we split out two small and unexported helpers
posix_acl_getxattr_idmapped_mnt() and posix_acl_setxattr_idmapped_mnt(). The
former to be called in vfs_getxattr() and the latter to be called in
vfs_setxattr().

Let's go back to the original example. Assume the user chose option (1) and
mounted overlayfs on top of idmapped mounts on the host:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:0:4k /* initial idmapping */

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                  |&gt; vfs_getxattr()
                  |  |&gt; __vfs_getxattr()
                  |  |  -&gt; handler-&gt;get == ovl_posix_acl_xattr_get()
                  |  |     -&gt; ovl_xattr_get()
                  |  |        -&gt; vfs_getxattr()
                  |  |           |&gt; __vfs_getxattr()
                  |  |           |  -&gt; handler-&gt;get() /* lower filesystem callback */
                  |  |           |&gt; posix_acl_getxattr_idmapped_mnt()
                  |  |              {
                  |  |                              4 = make_kuid(&amp;init_user_ns, 4);
                  |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                  |  |                       10000004 = from_kuid(&amp;init_user_ns, 10000004);
                  |  |                       |_______________________
                  |  |              }                               |
                  |  |                                              |
                  |  |&gt; posix_acl_getxattr_idmapped_mnt()           |
                  |     {                                           |
                  |                                                 V
                  |             10000004 = make_kuid(&amp;init_user_ns, 10000004);
                  |             10000004 = mapped_kuid_fs(&amp;init_user_ns /* no idmapped mount */, 10000004);
                  |             10000004 = from_kuid(&amp;init_user_ns, 10000004);
                  |     }       |_________________________________________________
                  |                                                              |
                  |                                                              |
                  |&gt; posix_acl_fix_xattr_to_user()                               |
                     {                                                           V
                                 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                        /* SUCCESS */
                                        4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004);
                     }

And similarly if the user chooses option (1) and mounted overayfs on top of
idmapped mounts inside the container:

        idmapped mount /vol/contpool/merge:      0:10000000:65536
        caller's idmapping:                      0:10000000:65536
        overlayfs idmapping (ofs-&gt;creator_cred): 0:10000000:65536

        sys_getxattr()
        -&gt; path_getxattr()
           -&gt; getxattr()
              -&gt; do_getxattr()
                  |&gt; vfs_getxattr()
                  |  |&gt; __vfs_getxattr()
                  |  |  -&gt; handler-&gt;get == ovl_posix_acl_xattr_get()
                  |  |     -&gt; ovl_xattr_get()
                  |  |        -&gt; vfs_getxattr()
                  |  |           |&gt; __vfs_getxattr()
                  |  |           |  -&gt; handler-&gt;get() /* lower filesystem callback */
                  |  |           |&gt; posix_acl_getxattr_idmapped_mnt()
                  |  |              {
                  |  |                              4 = make_kuid(&amp;init_user_ns, 4);
                  |  |                       10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4);
                  |  |                       10000004 = from_kuid(&amp;init_user_ns, 10000004);
                  |  |                       |_______________________
                  |  |              }                               |
                  |  |                                              |
                  |  |&gt; posix_acl_getxattr_idmapped_mnt()           |
                  |     {                                           V
                  |             10000004 = make_kuid(&amp;init_user_ns, 10000004);
                  |             10000004 = mapped_kuid_fs(&amp;init_user_ns /* no idmapped mount */, 10000004);
                  |             10000004 = from_kuid(0(&amp;init_user_ns, 10000004);
                  |             |_________________________________________________
                  |     }                                                        |
                  |                                                              |
                  |&gt; posix_acl_fix_xattr_to_user()                               |
                     {                                                           V
                                 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004);
                                        /* SUCCESS */
                                        4 = from_kuid(0:10000000:65536 /* caller's idmappings */, 10000004);
                     }

The last remaining problem we need to fix here is ovl_get_acl(). During
ovl_permission() overlayfs will call:

        ovl_permission()
        -&gt; generic_permission()
           -&gt; acl_permission_check()
              -&gt; check_acl()
                 -&gt; get_acl()
                    -&gt; inode-&gt;i_op-&gt;get_acl() == ovl_get_acl()
                        &gt; get_acl() /* on the underlying filesystem)
                          -&gt;inode-&gt;i_op-&gt;get_acl() == /*lower filesystem callback */
                 -&gt; posix_acl_permission()

passing through the get_acl request to the underlying filesystem. This will
retrieve the acls stored in the lower filesystem without taking the idmapping
of the underlying mount into account as this would mean altering the cached
values for the lower filesystem. So we block using ACLs for now until we
decided on a nice way to fix this. Note this limitation both in the
documentation and in the code.

The most straightforward solution would be to have ovl_get_acl() simply
duplicate the ACLs, update the values according to the idmapped mount and
return it to acl_permission_check() so it can be used in posix_acl_permission()
forgetting them afterwards. This is a bit heavy handed but fairly
straightforward otherwise.

Link: https://github.com/brauner/mount-idmapped/issues/9
Link: https://lore.kernel.org/r/20220708090134.385160-2-brauner@kernel.org
Cc: Seth Forshee &lt;sforshee@digitalocean.com&gt;
Cc: Amir Goldstein &lt;amir73il@gmail.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Cc: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Cc: linux-unionfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee &lt;sforshee@digitalocean.com&gt;
Signed-off-by: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: split off do_getxattr from getxattr</title>
<updated>2022-04-25T00:18:37+00:00</updated>
<author>
<name>Stefan Roesch</name>
<email>shr@fb.com</email>
</author>
<published>2022-04-25T00:13:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c975cad931570004b5f51248424a2a696fb65630'/>
<id>urn:sha1:c975cad931570004b5f51248424a2a696fb65630</id>
<content type='text'>
This splits off do_getxattr function from the getxattr function. This will
allow io_uring to call it from its io worker.

Signed-off-by: Stefan Roesch &lt;shr@fb.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Link: https://lore.kernel.org/r/20220323154420.3301504-3-shr@fb.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>fs: split off setxattr_copy and do_setxattr function from setxattr</title>
<updated>2022-04-25T00:18:37+00:00</updated>
<author>
<name>Stefan Roesch</name>
<email>shr@fb.com</email>
</author>
<published>2022-04-25T00:10:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1a91794ce8481a293c5ef432feb440aee1455619'/>
<id>urn:sha1:1a91794ce8481a293c5ef432feb440aee1455619</id>
<content type='text'>
This splits of the setup part of the function setxattr in its own
dedicated function called setxattr_copy. In addition it also exposes a new
function called do_setxattr for making the setxattr call.

This makes it possible to call these two functions from io_uring in the
processing of an xattr request.

Signed-off-by: Stefan Roesch &lt;shr@fb.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Link: https://lore.kernel.org/r/20220323154420.3301504-2-shr@fb.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>fs: fix acl translation</title>
<updated>2022-04-19T17:19:02+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2022-04-19T13:14:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=705191b03d507744c7e097f78d583621c14988ac'/>
<id>urn:sha1:705191b03d507744c7e097f78d583621c14988ac</id>
<content type='text'>
Last cycle we extended the idmapped mounts infrastructure to support
idmapped mounts of idmapped filesystems (No such filesystem yet exist.).
Since then, the meaning of an idmapped mount is a mount whose idmapping
is different from the filesystems idmapping.

While doing that work we missed to adapt the acl translation helpers.
They still assume that checking for the identity mapping is enough.  But
they need to use the no_idmapping() helper instead.

Note, POSIX ACLs are always translated right at the userspace-kernel
boundary using the caller's current idmapping and the initial idmapping.
The order depends on whether we're coming from or going to userspace.
The filesystem's idmapping doesn't matter at the border.

Consequently, if a non-idmapped mount is passed we need to make sure to
always pass the initial idmapping as the mount's idmapping and not the
filesystem idmapping.  Since it's irrelevant here it would yield invalid
ids and prevent setting acls for filesystems that are mountable in a
userns and support posix acls (tmpfs and fuse).

I verified the regression reported in [1] and verified that this patch
fixes it.  A regression test will be added to xfstests in parallel.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215849 [1]
Fixes: bd303368b776 ("fs: support mapped mounts of mapped filesystems")
Cc: Seth Forshee &lt;sforshee@digitalocean.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 5.17
Cc: &lt;regressions@lists.linux.dev&gt;
Signed-off-by: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>xattr: fix kernel-doc for mnt_userns and vfs xattr helpers</title>
<updated>2021-03-23T10:20:26+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2021-02-16T04:29:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6961fed420146297467efe4bc022458818839a1a'/>
<id>urn:sha1:6961fed420146297467efe4bc022458818839a1a</id>
<content type='text'>
Fix kernel-doc warnings in xattr.c:

../fs/xattr.c:257: warning: Function parameter or member 'mnt_userns' not described in '__vfs_setxattr_locked'
../fs/xattr.c:485: warning: Function parameter or member 'mnt_userns' not described in '__vfs_removexattr_locked'

and fix one function whose kernel-doc was not in the correct format.

Link: https://lore.kernel.org/r/20210216042929.8931-4-rdunlap@infradead.org
Fixes: 71bc356f93a1 ("commoncap: handle idmapped mounts")
Fixes: b1ab7e4b2a88 ("VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx.")
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Cc: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Cc: David P. Quigley &lt;dpquigl@tycho.nsa.gov&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
<entry>
<title>namei: handle idmapped mounts in may_*() helpers</title>
<updated>2021-01-24T13:27:17+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2021-01-21T13:19:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ba73d98745be1c10dc3cce68e8d7b95012d07d05'/>
<id>urn:sha1:ba73d98745be1c10dc3cce68e8d7b95012d07d05</id>
<content type='text'>
The may_follow_link(), may_linkat(), may_lookup(), may_open(),
may_o_create(), may_create_in_sticky(), may_delete(), and may_create()
helpers determine whether the caller is privileged enough to perform the
associated operations. Let them handle idmapped mounts by mapping the
inode or fsids according to the mount's user namespace. Afterwards the
checks are identical to non-idmapped inodes. The patch takes care to
retrieve the mount's user namespace right before performing permission
checks and passing it down into the fileystem so the user namespace
can't change in between by someone idmapping a mount that is currently
not idmapped. If the initial user namespace is passed nothing changes so
non-idmapped mounts will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-13-christian.brauner@ubuntu.com
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: James Morris &lt;jamorris@linux.microsoft.com&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
<entry>
<title>commoncap: handle idmapped mounts</title>
<updated>2021-01-24T13:27:17+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2021-01-21T13:19:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=71bc356f93a1c589fad13f7487258f89c417976e'/>
<id>urn:sha1:71bc356f93a1c589fad13f7487258f89c417976e</id>
<content type='text'>
When interacting with user namespace and non-user namespace aware
filesystem capabilities the vfs will perform various security checks to
determine whether or not the filesystem capabilities can be used by the
caller, whether they need to be removed and so on. The main
infrastructure for this resides in the capability codepaths but they are
called through the LSM security infrastructure even though they are not
technically an LSM or optional. This extends the existing security hooks
security_inode_removexattr(), security_inode_killpriv(),
security_inode_getsecurity() to pass down the mount's user namespace and
makes them aware of idmapped mounts.

In order to actually get filesystem capabilities from disk the
capability infrastructure exposes the get_vfs_caps_from_disk() helper.
For user namespace aware filesystem capabilities a root uid is stored
alongside the capabilities.

In order to determine whether the caller can make use of the filesystem
capability or whether it needs to be ignored it is translated according
to the superblock's user namespace. If it can be translated to uid 0
according to that id mapping the caller can use the filesystem
capabilities stored on disk. If we are accessing the inode that holds
the filesystem capabilities through an idmapped mount we map the root
uid according to the mount's user namespace. Afterwards the checks are
identical to non-idmapped mounts: reading filesystem caps from disk
enforces that the root uid associated with the filesystem capability
must have a mapping in the superblock's user namespace and that the
caller is either in the same user namespace or is a descendant of the
superblock's user namespace. For filesystems that are mountable inside
user namespace the caller can just mount the filesystem and won't
usually need to idmap it. If they do want to idmap it they can create an
idmapped mount and mark it with a user namespace they created and which
is thus a descendant of s_user_ns. For filesystems that are not
mountable inside user namespaces the descendant rule is trivially true
because the s_user_ns will be the initial user namespace.

If the initial user namespace is passed nothing changes so non-idmapped
mounts will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: James Morris &lt;jamorris@linux.microsoft.com&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
<entry>
<title>xattr: handle idmapped mounts</title>
<updated>2021-01-24T13:27:17+00:00</updated>
<author>
<name>Tycho Andersen</name>
<email>tycho@tycho.pizza</email>
</author>
<published>2021-01-21T13:19:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c7c7a1a18af4c3bb7749d33e3df3acdf0a95bbb5'/>
<id>urn:sha1:c7c7a1a18af4c3bb7749d33e3df3acdf0a95bbb5</id>
<content type='text'>
When interacting with extended attributes the vfs verifies that the
caller is privileged over the inode with which the extended attribute is
associated. For posix access and posix default extended attributes a uid
or gid can be stored on-disk. Let the functions handle posix extended
attributes on idmapped mounts. If the inode is accessed through an
idmapped mount we need to map it according to the mount's user
namespace. Afterwards the checks are identical to non-idmapped mounts.
This has no effect for e.g. security xattrs since they don't store uids
or gids and don't perform permission checks on them like posix acls do.

Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: James Morris &lt;jamorris@linux.microsoft.com&gt;
Signed-off-by: Tycho Andersen &lt;tycho@tycho.pizza&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
<entry>
<title>acl: handle idmapped mounts</title>
<updated>2021-01-24T13:27:17+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2021-01-21T13:19:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e65ce2a50cf6af216bea6fd80d771fcbb4c0aaa1'/>
<id>urn:sha1:e65ce2a50cf6af216bea6fd80d771fcbb4c0aaa1</id>
<content type='text'>
The posix acl permission checking helpers determine whether a caller is
privileged over an inode according to the acls associated with the
inode. Add helpers that make it possible to handle acls on idmapped
mounts.

The vfs and the filesystems targeted by this first iteration make use of
posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
translate basic posix access and default permissions such as the
ACL_USER and ACL_GROUP type according to the initial user namespace (or
the superblock's user namespace) to and from the caller's current user
namespace. Adapt these two helpers to handle idmapped mounts whereby we
either map from or into the mount's user namespace depending on in which
direction we're translating.
Similarly, cap_convert_nscap() is used by the vfs to translate user
namespace and non-user namespace aware filesystem capabilities from the
superblock's user namespace to the caller's user namespace. Enable it to
handle idmapped mounts by accounting for the mount's user namespace.

In addition the fileystems targeted in the first iteration of this patch
series make use of the posix_acl_chmod() and, posix_acl_update_mode()
helpers. Both helpers perform permission checks on the target inode. Let
them handle idmapped mounts. These two helpers are called when posix
acls are set by the respective filesystems to handle this case we extend
the -&gt;set() method to take an additional user namespace argument to pass
the mount's user namespace down.

Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
</feed>
