<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/ceph/dir.c, branch v6.1.168</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-25T10:03:08+00:00</updated>
<entry>
<title>ceph: fix i_nlink underrun during async unlink</title>
<updated>2026-03-25T10:03:08+00:00</updated>
<author>
<name>Max Kellermann</name>
<email>max.kellermann@ionos.com</email>
</author>
<published>2025-09-05T21:15:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fcc477a6e8856c8a42b3c9e171724d8d6dfadd06'/>
<id>urn:sha1:fcc477a6e8856c8a42b3c9e171724d8d6dfadd06</id>
<content type='text'>
commit ce0123cbb4a40a2f1bbb815f292b26e96088639f upstream.

During async unlink, we drop the `i_nlink` counter before we receive
the completion (that will eventually update the `i_nlink`) because "we
assume that the unlink will succeed".  That is not a bad idea, but it
races against deletions by other clients (or against the completion of
our own unlink) and can lead to an underrun which emits a WARNING like
this one:

 WARNING: CPU: 85 PID: 25093 at fs/inode.c:407 drop_nlink+0x50/0x68
 Modules linked in:
 CPU: 85 UID: 3221252029 PID: 25093 Comm: php-cgi8.1 Not tainted 6.14.11-cm4all1-ampere #655
 Hardware name: Supermicro ARS-110M-NR/R12SPD-A, BIOS 1.1b 10/17/2023
 pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : drop_nlink+0x50/0x68
 lr : ceph_unlink+0x6c4/0x720
 sp : ffff80012173bc90
 x29: ffff80012173bc90 x28: ffff086d0a45aaf8 x27: ffff0871d0eb5680
 x26: ffff087f2a64a718 x25: 0000020000000180 x24: 0000000061c88647
 x23: 0000000000000002 x22: ffff07ff9236d800 x21: 0000000000001203
 x20: ffff07ff9237b000 x19: ffff088b8296afc0 x18: 00000000f3c93365
 x17: 0000000000070000 x16: ffff08faffcbdfe8 x15: ffff08faffcbdfec
 x14: 0000000000000000 x13: 45445f65645f3037 x12: 34385f6369706f74
 x11: 0000a2653104bb20 x10: ffffd85f26d73290 x9 : ffffd85f25664f94
 x8 : 00000000000000c0 x7 : 0000000000000000 x6 : 0000000000000002
 x5 : 0000000000000081 x4 : 0000000000000481 x3 : 0000000000000000
 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff08727d3f91e8
 Call trace:
  drop_nlink+0x50/0x68 (P)
  vfs_unlink+0xb0/0x2e8
  do_unlinkat+0x204/0x288
  __arm64_sys_unlinkat+0x3c/0x80
  invoke_syscall.constprop.0+0x54/0xe8
  do_el0_svc+0xa4/0xc8
  el0_svc+0x18/0x58
  el0t_64_sync_handler+0x104/0x130
  el0t_64_sync+0x154/0x158

In ceph_unlink(), a call to ceph_mdsc_submit_request() submits the
CEPH_MDS_OP_UNLINK to the MDS, but does not wait for completion.

Meanwhile, between this call and the following drop_nlink() call, a
worker thread may process a CEPH_CAP_OP_IMPORT, CEPH_CAP_OP_GRANT or
just a CEPH_MSG_CLIENT_REPLY (the latter of which could be our own
completion).  These will lead to a set_nlink() call, updating the
`i_nlink` counter to the value received from the MDS.  If that new
`i_nlink` value happens to be zero, it is illegal to decrement it
further.  But that is exactly what ceph_unlink() will do then.

The WARNING can be reproduced this way:

1. Force async unlink; only the async code path is affected.  Having
   no real clue about Ceph internals, I was unable to find out why the
   MDS wouldn't give me the "Fxr" capabilities, so I patched
   get_caps_for_async_unlink() to always succeed.

   (Note that the WARNING dump above was found on an unpatched kernel,
   without this kludge - this is not a theoretical bug.)

2. Add a sleep call after ceph_mdsc_submit_request() so the unlink
   completion gets handled by a worker thread before drop_nlink() is
   called.  This guarantees that the `i_nlink` is already zero before
   drop_nlink() runs.

The solution is to skip the counter decrement when it is already zero,
but doing so without a lock is still racy (TOCTOU).  Since
ceph_fill_inode() and handle_cap_grant() both hold the
`ceph_inode_info.i_ceph_lock` spinlock while set_nlink() runs, this
seems like the proper lock to protect the `i_nlink` updates.

I found prior art in NFS and SMB (using `inode.i_lock`) and AFS (using
`afs_vnode.cb_lock`).  All three have the zero check as well.

Cc: stable@vger.kernel.org
Fixes: 2ccb45462aea ("ceph: perform asynchronous unlink if we have sufficient caps")
Signed-off-by: Max Kellermann &lt;max.kellermann@ionos.com&gt;
Reviewed-by: Viacheslav Dubeyko &lt;Slava.Dubeyko@ibm.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ceph: wait for the first reply of inflight async unlink</title>
<updated>2022-08-02T22:54:12+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2022-05-10T01:47:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4868e537fa867f82e38e37429d61d7bb8357d79b'/>
<id>urn:sha1:4868e537fa867f82e38e37429d61d7bb8357d79b</id>
<content type='text'>
In async unlink case the kclient won't wait for the first reply
from MDS and just drop all the links and unhash the dentry and then
succeeds immediately.

For any new create/link/rename,etc requests followed by using the
same file names we must wait for the first reply of the inflight
unlink request, or the MDS possibly will fail these following
requests with -EEXIST if the inflight async unlink request was
delayed for some reasons.

And the worst case is that for the none async openc request it will
successfully open the file if the CDentry hasn't been unlinked yet,
but later the previous delayed async unlink request will remove the
CDenty. That means the just created file is possiblly deleted later
by accident.

We need to wait for the inflight async unlink requests to finish
when creating new files/directories by using the same file names.

Link: https://tracker.ceph.com/issues/55332
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: fix memory leak in ceph_readdir when note_last_dentry returns error</title>
<updated>2022-03-21T12:35:16+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2022-03-05T11:52:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f639d9867eea647005dc824e0e24f39ffc50d4e4'/>
<id>urn:sha1:f639d9867eea647005dc824e0e24f39ffc50d4e4</id>
<content type='text'>
Reset the last_readdir at the same time, and add a comment explaining
why we don't free last_readdir when dir_emit returns false.

Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: fix comments mentioning i_mutex</title>
<updated>2022-03-01T17:26:36+00:00</updated>
<author>
<name>hongnanli</name>
<email>hongnan.li@linux.alibaba.com</email>
</author>
<published>2022-02-11T03:33:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=810313c5f3f5cff36c342d3e26c13f170c3af044'/>
<id>urn:sha1:810313c5f3f5cff36c342d3e26c13f170c3af044</id>
<content type='text'>
inode-&gt;i_mutex has been replaced with inode-&gt;i_rwsem long ago. Fix
comments still mentioning i_mutex.

Signed-off-by: hongnanli &lt;hongnan.li@linux.alibaba.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: take reference to req-&gt;r_parent at point of assignment</title>
<updated>2021-06-28T22:15:52+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-06-18T17:05:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4c18347238ab5a4ee0e71ca765460d84c75a26b5'/>
<id>urn:sha1:4c18347238ab5a4ee0e71ca765460d84c75a26b5</id>
<content type='text'>
Currently, we set the r_parent pointer but then don't take a reference
to it until we submit the request. If we end up freeing the req before
that point, then we'll do a iput when we shouldn't.

Instead, take the inode reference in the callers, so that it's always
safe to call ceph_mdsc_put_request on the req, even before submission.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Luis Henriques &lt;lhenriques@suse.de&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: eliminate session-&gt;s_gen_ttl_lock</title>
<updated>2021-06-28T22:15:52+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-06-04T16:03:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=52d60f8e18b855d67ecdc4fa34ae1b894d36c7b9'/>
<id>urn:sha1:52d60f8e18b855d67ecdc4fa34ae1b894d36c7b9</id>
<content type='text'>
Turn s_cap_gen field into an atomic_t, and just rely on the fact that we
hold the s_mutex when changing the s_cap_ttl field.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Luis Henriques &lt;lhenriques@suse.de&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: allow ceph_put_mds_session to take NULL or ERR_PTR</title>
<updated>2021-06-28T22:15:52+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-06-09T18:09:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7e65624d32b6e0429b1d3559e5585657f34f74a1'/>
<id>urn:sha1:7e65624d32b6e0429b1d3559e5585657f34f74a1</id>
<content type='text'>
...to simplify some error paths.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Luis Henriques &lt;lhenriques@suse.de&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: fix error handling in ceph_atomic_open and ceph_lookup</title>
<updated>2021-06-22T11:53:28+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-06-02T16:46:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7a971e2c0767b6fc9a77c4108eceff0509c61cdb'/>
<id>urn:sha1:7a971e2c0767b6fc9a77c4108eceff0509c61cdb</id>
<content type='text'>
Commit aa60cfc3f7ee broke the error handling in these functions such
that they don't handle non-ENOENT errors from ceph_mdsc_do_request
properly.

Move the checking of -ENOENT out of ceph_handle_snapdir and into the
callers, and if we get a different error, return it immediately.

Fixes: aa60cfc3f7ee ("ceph: don't use d_add in ceph_handle_snapdir")
Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: don't use d_add in ceph_handle_snapdir</title>
<updated>2021-04-27T21:52:23+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-03-01T13:01:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aa60cfc3f7ee32766766f71e6bfbea963b4f94bc'/>
<id>urn:sha1:aa60cfc3f7ee32766766f71e6bfbea963b4f94bc</id>
<content type='text'>
It's possible ceph_get_snapdir could end up finding a (disconnected)
inode that already exists in the cache. Change the prototype for
ceph_handle_snapdir to return a dentry pointer and have it use
d_splice_alias so we don't end up with an aliased dentry in the cache.

Reported-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: fix fall-through warnings for Clang</title>
<updated>2021-04-27T21:52:22+00:00</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavoars@kernel.org</email>
</author>
<published>2021-03-05T09:59:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fcaddb1d851bf69c94b3046227341d9684e276b1'/>
<id>urn:sha1:fcaddb1d851bf69c94b3046227341d9684e276b1</id>
<content type='text'>
In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple
of warnings by explicitly adding a break and a goto statements instead
of just letting the code fall through to the next case.

URL: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
</feed>
