<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/ceph/locks.c, branch v7.2-rc1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-06-22T20:45:00+00:00</updated>
<entry>
<title>ceph: add client reset state machine and session teardown</title>
<updated>2026-06-22T20:45:00+00:00</updated>
<author>
<name>Alex Markuze</name>
<email>amarkuze@redhat.com</email>
</author>
<published>2026-05-07T09:11:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b52861897be8a7ba563077ae62cd48158d16a5d9'/>
<id>urn:sha1:b52861897be8a7ba563077ae62cd48158d16a5d9</id>
<content type='text'>
Add the client-side reset state machine, request gating, and manual
session teardown implementation.

Manual reset is an operator-triggered escape hatch for client/MDS
stalemates in which caps, locks, or unsafe metadata state stop making
forward progress.  The reset blocks new metadata work, attempts a
bounded best-effort drain of dirty client state while sessions are
still alive, and finally asks the MDS to close sessions before tearing
local session state down directly.

The reset state machine tracks four phases: IDLE -&gt; QUIESCING -&gt;
DRAINING -&gt; TEARDOWN -&gt; IDLE.  QUIESCING is set synchronously by
schedule_reset() before the workqueue item is dispatched, so that new
metadata requests and file-lock acquisitions are gated immediately --
even before the work function begins running.  All non-IDLE phases
block callers on blocked_wq, preventing races with session teardown.
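
The phase cycle and the gating rule above can be modelled in a few
lines.  This is a userspace sketch in Python, not the kernel code: the
names ResetPhase, advance() and must_block() are hypothetical, and the
real implementation also has to handle locking, the workqueue, and the
shutdown path.

```python
from enum import Enum


class ResetPhase(Enum):
    """The four phases named in the commit message."""
    IDLE = 0
    QUIESCING = 1
    DRAINING = 2
    TEARDOWN = 3


# Each phase has exactly one successor; TEARDOWN wraps back to IDLE.
NEXT_PHASE = {
    ResetPhase.IDLE: ResetPhase.QUIESCING,
    ResetPhase.QUIESCING: ResetPhase.DRAINING,
    ResetPhase.DRAINING: ResetPhase.TEARDOWN,
    ResetPhase.TEARDOWN: ResetPhase.IDLE,
}


def advance(phase):
    """Return the phase that follows the given one (hypothetical helper)."""
    return NEXT_PHASE[phase]


def must_block(phase):
    """New metadata requests and file locks wait in every non-IDLE phase."""
    return phase is not ResetPhase.IDLE
```

Because QUIESCING is already a non-IDLE phase, a caller arriving right
after the reset is scheduled is gated even though the work function has
not started yet.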

The drain phase flushes mdlog state, dirty caps, and pending cap
releases for a bounded interval.  State that still cannot make progress
within that interval is discarded during teardown, which is the point
of the reset: break the stalemate and allow fresh sessions to rebuild
clean state.

The session teardown follows the established check_new_map()
forced-close pattern: unregister sessions under mdsc-&gt;mutex, then clean
up caps and requests under s-&gt;s_mutex.  Reconnect is not attempted
because the MDS only accepts reconnects during its own RECONNECT phase
after restart, not from an active client.

Blocked callers are released when reset completes and observe the final
result via -EAGAIN (reset failed) or 0 (success).  Internal work-function
errors such as -ENOMEM are not propagated to unrelated callers like
open() or flock(); the detailed error remains in debugfs and
tracepoints.

The work function checks st-&gt;shutdown before each phase transition
(DRAINING, TEARDOWN) so that state set by a concurrent
ceph_mdsc_destroy() is not overwritten.  If destroy already took
ownership, the work function releases session references and returns
without touching the state.

The timeout calculation for blocked-request waiters uses max_t() to
prevent jiffies underflow when the deadline has already passed.
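
As an illustration of the underflow being avoided, the sketch below
models jiffies as a 64-bit unsigned counter in Python.  It is not the
patch's code: the exact max_t() expression is not shown here, and the
helper names and the 64-bit width are assumptions made for the example.

```python
# Model of an unsigned long on a 64-bit build (assumption for this sketch).
JIFFIES_MODULUS = 2 ** 64


def naive_remaining(deadline, now):
    """Unsigned subtraction: wraps to a huge value once now passes deadline."""
    return (deadline - now) % JIFFIES_MODULUS


def clamped_remaining(deadline, now):
    """Signed difference clamped at zero, in the spirit of max_t(long, x, 0)."""
    return max(deadline - now, 0)
```

With the deadline one tick in the past, naive_remaining() yields a wait
of 2**64 - 1 ticks, while clamped_remaining() yields 0 and the waiter
returns immediately.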

The close-grace sleep before teardown is a best-effort nudge to let
queued REQUEST_CLOSE messages egress; it is not a correctness
requirement since the MDS still has session_autoclose as a fallback.

The destroy path marks reset as failed and wakes blocked waiters before
cancel_work_sync() so unmount does not stall.

Signed-off-by: Alex Markuze &lt;amarkuze@redhat.com&gt;
Reviewed-by: Viacheslav Dubeyko &lt;Slava.Dubeyko@ibm.com&gt;
Signed-off-by: Viacheslav Dubeyko &lt;Slava.Dubeyko@ibm.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: convert inode flags to named bit positions and atomic bitops</title>
<updated>2026-06-22T20:44:42+00:00</updated>
<author>
<name>Alex Markuze</name>
<email>amarkuze@redhat.com</email>
</author>
<published>2026-05-07T08:05:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e120e2b666851c4c0c7bffd315ff69a09f9fe4ac'/>
<id>urn:sha1:e120e2b666851c4c0c7bffd315ff69a09f9fe4ac</id>
<content type='text'>
Define named bit-position constants for all CEPH_I_* inode flags and
derive the bitmask values from them.  This gives every flag a named
_BIT constant usable with the test_bit/set_bit/clear_bit family.
The intentionally unused bit position 1 is documented inline.

Convert all flag modifications to use atomic bitops (set_bit,
clear_bit, test_and_clear_bit).  The previous code mixed lockless
atomic ops on some flags (ERROR_WRITE, ODIRECT) with non-atomic
read-modify-write (|= / &amp;= ~) on other flags sharing the same
unsigned long.  A concurrent non-atomic RMW can clobber an
adjacent lockless atomic update -- for example, a lockless
clear_bit(ERROR_WRITE) could be silently resurrected by a
concurrent ci-&gt;i_ceph_flags |= CEPH_I_FLUSH under the spinlock.
Using atomic bitops for all modifications eliminates this class
of race entirely.

Flags whose only users are now the _BIT form (ERROR_WRITE,
ASYNC_CHECK_CAPS) have their old mask defines removed to document
that callers must use the _BIT constant with the set_bit/test_bit
family.  ERROR_FILELOCK and SHUTDOWN retain their mask defines
because they are still used via bitmask tests in lockless readers
(ceph_inode_is_shutdown, reconnect_caps_cb).

The direct assignment in ceph_finish_async_create() is converted
from i_ceph_flags = CEPH_I_ASYNC_CREATE to set_bit().  This
inode is I_NEW at this point -- still invisible to other threads
and guaranteed to have zero flags from alloc_inode -- so either
form is safe, but set_bit() keeps the conversion uniform.

Signed-off-by: Alex Markuze &lt;amarkuze@redhat.com&gt;
Reviewed-by: Viacheslav Dubeyko &lt;Slava.Dubeyko@ibm.com&gt;
Signed-off-by: Viacheslav Dubeyko &lt;Slava.Dubeyko@ibm.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: add checking of wait_for_completion_killable() return value</title>
<updated>2025-10-08T21:30:46+00:00</updated>
<author>
<name>Viacheslav Dubeyko</name>
<email>Slava.Dubeyko@ibm.com</email>
</author>
<published>2025-06-06T19:04:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b7ed1e29cfe773d648ca09895b92856bd3a2092d'/>
<id>urn:sha1:b7ed1e29cfe773d648ca09895b92856bd3a2092d</id>
<content type='text'>
Coverity Scan has detected a call to wait_for_completion_killable()
whose return value is not checked in
ceph_lock_wait_for_completion() [1]. The CID 1636232 defect
report explains: "If the function returns an error
value, the error value may be mistaken for a normal value.
In ceph_lock_wait_for_completion(): Value returned from
a function is not checked for errors before being used. (CWE-252)".

This patch checks the return value of
wait_for_completion_killable() and returns the error code from
ceph_lock_wait_for_completion().

[1] https://scan5.scan.coverity.com/#/project-view/64304/10063?selectedIssue=1636232

Signed-off-by: Viacheslav Dubeyko &lt;Slava.Dubeyko@ibm.com&gt;
Reviewed-by: Alex Markuze &lt;amarkuze@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: adapt to breakup of struct file_lock</title>
<updated>2024-02-05T12:11:42+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2024-01-31T23:02:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3956f35fbd36aee35766008ce6a9b0ed812d93ef'/>
<id>urn:sha1:3956f35fbd36aee35766008ce6a9b0ed812d93ef</id>
<content type='text'>
Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Link: https://lore.kernel.org/r/20240131-flsplit-v3-36-c6129007ee8d@kernel.org
Reviewed-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>filelock: split common fields into struct file_lock_core</title>
<updated>2024-02-05T12:11:38+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2024-01-31T23:01:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a69ce85ec9af6bdc0b3511959a7dc1a324e5e16a'/>
<id>urn:sha1:a69ce85ec9af6bdc0b3511959a7dc1a324e5e16a</id>
<content type='text'>
In a future patch, we're going to split file leases into their own
structure. Since a lot of the underlying machinery uses the same fields,
move those into a new file_lock_core, and embed that inside struct
file_lock.

For now, add some macros to ensure that we can continue to build while
the conversion is in progress.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Link: https://lore.kernel.org/r/20240131-flsplit-v3-17-c6129007ee8d@kernel.org
Reviewed-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>ceph: convert to using new filelock helpers</title>
<updated>2024-02-05T12:11:35+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2024-01-31T23:01:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=75e9570c93c70e8ad7be59cce82cea3872d8e8a7'/>
<id>urn:sha1:75e9570c93c70e8ad7be59cce82cea3872d8e8a7</id>
<content type='text'>
Convert to using the new file locking helper functions.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Link: https://lore.kernel.org/r/20240131-flsplit-v3-7-c6129007ee8d@kernel.org
Reviewed-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>ceph: print cluster fsid and client global_id in all debug logs</title>
<updated>2023-11-03T22:28:33+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2023-06-12T01:04:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=38d46409c4639a1d659ebfa70e27a8bed6b8ee1d'/>
<id>urn:sha1:38d46409c4639a1d659ebfa70e27a8bed6b8ee1d</id>
<content type='text'>
Multiple CephFS mounts on a host are increasingly common, so
disambiguating messages like this is necessary and will make it easier
to debug issues.

At the same time, this improves the debug logs to make troubleshooting
easier, for example by printing the ino# instead of only the memory
addresses of the corresponding inodes, and printing dentry names
instead of the memory addresses of the dentries.

Link: https://tracker.ceph.com/issues/61590
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Patrick Donnelly &lt;pdonnell@redhat.com&gt;
Reviewed-by: Milind Changire &lt;mchangir@redhat.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>filelock: move file locking definitions to separate header file</title>
<updated>2023-01-11T11:52:32+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2022-11-20T14:15:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5970e15dbcfeb0ed3a0bf1954f35bbe60a048754'/>
<id>urn:sha1:5970e15dbcfeb0ed3a0bf1954f35bbe60a048754</id>
<content type='text'>
The file locking definitions have lived in fs.h since the dawn of time,
but they are only used by a small subset of the source files that
include it.

Move the file locking definitions to a new header file, and add the
appropriate #include directives to the source files that need them. By
doing this we trim down fs.h a bit and limit the amount of rebuilding
that has to be done when we make changes to the file locking APIs.

Reviewed-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Russell King (Oracle) &lt;rmk+kernel@armlinux.org.uk&gt;
Acked-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Acked-by: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
Acked-by: Steve French &lt;stfrench@microsoft.com&gt;
Acked-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
</entry>
<entry>
<title>ceph: avoid use-after-free in ceph_fl_release_lock()</title>
<updated>2023-01-02T11:27:25+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2022-11-17T02:57:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8e1858710d9a71d88acd922f2e95d1eddb90eea0'/>
<id>urn:sha1:8e1858710d9a71d88acd922f2e95d1eddb90eea0</id>
<content type='text'>
When ceph releases a file_lock, it tries to get the inode pointer
from fl-&gt;fl_file, whose memory could already have been released by
another thread in filp_close(), because the VFS layer does not take a
reference on the file for fl-&gt;fl_file.

Switch to using the ceph-dedicated lock info to track the inode.

In ceph_fl_release_lock(), skip all operations if
fl-&gt;fl_u.ceph.inode is not set, which should only be the case for the
request file_lock. Set fl-&gt;fl_u.ceph.inode when inserting the lock into
the inode lock list, which is when the lock is copied.

Link: https://tracker.ceph.com/issues/57986
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>ceph: switch to vfs_inode_has_locks() to fix file lock bug</title>
<updated>2023-01-02T11:27:25+00:00</updated>
<author>
<name>Xiubo Li</name>
<email>xiubli@redhat.com</email>
</author>
<published>2022-11-17T02:43:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=461ab10ef7e6ea9b41a0571a7fc6a72af9549a3c'/>
<id>urn:sha1:461ab10ef7e6ea9b41a0571a7fc6a72af9549a3c</id>
<content type='text'>
POSIX locks all use the same owner, which is the thread id, and
multiple POSIX locks can be merged into a single one, so checking
whether the 'file' has locks may fail.

A file where some openers use locking and others don't is a
really odd usage pattern, though. Locks are like stoplights -- they
only work if everyone pays attention to them.

Just switch ceph_get_caps() to check whether any locks are set on
the inode. If there are POSIX/OFD/FLOCK locks on the file at the
time, we should set CHECK_FILELOCK, regardless of what fd was used
to set the lock.

Fixes: ff5d913dfc71 ("ceph: return -EIO if read/write against filp that lost file locks")
Signed-off-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
</feed>
