<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs, branch v5.13.12</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.13.12</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.13.12'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-08-18T07:07:08+00:00</updated>
<entry>
<title>ceph: take snap_empty_lock atomically with snaprealm refcount change</title>
<updated>2021-08-18T07:07:08+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-08-03T16:47:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ac0e79584d4156d3c5c496f3d8d6c746637ef1ad'/>
<id>urn:sha1:ac0e79584d4156d3c5c496f3d8d6c746637ef1ad</id>
<content type='text'>
commit 8434ffe71c874b9c4e184b88d25de98c2bf5fe3f upstream.

There is a race in ceph_put_snap_realm. The change to the nref and the
spinlock acquisition are not done atomically, so you could decrement
nref, and before you take the spinlock, the nref is incremented again.
At that point, you end up putting it on the empty list when it
shouldn't be there. Eventually __cleanup_empty_realms runs and frees
it when it's still in-use.

Fix this by protecting the 1-&gt;0 transition with atomic_dec_and_lock,
and just drop the spinlock if we can get the rwsem.

Because these objects can also undergo a 0-&gt;1 refcount transition, we
must protect that change as well with the spinlock. Increment locklessly
unless the value is at 0, in which case we take the spinlock, increment
and then take it off the empty list if it did the 0-&gt;1 transition.

With these changes, I'm removing the dout() messages from these
functions, as well as in __put_snap_realm. They've always been racy, and
it's better to not print values that may be misleading.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/46419
Reported-by: Mark Nelson &lt;mnelson@redhat.com&gt;
Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Luis Henriques &lt;lhenriques@suse.de&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ceph: clean up locking annotation for ceph_get_snap_realm and __lookup_snap_realm</title>
<updated>2021-08-18T07:07:08+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-06-01T13:24:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dcd02a1248cc4a6a889a1ea00734c201d14f5f1a'/>
<id>urn:sha1:dcd02a1248cc4a6a889a1ea00734c201d14f5f1a</id>
<content type='text'>
commit df2c0cb7f8e8c83e495260ad86df8c5da947f2a7 upstream.

They both say that the snap_rwsem must be held for write, but I don't
see any real reason for it, and it's not currently always called that
way.

The lookup is just walking the rbtree, so holding it for read should be
fine there. The "get" is bumping the refcount and (possibly) removing
it from the empty list. I see no need to hold the snap_rwsem for write
for that.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ceph: add some lockdep assertions around snaprealm handling</title>
<updated>2021-08-18T07:07:07+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2021-06-01T12:13:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=36b361cb1966dea56408c2f9214def2199d13c29'/>
<id>urn:sha1:36b361cb1966dea56408c2f9214def2199d13c29</id>
<content type='text'>
commit a6862e6708c15995bc10614b2ef34ca35b4b9078 upstream.

Turn some comments into lockdep asserts.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ovl: fix deadlock in splice write</title>
<updated>2021-08-18T07:06:59+00:00</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@redhat.com</email>
</author>
<published>2021-07-28T08:38:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1d1808fae2e085228da8577f1bf7f4cbe25fef2a'/>
<id>urn:sha1:1d1808fae2e085228da8577f1bf7f4cbe25fef2a</id>
<content type='text'>
[ Upstream commit 9b91b6b019fda817eb52f728eb9c79b3579760bc ]

There's possibility of an ABBA deadlock in case of a splice write to an
overlayfs file and a concurrent splice write to a corresponding real file.

The call chain for splice to an overlay file:

 -&gt; do_splice                     [takes sb_writers on overlay file]
   -&gt; do_splice_from
     -&gt; iter_file_splice_write    [takes pipe-&gt;mutex]
       -&gt; vfs_iter_write
         ...
         -&gt; ovl_write_iter        [takes sb_writers on real file]

And the call chain for splice to a real file:

 -&gt; do_splice                     [takes sb_writers on real file]
   -&gt; do_splice_from
     -&gt; iter_file_splice_write    [takes pipe-&gt;mutex]

Syzbot successfully bisected this to commit 82a763e61e2b ("ovl: simplify
file splice").

Fix by reverting the write part of the above commit and by adding missing
bits from ovl_write_iter() into ovl_splice_write().

Fixes: 82a763e61e2b ("ovl: simplify file splice")
Reported-and-tested-by: syzbot+579885d1a9a833336209@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker()</title>
<updated>2021-08-18T07:06:57+00:00</updated>
<author>
<name>Hao Xu</name>
<email>haoxu@linux.alibaba.com</email>
</author>
<published>2021-08-08T13:54:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f49d457950b99901f6d63a55cf2167a4889d8509'/>
<id>urn:sha1:f49d457950b99901f6d63a55cf2167a4889d8509</id>
<content type='text'>
[ Upstream commit 47cae0c71f7a126903f930191e6e9f103674aca1 ]

There may be cases like:
        A                                 B
spin_lock(wqe-&gt;lock)
nr_workers is 0
nr_workers++
spin_unlock(wqe-&gt;lock)
                                     spin_lock(wqe-&gt;lock)
                                     nr_wokers is 1
                                     nr_workers++
                                     spin_unlock(wqe-&gt;lock)
create_io_worker()
  acct-&gt;worker is 1
                                     create_io_worker()
                                       acct-&gt;worker is 1

There should be one worker marked IO_WORKER_F_FIXED, but no one is.
Fix this by introduce a new agrument for create_io_worker() to indicate
if it is the first worker.

Fixes: 3d4e4face9c1 ("io-wq: fix no lock protection of acct-&gt;nr_worker")
Signed-off-by: Hao Xu &lt;haoxu@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20210808135434.68667-3-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>io-wq: fix bug of creating io-wokers unconditionally</title>
<updated>2021-08-18T07:06:57+00:00</updated>
<author>
<name>Hao Xu</name>
<email>haoxu@linux.alibaba.com</email>
</author>
<published>2021-08-08T13:54:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=815a0fe3f415e78ae8678fc1b8864e2dc34b9091'/>
<id>urn:sha1:815a0fe3f415e78ae8678fc1b8864e2dc34b9091</id>
<content type='text'>
[ Upstream commit 49e7f0c789add1330b111af0b7caeb0e87df063e ]

The former patch to add check between nr_workers and max_workers has a
bug, which will cause unconditionally creating io-workers. That's
because the result of the check doesn't affect the call of
create_io_worker(), fix it by bringing in a boolean value for it.

Fixes: 21698274da5b ("io-wq: fix lack of acct-&gt;nr_workers &lt; acct-&gt;max_workers judgement")
Signed-off-by: Hao Xu &lt;haoxu@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20210808135434.68667-2-haoxu@linux.alibaba.com
[axboe: drop hunk that isn't strictly needed]
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>io_uring: clear TIF_NOTIFY_SIGNAL when running task work</title>
<updated>2021-08-18T07:06:55+00:00</updated>
<author>
<name>Nadav Amit</name>
<email>namit@vmware.com</email>
</author>
<published>2021-08-08T00:13:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6a4b92821135940de9790e135f28bff3d8b7b5eb'/>
<id>urn:sha1:6a4b92821135940de9790e135f28bff3d8b7b5eb</id>
<content type='text'>
[ Upstream commit ef98eb0409c31c39ab55ff46b2721c3b4f84c122 ]

When using SQPOLL, the submission queue polling thread calls
task_work_run() to run queued work. However, when work is added with
TWA_SIGNAL - as done by io_uring itself - the TIF_NOTIFY_SIGNAL remains
set afterwards and is never cleared.

Consequently, when the submission queue polling thread checks whether
signal_pending(), it may always find a pending signal, if
task_work_add() was ever called before.

The impact of this bug might be different on different kernel versions.
It appears that on 5.14 it would only cause unnecessary calculation and
prevent the polling thread from sleeping. On 5.13, where the bug was
found, it stops the polling thread from finding newly submitted work.

Instead of task_work_run(), use tracehook_notify_signal() that clears
TIF_NOTIFY_SIGNAL. Test for TIF_NOTIFY_SIGNAL in addition to
current-&gt;task_works to avoid a race in which task_works is cleared but
the TIF_NOTIFY_SIGNAL is set.

Fixes: 685fe7feedb96 ("io-wq: eliminate the need for a manager thread")
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Nadav Amit &lt;namit@vmware.com&gt;
Link: https://lore.kernel.org/r/20210808001342.964634-2-namit@vmware.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ceph: reduce contention in ceph_check_delayed_caps()</title>
<updated>2021-08-18T07:06:49+00:00</updated>
<author>
<name>Luis Henriques</name>
<email>lhenriques@suse.de</email>
</author>
<published>2021-07-06T13:52:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3d33960c8d663dbbf98f2cbcda49b198e0fc2918'/>
<id>urn:sha1:3d33960c8d663dbbf98f2cbcda49b198e0fc2918</id>
<content type='text'>
commit bf2ba432213fade50dd39f2e348085b758c0726e upstream.

Function ceph_check_delayed_caps() is called from the mdsc-&gt;delayed_work
workqueue and it can be kept looping for quite some time if caps keep
being added back to the mdsc-&gt;cap_delay_list.  This may result in the
watchdog tainting the kernel with the softlockup flag.

This patch breaks this loop if the caps have been recently (i.e. during
the loop execution).  Any new caps added to the list will be handled in
the next run.

Also, allow schedule_delayed() callers to explicitly set the delay value
instead of defaulting to 5s, so we can ensure that it runs soon
afterward if it looks like there is more work.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/46284
Signed-off-by: Luis Henriques &lt;lhenriques@suse.de&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring: fix ctx-exit io_rsrc_put_work() deadlock</title>
<updated>2021-08-18T07:06:48+00:00</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2021-08-10T01:44:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3e35c7dbf7064bde4c92b4637f719d6cbd9e21b7'/>
<id>urn:sha1:3e35c7dbf7064bde4c92b4637f719d6cbd9e21b7</id>
<content type='text'>
commit 43597aac1f87230cb565ab354d331682f13d3c7a upstream.

__io_rsrc_put_work() might need -&gt;uring_lock, so nobody should wait for
rsrc nodes holding the mutex. However, that's exactly what
io_ring_ctx_free() does with io_wait_rsrc_data().

Split it into rsrc wait + dealloc, and move the first one out of the
lock.

Cc: stable@vger.kernel.org
Fixes: b60c8dce33895 ("io_uring: preparation for rsrc tagging")
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Link: https://lore.kernel.org/r/0130c5c2693468173ec1afab714e0885d2c9c363.1628559783.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>io_uring: drop ctx-&gt;uring_lock before flushing work item</title>
<updated>2021-08-18T07:06:48+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2021-08-09T14:15:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e1c5046e341dbd1cb1a331210645253993414d6b'/>
<id>urn:sha1:e1c5046e341dbd1cb1a331210645253993414d6b</id>
<content type='text'>
commit c018db4a57f3e31a9cb24d528e9f094eda89a499 upstream.

Ammar reports that he's seeing a lockdep splat on running test/rsrc_tags
from the regression suite:

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Tainted: G           OE
------------------------------------------------------
kworker/2:4/2684 is trying to acquire lock:
ffff88814bb1c0a8 (&amp;ctx-&gt;uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0

but task is already holding lock:
ffffc90001c6be70 ((work_completion)(&amp;(&amp;ctx-&gt;rsrc_put_work)-&gt;work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-&gt; #1 ((work_completion)(&amp;(&amp;ctx-&gt;rsrc_put_work)-&gt;work)){+.+.}-{0:0}:
       __flush_work+0x31b/0x490
       io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0
       __do_sys_io_uring_register+0x45b/0x1060
       do_syscall_64+0x35/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-&gt; #0 (&amp;ctx-&gt;uring_lock){+.+.}-{3:3}:
       __lock_acquire+0x119a/0x1e10
       lock_acquire+0xc8/0x2f0
       __mutex_lock+0x86/0x740
       io_rsrc_put_work+0x13d/0x1a0
       process_one_work+0x236/0x530
       worker_thread+0x52/0x3b0
       kthread+0x135/0x160
       ret_from_fork+0x1f/0x30

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&amp;(&amp;ctx-&gt;rsrc_put_work)-&gt;work));
                               lock(&amp;ctx-&gt;uring_lock);
                               lock((work_completion)(&amp;(&amp;ctx-&gt;rsrc_put_work)-&gt;work));
  lock(&amp;ctx-&gt;uring_lock);

 *** DEADLOCK ***

2 locks held by kworker/2:4/2684:
 #0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
 #1: ffffc90001c6be70 ((work_completion)(&amp;(&amp;ctx-&gt;rsrc_put_work)-&gt;work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530

stack backtrace:
CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G           OE     5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5
Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015
Workqueue: events io_rsrc_put_work
Call Trace:
 dump_stack_lvl+0x6a/0x9a
 check_noncircular+0xfe/0x110
 __lock_acquire+0x119a/0x1e10
 lock_acquire+0xc8/0x2f0
 ? io_rsrc_put_work+0x13d/0x1a0
 __mutex_lock+0x86/0x740
 ? io_rsrc_put_work+0x13d/0x1a0
 ? io_rsrc_put_work+0x13d/0x1a0
 ? io_rsrc_put_work+0x13d/0x1a0
 ? process_one_work+0x1ce/0x530
 io_rsrc_put_work+0x13d/0x1a0
 process_one_work+0x236/0x530
 worker_thread+0x52/0x3b0
 ? process_one_work+0x530/0x530
 kthread+0x135/0x160
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x1f/0x30

which is due to holding the ctx-&gt;uring_lock when flushing existing
pending work, while the pending work flushing may need to grab the uring
lock if we're using IOPOLL.

Fix this by dropping the uring_lock a bit earlier as part of the flush.

Cc: stable@vger.kernel.org
Link: https://github.com/axboe/liburing/issues/404
Tested-by: Ammar Faizi &lt;ammarfaizi2@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
