<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/lib/sbitmap.c, branch v6.1.12</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.12</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.12'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2022-10-11T23:42:58+00:00</updated>
<entry>
<title>treewide: use prandom_u32_max() when possible, part 2</title>
<updated>2022-10-11T23:42:58+00:00</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2022-10-05T14:43:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8b3ccbc1f1f91847160951aa15dd27c22dddcb49'/>
<id>urn:sha1:8b3ccbc1f1f91847160951aa15dd27c22dddcb49</id>
<content type='text'>
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done by hand, covering things that coccinelle could not do on its own.

Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt; # for ext2, ext4, and sbitmap
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
</content>
</entry>
<entry>
<title>treewide: use prandom_u32_max() when possible, part 1</title>
<updated>2022-10-11T23:42:55+00:00</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2022-10-05T14:43:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=81895a65ec63ee1daec3255dc1a06675d2fbe915'/>
<id>urn:sha1:81895a65ec63ee1daec3255dc1a06675d2fbe915</id>
<content type='text'>
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() &amp; ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() &gt;&gt; 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() &amp; ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

-       RAND = get_random_u32();
        ... when != RAND
-       RAND %= (E);
+       RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

        ((T)get_random_u32()@p &amp; (LITERAL))

// Add one to the literal.
@script:python add_one@
literal &lt;&lt; literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
        value = int(literal, 16)
elif literal[0] in '123456789':
        value = int(literal, 10)
if value is None:
        print("I don't know how to handle %s" % (literal))
        cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
        print("Skipping 0x%x for cleanup elsewhere" % (value))
        cocci.include_match(False)
elif value &amp; (value + 1) != 0:
        print("Skipping 0x%x because it's not a power of two minus one" % (value))
        cocci.include_match(False)
elif literal.startswith('0x'):
        coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
        coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

-       (FUNC()@p &amp; (LITERAL))
+       prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

 {
-       T VAR;
-       VAR = (E);
-       return VAR;
+       return E;
 }

@drop_var@
type T;
identifier VAR;
@@

 {
-       T VAR;
        ... when != VAR
 }

Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Reviewed-by: KP Singh &lt;kpsingh@kernel.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt; # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder &lt;christoph.boehmwalder@linbit.com&gt; # for drbd
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Acked-by: Heiko Carstens &lt;hca@linux.ibm.com&gt; # for s390
Acked-by: Ulf Hansson &lt;ulf.hansson@linaro.org&gt; # for mmc
Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt; # for xfs
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
</content>
</entry>
<entry>
<title>sbitmap: fix lockup while swapping</title>
<updated>2022-09-29T23:58:17+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2022-09-29T19:50:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=30514bd2dd4e86a3ecfd6a93a3eadf7b9ea164a0'/>
<id>urn:sha1:30514bd2dd4e86a3ecfd6a93a3eadf7b9ea164a0</id>
<content type='text'>
Commit 4acb83417cad ("sbitmap: fix batched wait_cnt accounting")
is a big improvement: without it, I had to revert to before commit
040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup")
to avoid the high system time and freezes which that had introduced.

Now okay on the NVME laptop, but 4acb83417cad is a disaster for heavy
swapping (kernel builds in low memory) on another: soon locking up in
sbitmap_queue_wake_up() (into which __sbq_wake_up() is inlined), cycling
around with waitqueue_active() but wait_cnt 0 .  Here is a backtrace,
showing the common pattern of outer sbitmap_queue_wake_up() interrupted
before setting wait_cnt 0 back to wake_batch (in some cases other CPUs
are idle, in other cases they're spinning for a lock in dd_bio_merge()):

sbitmap_queue_wake_up &lt; sbitmap_queue_clear &lt; blk_mq_put_tag &lt;
__blk_mq_free_request &lt; blk_mq_free_request &lt; __blk_mq_end_request &lt;
scsi_end_request &lt; scsi_io_completion &lt; scsi_finish_command &lt;
scsi_complete &lt; blk_complete_reqs &lt; blk_done_softirq &lt; __do_softirq &lt;
__irq_exit_rcu &lt; irq_exit_rcu &lt; common_interrupt &lt; asm_common_interrupt &lt;
_raw_spin_unlock_irqrestore &lt; __wake_up_common_lock &lt; __wake_up &lt;
sbitmap_queue_wake_up &lt; sbitmap_queue_clear &lt; blk_mq_put_tag &lt;
__blk_mq_free_request &lt; blk_mq_free_request &lt; dd_bio_merge &lt;
blk_mq_sched_bio_merge &lt; blk_mq_attempt_bio_merge &lt; blk_mq_submit_bio &lt;
__submit_bio &lt; submit_bio_noacct_nocheck &lt; submit_bio_noacct &lt;
submit_bio &lt; __swap_writepage &lt; swap_writepage &lt; pageout &lt;
shrink_folio_list &lt; evict_folios &lt; lru_gen_shrink_lruvec &lt;
shrink_lruvec &lt; shrink_node &lt; do_try_to_free_pages &lt; try_to_free_pages &lt;
__alloc_pages_slowpath &lt; __alloc_pages &lt; folio_alloc &lt; vma_alloc_folio &lt;
do_anonymous_page &lt; __handle_mm_fault &lt; handle_mm_fault &lt;
do_user_addr_fault &lt; exc_page_fault &lt; asm_exc_page_fault

See how the process-context sbitmap_queue_wake_up() has been interrupted,
after bringing wait_cnt down to 0 (and in this example, after doing its
wakeups), before advancing wake_index and refilling wake_cnt: an
interrupt-context sbitmap_queue_wake_up() of the same sbq gets stuck.

I have almost no grasp of all the possible sbitmap races, and their
consequences: but __sbq_wake_up() can do nothing useful while wait_cnt 0,
so it is better if sbq_wake_ptr() skips on to the next ws in that case:
which fixes the lockup and shows no adverse consequence for me.

The check for wait_cnt being 0 is obviously racy, and ultimately can lead
to lost wakeups: for example, when there is only a single waitqueue with
waiters.  However, lost wakeups are unlikely to matter in these cases,
and a proper fix requires redesign (and benchmarking) of the batched
wakeup code: so let's plug the hole with this bandaid for now.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Keith Busch &lt;kbusch@kernel.org&gt;
Link: https://lore.kernel.org/r/9c2038a7-cdc5-5ee-854c-fbc6168bf16@google.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: fix batched wait_cnt accounting</title>
<updated>2022-09-12T06:10:34+00:00</updated>
<author>
<name>Keith Busch</name>
<email>kbusch@kernel.org</email>
</author>
<published>2022-09-09T18:40:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4acb83417cadfdcbe64215f9d0ddcf3132af808e'/>
<id>urn:sha1:4acb83417cadfdcbe64215f9d0ddcf3132af808e</id>
<content type='text'>
Batched completions can clear multiple bits, but we're only decrementing
the wait_cnt by one each time. This can cause waiters to never be woken,
stalling IO. Use the batched count instead.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Link: https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch</title>
<updated>2022-09-08T15:22:42+00:00</updated>
<author>
<name>Uros Bizjak</name>
<email>ubizjak@gmail.com</email>
</author>
<published>2022-09-08T15:12:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c35227d4e8cbc70a6622cc7cc5f8c3bff513f1fa'/>
<id>urn:sha1:c35227d4e8cbc70a6622cc7cc5f8c3bff513f1fa</id>
<content type='text'>
Use atomic_long_try_cmpxchg instead of
atomic_long_cmpxchg (*ptr, old, new) == old in __sbitmap_queue_get_batch.
x86 CMPXCHG instruction returns success in ZF flag, so this change
saves a compare after cmpxchg (and related move instruction in front
of cmpxchg).

Also, atomic_long_cmpxchg implicitly assigns old *ptr value to "old"
when cmpxchg fails, enabling further code simplifications, e.g.
an extra memory read can be avoided in the loop.

No functional change intended.

Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Link: https://lore.kernel.org/r/20220908151200.9993-1-ubizjak@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()</title>
<updated>2022-09-08T14:39:04+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2022-09-08T13:09:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=48c033314f372478548203c583529f53080fd078'/>
<id>urn:sha1:48c033314f372478548203c583529f53080fd078</id>
<content type='text'>
When __sbq_wake_up() decrements wait_cnt to 0 but races with someone
else waking the waiter on the waitqueue (so the waitqueue becomes
empty), it exits without reseting wait_cnt to wake_batch number. Once
wait_cnt is 0, nobody will ever reset the wait_cnt or wake the new
waiters resulting in possible deadlocks or busyloops. Fix the problem by
making sure we reset wait_cnt even if we didn't wake up anybody in the
end.

Fixes: 040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup")
Reported-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/r/20220908130937.2795-1-jack@suse.cz
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Revert "sbitmap: fix batched wait_cnt accounting"</title>
<updated>2022-09-04T12:39:25+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2022-09-04T12:39:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bce1b56c73826fec8caf6187f0c922ede397a5a8'/>
<id>urn:sha1:bce1b56c73826fec8caf6187f0c922ede397a5a8</id>
<content type='text'>
This reverts commit 16ede66973c84f890c03584f79158dd5b2d725f5.

This is causing issues with CPU stalls on my test box, revert it for
now until we understand what is going on. It looks like infinite
looping off sbitmap_queue_wake_up(), but hard to tell with a lot of
CPUs hitting this issue and the console scrolling infinitely.

Link: https://lore.kernel.org/linux-block/e742813b-ce5c-0d58-205b-1626f639b1bd@kernel.dk/
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: fix batched wait_cnt accounting</title>
<updated>2022-09-01T16:42:41+00:00</updated>
<author>
<name>Keith Busch</name>
<email>kbusch@kernel.org</email>
</author>
<published>2022-08-25T14:53:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=16ede66973c84f890c03584f79158dd5b2d725f5'/>
<id>urn:sha1:16ede66973c84f890c03584f79158dd5b2d725f5</id>
<content type='text'>
Batched completions can clear multiple bits, but we're only decrementing
the wait_cnt by one each time. This can cause waiters to never be woken,
stalling IO. Use the batched count instead.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Link: https://lore.kernel.org/r/20220825145312.1217900-1-kbusch@fb.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: remove unnecessary code in __sbitmap_queue_get_batch</title>
<updated>2022-08-26T13:32:53+00:00</updated>
<author>
<name>Liu Song</name>
<email>liusong@linux.alibaba.com</email>
</author>
<published>2022-08-26T03:14:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ddbfc34fcf5d0bc33b006b90c580c56edeb31068'/>
<id>urn:sha1:ddbfc34fcf5d0bc33b006b90c580c56edeb31068</id>
<content type='text'>
If "nr + nr_tags &lt;= map_depth", then the value of nr_tags will not be
greater than map_depth, so no additional comparison is required.

Signed-off-by: Liu Song &lt;liusong@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/1661483653-27326-1-git-send-email-liusong@linux.alibaba.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: fix possible io hung due to lost wakeup</title>
<updated>2022-08-23T13:37:21+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2022-08-03T12:15:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=040b83fcecfb86f3225d3a5de7fd9b3fbccf83b4'/>
<id>urn:sha1:040b83fcecfb86f3225d3a5de7fd9b3fbccf83b4</id>
<content type='text'>
There are two problems can lead to lost wakeup:

1) invalid wakeup on the wrong waitqueue:

For example, 2 * wake_batch tags are put, while only wake_batch threads
are woken:

__sbq_wake_up
 atomic_cmpxchg -&gt; reset wait_cnt
			__sbq_wake_up -&gt; decrease wait_cnt
			...
			__sbq_wake_up -&gt; wait_cnt is decreased to 0 again
			 atomic_cmpxchg
			 sbq_index_atomic_inc -&gt; increase wake_index
			 wake_up_nr -&gt; wake up and waitqueue might be empty
 sbq_index_atomic_inc -&gt; increase again, one waitqueue is skipped
 wake_up_nr -&gt; invalid wake up because old wakequeue might be empty

To fix the problem, increasing 'wake_index' before resetting 'wait_cnt'.

2) 'wait_cnt' can be decreased while waitqueue is empty

As pointed out by Jan Kara, following race is possible:

CPU1				CPU2
__sbq_wake_up			 __sbq_wake_up
 sbq_wake_ptr()			 sbq_wake_ptr() -&gt; the same
 wait_cnt = atomic_dec_return()
 /* decreased to 0 */
 sbq_index_atomic_inc()
 /* move to next waitqueue */
 atomic_set()
 /* reset wait_cnt */
 wake_up_nr()
 /* wake up on the old waitqueue */
				 wait_cnt = atomic_dec_return()
				 /*
				  * decrease wait_cnt in the old
				  * waitqueue, while it can be
				  * empty.
				  */

Fix the problem by waking up before updating 'wake_index' and
'wait_cnt'.

With this patch, noted that 'wait_cnt' is still decreased in the old
empty waitqueue, however, the wakeup is redirected to a active waitqueue,
and the extra decrement on the old empty waitqueue is not handled.

Fixes: 88459642cba4 ("blk-mq: abstract tag allocation out into sbitmap library")
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/r/20220803121504.212071-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
