<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/lib/sbitmap.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-08-20T16:30:49+00:00</updated>
<entry>
<title>lib/sbitmap: convert shallow_depth from one word to the whole sbitmap</title>
<updated>2025-08-20T16:30:49+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2025-08-07T03:24:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ed30c38d1e002916b7ef816ba2a56f621a85ed00'/>
<id>urn:sha1:ed30c38d1e002916b7ef816ba2a56f621a85ed00</id>
<content type='text'>
[ Upstream commit 42e6c6ce03fd3e41e39a0f93f9b1a1d9fa664338 ]

Currently, elevators record an internal 'async_depth' to throttle
asynchronous requests, and they all calculate shallow_depth based on
sb-&gt;shift, on the assumption that sb-&gt;shift is the number of
available tags in one word.

However, sb-&gt;shift is not the number of available tags in the last
word; see __map_depth:

if (index == sb-&gt;map_nr - 1)
  return sb-&gt;depth - (index &lt;&lt; sb-&gt;shift);

As a consequence, if the last word is used, more tags can be obtained
than expected. For example, assume nr_requests=256, giving four words;
in the worst case, if the user sets nr_requests=32, the first word is
also the last word and holds only 32 tags, so still using the full
bits per word, which is 64, to calculate async_depth is wrong.

On the other hand, due to cgroup QoS, bfq can allow only one request
to be allocated, yet setting shallow_depth=1 will still allow one
request per word to be allocated.

Fix these problems by applying shallow_depth to the whole sbitmap
instead of per word, and change kyber, mq-deadline and bfq to follow
suit. A new helper, __map_depth_with_shallow(), is introduced to
calculate the available bits in each word.
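
A minimal sketch of the idea behind such a helper, assuming a
proportional-share scheme (the exact upstream arithmetic may differ):

/*
 * Hedged sketch: give each word a share of the whole-sbitmap
 * shallow_depth proportional to that word's real depth, so the
 * short last word no longer over-allocates.
 */
static unsigned int __map_depth_with_shallow(const struct sbitmap *sb,
                                             int index,
                                             unsigned int shallow_depth)
{
        unsigned int word_depth = __map_depth(sb, index);

        if (shallow_depth &gt;= sb-&gt;depth)
                return word_depth;

        /* scale: this word's depth * shallow_depth / total depth */
        return (unsigned int)div_u64((u64)word_depth * shallow_depth,
                                     sb-&gt;depth);
}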

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20250807032413.1469456-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/sbitmap: define swap_lock as raw_spinlock_t</title>
<updated>2024-09-20T06:20:06+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2024-09-19T02:17:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=65f666c6203600053478ce8e34a1db269a8701c9'/>
<id>urn:sha1:65f666c6203600053478ce8e34a1db269a8701c9</id>
<content type='text'>
When called from sbitmap_queue_get(), sbitmap_deferred_clear() may run
with preemption disabled. On an RT kernel, spin_lock() can sleep, which
can trigger the warning "BUG: sleeping function called from invalid
context".

Fix it by replacing spin_lock() with raw_spin_lock().
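
The substitution itself is mechanical; an illustrative sketch (not the
verbatim diff), with the struct field named swap_lock as in the title:

-       spinlock_t swap_lock;
+       raw_spinlock_t swap_lock;

-       spin_lock_irqsave(&amp;map-&gt;swap_lock, flags);
+       raw_spin_lock_irqsave(&amp;map-&gt;swap_lock, flags);

A raw_spinlock_t always spins, even on PREEMPT_RT where a plain
spinlock_t becomes a sleeping rtmutex, so it is safe to take with
preemption disabled.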

Cc: Yang Yang &lt;yang.yang@vivo.com&gt;
Fixes: 72d04bdcf3f7 ("sbitmap: fix io hung due to race on sbitmap_word::cleared")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Yang Yang &lt;yang.yang@vivo.com&gt;
Link: https://lore.kernel.org/r/20240919021709.511329-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: fix io hung due to race on sbitmap_word::cleared</title>
<updated>2024-07-19T15:39:32+00:00</updated>
<author>
<name>Yang Yang</name>
<email>yang.yang@vivo.com</email>
</author>
<published>2024-07-16T08:26:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=72d04bdcf3f7d7e07d82f9757946f68802a7270a'/>
<id>urn:sha1:72d04bdcf3f7d7e07d82f9757946f68802a7270a</id>
<content type='text'>
Configuration for sbq:
  depth=64, wake_batch=6, shift=6, map_nr=1

1. There are 64 requests in progress:
  map-&gt;word = 0xFFFFFFFFFFFFFFFF
2. After all the 64 requests complete, and no more requests come:
  map-&gt;word = 0xFFFFFFFFFFFFFFFF, map-&gt;cleared = 0xFFFFFFFFFFFFFFFF
3. Now two tasks try to allocate requests:
  T1:                                       T2:
  __blk_mq_get_tag                          .
  __sbitmap_queue_get                       .
  sbitmap_get                               .
  sbitmap_find_bit                          .
  sbitmap_find_bit_in_word                  .
  __sbitmap_get_word  -&gt; nr=-1              __blk_mq_get_tag
  sbitmap_deferred_clear                    __sbitmap_queue_get
  /* map-&gt;cleared=0xFFFFFFFFFFFFFFFF */     sbitmap_find_bit
    if (!READ_ONCE(map-&gt;cleared))           sbitmap_find_bit_in_word
      return false;                         __sbitmap_get_word -&gt; nr=-1
    mask = xchg(&amp;map-&gt;cleared, 0)           sbitmap_deferred_clear
    atomic_long_andnot()                    /* map-&gt;cleared=0 */
                                              if (!(map-&gt;cleared))
                                                return false;
                                     /*
                                      * map-&gt;cleared is cleared by T1
                                      * T2 fail to acquire the tag
                                      */

4. T2 is the sole tag waiter. When T1 puts the tag, T2 cannot be woken
up because wake_batch is set to 6. If no more requests come, T2 will
wait here indefinitely.

This patch achieves two purposes:
1. The check on -&gt;cleared and the updates of both -&gt;cleared and
-&gt;word need to be done atomically, and using a spinlock is the
simplest solution.
2. Add an extra check in sbitmap_deferred_clear() to identify whether
-&gt;word has free bits.
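
A minimal sketch of sbitmap_deferred_clear() under those two
constraints (simplified: the shape of the free-bit check and the
locking calls are assumptions, not the verbatim patch):

static bool sbitmap_deferred_clear(struct sbitmap_word *map,
                                   unsigned int depth)
{
        unsigned long mask, flags;
        bool ret = true;

        spin_lock_irqsave(&amp;map-&gt;swap_lock, flags);

        if (!map-&gt;cleared) {
                /* purpose 2: nothing deferred, but any free bit left? */
                ret = depth &amp;&amp;
                      (~map-&gt;word &amp; ((~0UL) &gt;&gt; (BITS_PER_LONG - depth)));
                goto out;
        }

        /* purpose 1: read-and-zero -&gt;cleared and fold it back into
         * -&gt;word atomically with respect to other callers */
        mask = xchg(&amp;map-&gt;cleared, 0);
        atomic_long_andnot(mask, (atomic_long_t *)&amp;map-&gt;word);
out:
        spin_unlock_irqrestore(&amp;map-&gt;swap_lock, flags);
        return ret;
}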

Fixes: ea86ea2cdced ("sbitmap: ammortize cost of clearing bits")
Signed-off-by: Yang Yang &lt;yang.yang@vivo.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Link: https://lore.kernel.org/r/20240716082644.659566-1-yang.yang@vivo.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: use READ_ONCE to access map-&gt;word</title>
<updated>2024-04-26T13:40:28+00:00</updated>
<author>
<name>linke li</name>
<email>lilinke99@qq.com</email>
</author>
<published>2024-04-26T10:34:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6ad0d7e0f4b68f87a98ea2b239123b7d865df86b'/>
<id>urn:sha1:6ad0d7e0f4b68f87a98ea2b239123b7d865df86b</id>
<content type='text'>
In __sbitmap_queue_get_batch(), map-&gt;word is read several times and
updated atomically using atomic_long_try_cmpxchg(), but the first two
reads of map-&gt;word are not protected.

This patch moves the statement val = READ_ONCE(map-&gt;word) forward,
eliminating unprotected accesses to map-&gt;word within the function.
It is aimed at reducing the number of benign races reported by KCSAN in
order to focus future debugging effort on harmful races.
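
The shape of the change, as a minimal sketch (loop body abridged):

unsigned long val, get_mask;

/* single annotated read; everything below works on this snapshot */
val = READ_ONCE(map-&gt;word);
do {
        /* ... derive get_mask from val ... */
} while (!atomic_long_try_cmpxchg((atomic_long_t *)&amp;map-&gt;word,
                                  &amp;val, get_mask | val));

On failure, atomic_long_try_cmpxchg() refreshes val with the freshly
observed word, so each retry still works from an annotated read.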

Signed-off-by: linke li &lt;lilinke99@qq.com&gt;
Link: https://lore.kernel.org/r/tencent_0B517C25E519D3D002194E8445E86C04AD0A@qq.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: remove stale comment in sbq_calc_wake_batch</title>
<updated>2024-01-15T14:23:50+00:00</updated>
<author>
<name>Kemeng Shi</name>
<email>shikemeng@huaweicloud.com</email>
</author>
<published>2024-01-15T14:56:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5c7fa5c8ad79a1d7cc9f59636e2f99e8b5471248'/>
<id>urn:sha1:5c7fa5c8ad79a1d7cc9f59636e2f99e8b5471248</id>
<content type='text'>
After commit 106397376c036 ("sbitmap: fix batching wakeup"), we may
wake up more than one queue for each batch. Remove the stale comment
saying that we wake up only one queue per batch.

Signed-off-by: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Link: https://lore.kernel.org/r/20240115145626.665562-1-shikemeng@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: fix batching wakeup</title>
<updated>2023-07-21T17:40:20+00:00</updated>
<author>
<name>David Jeffery</name>
<email>djeffery@redhat.com</email>
</author>
<published>2023-07-21T09:57:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=106397376c0369fcc01c58dd189ff925a2724a57'/>
<id>urn:sha1:106397376c0369fcc01c58dd189ff925a2724a57</id>
<content type='text'>
The current code assumes that waking up one wait queue after each
completion batch is enough to provide forward progress.

Unfortunately this isn't enough, because a waiter can be added to the
wait queue just after it is woken up.

Here is one example (depth 64, wake_batch 8):

1) all 64 tags are active

2) each wait queue holds only a single waiter

3) each completion batch (8 completions) wakes up just one waiter in
   one wait queue, and immediately one new sleeper is added to that
   wait queue

4) after 64 completions, 8 waiters have been woken up, and there are
   still 8 waiters in the wait queues

5) after another 8 active tags complete, only one waiter can be woken
   up, and the other 7 can never be woken up

It turns out this problem isn't easy to fix properly, so simply wake
up enough waiters for a single batch.
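
A minimal sketch of the resulting wake-up loop, assuming the file's
existing sbq_wake_ptr() helper (the real accounting is more involved):

void sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
{
        /* keep going until a full batch worth of waiters is up,
         * instead of stopping after a single wait queue */
        while (nr &gt; 0) {
                struct sbq_wait_state *ws = sbq_wake_ptr(sbq);

                if (!ws)
                        break;
                wake_up(&amp;ws-&gt;wait);
                nr--;
        }
}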

Cc: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Cc: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: David Jeffery &lt;djeffery@redhat.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Reviewed-by: Keith Busch &lt;kbusch@kernel.org&gt;
Link: https://lore.kernel.org/r/20230721095715.232728-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: correct wake_batch recalculation to avoid potential IO hung</title>
<updated>2023-01-30T03:03:01+00:00</updated>
<author>
<name>Kemeng Shi</name>
<email>shikemeng@huaweicloud.com</email>
</author>
<published>2023-01-16T20:50:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b5fcf7871acb7f9a3a8ed341a68bd86aba3e254a'/>
<id>urn:sha1:b5fcf7871acb7f9a3a8ed341a68bd86aba3e254a</id>
<content type='text'>
Commit 180dccb0dba4f ("blk-mq: fix tag_get wait task can't be awakened")
mentioned that in the case of shared tags, there could be just one
really active hctx (queue) because of lazy detection of tag idling.
Driver tag allocation may then wait forever on this really active
hctx (queue) if wake_batch &gt; hctx_max_depth, where hctx_max_depth is
the available tag depth for the active hctx (queue). However, the
condition wake_batch &gt; hctx_max_depth is not strong enough to avoid
IO hangs, as sbitmap_queue_wake_up will only wake up one wait queue
per wake_batch even if there is only one waiter in the woken wait
queue. After that, there is only one tag left to free, and wake_batch
may never be reached again.

Actually, the inactive hctx (queue) becomes truly idle after at most
30 seconds and then calls blk_mq_tag_wakeup_all to wake one waiter per
wait queue and break the hang, but an IO hang of 30 seconds is still
unacceptable. Fix this potential IO hang by setting the batch size
small enough that the depth of the shared hctx (queue) suffices to
wake up all of the wait queues, as sbq_calc_wake_batch does.

Although hctx_max_depth is clamped to at least 4 while the wake_batch
recalculation does not apply that clamp, wake_batch will always be
recalculated to 1 when hctx_max_depth &lt;= 4.
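
A minimal sketch of the recalculation under that scheme (names follow
lib/sbitmap.c; the exact clamp bounds are assumptions):

void sbitmap_queue_recalculate_wake_batch(struct sbitmap_queue *sbq,
                                          unsigned int users)
{
        unsigned int wake_batch;
        unsigned int depth = (sbq-&gt;sb.depth + users - 1) / users;

        /* small enough that one queue's share of the depth can
         * wake up all SBQ_WAIT_QUEUES wait queues */
        wake_batch = clamp_val(depth / SBQ_WAIT_QUEUES, 1, SBQ_WAKE_BATCH);

        WRITE_ONCE(sbq-&gt;wake_batch, wake_batch);
}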

Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened")
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Link: https://lore.kernel.org/r/20230116205059.3821738-6-shikemeng@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: add sbitmap_find_bit to remove repeat code in __sbitmap_get/__sbitmap_get_shallow</title>
<updated>2023-01-30T03:03:01+00:00</updated>
<author>
<name>Kemeng Shi</name>
<email>shikemeng@huaweicloud.com</email>
</author>
<published>2023-01-16T20:50:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=678418c6128f112fc5584beb5cdd21fbc225badf'/>
<id>urn:sha1:678418c6128f112fc5584beb5cdd21fbc225badf</id>
<content type='text'>
There are three differences between __sbitmap_get and
__sbitmap_get_shallow when searching for a free bit:
1. __sbitmap_get_shallow limits the number of bits to search per word;
__sbitmap_get has no such limit.
2. __sbitmap_get_shallow always searches with wrap set; __sbitmap_get
sets wrap according to round_robin.
3. __sbitmap_get_shallow always searches from the first bit in the
first word; __sbitmap_get searches from the first bit when round_robin
is not set, and otherwise from SB_NR_TO_BIT(sb, alloc_hint).

Add a helper function, sbitmap_find_bit(), that performs the common
search while accepting the per-word depth limit, the wrap flag, and
the first bit to search from the caller, covering the needs of both
__sbitmap_get and __sbitmap_get_shallow.
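
A sketch of the common helper's likely shape (parameter handling is
condensed; min_t() clamps the per-word limit against the word's size):

static int sbitmap_find_bit(struct sbitmap *sb, unsigned int depth,
                            unsigned int index, unsigned int alloc_hint,
                            bool wrap)
{
        unsigned int i;
        int nr = -1;

        for (i = 0; i &lt; sb-&gt;map_nr; i++) {
                nr = sbitmap_find_bit_in_word(&amp;sb-&gt;map[index],
                                              min_t(unsigned int,
                                                    __map_depth(sb, index),
                                                    depth),
                                              alloc_hint, wrap);
                if (nr != -1) {
                        nr += index &lt;&lt; sb-&gt;shift;
                        break;
                }
                /* move to the next word; restart from its first bit */
                alloc_hint = 0;
                if (++index &gt;= sb-&gt;map_nr)
                        index = 0;
        }

        return nr;
}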

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Link: https://lore.kernel.org/r/20230116205059.3821738-5-shikemeng@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: rewrite sbitmap_find_bit_in_index to reduce repeat code</title>
<updated>2023-01-30T03:03:01+00:00</updated>
<author>
<name>Kemeng Shi</name>
<email>shikemeng@huaweicloud.com</email>
</author>
<published>2023-01-16T20:50:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=08470a98a7d7e32c787b23b87353f13b03c23195'/>
<id>urn:sha1:08470a98a7d7e32c787b23b87353f13b03c23195</id>
<content type='text'>
Rewrite sbitmap_find_bit_in_index as follows:
1. Rename sbitmap_find_bit_in_index to sbitmap_find_bit_in_word.
2. Accept "struct sbitmap_word *" directly instead of deriving it from
"struct sbitmap *" and "int index".
3. Accept depth/shallow_depth and wrap for __sbitmap_get_word from the
caller to support the needs of both __sbitmap_get_shallow and
__sbitmap_get.

With the helper function sbitmap_find_bit_in_word, we can remove
repeated code in __sbitmap_get_shallow that finds a bit while taking
deferred clearing into account.
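
The resulting helper, sketched (close to, but not necessarily, the
exact patch):

static int sbitmap_find_bit_in_word(struct sbitmap_word *map,
                                    unsigned int depth,
                                    unsigned int alloc_hint, bool wrap)
{
        int nr;

        do {
                nr = __sbitmap_get_word(&amp;map-&gt;word, depth,
                                        alloc_hint, wrap);
                if (nr != -1)
                        break;
                /* retry after folding deferred-cleared bits back in */
                if (!sbitmap_deferred_clear(map))
                        break;
        } while (1);

        return nr;
}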

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Link: https://lore.kernel.org/r/20230116205059.3821738-4-shikemeng@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>sbitmap: remove redundant check in __sbitmap_queue_get_batch</title>
<updated>2023-01-30T03:03:01+00:00</updated>
<author>
<name>Kemeng Shi</name>
<email>shikemeng@huaweicloud.com</email>
</author>
<published>2023-01-16T20:50:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=903e86f3a64d9573352bbab2f211fdbbaa5772b7'/>
<id>urn:sha1:903e86f3a64d9573352bbab2f211fdbbaa5772b7</id>
<content type='text'>
Commit fbb564a557809 ("lib/sbitmap: Fix invalid loop in
__sbitmap_queue_get_batch()") noted: "Checking free bits when
setting the target bits. Otherwise, it may reuse the busying bits."
That commit added a check to make sure all masked bits in the word
are zero before the cmpxchg, which makes the existing check after the
cmpxchg, testing whether any masked bit in the word is zero,
redundant.

Actually, the old value of the word before the cmpxchg is stored in
val, and busy bits in val are filtered out by "(get_mask &amp; ~val)"
after the cmpxchg, so the busy bits mentioned in commit fbb564a557809
("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()") will
not be reused. Revert the newly-added check to remove the redundant
one.
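
The surviving filter, as a sketch (variable names follow
__sbitmap_queue_get_batch; the surrounding loop is omitted):

get_mask = ((1UL &lt;&lt; nr_tags) - 1) &lt;&lt; nr;
val = READ_ONCE(map-&gt;word);
while (!atomic_long_try_cmpxchg((atomic_long_t *)&amp;map-&gt;word,
                                &amp;val, get_mask | val))
        ;
/* only bits that were zero in val were truly acquired here;
 * bits already busy before the cmpxchg drop out of the mask */
get_mask = (get_mask &amp; ~val) &gt;&gt; nr;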

Fixes: fbb564a55780 ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()")
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Link: https://lore.kernel.org/r/20230116205059.3821738-3-shikemeng@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
