<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/block/elevator.h, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-12-12T19:51:11+00:00</updated>
<entry>
<title>block: fix race between wbt_enable_default and IO submission</title>
<updated>2025-12-12T19:51:11+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2025-12-12T14:35:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9869d3a6fed381f3b98404e26e1afc75d680cbf9'/>
<id>urn:sha1:9869d3a6fed381f3b98404e26e1afc75d680cbf9</id>
<content type='text'>
When wbt_enable_default() is moved out of queue freezing in elevator_change(),
it can cause the wbt inflight counter to become negative (-1), leading to hung
tasks in the writeback path. Tasks get stuck in wbt_wait() because the counter
is in an inconsistent state.

The issue occurs because wbt_enable_default() could race with IO submission,
allowing the counter to be decremented before proper initialization. This manifests
as:

  rq_wait[0]:
    inflight:             -1
    has_waiters:        True

rwb_enabled() checks the state, which can be updated exactly between wbt_wait()
(rq_qos_throttle()) and wbt_track()(rq_qos_track()), then the inflight counter
will become negative.

And results in hung task warnings like:
  task:kworker/u24:39 state:D stack:0 pid:14767
  Call Trace:
    rq_qos_wait+0xb4/0x150
    wbt_wait+0xa9/0x100
    __rq_qos_throttle+0x24/0x40
    blk_mq_submit_bio+0x672/0x7b0
    ...

Fix this by:

1. Splitting wbt_enable_default() into:
   - __wbt_enable_default(): Returns true if wbt_init() should be called
   - wbt_enable_default(): Wrapper for existing callers (no init)
   - wbt_init_enable_default(): New function that checks and inits WBT

2. Using wbt_init_enable_default() in blk_register_queue() to ensure
   proper initialization during queue registration

3. Move wbt_init() out of wbt_enable_default() which is only for enabling
   disabled wbt from bfq and iocost, and wbt_init() isn't needed. Then the
   original lock warning can be avoided.

4. Removing the ELEVATOR_FLAG_ENABLE_WBT_ON_EXIT flag and its handling
   code since it's no longer needed

This ensures WBT is properly initialized before any IO can be submitted,
preventing the counter from going negative.

Cc: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Cc: Yu Kuai &lt;yukuai@fnnas.com&gt;
Cc: Guangwu Zhang &lt;guazhang@redhat.com&gt;
Fixes: 78c271344b6f ("block: move wbt_enable_default() out of queue freezing from sched -&gt;exit()")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: remove the declaration of elevator_init_mq function</title>
<updated>2025-11-18T23:27:19+00:00</updated>
<author>
<name>Chengkaitao</name>
<email>chengkaitao@kylinos.cn</email>
</author>
<published>2025-11-18T01:26:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8e1d91c2582d2b60d62616649546bb132fff566b'/>
<id>urn:sha1:8e1d91c2582d2b60d62616649546bb132fff566b</id>
<content type='text'>
In commit 1e44bedbc921 ("block: unifying elevator change"), the
elevator_init_mq function was deleted, but its declaration in elevator.h
was overlooked. This patch fixes it.

Signed-off-by: Chengkaitao &lt;chengkaitao@kylinos.cn&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: use {alloc|free}_sched data methods</title>
<updated>2025-11-13T16:27:49+00:00</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2025-11-13T08:58:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0315476e78c050048e80f66334a310e5581b46bb'/>
<id>urn:sha1:0315476e78c050048e80f66334a310e5581b46bb</id>
<content type='text'>
The previous patch introduced -&gt;alloc_sched_data and
-&gt;free_sched_data methods. This patch builds upon that
by now using these methods during elevator switch and
nr_hw_queue update.

It's also ensured that scheduler-specific data is
allocated and freed through the new callbacks outside
of the -&gt;freeze_lock and -&gt;elevator_lock locking contexts,
thereby preventing any dependency on pcpu_alloc_mutex.

Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: introduce alloc_sched_data and free_sched_data elevator methods</title>
<updated>2025-11-13T16:27:49+00:00</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2025-11-13T08:58:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=61019afdf6ac17c8e8f9c42665aa1fa82f04a3e2'/>
<id>urn:sha1:61019afdf6ac17c8e8f9c42665aa1fa82f04a3e2</id>
<content type='text'>
The recent lockdep splat [1] highlights a potential deadlock risk
involving -&gt;elevator_lock and -&gt;freeze_lock dependencies on -pcpu_alloc_
mutex. The trace shows that the issue occurs when the Kyber scheduler
allocates dynamic memory for its elevator data during initialization.

To address this, introduce two new elevator operation callbacks:
-&gt;alloc_sched_data and -&gt;free_sched_data. The subsequent patch would
build upon these newly introduced methods to suppress lockdep splat[1].

[1] https://lore.kernel.org/all/CAGVVp+VNW4M-5DZMNoADp6o2VKFhi7KxWpTDkcnVyjO0=-D5+A@mail.gmail.com/

Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: move elevator tags into struct elevator_resources</title>
<updated>2025-11-13T16:27:49+00:00</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2025-11-13T08:58:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=04728ce90966c54417fd8120a3820104d18ba68d'/>
<id>urn:sha1:04728ce90966c54417fd8120a3820104d18ba68d</id>
<content type='text'>
This patch introduces a new structure, struct elevator_resources, to
group together all elevator-related resources that share the same
lifetime. As a first step, this change moves the elevator tag pointer
from struct elv_change_ctx into the new struct elevator_resources.

Additionally, rename blk_mq_alloc_sched_tags_batch() and
blk_mq_free_sched_tags_batch() to blk_mq_alloc_sched_res_batch() and
blk_mq_free_sched_res_batch(), respectively. Introduce two new wrapper
helpers, blk_mq_alloc_sched_res() and blk_mq_free_sched_res(), around
blk_mq_alloc_sched_tags() and blk_mq_free_sched_tags().

These changes pave the way for consolidating the allocation and freeing
of elevator-specific resources into common helper functions. This
refactoring improves encapsulation and prepares the code for future
extensions, allowing additional elevator-specific data to be added to
struct elevator_resources without cluttering struct elv_change_ctx.

Subsequent patches will extend struct elevator_resources to include
other elevator-related data.

Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: unify elevator tags and type xarrays into struct elv_change_ctx</title>
<updated>2025-11-13T16:27:49+00:00</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2025-11-13T08:58:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=232143b605387b372dee0ec7830f93b93df5f67d'/>
<id>urn:sha1:232143b605387b372dee0ec7830f93b93df5f67d</id>
<content type='text'>
Currently, the nr_hw_queues update path manages two disjoint xarrays —
one for elevator tags and another for elevator type — both used during
elevator switching. Maintaining these two parallel structures for the
same purpose adds unnecessary complexity and potential for mismatched
state.

This patch unifies both xarrays into a single structure, struct
elv_change_ctx, which holds all per-queue elevator change context. A
single xarray, named elv_tbl, now maps each queue (q-&gt;id) in a tagset
to its corresponding elv_change_ctx entry, encapsulating the elevator
tags, type and name references.

This unification simplifies the code, improves maintainability, and
clarifies ownership of per-queue elevator state.

Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix elevator depth_updated method</title>
<updated>2025-09-05T19:52:52+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2025-08-21T06:06:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7d337eef4affc5e26e0570513168c69ddbc40f92'/>
<id>urn:sha1:7d337eef4affc5e26e0570513168c69ddbc40f92</id>
<content type='text'>
Current depth_updated has some problems:

1) depth_updated() will be called for each hctx, while all elevators
will update async_depth for the disk level, this is not related to hctx;
2) In blk_mq_update_nr_requests(), if previous hctx update succeed and
this hctx update failed, q-&gt;nr_requests will not be updated, while
async_depth is already updated with new nr_reqeuests in previous
depth_updated();
3) All elevators are using q-&gt;nr_requests to calculate async_depth now,
however, q-&gt;nr_requests is still the old value when depth_updated() is
called from blk_mq_update_nr_requests();

Those problems are first from error path, then mq-deadline, and recently
for bfq and kyber, fix those problems by:

- pass in request_queue instead of hctx;
- move depth_updated() after q-&gt;nr_requests is updated in
  blk_mq_update_nr_requests();
- add depth_updated() call inside init_sched() method to initialize
  async_depth;
- remove init_hctx() method for mq-deadline and bfq that is useless now;

Fixes: 77f1e0a52d26 ("bfq: update internal depth state when queue depth changes")
Fixes: 39823b47bbd4 ("block/mq-deadline: Fix the tag reservation code")
Fixes: 42e6c6ce03fd ("lib/sbitmap: convert shallow_depth from one word to the whole sbitmap")
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Li Nan &lt;linan122@huawei.com&gt;
Reviewed-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20250821060612.1729939-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: fix lockdep warning caused by lock dependency in elv_iosched_store</title>
<updated>2025-07-30T12:20:51+00:00</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2025-07-30T07:46:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f5a6604f7a4405450e4a1f54e5430f47290c500f'/>
<id>urn:sha1:f5a6604f7a4405450e4a1f54e5430f47290c500f</id>
<content type='text'>
Recent lockdep reports [1] have revealed a potential deadlock caused by a
lock dependency between the percpu allocator lock and the elevator lock.
This issue can be avoided by ensuring that the allocation and release of
scheduler tags (sched_tags) are performed outside the elevator lock.
Furthermore, the queue does not need to be remain frozen during these
operations.

To address this, move all sched_tags allocations and deallocations outside
of both the -&gt;elevator_lock and the -&gt;freeze_lock. Since the lifetime of
the elevator queue and its associated sched_tags is closely tied, the
allocated sched_tags are now stored in the elevator queue structure. Then,
during the actual elevator switch (which runs under -&gt;freeze_lock and
-&gt;elevator_lock), the pre-allocated sched_tags are assigned to the
appropriate q-&gt;hctx. Once the elevator switch is complete and the locks
are released, the old elevator queue and its associated sched_tags are
freed.

This commit specifically addresses the allocation/deallocation of sched_
tags during elevator switching. Note that sched_tags may also be allocated
in other contexts, such as during nr_hw_queues updates. Supporting that
use case will require batch allocation/deallocation, which will be handled
in a follow-up patch.

This restructuring ensures that sched_tags memory management occurs
entirely outside of the -&gt;elevator_lock and -&gt;freeze_lock context,
eliminating the lock dependency problem seen during scheduler updates.

[1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/

Reported-by: Stefan Haberland &lt;sth@linux.ibm.com&gt;
Closes: https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20250730074614.2537382-3-nilay@linux.ibm.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: move elevator queue allocation logic into blk_mq_init_sched</title>
<updated>2025-07-30T12:20:51+00:00</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2025-07-30T07:46:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=49811586be373e26a3ab52f54e0dfa663c02fddd'/>
<id>urn:sha1:49811586be373e26a3ab52f54e0dfa663c02fddd</id>
<content type='text'>
In preparation for allocating sched_tags before freezing the request
queue and acquiring -&gt;elevator_lock, move the elevator queue allocation
logic from the elevator ops -&gt;init_sched callback into blk_mq_init_sched.
As elevator_alloc is now only invoked from block layer core, we don't
need to export it, so unexport elevator_alloc function.

This refactoring provides a centralized location for elevator queue
initialization, which makes it easier to store pre-allocated sched_tags
in the struct elevator_queue during later changes.

Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20250730074614.2537382-2-nilay@linux.ibm.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: move wbt_enable_default() out of queue freezing from sched -&gt;exit()</title>
<updated>2025-05-06T13:43:43+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2025-05-05T14:18:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=78c271344b6f64ce24c845e54903e09928cf2061'/>
<id>urn:sha1:78c271344b6f64ce24c845e54903e09928cf2061</id>
<content type='text'>
scheduler's -&gt;exit() is called with queue frozen and elevator lock is held, and
wbt_enable_default() can't be called with queue frozen, otherwise the
following lockdep warning is triggered:

	#6 (&amp;q-&gt;rq_qos_mutex){+.+.}-{4:4}:
	#5 (&amp;eq-&gt;sysfs_lock){+.+.}-{4:4}:
	#4 (&amp;q-&gt;elevator_lock){+.+.}-{4:4}:
	#3 (&amp;q-&gt;q_usage_counter(io)#3){++++}-{0:0}:
	#2 (fs_reclaim){+.+.}-{0:0}:
	#1 (&amp;sb-&gt;s_type-&gt;i_mutex_key#3){+.+.}-{4:4}:
	#0 (&amp;q-&gt;debugfs_mutex){+.+.}-{4:4}:

Fix the issue by moving wbt_enable_default() out of bfq's exit(), and
call it from elevator_change_done().

Meantime add disk-&gt;rqos_state_mutex for covering wbt state change, which
matches the purpose more than -&gt;elevator_lock.

Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20250505141805.2751237-26-ming.lei@redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
