<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/block/blk-iocost.c, branch v7.1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-31T19:56:38+00:00</updated>
<entry>
<title>blk-iocost: fix busy_level reset when no IOs complete</title>
<updated>2026-03-31T19:56:38+00:00</updated>
<author>
<name>Jialin Wang</name>
<email>wjl.linux@gmail.com</email>
</author>
<published>2026-03-31T10:05:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f91ffe89b2016d280995a9c28d73288b02d83615'/>
<id>urn:sha1:f91ffe89b2016d280995a9c28d73288b02d83615</id>
<content type='text'>
When a disk is saturated, it is common for no IOs to complete within a
timer period. Currently, in this case, rq_wait_pct and missed_ppm are
calculated as 0, the iocost incorrectly interprets this as meeting QoS
targets and resets busy_level to 0.

This reset prevents busy_level from reaching the threshold (4) needed
to reduce vrate. On certain cloud storage, such as Azure Premium SSD,
we observed that iocost may fail to reduce vrate for tens of seconds
during saturation, failing to mitigate noisy neighbor issues.

Fix this by tracking the number of IO completions (nr_done) in a period.
If nr_done is 0 and there are lagging IOs, the saturation status is
unknown, so we keep busy_level unchanged.

The issue is consistently reproducible on Azure Standard_D8as_v5 (Dasv5)
VMs with 512GB Premium SSD (P20) using the script below. It was not
observed on GCP n2d VMs (with 100G pd-ssd and 1.5T local-ssd), and no
regressions were found with this patch. In this script, cgA performs
large IOs with iodepth=128, while cgB performs small IOs with iodepth=1
rate_iops=100 rw=randrw. With iocost enabled, we expect it to throttle
cgA, the submission latency (slat) of cgA should be significantly higher,
cgB can reach 200 IOPS and the completion latency (clat) should below.

  BLK_DEVID="8:0"
  MODEL="rbps=173471131 rseqiops=3566 rrandiops=3566 wbps=173333269 wseqiops=3566 wrandiops=3566"
  QOS="rpct=90 rlat=3500 wpct=90 wlat=3500 min=80 max=10000"

  echo "$BLK_DEVID ctrl=user model=linear $MODEL" &gt; /sys/fs/cgroup/io.cost.model
  echo "$BLK_DEVID enable=1 ctrl=user $QOS" &gt; /sys/fs/cgroup/io.cost.qos

  CG_A="/sys/fs/cgroup/cgA"
  CG_B="/sys/fs/cgroup/cgB"

  FILE_A="/path/to/sda/A.fio.testfile"
  FILE_B="/path/to/sda/B.fio.testfile"
  RESULT_DIR="./iocost_results_$(date +%Y%m%d_%H%M%S)"

  mkdir -p "$CG_A" "$CG_B" "$RESULT_DIR"

  get_result() {
    local file=$1
    local label=$2

    local results=$(jq -r '
    .jobs[0].mixed |
    ( .iops | tonumber | round ) as $iops |
    ( .bw_bytes / 1024 / 1024 ) as $bps |
    ( .slat_ns.mean / 1000000 ) as $slat |
    ( .clat_ns.mean / 1000000 ) as $avg |
    ( .clat_ns.max / 1000000 ) as $max |
    ( .clat_ns.percentile["90.000000"] / 1000000 ) as $p90 |
    ( .clat_ns.percentile["99.000000"] / 1000000 ) as $p99 |
    ( .clat_ns.percentile["99.900000"] / 1000000 ) as $p999 |
    ( .clat_ns.percentile["99.990000"] / 1000000 ) as $p9999 |
    "\($iops)|\($bps)|\($slat)|\($avg)|\($max)|\($p90)|\($p99)|\($p999)|\($p9999)"
    ' "$file")

    IFS='|' read -r iops bps slat avg max p90 p99 p999 p9999 &lt;&lt;&lt;"$results"
    printf "%-8s %-6s %-7.2f %-8.2f %-8.2f %-8.2f %-8.2f %-8.2f %-8.2f %-8.2f\n" \
           "$label" "$iops" "$bps" "$slat" "$avg" "$max" "$p90" "$p99" "$p999" "$p9999"
  }

  run_fio() {
    local cg_path=$1
    local filename=$2
    local name=$3
    local bs=$4
    local qd=$5
    local out=$6
    shift 6
    local extra=$@

    (
      pid=$(sh -c 'echo $PPID')
      echo $pid &gt;"${cg_path}/cgroup.procs"
      fio --name="$name" --filename="$filename" --direct=1 --rw=randrw --rwmixread=50 \
          --ioengine=libaio --bs="$bs" --iodepth="$qd" --size=4G --runtime=10 \
          --time_based --group_reporting --unified_rw_reporting=mixed \
          --output-format=json --output="$out" $extra &gt;/dev/null 2&gt;&amp;1
    ) &amp;
  }

  echo "Starting Test ..."

  for bs_b in "4k" "32k" "256k"; do
    echo "Running iteration: BS=$bs_b"
    out_a="${RESULT_DIR}/cgA_1m.json"
    out_b="${RESULT_DIR}/cgB_${bs_b}.json"

    # cgA: Heavy background (BS 1MB, QD 128)
    run_fio "$CG_A" "$FILE_A" "cgA" "1m" 128 "$out_a"
    # cgB: Latency sensitive (Variable BS, QD 1, Read/Write IOPS limit 100)
    run_fio "$CG_B" "$FILE_B" "cgB" "$bs_b" 1 "$out_b" "--rate_iops=100"

    wait
    SUMMARY_DATA+="$(get_result "$out_a" "cgA-1m")"$'\n'
    SUMMARY_DATA+="$(get_result "$out_b" "cgB-$bs_b")"$'\n\n'
  done

  echo -e "\nFinal Results Summary:\n"

  printf "%-8s %-6s %-7s %-8s %-8s %-8s %-8s %-8s %-8s %-8s\n" \
          "" "" "" "slat" "clat" "clat" "clat" "clat" "clat" "clat"
  printf "%-8s %-6s %-7s %-8s %-8s %-8s %-8s %-8s %-8s %-8s\n\n" \
          "CGROUP" "IOPS" "MB/s" "avg(ms)" "avg(ms)" "max(ms)" "P90(ms)" "P99" "P99.9" "P99.99"
  echo "$SUMMARY_DATA"

  echo "Results saved in $RESULT_DIR"

Before:
                          slat     clat     clat     clat     clat     clat     clat
  CGROUP   IOPS   MB/s    avg(ms)  avg(ms)  max(ms)  P90(ms)  P99      P99.9    P99.99

  cgA-1m   166    166.37  3.44     748.95   1298.29  977.27   1233.13  1300.23  1300.23
  cgB-4k   5      0.02    0.02     181.74   761.32   742.39   759.17   759.17   759.17

  cgA-1m   167    166.51  1.98     748.68   1549.41  809.50   1451.23  1551.89  1551.89
  cgB-32k  6      0.18    0.02     169.98   761.76   742.39   759.17   759.17   759.17

  cgA-1m   166    165.55  2.89     750.89   1540.37  851.44   1451.23  1535.12  1535.12
  cgB-256k 5      1.30    0.02     191.35   759.51   750.78   759.17   759.17   759.17

After:
                          slat     clat     clat     clat     clat     clat     clat
  CGROUP   IOPS   MB/s    avg(ms)  avg(ms)  max(ms)  P90(ms)  P99      P99.9    P99.99

  cgA-1m   162    162.48  6.14     749.69   850.02   826.28   834.67   843.06   851.44
  cgB-4k   199    0.78    0.01     1.95     42.12    2.57     7.50     34.87    42.21

  cgA-1m   146    146.20  6.83     833.04   908.68   893.39   901.78   910.16   910.16
  cgB-32k  200    6.25    0.01     2.32     31.40    3.06     7.50     16.58    31.33

  cgA-1m   110    110.46  9.04     1082.67  1197.91  1182.79  1199.57  1199.57  1199.57
  cgB-256k 200    49.98   0.02     3.69     22.20    4.88     9.11     20.05    22.15

Signed-off-by: Jialin Wang &lt;wjl.linux@gmail.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://patch.msgid.link/20260331100509.182882-1-wjl.linux@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Convert 'alloc_obj' family to use the new default GFP_KERNEL argument</title>
<updated>2026-02-22T01:09:51+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-22T00:37:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43'/>
<id>urn:sha1:bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43</id>
<content type='text'>
This was done entirely with mindless brute force, using

    git grep -l '\&lt;k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>treewide: Replace kmalloc with kmalloc_obj for non-scalar types</title>
<updated>2026-02-21T09:02:28+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2026-02-21T07:49:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=69050f8d6d075dc01af7a5f2f550a8067510366f'/>
<id>urn:sha1:69050f8d6d075dc01af7a5f2f550a8067510366f</id>
<content type='text'>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: introduce blk_queue_rot()</title>
<updated>2026-01-29T20:15:50+00:00</updated>
<author>
<name>Damien Le Moal</name>
<email>dlemoal@kernel.org</email>
</author>
<published>2026-01-29T07:27:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2719bd1ee1a1cd0535bc62e89b52822f2bbd14eb'/>
<id>urn:sha1:2719bd1ee1a1cd0535bc62e89b52822f2bbd14eb</id>
<content type='text'>
To check if a request queue is for a rotational device, a double
negation is needed with the pattern "!blk_queue_nonrot(q)". Simplify
this with the introduction of the helper blk_queue_rot() which tests
if a requests queue limit has the BLK_FEAT_ROTATIONAL feature set.
All call sites of blk_queue_nonrot() are modified to use blk_queue_rot()
and blk_queue_nonrot() definition removed.

No functional changes.

Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Nitesh Shetty &lt;nj.shetty@samsung.com&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: use min() instead of min_t()</title>
<updated>2025-11-20T14:44:29+00:00</updated>
<author>
<name>David Laight</name>
<email>david.laight.linux@gmail.com</email>
</author>
<published>2025-11-19T22:41:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9420e720ad192c53c8d2803c5a2313b2d586adbd'/>
<id>urn:sha1:9420e720ad192c53c8d2803c5a2313b2d586adbd</id>
<content type='text'>
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'.
Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long'
and so cannot discard significant bits.

In this case the 'unsigned long' value is small enough that the result
is ok.

(Similarly for max_t() and clamp_t().)

Detected by an extra check added to min_t().

Signed-off-by: David Laight &lt;david.laight.linux@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux</title>
<updated>2025-03-27T01:08:55+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-03-27T01:08:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9b960d8cd6f712cb2c03e2bdd4d5ca058238037f'/>
<id>urn:sha1:9b960d8cd6f712cb2c03e2bdd4d5ca058238037f</id>
<content type='text'>
Pull block updates from Jens Axboe:

 - Fixes for integrity handling

 - NVMe pull request via Keith:
      - Secure concatenation for TCP transport (Hannes)
      - Multipath sysfs visibility (Nilay)
      - Various cleanups (Qasim, Baruch, Wang, Chen, Mike, Damien, Li)
      - Correct use of 64-bit BARs for pci-epf target (Niklas)
      - Socket fix for selinux when used in containers (Peijie)

 - MD pull request via Yu:
      - fix recovery can preempt resync (Li Nan)
      - fix md-bitmap IO limit (Su Yue)
      - fix raid10 discard with REQ_NOWAIT (Xiao Ni)
      - fix raid1 memory leak (Zheng Qixing)
      - fix mddev uaf (Yu Kuai)
      - fix raid1,raid10 IO flags (Yu Kuai)
      - some refactor and cleanup (Yu Kuai)

 - Series cleaning up and fixing bugs in the bad block handling code

 - Improve support for write failure simulation in null_blk

 - Various lock ordering fixes

 - Fixes for locking for debugfs attributes

 - Various ublk related fixes and improvements

 - Cleanups for blk-rq-qos wait handling

 - blk-throttle fixes

 - Fixes for loop dio and sync handling

 - Fixes and cleanups for the auto-PI code

 - Block side support for hardware encryption keys in blk-crypto

 - Various cleanups and fixes

* tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux: (105 commits)
  nvmet: replace max(a, min(b, c)) by clamp(val, lo, hi)
  nvme-tcp: fix selinux denied when calling sock_sendmsg
  nvmet: pci-epf: Always configure BAR0 as 64-bit
  nvmet: Remove duplicate uuid_copy
  nvme: zns: Simplify nvme_zone_parse_entry()
  nvmet: pci-epf: Remove redundant 'flush_workqueue()' calls
  nvmet-fc: Remove unused functions
  nvme-pci: remove stale comment
  nvme-fc: Utilise min3() to simplify queue count calculation
  nvme-multipath: Add visibility for queue-depth io-policy
  nvme-multipath: Add visibility for numa io-policy
  nvme-multipath: Add visibility for round-robin io-policy
  nvmet: add tls_concat and tls_key debugfs entries
  nvmet-tcp: support secure channel concatenation
  nvmet: Add 'sq' argument to alloc_ctrl_args
  nvme-fabrics: reset admin connection for secure concatenation
  nvme-tcp: request secure channel concatenation
  nvme-keyring: add nvme_tls_psk_refresh()
  nvme: add nvme_auth_derive_tls_psk()
  nvme: add nvme_auth_generate_digest()
  ...
</content>
</entry>
<entry>
<title>block/blk-iocost: ensure 'ret' is set on error</title>
<updated>2025-03-19T20:51:36+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-03-19T20:36:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=03c90afb21b45edb87533fa6f11c5f914d26298b'/>
<id>urn:sha1:03c90afb21b45edb87533fa6f11c5f914d26298b</id>
<content type='text'>
In case blkg_conf_open_bdev_frozen() fails, ioc_qos_write() jumps to the
error path without assigning a value to 'ret'. Ensure that it inherits
the error from the passed back error value.

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202503200454.QWpwKeJu-lkp@intel.com/
Fixes: 9730763f4756 ("block: correct locking order for protecting blk-wbt parameters")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: correct locking order for protecting blk-wbt parameters</title>
<updated>2025-03-19T17:35:45+00:00</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2025-03-19T10:53:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9730763f4756e32520cb86778331465e8d063a8f'/>
<id>urn:sha1:9730763f4756e32520cb86778331465e8d063a8f</id>
<content type='text'>
The commit '245618f8e45f ("block: protect wbt_lat_usec using q-&gt;
elevator_lock")' introduced q-&gt;elevator_lock to protect updates
to blk-wbt parameters when writing to the sysfs attribute wbt_
lat_usec and the cgroup attribute io.cost.qos.  However, both
these attributes also acquire q-&gt;rq_qos_mutex, leading to the
following lockdep warning:

======================================================
WARNING: possible circular locking dependency detected
6.14.0-rc5+ #138 Not tainted
------------------------------------------------------
bash/5902 is trying to acquire lock:
c000000085d495a0 (&amp;q-&gt;rq_qos_mutex){+.+.}-{4:4}, at: wbt_init+0x164/0x238

but task is already holding lock:
c000000085d498c8 (&amp;q-&gt;elevator_lock){+.+.}-{4:4}, at: queue_wb_lat_store+0xb0/0x20c

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-&gt; #1 (&amp;q-&gt;elevator_lock){+.+.}-{4:4}:
        __mutex_lock+0xf0/0xa58
        ioc_qos_write+0x16c/0x85c
        cgroup_file_write+0xc4/0x32c
        kernfs_fop_write_iter+0x1b8/0x29c
        vfs_write+0x410/0x584
        ksys_write+0x84/0x140
        system_call_exception+0x134/0x360
        system_call_vectored_common+0x15c/0x2ec

-&gt; #0 (&amp;q-&gt;rq_qos_mutex){+.+.}-{4:4}:
        __lock_acquire+0x1b6c/0x2ae0
        lock_acquire+0x140/0x430
        __mutex_lock+0xf0/0xa58
        wbt_init+0x164/0x238
        queue_wb_lat_store+0x1dc/0x20c
        queue_attr_store+0x12c/0x164
        sysfs_kf_write+0x6c/0xb0
        kernfs_fop_write_iter+0x1b8/0x29c
        vfs_write+0x410/0x584
        ksys_write+0x84/0x140
        system_call_exception+0x134/0x360
        system_call_vectored_common+0x15c/0x2ec

other info that might help us debug this:

    Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
    lock(&amp;q-&gt;elevator_lock);
                                lock(&amp;q-&gt;rq_qos_mutex);
                                lock(&amp;q-&gt;elevator_lock);
    lock(&amp;q-&gt;rq_qos_mutex);

    *** DEADLOCK ***

6 locks held by bash/5902:
    #0: c000000051122400 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x84/0x140
    #1: c00000007383f088 (&amp;of-&gt;mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x174/0x29c
    #2: c000000008550428 (kn-&gt;active#182){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x180/0x29c
    #3: c000000085d493a8 (&amp;q-&gt;q_usage_counter(io)#5){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x28/0x40
    #4: c000000085d493e0 (&amp;q-&gt;q_usage_counter(queue)#5){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x28/0x40
    #5: c000000085d498c8 (&amp;q-&gt;elevator_lock){+.+.}-{4:4}, at: queue_wb_lat_store+0xb0/0x20c

stack backtrace:
CPU: 17 UID: 0 PID: 5902 Comm: bash Kdump: loaded Not tainted 6.14.0-rc5+ #138
Hardware name: IBM,9043-MRX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_028) hv:phyp pSeries
Call Trace:
[c0000000721ef590] [c00000000118f8a8] dump_stack_lvl+0x108/0x18c (unreliable)
[c0000000721ef5c0] [c00000000022563c] print_circular_bug+0x448/0x604
[c0000000721ef670] [c000000000225a44] check_noncircular+0x24c/0x26c
[c0000000721ef740] [c00000000022bf28] __lock_acquire+0x1b6c/0x2ae0
[c0000000721ef870] [c000000000229240] lock_acquire+0x140/0x430
[c0000000721ef970] [c0000000011cfbec] __mutex_lock+0xf0/0xa58
[c0000000721efaa0] [c00000000096c46c] wbt_init+0x164/0x238
[c0000000721efaf0] [c0000000008f8cd8] queue_wb_lat_store+0x1dc/0x20c
[c0000000721efb50] [c0000000008f8fa0] queue_attr_store+0x12c/0x164
[c0000000721efc60] [c0000000007c11cc] sysfs_kf_write+0x6c/0xb0
[c0000000721efca0] [c0000000007bfa4c] kernfs_fop_write_iter+0x1b8/0x29c
[c0000000721efcf0] [c0000000006a281c] vfs_write+0x410/0x584
[c0000000721efdc0] [c0000000006a2cc8] ksys_write+0x84/0x140
[c0000000721efe10] [c000000000031b64] system_call_exception+0x134/0x360
[c0000000721efe50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec

&gt;From the above log it's apparent that method which writes to sysfs attr
wbt_lat_usec acquires q-&gt;elevator_lock first, and then acquires q-&gt;rq_
qos_mutex. However the another method which writes to io.cost.qos,
acquires q-&gt;rq_qos_mutex first, and then acquires q-&gt;rq_qos_mutex. So
this could potentially cause the deadlock.

A closer look at ioc_qos_write shows that correcting the lock order is
non-trivial because q-&gt;rq_qos_mutex is acquired in blkg_conf_open_bdev
and released in blkg_conf_exit. The function blkg_conf_open_bdev is
responsible for parsing user input and finding the corresponding block
device (bdev) from the user provided major:minor number.

Since we do not know the bdev until blkg_conf_open_bdev completes, we
cannot simply move q-&gt;elevator_lock acquisition before blkg_conf_open_
bdev. So to address this, we intoduce new helpers blkg_conf_open_bdev_
frozen and blkg_conf_exit_frozen which are just wrappers around blkg_
conf_open_bdev and blkg_conf_exit respectively. The helper blkg_conf_
open_bdev_frozen is similar to blkg_conf_open_bdev, but additionally
freezes the queue, acquires q-&gt;elevator_lock and ensures the correct
locking order is followed between q-&gt;elevator_lock and q-&gt;rq_qos_mutex.
Similarly another helper blkg_conf_exit_frozen in addition to unfreezing
the queue ensures that we release the locks in correct order.

By using these helpers, now we maintain the same locking order in all
code paths where we update blk-wbt parameters.

Fixes: 245618f8e45f ("block: protect wbt_lat_usec using q-&gt;elevator_lock")
Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Closes: https://lore.kernel.org/oe-lkp/202503171650.cc082b66-lkp@intel.com
Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20250319105518.468941-3-nilay@linux.ibm.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: release q-&gt;elevator_lock in ioc_qos_write</title>
<updated>2025-03-19T17:35:45+00:00</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2025-03-19T10:53:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=89ed5fa3b5419f04452051fbcb6d3e5b801cdb1b'/>
<id>urn:sha1:89ed5fa3b5419f04452051fbcb6d3e5b801cdb1b</id>
<content type='text'>
The ioc_qos_write method acquires q-&gt;elevator_lock to protect
updates to blk-wbt parameters. Once these updates are complete,
the lock should be released before returning from ioc_qos_write.

However, in one code path, the release of q-&gt;elevator_lock was
mistakenly omitted, potentially leading to a lock leak. This commit
fixes the issue by ensuring that q-&gt;elevator_lock is properly
released in all return paths of ioc_qos_write.

Fixes: 245618f8e45f ("block: protect wbt_lat_usec using q-&gt;elevator_lock")
Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Closes: https://lore.kernel.org/oe-lkp/202503171650.cc082b66-lkp@intel.com
Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20250319105518.468941-2-nilay@linux.ibm.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: protect wbt_lat_usec using q-&gt;elevator_lock</title>
<updated>2025-03-10T13:30:18+00:00</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2025-03-04T10:22:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=245618f8e45ff4f79327627b474b563da71c2c75'/>
<id>urn:sha1:245618f8e45ff4f79327627b474b563da71c2c75</id>
<content type='text'>
The wbt latency and state could be updated while initializing the
elevator or exiting the elevator. It could be also updated while
configuring IO latency QoS parameters using cgroup. The elevator
code path is now protected with q-&gt;elevator_lock. So we should
protect the access to sysfs attribute wbt_lat_usec using q-&gt;elevator
_lock instead of q-&gt;sysfs_lock. White we're at it, also protect
ioc_qos_write(), which configures wbt parameters via cgroup, using
q-&gt;elevator_lock.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20250304102551.2533767-7-nilay@linux.ibm.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
