kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2026-04-02	nvme-pci: ensure we're polling a polled queue	Keith Busch	1	-1/+2
	[ Upstream commit 166e31d7dbf6aa44829b98aa446bda5c9580f12a ] A user can change the polled queue count at run time. There's a brief window during a reset where a hipri task may try to poll that queue before the block layer has updated the queue maps, which would race with the now interrupt driven queue and may cause double completions. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02	nvme-fabrics: use kfree_sensitive() for DHCHAP secrets	Daniel Hodges	1	-2/+2
	[ Upstream commit 0a1fc2f301529ac75aec0ce80d5ab9d9e4dc4b16 ] The DHCHAP secrets (dhchap_secret and dhchap_ctrl_secret) contain authentication key material for NVMe-oF. Use kfree_sensitive() instead of kfree() in nvmf_free_options() to ensure secrets are zeroed before the memory is freed, preventing recovery from freed pages. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Hodges <hodgesd@meta.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02	nvme-pci: cap queue creation to used queues	Keith Busch	1	-1/+7
	[ Upstream commit 4735b510a00fb2d4ac9e8d21a8c9552cb281f585 ] If the user reduces the special queue count at runtime and resets the controller, we need to reduce the number of queues and interrupts requested accordingly rather than start with the pre-allocated queue count. Tested-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19	nvme-pci: Fix race bug in nvme_poll_irqdisable()	Sungwoo Kim	1	-2/+4
	[ Upstream commit fc71f409b22ca831a9f87a2712eaa09ef2bb4a5e ] In the following scenario, pdev can be disabled between (1) and (3) by (2). This sets pdev->msix_enabled = 0. Then, pci_irq_vector() will return MSI-X IRQ(>15) for (1) whereas return INTx IRQ(<=15) for (2). This causes IRQ warning because it tries to enable INTx IRQ that has never been disabled before. To fix this, save IRQ number into a local variable and ensure disable_irq() and enable_irq() operate on the same IRQ number. Even if pci_free_irq_vectors() frees the IRQ concurrently, disable_irq() and enable_irq() on a stale IRQ number is still valid and safe, and the depth accounting reamins balanced. task 1: nvme_poll_irqdisable() disable_irq(pci_irq_vector(pdev, nvmeq->cq_vector)) ...(1) enable_irq(pci_irq_vector(pdev, nvmeq->cq_vector)) ...(3) task 2: nvme_reset_work() nvme_dev_disable() pdev->msix_enable = 0; ...(2) crash log: ------------[ cut here ]------------ Unbalanced enable for IRQ 10 WARNING: kernel/irq/manage.c:753 at __enable_irq+0x102/0x190 kernel/irq/manage.c:753, CPU#1: kworker/1:0H/26 Modules linked in: CPU: 1 UID: 0 PID: 26 Comm: kworker/1:0H Not tainted 6.19.0-dirty #9 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Workqueue: kblockd blk_mq_timeout_work RIP: 0010:__enable_irq+0x107/0x190 kernel/irq/manage.c:753 Code: ff df 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 79 48 8d 3d 2e 7a 3f 05 41 8b 74 24 2c <67> 48 0f b9 3a e8 ef b9 21 00 5b 41 5c 5d e9 46 54 66 03 e8 e1 b9 RSP: 0018:ffffc900001bf550 EFLAGS: 00010046 RAX: 0000000000000007 RBX: 0000000000000000 RCX: ffffffffb20c0e90 RDX: 0000000000000000 RSI: 000000000000000a RDI: ffffffffb74b88f0 RBP: ffffc900001bf560 R08: ffff88800197cf00 R09: 0000000000000001 R10: 0000000000000003 R11: 0000000000000003 R12: ffff8880012a6000 R13: 1ffff92000037eae R14: 000000000000000a R15: 0000000000000293 FS: 0000000000000000(0000) GS:ffff8880b49f7000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000555da4a25fa8 CR3: 00000000208e8000 CR4: 00000000000006f0 Call Trace: <TASK> enable_irq+0x121/0x1e0 kernel/irq/manage.c:797 nvme_poll_irqdisable+0x162/0x1c0 drivers/nvme/host/pci.c:1494 nvme_timeout+0x965/0x14b0 drivers/nvme/host/pci.c:1744 blk_mq_rq_timed_out block/blk-mq.c:1653 [inline] blk_mq_handle_expired+0x227/0x2d0 block/blk-mq.c:1721 bt_iter+0x2fc/0x3a0 block/blk-mq-tag.c:292 __sbitmap_for_each_set include/linux/sbitmap.h:269 [inline] sbitmap_for_each_set include/linux/sbitmap.h:290 [inline] bt_for_each block/blk-mq-tag.c:324 [inline] blk_mq_queue_tag_busy_iter+0x969/0x1e80 block/blk-mq-tag.c:536 blk_mq_timeout_work+0x627/0x870 block/blk-mq.c:1763 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257 process_scheduled_works kernel/workqueue.c:3340 [inline] worker_thread+0x65c/0xe60 kernel/workqueue.c:3421 kthread+0x41a/0x930 kernel/kthread.c:463 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 </TASK> irq event stamp: 74478 hardirqs last enabled at (74477): [<ffffffffb5720a9c>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline] hardirqs last enabled at (74477): [<ffffffffb5720a9c>] _raw_spin_unlock_irq+0x2c/0x60 kernel/locking/spinlock.c:202 hardirqs last disabled at (74478): [<ffffffffb57207b5>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline] hardirqs last disabled at (74478): [<ffffffffb57207b5>] _raw_spin_lock_irqsave+0x85/0xa0 kernel/locking/spinlock.c:162 softirqs last enabled at (74304): [<ffffffffb1e9466c>] __do_softirq kernel/softirq.c:656 [inline] softirqs last enabled at (74304): [<ffffffffb1e9466c>] invoke_softirq kernel/softirq.c:496 [inline] softirqs last enabled at (74304): [<ffffffffb1e9466c>] __irq_exit_rcu+0xdc/0x120 kernel/softirq.c:723 softirqs last disabled at (74287): [<ffffffffb1e9466c>] __do_softirq kernel/softirq.c:656 [inline] softirqs last disabled at (74287): [<ffffffffb1e9466c>] invoke_softirq kernel/softirq.c:496 [inline] softirqs last disabled at (74287): [<ffffffffb1e9466c>] __irq_exit_rcu+0xdc/0x120 kernel/softirq.c:723 ---[ end trace 0000000000000000 ]--- Fixes: fa059b856a59 (nvme-pci: Simplify nvme_poll_irqdisable) Acked-by: Chao Shi <cshi008@fiu.edu> Acked-by: Weidong Zhu <weizhu@fiu.edu> Acked-by: Dave Tian <daveti@purdue.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19	nvme-pci: Fix slab-out-of-bounds in nvme_dbbuf_set	Sungwoo Kim	1	-1/+1
	[ Upstream commit b4e78f1427c7d6859229ae9616df54e1fc05a516 ] dev->online_queues is a count incremented in nvme_init_queue. Thus, valid indices are 0 through dev->online_queues − 1. This patch fixes the loop condition to ensure the index stays within the valid range. Index 0 is excluded because it is the admin queue. KASAN splat: ================================================================== BUG: KASAN: slab-out-of-bounds in nvme_dbbuf_free drivers/nvme/host/pci.c:377 [inline] BUG: KASAN: slab-out-of-bounds in nvme_dbbuf_set+0x39c/0x400 drivers/nvme/host/pci.c:404 Read of size 2 at addr ffff88800592a574 by task kworker/u8:5/74 CPU: 0 UID: 0 PID: 74 Comm: kworker/u8:5 Not tainted 6.19.0-dirty #10 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 Workqueue: nvme-reset-wq nvme_reset_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0xea/0x150 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xce/0x5d0 mm/kasan/report.c:482 kasan_report+0xdc/0x110 mm/kasan/report.c:595 __asan_report_load2_noabort+0x18/0x20 mm/kasan/report_generic.c:379 nvme_dbbuf_free drivers/nvme/host/pci.c:377 [inline] nvme_dbbuf_set+0x39c/0x400 drivers/nvme/host/pci.c:404 nvme_reset_work+0x36b/0x8c0 drivers/nvme/host/pci.c:3252 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257 process_scheduled_works kernel/workqueue.c:3340 [inline] worker_thread+0x65c/0xe60 kernel/workqueue.c:3421 kthread+0x41a/0x930 kernel/kthread.c:463 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 </TASK> Allocated by task 34 on cpu 1 at 4.241550s: kasan_save_stack+0x2c/0x60 mm/kasan/common.c:57 kasan_save_track+0x1c/0x70 mm/kasan/common.c:78 kasan_save_alloc_info+0x3c/0x50 mm/kasan/generic.c:570 poison_kmalloc_redzone mm/kasan/common.c:398 [inline] __kasan_kmalloc+0xb5/0xc0 mm/kasan/common.c:415 kasan_kmalloc include/linux/kasan.h:263 [inline] __do_kmalloc_node mm/slub.c:5657 [inline] __kmalloc_node_noprof+0x2bf/0x8d0 mm/slub.c:5663 kmalloc_array_node_noprof include/linux/slab.h:1075 [inline] nvme_pci_alloc_dev drivers/nvme/host/pci.c:3479 [inline] nvme_probe+0x2f1/0x1820 drivers/nvme/host/pci.c:3534 local_pci_probe+0xef/0x1c0 drivers/pci/pci-driver.c:324 pci_call_probe drivers/pci/pci-driver.c:392 [inline] __pci_device_probe drivers/pci/pci-driver.c:417 [inline] pci_device_probe+0x743/0x920 drivers/pci/pci-driver.c:451 call_driver_probe drivers/base/dd.c:583 [inline] really_probe+0x29b/0xb70 drivers/base/dd.c:661 __driver_probe_device+0x3b0/0x4a0 drivers/base/dd.c:803 driver_probe_device+0x56/0x1f0 drivers/base/dd.c:833 __driver_attach_async_helper+0x155/0x340 drivers/base/dd.c:1159 async_run_entry_fn+0xa6/0x4b0 kernel/async.c:129 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257 process_scheduled_works kernel/workqueue.c:3340 [inline] worker_thread+0x65c/0xe60 kernel/workqueue.c:3421 kthread+0x41a/0x930 kernel/kthread.c:463 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 The buggy address belongs to the object at ffff88800592a000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 244 bytes to the right of allocated 1152-byte region [ffff88800592a000, ffff88800592a480) The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5928 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 anon flags: 0xfffffc0000040(head\|node=0\|zone=1\|lastcpupid=0x1fffff) page_type: f5(slab) raw: 000fffffc0000040 ffff888001042000 0000000000000000 dead000000000001 raw: 0000000000000000 0000000000080008 00000000f5000000 0000000000000000 head: 000fffffc0000040 ffff888001042000 0000000000000000 dead000000000001 head: 0000000000000000 0000000000080008 00000000f5000000 0000000000000000 head: 000fffffc0000003 ffffea0000164a01 00000000ffffffff 00000000ffffffff head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88800592a400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88800592a480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88800592a500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff88800592a580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88800592a600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Fixes: 0f0d2c876c96 (nvme: free sq/cq dbbuf pointers when dbbuf set fails) Acked-by: Chao Shi <cshi008@fiu.edu> Acked-by: Weidong Zhu <weizhu@fiu.edu> Acked-by: Dave Tian <daveti@purdue.edu> Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12	nvme: fix memory allocation in nvme_pr_read_keys()	Sungwoo Kim	1	-2/+2
	[ Upstream commit c3320153769f05fd7fe9d840cb555dd3080ae424 ] nvme_pr_read_keys() takes num_keys from userspace and uses it to calculate the allocation size for rse via struct_size(). The upper limit is PR_KEYS_MAX (64K). A malicious or buggy userspace can pass a large num_keys value that results in a 4MB allocation attempt at most, causing a warning in the page allocator when the order exceeds MAX_PAGE_ORDER. To fix this, use kvzalloc() instead of kzalloc(). This bug has the same reasoning and fix with the patch below: https://lore.kernel.org/linux-block/20251212013510.3576091-1-kartikey406@gmail.com/ Warning log: WARNING: mm/page_alloc.c:5216 at __alloc_frozen_pages_noprof+0x5aa/0x2300 mm/page_alloc.c:5216, CPU#1: syz-executor117/272 Modules linked in: CPU: 1 UID: 0 PID: 272 Comm: syz-executor117 Not tainted 6.19.0 #1 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:__alloc_frozen_pages_noprof+0x5aa/0x2300 mm/page_alloc.c:5216 Code: ff 83 bd a8 fe ff ff 0a 0f 86 69 fb ff ff 0f b6 1d f9 f9 c4 04 80 fb 01 0f 87 3b 76 30 ff 83 e3 01 75 09 c6 05 e4 f9 c4 04 01 <0f> 0b 48 c7 85 70 fe ff ff 00 00 00 00 e9 8f fd ff ff 31 c0 e9 0d RSP: 0018:ffffc90000fcf450 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1ffff920001f9ea0 RDX: 0000000000000000 RSI: 000000000000000b RDI: 0000000000040dc0 RBP: ffffc90000fcf648 R08: ffff88800b6c3380 R09: 0000000000000001 R10: ffffc90000fcf840 R11: ffff88807ffad280 R12: 0000000000000000 R13: 0000000000040dc0 R14: 0000000000000001 R15: ffffc90000fcf620 FS: 0000555565db33c0(0000) GS:ffff8880be26c000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000002000000c CR3: 0000000003b72000 CR4: 00000000000006f0 Call Trace: <TASK> alloc_pages_mpol+0x236/0x4d0 mm/mempolicy.c:2486 alloc_frozen_pages_noprof+0x149/0x180 mm/mempolicy.c:2557 ___kmalloc_large_node+0x10c/0x140 mm/slub.c:5598 __kmalloc_large_node_noprof+0x25/0xc0 mm/slub.c:5629 __do_kmalloc_node mm/slub.c:5645 [inline] __kmalloc_noprof+0x483/0x6f0 mm/slub.c:5669 kmalloc_noprof include/linux/slab.h:961 [inline] kzalloc_noprof include/linux/slab.h:1094 [inline] nvme_pr_read_keys+0x8f/0x4c0 drivers/nvme/host/pr.c:245 blkdev_pr_read_keys block/ioctl.c:456 [inline] blkdev_common_ioctl+0x1b71/0x29b0 block/ioctl.c:730 blkdev_ioctl+0x299/0x700 block/ioctl.c:786 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl fs/ioctl.c:583 [inline] __x64_sys_ioctl+0x1bf/0x220 fs/ioctl.c:583 x64_sys_call+0x1280/0x21b0 mnt/fuzznvme_1/fuzznvme/linux-build/v6.19/./arch/x86/include/generated/asm/syscalls_64.h:17 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x71/0x330 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fb893d3108d Code: 28 c3 e8 46 1e 00 00 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffff61f2f38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffff61f3138 RCX: 00007fb893d3108d RDX: 0000000020000040 RSI: 00000000c01070ce RDI: 0000000000000003 RBP: 0000000000000001 R08: 0000000000000000 R09: 00007ffff61f3138 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 00007ffff61f3128 R14: 00007fb893dae530 R15: 0000000000000001 </TASK> Fixes: 5fd96a4e15de (nvme: Add pr_ops read_keys support) Acked-by: Chao Shi <cshi008@fiu.edu> Acked-by: Weidong Zhu <weizhu@fiu.edu> Acked-by: Dave Tian <daveti@purdue.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12	nvme-multipath: fix leak on try_module_get failure	Keith Busch	1	-7/+5
	[ Upstream commit 0f5197ea9a73a4c406c75e6d8af3a13f7f96ae89 ] We need to fall back to the synchronous removal if we can't get a reference on the module needed for the deferred removal. Fixes: 62188639ec16 ("nvme-multipath: introduce delayed removal of the multipath head node") Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12	nvme: fix admin queue leak on controller reset	Ming Lei	1	-0/+7
	[ Upstream commit b84bb7bd913d8ca2f976ee6faf4a174f91c02b8d ] When nvme_alloc_admin_tag_set() is called during a controller reset, a previous admin queue may still exist. Release it properly before allocating a new one to avoid orphaning the old queue. This fixes a regression introduced by commit 03b3bcd319b3 ("nvme: fix admin request_queue lifetime"). Cc: Keith Busch <kbusch@kernel.org> Fixes: 03b3bcd319b3 ("nvme: fix admin request_queue lifetime"). Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/linux-block/CAHj4cs9wv3SdPo+N01Fw2SHBYDs9tj2M_e1-GdQOkRy=DsBB1w@mail.gmail.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-05	nvme-pci: handle changing device dma map requirements	Keith Busch	1	-15/+30
	The initial state of dma_needs_unmap may be false, but change to true while mapping the data iterator. Enabling swiotlb is one such case that can change the result. The nvme driver needs to save the mapped dma vectors to be unmapped later, so allocate as needed during iteration rather than assume it was always allocated at the beginning. This fixes a NULL dereference from accessing an uninitialized dma_vecs when the device dma unmapping requirements change mid-iteration. Fixes: b8b7570a7ec8 ("nvme-pci: fix dma unmapping when using PRPs and not using the IOVA mapping") Link: https://lore.kernel.org/linux-nvme/20260202125738.1194899-1-pradeep.pragallapati@oss.qualcomm.com/ Reported-by: Pradeep P V K <pradeep.pragallapati@oss.qualcomm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2026-01-28	nvme-pci: DMA unmap the correct regions in nvme_free_sgls	Roger Pau Monne	1	-2/+2
	The call to nvme_free_sgls() in nvme_unmap_data() has the sg_list and sge parameters swapped. This wasn't noticed by the compiler because both share the same type. On a Xen PV hardware domain, and possibly any other architectures that takes that path, this leads to corruption of the NVMe contents. Fixes: f0887e2a52d4 ("nvme-pci: create common sgl unmapping helper") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2026-01-14	nvme: fix PCIe subsystem reset controller state transition	Nilay Shroff	1	-1/+4
	The commit d2fe192348f9 (“nvme: only allow entering LIVE from CONNECTING state”) disallows controller state transitions directly from RESETTING to LIVE. However, the NVMe PCIe subsystem reset path relies on this transition to recover the controller on PowerPC (PPC) systems. On PPC systems, issuing a subsystem reset causes a temporary loss of communication with the NVMe adapter. A subsequent PCIe MMIO read then triggers EEH recovery, which restores the PCIe link and brings the controller back online. For EEH recovery to proceed correctly, the controller must transition back to the LIVE state. Due to the changes introduced by commit d2fe192348f9 (“nvme: only allow entering LIVE from CONNECTING state”), the controller can no longer transition directly from RESETTING to LIVE. As a result, EEH recovery exits prematurely, leaving the controller stuck in the RESETTING state. Fix this by explicitly transitioning the controller state from RESETTING to CONNECTING and then to LIVE. This satisfies the updated state transition rules and allows the controller to be successfully recovered on PPC systems following a PCIe subsystem reset. Cc: stable@vger.kernel.org Fixes: d2fe192348f9 ("nvme: only allow entering LIVE from CONNECTING state") Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2026-01-09	nvme-fc: release admin tagset if init fails	Chaitanya Kulkarni	1	-0/+2
	nvme_fabrics creates an NVMe/FC controller in following path: nvmf_dev_write() -> nvmf_create_ctrl() -> nvme_fc_create_ctrl() -> nvme_fc_init_ctrl() nvme_fc_init_ctrl() allocates the admin blk-mq resources right after nvme_add_ctrl() succeeds. If any of the subsequent steps fail (changing the controller state, scheduling connect work, etc.), we jump to the fail_ctrl path, which tears down the controller references but never frees the admin queue/tag set. The leaked blk-mq allocations match the kmemleak report seen during blktests nvme/fc. Check ctrl->ctrl.admin_tagset in the fail_ctrl path and call nvme_remove_admin_tag_set() when it is set so that all admin queue allocations are reclaimed whenever controller setup aborts. Reported-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2026-01-09	nvme-apple: add "apple,t8103-nvme-ans2" as compatible	Janne Grunau	1	-0/+1
	After discussion with the devicetree maintainers we agreed to not extend lists with the generic compatible "apple,nvme-ans2" anymore [1]. Add "apple,t8103-nvme-ans2" as fallback compatible as it is the SoC the driver and bindings were written for. [1]: https://lore.kernel.org/asahi/12ab93b7-1fc2-4ce0-926e-c8141cfe81bf@kernel.org/ Cc: stable@vger.kernel.org # v6.18+ Fixes: 5bd2927aceba ("nvme-apple: Add initial Apple SoC NVMe driver") Reviewed-by: Neal Gompa <neal@gompa.dev> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Janne Grunau <j@jannau.net> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-12-15	nvme-pci: disable secondary temp for Wodposit WPBSNM8	Ilikara Zheng	1	-0/+2
	Secondary temperature thresholds (temp2_{min,max}) were not reported properly on this NVMe SSD. This resulted in an error while attempting to read these values with sensors(1): ERROR: Can't get value of subfeature temp2_min: I/O error ERROR: Can't get value of subfeature temp2_max: I/O error Add the device to the nvme_id_table with the NVME_QUIRK_NO_SECONDARY_TEMP_THRESH flag to suppress access to all non- composite temperature thresholds. Cc: stable@vger.kernel.org Tested-by: Wu Haotian <rigoligo03@gmail.com> Signed-off-by: Ilikara Zheng <ilikara@aosc.io> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-12-05	Merge tag 'nvme-6.19-2025-12-04' of git://git.infradead.org/nvme into block-6.19	Jens Axboe	4	-4/+10
	Pull NVMe updates from Keith: "- Subsystem usage cleanups (Max) - Endpoint device fixes (Shin'ichiro) - Debug statements (Gerd) - FC fabrics cleanups and fixes (Daniel) - Consistent alloc API usages (Israel) - Code comment updates (Chu) - Authentication retry fix (Justin)" * tag 'nvme-6.19-2025-12-04' of git://git.infradead.org/nvme: nvme-fabrics: add ENOKEY to no retry criteria for authentication failures nvme-auth: use kvfree() for memory allocated with kvcalloc() nvmet-tcp: use kvcalloc for commands array nvmet-rdma: use kvcalloc for commands and responses arrays nvme: fix typo error in nvme target nvmet-fc: use pr_* print macros instead of dev_* nvmet-fcloop: remove unused lsdir member. nvmet-fcloop: check all request and response have been processed nvme-fc: check all request and response have been processed nvme-fc: don't hold rport lock when putting ctrl nvme-pci: add debug message on fail to read CSTS nvme-pci: print error message on failure in nvme_probe nvmet: pci-epf: fix DMA channel debug print nvmet: pci-epf: move DMA initialization to EPC init callback nvmet: remove redundant subsysnqn field from ctrl nvmet: add sanity checks when freeing subsystem
2025-12-05	nvme-fabrics: add ENOKEY to no retry criteria for authentication failures	Justin Tee	1	-1/+1
	With authentication, in addition to EKEYREJECTED there is also no point in retrying reconnects when status is ENOKEY. Thus, add -ENOKEY as another criteria to determine when to stop retries. Cc: Daniel Wagner <wagi@kernel.org> Cc: Hannes Reinecke <hare@suse.de> Closes: https://lore.kernel.org/linux-nvme/20250829-nvme-fc-sync-v3-0-d69c87e63aee@kernel.org/ Signed-off-by: Justin Tee <justintee8345@gmail.com> Tested-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-12-05	nvme-auth: use kvfree() for memory allocated with kvcalloc()	Israel Rukshin	1	-1/+1
	Memory allocated by kvcalloc() may come from vmalloc or kmalloc, so use kvfree() instead of kfree() for proper deallocation. Fixes: aa36d711e945 ("nvme-auth: convert dhchap_auth_list to an array") Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-12-05	nvme-fc: check all request and response have been processed	Daniel Wagner	1	-0/+2
	When the rport is removed there shouldn't be any in flight request or responses. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-12-04	nvme: reject invalid pr_read_keys() num_keys values	Stefan Hajnoczi	1	-1/+5
	The pr_read_keys() interface has a u32 num_keys parameter. The NVMe Reservation Report command has a u32 maximum length. Reject num_keys values that are too large to fit. This will become important when pr_read_keys() is exposed to untrusted userspace via an <linux/pr.h> ioctl. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-04	block: use bio_alloc_bioset for passthru IO by default	Fengnan Chang	1	-1/+1
	Use bio_alloc_bioset for passthru IO by default, so that we can enable bio cache for irq and polled passthru IO in later. Signed-off-by: Fengnan Chang <changfengnan@bytedance.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-04	Merge tag 'for-6.19/block-20251201' of ↵	Linus Torvalds	10	-35/+131
	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - Fix head insertion for mq-deadline, a regression from when priority support was added - Series simplifying and improving the ublk user copy code - Various ublk related cleanups - Fixup REQ_NOWAIT handling in loop/zloop, clearing NOWAIT when the request is punted to a thread for handling - Merge and then later revert loop dio nowait support, as it ended up causing excessive stack usage for when the inline issue code needs to dip back into the full file system code - Improve auto integrity code, making it less deadlock prone - Speedup polled IO handling, but manually managing the hctx lookups - Fixes for blk-throttle for SSD devices - Small series with fixes for the S390 dasd driver - Add support for caching zones, avoiding unnecessary report zone queries - MD pull requests via Yu: - fix null-ptr-dereference regression for dm-raid0 - fix IO hang for raid5 when array is broken with IO inflight - remove legacy 1s delay to speed up system shutdown - change maintainer's email address - data can be lost if array is created with different lbs devices, fix this problem and record lbs of the array in metadata - fix rcu protection for md_thread - fix mddev kobject lifetime regression - enable atomic writes for md-linear - some cleanups - bcache updates via Coly - remove useless discard and cache device code - improve usage of per-cpu workqueues - Reorganize the IO scheduler switching code, fixing some lockdep reports as well - Improve the block layer P2P DMA support - Add support to the block tracing code for zoned devices - Segment calculation improves, and memory alignment flexibility improvements - Set of prep and cleanups patches for ublk batching support. The actual batching hasn't been added yet, but helps shrink down the workload of getting that patchset ready for 6.20 - Fix for how the ps3 block driver handles segments offsets - Improve how block plugging handles batch tag allocations - nbd fixes for use-after-free of the configuration on device clear/put - Set of improvements and fixes for zloop - Add Damien as maintainer of the block zoned device code handling - Various other fixes and cleanups * tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits) block/rnbd: correct all kernel-doc complaints blk-mq: use queue_hctx in blk_mq_map_queue_type md: remove legacy 1s delay in md_notify_reboot md/raid5: fix IO hang when array is broken with IO inflight md: warn about updating super block failure md/raid0: fix NULL pointer dereference in create_strip_zones() for dm-raid sbitmap: fix all kernel-doc warnings ublk: add helper of __ublk_fetch() ublk: pass const pointer to ublk_queue_is_zoned() ublk: refactor auto buffer register in ublk_dispatch_req() ublk: add `union ublk_io_buf` with improved naming ublk: add parameter `struct io_uring_cmd *` to ublk_prep_auto_buf_reg() kfifo: add kfifo_alloc_node() helper for NUMA awareness blk-mq: fix potential uaf for 'queue_hw_ctx' blk-mq: use array manage hctx map instead of xarray ublk: prevent invalid access with DEBUG s390/dasd: Use scnprintf() instead of sprintf() s390/dasd: Move device name formatting into separate function s390/dasd: Remove unnecessary debugfs_create() return checks s390/dasd: Fix gendisk parent after copy pair swap ...
2025-12-04	Merge tag 'for-6.19/io_uring-20251201' of ↵	Linus Torvalds	1	-3/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring updates from Jens Axboe: - Unify how task_work cancelations are detected, placing it in the task_work running state rather than needing to check the task state - Series cleaning up and moving the cancelation code to where it belongs, in cancel.c - Cleanup of waitid and futex argument handling - Add support for mixed sized SQEs. 6.18 added support for mixed sized CQEs, improving flexibility and efficiency of workloads that need big CQEs. This adds similar support for SQEs, where the occasional need for a 128b SQE doesn't necessitate having all SQEs be 128b in size - Introduce zcrx and SQ/CQ layout queries. The former returns what zcrx features are available. And both return the ring size information to help with allocation size calculation for user provided rings like IORING_SETUP_NO_MMAP and IORING_MEM_REGION_TYPE_USER - Zcrx updates for 6.19. It includes a bunch of small patches, IORING_REGISTER_ZCRX_CTRL and RQ flushing and David's work on sharing zcrx b/w multiple io_uring instances - Series cleaning up ring initializations, notable deduplicating ring size and offset calculations. It also moves most of the checking before doing any allocations, making the code simpler - Add support for getsockname and getpeername, which is mostly a trivial hookup after a bit of refactoring on the networking side - Various fixes and cleanups * tag 'for-6.19/io_uring-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (68 commits) io_uring: Introduce getsockname io_uring cmd socket: Split out a getsockname helper for io_uring socket: Unify getsockname and getpeername implementation io_uring/query: drop unused io_handle_query_entry() ctx arg io_uring/kbuf: remove obsolete buf_nr_pages and update comments io_uring/register: use correct location for io_rings_layout io_uring/zcrx: share an ifq between rings io_uring/zcrx: add io_fill_zcrx_offsets() io_uring/zcrx: export zcrx via a file io_uring/zcrx: move io_zcrx_scrub() and dependencies up io_uring/zcrx: count zcrx users io_uring/zcrx: add sync refill queue flushing io_uring/zcrx: introduce IORING_REGISTER_ZCRX_CTRL io_uring/zcrx: elide passing msg flags io_uring/zcrx: use folio_nr_pages() instead of shift operation io_uring/zcrx: convert to use netmem_desc io_uring/query: introduce rings info query io_uring/query: introduce zcrx query io_uring: move cq/sq user offset init around io_uring: pre-calculate scq layout ...
2025-12-02	nvme-fc: don't hold rport lock when putting ctrl	Daniel Wagner	1	-2/+4
	nvme_fc_ctrl_put can acquire the rport lock when freeing the ctrl object: nvme_fc_ctrl_put nvme_fc_ctrl_free spin_lock_irqsave(rport->lock) Thus we can't hold the rport lock when calling nvme_fc_ctrl_put. Justin suggested use the safe list iterator variant because nvme_fc_ctrl_put will also modify the rport->list. Cc: Justin Tee <justin.tee@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-12-02	nvme-pci: add debug message on fail to read CSTS	Gerd Bayer	1	-0/+1
	Add a debug log spelling out that reading the CSTS register failed - to distinguish this from other reasons for ENODEV. Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-12-02	nvme-pci: print error message on failure in nvme_probe	Gerd Bayer	1	-0/+1
	Add a new error message that makes failures to probe visible in the kernel log, like: nvme 0008:00:00.0: error -ENODEV: probe failed This highlights issues with a particular device right away instead of leaving users to search for missing drives. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-11-27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	3	-9/+11
	Conflicts: net/xdp/xsk.c 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number") 8da7bea7db69 ("xsk: add indirect call for xsk_destruct_skb") 30ed05adca4a ("xsk: use a smaller new lock for shared pool case") https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17	nvme: nvme-fc: Ensure ->ioerr_work is cancelled in nvme_fc_delete_ctrl()	Ewan D. Milne	1	-1/+1
	nvme_fc_delete_assocation() waits for pending I/O to complete before returning, and an error can cause ->ioerr_work to be queued after cancel_work_sync() had been called. Move the call to cancel_work_sync() to be after nvme_fc_delete_association() to ensure ->ioerr_work is not running when the nvme_fc_ctrl object is freed. Otherwise the following can occur: [ 1135.911754] list_del corruption, ff2d24c8093f31f8->next is NULL [ 1135.917705] ------------[ cut here ]------------ [ 1135.922336] kernel BUG at lib/list_debug.c:52! [ 1135.926784] Oops: invalid opcode: 0000 [#1] SMP NOPTI [ 1135.931851] CPU: 48 UID: 0 PID: 726 Comm: kworker/u449:23 Kdump: loaded Not tainted 6.12.0 #1 PREEMPT(voluntary) [ 1135.943490] Hardware name: Dell Inc. PowerEdge R660/0HGTK9, BIOS 2.5.4 01/16/2025 [ 1135.950969] Workqueue: 0x0 (nvme-wq) [ 1135.954673] RIP: 0010:__list_del_entry_valid_or_report.cold+0xf/0x6f [ 1135.961041] Code: c7 c7 98 68 72 94 e8 26 45 fe ff 0f 0b 48 c7 c7 70 68 72 94 e8 18 45 fe ff 0f 0b 48 89 fe 48 c7 c7 80 69 72 94 e8 07 45 fe ff <0f> 0b 48 89 d1 48 c7 c7 a0 6a 72 94 48 89 c2 e8 f3 44 fe ff 0f 0b [ 1135.979788] RSP: 0018:ff579b19482d3e50 EFLAGS: 00010046 [ 1135.985015] RAX: 0000000000000033 RBX: ff2d24c8093f31f0 RCX: 0000000000000000 [ 1135.992148] RDX: 0000000000000000 RSI: ff2d24d6bfa1d0c0 RDI: ff2d24d6bfa1d0c0 [ 1135.999278] RBP: ff2d24c8093f31f8 R08: 0000000000000000 R09: ffffffff951e2b08 [ 1136.006413] R10: ffffffff95122ac8 R11: 0000000000000003 R12: ff2d24c78697c100 [ 1136.013546] R13: fffffffffffffff8 R14: 0000000000000000 R15: ff2d24c78697c0c0 [ 1136.020677] FS: 0000000000000000(0000) GS:ff2d24d6bfa00000(0000) knlGS:0000000000000000 [ 1136.028765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1136.034510] CR2: 00007fd207f90b80 CR3: 000000163ea22003 CR4: 0000000000f73ef0 [ 1136.041641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1136.048776] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 1136.055910] PKRU: 55555554 [ 1136.058623] Call Trace: [ 1136.061074] <TASK> [ 1136.063179] ? show_trace_log_lvl+0x1b0/0x2f0 [ 1136.067540] ? show_trace_log_lvl+0x1b0/0x2f0 [ 1136.071898] ? move_linked_works+0x4a/0xa0 [ 1136.075998] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.081744] ? __die_body.cold+0x8/0x12 [ 1136.085584] ? die+0x2e/0x50 [ 1136.088469] ? do_trap+0xca/0x110 [ 1136.091789] ? do_error_trap+0x65/0x80 [ 1136.095543] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.101289] ? exc_invalid_op+0x50/0x70 [ 1136.105127] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.110874] ? asm_exc_invalid_op+0x1a/0x20 [ 1136.115059] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.120806] move_linked_works+0x4a/0xa0 [ 1136.124733] worker_thread+0x216/0x3a0 [ 1136.128485] ? __pfx_worker_thread+0x10/0x10 [ 1136.132758] kthread+0xfa/0x240 [ 1136.135904] ? __pfx_kthread+0x10/0x10 [ 1136.139657] ret_from_fork+0x31/0x50 [ 1136.143236] ? __pfx_kthread+0x10/0x10 [ 1136.146988] ret_from_fork_asm+0x1a/0x30 [ 1136.150915] </TASK> Fixes: 19fce0470f05 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context") Cc: stable@vger.kernel.org Tested-by: Marco Patalano <mpatalan@redhat.com> Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-11-17	nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl()	Ewan D. Milne	1	-6/+7
	Now target is removed from nvme_fc_ctrl_free() which is the ctrl->ref release handler. And even admin queue is unquiesced there, this way is definitely wrong because the ctr->ref is grabbed when submitting command. And Marco observed that nvme_fc_ctrl_free() can be called from request completion code path, and trigger kernel warning since request completes from softirq context. Fix the issue by moveing target removal into nvme_fc_delete_ctrl(), which is also aligned with nvme-tcp and nvme-rdma. Patch originally proposed by Ming Lei, then modified to move the tagset removal down to after nvme_fc_delete_association() after further testing. Cc: Marco Patalano <mpatalan@redhat.com> Cc: Ewan Milne <emilne@redhat.com> Cc: James Smart <james.smart@broadcom.com> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Cc: stable@vger.kernel.org Tested-by: Marco Patalano <mpatalan@redhat.com> Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-11-17	nvme-multipath: fix lockdep WARN due to partition scan work	Shin'ichiro Kawasaki	1	-1/+1
	Blktests test cases nvme/014, 057 and 058 fail occasionally due to a lockdep WARN. As reported in the Closes tag URL, the WARN indicates that a deadlock can happen due to the dependency among disk->open_mutex, kblockd workqueue completion and partition_scan_work completion. To avoid the lockdep WARN and the potential deadlock, cut the dependency by running the partition_scan_work not by kblockd workqueue but by nvme_wq. Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/linux-block/CAHj4cs8mJ+R_GmQm9R8ebResKAWUE8kF5+_WVg0v8zndmqd6BQ@mail.gmail.com/ Link: https://lore.kernel.org/linux-block/oeyzci6ffshpukpfqgztsdeke5ost5hzsuz4rrsjfmvpqcevax@5nhnwbkzbrpa/ Fixes: 1f021341eef4 ("nvme-multipath: defer partition scanning") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-11-14	block-dma: properly take MMIO path	Leon Romanovsky	1	-8/+65
	In commit eadaa8b255f3 ("dma-mapping: introduce new DMA attribute to indicate MMIO memory"), DMA_ATTR_MMIO attribute was added to describe MMIO addresses, which require to avoid any memory cache flushing, as an outcome of the discussion pointed in Link tag below. In case of PCI_P2PDMA_MAP_THRU_HOST_BRIDGE transfer, blk-mq-dm logic treated this as regular page and relied on "struct page" DMA flow. That flow performs CPU cache flushing, which shouldn't be done here, and doesn't set IOMMU_MMIO flag in DMA-IOMMU case. As a solution, let's encode peer-to-peer transaction type in NVMe IOD flags variable and provide it to blk-mq-dma API. Link: https://lore.kernel.org/all/f912c446-1ae9-4390-9c11-00dce7bf0fd3@arm.com/ Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-14	nvme-pci: migrate to dma_map_phys instead of map_page	Leon Romanovsky	1	-12/+13
	After introduction of dma_map_phys(), there is no need to convert from physical address to struct page in order to map page. So let's use it directly. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-07	nvme: remove virtual boundary for sgl capable devices	Keith Busch	8	-8/+47
	The nvme virtual boundary is only required for the PRP format. Devices that can use SGL for DMA don't need it for IO queues. Drop reporting it for such devices; rdma fabrics controllers will continue to use the limit as they currently don't report any boundary requirements, but tcp and fc never needed it in the first place so they get to report no virtual boundary. Applications may continue to align to the same virtual boundaries for optimization purposes if they want, and the driver will continue to decide whether to use the PRP format the same as before if the IO allows it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	1	-3/+10
	Cross-merge networking fixes after downstream PR (net-6.18-rc5). Conflicts: drivers/net/wireless/ath/ath12k/mac.c 9222582ec524 ("Revert "wifi: ath12k: Fix missing station power save configuration"") 6917e268c433 ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon") https://lore.kernel.org/11cece9f7e36c12efd732baa5718239b1bf8c950.camel@sipsolutions.net Adjacent changes: drivers/net/ethernet/intel/Kconfig b1d16f7c0063 ("libie: depend on DEBUG_FS when building LIBIE_FWLOG") 93f53db9f9dc ("ice: switch to Page Pool") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05	nvme: fix admin request_queue lifetime	Keith Busch	1	-1/+2
	The namespaces can access the controller's admin request_queue, and stale references on the namespaces may exist after tearing down the controller. Ensure the admin request_queue is active by moving the controller's 'put' to after all controller references have been released to ensure no one is can access the request_queue. This fixes a reported use-after-free bug: BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0 Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287 CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G E 6.13.2-ga1582f1a031e #15 Tainted: [E]=UNSIGNED_MODULE Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025 Call Trace: <TASK> dump_stack_lvl+0x4f/0x60 print_report+0xc4/0x620 ? _raw_spin_lock_irqsave+0x70/0xb0 ? _raw_read_unlock_irqrestore+0x30/0x30 ? blk_queue_enter+0x41c/0x4a0 kasan_report+0xab/0xe0 ? blk_queue_enter+0x41c/0x4a0 blk_queue_enter+0x41c/0x4a0 ? __irq_work_queue_local+0x75/0x1d0 ? blk_queue_start_drain+0x70/0x70 ? irq_work_queue+0x18/0x20 ? vprintk_emit.part.0+0x1cc/0x350 ? wake_up_klogd_work_func+0x60/0x60 blk_mq_alloc_request+0x2b7/0x6b0 ? __blk_mq_alloc_requests+0x1060/0x1060 ? __switch_to+0x5b7/0x1060 nvme_submit_user_cmd+0xa9/0x330 nvme_user_cmd.isra.0+0x240/0x3f0 ? force_sigsegv+0xe0/0xe0 ? nvme_user_cmd64+0x400/0x400 ? vfs_fileattr_set+0x9b0/0x9b0 ? cgroup_update_frozen_flag+0x24/0x1c0 ? cgroup_leave_frozen+0x204/0x330 ? nvme_ioctl+0x7c/0x2c0 blkdev_ioctl+0x1a8/0x4d0 ? blkdev_common_ioctl+0x1930/0x1930 ? fdget+0x54/0x380 __x64_sys_ioctl+0x129/0x190 do_syscall_64+0x5b/0x160 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f765f703b0b Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000 R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003 R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60 </TASK> Reported-by: Casey Chen <cachen@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-11-05	block: introduce disk_report_zone()	Damien Le Moal	4	-11/+10
	Commit b76b840fd933 ("dm: Fix dm-zoned-reclaim zone write pointer alignment") introduced an indirect call for the callback function of a report zones executed with blkdev_report_zones(). This is necessary so that the function disk_zone_wplug_sync_wp_offset() can be called to refresh a zone write plug zone write pointer offset after a write error. However, this solution makes following the path of a zone information harder to understand. Clean this up by introducing the new blk_report_zones_args structure to define a zone report callback and its private data and introduce the helper function disk_report_zone() which calls both disk_zone_wplug_sync_wp_offset() and the zone report user callback function for all zones of a zone report. This helper function must be called by all block device drivers that implement the report zones block operation in order to correctly report a zone information. All block device drivers supporting the report_zones block operation are updated to use this new scheme. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-05	net: Convert proto_ops connect() callbacks to use sockaddr_unsized	Kees Cook	1	-1/+1
	Update all struct proto_ops connect() callback function prototypes from "struct sockaddr " to "struct sockaddr_unsized " to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05	net: Convert proto_ops bind() callbacks to use sockaddr_unsized	Kees Cook	1	-1/+1
	Update all struct proto_ops bind() callback function prototypes from "struct sockaddr " to "struct sockaddr_unsized " to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-03	io_uring/uring_cmd: avoid double indirect call in task work dispatch	Caleb Sander Mateos	1	-3/+4
	io_uring task work dispatch makes an indirect call to struct io_kiocb's io_task_work.func field to allow running arbitrary task work functions. In the uring_cmd case, this calls io_uring_cmd_work(), which immediately makes another indirect call to struct io_uring_cmd's task_work_cb field. Change the uring_cmd task work callbacks to functions whose signatures match io_req_tw_func_t. Add a function io_uring_cmd_from_tw() to convert from the task work's struct io_tw_req argument to struct io_uring_cmd *. Define a constant IO_URING_CMD_TASK_WORK_ISSUE_FLAGS to avoid manufacturing issue_flags in the uring_cmd task work callbacks. Now uring_cmd task work dispatch makes a single indirect call to the uring_cmd implementation's callback. This also allows removing the task_work_cb field from struct io_uring_cmd, freeing up 8 bytes for future storage. Since fuse_uring_send_in_task() now has access to the io_tw_token_t, check its cancel field directly instead of relying on the IO_URING_F_TASK_DEAD issue flag. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-10-23	nvme-pci: use blk_map_iter for p2p metadata	Keith Busch	1	-3/+10
	The dma_map_bvec helper doesn't work for p2p data, so use the same blk_map_iter method that sgl uses for this memory type. Reported-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-10-16	nvme/tcp: handle tls partially sent records in write_space()	Wilfred Mallawa	1	-0/+3
	With TLS enabled, records that are encrypted and appended to TLS TX list can fail to see a retry if the underlying TCP socket is busy, for example, hitting an EAGAIN from tcp_sendmsg_locked(). This is not known to the NVMe TCP driver, as the TLS layer successfully generated a record. Typically, the TLS write_space() callback would ensure such records are retried, but in the NVMe TCP Host driver, write_space() invokes nvme_tcp_write_space(). This causes a partially sent record in the TLS TX list to timeout after not being retried. This patch fixes the above by calling queue->write_space(), which calls into the TLS layer to retry any pending records. Fixes: be8e82caa685 ("nvme-tcp: enable TLS handshake upcall") Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-10-14	nvme-auth: update sc_c in host response	Martin George	1	-1/+5
	The sc_c field is currently not updated in the host response to the controller challenge leading to failures while attempting secure channel concatenation. Fix this by adding a new sc_c variable to the dhchap queue context structure which is appropriately set during negotiate and then used in the host response. Fixes: e88a7595b57f ("nvme-tcp: request secure channel concatenation") Signed-off-by: Martin George <marting@netapp.com> Signed-off-by: Prashanth Adurthi <prashana@netapp.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-10-09	nvme-multipath: Skip nr_active increments in RETRY disposition	Amit Chaudhary	1	-2/+4
	For queue-depth I/O policy, this patch fixes unbalanced I/Os across nvme multipaths. Issue Description: The RETRY disposition incorrectly increments ns->ctrl->nr_active counter and reinitializes iostat start-time. In such cases nr_active counter never goes back to zero until that path disconnects and reconnects. Such a path is not chosen for new I/Os if multiple RETRY cases on a given a path cause its queue-depth counter to be artificially higher compared to other paths. This leads to unbalanced I/Os across paths. The patch skips incrementing nr_active if NVME_MPATH_CNT_ACTIVE is already set. And it skips restarting io stats if NVME_MPATH_IO_STATS is already set. base-commit: e989a3da2d371a4b6597ee8dee5c72e407b4db7a Fixes: d4d957b53d91eeb ("nvme-multipath: support io stats on the mpath device") Signed-off-by: Amit Chaudhary <achaudhary@purestorage.com> Reviewed-by: Randy Jennings <randyj@purestorage.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-10-02	Merge tag 'for-6.18/block-20250929' of ↵	Linus Torvalds	7	-105/+127
	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - NVMe pull request via Keith: - FC target fixes (Daniel) - Authentication fixes and updates (Martin, Chris) - Admin controller handling (Kamaljit) - Target lockdep assertions (Max) - Keep-alive updates for discovery (Alastair) - Suspend quirk (Georg) - MD pull request via Yu: - Add support for a lockless bitmap. A key feature for the new bitmap are that the IO fastpath is lockless. If a user issues lots of write IO to the same bitmap bit in a short time, only the first write has additional overhead to update bitmap bit, no additional overhead for the following writes. By supporting only resync or recover written data, means in the case creating new array or replacing with a new disk, there is no need to do a full disk resync/recovery. - Switch ->getgeo() and ->bios_param() to using struct gendisk rather than struct block_device. - Rust block changes via Andreas. This series adds configuration via configfs and remote completion to the rnull driver. The series also includes a set of changes to the rust block device driver API: a few cleanup patches, and a few features supporting the rnull changes. The series removes the raw buffer formatting logic from `kernel::block` and improves the logic available in `kernel::string` to support the same use as the removed logic. - floppy arch cleanups - Reduce the number of dereferencing needed for ublk commands - Restrict supported sockets for nbd. Mostly done to eliminate a class of issues perpetually reported by syzbot, by using nonsensical socket setups. - A few s390 dasd block fixes - Fix a few issues around atomic writes - Improve DMA interation for integrity requests - Improve how iovecs are treated with regards to O_DIRECT aligment constraints. We used to require each segment to adhere to the constraints, now only the request as a whole needs to. - Clean up and improve p2p support, enabling use of p2p for metadata payloads - Improve locking of request lookup, using SRCU where appropriate - Use page references properly for brd, avoiding very long RCU sections - Fix ordering of recursively submitted IOs - Clean up and improve updating nr_requests for a live device - Various fixes and cleanups * tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (164 commits) s390/dasd: enforce dma_alignment to ensure proper buffer validation s390/dasd: Return BLK_STS_INVAL for EINVAL from do_dasd_request ublk: remove redundant zone op check in ublk_setup_iod() nvme: Use non zero KATO for persistent discovery connections nvmet: add safety check for subsys lock nvme-core: use nvme_is_io_ctrl() for I/O controller check nvme-core: do ioccsz/iorcsz validation only for I/O controllers nvme-core: add method to check for an I/O controller blk-cgroup: fix possible deadlock while configuring policy blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path blk-mq: Fix more tag iteration function documentation selftests: ublk: fix behavior when fio is not installed ublk: don't access ublk_queue in ublk_unmap_io() ublk: pass ublk_io to __ublk_complete_rq() ublk: don't access ublk_queue in ublk_need_complete_req() ublk: don't access ublk_queue in ublk_check_commit_and_fetch() ublk: don't pass ublk_queue to ublk_fetch() ublk: don't access ublk_queue in ublk_config_io_buf() ublk: don't access ublk_queue in ublk_check_fetch_buf() ublk: pass q_id and tag to __ublk_check_and_get_req() ...
2025-10-02	Merge tag 'for-6.18/io_uring-20250929' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring updates from Jens Axboe: - Store ring provided buffers locally for the users, rather than stuff them into struct io_kiocb. These types of buffers must always be fully consumed or recycled in the current context, and leaving them in struct io_kiocb is hence not a good ideas as that struct has a vastly different life time. Basically just an architecture cleanup that can help prevent issues with ring provided buffers in the future. - Support for mixed CQE sizes in the same ring. Before this change, a CQ ring either used the default 16b CQEs, or it was setup with 32b CQE using IORING_SETUP_CQE32. For use cases where a few 32b CQEs were needed, this caused everything else to use big CQEs. This is wasteful both in terms of memory usage, but also memory bandwidth for the posted CQEs. With IORING_SETUP_CQE_MIXED, applications may use request types that post both normal 16b and big 32b CQEs on the same ring. - Add helpers for async data management, to make it harder for opcode handlers to mess it up. - Add support for multishot for uring_cmd, which ublk can use. This helps improve efficiency, by providing a persistent request type that can trigger multiple CQEs. - Add initial support for ring feature querying. We had basic support for probe operations, but the API isn't great. Rather than expand that, add support for QUERY which is easily expandable and can cover a lot more cases than the existing probe support. This will help applications get a better idea of what operations are supported on a given host. - zcrx improvements from Pavel: - Improve refill entry alignment for better caching - Various cleanups, especially around deduplicating normal memory vs dmabuf setup. - Generalisation of the niov size (Patch 12). It's still hard coded to PAGE_SIZE on init, but will let the user to specify the rx buffer length on setup. - Syscall / synchronous bufer return. It'll be used as a slow fallback path for returning buffers when the refill queue is full. Useful for tolerating slight queue size misconfiguration or with inconsistent load. - Accounting more memory to cgroups. - Additional independent cleanups that will also be useful for mutli-area support. - Various fixes and cleanups * tag 'for-6.18/io_uring-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (68 commits) io_uring/cmd: drop unused res2 param from io_uring_cmd_done() io_uring: fix nvme's 32b cqes on mixed cq io_uring/query: cap number of queries io_uring/query: prevent infinite loops io_uring/zcrx: account niov arrays to cgroup io_uring/zcrx: allow synchronous buffer return io_uring/zcrx: introduce io_parse_rqe() io_uring/zcrx: don't adjust free cache space io_uring/zcrx: use guards for the refill lock io_uring/zcrx: reduce netmem scope in refill io_uring/zcrx: protect netdev with pp_lock io_uring/zcrx: rename dma lock io_uring/zcrx: make niov size variable io_uring/zcrx: set sgt for umem area io_uring/zcrx: remove dmabuf_offset io_uring/zcrx: deduplicate area mapping io_uring/zcrx: pass ifq to io_zcrx_alloc_fallback() io_uring/zcrx: check all niovs filled with dma addresses io_uring/zcrx: move area reg checks into io_import_area io_uring/zcrx: don't pass slot to io_zcrx_create_area ...
2025-10-02	Merge tag 'soc-drivers-6.18' of ↵	Linus Torvalds	1	-60/+137
	git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "Lots of platform specific updates for Qualcomm SoCs, including a new TEE subsystem driver for the Qualcomm QTEE firmware interface. Added support for the Apple A11 SoC in drivers that are shared with the M1/M2 series, among more updates for those. Smaller platform specific driver updates for Renesas, ASpeed, Broadcom, Nvidia, Mediatek, Amlogic, TI, Allwinner, and Freescale SoCs. Driver updates in the cache controller, memory controller and reset controller subsystems. SCMI firmware updates to add more features and improve robustness. This includes support for having multiple SCMI providers in a single system. TEE subsystem support for protected DMA-bufs, allowing hardware to access memory areas that managed by the kernel but remain inaccessible from the CPU in EL1/EL0" * tag 'soc-drivers-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (139 commits) soc/fsl/qbman: Use for_each_online_cpu() instead of for_each_cpu() soc: fsl: qe: Drop legacy-of-mm-gpiochip.h header from GPIO driver soc: fsl: qe: Change GPIO driver to a proper platform driver tee: fix register_shm_helper() pmdomain: apple: Add "apple,t8103-pmgr-pwrstate" dt-bindings: spmi: Add Apple A11 and T2 compatible serial: qcom-geni: Load UART qup Firmware from linux side spi: geni-qcom: Load spi qup Firmware from linux side i2c: qcom-geni: Load i2c qup Firmware from linux side soc: qcom: geni-se: Add support to load QUP SE Firmware via Linux subsystem soc: qcom: geni-se: Cleanup register defines and update copyright dt-bindings: qcom: se-common: Add QUP Peripheral-specific properties for I2C, SPI, and SERIAL bus Documentation: tee: Add Qualcomm TEE driver tee: qcom: enable TEE_IOC_SHM_ALLOC ioctl tee: qcom: add primordial object tee: add Qualcomm TEE driver tee: increase TEE_MAX_ARG_SIZE to 4096 tee: add TEE_IOCTL_PARAM_ATTR_TYPE_OBJREF tee: add TEE_IOCTL_PARAM_ATTR_TYPE_UBUF tee: add close_context to TEE driver operation ...
2025-09-24	Merge tag 'nvme-6.18-2025-09-23' of git://git.infradead.org/nvme into ↵	Jens Axboe	5	-8/+31
	for-6.18/block Pull NVMe updates from Keith: " - FC target fixes (Daniel) - Authentication fixes and updates (Martin, Chris) - Admin controller handling (Kamaljit) - Target lockdep assertions (Max) - Keep-alive updates for discovery (Alastair) - Suspend quirk (Georg)" * tag 'nvme-6.18-2025-09-23' of git://git.infradead.org/nvme: nvme: Use non zero KATO for persistent discovery connections nvmet: add safety check for subsys lock nvme-core: use nvme_is_io_ctrl() for I/O controller check nvme-core: do ioccsz/iorcsz validation only for I/O controllers nvme-core: add method to check for an I/O controller nvme-pci: Add TUXEDO IBS Gen8 to Samsung sleep quirk nvme-auth: use hkdf_expand_label() nvme-auth: add hkdf_expand_label() nvme-tcp: send only permitted commands for secure concat nvme-fc: use lock accessing port_state and rport state nvmet-fcloop: call done callback even when remote port is gone nvmet-fc: avoid scheduling association deletion twice nvmet-fc: move lsop put work to nvmet_fc_ls_req_op nvme-auth: update bi_directional flag
2025-09-24	nvme: Use non zero KATO for persistent discovery connections	Alistair Francis	1	-1/+7
	The NVMe Base Specification 2.1 states that: """ A host requests an explicit persistent connection ... by specifying a non-zero Keep Alive Timer value in the Connect command. """ As such if we are starting a persistent connection to a discovery controller and the KATO is currently 0 we need to update KATO to a non zero value to avoid continuous timeouts on the target. Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-24	nvme-core: use nvme_is_io_ctrl() for I/O controller check	Martin George	1	-1/+1
	Replace the current I/O controller check in nvme_init_non_mdts_limits() with the helper nvme_is_io_ctrl() function to maintain consistency with similar checks in other parts of the code and improve code readability. Signed-off-by: Martin George <marting@netapp.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-24	nvme-core: do ioccsz/iorcsz validation only for I/O controllers	Kamaljit Singh	1	-2/+2
	An administrative controller does not support I/O queues, hence it should ignore existing checks for IOCCSZ/IORCSZ. Currently, these checks only exclude a discovery controller but need to also exclude an administrative controller. Signed-off-by: Kamaljit Singh <kamaljit.singh@opensource.wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-24	nvme-core: add method to check for an I/O controller	Kamaljit Singh	1	-0/+5
	Add nvme_is_io_ctrl() to check if the controller is of type I/O controller. Uses negative logic by excluding an administrative controller and a discovery controller. Signed-off-by: Kamaljit Singh <kamaljit.singh@opensource.wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>