summaryrefslogtreecommitdiff
path: root/drivers/nvme
AgeCommit message (Collapse)AuthorFilesLines
2026-02-11nvme-pci: handle changing device dma map requirementsKeith Busch1-15/+30
[ Upstream commit 071be3b0b6575d45be9df9c5b612f5882bfc5e88 ] The initial state of dma_needs_unmap may be false, but change to true while mapping the data iterator. Enabling swiotlb is one such case that can change the result. The nvme driver needs to save the mapped dma vectors to be unmapped later, so allocate as needed during iteration rather than assume it was always allocated at the beginning. This fixes a NULL dereference from accessing an uninitialized dma_vecs when the device dma unmapping requirements change mid-iteration. Fixes: b8b7570a7ec8 ("nvme-pci: fix dma unmapping when using PRPs and not using the IOVA mapping") Link: https://lore.kernel.org/linux-nvme/20260202125738.1194899-1-pradeep.pragallapati@oss.qualcomm.com/ Reported-by: Pradeep P V K <pradeep.pragallapati@oss.qualcomm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11nvmet-tcp: fixup hang in nvmet_tcp_listen_data_ready()Hannes Reinecke1-5/+4
[ Upstream commit 2fa8961d3a6a1c2395d8d560ffed2c782681bade ] When the socket is closed while in TCP_LISTEN a callback is run to flush all outstanding packets, which in turns calls nvmet_tcp_listen_data_ready() with the sk_callback_lock held. So we need to check if we are in TCP_LISTEN before attempting to get the sk_callback_lock() to avoid a deadlock. Link: https://lore.kernel.org/linux-nvme/CAHj4cs-zu7eVB78yUpFjVe2UqMWFkLk8p+DaS3qj+uiGCXBAoA@mail.gmail.com/ Tested-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11nvme-fc: release admin tagset if init failsChaitanya Kulkarni1-0/+2
[ Upstream commit d1877cc7270302081a315a81a0ee8331f19f95c8 ] nvme_fabrics creates an NVMe/FC controller in following path: nvmf_dev_write() -> nvmf_create_ctrl() -> nvme_fc_create_ctrl() -> nvme_fc_init_ctrl() nvme_fc_init_ctrl() allocates the admin blk-mq resources right after nvme_add_ctrl() succeeds. If any of the subsequent steps fail (changing the controller state, scheduling connect work, etc.), we jump to the fail_ctrl path, which tears down the controller references but never frees the admin queue/tag set. The leaked blk-mq allocations match the kmemleak report seen during blktests nvme/fc. Check ctrl->ctrl.admin_tagset in the fail_ctrl path and call nvme_remove_admin_tag_set() when it is set so that all admin queue allocations are reclaimed whenever controller setup aborts. Reported-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11nvmet-tcp: add bounds checks in nvmet_tcp_build_pdu_iovecYunJe Shin1-0/+17
commit 52a0a98549344ca20ad81a4176d68d28e3c05a5c upstream. nvmet_tcp_build_pdu_iovec() could walk past cmd->req.sg when a PDU length or offset exceeds sg_cnt and then use bogus sg->length/offset values, leading to _copy_to_iter() GPF/KASAN. Guard sg_idx, remaining entries, and sg->length/offset before building the bvec. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: YunJe Shin <ioerts@kookmin.ac.kr> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Joonkyo Jung <joonkyoj@yonsei.ac.kr> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06nvmet: fix race in nvmet_bio_done() leading to NULL pointer dereferenceMing Lei1-1/+2
commit 0fcee2cfc4b2e16e62ff8e0cc2cd8dd24efad65e upstream. There is a race condition in nvmet_bio_done() that can cause a NULL pointer dereference in blk_cgroup_bio_start(): 1. nvmet_bio_done() is called when a bio completes 2. nvmet_req_complete() is called, which invokes req->ops->queue_response(req) 3. The queue_response callback can re-queue and re-submit the same request 4. The re-submission reuses the same inline_bio from nvmet_req 5. Meanwhile, nvmet_req_bio_put() (called after nvmet_req_complete) invokes bio_uninit() for inline_bio, which sets bio->bi_blkg to NULL 6. The re-submitted bio enters submit_bio_noacct_nocheck() 7. blk_cgroup_bio_start() dereferences bio->bi_blkg, causing a crash: BUG: kernel NULL pointer dereference, address: 0000000000000028 #PF: supervisor read access in kernel mode RIP: 0010:blk_cgroup_bio_start+0x10/0xd0 Call Trace: submit_bio_noacct_nocheck+0x44/0x250 nvmet_bdev_execute_rw+0x254/0x370 [nvmet] process_one_work+0x193/0x3c0 worker_thread+0x281/0x3a0 Fix this by reordering nvmet_bio_done() to call nvmet_req_bio_put() BEFORE nvmet_req_complete(). This ensures the bio is cleaned up before the request can be re-submitted, preventing the race condition. Fixes: 190f4c2c863a ("nvmet: fix memory leak of bio integrity") Cc: Dmitry Bogdanov <d.bogdanov@yadro.com> Cc: stable@vger.kernel.org Cc: Guangwu Zhang <guazhang@redhat.com> Link: http://www.mail-archive.com/debian-kernel@lists.debian.org/msg146238.html Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23nvme: fix PCIe subsystem reset controller state transitionNilay Shroff1-1/+4
commit 0edb475ac0a7d153318a24d4dca175a270a5cc4f upstream. The commit d2fe192348f9 (“nvme: only allow entering LIVE from CONNECTING state”) disallows controller state transitions directly from RESETTING to LIVE. However, the NVMe PCIe subsystem reset path relies on this transition to recover the controller on PowerPC (PPC) systems. On PPC systems, issuing a subsystem reset causes a temporary loss of communication with the NVMe adapter. A subsequent PCIe MMIO read then triggers EEH recovery, which restores the PCIe link and brings the controller back online. For EEH recovery to proceed correctly, the controller must transition back to the LIVE state. Due to the changes introduced by commit d2fe192348f9 (“nvme: only allow entering LIVE from CONNECTING state”), the controller can no longer transition directly from RESETTING to LIVE. As a result, EEH recovery exits prematurely, leaving the controller stuck in the RESETTING state. Fix this by explicitly transitioning the controller state from RESETTING to CONNECTING and then to LIVE. This satisfies the updated state transition rules and allows the controller to be successfully recovered on PPC systems following a PCIe subsystem reset. Cc: stable@vger.kernel.org Fixes: d2fe192348f9 ("nvme: only allow entering LIVE from CONNECTING state") Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23nvme-pci: disable secondary temp for Wodposit WPBSNM8Ilikara Zheng1-0/+2
commit 340f4fc5508c2905a1f30de229e2a4b299d55735 upstream. Secondary temperature thresholds (temp2_{min,max}) were not reported properly on this NVMe SSD. This resulted in an error while attempting to read these values with sensors(1): ERROR: Can't get value of subfeature temp2_min: I/O error ERROR: Can't get value of subfeature temp2_max: I/O error Add the device to the nvme_id_table with the NVME_QUIRK_NO_SECONDARY_TEMP_THRESH flag to suppress access to all non- composite temperature thresholds. Cc: stable@vger.kernel.org Tested-by: Wu Haotian <rigoligo03@gmail.com> Signed-off-by: Ilikara Zheng <ilikara@aosc.io> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23nvme-tcp: fix NULL pointer dereferences in nvmet_tcp_build_pdu_iovecShivam Kumar1-0/+12
[ Upstream commit 32b63acd78f577b332d976aa06b56e70d054cbba ] Commit efa56305908b ("nvmet-tcp: Fix a kernel panic when host sends an invalid H2C PDU length") added ttag bounds checking and data_offset validation in nvmet_tcp_handle_h2c_data_pdu(), but it did not validate whether the command's data structures (cmd->req.sg and cmd->iov) have been properly initialized before processing H2C_DATA PDUs. The nvmet_tcp_build_pdu_iovec() function dereferences these pointers without NULL checks. This can be triggered by sending H2C_DATA PDU immediately after the ICREQ/ICRESP handshake, before sending a CONNECT command or NVMe write command. Attack vectors that trigger NULL pointer dereferences: 1. H2C_DATA PDU sent before CONNECT → both pointers NULL 2. H2C_DATA PDU for READ command → cmd->req.sg allocated, cmd->iov NULL 3. H2C_DATA PDU for uninitialized command slot → both pointers NULL The fix validates both cmd->req.sg and cmd->iov before calling nvmet_tcp_build_pdu_iovec(). Both checks are required because: - Uninitialized commands: both NULL - READ commands: cmd->req.sg allocated, cmd->iov NULL - WRITE commands: both allocated Fixes: efa56305908b ("nvmet-tcp: Fix a kernel panic when host sends an invalid H2C PDU length") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-23nvme-apple: add "apple,t8103-nvme-ans2" as compatibleJanne Grunau1-0/+1
commit 7d3fa7e954934fbda0a017ac1c305b7b10ecceef upstream. After discussion with the devicetree maintainers we agreed to not extend lists with the generic compatible "apple,nvme-ans2" anymore [1]. Add "apple,t8103-nvme-ans2" as fallback compatible as it is the SoC the driver and bindings were written for. [1]: https://lore.kernel.org/asahi/12ab93b7-1fc2-4ce0-926e-c8141cfe81bf@kernel.org/ Cc: stable@vger.kernel.org # v6.18+ Fixes: 5bd2927aceba ("nvme-apple: Add initial Apple SoC NVMe driver") Reviewed-by: Neal Gompa <neal@gompa.dev> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Janne Grunau <j@jannau.net> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08nvmet: pci-epf: move DMA initialization to EPC init callbackShin'ichiro Kawasaki1-2/+2
commit 511b3b644e28d9b66e32515a74c57ff599e89035 upstream. For DMA initialization to work across all EPC drivers, the DMA initialization has to be done in the .init() callback. This is because not all EPC drivers will have a refclock (which is often needed to access registers of a DMA controller embedded in a PCIe controller) at the time the .bind() callback is called. However, all EPC drivers are guaranteed to have a refclock by the time the .init() callback is called. Thus, move the DMA initialization to the .init() callback. This change was already done for other EPF drivers in commit 60bd3e039aa2 ("PCI: endpoint: pci-epf-{mhi/test}: Move DMA initialization to EPC init callback"). Cc: stable@vger.kernel.org Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-02nvme-fabrics: add ENOKEY to no retry criteria for authentication failuresJustin Tee1-1/+1
[ Upstream commit 13989207ee29c40501e719512e8dc90768325895 ] With authentication, in addition to EKEYREJECTED there is also no point in retrying reconnects when status is ENOKEY. Thus, add -ENOKEY as another criteria to determine when to stop retries. Cc: Daniel Wagner <wagi@kernel.org> Cc: Hannes Reinecke <hare@suse.de> Closes: https://lore.kernel.org/linux-nvme/20250829-nvme-fc-sync-v3-0-d69c87e63aee@kernel.org/ Signed-off-by: Justin Tee <justintee8345@gmail.com> Tested-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-02nvme-fc: don't hold rport lock when putting ctrlDaniel Wagner1-2/+4
[ Upstream commit b71cbcf7d170e51148d5467820ae8a72febcb651 ] nvme_fc_ctrl_put can acquire the rport lock when freeing the ctrl object: nvme_fc_ctrl_put nvme_fc_ctrl_free spin_lock_irqsave(rport->lock) Thus we can't hold the rport lock when calling nvme_fc_ctrl_put. Justin suggested use the safe list iterator variant because nvme_fc_ctrl_put will also modify the rport->list. Cc: Justin Tee <justin.tee@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18nvme-auth: use kvfree() for memory allocated with kvcalloc()Israel Rukshin1-1/+1
[ Upstream commit bb9f4cca7c031de6f0e85f7ba24abf0172829f85 ] Memory allocated by kvcalloc() may come from vmalloc or kmalloc, so use kvfree() instead of kfree() for proper deallocation. Fixes: aa36d711e945 ("nvme-auth: convert dhchap_auth_list to an array") Signed-off-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-17nvme: nvme-fc: Ensure ->ioerr_work is cancelled in nvme_fc_delete_ctrl()Ewan D. Milne1-1/+1
nvme_fc_delete_assocation() waits for pending I/O to complete before returning, and an error can cause ->ioerr_work to be queued after cancel_work_sync() had been called. Move the call to cancel_work_sync() to be after nvme_fc_delete_association() to ensure ->ioerr_work is not running when the nvme_fc_ctrl object is freed. Otherwise the following can occur: [ 1135.911754] list_del corruption, ff2d24c8093f31f8->next is NULL [ 1135.917705] ------------[ cut here ]------------ [ 1135.922336] kernel BUG at lib/list_debug.c:52! [ 1135.926784] Oops: invalid opcode: 0000 [#1] SMP NOPTI [ 1135.931851] CPU: 48 UID: 0 PID: 726 Comm: kworker/u449:23 Kdump: loaded Not tainted 6.12.0 #1 PREEMPT(voluntary) [ 1135.943490] Hardware name: Dell Inc. PowerEdge R660/0HGTK9, BIOS 2.5.4 01/16/2025 [ 1135.950969] Workqueue: 0x0 (nvme-wq) [ 1135.954673] RIP: 0010:__list_del_entry_valid_or_report.cold+0xf/0x6f [ 1135.961041] Code: c7 c7 98 68 72 94 e8 26 45 fe ff 0f 0b 48 c7 c7 70 68 72 94 e8 18 45 fe ff 0f 0b 48 89 fe 48 c7 c7 80 69 72 94 e8 07 45 fe ff <0f> 0b 48 89 d1 48 c7 c7 a0 6a 72 94 48 89 c2 e8 f3 44 fe ff 0f 0b [ 1135.979788] RSP: 0018:ff579b19482d3e50 EFLAGS: 00010046 [ 1135.985015] RAX: 0000000000000033 RBX: ff2d24c8093f31f0 RCX: 0000000000000000 [ 1135.992148] RDX: 0000000000000000 RSI: ff2d24d6bfa1d0c0 RDI: ff2d24d6bfa1d0c0 [ 1135.999278] RBP: ff2d24c8093f31f8 R08: 0000000000000000 R09: ffffffff951e2b08 [ 1136.006413] R10: ffffffff95122ac8 R11: 0000000000000003 R12: ff2d24c78697c100 [ 1136.013546] R13: fffffffffffffff8 R14: 0000000000000000 R15: ff2d24c78697c0c0 [ 1136.020677] FS: 0000000000000000(0000) GS:ff2d24d6bfa00000(0000) knlGS:0000000000000000 [ 1136.028765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1136.034510] CR2: 00007fd207f90b80 CR3: 000000163ea22003 CR4: 0000000000f73ef0 [ 1136.041641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1136.048776] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 1136.055910] PKRU: 55555554 [ 1136.058623] Call Trace: [ 1136.061074] <TASK> [ 1136.063179] ? show_trace_log_lvl+0x1b0/0x2f0 [ 1136.067540] ? show_trace_log_lvl+0x1b0/0x2f0 [ 1136.071898] ? move_linked_works+0x4a/0xa0 [ 1136.075998] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.081744] ? __die_body.cold+0x8/0x12 [ 1136.085584] ? die+0x2e/0x50 [ 1136.088469] ? do_trap+0xca/0x110 [ 1136.091789] ? do_error_trap+0x65/0x80 [ 1136.095543] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.101289] ? exc_invalid_op+0x50/0x70 [ 1136.105127] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.110874] ? asm_exc_invalid_op+0x1a/0x20 [ 1136.115059] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.120806] move_linked_works+0x4a/0xa0 [ 1136.124733] worker_thread+0x216/0x3a0 [ 1136.128485] ? __pfx_worker_thread+0x10/0x10 [ 1136.132758] kthread+0xfa/0x240 [ 1136.135904] ? __pfx_kthread+0x10/0x10 [ 1136.139657] ret_from_fork+0x31/0x50 [ 1136.143236] ? __pfx_kthread+0x10/0x10 [ 1136.146988] ret_from_fork_asm+0x1a/0x30 [ 1136.150915] </TASK> Fixes: 19fce0470f05 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context") Cc: stable@vger.kernel.org Tested-by: Marco Patalano <mpatalan@redhat.com> Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-11-17nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl()Ewan D. Milne1-6/+7
Now target is removed from nvme_fc_ctrl_free() which is the ctrl->ref release handler. And even admin queue is unquiesced there, this way is definitely wrong because the ctr->ref is grabbed when submitting command. And Marco observed that nvme_fc_ctrl_free() can be called from request completion code path, and trigger kernel warning since request completes from softirq context. Fix the issue by moveing target removal into nvme_fc_delete_ctrl(), which is also aligned with nvme-tcp and nvme-rdma. Patch originally proposed by Ming Lei, then modified to move the tagset removal down to after nvme_fc_delete_association() after further testing. Cc: Marco Patalano <mpatalan@redhat.com> Cc: Ewan Milne <emilne@redhat.com> Cc: James Smart <james.smart@broadcom.com> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Cc: stable@vger.kernel.org Tested-by: Marco Patalano <mpatalan@redhat.com> Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-11-17nvme-multipath: fix lockdep WARN due to partition scan workShin'ichiro Kawasaki1-1/+1
Blktests test cases nvme/014, 057 and 058 fail occasionally due to a lockdep WARN. As reported in the Closes tag URL, the WARN indicates that a deadlock can happen due to the dependency among disk->open_mutex, kblockd workqueue completion and partition_scan_work completion. To avoid the lockdep WARN and the potential deadlock, cut the dependency by running the partition_scan_work not by kblockd workqueue but by nvme_wq. Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/linux-block/CAHj4cs8mJ+R_GmQm9R8ebResKAWUE8kF5+_WVg0v8zndmqd6BQ@mail.gmail.com/ Link: https://lore.kernel.org/linux-block/oeyzci6ffshpukpfqgztsdeke5ost5hzsuz4rrsjfmvpqcevax@5nhnwbkzbrpa/ Fixes: 1f021341eef4 ("nvme-multipath: defer partition scanning") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-11-17nvmet-auth: update sc_c in target host hash calculationAlistair Francis3-2/+4
Commit 7e091add9c43 "nvme-auth: update sc_c in host response" added the sc_c variable to the dhchap queue context structure which is appropriately set during negotiate and then used in the host response. This breaks secure concat connections with a Linux target as the target code wasn't updated at the same time. This patch fixes this by adding a new sc_c variable to the host hash calculations. Fixes: 7e091add9c43 ("nvme-auth: update sc_c in host response") Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Martin George <marting@netapp.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-11-05nvme: fix admin request_queue lifetimeKeith Busch1-1/+2
The namespaces can access the controller's admin request_queue, and stale references on the namespaces may exist after tearing down the controller. Ensure the admin request_queue is active by moving the controller's 'put' to after all controller references have been released to ensure no one is can access the request_queue. This fixes a reported use-after-free bug: BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0 Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287 CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G E 6.13.2-ga1582f1a031e #15 Tainted: [E]=UNSIGNED_MODULE Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025 Call Trace: <TASK> dump_stack_lvl+0x4f/0x60 print_report+0xc4/0x620 ? _raw_spin_lock_irqsave+0x70/0xb0 ? _raw_read_unlock_irqrestore+0x30/0x30 ? blk_queue_enter+0x41c/0x4a0 kasan_report+0xab/0xe0 ? blk_queue_enter+0x41c/0x4a0 blk_queue_enter+0x41c/0x4a0 ? __irq_work_queue_local+0x75/0x1d0 ? blk_queue_start_drain+0x70/0x70 ? irq_work_queue+0x18/0x20 ? vprintk_emit.part.0+0x1cc/0x350 ? wake_up_klogd_work_func+0x60/0x60 blk_mq_alloc_request+0x2b7/0x6b0 ? __blk_mq_alloc_requests+0x1060/0x1060 ? __switch_to+0x5b7/0x1060 nvme_submit_user_cmd+0xa9/0x330 nvme_user_cmd.isra.0+0x240/0x3f0 ? force_sigsegv+0xe0/0xe0 ? nvme_user_cmd64+0x400/0x400 ? vfs_fileattr_set+0x9b0/0x9b0 ? cgroup_update_frozen_flag+0x24/0x1c0 ? cgroup_leave_frozen+0x204/0x330 ? nvme_ioctl+0x7c/0x2c0 blkdev_ioctl+0x1a8/0x4d0 ? blkdev_common_ioctl+0x1930/0x1930 ? fdget+0x54/0x380 __x64_sys_ioctl+0x129/0x190 do_syscall_64+0x5b/0x160 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f765f703b0b Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000 R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003 R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60 </TASK> Reported-by: Casey Chen <cachen@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-10-23nvme-pci: use blk_map_iter for p2p metadataKeith Busch1-3/+10
The dma_map_bvec helper doesn't work for p2p data, so use the same blk_map_iter method that sgl uses for this memory type. Reported-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-10-23nvmet-auth: update sc_c in host responseHannes Reinecke1-2/+3
The target code should set the sc_c bit in calculating the host response based on the status of the 'concat' setting, otherwise we'll get an authentication mismatch for hosts setting that bit correctly. Fixes: 7e091add9c43 ("nvme-auth: update sc_c in host response") Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-10-16nvme/tcp: handle tls partially sent records in write_space()Wilfred Mallawa1-0/+3
With TLS enabled, records that are encrypted and appended to TLS TX list can fail to see a retry if the underlying TCP socket is busy, for example, hitting an EAGAIN from tcp_sendmsg_locked(). This is not known to the NVMe TCP driver, as the TLS layer successfully generated a record. Typically, the TLS write_space() callback would ensure such records are retried, but in the NVMe TCP Host driver, write_space() invokes nvme_tcp_write_space(). This causes a partially sent record in the TLS TX list to timeout after not being retried. This patch fixes the above by calling queue->write_space(), which calls into the TLS layer to retry any pending records. Fixes: be8e82caa685 ("nvme-tcp: enable TLS handshake upcall") Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-10-14nvme-auth: update sc_c in host responseMartin George1-1/+5
The sc_c field is currently not updated in the host response to the controller challenge leading to failures while attempting secure channel concatenation. Fix this by adding a new sc_c variable to the dhchap queue context structure which is appropriately set during negotiate and then used in the host response. Fixes: e88a7595b57f ("nvme-tcp: request secure channel concatenation") Signed-off-by: Martin George <marting@netapp.com> Signed-off-by: Prashanth Adurthi <prashana@netapp.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-10-09nvme-multipath: Skip nr_active increments in RETRY dispositionAmit Chaudhary1-2/+4
For queue-depth I/O policy, this patch fixes unbalanced I/Os across nvme multipaths. Issue Description: The RETRY disposition incorrectly increments ns->ctrl->nr_active counter and reinitializes iostat start-time. In such cases nr_active counter never goes back to zero until that path disconnects and reconnects. Such a path is not chosen for new I/Os if multiple RETRY cases on a given a path cause its queue-depth counter to be artificially higher compared to other paths. This leads to unbalanced I/Os across paths. The patch skips incrementing nr_active if NVME_MPATH_CNT_ACTIVE is already set. And it skips restarting io stats if NVME_MPATH_IO_STATS is already set. base-commit: e989a3da2d371a4b6597ee8dee5c72e407b4db7a Fixes: d4d957b53d91eeb ("nvme-multipath: support io stats on the mpath device") Signed-off-by: Amit Chaudhary <achaudhary@purestorage.com> Reviewed-by: Randy Jennings <randyj@purestorage.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-10-02Merge tag 'for-6.18/block-20250929' of ↵Linus Torvalds11-154/+222
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - NVMe pull request via Keith: - FC target fixes (Daniel) - Authentication fixes and updates (Martin, Chris) - Admin controller handling (Kamaljit) - Target lockdep assertions (Max) - Keep-alive updates for discovery (Alastair) - Suspend quirk (Georg) - MD pull request via Yu: - Add support for a lockless bitmap. A key feature for the new bitmap are that the IO fastpath is lockless. If a user issues lots of write IO to the same bitmap bit in a short time, only the first write has additional overhead to update bitmap bit, no additional overhead for the following writes. By supporting only resync or recover written data, means in the case creating new array or replacing with a new disk, there is no need to do a full disk resync/recovery. - Switch ->getgeo() and ->bios_param() to using struct gendisk rather than struct block_device. - Rust block changes via Andreas. This series adds configuration via configfs and remote completion to the rnull driver. The series also includes a set of changes to the rust block device driver API: a few cleanup patches, and a few features supporting the rnull changes. The series removes the raw buffer formatting logic from `kernel::block` and improves the logic available in `kernel::string` to support the same use as the removed logic. - floppy arch cleanups - Reduce the number of dereferencing needed for ublk commands - Restrict supported sockets for nbd. Mostly done to eliminate a class of issues perpetually reported by syzbot, by using nonsensical socket setups. - A few s390 dasd block fixes - Fix a few issues around atomic writes - Improve DMA interation for integrity requests - Improve how iovecs are treated with regards to O_DIRECT aligment constraints. We used to require each segment to adhere to the constraints, now only the request as a whole needs to. - Clean up and improve p2p support, enabling use of p2p for metadata payloads - Improve locking of request lookup, using SRCU where appropriate - Use page references properly for brd, avoiding very long RCU sections - Fix ordering of recursively submitted IOs - Clean up and improve updating nr_requests for a live device - Various fixes and cleanups * tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (164 commits) s390/dasd: enforce dma_alignment to ensure proper buffer validation s390/dasd: Return BLK_STS_INVAL for EINVAL from do_dasd_request ublk: remove redundant zone op check in ublk_setup_iod() nvme: Use non zero KATO for persistent discovery connections nvmet: add safety check for subsys lock nvme-core: use nvme_is_io_ctrl() for I/O controller check nvme-core: do ioccsz/iorcsz validation only for I/O controllers nvme-core: add method to check for an I/O controller blk-cgroup: fix possible deadlock while configuring policy blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path blk-mq: Fix more tag iteration function documentation selftests: ublk: fix behavior when fio is not installed ublk: don't access ublk_queue in ublk_unmap_io() ublk: pass ublk_io to __ublk_complete_rq() ublk: don't access ublk_queue in ublk_need_complete_req() ublk: don't access ublk_queue in ublk_check_commit_and_fetch() ublk: don't pass ublk_queue to ublk_fetch() ublk: don't access ublk_queue in ublk_config_io_buf() ublk: don't access ublk_queue in ublk_check_fetch_buf() ublk: pass q_id and tag to __ublk_check_and_get_req() ...
2025-10-02Merge tag 'for-6.18/io_uring-20250929' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring updates from Jens Axboe: - Store ring provided buffers locally for the users, rather than stuff them into struct io_kiocb. These types of buffers must always be fully consumed or recycled in the current context, and leaving them in struct io_kiocb is hence not a good ideas as that struct has a vastly different life time. Basically just an architecture cleanup that can help prevent issues with ring provided buffers in the future. - Support for mixed CQE sizes in the same ring. Before this change, a CQ ring either used the default 16b CQEs, or it was setup with 32b CQE using IORING_SETUP_CQE32. For use cases where a few 32b CQEs were needed, this caused everything else to use big CQEs. This is wasteful both in terms of memory usage, but also memory bandwidth for the posted CQEs. With IORING_SETUP_CQE_MIXED, applications may use request types that post both normal 16b and big 32b CQEs on the same ring. - Add helpers for async data management, to make it harder for opcode handlers to mess it up. - Add support for multishot for uring_cmd, which ublk can use. This helps improve efficiency, by providing a persistent request type that can trigger multiple CQEs. - Add initial support for ring feature querying. We had basic support for probe operations, but the API isn't great. Rather than expand that, add support for QUERY which is easily expandable and can cover a lot more cases than the existing probe support. This will help applications get a better idea of what operations are supported on a given host. - zcrx improvements from Pavel: - Improve refill entry alignment for better caching - Various cleanups, especially around deduplicating normal memory vs dmabuf setup. - Generalisation of the niov size (Patch 12). It's still hard coded to PAGE_SIZE on init, but will let the user to specify the rx buffer length on setup. - Syscall / synchronous bufer return. It'll be used as a slow fallback path for returning buffers when the refill queue is full. Useful for tolerating slight queue size misconfiguration or with inconsistent load. - Accounting more memory to cgroups. - Additional independent cleanups that will also be useful for mutli-area support. - Various fixes and cleanups * tag 'for-6.18/io_uring-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (68 commits) io_uring/cmd: drop unused res2 param from io_uring_cmd_done() io_uring: fix nvme's 32b cqes on mixed cq io_uring/query: cap number of queries io_uring/query: prevent infinite loops io_uring/zcrx: account niov arrays to cgroup io_uring/zcrx: allow synchronous buffer return io_uring/zcrx: introduce io_parse_rqe() io_uring/zcrx: don't adjust free cache space io_uring/zcrx: use guards for the refill lock io_uring/zcrx: reduce netmem scope in refill io_uring/zcrx: protect netdev with pp_lock io_uring/zcrx: rename dma lock io_uring/zcrx: make niov size variable io_uring/zcrx: set sgt for umem area io_uring/zcrx: remove dmabuf_offset io_uring/zcrx: deduplicate area mapping io_uring/zcrx: pass ifq to io_zcrx_alloc_fallback() io_uring/zcrx: check all niovs filled with dma addresses io_uring/zcrx: move area reg checks into io_import_area io_uring/zcrx: don't pass slot to io_zcrx_create_area ...
2025-10-02Merge tag 'soc-drivers-6.18' of ↵Linus Torvalds1-60/+137
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "Lots of platform specific updates for Qualcomm SoCs, including a new TEE subsystem driver for the Qualcomm QTEE firmware interface. Added support for the Apple A11 SoC in drivers that are shared with the M1/M2 series, among more updates for those. Smaller platform specific driver updates for Renesas, ASpeed, Broadcom, Nvidia, Mediatek, Amlogic, TI, Allwinner, and Freescale SoCs. Driver updates in the cache controller, memory controller and reset controller subsystems. SCMI firmware updates to add more features and improve robustness. This includes support for having multiple SCMI providers in a single system. TEE subsystem support for protected DMA-bufs, allowing hardware to access memory areas that managed by the kernel but remain inaccessible from the CPU in EL1/EL0" * tag 'soc-drivers-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (139 commits) soc/fsl/qbman: Use for_each_online_cpu() instead of for_each_cpu() soc: fsl: qe: Drop legacy-of-mm-gpiochip.h header from GPIO driver soc: fsl: qe: Change GPIO driver to a proper platform driver tee: fix register_shm_helper() pmdomain: apple: Add "apple,t8103-pmgr-pwrstate" dt-bindings: spmi: Add Apple A11 and T2 compatible serial: qcom-geni: Load UART qup Firmware from linux side spi: geni-qcom: Load spi qup Firmware from linux side i2c: qcom-geni: Load i2c qup Firmware from linux side soc: qcom: geni-se: Add support to load QUP SE Firmware via Linux subsystem soc: qcom: geni-se: Cleanup register defines and update copyright dt-bindings: qcom: se-common: Add QUP Peripheral-specific properties for I2C, SPI, and SERIAL bus Documentation: tee: Add Qualcomm TEE driver tee: qcom: enable TEE_IOC_SHM_ALLOC ioctl tee: qcom: add primordial object tee: add Qualcomm TEE driver tee: increase TEE_MAX_ARG_SIZE to 4096 tee: add TEE_IOCTL_PARAM_ATTR_TYPE_OBJREF tee: add TEE_IOCTL_PARAM_ATTR_TYPE_UBUF tee: add close_context to TEE driver operation ...
2025-09-24Merge tag 'nvme-6.18-2025-09-23' of git://git.infradead.org/nvme into ↵Jens Axboe9-57/+126
for-6.18/block Pull NVMe updates from Keith: " - FC target fixes (Daniel) - Authentication fixes and updates (Martin, Chris) - Admin controller handling (Kamaljit) - Target lockdep assertions (Max) - Keep-alive updates for discovery (Alastair) - Suspend quirk (Georg)" * tag 'nvme-6.18-2025-09-23' of git://git.infradead.org/nvme: nvme: Use non zero KATO for persistent discovery connections nvmet: add safety check for subsys lock nvme-core: use nvme_is_io_ctrl() for I/O controller check nvme-core: do ioccsz/iorcsz validation only for I/O controllers nvme-core: add method to check for an I/O controller nvme-pci: Add TUXEDO IBS Gen8 to Samsung sleep quirk nvme-auth: use hkdf_expand_label() nvme-auth: add hkdf_expand_label() nvme-tcp: send only permitted commands for secure concat nvme-fc: use lock accessing port_state and rport state nvmet-fcloop: call done callback even when remote port is gone nvmet-fc: avoid scheduling association deletion twice nvmet-fc: move lsop put work to nvmet_fc_ls_req_op nvme-auth: update bi_directional flag
2025-09-24nvme: Use non zero KATO for persistent discovery connectionsAlistair Francis1-1/+7
The NVMe Base Specification 2.1 states that: """ A host requests an explicit persistent connection ... by specifying a non-zero Keep Alive Timer value in the Connect command. """ As such if we are starting a persistent connection to a discovery controller and the KATO is currently 0 we need to update KATO to a non zero value to avoid continuous timeouts on the target. Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-24nvmet: add safety check for subsys lockMax Gurtovoy1-9/+6
Replace comment about required lock with a lockdep_assert_held() check in the following functions: - nvmet_p2pmem_ns_add_p2p() - nvmet_setup_p2p_ns_map() - nvmet_release_p2p_ns_map() This ensures the subsystem lock is held at runtime. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-24nvme-core: use nvme_is_io_ctrl() for I/O controller checkMartin George1-1/+1
Replace the current I/O controller check in nvme_init_non_mdts_limits() with the helper nvme_is_io_ctrl() function to maintain consistency with similar checks in other parts of the code and improve code readability. Signed-off-by: Martin George <marting@netapp.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-24nvme-core: do ioccsz/iorcsz validation only for I/O controllersKamaljit Singh1-2/+2
An administrative controller does not support I/O queues, hence it should ignore existing checks for IOCCSZ/IORCSZ. Currently, these checks only exclude a discovery controller but need to also exclude an administrative controller. Signed-off-by: Kamaljit Singh <kamaljit.singh@opensource.wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-24nvme-core: add method to check for an I/O controllerKamaljit Singh1-0/+5
Add nvme_is_io_ctrl() to check if the controller is of type I/O controller. Uses negative logic by excluding an administrative controller and a discovery controller. Signed-off-by: Kamaljit Singh <kamaljit.singh@opensource.wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-20io_uring: fix nvme's 32b cqes on mixed cqKeith Busch1-1/+1
The nvme uring_cmd only uses 32b CQEs. If the ring uses a mixed CQ, then we need to make sure we flag the completion as a 32b CQE. On the other hand, if nvme uring_cmd was using a dedicated 32b CQE, the posting was missing the extra memcpy because it only applied to bit CQEs on a mixed CQ. Fixes: e26dca67fde1943 ("io_uring: add support for IORING_SETUP_CQE_MIXED") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-16nvme-pci: Add TUXEDO IBS Gen8 to Samsung sleep quirkGeorg Gottleuber1-0/+2
On the TUXEDO InfinityBook S Gen8, a Samsung 990 Evo NVMe leads to a high power consumption in s2idle sleep (3.5 watts). This patch applies 'Force No Simple Suspend' quirk to achieve a sleep with a lower power consumption, typically around 1 watts. Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Cc: stable@vger.kernel.org Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-16nvme-auth: use hkdf_expand_label()Chris Leech1-20/+13
When generating keying material during an authentication transaction (secure channel concatenation), the HKDF-Expand-Label function is part of the specified key derivation process. The current open-coded implementation misses the length prefix requirements on the HkdfLabel label and context variable-length vectors (RFC 8446 Section 3.4). Instead, use the hkdf_expand_label() function. Signed-off-by: Chris Leech <cleech@redhat.com> Signed-off-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-16nvme-auth: add hkdf_expand_label()Chris Leech1-0/+53
Provide an implementation of RFC 8446 (TLS 1.3) HKDF-Expand-Label Signed-off-by: Chris Leech <cleech@redhat.com> Signed-off-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-15nvme-tcp: send only permitted commands for secure concatMartin George1-0/+3
In addition to sending permitted commands such as connect/auth over the initial unencrypted admin connection as part of secure channel concatenation, the host also sends commands such as Property Get and Identify on the same. This is a spec violation leading to secure concat failures. Fix this by ensuring these additional commands are avoided on this connection. Fixes: 104d0e2f6222 ("nvme-fabrics: reset admin connection for secure concatenation") Signed-off-by: Martin George <marting@netapp.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-15nvme-fc: use lock accessing port_state and rport stateDaniel Wagner1-2/+8
nvme_fc_unregister_remote removes the remote port on a lport object at any point in time when there is no active association. This races with with the reconnect logic, because nvme_fc_create_association is not taking a lock to check the port_state and atomically increase the active count on the rport. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/all/u4ttvhnn7lark5w3sgrbuy2rxupcvosp4qmvj46nwzgeo5ausc@uyrkdls2muwx Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-15nvmet-fcloop: call done callback even when remote port is goneDaniel Wagner1-3/+5
When the target port is gone, it's not possible to access any of the request resources. The function should just silently drop the response. The comment is misleading in this regard. Though it's still necessary to call the driver via the ->done callback so the driver is able to release all resources. Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/all/CAHj4cs-OBA0WMt5f7R0dz+rR4HcEz19YLhnyGsj-MRV3jWDsPg@mail.gmail.com/ Fixes: 84eedced1c5b ("nvmet-fcloop: drop response if targetport is gone") Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-15nvmet-fc: avoid scheduling association deletion twiceDaniel Wagner1-7/+9
When forcefully shutting down a port via the configfs interface, nvmet_port_subsys_drop_link() first calls nvmet_port_del_ctrls() and then nvmet_disable_port(). Both functions will eventually schedule all remaining associations for deletion. The current implementation checks whether an association is about to be removed, but only after the work item has already been scheduled. As a result, it is possible for the first scheduled work item to free all resources, and then for the same work item to be scheduled again for deletion. Because the association list is an RCU list, it is not possible to take a lock and remove the list entry directly, so it cannot be looked up again. Instead, a flag (terminating) must be used to determine whether the association is already in the process of being deleted. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/all/rsdinhafrtlguauhesmrrzkybpnvwantwmyfq2ih5aregghax5@mhr7v3eryci3/ Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-15nvmet-fc: move lsop put work to nvmet_fc_ls_req_opDaniel Wagner1-10/+9
It’s possible for more than one async command to be in flight from __nvmet_fc_send_ls_req. For each command, a tgtport reference is taken. In the current code, only one put work item is queued at a time, which results in a leaked reference. To fix this, move the work item to the nvmet_fc_ls_req_op struct, which already tracks all resources related to the command. Fixes: 710c69dbaccd ("nvmet-fc: avoid deadlock on delete association path") Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-15nvme-auth: update bi_directional flagMartin George1-2/+3
While setting chap->s2 to zero as part of secure channel concatenation, the host missed out to disable the bi_directional flag to indicate that controller authentication is not requested. Fix the same. Fixes: e88a7595b57f ("nvme-tcp: request secure channel concatenation") Signed-off-by: Martin George <marting@netapp.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-09-09blk-map: provide the bdev to bio if one existsKeith Busch1-5/+0
We can now safely provide a block device when extracting user pages for driver and user passthrough commands. Set the bdev so the caller doesn't have to do that later. This has an additional benefit of being able to extract P2P pages in the passthrough path. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-09blk-mq-dma: bring back p2p request flagsKeith Busch1-17/+4
We only need to consider data and metadata dma mapping types separately. The request and bio integrity payload have enough flag bits to internally track the mapping type for each. Use these so the caller doesn't need to track them, and provide separete request and integrity helpers to the common code. This will make it easier to scale new mappings, like the proposed MMIO attribute, without burdening the caller to track such things. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-06nvme: apple: Add Apple A11 supportNick Chan1-60/+137
Add support for ANS2 NVMe on Apple A11 SoC. This version of ANS2 is less quirky than the one in M1, and does not have NVMMU or Linear SQ. However, it still requires a non-standard 128-byte SQE. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Chan <towinchenmi@gmail.com> Reviewed-by: Sven Peter <sven@kernel.org> Link: https://lore.kernel.org/r/20250826-t8015-nvme-v5-2-caee6ab00144@gmail.com Signed-off-by: Sven Peter <sven@kernel.org>
2025-09-04Merge tag 'pull-getgeo' of ↵Jens Axboe2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into for-6.18/block Pull struct block_device getgeo changes from Al. "switching ->getgeo() from struct block_device to struct gendisk Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>" * tag 'pull-getgeo' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: block: switch ->getgeo() to struct gendisk scsi: switch ->bios_param() to passing gendisk scsi: switch scsi_bios_ptable() and scsi_partsize() to gendisk
2025-09-02nvme: fix PI insert on writeChristoph Hellwig1-7/+11
I recently ran into an issue where the PI generated using the block layer integrity code differs from that from a kernel using the PRACT fallback when the block layer integrity code is disabled, and I tracked this down to us using PRACT incorrectly. The NVM Command Set Specification (section 5.33 in 1.2, similar in older versions) specifies the PRACT insert behavior as: Inserted protection information consists of the computed CRC for the protection information format (refer to section 5.3.1) in the Guard field, the LBAT field value in the Application Tag field, the LBST field value in the Storage Tag field, if defined, and the computed reference tag in the Logical Block Reference Tag. Where the computed reference tag is defined as following for type 1 and type 2 using the text below that is duplicated in the respective bullet points: the value of the computed reference tag for the first logical block of the command is the value contained in the Initial Logical Block Reference Tag (ILBRT) or Expected Initial Logical Block Reference Tag (EILBRT) field in the command, and the computed reference tag is incremented for each subsequent logical block. So we need to set ILBRT field, but we currently don't. Interestingly this works fine on my older type 1 formatted SSD, but Qemu trips up on this. We already set ILBRT for Write Same since commit aeb7bb061be5 ("nvme: set the PRACT bit when using Write Zeroes with T10 PI"). To ease this, move the PI type check into nvme_set_ref_tag. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-08-25nvme-pci: convert metadata mapping to dma iterKeith Busch1-76/+87
Aligns data and metadata to the similar dma mapping scheme and removes one more user of the scatter-gather dma mapping. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250813153153.3260897-10-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-25nvme-pci: create common sgl unmapping helperKeith Busch1-15/+14
This can be reused by metadata sgls once that starts using the blk-mq dma api. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250813153153.3260897-9-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-25blk-mq-dma: require unmap caller provide p2p map typeKeith Busch1-1/+8
In preparing for integrity dma mappings, we can't rely on the request flag because data and metadata may have different mapping types. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250813153153.3260897-4-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>