summaryrefslogtreecommitdiff
path: root/drivers/scsi
AgeCommit message (Collapse)AuthorFilesLines
3 daysscsi: ses: Handle positive SCSI error from ses_recv_diag()Greg Kroah-Hartman1-1/+1
commit 7a9f448d44127217fabc4065c5ba070d4e0b5d37 upstream. ses_recv_diag() can return a positive value, which also means that an error happened, so do not only test for negative values. Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: stable <stable@kernel.org> Assisted-by: gkh_clanker_2000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/2026022301-bony-overstock-a07f@gregkh Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysscsi: ibmvfc: Fix OOB access in ibmvfc_discover_targets_done()Tyllis Xu1-1/+2
commit 61d099ac4a7a8fb11ebdb6e2ec8d77f38e77362f upstream. A malicious or compromised VIO server can return a num_written value in the discover targets MAD response that exceeds max_targets. This value is stored directly in vhost->num_targets without validation, and is then used as the loop bound in ibmvfc_alloc_targets() to index into disc_buf[], which is only allocated for max_targets entries. Indices at or beyond max_targets access kernel memory outside the DMA-coherent allocation. The out-of-bounds data is subsequently embedded in Implicit Logout and PLOGI MADs that are sent back to the VIO server, leaking kernel memory. Fix by clamping num_written to max_targets before storing it. Fixes: 072b91f9c651 ("[SCSI] ibmvfc: IBM Power Virtual Fibre Channel Adapter Client Driver") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Tyllis Xu <LivelyCarpet87@gmail.com> Reviewed-by: Dave Marquardt <davemarq@linux.ibm.com> Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com> Link: https://patch.msgid.link/20260314170151.548614-1-LivelyCarpet87@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysscsi: scsi_transport_sas: Fix the maximum channel scanning issueYihang Li1-1/+1
[ Upstream commit d71afa9deb4d413232ba16d693f7d43b321931b4 ] After commit 37c4e72b0651 ("scsi: Fix sas_user_scan() to handle wildcard and multi-channel scans"), if the device supports multiple channels (0 to shost->max_channel), user_scan() invokes updated sas_user_scan() to perform the scan behavior for a specific transfer. However, when the user specifies shost->max_channel, it will return -EINVAL, which is not expected. Fix and support specifying the scan shost->max_channel for scanning. Fixes: 37c4e72b0651 ("scsi: Fix sas_user_scan() to handle wildcard and multi-channel scans") Signed-off-by: Yihang Li <liyihang9@huawei.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20260317063147.2182562-1-liyihang9@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysscsi: devinfo: Add BLIST_SKIP_IO_HINTS for Iomega ZIPFlorian Fuchs1-1/+1
[ Upstream commit 80bf3b28d32b431f84f244a8469488eb6d96afbb ] The Iomega ZIP 100 (Z100P2) can't process IO Advice Hints Grouping mode page query. It immediately switches to the status phase 0xb8 after receiving the subpage code 0x05 of MODE_SENSE_10 command, which fails imm_out() and turns into DID_ERROR of this command, which leads to unusable device. This was tested with an Iomega ZIP 100 (Z100P2) connected with a StarTech PEX1P2 AX99100 PCIe parallel port card. Prior to this fix, Test Unit Ready fails and the drive can't be used: IMM: returned SCSI status b8 sd 7:0:6:0: [sdh] Test Unit Ready failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK Signed-off-by: Florian Fuchs <fuchsfl@gmail.com> Link: https://patch.msgid.link/20260227181823.892932-1-fuchsfl@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysscsi: mpi3mr: Clear reset history on ready and recheck state after timeoutRanjan Kumar1-0/+10
[ Upstream commit dbd53975ed4132d161b6a97ebe785a262380182d ] The driver retains reset history even after the IOC has successfully reached the READY state. That leaves stale reset information active during normal operation and can mislead recovery and diagnostics. In addition, if the IOC becomes READY just as the ready timeout loop exits, the driver still follows the failure path and may retry or report failure incorrectly. Clear reset history once READY is confirmed so driver state matches actual IOC status. After the timeout loop, recheck the IOC state and treat READY as success instead of failing. Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://patch.msgid.link/20260225082622.82588-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysscsi: core: Fix error handling for scsi_alloc_sdev()Junxiao Bi1-6/+2
commit 4ce7ada40c008fa21b7e52ab9d04e8746e2e9325 upstream. After scsi_sysfs_device_initialize() was called, error paths must call __scsi_remove_device(). Fixes: 1ac22c8eae81 ("scsi: core: Fix refcount leak for tagset_refcnt") Cc: stable@vger.kernel.org Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260304164603.51528-1-junxiao.bi@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysscsi: hisi_sas: Fix NULL pointer exception during user_scan()Xingui Yang2-2/+2
[ Upstream commit 8ddc0c26916574395447ebf4cff684314f6873a9 ] user_scan() invokes updated sas_user_scan() for channel 0, and if successful, iteratively scans remaining channels (1 to shost->max_channel) via scsi_scan_host_selected() in commit 37c4e72b0651 ("scsi: Fix sas_user_scan() to handle wildcard and multi-channel scans"). However, hisi_sas supports only one channel, and the current value of max_channel is 1. sas_user_scan() for channel 1 will trigger the following NULL pointer exception: [ 441.554662] Unable to handle kernel NULL pointer dereference at virtual address 00000000000008b0 [ 441.554699] Mem abort info: [ 441.554710] ESR = 0x0000000096000004 [ 441.554718] EC = 0x25: DABT (current EL), IL = 32 bits [ 441.554723] SET = 0, FnV = 0 [ 441.554726] EA = 0, S1PTW = 0 [ 441.554730] FSC = 0x04: level 0 translation fault [ 441.554735] Data abort info: [ 441.554737] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 441.554742] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 441.554747] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 441.554752] user pgtable: 4k pages, 48-bit VAs, pgdp=00000828377a6000 [ 441.554757] [00000000000008b0] pgd=0000000000000000, p4d=0000000000000000 [ 441.554769] Internal error: Oops: 0000000096000004 [#1] SMP [ 441.629589] Modules linked in: arm_spe_pmu arm_smmuv3_pmu tpm_tis_spi hisi_uncore_sllc_pmu hisi_uncore_pa_pmu hisi_uncore_l3c_pmu hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_uncore_cpa_pmu hns3_pmu hisi_ptt hisi_pcie_pmu tpm_tis_core spidev spi_hisi_sfc_v3xx hisi_uncore_pmu spi_dw_mmio fuse hclge hclge_common hisi_sec2 hisi_hpre hisi_zip hisi_qm hns3 hisi_sas_v3_hw sm3_ce sbsa_gwdt hnae3 hisi_sas_main uacce hisi_dma i2c_hisi dm_mirror dm_region_hash dm_log dm_mod [ 441.670819] CPU: 46 UID: 0 PID: 6994 Comm: bash Kdump: loaded Not tainted 7.0.0-rc2+ #84 PREEMPT [ 441.691327] pstate: 81400009 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 441.698277] pc : sas_find_dev_by_rphy+0x44/0x118 [ 441.702896] lr : sas_find_dev_by_rphy+0x3c/0x118 [ 441.707502] sp : ffff80009abbba40 [ 441.710805] x29: ffff80009abbba40 x28: ffff082819a40008 x27: ffff082810c37c08 [ 441.717930] x26: ffff082810c37c28 x25: ffff082819a40290 x24: ffff082810c37c00 [ 441.725054] x23: 0000000000000000 x22: 0000000000000001 x21: ffff082819a40000 [ 441.732179] x20: ffff082819a40290 x19: 0000000000000000 x18: 0000000000000020 [ 441.739304] x17: 0000000000000000 x16: ffffb5dad6bda690 x15: 00000000ffffffff [ 441.746428] x14: ffff082814c3b26c x13: 00000000ffffffff x12: ffff082814c3b26a [ 441.753553] x11: 00000000000000c0 x10: 000000000000003a x9 : ffffb5dad5ea94f4 [ 441.760678] x8 : 000000000000003a x7 : ffff80009abbbab0 x6 : 0000000000000030 [ 441.767802] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 441.774926] x2 : ffff08280f35a300 x1 : ffffb5dad7127180 x0 : 0000000000000000 [ 441.782053] Call trace: [ 441.784488] sas_find_dev_by_rphy+0x44/0x118 (P) [ 441.789095] sas_target_alloc+0x24/0xb0 [ 441.792920] scsi_alloc_target+0x290/0x330 [ 441.797010] __scsi_scan_target+0x88/0x258 [ 441.801096] scsi_scan_channel+0x74/0xb8 [ 441.805008] scsi_scan_host_selected+0x170/0x188 [ 441.809615] sas_user_scan+0xfc/0x148 [ 441.813267] store_scan+0x10c/0x180 [ 441.816743] dev_attr_store+0x20/0x40 [ 441.820398] sysfs_kf_write+0x84/0xa8 [ 441.824054] kernfs_fop_write_iter+0x130/0x1c8 [ 441.828487] vfs_write+0x2c0/0x370 [ 441.831880] ksys_write+0x74/0x118 [ 441.835271] __arm64_sys_write+0x24/0x38 [ 441.839182] invoke_syscall+0x50/0x120 [ 441.842919] el0_svc_common.constprop.0+0xc8/0xf0 [ 441.847611] do_el0_svc+0x24/0x38 [ 441.850913] el0_svc+0x38/0x158 [ 441.854043] el0t_64_sync_handler+0xa0/0xe8 [ 441.858214] el0t_64_sync+0x1ac/0x1b0 [ 441.861865] Code: aa1303e0 97ff70a8 34ffff80 d10a4273 (f9445a75) [ 441.867946] ---[ end trace 0000000000000000 ]--- Therefore, set max_channel to 0. Fixes: e21fe3a52692 ("scsi: hisi_sas: add initialisation for v3 pci-based controller") Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Yihang Li <liyihang9@huawei.com> Link: https://patch.msgid.link/20260305064039.4096775-1-liyihang9@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysscsi: hisi_sas: Use macro instead of magic numberYihang Li3-115/+213
[ Upstream commit 4ca7fe99fc8485fcd04b367f37dc7a48f1355419 ] The hisi_sas driver has a large number of magic numbers which makes for unfriendly code reading. Use macro definitions instead. Signed-off-by: Yihang Li <liyihang9@huawei.com> Link: https://lore.kernel.org/r/20250414080845.1220997-2-liyihang9@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Stable-dep-of: 8ddc0c269165 ("scsi: hisi_sas: Fix NULL pointer exception during user_scan()") Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysscsi: hisi_sas: Add time interval between two H2D FIS following soft reset specXingui Yang1-0/+1
[ Upstream commit 3c62791322e42d1afd65acfdb5b3a371bde21ede ] Spec says at least 5us between two H2D FIS when do soft reset, but be generous and sleep for about 1ms. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Link: https://lore.kernel.org/r/20241008021822.2617339-11-liyihang9@huawei.com Reviewed-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Stable-dep-of: 8ddc0c269165 ("scsi: hisi_sas: Fix NULL pointer exception during user_scan()") Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysscsi: ses: Fix devices attaching to different hostsTomas Henzl1-3/+2
[ Upstream commit 70ca8caa96ce473647054f5c7b9dab5423902402 ] On a multipath SAS system some devices don't end up with correct symlinks from the SCSI device to its enclosure. Some devices even have enclosure links pointing to enclosures attached to different SCSI hosts. ses_match_to_enclosure() calls enclosure_for_each_device() which iterates over all enclosures on the system, not just enclosures attached to the current SCSI host. Replace the iteration with a direct call to ses_enclosure_find_by_addr(). Reviewed-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Tomas Henzl <thenzl@redhat.com> Link: https://patch.msgid.link/20260210191850.36784-1-thenzl@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysscsi: mpi3mr: Add NULL checks when resetting request and reply queuesRanjan Kumar1-15/+19
[ Upstream commit fa96392ebebc8fade2b878acb14cce0f71016503 ] The driver encountered a crash during resource cleanup when the reply and request queues were NULL due to freed memory. This issue occurred when the creation of reply or request queues failed, and the driver freed the memory first, but attempted to mem set the content of the freed memory, leading to a system crash. Add NULL pointer checks for reply and request queues before accessing the reply/request memory during cleanup Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://patch.msgid.link/20260212070026.30263-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysscsi: storvsc: Fix scheduling while atomic on PREEMPT_RTJan Kiszka1-2/+3
[ Upstream commit 57297736c08233987e5d29ce6584c6ca2a831b12 ] This resolves the follow splat and lock-up when running with PREEMPT_RT enabled on Hyper-V: [ 415.140818] BUG: scheduling while atomic: stress-ng-iomix/1048/0x00000002 [ 415.140822] INFO: lockdep is turned off. [ 415.140823] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec ghash_clmulni_intel aesni_intel rapl binfmt_misc nls_ascii nls_cp437 vfat fat snd_pcm hyperv_drm snd_timer drm_client_lib drm_shmem_helper snd sg soundcore drm_kms_helper pcspkr hv_balloon hv_utils evdev joydev drm configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vsock vmw_vmci efivarfs autofs4 ext4 crc16 mbcache jbd2 sr_mod sd_mod cdrom hv_storvsc serio_raw hid_generic scsi_transport_fc hid_hyperv scsi_mod hid hv_netvsc hyperv_keyboard scsi_common [ 415.140846] Preemption disabled at: [ 415.140847] [<ffffffffc0656171>] storvsc_queuecommand+0x2e1/0xbe0 [hv_storvsc] [ 415.140854] CPU: 8 UID: 0 PID: 1048 Comm: stress-ng-iomix Not tainted 6.19.0-rc7 #30 PREEMPT_{RT,(full)} [ 415.140856] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/04/2024 [ 415.140857] Call Trace: [ 415.140861] <TASK> [ 415.140861] ? storvsc_queuecommand+0x2e1/0xbe0 [hv_storvsc] [ 415.140863] dump_stack_lvl+0x91/0xb0 [ 415.140870] __schedule_bug+0x9c/0xc0 [ 415.140875] __schedule+0xdf6/0x1300 [ 415.140877] ? rtlock_slowlock_locked+0x56c/0x1980 [ 415.140879] ? rcu_is_watching+0x12/0x60 [ 415.140883] schedule_rtlock+0x21/0x40 [ 415.140885] rtlock_slowlock_locked+0x502/0x1980 [ 415.140891] rt_spin_lock+0x89/0x1e0 [ 415.140893] hv_ringbuffer_write+0x87/0x2a0 [ 415.140899] vmbus_sendpacket_mpb_desc+0xb6/0xe0 [ 415.140900] ? rcu_is_watching+0x12/0x60 [ 415.140902] storvsc_queuecommand+0x669/0xbe0 [hv_storvsc] [ 415.140904] ? HARDIRQ_verbose+0x10/0x10 [ 415.140908] ? __rq_qos_issue+0x28/0x40 [ 415.140911] scsi_queue_rq+0x760/0xd80 [scsi_mod] [ 415.140926] __blk_mq_issue_directly+0x4a/0xc0 [ 415.140928] blk_mq_issue_direct+0x87/0x2b0 [ 415.140931] blk_mq_dispatch_queue_requests+0x120/0x440 [ 415.140933] blk_mq_flush_plug_list+0x7a/0x1a0 [ 415.140935] __blk_flush_plug+0xf4/0x150 [ 415.140940] __submit_bio+0x2b2/0x5c0 [ 415.140944] ? submit_bio_noacct_nocheck+0x272/0x360 [ 415.140946] submit_bio_noacct_nocheck+0x272/0x360 [ 415.140951] ext4_read_bh_lock+0x3e/0x60 [ext4] [ 415.140995] ext4_block_write_begin+0x396/0x650 [ext4] [ 415.141018] ? __pfx_ext4_da_get_block_prep+0x10/0x10 [ext4] [ 415.141038] ext4_da_write_begin+0x1c4/0x350 [ext4] [ 415.141060] generic_perform_write+0x14e/0x2c0 [ 415.141065] ext4_buffered_write_iter+0x6b/0x120 [ext4] [ 415.141083] vfs_write+0x2ca/0x570 [ 415.141087] ksys_write+0x76/0xf0 [ 415.141089] do_syscall_64+0x99/0x1490 [ 415.141093] ? rcu_is_watching+0x12/0x60 [ 415.141095] ? finish_task_switch.isra.0+0xdf/0x3d0 [ 415.141097] ? rcu_is_watching+0x12/0x60 [ 415.141098] ? lock_release+0x1f0/0x2a0 [ 415.141100] ? rcu_is_watching+0x12/0x60 [ 415.141101] ? finish_task_switch.isra.0+0xe4/0x3d0 [ 415.141103] ? rcu_is_watching+0x12/0x60 [ 415.141104] ? __schedule+0xb34/0x1300 [ 415.141106] ? hrtimer_try_to_cancel+0x1d/0x170 [ 415.141109] ? do_nanosleep+0x8b/0x160 [ 415.141111] ? hrtimer_nanosleep+0x89/0x100 [ 415.141114] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 415.141116] ? xfd_validate_state+0x26/0x90 [ 415.141118] ? rcu_is_watching+0x12/0x60 [ 415.141120] ? do_syscall_64+0x1e0/0x1490 [ 415.141121] ? do_syscall_64+0x1e0/0x1490 [ 415.141123] ? rcu_is_watching+0x12/0x60 [ 415.141124] ? do_syscall_64+0x1e0/0x1490 [ 415.141125] ? do_syscall_64+0x1e0/0x1490 [ 415.141127] ? irqentry_exit+0x140/0x7e0 [ 415.141129] entry_SYSCALL_64_after_hwframe+0x76/0x7e get_cpu() disables preemption while the spinlock hv_ringbuffer_write is using is converted to an rt-mutex under PREEMPT_RT. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Tested-by: Florian Bezdeka <florian.bezdeka@siemens.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/0c7fb5cd-fb21-4760-8593-e04bade84744@siemens.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13scsi: core: Fix refcount leak for tagset_refcntJunxiao Bi1-0/+1
commit 1ac22c8eae81366101597d48360718dff9b9d980 upstream. This leak will cause a hang when tearing down the SCSI host. For example, iscsid hangs with the following call trace: [130120.652718] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured PID: 2528 TASK: ffff9d0408974e00 CPU: 3 COMMAND: "iscsid" #0 [ffffb5b9c134b9e0] __schedule at ffffffff860657d4 #1 [ffffb5b9c134ba28] schedule at ffffffff86065c6f #2 [ffffb5b9c134ba40] schedule_timeout at ffffffff86069fb0 #3 [ffffb5b9c134bab0] __wait_for_common at ffffffff8606674f #4 [ffffb5b9c134bb10] scsi_remove_host at ffffffff85bfe84b #5 [ffffb5b9c134bb30] iscsi_sw_tcp_session_destroy at ffffffffc03031c4 [iscsi_tcp] #6 [ffffb5b9c134bb48] iscsi_if_recv_msg at ffffffffc0292692 [scsi_transport_iscsi] #7 [ffffb5b9c134bb98] iscsi_if_rx at ffffffffc02929c2 [scsi_transport_iscsi] #8 [ffffb5b9c134bbf0] netlink_unicast at ffffffff85e551d6 #9 [ffffb5b9c134bc38] netlink_sendmsg at ffffffff85e554ef Fixes: 8fe4ce5836e9 ("scsi: core: Fix a use-after-free") Cc: stable@vger.kernel.org Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260223232728.93350-1-junxiao.bi@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-13scsi: pm8001: Fix use-after-free in pm8001_queue_command()Salomon Dushimirimana1-2/+3
[ Upstream commit 38353c26db28efd984f51d426eac2396d299cca7 ] Commit e29c47fe8946 ("scsi: pm8001: Simplify pm8001_task_exec()") refactors pm8001_queue_command(), however it introduces a potential cause of a double free scenario when it changes the function to return -ENODEV in case of phy down/device gone state. In this path, pm8001_queue_command() updates task status and calls task_done to indicate to upper layer that the task has been handled. However, this also frees the underlying SAS task. A -ENODEV is then returned to the caller. When libsas sas_ata_qc_issue() receives this error value, it assumes the task wasn't handled/queued by LLDD and proceeds to clean up and free the task again, resulting in a double free. Since pm8001_queue_command() handles the SAS task in this case, it should return 0 to the caller indicating that the task has been handled. Fixes: e29c47fe8946 ("scsi: pm8001: Simplify pm8001_task_exec()") Signed-off-by: Salomon Dushimirimana <salomondush@google.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://patch.msgid.link/20260213192806.439432-1-salomondush@google.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13scsi: lpfc: Properly set WC for DPP mappingMathias Krause3-6/+35
[ Upstream commit bffda93a51b40afd67c11bf558dc5aae83ca0943 ] Using set_memory_wc() to enable write-combining for the DPP portion of the MMIO mapping is wrong as set_memory_*() is meant to operate on RAM only, not MMIO mappings. In fact, as used currently triggers a BUG_ON() with enabled CONFIG_DEBUG_VIRTUAL. Simply map the DPP region separately and in addition to the already existing mappings, avoiding any possible negative side effects for these. Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4") Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Reviewed-by: Mathias Krause <minipli@grsecurity.net> Link: https://patch.msgid.link/20260212192327.141104-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04scsi: buslogic: Reduce stack usageArnd Bergmann1-2/+4
[ Upstream commit e17f0d4cc006265dd92129db4bf9da3a2e4a4f66 ] Some randconfig builds run into excessive stack usage with gcc-14 or higher, which use __attribute__((cold)) where earlier versions did not do that: drivers/scsi/BusLogic.c: In function 'blogic_init': drivers/scsi/BusLogic.c:2398:1: error: the frame size of 1680 bytes is larger than 1536 bytes [-Werror=frame-larger-than=] The problem is that a lot of code gets inlined into blogic_init() here. Two functions stick out, but they are a bit different: - blogic_init_probeinfo_list() actually uses a few hundred bytes of kernel stack, which is a problem in combination with other functions that also do. Marking this one as noinline means that the stack slots get get reused between function calls - blogic_reportconfig() has a few large variables, but whenever it is not inlined into its caller, the compiler is actually smart enough to reuse stack slots for these automatically, so marking it as noinline saves most of the stack space by itself. The combination of both of these should avoid the problem entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260203163321.2598593-1-arnd@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04scsi: csiostor: Fix dereference of null pointer rnColin Ian King1-1/+2
[ Upstream commit 1982257570b84dc33753d536dd969fd357a014e9 ] The error exit path when rn is NULL ends up deferencing the null pointer rn via the use of the macro CSIO_INC_STATS. Fix this by adding a new error return path label after the use of the macro to avoid the deference. Fixes: a3667aaed569 ("[SCSI] csiostor: Chelsio FCoE offload driver") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://patch.msgid.link/20260129155332.196338-1-colin.i.king@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04scsi: smartpqi: Fix memory leak in pqi_report_phys_luns()Zilin Guan1-3/+10
[ Upstream commit 41b37312bd9722af77ec7817ccf22d7a4880c289 ] pqi_report_phys_luns() fails to release the rpl_list buffer when encountering an unsupported data format or when the allocation for rpl_16byte_wwid_list fails. These early returns bypass the cleanup logic, leading to memory leaks. Consolidate the error handling by adding an out_free_rpl_list label and use goto statements to ensure rpl_list is consistently freed on failure. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: 28ca6d876c5a ("scsi: smartpqi: Add extended report physical LUNs") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Tested-by: Don Brace <don.brace@microchip.com> Acked-by: Don Brace <don.brace@microchip.com> Link: https://patch.msgid.link/20260131093641.1008117-1-zilin@seu.edu.cn Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04scsi: efct: Use IRQF_ONESHOT and default primary handlerSebastian Andrzej Siewior1-7/+1
[ Upstream commit bd81f07e9a27c341cd7e72be95eb0b7cf3910926 ] There is no added value in efct_intr_msix() compared to irq_default_primary_handler(). Using a threaded interrupt without a dedicated primary handler mandates the IRQF_ONESHOT flag to mask the interrupt source while the threaded handler is active. Otherwise the interrupt can fire again before the threaded handler had a chance to run. Use the default primary interrupt handler by specifying NULL and set IRQF_ONESHOT so the interrupt source is masked until the secondary handler is done. Fixes: 4df84e8466242 ("scsi: elx: efct: Driver initialization routines") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260128095540.863589-8-bigeasy@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-19scsi: qla2xxx: Fix bsg_done() causing double freeAnil Gurumurthy1-11/+17
commit c2c68225b1456f4d0d393b5a8778d51bb0d5b1d0 upstream. Kernel panic observed on system, [5353358.825191] BUG: unable to handle page fault for address: ff5f5e897b024000 [5353358.825194] #PF: supervisor write access in kernel mode [5353358.825195] #PF: error_code(0x0002) - not-present page [5353358.825196] PGD 100006067 P4D 0 [5353358.825198] Oops: 0002 [#1] PREEMPT SMP NOPTI [5353358.825200] CPU: 5 PID: 2132085 Comm: qlafwupdate.sub Kdump: loaded Tainted: G W L ------- --- 5.14.0-503.34.1.el9_5.x86_64 #1 [5353358.825203] Hardware name: HPE ProLiant DL360 Gen11/ProLiant DL360 Gen11, BIOS 2.44 01/17/2025 [5353358.825204] RIP: 0010:memcpy_erms+0x6/0x10 [5353358.825211] RSP: 0018:ff591da8f4f6b710 EFLAGS: 00010246 [5353358.825212] RAX: ff5f5e897b024000 RBX: 0000000000007090 RCX: 0000000000001000 [5353358.825213] RDX: 0000000000001000 RSI: ff591da8f4fed090 RDI: ff5f5e897b024000 [5353358.825214] RBP: 0000000000010000 R08: ff5f5e897b024000 R09: 0000000000000000 [5353358.825215] R10: ff46cf8c40517000 R11: 0000000000000001 R12: 0000000000008090 [5353358.825216] R13: ff591da8f4f6b720 R14: 0000000000001000 R15: 0000000000000000 [5353358.825218] FS: 00007f1e88d47740(0000) GS:ff46cf935f940000(0000) knlGS:0000000000000000 [5353358.825219] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [5353358.825220] CR2: ff5f5e897b024000 CR3: 0000000231532004 CR4: 0000000000771ef0 [5353358.825221] PKRU: 55555554 [5353358.825222] Call Trace: [5353358.825223] <TASK> [5353358.825224] ? show_trace_log_lvl+0x1c4/0x2df [5353358.825229] ? show_trace_log_lvl+0x1c4/0x2df [5353358.825232] ? sg_copy_buffer+0xc8/0x110 [5353358.825236] ? __die_body.cold+0x8/0xd [5353358.825238] ? page_fault_oops+0x134/0x170 [5353358.825242] ? kernelmode_fixup_or_oops+0x84/0x110 [5353358.825244] ? exc_page_fault+0xa8/0x150 [5353358.825247] ? asm_exc_page_fault+0x22/0x30 [5353358.825252] ? memcpy_erms+0x6/0x10 [5353358.825253] sg_copy_buffer+0xc8/0x110 [5353358.825259] qla2x00_process_vendor_specific+0x652/0x1320 [qla2xxx] [5353358.825317] qla24xx_bsg_request+0x1b2/0x2d0 [qla2xxx] Most routines in qla_bsg.c call bsg_done() only for success cases. However a few invoke it for failure case as well leading to a double free. Validate before calling bsg_done(). Cc: stable@vger.kernel.org Signed-off-by: Anil Gurumurthy <agurumurthy@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251210101604.431868-12-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-16scsi: qla2xxx: Query FW again before proceeding with loginAnil Gurumurthy2-4/+34
commit 42b2dab4340d39b71334151e10c6d7d9b0040ffa upstream. Issue occurred during a continuous reboot test of several thousand iterations specific to a fabric topo with dual mode target where it sends a PLOGI/PRLI and then sends a LOGO. The initiator was also in the process of discovery and sent a PLOGI to the switch. It then queried a list of ports logged in via mbx 75h and the GPDB response indicated that the target was logged in. This caused a mismatch in the states between the driver and FW. Requery the FW for the state and proceed with the rest of discovery process. Fixes: a4239945b8ad ("scsi: qla2xxx: Add switch command to simplify fabric discovery") Cc: stable@vger.kernel.org Signed-off-by: Anil Gurumurthy <agurumurthy@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251210101604.431868-11-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-16scsi: qla2xxx: Free sp in error path to fix system crashAnil Gurumurthy1-2/+2
commit 7adbd2b7809066c75f0433e5e2a8e114b429f30f upstream. System crash seen during load/unload test in a loop, [61110.449331] qla2xxx [0000:27:00.0]-0042:0: Disabled MSI-X. [61110.467494] ============================================================================= [61110.467498] BUG qla2xxx_srbs (Tainted: G OE -------- --- ): Objects remaining in qla2xxx_srbs on __kmem_cache_shutdown() [61110.467501] ----------------------------------------------------------------------------- [61110.467502] Slab 0x000000000ffc8162 objects=51 used=1 fp=0x00000000e25d3d85 flags=0x57ffffc0010200(slab|head|node=1|zone=2|lastcpupid=0x1fffff) [61110.467509] CPU: 53 PID: 455206 Comm: rmmod Kdump: loaded Tainted: G OE -------- --- 5.14.0-284.11.1.el9_2.x86_64 #1 [61110.467513] Hardware name: HPE ProLiant DL385 Gen10 Plus v2/ProLiant DL385 Gen10 Plus v2, BIOS A42 08/17/2023 [61110.467515] Call Trace: [61110.467516] <TASK> [61110.467519] dump_stack_lvl+0x34/0x48 [61110.467526] slab_err.cold+0x53/0x67 [61110.467534] __kmem_cache_shutdown+0x16e/0x320 [61110.467540] kmem_cache_destroy+0x51/0x160 [61110.467544] qla2x00_module_exit+0x93/0x99 [qla2xxx] [61110.467607] ? __do_sys_delete_module.constprop.0+0x178/0x280 [61110.467613] ? syscall_trace_enter.constprop.0+0x145/0x1d0 [61110.467616] ? do_syscall_64+0x5c/0x90 [61110.467619] ? exc_page_fault+0x62/0x150 [61110.467622] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [61110.467626] </TASK> [61110.467627] Disabling lock debugging due to kernel taint [61110.467635] Object 0x0000000026f7e6e6 @offset=16000 [61110.467639] ------------[ cut here ]------------ [61110.467639] kmem_cache_destroy qla2xxx_srbs: Slab cache still has objects when called from qla2x00_module_exit+0x93/0x99 [qla2xxx] [61110.467659] WARNING: CPU: 53 PID: 455206 at mm/slab_common.c:520 kmem_cache_destroy+0x14d/0x160 [61110.467718] CPU: 53 PID: 455206 Comm: rmmod Kdump: loaded Tainted: G B OE -------- --- 5.14.0-284.11.1.el9_2.x86_64 #1 [61110.467720] Hardware name: HPE ProLiant DL385 Gen10 Plus v2/ProLiant DL385 Gen10 Plus v2, BIOS A42 08/17/2023 [61110.467721] RIP: 0010:kmem_cache_destroy+0x14d/0x160 [61110.467724] Code: 99 7d 07 00 48 89 ef e8 e1 6a 07 00 eb b3 48 8b 55 60 48 8b 4c 24 20 48 c7 c6 70 fc 66 90 48 c7 c7 f8 ef a1 90 e8 e1 ed 7c 00 <0f> 0b eb 93 c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 [61110.467725] RSP: 0018:ffffa304e489fe80 EFLAGS: 00010282 [61110.467727] RAX: 0000000000000000 RBX: ffffffffc0d9a860 RCX: 0000000000000027 [61110.467729] RDX: ffff8fd5ff9598a8 RSI: 0000000000000001 RDI: ffff8fd5ff9598a0 [61110.467730] RBP: ffff8fb6aaf78700 R08: 0000000000000000 R09: 0000000100d863b7 [61110.467731] R10: ffffa304e489fd20 R11: ffffffff913bef48 R12: 0000000040002000 [61110.467731] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [61110.467733] FS: 00007f64c89fb740(0000) GS:ffff8fd5ff940000(0000) knlGS:0000000000000000 [61110.467734] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [61110.467735] CR2: 00007f0f02bfe000 CR3: 00000020ad6dc005 CR4: 0000000000770ee0 [61110.467736] PKRU: 55555554 [61110.467737] Call Trace: [61110.467738] <TASK> [61110.467739] qla2x00_module_exit+0x93/0x99 [qla2xxx] [61110.467755] ? __do_sys_delete_module.constprop.0+0x178/0x280 Free sp in the error path to fix the crash. Fixes: f352eeb75419 ("scsi: qla2xxx: Add ability to use GPNFT/GNNFT for RSCN handling") Cc: stable@vger.kernel.org Signed-off-by: Anil Gurumurthy <agurumurthy@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251210101604.431868-9-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-16scsi: qla2xxx: Delay module unload while fabric scan in progressAnil Gurumurthy1-1/+2
commit 8890bf450e0b6b283f48ac619fca5ac2f14ddd62 upstream. System crash seen during load/unload test in a loop. [105954.384919] RBP: ffff914589838dc0 R08: 0000000000000000 R09: 0000000000000086 [105954.384920] R10: 000000000000000f R11: ffffa31240904be5 R12: ffff914605f868e0 [105954.384921] R13: ffff914605f86910 R14: 0000000000008010 R15: 00000000ddb7c000 [105954.384923] FS: 0000000000000000(0000) GS:ffff9163fec40000(0000) knlGS:0000000000000000 [105954.384925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [105954.384926] CR2: 000055d31ce1d6a0 CR3: 0000000119f5e001 CR4: 0000000000770ee0 [105954.384928] PKRU: 55555554 [105954.384929] Call Trace: [105954.384931] <IRQ> [105954.384934] qla24xx_sp_unmap+0x1f3/0x2a0 [qla2xxx] [105954.384962] ? qla_async_scan_sp_done+0x114/0x1f0 [qla2xxx] [105954.384980] ? qla24xx_els_ct_entry+0x4de/0x760 [qla2xxx] [105954.384999] ? __wake_up_common+0x80/0x190 [105954.385004] ? qla24xx_process_response_queue+0xc2/0xaa0 [qla2xxx] [105954.385023] ? qla24xx_msix_rsp_q+0x44/0xb0 [qla2xxx] [105954.385040] ? __handle_irq_event_percpu+0x3d/0x190 [105954.385044] ? handle_irq_event+0x58/0xb0 [105954.385046] ? handle_edge_irq+0x93/0x240 [105954.385050] ? __common_interrupt+0x41/0xa0 [105954.385055] ? common_interrupt+0x3e/0xa0 [105954.385060] ? asm_common_interrupt+0x22/0x40 The root cause of this was that there was a free (dma_free_attrs) in the interrupt context. There was a device discovery/fabric scan in progress. A module unload was issued which set the UNLOADING flag. As part of the discovery, after receiving an interrupt a work queue was scheduled (which involved a work to be queued). Since the UNLOADING flag is set, the work item was not allocated and the mapped memory had to be freed. The free occurred in interrupt context leading to system crash. Delay the driver unload until the fabric scan is complete to avoid the crash. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/202512090414.07Waorz0-lkp@intel.com/ Fixes: 783e0dc4f66a ("qla2xxx: Check for device state before unloading the driver.") Cc: stable@vger.kernel.org Signed-off-by: Anil Gurumurthy <agurumurthy@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251210101604.431868-8-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-16scsi: qla2xxx: Allow recovery for tape devicesShreyas Deodhar2-12/+0
commit b0335ee4fb94832a4ef68774ca7e7b33b473c7a6 upstream. Tape device doesn't show up after RSCNs. To fix this, remove tape device specific checks which allows recovery of tape devices. Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target") Cc: stable@vger.kernel.org Signed-off-by: Shreyas Deodhar <sdeodhar@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251210101604.431868-7-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-16scsi: qla2xxx: Validate sp before freeing associated memoryAnil Gurumurthy1-16/+18
commit b6df15aec8c3441357d4da0eaf4339eb20f5999f upstream. System crash with the following signature [154563.214890] nvme nvme2: NVME-FC{1}: controller connect complete [154564.169363] qla2xxx [0000:b0:00.1]-3002:2: nvme: Sched: Set ZIO exchange threshold to 3. [154564.169405] qla2xxx [0000:b0:00.1]-ffffff:2: SET ZIO Activity exchange threshold to 5. [154565.539974] qla2xxx [0000:b0:00.1]-5013:2: RSCN database changed – 0078 0080 0000. [154565.545744] qla2xxx [0000:b0:00.1]-5013:2: RSCN database changed – 0078 00a0 0000. [154565.545857] qla2xxx [0000:b0:00.1]-11a2:2: FEC=enabled (data rate). [154565.552760] qla2xxx [0000:b0:00.1]-11a2:2: FEC=enabled (data rate). [154565.553079] BUG: kernel NULL pointer dereference, address: 00000000000000f8 [154565.553080] #PF: supervisor read access in kernel mode [154565.553082] #PF: error_code(0x0000) - not-present page [154565.553084] PGD 80000010488ab067 P4D 80000010488ab067 PUD 104978a067 PMD 0 [154565.553089] Oops: 0000 1 PREEMPT SMP PTI [154565.553092] CPU: 10 PID: 858 Comm: qla2xxx_2_dpc Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.11.1.el9_5.x86_64 #1 [154565.553096] Hardware name: HPE Synergy 660 Gen10/Synergy 660 Gen10 Compute Module, BIOS I43 09/30/2024 [154565.553097] RIP: 0010:qla_fab_async_scan.part.0+0x40b/0x870 [qla2xxx] [154565.553141] Code: 00 00 e8 58 a3 ec d4 49 89 e9 ba 12 20 00 00 4c 89 e6 49 c7 c0 00 ee a8 c0 48 c7 c1 66 c0 a9 c0 bf 00 80 00 10 e8 15 69 00 00 <4c> 8b 8d f8 00 00 00 4d 85 c9 74 35 49 8b 84 24 00 19 00 00 48 8b [154565.553143] RSP: 0018:ffffb4dbc8aebdd0 EFLAGS: 00010286 [154565.553145] RAX: 0000000000000000 RBX: ffff8ec2cf0908d0 RCX: 0000000000000002 [154565.553147] RDX: 0000000000000000 RSI: ffffffffc0a9c896 RDI: ffffb4dbc8aebd47 [154565.553148] RBP: 0000000000000000 R08: ffffb4dbc8aebd45 R09: 0000000000ffff0a [154565.553150] R10: 0000000000000000 R11: 000000000000000f R12: ffff8ec2cf0908d0 [154565.553151] R13: ffff8ec2cf090900 R14: 0000000000000102 R15: ffff8ec2cf084000 [154565.553152] FS: 0000000000000000(0000) GS:ffff8ed27f800000(0000) knlGS:0000000000000000 [154565.553154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [154565.553155] CR2: 00000000000000f8 CR3: 000000113ae0a005 CR4: 00000000007706f0 [154565.553157] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [154565.553158] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [154565.553159] PKRU: 55555554 [154565.553160] Call Trace: [154565.553162] <TASK> [154565.553165] ? show_trace_log_lvl+0x1c4/0x2df [154565.553172] ? show_trace_log_lvl+0x1c4/0x2df [154565.553177] ? qla_fab_async_scan.part.0+0x40b/0x870 [qla2xxx] [154565.553215] ? __die_body.cold+0x8/0xd [154565.553218] ? page_fault_oops+0x134/0x170 [154565.553223] ? snprintf+0x49/0x70 [154565.553229] ? exc_page_fault+0x62/0x150 [154565.553238] ? asm_exc_page_fault+0x22/0x30 Check for sp being non NULL before freeing any associated memory Fixes: a4239945b8ad ("scsi: qla2xxx: Add switch command to simplify fabric discovery") Cc: stable@vger.kernel.org Signed-off-by: Anil Gurumurthy <agurumurthy@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251210101604.431868-10-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06scsi: qla2xxx: edif: Fix dma_free_coherent() sizeThomas Fourier1-1/+1
commit 56bd3c0f749f45793d1eae1d0ddde4255c749bf6 upstream. Earlier in the function, the ha->flt buffer is allocated with size sizeof(struct qla_flt_header) + FLT_REGIONS_SIZE but freed in the error path with size SFP_DEV_SIZE. Fixes: 84318a9f01ce ("scsi: qla2xxx: edif: Add send, receive, and accept for auth_els") Cc: stable@vger.kernel.org Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Link: https://patch.msgid.link/20260112134326.55466-2-fourier.thomas@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06scsi: be2iscsi: Fix a memory leak in beiscsi_boot_get_sinfo()Haoxiang Li1-0/+1
commit 4747bafaa50115d9667ece446b1d2d4aba83dc7f upstream. If nonemb_cmd->va fails to be allocated, free the allocation previously made by alloc_mcc_wrb(). Fixes: 50a4b824be9e ("scsi: be2iscsi: Fix to make boot discovery non-blocking") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn> Link: https://patch.msgid.link/20251213083643.301240-1-lihaoxiang@isrc.iscas.ac.cn Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-30scsi: qla2xxx: Sanitize payload size to prevent member overflowJiasheng Jiang1-0/+7
[ Upstream commit 19bc5f2a6962dfaa0e32d0e0bc2271993d85d414 ] In qla27xx_copy_fpin_pkt() and qla27xx_copy_multiple_pkt(), the frame_size reported by firmware is used to calculate the copy length into item->iocb. However, the iocb member is defined as a fixed-size 64-byte array within struct purex_item. If the reported frame_size exceeds 64 bytes, subsequent memcpy calls will overflow the iocb member boundary. While extra memory might be allocated, this cross-member write is unsafe and triggers warnings under CONFIG_FORTIFY_SOURCE. Fix this by capping total_bytes to the size of the iocb member (64 bytes) before allocation and copying. This ensures all copies remain within the bounds of the destination structure member. Fixes: 875386b98857 ("scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe") Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20260106205344.18031-1-jiashengjiangcool@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30scsi: core: Wake up the error handler when final completions race against ↵David Jeffery2-1/+18
each other [ Upstream commit fe2f8ad6f0999db3b318359a01ee0108c703a8c3 ] The fragile ordering between marking commands completed or failed so that the error handler only wakes when the last running command completes or times out has race conditions. These race conditions can cause the SCSI layer to fail to wake the error handler, leaving I/O through the SCSI host stuck as the error state cannot advance. First, there is an memory ordering issue within scsi_dec_host_busy(). The write which clears SCMD_STATE_INFLIGHT may be reordered with reads counting in scsi_host_busy(). While the local CPU will see its own write, reordering can allow other CPUs in scsi_dec_host_busy() or scsi_eh_inc_host_failed() to see a raised busy count, causing no CPU to see a host busy equal to the host_failed count. This race condition can be prevented with a memory barrier on the error path to force the write to be visible before counting host busy commands. Second, there is a general ordering issue with scsi_eh_inc_host_failed(). By counting busy commands before incrementing host_failed, it can race with a final command in scsi_dec_host_busy(), such that scsi_dec_host_busy() does not see host_failed incremented but scsi_eh_inc_host_failed() counts busy commands before SCMD_STATE_INFLIGHT is cleared by scsi_dec_host_busy(), resulting in neither waking the error handler task. This needs the call to scsi_host_busy() to be moved after host_failed is incremented to close the race condition. Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq") Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260113161036.6730-1-djeffery@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30scsi: storvsc: Process unsupported MODE_SENSE_10Long Li1-1/+2
commit 9eacec5d18f98f89be520eeeef4b377acee3e4b8 upstream. The Hyper-V host does not support MODE_SENSE_10 and MODE_SENSE. The driver handles MODE_SENSE as unsupported command, but not for MODE_SENSE_10. Add MODE_SENSE_10 to the same handling logic and return correct code to SCSI layer. Fixes: 89ae7d709357 ("Staging: hv: storvsc: Move the storage driver out of the staging area") Cc: stable@kernel.org Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20260117010302.294068-1-longli@linux.microsoft.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23scsi: core: Fix error handler encryption supportBrian Kao1-0/+24
commit 9a49157deeb23581fc5c8189b486340d7343264a upstream. Some low-level drivers (LLD) access block layer crypto fields, such as rq->crypt_keyslot and rq->crypt_ctx within `struct request`, to configure hardware for inline encryption. However, SCSI Error Handling (EH) commands (e.g., TEST UNIT READY, START STOP UNIT) should not involve any encryption setup. To prevent drivers from erroneously applying crypto settings during EH, this patch saves the original values of rq->crypt_keyslot and rq->crypt_ctx before an EH command is prepared via scsi_eh_prep_cmnd(). These fields in the 'struct request' are then set to NULL. The original values are restored in scsi_eh_restore_cmnd() after the EH command completes. This ensures that the block layer crypto context does not leak into EH command execution. Signed-off-by: Brian Kao <powenkao@google.com> Link: https://patch.msgid.link/20251218031726.2642834-1-powenkao@google.com Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-17scsi: sg: Fix occasional bogus elapsed time that exceeds timeoutMichal Rábek1-7/+13
[ Upstream commit 0e1677654259a2f3ccf728de1edde922a3c4ba57 ] A race condition was found in sg_proc_debug_helper(). It was observed on a system using an IBM LTO-9 SAS Tape Drive (ULTRIUM-TD9) and monitoring /proc/scsi/sg/debug every second. A very large elapsed time would sometimes appear. This is caused by two race conditions. We reproduced the issue with an IBM ULTRIUM-HH9 tape drive on an x86_64 architecture. A patched kernel was built, and the race condition could not be observed anymore after the application of this patch. A reproducer C program utilising the scsi_debug module was also built by Changhui Zhong and can be viewed here: https://github.com/MichaelRabek/linux-tests/blob/master/drivers/scsi/sg/sg_race_trigger.c The first race happens between the reading of hp->duration in sg_proc_debug_helper() and request completion in sg_rq_end_io(). The hp->duration member variable may hold either of two types of information: #1 - The start time of the request. This value is present while the request is not yet finished. #2 - The total execution time of the request (end_time - start_time). If sg_proc_debug_helper() executes *after* the value of hp->duration was changed from #1 to #2, but *before* srp->done is set to 1 in sg_rq_end_io(), a fresh timestamp is taken in the else branch, and the elapsed time (value type #2) is subtracted from a timestamp, which cannot yield a valid elapsed time (which is a type #2 value as well). To fix this issue, the value of hp->duration must change under the protection of the sfp->rq_list_lock in sg_rq_end_io(). Since sg_proc_debug_helper() takes this read lock, the change to srp->done and srp->header.duration will happen atomically from the perspective of sg_proc_debug_helper() and the race condition is thus eliminated. The second race condition happens between sg_proc_debug_helper() and sg_new_write(). Even though hp->duration is set to the current time stamp in sg_add_request() under the write lock's protection, it gets overwritten by a call to get_sg_io_hdr(), which calls copy_from_user() to copy struct sg_io_hdr from userspace into kernel space. hp->duration is set to the start time again in sg_common_write(). If sg_proc_debug_helper() is called between these two calls, an arbitrary value set by userspace (usually zero) is used to compute the elapsed time. To fix this issue, hp->duration must be set to the current timestamp again after get_sg_io_hdr() returns successfully. A small race window still exists between get_sg_io_hdr() and setting hp->duration, but this window is only a few instructions wide and does not result in observable issues in practice, as confirmed by testing. Additionally, we fix the format specifier from %d to %u for printing unsigned int values in sg_proc_debug_helper(). Signed-off-by: Michal Rábek <mrabek@redhat.com> Suggested-by: Tomas Henzl <thenzl@redhat.com> Tested-by: Changhui Zhong <czhong@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Link: https://patch.msgid.link/20251212160900.64924-1-mrabek@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure ↵Xingui Yang1-14/+0
scanned in again after probe failed" [ Upstream commit 278712d20bc8ec29d1ad6ef9bdae9000ef2c220c ] This reverts commit ab2068a6fb84751836a84c26ca72b3beb349619d. When probing the exp-attached sata device, libsas/libata will issue a hard reset in sas_probe_sata() -> ata_sas_async_probe(), then a broadcast event will be received after the disk probe fails, and this commit causes the probe will be re-executed on the disk, and a faulty disk may get into an indefinite loop of probe. Therefore, revert this commit, although it can fix some temporary issues with disk probe failure. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20251202065627.140361-1-yangxingui@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17scsi: ipr: Enable/disable IRQD_NO_BALANCING during resetWen Xiong1-1/+27
[ Upstream commit 6ac3484fb13b2fc7f31cfc7f56093e7d0ce646a5 ] A dynamic remove/add storage adapter test hits EEH on PowerPC: EEH: [c00000000004f75c] __eeh_send_failure_event+0x7c/0x160 EEH: [c000000000048444] eeh_dev_check_failure.part.0+0x254/0x650 EEH: [c008000001650678] eeh_readl+0x60/0x90 [ipr] EEH: [c00800000166746c] ipr_cancel_op+0x2b8/0x524 [ipr] EEH: [c008000001656524] ipr_eh_abort+0x6c/0x130 [ipr] EEH: [c000000000ab0d20] scmd_eh_abort_handler+0x140/0x440 EEH: [c00000000017e558] process_one_work+0x298/0x590 EEH: [c00000000017eef8] worker_thread+0xa8/0x620 EEH: [c00000000018be34] kthread+0x124/0x130 EEH: [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64 A PCIe bus trace reveals that a vector of MSI-X is cleared to 0 by irqbalance daemon. If we disable irqbalance daemon, we won't see the issue. With debug enabled in ipr driver: [ 44.103071] ipr: Entering __ipr_remove [ 44.103083] ipr: Entering ipr_initiate_ioa_bringdown [ 44.103091] ipr: Entering ipr_reset_shutdown_ioa [ 44.103099] ipr: Leaving ipr_reset_shutdown_ioa [ 44.103105] ipr: Leaving ipr_initiate_ioa_bringdown [ 44.149918] ipr: Entering ipr_reset_ucode_download [ 44.149935] ipr: Entering ipr_reset_alert [ 44.150032] ipr: Entering ipr_reset_start_timer [ 44.150038] ipr: Leaving ipr_reset_alert [ 44.244343] scsi 1:2:3:0: alua: Detached [ 44.254300] ipr: Entering ipr_reset_start_bist [ 44.254320] ipr: Entering ipr_reset_start_timer [ 44.254325] ipr: Leaving ipr_reset_start_bist [ 44.364329] scsi 1:2:4:0: alua: Detached [ 45.134341] scsi 1:2:5:0: alua: Detached [ 45.860949] ipr: Entering ipr_reset_shutdown_ioa [ 45.860962] ipr: Leaving ipr_reset_shutdown_ioa [ 45.860966] ipr: Entering ipr_reset_alert [ 45.861028] ipr: Entering ipr_reset_start_timer [ 45.861035] ipr: Leaving ipr_reset_alert [ 45.964302] ipr: Entering ipr_reset_start_bist [ 45.964309] ipr: Entering ipr_reset_start_timer [ 45.964313] ipr: Leaving ipr_reset_start_bist [ 46.264301] ipr: Entering ipr_reset_bist_done [ 46.264309] ipr: Leaving ipr_reset_bist_done During adapter reset, ipr device driver blocks config space access but can't block MMIO access for MSI-X entries. There is very small window: irqbalance daemon kicks in during adapter reset before ipr driver calls pci_restore_state(pdev) to restore MSI-X table. irqbalance daemon reads back all 0 for that MSI-X vector in __pci_read_msi_msg(). irqbalance daemon: msi_domain_set_affinity() ->irq_chip_set_affinity_patent() ->xive_irq_set_affinity() ->irq_chip_compose_msi_msg() ->pseries_msi_compose_msg() ->__pci_read_msi_msg(): read all 0 since didn't call pci_restore_state ->irq_chip_write_msi_msg() -> pci_write_msg_msi(): write 0 to the msix vector entry When ipr driver calls pci_restore_state(pdev) in ipr_reset_restore_cfg_space(), the MSI-X vector entry has been cleared by irqbalance daemon in pci_write_msg_msix(). pci_restore_state() ->__pci_restore_msix_state() Below is the MSI-X table for ipr adapter after irqbalance daemon kicked in during adapter reset: Dump MSIx table: index=0 address_lo=c800 address_hi=10000000 msg_data=0 Dump MSIx table: index=1 address_lo=c810 address_hi=10000000 msg_data=0 Dump MSIx table: index=2 address_lo=c820 address_hi=10000000 msg_data=0 Dump MSIx table: index=3 address_lo=c830 address_hi=10000000 msg_data=0 Dump MSIx table: index=4 address_lo=c840 address_hi=10000000 msg_data=0 Dump MSIx table: index=5 address_lo=c850 address_hi=10000000 msg_data=0 Dump MSIx table: index=6 address_lo=c860 address_hi=10000000 msg_data=0 Dump MSIx table: index=7 address_lo=c870 address_hi=10000000 msg_data=0 Dump MSIx table: index=8 address_lo=0 address_hi=0 msg_data=0 ---------> Hit EEH since msix vector of index=8 are 0 Dump MSIx table: index=9 address_lo=c890 address_hi=10000000 msg_data=0 Dump MSIx table: index=10 address_lo=c8a0 address_hi=10000000 msg_data=0 Dump MSIx table: index=11 address_lo=c8b0 address_hi=10000000 msg_data=0 Dump MSIx table: index=12 address_lo=c8c0 address_hi=10000000 msg_data=0 Dump MSIx table: index=13 address_lo=c8d0 address_hi=10000000 msg_data=0 Dump MSIx table: index=14 address_lo=c8e0 address_hi=10000000 msg_data=0 Dump MSIx table: index=15 address_lo=c8f0 address_hi=10000000 msg_data=0 [ 46.264312] ipr: Entering ipr_reset_restore_cfg_space [ 46.267439] ipr: Entering ipr_fail_all_ops [ 46.267447] ipr: Leaving ipr_fail_all_ops [ 46.267451] ipr: Leaving ipr_reset_restore_cfg_space [ 46.267454] ipr: Entering ipr_ioa_bringdown_done [ 46.267458] ipr: Leaving ipr_ioa_bringdown_done [ 46.267467] ipr: Entering ipr_worker_thread [ 46.267470] ipr: Leaving ipr_worker_thread IRQ balancing is not required during adapter reset. Enable "IRQ_NO_BALANCING" flag before starting adapter reset and disable it after calling pci_restore_state(). The irqbalance daemon is disabled for this short period of time (~2s). Co-developed-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com> Signed-off-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com> Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com> Link: https://patch.msgid.link/20251028142427.3969819-2-wenxiong@linux.ibm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-08scsi: mpi3mr: Read missing IOCFacts flag for reply queue full overflowChandrakanth Patil2-0/+3
commit d373163194982f43b92c552c138c29d9f0b79553 upstream. The driver was not reading the MAX_REQ_PER_REPLY_QUEUE_LIMIT IOCFacts flag, so the reply-queue-full handling was never enabled, even on firmware that supports it. Reading this flag enables the feature and prevents reply queue overflow. Fixes: f08b24d82749 ("scsi: mpi3mr: Avoid reply queue full condition") Cc: stable@vger.kernel.org Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com> Link: https://patch.msgid.link/20251211002929.22071-1-chandrakanth.patil@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08scsi: aic94xx: fix use-after-free in device removal pathJunrui Luo1-0/+3
commit f6ab594672d4cba08540919a4e6be2e202b60007 upstream. The asd_pci_remove() function fails to synchronize with pending tasklets before freeing the asd_ha structure, leading to a potential use-after-free vulnerability. When a device removal is triggered (via hot-unplug or module unload), race condition can occur. The fix adds tasklet_kill() before freeing the asd_ha structure, ensuring all scheduled tasklets complete before cleanup proceeds. Reported-by: Yuhao Jiang <danisjiang@gmail.com> Reported-by: Junrui Luo <moonafterrain@outlook.com> Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver") Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Link: https://patch.msgid.link/ME2PR01MB3156AB7DCACA206C845FC7E8AFFDA@ME2PR01MB3156.ausprd01.prod.outlook.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08scsi: Revert "scsi: qla2xxx: Perform lockless command completion in abort path"Tony Battersby1-6/+0
commit b57fbc88715b6d18f379463f48a15b560b087ffe upstream. This reverts commit 0367076b0817d5c75dfb83001ce7ce5c64d803a9. The commit being reverted added code to __qla2x00_abort_all_cmds() to call sp->done() without holding a spinlock. But unlike the older code below it, this new code failed to check sp->cmd_type and just assumed TYPE_SRB, which results in a jump to an invalid pointer in target-mode with TYPE_TGT_CMD: qla2xxx [0000:65:00.0]-d034:8: qla24xx_do_nack_work create sess success 0000000009f7a79b qla2xxx [0000:65:00.0]-5003:8: ISP System Error - mbx1=1ff5h mbx2=10h mbx3=0h mbx4=0h mbx5=191h mbx6=0h mbx7=0h. qla2xxx [0000:65:00.0]-d01e:8: -> fwdump no buffer qla2xxx [0000:65:00.0]-f03a:8: qla_target(0): System error async event 0x8002 occurred qla2xxx [0000:65:00.0]-00af:8: Performing ISP error recovery - ha=0000000058183fda. BUG: kernel NULL pointer dereference, address: 0000000000000000 PF: supervisor instruction fetch in kernel mode PF: error_code(0x0010) - not-present page PGD 0 P4D 0 Oops: 0010 [#1] SMP CPU: 2 PID: 9446 Comm: qla2xxx_8_dpc Tainted: G O 6.1.133 #1 Hardware name: Supermicro Super Server/X11SPL-F, BIOS 4.2 12/15/2023 RIP: 0010:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0018:ffffc90001f93dc8 EFLAGS: 00010206 RAX: 0000000000000282 RBX: 0000000000000355 RCX: ffff88810d16a000 RDX: ffff88810dbadaa8 RSI: 0000000000080000 RDI: ffff888169dc38c0 RBP: ffff888169dc38c0 R08: 0000000000000001 R09: 0000000000000045 R10: ffffffffa034bdf0 R11: 0000000000000000 R12: ffff88810800bb40 R13: 0000000000001aa8 R14: ffff888100136610 R15: ffff8881070f7400 FS: 0000000000000000(0000) GS:ffff88bf80080000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 000000010c8ff006 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die+0x4d/0x8b ? page_fault_oops+0x91/0x180 ? trace_buffer_unlock_commit_regs+0x38/0x1a0 ? exc_page_fault+0x391/0x5e0 ? asm_exc_page_fault+0x22/0x30 __qla2x00_abort_all_cmds+0xcb/0x3e0 [qla2xxx_scst] qla2x00_abort_all_cmds+0x50/0x70 [qla2xxx_scst] qla2x00_abort_isp_cleanup+0x3b7/0x4b0 [qla2xxx_scst] qla2x00_abort_isp+0xfd/0x860 [qla2xxx_scst] qla2x00_do_dpc+0x581/0xa40 [qla2xxx_scst] kthread+0xa8/0xd0 </TASK> Then commit 4475afa2646d ("scsi: qla2xxx: Complete command early within lock") added the spinlock back, because not having the lock caused a race and a crash. But qla2x00_abort_srb() in the switch below already checks for qla2x00_chip_is_down() and handles it the same way, so the code above the switch is now redundant and still buggy in target-mode. Remove it. Cc: stable@vger.kernel.org Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Link: https://patch.msgid.link/3a8022dc-bcfd-4b01-9f9b-7a9ec61fa2a3@cybernetics.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08scsi: scsi_debug: Fix atomic write enable module param descriptionJohn Garry1-1/+1
[ Upstream commit 1f7d6e2efeedd8f545d3e0e9bf338023bf4ea584 ] The atomic write enable module param is "atomic_wr", and not "atomic_write", so fix the module param description. Fixes: 84f3a3c01d70 ("scsi: scsi_debug: Atomic write support") Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251211100651.9056-1-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-08scsi: qla2xxx: Use reinit_completion on mbx_intr_compTony Battersby1-0/+2
[ Upstream commit 957aa5974989fba4ae4f807ebcb27f12796edd4d ] If a mailbox command completes immediately after wait_for_completion_timeout() times out, ha->mbx_intr_comp could be left in an inconsistent state, causing the next mailbox command not to wait for the hardware. Fix by reinitializing the completion before use. Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Link: https://patch.msgid.link/11b6485e-0bfd-4784-8f99-c06a196dad94@cybernetics.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-08scsi: qla2xxx: Fix initiator mode with qlini_mode=exclusiveTony Battersby1-7/+1
[ Upstream commit 8f58fc64d559b5fda1b0a5e2a71422be61e79ab9 ] When given the module parameter qlini_mode=exclusive, qla2xxx in initiator mode is initially unable to successfully send SCSI commands to devices it finds while scanning, resulting in an escalating series of resets until an adapter reset clears the issue. Fix by checking the active mode instead of the module parameter. Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Link: https://patch.msgid.link/1715ec14-ba9a-45dc-9cf2-d41aa6b81b5e@cybernetics.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-08scsi: qla2xxx: Fix lost interrupts with qlini_mode=disabledTony Battersby4-34/+5
[ Upstream commit 4f6aaade2a22ac428fa99ed716cf2b87e79c9837 ] When qla2xxx is loaded with qlini_mode=disabled, ha->flags.disable_msix_handshake is used before it is set, resulting in the wrong interrupt handler being used on certain HBAs (qla2xxx_msix_rsp_q_hs() is used when qla2xxx_msix_rsp_q() should be used). The only difference between these two interrupt handlers is that the _hs() version writes to a register to clear the "RISC" interrupt, whereas the other version does not. So this bug results in the RISC interrupt being cleared when it should not be. This occasionally causes a different interrupt handler qla24xx_msix_default() for a different vector to see ((stat & HSRX_RISC_INT) == 0) and ignore its interrupt, which then causes problems like: qla2xxx [0000:02:00.0]-d04c:6: MBX Command timeout for cmd 20, iocontrol=8 jiffies=1090c0300 mb[0-3]=[0x4000 0x0 0x40 0xda] mb7 0x500 host_status 0x40000010 hccr 0x3f00 qla2xxx [0000:02:00.0]-101e:6: Mailbox cmd timeout occurred, cmd=0x20, mb[0]=0x20. Scheduling ISP abort (the cmd varies; sometimes it is 0x20, 0x22, 0x54, 0x5a, 0x5d, or 0x6a) This problem can be reproduced with a 16 or 32 Gbps HBA by loading qla2xxx with qlini_mode=disabled and running a high IOPS test while triggering frequent RSCN database change events. While analyzing the problem I discovered that even with disable_msix_handshake forced to 0, it is not necessary to clear the RISC interrupt from qla2xxx_msix_rsp_q_hs() (more below). So just completely remove qla2xxx_msix_rsp_q_hs() and the logic for selecting it, which also fixes the bug with qlini_mode=disabled. The test below describes the justification for not needing qla2xxx_msix_rsp_q_hs(): Force disable_msix_handshake to 0: qla24xx_config_rings(): if (0 && (ha->fw_attributes & BIT_6) && (IS_MSIX_NACK_CAPABLE(ha)) && (ha->flags.msix_enabled)) { In qla24xx_msix_rsp_q() and qla2xxx_msix_rsp_q_hs(), check: (rd_reg_dword(&reg->host_status) & HSRX_RISC_INT) Count the number of calls to each function with HSRX_RISC_INT set and the number with HSRX_RISC_INT not set while performing some I/O. If qla2xxx_msix_rsp_q_hs() clears the RISC interrupt (original code): qla24xx_msix_rsp_q: 50% of calls have HSRX_RISC_INT set qla2xxx_msix_rsp_q_hs: 5% of calls have HSRX_RISC_INT set (# of qla2xxx_msix_rsp_q_hs interrupts) = (# of qla24xx_msix_rsp_q interrupts) * 3 If qla2xxx_msix_rsp_q_hs() does not clear the RISC interrupt (patched code): qla24xx_msix_rsp_q: 100% of calls have HSRX_RISC_INT set qla2xxx_msix_rsp_q_hs: 9% of calls have HSRX_RISC_INT set (# of qla2xxx_msix_rsp_q_hs interrupts) = (# of qla24xx_msix_rsp_q interrupts) * 3 In the case of the original code, qla24xx_msix_rsp_q() was seeing HSRX_RISC_INT set only 50% of the time because qla2xxx_msix_rsp_q_hs() was clearing it when it shouldn't have been. In the patched code, qla24xx_msix_rsp_q() sees HSRX_RISC_INT set 100% of the time, which makes sense if that interrupt handler needs to clear the RISC interrupt (which it does). qla2xxx_msix_rsp_q_hs() sees HSRX_RISC_INT only 9% of the time, which is just overlap from the other interrupt during the high IOPS test. Tested with SCST on: QLE2742 FW:v9.08.02 (32 Gbps 2-port) QLE2694L FW:v9.10.11 (16 Gbps 4-port) QLE2694L FW:v9.08.02 (16 Gbps 4-port) QLE2672 FW:v8.07.12 (16 Gbps 2-port) both initiator and target mode Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Link: https://patch.msgid.link/56d378eb-14ad-49c7-bae9-c649b6c7691e@cybernetics.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-08scsi: smartpqi: Add support for Hurray Data new controller PCI deviceDavid Strahan1-0/+4
[ Upstream commit 48e6b7e708029cea451e53a8c16fc8c16039ecdc ] Add support for new Hurray Data controller. All entries are in HEX. Add PCI IDs for Hurray Data controllers: VID / DID / SVID / SDID ---- ---- ---- ---- 9005 028f 207d 4840 Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: David Strahan <David.Strahan@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://patch.msgid.link/20251106163823.786828-4-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18scsi: imm: Fix use-after-free bug caused by unfinished delayed workDuoming Zhou1-0/+1
[ Upstream commit ab58153ec64fa3fc9aea09ca09dc9322e0b54a7c ] The delayed work item 'imm_tq' is initialized in imm_attach() and scheduled via imm_queuecommand() for processing SCSI commands. When the IMM parallel port SCSI host adapter is detached through imm_detach(), the imm_struct device instance is deallocated. However, the delayed work might still be pending or executing when imm_detach() is called, leading to use-after-free bugs when the work function imm_interrupt() accesses the already freed imm_struct memory. The race condition can occur as follows: CPU 0(detach thread) | CPU 1 | imm_queuecommand() | imm_queuecommand_lck() imm_detach() | schedule_delayed_work() kfree(dev) //FREE | imm_interrupt() | dev = container_of(...) //USE dev-> //USE Add disable_delayed_work_sync() in imm_detach() to guarantee proper cancellation of the delayed work item before imm_struct is deallocated. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://patch.msgid.link/20251028100149.40721-1-duoming@zju.edu.cn Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18scsi: qla2xxx: Fix improper freeing of purex itemZilin Guan1-1/+1
[ Upstream commit 78b1a242fe612a755f2158fd206ee6bb577d18ca ] In qla2xxx_process_purls_iocb(), an item is allocated via qla27xx_copy_multiple_pkt(), which internally calls qla24xx_alloc_purex_item(). The qla24xx_alloc_purex_item() function may return a pre-allocated item from a per-adapter pool for small allocations, instead of dynamically allocating memory with kzalloc(). An error handling path in qla2xxx_process_purls_iocb() incorrectly uses kfree() to release the item. If the item was from the pre-allocated pool, calling kfree() on it is a bug that can lead to memory corruption. Fix this by using the correct deallocation function, qla24xx_free_purex_item(), which properly handles both dynamically allocated and pre-allocated items. Fixes: 875386b98857 ("scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251113151246.762510-1-zilin@seu.edu.cn Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18scsi: sim710: Fix resource leak by adding missing ioport_unmap() callsHaotian Zhang1-0/+2
[ Upstream commit acd194d9b5bac419e04968ffa44351afabb50bac ] The driver calls ioport_map() to map I/O ports in sim710_probe_common() but never calls ioport_unmap() to release the mapping. This causes resource leaks in both the error path when request_irq() fails and in the normal device removal path via sim710_device_remove(). Add ioport_unmap() calls in the out_release error path and in sim710_device_remove(). Fixes: 56fece20086e ("[PATCH] finally fix 53c700 to use the generic iomem infrastructure") Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Link: https://patch.msgid.link/20251029032555.1476-1-vulab@iscas.ac.cn Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18scsi: smartpqi: Fix device resources accessed after device removalMike McGowen1-0/+19
[ Upstream commit b518e86d1a70a88f6592a7c396cf1b93493d1aab ] Correct possible race conditions during device removal. Previously, a scheduled work item to reset a LUN could still execute after the device was removed, leading to use-after-free and other resource access issues. This race condition occurs because the abort handler may schedule a LUN reset concurrently with device removal via sdev_destroy(), leading to use-after-free and improper access to freed resources. - Check in the device reset handler if the device is still present in the controller's SCSI device list before running; if not, the reset is skipped. - Cancel any pending TMF work that has not started in sdev_destroy(). - Ensure device freeing in sdev_destroy() is done while holding the LUN reset mutex to avoid races with ongoing resets. Fixes: 2d80f4054f7f ("scsi: smartpqi: Update deleting a LUN via sysfs") Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Signed-off-by: Mike McGowen <mike.mcgowen@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://patch.msgid.link/20251106163823.786828-3-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18scsi: stex: Fix reboot_notifier leak in probe error pathHaotian Zhang1-0/+1
[ Upstream commit 20da637eb545b04753e20c675cfe97b04c7b600b ] In stex_probe(), register_reboot_notifier() is called at the beginning, but if any subsequent initialization step fails, the function returns without unregistering the notifier, resulting in a resource leak. Add unregister_reboot_notifier() in the out_disable error path to ensure proper cleanup on all failure paths. Fixes: 61b745fa63db ("scsi: stex: Add S6 support") Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Link: https://patch.msgid.link/20251104094847.270-1-vulab@iscas.ac.cn Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-01scsi: core: Fix a regression triggered by scsi_host_busy()Bart Van Assche1-2/+3
[ Upstream commit a0b7780602b1b196f47e527fec82166a7e67c4d0 ] Commit 995412e23bb2 ("blk-mq: Replace tags->lock with SRCU for tag iterators") introduced the following regression: Call trace: __srcu_read_lock+0x30/0x80 (P) blk_mq_tagset_busy_iter+0x44/0x300 scsi_host_busy+0x38/0x70 ufshcd_print_host_state+0x34/0x1bc ufshcd_link_startup.constprop.0+0xe4/0x2e0 ufshcd_init+0x944/0xf80 ufshcd_pltfrm_init+0x504/0x820 ufs_rockchip_probe+0x2c/0x88 platform_probe+0x5c/0xa4 really_probe+0xc0/0x38c __driver_probe_device+0x7c/0x150 driver_probe_device+0x40/0x120 __driver_attach+0xc8/0x1e0 bus_for_each_dev+0x7c/0xdc driver_attach+0x24/0x30 bus_add_driver+0x110/0x230 driver_register+0x68/0x130 __platform_driver_register+0x20/0x2c ufs_rockchip_pltform_init+0x1c/0x28 do_one_initcall+0x60/0x1e0 kernel_init_freeable+0x248/0x2c4 kernel_init+0x20/0x140 ret_from_fork+0x10/0x20 Fix this regression by making scsi_host_busy() check whether the SCSI host tag set has already been initialized. tag_set->ops is set by scsi_mq_setup_tags() just before blk_mq_alloc_tag_set() is called. This fix is based on the assumption that scsi_host_busy() and scsi_mq_setup_tags() calls are serialized. This is the case in the UFS driver. Reported-by: Sebastian Reichel <sebastian.reichel@collabora.com> Closes: https://lore.kernel.org/linux-block/pnezafputodmqlpumwfbn644ohjybouveehcjhz2hmhtcf2rka@sdhoiivync4y/ Cc: Ming Lei <ming.lei@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://patch.msgid.link/20251007214800.1678255-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-01scsi: sg: Do not sleep in atomic contextBart Van Assche1-1/+9
commit 90449f2d1e1f020835cba5417234636937dd657e upstream. sg_finish_rem_req() calls blk_rq_unmap_user(). The latter function may sleep. Hence, call sg_finish_rem_req() with interrupts enabled instead of disabled. Reported-by: syzbot+c01f8e6e73f20459912e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-scsi/691560c4.a70a0220.3124cb.001a.GAE@google.com/ Cc: Hannes Reinecke <hare@suse.de> Cc: stable@vger.kernel.org Fixes: 97d27b0dd015 ("scsi: sg: close race condition in sg_remove_sfp_usercontext()") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251113181643.1108973-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-13scsi: mpt3sas: Add support for 22.5 Gbps SAS link rateRanjan Kumar1-0/+3
[ Upstream commit 4be7599d6b27bade41bfccca42901b917c01c30c ] Add handling for MPI26_SAS_NEG_LINK_RATE_22_5 in _transport_convert_phy_link_rate(). This maps the new 22.5 Gbps negotiated rate to SAS_LINK_RATE_22_5_GBPS, to get correct PHY link speeds. Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Message-Id: <20250922095113.281484-4-ranjan.kumar@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>