summaryrefslogtreecommitdiff
path: root/drivers/scsi
AgeCommit message (Collapse)AuthorFilesLines
2026-04-02scsi: ses: Handle positive SCSI error from ses_recv_diag()Greg Kroah-Hartman1-1/+1
commit 7a9f448d44127217fabc4065c5ba070d4e0b5d37 upstream. ses_recv_diag() can return a positive value, which also means that an error happened, so do not only test for negative values. Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: stable <stable@kernel.org> Assisted-by: gkh_clanker_2000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/2026022301-bony-overstock-a07f@gregkh Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-02scsi: ibmvfc: Fix OOB access in ibmvfc_discover_targets_done()Tyllis Xu1-1/+2
commit 61d099ac4a7a8fb11ebdb6e2ec8d77f38e77362f upstream. A malicious or compromised VIO server can return a num_written value in the discover targets MAD response that exceeds max_targets. This value is stored directly in vhost->num_targets without validation, and is then used as the loop bound in ibmvfc_alloc_targets() to index into disc_buf[], which is only allocated for max_targets entries. Indices at or beyond max_targets access kernel memory outside the DMA-coherent allocation. The out-of-bounds data is subsequently embedded in Implicit Logout and PLOGI MADs that are sent back to the VIO server, leaking kernel memory. Fix by clamping num_written to max_targets before storing it. Fixes: 072b91f9c651 ("[SCSI] ibmvfc: IBM Power Virtual Fibre Channel Adapter Client Driver") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Tyllis Xu <LivelyCarpet87@gmail.com> Reviewed-by: Dave Marquardt <davemarq@linux.ibm.com> Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com> Link: https://patch.msgid.link/20260314170151.548614-1-LivelyCarpet87@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-02scsi: scsi_transport_sas: Fix the maximum channel scanning issueYihang Li1-1/+1
[ Upstream commit d71afa9deb4d413232ba16d693f7d43b321931b4 ] After commit 37c4e72b0651 ("scsi: Fix sas_user_scan() to handle wildcard and multi-channel scans"), if the device supports multiple channels (0 to shost->max_channel), user_scan() invokes updated sas_user_scan() to perform the scan behavior for a specific transfer. However, when the user specifies shost->max_channel, it will return -EINVAL, which is not expected. Fix and support specifying the scan shost->max_channel for scanning. Fixes: 37c4e72b0651 ("scsi: Fix sas_user_scan() to handle wildcard and multi-channel scans") Signed-off-by: Yihang Li <liyihang9@huawei.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20260317063147.2182562-1-liyihang9@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02scsi: devinfo: Add BLIST_SKIP_IO_HINTS for Iomega ZIPFlorian Fuchs1-1/+1
[ Upstream commit 80bf3b28d32b431f84f244a8469488eb6d96afbb ] The Iomega ZIP 100 (Z100P2) can't process IO Advice Hints Grouping mode page query. It immediately switches to the status phase 0xb8 after receiving the subpage code 0x05 of MODE_SENSE_10 command, which fails imm_out() and turns into DID_ERROR of this command, which leads to unusable device. This was tested with an Iomega ZIP 100 (Z100P2) connected with a StarTech PEX1P2 AX99100 PCIe parallel port card. Prior to this fix, Test Unit Ready fails and the drive can't be used: IMM: returned SCSI status b8 sd 7:0:6:0: [sdh] Test Unit Ready failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK Signed-off-by: Florian Fuchs <fuchsfl@gmail.com> Link: https://patch.msgid.link/20260227181823.892932-1-fuchsfl@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02scsi: mpi3mr: Clear reset history on ready and recheck state after timeoutRanjan Kumar1-0/+10
[ Upstream commit dbd53975ed4132d161b6a97ebe785a262380182d ] The driver retains reset history even after the IOC has successfully reached the READY state. That leaves stale reset information active during normal operation and can mislead recovery and diagnostics. In addition, if the IOC becomes READY just as the ready timeout loop exits, the driver still follows the failure path and may retry or report failure incorrectly. Clear reset history once READY is confirmed so driver state matches actual IOC status. After the timeout loop, recheck the IOC state and treat READY as success instead of failing. Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://patch.msgid.link/20260225082622.82588-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19scsi: core: Fix error handling for scsi_alloc_sdev()Junxiao Bi1-6/+2
commit 4ce7ada40c008fa21b7e52ab9d04e8746e2e9325 upstream. After scsi_sysfs_device_initialize() was called, error paths must call __scsi_remove_device(). Fixes: 1ac22c8eae81 ("scsi: core: Fix refcount leak for tagset_refcnt") Cc: stable@vger.kernel.org Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260304164603.51528-1-junxiao.bi@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19scsi: hisi_sas: Fix NULL pointer exception during user_scan()Xingui Yang2-2/+2
[ Upstream commit 8ddc0c26916574395447ebf4cff684314f6873a9 ] user_scan() invokes updated sas_user_scan() for channel 0, and if successful, iteratively scans remaining channels (1 to shost->max_channel) via scsi_scan_host_selected() in commit 37c4e72b0651 ("scsi: Fix sas_user_scan() to handle wildcard and multi-channel scans"). However, hisi_sas supports only one channel, and the current value of max_channel is 1. sas_user_scan() for channel 1 will trigger the following NULL pointer exception: [ 441.554662] Unable to handle kernel NULL pointer dereference at virtual address 00000000000008b0 [ 441.554699] Mem abort info: [ 441.554710] ESR = 0x0000000096000004 [ 441.554718] EC = 0x25: DABT (current EL), IL = 32 bits [ 441.554723] SET = 0, FnV = 0 [ 441.554726] EA = 0, S1PTW = 0 [ 441.554730] FSC = 0x04: level 0 translation fault [ 441.554735] Data abort info: [ 441.554737] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 441.554742] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 441.554747] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 441.554752] user pgtable: 4k pages, 48-bit VAs, pgdp=00000828377a6000 [ 441.554757] [00000000000008b0] pgd=0000000000000000, p4d=0000000000000000 [ 441.554769] Internal error: Oops: 0000000096000004 [#1] SMP [ 441.629589] Modules linked in: arm_spe_pmu arm_smmuv3_pmu tpm_tis_spi hisi_uncore_sllc_pmu hisi_uncore_pa_pmu hisi_uncore_l3c_pmu hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_uncore_cpa_pmu hns3_pmu hisi_ptt hisi_pcie_pmu tpm_tis_core spidev spi_hisi_sfc_v3xx hisi_uncore_pmu spi_dw_mmio fuse hclge hclge_common hisi_sec2 hisi_hpre hisi_zip hisi_qm hns3 hisi_sas_v3_hw sm3_ce sbsa_gwdt hnae3 hisi_sas_main uacce hisi_dma i2c_hisi dm_mirror dm_region_hash dm_log dm_mod [ 441.670819] CPU: 46 UID: 0 PID: 6994 Comm: bash Kdump: loaded Not tainted 7.0.0-rc2+ #84 PREEMPT [ 441.691327] pstate: 81400009 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 441.698277] pc : sas_find_dev_by_rphy+0x44/0x118 [ 441.702896] lr : sas_find_dev_by_rphy+0x3c/0x118 [ 441.707502] sp : ffff80009abbba40 [ 441.710805] x29: ffff80009abbba40 x28: ffff082819a40008 x27: ffff082810c37c08 [ 441.717930] x26: ffff082810c37c28 x25: ffff082819a40290 x24: ffff082810c37c00 [ 441.725054] x23: 0000000000000000 x22: 0000000000000001 x21: ffff082819a40000 [ 441.732179] x20: ffff082819a40290 x19: 0000000000000000 x18: 0000000000000020 [ 441.739304] x17: 0000000000000000 x16: ffffb5dad6bda690 x15: 00000000ffffffff [ 441.746428] x14: ffff082814c3b26c x13: 00000000ffffffff x12: ffff082814c3b26a [ 441.753553] x11: 00000000000000c0 x10: 000000000000003a x9 : ffffb5dad5ea94f4 [ 441.760678] x8 : 000000000000003a x7 : ffff80009abbbab0 x6 : 0000000000000030 [ 441.767802] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 441.774926] x2 : ffff08280f35a300 x1 : ffffb5dad7127180 x0 : 0000000000000000 [ 441.782053] Call trace: [ 441.784488] sas_find_dev_by_rphy+0x44/0x118 (P) [ 441.789095] sas_target_alloc+0x24/0xb0 [ 441.792920] scsi_alloc_target+0x290/0x330 [ 441.797010] __scsi_scan_target+0x88/0x258 [ 441.801096] scsi_scan_channel+0x74/0xb8 [ 441.805008] scsi_scan_host_selected+0x170/0x188 [ 441.809615] sas_user_scan+0xfc/0x148 [ 441.813267] store_scan+0x10c/0x180 [ 441.816743] dev_attr_store+0x20/0x40 [ 441.820398] sysfs_kf_write+0x84/0xa8 [ 441.824054] kernfs_fop_write_iter+0x130/0x1c8 [ 441.828487] vfs_write+0x2c0/0x370 [ 441.831880] ksys_write+0x74/0x118 [ 441.835271] __arm64_sys_write+0x24/0x38 [ 441.839182] invoke_syscall+0x50/0x120 [ 441.842919] el0_svc_common.constprop.0+0xc8/0xf0 [ 441.847611] do_el0_svc+0x24/0x38 [ 441.850913] el0_svc+0x38/0x158 [ 441.854043] el0t_64_sync_handler+0xa0/0xe8 [ 441.858214] el0t_64_sync+0x1ac/0x1b0 [ 441.861865] Code: aa1303e0 97ff70a8 34ffff80 d10a4273 (f9445a75) [ 441.867946] ---[ end trace 0000000000000000 ]--- Therefore, set max_channel to 0. Fixes: e21fe3a52692 ("scsi: hisi_sas: add initialisation for v3 pci-based controller") Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Yihang Li <liyihang9@huawei.com> Link: https://patch.msgid.link/20260305064039.4096775-1-liyihang9@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19scsi: qla2xxx: Completely fix fcport double freeVladimir Riabchun1-2/+0
[ Upstream commit c0b7da13a04bd70ef6070bfb9ea85f582294560a ] In qla24xx_els_dcmd_iocb() sp->free is set to qla2x00_els_dcmd_sp_free(). When an error happens, this function is called by qla2x00_sp_release(), when kref_put() releases the first and the last reference. qla2x00_els_dcmd_sp_free() frees fcport by calling qla2x00_free_fcport(). Doing it one more time after kref_put() is a bad idea. Fixes: 82f522ae0d97 ("scsi: qla2xxx: Fix double free of fcport") Fixes: 4895009c4bb7 ("scsi: qla2xxx: Prevent command send on chip reset") Signed-off-by: Vladimir Riabchun <ferr.lambarginio@gmail.com> Signed-off-by: Farhat Abbas <fabbas@cloudlinux.com> Link: https://patch.msgid.link/aYsDln9NFQQsPDgg@vova-pc Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19scsi: ses: Fix devices attaching to different hostsTomas Henzl1-3/+2
[ Upstream commit 70ca8caa96ce473647054f5c7b9dab5423902402 ] On a multipath SAS system some devices don't end up with correct symlinks from the SCSI device to its enclosure. Some devices even have enclosure links pointing to enclosures attached to different SCSI hosts. ses_match_to_enclosure() calls enclosure_for_each_device() which iterates over all enclosures on the system, not just enclosures attached to the current SCSI host. Replace the iteration with a direct call to ses_enclosure_find_by_addr(). Reviewed-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Tomas Henzl <thenzl@redhat.com> Link: https://patch.msgid.link/20260210191850.36784-1-thenzl@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19scsi: mpi3mr: Add NULL checks when resetting request and reply queuesRanjan Kumar1-15/+19
[ Upstream commit fa96392ebebc8fade2b878acb14cce0f71016503 ] The driver encountered a crash during resource cleanup when the reply and request queues were NULL due to freed memory. This issue occurred when the creation of reply or request queues failed, and the driver freed the memory first, but attempted to mem set the content of the freed memory, leading to a system crash. Add NULL pointer checks for reply and request queues before accessing the reply/request memory during cleanup Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://patch.msgid.link/20260212070026.30263-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19scsi: storvsc: Fix scheduling while atomic on PREEMPT_RTJan Kiszka1-2/+3
[ Upstream commit 57297736c08233987e5d29ce6584c6ca2a831b12 ] This resolves the follow splat and lock-up when running with PREEMPT_RT enabled on Hyper-V: [ 415.140818] BUG: scheduling while atomic: stress-ng-iomix/1048/0x00000002 [ 415.140822] INFO: lockdep is turned off. [ 415.140823] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec ghash_clmulni_intel aesni_intel rapl binfmt_misc nls_ascii nls_cp437 vfat fat snd_pcm hyperv_drm snd_timer drm_client_lib drm_shmem_helper snd sg soundcore drm_kms_helper pcspkr hv_balloon hv_utils evdev joydev drm configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vsock vmw_vmci efivarfs autofs4 ext4 crc16 mbcache jbd2 sr_mod sd_mod cdrom hv_storvsc serio_raw hid_generic scsi_transport_fc hid_hyperv scsi_mod hid hv_netvsc hyperv_keyboard scsi_common [ 415.140846] Preemption disabled at: [ 415.140847] [<ffffffffc0656171>] storvsc_queuecommand+0x2e1/0xbe0 [hv_storvsc] [ 415.140854] CPU: 8 UID: 0 PID: 1048 Comm: stress-ng-iomix Not tainted 6.19.0-rc7 #30 PREEMPT_{RT,(full)} [ 415.140856] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/04/2024 [ 415.140857] Call Trace: [ 415.140861] <TASK> [ 415.140861] ? storvsc_queuecommand+0x2e1/0xbe0 [hv_storvsc] [ 415.140863] dump_stack_lvl+0x91/0xb0 [ 415.140870] __schedule_bug+0x9c/0xc0 [ 415.140875] __schedule+0xdf6/0x1300 [ 415.140877] ? rtlock_slowlock_locked+0x56c/0x1980 [ 415.140879] ? rcu_is_watching+0x12/0x60 [ 415.140883] schedule_rtlock+0x21/0x40 [ 415.140885] rtlock_slowlock_locked+0x502/0x1980 [ 415.140891] rt_spin_lock+0x89/0x1e0 [ 415.140893] hv_ringbuffer_write+0x87/0x2a0 [ 415.140899] vmbus_sendpacket_mpb_desc+0xb6/0xe0 [ 415.140900] ? rcu_is_watching+0x12/0x60 [ 415.140902] storvsc_queuecommand+0x669/0xbe0 [hv_storvsc] [ 415.140904] ? HARDIRQ_verbose+0x10/0x10 [ 415.140908] ? __rq_qos_issue+0x28/0x40 [ 415.140911] scsi_queue_rq+0x760/0xd80 [scsi_mod] [ 415.140926] __blk_mq_issue_directly+0x4a/0xc0 [ 415.140928] blk_mq_issue_direct+0x87/0x2b0 [ 415.140931] blk_mq_dispatch_queue_requests+0x120/0x440 [ 415.140933] blk_mq_flush_plug_list+0x7a/0x1a0 [ 415.140935] __blk_flush_plug+0xf4/0x150 [ 415.140940] __submit_bio+0x2b2/0x5c0 [ 415.140944] ? submit_bio_noacct_nocheck+0x272/0x360 [ 415.140946] submit_bio_noacct_nocheck+0x272/0x360 [ 415.140951] ext4_read_bh_lock+0x3e/0x60 [ext4] [ 415.140995] ext4_block_write_begin+0x396/0x650 [ext4] [ 415.141018] ? __pfx_ext4_da_get_block_prep+0x10/0x10 [ext4] [ 415.141038] ext4_da_write_begin+0x1c4/0x350 [ext4] [ 415.141060] generic_perform_write+0x14e/0x2c0 [ 415.141065] ext4_buffered_write_iter+0x6b/0x120 [ext4] [ 415.141083] vfs_write+0x2ca/0x570 [ 415.141087] ksys_write+0x76/0xf0 [ 415.141089] do_syscall_64+0x99/0x1490 [ 415.141093] ? rcu_is_watching+0x12/0x60 [ 415.141095] ? finish_task_switch.isra.0+0xdf/0x3d0 [ 415.141097] ? rcu_is_watching+0x12/0x60 [ 415.141098] ? lock_release+0x1f0/0x2a0 [ 415.141100] ? rcu_is_watching+0x12/0x60 [ 415.141101] ? finish_task_switch.isra.0+0xe4/0x3d0 [ 415.141103] ? rcu_is_watching+0x12/0x60 [ 415.141104] ? __schedule+0xb34/0x1300 [ 415.141106] ? hrtimer_try_to_cancel+0x1d/0x170 [ 415.141109] ? do_nanosleep+0x8b/0x160 [ 415.141111] ? hrtimer_nanosleep+0x89/0x100 [ 415.141114] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 415.141116] ? xfd_validate_state+0x26/0x90 [ 415.141118] ? rcu_is_watching+0x12/0x60 [ 415.141120] ? do_syscall_64+0x1e0/0x1490 [ 415.141121] ? do_syscall_64+0x1e0/0x1490 [ 415.141123] ? rcu_is_watching+0x12/0x60 [ 415.141124] ? do_syscall_64+0x1e0/0x1490 [ 415.141125] ? do_syscall_64+0x1e0/0x1490 [ 415.141127] ? irqentry_exit+0x140/0x7e0 [ 415.141129] entry_SYSCALL_64_after_hwframe+0x76/0x7e get_cpu() disables preemption while the spinlock hv_ringbuffer_write is using is converted to an rt-mutex under PREEMPT_RT. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Tested-by: Florian Bezdeka <florian.bezdeka@siemens.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/0c7fb5cd-fb21-4760-8593-e04bade84744@siemens.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12scsi: core: Fix refcount leak for tagset_refcntJunxiao Bi1-0/+1
commit 1ac22c8eae81366101597d48360718dff9b9d980 upstream. This leak will cause a hang when tearing down the SCSI host. For example, iscsid hangs with the following call trace: [130120.652718] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured PID: 2528 TASK: ffff9d0408974e00 CPU: 3 COMMAND: "iscsid" #0 [ffffb5b9c134b9e0] __schedule at ffffffff860657d4 #1 [ffffb5b9c134ba28] schedule at ffffffff86065c6f #2 [ffffb5b9c134ba40] schedule_timeout at ffffffff86069fb0 #3 [ffffb5b9c134bab0] __wait_for_common at ffffffff8606674f #4 [ffffb5b9c134bb10] scsi_remove_host at ffffffff85bfe84b #5 [ffffb5b9c134bb30] iscsi_sw_tcp_session_destroy at ffffffffc03031c4 [iscsi_tcp] #6 [ffffb5b9c134bb48] iscsi_if_recv_msg at ffffffffc0292692 [scsi_transport_iscsi] #7 [ffffb5b9c134bb98] iscsi_if_rx at ffffffffc02929c2 [scsi_transport_iscsi] #8 [ffffb5b9c134bbf0] netlink_unicast at ffffffff85e551d6 #9 [ffffb5b9c134bc38] netlink_sendmsg at ffffffff85e554ef Fixes: 8fe4ce5836e9 ("scsi: core: Fix a use-after-free") Cc: stable@vger.kernel.org Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260223232728.93350-1-junxiao.bi@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-12scsi: pm8001: Fix use-after-free in pm8001_queue_command()Salomon Dushimirimana1-2/+3
[ Upstream commit 38353c26db28efd984f51d426eac2396d299cca7 ] Commit e29c47fe8946 ("scsi: pm8001: Simplify pm8001_task_exec()") refactors pm8001_queue_command(), however it introduces a potential cause of a double free scenario when it changes the function to return -ENODEV in case of phy down/device gone state. In this path, pm8001_queue_command() updates task status and calls task_done to indicate to upper layer that the task has been handled. However, this also frees the underlying SAS task. A -ENODEV is then returned to the caller. When libsas sas_ata_qc_issue() receives this error value, it assumes the task wasn't handled/queued by LLDD and proceeds to clean up and free the task again, resulting in a double free. Since pm8001_queue_command() handles the SAS task in this case, it should return 0 to the caller indicating that the task has been handled. Fixes: e29c47fe8946 ("scsi: pm8001: Simplify pm8001_task_exec()") Signed-off-by: Salomon Dushimirimana <salomondush@google.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://patch.msgid.link/20260213192806.439432-1-salomondush@google.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12scsi: lpfc: Properly set WC for DPP mappingMathias Krause3-6/+35
[ Upstream commit bffda93a51b40afd67c11bf558dc5aae83ca0943 ] Using set_memory_wc() to enable write-combining for the DPP portion of the MMIO mapping is wrong as set_memory_*() is meant to operate on RAM only, not MMIO mappings. In fact, as used currently triggers a BUG_ON() with enabled CONFIG_DEBUG_VIRTUAL. Simply map the DPP region separately and in addition to the already existing mappings, avoiding any possible negative side effects for these. Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4") Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Reviewed-by: Mathias Krause <minipli@grsecurity.net> Link: https://patch.msgid.link/20260212192327.141104-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04scsi: buslogic: Reduce stack usageArnd Bergmann1-2/+4
[ Upstream commit e17f0d4cc006265dd92129db4bf9da3a2e4a4f66 ] Some randconfig builds run into excessive stack usage with gcc-14 or higher, which use __attribute__((cold)) where earlier versions did not do that: drivers/scsi/BusLogic.c: In function 'blogic_init': drivers/scsi/BusLogic.c:2398:1: error: the frame size of 1680 bytes is larger than 1536 bytes [-Werror=frame-larger-than=] The problem is that a lot of code gets inlined into blogic_init() here. Two functions stick out, but they are a bit different: - blogic_init_probeinfo_list() actually uses a few hundred bytes of kernel stack, which is a problem in combination with other functions that also do. Marking this one as noinline means that the stack slots get get reused between function calls - blogic_reportconfig() has a few large variables, but whenever it is not inlined into its caller, the compiler is actually smart enough to reuse stack slots for these automatically, so marking it as noinline saves most of the stack space by itself. The combination of both of these should avoid the problem entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260203163321.2598593-1-arnd@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-27scsi: csiostor: Fix dereference of null pointer rnColin Ian King1-1/+2
[ Upstream commit 1982257570b84dc33753d536dd969fd357a014e9 ] The error exit path when rn is NULL ends up deferencing the null pointer rn via the use of the macro CSIO_INC_STATS. Fix this by adding a new error return path label after the use of the macro to avoid the deference. Fixes: a3667aaed569 ("[SCSI] csiostor: Chelsio FCoE offload driver") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://patch.msgid.link/20260129155332.196338-1-colin.i.king@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-27scsi: smartpqi: Fix memory leak in pqi_report_phys_luns()Zilin Guan1-3/+10
[ Upstream commit 41b37312bd9722af77ec7817ccf22d7a4880c289 ] pqi_report_phys_luns() fails to release the rpl_list buffer when encountering an unsupported data format or when the allocation for rpl_16byte_wwid_list fails. These early returns bypass the cleanup logic, leading to memory leaks. Consolidate the error handling by adding an out_free_rpl_list label and use goto statements to ensure rpl_list is consistently freed on failure. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: 28ca6d876c5a ("scsi: smartpqi: Add extended report physical LUNs") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Tested-by: Don Brace <don.brace@microchip.com> Acked-by: Don Brace <don.brace@microchip.com> Link: https://patch.msgid.link/20260131093641.1008117-1-zilin@seu.edu.cn Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-27scsi: efct: Use IRQF_ONESHOT and default primary handlerSebastian Andrzej Siewior1-7/+1
[ Upstream commit bd81f07e9a27c341cd7e72be95eb0b7cf3910926 ] There is no added value in efct_intr_msix() compared to irq_default_primary_handler(). Using a threaded interrupt without a dedicated primary handler mandates the IRQF_ONESHOT flag to mask the interrupt source while the threaded handler is active. Otherwise the interrupt can fire again before the threaded handler had a chance to run. Use the default primary interrupt handler by specifying NULL and set IRQF_ONESHOT so the interrupt source is masked until the secondary handler is done. Fixes: 4df84e8466242 ("scsi: elx: efct: Driver initialization routines") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260128095540.863589-8-bigeasy@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-19scsi: qla2xxx: Fix bsg_done() causing double freeAnil Gurumurthy1-11/+17
commit c2c68225b1456f4d0d393b5a8778d51bb0d5b1d0 upstream. Kernel panic observed on system, [5353358.825191] BUG: unable to handle page fault for address: ff5f5e897b024000 [5353358.825194] #PF: supervisor write access in kernel mode [5353358.825195] #PF: error_code(0x0002) - not-present page [5353358.825196] PGD 100006067 P4D 0 [5353358.825198] Oops: 0002 [#1] PREEMPT SMP NOPTI [5353358.825200] CPU: 5 PID: 2132085 Comm: qlafwupdate.sub Kdump: loaded Tainted: G W L ------- --- 5.14.0-503.34.1.el9_5.x86_64 #1 [5353358.825203] Hardware name: HPE ProLiant DL360 Gen11/ProLiant DL360 Gen11, BIOS 2.44 01/17/2025 [5353358.825204] RIP: 0010:memcpy_erms+0x6/0x10 [5353358.825211] RSP: 0018:ff591da8f4f6b710 EFLAGS: 00010246 [5353358.825212] RAX: ff5f5e897b024000 RBX: 0000000000007090 RCX: 0000000000001000 [5353358.825213] RDX: 0000000000001000 RSI: ff591da8f4fed090 RDI: ff5f5e897b024000 [5353358.825214] RBP: 0000000000010000 R08: ff5f5e897b024000 R09: 0000000000000000 [5353358.825215] R10: ff46cf8c40517000 R11: 0000000000000001 R12: 0000000000008090 [5353358.825216] R13: ff591da8f4f6b720 R14: 0000000000001000 R15: 0000000000000000 [5353358.825218] FS: 00007f1e88d47740(0000) GS:ff46cf935f940000(0000) knlGS:0000000000000000 [5353358.825219] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [5353358.825220] CR2: ff5f5e897b024000 CR3: 0000000231532004 CR4: 0000000000771ef0 [5353358.825221] PKRU: 55555554 [5353358.825222] Call Trace: [5353358.825223] <TASK> [5353358.825224] ? show_trace_log_lvl+0x1c4/0x2df [5353358.825229] ? show_trace_log_lvl+0x1c4/0x2df [5353358.825232] ? sg_copy_buffer+0xc8/0x110 [5353358.825236] ? __die_body.cold+0x8/0xd [5353358.825238] ? page_fault_oops+0x134/0x170 [5353358.825242] ? kernelmode_fixup_or_oops+0x84/0x110 [5353358.825244] ? exc_page_fault+0xa8/0x150 [5353358.825247] ? asm_exc_page_fault+0x22/0x30 [5353358.825252] ? memcpy_erms+0x6/0x10 [5353358.825253] sg_copy_buffer+0xc8/0x110 [5353358.825259] qla2x00_process_vendor_specific+0x652/0x1320 [qla2xxx] [5353358.825317] qla24xx_bsg_request+0x1b2/0x2d0 [qla2xxx] Most routines in qla_bsg.c call bsg_done() only for success cases. However a few invoke it for failure case as well leading to a double free. Validate before calling bsg_done(). Cc: stable@vger.kernel.org Signed-off-by: Anil Gurumurthy <agurumurthy@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251210101604.431868-12-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-16scsi: qla2xxx: Query FW again before proceeding with loginAnil Gurumurthy2-4/+34
commit 42b2dab4340d39b71334151e10c6d7d9b0040ffa upstream. Issue occurred during a continuous reboot test of several thousand iterations specific to a fabric topo with dual mode target where it sends a PLOGI/PRLI and then sends a LOGO. The initiator was also in the process of discovery and sent a PLOGI to the switch. It then queried a list of ports logged in via mbx 75h and the GPDB response indicated that the target was logged in. This caused a mismatch in the states between the driver and FW. Requery the FW for the state and proceed with the rest of discovery process. Fixes: a4239945b8ad ("scsi: qla2xxx: Add switch command to simplify fabric discovery") Cc: stable@vger.kernel.org Signed-off-by: Anil Gurumurthy <agurumurthy@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251210101604.431868-11-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-16scsi: qla2xxx: Free sp in error path to fix system crashAnil Gurumurthy1-2/+2
commit 7adbd2b7809066c75f0433e5e2a8e114b429f30f upstream. System crash seen during load/unload test in a loop, [61110.449331] qla2xxx [0000:27:00.0]-0042:0: Disabled MSI-X. [61110.467494] ============================================================================= [61110.467498] BUG qla2xxx_srbs (Tainted: G OE -------- --- ): Objects remaining in qla2xxx_srbs on __kmem_cache_shutdown() [61110.467501] ----------------------------------------------------------------------------- [61110.467502] Slab 0x000000000ffc8162 objects=51 used=1 fp=0x00000000e25d3d85 flags=0x57ffffc0010200(slab|head|node=1|zone=2|lastcpupid=0x1fffff) [61110.467509] CPU: 53 PID: 455206 Comm: rmmod Kdump: loaded Tainted: G OE -------- --- 5.14.0-284.11.1.el9_2.x86_64 #1 [61110.467513] Hardware name: HPE ProLiant DL385 Gen10 Plus v2/ProLiant DL385 Gen10 Plus v2, BIOS A42 08/17/2023 [61110.467515] Call Trace: [61110.467516] <TASK> [61110.467519] dump_stack_lvl+0x34/0x48 [61110.467526] slab_err.cold+0x53/0x67 [61110.467534] __kmem_cache_shutdown+0x16e/0x320 [61110.467540] kmem_cache_destroy+0x51/0x160 [61110.467544] qla2x00_module_exit+0x93/0x99 [qla2xxx] [61110.467607] ? __do_sys_delete_module.constprop.0+0x178/0x280 [61110.467613] ? syscall_trace_enter.constprop.0+0x145/0x1d0 [61110.467616] ? do_syscall_64+0x5c/0x90 [61110.467619] ? exc_page_fault+0x62/0x150 [61110.467622] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [61110.467626] </TASK> [61110.467627] Disabling lock debugging due to kernel taint [61110.467635] Object 0x0000000026f7e6e6 @offset=16000 [61110.467639] ------------[ cut here ]------------ [61110.467639] kmem_cache_destroy qla2xxx_srbs: Slab cache still has objects when called from qla2x00_module_exit+0x93/0x99 [qla2xxx] [61110.467659] WARNING: CPU: 53 PID: 455206 at mm/slab_common.c:520 kmem_cache_destroy+0x14d/0x160 [61110.467718] CPU: 53 PID: 455206 Comm: rmmod Kdump: loaded Tainted: G B OE -------- --- 5.14.0-284.11.1.el9_2.x86_64 #1 [61110.467720] Hardware name: HPE ProLiant DL385 Gen10 Plus v2/ProLiant DL385 Gen10 Plus v2, BIOS A42 08/17/2023 [61110.467721] RIP: 0010:kmem_cache_destroy+0x14d/0x160 [61110.467724] Code: 99 7d 07 00 48 89 ef e8 e1 6a 07 00 eb b3 48 8b 55 60 48 8b 4c 24 20 48 c7 c6 70 fc 66 90 48 c7 c7 f8 ef a1 90 e8 e1 ed 7c 00 <0f> 0b eb 93 c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 [61110.467725] RSP: 0018:ffffa304e489fe80 EFLAGS: 00010282 [61110.467727] RAX: 0000000000000000 RBX: ffffffffc0d9a860 RCX: 0000000000000027 [61110.467729] RDX: ffff8fd5ff9598a8 RSI: 0000000000000001 RDI: ffff8fd5ff9598a0 [61110.467730] RBP: ffff8fb6aaf78700 R08: 0000000000000000 R09: 0000000100d863b7 [61110.467731] R10: ffffa304e489fd20 R11: ffffffff913bef48 R12: 0000000040002000 [61110.467731] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [61110.467733] FS: 00007f64c89fb740(0000) GS:ffff8fd5ff940000(0000) knlGS:0000000000000000 [61110.467734] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [61110.467735] CR2: 00007f0f02bfe000 CR3: 00000020ad6dc005 CR4: 0000000000770ee0 [61110.467736] PKRU: 55555554 [61110.467737] Call Trace: [61110.467738] <TASK> [61110.467739] qla2x00_module_exit+0x93/0x99 [qla2xxx] [61110.467755] ? __do_sys_delete_module.constprop.0+0x178/0x280 Free sp in the error path to fix the crash. Fixes: f352eeb75419 ("scsi: qla2xxx: Add ability to use GPNFT/GNNFT for RSCN handling") Cc: stable@vger.kernel.org Signed-off-by: Anil Gurumurthy <agurumurthy@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251210101604.431868-9-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-16scsi: qla2xxx: Delay module unload while fabric scan in progressAnil Gurumurthy1-1/+2
commit 8890bf450e0b6b283f48ac619fca5ac2f14ddd62 upstream. System crash seen during load/unload test in a loop. [105954.384919] RBP: ffff914589838dc0 R08: 0000000000000000 R09: 0000000000000086 [105954.384920] R10: 000000000000000f R11: ffffa31240904be5 R12: ffff914605f868e0 [105954.384921] R13: ffff914605f86910 R14: 0000000000008010 R15: 00000000ddb7c000 [105954.384923] FS: 0000000000000000(0000) GS:ffff9163fec40000(0000) knlGS:0000000000000000 [105954.384925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [105954.384926] CR2: 000055d31ce1d6a0 CR3: 0000000119f5e001 CR4: 0000000000770ee0 [105954.384928] PKRU: 55555554 [105954.384929] Call Trace: [105954.384931] <IRQ> [105954.384934] qla24xx_sp_unmap+0x1f3/0x2a0 [qla2xxx] [105954.384962] ? qla_async_scan_sp_done+0x114/0x1f0 [qla2xxx] [105954.384980] ? qla24xx_els_ct_entry+0x4de/0x760 [qla2xxx] [105954.384999] ? __wake_up_common+0x80/0x190 [105954.385004] ? qla24xx_process_response_queue+0xc2/0xaa0 [qla2xxx] [105954.385023] ? qla24xx_msix_rsp_q+0x44/0xb0 [qla2xxx] [105954.385040] ? __handle_irq_event_percpu+0x3d/0x190 [105954.385044] ? handle_irq_event+0x58/0xb0 [105954.385046] ? handle_edge_irq+0x93/0x240 [105954.385050] ? __common_interrupt+0x41/0xa0 [105954.385055] ? common_interrupt+0x3e/0xa0 [105954.385060] ? asm_common_interrupt+0x22/0x40 The root cause of this was that there was a free (dma_free_attrs) in the interrupt context. There was a device discovery/fabric scan in progress. A module unload was issued which set the UNLOADING flag. As part of the discovery, after receiving an interrupt a work queue was scheduled (which involved a work to be queued). Since the UNLOADING flag is set, the work item was not allocated and the mapped memory had to be freed. The free occurred in interrupt context leading to system crash. Delay the driver unload until the fabric scan is complete to avoid the crash. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/202512090414.07Waorz0-lkp@intel.com/ Fixes: 783e0dc4f66a ("qla2xxx: Check for device state before unloading the driver.") Cc: stable@vger.kernel.org Signed-off-by: Anil Gurumurthy <agurumurthy@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251210101604.431868-8-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-16scsi: qla2xxx: Allow recovery for tape devicesShreyas Deodhar2-12/+0
commit b0335ee4fb94832a4ef68774ca7e7b33b473c7a6 upstream. Tape device doesn't show up after RSCNs. To fix this, remove tape device specific checks which allows recovery of tape devices. Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target") Cc: stable@vger.kernel.org Signed-off-by: Shreyas Deodhar <sdeodhar@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251210101604.431868-7-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-16scsi: qla2xxx: Validate sp before freeing associated memoryAnil Gurumurthy1-16/+18
commit b6df15aec8c3441357d4da0eaf4339eb20f5999f upstream. System crash with the following signature [154563.214890] nvme nvme2: NVME-FC{1}: controller connect complete [154564.169363] qla2xxx [0000:b0:00.1]-3002:2: nvme: Sched: Set ZIO exchange threshold to 3. [154564.169405] qla2xxx [0000:b0:00.1]-ffffff:2: SET ZIO Activity exchange threshold to 5. [154565.539974] qla2xxx [0000:b0:00.1]-5013:2: RSCN database changed – 0078 0080 0000. [154565.545744] qla2xxx [0000:b0:00.1]-5013:2: RSCN database changed – 0078 00a0 0000. [154565.545857] qla2xxx [0000:b0:00.1]-11a2:2: FEC=enabled (data rate). [154565.552760] qla2xxx [0000:b0:00.1]-11a2:2: FEC=enabled (data rate). [154565.553079] BUG: kernel NULL pointer dereference, address: 00000000000000f8 [154565.553080] #PF: supervisor read access in kernel mode [154565.553082] #PF: error_code(0x0000) - not-present page [154565.553084] PGD 80000010488ab067 P4D 80000010488ab067 PUD 104978a067 PMD 0 [154565.553089] Oops: 0000 1 PREEMPT SMP PTI [154565.553092] CPU: 10 PID: 858 Comm: qla2xxx_2_dpc Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.11.1.el9_5.x86_64 #1 [154565.553096] Hardware name: HPE Synergy 660 Gen10/Synergy 660 Gen10 Compute Module, BIOS I43 09/30/2024 [154565.553097] RIP: 0010:qla_fab_async_scan.part.0+0x40b/0x870 [qla2xxx] [154565.553141] Code: 00 00 e8 58 a3 ec d4 49 89 e9 ba 12 20 00 00 4c 89 e6 49 c7 c0 00 ee a8 c0 48 c7 c1 66 c0 a9 c0 bf 00 80 00 10 e8 15 69 00 00 <4c> 8b 8d f8 00 00 00 4d 85 c9 74 35 49 8b 84 24 00 19 00 00 48 8b [154565.553143] RSP: 0018:ffffb4dbc8aebdd0 EFLAGS: 00010286 [154565.553145] RAX: 0000000000000000 RBX: ffff8ec2cf0908d0 RCX: 0000000000000002 [154565.553147] RDX: 0000000000000000 RSI: ffffffffc0a9c896 RDI: ffffb4dbc8aebd47 [154565.553148] RBP: 0000000000000000 R08: ffffb4dbc8aebd45 R09: 0000000000ffff0a [154565.553150] R10: 0000000000000000 R11: 000000000000000f R12: ffff8ec2cf0908d0 [154565.553151] R13: ffff8ec2cf090900 R14: 0000000000000102 R15: ffff8ec2cf084000 [154565.553152] FS: 0000000000000000(0000) GS:ffff8ed27f800000(0000) knlGS:0000000000000000 [154565.553154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [154565.553155] CR2: 00000000000000f8 CR3: 000000113ae0a005 CR4: 00000000007706f0 [154565.553157] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [154565.553158] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [154565.553159] PKRU: 55555554 [154565.553160] Call Trace: [154565.553162] <TASK> [154565.553165] ? show_trace_log_lvl+0x1c4/0x2df [154565.553172] ? show_trace_log_lvl+0x1c4/0x2df [154565.553177] ? qla_fab_async_scan.part.0+0x40b/0x870 [qla2xxx] [154565.553215] ? __die_body.cold+0x8/0xd [154565.553218] ? page_fault_oops+0x134/0x170 [154565.553223] ? snprintf+0x49/0x70 [154565.553229] ? exc_page_fault+0x62/0x150 [154565.553238] ? asm_exc_page_fault+0x22/0x30 Check for sp being non NULL before freeing any associated memory Fixes: a4239945b8ad ("scsi: qla2xxx: Add switch command to simplify fabric discovery") Cc: stable@vger.kernel.org Signed-off-by: Anil Gurumurthy <agurumurthy@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20251210101604.431868-10-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-24scsi: be2iscsi: Fix a memory leak in beiscsi_boot_get_sinfo()Haoxiang Li1-0/+1
If nonemb_cmd->va fails to be allocated, free the allocation previously made by alloc_mcc_wrb(). Fixes: 50a4b824be9e ("scsi: be2iscsi: Fix to make boot discovery non-blocking") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn> Link: https://patch.msgid.link/20251213083643.301240-1-lihaoxiang@isrc.iscas.ac.cn Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-01-24scsi: qla2xxx: edif: Fix dma_free_coherent() sizeThomas Fourier1-1/+1
Earlier in the function, the ha->flt buffer is allocated with size sizeof(struct qla_flt_header) + FLT_REGIONS_SIZE but freed in the error path with size SFP_DEV_SIZE. Fixes: 84318a9f01ce ("scsi: qla2xxx: edif: Add send, receive, and accept for auth_els") Cc: stable@vger.kernel.org Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Link: https://patch.msgid.link/20260112134326.55466-2-fourier.thomas@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-01-17scsi: qla2xxx: Sanitize payload size to prevent member overflowJiasheng Jiang1-0/+7
In qla27xx_copy_fpin_pkt() and qla27xx_copy_multiple_pkt(), the frame_size reported by firmware is used to calculate the copy length into item->iocb. However, the iocb member is defined as a fixed-size 64-byte array within struct purex_item. If the reported frame_size exceeds 64 bytes, subsequent memcpy calls will overflow the iocb member boundary. While extra memory might be allocated, this cross-member write is unsafe and triggers warnings under CONFIG_FORTIFY_SOURCE. Fix this by capping total_bytes to the size of the iocb member (64 bytes) before allocation and copying. This ensures all copies remain within the bounds of the destination structure member. Fixes: 875386b98857 ("scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe") Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com> Link: https://patch.msgid.link/20260106205344.18031-1-jiashengjiangcool@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-01-17scsi: core: Wake up the error handler when final completions race against ↵David Jeffery2-1/+18
each other The fragile ordering between marking commands completed or failed so that the error handler only wakes when the last running command completes or times out has race conditions. These race conditions can cause the SCSI layer to fail to wake the error handler, leaving I/O through the SCSI host stuck as the error state cannot advance. First, there is an memory ordering issue within scsi_dec_host_busy(). The write which clears SCMD_STATE_INFLIGHT may be reordered with reads counting in scsi_host_busy(). While the local CPU will see its own write, reordering can allow other CPUs in scsi_dec_host_busy() or scsi_eh_inc_host_failed() to see a raised busy count, causing no CPU to see a host busy equal to the host_failed count. This race condition can be prevented with a memory barrier on the error path to force the write to be visible before counting host busy commands. Second, there is a general ordering issue with scsi_eh_inc_host_failed(). By counting busy commands before incrementing host_failed, it can race with a final command in scsi_dec_host_busy(), such that scsi_dec_host_busy() does not see host_failed incremented but scsi_eh_inc_host_failed() counts busy commands before SCMD_STATE_INFLIGHT is cleared by scsi_dec_host_busy(), resulting in neither waking the error handler task. This needs the call to scsi_host_busy() to be moved after host_failed is incremented to close the race condition. Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq") Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260113161036.6730-1-djeffery@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-01-17scsi: storvsc: Process unsupported MODE_SENSE_10Long Li1-1/+2
The Hyper-V host does not support MODE_SENSE_10 and MODE_SENSE. The driver handles MODE_SENSE as unsupported command, but not for MODE_SENSE_10. Add MODE_SENSE_10 to the same handling logic and return correct code to SCSI layer. Fixes: 89ae7d709357 ("Staging: hv: storvsc: Move the storage driver out of the staging area") Cc: stable@kernel.org Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20260117010302.294068-1-longli@linux.microsoft.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-01-04scsi: bfa: Update outdated commentJulia Lawall1-1/+1
The function bfa_lps_is_brcd_fabric() was eliminated, being a one-line function, in commit f7f73812e950 ("[SCSI] bfa: clean up one line functions"). Replace the call in the comment by its inlined counterpart, referring to the parameter of the subsequent function. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://patch.msgid.link/20251231165027.142443-1-Julia.Lawall@inria.fr Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-01-04scsi: core: Fix error handler encryption supportBrian Kao1-0/+24
Some low-level drivers (LLD) access block layer crypto fields, such as rq->crypt_keyslot and rq->crypt_ctx within `struct request`, to configure hardware for inline encryption. However, SCSI Error Handling (EH) commands (e.g., TEST UNIT READY, START STOP UNIT) should not involve any encryption setup. To prevent drivers from erroneously applying crypto settings during EH, this patch saves the original values of rq->crypt_keyslot and rq->crypt_ctx before an EH command is prepared via scsi_eh_prep_cmnd(). These fields in the 'struct request' are then set to NULL. The original values are restored in scsi_eh_restore_cmnd() after the EH command completes. This ensures that the block layer crypto context does not leak into EH command execution. Signed-off-by: Brian Kao <powenkao@google.com> Link: https://patch.msgid.link/20251218031726.2642834-1-powenkao@google.com Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-01-04scsi: core: Correct documentation for scsi_test_unit_ready()Miao Li1-1/+1
If scsi_test_unit_ready() returns zero, TEST UNIT READY was executed successfully. Signed-off-by: Miao Li <limiao@kylinos.cn> Link: https://patch.msgid.link/20251218023129.284307-1-limiao870622@163.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-17scsi: sg: Fix occasional bogus elapsed time that exceeds timeoutMichal Rábek1-7/+13
A race condition was found in sg_proc_debug_helper(). It was observed on a system using an IBM LTO-9 SAS Tape Drive (ULTRIUM-TD9) and monitoring /proc/scsi/sg/debug every second. A very large elapsed time would sometimes appear. This is caused by two race conditions. We reproduced the issue with an IBM ULTRIUM-HH9 tape drive on an x86_64 architecture. A patched kernel was built, and the race condition could not be observed anymore after the application of this patch. A reproducer C program utilising the scsi_debug module was also built by Changhui Zhong and can be viewed here: https://github.com/MichaelRabek/linux-tests/blob/master/drivers/scsi/sg/sg_race_trigger.c The first race happens between the reading of hp->duration in sg_proc_debug_helper() and request completion in sg_rq_end_io(). The hp->duration member variable may hold either of two types of information: #1 - The start time of the request. This value is present while the request is not yet finished. #2 - The total execution time of the request (end_time - start_time). If sg_proc_debug_helper() executes *after* the value of hp->duration was changed from #1 to #2, but *before* srp->done is set to 1 in sg_rq_end_io(), a fresh timestamp is taken in the else branch, and the elapsed time (value type #2) is subtracted from a timestamp, which cannot yield a valid elapsed time (which is a type #2 value as well). To fix this issue, the value of hp->duration must change under the protection of the sfp->rq_list_lock in sg_rq_end_io(). Since sg_proc_debug_helper() takes this read lock, the change to srp->done and srp->header.duration will happen atomically from the perspective of sg_proc_debug_helper() and the race condition is thus eliminated. The second race condition happens between sg_proc_debug_helper() and sg_new_write(). Even though hp->duration is set to the current time stamp in sg_add_request() under the write lock's protection, it gets overwritten by a call to get_sg_io_hdr(), which calls copy_from_user() to copy struct sg_io_hdr from userspace into kernel space. hp->duration is set to the start time again in sg_common_write(). If sg_proc_debug_helper() is called between these two calls, an arbitrary value set by userspace (usually zero) is used to compute the elapsed time. To fix this issue, hp->duration must be set to the current timestamp again after get_sg_io_hdr() returns successfully. A small race window still exists between get_sg_io_hdr() and setting hp->duration, but this window is only a few instructions wide and does not result in observable issues in practice, as confirmed by testing. Additionally, we fix the format specifier from %d to %u for printing unsigned int values in sg_proc_debug_helper(). Signed-off-by: Michal Rábek <mrabek@redhat.com> Suggested-by: Tomas Henzl <thenzl@redhat.com> Tested-by: Changhui Zhong <czhong@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Link: https://patch.msgid.link/20251212160900.64924-1-mrabek@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-17scsi: mpi3mr: Read missing IOCFacts flag for reply queue full overflowChandrakanth Patil2-0/+3
The driver was not reading the MAX_REQ_PER_REPLY_QUEUE_LIMIT IOCFacts flag, so the reply-queue-full handling was never enabled, even on firmware that supports it. Reading this flag enables the feature and prevents reply queue overflow. Fixes: f08b24d82749 ("scsi: mpi3mr: Avoid reply queue full condition") Cc: stable@vger.kernel.org Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com> Link: https://patch.msgid.link/20251211002929.22071-1-chandrakanth.patil@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-17scsi: scsi_debug: Fix atomic write enable module param descriptionJohn Garry1-1/+1
The atomic write enable module param is "atomic_wr", and not "atomic_write", so fix the module param description. Fixes: 84f3a3c01d70 ("scsi: scsi_debug: Atomic write support") Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251211100651.9056-1-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-14Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds11-26/+102
Pull SCSI fixes from James Bottomley: "The only core fix is in doc; all the others are in drivers, with the biggest impacts in libsas being the rollback on error handling and in ufs coming from a couple of error handling fixes, one causing a crash if it's activated before scanning and the other fixing W-LUN resumption" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: qcom: Fix confusing cleanup.h syntax scsi: libsas: Add rollback handling when an error occurs scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name() scsi: ufs: core: Fix a deadlock in the frequency scaling code scsi: ufs: core: Fix an error handler crash scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed" scsi: ufs: core: Fix RPMB link error by reversing Kconfig dependencies scsi: qla4xxx: Use time conversion macros scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during reset scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset scsi: imm: Fix use-after-free bug caused by unfinished delayed work scsi: target: sbp: Remove KMSG_COMPONENT macro scsi: core: Correct documentation for scsi_device_quiesce() scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1 scsi: target: Reset t_task_cdb pointer in error case scsi: ufs: core: Fix EH failure after W-LUN resume error
2025-12-09scsi: libsas: Add rollback handling when an error occursChaohai Chen3-3/+32
In sas_register_phys(), if an error is triggered in the loop process, we need to roll back the resources that have already been requested. Add sas_unregister_phys() when an error occurs in sas_register_ha(). [mkp: a few coding style tweaks and address John's comment] Signed-off-by: Chaohai Chen <wdhh6@aliyun.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20251206060616.69216-1-wdhh6@aliyun.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-09scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name()Benjamin Marzinski1-3/+5
If scsi_dh_attached_handler_name() fails to allocate the handler name, dm-multipath (its only caller) assumes there is no attached device handler, and sets the device up incorrectly. Return an error pointer instead, so multipath can distinguish between failure, success where there is no attached device handler, or when the path device is not a SCSI device at all. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Link: https://patch.msgid.link/20251206010015.1595225-1-bmarzins@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-12-09Merge tag 'block-6.19-20251208' of ↵Linus Torvalds1-1/+11
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: "Followup set of fixes and updates for block for the 6.19 merge window. NVMe had some late minute debates which lead to dropping some patches from that tree, which is why the initial PR didn't have NVMe included. It's here now. This pull request contains: - NVMe pull request via Keith: - Subsystem usage cleanups (Max) - Endpoint device fixes (Shin'ichiro) - Debug statements (Gerd) - FC fabrics cleanups and fixes (Daniel) - Consistent alloc API usages (Israel) - Code comment updates (Chu) - Authentication retry fix (Justin) - Fix a memory leak in the discard ioctl code, if the task is being interrupted by a signal at just the wrong time - Zoned write plugging fixes - Add ioctls for for persistent reservations - Enable per-cpu bio caching by default - Various little fixes and tweaks" * tag 'block-6.19-20251208' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (27 commits) nvme-fabrics: add ENOKEY to no retry criteria for authentication failures nvme-auth: use kvfree() for memory allocated with kvcalloc() nvmet-tcp: use kvcalloc for commands array nvmet-rdma: use kvcalloc for commands and responses arrays nvme: fix typo error in nvme target nvmet-fc: use pr_* print macros instead of dev_* nvmet-fcloop: remove unused lsdir member. nvmet-fcloop: check all request and response have been processed nvme-fc: check all request and response have been processed block: fix memory leak in __blkdev_issue_zero_pages block: fix comment for op_is_zone_mgmt() to include RESET_ALL block: Clear BLK_ZONE_WPLUG_PLUGGED when aborting plugged BIOs blk-mq: Abort suspend when wakeup events are pending blk-mq: add blk_rq_nr_bvec() helper block: add IOC_PR_READ_RESERVATION ioctl block: add IOC_PR_READ_KEYS ioctl nvme: reject invalid pr_read_keys() num_keys values scsi: sd: reject invalid pr_read_keys() num_keys values block: enable per-cpu bio cache by default block: use bio_alloc_bioset for passthru IO by default ...
2025-12-06Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds55-659/+2506
Pull SCSI updates from James Bottomley: "Usual driver updates (ufs, lpfc, target, qla2xxx) plus assorted cleanups and fixes including the WQ_PERCPU series. The biggest core change is the new allocation of pseudo-devices which allow the sending of internal commands to a given SCSI target" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (147 commits) scsi: MAINTAINERS: Add the UFS include directory scsi: scsi_debug: Support injecting unaligned write errors scsi: qla2xxx: Fix improper freeing of purex item scsi: ufs: rockchip: Fix compile error without CONFIG_GPIOLIB scsi: ufs: rockchip: Reset controller on PRE_CHANGE of hce enable notify scsi: ufs: core: Use scsi_device_busy() scsi: ufs: core: Fix single doorbell mode support scsi: pm80xx: Add WQ_PERCPU to alloc_workqueue() users scsi: target: Add WQ_PERCPU to alloc_workqueue() users scsi: qedi: Add WQ_PERCPU to alloc_workqueue() users scsi: target: ibmvscsi: Add WQ_PERCPU to alloc_workqueue() users scsi: qedf: Add WQ_PERCPU to alloc_workqueue() users scsi: bnx2fc: Add WQ_PERCPU to alloc_workqueue() users scsi: be2iscsi: Add WQ_PERCPU to alloc_workqueue() users scsi: message: fusion: Add WQ_PERCPU to alloc_workqueue() users scsi: lpfc: WQ_PERCPU added to alloc_workqueue() users scsi: scsi_transport_fc: WQ_PERCPU added to alloc_workqueue users() scsi: scsi_dh_alua: WQ_PERCPU added to alloc_workqueue() users scsi: qla2xxx: WQ_PERCPU added to alloc_workqueue() users scsi: target: sbp: Replace use of system_unbound_wq with system_dfl_wq ...
2025-12-05Merge tag 'pci-v6.19-changes' of ↵Linus Torvalds6-19/+0
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms (Dan Williams) - Switch vmd from custom domain number allocator to the common allocator to prevent a potential race with new non-VMD buses (Dan Williams) - Enable Precision Time Measurement (PTM) only if device advertises support for a relevant role, to prevent invalid PTM Requests that cause ACS violations that are reported as AER Uncorrectable Non-Fatal errors (Mika Westerberg) Resource management: - Prevent resource tree corruption when BAR resize fails (Ilpo Järvinen) - Restore BARs to the original size if a BAR resize fails (Ilpo Järvinen) - Remove BAR release from BAR resize attempts by the xe, i915, and amdgpu drivers so the PCI core can restore BARs if the resize fails (Ilpo Järvinen) - Move Resizable BAR code to rebar.c (Ilpo Järvinen) - Add pci_rebar_size_supported() and use it in i915 and xe (Ilpo Järvinen) - Add pci_rebar_get_max_size() and use it in xe and amdgpu (Ilpo Järvinen) Power management and error handling: - For drivers using PCI legacy suspend, save config state at suspend so that state (not any earlier state from enumeration, probe, or error recovery) will be restored when resuming (Lukas Wunner) - For devices with no driver or a driver that lacks power management, save config state at hibernate so that state (not any earlier state from enumeration, probe, or error recovery) will be restored when resuming (Lukas Wunner) - Save device config space on device addition, before driver binding, so error recovery works more reliably (Lukas Wunner) - Drop pci_save_state() from several drivers that no longer need it since the PCI core always does it and pci_restore_state() no longer invalidates the saved state (Lukas Wunner) - Document use of pci_save_state() by drivers to capture the state they want restored during error recovery (Lukas Wunner) Power control: - Add a struct pci_ops.assert_perst() function pointer to assert/deassert PCIe PERST# and implement it for the qcom driver (Krishna Chaitanya Chundru) - Add DT binding and pwrctrl driver for the Toshiba TC9563 PCIe switch, which must be held in reset after poweron so the pwrctrl driver can configure the switch via I2C before bringing up the links (Krishna Chaitanya Chundru) Endpoint framework: - Convert the endpoint doorbell test to use a threaded IRQ to fix a 'sleeping while atomic' issue (Bhanu Seshu Kumar Valluri) - Add endpoint VNTB MSI doorbell support to reduce latency between host and endpoint (Frank Li) New native PCIe controller drivers: - Add CIX Sky1 host controller DT binding and driver (Hans Zhang) - Add NXP S32G host controller DT binding and driver (Vincent Guittot) - Add Renesas RZ/G3S host controller DT binding and driver (Claudiu Beznea) - Add SpacemiT K1 host controller DT binding and driver (Alex Elder) Amlogic Meson PCIe controller driver: - Update DT binding to name DBI region 'dbi', not 'elbi', and update driver to support both (Manivannan Sadhasivam) Apple PCIe controller driver: - Move struct pci_host_bridge allocation from pci_host_common_init() to callers, which significantly simplifies pcie-apple (Marc Zyngier) Broadcom STB PCIe controller driver: - Disable advertising ASPM L0s support correctly (Jim Quinlan) - Add a panic/die handler to print diagnostic info in case PCIe caused an unrecoverable abort (Jim Quinlan) Cadence PCIe controller driver: - Add module support for Cadence platform host and endpoint controller driver (Manikandan K Pillai) - Split headers into 'legacy' (LGA) and 'high perf' (HPA) to prepare for new CIX Sky1 driver (Manikandan K Pillai) MediaTek PCIe controller driver: - Convert DT binding to YAML schema (Christian Marangi) - Add Airoha AN7583 DT compatible and driver support (Christian Marangi) Qualcomm PCIe controller driver: - Add Qualcomm Kaanapali to SM8550 DT binding (Qiang Yu) - Add required 'power-domains' and 'resets' to qcom sa8775p, sc7280, sc8280xp, sm8150, sm8250, sm8350, sm8450, sm8550, x1e80100 DT schemas (Krzysztof Kozlowski) - Look up OPP using both frequency and data rate (not just frequency) so RPMh votes can account for both (Krishna Chaitanya Chundru) Rockchip DesignWare PCIe controller driver: - Add Rockchip RK3528 compatible strings in DT binding (Yao Zi) STMicroelectronics STM32MP25 PCIe controller driver: - Fix a race between link training and endpoint register initialization (Christian Bruel) - Align endpoint allocations to match the ATU requirements (Christian Bruel) Synopsys DesignWare PCIe controller driver: - Clear L1 PM Substate Capability 'Supported' bits unless glue driver says it's supported, which prevents users from enabling non-working L1SS. Currently only qcom and tegra194 support L1SS (Bjorn Helgaas) - Remove now-superfluous L1SS disable code from tegra194 (Bjorn Helgaas) - Configure L1SS support in dw-rockchip when DT says 'supports-clkreq' (Shawn Lin) TI Keystone PCIe controller driver: - Fail the probe instead of silently succeeding if ks_pcie_of_data didn't specify Root Complex or Endpoint mode (Siddharth Vadapalli) - Make keystone buildable as a loadable module, except on ARM32 where hook_fault_code() is __init (Siddharth Vadapalli)" * tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (100 commits) MAINTAINERS: Add Manivannan Sadhasivam as PCI/pwrctrl maintainer MAINTAINERS: Add CIX Sky1 PCIe controller driver maintainer PCI: sky1: Add PCIe host support for CIX Sky1 dt-bindings: PCI: Add CIX Sky1 PCIe Root Complex bindings PCI: cadence: Add support for High Perf Architecture (HPA) controller MAINTAINERS: Add NXP S32G PCIe controller driver maintainer PCI: s32g: Add NXP S32G PCIe controller driver (RC) PCI: dwc: Add register and bitfield definitions dt-bindings: PCI: s32g: Add NXP S32G PCIe controller PCI: Add Renesas RZ/G3S host controller driver PCI: host-generic: Move bridge allocation outside of pci_host_common_init() dt-bindings: PCI: Add Renesas RZ/G3S PCIe controller binding PCI: Validate pci_rebar_size_supported() input Documentation: PCI: Amend error recovery doc with pci_save_state() rules treewide: Drop pci_save_state() after pci_restore_state() PCI/ERR: Ensure error recoverability at all times PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths PCI: dw-rockchip: Configure L1SS support PCI: tegra194: Remove unnecessary L1SS disable code ...
2025-12-04scsi: sd: reject invalid pr_read_keys() num_keys valuesStefan Hajnoczi1-1/+11
The pr_read_keys() interface has a u32 num_keys parameter. The SCSI PERSISTENT RESERVE IN command has a maximum READ KEYS service action size of 65536 bytes. Reject num_keys values that are too large to fit into the SCSI command. This will become important when pr_read_keys() is exposed to untrusted userspace via an <linux/pr.h> ioctl. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-04Merge tag 'for-6.19/block-20251201' of ↵Linus Torvalds2-14/+8
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - Fix head insertion for mq-deadline, a regression from when priority support was added - Series simplifying and improving the ublk user copy code - Various ublk related cleanups - Fixup REQ_NOWAIT handling in loop/zloop, clearing NOWAIT when the request is punted to a thread for handling - Merge and then later revert loop dio nowait support, as it ended up causing excessive stack usage for when the inline issue code needs to dip back into the full file system code - Improve auto integrity code, making it less deadlock prone - Speedup polled IO handling, but manually managing the hctx lookups - Fixes for blk-throttle for SSD devices - Small series with fixes for the S390 dasd driver - Add support for caching zones, avoiding unnecessary report zone queries - MD pull requests via Yu: - fix null-ptr-dereference regression for dm-raid0 - fix IO hang for raid5 when array is broken with IO inflight - remove legacy 1s delay to speed up system shutdown - change maintainer's email address - data can be lost if array is created with different lbs devices, fix this problem and record lbs of the array in metadata - fix rcu protection for md_thread - fix mddev kobject lifetime regression - enable atomic writes for md-linear - some cleanups - bcache updates via Coly - remove useless discard and cache device code - improve usage of per-cpu workqueues - Reorganize the IO scheduler switching code, fixing some lockdep reports as well - Improve the block layer P2P DMA support - Add support to the block tracing code for zoned devices - Segment calculation improves, and memory alignment flexibility improvements - Set of prep and cleanups patches for ublk batching support. The actual batching hasn't been added yet, but helps shrink down the workload of getting that patchset ready for 6.20 - Fix for how the ps3 block driver handles segments offsets - Improve how block plugging handles batch tag allocations - nbd fixes for use-after-free of the configuration on device clear/put - Set of improvements and fixes for zloop - Add Damien as maintainer of the block zoned device code handling - Various other fixes and cleanups * tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits) block/rnbd: correct all kernel-doc complaints blk-mq: use queue_hctx in blk_mq_map_queue_type md: remove legacy 1s delay in md_notify_reboot md/raid5: fix IO hang when array is broken with IO inflight md: warn about updating super block failure md/raid0: fix NULL pointer dereference in create_strip_zones() for dm-raid sbitmap: fix all kernel-doc warnings ublk: add helper of __ublk_fetch() ublk: pass const pointer to ublk_queue_is_zoned() ublk: refactor auto buffer register in ublk_dispatch_req() ublk: add `union ublk_io_buf` with improved naming ublk: add parameter `struct io_uring_cmd *` to ublk_prep_auto_buf_reg() kfifo: add kfifo_alloc_node() helper for NUMA awareness blk-mq: fix potential uaf for 'queue_hw_ctx' blk-mq: use array manage hctx map instead of xarray ublk: prevent invalid access with DEBUG s390/dasd: Use scnprintf() instead of sprintf() s390/dasd: Move device name formatting into separate function s390/dasd: Remove unnecessary debugfs_create() return checks s390/dasd: Fix gendisk parent after copy pair swap ...
2025-12-03Merge tag 'printk-for-6.19' of ↵Linus Torvalds3-41/+31
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Allow creaing nbcon console drivers with an unsafe write_atomic() callback that can only be called by the final nbcon_atomic_flush_unsafe(). Otherwise, the driver would rely on the kthread. It is going to be used as the-best-effort approach for an experimental nbcon netconsole driver, see https://lore.kernel.org/r/20251121-nbcon-v1-2-503d17b2b4af@debian.org Note that a safe .write_atomic() callback is supposed to work in NMI context. But some networking drivers are not safe even in IRQ context: https://lore.kernel.org/r/oc46gdpmmlly5o44obvmoatfqo5bhpgv7pabpvb6sjuqioymcg@gjsma3ghoz35 In an ideal world, all networking drivers would be fixed first and the atomic flush would be blocked only in NMI context. But it brings the question how reliable networking drivers are when the system is in a bad state. They might block flushing more reliable serial consoles which are more suitable for serious debugging anyway. - Allow to use the last 4 bytes of the printk ring buffer. - Prevent queuing IRQ work and block printk kthreads when consoles are suspended. Otherwise, they create non-necessary churn or even block the suspend. - Release console_lock() between each record in the kthread used for legacy consoles on RT. It might significantly speed up the boot. - Release nbcon context between each record in the atomic flush. It prevents stalls of the related printk kthread after it has lost the ownership in the middle of a record - Add support for NBCON consoles into KDB - Add %ptsP modifier for printing struct timespec64 and use it where possible - Misc code clean up * tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (48 commits) printk: Use console_is_usable on console_unblank arch: um: kmsg_dump: Use console_is_usable drivers: serial: kgdboc: Drop checks for CON_ENABLED and CON_BOOT lib/vsprintf: Unify FORMAT_STATE_NUM handlers printk: Avoid irq_work for printk_deferred() on suspend printk: Avoid scheduling irq_work on suspend printk: Allow printk_trigger_flush() to flush all types tracing: Switch to use %ptSp scsi: snic: Switch to use %ptSp scsi: fnic: Switch to use %ptSp s390/dasd: Switch to use %ptSp ptp: ocp: Switch to use %ptSp pps: Switch to use %ptSp PCI: epf-test: Switch to use %ptSp net: dsa: sja1105: Switch to use %ptSp mmc: mmc_test: Switch to use %ptSp media: av7110: Switch to use %ptSp ipmi: Switch to use %ptSp igb: Switch to use %ptSp e1000e: Switch to use %ptSp ...
2025-12-03scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure ↵Xingui Yang1-14/+0
scanned in again after probe failed" This reverts commit ab2068a6fb84751836a84c26ca72b3beb349619d. When probing the exp-attached sata device, libsas/libata will issue a hard reset in sas_probe_sata() -> ata_sas_async_probe(), then a broadcast event will be received after the disk probe fails, and this commit causes the probe will be re-executed on the disk, and a faulty disk may get into an indefinite loop of probe. Therefore, revert this commit, although it can fix some temporary issues with disk probe failure. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20251202065627.140361-1-yangxingui@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-30scsi: qla4xxx: Use time conversion macrosShi Hao1-1/+1
Replace the raw use of 500 value in schedule_timeout() function with msecs_to_jiffies() to ensure intended value across different kernel configurations regardless of HZ value. Signed-off-by: Shi Hao <i.shihao.999@gmail.com> Link: https://patch.msgid.link/20251117200949.42557-1-i.shihao.999@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-29scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during resetWen Xiong1-0/+30
A dynamic remove/add storage adapter test hits EEH on PowerPC: EEH: [c00000000004f77c] __eeh_send_failure_event+0x7c/0x160 EEH: [c000000000048464] eeh_dev_check_failure.part.0+0x254/0x660 EEH: [c000000000934e0c] __pci_read_msi_msg+0x1ac/0x280 EEH: [c000000000100f68] pseries_msi_compose_msg+0x28/0x40 EEH: [c00000000020e1cc] irq_chip_compose_msi_msg+0x5c/0x90 EEH: [c000000000214b1c] msi_domain_set_affinity+0xbc/0x100 EEH: [c000000000206be4] irq_do_set_affinity+0x214/0x2c0 EEH: [c000000000206e04] irq_set_affinity_locked+0x174/0x230 EEH: [c000000000207044] irq_set_affinity+0x64/0xa0 EEH: [c000000000212890] write_irq_affinity.constprop.0.isra.0+0x130/0x150 EEH: [c00000000068868c] proc_reg_write+0xfc/0x160 EEH: [c0000000005adb48] vfs_write+0xf8/0x4e0 EEH: [c0000000005ae234] ksys_write+0x84/0x140 EEH: [c00000000002e994] system_call_exception+0x164/0x310 EEH: [c00000000000bfe8] system_call_vectored_common+0xe8/0x278 The irqbalance daemon kicks in before invoking qla2xxx->slot_reset during the EEH recovery process. irqbalance daemon ->irq_set_affinity() ->msi_domain_set_affinity() ->irq_chip_set_affiinity_parent() ->xive_irq_set_affinity() ->pseries_msi_compose_ms() ->__pci_read_msi_msg() ->irq_chip_compose_msi_msg() In __pci_read_msi_msg(), the first MSI-X vector is set to all F by the irqbalance daemon. pci_write_msg_msix: index=0, lo=ffffffff hi=fffffff IRQ balancing is not required during adapter reset. Enable "IRQ_NO_BALANCING" bit before starting adapter reset and disable it calling pci_restore_state(). The irqbalance daemon is disabled for this short period of time (~2s). Co-developed-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com> Signed-off-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com> Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com> Link: https://patch.msgid.link/20251028142427.3969819-3-wenxiong@linux.ibm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-29scsi: ipr: Enable/disable IRQD_NO_BALANCING during resetWen Xiong1-1/+27
A dynamic remove/add storage adapter test hits EEH on PowerPC: EEH: [c00000000004f75c] __eeh_send_failure_event+0x7c/0x160 EEH: [c000000000048444] eeh_dev_check_failure.part.0+0x254/0x650 EEH: [c008000001650678] eeh_readl+0x60/0x90 [ipr] EEH: [c00800000166746c] ipr_cancel_op+0x2b8/0x524 [ipr] EEH: [c008000001656524] ipr_eh_abort+0x6c/0x130 [ipr] EEH: [c000000000ab0d20] scmd_eh_abort_handler+0x140/0x440 EEH: [c00000000017e558] process_one_work+0x298/0x590 EEH: [c00000000017eef8] worker_thread+0xa8/0x620 EEH: [c00000000018be34] kthread+0x124/0x130 EEH: [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64 A PCIe bus trace reveals that a vector of MSI-X is cleared to 0 by irqbalance daemon. If we disable irqbalance daemon, we won't see the issue. With debug enabled in ipr driver: [ 44.103071] ipr: Entering __ipr_remove [ 44.103083] ipr: Entering ipr_initiate_ioa_bringdown [ 44.103091] ipr: Entering ipr_reset_shutdown_ioa [ 44.103099] ipr: Leaving ipr_reset_shutdown_ioa [ 44.103105] ipr: Leaving ipr_initiate_ioa_bringdown [ 44.149918] ipr: Entering ipr_reset_ucode_download [ 44.149935] ipr: Entering ipr_reset_alert [ 44.150032] ipr: Entering ipr_reset_start_timer [ 44.150038] ipr: Leaving ipr_reset_alert [ 44.244343] scsi 1:2:3:0: alua: Detached [ 44.254300] ipr: Entering ipr_reset_start_bist [ 44.254320] ipr: Entering ipr_reset_start_timer [ 44.254325] ipr: Leaving ipr_reset_start_bist [ 44.364329] scsi 1:2:4:0: alua: Detached [ 45.134341] scsi 1:2:5:0: alua: Detached [ 45.860949] ipr: Entering ipr_reset_shutdown_ioa [ 45.860962] ipr: Leaving ipr_reset_shutdown_ioa [ 45.860966] ipr: Entering ipr_reset_alert [ 45.861028] ipr: Entering ipr_reset_start_timer [ 45.861035] ipr: Leaving ipr_reset_alert [ 45.964302] ipr: Entering ipr_reset_start_bist [ 45.964309] ipr: Entering ipr_reset_start_timer [ 45.964313] ipr: Leaving ipr_reset_start_bist [ 46.264301] ipr: Entering ipr_reset_bist_done [ 46.264309] ipr: Leaving ipr_reset_bist_done During adapter reset, ipr device driver blocks config space access but can't block MMIO access for MSI-X entries. There is very small window: irqbalance daemon kicks in during adapter reset before ipr driver calls pci_restore_state(pdev) to restore MSI-X table. irqbalance daemon reads back all 0 for that MSI-X vector in __pci_read_msi_msg(). irqbalance daemon: msi_domain_set_affinity() ->irq_chip_set_affinity_patent() ->xive_irq_set_affinity() ->irq_chip_compose_msi_msg() ->pseries_msi_compose_msg() ->__pci_read_msi_msg(): read all 0 since didn't call pci_restore_state ->irq_chip_write_msi_msg() -> pci_write_msg_msi(): write 0 to the msix vector entry When ipr driver calls pci_restore_state(pdev) in ipr_reset_restore_cfg_space(), the MSI-X vector entry has been cleared by irqbalance daemon in pci_write_msg_msix(). pci_restore_state() ->__pci_restore_msix_state() Below is the MSI-X table for ipr adapter after irqbalance daemon kicked in during adapter reset: Dump MSIx table: index=0 address_lo=c800 address_hi=10000000 msg_data=0 Dump MSIx table: index=1 address_lo=c810 address_hi=10000000 msg_data=0 Dump MSIx table: index=2 address_lo=c820 address_hi=10000000 msg_data=0 Dump MSIx table: index=3 address_lo=c830 address_hi=10000000 msg_data=0 Dump MSIx table: index=4 address_lo=c840 address_hi=10000000 msg_data=0 Dump MSIx table: index=5 address_lo=c850 address_hi=10000000 msg_data=0 Dump MSIx table: index=6 address_lo=c860 address_hi=10000000 msg_data=0 Dump MSIx table: index=7 address_lo=c870 address_hi=10000000 msg_data=0 Dump MSIx table: index=8 address_lo=0 address_hi=0 msg_data=0 ---------> Hit EEH since msix vector of index=8 are 0 Dump MSIx table: index=9 address_lo=c890 address_hi=10000000 msg_data=0 Dump MSIx table: index=10 address_lo=c8a0 address_hi=10000000 msg_data=0 Dump MSIx table: index=11 address_lo=c8b0 address_hi=10000000 msg_data=0 Dump MSIx table: index=12 address_lo=c8c0 address_hi=10000000 msg_data=0 Dump MSIx table: index=13 address_lo=c8d0 address_hi=10000000 msg_data=0 Dump MSIx table: index=14 address_lo=c8e0 address_hi=10000000 msg_data=0 Dump MSIx table: index=15 address_lo=c8f0 address_hi=10000000 msg_data=0 [ 46.264312] ipr: Entering ipr_reset_restore_cfg_space [ 46.267439] ipr: Entering ipr_fail_all_ops [ 46.267447] ipr: Leaving ipr_fail_all_ops [ 46.267451] ipr: Leaving ipr_reset_restore_cfg_space [ 46.267454] ipr: Entering ipr_ioa_bringdown_done [ 46.267458] ipr: Leaving ipr_ioa_bringdown_done [ 46.267467] ipr: Entering ipr_worker_thread [ 46.267470] ipr: Leaving ipr_worker_thread IRQ balancing is not required during adapter reset. Enable "IRQ_NO_BALANCING" flag before starting adapter reset and disable it after calling pci_restore_state(). The irqbalance daemon is disabled for this short period of time (~2s). Co-developed-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com> Signed-off-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com> Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com> Link: https://patch.msgid.link/20251028142427.3969819-2-wenxiong@linux.ibm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-29scsi: imm: Fix use-after-free bug caused by unfinished delayed workDuoming Zhou1-0/+1
The delayed work item 'imm_tq' is initialized in imm_attach() and scheduled via imm_queuecommand() for processing SCSI commands. When the IMM parallel port SCSI host adapter is detached through imm_detach(), the imm_struct device instance is deallocated. However, the delayed work might still be pending or executing when imm_detach() is called, leading to use-after-free bugs when the work function imm_interrupt() accesses the already freed imm_struct memory. The race condition can occur as follows: CPU 0(detach thread) | CPU 1 | imm_queuecommand() | imm_queuecommand_lck() imm_detach() | schedule_delayed_work() kfree(dev) //FREE | imm_interrupt() | dev = container_of(...) //USE dev-> //USE Add disable_delayed_work_sync() in imm_detach() to guarantee proper cancellation of the delayed work item before imm_struct is deallocated. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://patch.msgid.link/20251028100149.40721-1-duoming@zju.edu.cn Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-29scsi: core: Correct documentation for scsi_device_quiesce()Miao Li1-1/+1
If scsi_device_quiesce() returns zero, the function executed successfully. Signed-off-by: Miao Li <limiao@kylinos.cn> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://patch.msgid.link/20251124075444.32699-1-limiao870622@163.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>