summaryrefslogtreecommitdiff
path: root/drivers/scsi
AgeCommit message (Collapse)AuthorFilesLines
2022-10-24scsi: 3w-9xxx: Avoid disabling device if failing to enable itLetu Ren1-1/+1
[ Upstream commit 7eff437b5ee1309b34667844361c6bbb5c97df05 ] The original code will "goto out_disable_device" and call pci_disable_device() if pci_enable_device() fails. The kernel will generate a warning message like "3w-9xxx 0000:00:05.0: disabling already-disabled device". We shouldn't disable a device that failed to be enabled. A simple return is fine. Link: https://lore.kernel.org/r/20220829110115.38789-1-fantasquex@gmail.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Letu Ren <fantasquex@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24scsi: lpfc: Fix null ndlp ptr dereference in abnormal exit path for GFT_IDJames Smart1-5/+2
[ Upstream commit 59b7e210a522b836a01516c71ee85d1d92c1f075 ] An error case exit from lpfc_cmpl_ct_cmd_gft_id() results in a call to lpfc_nlp_put() with a null pointer to a nodelist structure. Changed lpfc_cmpl_ct_cmd_gft_id() to initialize nodelist pointer upon entry. Link: https://lore.kernel.org/r/20220819011736.14141-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()Mike Christie2-21/+55
[ Upstream commit 57569c37f0add1b6489e1a1563c71519daf732cf ] Fix a NULL pointer crash that occurs when we are freeing the socket at the same time we access it via sysfs. The problem is that: 1. iscsi_sw_tcp_conn_get_param() and iscsi_sw_tcp_host_get_param() take the frwd_lock and do sock_hold() then drop the frwd_lock. sock_hold() does a get on the "struct sock". 2. iscsi_sw_tcp_release_conn() does sockfd_put() which does the last put on the "struct socket" and that does __sock_release() which sets the sock->ops to NULL. 3. iscsi_sw_tcp_conn_get_param() and iscsi_sw_tcp_host_get_param() then call kernel_getpeername() which accesses the NULL sock->ops. Above we do a get on the "struct sock", but we needed a get on the "struct socket". Originally, we just held the frwd_lock the entire time but in commit bcf3a2953d36 ("scsi: iscsi: iscsi_tcp: Avoid holding spinlock while calling getpeername()") we switched to refcount based because the network layer changed and started taking a mutex in that path, so we could no longer hold the frwd_lock. Instead of trying to maintain multiple refcounts, this just has us use a mutex for accessing the socket in the interface code paths. Link: https://lore.kernel.org/r/20220907221700.10302-1-michael.christie@oracle.com Fixes: bcf3a2953d36 ("scsi: iscsi: iscsi_tcp: Avoid holding spinlock while calling getpeername()") Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24scsi: iscsi: Run recv path from workqueueMike Christie2-13/+54
[ Upstream commit f1d269765ee29da56b32818b7a08054484ed89f2 ] We don't always want to run the recv path from the network softirq because when we have to have multiple sessions sharing the same CPUs, some sessions can eat up the NAPI softirq budget and affect other sessions or users. Allow us to queue the recv handling to the iscsi workqueue so we can have the scheduler/wq code try to balance the work and CPU use across all sessions' worker threads. Note: It wasn't the original intent of the change but a nice side effect is that for some workloads/configs we get a nice performance boost. For a simple read heavy test: fio --direct=1 --filename=/dev/dm-0 --rw=randread --bs=256K --ioengine=libaio --iodepth=128 --numjobs=4 where the iscsi threads, fio jobs, and rps_cpus share CPUs we see a 32% throughput boost. We also see increases for small I/O IOPs tests but it's not as high. Link: https://lore.kernel.org/r/20220616224557.115234-4-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Stable-dep-of: 57569c37f0ad ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24scsi: iscsi: Add recv workqueue helpersMike Christie1-2/+27
[ Upstream commit 8af809966c0b34cfacd8da9a412689b8e9910354 ] Add helpers to allow the drivers to run their recv paths from libiscsi's workqueue. Link: https://lore.kernel.org/r/20220616224557.115234-3-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Stable-dep-of: 57569c37f0ad ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24scsi: iscsi: Rename iscsi_conn_queue_work()Mike Christie3-8/+8
[ Upstream commit 4b9f8ce4d5e823e42944c5a0a4842b0f936365ad ] Rename iscsi_conn_queue_work() to iscsi_conn_queue_xmit() to reflect that it handles queueing of xmits only. Link: https://lore.kernel.org/r/20220616224557.115234-2-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Wu Bo <wubo40@huawei.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Stable-dep-of: 57569c37f0ad ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24scsi: pm8001: Fix running_req for internal abort commandsJohn Garry1-0/+4
[ Upstream commit d8c22c4697c11ed28062afe3c2b377025be11a23 ] Disabling the remote phy for a SATA disk causes a hang: root@(none)$ more /sys/class/sas_phy/phy-0:0:8/target_port_protocols sata root@(none)$ echo 0 > sys/class/sas_phy/phy-0:0:8/enable root@(none)$ [ 67.855950] sas: ex 500e004aaaaaaa1f phy08 change count has changed [ 67.920585] sd 0:0:2:0: [sdc] Synchronizing SCSI cache [ 67.925780] sd 0:0:2:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK [ 67.935094] sd 0:0:2:0: [sdc] Stopping disk [ 67.939305] sd 0:0:2:0: [sdc] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=DRIVER_OK ... [ 123.998998] INFO: task kworker/u192:1:642 blocked for more than 30 seconds. [ 124.005960] Not tainted 6.0.0-rc1-205202-gf26f8f761e83 #218 [ 124.012049] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 124.019872] task:kworker/u192:1 state:D stack:0 pid: 642 ppid: 2 flags:0x00000008 [ 124.028223] Workqueue: 0000:04:00.0_event_q sas_port_event_worker [ 124.034319] Call trace: [ 124.036758] __switch_to+0x128/0x278 [ 124.040333] __schedule+0x434/0xa58 [ 124.043820] schedule+0x94/0x138 [ 124.047045] schedule_timeout+0x2fc/0x368 [ 124.051052] wait_for_completion+0xdc/0x200 [ 124.055234] __flush_workqueue+0x1a8/0x708 [ 124.059328] sas_porte_broadcast_rcvd+0xa8/0xc0 [ 124.063858] sas_port_event_worker+0x60/0x98 [ 124.068126] process_one_work+0x3f8/0x660 [ 124.072134] worker_thread+0x70/0x700 [ 124.075793] kthread+0x1a4/0x1b8 [ 124.079014] ret_from_fork+0x10/0x20 The issue is that the per-device running_req read in pm8001_dev_gone_notify() never goes to zero and we never make progress. This is caused by missing accounting for running_req for when an internal abort command completes. In commit 2cbbf489778e ("scsi: pm8001: Use libsas internal abort support") we started to send internal abort commands as a proper sas_task. In this when we deliver a sas_task to HW the per-device running_req is incremented in pm8001_queue_command(). However it is never decremented for internal abort commnds, so decrement in pm8001_mpi_task_abort_resp(). Link: https://lore.kernel.org/r/1663854664-76165-1-git-send-email-john.garry@huawei.com Fixes: 2cbbf489778e ("scsi: pm8001: Use libsas internal abort support") Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24scsi: libsas: Fix use-after-free bug in smp_execute_task_sg()Duoming Zhou1-1/+1
[ Upstream commit 46ba53c30666717cb06c2b3c5d896301cd00d0c0 ] When executing SMP task failed, the smp_execute_task_sg() calls del_timer() to delete "slow_task->timer". However, if the timer handler sas_task_internal_timedout() is running, the del_timer() in smp_execute_task_sg() will not stop it and a UAF will happen. The process is shown below: (thread 1) | (thread 2) smp_execute_task_sg() | sas_task_internal_timedout() ... | del_timer() | ... | ... sas_free_task(task) | kfree(task->slow_task) //FREE| | task->slow_task->... //USE Fix by calling del_timer_sync() in smp_execute_task_sg(), which makes sure the timer handler have finished before the "task->slow_task" is deallocated. Link: https://lore.kernel.org/r/20220920144213.10536-1-duoming@zju.edu.cn Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver") Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24scsi: qedf: Populate sysfs attributes for vportSaurav Kashyap1-0/+21
commit 592642e6b11e620e4b43189f8072752429fc8dc3 upstream. Few vport parameters were displayed by systool as 'Unknown' or 'NULL'. Copy speed, supported_speed, frame_size and update port_type for NPIV port. Link: https://lore.kernel.org/r/20220919134434.3513-1-njavali@marvell.com Cc: stable@vger.kernel.org Tested-by: Guangwu Zhang <guazhang@redhat.com> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-24scsi: lpfc: Rework MIB Rx Monitor debug info logicJames Smart7-125/+240
commit bd269188ea94e40ab002cad7b0df8f12b8f0de54 upstream. The kernel test robot reported the following sparse warning: arch/arm64/include/asm/cmpxchg.h:88:1: sparse: sparse: cast truncates bits from constant value (369 becomes 69) On arm64, atomic_xchg only works on 8-bit byte fields. Thus, the macro usage of LPFC_RXMONITOR_TABLE_IN_USE can be unintentionally truncated leading to all logic involving the LPFC_RXMONITOR_TABLE_IN_USE macro to not work properly. Replace the Rx Table atomic_t indexing logic with a new lpfc_rx_info_monitor structure that holds a circular ring buffer. For locking semantics, a spinlock_t is used. Link: https://lore.kernel.org/r/20220819011736.14141-4-jsmart2021@gmail.com Fixes: 17b27ac59224 ("scsi: lpfc: Add rx monitoring statistics") Cc: <stable@vger.kernel.org> # v5.15+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-15scsi: stex: Properly zero out the passthrough command structureLinus Torvalds1-8/+9
commit 6022f210461fef67e6e676fd8544ca02d1bcfa7a upstream. The passthrough structure is declared off of the stack, so it needs to be set to zero before copied back to userspace to prevent any unintentional data leakage. Switch things to be statically allocated which will fill the unused fields with 0 automatically. Link: https://lore.kernel.org/r/YxrjN3OOw2HHl9tx@kroah.com Cc: stable@kernel.org Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: hdthky <hdthky0@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-12scsi: qedf: Fix a UAF bug in __qedf_probe()Letu Ren1-5/+0
[ Upstream commit fbfe96869b782364caebae0445763969ddb6ea67 ] In __qedf_probe(), if qedf->cdev is NULL which means qed_ops->common->probe() failed, then the program will goto label err1, and scsi_host_put() will free lport->host pointer. Because the memory qedf points to is allocated by libfc_host_alloc(), it will be freed by scsi_host_put(). However, the if statement below label err0 only checks whether qedf is NULL but doesn't check whether the memory has been freed. So a UAF bug can occur. There are two ways to reach the statements below err0. The first one is described as before, "qedf" should be set to NULL. The second one is goto "err0" directly. In the latter scenario qedf hasn't been changed and it has the initial value NULL. As a result the if statement is not reachable in any situation. The KASAN logs are as follows: [ 2.312969] BUG: KASAN: use-after-free in __qedf_probe+0x5dcf/0x6bc0 [ 2.312969] [ 2.312969] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 2.312969] Call Trace: [ 2.312969] dump_stack_lvl+0x59/0x7b [ 2.312969] print_address_description+0x7c/0x3b0 [ 2.312969] ? __qedf_probe+0x5dcf/0x6bc0 [ 2.312969] __kasan_report+0x160/0x1c0 [ 2.312969] ? __qedf_probe+0x5dcf/0x6bc0 [ 2.312969] kasan_report+0x4b/0x70 [ 2.312969] ? kobject_put+0x25d/0x290 [ 2.312969] kasan_check_range+0x2ca/0x310 [ 2.312969] __qedf_probe+0x5dcf/0x6bc0 [ 2.312969] ? selinux_kernfs_init_security+0xdc/0x5f0 [ 2.312969] ? trace_rpm_return_int_rcuidle+0x18/0x120 [ 2.312969] ? rpm_resume+0xa5c/0x16e0 [ 2.312969] ? qedf_get_generic_tlv_data+0x160/0x160 [ 2.312969] local_pci_probe+0x13c/0x1f0 [ 2.312969] pci_device_probe+0x37e/0x6c0 Link: https://lore.kernel.org/r/20211112120641.16073-1-fantasquex@gmail.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Acked-by: Saurav Kashyap <skashyap@marvell.com> Co-developed-by: Wende Tan <twd2.me@gmail.com> Signed-off-by: Wende Tan <twd2.me@gmail.com> Signed-off-by: Letu Ren <fantasquex@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-28scsi: mpt3sas: Fix return value check of dma_get_required_mask()Sreekanth Reddy1-1/+1
[ Upstream commit e0e0747de0ea3dd87cdbb0393311e17471a9baf1 ] Fix the incorrect return value check of dma_get_required_mask(). Due to this incorrect check, the driver was always setting the DMA mask to 63 bit. Link: https://lore.kernel.org/r/20220913120538.18759-2-sreekanth.reddy@broadcom.com Fixes: ba27c5cf286d ("scsi: mpt3sas: Don't change the DMA coherent mask after allocations") Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-28scsi: qla2xxx: Fix memory leak in __qlt_24xx_handle_abts()Rafael Mendonca1-1/+3
[ Upstream commit 601be20fc6a1b762044d2398befffd6bf236cebf ] Commit 8f394da36a36 ("scsi: qla2xxx: Drop TARGET_SCF_LOOKUP_LUN_FROM_TAG") made the __qlt_24xx_handle_abts() function return early if tcm_qla2xxx_find_cmd_by_tag() didn't find a command, but it missed to clean up the allocated memory for the management command. Link: https://lore.kernel.org/r/20220914024924.695604-1-rafaelmendsr@gmail.com Fixes: 8f394da36a36 ("scsi: qla2xxx: Drop TARGET_SCF_LOOKUP_LUN_FROM_TAG") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-28scsi: core: Fix a use-after-freeBart Van Assche5-5/+21
[ Upstream commit 8fe4ce5836e932f5766317cb651c1ff2a4cd0506 ] There are two .exit_cmd_priv implementations. Both implementations use resources associated with the SCSI host. Make sure that these resources are still available when .exit_cmd_priv is called by waiting inside scsi_remove_host() until the tag set has been freed. This commit fixes the following use-after-free: ================================================================== BUG: KASAN: use-after-free in srp_exit_cmd_priv+0x27/0xd0 [ib_srp] Read of size 8 at addr ffff888100337000 by task multipathd/16727 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x5e/0x5db kasan_report+0xab/0x120 srp_exit_cmd_priv+0x27/0xd0 [ib_srp] scsi_mq_exit_request+0x4d/0x70 blk_mq_free_rqs+0x143/0x410 __blk_mq_free_map_and_rqs+0x6e/0x100 blk_mq_free_tag_set+0x2b/0x160 scsi_host_dev_release+0xf3/0x1a0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 device_release+0x54/0xe0 kobject_put+0xa5/0x120 scsi_device_dev_release_usercontext+0x4c1/0x4e0 execute_in_process_context+0x23/0x90 device_release+0x54/0xe0 kobject_put+0xa5/0x120 scsi_disk_release+0x3f/0x50 device_release+0x54/0xe0 kobject_put+0xa5/0x120 disk_release+0x17f/0x1b0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 dm_put_table_device+0xa3/0x160 [dm_mod] dm_put_device+0xd0/0x140 [dm_mod] free_priority_group+0xd8/0x110 [dm_multipath] free_multipath+0x94/0xe0 [dm_multipath] dm_table_destroy+0xa2/0x1e0 [dm_mod] __dm_destroy+0x196/0x350 [dm_mod] dev_remove+0x10c/0x160 [dm_mod] ctl_ioctl+0x2c2/0x590 [dm_mod] dm_ctl_ioctl+0x5/0x10 [dm_mod] __x64_sys_ioctl+0xb4/0xf0 dm_ctl_ioctl+0x5/0x10 [dm_mod] __x64_sys_ioctl+0xb4/0xf0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Link: https://lore.kernel.org/r/20220826002635.919423-1-bvanassche@acm.org Fixes: 65ca846a5314 ("scsi: core: Introduce {init,exit}_cmd_priv()") Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Christie <michael.christie@oracle.com> Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.garry@huawei.com> Cc: Li Zhijian <lizhijian@fujitsu.com> Reported-by: Li Zhijian <lizhijian@fujitsu.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-28block: simplify disk shutdownChristoph Hellwig4-8/+8
[ Upstream commit 6f8191fdf41d3a53cc1d63fe2234e812c55a0092 ] Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for all disks that do not have separately allocated queues, and thus remove the need to call blk_cleanup_queue for them. Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that this function is intended only for separately allocated blk-mq queues. This saves an extra queue freeze for devices without a separately allocated queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 8fe4ce5836e9 ("scsi: core: Fix a use-after-free") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15scsi: lpfc: Add missing destroy_workqueue() in error pathYang Yingliang1-1/+4
commit da6d507f5ff328f346b3c50e19e19993027b8ffd upstream. Add the missing destroy_workqueue() before return from lpfc_sli4_driver_resource_setup() in the error path. Link: https://lore.kernel.org/r/20220823044237.285643-1-yangyingliang@huawei.com Fixes: 3cee98db2610 ("scsi: lpfc: Fix crash on driver unload in wq free") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-15scsi: mpt3sas: Fix use-after-free warningSreekanth Reddy1-1/+1
commit 991df3dd5144f2e6b1c38b8d20ed3d4d21e20b34 upstream. Fix the following use-after-free warning which is observed during controller reset: refcount_t: underflow; use-after-free. WARNING: CPU: 23 PID: 5399 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0 Link: https://lore.kernel.org/r/20220906134908.1039-2-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-15scsi: megaraid_sas: Fix double kfree()Guixin Liu1-1/+0
[ Upstream commit 8c499e49240bd93628368c3588975cfb94169b8b ] When allocating log_to_span fails, kfree(instance->ctrl_context) is called twice. Remove redundant call. Link: https://lore.kernel.org/r/1659424729-46502-1-git-send-email-kanie@linux.alibaba.com Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15scsi: core: Allow the ALUA transitioning state enough timeBrian Bunker1-19/+25
[ Upstream commit 54249306e2776774ccb827969e62d34570f991db ] The error path for the SCSI check condition of not ready, target in ALUA state transition, will result in the failure of that path after the retries are exhausted. In most cases that is well ahead of the transition timeout established in the SCSI ALUA device handler. Instead, reprep the command and re-add it to the queue after a 1 second delay. This will allow the handler to take care of the timeout and only fail the path if the target has exceeded the transition expiry timeout (default 60 seconds). If the expiry timeout is exceeded, the handler will change the path state from transitioning to standby leading to a path failure eliminating the potential of this re-prep to continue endlessly. In most cases the target will exit the transitioning state well before the expiry timeout but after the retries are exhausted as mentioned. Additionally remove the scsi_io_completion_reprep() function which provides little value. Link: https://lore.kernel.org/r/20220729214110.58576-1-brian@purestorage.com Reviewed-by: Martin Wilck <mwilck@suse.com> Acked-by: Krishna Kant <krishna.kant@purestorage.com> Acked-by: Seamus Connor <sconnor@purestorage.com> Signed-off-by: Brian Bunker <brian@purestorage.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15scsi: qla2xxx: Disable ATIO interrupt coalesce for quad port ISP27XXTony Battersby1-8/+2
[ Upstream commit 53661ded2460b414644532de6b99bd87f71987e9 ] This partially reverts commit d2b292c3f6fd ("scsi: qla2xxx: Enable ATIO interrupt handshake for ISP27XX") For some workloads where the host sends a batch of commands and then pauses, ATIO interrupt coalesce can cause some incoming ATIO entries to be ignored for extended periods of time, resulting in slow performance, timeouts, and aborted commands. Disable interrupt coalesce and re-enable the dedicated ATIO MSI-X interrupt. Link: https://lore.kernel.org/r/97dcf365-89ff-014d-a3e5-1404c6af511c@cybernetics.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-31scsi: core: Fix passthrough retry counter handlingMike Christie1-1/+2
commit fac8e558da9485e13a0ae0488aa0b8a8c307cd34 upstream. Passthrough users will set the scsi_cmnd->allowed value and were expecting up to $allowed retries. The problem is that before: commit 6aded12b10e0 ("scsi: core: Remove struct scsi_request") we used to set the retries on the scsi_request then copy them over to scsi_cmnd->allowed in scsi_setup_scsi_cmnd. With that patch we now set scsi_cmnd->allowed to 0 in scsi_prepare_cmd and overwrite what the passthrough user set. This moves the allowed initialization to after the blk_rq_is_passthrough() check so it's only done for the non-passthrough path where the ULD init_command will normally set an allowed value it prefers. Link: https://lore.kernel.org/r/20220812011206.9157-1-michael.christie@oracle.com Fixes: 6aded12b10e0 ("scsi: core: Remove struct scsi_request") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-31scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wqSaurabh Sengar1-1/+1
commit d957e7ffb2c72410bcc1a514153a46719255a5da upstream. storvsc_error_wq workqueue should not be marked as WQ_MEM_RECLAIM as it doesn't need to make forward progress under memory pressure. Marking this workqueue as WQ_MEM_RECLAIM may cause deadlock while flushing a non-WQ_MEM_RECLAIM workqueue. In the current state it causes the following warning: [ 14.506347] ------------[ cut here ]------------ [ 14.506354] workqueue: WQ_MEM_RECLAIM storvsc_error_wq_0:storvsc_remove_lun is flushing !WQ_MEM_RECLAIM events_freezable_power_:disk_events_workfn [ 14.506360] WARNING: CPU: 0 PID: 8 at <-snip->kernel/workqueue.c:2623 check_flush_dependency+0xb5/0x130 [ 14.506390] CPU: 0 PID: 8 Comm: kworker/u4:0 Not tainted 5.4.0-1086-azure #91~18.04.1-Ubuntu [ 14.506391] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022 [ 14.506393] Workqueue: storvsc_error_wq_0 storvsc_remove_lun [ 14.506395] RIP: 0010:check_flush_dependency+0xb5/0x130 <-snip-> [ 14.506408] Call Trace: [ 14.506412] __flush_work+0xf1/0x1c0 [ 14.506414] __cancel_work_timer+0x12f/0x1b0 [ 14.506417] ? kernfs_put+0xf0/0x190 [ 14.506418] cancel_delayed_work_sync+0x13/0x20 [ 14.506420] disk_block_events+0x78/0x80 [ 14.506421] del_gendisk+0x3d/0x2f0 [ 14.506423] sr_remove+0x28/0x70 [ 14.506427] device_release_driver_internal+0xef/0x1c0 [ 14.506428] device_release_driver+0x12/0x20 [ 14.506429] bus_remove_device+0xe1/0x150 [ 14.506431] device_del+0x167/0x380 [ 14.506432] __scsi_remove_device+0x11d/0x150 [ 14.506433] scsi_remove_device+0x26/0x40 [ 14.506434] storvsc_remove_lun+0x40/0x60 [ 14.506436] process_one_work+0x209/0x400 [ 14.506437] worker_thread+0x34/0x400 [ 14.506439] kthread+0x121/0x140 [ 14.506440] ? process_one_work+0x400/0x400 [ 14.506441] ? kthread_park+0x90/0x90 [ 14.506443] ret_from_fork+0x35/0x40 [ 14.506445] ---[ end trace 2d9633159fdc6ee7 ]--- Link: https://lore.kernel.org/r/1659628534-17539-1-git-send-email-ssengar@linux.microsoft.com Fixes: 436ad9413353 ("scsi: storvsc: Allow only one remove lun work item to be issued per lun") Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25scsi: lpfc: Fix possible memory leak when failing to issue CMF WQEJames Smart1-1/+3
[ Upstream commit 2f67dc7970bce3529edce93a0a14234d88b3fcd5 ] There is no corresponding free routine if lpfc_sli4_issue_wqe fails to issue the CMF WQE in lpfc_issue_cmf_sync_wqe. If ret_val is non-zero, then free the iocbq request structure. Link: https://lore.kernel.org/r/20220701211425.2708-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25scsi: lpfc: Prevent buffer overflow crashes in debugfs with malformed user inputJames Smart1-10/+10
[ Upstream commit f8191d40aa612981ce897e66cda6a88db8df17bb ] Malformed user input to debugfs results in buffer overflow crashes. Adapt input string lengths to fit within internal buffers, leaving space for NULL terminators. Link: https://lore.kernel.org/r/20220701211425.2708-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25scsi: iscsi: Fix HW conn removal use after freeMike Christie1-2/+0
[ Upstream commit c577ab7ba5f3bf9062db8a58b6e89d4fe370447e ] If qla4xxx doesn't remove the connection before the session, the iSCSI class tries to remove the connection for it. We were doing a iscsi_put_conn() in the iter function which is not needed and will result in a use after free because iscsi_remove_conn() will free the connection. Link: https://lore.kernel.org/r/20220616222738.5722-2-michael.christie@oracle.com Tested-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: qla2xxx: Fix losing FCP-2 targets during port perturbation testsArun Easi1-1/+1
commit 58d1c124cd79ea686b512043c5bd515590b2ed95 upstream. When a mix of FCP-2 (tape) and non-FCP-2 targets are present, FCP-2 target state was incorrectly transitioned when both of the targets were gone. Fix this by ignoring state transition for FCP-2 targets. Link: https://lore.kernel.org/r/20220616053508.27186-7-njavali@marvell.com Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target") Cc: stable@vger.kernel.org Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17scsi: qla2xxx: Fix losing target when it reappears during deleteArun Easi1-7/+17
commit 118b0c863c8f5629cc5271fc24d72d926e0715d9 upstream. FC target disappeared during port perturbation tests due to a race that tramples target state. Fix the issue by adding state checks before proceeding. Link: https://lore.kernel.org/r/20220616053508.27186-8-njavali@marvell.com Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target") Cc: stable@vger.kernel.org Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17scsi: qla2xxx: Fix losing FCP-2 targets on long port disable with I/OsArun Easi1-4/+8
commit 2416ccd3815ba1613e10a6da0a24ef21acfe5633 upstream. FCP-2 devices were not coming back online once they were lost, login retries exhausted, and then came back up. Fix this by accepting RSCN when the device is not online. Link: https://lore.kernel.org/r/20220616053508.27186-10-njavali@marvell.com Fixes: 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target") Cc: stable@vger.kernel.org Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17scsi: qla2xxx: Wind down adapter after PCIe errorQuinn Tran4-1/+81
commit d3117c83ba316b3200d9f2fe900f2b9a5525a25c upstream. Put adapter into a wind down state if OS does not make any attempt to recover the adapter after PCIe error. Link: https://lore.kernel.org/r/20220616053508.27186-4-njavali@marvell.com Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17scsi: qla2xxx: Fix erroneous mailbox timeout after PCI error injectionQuinn Tran1-6/+6
commit f260694e6463b63ae550aad25ddefe94cb1904da upstream. Clear wait for mailbox interrupt flag to prevent stale mailbox: Feb 22 05:22:56 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-500a:4: LOOP UP detected (16 Gbps). Feb 22 05:22:59 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-d04c:4: MBX Command timeout for cmd 69, ... To fix the issue, driver needs to clear the MBX_INTR_WAIT flag on purging the mailbox. When the stale mailbox completion does arrive, it will be dropped. Link: https://lore.kernel.org/r/20220616053508.27186-11-njavali@marvell.com Fixes: b6faaaf796d7 ("scsi: qla2xxx: Serialize mailbox request") Cc: Naresh Bannoth <nbannoth@in.ibm.com> Cc: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com> Cc: stable@vger.kernel.org Reported-by: Naresh Bannoth <nbannoth@in.ibm.com> Tested-by: Naresh Bannoth <nbannoth@in.ibm.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17scsi: qla2xxx: Fix excessive I/O error messages by defaultArun Easi1-2/+2
commit bff4873c709085e09d0ffae0c25b8e65256e3205 upstream. Disable printing I/O error messages by default. The messages will be printed only when logging was enabled. Link: https://lore.kernel.org/r/20220616053508.27186-2-njavali@marvell.com Fixes: 8e2d81c6b5be ("scsi: qla2xxx: Fix excessive messages during device logout") Cc: stable@vger.kernel.org Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17scsi: qla2xxx: Fix crash due to stale SRB access around I/O timeoutsArun Easi1-12/+31
commit c39587bc0abaf16593f7abcdf8aeec3c038c7d52 upstream. Ensure SRB is returned during I/O timeout error escalation. If that is not possible fail the escalation path. Following crash stack was seen: BUG: unable to handle kernel paging request at 0000002f56aa90f8 IP: qla_chk_edif_rx_sa_delete_pending+0x14/0x30 [qla2xxx] Call Trace: ? qla2x00_status_entry+0x19f/0x1c50 [qla2xxx] ? qla2x00_start_sp+0x116/0x1170 [qla2xxx] ? dma_pool_alloc+0x1d6/0x210 ? mempool_alloc+0x54/0x130 ? qla24xx_process_response_queue+0x548/0x12b0 [qla2xxx] ? qla_do_work+0x2d/0x40 [qla2xxx] ? process_one_work+0x14c/0x390 Link: https://lore.kernel.org/r/20220616053508.27186-6-njavali@marvell.com Fixes: d74595278f4a ("scsi: qla2xxx: Add multiple queue pair functionality.") Cc: stable@vger.kernel.org Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17scsi: qla2xxx: Turn off multi-queue for 8G adaptersQuinn Tran2-12/+8
commit 5304673bdb1635e27555bd636fd5d6956f1cd552 upstream. For 8G adapters, multi-queue was enabled accidentally. Make sure multi-queue is not enabled. Link: https://lore.kernel.org/r/20220616053508.27186-5-njavali@marvell.com Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17scsi: qla2xxx: Fix discovery issues in FC-AL topologyArun Easi3-2/+35
commit 47ccb113cead905bdc236571bf8ac6fed90321b3 upstream. A direct attach tape device, when gets swapped with another, was not discovered. Fix this by looking at loop map and reinitialize link if there are devices present. Link: https://lore.kernel.org/linux-scsi/baef87c3-5dad-3b47-44c1-6914bfc90108@cybernetics.com/ Link: https://lore.kernel.org/r/20220713052045.10683-8-njavali@marvell.com Cc: stable@vger.kernel.org Reported-by: Tony Battersby <tonyb@cybernetics.com> Tested-by: Tony Battersby <tonyb@cybernetics.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17scsi: qla2xxx: Fix imbalance vha->vref_countQuinn Tran1-0/+4
commit 63fa7f2644b4b48e1913af33092c044bf48e9321 upstream. vref_count took an extra decrement in the task management path. Add an extra ref count to compensate the imbalance. Link: https://lore.kernel.org/r/20220713052045.10683-7-njavali@marvell.com Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17scsi: smartpqi: Fix DMA direction for RAID requestsMahesh Rajashekhara1-2/+2
[ Upstream commit 69695aeaa6621bc49cdd7a8e5a8d1042461e496e ] Correct a SOP READ and WRITE DMA flags for some requests. This update corrects DMA direction issues with SCSI commands removed from the controller's internal lookup table. Currently, SCSI READ BLOCK LIMITS (0x5) was removed from the controller lookup table and exposed a DMA direction flag issue. SCSI READ BLOCK LIMITS was recently removed from our controller lookup table so the controller uses the respective IU flag field to set the DMA data direction. Since the DMA direction is incorrect the FW never completes the request causing a hang. Some SCSI commands which use SCSI READ BLOCK LIMITS * sg_map * mt -f /dev/stX status After updating controller firmware, users may notice their tape units failing. This patch resolves the issue. Also, the AIO path DMA direction is correct. The DMA direction flag is a day-one bug with no reported BZ. Fixes: 6c223761eb54 ("smartpqi: initial commit of Microsemi smartpqi driver") Link: https://lore.kernel.org/r/165730605618.177165.9054223644512926624.stgit@brunhilda Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: lpfc: Revert RSCN_MEMENTO workaround for misbehaved configurationJames Smart3-9/+3
[ Upstream commit ffc566411aded3c12c63e1310fb23830e9608c59 ] The RSCN_MEMENTO logic was to workaround a target that does not register both FCP and NVMe FC4 types at the same time. This caused the configuration to not produce a second RSCN for the NVMe FC4 type registration in a timely manner. The intention of the RSCN_MEMENTO flag was to always signal to try NVMe PRLI. However, there are other FCP-only target arrays in correctly behaved configurations that reject the NVMe PRLI followed by a LOGO leading to never rediscovering the target after an issue_lip (as LOGO causes a repeat of PLOGI/PRLIs). Revert the RSCN_MEMENTO patch as it is causing correctly behaved configs to fail while it exists only to succeed on a misbehaved config. Link: https://lore.kernel.org/r/20220701211425.2708-9-jsmart2021@gmail.com Fixes: 1045592fc968 ("scsi: lpfc: Introduce FC_RSCN_MEMENTO flag for tracking post RSCN completion") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: qla2xxx: Check correct variable in qla24xx_async_gffid()Dan Carpenter1-2/+2
[ Upstream commit 7c33e477bd883f79cccec418980cb8f7f2d50347 ] There is a copy and paste bug here. It should check ".rsp" instead of ".req". The error message is copy and pasted as well so update that too. Link: https://lore.kernel.org/r/YrK1A/t3L6HKnswO@kili Fixes: 9c40c36e75ff ("scsi: qla2xxx: edif: Reduce Initiator-Initiator thrashing") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: iscsi: Fix session removal on shutdownMike Christie6-10/+18
[ Upstream commit 31500e902759322ba3c64b60dabae2704e738df8 ] When the system is shutting down, iscsid is not running so we will not get a response to the ISCSI_ERR_INVALID_HOST error event. The system shutdown will then hang waiting on userspace to remove the session. This has libiscsi force the destruction of the session from the kernel when iscsi_host_remove() is called from a driver's shutdown callout. This fixes a regression added in qedi boot with commit d1f2ce77638d ("scsi: qedi: Fix host removal with running sessions") which made qedi use the common session removal function that waits on userspace instead of rolling its own kernel based removal. Link: https://lore.kernel.org/r/20220616222738.5722-7-michael.christie@oracle.com Fixes: d1f2ce77638d ("scsi: qedi: Fix host removal with running sessions") Tested-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: iscsi: Add helper to remove a session from the kernelMike Christie1-0/+49
[ Upstream commit bb42856bfd54fda1cbc7c470fcf5db1596938f4f ] During qedi shutdown we need to stop the iSCSI layer from sending new nops as pings and from responding to target ones and make sure there is no running connection cleanups. Commit d1f2ce77638d ("scsi: qedi: Fix host removal with running sessions") converted the driver to use the libicsi helper to drive session removal, so the above issues could be handled. The problem is that during system shutdown iscsid will not be running so when we try to remove the root session we will hang waiting for userspace to reply. Add a helper that will drive the destruction of sessions like these during system shutdown. Link: https://lore.kernel.org/r/20220616222738.5722-5-michael.christie@oracle.com Tested-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: iscsi: Allow iscsi_if_stop_conn() to be called from kernelMike Christie1-10/+7
[ Upstream commit 3328333b47f4163504267440ec0a36087a407a5f ] iscsi_if_stop_conn() is only called from the userspace interface but in a subsequent commit we will want to call it from the kernel interface to allow drivers like qedi to remove sessions from inside the kernel during shutdown. This removes the iscsi_uevent code from iscsi_if_stop_conn() so we can call it in a new helper. Link: https://lore.kernel.org/r/20220616222738.5722-3-michael.christie@oracle.com Tested-by: Nilesh Javali <njavali@marvell.com> Reviewed-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: qla2xxx: edif: Reduce N2N thrashing at app_start timeQuinn Tran1-13/+34
[ Upstream commit 37be3f9d6993a721bc019f03c97ea0fe66319997 ] For N2N + remote WWPN is bigger than local adapter, remote adapter will login to local adapter while authentication application is not running. When authentication application starts, the current session in FW needs to to be invalidated. Make sure the old session is torn down before triggering a relogin. Link: https://lore.kernel.org/r/20220608115849.16693-9-njavali@marvell.com Fixes: 4de067e5df12 ("scsi: qla2xxx: edif: Add N2N support for EDIF") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: qla2xxx: edif: Fix no logout on delete for N2NQuinn Tran1-0/+3
[ Upstream commit ec538eb838f334453b10e7e9b260f0c358018a37 ] The driver failed to send implicit logout on session delete. For edif, this failed to flush any lingering SA index in FW. Set a flag to turn on implicit logout early in the session recovery to make sure the logout will go out in case of error. Link: https://lore.kernel.org/r/20220608115849.16693-8-njavali@marvell.com Fixes: 4de067e5df12 ("scsi: qla2xxx: edif: Add N2N support for EDIF") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: qla2xxx: edif: Fix session thrashQuinn Tran3-2/+14
[ Upstream commit a8fdfb0b39c2b31722c70bdf2272b949d5af4b7b ] Current code prematurely sends out PRLI before authentication application has given the OK to do so. This causes PRLI failure and session teardown. Prevents PRLI from going out before authentication app gives the OK. Link: https://lore.kernel.org/r/20220608115849.16693-7-njavali@marvell.com Fixes: 91f6f5fbe87b ("scsi: qla2xxx: edif: Reduce connection thrash") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: qla2xxx: edif: Tear down session if keys have been removedQuinn Tran2-0/+6
[ Upstream commit d7e2e4a68fc047a025afcd200e6b7e1fbc8b1999 ] If all keys for a session have been deleted, trigger a session teardown. Link: https://lore.kernel.org/r/20220608115849.16693-6-njavali@marvell.com Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: qla2xxx: edif: Fix no login after app startQuinn Tran1-0/+1
[ Upstream commit 24c796098f5395477f7f7ebf8e24f3f08a139f71 ] The scenario is this: User loaded driver but has not started authentication app. All sessions to secure device will exhaust all login attempts, fail, and in stay in deleted state. Then some time later the app is started. The driver will replenish the login retry count, trigger delete to prepare for secure login. After deletion, relogin is triggered. For the session that is already deleted, the delete trigger is a no-op. If none of the sessions trigger a relogin, no progress is made. Add a relogin trigger. Link: https://lore.kernel.org/r/20220608115849.16693-5-njavali@marvell.com Fixes: 7ebb336e45ef ("scsi: qla2xxx: edif: Add start + stop bsgs") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: qla2xxx: edif: Reduce disruption due to multiple app startQuinn Tran1-2/+2
[ Upstream commit 0dbfce5255fe8d069a1a3b712a25b263264cfa58 ] Multiple app start can trigger a session bounce. Make driver skip over session teardown if app start is seen more than once. Link: https://lore.kernel.org/r/20220608115849.16693-4-njavali@marvell.com Fixes: 7ebb336e45ef ("scsi: qla2xxx: edif: Add start + stop bsgs") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: qla2xxx: edif: Send LOGO for unexpected IKE messageQuinn Tran2-3/+18
[ Upstream commit 2b659ed67a12f39f56d8dcad9b5d5a74d67c01b3 ] If the session is down and the local port continues to receive AUTH ELS messages, the driver needs to send back LOGO so that the remote device knows to tear down its session. Terminate and clean up the AUTH ELS exchange followed by a passthrough LOGO. Link: https://lore.kernel.org/r/20220608115849.16693-3-njavali@marvell.com Fixes: 225479296c4f ("scsi: qla2xxx: edif: Reject AUTH ELS on session down") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17scsi: qla2xxx: edif: Fix n2n login retry for secure deviceQuinn Tran1-0/+7
[ Upstream commit aec55325ddec975216119da000092cb8664a3399 ] After initiator has burned up all login retries, target authentication application begins to run. This triggers a link bounce on target side. Initiator will attempt another login. Due to N2N, the PRLI [nvme | fcp] can fail because of the mode mismatch with target. This patch add a few more login retries to revive the connection. Link: https://lore.kernel.org/r/20220607044627.19563-11-njavali@marvell.com Fixes: 4de067e5df12 ("scsi: qla2xxx: edif: Add N2N support for EDIF") Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>