kernel/linux.git/drivers/scsi/device_handler, branch v6.1.168

scsi: core: alua: I/O errors for ALUA state transitions

2024-07-25T07:49:07+00:00

[ Upstream commit 10157b1fc1a762293381e9145041253420dfc6ad ] When a host is configured with a few LUNs and I/O is running, injecting FC faults repeatedly leads to path recovery problems. The LUNs have 4 paths each and 3 of them come back active after say an FC fault which makes 2 of the paths go down, instead of all 4. This happens after several iterations of continuous FC faults. Reason here is that we're returning an I/O error whenever we're encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying. [mwilck: The original patch was developed by Rajashekhar M A and Hannes Reinecke. I moved the code to alua_check_sense() as suggested by Mike Christie [1]. Evan Milne had raised the question whether pg->state should be set to transitioning in the UA case [2]. I believe that doing this is correct. SCSI_ACCESS_STATE_TRANSITIONING by itself doesn't cause I/O errors. Our handler schedules an RTPG, which will only result in an I/O error condition if the transitioning timeout expires.] [1] https://lore.kernel.org/all/0bc96e82-fdda-4187-148d-5b34f81d4942@oracle.com/ [2] https://lore.kernel.org/all/CAGtn9r=kicnTDE2o7Gt5Y=yoidHYD7tG8XdMHEBJTBraVEoOCw@mail.gmail.com/ Co-developed-by: Rajashekhar M A Co-developed-by: Hannes Reinecke Signed-off-by: Hannes Reinecke Signed-off-by: Martin Wilck Link: https://lore.kernel.org/r/20240514140344.19538-1-mwilck@suse.com Reviewed-by: Damien Le Moal Reviewed-by: Christoph Hellwig Reviewed-by: Mike Christie Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin

scsi: scsi_dh_alua: Fix memleak for 'qdata' in alua_activate()

2023-03-30T10:49:03+00:00

[ Upstream commit a13faca032acbf2699293587085293bdfaafc8ae ] If alua_rtpg_queue() failed from alua_activate(), then 'qdata' is not freed, which will cause following memleak: unreferenced object 0xffff88810b2c6980 (size 32): comm "kworker/u16:2", pid 635322, jiffies 4355801099 (age 1216426.076s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 40 39 24 c1 ff ff ff ff 00 f8 ea 0a 81 88 ff ff @9$............. backtrace: [<0000000098f3a26d>] alua_activate+0xb0/0x320 [<000000003b529641>] scsi_dh_activate+0xb2/0x140 [<000000007b296db3>] activate_path_work+0xc6/0xe0 [dm_multipath] [<000000007adc9ace>] process_one_work+0x3c5/0x730 [<00000000c457a985>] worker_thread+0x93/0x650 [<00000000cb80e628>] kthread+0x1ba/0x210 [<00000000a1e61077>] ret_from_fork+0x22/0x30 Fix the problem by freeing 'qdata' in error path. Fixes: 625fe857e4fa ("scsi: scsi_dh_alua: Check scsi_device_get() return value") Signed-off-by: Yu Kuai Link: https://lore.kernel.org/r/20230315062154.668812-1-yukuai1@huaweicloud.com Reviewed-by: Benjamin Block Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin

scsi/device_handlers: Use the new blk_opf_t type

2022-07-14T18:14:32+00:00

Improve static type checking by using the new blk_opf_t type for variables that represent request flags. Cc: Hannes Reinecke Cc: Martin Wilck Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-43-bvanassche@acm.org Signed-off-by: Jens Axboe

scsi: scsi_dh_alua: Properly handle the ALUA transitioning state

2022-05-02T23:52:13+00:00

The handling of the ALUA transitioning state is currently broken. When a target goes into this state, it is expected that the target is allowed to stay in this state for the implicit transition timeout without a path failure. The handler has this logic, but it gets skipped currently. When the target transitions, there is in-flight I/O from the initiator. The first of these responses from the target will be a unit attention letting the initiator know that the ALUA state has changed. The remaining in-flight I/Os, before the initiator finds out that the portal state has changed, will return not ready, ALUA state is transitioning. The portal state will change to SCSI_ACCESS_STATE_TRANSITIONING. This will lead to all new I/O immediately failing the path unexpectedly. The path failure happens in less than a second instead of the expected successes until the transition timer is exceeded. Allow I/Os to continue while the path is in the ALUA transitioning state. The handler already takes care of a target that stays in the transitioning state for too long by changing the state to ALUA state standby once the transition timeout is exceeded at which point the path will fail. Link: https://lore.kernel.org/r/CAHZQxy+4sTPz9+pY3=7VJH+CLUJsDct81KtnR2be8ycN5mhqTg@mail.gmail.com Reviewed-by: Hannes Reinecke Acked-by: Krishna Kant Acked-by: Seamus Connor Signed-off-by: Brian Bunker Signed-off-by: Martin K. Petersen

scsi: scsi_dh_rdac: Avoid crash during rdac_bus_attach()

2021-07-30T01:58:35+00:00

The following BUG_ON() was observed during RDAC scan: [595952.944297] kernel BUG at drivers/scsi/device_handler/scsi_dh_rdac.c:427! [595952.951143] Internal error: Oops - BUG: 0 [#1] SMP ...... [595953.251065] Call trace: [595953.259054] check_ownership+0xb0/0x118 [595953.269794] rdac_bus_attach+0x1f0/0x4b0 [595953.273787] scsi_dh_handler_attach+0x3c/0xe8 [595953.278211] scsi_dh_add_device+0xc4/0xe8 [595953.282291] scsi_sysfs_add_sdev+0x8c/0x2a8 [595953.286544] scsi_probe_and_add_lun+0x9fc/0xd00 [595953.291142] __scsi_scan_target+0x598/0x630 [595953.295395] scsi_scan_target+0x120/0x130 [595953.299481] fc_user_scan+0x1a0/0x1c0 [scsi_transport_fc] [595953.304944] store_scan+0xb0/0x108 [595953.308420] dev_attr_store+0x44/0x60 [595953.312160] sysfs_kf_write+0x58/0x80 [595953.315893] kernfs_fop_write+0xe8/0x1f0 [595953.319888] __vfs_write+0x60/0x190 [595953.323448] vfs_write+0xac/0x1c0 [595953.326836] ksys_write+0x74/0xf0 [595953.330221] __arm64_sys_write+0x24/0x30 Code is in check_ownership: list_for_each_entry_rcu(tmp, &h->ctlr->dh_list, node) { /* h->sdev should always be valid */ BUG_ON(!tmp->sdev); tmp->sdev->access_state = access_state; } rdac_bus_attach initialize_controller list_add_rcu(&h->node, &h->ctlr->dh_list); h->sdev = sdev; rdac_bus_detach list_del_rcu(&h->node); h->sdev = NULL; Fix the race between rdac_bus_attach() and rdac_bus_detach() where h->sdev is NULL when processing the RDAC attach. Link: https://lore.kernel.org/r/20210113063103.2698953-1-yebin10@huawei.com Reviewed-by: Bart Van Assche Signed-off-by: Ye Bin Signed-off-by: Martin K. Petersen

scsi: scsi_dh_alua: Fix signedness bug in alua_rtpg()

2021-06-08T01:48:16+00:00

The "retval" variable needs to be signed for the error handling to work. Link: https://lore.kernel.org/r/YLjMEAFNxOas1mIp@mwanda Fixes: 7e26e3ea0287 ("scsi: scsi_dh_alua: Check for negative result value") Reviewed-by: Martin Wilck Signed-off-by: Dan Carpenter Signed-off-by: Martin K. Petersen

scsi: scsi_dh_alua: Check for negative result value

2021-06-01T02:48:21+00:00

scsi_execute() will now return a negative error if there was an error prior to command submission; evaluate that instead if checking for DRIVER_ERROR. [mkp: build fix] Link: https://lore.kernel.org/r/20210427083046.31620-6-hare@suse.de Signed-off-by: Hannes Reinecke Signed-off-by: Martin K. Petersen

scsi: scsi_dh_alua: Retry RTPG on a different path after failure

2021-05-22T03:06:29+00:00

If an RTPG fails, we can't infer anything wrt. the state of the ports in the port group except that we were unable to reach the one port on which the RTPG had failed. "offline" is just a secondary port state, which means that we can't infer the state of any port in the PG from the failure (in fact, even the failed port might still be in "active/optimized" primary port access state). Therefore, when we encounter an RTPG failure, we should retry the RTPG on a different port. This avoids falsely setting port states to offline for unreachable ports. To do this, ports on which an RTPG has failed are temporarily set to "disabled" to avoid repeating the failed I/O on the same target port. Once the RTPG has either succeeded on one port or failed on all ports of the PG, the ports are enabled again. Link: https://lore.kernel.org/r/20210514153214.5626-1-mwilck@suse.com Signed-off-by: Martin Wilck Signed-off-by: Hannes Reinecke Signed-off-by: Martin K. Petersen

scsi: core: Introduce enum scsi_disposition

2021-04-16T02:44:40+00:00

Improve readability of the code in the SCSI core by introducing an enumeration type for the values used internally that decide how to continue processing a SCSI command. The eh_*_handler return values have not been changed because that would involve modifying all SCSI drivers. The output of the following command has been inspected to verify that no out-of-range values are assigned to a variable of type enum scsi_disposition: KCFLAGS=-Wassign-enum make CC=clang W=1 drivers/scsi/ Link: https://lore.kernel.org/r/20210415220826.29438-6-bvanassche@acm.org Cc: Christoph Hellwig Cc: Johannes Thumshirn Cc: Hannes Reinecke Cc: Daniel Wagner Signed-off-by: Bart Van Assche Signed-off-by: Martin K. Petersen

scsi: scsi_dh_alua: Remove check for ASC 24h in alua_rtpg()

2021-04-06T03:24:18+00:00

Some arrays return ILLEGAL_REQUEST with ASC 00h if they don't support the RTPG extended header so remove the check for INVALID FIELD IN CDB. Link: https://lore.kernel.org/r/20210331201154.20348-1-emilne@redhat.com Reviewed-by: Hannes Reinecke Signed-off-by: Ewan D. Milne Signed-off-by: Martin K. Petersen