kernel/linux.git/drivers/s390, branch v5.4.50

s390/qeth: fix error handling for isolation mode cmds

2020-06-30T19:36:57+00:00

[ Upstream commit e2dfcfba00ba4a414617ef4c5a8501fe21567eb3 ] Current(?) OSA devices also store their cmd-specific return codes for SET_ACCESS_CONTROL cmds into the top-level cmd->hdr.return_code. So once we added stricter checking for the top-level field a while ago, none of the error logic that rolls back the user's configuration to its old state is applied any longer. For this specific cmd, go back to the old model where we peek into the cmd structure even though the top-level field indicated an error. Fixes: 686c97ee29c8 ("s390/qeth: fix error handling in adapter command callbacks") Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

scsi: zfcp: Fix panic on ERP timeout for previously dismissed ERP action

2020-06-30T19:36:51+00:00

commit 936e6b85da0476dd2edac7c51c68072da9fb4ba2 upstream. Suppose that, for unrelated reasons, FSF requests on behalf of recovery are very slow and can run into the ERP timeout. In the case at hand, we did adapter recovery to a large degree. However due to the slowness a LUN open is pending so the corresponding fc_rport remains blocked. After fast_io_fail_tmo we trigger close physical port recovery for the port under which the LUN should have been opened. The new higher order port recovery dismisses the pending LUN open ERP action and dismisses the pending LUN open FSF request. Such dismissal decouples the ERP action from the pending corresponding FSF request by setting zfcp_fsf_req->erp_action to NULL (among other things) [zfcp_erp_strategy_check_fsfreq()]. If now the ERP timeout for the pending open LUN request runs out, we must not use zfcp_fsf_req->erp_action in the ERP timeout handler. This is a problem since v4.15 commit 75492a51568b ("s390/scsi: Convert timers to use timer_setup()"). Before that we intentionally only passed zfcp_erp_action as context argument to zfcp_erp_timeout_handler(). Note: The lifetime of the corresponding zfcp_fsf_req object continues until a (late) response or an (unrelated) adapter recovery. Just like the regular response path ignores dismissed requests [zfcp_fsf_req_complete() => zfcp_fsf_protstatus_eval() => return early] the ERP timeout handler now needs to ignore dismissed requests. So simply return early in the ERP timeout handler if the FSF request is marked as dismissed in its status flags. To protect against the race where zfcp_erp_strategy_check_fsfreq() dismisses and sets zfcp_fsf_req->erp_action to NULL after our previous status flag check, return early if zfcp_fsf_req->erp_action is NULL. After all, the former ERP action does not need to be woken up as that was already done as part of the dismissal above [zfcp_erp_action_dismiss()]. This fixes the following panic due to kernel page fault in IRQ context: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 0000000000000000 TEID: 0000000000000483 Fault in home space mode while using kernel ASCE. AS:000009859238c00b R2:00000e3e7ffd000b R3:00000e3e7ffcc007 S:00000e3e7ffd7000 P:000000000000013d Oops: 0004 ilc:2 [#1] SMP Modules linked in: ... CPU: 82 PID: 311273 Comm: stress Kdump: loaded Tainted: G E X ... Hardware name: IBM 8561 T01 701 (LPAR) Krnl PSW : 0404c00180000000 001fffff80549be0 (zfcp_erp_notify+0x40/0xc0 [zfcp]) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000080 00000e3d00000000 00000000000000f0 0000000000030000 000000010028e700 000000000400a39c 000000010028e700 00000e3e7cf87e02 0000000010000000 0700098591cb67f0 0000000000000000 0000000000000000 0000033840e9a000 0000000000000000 001fffe008d6bc18 001fffe008d6bbc8 Krnl Code: 001fffff80549bd4: a7180000 lhi %r1,0 001fffff80549bd8: 4120a0f0 la %r2,240(%r10) #001fffff80549bdc: a53e0003 llilh %r3,3 >001fffff80549be0: ba132000 cs %r1,%r3,0(%r2) 001fffff80549be4: a7740037 brc 7,1fffff80549c52 001fffff80549be8: e320b0180004 lg %r2,24(%r11) 001fffff80549bee: e31020e00004 lg %r1,224(%r2) 001fffff80549bf4: 412020e0 la %r2,224(%r2) Call Trace: [<001fffff80549be0>] zfcp_erp_notify+0x40/0xc0 [zfcp] [<00000985915e26f0>] call_timer_fn+0x38/0x190 [<00000985915e2944>] expire_timers+0xfc/0x190 [<00000985915e2ac4>] run_timer_softirq+0xec/0x218 [<0000098591ca7c4c>] __do_softirq+0x144/0x398 [<00000985915110aa>] do_softirq_own_stack+0x72/0x88 [<0000098591551b58>] irq_exit+0xb0/0xb8 [<0000098591510c6a>] do_IRQ+0x82/0xb0 [<0000098591ca7140>] ext_int_handler+0x128/0x12c [<0000098591722d98>] clear_subpage.constprop.13+0x38/0x60 ([<000009859172ae4c>] clear_huge_page+0xec/0x250) [<000009859177e7a2>] do_huge_pmd_anonymous_page+0x32a/0x768 [<000009859172a712>] __handle_mm_fault+0x88a/0x900 [<000009859172a860>] handle_mm_fault+0xd8/0x1b0 [<0000098591529ef6>] do_dat_exception+0x136/0x3e8 [<0000098591ca6d34>] pgm_check_handler+0x1c8/0x220 Last Breaking-Event-Address: [<001fffff80549c88>] zfcp_erp_timeout_handler+0x10/0x18 [zfcp] Kernel panic - not syncing: Fatal exception in interrupt Link: https://lore.kernel.org/r/20200623140242.98864-1-maier@linux.ibm.com Fixes: 75492a51568b ("s390/scsi: Convert timers to use timer_setup()") Cc: #4.15+ Reviewed-by: Julian Wiedmann Signed-off-by: Steffen Maier Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman

s390/qdio: put thinint indicator after early error

2020-06-24T15:50:22+00:00

[ Upstream commit 75e82bec6b2622c6f455b7a543fb5476a5d0eed7 ] qdio_establish() calls qdio_setup_thinint() via qdio_setup_irq(). If the subsequent qdio_establish_thinint() fails, we miss to put the DSCI again. Thus the DSCI isn't available for re-use. Given enough of such errors, we could end up with having only the shared DSCI available. Merge qdio_setup_thinint() into qdio_establish_thinint(), and deal with such an error internally. Fixes: 779e6e1c724d ("[S390] qdio: new qdio driver.") Signed-off-by: Julian Wiedmann Reviewed-by: Benjamin Block Signed-off-by: Vasily Gorbik Signed-off-by: Sasha Levin

s390/ism: fix error return code in ism_probe()

2020-05-20T06:20:26+00:00

[ Upstream commit 29b74cb75e3572d83708745e81e24d37837415f9 ] Fix to return negative error code -ENOMEM from the smcd_alloc_dev() error handling case instead of 0, as done elsewhere in this function. Fixes: 684b89bc39ce ("s390/ism: add device driver for internal shared memory") Reported-by: Hulk Robot Signed-off-by: Wei Yongjun Signed-off-by: Ursula Braun Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

s390/cio: avoid duplicated 'ADD' uevents

2020-04-29T14:33:01+00:00

[ Upstream commit 05ce3e53f375295c2940390b2b429e506e07655c ] The common I/O layer delays the ADD uevent for subchannels and delegates generating this uevent to the individual subchannel drivers. The io_subchannel driver will do so when the associated ccw_device has been registered -- but unconditionally, so more ADD uevents will be generated if a subchannel has been unbound from the io_subchannel driver and later rebound. To fix this, only generate the ADD event if uevents were still suppressed for the device. Fixes: fa1a8c23eb7d ("s390: cio: Delay uevents for subchannels") Message-Id: <20200327124503.9794-2-cohuck@redhat.com> Reported-by: Boris Fiuczynski Reviewed-by: Peter Oberparleiter Reviewed-by: Boris Fiuczynski Signed-off-by: Cornelia Huck Signed-off-by: Vasily Gorbik Signed-off-by: Sasha Levin

s390/cio: generate delayed uevent for vfio-ccw subchannels

2020-04-29T14:33:01+00:00

[ Upstream commit 2bc55eaeb88d30accfc1b6ac2708d4e4b81ca260 ] The common I/O layer delays the ADD uevent for subchannels and delegates generating this uevent to the individual subchannel drivers. The vfio-ccw I/O subchannel driver, however, did not do that, and will not generate an ADD uevent for subchannels that had not been bound to a different driver (or none at all, which also triggers the uevent). Generate the ADD uevent at the end of the probe function if uevents were still suppressed for the device. Message-Id: <20200327124503.9794-3-cohuck@redhat.com> Fixes: 63f1934d562d ("vfio: ccw: basic implementation for vfio_ccw driver") Reviewed-by: Eric Farman Signed-off-by: Cornelia Huck Signed-off-by: Vasily Gorbik Signed-off-by: Sasha Levin

scsi: zfcp: fix missing erp_lock in port recovery trigger for point-to-point

2020-04-17T08:50:18+00:00

commit 819732be9fea728623e1ed84eba28def7384ad1f upstream. v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports") introduced zfcp automatic port scan. Before that, the user had to use the sysfs attribute "port_add" of an FCP device (adapter) to add and open remote (target) ports, even for the remote peer port in point-to-point topology. That code path did a proper port open recovery trigger taking the erp_lock. Since above commit, a new helper function zfcp_erp_open_ptp_port() performed an UNlocked port open recovery trigger. This can race with other parallel recovery triggers. In zfcp_erp_action_enqueue() this could corrupt e.g. adapter->erp_total_count or adapter->erp_ready_head. As already found for fabric topology in v4.17 commit fa89adba1941 ("scsi: zfcp: fix infinite iteration on ERP ready list"), there was an endless loop during tracing of rport (un)block. A subsequent v4.18 commit 9e156c54ace3 ("scsi: zfcp: assert that the ERP lock is held when tracing a recovery trigger") introduced a lockdep assertion for that case. As a side effect, that lockdep assertion now uncovered the unlocked code path for PtP. It is from within an adapter ERP action: zfcp_erp_strategy[1479] intentionally DROPs erp lock around zfcp_erp_strategy_do_action() zfcp_erp_strategy_do_action[1441] NO erp lock zfcp_erp_adapter_strategy[876] NO erp lock zfcp_erp_adapter_strategy_open[855] NO erp lock zfcp_erp_adapter_strategy_open_fsf[806]NO erp lock zfcp_erp_adapter_strat_fsf_xconf[772] erp lock only around zfcp_erp_action_to_running(), BUT *_not_* around zfcp_erp_enqueue_ptp_port() zfcp_erp_enqueue_ptp_port[728] BUG: *_not_* taking erp lock _zfcp_erp_port_reopen[432] assumes to be called with erp lock zfcp_erp_action_enqueue[314] assumes to be called with erp lock zfcp_dbf_rec_trig[288] _checks_ to be called with erp lock: lockdep_assert_held(&adapter->erp_lock); It causes the following lockdep warning: WARNING: CPU: 2 PID: 775 at drivers/s390/scsi/zfcp_dbf.c:288 zfcp_dbf_rec_trig+0x16a/0x188 no locks held by zfcperp0.0.17c0/775. Fix this by using the proper locked recovery trigger helper function. Link: https://lore.kernel.org/r/20200312174505.51294-2-maier@linux.ibm.com Fixes: cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports") Cc: #v2.6.27+ Reviewed-by: Jens Remus Reviewed-by: Benjamin Block Signed-off-by: Steffen Maier Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman

s390/qeth: handle error when backing RX buffer

2020-04-01T09:01:54+00:00

[ Upstream commit 17413852804d7e86e6f0576cca32c1541817800e ] qeth_init_qdio_queues() fills the RX ring with an initial set of RX buffers. If qeth_init_input_buffer() fails to back one of the RX buffers with memory, we need to bail out and report the error. Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

s390/qeth: don't reset default_out_queue

2020-04-01T09:01:54+00:00

[ Upstream commit 240c1948491b81cfe40f84ea040a8f2a4966f101 ] When an OSA device in prio-queue setup is reduced to 1 TX queue due to HW restrictions, we reset its the default_out_queue to 0. In the old code this was needed so that qeth_get_priority_queue() gets the queue selection right. But with proper multiqueue support we already reduced dev->real_num_tx_queues to 1, and so the stack puts all traffic on txq 0 without even calling .ndo_select_queue. Thus we can preserve the user's configuration, and apply it if the OSA device later re-gains support for multiple TX queues. Fixes: 73dc2daf110f ("s390/qeth: add TX multiqueue support for OSA devices") Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

s390/dasd: fix data corruption for thin provisioned devices

2020-03-18T06:17:52+00:00

commit 5e6bdd37c5526ef01326df5dabb93011ee89237e upstream. Devices are formatted in multiple of tracks. For an Extent Space Efficient (ESE) volume we get errors when accessing unformatted tracks. In this case the driver either formats the track on the flight for write requests or returns zero data for read requests. In case a request spans multiple tracks, the indication of an unformatted track presented for the first track is incorrectly applied to all tracks covered by the request. As a result, tracks containing data will be handled as empty, resulting in zero data being returned on read, or overwriting existing data with zero on write. Fix by determining the track that gets the NRF error. For write requests only format the track that is surely not formatted. For Read requests all tracks before have returned valid data and should not be touched. All tracks after the unformatted track might be formatted or not. Those are returned to the blocklayer to build a new request. When using alias devices there is a chance that multiple write requests trigger a format of the same track which might lead to data loss. Ensure that a track is formatted only once by maintaining a list of currently processed tracks. Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes") Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Stefan Haberland Reviewed-by: Jan Hoeppner Reviewed-by: Peter Oberparleiter Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman