summaryrefslogtreecommitdiff
path: root/drivers/scsi
AgeCommit message (Collapse)AuthorFilesLines
2018-11-21SCSI: fix queue cleanup race before queue initialization is doneMing Lei1-0/+8
commit 8dc765d438f1e42b3e8227b3b09fad7d73f4ec9a upstream. c2856ae2f315d ("blk-mq: quiesce queue before freeing queue") has already fixed this race, however the implied synchronize_rcu() in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused performance regression. Then 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") tried to quiesce queue for avoiding unnecessary synchronize_rcu() only when queue initialization is done, because it is usual to see lots of inexistent LUNs which need to be probed. However, turns out it isn't safe to quiesce queue only when queue initialization is done. Because when one SCSI command is completed, the user of sending command can be waken up immediately, then the scsi device may be removed, meantime the run queue in scsi_end_request() is still in-progress, so kernel panic can be caused. In Red Hat QE lab, there are several reports about this kind of kernel panic triggered during kernel booting. This patch tries to address the issue by grabing one queue usage counter during freeing one request and the following run queue. Fixes: 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") Cc: Andrew Jones <drjones@redhat.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Cc: stable <stable@vger.kernel.org> Cc: jianchao.wang <jianchao.w.wang@oracle.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21scsi: qla2xxx: Initialize port speed to avoid setting lower speedQuinn Tran1-0/+1
commit f635e48e866ee1a47d2d42ce012fdcc07bf55853 upstream. This patch initializes port speed so that firmware does not set lower operating speed. Setting lower speed in firmware impacts WRITE perfomance. Fixes: 726b85487067 ("qla2xxx: Add framework for async fabric discovery") Cc: <stable@vger.kernel.org> Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21scsi: qla2xxx: Fix NVMe session hang on unloadQuinn Tran1-1/+1
commit f7d61c995df74d6bb57bbff6a2b7b1874c4a2baa upstream. Send aborts only when chip is active. Fixes: 623ee824e579 ("scsi: qla2xxx: Fix FC-NVMe IO abort during driver reset") Cc: <stable@vger.kernel.org> # 4.14 Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21scsi: qla2xxx: Fix re-using LoopID when handle is in useQuinn Tran2-15/+6
commit 5c6400536481d9ef44ef94e7bf2c7b8e81534db7 upstream. This patch fixes issue where driver clears NPort ID map instead of marking handle in use. Once driver clears NPort ID from the database, it can reuse the same NPort ID resulting in a PLOGI failure. [mkp: fixed Himanshu's SoB] Fixes: a084fd68e1d2 ("scsi: qla2xxx: Fix re-login for Nport Handle in use") Cc: <stable@vger.kernel.org> Signed-of-by: Quinn Tran <quinn.tran@cavium.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Himanshu Madhani <hmadhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21scsi: qla2xxx: Fix driver hang when FC-NVMe LUNs are configuredHimanshu Madhani1-3/+0
commit 39553065f77c297239308470ee313841f4e07db4 upstream. This patch fixes multiple call for qla_nvme_unregister_remote_port() as part of qlt_schedule_session_for_deletion(), Do not call it again during qla_nvme_delete() Fixes: e473b3074104 ("scsi: qla2xxx: Add FC-NVMe abort processing") Cc: <stable@vger.kernel.org> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21scsi: qla2xxx: Fix duplicate switch database entriesQuinn Tran1-3/+3
commit 732ee9a912cf2d9a50c5f9c4213cdc2f885d6aa6 upstream. The response data buffer used in switch scan is reused 4 times. (For example, for commands GPN_FT, GNN_FT for FCP and FC-NVME) Before driver reuses this buffer, clear it to prevent duplicate entries in our database. Fixes: a4239945b8ad1 ("scsi: qla2xxx: Add switch command to simplify fabric discovery" Cc: <stable@vger.kernel.org> Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21scsi: qla2xxx: shutdown chip if reset failQuinn Tran1-1/+1
commit 1e4ac5d6fe0a4af17e4b6251b884485832bf75a3 upstream. If chip unable to fully initialize, use full shutdown sequence to clear out any stale FW state. Fixes: e315cd28b9ef ("[SCSI] qla2xxx: Code changes for qla data structure refactoring") Cc: stable@vger.kernel.org #4.10 Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21scsi: qla2xxx: Remove stale debug trace message from tcm_qla2xxxQuinn Tran1-4/+0
commit 7c388f91ec1a59b0ed815b07b90536e2d57e1e1f upstream. Remove stale debug trace. Fixes: 1eb42f965ced ("qla2xxx: Make trace flags more readable") Cc: stable@vger.kernel.org #4.10 Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21scsi: qla2xxx: Fix process response queue for ISP26XX and aboveQuinn Tran2-19/+0
commit b86ac8fd4b2f6ec2f9ca9194c56eac12d620096f upstream. This patch improves performance for 16G and above adapter by removing additional call to process_response_queue(). [mkp: typo] Cc: <stable@vger.kernel.org> Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21scsi: qla2xxx: Fix incorrect port speed being set for FC adaptersHimanshu Madhani1-4/+1
commit 4c1458df9635c7e3ced155f594d2e7dfd7254e21 upstream. Fixes: 6246b8a1d26c7c ("[SCSI] qla2xxx: Enhancements to support ISP83xx.") Fixes: 1bb395485160d2 ("qla2xxx: Correct iiDMA-update calling conventions.") Cc: <stable@vger.kernel.org> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13scsi: lpfc: Correct race with abort on completion pathJames Smart1-3/+11
[ Upstream commit ca7fb76e091f889cfda1287c07a9358f73832b39 ] On io completion, the driver is taking an adapter wide lock and nulling the scsi command back pointer. The nulling of the back pointer is to signify the io was completed and the scsi_done() routine was called. However, the routine makes no check to see if the abort routine had done the same thing and possibly nulled the pointer. Thus it may doubly-complete the io. Make the following mods: - Check to make sure forward progress (call scsi_done()) only happens if the command pointer was non-null. - As the taking of the lock, which is adapter wide, is very costly on a system under load, null the pointer using an xchg operation rather than under lock. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13scsi: lpfc: Correct soft lockup when running mds diagnosticsJames Smart1-0/+7
[ Upstream commit 0ef01a2d95fd62bb4f536e7ce4d5e8e74b97a244 ] When running an mds diagnostic that passes frames with the switch, soft lockups are detected. The driver is in a CQE processing loop and has sufficient amount of traffic that it never exits the ring processing routine, thus the "lockup". Cap the number of elements in the work processing routine to 64 elements. This ensures that the cpu will be given up and the handler reschedule to process additional items. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13scsi: megaraid_sas: fix a missing-check bugWenwen Wang1-0/+3
[ Upstream commit 47db7873136a9c57c45390a53b57019cf73c8259 ] In megasas_mgmt_compat_ioctl_fw(), to handle the structure compat_megasas_iocpacket 'cioc', a user-space structure megasas_iocpacket 'ioc' is allocated before megasas_mgmt_ioctl_fw() is invoked to handle the packet. Since the two data structures have different fields, the data is copied from 'cioc' to 'ioc' field by field. In the copy process, 'sense_ptr' is prepared if the field 'sense_len' is not null, because it will be used in megasas_mgmt_ioctl_fw(). To prepare 'sense_ptr', the user-space data 'ioc->sense_off' and 'cioc->sense_off' are copied and saved to kernel-space variables 'local_sense_off' and 'user_sense_off' respectively. Given that 'ioc->sense_off' is also copied from 'cioc->sense_off', 'local_sense_off' and 'user_sense_off' should have the same value. However, 'cioc' is in the user space and a malicious user can race to change the value of 'cioc->sense_off' after it is copied to 'ioc->sense_off' but before it is copied to 'user_sense_off'. By doing so, the attacker can inject different values into 'local_sense_off' and 'user_sense_off'. This can cause undefined behavior in the following execution, because the two variables are supposed to be same. This patch enforces a check on the two kernel variables 'local_sense_off' and 'user_sense_off' to make sure they are the same after the copy. In case they are not, an error code EINVAL will be returned. Signed-off-by: Wenwen Wang <wang6495@umn.edu> Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13scsi: ufs: Schedule clk gating work on correct queueEvan Green1-2/+3
[ Upstream commit f4bb7704699beee9edfbee875daa9089c86cf724 ] With commit 10e5e37581fc ("scsi: ufs: Add clock ungating to a separate workqueue"), clock gating work was moved to a separate work queue with WQ_MEM_RECLAIM set, since clock gating could occur from a memory reclaim context. Unfortunately, clk_gating.gate_work was left queued via schedule_delayed_work, which is a system workqueue that does not have WQ_MEM_RECLAIM set. Because ufshcd_ungate_work attempts to cancel gate_work, the following warning appears: [ 14.174170] workqueue: WQ_MEM_RECLAIM ufs_clk_gating_0:ufshcd_ungate_work is flushing !WQ_MEM_RECLAIM events:ufshcd_gate_work [ 14.174179] WARNING: CPU: 4 PID: 173 at kernel/workqueue.c:2440 check_flush_dependency+0x110/0x118 [ 14.205725] CPU: 4 PID: 173 Comm: kworker/u16:3 Not tainted 4.14.68 #1 [ 14.212437] Hardware name: Google Cheza (rev1) (DT) [ 14.217459] Workqueue: ufs_clk_gating_0 ufshcd_ungate_work [ 14.223107] task: ffffffc0f6a40080 task.stack: ffffff800a490000 [ 14.229195] PC is at check_flush_dependency+0x110/0x118 [ 14.234569] LR is at check_flush_dependency+0x110/0x118 [ 14.239944] pc : [<ffffff80080cad14>] lr : [<ffffff80080cad14>] pstate: 60c001c9 [ 14.333050] Call trace: [ 14.427767] [<ffffff80080cad14>] check_flush_dependency+0x110/0x118 [ 14.434219] [<ffffff80080cafec>] start_flush_work+0xac/0x1fc [ 14.440046] [<ffffff80080caeec>] flush_work+0x40/0x94 [ 14.445246] [<ffffff80080cb288>] __cancel_work_timer+0x11c/0x1b8 [ 14.451433] [<ffffff80080cb4b8>] cancel_delayed_work_sync+0x20/0x30 [ 14.457886] [<ffffff80085b9294>] ufshcd_ungate_work+0x24/0xd0 [ 14.463800] [<ffffff80080cfb04>] process_one_work+0x32c/0x690 [ 14.469713] [<ffffff80080d0154>] worker_thread+0x218/0x338 [ 14.475361] [<ffffff80080d527c>] kthread+0x120/0x130 [ 14.480470] [<ffffff8008084814>] ret_from_fork+0x10/0x18 The simple solution is to put the gate_work on the same WQ_MEM_RECLAIM work queue as the ungate_work. Fixes: 10e5e37581fc ("scsi: ufs: Add clock ungating to a separate workqueue") Signed-off-by: Evan Green <evgreen@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13scsi: esp_scsi: Track residual for PIO transfersFinn Thain3-0/+5
[ Upstream commit fd47d919d0c336e7c22862b51ee94927ffea227a ] If a target disconnects during a PIO data transfer the command may fail when the target reconnects: scsi host1: DMA length is zero! scsi host1: cur adr[04380000] len[00000000] The scsi bus is then reset. This happens because the residual reached zero before the transfer was completed. The usual residual calculation relies on the Transfer Count registers. That works for DMA transfers but not for PIO transfers. Fix the problem by storing the PIO transfer residual and using that to correctly calculate bytes_sent. Fixes: 6fe07aaffbf0 ("[SCSI] m68k: new mac_esp scsi driver") Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-04scsi: qedi: Initialize the stats mutex lockNilesh Javali1-0/+1
[ Upstream commit 3cc5746e5ad7688e274e193fa71278d98aa52759 ] Fix kernel NULL pointer dereference, Call Trace: [<ffffffff9b7658e6>] __mutex_lock_slowpath+0xa6/0x1d0 [<ffffffff9b764cef>] mutex_lock+0x1f/0x2f [<ffffffffc061b5e1>] qedi_get_protocol_tlv_data+0x61/0x450 [qedi] [<ffffffff9b1f9d8e>] ? map_vm_area+0x2e/0x40 [<ffffffff9b1fc370>] ? __vmalloc_node_range+0x170/0x280 [<ffffffffc0b81c3d>] ? qed_mfw_process_tlv_req+0x27d/0xbd0 [qed] [<ffffffffc0b6461b>] qed_mfw_fill_tlv_data+0x4b/0xb0 [qed] [<ffffffffc0b81c59>] qed_mfw_process_tlv_req+0x299/0xbd0 [qed] [<ffffffff9b02a59e>] ? __switch_to+0xce/0x580 [<ffffffffc0b61e5b>] qed_slowpath_task+0x5b/0x80 [qed] Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-10-20scsi: sd: don't crash the host on invalid commandsJohannes Thumshirn1-1/+2
[ Upstream commit f1f1fadacaf08b7cf11714c0c29f8fa4d4ef68a9 ] When sd_init_command() get's a command with a unknown req_op() it crashes the system via BUG(). This makes debugging the actual reason for the broken request cmd_flags pretty hard as the system is down before it's able to write out debugging data on the serial console or the trace buffer. Change the BUG() to a WARN_ON() and return BLKPREP_KILL to fail gracefully and return an I/O error to the producer of the request. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20scsi: ipr: System hung while dlpar adding primary ipr adapter backWen Xiong2-45/+62
[ Upstream commit 318ddb34b2052f838aa243d07173e2badf3e630e ] While dlpar adding primary ipr adapter back, driver goes through adapter initialization then schedule ipr_worker_thread to start te disk scan by dropping the host lock, calling scsi_add_device. Then get the adapter reset request again, so driver does scsi_block_requests, this will cause the scsi_add_device get hung until we unblock. But we can't run ipr_worker_thread to do the unblock because its stuck in scsi_add_device. This patch fixes the issue. [mkp: typo and whitespace fixes] Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20scsi: lpfc: Synchronize access to remoteport via rportJames Smart3-13/+23
[ Upstream commit 9e210178267b80c4eeb832fade7e146a18c84915 ] The driver currently uses the ndlp to get the local rport which is then used to get the nvme transport remoteport pointer. There can be cases where a stale remoteport pointer is obtained as synchronization isn't done through the different dereferences. Correct by using locks to synchronize the dereferences. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20scsi: ibmvscsis: Ensure partition name is properly NUL terminatedLaura Abbott1-1/+1
[ Upstream commit adad633af7b970bfa5dd1b624a4afc83cac9b235 ] While reviewing another part of the code, Kees noticed that the strncpy of the partition name might not always be NUL terminated. Switch to using strscpy which does this safely. Reported-by: Kees Cook <keescook@chromium.org> Signed-off-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20scsi: ibmvscsis: Fix a stringop-overflow warningLaura Abbott1-2/+1
[ Upstream commit d792d4c4fc866ae224b0b0ca2aabd87d23b4d6cc ] There's currently a warning about string overflow with strncat: drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c: In function 'ibmvscsis_probe': drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c:3479:2: error: 'strncat' specified bound 64 equals destination size [-Werror=stringop-overflow=] strncat(vscsi->eye, vdev->name, MAX_EYE); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Switch to a single snprintf instead of a strcpy + strcat to handle this cleanly. Signed-off-by: Laura Abbott <labbott@redhat.com> Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18scsi: qla2xxx: Fix an endian bug in fcpcmd_is_corrupted()Dan Carpenter1-2/+2
[ Upstream commit cbe3fd39d223f14b1c60c80fe9347a3dd08c2edb ] We should first do the le16_to_cpu endian conversion and then apply the FCP_CMD_LENGTH_MASK mask. Fixes: 5f35509db179 ("qla2xxx: Terminate exchange if corrupted") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Quinn Tran <Quinn.Tran@cavium.com> Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10scsi: qedi: Add the CRC size within iSCSI NVM imageNilesh Javali2-14/+21
[ Upstream commit c77a2fa3ff8f73d1a485e67e6f81c64823739d59 ] The QED driver commit, 1ac4329a1cff ("qed: Add configuration information to register dump and debug data"), removes the CRC length validation causing nvm_get_image failure while loading qedi driver: [qed_mcp_get_nvm_image:2700(host_10-0)]Image [0] is too big - 00006008 bytes where only 00006004 are available [qedi_get_boot_info:2253]:10: Could not get NVM image. ret = -12 Hence add and adjust the CRC size to iSCSI NVM image to read boot info at qedi load time. Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10scsi: aacraid: fix a signedness bugDan Carpenter1-1/+1
[ Upstream commit b9eb3b14f1dbf16bf27b6c1ffe6b8c00ec945c9b ] The problem is that ->reset_state is a u8 but it can be set to -1 or -2 in aac_tmf_callback() and the error handling in aac_eh_target_reset() relies on it to be signed. [mkp: fixed typo] Fixes: 0d643ff3c353 ("scsi: aacraid: use aac_tmf_callback for reset fib") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10scsi: csiostor: fix incorrect port capabilitiesVarun Prakash3-14/+48
[ Upstream commit 68bdc630721c40e908d22cffe07b5ca225a69f6e ] - use be32_to_cpu() instead of ntohs() for 32 bit port capabilities. - add a new function fwcaps32_to_caps16() to convert 32 bit port capabilities to 16 bit port capabilities. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10scsi: csiostor: add a check for NULL pointer after kmalloc()Varun Prakash1-7/+9
[ Upstream commit 89809b028b6f54187b7d81a0c69b35d394c52e62 ] Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-04scsi: megaraid_sas: Update controller info during resumeShivasharan S1-0/+3
[ Upstream commit c3b10a55abc943a526aaecd7e860b15671beb906 ] There is a possibility that firmware on the controller was upgraded before system was suspended. During resume, driver needs to read updated controller properties. Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-04scsi: hisi_sas: Fix the conflict between dev gone and host resetXiaofei Tan2-0/+7
[ Upstream commit d2fc401e47529d9ffd2673a5395d56002e31ad98 ] There is a possible conflict when a device is removed and host reset occurs concurrently. The reason is that then the device is notified as gone, we try to clear the ITCT, which is notified via an interrupt. The dev gone function pends on this event with a completion, which is completed when the ITCT interrupt occurs. But host reset will disable all interrupts, the wait_for_completion() may wait indefinitely. This patch adds an semaphore to synchronise this two processes. The semaphore is taken by the host reset as the basis of synchronising. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-04scsi: bnx2i: add error handling for ioremap_nocacheZhouyang Jia1-0/+2
[ Upstream commit aa154ea885eb0c2407457ce9c1538d78c95456fa ] When ioremap_nocache fails, the lack of error-handling code may cause unexpected results. This patch adds error-handling code after calling ioremap_nocache. Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-04scsi: ibmvscsi: Improve strings handlingBreno Leitao1-2/+2
[ Upstream commit 1262dc09dc9ae7bf4ad00b6a2c5ed6a6936bcd10 ] Currently an open firmware property is copied into partition_name variable without keeping a room for \0. Later one, this variable (partition_name), which is 97 bytes long, is strncpyed into ibmvcsci_host_data->madapter_info->partition_name, which is 96 bytes long, possibly truncating it 'again' and removing the \0. This patch simply decreases the partition name to 96 and just copy using strlcpy() which guarantees that the string is \0 terminated. I think there is no issue if this there is a truncation in this very first copy, i.e, when the open firmware property is read and copied into the driver for the very first time; This issue also causes the following warning on GCC 8: drivers/scsi/ibmvscsi/ibmvscsi.c:281:2: warning: strncpy output may be truncated copying 96 bytes from a string of length 96 [-Wstringop-truncation] ... inlined from ibmvscsi_probe at drivers/scsi/ibmvscsi/ibmvscsi.c:2221:7: drivers/scsi/ibmvscsi/ibmvscsi.c:265:3: warning: strncpy specified bound 97 equals destination size [-Wstringop-truncation] CC: Bart Van Assche <bart.vanassche@wdc.com> CC: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26scsi: lpfc: Fix panic if driver unloaded when port is offlineJames Smart1-1/+4
[ Upstream commit d580c6137476ab307a66e278cf7dbc666230f714 ] System crashes when the lpfc module is unloaded after making the port offline The nvme queue pointers were freed during port offline, but were later accessed in pci remove path. Validate the pointers in pci remove path before accessing them. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26scsi: lpfc: Fix NVME Target crash in defer rcv logicJames Smart1-1/+11
[ Upstream commit 6871e8144f935a1f08e7fc6269c894861ce494aa ] Kernel occasionally crashed with the following ops on NVME Target: BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 IP: [<ffffffffa042ee50>] lpfc_nvmet_defer_rcv+0x50/0x70 [lpfc] Callback routine was called for deferred rcv when it should be treated as a normal rcv. Added code in callback routine to detect this condition and log a message, then bail. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26scsi: libfc: fixup 'sleeping function called from invalid context'Hannes Reinecke1-3/+4
[ Upstream commit fa519f701d27198a2858bb108fc18ea9d8c106a7 ] fc_rport_login() will be calling mutex_lock() while running inside an RCU-protected section, triggering the warning 'sleeping function called from invalid context'. To fix this we can drop the rcu functions here altogether as the disc mutex protecting the list itself is already held, preventing any list manipulation. Fixes: a407c593398c ("scsi: libfc: Fixup disc_mutex handling") Signed-off-by: Hannes Reinecke <hare@suse.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19scsi: 3ware: fix return 0 on the error path of probeAnton Vasilyev3-1/+10
[ Upstream commit 4dc98c1995482262e70e83ef029135247fafe0f2 ] tw_probe() returns 0 in case of fail of tw_initialize_device_extension(), pci_resource_start() or tw_reset_sequence() and releases resources. twl_probe() returns 0 in case of fail of twl_initialize_device_extension(), pci_iomap() and twl_reset_sequence(). twa_probe() returns 0 in case of fail of tw_initialize_device_extension(), ioremap() and twa_reset_sequence(). The patch adds retval initialization for these cases. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru> Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19scsi: lpfc: Fix driver crash when re-registering NVME rports.James Smart1-8/+14
[ Upstream commit 93a3922da428ec0752e8b2ab00c42dadbbf805a9 ] During remote port loss fault testing, the driver crashed with the following trace: general protection fault: 0000 [#1] SMP RIP: ... lpfc_nvme_register_port+0x250/0x480 [lpfc] Call Trace: lpfc_nlp_state_cleanup+0x1b3/0x7a0 [lpfc] lpfc_nlp_set_state+0xa6/0x1d0 [lpfc] lpfc_cmpl_prli_prli_issue+0x213/0x440 lpfc_disc_state_machine+0x7e/0x1e0 [lpfc] lpfc_cmpl_els_prli+0x18a/0x200 [lpfc] lpfc_sli_sp_handle_rspiocb+0x3b5/0x6f0 [lpfc] lpfc_sli_handle_slow_ring_event_s4+0x161/0x240 [lpfc] lpfc_work_done+0x948/0x14c0 [lpfc] lpfc_do_work+0x16f/0x180 [lpfc] kthread+0xc9/0xe0 ret_from_fork+0x55/0x80 After registering a new remoteport, the driver is pulling an ndlp pointer from the lpfc rport associated with the private area of a newly registered remoteport. The private area is uninitialized, so it's garbage. Correct by pulling the the lpfc rport pointer from the entering ndlp point, then ndlp value from at rport. Note the entering ndlp may be replacing by the rport->ndlp due to an address change swap. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19scsi: qla2xxx: Silent erroneous messageQuinn Tran1-0/+9
[ Upstream commit 3f915271b12e11183c606bed1c3dfff0983662d3 ] Driver uses shadow pointer instead of Mirror pointer for firmware dump collection. Skip those entries for Mirror pointers for Request/Response queue from firmware dump template reading. Following messages are printed in log messages: qla27xx_fwdt_entry_t268: unknown buffer 4 qla27xx_fwdt_entry_t268: unknown buffer 5 This patch fixes these error messages by adding skip_entry() to not read them from template. Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19scsi: qla2xxx: Fix session state stuck in Get Port DBQuinn Tran1-3/+6
[ Upstream commit 8fde6977ac478c00eeb2beccfdd4a6ad44219f6c ] This patch sets discovery state back to GNL (Get Name List) when session is stuck at GPDB (Get Port DataBase). This will allow state machine to retry login and move session state ahead in discovery. Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19scsi: qla2xxx: Fix unintended LogoutQuinn Tran1-1/+2
[ Upstream commit cb97f2c2e8d9f8c71ddbf04ad57e163ee6d86474 ] During normal IO, FW can return IO with 'port unavailble' status. Driver would send a LOGO to remote port for session resync. On an off chance, a PLOGI could arrive before sending the LOGO. This patch will skip sendiing LOGO if a PLOGI just came in. Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19scsi: lpfc: Correct MDS diag and nvmet configurationJames Smart1-1/+1
commit 53e13ee087a80e8d4fc95436318436e5c2c1f8c2 upstream. A recent change added some MDS processing in the lpfc_drain_txq routine that relies on the fcp_wq being allocated. For nvmet operation the fcp_wq is not allocated because it can only be an nvme-target. When the original MDS support was added LS_MDS_LOOPBACK was defined wrong, (0x16) it should have been 0x10 (decimal value used for hex setting). This incorrect value allowed MDS_LOOPBACK to be set simultaneously with LS_NPIV_FAB_SUPPORTED, causing the driver to crash when it accesses the non-existent fcp_wq. Correct the bad value setting for LS_MDS_LOOPBACK. Fixes: ae9e28f36a6c ("lpfc: Add MDS Diagnostic support.") Cc: <stable@vger.kernel.org> # v4.12+ Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Tested-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-15scsi: aic94xx: fix an error code in aic94xx_init()Dan Carpenter1-1/+3
[ Upstream commit 0756c57bce3d26da2592d834d8910b6887021701 ] We accidentally return success instead of -ENOMEM on this error path. Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05scsi: core: Avoid that SCSI device removal through sysfs triggers a deadlockBart Van Assche1-2/+18
commit 0ee223b2e1f67cb2de9c0e3247c510d846e74d63 upstream. A long time ago the unfortunate decision was taken to add a self-deletion attribute to the sysfs SCSI device directory. That decision was unfortunate because self-deletion is really tricky. We can't drop that attribute because widely used user space software depends on it, namely the rescan-scsi-bus.sh script. Hence this patch that avoids that writing into that attribute triggers a deadlock. See also commit 7973cbd9fbd9 ("[PATCH] add sysfs attributes to scan and delete scsi_devices"). This patch avoids that self-removal triggers the following deadlock: ====================================================== WARNING: possible circular locking dependency detected 4.18.0-rc2-dbg+ #5 Not tainted ------------------------------------------------------ modprobe/6539 is trying to acquire lock: 000000008323c4cd (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x45/0x90 but task is already holding lock: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&shost->scan_mutex){+.+.}: __mutex_lock+0xfe/0xc70 mutex_lock_nested+0x1b/0x20 scsi_remove_device+0x26/0x40 [scsi_mod] sdev_store_delete+0x27/0x30 [scsi_mod] dev_attr_store+0x3e/0x50 sysfs_kf_write+0x87/0xa0 kernfs_fop_write+0x190/0x230 __vfs_write+0xd2/0x3b0 vfs_write+0x101/0x270 ksys_write+0xab/0x120 __x64_sys_write+0x43/0x50 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (kn->count#202){++++}: lock_acquire+0xd2/0x260 __kernfs_remove+0x424/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x82/0xb0 device_del+0x251/0x580 __scsi_remove_device+0x19f/0x1d0 [scsi_mod] scsi_forget_host+0x37/0xb0 [scsi_mod] scsi_remove_host+0x9b/0x150 [scsi_mod] sdebug_driver_remove+0x4b/0x150 [scsi_debug] device_release_driver_internal+0x241/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1bc/0x290 device_del+0x259/0x580 device_unregister+0x1a/0x70 sdebug_remove_adapter+0x8b/0xf0 [scsi_debug] scsi_debug_exit+0x76/0xe8 [scsi_debug] __x64_sys_delete_module+0x1c1/0x280 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&shost->scan_mutex); lock(kn->count#202); lock(&shost->scan_mutex); lock(kn->count#202); *** DEADLOCK *** 2 locks held by modprobe/6539: #0: 00000000efaf9298 (&dev->mutex){....}, at: device_release_driver_internal+0x68/0x360 #1: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod] stack backtrace: CPU: 10 PID: 6539 Comm: modprobe Not tainted 4.18.0-rc2-dbg+ #5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_circular_bug.isra.34+0x213/0x221 __lock_acquire+0x1a7e/0x1b50 lock_acquire+0xd2/0x260 __kernfs_remove+0x424/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x82/0xb0 device_del+0x251/0x580 __scsi_remove_device+0x19f/0x1d0 [scsi_mod] scsi_forget_host+0x37/0xb0 [scsi_mod] scsi_remove_host+0x9b/0x150 [scsi_mod] sdebug_driver_remove+0x4b/0x150 [scsi_debug] device_release_driver_internal+0x241/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1bc/0x290 device_del+0x259/0x580 device_unregister+0x1a/0x70 sdebug_remove_adapter+0x8b/0xf0 [scsi_debug] scsi_debug_exit+0x76/0xe8 [scsi_debug] __x64_sys_delete_module+0x1c1/0x280 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe See also https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg54525.html. Fixes: ac0ece9174ac ("scsi: use device_remove_file_self() instead of device_schedule_callback()") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-09-05scsi: mpt3sas: Fix _transport_smp_handler() error pathBart Van Assche1-2/+3
commit 91b7bdb2c0089cbbb817df6888ab1458c645184e upstream. This patch avoids that smatch complains about a double unlock on ioc->transport_cmds.mutex. Fixes: 651a01364994 ("scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sathya Prakash <sathya.prakash@broadcom.com> Cc: Chaitra P B <chaitra.basappa@broadcom.com> Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05scsi: mpt3sas: Fix calltrace observed while running IO & resetSreekanth Reddy2-1/+2
commit e70183143cc472960bc60dfee1b7bbe1949feffb upstream. Below kernel BUG was observed while running IOs with host reset (issued from application), mpt3sas_cm0: diag reset: SUCCESS ------------[ cut here ]------------ WARNING: CPU: 12 PID: 4336 at drivers/scsi/mpt3sas/mpt3sas_base.c:3282 mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas] Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G W ------------ 3.10.0-875.el7.brdc.x86_64 #1 Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013 Call Trace: [<ffffffff9cf16583>] dump_stack+0x19/0x1b [<ffffffff9c891698>] __warn+0xd8/0x100 [<ffffffff9c8917dd>] warn_slowpath_null+0x1d/0x20 [<ffffffffc04f3f4d>] mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas] [<ffffffffc05047d2>] _scsih_flush_running_cmds+0x92/0xe0 [mpt3sas] [<ffffffffc05095db>] mpt3sas_scsih_reset_handler+0x43b/0xaf0 [mpt3sas] [<ffffffff9c894829>] ? vprintk_default+0x29/0x40 [<ffffffff9cf10531>] ? printk+0x60/0x77 [<ffffffffc04f06c8>] ? _base_diag_reset+0x238/0x340 [mpt3sas] [<ffffffffc04f794d>] mpt3sas_base_hard_reset_handler+0x1ad/0x420 [mpt3sas] [<ffffffffc05132b9>] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas] [<ffffffffc068d585>] ? xfs_file_aio_write+0x155/0x1b0 [xfs] [<ffffffff9ca1a4e3>] ? do_sync_write+0x93/0xe0 [<ffffffffc051337a>] _ctl_ioctl+0x1a/0x20 [mpt3sas] [<ffffffff9ca2fe90>] do_vfs_ioctl+0x350/0x560 [<ffffffff9ca1dec1>] ? __sb_end_write+0x31/0x60 [<ffffffff9ca30141>] SyS_ioctl+0xa1/0xc0 [<ffffffff9cf28715>] ? system_call_after_swapgs+0xa2/0x146 [<ffffffff9cf287d5>] system_call_fastpath+0x1c/0x21 [<ffffffff9cf28721>] ? system_call_after_swapgs+0xae/0x146 ---[ end trace 5dac5b98d89aaa3c ]--- ------------[ cut here ]------------ kernel BUG at block/blk-core.c:1476! invalid opcode: 0000 [#1] SMP Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G W ------------ 3.10.0-875.el7.brdc.x86_64 #1 Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013 task: ffff903fc96e0fd0 ti: ffff903fb1eec000 task.ti: ffff903fb1eec000 RIP: 0010:[<ffffffff9cb19ec0>] [<ffffffff9cb19ec0>] blk_requeue_request+0x90/0xa0 RSP: 0018:ffff903c6b783dc0 EFLAGS: 00010087 RAX: ffff903bb67026d0 RBX: ffff903b7d6a6140 RCX: dead000000000200 RDX: ffff903bb67026d0 RSI: ffff903bb6702580 RDI: ffff903bb67026d0 RBP: ffff903c6b783dd8 R08: ffff903bb67026d0 R09: ffffd97e80000000 R10: ffff903c658bac00 R11: 0000000000000000 R12: ffff903bb6702580 R13: ffff903fa9a292f0 R14: 0000000000000246 R15: 0000000000001057 FS: 00007f7026f5b740(0000) GS:ffff903c6b780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f298877c004 CR3: 00000000caf36000 CR4: 00000000000607e0 Call Trace: <IRQ> [<ffffffff9cca68ff>] __scsi_queue_insert+0xbf/0x110 [<ffffffff9cca79ca>] scsi_io_completion+0x5da/0x6a0 [<ffffffff9cc9ca3c>] scsi_finish_command+0xdc/0x140 [<ffffffff9cca6aa2>] scsi_softirq_done+0x132/0x160 [<ffffffff9cb240c6>] blk_done_softirq+0x96/0xc0 [<ffffffff9c89a905>] __do_softirq+0xf5/0x280 [<ffffffff9cf2bd2c>] call_softirq+0x1c/0x30 [<ffffffff9c82d625>] do_softirq+0x65/0xa0 [<ffffffff9c89ac85>] irq_exit+0x105/0x110 [<ffffffff9cf2d0a8>] smp_apic_timer_interrupt+0x48/0x60 [<ffffffff9cf297f2>] apic_timer_interrupt+0x162/0x170 <EOI> [<ffffffff9cca5f41>] ? scsi_done+0x21/0x60 [<ffffffff9cb5ac18>] ? delay_tsc+0x38/0x60 [<ffffffff9cb5ab5d>] __const_udelay+0x2d/0x30 [<ffffffffc04effde>] _base_handshake_req_reply_wait+0x8e/0x4a0 [mpt3sas] [<ffffffffc04f0b13>] _base_get_ioc_facts+0x123/0x590 [mpt3sas] [<ffffffffc04f06c8>] ? _base_diag_reset+0x238/0x340 [mpt3sas] [<ffffffffc04f7993>] mpt3sas_base_hard_reset_handler+0x1f3/0x420 [mpt3sas] [<ffffffffc05132b9>] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas] [<ffffffffc068d585>] ? xfs_file_aio_write+0x155/0x1b0 [xfs] [<ffffffff9ca1a4e3>] ? do_sync_write+0x93/0xe0 [<ffffffffc051337a>] _ctl_ioctl+0x1a/0x20 [mpt3sas] [<ffffffff9ca2fe90>] do_vfs_ioctl+0x350/0x560 [<ffffffff9ca1dec1>] ? __sb_end_write+0x31/0x60 [<ffffffff9ca30141>] SyS_ioctl+0xa1/0xc0 [<ffffffff9cf28715>] ? system_call_after_swapgs+0xa2/0x146 [<ffffffff9cf287d5>] system_call_fastpath+0x1c/0x21 [<ffffffff9cf28721>] ? system_call_after_swapgs+0xae/0x146 Code: 83 c3 10 4c 89 e2 4c 89 ee e8 8d 21 04 00 48 8b 03 48 85 c0 75 e5 41 f6 44 24 4a 10 74 ad 4c 89 e6 4c 89 ef e8 b2 42 00 00 eb a0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 RIP [<ffffffff9cb19ec0>] blk_requeue_request+0x90/0xa0 RSP <ffff903c6b783dc0> As a part of host reset operation, driver will flushout all IOs outstanding at driver level with "DID_RESET" result. To find which are all commands outstanding at the driver level, driver loops with smid starting from one to HBA queue depth and calls mpt3sas_scsih_scsi_lookup_get() to get scmd as shown below for (smid = 1; smid <= ioc->scsiio_depth; smid++) { scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid); if (!scmd) continue; But in mpt3sas_scsih_scsi_lookup_get() function, driver returns some scsi cmnds which are not outstanding at the driver level (possibly request is constructed at block layer since QUEUE_FLAG_QUIESCED is not set. Even if driver uses scsi_block_requests and scsi_unblock_requests, issue still persists as they will be just blocking further IO from scsi layer and not from block layer) and these commands are flushed with DID_RESET host bytes thus resulting into above kernel BUG. This issue got introduced by commit dbec4c9040ed ("scsi: mpt3sas: lockless command submission"). To fix this issue, we have modified the mpt3sas_scsih_scsi_lookup_get() to check for smid equals to zero (note: whenever any scsi cmnd is processing at the driver level then smid for that scsi cmnd will be non-zero, always it starts from one) before it returns the scmd pointer to the caller. If smid is zero then this function returns scmd pointer as NULL and driver won't flushout those scsi cmnds at driver level with DID_RESET host byte thus this issue will not be observed. [mkp: amended with updated fix from Sreekanth] Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Fixes: dbec4c9040ed ("scsi: mpt3sas: lockless command submission") Cc: stable@vger.kernel.org # v4.16+ Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05scsi: qla2xxx: Fix stalled reloginHimanshu Madhani2-1/+2
commit 15b6c3c9568765f0717b2dd3aa67a5f7eadd9734 upstream. This patch sets and clears FCF_ASYNC_{SENT|ACTIVE} flags to prevent stalling of relogin attempt. Once flag are correctly set/cleared, relogin timer can retry relogin attempt for driver to continue login. Fixes: fa83e65885b9 ("scsi: qla2xxx: ensure async flags are reset correctly") Cc: stable@vger.kernel.org #4.17 Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05scsi: libsas: dynamically allocate and free ata hostJason Yan2-14/+28
commit 2fa4a32613c9182b00e46872755b0662374424a7 upstream. Commit 2623c7a5f2 ("libata: add refcounting to ata_host") v4.17+ introduced refcounting to ata_host and will increase or decrease the refcount when adding or deleting transport ATA port. Now the ata host for libsas is embedded in domain_device, and the ->kref member is not initialized. Afer we add ata transport class, ata_host_get() will be called when adding transport ATA port and a warning will be triggered as below: refcount_t: increment on 0; use-after-free. WARNING: CPU: 2 PID: 103 at lib/refcount.c:153 refcount_inc+0x40/0x48 ...... Call trace: refcount_inc+0x40/0x48 ata_host_get+0x10/0x18 ata_tport_add+0x40/0x120 ata_sas_tport_add+0xc/0x14 sas_ata_init+0x7c/0xc8 sas_discover_domain+0x380/0x53c process_one_work+0x12c/0x288 worker_thread+0x58/0x3f0 kthread+0xfc/0x128 ret_from_fork+0x10/0x18 And also when removing transport ATA port ata_host_put() will be called and another similar warning will be triggered. If the refcount decreased to zero, the ata host will be freed. But this ata host is only part of domain_device, it cannot be freed directly. So we have to change this embedded static ata host to a dynamically allocated ata host and initialize the ->kref member. To use ata_host_get() and ata_host_put() in libsas, we need to move the declaration of these functions to the public libata.h and export them. Fixes: b6240a4df018 ("scsi: libsas: add transport class for ATA devices") Signed-off-by: Jason Yan <yanaijie@huawei.com> CC: John Garry <john.garry@huawei.com> CC: Taras Kondratiuk <takondra@cisco.com> CC: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-12Merge tag 'scsi-fixes' of ↵Linus Torvalds7-49/+69
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Eight fixes. The most important one is the mpt3sas fix which makes the driver work again on big endian systems. The rest are mostly minor error path or checker issues and the vmw_scsi one fixes a performance problem" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: vmw_pvscsi: Return DID_RESET for status SAM_STAT_COMMAND_TERMINATED scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled scsi: mpt3sas: Swap I/O memory read value back to cpu endianness scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO scsi: fcoe: drop frames in ELS LOGO error path scsi: fcoe: fix use-after-free in fcoe_ctlr_els_send scsi: qedi: Fix a potential buffer overflow scsi: qla2xxx: Fix memory leak for allocating abort IOCB
2018-08-03scsi: vmw_pvscsi: Return DID_RESET for status SAM_STAT_COMMAND_TERMINATEDJim Gill1-3/+8
Commands that are reset are returned with status SAM_STAT_COMMAND_TERMINATED. PVSCSI currently returns DID_OK | SAM_STAT_COMMAND_TERMINATED which fails the command. Instead, set hostbyte to DID_RESET to allow upper layers to retry. Tested by copying a large file between two pvscsi disks on same adapter while performing a bus reset at 1-second intervals. Before fix, commands sometimes fail with DID_OK. After fix, commands observed to fail with DID_RESET. Signed-off-by: Jim Gill <jgill@vmware.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-03scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management ↵Bart Van Assche1-8/+21
enabled Surround scsi_execute() calls with scsi_autopm_get_device() and scsi_autopm_put_device(). Note: removing sr_mutex protection from the scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of sr_mutex is to serialize cdrom_*() calls. This patch avoids that complaints similar to the following appear in the kernel log if runtime power management is enabled: INFO: task systemd-udevd:650 blocked for more than 120 seconds. Not tainted 4.18.0-rc7-dbg+ #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. systemd-udevd D28176 650 513 0x00000104 Call Trace: __schedule+0x444/0xfe0 schedule+0x4e/0xe0 schedule_preempt_disabled+0x18/0x30 __mutex_lock+0x41c/0xc70 mutex_lock_nested+0x1b/0x20 __blkdev_get+0x106/0x970 blkdev_get+0x22c/0x5a0 blkdev_open+0xe9/0x100 do_dentry_open.isra.19+0x33e/0x570 vfs_open+0x7c/0xd0 path_openat+0x6e3/0x1120 do_filp_open+0x11c/0x1c0 do_sys_open+0x208/0x2d0 __x64_sys_openat+0x59/0x70 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Maurizio Lombardi <mlombard@redhat.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: <stable@vger.kernel.org> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-03scsi: mpt3sas: Swap I/O memory read value back to cpu endiannessSreekanth Reddy1-8/+8
Swap the I/O memory read value back to cpu endianness before storing it in a data structures which are defined in the MPI headers where u8 components are not defined in the endianness order. In this area from day one mpt3sas driver is using le32_to_cpu() & cpu_to_le32() APIs. But in commit cf6bf9710c (mpt3sas: Bug fix for big endian systems) we have removed these APIs before reading I/O memory which we should haven't done it. So in this patch I am correcting it by adding these APIs back before accessing I/O memory. Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-02scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGOJohannes Thumshirn1-0/+1
When receiving a LOGO request we forget to clear the FC_RP_STARTED flag before starting the rport delete routine. As the started flag was not cleared, we're not deleting the rport but waiting for a restart and thus are keeping the reference count of the rdata object at 1. This leads to the following kmemleak report: unreferenced object 0xffff88006542aa00 (size 512): comm "kworker/0:2", pid 24, jiffies 4294899222 (age 226.880s) hex dump (first 32 bytes): 68 96 fe 65 00 88 ff ff 00 00 00 00 00 00 00 00 h..e............ 01 00 00 00 08 00 00 00 02 c5 45 24 ac b8 00 10 ..........E$.... backtrace: [<(____ptrval____)>] fcoe_ctlr_vn_add.isra.5+0x7f/0x770 [libfcoe] [<(____ptrval____)>] fcoe_ctlr_vn_recv+0x12af/0x27f0 [libfcoe] [<(____ptrval____)>] fcoe_ctlr_recv_work+0xd01/0x32f0 [libfcoe] [<(____ptrval____)>] process_one_work+0x7ff/0x1420 [<(____ptrval____)>] worker_thread+0x87/0xef0 [<(____ptrval____)>] kthread+0x2db/0x390 [<(____ptrval____)>] ret_from_fork+0x35/0x40 [<(____ptrval____)>] 0xffffffffffffffff Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reported-by: ard <ard@kwaak.net> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>