Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 8dc765d438f1e42b3e8227b3b09fad7d73f4ec9a upstream.
c2856ae2f315d ("blk-mq: quiesce queue before freeing queue") has
already fixed this race, however the implied synchronize_rcu()
in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused
performance regression.
Then 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
tried to quiesce queue for avoiding unnecessary synchronize_rcu()
only when queue initialization is done, because it is usual to see
lots of inexistent LUNs which need to be probed.
However, turns out it isn't safe to quiesce queue only when queue
initialization is done. Because when one SCSI command is completed,
the user of sending command can be waken up immediately, then the
scsi device may be removed, meantime the run queue in scsi_end_request()
is still in-progress, so kernel panic can be caused.
In Red Hat QE lab, there are several reports about this kind of kernel
panic triggered during kernel booting.
This patch tries to address the issue by grabing one queue usage
counter during freeing one request and the following run queue.
Fixes: 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
Cc: Andrew Jones <drjones@redhat.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: stable <stable@vger.kernel.org>
Cc: jianchao.wang <jianchao.w.wang@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f635e48e866ee1a47d2d42ce012fdcc07bf55853 upstream.
This patch initializes port speed so that firmware does not set lower
operating speed. Setting lower speed in firmware impacts WRITE perfomance.
Fixes: 726b85487067 ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f7d61c995df74d6bb57bbff6a2b7b1874c4a2baa upstream.
Send aborts only when chip is active.
Fixes: 623ee824e579 ("scsi: qla2xxx: Fix FC-NVMe IO abort during driver reset")
Cc: <stable@vger.kernel.org> # 4.14
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5c6400536481d9ef44ef94e7bf2c7b8e81534db7 upstream.
This patch fixes issue where driver clears NPort ID map instead of marking
handle in use. Once driver clears NPort ID from the database, it can reuse
the same NPort ID resulting in a PLOGI failure.
[mkp: fixed Himanshu's SoB]
Fixes: a084fd68e1d2 ("scsi: qla2xxx: Fix re-login for Nport Handle in use")
Cc: <stable@vger.kernel.org>
Signed-of-by: Quinn Tran <quinn.tran@cavium.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Himanshu Madhani <hmadhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 39553065f77c297239308470ee313841f4e07db4 upstream.
This patch fixes multiple call for qla_nvme_unregister_remote_port() as part
of qlt_schedule_session_for_deletion(), Do not call it again during
qla_nvme_delete()
Fixes: e473b3074104 ("scsi: qla2xxx: Add FC-NVMe abort processing")
Cc: <stable@vger.kernel.org>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 732ee9a912cf2d9a50c5f9c4213cdc2f885d6aa6 upstream.
The response data buffer used in switch scan is reused 4 times. (For example,
for commands GPN_FT, GNN_FT for FCP and FC-NVME) Before driver reuses this
buffer, clear it to prevent duplicate entries in our database.
Fixes: a4239945b8ad1 ("scsi: qla2xxx: Add switch command to simplify fabric discovery"
Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1e4ac5d6fe0a4af17e4b6251b884485832bf75a3 upstream.
If chip unable to fully initialize, use full shutdown sequence to clear out
any stale FW state.
Fixes: e315cd28b9ef ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
Cc: stable@vger.kernel.org #4.10
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7c388f91ec1a59b0ed815b07b90536e2d57e1e1f upstream.
Remove stale debug trace.
Fixes: 1eb42f965ced ("qla2xxx: Make trace flags more readable")
Cc: stable@vger.kernel.org #4.10
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b86ac8fd4b2f6ec2f9ca9194c56eac12d620096f upstream.
This patch improves performance for 16G and above adapter by removing
additional call to process_response_queue().
[mkp: typo]
Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4c1458df9635c7e3ced155f594d2e7dfd7254e21 upstream.
Fixes: 6246b8a1d26c7c ("[SCSI] qla2xxx: Enhancements to support ISP83xx.")
Fixes: 1bb395485160d2 ("qla2xxx: Correct iiDMA-update calling conventions.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ca7fb76e091f889cfda1287c07a9358f73832b39 ]
On io completion, the driver is taking an adapter wide lock and nulling the
scsi command back pointer. The nulling of the back pointer is to signify the
io was completed and the scsi_done() routine was called. However, the routine
makes no check to see if the abort routine had done the same thing and
possibly nulled the pointer. Thus it may doubly-complete the io.
Make the following mods:
- Check to make sure forward progress (call scsi_done()) only happens if the
command pointer was non-null.
- As the taking of the lock, which is adapter wide, is very costly on a system
under load, null the pointer using an xchg operation rather than under lock.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0ef01a2d95fd62bb4f536e7ce4d5e8e74b97a244 ]
When running an mds diagnostic that passes frames with the switch, soft
lockups are detected. The driver is in a CQE processing loop and has
sufficient amount of traffic that it never exits the ring processing routine,
thus the "lockup".
Cap the number of elements in the work processing routine to 64 elements. This
ensures that the cpu will be given up and the handler reschedule to process
additional items.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 47db7873136a9c57c45390a53b57019cf73c8259 ]
In megasas_mgmt_compat_ioctl_fw(), to handle the structure
compat_megasas_iocpacket 'cioc', a user-space structure megasas_iocpacket
'ioc' is allocated before megasas_mgmt_ioctl_fw() is invoked to handle
the packet. Since the two data structures have different fields, the data
is copied from 'cioc' to 'ioc' field by field. In the copy process,
'sense_ptr' is prepared if the field 'sense_len' is not null, because it
will be used in megasas_mgmt_ioctl_fw(). To prepare 'sense_ptr', the
user-space data 'ioc->sense_off' and 'cioc->sense_off' are copied and
saved to kernel-space variables 'local_sense_off' and 'user_sense_off'
respectively. Given that 'ioc->sense_off' is also copied from
'cioc->sense_off', 'local_sense_off' and 'user_sense_off' should have the
same value. However, 'cioc' is in the user space and a malicious user can
race to change the value of 'cioc->sense_off' after it is copied to
'ioc->sense_off' but before it is copied to 'user_sense_off'. By doing
so, the attacker can inject different values into 'local_sense_off' and
'user_sense_off'. This can cause undefined behavior in the following
execution, because the two variables are supposed to be same.
This patch enforces a check on the two kernel variables 'local_sense_off'
and 'user_sense_off' to make sure they are the same after the copy. In
case they are not, an error code EINVAL will be returned.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f4bb7704699beee9edfbee875daa9089c86cf724 ]
With commit 10e5e37581fc ("scsi: ufs: Add clock ungating to a separate
workqueue"), clock gating work was moved to a separate work queue with
WQ_MEM_RECLAIM set, since clock gating could occur from a memory reclaim
context. Unfortunately, clk_gating.gate_work was left queued via
schedule_delayed_work, which is a system workqueue that does not have
WQ_MEM_RECLAIM set. Because ufshcd_ungate_work attempts to cancel
gate_work, the following warning appears:
[ 14.174170] workqueue: WQ_MEM_RECLAIM ufs_clk_gating_0:ufshcd_ungate_work is flushing !WQ_MEM_RECLAIM events:ufshcd_gate_work
[ 14.174179] WARNING: CPU: 4 PID: 173 at kernel/workqueue.c:2440 check_flush_dependency+0x110/0x118
[ 14.205725] CPU: 4 PID: 173 Comm: kworker/u16:3 Not tainted 4.14.68 #1
[ 14.212437] Hardware name: Google Cheza (rev1) (DT)
[ 14.217459] Workqueue: ufs_clk_gating_0 ufshcd_ungate_work
[ 14.223107] task: ffffffc0f6a40080 task.stack: ffffff800a490000
[ 14.229195] PC is at check_flush_dependency+0x110/0x118
[ 14.234569] LR is at check_flush_dependency+0x110/0x118
[ 14.239944] pc : [<ffffff80080cad14>] lr : [<ffffff80080cad14>] pstate: 60c001c9
[ 14.333050] Call trace:
[ 14.427767] [<ffffff80080cad14>] check_flush_dependency+0x110/0x118
[ 14.434219] [<ffffff80080cafec>] start_flush_work+0xac/0x1fc
[ 14.440046] [<ffffff80080caeec>] flush_work+0x40/0x94
[ 14.445246] [<ffffff80080cb288>] __cancel_work_timer+0x11c/0x1b8
[ 14.451433] [<ffffff80080cb4b8>] cancel_delayed_work_sync+0x20/0x30
[ 14.457886] [<ffffff80085b9294>] ufshcd_ungate_work+0x24/0xd0
[ 14.463800] [<ffffff80080cfb04>] process_one_work+0x32c/0x690
[ 14.469713] [<ffffff80080d0154>] worker_thread+0x218/0x338
[ 14.475361] [<ffffff80080d527c>] kthread+0x120/0x130
[ 14.480470] [<ffffff8008084814>] ret_from_fork+0x10/0x18
The simple solution is to put the gate_work on the same WQ_MEM_RECLAIM
work queue as the ungate_work.
Fixes: 10e5e37581fc ("scsi: ufs: Add clock ungating to a separate workqueue")
Signed-off-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit fd47d919d0c336e7c22862b51ee94927ffea227a ]
If a target disconnects during a PIO data transfer the command may fail
when the target reconnects:
scsi host1: DMA length is zero!
scsi host1: cur adr[04380000] len[00000000]
The scsi bus is then reset. This happens because the residual reached
zero before the transfer was completed.
The usual residual calculation relies on the Transfer Count registers.
That works for DMA transfers but not for PIO transfers. Fix the problem
by storing the PIO transfer residual and using that to correctly
calculate bytes_sent.
Fixes: 6fe07aaffbf0 ("[SCSI] m68k: new mac_esp scsi driver")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3cc5746e5ad7688e274e193fa71278d98aa52759 ]
Fix kernel NULL pointer dereference,
Call Trace:
[<ffffffff9b7658e6>] __mutex_lock_slowpath+0xa6/0x1d0
[<ffffffff9b764cef>] mutex_lock+0x1f/0x2f
[<ffffffffc061b5e1>] qedi_get_protocol_tlv_data+0x61/0x450 [qedi]
[<ffffffff9b1f9d8e>] ? map_vm_area+0x2e/0x40
[<ffffffff9b1fc370>] ? __vmalloc_node_range+0x170/0x280
[<ffffffffc0b81c3d>] ? qed_mfw_process_tlv_req+0x27d/0xbd0 [qed]
[<ffffffffc0b6461b>] qed_mfw_fill_tlv_data+0x4b/0xb0 [qed]
[<ffffffffc0b81c59>] qed_mfw_process_tlv_req+0x299/0xbd0 [qed]
[<ffffffff9b02a59e>] ? __switch_to+0xce/0x580
[<ffffffffc0b61e5b>] qed_slowpath_task+0x5b/0x80 [qed]
Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f1f1fadacaf08b7cf11714c0c29f8fa4d4ef68a9 ]
When sd_init_command() get's a command with a unknown req_op() it crashes the
system via BUG().
This makes debugging the actual reason for the broken request cmd_flags pretty
hard as the system is down before it's able to write out debugging data on the
serial console or the trace buffer.
Change the BUG() to a WARN_ON() and return BLKPREP_KILL to fail gracefully and
return an I/O error to the producer of the request.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 318ddb34b2052f838aa243d07173e2badf3e630e ]
While dlpar adding primary ipr adapter back, driver goes through adapter
initialization then schedule ipr_worker_thread to start te disk scan by
dropping the host lock, calling scsi_add_device. Then get the adapter reset
request again, so driver does scsi_block_requests, this will cause the
scsi_add_device get hung until we unblock. But we can't run ipr_worker_thread
to do the unblock because its stuck in scsi_add_device.
This patch fixes the issue.
[mkp: typo and whitespace fixes]
Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9e210178267b80c4eeb832fade7e146a18c84915 ]
The driver currently uses the ndlp to get the local rport which is then used
to get the nvme transport remoteport pointer. There can be cases where a stale
remoteport pointer is obtained as synchronization isn't done through the
different dereferences.
Correct by using locks to synchronize the dereferences.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit adad633af7b970bfa5dd1b624a4afc83cac9b235 ]
While reviewing another part of the code, Kees noticed that the strncpy of the
partition name might not always be NUL terminated. Switch to using strscpy
which does this safely.
Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d792d4c4fc866ae224b0b0ca2aabd87d23b4d6cc ]
There's currently a warning about string overflow with strncat:
drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c: In function 'ibmvscsis_probe':
drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c:3479:2: error: 'strncat' specified
bound 64 equals destination size [-Werror=stringop-overflow=]
strncat(vscsi->eye, vdev->name, MAX_EYE);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Switch to a single snprintf instead of a strcpy + strcat to handle this
cleanly.
Signed-off-by: Laura Abbott <labbott@redhat.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit cbe3fd39d223f14b1c60c80fe9347a3dd08c2edb ]
We should first do the le16_to_cpu endian conversion and then apply the
FCP_CMD_LENGTH_MASK mask.
Fixes: 5f35509db179 ("qla2xxx: Terminate exchange if corrupted")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Quinn Tran <Quinn.Tran@cavium.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c77a2fa3ff8f73d1a485e67e6f81c64823739d59 ]
The QED driver commit, 1ac4329a1cff ("qed: Add configuration information
to register dump and debug data"), removes the CRC length validation
causing nvm_get_image failure while loading qedi driver:
[qed_mcp_get_nvm_image:2700(host_10-0)]Image [0] is too big - 00006008 bytes
where only 00006004 are available
[qedi_get_boot_info:2253]:10: Could not get NVM image. ret = -12
Hence add and adjust the CRC size to iSCSI NVM image to read boot info at
qedi load time.
Signed-off-by: Nilesh Javali <nilesh.javali@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b9eb3b14f1dbf16bf27b6c1ffe6b8c00ec945c9b ]
The problem is that ->reset_state is a u8 but it can be set to -1 or -2 in
aac_tmf_callback() and the error handling in aac_eh_target_reset() relies
on it to be signed.
[mkp: fixed typo]
Fixes: 0d643ff3c353 ("scsi: aacraid: use aac_tmf_callback for reset fib")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 68bdc630721c40e908d22cffe07b5ca225a69f6e ]
- use be32_to_cpu() instead of ntohs() for 32 bit port capabilities.
- add a new function fwcaps32_to_caps16() to convert 32 bit port
capabilities to 16 bit port capabilities.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 89809b028b6f54187b7d81a0c69b35d394c52e62 ]
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c3b10a55abc943a526aaecd7e860b15671beb906 ]
There is a possibility that firmware on the controller was upgraded before
system was suspended. During resume, driver needs to read updated
controller properties.
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d2fc401e47529d9ffd2673a5395d56002e31ad98 ]
There is a possible conflict when a device is removed and host reset occurs
concurrently.
The reason is that then the device is notified as gone, we try to clear the
ITCT, which is notified via an interrupt. The dev gone function pends on
this event with a completion, which is completed when the ITCT interrupt
occurs.
But host reset will disable all interrupts, the wait_for_completion() may
wait indefinitely.
This patch adds an semaphore to synchronise this two processes. The
semaphore is taken by the host reset as the basis of synchronising.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit aa154ea885eb0c2407457ce9c1538d78c95456fa ]
When ioremap_nocache fails, the lack of error-handling code may cause
unexpected results.
This patch adds error-handling code after calling ioremap_nocache.
Signed-off-by: Zhouyang Jia <jiazhouyang09@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1262dc09dc9ae7bf4ad00b6a2c5ed6a6936bcd10 ]
Currently an open firmware property is copied into partition_name variable
without keeping a room for \0.
Later one, this variable (partition_name), which is 97 bytes long, is
strncpyed into ibmvcsci_host_data->madapter_info->partition_name, which is
96 bytes long, possibly truncating it 'again' and removing the \0.
This patch simply decreases the partition name to 96 and just copy using
strlcpy() which guarantees that the string is \0 terminated. I think there
is no issue if this there is a truncation in this very first copy, i.e,
when the open firmware property is read and copied into the driver for the
very first time;
This issue also causes the following warning on GCC 8:
drivers/scsi/ibmvscsi/ibmvscsi.c:281:2: warning: strncpy output may be truncated copying 96 bytes from a string of length 96 [-Wstringop-truncation]
...
inlined from ibmvscsi_probe at drivers/scsi/ibmvscsi/ibmvscsi.c:2221:7:
drivers/scsi/ibmvscsi/ibmvscsi.c:265:3: warning: strncpy specified bound 97 equals destination size [-Wstringop-truncation]
CC: Bart Van Assche <bart.vanassche@wdc.com>
CC: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d580c6137476ab307a66e278cf7dbc666230f714 ]
System crashes when the lpfc module is unloaded after making the port
offline
The nvme queue pointers were freed during port offline, but were later
accessed in pci remove path.
Validate the pointers in pci remove path before accessing them.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6871e8144f935a1f08e7fc6269c894861ce494aa ]
Kernel occasionally crashed with the following
ops on NVME Target:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffffa042ee50>] lpfc_nvmet_defer_rcv+0x50/0x70 [lpfc]
Callback routine was called for deferred rcv when it should be treated as a
normal rcv.
Added code in callback routine to detect this condition and log a message,
then bail.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit fa519f701d27198a2858bb108fc18ea9d8c106a7 ]
fc_rport_login() will be calling mutex_lock() while running inside an
RCU-protected section, triggering the warning 'sleeping function called
from invalid context'. To fix this we can drop the rcu functions here
altogether as the disc mutex protecting the list itself is already held,
preventing any list manipulation.
Fixes: a407c593398c ("scsi: libfc: Fixup disc_mutex handling")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4dc98c1995482262e70e83ef029135247fafe0f2 ]
tw_probe() returns 0 in case of fail of tw_initialize_device_extension(),
pci_resource_start() or tw_reset_sequence() and releases resources.
twl_probe() returns 0 in case of fail of twl_initialize_device_extension(),
pci_iomap() and twl_reset_sequence(). twa_probe() returns 0 in case of
fail of tw_initialize_device_extension(), ioremap() and
twa_reset_sequence().
The patch adds retval initialization for these cases.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 93a3922da428ec0752e8b2ab00c42dadbbf805a9 ]
During remote port loss fault testing, the driver crashed with the
following trace:
general protection fault: 0000 [#1] SMP
RIP: ... lpfc_nvme_register_port+0x250/0x480 [lpfc]
Call Trace:
lpfc_nlp_state_cleanup+0x1b3/0x7a0 [lpfc]
lpfc_nlp_set_state+0xa6/0x1d0 [lpfc]
lpfc_cmpl_prli_prli_issue+0x213/0x440
lpfc_disc_state_machine+0x7e/0x1e0 [lpfc]
lpfc_cmpl_els_prli+0x18a/0x200 [lpfc]
lpfc_sli_sp_handle_rspiocb+0x3b5/0x6f0 [lpfc]
lpfc_sli_handle_slow_ring_event_s4+0x161/0x240 [lpfc]
lpfc_work_done+0x948/0x14c0 [lpfc]
lpfc_do_work+0x16f/0x180 [lpfc]
kthread+0xc9/0xe0
ret_from_fork+0x55/0x80
After registering a new remoteport, the driver is pulling an ndlp pointer
from the lpfc rport associated with the private area of a newly registered
remoteport. The private area is uninitialized, so it's garbage.
Correct by pulling the the lpfc rport pointer from the entering ndlp point,
then ndlp value from at rport. Note the entering ndlp may be replacing by
the rport->ndlp due to an address change swap.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3f915271b12e11183c606bed1c3dfff0983662d3 ]
Driver uses shadow pointer instead of Mirror pointer for firmware dump
collection. Skip those entries for Mirror pointers for Request/Response
queue from firmware dump template reading.
Following messages are printed in log messages:
qla27xx_fwdt_entry_t268: unknown buffer 4
qla27xx_fwdt_entry_t268: unknown buffer 5
This patch fixes these error messages by adding skip_entry() to not read
them from template.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8fde6977ac478c00eeb2beccfdd4a6ad44219f6c ]
This patch sets discovery state back to GNL (Get Name List) when session is
stuck at GPDB (Get Port DataBase). This will allow state machine to retry
login and move session state ahead in discovery.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit cb97f2c2e8d9f8c71ddbf04ad57e163ee6d86474 ]
During normal IO, FW can return IO with 'port unavailble' status. Driver
would send a LOGO to remote port for session resync. On an off chance, a
PLOGI could arrive before sending the LOGO. This patch will skip sendiing
LOGO if a PLOGI just came in.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 53e13ee087a80e8d4fc95436318436e5c2c1f8c2 upstream.
A recent change added some MDS processing in the lpfc_drain_txq routine
that relies on the fcp_wq being allocated. For nvmet operation the fcp_wq
is not allocated because it can only be an nvme-target. When the original
MDS support was added LS_MDS_LOOPBACK was defined wrong, (0x16) it should
have been 0x10 (decimal value used for hex setting). This incorrect value
allowed MDS_LOOPBACK to be set simultaneously with LS_NPIV_FAB_SUPPORTED,
causing the driver to crash when it accesses the non-existent fcp_wq.
Correct the bad value setting for LS_MDS_LOOPBACK.
Fixes: ae9e28f36a6c ("lpfc: Add MDS Diagnostic support.")
Cc: <stable@vger.kernel.org> # v4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0756c57bce3d26da2592d834d8910b6887021701 ]
We accidentally return success instead of -ENOMEM on this error path.
Fixes: 2908d778ab3e ("[SCSI] aic94xx: new driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0ee223b2e1f67cb2de9c0e3247c510d846e74d63 upstream.
A long time ago the unfortunate decision was taken to add a self-deletion
attribute to the sysfs SCSI device directory. That decision was unfortunate
because self-deletion is really tricky. We can't drop that attribute
because widely used user space software depends on it, namely the
rescan-scsi-bus.sh script. Hence this patch that avoids that writing into
that attribute triggers a deadlock. See also commit 7973cbd9fbd9 ("[PATCH]
add sysfs attributes to scan and delete scsi_devices").
This patch avoids that self-removal triggers the following deadlock:
======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc2-dbg+ #5 Not tainted
------------------------------------------------------
modprobe/6539 is trying to acquire lock:
000000008323c4cd (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x45/0x90
but task is already holding lock:
00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&shost->scan_mutex){+.+.}:
__mutex_lock+0xfe/0xc70
mutex_lock_nested+0x1b/0x20
scsi_remove_device+0x26/0x40 [scsi_mod]
sdev_store_delete+0x27/0x30 [scsi_mod]
dev_attr_store+0x3e/0x50
sysfs_kf_write+0x87/0xa0
kernfs_fop_write+0x190/0x230
__vfs_write+0xd2/0x3b0
vfs_write+0x101/0x270
ksys_write+0xab/0x120
__x64_sys_write+0x43/0x50
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (kn->count#202){++++}:
lock_acquire+0xd2/0x260
__kernfs_remove+0x424/0x4a0
kernfs_remove_by_name_ns+0x45/0x90
remove_files.isra.1+0x3a/0x90
sysfs_remove_group+0x5c/0xc0
sysfs_remove_groups+0x39/0x60
device_remove_attrs+0x82/0xb0
device_del+0x251/0x580
__scsi_remove_device+0x19f/0x1d0 [scsi_mod]
scsi_forget_host+0x37/0xb0 [scsi_mod]
scsi_remove_host+0x9b/0x150 [scsi_mod]
sdebug_driver_remove+0x4b/0x150 [scsi_debug]
device_release_driver_internal+0x241/0x360
device_release_driver+0x12/0x20
bus_remove_device+0x1bc/0x290
device_del+0x259/0x580
device_unregister+0x1a/0x70
sdebug_remove_adapter+0x8b/0xf0 [scsi_debug]
scsi_debug_exit+0x76/0xe8 [scsi_debug]
__x64_sys_delete_module+0x1c1/0x280
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&shost->scan_mutex);
lock(kn->count#202);
lock(&shost->scan_mutex);
lock(kn->count#202);
*** DEADLOCK ***
2 locks held by modprobe/6539:
#0: 00000000efaf9298 (&dev->mutex){....}, at: device_release_driver_internal+0x68/0x360
#1: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod]
stack backtrace:
CPU: 10 PID: 6539 Comm: modprobe Not tainted 4.18.0-rc2-dbg+ #5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0xa4/0xf5
print_circular_bug.isra.34+0x213/0x221
__lock_acquire+0x1a7e/0x1b50
lock_acquire+0xd2/0x260
__kernfs_remove+0x424/0x4a0
kernfs_remove_by_name_ns+0x45/0x90
remove_files.isra.1+0x3a/0x90
sysfs_remove_group+0x5c/0xc0
sysfs_remove_groups+0x39/0x60
device_remove_attrs+0x82/0xb0
device_del+0x251/0x580
__scsi_remove_device+0x19f/0x1d0 [scsi_mod]
scsi_forget_host+0x37/0xb0 [scsi_mod]
scsi_remove_host+0x9b/0x150 [scsi_mod]
sdebug_driver_remove+0x4b/0x150 [scsi_debug]
device_release_driver_internal+0x241/0x360
device_release_driver+0x12/0x20
bus_remove_device+0x1bc/0x290
device_del+0x259/0x580
device_unregister+0x1a/0x70
sdebug_remove_adapter+0x8b/0xf0 [scsi_debug]
scsi_debug_exit+0x76/0xe8 [scsi_debug]
__x64_sys_delete_module+0x1c1/0x280
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
See also https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg54525.html.
Fixes: ac0ece9174ac ("scsi: use device_remove_file_self() instead of device_schedule_callback()")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
commit 91b7bdb2c0089cbbb817df6888ab1458c645184e upstream.
This patch avoids that smatch complains about a double unlock on
ioc->transport_cmds.mutex.
Fixes: 651a01364994 ("scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sathya Prakash <sathya.prakash@broadcom.com>
Cc: Chaitra P B <chaitra.basappa@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e70183143cc472960bc60dfee1b7bbe1949feffb upstream.
Below kernel BUG was observed while running IOs with host reset (issued
from application),
mpt3sas_cm0: diag reset: SUCCESS
------------[ cut here ]------------
WARNING: CPU: 12 PID: 4336 at drivers/scsi/mpt3sas/mpt3sas_base.c:3282 mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support
dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G W ------------ 3.10.0-875.el7.brdc.x86_64 #1
Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
Call Trace:
[<ffffffff9cf16583>] dump_stack+0x19/0x1b
[<ffffffff9c891698>] __warn+0xd8/0x100
[<ffffffff9c8917dd>] warn_slowpath_null+0x1d/0x20
[<ffffffffc04f3f4d>] mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
[<ffffffffc05047d2>] _scsih_flush_running_cmds+0x92/0xe0 [mpt3sas]
[<ffffffffc05095db>] mpt3sas_scsih_reset_handler+0x43b/0xaf0 [mpt3sas]
[<ffffffff9c894829>] ? vprintk_default+0x29/0x40
[<ffffffff9cf10531>] ? printk+0x60/0x77
[<ffffffffc04f06c8>] ? _base_diag_reset+0x238/0x340 [mpt3sas]
[<ffffffffc04f794d>] mpt3sas_base_hard_reset_handler+0x1ad/0x420 [mpt3sas]
[<ffffffffc05132b9>] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
[<ffffffffc068d585>] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
[<ffffffff9ca1a4e3>] ? do_sync_write+0x93/0xe0
[<ffffffffc051337a>] _ctl_ioctl+0x1a/0x20 [mpt3sas]
[<ffffffff9ca2fe90>] do_vfs_ioctl+0x350/0x560
[<ffffffff9ca1dec1>] ? __sb_end_write+0x31/0x60
[<ffffffff9ca30141>] SyS_ioctl+0xa1/0xc0
[<ffffffff9cf28715>] ? system_call_after_swapgs+0xa2/0x146
[<ffffffff9cf287d5>] system_call_fastpath+0x1c/0x21
[<ffffffff9cf28721>] ? system_call_after_swapgs+0xae/0x146
---[ end trace 5dac5b98d89aaa3c ]---
------------[ cut here ]------------
kernel BUG at block/blk-core.c:1476!
invalid opcode: 0000 [#1] SMP
Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support
dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G W ------------ 3.10.0-875.el7.brdc.x86_64 #1
Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
task: ffff903fc96e0fd0 ti: ffff903fb1eec000 task.ti: ffff903fb1eec000
RIP: 0010:[<ffffffff9cb19ec0>] [<ffffffff9cb19ec0>] blk_requeue_request+0x90/0xa0
RSP: 0018:ffff903c6b783dc0 EFLAGS: 00010087
RAX: ffff903bb67026d0 RBX: ffff903b7d6a6140 RCX: dead000000000200
RDX: ffff903bb67026d0 RSI: ffff903bb6702580 RDI: ffff903bb67026d0
RBP: ffff903c6b783dd8 R08: ffff903bb67026d0 R09: ffffd97e80000000
R10: ffff903c658bac00 R11: 0000000000000000 R12: ffff903bb6702580
R13: ffff903fa9a292f0 R14: 0000000000000246 R15: 0000000000001057
FS: 00007f7026f5b740(0000) GS:ffff903c6b780000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f298877c004 CR3: 00000000caf36000 CR4: 00000000000607e0
Call Trace:
<IRQ>
[<ffffffff9cca68ff>] __scsi_queue_insert+0xbf/0x110
[<ffffffff9cca79ca>] scsi_io_completion+0x5da/0x6a0
[<ffffffff9cc9ca3c>] scsi_finish_command+0xdc/0x140
[<ffffffff9cca6aa2>] scsi_softirq_done+0x132/0x160
[<ffffffff9cb240c6>] blk_done_softirq+0x96/0xc0
[<ffffffff9c89a905>] __do_softirq+0xf5/0x280
[<ffffffff9cf2bd2c>] call_softirq+0x1c/0x30
[<ffffffff9c82d625>] do_softirq+0x65/0xa0
[<ffffffff9c89ac85>] irq_exit+0x105/0x110
[<ffffffff9cf2d0a8>] smp_apic_timer_interrupt+0x48/0x60
[<ffffffff9cf297f2>] apic_timer_interrupt+0x162/0x170
<EOI>
[<ffffffff9cca5f41>] ? scsi_done+0x21/0x60
[<ffffffff9cb5ac18>] ? delay_tsc+0x38/0x60
[<ffffffff9cb5ab5d>] __const_udelay+0x2d/0x30
[<ffffffffc04effde>] _base_handshake_req_reply_wait+0x8e/0x4a0 [mpt3sas]
[<ffffffffc04f0b13>] _base_get_ioc_facts+0x123/0x590 [mpt3sas]
[<ffffffffc04f06c8>] ? _base_diag_reset+0x238/0x340 [mpt3sas]
[<ffffffffc04f7993>] mpt3sas_base_hard_reset_handler+0x1f3/0x420 [mpt3sas]
[<ffffffffc05132b9>] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
[<ffffffffc068d585>] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
[<ffffffff9ca1a4e3>] ? do_sync_write+0x93/0xe0
[<ffffffffc051337a>] _ctl_ioctl+0x1a/0x20 [mpt3sas]
[<ffffffff9ca2fe90>] do_vfs_ioctl+0x350/0x560
[<ffffffff9ca1dec1>] ? __sb_end_write+0x31/0x60
[<ffffffff9ca30141>] SyS_ioctl+0xa1/0xc0
[<ffffffff9cf28715>] ? system_call_after_swapgs+0xa2/0x146
[<ffffffff9cf287d5>] system_call_fastpath+0x1c/0x21
[<ffffffff9cf28721>] ? system_call_after_swapgs+0xae/0x146
Code: 83 c3 10 4c 89 e2 4c 89 ee e8 8d 21 04 00 48 8b 03 48 85 c0 75 e5 41 f6 44 24 4a 10 74 ad 4c 89 e6 4c 89 ef e8 b2 42 00 00 eb a0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
RIP [<ffffffff9cb19ec0>] blk_requeue_request+0x90/0xa0
RSP <ffff903c6b783dc0>
As a part of host reset operation, driver will flushout all IOs outstanding
at driver level with "DID_RESET" result. To find which are all commands
outstanding at the driver level, driver loops with smid starting from one
to HBA queue depth and calls mpt3sas_scsih_scsi_lookup_get() to get scmd as
shown below
for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
if (!scmd)
continue;
But in mpt3sas_scsih_scsi_lookup_get() function, driver returns some scsi
cmnds which are not outstanding at the driver level (possibly request is
constructed at block layer since QUEUE_FLAG_QUIESCED is not set. Even if
driver uses scsi_block_requests and scsi_unblock_requests, issue still
persists as they will be just blocking further IO from scsi layer and not
from block layer) and these commands are flushed with DID_RESET host bytes
thus resulting into above kernel BUG.
This issue got introduced by commit dbec4c9040ed ("scsi: mpt3sas: lockless
command submission").
To fix this issue, we have modified the mpt3sas_scsih_scsi_lookup_get() to
check for smid equals to zero (note: whenever any scsi cmnd is processing
at the driver level then smid for that scsi cmnd will be non-zero, always
it starts from one) before it returns the scmd pointer to the caller. If
smid is zero then this function returns scmd pointer as NULL and driver
won't flushout those scsi cmnds at driver level with DID_RESET host byte
thus this issue will not be observed.
[mkp: amended with updated fix from Sreekanth]
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Fixes: dbec4c9040ed ("scsi: mpt3sas: lockless command submission")
Cc: stable@vger.kernel.org # v4.16+
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 15b6c3c9568765f0717b2dd3aa67a5f7eadd9734 upstream.
This patch sets and clears FCF_ASYNC_{SENT|ACTIVE} flags to prevent
stalling of relogin attempt. Once flag are correctly set/cleared, relogin
timer can retry relogin attempt for driver to continue login.
Fixes: fa83e65885b9 ("scsi: qla2xxx: ensure async flags are reset correctly")
Cc: stable@vger.kernel.org #4.17
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2fa4a32613c9182b00e46872755b0662374424a7 upstream.
Commit 2623c7a5f2 ("libata: add refcounting to ata_host") v4.17+ introduced
refcounting to ata_host and will increase or decrease the refcount when
adding or deleting transport ATA port.
Now the ata host for libsas is embedded in domain_device, and the ->kref
member is not initialized. Afer we add ata transport class, ata_host_get()
will be called when adding transport ATA port and a warning will be
triggered as below:
refcount_t: increment on 0; use-after-free.
WARNING: CPU: 2 PID: 103 at
lib/refcount.c:153 refcount_inc+0x40/0x48 ...... Call trace:
refcount_inc+0x40/0x48
ata_host_get+0x10/0x18
ata_tport_add+0x40/0x120
ata_sas_tport_add+0xc/0x14
sas_ata_init+0x7c/0xc8
sas_discover_domain+0x380/0x53c
process_one_work+0x12c/0x288
worker_thread+0x58/0x3f0
kthread+0xfc/0x128
ret_from_fork+0x10/0x18
And also when removing transport ATA port ata_host_put() will be called and
another similar warning will be triggered. If the refcount decreased to
zero, the ata host will be freed. But this ata host is only part of
domain_device, it cannot be freed directly.
So we have to change this embedded static ata host to a dynamically
allocated ata host and initialize the ->kref member. To use ata_host_get()
and ata_host_put() in libsas, we need to move the declaration of these
functions to the public libata.h and export them.
Fixes: b6240a4df018 ("scsi: libsas: add transport class for ATA devices")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: John Garry <john.garry@huawei.com>
CC: Taras Kondratiuk <takondra@cisco.com>
CC: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Eight fixes.
The most important one is the mpt3sas fix which makes the driver work
again on big endian systems. The rest are mostly minor error path or
checker issues and the vmw_scsi one fixes a performance problem"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: vmw_pvscsi: Return DID_RESET for status SAM_STAT_COMMAND_TERMINATED
scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
scsi: mpt3sas: Swap I/O memory read value back to cpu endianness
scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO
scsi: fcoe: drop frames in ELS LOGO error path
scsi: fcoe: fix use-after-free in fcoe_ctlr_els_send
scsi: qedi: Fix a potential buffer overflow
scsi: qla2xxx: Fix memory leak for allocating abort IOCB
|
|
Commands that are reset are returned with status
SAM_STAT_COMMAND_TERMINATED. PVSCSI currently returns DID_OK |
SAM_STAT_COMMAND_TERMINATED which fails the command. Instead, set hostbyte
to DID_RESET to allow upper layers to retry.
Tested by copying a large file between two pvscsi disks on same adapter
while performing a bus reset at 1-second intervals. Before fix, commands
sometimes fail with DID_OK. After fix, commands observed to fail with
DID_RESET.
Signed-off-by: Jim Gill <jgill@vmware.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
enabled
Surround scsi_execute() calls with scsi_autopm_get_device() and
scsi_autopm_put_device(). Note: removing sr_mutex protection from the
scsi_cd_get() and scsi_cd_put() calls is safe because the purpose of
sr_mutex is to serialize cdrom_*() calls.
This patch avoids that complaints similar to the following appear in the
kernel log if runtime power management is enabled:
INFO: task systemd-udevd:650 blocked for more than 120 seconds.
Not tainted 4.18.0-rc7-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-udevd D28176 650 513 0x00000104
Call Trace:
__schedule+0x444/0xfe0
schedule+0x4e/0xe0
schedule_preempt_disabled+0x18/0x30
__mutex_lock+0x41c/0xc70
mutex_lock_nested+0x1b/0x20
__blkdev_get+0x106/0x970
blkdev_get+0x22c/0x5a0
blkdev_open+0xe9/0x100
do_dentry_open.isra.19+0x33e/0x570
vfs_open+0x7c/0xd0
path_openat+0x6e3/0x1120
do_filp_open+0x11c/0x1c0
do_sys_open+0x208/0x2d0
__x64_sys_openat+0x59/0x70
do_syscall_64+0x77/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Maurizio Lombardi <mlombard@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: <stable@vger.kernel.org>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Swap the I/O memory read value back to cpu endianness before storing it in
a data structures which are defined in the MPI headers where u8 components
are not defined in the endianness order.
In this area from day one mpt3sas driver is using le32_to_cpu() &
cpu_to_le32() APIs. But in commit cf6bf9710c
(mpt3sas: Bug fix for big endian systems) we have removed these APIs
before reading I/O memory which we should haven't done it. So
in this patch I am correcting it by adding these APIs back
before accessing I/O memory.
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When receiving a LOGO request we forget to clear the FC_RP_STARTED flag
before starting the rport delete routine.
As the started flag was not cleared, we're not deleting the rport but
waiting for a restart and thus are keeping the reference count of the rdata
object at 1.
This leads to the following kmemleak report:
unreferenced object 0xffff88006542aa00 (size 512):
comm "kworker/0:2", pid 24, jiffies 4294899222 (age 226.880s)
hex dump (first 32 bytes):
68 96 fe 65 00 88 ff ff 00 00 00 00 00 00 00 00 h..e............
01 00 00 00 08 00 00 00 02 c5 45 24 ac b8 00 10 ..........E$....
backtrace:
[<(____ptrval____)>] fcoe_ctlr_vn_add.isra.5+0x7f/0x770 [libfcoe]
[<(____ptrval____)>] fcoe_ctlr_vn_recv+0x12af/0x27f0 [libfcoe]
[<(____ptrval____)>] fcoe_ctlr_recv_work+0xd01/0x32f0 [libfcoe]
[<(____ptrval____)>] process_one_work+0x7ff/0x1420
[<(____ptrval____)>] worker_thread+0x87/0xef0
[<(____ptrval____)>] kthread+0x2db/0x390
[<(____ptrval____)>] ret_from_fork+0x35/0x40
[<(____ptrval____)>] 0xffffffffffffffff
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: ard <ard@kwaak.net>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|