Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 9ddf190a7df77b77817f955fdb9c2ae9d1c9c9a3 ]
JAZZ_ESP is a bool kconfig symbol that selects SCSI_SPI_ATTRS. When
CONFIG_SCSI=m, this results in SCSI_SPI_ATTRS=m while JAZZ_ESP=y, which
causes many undefined symbol linker errors.
Fix this by only offering to build this driver when CONFIG_SCSI=y.
[mkp: JAZZ_ESP is unique in that it does not support being compiled as a
module unlike the remaining SPI SCSI HBA drivers]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20240214055953.9612-1-rdunlap@infradead.org
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402112222.Gl0udKyU-lkp@intel.com/
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5761eb9761d2d5fe8248a9b719efc4d8baf1f24a ]
Correct blk-mq registration issue with module parameter
disable_managed_interrupts enabled.
When we turn off the default PCI_IRQ_AFFINITY flag, the driver needs to
register with blk-mq using blk_mq_map_queues(). The driver is currently
calling blk_mq_pci_map_queues() which results in a stack trace and possibly
undefined behavior.
Stack Trace:
[ 7.860089] scsi host2: smartpqi
[ 7.871934] WARNING: CPU: 0 PID: 238 at block/blk-mq-pci.c:52 blk_mq_pci_map_queues+0xca/0xd0
[ 7.889231] Modules linked in: sd_mod t10_pi sg uas smartpqi(+) crc32c_intel scsi_transport_sas usb_storage dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
[ 7.924755] CPU: 0 PID: 238 Comm: kworker/0:3 Not tainted 4.18.0-372.88.1.el8_6_smartpqi_test.x86_64 #1
[ 7.944336] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 03/08/2022
[ 7.963026] Workqueue: events work_for_cpu_fn
[ 7.978275] RIP: 0010:blk_mq_pci_map_queues+0xca/0xd0
[ 7.978278] Code: 48 89 de 89 c7 e8 f6 0f 4f 00 3b 05 c4 b7 8e 01 72 e1 5b 31 c0 5d 41 5c 41 5d 41 5e 41 5f e9 7d df 73 00 31 c0 e9 76 df 73 00 <0f> 0b eb bc 90 90 0f 1f 44 00 00 41 57 49 89 ff 41 56 41 55 41 54
[ 7.978280] RSP: 0018:ffffa95fc3707d50 EFLAGS: 00010216
[ 7.978283] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000010
[ 7.978284] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff9190c32d4310
[ 7.978286] RBP: 0000000000000000 R08: ffffa95fc3707d38 R09: ffff91929b81ac00
[ 7.978287] R10: 0000000000000001 R11: ffffa95fc3707ac0 R12: 0000000000000000
[ 7.978288] R13: ffff9190c32d4000 R14: 00000000ffffffff R15: ffff9190c4c950a8
[ 7.978290] FS: 0000000000000000(0000) GS:ffff9193efc00000(0000) knlGS:0000000000000000
[ 7.978292] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.172814] CR2: 000055d11166c000 CR3: 00000002dae10002 CR4: 00000000007706f0
[ 8.172816] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8.172817] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8.172818] PKRU: 55555554
[ 8.172819] Call Trace:
[ 8.172823] blk_mq_alloc_tag_set+0x12e/0x310
[ 8.264339] scsi_add_host_with_dma.cold.9+0x30/0x245
[ 8.279302] pqi_ctrl_init+0xacf/0xc8e [smartpqi]
[ 8.294085] ? pqi_pci_probe+0x480/0x4c8 [smartpqi]
[ 8.309015] pqi_pci_probe+0x480/0x4c8 [smartpqi]
[ 8.323286] local_pci_probe+0x42/0x80
[ 8.337855] work_for_cpu_fn+0x16/0x20
[ 8.351193] process_one_work+0x1a7/0x360
[ 8.364462] ? create_worker+0x1a0/0x1a0
[ 8.379252] worker_thread+0x1ce/0x390
[ 8.392623] ? create_worker+0x1a0/0x1a0
[ 8.406295] kthread+0x10a/0x120
[ 8.418428] ? set_kthread_struct+0x50/0x50
[ 8.431532] ret_from_fork+0x1f/0x40
[ 8.444137] ---[ end trace 1bf0173d39354506 ]---
Fixes: cf15c3e734e8 ("scsi: smartpqi: Add module param to disable managed ints")
Tested-by: Yogesh Chandra Pandey <YogeshChandra.Pandey@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20240213162200.1875970-2-don.brace@microchip.com
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit b5fc07a5fb56216a49e6c1d0b172d5464d99a89b upstream.
Commit c92a6b5d6335 ("scsi: core: Query VPD size before getting full
page") removed the logic which checks whether a VPD page is present on
the supported pages list before asking for the page itself. That was
done because SPC helpfully states "The Supported VPD Pages VPD page
list may or may not include all the VPD pages that are able to be
returned by the device server". Testing had revealed a few devices
that supported some of the 0xBn pages but didn't actually list them in
page 0.
Julian Sikorski bisected a problem with his drive resetting during
discovery to the commit above. As it turns out, this particular drive
firmware will crash if we attempt to fetch page 0xB9.
Various approaches were attempted to work around this. In the end,
reinstating the logic that consults VPD page 0 before fetching any
other page was the path of least resistance. A firmware update for the
devices which originally compelled us to remove the check has since
been released.
Link: https://lore.kernel.org/r/20240214221411.2888112-1-martin.petersen@oracle.com
Fixes: c92a6b5d6335 ("scsi: core: Query VPD size before getting full page")
Cc: stable@vger.kernel.org
Cc: Bart Van Assche <bvanassche@acm.org>
Reported-by: Julian Sikorski <belegdol@gmail.com>
Tested-by: Julian Sikorski <belegdol@gmail.com>
Reviewed-by: Lee Duncan <lee.duncan@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 321da3dc1f3c92a12e3c5da934090d2992a8814c upstream.
It has been observed that some USB/UAS devices return generic properties
hardcoded in firmware for mode pages for a period of time after a device
has been discovered. The reported properties are either garbage or they do
not accurately reflect the characteristics of the physical storage device
attached in the case of a bridge.
Prior to commit 1e029397d12f ("scsi: sd: Reorganize DIF/DIX code to
avoid calling revalidate twice") we would call revalidate several
times during device discovery. As a result, incorrect values would
eventually get replaced with ones accurately describing the attached
storage. When we did away with the redundant revalidate pass, several
cases were reported where devices reported nonsensical values or would
end up in write-protected state.
An initial attempt at addressing this issue involved introducing a
delayed second revalidate invocation. However, this approach still
left some devices reporting incorrect characteristics.
Tasos Sahanidis debugged the problem further and identified that
introducing a READ operation prior to MODE SENSE fixed the problem and that
it wasn't a timing issue. Issuing a READ appears to cause the devices to
update their state to reflect the actual properties of the storage
media. Device properties like vendor, model, and storage capacity appear to
be correctly reported from the get-go. It is unclear why these devices
defer populating the remaining characteristics.
Match the behavior of a well known commercial operating system and
trigger a READ operation prior to querying device characteristics to
force the device to populate the mode pages.
The additional READ is triggered by a flag set in the USB storage and
UAS drivers. We avoid issuing the READ for other transport classes
since some storage devices identify Linux through our particular
discovery command sequence.
Link: https://lore.kernel.org/r/20240213143306.2194237-1-martin.petersen@oracle.com
Fixes: 1e029397d12f ("scsi: sd: Reorganize DIF/DIX code to avoid calling revalidate twice")
Cc: stable@vger.kernel.org
Reported-by: Tasos Sahanidis <tasos@tasossah.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Tasos Sahanidis <tasos@tasossah.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d6c1b19153f92e95e5e1801d540e98771053afae ]
LUNs going into "failed ready running" state observed on >1T and on even
numbers of size (2T, 4T, 6T, 8T and 10T). The issue occurs when DIF is
enabled at the host.
The kernel logs:
Cannot setup S/G List for HBAIO segs 1/1 SGL 512 SCSI 256: 3 0
The host lpfc driver is failing to setup scatter/gather list (protection
data) for the I/Os.
The return type lpfc_bg_setup_sgl()/lpfc_bg_setup_sgl_prot() causes the
compiler to remove the most significant bit. Use an unsigned type instead.
Signed-off-by: Hannes Reinecke <hare@suse.de>
[dwagner: added commit message]
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Link: https://lore.kernel.org/r/20231220162658.12392-1-dwagner@suse.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fb4cece17b4583f55b34a8538e27a4adc833c9d4 ]
Correct rescan flag race condition.
Multiple conditions are being evaluated before notifying OS to do a rescan.
Driver will skip rescanning the device if any one of the following
conditions are met:
- Devices that have not yet been added to the OS or devices that have been
removed.
- Devices which are already marked for removal or in the phase of removal.
Under very rare conditions, after logical volume size expansion, the OS
still sees the size of the logical volume which was before expansion.
The rescan flag in the driver is used to signal the need for a logical
volume rescan. A race condition can occur in the driver, and it leads to
one thread overwriting the flag inadvertently. As a result, driver is not
notifying the OS SML to rescan the logical volume.
Move device->rescan update into new function pqi_mark_volumes_for_rescan()
and protect with a spin lock.
Move check for device->rescan into new function pqi_volume_rescan_needed()
and protect function call with a spin_lock.
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Co-developed-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20231219193653.277553-3-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c6d5aa44eaf6d119f9ceb3bfc7d22405ac04232a ]
All PCI ID entries in Hex.
Add PCI IDs for Cisco controllers:
VID / DID / SVID / SDID
---- ---- ---- ----
Cisco 24G TriMode M1 RAID 4GB FBWC 32D 9005 / 028f / 1137 / 02f8
Cisco 24G TriMode M1 RAID 4GB FBWC 16D 9005 / 028f / 1137 / 02f9
Cisco 24G TriMode M1 HBA 16D 9005 / 028f / 1137 / 02fa
Add PCI IDs for CloudNine controllers:
VID / DID / SVID / SDID
---- ---- ---- ----
SmartRAID P7604N-16i 9005 / 028f / 1f51 / 100e
SmartRAID P7604N-8i 9005 / 028f / 1f51 / 100f
SmartRAID P7504N-16i 9005 / 028f / 1f51 / 1010
SmartRAID P7504N-8i 9005 / 028f / 1f51 / 1011
SmartRAID P7504N-8i 9005 / 028f / 1f51 / 1043
SmartHBA P6500-8i 9005 / 028f / 1f51 / 1044
SmartRAID P7504-8i 9005 / 028f / 1f51 / 1045
Reviewed-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: David Strahan <david.strahan@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20231219193653.277553-2-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 977fe773dcc7098d8eaf4ee6382cb51e13e784cb upstream.
This reverts commit 1a1975551943f681772720f639ff42fbaa746212.
This commit causes interrupts to be lost for FCoE devices, since it changed
sping locks from "bh" to "irqsave".
Instead, a work queue should be used, and will be addressed in a separate
commit.
Fixes: 1a1975551943 ("scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock")
Signed-off-by: Lee Duncan <lduncan@suse.com>
Link: https://lore.kernel.org/r/c578cdcd46b60470535c4c4a953e6a1feca0dffd.1707500786.git.lduncan@suse.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f4469f3858352ad1197434557150b1f7086762a0 upstream.
Current code uses the specified ring buffer size (either the default of 128
Kbytes or a module parameter specified value) to encompass the one page
ring buffer header plus the actual ring itself. When the page size is 4K,
carving off one page for the header isn't significant. But when the page
size is 64K on ARM64, only half of the default 128 Kbytes is left for the
actual ring. While this doesn't break anything, the smaller ring size
could be a performance bottleneck.
Fix this by applying the VMBUS_RING_SIZE macro to the specified ring buffer
size. This macro adds a page for the header, and rounds up the size to a
page boundary, using the page size for which the kernel is built. Use this
new size for subsequent ring buffer calculations. For example, on ARM64
with 64K page size and the default ring size, this results in the actual
ring being 128 Kbytes, which is intended.
Cc: stable@vger.kernel.org # 5.15.x
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20240122170956.496436-1-mhklinux@outlook.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4e6c9011990726f4d175e2cdfebe5b0b8cce4839 ]
Commit 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock
for waking up EH handler") intended to fix a hard lockup issue triggered by
EH. The core idea was to move scsi_host_busy() out of the host lock when
processing individual commands for EH. However, a suggested style change
inadvertently caused scsi_host_busy() to remain under the host lock. Fix
this by calling scsi_host_busy() outside the lock.
Fixes: 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler")
Cc: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240203024521.2006455-1-ming.lei@redhat.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4373534a9850627a2695317944898eb1283a2db0 ]
Inside scsi_eh_wakeup(), scsi_host_busy() is called & checked with host
lock every time for deciding if error handler kthread needs to be waken up.
This can be too heavy in case of recovery, such as:
- N hardware queues
- queue depth is M for each hardware queue
- each scsi_host_busy() iterates over (N * M) tag/requests
If recovery is triggered in case that all requests are in-flight, each
scsi_eh_wakeup() is strictly serialized, when scsi_eh_wakeup() is called
for the last in-flight request, scsi_host_busy() has been run for (N * M -
1) times, and request has been iterated for (N*M - 1) * (N * M) times.
If both N and M are big enough, hard lockup can be triggered on acquiring
host lock, and it is observed on mpi3mr(128 hw queues, queue depth 8169).
Fix the issue by calling scsi_host_busy() outside the host lock. We don't
need the host lock for getting busy count because host the lock never
covers that.
[mkp: Drop unnecessary 'busy' variables pointed out by Bart]
Cc: Ewan Milne <emilne@redhat.com>
Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240112070000.4161982-1-ming.lei@redhat.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
Tested-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 658365c6b0857e6a306436e315a8633937e3af42 ]
Clang static complains that Value stored to 'status' is never read. Return
'status' rather than 'SCI_SUCCESS'.
Fixes: f1f52e75939b ("isci: uplevel request infrastructure")
Signed-off-by: Su Hui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20240112041926.3924315-1-suhui@nfschina.com
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
HISI_PHYE_PHY_UP_PM
[ Upstream commit ce26497c745d0541aec930d5211b431a1c26af97 ]
Currently in directly attached scenario, the phyup event
HISI_PHYE_PHY_UP_PM is notified before .phy_attached is set - this may
cause the phyup work hisi_sas_bytes_dmaed() execution failed and the
attached device will not be found.
To fix it, set .phy_attached before notifing phyup event.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1702525516-51258-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
completes
[ Upstream commit aba0fb0ef607a71511d23a07406f11130c1a54c5 ]
If priority tagging is set in the service parameters of a FLOGI cmpl, then
we update the vmid_flag. In the current logic, if a follow up FLOGI cmpl
updates its service parameters such that priority tagging is no longer set,
then the vmid_flag ends up keeping stale data.
Fix by ensuring we clear the vmid_flag member during lpfc_reinit_vmid, and
check the priority tagging service parameter after reinitialization of the
vmid data structures.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231207224039.35466-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8dc8eb89f4df74593ba4bf30c3d31a0fc6d3ea47 ]
After a follow up FDISC cmpl, an NPIV's VMID data structures are not
updated.
Fix by calling lpfc_reinit_vmid and copying the physical port's vmid_flag
to the NPIV's vmid_flag in the NPIV registration cmpl code path.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231207224039.35466-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 53122a49f49796beb2c4a1bb702303b66347e29f ]
We should set the status to FC_TIMED_OUT when a timeout error is passed to
fc_fcp_rec_error().
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231129165832.224100-3-hare@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b57c4db5d23b9df0118a25e2441c9288edd73710 ]
The current FC error recovery is sending up to three REC (recovery) frames
in 10 second intervals, and as a final step sending an ABTS after 30
seconds for the command itself. Unfortunately sending an ABTS is also the
action for the SCSI abort handler, and the default timeout for SCSI
commands is also 30 seconds. This causes two ABTS to be scheduled, with the
libfc one slightly earlier. The ABTS scheduled by SCSI EH then sees the
command to be already aborted, and will always return with a 'GOOD' status
irrespective on the actual result from the first ABTS. This causes the
SCSI EH abort handler to always succeed, and SCSI EH never to be engaged.
Fix this by not issuing an ABTS when a SCSI command is present for the
exchange, but rather wait for the abort scheduled from SCSI EH. And warn
if an abort is already scheduled to avoid similar errors in the future.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231129165832.224100-2-hare@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 41c8a1a1e90fa4721f856bf3cf71211fd16d6434 ]
Add support for Areca RAID controllers with PCI device IDs 1883 and 1886.
Signed-off-by: ching Huang <ching2048@areca.com.tw>
Link: https://lore.kernel.org/r/7732e743eaad57681b1552eec9c6a86c76dbe459.camel@areca.com.tw
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c9260ff28ee561fca5f96425c9328a9698e8427b ]
Add PCI IDs checks for the cases where SAS5116 diverges from SAS4116 in
behavior.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Link: https://lore.kernel.org/r/20231123160132.4155-3-sumit.saxena@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6fa21eab82be57a3ad2470fac27b982793805336 ]
Add support for Broadcom's SAS5116 IO/RAID controllers PCI IDs.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Link: https://lore.kernel.org/r/20231123160132.4155-2-sumit.saxena@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f5779b529240b715f0e358489ad0ed933bf77c97 ]
Because file_name and phba->ModelName are both declared a size 80 bytes,
the extra ".grp" file extension could cause an overflow into file_name.
Define a ELX_FW_NAME_SIZE macro with value 84. 84 incorporates the 4 extra
characters from ".grp". file_name is changed to be declared as a char and
initialized to zeros i.e. null chars.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231031191224.150862-3-justintee8345@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6df0e077d76bd144c533b61d6182676aae6b0a85 ]
When libata calls ata_link_abort() to abort all ata queued commands, it
calls blk_abort_request() on the SCSI command representing each QC.
This causes scsi_timeout() to be called, which calls scsi_eh_scmd_add() for
each SCSI command.
scsi_eh_scmd_add() sets the SCSI host to state recovery, and then adds the
command to shost->eh_cmd_q.
This will wake up the SCSI EH, and eventually the libata EH strategy
handler will be called, which calls scsi_eh_flush_done_q() to either flush
retry or flush finish each failed command.
The commands that are flush retried by scsi_eh_flush_done_q() are done so
using scsi_queue_insert().
Before commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), __scsi_queue_insert() called blk_mq_requeue_request() with the
second argument set to true, indicating that it should always kick/run the
requeue list after inserting.
After commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), __scsi_queue_insert() does not kick/run the requeue list after
inserting, if the current SCSI host state is recovery (which is the case in
the libata example above).
This optimization is probably fine in most cases, as I can only assume that
most often someone will eventually kick/run the queues.
However, that is not the case for scsi_eh_flush_done_q(), where we can see
that the request gets inserted to the requeue list, but the queue is never
started after the request has been inserted, leading to the block layer
waiting for the completion of command that never gets to run.
Since scsi_eh_flush_done_q() is called by SCSI EH context, the SCSI host
state is most likely always in recovery when this function is called.
Thus, let scsi_eh_flush_done_q() explicitly kick the requeue list after
inserting a flush retry command, so that scsi_eh_flush_done_q() keeps the
same behavior as before commit 8b566edbdbfb ("scsi: core: Only kick the
requeue list if necessary").
Simple reproducer for the libata example above:
$ hdparm -Y /dev/sda
$ echo 1 > /sys/class/scsi_device/0\:0\:0\:0/device/delete
Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary")
Reported-by: Kevin Locke <kevin@kevinlocke.name>
Closes: https://lore.kernel.org/linux-scsi/ZZw3Th70wUUvCiCY@kevinlocke.name/
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240111120533.3612509-1-cassel@kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
State
commit f8fb3f39148e8010479e4b2003ba4728818ec661 upstream.
If a controller reset is underway or the controller is in an unrecoverable
state, the PEL enable management command will be returned as EAGAIN or
EFAULT.
Cc: <stable@vger.kernel.org> # v6.1+
Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com>
Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com>
Signed-off-by: Chandrakanth patil <chandrakanth.patil@broadcom.com>
Link: https://lore.kernel.org/r/20231126053134.10133-4-chandrakanth.patil@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c01d515687e358b22aa8414d6dac60d7defa6eb9 upstream.
After a controller reset, if the firmware changes the state of devices to
"hide", then remove those devices from the OS.
Cc: <stable@vger.kernel.org> # v6.6+
Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com>
Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com>
Signed-off-by: Chandrakanth patil <chandrakanth.patil@broadcom.com>
Link: https://lore.kernel.org/r/20231126053134.10133-3-chandrakanth.patil@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e5aab848dfdf7996d20ece4d28d2733c732c5e5a upstream.
After a controller reset, the firmware may modify the device queue depth.
Therefore, update the device queue depth accordingly.
Cc: <stable@vger.kernel.org> # v5.15+
Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com>
Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com>
Signed-off-by: Chandrakanth patil <chandrakanth.patil@broadcom.com>
Link: https://lore.kernel.org/r/20231126053134.10133-2-chandrakanth.patil@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 73e33f969ef05328766b23a99b2c07bfff765009 ]
In function debugfs_debugfs_snapshot_global_reg_v3_hw() it uses
debugfs_axi_reg.count (which is the number of axi debugfs registers) to
acquire the number of global debugfs registers.
Use debugfs_global_reg.count to acquire the number of global debugfs
registers instead.
Fixes: 623a4b6d5c2a ("scsi: hisi_sas: Move debugfs code to v3 hw driver")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1702525516-51258-6-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7ea3e7763c50b20a8bd25cf524ea0c6463de69be ]
We obtain the semaphore and set HISI_SAS_RESETTING_BIT in
hisi_sas_reset_prepare_v3_hw(), block the scsi host and set
HISI_SAS_REJECT_CMD_BIT in hisi_sas_controller_reset_prepare(), released
them in hisi_sas_controller_reset_done(). However, if the HW reset failure
in FLR results in early return, the semaphore and flag bits will not be
release.
Rollback some operations including clearing flags / releasing semaphore
when FLR is failed.
Fixes: e5ea48014adc ("scsi: hisi_sas: Implement handlers of PCIe FLR for v3 hw")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1702525516-51258-5-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8dd10296be8562a45c6c6794dd492a2b7dccede8 ]
In commit 4b329abc9180 ("scsi: hisi_sas: Move slot variable definition in
hisi_sas_abort_task()"), we move the variables slot to the function head.
However, the variable slot may be NULL, we should check it in each branch.
Fixes: 4b329abc9180 ("scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task()")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1702525516-51258-4-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d34ee535705eb43885bc0f561c63046f697355ad ]
In function hisi_sas_controller_prereset(), -ENOSYS (Function not
implemented) should be returned if the driver does not support .soft_reset.
Returns -EPERM (Operation not permitted) if HISI_SAS_RESETTING_BIT is
already be set.
In function _suspend_v3_hw(), returns -EPERM (Operation not permitted) if
HISI_SAS_RESETTING_BIT is already be set.
Fixes: 4522204ab218 ("scsi: hisi_sas: tidy host controller reset function a bit")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1702525516-51258-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f5f27a332a14f43463aa0075efa3a0c662c0f4a8 ]
In fnic_init_module() exists redundant check for return value from
fnic_debugfs_init(), because at moment it only can return zero. It make
sense to process theoretical vmalloc() failure.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 9730ddfb123d ("scsi: fnic: remove redundant assignment of variable rc")
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Link: https://lore.kernel.org/r/20231128111008.2280507-1-artem.chernyshev@red-soft.ru
Reviewed-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0349be31e4ffc79723e46e2e373569567b06347b ]
Fix the following sparse warning:
drivers/scsi/bfa/bfad_bsg.c:2553:50: sparse: sparse: incorrect type in initializer (different base types)
Fixes: 2e5a6c3baccd ("scsi: bfa: Convert bfad_reset_sdev_bflags() from a macro into a function")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311031255.lmSPisIk-lkp@intel.com/
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20231115193338.2261972-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 066c5b46b6eaf2f13f80c19500dbb3b84baabb33 upstream.
In commit 8930a6c20791 ("scsi: core: add support for request batching") the
block layer bd->last flag was mapped to SCMD_LAST and used as an indicator
to send the batch for the drivers that implement this feature. However, the
error handling code was not updated accordingly.
scsi_send_eh_cmnd() is used to send error handling commands and request
sense. The problem is that request sense comes as a single command that
gets into the batch queue and times out. As a result the device goes
offline after several failed resets. This was observed on virtio_scsi
during a device resize operation.
[ 496.316946] sd 0:0:4:0: [sdd] tag#117 scsi_eh_0: requesting sense
[ 506.786356] sd 0:0:4:0: [sdd] tag#117 scsi_send_eh_cmnd timeleft: 0
[ 506.787981] sd 0:0:4:0: [sdd] tag#117 abort
To fix this always set SCMD_LAST flag in scsi_send_eh_cmnd() and
scsi_reset_ioctl().
Fixes: 8930a6c20791 ("scsi: core: add support for request batching")
Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Atanasov <alexander.atanasov@virtuozzo.com>
Link: https://lore.kernel.org/r/20231215121008.2881653-1-alexander.atanasov@virtuozzo.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c5becf57dd5659c687d41d623a69f42d63f59eb2 upstream.
This reverts commit 9dc704dcc09eae7d21b5da0615eb2ed79278f63e.
Several reports have been made indicating that this commit caused
hangs. Numerous attempts at root causing and fixing the issue have
been unsuccessful so let's revert for now.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217599
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 08c94d80b2da481652fb633e79cbc41e9e326a91 ]
skb_share_check() already drops the reference to the skb when returning
NULL. Using kfree_skb() in the error handling path leads to an skb double
free.
Fix this by removing the variable tmp_skb, and return directly when
skb_share_check() returns NULL.
Fixes: 01a4cc4d0cd6 ("bnx2fc: do not add shared skbs to the fcoe_rx_list")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20221114110626.526643-1-weiyongjun@huaweicloud.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 235f2b548d7f4ac5931d834f05d3f7f5166a2e72 ]
When an error occurs in the for loop of beiscsi_init_wrb_handle(), we
should free phwi_ctxt->be_wrbq before returning an error code to prevent
potential memleak.
Fixes: a7909b396ba7 ("[SCSI] be2iscsi: Fix dynamic CID allocation Mechanism in driver")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Link: https://lore.kernel.org/r/20231123081941.24854-1-dinghao.liu@zju.edu.cn
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3b83486399a6a9feb9c681b74c21a227d48d7020 ]
If scsi_execute_cmd() returns < 0, it doesn't initialize the sshdr, so we
shouldn't access the sshdr. If it returns 0, then the cmd executed
successfully, so there is no need to check the sshdr. sd_sync_cache() will
only access the sshdr if it's been setup because it calls
scsi_status_is_check_condition() before accessing it. However, the
sd_sync_cache() caller, sd_suspend_common(), does not check.
sd_suspend_common() is only checking for ILLEGAL_REQUEST which it's using
to determine if the command is supported. If it's not it just ignores the
error. So to fix its sshdr use this patch just moves that check to
sd_sync_cache() where it converts ILLEGAL_REQUEST to success/0.
sd_suspend_common() was ignoring that error and sd_shutdown() doesn't check
for errors so there will be no behavior changes.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231106231304.5694-2-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit b09d7f8fd50f6e93cbadd8d27fde178f745b42a1 upstream.
It is not always possible to keep a device in the runtime suspended state
when a system level suspend/resume cycle is executed. E.g. for ATA devices
connected to AHCI adapters, system resume resets the ATA ports, which
causes connected devices to spin up. In such case, a runtime suspended disk
will incorrectly be seen with a suspended runtime state because the device
is not resumed by sd_resume_system(). The power state seen by the user is
different than the actual device physical power state.
Fix this issue by introducing the struct scsi_device flag
force_runtime_start_on_system_start. When set, this flag causes
sd_resume_system() to request a runtime resume operation for runtime
suspended devices. This results in the user seeing the device runtime_state
as active after a system resume, thus correctly reflecting the device
physical power state.
Fixes: 9131bff6a9f1 ("scsi: core: pm: Only runtime resume if necessary")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20231120225631.37938-3-dlemoal@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 19597cad64d608aa8ac2f8aef50a50187a565223 upstream.
User experiences system crash when running AER error injection. The
perturbation causes the abort-all-I/O path to trigger. The driver assumes
all I/O on this path is FCP only. If there is both NVMe & FCP traffic, a
system crash happens. Add additional check to see if I/O is FCP or not
before access.
PID: 999019 TASK: ff35d769f24722c0 CPU: 53 COMMAND: "kworker/53:1"
0 [ff3f78b964847b58] machine_kexec at ffffffffae86973d
1 [ff3f78b964847ba8] __crash_kexec at ffffffffae9be29d
2 [ff3f78b964847c70] crash_kexec at ffffffffae9bf528
3 [ff3f78b964847c78] oops_end at ffffffffae8282ab
4 [ff3f78b964847c98] exc_page_fault at ffffffffaf2da502
5 [ff3f78b964847cc0] asm_exc_page_fault at ffffffffaf400b62
[exception RIP: qla2x00_abort_srb+444]
RIP: ffffffffc07b5f8c RSP: ff3f78b964847d78 RFLAGS: 00010046
RAX: 0000000000000282 RBX: ff35d74a0195a200 RCX: ff35d76886fd03a0
RDX: 0000000000000001 RSI: ffffffffc07c5ec8 RDI: ff35d74a0195a200
RBP: ff35d76913d22080 R8: ff35d7694d103200 R9: ff35d7694d103200
R10: 0000000100000000 R11: ffffffffb05d6630 R12: 0000000000010000
R13: ff3f78b964847df8 R14: ff35d768d8754000 R15: ff35d768877248e0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
6 [ff3f78b964847d70] qla2x00_abort_srb at ffffffffc07b5f84 [qla2xxx]
7 [ff3f78b964847de0] __qla2x00_abort_all_cmds at ffffffffc07b6238 [qla2xxx]
8 [ff3f78b964847e38] qla2x00_abort_all_cmds at ffffffffc07ba635 [qla2xxx]
9 [ff3f78b964847e58] qla2x00_terminate_rport_io at ffffffffc08145eb [qla2xxx]
10 [ff3f78b964847e70] fc_terminate_rport_io at ffffffffc045987e [scsi_transport_fc]
11 [ff3f78b964847e88] process_one_work at ffffffffae914f15
12 [ff3f78b964847ed0] worker_thread at ffffffffae9154c0
13 [ff3f78b964847f10] kthread at ffffffffae91c456
14 [ff3f78b964847f50] ret_from_fork at ffffffffae8036ef
Cc: stable@vger.kernel.org
Fixes: f45bca8c5052 ("scsi: qla2xxx: Fix double scsi_done for abort path")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20231030064912.37912-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
selected registers
commit 8e3ed9e786511ad800c33605ed904b9de49323cf upstream.
In BMC environments with concurrent access to multiple registers, certain
registers occasionally yield a value of 0 even after 3 retries due to
hardware errata. As a fix, we have extended the retry count from 3 to 30.
The same errata applies to the mpt3sas driver, and a similar patch has
been accepted. Please find more details in the mpt3sas patch reference
link.
Link: https://lore.kernel.org/r/20230829090020.5417-2-ranjan.kumar@broadcom.com
Fixes: 272652fcbf1a ("scsi: megaraid_sas: add retry logic in megasas_readl")
Cc: stable@vger.kernel.org
Signed-off-by: Chandrakanth patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Link: https://lore.kernel.org/r/20231003110021.168862-2-chandrakanth.patil@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3c978492c333f0c08248a8d51cecbe5eb5f617c9 upstream.
The retry loop continues to iterate until the count reaches 30, even after
receiving the correct value. Exit loop when a non-zero value is read.
Fixes: 4ca10f3e3174 ("scsi: mpt3sas: Perform additional retries if doorbell read returns 0")
Cc: stable@vger.kernel.org
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20231020105849.6350-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4df105f0ce9f6f30cda4e99f577150d23f0c9c5f ]
fc_lport_ptp_setup() did not check the return value of fc_rport_create()
which can return NULL and would cause a NULL pointer dereference. Address
this issue by checking return value of fc_rport_create() and log error
message on fc_rport_create() failed.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231011130350.819571-1-haowenchao2@huawei.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b39f2d10b86d0af353ea339e5815820026bca48f ]
In practice the driver should never send more commands than are allocated
to a queue's event pool. In the unlikely event that this happens, the code
asserts a BUG_ON, and in the case that the kernel is not configured to
crash on panic returns a junk event pointer from the empty event list
causing things to spiral from there. This BUG_ON is a historical artifact
of the ibmvfc driver first being upstreamed, and it is well known now that
the use of BUG_ON is bad practice except in the most unrecoverable
scenario. There is nothing about this scenario that prevents the driver
from recovering and carrying on.
Remove the BUG_ON in question from ibmvfc_get_event() and return a NULL
pointer in the case of an empty event pool. Update all call sites to
ibmvfc_get_event() to check for a NULL pointer and perfrom the appropriate
failure or recovery action.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-2-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6de426f9276c448e2db7238911c97fb157cb23be ]
If init debugfs failed during device registration due to memory allocation
failure, debugfs_remove_recursive() is called, after which debugfs_dir is
not set to NULL. debugfs_remove_recursive() will be called again during
device removal. As a result, illegal pointer is accessed.
[ 1665.467244] hisi_sas_v3_hw 0000:b4:02.0: failed to init debugfs!
...
[ 1669.836708] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0
[ 1669.872669] pc : down_write+0x24/0x70
[ 1669.876315] lr : down_write+0x1c/0x70
[ 1669.879961] sp : ffff000036f53a30
[ 1669.883260] x29: ffff000036f53a30 x28: ffffa027c31549f8
[ 1669.888547] x27: ffffa027c3140000 x26: 0000000000000000
[ 1669.893834] x25: ffffa027bf37c270 x24: ffffa027bf37c270
[ 1669.899122] x23: ffff0000095406b8 x22: ffff0000095406a8
[ 1669.904408] x21: 0000000000000000 x20: ffffa027bf37c310
[ 1669.909695] x19: 00000000000000a0 x18: ffff8027dcd86f10
[ 1669.914982] x17: 0000000000000000 x16: 0000000000000000
[ 1669.920268] x15: 0000000000000000 x14: ffffa0274014f870
[ 1669.925555] x13: 0000000000000040 x12: 0000000000000228
[ 1669.930842] x11: 0000000000000020 x10: 0000000000000bb0
[ 1669.936129] x9 : ffff000036f537f0 x8 : ffff80273088ca10
[ 1669.941416] x7 : 000000000000001d x6 : 00000000ffffffff
[ 1669.946702] x5 : ffff000008a36310 x4 : ffff80273088be00
[ 1669.951989] x3 : ffff000009513e90 x2 : 0000000000000000
[ 1669.957276] x1 : 00000000000000a0 x0 : ffffffff00000001
[ 1669.962563] Call trace:
[ 1669.965000] down_write+0x24/0x70
[ 1669.968301] debugfs_remove_recursive+0x5c/0x1b0
[ 1669.972905] hisi_sas_debugfs_exit+0x24/0x30 [hisi_sas_main]
[ 1669.978541] hisi_sas_v3_remove+0x130/0x150 [hisi_sas_v3_hw]
[ 1669.984175] pci_device_remove+0x48/0xd8
[ 1669.988082] device_release_driver_internal+0x1b4/0x250
[ 1669.993282] device_release_driver+0x28/0x38
[ 1669.997534] pci_stop_bus_device+0x84/0xb8
[ 1670.001611] pci_stop_and_remove_bus_device_locked+0x24/0x40
[ 1670.007244] remove_store+0xfc/0x140
[ 1670.010802] dev_attr_store+0x44/0x60
[ 1670.014448] sysfs_kf_write+0x58/0x80
[ 1670.018095] kernfs_fop_write+0xe8/0x1f0
[ 1670.022000] __vfs_write+0x60/0x190
[ 1670.025472] vfs_write+0xac/0x1c0
[ 1670.028771] ksys_write+0x6c/0xd8
[ 1670.032071] __arm64_sys_write+0x24/0x30
[ 1670.035977] el0_svc_common+0x78/0x130
[ 1670.039710] el0_svc_handler+0x38/0x78
[ 1670.043442] el0_svc+0x8/0xc
To fix this, set debugfs_dir to NULL after debugfs_remove_recursive().
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1694571327-78697-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 670106eb4c8b23475f8c2b3416005a312afa622f ]
Commit 0217a272fe13 ("scsi: ibmvfc: Store return code of H_FREE_SUB_CRQ
during cleanup") wrongly changed the busy loop check to use
rtas_busy_delay() instead of H_BUSY and H_IS_LONG_BUSY(). The busy return
codes for RTAS and hypercalls are not the same.
Fix this issue by restoring the use of H_BUSY and H_IS_LONG_BUSY().
Fixes: 0217a272fe13 ("scsi: ibmvfc: Store return code of H_FREE_SUB_CRQ during cleanup")
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-5-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fix from Damien Le Moal:
"A single patch to fix a regression introduced by the recent
suspend/resume fixes.
The regression is that ATA disks are not stopped on system shutdown,
which is not recommended and increases the disks SMART counters for
unclean power off events.
This patch fixes this by refining the recent rework of the scsi device
manage_xxx flags"
* tag 'ata-6.6-final' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
scsi: sd: Introduce manage_shutdown device flag
|
|
Commit aa3998dbeb3a ("ata: libata-scsi: Disable scsi device
manage_system_start_stop") change setting the manage_system_start_stop
flag to false for libata managed disks to enable libata internal
management of disk suspend/resume. However, a side effect of this change
is that on system shutdown, disks are no longer being stopped (set to
standby mode with the heads unloaded). While this is not a critical
issue, this unclean shutdown is not recommended and shows up with
increased smart counters (e.g. the unexpected power loss counter
"Unexpect_Power_Loss_Ct").
Instead of defining a shutdown driver method for all ATA adapter
drivers (not all of them define that operation), this patch resolves
this issue by further refining the sd driver start/stop control of disks
using the new flag manage_shutdown. If this new flag is set to true by
a low level driver, the function sd_shutdown() will issue a
START STOP UNIT command with the start argument set to 0 when a disk
needs to be powered off (suspended) on system power off, that is, when
system_state is equal to SYSTEM_POWER_OFF.
Similarly to the other manage_xxx flags, the new manage_shutdown flag is
exposed through sysfs as a read-write device attribute.
To avoid any confusion between manage_shutdown and
manage_system_start_stop, the comments describing these flags in
include/scsi/scsi.h are also improved.
Fixes: aa3998dbeb3a ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218038
Link: https://lore.kernel.org/all/cd397c88-bf53-4768-9ab8-9d107df9e613@gmail.com/
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes, both in drivers.
The mptsas one is really fixing an error path issue where it can leave
the misc driver loaded even though the sas driver fails to initialize"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix double free of dsd_list during driver load
scsi: mpt3sas: Fix in error path
|
|
On driver load, scsi_add_host() can fail. This triggers the free path to
call qla2x00_mem_free() multiple times. This causes NULL pointer access of
ha->base_qpair. Add check before access.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffffc118f73c>] qla2x00_mem_free+0x51c/0xcb0 [qla2xxx]
PGD 8000001fcfe4a067 PUD 1fc8f0a067 PMD 0
Oops: 0000 [#1] SMP
RIP: 0010:[<ffffffffc118f73c>] [<ffffffffc118f73c>] qla2x00_mem_free+0x51c/0xcb0 [qla2xxx]
RSP: 0018:ffff8ace97a93a30 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8ace8efd0000 RCX: 000000000000488f
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff8ace97a93a60 R08: 000000000001f040 R09: ffffffff8678209b
R10: ffff8acf7d6df040 R11: ffffc591c0fcc980 R12: ffffffff87034800
R13: ffff8acf0e3cc740 R14: ffff8ace8efd0000 R15: 00000000fffffff4
FS: 00007f4cf5449740(0000) GS:ffff8acf7d6c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 0000001fc2f6c000 CR4: 00000000007607e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff86781f18>] ? kobject_put+0x28/0x60
[<ffffffffc119a59c>] qla2x00_probe_one+0x19fc/0x3040 [qla2xxx]
Fixes: efeda3bf912f ("scsi: qla2xxx: Move resource to allow code reuse")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20231016101749.5059-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The driver should be deregistered as misc driver after PCI registration
failure.
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Link: https://lore.kernel.org/r/20231015114529.10725-1-thenzl@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Commit ff48b37802e5 ("scsi: Do not attempt to rescan suspended devices")
modified scsi_rescan_device() to avoid attempting rescanning a suspended
device. However, the modification added a check to verify that a SCSI
device is in the running state without checking if the device request
queue (in the case of block device) is also running, thus allowing the
exectuion of internal requests. Without checking the device request
queue, commit ff48b37802e5 fix is incomplete and deadlocks on resume can
still happen. Use blk_queue_pm_only() to check if the device request
queue allows executing commands in addition to checking the SCSI device
state.
Reported-by: Petr Tesarik <petr@tesarici.cz>
Fixes: ff48b37802e5 ("scsi: Do not attempt to rescan suspended devices")
Cc: stable@vger.kernel.org
Tested-by: Petr Tesarik <petr@tesarici.cz>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|