summaryrefslogtreecommitdiff
path: root/drivers/scsi/lpfc/lpfc.h
AgeCommit message (Collapse)AuthorFilesLines
2021-11-05Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-0/+1
Pull SCSI updates from James Bottomley: "This consists of the usual driver updates (ufs, smartpqi, lpfc, target, megaraid_sas, hisi_sas, qla2xxx) and minor updates and bug fixes. Notable core changes are the removal of scsi->tag which caused some churn in obsolete drivers and a sweep through all drivers to call scsi_done() directly instead of scsi->done() which removes a pointer indirection from the hot path and a move to register core sysfs files earlier, which means they're available to KOBJ_ADD processing, which necessitates switching all drivers to using attribute groups" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits) scsi: lpfc: Update lpfc version to 14.0.0.3 scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss scsi: lpfc: Fix link down processing to address NULL pointer dereference scsi: lpfc: Allow PLOGI retry if previous PLOGI was aborted scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine scsi: lpfc: Correct sysfs reporting of loop support after SFP status change scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_reset scsi: lpfc: Revert LOG_TRACE_EVENT back to LOG_INIT prior to driver_resource_setup() scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer scsi: ufs: mediatek: Avoid sched_clock() misuse scsi: mpt3sas: Make mpt3sas_dev_attrs static scsi: scsi_transport_sas: Add 22.5 Gbps link rate definitions scsi: target: core: Stop using bdevname() scsi: aha1542: Use memcpy_{from,to}_bvec() scsi: sr: Add error handling support for add_disk() scsi: sd: Add error handling support for add_disk() scsi: target: Perform ALUA group changes in one step scsi: target: Replace lun_tg_pt_gp_lock with rcu in I/O path scsi: target: Fix alua_tg_pt_gps_count tracking scsi: target: Fix ordered tag handling ...
2021-10-18block: move elevator.h to block/Christoph Hellwig1-0/+1
Except for the features passed to blk_queue_required_elevator_features, elevator.h is only needed internally to the block layer. Move the ELEVATOR_F_* definitions to blkdev.h, and the move elevator.h to block/, dropping all the spurious includes outside of that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-15scsi: lpfc: Fix EEH support for NVMe I/OJames Smart1-0/+1
Injecting errors on the PCI slot while the driver is handling NVMe I/O will cause crashes and hangs. There are several rather difficult scenarios occurring. The main issue is that the adapter can report a PCI error before or simultaneously to the PCI subsystem reporting the error. Both paths have different entry points and currently there is no interlock between them. Thus multiple teardown paths are competing and all heck breaks loose. Complicating things is the NVMs path. To a large degree, I/O was able to be shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a similar call. At best, it works on a per-controller basis, but even at the controller level, it's a controller "reset" call. All of which means I/O is still flowing on different CPUs with reset paths expecting hw access (mailbox commands) to execute properly. The following modifications are made: - A new flag is set in PCI error entrypoints so the driver can track being called by that path. - An interlock is added in the SLI hw error path and the PCI error path such that only one of the paths proceeds with the teardown logic. - RPI cleanup is patched such that RPIs are marked unregistered w/o mbx cmds in cases of hw error. - If entering the SLI port re-init calls, a case where SLI error teardown was quick and beat the PCI calls now reporting error, check whether the SLI port is still live on the PCI bus. - In the PCI reset code to bring the adapter back, recheck the IRQ settings. Different checks for SLI3 vs SLI4. - In I/O completions, that may be called as part of the cleanup or underway just before the hw error, check the state of the adapter. If in error, shortcut handling that would expect further adapter completions as the hw error won't be sending them. - In routines waiting on I/O completions, which may have been in progress prior to the hw error, detect the device is being torn down and abort from their waits and just give up. This points to a larger issue in the driver on ref-counting for data structures, as it doesn't have ref-counting on q and port structures. We'll do this fix for now as it would be a major rework to be done differently. - Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being failed back due to hw error. - In I/O buf allocation, done at the start of new I/Os, check hw state and fail if hw error. Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-25scsi: lpfc: Add cmf_info sysfs entryJames Smart1-0/+1
Allow abbreviated cm framework status information to be obtained via sysfs. Link: https://lore.kernel.org/r/20210816162901.121235-14-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-25scsi: lpfc: Add debugfs support for cm framework buffersJames Smart1-0/+2
Add support via debugfs to report the cm statistics, cm enablement, and rx monitor information. Link: https://lore.kernel.org/r/20210816162901.121235-13-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-25scsi: lpfc: Add rx monitoring statisticsJames Smart1-0/+21
The driver provides overwatch of the cm behavior by maintaining a set of rx I/O statistics. This information is also used in later updating of the cm statistics buffer. Link: https://lore.kernel.org/r/20210816162901.121235-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-25scsi: lpfc: Add support for the CM frameworkJames Smart1-3/+30
Complete the enablement of the cm framework feature in the adapter. Perform the following: - Detect the presence of the congestion management framework feature. When the cm framework is present: - Issue the SET_FEATURE command to enable the feature. - Register the cm statistics buffer with the adapter. - Read the cm enablement buffer to determine the cm framework state for cm management. When cm management is enabled: - Monitor all FPIN and congestion signalling events, incrementing counters. - Regularly sync with the adapter to communicate congestion events and to receive an rx request limit. - Monitor requests for rx data and ensure that no more than the adapter prescribed limit is issued on the link. If the limit is exceeded, SCSI and/or NVMe traffic is temporarily suspended. - Maintain the minute, hourly, daily statistics buffer. - Monitor for congestion enablement change events, causing a reread of the enablement buffer and acting on any change in enablement. And: - Add teardown logic, including buffer deregistration, on adapter detachment or reset. Link: https://lore.kernel.org/r/20210816162901.121235-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-25scsi: lpfc: Add cmfsync WQE supportJames Smart1-0/+4
When congestion mgmt is enabled, cmf has the driver regularly issue a command to synchronize reporting of congestion mgmt events such as fpin and signal delivery. This patch adds the definition of the CMF_SYNC WQE and its CQE fields as well as support for issuing the command. The patch also adds the few remaining cmf-related SLI additions, such as feature definition for enablement of CMF and notifications to the driver if the cm enablement mode changes. Link: https://lore.kernel.org/r/20210816162901.121235-9-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-25scsi: lpfc: Add support for cm enablement bufferJames Smart1-0/+29
As part of the cmf framework, the firmware maintains a table with congestion related state information, specifically whether enabled and if enabled, whether monitoring or actively managing congestion. Add definition of the table and add support to read the table from the adapter and determine if it is enabled. In support of this, the READ_OBJECT mailbox command definition is added to the driver. Link: https://lore.kernel.org/r/20210816162901.121235-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-25scsi: lpfc: Add cm statistics buffer supportJames Smart1-0/+133
The cmf framework requires the driver to maintain a cm statistics table, accessible inband, of congestion related statistics that are reported per minute, rolled up to per hour, and rolled up again per day. Several days worth may be maintained. The table is registered with the adapter when the MIB feature is enabled. Add definition of the table and add support to register the table with the adapter. Includes definition and initialization of event counters that are later added to the statistics table. Link: https://lore.kernel.org/r/20210816162901.121235-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-25scsi: lpfc: Add EDC ELS supportJames Smart1-0/+35
When congestion management is enabled, issue EDC ELS to register congestion signaling capabilities with the fabric. The response handling will process the fabric parameters and set the reporting parameters. Similarly, add support for receiving an EDC request from the fabric generating a corresponding response. Implement handlers for congestion signals from the fabric and maintain statistics for them. Link: https://lore.kernel.org/r/20210816162901.121235-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-19scsi: lpfc: Fix NVMe support reporting in log messageJames Smart1-1/+0
The NVMe support indicator in log message 6422 is displaying a field that was initialized but never set to indicate NVMe support. Remove obsolete nvme_support element from the lpfc_hba structure and change log message to display NVMe support status as reported in SLI4 Config Parameters mailbox command. Link: https://lore.kernel.org/r/20210707184351.67872-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-10scsi: lpfc: vmid: Add datastructure for supporting VMID in lpfcGaurav Srivastava1-0/+122
Add the primary datastructures needed to implement VMID in the lpfc driver. Maintain the capability, current state, and hash table for the vmid/appid along with other information. This implementation supports the two versions of vmid implementation (app header and priority tagging). Link: https://lore.kernel.org/r/20210608043556.274139-5-muneendra.kumar@broadcom.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-22scsi: lpfc: Reregister FPIN types if ELS_RDF is received from fabric controllerJames Smart1-0/+1
FC-LS-5 specifies that a received RDF implies a possible change to fabric supported diagnostic functions. Endpoints are to re-perform the RDF exchange with the fabric to enable possible new features or adapt to changes in values. This patch adds the logic to RDF receive to re-perform the RDF exchange with the switch. Link: https://lore.kernel.org/r/20210514195559.119853-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-22scsi: lpfc: Add a option to enable interlocked ABTS before job completionJames Smart1-0/+1
Default behavior for the driver, when aborting an I/O, is to terminate the I/O with the adapter. The adapter will initiate an ABTS to terminate the exchange on the link and mark the exchange is terminated so that no further use of the sgl or any traffic for the exchange is worked on. Completion on the Abort is then posted to the driver, which as the I/O is terminated can complete the I/O to the OS. This completion may occur prior to the ABTS handshake completing on the wire. The ABTS handshake can take a long time to complete with timeouts and retries reaching 60+ seconds. Note: if retries fail, LOGO occurs. Some devices want to ensure that the ABTS handshake fully completes (this device has fully ack'd it) before the I/O completion is posted back to the OS, where a failed I/O may be retried via a different path. To support this behavior, an option was added to the driver to change I/O completion from the Abort cmd completion to the Exchange termination (aka ABTS) completion. Link: https://lore.kernel.org/r/20210514195559.119853-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-05scsi: lpfc: Update copyrights for 12.8.0.7 and 12.8.0.8 changesJames Smart1-1/+1
For the files modified in 2021 via the 12.8.0.7 and 12.8.0.8 patch sets, update the copyright for 2021. Link: https://lore.kernel.org/r/20210301171821.3427-23-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-05scsi: lpfc: Fix dropped FLOGI during pt2pt discovery recoveryJames Smart1-0/+1
When connected in pt2pt mode, there is a scenario where the remote port significantly delays sending a response to our FLOGI, but acts on the FLOGI it sent us and proceeds to PLOGI/PRLI. The FLOGI ends up timing out and kicks off recovery logic. End result is a lot of unnecessary state changes and lots of discovery messages being logged. Fix by terminating the FLOGI and noop'ing its completion if we have already accepted the remote ports FLOGI and are now processing PLOGI. Link: https://lore.kernel.org/r/20210301171821.3427-13-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-08scsi: lpfc: Implement health checking when aborting I/OJames Smart1-1/+2
Several errors have occurred where the adapter stops or fails but does not raise the register values for the driver to detect failure. Thus driver is unaware of the failure. The failure typically results in I/O timeouts, the I/O timeout handler failing (after several seconds), and the error handler escalating recovery policy and resulting in more errors. Eventually, the driver is in a position where things have spiraled and it can't do recovery because other recovery ops are still outstanding and it becomes unusable. Resolve the situation by having the I/O timeout handler (actually a els, SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O, perform a mailbox command and look for a response from the hardware. If the mailbox command fails, it will mark the adapter offline and then invoke the adapter reset handler to clean up. The new I/O timeout test will be limited to a test every 5s. If there are multiple I/O timeouts concurrently, only the 1st I/O timeout will generate the mailbox command. Further testing will only occur once a timeout occurs after a 5s delay from the last mailbox command has expired. Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-08scsi: lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3James Smart1-0/+1
A very long time ago, there was a feature: auto sli mode. It gave the user the ability to auto select the SLI mode (SLI2 or SLI3) to run the port in, or even force SLI2 mode if configured. Because of the convoluted logic, the CONFIG_PORT mbox command ends up being called 2 or 3 times. It should have been called only once. Additionally, the driver no longer supports SLI-2, so only SLI-3 mode should be allowed. The following changes were made: - Force module parameter to SLI3 only. - Rip out redundant CONFIG_PORT mbox commands. - Force CONFIG_PORT mbox command to be in beginning of enable ISR routine. - Added changes for offline to online behavior Link: https://lore.kernel.org/r/20210104180240.46824-3-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Convert SCSI path to use common I/O submission pathJames Smart1-0/+4
This patch converts the SCSI I/O path from the iocb-centric interfaces to the common I/O submission path which supports native SLI-4 WQEs. A wrapper routine is put in place to distinguish SLI-3 from SLI. If SLI-3, the same iocb-centric paths are used, perhaps with refactored code that is explicitly for SLI-3. For SLI-4, any iocb-related formatting is replaced by wqe-based formatting, although much of that is addressed by the common wqe templates in the SLI-4 path. Link: https://lore.kernel.org/r/20201115192646.12977-14-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Enable common send_io interface for SCSI and NVMeJames Smart1-0/+3
To set up common use by the SCSI and NVMe I/O paths, create a new routine that issues FCP I/O commands which can be used by either protocol. The new routine addresses SLI-3 vs SLI-4 differences within its implementation. Replace the (SLI-3 centric) iocb routine in the SCSI path with this new WQE-centric common routine. Link: https://lore.kernel.org/r/20201115192646.12977-13-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17scsi: lpfc: Rework remote port lock handlingJames Smart1-2/+0
Currently the discovery layers within the driver use the SCSI midlayer host_lock to access node-specific structures. This can contend with the I/O path and is too coarse of a lock. Rework the driver so that it uses a lock specific to the remote port node structure when accessing the structure contents. A few of the changes brought out spots were some slightly reorganized routines worked better. Link: https://lore.kernel.org/r/20201115192646.12977-6-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-10-27scsi: lpfc: Add FDMI Vendor MIB supportJames Smart1-1/+3
Created new attribute lpfc_enable_mi, which by default is enabled. Add command definition bits for SLI-4 parameters that recognize whether the adapter has MIB information support and what revision of MIB data. Using the adapter information, register vendor-specific MIB support with FDMI. The registration will be done every link up. During FDMI registration, encountered a couple of errors when reverting to FDMI rev1. Code needed to exist once reverting. Fixed these. Link: https://lore.kernel.org/r/20201020202719.54726-8-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-10-27scsi: lpfc: Fix scheduling call while in softirq context in lpfc_unreg_rpiJames Smart1-1/+1
The following call trace was seen during HBA reset testing: BUG: scheduling while atomic: swapper/2/0/0x10000100 ... Call Trace: dump_stack+0x19/0x1b __schedule_bug+0x64/0x72 __schedule+0x782/0x840 __cond_resched+0x26/0x30 _cond_resched+0x3a/0x50 mempool_alloc+0xa0/0x170 lpfc_unreg_rpi+0x151/0x630 [lpfc] lpfc_sli_abts_recover_port+0x171/0x190 [lpfc] lpfc_sli4_abts_err_handler+0xb2/0x1f0 [lpfc] lpfc_sli4_io_xri_aborted+0x256/0x300 [lpfc] lpfc_sli4_sp_handle_abort_xri_wcqe.isra.51+0xa3/0x190 [lpfc] lpfc_sli4_fp_handle_cqe+0x89/0x4d0 [lpfc] __lpfc_sli4_process_cq+0xdb/0x2e0 [lpfc] __lpfc_sli4_hba_process_cq+0x41/0x100 [lpfc] lpfc_cq_poll_hdler+0x1a/0x30 [lpfc] irq_poll_softirq+0xc7/0x100 __do_softirq+0xf5/0x280 call_softirq+0x1c/0x30 do_softirq+0x65/0xa0 irq_exit+0x105/0x110 do_IRQ+0x56/0xf0 common_interrupt+0x16a/0x16a With the conversion to blk_io_poll for better interrupt latency in normal cases, it introduced this code path, executed when I/O aborts or logouts are seen, which attempts to allocate memory for a mailbox command to be issued. The allocation is GFP_KERNEL, thus it could attempt to sleep. Fix by creating a work element that performs the event handling for the remote port. This will have the mailbox commands and other items performed in the work element, not the irq. A much better method as the "irq" routine does not stall while performing all this deep handling code. Ensure that allocation failures are handled and send LOGO on failure. Additionally, enlarge the mailbox memory pool to reduce the possibility of additional allocation in this path. Link: https://lore.kernel.org/r/20201020202719.54726-3-james.smart@broadcom.com Fixes: 317aeb83c92b ("scsi: lpfc: Add blk_io_poll support for latency improvment") Cc: <stable@vger.kernel.org> # v5.9+ Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-03scsi: lpfc: Add an internal trace log bufferDick Kennedy1-0/+12
The current logging methods typically end up requesting a reproduction with a different logging level set to figure out what happened. This was mainly by design to not clutter the kernel log messages with things that were typically not interesting and the messages themselves could cause other issues. When looking to make a better system, it was seen that in many cases when more data was wanted was when another message, usually at KERN_ERR level, was logged. And in most cases, what the additional logging that was then enabled was typically. Most of these areas fell into the discovery machine. Based on this summary, the following design has been put in place: The driver will maintain an internal log (256 elements of 256 bytes). The "additional logging" messages that are usually enabled in a reproduction will be changed to now log all the time to the internal log. A new logging level is defined - LOG_TRACE_EVENT. When this level is set (it is not by default) and a message marked as KERN_ERR is logged, all the messages in the internal log will be dumped to the kernel log before the KERN_ERR message is logged. There is a timestamp on each message added to the internal log. However, this timestamp is not converted to wall time when logged. The value of the timestamp is solely to give a crude time reference for the messages. Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-03scsi: lpfc: Add blk_io_poll support for latency improvmentDick Kennedy1-0/+3
Although the existing implementation is very good at high I/O load, on tests involving light load, especially on only a few hardware queues, latency was a little higher than it can be due to using workqueue scheduling. Other tasks in the system can delay handling. Change the lower level to use irq_poll by default which uses a softirq for I/O completion. This gives better latency as variance in when the cq is processed is reduced over the workqueue interface. However, as high load is better served by not being in softirq when the CPU is loaded, work queues are still used under high I/O load. Link: https://lore.kernel.org/r/20200630215001.70793-13-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-06-06Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-5/+18
Pull SCSI updates from James Bottomley: :This series consists of the usual driver updates (qla2xxx, ufs, zfcp, target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host of other minor updates. There are no major core changes in this series apart from a refactoring in scsi_lib.c" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits) scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes scsi: cxgb3i: Fix some leaks in init_act_open() scsi: ibmvscsi: Make some functions static scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim scsi: ufs: Fix WriteBooster flush during runtime suspend scsi: ufs: Fix index of attributes query for WriteBooster feature scsi: ufs: Allow WriteBooster on UFS 2.2 devices scsi: ufs: Remove unnecessary memset for dev_info scsi: ufs-qcom: Fix scheduling while atomic issue scsi: mpt3sas: Fix reply queue count in non RDPQ mode scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd() scsi: vhost: Notify TCM about the maximum sg entries supported per command scsi: qla2xxx: Remove return value from qla_nvme_ls() scsi: qla2xxx: Remove an unused function scsi: iscsi: Register sysfs for iscsi workqueue scsi: scsi_debug: Parser tables and code interaction scsi: core: Refactor scsi_mq_setup_tags function scsi: core: Fix incorrect usage of shost_for_each_device scsi: qla2xxx: Fix endianness annotations in source files ...
2020-05-10lpfc: Refactor nvmet_rcv_ctx to create lpfc_async_xchg_ctxJames Smart1-1/+1
To support FC-NVME-2 support (actually FC-NVME (rev 1) with Ammendment 1), both the nvme (host) and nvmet (controller/target) sides will need to be able to receive LS requests. Currently, this support is in the nvmet side only. To prepare for both sides supporting LS receive, rename lpfc_nvmet_rcv_ctx to lpfc_async_xchg_ctx and commonize the definition. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-08scsi: lpfc: Change default queue allocation for reduced memory consumptionDick Kennedy1-5/+18
By default, the driver attempts to allocate a hdwq per logical cpu in order to provide good cpu affinity. Some systems have extremely high cpu counts and this can significantly raise memory consumption. In testing on x86 platforms (non-AMD) it is found that sharing of a hdwq by a physical cpu and its HT cpu can occur with little performance degredation. By sharing, the hdwq count can be halved, significantly reducing the memory overhead. Change the default behavior of the driver on non-AMD x86 platforms to share a hdwq by the cpu and its HT cpu. Link: https://lore.kernel.org/r/20200501214310.91713-6-jsmart2021@gmail.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-30scsi: lpfc: Remove prototype FIPS/DSS options from SLI-3James Smart1-7/+2
During code review, identified dss feature that was a prototype only and was never productized in SLI3. They shouldn't be there and prevents reuse of the command areas. Remove any code in the driver to deal with dss, including code to deal with fips, which is associated with the dss feature. Link: https://lore.kernel.org/r/20200322181304.37655-12-jsmart2021@gmail.com Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-30scsi: lpfc: Make debugfs ktime stats generic for NVME and SCSIJames Smart1-1/+1
Currently driver ktime stats, measuring code paths, is NVME-specific. Convert the stats routines such that the code paths are generic, providing status for NVME and SCSI. Added ktime stat calls in SCSI queuecommand and cmpl routines. Link: https://lore.kernel.org/r/20200322181304.37655-11-jsmart2021@gmail.com Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-30scsi: lpfc: Fix erroneous cpu limit of 128 on I/O statisticsJames Smart1-5/+4
The cpu io statistics were capped by a hard define limit of 128. This effectively was a max number of CPUs, not an actual CPU count, nor actual CPU numbers which can be even larger than both of those values. This made stats off/misleading and on large CPU count systems, wrong. Fix the stats so that all CPUs can have a stats struct. Fix the looping such that it loops by hdwq, finds CPUs that used the hdwq, and sum the stats, then display. Link: https://lore.kernel.org/r/20200322181304.37655-9-jsmart2021@gmail.com Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-03-27scsi: lpfc: Fix scsi host template for SLI3 vportsJames Smart1-0/+5
SCSI layer sends driver IOs with more s/g segments than driver can handle. This results in "Too many sg segments from dma_map_sg. Config 64, seg_cnt 219" error messages from the lpfc_scsi_prep_dma_buf_s3() routine. The was due to use the driver using individual templates for pport and vport, host reset enabled or not, nvme vs scsi, etc. In the end, there was a combination for a vport that didn't match the pport. Rather than enumerating more templates and more discretionary assignments, revert to a base template that is copied to a template specific to the pport/vport. Then, based on role, attributes and sli type, modify the fields that are different for that port. Added a log message to lpfc_create_port to validate values. Link: https://lore.kernel.org/r/20200322181304.37655-5-jsmart2021@gmail.com Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-18scsi: lpfc: add RDF registration and Link Integrity FPIN loggingJames Smart1-0/+29
This patch modifies lpfc to register for Link Integrity events via the use of an RDF ELS and to perform Link Integrity FPIN logging. Specifically, the driver was modified to: - Format and issue the RDF ELS immediately following SCR registration. This registers the ability of the driver to receive FPIN ELS. - Adds decoding of the FPIN els into the received descriptors, with logging of the Link Integrity event information. After decoding, the ELS is delivered to the scsi fc transport to be delivered to any user-space applications. - To aid in logging, simple helpers were added to create enum to name string lookup functions that utilize the initialization helpers from the fc_els.h header. - Note: base header definitions for the ELS's don't populate the descriptor payloads. As such, lpfc creates it's own version of the structures, using the base definitions (mostly headers) and additionally declaring the descriptors that will complete the population of the ELS. Link: https://lore.kernel.org/r/20200210173155.547-3-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-11scsi: lpfc: Copyright updates for 12.6.0.4 patchesJames Smart1-1/+1
Update copyrights to 2020 for files modified in the 12.6.0.4 patch set. Link: https://lore.kernel.org/r/20200128002312.16346-13-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-11scsi: lpfc: Remove handler for obsolete ELS - Read Port Status (RPS)James Smart1-1/+0
There was report of an odd "Fix me..." log message, which was tracked down to the lpfc_els_rcv_rps() routine. This was in handling of a very old and obsolete ELS - Read Port Status. The RPS ELS was defined in FC-LS-1, but deprecated in FC-LS-2, and removed from all later FC-LS revisions. It was replaced by the Read Diagnostic Parameters (RDP) ELS and the Link Error Status Block descriptor. There should be no support for the RSP ELS. Remove support from driver. Link: https://lore.kernel.org/r/20200128002312.16346-9-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-11scsi: lpfc: Fix broken Credit Recovery after driver loadJames Smart1-0/+1
When driver is set to enable bb credit recovery, the switch displayed the setting as inactive. If the link bounces, it switches to Active. During link up processing, the driver currently does a MBX_READ_SPARAM followed by a MBX_CONFIG_LINK. These mbox commands are queued to be executed, one at a time and the completion is processed by the worker thread. Since the MBX_READ_SPARAM is done BEFORE the MBX_CONFIG_LINK, the BB_SC_N bit is never set the the returned values. BB Credit recovery status only gets set after the driver requests the feature in CONFIG_LINK, which is done after the link up. Thus the ordering of READ_SPARAM needs to follow the CONFIG_LINK. Fix by reordering so that READ_SPARAM is done after CONFIG_LINK. Added a HBA_DEFER_FLOGI flag so that any FLOGI handling waits until after the READ_SPARAM is done so that the proper BB credit value is set in the FLOGI payload. Fixes: 6bfb16208298 ("scsi: lpfc: Fix configuration of BB credit recovery in service parameters") Cc: <stable@vger.kernel.org> # v5.4+ Link: https://lore.kernel.org/r/20200128002312.16346-4-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-12-21scsi: lpfc: Fix Fabric hostname registration if system hostname changesJames Smart1-0/+2
There are reports of multiple ports on the same system displaying different hostnames in fabric FDMI displays. Currently, the driver registers the hostname at initialization and obtains the hostname via init_utsname()->nodename queried at the time the FC link comes up. Unfortunately, if the machine hostname is updated after initialization, such as via DHCP or admin command, the value registered initially will be incorrect. Fix by having the driver save the hostname that was registered with FDMI. The driver then runs a heartbeat action that will check the hostname. If the name changes, reregister the FMDI data. The hostname is used in RSNN_NN, FDMI RPA and FDMI RHBA. Link: https://lore.kernel.org/r/20191218235808.31922-5-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-06scsi: lpfc: Change default IRQ model on AMD architecturesJames Smart1-0/+21
The current driver attempts to allocate an interrupt vector per cpu using the systems managed IRQ allocator (flag PCI_IRQ_AFFINITY). The system IRQ allocator will either provide the per-cpu vector, or return fewer vectors. When fewer vectors, they are evenly spread between the numa nodes on the system. When run on an AMD architecture, if interrupts occur to a cpu that is not in the same numa node as the adapter generating the interrupt, there are extreme costs and overheads in performance. Thus, if 1:1 vector allocation is used, or the "balanced" vectors in the other numa nodes, performance can be hit significantly. A much more performant model is to allocate interrupts only on the cpus that are in the numa node where the adapter resides. I/O completion is still performed by the cpu where the I/O was generated. Unfortunately, there is no flag to request the managed IRQ subsystem allocate vectors only for the CPUs in the numa node as the adapter. On AMD architecture, revert the irq allocation to the normal style (non-managed) and then use irq_set_affinity_hint() to set the cpu affinity and disable user-space rebalancing. Tie the support into CPU offline/online. If the cpu being offlined owns a vector, the vector is re-affinitized to one of the other CPUs on the same numa node. If there are no more CPUs on the numa node, the vector has all affinity removed and lets the system determine where it's serviced. Similarly, when the cpu that owned a vector comes online, the vector is reaffinitized to the cpu. Link: https://lore.kernel.org/r/20191105005708.7399-10-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-06scsi: lpfc: Add registration for CPU Offline/Online eventsJames Smart1-0/+7
The recent affinitization didn't address cpu offlining/onlining. If an interrupt vector is shared and the low order cpu owning the vector is offlined, as interrupts are managed, the vector is taken offline. This causes the other CPUs sharing the vector will hang as they can't get io completions. Correct by registering callbacks with the system for Offline/Online events. When a cpu is taken offline, its eq, which is tied to an interrupt vector is found. If the cpu is the "owner" of the vector and if the eq/vector is shared by other CPUs, the eq is placed into a polled mode. Additionally, code paths that perform io submission on the "sharing CPUs" will check the eq state and poll for completion after submission of new io to a wq that uses the eq. Similarly, when a cpu comes back online and owns an offlined vector, the eq is taken out of polled mode and rearmed to start driving interrupts for eq. Link: https://lore.kernel.org/r/20191105005708.7399-9-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-25scsi: lpfc: Add FC-AL support to lpe32000 modelsJames Smart1-0/+1
In the past, the lpe32000 models, based their main support being for 32G, and as FC-AL is not supported in the FC standards past 8G, did not support FC-AL operation. This patch adds private-loop FC-AL support for the LPE32000 adapters when a link is 8G or below. To avoid conditions where link rate may change, which would cause non-connectivity to the AL device, FC-AL mode must become a persistent setting and the link kept at a speed supporting FC-AL. The patch: - Adds a pls attribute indicating whether the adapter properly supports FC-AL. - Adds support for the adapter to indicate that topology should be fixed and the topology types to be configured. - Adds a pt attribute to report the persistent topology if present. Link: https://lore.kernel.org/r/20191018211832.7917-15-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-25scsi: lpfc: Make FW logging dynamically configurableJames Smart1-1/+8
Currently, the FW logging facility is a load/boot time parameter which requires the driver to be unloaded/reloaded or the system rebooted in order to change its configuration. Convert the logging facility to allow dynamic enablement and configuration. Specifically: - Convert the feature so that it can be enabled dynamically via an attribute. Additionally, the size of the buffer can be configured dynamically. - Add locks around states that now may be changing. - Tie the feature into debugfs so that the logs can be read at any time. Link: https://lore.kernel.org/r/20191018211832.7917-12-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-25scsi: lpfc: Remove lock contention target write pathJames Smart1-1/+0
Lower IOps performance with write operations. Perf tool shows lock contention in dma_pool_alloc and dma_pool_free related to the txrdy_payload_pool. The allocations are for dma buffers for XFER_RDY's, which actually are not needed for the FCP_TRECEIVE command as the command contents are used by the adapter to generate the IU. Remove the allocations and the associated buffer pool. Rather than leaving NULLs in buffer pointer locations, set command and sgl to indicate skipped SGLE indexes. Link: https://lore.kernel.org/r/20191018211832.7917-10-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-01scsi: lpfc: Fix NVME io abort failures causing hangsJames Smart1-1/+0
The nvme-fc transport may call to abort an io on controller reset. If the driver is out of resources to issue an abort command, it just gives up and does nothing. The transport expects the lldd to always be able to terminate an io it has issued. At that point, the controller hangs waiting for aborted ios to be returned. Note: flaged by "6136" and "6176" error messages. Root issue was the adapter mis-allocated the number resources it allocated for command entries for the adapter. Convert the driver to allocate command resources based on the number of xris supported by the FC port - 1 resource for the original command and 1 resource for the abort request. Link: https://lore.kernel.org/r/20190922035906.10977-5-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-21Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-4/+7
Pull SCSI updates from James Bottomley: "This is mostly update of the usual drivers: qla2xxx, ufs, smartpqi, lpfc, hisi_sas, qedf, mpt3sas; plus a whole load of minor updates. The only core change this time around is the addition of request batching for virtio. Since batching requires an additional flag to use, it should be invisible to the rest of the drivers" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (264 commits) scsi: hisi_sas: Fix the conflict between device gone and host reset scsi: hisi_sas: Add BIST support for phy loopback scsi: hisi_sas: Add hisi_sas_debugfs_alloc() to centralise allocation scsi: hisi_sas: Remove some unused function arguments scsi: hisi_sas: Remove redundant work declaration scsi: hisi_sas: Remove hisi_sas_hw.slot_complete scsi: hisi_sas: Assign NCQ tag for all NCQ commands scsi: hisi_sas: Update all the registers after suspend and resume scsi: hisi_sas: Retry 3 times TMF IO for SAS disks when init device scsi: hisi_sas: Remove sleep after issue phy reset if sas_smp_phy_control() fails scsi: hisi_sas: Directly return when running I_T_nexus reset if phy disabled scsi: hisi_sas: Use true/false as input parameter of sas_phy_reset() scsi: hisi_sas: add debugfs auto-trigger for internal abort time out scsi: virtio_scsi: unplug LUNs when events missed scsi: scsi_dh_rdac: zero cdb in send_mode_select() scsi: fcoe: fix null-ptr-deref Read in fc_release_transport scsi: ufs-hisi: use devm_platform_ioremap_resource() to simplify code scsi: ufshcd: use devm_platform_ioremap_resource() to simplify code scsi: hisi_sas: use devm_platform_ioremap_resource() to simplify code scsi: ufs: Use kmemdup in ufshcd_read_string_desc() ...
2019-08-30scsi: lpfc: Remove bg debugfs buffersJames Smart1-2/+0
Capturing and downloading dif command data and dif data was done a dozen years ago and no longer being used. Also creates a potential security hole. Remove the debugfs buffer for dif debugging. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> CC: KyleMahlkuch <kmahlkuc@linux.vnet.ibm.com> CC: Hannes Reinecke <hare@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-20scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pairJames Smart1-2/+1
Currently, each hardware queue, typically allocated per-cpu, consists of a WQ/CQ pair per protocol. Meaning if both SCSI and NVMe are supported 2 WQ/CQ pairs will exist for the hardware queue. Separate queues are unnecessary. The current implementation wastes memory backing the 2nd set of queues, and the use of double the SLI-4 WQ/CQ's means less hardware queues can be supported which means there may not always be enough to have a pair per cpu. If there is only 1 pair per cpu, more cpu's may get their own WQ/CQ. Rework the implementation to use a single WQ/CQ pair by both protocols. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-20scsi: lpfc: Add NVMe sequence level error recovery supportJames Smart1-0/+1
FC-NVMe-2 added support for sequence level error recovery in the FC-NVME protocol. This allows for the detection of errors and lost frames and immediate retransmission of data to avoid exchange termination, which escalates into NVMeoFC connection and association failures. A significant RAS improvement. The driver is modified to indicate support for SLER in the NVMe PRLI is issues and to check for support in the PRLI response. When both sides support it, the driver will set a bit in the WQE to enable the recovery behavior on the exchange. The adapter will take care of all detection and retransmission. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-20scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.James Smart1-0/+5
Typical SLI-4 hardware supports up to 2 4KB pages to be registered per XRI to contain the exchanges Scatter/Gather List. This caps the number of SGL elements that can be in the SGL. There are not extensions to extend the list out of the 2 pages. The G7 hardware adds a SGE type that allows the SGL to be vectored to a different scatter/gather list segment. And that segment can contain a SGE to go to another segment and so on. The initial segment must still be pre-registered for the XRI, but it can be a much smaller amount (256Bytes) as it can now be dynamically grown. This much smaller allocation can handle the SG list for most normal I/O, and the dynamic aspect allows it to support many MB's if needed. The implementation creates a pool which contains "segments" and which is initially sized to hold the initial small segment per xri. If an I/O requires additional segments, they are allocated from the pool. If the pool has no more segments, the pool is grown based on what is now needed. After the I/O completes, the additional segments are returned to the pool for use by other I/Os. Once allocated, the additional segments are not released under the assumption of "if needed once, it will be needed again". Pools are kept on a per-hardware queue basis, which is typically 1:1 per cpu, but may be shared by multiple cpus. The switch to the smaller initial allocation significantly reduces the memory footprint of the driver (which only grows if large ios are issued). Based on the several K of XRIs for the adapter, the 8KB->256B reduction can conserve 32MBs or more. It has been observed with per-cpu resource pools that allocating a resource on CPU A, may be put back on CPU B. While the get routines are distributed evenly, only a limited subset of CPUs may be handling the put routines. This can put a strain on the lpfc_put_cmd_rsp_buf_per_cpu routine because all the resources are being put on a limited subset of CPUs. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-08-20scsi: lpfc: Mitigate high memory pre-allocation by SCSI-MQJames Smart1-0/+1
When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of MQ resources based on shost values set by the driver. In newer cases of the driver, which attempts to set nr_hw_queues to the cpu count, the multipliers become excessive, with a single shost having SCSI-MQ pre-allocation reaching into the multiple GBytes range. NPIV, which creates additional shosts, only multiply this overhead. On lower-memory systems, this can exhaust system memory very quickly, resulting in a system crash or failures in the driver or elsewhere due to low memory conditions. After testing several scenarios, the situation can be mitigated by limiting the value set in shost->nr_hw_queues to 4. Although the shost values were changed, the driver still had per-cpu hardware queues of its own that allowed parallelization per-cpu. Testing revealed that even with the smallish number for nr_hw_queues for SCSI-MQ, performance levels remained near maximum with the within-driver affiinitization. A module parameter was created to allow the value set for the nr_hw_queues to be tunable. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>