Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 99154581b05c8fb22607afb7c3d66c1bace6aa5d ]
When parsing the txq list in lpfc_drain_txq(), the driver attempts to pass
the requests to the adapter. If such an attempt fails, a local "fail_msg"
string is set and a log message output. The job is then added to a
completions list for cancellation.
Processing of any further jobs from the txq list continues, but since
"fail_msg" remains set, jobs are added to the completions list regardless
of whether a wqe was passed to the adapter. If successfully added to
txcmplq, jobs are added to both lists resulting in list corruption.
Fix by clearing the fail_msg string after adding a job to the completions
list. This stops the subsequent jobs from being added to the completions
list unless they had an appropriate failure.
Link: https://lore.kernel.org/r/20210910233159.115896-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5aa615d195f1e142c662cb2253f057c9baec7531 ]
The driver is encountering a crash in lpfc_free_iocb_list() while
performing initial attachment.
Code review found this to be an errant failure path that was taken, jumping
to a tag that then referenced structures that were uninitialized.
Fix the failure path.
Link: https://lore.kernel.org/r/20210514195559.119853-9-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fffd18ec6579c2d9c72b212169259062fe747888 ]
Fix a crash caused by a double put on the node when the driver completed an
ACC for an unsolicted abort on the same node. The second put was executed
by lpfc_nlp_not_used() and is wrong because the completion routine executes
the nlp_put when the iocbq was released. Additionally, the driver is
issuing a LOGO then immediately calls lpfc_nlp_set_state to put the node
into NPR. This call does nothing.
Remove the lpfc_nlp_not_used call and additional set_state in the
completion routine. Remove the lpfc_nlp_set_state post issue_logo. Isn't
necessary.
Link: https://lore.kernel.org/r/20210412013127.2387-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 39c4f1a965a9244c3ba60695e8ff8da065ec6ac4 ]
The driver is occasionally seeing the following SLI Port error, requiring
reset and reinit:
Port Status Event: ... error 1=0x52004a01, error 2=0x218
The failure means an RQ timeout. That is, the adapter had received
asynchronous receive frames, ran out of buffer slots to place the frames,
and the driver did not replenish the buffer slots before a timeout
occurred. The driver should not be so slow in replenishing buffers that a
timeout can occur.
When the driver received all the frames of a sequence, it allocates an IOCB
to put the frames in. In a situation where there was no IOCB available for
the frame of a sequence, the RQ buffer corresponding to the first frame of
the sequence was not returned to the FW. Eventually, with enough traffic
encountering the situation, the timeout occurred.
Fix by releasing the buffer back to firmware whenever there is no IOCB for
the first frame.
[mkp: typo]
Link: https://lore.kernel.org/r/20200128002312.16346-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 38503943c89f0bafd9e3742f63f872301d44cbea ]
The following kasan bug was called out:
BUG: KASAN: slab-out-of-bounds in lpfc_unreg_login+0x7c/0xc0 [lpfc]
Read of size 2 at addr ffff889fc7c50a22 by task lpfc_worker_3/6676
...
Call Trace:
dump_stack+0x96/0xe0
? lpfc_unreg_login+0x7c/0xc0 [lpfc]
print_address_description.constprop.6+0x1b/0x220
? lpfc_unreg_login+0x7c/0xc0 [lpfc]
? lpfc_unreg_login+0x7c/0xc0 [lpfc]
__kasan_report.cold.9+0x37/0x7c
? lpfc_unreg_login+0x7c/0xc0 [lpfc]
kasan_report+0xe/0x20
lpfc_unreg_login+0x7c/0xc0 [lpfc]
lpfc_sli_def_mbox_cmpl+0x334/0x430 [lpfc]
...
When processing the completion of a "Reg Rpi" login mailbox command in
lpfc_sli_def_mbox_cmpl, a call may be made to lpfc_unreg_login. The vpi is
extracted from the completing mailbox context and passed as an input for
the next. However, the vpi stored in the mailbox command context is an
absolute vpi, which for SLI4 represents both base + offset. When used with
a non-zero base component, (function id > 0) this results in an
out-of-range access beyond the allocated phba->vpi_ids array.
Fix by subtracting the function's base value to get an accurate vpi number.
Link: https://lore.kernel.org/r/20200322181304.37655-2-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7cfd5639d99bec0d27af089d0c8c114330e43a72 ]
If the driver receives a login that is later then LOGO'd by the remote port
(aka ndlp), the driver, upon the completion of the LOGO ACC transmission,
will logout the node and unregister the rpi that is being used for the
node. As part of the unreg, the node's rpi value is replaced by the
LPFC_RPI_ALLOC_ERROR value. If the port is subsequently offlined, the
offline walks the nodes and ensures they are logged out, which possibly
entails unreg'ing their rpi values. This path does not validate the node's
rpi value, thus doesn't detect that it has been unreg'd already. The
replaced rpi value is then used when accessing the rpi bitmask array which
tracks active rpi values. As the LPFC_RPI_ALLOC_ERROR value is not a valid
index for the bitmask, it may fault the system.
Revise the rpi release code to detect when the rpi value is the replaced
RPI_ALLOC_ERROR value and ignore further release steps.
Link: https://lore.kernel.org/r/20191105005708.7399-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 07b8582430370097238b589f4e24da7613ca6dd3 ]
Symptoms were seen of the driver not having valid data for mailbox
commands. After debugging, the following sequence was found:
The driver maintains a port-wide pointer of the mailbox command that is
currently in execution. Once finished, the port-wide pointer is cleared
(done in lpfc_sli4_mq_release()). The next mailbox command issued will set
the next pointer and so on.
The mailbox response data is only copied if there is a valid port-wide
pointer.
In the failing case, it was seen that a new mailbox command was being
attempted in parallel with the completion. The parallel path was seeing
the mailbox no long in use (flag check under lock) and thus set the port
pointer. The completion path had cleared the active flag under lock, but
had not touched the port pointer. The port pointer is cleared after the
lock is released. In this case, the completion path cleared the just-set
value by the parallel path.
Fix by making the calls that clear mbox state/port pointer while under
lock. Also slightly cleaned up the error path.
Link: https://lore.kernel.org/r/20190922035906.10977-8-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1c36833d82ff24d0d54215fd956e7cc30fffce54 ]
Driver is setting bits in word 10 of the SLI4 ABORT WQE (the wqid). The
field was a carry over from a prior SLI revision. The field does not exist
in SLI4, and the action may result in an overlap with future definition of
the WQE.
Remove the setting of WQID in the ABORT WQE.
Also cleaned up WQE field settings - initialize to zero, don't bother to
set fields to zero.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5a9eeff57f340238c39c95d8e7e54c96fc722de7 ]
Driver is hitting null pring pointers in lpfc_do_work().
Pointer assignment occurs based on SLI-revision. If recovering after an
error, its possible the sli revision for the port was cleared, making the
lpfc_phba_elsring() not return a ring pointer, thus the null pointer.
Add SLI revision checking to lpfc_phba_elsring() and status checking to all
callers.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 036cad1f1ac9ce03e2db94b8460f98eaf1e1ee4c ]
On FCoE adapters, when running link bounce test in a loop, initiator
failed to login with switch switch and required driver reload to
recover. Switch reached a point where all subsequent FLOGIs would be
LS_RJT'd. Further testing showed the condition to be related to not
performing FCF discovery between FLOGI's.
Fix by monitoring FLOGI failures and once a repeated error is seen
repeat FCF discovery.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
invalid
commit 4e87eb2f46ea547d12a276b2e696ab934d16cfb6 upstream.
Certain older adapters such as the OneConnect OCe10100 may not have a valid
wqpcnt value. In this case, do not set queue->page_count to 0 in
lpfc_sli4_queue_alloc() as this will prevent the driver from initializing.
Fixes: 895427bd01 ("scsi: lpfc: NVME Initiator: Base modifications")
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0ef01a2d95fd62bb4f536e7ce4d5e8e74b97a244 ]
When running an mds diagnostic that passes frames with the switch, soft
lockups are detected. The driver is in a CQE processing loop and has
sufficient amount of traffic that it never exits the ring processing routine,
thus the "lockup".
Cap the number of elements in the work processing routine to 64 elements. This
ensures that the cpu will be given up and the handler reschedule to process
additional items.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 04673e38f56b30cd39b1fa0f386137d818b17781 ]
The driver controls when the hardware sends completions that communicate
consumption of elements from the WQ. This is done by setting a WQEC bit
on a WQE.
The current driver sets it on every Nth WQE posting. However, the driver
isn't clearing the bit if the WQE is reused. Thus, if the queue depth
isn't evenly divisible by N, with enough time, it can be set on every
element, creating a lot of overhead and risking CQ full conditions.
Correct by clearing the bit when not setting it on an Nth element.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8e036a9497c5d565baafda4c648f2f372999a547 upstream.
The driver is encountering oops in lpfc_sli_calc_ring.
The driver is setting hba_wqidx for FCP based on the policy in use for
NVME. The two may not be the same. Change to set the wqidx based on the
FCP policy.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1234a6d54fed8a00091968c4eb2fb52e1cbb8e2e upstream.
The driver crashes when attempting to use a freed ndpl pointer.
The pci_remove_one handler runs on a separate kernel thread. The order
of the removal is starting by freeing all of the ndlps and then
disabling interrupts. In between these two events the driver can still
receive an ELS and process it. When it tries to use the ndlp pointer
will be NULL
Change the order of the pci_remove_one vs disable interrupts so that
interrupts are disabled before the ndlp's are freed.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 401bb4169da655f3e5d28d0b208182e1ab60bf2a upstream.
During pci hot plug, the kernel crashes in a list_add_call
The lookup by tag function will return null if the IOCB is out of range
or does not have the on txcmplq flag set.
Fix: Check for null return from lookup by tag.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Various oops including cpu LOCKUPs were seen.
For asynchronously received ius where the driver must assign exchange
resources, the resources were on a single get (free) list and put list
(finished, waiting to be put on get list). As all cpus are sharing the
lists, an interrupt for a receive frame may have to wait for all the
other cpus to place their done work onto the put list before it can
acquire the lock to pull from the list.
Fix by breaking the resource lists into per-cpu lists or at least more
than 1 list with cpu's sharing the lists). A cpu would allocate from the
free list for its own cpu, and put its done work on the its own put list
- avoiding the contention. As cpu load may vary, when empty, a cpu may
grab from another cpu, thereby changing resource distribution. But
searching for a resource only occurs on 1 or a few cpus until a single
resource can be allocated. if the condition reoccurs, it starts looking
at a different cpu.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Various oops being seen on being in the ISR too long and cpu lockups,
when under heavy load.
The amount of work being posted off of completion queues kept the ISR
running almost all the time
Correct the issue by limiting the amount of work per iteration.
[mkp: typo]
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Modify driver return error codes to align with host nvme transport.
Driver isn't returning Exxx error codes to properly reflect out of
resource or connectivity conditions (-EBUSY), yet there were hard error
conditions returning -EBUSY.
Ensure the following situations return the proper return code:
- Temporary failures or temporary resource availability: -EBUSY
- Connectivity issues: -ENODEV
All others are treated as hard errors and return an -Exxx value that
indicates the type of error.
Also, lpfc_sli4_issue_wqe() was modified to not translate error from
-Exxx to WQE state. This allows lpfc_nvme_fcp_io_submit() routine to
just return whatever -E value was returned from other routines.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The PCI pool API is deprecated. This commit replaces the PCI pool old
API by the appropriate function with the DMA pool API. It also updates
some comments, accordingly.
Signed-off-by: Romain Perier <romain.perier@collabora.com>
Reviewed-by: Peter Senna Tschudin <peter.senna@collabora.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
During every reset, IOCBs are allocated. So, at one point, number of
allocated IOCBs reaches maximum limit and lpfc_sli_next_iotag fails.
Allocate IOCBs only during initialization. Reuse them after every reset
instead of allocating new set of IOCBs.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
OS crashes after the completion of firmware download.
Failure in posting SCSI SGL buffers because number of SGL buffers is
less than total count. Some of the pending IOs are not completed by
driver. SGL buffers for these IOs are not added back to the list.
Pending IOs are not completed because lpfc_wq_list list is initialized
before completion of pending IOs.
Postpone lpfc_wq_list reinitialization by moving
lpfc_sli4_queue_destroy() after lpfc_hba_down_post().
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is a null pointer dereference that can happen in the FOF interrupt
handler.
The driver was not setting up cq->assoc_qp_for sli4_hba->oas_cq.
Initialize cq->assoc_qp before accessing it.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Administrator intervention is currently required to get good numbers
when switching from running latency tests to IOPS tests.
The configured interrupt coalescing values will greatly effect the
results of these tests. Currently, the driver has a single coalescing
value set by values of the module attribute. This patch changes the
driver to support auto-configuration of the coalescing value based on
the total number of outstanding IOs and average number of CQEs processed
per interrupt for an EQ. Values are checked every 5 seconds.
The driver defaults to the automatic selection. Automatic selection can
be disabled by the new lpfc_auto_imax module_parameter.
Older hardware can only change interrupt coalescing by mailbox
command. Newer hardware supports change via a register. The patch
support both.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Addressed the following reported defects:
** CID 1411552: Control flow issues (MISSING_BREAK)
/drivers/scsi/lpfc/lpfc_sli.c: 13259 in lpfc_sli4_nvmet_handle_rcqe()
** CID 1411553: Memory - illegal accesses (OVERRUN)
/drivers/scsi/lpfc/lpfc_sli.c: 16218 in lpfc_fc_frame_check()
** CID 1411553: Memory - illegal accesses (OVERRUN)
Overrunning array "lpfc_rctl_names" of 202 8-byte elements at element
index 244 (byte offset 1952) using index "fc_hdr->fh_r_ctl" (which
evaluates to 244).
** CID 1411554: Null pointer dereferences (REVERSE_INULL)
/drivers/scsi/lpfc/lpfc_nvmet.c: 2131 in lpfc_nvmet_unsol_fcp_abort_cmp()
** CID 1411555: Memory - illegal accesses (UNINIT)
/drivers/scsi/lpfc/lpfc_nvmet.c: 180 in lpfc_nvmet_ctxbuf_post()
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Kernel panic when log_verbose is set to 0xffffffff
phba->pport is dereferenced before it is initialized
Fix: Do not dereference phba->pport if it is NULL
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Null pointer dereference when BFS VM is powered off
The driver incorrectly uses sli3_ring on SLI-4 adapters
Use the correct ring structure based on sli_rev
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Tested-by: Raphael Silva <raphasil@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Observing lpfc port down after issuing hbacmd reset command
Failure in posting SGL buffers. If there is only one SGL buffer and rrq
is valid for its XRI, we are rightly returning NULL but not adding the
buffer back to the SGL list. So, number of buffers become less than
total count and repost fails during reset.
Add SGL buffer back to list before returning NULL.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Added code to support Cisco MDS loopback diagnostic. The diagnostics run
various loopbacks including one which loops-back frame through the
driver.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Too many work items being processed in IRQ context take a lot of CPU
time and cause problems.
With a recent change, we get out of the ISR after hitting entry_repost
work items on a queue. However, the actual values for entry repost are
still high. EQ is 128 and CQ is 128, this could translate into
processing 128 * 128 (16384) work items under IRQ context.
Set entry_repost in the actual queue creation routine now. Limit EQ
repost to 8 and CQ repost to 64 to further limit the amount of time
spent in the IRQ.
Fix fof IRQ routines as well.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Previous logic would just drop the IO.
Added logic to queue the IO to wait for an IO context resource from an
IO thats already in progress.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Currently IO resources are mapped 1 to 1 with RQ buffers posted
Added logic to separate RQE buffers from IO op resources
(sgl/iocbq/context). During initialization, the driver will determine
how many SGLs it will allocate for NVMET (based on what the firmware
reports) and associate a NVMET IOCBq and NVMET context structure with
each one.
Now that hdr/data buffers are immediately reposted back to the RQ, 512
RQEs for each MRQ is sufficient. Also, since NVMET data buffers are now
128 bytes, lpfc_nvmet_mrq_post is not necessary anymore as we will
always post the max (512) buffers per NVMET MRQ.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Using 2048 byte buffer and onle 128 bytes is needed.
Create nee LFPC_NVMET_DATA_BUF_SIZE define to use for NVMET RQ/MRQs.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
After running IOPS test for 30 second we get kernel:NMI watchdog:
Watchdog detected hard LOCKUP on cpu 0
The driver is speend too much time in its ISR.
In ISR EQ and CQ processing routines, if we hit the entry_repost numbers
of EQE/CQEs just break out of the routine as opposed to hitting the
doorbell with NOARM and continue processing.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Large block writes to the nvme target were failing because the default
number of RQs posted was insufficient.
Expand the NVMET RQs to 2048 RQEs and ensure a minimum of 512 RQEs are
posted, no matter how many MRQs are configured.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
More debug messages added for nvme statistics.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
With 255 vports created a link trasition can casue a crash.
When going through discovery after a link bounce the driver is using
rpis before the cmd FCOE_POST_HDR_TEMPLATES completes. By doing that the
next rpi bumps the rpi range out of the boundary.
The fix it to increment the next_rpi only when the
FCOE_POST_HDR_TEMPLATE succeeds.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
To select the appropriate shost template, the driver is issuing a
mailbox command to retrieve the wwn. Turns out the sending of the
command precedes the reset of the function. On SLI-4 adapters, this is
inconsequential as the mailbox command location is specified by dma via
the BMBX register. However, on SLI-3 adapters, the location of the
mailbox command submission area changes. When the function is first
powered on or reset, the cmd is submitted via PCI bar memory. Later the
driver changes the function config to use host memory and DMA. The
request to start a mailbox command is the same, a simple doorbell write,
regardless of submission area. So.. if there has not been a boot driver
run against the adapter, the mailbox command works as defaults are
ok. But, if the boot driver has configured the card and, and if no
platform pci function/slot reset occurs as the os starts, the mailbox
command will fail. The SLI-3 device will use the stale boot driver dma
location. This can cause PCI eeh errors.
Fix is to reset the sli-3 function before sending the mailbox command,
thus synchronizing the function/driver on mailbox location.
Note: The fix uses routines that are typically invoked later in the call
flow to reset the sli-3 device. The issue in using those routines is
that the normal (non-fix) flow does additional initialization, namely
the allocation of the pport structure. So, rather than significantly
reworking the initialization flow so that the pport is alloc'd first,
pointer checks are added to work around it. Checks are limited to the
routines invoked by a sli-3 adapter (s3 routines) as this fix/early call
is only invoked on a sli3 adapter. Nothing changes post the
fix. Subsequent initialization, and another adapter reset, still occur -
both on sli-3 and sli-4 adapters.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Fixes: 96418b5e2c88 ("scsi: lpfc: Fix eh_deadline setting for sli3 adapters.")
Cc: stable@vger.kernel.org # v4.11+
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The driver with nvme had this routine stubbed.
Right now XRI_ABORTED_CQE is not handled and the FC NVMET
Transport has a new API for the driver.
Missing code path, new NVME abort API
Update ABORT processing for NVMET
There are 3 new FC NVMET Transport API/ template routines for NVMET:
lpfc_nvmet_xmt_fcp_release
This NVMET template callback routine called to release context
associated with an IO This routine is ALWAYS called last, even
if the IO was aborted or completed in error.
lpfc_nvmet_xmt_fcp_abort
This NVMET template callback routine called to abort an exchange that
has an IO in progress
nvmet_fc_rcv_fcp_req
When the lpfc driver receives an ABTS, this NVME FC transport layer
callback routine is called. For this case there are 2 paths thru the
driver: the driver either has an outstanding exchange / context for the
XRI to be aborted or not. If not, a BA_RJT is issued otherwise a BA_ACC
NVMET Driver abort paths:
There are 2 paths for aborting an IO. The first one is we receive an IO and
decide not to process it because of lack of resources. An unsolicated ABTS
is immediately sent back to the initiator as a response.
lpfc_nvmet_unsol_fcp_buffer
lpfc_nvmet_unsol_issue_abort (XMIT_SEQUENCE_WQE)
The second one is we sent the IO up to the NVMET transport layer to
process, and for some reason the NVME Transport layer decided to abort the
IO before it completes all its phases. For this case there are 2 paths
thru the driver:
the driver either has an outstanding TSEND/TRECEIVE/TRSP WQE or no
outstanding WQEs are present for the exchange / context.
lpfc_nvmet_xmt_fcp_abort
if (LPFC_NVMET_IO_INP)
lpfc_nvmet_sol_fcp_issue_abort (ABORT_WQE)
lpfc_nvmet_sol_fcp_abort_cmp
else
lpfc_nvmet_unsol_fcp_issue_abort
lpfc_nvmet_unsol_issue_abort (XMIT_SEQUENCE_WQE)
lpfc_nvmet_unsol_fcp_abort_cmp
Context flags:
LPFC_NVMET_IOP - his flag signifies an IO is in progress on the exchange.
LPFC_NVMET_XBUSY - this flag indicates the IO completed but the firmware
is still busy with the corresponding exchange. The exchange should not be
reused until after a XRI_ABORTED_CQE is received for that exchange.
LPFC_NVMET_ABORT_OP - this flag signifies an ABORT_WQE was issued on the
exchange.
LPFC_NVMET_CTX_RLS - this flag signifies a context free was requested,
but we are deferring it due to an XBUSY or ABORT in progress.
A ctxlock is added to the context structure that is used whenever these
flags are set/read within the context of an IO.
The LPFC_NVMET_CTX_RLS flag is only set in the defer_relase routine when
the transport has resolved all IO associated with the buffer. The flag is
cleared when the CTX is associated with a new IO.
An exchange can has both an LPFC_NVMET_XBUSY and a LPFC_NVMET_ABORT_OP
condition active simultaneously. Both conditions must complete before the
exchange is freed.
When the abort callback (lpfc_nvmet_xmt_fcp_abort) is envoked:
If there is an outstanding IO, the driver will issue an ABORT_WQE. This
should result in 3 completions for the exchange:
1) IO cmpl with XB bit set
2) Abort WQE cmpl
3) XRI_ABORTED_CQE cmpl
For this scenerio, after completion #1, the NVMET Transport IO rsp
callback is called. After completion #2, no action is taken with respect
to the exchange / context. After completion #3, the exchange context is
free for re-use on another IO.
If there is no outstanding activity on the exchange, the driver will send a
ABTS to the Initiator. Upon completion of this WQE, the exchange / context
is freed for re-use on another IO.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
|
|
When RPI is not available, driver sends WQE with invalid RPI value and
rejected by HBA.
lpfc 0000:82:00.3: 1:3154 BLS ABORT RSP failed, data: x3/xa0320008
and
lpfc :2753 PLOGI failure DID:FFFFFA Status:x3/xa0240008
In this case, driver accesses rpi_ids array out of bounds.
Fix:
Check return value of lpfc_sli4_alloc_rpi(). Do not allocate
lpfc_nodelist entry if RPI is not available.
When RPI is not available, we will get discovery timeouts and
command drops for some of the vports as seen below.
lpfc :0273 Unexpected discovery timeout, vport State x0
lpfc :0230 Unexpected timeout, hba link state x5
lpfc :0111 Dropping received ELS cmd Data: x0 xc90c55 x0
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
|
|
There are two versions of a structure for queue creation and setup that the
driver shares with FW. The driver was only treating as version 0.
Verify WQ_CREATE with 128B WQEs in V0 and V1.
Code review of another bug showed the driver passing
128B WQEs and 8 pages in WQ CREATE and V0.
Code inspection/instrumentation showed that the driver
uses V0 in WQ_CREATE and if the caller passes queue->entry_size
128B, the driver sets the hdr_version to V1 so all is good.
When I tested the V1 WQ_CREATE, the mailbox failed causing
the driver to unload.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
|
|
There are couple of different load/unload issues fixed with this patch.
One of the issues was reported by Junichi Nomura, a patch was submitted
by Johannes Thumsrhirn which did fix one of the problems but the fix in
this patch separates the pring free from the queue free and does not set
the parameter passed in to NULL.
issues:
(1) driver could not be unloaded and reloaded without some Oops or
Panic occurring.
(2) The driver was panicking because of a corruption in the Memory
Manager when the iocb list was getting allocated.
Root cause for the memory corruption was a double free of the Work Queue
ring pointer memory - Freed once in the lpfc_sli4_queue_free when the CQ
was destroyed and again in lpfc_sli4_queue_free when the WQ was destroyed.
The pring free and the queue free were separated, the pring free was moved
to the wq destroy routine because it a better fit logically to delete the
ring with the wq.
The checkpatch flagged several alignmenet issues that were also corrected
with this patch.
The mboxq was never initialed correctly before it was used by the driver
this patch corrects that issue.
Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
|
|
Comment should have said Repost.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
|
|
|
|
Without apriori understanding of what the define is, the name gives
a very different impression of what it is (a max delay value
for an EQ). Rename the define so it reflects what it is: the number
of EQ IDs that can be set in one instance of the MODIFY_EQ_DELAY
mbx command.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
previous code did little more than log a message.
This patch adds abort path support, modeled after the SCSI code paths.
Currently addresses only the initiator path. Target path under
development, but stubbed out.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
From: Colin Ian King <colin.king@canonical.com>
In the case where sglq is null, the current code just returns without
unlocking the spinlock sql_list_lock. Fix this by breaking out of the
while loop and the exit path will then unlock and return NULL as was
the original intention.
Detected by CoverityScan, CID#1411635 ("Missing unlock")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
From: Colin Ian King <colin.king@canonical.com>
The sanity check for hrq should be moved to before the deference
of hrq to ensure we don't perform a null pointer deference.
Detected by CoverityScan, CID#1411650 ("Dereference before null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Pull more SCSI updates from James Bottomley:
"This is the set of stuff that didn't quite make the initial pull and a
set of fixes for stuff which did.
The new stuff is basically lpfc (nvme), qedi and aacraid. The fixes
cover a lot of previously submitted stuff, the most important of which
probably covers some of the failing irq vectors allocation and other
fallout from having the SCSI command allocated as part of the block
allocation functions"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (59 commits)
scsi: qedi: Fix memory leak in tmf response processing.
scsi: aacraid: remove redundant zero check on ret
scsi: lpfc: use proper format string for dma_addr_t
scsi: lpfc: use div_u64 for 64-bit division
scsi: mac_scsi: Fix MAC_SCSI=m option when SCSI=m
scsi: cciss: correct check map error.
scsi: qla2xxx: fix spelling mistake: "seperator" -> "separator"
scsi: aacraid: Fixed expander hotplug for SMART family
scsi: mpt3sas: switch to pci_alloc_irq_vectors
scsi: qedf: fixup compilation warning about atomic_t usage
scsi: remove scsi_execute_req_flags
scsi: merge __scsi_execute into scsi_execute
scsi: simplify scsi_execute_req_flags
scsi: make the sense header argument to scsi_test_unit_ready mandatory
scsi: sd: improve TUR handling in sd_check_events
scsi: always zero sshdr in scsi_normalize_sense
scsi: scsi_dh_emc: return success in clariion_std_inquiry()
scsi: fix memory leak of sdpk on when gd fails to allocate
scsi: sd: make sd_devt_release() static
scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.
...
|
|
Fix typos and add the following to the scripts/spelling.txt:
comsume||consume
comsumer||consumer
comsuming||consuming
I see some variable names with this pattern, but this commit is only
touching comment blocks to avoid unexpected impact.
Link: http://lkml.kernel.org/r/1481573103-11329-19-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|