diff options
author | James Smart <jsmart2021@gmail.com> | 2019-08-15 02:57:09 +0300 |
---|---|---|
committer | Martin K. Petersen <martin.petersen@oracle.com> | 2019-08-20 05:41:12 +0300 |
commit | d79c9e9d4b3d9330ee38f392a7c98e0fc494f7f8 (patch) | |
tree | 088e0dc43e6f8415f9bd529e49a3e072c89631d9 /drivers/scsi/lpfc/lpfc_sli4.h | |
parent | e62245d923caebc02582b12ce861c3d780b4106f (diff) | |
download | linux-d79c9e9d4b3d9330ee38f392a7c98e0fc494f7f8.tar.xz |
scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.
Typical SLI-4 hardware supports up to 2 4KB pages to be registered per XRI
to contain the exchanges Scatter/Gather List. This caps the number of SGL
elements that can be in the SGL. There are not extensions to extend the
list out of the 2 pages.
The G7 hardware adds a SGE type that allows the SGL to be vectored to a
different scatter/gather list segment. And that segment can contain a SGE
to go to another segment and so on. The initial segment must still be
pre-registered for the XRI, but it can be a much smaller amount (256Bytes)
as it can now be dynamically grown. This much smaller allocation can
handle the SG list for most normal I/O, and the dynamic aspect allows it to
support many MB's if needed.
The implementation creates a pool which contains "segments" and which is
initially sized to hold the initial small segment per xri. If an I/O
requires additional segments, they are allocated from the pool. If the
pool has no more segments, the pool is grown based on what is now
needed. After the I/O completes, the additional segments are returned to
the pool for use by other I/Os. Once allocated, the additional segments are
not released under the assumption of "if needed once, it will be needed
again". Pools are kept on a per-hardware queue basis, which is typically
1:1 per cpu, but may be shared by multiple cpus.
The switch to the smaller initial allocation significantly reduces the
memory footprint of the driver (which only grows if large ios are
issued). Based on the several K of XRIs for the adapter, the 8KB->256B
reduction can conserve 32MBs or more.
It has been observed with per-cpu resource pools that allocating a resource
on CPU A, may be put back on CPU B. While the get routines are distributed
evenly, only a limited subset of CPUs may be handling the put routines.
This can put a strain on the lpfc_put_cmd_rsp_buf_per_cpu routine because
all the resources are being put on a limited subset of CPUs.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Diffstat (limited to 'drivers/scsi/lpfc/lpfc_sli4.h')
-rw-r--r-- | drivers/scsi/lpfc/lpfc_sli4.h | 18 |
1 files changed, 18 insertions, 0 deletions
diff --git a/drivers/scsi/lpfc/lpfc_sli4.h b/drivers/scsi/lpfc/lpfc_sli4.h index 3aeca387b22a..3ec9cf4c6427 100644 --- a/drivers/scsi/lpfc/lpfc_sli4.h +++ b/drivers/scsi/lpfc/lpfc_sli4.h @@ -680,6 +680,13 @@ struct lpfc_sli4_hdw_queue { uint32_t cpucheck_xmt_io[LPFC_CHECK_CPU_CNT]; uint32_t cpucheck_cmpl_io[LPFC_CHECK_CPU_CNT]; #endif + + /* Per HDWQ pool resources */ + struct list_head sgl_list; + struct list_head cmd_rsp_buf_list; + + /* Lock for syncing Per HDWQ pool resources */ + spinlock_t hdwq_lock; }; #ifdef LPFC_HDWQ_LOCK_STAT @@ -1089,6 +1096,17 @@ int lpfc_sli4_post_status_check(struct lpfc_hba *); uint8_t lpfc_sli_config_mbox_subsys_get(struct lpfc_hba *, LPFC_MBOXQ_t *); uint8_t lpfc_sli_config_mbox_opcode_get(struct lpfc_hba *, LPFC_MBOXQ_t *); void lpfc_sli4_ras_dma_free(struct lpfc_hba *phba); +struct sli4_hybrid_sgl *lpfc_get_sgl_per_hdwq(struct lpfc_hba *phba, + struct lpfc_io_buf *buf); +struct fcp_cmd_rsp_buf *lpfc_get_cmd_rsp_buf_per_hdwq(struct lpfc_hba *phba, + struct lpfc_io_buf *buf); +int lpfc_put_sgl_per_hdwq(struct lpfc_hba *phba, struct lpfc_io_buf *buf); +int lpfc_put_cmd_rsp_buf_per_hdwq(struct lpfc_hba *phba, + struct lpfc_io_buf *buf); +void lpfc_free_sgl_per_hdwq(struct lpfc_hba *phba, + struct lpfc_sli4_hdw_queue *hdwq); +void lpfc_free_cmd_rsp_buf_per_hdwq(struct lpfc_hba *phba, + struct lpfc_sli4_hdw_queue *hdwq); static inline void *lpfc_sli4_qe(struct lpfc_queue *q, uint16_t idx) { return q->q_pgs[idx / q->entry_cnt_per_pg] + |