diff options
author | Xiang Chen <chenxiang66@hisilicon.com> | 2018-05-31 15:50:48 +0300 |
---|---|---|
committer | Martin K. Petersen <martin.petersen@oracle.com> | 2018-06-20 05:02:25 +0300 |
commit | 2ba5afb6834b876c1547fa7d714ddca8a8039b36 (patch) | |
tree | 20da589986c0ae1c377d802c34739894f29f71b3 /drivers/scsi/hisi_sas/hisi_sas.h | |
parent | f2ae8d04327f14ff93a263af265bc91c689c208e (diff) | |
download | linux-2ba5afb6834b876c1547fa7d714ddca8a8039b36.tar.xz |
scsi: hisi_sas: Pre-allocate slot DMA buffers
Currently the driver spends much time allocating and freeing the slot DMA
buffer for command delivery/completion. To boost the performance,
pre-allocate the buffers for all IPTT. The downside of this approach is
that we are reallocating all buffer memory upfront, so hog memory which we
may not need.
However, the current method - DMA buffer pool - also caches all buffers and
does not free them until the pool is destroyed, so is not exactly efficient
either.
On top of this, since the slot DMA buffer is slightly bigger than a 4K
page, we need to allocate 2x4K pages per buffer (for 4K page kernel), which
is quite wasteful. For 64K page size this is not such an issue.
So, for the 4K page case, in order to make memory usage more efficient,
pre-allocating larger blocks of DMA memory for the buffers can be more
efficient.
To make DMA memory usage most efficient, we would choose a single
contiguous DMA memory block, but this could use up all the DMA memory in
the system (when CMA enabled and no IOMMU), or we may just not be able to
allocate a DMA buffer large enough when no CMA or IOMMU.
To decide the block size we use the LCM (least common multiple) of the
buffer size and the page size. We roundup(64) to ensure the LCM is not too
large, even though a little memory may be wasted per block.
So, with this, the total memory requirement is about is about 17MB for 4096
max IPTT.
Previously (for 4K pages case), it would be 32MB (for all slots
allocated).
With this change, the relative increase of IOPS for bs=4K read when
PAGE_SIZE=4K and PAGE_SIZE=64K is as follows:
IODEPTH 4K PAGE_SIZE 64K PAGE_SIZE
32 56% 47%
64 53% 44%
128 64% 43%
256 67% 45%
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Diffstat (limited to 'drivers/scsi/hisi_sas/hisi_sas.h')
-rw-r--r-- | drivers/scsi/hisi_sas/hisi_sas.h | 9 |
1 files changed, 5 insertions, 4 deletions
diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h index 78e5a9254143..beda41238285 100644 --- a/drivers/scsi/hisi_sas/hisi_sas.h +++ b/drivers/scsi/hisi_sas/hisi_sas.h @@ -16,6 +16,7 @@ #include <linux/clk.h> #include <linux/dmapool.h> #include <linux/iopoll.h> +#include <linux/lcm.h> #include <linux/mfd/syscon.h> #include <linux/module.h> #include <linux/of_address.h> @@ -199,17 +200,18 @@ struct hisi_sas_slot { int dlvry_queue_slot; int cmplt_queue; int cmplt_queue_slot; - int idx; int abort; int ready; - void *buf; - dma_addr_t buf_dma; void *cmd_hdr; dma_addr_t cmd_hdr_dma; struct work_struct abort_slot; struct timer_list internal_abort_timer; bool is_internal; struct hisi_sas_tmf_task *tmf; + /* Do not reorder/change members after here */ + void *buf; + dma_addr_t buf_dma; + int idx; }; struct hisi_sas_hw { @@ -299,7 +301,6 @@ struct hisi_hba { int queue_count; - struct dma_pool *buffer_pool; struct hisi_sas_device devices[HISI_SAS_MAX_DEVICES]; struct hisi_sas_cmd_hdr *cmd_hdr[HISI_SAS_MAX_QUEUES]; dma_addr_t cmd_hdr_dma[HISI_SAS_MAX_QUEUES]; |