summaryrefslogtreecommitdiff
path: root/drivers/infiniband/ulp/rtrs/rtrs-clt.c
AgeCommit message (Collapse)AuthorFilesLines
2022-03-02RDMA/rtrs-clt: Move free_permit from free_clt to rtrs_clt_closeMd Haris Iqbal1-1/+1
[ Upstream commit c46fa8911b17e3f808679061a8af8bee219f4602 ] Error path of rtrs_clt_open() calls free_clt(), where free_permit is called. This is wrong since error path of rtrs_clt_open() does not need to call free_permit(). Also, moving free_permits() call to rtrs_clt_close(), makes it more aligned with the call to alloc_permit() in rtrs_clt_open(). Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20220217030929.323849-2-haris.iqbal@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-02RDMA/rtrs-clt: Fix possible double free in error caseMd Haris Iqbal1-17/+20
[ Upstream commit 8700af2cc18c919b2a83e74e0479038fd113c15d ] Callback function rtrs_clt_dev_release() for put_device() calls kfree(clt) to free memory. We shouldn't call kfree(clt) again, and we can't use the clt after kfree too. Replace device_register() with device_initialize() and device_add() so that dev_set_name can() be used appropriately. Move mutex_destroy() to the release function so it can be called in the alloc_clt err path. Fixes: eab098246625 ("RDMA/rtrs-clt: Refactor the failure cases in alloc_clt") Link: https://lore.kernel.org/r/20220217030929.323849-1-haris.iqbal@ionos.com Reported-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27RDMA/rtrs-clt: Fix the initial value of min_latencyJack Wang1-1/+1
[ Upstream commit 925cac6358677d3d64f9b25f205eeb3d31c9f7f8 ] The type of min_latency is ktime_t, so use KTIME_MAX to initialize the initial value. Fixes: dc3b66a0ce70 ("RDMA/rtrs-clt: Add a minimum latency multipath policy") Link: https://lore.kernel.org/r/20211124081040.19533-1-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Guoqing Jiang <Guoqing.Jiang@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-08-23RDMA/rtrs-clt: Fix counting inflight IOGioh Kim1-3/+4
There are mis-match at counting inflight IO after changing the multipath policy. For example, we started fio test with round-robin policy and then we changed the policy to min-inflight. IOs created under the RR policy is finished under the min-inflight policy and inflight counter only decreased. So the counter would be negative value. And also we started fio test with min-inflight policy and changed the policy to the round-robin. IOs created under the min-inflight policy increased the inflight IO counter but the inflight IO counter was not decreased because the policy was the round-robin when IO was finished. So it should count IOs only if the IO is created under the min-inflight policy. It should not care the policy when the IO is finished. This patch adds a field mp_policy in struct rtrs_clt_io_req and stores the multipath policy when an object of rtrs_clt_io_req is created. Then rtrs-clt checks the mp_policy of only struct rtrs_clt_io_req instead of the struct rtrs_clt. Link: https://lore.kernel.org/r/20210806112112.124313-6-haris.iqbal@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-23RDMA/rtrs: Remove all likely and unlikelyGioh Kim1-64/+62
The IO performance test with fio after swapping the likely and unlikely macros in all if-statement shows no difference. They do not help for the performance of rtrs. Thanks to Haakon Bugge for the test scenario. The fio test did random read on 32 rnbd devices and 64 processes. Test environment: - Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz - 376G memory - kernel version: 5.4.86 - gcc version: gcc (Debian 8.3.0-6) 8.3.0 - Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5] Test result: - before swapping: IOPS=829k, BW=3239MiB/s - after swapping: IOPS=829k, BW=3238MiB/s - remove all (un)likely: IOPS=829k, BW=3238MiB/s Link: https://lore.kernel.org/r/20210806112112.124313-5-haris.iqbal@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-23RDMA/rtrs-clt: During add_path change for_new_clt according to path_numMd Haris Iqbal1-0/+12
When all the paths are removed for a session, the addition of the first path is like a new session for the storage server. Hence, for_new_clt has to be set to 1. Link: https://lore.kernel.org/r/20210806112112.124313-2-haris.iqbal@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-19RDMA/rtrs: Remove a useless kfree()Christophe JAILLET1-1/+0
'sess->rbufs' is known to be NULL here, so there is no point in kfree'ing it. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/9a57c9f837fa2c6f0070578a1bc4840688f62962.1628185335.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-15RDMA/rtrs: Move sq_wr_avail to rtrs_conJack Wang1-0/+1
In order to account HB for sq_wr_avail properly, move sq_wr_avail from rtrs_srv_con to rtrs_con. Although rtrs-clt do not care sq_wr_avail, but still init it to max_send_wr. Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat") Link: https://lore.kernel.org/r/20210712060750.16494-7-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-15RDMA/rtrs: Enable the same selective signal for heartbeat and IOJack Wang1-2/+5
On idle session, because we do not do signal for heartbeat, it will overflow the send queue after sometime. To avoid that, we need to enable the signal for heartbeat. To do that, add a new member signal_interval in rtrs_path, which will set min of queue_depth and SERVICE_CON_QUEUE_DEPTH, and track it for both heartbeat and IO, so the sq queue full accounting is correct. Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat") Link: https://lore.kernel.org/r/20210712060750.16494-4-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-15RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_conJack Wang1-3/+4
We need to track also the wr used for heatbeat. This is a preparation for that, will be used in later patch. The io_cnt in rtrs_clt is removed, use wr_cnt instead. Link: https://lore.kernel.org/r/20210712060750.16494-3-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22rnbd/rtrs-clt: Query and use max_segments from rtrs-clt.Jack Wang1-10/+8
With fast memory registration on write request, rnbd-clt can do bigger IO without split. rnbd-clt now can query rtrs-clt to get the max_segments, instead of using BMAX_SEGMENTS. BMAX_SEGMENTS is not longer needed, so remove it. Link: https://lore.kernel.org/r/20210621055340.11789-6-jinpu.wang@ionos.com Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22RDMA/rtrs-clt: Raise MAX_SEGMENTSJack Wang1-2/+4
As we can do fast memory registration on write, we can increase the max_segments, default to 512K. Link: https://lore.kernel.org/r/20210621055340.11789-5-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22RDMA/rtrs_clt: Alloc less memory with write path fast memory registrationJack Wang1-3/+2
With write path fast memory registration, we need less memory for each request. With fast memory registration, we can reduce max_send_sge to save memory usage. Also convert the kmalloc_array to kcalloc. Link: https://lore.kernel.org/r/20210621055340.11789-4-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22RDMA/rtrs-clt: Write path fast memory registrationJack Wang1-27/+73
With fast memory registration in write path, we can reduce the memory consumption by using less max_send_sge, support IO bigger than 116 KB (29 segments * 4 KB) without splitting, and it also make the IO path more symmetric. To avoid some times MR reg failed, waiting for the invalidation to finish before the new mr reg. Introduce a refcount, only finish the request when both local invalidation and io reply are there. Link: https://lore.kernel.org/r/20210621055340.11789-3-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Dima Stepanov <dmitrii.stepanov@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22RDMA/rtrs: Introduce head/tail wrJack Wang1-7/+9
Introduce tail wr, we can send as the last wr, we want to send the local invalidate wr after rdma wr in later patch. While at it, also fix coding style issue. Link: https://lore.kernel.org/r/20210621055340.11789-2-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22RDMA: Fix kernel-doc warnings about wrong commentLeon Romanovsky1-2/+2
Compilation with W=1 produces warnings similar to the below. drivers/infiniband/ulp/ipoib/ipoib_main.c:320: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst All such occurrences were found with the following one line git grep -A 1 "\/\*\*" drivers/infiniband/ Link: https://lore.kernel.org/r/e57d5f4ddd08b7a19934635b44d6d632841b9ba7.1623823612.git.leonro@nvidia.com Reviewed-by: Jack Wang <jinpu.wang@ionos.com> #rtrs Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-18RDMA/rtrs: Check device max_qp_wr limit when create QPJack Wang1-14/+15
Currently we only check device max_qp_wr limit for IO connection, but not for service connection. We should check for both. So save the max_qp_wr device limit in wr_limit, and use it for both IO connections and service connections. While at it, also remove an outdated comments. Link: https://lore.kernel.org/r/20210614090337.29557-6-jinpu.wang@ionos.com Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-18RDMA/rtrs: Rename cq_size/queue_size to cq_num/queue_numGuoqing Jiang1-9/+9
Those variables are passed to create_cq, create_qp, rtrs_iu_alloc and rtrs_iu_free, so these *_size means the num of unit. And cq_size also means number of cq element. Also move the setting of cq_num to common path. Link: https://lore.kernel.org/r/20210614090337.29557-5-jinpu.wang@ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-18RDMA/rtrs: RDMA_RXE requires more number of WRMd Haris Iqbal1-3/+4
When using rdma_rxe, post_one_recv() returns ENOMEM error due to the full recv queue. This patch increase the number of WR for receive queue to support all devices. Link: https://lore.kernel.org/r/20210614090337.29557-4-jinpu.wang@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-18RDMA/rtrs-clt: Use minimal max_send_sge when create qpJack Wang1-6/+8
We use device limit max_send_sge, which is suboptimal for memory usage. We don't need that much for User Con, 1 is enough. And for IO con, sess->max_segments + 1 is enough Link: https://lore.kernel.org/r/20210614090337.29557-3-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-31RDMA/rtrs: Avoid Wtautological-constant-out-of-range-compareJack Wang1-5/+0
drivers/infiniband/ulp/rtrs/rtrs-clt.c:1786:19: warning: result of comparison of constant 'MAX_SESS_QUEUE_DEPTH' (65536) with expression of type 'u16' (aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare] To fix it, limit MAX_SESS_QUEUE_DEPTH to u16 max, which is 65535, and drop the check in rtrs-clt, as it's the type u16 max. Link: https://lore.kernel.org/r/20210531122835.58329-1-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-29RDMA/rtrs-clt: Fix memory leak of not-freed sess->stats and stats->pcpu_statsGioh Kim1-0/+6
sess->stats and sess->stats->pcpu_stats objects are freed when sysfs entry is removed. If something wrong happens and session is closed before sysfs entry is created, sess->stats and sess->stats->pcpu_stats objects are not freed. This patch adds freeing of them at three places: 1. When client uses wrong address and session creation fails. 2. When client fails to create a sysfs entry. 3. When client adds wrong address via sysfs add_path. Fixes: 215378b838df0 ("RDMA/rtrs: client: sysfs interface functions") Link: https://lore.kernel.org/r/20210528113018.52290-21-jinpu.wang@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-29RDMA/rtrs-clt: Check if the queue_depth has changed during a reconnectionMd Haris Iqbal1-4/+15
The queue_depth is a module parameter for rtrs_server. It is used on the client side to determing the queue_depth of the request queue for the RNBD virtual block device. During a reconnection event for an already mapped device, in case the rtrs_server module queue_depth has changed, fail the reconnect attempt. Also stop further auto reconnection attempts. A manual reconnect via sysfs has to be triggerred. Fixes: 6a98d71daea18 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20210528113018.52290-20-jinpu.wang@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-29RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its statsMd Haris Iqbal1-0/+3
When get_next_path_min_inflight is called to select the next path, it iterates over the list of available rtrs_clt_sess (paths). It then reads the number of inflight IOs for that path to select one which has the least inflight IO. But it may so happen that rtrs_clt_sess (path) is no longer in the connected state because closing or error recovery paths can change the status of the rtrs_clt_Sess. For example, the client sent the heart-beat and did not get the response, it would change the session status and stop IO processing. The added checking of this patch can prevent accessing the broken path and generating duplicated error messages. It is ok if the status is changed after checking the status because the error recovery path does not free memory and only tries to reconnection. And also it is ok if the session is closed after checking the status because closing the session changes the session status and flush all IO beforing free memory. If the session is being accessed for IO processing, the closing session will wait. Fixes: 6a98d71daea18 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20210528113018.52290-13-jinpu.wang@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-29RDMA/rtrs-clt: Remove redundant 'break'Guoqing Jiang1-1/+0
It is duplicated with the very next line Link: https://lore.kernel.org/r/20210528113018.52290-12-jinpu.wang@ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-29RDMA/rtrs-clt: Kill rtrs_clt_disconnect_from_sysfsGuoqing Jiang1-8/+1
The function is just a wrapper of rtrs_clt_close_conns, let's call rtrs_clt_close_conns directly. Link: https://lore.kernel.org/r/20210528113018.52290-10-jinpu.wang@ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-29RDMA/rtrs-clt: Kill rtrs_clt_{start,stop}_hbGuoqing Jiang1-12/+2
The two wrappers are not needed since we can call rtrs_{start,stop}_hb directly. Link: https://lore.kernel.org/r/20210528113018.52290-9-jinpu.wang@ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-29RDMA/rtrs: Use strscpy instead of strlcpyDima Stepanov1-2/+2
During checkpatch analyzing the following warning message was found: WARNING:STRLCPY: Prefer strscpy over strlcpy - see: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Fix it by using strscpy calls instead of strlcpy. Link: https://lore.kernel.org/r/20210528113018.52290-8-jinpu.wang@ionos.com Signed-off-by: Dima Stepanov <dmitrii.stepanov@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-29RDMA/rtrs-clt: Remove MAX_SESS_QUEUE_DEPTH from rtrs_send_sess_infoGioh Kim1-1/+1
Client receives queue_depth value from server. There is no need to use MAX_SESS_QUEUE_DEPTH value. Link: https://lore.kernel.org/r/20210528113018.52290-3-jinpu.wang@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-07Merge tag 'block-5.13-2021-05-07' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+2
Pull block fixes from Jens Axboe: - dasd spelling fixes (Bhaskar) - Limit bio max size on multi-page bvecs to the hardware limit, to avoid overly large bio's (and hence latencies). Originally queued for the merge window, but needed a fix and was dropped from the initial pull (Changheun) - NVMe pull request (Christoph): - reset the bdev to ns head when failover (Daniel Wagner) - remove unsupported command noise (Keith Busch) - misc passthrough improvements (Kanchan Joshi) - fix controller ioctl through ns_head (Minwoo Im) - fix controller timeouts during reset (Tao Chiu) - rnbd fixes/cleanups (Gioh, Md, Dima) - Fix iov_iter re-expansion (yangerkun) * tag 'block-5.13-2021-05-07' of git://git.kernel.dk/linux-block: block: reexpand iov_iter after read/write nvmet: remove unsupported command noise nvme-multipath: reset bdev to ns head when failover nvme-pci: fix controller reset hang when racing with nvme_timeout nvme: move the fabrics queue ready check routines to core nvme: avoid memset for passthrough requests nvme: add nvme_get_ns helper nvme: fix controller ioctl through ns_head bio: limit bio max size RDMA/rtrs: fix uninitialized symbol 'cnt' s390: dasd: Mundane spelling fixes block/rnbd: Remove all likely and unlikely block/rnbd-clt: Check the return value of the function rtrs_clt_query block/rnbd: Fix style issues block/rnbd-clt: Change queue_depth type in rnbd_clt_session to size_t
2021-05-03RDMA/rtrs: fix uninitialized symbol 'cnt'Gioh Kim1-1/+2
rtrs_clt_rdma_cq_direct returns an ninitialized value in cnt if there is no session. This patch makes rtrs_clt_rdma_cq_direct returns a negative value for block layer not to try again. Fixes: 2958a995edc94 ("block/rnbd-clt: Support polling mode for IO latency optimization") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20210429092741.266533-1-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-21/+101
Pull rdma updates from Jason Gunthorpe: "This is significantly bug fixes and general cleanups. The noteworthy new features are fairly small: - XRC support for HNS and improves RQ operations - Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw and qib - Quite a few general cleanups on spelling, error handling, static checker detections, etc - Increase the number of device ports supported beyond 255. High port count software switches now exist - Several bug fixes for rtrs - mlx5 Device Memory support for host controlled atomics - Report SRQ tables through to rdma-tool" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (145 commits) IB/qib: Remove redundant assignment to ret RDMA/nldev: Add copy-on-fork attribute to get sys command RDMA/bnxt_re: Fix a double free in bnxt_qplib_alloc_res RDMA/siw: Fix a use after free in siw_alloc_mr IB/hfi1: Remove redundant variable rcd RDMA/nldev: Add QP numbers to SRQ information RDMA/nldev: Return SRQ information RDMA/restrack: Add support to get resource tracking for SRQ RDMA/nldev: Return context information RDMA/core: Add CM to restrack after successful attachment to a device RDMA/cma: Skip device which doesn't support CM RDMA/rxe: Fix a bug in rxe_fill_ip_info() RDMA/mlx5: Expose private query port RDMA/mlx4: Remove an unused variable RDMA/mlx5: Fix type assignment for ICM DM IB/mlx5: Set right RoCE l3 type and roce version while deleting GID RDMA/i40iw: Fix error unwinding when i40iw_hmc_sd_one fails RDMA/cxgb4: add missing qpid increment IB/ipoib: Remove unnecessary struct declaration RDMA/bnxt_re: Get rid of custom module reference counting ...
2021-04-29Merge tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-blockLinus Torvalds1-22/+53
Pull block driver updates from Jens Axboe: - MD changes via Song: - raid5 POWER fix - raid1 failure fix - UAF fix for md cluster - mddev_find_or_alloc() clean up - Fix NULL pointer deref with external bitmap - Performance improvement for raid10 discard requests - Fix missing information of /proc/mdstat - rsxx const qualifier removal (Arnd) - Expose allocated brd pages (Calvin) - rnbd via Gioh Kim: - Change maintainer - Change domain address of maintainers' email - Add polling IO mode and document update - Fix memory leak and some bug detected by static code analysis tools - Code refactoring - Series of floppy cleanups/fixes (Denis) - s390 dasd fixes (Julian) - kerneldoc fixes (Lee) - null_blk double free (Lv) - null_blk virtual boundary addition (Max) - Remove xsysace driver (Michal) - umem driver removal (Davidlohr) - ataflop fixes (Dan) - Revalidate disk removal (Christoph) - Bounce buffer cleanups (Christoph) - Mark lightnvm as deprecated (Christoph) - mtip32xx init cleanups (Shixin) - Various fixes (Tian, Gustavo, Coly, Yang, Zhang, Zhiqiang) * tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-block: (143 commits) async_xor: increase src_offs when dropping destination page drivers/block/null_blk/main: Fix a double free in null_init. md/raid1: properly indicate failure when ending a failed write request md-cluster: fix use-after-free issue when removing rdev nvme: introduce generic per-namespace chardev nvme: cleanup nvme_configure_apst nvme: do not try to reconfigure APST when the controller is not live nvme: add 'kato' sysfs attribute nvme: sanitize KATO setting nvmet: avoid queuing keep-alive timer if it is disabled brd: expose number of allocated pages in debugfs ataflop: fix off by one in ataflop_probe() ataflop: potential out of bounds in do_format() drbd: Fix fall-through warnings for Clang block/rnbd: Use strscpy instead of strlcpy block/rnbd-clt-sysfs: Remove copy buffer overlap in rnbd_clt_get_path_name block/rnbd-clt: Remove max_segment_size block/rnbd-clt: Generate kobject_uevent when the rnbd device state changes block/rnbd-srv: Remove unused arguments of rnbd_srv_rdma_ev Documentation/ABI/rnbd-clt: Add description for nr_poll_queues ...
2021-04-20block/rnbd-clt: Remove max_segment_sizeJack Wang1-10/+5
We always map with SZ_4K, so do not need max_segment_size. Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20210419073722.15351-18-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-20block/rnbd-clt: Support polling mode for IO latency optimizationGioh Kim1-13/+49
RNBD can make double-queues for irq-mode and poll-mode. For example, on 4-CPU system 8 request-queues are created, 4 for irq-mode and 4 for poll-mode. If the IO has HIPRI flag, the block-layer will call .poll function of RNBD. Then IO is sent to the poll-mode queue. Add optional nr_poll_queues argument for map_devices interface. To support polling of RNBD, RTRS client creates connections for both of irq-mode and direct-poll-mode. For example, on 4-CPU system it could've create 5 connections: con[0] => user message (softirq cq) con[1:4] => softirq cq After this patch, it can create 9 connections: con[0] => user message (softirq cq) con[1:4] => softirq cq con[5:8] => DIRECT-POLL cq Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20210419073722.15351-14-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-20block/rnbd-clt: Replace {NO_WAIT,WAIT} with RTRS_PERMIT_{WAIT,NOWAIT}Gioh Kim1-2/+2
They are defined with the same value and similar meaning, let's remove one of them, then we can remove {WAIT,NOWAIT}. Also change the type of 'wait' from 'int' to 'enum wait_type' to make it clear. Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20210419073722.15351-9-gi-oh.kim@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-14RDMA/rtrs-clt: Simplify error messageGioh Kim1-16/+7
Two error messages are only different message but have common code to generate the path string. Link: https://lore.kernel.org/r/20210406123639.202899-4-gi-oh.kim@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-14RDMA/rtrs-clt: Print more info when an error happensGioh Kim1-4/+38
Client prints only error value and it is not enough for debugging. 1. When client receives an error from server: the client does not only print the error value but also more information of server connection. 2. When client failes to send IO: the client gets an error from RDMA layer. It also print more information of server connection. Link: https://lore.kernel.org/r/20210406123639.202899-2-gi-oh.kim@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-14RDMA/rtrs-clt: Add a minimum latency multipath policyGioh Kim1-1/+56
This patch adds new multipath policy: min-latency. Client checks the latency of each path when it sends the heart-beat. And it sends IO to the path with the minimum latency. Link: https://lore.kernel.org/r/20210407113444.150961-2-gi-oh.kim@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-13RDMA/rtrs-clt: destroy sysfs after removing session from active listGioh Kim1-1/+1
A session can be removed dynamically by sysfs interface "remove_path" that eventually calls rtrs_clt_remove_path_from_sysfs function. The current rtrs_clt_remove_path_from_sysfs first removes the sysfs interfaces and frees sess->stats object. Second it removes the session from the active list. Therefore some functions could access non-connected session and access the freed sess->stats object even-if they check the session status before accessing the session. For instance rtrs_clt_request and get_next_path_min_inflight check the session status and try to send IO to the session. The session status could be changed when they are trying to send IO but they could not catch the change and update the statistics information in sess->stats object, and generate use-after-free problem. (see: "RDMA/rtrs-clt: Check state of the rtrs_clt_sess before reading its stats") This patch changes the rtrs_clt_remove_path_from_sysfs to remove the session from the active session list and then destroy the sysfs interfaces. Each function still should check the session status because closing or error recovery paths can change the status. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20210412084002.33582-1-gi-oh.kim@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01RDMA/rtrs-clt: Cap max_io_sizeJack Wang1-1/+3
Max io size is limited by both remote buffer size and the max fr pages per mr. Link: https://lore.kernel.org/r/20210325153308.1214057-20-gi-oh.kim@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt session filesMd Haris Iqbal1-1/+1
KASAN detected the following BUG: BUG: KASAN: use-after-free in rtrs_clt_update_wc_stats+0x41/0x100 [rtrs_client] Read of size 8 at addr ffff88bf2fb4adc0 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 5.4.84-pserver #5.4.84-1+feature+linux+5.4.y+dbg+20201216.1319+b6b887b~deb10 Hardware name: Supermicro H8QG6/H8QG6, BIOS 3.00 09/04/2012 Call Trace: <IRQ> dump_stack+0x96/0xe0 print_address_description.constprop.4+0x1f/0x300 ? irq_work_claim+0x2e/0x50 __kasan_report.cold.8+0x78/0x92 ? rtrs_clt_update_wc_stats+0x41/0x100 [rtrs_client] kasan_report+0x10/0x20 rtrs_clt_update_wc_stats+0x41/0x100 [rtrs_client] rtrs_clt_rdma_done+0xb1/0x760 [rtrs_client] ? lockdep_hardirqs_on+0x1a8/0x290 ? process_io_rsp+0xb0/0xb0 [rtrs_client] ? mlx4_ib_destroy_cq+0x100/0x100 [mlx4_ib] ? add_interrupt_randomness+0x1a2/0x340 __ib_process_cq+0x97/0x100 [ib_core] ib_poll_handler+0x41/0xb0 [ib_core] irq_poll_softirq+0xe0/0x260 __do_softirq+0x127/0x672 irq_exit+0xd1/0xe0 do_IRQ+0xa3/0x1d0 common_interrupt+0xf/0xf </IRQ> RIP: 0010:cpuidle_enter_state+0xea/0x780 Code: 31 ff e8 99 48 47 ff 80 7c 24 08 00 74 12 9c 58 f6 c4 02 0f 85 53 05 00 00 31 ff e8 b0 6f 53 ff e8 ab 4f 5e ff fb 8b 44 24 04 <85> c0 0f 89 f3 01 00 00 48 8d 7b 14 e8 65 1e 77 ff c7 43 14 00 00 RSP: 0018:ffffffffab007d58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffca RAX: 0000000000000002 RBX: ffff88b803d69800 RCX: ffffffffa91a8298 RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffffffffab021414 RBP: ffffffffab6329e0 R08: 0000000000000002 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002 R13: 000000bf39d82466 R14: ffffffffab632aa0 R15: ffffffffab632ae0 ? lockdep_hardirqs_on+0x1a8/0x290 ? cpuidle_enter_state+0xe5/0x780 cpuidle_enter+0x3c/0x60 do_idle+0x2fb/0x390 ? arch_cpu_idle_exit+0x40/0x40 ? schedule+0x94/0x120 cpu_startup_entry+0x19/0x1b start_kernel+0x5da/0x61b ? thread_stack_cache_init+0x6/0x6 ? load_ucode_amd_bsp+0x6f/0xc4 ? init_amd_microcode+0xa6/0xa6 ? x86_family+0x5/0x20 ? load_ucode_bsp+0x182/0x1fd secondary_startup_64+0xa4/0xb0 Allocated by task 5730: save_stack+0x19/0x80 __kasan_kmalloc.constprop.9+0xc1/0xd0 kmem_cache_alloc_trace+0x15b/0x350 alloc_sess+0xf4/0x570 [rtrs_client] rtrs_clt_open+0x3b4/0x780 [rtrs_client] find_and_get_or_create_sess+0x649/0x9d0 [rnbd_client] rnbd_clt_map_device+0xd7/0xf50 [rnbd_client] rnbd_clt_map_device_store+0x4ee/0x970 [rnbd_client] kernfs_fop_write+0x141/0x240 vfs_write+0xf3/0x280 ksys_write+0xba/0x150 do_syscall_64+0x68/0x270 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 5822: save_stack+0x19/0x80 __kasan_slab_free+0x125/0x170 kfree+0xe7/0x3f0 kobject_put+0xd3/0x240 rtrs_clt_destroy_sess_files+0x3f/0x60 [rtrs_client] rtrs_clt_close+0x3c/0x80 [rtrs_client] close_rtrs+0x45/0x80 [rnbd_client] rnbd_client_exit+0x10f/0x2bd [rnbd_client] __x64_sys_delete_module+0x27b/0x340 do_syscall_64+0x68/0x270 entry_SYSCALL_64_after_hwframe+0x49/0xbe When rtrs_clt_close is triggered, it iterates over all the present rtrs_clt_sess and triggers close on them. However, the call to rtrs_clt_destroy_sess_files is done before the rtrs_clt_close_conns. This is incorrect since during the initialization phase we allocate rtrs_clt_sess first, and then we go ahead and create rtrs_clt_con for it. If we free the rtrs_clt_sess structure before closing the rtrs_clt_con, it may so happen that an inflight IO completion would trigger the function rtrs_clt_rdma_done, which would lead to the above UAF case. Hence close the rtrs_clt_con connections first, and then trigger the destruction of session files. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20210325153308.1214057-12-gi-oh.kim@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01RDMA/rtrs: Remove sessname and sess_kobj from rtrs_attrsGuoqing Jiang1-2/+0
The two members are not used in the code, so remove them. Link: https://lore.kernel.org/r/20210325153308.1214057-10-gi-oh.kim@ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-01RDMA/rtrs-clt: Remove redundant code from rtrs_clt_read_reqGuoqing Jiang1-4/+1
There is no need to dereference 's' from 'sess', since we have "sess = to_clt_sess(s)" before. And we can deference 'dev' from 's' earlier. Link: https://lore.kernel.org/r/20210325153308.1214057-8-gi-oh.kim@ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@ionos.com> Reviewed-by: Danil Kipnis <danil.kipnis@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11RDMA/rtrs-clt: Use rdma_event_msg in logJack Wang1-3/+6
It's easier to understand a string instead of enum. Link: https://lore.kernel.org/r/20210222141551.54345-2-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11RDMA/rtrs: Use new shared CQ mechanismJack Wang1-5/+5
Have the driver use shared CQs which provids a ~10%-20% improvement during test. Instead of opening a CQ for each QP per connection, a CQ for each QP will be provided by the RDMA core driver that will be shared between the QPs on that core reducing interrupt overhead. Link: https://lore.kernel.org/r/20210222141551.54345-1-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-16RDMA/rtrs: Only allow addition of path to an already established sessionMd Haris Iqbal1-0/+7
While adding a path from the client side to an already established session, it was possible to provide the destination IP to a different server. This is dangerous. This commit adds an extra member to the rtrs_msg_conn_req structure, named first_conn; which is supposed to notify if the connection request is the first for that session or not. On the server side, if a session does not exist but the first_conn received inside the rtrs_msg_conn_req structure is 1, the connection request is failed. This signifies that the connection request is for an already existing session, and since the server did not find one, it is an wrong connection request. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20210212134525.103456-3-jinpu.wang@cloud.ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Reviewed-by: Lutz Pogrell <lutz.pogrell@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-clt: Use bitmask to check sess->flagsJack Wang1-7/+7
We may want to add new flags, so it's better to use bitmask to check flags. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-17-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs: Do not signal for heatbeatJack Wang1-1/+0
For HB, there is no need to generate signal for completion. Also remove a comment accordingly. Fixes: c0894b3ea69d ("RDMA/rtrs: core: lib functions shared between client and server modules") Link: https://lore.kernel.org/r/20201217141915.56989-16-jinpu.wang@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reported-by: Gioh Kim <gi-oh.kim@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-15RDMA/rtrs-clt: Refactor the failure cases in alloc_cltGuoqing Jiang1-13/+12
Make all failure cases go to the common path to avoid duplicate code. And some issued existed before. 1. clt need to be freed to avoid memory leak. 2. return ERR_PTR(-ENOMEM) if kobject_create_and_add fails, because rtrs_clt_open checks the return value of by call "IS_ERR(clt)". Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20201217141915.56989-15-jinpu.wang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>