summaryrefslogtreecommitdiff
path: root/drivers/infiniband/sw
AgeCommit message (Collapse)AuthorFilesLines
2022-10-24RDMA/rxe: Delete error messages triggered by incoming Read requestsDaisuke Matsuda1-7/+3
[ Upstream commit 2c02249fcbfc066bd33e2a7375c7006d4cb367f6 ] An incoming Read request causes multiple Read responses. If a user MR to copy data from is unavailable or responder cannot send a reply, then the error messages can be printed for each response attempt, resulting in message overflow. Link: https://lore.kernel.org/r/20220829071218.1639065-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24RDMA/rxe: Fix resize_finish() in rxe_queue.cBob Pearson1-5/+7
[ Upstream commit fda5d0cf8aef12f0a4f714a96a4b2fce039a3e55 ] Currently in resize_finish() in rxe_queue.c there is a loop which copies the entries in the original queue into a newly allocated queue. The termination logic for this loop is incorrect. The call to queue_next_index() updates cons but has no effect on whether the queue is empty. So if the queue starts out empty nothing is copied but if it is not then the loop will run forever. This patch changes the loop to compare the value of cons to the original producer index. Fixes: ae6e843fe08d0 ("RDMA/rxe: Add memory barriers to kernel queues") Link: https://lore.kernel.org/r/20220825221446.6512-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24RDMA/siw: Fix QP destroy to wait for all references dropped.Bernard Metzler3-1/+5
[ Upstream commit a3c278807a459e6f50afee6971cabe74cccfb490 ] Delay QP destroy completion until all siw references to QP are dropped. The calling RDMA core will free QP structure after successful return from siw_qp_destroy() call, so siw must not hold any remaining reference to the QP upon return. A use-after-free was encountered in xfstest generic/460, while testing NFSoRDMA. Here, after a TCP connection drop by peer, the triggered siw_cm_work_handler got delayed until after QP destroy call, referencing a QP which has already freed. Fixes: 303ae1cdfdf7 ("rdma/siw: application interface") Reported-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20220920082503.224189-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24RDMA/siw: Always consume all skbuf data in sk_data_ready() upcall.Bernard Metzler1-12/+15
[ Upstream commit 754209850df8367c954ac1de7671c7430b1f342c ] For header and trailer/padding processing, siw did not consume new skb data until minimum amount present to fill current header or trailer structure, including potential payload padding. Not consuming any data during upcall may cause a receive stall, since tcp_read_sock() is not upcalling again if no new data arrive. A NFSoRDMA client got stuck at RDMA Write reception of unaligned payload, if the current skb did contain only the expected 3 padding bytes, but not the 4 bytes CRC trailer. Expecting 4 more bytes already arrived in another skb, and not consuming those 3 bytes in the current upcall left the Write incomplete, waiting for the CRC forever. Fixes: 8b6a361b8c48 ("rdma/siw: receive path") Reported-by: Olga Kornievskaia <kolga@netapp.com> Tested-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20220920081202.223629-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24RDMA/rxe: Fix the error caused by qp->skZhu Yanjun1-2/+4
[ Upstream commit 548ce2e66725dcba4e27d1e8ac468d5dd17fd509 ] When sock_create_kern in the function rxe_qp_init_req fails, qp->sk is set to NULL. Then the function rxe_create_qp will call rxe_qp_do_cleanup to handle allocated resource. Before handling qp->sk, this variable should be checked. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220822011615.805603-3-yanjun.zhu@linux.dev Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24RDMA/rxe: Fix "kernel NULL pointer dereference" errorZhu Yanjun1-1/+3
[ Upstream commit a625ca30eff806395175ebad3ac1399014bdb280 ] When rxe_queue_init in the function rxe_qp_init_req fails, both qp->req.task.func and qp->req.task.arg are not initialized. Because of creation of qp fails, the function rxe_create_qp will call rxe_qp_do_cleanup to handle allocated resource. Before calling __rxe_do_task, both qp->req.task.func and qp->req.task.arg should be checked. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220822011615.805603-2-yanjun.zhu@linux.dev Reported-by: syzbot+ab99dc4c6e961eed8b8e@syzkaller.appspotmail.com Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15RDMA/siw: Pass a pointer to virt_to_page()Linus Walleij1-4/+14
[ Upstream commit 0d1b756acf60da5004c1e20ca4462f0c257bf6e1 ] Functions that work on a pointer to virtual memory such as virt_to_pfn() and users of that function such as virt_to_page() are supposed to pass a pointer to virtual memory, ideally a (void *) or other pointer. However since many architectures implement virt_to_pfn() as a macro, this function becomes polymorphic and accepts both a (unsigned long) and a (void *). If we instead implement a proper virt_to_pfn(void *addr) function the following happens (occurred on arch/arm): drivers/infiniband/sw/siw/siw_qp_tx.c:32:23: warning: incompatible integer to pointer conversion passing 'dma_addr_t' (aka 'unsigned int') to parameter of type 'const void *' [-Wint-conversion] drivers/infiniband/sw/siw/siw_qp_tx.c:32:37: warning: passing argument 1 of 'virt_to_pfn' makes pointer from integer without a cast [-Wint-conversion] drivers/infiniband/sw/siw/siw_qp_tx.c:538:36: warning: incompatible integer to pointer conversion passing 'unsigned long long' to parameter of type 'const void *' [-Wint-conversion] Fix this with an explicit cast. In one case where the SIW SGE uses an unaligned u64 we need a double cast modifying the virtual address (va) to a platform-specific uintptr_t before casting to a (void *). Fixes: b9be6f18cf9e ("rdma/siw: transmit path") Cc: linux-rdma@vger.kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20220902215918.603761-1-linus.walleij@linaro.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"Li Zhijian5-162/+104
[ Upstream commit 1e75550648da1fa1cd1969e7597355de8fe8caf6 ] Below 2 commits will be reverted: commit 8ff5f5d9d8cf ("RDMA/rxe: Prevent double freeing rxe_map_set()") commit 647bf13ce944 ("RDMA/rxe: Create duplicate mapping tables for FMRs") The community has a few bug reports which pointed this commit at last. Some proposals are raised up in the meantime but all of them have no follow-up operation. The previous commit led the map_set of FMR to be not available any more if the MR is registered again after invalidating. Although the mentioned patch try to fix a potential race in building/accessing the same table for fast memory regions, it broke rtrs etc ULPs. Since the latter could be worse, revert this patch. With previous commit, it's observed that a same MR in rnbd server will trigger below code path: -> rxe_mr_init_fast() |-> alloc map_set() # map_set is uninitialized |...-> rxe_map_mr_sg() # build the map_set |-> rxe_mr_set_page() |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means # we can access host memory(such rxe_mr_copy) |...-> rxe_invalidate_mr() # mr->state change to FREE from VALID |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE, # but map_set was not built again |...-> rxe_mr_copy() # kernel crash due to access wild addresses # that lookup from the map_set The backtraces are not always identical. [1st]---------- RIP: 0010:lookup_iova+0x66/0xa0 [rdma_rxe] Code: 00 00 00 48 d3 ee 89 32 c3 4c 8b 18 49 8b 3b 48 8b 47 08 48 39 c6 72 38 48 29 c6 45 31 d2 b8 01 00 00 00 48 63 c8 48 c1 e1 04 <48> 8b 4c 0f 08 48 39 f1 77 21 83 c0 01 48 29 ce 3d 00 01 00 00 75 RSP: 0018:ffffb7ff80063bf0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9b9949d86800 RCX: 0000000000000000 RDX: ffffb7ff80063c00 RSI: 0000000049f6b378 RDI: 002818da00000004 RBP: 0000000000000120 R08: ffffb7ff80063c08 R09: ffffb7ff80063c04 R10: 0000000000000002 R11: ffff9b9916f7eef8 R12: ffff9b99488a0038 R13: ffff9b99488a0038 R14: ffff9b9914fb346a R15: ffff9b990ab27000 FS: 0000000000000000(0000) GS:ffff9b997dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007efc33a98ed0 CR3: 0000000014f32004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> rxe_mr_copy.part.0+0x6f/0x140 [rdma_rxe] rxe_responder+0x12ee/0x1b60 [rdma_rxe] ? rxe_icrc_check+0x7e/0x100 [rdma_rxe] ? rxe_rcv+0x1d0/0x780 [rdma_rxe] ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe] rxe_do_task+0x67/0xb0 [rdma_rxe] rxe_xmit_packet+0xc7/0x210 [rdma_rxe] rxe_requester+0x680/0xee0 [rdma_rxe] ? update_load_avg+0x5f/0x690 ? update_load_avg+0x5f/0x690 ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client] [2nd]---------- RIP: 0010:rxe_mr_copy.part.0+0xa8/0x140 [rdma_rxe] Code: 00 00 49 c1 e7 04 48 8b 00 4c 8d 2c d0 48 8b 44 24 10 4d 03 7d 00 85 ed 7f 10 eb 6c 89 54 24 0c 49 83 c7 10 31 c0 85 ed 7e 5e <49> 8b 3f 8b 14 24 4c 89 f6 48 01 c7 85 d2 74 06 48 89 fe 4c 89 f7 RSP: 0018:ffffae3580063bf8 EFLAGS: 00010202 RAX: 0000000000018978 RBX: ffff9d7ef7a03600 RCX: 0000000000000008 RDX: 000000000000007c RSI: 000000000000007c RDI: ffff9d7ef7a03600 RBP: 0000000000000120 R08: ffffae3580063c08 R09: ffffae3580063c04 R10: ffff9d7efece0038 R11: ffff9d7ec4b1db00 R12: ffff9d7efece0038 R13: ffff9d7ef4098260 R14: ffff9d7f11e23c6a R15: 4c79500065708144 FS: 0000000000000000(0000) GS:ffff9d7f3dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fce47276c60 CR3: 0000000003f66004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> rxe_responder+0x12ee/0x1b60 [rdma_rxe] ? rxe_icrc_check+0x7e/0x100 [rdma_rxe] ? rxe_rcv+0x1d0/0x780 [rdma_rxe] ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe] rxe_do_task+0x67/0xb0 [rdma_rxe] rxe_xmit_packet+0xc7/0x210 [rdma_rxe] rxe_requester+0x680/0xee0 [rdma_rxe] ? update_load_avg+0x5f/0x690 ? update_load_avg+0x5f/0x690 ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client] rxe_do_task+0x67/0xb0 [rdma_rxe] tasklet_action_common.constprop.0+0x92/0xc0 __do_softirq+0xe1/0x2d8 run_ksoftirqd+0x21/0x30 smpboot_thread_fn+0x183/0x220 ? sort_range+0x20/0x20 kthread+0xe2/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 Link: https://lore.kernel.org/r/1658805386-2-1-git-send-email-lizhijian@fujitsu.com Link: https://lore.kernel.org/all/20220210073655.42281-1-guoqing.jiang@linux.dev/T/ Link: https://www.spinics.net/lists/linux-rdma/msg110836.html Link: https://lore.kernel.org/lkml/94a5ea93-b8bb-3a01-9497-e2021f29598a@linux.dev/t/ Tested-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-25RDMA/rxe: Limit the number of calls to each taskletBob Pearson2-4/+18
[ Upstream commit eff6d998ca297cb0b2e53b032a56cf8e04dd8b17 ] Limit the maximum number of calls to each tasklet from rxe_do_task() before yielding the cpu. When the limit is reached reschedule the tasklet and exit the calling loop. This patch prevents one tasklet from consuming 100% of a cpu core and causing a deadlock or soft lockup. Link: https://lore.kernel.org/r/20220630190425.2251-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17RDMA/rxe: Fix error unwind in rxe_create_qp()Zhu Yanjun1-4/+8
[ Upstream commit fd5382c5805c4bcb50fd25b7246247d3f7114733 ] In the function rxe_create_qp(), rxe_qp_from_init() is called to initialize qp, internally things like the spin locks are not setup until rxe_qp_init_req(). If an error occures before this point then the unwind will call rxe_cleanup() and eventually to rxe_qp_do_cleanup()/rxe_cleanup_task() which will oops when trying to access the uninitialized spinlock. Move the spinlock initializations earlier before any failures. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220731063621.298405-1-yanjun.zhu@linux.dev Reported-by: syzbot+833061116fa28df97f3b@syzkaller.appspotmail.com Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17RDMA/rxe: Fix rnr retry behaviorBob Pearson4-3/+22
[ Upstream commit 445fd4f4fb76d513de6b05b08b3a4d0bb980fc80 ] Currently the completer tasklet when retransmit timer or the rnr timer fires the same flag (qp->req.need_retry) is set so that if either timer fires it will attempt to perform a retry flow on the send queue. This has the effect of responding to an RNR NAK at the first retransmit timer event which might not allow the requested rnr timeout. This patch adds a new flag (qp->req.wait_for_rnr_timer) which, if set, prevents a retry flow until the rnr nak timer fires. This patch fixes rnr retry errors which can be observed by running the pyverbs test_rdmacm_async_traffic_external_qp multiple times. With this patch applied they do not occur. Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/ Link: https://lore.kernel.org/linux-rdma/2bafda9e-2bb6-186d-12a1-179e8f6a2678@talpey.com/ Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220630190425.2251-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17RDMA/rxe: For invalidate compare according to set keys in mrMd Haris Iqbal2-7/+7
[ Upstream commit 174e7b137042f19b5ce88beb4fc0ff4ec6b0c72a ] The 'rkey' input can be an lkey or rkey, and in rxe the lkey or rkey have the same value, including the variant bits. So, if mr->rkey is set, compare the invalidate key with it, otherwise compare with the mr->lkey. Since we already did a lookup on the non-varient bits to get this far, the check's only purpose is to confirm that the wqe has the correct variant bits. Fixes: 001345339f4c ("RDMA/rxe: Separate HW and SW l/rkeys") Link: https://lore.kernel.org/r/20220707073006.328737-1-haris.phnx@gmail.com Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17RDMA/rxe: Fix mw bind to allow any consumer key portionBob Pearson1-7/+0
[ Upstream commit 1603f89935ec86d40a7667e1250392626976ccc2 ] The current implementation of rxe_check_bind_mw() in rxe_mw.c is incorrect since it requires the new key portion provided by the mw consumer to be different than the previous key portion. This is not required by the IBA. Remove the test. Link: https://lore.kernel.org/linux-rdma/fb4614e7-4cac-0dc7-3ef7-766dfd10e8f2@gmail.com/ Fixes: 32a577b4c3a9 ("Add support for bind MW work requests") Link: https://lore.kernel.org/r/20220714204619.13396-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17RDMA/rxe: Fix BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanupZhu Yanjun1-4/+6
[ Upstream commit 37da51efe6eaa0560f46803c8c436a48a2084da7 ] The function rxe_create_qp calls rxe_qp_from_init. If some error occurs, the error handler of function rxe_qp_from_init will set both scq and rcq to NULL. Then rxe_create_qp calls rxe_put to handle qp. In the end, rxe_qp_do_cleanup is called by rxe_put. rxe_qp_do_cleanup directly accesses scq and rcq before checking them. This will cause null-ptr-deref error. The call graph is as below: rxe_create_qp { ... rxe_qp_from_init { ... err1: ... qp->rcq = NULL; <---rcq is set to NULL qp->scq = NULL; <---scq is set to NULL ... } qp_init: rxe_put{ ... rxe_qp_do_cleanup { ... atomic_dec(&qp->scq->num_wq); <--- scq is accessed ... atomic_dec(&qp->rcq->num_wq); <--- rcq is accessed } } Fixes: 4703b4f0d94a ("RDMA/rxe: Enforce IBA C11-17") Link: https://lore.kernel.org/r/20220705225414.315478-1-yanjun.zhu@linux.dev Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY eventCheng Xu1-3/+4
[ Upstream commit 3056fc6c32e613b760422b94c7617ac9a24a4721 ] If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't been received completely, and should not report IW_CM_EVENT_CONNECT_REPLY in this case. This may trigger a call trace in iw_cm. A simple way to trigger this: server: ib_send_lat client: ib_send_lat -R <server_ip> The call trace looks like this: kernel BUG at drivers/infiniband/core/iwcm.c:894! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI <...> Workqueue: iw_cm_wq cm_work_handler [iw_cm] Call Trace: <TASK> cm_work_handler+0x1dd/0x370 [iw_cm] process_one_work+0x1e2/0x3b0 worker_thread+0x49/0x2e0 ? rescuer_thread+0x370/0x370 kthread+0xe5/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: 6c52fdc244b5 ("rdma/siw: connection management") Link: https://lore.kernel.org/r/dae34b5fd5c2ea2bd9744812c1d2653a34a94c67.1657706960.git.chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17RDMA/rxe: Fix deadlock in rxe_do_local_ops()Bob Pearson1-3/+5
[ Upstream commit 7cb33d1bc1ac8e51fd88928f96674d392f8e07c4 ] When a local operation (invalidate mr, reg mr, bind mw) is finished there will be no ack packet coming from a responder to cause the wqe to be completed. This may happen anyway if a subsequent wqe performs IO. Currently if the wqe is signalled the completer tasklet is scheduled immediately but not otherwise. This leads to a deadlock if the next wqe has the fence bit set in send flags and the operation is not signalled. This patch removes the condition that the wqe must be signalled in order to schedule the completer tasklet which is the simplest fix for this deadlock and is fairly low cost. This is the analog for local operations of always setting the ackreq bit in all last or only request packets even if the operation is not signalled. Link: https://lore.kernel.org/r/20220523223251.15350-1-rpearsonhpe@gmail.com Reported-by: Jenny Hack <jhack@hpe.com> Fixes: c1a411268a4b ("RDMA/rxe: Move local ops to subroutine") Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17RDMA/rxe: Add a responder state for atomic replyBob Pearson1-6/+18
[ Upstream commit 0ed5493e430a1d887eb62b45c75dd5d6bb2dcf48 ] Add a responder state for atomic reply similar to read reply and rename process_atomic() rxe_atomic_reply(). In preparation for merging the normal and retry atomic responder flows. Link: https://lore.kernel.org/r/20220606143836.3323-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17RDMA/rxe: fix xa_alloc_cycle() error return value check againDongliang Mu1-2/+2
[ Upstream commit 1a685940e6200e9def6e34bbaa19dd31dc5aeaf8 ] Currently rxe_alloc checks ret to indicate error, but 1 is also a valid return and just indicates that the allocation succeeded with a wrap. Fix this by modifying the check to be < 0. Link: https://lore.kernel.org/r/20220609070656.1446121-1-dzm91@hust.edu.cn Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by xarrays") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-24RDMA/rxe: Fix an error handling path in rxe_get_mcg()Christophe JAILLET1-2/+4
The commit in the Fixes tag has shuffled some code. Now 'mcg_num' is incremented before the kzalloc(). So if the memory allocation fails, this increment must be undone. Fixes: a926a903b7dc ("RDMA/rxe: Do not call dev_mc_add/del() under a spinlock") Link: https://lore.kernel.org/r/fe137cd8b1f17593243aa73d59c18ea71ab9ee36.1653225896.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24Merge tag 'v5.18' into rdma.git for-nextJason Gunthorpe4-60/+69
Following patches have dependencies. Resolve the merge conflict in drivers/net/ethernet/mellanox/mlx5/core/main.c by keeping the new names for the fs functions following linux-next: https://lore.kernel.org/r/20220519113529.226bc3e2@canb.auug.org.au/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-11RDMA/siw: Enable siw on tunnel devicesBernard Metzler1-2/+3
Enable siw to attach to tunnel devices, there is no reason not to, siw properly generates all packets already. Link: https://lore.kernel.org/r/20220510143917.23735-1-bmt@zurich.ibm.com Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-09RDMA/rxe: Enforce IBA C11-17Bob Pearson3-0/+17
Add a counter to keep track of the number of WQs connected to a CQ and return an error if destroy_cq() is called while the counter is non zero. Link: https://lore.kernel.org/r/20220421014042.26985-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-09RDMA/rxe: Move mw cleanup code to rxe_mw_cleanup()Bob Pearson2-29/+29
Move code from rxe_dealloc_mw() to rxe_mw_cleanup() to allow flows which hold a reference to mw to complete. Link: https://lore.kernel.org/r/20220421014042.26985-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-09RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup()Bob Pearson1-6/+4
Move the code which tears down an mr to rxe_mr_cleanup to allow operations holding a reference to the mr to complete. Link: https://lore.kernel.org/r/20220421014042.26985-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-09RDMA/rxe: Move qp cleanup code to rxe_qp_do_cleanup()Bob Pearson3-10/+4
Move the code from rxe_qp_destroy() to rxe_qp_do_cleanup(). This allows flows holding references to qp to complete before the qp object is torn down. Link: https://lore.kernel.org/r/20220421014042.26985-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-09RDMA/rxe: Check rxe_get() return valueBob Pearson3-3/+6
In the tasklets (completer, responder, and requester) check the return value from rxe_get() to detect failures to get a reference. This only occurs if the qp has had its reference count drop to zero which indicates that it no longer should be used. The ref is never 0 today because the tasklets are flushed before the ref is dropped. The next patch changes this so that the ref is dropped then the tasklets are flushed. Link: https://lore.kernel.org/r/20220421014042.26985-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-09RDMA/rxe: Add rxe_srq_cleanup()Bob Pearson4-22/+25
Move cleanup code from rxe_destroy_srq() to rxe_srq_cleanup() which is called after all references are dropped to allow code depending on the srq object to complete. Link: https://lore.kernel.org/r/20220421014042.26985-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-06RDMA/rxe: Remove IB_SRQ_INIT_MASKBob Pearson3-57/+74
Currently the #define IB_SRQ_INIT_MASK is used to distinguish the rxe_create_srq verb from the rxe_modify_srq verb so that some code can be shared between these two subroutines. This commit splits rxe_srq_chk_attr into two subroutines: rxe_srq_chk_init and rxe_srq_chk_attr which handle the create_srq and modify_srq verbs separately. Link: https://lore.kernel.org/r/20220421014042.26985-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-06RDMA/rxe: Skip adjusting remote addr for write in retry operationChengguang Xu1-2/+0
For write request the remote addr will be sent only with first packet so we don't have to adjust wqe->iova in retry operation. Link: https://lore.kernel.org/r/20220502053907.6388-1-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-05RDMA/rxe: Optimize the mr pool structZhu Yanjun2-11/+3
Based on the commit c9f4c695835c ("RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC"), only the mr pool uses the RXE_POOL_ALLOC, As such, replace this flags with pool type to save memory. Link: https://lore.kernel.org/r/20220428041028.1363139-1-yanjun.zhu@linux.dev Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-05RDMA/rxe: Change mcg_lock to a _bh lockBob Pearson1-21/+15
rxe_mcast.c currently uses _irqsave spinlocks for rxe->mcg_lock while rxe_recv.c uses _bh spinlocks for the same lock. As there is no case where the mcg_lock can be taken from an IRQ, change these all to bh locks so we don't have confusing mismatched lock types on the same spinlock. Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c") Link: https://lore.kernel.org/r/20220504202817.98247-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-05RDMA/rxe: Do not call dev_mc_add/del() under a spinlockBob Pearson1-28/+23
These routines were not intended to be called under a spinlock and will throw debugging warnings: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 13 PID: 3107 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x2f/0x50 CPU: 13 PID: 3107 Comm: python3 Tainted: G E 5.18.0-rc1+ #7 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 RIP: 0010:warn_bogus_irq_restore+0x2f/0x50 Call Trace: <TASK> _raw_spin_unlock_irqrestore+0x75/0x80 rxe_attach_mcast+0x304/0x480 [rdma_rxe] ib_attach_mcast+0x88/0xa0 [ib_core] ib_uverbs_attach_mcast+0x186/0x1e0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xcd/0x140 [ib_uverbs] ib_uverbs_cmd_verbs+0xdb0/0xea0 [ib_uverbs] ib_uverbs_ioctl+0xd2/0x160 [ib_uverbs] do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Move them out of the spinlock, it is OK if there is some races setting up the MC reception at the ethernet layer with rbtree lookups. Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c") Link: https://lore.kernel.org/r/20220504202817.98247-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-05RDMA/siw: Fix a condition race issue in MPA request processingCheng Xu1-3/+4
The calling of siw_cm_upcall and detaching new_cep with its listen_cep should be atomistic semantics. Otherwise siw_reject may be called in a temporary state, e,g, siw_cm_upcall is called but the new_cep->listen_cep has not being cleared. This fixes a WARN: WARNING: CPU: 7 PID: 201 at drivers/infiniband/sw/siw/siw_cm.c:255 siw_cep_put+0x125/0x130 [siw] CPU: 2 PID: 201 Comm: kworker/u16:22 Kdump: loaded Tainted: G E 5.17.0-rc7 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: iw_cm_wq cm_work_handler [iw_cm] RIP: 0010:siw_cep_put+0x125/0x130 [siw] Call Trace: <TASK> siw_reject+0xac/0x180 [siw] iw_cm_reject+0x68/0xc0 [iw_cm] cm_work_handler+0x59d/0xe20 [iw_cm] process_one_work+0x1e2/0x3b0 worker_thread+0x50/0x3a0 ? rescuer_thread+0x390/0x390 kthread+0xe5/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: 6c52fdc244b5 ("rdma/siw: connection management") Link: https://lore.kernel.org/r/d528d83466c44687f3872eadcb8c184528b2e2d4.1650526554.git.chengyou@linux.alibaba.com Reported-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-04RDMA/rxe: Replace paylen by payloadBob Pearson1-6/+6
In finish_packet() in rxe_req.c a variable was incorrectly called paylen instead of payload. Elsewhere in the rxe source payload is always used for the RoCE payload length and paylen is always used for the UDP payload length. This will cause unnecessary confusion. Replace paylen by payload in finish_packet(). Link: https://lore.kernel.org/r/20220420172316.5465-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25RDMA/rxe: Remove useless parameters for update_state()Li Zhijian1-3/+2
wqe was not used by update_state() so far. Commit aaaf62e06623 ("RDMA/rxe: Remove useless argument for update_state()") just did a partial fixes. Link: https://lore.kernel.org/r/20220412022903.574238-1-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-20RDMA/rxe: Recheck the MR in when generating a READ replyBob Pearson1-2/+8
The rping benchmark fails on long runs. The root cause of this failure has been traced to a failure to compute a nonzero value of mr in rare situations. Fix this failure by correctly handling the computation of mr in read_reply() in rxe_resp.c in the replay flow. Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources") Link: https://lore.kernel.org/r/20220418174103.3040-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-12RDMA/rxe: Fix "Replace mr by rkey in responder resources"Bob Pearson1-8/+17
The referenced commit generates a reference counting error if the rkey has the same index but the wrong key. In this case the reference taken by rxe_pool_get_index() is not dropped. Drop the reference if the keys don't match in rxe_recheck_mr(). Check that the mw and mr are still valid. Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources") Link: https://lore.kernel.org/r/20220411030647.20011-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-12RDMA/rxe: Generate a completion for unsupported/invalid opcodeXiao Yang1-1/+1
Current rxe_requester() doesn't generate a completion when processing an unsupported/invalid opcode. If rxe driver doesn't support a new opcode (e.g. RDMA Atomic Write) and RDMA library supports it, an application using the new opcode can reproduce this issue. Fix the issue by calling "goto err;". Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20220410113513.27537-1-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-08RDMA/rxe: Remove reliable datagram supportBob Pearson2-4/+2
The rdma_rxe driver does not actually support the reliable datagram transport but contains two references to RD opcodes in driver code. This commit removes these references to RD transport opcodes which are never used. Link: https://lore.kernel.org/r/cce0f07d-25fc-5880-69e7-001d951750b7@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-08RDMA/rxe: Remove support for SMI QPs from rdma_rxeBob Pearson7-21/+1
Currently the rdma_rxe driver supports SMI type QPs in a few places which is incorrect. RoCE devices never should support SMI QPs. This commit removes SMI QP support from the driver. Link: https://lore.kernel.org/r/20220407185416.16372-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-08RDMA/rxe: Remove mc_grp_pool from struct rxe_devBob Pearson1-1/+0
Remove struct rxe_dev mc_grp_pool field. This field is no longer used. Link: https://lore.kernel.org/r/20220407184849.14359-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-08RDMA/rxe: Remove type 2A memory window capabilityBob Pearson2-1/+8
Currently the rdma_rxe driver claims to support both 2A and 2B type memory windows. But the IBA requires 010-37.2.31: If an HCA supports the Base Memory Management extensions, the HCA shall support either Type 2A or Type 2B MWs, but not both. This commit removes the device capability bit for type 2A memory windows and adds a clarifying comment to rxe_mw.c. Link: https://lore.kernel.org/r/20220407184321.14207-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-06RDMA: Split kernel-only global device caps from uverbs device capsJason Gunthorpe3-3/+3
Split out flags from ib_device::device_cap_flags that are only used internally to the kernel into kernel_cap_flags that is not part of the uapi. This limits the device_cap_flags to being the same bitmap that will be copied to userspace. This cleanly splits out the uverbs flags from the kernel flags to avoid confusion in the flags bitmap. Add some short comments describing which each of the kernel flags is connected to. Remove unused kernel flags. Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-04IB/rdmavt: add missing locks in rvt_ruc_loopbackNiels Dossche1-1/+5
The documentation of the function rvt_error_qp says both r_lock and s_lock need to be held when calling that function. It also asserts using lockdep that both of those locks are held. rvt_error_qp is called form rvt_send_cq, which is called from rvt_qp_complete_swqe, which is called from rvt_send_complete, which is called from rvt_ruc_loopback in two places. Both of these places do not hold r_lock. Fix this by acquiring a spin_lock of r_lock in both of these places. The r_lock acquiring cannot be added in rvt_qp_complete_swqe because some of its other callers already have r_lock acquired. Link: https://lore.kernel.org/r/20220228195144.71946-1-dossche.niels@gmail.com Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-04IB/rdmavt: add lock to call to rvt_error_qp to prevent a race conditionNiels Dossche1-1/+5
The documentation of the function rvt_error_qp says both r_lock and s_lock need to be held when calling that function. It also asserts using lockdep that both of those locks are held. However, the commit I referenced in Fixes accidentally makes the call to rvt_error_qp in rvt_ruc_loopback no longer covered by r_lock. This results in the lockdep assertion failing and also possibly in a race condition. Fixes: d757c60eca9b ("IB/rdmavt: Fix concurrency panics in QP post_send and modify to error") Link: https://lore.kernel.org/r/20220228165330.41546-1-dossche.niels@gmail.com Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-03-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds20-968/+866
Pull rdma updates from Jason Gunthorpe: - Minor bug fixes in mlx5, mthca, pvrdma, rtrs, mlx4, hfi1, hns - Minor cleanups: coding style, useless includes and documentation - Reorganize how multicast processing works in rxe - Replace a red/black tree with xarray in rxe which improves performance - DSCP support and HW address handle re-use in irdma - Simplify the mailbox command handling in hns - Simplify iser now that FMR is eliminated * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (93 commits) RDMA/nldev: Prevent underflow in nldev_stat_set_counter_dynamic_doit() IB/iser: Fix error flow in case of registration failure IB/iser: Generalize map/unmap dma tasks IB/iser: Use iser_fr_desc as registration context IB/iser: Remove iser_reg_data_sg helper function RDMA/rxe: Use standard names for ref counting RDMA/rxe: Replace red-black trees by xarrays RDMA/rxe: Shorten pool names in rxe_pool.c RDMA/rxe: Move max_elem into rxe_type_info RDMA/rxe: Replace obj by elem in declaration RDMA/rxe: Delete _locked() APIs for pool objects RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC RDMA/rxe: Replace mr by rkey in responder resources RDMA/rxe: Fix ref error in rxe_av.c RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT RDMA/irdma: Add support for address handle re-use RDMA/qib: Fix typos in comments RDMA/mlx5: Fix memory leak in error flow for subscribe event routine Revert "RDMA/core: Fix ib_qp_usecnt_dec() called when error" RDMA/rxe: Remove useless argument for update_state() ...
2022-03-16RDMA/rxe: Use standard names for ref countingBob Pearson12-96/+94
Rename rxe_add_ref() to rxe_get() and rxe_drop_ref() to rxe_put(). Significantly improves readability for new readers. Link: https://lore.kernel.org/r/20220304000808.225811-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-03-16RDMA/rxe: Replace red-black trees by xarraysBob Pearson6-280/+85
Currently the rxe driver uses red-black trees to add indices to the rxe object pools. Linux xarrays provide a better way to implement the same functionality for indices. This patch replaces red-black trees by xarrays for pool objects. Since xarrays already have a spinlock use that in place of the pool rwlock. Make sure that all changes in the xarray(index) and kref(ref counnt) occur atomically. Link: https://lore.kernel.org/r/20220304000808.225811-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-03-16RDMA/rxe: Shorten pool names in rxe_pool.cBob Pearson1-8/+8
Replace pool names like "rxe-xx" with "xx". Just reduces clutter. Link: https://lore.kernel.org/r/20220304000808.225811-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-03-16RDMA/rxe: Move max_elem into rxe_type_infoBob Pearson3-20/+20
Move the maximum number of elements from a parameter in rxe_pool_init to a member of the rxe_type_info array. Link: https://lore.kernel.org/r/20220304000808.225811-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>