Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 2c02249fcbfc066bd33e2a7375c7006d4cb367f6 ]
An incoming Read request causes multiple Read responses. If a user MR to
copy data from is unavailable or responder cannot send a reply, then the
error messages can be printed for each response attempt, resulting in
message overflow.
Link: https://lore.kernel.org/r/20220829071218.1639065-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fda5d0cf8aef12f0a4f714a96a4b2fce039a3e55 ]
Currently in resize_finish() in rxe_queue.c there is a loop which copies
the entries in the original queue into a newly allocated queue. The
termination logic for this loop is incorrect. The call to
queue_next_index() updates cons but has no effect on whether the queue is
empty. So if the queue starts out empty nothing is copied but if it is not
then the loop will run forever. This patch changes the loop to compare the
value of cons to the original producer index.
Fixes: ae6e843fe08d0 ("RDMA/rxe: Add memory barriers to kernel queues")
Link: https://lore.kernel.org/r/20220825221446.6512-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a3c278807a459e6f50afee6971cabe74cccfb490 ]
Delay QP destroy completion until all siw references to QP are
dropped. The calling RDMA core will free QP structure after
successful return from siw_qp_destroy() call, so siw must not
hold any remaining reference to the QP upon return.
A use-after-free was encountered in xfstest generic/460, while
testing NFSoRDMA. Here, after a TCP connection drop by peer,
the triggered siw_cm_work_handler got delayed until after
QP destroy call, referencing a QP which has already freed.
Fixes: 303ae1cdfdf7 ("rdma/siw: application interface")
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20220920082503.224189-1-bmt@zurich.ibm.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 754209850df8367c954ac1de7671c7430b1f342c ]
For header and trailer/padding processing, siw did not consume new
skb data until minimum amount present to fill current header or trailer
structure, including potential payload padding. Not consuming any
data during upcall may cause a receive stall, since tcp_read_sock()
is not upcalling again if no new data arrive.
A NFSoRDMA client got stuck at RDMA Write reception of unaligned
payload, if the current skb did contain only the expected 3 padding
bytes, but not the 4 bytes CRC trailer. Expecting 4 more bytes already
arrived in another skb, and not consuming those 3 bytes in the current
upcall left the Write incomplete, waiting for the CRC forever.
Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Tested-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20220920081202.223629-1-bmt@zurich.ibm.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 548ce2e66725dcba4e27d1e8ac468d5dd17fd509 ]
When sock_create_kern in the function rxe_qp_init_req fails,
qp->sk is set to NULL.
Then the function rxe_create_qp will call rxe_qp_do_cleanup
to handle allocated resource.
Before handling qp->sk, this variable should be checked.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220822011615.805603-3-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a625ca30eff806395175ebad3ac1399014bdb280 ]
When rxe_queue_init in the function rxe_qp_init_req fails,
both qp->req.task.func and qp->req.task.arg are not initialized.
Because of creation of qp fails, the function rxe_create_qp will
call rxe_qp_do_cleanup to handle allocated resource.
Before calling __rxe_do_task, both qp->req.task.func and
qp->req.task.arg should be checked.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220822011615.805603-2-yanjun.zhu@linux.dev
Reported-by: syzbot+ab99dc4c6e961eed8b8e@syzkaller.appspotmail.com
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0d1b756acf60da5004c1e20ca4462f0c257bf6e1 ]
Functions that work on a pointer to virtual memory such as
virt_to_pfn() and users of that function such as
virt_to_page() are supposed to pass a pointer to virtual
memory, ideally a (void *) or other pointer. However since
many architectures implement virt_to_pfn() as a macro,
this function becomes polymorphic and accepts both a
(unsigned long) and a (void *).
If we instead implement a proper virt_to_pfn(void *addr)
function the following happens (occurred on arch/arm):
drivers/infiniband/sw/siw/siw_qp_tx.c:32:23: warning: incompatible
integer to pointer conversion passing 'dma_addr_t' (aka 'unsigned int')
to parameter of type 'const void *' [-Wint-conversion]
drivers/infiniband/sw/siw/siw_qp_tx.c:32:37: warning: passing argument
1 of 'virt_to_pfn' makes pointer from integer without a cast
[-Wint-conversion]
drivers/infiniband/sw/siw/siw_qp_tx.c:538:36: warning: incompatible
integer to pointer conversion passing 'unsigned long long'
to parameter of type 'const void *' [-Wint-conversion]
Fix this with an explicit cast. In one case where the SIW
SGE uses an unaligned u64 we need a double cast modifying the
virtual address (va) to a platform-specific uintptr_t before
casting to a (void *).
Fixes: b9be6f18cf9e ("rdma/siw: transmit path")
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220902215918.603761-1-linus.walleij@linaro.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1e75550648da1fa1cd1969e7597355de8fe8caf6 ]
Below 2 commits will be reverted:
commit 8ff5f5d9d8cf ("RDMA/rxe: Prevent double freeing rxe_map_set()")
commit 647bf13ce944 ("RDMA/rxe: Create duplicate mapping tables for FMRs")
The community has a few bug reports which pointed this commit at last.
Some proposals are raised up in the meantime but all of them have no
follow-up operation.
The previous commit led the map_set of FMR to be not available any more if
the MR is registered again after invalidating. Although the mentioned
patch try to fix a potential race in building/accessing the same table
for fast memory regions, it broke rtrs etc ULPs. Since the latter could
be worse, revert this patch.
With previous commit, it's observed that a same MR in rnbd server will
trigger below code path:
-> rxe_mr_init_fast()
|-> alloc map_set() # map_set is uninitialized
|...-> rxe_map_mr_sg() # build the map_set
|-> rxe_mr_set_page()
|...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means
# we can access host memory(such rxe_mr_copy)
|...-> rxe_invalidate_mr() # mr->state change to FREE from VALID
|...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE,
# but map_set was not built again
|...-> rxe_mr_copy() # kernel crash due to access wild addresses
# that lookup from the map_set
The backtraces are not always identical.
[1st]----------
RIP: 0010:lookup_iova+0x66/0xa0 [rdma_rxe]
Code: 00 00 00 48 d3 ee 89 32 c3 4c 8b 18 49 8b 3b 48 8b 47 08 48 39 c6 72 38 48 29 c6 45 31 d2 b8 01 00 00 00 48 63 c8 48 c1 e1 04 <48> 8b 4c 0f 08 48 39 f1 77 21 83 c0 01 48 29 ce 3d 00 01 00 00 75
RSP: 0018:ffffb7ff80063bf0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff9b9949d86800 RCX: 0000000000000000
RDX: ffffb7ff80063c00 RSI: 0000000049f6b378 RDI: 002818da00000004
RBP: 0000000000000120 R08: ffffb7ff80063c08 R09: ffffb7ff80063c04
R10: 0000000000000002 R11: ffff9b9916f7eef8 R12: ffff9b99488a0038
R13: ffff9b99488a0038 R14: ffff9b9914fb346a R15: ffff9b990ab27000
FS: 0000000000000000(0000) GS:ffff9b997dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efc33a98ed0 CR3: 0000000014f32004 CR4: 00000000001706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
rxe_mr_copy.part.0+0x6f/0x140 [rdma_rxe]
rxe_responder+0x12ee/0x1b60 [rdma_rxe]
? rxe_icrc_check+0x7e/0x100 [rdma_rxe]
? rxe_rcv+0x1d0/0x780 [rdma_rxe]
? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe]
rxe_do_task+0x67/0xb0 [rdma_rxe]
rxe_xmit_packet+0xc7/0x210 [rdma_rxe]
rxe_requester+0x680/0xee0 [rdma_rxe]
? update_load_avg+0x5f/0x690
? update_load_avg+0x5f/0x690
? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client]
[2nd]----------
RIP: 0010:rxe_mr_copy.part.0+0xa8/0x140 [rdma_rxe]
Code: 00 00 49 c1 e7 04 48 8b 00 4c 8d 2c d0 48 8b 44 24 10 4d 03 7d 00 85 ed 7f 10 eb 6c 89 54 24 0c 49 83 c7 10 31 c0 85 ed 7e 5e <49> 8b 3f 8b 14 24 4c 89 f6 48 01 c7 85 d2 74 06 48 89 fe 4c 89 f7
RSP: 0018:ffffae3580063bf8 EFLAGS: 00010202
RAX: 0000000000018978 RBX: ffff9d7ef7a03600 RCX: 0000000000000008
RDX: 000000000000007c RSI: 000000000000007c RDI: ffff9d7ef7a03600
RBP: 0000000000000120 R08: ffffae3580063c08 R09: ffffae3580063c04
R10: ffff9d7efece0038 R11: ffff9d7ec4b1db00 R12: ffff9d7efece0038
R13: ffff9d7ef4098260 R14: ffff9d7f11e23c6a R15: 4c79500065708144
FS: 0000000000000000(0000) GS:ffff9d7f3dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fce47276c60 CR3: 0000000003f66004 CR4: 00000000001706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
rxe_responder+0x12ee/0x1b60 [rdma_rxe]
? rxe_icrc_check+0x7e/0x100 [rdma_rxe]
? rxe_rcv+0x1d0/0x780 [rdma_rxe]
? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe]
rxe_do_task+0x67/0xb0 [rdma_rxe]
rxe_xmit_packet+0xc7/0x210 [rdma_rxe]
rxe_requester+0x680/0xee0 [rdma_rxe]
? update_load_avg+0x5f/0x690
? update_load_avg+0x5f/0x690
? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client]
rxe_do_task+0x67/0xb0 [rdma_rxe]
tasklet_action_common.constprop.0+0x92/0xc0
__do_softirq+0xe1/0x2d8
run_ksoftirqd+0x21/0x30
smpboot_thread_fn+0x183/0x220
? sort_range+0x20/0x20
kthread+0xe2/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
Link: https://lore.kernel.org/r/1658805386-2-1-git-send-email-lizhijian@fujitsu.com
Link: https://lore.kernel.org/all/20220210073655.42281-1-guoqing.jiang@linux.dev/T/
Link: https://www.spinics.net/lists/linux-rdma/msg110836.html
Link: https://lore.kernel.org/lkml/94a5ea93-b8bb-3a01-9497-e2021f29598a@linux.dev/t/
Tested-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit eff6d998ca297cb0b2e53b032a56cf8e04dd8b17 ]
Limit the maximum number of calls to each tasklet from rxe_do_task()
before yielding the cpu. When the limit is reached reschedule the tasklet
and exit the calling loop. This patch prevents one tasklet from consuming
100% of a cpu core and causing a deadlock or soft lockup.
Link: https://lore.kernel.org/r/20220630190425.2251-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fd5382c5805c4bcb50fd25b7246247d3f7114733 ]
In the function rxe_create_qp(), rxe_qp_from_init() is called to
initialize qp, internally things like the spin locks are not setup until
rxe_qp_init_req().
If an error occures before this point then the unwind will call
rxe_cleanup() and eventually to rxe_qp_do_cleanup()/rxe_cleanup_task()
which will oops when trying to access the uninitialized spinlock.
Move the spinlock initializations earlier before any failures.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220731063621.298405-1-yanjun.zhu@linux.dev
Reported-by: syzbot+833061116fa28df97f3b@syzkaller.appspotmail.com
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 445fd4f4fb76d513de6b05b08b3a4d0bb980fc80 ]
Currently the completer tasklet when retransmit timer or the rnr timer
fires the same flag (qp->req.need_retry) is set so that if either timer
fires it will attempt to perform a retry flow on the send queue. This has
the effect of responding to an RNR NAK at the first retransmit timer event
which might not allow the requested rnr timeout.
This patch adds a new flag (qp->req.wait_for_rnr_timer) which, if set,
prevents a retry flow until the rnr nak timer fires.
This patch fixes rnr retry errors which can be observed by running the
pyverbs test_rdmacm_async_traffic_external_qp multiple times. With this
patch applied they do not occur.
Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
Link: https://lore.kernel.org/linux-rdma/2bafda9e-2bb6-186d-12a1-179e8f6a2678@talpey.com/
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220630190425.2251-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 174e7b137042f19b5ce88beb4fc0ff4ec6b0c72a ]
The 'rkey' input can be an lkey or rkey, and in rxe the lkey or rkey have
the same value, including the variant bits.
So, if mr->rkey is set, compare the invalidate key with it, otherwise
compare with the mr->lkey.
Since we already did a lookup on the non-varient bits to get this far, the
check's only purpose is to confirm that the wqe has the correct variant
bits.
Fixes: 001345339f4c ("RDMA/rxe: Separate HW and SW l/rkeys")
Link: https://lore.kernel.org/r/20220707073006.328737-1-haris.phnx@gmail.com
Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1603f89935ec86d40a7667e1250392626976ccc2 ]
The current implementation of rxe_check_bind_mw() in rxe_mw.c is incorrect
since it requires the new key portion provided by the mw consumer to be
different than the previous key portion. This is not required by the
IBA. Remove the test.
Link: https://lore.kernel.org/linux-rdma/fb4614e7-4cac-0dc7-3ef7-766dfd10e8f2@gmail.com/
Fixes: 32a577b4c3a9 ("Add support for bind MW work requests")
Link: https://lore.kernel.org/r/20220714204619.13396-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 37da51efe6eaa0560f46803c8c436a48a2084da7 ]
The function rxe_create_qp calls rxe_qp_from_init. If some error
occurs, the error handler of function rxe_qp_from_init will set
both scq and rcq to NULL.
Then rxe_create_qp calls rxe_put to handle qp. In the end,
rxe_qp_do_cleanup is called by rxe_put. rxe_qp_do_cleanup directly
accesses scq and rcq before checking them. This will cause
null-ptr-deref error.
The call graph is as below:
rxe_create_qp {
...
rxe_qp_from_init {
...
err1:
...
qp->rcq = NULL; <---rcq is set to NULL
qp->scq = NULL; <---scq is set to NULL
...
}
qp_init:
rxe_put{
...
rxe_qp_do_cleanup {
...
atomic_dec(&qp->scq->num_wq); <--- scq is accessed
...
atomic_dec(&qp->rcq->num_wq); <--- rcq is accessed
}
}
Fixes: 4703b4f0d94a ("RDMA/rxe: Enforce IBA C11-17")
Link: https://lore.kernel.org/r/20220705225414.315478-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3056fc6c32e613b760422b94c7617ac9a24a4721 ]
If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't
been received completely, and should not report IW_CM_EVENT_CONNECT_REPLY
in this case. This may trigger a call trace in iw_cm. A simple way to
trigger this:
server: ib_send_lat
client: ib_send_lat -R <server_ip>
The call trace looks like this:
kernel BUG at drivers/infiniband/core/iwcm.c:894!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<...>
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
<TASK>
cm_work_handler+0x1dd/0x370 [iw_cm]
process_one_work+0x1e2/0x3b0
worker_thread+0x49/0x2e0
? rescuer_thread+0x370/0x370
kthread+0xe5/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
Link: https://lore.kernel.org/r/dae34b5fd5c2ea2bd9744812c1d2653a34a94c67.1657706960.git.chengyou@linux.alibaba.com
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7cb33d1bc1ac8e51fd88928f96674d392f8e07c4 ]
When a local operation (invalidate mr, reg mr, bind mw) is finished there
will be no ack packet coming from a responder to cause the wqe to be
completed. This may happen anyway if a subsequent wqe performs
IO. Currently if the wqe is signalled the completer tasklet is scheduled
immediately but not otherwise.
This leads to a deadlock if the next wqe has the fence bit set in send
flags and the operation is not signalled. This patch removes the condition
that the wqe must be signalled in order to schedule the completer tasklet
which is the simplest fix for this deadlock and is fairly low cost. This
is the analog for local operations of always setting the ackreq bit in all
last or only request packets even if the operation is not signalled.
Link: https://lore.kernel.org/r/20220523223251.15350-1-rpearsonhpe@gmail.com
Reported-by: Jenny Hack <jhack@hpe.com>
Fixes: c1a411268a4b ("RDMA/rxe: Move local ops to subroutine")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0ed5493e430a1d887eb62b45c75dd5d6bb2dcf48 ]
Add a responder state for atomic reply similar to read reply and rename
process_atomic() rxe_atomic_reply(). In preparation for merging the normal
and retry atomic responder flows.
Link: https://lore.kernel.org/r/20220606143836.3323-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1a685940e6200e9def6e34bbaa19dd31dc5aeaf8 ]
Currently rxe_alloc checks ret to indicate error, but 1 is also a valid
return and just indicates that the allocation succeeded with a wrap.
Fix this by modifying the check to be < 0.
Link: https://lore.kernel.org/r/20220609070656.1446121-1-dzm91@hust.edu.cn
Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by xarrays")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
The commit in the Fixes tag has shuffled some code.
Now 'mcg_num' is incremented before the kzalloc(). So if the memory
allocation fails, this increment must be undone.
Fixes: a926a903b7dc ("RDMA/rxe: Do not call dev_mc_add/del() under a spinlock")
Link: https://lore.kernel.org/r/fe137cd8b1f17593243aa73d59c18ea71ab9ee36.1653225896.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Following patches have dependencies.
Resolve the merge conflict in
drivers/net/ethernet/mellanox/mlx5/core/main.c by keeping the new names
for the fs functions following linux-next:
https://lore.kernel.org/r/20220519113529.226bc3e2@canb.auug.org.au/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Enable siw to attach to tunnel devices, there is no reason not to, siw
properly generates all packets already.
Link: https://lore.kernel.org/r/20220510143917.23735-1-bmt@zurich.ibm.com
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Add a counter to keep track of the number of WQs connected to a CQ and
return an error if destroy_cq() is called while the counter is non zero.
Link: https://lore.kernel.org/r/20220421014042.26985-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Move code from rxe_dealloc_mw() to rxe_mw_cleanup() to allow flows which
hold a reference to mw to complete.
Link: https://lore.kernel.org/r/20220421014042.26985-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Move the code which tears down an mr to rxe_mr_cleanup to allow operations
holding a reference to the mr to complete.
Link: https://lore.kernel.org/r/20220421014042.26985-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Move the code from rxe_qp_destroy() to rxe_qp_do_cleanup(). This allows
flows holding references to qp to complete before the qp object is torn
down.
Link: https://lore.kernel.org/r/20220421014042.26985-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
In the tasklets (completer, responder, and requester) check the return
value from rxe_get() to detect failures to get a reference. This only
occurs if the qp has had its reference count drop to zero which indicates
that it no longer should be used.
The ref is never 0 today because the tasklets are flushed before the ref
is dropped. The next patch changes this so that the ref is dropped then
the tasklets are flushed.
Link: https://lore.kernel.org/r/20220421014042.26985-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Move cleanup code from rxe_destroy_srq() to rxe_srq_cleanup() which is
called after all references are dropped to allow code depending on the srq
object to complete.
Link: https://lore.kernel.org/r/20220421014042.26985-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Currently the #define IB_SRQ_INIT_MASK is used to distinguish the
rxe_create_srq verb from the rxe_modify_srq verb so that some code can be
shared between these two subroutines.
This commit splits rxe_srq_chk_attr into two subroutines: rxe_srq_chk_init
and rxe_srq_chk_attr which handle the create_srq and modify_srq verbs
separately.
Link: https://lore.kernel.org/r/20220421014042.26985-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
For write request the remote addr will be sent only with first packet so
we don't have to adjust wqe->iova in retry operation.
Link: https://lore.kernel.org/r/20220502053907.6388-1-cgxu519@mykernel.net
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Based on the commit c9f4c695835c ("RDMA/rxe: Reverse the sense of
RXE_POOL_NO_ALLOC"), only the mr pool uses the RXE_POOL_ALLOC, As such,
replace this flags with pool type to save memory.
Link: https://lore.kernel.org/r/20220428041028.1363139-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
rxe_mcast.c currently uses _irqsave spinlocks for rxe->mcg_lock while
rxe_recv.c uses _bh spinlocks for the same lock.
As there is no case where the mcg_lock can be taken from an IRQ, change
these all to bh locks so we don't have confusing mismatched lock types on
the same spinlock.
Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c")
Link: https://lore.kernel.org/r/20220504202817.98247-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
These routines were not intended to be called under a spinlock and will
throw debugging warnings:
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 13 PID: 3107 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x2f/0x50
CPU: 13 PID: 3107 Comm: python3 Tainted: G E 5.18.0-rc1+ #7
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:warn_bogus_irq_restore+0x2f/0x50
Call Trace:
<TASK>
_raw_spin_unlock_irqrestore+0x75/0x80
rxe_attach_mcast+0x304/0x480 [rdma_rxe]
ib_attach_mcast+0x88/0xa0 [ib_core]
ib_uverbs_attach_mcast+0x186/0x1e0 [ib_uverbs]
ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xcd/0x140 [ib_uverbs]
ib_uverbs_cmd_verbs+0xdb0/0xea0 [ib_uverbs]
ib_uverbs_ioctl+0xd2/0x160 [ib_uverbs]
do_syscall_64+0x5c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Move them out of the spinlock, it is OK if there is some races setting up
the MC reception at the ethernet layer with rbtree lookups.
Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c")
Link: https://lore.kernel.org/r/20220504202817.98247-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The calling of siw_cm_upcall and detaching new_cep with its listen_cep
should be atomistic semantics. Otherwise siw_reject may be called in a
temporary state, e,g, siw_cm_upcall is called but the new_cep->listen_cep
has not being cleared.
This fixes a WARN:
WARNING: CPU: 7 PID: 201 at drivers/infiniband/sw/siw/siw_cm.c:255 siw_cep_put+0x125/0x130 [siw]
CPU: 2 PID: 201 Comm: kworker/u16:22 Kdump: loaded Tainted: G E 5.17.0-rc7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
RIP: 0010:siw_cep_put+0x125/0x130 [siw]
Call Trace:
<TASK>
siw_reject+0xac/0x180 [siw]
iw_cm_reject+0x68/0xc0 [iw_cm]
cm_work_handler+0x59d/0xe20 [iw_cm]
process_one_work+0x1e2/0x3b0
worker_thread+0x50/0x3a0
? rescuer_thread+0x390/0x390
kthread+0xe5/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Fixes: 6c52fdc244b5 ("rdma/siw: connection management")
Link: https://lore.kernel.org/r/d528d83466c44687f3872eadcb8c184528b2e2d4.1650526554.git.chengyou@linux.alibaba.com
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
In finish_packet() in rxe_req.c a variable was incorrectly called paylen
instead of payload. Elsewhere in the rxe source payload is always used for
the RoCE payload length and paylen is always used for the UDP payload
length. This will cause unnecessary confusion.
Replace paylen by payload in finish_packet().
Link: https://lore.kernel.org/r/20220420172316.5465-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
wqe was not used by update_state() so far.
Commit aaaf62e06623 ("RDMA/rxe: Remove useless argument for
update_state()") just did a partial fixes.
Link: https://lore.kernel.org/r/20220412022903.574238-1-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The rping benchmark fails on long runs. The root cause of this failure has
been traced to a failure to compute a nonzero value of mr in rare
situations.
Fix this failure by correctly handling the computation of mr in
read_reply() in rxe_resp.c in the replay flow.
Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
Link: https://lore.kernel.org/r/20220418174103.3040-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The referenced commit generates a reference counting error if the rkey has
the same index but the wrong key. In this case the reference taken by
rxe_pool_get_index() is not dropped.
Drop the reference if the keys don't match in rxe_recheck_mr(). Check
that the mw and mr are still valid.
Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources")
Link: https://lore.kernel.org/r/20220411030647.20011-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Current rxe_requester() doesn't generate a completion when processing an
unsupported/invalid opcode. If rxe driver doesn't support a new opcode
(e.g. RDMA Atomic Write) and RDMA library supports it, an application
using the new opcode can reproduce this issue. Fix the issue by calling
"goto err;".
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220410113513.27537-1-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The rdma_rxe driver does not actually support the reliable datagram
transport but contains two references to RD opcodes in driver code. This
commit removes these references to RD transport opcodes which are never
used.
Link: https://lore.kernel.org/r/cce0f07d-25fc-5880-69e7-001d951750b7@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Currently the rdma_rxe driver supports SMI type QPs in a few places which
is incorrect. RoCE devices never should support SMI QPs. This commit
removes SMI QP support from the driver.
Link: https://lore.kernel.org/r/20220407185416.16372-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Remove struct rxe_dev mc_grp_pool field. This field is no longer used.
Link: https://lore.kernel.org/r/20220407184849.14359-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Currently the rdma_rxe driver claims to support both 2A and 2B type memory
windows. But the IBA requires
010-37.2.31: If an HCA supports the Base Memory Management
extensions, the HCA shall support either Type 2A or Type 2B MWs,
but not both.
This commit removes the device capability bit for type 2A memory windows
and adds a clarifying comment to rxe_mw.c.
Link: https://lore.kernel.org/r/20220407184321.14207-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Split out flags from ib_device::device_cap_flags that are only used
internally to the kernel into kernel_cap_flags that is not part of the
uapi. This limits the device_cap_flags to being the same bitmap that will
be copied to userspace.
This cleanly splits out the uverbs flags from the kernel flags to avoid
confusion in the flags bitmap.
Add some short comments describing which each of the kernel flags is
connected to. Remove unused kernel flags.
Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The documentation of the function rvt_error_qp says both r_lock and
s_lock need to be held when calling that function.
It also asserts using lockdep that both of those locks are held.
rvt_error_qp is called form rvt_send_cq, which is called from
rvt_qp_complete_swqe, which is called from rvt_send_complete, which is
called from rvt_ruc_loopback in two places. Both of these places do not
hold r_lock. Fix this by acquiring a spin_lock of r_lock in both of
these places.
The r_lock acquiring cannot be added in rvt_qp_complete_swqe because
some of its other callers already have r_lock acquired.
Link: https://lore.kernel.org/r/20220228195144.71946-1-dossche.niels@gmail.com
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The documentation of the function rvt_error_qp says both r_lock and s_lock
need to be held when calling that function. It also asserts using lockdep
that both of those locks are held. However, the commit I referenced in
Fixes accidentally makes the call to rvt_error_qp in rvt_ruc_loopback no
longer covered by r_lock. This results in the lockdep assertion failing
and also possibly in a race condition.
Fixes: d757c60eca9b ("IB/rdmavt: Fix concurrency panics in QP post_send and modify to error")
Link: https://lore.kernel.org/r/20220228165330.41546-1-dossche.niels@gmail.com
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Pull rdma updates from Jason Gunthorpe:
- Minor bug fixes in mlx5, mthca, pvrdma, rtrs, mlx4, hfi1, hns
- Minor cleanups: coding style, useless includes and documentation
- Reorganize how multicast processing works in rxe
- Replace a red/black tree with xarray in rxe which improves performance
- DSCP support and HW address handle re-use in irdma
- Simplify the mailbox command handling in hns
- Simplify iser now that FMR is eliminated
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (93 commits)
RDMA/nldev: Prevent underflow in nldev_stat_set_counter_dynamic_doit()
IB/iser: Fix error flow in case of registration failure
IB/iser: Generalize map/unmap dma tasks
IB/iser: Use iser_fr_desc as registration context
IB/iser: Remove iser_reg_data_sg helper function
RDMA/rxe: Use standard names for ref counting
RDMA/rxe: Replace red-black trees by xarrays
RDMA/rxe: Shorten pool names in rxe_pool.c
RDMA/rxe: Move max_elem into rxe_type_info
RDMA/rxe: Replace obj by elem in declaration
RDMA/rxe: Delete _locked() APIs for pool objects
RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
RDMA/rxe: Replace mr by rkey in responder resources
RDMA/rxe: Fix ref error in rxe_av.c
RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT
RDMA/irdma: Add support for address handle re-use
RDMA/qib: Fix typos in comments
RDMA/mlx5: Fix memory leak in error flow for subscribe event routine
Revert "RDMA/core: Fix ib_qp_usecnt_dec() called when error"
RDMA/rxe: Remove useless argument for update_state()
...
|
|
Rename rxe_add_ref() to rxe_get() and rxe_drop_ref() to rxe_put().
Significantly improves readability for new readers.
Link: https://lore.kernel.org/r/20220304000808.225811-10-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Currently the rxe driver uses red-black trees to add indices to the rxe
object pools. Linux xarrays provide a better way to implement the same
functionality for indices. This patch replaces red-black trees by xarrays
for pool objects. Since xarrays already have a spinlock use that in place
of the pool rwlock. Make sure that all changes in the xarray(index) and
kref(ref counnt) occur atomically.
Link: https://lore.kernel.org/r/20220304000808.225811-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Replace pool names like "rxe-xx" with "xx". Just reduces clutter.
Link: https://lore.kernel.org/r/20220304000808.225811-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Move the maximum number of elements from a parameter in rxe_pool_init to a
member of the rxe_type_info array.
Link: https://lore.kernel.org/r/20220304000808.225811-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|