Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 7694a7de22c53a312ea98960fcafc6ec62046531 upstream.
Because of the possible failure of the allocation, data might be NULL
pointer and will cause the dereference of the NULL pointer later.
Therefore, it might be better to check it and return -ENOMEM.
Fixes: 6884c6c4bd09 ("RDMA/verbs: Store the write/write_ex uapi entry points in the uverbs_api")
Link: https://lore.kernel.org/r/20211231093315.1917667-1-jiasheng@iscas.ac.cn
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b35a0f4dd544eaa6162b6d2f13a2557a121ae5fd upstream.
If dst->is_global field is not set, the GRH fields are not cleared
and the following infoleak is reported.
=====================================================
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x1c9/0x270 lib/usercopy.c:33
instrument_copy_to_user include/linux/instrumented.h:121 [inline]
_copy_to_user+0x1c9/0x270 lib/usercopy.c:33
copy_to_user include/linux/uaccess.h:209 [inline]
ucma_init_qp_attr+0x8c7/0xb10 drivers/infiniband/core/ucma.c:1242
ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732
vfs_write+0x8ce/0x2030 fs/read_write.c:588
ksys_write+0x28b/0x510 fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__ia32_sys_write+0xdb/0x120 fs/read_write.c:652
do_syscall_32_irqs_on arch/x86/entry/common.c:114 [inline]
__do_fast_syscall_32+0x96/0xf0 arch/x86/entry/common.c:180
do_fast_syscall_32+0x34/0x70 arch/x86/entry/common.c:205
do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:248
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
Local variable resp created at:
ucma_init_qp_attr+0xa4/0xb10 drivers/infiniband/core/ucma.c:1214
ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732
Bytes 40-59 of 144 are uninitialized
Memory access of size 144 starts at ffff888167523b00
Data copied to user address 0000000020000100
CPU: 1 PID: 25910 Comm: syz-executor.1 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
=====================================================
Fixes: 4ba66093bdc6 ("IB/core: Check for global flag when using ah_attr")
Link: https://lore.kernel.org/r/0e9dd51f93410b7b2f4f5562f52befc878b71afa.1641298868.git.leonro@nvidia.com
Reported-by: syzbot+6d532fa8f9463da290bc@syzkaller.appspotmail.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 64733956ebba7cc629856f4a6ee35a52bc9c023f upstream.
When copying the device name, the length of the data memcpy copied exceeds
the length of the source buffer, which cause the KASAN issue below. Use
strscpy_pad() instead.
BUG: KASAN: slab-out-of-bounds in ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core]
Read of size 64 at addr ffff88811a10f5e0 by task rping/140263
CPU: 3 PID: 140263 Comm: rping Not tainted 5.15.0-rc1+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack_lvl+0x57/0x7d
print_address_description.constprop.0+0x1d/0xa0
kasan_report+0xcb/0x110
kasan_check_range+0x13d/0x180
memcpy+0x20/0x60
ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core]
ib_nl_make_request+0x1c6/0x380 [ib_core]
send_mad+0x20a/0x220 [ib_core]
ib_sa_path_rec_get+0x3e3/0x800 [ib_core]
cma_query_ib_route+0x29b/0x390 [rdma_cm]
rdma_resolve_route+0x308/0x3e0 [rdma_cm]
ucma_resolve_route+0xe1/0x150 [rdma_ucm]
ucma_write+0x17b/0x1f0 [rdma_ucm]
vfs_write+0x142/0x4d0
ksys_write+0x133/0x160
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f26499aa90f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48
RSP: 002b:00007f26495f2dc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000000007d0 RCX: 00007f26499aa90f
RDX: 0000000000000010 RSI: 00007f26495f2e00 RDI: 0000000000000003
RBP: 00005632a8315440 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000293 R12: 00007f26495f2e00
R13: 00005632a83154e0 R14: 00005632a8315440 R15: 00005632a830a810
Allocated by task 131419:
kasan_save_stack+0x1b/0x40
__kasan_kmalloc+0x7c/0x90
proc_self_get_link+0x8b/0x100
pick_link+0x4f1/0x5c0
step_into+0x2eb/0x3d0
walk_component+0xc8/0x2c0
link_path_walk+0x3b8/0x580
path_openat+0x101/0x230
do_filp_open+0x12e/0x240
do_sys_openat2+0x115/0x280
__x64_sys_openat+0xce/0x140
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink")
Link: https://lore.kernel.org/r/72ede0f6dab61f7f23df9ac7a70666e07ef314b0.1635055496.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ca465e1f1f9b38fe916a36f7d80c5d25f2337c81 ]
If cma_listen_on_all() fails it leaves the per-device ID still on the
listen_list but the state is not set to RDMA_CM_ADDR_BOUND.
When the cmid is eventually destroyed cma_cancel_listens() is not called
due to the wrong state, however the per-device IDs are still holding the
refcount preventing the ID from being destroyed, thus deadlocking:
task:rping state:D stack: 0 pid:19605 ppid: 47036 flags:0x00000084
Call Trace:
__schedule+0x29a/0x780
? free_unref_page_commit+0x9b/0x110
schedule+0x3c/0xa0
schedule_timeout+0x215/0x2b0
? __flush_work+0x19e/0x1e0
wait_for_completion+0x8d/0xf0
_destroy_id+0x144/0x210 [rdma_cm]
ucma_close_id+0x2b/0x40 [rdma_ucm]
__destroy_id+0x93/0x2c0 [rdma_ucm]
? __xa_erase+0x4a/0xa0
ucma_destroy_id+0x9a/0x120 [rdma_ucm]
ucma_write+0xb8/0x130 [rdma_ucm]
vfs_write+0xb4/0x250
ksys_write+0xb5/0xd0
? syscall_trace_enter.isra.19+0x123/0x190
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Ensure that cma_listen_on_all() atomically unwinds its action under the
lock during error.
Fixes: c80a0c52d85c ("RDMA/cma: Add missing error handling of listen_id")
Link: https://lore.kernel.org/r/20210913093344.17230-1-thomas.liu@ucloud.cn
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2cc74e1ee31d00393b6698ec80b322fd26523da4 ]
ROCE uses IGMP for Multicast instead of the native Infiniband system where
joins are required in order to post messages on the Multicast group. On
Ethernet one can send Multicast messages to arbitrary addresses without
the need to subscribe to a group.
So ROCE correctly does not send IGMP joins during rdma_join_multicast().
F.e. in cma_iboe_join_multicast() we see:
if (addr->sa_family == AF_INET) {
if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) {
ib.rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
if (!send_only) {
err = cma_igmp_send(ndev, &ib.rec.mgid,
true);
}
}
} else {
So the IGMP join is suppressed as it is unnecessary.
However no such check is done in destroy_mc(). And therefore leaving a
sendonly multicast group will send an IGMP leave.
This means that the following scenario can lead to a multicast receiver
unexpectedly being unsubscribed from a MC group:
1. Sender thread does a sendonly join on MC group X. No IGMP join
is sent.
2. Receiver thread does a regular join on the same MC Group x.
IGMP join is sent and the receiver begins to get messages.
3. Sender thread terminates and destroys MC group X.
IGMP leave is sent and the receiver no longer receives data.
This patch adds the same logic for sendonly joins to destroy_mc() that is
also used in cma_iboe_join_multicast().
Fixes: ab15c95a17b3 ("IB/core: Support for CMA multicast join flags")
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2109081340540.668072@gentwo.de
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit bc0bdc5afaa740d782fbf936aaeebd65e5c2921d upstream.
If the state is not idle then rdma_bind_addr() will immediately fail and
no change to global state should happen.
For instance if the state is already RDMA_CM_LISTEN then this will corrupt
the src_addr and would cause the test in cma_cancel_operation():
if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev)
To view a mangled src_addr, eg with a IPv6 loopback address but an IPv4
family, failing the test.
This would manifest as this trace from syzkaller:
BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26
Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204
CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
__list_add_valid+0x93/0xa0 lib/list_debug.c:26
__list_add include/linux/list.h:67 [inline]
list_add_tail include/linux/list.h:100 [inline]
cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline]
rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751
ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102
ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732
vfs_write+0x28e/0xa30 fs/read_write.c:603
ksys_write+0x1ee/0x250 fs/read_write.c:658
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
Which is indicating that an rdma_id_private was destroyed without doing
cma_cancel_listens().
Instead of trying to re-use the src_addr memory to indirectly create an
any address build one explicitly on the stack and bind to that as any
other normal flow would do.
Link: https://lore.kernel.org/r/0-v1-9fbb33f5e201+2a-cma_listen_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 732d41c545bb ("RDMA/cma: Make the locking for automatic state transition more clear")
Reported-by: syzbot+6bb0528b13611047209c@syzkaller.appspotmail.com
Tested-by: Hao Sun <sunhao.th@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e677b72a0647249370f2635862bf0241c86f66ad ]
The failure during iw_cm module initialization partially left the system
with unreleased memory and other resources. Rewrite the module init/exit
routines in such way that netlink commands will be opened only after
successful initialization.
Fixes: b493d91d333e ("iwcm: common code for port mapper")
Link: https://lore.kernel.org/r/b01239f99cb1a3e6d2b0694c242d89e6410bcd93.1627048781.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 74f160ead74bfe5f2b38afb4fcf86189f9ff40c9 ]
Fix a memory leak when "mda_resolve_route() is called more than once on
the same "rdma_cm_id".
This is possible if cma_query_handler() triggers the
RDMA_CM_EVENT_ROUTE_ERROR flow which puts the state machine back and
allows rdma_resolve_route() to be called again.
Link: https://lore.kernel.org/r/f6662b7b-bdb7-2706-1e12-47c61d3474b6@oracle.com
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3d8287544223a3d2f37981c1f9ffd94d0b5e9ffc ]
Change location of rdma_restrack_del() to fix the bug where
task_struct was acquired but not released, causing to resource leak.
ucma_create_id() {
ucma_alloc_ctx();
rdma_create_user_id() {
rdma_restrack_new();
rdma_restrack_set_name() {
rdma_restrack_attach_task.part.0(); <--- task_struct was gotten
}
}
ucma_destroy_private_ctx() {
ucma_put_ctx();
rdma_destroy_id() {
_destroy_id() <--- id_priv was freed
}
}
}
Fixes: 889d916b6f8a ("RDMA/core: Don't access cm_id after its destruction")
Link: https://lore.kernel.org/r/073ec27acb943ca8b6961663c47c5abe78a5c8cc.1624948948.git.leonro@nvidia.com
Reported-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e84045eab69c625bc0b0bf24d8e05bc65da1eed1 ]
An approximation for the PacketLifeTime is half the local ACK timeout.
The encoding for both timers are logarithmic.
If the local ACK timeout is set, but zero, it means the timer is
disabled. In this case, we choose the CMA_IBOE_PACKET_LIFETIME value,
since 50% of infinite makes no sense.
Before this commit, the PacketLifeTime became 255 if local ACK
timeout was zero (not running).
Fixed by explicitly testing for timeout being zero.
Fixes: e1ee1e62bec4 ("RDMA/cma: Use ACK timeout for RoCE packetLifeTime")
Link: https://lore.kernel.org/r/1624371207-26710-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ca0c448d2b9f43e3175835d536853854ef544e22 ]
The struct rdma_id_private contains three bit-fields, tos_set,
timeout_set, and min_rnr_timer_set. These are set by accessor functions
without any synchronization. If two or all accessor functions are invoked
in close proximity in time, there will be Read-Modify-Write from several
contexts to the same variable, and the result will be intermittent.
Fixed by protecting the bit-fields by the qp_mutex in the accessor
functions.
The consumer of timeout_set and min_rnr_timer_set is in
rdma_init_qp_attr(), which is called with qp_mutex held for connected
QPs. Explicit locking is added for the consumers of tos and tos_set.
This commit depends on ("RDMA/cma: Remove unnecessary INIT->INIT
transition"), since the call to rdma_init_qp_attr() from
cma_init_conn_qp() does not hold the qp_mutex.
Fixes: 2c1619edef61 ("IB/cma: Define option to set ack timeout and pack tos_set")
Fixes: 3aeffc46afde ("IB/cma: Introduce rdma_set_min_rnr_timer()")
Link: https://lore.kernel.org/r/1624369197-24578-3-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f97442887275d11c88c2899e720fe945c1f61488 ]
The mlx4 and mlx5 implemented differently the WQ input checks. Instead of
duplicating mlx4 logic in the mlx5, let's prepare the input in the central
place.
The mlx5 implementation didn't check for validity of state input. It is
not real bug because our FW checked that, but still worth to fix.
Fixes: f213c0527210 ("IB/uverbs: Add WQ support")
Link: https://lore.kernel.org/r/ac41ad6a81b095b1a8ad453dcf62cf8d3c5da779.1621413310.git.leonro@nvidia.com
Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 463a3f66473b58d71428a1c3ce69ea52c05440e5 ]
The uapi_get_object() function returns error pointers, it never returns
NULL.
Fixes: 149d3845f4a5 ("RDMA/uverbs: Add a method to introspect handles in a context")
Link: https://lore.kernel.org/r/YJ6Got+U7lz+3n9a@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 889d916b6f8a48b8c9489fffcad3b78eedd01a51 ]
restrack should only be attached to a cm_id while the ID has a valid
device pointer. It is set up when the device is first loaded, but not
cleared when the device is removed. There is also two copies of the device
pointer, one private and one in the public API, and these were left out of
sync.
Make everything go to NULL together and manipulate restrack right around
the device assignments.
Found by syzcaller:
BUG: KASAN: wild-memory-access in __list_del include/linux/list.h:112 [inline]
BUG: KASAN: wild-memory-access in __list_del_entry include/linux/list.h:135 [inline]
BUG: KASAN: wild-memory-access in list_del include/linux/list.h:146 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
Write of size 8 at addr dead000000000108 by task syz-executor716/334
CPU: 0 PID: 334 Comm: syz-executor716 Not tainted 5.11.0+ #271
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0xbe/0xf9 lib/dump_stack.c:120
__kasan_report mm/kasan/report.c:400 [inline]
kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
__list_del include/linux/list.h:112 [inline]
__list_del_entry include/linux/list.h:135 [inline]
list_del include/linux/list.h:146 [inline]
cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
_destroy_id+0x29/0x460 drivers/infiniband/core/cma.c:1862
ucma_close_id+0x36/0x50 drivers/infiniband/core/ucma.c:185
ucma_destroy_private_ctx+0x58d/0x5b0 drivers/infiniband/core/ucma.c:576
ucma_close+0x91/0xd0 drivers/infiniband/core/ucma.c:1797
__fput+0x169/0x540 fs/file_table.c:280
task_work_run+0xb7/0x100 kernel/task_work.c:140
exit_task_work include/linux/task_work.h:30 [inline]
do_exit+0x7da/0x17f0 kernel/exit.c:825
do_group_exit+0x9e/0x190 kernel/exit.c:922
__do_sys_exit_group kernel/exit.c:933 [inline]
__se_sys_exit_group kernel/exit.c:931 [inline]
__x64_sys_exit_group+0x2d/0x30 kernel/exit.c:931
do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 255d0c14b375 ("RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count")
Link: https://lore.kernel.org/r/3352ee288fe34f2b44220457a29bfc0548686363.1620711734.git.leonro@nvidia.com
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 54d87913f147a983589923c7f651f97de9af5be1 ]
The user_entry_size is supplied by the user and later used as a
denominator to calculate number of entries. The zero supplied by the user
will trigger the following divide-by-zero error:
divide error: 0000 [#1] SMP KASAN PTI
CPU: 4 PID: 497 Comm: c_repro Not tainted 5.13.0-rc1+ #281
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:ib_uverbs_handler_UVERBS_METHOD_QUERY_GID_TABLE+0x1b1/0x510
Code: 87 59 03 00 00 e8 9f ab 1e ff 48 8d bd a8 00 00 00 e8 d3 70 41 ff 44 0f b7 b5 a8 00 00 00 e8 86 ab 1e ff 31 d2 4c 89 f0 31 ff <49> f7 f5 48 89 d6 48 89 54 24 10 48 89 04 24 e8 1b ad 1e ff 48 8b
RSP: 0018:ffff88810416f828 EFLAGS: 00010246
RAX: 0000000000000008 RBX: 1ffff1102082df09 RCX: ffffffff82183f3d
RDX: 0000000000000000 RSI: ffff888105f2da00 RDI: 0000000000000000
RBP: ffff88810416fa98 R08: 0000000000000001 R09: ffffed102082df5f
R10: ffff88810416faf7 R11: ffffed102082df5e R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000008 R15: ffff88810416faf0
FS: 00007f5715efa740(0000) GS:ffff88811a700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000840 CR3: 000000010c2e0001 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? ib_uverbs_handler_UVERBS_METHOD_INFO_HANDLES+0x4b0/0x4b0
ib_uverbs_cmd_verbs+0x1546/0x1940
ib_uverbs_ioctl+0x186/0x240
__x64_sys_ioctl+0x38a/0x1220
do_syscall_64+0x3f/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 9f85cbe50aa0 ("RDMA/uverbs: Expose the new GID query API to user space")
Link: https://lore.kernel.org/r/b971cc70a8b240a8b5eda33c99fa0558a0071be2.1620657876.git.leonro@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cb5cd0ea4eb3ce338a593a5331ddb4986ae20faa ]
The device attach triggers addition of CM_ID to the restrack DB.
However, when error occurs, we releasing this device, but defer CM_ID
release. This causes to the situation where restrack sees CM_ID that
is not valid anymore.
As a solution, add the CM_ID to the resource tracking DB only after the
attachment is finished.
Found by syzcaller:
infiniband syz0: added syz_tun
rdma_rxe: ignoring netdev event = 10 for syz_tun
infiniband syz0: set down
infiniband syz0: ib_query_port failed (-19)
restrack: ------------[ cut here ]------------
infiniband syz0: BUG: RESTRACK detected leak of resources
restrack: User CM_ID object allocated by syz-executor716 is not freed
restrack: ------------[ cut here ]------------
Fixes: b09c4d701220 ("RDMA/restrack: Improve readability in task name management")
Link: https://lore.kernel.org/r/ab93e56ba831eac65c322b3256796fa1589ec0bb.1618753862.git.leonro@nvidia.com
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 194f64a3cad3ab9e381e996a13089de3215d1887 ]
On RoCE systems, a CM REQ contains a Primary Hop Limit > 1 and Primary
Subnet Local is zero.
In cm_req_handler(), the cm_process_routed_req() function is called. Since
the Primary Subnet Local value is zero in the request, and since this is
RoCE (Primary Local LID is permissive), the following statement will be
executed:
IBA_SET(CM_REQ_PRIMARY_SL, req_msg, wc->sl);
This corrupts SL in req_msg if it was different from zero. In other words,
a request to setup a connection using an SL != zero, will not be honored,
and a connection using SL zero will be created instead.
Fixed by not calling cm_process_routed_req() on RoCE systems, the
cm_process_route_req() is only for IB anyhow.
Fixes: 3971c9f6dbf2 ("IB/cm: Add interim support for routed paths")
Link: https://lore.kernel.org/r/1616420132-31005-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d1c803a9ccd7bd3aff5e989ccfb39ed3b799b975 ]
The nla_len() is less than or equal to 16. If it's less than 16 then end
of the "gid" buffer is uninitialized.
Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload")
Link: https://lore.kernel.org/r/20210405074434.264221-1-leon@kernel.org
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit b116c702791a9834e6485f67ca6267d9fdf59b87 upstream.
RDMA ULPs must not call DMA mapping APIs directly but instead use the
ib_dma_* wrappers.
Fixes: 0c16d9635e3a ("RDMA/umem: Move to allocate SG table from pages")
Link: https://lore.kernel.org/r/20201106181941.1878556-3-hch@lst.de
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Marciniszyn, Mike" <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 221384df6123747d2a75517dd06cc01752f81518 ]
ib_send_cm_sidr_rep() {
spin_lock_irqsave()
cm_send_sidr_rep_locked() {
...
spin_lock_irq()
....
spin_unlock_irq() <--- this will enable interrupts
}
spin_unlock_irqrestore()
}
spin_unlock_irqrestore() expects interrupts to be disabled but the
internal spin_unlock_irq() will always enable hard interrupts.
Fix this by replacing the internal spin_{lock,unlock}_irq() with
irqsave/restore variants.
It fixes the following kernel trace:
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 2 PID: 20001 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x1d/0x20
Call Trace:
_raw_spin_unlock_irqrestore+0x4e/0x50
ib_send_cm_sidr_rep+0x3a/0x50 [ib_cm]
cma_send_sidr_rep+0xa1/0x160 [rdma_cm]
rdma_accept+0x25e/0x350 [rdma_cm]
ucma_accept+0x132/0x1cc [rdma_ucm]
ucma_write+0xbf/0x140 [rdma_ucm]
vfs_write+0xc1/0x340
ksys_write+0xb3/0xe0
do_syscall_64+0x2d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: 87c4c774cbef ("RDMA/cm: Protect access to remote_sidr_table")
Link: https://lore.kernel.org/r/20210301081844.445823-1-leon@kernel.org
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fe454dc31e84f8c14cb8942fcb61666c9f40745b ]
ucma_process_join() allocates struct ucma_multicast mc and frees it if an
error occurs during its run. Specifically, if an error occurs in
copy_to_user(), a use-after-free might happen in the following scenario:
1. mc struct is allocated.
2. rdma_join_multicast() is called and succeeds. During its run,
cma_iboe_join_multicast() enqueues a work that will later use the
aforementioned mc struct.
3. copy_to_user() is called and fails.
4. mc struct is deallocated.
5. The work that was enqueued by cma_iboe_join_multicast() is run and
calls ucma_create_uevent() which tries to access mc struct (which is
freed by now).
Fix this bug by cancelling the work enqueued by cma_iboe_join_multicast().
Since cma_work_handler() frees struct cma_work, we don't use it in
cma_iboe_join_multicast() so we can safely cancel the work later.
The following syzkaller report revealed it:
BUG: KASAN: use-after-free in ucma_create_uevent+0x2dd/0x;3f0 drivers/infiniband/core/ucma.c:272
Read of size 8 at addr ffff88810b3ad110 by task kworker/u8:1/108
CPU: 1 PID: 108 Comm: kworker/u8:1 Not tainted 5.10.0-rc6+ #257
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: rdma_cm cma_work_handler
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xbe/0xf9 lib/dump_stack.c:118
print_address_description.constprop.0+0x3e/0×60 mm/kasan/report.c:385
__kasan_report mm/kasan/report.c:545 [inline]
kasan_report.cold+0x1f/0×37 mm/kasan/report.c:562
ucma_create_uevent+0x2dd/0×3f0 drivers/infiniband/core/ucma.c:272
ucma_event_handler+0xb7/0×3c0 drivers/infiniband/core/ucma.c:349
cma_cm_event_handler+0x5d/0×1c0 drivers/infiniband/core/cma.c:1977
cma_work_handler+0xfa/0×190 drivers/infiniband/core/cma.c:2718
process_one_work+0x54c/0×930 kernel/workqueue.c:2272
worker_thread+0x82/0×830 kernel/workqueue.c:2418
kthread+0x1ca/0×220 kernel/kthread.c:292
ret_from_fork+0x1f/0×30 arch/x86/entry/entry_64.S:296
Allocated by task 359:
kasan_save_stack+0x1b/0×40 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:461 [inline]
__kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:434
kmalloc include/linux/slab.h:552 [inline]
kzalloc include/linux/slab.h:664 [inline]
ucma_process_join+0x16e/0×3f0 drivers/infiniband/core/ucma.c:1453
ucma_join_multicast+0xda/0×140 drivers/infiniband/core/ucma.c:1538
ucma_write+0x1f7/0×280 drivers/infiniband/core/ucma.c:1724
vfs_write fs/read_write.c:603 [inline]
vfs_write+0x191/0×4c0 fs/read_write.c:585
ksys_write+0x1a1/0×1e0 fs/read_write.c:658
do_syscall_64+0x2d/0×40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 359:
kasan_save_stack+0x1b/0×40 mm/kasan/common.c:48
kasan_set_track+0x1c/0×30 mm/kasan/common.c:56
kasan_set_free_info+0x1b/0×30 mm/kasan/generic.c:355
__kasan_slab_free+0x112/0×160 mm/kasan/common.c:422
slab_free_hook mm/slub.c:1544 [inline]
slab_free_freelist_hook mm/slub.c:1577 [inline]
slab_free mm/slub.c:3142 [inline]
kfree+0xb3/0×3e0 mm/slub.c:4124
ucma_process_join+0x22d/0×3f0 drivers/infiniband/core/ucma.c:1497
ucma_join_multicast+0xda/0×140 drivers/infiniband/core/ucma.c:1538
ucma_write+0x1f7/0×280 drivers/infiniband/core/ucma.c:1724
vfs_write fs/read_write.c:603 [inline]
vfs_write+0x191/0×4c0 fs/read_write.c:585
ksys_write+0x1a1/0×1e0 fs/read_write.c:658
do_syscall_64+0x2d/0×40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff88810b3ad100
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 16 bytes inside of
192-byte region [ffff88810b3ad100, ffff88810b3ad1c0)
Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join")
Link: https://lore.kernel.org/r/20210211090517.1278415-1-leon@kernel.org
Reported-by: Amit Matityahu <mitm@nvidia.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 131be26750379592f0dd6244b2a90bbb504a10bb ]
When RDMA device has 255 ports, loop iterator i overflows. Due to which
cm_add_one() port iterator loops infinitely. Use core provided port
iterator to avoid the infinite loop.
Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20210127150010.1876121-9-leon@kernel.org
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit def4cd43f522253645b72c97181399c241b54536 ]
Currently, polling a umad device will always works, even if the device was
disassociated. A disassociated device should immediately return EPOLLERR
from poll(). Otherwise userspace is endlessly hung on poll() with no idea
that the device has been removed from the system.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/r/20210125121339.837518-3-leon@kernel.org
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4fc5461823c9cad547a9bdfbf17d13f0da0d6bb5 ]
MAD message received by the user has EINVAL error in all flows
including when the device is disassociated. That makes it impossible
for the applications to treat such flow differently.
Change it to return EIO, so the applications will be able to perform
disassociation recovery.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/r/20210125121339.837518-2-leon@kernel.org
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7c7b3e5d9aeed31d35c5dab0bf9c0fd4c8923206 ]
In default_roce_mode_store(), we took a reference to cma_dev, but didn't
return it with cma_dev_put in the error flow.
Fixes: 1c15b4f2a42f ("RDMA/core: Modify enum ib_gid_type and enum rdma_network_type")
Link: https://lore.kernel.org/r/20210113130214.562108-1-leon@kernel.org
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b79f2dc5ffe17b03ec8c55f0d63f65e87bcac676 ]
rounddown_pow_of_two() is undefined when the input is 0. Therefore we need
to avoid it in ib_umem_find_best_pgsz and return 0. Otherwise, it could
result in not rejecting an invalid page size which eventually causes a
kernel oops due to the logical inconsistency.
Fixes: 3361c29e9279 ("RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz()")
Link: https://lore.kernel.org/r/20210113121703.559778-2-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8ae291cc95e49011b736b641b0cfad502b7a1526 ]
The destruction flow is very complicated here because the cm_id can be
destroyed from the event handler at any time if the device is
hot-removed. This leaves behind a partial ctx with no cm_id in the
xarray, and will let user space leak memory.
Make everything consistent in this flow in all places:
- Return the xarray back to XA_ZERO_ENTRY before beginning any
destruction. The thread that reaches this first is responsible to
kfree, everyone else does nothing.
- Test the xarray during the special hot-removal case to block the
queue_work, this has much simpler locking and doesn't require a
'destroying'
- Fix the ref initialization so that it is only positive if cm_id !=
NULL, then rely on that to guide the destruction process in all cases.
Now the new ucma_destroy_private_ctx() can be called in all places that
want to free the ctx, including all the error unwinds, and none of the
details are missed.
Fixes: a1d33b70dbbc ("RDMA/ucma: Rework how new connections are passed through event delivery")
Link: https://lore.kernel.org/r/20210105111327.230270-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 3c638cdb8ecc0442552156e0fed8708dd2c7f35b upstream.
xa_alloc_cyclic() call returns positive number if ID allocation
succeeded but wrapped. It is not an error, so normalize the "ret"
variable to zero as marker of not-an-error.
drivers/infiniband/core/restrack.c:261 rdma_restrack_add()
warn: 'ret' can be either negative or positive
Fixes: fd47c2f99f04 ("RDMA/restrack: Convert internal DB from hash to XArray")
Link: https://lore.kernel.org/r/20201216100753.1127638-1-leon@kernel.org
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5a7a9e038b032137ae9c45d5429f18a2ffdf7d42 ]
Use the ib_dma_* helpers to skip the DMA translation instead. This
removes the last user if dma_virt_ops and keeps the weird layering
violation inside the RDMA core instead of burderning the DMA mapping
subsystems with it. This also means the software RDMA drivers now don't
have to mess with DMA parameters that are not relevant to them at all, and
that in the future we can use PCI P2P transfers even for software RDMA, as
there is no first fake layer of DMA mapping that the P2P DMA support.
Link: https://lore.kernel.org/r/20201106181941.1878556-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e246b7c035d74abfb3507fa10082d0c42cc016c3 ]
As part of the cma_dev release, that pointer will be set to NULL. In case
it happens in rdma_bind_addr() (part of an error flow), the next call to
addr_handler() will have a call to cma_acquire_dev_by_src_ip() which will
overwrite sgid_attr without releasing it.
WARNING: CPU: 2 PID: 108 at drivers/infiniband/core/cma.c:606 cma_bind_sgid_attr drivers/infiniband/core/cma.c:606 [inline]
WARNING: CPU: 2 PID: 108 at drivers/infiniband/core/cma.c:606 cma_acquire_dev_by_src_ip+0x470/0x4b0 drivers/infiniband/core/cma.c:649
CPU: 2 PID: 108 Comm: kworker/u8:1 Not tainted 5.10.0-rc6+ #257
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: ib_addr process_one_req
RIP: 0010:cma_bind_sgid_attr drivers/infiniband/core/cma.c:606 [inline]
RIP: 0010:cma_acquire_dev_by_src_ip+0x470/0x4b0 drivers/infiniband/core/cma.c:649
Code: 66 d9 4a ff 4d 8b 6e 10 49 8d bd 1c 08 00 00 e8 b6 d6 4a ff 45 0f b6 bd 1c 08 00 00 41 83 e7 01 e9 49 fd ff ff e8 90 c5 29 ff <0f> 0b e9 80 fe ff ff e8 84 c5 29 ff 4c 89 f7 e8 2c d9 4a ff 4d 8b
RSP: 0018:ffff8881047c7b40 EFLAGS: 00010293
RAX: ffff888104789c80 RBX: 0000000000000001 RCX: ffffffff820b8ef8
RDX: 0000000000000000 RSI: ffffffff820b9080 RDI: ffff88810cd4c998
RBP: ffff8881047c7c08 R08: ffff888104789c80 R09: ffffed10209f4036
R10: ffff888104fa01ab R11: ffffed10209f4035 R12: ffff88810cd4c800
R13: ffff888105750e28 R14: ffff888108f0a100 R15: ffff88810cd4c998
FS: 0000000000000000(0000) GS:ffff888119c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000104e60005 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
addr_handler+0x266/0x350 drivers/infiniband/core/cma.c:3190
process_one_req+0xa3/0x300 drivers/infiniband/core/addr.c:645
process_one_work+0x54c/0x930 kernel/workqueue.c:2272
worker_thread+0x82/0x830 kernel/workqueue.c:2418
kthread+0x1ca/0x220 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
Fixes: ff11c6cd521f ("RDMA/cma: Introduce and use cma_acquire_dev_by_src_ip()")
Link: https://lore.kernel.org/r/20201213132940.345554-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e0da68994d16b46384cce7b86eb645f1ef7c51ef ]
Fix incorrect type of max_entries in UVERBS_METHOD_QUERY_GID_TABLE -
max_entries is of type size_t although it can take negative values.
The following static check revealed it:
drivers/infiniband/core/uverbs_std_types_device.c:338 ib_uverbs_handler_UVERBS_METHOD_QUERY_GID_TABLE() warn: 'max_entries' unsigned <= 0
Fixes: 9f85cbe50aa0 ("RDMA/uverbs: Expose the new GID query API to user space")
Link: https://lore.kernel.org/r/20201208073545.9723-4-leon@kernel.org
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 779e0bf47632c609c59f527f9711ecd3214dccb0 ]
In procedure ib_register_device, procedure kobject_uevent is called
(advertising that the device is ready for userspace usage) even when
device_enable_and_get() returned an error.
As a result, various RDMA modules attempted to register for the device
even while the device driver was preparing to unregister the device.
Fix this by advertising the device availability only after enabling the
device succeeds.
Fixes: e7a5b4aafd82 ("RDMA/device: Don't fire uevent before device is fully initialized")
Link: https://lore.kernel.org/r/20201208073545.9723-3-leon@kernel.org
Suggested-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b47a98efa97889c5b16d17e77eed3dc4500674eb ]
Device memory (DM) are registered as MR during initialization flow, these
MRs were not tracked by resource tracker and had res->valid set as a
false. Update the code to manage them too.
Before this change:
$ ibv_rc_pingpong -j &
$ rdma res show mr <-- shows nothing
After this change:
$ ibv_rc_pingpong -j &
$ rdma res show mr
dev ibp0s9 mrn 0 mrlen 4096 pdn 3 pid 734 comm ibv_rc_pingpong
Fixes: be934cca9e98 ("IB/uverbs: Add device memory registration ioctl support")
Link: https://lore.kernel.org/r/20201117070148.1974114-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dd37d2f59eb839d51b988f6668ce5f0d533b23fd ]
rdma_detroy_id() cannot be called under &lock - we must instead keep the
error'd ID around until &lock can be released, then destroy it.
This is complicated by the usual way listen IDs are destroyed through
cma_process_remove() which can run at any time and will asynchronously
destroy the same ID.
Remove the ID from visiblity of cma_process_remove() before going down the
destroy path outside the locking.
Fixes: c80a0c52d85c ("RDMA/cma: Add missing error handling of listen_id")
Link: https://lore.kernel.org/r/20201118133756.GK244516@ziepe.ca
Reported-by: syzbot+1bc48bf7f78253f664a9@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c80a0c52d85c49a910d0dc0e342e8d8898677dc0 ]
Don't silently continue if rdma_listen() fails but destroy previously
created CM_ID and return an error to the caller.
Fixes: d02d1f5359e7 ("RDMA/cma: Fix deadlock destroying listen requests")
Link: https://lore.kernel.org/r/20201104144008.3808124-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5333499c6014224756e97fa1a1047dfa592d76d3 ]
Fix to return error code PTR_ERR() from the error handling case instead of
0.
Fixes: 51aab12631dd ("RDMA/core: Get xmit slave for LAG")
Link: https://lore.kernel.org/r/20201016075845.129562-1-jingxiangfeng@huawei.com
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
If cm_create_timewait_info() fails, the timewait_info pointer will contain
an error value and will be used in cm_remove_remote() later.
general protection fault, probably for non-canonical address 0xdffffc0000000024: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0×0000000000000120-0×0000000000000127]
CPU: 2 PID: 12446 Comm: syz-executor.3 Not tainted 5.10.0-rc5-5d4c0742a60e #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:cm_remove_remote.isra.0+0x24/0×170 drivers/infiniband/core/cm.c:978
Code: 84 00 00 00 00 00 41 54 55 53 48 89 fb 48 8d ab 2d 01 00 00 e8 7d bf 4b fe 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 04 02 48 89 ea 83 e2 07 38 d0 7f 08 84 c0 0f 85 fc 00 00 00
RSP: 0018:ffff888013127918 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: fffffffffffffff4 RCX: ffffc9000a18b000
RDX: 0000000000000024 RSI: ffffffff82edc573 RDI: fffffffffffffff4
RBP: 0000000000000121 R08: 0000000000000001 R09: ffffed1002624f1d
R10: 0000000000000003 R11: ffffed1002624f1c R12: ffff888107760c70
R13: ffff888107760c40 R14: fffffffffffffff4 R15: ffff888107760c9c
FS: 00007fe1ffcc1700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2ff21000 CR3: 000000010f504001 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
cm_destroy_id+0x189/0×15b0 drivers/infiniband/core/cm.c:1155
cma_connect_ib drivers/infiniband/core/cma.c:4029 [inline]
rdma_connect_locked+0x1100/0×17c0 drivers/infiniband/core/cma.c:4107
rdma_connect+0x2a/0×40 drivers/infiniband/core/cma.c:4140
ucma_connect+0x277/0×340 drivers/infiniband/core/ucma.c:1069
ucma_write+0x236/0×2f0 drivers/infiniband/core/ucma.c:1724
vfs_write+0x220/0×830 fs/read_write.c:603
ksys_write+0x1df/0×240 fs/read_write.c:658
do_syscall_64+0x33/0×40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: a977049dacde ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20201204064205.145795-1-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reported-by: Amit Matityahu <mitm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The query_gid_table ioctl skips non IB/RoCE ports, which as a result
returns an empty gid table for devices such as EFA which have a GID table,
but are not IB/RoCE.
Fixes: c4b4d548fabc ("RDMA/core: Introduce new GID table query API")
Link: https://lore.kernel.org/r/20201206153238.34878-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The xarray is never mutated from an IRQ handler, only from work queues
under a spinlock_irq. Thus there is no reason for it be an IRQ type
xarray.
This was copied over from the original IDR code, but the recent rework put
the xarray inside another spinlock_irq which will unbalance the unlocking.
Fixes: c206f8bad15d ("RDMA/cm: Make it clearer how concurrency works in cm_req_handler()")
Link: https://lore.kernel.org/r/0-v1-808b6da3bd3f+1857-cm_xarray_no_irq_jgg@nvidia.com
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the
handler triggers a completion and another thread does rdma_connect() or
the handler directly calls rdma_connect().
In all cases rdma_connect() needs to hold the handler_mutex, but when
handler's are invoked this is already held by the core code. This causes
ULPs using the 2nd method to deadlock.
Provide a rdma_connect_locked() and have all ULPs call it from their
handlers.
Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com
Reported-and-tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Fixes: 2a7cec538169 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Some drivers (such as EFA) have a GID table, but aren't IB/RoCE devices.
Remove the unnecessary rdma_ib_or_roce() check.
This fixes rdma-core failures for EFA when it uses the new ioctl interface
for querying the GID table.
Fixes: 9f85cbe50aa0 ("RDMA/uverbs: Expose the new GID query API to user space")
Link: https://lore.kernel.org/r/20201026082621.32463-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Pull rdma updates from Jason Gunthorpe:
"A usual cycle for RDMA with a typical mix of driver and core subsystem
updates:
- Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
hns, usnic, qib, qedr, cxgb4, hns, bnxt_re
- Various rtrs fixes and updates
- Bug fix for mlx4 CM emulation for virtualization scenarios where
MRA wasn't working right
- Use tracepoints instead of pr_debug in the CM code
- Scrub the locking in ucma and cma to close more syzkaller bugs
- Use tasklet_setup in the subsystem
- Revert the idea that 'destroy' operations are not allowed to fail
at the driver level. This proved unworkable from a HW perspective.
- Revise how the umem API works so drivers make fewer mistakes using
it
- XRC support for qedr
- Convert uverbs objects RWQ and MW to new the allocation scheme
- Large queue entry sizes for hns
- Use hmm_range_fault() for mlx5 On Demand Paging
- uverbs APIs to inspect the GID table instead of sysfs
- Move some of the RDMA code for building large page SGLs into
lib/scatterlist"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
RDMA/ucma: Fix use after free in destroy id flow
RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
RDMA: Explicitly pass in the dma_device to ib_register_device
lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
IB/mlx4: Convert rej_tmout radix-tree to XArray
RDMA/rxe: Fix bug rejecting all multicast packets
RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
RDMA/rxe: Remove duplicate entries in struct rxe_mr
IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
IB/rdmavt: Fix sizeof mismatch
MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
RDMA/umem: Move to allocate SG table from pages
lib/scatterlist: Add support in dynamic allocation of SG table from pages
tools/testing/scatterlist: Show errors in human readable form
tools/testing/scatterlist: Rejuvenate bit-rotten test
RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
RDMA/uverbs: Expose the new GID query API to user space
...
|
|
The preceding patches have ensured that core dumping properly takes the
mmap_lock. Thanks to that, we can now remove mmget_still_valid() and all
its users.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ucma_free_ctx() should call to __destroy_id() on all the connection requests
that have not been delivered to user space. Currently it calls on the
context itself and cause to use after free.
Fixes the trace:
BUG: Unable to handle kernel data access on write at 0x5deadbeef0000108
Faulting instruction address: 0xc0080000002428f4
Oops: Kernel access of bad area, sig: 11 [#1]
Call Trace:
[c000000207f2b680] [c00800000024280c] .__destroy_id+0x28c/0x610 [rdma_ucm] (unreliable)
[c000000207f2b750] [c0080000002429c4] .__destroy_id+0x444/0x610 [rdma_ucm]
[c000000207f2b820] [c008000000242c24] .ucma_close+0x94/0xf0 [rdma_ucm]
[c000000207f2b8c0] [c00000000046fbdc] .__fput+0xac/0x330
[c000000207f2b960] [c00000000015d48c] .task_work_run+0xbc/0x110
[c000000207f2b9f0] [c00000000012fb00] .do_exit+0x430/0xc50
[c000000207f2bae0] [c0000000001303ec] .do_group_exit+0x5c/0xd0
[c000000207f2bb70] [c000000000144a34] .get_signal+0x194/0xe30
[c000000207f2bc60] [c00000000001f6b4] .do_notify_resume+0x124/0x470
[c000000207f2bd60] [c000000000032484] .interrupt_exit_user_prepare+0x1b4/0x240
[c000000207f2be20] [c000000000010034] interrupt_return+0x14/0x1c0
Rename listen_ctx to conn_req_ctx as the poor name was the cause of this
bug.
Fixes: a1d33b70dbbc ("RDMA/ucma: Rework how new connections are passed through event delivery")
Link: https://lore.kernel.org/r/20201012045600.418271-4-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The code in setup_dma_device has become rather convoluted, move all of
this to the drivers. Drives now pass in a DMA capable struct device which
will be used to setup DMA, or drivers must fully configure the ibdev for
DMA and pass in NULL.
Other than setting the masks in rvt all drivers were doing this already
anyhow.
mthca, mlx4 and mlx5 were already setting up maximum DMA segment size for
DMA based on their hardweare limits in:
__mthca_init_one()
dma_set_max_seg_size (1G)
__mlx4_init_one()
dma_set_max_seg_size (1G)
mlx5_pci_init()
set_dma_caps()
dma_set_max_seg_size (2G)
Other non software drivers (except usnic) were extended to UINT_MAX [1, 2]
instead of 2G as was before.
[1] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/
[2] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/
Link: https://lore.kernel.org/r/20201008082752.275846-1-leon@kernel.org
Link: https://lore.kernel.org/r/6b2ed339933d066622d5715903870676d8cc523a.1602590106.git.mchehab+huawei@kernel.org
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
From Maor Gottlieb says:
====================
This series extends __sg_alloc_table_from_pages to allow chaining of new
pages to an already initialized SG table.
This allows for drivers to utilize the optimization of merging contiguous
pages without a need to pre allocate all the pages and hold them in a very
large temporary buffer prior to the call to SG table initialization.
The last patch changes the Infiniband core to use the new API. It removes
duplicate functionality from the code and benefits from the optimization
of allocating dynamic SG table from pages.
In huge pages system of 2MB page size, without this change, the SG table
would contain x512 SG entries.
====================
* branch 'dynamic_sg':
RDMA/umem: Move to allocate SG table from pages
lib/scatterlist: Add support in dynamic allocation of SG table from pages
tools/testing/scatterlist: Show errors in human readable form
tools/testing/scatterlist: Rejuvenate bit-rotten test
|
|
Remove the implementation of ib_umem_add_sg_table and instead
call to __sg_alloc_table_from_pages which already has the logic to
merge contiguous pages.
Besides that it removes duplicated functionality, it reduces the
memory consumption of the SG table significantly. Prior to this
patch, the SG table was allocated in advance regardless consideration
of contiguous pages.
In huge pages system of 2MB page size, without this change, the SG table
would contain x512 SG entries.
E.g. for 100GB memory registration:
Number of entries Size
Before 26214400 600.0MB
After 51200 1.2MB
Link: https://lore.kernel.org/r/20201004154340.1080481-5-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Pull networking fixes from David Miller:
1) Make sure SKB control block is in the proper state during IPSEC
ESP-in-TCP encapsulation. From Sabrina Dubroca.
2) Various kinds of attributes were not being cloned properly when we
build new xfrm_state objects from existing ones. Fix from Antony
Antony.
3) Make sure to keep BTF sections, from Tony Ambardar.
4) TX DMA channels need proper locking in lantiq driver, from Hauke
Mehrtens.
5) Honour route MTU during forwarding, always. From Maciej
Żenczykowski.
6) Fix races in kTLS which can result in crashes, from Rohit
Maheshwari.
7) Skip TCP DSACKs with rediculous sequence ranges, from Priyaranjan
Jha.
8) Use correct address family in xfrm state lookups, from Herbert Xu.
9) A bridge FDB flush should not clear out user managed fdb entries
with the ext_learn flag set, from Nikolay Aleksandrov.
10) Fix nested locking of netdev address lists, from Taehee Yoo.
11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau.
12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit.
13) Don't free command entries in mlx5 while comp handler could still be
running, from Eran Ben Elisha.
14) Error flow of request_irq() in mlx5 is busted, due to an off by one
we try to free and IRQ never allocated. From Maor Gottlieb.
15) Fix leak when dumping netlink policies, from Johannes Berg.
16) Sendpage cannot be performed when a page is a slab page, or the page
count is < 1. Some subsystems such as nvme were doing so. Create a
"sendpage_ok()" helper and use it as needed, from Coly Li.
17) Don't leak request socket when using syncookes with mptcp, from
Paolo Abeni.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
net/core: check length before updating Ethertype in skb_mpls_{push,pop}
net: mvneta: fix double free of txq->buf
net_sched: check error pointer in tcf_dump_walker()
net: team: fix memory leak in __team_options_register
net: typhoon: Fix a typo Typoon --> Typhoon
net: hinic: fix DEVLINK build errors
net: stmmac: Modify configuration method of EEE timers
tcp: fix syn cookied MPTCP request socket leak
libceph: use sendpage_ok() in ceph_tcp_sendpage()
scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map()
drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage()
tcp: use sendpage_ok() to detect misused .sendpage
nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()
net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send
net: introduce helper sendpage_ok() in include/linux/net.h
net: usb: pegasus: Proper error handing when setting pegasus' MAC address
net: core: document two new elements of struct net_device
netlink: fix policy dump leak
net/mlx5e: Fix race condition on nhe->n pointer in neigh update
net/mlx5e: Fix VLAN create flow
...
|
|
Expose the query GID table and entry API to user space by adding two new
methods and method handlers to the device object.
This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.
Link: https://lore.kernel.org/r/20200923165015.2491894-5-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Introduce rdma_query_gid_table which enables querying all the GID tables
of a given device and copying the attributes of all valid GID entries to a
provided buffer.
This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.
Link: https://lore.kernel.org/r/20200923165015.2491894-4-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|