Age | Commit message (Collapse) | Author | Files | Lines |
|
commit ef95a90ae6f4f21990e1f7ced6719784a409e811 upstream.
Validating input parameters should be done before getting the cm_id
otherwise it can leak a cm_id reference.
Fixes: 6a21dfc0d0db ("RDMA/ucma: Limit possible option size")
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
[iwamatsu: Backported to 4.4, 4.9 and 4.14: adjust context]
Signed-off-by: Nobuhiro Iwamatsu (CIP) <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit a3671a4f973ee9d9621d60166cc3b037c397d604 upstream.
hdr.cmd can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/infiniband/core/ucma.c:1686 ucma_write() warn: potential
spectre issue 'ucma_cmd_table' [r] (local cap)
Fix this by sanitizing hdr.cmd before using it to index
ucm_cmd_table.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5fe23f262e0548ca7f19fb79f89059a60d087d22 upstream.
There is a race condition between ucma_close() and ucma_resolve_ip():
CPU0 CPU1
ucma_resolve_ip(): ucma_close():
ctx = ucma_get_ctx(file, cmd.id);
list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
mutex_lock(&mut);
idr_remove(&ctx_idr, ctx->id);
mutex_unlock(&mut);
...
mutex_lock(&mut);
if (!ctx->closing) {
mutex_unlock(&mut);
rdma_destroy_id(ctx->cm_id);
...
ucma_free_ctx(ctx);
ret = rdma_resolve_addr();
ucma_put_ctx(ctx);
Before idr_remove(), ucma_get_ctx() could still find the ctx
and after rdma_destroy_id(), rdma_resolve_addr() may still
access id_priv pointer. Also, ucma_put_ctx() may use ctx after
ucma_free_ctx() too.
ucma_close() should call ucma_put_ctx() too which tests the
refcnt and waits for the last one releasing it. The similar
pattern is already used by ucma_destroy_id().
Reported-and-tested-by: syzbot+da2591e115d57a9cbb8b@syzkaller.appspotmail.com
Reported-by: syzbot+cfe3c1e8ef634ba8964b@syzkaller.appspotmail.com
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0d23ba6034b9cf48b8918404367506da3e4b3ee5 ]
The current code grabs the private_data of whatever file descriptor
userspace has supplied and implicitly casts it to a `struct ucma_file *`,
potentially causing a type confusion.
This is probably fine in practice because the pointer is only used for
comparisons, it is never actually dereferenced; and even in the
comparisons, it is unlikely that a file from another filesystem would have
a ->private_data pointer that happens to also be valid in this context.
But ->private_data is not always guaranteed to be a valid pointer to an
object owned by the file's filesystem; for example, some filesystems just
cram numbers in there.
Check the type of the supplied file descriptor to be safe, analogous to how
other places in the kernel do it.
Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit cb2595c1393b4a5211534e6f0a0fbad369e21ad8 ]
ucma_process_join() will free the new allocated "mc" struct,
if there is any error after that, especially the copy_to_user().
But in parallel, ucma_leave_multicast() could find this "mc"
through idr_find() before ucma_process_join() frees it, since it
is already published.
So "mc" could be used in ucma_leave_multicast() after it is been
allocated and freed in ucma_process_join(), since we don't refcnt
it.
Fix this by separating "publish" from ID allocation, so that we
can get an ID first and publish it later after copy_to_user().
Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support")
Reported-by: Noam Rathaus <noamr@beyondsecurity.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5f3e3b85cc0a5eae1c46d72e47d3de7bf208d9e2 ]
The option size check is using optval instead of optlen
causing the set option call to fail. Use the correct
field, optlen, for size check.
Fixes: 6a21dfc0d0db ("RDMA/ucma: Limit possible option size")
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 09abfe7b5b2f442a85f4c4d59ecf582ad76088d7 upstream.
The RDMA CM will select a source device and address by consulting
the routing table if no source address is passed into
rdma_resolve_address(). Userspace will ask for this by passing an
all-zero source address in the RESOLVE_IP command. Unfortunately
the new check for non-zero address size rejects this with EINVAL,
which breaks valid userspace applications.
Fix this by explicitly allowing a zero address family for the source.
Fixes: 2975d5de6428 ("RDMA/ucma: Check AF family prior resolving address")
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8435168d50e66fa5eae01852769d20a36f9e5e83 upstream.
Check to make sure that ctx->cm_id->device is set before we use it.
Otherwise userspace can trigger a NULL dereference by doing
RDMA_USER_CM_CMD_SET_OPTION on an ID that is not bound to a device.
Cc: <stable@vger.kernel.org>
Reported-by: <syzbot+a67bc93e14682d92fc2f@syzkaller.appspotmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 89838118a515847d3e5c904d2e022779a7173bec ]
The 'if' logic in ucma_query_path was broken with OPA was introduced
and started to treat RoCE paths as as OPA paths. Invert the logic
of the 'if' so only OPA paths are treated as OPA paths.
Otherwise the path records returned to rdma_cma users are mangled
when in RoCE mode.
Fixes: 57520751445b ("IB/SA: Add OPA path record type")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 84652aefb347297aa08e91e283adf7b18f77c2d5 upstream.
There are several places in the ucma ABI where userspace can pass in a
sockaddr but set the address family to AF_IB. When that happens,
rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6,
and the ucma kernel code might end up copying past the end of a buffer
not sized for a struct sockaddr_ib.
Fix this by introducing new variants
int rdma_addr_size_in6(struct sockaddr_in6 *addr);
int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr);
that are type-safe for the types used in the ucma ABI and return 0 if the
size computed is bigger than the size of the type passed in. We can use
these new variants to check what size userspace has passed in before
copying any addresses.
Reported-by: <syzbot+6800425d54ed3ed8135d@syzkaller.appspotmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c8d3bcbfc5eab3f01cf373d039af725f3b488813 upstream.
Ensure that device exists prior to accessing its properties.
Reported-by: <syzbot+71655d44855ac3e76366@syzkaller.appspotmail.com>
Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4b658d1bbc16605330694bb3ef2570c465ef383d upstream.
Add missing check that device is connected prior to access it.
[ 55.358652] BUG: KASAN: null-ptr-deref in rdma_init_qp_attr+0x4a/0x2c0
[ 55.359389] Read of size 8 at addr 00000000000000b0 by task qp/618
[ 55.360255]
[ 55.360432] CPU: 1 PID: 618 Comm: qp Not tainted 4.16.0-rc1-00071-gcaf61b1b8b88 #91
[ 55.361693] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[ 55.363264] Call Trace:
[ 55.363833] dump_stack+0x5c/0x77
[ 55.364215] kasan_report+0x163/0x380
[ 55.364610] ? rdma_init_qp_attr+0x4a/0x2c0
[ 55.365238] rdma_init_qp_attr+0x4a/0x2c0
[ 55.366410] ucma_init_qp_attr+0x111/0x200
[ 55.366846] ? ucma_notify+0xf0/0xf0
[ 55.367405] ? _get_random_bytes+0xea/0x1b0
[ 55.367846] ? urandom_read+0x2f0/0x2f0
[ 55.368436] ? kmem_cache_alloc_trace+0xd2/0x1e0
[ 55.369104] ? refcount_inc_not_zero+0x9/0x60
[ 55.369583] ? refcount_inc+0x5/0x30
[ 55.370155] ? rdma_create_id+0x215/0x240
[ 55.370937] ? _copy_to_user+0x4f/0x60
[ 55.371620] ? mem_cgroup_commit_charge+0x1f5/0x290
[ 55.372127] ? _copy_from_user+0x5e/0x90
[ 55.372720] ucma_write+0x174/0x1f0
[ 55.373090] ? ucma_close_id+0x40/0x40
[ 55.373805] ? __lru_cache_add+0xa8/0xd0
[ 55.374403] __vfs_write+0xc4/0x350
[ 55.374774] ? kernel_read+0xa0/0xa0
[ 55.375173] ? fsnotify+0x899/0x8f0
[ 55.375544] ? fsnotify_unmount_inodes+0x170/0x170
[ 55.376689] ? __fsnotify_update_child_dentry_flags+0x30/0x30
[ 55.377522] ? handle_mm_fault+0x174/0x320
[ 55.378169] vfs_write+0xf7/0x280
[ 55.378864] SyS_write+0xa1/0x120
[ 55.379270] ? SyS_read+0x120/0x120
[ 55.379643] ? mm_fault_error+0x180/0x180
[ 55.380071] ? task_work_run+0x7d/0xd0
[ 55.380910] ? __task_pid_nr_ns+0x120/0x140
[ 55.381366] ? SyS_read+0x120/0x120
[ 55.381739] do_syscall_64+0xeb/0x250
[ 55.382143] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 55.382841] RIP: 0033:0x7fc2ef803e99
[ 55.383227] RSP: 002b:00007fffcc5f3be8 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
[ 55.384173] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc2ef803e99
[ 55.386145] RDX: 0000000000000057 RSI: 0000000020000080 RDI: 0000000000000003
[ 55.388418] RBP: 00007fffcc5f3c00 R08: 0000000000000000 R09: 0000000000000000
[ 55.390542] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400480
[ 55.392916] R13: 00007fffcc5f3cf0 R14: 0000000000000000 R15: 0000000000000000
[ 55.521088] Code: e5 4d 1e ff 48 89 df 44 0f b6 b3 b8 01 00 00 e8 65 50 1e ff 4c 8b 2b 49
8d bd b0 00 00 00 e8 56 50 1e ff 41 0f b6 c6 48 c1 e0 04 <49> 03 85 b0 00 00 00 48 8d 78 08
48 89 04 24 e8 3a 4f 1e ff 48
[ 55.525980] RIP: rdma_init_qp_attr+0x52/0x2c0 RSP: ffff8801e2c2f9d8
[ 55.532648] CR2: 00000000000000b0
[ 55.534396] ---[ end trace 70cee64090251c0b ]---
Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Fixes: d541e45500bd ("IB/core: Convert ah_attr from OPA to IB when copying to user")
Reported-by: <syzbot+7b62c837c2516f8f38c8@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e8980d67d6017c8eee8f9c35f782c4bd68e004c9 upstream.
Prior to access UCMA commands, the context should be initialized
and connected to CM_ID with ucma_create_id(). In case user skips
this step, he can provide non-valid ctx without CM_ID and cause
to multiple NULL dereferences.
Also there are situations where the create_id can be raced with
other user access, ensure that the context is only shared to
other threads once it is fully initialized to avoid the races.
[ 109.088108] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 109.090315] IP: ucma_connect+0x138/0x1d0
[ 109.092595] PGD 80000001dc02d067 P4D 80000001dc02d067 PUD 1da9ef067 PMD 0
[ 109.095384] Oops: 0000 [#1] SMP KASAN PTI
[ 109.097834] CPU: 0 PID: 663 Comm: uclose Tainted: G B 4.16.0-rc1-00062-g2975d5de6428 #45
[ 109.100816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[ 109.105943] RIP: 0010:ucma_connect+0x138/0x1d0
[ 109.108850] RSP: 0018:ffff8801c8567a80 EFLAGS: 00010246
[ 109.111484] RAX: 0000000000000000 RBX: 1ffff100390acf50 RCX: ffffffff9d7812e2
[ 109.114496] RDX: 1ffffffff3f507a5 RSI: 0000000000000297 RDI: 0000000000000297
[ 109.117490] RBP: ffff8801daa15600 R08: 0000000000000000 R09: ffffed00390aceeb
[ 109.120429] R10: 0000000000000001 R11: ffffed00390aceea R12: 0000000000000000
[ 109.123318] R13: 0000000000000120 R14: ffff8801de6459c0 R15: 0000000000000118
[ 109.126221] FS: 00007fabb68d6700(0000) GS:ffff8801e5c00000(0000) knlGS:0000000000000000
[ 109.129468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 109.132523] CR2: 0000000000000020 CR3: 00000001d45d8003 CR4: 00000000003606b0
[ 109.135573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 109.138716] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 109.142057] Call Trace:
[ 109.144160] ? ucma_listen+0x110/0x110
[ 109.146386] ? wake_up_q+0x59/0x90
[ 109.148853] ? futex_wake+0x10b/0x2a0
[ 109.151297] ? save_stack+0x89/0xb0
[ 109.153489] ? _copy_from_user+0x5e/0x90
[ 109.155500] ucma_write+0x174/0x1f0
[ 109.157933] ? ucma_resolve_route+0xf0/0xf0
[ 109.160389] ? __mod_node_page_state+0x1d/0x80
[ 109.162706] __vfs_write+0xc4/0x350
[ 109.164911] ? kernel_read+0xa0/0xa0
[ 109.167121] ? path_openat+0x1b10/0x1b10
[ 109.169355] ? fsnotify+0x899/0x8f0
[ 109.171567] ? fsnotify_unmount_inodes+0x170/0x170
[ 109.174145] ? __fget+0xa8/0xf0
[ 109.177110] vfs_write+0xf7/0x280
[ 109.179532] SyS_write+0xa1/0x120
[ 109.181885] ? SyS_read+0x120/0x120
[ 109.184482] ? compat_start_thread+0x60/0x60
[ 109.187124] ? SyS_read+0x120/0x120
[ 109.189548] do_syscall_64+0xeb/0x250
[ 109.192178] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 109.194725] RIP: 0033:0x7fabb61ebe99
[ 109.197040] RSP: 002b:00007fabb68d5e98 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 109.200294] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fabb61ebe99
[ 109.203399] RDX: 0000000000000120 RSI: 00000000200001c0 RDI: 0000000000000004
[ 109.206548] RBP: 00007fabb68d5ec0 R08: 0000000000000000 R09: 0000000000000000
[ 109.209902] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fabb68d5fc0
[ 109.213327] R13: 0000000000000000 R14: 00007fff40ab2430 R15: 00007fabb68d69c0
[ 109.216613] Code: 88 44 24 2c 0f b6 84 24 6e 01 00 00 88 44 24 2d 0f
b6 84 24 69 01 00 00 88 44 24 2e 8b 44 24 60 89 44 24 30 e8 da f6 06 ff
31 c0 <66> 41 83 7c 24 20 1b 75 04 8b 44 24 64 48 8d 74 24 20 4c 89 e7
[ 109.223602] RIP: ucma_connect+0x138/0x1d0 RSP: ffff8801c8567a80
[ 109.226256] CR2: 0000000000000020
Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Reported-by: <syzbot+36712f50b0552615bf59@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ed65a4dc22083e73bac599ded6a262318cad7baf upstream.
The error in ucma_create_id() left ctx in the list of contexts belong
to ucma file descriptor. The attempt to close this file descriptor causes
to use-after-free accesses while iterating over such list.
Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Reported-by: <syzbot+dcfd344365a56fbebd0f@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2975d5de6428ff6d9317e9948f0968f7d42e5d74 upstream.
Garbage supplied by user will cause to UCMA module provide zero
memory size for memcpy(), because it wasn't checked, it will
produce unpredictable results in rdma_resolve_addr().
[ 42.873814] BUG: KASAN: null-ptr-deref in rdma_resolve_addr+0xc8/0xfb0
[ 42.874816] Write of size 28 at addr 00000000000000a0 by task resaddr/1044
[ 42.876765]
[ 42.876960] CPU: 1 PID: 1044 Comm: resaddr Not tainted 4.16.0-rc1-00057-gaa56a5293d7e #34
[ 42.877840] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[ 42.879691] Call Trace:
[ 42.880236] dump_stack+0x5c/0x77
[ 42.880664] kasan_report+0x163/0x380
[ 42.881354] ? rdma_resolve_addr+0xc8/0xfb0
[ 42.881864] memcpy+0x34/0x50
[ 42.882692] rdma_resolve_addr+0xc8/0xfb0
[ 42.883366] ? deref_stack_reg+0x88/0xd0
[ 42.883856] ? vsnprintf+0x31a/0x770
[ 42.884686] ? rdma_bind_addr+0xc40/0xc40
[ 42.885327] ? num_to_str+0x130/0x130
[ 42.885773] ? deref_stack_reg+0x88/0xd0
[ 42.886217] ? __read_once_size_nocheck.constprop.6+0x10/0x10
[ 42.887698] ? unwind_get_return_address_ptr+0x50/0x50
[ 42.888302] ? replace_slot+0x147/0x170
[ 42.889176] ? delete_node+0x12c/0x340
[ 42.890223] ? __radix_tree_lookup+0xa9/0x160
[ 42.891196] ? ucma_resolve_ip+0xb7/0x110
[ 42.891917] ucma_resolve_ip+0xb7/0x110
[ 42.893003] ? ucma_resolve_addr+0x190/0x190
[ 42.893531] ? _copy_from_user+0x5e/0x90
[ 42.894204] ucma_write+0x174/0x1f0
[ 42.895162] ? ucma_resolve_route+0xf0/0xf0
[ 42.896309] ? dequeue_task_fair+0x67e/0xd90
[ 42.897192] ? put_prev_entity+0x7d/0x170
[ 42.897870] ? ring_buffer_record_is_on+0xd/0x20
[ 42.898439] ? tracing_record_taskinfo_skip+0x20/0x50
[ 42.899686] __vfs_write+0xc4/0x350
[ 42.900142] ? kernel_read+0xa0/0xa0
[ 42.900602] ? firmware_map_remove+0xdf/0xdf
[ 42.901135] ? do_task_dead+0x5d/0x60
[ 42.901598] ? do_exit+0xcc6/0x1220
[ 42.902789] ? __fget+0xa8/0xf0
[ 42.903190] vfs_write+0xf7/0x280
[ 42.903600] SyS_write+0xa1/0x120
[ 42.904206] ? SyS_read+0x120/0x120
[ 42.905710] ? compat_start_thread+0x60/0x60
[ 42.906423] ? SyS_read+0x120/0x120
[ 42.908716] do_syscall_64+0xeb/0x250
[ 42.910760] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 42.912735] RIP: 0033:0x7f138b0afe99
[ 42.914734] RSP: 002b:00007f138b799e98 EFLAGS: 00000287 ORIG_RAX: 0000000000000001
[ 42.917134] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f138b0afe99
[ 42.919487] RDX: 000000000000002e RSI: 0000000020000c40 RDI: 0000000000000004
[ 42.922393] RBP: 00007f138b799ec0 R08: 00007f138b79a700 R09: 0000000000000000
[ 42.925266] R10: 00007f138b79a700 R11: 0000000000000287 R12: 00007f138b799fc0
[ 42.927570] R13: 0000000000000000 R14: 00007ffdbae757c0 R15: 00007f138b79a9c0
[ 42.930047]
[ 42.932681] Disabling lock debugging due to kernel taint
[ 42.934795] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
[ 42.936939] IP: memcpy_erms+0x6/0x10
[ 42.938864] PGD 80000001bea92067 P4D 80000001bea92067 PUD 1bea96067 PMD 0
[ 42.941576] Oops: 0002 [#1] SMP KASAN PTI
[ 42.943952] CPU: 1 PID: 1044 Comm: resaddr Tainted: G B 4.16.0-rc1-00057-gaa56a5293d7e #34
[ 42.946964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[ 42.952336] RIP: 0010:memcpy_erms+0x6/0x10
[ 42.954707] RSP: 0018:ffff8801c8b479c8 EFLAGS: 00010286
[ 42.957227] RAX: 00000000000000a0 RBX: ffff8801c8b47ba0 RCX: 000000000000001c
[ 42.960543] RDX: 000000000000001c RSI: ffff8801c8b47bbc RDI: 00000000000000a0
[ 42.963867] RBP: ffff8801c8b47b60 R08: 0000000000000000 R09: ffffed0039168ed1
[ 42.967303] R10: 0000000000000001 R11: ffffed0039168ed0 R12: ffff8801c8b47bbc
[ 42.970685] R13: 00000000000000a0 R14: 1ffff10039168f4a R15: 0000000000000000
[ 42.973631] FS: 00007f138b79a700(0000) GS:ffff8801e5d00000(0000) knlGS:0000000000000000
[ 42.976831] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 42.979239] CR2: 00000000000000a0 CR3: 00000001be908002 CR4: 00000000003606a0
[ 42.982060] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 42.984877] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 42.988033] Call Trace:
[ 42.990487] rdma_resolve_addr+0xc8/0xfb0
[ 42.993202] ? deref_stack_reg+0x88/0xd0
[ 42.996055] ? vsnprintf+0x31a/0x770
[ 42.998707] ? rdma_bind_addr+0xc40/0xc40
[ 43.000985] ? num_to_str+0x130/0x130
[ 43.003410] ? deref_stack_reg+0x88/0xd0
[ 43.006302] ? __read_once_size_nocheck.constprop.6+0x10/0x10
[ 43.008780] ? unwind_get_return_address_ptr+0x50/0x50
[ 43.011178] ? replace_slot+0x147/0x170
[ 43.013517] ? delete_node+0x12c/0x340
[ 43.016019] ? __radix_tree_lookup+0xa9/0x160
[ 43.018755] ? ucma_resolve_ip+0xb7/0x110
[ 43.021270] ucma_resolve_ip+0xb7/0x110
[ 43.023968] ? ucma_resolve_addr+0x190/0x190
[ 43.026312] ? _copy_from_user+0x5e/0x90
[ 43.029384] ucma_write+0x174/0x1f0
[ 43.031861] ? ucma_resolve_route+0xf0/0xf0
[ 43.034782] ? dequeue_task_fair+0x67e/0xd90
[ 43.037483] ? put_prev_entity+0x7d/0x170
[ 43.040215] ? ring_buffer_record_is_on+0xd/0x20
[ 43.042990] ? tracing_record_taskinfo_skip+0x20/0x50
[ 43.045595] __vfs_write+0xc4/0x350
[ 43.048624] ? kernel_read+0xa0/0xa0
[ 43.051604] ? firmware_map_remove+0xdf/0xdf
[ 43.055379] ? do_task_dead+0x5d/0x60
[ 43.058000] ? do_exit+0xcc6/0x1220
[ 43.060783] ? __fget+0xa8/0xf0
[ 43.063133] vfs_write+0xf7/0x280
[ 43.065677] SyS_write+0xa1/0x120
[ 43.068647] ? SyS_read+0x120/0x120
[ 43.071179] ? compat_start_thread+0x60/0x60
[ 43.074025] ? SyS_read+0x120/0x120
[ 43.076705] do_syscall_64+0xeb/0x250
[ 43.079006] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 43.081606] RIP: 0033:0x7f138b0afe99
[ 43.083679] RSP: 002b:00007f138b799e98 EFLAGS: 00000287 ORIG_RAX: 0000000000000001
[ 43.086802] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f138b0afe99
[ 43.089989] RDX: 000000000000002e RSI: 0000000020000c40 RDI: 0000000000000004
[ 43.092866] RBP: 00007f138b799ec0 R08: 00007f138b79a700 R09: 0000000000000000
[ 43.096233] R10: 00007f138b79a700 R11: 0000000000000287 R12: 00007f138b799fc0
[ 43.098913] R13: 0000000000000000 R14: 00007ffdbae757c0 R15: 00007f138b79a9c0
[ 43.101809] Code: 90 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48
c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48
89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[ 43.107950] RIP: memcpy_erms+0x6/0x10 RSP: ffff8801c8b479c8
Reported-by: <syzbot+1d8c43206853b369d00c@syzkaller.appspotmail.com>
Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0c81ffc60d5280991773d17e84bda605387148b1 upstream.
Users can provide garbage while calling to ucma_join_ip_multicast(),
it will indirectly cause to rdma_addr_size() return 0, making the
call to ucma_process_join(), which had the right checks, but it is
better to check the input as early as possible.
The following crash from syzkaller revealed it.
kernel BUG at lib/string.c:1052!
invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4113 Comm: syz-executor0 Not tainted 4.16.0-rc5+ #261
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:fortify_panic+0x13/0x20 lib/string.c:1051
RSP: 0018:ffff8801ca81f8f0 EFLAGS: 00010286
RAX: 0000000000000022 RBX: 1ffff10039503f23 RCX: 0000000000000000
RDX: 0000000000000022 RSI: 1ffff10039503ed3 RDI: ffffed0039503f12
RBP: ffff8801ca81f8f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000000 R12: ffff8801ca81f998
R13: ffff8801ca81f938 R14: ffff8801ca81fa58 R15: 000000000000fa00
FS: 0000000000000000(0000) GS:ffff8801db200000(0063) knlGS:000000000a12a900
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000008138024 CR3: 00000001cbb58004 CR4: 00000000001606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
memcpy include/linux/string.h:344 [inline]
ucma_join_ip_multicast+0x36b/0x3b0 drivers/infiniband/core/ucma.c:1421
ucma_write+0x2d6/0x3d0 drivers/infiniband/core/ucma.c:1633
__vfs_write+0xef/0x970 fs/read_write.c:480
vfs_write+0x189/0x510 fs/read_write.c:544
SYSC_write fs/read_write.c:589 [inline]
SyS_write+0xef/0x220 fs/read_write.c:581
do_syscall_32_irqs_on arch/x86/entry/common.c:330 [inline]
do_fast_syscall_32+0x3ec/0xf9f arch/x86/entry/common.c:392
entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7f9ec99
RSP: 002b:00000000ff8172cc EFLAGS: 00000282 ORIG_RAX: 0000000000000004
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020000100
RDX: 0000000000000063 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Code: 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 48 89 df e8 42 2c e3 fb eb de
55 48 89 fe 48 c7 c7 80 75 98 86 48 89 e5 e8 85 95 94 fb <0f> 0b 90 90 90 90
90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56
RIP: fortify_panic+0x13/0x20 lib/string.c:1051 RSP: ffff8801ca81f8f0
Fixes: 5bc2b7b397b0 ("RDMA/ucma: Allow user space to specify AF_IB when joining multicast")
Reported-by: <syzbot+2287ac532caa81900a4e@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a5880b84430316e3e1c1f5d23aa32ec6000cc717 upstream.
The QP state is limited and declared in enum ib_qp_state,
but ucma user was able to supply any possible (u32) value.
Reported-by: syzbot+0df1ab766f8924b1edba@syzkaller.appspotmail.com
Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6a21dfc0d0db7b7e0acedce67ca533a6eb19283c upstream.
Users of ucma are supposed to provide size of option level,
in most paths it is supposed to be equal to u8 or u16, but
it is not the case for the IB path record, where it can be
multiple of struct ib_path_rec_data.
This patch takes simplest possible approach and prevents providing
values more than possible to allocate.
Reported-by: syzbot+a38b0e9f694c379ca7ce@syzkaller.appspotmail.com
Fixes: 7ce86409adcd ("RDMA/ucma: Allow user space to set service type")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
OPA address handle atttibutes that have 32 bit LIDs would have to
be converted to IB address handle attribute with the LID field
programmed in the GID before copying to user space.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add opa_sa_path_rec to sa_path_rec data structure.
The 'type' field in sa_path_rec identifies the
type of the path record.
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Rename ib_sa_path_rec to a more generic sa_path_rec.
This is part of extending ib_sa to also support OPA
path records in addition to the IB defined path records.
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
On Mon, Nov 21, 2016 at 09:52:53AM -0700, Jason Gunthorpe wrote:
> On Mon, Nov 21, 2016 at 02:14:08PM +0200, Leon Romanovsky wrote:
> > >
> > > In ib_ucm_write function there is a wrong prefix:
> > >
> > > + pr_err_once("ucm_write: process %d (%s) tried to do something hinky\n",
> >
> > I did it intentionally to have the same errors for all flows.
>
> Lets actually use a good message too please?
>
> pr_err_once("ucm_write: process %d (%s) changed security contexts after opening FD, this is not allowed.\n",
>
> Jason
>From 70f95b2d35aea42e5b97e7d27ab2f4e8effcbe67 Mon Sep 17 00:00:00 2001
From: Leon Romanovsky <leonro@mellanox.com>
Date: Mon, 21 Nov 2016 13:30:59 +0200
Subject: [PATCH rdma-next V2] IB/{core, qib}: Remove WARN that is not kernel bug
WARNINGs mean kernel bugs, in this case, they are placed
to mark programming errors and/or malicious attempts.
BUG/WARNs that are not kernel bugs hinder automated testing efforts.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces
deprecated create_singlethread_workqueue(). This is the identity
conversion.
The workqueue "close_wq" queues work items &ctx->close_work (maps to
ucma_close_id) and &con_req_eve->close_work (maps to
ucma_close_event_id). It has been identity converted.
WQ_MEM_RECLAIM has been set to ensure forward progress under
memory pressure.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Added UCMA and CMA support for multicast join flags. Flags are
passed using UCMA CM join command previously reserved fields.
Currently supporting two join flags indicating two different
multicast JoinStates:
1. Full Member:
The initiator creates the Multicast group(MCG) if it wasn't
previously created, can send Multicast messages to the group
and receive messages from the MCG.
2. Send Only Full Member:
The initiator creates the Multicast group(MCG) if it wasn't
previously created, can send Multicast messages to the group
but doesn't receive any messages from the MCG.
IB: Send Only Full Member requires a query of ClassPortInfo
to determine if SM/SA supports this option. If SM/SA
doesn't support Send-Only there will be no join request
sent and an error will be returned.
ETH: When Send Only Full Member is requested no IGMP join
will be sent.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl(). This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.
For the immediate repair, detect and deny suspicious accesses to
the write API.
For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).
The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ]
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
1. Replaced printk with appropriate pr_warn, pr_err, pr_info.
2. Removed unnecessary prints around memory allocation failure
which are not required, as reported by the checkpatch script.
Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add support for network namespaces from user space. This is done by passing
the network namespace of the process instead of init_net.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add support for network namespaces in the ib_cma module. This is
accomplished by:
1. Adding network namespace parameter for rdma_create_id. This parameter is
used to populate the network namespace field in rdma_id_private.
rdma_create_id keeps a reference on the network namespace.
2. Using the network namespace from the rdma_id instead of init_net inside
of ib_cma, when listening on an ID and when looking for an ID for an
incoming request.
3. Decrementing the reference count for the appropriate network namespace
when calling rdma_destroy_id.
In order to preserve the current behavior init_net is passed when calling
from other modules.
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Smac and vlan id could be resolved from the GID attribute, and thus
these attributes aren't needed anymore. Removing them.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Allocating a workqueue might fail, which wasn't checked so far and would
lead to NULL ptr derefs when an attempt to use it was made.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Currently, IB/cma remove_one flow blocks until all user descriptor managed by
IB/ucma are released. This prevents hot-removal of IB devices. This patch
allows IB/cma to remove devices regardless of user space activity. Upon getting
the RDMA_CM_EVENT_DEVICE_REMOVAL event we close all the underlying HW resources
for the given ucontext. The ucontext itself is still alive till its explicit
destroying by its creator.
Running applications at that time will have some zombie device, further
operations may fail.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Something like this:
CPU A CPU B
Acked-by: Sean Hefty <sean.hefty@intel.com>
======================== ================================
ucma_destroy_id()
wait_for_completion()
.. anything
ucma_put_ctx()
complete()
.. continues ...
ucma_leave_multicast()
mutex_lock(mut)
atomic_inc(ctx->ref)
mutex_unlock(mut)
ucma_free_ctx()
ucma_cleanup_multicast()
mutex_lock(mut)
kfree(mc)
rdma_leave_multicast(mc->ctx->cm_id,..
Fix it by latching the ref at 0. Once it goes to 0 mc and ctx cannot
leave the mutex(mut) protection.
The other atomic_inc in ucma_get_ctx is OK because mutex(mut) protects
it from racing with ucma_destroy_id.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Destroy multcast_idr on module exit, reclaiming the allocated memory.
This was detected by the following semantic patch (written by Luis Rodriguez
<mcgrof@suse.com>)
<SmPL>
@ defines_module_init @
declarer name module_init, module_exit;
declarer name DEFINE_IDR;
identifier init;
@@
module_init(init);
@ defines_module_exit @
identifier exit;
@@
module_exit(exit);
@ declares_idr depends on defines_module_init && defines_module_exit @
identifier idr;
@@
DEFINE_IDR(idr);
@ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
identifier declares_idr.idr, defines_module_exit.exit;
@@
exit(void)
{
...
idr_destroy(&idr);
...
}
@ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
identifier declares_idr.idr, defines_module_exit.exit;
@@
exit(void)
{
...
+idr_destroy(&idr);
}
</SmPL>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The ucma_lock_files() locks the mut mutex on two files, e.g. for migrating
an ID. Use mutex_lock_nested() to prevent the warning below.
=============================================
[ INFO: possible recursive locking detected ]
4.1.0-rc6-hmm+ #40 Tainted: G O
---------------------------------------------
pingpong_rpc_se/10260 is trying to acquire lock:
(&file->mut){+.+.+.}, at: [<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]
but task is already holding lock:
(&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&file->mut);
lock(&file->mut);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by pingpong_rpc_se/10260:
#0: (&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]
stack backtrace:
CPU: 0 PID: 10260 Comm: pingpong_rpc_se Tainted: G O 4.1.0-rc6-hmm+ #40
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
ffff8801f85b63d0 ffff880195677b58 ffffffff81668f49 0000000000000001
ffffffff825cbbe0 ffff880195677c38 ffffffff810bb991 ffff880100000000
ffff880100000000 ffff880100000001 ffff8801f85b7010 ffffffff8121bee9
Call Trace:
[<ffffffff81668f49>] dump_stack+0x4f/0x6e
[<ffffffff810bb991>] __lock_acquire+0x741/0x1820
[<ffffffff8121bee9>] ? dput+0x29/0x320
[<ffffffff810bcb38>] lock_acquire+0xc8/0x240
[<ffffffffa047ac55>] ? ucma_migrate_id+0xc5/0x248 [rdma_ucm]
[<ffffffff8166b901>] ? mutex_lock_nested+0x291/0x3e0
[<ffffffff8166b6d5>] mutex_lock_nested+0x65/0x3e0
[<ffffffffa047ac55>] ? ucma_migrate_id+0xc5/0x248 [rdma_ucm]
[<ffffffff810baeed>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff8166b66e>] ? mutex_unlock+0xe/0x10
[<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]
[<ffffffffa0478474>] ucma_write+0xa4/0xb0 [rdma_ucm]
[<ffffffff81200674>] __vfs_write+0x34/0x100
[<ffffffff8112427c>] ? __audit_syscall_entry+0xac/0x110
[<ffffffff810ec055>] ? current_kernel_time+0xc5/0xe0
[<ffffffff812aa4d3>] ? security_file_permission+0x23/0x90
[<ffffffff8120088d>] ? rw_verify_area+0x5d/0xe0
[<ffffffff812009bb>] vfs_write+0xab/0x120
[<ffffffff81201519>] SyS_write+0x59/0xd0
[<ffffffff8112427c>] ? __audit_syscall_entry+0xac/0x110
[<ffffffff8166ffee>] system_call_fastpath+0x12/0x76
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
After discussion upstream, it was agreed to transition the usage of iboe
in the kernel to roce. This keeps our terminology consistent with what
was finalized in the IBTA Annex 16 and IBTA Annex 17 publications.
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Introduce helper rdma_cap_ib_sa() to help us check if the port of an
IB device support Infiniband Subnet Administration.
Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Use raw management helpers to reform route related part in IB-core cma.
Signed-off-by: Michael Wang <yun.wang@profitbricks.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When marshaling a user path to the kernel struct ib_sa_path, we need
to zero smac and dmac and set the vlan id to the "no vlan" value.
This is to ensure that Ethernet attributes are not used with
InfiniBand QPs.
Fixes: dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Currently, the IB core and specifically the RDMA-CM assumes that IBoE
(RoCE) gids encode related Ethernet netdevice interface MAC address
and possibly VLAN id.
Change GIDs to be treated as they encode interface IP address.
Since Ethernet layer 2 address parameters are not longer encoded
within gids, we have to extend the Infiniband address structures (e.g.
ib_ah_attr) with layer 2 address parameters, namely mac and vlan.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
'nes', 'ocrdma', 'qib' and 'srp' into for-next
|
|
This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Problem reported by Avneesh Pant <avneesh.pant@oracle.com>:
It looks like we are triggering a bug in RDMA CM/UCM interaction.
The bug specifically hits when we have an incoming connection
request and the connecting process dies BEFORE the passive end of
the connection can process the request i.e. it does not call
rdma_get_cm_event() to retrieve the initial connection event. We
were able to triage this further and have some additional
information now.
In the example below when P1 dies after issuing a connect request
as the CM id is being destroyed all outstanding connects (to P2)
are sent a reject message. We see this reject message being
received on the passive end and the appropriate CM ID created for
the initial connection message being retrieved in cm_match_req().
The problem is in the ucma_event_handler() code when this reject
message is delivered to it and the initial connect message itself
HAS NOT been delivered to the client. In fact the client has not
even called rdma_cm_get_event() at this stage so we haven't
allocated a new ctx in ucma_get_event() and updated the new
connection CM_ID to point to the new UCMA context.
This results in the reject message not being dropped in
ucma_event_handler() for the new connection request as the
(if (!ctx->uid)) block is skipped since the ctx it refers to is
the listen CM id context which does have a valid UID associated
with it (I believe the new CMID for the connection initially
uses the listen CMID -> context when it is created in
cma_new_conn_id). Thus the assumption that new events for a
connection can get dropped in ucma_event_handler() is incorrect
IF the initial connect request has not been retrieved in the
first case. We end up getting a CM Reject event on the listen CM
ID and our upper layer code asserts (in fact this event does not
even have the listen_id set as that only gets set up librdmacm
for connect requests).
The solution is to verify that the cm_id being reported in the event
is the same as the cm_id referenced by the ucma context. A mismatch
indicates that the ucma context corresponds to the listen. This fix
was validated by using a modified version of librdmacm that was able
to verify the problem and see that the reject message was indeed
dropped after this patch was applied.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Allow user space applications to join multicast groups using MGIDs
directly. MGIDs may be passed using AF_IB addresses. Since the
current multicast join command only supports addresses as large as
sockaddr_in6, define a new structure for joining addresses specified
using sockaddr_ib.
Since AF_IB allows the user to specify the qkey when resolving a
remote UD QP address, when joining the multicast group use the qkey
value, if one has been assigned.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Allow user space applications to call resolve_addr using AF_IB. To
support sockaddr_ib, we need to define a new structure capable of
handling the larger address size.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Support user space binding to addresses using AF_IB. Since
sockaddr_ib is larger than sockaddr_in6, we need to define a larger
structure when binding using AF_IB. This time we use sockaddr_storage
to cover future cases.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Several commands into the RDMA CM from user space are restricted to
supporting addresses which fit into a sockaddr_in6 structure: bind
address, resolve address, and join multicast.
With the addition of AF_IB, we need to support addresses which are
larger than sockaddr_in6. This will be done by adding new commands
that exchange address information using sockaddr_storage. However, to
support existing applications, we maintain the current commands and
structures, but rename them to indicate that they only support IPv4
and v6 addresses.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Part of address resolution is mapping IP addresses to IB GIDs. With
the changes to support querying larger addresses and more path records,
also provide a way to query IB GIDs after resolution completes.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
The current query_route call can return up to two path records. The
assumption being that one is the primary path, with optional support
for an alternate path. In both cases, the paths are assumed to be
reversible and are used to send CM MADs.
With the ability to manually set IB path data, the rdma cm can
eventually be capable of using up to 6 paths per connection:
forward primary, reverse primary,
forward alternate, reverse alternate,
reversible primary path for CM MADs
reversible alternate path for CM MADs.
(It is unclear at this time if IB routing will complicate this) In
order to handle more flexible routing topologies, add a new command to
report any number of paths.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
The sockaddr structure for AF_IB is larger than sockaddr_in6. The
rdma cm user space ABI uses the latter to exchange address information
between user space and the kernel.
To support querying for larger addresses, define a new query command
that exchanges data using sockaddr_storage, rather than sockaddr_in6.
Unlike the existing query_route command, the new command only returns
address information. Route (i.e. path record) data is separated.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Allow the user to specify the qkey when using AF_IB. The qkey is
added to struct rdma_ucm_conn_param in place of a reserved field, but
for backwards compatability, is only accessed if the associated
rdma_cm_id is using AF_IB.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|