summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/mlx5/mr.c
AgeCommit message (Collapse)AuthorFilesLines
2020-01-17RDMA/mlx5: Return proper error valueLeon Romanovsky1-1/+1
commit 546d30099ed204792083f043cd7e016de86016a3 upstream. Returned value from mlx5_mr_cache_alloc() is checked to be error or real pointer. Return proper error code instead of NULL which is not checked later. Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support") Link: https://lore.kernel.org/r/20191029055721.7192-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-06IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cacheYishai Hadas1-1/+3
commit 9ec4483a3f0f71a228a5933bc040441322bfb090 upstream. Fix unreg_umr to move the MR to a kernel owned PD (i.e. the UMR PD) which can't be accessed by userspace. This ensures that nothing can continue to access the MR once it has been placed in the kernels cache for reuse. MRs in the cache continue to have their HW state, including DMA tables, present. Even though the MR has been invalidated, changing the PD provides an additional layer of protection against use of the MR. Link: https://lore.kernel.org/r/20190723065733.4899-5-leon@kernel.org Cc: <stable@vger.kernel.org> # 3.10 Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-06IB/mlx5: Use direct mkey destroy command upon UMR unreg failureYishai Hadas1-5/+8
commit afd1417404fba6dbfa6c0a8e5763bd348da682e4 upstream. Use a direct firmware command to destroy the mkey in case the unreg UMR operation has failed. This prevents a case that a mkey will leak out from the cache post a failure to be destroyed by a UMR WR. In case the MR cache limit didn't reach a call to add another entry to the cache instead of the destroyed one is issued. In addition, replaced a warn message to WARN_ON() as this flow is fatal and can't happen unless some bug around. Link: https://lore.kernel.org/r/20190723065733.4899-4-leon@kernel.org Cc: <stable@vger.kernel.org> # 4.10 Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-06IB/mlx5: Fix unreg_umr to ignore the mkey stateYishai Hadas1-2/+2
commit 6a053953739d23694474a5f9c81d1a30093da81a upstream. Fix unreg_umr to ignore the mkey state and do not fail if was freed. This prevents a case that a user space application already changed the mkey state to free and then the UMR operation will fail leaving the mkey in an inappropriate state. Link: https://lore.kernel.org/r/20190723065733.4899-3-leon@kernel.org Cc: <stable@vger.kernel.org> # 3.19 Fixes: 968e78dd9644 ("IB/mlx5: Enhance UMR support to allow partial page table update") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13IB/mlx5: Fix MR cache initializationArtemy Kovalyov1-1/+1
commit 013c2403bf32e48119aeb13126929f81352cc7ac upstream. Schedule MR cache work only after bucket was initialized. Cc: <stable@vger.kernel.org> # 4.10 Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-09RDMA/mlx5: Fix multiple NULL-ptr deref errors in rereg_mr flowLeon Romanovsky1-9/+23
commit b4bd701ac469075d94ed9699a28755f2862252b9 upstream. Failure in rereg MR releases UMEM but leaves the MR to be destroyed by the user. As a result the following scenario may happen: "create MR -> rereg MR with failure -> call to rereg MR again" and hit "NULL-ptr deref or user memory access" errors. Ensure that rereg MR is only performed on a non-dead MR. Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 4.5 Fixes: 395a8e4c32ea ("IB/mlx5: Refactoring register MR code") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24RDMA/mlx5: Protect from NULL pointer derefenceLeon Romanovsky1-0/+2
commit 4289861d88d6c7b5e4c8cc7fe2ad6cdf0cdfc366 upstream. The mlx5_ib_alloc_implicit_mr() can fail to acquire pages and the returned mr pointer won't be valid. Ensure that it is not error prior to access. Cc: <stable@vger.kernel.org> # 4.10 Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28RDMA/mlx5: Fix crash while accessing garbage pointer and freed memoryLeon Romanovsky1-4/+8
commit f3f134f5260ae9ee1f5a4d0a8cc625c6c77655b4 upstream. The failure in rereg_mr flow caused to set garbage value (error value) into mr->umem pointer. This pointer is accessed at the release stage and it causes to the following crash. There is not enough to simply change umem to point to NULL, because the MR struct is needed to be accessed during MR deregistration phase, so delay kfree too. [ 6.237617] BUG: unable to handle kernel NULL pointer dereference a 0000000000000228 [ 6.238756] IP: ib_dereg_mr+0xd/0x30 [ 6.239264] PGD 80000000167eb067 P4D 80000000167eb067 PUD 167f9067 PMD 0 [ 6.240320] Oops: 0000 [#1] SMP PTI [ 6.240782] CPU: 0 PID: 367 Comm: dereg Not tainted 4.16.0-rc1-00029-gc198fafe0453 #183 [ 6.242120] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 6.244504] RIP: 0010:ib_dereg_mr+0xd/0x30 [ 6.245253] RSP: 0018:ffffaf5d001d7d68 EFLAGS: 00010246 [ 6.246100] RAX: 0000000000000000 RBX: ffff95d4172daf00 RCX: 0000000000000000 [ 6.247414] RDX: 00000000ffffffff RSI: 0000000000000001 RDI: ffff95d41a317600 [ 6.248591] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 [ 6.249810] R10: ffff95d417033c10 R11: 0000000000000000 R12: ffff95d4172c3a80 [ 6.251121] R13: ffff95d4172c3720 R14: ffff95d4172c3a98 R15: 00000000ffffffff [ 6.252437] FS: 0000000000000000(0000) GS:ffff95d41fc00000(0000) knlGS:0000000000000000 [ 6.253887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.254814] CR2: 0000000000000228 CR3: 00000000172b4000 CR4: 00000000000006b0 [ 6.255943] Call Trace: [ 6.256368] remove_commit_idr_uobject+0x1b/0x80 [ 6.257118] uverbs_cleanup_ucontext+0xe4/0x190 [ 6.257855] ib_uverbs_cleanup_ucontext.constprop.14+0x19/0x40 [ 6.258857] ib_uverbs_close+0x2a/0x100 [ 6.259494] __fput+0xca/0x1c0 [ 6.259938] task_work_run+0x84/0xa0 [ 6.260519] do_exit+0x312/0xb40 [ 6.261023] ? __do_page_fault+0x24d/0x490 [ 6.261707] do_group_exit+0x3a/0xa0 [ 6.262267] SyS_exit_group+0x10/0x10 [ 6.262802] do_syscall_64+0x75/0x180 [ 6.263391] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 6.264253] RIP: 0033:0x7f1b39c49488 [ 6.264827] RSP: 002b:00007ffe2de05b68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 6.266049] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1b39c49488 [ 6.267187] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000 [ 6.268377] RBP: 00007f1b39f258e0 R08: 00000000000000e7 R09: ffffffffffffff98 [ 6.269640] R10: 00007f1b3a147260 R11: 0000000000000246 R12: 00007f1b39f258e0 [ 6.270783] R13: 00007f1b39f2ac20 R14: 0000000000000000 R15: 0000000000000000 [ 6.271943] Code: 74 07 31 d2 e9 25 d8 6c 00 b8 da ff ff ff c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 07 53 48 8b 5f 08 <48> 8b 80 28 02 00 00 e8 f7 d7 6c 00 85 c0 75 04 3e ff 4b 18 5b [ 6.274927] RIP: ib_dereg_mr+0xd/0x30 RSP: ffffaf5d001d7d68 [ 6.275760] CR2: 0000000000000228 [ 6.276200] ---[ end trace a35641f1c474bd20 ]--- Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-19IB/mlx5: revisit -Wmaybe-uninitialized warningArnd Bergmann1-0/+3
[ Upstream commit 1b19b95169cd52fe82cd442fec0b279fe25cc838 ] A warning that I thought I had fixed before occasionally comes back in rare randconfig builds (I found 7 instances in the last 100000 builds, originally it was much more frequent): drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_ib_reg_user_mr': drivers/infiniband/hw/mlx5/mr.c:1229:5: error: 'order' may be used uninitialized in this function [-Werror=maybe-uninitialized] if (order <= mr_cache_max_order(dev)) { ^ drivers/infiniband/hw/mlx5/mr.c:1247:8: error: 'ncont' may be used uninitialized in this function [-Werror=maybe-uninitialized] drivers/infiniband/hw/mlx5/mr.c:1247:8: error: 'page_shift' may be used uninitialized in this function [-Werror=maybe-uninitialized] drivers/infiniband/hw/mlx5/mr.c:1260:2: error: 'npages' may be used uninitialized in this function [-Werror=maybe-uninitialized] I've looked at all those findings again and noticed that they are all with CONFIG_INFINIBAND_USER_MEM=n, which means ib_umem_get() returns an error unconditionally and we never initialize or use those variables. This triggers a condition in gcc iff mr_umem_get() is partially but not entirely inlined, which in turn depends on the exact combination of optimization settings. This is a known problem with gcc, with no easy solution in the compiler, so this adds another workaround that should be more reliable than my previous attempt. Returning an error from mlx5_ib_reg_user_mr() earlier means that we can completely bypass the logic that caused the warning, the compiler can now see that the variable is never accessed. Fixes: 14ab8896f5d9 ("IB/mlx5: avoid bogus -Wmaybe-uninitialized warning") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-15IB/mlx5: Fix incorrect size of klms in the memory regionSergey Gorenko1-1/+1
commit da343b6d90e11132f1e917d865d88ee35d6e6d00 upstream. The value of mr->ndescs greater than mr->max_descs is set in the function mlx5_ib_sg_to_klms() if sg_nents is greater than mr->max_descs. This is an invalid value and it causes the following error when registering mr: mlx5_0:dump_cqe:276:(pid 193): dump error cqe 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 0f 00 78 06 25 00 00 8b 08 1e 8f d3 Cc: <stable@vger.kernel.org> # 4.5 Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support") Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03IB/mlx5: Fix mlx5_ib_alloc_mr error flowNitzan Carmi1-0/+1
[ Upstream commit 45e6ae7ef21b907dacb18da62d5787d74a31d860 ] ibmr.device is being set only after ib_alloc_mr() is (successfully) complete. Therefore, in case mlx5_core_create_mkey() return with error, the error flow calls mlx5_free_priv_descs() which uses ibmr.device (which doesn't exist yet), causing a NULL dereference oops. To fix this, the IB device should be set in the mr struct earlier stage (e.g. prior to calling mlx5_core_create_mkey()). Fixes: 8a187ee52b04 ("IB/mlx5: Support the new memory registration API") Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-25IB/mlx5: Fix NULL deference on mlx5_ib_update_xlt failureIlya Lesokhin1-10/+17
mlx5_ib_reg_user_mr called mlx5_ib_dereg_mr in case of MR population failure. This resulted in a NULL dereference as ibmr->device wasn't initialized yet. We address this by adding an internal dereg_mr function that can handle partially initialized MRs, and fixing clean_mr to work on partially initialized MRs. Fixes: ff740aefecb9 ("IB/mlx5: Decouple MR allocation and population flows") Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-25IB/mlx5: Fix integer overflow when page_shift == 31Ilya Lesokhin1-4/+5
Fix a bug where MR registration fails when mlx5_ib_cont_pages indicates that the MR can be mapped using 2GB pages (page_shift == 31). Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-25IB/mlx5: Fix memory leak in clean_mr error pathKamal Heib1-4/+4
In clean_mr error path the 'mr' should be freed. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-25IB/mlx5: Decouple MR allocation and population flowsIlya Lesokhin1-30/+45
mlx5 compatible devices have two ways of populating the MTT table of an MKEY: using a FW command and using a UMR WQE. A UMR is much faster, so it should be used whenever possible. Unfortunately the code today uses UMR only if the MKEY was allocated from the MR cache. Fix the code to use UMR even for MKEYs that were allocated using a FW command. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-25IB/mlx5: Enable UMR for MRs created with reg_createIlya Lesokhin1-19/+14
This patch is the first step in decoupling UMR usage and allocation from the MR cache. The only functional change in this patch is to enables UMR for MRs created with reg_create. This change fixes a bug where ODP memory regions that were not allocated from the MR cache did not have UMR enabled. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24IB/mlx5: Fix cached MR allocation flowMajd Dibbiny1-5/+13
When we have a miss in one order of the mkey cache, we try to get an mkey from a higher order. We still need to check that the higher order can be used with UMR before using it. Otherwise, we will get an mkey with 0 entries and the post send operation that is used to fill it will complete with the following error: mlx5_0:dump_cqe:275:(pid 0): dump error cqe 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0f007806 25000025 49ce59d2 Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib") Cc: <stable@vger.kernel.org> # v4.10+ Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20IB/mlx5: Fix a warning messageDan Carpenter1-1/+1
"umem" is a valid pointer. We intended to print "*umem" or even just "err" instead. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-18IB/mlx5: Clean mr_cache debugfs in case of failureLeon Romanovsky1-12/+22
The failure in creation of debugfs entries for mr_cache left entries, which were already created. It caused to mismatch and misguiding for the end users. The solution is to clean mr_cache debugfs root, so no leftovers will be in the system. In addition, let's document why the error is not needed to be forwarded to user in case of failure. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] arrayBart Van Assche1-1/+1
ib_map_mr_sg() can pass an SG-list to .map_mr_sg() that is larger than what fits into a single MR. .map_mr_sg() must not attempt to map more SG-list elements than what fits into a single MR. Hence make sure that mlx5_ib_sg_to_klms() does not write outside the MR klms[] array. Fixes: b005d3164713 ("mlx5: Add arbitrary sg list support") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Israel Rukshin <israelr@mellanox.com> Cc: <stable@vger.kernel.org> Acked-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-14{net, IB}/mlx5: Replace mlx5_vzalloc with kvzallocLeon Romanovsky1-1/+1
Commit a7c3e901a46f ("mm: introduce kv[mz]alloc helpers") added proper implementation of mlx5_vzalloc function to the MM core. This made the mlx5_vzalloc function useless, so let's remove it. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-01mlx5: Fix mlx5_ib_map_mr_sg mr lengthSagi Grimberg1-1/+1
In case we got an initial sg_offset, we need to account for it in the mr length. Cc: stable@vger.kernel.org Fixes: ff2ba9936591 ("IB/core: Add passing an offset into the SG to ib_map_mr_sg") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25IB/mlx5: Add ODP support to MWArtemy Kovalyov1-0/+1
Internally MW implemented as KLM MKey and filled by userspace UMR postsends. Handle pagefault trigered by operations on this MKeys. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25IB/mlx5: Fix UMR size calculationArtemy Kovalyov1-1/+2
Translation table updates of large UMR may require multiple post send operations. The last operations can be in various lengths, but current code set them to be the same length. Fixes: 7d0cc6edcc70 ('IB/mlx5: Add MR cache for large UMR regions') Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25IB/mlx5: Fix function updating xlt emergency pathArtemy Kovalyov1-1/+1
In memory shortage path we fall back to use spare buffer. mlx5_ib_update_xlt() called from ib_uverbs_reg_mr when ibmr.ucontext not initialized yet. Scenario how to test it: 1. trigger memory exhaustion so __get_free_pages(GFP_KERNEL, 4) will fail 2. register MR 3. there should be no kernel oops Fixes: 7d0cc6edcc70 ('IB/mlx5: Add MR cache for large UMR regions') Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-26Merge tag 'for-next-dma_ops' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma DMA mapping updates from Doug Ledford: "Drop IB DMA mapping code and use core DMA code instead. Bart Van Assche noted that the ib DMA mapping code was significantly similar enough to the core DMA mapping code that with a few changes it was possible to remove the IB DMA mapping code entirely and switch the RDMA stack to use the core DMA mapping code. This resulted in a nice set of cleanups, but touched the entire tree and has been kept separate for that reason." * tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits) IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it IB/core: Remove ib_device.dma_device nvme-rdma: Switch from dma_device to dev.parent RDS: net: Switch from dma_device to dev.parent IB/srpt: Modify a debug statement IB/srp: Switch from dma_device to dev.parent IB/iser: Switch from dma_device to dev.parent IB/IPoIB: Switch from dma_device to dev.parent IB/rxe: Switch from dma_device to dev.parent IB/vmw_pvrdma: Switch from dma_device to dev.parent IB/usnic: Switch from dma_device to dev.parent IB/qib: Switch from dma_device to dev.parent IB/qedr: Switch from dma_device to dev.parent IB/ocrdma: Switch from dma_device to dev.parent IB/nes: Remove a superfluous assignment statement IB/mthca: Switch from dma_device to dev.parent IB/mlx5: Switch from dma_device to dev.parent IB/mlx4: Switch from dma_device to dev.parent IB/i40iw: Remove a superfluous assignment statement IB/hns: Switch from dma_device to dev.parent ...
2017-02-14IB/mlx5: Add implicit MR supportArtemy Kovalyov1-5/+28
Add implicit MR, covering entire user address space. The MR is implemented as an indirect KSM MR consisting of 1GB direct MRs. Pages and direct MRs are added/removed to MR by ODP. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-14IB/mlx5: Expose MR cache for mlx5_ibArtemy Kovalyov1-27/+72
Allow other parts of mlx5_ib to use MR cache mechanism. * Add new functions mlx5_mr_cache_alloc and mlx5_mr_cache_free * Traditional MTT MKey buckets are limited by MAX_UMR_CACHE_ENTRY Additinal buckets may be added above. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-24IB/mlx5: Switch from dma_device to dev.parentBart Van Assche1-7/+7
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Matan Barak <matanb@mellanox.com> Cc: Leon Romanovsky <leonro@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-02IB/mlx5: Improve MR checkArtemy Kovalyov1-0/+5
Add "type" field to mlx5_core MKEY struct. Check whether page fault happens on MKEY corresponding to MR. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02IB/mlx5: Add MR cache for large UMR regionsArtemy Kovalyov1-231/+155
In this change we turn mlx5_ib_update_mtt() into generic mlx5_ib_update_xlt() to perfrom HCA translation table modifiactions supporting both atomic and process contexts and not limited by number of modified entries. Using this function we increase preallocated MRs up to 16GB. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02IB/mlx5: Add support for big MRsArtemy Kovalyov1-3/+5
Make use of extended UMR translation offset. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02IB/mlx5: Refactor UMR post send formatArtemy Kovalyov1-24/+26
* Update struct mlx5_wqe_umr_ctrl_seg. * Currenlty UMR send_flags aim only certain use cases: enabled/disable cached MR, modifying XLT for ODP. By making flags independent make UMR more flexible allowing arbitrary manipulations. * Since different UMR formats have different entry sizes UMR request should receive exact size of translation table update instead of number of entries. Rename field npages to xlt_size in struct mlx5_umr_wr and update relevant code accordingly. * Add support of length64 bit. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02IB/mlx5: Add helper mlx5_ib_post_send_waitBinoy Jayan1-83/+32
Clean up the following common code (to post a list of work requests to the send queue of the specified QP) at various places and add a helper function 'mlx5_ib_post_send_wait' to implement the same. - Initialize 'mlx5_ib_umr_context' on stack - Assign "mlx5_umr_wr:wr:wr_cqe to umr_context.cqe - Acquire the semaphore - call ib_post_send with a single ib_send_wr - wait_for_completion() - Check for umr_context.status - Release the semaphore Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-15Merge tag 'for-linus' of ↵Linus Torvalds1-19/+52
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is the complete update for the rdma stack for this release cycle. Most of it is typical driver and core updates, but there is the entirely new VMWare pvrdma driver. You may have noticed that there were changes in DaveM's pull request to the bnxt Ethernet driver to support a RoCE RDMA driver. The bnxt_re driver was tentatively set to be pulled in this release cycle, but it simply wasn't ready in time and was dropped (a few review comments still to address, and some multi-arch build issues like prefetch() not working across all arches). Summary: - shared mlx5 updates with net stack (will drop out on merge if Dave's tree has already been merged) - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe - debug cleanups - new connection rejection helpers - SRP updates - various misc fixes - new paravirt driver from vmware" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits) IB: Add vmw_pvrdma driver IB/mlx4: fix improper return value IB/ocrdma: fix bad initialization infiniband: nes: return value of skb_linearize should be handled MAINTAINERS: Update Intel RDMA RNIC driver maintainers MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers IB/core: fix unmap_sg argument qede: fix general protection fault may occur on probe IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc mlx5, calc_sq_size(): Make a debug message more informative mlx5: Remove a set-but-not-used variable mlx5: Use { } instead of { 0 } to init struct IB/srp: Make writing the add_target sysfs attr interruptible IB/srp: Make mapping failures easier to debug IB/srp: Make login failures easier to debug IB/srp: Introduce a local variable in srp_add_one() IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build IB/multicast: Check ib_find_pkey() return value IPoIB: Avoid reading an uninitialized member variable IB/mad: Fix an array index check ...
2016-12-14IB/mlx5: avoid bogus -Wmaybe-uninitialized warningArnd Bergmann1-18/+21
We get a false-positive warning in linux-next for the mlx5 driver: infiniband/hw/mlx5/mr.c: In function ‘mlx5_ib_reg_user_mr’: infiniband/hw/mlx5/mr.c:1172:5: error: ‘order’ may be used uninitialized in this function [-Werror=maybe-uninitialized] infiniband/hw/mlx5/mr.c:1161:6: note: ‘order’ was declared here infiniband/hw/mlx5/mr.c:1173:6: error: ‘ncont’ may be used uninitialized in this function [-Werror=maybe-uninitialized] infiniband/hw/mlx5/mr.c:1160:6: note: ‘ncont’ was declared here infiniband/hw/mlx5/mr.c:1173:6: error: ‘page_shift’ may be used uninitialized in this function [-Werror=maybe-uninitialized] infiniband/hw/mlx5/mr.c:1158:6: note: ‘page_shift’ was declared here infiniband/hw/mlx5/mr.c:1143:13: error: ‘npages’ may be used uninitialized in this function [-Werror=maybe-uninitialized] infiniband/hw/mlx5/mr.c:1159:6: note: ‘npages’ was declared here I had a trivial workaround for gcc-5 or higher, but that didn't work on gcc-4.9 unfortunately. The only way I found to avoid the warnings for gcc-4.9, short of initializing each of the arguments first was to change the calling conventions to separate the error code from the umem pointer. This avoids casting the error codes from one pointer to another incompatible pointer, and lets gcc figure out when that the data is actually valid whenever we return successfully. Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13IB/mlx5: Avoid system crash when enabling many VFsEli Cohen1-1/+2
When enabling many VFs, the total amount of DMA mappings increase significantly. This causes DMA allocations to take a lot of time since they are serialized in the kernel. As a result the driver enters into fatal condition due to timeout and the system hangs. To recover from this we disable MR cache for VFs. PFs will still have a full cache and VFs cache can be manipulated as usual after driver load. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-17IB/mlx5: Limit mkey page size to 2GBMajd Dibbiny1-1/+2
The maximum page size in the mkey context is 2GB. Until today, we didn't enforce this requirement in the code, and therefore, if we got a page size larger than 2GB, we have passed zeros in the log_page_shift instead of the actual value and the registration failed. This patch limits the driver to use compound pages of 2GB for mkeys. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-17IB/mlx5: Wait for all async command completions to completeEli Cohen1-0/+28
Wait before continuing unload till all pending mkey async creation requests are done. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-17IB/mlx5: Resolve soft lock on massive reg MRsMoshe Lazer1-1/+5
When calling reg_mr of large MRs (e.g. 4GB) from multiple processes and MR caches can't supply the required amount of MRs the slow-path of MR allocation may be used. In this case we need to serialize the slow-path between the processes to avoid soft lock. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/mlx5: Move and decouple user vendor structuresLeon Romanovsky1-1/+0
This patch decouples and moves vendors specific structures to common UAPI folder which will be visible to all consumers. These structures are used by user-space library driver (libmlx5) and currently manually copied to that library. This move will allow cross-compile against these files and simplify introduction of vendor specific data. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/mlx5: Remove deprecated create_singlethread_workqueueBhaktipriya Shridhar1-1/+1
alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "cache->wq" queues work items &ent->work (maps to cache_work_func) and &ent->dwork(maps to delayed_cache_work_func). It has been identity converted. WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-14{net,IB}/mlx5: MKey/PSV commands via mlx5 ifcSaeed Mahameed1-81/+103
Remove old representation of manually created MKey/PSV commands layout, and use mlx5_ifc canonical structures and defines. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-06-23IB/mlx5: Reset flow support for IB kernel ULPsMaor Gottlieb1-0/+4
The driver exposes interfaces that directly relate to HW state. Upon fatal error, consumers of these interfaces (ULPs) that rely on completion of all their posted work-request could hang, thereby introducing dependencies in shutdown order. To prevent this from happening, we manage the relevant resources (CQs, QPs) that are used by the device. Upon a fatal error, we now generate simulated completions for outstanding WQEs that were not completed at the time the HW was reset. It includes invoking the completion event handler for all involved CQs so that the ULPs will poll those CQs. When polled we return simulated CQEs with IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and not wait forever for completions upon receiving remove_one. The above change requires an extra check in the data path to make sure that when device is in error state, the simulated CQEs will be returned and no further WQEs will be posted. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: Enhance ib_map_mr_sg()Bart Van Assche1-2/+6
The SRP initiator allows to set max_sectors to a value that exceeds the largest amount of data that can be mapped at once with an mlx4 HCA using fast registration and a page size of 4 KB. Hence modify ib_map_mr_sg() such that it can map partial sg-elements. If an sg-element has been mapped partially, let the caller know which fraction has been mapped by adjusting *sg_offset. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: Add passing an offset into the SG to ib_map_mr_sgChristoph Hellwig1-9/+12
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-04mlx5: Add arbitrary sg list supportSagi Grimberg1-6/+44
Allocate proper context for arbitrary scatterlist registration If ib_alloc_mr is called with IB_MR_MAP_ARB_SG, the driver allocate a private klm list instead of a private page list. Set the UMR wqe correctly when posting the fast registration. Also, expose device cap IB_DEVICE_MAP_ARB_SG according to the device id (until we have a FW bit that correctly exposes it). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-04IB/mlx5: Make coding style more consistentDoug Ledford1-4/+2
These three related functions can't agree whether to put the umrwr on the stack dirty and then memset it, or to initialize it on the stack. Make them all agree. Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-04IB/mlx5: Convert UMR CQ to new CQ APIChristoph Hellwig1-26/+23
Simplifies the code, and makes it more fair vs other users by using a softirq for polling. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-01IB/mlx5: Add memory windows allocation supportMatan Barak1-0/+83
This patch adds user-space support for memory windows allocation and deallocation. It also exposes the supported types via query_device_caps verb. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Tested-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>