path: root/drivers/infiniband
Age | Commit message | Author | Files | Lines
2020-09-09 | RDMA/umem: Replace for_each_sg_dma_page with rdma_umem_for_each_dma_block | Jason Gunthorpe | 4 | -15/+13
Generally, drivers should be using this core helper to split up the umem into DMA pages. These drivers are all probably wrong in some way when they pass PAGE_SIZE in as the HW page size. Either the driver doesn't support other page sizes and it should use 4096, or the driver does support other page sizes and should use ib_umem_find_best_pgsz() to select the best HW page size from the HW-supported set. The only case where it could be correct is if the HW has a global setting for PAGE_SIZE set at driver initialization time. Link: https://lore.kernel.org/r/5-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
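A rough sketch of the recommended pattern (the device struct, page-size bitmap field and MTT setter below are hypothetical, not from any in-tree driver): let ib_umem_find_best_pgsz() pick the HW page size, then walk the umem with rdma_umem_for_each_dma_block():

static int example_write_mtt(struct example_dev *dev, struct ib_umem *umem,
                             u64 iova)
{
        struct ib_block_iter biter;
        unsigned long pgsz;

        /* largest HW-supported page size that can map this umem at iova */
        pgsz = ib_umem_find_best_pgsz(umem, dev->supported_pgsz_bitmap, iova);
        if (!pgsz)
                return -EINVAL;

        rdma_umem_for_each_dma_block(umem, &biter, pgsz) {
                /* one HW page-table entry per DMA block */
                example_set_mtt_entry(dev, rdma_block_iter_dma_address(&biter));
        }
        return 0;
}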
2020-09-09 | RDMA/umem: Add rdma_umem_for_each_dma_block() | Jason Gunthorpe | 4 | -7/+4
This helper does the same as rdma_for_each_block(), except it works on a umem. This simplifies most of the call sites. Link: https://lore.kernel.org/r/4-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz() | Jason Gunthorpe | 1 | -3/+8
The calculation in rdma_find_pg_bit() is fairly complicated, and the function is never called anywhere else. Inline a simpler version into ib_umem_find_best_pgsz(). Link: https://lore.kernel.org/r/3-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz() | Jason Gunthorpe | 1 | -0/+6
rdma_for_each_block() makes assumptions about how the SGL is constructed that don't work if the block size is below the page size used to build the SGL. The rules for umem SGL construction require that the SGs all be PAGE_SIZE aligned and that we don't encode the actual byte offset of the VA range inside the SGL using offset and length. So rdma_for_each_block() has no idea where the actual starting/ending point is to compute the first/last block boundary if the starting address falls within an SGL. Fixing the SGL construction turns out to be really hard, and will be the subject of other patches. For now, block smaller pages. Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR") Link: https://lore.kernel.org/r/2-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | RDMA/umem: Fix ib_umem_find_best_pgsz() for mappings that cross a page boundary | Jason Gunthorpe | 1 | -2/+7
It is possible for a single SGL to span an aligned boundary, e.g. if the SGL is 61440 -> 90112, then the length is 28672, which currently limits the block size to 32k. With a 32k page size the two covering blocks will be 32768->65536 and 65536->98304. However, the correct answer is a 128K block size, which will span the whole 28672 bytes in a single block. Instead of limiting based on length, figure out which high IOVA bits don't change between the start and end addresses. That is the highest useful page size. Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR") Link: https://lore.kernel.org/r/1-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
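The bit arithmetic can be checked in isolation; a sketch (not the kernel code) of "smallest single aligned block covering both ends", using the example above:

#include <linux/bitops.h>

/* For start = 61440 and last = 90111 (the last byte of the 28672-byte range),
 * the highest differing bit is bit 16, so the smallest aligned block that
 * contains both addresses is 1 << 17 = 128K, matching the description above.
 * Assumes start != last. */
static unsigned long min_covering_block(u64 start, u64 last)
{
        return 1UL << fls64(start ^ last);
}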
2020-09-09 | RDMA: Make counters destroy symmetrical | Leon Romanovsky | 2 | -2/+5
Change counters to return failure like any other verbs destroy; however, this flow shouldn't return an error at all. Link: https://lore.kernel.org/r/20200907120921.476363-10-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | RDMA: Restore ability to return error for destroy WQ | Leon Romanovsky | 8 | -16/+25
Make this interface symmetrical to other destroy paths. Fixes: a49b1dc7ae44 ("RDMA: Convert destroy_wq to be void") Link: https://lore.kernel.org/r/20200907120921.476363-9-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | RDMA: Change XRCD destroy return value | Leon Romanovsky | 4 | -6/+11
Update XRCD destroy flow to allow command failure. Fixes: 28ad5f65c314 ("RDMA: Move XRCD to be under ib_core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | RDMA: Allow fail of destroy CQ | Leon Romanovsky | 30 | -35/+62
Like any other verbs object, a CQ shouldn't fail during destroy, but mlx5_ib didn't follow this contract when it mixed IB verbs objects with DEVX. Such a mix leads to a situation where FW and kernel are fully interdependent on each other's reference counting. Kernel verbs and drivers that don't have DEVX flows shouldn't fail. Fixes: e39afe3d6dbd ("RDMA: Convert CQ allocations to be under core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | RDMA/core: Delete function indirection for alloc/free kernel CQ | Leon Romanovsky | 1 | -15/+12
The ib_alloc_cq*() and ib_free_cq*() helpers are solely kernel verbs to manage CQs and don't need extra indirection just to call the same functions with a constant NULL udata parameter. Link: https://lore.kernel.org/r/20200907120921.476363-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | RDMA: Restore ability to fail on SRQ destroy | Leon Romanovsky | 26 | -45/+54
In a similar way to other IB objects, restore the ability to return an error on SRQ destroy. Strictly speaking, this change is not necessary, and is provided here to ensure a symmetrical interface like other destroy functions. Fixes: 68e326dea1db ("RDMA: Handle SRQ allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | RDMA/mlx5: Issue FW command to destroy SRQ on reentry | Leon Romanovsky | 1 | -3/+12
The HW release can fail and leave the system in a limbo state, where the SRQ is removed from the table but can't be destroyed later. On every reentry, the initial xa_erase_irq() check will fail. Rewrite the erase logic to keep the index but not store the entry itself. By doing this, we can safely reinsert the entry in the case of a destroy failure. Link: https://lore.kernel.org/r/20200907120921.476363-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
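A hedged sketch of the reentry-safe erase described here (function and field names are illustrative, not necessarily the mlx5 ones): keep the index allocated while the FW command runs, and only erase it once the command succeeds:

        void *old;
        int err;

        /* keep the index occupied, but make it read back as empty */
        old = xa_cmpxchg_irq(&table->array, srq->srqn, srq, XA_ZERO_ENTRY,
                             GFP_KERNEL);
        if (old != srq)
                return -EINVAL;         /* someone else already removed it */

        err = destroy_srq_fw_cmd(dev, srq);     /* HW release, may fail */
        if (err)
                /* restore the entry so a later retry can still find it */
                xa_cmpxchg_irq(&table->array, srq->srqn, XA_ZERO_ENTRY, srq,
                               GFP_KERNEL);
        else
                xa_erase_irq(&table->array, srq->srqn);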
2020-09-09 | RDMA: Restore ability to fail on AH destroy | Leon Romanovsky | 21 | -35/+41
Like any other IB verbs objects, AHs are refcounted by ib_core. The release of those objects is controlled by ib_core with the promise that AH destroy can't fail. The AH is a SW-only object for now, and this change makes dealloc_ah() behave like any other destroy IB flow. Fixes: d345691471b4 ("RDMA: Handle AH allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | RDMA: Restore ability to fail on PD deallocate | Leon Romanovsky | 28 | -34/+51
The IB verbs objects are counted by the kernel, and ib_core ensures that deallocating a PD will succeed, since it is called only once all other objects that depend on the PD have been released. This is achieved by managing various reference counters on such objects. The mlx5 driver didn't follow this standard flow when it allowed DEVX objects, which are not managed by ib_core, to be interleaved with the ones under ib_core responsibility. In such interleaved scenarios the deallocate command can fail, and ib_core will leave the uobject in its internal DB and attempt to clean it later to free resources anyway. This change partially restores the returned value from dealloc_pd() for all drivers, while keeping in mind that non-DEVX devices and kernel verbs paths shouldn't fail. Fixes: 21a428a019c9 ("RDMA: Handle PD allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | IB/isert: Fix unaligned immediate-data handling | Sagi Grimberg | 2 | -56/+78
Currently we allocate rx buffers in a single contiguous buffer for the headers (iser and iscsi) and the data trailer. This means that most likely the data starting offset is aligned to 76 bytes (the size of both headers). This worked fine for years, but at some point this broke, resulting in data corruption in isert when a command comes with immediate data and the underlying backend device assumes 512-byte buffer alignment. We assume a hard requirement for all direct I/O buffers to be 512-byte aligned. To fix this, we should avoid passing unaligned buffers for I/O. Instead, we allocate our recv buffers with some extra space such that the data portion can be aligned to a 512-byte boundary. This also means that we cannot reference headers or data using a structure, but rather via accessors (as they may move based on alignment). Also, get rid of the wrong __packed annotation from iser_rx_desc as this has only harmful effects (it is not aligned to anything). This affects the rx descriptors for iscsi login and the data plane. Fixes: 3d75ca0adef4 ("block: introduce multi-page bvec helpers") Link: https://lore.kernel.org/r/20200904195039.31687-1-sagi@grimberg.me Reported-by: Stephen Rust <srust@blockbridge.com> Tested-by: Doug Dumitru <doug@dumitru.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
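A small sketch of the allocation idea (sizes and names are illustrative): over-allocate so the data portion can be pushed up to the next 512-byte boundary, and reach header and data through pointers rather than a fixed struct layout:

#define EX_HDR_LEN      76      /* iser + iscsi headers, per the description */
#define EX_DATA_ALIGN   512

        u8 *buf, *hdr, *data;

        /* the extra EX_DATA_ALIGN bytes leave room to realign the data start */
        buf = kzalloc(EX_HDR_LEN + EX_DATA_ALIGN + data_len, GFP_KERNEL);
        if (!buf)
                return -ENOMEM;

        hdr  = buf;
        data = PTR_ALIGN(buf + EX_HDR_LEN, EX_DATA_ALIGN);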
2020-09-09 | RDMA/rtrs-srv: Incorporate ib_register_client into rtrs server init | Md Haris Iqbal | 2 | -3/+80
The rnbd_server module's communication manager (cm) initialization depends on the registration of the "network namespace subsystem" of the RDMA CM agent module. As such, when the kernel is configured to load the rnbd_server and the RDMA cma module during initialization, and the rnbd_server module is initialized before the RDMA cma module, a NULL pointer dereference occurs during the RDMA bind operation.

Call Trace:
 ? xas_load+0xd/0x80
 xa_load+0x47/0x80
 cma_ps_find+0x44/0x70
 rdma_bind_addr+0x782/0x8b0
 ? get_random_bytes+0x35/0x40
 rtrs_srv_cm_init+0x50/0x80
 rtrs_srv_open+0x102/0x180
 ? rnbd_client_init+0x6e/0x6e
 rnbd_srv_init_module+0x34/0x84
 ? rnbd_client_init+0x6e/0x6e
 do_one_initcall+0x4a/0x200
 kernel_init_freeable+0x1f1/0x26e
 ? rest_init+0xb0/0xb0
 kernel_init+0xe/0x100
 ret_from_fork+0x22/0x30
Modules linked in:
CR2: 0000000000000015

All this happens because the cm init is in the call chain of the module init, which is not a preferred practice. So remove the call to rdma_create_id() from the module init call chain. Instead, register rtrs-srv as an ib client, which makes sure that rdma_create_id() is called only when an ib device is added. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20200907103106.104530-1-haris.iqbal@cloud.ionos.com Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
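A minimal sketch of the ib_client registration pattern, assuming the 5.9-era interface where .add may return an error (function names and callback bodies are placeholders):

static int rtrs_srv_add_one(struct ib_device *device)
{
        /* an RDMA device now exists, so rdma_create_id()/cm init is safe here */
        return 0;
}

static void rtrs_srv_remove_one(struct ib_device *device, void *client_data)
{
        /* tear down whatever add_one set up */
}

static struct ib_client rtrs_srv_client = {
        .name   = "rtrs_server",
        .add    = rtrs_srv_add_one,
        .remove = rtrs_srv_remove_one,
};

static int __init rtrs_srv_init_module(void)
{
        return ib_register_client(&rtrs_srv_client);
}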
2020-09-09 | RDMA/rtrs-srv: Set .release function for rtrs srv device during device init | Md Haris Iqbal | 2 | -8/+8
The device .release function was not being set during device initialization. In error cases where put_srv was called before device_add, this led to the warning: Device '(null)' does not have a release() function, it is broken and must be fixed. See Documentation/kobject.txt. So, set the device .release function during device initialization in the __alloc_srv() function. Fixes: baa5b28b7a47 ("RDMA/rtrs-srv: Replace device_register with device_initialize and device_add") Link: https://lore.kernel.org/r/20200907102216.104041-1-haris.iqbal@cloud.ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
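A short sketch of the fix pattern (struct and class names are illustrative): assign .release before device_initialize()/device_add() so that put_device() on any error path has a destructor to call:

static void srv_dev_release(struct device *dev)
{
        struct example_srv *srv = container_of(dev, struct example_srv, dev);

        kfree(srv);
}

        srv->dev.class   = example_dev_class;
        srv->dev.release = srv_dev_release;
        device_initialize(&srv->dev);
        /* later: device_add(&srv->dev); on failure, put_device(&srv->dev) is
         * now safe because the release callback is already in place */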
2020-09-09 | RDMA/hns: Avoid unnecessary initialization | Lijun Ou | 6 | -24/+24
Some variables are assigned before they are used, so their initial values are never needed. Remove these unnecessary initial assignments. Link: https://lore.kernel.org/r/1599547944-30671-1-git-send-email-oulijun@huawei.com Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | RDMA/bnxt_re: Remove set but not used variable 'qplib_ctx' | YueHaibing | 1 | -2/+0
drivers/infiniband/hw/bnxt_re/main.c:1012:25: warning: variable ‘qplib_ctx’ set but not used [-Wunused-but-set-variable] Fixes: f86b31c6a28f ("RDMA/bnxt_re: Static NQ depth allocation") Link: https://lore.kernel.org/r/20200905121624.32776-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09 | RDMA/core: Change how failing destroy is handled during uobj abort | Jason Gunthorpe | 1 | -15/+15
Currently it triggers a WARN_ON and then goes ahead and destroys the uobject anyhow, leaking any driver memory. The only place that should leak driver memory is during FD close() in uverbs_destroy_ufile_hw(). Drivers are only allowed to fail destroying uobjects if they guarantee destroy will eventually succeed; uverbs_destroy_ufile_hw() provides the loop to give the driver that chance. Link: https://lore.kernel.org/r/20200902081708.746631-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 | RDMA/rxe: Convert tasklets to use new tasklet_setup() API | Allen Pais | 3 | -8/+8
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-6-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
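The conversion pattern shared by this and the following tasklet patches, sketched with illustrative struct and member names: the callback now receives the tasklet pointer and recovers its container via from_tasklet():

static void example_comp_handler(struct tasklet_struct *t)
{
        /* container_of() on the tasklet member named comp_task */
        struct example_cq *cq = from_tasklet(cq, t, comp_task);

        /* ... run completions for cq ... */
}

        tasklet_setup(&cq->comp_task, example_comp_handler);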
2020-09-03 | RDMA/qib: Convert tasklets to use new tasklet_setup() API | Allen Pais | 2 | -9/+8
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-5-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 | RDMA/i40iw: Convert tasklets to use new tasklet_setup() API | Allen Pais | 1 | -7/+7
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-4-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 | RDMA/hfi1: Convert tasklets to use new tasklet_setup() API | Allen Pais | 1 | -11/+11
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-3-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 | RDMA/bnxt_re: Convert tasklets to use new tasklet_setup() API | Allen Pais | 2 | -10/+8
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-2-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-03 | RDMA/mlx5: Fix potential race between destroy and CQE poll | Leon Romanovsky | 1 | -2/+3
The SRQ can be destroyed right before mlx5_cmd_get_srq is called. In such a case the latter will return NULL instead of the expected SRQ. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Link: https://lore.kernel.org/r/20200830084010.102381-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-02 | RDMA/ucma: Fix resource leak on error path | Alex Dewar | 1 | -7/+8
In ucma_process_join(), if the call to xa_alloc() fails, the function will return without freeing mc. Fix this by jumping to the correct line. In the process I renamed the jump labels to something more memorable for extra clarity. Link: https://lore.kernel.org/r/20200902162454.332828-1-alex.dewar90@gmail.com Addresses-Coverity-ID: 1496814 ("Resource leak") Fixes: 95fe51096b7a ("RDMA/ucma: Remove mc_list and rely on xarray") Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-02 | RDMA/core: Fix reported speed and width | Kamal Heib | 1 | -1/+1
When the speed returned from __ethtool_get_link_ksettings() is SPEED_UNKNOWN, this leads to reporting a wrong speed and width for providers that use ib_get_eth_speed(). Fix that by defaulting the netdev_speed to SPEED_1000 in case the returned value from __ethtool_get_link_ksettings() is SPEED_UNKNOWN. Fixes: d41861942fc5 ("IB/core: Add generic function to extract IB speed from netdev") Link: https://lore.kernel.org/r/20200902124304.170912-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
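A sketch of the described fallback (variable names are illustrative):

        struct ethtool_link_ksettings lksettings;
        u32 netdev_speed;
        int rc;

        rc = __ethtool_get_link_ksettings(netdev, &lksettings);
        if (rc || lksettings.base.speed == (u32)SPEED_UNKNOWN)
                netdev_speed = SPEED_1000;      /* conservative default */
        else
                netdev_speed = lksettings.base.speed;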
2020-09-02 | RDMA/core: Fix unsafe linked list traversal after failing to allocate CQ | Xi Wang | 1 | -2/+2
It's not safe to access the next CQ in list_for_each_entry() after invoking ib_free_cq(), because the CQ has already been freed in current iteration. It should be replaced by list_for_each_entry_safe(). Fixes: c7ff819aefea ("RDMA/core: Introduce shared CQ pool API") Link: https://lore.kernel.org/r/1598963935-32335-1-git-send-email-liweihang@huawei.com Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
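The pattern in question, sketched (the list head and member names are assumptions): the _safe variant caches the next pointer so the current entry can be freed inside the loop:

        struct ib_cq *cq, *n;

        list_for_each_entry_safe(cq, n, &pool_list, pool_entry) {
                list_del(&cq->pool_entry);
                ib_free_cq(cq);         /* frees cq; 'n' keeps the walk valid */
        }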
2020-09-02 | RDMA/qedr: Fix reported max_pkeys | Kamal Heib | 1 | -1/+1
As the qedr driver supports both RoCE and iWARP, make sure to set max_pkeys only when running in RoCE mode. Link: https://lore.kernel.org/r/20200827141655.406185-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 | RDMA/qib: Tidy up process_cc() | Alex Dewar | 1 | -37/+13
This function has a lot of gotos which could be replaced by simple returns, making the function tidier and less bug prone. Link: https://lore.kernel.org/r/20200825171242.448447-2-alex.dewar90@gmail.com Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 | RDMA/qib: Remove superfluous fallthrough statements | Alex Dewar | 1 | -2/+0
Commit 36a8f01cd24b ("IB/qib: Add congestion control agent implementation") erroneously marked a couple of switch cases as /* FALLTHROUGH */, which were later converted to fallthrough statements by commit df561f6688fe ("treewide: Use fallthrough pseudo-keyword"). This triggered a Coverity warning about unreachable code. Remove the fallthrough statements. Link: https://lore.kernel.org/r/20200825171242.448447-1-alex.dewar90@gmail.com Addresses-Coverity: ("Unreachable code") Fixes: 36a8f01cd24b ("IB/qib: Add congestion control agent implementation") Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 | Merge tag 'v5.9-rc3' into rdma.git for-next | Jason Gunthorpe | 58 | -130/+114
Required due to dependencies in following patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 | RDMA/rxe: Address an issue with hardened user copy | Bob Pearson | 3 | -73/+2
Change rxe pools to use kzalloc instead of kmem_cache to allocate memory for rxe objects. The pools are not really necessary and they trigger hardened user copy warnings as the ioctl framework copies the QP number directly to userspace. Also, the general project to move object allocation to the core code will eventually clean these out anyhow. Link: https://lore.kernel.org/r/20200827163535.2632-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 | RDMA/rxe: Add SPDX hdrs to rxe source files | Bob Pearson | 32 | -896/+32
Add SPDX headers to all rxe .c and .h files. Link: https://lore.kernel.org/r/20200827145439.2273-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 | RDMA/core: Trigger a WARN_ON if the driver causes uobjects to become leaked | Jason Gunthorpe | 1 | -1/+2
Drivers that fail destroy can cause uverbs to leak uobjects. Drivers are required to always eventually destroy their uobjects, so trigger a WARN_ON to detect this driver bug. Link: https://lore.kernel.org/r/0-v1-b1e0ed400ba9+f7-warn_destroy_ufile_hw_jgg@nvidia.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 | RDMA/hns: Get udp sport num dynamically instead of using a fixed value | Weihang Li | 3 | -13/+41
The UDP source port number in RoCE v2 is used to create entropy for network routers (ECMP), load balancers and 802.3ad link aggregation switching that are not aware of RoCE IB headers. Now that the IB core provides an interface to get a hashed value for it, the fixed value used in the QPC and UD WQE in the hns driver can be replaced and the port number can be set dynamically. For the QPC of RC QPs, the value is hashed from flow_label if the user passes it in, or else from the remote and local qpn. For UD WQEs, it is set according to fl or as a random value. Link: https://lore.kernel.org/r/1598002289-8611-1-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
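A sketch of how a driver can obtain the port number, assuming the core helper rdma_get_udp_sport() (hedged, as the exact helper may differ by kernel version); it hashes flow_label when non-zero and otherwise hashes the local/remote QPNs:

        u16 sport;

        /* fl, lqpn and rqpn are the flow label and the two QP numbers */
        sport = rdma_get_udp_sport(fl, lqpn, rqpn);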
2020-08-27 | RDMA/rxe: Fix style warnings | Bob Pearson | 4 | -6/+4
Fixed several minor checkpatch warnings in existing rxe source. Link: https://lore.kernel.org/r/20200820224638.3212-3-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 | RDMA/hns: Add a check for current state before modifying QP | Lang Cheng | 1 | -2/+4
It should be considered an illegal operation if the ULP attempts to modify a QP from another state to the current hardware state. Otherwise, the ULP could modify some fields of the QPC at any time. For example, for a QP in the RTS state, modifying it from RTR to RTS can change the PSN, which is never what is expected. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Link: https://lore.kernel.org/r/1598353674-24270-1-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 | RDMA/bnxt_re: Remove the qp from list only if the qp destroy succeeds | Selvin Xavier | 1 | -11/+11
The driver crashes when destroy_qp is retried after an error was returned, because the qp entry was removed from the qp list during the first call. Remove the qp from the list only if destroy_qp returns success. The driver will still trigger a WARN_ON due to the memory leak, but at least it isn't corrupting memory too. Fixes: 8dae419f9ec7 ("RDMA/bnxt_re: Refactor queue pair creation code") Link: https://lore.kernel.org/r/1598292876-26529-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 | RDMA/bnxt_re: Fix driver crash on unaligned PSN entry address | Naresh Kumar PBS | 1 | -2/+2
When computing the first psn entry, the driver checks for page alignment. If this address is not page aligned, it attempts to compute the offset in that page for later use by using the ALIGN macro. The ALIGN macro does not return offset bytes but the requested aligned address, and hence cannot be used directly to store the offset. Since the driver was using the address itself instead of the offset, this resulted in an invalid address when filling the psn buffer. Fix the driver to use the PAGE_MASK macro to calculate the offset. Fixes: fddcbbb02af4 ("RDMA/bnxt_re: Simplify obtaining queue entry from hw ring") Link: https://lore.kernel.org/r/1598292876-26529-7-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
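The distinction, in a two-line sketch (variable names are illustrative):

        unsigned long addr = (unsigned long)psn_entry_addr;

        offset = addr & ~PAGE_MASK;     /* byte offset within the page */
        /* ALIGN(addr, PAGE_SIZE) would instead return the next page-aligned
         * address, which is why storing it as an "offset" corrupted the
         * psn buffer fill. */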
2020-08-27 | RDMA/bnxt_re: Restrict the max_gids to 256 | Naresh Kumar PBS | 2 | -1/+2
Some adapters report more than 256 gid entries. Restrict it to 256 for now. Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://lore.kernel.org/r/1598292876-26529-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 | RDMA/bnxt_re: Static NQ depth allocation | Naresh Kumar PBS | 1 | -2/+1
At first, the driver allocates memory for the NQ based on qplib_ctx->cq_count and qplib_ctx->srqc_count. Later, when creating the ring, it uses a static value of 128K - 1. Fix this with a static value for now. Fixes: b08fe048a69d ("RDMA/bnxt_re: Refactor net ring allocation function") Link: https://lore.kernel.org/r/1598292876-26529-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 | RDMA/bnxt_re: Fix the qp table indexing | Selvin Xavier | 3 | -12/+25
qp->id can be a value outside the max number of QPs. Indexing the qp table with this id can cause an out-of-bounds crash, so change the qp table indexing to (qp->id % max_qp - 1). Allocate one extra entry for QP1; some adapters create one more QP than the max_qp requested in order to accommodate QP1. If qp->id is 1, store the information in the last entry of the qp table. Fixes: f218d67ef004 ("RDMA/bnxt_re: Allow posting when QPs are in error") Link: https://lore.kernel.org/r/1598292876-26529-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 | RDMA/bnxt_re: Do not report transparent vlan from QP1 | Selvin Xavier | 1 | -3/+18
The QP1 Rx CQE reports a transparent VLAN ID in the completion, and this is used while reporting the completion for a received MAD packet. Check whether a vlan id is configured before reporting it in the work completion. Fixes: 84511455ac5b ("RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion") Link: https://lore.kernel.org/r/1598292876-26529-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 | RDMA/mlx4: Read pkey table length instead of hardcoded value | Mark Bloch | 1 | -1/+2
If the pkey_table is not available (which is the case when RoCE is not supported), the cited commit caused a regression where mlx4_devices without RoCE are not created. Fix this by returning a pkey table length of zero in procedure eth_link_query_port() if the pkey-table length reported by the device is zero. Link: https://lore.kernel.org/r/20200824110229.1094376-1-leon@kernel.org Cc: <stable@vger.kernel.org> Fixes: 1901b91f9982 ("IB/core: Fix potential NULL pointer dereference in pkey cache") Fixes: fa417f7b520e ("IB/mlx4: Add support for IBoE") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 | RDMA/rxe: Fix panic when calling kmem_cache_create() | Kamal Heib | 3 | -0/+11
To avoid the following kernel panic when calling kmem_cache_create() with a NULL pointer from pool_cache(), block rxe_param_set_add() from running if the rdma_rxe module is not initialized.

BUG: unable to handle kernel NULL pointer dereference at 000000000000000b
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 4 PID: 8512 Comm: modprobe Kdump: loaded Not tainted 4.18.0-231.el8.x86_64 #1
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
RIP: 0010:kmem_cache_alloc+0xd1/0x1b0
Code: 8b 57 18 45 8b 77 1c 48 8b 5c 24 30 0f 1f 44 00 00 5b 48 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 81 e3 00 00 10 00 75 0e 4d 89 fe <41> f6 47 0b 04 0f 84 6c ff ff ff 4c 89 ff e8 cc da 01 00 49 89 c6
RSP: 0018:ffffa2b8c773f9d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005
RDX: 0000000000000004 RSI: 00000000006080c0 RDI: 0000000000000000
RBP: ffff8ea0a8634fd0 R08: ffffa2b8c773f988 R09: 00000000006000c0
R10: 0000000000000000 R11: 0000000000000230 R12: 00000000006080c0
R13: ffffffffc0a97fc8 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f9138ed9740(0000) GS:ffff8ea4ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000000b CR3: 000000046d59a000 CR4: 00000000003406e0
Call Trace:
 rxe_alloc+0xc8/0x160 [rdma_rxe]
 rxe_get_dma_mr+0x25/0xb0 [rdma_rxe]
 __ib_alloc_pd+0xcb/0x160 [ib_core]
 ib_mad_init_device+0x296/0x8b0 [ib_core]
 add_client_context+0x11a/0x160 [ib_core]
 enable_device_and_get+0xdc/0x1d0 [ib_core]
 ib_register_device+0x572/0x6b0 [ib_core]
 ? crypto_create_tfm+0x32/0xe0
 ? crypto_create_tfm+0x7a/0xe0
 ? crypto_alloc_tfm+0x58/0xf0
 rxe_register_device+0x19d/0x1c0 [rdma_rxe]
 rxe_net_add+0x3d/0x70 [rdma_rxe]
 ? dev_get_by_name_rcu+0x73/0x90
 rxe_param_set_add+0xaf/0xc0 [rdma_rxe]
 parse_args+0x179/0x370
 ? ref_module+0x1b0/0x1b0
 load_module+0x135e/0x17e0
 ? ref_module+0x1b0/0x1b0
 ? __do_sys_init_module+0x13b/0x180
 __do_sys_init_module+0x13b/0x180
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x65/0xca
RIP: 0033:0x7f9137ed296e

This can be triggered if a user tries to use the 'module option', which is not actually a real module option but some idiotic (and thankfully now obsolete) sysfs interface. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20200825151725.254046-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
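A minimal sketch of the guard described here (the flag name is an assumption): refuse the "add" parameter handler until module init has completed:

static bool rxe_initialized;    /* set to true at the end of module init */

static int rxe_param_set_add(const char *val, const struct kernel_param *kp)
{
        if (!rxe_initialized) {
                pr_err("rdma_rxe module is not initialized\n");
                return -EAGAIN;
        }
        /* ... original add handling ... */
        return 0;
}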
2020-08-27 | RDMA/usnic: Remove the query_pkey callback | Kamal Heib | 3 | -15/+0
Now that query_pkey() is no longer mandated by the RDMA core, this callback can be removed from the usnic provider. The libfabric userspace never touches the pkey. Link: https://lore.kernel.org/r/20200820125346.111902-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 | RDMA/rxe: Fix memleak in rxe_mem_init_user | Dinghao Liu | 1 | -0/+1
When page_address() fails, umem should be freed just like when rxe_mem_alloc() fails. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20200819075632.22285-1-dinghao.liu@zju.edu.cn Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 | RDMA/ucma: Remove closing and the close_wq | Jason Gunthorpe | 1 | -34/+15
Use cancel_work_sync() to ensure that the wq is not running and simply assign NULL to ctx->cm_id to indicate if the work ran or not. Delete the close_wq since flush_workqueue() is no longer needed. Link: https://lore.kernel.org/r/20200818120526.702120-15-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>