summaryrefslogtreecommitdiff
path: root/include/rdma
AgeCommit message (Collapse)AuthorFilesLines
2026-03-30RDMA/umem: Use consistent DMA attributes when unmapping entriesLeon Romanovsky1-0/+1
The DMA API expects that mapping and unmapping use the same DMA attributes. The RDMA umem code did not meet this requirement, so fix the mismatch. Fixes: f03d9fadfe13 ("RDMA/core: Add weak ordering dma attr to dma mapping") Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30RDMA: Properly propagate the number of CQEs as unsigned intLeon Romanovsky1-1/+1
Instead of checking whether the number of CQEs is negative or zero, fix the .resize_user_cq() declaration to use unsigned int. This better reflects the expected value range. The sanity check is then handled correctly in ib_uvbers. Link: https://patch.msgid.link/20260319-resize_cq-cqe-v1-1-b78c6efc1def@nvidia.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30RDMA: Clarify that CQ resize is a user‑space verbLeon Romanovsky1-1/+2
The CQ resize operation is used only by uverbs. Make this explicit. Link: https://patch.msgid.link/20260318-resize_cq-type-v1-2-b2846ed18846@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30RDMA/core: Remove unused ib_resize_cq() implementationLeon Romanovsky1-9/+0
There are no in-kernel users of the CQ resize functionality, so drop it. Link: https://patch.msgid.link/20260318-resize_cq-type-v1-1-b2846ed18846@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30RDMA/nldev: Add dellink function pointerZhu Yanjun1-0/+2
Add a dellink function pointer to rdma_link_ops to allow drivers to clean up resources created during newlink. Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20260313023058.13020-2-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-11RDMA/rdmavt: Add driver mmap callbackDean Luick1-0/+3
Add a reserved range and a driver callback to allow the driver to have custom mmaps. Generated mmap offsets are cookies and are not related to the size of the mmap. Advance the mmap offset by the minimum, PAGE_SIZE, rather than the size of the mmap. Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://patch.msgid.link/177308909972.1279894.15543003811821875042.stgit@awdrv-04.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-11RDMA/rdmavt: Add ucontext alloc/dealloc passthroughDean Luick1-0/+7
Add a private data pointer to the ucontext structure and add per-client pass-throughs. Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://patch.msgid.link/177325008318.52243.7367786996925601681.stgit@awdrv-04.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-11RDMA/OPA: Update OPA link speed listDean Luick1-3/+5
Update the list of available link speeds. Fix comments. Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://patch.msgid.link/177308908456.1279894.16723781060261360236.stgit@awdrv-04.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-10RDMA/hfi1: Remove opa_vnicDennis Dalessandro2-102/+0
OPA Vnic has been abandoned and left to rot. Time to excise. Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://patch.msgid.link/177308912950.1280237.15051663328388849915.stgit@awdrv-04.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08RDMA/umem: Add helpers for umem dmabuf revoke lockJacob Moroni1-0/+4
Added helpers to acquire and release the umem dmabuf revoke lock. The intent is to avoid the need for drivers to peek into the ib_umem_dmabuf internals to get the dma_resv_lock and bring us one step closer to abstracting ib_umem_dmabuf away from drivers in general. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260305170826.3803155-5-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08RDMA/umem: Add pinned revocable dmabuf import interfaceJacob Moroni1-0/+19
Added an interface for importing a pinned but revocable dmabuf. This interface can be used by drivers that are capable of revocation so that they can import dmabufs from exporters that may require it, such as VFIO. This interface implements a two step process, where drivers will first call ib_umem_dmabuf_get_pinned_revocable_and_lock() which will pin and map the dmabuf (and provide a functional move_notify/invalidate_mappings callback), but will return with the lock still held so that the driver can then populate the callback via ib_umem_dmabuf_set_revoke_locked() without races from concurrent revocations. This scheme also allows for easier integration with drivers that may not have actually allocated their internal MR objects at the time of the get_pinned_revocable* call. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260305170826.3803155-4-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08RDMA: Add IB_UVERBS_CORE_SUPPORT_ROBUST_UDATAJason Gunthorpe1-0/+6
This flag can be set by drivers once they have finished auditing and implementing the full udata support on every udata operation. My intention going forward is that driver authors proposing new udata uAPI for their drivers must first do the work and set this flag. If this flag is not set the userspace should not try to use udata based uAPI newer than this commit, though on a case by case basis it may be OK based on what checks historical kernels performed on the specific call. Since bnxt_re is audited now, it is the first driver to set the flag. Link: https://patch.msgid.link/r/13-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08RDMA: Provide documentation about the uABI compatibility rulesJason Gunthorpe1-0/+87
Write down how all of this is supposed to work using the new helpers. Link: https://patch.msgid.link/r/7-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08RDMA: Add ib_is_udata_in_empty()Jason Gunthorpe1-0/+15
If the driver doesn't yet support any request driver data it should check that it is all zeroed. This is a common pattern, add a helper around _ib_copy_validate_udata_in() to do this. Link: https://patch.msgid.link/r/6-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08RDMA: Add ib_respond_udata()Jason Gunthorpe1-0/+33
Wrap the common copy_to_user() pattern used in drivers and enhance it to zero pad as well. Include debug logging on failures. Link: https://patch.msgid.link/r/5-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08RDMA: Add ib_copy_validate_udata_in_cm()Jason Gunthorpe1-0/+25
For structures with comp_mask also absorb the check of comp_mask valid bits into the helper. This is slightly tricky because ~ might not fully extend to 64 bits, the helper inserts an explicit type to ensure that ~ covers all bits. Link: https://patch.msgid.link/r/4-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08RDMA: Add ib_copy_validate_udata_in()Jason Gunthorpe1-0/+26
Add a new function to consolidate the required compatibility pattern for driver data of checking against a minimum size, and checking for unknown trailing bytes to be zero into a function. This new function uses the faster copy_struct_from_user() instead of trying to directly check for zero. Incorporate the common ibdev_dbg() logging directly into the error paths of the helper. Link: https://patch.msgid.link/r/3-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08RDMA/core: Add rdma_udata_to_dev()Jason Gunthorpe1-0/+2
Get an ib_device out of a udata so it can be used for debug prints. Link: https://patch.msgid.link/r/2-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-05Add support for TLP emulationLeon Romanovsky1-1/+1
This series adds support for Transaction Layer Packet (TLP) emulation response gateway regions, enabling userspace device emulation software to write TLP responses directly to lower layers without kernel driver involvement. Currently, the mlx5 driver exposes VirtIO emulation access regions via the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that ioctl to also support allocating TLP response gateway channels for PCI device emulation use cases. Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05RDMA/core: Delete not-implemented get_vector_affinityLeon Romanovsky1-23/+0
No drivers implement .get_vector_affinity(), and no callers invoke ib_get_vector_affinity(), so remove it. Link: https://patch.msgid.link/20260226-get_vector_affinity-v1-1-910a899c4e5d@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-02RDMA/core: Add pinned handles to FRMR poolsMichael Guralnik1-0/+2
Add a configuration of pinned handles on a specific FRMR pool. The configured amount of pinned handles will not be aged and will stay available for users to claim. Upon setting the amount of pinned handles to an FRMR pool, we will make sure we have at least the pinned amount of handles associated with the pool and create more, if necessary. The count for pinned handles take into account handles that are used by user MRs and handles in the queue. Introduce a new FRMR operation of build_key that allows drivers to manipulate FRMR keys supplied by the user, allowing failing for unsupported properties and masking of properties that are modifiable. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-5-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02IB/core: Introduce FRMR poolsMichael Guralnik2-0/+45
Add a generic Fast Registration Memory Region pools mechanism to allow drivers to optimize memory registration performance. Drivers that have the ability to reuse MRs or their underlying HW objects can take advantage of the mechanism to keep a 'handle' for those objects and use them upon user request. We assume that to achieve this goal a driver and its HW should implement a modify operation for the MRs that is able to at least clear and set the MRs and in more advanced implementations also support changing a subset of the MRs properties. The mechanism is built using an RB-tree consisting of pools, each pool represents a set of MR properties that are shared by all of the MRs residing in the pool and are unmodifiable by the vendor driver or HW. The exposed API from ib_core to the driver has 4 operations: Init and cleanup - handles data structs and locks for the pools. Push and pop - store and retrieve 'handle' for a memory registration or deregistrations request. The FRMR pools mechanism implements the logic to search the RB-tree for a pool with matching properties and create a new one when needed and requires the driver to implement creation and destruction of a 'handle' when pool is empty or a handle is requested or is being destroyed. Later patch will introduce Netlink API to interact with the FRMR pools mechanism to allow users to both configure and track its usage. A vendor wishing to configure FRMR pool without exposing it or without exposing internal MR properties to users, should use the kernel_vendor_key field in the pools key. This can be useful in a few cases, e.g, when the FRMR handle has a vendor-specific un-modifiable property that the user registering the memory might not be aware of. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-2-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-25RDMA/core: Prepare create CQ path for API unificationLeon Romanovsky1-2/+1
Ensure that .create_cq_umem() and .create_cq() follow the same API contract, allowing drivers to be gradually migrated to the umem-aware CQ management flow. Link: https://patch.msgid.link/20260213-refactor-umem-v1-7-f3be85847922@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-02-25RDMA/core: Manage CQ umem in core codeLeon Romanovsky1-0/+1
In the current implementation, CQ umem is handled both by ib_core and the driver. ib_core sometimes creates and destroys it, while the driver also destroys it. Store the umem in struct ib_cq and ensure that only ib_core manages its lifetime, relying solely on its internal reference counter. Link: https://patch.msgid.link/20260213-refactor-umem-v1-5-f3be85847922@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-02-25RDMA/umem: Remove unnecessary includes and defines from ib_umem headerLeon Romanovsky1-4/+0
The ib_umem header no longer requires the removed includes or forward declarations, so drop them to reduce clutter. Link: https://patch.msgid.link/20260213-refactor-umem-v1-3-f3be85847922@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-02-25RDMA/umem: Allow including ib_umem header from any locationLeon Romanovsky1-1/+1
Including ib_umem.h currently triggers circular dependency errors. These issues can be resolved by removing the include of ib_verbs.h, which was only needed to resolve the struct ib_device pointer. >> depmod: ERROR: Cycle detected: ib_core -> ib_uverbs -> ib_core >> depmod: ERROR: Found 2 modules in dependency cycles! make[3]: *** [scripts/Makefile.modinst:132: depmod] Error 1 make[3]: Target '__modinst' not remade because of errors. make[2]: *** [Makefile:1960: modules_install] Error 2 make[1]: *** [Makefile:248: __sub-make] Error 2 make[1]: Target 'modules_install' not remade because of errors. make: *** [Makefile:248: __sub-make] Error 2 make: Target 'modules_install' not remade because of errors. Link: https://patch.msgid.link/20260213-refactor-umem-v1-2-f3be85847922@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-02-25RDMA: Move DMA block iterator logic into dedicated filesLeon Romanovsky3-80/+88
The DMA iterator logic was mixed into verbs and umem-specific code, forcing all users to include rdma/ib_umem.h. Move the block iterator logic into iter.c and rdma/iter.h so that rdma/ib_umem.h and rdma/ib_verbs.h can be separated in a follow-up patch. Link: https://patch.msgid.link/20260213-refactor-umem-v1-1-f3be85847922@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-02-25RDMA/core: Check id_priv->restricted_node_type in cma_listen_on_dev()Stefan Metzmacher1-1/+1
When listening on wildcard addresses we have a global list for the application layer rdma_cm_id and for any existing device or any device added in future we try to listen on any wildcard listener. When the listener has a restricted_node_type we should prevent listening on devices with a different node type. While there fix the documentation comment of rdma_restrict_node_type() to include rdma_resolve_addr() instead of having rdma_bind_addr() twice. Fixes: a760e80e90f5 ("RDMA/core: introduce rdma_restrict_node_type()") Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-rdma@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://patch.msgid.link/20260224165951.3582093-2-metze@samba.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-24RDMA/restrack: fix kernel-doc indicatorRandy Dunlap1-2/+2
Use "/**" to begin kernel-doc comments. This eliminates these kernel-doc warnings: Warning: include/rdma/restrack.h:123 struct member 'kref' not described in 'rdma_restrack_entry' Warning: include/rdma/restrack.h:123 struct member 'comp' not described in 'rdma_restrack_entry' (not adding missing return value kernel-doc descriptions) Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260224003149.3175815-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-24RDMA/iwcm: fix some kernel-doc issues in iw_cm.hRandy Dunlap1-7/+7
Use the "typedef" keyword as needed. Correct 2 function parameter names. Warning: include/rdma/iw_cm.h:42 function parameter 'iw_cm_handler' not described in 'int' Warning: include/rdma/iw_cm.h:42 expecting prototype for iw_cm_handler(). Prototype was for int() instead Warning: include/rdma/iw_cm.h:53 function parameter 'iw_event_handler' not described in 'int' Warning: include/rdma/iw_cm.h:53 expecting prototype for iw_event_handler(). Prototype was for int() instead Warning: include/rdma/iw_cm.h:104 function parameter 'cm_handler' not described in 'iw_create_cm_id' Warning: include/rdma/iw_cm.h:158 function parameter 'private_data' not described in 'iw_cm_reject' (not adding missing return value kernel-doc descriptions) Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260224003134.3174856-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-24RDMA/umem: fix kernel-doc warningsRandy Dunlap1-1/+5
Add or correct kernel-doc comments to eliminate warnings: Warning: include/rdma/ib_umem.h:104 function parameter 'biter' not described in 'rdma_umem_for_each_dma_block' Warning: include/rdma/ib_umem.h:140 function parameter 'pgsz_bitmap' not described in 'ib_umem_find_best_pgoff' Warning: include/rdma/ib_umem.h:141 No description found for return value of 'ib_umem_find_best_pgoff' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260224003120.3173892-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-24IB/cache: avoid kernel-doc warningsRandy Dunlap1-2/+2
Use the correct function parameters names to eliminate kernel-doc warnings: Warning: include/rdma/ib_cache.h:47 function parameter 'device_handle' not described in 'ib_get_cached_pkey' Warning: include/rdma/ib_cache.h:89 function parameter 'port_active' not described in 'ib_get_cached_port_state' (not adding missing function return value descriptions) Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260224003106.3172916-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds3-2/+91
Pull rdma updates from Jason Gunthorpe: "Usual smallish cycle. The NFS biovec work to push it down into RDMA instead of indirecting through a scatterlist is pretty nice to see, been talked about for a long time now. - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana, mlx5, irdma - Robusness improvements in completion processing for EFA - New query_port_speed() verb to move past limited IBA defined speed steps - Support for SG_GAPS in rts and many other small improvements - Rare list corruption fix in iwcm - Better support different page sizes in rxe - Device memory support for mana - Direct bio vec to kernel MR for use by NFS-RDMA - QP rate limiting for bnxt_re - Remote triggerable NULL pointer crash in siw - DMA-buf exporter support for RDMA mmaps like doorbells" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits) RDMA/mlx5: Implement DMABUF export ops RDMA/uverbs: Add DMABUF object type and operations RDMA/uverbs: Support external FD uobjects RDMA/siw: Fix potential NULL pointer dereference in header processing RDMA/umad: Reject negative data_len in ib_umad_write IB/core: Extend rate limit support for RC QPs RDMA/mlx5: Support rate limit only for Raw Packet QP RDMA/bnxt_re: Report QP rate limit in debugfs RDMA/bnxt_re: Report packet pacing capabilities when querying device RDMA/bnxt_re: Add support for QP rate limiting MAINTAINERS: Drop RDMA files from Hyper-V section RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc svcrdma: use bvec-based RDMA read/write API RDMA/core: add rdma_rw_max_sge() helper for SQ sizing RDMA/core: add MR support for bvec-based RDMA operations RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations RDMA/core: add bio_vec based RDMA read/write API RDMA/irdma: Use kvzalloc for paged memory DMA address array RDMA/rxe: Fix race condition in QP timer handlers RDMA/mana_ib: Add device‑memory support ...
2026-02-09RDMA/uverbs: Add DMABUF object type and operationsYishai Hadas2-0/+10
Expose DMABUF functionality to userspace through the uverbs interface, enabling InfiniBand/RDMA devices to export PCI based memory regions (e.g. device memory) as DMABUF file descriptors. This allows zero-copy sharing of RDMA memory with other subsystems that support the dma-buf framework. A new UVERBS_OBJECT_DMABUF object type and allocation method were introduced. During allocation, uverbs invokes the driver to supply the rdma_user_mmap_entry associated with the given page offset (pgoff). Based on the returned rdma_user_mmap_entry, uverbs requests the driver to provide the corresponding physical-memory details as well as the driver’s PCI provider information. Using this information, dma_buf_export() is called; if it succeeds, uobj->object is set to the underlying file pointer returned by the dma-buf framework. The file descriptor number follows the standard uverbs allocation flow, but the file pointer comes from the dma-buf subsystem, including its own fops and private data. When an mmap entry is removed, uverbs iterates over its associated DMABUFs, marks them as revoked, and calls dma_buf_move_notify() so that their importers are notified. The same procedure applies during the disassociate flow; final cleanup occurs when the application closes the file. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260201-dmabuf-export-v3-2-da238b614fe3@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-09RDMA/core: introduce rdma_restrict_node_type()Stefan Metzmacher1-0/+17
For smbdirect it required to use different ports depending on the RDMA protocol. E.g. for iWarp 5445 is needed (as tcp port 445 already used by the raw tcp transport for SMB), while InfiniBand, RoCEv1 and RoCEv2 use port 445, as they use an independent port range (even for RoCEv2, which uses udp port 4791 itself). Currently ksmbd is not able to function correctly at all if the system has iWarp (RDMA_NODE_RNIC) interface(s) and any InfiniBand, RoCEv1 and/or RoCEv2 interface(s) at the same time. And cifs.ko uses 5445 with a fallback to 445, which means depending on the available interfaces, it tries 5445 in the RoCE range or may tries iWarp with 445 as a fallback. This leads to strange error messages and strange network captures. To avoid these problems they will be able to use rdma_restrict_node_type(RDMA_NODE_RNIC) before trying port 5445 and rdma_restrict_node_type(RDMA_NODE_IB_CA) before trying port 445. It means we'll get early -ENODEV early from rdma_resolve_addr() without any network traffic and timeouts. This is designed to be called before calling any of rdma_bind_addr(), rdma_resolve_addr() or rdma_listen(). Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-rdma@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Acked-by: Leon Romanovsky <leon@kernel.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-01-28RDMA/core: add rdma_rw_max_sge() helper for SQ sizingChuck Lever1-0/+2
svc_rdma_accept() computes sc_sq_depth as the sum of rq_depth and the number of rdma_rw contexts (ctxts). This value is used to allocate the Send CQ and to initialize the sc_sq_avail credit pool. However, when the device uses memory registration for RDMA operations, rdma_rw_init_qp() inflates the QP's max_send_wr by a factor of three per context to account for REG and INV work requests. The Send CQ and credit pool remain sized for only one work request per context, causing Send Queue exhaustion under heavy NFS WRITE workloads. Introduce rdma_rw_max_sge() to compute the actual number of Send Queue entries required for a given number of rdma_rw contexts. Upper layer protocols call this helper before creating a Queue Pair so that their Send CQs and credit accounting match the QP's true capacity. Update svc_rdma_accept() to use rdma_rw_max_sge() when computing sc_sq_depth, ensuring the credit pool reflects the work requests that rdma_rw_init_qp() will reserve. Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: 00bd1439f464 ("RDMA/rw: Support threshold for registration vs scattering to local pages") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260128005400.25147-5-cel@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-28RDMA/core: add MR support for bvec-based RDMA operationsChuck Lever1-0/+1
The bvec-based RDMA API currently returns -EOPNOTSUPP when Memory Region registration is required. This prevents iWARP devices from using the bvec path, since iWARP requires MR registration for RDMA READ operations. The force_mr debug parameter is also unusable with bvec input. Add rdma_rw_init_mr_wrs_bvec() to handle MR registration for bvec arrays. The approach creates a synthetic scatterlist populated with DMA addresses from the bvecs, then reuses the existing ib_map_mr_sg() infrastructure. This avoids driver changes while keeping the implementation small. The synthetic scatterlist is stored in the rdma_rw_ctx for cleanup. On destroy, the MRs are returned to the pool and the bvec DMA mappings are released using the stored addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260128005400.25147-4-cel@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-28RDMA/core: use IOVA-based DMA mapping for bvec RDMA operationsChuck Lever1-0/+8
The bvec RDMA API maps each bvec individually via dma_map_phys(), requiring an IOTLB sync for each mapping. For large I/O operations with many bvecs, this overhead becomes significant. The two-step IOVA API (dma_iova_try_alloc / dma_iova_link / dma_iova_sync) allocates a contiguous IOVA range upfront, links all physical pages without IOTLB syncs, then performs a single sync at the end. This reduces IOTLB flushes from O(n) to O(1). It also requires only a single output dma_addr_t compared to extra per-input element storage in struct scatterlist. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260128005400.25147-3-cel@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-28RDMA/core: add bio_vec based RDMA read/write APIChuck Lever2-0/+53
The existing rdma_rw_ctx_init() API requires callers to construct a scatterlist, which is then DMA-mapped page by page. Callers that already have data in bio_vec form (such as the NVMe-oF target) must first convert to scatterlist, adding overhead and complexity. Introduce rdma_rw_ctx_init_bvec() and rdma_rw_ctx_destroy_bvec() to accept bio_vec arrays directly. The new helpers use dma_map_phys() for hardware RDMA devices and virtual addressing for software RDMA devices (rxe, siw), avoiding intermediate scatterlist construction. Memory registration (MR) path support is deferred to a follow-up series; callers requiring MR-based transfers (iWARP devices or force_mr=1) receive -EOPNOTSUPP and should use the scatterlist API. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260128005400.25147-2-cel@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-05RDMA/core: Avoid exporting module local functions and remove not-used onesParav Pandit1-2/+0
Some of the functions are local to the module and some are not used starting from commit 36783dec8d79 ("RDMA/rxe: Delete deprecated module parameters interface"). Delete and avoid exporting them. Signed-off-by: Parav Pandit <parav@nvidia.com> Link: https://patch.msgid.link/20260104-ib-core-misc-v1-2-00367f77f3a8@nvidia.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-05IB/core: Add query_port_speed verbOr Har-Toov1-0/+2
Add new ibv_query_port_speed() verb to enable applications to query the effective bandwidth of a port. This verb is particularly useful when the speed is not a multiplication of IB speed and width where width is 2^n. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-05IB/core: Add helper to convert port attributes to data rateOr Har-Toov1-0/+14
Introduce ib_port_attr_to_rate() to compute the data rate in 100 Mbps units (deci-Gb/sec) from a port's active_speed and active_width attributes. This generic helper removes duplicated speed-to-rate calculations, which are used by sysfs and the upcoming new verb. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-05IB/core: Add async event on device speed changeOr Har-Toov1-0/+1
Add IB_EVENT_DEVICE_SPEED_CHANGE for notifying user applications on device's ports speed changes. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-24RDMA/core: Add new IB rate for XDR (8x) supportMaher Sanalla1-0/+1
Add the new rates as defined in the Infiniband spec for XDR and 8x link width support. Furthermore, modify the utility conversion methods accordingly. Reference: IB Spec Release 1.8 Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Link: https://patch.msgid.link/20251120-speed-8-v1-1-e6a7efef8cb8@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-12RDMA/cm: Correct typedef and bad line warningsRandy Dunlap1-2/+2
In include/rdma/ib_cm.h: Correct a typedef's kernel-doc notation by adding the 'typedef' keyword to it to avoid a warning. Add a leading " *" to a kernel-doc line to avoid a warning. Warning: ib_cm.h:289 function parameter 'ib_cm_handler' not described in 'int' Warning: ib_cm.h:289 expecting prototype for ib_cm_handler(). Prototype was for int() instead Warning: ib_cm.h:484 bad line: connection message in case duplicates are received. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20251112062908.2711007-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-06IB/rdmavt: rdmavt_qp.h: clean up kernel-doc commentsRandy Dunlap1-34/+36
Correct the kernel-doc comments format to avoid around 35 kernel-doc warnings: - use struct keyword to introduce struct kernel-doc comments - use correct variable name for some struct members - use correct function name in comments for some functions - fix spelling in a few comments - use a ':' instead of '-' to separate struct members from their descriptions - add a function name heading for rvt_div_mtu() This leaves one struct member that is not described: rdmavt_qp.h:206: warning: Function parameter or struct member 'wq' not described in 'rvt_krwq' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20251105045127.106822-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-10-20RDMA/uverbs: fix some kernel-doc warningsRandy Dunlap1-49/+50
Fix 49 kernel-doc warnings in ib_verbs.h: - Add struct short description for rdma_stat_desc, rdma_hw_stats. - Fix kernel-doc format for struct members (use ':' instead of '-') for several structs. - Don't use "/**" kernel-doc notation for struct members in ib_device_ops (most members are not documented and most of the kernel-doc was not formatted correctly). - Spell function parameters correctly in ib_dma_map_sgtable_attrs(), ib_device_try_get(), rdma_roce_rescan_device(). - Add kernel-doc for the function parameter in rdma_flow_label_to_udp_sport(). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20251020034320.3011094-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/ucma: Support write an event into a CMMark Zhang1-1/+4
Enable user-space to inject an event into a CM through it's event channel. Two new events are added and supported: RDMA_CM_EVENT_USER and RDMA_CM_EVENT_INTERNAL. With these 2 events a new event parameter "arg" is supported, which is passed from sender to receiver transparently. With this feature an application is able to write an event into a CM channel with a new user-space rdmacm API. For example thread T1 could write an event with the API: rdma_write_cm_event(cm_id, RDMA_CM_EVENT_USER, status, arg); and thread T2 could receive the event with rdma_get_cm_event(). Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Link: https://patch.msgid.link/fdf49d0b17a45933c5d8c1d90605c9447d9a3c73.1751279794.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/cma: Support IB service record resolutionMark Zhang1-1/+17
Add new UCMA command and the corresponding CMA implementation. Userspace can send this command to request service resolution based on service name or ID. On a successful resolution, one or multiple service records are returned, the first one will be used as destination address by default. Two new CM events are added and returned to caller accordingly: - RDMA_CM_EVENT_ADDRINFO_RESOLVED: Resolve succeeded; - RDMA_CM_EVENT_ADDRINFO_ERROR: Resolve failed. Internally two new CM states are added: - RDMA_CM_ADDRINFO_QUERY: CM is in the process of IB service resolution; - RDMA_CM_ADDRINFO_RESOLVED: CM has finished the resolve process. With these new states, beside existing state transfer processes, 2 new processes are supported: 1. The default address is used: RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDRINFO_QUERY -> RDMA_CM_ADDRINFO_RESOLVED -> RDMA_CM_ROUTE_QUERY 2. To use a different address: RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDRINFO_QUERY-> RDMA_CM_ADDRINFO_RESOLVED -> RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_RESOLVED -> RDMA_CM_ROUTE_QUERY In the 2nd case, resolve_addrinfo returns multiple records, a user could call rdma_resolve_addr() with the one that is not the first. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Link: https://patch.msgid.link/b6e82ad75522a13b5efe4ff86da0e465aab04cc2.1751279794.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/sa_query: Support IB service records resolutionMark Zhang2-0/+38
Add an SA query API ib_sa_service_rec_get() to support building and sending SA query MADs that ask for service records with a specific name or ID, and receiving and parsing responses from the SM. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Link: https://patch.msgid.link/9af6c82f3a3a9d975115a33235fb4ffc7c8edb21.1751279793.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>