| Age | Commit message | Author | Files | Lines |
|
The DMA API expects that mapping and unmapping use the same DMA
attributes. The RDMA umem code did not meet this requirement, so fix
the mismatch.
Fixes: f03d9fadfe13 ("RDMA/core: Add weak ordering dma attr to dma mapping")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Instead of checking whether the number of CQEs is negative or zero, fix the
.resize_user_cq() declaration to use unsigned int. This better reflects the
expected value range. The sanity check is then handled correctly in ib_uverbs.
Link: https://patch.msgid.link/20260319-resize_cq-cqe-v1-1-b78c6efc1def@nvidia.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
The CQ resize operation is used only by uverbs. Make this explicit.
Link: https://patch.msgid.link/20260318-resize_cq-type-v1-2-b2846ed18846@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
There are no in-kernel users of the CQ resize functionality, so drop it.
Link: https://patch.msgid.link/20260318-resize_cq-type-v1-1-b2846ed18846@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Add a dellink function pointer to rdma_link_ops to
allow drivers to clean up resources created during
newlink.
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20260313023058.13020-2-yanjun.zhu@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add a reserved range and a driver callback to allow the driver to
have custom mmaps.
Generated mmap offsets are cookies and are not related to the size of
the mmap. Advance the mmap offset by the minimum, PAGE_SIZE, rather
than the size of the mmap.
Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://patch.msgid.link/177308909972.1279894.15543003811821875042.stgit@awdrv-04.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
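The cookie behaviour described above can be illustrated with a small userspace sketch. It is not the driver code: `demo_mmap_alloc`, `demo_mmap_offset()` and `DEMO_PAGE_SIZE` are hypothetical names, and the page size is assumed to be 4096 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for the illustration */

/* Hypothetical allocator state: the next cookie to hand out. */
struct demo_mmap_alloc {
	uint64_t next_offset;
};

/*
 * Hand out an mmap offset for a region of "len" bytes. The offset is only
 * a lookup key, so it advances by one page no matter how large the
 * mapping is.
 */
uint64_t demo_mmap_offset(struct demo_mmap_alloc *a, size_t len)
{
	uint64_t off = a->next_offset;

	(void)len;	/* the size does not influence the cookie */
	a->next_offset += DEMO_PAGE_SIZE;
	return off;
}
```

Advancing by the mapping size instead would consume offset space in proportion to the mapping, although the offset is never used as an address.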
|
|
Add a private data pointer to the ucontext structure and add
per-client pass-throughs.
Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://patch.msgid.link/177325008318.52243.7367786996925601681.stgit@awdrv-04.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Update the list of available link speeds. Fix comments.
Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://patch.msgid.link/177308908456.1279894.16723781060261360236.stgit@awdrv-04.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
OPA Vnic has been abandoned and left to rot. Time to excise.
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://patch.msgid.link/177308912950.1280237.15051663328388849915.stgit@awdrv-04.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Added helpers to acquire and release the umem dmabuf revoke
lock. The intent is to avoid the need for drivers to peek
into the ib_umem_dmabuf internals to get the dma_resv_lock
and bring us one step closer to abstracting ib_umem_dmabuf
away from drivers in general.
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260305170826.3803155-5-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Added an interface for importing a pinned but revocable dmabuf.
This interface can be used by drivers that are capable of revocation
so that they can import dmabufs from exporters that may require it,
such as VFIO.
This interface implements a two step process, where drivers will first
call ib_umem_dmabuf_get_pinned_revocable_and_lock() which will pin and
map the dmabuf (and provide a functional move_notify/invalidate_mappings
callback), but will return with the lock still held so that the
driver can then populate the callback via
ib_umem_dmabuf_set_revoke_locked() without races from concurrent
revocations. This scheme also allows for easier integration with drivers
that may not have actually allocated their internal MR objects at the time
of the get_pinned_revocable* call.
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260305170826.3803155-4-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
This flag can be set by drivers once they have finished auditing and
implementing the full udata support on every udata operation.
My intention going forward is that driver authors proposing new udata uAPI
for their drivers must first do the work and set this flag.
If this flag is not set, userspace should not try to use udata based
uAPI newer than this commit, though on a case by case basis it may be OK
based on what checks historical kernels performed on the specific call.
Since bnxt_re is audited now, it is the first driver to set the flag.
Link: https://patch.msgid.link/r/13-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Write down how all of this is supposed to work using the new helpers.
Link: https://patch.msgid.link/r/7-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
If the driver doesn't yet support any request driver data it should check
that it is all zeroed. This is a common pattern, so add a helper around
_ib_copy_validate_udata_in() to do this.
Link: https://patch.msgid.link/r/6-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Wrap the common copy_to_user() pattern used in drivers and enhance it
to zero pad as well. Include debug logging on failures.
Link: https://patch.msgid.link/r/5-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
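A userspace model of the copy-and-zero-pad idea follows. `demo_respond()` is a hypothetical stand-in, not the helper added by the patch, and truncating the response when the user buffer is shorter is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Model of a copy_to_user() wrapper: "ubuf"/"ulen" stand in for the
 * response buffer supplied by userspace, "resp"/"rlen" for the kernel's
 * response structure. Returns the number of response bytes copied.
 */
size_t demo_respond(void *ubuf, size_t ulen, const void *resp, size_t rlen)
{
	size_t copy = rlen < ulen ? rlen : ulen;

	memcpy(ubuf, resp, copy);
	/* Userspace built against a larger struct sees zeros in the tail. */
	memset((unsigned char *)ubuf + copy, 0, ulen - copy);
	return copy;
}
```

Zero padding means that a newer userspace reading fields this kernel does not fill in gets a defined value rather than stale buffer contents.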
|
|
For structures with comp_mask also absorb the check of comp_mask valid
bits into the helper. This is slightly tricky because ~ might not fully
extend to 64 bits, the helper inserts an explicit type to ensure that ~
covers all bits.
Link: https://patch.msgid.link/r/4-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
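The pitfall with `~` can be reproduced in plain C. The sketch below uses hypothetical names (`DEMO_VALID_BITS`, `comp_mask_ok_narrow()`, `comp_mask_ok_wide()`) and is not the kernel helper; it only shows why an explicit 64-bit type is needed before inverting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit constant naming the comp_mask bits a driver knows. */
#define DEMO_VALID_BITS 0x3u

/*
 * Buggy check: ~DEMO_VALID_BITS is computed as a 32-bit unsigned int
 * (0xFFFFFFFC) and then zero-extended to 64 bits, so unknown bits in the
 * upper half of comp_mask slip through.
 */
bool comp_mask_ok_narrow(uint64_t comp_mask)
{
	return (comp_mask & ~DEMO_VALID_BITS) == 0;
}

/* Fixed check: widen first so that ~ inverts all 64 bits. */
bool comp_mask_ok_wide(uint64_t comp_mask)
{
	return (comp_mask & ~(uint64_t)DEMO_VALID_BITS) == 0;
}
```

With `comp_mask = 1ULL << 40`, the narrow version accepts the unknown bit while the widened version rejects it.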
|
|
Add a new function that consolidates the compatibility pattern required
for driver data: checking against a minimum size, and checking that any
unknown trailing bytes are zero.
This new function uses the faster copy_struct_from_user() instead of
trying to directly check for zero.
Incorporate the common ibdev_dbg() logging directly into the error paths
of the helper.
Link: https://patch.msgid.link/r/3-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
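The compatibility pattern can be modelled in userspace as follows. `demo_copy_validate_in()` is a hypothetical name. The -E2BIG result for non-zero trailing bytes mirrors what copy_struct_from_user() returns, while -EINVAL for a too-short buffer is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model: "src"/"usize" stand in for the buffer supplied by
 * userspace, "dst"/"ksize" for the kernel's view of the structure.
 */
int demo_copy_validate_in(void *dst, size_t ksize, size_t min_size,
			  const void *src, size_t usize)
{
	const unsigned char *in = src;
	size_t copy = usize < ksize ? usize : ksize;
	size_t i;

	/* Userspace must supply at least the original structure. */
	if (usize < min_size)
		return -EINVAL;

	/* Newer userspace: bytes this kernel does not know must be zero. */
	for (i = ksize; i < usize; i++)
		if (in[i])
			return -E2BIG;

	/* Copy what both sides know, zero-fill what userspace omitted. */
	memcpy(dst, in, copy);
	memset((unsigned char *)dst + copy, 0, ksize - copy);
	return 0;
}
```

An older userspace passing a shorter structure therefore behaves as if the missing fields were zero, and a newer userspace is rejected only when it actually sets something the kernel cannot interpret.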
|
|
Get an ib_device out of a udata so it can be used for debug prints.
Link: https://patch.msgid.link/r/2-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
This series adds support for Transaction Layer Packet (TLP) emulation
response gateway regions, enabling userspace device emulation software
to write TLP responses directly to lower layers without kernel driver
involvement.
Currently, the mlx5 driver exposes VirtIO emulation access regions via
the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
ioctl to also support allocating TLP response gateway channels for
PCI device emulation use cases.
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
No drivers implement .get_vector_affinity(), and no callers invoke
ib_get_vector_affinity(), so remove it.
Link: https://patch.msgid.link/20260226-get_vector_affinity-v1-1-910a899c4e5d@nvidia.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Add a configuration of pinned handles on a specific FRMR pool.
The configured amount of pinned handles will not be aged and will stay
available for users to claim.
Upon setting the amount of pinned handles to an FRMR pool, we will make
sure we have at least the pinned amount of handles associated with the
pool and create more, if necessary.
The count for pinned handles takes into account handles that are used by
user MRs and handles in the queue.
Introduce a new FRMR operation of build_key that allows drivers to
manipulate FRMR keys supplied by the user, allowing failing for
unsupported properties and masking of properties that are modifiable.
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260226-frmr_pools-v4-5-95360b54f15e@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add a generic Fast Registration Memory Region pools mechanism to allow
drivers to optimize memory registration performance.
Drivers that have the ability to reuse MRs or their underlying HW
objects can take advantage of the mechanism to keep a 'handle' for those
objects and use them upon user request.
We assume that to achieve this goal a driver and its HW should implement
a modify operation for the MRs that is able to at least clear and set the
MRs and in more advanced implementations also support changing a subset
of the MRs properties.
The mechanism is built using an RB-tree consisting of pools, each pool
represents a set of MR properties that are shared by all of the MRs
residing in the pool and are unmodifiable by the vendor driver or HW.
The exposed API from ib_core to the driver has 4 operations:
- Init and cleanup: handle data structs and locks for the pools.
- Push and pop: store and retrieve a 'handle' for a memory registration
or deregistration request.
The FRMR pools mechanism implements the logic to search the RB-tree for
a pool with matching properties and create a new one when needed and
requires the driver to implement creation and destruction of a 'handle'
when pool is empty or a handle is requested or is being destroyed.
A later patch will introduce a Netlink API to interact with the FRMR pools
mechanism to allow users to both configure and track its usage.
A vendor wishing to configure an FRMR pool without exposing it, or without
exposing internal MR properties to users, should use the
kernel_vendor_key field in the pool's key. This can be useful in a few
cases, e.g., when the FRMR handle has a vendor-specific unmodifiable
property that the user registering the memory might not be aware of.
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260226-frmr_pools-v4-2-95360b54f15e@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
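The push/pop flow can be sketched in userspace C under loud assumptions: every `demo_*` name and the key fields are hypothetical, a linked list stands in for the RB-tree, the pool depth is arbitrary, and locking and aging are left out. This is an illustration of the idea, not the ib_core interface.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_POOL_DEPTH 8	/* arbitrary cache depth for the sketch */

/* Properties shared by every handle in one pool (hypothetical fields). */
struct demo_key {
	uint32_t access_flags;
	uint32_t num_dma_blocks;
};

struct demo_pool {
	struct demo_key key;
	uint32_t handles[DEMO_POOL_DEPTH];	/* cached driver handles */
	int count;
	struct demo_pool *next;			/* list here, RB-tree in the patch */
};

struct demo_pools {
	struct demo_pool *head;
	uint32_t (*create_handle)(const struct demo_key *key);
	void (*destroy_handle)(uint32_t handle);
};

/* Stand-in driver callbacks that only count how often they run. */
unsigned int demo_created, demo_destroyed;
uint32_t demo_create(const struct demo_key *key) { (void)key; return ++demo_created; }
void demo_destroy(uint32_t handle) { (void)handle; demo_destroyed++; }

/* Find the pool whose key matches, creating an empty one when needed. */
static struct demo_pool *demo_get_pool(struct demo_pools *p,
				       const struct demo_key *key)
{
	struct demo_pool *pool;

	for (pool = p->head; pool; pool = pool->next)
		if (pool->key.access_flags == key->access_flags &&
		    pool->key.num_dma_blocks == key->num_dma_blocks)
			return pool;

	pool = calloc(1, sizeof(*pool));
	if (pool) {
		pool->key = *key;
		pool->next = p->head;
		p->head = pool;
	}
	return pool;
}

/* Registration: reuse a cached handle, or have the driver create one. */
int demo_pop(struct demo_pools *p, const struct demo_key *key, uint32_t *handle)
{
	struct demo_pool *pool = demo_get_pool(p, key);

	if (!pool)
		return -1;
	*handle = pool->count ? pool->handles[--pool->count]
			      : p->create_handle(key);
	return 0;
}

/* Deregistration: cache the handle, or destroy it when the pool is full. */
int demo_push(struct demo_pools *p, const struct demo_key *key, uint32_t handle)
{
	struct demo_pool *pool = demo_get_pool(p, key);

	if (!pool)
		return -1;
	if (pool->count == DEMO_POOL_DEPTH)
		p->destroy_handle(handle);
	else
		pool->handles[pool->count++] = handle;
	return 0;
}

/* Cleanup: destroy every cached handle and free the pools. */
void demo_cleanup(struct demo_pools *p)
{
	while (p->head) {
		struct demo_pool *pool = p->head;

		p->head = pool->next;
		while (pool->count)
			p->destroy_handle(pool->handles[--pool->count]);
		free(pool);
	}
}
```

A pop followed by a push and a second pop with the same key returns the same handle without calling the create callback again, which is the reuse the mechanism aims for.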
|
|
Ensure that .create_cq_umem() and .create_cq() follow the same API
contract, allowing drivers to be gradually migrated to the umem-aware
CQ management flow.
Link: https://patch.msgid.link/20260213-refactor-umem-v1-7-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
In the current implementation, CQ umem is handled both by ib_core and
the driver. ib_core sometimes creates and destroys it, while the driver
also destroys it.
Store the umem in struct ib_cq and ensure that only ib_core manages
its lifetime, relying solely on its internal reference counter.
Link: https://patch.msgid.link/20260213-refactor-umem-v1-5-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
The ib_umem header no longer requires the removed includes or forward
declarations, so drop them to reduce clutter.
Link: https://patch.msgid.link/20260213-refactor-umem-v1-3-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Including ib_umem.h currently triggers circular dependency errors.
These issues can be resolved by removing the include of ib_verbs.h,
which was only needed to resolve the struct ib_device pointer.
>> depmod: ERROR: Cycle detected: ib_core -> ib_uverbs -> ib_core
>> depmod: ERROR: Found 2 modules in dependency cycles!
make[3]: *** [scripts/Makefile.modinst:132: depmod] Error 1
make[3]: Target '__modinst' not remade because of errors.
make[2]: *** [Makefile:1960: modules_install] Error 2
make[1]: *** [Makefile:248: __sub-make] Error 2
make[1]: Target 'modules_install' not remade because of errors.
make: *** [Makefile:248: __sub-make] Error 2
make: Target 'modules_install' not remade because of errors.
Link: https://patch.msgid.link/20260213-refactor-umem-v1-2-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
The DMA iterator logic was mixed into verbs and umem-specific code,
forcing all users to include rdma/ib_umem.h. Move the block iterator
logic into iter.c and rdma/iter.h so that rdma/ib_umem.h and
rdma/ib_verbs.h can be separated in a follow-up patch.
Link: https://patch.msgid.link/20260213-refactor-umem-v1-1-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
When listening on wildcard addresses, we keep a global list of the
application layer rdma_cm_ids, and for every existing device, or any device
added in the future, we try to listen on behalf of each wildcard listener.
When the listener has a restricted_node_type we should prevent listening on
devices with a different node type.
While there, fix the documentation comment of rdma_restrict_node_type()
to include rdma_resolve_addr() instead of having rdma_bind_addr() twice.
Fixes: a760e80e90f5 ("RDMA/core: introduce rdma_restrict_node_type()")
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Steve French <smfrench@gmail.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-rdma@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://patch.msgid.link/20260224165951.3582093-2-metze@samba.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Use "/**" to begin kernel-doc comments. This eliminates these
kernel-doc warnings:
Warning: include/rdma/restrack.h:123 struct member 'kref' not described in
'rdma_restrack_entry'
Warning: include/rdma/restrack.h:123 struct member 'comp' not described in
'rdma_restrack_entry'
(not adding missing return value kernel-doc descriptions)
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260224003149.3175815-1-rdunlap@infradead.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Use the "typedef" keyword as needed.
Correct 2 function parameter names.
Warning: include/rdma/iw_cm.h:42 function parameter 'iw_cm_handler' not
described in 'int'
Warning: include/rdma/iw_cm.h:42 expecting prototype for iw_cm_handler().
Prototype was for int() instead
Warning: include/rdma/iw_cm.h:53 function parameter 'iw_event_handler' not
described in 'int'
Warning: include/rdma/iw_cm.h:53 expecting prototype for
iw_event_handler(). Prototype was for int() instead
Warning: include/rdma/iw_cm.h:104 function parameter 'cm_handler' not
described in 'iw_create_cm_id'
Warning: include/rdma/iw_cm.h:158 function parameter 'private_data' not
described in 'iw_cm_reject'
(not adding missing return value kernel-doc descriptions)
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260224003134.3174856-1-rdunlap@infradead.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add or correct kernel-doc comments to eliminate warnings:
Warning: include/rdma/ib_umem.h:104 function parameter 'biter' not
described in 'rdma_umem_for_each_dma_block'
Warning: include/rdma/ib_umem.h:140 function parameter 'pgsz_bitmap' not
described in 'ib_umem_find_best_pgoff'
Warning: include/rdma/ib_umem.h:141 No description found for return
value of 'ib_umem_find_best_pgoff'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260224003120.3173892-1-rdunlap@infradead.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Use the correct function parameters names to eliminate kernel-doc
warnings:
Warning: include/rdma/ib_cache.h:47 function parameter 'device_handle'
not described in 'ib_get_cached_pkey'
Warning: include/rdma/ib_cache.h:89 function parameter 'port_active'
not described in 'ib_get_cached_port_state'
(not adding missing function return value descriptions)
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260224003106.3172916-1-rdunlap@infradead.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Pull rdma updates from Jason Gunthorpe:
"Usual smallish cycle. The NFS biovec work to push it down into RDMA
instead of indirecting through a scatterlist is pretty nice to see,
been talked about for a long time now.
- Various code improvements in irdma, rtrs, qedr, ocrdma, rxe
- Small driver improvements and minor bug fixes to hns, mlx5, rxe,
mana, irdma
- Robustness improvements in completion processing for EFA
- New query_port_speed() verb to move past limited IBA defined speed
steps
- Support for SG_GAPS in rtrs and many other small improvements
- Rare list corruption fix in iwcm
- Better support for different page sizes in rxe
- Device memory support for mana
- Direct bio vec to kernel MR for use by NFS-RDMA
- QP rate limiting for bnxt_re
- Fix for a remotely triggerable NULL pointer crash in siw
- DMA-buf exporter support for RDMA mmaps like doorbells"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits)
RDMA/mlx5: Implement DMABUF export ops
RDMA/uverbs: Add DMABUF object type and operations
RDMA/uverbs: Support external FD uobjects
RDMA/siw: Fix potential NULL pointer dereference in header processing
RDMA/umad: Reject negative data_len in ib_umad_write
IB/core: Extend rate limit support for RC QPs
RDMA/mlx5: Support rate limit only for Raw Packet QP
RDMA/bnxt_re: Report QP rate limit in debugfs
RDMA/bnxt_re: Report packet pacing capabilities when querying device
RDMA/bnxt_re: Add support for QP rate limiting
MAINTAINERS: Drop RDMA files from Hyper-V section
RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc
svcrdma: use bvec-based RDMA read/write API
RDMA/core: add rdma_rw_max_sge() helper for SQ sizing
RDMA/core: add MR support for bvec-based RDMA operations
RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations
RDMA/core: add bio_vec based RDMA read/write API
RDMA/irdma: Use kvzalloc for paged memory DMA address array
RDMA/rxe: Fix race condition in QP timer handlers
RDMA/mana_ib: Add device‑memory support
...
|
|
Expose DMABUF functionality to userspace through the uverbs interface,
enabling InfiniBand/RDMA devices to export PCI based memory regions
(e.g. device memory) as DMABUF file descriptors. This allows
zero-copy sharing of RDMA memory with other subsystems that support the
dma-buf framework.
A new UVERBS_OBJECT_DMABUF object type and allocation method were
introduced.
During allocation, uverbs invokes the driver to supply the
rdma_user_mmap_entry associated with the given page offset (pgoff).
Based on the returned rdma_user_mmap_entry, uverbs requests the driver
to provide the corresponding physical-memory details as well as the
driver’s PCI provider information.
Using this information, dma_buf_export() is called; if it succeeds,
uobj->object is set to the underlying file pointer returned by the
dma-buf framework.
The file descriptor number follows the standard uverbs allocation flow,
but the file pointer comes from the dma-buf subsystem, including its own
fops and private data.
When an mmap entry is removed, uverbs iterates over its associated
DMABUFs, marks them as revoked, and calls dma_buf_move_notify() so that
their importers are notified.
The same procedure applies during the disassociate flow; final cleanup
occurs when the application closes the file.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/20260201-dmabuf-export-v3-2-da238b614fe3@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
For smbdirect it is required to use different ports depending
on the RDMA protocol. E.g. for iWarp, port 5445 is needed
(as TCP port 445 is already used by the raw TCP transport for SMB),
while InfiniBand, RoCEv1 and RoCEv2 use port 445, as they
use an independent port range (even for RoCEv2, which uses UDP
port 4791 itself).
Currently ksmbd is not able to function correctly at
all if the system has iWarp (RDMA_NODE_RNIC) interface(s)
and any InfiniBand, RoCEv1 and/or RoCEv2 interface(s)
at the same time.
And cifs.ko uses 5445 with a fallback to 445, which
means that, depending on the available interfaces, it tries
5445 in the RoCE range or may try iWarp with 445
as a fallback. This leads to strange error messages
and strange network captures.
To avoid these problems, ksmbd and cifs.ko will be able to
use rdma_restrict_node_type(RDMA_NODE_RNIC) before
trying port 5445 and rdma_restrict_node_type(RDMA_NODE_IB_CA)
before trying port 445. It means we'll get
-ENODEV early from rdma_resolve_addr() without any
network traffic and timeouts.
This is designed to be called before calling any
of rdma_bind_addr(), rdma_resolve_addr() or rdma_listen().
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-rdma@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Acked-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
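The intended calling pattern can be modelled in userspace. Everything named `demo_*` is hypothetical: `demo_resolve()` stands in for a restricted rdma_resolve_addr() call and simply fails fast with -ENODEV when the device's node type does not match the restriction.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for RDMA_NODE_IB_CA and RDMA_NODE_RNIC. */
enum demo_node_type { DEMO_NODE_IB_CA, DEMO_NODE_RNIC };

struct demo_attempt {
	enum demo_node_type restrict_to;
	unsigned short port;
};

static const struct demo_attempt demo_attempts[] = {
	{ DEMO_NODE_RNIC, 5445 },	/* iWarp: TCP port 445 is taken by plain SMB */
	{ DEMO_NODE_IB_CA, 445 },	/* InfiniBand, RoCEv1, RoCEv2 */
};

/* Model of a restricted resolve: no traffic, just an early -ENODEV. */
static int demo_resolve(enum demo_node_type device, const struct demo_attempt *a)
{
	return device == a->restrict_to ? 0 : -ENODEV;
}

/* Returns the port to connect to, or -ENODEV when no attempt matches. */
int demo_pick_port(enum demo_node_type device)
{
	size_t i;

	for (i = 0; i < sizeof(demo_attempts) / sizeof(demo_attempts[0]); i++)
		if (demo_resolve(device, &demo_attempts[i]) == 0)
			return demo_attempts[i].port;
	return -ENODEV;
}
```

Because a mismatched attempt fails before any packet is sent, the fallback from 5445 to 445 no longer produces connection attempts on the wrong transport.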
|
|
svc_rdma_accept() computes sc_sq_depth as the sum of rq_depth and the
number of rdma_rw contexts (ctxts). This value is used to allocate the
Send CQ and to initialize the sc_sq_avail credit pool.
However, when the device uses memory registration for RDMA operations,
rdma_rw_init_qp() inflates the QP's max_send_wr by a factor of three
per context to account for REG and INV work requests. The Send CQ and
credit pool remain sized for only one work request per context,
causing Send Queue exhaustion under heavy NFS WRITE workloads.
Introduce rdma_rw_max_sge() to compute the actual number of Send Queue
entries required for a given number of rdma_rw contexts. Upper layer
protocols call this helper before creating a Queue Pair so that their
Send CQs and credit accounting match the QP's true capacity.
Update svc_rdma_accept() to use rdma_rw_max_sge() when computing
sc_sq_depth, ensuring the credit pool reflects the work requests
that rdma_rw_init_qp() will reserve.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 00bd1439f464 ("RDMA/rw: Support threshold for registration vs scattering to local pages")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260128005400.25147-5-cel@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
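The sizing mismatch comes down to simple arithmetic, sketched below. The function names are hypothetical and the factor of three is taken from the description above; this is not the body of the real helper.

```c
#include <stdbool.h>

/* Work requests per rdma_rw context when MRs are used, per the text above. */
#define DEMO_WRS_PER_MR_CTX 3u

/* Send Queue entries needed for "ctxts" rdma_rw contexts. */
unsigned int demo_rw_send_wrs(unsigned int ctxts, bool uses_mr)
{
	return uses_mr ? ctxts * DEMO_WRS_PER_MR_CTX : ctxts;
}

/* Send Queue depth an upper layer protocol should size its CQ and credits for. */
unsigned int demo_sq_depth(unsigned int rq_depth, unsigned int ctxts, bool uses_mr)
{
	return rq_depth + demo_rw_send_wrs(ctxts, uses_mr);
}
```

For example, with rq_depth = 32 and 16 contexts the old formula gives 48 entries, while a device that needs memory registration actually requires 32 + 3 * 16 = 80.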
|
|
The bvec-based RDMA API currently returns -EOPNOTSUPP when Memory
Region registration is required. This prevents iWARP devices from
using the bvec path, since iWARP requires MR registration for RDMA
READ operations. The force_mr debug parameter is also unusable with
bvec input.
Add rdma_rw_init_mr_wrs_bvec() to handle MR registration for bvec
arrays. The approach creates a synthetic scatterlist populated with
DMA addresses from the bvecs, then reuses the existing ib_map_mr_sg()
infrastructure. This avoids driver changes while keeping the
implementation small.
The synthetic scatterlist is stored in the rdma_rw_ctx for cleanup.
On destroy, the MRs are returned to the pool and the bvec DMA
mappings are released using the stored addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260128005400.25147-4-cel@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The bvec RDMA API maps each bvec individually via dma_map_phys(),
requiring an IOTLB sync for each mapping. For large I/O operations
with many bvecs, this overhead becomes significant.
The two-step IOVA API (dma_iova_try_alloc / dma_iova_link /
dma_iova_sync) allocates a contiguous IOVA range upfront, links
all physical pages without IOTLB syncs, then performs a single
sync at the end. This reduces IOTLB flushes from O(n) to O(1).
It also requires only a single output dma_addr_t compared to extra
per-input element storage in struct scatterlist.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260128005400.25147-3-cel@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The existing rdma_rw_ctx_init() API requires callers to construct a
scatterlist, which is then DMA-mapped page by page. Callers that
already have data in bio_vec form (such as the NVMe-oF target) must
first convert to scatterlist, adding overhead and complexity.
Introduce rdma_rw_ctx_init_bvec() and rdma_rw_ctx_destroy_bvec() to
accept bio_vec arrays directly. The new helpers use dma_map_phys()
for hardware RDMA devices and virtual addressing for software RDMA
devices (rxe, siw), avoiding intermediate scatterlist construction.
Memory registration (MR) path support is deferred to a follow-up
series; callers requiring MR-based transfers (iWARP devices or
force_mr=1) receive -EOPNOTSUPP and should use the scatterlist API.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20260128005400.25147-2-cel@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Some of the functions are local to the module and some are not used
starting from commit 36783dec8d79 ("RDMA/rxe: Delete deprecated module
parameters interface"). Delete and avoid exporting them.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Link: https://patch.msgid.link/20260104-ib-core-misc-v1-2-00367f77f3a8@nvidia.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add new ibv_query_port_speed() verb to enable applications to query
the effective bandwidth of a port.
This verb is particularly useful when the speed cannot be expressed as
the product of an IB speed and a width, where the width is 2^n.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Introduce ib_port_attr_to_rate() to compute the data rate in 100 Mbps
units (deci-Gb/sec) from a port's active_speed and active_width
attributes. This generic helper removes duplicated speed-to-rate
calculations, which are used by sysfs and the upcoming new verb.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
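A minimal sketch of the speed-and-width calculation follows. The enum, the table and `demo_port_rate()` are hypothetical and do not reproduce the kernel helper; the per-lane figures are rounded nominal rates used only for illustration.

```c
/* Hypothetical speed identifiers, not the kernel's enum values. */
enum demo_speed {
	DEMO_SDR, DEMO_DDR, DEMO_QDR, DEMO_FDR,
	DEMO_EDR, DEMO_HDR, DEMO_NDR, DEMO_XDR,
};

/* Rounded nominal per-lane rates in units of 100 Mb/s (assumed values). */
static const unsigned int demo_lane_rate[] = {
	[DEMO_SDR] = 25,
	[DEMO_DDR] = 50,
	[DEMO_QDR] = 100,
	[DEMO_FDR] = 140,
	[DEMO_EDR] = 250,
	[DEMO_HDR] = 500,
	[DEMO_NDR] = 1000,
	[DEMO_XDR] = 2000,
};

/* "width" is the lane count; the result is in 100 Mb/s units. */
unsigned int demo_port_rate(enum demo_speed speed, unsigned int width)
{
	return demo_lane_rate[speed] * width;
}
```

A 4x EDR port thus yields 250 * 4 = 1000 units, that is 100 Gb/s.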
|
|
Add IB_EVENT_DEVICE_SPEED_CHANGE for notifying user applications of
changes in the speed of a device's ports.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add the new rates as defined in the InfiniBand spec for XDR and 8x
link width support.
Furthermore, modify the utility conversion methods accordingly.
Reference: IB Spec Release 1.8
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Link: https://patch.msgid.link/20251120-speed-8-v1-1-e6a7efef8cb8@nvidia.com
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In include/rdma/ib_cm.h:
Correct a typedef's kernel-doc notation by adding the 'typedef' keyword
to it to avoid a warning.
Add a leading " *" to a kernel-doc line to avoid a warning.
Warning: ib_cm.h:289 function parameter 'ib_cm_handler' not described
in 'int'
Warning: ib_cm.h:289 expecting prototype for ib_cm_handler(). Prototype
was for int() instead
Warning: ib_cm.h:484 bad line: connection message in case duplicates
are received.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251112062908.2711007-1-rdunlap@infradead.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Correct the kernel-doc comments format to avoid around 35 kernel-doc
warnings:
- use struct keyword to introduce struct kernel-doc comments
- use correct variable name for some struct members
- use correct function name in comments for some functions
- fix spelling in a few comments
- use a ':' instead of '-' to separate struct members from their
descriptions
- add a function name heading for rvt_div_mtu()
This leaves one struct member that is not described:
rdmavt_qp.h:206: warning: Function parameter or struct member 'wq'
not described in 'rvt_krwq'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251105045127.106822-1-rdunlap@infradead.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Fix 49 kernel-doc warnings in ib_verbs.h:
- Add struct short description for rdma_stat_desc, rdma_hw_stats.
- Fix kernel-doc format for struct members (use ':' instead of '-') for
several structs.
- Don't use "/**" kernel-doc notation for struct members in ib_device_ops
(most members are not documented and most of the kernel-doc was
not formatted correctly).
- Spell function parameters correctly in ib_dma_map_sgtable_attrs(),
ib_device_try_get(), rdma_roce_rescan_device().
- Add kernel-doc for the function parameter in
rdma_flow_label_to_udp_sport().
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251020034320.3011094-1-rdunlap@infradead.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Enable user-space to inject an event into a CM through its event
channel. Two new events are added and supported: RDMA_CM_EVENT_USER and
RDMA_CM_EVENT_INTERNAL. With these 2 events a new event parameter "arg"
is supported, which is passed from sender to receiver transparently.
With this feature an application is able to write an event into a CM
channel with a new user-space rdmacm API. For example thread T1 could
write an event with the API:
rdma_write_cm_event(cm_id, RDMA_CM_EVENT_USER, status, arg);
and thread T2 could receive the event with rdma_get_cm_event().
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Link: https://patch.msgid.link/fdf49d0b17a45933c5d8c1d90605c9447d9a3c73.1751279794.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add new UCMA command and the corresponding CMA implementation. Userspace
can send this command to request service resolution based on service
name or ID.
On a successful resolution, one or multiple service records are
returned, the first one will be used as destination address by default.
Two new CM events are added and returned to caller accordingly:
- RDMA_CM_EVENT_ADDRINFO_RESOLVED: Resolve succeeded;
- RDMA_CM_EVENT_ADDRINFO_ERROR: Resolve failed.
Internally two new CM states are added:
- RDMA_CM_ADDRINFO_QUERY: CM is in the process of IB service
resolution;
- RDMA_CM_ADDRINFO_RESOLVED: CM has finished the resolve process.
With these new states, besides the existing state transition flows, 2 new
flows are supported:
1. The default address is used:
RDMA_CM_ADDR_BOUND ->
RDMA_CM_ADDRINFO_QUERY ->
RDMA_CM_ADDRINFO_RESOLVED ->
RDMA_CM_ROUTE_QUERY
2. To use a different address:
RDMA_CM_ADDR_BOUND ->
RDMA_CM_ADDRINFO_QUERY ->
RDMA_CM_ADDRINFO_RESOLVED ->
RDMA_CM_ADDR_QUERY ->
RDMA_CM_ADDR_RESOLVED ->
RDMA_CM_ROUTE_QUERY
In the 2nd case, resolve_addrinfo returns multiple records, a user
could call rdma_resolve_addr() with the one that is not the first.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Link: https://patch.msgid.link/b6e82ad75522a13b5efe4ff86da0e465aab04cc2.1751279794.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
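The two flows listed above can be encoded as a small transition table. The sketch uses hypothetical `DEMO_*` names mirroring the RDMA_CM_* states and models only the transitions named in this commit message, not the full CMA state machine.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the RDMA_CM_* states named above. */
enum demo_cm_state {
	DEMO_ADDR_BOUND,
	DEMO_ADDRINFO_QUERY,
	DEMO_ADDRINFO_RESOLVED,
	DEMO_ADDR_QUERY,
	DEMO_ADDR_RESOLVED,
	DEMO_ROUTE_QUERY,
};

/* Only the transitions listed in the commit message are accepted. */
bool demo_transition_ok(enum demo_cm_state from, enum demo_cm_state to)
{
	switch (from) {
	case DEMO_ADDR_BOUND:
		return to == DEMO_ADDRINFO_QUERY;
	case DEMO_ADDRINFO_QUERY:
		return to == DEMO_ADDRINFO_RESOLVED;
	case DEMO_ADDRINFO_RESOLVED:
		/* Flow 1 goes straight to the route, flow 2 picks another address. */
		return to == DEMO_ROUTE_QUERY || to == DEMO_ADDR_QUERY;
	case DEMO_ADDR_QUERY:
		return to == DEMO_ADDR_RESOLVED;
	case DEMO_ADDR_RESOLVED:
		return to == DEMO_ROUTE_QUERY;
	default:
		return false;
	}
}

/* Check a whole sequence of states, pair by pair. */
bool demo_path_ok(const enum demo_cm_state *path, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (!demo_transition_ok(path[i - 1], path[i]))
			return false;
	return true;
}
```

The only branching point is RDMA_CM_ADDRINFO_RESOLVED, where the caller either keeps the default (first) record or calls rdma_resolve_addr() with a different one.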
|
|
Add an SA query API ib_sa_service_rec_get() to support building and
sending SA query MADs that ask for service records with a specific
name or ID, and receiving and parsing responses from the SM.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Link: https://patch.msgid.link/9af6c82f3a3a9d975115a33235fb4ffc7c8edb21.1751279793.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|