summaryrefslogtreecommitdiff
path: root/include/linux/mlx5
AgeCommit message (Collapse)AuthorFilesLines
14 daysMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2-2/+7
Pull rdma updates from Jason Gunthorpe: "Many AI driven bug fixes, and several big driver API cleanups - Driver bug fixes and minor cleanups in mlx5, hns, rxe, efa, siw, rtrs, mana, irdma, mlx4. Commonly error path flows, integer arithmetic overflows on unsafe data, out of bounds access, and use after free issues under races. - Second half of the new udata API for drivers focusing on uAPI response - bnxt_re supports more options for QP creation that will allow a dv path in rdma-core - Untangle the module dependencies so drivers don't link to ib_uverbs.ko as was originall intended - Provide a new way to handle umems with a consistent simplified uAPI and update several drivers to use it. This brings dmabuf support to more places and more drivers - Support for mlx5 rate limit and packet pacing for UD and UC - A batch of fixes for the new shared FRMR pools infrastructure" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (148 commits) RDMA/irdma: Replace waitqueue and flag with completion RDMA/hns: Fix memory leak of bonding resources RDMA/rtrs-srv: Bound RDMA-Write length to chunk size in rdma_write_sg docs: infiniband: correct name of option to enable the ib_uverbs module RDMA/bnxt_re: Reject GET_TOGGLE_MEM when toggle page was not allocated RDMA/bnxt_re: Fail DBR related page allocation UAPIs if the feature is disabled RDMA/bnxt_re: Avoid repeated requests to allocate WC pages RDMA/bnxt_re: Proper rollback if the ioremap fails RDMA/bnxt_re: Add a max slot check for SQ RDMA/bnxt_re: Avoid displaying the kernel pointer RDMA/bnxt_re: Free CQ toggle page after firmware teardown RDMA/bnxt_re: Free SRQ toggle page after firmware teardown RDMA/bnxt_re: Initialize dpi variable to zero ABI: sysfs-class-infiniband: minor cleanup RDMA/mlx5: Release the HW‑provided UAR index rather than the SW one RDMA/mlx5: Fix undefined shift of user RQ WQE size RDMA/mlx5: Remove raw RSS QP restrack tracking RDMA/mlx5: Remove DCT restrack tracking RDMA/mlx5: Drop FRMR pool handle on UMR revoke failure RDMA/core: Add ib_frmr_pool_drop for unrecoverable handles ...
2026-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+2
Cross-merge networking fixes after downstream PR (net-7.1-rc8). Conflicts: drivers/net/ethernet/wangxun/txgbe/txgbe_aml.c f67aead16e85 ("net: txgbe: rework service event handling") 57d39faed4c9 ("net: txgbe: improve functions of AML 40G devices") net/rds/info.c 512db8267b73 ("rds: mark snapshot pages dirty in rds_info_getsockopt()") 6e94eeb2a2a6 ("rds: convert to getsockopt_iter") Adjacent changes: include/net/sock.h 1ee90b77b727 ("net: guard timestamp cmsgs to real error queue skbs") f0de88303d5e ("net: make is_skb_wmem() available to modules") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-11RDMA/mlx5: Add support for rate limit in UD and UC QPsMaher Sanalla1-0/+1
Rate limiting is currently supported only for raw packet QPs, where the packet pacing index is programmed into the SQC during SQ modify. Extend rate limit support to UD and UC QPs by setting the pacing index in the QPC during RTR2RTS and RTS2RTS transitions. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260524-packet-pacing-v1-3-3d79439f8d08@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-06-11net/mlx5: Add UD and UC packet pacing capsMaher Sanalla1-2/+6
Add the needed capabilities in mlx5_ifc to support packet pacing for UC and UD QPs. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260524-packet-pacing-v1-1-3d79439f8d08@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-06-09net/mlx5: Fix slab-out-of-bounds in mlx5_query_nic_vport_mac_listDragos Tatulea1-2/+2
mlx5_query_nic_vport_mac_list() sizes its firmware command buffer using the PF's log_max_current_uc/mc_list capabilities. When querying a VF vport with a larger configured max (via devlink), the firmware response can overflow this buffer: BUG: KASAN: slab-out-of-bounds in mlx5_query_nic_vport_mac_list+0x453/0x4c0 [mlx5_core] Read of size 4 at addr ff1100013ffc8a12 by task kworker/u96:2/385 CPU: 12 UID: 0 PID: 385 Comm: kworker/u96:2 Not tainted 7.0.0-rc6+ #1 PREEMPT Hardware name: QEMU Standard PC (Q35 + ICH9, 2009) Workqueue: mlx5_esw_wq esw_vport_change_handler [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x69/0xa0 print_report+0x176/0x4e4 kasan_report+0xc8/0x100 mlx5_query_nic_vport_mac_list+0x453/0x4c0 [mlx5_core] esw_update_vport_addr_list+0x2e3/0xda0 [mlx5_core] esw_vport_change_handle_locked+0xa1f/0x1060 [mlx5_core] esw_vport_change_handler+0x6a/0x90 [mlx5_core] process_one_work+0x87f/0x15e0 worker_thread+0x62b/0x1020 kthread+0x375/0x490 ret_from_fork+0x4dc/0x810 ret_from_fork_asm+0x11/0x20 </TASK> Fix by querying the vport's own HCA caps to size the buffer correctly. Refactor the function to allocate and return the MAC list internally, removing the caller's dependency on knowing the correct max. Fixes: e16aea2744ab ("net/mlx5: Introduce access functions to modify/query vport mac lists") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260604135849.458060-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-09Merge branch 'mlx5-next' of ↵Jakub Kicinski1-4/+7
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next updates 2026-06-07 * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Add sd_group_size bits for SD management net/mlx5: Update IFC allowed_list_size field bits ==================== Link: https://patch.msgid.link/20260607111157.470978-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-06-07net/mlx5: Add sd_group_size bits for SD managementShay Drory1-2/+5
Currently, mlx5 is querying the MPIR register to get the number of PFs that should comprise the SD group. However, this register does not reflect the correct number in complex deployments. Hence, add an sd_group_size field to nic_vport_context to determine the correct number of PFs, and add an sd_group_size capability bit to indicate whether FW supports it. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260529052359.389413-3-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-06-07net/mlx5: Update IFC allowed_list_size field bitsDragos Tatulea1-2/+2
The vport context allowed_list_size was increased from 12 to 16 bits. Writing to this field is protected by the log_max_current_uc/mc_list capabilities. On older FW versions these capabilities are limited to < 2K and only the high bits of the field are extended. This means that the change is backward compatible with older FW versions. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260529052359.389413-2-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-05-25net/mlx5: Add SPF function type for page managementMoshe Shemesh1-0/+1
Add MLX5_SPF to enum mlx5_func_type so SPFs get their own page counter, and add the corresponding WARN check at page cleanup. Wait for SPF pages to be reclaimed during ECPF teardown, alongside the existing host PF and VF page waits. SPF page requests are always identified by vhca_id, so the legacy func_id_to_type() path is not reached for satellite PFs. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260521110843.367329-13-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19net/mlx5: add debugfs stats for frag buf dma poolsNimrod Oren1-0/+1
Add a debugfs file exposing per-node DMA pool usage for mlx5_frag_buf allocations. # cat /sys/kernel/debug/mlx5/<dev>/frag_buf_dma_pools node block_size used_blocks allocated_blocks 0 4096 0 0 0 8192 0 0 0 16384 0 0 0 32768 0 0 0 65536 0 0 1 4096 0 0 1 8192 0 0 1 16384 0 0 1 32768 0 0 1 65536 0 0 Signed-off-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260514104925.337570-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-09net/mlx5: Add VHCA_ID page management mode supportMoshe Shemesh1-0/+7
Add support for VHCA_ID-based page management mode. When the device firmware advertises the icm_mng_function_id_mode capability with MLX5_ID_MODE_FUNCTION_VHCA_ID, page management operations between the driver and firmware may use vhca_id instead of function_id as the effective function identifier, and the ec_function field is ignored. Update page management commands to conditionally set ec_function field only in FUNC_ID mode. Boot page allocation always uses FUNC_ID mode semantics for backward compatibility, as the capability bit is only available after set_hca_cap(). If after set_hca_cap() VHCA_ID mode was set, modify the tracking of the boot pages in page_root_xa to use vhca_id too. Add mlx5_esw_vhca_id_to_func_type() to resolve the function type in VHCA_ID mode, enabling per-type debugfs counters. Use a dedicated vhca_type_map xarray, to provide lockless lookup. Store the resolved type on each fw_page at allocation time so reclaim and release paths read it directly without any lookup. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260506133239.276237-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-09net/mlx5: Make debugfs page counters by function type dynamicMoshe Shemesh1-0/+2
Make the per function type debugfs page counters dynamically added after mlx5_eswitch_init(). When page management operates in vhca_id mode, only the function acting as either eSwitch or vport manager can initialize the eSwitch structure and translate the vhca_id to function type for the functions to which it supplies pages. The next patch will add support for page management in vhca_id mode. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260506133239.276237-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-07net/mlx5: E-Switch, serialize representor lifecycleMark Bloch1-0/+6
Representor callbacks can be registered and unregistered while the E-Switch is already in switchdev mode, and the same E-Switch may also be reconfigured by devlink, VF changes and SF changes. Serialize these paths with the per-E-Switch representor mutex instead of relying on ad-hoc bit state and wait queues. Take the representor lock around the mode transition, VF/SF representor changes and representor ops registration. Keep mode_lock and the representor lock unnested by using the operation flag while the mode lock is dropped. During mode changes, drop the representor lock around the auxiliary bus rescan because driver bind/unbind may register or unregister representor ops. Split representor ops registration into locked public wrappers and blocked internal helpers, clear the ops pointer on unregister, and add nested wrappers for the shared-FDB master IB path that registers peer representor ops while another E-Switch representor lock is already held. On unregister, always call __unload_reps_all_vport() before marking reps unregistered and clearing rep_ops. The per-representor state check makes this a no-op for types that were not loaded, so unregister no longer has to infer load state from esw->mode. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260503202726.266415-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-02net/mlx5: use internal dma pools for frag buf allocNimrod Oren1-0/+2
Add mlx5_dma_pool alloc/free paths, and wire mlx5_frag_buf allocation and free paths to use them. mlx5_frag_buf_alloc_node() now selects an mlx5_dma_pool to allocate fragments from, instead of directly allocating full coherent pages. mlx5_frag_buf_free() frees from the respective pool. mlx5_dma_pool_alloc() keeps allocation fast by maintaining pages with available indexes at the head of the list, so the common allocation path can take a free index immediately. New backing pages are allocated only when no free index is available. mlx5_dma_pool_free() returns released indexes to the pool and frees a backing page once all of its indexes become free. This avoids keeping fully free pages for the lifetime of the pool and reduces coherent DMA memory footprint. Signed-off-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260429201429.223809-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-02net/mlx5: add frag buf pools create/destroy pathsNimrod Oren1-2/+5
Introduce mlx5 DMA pool and pool-page data structures, and add the creation and teardown paths. Each NUMA node owns a set of mlx5_dma_pool instances, each one with a different block size. The sizes are defined as all powers of two starting from MLX5_ADAPTER_PAGE_SHIFT and up to PAGE_SHIFT. Since mlx5_frag_bufs are used to back objects whose sizes are encoded relative to MLX5_ADAPTER_PAGE_SHIFT, a smaller block_shift value cannot be used. Requests larger than PAGE_SIZE continue to be handled as page-sized fragments, as in the existing frag-buf allocation model. Signed-off-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260429201429.223809-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-29net/mlx5: Extend query_esw_functions output for multi-function supportMoshe Shemesh1-5/+59
Update the query_esw_functions command to support a new response layout that can report data for multiple network functions. Setting bit 14 of the op_mod field selects the v1 layout with network_function_params entries instead of the legacy host_params_context. The query_host_net_function_v1 read-only capability indicates firmware support for layout version 1, and query_host_net_function_num_max advertises the maximum number of network function entries. Define a new network_function_params layout and a net_function_params union that groups host_params_context and network_function_params. Rework the query_esw_functions output to use a flexible array of this union, and adjust existing driver callers to use it. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260428053851.220089-5-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-04-29net/mlx5: Remove unused host_sf_enable fieldMoshe Shemesh1-1/+0
Drop the unused host_sf_enable array from mlx5_ifc_query_esw_functions_out_bits layout. This field has been deprecated in firmware and is not referenced by the mlx5 driver, so it can be safely removed. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260428053851.220089-4-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-04-29net/mlx5: Add function_id_type for enable/disable_hca cmdsMoshe Shemesh1-2/+6
Add a function_id_type field to the enable_hca and disable_hca command input layouts in mlx5_ifc.h to allow using vhca_id as the function index instead of function_id. The new field support by firmware is indicated by the function_id_type_vhca_id capability bit, which is already exposed in hca caps. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260428053851.220089-3-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-04-29mlx5: Rename the vport number enums for host PF and VFMoshe Shemesh2-3/+3
Rename the vport number enums MLX5_VPORT_PF to MLX5_VPORT_HOST_PF and MLX5_VPORT_FIRST_VF to MLX5_VPORT_FIRST_HOST_VF to indicate that these vport indices represent the host PF and its VFs. This prepares the code for upcoming support of an additional PF type. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260428053851.220089-2-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-04-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-11/+0
Pull rdma updates from Jason Gunthorpe: "The usual collection of driver changes, more core infrastructure updates that typical this cycle: - Minor cleanups and kernel-doc fixes in bnxt_re, hns, rdmavt, efa, ocrdma, erdma, rtrs, hfi1, ionic, and pvrdma - New udata validation framework and driver updates - Modernize CQ creation interface in mlx4 and mlx5, manage CQ umem in core - Promote UMEM to a core component, split out DMA block iterator logic - Introduce FRMR pools with aging, statistics, pinned handles, and netlink control and use it in mlx5 - Add PCIe TLP emulation support in mlx5 - Extend umem to work with revocable pinned dmabuf's and use it in irdma - More net namespace improvements for rxe - GEN4 hardware support in irdma - First steps to MW and UC support in mana_ib - Support for CQ umem and doorbells in bnxt_re - Drop opa_vnic driver from hfi1 Fixes: - IB/core zero dmac neighbor resolution race - GID table memory free - rxe pad/ICRC validation and r_key async errors - mlx4 external umem for CQ - umem DMA attributes on unmap - mana_ib RX steering on RSS QP destroy" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (116 commits) RDMA/core: Fix user CQ creation for drivers without create_cq RDMA/ionic: bound node_desc sysfs read with %.64s IB/core: Fix zero dmac race in neighbor resolution RDMA/mana_ib: Support memory windows RDMA/rxe: Validate pad and ICRC before payload_size() in rxe_rcv RDMA/core: Prefer NLA_NUL_STRING RDMA/core: Fix memory free for GID table RDMA/hns: Remove the duplicate calls to ib_copy_validate_udata_in() RDMA: Remove redundant = {} for udata req structs RDMA/irdma: Add missing comp_mask check in alloc_ucontext RDMA/hns: Add missing comp_mask check in create_qp RDMA/mlx5: Pull comp_mask validation into ib_copy_validate_udata_in_cm() RDMA: Use ib_copy_validate_udata_in_cm() for zero comp_mask RDMA/hns: Use ib_copy_validate_udata_in() RDMA/mlx4: Use ib_copy_validate_udata_in() for QP RDMA/mlx4: Use ib_copy_validate_udata_in() RDMA/mlx5: Use ib_copy_validate_udata_in() for MW RDMA/mlx5: Use ib_copy_validate_udata_in() for SRQ RDMA/pvrdma: Use ib_copy_validate_udata_in() for srq RDMA: Use ib_copy_validate_udata_in() for implicit full structs ...
2026-04-16Merge tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds1-2/+14
Pull VFIO updates from Alex Williamson: - Update QAT vfio-pci variant driver for Gen 5, 420xx devices (Vijay Sundar Selvamani, Suman Kumar Chakraborty, Giovanni Cabiddu) - Fix vfio selftest MMIO DMA mapping selftest (Alex Mastro) - Conversions to const struct class in support of class_create() deprecation (Jori Koolstra) - Improve selftest compiler compatibility by avoiding initializer on variable-length array (Manish Honap) - Define new uAPI for drivers supporting migration to advise user- space of new initial data for reducing target startup latency. Implemented for mlx5 vfio-pci variant driver (Yishai Hadas) - Enable vfio selftests on aarch64, not just cross-compiles reporting arm64 (Ted Logan) - Update vfio selftest driver support to include additional DSA devices (Yi Lai) - Unconditionally include debugfs root pointer in vfio device struct, avoiding a build failure seen in hisi_acc variant driver without debugfs otherwise (Arnd Bergmann) - Add support for the s390 ISM (Internal Shared Memory) device via a new variant driver. The device is unique in the size of its BAR space (256TiB) and lack of mmap support (Julian Ruess) - Enforce that vfio-pci drivers implement a name in their ops structure for use in sequestering SR-IOV VFs (Alex Williamson) - Prune leftover group notifier code (Paolo Bonzini) - Fix Xe vfio-pci variant driver to avoid migration support as a dependency in the reset path and missing release call (Michał Winiarski) * tag 'vfio-v7.1-rc1' of https://github.com/awilliam/linux-vfio: (23 commits) vfio/xe: Add a missing vfio_pci_core_release_dev() vfio/xe: Reorganize the init to decouple migration from reset vfio: remove dead notifier code vfio/pci: Require vfio_device_ops.name MAINTAINERS: add VFIO ISM PCI DRIVER section vfio/ism: Implement vfio_pci driver for ISM devices vfio/pci: Rename vfio_config_do_rw() to vfio_pci_config_rw_single() and export it vfio: unhide vdev->debug_root vfio/qat: add support for Intel QAT 420xx VFs vfio: selftests: Support DMR and GNR-D DSA devices vfio: selftests: Build tests on aarch64 vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO vfio/mlx5: consider inflight SAVE during PRE_COPY net/mlx5: Add IFC bits for migration state vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase vfio: selftests: Fix VLA initialisation in vfio_pci_irq_set() vfio: uapi: fix comment typo vfio: mdev: replace mtty_dev->vd_class with a const struct class ...
2026-04-13Merge branch 'mlx5-next' of ↵Jakub Kicinski2-5/+13
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next updates 2026-04-09 * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Add icm_mng_function_id_mode cap bit net/mlx5: Rename MLX5_PF page counter type to MLX5_SELF net/mlx5: Add vhca_id_type bit to alias context mlx5: Remove redundant iseg base ==================== Link: https://patch.msgid.link/20260409110431.154894-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09net/mlx5: Add icm_mng_function_id_mode cap bitMoshe Shemesh1-1/+7
Introduce the capability bit icm_mng_function_id_mode to indicate that the device firmware uses vhca_id instead of function_id as the effective identifier for the firmware commands MANAGE_PAGES, QUERY_PAGES, and page request event. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260403090028.137783-3-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-04-09net/mlx5: Rename MLX5_PF page counter type to MLX5_SELFMoshe Shemesh1-1/+1
The MLX5_PF enum value in mlx5_func_type is used to track firmware page allocations for the page manager function itself, which is either the ECPF on SmartNIC systems or the host PF when there is no ECPF. Rename it to MLX5_SELF to accurately reflect that this counter tracks pages allocated by the manager for its own use, regardless of whether it is a PF or ECPF. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260403090028.137783-2-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-22net/mlx5: Add vhca_id_type bit to alias contextPatrisious Haddad1-2/+5
Add vhca_id_type bit to alias context which allows indicating the vhca_id_type to be passed at vhca_id_to_be_accessed, which can be either HW or SW, note that SW_VHCA_ID must be used to allow alias to work properly after migration. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260319122211.27384-3-tariqt@nvidia.com Reviewed-by: Joe Damato <joe@dama.to> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-22mlx5: Remove redundant iseg baseParav Pandit1-1/+0
iseg_base and base_addr both point to BAR0, making iseg_base redundant. Remove iseg_base and rely on base_addr instead, reducing the size of struct mlx5_core_dev. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260319122211.27384-2-tariqt@nvidia.com Reviewed-by: Joe Damato <joe@dama.to> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-19net/mlx5: Add IFC bits for migration stateYishai Hadas1-2/+14
Add the relevant IFC bits for querying an extra migration state from the device. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20260317161753.18964-5-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-03-19Merge branch 'mlx5-next' of ↵Jakub Kicinski4-11/+79
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next updates 2026-03-17 The following pull-request contains common mlx5 updates * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Expose MLX5_UMR_ALIGN definition {net/RDMA}/mlx5: Add LAG demux table API and vport demux rules net/mlx5: Add VHCA RX flow destination support for FW steering net/mlx5: LAG, replace mlx5_get_dev_index with LAG sequence number net/mlx5: E-switch, modify peer miss rule index to vhca_id net/mlx5: LAG, use xa_alloc to manage LAG device indices net/mlx5: LAG, replace pf array with xarray net/mlx5: Add silent mode set/query and VHCA RX IFC bits net/mlx5: Add IFC bits for shared headroom pool PBMC support net/mlx5: Expose TLP emulation capabilities net/mlx5: Add TLP emulation device capabilities ==================== Link: https://patch.msgid.link/20260317075844.12066-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-16net/mlx5: Expose MLX5_UMR_ALIGN definitionTariq Toukan1-0/+1
Expose HW constant value in a shared header, to be used by core/EN drivers. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260309093435.1850724-10-tariqt@nvidia.com Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-16{net/RDMA}/mlx5: Add LAG demux table API and vport demux rulesShay Drory2-3/+13
Downstream patches will introduce SW-only LAG (e.g. shared_fdb without HW LAG). In this mode the firmware cannot create the LAG demux table, but vport demuxing is still required. Move LAG demux flow-table ownership to the LAG layer and introduce APIs to init/cleanup the demux table and add/delete per-vport rules. Adjust the RDMA driver to use the new APIs. In this mode, the LAG layer will create a flow group that matches vport metadata. Vports that are not native to the LAG master eswitch add the demux rule during IB representor load and remove it on unload. The demux rule forward traffic from said vports to their native eswitch manager via a new dest type - MLX5_FLOW_DESTINATION_TYPE_VHCA_RX. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260309093435.1850724-9-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-16net/mlx5: Add VHCA RX flow destination support for FW steeringShay Drory1-0/+4
Introduce MLX5_FLOW_DESTINATION_TYPE_VHCA_RX as a new flow steering destination type. Wire the new destination through flow steering command setup by mapping it to MLX5_IFC_FLOW_DESTINATION_TYPE_VHCA_RX and passing the vhca id, extend forward-destination validation to accept it, and teach the flow steering tracepoint formatter to print rx_vhca_id. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260309093435.1850724-8-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-16net/mlx5: LAG, replace mlx5_get_dev_index with LAG sequence numberShay Drory1-0/+11
Introduce mlx5_lag_get_dev_seq() which returns a device's sequence number within the LAG: master is always 0, remaining devices numbered sequentially. This provides a stable index for peer flow tracking and vport ordering without depending on native_port_num. Replace mlx5_get_dev_index() usage in en_tc.c (peer flow array indexing) and ib_rep.c (vport index ordering) with the new API. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260309093435.1850724-7-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-16net/mlx5: Add silent mode set/query and VHCA RX IFC bitsShay Drory1-5/+14
Update the mlx5 IFC headers with newly defined capability and command-layout bits: - Add silent_mode_query and rename silent_mode to silent_mode_set cap fields. - Add forward_vhca_rx and MLX5_IFC_FLOW_DESTINATION_TYPE_VHCA_RX. - Expose silent mode fields in the L2 table query command structures. Update the SD support check to use the new capability name (silent_mode_set) to match the updated IFC definition. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260309093435.1850724-3-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-16net/mlx5: Add IFC bits for shared headroom pool PBMC supportAlexei Lazar1-2/+5
Add hardware interface definitions for shared headroom pool (SHP) in port buffer management: - shp_pbmc_pbsr_support: capability bit in PCAM enhanced features indicating device support for shared headroom pool in PBMC/PBSR. - shared_headroom_pool: buffer entry in PBMC register (pbmc_reg_bits) for the shared headroom pool configuration, reusing the bufferx layout; reduce trailing reserved region accordingly. Signed-off-by: Alexei Lazar <alazar@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260309093435.1850724-2-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-14net/mlx5: Add a shared devlink instance for PFs on same chipJiri Pirko1-0/+1
Use the previously introduced shared devlink infrastructure to create a shared devlink instance for mlx5 PFs that reside on the same physical chip. The shared instance is identified by the chip's serial number extracted from PCI VPD (V3 keyword, with fallback to serial number for older devices). Each PF that probes calls mlx5_shd_init() which extracts the chip serial number and uses devlink_shd_get() to get or create the shared instance. When a PF is removed, mlx5_shd_uninit() calls devlink_shd_put() to release the reference. The shared instance is automatically destroyed when the last PF is removed. Make the PF devlink instances nested in this shared devlink instance, allowing userspace to identify which PFs belong to the same physical chip. Example: pci/0000:08:00.0: index 0 nested_devlink: auxiliary/mlx5_core.eth.0 devlink_index/1: index 1 nested_devlink: pci/0000:08:00.0 pci/0000:08:00.1 auxiliary/mlx5_core.eth.0: index 2 pci/0000:08:00.1: index 3 nested_devlink: auxiliary/mlx5_core.eth.1 auxiliary/mlx5_core.eth.1: index 4 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260312100407.551173-14-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-05Add support for TLP emulationLeon Romanovsky2-1/+31
This series adds support for Transaction Layer Packet (TLP) emulation response gateway regions, enabling userspace device emulation software to write TLP responses directly to lower layers without kernel driver involvement. Currently, the mlx5 driver exposes VirtIO emulation access regions via the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that ioctl to also support allocating TLP response gateway channels for PCI device emulation use cases. Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05net/mlx5: Expose TLP emulation capabilitiesMaher Sanalla1-0/+9
Expose and query TLP device emulation caps on driver load. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-05net/mlx5: Add TLP emulation device capabilitiesMaher Sanalla1-1/+22
Introduce the hardware structures and definitions needed for the driver support of TLP emulation in mlx5_ifc. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-02net/mlx5: Drop MR cache related codeMichael Guralnik1-11/+0
Following mlx5_ib move to using FRMR pools, drop all unused code of MR cache. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-7-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-19net/mlx5: Fix multiport device check over light SFsShay Drory1-2/+2
Driver is using num_vhca_ports capability to distinguish between multiport master device and multiport slave device. num_vhca_ports is a capability the driver sets according to the MAX num_vhca_ports capability reported by FW. On the other hand, light SFs doesn't set the above capbility. This leads to wrong results whenever light SFs is checking whether he is a multiport master or slave. Therefore, use the MAX capability to distinguish between master and slave devices. Fixes: e71383fb9cd1 ("net/mlx5: Light probe local SFs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260218072904.1764634-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06net/mlx5: Fix 1600G link mode enum namingYael Chemla1-1/+1
Rename TAUI/TBASE to GAUI/GBASE in 1600G link mode identifier and its usage in ethtool and link-info tables. Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://patch.msgid.link/20260204194324.1723534-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05net/mlx5e: RX, Drop oversized packets in non-linear modeDragos Tatulea1-0/+5
Currently the driver has an inconsistent behaviour between modes when it comes to oversized packets that are not dropped through the physical MTU check in HW. This can happen for Multi Host configurations where each port has a different MTU. Current behavior: 1) Striding RQ in linear mode drops the packet in SW and counts it with oversize_pkts_sw_drop. 2) Striding RQ in non-linear mode allows it like a normal packet. 3) Legacy RQ can't receive oversized packets by design: the RX WQE uses MTU sized packet buffers. This inconsistency is not a violation of the netdev policy [1] but it is better to be consistent across modes. This patch aligns (2) with (1) and (3). One exception is added for LRO: don't drop the oversized packet if it is an LRO packet. As now rq->hw_mtu always needs to be updated during the MTU change flow, drop the reset avoidance optimization from mlx5e_change_mtu(). Extract the CQE LRO segments reading into a helper function as it is used twice now. [1] Documentation/networking/netdevices.rst#L205 Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260203072130.1710255-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13net/mlx5: Add IFC bits for extended ETS rate limit bandwidth valueAlexei Lazar1-3/+4
Add hardware interface definitions to support extended bandwidth rate limiting in the QoS Enhanced Transmission Selection (ETS) configuration. The new fields include: - max_bw_value: extended from 8-bit to 16-bit in ets_tcn_config_reg, simplifying the implementation by using a single field instead of separate MSB/LSB fields. - qetcr_qshr_max_bw_val_msb: capability bit in qcam_qos_feature_cap_mask indicating device support for the extended 16-bit max_bw_value field. These interface additions are prerequisites for increasing the per-TC rate limit beyond 255 Gbps to support higher-bandwidth NICs. Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768200608-1543180-1-git-send-email-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-05net/mlx5: Add support for querying bond speedOr Har-Toov1-1/+1
Add mlx5_lag_query_bond_speed() to query the aggregated speed of lag configurations with a bond device. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-05net/mlx5: Handle port and vport speed change events in MPESWOr Har-Toov2-0/+3
Add port change event handling logic for MPESW LAG mode, ensuring VFs are updated when the speed of LAG physical ports changes. This triggers a speed update workflow when relevant port state changes occur, enabling consistent and accurate reporting of VF bandwidth. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-05net/mlx5: Propagate LAG effective max_tx_speed to vportsOr Har-Toov1-0/+4
Currently, vports report only their parent's uplink speed, which in LAG setups does not reflect the true aggregated bandwidth. This makes it hard for upper-layer software to optimize load balancing decisions based on accurate bandwidth information. Fix the issue by calculating the possible maximum speed of a LAG as the sum of speeds of all active uplinks that are part of the LAG. Propagate this effective max speed to vports associated with the LAG whenever a relevant event occurs, such as physical port link state changes or LAG creation/modification. With this change, upper-layer components receive accurate bandwidth information corresponding to the active members of the LAG and can make better load balancing decisions. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-05net/mlx5: Add max_tx_speed and its CAP bit to IFCOr Har-Toov1-3/+6
Introduce the max_tx_speed field to the query and modify_vport_state structures. Add the esw_vport_state_max_tx_speed capability bit, indicating the firmware support modifying the max_tx_speed field via the MODIFY_VPORT_STATE command. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-20net/mlx5: Move SF dev table notifier registration outside the PF devlink lockCosmin Ratiu1-0/+1
This completes the previous patches by moving notifier registration for SF dev tables outside the devlink locked critical section in mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. This is only done for non-SFs, since SFs do not have a SF HW table themselves. After this patch, notifiers can grab the PF devlink lock (soon to be necessary) without creating a locking cycle. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-7-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20net/mlx5: Move the SF table notifiers outside the devlink lockCosmin Ratiu1-0/+3
Move the SF table notifiers registration/unregistration outside of mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. This is only done for non-SFs, since SFs do not have a SF table themselves and thus don't need notifiers. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-6-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20net/mlx5: Move the SF HW table notifier outside the devlink lockCosmin Ratiu1-0/+1
Move the SF HW table notifier registration/unregistration outside of mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. This is only done for non-SFs, since SFs do not have a SF HW table themselves. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>