<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/mlx5/driver.h, branch v6.1.168</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-01-11T14:18:43+00:00</updated>
<entry>
<title>net/mlx5: Create a new profile for SFs</title>
<updated>2026-01-11T14:18:43+00:00</updated>
<author>
<name>Parav Pandit</name>
<email>parav@mellanox.com</email>
</author>
<published>2020-04-23T13:27:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e7b57d5b7371d0f55f7d078ad0917699245aff3e'/>
<id>urn:sha1:e7b57d5b7371d0f55f7d078ad0917699245aff3e</id>
<content type='text'>
[ Upstream commit 9df839a711aee437390b16ee39cf0b5c1620be6a ]

Create a new profile for SFs in order to disable the command cache.
Each function command cache consumes ~500KB of memory, when using a
large number of SFs this savings is notable on memory constarined
systems.

Use a new profile to provide for future differences between SFs and PFs.

The mr_cache not used for non-PF functions, so it is excluded from the
new profile.

Signed-off-by: Parav Pandit &lt;parav@mellanox.com&gt;
Reviewed-by: Bodong Wang &lt;bodong@nvidia.com&gt;
Signed-off-by: Saeed Mahameed &lt;saeedm@nvidia.com&gt;
Stable-dep-of: 5846a365fc64 ("net/mlx5: Drain firmware reset in shutdown callback")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/mlx5: Expose shared buffer registers bits and structs</title>
<updated>2025-12-06T21:12:33+00:00</updated>
<author>
<name>Maher Sanalla</name>
<email>msanalla@nvidia.com</email>
</author>
<published>2022-11-28T16:00:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=967bd1254ff590f907b519c67ed3ae9405f9c4f9'/>
<id>urn:sha1:967bd1254ff590f907b519c67ed3ae9405f9c4f9</id>
<content type='text'>
[ Upstream commit 8d231dbc3b10155727bcfa9e543d397ad357f14f ]

Add the shared receive buffer management and configuration registers:
1. SBPR - Shared Buffer Pools Register
2. SBCM - Shared Buffer Class Management Register

Signed-off-by: Maher Sanalla &lt;msanalla@nvidia.com&gt;
Reviewed-by: Moshe Shemesh &lt;moshe@nvidia.com&gt;
Signed-off-by: Saeed Mahameed &lt;saeedm@nvidia.com&gt;
Stable-dep-of: 9fcc2b6c1052 ("net/mlx5e: Fix potentially misleading debug message")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/mlx5: Fix error flow upon firmware failure for RQ destruction</title>
<updated>2025-06-27T10:07:10+00:00</updated>
<author>
<name>Patrisious Haddad</name>
<email>phaddad@nvidia.com</email>
</author>
<published>2025-04-28T11:34:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cf32affe6f3801cfb72a65e69c4bc7a8ee9be100'/>
<id>urn:sha1:cf32affe6f3801cfb72a65e69c4bc7a8ee9be100</id>
<content type='text'>
[ Upstream commit 5d2ea5aebbb2f3ebde4403f9c55b2b057e5dd2d6 ]

Upon RQ destruction if the firmware command fails which is the
last resource to be destroyed some SW resources were already cleaned
regardless of the failure.

Now properly rollback the object to its original state upon such failure.

In order to avoid a use-after free in case someone tries to destroy the
object again, which results in the following kernel trace:
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 37589 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148
Modules linked in: rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) rfkill mlx5_core(OE) mlxdevm(OE) ib_uverbs(OE) ib_core(OE) psample mlxfw(OE) mlx_compat(OE) macsec tls pci_hyperv_intf sunrpc vfat fat virtio_net net_failover failover fuse loop nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_console virtio_gpu virtio_blk virtio_dma_buf virtio_mmio dm_mirror dm_region_hash dm_log dm_mod xpmem(OE)
CPU: 0 UID: 0 PID: 37589 Comm: python3 Kdump: loaded Tainted: G           OE     -------  ---  6.12.0-54.el10.aarch64 #1
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : refcount_warn_saturate+0xf4/0x148
lr : refcount_warn_saturate+0xf4/0x148
sp : ffff80008b81b7e0
x29: ffff80008b81b7e0 x28: ffff000133d51600 x27: 0000000000000001
x26: 0000000000000000 x25: 00000000ffffffea x24: ffff00010ae80f00
x23: ffff00010ae80f80 x22: ffff0000c66e5d08 x21: 0000000000000000
x20: ffff0000c66e0000 x19: ffff00010ae80340 x18: 0000000000000006
x17: 0000000000000000 x16: 0000000000000020 x15: ffff80008b81b37f
x14: 0000000000000000 x13: 2e656572662d7265 x12: ffff80008283ef78
x11: ffff80008257efd0 x10: ffff80008283efd0 x9 : ffff80008021ed90
x8 : 0000000000000001 x7 : 00000000000bffe8 x6 : c0000000ffff7fff
x5 : ffff0001fb8e3408 x4 : 0000000000000000 x3 : ffff800179993000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000133d51600
Call trace:
 refcount_warn_saturate+0xf4/0x148
 mlx5_core_put_rsc+0x88/0xa0 [mlx5_ib]
 mlx5_core_destroy_rq_tracked+0x64/0x98 [mlx5_ib]
 mlx5_ib_destroy_wq+0x34/0x80 [mlx5_ib]
 ib_destroy_wq_user+0x30/0xc0 [ib_core]
 uverbs_free_wq+0x28/0x58 [ib_uverbs]
 destroy_hw_idr_uobject+0x34/0x78 [ib_uverbs]
 uverbs_destroy_uobject+0x48/0x240 [ib_uverbs]
 __uverbs_cleanup_ufile+0xd4/0x1a8 [ib_uverbs]
 uverbs_destroy_ufile_hw+0x48/0x120 [ib_uverbs]
 ib_uverbs_close+0x2c/0x100 [ib_uverbs]
 __fput+0xd8/0x2f0
 __fput_sync+0x50/0x70
 __arm64_sys_close+0x40/0x90
 invoke_syscall.constprop.0+0x74/0xd0
 do_el0_svc+0x48/0xe8
 el0_svc+0x44/0x1d0
 el0t_64_sync_handler+0x120/0x130
 el0t_64_sync+0x1a4/0x1a8

Fixes: e2013b212f9f ("net/mlx5_core: Add RQ and SQ event handling")
Signed-off-by: Patrisious Haddad &lt;phaddad@nvidia.com&gt;
Link: https://patch.msgid.link/3181433ccdd695c63560eeeb3f0c990961732101.1745839855.git.leon@kernel.org
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/mlx5: use do_aux_work for PHC overflow checks</title>
<updated>2025-02-21T12:49:32+00:00</updated>
<author>
<name>Vadim Fedorenko</name>
<email>vadfed@meta.com</email>
</author>
<published>2025-01-07T10:48:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=67adfca904ca6604ed334ae334fa761e220bb1c7'/>
<id>urn:sha1:67adfca904ca6604ed334ae334fa761e220bb1c7</id>
<content type='text'>
[ Upstream commit e61e6c415ba9ff2b32bb6780ce1b17d1d76238f1 ]

The overflow_work is using system wq to do overflow checks and updates
for PHC device timecounter, which might be overhelmed by other tasks.
But there is dedicated kthread in PTP subsystem designed for such
things. This patch changes the work queue to proper align with PTP
subsystem and to avoid overloading system work queue.
The adjfine() function acts the same way as overflow check worker,
we can postpone ptp aux worker till the next overflow period after
adjfine() was called.

Reviewed-by: Dragos Tatulea &lt;dtatulea@nvidia.com&gt;
Signed-off-by: Vadim Fedorenko &lt;vadfed@meta.com&gt;
Acked-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Link: https://patch.msgid.link/20250107104812.380225-1-vadfed@meta.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/mlx5: Enforce same type port association for multiport RoCE</title>
<updated>2025-01-09T12:29:57+00:00</updated>
<author>
<name>Patrisious Haddad</name>
<email>phaddad@nvidia.com</email>
</author>
<published>2024-12-03T13:45:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cbc35242a97a472ba8e3a87cac2ad861a4f9b164'/>
<id>urn:sha1:cbc35242a97a472ba8e3a87cac2ad861a4f9b164</id>
<content type='text'>
[ Upstream commit e05feab22fd7dabcd6d272c4e2401ec1acdfdb9b ]

Different core device types such as PFs and VFs shouldn't be affiliated
together since they have different capabilities, fix that by enforcing
type check before doing the affiliation.

Fixes: 32f69e4be269 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Reviewed-by: Mark Bloch &lt;mbloch@nvidia.com&gt;
Signed-off-by: Patrisious Haddad &lt;phaddad@nvidia.com&gt;
Link: https://patch.msgid.link/88699500f690dff1c1852c1ddb71f8a1cc8b956e.1733233480.git.leonro@nvidia.com
Reviewed-by: Mateusz Polchlopek &lt;mateusz.polchlopek@intel.com&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/mlx5: Add a timeout to acquire the command queue semaphore</title>
<updated>2024-06-12T09:03:19+00:00</updated>
<author>
<name>Akiva Goldberger</name>
<email>agoldberger@nvidia.com</email>
</author>
<published>2024-05-09T11:29:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4baae687a20ef2b82fde12de3c04461e6f2521d6'/>
<id>urn:sha1:4baae687a20ef2b82fde12de3c04461e6f2521d6</id>
<content type='text'>
[ Upstream commit 485d65e1357123a697c591a5aeb773994b247ad7 ]

Prevent forced completion handling on an entry that has not yet been
assigned an index, causing an out of bounds access on idx = -22.
Instead of waiting indefinitely for the sem, blocking flow now waits for
index to be allocated or a sem acquisition timeout before beginning the
timer for FW completion.

Kernel log example:
mlx5_core 0000:06:00.0: wait_func_handle_exec_timeout:1128:(pid 185911): cmd[-22]: CREATE_UCTX(0xa04) No done completion

Fixes: 8e715cd613a1 ("net/mlx5: Set command entry semaphore up once got index free")
Signed-off-by: Akiva Goldberger &lt;agoldberger@nvidia.com&gt;
Reviewed-by: Moshe Shemesh &lt;moshe@nvidia.com&gt;
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Link: https://lore.kernel.org/r/20240509112951.590184-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/mlx5: Re-organize mlx5_cmd struct</title>
<updated>2024-01-01T12:38:55+00:00</updated>
<author>
<name>Shay Drory</name>
<email>shayd@nvidia.com</email>
</author>
<published>2023-01-18T14:52:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f3739647a7373d29a76f5d6f07aa27e5c4496591'/>
<id>urn:sha1:f3739647a7373d29a76f5d6f07aa27e5c4496591</id>
<content type='text'>
[ Upstream commit 58db72869a9f8e01910844ca145efc2ea91bbbf9 ]

Downstream patch will split mlx5_cmd_init() to probe and reload
routines. As a preparation, organize mlx5_cmd struct so that any
field that will be used in the reload routine are grouped at new
nested struct.

Signed-off-by: Shay Drory &lt;shayd@nvidia.com&gt;
Reviewed-by: Moshe Shemesh &lt;moshe@nvidia.com&gt;
Signed-off-by: Saeed Mahameed &lt;saeedm@nvidia.com&gt;
Stable-dep-of: 8f5100da56b3 ("net/mlx5e: Fix a race in command alloc flow")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/mlx5: Prevent high-rate FW commands from populating all slots</title>
<updated>2024-01-01T12:38:55+00:00</updated>
<author>
<name>Tariq Toukan</name>
<email>tariqt@nvidia.com</email>
</author>
<published>2022-08-02T11:47:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=148ec770c63e5a338a5c7d2b27aaa1eb2bcb1c91'/>
<id>urn:sha1:148ec770c63e5a338a5c7d2b27aaa1eb2bcb1c91</id>
<content type='text'>
[ Upstream commit 63fbae0a74c3e1df7c20c81e04353ced050d9887 ]

Certain connection-based device-offload protocols (like TLS) use
per-connection HW objects to track the state, maintain the context, and
perform the offload properly. Some of these objects are created,
modified, and destroyed via FW commands. Under high connection rate,
this type of FW commands might continuously populate all slots of the FW
command interface and throttle it, while starving other critical control
FW commands.

Limit these throttle commands to using only up to a portion (half) of
the FW command interface slots. FW commands maximal rate is not hit, and
the same high rate is still reached when applying this limitation.

Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Reviewed-by: Moshe Shemesh &lt;moshe@nvidia.com&gt;
Signed-off-by: Saeed Mahameed &lt;saeedm@nvidia.com&gt;
Stable-dep-of: 8f5100da56b3 ("net/mlx5e: Fix a race in command alloc flow")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>RDMA/mlx5: Fix affinity assignment</title>
<updated>2023-06-21T14:01:00+00:00</updated>
<author>
<name>Mark Bloch</name>
<email>mbloch@nvidia.com</email>
</author>
<published>2023-06-05T10:33:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1def2a94f4ee38e8442b4b1e6d8354bf61cb6bb0'/>
<id>urn:sha1:1def2a94f4ee38e8442b4b1e6d8354bf61cb6bb0</id>
<content type='text'>
[ Upstream commit 617f5db1a626f18d5cbb7c7faf7bf8f9ea12be78 ]

The cited commit aimed to ensure that Virtual Functions (VFs) assign a
queue affinity to a Queue Pair (QP) to distribute traffic when
the LAG master creates a hardware LAG. If the affinity was set while
the hardware was not in LAG, the firmware would ignore the affinity value.

However, this commit unintentionally assigned an affinity to QPs on the LAG
master's VPORT even if the RDMA device was not marked as LAG-enabled.
In most cases, this was not an issue because when the hardware entered
hardware LAG configuration, the RDMA device of the LAG master would be
destroyed and a new one would be created, marked as LAG-enabled.

The problem arises when a user configures Equal-Cost Multipath (ECMP).
In ECMP mode, traffic can be directed to different physical ports based on
the queue affinity, which is intended for use by VPORTS other than the
E-Switch manager. ECMP mode is supported only if both E-Switch managers are
in switchdev mode and the appropriate route is configured via IP. In this
configuration, the RDMA device is not destroyed, and we retain the RDMA
device that is not marked as LAG-enabled.

To ensure correct behavior, Send Queues (SQs) opened by the E-Switch
manager through verbs should be assigned strict affinity. This means they
will only be able to communicate through the native physical port
associated with the E-Switch manager. This will prevent the firmware from
assigning affinity and will not allow the SQs to be remapped in case of
failover.

Fixes: 802dcc7fc5ec ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode")
Reviewed-by: Maor Gottlieb &lt;maorg@nvidia.com&gt;
Signed-off-by: Mark Bloch &lt;mbloch@nvidia.com&gt;
Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/mlx5: Expose SF firmware pages counter</title>
<updated>2023-02-14T18:11:47+00:00</updated>
<author>
<name>Maher Sanalla</name>
<email>msanalla@nvidia.com</email>
</author>
<published>2023-01-22T21:24:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ee128b700fd0c6689785196577d00ec968857e8e'/>
<id>urn:sha1:ee128b700fd0c6689785196577d00ec968857e8e</id>
<content type='text'>
[ Upstream commit 9965bbebae59b3563a4d95e4aed121e8965dfdc2 ]

Currently, each core device has VF pages counter which stores number of
fw pages used by its VFs and SFs.

The current design led to a hang when performing firmware reset on DPU,
where the DPU PFs stalled in sriov unload flow due to waiting on release
of SFs pages instead of waiting on only VFs pages.

Thus, Add a separate counter for SF firmware pages, which will prevent
the stall scenario described above.

Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Maher Sanalla &lt;msanalla@nvidia.com&gt;
Reviewed-by: Shay Drory &lt;shayd@nvidia.com&gt;
Signed-off-by: Saeed Mahameed &lt;saeedm@nvidia.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
