<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/devlink, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-11-19T01:12:21+00:00</updated>
<entry>
<title>devlink: rate: Unset parent pointer in devl_rate_nodes_destroy</title>
<updated>2025-11-19T01:12:21+00:00</updated>
<author>
<name>Shay Drory</name>
<email>shayd@nvidia.com</email>
</author>
<published>2025-11-17T12:05:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f94c1a114ac209977bdf5ca841b98424295ab1f0'/>
<id>urn:sha1:f94c1a114ac209977bdf5ca841b98424295ab1f0</id>
<content type='text'>
The function devl_rate_nodes_destroy is documented to "Unset parent for
all rate objects". However, it was only calling the driver-specific
`rate_leaf_parent_set` or `rate_node_parent_set` ops and decrementing
the parent's refcount, without actually setting the
`devlink_rate-&gt;parent` pointer to NULL.

This leaves a dangling pointer in the `devlink_rate` struct, which cause
refcount error in netdevsim[1] and mlx5[2]. In addition, this is
inconsistent with the behavior of `devlink_nl_rate_parent_node_set`,
where the parent pointer is correctly cleared.

This patch fixes the issue by explicitly setting `devlink_rate-&gt;parent`
to NULL after notifying the driver, thus fulfilling the function's
documented behavior for all rate objects.

[1]
repro steps:
echo 1 &gt; /sys/bus/netdevsim/new_device
devlink dev eswitch set netdevsim/netdevsim1 mode switchdev
echo 1 &gt; /sys/bus/netdevsim/devices/netdevsim1/sriov_numvfs
devlink port function rate add netdevsim/netdevsim1/test_node
devlink port function rate set netdevsim/netdevsim1/128 parent test_node
echo 1 &gt; /sys/bus/netdevsim/del_device

dmesg:
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 8 PID: 1530 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0
CPU: 8 UID: 0 PID: 1530 Comm: bash Not tainted 6.18.0-rc4+ #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:refcount_warn_saturate+0x42/0xe0
Call Trace:
 &lt;TASK&gt;
 devl_rate_leaf_destroy+0x8d/0x90
 __nsim_dev_port_del+0x6c/0x70 [netdevsim]
 nsim_dev_reload_destroy+0x11c/0x140 [netdevsim]
 nsim_drv_remove+0x2b/0xb0 [netdevsim]
 device_release_driver_internal+0x194/0x1f0
 bus_remove_device+0xc6/0x130
 device_del+0x159/0x3c0
 device_unregister+0x1a/0x60
 del_device_store+0x111/0x170 [netdevsim]
 kernfs_fop_write_iter+0x12e/0x1e0
 vfs_write+0x215/0x3d0
 ksys_write+0x5f/0xd0
 do_syscall_64+0x55/0x10f0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

[2]
devlink dev eswitch set pci/0000:08:00.0 mode switchdev
devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 1000
devlink port function rate add pci/0000:08:00.0/group1
devlink port function rate set pci/0000:08:00.0/32768 parent group1
modprobe -r mlx5_ib mlx5_fwctl mlx5_core

dmesg:
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 7 PID: 16151 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0
CPU: 7 UID: 0 PID: 16151 Comm: bash Not tainted 6.17.0-rc7_for_upstream_min_debug_2025_10_02_12_44 #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:refcount_warn_saturate+0x42/0xe0
Call Trace:
 &lt;TASK&gt;
 devl_rate_leaf_destroy+0x8d/0x90
 mlx5_esw_offloads_devlink_port_unregister+0x33/0x60 [mlx5_core]
 mlx5_esw_offloads_unload_rep+0x3f/0x50 [mlx5_core]
 mlx5_eswitch_unload_sf_vport+0x40/0x90 [mlx5_core]
 mlx5_sf_esw_event+0xc4/0x120 [mlx5_core]
 notifier_call_chain+0x33/0xa0
 blocking_notifier_call_chain+0x3b/0x50
 mlx5_eswitch_disable_locked+0x50/0x110 [mlx5_core]
 mlx5_eswitch_disable+0x63/0x90 [mlx5_core]
 mlx5_unload+0x1d/0x170 [mlx5_core]
 mlx5_uninit_one+0xa2/0x130 [mlx5_core]
 remove_one+0x78/0xd0 [mlx5_core]
 pci_device_remove+0x39/0xa0
 device_release_driver_internal+0x194/0x1f0
 unbind_store+0x99/0xa0
 kernfs_fop_write_iter+0x12e/0x1e0
 vfs_write+0x215/0x3d0
 ksys_write+0x5f/0xd0
 do_syscall_64+0x53/0x1f0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: d75559845078 ("devlink: Allow setting parent node of rate objects")
Signed-off-by: Shay Drory &lt;shayd@nvidia.com&gt;
Reviewed-by: Carolina Jubran &lt;cjubran@nvidia.com&gt;
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Link: https://patch.msgid.link/1763381149-1234377-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: replace use of system_wq with system_percpu_wq</title>
<updated>2025-09-23T00:40:30+00:00</updated>
<author>
<name>Marco Crivellari</name>
<email>marco.crivellari@suse.com</email>
</author>
<published>2025-09-18T14:24:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5fd8bb982e10f29e856ef71072609af5ce55d281'/>
<id>urn:sha1:5fd8bb982e10f29e856ef71072609af5ce55d281</id>
<content type='text'>
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.

This lack of consistentcy cannot be addressed without refactoring the API.

system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.

Adding system_dfl_wq to encourage its use when unbound work should be used.

The old system_unbound_wq will be kept for a few release cycles.

Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Link: https://patch.msgid.link/20250918142427.309519-3-marco.crivellari@suse.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2025-09-18T18:26:06+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-09-18T18:23:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f2cdc4c22bca57ab4cdb28ef4a02c53837d26fe0'/>
<id>urn:sha1:f2cdc4c22bca57ab4cdb28ef4a02c53837d26fe0</id>
<content type='text'>
Cross-merge networking fixes after downstream PR (net-6.17-rc7).

No conflicts.

Adjacent changes:

drivers/net/ethernet/mellanox/mlx5/core/en/fs.h
  9536fbe10c9d ("net/mlx5e: Add PSP steering in local NIC RX")
  7601a0a46216 ("net/mlx5e: Add a miss level for ipsec crypto offload")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>devlink rate: Remove unnecessary 'static' from a couple places</title>
<updated>2025-09-18T14:47:18+00:00</updated>
<author>
<name>Cosmin Ratiu</name>
<email>cratiu@nvidia.com</email>
</author>
<published>2025-09-18T10:15:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3191df0a4882c827cac29925e80ecb1775b904bd'/>
<id>urn:sha1:3191df0a4882c827cac29925e80ecb1775b904bd</id>
<content type='text'>
devlink_rate_node_get_by_name() and devlink_rate_nodes_destroy() have a
couple of unnecessary static variables for iterating over devlink rates.

This could lead to races/corruption/unhappiness if two concurrent
operations execute the same function.

Remove 'static' from both. It's amazing this was missed for 4+ years.
While at it, I confirmed there are no more examples of this mistake in
net/ with 1, 2 or 3 levels of indentation.

Fixes: a8ecb93ef03d ("devlink: Introduce rate nodes")
Signed-off-by: Cosmin Ratiu &lt;cratiu@nvidia.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>devlink: Add a 'num_doorbells' driverinit param</title>
<updated>2025-09-18T01:30:51+00:00</updated>
<author>
<name>Cosmin Ratiu</name>
<email>cratiu@nvidia.com</email>
</author>
<published>2025-09-16T14:11:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6bdcb735fec6cb866b0d40634d4f23effba81074'/>
<id>urn:sha1:6bdcb735fec6cb866b0d40634d4f23effba81074</id>
<content type='text'>
This parameter can be used by drivers to configure a different number of
doorbells.

Signed-off-by: Cosmin Ratiu &lt;cratiu@nvidia.com&gt;
Reviewed-by: Dragos Tatulea &lt;dtatulea@nvidia.com&gt;
Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>devlink: Add 'total_vfs' generic device param</title>
<updated>2025-09-10T02:14:23+00:00</updated>
<author>
<name>Vlad Dumitrescu</name>
<email>vdumitrescu@nvidia.com</email>
</author>
<published>2025-09-07T01:29:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ce0b015e2619ae64b7d33fb24a6b6cadcd70c317'/>
<id>urn:sha1:ce0b015e2619ae64b7d33fb24a6b6cadcd70c317</id>
<content type='text'>
NICs are typically configured with total_vfs=0, forcing users to rely
on external tools to enable SR-IOV (a widely used and essential feature).

Add total_vfs parameter to devlink for SR-IOV max VF configurability.
Enables standard kernel tools to manage SR-IOV, addressing the need for
flexible VF configuration.

Signed-off-by: Vlad Dumitrescu &lt;vdumitrescu@nvidia.com&gt;
Tested-by: Kamal Heib &lt;kheib@redhat.com&gt;
Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Signed-off-by: Saeed Mahameed &lt;saeedm@nvidia.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/20250907012953.301746-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>devlink: Make health reporter burst period configurable</title>
<updated>2025-08-27T00:24:16+00:00</updated>
<author>
<name>Shahar Shitrit</name>
<email>shshitrit@nvidia.com</email>
</author>
<published>2025-08-24T08:43:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=da0e2197645c8e01bb6080c7a2b86d9a56cc64a9'/>
<id>urn:sha1:da0e2197645c8e01bb6080c7a2b86d9a56cc64a9</id>
<content type='text'>
Enable configuration of the burst period — a time window starting
from the first error recovery, during which the reporter allows
recovery attempts for each reported error.

This feature is helpful when a single underlying issue causes multiple
errors, as it delays the start of the grace period to allow sufficient
time for recovering all related errors. For example, if multiple TX
queues time out simultaneously, a sufficient burst period could allow
all affected TX queues to be recovered within that window. Without this
period, only the first TX queue that reports a timeout will undergo
recovery, while the remaining TX queues will be blocked once the grace
period begins.

Configuration example:
$ devlink health set pci/0000:00:09.0 reporter tx burst_period 500

Configuration example with ynl:
./tools/net/ynl/pyynl/cli.py \
 --spec Documentation/netlink/specs/devlink.yaml \
 --do health-reporter-set --json '{
  "bus-name": "auxiliary",
  "dev-name": "mlx5_core.eth.0",
  "port-index": 65535,
  "health-reporter-name": "tx",
  "health-reporter-burst-period": 500
}'

Signed-off-by: Shahar Shitrit &lt;shshitrit@nvidia.com&gt;
Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Reviewed-by: Dragos Tatulea &lt;dtatulea@nvidia.com&gt;
Reviewed-by: Carolina Jubran &lt;cjubran@nvidia.com&gt;
Signed-off-by: Mark Bloch &lt;mbloch@nvidia.com&gt;
Link: https://patch.msgid.link/20250824084354.533182-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>devlink: Introduce burst period for health reporter</title>
<updated>2025-08-27T00:24:16+00:00</updated>
<author>
<name>Shahar Shitrit</name>
<email>shshitrit@nvidia.com</email>
</author>
<published>2025-08-24T08:43:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6a06d8c40510ba1ecf27977f528b1eb74f290a60'/>
<id>urn:sha1:6a06d8c40510ba1ecf27977f528b1eb74f290a60</id>
<content type='text'>
Currently, the devlink health reporter starts the grace period
immediately after handling an error, blocking any further recoveries
until it finished.

However, when a single root cause triggers multiple errors in a short
time frame, it is desirable to treat them as a bulk of errors and to
allow their recoveries, avoiding premature blocking of subsequent
related errors, and reducing the risk of inconsistent or incomplete
error handling.

To address this, introduce a configurable burst period for devlink
health reporter. Start this period when the first error is handled,
and allow recovery attempts for reported errors during this window.
Once burst period expires, begin the grace period to block further
recoveries until it concludes.

Timeline summary:

----|--------|------------------------------/----------------------/--
error is  error is       burst period           grace period
reported  recovered  (recoveries allowed)    (recoveries blocked)

For calculating the burst period duration, use the same
last_recovery_ts as the grace period. Update it on recovery only
when the burst period is inactive (either disabled or at the
first error).

This patch implements the framework for the burst period and
effectively sets its value to 0 at reporter creation, so the current
behavior remains unchanged, which ensures backward compatibility.

A downstream patch will make the burst period configurable.

Signed-off-by: Shahar Shitrit &lt;shshitrit@nvidia.com&gt;
Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Signed-off-by: Mark Bloch &lt;mbloch@nvidia.com&gt;
Link: https://patch.msgid.link/20250824084354.533182-4-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>devlink: Move health reporter recovery abort logic to a separate function</title>
<updated>2025-08-27T00:24:16+00:00</updated>
<author>
<name>Shahar Shitrit</name>
<email>shshitrit@nvidia.com</email>
</author>
<published>2025-08-24T08:43:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=20597fb9436e2e2372ddf782f0bb5ecbe3481068'/>
<id>urn:sha1:20597fb9436e2e2372ddf782f0bb5ecbe3481068</id>
<content type='text'>
Extract the health reporter recovery abort logic into a separate
function devlink_health_recover_abort().
The function encapsulates the conditions for aborting recovery:
- When auto-recovery is disabled
- When previous error wasn't recovered
- When within the grace period after last recovery

Signed-off-by: Shahar Shitrit &lt;shshitrit@nvidia.com&gt;
Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Reviewed-by: Dragos Tatulea &lt;dtatulea@nvidia.com&gt;
Reviewed-by: Carolina Jubran &lt;cjubran@nvidia.com&gt;
Signed-off-by: Mark Bloch &lt;mbloch@nvidia.com&gt;
Link: https://patch.msgid.link/20250824084354.533182-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>devlink: Move graceful period parameter to reporter ops</title>
<updated>2025-08-27T00:24:16+00:00</updated>
<author>
<name>Shahar Shitrit</name>
<email>shshitrit@nvidia.com</email>
</author>
<published>2025-08-24T08:43:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d2b007374551ac09db16badde575cdd698f6fc92'/>
<id>urn:sha1:d2b007374551ac09db16badde575cdd698f6fc92</id>
<content type='text'>
Move the default graceful period from a parameter to
devlink_health_reporter_create() to a field in the
devlink_health_reporter_ops structure.

This change improves consistency, as the graceful period is inherently
tied to the reporter's behavior and recovery policy. It simplifies the
signature of devlink_health_reporter_create() and its internal helper
functions. It also centralizes the reporter configuration at the ops
structure, preparing the groundwork for a downstream patch that will
introduce a devlink health reporter burst period attribute whose
default value will similarly be provided by the driver via the ops
structure.

Signed-off-by: Shahar Shitrit &lt;shshitrit@nvidia.com&gt;
Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Signed-off-by: Mark Bloch &lt;mbloch@nvidia.com&gt;
Link: https://patch.msgid.link/20250824084354.533182-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
