<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/devlink, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-12-01T10:43:33+00:00</updated>
<entry>
<title>devlink: rate: Unset parent pointer in devl_rate_nodes_destroy</title>
<updated>2025-12-01T10:43:33+00:00</updated>
<author>
<name>Shay Drory</name>
<email>shayd@nvidia.com</email>
</author>
<published>2025-11-17T12:05:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c70df6c17d389cc743f0eb30160e2d6bc6910db8'/>
<id>urn:sha1:c70df6c17d389cc743f0eb30160e2d6bc6910db8</id>
<content type='text'>
[ Upstream commit f94c1a114ac209977bdf5ca841b98424295ab1f0 ]

The function devl_rate_nodes_destroy is documented to "Unset parent for
all rate objects". However, it was only calling the driver-specific
`rate_leaf_parent_set` or `rate_node_parent_set` ops and decrementing
the parent's refcount, without actually setting the
`devlink_rate-&gt;parent` pointer to NULL.

This leaves a dangling pointer in the `devlink_rate` struct, which cause
refcount error in netdevsim[1] and mlx5[2]. In addition, this is
inconsistent with the behavior of `devlink_nl_rate_parent_node_set`,
where the parent pointer is correctly cleared.

This patch fixes the issue by explicitly setting `devlink_rate-&gt;parent`
to NULL after notifying the driver, thus fulfilling the function's
documented behavior for all rate objects.

[1]
repro steps:
echo 1 &gt; /sys/bus/netdevsim/new_device
devlink dev eswitch set netdevsim/netdevsim1 mode switchdev
echo 1 &gt; /sys/bus/netdevsim/devices/netdevsim1/sriov_numvfs
devlink port function rate add netdevsim/netdevsim1/test_node
devlink port function rate set netdevsim/netdevsim1/128 parent test_node
echo 1 &gt; /sys/bus/netdevsim/del_device

dmesg:
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 8 PID: 1530 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0
CPU: 8 UID: 0 PID: 1530 Comm: bash Not tainted 6.18.0-rc4+ #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:refcount_warn_saturate+0x42/0xe0
Call Trace:
 &lt;TASK&gt;
 devl_rate_leaf_destroy+0x8d/0x90
 __nsim_dev_port_del+0x6c/0x70 [netdevsim]
 nsim_dev_reload_destroy+0x11c/0x140 [netdevsim]
 nsim_drv_remove+0x2b/0xb0 [netdevsim]
 device_release_driver_internal+0x194/0x1f0
 bus_remove_device+0xc6/0x130
 device_del+0x159/0x3c0
 device_unregister+0x1a/0x60
 del_device_store+0x111/0x170 [netdevsim]
 kernfs_fop_write_iter+0x12e/0x1e0
 vfs_write+0x215/0x3d0
 ksys_write+0x5f/0xd0
 do_syscall_64+0x55/0x10f0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

[2]
devlink dev eswitch set pci/0000:08:00.0 mode switchdev
devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 1000
devlink port function rate add pci/0000:08:00.0/group1
devlink port function rate set pci/0000:08:00.0/32768 parent group1
modprobe -r mlx5_ib mlx5_fwctl mlx5_core

dmesg:
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 7 PID: 16151 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0
CPU: 7 UID: 0 PID: 16151 Comm: bash Not tainted 6.17.0-rc7_for_upstream_min_debug_2025_10_02_12_44 #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:refcount_warn_saturate+0x42/0xe0
Call Trace:
 &lt;TASK&gt;
 devl_rate_leaf_destroy+0x8d/0x90
 mlx5_esw_offloads_devlink_port_unregister+0x33/0x60 [mlx5_core]
 mlx5_esw_offloads_unload_rep+0x3f/0x50 [mlx5_core]
 mlx5_eswitch_unload_sf_vport+0x40/0x90 [mlx5_core]
 mlx5_sf_esw_event+0xc4/0x120 [mlx5_core]
 notifier_call_chain+0x33/0xa0
 blocking_notifier_call_chain+0x3b/0x50
 mlx5_eswitch_disable_locked+0x50/0x110 [mlx5_core]
 mlx5_eswitch_disable+0x63/0x90 [mlx5_core]
 mlx5_unload+0x1d/0x170 [mlx5_core]
 mlx5_uninit_one+0xa2/0x130 [mlx5_core]
 remove_one+0x78/0xd0 [mlx5_core]
 pci_device_remove+0x39/0xa0
 device_release_driver_internal+0x194/0x1f0
 unbind_store+0x99/0xa0
 kernfs_fop_write_iter+0x12e/0x1e0
 vfs_write+0x215/0x3d0
 ksys_write+0x5f/0xd0
 do_syscall_64+0x53/0x1f0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: d75559845078 ("devlink: Allow setting parent node of rate objects")
Signed-off-by: Shay Drory &lt;shayd@nvidia.com&gt;
Reviewed-by: Carolina Jubran &lt;cjubran@nvidia.com&gt;
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Link: https://patch.msgid.link/1763381149-1234377-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>devlink: fix xa_alloc_cyclic() error handling</title>
<updated>2025-03-28T21:03:27+00:00</updated>
<author>
<name>Michal Swiatkowski</name>
<email>michal.swiatkowski@linux.intel.com</email>
</author>
<published>2025-03-12T09:52:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f8aaa38cfaf6f20afa4db36b6529032fb69165dc'/>
<id>urn:sha1:f8aaa38cfaf6f20afa4db36b6529032fb69165dc</id>
<content type='text'>
[ Upstream commit f3b97b7d4bf316c3991e5634c9f4847c2df35478 ]

In case of returning 1 from xa_alloc_cyclic() (wrapping) ERR_PTR(1) will
be returned, which will cause IS_ERR() to be false. Which can lead to
dereference not allocated pointer (rel).

Fix it by checking if err is lower than zero.

This wasn't found in real usecase, only noticed. Credit to Pierre.

Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra")
Signed-off-by: Michal Swiatkowski &lt;michal.swiatkowski@linux.intel.com&gt;
Reviewed-by: Andrew Lunn &lt;andrew@lunn.ch&gt;
Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>devlink: Constify the 'table_ops' parameter of devl_dpipe_table_register()</title>
<updated>2024-06-05T09:24:57+00:00</updated>
<author>
<name>Christophe JAILLET</name>
<email>christophe.jaillet@wanadoo.fr</email>
</author>
<published>2024-06-02T14:18:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=82dc29b9737edf2d13561ebcf6212c0b88c41129'/>
<id>urn:sha1:82dc29b9737edf2d13561ebcf6212c0b88c41129</id>
<content type='text'>
"struct devlink_dpipe_table_ops" only contains some function pointers.

Update "struct devlink_dpipe_table" and the 'table_ops' parameter of
devl_dpipe_table_register() so that structures in drivers can be
constified.

Constifying these structures will move some data to a read-only section, so
increase overall security.

Signed-off-by: Christophe JAILLET &lt;christophe.jaillet@wanadoo.fr&gt;
Reviewed-by: Wojciech Drewek &lt;wojciech.drewek@intel.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>devlink: extend devlink_param *set pointer</title>
<updated>2024-04-22T20:05:19+00:00</updated>
<author>
<name>Mateusz Polchlopek</name>
<email>mateusz.polchlopek@intel.com</email>
</author>
<published>2024-04-19T08:08:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5625ca5640caa3fb797f155601d56379d260d6ba'/>
<id>urn:sha1:5625ca5640caa3fb797f155601d56379d260d6ba</id>
<content type='text'>
Extend devlink_param *set function pointer to take extack as a param.
Sometimes it is needed to pass information to the end user from set
function. It is more proper to use for that netlink instead of passing
message to dmesg.

Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Reviewed-by: Przemek Kitszel &lt;przemyslaw.kitszel@intel.com&gt;
Signed-off-by: Mateusz Polchlopek &lt;mateusz.polchlopek@intel.com&gt;
Signed-off-by: Tony Nguyen &lt;anthony.l.nguyen@intel.com&gt;
</content>
</entry>
<entry>
<title>devlink: Support setting max_io_eqs</title>
<updated>2024-04-08T13:10:45+00:00</updated>
<author>
<name>Parav Pandit</name>
<email>parav@nvidia.com</email>
</author>
<published>2024-04-06T01:05:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5af3e3876d567fb79a355bec1cb48e432d69b4fb'/>
<id>urn:sha1:5af3e3876d567fb79a355bec1cb48e432d69b4fb</id>
<content type='text'>
Many devices send event notifications for the IO queues,
such as tx and rx queues, through event queues.

Enable a privileged owner, such as a hypervisor PF, to set the number
of IO event queues for the VF and SF during the provisioning stage.

example:
Get maximum IO event queues of the VF device::

  $ devlink port show pci/0000:06:00.0/2
  pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
      function:
          hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10

Set maximum IO event queues of the VF device::

  $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32

  $ devlink port show pci/0000:06:00.0/2
  pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
      function:
          hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32

Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Reviewed-by: Shay Drory &lt;shayd@nvidia.com&gt;
Signed-off-by: Parav Pandit &lt;parav@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netlink: add nlmsg_consume() and use it in devlink compat</title>
<updated>2024-04-06T17:20:14+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2024-04-03T20:22:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8e69b3459ca1ed4f6f7bd0b0a11962ddb3e7d34a'/>
<id>urn:sha1:8e69b3459ca1ed4f6f7bd0b0a11962ddb3e7d34a</id>
<content type='text'>
devlink_compat_running_version() sticks out when running
netdevsim tests and watching dropped skbs. Add nlmsg_consume()
for cases were we want to free a netlink skb but it is expected,
rather than a drop. af_netlink code uses consume_skb() directly,
which is fine, but some may prefer the symmetry of nlmsg_new() /
nlmsg_consume().

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netlink: introduce type-checking attribute iteration</title>
<updated>2024-03-29T22:06:02+00:00</updated>
<author>
<name>Johannes Berg</name>
<email>johannes.berg@intel.com</email>
</author>
<published>2024-03-28T19:31:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e8058a49e67fe7bc7e4a0308851a3ca3a6d2e45d'/>
<id>urn:sha1:e8058a49e67fe7bc7e4a0308851a3ca3a6d2e45d</id>
<content type='text'>
There are, especially with multi-attr arrays, many cases
of needing to iterate all attributes of a specific type
in a netlink message or a nested attribute. Add specific
macros to support that case.

Also convert many instances using this spatch:

    @@
    iterator nla_for_each_attr;
    iterator name nla_for_each_attr_type;
    identifier nla;
    expression head, len, rem;
    expression ATTR;
    type T;
    identifier x;
    @@
    -nla_for_each_attr(nla, head, len, rem)
    +nla_for_each_attr_type(nla, ATTR, head, len, rem)
     {
    &lt;... T x; ...&gt;
    -if (nla_type(nla) == ATTR) {
     ...
    -}
     }

    @@
    identifier nla;
    iterator nla_for_each_nested;
    iterator name nla_for_each_nested_type;
    expression attr, rem;
    expression ATTR;
    type T;
    identifier x;
    @@
    -nla_for_each_nested(nla, attr, rem)
    +nla_for_each_nested_type(nla, ATTR, attr, rem)
     {
    &lt;... T x; ...&gt;
    -if (nla_type(nla) == ATTR) {
     ...
    -}
     }

    @@
    iterator nla_for_each_attr;
    iterator name nla_for_each_attr_type;
    identifier nla;
    expression head, len, rem;
    expression ATTR;
    type T;
    identifier x;
    @@
    -nla_for_each_attr(nla, head, len, rem)
    +nla_for_each_attr_type(nla, ATTR, head, len, rem)
     {
    &lt;... T x; ...&gt;
    -if (nla_type(nla) != ATTR) continue;
     ...
     }

    @@
    identifier nla;
    iterator nla_for_each_nested;
    iterator name nla_for_each_nested_type;
    expression attr, rem;
    expression ATTR;
    type T;
    identifier x;
    @@
    -nla_for_each_nested(nla, attr, rem)
    +nla_for_each_nested_type(nla, ATTR, attr, rem)
     {
    &lt;... T x; ...&gt;
    -if (nla_type(nla) != ATTR) continue;
     ...
     }

Although I had to undo one bad change this made, and
I also adjusted some other code for whitespace and to
use direct variable initialization now.

Signed-off-by: Johannes Berg &lt;johannes.berg@intel.com&gt;
Link: https://lore.kernel.org/r/20240328203144.b5a6c895fb80.I1869b44767379f204998ff44dd239803f39c23e0@changeid
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>devlink: use kvzalloc() to allocate devlink instance resources</title>
<updated>2024-03-29T19:24:38+00:00</updated>
<author>
<name>Jian Wen</name>
<email>wenjianhn@gmail.com</email>
</author>
<published>2024-03-27T08:21:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=730fffce4fd2eb7a0be2d0b6cd7e55e9194d76d5'/>
<id>urn:sha1:730fffce4fd2eb7a0be2d0b6cd7e55e9194d76d5</id>
<content type='text'>
During live migration of a virtual machine, the SR-IOV VF need to be
re-registered. It may fail when the memory is badly fragmented.

The related log is as follows.

    kernel: hv_netvsc 6045bdaa-c0d1-6045-bdaa-c0d16045bdaa eth0: VF slot 1 added
...
    kernel: kworker/0:0: page allocation failure: order:7, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
    kernel: CPU: 0 PID: 24006 Comm: kworker/0:0 Tainted: G            E     5.4...x86_64 #1
    kernel: Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
    kernel: Workqueue: events work_for_cpu_fn
    kernel: Call Trace:
    kernel: dump_stack+0x8b/0xc8
    kernel: warn_alloc+0xff/0x170
    kernel: __alloc_pages_slowpath+0x92c/0xb2b
    kernel: ? get_page_from_freelist+0x1d4/0x1140
    kernel: __alloc_pages_nodemask+0x2f9/0x320
    kernel: alloc_pages_current+0x6a/0xb0
    kernel: kmalloc_order+0x1e/0x70
    kernel: kmalloc_order_trace+0x26/0xb0
    kernel: ? __switch_to_asm+0x34/0x70
    kernel: __kmalloc+0x276/0x280
    kernel: ? _raw_spin_unlock_irqrestore+0x1e/0x40
    kernel: devlink_alloc+0x29/0x110
    kernel: mlx5_devlink_alloc+0x1a/0x20 [mlx5_core]
    kernel: init_one+0x1d/0x650 [mlx5_core]
    kernel: local_pci_probe+0x46/0x90
    kernel: work_for_cpu_fn+0x1a/0x30
    kernel: process_one_work+0x16d/0x390
    kernel: worker_thread+0x1d3/0x3f0
    kernel: kthread+0x105/0x140
    kernel: ? max_active_store+0x80/0x80
    kernel: ? kthread_bind+0x20/0x20
    kernel: ret_from_fork+0x3a/0x50

Signed-off-by: Jian Wen &lt;wenjian1@xiaomi.com&gt;
Link: https://lore.kernel.org/r/20240327082128.942818-1-wenjian1@xiaomi.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>devlink: fix port new reply cmd type</title>
<updated>2024-03-20T02:37:57+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@nvidia.com</email>
</author>
<published>2024-03-18T09:19:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=78a2f5e6c15d8dcbd6495bb9635c7cb89235dfc5'/>
<id>urn:sha1:78a2f5e6c15d8dcbd6495bb9635c7cb89235dfc5</id>
<content type='text'>
Due to a c&amp;p error, port new reply fills-up cmd with wrong value,
any other existing port command replies and notifications.

Fix it by filling cmd with value DEVLINK_CMD_PORT_NEW.

Skimmed through devlink userspace implementations, none of them cares
about this cmd value.

Reported-by: Chenyuan Yang &lt;chenyuan0y@gmail.com&gt;
Closes: https://lore.kernel.org/all/ZfZcDxGV3tSy4qsV@cy-server/
Fixes: cd76dcd68d96 ("devlink: Support add and delete devlink port")
Signed-off-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Reviewed-by: Parav Pandit &lt;parav@nvidia.com&gt;
Reviewed-by: Kalesh AP &lt;kalesh-anakkur.purayil@broadcom.com&gt;
Link: https://lore.kernel.org/r/20240318091908.2736542-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>devlink: Fix devlink parallel commands processing</title>
<updated>2024-03-13T08:31:40+00:00</updated>
<author>
<name>Shay Drory</name>
<email>shayd@nvidia.com</email>
</author>
<published>2024-03-12T10:52:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d7d75124965aee23e5e4421d78376545cf070b0a'/>
<id>urn:sha1:d7d75124965aee23e5e4421d78376545cf070b0a</id>
<content type='text'>
Commit 870c7ad4a52b ("devlink: protect devlink-&gt;dev by the instance
lock") added devlink instance locking inside a loop that iterates over
all the registered devlink instances on the machine in the pre-doit
phase. This can lead to serialization of devlink commands over
different devlink instances.

For example: While the first devlink instance is executing firmware
flash, all commands to other devlink instances on the machine are
forced to wait until the first devlink finishes.

Therefore, in the pre-doit phase, take the devlink instance lock only
for the devlink instance the command is targeting. Devlink layer is
taking a reference on the devlink instance, ensuring the devlink-&gt;dev
pointer is valid. This reference taking was introduced by commit
a380687200e0 ("devlink: take device reference for devlink object").
Without this commit, it would not be safe to access devlink-&gt;dev
lockless.

Fixes: 870c7ad4a52b ("devlink: protect devlink-&gt;dev by the instance lock")
Signed-off-by: Shay Drory &lt;shayd@nvidia.com&gt;
Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
