<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/net/vrf.c, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-08-27T00:34:31+00:00</updated>
<entry>
<title>ipv4: Convert -&gt;flowi4_tos to dscp_t.</title>
<updated>2025-08-27T00:34:31+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2025-08-25T13:37:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1bec9d0c0046fe4e2bfb6a1c5aadcb5d56cdb0fb'/>
<id>urn:sha1:1bec9d0c0046fe4e2bfb6a1c5aadcb5d56cdb0fb</id>
<content type='text'>
Convert the -&gt;flowic_tos field of struct flowi_common from __u8 to
dscp_t, rename it -&gt;flowic_dscp and propagate these changes to struct
flowi and struct flowi4.

We've had several bugs in the past where ECN bits could interfere with
IPv4 routing, because these bits were not properly cleared when setting
-&gt;flowi4_tos. These bugs should be fixed now and the dscp_t type has
been introduced to ensure that variables carrying DSCP values don't
accidentally have any ECN bits set. Several variables and structure
fields have been converted to dscp_t already, but the main IPv4 routing
structure, struct flowi4, is still using a __u8. To avoid any future
regression, this patch converts it to dscp_t.

There are many users to convert at once. Fortunately, around half of
-&gt;flowi4_tos users already have a dscp_t value at hand, which they
currently convert to __u8 using inet_dscp_to_dsfield(). For all of
these users, we just need to drop that conversion.

But, although we try to do the __u8 &lt;-&gt; dscp_t conversions at the
boundaries of the network or of user space, some places still store
TOS/DSCP variables as __u8 in core networking code. Those can hardly be
converted either because the data structure is part of UAPI or because
the same variable or field is also used for handling ECN in other parts
of the code. In all of these cases where we don't have a dscp_t
variable at hand, we need to use inet_dsfield_to_dscp() when
interacting with -&gt;flowi4_dscp.

Changes since v1:
  * Fix space alignment in __bpf_redirect_neigh_v4() (Ido).

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Link: https://patch.msgid.link/29acecb45e911d17446b9a3dbdb1ab7b821ea371.1756128932.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>vrf: Drop existing dst reference in vrf_ip6_input_dst</title>
<updated>2025-07-26T18:28:45+00:00</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@fomichev.me</email>
</author>
<published>2025-07-25T16:00:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f388f807eca1de9e6e70f9ffb1a573c3811c4215'/>
<id>urn:sha1:f388f807eca1de9e6e70f9ffb1a573c3811c4215</id>
<content type='text'>
Commit ff3fbcdd4724 ("selftests: tc: Add generic erspan_opts matching support
for tc-flower") started triggering the following kmemleak warning:

unreferenced object 0xffff888015fb0e00 (size 512):
  comm "softirq", pid 0, jiffies 4294679065
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 40 d2 85 9e ff ff ff ff  ........@.......
    41 69 59 9d ff ff ff ff 00 00 00 00 00 00 00 00  AiY.............
  backtrace (crc 30b71e8b):
    __kmalloc_noprof+0x359/0x460
    metadata_dst_alloc+0x28/0x490
    erspan_rcv+0x4f1/0x1160 [ip_gre]
    gre_rcv+0x217/0x240 [ip_gre]
    gre_rcv+0x1b8/0x400 [gre]
    ip_protocol_deliver_rcu+0x31d/0x3a0
    ip_local_deliver_finish+0x37d/0x620
    ip_local_deliver+0x174/0x460
    ip_rcv+0x52b/0x6b0
    __netif_receive_skb_one_core+0x149/0x1a0
    process_backlog+0x3c8/0x1390
    __napi_poll.constprop.0+0xa1/0x390
    net_rx_action+0x59b/0xe00
    handle_softirqs+0x22b/0x630
    do_softirq+0xb1/0xf0
    __local_bh_enable_ip+0x115/0x150

vrf_ip6_input_dst unconditionally sets skb dst entry, add a call to
skb_dst_drop to drop any existing entry.

Cc: David Ahern &lt;dsahern@kernel.org&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Fixes: 9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses")
Signed-off-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Link: https://patch.msgid.link/20250725160043.350725-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: sched: generalize check for no-queue qdisc on TX queue</title>
<updated>2025-04-28T21:06:58+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>hawk@kernel.org</email>
</author>
<published>2025-04-25T14:55:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=34dd0fecaa02d654c447d43a7e4c72f9b18b7033'/>
<id>urn:sha1:34dd0fecaa02d654c447d43a7e4c72f9b18b7033</id>
<content type='text'>
The "noqueue" qdisc can either be directly attached, or get default
attached if net_device priv_flags has IFF_NO_QUEUE. In both cases, the
allocated Qdisc structure gets it's enqueue function pointer reset to
NULL by noqueue_init() via noqueue_qdisc_ops.

This is a common case for software virtual net_devices. For these devices
with no-queue, the transmission path in __dev_queue_xmit() will bypass
the qdisc layer. Directly invoking device drivers ndo_start_xmit (via
dev_hard_start_xmit).  In this mode the device driver is not allowed to
ask for packets to be queued (either via returning NETDEV_TX_BUSY or
stopping the TXQ).

The simplest and most reliable way to identify this no-queue case is by
checking if enqueue == NULL.

The vrf driver currently open-codes this check (!qdisc-&gt;enqueue). While
functionally correct, this low-level detail is better encapsulated in a
dedicated helper for clarity and long-term maintainability.

To make this behavior more explicit and reusable, this patch introduce a
new helper: qdisc_txq_has_no_queue(). Helper will also be used by the
veth driver in the next patch, which introduces optional qdisc-based
backpressure.

This is a non-functional change.

Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Link: https://patch.msgid.link/174559293172.827981.7583862632045264175.stgit@firesoul
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: move misc netdev_lock flavors to a separate header</title>
<updated>2025-03-08T17:06:50+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-03-07T18:30:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8ef890df4031121a94407c84659125cbccd3fdbe'/>
<id>urn:sha1:8ef890df4031121a94407c84659125cbccd3fdbe</id>
<content type='text'>
Move the more esoteric helpers for netdev instance lock to
a dedicated header. This avoids growing netdevice.h to infinity
and makes rebuilding the kernel much faster (after touching
the header with the helpers).

The main netdev_lock() / netdev_unlock() functions are used
in static inlines in netdevice.h and will probably be used
most commonly, so keep them in netdevice.h.

Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Link: https://patch.msgid.link/20250307183006.2312761-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: rename netns_local to netns_immutable</title>
<updated>2025-03-04T11:44:48+00:00</updated>
<author>
<name>Nicolas Dichtel</name>
<email>nicolas.dichtel@6wind.com</email>
</author>
<published>2025-02-28T10:20:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0c493da86374dffff7505e67289ad75b21f5b301'/>
<id>urn:sha1:0c493da86374dffff7505e67289ad75b21f5b301</id>
<content type='text'>
The name 'netns_local' is confusing. A following commit will export it via
netlink, so let's use a more explicit name.

Reported-by: Eric Dumazet &lt;edumazet@google.com&gt;
Suggested-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>rtnetlink: Pack newlink() params into struct</title>
<updated>2025-02-21T23:28:02+00:00</updated>
<author>
<name>Xiao Liang</name>
<email>shaw.leon@gmail.com</email>
</author>
<published>2025-02-19T12:50:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=69c7be1b903fca2835e80ec506bd1d75ce84fb4d'/>
<id>urn:sha1:69c7be1b903fca2835e80ec506bd1d75ce84fb4d</id>
<content type='text'>
There are 4 net namespaces involved when creating links:

 - source netns - where the netlink socket resides,
 - target netns - where to put the device being created,
 - link netns - netns associated with the device (backend),
 - peer netns - netns of peer device.

Currently, two nets are passed to newlink() callback - "src_net"
parameter and "dev_net" (implicitly in net_device). They are set as
follows, depending on netlink attributes in the request.

 +------------+-------------------+---------+---------+
 | peer netns | IFLA_LINK_NETNSID | src_net | dev_net |
 +------------+-------------------+---------+---------+
 |            | absent            | source  | target  |
 | absent     +-------------------+---------+---------+
 |            | present           | link    | link    |
 +------------+-------------------+---------+---------+
 |            | absent            | peer    | target  |
 | present    +-------------------+---------+---------+
 |            | present           | peer    | link    |
 +------------+-------------------+---------+---------+

When IFLA_LINK_NETNSID is present, the device is created in link netns
first and then moved to target netns. This has some side effects,
including extra ifindex allocation, ifname validation and link events.
These could be avoided if we create it in target netns from
the beginning.

On the other hand, the meaning of src_net parameter is ambiguous. It
varies depending on how parameters are passed. It is the effective
link (or peer netns) by design, but some drivers ignore it and use
dev_net instead.

To provide more netns context for drivers, this patch packs existing
newlink() parameters, along with the source netns, link netns and peer
netns, into a struct. The old "src_net" is renamed to "net" to avoid
confusion with real source netns, and will be deprecated later. The use
of src_net are converted to params-&gt;net trivially.

Signed-off-by: Xiao Liang &lt;shaw.leon@gmail.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: fib_rules: Factorise fib_newrule() and fib_delrule().</title>
<updated>2025-02-11T03:08:52+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2025-02-07T07:24:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a0596c2c63fc9d9ad16d701044293e11d8f97953'/>
<id>urn:sha1:a0596c2c63fc9d9ad16d701044293e11d8f97953</id>
<content type='text'>
fib_nl_newrule() / fib_nl_delrule() is the doit() handler for
RTM_NEWRULE / RTM_DELRULE but also called from vrf_newlink().

Currently, we hold RTNL on both paths but will not on the former.

Also, we set dev_net(dev)-&gt;rtnl to skb-&gt;sk in vrf_fib_rule() because
fib_nl_newrule() / fib_nl_delrule() fetch net as sock_net(skb-&gt;sk).

Let's Factorise the two functions and pass net and rtnl_held flag.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Tested-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Link: https://patch.msgid.link/20250207072502.87775-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>vrf: Make pcpu_dstats update functions available to other modules.</title>
<updated>2024-12-07T01:56:52+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2024-12-04T12:11:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=18eabadd73ae60023ab05e376246bd725fb0c113'/>
<id>urn:sha1:18eabadd73ae60023ab05e376246bd725fb0c113</id>
<content type='text'>
Currently vrf is the only module that uses NETDEV_PCPU_STAT_DSTATS.
In order to make this kind of statistics available to other modules,
we need to define the update functions in netdevice.h.

Therefore, let's define dev_dstats_*() functions for RX and TX packet
updates (packets, bytes and drops). Use these new functions in vrf.c
instead of vrf_rx_stats() and the other manual counter updates.

While there, update the type of the "len" variables to "unsigned int",
so that there're aligned with both skb-&gt;len and the new dstats update
functions.

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Link: https://patch.msgid.link/d7a552ee382c79f4854e7fcc224cf176cd21150d.1733313925.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>vrf: Prepare vrf_process_v4_outbound() to future .flowi4_tos conversion.</title>
<updated>2024-11-03T22:29:05+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2024-10-30T13:27:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=937677f481259b5291001ef7c68242d366e23b64'/>
<id>urn:sha1:937677f481259b5291001ef7c68242d366e23b64</id>
<content type='text'>
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().

Then, when we'll convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/6be084229008dcfa7a4e2758befccfd2217a331e.1730294788.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>vrf: revert "vrf: Remove unnecessary RCU-bh critical section"</title>
<updated>2024-10-03T00:26:11+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2024-09-29T06:18:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b04c4d9eb4f25b950b33218e33b04c94e7445e51'/>
<id>urn:sha1:b04c4d9eb4f25b950b33218e33b04c94e7445e51</id>
<content type='text'>
This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.

dev_queue_xmit_nit is expected to be called with BH disabled.
__dev_queue_xmit has the following:

        /* Disable soft irqs for various locks below. Also
         * stops preemption for RCU.
         */
        rcu_read_lock_bh();

VRF must follow this invariant. The referenced commit removed this
protection. Which triggered a lockdep warning:

	================================
	WARNING: inconsistent lock state
	6.11.0 #1 Tainted: G        W
	--------------------------------
	inconsistent {IN-SOFTIRQ-W} -&gt; {SOFTIRQ-ON-W} usage.
	btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
	ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
	{IN-SOFTIRQ-W} state was registered at:
	  lock_acquire+0x19a/0x4f0
	  _raw_spin_lock+0x27/0x40
	  packet_rcv+0xa33/0x1320
	  __netif_receive_skb_core.constprop.0+0xcb0/0x3a90
	  __netif_receive_skb_list_core+0x2c9/0x890
	  netif_receive_skb_list_internal+0x610/0xcc0
          [...]

	other info that might help us debug this:
	 Possible unsafe locking scenario:

	       CPU0
	       ----
	  lock(rlock-AF_PACKET);
	  &lt;Interrupt&gt;
	    lock(rlock-AF_PACKET);

	 *** DEADLOCK ***

	Call Trace:
	 &lt;TASK&gt;
	 dump_stack_lvl+0x73/0xa0
	 mark_lock+0x102e/0x16b0
	 __lock_acquire+0x9ae/0x6170
	 lock_acquire+0x19a/0x4f0
	 _raw_spin_lock+0x27/0x40
	 tpacket_rcv+0x863/0x3b30
	 dev_queue_xmit_nit+0x709/0xa40
	 vrf_finish_direct+0x26e/0x340 [vrf]
	 vrf_l3_out+0x5f4/0xe80 [vrf]
	 __ip_local_out+0x51e/0x7a0
          [...]

Fixes: 504fc6f4f7f6 ("vrf: Remove unnecessary RCU-bh critical section")
Link: https://lore.kernel.org/netdev/20240925185216.1990381-1-greearb@candelatech.com/
Reported-by: Ben Greear &lt;greearb@candelatech.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Cc: stable@vger.kernel.org
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Tested-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20240929061839.1175300-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
