<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/flow.h, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-08-27T00:34:31+00:00</updated>
<entry>
<title>ipv4: Convert -&gt;flowi4_tos to dscp_t.</title>
<updated>2025-08-27T00:34:31+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2025-08-25T13:37:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1bec9d0c0046fe4e2bfb6a1c5aadcb5d56cdb0fb'/>
<id>urn:sha1:1bec9d0c0046fe4e2bfb6a1c5aadcb5d56cdb0fb</id>
<content type='text'>
Convert the -&gt;flowic_tos field of struct flowi_common from __u8 to
dscp_t, rename it -&gt;flowic_dscp and propagate these changes to struct
flowi and struct flowi4.

We've had several bugs in the past where ECN bits could interfere with
IPv4 routing, because these bits were not properly cleared when setting
-&gt;flowi4_tos. These bugs should be fixed now and the dscp_t type has
been introduced to ensure that variables carrying DSCP values don't
accidentally have any ECN bits set. Several variables and structure
fields have been converted to dscp_t already, but the main IPv4 routing
structure, struct flowi4, is still using a __u8. To avoid any future
regression, this patch converts it to dscp_t.

There are many users to convert at once. Fortunately, around half of
-&gt;flowi4_tos users already have a dscp_t value at hand, which they
currently convert to __u8 using inet_dscp_to_dsfield(). For all of
these users, we just need to drop that conversion.

But, although we try to do the __u8 &lt;-&gt; dscp_t conversions at the
boundaries of the network or of user space, some places still store
TOS/DSCP variables as __u8 in core networking code. Those can hardly be
converted either because the data structure is part of UAPI or because
the same variable or field is also used for handling ECN in other parts
of the code. In all of these cases where we don't have a dscp_t
variable at hand, we need to use inet_dsfield_to_dscp() when
interacting with -&gt;flowi4_dscp.

Changes since v1:
  * Fix space alignment in __bpf_redirect_neigh_v4() (Ido).

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Link: https://patch.msgid.link/29acecb45e911d17446b9a3dbdb1ab7b821ea371.1756128932.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ip: load balance tcp connections to single dst addr and port</title>
<updated>2025-04-29T14:22:25+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2025-04-24T14:35:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=65e9024643c7512ade3aedbb341e11d77ed7abc2'/>
<id>urn:sha1:65e9024643c7512ade3aedbb341e11d77ed7abc2</id>
<content type='text'>
Load balance new TCP connections across nexthops also when they
connect to the same service at a single remote address and port.

This affects only port-based multipath hashing:
fib_multipath_hash_policy 1 or 3.

Local connections must choose both a source address and port when
connecting to a remote service, in ip_route_connect. This
"chicken-and-egg problem" (commit 2d7192d6cbab ("ipv4: Sanitize and
simplify ip_route_{connect,newports}()")) is resolved by first
selecting a source address, by looking up a route using the zero
wildcard source port and address.

As a result multiple connections to the same destination address and
port have no entropy in fib_multipath_hash.

This is not a problem when forwarding, as skb-based hashing has a
4-tuple. Nor when establishing UDP connections, as autobind there
selects a port before reaching ip_route_connect.

Load balance also TCP, by using a random port in fib_multipath_hash.
Port assignment in inet_hash_connect is not atomic with
ip_route_connect. Thus ports are unpredictable, effectively random.

Implementation details:

Do not actually pass a random fl4_sport, as that affects not only
hashing, but routing more broadly, and can match a source port based
policy route, which existing wildcard port 0 will not. Instead,
define a new wildcard flowi flag that is used only for hashing.

Selecting a random source is equivalent to just selecting a random
hash entirely. But for code clarity, follow the normal 4-tuple hash
process and only update this field.

fib_multipath_hash can be reached with zero sport from other code
paths, so explicitly pass this flowi flag, rather than trying to infer
this case in the function itself.

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Link: https://patch.msgid.link/20250424143549.669426-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>net: fib_rules: Fix iif / oif matching on L3 master device</title>
<updated>2025-04-16T00:54:56+00:00</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2025-04-14T17:20:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2d300ce0b783ba74420052ed3a12010ac93bdb08'/>
<id>urn:sha1:2d300ce0b783ba74420052ed3a12010ac93bdb08</id>
<content type='text'>
Before commit 40867d74c374 ("net: Add l3mdev index to flow struct and
avoid oif reset for port devices") it was possible to use FIB rules to
match on a L3 domain. This was done by having a FIB rule match on iif /
oif being a L3 master device. It worked because prior to the FIB rule
lookup the iif / oif fields in the flow structure were reset to the
index of the L3 master device to which the input / output device was
enslaved to.

The above scheme made it impossible to match on the original input /
output device. Therefore, cited commit stopped overwriting the iif / oif
fields in the flow structure and instead stored the index of the
enslaving L3 master device in a new field ('flowi_l3mdev') in the flow
structure.

While the change enabled new use cases, it broke the original use case
of matching on a L3 domain. Fix this by interpreting the iif / oif
matching on a L3 master device as a match against the L3 domain. In
other words, if the iif / oif in the FIB rule points to a L3 master
device, compare the provided index against 'flowi_l3mdev' rather than
'flowi_{i,o}if'.

Before cited commit, a FIB rule that matched on 'iif vrf1' would only
match incoming traffic from devices enslaved to 'vrf1'. With the
proposed change (i.e., comparing against 'flowi_l3mdev'), the rule would
also match traffic originating from a socket bound to 'vrf1'. Avoid that
by adding a new flow flag ('FLOWI_FLAG_L3MDEV_OIF') that indicates if
the L3 domain was derived from the output interface or the input
interface (when not set) and take this flag into account when evaluating
the FIB rule against the flow structure.

Avoid unnecessary checks in the data path by detecting that a rule
matches on a L3 master device when the rule is installed and marking it
as such.

Tested using the following script [1].

Output before 40867d74c374 (v5.4.291):

default dev dummy1 table 100 scope link
default dev dummy1 table 200 scope link

Output after 40867d74c374:

default dev dummy1 table 300 scope link
default dev dummy1 table 300 scope link

Output with this patch:

default dev dummy1 table 100 scope link
default dev dummy1 table 200 scope link

[1]
 #!/bin/bash

 ip link add name vrf1 up type vrf table 10
 ip link add name dummy1 up master vrf1 type dummy

 sysctl -wq net.ipv4.conf.all.forwarding=1
 sysctl -wq net.ipv4.conf.all.rp_filter=0

 ip route add table 100 default dev dummy1
 ip route add table 200 default dev dummy1
 ip route add table 300 default dev dummy1

 ip rule add prio 0 oif vrf1 table 100
 ip rule add prio 1 iif vrf1 table 200
 ip rule add prio 2 table 300

 ip route get 192.0.2.1 oif dummy1 fibmatch
 ip route get 192.0.2.1 iif dummy1 from 198.51.100.1 fibmatch

Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
Reported-by: hanhuihui &lt;hanhuihui5@huawei.com&gt;
Closes: https://lore.kernel.org/netdev/ec671c4f821a4d63904d0da15d604b75@huawei.com/
Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Acked-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20250414172022.242991-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>inet: shrink struct flowi_common</title>
<updated>2023-11-02T08:31:02+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-10-25T14:10:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1726483b79a72e0150734d5367e4a0238bf8fcff'/>
<id>urn:sha1:1726483b79a72e0150734d5367e4a0238bf8fcff</id>
<content type='text'>
I am looking at syzbot reports triggering kernel stack overflows
involving a cascade of ipvlan devices.

We can save 8 bytes in struct flowi_common.

This patch alone will not fix the issue, but is a start.

Fixes: 24ba14406c5c ("route: Add multipath_hash in flowi_common to make user-define hash")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: wenxu &lt;wenxu@ucloud.cn&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://lore.kernel.org/r/20231025141037.3448203-1-edumazet@google.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>ipv4: Drop tos parameter from flowi4_update_output()</title>
<updated>2023-06-02T09:52:38+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2023-06-01T16:37:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3f06760c00f56c5fe6c7f3361c2cf64becee1174'/>
<id>urn:sha1:3f06760c00f56c5fe6c7f3361c2cf64becee1174</id>
<content type='text'>
Callers of flowi4_update_output() never try to update -&gt;flowi4_tos:

  * ip_route_connect() updates -&gt;flowi4_tos with its own current
    value.

  * ip_route_newports() has two users: tcp_v4_connect() and
    dccp_v4_connect. Both initialise fl4 with ip_route_connect(), which
    in turn sets -&gt;flowi4_tos with RT_TOS(inet_sk(sk)-&gt;tos) and
    -&gt;flowi4_scope based on SOCK_LOCALROUTE.

    Then ip_route_newports() updates -&gt;flowi4_tos with
    RT_CONN_FLAGS(sk), which is the same as RT_TOS(inet_sk(sk)-&gt;tos),
    unless SOCK_LOCALROUTE is set on the socket. In that case, the
    lowest order bit is set to 1, to eventually inform
    ip_route_output_key_hash() to restrict the scope to RT_SCOPE_LINK.
    This is equivalent to properly setting -&gt;flowi4_scope as
    ip_route_connect() did.

  * ip_vs_xmit.c initialises -&gt;flowi4_tos with memset(0), then calls
    flowi4_update_output() with tos=0.

  * sctp_v4_get_dst() uses the same RT_CONN_FLAGS_TOS() when
    initialising -&gt;flowi4_tos and when calling flowi4_update_output().

In the end, -&gt;flowi4_tos never changes. So let's just drop the tos
parameter. This will simplify the conversion of -&gt;flowi4_tos from __u8
to dscp_t.

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: remove unnecessary includes from net/flow.h</title>
<updated>2023-01-27T11:19:46+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2023-01-26T07:14:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=21bf73158fe7ae8a3d55618e58edc958f84739a8'/>
<id>urn:sha1:21bf73158fe7ae8a3d55618e58edc958f84739a8</id>
<content type='text'>
This file is included by a lot of other commonly included
headers, it doesn't need socket.h or flow_dissector.h.

This reduces the size of this file after pre-processing
from 28165 to 4663.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Remove DECnet leftovers from flow.h.</title>
<updated>2022-10-03T11:41:59+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2022-09-30T14:37:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9bc61c04ff6cce6a3756b86e6b34914f7b39d734'/>
<id>urn:sha1:9bc61c04ff6cce6a3756b86e6b34914f7b39d734</id>
<content type='text'>
DECnet was removed by commit 1202cdd66531 ("Remove DECnet support from
kernel"). Let's also revome its flow structure.

Compile-tested only (allmodconfig).

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Acked-by: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Add l3mdev index to flow struct and avoid oif reset for port devices</title>
<updated>2022-03-16T03:20:02+00:00</updated>
<author>
<name>David Ahern</name>
<email>dsahern@kernel.org</email>
</author>
<published>2022-03-14T20:45:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=40867d74c374b235e14d839f3a77f26684feefe5'/>
<id>urn:sha1:40867d74c374b235e14d839f3a77f26684feefe5</id>
<content type='text'>
The fundamental premise of VRF and l3mdev core code is binding a socket
to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope.
Legacy code resets flowi_oif to the l3mdev losing any original port
device binding. Ben (among others) has demonstrated use cases where the
original port device binding is important and needs to be retained.
This patch handles that by adding a new entry to the common flow struct
that can indicate the l3mdev index for later rule and table matching
avoiding the need to reset flowi_oif.

In addition to allowing more use cases that require port device binds,
this patch brings a few datapath simplications:

1. l3mdev_fib_rule_match is only called when walking fib rules and
   always after l3mdev_update_flow. That allows an optimization to bail
   early for non-VRF type uses cases when flowi_l3mdev is not set. Also,
   only that index needs to be checked for the FIB table id.

2. l3mdev_update_flow can be called with flowi_oif set to a l3mdev
   (e.g., VRF) device. By resetting flowi_oif only for this case the
   FLOWI_FLAG_SKIP_NH_OIF flag is not longer needed and can be removed,
   removing several checks in the datapath. The flowi_iif path can be
   simplified to only be called if the it is not loopback (loopback can
   not be assigned to an L3 domain) and the l3mdev index is not already
   set.

3. Avoid another device lookup in the output path when the fib lookup
   returns a reject failure.

Note: 2 functional tests for local traffic with reject fib rules are
updated to reflect the new direct failure at FIB lookup time for ping
rather than the failure on packet path. The current code fails like this:

    HINT: Fails since address on vrf device is out of device scope
    COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
    ping: Warning: source address might be selected on device other than: eth1
    PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data.

    --- 172.16.3.1 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms

where the test now directly fails:

    HINT: Fails since address on vrf device is out of device scope
    COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
    ping: connect: No route to host

Signed-off-by: David Ahern &lt;dsahern@kernel.org&gt;
Tested-by: Ben Greear &lt;greearb@candelatech.com&gt;
Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>flow: fix object-size-mismatch warning in flowi{4,6}_to_flowi_common()</title>
<updated>2021-09-02T10:44:19+00:00</updated>
<author>
<name>Tetsuo Handa</name>
<email>penguin-kernel@i-love.sakura.ne.jp</email>
</author>
<published>2021-09-02T05:36:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b9edbfe1adecfc48fd11061dce68afb03d6adbdc'/>
<id>urn:sha1:b9edbfe1adecfc48fd11061dce68afb03d6adbdc</id>
<content type='text'>
Commit 3df98d79215ace13 ("lsm,selinux: pass flowi_common instead of flowi
to the LSM hooks") introduced flowi{4,6}_to_flowi_common() functions which
cause UBSAN warning when building with LLVM 11.0.1 on Ubuntu 21.04.

 ================================================================================
 UBSAN: object-size-mismatch in ./include/net/flow.h:197:33
 member access within address ffffc9000109fbd8 with insufficient space
 for an object of type 'struct flowi'
 CPU: 2 PID: 7410 Comm: systemd-resolve Not tainted 5.14.0 #51
 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
 Call Trace:
  dump_stack_lvl+0x103/0x171
  ubsan_type_mismatch_common+0x1de/0x390
  __ubsan_handle_type_mismatch_v1+0x41/0x50
  udp_sendmsg+0xda2/0x1300
  ? ip_skb_dst_mtu+0x1f0/0x1f0
  ? sock_rps_record_flow+0xe/0x200
  ? inet_send_prepare+0x2d/0x90
  sock_sendmsg+0x49/0x80
  ____sys_sendmsg+0x269/0x370
  __sys_sendmsg+0x15e/0x1d0
  ? syscall_enter_from_user_mode+0xf0/0x1b0
  do_syscall_64+0x3d/0xb0
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f7081a50497
 Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 &lt;48&gt; 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
 RSP: 002b:00007ffc153870f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f7081a50497
 RDX: 0000000000000000 RSI: 00007ffc15387140 RDI: 000000000000000c
 RBP: 00007ffc15387140 R08: 0000563f29a5e4fc R09: 000000000000cd28
 R10: 0000563f29a68a30 R11: 0000000000000246 R12: 000000000000000c
 R13: 0000000000000001 R14: 0000563f29a68a30 R15: 0000563f29a5e50c
 ================================================================================

I don't think we need to call flowi{4,6}_to_flowi() from these functions
because the first member of "struct flowi4" and "struct flowi6" is

  struct flowi_common __fl_common;

while the first member of "struct flowi" is

  union {
    struct flowi_common __fl_common;
    struct flowi4       ip4;
    struct flowi6       ip6;
    struct flowidn      dn;
  } u;

which should point to the same address without access to "struct flowi".

Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>flow: remove spi key from flowi struct</title>
<updated>2021-04-19T10:25:11+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2021-04-14T16:12:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b07dd26f07af294ceed9715fd11e312ff8de6138'/>
<id>urn:sha1:b07dd26f07af294ceed9715fd11e312ff8de6138</id>
<content type='text'>
xfrm session decode ipv4 path (but not ipv6) sets this, but there are no
consumers.  Remove it.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
</feed>
