<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/ipv4/fib_semantics.c, branch v6.18.22</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.22</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.22'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-01-08T09:16:49+00:00</updated>
<entry>
<title>net: fib: restore ECMP balance from loopback</title>
<updated>2026-01-08T09:16:49+00:00</updated>
<author>
<name>Vadim Fedorenko</name>
<email>vadim.fedorenko@linux.dev</email>
</author>
<published>2025-12-21T19:26:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ffd335167d7edc171380546a0f204d4c680d690d'/>
<id>urn:sha1:ffd335167d7edc171380546a0f204d4c680d690d</id>
<content type='text'>
[ Upstream commit 6e17474aa9fe15015c9921a5081c7ca71783aac6 ]

Preference of nexthop with source address broke ECMP for packets with
source addresses which are not in the broadcast domain, but rather added
to loopback/dummy interfaces. Original behaviour was to balance over
nexthops while now it uses the latest nexthop from the group. To fix the
issue introduce next hop scoring system where next hops with source
address equal to requested will always have higher priority.

For the case with 198.51.100.1/32 assigned to dummy0 and routed using
192.0.2.0/24 and 203.0.113.0/24 networks:

2: dummy0: &lt;BROADCAST,NOARP,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether d6:54:8a:ff:78:f5 brd ff:ff:ff:ff:ff:ff
    inet 198.51.100.1/32 scope global dummy0
       valid_lft forever preferred_lft forever
7: veth1@if6: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:ed:98:87:6d:8a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.0.2.2/24 scope global veth1
       valid_lft forever preferred_lft forever
    inet6 fe80::4ed:98ff:fe87:6d8a/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
9: veth3@if8: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ae:75:23:38:a0:d2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 203.0.113.2/24 scope global veth3
       valid_lft forever preferred_lft forever
    inet6 fe80::ac75:23ff:fe38:a0d2/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

~ ip ro list:
default
	nexthop via 192.0.2.1 dev veth1 weight 1
	nexthop via 203.0.113.1 dev veth3 weight 1
192.0.2.0/24 dev veth1 proto kernel scope link src 192.0.2.2
203.0.113.0/24 dev veth3 proto kernel scope link src 203.0.113.2

before:
   for i in {1..255} ; do ip ro get 10.0.0.$i; done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
    255 veth3

after:
   for i in {1..255} ; do ip ro get 10.0.0.$i; done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
    122 veth1
    133 veth3

Fixes: 32607a332cfe ("ipv4: prefer multipath nexthop that matches source address")
Signed-off-by: Vadim Fedorenko &lt;vadim.fedorenko@linux.dev&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Link: https://patch.msgid.link/20251221192639.3911901-1-vadim.fedorenko@linux.dev
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: s/dev_get_flags/netif_get_flags/</title>
<updated>2025-07-19T00:27:47+00:00</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@fomichev.me</email>
</author>
<published>2025-07-17T17:23:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=93893a57efd431b9b4e72359bc8a8428681ca688'/>
<id>urn:sha1:93893a57efd431b9b4e72359bc8a8428681ca688</id>
<content type='text'>
Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.

Signed-off-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Link: https://patch.msgid.link/20250717172333.1288349-6-sdf@fomichev.me
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: fib: Remove unnecessary encap_type check</title>
<updated>2025-06-27T00:16:03+00:00</updated>
<author>
<name>Yue Haibing</name>
<email>yuehaibing@huawei.com</email>
</author>
<published>2025-06-25T02:20:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=77e12dba07d08080e5124cd9320577f2832a9eab'/>
<id>urn:sha1:77e12dba07d08080e5124cd9320577f2832a9eab</id>
<content type='text'>
lwtunnel_build_state() has check validity of encap_type,
so no need to do this before call it.

Signed-off-by: Yue Haibing &lt;yuehaibing@huawei.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20250625022059.3958215-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: Remove unnecessary NULL check for lwtunnel_fill_encap()</title>
<updated>2025-06-25T23:52:46+00:00</updated>
<author>
<name>Yue Haibing</name>
<email>yuehaibing@huawei.com</email>
</author>
<published>2025-06-24T14:00:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3b3ccf9ed05e0650273e7086f4d8f495e4b2c850'/>
<id>urn:sha1:3b3ccf9ed05e0650273e7086f4d8f495e4b2c850</id>
<content type='text'>
lwtunnel_fill_encap() has NULL check and return 0, so no need
to check before call it.

Signed-off-by: Yue Haibing &lt;yuehaibing@huawei.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20250624140015.3929241-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv6: Revert two per-cpu var allocation for RTM_NEWROUTE.</title>
<updated>2025-05-21T02:18:24+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2025-05-16T02:27:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=002dba13c824f1cf86f618f0d23d1f0ad3c93bbb'/>
<id>urn:sha1:002dba13c824f1cf86f618f0d23d1f0ad3c93bbb</id>
<content type='text'>
These two commits preallocated two per-cpu variables in
ip6_route_info_create() as fib_nh_common_init() and fib6_nh_init()
were expected to be called under RCU.

  * commit d27b9c40dbd6 ("ipv6: Preallocate nhc_pcpu_rth_output in
    ip6_route_info_create().")
  * commit 5720a328c3e9 ("ipv6: Preallocate rt-&gt;fib6_nh-&gt;rt6i_pcpu in
    ip6_route_info_create().")

Now these functions can be called without RCU and can use GFP_KERNEL.

Let's revert the commits.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Link: https://patch.msgid.link/20250516022759.44392-8-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: Honor "ignore_routes_with_linkdown" sysctl in nexthop selection</title>
<updated>2025-05-03T20:52:38+00:00</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2025-04-30T10:02:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=836b313a14a316290886dcc2ce7e78bf5ecc8658'/>
<id>urn:sha1:836b313a14a316290886dcc2ce7e78bf5ecc8658</id>
<content type='text'>
Commit 32607a332cfe ("ipv4: prefer multipath nexthop that matches source
address") changed IPv4 nexthop selection to prefer a nexthop whose
nexthop device is assigned the specified source address for locally
generated traffic.

While the selection honors the "fib_multipath_use_neigh" sysctl and will
not choose a nexthop with an invalid neighbour, it does not honor the
"ignore_routes_with_linkdown" sysctl and can choose a nexthop without a
carrier:

 $ sysctl net.ipv4.conf.all.ignore_routes_with_linkdown
 net.ipv4.conf.all.ignore_routes_with_linkdown = 1
 $ ip route show 198.51.100.0/24
 198.51.100.0/24
         nexthop via 192.0.2.2 dev dummy1 weight 1
         nexthop via 192.0.2.18 dev dummy2 weight 1 dead linkdown
 $ ip route get 198.51.100.1 from 192.0.2.17
 198.51.100.1 from 192.0.2.17 via 192.0.2.18 dev dummy2 uid 0

Solve this by skipping over nexthops whose assigned hash upper bound is
minus one, which is the value assigned to nexthops that do not have a
carrier when the "ignore_routes_with_linkdown" sysctl is set.

In practice, this probably does not matter a lot as the initial route
lookup for the source address would not choose a nexthop that does not
have a carrier in the first place, but the change does make the code
clearer.

Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: fib: Fix fib_info_hash_alloc() allocation type</title>
<updated>2025-04-29T17:44:17+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2025-04-26T06:05:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fca6170f5a039543fa5f390f1895fde503b80f46'/>
<id>urn:sha1:fca6170f5a039543fa5f390f1895fde503b80f46</id>
<content type='text'>
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

This was allocating many sizeof(struct hlist_head *) when it actually
wanted sizeof(struct hlist_head). Luckily these are the same size.
Adjust the allocation type to match the assignment.

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Link: https://patch.msgid.link/20250426060529.work.873-kees@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: prefer multipath nexthop that matches source address</title>
<updated>2025-04-29T14:22:25+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2025-04-24T14:35:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=32607a332cfea5a4b2a185f3e3d605a9bf4f8df0'/>
<id>urn:sha1:32607a332cfea5a4b2a185f3e3d605a9bf4f8df0</id>
<content type='text'>
With multipath routes, try to ensure that packets leave on the device
that is associated with the source address.

Avoid the following tcpdump example:

    veth0 Out IP 10.1.0.2.38640 &gt; 10.2.0.3.8000: Flags [S]
    veth1 Out IP 10.1.0.2.38648 &gt; 10.2.0.3.8000: Flags [S]

Which can happen easily with the most straightforward setup:

    ip addr add 10.0.0.1/24 dev veth0
    ip addr add 10.1.0.1/24 dev veth1

    ip route add 10.2.0.3 nexthop via 10.0.0.2 dev veth0 \
    			  nexthop via 10.1.0.2 dev veth1

This is apparently considered WAI, based on the comment in
ip_route_output_key_hash_rcu:

    * 2. Moreover, we are allowed to send packets with saddr
    *    of another iface. --ANK

It may be ok for some uses of multipath, but not all. For instance,
when using two ISPs, a router may drop packets with unknown source.

The behavior occurs because tcp_v4_connect makes three route
lookups when establishing a connection:

1. ip_route_connect calls to select a source address, with saddr zero.
2. ip_route_connect calls again now that saddr and daddr are known.
3. ip_route_newports calls again after a source port is also chosen.

With a route with multiple nexthops, each lookup may make a different
choice depending on available entropy to fib_select_multipath. So it
is possible for 1 to select the saddr from the first entry, but 3 to
select the second entry. Leading to the above situation.

Address this by preferring a match that matches the flowi4 saddr. This
will make 2 and 3 make the same choice as 1. Continue to update the
backup choice until a choice that matches saddr is found.

Do this in fib_select_multipath itself, rather than passing an fl4_oif
constraint, to avoid changing non-multipath route selection. Commit
e6b45241c57a ("ipv4: reset flowi parameters on route connect") shows
how that may cause regressions.

Also read ipv4.sysctl_fib_multipath_use_neigh only once. No need to
refresh in the loop.

This does not happen in IPv6, which performs only one lookup.

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Link: https://patch.msgid.link/20250424143549.669426-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>ipv6: Preallocate nhc_pcpu_rth_output in ip6_route_info_create().</title>
<updated>2025-04-24T07:29:56+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2025-04-18T00:03:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d27b9c40dbd66aa78b3e6657e600cf057a48ac1e'/>
<id>urn:sha1:d27b9c40dbd66aa78b3e6657e600cf057a48ac1e</id>
<content type='text'>
ip6_route_info_create_nh() will be called under RCU.

It calls fib_nh_common_init() and allocates nhc-&gt;nhc_pcpu_rth_output.

As with the reason for rt-&gt;fib6_nh-&gt;rt6i_pcpu, we want to avoid
GFP_ATOMIC allocation for nhc-&gt;nhc_pcpu_rth_output under RCU.

Let's preallocate it in ip6_route_info_create().

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Link: https://patch.msgid.link/20250418000443.43734-9-kuniyu@amazon.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>ipv4: fib: Namespacify fib_info hash tables.</title>
<updated>2025-03-03T23:04:10+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2025-02-28T04:23:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9f7f3ebeba9388c30b0cba9dc5df0c43d0e07605'/>
<id>urn:sha1:9f7f3ebeba9388c30b0cba9dc5df0c43d0e07605</id>
<content type='text'>
We will convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL.
Then, we need to have per-netns hash tables for struct fib_info.

Let's allocate the hash tables per netns.

fib_info_hash, fib_info_hash_bits, and fib_info_cnt are now moved
to struct netns_ipv4 and accessed with net-&gt;ipv4.fib_XXX.

Also, the netns checks are removed from fib_find_info_nh() and
fib_find_info().

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20250228042328.96624-9-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
