<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/ipv4/fib_rules.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-05-29T09:02:57+00:00</updated>
<entry>
<title>ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure().</title>
<updated>2025-05-29T09:02:57+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2025-02-07T07:24:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9f2911868a73342ec950e67a0d19adc933bdd45e'/>
<id>urn:sha1:9f2911868a73342ec950e67a0d19adc933bdd45e</id>
<content type='text'>
[ Upstream commit 5a1ccffd30a08f5a2428cd5fbb3ab03e8eb6c66d ]

The following patch will not set skb-&gt;sk from VRF path.

Let's fetch net from fib_rule-&gt;fr_net instead of sock_net(skb-&gt;sk)
in fib[46]_rule_configure().

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Tested-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Link: https://patch.msgid.link/20250207072502.87775-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: fib_rules: Add DSCP selector support</title>
<updated>2024-09-14T04:15:44+00:00</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2024-09-11T09:37:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b9455fef8b1fc662369d982fe97dc66e6c332699'/>
<id>urn:sha1:b9455fef8b1fc662369d982fe97dc66e6c332699</id>
<content type='text'>
Implement support for the new DSCP selector that allows IPv4 FIB rules
to match on the entire DSCP field, unlike the existing TOS selector that
only matches on the three lower DSCP bits.

Differentiate between both selectors by adding a new bit in the IPv4 FIB
rule structure (in an existing one byte hole) that is only set when the
'FRA_DSCP' attribute is specified by user space. Reject rules that use
both selectors.

Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20240911093748.3662015-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: Centralize TOS matching</title>
<updated>2024-08-20T12:57:08+00:00</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2024-08-14T12:52:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1fa3314c14c6a20d098991a0a6980f9b18b2f930'/>
<id>urn:sha1:1fa3314c14c6a20d098991a0a6980f9b18b2f930</id>
<content type='text'>
The TOS field in the IPv4 flow information structure ('flowi4_tos') is
matched by the kernel against the TOS selector in IPv4 rules and routes.
The field is initialized differently by different call sites. Some treat
it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as
RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC
791 TOS and initialize it using IPTOS_RT_MASK.

What is common to all these call sites is that they all initialize the
lower three DSCP bits, which fits the TOS definition in the initial IPv4
specification (RFC 791).

Therefore, the kernel only allows configuring IPv4 FIB rules that match
on the lower three DSCP bits which are always guaranteed to be
initialized by all call sites:

 # ip -4 rule add tos 0x1c table 100
 # ip -4 rule add tos 0x3c table 100
 Error: Invalid tos.

While this works, it is unlikely to be very useful. RFC 791 that
initially defined the TOS and IP precedence fields was updated by RFC
2474 over twenty five years ago where these fields were replaced by a
single six bits DSCP field.

Extending FIB rules to match on DSCP can be done by adding a new DSCP
selector while maintaining the existing semantics of the TOS selector
for applications that rely on that.

A prerequisite for allowing FIB rules to match on DSCP is to adjust all
the call sites to initialize the high order DSCP bits and remove their
masking along the path to the core where the field is matched on.

However, making this change alone will result in a behavior change. For
example, a forwarded IPv4 packet with a DS field of 0xfc will no longer
match a FIB rule that was configured with 'tos 0x1c'.

This behavior change can be avoided by masking the upper three DSCP bits
in 'flowi4_tos' before comparing it against the TOS selectors in FIB
rules and routes.

Implement the above by adding a new function that checks whether a given
DSCP value matches the one specified in the IPv4 flow information
structure and invoke it from the three places that currently match on
'flowi4_tos'.

Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK
since the latter is not uAPI and we should be able to remove it at some
point.

Include &lt;linux/ip.h&gt; in &lt;linux/in_route.h&gt; since the former defines
IPTOS_TOS_MASK which is used in the definition of RT_TOS() in
&lt;linux/in_route.h&gt;.

No regressions in FIB tests:

 # ./fib_tests.sh
 [...]
 Tests passed: 218
 Tests failed:   0

And FIB rule tests:

 # ./fib_rule_tests.sh
 [...]
 Tests passed: 116
 Tests failed:   0

Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>fib: remove unnecessary input parameters in fib_default_rule_add</title>
<updated>2024-01-04T00:42:48+00:00</updated>
<author>
<name>Zhengchao Shao</name>
<email>shaozhengchao@huawei.com</email>
</author>
<published>2024-01-02T07:15:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b4c1d4d9734cda4394da5b59ebf7d9ca3579561a'/>
<id>urn:sha1:b4c1d4d9734cda4394da5b59ebf7d9ca3579561a</id>
<content type='text'>
When fib_default_rule_add is invoked, the value of the input parameter
'flags' is always 0. Rules uses kzalloc to allocate memory, so 'flags' has
been initialized to 0. Therefore, remove the input parameter 'flags' in
fib_default_rule_add.

Signed-off-by: Zhengchao Shao &lt;shaozhengchao@huawei.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://lore.kernel.org/r/20240102071519.3781384-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: remove unnecessary type castings</title>
<updated>2022-04-30T14:12:58+00:00</updated>
<author>
<name>Yu Zhe</name>
<email>yuzhe@nfschina.com</email>
</author>
<published>2022-04-29T02:14:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2e47eece158a7e5d2205be42e9c44b87302de1a7'/>
<id>urn:sha1:2e47eece158a7e5d2205be42e9c44b87302de1a7</id>
<content type='text'>
remove unnecessary void* type castings.

Signed-off-by: Yu Zhe &lt;yuzhe@nfschina.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Reject again rules with high DSCP values</title>
<updated>2022-02-10T15:33:33+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2022-02-10T12:24:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dc513a405cade3e47bcda8de27c7a7bf6eeddd18'/>
<id>urn:sha1:dc513a405cade3e47bcda8de27c7a7bf6eeddd18</id>
<content type='text'>
Commit 563f8e97e054 ("ipv4: Stop taking ECN bits into account in
fib4-rules") replaced the validation test on frh-&gt;tos. While the new
test is stricter for ECN bits, it doesn't detect the use of high order
DSCP bits. This would be fine if IPv4 could properly handle them. But
currently, most IPv4 lookups are done with the three high DSCP bits
masked. Therefore, using these bits doesn't lead to the expected
result.

Let's reject such configurations again, so that nobody starts to
use and make any assumption about how the stack handles the three high
order DSCP bits in fib4 rules.

Fixes: 563f8e97e054 ("ipv4: Stop taking ECN bits into account in fib4-rules")
Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Stop taking ECN bits into account in fib4-rules</title>
<updated>2022-02-08T04:12:46+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2022-02-04T13:58:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=563f8e97e054451d167327336a53b7381517a998'/>
<id>urn:sha1:563f8e97e054451d167327336a53b7381517a998</id>
<content type='text'>
Use the new dscp_t type to replace the tos field of struct fib4_rule,
so that fib4-rules consistently ignore ECN bits.

Before this patch, fib4-rules did accept rules with the high order ECN
bit set (but not the low order one). Also, it relied on its callers
masking the ECN bits of -&gt;flowi4_tos to prevent those from influencing
the result. This was brittle and a few call paths still do the lookup
without masking the ECN bits first.

After this patch fib4-rules only compare the DSCP bits. ECN can't
influence the result anymore, even if the caller didn't mask these
bits. Also, fib4-rules now must have both ECN bits cleared or they will
be rejected.

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Acked-by: David Ahern &lt;dsahern@kernel.org&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>fib: rules: remove duplicated nla policies</title>
<updated>2021-12-16T15:18:35+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2021-12-16T12:05:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=92e1bcee067f1722703bf61047ef674be1b6d1c0'/>
<id>urn:sha1:92e1bcee067f1722703bf61047ef674be1b6d1c0</id>
<content type='text'>
The attributes are identical in all implementations so move the ipv4 one
into the core and remove the per-family nla policies.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: convert fib_num_tclassid_users to atomic_t</title>
<updated>2021-12-02T11:56:04+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2021-12-02T02:26:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=213f5f8f31f10aa1e83187ae20fb7fa4e626b724'/>
<id>urn:sha1:213f5f8f31f10aa1e83187ae20fb7fa4e626b724</id>
<content type='text'>
Before commit faa041a40b9f ("ipv4: Create cleanup helper for fib_nh")
changes to net-&gt;ipv4.fib_num_tclassid_users were protected by RTNL.

After the change, this is no longer the case, as free_fib_info_rcu()
runs after rcu grace period, without rtnl being held.

Fixes: faa041a40b9f ("ipv4: Create cleanup helper for fib_nh")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: David Ahern &lt;dsahern@kernel.org&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: fix memory leak in fib6_rule_suppress</title>
<updated>2021-11-29T14:43:35+00:00</updated>
<author>
<name>msizanoen1</name>
<email>msizanoen@qtmlabs.xyz</email>
</author>
<published>2021-11-23T12:48:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cdef485217d30382f3bf6448c54b4401648fe3f1'/>
<id>urn:sha1:cdef485217d30382f3bf6448c54b4401648fe3f1</id>
<content type='text'>
The kernel leaks memory when a `fib` rule is present in IPv6 nftables
firewall rules and a suppress_prefix rule is present in the IPv6 routing
rules (used by certain tools such as wg-quick). In such scenarios, every
incoming packet will leak an allocation in `ip6_dst_cache` slab cache.

After some hours of `bpftrace`-ing and source code reading, I tracked
down the issue to ca7a03c41753 ("ipv6: do not free rt if
FIB_LOOKUP_NOREF is set on suppress rule").

The problem with that change is that the generic `args-&gt;flags` always have
`FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag
`RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not
decreasing the refcount when needed.

How to reproduce:
 - Add the following nftables rule to a prerouting chain:
     meta nfproto ipv6 fib saddr . mark . iif oif missing drop
   This can be done with:
     sudo nft create table inet test
     sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
     sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
 - Run:
     sudo ip -6 rule add table main suppress_prefixlength 0
 - Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase
   with every incoming ipv6 packet.

This patch exposes the protocol-specific flags to the protocol
specific `suppress` function, and check the protocol-specific `flags`
argument for RT6_LOOKUP_F_DST_NOREF instead of the generic
FIB_LOOKUP_NOREF when decreasing the refcount, like this.

[1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71
[2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105
Fixes: ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
