<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/route.h, branch v6.4.15</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.4.15</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.4.15'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-08-11T10:14:13+00:00</updated>
<entry>
<title>net: annotate data-races around sk-&gt;sk_mark</title>
<updated>2023-08-11T10:14:13+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-07-28T15:03:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b76d2fa662b7292fec3d095eb7864fa9eb14a480'/>
<id>urn:sha1:b76d2fa662b7292fec3d095eb7864fa9eb14a480</id>
<content type='text'>
[ Upstream commit 3c5b4d69c358a9275a8de98f87caf6eda644b086 ]

sk-&gt;sk_mark is often read while another thread could change the value.

Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: dst: Prevent false sharing vs. dst_entry:: __refcnt</title>
<updated>2023-03-29T01:52:22+00:00</updated>
<author>
<name>Wangyang Guo</name>
<email>wangyang.guo@intel.com</email>
</author>
<published>2023-03-23T20:55:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d288a162dd1c73507da582966f17dd226e34a0c0'/>
<id>urn:sha1:d288a162dd1c73507da582966f17dd226e34a0c0</id>
<content type='text'>
dst_entry::__refcnt is highly contended in scenarios where many connections
happen from and to the same IP. The reference count is an atomic_t, so the
reference count operations have to take the cache-line exclusive.

Aside of the unavoidable reference count contention there is another
significant problem which is caused by that: False sharing.

perf top identified two affected read accesses. dst_entry::lwtstate and
rtable::rt_genid.

dst_entry:__refcnt is located at offset 64 of dst_entry, which puts it into
a seperate cacheline vs. the read mostly members located at the beginning
of the struct.

That prevents false sharing vs. the struct members in the first 64
bytes of the structure, but there is also

  dst_entry::lwtstate

which is located after the reference count and in the same cache line. This
member is read after a reference count has been acquired.

struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a
size of 112 bytes, which means that the struct members of rtable which
follow the dst member share the same cache line as dst_entry::__refcnt.
Especially

  rtable::rt_genid

is also read by the contexts which have a reference count acquired
already.

When dst_entry:__refcnt is incremented or decremented via an atomic
operation these read accesses stall. This was found when analysing the
memtier benchmark in 1:100 mode, which amplifies the problem extremly.

Move the rt[6i]_uncached[_list] members out of struct rtable and struct
rt6_info into struct dst_entry to provide padding and move the lwtstate
member after that so it ends up in the same cache line.

The resulting improvement depends on the micro-architecture and the number
of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached
benchmark.

[ tglx: Rearrange struct ]

Signed-off-by: Wangyang Guo &lt;wangyang.guo@intel.com&gt;
Signed-off-by: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://lore.kernel.org/r/20230323102800.042297517@linutronix.de
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: add a couple of helpers for iph tot_len</title>
<updated>2023-02-02T04:54:27+00:00</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2023-01-28T15:58:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=058a8f7f73aae1cc22b53fcefec031b9e391b54d'/>
<id>urn:sha1:058a8f7f73aae1cc22b53fcefec031b9e391b54d</id>
<content type='text'>
This patch adds three APIs to replace the iph-&gt;tot_len setting
and getting in all places where IPv4 BIG TCP packets may reach,
they will be used in the following patches.

Note that iph_totlen() will be used when iph is not in linear
data of the skb.

Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next</title>
<updated>2022-07-25T12:25:39+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2022-07-25T12:25:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e222dc8d8408788e906a5c0dd5f01fbe7f6cd1d3'/>
<id>urn:sha1:e222dc8d8408788e906a5c0dd5f01fbe7f6cd1d3</id>
<content type='text'>
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2022-07-20

1) Don't set DST_NOPOLICY in IPv4, a recent patch made this
   superfluous. From Eyal Birger.

2) Convert alg_key to flexible array member to avoid an iproute2
   compile warning when built with gcc-12.
   From Stephen Hemminger.

3) xfrm_register_km and xfrm_unregister_km do always return 0
   so change the type to void. From Zhengchao Shao.

4) Fix spelling mistake in esp6.c
   From Zhang Jiaming.

5) Improve the wording of comment above XFRM_OFFLOAD flags.
   From Petr Vaněk.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2022-07-21T20:03:39+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-07-21T20:03:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6e0e846ee2ab01bc44254e6a0a6a6a0db1cba16d'/>
<id>urn:sha1:6e0e846ee2ab01bc44254e6a0a6a6a0db1cba16d</id>
<content type='text'>
No conflicts.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ip: Fix data-races around sysctl_ip_default_ttl.</title>
<updated>2022-07-15T10:49:55+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-07-13T20:51:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8281b7ec5c56b71cb2cc5a1728b41607be66959c'/>
<id>urn:sha1:8281b7ec5c56b71cb2cc5a1728b41607be66959c</id>
<content type='text'>
While reading sysctl_ip_default_ttl, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: change the type of ip_route_input_rcu to static</title>
<updated>2022-07-12T13:08:45+00:00</updated>
<author>
<name>Zhengchao Shao</name>
<email>shaozhengchao@huawei.com</email>
</author>
<published>2022-07-11T07:35:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5022e221c98a609e0e5b0a73852c7e3d32f1c545'/>
<id>urn:sha1:5022e221c98a609e0e5b0a73852c7e3d32f1c545</id>
<content type='text'>
The type of ip_route_input_rcu should be static.

Signed-off-by: Zhengchao Shao &lt;shaozhengchao@huawei.com&gt;
Link: https://lore.kernel.org/r/20220711073549.8947-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>xfrm: no need to set DST_NOPOLICY in IPv4</title>
<updated>2022-06-10T09:01:36+00:00</updated>
<author>
<name>Eyal Birger</name>
<email>eyal.birger@gmail.com</email>
</author>
<published>2022-05-20T10:48:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b5c8b3fe8946c1927155599dfb79045477901009'/>
<id>urn:sha1:b5c8b3fe8946c1927155599dfb79045477901009</id>
<content type='text'>
This is a cleanup patch following commit e6175a2ed1f1
("xfrm: fix "disable_policy" flag use when arriving from different devices")
which made DST_NOPOLICY no longer be used for inbound policy checks.

On outbound the flag was set, but never used.

As such, avoid setting it altogether and remove the nopolicy argument
from rt_dst_alloc().

Signed-off-by: Eyal Birger &lt;eyal.birger@gmail.com&gt;
Reviewed-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
<entry>
<title>ipv4: Avoid using RTO_ONLINK with ip_route_connect().</title>
<updated>2022-04-22T12:06:03+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2022-04-20T23:21:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=67e1e2f4854bb2c0dd2b8440cf090016a0e1a091'/>
<id>urn:sha1:67e1e2f4854bb2c0dd2b8440cf090016a0e1a091</id>
<content type='text'>
Now that ip_rt_fix_tos() doesn't reset -&gt;flowi4_scope unconditionally,
we don't have to rely on the RTO_ONLINK bit to properly set the scope
of a flowi4 structure. We can just set -&gt;flowi4_scope explicitly and
avoid using RTO_ONLINK in -&gt;flowi4_tos.

This patch converts callers of ip_route_connect(). Instead of setting
the tos parameter with RT_CONN_FLAGS(sk), as all callers do, we can:

  1- Drop the tos parameter from ip_route_connect(): its value was
     entirely based on sk, which is also passed as parameter.

  2- Set -&gt;flowi4_scope depending on the SOCK_LOCALROUTE socket option
     instead of always initialising it with RT_SCOPE_UNIVERSE (let's
     define ip_sock_rt_scope() for this purpose).

  3- Avoid overloading -&gt;flowi4_tos with RTO_ONLINK: since the scope is
     now properly initialised, we don't need to tell ip_rt_fix_tos() to
     adjust -&gt;flowi4_scope for us. So let's define ip_sock_rt_tos(),
     which is the same as RT_CONN_FLAGS() but without the RTO_ONLINK
     bit overload.

Note:
  In the original ip_route_connect() code, __ip_route_output_key()
  might clear the RTO_ONLINK bit of fl4-&gt;flowi4_tos (because of
  ip_rt_fix_tos()). Therefore flowi4_update_output() had to reuse the
  original tos variable. Now that we don't set RTO_ONLINK any more,
  this is not a problem and we can use fl4-&gt;flowi4_tos in
  flowi4_update_output().

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: remove sparse error in ip_neigh_gw4()</title>
<updated>2022-01-27T16:38:33+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-01-27T01:34:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3c42b2019863b327caa233072c50739d4144dd16'/>
<id>urn:sha1:3c42b2019863b327caa233072c50739d4144dd16</id>
<content type='text'>
./include/net/route.h:373:48: warning: incorrect type in argument 2 (different base types)
./include/net/route.h:373:48:    expected unsigned int [usertype] key
./include/net/route.h:373:48:    got restricted __be32 [usertype] daddr

Fixes: 5c9f7c1dfc2e ("ipv4: Add helpers for neigh lookup for nexthop")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://lore.kernel.org/r/20220127013404.1279313-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
