<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/net/ipv4/ip_output.c, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-10-03T17:23:50+00:00</updated>
<entry>
<title>net: psp: don't assume reply skbs will have a socket</title>
<updated>2025-10-03T17:23:50+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-10-01T02:24:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7a0f94361ffd6e1d31c79023e8674b492bef05e3'/>
<id>urn:sha1:7a0f94361ffd6e1d31c79023e8674b492bef05e3</id>
<content type='text'>
Rx path may be passing around unreferenced sockets, which means
that skb_set_owner_edemux() may not set skb-&gt;sk and PSP will crash:

  KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
  RIP: 0010:psp_reply_set_decrypted (./include/net/psp/functions.h:132 net/psp/psp_sock.c:287)
    tcp_v6_send_response.constprop.0 (net/ipv6/tcp_ipv6.c:979)
    tcp_v6_send_reset (net/ipv6/tcp_ipv6.c:1140 (discriminator 1))
    tcp_v6_do_rcv (net/ipv6/tcp_ipv6.c:1683)
    tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1912)

Fixes: 659a2899a57d ("tcp: add datapath logic for PSP with inline key exchange")
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20251001022426.2592750-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: add datapath logic for PSP with inline key exchange</title>
<updated>2025-09-18T10:32:06+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-09-17T00:09:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=659a2899a57da59f433182eba571881884d6323e'/>
<id>urn:sha1:659a2899a57da59f433182eba571881884d6323e</id>
<content type='text'>
Add validation points and state propagation to support PSP key
exchange inline, on TCP connections. The expectation is that
application will use some well established mechanism like TLS
handshake to establish a secure channel over the connection and
if both endpoints are PSP-capable - exchange and install PSP keys.
Because the connection can existing in PSP-unsecured and PSP-secured
state we need to make sure that there are no race conditions or
retransmission leaks.

On Tx - mark packets with the skb-&gt;decrypted bit when PSP key
is at the enqueue time. Drivers should only encrypt packets with
this bit set. This prevents retransmissions getting encrypted when
original transmission was not. Similarly to TLS, we'll use
sk-&gt;sk_validate_xmit_skb to make sure PSP skbs can't "escape"
via a PSP-unaware device without being encrypted.

On Rx - validation is done under socket lock. This moves the validation
point later than xfrm, for example. Please see the documentation patch
for more details on the flow of securing a connection, but for
the purpose of this patch what's important is that we want to
enforce the invariant that once connection is secured any skb
in the receive queue has been encrypted with PSP.

Add GRO and coalescing checks to prevent PSP authenticated data from
being combined with cleartext data, or data with non-matching PSP
state. On Rx, check skb's with psp_skb_coalesce_diff() at points
before psp_sk_rx_policy_check(). After skb's are policy checked and on
the socket receive queue, skb_cmp_decrypted() is sufficient for
checking for coalescable PSP state. On Tx, tcp_write_collapse_fence()
should be called when transitioning a socket into PSP Tx state to
prevent data sent as cleartext from being coalesced with PSP
encapsulated data.

This change only adds the validation points, for ease of review.
Subsequent change will add the ability to install keys, and flesh
the enforcement logic out

Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Co-developed-by: Daniel Zahka &lt;daniel.zahka@gmail.com&gt;
Signed-off-by: Daniel Zahka &lt;daniel.zahka@gmail.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20250917000954.859376-5-daniel.zahka@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>ipv4: Convert -&gt;flowi4_tos to dscp_t.</title>
<updated>2025-08-27T00:34:31+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2025-08-25T13:37:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1bec9d0c0046fe4e2bfb6a1c5aadcb5d56cdb0fb'/>
<id>urn:sha1:1bec9d0c0046fe4e2bfb6a1c5aadcb5d56cdb0fb</id>
<content type='text'>
Convert the -&gt;flowic_tos field of struct flowi_common from __u8 to
dscp_t, rename it -&gt;flowic_dscp and propagate these changes to struct
flowi and struct flowi4.

We've had several bugs in the past where ECN bits could interfere with
IPv4 routing, because these bits were not properly cleared when setting
-&gt;flowi4_tos. These bugs should be fixed now and the dscp_t type has
been introduced to ensure that variables carrying DSCP values don't
accidentally have any ECN bits set. Several variables and structure
fields have been converted to dscp_t already, but the main IPv4 routing
structure, struct flowi4, is still using a __u8. To avoid any future
regression, this patch converts it to dscp_t.

There are many users to convert at once. Fortunately, around half of
-&gt;flowi4_tos users already have a dscp_t value at hand, which they
currently convert to __u8 using inet_dscp_to_dsfield(). For all of
these users, we just need to drop that conversion.

But, although we try to do the __u8 &lt;-&gt; dscp_t conversions at the
boundaries of the network or of user space, some places still store
TOS/DSCP variables as __u8 in core networking code. Those can hardly be
converted either because the data structure is part of UAPI or because
the same variable or field is also used for handling ECN in other parts
of the code. In all of these cases where we don't have a dscp_t
variable at hand, we need to use inet_dsfield_to_dscp() when
interacting with -&gt;flowi4_dscp.

Changes since v1:
  * Fix space alignment in __bpf_redirect_neigh_v4() (Ido).

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Link: https://patch.msgid.link/29acecb45e911d17446b9a3dbdb1ab7b821ea371.1756128932.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: Add locking to protect skb-&gt;dev access in ip_output</title>
<updated>2025-08-01T22:17:52+00:00</updated>
<author>
<name>Sharath Chandra Vurukala</name>
<email>quic_sharathv@quicinc.com</email>
</author>
<published>2025-07-30T10:51:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1dbf1d590d10a6d1978e8184f8dfe20af22d680a'/>
<id>urn:sha1:1dbf1d590d10a6d1978e8184f8dfe20af22d680a</id>
<content type='text'>
In ip_output() skb-&gt;dev is updated from the skb_dst(skb)-&gt;dev
this can become invalid when the interface is unregistered and freed,

Introduced new skb_dst_dev_rcu() function to be used instead of
skb_dst_dev() within rcu_locks in ip_output.This will ensure that
all the skb's associated with the dev being deregistered will
be transnmitted out first, before freeing the dev.

Given that ip_output() is called within an rcu_read_lock()
critical section or from a bottom-half context, it is safe to introduce
an RCU read-side critical section within it.

Multiple panic call stacks were observed when UL traffic was run
in concurrency with device deregistration from different functions,
pasting one sample for reference.

[496733.627565][T13385] Call trace:
[496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
[496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
[496733.627595][T13385] ip_finish_output+0xa4/0xf4
[496733.627605][T13385] ip_output+0x100/0x1a0
[496733.627613][T13385] ip_send_skb+0x68/0x100
[496733.627618][T13385] udp_send_skb+0x1c4/0x384
[496733.627625][T13385] udp_sendmsg+0x7b0/0x898
[496733.627631][T13385] inet_sendmsg+0x5c/0x7c
[496733.627639][T13385] __sys_sendto+0x174/0x1e4
[496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
[496733.627653][T13385] invoke_syscall+0x58/0x11c
[496733.627662][T13385] el0_svc_common+0x88/0xf4
[496733.627669][T13385] do_el0_svc+0x2c/0xb0
[496733.627676][T13385] el0_svc+0x2c/0xa4
[496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
[496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8

Changes in v3:
- Replaced WARN_ON() with  WARN_ON_ONCE(), as suggested by Willem de Bruijn.
- Dropped legacy lines mistakenly pulled in from an outdated branch.

Changes in v2:
- Addressed review comments from Eric Dumazet
- Used READ_ONCE() to prevent potential load/store tearing
- Added skb_dst_dev_rcu() and used along with rcu_read_lock() in ip_output

Signed-off-by: Sharath Chandra Vurukala &lt;quic_sharathv@quicinc.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20250730105118.GA26100@hu-sharathv-hyd.qualcomm.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: splice: Drop unused @gfp</title>
<updated>2025-07-08T15:37:15+00:00</updated>
<author>
<name>Michal Luczaj</name>
<email>mhal@rbox.co</email>
</author>
<published>2025-07-02T13:38:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=25489a4f556414445d342951615178368ee45cde'/>
<id>urn:sha1:25489a4f556414445d342951615178368ee45cde</id>
<content type='text'>
Since its introduction in commit 2e910b95329c ("net: Add a function to
splice pages into an skbuff for MSG_SPLICE_PAGES"), skb_splice_from_iter()
never used the @gfp argument. Remove it and adapt callers.

No functional change intended.

Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Signed-off-by: Michal Luczaj &lt;mhal@rbox.co&gt;
Link: https://patch.msgid.link/20250702-splice-drop-unused-v3-2-55f68b60d2b7@rbox.co
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: adopt dst_dev, skb_dst_dev and skb_dst_dev_net[_rcu]</title>
<updated>2025-07-02T21:32:30+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-06-30T12:19:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a74fc62eec155ca5a6da8ff3856f3dc87fe24558'/>
<id>urn:sha1:a74fc62eec155ca5a6da8ff3856f3dc87fe24558</id>
<content type='text'>
Use the new helpers as a first step to deal with
potential dst-&gt;dev races.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20250630121934.3399505-8-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: devmem: Implement TX path</title>
<updated>2025-05-13T09:12:48+00:00</updated>
<author>
<name>Mina Almasry</name>
<email>almasrymina@google.com</email>
</author>
<published>2025-05-08T00:48:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bd61848900bff597764238f3a8ec67c815cd316e'/>
<id>urn:sha1:bd61848900bff597764238f3a8ec67c815cd316e</id>
<content type='text'>
Augment dmabuf binding to be able to handle TX. Additional to all the RX
binding, we also create tx_vec needed for the TX path.

Provide API for sendmsg to be able to send dmabufs bound to this device:

- Provide a new dmabuf_tx_cmsg which includes the dmabuf to send from.
- MSG_ZEROCOPY with SCM_DEVMEM_DMABUF cmsg indicates send from dma-buf.

Devmem is uncopyable, so piggyback off the existing MSG_ZEROCOPY
implementation, while disabling instances where MSG_ZEROCOPY falls back
to copying.

We additionally pipe the binding down to the new
zerocopy_fill_skb_from_devmem which fills a TX skb with net_iov netmems
instead of the traditional page netmems.

We also special case skb_frag_dma_map to return the dma-address of these
dmabuf net_iovs instead of attempting to map pages.

The TX path may release the dmabuf in a context where we cannot wait.
This happens when the user unbinds a TX dmabuf while there are still
references to its netmems in the TX path. In that case, the netmems will
be put_netmem'd from a context where we can't unmap the dmabuf, Resolve
this by making __net_devmem_dmabuf_binding_free schedule_work'd.

Based on work by Stanislav Fomichev &lt;sdf@fomichev.me&gt;. A lot of the meat
of the implementation came from devmem TCP RFC v1[1], which included the
TX path, but Stan did all the rebasing on top of netmem/net_iov.

Cc: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Kaiyuan Zhang &lt;kaiyuanz@google.com&gt;
Signed-off-by: Mina Almasry &lt;almasrymina@google.com&gt;
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Link: https://patch.msgid.link/20250508004830.4100853-5-almasrymina@google.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>tcp: add new TCP_TW_ACK_OOW state and allow ECN bits in TOS</title>
<updated>2025-03-17T13:56:17+00:00</updated>
<author>
<name>Ilpo Järvinen</name>
<email>ij@kernel.org</email>
</author>
<published>2025-03-05T22:38:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4618e195f9251947a5f15f7e5533bcc3a19b93ef'/>
<id>urn:sha1:4618e195f9251947a5f15f7e5533bcc3a19b93ef</id>
<content type='text'>
ECN bits in TOS are always cleared when sending in ACKs in TW. Clearing
them is problematic for TCP flows that used Accurate ECN because ECN bits
decide which service queue the packet is placed into (L4S vs Classic).
Effectively, TW ACKs are always downgraded from L4S to Classic queue
which might impact, e.g., delay the ACK will experience on the path
compared with the other packets of the flow.

Change the TW ACK sending code to differentiate:
- In tcp_v4_send_reset(), commit ba9e04a7ddf4f ("ip: fix tos reflection
  in ack and reset packets") cleans ECN bits for TW reset and this is
  not affected.
- In tcp_v4_timewait_ack(), ECN bits for all TW ACKs are cleaned. But now
  only ECN bits of ACKs for oow data or paws_reject are cleaned, and ECN
  bits of other ACKs will not be cleaned.
- In tcp_v4_reqsk_send_ack(), commit 66b13d99d96a1 ("ipv4: tcp: fix TOS
  value in ACK messages sent from TIME_WAIT") did not clean ECN bits of
  ACKs for oow data or paws_reject. But now the ECN bits rae cleaned for
  these ACKs.

Signed-off-by: Ilpo Järvinen &lt;ij@kernel.org&gt;
Signed-off-by: Chia-Yu Chang &lt;chia-yu.chang@nokia-bell-labs.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Use inet_sk_init_flowi4() in __ip_queue_xmit().</title>
<updated>2024-12-20T21:50:09+00:00</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2024-12-16T17:21:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=148721f8e04a10a3b9c51f030c9be0d15b0a4d17'/>
<id>urn:sha1:148721f8e04a10a3b9c51f030c9be0d15b0a4d17</id>
<content type='text'>
Use inet_sk_init_flowi4() to automatically initialise the flowi4
structure in __ip_queue_xmit() instead of passing parameters manually
to ip_route_output_ports().

Override -&gt;flowi4_tos with the value passed as parameter since that's
required by SCTP.

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Link: https://patch.msgid.link/37e64ffbd9adac187b14aa9097b095f5c86e85be.1734357769.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>sock: support SO_PRIORITY cmsg</title>
<updated>2024-12-17T02:13:44+00:00</updated>
<author>
<name>Anna Emese Nyiri</name>
<email>annaemesenyiri@gmail.com</email>
</author>
<published>2024-12-13T08:44:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a32f3e9d1ed146f81162702605d65447a319eb76'/>
<id>urn:sha1:a32f3e9d1ed146f81162702605d65447a319eb76</id>
<content type='text'>
The Linux socket API currently allows setting SO_PRIORITY at the
socket level, applying a uniform priority to all packets sent through
that socket. The exception to this is IP_TOS, when the priority value
is calculated during the handling of
ancillary data, as implemented in commit f02db315b8d8 ("ipv4: IP_TOS
and IP_TTL can be specified as ancillary data").
However, this is a computed
value, and there is currently no mechanism to set a custom priority
via control messages prior to this patch.

According to this patch, if SO_PRIORITY is specified as ancillary data,
the packet is sent with the priority value set through
sockc-&gt;priority, overriding the socket-level values
set via the traditional setsockopt() method. This is analogous to
the existing support for SO_MARK, as implemented in
commit c6af0c227a22 ("ip: support SO_MARK cmsg").

If both cmsg SO_PRIORITY and IP_TOS are passed, then the one that
takes precedence is the last one in the cmsg list.

This patch has the side effect that raw_send_hdrinc now interprets cmsg
IP_TOS.

Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Suggested-by: Ferenc Fejes &lt;fejes@inf.elte.hu&gt;
Signed-off-by: Anna Emese Nyiri &lt;annaemesenyiri@gmail.com&gt;
Link: https://patch.msgid.link/20241213084457.45120-3-annaemesenyiri@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
