<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/uapi/linux/if_xdp.h, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-07-10T12:48:29+00:00</updated>
<entry>
<title>net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt</title>
<updated>2025-07-10T12:48:29+00:00</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2025-07-04T16:01:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=45e359be1ce88fb22e61fa3aa23b2e450a6cae03'/>
<id>urn:sha1:45e359be1ce88fb22e61fa3aa23b2e450a6cae03</id>
<content type='text'>
This patch provides a setsockopt method to let applications leverage to
adjust how many descs to be handled at most in one send syscall. It
mitigates the situation where the default value (32) that is too small
leads to higher frequency of triggering send syscall.

Considering the prosperity/complexity the applications have, there is no
absolutely ideal suggestion fitting all cases. So keep 32 as its default
value like before.

The patch does the following things:
- Add XDP_MAX_TX_SKB_BUDGET socket option.
- Set max_tx_budget to 32 by default in the initialization phase as a
  per-socket granular control.
- Set the range of max_tx_budget as [32, xs-&gt;tx-&gt;nentries].

The idea behind this comes out of real workloads in production. We use a
user-level stack with xsk support to accelerate sending packets and
minimize triggering syscalls. When the packets are aggregated, it's not
hard to hit the upper bound (namely, 32). The moment user-space stack
fetches the -EAGAIN error number passed from sendto(), it will loop to try
again until all the expected descs from tx ring are sent out to the driver.
Enlarging the XDP_MAX_TX_SKB_BUDGET value contributes to less frequency of
sendto() and higher throughput/PPS.

Here is what I did in production, along with some numbers as follows:
For one application I saw lately, I suggested using 128 as max_tx_budget
because I saw two limitations without changing any default configuration:
1) XDP_MAX_TX_SKB_BUDGET, 2) socket sndbuf which is 212992 decided by
net.core.wmem_default. As to XDP_MAX_TX_SKB_BUDGET, the scenario behind
this was I counted how many descs are transmitted to the driver at one
time of sendto() based on [1] patch and then I calculated the
possibility of hitting the upper bound. Finally I chose 128 as a
suitable value because 1) it covers most of the cases, 2) a higher
number would not bring evident results. After twisting the parameters,
a stable improvement of around 4% for both PPS and throughput and less
resources consumption were found to be observed by strace -c -p xxx:
1) %time was decreased by 7.8%
2) error counter was decreased from 18367 to 572

[1]: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/

Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Acked-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Link: https://patch.msgid.link/20250704160138.48677-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>net: add UAPI to the header guard in various network headers</title>
<updated>2025-04-18T02:04:50+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-04-16T20:08:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8066e388be48f1ad62b0449dc1d31a25489fa12a'/>
<id>urn:sha1:8066e388be48f1ad62b0449dc1d31a25489fa12a</id>
<content type='text'>
fib_rule, ip6_tunnel, and a whole lot of if_* headers lack the customary
_UAPI in the header guard. Without it YNL build can't protect from in tree
and system headers both getting included. YNL doesn't need most of these
but it's annoying to have to fix them one by one.

Note that header installation strips this _UAPI prefix so this should
result in no change to the end user.

Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Reviewed-by: Jason Xing &lt;kerneljasonxing@gmail.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Link: https://patch.msgid.link/20250416200840.1338195-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: Add launch time hardware offload support to XDP Tx metadata</title>
<updated>2025-02-20T23:13:45+00:00</updated>
<author>
<name>Song Yoong Siang</name>
<email>yoong.siang.song@intel.com</email>
</author>
<published>2025-02-16T09:34:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ca4419f15abd19ba8be1e109661b60f9f5b6c9f0'/>
<id>urn:sha1:ca4419f15abd19ba8be1e109661b60f9f5b6c9f0</id>
<content type='text'>
Extend the XDP Tx metadata framework so that user can requests launch time
hardware offload, where the Ethernet device will schedule the packet for
transmission at a pre-determined time called launch time. The value of
launch time is communicated from user space to Ethernet driver via
launch_time field of struct xsk_tx_metadata.

Suggested-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Song Yoong Siang &lt;yoong.siang.song@intel.com&gt;
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://patch.msgid.link/20250216093430.957880-2-yoong.siang.song@intel.com
</content>
</entry>
<entry>
<title>xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len</title>
<updated>2024-07-25T09:57:27+00:00</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@fomichev.me</email>
</author>
<published>2024-07-13T01:52:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d5e726d9143c5624135f5dc9e4069799adeef734'/>
<id>urn:sha1:d5e726d9143c5624135f5dc9e4069799adeef734</id>
<content type='text'>
Julian reports that commit 341ac980eab9 ("xsk: Support tx_metadata_len")
can break existing use cases which don't zero-initialize xdp_umem_reg
padding. Introduce new XDP_UMEM_TX_METADATA_LEN to make sure we
interpret the padding as tx_metadata_len only when being explicitly
asked.

Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
Reported-by: Julian Schindel &lt;mail@arctic-alpaca.de&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Link: https://lore.kernel.org/bpf/20240713015253.121248-2-sdf@fomichev.me
</content>
</entry>
<entry>
<title>xsk: Add option to calculate TX checksum in SW</title>
<updated>2023-11-29T22:59:40+00:00</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2023-11-27T19:03:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=11614723af26e7c32fcb704d8f30fdf60c1122dc'/>
<id>urn:sha1:11614723af26e7c32fcb704d8f30fdf60c1122dc</id>
<content type='text'>
For XDP_COPY mode, add a UMEM option XDP_UMEM_TX_SW_CSUM
to call skb_checksum_help in transmit path. Might be useful
to debugging issues with real hardware. I also use this mode
in the selftests.

Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Link: https://lore.kernel.org/r/20231127190319.1190813-9-sdf@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: Add TX timestamp and TX checksum offload support</title>
<updated>2023-11-29T22:59:40+00:00</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2023-11-27T19:03:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=48eb03dd26304c24f03bdbb9382e89c8564e71df'/>
<id>urn:sha1:48eb03dd26304c24f03bdbb9382e89c8564e71df</id>
<content type='text'>
This change actually defines the (initial) metadata layout
that should be used by AF_XDP userspace (xsk_tx_metadata).
The first field is flags which requests appropriate offloads,
followed by the offload-specific fields. The supported per-device
offloads are exported via netlink (new xsk-flags).

The offloads themselves are still implemented in a bit of a
framework-y fashion that's left from my initial kfunc attempt.
I'm introducing new xsk_tx_metadata_ops which drivers are
supposed to implement. The drivers are also supposed
to call xsk_tx_metadata_request/xsk_tx_metadata_complete in
the right places. Since xsk_tx_metadata_{request,_complete}
are static inline, we don't incur any extra overhead doing
indirect calls.

The benefit of this scheme is as follows:
- keeps all metadata layout parsing away from driver code
- makes it easy to grep and see which drivers implement what
- don't need any extra flags to maintain to keep track of what
  offloads are implemented; if the callback is implemented - the offload
  is supported (used by netlink reporting code)

Two offloads are defined right now:
1. XDP_TXMD_FLAGS_CHECKSUM: skb-style csum_start+csum_offset
2. XDP_TXMD_FLAGS_TIMESTAMP: writes TX timestamp back into metadata
   area upon completion (tx_timestamp field)

XDP_TXMD_FLAGS_TIMESTAMP is also implemented for XDP_COPY mode: it writes
SW timestamp from the skb destructor (note I'm reusing hwtstamps to pass
metadata pointer).

The struct is forward-compatible and can be extended in the future
by appending more fields.

Reviewed-by: Song Yoong Siang &lt;yoong.siang.song@intel.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/r/20231127190319.1190813-3-sdf@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: Support tx_metadata_len</title>
<updated>2023-11-29T22:59:40+00:00</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@google.com</email>
</author>
<published>2023-11-27T19:03:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=341ac980eab90ac1f6c22ee9f9da83ed9604d899'/>
<id>urn:sha1:341ac980eab90ac1f6c22ee9f9da83ed9604d899</id>
<content type='text'>
For zerocopy mode, tx_desc-&gt;addr can point to an arbitrary offset
and carry some TX metadata in the headroom. For copy mode, there
is no way currently to populate skb metadata.

Introduce new tx_metadata_len umem config option that indicates how many
bytes to treat as metadata. Metadata bytes come prior to tx_desc address
(same as in RX case).

The size of the metadata has mostly the same constraints as XDP:
- less than 256 bytes
- 8-byte aligned (compared to 4-byte alignment on xdp, due to 8-byte
  timestamp in the completion)
- non-zero

This data is not interpreted in any way right now.

Reviewed-by: Song Yoong Siang &lt;yoong.siang.song@intel.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/r/20231127190319.1190813-2-sdf@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: introduce XSK_USE_SG bind flag for xsk socket</title>
<updated>2023-07-19T16:56:48+00:00</updated>
<author>
<name>Tirthendu Sarkar</name>
<email>tirthendu.sarkar@intel.com</email>
</author>
<published>2023-07-19T13:23:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=81470b5c3c6649eef8e5f282cd06793f788ae165'/>
<id>urn:sha1:81470b5c3c6649eef8e5f282cd06793f788ae165</id>
<content type='text'>
As of now xsk core drops any xdp_buff with data size greater than the
xsk frame_size as set by the af_xdp application. With multi-buffer
support introduced in the next patch xsk core can now split those
buffers into multiple descriptors provided the af_xdp application can
handle them. Such capability of the application needs to be independent
of the xdp_prog's frag support capability since there are cases where
even a single xdp_buffer may need to be split into multiple descriptors
owing to a smaller xsk frame size.

For e.g., with NIC rx_buffer size set to 4kB, a 3kB packet will
constitute of a single buffer and so will be sent as such to AF_XDP layer
irrespective of 'xdp.frags' capability of the XDP program. Now if the xsk
frame size is set to 2kB by the AF_XDP application, then the packet will
need to be split into 2 descriptors if AF_XDP application can handle
multi-buffer, else it needs to be dropped.

Applications can now advertise their frag handling capability to xsk core
so that xsk core can decide if it should drop or split xdp_buffs that
exceed xsk frame size. This is done using a new 'XSK_USE_SG' bind flag
for the xdp socket.

Signed-off-by: Tirthendu Sarkar &lt;tirthendu.sarkar@intel.com&gt;
Link: https://lore.kernel.org/r/20230719132421.584801-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: prepare 'options' in xdp_desc for multi-buffer use</title>
<updated>2023-07-19T16:56:48+00:00</updated>
<author>
<name>Tirthendu Sarkar</name>
<email>tirthendu.sarkar@intel.com</email>
</author>
<published>2023-07-19T13:23:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=63a64a56bc3f77c74085047ee45356ac850da3e8'/>
<id>urn:sha1:63a64a56bc3f77c74085047ee45356ac850da3e8</id>
<content type='text'>
Use the 'options' field in xdp_desc as a packet continuity marker. Since
'options' field was unused till now and was expected to be set to 0, the
'eop' descriptor will have it set to 0, while the non-eop descriptors
will have to set it to 1. This ensures legacy applications continue to
work without needing any change for single-buffer packets.

Add helper functions and extend xskq_prod_reserve_desc() to use the
'options' field.

Signed-off-by: Tirthendu Sarkar &lt;tirthendu.sarkar@intel.com&gt;
Link: https://lore.kernel.org/r/20230719132421.584801-2-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: Add new statistics</title>
<updated>2020-07-13T22:32:56+00:00</updated>
<author>
<name>Ciara Loftus</name>
<email>ciara.loftus@intel.com</email>
</author>
<published>2020-07-08T07:28:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8aa5a33578e9685d06020bd10d1637557423e945'/>
<id>urn:sha1:8aa5a33578e9685d06020bd10d1637557423e945</id>
<content type='text'>
It can be useful for the user to know the reason behind a dropped packet.
Introduce new counters which track drops on the receive path caused by:
1. rx ring being full
2. fill ring being empty

Also, on the tx path introduce a counter which tracks the number of times
we attempt pull from the tx ring when it is empty.

Signed-off-by: Ciara Loftus &lt;ciara.loftus@intel.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200708072835.4427-2-ciara.loftus@intel.com
</content>
</entry>
</feed>
