<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/net/xsk_buff_pool.h, branch linux-6.5.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.5.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.5.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-04-27T20:24:51+00:00</updated>
<entry>
<title>xsk: Use pool-&gt;dma_pages to check for DMA</title>
<updated>2023-04-27T20:24:51+00:00</updated>
<author>
<name>Kal Conley</name>
<email>kal.conley@dectris.com</email>
</author>
<published>2023-04-23T18:01:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6ec7be9a2d2b09815f69fc2183ec31857eaa6e27'/>
<id>urn:sha1:6ec7be9a2d2b09815f69fc2183ec31857eaa6e27</id>
<content type='text'>
Compare pool-&gt;dma_pages instead of pool-&gt;dma_pages_cnt to check for an
active DMA mapping. pool-&gt;dma_pages needs to be read anyway to access
the map, so this compiles to more efficient code.
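
A minimal before/after sketch of the idea (illustrative only, not the exact
hunk from this commit):

/* Before: load dma_pages_cnt just to test for an active mapping. */
if (pool-&gt;dma_pages_cnt)
        dma = pool-&gt;dma_pages[addr &gt;&gt; PAGE_SHIFT];

/* After: test the pointer that has to be loaded anyway to index the map. */
if (pool-&gt;dma_pages)
        dma = pool-&gt;dma_pages[addr &gt;&gt; PAGE_SHIFT];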

Signed-off-by: Kal Conley &lt;kal.conley@dectris.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Xuan Zhuo &lt;xuanzhuo@linux.alibaba.com&gt;
Acked-by: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Link: https://lore.kernel.org/bpf/20230423180157.93559-1-kal.conley@dectris.com
</content>
</entry>
<entry>
<title>xsk: Fix unaligned descriptor validation</title>
<updated>2023-04-06T16:53:05+00:00</updated>
<author>
<name>Kal Conley</name>
<email>kal.conley@dectris.com</email>
</author>
<published>2023-04-05T23:59:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d769ccaf957fe7391f357c0a923de71f594b8a2b'/>
<id>urn:sha1:d769ccaf957fe7391f357c0a923de71f594b8a2b</id>
<content type='text'>
Make sure unaligned descriptors that straddle the end of the UMEM are
considered invalid. Currently, descriptor validation is broken in
zero-copy mode, where descriptors are only checked at page granularity.
For example, descriptors in zero-copy mode that overrun the end of the
UMEM but not a page boundary are (incorrectly) considered valid. The
UMEM boundary check needs to happen before the page boundary and
contiguity checks in xp_desc_crosses_non_contig_pg(). Do this check in
xp_unaligned_validate_desc() instead, as xp_check_unaligned() already
does.
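
A rough sketch of the resulting check, assuming the pool records the UMEM
size in addrs_cnt (illustrative, not the verbatim patch):

static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool,
                                              struct xdp_desc *desc)
{
        u64 addr = xp_unaligned_add_offset_to_addr(desc-&gt;addr);

        if (desc-&gt;len &gt; pool-&gt;chunk_size)
                return false;

        /* Reject descriptors that run past the end of the UMEM before
         * looking at page boundaries and contiguity.
         */
        if (addr &gt;= pool-&gt;addrs_cnt || addr + desc-&gt;len &gt; pool-&gt;addrs_cnt)
                return false;

        return !xp_desc_crosses_non_contig_pg(pool, addr, desc-&gt;len);
}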

Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API")
Signed-off-by: Kal Conley &lt;kal.conley@dectris.com&gt;
Acked-by: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Link: https://lore.kernel.org/r/20230405235920.7305-2-kal.conley@dectris.com
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: Add cb area to struct xdp_buff_xsk</title>
<updated>2023-01-23T17:58:23+00:00</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2023-01-19T22:15:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=94ecc5ca4dbf1f01bae6e32f5cd88c0fc5dc3cc9'/>
<id>urn:sha1:94ecc5ca4dbf1f01bae6e32f5cd88c0fc5dc3cc9</id>
<content type='text'>
Add an area after the xdp_buff in struct xdp_buff_xsk that drivers can use
to stash extra information for use in metadata kfuncs. The maximum size of
24 bytes means the full xdp_buff_xsk structure will take up exactly two
cache lines (with the cb field spanning both). Also add a macro drivers can
use to check their own wrapping structs against the available size.
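
A hedged sketch of the resulting layout and build-time check (the 24-byte
figure is from the text above; the macro and constant names here are
illustrative):

#define XSK_PRIV_MAX 24

struct xdp_buff_xsk {
        struct xdp_buff xdp;
        u8 cb[XSK_PRIV_MAX];    /* driver-private scratch for metadata kfuncs */
        /* ... remaining pool bookkeeping fields ... */
};

/* Drivers wrapping xdp_buff_xsk can assert their struct fits in cb. */
#define XSK_CHECK_PRIV_TYPE(t) \
        BUILD_BUG_ON(sizeof(t) &gt; offsetofend(struct xdp_buff_xsk, cb))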

Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Martin KaFai Lau &lt;martin.lau@linux.dev&gt;
Cc: Jakub Kicinski &lt;kuba@kernel.org&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Cc: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Cc: Anatoly Burakov &lt;anatoly.burakov@intel.com&gt;
Cc: Alexander Lobakin &lt;alexandr.lobakin@intel.com&gt;
Cc: Magnus Karlsson &lt;magnus.karlsson@gmail.com&gt;
Cc: Maryam Tahhan &lt;mtahhan@redhat.com&gt;
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Suggested-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Link: https://lore.kernel.org/r/20230119221536.3349901-15-sdf@google.com
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
</content>
</entry>
<entry>
<title>xsk: Inherit need_wakeup flag for shared sockets</title>
<updated>2022-09-22T15:16:22+00:00</updated>
<author>
<name>Jalal Mostafa</name>
<email>jalal.a.mostapha@gmail.com</email>
</author>
<published>2022-09-21T13:57:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=60240bc26114543fcbfcd8a28466e67e77b20388'/>
<id>urn:sha1:60240bc26114543fcbfcd8a28466e67e77b20388</id>
<content type='text'>
The need_wakeup flag is not set for xsks bound with the `XDP_SHARED_UMEM`
flag to a different queue id and/or device. They should inherit
the flag from the first socket's buffer pool, since no flags can be
specified once `XDP_SHARED_UMEM` is set.
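
A sketch of the bind-time inheritance (illustrative, not the verbatim patch;
umem_xs here stands for the already-bound first socket):

/* With XDP_SHARED_UMEM, no flags can be passed, so derive them from the
 * first socket's pool instead.
 */
flags = umem-&gt;zc ? XDP_ZEROCOPY : XDP_COPY;
if (umem_xs-&gt;pool-&gt;uses_need_wakeup)
        flags |= XDP_USE_NEED_WAKEUP;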

Fixes: b5aea28dca134 ("xsk: Add shared umem support between queue ids")
Signed-off-by: Jalal Mostafa &lt;jalal.a.mostapha@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Link: https://lore.kernel.org/bpf/20220921135701.10199-1-jalal.a.mostapha@gmail.com
</content>
</entry>
<entry>
<title>xsk: Fix possible crash when multiple sockets are created</title>
<updated>2022-04-26T14:19:54+00:00</updated>
<author>
<name>Maciej Fijalkowski</name>
<email>maciej.fijalkowski@intel.com</email>
</author>
<published>2022-04-25T15:37:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ba3beec2ec1d3b4fd8672ca6e781dac4b3267f6e'/>
<id>urn:sha1:ba3beec2ec1d3b4fd8672ca6e781dac4b3267f6e</id>
<content type='text'>
Fix a crash that happens if an Rx only socket is created first, then a
second socket is created that is Tx only and bound to the same umem as
the first socket, and also to the same netdev and queue_id, together
with the XDP_SHARED_UMEM flag. In this specific case, the tx_descs
array of the shared buffer pool was not created by the first socket,
as it was an Rx only socket. When the second socket is bound, it needs
the tx_descs array of this shared pool since it has a Tx component, but
unfortunately the array was never allocated, leading to a crash. Note
that this array is only used by zero-copy drivers using the batched Tx
APIs, currently only ice and i40e.

[ 5511.150360] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 5511.158419] #PF: supervisor write access in kernel mode
[ 5511.164472] #PF: error_code(0x0002) - not-present page
[ 5511.170416] PGD 0 P4D 0
[ 5511.173347] Oops: 0002 [#1] PREEMPT SMP PTI
[ 5511.178186] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G            E     5.18.0-rc1+ #97
[ 5511.187245] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016
[ 5511.198418] RIP: 0010:xsk_tx_peek_release_desc_batch+0x198/0x310
[ 5511.205375] Code: c0 83 c6 01 84 c2 74 6d 8d 46 ff 23 07 44 89 e1 48 83 c0 14 48 c1 e1 04 48 c1 e0 04 48 03 47 10 4c 01 c1 48 8b 50 08 48 8b 00 &lt;48&gt; 89 51 08 48 89 01 41 80 bd d7 00 00 00 00 75 82 48 8b 19 49 8b
[ 5511.227091] RSP: 0018:ffffc90000003dd0 EFLAGS: 00010246
[ 5511.233135] RAX: 0000000000000000 RBX: ffff88810c8da600 RCX: 0000000000000000
[ 5511.241384] RDX: 000000000000003c RSI: 0000000000000001 RDI: ffff888115f555c0
[ 5511.249634] RBP: ffffc90000003e08 R08: 0000000000000000 R09: ffff889092296b48
[ 5511.257886] R10: 0000ffffffffffff R11: ffff889092296800 R12: 0000000000000000
[ 5511.266138] R13: ffff88810c8db500 R14: 0000000000000040 R15: 0000000000000100
[ 5511.274387] FS:  0000000000000000(0000) GS:ffff88903f800000(0000) knlGS:0000000000000000
[ 5511.283746] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5511.290389] CR2: 0000000000000008 CR3: 00000001046e2001 CR4: 00000000003706f0
[ 5511.298640] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5511.306892] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5511.315142] Call Trace:
[ 5511.317972]  &lt;IRQ&gt;
[ 5511.320301]  ice_xmit_zc+0x68/0x2f0 [ice]
[ 5511.324977]  ? ktime_get+0x38/0xa0
[ 5511.328913]  ice_napi_poll+0x7a/0x6a0 [ice]
[ 5511.333784]  __napi_poll+0x2c/0x160
[ 5511.337821]  net_rx_action+0xdd/0x200
[ 5511.342058]  __do_softirq+0xe6/0x2dd
[ 5511.346198]  irq_exit_rcu+0xb5/0x100
[ 5511.350339]  common_interrupt+0xa4/0xc0
[ 5511.354777]  &lt;/IRQ&gt;
[ 5511.357201]  &lt;TASK&gt;
[ 5511.359625]  asm_common_interrupt+0x1e/0x40
[ 5511.364466] RIP: 0010:cpuidle_enter_state+0xd2/0x360
[ 5511.370211] Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 e9 00 7b ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 72 02 00 00 31 ff e8 02 0c 80 ff fb 45 85 f6 &lt;0f&gt; 88 11 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d 14 90 49
[ 5511.391921] RSP: 0018:ffffffff82a03e60 EFLAGS: 00000202
[ 5511.397962] RAX: ffff88903f800000 RBX: 0000000000000001 RCX: 000000000000001f
[ 5511.406214] RDX: 0000000000000000 RSI: ffffffff823400b9 RDI: ffffffff8234c046
[ 5511.424646] RBP: ffff88810a384800 R08: 000005032a28c046 R09: 0000000000000008
[ 5511.443233] R10: 000000000000000b R11: 0000000000000006 R12: ffffffff82bcf700
[ 5511.461922] R13: 000005032a28c046 R14: 0000000000000001 R15: 0000000000000000
[ 5511.480300]  cpuidle_enter+0x29/0x40
[ 5511.494329]  do_idle+0x1c7/0x250
[ 5511.507610]  cpu_startup_entry+0x19/0x20
[ 5511.521394]  start_kernel+0x649/0x66e
[ 5511.534626]  secondary_startup_64_no_verify+0xc3/0xcb
[ 5511.549230]  &lt;/TASK&gt;

Detect such a case during bind() and allocate this memory region via the
newly introduced xp_alloc_tx_descs(). Also, use kvcalloc instead of
kcalloc, as for the other buffer pool allocations, so that it matches the
kvfree() in xp_destroy().
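
A sketch of what the new helper amounts to (roughly; field names are taken
from context and may not match the patch exactly):

int xp_alloc_tx_descs(struct xsk_buff_pool *pool, struct xdp_sock *xs)
{
        /* kvcalloc to match the kvfree() in xp_destroy(), like the other
         * buffer pool allocations.
         */
        pool-&gt;tx_descs = kvcalloc(xs-&gt;tx-&gt;nentries, sizeof(*pool-&gt;tx_descs),
                                  GFP_KERNEL);
        if (!pool-&gt;tx_descs)
                return -ENOMEM;

        return 0;
}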

Fixes: d1bc532e99be ("i40e: xsk: Move tmp desc array from driver to pool")
Signed-off-by: Maciej Fijalkowski &lt;maciej.fijalkowski@intel.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Link: https://lore.kernel.org/bpf/20220425153745.481322-1-maciej.fijalkowski@intel.com
</content>
</entry>
<entry>
<title>i40e: xsk: Move tmp desc array from driver to pool</title>
<updated>2022-01-27T16:25:32+00:00</updated>
<author>
<name>Magnus Karlsson</name>
<email>magnus.karlsson@intel.com</email>
</author>
<published>2022-01-25T16:04:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d1bc532e99becf104635ed4da6fefa306f452321'/>
<id>urn:sha1:d1bc532e99becf104635ed4da6fefa306f452321</id>
<content type='text'>
Move desc_array from the driver to the pool. The reason behind this is
that we can then reuse this array as temporary storage for descriptors
in all zero-copy drivers that use the batched interface. This will make
it easier to add batching to more drivers.

i40e is the only driver that has a batched Tx zero-copy
implementation, so no need to touch any other driver.
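
Conceptually, the scratch array simply moves from the driver's ring
structure into the shared pool (sketch only):

struct xsk_buff_pool {
        /* ... */
        struct xdp_desc *tx_descs;      /* temporary storage for batched Tx */
        /* ... */
};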

Signed-off-by: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Alexander Lobakin &lt;alexandr.lobakin@intel.com&gt;
Link: https://lore.kernel.org/bpf/20220125160446.78976-6-maciej.fijalkowski@intel.com
</content>
</entry>
<entry>
<title>xsk: Optimize for aligned case</title>
<updated>2021-09-27T22:18:35+00:00</updated>
<author>
<name>Magnus Karlsson</name>
<email>magnus.karlsson@intel.com</email>
</author>
<published>2021-09-22T07:56:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=94033cd8e73b8632bab7c8b7bb54caa4f5616db7'/>
<id>urn:sha1:94033cd8e73b8632bab7c8b7bb54caa4f5616db7</id>
<content type='text'>
Optimize for the aligned case by precomputing the parameter values of
the xdp_buff_xsk and xdp_buff structures in the heads array. We can do
this because, for the aligned case, the heads array size is equal to
the number of chunks in the umem. Every entry in this array then
reflects a specific chunk/frame and can therefore be prepopulated with
the correct values, so we can drop the use of the free_heads stack.
Note that it is not possible to allocate more buffers than what has
been allocated in the aligned case, since each chunk can only contain
a single buffer.

Unfortunately, we cannot do this in the unaligned case, as one chunk
might contain multiple buffers. In this case, we keep the old scheme
of populating a heads entry every time it is used and using
the free_heads stack.
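
A sketch of the resulting setup loop (the xp_init_xskb_addr() helper name is
assumed here for illustration):

for (i = 0; i &lt; pool-&gt;heads_cnt; i++) {
        struct xdp_buff_xsk *xskb = &amp;pool-&gt;heads[i];

        if (pool-&gt;unaligned)
                pool-&gt;free_heads[i] = xskb;     /* keep the free_heads stack */
        else
                xp_init_xskb_addr(xskb, pool, i * pool-&gt;chunk_size);
}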

Also move xp_release() and xp_get_handle() to xsk_buff_pool.h. They
were for some reason in xsk.c even though they are buffer pool
operations.

Signed-off-by: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20210922075613.12186-7-magnus.karlsson@gmail.com
</content>
</entry>
<entry>
<title>xsk: Batched buffer allocation for the pool</title>
<updated>2021-09-27T22:18:34+00:00</updated>
<author>
<name>Magnus Karlsson</name>
<email>magnus.karlsson@intel.com</email>
</author>
<published>2021-09-22T07:56:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=47e4075df300050a920b99299c4db3dad9adaba9'/>
<id>urn:sha1:47e4075df300050a920b99299c4db3dad9adaba9</id>
<content type='text'>
Add a new driver interface xsk_buff_alloc_batch() offering batched
buffer allocations to improve performance. The new interface takes
three arguments: the buffer pool to allocate from, a pointer to an
array of struct xdp_buff pointers which will contain pointers to the
allocated xdp_buffs, and an unsigned integer specifying the max number
of buffers to allocate. The return value is the actual number of
buffers that the allocator managed to allocate, and it will be in the
range 0 &lt;= N &lt;= max, where max is the third parameter to the function.

u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
                         u32 max);

A second driver interface is also introduced that needs to be used in
conjunction with xsk_buff_alloc_batch(). It is a helper that sets the
size of struct xdp_buff and is used by the NIC Rx irq routine when
receiving a packet. This helper sets the three struct members data,
data_meta, and data_end. In the xsk_buff_alloc() case, the first two
are set in the allocation routine and data_end is set when a packet is
received in the receive irq function. This unfortunately leads to
worse performance, since the xdp_buff is touched twice with a long
period of time in between, leading to an extra cache miss. Instead, we
fill out the xdp_buff with all 3 fields at one single point in time in
the driver, when the size of the packet is known. Hence this helper.
Note that the driver has to use this helper (or set all three fields
itself) when using xsk_buff_alloc_batch(). xsk_buff_alloc() works as
before and does not require this.

void xsk_buff_set_size(struct xdp_buff *xdp, u32 size);
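
A hedged sketch of how a driver might use the pair; the ring and helper
names below are made up for illustration:

u32 nb_buffs, i;
struct xdp_buff *bufs[MY_RX_BATCH];     /* hypothetical batch size */

/* Refill: grab up to MY_RX_BATCH buffers in a single call. */
nb_buffs = xsk_buff_alloc_batch(rx_ring-&gt;pool, bufs, MY_RX_BATCH);
for (i = 0; i &lt; nb_buffs; i++)
        my_post_rx_buffer(rx_ring, bufs[i]);    /* hypothetical driver helper */

/* On completion, when the packet length is known, set data, data_meta and
 * data_end in one go before handing the buffer to the XDP program.
 */
xsk_buff_set_size(xdp, size);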

Signed-off-by: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20210922075613.12186-3-magnus.karlsson@gmail.com
</content>
</entry>
<entry>
<title>xsk: Get rid of unused entry in struct xdp_buff_xsk</title>
<updated>2021-09-27T22:18:34+00:00</updated>
<author>
<name>Magnus Karlsson</name>
<email>magnus.karlsson@intel.com</email>
</author>
<published>2021-09-22T07:56:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=10a5e009b93a812956e232cee8804ed99f5b93bb'/>
<id>urn:sha1:10a5e009b93a812956e232cee8804ed99f5b93bb</id>
<content type='text'>
Get rid of the unused entry "unaligned" in struct xdp_buff_xsk.

Signed-off-by: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20210922075613.12186-2-magnus.karlsson@gmail.com
</content>
</entry>
<entry>
<title>xsk: Fix missing validation for skb and unaligned mode</title>
<updated>2021-06-18T14:57:19+00:00</updated>
<author>
<name>Magnus Karlsson</name>
<email>magnus.karlsson@intel.com</email>
</author>
<published>2021-06-17T09:22:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2f99619820c2269534eb2c0cde44870313c6d353'/>
<id>urn:sha1:2f99619820c2269534eb2c0cde44870313c6d353</id>
<content type='text'>
Fix a missing validation of a Tx descriptor when executing in skb mode
and the umem is in unaligned mode. A descriptor could point to a
buffer straddling the end of the umem, thus effectively tricking the
kernel into reading outside the allowed umem region. This could lead
to a kernel crash if that part of memory is not mapped.

In zero-copy mode, the descriptor validation code rejects such
descriptors by checking a bit in the DMA address that tells us if the
next page is physically contiguous or not. For the last page in the
umem, this bit is not set, therefore any descriptor pointing to a
packet straddling this last page boundary will be rejected. However,
the skb path does not use this bit since it copies out data and can do
so to two different pages. (It also does not have the array of DMA
addresses, so it cannot even store this bit.) The code just reported
that the packet is always physically contiguous. But this is
unfortunately also reported for the last page in the umem, which means
that packets that cross the end of the umem are being allowed, which
they should not be.

Fix this by introducing such a check in the skb path only, so as not
to penalize the zero-copy path.
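
The shape of the missing check, as a sketch (helper and field names here are
illustrative, not the exact patch):

/* skb path only: reject a descriptor whose buffer would run past the end
 * of the umem, since there is no contiguity bit to rely on here.
 */
static inline bool xsk_skb_desc_in_bounds(struct xsk_buff_pool *pool,
                                          u64 addr, u32 len)
{
        return addr &lt; pool-&gt;addrs_cnt &amp;&amp; addr + len &lt;= pool-&gt;addrs_cnt;
}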

Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API")
Signed-off-by: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Björn Töpel &lt;bjorn@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20210617092255.3487-1-magnus.karlsson@gmail.com
</content>
</entry>
</feed>
