<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/net/veth.c, branch v4.19.77</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.19.77</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.19.77'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2018-09-16T22:33:50+00:00</updated>
<entry>
<title>veth: Orphan skb before GRO</title>
<updated>2018-09-16T22:33:50+00:00</updated>
<author>
<name>Toshiaki Makita</name>
<email>makita.toshiaki@lab.ntt.co.jp</email>
</author>
<published>2018-09-14T04:33:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4bf9ffa0fb5744ed40d7348c24fa9ae398b1d603'/>
<id>urn:sha1:4bf9ffa0fb5744ed40d7348c24fa9ae398b1d603</id>
<content type='text'>
GRO expects skbs not to be owned by sockets, but when XDP is enabled veth
passed skbs owned by sockets. It caused corrupted sk_wmem_alloc.

Paolo Abeni reported the following splat:

[  362.098904] refcount_t overflow at skb_set_owner_w+0x5e/0xa0 in iperf3[1644], uid/euid: 0/0
[  362.108239] WARNING: CPU: 0 PID: 1644 at kernel/panic.c:648 refcount_error_report+0xa0/0xa4
[  362.117547] Modules linked in: tcp_diag inet_diag veth intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf ipmi_ssif iTCO_wdt sg ipmi_si iTCO_vendor_support ipmi_devintf mxm_wmi ipmi_msghandler pcspkr dcdbas mei_me wmi mei lpc_ich acpi_power_meter pcc_cpufreq xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe igb ttm ahci mdio libahci ptp crc32c_intel drm pps_core libata i2c_algo_bit dca dm_mirror dm_region_hash dm_log dm_mod
[  362.176622] CPU: 0 PID: 1644 Comm: iperf3 Not tainted 4.19.0-rc2.vanilla+ #2025
[  362.184777] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
[  362.193124] RIP: 0010:refcount_error_report+0xa0/0xa4
[  362.198758] Code: 08 00 00 48 8b 95 80 00 00 00 49 8d 8c 24 80 0a 00 00 41 89 c1 44 89 2c 24 48 89 de 48 c7 c7 18 4d e7 9d 31 c0 e8 30 fa ff ff &lt;0f&gt; 0b eb 88 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 49 89 fc
[  362.219711] RSP: 0018:ffff9ee6ff603c20 EFLAGS: 00010282
[  362.225538] RAX: 0000000000000000 RBX: ffffffff9de83e10 RCX: 0000000000000000
[  362.233497] RDX: 0000000000000001 RSI: ffff9ee6ff6167d8 RDI: ffff9ee6ff6167d8
[  362.241457] RBP: ffff9ee6ff603d78 R08: 0000000000000490 R09: 0000000000000004
[  362.249416] R10: 0000000000000000 R11: ffff9ee6ff603990 R12: ffff9ee664b94500
[  362.257377] R13: 0000000000000000 R14: 0000000000000004 R15: ffffffff9de615f9
[  362.265337] FS:  00007f1d22d28740(0000) GS:ffff9ee6ff600000(0000) knlGS:0000000000000000
[  362.274363] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  362.280773] CR2: 00007f1d222f35d0 CR3: 0000001fddfec003 CR4: 00000000001606f0
[  362.288733] Call Trace:
[  362.291459]  &lt;IRQ&gt;
[  362.293702]  ex_handler_refcount+0x4e/0x80
[  362.298269]  fixup_exception+0x35/0x40
[  362.302451]  do_trap+0x109/0x150
[  362.306048]  do_error_trap+0xd5/0x130
[  362.315766]  invalid_op+0x14/0x20
[  362.319460] RIP: 0010:skb_set_owner_w+0x5e/0xa0
[  362.324512] Code: ef ff ff 74 49 48 c7 43 60 20 7b 4a 9d 8b 85 f4 01 00 00 85 c0 75 16 8b 83 e0 00 00 00 f0 01 85 44 01 00 00 0f 88 d8 23 16 00 &lt;5b&gt; 5d c3 80 8b 91 00 00 00 01 8b 85 f4 01 00 00 89 83 a4 00 00 00
[  362.345465] RSP: 0018:ffff9ee6ff603e20 EFLAGS: 00010a86
[  362.351291] RAX: 0000000000001100 RBX: ffff9ee65deec700 RCX: ffff9ee65e829244
[  362.359250] RDX: 0000000000000100 RSI: ffff9ee65e829100 RDI: ffff9ee65deec700
[  362.367210] RBP: ffff9ee65e829100 R08: 000000000002a380 R09: 0000000000000000
[  362.375169] R10: 0000000000000002 R11: fffff1a4bf77bb00 R12: ffffc0754661d000
[  362.383130] R13: ffff9ee65deec200 R14: ffff9ee65f597000 R15: 00000000000000aa
[  362.391092]  veth_xdp_rcv+0x4e4/0x890 [veth]
[  362.399357]  veth_poll+0x4d/0x17a [veth]
[  362.403731]  net_rx_action+0x2af/0x3f0
[  362.407912]  __do_softirq+0xdd/0x29e
[  362.411897]  do_softirq_own_stack+0x2a/0x40
[  362.416561]  &lt;/IRQ&gt;
[  362.418899]  do_softirq+0x4b/0x70
[  362.422594]  __local_bh_enable_ip+0x50/0x60
[  362.427258]  ip_finish_output2+0x16a/0x390
[  362.431824]  ip_output+0x71/0xe0
[  362.440670]  __tcp_transmit_skb+0x583/0xab0
[  362.445333]  tcp_write_xmit+0x247/0xfb0
[  362.449609]  __tcp_push_pending_frames+0x2d/0xd0
[  362.454760]  tcp_sendmsg_locked+0x857/0xd30
[  362.459424]  tcp_sendmsg+0x27/0x40
[  362.463216]  sock_sendmsg+0x36/0x50
[  362.467104]  sock_write_iter+0x87/0x100
[  362.471382]  __vfs_write+0x112/0x1a0
[  362.475369]  vfs_write+0xad/0x1a0
[  362.479062]  ksys_write+0x52/0xc0
[  362.482759]  do_syscall_64+0x5b/0x180
[  362.486841]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  362.492473] RIP: 0033:0x7f1d22293238
[  362.496458] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 c5 54 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 &lt;48&gt; 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[  362.517409] RSP: 002b:00007ffebaef8008 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  362.525855] RAX: ffffffffffffffda RBX: 0000000000002800 RCX: 00007f1d22293238
[  362.533816] RDX: 0000000000002800 RSI: 00007f1d22d36000 RDI: 0000000000000005
[  362.541775] RBP: 00007f1d22d36000 R08: 00000002db777a30 R09: 0000562b70712b20
[  362.549734] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
[  362.557693] R13: 0000000000002800 R14: 00007ffebaef8060 R15: 0000562b70712260

In order to avoid this, orphan the skb before entering GRO.

Fixes: 948d4f214fde ("veth: Add driver XDP")
Reported-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Toshiaki Makita &lt;makita.toshiaki@lab.ntt.co.jp&gt;
Tested-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>veth: Free queues on link delete</title>
<updated>2018-08-16T19:22:31+00:00</updated>
<author>
<name>Toshiaki Makita</name>
<email>makita.toshiaki@lab.ntt.co.jp</email>
</author>
<published>2018-08-15T08:07:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7797b93b756dabffd1a8db3e7e7b778fb07ef0a6'/>
<id>urn:sha1:7797b93b756dabffd1a8db3e7e7b778fb07ef0a6</id>
<content type='text'>
David Ahern reported memory leak in veth.

=======================================================================
$ cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800354d5c00 (size 1024):
  comm "ip", pid 836, jiffies 4294722952 (age 25.904s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;(____ptrval____)&gt;] kmemleak_alloc+0x70/0x94
    [&lt;(____ptrval____)&gt;] slab_post_alloc_hook+0x42/0x52
    [&lt;(____ptrval____)&gt;] __kmalloc+0x101/0x142
    [&lt;(____ptrval____)&gt;] kmalloc_array.constprop.20+0x1e/0x26 [veth]
    [&lt;(____ptrval____)&gt;] veth_newlink+0x147/0x3ac [veth]
    ...
unreferenced object 0xffff88002e009c00 (size 1024):
  comm "ip", pid 836, jiffies 4294722958 (age 25.898s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;(____ptrval____)&gt;] kmemleak_alloc+0x70/0x94
    [&lt;(____ptrval____)&gt;] slab_post_alloc_hook+0x42/0x52
    [&lt;(____ptrval____)&gt;] __kmalloc+0x101/0x142
    [&lt;(____ptrval____)&gt;] kmalloc_array.constprop.20+0x1e/0x26 [veth]
    [&lt;(____ptrval____)&gt;] veth_newlink+0x219/0x3ac [veth]
=======================================================================

veth_rq allocated in veth_newlink() was not freed on dellink.

We need to free up them after veth_close() so that any packets will not
reference the queues afterwards. Thus free them in veth_dev_free() in
the same way as freeing stats structure (vstats).

Also move queues allocation to veth_dev_init() to be in line with stats
allocation.

Fixes: 638264dc90227 ("veth: Support per queue XDP ring")
Reported-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: Toshiaki Makita &lt;makita.toshiaki@lab.ntt.co.jp&gt;
Reviewed-by: David Ahern &lt;dsahern@gmail.com&gt;
Tested-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>veth: Support per queue XDP ring</title>
<updated>2018-08-10T14:12:21+00:00</updated>
<author>
<name>Toshiaki Makita</name>
<email>makita.toshiaki@lab.ntt.co.jp</email>
</author>
<published>2018-08-03T07:58:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=638264dc90227cca00d20c26680171addce18e51'/>
<id>urn:sha1:638264dc90227cca00d20c26680171addce18e51</id>
<content type='text'>
Move XDP and napi related fields from veth_priv to newly created veth_rq
structure.

When xdp_frames are enqueued from ndo_xdp_xmit and XDP_TX, rxq is
selected by current cpu.

When skbs are enqueued from the peer device, rxq is one to one mapping
of its peer txq. This way we have a restriction that the number of rxqs
must not less than the number of peer txqs, but leave the possibility to
achieve bulk skb xmit in the future because txq lock would make it
possible to remove rxq ptr_ring lock.

v3:
- Add extack messages.
- Fix array overrun in veth_xmit.

Signed-off-by: Toshiaki Makita &lt;makita.toshiaki@lab.ntt.co.jp&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>veth: Add XDP TX and REDIRECT</title>
<updated>2018-08-10T14:12:21+00:00</updated>
<author>
<name>Toshiaki Makita</name>
<email>makita.toshiaki@lab.ntt.co.jp</email>
</author>
<published>2018-08-03T07:58:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d1396004dd868642ea2596abe058d96dcf97990f'/>
<id>urn:sha1:d1396004dd868642ea2596abe058d96dcf97990f</id>
<content type='text'>
This allows further redirection of xdp_frames like

 NIC   -&gt; veth--veth -&gt; veth--veth
 (XDP)          (XDP)         (XDP)

The intermediate XDP, redirecting packets from NIC to the other veth,
reuses xdp_mem_info from NIC so that page recycling of the NIC works on
the destination veth's XDP.
In this way return_frame is not fully guarded by NAPI, since another
NAPI handler on another cpu may use the same xdp_mem_info concurrently.
Thus disable napi_direct by xdp_set_return_frame_no_direct() during the
NAPI context.

v8:
- Don't use xdp_frame pointer address for data_hard_start of xdp_buff.

v4:
- Use xdp_[set|clear]_return_frame_no_direct() instead of a flag in
  xdp_mem_info.

v3:
- Fix double free when veth_xdp_tx() returns a positive value.
- Convert xdp_xmit and xdp_redir variables into flags.

Signed-off-by: Toshiaki Makita &lt;makita.toshiaki@lab.ntt.co.jp&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>veth: Add ndo_xdp_xmit</title>
<updated>2018-08-10T14:12:21+00:00</updated>
<author>
<name>Toshiaki Makita</name>
<email>makita.toshiaki@lab.ntt.co.jp</email>
</author>
<published>2018-08-03T07:58:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=af87a3aa1b5f397a2f5c99b97b000943c5177da7'/>
<id>urn:sha1:af87a3aa1b5f397a2f5c99b97b000943c5177da7</id>
<content type='text'>
This allows NIC's XDP to redirect packets to veth. The destination veth
device enqueues redirected packets to the napi ring of its peer, then
they are processed by XDP on its peer veth device.
This can be thought as calling another XDP program by XDP program using
REDIRECT, when the peer enables driver XDP.

Note that when the peer veth device does not set driver xdp, redirected
packets will be dropped because the peer is not ready for NAPI.

v4:
- Don't use xdp_ok_fwd_dev() because checking IFF_UP is not necessary.
  Add comments about it and check only MTU.

v2:
- Drop the part converting xdp_frame into skb when XDP is not enabled.
- Implement bulk interface of ndo_xdp_xmit.
- Implement XDP_XMIT_FLUSH bit and drop ndo_xdp_flush.

Signed-off-by: Toshiaki Makita &lt;makita.toshiaki@lab.ntt.co.jp&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>veth: Handle xdp_frames in xdp napi ring</title>
<updated>2018-08-10T14:12:21+00:00</updated>
<author>
<name>Toshiaki Makita</name>
<email>makita.toshiaki@lab.ntt.co.jp</email>
</author>
<published>2018-08-03T07:58:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9fc8d518d9d590998209f2686e026a488f65d41e'/>
<id>urn:sha1:9fc8d518d9d590998209f2686e026a488f65d41e</id>
<content type='text'>
This is preparation for XDP TX and ndo_xdp_xmit.
This allows napi handler to handle xdp_frames through xdp ring as well
as sk_buff.

v8:
- Don't use xdp_frame pointer address to calculate skb-&gt;head and
  headroom.

v7:
- Use xdp_scrub_frame() instead of memset().

v3:
- Revert v2 change around rings and use a flag to differentiate skb and
  xdp_frame, since bulk skb xmit makes little performance difference
  for now.

v2:
- Use another ring instead of using flag to differentiate skb and
  xdp_frame. This approach makes bulk skb transmit possible in
  veth_xmit later.
- Clear xdp_frame feilds in skb-&gt;head.
- Implement adjust_tail.

Signed-off-by: Toshiaki Makita &lt;makita.toshiaki@lab.ntt.co.jp&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Acked-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>veth: Avoid drops by oversized packets when XDP is enabled</title>
<updated>2018-08-10T14:12:20+00:00</updated>
<author>
<name>Toshiaki Makita</name>
<email>makita.toshiaki@lab.ntt.co.jp</email>
</author>
<published>2018-08-03T07:58:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dc2248220a4aa61560c95aca98d4162095bd7e8a'/>
<id>urn:sha1:dc2248220a4aa61560c95aca98d4162095bd7e8a</id>
<content type='text'>
Oversized packets including GSO packets can be dropped if XDP is
enabled on receiver side, so don't send such packets from peer.

Drop TSO and SCTP fragmentation features so that veth devices themselves
segment packets with XDP enabled. Also cap MTU accordingly.

v4:
- Don't auto-adjust MTU but cap max MTU.

Signed-off-by: Toshiaki Makita &lt;makita.toshiaki@lab.ntt.co.jp&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>veth: Add driver XDP</title>
<updated>2018-08-10T14:12:20+00:00</updated>
<author>
<name>Toshiaki Makita</name>
<email>makita.toshiaki@lab.ntt.co.jp</email>
</author>
<published>2018-08-03T07:58:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=948d4f214fde43743c57aae0c708bff44f6345f2'/>
<id>urn:sha1:948d4f214fde43743c57aae0c708bff44f6345f2</id>
<content type='text'>
This is the basic implementation of veth driver XDP.

Incoming packets are sent from the peer veth device in the form of skb,
so this is generally doing the same thing as generic XDP.

This itself is not so useful, but a starting point to implement other
useful veth XDP features like TX and REDIRECT.

This introduces NAPI when XDP is enabled, because XDP is now heavily
relies on NAPI context. Use ptr_ring to emulate NIC ring. Tx function
enqueues packets to the ring and peer NAPI handler drains the ring.

Currently only one ring is allocated for each veth device, so it does
not scale on multiqueue env. This can be resolved by allocating rings
on the per-queue basis later.

Note that NAPI is not used but netif_rx is used when XDP is not loaded,
so this does not change the default behaviour.

v6:
- Check skb-&gt;len only when allocation is needed.
- Add __GFP_NOWARN to alloc_page() as it can be triggered by external
  events.

v3:
- Fix race on closing the device.
- Add extack messages in ndo_bpf.

v2:
- Squashed with the patch adding NAPI.
- Implement adjust_tail.
- Don't acquire consumer lock because it is guarded by NAPI.
- Make poll_controller noop since it is unnecessary.
- Register rxq_info on enabling XDP rather than on opening the device.

Signed-off-by: Toshiaki Makita &lt;makita.toshiaki@lab.ntt.co.jp&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>veth: set peer GSO values</title>
<updated>2017-12-08T19:22:59+00:00</updated>
<author>
<name>Stephen Hemminger</name>
<email>stephen@networkplumber.org</email>
</author>
<published>2017-12-07T23:40:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=72d24955b44a4039db54a1c252b5031969eeaac3'/>
<id>urn:sha1:72d24955b44a4039db54a1c252b5031969eeaac3</id>
<content type='text'>
When new veth is created, and GSO values have been configured
on one device, clone those values to the peer.

For example:
   # ip link add dev vm1 gso_max_size 65530 type veth peer name vm2

This should create vm1 &lt;--&gt; vm2 with both having GSO maximum
size set to 65530.

Signed-off-by: Stephen Hemminger &lt;sthemmin@microsoft.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2017-06-30T16:43:08+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2017-06-30T16:43:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b07911593719828cac023bdcf6bf4da1c9ba546f'/>
<id>urn:sha1:b07911593719828cac023bdcf6bf4da1c9ba546f</id>
<content type='text'>
A set of overlapping changes in macvlan and the rocker
driver, nothing serious.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
