|
virtio_find_vqs_ctx() is defined but currently never called;
this is the right place to use it.
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We should not call BUG() directly when there is a hdr error; it is
better to print a message when such an error happens. The caller of
xmit_skb() already does so.
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Bug fixes overlapping feature additions and refactoring, mostly.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In virtio-net's large packet mode, there is a hole in the space behind
buf:
    hdr_padded_len - hdr_len
We must take this into account when calculating tailroom.
[ 44.544385] skb_put.cold (net/core/skbuff.c:5254 (discriminator 1) net/core/skbuff.c:5252 (discriminator 1))
[ 44.544864] page_to_skb (drivers/net/virtio_net.c:485)
[ 44.545361] receive_buf (drivers/net/virtio_net.c:849 drivers/net/virtio_net.c:1131)
[ 44.545870] ? netif_receive_skb_list_internal (net/core/dev.c:5714)
[ 44.546628] ? dev_gro_receive (net/core/dev.c:6103)
[ 44.547135] ? napi_complete_done (./include/linux/list.h:35 net/core/dev.c:5867 net/core/dev.c:5862 net/core/dev.c:6565)
[ 44.547672] virtnet_poll (drivers/net/virtio_net.c:1427 drivers/net/virtio_net.c:1525)
[ 44.548251] __napi_poll (net/core/dev.c:6985)
[ 44.548744] net_rx_action (net/core/dev.c:7054 net/core/dev.c:7139)
[ 44.549264] __do_softirq (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:560)
[ 44.549762] irq_exit_rcu (kernel/softirq.c:433 kernel/softirq.c:637 kernel/softirq.c:649)
[ 44.551384] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 13))
[ 44.551991] ? asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638)
[ 44.552654] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638)
Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: Corentin Noël <corentin.noel@collabora.com>
Tested-by: Corentin Noël <corentin.noel@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
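The arithmetic described in this entry can be illustrated with a small
user-space sketch. It is not the driver's code: rx_tailroom() and its
parameters are hypothetical names chosen for illustration, and the only
point is that the hole (hdr_padded_len - hdr_len) is subtracted as well.

```c
#include <assert.h>

/* Hypothetical helper, not taken from virtio_net.c.
 * truesize       - total size of the receive buffer
 * headroom       - bytes reserved in front of the header
 * hdr_len        - length of the header actually used
 * hdr_padded_len - header length including padding
 * len            - bytes of received data accounted to the buffer
 */
long rx_tailroom(long truesize, long headroom, long hdr_len,
                 long hdr_padded_len, long len)
{
    long hole;

    assert(hdr_padded_len >= hdr_len);
    hole = hdr_padded_len - hdr_len;   /* unused gap behind the header */

    return truesize - headroom - len - hole;
}
```

With a 4-byte hole the result is 4 bytes smaller than a calculation that
ignores it, which is the space that would otherwise be overrun.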
|
|
In merge mode, the page passed into page_to_skb() may be a head
page, not the page where the current data is located. So when locating
the buf that holds the data, we should derive it from headroom instead
of offset.
Without this patch the original code still runs: if the page is not the
page of the current data, the calculated tailroom is less than 0, so the
build_skb() path is not taken. The point of this patch is to fix this
logic error so that more cases can use build_skb().
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In merge mode, when xdp is enabled, if the headroom of buf is smaller
than virtnet_get_headroom(), xdp_linearize_page() will be called but the
variable of "headroom" is still 0, which leads to wrong logic after
entering page_to_skb().
[ 16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8
[ 16.602175] #PF: supervisor read access in kernel mode
[ 16.603350] #PF: error_code(0x0000) - not-present page
[ 16.604200] PGD 0 P4D 0
[ 16.604686] Oops: 0000 [#1] SMP PTI
[ 16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G B 5.12.0+ #312
[ 16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04
[ 16.608217] RIP: 0010:unmap_page_range+0x947/0xde0
[ 16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
[ 16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
[ 16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
[ 16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
[ 16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
[ 16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
[ 16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
[ 16.618423] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[ 16.619738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
[ 16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 16.624047] Call Trace:
[ 16.624525] ? release_pages+0x24d/0x730
[ 16.625209] unmap_single_vma+0xa9/0x130
[ 16.625885] unmap_vmas+0x76/0xf0
[ 16.626480] exit_mmap+0xa0/0x210
[ 16.627129] mmput+0x67/0x180
[ 16.627673] do_exit+0x3d1/0xf10
[ 16.628259] ? do_user_addr_fault+0x231/0x840
[ 16.629000] do_group_exit+0x53/0xd0
[ 16.629631] __x64_sys_exit_group+0x1d/0x20
[ 16.630354] do_syscall_64+0x3c/0x80
[ 16.630988] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 16.631828] RIP: 0033:0x7f1a043d0191
[ 16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167.
[ 16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191
[ 16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
[ 16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001
[ 16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490
[ 16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000
[ 16.640408] Modules linked in:
[ 16.640958] CR2: ffffecbfff7b43c8
[ 16.641557] ---[ end trace bc4891c6ce46354c ]---
[ 16.642335] RIP: 0010:unmap_page_range+0x947/0xde0
[ 16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
[ 16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
[ 16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
[ 16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
[ 16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
[ 16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
[ 16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
[ 16.652529] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[ 16.653887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
[ 16.655992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 16.658290] Kernel panic - not syncing: Fatal exception
[ 16.659613] Kernel Offset: disabled
[ 16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]---
Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds validation for used length (might come
from an untrusted device) to avoid data corruption
or loss.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210531135852.113-1-xieyongji@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
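A minimal sketch of the kind of check this entry describes, assuming the
driver knows how large a buffer it posted. used_len_is_valid() and its
parameters are hypothetical names, not the driver's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: accept a device-reported used length only if it
 * fits in the buffer that was actually posted to the device.
 */
bool used_len_is_valid(size_t used_len, size_t posted_buf_len)
{
    assert(posted_buf_len > 0);
    return used_len <= posted_buf_len;
}
```

A caller would drop the buffer and count an error when the check fails,
rather than trusting the length and reading or writing past the buffer.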
|
|
In merge mode, the page passed into page_to_skb() may be a head
page, not the page where the current data is located. So when locating
the buf that holds the data, we should use the pointer (p) directly to
get the address corresponding to the page.
Likewise, the offset of the data within the page should be obtained
using offset_in_page().
Without this patch the original code still runs: if the page is not the
page of the current data, the calculated tailroom is less than 0, so the
build_skb() path is not taken. The point of this patch is to fix this
logic error so that more cases can use build_skb().
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
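To illustrate what deriving the page address and in-page offset from the
data pointer means, here is a user-space sketch. The 4096-byte page size
and the name sketch_split_addr() are assumptions made for this example;
the kernel uses its own PAGE_SIZE and offset_in_page().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size, power of two */

/* Hypothetical helper: split an address into the start of the page that
 * contains it and the offset within that page.
 */
void sketch_split_addr(uintptr_t addr, uintptr_t *page_start,
                       unsigned int *offset)
{
    assert(page_start != NULL && offset != NULL);
    *offset = (unsigned int)(addr & (SKETCH_PAGE_SIZE - 1));
    *page_start = addr - *offset;
}
```

Because both values come from the data pointer itself, they stay correct
even when the page handed to the function is a head page and not the
page that holds the data.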
|
|
In merge mode, when xdp is enabled, if the headroom of buf is smaller
than virtnet_get_headroom(), xdp_linearize_page() will be called but the
variable of "headroom" is still 0, which leads to wrong logic after
entering page_to_skb().
[ 16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8
[ 16.602175] #PF: supervisor read access in kernel mode
[ 16.603350] #PF: error_code(0x0000) - not-present page
[ 16.604200] PGD 0 P4D 0
[ 16.604686] Oops: 0000 [#1] SMP PTI
[ 16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G B 5.12.0+ #312
[ 16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04
[ 16.608217] RIP: 0010:unmap_page_range+0x947/0xde0
[ 16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
[ 16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
[ 16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
[ 16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
[ 16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
[ 16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
[ 16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
[ 16.618423] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[ 16.619738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
[ 16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 16.624047] Call Trace:
[ 16.624525] ? release_pages+0x24d/0x730
[ 16.625209] unmap_single_vma+0xa9/0x130
[ 16.625885] unmap_vmas+0x76/0xf0
[ 16.626480] exit_mmap+0xa0/0x210
[ 16.627129] mmput+0x67/0x180
[ 16.627673] do_exit+0x3d1/0xf10
[ 16.628259] ? do_user_addr_fault+0x231/0x840
[ 16.629000] do_group_exit+0x53/0xd0
[ 16.629631] __x64_sys_exit_group+0x1d/0x20
[ 16.630354] do_syscall_64+0x3c/0x80
[ 16.630988] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 16.631828] RIP: 0033:0x7f1a043d0191
[ 16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167.
[ 16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191
[ 16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
[ 16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001
[ 16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490
[ 16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000
[ 16.640408] Modules linked in:
[ 16.640958] CR2: ffffecbfff7b43c8
[ 16.641557] ---[ end trace bc4891c6ce46354c ]---
[ 16.642335] RIP: 0010:unmap_page_range+0x947/0xde0
[ 16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
[ 16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
[ 16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
[ 16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
[ 16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
[ 16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
[ 16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
[ 16.652529] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[ 16.653887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
[ 16.655992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 16.658290] Kernel panic - not syncing: Fatal exception
[ 16.659613] Kernel Offset: disabled
[ 16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]---
Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull virtio updates from Michael Tsirkin:
"A bunch of new drivers including vdpa support for block and
virtio-vdpa.
Beginning of vq kick (aka doorbell) mapping support.
Misc fixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (40 commits)
virtio_pci_modern: correct sparse tags for notify
virtio_pci_modern: __force cast the notify mapping
vDPA/ifcvf: get_config_size should return dev specific config size
vDPA/ifcvf: enable Intel C5000X-PL virtio-block for vDPA
vDPA/ifcvf: deduce VIRTIO device ID when probe
vdpa_sim_blk: add support for vdpa management tool
vdpa_sim_blk: handle VIRTIO_BLK_T_GET_ID
vdpa_sim_blk: implement ramdisk behaviour
vdpa: add vdpa simulator for block device
vhost/vdpa: Remove the restriction that only supports virtio-net devices
vhost/vdpa: use get_config_size callback in vhost_vdpa_config_validate()
vdpa: add get_config_size callback in vdpa_config_ops
vdpa_sim: cleanup kiovs in vdpasim_free()
vringh: add vringh_kiov_length() helper
vringh: implement vringh_kiov_advance()
vringh: explain more about cleaning riov and wiov
vringh: reset kiov 'consumed' field in __vringh_iov()
vringh: add 'iotlb_lock' to synchronize iotlb accesses
vdpa_sim: use iova module to allocate IOVA addresses
vDPA/ifcvf: deduce VIRTIO device ID from pdev ids
...
|
|
Not all virtio_net devices support the ctrl queue feature. Thus, there
is no need to allocate unused resources.
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20210502093319.61313-1-mgurtovoy@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
When "headroom" > 0, the actual allocated memory space is the entire
page, so the address of the page should be used when passing it to
build_skb().
BUG: KASAN: use-after-free in skb_gro_receive (net/core/skbuff.c:4260)
Write of size 16 at addr ffff88811619fffc by task kworker/u9:0/534
CPU: 2 PID: 534 Comm: kworker/u9:0 Not tainted 5.12.0-rc7-custom-16372-gb150be05b806 #3382
Hardware name: QEMU MSN2700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: xprtiod xs_stream_data_receive_workfn [sunrpc]
Call Trace:
<IRQ>
dump_stack (lib/dump_stack.c:122)
print_address_description.constprop.0 (mm/kasan/report.c:233)
kasan_report.cold (mm/kasan/report.c:400 mm/kasan/report.c:416)
skb_gro_receive (net/core/skbuff.c:4260)
tcp_gro_receive (net/ipv4/tcp_offload.c:266 (discriminator 1))
tcp4_gro_receive (net/ipv4/tcp_offload.c:316)
inet_gro_receive (net/ipv4/af_inet.c:1545 (discriminator 2))
dev_gro_receive (net/core/dev.c:6075)
napi_gro_receive (net/core/dev.c:6168 net/core/dev.c:6198)
receive_buf (drivers/net/virtio_net.c:1151) virtio_net
virtnet_poll (drivers/net/virtio_net.c:1415 drivers/net/virtio_net.c:1519) virtio_net
__napi_poll (net/core/dev.c:6964)
net_rx_action (net/core/dev.c:7033 net/core/dev.c:7118)
__do_softirq (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:346)
irq_exit_rcu (kernel/softirq.c:221 kernel/softirq.c:422 kernel/softirq.c:434)
common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14))
</IRQ>
Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KASAN/syzbot had 4 reports, one of them being:
BUG: KASAN: slab-out-of-bounds in memcpy include/linux/fortify-string.h:191 [inline]
BUG: KASAN: slab-out-of-bounds in page_to_skb+0x5cf/0xb70 drivers/net/virtio_net.c:480
Read of size 12 at addr ffff888014a5f800 by task systemd-udevd/8445
CPU: 0 PID: 8445 Comm: systemd-udevd Not tainted 5.12.0-rc8-next-20210419-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
__kasan_report mm/kasan/report.c:419 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
check_region_inline mm/kasan/generic.c:180 [inline]
kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186
memcpy+0x20/0x60 mm/kasan/shadow.c:65
memcpy include/linux/fortify-string.h:191 [inline]
page_to_skb+0x5cf/0xb70 drivers/net/virtio_net.c:480
receive_mergeable drivers/net/virtio_net.c:1009 [inline]
receive_buf+0x2bc0/0x6250 drivers/net/virtio_net.c:1119
virtnet_receive drivers/net/virtio_net.c:1411 [inline]
virtnet_poll+0x568/0x10b0 drivers/net/virtio_net.c:1516
__napi_poll+0xaf/0x440 net/core/dev.c:6962
napi_poll net/core/dev.c:7029 [inline]
net_rx_action+0x801/0xb40 net/core/dev.c:7116
__do_softirq+0x29b/0x9fe kernel/softirq.c:559
invoke_softirq kernel/softirq.c:433 [inline]
__irq_exit_rcu+0x136/0x200 kernel/softirq.c:637
irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
common_interrupt+0xa4/0xd0 arch/x86/kernel/irq.c:240
Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
build_skb() is supposed to be followed by
skb_reserve(skb, NET_IP_ALIGN), so that IP headers are word-aligned.
(Best practice is to reserve NET_IP_ALIGN+NET_SKB_PAD, but the NET_SKB_PAD
part is only a performance optimization if tunnel encaps are added.)
Unfortunately virtio_net has not provisioned this reserve.
We can only use build_skb() for arches where NET_IP_ALIGN == 0.
We might refine this later, with enough testing.
Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: David S. Miller <davem@davemloft.net>
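The alignment argument can be checked with simple arithmetic. The sketch
below assumes a 4-byte-aligned buffer and a 14-byte Ethernet header;
ip_header_misalignment() is a hypothetical name used only here.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN 14u   /* Ethernet header length in bytes */

/* Hypothetical helper: offset of the IP header modulo 4 when the frame
 * starts `reserve` bytes into a 4-byte-aligned buffer. 0 means the IP
 * header is word-aligned.
 */
unsigned int ip_header_misalignment(uintptr_t buf_addr, unsigned int reserve)
{
    assert(buf_addr % 4u == 0);
    return (unsigned int)((buf_addr + reserve + SKETCH_ETH_HLEN) % 4u);
}
```

With no reserve the IP header lands 2 bytes off a word boundary; with a
reserve of 2 (the usual non-zero NET_IP_ALIGN value) it is aligned. On
arches where NET_IP_ALIGN is 0 the misalignment is tolerated, which is
why build_skb() without the reserve is acceptable only there.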
|
|
In page_to_skb(), if we have enough tailroom to hold skb_shared_info, we
can use build_skb() to create the skb directly, with no need to allocate
additional space. This also saves a 'frags slot', which is very friendly
to GRO.
If the payload of the received packet is too small (less than
GOOD_COPY_LEN), we still copy it directly into the space obtained from
napi_alloc_skb(), so that these pages can be reused.
Testing Machine:
The four queues of the network card are bound to cpu1.
Test command:
for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& done
The size of each UDP packet is 1000, so with this patch there is
always enough tailroom to use build_skb(). The sent UDP packets
are discarded because no port is listening for them. With the machine's
softirq CPU usage at 100%, we observe the receive rate reported by
sar -n DEV 1:
no build_skb: 956864.00 rxpck/s
build_skb: 1158465.00 rxpck/s
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
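The decision described in this entry can be sketched as a small
user-space function. The enum, the function name and the exact boundary
handling are assumptions for illustration; the value 128 for
GOOD_COPY_LEN is the one quoted elsewhere in this log, and shinfo_size
stands in for the space skb_shared_info needs.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_GOOD_COPY_LEN 128   /* GOOD_COPY_LEN as quoted in this log */

/* Hypothetical names for the three ways a received buffer may be handled. */
enum rx_strategy {
    RX_COPY_SMALL,       /* copy the small payload, reuse the page */
    RX_BUILD_IN_PLACE,   /* build the skb around the buffer itself */
    RX_ATTACH_AS_FRAG    /* allocate an skb and attach the page as a frag */
};

enum rx_strategy choose_rx_strategy(size_t payload_len, size_t tailroom,
                                    size_t shinfo_size)
{
    assert(shinfo_size > 0);

    if (payload_len <= SKETCH_GOOD_COPY_LEN)
        return RX_COPY_SMALL;
    if (tailroom >= shinfo_size)
        return RX_BUILD_IN_PLACE;
    return RX_ATTACH_AS_FRAG;
}
```

For the 1000-byte UDP packets in the test above, the payload exceeds the
copy threshold and the buffer has ample tailroom, so the in-place path
is taken every time.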
|
|
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Xuan Zhuo reported that commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs") brought a ~10% performance drop.
The reason for the performance drop was that GRO was forced
to chain sk_buff (using skb_shinfo(skb)->frag_list), which not only
uses more memory but also causes packet consumers to incur
a lot of overhead handling all the tiny skbs.
It turns out that virtio_net page_to_skb() has a wrong strategy:
it allocates skbs with GOOD_COPY_LEN (128) bytes in skb->head, then
copies 128 bytes from the page, before feeding the packet to the GRO stack.
This was suboptimal before commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs") because GRO was using 2 frags per MSS,
meaning we were not packing MSS with 100% efficiency.
The fix is to pull only the ethernet header in page_to_skb().
Then, we change virtio_net_hdr_to_skb() to pull the missing
headers, instead of assuming they were already pulled by callers.
This fixes the performance regression, but could also allow virtio_net
to accept packets with more than 128 bytes of headers.
Many thanks to Xuan Zhuo for his report, and his tests/help.
Fixes: 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs")
Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://www.spinics.net/lists/netdev/msg731397.html
Co-Developed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
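A rough sketch of the "pull only the ethernet header" idea follows. The
rule that a frame small enough to fit in the skb head is copied whole is
an assumption of this sketch, as are the function and parameter names;
only the Ethernet-header-only copy for larger frames comes from the text
above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ETH_HLEN 14   /* Ethernet header length in bytes */

/* Hypothetical helper: how many bytes to copy from the page into the
 * skb head. Larger frames get only the Ethernet header; later stages
 * pull any further headers they need.
 */
size_t sketch_head_copy_len(size_t frame_len, size_t head_room)
{
    assert(head_room >= SKETCH_ETH_HLEN);

    if (frame_len <= head_room)
        return frame_len;        /* assumed: tiny frame fits entirely */
    return SKETCH_ETH_HLEN;      /* otherwise only the Ethernet header */
}
```

Leaving the rest of the frame in the page lets GRO use one frag per MSS
and avoids chaining many tiny skbs.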
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-03-24
The following pull-request contains BPF updates for your *net-next* tree.
We've added 37 non-merge commits during the last 15 day(s) which contain
a total of 65 files changed, 3200 insertions(+), 738 deletions(-).
The main changes are:
1) Static linking of multiple BPF ELF files, from Andrii.
2) Move drop error path to devmap for XDP_REDIRECT, from Lorenzo.
3) Spelling fixes from various folks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the xps maps (xps_cpus_map and xps_rxqs_map) to an array in
net_device. This simplifies the code considerably, removing the need
for many if/else conditionals, as the correct map is available using
its offset in the array.
This should not modify the xps maps behaviour in any way.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
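The refactoring can be pictured with a simplified user-space sketch. All
type and field names below carry a sketch_ prefix because they are
stand-ins, not the kernel's definitions; the point is only that an enum
index replaces per-map if/else branches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical map kinds; the last entry gives the array size. */
enum sketch_xps_type {
    SKETCH_XPS_CPUS = 0,
    SKETCH_XPS_RXQS,
    SKETCH_XPS_MAPS_MAX
};

struct sketch_xps_map {
    int nr_ids;
};

struct sketch_net_device {
    /* one slot per map kind instead of two separately named pointers */
    struct sketch_xps_map *xps_maps[SKETCH_XPS_MAPS_MAX];
};

struct sketch_xps_map *sketch_get_xps_map(struct sketch_net_device *dev,
                                          enum sketch_xps_type type)
{
    assert(dev != NULL);
    assert(type < SKETCH_XPS_MAPS_MAX);
    return dev->xps_maps[type];   /* no branch on the map kind */
}
```

Code that previously needed "if CPUs map ... else rxqs map ..." can pass
the type through and index the array.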
|
|
We want to change the current ndo_xdp_xmit drop semantics because doing
so will allow us to implement better queue overflow handling. This works
towards the larger goal of an XDP TX queue-hook. Move XDP_REDIRECT error
path handling from each XDP ethernet driver to the devmap code. With
the new API, the driver implementing ndo_xdp_xmit breaks out of its tx
loop whenever the hw reports a tx error and simply returns to the devmap
caller the number of successfully transmitted frames. It is then
devmap's responsibility to free the dropped frames.
Move each XDP ndo_xdp_xmit capable driver to the new APIs:
- veth
- virtio-net
- mvneta
- mvpp2
- socionext
- amazon ena
- bnxt
- freescale (dpaa2, dpaa)
- xen-frontend
- qede
- ice
- igb
- ixgbe
- i40e
- mlx5
- ti (cpsw, cpsw-new)
- tun
- sfc
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/bpf/ed670de24f951cfd77590decf0229a0ad7fd12f6.1615201152.git.lorenzo@kernel.org
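The split of responsibilities described in this entry can be modelled in
user space as follows. The frame structure and both function names are
hypothetical; the sketch only shows the contract: the driver stops at
the first error and reports how many frames it sent, and the caller
frees the rest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame: whether the hw would accept it, and whether the
 * caller has freed it.
 */
struct sketch_frame {
    bool hw_accepts;
    bool freed;
};

/* Driver side: transmit in order, break on the first hw error, and
 * return the number of frames actually sent. Nothing is freed here.
 */
int sketch_driver_xdp_xmit(struct sketch_frame *frames, int n)
{
    int sent = 0;

    assert(frames != NULL || n == 0);
    while (sent < n && frames[sent].hw_accepts)
        sent++;
    return sent;
}

/* Caller side (devmap in the text): free every frame the driver did
 * not send and return how many were dropped.
 */
int sketch_devmap_flush(struct sketch_frame *frames, int n)
{
    int sent = sketch_driver_xdp_xmit(frames, n);

    for (int i = sent; i < n; i++)
        frames[i].freed = true;
    return n - sent;
}
```

Keeping the free in one place means each driver no longer carries its
own drop handling for this path.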
|
|
Update the code to replace instances of snprintf and a pointer update with
just calling ethtool_sprintf.
Also replace the char pointer with a u8 pointer to avoid having to recast
the pointer type.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
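The pattern being replaced, formatting a name into a fixed-size slot and
then advancing the pointer by hand, can be folded into one helper. The
function below is a user-space stand-in written for illustration, not
the kernel's ethtool_sprintf(); the 32-byte slot size mirrors
ETH_GSTRING_LEN.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_GSTRING_LEN 32   /* fixed slot size per string */

/* Hypothetical stand-in: format into the current slot, then advance the
 * caller's cursor by exactly one slot.
 */
void sketch_ethtool_sprintf(uint8_t **data, const char *fmt, ...)
{
    va_list args;

    assert(data != NULL && *data != NULL);
    va_start(args, fmt);
    vsnprintf((char *)*data, SKETCH_GSTRING_LEN, fmt, args);
    va_end(args);

    *data += SKETCH_GSTRING_LEN;
}
```

Taking a u8 pointer-to-pointer means callers that already hold a u8
buffer need no cast, which matches the second change in this entry.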
|
|
The number of queues implemented by many virtio backends is limited,
especially on machines with a large number of CPUs. In this case, it
is often impossible to allocate a separate queue for
XDP_TX/XDP_REDIRECT, so XDP cannot be loaded at all, even if the program
does not use XDP_TX/XDP_REDIRECT.
This patch allows XDP_TX/XDP_REDIRECT to run by reusing the existing SQ
with __netif_tx_lock() held when there are not enough queues.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
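One way to picture the queue choice is the sketch below. The layout
(dedicated XDP queues placed after the regular ones when there are
enough, a shared queue chosen by CPU modulo otherwise) and every name in
it are assumptions made for illustration, not a description of the
driver's exact code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result: which send queue to use and whether the tx lock
 * must be held because the queue is shared with the regular stack.
 */
struct sketch_sq_choice {
    unsigned int index;
    bool needs_tx_lock;
};

struct sketch_sq_choice sketch_pick_xdp_sq(unsigned int cpu,
                                           unsigned int nr_cpus,
                                           unsigned int queue_pairs)
{
    struct sketch_sq_choice c;

    assert(queue_pairs > 0 && cpu < nr_cpus);

    if (queue_pairs > nr_cpus) {
        /* enough queues: one dedicated XDP queue per CPU */
        c.index = queue_pairs - nr_cpus + cpu;
        c.needs_tx_lock = false;
    } else {
        /* not enough queues: share an existing SQ under the lock */
        c.index = cpu % queue_pairs;
        c.needs_tx_lock = true;
    }
    return c;
}
```

The shared case trades some contention for the ability to load XDP at
all on backends with few queues.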
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-03-09
The following pull-request contains BPF updates for your *net-next* tree.
We've added 90 non-merge commits during the last 17 day(s) which contain
a total of 114 files changed, 5158 insertions(+), 1288 deletions(-).
The main changes are:
1) Faster bpf_redirect_map(), from Björn.
2) skmsg cleanup, from Cong.
3) Support for floating point types in BTF, from Ilya.
4) Documentation for sys_bpf commands, from Joe.
5) Support for sk_lookup in bpf_prog_test_run, from Lorenz.
6) Enable task local storage for tracing programs, from Song.
7) bpf_for_each_map_elem() helper, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull virtio updates from Michael Tsirkin:
- new vdpa features to allow creation and deletion of new devices
- virtio-blk support per-device queue depth
- fixes, cleanups all over the place
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (31 commits)
virtio-input: add multi-touch support
virtio_mmio: fix one typo
vdpa/mlx5: fix param validation in mlx5_vdpa_get_config()
virtio_net: Fix fall-through warnings for Clang
virtio_input: Prevent EV_MSC/MSC_TIMESTAMP loop storm for MT.
virtio-blk: support per-device queue depth
virtio_vdpa: don't warn when fail to disable vq
virtio-pci: introduce modern device module
virito-pci-modern: rename map_capability() to vp_modern_map_capability()
virtio-pci-modern: introduce helper to get notification offset
virtio-pci-modern: introduce helper for getting queue nums
virtio-pci-modern: introduce helper for setting/geting queue size
virtio-pci-modern: introduce helper to set/get queue_enable
virtio-pci-modern: introduce vp_modern_queue_address()
virtio-pci-modern: introduce vp_modern_set_queue_vector()
virtio-pci-modern: introduce vp_modern_generation()
virtio-pci-modern: introduce helpers for setting and getting features
virtio-pci-modern: introduce helpers for setting and getting status
virtio-pci-modern: introduce helper to set config vector
virtio-pci-modern: introduce vp_modern_remove()
...
|
|
Virtio net supports the case where the skb linear space is empty, so
add priv_flags.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210218204908.5455-4-alobakin@pm.me
|
|
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a goto statement instead of letting the code fall
through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/cb9b9534572bc476f4fb7b49a73dc8646b780c84.1605896060.git.gustavoars@kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Introduce xdp_prepare_buff utility routine to initialize per-descriptor
xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in
all XDP capable drivers.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Introduce xdp_init_buff utility routine to initialize xdp_buff fields
that are constant across NAPI iterations (e.g. frame_sz or rxq pointer).
Rely on xdp_init_buff in all XDP capable drivers.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/bpf/7f8329b6da1434dc2b05a77f2e800b29628a8913.1608670965.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
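The division of work between xdp_init_buff and xdp_prepare_buff can be
sketched in user space as below. The structure is a simplified stand-in
with a sketch_ prefix, and marking absent metadata by pointing one byte
past the data start is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the buffer descriptor. */
struct sketch_xdp_buff {
    unsigned char *data_hard_start;
    unsigned char *data;
    unsigned char *data_end;
    unsigned char *data_meta;
    unsigned int frame_sz;
    void *rxq;
};

/* Fields that stay the same for a whole NAPI poll: set once. */
void sketch_xdp_init_buff(struct sketch_xdp_buff *xdp,
                          unsigned int frame_sz, void *rxq)
{
    assert(xdp != NULL);
    xdp->frame_sz = frame_sz;
    xdp->rxq = rxq;
}

/* Fields that change with every descriptor: set per packet. */
void sketch_xdp_prepare_buff(struct sketch_xdp_buff *xdp,
                             unsigned char *hard_start, int headroom,
                             int data_len, bool meta_valid)
{
    unsigned char *data;

    assert(xdp != NULL && hard_start != NULL);
    data = hard_start + headroom;

    xdp->data_hard_start = hard_start;
    xdp->data = data;
    xdp->data_end = data + data_len;
    /* assumed convention: data + 1 marks "no metadata" */
    xdp->data_meta = meta_valid ? data : data + 1;
}
```

A receive loop would call the init helper once before the loop and the
prepare helper for each descriptor inside it.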
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from netfilter, wireless and bpf
trees.
Current release - regressions:
- mt76: fix NULL pointer dereference in mt76u_status_worker and
mt76s_process_tx_queue
- net: ipa: fix interconnect enable bug
Current release - always broken:
- netfilter: fixes possible oops in mtype_resize in ipset
- ath11k: fix a number of coding issues found by static analysis tools
and spurious error messages
Previous releases - regressions:
- e1000e: re-enable s0ix power saving flows for systems with the
Intel i219-LM Ethernet controllers to fix power use regression
- virtio_net: fix recursive call to cpus_read_lock() to avoid a
deadlock
- ipv4: ignore ECN bits for fib lookups in fib_compute_spec_dst()
- sysfs: take the rtnl lock around XPS configuration
- xsk: fix memory leak for failed bind and rollback reservation at
NETDEV_TX_BUSY
- r8169: work around power-saving bug on some chip versions
Previous releases - always broken:
- dcb: validate netlink message in DCB handler
- tun: fix return value when the number of iovs exceeds MAX_SKB_FRAGS
to prevent unnecessary retries
- vhost_net: fix ubuf refcount when sendmsg fails
- bpf: save correct stopping point in file seq iteration
- ncsi: use real net-device for response handler
- neighbor: fix div by zero caused by a data race (TOCTOU)
- bareudp: fix use of incorrect min_headroom size and a false
positive lockdep splat from the TX lock
- mvpp2:
- clear force link UP during port init procedure in case
bootloader had set it
- add TCAM entry to drop flow control pause frames
- fix PPPoE with ipv6 packet parsing
- fix GoP Networking Complex Control config of port 3
- fix pkt coalescing IRQ-threshold configuration
- xsk: fix race in SKB mode transmit with shared cq
- ionic: account for vlan tag len in rx buffer len
- stmmac: ignore the second clock input, current clock framework does
not handle exclusive clock use well, other drivers may reconfigure
the second clock
Misc:
- ppp: change PPPIOCUNBRIDGECHAN ioctl request number to follow
existing scheme"
* tag 'net-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
net: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access
net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs
net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event
r8169: work around power-saving bug on some chip versions
net: usb: qmi_wwan: add Quectel EM160R-GL
selftests: mlxsw: Set headroom size of correct port
net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag
ibmvnic: fix: NULL pointer dereference.
docs: networking: packet_mmap: fix old config reference
docs: networking: packet_mmap: fix formatting for C macros
vhost_net: fix ubuf refcount incorrectly when sendmsg fails
bareudp: Fix use of incorrect min_headroom size
bareudp: set NETIF_F_LLTX flag
net: hdlc_ppp: Fix issues when mod_timer is called while timer is running
atlantic: remove architecture depends
erspan: fix version 1 check in gre_parse_header()
net: hns: fix return value check in __lb_other_process()
net: sched: prevent invalid Scell_log shift count
net: neighbor: fix a crash caused by mod zero
ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
...
|
|
Pull virtio updates from Michael Tsirkin:
- vdpa sim refactoring
- virtio mem: Big Block Mode support
- misc cleanups, fixes
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (61 commits)
vdpa: Use simpler version of ida allocation
vdpa: Add missing comment for virtqueue count
uapi: virtio_ids: add missing device type IDs from OASIS spec
uapi: virtio_ids.h: consistent indentions
vhost scsi: fix error return code in vhost_scsi_set_endpoint()
virtio_ring: Fix two use after free bugs
virtio_net: Fix error code in probe()
virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed()
tools/virtio: add barrier for aarch64
tools/virtio: add krealloc_array
tools/virtio: include asm/bug.h
vdpa/mlx5: Use write memory barrier after updating CQ index
vdpa: split vdpasim to core and net modules
vdpa_sim: split vdpasim_virtqueue's iov field in out_iov and in_iov
vdpa_sim: make vdpasim->buffer size configurable
vdpa_sim: use kvmalloc to allocate vdpasim->buffer
vdpa_sim: set vringh notify callback
vdpa_sim: add set_config callback in vdpasim_dev_attr
vdpa_sim: add get_config callback in vdpasim_dev_attr
vdpa_sim: make 'config' generic and usable for any device type
...
|
|
virtnet_set_channels can recursively call cpus_read_lock if CONFIG_XPS
and CONFIG_HOTPLUG are enabled.
The path is:
virtnet_set_channels - calls get_online_cpus(), which is a trivial
wrapper around cpus_read_lock()
netif_set_real_num_tx_queues
netif_reset_xps_queues_gt
netif_reset_xps_queues - calls cpus_read_lock()
This call chain and the potential deadlock happen when the number of TX
queues is reduced.
This commit removes the netif_set_real_num_[tr]x_queues calls from
inside the get/put_online_cpus section, as they don't require that it
be held.
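As an illustration only, the nesting can be modelled in userspace C. The sketch below assumes a toy lock that merely counts read-side nesting; every toy_* name is hypothetical and none of this is driver code. In the kernel the nested acquisition matters because a writer queued between the two read acquisitions would block the inner reader while itself waiting for the outer one.

```c
/* Toy stand-in for the CPU hotplug lock: it only counts nested read holds. */
struct toy_hotplug_lock {
	int depth;      /* current nesting of read-side holds */
	int max_depth;  /* deepest nesting observed so far */
};

static void toy_cpus_read_lock(struct toy_hotplug_lock *l)
{
	l->depth++;
	if (l->depth > l->max_depth)
		l->max_depth = l->depth;
}

static void toy_cpus_read_unlock(struct toy_hotplug_lock *l)
{
	l->depth--;
}

/* Stands in for netif_set_real_num_tx_queues(), which reaches
 * cpus_read_lock() through the XPS reset path when queues shrink. */
static void toy_set_real_num_tx_queues(struct toy_hotplug_lock *l)
{
	toy_cpus_read_lock(l);
	toy_cpus_read_unlock(l);
}

/* Old ordering: the queue count is updated while the lock is held. */
static int toy_set_channels_old(struct toy_hotplug_lock *l)
{
	toy_cpus_read_lock(l);
	toy_set_real_num_tx_queues(l);  /* nested read acquisition */
	toy_cpus_read_unlock(l);
	return l->max_depth;
}

/* New ordering: the queue count is updated outside the locked section. */
static int toy_set_channels_new(struct toy_hotplug_lock *l)
{
	toy_cpus_read_lock(l);
	/* affinity work that genuinely needs the lock would go here */
	toy_cpus_read_unlock(l);
	toy_set_real_num_tx_queues(l);  /* no lock held at this point */
	return l->max_depth;
}
```

In this model the old ordering reaches a nesting depth of 2 and the new ordering never exceeds 1.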
Fixes: 47be24796c13 ("virtio-net: fix the set affinity bug when CPU IDs are not consecutive")
Signed-off-by: Jeff Dike <jdike@akamai.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20201223025421.671-1-jdike@akamai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Set a negative error code instead of returning success if the MTU has
been changed to something invalid.
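The pattern can be sketched as follows. This is a simplified model under stated assumptions, not the probe function itself: toy_probe() and TOY_MIN_MTU are hypothetical names, and the bound of 68 is chosen only for illustration.

```c
#include <errno.h>

#define TOY_MIN_MTU 68  /* hypothetical lower bound, for illustration */

/* Models a probe routine whose earlier steps succeeded, leaving err == 0.
 * Jumping to the cleanup label without assigning err would report success. */
static int toy_probe(int device_mtu)
{
	int err = 0;  /* earlier setup steps succeeded */

	if (device_mtu < TOY_MIN_MTU) {
		err = -EINVAL;  /* the assignment the fix adds */
		goto out;
	}
	/* remaining setup would go here */
out:
	return err;
}
```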
Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range")
Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de>
Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/X8pGVJSeeCdII1Ys@mwanda
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
Add napi_id to the xdp_rxq_info structure, and make sure the XDP
socket picks up the napi_id in the Rx path. The napi_id is used to find
the corresponding NAPI structure for socket busy polling.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com
|
|
This reverts commit 3618ad2a7c0e78e4258386394d5d5f92a3dbccf8.
When control vq is not negotiated, that commit causes a crash:
[ 72.229171] kernel BUG at drivers/net/virtio_net.c:1667!
[ 72.230266] invalid opcode: 0000 [#1] PREEMPT SMP
[ 72.231172] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc8-02934-g3618ad2a7c0e7 #1
[ 72.231172] EIP: virtnet_send_command+0x120/0x140
[ 72.231172] Code: 00 0f 94 c0 8b 7d f0 65 33 3d 14 00 00 00 75 1c 8d 65 f4 5b 5e 5f 5d c3 66 90 be 01 00 00 00 e9 6e ff ff ff 8d b6 00
+00 00 00 <0f> 0b e8 d9 bb 82 00 eb 17 8d b4 26 00 00 00 00 8d b4 26 00 00 00
[ 72.231172] EAX: 0000000d EBX: f72895c0 ECX: 00000017 EDX: 00000011
[ 72.231172] ESI: f7197800 EDI: ed69bd00 EBP: ed69bcf4 ESP: ed69bc98
[ 72.231172] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
[ 72.231172] CR0: 80050033 CR2: 00000000 CR3: 02c84000 CR4: 000406f0
[ 72.231172] Call Trace:
[ 72.231172] ? __virt_addr_valid+0x45/0x60
[ 72.231172] ? ___cache_free+0x51f/0x760
[ 72.231172] ? kobject_uevent_env+0xf4/0x560
[ 72.231172] virtnet_set_guest_offloads+0x4d/0x80
[ 72.231172] virtnet_set_features+0x85/0x120
[ 72.231172] ? virtnet_set_guest_offloads+0x80/0x80
[ 72.231172] __netdev_update_features+0x27a/0x8e0
[ 72.231172] ? kobject_uevent+0xa/0x20
[ 72.231172] ? netdev_register_kobject+0x12c/0x160
[ 72.231172] register_netdevice+0x4fe/0x740
[ 72.231172] register_netdev+0x1c/0x40
[ 72.231172] virtnet_probe+0x728/0xb60
[ 72.231172] ? _raw_spin_unlock+0x1d/0x40
[ 72.231172] ? virtio_vdpa_get_status+0x1c/0x20
[ 72.231172] virtio_dev_probe+0x1c6/0x271
[ 72.231172] really_probe+0x195/0x2e0
[ 72.231172] driver_probe_device+0x26/0x60
[ 72.231172] device_driver_attach+0x49/0x60
[ 72.231172] __driver_attach+0x46/0xc0
[ 72.231172] ? device_driver_attach+0x60/0x60
[ 72.231172] bus_add_driver+0x197/0x1c0
[ 72.231172] driver_register+0x66/0xc0
[ 72.231172] register_virtio_driver+0x1b/0x40
[ 72.231172] virtio_net_driver_init+0x61/0x86
[ 72.231172] ? veth_init+0x14/0x14
[ 72.231172] do_one_initcall+0x76/0x2e4
[ 72.231172] ? rdinit_setup+0x2a/0x2a
[ 72.231172] do_initcalls+0xb2/0xd5
[ 72.231172] kernel_init_freeable+0x14f/0x179
[ 72.231172] ? rest_init+0x100/0x100
[ 72.231172] kernel_init+0xd/0xe0
[ 72.231172] ret_from_fork+0x1c/0x30
[ 72.231172] Modules linked in:
[ 72.269563] ---[ end trace a6ebc4afea0e6cb1 ]---
The reason is that virtnet_set_features now calls virtnet_set_guest_offloads
unconditionally, whereas it used to call it only when there was something
to configure.
If the device does not have a control vq, everything breaks.
Revert the original commit for now.
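For illustration, a guarded variant of the call could look like the sketch below. It is a userspace model with hypothetical toy_* names and is not the code that was reverted or later merged; it only shows the idea of issuing a control command when the relevant feature bit actually changes.

```c
#include <stdbool.h>

#define TOY_F_RXCSUM (1u << 0)  /* hypothetical feature bit */

struct toy_dev {
	bool has_cvq;           /* was a control virtqueue negotiated? */
	unsigned int features;  /* currently active feature bits */
	int commands_sent;      /* control commands issued so far */
};

/* Stands in for the control-queue command path, which must never be
 * entered on a device that has no control vq. */
static void toy_send_command(struct toy_dev *d)
{
	d->commands_sent++;
}

/* Only touch the control queue when the RXCSUM bit really changes. */
static int toy_set_features(struct toy_dev *d, unsigned int wanted)
{
	if ((d->features ^ wanted) & TOY_F_RXCSUM) {
		if (!d->has_cvq)
			return -1;  /* cannot reconfigure without a control vq */
		toy_send_command(d);
	}
	d->features = wanted;
	return 0;
}
```

With this guard, registering a device whose features are unchanged never reaches the command path, which is the case that hit the BUG in the trace above.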
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Fixes: 3618ad2a7c0e7 ("virtio-net: ethtool configurable RXCSUM")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20201021142944.13615-1-mst@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allow the user to configure RXCSUM separately with ethtool -K,
reusing the existing virtnet_set_guest_offloads helper
that configures RXCSUM for XDP. This is conditional on
VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
If Rx checksum is disabled, LRO should also be disabled.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20201012015820.62042-1-xiangxia.m.yue@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rejecting non-native endian BTF overlapped with the addition
of support for it.
The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Open vSwitch and the Linux bridge disable LRO on an interface
when that interface is added to them. Currently, when LRO is disabled, the
virtio-net csum is disabled too. That reduces forwarding performance.
Fixes: a02e8964eaf9 ("virtio-net: ethtool configurable LRO")
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We allow drivers to call napi_hash_del() before calling
netif_napi_del() to batch RCU grace periods. This makes
the API asymmetric and leaks internal implementation details.
Soon we will want the grace period to protect more than just
the NAPI hash table.
Restructure the API and have drivers call a new function -
__netif_napi_del() if they want to take care of RCU waits.
Note that only core was checking the return status from
napi_hash_del() so the new helper does not report if the
NAPI was actually deleted.
Some notes on driver oddness:
- veth observed the grace period before calling netif_napi_del()
but that should not matter
- myri10ge observed normal RCU flavor
- bnx2x and enic did not actually observe the grace period
(unless they did so implicitly)
- virtio_net and enic only unhashed Rx NAPIs
The last two points seem to indicate that the calls to
napi_hash_del() were a leftover rather than an optimization.
Regardless, it's easy enough to correct them.
This patch may introduce extra synchronize_net() calls for
interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
free_netdev() to call netif_napi_del(). This seems inevitable
since we want to use RCU for netpoll dev->napi_list traversal,
and almost no drivers set IFF_DISABLE_NETPOLL.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace the existing /* fall through */ comments and their variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings where applicable.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
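A minimal userspace sketch of the idea follows. The macro is named toy_fallthrough here to make clear it is an illustration rather than the kernel's own header, and toy_count_steps() is a hypothetical function.

```c
/* Use the compiler's statement attribute where available, otherwise
 * fall back to an empty statement. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define toy_fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef toy_fallthrough
# define toy_fallthrough do {} while (0)
#endif

/* Counts how many case bodies run for a given level. */
static int toy_count_steps(int level)
{
	int n = 0;

	switch (level) {
	case 2:
		n++;
		toy_fallthrough;  /* deliberate: level 2 also does level 1 work */
	case 1:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```

Unlike a comment, the attribute form is visible to the compiler, so an unannotated fall-through can be flagged by -Wimplicit-fallthrough.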
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
|
Pull virtio updates from Michael Tsirkin:
- IRQ bypass support for vdpa and IFC
- MLX5 vdpa driver
- Endianness fixes for virtio drivers
- Misc other fixes
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits)
vdpa/mlx5: fix up endian-ness for mtu
vdpa: Fix pointer math bug in vdpasim_get_config()
vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config()
vdpa/mlx5: fix memory allocation failure checks
vdpa/mlx5: Fix uninitialised variable in core/mr.c
vdpa_sim: init iommu lock
virtio_config: fix up warnings on parisc
vdpa/mlx5: Add VDPA driver for supported mlx5 devices
vdpa/mlx5: Add shared memory registration code
vdpa/mlx5: Add support library for mlx5 VDPA implementation
vdpa/mlx5: Add hardware descriptive header file
vdpa: Modify get_vq_state() to return error code
net/vdpa: Use struct for set/get vq state
vdpa: remove hard coded virtq num
vdpasim: support batch updating
vhost-vdpa: support IOTLB batching hints
vhost-vdpa: support get/set backend features
vhost: generialize backend features setting/getting
vhost-vdpa: refine ioctl pre-processing
vDPA: dont change vq irq after DRIVER_OK
...
|
|
Speed and duplex config fields depend on VIRTIO_NET_F_SPEED_DUPLEX
which, being feature bit 63 (above bit 31), depends on VIRTIO_F_VERSION_1.
Accordingly, use LE accessors for these fields.
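As a sketch of what a little-endian accessor amounts to, the helper below decodes a 32-bit little-endian field from raw bytes independently of host byte order. The name toy_le32_to_cpu() and the byte-array interface are assumptions made for this example; the kernel uses its own accessors.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value byte by byte, so the result is the
 * same on little-endian and big-endian hosts. */
static uint32_t toy_le32_to_cpu(const uint8_t b[4])
{
	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}
```

For example, the bytes 10 27 00 00 (hex) decode to 10000, a 10 Gb/s speed expressed in Mb/s.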
Reported-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Now that BPF program/link management is centralized in generic net_device
code, kernel code never queries program id from drivers, so
XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.
This patch removes all the implementations of those commands in the kernel,
along with xdp_attachment_query().
This patch was compile-tested on allyesconfig.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
|
|
In order to use the standard 'xdp' prefix, rename the convert_to_xdp_frame
utility routine to xdp_convert_buff_to_frame and replace all
occurrences.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org
|
|
Move the bpf verifier trace check into the new switch statement in
HEAD.
Resolve the overlapping changes in hinic, where bug fixes overlap
the addition of VF support.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The virtio_net driver runs inside the guest OS. There are two
XDP receive code paths in virtio_net, namely receive_small() and
receive_mergeable(). The receive_big() function does not support XDP.
In receive_small() the frame size is available in buflen. The buffers
backing these frames are allocated in add_recvbuf_small() with the same
size, except for the headroom, but the tailroom has room reserved for
skb_shared_info. The headroom is encoded in the ctx pointer as a value.
In receive_mergeable() the frame size is more dynamic. There are two
basic cases: (1) the buffer size is based on an exponentially weighted
moving average (see DECLARE_EWMA) of packet length, or (2) if
virtnet_get_headroom() returns any headroom, the buffer size is
PAGE_SIZE. The ctx pointer is this time used for encoding two values:
the buffer len "truesize" and the headroom. In case (1), if the rx buffer
size is underestimated, the packet will have been split over more
buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed at the top of
the buffer area). If that happens, the XDP path does an xdp_linearize_page
operation.
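To make case (1) concrete, here is a minimal integer EWMA sketch. The weight of 1/64 and the toy_* names are assumptions for illustration; this is not the kernel's DECLARE_EWMA implementation, only the same general technique.

```c
#define TOY_EWMA_WEIGHT 64  /* assumed: each new sample contributes 1/64 */

struct toy_ewma {
	unsigned long avg;  /* current average; 0 means no sample yet */
};

/* Fold one packet length into the running average. */
static void toy_ewma_add(struct toy_ewma *e, unsigned long sample)
{
	if (e->avg == 0) {
		e->avg = sample;  /* first sample seeds the average */
		return;
	}
	e->avg = (e->avg * (TOY_EWMA_WEIGHT - 1) + sample) / TOY_EWMA_WEIGHT;
}
```

Because the average moves slowly, a run of small packets followed by a large one leaves the estimate too small for that packet, which is the underestimation case in which a packet spans several buffers.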
V3: Adjust frame_sz in receive_mergeable() case, spotted by Jason Wang.
The code is really hard to follow, so some hints to reviewers.
The receive_mergeable() case gets frames that were allocated in
add_recvbuf_mergeable(), which uses headroom=virtnet_get_headroom(),
and the 'buf' ptr is advanced by this headroom. The headroom can only
be 0 or VIRTIO_XDP_HEADROOM, as virtnet_get_headroom is really
simple:
static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
{
return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
}
As frame_sz is an offset size from xdp.data_hard_start, reviewers
should notice how this is calculated in receive_mergeable():
int offset = buf - page_address(page);
[...]
data = page_address(xdp_page) + offset;
xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
The calculated offset will always be VIRTIO_XDP_HEADROOM when
reaching this code. Thus, xdp.data_hard_start will be the page-start
address plus vi->hdr_len. Given this, xdp.frame_sz needs to be
reduced by vi->hdr_len.
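The arithmetic can be checked with a small worked example. The constants below (a 4096-byte page and a 256-byte XDP headroom) and the toy_* helpers are assumptions for illustration; the 12-byte header length in the usage note is likewise only an example value.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE    4096u  /* assumed page size */
#define TOY_XDP_HEADROOM  256u  /* assumed value of VIRTIO_XDP_HEADROOM */

/* Offset of xdp.data_hard_start from the start of the page, mirroring
 * "data - VIRTIO_XDP_HEADROOM + vi->hdr_len" with data = page + offset. */
static size_t toy_hard_start_offset(size_t buf_offset, size_t hdr_len)
{
	return buf_offset - TOY_XDP_HEADROOM + hdr_len;
}

/* Frame size measured from data_hard_start for a PAGE_SIZE buffer. */
static size_t toy_frame_sz(size_t hdr_len)
{
	return TOY_PAGE_SIZE - hdr_len;
}
```

With buf_offset equal to the headroom and a 12-byte header, data_hard_start sits 12 bytes into the page and frame_sz is 4084, so the two together still end exactly at the page boundary.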
IMHO a followup patch should cleanup this code to make it easier
to maintain and understand, but it is outside the scope of this
patchset.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/bpf/158945344436.97035.9445115070189151680.stgit@firesoul
|
|
When we fill up a receive VQ, try_fill_recv currently tries to count
kicks using a 64 bit stats counter. It turns out that, on a 32 bit kernel,
that uses a seqcount. Sequence counts are "lock" constructs where you need to
make sure that writers are serialized.
In turn, this means that we mustn't run two try_fill_recv concurrently.
Which of course we don't. We do run try_fill_recv sometimes from a
softirq napi context, and sometimes from a fully preemptible context,
but the latter always runs with napi disabled.
However, when it comes to the seqcount, lockdep is trying to enforce the
rule that the same lock isn't accessed from preemptible and softirq
context - it doesn't know about napi being enabled/disabled. This causes
a false-positive warning:
WARNING: inconsistent lock state
...
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
As a workaround, shut down the warning by switching
to u64_stats_update_begin_irqsave. That works by disabling
interrupts on 32 bit only, and is a NOP on 64 bit.
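The reason a 64 bit counter needs a seqcount on 32 bit can be sketched in plain C. The model below is single-threaded and deliberately omits the memory barriers and interrupt handling a real implementation needs; all toy_* names are hypothetical.

```c
#include <stdint.h>

/* A 64-bit counter stored as two 32-bit halves, guarded by a sequence
 * number that is odd while an update is in progress. */
struct toy_stats {
	unsigned int seq;
	uint32_t lo;
	uint32_t hi;
};

/* Writer side: must never run concurrently with another writer, or the
 * two seq increments could interleave and mislead readers. */
static void toy_stats_add(struct toy_stats *s, uint64_t n)
{
	uint64_t v;

	s->seq++;  /* now odd: update in progress */
	v = (((uint64_t)s->hi << 32) | s->lo) + n;
	s->lo = (uint32_t)v;
	s->hi = (uint32_t)(v >> 32);
	s->seq++;  /* even again: update complete */
}

/* Reader side: retry if an update was in progress or completed meanwhile. */
static uint64_t toy_stats_read(const struct toy_stats *s)
{
	unsigned int start;
	uint64_t v;

	do {
		start = s->seq;
		v = ((uint64_t)s->hi << 32) | s->lo;
	} while ((start & 1) || start != s->seq);
	return v;
}
```

The carry from the low half into the high half is the step a reader must not observe half-done, which is why the writer brackets it with the two sequence increments.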
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver correctly rejects all unsupported parameters.
As a side effect of these changes the error code for
unsupported params changes from EINVAL to EOPNOTSUPP.
v2: correctly handle rx-frames (and adjust the commit msg)
v3: adjust commit message for new error code and member name
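The core-side check can be pictured as a simple mask test. The sketch below uses hypothetical TOY_COALESCE_* bits and a hypothetical toy_check_coalesce() helper; it models the behaviour described above rather than reproducing the ethtool core.

```c
#include <errno.h>

/* Hypothetical parameter bits, for illustration only. */
#define TOY_COALESCE_RX_USECS      (1u << 0)
#define TOY_COALESCE_RX_MAX_FRAMES (1u << 1)
#define TOY_COALESCE_TX_MAX_FRAMES (1u << 2)

/* Reject any requested parameter that the driver did not declare. */
static int toy_check_coalesce(unsigned int supported, unsigned int requested)
{
	if (requested & ~supported)
		return -EOPNOTSUPP;  /* previously surfaced as -EINVAL by drivers */
	return 0;
}
```

A driver that declares only the two max-frames bits would have a request for rx-usecs rejected by the core before its own callback runs.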
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With the ethtool_virtdev_set_link_ksettings function in core/ethtool.c,
ibmveth, netvsc, and virtio now use the core's helper function.
Functionality changes that pertain to the ibmveth driver include:
1. Changed the initial hardcoded link speed to 1GB.
2. Added support for allowing a user to change the reported link
speed via ethtool.
Functionality changes to the netvsc driver include:
1. When netvsc_get_link_ksettings is called, it will defer to the VF
device if it exists to pull accelerated networking values, otherwise
pull default or user-defined values.
2. Similarly, if netvsc_set_link_ksettings is called and a VF device
exists, the real values of speed and duplex are changed.
Signed-off-by: Cris Forno <cforno12@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement support for transferring XDP meta data into the skb for the
virtio_net driver. Before calling into the program, xdp.data_meta points
to xdp.data; on program return with a pass verdict, we call
into skb_metadata_set().
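A minimal model of the pointer bookkeeping is sketched below. The struct and helpers are hypothetical stand-ins with only the two fields discussed above; the metadata length is simply the distance between the two pointers after the program has run.

```c
#include <stddef.h>

/* Reduced stand-in for the XDP buffer: only the two pointers of interest. */
struct toy_xdp_buff {
	unsigned char *data_meta;  /* start of metadata, at or before data */
	unsigned char *data;       /* start of packet data */
};

/* Before the program runs: no metadata, so both pointers coincide. */
static void toy_xdp_prepare(struct toy_xdp_buff *x, unsigned char *data)
{
	x->data = data;
	x->data_meta = data;
}

/* After a pass verdict: the length that would be recorded on the skb. */
static size_t toy_xdp_metasize(const struct toy_xdp_buff *x)
{
	return (size_t)(x->data - x->data_meta);
}
```

If the program moves data_meta back by 8 bytes into the headroom, toy_xdp_metasize() reports 8; if it leaves the pointers alone, the size is 0 and there is nothing to transfer.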
Tested with the script at
https://github.com/higebu/virtio_net-xdp-metadata-test.
Signed-off-by: Yuya Kusakabe <yuya.kusakabe@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/bpf/20200225033212.437563-2-yuya.kusakabe@gmail.com
|
|
We do not want to care about the vnet header in receive_small() if XDP
is loaded, since we can not know whether or not the packet is modified
by XDP.
Fixes: f6b10209b90d ("virtio-net: switch to use build_skb() for small buffer")
Signed-off-by: Yuya Kusakabe <yuya.kusakabe@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/bpf/20200225033212.437563-1-yuya.kusakabe@gmail.com
|