summaryrefslogtreecommitdiff
path: root/drivers/net/virtio_net.c
AgeCommit message (Collapse)AuthorFilesLines
2023-02-06virtio-net: Maintain reverse cleanup orderParav Pandit1-1/+1
To easily audit the code, better to keep the device stop() sequence to be mirror of the device open() sequence. Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
net/core/gro.c 7d2c89b32587 ("skb: Do mix page pool and page referenced frags in GRO") b1a78b9b9886 ("net: add support for ipv4 big tcp") https://lore.kernel.org/all/20230203094454.5766f160@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02virtio-net: Keep stop() to follow mirror sequence of open()Parav Pandit1-1/+1
Cited commit in fixes tag frees rxq xdp info while RQ NAPI is still enabled and packet processing may be ongoing. Follow the mirror sequence of open() in the stop() callback. This ensures that when rxq info is unregistered, no rx packet processing is ongoing. Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info") Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Link: https://lore.kernel.org/r/20230202163516.12559-1-parav@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02virtio-net: fix possible unsigned integer overflowHeng Qi1-6/+9
When the single-buffer xdp is loaded and after xdp_linearize_page() is called, *num_buf becomes 0 and (*num_buf - 1) may overflow into a large integer in virtnet_build_xdp_buff_mrg(), resulting in unexpected packet dropping. Fixes: ef75cb51f139 ("virtio-net: build xdp_buff with multi buffers") Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20230131085004.98687-1-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02virtio_net: notify MAC address change on device initializationLaurent Vivier1-0/+20
In virtnet_probe(), if the device doesn't provide a MAC address the driver assigns a random one. As we modify the MAC address we need to notify the device to allow it to update all the related information. The problem can be seen with vDPA and mlx5_vdpa driver as it doesn't assign a MAC address by default. The virtio_net device uses a random MAC address (we can see it with "ip link"), but we can't ping a net namespace from another one using the virtio-vdpa device because the new MAC address has not been provided to the hardware: RX packets are dropped since they don't go through the receive filters, TX packets go through unaffected. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02virtio_net: disable VIRTIO_NET_F_STANDBY if VIRTIO_NET_F_MAC is not setLaurent Vivier1-0/+6
failover relies on the MAC address to pair the primary and the standby devices: "[...] the hypervisor needs to enable VIRTIO_NET_F_STANDBY feature on the virtio-net interface and assign the same MAC address to both virtio-net and VF interfaces." Documentation/networking/net_failover.rst This patch disables the STANDBY feature if the MAC address is not provided by the hypervisor. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-3/+3
Conflicts: drivers/net/ethernet/intel/ice/ice_main.c 418e53401e47 ("ice: move devlink port creation/deletion") 643ef23bd9dd ("ice: Introduce local var for readability") https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/ https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/ drivers/net/ethernet/engleder/tsnep_main.c 3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues") 25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock") https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/ net/netfilter/nf_conntrack_proto_sctp.c 13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"") a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths") f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets") https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/ https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-28virtio-net: execute xdp_do_flush() before napi_complete_done()Magnus Karlsson1-3/+3
Make sure that xdp_do_flush() is always executed before napi_complete_done(). This is important for two reasons. First, a redirect to an XSKMAP assumes that a call to xdp_do_redirect() from napi context X on CPU Y will be followed by a xdp_do_flush() from the same napi context and CPU. This is not guaranteed if the napi_complete_done() is executed before xdp_do_flush(), as it tells the napi logic that it is fine to schedule napi context X on another CPU. Details from a production system triggering this bug using the veth driver can be found following the first link below. The second reason is that the XDP_REDIRECT logic in itself relies on being inside a single NAPI instance through to the xdp_do_flush() call for RCU protection of all in-kernel data structures. Details can be found in the second link below. Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/ Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25virtio-net: Reduce debug name field size to 16 bytesParav Pandit1-2/+2
virtio queue index can be maximum of 65535. 16 bytes are enough to store the vq name with the existing string prefix. With this change, send queue struct saves 24 bytes and receive queue saves whole cache line worth 64 bytes per structure due to saving in alignment bytes. Pahole results before: pahole -s drivers/net/virtio_net.o | \ grep -e "send_queue" -e "receive_queue" send_queue 1112 0 receive_queue 1280 1 Pahole results after: pahole -s drivers/net/virtio_net.o | \ grep -e "send_queue" -e "receive_queue" send_queue 1088 0 receive_queue 1216 1 Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+4
drivers/net/ipa/ipa_interrupt.c drivers/net/ipa/ipa_interrupt.h 9ec9b2a30853 ("net: ipa: disable ipa interrupt during suspend") 8e461e1f092b ("net: ipa: introduce ipa_interrupt_enable()") d50ed3558719 ("net: ipa: enable IPA interrupt handlers separate from registration") https://lore.kernel.org/all/20230119114125.5182c7ab@canb.auug.org.au/ https://lore.kernel.org/all/79e46152-8043-a512-79d9-c3b905462774@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-18virtio-net: correctly enable callback during start_xmitJason Wang1-2/+4
Commit a7766ef18b33("virtio_net: disable cb aggressively") enables virtqueue callback via the following statement: do { if (use_napi) virtqueue_disable_cb(sq->vq); free_old_xmit_skbs(sq, false); } while (use_napi && kick && unlikely(!virtqueue_enable_cb_delayed(sq->vq))); When NAPI is used and kick is false, the callback won't be enabled here. And when the virtqueue is about to be full, the tx will be disabled, but we still don't enable tx interrupt which will cause a TX hang. This could be observed when using pktgen with burst enabled. TO be consistent with the logic that tries to disable cb only for NAPI, fixing this by trying to enable delayed callback only when NAPI is enabled when the queue is about to be full. Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively") Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18virtio_net: Reuse buffer free functionParav Pandit1-7/+1
virtnet_rq_free_unused_buf() helper function to free the buffer already exists. Avoid code duplication by reusing existing function. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16virtio-net: support multi-buffer xdpHeng Qi1-55/+10
Driver can pass the skb to stack by build_skb_from_xdp_buff(). Driver forwards multi-buffer packets using the send queue when XDP_TX and XDP_REDIRECT, and clears the reference of multi pages when XDP_DROP. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16virtio-net: remove xdp related info from page_to_skb()Heng Qi1-32/+9
For the clear construction of xdp_buff, we remove the xdp processing interleaved with page_to_skb(). Now, the logic of xdp and building skb from xdp are separate and independent. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16virtio-net: build skb from multi-buffer xdpHeng Qi1-0/+49
This converts the xdp_buff directly to a skb, including multi-buffer and single buffer xdp. We'll isolate the construction of skb based on xdp from page_to_skb(). Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16virtio-net: transmit the multi-buffer xdpHeng Qi1-5/+26
This serves as the basis for XDP_TX and XDP_REDIRECT to send a multi-buffer xdp_frame. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16virtio-net: construct multi-buffer xdp in mergeableHeng Qi1-14/+44
Build multi-buffer xdp using virtnet_build_xdp_buff_mrg(). For the prefilled buffer before xdp is set, we will probably use vq reset in the future. At the same time, virtio net currently uses comp pages, and bpf_xdp_frags_increase_tail() needs to calculate the tailroom of the last frag, which will involve the offset of the corresponding page and cause a negative value, so we disable tail increase by not setting xdp_rxq->frag_size. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16virtio-net: build xdp_buff with multi buffersHeng Qi1-8/+100
Support xdp for multi buffer packets in mergeable mode. Putting the first buffer as the linear part for xdp_buff, and the rest of the buffers as non-linear fragments to struct skb_shared_info in the tailroom belonging to xdp_buff. Let 'truesize' return to its literal meaning, that is, when xdp is set, it includes the length of headroom and tailroom. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16virtio-net: update bytes calculation for xdp_frameHeng Qi1-2/+2
Update relative record value for xdp_frame as basis for multi-buffer xdp transmission. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16virtio-net: set up xdp for multi buffer packetsHeng Qi1-3/+3
When the xdp program sets xdp.frags, which means it can process multi-buffer packets over larger MTU, so we continue to support xdp. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16virtio-net: fix calculation of MTU for single-buffer xdpHeng Qi1-2/+4
When single-buffer xdp is loaded, the size of the buffer filled each time is 'sz = (PAGE_SIZE - headroom - tailroom)', which is the maximum packet length that the driver allows the device to pass in. Otherwise, the packet with a length greater than sz will come in, so num_buf will be greater than or equal to 2, and xdp_linearize_page() will be performed and the packet will be dropped because the total length is greater than PAGE_SIZE. So the maximum value of MTU for single-buffer xdp is 'max_sz = sz - ETH_HLEN'. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16virtio-net: disable the hole mechanism for xdpHeng Qi1-1/+4
XDP core assumes that the frame_size of xdp_buff and the length of the frag are PAGE_SIZE. The hole may cause the processing of xdp to fail, so we disable the hole mechanism when xdp is set. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12drivers/net/virtio_net.c: Added USO support.Andrew Melnychenko1-4/+15
Now, it possible to enable GSO_UDP_L4("tx-udp-segmentation") for VirtioNet. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+1
tools/lib/bpf/ringbuf.c 927cbb478adf ("libbpf: Handle size overflow for ringbuf mmap") b486d19a0ab0 ("libbpf: checkpatch: Fixed code alignments in ringbuf.c") https://lore.kernel.org/all/20221121122707.44d1446a@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-24virtio_net: Fix probe failed when modprobe virtio_netLi Zetao1-2/+1
When doing the following test steps, an error was found: step 1: modprobe virtio_net succeeded # modprobe virtio_net <-- OK step 2: fault injection in register_netdevice() # modprobe -r virtio_net <-- OK # ... FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 0 CPU: 0 PID: 3521 Comm: modprobe Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), Call Trace: <TASK> ... should_failslab+0xa/0x20 ... dev_set_name+0xc0/0x100 netdev_register_kobject+0xc2/0x340 register_netdevice+0xbb9/0x1320 virtnet_probe+0x1d72/0x2658 [virtio_net] ... </TASK> virtio_net: probe of virtio0 failed with error -22 step 3: modprobe virtio_net failed # modprobe virtio_net <-- failed virtio_net: probe of virtio0 failed with error -2 The root cause of the problem is that the queues are not disable on the error handling path when register_netdevice() fails in virtnet_probe(), resulting in an error "-ENOENT" returned in the next modprobe call in setup_vq(). virtio_pci_modern_device uses virtqueues to send or receive message, and "queue_enable" records whether the queues are available. In vp_modern_find_vqs(), all queues will be selected and activated, but once queues are enabled there is no way to go back except reset. Fix it by reset virtio device on error handling path. This makes error handling follow the same order as normal device cleanup in virtnet_remove() which does: unregister, destroy failover, then reset. And that flow is better tested than error handling so we can be reasonably sure it works well. Fixes: 024655555021 ("virtio_net: fix use after free on allocation failure") Signed-off-by: Li Zetao <lizetao1@huawei.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20221122150046.3910638-1-lizetao1@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-29net: Remove the obsolte u64_stats_fetch_*_irq() users (drivers).Thomas Gleixner1-8/+8
Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-11Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-16/+32
Pull virtio updates from Michael Tsirkin: - 9k mtu perf improvements - vdpa feature provisioning - virtio blk SECURE ERASE support - fixes and cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_pci: don't try to use intxif pin is zero vDPA: conditionally read MTU and MAC in dev cfg space vDPA: fix spars cast warning in vdpa_dev_net_mq_config_fill vDPA: check virtio device features to detect MQ vDPA: check VIRTIO_NET_F_RSS for max_virtqueue_paris's presence vDPA: only report driver features if FEATURES_OK is set vDPA: allow userspace to query features of a vDPA device virtio_blk: add SECURE ERASE command support vp_vdpa: support feature provisioning vdpa_sim_net: support feature provisioning vdpa: device feature provisioning virtio-net: use mtu size as buffer length for big packets virtio-net: introduce and use helper function for guest gso support checks virtio: drop vp_legacy_set_queue_size virtio_ring: make vring_alloc_queue_packed prettier virtio_ring: split: Operators use unified style vhost: add __init/__exit annotations to module init/exit funcs
2022-10-07virtio-net: use mtu size as buffer length for big packetsGavin Li1-13/+24
Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big packets even when GUEST_* offloads are not present on the device. However, if guest GSO is not supported, it would be sufficient to allocate segments to cover just up the MTU size and no further. Allocating the maximum amount of segments results in a large waste of buffer space in the queue, which limits the number of packets that can be buffered and can result in reduced performance. Therefore, if guest GSO is not supported, use the MTU to calculate the optimal amount of segments required. Below is the iperf TCP test results over a Mellanox NIC, using vDPA for 1 VQ, queue size 1024, before and after the change, with the iperf server running over the virtio-net interface. MTU(Bytes)/Bandwidth (Gbit/s) Before After 1500 22.5 22.4 9000 12.8 25.9 And result of queue size 256. MTU(Bytes)/Bandwidth (Gbit/s) Before After 9000 2.15 11.9 With this patch no degradation is observed with multiple below tests and feature bit combinations. Results are summarized below for q depth of 1024. Interface MTU is 1500 if MTU feature is disabled. MTU is set to 9000 in other tests. Features/ Bandwidth (Gbit/s) Before After mtu off 20.1 20.2 mtu/indirect on 17.4 17.3 mtu/indirect/packed on 17.2 17.2 Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Message-Id: <20220914144911.56422-3-gavinl@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-10-07virtio-net: introduce and use helper function for guest gso support checksGavin Li1-4/+9
Probe routine is already several hundred lines. Use helper function for guest gso support check. Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Message-Id: <20220914144911.56422-2-gavinl@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-09-01net: move from strlcpy with unused retval to strscpyWolfram Sang1-3/+3
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220830201457.7984-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-19Merge tag 'net-6.0-rc2' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - regressions: - tcp: fix cleanup and leaks in tcp_read_skb() (the new way BPF socket maps get data out of the TCP stack) - tls: rx: react to strparser initialization errors - netfilter: nf_tables: fix scheduling-while-atomic splat - net: fix suspicious RCU usage in bpf_sk_reuseport_detach() Current release - new code bugs: - mlxsw: ptp: fix a couple of races, static checker warnings and error handling Previous releases - regressions: - netfilter: - nf_tables: fix possible module reference underflow in error path - make conntrack helpers deal with BIG TCP (skbs > 64kB) - nfnetlink: re-enable conntrack expectation events - net: fix potential refcount leak in ndisc_router_discovery() Previous releases - always broken: - sched: cls_route: disallow handle of 0 - neigh: fix possible local DoS due to net iface start/stop loop - rtnetlink: fix module refcount leak in rtnetlink_rcv_msg - sched: fix adding qlen to qcpu->backlog in gnet_stats_add_queue_cpu - virtio_net: fix endian-ness for RSS - dsa: mv88e6060: prevent crash on an unused port - fec: fix timer capture timing in `fec_ptp_enable_pps()` - ocelot: stats: fix races, integer wrapping and reading incorrect registers (the change of register definitions here accounts for bulk of the changed LoC in this PR)" * tag 'net-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) net: moxa: MAC address reading, generating, validity checking tcp: handle pure FIN case correctly tcp: refactor tcp_read_skb() a bit tcp: fix tcp_cleanup_rbuf() for tcp_read_skb() tcp: fix sock skb accounting in tcp_read_skb() igb: Add lock to avoid data race dt-bindings: Fix incorrect "the the" corrections net: genl: fix error path memory leak in policy dumping stmmac: intel: Add a missing clk_disable_unprepare() call in intel_eth_pci_remove() net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_xdp_run net/mlx5e: Allocate flow steering storage during uplink initialization net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ocelot->stats net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset net: mscc: ocelot: make struct ocelot_stat_layout array indexable net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_work net: mscc: ocelot: turn stats_lock into a spinlock net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counter net: mscc: ocelot: fix incorrect ndo_get_stats64 packet counters net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet counters net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support it ...
2022-08-16virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"Michael S. Tsirkin1-38/+4
This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7. This has been reported to trip up guests on GCP (Google Cloud). The reason is that virtio_find_vqs_ctx_size is broken on legacy devices. We can in theory fix virtio_find_vqs_ctx_size but in fact the patch itself has several other issues: - It treats unknown speed as < 10G - It leaves userspace no way to find out the ring size set by hypervisor - It tests speed when link is down - It ignores the virtio spec advice: Both \field{speed} and \field{duplex} can change, thus the driver is expected to re-read these values after receiving a configuration change notification. - It is not clear the performance impact has been tested properly Revert the patch for now. Reported-by: Andres Freund <andres@anarazel.de> Link: https://lore.kernel.org/r/20220814212610.GA3690074%40roeck-us.net Link: https://lore.kernel.org/r/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de Link: https://lore.kernel.org/r/3df6bb82-1951-455d-a768-e9e1513eb667%40www.fastmail.com Link: https://lore.kernel.org/r/FCDC5DDE-3CDD-4B8A-916F-CA7D87B547CE%40anarazel.de Fixes: 762faee5a267 ("virtio_net: set the default max ring size by find_vqs()") Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Andres Freund <andres@anarazel.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Message-Id: <20220816053602.173815-2-mst@redhat.com>
2022-08-12Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-39/+286
Pull virtio updates from Michael Tsirkin: - A huge patchset supporting vq resize using the new vq reset capability - Features, fixes, and cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (88 commits) vdpa/mlx5: Fix possible uninitialized return value vdpa_sim_blk: add support for discard and write-zeroes vdpa_sim_blk: add support for VIRTIO_BLK_T_FLUSH vdpa_sim_blk: make vdpasim_blk_check_range usable by other requests vdpa_sim_blk: check if sector is 0 for commands other than read or write vdpa_sim: Implement suspend vdpa op vhost-vdpa: uAPI to suspend the device vhost-vdpa: introduce SUSPEND backend feature bit vdpa: Add suspend operation virtio-blk: Avoid use-after-free on suspend/resume virtio_vdpa: support the arg sizes of find_vqs() vhost-vdpa: Call ida_simple_remove() when failed vDPA: fix 'cast to restricted le16' warnings in vdpa.c vDPA: !FEATURES_OK should not block querying device config space vDPA/ifcvf: support userspace to query features and MQ of a management device vDPA/ifcvf: get_config_size should return a value no greater than dev implementation vhost scsi: Allow user to control num virtqueues vhost-scsi: Fix max number of virtqueues vdpa/mlx5: Support different address spaces for control and data vdpa/mlx5: Implement susupend virtqueue callback ...
2022-08-12virtio_net: fix endian-ness for RSSMichael S. Tsirkin1-2/+2
Using native endian-ness for device supplied fields is wrong on BE platforms. Sparse warns about this. Fixes: 91f41f01d219 ("drivers/net/virtio_net: Added RSS hash report.") Cc: "Andrew Melnychenko" <andrew@daynix.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-11net: virtio_net: notifications coalescing supportAlvaro Karsz1-15/+96
New VirtIO network feature: VIRTIO_NET_F_NOTF_COAL. Control a Virtio network device notifications coalescing parameters using the control virtqueue. A device that supports this fetature can receive VIRTIO_NET_CTRL_NOTF_COAL control commands. - VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: Ask the network device to change the following parameters: - tx_usecs: Maximum number of usecs to delay a TX notification. - tx_max_packets: Maximum number of packets to send before a TX notification. - VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: Ask the network device to change the following parameters: - rx_usecs: Maximum number of usecs to delay a RX notification. - rx_max_packets: Maximum number of packets to receive before a RX notification. VirtIO spec. patch: https://lists.oasis-open.org/archives/virtio-comment/202206/msg00100.html Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20220718091102.498774-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11virtio_net: support set_ringparamXuan Zhuo1-0/+48
Support set_ringparam based on virtio queue reset. Users can use ethtool -G eth0 <ring_num> to modify the ring size of virtio-net. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-43-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11virtio_net: support tx queue resizeXuan Zhuo1-0/+50
This patch implements the resize function of the tx queues. Based on this function, it is possible to modify the ring num of the queue. Inludes fixup: virtio_net: fix for stuck when change tx ring size with dev down When dev is set to DOWN state, napi has been disabled, if we modify the ring size at this time, we should not call napi_disable() again, which will cause stuck. And all operations are under the protection of rtnl_lock, so there is no need to consider concurrency issues. Message-Id: <20220801063902.129329-42-xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220811080258.79398-3-xuanzhuo@linux.alibaba.com> Reported-by: Kangjie Xu <kangjie.xu@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11virtio_net: support rx queue resizeXuan Zhuo1-0/+25
This patch implements the resize function of the rx queues. Based on this function, it is possible to modify the ring num of the queue. Includes fixup: virtio_net: fix for stuck when change rx ring size with dev down When dev is set to DOWN state, napi has been disabled, if we modify the ring size at this time, we should not call napi_disable() again, which will cause stuck. And all operations are under the protection of rtnl_lock, so there is no need to consider concurrency issues. Message-Id: <20220801063902.129329-41-xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220811080258.79398-2-xuanzhuo@linux.alibaba.com> Reported-by: Kangjie Xu <kangjie.xu@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11virtio_net: split free_unused_bufs()Xuan Zhuo1-16/+25
This patch separates two functions for freeing sq buf and rq buf from free_unused_bufs(). When supporting the enable/disable tx/rq queue in the future, it is necessary to support separate recovery of a sq buf or a rq buf. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-40-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11virtio_net: get ringparam by virtqueue_get_vring_max_size()Xuan Zhuo1-4/+4
Use virtqueue_get_vring_max_size() in virtnet_get_ringparam() to set tx,rx_max_pending. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-39-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11virtio_net: set the default max ring size by find_vqs()Xuan Zhuo1-4/+38
Use virtio_find_vqs_ctx_size() to specify the maximum ring size of tx, rx at the same time. | rx/tx ring size ------------------------------------------- speed == UNKNOWN or < 10G| 1024 speed < 40G | 4096 speed >= 40G | 8192 Call virtnet_update_settings() once before calling init_vqs() to update speed. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-38-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-08virtio_net: fix memory leak inside XPD_TX with mergeableXuan Zhuo1-1/+4
When we call xdp_convert_buff_to_frame() to get xdpf, if it returns NULL, we should check if xdp_page was allocated by xdp_linearize_page(). If it is newly allocated, it should be freed here alone. Just like any other "goto err_xdp". Fixes: 44fa2dbd4759 ("xdp: transition into using xdp_frame for ndo_xdp_xmit") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27virtio-net: fix the race between refill work and closeJason Wang1-3/+34
We try using cancel_delayed_work_sync() to prevent the work from enabling NAPI. This is insufficient since we don't disable the source of the refill work scheduling. This means an NAPI poll callback after cancel_delayed_work_sync() can schedule the refill work then can re-enable the NAPI that leads to use-after-free [1]. Since the work can enable NAPI, we can't simply disable NAPI before calling cancel_delayed_work_sync(). So fix this by introducing a dedicated boolean to control whether or not the work could be scheduled from NAPI. [1] ================================================================== BUG: KASAN: use-after-free in refill_work+0x43/0xd4 Read of size 2 at addr ffff88810562c92e by task kworker/2:1/42 CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: events refill_work Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0xbb/0x6ac ? _printk+0xad/0xde ? refill_work+0x43/0xd4 kasan_report+0xa8/0x130 ? refill_work+0x43/0xd4 refill_work+0x43/0xd4 process_one_work+0x43d/0x780 worker_thread+0x2a0/0x6f0 ? process_one_work+0x780/0x780 kthread+0x167/0x1a0 ? kthread_exit+0x50/0x50 ret_from_fork+0x22/0x30 </TASK> ... Fixes: b2baed69e605c ("virtio_net: set/cancel work on ndo_open/ndo_stop") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-27Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-1/+7
Pull virtio fixes from Michael Tsirkin: "Fixes all over the place, most notably we are disabling IRQ hardening (again!)" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_ring: make vring_create_virtqueue_split prettier vhost-vdpa: call vhost_vdpa_cleanup during the release virtio_mmio: Restore guest page size on resume virtio_mmio: Add missing PM calls to freeze/restore caif_virtio: fix race between virtio_device_ready() and ndo_open() virtio-net: fix race between ndo_open() and virtio_device_ready() virtio: disable notification hardening by default virtio: Remove unnecessary variable assignments virtio_ring : keep used_wrap_counter in vq->last_used_idx vduse: Tie vduse mgmtdev and its device vdpa/mlx5: Initialize CVQ vringh only once vdpa/mlx5: Update Control VQ callback information
2022-06-27virtio-net: fix race between ndo_open() and virtio_device_ready()Jason Wang1-1/+7
We currently call virtio_device_ready() after netdev registration. Since ndo_open() can be called immediately after register_netdev, this means there exists a race between ndo_open() and virtio_device_ready(): the driver may start to use the device before DRIVER_OK which violates the spec. Fix this by switching to use register_netdevice() and protect the virtio_device_ready() with rtnl_lock() to make sure ndo_open() can only be called after virtio_device_ready(). Fixes: 4baf1e33d0842 ("virtio_net: enable VQs early") Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220617072949.30734-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-06-23virtio_net: fix xdp_rxq_info bug after suspend/resumeStephan Gerhold1-19/+6
The following sequence currently causes a driver bug warning when using virtio_net: # ip link set eth0 up # echo mem > /sys/power/state (or e.g. # rtcwake -s 10 -m mem) <resume> # ip link set eth0 down Missing register, driver bug WARNING: CPU: 0 PID: 375 at net/core/xdp.c:138 xdp_rxq_info_unreg+0x58/0x60 Call trace: xdp_rxq_info_unreg+0x58/0x60 virtnet_close+0x58/0xac __dev_close_many+0xac/0x140 __dev_change_flags+0xd8/0x210 dev_change_flags+0x24/0x64 do_setlink+0x230/0xdd0 ... This happens because virtnet_freeze() frees the receive_queue completely (including struct xdp_rxq_info) but does not call xdp_rxq_info_unreg(). Similarly, virtnet_restore() sets up the receive_queue again but does not call xdp_rxq_info_reg(). Actually, parts of virtnet_freeze_down() and virtnet_restore_up() are almost identical to virtnet_close() and virtnet_open(): only the calls to xdp_rxq_info_(un)reg() are missing. This means that we can fix this easily and avoid such problems in the future by just calling virtnet_close()/open() from the freeze/restore handlers. Aside from adding the missing xdp_rxq_info calls the only difference is that the refill work is only cancelled if netif_running(). However, this should not make any functional difference since the refill work should only be active if the network interface is actually up. Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info") Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220621114845.3650258-1-stephan.gerhold@kernkonzept.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-08net: virtio: switch to netif_napi_add_weight()Jakub Kicinski1-2/+2
virtio netdev driver uses a custom napi weight, switch to the new API for setting custom weight. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06net: move snowflake callers to netif_napi_add_tx_weight()Jakub Kicinski1-2/+3
Make the drivers with custom tx napi weight call netif_napi_add_tx_weight(). Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20220504163725.550782-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-26virtio_net: fix wrong buf address calculation when using xdpNikolay Aleksandrov1-1/+19
We received a report[1] of kernel crashes when Cilium is used in XDP mode with virtio_net after updating to newer kernels. After investigating the reason it turned out that when using mergeable bufs with an XDP program which adjusts xdp.data or xdp.data_meta page_to_buf() calculates the build_skb address wrong because the offset can become less than the headroom so it gets the address of the previous page (-X bytes depending on how lower offset is): page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256 This is a pr_err() I added in the beginning of page_to_skb which clearly shows offset that is less than headroom by adding 4 bytes of metadata via an xdp prog. The calculations done are: receive_mergeable(): headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes offset = xdp.data - page_address(xdp_page) - vi->hdr_len - metasize; page_to_skb(): p = page_address(page) + offset; ... buf = p - headroom; Now buf goes -4 bytes from the page's starting address as can be seen above which is set as skb->head and skb->data by build_skb later. Depending on what's done with the skb (when it's freed most often) we get all kinds of corruptions and BUG_ON() triggers in mm[2]. We have to recalculate the new headroom after the xdp program has run, similar to how offset and len are recalculated. Headroom is directly related to data_hard_start, data and data_meta, so we use them to get the new size. The result is correct (similar pr_err() in page_to_skb, one case of xdp_page and one case of virtnet buf): a) Case with 4 bytes of metadata [ 115.949641] page_to_skb: page addr ffff8b4dcfad2000 offset 252 headroom 252 [ 121.084105] page_to_skb: page addr ffff8b4dcf018000 offset 20732 headroom 252 b) Case of pushing data +32 bytes [ 153.181401] page_to_skb: page addr ffff8b4dd0c4d000 offset 288 headroom 288 [ 158.480421] page_to_skb: page addr ffff8b4dd00b0000 offset 24864 headroom 288 c) Case of pushing data -33 bytes [ 835.906830] page_to_skb: page addr ffff8b4dd3270000 offset 223 headroom 223 [ 840.839910] page_to_skb: page addr ffff8b4dcdd68000 offset 12511 headroom 223 Offset and headroom are equal because offset points to the start of reserved bytes for the virtio_net header which are at buf start + headroom, while data points at buf start + vnet hdr size + headroom so when data or data_meta are adjusted by the xdp prog both the headroom size and the offset change equally. We can use data_hard_start to compute the new headroom after the xdp prog (linearized / page start case, the virtnet buf case is similar just with bigger base offset): xdp.data_hard_start = page_address + vnet_hdr xdp.data = page_address + vnet_hdr + headroom new headroom after xdp prog = xdp.data - xdp.data_hard_start - metasize An example reproducer xdp prog[3] is below. [1] https://github.com/cilium/cilium/issues/19453 [2] Two of the many traces: [ 40.437400] BUG: Bad page state in process swapper/0 pfn:14940 [ 40.916726] BUG: Bad page state in process systemd-resolve pfn:053b7 [ 41.300891] kernel BUG at include/linux/mm.h:720! [ 41.301801] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 41.302784] CPU: 1 PID: 1181 Comm: kubelet Kdump: loaded Tainted: G B W 5.18.0-rc1+ #37 [ 41.304458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014 [ 41.306018] RIP: 0010:page_frag_free+0x79/0xe0 [ 41.306836] Code: 00 00 75 ea 48 8b 07 a9 00 00 01 00 74 e0 48 8b 47 48 48 8d 50 ff a8 01 48 0f 45 fa eb d0 48 c7 c6 18 b8 30 a6 e8 d7 f8 fc ff <0f> 0b 48 8d 78 ff eb bc 48 8b 07 a9 00 00 01 00 74 3a 66 90 0f b6 [ 41.310235] RSP: 0018:ffffac05c2a6bc78 EFLAGS: 00010292 [ 41.311201] RAX: 000000000000003e RBX: 0000000000000000 RCX: 0000000000000000 [ 41.312502] RDX: 0000000000000001 RSI: ffffffffa6423004 RDI: 00000000ffffffff [ 41.313794] RBP: ffff993c98823600 R08: 0000000000000000 R09: 00000000ffffdfff [ 41.315089] R10: ffffac05c2a6ba68 R11: ffffffffa698ca28 R12: ffff993c98823600 [ 41.316398] R13: ffff993c86311ebc R14: 0000000000000000 R15: 000000000000005c [ 41.317700] FS: 00007fe13fc56740(0000) GS:ffff993cdd900000(0000) knlGS:0000000000000000 [ 41.319150] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.320152] CR2: 000000c00008a000 CR3: 0000000014908000 CR4: 0000000000350ee0 [ 41.321387] Call Trace: [ 41.321819] <TASK> [ 41.322193] skb_release_data+0x13f/0x1c0 [ 41.322902] __kfree_skb+0x20/0x30 [ 41.343870] tcp_recvmsg_locked+0x671/0x880 [ 41.363764] tcp_recvmsg+0x5e/0x1c0 [ 41.384102] inet_recvmsg+0x42/0x100 [ 41.406783] ? sock_recvmsg+0x1d/0x70 [ 41.428201] sock_read_iter+0x84/0xd0 [ 41.445592] ? 0xffffffffa3000000 [ 41.462442] new_sync_read+0x148/0x160 [ 41.479314] ? 0xffffffffa3000000 [ 41.496937] vfs_read+0x138/0x190 [ 41.517198] ksys_read+0x87/0xc0 [ 41.535336] do_syscall_64+0x3b/0x90 [ 41.551637] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 41.568050] RIP: 0033:0x48765b [ 41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 [ 41.632818] RSP: 002b:000000c000a2f5b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000 [ 41.664588] RAX: ffffffffffffffda RBX: 000000c000062000 RCX: 000000000048765b [ 41.681205] RDX: 0000000000005e54 RSI: 000000c000e66000 RDI: 0000000000000016 [ 41.697164] RBP: 000000c000a2f608 R08: 0000000000000001 R09: 00000000000001b4 [ 41.713034] R10: 00000000000000b6 R11: 0000000000000212 R12: 00000000000000e9 [ 41.728755] R13: 0000000000000001 R14: 000000c000a92000 R15: ffffffffffffffff [ 41.744254] </TASK> [ 41.758585] Modules linked in: br_netfilter bridge veth netconsole virtio_net and [ 33.524802] BUG: Bad page state in process systemd-network pfn:11e60 [ 33.528617] page ffffe05dc0147b00 ffffe05dc04e7a00 ffff8ae9851ec000 (1) len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data ffff8ae9851ec10c data_meta ffff8ae9851ec108 data_end ffff8ae9851ec14e [ 33.529764] page:000000003792b5ba refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x11e60 [ 33.532463] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) [ 33.532468] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000 [ 33.532470] raw: 0000000000000000 0000000000000000 00000000fffffdff 0000000000000000 [ 33.532471] page dumped because: nonzero mapcount [ 33.532472] Modules linked in: br_netfilter bridge veth netconsole virtio_net [ 33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not tainted 5.18.0-rc1+ #37 [ 33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014 [ 33.532484] Call Trace: [ 33.532496] <TASK> [ 33.532500] dump_stack_lvl+0x45/0x5a [ 33.532506] bad_page.cold+0x63/0x94 [ 33.532510] free_pcp_prepare+0x290/0x420 [ 33.532515] free_unref_page+0x1b/0x100 [ 33.532518] skb_release_data+0x13f/0x1c0 [ 33.532524] kfree_skb_reason+0x3e/0xc0 [ 33.532527] ip6_mc_input+0x23c/0x2b0 [ 33.532531] ip6_sublist_rcv_finish+0x83/0x90 [ 33.532534] ip6_sublist_rcv+0x22b/0x2b0 [3] XDP program to reproduce(xdp_pass.c): #include <linux/bpf.h> #include <bpf/bpf_helpers.h> SEC("xdp_pass") int xdp_pkt_pass(struct xdp_md *ctx) { bpf_xdp_adjust_head(ctx, -(int)32); return XDP_PASS; } char _license[] SEC("license") = "GPL"; compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass CC: stable@vger.kernel.org CC: Jason Wang <jasowang@redhat.com> CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com> CC: Daniel Borkmann <daniel@iogearbox.net> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: virtualization@lists.linux-foundation.org Fixes: 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220425103703.3067292-1-razor@blackwall.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-31Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-13/+376
Pull virtio updates from Michael Tsirkin: - vdpa generic device type support - more virtio hardening for broken devices (but on the same theme, revert some virtio hotplug hardening patches - they were misusing some interrupt flags and had to be reverted) - RSS support in virtio-net - max device MTU support in mlx5 vdpa - akcipher support in virtio-crypto - shared IRQ support in ifcvf vdpa - a minor performance improvement in vhost - enable virtio mem for ARM64 - beginnings of advance dma support - cleanups, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (33 commits) vdpa/mlx5: Avoid processing works if workqueue was destroyed vhost: handle error while adding split ranges to iotlb vdpa: support exposing the count of vqs to userspace vdpa: change the type of nvqs to u32 vdpa: support exposing the config size to userspace vdpa/mlx5: re-create forwarding rules after mac modified virtio: pci: check bar values read from virtio config space Revert "virtio_pci: harden MSI-X interrupts" Revert "virtio-pci: harden INTX interrupts" drivers/net/virtio_net: Added RSS hash report control. drivers/net/virtio_net: Added RSS hash report. drivers/net/virtio_net: Added basic RSS support. drivers/net/virtio_net: Fixed padded vheader to use v1 with hash. virtio: use virtio_device_ready() in virtio_device_restore() tools/virtio: compile with -pthread tools/virtio: fix after premapped buf support virtio_ring: remove flags check for unmap packed indirect desc virtio_ring: remove flags check for unmap split indirect desc virtio_ring: rename vring_unmap_state_packed() to vring_unmap_extra_packed() net/mlx5: Add support for configuring max device MTU ...