path: root/drivers/net/virtio_net.c
Age  Commit message  (Author, files changed, lines -/+)
2018-01-22  virtio_net: Add ethtool stats  (Toshiaki Makita, 1 file, -50/+141)
The main purpose of this patch is adding a way of checking per-queue stats. It's useful to debug performance problems in a multiqueue environment.

    $ ethtool -S ens10
    NIC statistics:
         rx_queue_0_packets: 2090408
         rx_queue_0_bytes: 3164825094
         rx_queue_1_packets: 2082531
         rx_queue_1_bytes: 3152932314
         tx_queue_0_packets: 2770841
         tx_queue_0_bytes: 4194955474
         tx_queue_1_packets: 3084697
         tx_queue_1_bytes: 4670196372

This change converts the existing per-cpu stats structure into a per-queue one. This should not impact performance since each queue counter is not updated concurrently by multiple cpus.

Performance numbers:
- Guest has 2 vcpus and 2 queues
- Guest runs netserver
- Host runs 100-flow super_netperf

                           Before       After        Diff
    UDP_STREAM 18byte       86.22       87.00      +0.90%
    UDP_STREAM 1472byte   4055.27     4042.18      -0.32%
    TCP_STREAM           16956.32    16890.63      -0.39%
    UDP_RR              178667.11   185862.70      +4.03%
    TCP_RR              128473.04   124985.81      -2.71%

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
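A minimal sketch of what a per-queue counter block protected by u64_stats_sync can look like (illustrative only; the field and helper names here are assumptions, not the exact layout from the patch):

    struct virtnet_rq_stats {               /* one instance per RX queue */
            struct u64_stats_sync syncp;
            u64 packets;
            u64 bytes;
    };

    /* Called only from the RX NAPI path of a single queue, so there is no
     * cross-CPU contention; the seqcount only protects 64-bit reads on
     * 32-bit hosts. */
    static void rq_stats_add(struct virtnet_rq_stats *s, unsigned int len)
    {
            u64_stats_update_begin(&s->syncp);
            s->packets++;
            s->bytes += len;
            u64_stats_update_end(&s->syncp);
    }

ethtool -S then walks the queues and copies each counter pair into the strings/data arrays exposed via the driver's .get_ethtool_stats callback.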
2018-01-09  virtio_net: propagate linkspeed/duplex settings from the hypervisor  (Jason Baron, 1 file, -1/+22)
The ability to set speed and duplex for virtio_net is useful in various scenarios as described here: 16032be virtio_net: add ethtool support for set and get of settings However, it would be nice to be able to set this from the hypervisor, such that virtio_net doesn't require custom guest ethtool commands. Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows the hypervisor to export a linkspeed and duplex setting. The user can subsequently overwrite it later if desired via: 'ethtool -s'. Note that VIRTIO_NET_F_SPEED_DUPLEX is defined as bit 63, the intention is that device feature bits are to grow down from bit 63, since the transports are starting from bit 24 and growing up. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtio-dev@lists.oasis-open.org Signed-off-by: David S. Miller <davem@davemloft.net>
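A rough sketch of how a driver can pick up such config-space fields at probe time (assuming `speed`/`duplex` fields in struct virtio_net_config as described in the virtio spec; `vi` is the driver's private struct, and the exact helpers used in the real patch may differ):

    u32 speed;
    u8 duplex;

    if (virtio_has_feature(vdev, VIRTIO_NET_F_SPEED_DUPLEX)) {
            virtio_cread(vdev, struct virtio_net_config, speed, &speed);
            virtio_cread(vdev, struct virtio_net_config, duplex, &duplex);

            /* Only accept values ethtool considers sane; the user can still
             * override them later with 'ethtool -s'. */
            if (ethtool_validate_speed(speed))
                    vi->speed = speed;
            if (ethtool_validate_duplex(duplex))
                    vi->duplex = duplex;
    }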
2018-01-06  virtio_net: setup xdp_rxq_info  (Jesper Dangaard Brouer, 1 file, -1/+13)
The virtio_net driver doesn't dynamically change the RX-ring queue layout and backing pages, but instead rejects XDP setup if all the conditions for XDP are not met. Thus, the xdp_rxq_info also remains fairly static. This allows us to simply add the reg/unreg to the net_device open/close functions.

Driver hook points for xdp_rxq_info:
 * reg  : virtnet_open
 * unreg: virtnet_close

V3:
 - bugfix, also setup xdp.rxq in receive_mergeable()
 - Tested bpf-sample prog inside guest on a virtio_net device

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
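A minimal sketch of the open/close hook points (illustrative; the receive-queue struct layout is assumed, and NAPI enable/disable plus cleanup on partial failure are elided):

    static int virtnet_open(struct net_device *dev)
    {
            struct virtnet_info *vi = netdev_priv(dev);
            int i, err;

            for (i = 0; i < vi->max_queue_pairs; i++) {
                    /* One xdp_rxq_info per RX queue, registered for the
                     * lifetime of the open device. */
                    err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
                    if (err < 0)
                            return err;
                    /* ... then refill buffers and enable NAPI as before ... */
            }
            return 0;
    }

    static int virtnet_close(struct net_device *dev)
    {
            struct virtnet_info *vi = netdev_priv(dev);
            int i;

            for (i = 0; i < vi->max_queue_pairs; i++) {
                    /* ... disable NAPI first, then drop the registration ... */
                    xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
            }
            return 0;
    }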
2017-12-10  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller, 1 file, -1/+1)
Conflict was two parallel additions of include files to sch_generic.c, no biggie. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08  Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost  (Linus Torvalds, 1 file, -1/+1)
Pull virtio bugfixes from Michael Tsirkin:
 "A couple of minor bugfixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_net: fix return value check in receive_mergeable()
  virtio_mmio: add cleanup for virtio_mmio_remove
  virtio_mmio: add cleanup for virtio_mmio_probe
2017-12-08  virtio_net: Disable interrupts if napi_complete_done rescheduled napi  (Toshiaki Makita, 1 file, -3/+6)
Since commit 39e6c8208d7b ("net: solve a NAPI race") napi has been able to be rescheduled within napi_complete_done() even in the non-busypoll case, but virtnet_poll() always enabled interrupts before completing, and when napi was rescheduled within napi_complete_done() it did not disable interrupts. This caused more interrupts when event idx is disabled.

According to commit cbdadbbf0c79 ("virtio_net: fix race in RX VQ processing") we cannot place virtqueue_enable_cb_prepare() after NAPI_STATE_SCHED is cleared, so disable interrupts again if napi_complete_done() returned false.

Tested with vhost-user of OVS 2.7 on host, which does not have the event idx feature.

* Before patch:

    $ netperf -t UDP_STREAM -H 192.168.150.253 -l 60 -- -m 1472
    MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.150.253 () port 0 AF_INET
    Socket  Message  Elapsed      Messages
    Size    Size     Time         Okay Errors   Throughput
    bytes   bytes    secs            #      #   10^6bits/sec

    212992    1472   60.00     32763206      0    6430.32
    212992           60.00     23384299           4589.56

    Interrupts on guest: 9872369
    Packets/interrupt:   2.37

* After patch:

    $ netperf -t UDP_STREAM -H 192.168.150.253 -l 60 -- -m 1472
    MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.150.253 () port 0 AF_INET
    Socket  Message  Elapsed      Messages
    Size    Size     Time         Okay Errors   Throughput
    bytes   bytes    secs            #      #   10^6bits/sec

    212992    1472   60.00     32794646      0    6436.49
    212992           60.00     32793501           6436.27

    Interrupts on guest: 4941299
    Packets/interrupt:   6.64

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
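A sketch of the resulting completion pattern (illustrative; this stands in for the driver's local completion helper):

    static void virtqueue_napi_complete(struct napi_struct *napi,
                                        struct virtqueue *vq, int processed)
    {
            int opaque;

            /* Arm the interrupt before NAPI_STATE_SCHED is cleared ... */
            opaque = virtqueue_enable_cb_prepare(vq);
            if (napi_complete_done(napi, processed)) {
                    /* ... and re-poll once to close the race window. */
                    if (unlikely(virtqueue_poll(vq, opaque)))
                            napi_schedule(napi);
            } else {
                    /* napi was rescheduled: keep interrupts disabled. */
                    virtqueue_disable_cb(vq);
            }
    }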
2017-12-07  virtio_net: fix return value check in receive_mergeable()  (Yunjian Wang, 1 file, -1/+1)
The function virtqueue_get_buf_ctx() could return NULL; the return value 'buf' needs to be checked against NULL, not the value 'ctx'.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
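In other words, the NULL check belongs on the buffer returned by the call, not on the out-parameter; roughly (a sketch, not the exact hunk):

    void *buf;
    void *ctx;
    unsigned int len;

    buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
    if (unlikely(!buf))             /* was mistakenly checking !ctx */
            goto err_buf;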
2017-11-16  Merge branch 'akpm' (patches from Andrew)  (Linus Torvalds, 1 file, -1/+0)
Merge updates from Andrew Morton:

 - a few misc bits
 - ocfs2 updates
 - almost all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
  memory hotplug: fix comments when adding section
  mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
  mm: simplify nodemask printing
  mm,oom_reaper: remove pointless kthread_run() error check
  mm/page_ext.c: check if page_ext is not prepared
  writeback: remove unused function parameter
  mm: do not rely on preempt_count in print_vma_addr
  mm, sparse: do not swamp log with huge vmemmap allocation failures
  mm/hmm: remove redundant variable align_end
  mm/list_lru.c: mark expected switch fall-through
  mm/shmem.c: mark expected switch fall-through
  mm/page_alloc.c: broken deferred calculation
  mm: don't warn about allocations which stall for too long
  fs: fuse: account fuse_inode slab memory as reclaimable
  mm, page_alloc: fix potential false positive in __zone_watermark_ok
  mm: mlock: remove lru_add_drain_all()
  mm, sysctl: make NUMA stats configurable
  shmem: convert shmem_init_inodecache() to void
  Unify migrate_pages and move_pages access checks
  mm, pagevec: rename pagevec drained field
  ...
2017-11-16  mm: remove __GFP_COLD  (Mel Gorman, 1 file, -1/+0)
As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advantage of. Judging from the users of __GFP_COLD, it is likely that a number of them are the result of copying other sites instead of actually measuring the impact. Remove the __GFP_COLD parameter which simplifies a number of paths in the page allocator.

This is potentially controversial but bear in mind that the size of the per-cpu pagelists versus modern cache sizes means that the whole per-cpu list can often fit in the L3 cache. Hence, there is only a potential benefit for microbenchmarks that alloc/free pages in a tight loop. It's even worse when THP is taken into account, which has little or no chance of getting a cache-hot page as the per-cpu list is bypassed and the zeroing of multiple pages will thrash the cache anyway.

The truncate microbenchmarks are not shown as this patch affects the allocation path and not the free path. A page fault microbenchmark was tested but it showed no significant difference, which is not surprising given that the __GFP_COLD branches are a minuscule percentage of the fault path.

Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-05  net: bpf: rename ndo_xdp to ndo_bpf  (Jakub Kicinski, 1 file, -2/+2)
ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Rename the callback and associated structures and definitions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26  bpf: add meta pointer for direct access  (Daniel Borkmann, 1 file, -0/+2)
This work enables generic transfer of metadata from XDP into skb. The basic idea is that we can make use of the fact that the resulting skb must be linear and already comes with a larger headroom for supporting bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work on a similar principle and introduce a small helper bpf_xdp_adjust_meta() for adjusting a new pointer called xdp->data_meta. Thus, the packet has a flexible and programmable room for meta data, followed by the actual packet data.

struct xdp_buff is therefore laid out such that we first point to data_hard_start, then data_meta directly prepended to data, followed by data_end marking the end of packet. bpf_xdp_adjust_head() takes into account whether we have meta data already prepended and if so, memmove()s this along with the given offset provided there's enough room.

xdp->data_meta is optional and programs are not required to use it. The rationale is that when we process the packet in XDP (e.g. as DoS filter), we can push further meta data along with it for the XDP_PASS case, and give the guarantee that a clsact ingress BPF program on the same device can pick this up for further post-processing. Since we work with skb there, we can also set skb->mark, skb->priority or other skb meta data out of BPF. Having this scratch space generic and programmable allows for more flexibility than defining a direct 1:1 transfer of potentially new XDP members into skb (it's also more efficient as we don't need to initialize/handle each of such new members). The facility also works together with GRO aggregation. The scratch space at the head of the packet can be a multiple of 4 bytes, up to 32 bytes large.

Drivers not yet supporting xdp->data_meta can simply be set up with xdp->data_meta as xdp->data + 1, as bpf_xdp_adjust_meta() will detect this and bail out, such that the subsequent match against xdp->data for later access is guaranteed to fail.

The verifier treats xdp->data_meta/xdp->data the same way as we treat xdp->data/xdp->data_end pointer comparisons. The requirement for doing the compare against xdp->data is that it hasn't been modified from its original address we got from ctx access. It may have a range marking already from prior successful xdp->data/xdp->data_end pointer comparisons though.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
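A small illustrative XDP program using the new helper (a sketch assuming libbpf-style headers; the metadata struct layout is the program author's choice, not something defined by the kernel):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct meta {                        /* scratch data handed to the skb layer */
            __u32 mark;
    };

    SEC("xdp")
    int xdp_push_meta(struct xdp_md *ctx)
    {
            struct meta *m;

            /* Grow the metadata area in front of the packet data. */
            if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*m)))
                    return XDP_PASS;

            m = (void *)(long)ctx->data_meta;
            /* The verifier requires the bounds check against data. */
            if ((void *)(m + 1) > (void *)(long)ctx->data)
                    return XDP_PASS;

            m->mark = 42;                /* a clsact program can read this later */
            return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";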
2017-09-23  virtio-net: correctly set xdp_xmit for mergeable buffer  (Jason Wang, 1 file, -1/+1)
We should set xdp_xmit only when xdp_do_redirect() succeeds.

Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21  virtio-net: support XDP_REDIRECT  (Jason Wang, 1 file, -15/+62)
This patch tries to add XDP_REDIRECT for virtio-net. The changes are not complex as we could use the existing XDP_TX helpers for most of the work. The rest is passing the XDP_TX work down to the NAPI handler for implementing batching.

Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21  virtio-net: add packet len average only when needed during XDP  (Jason Wang, 1 file, -3/+3)
There's no need to add packet len average in the case of XDP_PASS since it will be done soon after skb is created. Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21  virtio-net: remove unnecessary parameter of virtnet_xdp_xmit()  (Jason Wang, 1 file, -3/+2)
CC: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-02  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller, 1 file, -1/+1)
Three cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24  virtio_net: be drop monitor friendly  (Eric Dumazet, 1 file, -1/+1)
This change is needed to not fool drop monitor. (perf record ... -e skb:kfree_skb ) Packets were properly sent and are consumed after TX completion. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
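The idea, roughly, in the TX-completion path (a sketch; the surrounding bookkeeping is simplified):

    /* Packets reclaimed here were transmitted successfully, so signal a
     * "consume" rather than a "drop" to the skb:kfree_skb tracepoint. */
    while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
            bytes += skb->len;
            packets++;
            dev_consume_skb_any(skb);    /* was dev_kfree_skb_any(skb) */
    }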
2017-08-19  net: drop unused attribute argument from sysfs queue funcs  (stephen hemminger, 1 file, -1/+1)
The show and store functions don't need/use the attribute. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16  virtio: put paren around sizeof  (stephen hemminger, 1 file, -1/+1)
Kernel coding style is to put paren around operand of sizeof. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-14  virtio-net: make array guest_offloads static  (Colin Ian King, 1 file, -4/+6)
The array guest_offloads is local to the source and does not need to be in global scope, so make it static. Also tweak formatting. Cleans up sparse warnings: symbol 'guest_offloads' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller, 1 file, -4/+3)
Two minor conflicts in virtio_net driver (bug fix overlapping addition of a helper) and MAINTAINERS (new driver edit overlapping revamp of PHY entry). Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (Linus Torvalds, 1 file, -3/+2)
Pull networking fixes from David Miller:

 1) Handle notifier registry failures properly in tun/tap driver, from Tonghao Zhang.
 2) Fix bpf verifier handling of subtraction bounds and add a testcase for this, from Edward Cree.
 3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt.
 4) Fix use after free in prb_retire_rx_blk_timer_expired() in AF_PACKET, from Cong Wang.
 5) Fix SElinux regression due to recent UDP optimizations, from Paolo Abeni.
 6) We accidentally increment IPSTATS_MIB_FRAGFAILS in the ipv6 code paths, fix from Stefano Brivio.
 7) Fix some mem leaks in dccp, from Xin Long.
 8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd Bergmann.
 9) Mac address length check and buffer size fixes from Cong Wang.
10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni.
11) Fix return value when copy_from_user() fails in bpf_prog_get_info_by_fd(), from Daniel Borkmann.
12) Handle PHY_HALTED properly in phy library state machine, from Florian Fainelli.
13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel.
14) Fix truesize calculation in virtio_net which led to performance regressions, from Michael S Tsirkin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  samples/bpf: fix bpf tunnel cleanup
  udp6: fix jumbogram reception
  ppp: Fix a scheduling-while-atomic bug in del_chan
  Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
  virtio_net: fix truesize for mergeable buffers
  mv643xx_eth: fix of_irq_to_resource() error check
  MAINTAINERS: Add more files to the PHY LIBRARY section
  ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
  net: phy: Correctly process PHY_HALTED in phy_stop_machine()
  sunhme: fix up GREG_STAT and GREG_IMASK register offsets
  bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
  tcp: avoid bogus gcc-7 array-bounds warning
  net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
  bpf: don't indicate success when copy_from_user fails
  udp6: fix socket leak on early demux
  net: thunderx: Fix BGX transmit stall due to underflow
  Revert "vhost: cache used event for better performance"
  team: use a larger struct for mac address
  net: check dev->addr_len for dev_set_mac_address()
  phy: bcm-ns-usb3: fix MDIO_BUS dependency
  ...
2017-08-01  virtio_net: fix truesize for mergeable buffers  (Michael S. Tsirkin, 1 file, -3/+2)
Seth Forshee noticed a performance degradation with some workloads. This turns out to be due to packet drops. Euan Kemp noticed that this is because we drop all packets where length exceeds the truesize, but for some packets we add in extra memory without updating the truesize. This in turn was kept around unchanged from ab7db91705e95 ("virtio-net: auto-tune mergeable rx buffer size for improved performance"). That commit had an internal reason not to account for the extra space: not enough bits to do it. No longer true so let's account for the allocated length exactly. Many thanks to Seth Forshee for the report and bisecting and Euan Kemp for debugging the issue. Fixes: 680557cf79f8 ("virtio_net: rework mergeable buffer handling") Reported-by: Euan Kemp <euan.kemp@coreos.com> Tested-by: Euan Kemp <euan.kemp@coreos.com> Reported-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-26  virtio-net: mark PM functions as __maybe_unused  (Arnd Bergmann, 1 file, -4/+2)
After removing the reset function, the freeze and restore functions are now unused when CONFIG_PM_SLEEP is disabled:

    drivers/net/virtio_net.c:1881:12: error: 'virtnet_restore_up' defined but not used [-Werror=unused-function]
     static int virtnet_restore_up(struct virtio_device *vdev)
    drivers/net/virtio_net.c:1859:13: error: 'virtnet_freeze_down' defined but not used [-Werror=unused-function]
     static void virtnet_freeze_down(struct virtio_device *vdev)

A more robust way to do this is to remove the #ifdef around the callers and instead mark them as __maybe_unused. The compiler will now just silently drop the unused code.

Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
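The general pattern (a generic sketch, not the exact diff):

    /* Before: the helper only compiles when CONFIG_PM_SLEEP is set, so any
     * caller that is built unconditionally breaks the !PM_SLEEP build. */
    #ifdef CONFIG_PM_SLEEP
    static void virtnet_freeze_down(struct virtio_device *vdev);
    #endif

    /* After: always compiled; if the (PM-only) callers are compiled out,
     * the compiler drops the function silently instead of warning. */
    static void __maybe_unused virtnet_freeze_down(struct virtio_device *vdev);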
2017-07-25  virtio-net: fix module unloading  (Andrew Jones, 1 file, -1/+1)
Unregister the driver before removing multi-instance hotplug callbacks. This order avoids the warning issued from __cpuhp_remove_state_cpuslocked when the number of remaining instances isn't yet zero. Fixes: 8017c279196a ("net/virtio-net: Convert to hotplug state machine") Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-07-24  virtio-net: switch off offloads on demand if possible on XDP set  (Jason Wang, 1 file, -5/+65)
Current XDP implementation wants guest offloads feature to be disabled on device. This is inconvenient and means guest can't benefit from offloads if XDP is not used. This patch tries to address this limitation by disabling the offloads on demand through control guest offloads. Guest offloads will be disabled and enabled on demand on XDP set. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
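Roughly, toggling offloads goes through the control virtqueue; a sketch of the idea (virtnet_send_command() and the vi->ctrl_offloads field are assumed driver internals here, and the exact flow in the patch differs):

    static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
    {
            struct scatterlist sg;

            vi->ctrl_offloads = cpu_to_virtio64(vi->vdev, offloads);
            sg_init_one(&sg, &vi->ctrl_offloads, sizeof(vi->ctrl_offloads));

            /* Ask the device to enable exactly this set of guest offloads,
             * e.g. with the TSO/csum bits cleared while XDP is attached. */
            if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_GUEST_OFFLOADS,
                                      VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET, &sg))
                    return -EINVAL;
            return 0;
    }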
2017-07-24  virtio-net: do not reset during XDP set  (Jason Wang, 1 file, -126/+106)
We currently reset the device during XDP set, the main reason is that we allocate more headroom with XDP (for header adjustment). This works but causes network downtime for users. Previous patches encoded the headroom in the buffer context, this makes it possible to detect the case where a buffer with headroom insufficient for XDP is added to the queue and XDP is enabled afterwards. Upon detection, we handle this case by copying the packet (slow, but it's a temporary condition). Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24  virtio-net: switch to use new ctx API for small buffer  (Jason Wang, 1 file, -5/+12)
Use ctx API to store headroom for small buffers. Following patches will retrieve this info and use it for XDP. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24  virtio-net: pack headroom into ctx for mergeable buffers  (Jason Wang, 1 file, -5/+24)
Pack headroom into ctx - this way when we get a buffer we can figure out the actual headroom that was allocated for the buffer. Will be helpful to optimize switching between XDP and non-XDP modes which have different headroom requirements. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
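The trick is that the per-buffer context is just an opaque pointer-sized value, so two small integers can be packed into it; a simplified sketch of the encode/decode (the bit widths here are illustrative, not the driver's exact constants):

    /* Pack truesize (low bits) and headroom (high bits) into one value that
     * is handed to virtqueue_add_inbuf_ctx() as the buffer context. */
    #define CTX_TRUESIZE_BITS  16

    static void *len_to_ctx(unsigned int truesize, unsigned int headroom)
    {
            return (void *)(unsigned long)((headroom << CTX_TRUESIZE_BITS) |
                                           truesize);
    }

    static unsigned int ctx_to_truesize(void *ctx)
    {
            return (unsigned long)ctx & ((1UL << CTX_TRUESIZE_BITS) - 1);
    }

    static unsigned int ctx_to_headroom(void *ctx)
    {
            return (unsigned long)ctx >> CTX_TRUESIZE_BITS;
    }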
2017-07-17  virtio_net: Remove references to NETIF_F_UFO.  (David S. Miller, 1 file, -4/+2)
It is going away. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-07  virtio-net: fix leaking of ctx array  (Jason Wang, 1 file, -0/+1)
Fixes: commit d45b897b11ea ("virtio_net: allow specifying context for rx") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller, 1 file, -0/+1)
A set of overlapping changes in macvlan and the rocker driver, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29  virtio-net: serialize tx routine during reset  (Jason Wang, 1 file, -0/+1)
We don't hold any tx lock when trying to disable TX during reset, which would lead to a use after free since ndo_start_xmit() tries to access the virtqueue which has already been freed. Fix this by using netif_tx_disable() before freeing the vqs; this makes sure no tx happens after the vqs are freed.

Reported-by: Jean-Philippe Menil <jpmenil@gmail.com>
Tested-by: Jean-Philippe Menil <jpmenil@gmail.com>
Fixes: f600b6905015 ("virtio_net: Add XDP support")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Robert McCabe <robert.mccabe@rockwellcollins.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16  bpf: virtio_net: Report bpf_prog ID during XDP_QUERY_PROG  (Martin KaFai Lau, 1 file, -5/+8)
Add support to virtio_net to report bpf_prog ID during XDP_QUERY_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16  networking: introduce and use skb_put_data()  (Johannes Berg, 1 file, -1/+1)
A common pattern with skb_put() is to just want to memcpy() some data into the new space, introduce skb_put_data() for this.

An spatch similar to the one for skb_put_zero() converts many of the places using it:

    @@
    identifier p, p2;
    expression len, skb, data;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_data(skb, data, len);
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, len);
    |
    -memcpy(p, data, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb, data;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_data(skb, data, sizeof(t));
    )
    (
    p2 = (t2)p;
    -memcpy(p2, data, sizeof(*p));
    |
    -memcpy(p, data, sizeof(*p));
    )

    @@
    expression skb, len, data;
    @@
    -memcpy(skb_put(skb, len), data, len);
    +skb_put_data(skb, data, len);

(again, manually post-processed to retain some comments)

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
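In driver terms the conversion is simply (an illustrative before/after):

    /* Before: reserve tail room, then copy into it. */
    hdr = skb_put(skb, hdr_len);
    memcpy(hdr, p, hdr_len);

    /* After: one call appends the data and returns the new tail pointer. */
    hdr = skb_put_data(skb, p, hdr_len);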
2017-06-07  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller, 1 file, -2/+3)
Just some simple overlapping changes in marvell PHY driver and the DSA core code. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05  virtio_net: check return value of skb_to_sgvec always  (Jason A. Donenfeld, 1 file, -2/+7)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
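The pattern being enforced, roughly, in the xmit path (a sketch with simplified error handling):

    int num_sg;

    num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
    if (unlikely(num_sg < 0))
            return num_sg;      /* previously the negative return was ignored */
    num_sg++;                   /* account for the virtio-net header sg entry */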
2017-06-02  virtio_net: lower limit on buffer size  (Michael S. Tsirkin, 1 file, -2/+3)
commit d85b758f72b0 ("virtio_net: fix support for small rings") was supposed to increase the buffer size for small rings but had an unintentional side effect of decreasing it for large rings. This seems to break some setups - it's not yet clear why, but increasing buffer size back to what it was before helps. Fixes: d85b758f72b0 ("virtio_net: fix support for small rings") Reported-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: "J. Bruce Fields" <bfields@fieldses.org> Tested-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-24  virtio-net: enable TSO/checksum offloads for Q-in-Q vlans  (Vlad Yasevich, 1 file, -0/+1)
Since virtio does not provide its own ndo_features_check handler, TSO, and now checksum offload, are disabled for stacked vlans. Re-enable the support and let the host take care of it. This restores/improves Guest-to-Guest performance over Q-in-Q vlans.

Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-10  Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost  (Linus Torvalds, 1 file, -64/+83)
Pull virtio updates from Michael Tsirkin:
 "Fixes, cleanups, performance

  A bunch of changes to virtio, most affecting virtio net. Also ptr_ring batched zeroing - first of batching enhancements that seems ready."

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  s390/virtio: change maintainership
  tools/virtio: fix spelling mistake: "wakeus" -> "wakeups"
  virtio_net: tidy a couple debug statements
  ptr_ring: support testing different batching sizes
  ringtest: support test specific parameters
  ptr_ring: batch ring zeroing
  virtio: virtio_driver doc
  virtio_net: don't reset twice on XDP on/off
  virtio_net: fix support for small rings
  virtio_net: reduce alignment for buffers
  virtio_net: rework mergeable buffer handling
  virtio_net: allow specifying context for rx
  virtio: allow extra context per descriptor
  tools/virtio: fix build breakage
  virtio: add context flag to find vqs
  virtio: wrap find_vqs
  ringtest: fix an assert statement
2017-05-09  virtio_net: tidy a couple debug statements  (Dan Carpenter, 1 file, -2/+2)
We are printing a decimal value for truesize so we shouldn't use an "0x" prefix. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09  virtio_net: don't reset twice on XDP on/off  (Michael S. Tsirkin, 1 file, -1/+0)
We already do a reset once in remove_vq_common - there appears to be no point in doing another one when we add/remove XDP. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09  virtio_net: fix support for small rings  (Michael S. Tsirkin, 1 file, -4/+26)
When ring size is small (<32 entries) making buffers smaller means a full ring might not be able to hold enough buffers to fit a single large packet. Make sure a ring full of buffers is large enough to allow at least one packet of max size. Fixes: 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag allocators") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
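The idea can be expressed as a lower bound on the per-buffer length, so that a completely full ring still holds one maximum-size packet; a sketch of such a helper (constants and exact bounds here are illustrative, not the patch's precise formula):

    static unsigned int mergeable_min_buf_len(struct virtnet_info *vi,
                                              struct virtqueue *vq)
    {
            const unsigned int hdr_len = vi->hdr_len;
            unsigned int rq_size = virtqueue_get_vring_size(vq);
            unsigned int packet_len = IP_MAX_MTU;    /* worst-case payload */
            unsigned int buf_len = hdr_len + ETH_HLEN + VLAN_HLEN + packet_len;

            /* Spread one max-size packet across a full ring of buffers,
             * and never go below the space needed for the header itself. */
            return max(DIV_ROUND_UP(buf_len, rq_size), hdr_len);
    }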
2017-05-09  virtio_net: reduce alignment for buffers  (Michael S. Tsirkin, 1 file, -12/+1)
We don't need to align length to any particular value anymore. Aligning to L1 cache size probably still makes sense to reduce false sharing.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09  virtio_net: rework mergeable buffer handling  (Michael S. Tsirkin, 1 file, -46/+43)
Use the new _ctx virtio API to maintain true length for each buffer. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-09  virtio_net: allow specifying context for rx  (Michael S. Tsirkin, 1 file, -1/+14)
With mergeable buffers we never use s/g for rx, so allow specifying context in that case. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-03  xdp: use common helper for netlink extended ack reporting  (Daniel Borkmann, 1 file, -4/+4)
Small follow-up to d74a32acd59a ("xdp: use netlink extended ACK reporting") in order to let drivers all use the same NL_SET_ERR_MSG_MOD() helper macro for reporting. This also ensures that we consistently add the driver's prefix for dumping the report in user space to indicate that the error message is driver specific and not coming from core code. Furthermore, NL_SET_ERR_MSG_MOD() now reuses NL_SET_ERR_MSG() and thus makes all macros check the pointer as suggested. References: https://www.spinics.net/lists/netdev/msg433267.html Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
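A sketch of how such a check reads in a driver's XDP setup path (the surrounding function shape and the message text are illustrative):

    static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
                               struct netlink_ext_ack *extack)
    {
            struct virtnet_info *vi = netdev_priv(dev);

            if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4)) {
                    /* NL_SET_ERR_MSG_MOD prefixes KBUILD_MODNAME, so user space
                     * sees "virtio_net: Can't set XDP ..." via extended ack. */
                    NL_SET_ERR_MSG_MOD(extack,
                                       "Can't set XDP while host is implementing LRO, disable LRO first");
                    return -EOPNOTSUPP;
            }

            return 0;    /* rest of the setup elided */
    }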
2017-05-02  virtio: wrap find_vqs  (Michael S. Tsirkin, 1 file, -2/+1)
We are going to add more parameters to find_vqs, let's wrap the call so we don't need to tweak all drivers every time. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-05-01  virtio_net: make use of extended ack message reporting  (Jakub Kicinski, 1 file, -4/+7)
Try to carry error messages to the user via the netlink extended ack message attribute. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01  virtio-net: use netif_tx_napi_add for tx napi  (Willem de Bruijn, 1 file, -2/+2)
Avoid hashing the tx napi struct into napi_hash[], which is used for busy polling receive queues. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
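The distinction in the driver's init path looks roughly like this (a sketch; the poll callbacks and weights are the driver's own names):

    /* RX queues: regular NAPI, added to napi_hash[] so busy polling can
     * find them. */
    netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll, napi_weight);

    /* TX queues: TX-only NAPI, kept out of napi_hash[] so busy polling
     * never lands on a transmit queue. */
    netif_tx_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
                      napi_tx_weight);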