<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/vhost/net.c, branch v5.2.16</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.2.16</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.2.16'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2019-07-26T07:10:29+00:00</updated>
<entry>
<title>vhost_net: disable zerocopy by default</title>
<updated>2019-07-26T07:10:29+00:00</updated>
<author>
<name>Jason Wang</name>
<email>jasowang@redhat.com</email>
</author>
<published>2019-06-17T09:20:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9d28094f3cf3b315e784f04ba617de8c8d8978fa'/>
<id>urn:sha1:9d28094f3cf3b315e784f04ba617de8c8d8978fa</id>
<content type='text'>
[ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ]

Vhost_net was known to suffer from HOL[1] issues which is not easy to
fix. Several downstream disable the feature by default. What's more,
the datapath was split and datacopy path got the support of batching
and XDP support recently which makes it faster than zerocopy part for
small packets transmission.

It looks to me that disable zerocopy by default is more
appropriate. It cold be enabled by default again in the future if we
fix the above issues.

[1] https://patchwork.kernel.org/patch/3787671/

Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Acked-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482</title>
<updated>2019-06-19T15:09:52+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-06-04T08:11:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7a338472f225f18694cc4d9ad1f6f9428f4a88d0'/>
<id>urn:sha1:7a338472f225f18694cc4d9ad1f6f9428f4a88d0</id>
<content type='text'>
Based on 1 normalized pattern(s):

  this work is licensed under the terms of the gnu gpl version 2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 48 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Reviewed-by: Enrico Weigelt &lt;info@metux.net&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081204.624030236@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>vhost_net: fix possible infinite loop</title>
<updated>2019-05-27T15:08:22+00:00</updated>
<author>
<name>Jason Wang</name>
<email>jasowang@redhat.com</email>
</author>
<published>2019-05-17T04:29:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e2412c07f8f3040593dfb88207865a3cd58680c0'/>
<id>urn:sha1:e2412c07f8f3040593dfb88207865a3cd58680c0</id>
<content type='text'>
When the rx buffer is too small for a packet, we will discard the vq
descriptor and retry it for the next packet:

while ((sock_len = vhost_net_rx_peek_head_len(net, sock-&gt;sk,
					      &amp;busyloop_intr))) {
...
	/* On overrun, truncate and discard */
	if (unlikely(headcount &gt; UIO_MAXIOV)) {
		iov_iter_init(&amp;msg.msg_iter, READ, vq-&gt;iov, 1, 1);
		err = sock-&gt;ops-&gt;recvmsg(sock, &amp;msg,
					 1, MSG_DONTWAIT | MSG_TRUNC);
		pr_debug("Discarded rx packet: len %zd\n", sock_len);
		continue;
	}
...
}

This makes it possible to trigger a infinite while..continue loop
through the co-opreation of two VMs like:

1) Malicious VM1 allocate 1 byte rx buffer and try to slow down the
   vhost process as much as possible e.g using indirect descriptors or
   other.
2) Malicious VM2 generate packets to VM1 as fast as possible

Fixing this by checking against weight at the end of RX and TX
loop. This also eliminate other similar cases when:

- userspace is consuming the packets in the meanwhile
- theoretical TOCTOU attack if guest moving avail index back and forth
  to hit the continue after vhost find guest just add new buffers

This addresses CVE-2019-3900.

Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short")
Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Reviewed-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
</content>
</entry>
<entry>
<title>vhost: introduce vhost_exceeds_weight()</title>
<updated>2019-05-27T15:08:22+00:00</updated>
<author>
<name>Jason Wang</name>
<email>jasowang@redhat.com</email>
</author>
<published>2019-05-17T04:29:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e82b9b0727ff6d665fff2d326162b460dded554d'/>
<id>urn:sha1:e82b9b0727ff6d665fff2d326162b460dded554d</id>
<content type='text'>
We used to have vhost_exceeds_weight() for vhost-net to:

- prevent vhost kthread from hogging the cpu
- balance the time spent between TX and RX

This function could be useful for vsock and scsi as well. So move it
to vhost.c. Device must specify a weight which counts the number of
requests, or it can also specific a byte_weight which counts the
number of bytes that has been processed.

Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Reviewed-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
</content>
</entry>
<entry>
<title>vhost: fix OOB in get_rx_bufs()</title>
<updated>2019-01-29T06:53:09+00:00</updated>
<author>
<name>Jason Wang</name>
<email>jasowang@redhat.com</email>
</author>
<published>2019-01-28T07:05:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b46a0bf78ad7b150ef5910da83859f7f5a514ffd'/>
<id>urn:sha1:b46a0bf78ad7b150ef5910da83859f7f5a514ffd</id>
<content type='text'>
After batched used ring updating was introduced in commit e2b3b35eb989
("vhost_net: batch used ring update in rx"). We tend to batch heads in
vq-&gt;heads for more than one packet. But the quota passed to
get_rx_bufs() was not correctly limited, which can result a OOB write
in vq-&gt;heads.

        headcount = get_rx_bufs(vq, vq-&gt;heads + nvq-&gt;done_idx,
                    vhost_len, &amp;in, vq_log, &amp;log,
                    likely(mergeable) ? UIO_MAXIOV : 1);

UIO_MAXIOV was still used which is wrong since we could have batched
used in vq-&gt;heads, this will cause OOB if the next buffer needs more
than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've
batched 64 (VHOST_NET_BATCH) heads:
Acked-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;

=============================================================================
BUG kmalloc-8k (Tainted: G    B            ): Redzone overwritten
-----------------------------------------------------------------------------

INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc
INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674
    kmem_cache_alloc_trace+0xbb/0x140
    alloc_pd+0x22/0x60
    gen8_ppgtt_create+0x11d/0x5f0
    i915_ppgtt_create+0x16/0x80
    i915_gem_create_context+0x248/0x390
    i915_gem_context_create_ioctl+0x4b/0xe0
    drm_ioctl_kernel+0xa5/0xf0
    drm_ioctl+0x2ed/0x3a0
    do_vfs_ioctl+0x9f/0x620
    ksys_ioctl+0x6b/0x80
    __x64_sys_ioctl+0x11/0x20
    do_syscall_64+0x43/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x          (null) flags=0x200000000010201
INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b

Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
vhost-net. This is done through set the limitation through
vhost_dev_init(), then set_owner can allocate the number of iov in a
per device manner.

This fixes CVE-2018-16880.

Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vhost: log dirty page correctly</title>
<updated>2019-01-18T05:43:24+00:00</updated>
<author>
<name>Jason Wang</name>
<email>jasowang@redhat.com</email>
</author>
<published>2019-01-16T08:54:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cc5e710759470bc7f3c61d11fd54586f15fdbdf4'/>
<id>urn:sha1:cc5e710759470bc7f3c61d11fd54586f15fdbdf4</id>
<content type='text'>
Vhost dirty page logging API is designed to sync through GPA. But we
try to log GIOVA when device IOTLB is enabled. This is wrong and may
lead to missing data after migration.

To solve this issue, when logging with device IOTLB enabled, we will:

1) reuse the device IOTLB translation result of GIOVA-&gt;HVA mapping to
   get HVA, for writable descriptor, get HVA through iovec. For used
   ring update, translate its GIOVA to HVA
2) traverse the GPA-&gt;HVA mapping to get the possible GPA and log
   through GPA. Pay attention this reverse mapping is not guaranteed
   to be unique, so we should log each possible GPA in this case.

This fix the failure of scp to guest during migration. In -next, we
will probably support passing GIOVA-&gt;GPA instead of GIOVA-&gt;HVA.

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Reported-by: Jintack Lim &lt;jintack@cs.columbia.edu&gt;
Cc: Jintack Lim &lt;jintack@cs.columbia.edu&gt;
Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Acked-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next</title>
<updated>2018-12-27T21:04:52+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-12-27T21:04:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e0c38a4d1f196a4b17d2eba36afff8f656a4f1de'/>
<id>urn:sha1:e0c38a4d1f196a4b17d2eba36afff8f656a4f1de</id>
<content type='text'>
Pull networking updates from David Miller:

 1) New ipset extensions for matching on destination MAC addresses, from
    Stefano Brivio.

 2) Add ipv4 ttl and tos, plus ipv6 flow label and hop limit offloads to
    nfp driver. From Stefano Brivio.

 3) Implement GRO for plain UDP sockets, from Paolo Abeni.

 4) Lots of work from Michał Mirosław to eliminate the VLAN_TAG_PRESENT
    bit so that we could support the entire vlan_tci value.

 5) Rework the IPSEC policy lookups to better optimize more usecases,
    from Florian Westphal.

 6) Infrastructure changes eliminating direct manipulation of SKB lists
    wherever possible, and to always use the appropriate SKB list
    helpers. This work is still ongoing...

 7) Lots of PHY driver and state machine improvements and
    simplifications, from Heiner Kallweit.

 8) Various TSO deferral refinements, from Eric Dumazet.

 9) Add ntuple filter support to aquantia driver, from Dmitry Bogdanov.

10) Batch dropping of XDP packets in tuntap, from Jason Wang.

11) Lots of cleanups and improvements to the r8169 driver from Heiner
    Kallweit, including support for -&gt;xmit_more. This driver has been
    getting some much needed love since he started working on it.

12) Lots of new forwarding selftests from Petr Machata.

13) Enable VXLAN learning in mlxsw driver, from Ido Schimmel.

14) Packed ring support for virtio, from Tiwei Bie.

15) Add new Aquantia AQtion USB driver, from Dmitry Bezrukov.

16) Add XDP support to dpaa2-eth driver, from Ioana Ciocoi Radulescu.

17) Implement coalescing on TCP backlog queue, from Eric Dumazet.

18) Implement carrier change in tun driver, from Nicolas Dichtel.

19) Support msg_zerocopy in UDP, from Willem de Bruijn.

20) Significantly improve garbage collection of neighbor objects when
    the table has many PERMANENT entries, from David Ahern.

21) Remove egdev usage from nfp and mlx5, and remove the facility
    completely from the tree as it no longer has any users. From Oz
    Shlomo and others.

22) Add a NETDEV_PRE_CHANGEADDR so that drivers can veto the change and
    therefore abort the operation before the commit phase (which is the
    NETDEV_CHANGEADDR event). From Petr Machata.

23) Add indirect call wrappers to avoid retpoline overhead, and use them
    in the GRO code paths. From Paolo Abeni.

24) Add support for netlink FDB get operations, from Roopa Prabhu.

25) Support bloom filter in mlxsw driver, from Nir Dotan.

26) Add SKB extension infrastructure. This consolidates the handling of
    the auxiliary SKB data used by IPSEC and bridge netfilter, and is
    designed to support the needs to MPTCP which could be integrated in
    the future.

27) Lots of XDP TX optimizations in mlx5 from Tariq Toukan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1845 commits)
  net: dccp: fix kernel crash on module load
  drivers/net: appletalk/cops: remove redundant if statement and mask
  bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw
  net/net_namespace: Check the return value of register_pernet_subsys()
  net/netlink_compat: Fix a missing check of nla_parse_nested
  ieee802154: lowpan_header_create check must check daddr
  net/mlx4_core: drop useless LIST_HEAD
  mlxsw: spectrum: drop useless LIST_HEAD
  net/mlx5e: drop useless LIST_HEAD
  iptunnel: Set tun_flags in the iptunnel_metadata_reply from src
  net/mlx5e: fix semicolon.cocci warnings
  staging: octeon: fix build failure with XFRM enabled
  net: Revert recent Spectre-v1 patches.
  can: af_can: Fix Spectre v1 vulnerability
  packet: validate address length if non-zero
  nfc: af_nfc: Fix Spectre v1 vulnerability
  phonet: af_phonet: Fix Spectre v1 vulnerability
  net: core: Fix Spectre v1 vulnerability
  net: minor cleanup in skb_ext_add()
  net: drop the unused helper skb_ext_get()
  ...
</content>
</entry>
<entry>
<title>Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2018-12-26T21:07:19+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-12-26T21:07:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=792bf4d871dea8b69be2aaabdd320d7c6ed15985'/>
<id>urn:sha1:792bf4d871dea8b69be2aaabdd320d7c6ed15985</id>
<content type='text'>
Pull RCU updates from Ingo Molnar:
 "The biggest RCU changes in this cycle were:

   - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

   - Replace calls of RCU-bh and RCU-sched update-side functions to
     their vanilla RCU counterparts. This series is a step towards
     complete removal of the RCU-bh and RCU-sched update-side functions.

     ( Note that some of these conversions are going upstream via their
       respective maintainers. )

   - Documentation updates, including a number of flavor-consolidation
     updates from Joel Fernandes.

   - Miscellaneous fixes.

   - Automate generation of the initrd filesystem used for rcutorture
     testing.

   - Convert spin_is_locked() assertions to instead use lockdep.

     ( Note that some of these conversions are going upstream via their
       respective maintainers. )

   - SRCU updates, especially including a fix from Dennis Krein for a
     bag-on-head-class bug.

   - RCU torture-test updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
  rcutorture: Don't do busted forward-progress testing
  rcutorture: Use 100ms buckets for forward-progress callback histograms
  rcutorture: Recover from OOM during forward-progress tests
  rcutorture: Print forward-progress test age upon failure
  rcutorture: Print time since GP end upon forward-progress failure
  rcutorture: Print histogram of CB invocation at OOM time
  rcutorture: Print GP age upon forward-progress failure
  rcu: Print per-CPU callback counts for forward-progress failures
  rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings
  rcutorture: Dump grace-period diagnostics upon forward-progress OOM
  rcutorture: Prepare for asynchronous access to rcu_fwd_startat
  torture: Remove unnecessary "ret" variables
  rcutorture: Affinity forward-progress test to avoid housekeeping CPUs
  rcutorture: Break up too-long rcu_torture_fwd_prog() function
  rcutorture: Remove cbflood facility
  torture: Bring any extra CPUs online during kernel startup
  rcutorture: Add call_rcu() flooding forward-progress tests
  rcutorture/formal: Replace synchronize_sched() with synchronize_rcu()
  tools/kernel.h: Replace synchronize_sched() with synchronize_rcu()
  net/decnet: Replace rcu_barrier_bh() with rcu_barrier()
  ...
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2018-12-20T19:53:36+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2018-12-20T18:53:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2be09de7d6a06f58e768de1255a687c9aaa66606'/>
<id>urn:sha1:2be09de7d6a06f58e768de1255a687c9aaa66606</id>
<content type='text'>
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.

Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vhost_net: switch to use mutex_trylock() in vhost_net_busy_poll()</title>
<updated>2018-12-13T05:56:19+00:00</updated>
<author>
<name>Jason Wang</name>
<email>jasowang@redhat.com</email>
</author>
<published>2018-12-13T02:53:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=476e8ba712fa6a479192dfc4301cd3765d9ba713'/>
<id>urn:sha1:476e8ba712fa6a479192dfc4301cd3765d9ba713</id>
<content type='text'>
We used to hold the mutex of paired virtqueue in
vhost_net_busy_poll(). But this will results an inconsistent lock
order which may cause deadlock if we try to bring back the protection
of device IOTLB with vq mutex that requires to hold mutex of all
virtqueues at the same time.

Fix this simply by switching to use mutex_trylock(), when fail just
skip the busy polling. This can happen when device IOTLB is under
updating which should be rare.

Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one")
Cc: Tonghao Zhang &lt;xiangxia.m.yue@gmail.com&gt;
Acked-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Signed-off-by: Jason Wang &lt;jasowang@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
