<feed xmlns='http://www.w3.org/2005/Atom'>
<title>starfive-tech/linux.git/virt, branch master</title>
<subtitle>StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)</subtitle>
<id>https://git.radix-linux.su/starfive-tech/linux.git/atom?h=master</id>
<link rel='self' href='https://git.radix-linux.su/starfive-tech/linux.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/'/>
<updated>2021-05-07T10:06:22+00:00</updated>
<entry>
<title>kvm: Cap halt polling at kvm-&gt;max_halt_poll_ns</title>
<updated>2021-05-07T10:06:22+00:00</updated>
<author>
<name>David Matlack</name>
<email>dmatlack@google.com</email>
</author>
<published>2021-05-06T15:24:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=258785ef08b323bddd844b4926a32c2b2045a1b0'/>
<id>urn:sha1:258785ef08b323bddd844b4926a32c2b2045a1b0</id>
<content type='text'>
When growing halt-polling, there is no check that the poll time exceeds
the per-VM limit. It's possible for vcpu-&gt;halt_poll_ns to grow past
kvm-&gt;max_halt_poll_ns and stay there until a halt which takes longer
than kvm-&gt;max_halt_poll_ns.
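
A minimal sketch of the capped grow path, modeled on
grow_halt_poll_ns() in virt/kvm/kvm_main.c (trace bookkeeping elided;
the surrounding code may differ):

	static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
	{
		unsigned int val, grow, grow_start;

		val = vcpu-&gt;halt_poll_ns;
		grow_start = READ_ONCE(halt_poll_ns_grow_start);
		grow = READ_ONCE(halt_poll_ns_grow);
		if (!grow)
			return;

		val *= grow;
		if (val &lt; grow_start)
			val = grow_start;

		/* The fix: never grow past the per-VM limit. */
		if (val &gt; vcpu-&gt;kvm-&gt;max_halt_poll_ns)
			val = vcpu-&gt;kvm-&gt;max_halt_poll_ns;

		vcpu-&gt;halt_poll_ns = val;
	}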

Signed-off-by: David Matlack &lt;dmatlack@google.com&gt;
Signed-off-by: Venkatesh Srinivas &lt;venkateshs@chromium.org&gt;
Message-Id: &lt;20210506152442.4010298-1-venkateshs@chromium.org&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>kvm: exit halt polling on need_resched() as well</title>
<updated>2021-05-03T15:25:36+00:00</updated>
<author>
<name>Benjamin Segall</name>
<email>bsegall@google.com</email>
</author>
<published>2021-04-29T16:22:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=262de4102c7bb8e59f26a967a8ffe8cce85cc537'/>
<id>urn:sha1:262de4102c7bb8e59f26a967a8ffe8cce85cc537</id>
<content type='text'>
single_task_running() is usually a more general condition than
need_resched(), but CFS_BANDWIDTH throttling uses resched_task() even
when there is just one runnable task, in order to get that task to
block. This was causing long need_resched warnings and likely allowed
VMs to overrun their quota while halt polling.
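
A sketch of the adjusted polling loop in kvm_vcpu_block() (simplified;
the ktime bookkeeping around it is assumed):

	do {
		/*
		 * A pending wakeup still ends polling; with this
		 * change, a scheduler resched request (e.g. from
		 * CFS_BANDWIDTH throttling) now ends it as well.
		 */
		if (kvm_vcpu_check_block(vcpu) &lt; 0)
			goto out;
		cur = ktime_get();
	} while (single_task_running() &amp;&amp; !need_resched() &amp;&amp;
		 ktime_before(cur, stop));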

Signed-off-by: Ben Segall &lt;bsegall@google.com&gt;
Signed-off-by: Venkatesh Srinivas &lt;venkateshs@chromium.org&gt;
Message-Id: &lt;20210429162233.116849-1-venkateshs@chromium.org&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: stable@vger.kernel.org
Reviewed-by: Jim Mattson &lt;jmattson@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: Boost vCPU candidate in user mode which is delivering interrupt</title>
<updated>2021-04-21T16:20:03+00:00</updated>
<author>
<name>Wanpeng Li</name>
<email>wanpengli@tencent.com</email>
</author>
<published>2021-04-16T03:08:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=52acd22faa1af8a0514ccd075a6978ac97986425'/>
<id>urn:sha1:52acd22faa1af8a0514ccd075a6978ac97986425</id>
<content type='text'>
Both the lock holder vCPU and a halted IPI receiver are candidates for
boosting. However, the PLE handler was originally designed to deal with
the lock holder preemption problem: Intel PLE occurs when the spinlock
waiter is in kernel mode. That assumption doesn't hold for IPI
receivers, which can be in either kernel or user mode, so a vCPU
candidate in user mode is not boosted even when it should respond to an
IPI. Some benchmarks, such as pbzip2 and swaptions, do TLB shootdowns
in kernel mode while running in user mode most of the time. This can
lead to a long run of continuous PLE events, because the IPI sender
triggers PLE events repeatedly until the receiver is scheduled, yet the
receiver is never a candidate for a boost.

This patch boosts a vCPU candidate in user mode to which an interrupt
is being delivered. We observe pbzip2 running about 10% faster in a
96-vCPU VM in an over-subscribed scenario (the host is a 2-socket,
48-core, 96-thread Intel CLX box). There is no performance regression
for other benchmarks such as Unixbench spawn (which mostly contends a
read/write lock in kernel mode) or ebizzy (which mostly contends a
read/write semaphore and does TLB shootdowns in kernel mode).
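
A sketch of the relaxed candidate filter in kvm_vcpu_on_spin() (the
exact condition ordering in the patch may differ):

	/*
	 * Previously, any vCPU in user mode was skipped when yielding
	 * to kernel mode; now a user-mode vCPU with an interrupt
	 * pending for delivery remains a boost candidate.
	 */
	if (yield_to_kernel_mode &amp;&amp;
	    !kvm_arch_dy_has_pending_interrupt(vcpu) &amp;&amp;
	    !kvm_arch_vcpu_in_kernel(vcpu))
		continue;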

Signed-off-by: Wanpeng Li &lt;wanpengli@tencent.com&gt;
Message-Id: &lt;1618542490-14756-1-git-send-email-wanpengli@tencent.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86: Support KVM VMs sharing SEV context</title>
<updated>2021-04-21T16:20:02+00:00</updated>
<author>
<name>Nathan Tempelman</name>
<email>natet@google.com</email>
</author>
<published>2021-04-08T22:32:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=54526d1fd59338fd6a381dbd806b7ccbae3aa4aa'/>
<id>urn:sha1:54526d1fd59338fd6a381dbd806b7ccbae3aa4aa</id>
<content type='text'>
Add a capability for userspace to mirror SEV encryption context from
one VM to another. On our side, this is intended to support a
Migration Helper vCPU, but it can also be used generically to support
other in-guest workloads scheduled by the host. The intention is for
the primary guest and the mirror to have nearly identical memslots.

The primary benefits of this are that:
1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
can't accidentally clobber each other.
2) The VMs can have different memory views, which is necessary for post-copy
migration (the migration vCPUs on the target need to read and write
pages where the primary guest would VMEXIT).

This does not change the threat model for AMD SEV. Any memory involved
is still owned by the primary guest and its initial state is still
attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
to circumvent SEV, they could achieve the same effect by simply attaching
a vCPU to the primary VM.

This patch deliberately leaves userspace in charge of the memslots for the
mirror, as it already has the power to mess with them in the primary guest.

This patch does not support SEV-ES (much less SNP), as it does not
handle handing off attested VMSAs to the mirror.

For additional context, we need a Migration Helper because SEV PSP
migration is far too slow for our live migration on its own. Using
an in-guest migrator lets us speed this up significantly.
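
A sketch of how userspace might enable the new capability on the
mirror VM (the fd variable names here are hypothetical):

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_VM_COPY_ENC_CONTEXT_FROM,
		.args[0] = primary_vm_fd,  /* VM owning the SEV context */
	};

	if (ioctl(mirror_vm_fd, KVM_ENABLE_CAP, &amp;cap) &lt; 0)
		perror("KVM_ENABLE_CAP(VM_COPY_ENC_CONTEXT_FROM)");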

Signed-off-by: Nathan Tempelman &lt;natet@google.com&gt;
Message-Id: &lt;20210408223214.2582277-1-natet@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Add proper lockdep assertion in I/O bus unregister</title>
<updated>2021-04-20T08:18:51+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2021-04-12T22:20:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=7c896d375565a032705f64804f8c1189df1f7a89'/>
<id>urn:sha1:7c896d375565a032705f64804f8c1189df1f7a89</id>
<content type='text'>
Convert a comment above kvm_io_bus_unregister_dev() into an actual
lockdep assertion, and opportunistically add curly braces to a multi-line
for-loop.
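
A sketch of the conversion (signature and body abbreviated; the lock is
kvm-&gt;slots_lock, per the comment being replaced):

	int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
				      struct kvm_io_device *dev)
	{
		lockdep_assert_held(&amp;kvm-&gt;slots_lock);	/* was only a comment */

		/* ... unregister logic unchanged ... */
	}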

Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20210412222050.876100-4-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Stop looking for coalesced MMIO zones if the bus is destroyed</title>
<updated>2021-04-20T08:18:51+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2021-04-12T22:20:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=5d3c4c79384af06e3c8e25b7770b6247496b4417'/>
<id>urn:sha1:5d3c4c79384af06e3c8e25b7770b6247496b4417</id>
<content type='text'>
Abort the walk of coalesced MMIO zones if kvm_io_bus_unregister_dev()
fails to allocate memory for the new instance of the bus.  If it can't
instantiate a new bus, unregister_dev() destroys all devices _except_ the
target device.  But it doesn't tell the caller that it obliterated the
bus and invoked the destructor for all devices that were on the bus.  In
the coalesced MMIO case, this can result in a dereference of a deleted
list entry, caused by continuing to iterate over coalesced_zones after
entries later in the walk have been deleted.

Opportunistically add curly braces to the for-loop, which encompasses
many lines but sneaks by without braces due to the guts being a single
if statement.
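
A sketch of the resulting caller-side bail-out in the coalesced MMIO
unregister path (condensed; zone_matches() stands in for the real
pio/range check):

	list_for_each_entry_safe(dev, tmp, &amp;kvm-&gt;coalesced_zones, list) {
		if (zone_matches(dev, zone)) {
			r = kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS,
						      &amp;dev-&gt;dev);
			kvm_iodevice_destructor(&amp;dev-&gt;dev);

			/*
			 * On failure, unregister destroyed the whole bus
			 * and every other device on it, so the remaining
			 * zones are already gone: stop iterating.
			 */
			if (r)
				break;
		}
	}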

Fixes: f65886606c2d ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable@vger.kernel.org
Reported-by: Hao Sun &lt;sunhao.th@gmail.com&gt;
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20210412222050.876100-3-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Destroy I/O bus devices on unregister failure _after_ sync'ing SRCU</title>
<updated>2021-04-20T08:18:50+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2021-04-12T22:20:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=2ee3757424be7c1cd1d0bbfa6db29a7edd82a250'/>
<id>urn:sha1:2ee3757424be7c1cd1d0bbfa6db29a7edd82a250</id>
<content type='text'>
If allocating a new instance of an I/O bus fails when unregistering a
device, wait to destroy the device until after all readers are guaranteed
to see the new null bus.  Destroying devices before the bus is nullified
could lead to use-after-free since readers expect the devices on their
reference of the bus to remain valid.
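
A sketch of the corrected ordering on the allocation-failure path
(condensed from kvm_io_bus_unregister_dev()):

	rcu_assign_pointer(kvm-&gt;buses[bus_idx], new_bus);
	synchronize_srcu_expedited(&amp;kvm-&gt;srcu);

	/*
	 * Destroy the old bus's devices only after all SRCU readers
	 * are guaranteed to see the new (possibly NULL) bus pointer.
	 */
	if (!new_bus) {
		pr_err("kvm: failed to shrink bus, removing it completely\n");
		for (j = 0; j &lt; bus-&gt;dev_count; j++) {
			if (j == i)
				continue;
			kvm_iodevice_destructor(bus-&gt;range[j].dev);
		}
	}
	kfree(bus);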

Fixes: f65886606c2d ("KVM: fix memory leak in kvm_io_bus_unregister_dev()")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20210412222050.876100-2-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Take mmu_lock when handling MMU notifier iff the hva hits a memslot</title>
<updated>2021-04-17T12:31:08+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2021-04-02T00:56:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=8931a454aea03bab21b3b8fcdc94f674eebd1c5d'/>
<id>urn:sha1:8931a454aea03bab21b3b8fcdc94f674eebd1c5d</id>
<content type='text'>
Defer acquiring mmu_lock in the MMU notifier paths until a "hit" has been
detected in the memslots, i.e. don't take the lock for notifications that
don't affect the guest.

For small VMs, spurious locking is a minor annoyance.  And for "volatile"
setups where the majority of notifications _are_ relevant, this barely
qualifies as an optimization.

But, for large VMs (hundreds of threads) with static setups, e.g. no
page migration, no swapping, etc..., the vast majority of MMU notifier
callbacks will be unrelated to the guest, e.g. will often be in response
to the userspace VMM adjusting its own virtual address space.  In such
large VMs, acquiring mmu_lock can be painful as it blocks vCPUs from
handling page faults.  In some scenarios it can even be "fatal" in the
sense that it causes unacceptable brownouts, e.g. when rebuilding huge
pages after live migration, a significant percentage of vCPUs will be
attempting to handle page faults.

x86's TDP MMU implementation is especially susceptible to spurious
locking due to it taking mmu_lock for read when handling page faults.
Because rwlock is fair, a single writer will stall future readers, while
the writer is itself stalled waiting for in-progress readers to complete.
This is exacerbated by the MMU notifiers often firing multiple times in
quick succession, e.g. moving a page will (always?) invoke three separate
notifiers: .invalidate_range_start(), .invalidate_range_end(), and
.change_pte().  Unnecessarily taking mmu_lock each time means even a
single spurious sequence can be problematic.

Note, this optimizes only the unpaired callbacks.  Optimizing the
.invalidate_range_{start,end}() pairs is more complex and will be done in
a future patch.
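
A sketch of the resulting lock-on-first-hit pattern in
__kvm_handle_hva_range() (condensed; overlaps() stands in for the real
hva-range intersection):

	bool locked = false, ret = false;

	kvm_for_each_memslot(slot, slots) {
		if (!overlaps(slot, range-&gt;start, range-&gt;end))
			continue;

		if (!locked) {
			locked = true;
			KVM_MMU_LOCK(kvm);
		}
		ret |= range-&gt;handler(kvm, &amp;gfn_range);
	}

	if (locked)
		KVM_MMU_UNLOCK(kvm);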

Suggested-by: Ben Gardon &lt;bgardon@google.com&gt;
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20210402005658.3024832-9-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Move MMU notifier's mmu_lock acquisition into common helper</title>
<updated>2021-04-17T12:31:08+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2021-04-02T00:56:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=f922bd9bf33bd5a8c6694927f010f32127810fbf'/>
<id>urn:sha1:f922bd9bf33bd5a8c6694927f010f32127810fbf</id>
<content type='text'>
Acquire and release mmu_lock in the __kvm_handle_hva_range() helper
instead of requiring the caller to do the same.  This paves the way for
future patches to take mmu_lock if and only if an overlapping memslot is
found, without also having to introduce the on_lock() shenanigans used
to manipulate the notifier count and sequence.
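
A heavily condensed sketch of the helper's new shape:

	static bool __kvm_handle_hva_range(struct kvm *kvm,
					   const struct kvm_hva_range *range)
	{
		bool ret = false;

		KVM_MMU_LOCK(kvm);	/* moved in from the callers */

		/* ... walk memslots, invoke range-&gt;handler() on hits ... */

		KVM_MMU_UNLOCK(kvm);	/* released here as well */

		return ret;
	}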

No functional change intended.

Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20210402005658.3024832-8-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Kill off the old hva-based MMU notifier callbacks</title>
<updated>2021-04-17T12:31:08+00:00</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2021-04-02T00:56:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=b4c5936c47f86295cc76672e8dbeeca8b2379ba6'/>
<id>urn:sha1:b4c5936c47f86295cc76672e8dbeeca8b2379ba6</id>
<content type='text'>
Yank out the hva-based MMU notifier APIs now that all architectures that
use the notifiers have moved to the gfn-based APIs.

No functional change intended.

Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20210402005658.3024832-7-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
