<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/uapi/linux/kvm.h, branch linux-5.11.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.11.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.11.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-01-07T23:11:37+00:00</updated>
<entry>
<title>KVM: SVM: Add support for booting APs in an SEV-ES guest</title>
<updated>2021-01-07T23:11:37+00:00</updated>
<author>
<name>Tom Lendacky</name>
<email>thomas.lendacky@amd.com</email>
</author>
<published>2021-01-04T20:20:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=647daca25d24fb6eadc7b6cd680ad3e6eed0f3d5'/>
<id>urn:sha1:647daca25d24fb6eadc7b6cd680ad3e6eed0f3d5</id>
<content type='text'>
Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
  Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
  support. It is up to the guest to transfer control of the AP to the
  proper location.

Subsequent AP boot:
  KVM will expect to receive an AP Reset Hold exit event indicating that
  the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
  awaken it. When the AP Reset Hold exit event is received, KVM will place
  the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
  sequence, KVM will make the vCPU runnable. It is again up to the guest
  to then transfer control of the AP to the proper location.

  To differentiate between an actual HLT and an AP Reset Hold, a new MP
  state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
  placed in upon receiving the AP Reset Hold exit event. Additionally, to
  communicate the AP Reset Hold exit event up to userspace (if needed), a
  new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.

Signed-off-by: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
Message-Id: &lt;e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: X86: Implement ring-based dirty memory tracking</title>
<updated>2020-11-15T14:49:15+00:00</updated>
<author>
<name>Peter Xu</name>
<email>peterx@redhat.com</email>
</author>
<published>2020-10-01T01:22:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fb04a1eddb1a65b6588a021bdc132270d5ae48bb'/>
<id>urn:sha1:fb04a1eddb1a65b6588a021bdc132270d5ae48bb</id>
<content type='text'>
This patch is heavily based on previous work from Lei Cao
&lt;lei.cao@stratus.com&gt; and Paolo Bonzini &lt;pbonzini@redhat.com&gt;. [1]

KVM currently uses large bitmaps to track dirty memory.  These bitmaps
are copied to userspace when userspace queries KVM for its dirty page
information.  The use of bitmaps is mostly sufficient for live
migration, as large parts of memory are be dirtied from one log-dirty
pass to another.  However, in a checkpointing system, the number of
dirty pages is small and in fact it is often bounded---the VM is
paused when it has dirtied a pre-defined number of pages. Traversing a
large, sparsely populated bitmap to find set bits is time-consuming,
as is copying the bitmap to user-space.

A similar issue will be there for live migration when the guest memory
is huge while the page dirty procedure is trivial.  In that case for
each dirty sync we need to pull the whole dirty bitmap to userspace
and analyse every bit even if it's mostly zeros.

The preferred data structure for above scenarios is a dense list of
guest frame numbers (GFN).  This patch series stores the dirty list in
kernel memory that can be memory mapped into userspace to allow speedy
harvesting.

This patch enables dirty ring for X86 only.  However it should be
easily extended to other archs as well.

[1] https://patchwork.kernel.org/patch/10471409/

Signed-off-by: Lei Cao &lt;lei.cao@stratus.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Peter Xu &lt;peterx@redhat.com&gt;
Message-Id: &lt;20201001012222.5767-1-peterx@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86: hyper-v: allow KVM_GET_SUPPORTED_HV_CPUID as a system ioctl</title>
<updated>2020-11-15T14:49:11+00:00</updated>
<author>
<name>Vitaly Kuznetsov</name>
<email>vkuznets@redhat.com</email>
</author>
<published>2020-09-29T15:09:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c21d54f0307ff42a346294899107b570b98c47b5'/>
<id>urn:sha1:c21d54f0307ff42a346294899107b570b98c47b5</id>
<content type='text'>
KVM_GET_SUPPORTED_HV_CPUID is a vCPU ioctl but its output is now
independent from vCPU and in some cases VMMs may want to use it as a system
ioctl instead. In particular, QEMU doesn CPU feature expansion before any
vCPU gets created so KVM_GET_SUPPORTED_HV_CPUID can't be used.

Convert KVM_GET_SUPPORTED_HV_CPUID to 'dual' system/vCPU ioctl with the
same meaning.

Signed-off-by: Vitaly Kuznetsov &lt;vkuznets@redhat.com&gt;
Message-Id: &lt;20200929150944.1235688-2-vkuznets@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>kvm: x86: only provide PV features if enabled in guest's CPUID</title>
<updated>2020-10-21T21:36:32+00:00</updated>
<author>
<name>Oliver Upton</name>
<email>oupton@google.com</email>
</author>
<published>2020-08-18T15:24:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=66570e966dd9cb4fd57811d0056c6472a14a2c41'/>
<id>urn:sha1:66570e966dd9cb4fd57811d0056c6472a14a2c41</id>
<content type='text'>
KVM unconditionally provides PV features to the guest, regardless of the
configured CPUID. An unwitting guest that doesn't check
KVM_CPUID_FEATURES before use could access paravirt features that
userspace did not intend to provide. Fix this by checking the guest's
CPUID before performing any paravirtual operations.

Introduce a capability, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, to gate the
aforementioned enforcement. Migrating a VM from a host w/o this patch to
a host with this patch could silently change the ABI exposed to the
guest, warranting that we default to the old behavior and opt-in for
the new one.

Reviewed-by: Jim Mattson &lt;jmattson@google.com&gt;
Reviewed-by: Peter Shier &lt;pshier@google.com&gt;
Signed-off-by: Oliver Upton &lt;oupton@google.com&gt;
Change-Id: I202a0926f65035b872bfe8ad15307c026de59a98
Message-Id: &lt;20200818152429.1923996-4-oupton@google.com&gt;
Reviewed-by: Wanpeng Li &lt;wanpengli@tencent.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86: Introduce MSR filtering</title>
<updated>2020-09-28T11:58:08+00:00</updated>
<author>
<name>Alexander Graf</name>
<email>graf@amazon.com</email>
</author>
<published>2020-09-25T14:34:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1a155254ff937ac92cf9940d273ea597b2c667a2'/>
<id>urn:sha1:1a155254ff937ac92cf9940d273ea597b2c667a2</id>
<content type='text'>
It's not desireable to have all MSRs always handled by KVM kernel space. Some
MSRs would be useful to handle in user space to either emulate behavior (like
uCode updates) or differentiate whether they are valid based on the CPU model.

To allow user space to specify which MSRs it wants to see handled by KVM,
this patch introduces a new ioctl to push filter rules with bitmaps into
KVM. Based on these bitmaps, KVM can then decide whether to reject MSR access.
With the addition of KVM_CAP_X86_USER_SPACE_MSR it can also deflect the
denied MSR events to user space to operate on.

If no filter is populated, MSR handling stays identical to before.

Signed-off-by: Alexander Graf &lt;graf@amazon.com&gt;

Message-Id: &lt;20200925143422.21718-8-graf@amazon.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86: Allow deflecting unknown MSR accesses to user space</title>
<updated>2020-09-28T11:58:04+00:00</updated>
<author>
<name>Alexander Graf</name>
<email>graf@amazon.com</email>
</author>
<published>2020-09-25T14:34:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1ae099540e8c7f1ee066b3ad45cc91f582bb1ce8'/>
<id>urn:sha1:1ae099540e8c7f1ee066b3ad45cc91f582bb1ce8</id>
<content type='text'>
MSRs are weird. Some of them are normal control registers, such as EFER.
Some however are registers that really are model specific, not very
interesting to virtualization workloads, and not performance critical.
Others again are really just windows into package configuration.

Out of these MSRs, only the first category is necessary to implement in
kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against
certain CPU models and MSRs that contain information on the package level
are much better suited for user space to process. However, over time we have
accumulated a lot of MSRs that are not the first category, but still handled
by in-kernel KVM code.

This patch adds a generic interface to handle WRMSR and RDMSR from user
space. With this, any future MSR that is part of the latter categories can
be handled in user space.

Furthermore, it allows us to replace the existing "ignore_msrs" logic with
something that applies per-VM rather than on the full system. That way you
can run productive VMs in parallel to experimental ones where you don't care
about proper MSR handling.

Signed-off-by: Alexander Graf &lt;graf@amazon.com&gt;
Reviewed-by: Jim Mattson &lt;jmattson@google.com&gt;

Message-Id: &lt;20200925143422.21718-3-graf@amazon.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: MIPS: Change the definition of kvm type</title>
<updated>2020-09-11T17:22:52+00:00</updated>
<author>
<name>Huacai Chen</name>
<email>chenhc@lemote.com</email>
</author>
<published>2020-09-10T10:33:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=15e9e35cd1dec2bc138464de6bf8ef828df19235'/>
<id>urn:sha1:15e9e35cd1dec2bc138464de6bf8ef828df19235</id>
<content type='text'>
MIPS defines two kvm types:

 #define KVM_VM_MIPS_TE          0
 #define KVM_VM_MIPS_VZ          1

In Documentation/virt/kvm/api.rst it is said that "You probably want to
use 0 as machine type", which implies that type 0 be the "automatic" or
"default" type. And, in user-space libvirt use the null-machine (with
type 0) to detect the kvm capability, which returns "KVM not supported"
on a VZ platform.

I try to fix it in QEMU but it is ugly:
https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html

And Thomas Huth suggests me to change the definition of kvm type:
https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html

So I define like this:

 #define KVM_VM_MIPS_AUTO        0
 #define KVM_VM_MIPS_VZ          1
 #define KVM_VM_MIPS_TE          2

Since VZ and TE cannot co-exists, using type 0 on a TE platform will
still return success (so old user-space tools have no problems on new
kernels); the advantage is that using type 0 on a VZ platform will not
return failure. So, the only problem is "new user-space tools use type
2 on old kernels", but if we treat this as a kernel bug, we can backport
this patch to old stable kernels.

Signed-off-by: Huacai Chen &lt;chenhc@lemote.com&gt;
Message-Id: &lt;1599734031-28746-1-git-send-email-chenhc@lemote.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>arm64/x86: KVM: Introduce steal-time cap</title>
<updated>2020-08-21T13:05:19+00:00</updated>
<author>
<name>Andrew Jones</name>
<email>drjones@redhat.com</email>
</author>
<published>2020-08-04T17:06:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=004a01241c5a0d375266ebf1c72f208de99294e9'/>
<id>urn:sha1:004a01241c5a0d375266ebf1c72f208de99294e9</id>
<content type='text'>
arm64 requires a vcpu fd (KVM_HAS_DEVICE_ATTR vcpu ioctl) to probe
support for steal-time. However this is unnecessary, as only a KVM
fd is required, and it complicates userspace (userspace may prefer
delaying vcpu creation until after feature probing). Introduce a cap
that can be checked instead. While x86 can already probe steal-time
support with a kvm fd (KVM_GET_SUPPORTED_CPUID), we add the cap there
too for consistency.

Signed-off-by: Andrew Jones &lt;drjones@redhat.com&gt;
Signed-off-by: Marc Zyngier &lt;maz@kernel.org&gt;
Reviewed-by: Steven Price &lt;steven.price@arm.com&gt;
Link: https://lore.kernel.org/r/20200804170604.42662-7-drjones@redhat.com
</content>
</entry>
<entry>
<title>Merge tag 'kvm-s390-next-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next-5.6</title>
<updated>2020-08-03T18:19:13+00:00</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2020-08-03T18:19:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f3633c2683545213de4a00a9b0c3fba741321fb2'/>
<id>urn:sha1:f3633c2683545213de4a00a9b0c3fba741321fb2</id>
<content type='text'>
KVM: s390: Enhancement for 5.9
- implement diagnose 318
</content>
</entry>
<entry>
<title>KVM: x86: Add a capability for GUEST_MAXPHYADDR &lt; HOST_MAXPHYADDR support</title>
<updated>2020-07-10T21:01:53+00:00</updated>
<author>
<name>Mohammed Gamal</name>
<email>mgamal@redhat.com</email>
</author>
<published>2020-07-10T15:48:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3edd68399dc155b80335244c8c2673eaa652931a'/>
<id>urn:sha1:3edd68399dc155b80335244c8c2673eaa652931a</id>
<content type='text'>
This patch adds a new capability KVM_CAP_SMALLER_MAXPHYADDR which
allows userspace to query if the underlying architecture would
support GUEST_MAXPHYADDR &lt; HOST_MAXPHYADDR and hence act accordingly
(e.g. qemu can decide if it should warn for -cpu ..,phys-bits=X)

The complications in this patch are due to unexpected (but documented)
behaviour we see with NPF vmexit handling in AMD processor.  If
SVM is modified to add guest physical address checks in the NPF
and guest #PF paths, we see the followning error multiple times in
the 'access' test in kvm-unit-tests:

            test pte.p pte.36 pde.p: FAIL: pte 2000021 expected 2000001
            Dump mapping: address: 0x123400000000
            ------L4: 24c3027
            ------L3: 24c4027
            ------L2: 24c5021
            ------L1: 1002000021

This is because the PTE's accessed bit is set by the CPU hardware before
the NPF vmexit. This is handled completely by hardware and cannot be fixed
in software.

Therefore, availability of the new capability depends on a boolean variable
allow_smaller_maxphyaddr which is set individually by VMX and SVM init
routines. On VMX it's always set to true, on SVM it's only set to true
when NPT is not enabled.

CC: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
CC: Babu Moger &lt;babu.moger@amd.com&gt;
Signed-off-by: Mohammed Gamal &lt;mgamal@redhat.com&gt;
Message-Id: &lt;20200710154811.418214-10-mgamal@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
