summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-01-06KVM: RISC-V: Use common KVM implementation of MMU memory cachesSean Christopherson4-65/+18
Use common KVM's implementation of the MMU memory caches, which for all intents and purposes is semantically identical to RISC-V's version, the only difference being that the common implementation will fall back to an atomic allocation if there's a KVM bug that triggers a cache underflow. RISC-V appears to have based its MMU code on arm64 before the conversion to the common caches in commit c1a33aebe91d ("KVM: arm64: Use common KVM implementation of MMU memory caches"), despite having also copy-pasted the definition of KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE in kvm_types.h. Opportunistically drop the superfluous wrapper kvm_riscv_stage2_flush_cache(), whose name is very, very confusing as "cache flush" in the context of MMU code almost always refers to flushing hardware caches, not freeing unused software objects. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Anup Patel <anup.patel@wdc.com>
2022-01-04Merge branch kvm-arm64/misc-5.17 into kvmarm-master/nextMarc Zyngier11-24/+70
* kvm-arm64/misc-5.17: : . : Misc fixes and improvements: : - Add minimal support for ARMv8.7's PMU extension : - Constify kvm_io_gic_ops : - Drop kvm_is_transparent_hugepage() prototype : - Drop unused workaround_flags field : - Rework kvm_pgtable initialisation : - Documentation fixes : - Replace open-coded SCTLR_EL1.EE useage with its defined macro : - Sysreg list selftest update to handle PAuth : - Include cleanups : . KVM: arm64: vgic: Replace kernel.h with the necessary inclusions KVM: arm64: Fix comment typo in kvm_vcpu_finalize_sve() KVM: arm64: selftests: get-reg-list: Add pauth configuration KVM: arm64: Fix comment on barrier in kvm_psci_vcpu_on() KVM: arm64: Fix comment for kvm_reset_vcpu() KVM: arm64: Use defined value for SCTLR_ELx_EE KVM: arm64: Rework kvm_pgtable initialisation KVM: arm64: Drop unused workaround_flags vcpu field Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-01-04KVM: arm64: vgic: Replace kernel.h with the necessary inclusionsAndy Shevchenko1-1/+3
arm_vgic.h does not require all the stuff that kernel.h provides. Replace kernel.h inclusion with the list of what is really being used. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220104151940.55399-1-andriy.shevchenko@linux.intel.com
2022-01-04Merge branch kvm-arm64/selftest/irq-injection into kvmarm-master/nextMarc Zyngier13-29/+1352
* kvm-arm64/selftest/irq-injection: : . : New tests from Ricardo Koller: : "This series adds a new test, aarch64/vgic-irq, that validates the injection of : different types of IRQs from userspace using various methods and configurations" : . KVM: selftests: aarch64: Add test for restoring active IRQs KVM: selftests: aarch64: Add ISPENDR write tests in vgic_irq KVM: selftests: aarch64: Add tests for IRQFD in vgic_irq KVM: selftests: Add IRQ GSI routing library functions KVM: selftests: aarch64: Add test_inject_fail to vgic_irq KVM: selftests: aarch64: Add tests for LEVEL_INFO in vgic_irq KVM: selftests: aarch64: Level-sensitive interrupts tests in vgic_irq KVM: selftests: aarch64: Add preemption tests in vgic_irq KVM: selftests: aarch64: Cmdline arg to set EOI mode in vgic_irq KVM: selftests: aarch64: Cmdline arg to set number of IRQs in vgic_irq test KVM: selftests: aarch64: Abstract the injection functions in vgic_irq KVM: selftests: aarch64: Add vgic_irq to test userspace IRQ injection KVM: selftests: aarch64: Add vGIC library functions to deal with vIRQ state KVM: selftests: Add kvm_irq_line library function KVM: selftests: aarch64: Add GICv3 register accessor library functions KVM: selftests: aarch64: Add function for accessing GICv3 dist and redist registers KVM: selftests: aarch64: Move gic_v3.h to shared headers Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-01-04Merge branch kvm-arm64/selftest/ipa into kvmarm-master/nextMarc Zyngier5-14/+152
* kvm-arm64/selftest/ipa: : . : Expand the KVM/arm64 selftest infrastructure to discover : supported page sizes at runtime, support 16kB pages, and : find out about the original M1 stupidly small IPA space. : . KVM: selftests: arm64: Add support for various modes with 16kB page size KVM: selftests: arm64: Add support for VM_MODE_P36V48_{4K,64K} KVM: selftests: arm64: Rework TCR_EL1 configuration KVM: selftests: arm64: Check for supported page sizes KVM: selftests: arm64: Introduce a variable default IPA size KVM: selftests: arm64: Initialise default guest mode at test startup time Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-01-04KVM: arm64: Fix comment typo in kvm_vcpu_finalize_sve()Zenghui Yu1-1/+1
kvm_arm_init_arch_resources() was renamed to kvm_arm_init_sve() in commit a3be836df7cb ("KVM: arm/arm64: Demote kvm_arm_init_arch_resources() to just set up SVE"). Fix the function name in comment of kvm_vcpu_finalize_sve(). Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211230141535.1389-1-yuzenghui@huawei.com
2022-01-04KVM: arm64: selftests: get-reg-list: Add pauth configurationMarc Zyngier1-0/+50
The get-reg-list test ignores the Pointer Authentication features, which is a shame now that we have relatively common HW with this feature. Define two new configurations (with and without PMU) that exercise the KVM capabilities. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211228121414.1013250-1-maz@kernel.org
2021-12-28KVM: selftests: aarch64: Add test for restoring active IRQsRicardo Koller1-0/+91
Add a test that restores multiple IRQs in active state, it does it by writing into ISACTIVER from the guest and using KVM ioctls. This test tries to emulate what would happen during a live migration: restore active IRQs. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-18-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Add ISPENDR write tests in vgic_irqRicardo Koller1-0/+22
Add injection tests that use writing into the ISPENDR register (to mark IRQs as pending). This is typically used by migration code. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-17-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Add tests for IRQFD in vgic_irqRicardo Koller2-1/+102
Add injection tests for the KVM_IRQFD ioctl into vgic_irq. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-16-ricarkol@google.com
2021-12-28KVM: selftests: Add IRQ GSI routing library functionsRicardo Koller2-0/+59
Add an architecture independent wrapper function for creating and writing IRQ GSI routing tables. Also add a function to add irqchip entries. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-15-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Add test_inject_fail to vgic_irqRicardo Koller2-20/+109
Add tests for failed injections to vgic_irq. This tests that KVM can handle bogus IRQ numbers. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-14-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Add tests for LEVEL_INFO in vgic_irqRicardo Koller1-0/+6
Add injection tests for the LEVEL_INFO ioctl (level-sensitive specific) into vgic_irq. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-13-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Level-sensitive interrupts tests in vgic_irqRicardo Koller1-32/+86
Add a cmdline arg for using level-sensitive interrupts (vs the default edge-triggered). Then move the handler into a generic handler function that takes the type of interrupt (level vs. edge) as an arg. When handling line-sensitive interrupts it sets the line to low after acknowledging the IRQ. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-12-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Add preemption tests in vgic_irqRicardo Koller1-1/+90
Add tests for IRQ preemption (having more than one activated IRQ at the same time). This test injects multiple concurrent IRQs and handles them without handling the actual exceptions. This is done by masking interrupts for the whole test. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-11-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Cmdline arg to set EOI mode in vgic_irqRicardo Koller1-8/+50
Add a new cmdline arg to set the EOI mode for all vgic_irq tests. This specifies whether a write to EOIR will deactivate IRQs or not. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-10-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Cmdline arg to set number of IRQs in vgic_irq testRicardo Koller4-35/+127
Add the ability to specify the number of vIRQs exposed by KVM (arg defaults to 64). Then extend the KVM_IRQ_LINE test by injecting all available SPIs at once (specified by the nr-irqs arg). As a bonus, inject all SGIs at once as well. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-9-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Abstract the injection functions in vgic_irqRicardo Koller1-3/+36
Build an abstraction around the injection functions, so the preparation and checking around the actual injection can be shared between tests. All functions are stored as pointers in arrays of kvm_inject_desc's which include the pointer and what kind of interrupts they can inject. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-8-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Add vgic_irq to test userspace IRQ injectionRicardo Koller3-0/+246
Add a new KVM selftest, vgic_irq, for testing userspace IRQ injection. This particular test injects an SPI using KVM_IRQ_LINE on GICv3 and verifies that the IRQ is handled in the guest. The next commits will add more types of IRQs and different modes. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-7-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Add vGIC library functions to deal with vIRQ stateRicardo Koller3-1/+116
Add a set of library functions for userspace code in selftests to deal with vIRQ state (i.e., ioctl wrappers). Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-6-ricarkol@google.com
2021-12-28KVM: selftests: Add kvm_irq_line library functionRicardo Koller2-0/+23
Add an architecture independent wrapper function for the KVM_IRQ_LINE ioctl. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-5-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Add GICv3 register accessor library functionsRicardo Koller5-6/+189
Add library functions for accessing GICv3 registers: DIR, PMR, CTLR, ISACTIVER, ISPENDR. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-4-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Add function for accessing GICv3 dist and redist ↵Ricardo Koller1-23/+101
registers Add a generic library function for reading and writing GICv3 distributor and redistributor registers. Then adapt some functions to use it; more will come and use it in the next commit. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-3-ricarkol@google.com
2021-12-28KVM: selftests: aarch64: Move gic_v3.h to shared headersRicardo Koller1-0/+0
Move gic_v3.h to the shared headers location. There are some definitions that will be used in the vgic-irq test. Signed-off-by: Ricardo Koller <ricarkol@google.com> Acked-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211109023906.1091208-2-ricarkol@google.com
2021-12-28KVM: selftests: arm64: Add support for various modes with 16kB page sizeMarc Zyngier4-0/+34
The 16kB page size is not a popular choice, due to only a few CPUs actually implementing support for it. However, it can lead to some interesting performance improvements given the right uarch choices. Add support for this page size for various PA/VA combinations. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211227124809.1335409-7-maz@kernel.org
2021-12-28KVM: selftests: arm64: Add support for VM_MODE_P36V48_{4K,64K}Marc Zyngier4-0/+18
Some of the arm64 systems out there have an IPA space that is positively tiny. Nonetheless, they make great KVM hosts. Add support for 36bit IPA support with 4kB pages, which makes some of the fruity machines happy. Whilst we're at it, add support for 64kB pages as well, though these boxes have no support for it. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211227124809.1335409-6-maz@kernel.org
2021-12-28KVM: selftests: arm64: Rework TCR_EL1 configurationMarc Zyngier1-7/+14
The current way we initialise TCR_EL1 is a bit cumbersome, as we mix setting TG0 and IPS in the same swtch statement. Split it into two statements (one for the base granule size, and another for the IPA size), allowing new modes to be added in a more elegant way. No functional change intended. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211227124809.1335409-5-maz@kernel.org
2021-12-28KVM: selftests: arm64: Check for supported page sizesMarc Zyngier3-6/+50
Just as arm64 implemenations don't necessary support all IPA ranges, they don't all support the same page sizes either. Fun. Create a dummy VM to snapshot the page sizes supported by the host, and filter the supported modes. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211227124809.1335409-4-maz@kernel.org
2021-12-28KVM: selftests: arm64: Introduce a variable default IPA sizeMarc Zyngier2-4/+30
Contrary to popular belief, there is no such thing as a default IPA size on arm64. Anything goes, and implementations are the usual Wild West. The selftest infrastructure default to 40bit IPA, which obviously doesn't work for some systems out there. Turn VM_MODE_DEFAULT from a constant into a variable, and let guest_modes_append_default() populate it, depending on what the HW can do. In order to preserve the current behaviour, we still pick 40bits IPA as the default if it is available, and the largest supported IPA space otherwise. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211227124809.1335409-3-maz@kernel.org
2021-12-28KVM: selftests: arm64: Initialise default guest mode at test startup timeMarc Zyngier1-0/+9
As we are going to add support for a variable default mode on arm64, let's make sure it is setup first by using a constructor that gets called before the actual test runs. Suggested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Link: https://lore.kernel.org/r/20211227124809.1335409-2-maz@kernel.org
2021-12-21Merge tag 'kvm-s390-next-5.17-1' of ↵Paolo Bonzini7-88/+155
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix and cleanup - fix sigp sense/start/stop/inconsistency - cleanups
2021-12-21Merge remote-tracking branch 'kvm/master' into HEADPaolo Bonzini20-73/+412
Pick commit fdba608f15e2 ("KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU"). In addition to fixing a bug, it also aligns the non-nested and nested usage of triggering posted interrupts, allowing for additional cleanups. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-21KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPUSean Christopherson1-2/+1
Drop a check that guards triggering a posted interrupt on the currently running vCPU, and more importantly guards waking the target vCPU if triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE. If a vIRQ is delivered from asynchronous context, the target vCPU can be the currently running vCPU and can also be blocking, in which case skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to be a wake event for the vCPU. The "do nothing" logic when "vcpu == running_vcpu" mostly works only because the majority of calls to ->deliver_posted_interrupt(), especially when using posted interrupts, come from synchronous KVM context. But if a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to use posted interrupts, wake events from the device will be delivered to KVM from IRQ context, e.g. vfio_msihandler() | |-> eventfd_signal() | |-> ... | |-> irqfd_wakeup() | |->kvm_arch_set_irq_inatomic() | |-> kvm_irq_delivery_to_apic_fast() | |-> kvm_apic_set_irq() This also aligns the non-nested and nested usage of triggering posted interrupts, and will allow for additional cleanups. Fixes: 379a3c8ee444 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath") Cc: stable@vger.kernel.org Reported-by: Longpeng (Mike) <longpeng2@huawei.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-20KVM: arm64: Fix comment on barrier in kvm_psci_vcpu_on()Fuad Tabba1-1/+1
The barrier is there for power_off rather than power_state. Probably typo in commit 358b28f09f0ab074 ("arm/arm64: KVM: Allow a VCPU to fully reset itself"). Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208193257.667613-3-tabba@google.com
2021-12-20KVM: arm64: Fix comment for kvm_reset_vcpu()Fuad Tabba1-4/+3
The comment for kvm_reset_vcpu() refers to the sysreg table as being the table above, probably because of the code extracted at commit f4672752c321ea36 ("arm64: KVM: virtual CPU reset"). Fix the comment to remove the potentially confusing reference. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208193257.667613-2-tabba@google.com
2021-12-20KVM: arm64: Use defined value for SCTLR_ELx_EEFuad Tabba1-1/+1
Replace the hardcoded value with the existing definition. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211208192810.657360-1-tabba@google.com
2021-12-20KVM: selftests: Add test to verify TRIPLE_FAULT on invalid L2 guest stateSean Christopherson3-0/+107
Add a selftest to attempt to enter L2 with invalid guests state by exiting to userspace via I/O from L2, and then using KVM_SET_SREGS to set invalid guest state (marking TR unusable is arbitrary chosen for its relative simplicity). This is a regression test for a bug introduced by commit c8607e4a086f ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry"), which incorrectly set vmx->fail=true when L2 had invalid guest state and ultimately triggered a WARN due to nested_vmx_vmexit() seeing vmx->fail==true while attempting to synthesize a nested VM-Exit. The is also a functional test to verify that KVM sythesizes TRIPLE_FAULT for L2, which is somewhat arbitrary behavior, instead of emulating L2. KVM should never emulate L2 due to invalid guest state, as it's architecturally impossible for L1 to run an L2 guest with invalid state as nested VM-Enter should always fail, i.e. L1 needs to do the emulation. Stuffing state via KVM ioctl() is a non-architctural, out-of-band case, hence the TRIPLE_FAULT being rather arbitrary. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-5-seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-20KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_stateSean Christopherson1-2/+6
Update the documentation for kvm-intel's emulate_invalid_guest_state to rectify the description of KVM's default behavior, and to document that the behavior and thus parameter only applies to L1. Fixes: a27685c33acc ("KVM: VMX: Emulate invalid guest state by default") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-4-seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-20KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is requiredSean Christopherson1-8/+24
Synthesize a triple fault if L2 guest state is invalid at the time of VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs guest state via ioctls(), e.g. KVM_SET_SREGS. KVM should never emulate invalid guest state, since from L1's perspective, it's architecturally impossible for L2 to have invalid state while L2 is running in hardware. E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit or #GP. Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that can trigger this scenario, as nested VM-Enter correctly rejects any attempt to enter L2 with invalid state. RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits via RSM results in shutdown, i.e. there is precedent for KVM's behavior. Following AMD's SMRAM layout is important as AMD's layout saves/restores the descriptor cache information, including CS.RPL and SS.RPL, and also defines all the fields relevant to invalid guest state as read-only, i.e. so long as the vCPU had valid state before the SMI, which is guaranteed for L2, RSM will generate valid state unless SMRAM was modified. Intel's layout saves/restores only the selector, which means that scenarios where the selector and cached RPL don't match, e.g. conforming code segments, would yield invalid guest state. Intel CPUs fudge around this issued by stuffing SS.RPL and CS.RPL on RSM. Per Intel's SDM on the "Default Treatment of RSM", paraphrasing for brevity: IF internal storage indicates that the [CPU was post-VMXON] THEN enter VMX operation (root or non-root); restore VMX-critical state as defined in Section 34.14.1; set to their fixed values any bits in CR0 and CR4 whose values must be fixed in VMX operation [unless coming from an unrestricted guest]; IF RFLAGS.VM = 0 AND (in VMX root operation OR the “unrestricted guest” VM-execution control is 0) THEN CS.RPL := SS.DPL; SS.RPL := SS.DPL; FI; restore current VMCS pointer; FI; Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM will sythesize TRIPLE_FAULT in this scenario. KVM's behavior is allowed as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the only way for CR0 and/or CR4 to have illegal values is if they were modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map" section states "modifying these registers will result in unpredictable behavior". KVM's ioctl() behavior is less straightforward. Because KVM allows ioctls() to be executed in any order, rejecting an ioctl() if it would result in invalid L2 guest state is not an option as KVM cannot know if a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE. Ideally, KVM would reject KVM_RUN if L2 contained invalid guest state, but that carries the risk of a false positive, e.g. if RSM loaded invalid guest state and KVM exited to userspace. Setting a flag/request to detect such a scenario is undesirable because (a) it's extremely unlikely to add value to KVM as a whole, and (b) KVM would need to consider ioctl() interactions with such a flag, e.g. if userspace migrated the vCPU while the flag were set. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-3-seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-20KVM: VMX: Always clear vmx->fail on emulation_requiredSean Christopherson1-3/+1
Revert a relatively recent change that set vmx->fail if the vCPU is in L2 and emulation_required is true, as that behavior is completely bogus. Setting vmx->fail and synthesizing a VM-Exit is contradictory and wrong: (a) it's impossible to have both a VM-Fail and VM-Exit (b) vmcs.EXIT_REASON is not modified on VM-Fail (c) emulation_required refers to guest state and guest state checks are always VM-Exits, not VM-Fails. For KVM specifically, emulation_required is handled before nested exits in __vmx_handle_exit(), thus setting vmx->fail has no immediate effect, i.e. KVM calls into handle_invalid_guest_state() and vmx->fail is ignored. Setting vmx->fail can ultimately result in a WARN in nested_vmx_vmexit() firing when tearing down the VM as KVM never expects vmx->fail to be set when L2 is active, KVM always reflects those errors into L1. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 21158 at arch/x86/kvm/vmx/nested.c:4548 nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547 Modules linked in: CPU: 0 PID: 21158 Comm: syz-executor.1 Not tainted 5.16.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nested_vmx_vmexit+0x16bd/0x17e0 arch/x86/kvm/vmx/nested.c:4547 Code: <0f> 0b e9 2e f8 ff ff e8 57 b3 5d 00 0f 0b e9 00 f1 ff ff 89 e9 80 Call Trace: vmx_leave_nested arch/x86/kvm/vmx/nested.c:6220 [inline] nested_vmx_free_vcpu+0x83/0xc0 arch/x86/kvm/vmx/nested.c:330 vmx_free_vcpu+0x11f/0x2a0 arch/x86/kvm/vmx/vmx.c:6799 kvm_arch_vcpu_destroy+0x6b/0x240 arch/x86/kvm/x86.c:10989 kvm_vcpu_destroy+0x29/0x90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 kvm_free_vcpus arch/x86/kvm/x86.c:11426 [inline] kvm_arch_destroy_vm+0x3ef/0x6b0 arch/x86/kvm/x86.c:11545 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1189 [inline] kvm_put_kvm+0x751/0xe40 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1220 kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3489 __fput+0x3fc/0x870 fs/file_table.c:280 task_work_run+0x146/0x1c0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0x705/0x24f0 kernel/exit.c:832 do_group_exit+0x168/0x2d0 kernel/exit.c:929 get_signal+0x1740/0x2120 kernel/signal.c:2852 arch_do_signal_or_restart+0x9c/0x730 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x191/0x220 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x2e/0x70 kernel/entry/common.c:300 do_syscall_64+0x53/0xd0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: c8607e4a086f ("KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry") Reported-by: syzbot+f1d2136db9c80d4733e8@syzkaller.appspotmail.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211207193006.120997-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-20selftests: KVM: Fix non-x86 compilingAndrew Jones2-9/+6
Attempting to compile on a non-x86 architecture fails with include/kvm_util.h: In function ‘vm_compute_max_gfn’: include/kvm_util.h:79:21: error: dereferencing pointer to incomplete type ‘struct kvm_vm’ return ((1ULL << vm->pa_bits) >> vm->page_shift) - 1; ^~ This is because the declaration of struct kvm_vm is in lib/kvm_util_internal.h as an effort to make it private to the test lib code. We can still provide arch specific functions, though, by making the generic function symbols weak. Do that to fix the compile error. Fixes: c8cc43c1eae2 ("selftests: KVM: avoid failures due to reserved HyperTransport region") Cc: stable@vger.kernel.org Signed-off-by: Andrew Jones <drjones@redhat.com> Message-Id: <20211214151842.848314-1-drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-20KVM: x86: Always set kvm_run->if_flagMarc Orr5-17/+21
The kvm_run struct's if_flag is a part of the userspace/kernel API. The SEV-ES patches failed to set this flag because it's no longer needed by QEMU (according to the comment in the source code). However, other hypervisors may make use of this flag. Therefore, set the flag for guests with encrypted registers (i.e., with guest_state_protected set). Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES") Signed-off-by: Marc Orr <marcorr@google.com> Message-Id: <20211209155257.128747-1-marcorr@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
2021-12-20KVM: x86/mmu: Don't advance iterator after restart due to yieldingSean Christopherson3-13/+28
After dropping mmu_lock in the TDP MMU, restart the iterator during tdp_iter_next() and do not advance the iterator. Advancing the iterator results in skipping the top-level SPTE and all its children, which is fatal if any of the skipped SPTEs were not visited before yielding. When zapping all SPTEs, i.e. when min_level == root_level, restarting the iter and then invoking tdp_iter_next() is always fatal if the current gfn has as a valid SPTE, as advancing the iterator results in try_step_side() skipping the current gfn, which wasn't visited before yielding. Sprinkle WARNs on iter->yielded being true in various helpers that are often used in conjunction with yielding, and tag the helper with __must_check to reduce the probabily of improper usage. Failing to zap a top-level SPTE manifests in one of two ways. If a valid SPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(), the shadow page will be leaked and KVM will WARN accordingly. WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm] RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm] Call Trace: <TASK> kvm_arch_destroy_vm+0x130/0x1b0 [kvm] kvm_destroy_vm+0x162/0x2a0 [kvm] kvm_vcpu_release+0x34/0x60 [kvm] __fput+0x82/0x240 task_work_run+0x5c/0x90 do_exit+0x364/0xa10 ? futex_unqueue+0x38/0x60 do_group_exit+0x33/0xa0 get_signal+0x155/0x850 arch_do_signal_or_restart+0xed/0x750 exit_to_user_mode_prepare+0xc5/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae If kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped by kvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form of marking a struct page as dirty/accessed after it has been put back on the free list. This directly triggers a WARN due to encountering a page with page_count() == 0, but it can also lead to data corruption and additional errors in the kernel. WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171 RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm] Call Trace: <TASK> kvm_set_pfn_dirty+0x120/0x1d0 [kvm] __handle_changed_spte+0x92e/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] __handle_changed_spte+0x63c/0xca0 [kvm] zap_gfn_range+0x549/0x620 [kvm] kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm] mmu_free_root_page+0x219/0x2c0 [kvm] kvm_mmu_free_roots+0x1b4/0x4e0 [kvm] kvm_mmu_unload+0x1c/0xa0 [kvm] kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm] kvm_put_kvm+0x3b1/0x8b0 [kvm] kvm_vcpu_release+0x4e/0x70 [kvm] __fput+0x1f7/0x8c0 task_work_run+0xf8/0x1a0 do_exit+0x97b/0x2230 do_group_exit+0xda/0x2a0 get_signal+0x3be/0x1e50 arch_do_signal_or_restart+0x244/0x17f0 exit_to_user_mode_prepare+0xcb/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Note, the underlying bug existed even before commit 1af4a96025b3 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed") moved calls to tdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could still incorrectly advance past a top-level entry when yielding on a lower-level entry. But with respect to leaking shadow pages, the bug was introduced by yielding before processing the current gfn. Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, or callers could jump to their "retry" label. The downside of that approach is that tdp_mmu_iter_cond_resched() _must_ be called before anything else in the loop, and there's no easy way to enfornce that requirement. Ideally, KVM would handling the cond_resched() fully within the iterator macro (the code is actually quite clean) and avoid this entire class of bugs, but that is extremely difficult do while also supporting yielding after tdp_mmu_set_spte_atomic() fails. Yielding after failing to set a SPTE is very desirable as the "owner" of the REMOVED_SPTE isn't strictly bounded, e.g. if it's zapping a high-level shadow page, the REMOVED_SPTE may block operations on the SPTE for a significant amount of time. Fixes: faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Fixes: 1af4a96025b3 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed") Reported-by: Ignat Korchagin <ignat@cloudflare.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211214033528.123268-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-20KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_allWei Wang1-1/+1
The fixed counter 3 is used for the Topdown metrics, which hasn't been enabled for KVM guests. Userspace accessing to it will fail as it's not included in get_fixed_pmc(). This breaks KVM selftests on ICX+ machines, which have this counter. To reproduce it on ICX+ machines, ./state_test reports: ==== Test Assertion Failure ==== lib/x86_64/processor.c:1078: r == nmsrs pid=4564 tid=4564 - Argument list too long 1 0x000000000040b1b9: vcpu_save_state at processor.c:1077 2 0x0000000000402478: main at state_test.c:209 (discriminator 6) 3 0x00007fbe21ed5f92: ?? ??:0 4 0x000000000040264d: _start at ??:? Unexpected result from KVM_GET_MSRS, r: 17 (failed MSR was 0x30c) With this patch, it works well. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Message-Id: <20211217124934.32893-1-wei.w.wang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-19KVM: x86: Retry page fault if MMU reload is pending and root has no spSean Christopherson1-1/+15
Play nice with a NULL shadow page when checking for an obsolete root in the page fault handler by flagging the page fault as stale if there's no shadow page associated with the root and KVM_REQ_MMU_RELOAD is pending. Invalidating memslots, which is the only case where _all_ roots need to be reloaded, requests all vCPUs to reload their MMUs while holding mmu_lock for lock. The "special" roots, e.g. pae_root when KVM uses PAE paging, are not backed by a shadow page. Running with TDP disabled or with nested NPT explodes spectaculary due to dereferencing a NULL shadow page pointer. Skip the KVM_REQ_MMU_RELOAD check if there is a valid shadow page for the root. Zapping shadow pages in response to guest activity, e.g. when the guest frees a PGD, can trigger KVM_REQ_MMU_RELOAD even if the current vCPU isn't using the affected root. I.e. KVM_REQ_MMU_RELOAD can be seen with a completely valid root shadow page. This is a bit of a moot point as KVM currently unloads all roots on KVM_REQ_MMU_RELOAD, but that will be cleaned up in the future. Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211209060552.2956723-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-19KVM: selftests: vmx_pmu_msrs_test: Drop tests mangling guest visible CPUIDsVitaly Kuznetsov1-17/+0
Host initiated writes to MSR_IA32_PERF_CAPABILITIES should not depend on guest visible CPUIDs and (incorrect) KVM logic implementing it is about to change. Also, KVM_SET_CPUID{,2} after KVM_RUN is now forbidden and causes test to fail. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: feb627e8d6f6 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211216165213.338923-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-19KVM: x86: Drop guest CPUID check for host initiated writes to ↵Vitaly Kuznetsov1-1/+1
MSR_IA32_PERF_CAPABILITIES The ability to write to MSR_IA32_PERF_CAPABILITIES from the host should not depend on guest visible CPUID entries, even if just to allow creating/restoring guest MSRs and CPUIDs in any sequence. Fixes: 27461da31089 ("KVM: x86/pmu: Support full width counting") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211216165213.338923-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-19Merge branch 'topic/ppc-kvm' of ↵Paolo Bonzini30-730/+1566
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into HEAD Fix conflicts between memslot overhaul and commit 511d25d6b789f ("KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots") from the powerpc tree.
2021-12-17KVM: s390: Clarify SIGP orders versus STOP/RESTARTEric Farman4-2/+43
With KVM_CAP_S390_USER_SIGP, there are only five Signal Processor orders (CONDITIONAL EMERGENCY SIGNAL, EMERGENCY SIGNAL, EXTERNAL CALL, SENSE, and SENSE RUNNING STATUS) which are intended for frequent use and thus are processed in-kernel. The remainder are sent to userspace with the KVM_CAP_S390_USER_SIGP capability. Of those, three orders (RESTART, STOP, and STOP AND STORE STATUS) have the potential to inject work back into the kernel, and thus are asynchronous. Let's look for those pending IRQs when processing one of the in-kernel SIGP orders, and return BUSY (CC2) if one is in process. This is in agreement with the Principles of Operation, which states that only one order can be "active" on a CPU at a time. Cc: stable@vger.kernel.org Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211213210550.856213-2-farman@linux.ibm.com [borntraeger@linux.ibm.com: add stable tag] Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2021-12-17s390: uv: Add offset comments to UV query struct and fix namingJanosch Frank1-17/+17
Changes to the struct are easier to manage with offset comments so let's add some. And now that we know that the last struct member has the wrong name let's also fix this. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>