summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-12-21kvm: selftests: x86_64: dirty_log_test: fix -tAndrew Jones1-0/+9
Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21KVM: fix some typosWei Yang3-3/+3
Signed-off-by: Wei Yang <richard.weiyang@gmail.com> [Preserved the iff and a probably intentional weird bracket notation. Also dropped the style change to make a single-purpose patch. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21x86/kvmclock: convert to SPDX identifiersPeng Hao1-14/+1
Update the verbose license text with the matching SPDX license identifier. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> [Changed deprecated GPL-2.0+ to GPL-2.0-or-later. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21KVM: x86: Remove KF() macro placeholderSean Christopherson1-3/+0
Although well-intentioned, keeping the KF() definition as a hint for handling scattered CPUID features may be counter-productive. Simply redefining the bit position only works for directly manipulating the guest's CPUID leafs, e.g. it doesn't make guest_cpuid_has() magically work. Taking an alternative approach, e.g. ensuring the bit position is identical between the Linux-defined and hardware-defined features, may be a simpler and/or more effective method of exposing scattered features to the guest. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21kvm: vmx: Allow guest read access to IA32_TSCJim Mattson1-0/+1
Let the guest read the IA32_TSC MSR with the generic RDMSR instruction as well as the specific RDTSC(P) instructions. Note that the hardware applies the TSC multiplier and offset (when applicable) to the result of RDMSR(IA32_TSC), just as it does to the result of RDTSC(P). Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21kvm: nVMX: NMI-window and interrupt-window exiting should wake L2 from HLTJim Mattson1-3/+7
According to the SDM, "NMI-window exiting" VM-exits wake a logical processor from the same inactive states as would an NMI and "interrupt-window exiting" VM-exits wake a logical processor from the same inactive states as would an external interrupt. Specifically, they wake a logical processor from the shutdown state and from the states entered using the HLT and MWAIT instructions. Fixes: 6dfacadd5858 ("KVM: nVMX: Add support for activity state HLT") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> [Squashed comments of two Jim's patches and used the simplified code hunk provided by Sean. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21KVM: nSVM: Fix nested guest support for PAUSE filtering.Tambe, William1-0/+12
Currently, the nested guest's PAUSE intercept intentions are not being honored. Instead, since the L0 hypervisor's pause_filter_count and pause_filter_thresh values are still in place, these values are used instead of those programmed in the VMCB by the L1 hypervisor. To honor the desired PAUSE intercept support of the L1 hypervisor, the L0 hypervisor must use the PAUSE filtering fields of the L1 hypervisor. This requires saving and restoring of both the L0 and L1 hypervisor's PAUSE filtering fields. Signed-off-by: William Tambe <william.tambe@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21kvm: Change offset in kvm_write_guest_offset_cached to unsignedJim Mattson2-2/+4
Since the offset is added directly to the hva from the gfn_to_hva_cache, a negative offset could result in an out of bounds write. The existing BUG_ON only checks for addresses beyond the end of the gfn_to_hva_cache, not for addresses before the start of the gfn_to_hva_cache. Note that all current call sites have non-negative offsets. Fixes: 4ec6e8636256 ("kvm: Introduce kvm_write_guest_offset_cached()") Reported-by: Cfir Cohen <cfir@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Cfir Cohen <cfir@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21kvm: Disallow wraparound in kvm_gfn_to_hva_cache_initJim Mattson1-20/+21
Previously, in the case where (gpa + len) wrapped around, the entire region was not validated, as the comment claimed. It doesn't actually seem that wraparound should be allowed here at all. Furthermore, since some callers don't check the return code from this function, it seems prudent to clear ghc->memslot in the event of an error. Fixes: 8f964525a121f ("KVM: Allow cross page reads and writes from cached translations.") Reported-by: Cfir Cohen <cfir@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Cfir Cohen <cfir@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Cc: Andrew Honig <ahonig@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21KVM: VMX: Remove duplicated include from vmx.cYueHaibing1-1/+0
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21selftests: kvm: report failed stage when exit reason is unexpectedVitaly Kuznetsov2-4/+4
When we get a report like ==== Test Assertion Failure ==== x86_64/state_test.c:157: run->exit_reason == KVM_EXIT_IO pid=955 tid=955 - Success 1 0x0000000000401350: main at state_test.c:154 2 0x00007fc31c9e9412: ?? ??:0 3 0x000000000040159d: _start at ??:? Unexpected exit reason: 8 (SHUTDOWN), it is not obvious which particular stage failed. Add the info. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21KVM: x86: svm: report MSR_IA32_MCG_EXT_CTL as unsupportedVitaly Kuznetsov1-0/+7
AMD doesn't seem to implement MSR_IA32_MCG_EXT_CTL and svm code in kvm knows nothing about it, however, this MSR is among emulated_msrs and thus returned with KVM_GET_MSR_INDEX_LIST. The consequent KVM_GET_MSRS, of course, fails. Report the MSR as unsupported to not confuse userspace. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21KVM: x86: fix size of x86_fpu_cache objectsPaolo Bonzini1-1/+1
The memory allocation in b666a4b69739 ("kvm: x86: Dynamically allocate guest_fpu", 2018-11-06) is wrong, there are other members in struct fpu before the fpregs_state union and the patch should be doing something similar to the code in fpu__init_task_struct_size. It's enough to run a guest and then rmmod kvm to see slub errors which are actually caused by memory corruption. For now let's revert it to sizeof(struct fpu), which is conservative. I have plans to move fsave/fxsave/xsave directly in KVM, without using the kernel FPU helpers, and once it's done, the size of the object in the cache will be something like kvm_xstate_size. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-20Merge tag 'kvm-ppc-next-4.21-1' of ↵Radim Krčmář18-70/+418
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc PPC KVM update for 4.21 from Paul Mackerras The main new feature this time is support in HV nested KVM for passing a device that is emulated by a level 0 hypervisor and presented to level 1 as a PCI device through to a level 2 guest using VFIO. Apart from that there are improvements for migration of radix guests under HV KVM and some other fixes and cleanups.
2018-12-20Merge tag 'kvm-s390-next-4.21-1' of ↵Paolo Bonzini1-7/+17
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for 4.21 Just two small fixes.
2018-12-19Merge tag 'kvmarm-for-v4.21' of ↵Paolo Bonzini29-296/+595
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for 4.21 - Large PUD support for HugeTLB - Single-stepping fixes - Improved tracing - Various timer and vgic fixups
2018-12-19arm: KVM: Add S2_PMD_{MASK,SIZE} constantsMarc Zyngier1-0/+3
They were missing, and it turns out that we do need them now. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19arm/arm64: KVM: Add ARM_EXCEPTION_IS_TRAP macroMarc Zyngier3-1/+6
32 and 64bit use different symbols to identify the traps. 32bit has a fine grained approach (prefetch abort, data abort and HVC), while 64bit is pretty happy with just "trap". This has been fine so far, except that we now need to decode some of that in tracepoints that are common to both architectures. Introduce ARM_EXCEPTION_IS_TRAP which abstracts the trap symbols and make the tracepoint use it. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19arm64: KVM: Avoid setting the upper 32 bits of VTCR_EL2 to 1Will Deacon1-1/+1
Although bit 31 of VTCR_EL2 is RES1, we inadvertently end up setting all of the upper 32 bits to 1 as well because we define VTCR_EL2_RES1 as signed, which is sign-extended when assigning to kvm->arch.vtcr. Lucky for us, the architecture currently treats these upper bits as RES0 so, whilst we've been naughty, we haven't set fire to anything yet. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: Fix unintended stage 2 PMD mappingsChristoffer Dall1-22/+64
There are two things we need to take care of when we create block mappings in the stage 2 page tables: (1) The alignment within a PMD between the host address range and the guest IPA range must be the same, since otherwise we end up mapping pages with the wrong offset. (2) The head and tail of a memory slot may not cover a full block size, and we have to take care to not map those with block descriptors, since we could expose memory to the guest that the host did not intend to expose. So far, we have been taking care of (1), but not (2), and our commentary describing (1) was somewhat confusing. This commit attempts to factor out the checks of both into a common function, and if we don't pass the check, we won't attempt any PMD mappings for neither hugetlbfs nor THP. Note that we used to only check the alignment for THP, not for hugetlbfs, but as far as I can tell the check needs to be applied to both scenarios. Cc: Ralph Palutke <ralph.palutke@fau.de> Cc: Lukas Braun <koomi@moshbit.net> Reported-by: Lukas Braun <koomi@moshbit.net> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19arm/arm64: KVM: vgic: Force VM halt when changing the active state of GICv3 ↵Marc Zyngier1-2/+4
PPIs/SGIs We currently only halt the guest when a vCPU messes with the active state of an SPI. This is perfectly fine for GICv2, but isn't enough for GICv3, where all vCPUs can access the state of any other vCPU. Let's broaden the condition to include any GICv3 interrupt that has an active state (i.e. all but LPIs). Cc: stable@vger.kernel.org Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19arm64: KVM: Add trapped system register access tracepointMarc Zyngier3-0/+41
We're pretty blind when it comes to system register tracing, and rely on the ESR value displayed by kvm_handle_sys, which isn't much. Instead, let's add an actual name to the sysreg entries, so that we can finally print it as we're about to perform the access itself. The new tracepoint is conveniently called kvm_sys_access. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm64: Make vcpu const in vcpu_read_sys_regChristoffer Dall2-2/+2
vcpu_read_sys_reg should not be modifying the VCPU structure. Eventually, to handle EL2 sysregs for nested virtualization, we will call vcpu_read_sys_reg from places that have a const vcpu pointer, which will complain about the lack of the const modifier on the read path. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: arch_timer: Simplify kvm_timer_vcpu_terminateChristoffer Dall1-3/+0
kvm_timer_vcpu_terminate can only be called in two scenarios: 1. As part of cleanup during a failed VCPU create 2. As part of freeing the whole VM (struct kvm refcount == 0) In the first case, we cannot have programmed any timers or mapped any IRQs, and therefore we do not have to cancel anything or unmap anything. In the second case, the VCPU will have gone through kvm_timer_vcpu_put, which will have canceled the emulated physical timer's hrtimer, and we do not need to that here as well. We also do not care if the irq is recorded as mapped or not in the VGIC data structure, because the whole VM is going away. That leaves us only with having to ensure that we cancel the bg_timer if we were blocking the last time we called kvm_timer_vcpu_put(). Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: Remove arch timer workqueueChristoffer Dall2-31/+7
The use of a work queue in the hrtimer expire function for the bg_timer is a leftover from the time when we would inject interrupts when the bg_timer expired. Since we are no longer doing that, we can instead call kvm_vcpu_wake_up() directly from the hrtimer function and remove all workqueue functionality from the arch timer code. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: Fixup the kvm_exit tracepointChristoffer Dall3-13/+15
The kvm_exit tracepoint strangely always reported exits as being IRQs. This seems to be because either the __print_symbolic or the tracepoint macros use a variable named idx. Take this chance to update the fields in the tracepoint to reflect the concepts in the arm64 architecture that we pass to the tracepoint and move the exception type table to the same location and header files as the exits code. We also clear out the exception code to 0 for IRQ exits (which translates to UNKNOWN in text) to make it slighyly less confusing to parse the trace output. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: vgic: Consider priority and active state for pending irqChristoffer Dall1-1/+6
When checking if there are any pending IRQs for the VM, consider the active state and priority of the IRQs as well. Otherwise we could be continuously scheduling a guest hypervisor without it seeing an IRQ. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-19KVM: arm/arm64: vgic: Fix off-by-one bug in vgic_get_irq()Gustavo A. R. Silva1-1/+1
When using the nospec API, it should be taken into account that: "...if the CPU speculates past the bounds check then * array_index_nospec() will clamp the index within the range of [0, * size)." The above is part of the header for macro array_index_nospec() in linux/nospec.h Now, in this particular case, if intid evaluates to exactly VGIC_MAX_SPI or to exaclty VGIC_MAX_PRIVATE, the array_index_nospec() macro ends up returning VGIC_MAX_SPI - 1 or VGIC_MAX_PRIVATE - 1 respectively, instead of VGIC_MAX_SPI or VGIC_MAX_PRIVATE, which, based on the original logic: /* SGIs and PPIs */ if (intid <= VGIC_MAX_PRIVATE) return &vcpu->arch.vgic_cpu.private_irqs[intid]; /* SPIs */ if (intid <= VGIC_MAX_SPI) return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS]; are valid values for intid. Fix this by calling array_index_nospec() macro with VGIC_MAX_PRIVATE + 1 and VGIC_MAX_SPI + 1 as arguments for its parameter size. Fixes: 41b87599c743 ("KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> [dropped the SPI part which was fixed separately] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: vgic: Cap SPIs to the VM-defined maximumMarc Zyngier1-2/+2
SPIs should be checked against the VMs specific configuration, and not the architectural maximum. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm64: Clarify explanation of STAGE2_PGTABLE_LEVELSChristoffer Dall2-10/+8
In attempting to re-construct the logic for our stage 2 page table layout I found the reasoning in the comment explaining how we calculate the number of levels used for stage 2 page tables a bit backwards. This commit attempts to clarify the comment, to make it slightly easier to read without having the Arm ARM open on the right page. While we're at it, fixup a typo in a comment that was recently changed. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: vgic: Do not cond_resched_lock() with IRQs disabledJulien Thierry1-21/+0
To change the active state of an MMIO, halt is requested for all vcpus of the affected guest before modifying the IRQ state. This is done by calling cond_resched_lock() in vgic_mmio_change_active(). However interrupts are disabled at this point and we cannot reschedule a vcpu. We actually don't need any of this, as kvm_arm_halt_guest ensures that all the other vcpus are out of the guest. Let's just drop that useless code. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Suggested-by: Christoffer Dall <christoffer.dall@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm64: Add support for creating PUD hugepages at stage 2Punit Agrawal6-6/+143
KVM only supports PMD hugepages at stage 2. Now that the various page handling routines are updated, extend the stage 2 fault handling to map in PUD hugepages. Addition of PUD hugepage support enables additional page sizes (e.g., 1G with 4K granule) which can be useful on cores that support mapping larger block sizes in the TLB entries. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replace BUG() => WARN_ON(1) for arm32 PUD helpers ] Signed-off-by: Suzuki Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm64: Update age handlers to support PUD hugepagesPunit Agrawal4-19/+32
In preparation for creating larger hugepages at Stage 2, add support to the age handling notifiers for PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing code. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) for arm32 PUD helpers ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm64: Support handling access faults for PUD hugepagesPunit Agrawal4-11/+33
In preparation for creating larger hugepages at Stage 2, extend the access fault handling at Stage 2 to support PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing of code. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) in PUD helpers ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm64: Support PUD hugepage in stage2_is_exec()Punit Agrawal4-5/+61
In preparation for creating PUD hugepages at stage 2, add support for detecting execute permissions on PUD page table entries. Faults due to lack of execute permissions on page table entries is used to perform i-cache invalidation on first execute. Provide trivial implementations of arm32 helpers to allow sharing of code. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) in arm32 PUD helpers ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm64: Support dirty page tracking for PUD hugepagesPunit Agrawal3-4/+32
In preparation for creating PUD hugepages at stage 2, add support for write protecting PUD hugepages when they are encountered. Write protecting guest tables is used to track dirty pages when migrating VMs. Also, provide trivial implementations of required kvm_s2pud_* helpers to allow sharing of code with arm32. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON() in arm32 pud helpers ] Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: Introduce helpers to manipulate page table entriesPunit Agrawal3-6/+18
Introduce helpers to abstract architectural handling of the conversion of pfn to page table entries and marking a PMD page table entry as a block entry. The helpers are introduced in preparation for supporting PUD hugepages at stage 2 - which are supported on arm64 but do not exist on arm. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on faultPunit Agrawal1-13/+15
Stage 2 fault handler marks a page as executable if it is handling an execution fault or if it was a permission fault in which case the executable bit needs to be preserved. The logic to decide if the page should be marked executable is duplicated for PMD and PTE entries. To avoid creating another copy when support for PUD hugepages is introduced refactor the code to share the checks needed to mark a page table entry as executable. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: Share common code in user_mem_abort()Punit Agrawal1-19/+30
The code for operations such as marking the pfn as dirty, and dcache/icache maintenance during stage 2 fault handling is duplicated between normal pages and PMD hugepages. Instead of creating another copy of the operations when we introduce PUD hugepages, let's share them across the different pagesizes. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: vgic-v2: Set active_source to 0 when restoring stateChristoffer Dall1-1/+16
When restoring the active state from userspace, we don't know which CPU was the source for the active state, and this is not architecturally exposed in any of the register state. Set the active_source to 0 in this case. In the future, we can expand on this and exposse the information as additional information to userspace for GICv2 if anyone cares. Cc: stable@vger.kernel.org Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: Log PSTATE for unhandled sysregsMark Rutland2-6/+6
When KVM traps an unhandled sysreg/coproc access from a guest, it logs the guest PC. To aid debugging, it would be helpful to know which exception level the trap came from, along with other PSTATE/CPSR bits, so let's log the PSTATE/CPSR too. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: arm/arm64: Fix VMID alloc race by reverting to lock-lessChristoffer Dall1-12/+11
We recently addressed a VMID generation race by introducing a read/write lock around accesses and updates to the vmid generation values. However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but does so without taking the read lock. As far as I can tell, this can lead to the same kind of race: VM 0, VCPU 0 VM 0, VCPU 1 ------------ ------------ update_vttbr (vmid 254) update_vttbr (vmid 1) // roll over read_lock(kvm_vmid_lock); force_vm_exit() local_irq_disable need_new_vmid_gen == false //because vmid gen matches enter_guest (vmid 254) kvm_arch.vttbr = <PGD>:<VMID 1> read_unlock(kvm_vmid_lock); enter_guest (vmid 1) Which results in running two VCPUs in the same VM with different VMIDs and (even worse) other VCPUs from other VMs could now allocate clashing VMID 254 from the new generation as long as VCPU 0 is not exiting. Attempt to solve this by making sure vttbr is updated before another CPU can observe the updated VMID generation. Cc: stable@vger.kernel.org Fixes: f0cf47d939d0 "KVM: arm/arm64: Close VMID generation race" Reviewed-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18arm64: KVM: Consistently advance singlestep when emulating instructionsMark Rutland9-93/+46
When we emulate a guest instruction, we don't advance the hardware singlestep state machine, and thus the guest will receive a software step exception after a next instruction which is not emulated by the host. We bodge around this in an ad-hoc fashion. Sometimes we explicitly check whether userspace requested a single step, and fake a debug exception from within the kernel. Other times, we advance the HW singlestep state rely on the HW to generate the exception for us. Thus, the observed step behaviour differs for host and guest. Let's make this simpler and consistent by always advancing the HW singlestep state machine when we skip an instruction. Thus we can rely on the hardware to generate the singlestep exception for us, and never need to explicitly check for an active-pending step, nor do we need to fake a debug exception from the guest. Cc: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18arm64: KVM: Skip MMIO insn after emulationMark Rutland1-5/+6
When we emulate an MMIO instruction, we advance the CPU state within decode_hsr(), before emulating the instruction effects. Having this logic in decode_hsr() is opaque, and advancing the state before emulation is problematic. It gets in the way of applying consistent single-step logic, and it prevents us from being able to fail an MMIO instruction with a synchronous exception. Clean this up by only advancing the CPU state *after* the effects of the instruction are emulated. Cc: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-12-18KVM: s390: fix kmsg component kvm-s390Michael Mueller1-4/+3
Relocate #define statement for kvm related kernel messages before the include of printk to become effective. Signed-off-by: Michael Mueller <mimu@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2018-12-18KVM: s390: unregister debug feature on failing arch initMichael Mueller1-3/+14
Make sure the debug feature and its allocated resources get released upon unsuccessful architecture initialization. A related indication of the issue will be reported as kernel message. Signed-off-by: Michael Mueller <mimu@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20181130143215.69496-2-mimu@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2018-12-17KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guestSuraj Jitindar Singh2-10/+4
Previously when a device was being emulated by an L1 guest for an L2 guest, that device couldn't then be passed through to an L3 guest. This was because the L1 guest had no method for accessing L3 memory. The hcall H_COPY_TOFROM_GUEST provides this access. Thus this setup for passthrough can now be allowed. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Book3S: Introduce new hcall H_COPY_TOFROM_GUEST to access ↵Suraj Jitindar Singh5-4/+89
quadrants 1 & 2 A guest cannot access quadrants 1 or 2 as this would result in an exception. Thus introduce the hcall H_COPY_TOFROM_GUEST to be used by a guest when it wants to perform an access to quadrants 1 or 2, for example when it wants to access memory for one of its nested guests. Also provide an implementation for the kvm-hv module. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guestSuraj Jitindar Singh5-11/+57
Allow for a device which is being emulated at L0 (the host) for an L1 guest to be passed through to a nested (L2) guest. The existing kvmppc_hv_emulate_mmio function can be used here. The main challenge is that for a load the result must be stored into the L2 gpr, not an L1 gpr as would normally be the case after going out to qemu to complete the operation. This presents a challenge as at this point the L2 gpr state has been written back into L1 memory. To work around this we store the address in L1 memory of the L2 gpr where the result of the load is to be stored and use the new io_gpr value KVM_MMIO_REG_NESTED_GPR to indicate that this is a nested load for which completion must be done when returning back into the kernel. Then in kvmppc_complete_mmio_load() the resultant value is written into L1 memory at the location of the indicated L2 gpr. Note that we don't currently let an L1 guest emulate a device for an L2 guest which is then passed through to an L3 guest. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrantsSuraj Jitindar Singh1-2/+16
The functions kvmppc_st and kvmppc_ld are used to access guest memory from the host using a guest effective address. They do so by translating through the process table to obtain a guest real address and then using kvm_read_guest or kvm_write_guest to make the access with the guest real address. This method of access however only works for L1 guests and will give the incorrect results for a nested guest. We can however use the store_to_eaddr and load_from_eaddr kvmppc_ops to perform the access for a nested guesti (and a L1 guest). So attempt this method first and fall back to the old method if this fails and we aren't running a nested guest. At this stage there is no fall back method to perform the access for a nested guest and this is left as a future improvement. For now we will return to the nested guest and rely on the fact that a translation should be faulted in before retrying the access. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>