summaryrefslogtreecommitdiff
path: root/arch/x86/kvm
AgeCommit message (Collapse)AuthorFilesLines
2019-01-25KVM: nSVM: clear events pending from svm_complete_interrupts() when exiting ↵Vitaly Kuznetsov1-0/+8
to L1 kvm-unit-tests' eventinj "NMI failing on IDT" test results in NMI being delivered to the host (L1) when it's running nested. The problem seems to be: svm_complete_interrupts() raises 'nmi_injected' flag but later we decide to reflect EXIT_NPF to L1. The flag remains pending and we do NMI injection upon entry so it got delivered to L1 instead of L2. It seems that VMX code solves the same issue in prepare_vmcs12(), this was introduced with code refactoring in commit 5f3d5799974b ("KVM: nVMX: Rework event injection and recovery"). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25svm: Fix AVIC incomplete IPI emulationSuravee Suthikulpanit1-15/+4
In case of incomplete IPI with invalid interrupt type, the current SVM driver does not properly emulate the IPI, and fails to boot FreeBSD guests with multiple vcpus when enabling AVIC. Fix this by update APIC ICR high/low registers, which also emulate sending the IPI. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25svm: Add warning message for AVIC IPI invalid targetSuravee Suthikulpanit1-0/+2
Print warning message when IPI target ID is invalid due to one of the following reasons: * In logical mode: cluster > max_cluster (64) * In physical mode: target > max_physical (512) * Address is not present in the physical or logical ID tables Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25KVM: x86: Fix PV IPIs for 32-bit KVM hostSean Christopherson1-1/+1
The recognition of the KVM_HC_SEND_IPI hypercall was unintentionally wrapped in "#ifdef CONFIG_X86_64", causing 32-bit KVM hosts to reject any and all PV IPI requests despite advertising the feature. This results in all KVM paravirtualized guests hanging during SMP boot due to IPIs never being delivered. Fixes: 4180bf1b655a ("KVM: X86: Implement "send IPI" hypercall") Cc: stable@vger.kernel.org Cc: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25x86/kvm/hyper-v: recommend using eVMCS only when it is enabledVitaly Kuznetsov1-1/+2
We shouldn't probably be suggesting using Enlightened VMCS when it's not enabled (not supported from guest's point of view). Hyper-V on KVM seems to be fine either way but let's be consistent. Fixes: 2bc39970e932 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID") Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25x86/kvm/hyper-v: don't recommend doing reset via synthetic MSRVitaly Kuznetsov1-1/+0
System reset through synthetic MSR is not recommended neither by genuine Hyper-V nor my QEMU. Fixes: 2bc39970e932 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25kvm: x86/vmx: Use kzalloc for cached_vmcs12Tom Roeder1-4/+8
This changes the allocation of cached_vmcs12 to use kzalloc instead of kmalloc. This removes the information leak found by Syzkaller (see Reported-by) in this case and prevents similar leaks from happening based on cached_vmcs12. It also changes vmx_get_nested_state to copy out the full 4k VMCS12_SIZE in copy_to_user rather than only the size of the struct. Tested: rebuilt against head, booted, and ran the syszkaller repro https://syzkaller.appspot.com/text?tag=ReproC&x=174efca3400000 without observing any problems. Reported-by: syzbot+ded1696f6b50b615b630@syzkaller.appspotmail.com Fixes: 8fcc4b5923af5de58b80b53a069453b135693304 Cc: stable@vger.kernel.org Signed-off-by: Tom Roeder <tmroeder@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25KVM: VMX: Use the correct field var when clearing ↵Sean Christopherson1-1/+1
VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL Fix a recently introduced bug that results in the wrong VMCS control field being updated when applying a IA32_PERF_GLOBAL_CTRL errata. Fixes: c73da3fcab43 ("KVM: VMX: Properly handle dynamic VM Entry/Exit controls") Reported-by: Harald Arnesen <harald@skogtun.org> Tested-by: Harald Arnesen <harald@skogtun.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25KVM: x86: Fix single-step debuggingAlexander Popov1-2/+1
The single-step debugging of KVM guests on x86 is broken: if we run gdb 'stepi' command at the breakpoint when the guest interrupts are enabled, RIP always jumps to native_apic_mem_write(). Then other nasty effects follow. Long investigation showed that on Jun 7, 2017 the commit c8401dda2f0a00cd25c0 ("KVM: x86: fix singlestepping over syscall") introduced the kvm_run.debug corruption: kvm_vcpu_do_singlestep() can be called without X86_EFLAGS_TF set. Let's fix it. Please consider that for -stable. Signed-off-by: Alexander Popov <alex.popov@linux.com> Cc: stable@vger.kernel.org Fixes: c8401dda2f0a00cd25c0 ("KVM: x86: fix singlestepping over syscall") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-25x86/kvm/hyper-v: don't announce GUEST IDLE MSR supportVitaly Kuznetsov1-1/+0
HV_X64_MSR_GUEST_IDLE_AVAILABLE appeared in kvm_vcpu_ioctl_get_hv_cpuid() by mistake: it announces support for HV_X64_MSR_GUEST_IDLE (0x400000F0) which we don't support in KVM (yet). Fixes: 2bc39970e932 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11x86/kvm/nVMX: don't skip emulated instruction twice when vmptr address is ↵Vitaly Kuznetsov1-2/+1
not backed Since commit 09abb5e3e5e50 ("KVM: nVMX: call kvm_skip_emulated_instruction in nested_vmx_{fail,succeed}") nested_vmx_failValid() results in kvm_skip_emulated_instruction() so doing it again in handle_vmptrld() when vmptr address is not backed is wrong, we end up advancing RIP twice. Fixes: fca91f6d60b6e ("kvm: nVMX: Set VM instruction error for VMPTRLD of unbacked page") Reported-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11KVM/VMX: Avoid return error when flush tlb successfully in the ↵Lan Tianyu1-1/+1
hv_remote_flush_tlb_with_range() The "ret" is initialized to be ENOTSUPP. The return value of __hv_remote_flush_tlb_with_range() will be Or with "ret" when ept table potiners are mismatched. This will cause return ENOTSUPP even if flush tlb successfully. This patch is to fix the issue and set "ret" to 0. Fixes: a5c214dad198 ("KVM/VMX: Change hv flush logic when ept tables are mismatched.") Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11kvm: sev: Fail KVM_SEV_INIT if already initializedDavid Rientjes1-0/+3
By code inspection, it was found that multiple calls to KVM_SEV_INIT could deplete asid bits and overwrite kvm_sev_info's regions_list. Multiple calls to KVM_SVM_INIT is not likely to occur with QEMU, but this should likely be fixed anyway. This code is serialized by kvm->lock. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Reported-by: Cfir Cohen <cfir@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-11KVM: x86: Fix bit shifting in update_intel_pt_cfgGustavo A. R. Silva1-1/+1
ctl_bitmask in pt_desc is of type u64. When an integer like 0xf is being left shifted more than 32 bits, the behavior is undefined. Fix this by adding suffix ULL to integer 0xf. Addresses-Coverity-ID: 1476095 ("Bad bit shift operation") Fixes: 6c0f0bba85a0 ("KVM: x86: Introduce a function to initialize the PT configuration") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2019-01-06jump_label: move 'asm goto' support test to KconfigMasahiro Yamada1-1/+1
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label". The jump label is controlled by HAVE_JUMP_LABEL, which is defined like this: #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL) # define HAVE_JUMP_LABEL #endif We can improve this by testing 'asm goto' support in Kconfig, then make JUMP_LABEL depend on CC_HAS_ASM_GOTO. Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will match to the real kernel capability. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
2018-12-30Merge tag 'kconfig-v4.21' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kconfig updates from Masahiro Yamada: - support -y option for merge_config.sh to avoid downgrading =y to =m - remove S_OTHER symbol type, and touch include/config/*.h files correctly - fix file name and line number in lexer warnings - fix memory leak when EOF is encountered in quotation - resolve all shift/reduce conflicts of the parser - warn no new line at end of file - make 'source' statement more strict to take only string literal - rewrite the lexer and remove the keyword lookup table - convert to SPDX License Identifier - compile C files independently instead of including them from zconf.y - fix various warnings of gconfig - misc cleanups * tag 'kconfig-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (39 commits) kconfig: surround dbg_sym_flags with #ifdef DEBUG to fix gconf warning kconfig: split images.c out of qconf.cc/gconf.c to fix gconf warnings kconfig: add static qualifiers to fix gconf warnings kconfig: split the lexer out of zconf.y kconfig: split some C files out of zconf.y kconfig: convert to SPDX License Identifier kconfig: remove keyword lookup table entirely kconfig: update current_pos in the second lexer kconfig: switch to ASSIGN_VAL state in the second lexer kconfig: stop associating kconf_id with yylval kconfig: refactor end token rules kconfig: stop supporting '.' and '/' in unquoted words treewide: surround Kconfig file paths with double quotes microblaze: surround string default in Kconfig with double quotes kconfig: use T_WORD instead of T_VARIABLE for variables kconfig: use specific tokens instead of T_ASSIGN for assignments kconfig: refactor scanning and parsing "option" properties kconfig: use distinct tokens for type and default properties kconfig: remove redundant token defines kconfig: rename depends_list to comment_option_list ...
2018-12-27Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Remove trampoline_handler() prototype x86/kernel: Fix more -Wmissing-prototypes warnings x86: Fix various typos in comments x86/headers: Fix -Wmissing-prototypes warning x86/process: Avoid unnecessary NULL check in get_wchan() x86/traps: Complete prototype declarations x86/mce: Fix -Wmissing-prototypes warnings x86/gart: Rewrite early_gart_iommu_check() comment
2018-12-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds26-15449/+16669
Pull KVM updates from Paolo Bonzini: "ARM: - selftests improvements - large PUD support for HugeTLB - single-stepping fixes - improved tracing - various timer and vGIC fixes x86: - Processor Tracing virtualization - STIBP support - some correctness fixes - refactorings and splitting of vmx.c - use the Hyper-V range TLB flush hypercall - reduce order of vcpu struct - WBNOINVD support - do not use -ftrace for __noclone functions - nested guest support for PAUSE filtering on AMD - more Hyper-V enlightenments (direct mode for synthetic timers) PPC: - nested VFIO s390: - bugfixes only this time" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits) KVM: x86: Add CPUID support for new instruction WBNOINVD kvm: selftests: ucall: fix exit mmio address guessing Revert "compiler-gcc: disable -ftracer for __noclone functions" KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entry KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range() KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp() KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte() KVM: Make kvm_set_spte_hva() return int KVM: Replace old tlb flush function with new one to flush a specified range. KVM/MMU: Add tlb flush with range helper function KVM/VMX: Add hv tlb range flush support x86/hyper-v: Add HvFlushGuestAddressList hypercall support KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops KVM: x86: Disable Intel PT when VMXON in L1 guest KVM: x86: Set intercept for Intel PT MSRs read/write KVM: x86: Implement Intel PT MSRs read/write emulation ...
2018-12-21treewide: surround Kconfig file paths with double quotesMasahiro Yamada1-1/+1
The Kconfig lexer supports special characters such as '.' and '/' in the parameter context. In my understanding, the reason is just to support bare file paths in the source statement. I do not see a good reason to complicate Kconfig for the room of ambiguity. The majority of code already surrounds file paths with double quotes, and it makes sense since file paths are constant string literals. Make it treewide consistent now. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Ingo Molnar <mingo@kernel.org>
2018-12-21KVM: x86: Add CPUID support for new instruction WBNOINVDRobert Hoo1-1/+1
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routinesSean Christopherson5-34/+78
Transitioning to/from a VMX guest requires KVM to manually save/load the bulk of CPU state that the guest is allowed to direclty access, e.g. XSAVE state, CR2, GPRs, etc... For obvious reasons, loading the guest's GPR snapshot prior to VM-Enter and saving the snapshot after VM-Exit is done via handcoded assembly. The assembly blob is written as inline asm so that it can easily access KVM-defined structs that are used to hold guest state, e.g. moving the blob to a standalone assembly file would require generating defines for struct offsets. The other relevant aspect of VMX transitions in KVM is the handling of VM-Exits. KVM doesn't employ a separate VM-Exit handler per se, but rather treats the VMX transition as a mega instruction (with many side effects), i.e. sets the VMCS.HOST_RIP to a label immediately following VMLAUNCH/VMRESUME. The label is then exposed to C code via a global variable definition in the inline assembly. Because of the global variable, KVM takes steps to (attempt to) ensure only a single instance of the owning C function, e.g. vmx_vcpu_run, is generated by the compiler. The earliest approach placed the inline assembly in a separate noinline function[1]. Later, the assembly was folded back into vmx_vcpu_run() and tagged with __noclone[2][3], which is still used today. After moving to __noclone, an edge case was encountered where GCC's -ftracer optimization resulted in the inline assembly blob being duplicated. This was "fixed" by explicitly disabling -ftracer in the __noclone definition[4]. Recently, it was found that disabling -ftracer causes build warnings for unsuspecting users of __noclone[5], and more importantly for KVM, prevents the compiler for properly optimizing vmx_vcpu_run()[6]. And perhaps most importantly of all, it was pointed out that there is no way to prevent duplication of a function with 100% reliability[7], i.e. more edge cases may be encountered in the future. So to summarize, the only way to prevent the compiler from duplicating the global variable definition is to move the variable out of inline assembly, which has been suggested several times over[1][7][8]. Resolve the aforementioned issues by moving the VMLAUNCH+VRESUME and VM-Exit "handler" to standalone assembly sub-routines. Moving only the core VMX transition codes allows the struct indexing to remain as inline assembly and also allows the sub-routines to be used by nested_vmx_check_vmentry_hw(). Reusing the sub-routines has a happy side-effect of eliminating two VMWRITEs in the nested_early_check path as there is no longer a need to dynamically change VMCS.HOST_RIP. Note that callers to vmx_vmenter() must account for the CALL modifying RSP, e.g. must subtract op-size from RSP when synchronizing RSP with VMCS.HOST_RSP and "restore" RSP prior to the CALL. There are no great alternatives to fudging RSP. Saving RSP in vmx_enter() is difficult because doing so requires a second register (VMWRITE does not provide an immediate encoding for the VMCS field and KVM supports Hyper-V's memory-based eVMCS ABI). The other more drastic alternative would be to use eschew VMCS.HOST_RSP and manually save/load RSP using a per-cpu variable (which can be encoded as e.g. gs:[imm]). But because a valid stack is needed at the time of VM-Exit (NMIs aren't blocked and a user could theoretically insert INT3/INT1ICEBRK at the VM-Exit handler), a dedicated per-cpu VM-Exit stack would be required. A dedicated stack isn't difficult to implement, but it would require at least one page per CPU and knowledge of the stack in the dumpstack routines. And in most cases there is essentially zero overhead in dynamically updating VMCS.HOST_RSP, e.g. the VMWRITE can be avoided for all but the first VMLAUNCH unless nested_early_check=1, which is not a fast path. In other words, avoiding the VMCS.HOST_RSP by using a dedicated stack would only make the code marginally less ugly while requiring at least one page per CPU and forcing the kernel to be aware (and approve) of the VM-Exit stack shenanigans. [1] cea15c24ca39 ("KVM: Move KVM context switch into own function") [2] a3b5ba49a8c5 ("KVM: VMX: add the __noclone attribute to vmx_vcpu_run") [3] 104f226bfd0a ("KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()") [4] 95272c29378e ("compiler-gcc: disable -ftracer for __noclone functions") [5] https://lkml.kernel.org/r/20181218140105.ajuiglkpvstt3qxs@treble [6] https://patchwork.kernel.org/patch/8707981/#21817015 [7] https://lkml.kernel.org/r/ri6y38lo23g.fsf@suse.cz [8] https://lkml.kernel.org/r/20181218212042.GE25620@tassilo.jf.intel.com Suggested-by: Andi Kleen <ak@linux.intel.com> Suggested-by: Martin Jambor <mjambor@suse.cz> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Martin Jambor <mjambor@suse.cz> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobsSean Christopherson2-42/+50
Use '%% " _ASM_CX"' instead of '%0' to dereference RCX, i.e. the 'struct vcpu_vmx' pointer, in the VM-Enter asm blobs of vmx_vcpu_run() and nested_vmx_check_vmentry_hw(). Using the symbolic name means that adding/removing an output parameter(s) requires "rewriting" almost all of the asm blob, which makes it nearly impossible to understand what's being changed in even the most minor patches. Opportunistically improve the code comments. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streamsUros Bizjak1-6/+6
Recently the minimum required version of binutils was changed to 2.20, which supports all SVM instruction mnemonics. The patch removes all .byte #defines and uses real instruction mnemonics instead. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()Lan Tianyu1-3/+13
Originally, flush tlb is done by slot_handle_level_range(). This patch moves the flush directly to kvm_zap_gfn_range() when range flush is available, so that only the requested range can be flushed. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp()Lan Tianyu1-0/+5
This patch is to flush tlb directly in kvm_set_pte_rmapp() function when Hyper-V remote TLB flush is available, returning 0 so that kvm_mmu_notifier_change_pte() does not flush again. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()Lan Tianyu1-6/+2
This patch is to move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte() in order to avoid redundant tlb flush. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: Make kvm_set_spte_hva() return intLan Tianyu1-1/+2
The patch is to make kvm_set_spte_hva() return int and caller can check return value to determine flush tlb or not. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: Replace old tlb flush function with new one to flush a specified range.Lan Tianyu2-11/+23
This patch is to replace kvm_flush_remote_tlbs() with kvm_flush_ remote_tlbs_with_address() in some functions without logic change. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM/MMU: Add tlb flush with range helper functionLan Tianyu1-1/+36
This patch is to add wrapper functions for tlb_remote_flush_with_range callback and flush tlb directly in kvm_mmu_zap_collapsible_spte(). kvm_mmu_zap_collapsible_spte() returns flush request to the slot_handle_leaf() and the latter does flush on demand. When range flush is available, make kvm_mmu_zap_collapsible_spte() to flush tlb with range directly to avoid returning range back to slot_handle_leaf(). Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM/VMX: Add hv tlb range flush supportLan Tianyu1-17/+44
This patch is to register tlb_remote_flush_with_range callback with hv tlb range flush interface. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: x86: Disable Intel PT when VMXON in L1 guestLuwei Kang2-1/+8
Currently, Intel Processor Trace do not support tracing in L1 guest VMX operation(IA32_VMX_MISC[bit 14] is 0). As mentioned in SDM, on these type of processors, execution of the VMXON instruction will clears IA32_RTIT_CTL.TraceEn and any attempt to write IA32_RTIT_CTL causes a general-protection exception (#GP). Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: x86: Set intercept for Intel PT MSRs read/writeChao Peng2-7/+27
To save performance overhead, disable intercept Intel PT MSRs read/write when Intel PT is enabled in guest. MSR_IA32_RTIT_CTL is an exception that will always be intercepted. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: x86: Implement Intel PT MSRs read/write emulationChao Peng2-1/+216
This patch implement Intel Processor Trace MSRs read/write emulation. Intel PT MSRs read/write need to be emulated when Intel PT MSRs is intercepted in guest and during live migration. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: x86: Introduce a function to initialize the PT configurationLuwei Kang1-0/+73
Initialize the Intel PT configuration when cpuid update. Include cpuid inforamtion, rtit_ctl bit mask and the number of address ranges. Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: x86: Add Intel PT context switch for each vcpuChao Peng2-0/+95
Load/Store Intel Processor Trace register in context switch. MSR IA32_RTIT_CTL is loaded/stored automatically from VMCS. In Host-Guest mode, we need load/resore PT MSRs only when PT is enabled in guest. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: x86: Add Intel Processor Trace cpuid emulationChao Peng3-2/+32
Expose Intel Processor Trace to guest only when the PT works in Host-Guest mode. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: x86: Add Intel PT virtualization work modeChao Peng3-3/+42
Intel Processor Trace virtualization can be work in one of 2 possible modes: a. System-Wide mode (default): When the host configures Intel PT to collect trace packets of the entire system, it can leave the relevant VMX controls clear to allow VMX-specific packets to provide information across VMX transitions. KVM guest will not aware this feature in this mode and both host and KVM guest trace will output to host buffer. b. Host-Guest mode: Host can configure trace-packet generation while in VMX non-root operation for guests and root operation for native executing normally. Intel PT will be exposed to KVM guest in this mode, and the trace output to respective buffer of host and guest. In this mode, tht status of PT will be saved and disabled before VM-entry and restored after VM-exit if trace a virtual machine. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: fix some typosWei Yang2-2/+2
Signed-off-by: Wei Yang <richard.weiyang@gmail.com> [Preserved the iff and a probably intentional weird bracket notation. Also dropped the style change to make a single-purpose patch. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21KVM: x86: Remove KF() macro placeholderSean Christopherson1-3/+0
Although well-intentioned, keeping the KF() definition as a hint for handling scattered CPUID features may be counter-productive. Simply redefining the bit position only works for directly manipulating the guest's CPUID leafs, e.g. it doesn't make guest_cpuid_has() magically work. Taking an alternative approach, e.g. ensuring the bit position is identical between the Linux-defined and hardware-defined features, may be a simpler and/or more effective method of exposing scattered features to the guest. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21kvm: vmx: Allow guest read access to IA32_TSCJim Mattson1-0/+1
Let the guest read the IA32_TSC MSR with the generic RDMSR instruction as well as the specific RDTSC(P) instructions. Note that the hardware applies the TSC multiplier and offset (when applicable) to the result of RDMSR(IA32_TSC), just as it does to the result of RDTSC(P). Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21kvm: nVMX: NMI-window and interrupt-window exiting should wake L2 from HLTJim Mattson1-3/+7
According to the SDM, "NMI-window exiting" VM-exits wake a logical processor from the same inactive states as would an NMI and "interrupt-window exiting" VM-exits wake a logical processor from the same inactive states as would an external interrupt. Specifically, they wake a logical processor from the shutdown state and from the states entered using the HLT and MWAIT instructions. Fixes: 6dfacadd5858 ("KVM: nVMX: Add support for activity state HLT") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> [Squashed comments of two Jim's patches and used the simplified code hunk provided by Sean. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21KVM: nSVM: Fix nested guest support for PAUSE filtering.Tambe, William1-0/+12
Currently, the nested guest's PAUSE intercept intentions are not being honored. Instead, since the L0 hypervisor's pause_filter_count and pause_filter_thresh values are still in place, these values are used instead of those programmed in the VMCB by the L1 hypervisor. To honor the desired PAUSE intercept support of the L1 hypervisor, the L0 hypervisor must use the PAUSE filtering fields of the L1 hypervisor. This requires saving and restoring of both the L0 and L1 hypervisor's PAUSE filtering fields. Signed-off-by: William Tambe <william.tambe@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21KVM: VMX: Remove duplicated include from vmx.cYueHaibing1-1/+0
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21KVM: x86: svm: report MSR_IA32_MCG_EXT_CTL as unsupportedVitaly Kuznetsov1-0/+7
AMD doesn't seem to implement MSR_IA32_MCG_EXT_CTL and svm code in kvm knows nothing about it, however, this MSR is among emulated_msrs and thus returned with KVM_GET_MSR_INDEX_LIST. The consequent KVM_GET_MSRS, of course, fails. Report the MSR as unsupported to not confuse userspace. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-12-21KVM: x86: fix size of x86_fpu_cache objectsPaolo Bonzini1-1/+1
The memory allocation in b666a4b69739 ("kvm: x86: Dynamically allocate guest_fpu", 2018-11-06) is wrong, there are other members in struct fpu before the fpregs_state union and the patch should be doing something similar to the code in fpu__init_task_struct_size. It's enough to run a guest and then rmmod kvm to see slub errors which are actually caused by memory corruption. For now let's revert it to sizeof(struct fpu), which is conservative. I have plans to move fsave/fxsave/xsave directly in KVM, without using the kernel FPU helpers, and once it's done, the size of the object in the cache will be something like kvm_xstate_size. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-20KVM: x86: nSVM: fix switch to guest mmuVitaly Kuznetsov1-1/+3
Recent optimizations in MMU code broke nested SVM with NPT in L1 completely: when we do nested_svm_{,un}init_mmu_context() we want to switch from TDP MMU to shadow MMU, both init_kvm_tdp_mmu() and kvm_init_shadow_mmu() check if re-configuration is needed by looking at cache source data. The data, however, doesn't change - it's only the type of the MMU which changes. We end up not re-initializing guest MMU as shadow and everything goes off the rails. The issue could have been fixed by putting MMU type into extended MMU role but this is not really needed. We can just split root and guest MMUs the exact same way we did for nVMX, their types never change in the lifetime of a vCPU. There is still room for improvement: currently, we reset all MMU roots when switching from L1 to L2 and back and this is not needed. Fixes: 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-19kvm: x86: Add AMD's EX_CFG to the list of ignored MSRsEduardo Habkost1-0/+2
Some guests OSes (including Windows 10) write to MSR 0xc001102c on some cases (possibly while trying to apply a CPU errata). Make KVM ignore reads and writes to that MSR, so the guest won't crash. The MSR is documented as "Execution Unit Configuration (EX_CFG)", at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors". Cc: stable@vger.kernel.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-19KVM: X86: Fix NULL deref in vcpu_scan_ioapicWanpeng Li1-1/+1
Reported by syzkaller: CPU: 1 PID: 5962 Comm: syz-executor118 Not tainted 4.20.0-rc6+ #374 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline] RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline] RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline] RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline] RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074 Call Trace: kvm_vcpu_ioctl+0x5c8/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2596 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT14 msr and triggers scan ioapic logic to load synic vectors into EOI exit bitmap. However, irqchip is not initialized by this simple testcase, ioapic/apic objects should not be accessed. This patch fixes it by also considering whether or not apic is present. Reported-by: syzbot+39810e6c400efadfef71@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-19KVM: Fix UAF in nested posted interrupt processingCfir Cohen1-0/+2
nested_get_vmcs12_pages() processes the posted_intr address in vmcs12. It caches the kmap()ed page object and pointer, however, it doesn't handle errors correctly: it's possible to cache a valid pointer, then release the page and later dereference the dangling pointer. I was able to reproduce with the following steps: 1. Call vmlaunch with valid posted_intr_desc_addr but an invalid MSR_EFER. This causes nested_get_vmcs12_pages() to cache the kmap()ed pi_desc_page and pi_desc. Later the invalid EFER value fails check_vmentry_postreqs() which fails the first vmlaunch. 2. Call vmlanuch with a valid EFER but an invalid posted_intr_desc_addr (I set it to 2G - 0x80). The second time we call nested_get_vmcs12_pages pi_desc_page is unmapped and released and pi_desc_page is set to NULL (the "shouldn't happen" clause). Due to the invalid posted_intr_desc_addr, kvm_vcpu_gpa_to_page() fails and nested_get_vmcs12_pages() returns. It doesn't return an error value so vmlaunch proceeds. Note that at this time we have a dangling pointer in vmx->nested.pi_desc and POSTED_INTR_DESC_ADDR in L0's vmcs. 3. Issue an IPI in L2 guest code. This triggers a call to vmx_complete_nested_posted_interrupt() and pi_test_and_clear_on() which dereferences the dangling pointer. Vulnerable code requires nested and enable_apicv variables to be set to true. The host CPU must also support posted interrupts. Fixes: 5e2f30b756a37 "KVM: nVMX: get rid of nested_get_page()" Cc: stable@vger.kernel.org Reviewed-by: Andy Honig <ahonig@google.com> Signed-off-by: Cfir Cohen <cfir@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14kvm: x86: Dynamically allocate guest_fpuMarc Orr3-13/+58
Previously, the guest_fpu field was embedded in the kvm_vcpu_arch struct. Unfortunately, the field is quite large, (e.g., 4352 bytes on my current setup). This bloats the kvm_vcpu_arch struct for x86 into an order 3 memory allocation, which can become a problem on overcommitted machines. Thus, this patch moves the fpu state outside of the kvm_vcpu_arch struct. With this patch applied, the kvm_vcpu_arch struct is reduced to 15168 bytes for vmx on my setup when building the kernel with kvmconfig. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>