summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)AuthorFilesLines
2020-03-16KVM: x86: Use KVM cpu caps to track UMIP emulationSean Christopherson1-1/+0
Set UMIP in kvm_cpu_caps when it is emulated by VMX, even though the bit will effectively be dropped by do_host_cpuid(). This allows checking for UMIP emulation via kvm_cpu_caps instead of a dedicated kvm_x86_ops callback. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Move XSAVES CPUID adjust to VMX's KVM cpu cap updateSean Christopherson1-1/+0
Move the clearing of the XSAVES CPUID bit into VMX, which has a separate VMCS control to enable XSAVES in non-root, to eliminate the last ugly renmant of the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the common CPUID handling code. Drop ->xsaves_supported(), CPUID adjustment was the only user. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Handle PKU CPUID adjustment in VMX codeSean Christopherson1-1/+0
Move the setting of the PKU CPUID bit into VMX to eliminate an instance of the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the common CPUID handling code. Drop ->pku_supported(), CPUID adjustment was the only user. Note, some AMD CPUs now support PKU, but SVM doesn't yet support exposing it to a guest. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Handle INVPCID CPUID adjustment in VMX codeSean Christopherson1-1/+0
Move the INVPCID CPUID adjustments into VMX to eliminate an instance of the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the common CPUID handling code. Drop ->invpcid_supported(), CPUID adjustment was the only user. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Drop explicit @func param from ->set_supported_cpuid()Sean Christopherson1-1/+1
Drop the explicit @func param from ->set_supported_cpuid() and instead pull the CPUID function from the relevant entry. This sets the stage for hardening guest CPUID updates in future patches, e.g. allows adding run-time assertions that the CPUID feature being changed is actually a bit in the referenced CPUID entry. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Use supported_xcr0 to detect MPX supportSean Christopherson1-1/+1
Query supported_xcr0 when checking for MPX support instead of invoking ->mpx_supported() and drop ->mpx_supported() as kvm_mpx_supported() was its last user. Rename vmx_mpx_supported() to cpu_has_vmx_mpx() to better align with VMX/VMCS nomenclature. Modify VMX's adjustment of xcr0 to call cpus_has_vmx_mpx() (renamed from vmx_mpx_supported()) directly to avoid reading supported_xcr0 before it's fully configured. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Test that *all* bits are set. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Move kvm_emulate.h into KVM's private directorySean Christopherson2-483/+4
Now that the emulation context is dynamically allocated and not embedded in struct kvm_vcpu, move its header, kvm_emulate.h, out of the public asm directory and into KVM's private x86 directory. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Dynamically allocate per-vCPU emulation contextSean Christopherson2-1/+2
Allocate the emulation context instead of embedding it in struct kvm_vcpu_arch. Dynamic allocation provides several benefits: - Shrinks the size x86 vcpus by ~2.5k bytes, dropping them back below the PAGE_ALLOC_COSTLY_ORDER threshold. - Allows for dropping the include of kvm_emulate.h from asm/kvm_host.h and moving kvm_emulate.h into KVM's private directory. - Allows a reducing KVM's attack surface by shrinking the amount of vCPU data that is exposed to usercopy. - Allows a future patch to disable the emulator entirely, which may or may not be a realistic endeavor. Mark the entire struct as valid for usercopy to maintain existing behavior with respect to hardened usercopy. Future patches can shrink the usercopy range to cover only what is necessary. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Explicitly pass an exception struct to check_interceptSean Christopherson1-1/+2
Explicitly pass an exception struct when checking for intercept from the emulator, which eliminates the last reference to arch.emulate_ctxt in vendor specific code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86/mmu: Rename kvm_mmu->get_cr3() to ->get_guest_pgd()Sean Christopherson1-1/+1
Rename kvm_mmu->get_cr3() to call out that it is retrieving a guest value, as opposed to kvm_mmu->set_cr3(), which sets a host value, and to note that it will return something other than CR3 when nested EPT is in use. Hopefully the new name will also make it more obvious that L1's nested_cr3 is returned in SVM's nested NPT case. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: nVMX: Allow L1 to use 5-level page walks for nested EPTSean Christopherson1-0/+12
Add support for 5-level nested EPT, and advertise said support in the EPT capabilities MSR. KVM's MMU can already handle 5-level legacy page tables, there's no reason to force an L1 VMM to use shadow paging if it wants to employ 5-level page tables. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hackSean Christopherson1-1/+0
Drop kvm_mmu_extended_role.cr4_la57 now that mmu_role doesn't mask off level, which already incorporates the guest's CR4.LA57 for a shadow MMU by querying is_la57_mode(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: nVMX: Properly handle userspace interrupt window requestSean Christopherson1-1/+1
Return true for vmx_interrupt_allowed() if the vCPU is in L2 and L1 has external interrupt exiting enabled. IRQs are never blocked in hardware if the CPU is in the guest (L2 from L1's perspective) when IRQs trigger VM-Exit. The new check percolates up to kvm_vcpu_ready_for_interrupt_injection() and thus vcpu_run(), and so KVM will exit to userspace if userspace has requested an interrupt window (to inject an IRQ into L1). Remove the @external_intr param from vmx_check_nested_events(), which is actually an indicator that userspace wants an interrupt window, e.g. it's named @req_int_win further up the stack. Injecting a VM-Exit into L1 to try and bounce out to L0 userspace is all kinds of broken and is no longer necessary. Remove the hack in nested_vmx_vmexit() that attempted to workaround the breakage in vmx_check_nested_events() by only filling interrupt info if there's an actual interrupt pending. The hack actually made things worse because it caused KVM to _never_ fill interrupt info when the LAPIC resides in userspace (kvm_cpu_has_interrupt() queries interrupt.injected, which is always cleared by prepare_vmcs12() before reaching the hack in nested_vmx_vmexit()). Fixes: 6550c4df7e50 ("KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"") Cc: stable@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: LAPIC: Recalculate apic map in batchWanpeng Li1-0/+1
In the vCPU reset and set APIC_BASE MSR path, the apic map will be recalculated several times, each time it will consume 10+ us observed by ftrace in my non-overcommit environment since the expensive memory allocate/mutex/rcu etc operations. This patch optimizes it by recaluating apic map in batch, I hope this can benefit the serverless scenario which can frequently create/destroy VMs. Before patch: kvm_lapic_reset ~27us After patch: kvm_lapic_reset ~14us Observed by ftrace, improve ~48%. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: enable dirty log gradually in small chunksJay Zhou1-1/+5
It could take kvm->mmu_lock for an extended period of time when enabling dirty log for the first time. The main cost is to clear all the D-bits of last level SPTEs. This situation can benefit from manual dirty log protect as well, which can reduce the mmu_lock time taken. The sequence is like this: 1. Initialize all the bits of the dirty bitmap to 1 when enabling dirty log for the first time 2. Only write protect the huge pages 3. KVM_GET_DIRTY_LOG returns the dirty bitmap info 4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level SPTEs gradually in small chunks Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment, I did some tests with a 128G windows VM and counted the time taken of memory_global_dirty_log_start, here is the numbers: VM Size Before After optimization 128G 460ms 10ms Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: SVM: Inhibit APIC virtualization for X2APIC guestOliver Upton1-0/+1
The AVIC does not support guest use of the x2APIC interface. Currently, KVM simply chooses to squash the x2APIC feature in the guest's CPUID If the AVIC is enabled. Doing so prevents KVM from running a guest with greater than 255 vCPUs, as such a guest necessitates the use of the x2APIC interface. Instead, inhibit AVIC enablement on a per-VM basis whenever the x2APIC feature is set in the guest's CPUID. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Consolidate VM allocation and free for VMX and SVMSean Christopherson1-8/+4
Move the VM allocation and free code to common x86 as the logic is more or less identical across SVM and VMX. Note, although hyperv.hv_pa_pg is part of the common kvm->arch, it's (currently) only allocated by VMX VMs. But, since kfree() plays nice when passed a NULL pointer, the superfluous call for SVM is harmless and avoids future churn if SVM gains support for HyperV's direct TLB flush. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> [Make vm_size a field instead of a function. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: Simplify kvm_free_memslot() and all its descendentsSean Christopherson1-2/+1
Now that all callers of kvm_free_memslot() pass NULL for @dont, remove the param from the top-level routine and all arch's implementations. No functional change intended. Tested-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Move gpa_val and gpa_available into the emulator contextSean Christopherson2-4/+4
Move the GPA tracking into the emulator context now that the context is guaranteed to be initialized via __init_emulate_ctxt() prior to dereferencing gpa_{available,val}, i.e. now that seeing a stale gpa_available will also trigger a WARN due to an invalid context. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Add EMULTYPE_PF when emulation is triggered by a page faultSean Christopherson1-3/+9
Add a new emulation type flag to explicitly mark emulation related to a page fault. Move the propation of the GPA into the emulator from the page fault handler into x86_emulate_instruction, using EMULTYPE_PF as an indicator that cr2 is valid. Similarly, don't propagate cr2 into the exception.address when it's *not* valid. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+0
Pull kvm fixes from Paolo Bonzini: "Bugfixes for x86 and s390" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1 KVM: s390: Also reset registers in sync regs for initial cpu reset KVM: fix Kconfig menu text for -Werror KVM: x86: remove stale comment from struct x86_emulate_ctxt KVM: x86: clear stale x86_emulate_ctxt->intercept value KVM: SVM: Fix the svm vmexit code for WRMSR KVM: X86: Fix dereference null cpufreq policy
2020-03-12x86/cpu/amd: Call init_amd_zn() om Family 19h processors tooKim Phillips1-1/+1
Family 19h CPUs are Zen-based and still share most architectural features with Family 17h CPUs, and therefore still need to call init_amd_zn() e.g., to set the RECLAIM_DISTANCE override. init_amd_zn() also sets X86_FEATURE_ZEN, which today is only used in amd_set_core_ssb_state(), which isn't called on some late model Family 17h CPUs, nor on any Family 19h CPUs: X86_FEATURE_AMD_SSBD replaces X86_FEATURE_LS_CFG_SSBD on those later model CPUs, where the SSBD mitigation is done via the SPEC_CTRL MSR instead of the LS_CFG MSR. Family 19h CPUs also don't have the erratum where the CPB feature bit isn't set, but that code can stay unchanged and run safely on Family 19h. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200311191451.13221-1-kim.phillips@amd.com
2020-03-10x86/mce/dev-mcelog: Dynamically allocate space for machine check recordsTony Luck1-3/+3
We have had a hard coded limit of 32 machine check records since the dawn of time. But as numbers of cores increase, it is possible for more than 32 errors to be reported before a user process reads from /dev/mcelog. In this case the additional errors are lost. Keep 32 as the minimum. But tune the maximum value up based on the number of processors. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200218184408.GA23048@agluck-desk2.amr.corp.intel.com
2020-03-09PCI: hv: Introduce hv_msi_entryBoqun Feng2-2/+17
Add a new structure (hv_msi_entry), which is also defined in the TLFS, to describe the msi entry for HVCALL_RETARGET_INTERRUPT. The structure is needed because its layout may be different from architecture to architecture. Also add a new generic interface hv_set_msi_entry_from_desc() to allow different archs to set the msi entry from msi_desc. No functional change, only preparation for the future support of virtual PCI on non-x86 architectures. Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Dexuan Cui <decui@microsoft.com>
2020-03-09PCI: hv: Move retarget related structures into tlfs headerBoqun Feng1-0/+31
Currently, retarget_msi_interrupt and other structures it relys on are defined in pci-hyperv.c. However, those structures are actually defined in Hypervisor Top-Level Functional Specification [1] and may be different in sizes of fields or layout from architecture to architecture. Let's move those definitions into x86's tlfs header file to support virtual PCI on non-x86 architectures in the future. Note that "__packed" attribute is added to these structures during the movement for the same reason as we use the attribute for other TLFS structures in the header file: make sure the structures meet the specification and avoid anything unexpected from the compilers. Additionally, rename struct retarget_msi_interrupt to hv_retarget_msi_interrupt for the consistent naming convention, also mirroring the name in TLFS. [1]: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Dexuan Cui <decui@microsoft.com>
2020-03-09PCI: hv: Move hypercall related definitions into tlfs headerBoqun Feng1-0/+3
Currently HVCALL_RETARGET_INTERRUPT and HV_PARTITION_ID_SELF are defined in pci-hyperv.c. However, similar to other hypercall related definitions, it makes more sense to put them in the tlfs header file. Besides, these definitions are arch-dependent, so for the support of virtual PCI on non-x86 archs in the future, move them into arch-specific tlfs header file. Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk> Reviewed-by: Dexuan Cui <decui@microsoft.com>
2020-03-08Merge branch 'efi/urgent' into efi/core, to pick up fixesIngo Molnar3-1/+19
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-06Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar8-4/+36
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-03KVM: x86: remove stale comment from struct x86_emulate_ctxtVitaly Kuznetsov1-1/+0
Commit c44b4c6ab80e ("KVM: emulate: clean up initializations in init_decode_cache") did some field shuffling and instead of [opcode_len, _regs) started clearing [has_seg_override, modrm). The comment about clearing fields altogether is not true anymore. Fixes: c44b4c6ab80e ("KVM: emulate: clean up initializations in init_decode_cache") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-02Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds3-1/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: a pkeys fix for a bug that triggers with weird BIOS settings, and two Xen PV fixes: a paravirt interface fix, and pagetable dumping fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix dump_pagetables with Xen PV x86/ioperm: Add new paravirt function update_io_bitmap() x86/pkeys: Manually set X86_FEATURE_OSPKE to preserve existing changes
2020-02-29x86/ioperm: Add new paravirt function update_io_bitmap()Juergen Gross3-1/+19
Commit 111e7b15cf10f6 ("x86/ioperm: Extend IOPL config to control ioperm() as well") reworked the iopl syscall to use I/O bitmaps. Unfortunately this broke Xen PV domains using that syscall as there is currently no I/O bitmap support in PV domains. Add I/O bitmap support via a new paravirt function update_io_bitmap which Xen PV domains can use to update their I/O bitmaps via a hypercall. Fixes: 111e7b15cf10f6 ("x86/ioperm: Extend IOPL config to control ioperm() as well") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> # 5.5 Link: https://lkml.kernel.org/r/20200218154712.25490-1-jgross@suse.com
2020-02-27x86/irq: Remove useless return value from do_IRQ()Thomas Gleixner1-1/+1
Nothing is using it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200225220216.826870369@linutronix.de
2020-02-27x86/traps: Remove redundant declaration of do_double_fault()Thomas Gleixner1-7/+7
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200225220216.720335354@linutronix.de
2020-02-27x86/entry/32: Force MCE through do_mce()Thomas Gleixner1-3/+0
Remove the pointless difference between 32 and 64 bit to make further unifications simpler. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200225220216.428188397@linutronix.de
2020-02-27x86/mce: Disable tracing and kprobes on do_machine_check()Andy Lutomirski1-3/+0
do_machine_check() can be raised in almost any context including the most fragile ones. Prevent kprobes and tracing. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200225220216.315548935@linutronix.de
2020-02-26Merge tag 'efi-next' of ↵Ingo Molnar1-4/+19
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core Pull EFI updates for v5.7 from Ard Biesheuvel: This time, the set of changes for the EFI subsystem is much larger than usual. The main reasons are: - Get things cleaned up before EFI support for RISC-V arrives, which will increase the size of the validation matrix, and therefore the threshold to making drastic changes, - After years of defunct maintainership, the GRUB project has finally started to consider changes from the distros regarding UEFI boot, some of which are highly specific to the way x86 does UEFI secure boot and measured boot, based on knowledge of both shim internals and the layout of bootparams and the x86 setup header. Having this maintenance burden on other architectures (which don't need shim in the first place) is hard to justify, so instead, we are introducing a generic Linux/UEFI boot protocol. Summary of changes: - Boot time GDT handling changes (Arvind) - Simplify handling of EFI properties table on arm64 - Generic EFI stub cleanups, to improve command line handling, file I/O, memory allocation, etc. - Introduce a generic initrd loading method based on calling back into the firmware, instead of relying on the x86 EFI handover protocol or device tree. - Introduce a mixed mode boot method that does not rely on the x86 EFI handover protocol either, and could potentially be adopted by other architectures (if another one ever surfaces where one execution mode is a superset of another) - Clean up the contents of struct efi, and move out everything that doesn't need to be stored there. - Incorporate support for UEFI spec v2.8A changes that permit firmware implementations to return EFI_UNSUPPORTED from UEFI runtime services at OS runtime, and expose a mask of which ones are supported or unsupported via a configuration table. - Various documentation updates and minor code cleanups (Heinrich) - Partial fix for the lack of by-VA cache maintenance in the decompressor on 32-bit ARM. Note that these patches were deliberately put at the beginning so they can be used as a stable branch that will be shared with a PR containing the complete fix, which I will send to the ARM tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-02-25x86/vmlinux: Drop unneeded linker script discard of .eh_frameArvind Sankar1-2/+2
Now that .eh_frame sections for the files in setup.elf and realmode.elf are not generated anymore, the linker scripts don't need the special output section name /DISCARD/ any more. Remove the one in the main kernel linker script as well, since there are no .eh_frame sections already, and fix up a comment referencing .eh_frame. Update the comment in asm/dwarf2.h referring to .eh_frame so it continues to make sense, as well as being more specific. [ bp: Touch up commit message. ] Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://lkml.kernel.org/r/20200224232129.597160-3-nivedita@alum.mit.edu
2020-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds5-3/+17
Pull kvm fixes from Paolo Bonzini: "Bugfixes, including the fix for CVE-2020-2732 and a few issues found by 'make W=1'" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: rstify new ioctls in api.rst KVM: nVMX: Check IO instruction VM-exit conditions KVM: nVMX: Refactor IO bitmap checks into helper function KVM: nVMX: Don't emulate instructions in guest mode KVM: nVMX: Emulate MTF when performing instruction emulation KVM: fix error handling in svm_hardware_setup KVM: SVM: Fix potential memory leak in svm_cpu_init() KVM: apic: avoid calculating pending eoi from an uninitialized val KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1 kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabled KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSE KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI kvm/emulate: fix a -Werror=cast-function-type KVM: x86: fix incorrect comparison in trace event KVM: nVMX: Fix some obsolete comments and grammar error KVM: x86: fix missing prototypes KVM: x86: enable -Werror
2020-02-24x86/pkeys: Add check for pkey "overflow"Dave Hansen1-0/+5
Alex Shi reported the pkey macros above arch_set_user_pkey_access() to be unused. They are unused, and even refer to a nonexistent CONFIG option. But, they might have served a good use, which was to ensure that the code does not try to set values that would not fit in the PKRU register. As it stands, a too-large 'pkey' value would be likely to silently overflow the u32 new_pkru_bits. Add a check to look for overflows. Also add a comment to remind any future developer to closely examine the types used to store pkey values if arch_max_pkey() ever changes. This boots and passes the x86 pkey selftests. Reported-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200122165346.AD4DA150@viggo.jf.intel.com
2020-02-24Merge tag 'v5.6-rc3' into sched/core, to pick up fixes and dependent patchesIngo Molnar61-995/+853
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-02-23efi/libstub/x86: Use Exit() boot service to exit the stub on errorsArd Biesheuvel1-0/+8
Currently, we either return with an error [from efi_pe_entry()] or enter a deadloop [in efi_main()] if any fatal errors occur during execution of the EFI stub. Let's switch to calling the Exit() EFI boot service instead in both cases, so that we a) can get rid of the deadloop, and simply return to the boot manager if any errors occur during execution of the stub, including during the call to ExitBootServices(), b) can also return cleanly from efi_pe_entry() or efi_main() in mixed mode, once we introduce support for LoadImage/StartImage based mixed mode in the next patch. Note that on systems running downstream GRUBs [which do not use LoadImage or StartImage to boot the kernel, and instead, pass their own image handle as the loaded image handle], calling Exit() will exit from GRUB rather than from the kernel, but this is a tolerable side effect. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi: Add 'runtime' pointer to struct efiArd Biesheuvel1-1/+2
Instead of going through the EFI system table each time, just copy the runtime services table pointer into struct efi directly. This is the last use of the system table pointer in struct efi, allowing us to drop it in a future patch, along with a fair amount of quirky handling of the translated address. Note that usually, the runtime services pointer changes value during the call to SetVirtualAddressMap(), so grab the updated value as soon as that call returns. (Mixed mode uses a 1:1 mapping, and kexec boot enters with the updated address in the system table, so in those cases, we don't need to do anything here) Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/x86: Make fw_vendor, config_table and runtime sysfs nodes x86 specificArd Biesheuvel1-0/+2
There is some code that exposes physical addresses of certain parts of the EFI firmware implementation via sysfs nodes. These nodes are only used on x86, and are of dubious value to begin with, so let's move their handling into the x86 arch code. Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/x86: Remove runtime table address from kexec EFI setup dataArd Biesheuvel1-1/+0
Since commit 33b85447fa61946b ("efi/x86: Drop two near identical versions of efi_runtime_init()"), we no longer map the EFI runtime services table before calling SetVirtualAddressMap(), which means we don't need the 1:1 mapped physical address of this table, and so there is no point in passing the address via EFI setup data on kexec boot. Note that the kexec tools will still look for this address in sysfs, so we still need to provide it. Tested-by: Tony Luck <tony.luck@intel.com> # arch/ia64 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/libstub: Make the LoadFile EFI protocol accessibleArd Biesheuvel1-0/+4
Add the protocol definitions, GUIDs and mixed mode glue so that the EFI loadfile protocol can be used from the stub. This will be used in a future patch to load the initrd. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/libstub: Expose LocateDevicePath boot serviceArd Biesheuvel1-0/+3
We will be adding support for loading the initrd from a GUIDed device path in a subsequent patch, so update the prototype of the LocateDevicePath() boot service to make it callable from our code. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23efi/libstub/x86: Permit cmdline data to be allocated above 4 GBArd Biesheuvel1-2/+0
We now support cmdline data that is located in memory that is not 32-bit addressable, so relax the allocation limit on systems where this feature is enabled. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-02-23KVM: nVMX: Emulate MTF when performing instruction emulationOliver Upton2-0/+2
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution. However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2. Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1Vitaly Kuznetsov1-1/+1
Even when APICv is disabled for L1 it can (and, actually, is) still available for L2, this means we need to always call vmx_deliver_nested_posted_interrupt() when attempting an interrupt delivery. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSEXiaoyao Li2-1/+2
Commit 159348784ff0 ("x86/vmx: Introduce VMX_FEATURES_*") missed bit 26 (enable user wait and pause) of Secondary Processor-based VM-Execution Controls. Add VMX_FEATURE_USR_WAIT_PAUSE flag so that it shows up in /proc/cpuinfo, and use it to define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE to make them uniform. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>