summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-09-16x86/pci: Set default irq domain in pcibios_add_device()Thomas Gleixner1-1/+1
Now that interrupt remapping sets the irqdomain pointer when a PCI device is added it's possible to store the default irq domain in the device struct in pcibios_add_device(). If the bus to which a device is connected has an irq domain associated then this domain is used otherwise the default domain (PCI/MSI native or XEN PCI/MSI) is used. Using the bus domain ensures that special MSI bus domains like VMD work. This makes XEN and the non-remapped native case work solely based on the irq domain pointer in struct device for PCI/MSI and allows to remove the arch fallback and make most of the x86_msi ops private to XEN in the next steps. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112333.900423047@linutronix.de
2020-09-16x86/irq: Initialize PCI/MSI domain at PCI init timeThomas Gleixner3-15/+22
No point in initializing the default PCI/MSI interrupt domain early and no point to create it when XEN PV/HVM/DOM0 are active. Move the initialization to pci_arch_init() and convert it to init ops so that XEN can override it as XEN has it's own PCI/MSI management. The XEN override comes in a later step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112332.859209894@linutronix.de
2020-09-16x86/irq: Move apic_post_init() invocation to one placeThomas Gleixner3-6/+3
No point to call it from both 32bit and 64bit implementations of default_setup_apic_routing(). Move it to the caller. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112332.658496557@linutronix.de
2020-09-16x86/msi: Use generic MSI domain opsThomas Gleixner1-29/+1
pci_msi_get_hwirq() and pci_msi_set_desc are not longer special. Enable the generic MSI domain ops in the core and PCI MSI code unconditionally and get rid of the x86 specific implementations in the X86 MSI code and in the hyperv PCI driver. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200826112332.564274859@linutronix.de
2020-09-16x86/msi: Consolidate MSI allocationThomas Gleixner1-4/+3
Convert the interrupt remap drivers to retrieve the pci device from the msi descriptor and use info::hwirq. This is the first step to prepare x86 for using the generic MSI domain ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200826112332.466405395@linutronix.de
2020-09-16PCI/MSI: Rework pci_msi_domain_calc_hwirq()Thomas Gleixner1-1/+1
Retrieve the PCI device from the msi descriptor instead of doing so at the call sites. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200826112332.352583299@linutronix.de
2020-09-16x86/irq: Consolidate DMAR irq allocationThomas Gleixner1-5/+5
None of the DMAR specific fields are required. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112332.163462706@linutronix.de
2020-09-16x86_ioapic_Consolidate_IOAPIC_allocationThomas Gleixner2-37/+37
Move the IOAPIC specific fields into their own struct and reuse the common devid. Get rid of the #ifdeffery as it does not matter at all whether the alloc info is a couple of bytes longer or not. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200826112332.054367732@linutronix.de
2020-09-16x86/msi: Consolidate HPET allocationThomas Gleixner1-7/+7
None of the magic HPET fields are required in any way. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200826112331.943993771@linutronix.de
2020-09-16iommu/irq_remapping: Consolidate irq domain lookupThomas Gleixner2-2/+2
Now that the iommu implementations handle the X86_*_GET_PARENT_DOMAIN types, consolidate the two getter functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200826112331.741909337@linutronix.de
2020-09-16x86/irq: Add allocation type for parent domain retrievalThomas Gleixner2-2/+2
irq_remapping_ir_irq_domain() is used to retrieve the remapping parent domain for an allocation type. irq_remapping_irq_domain() is for retrieving the actual device domain for allocating interrupts for a device. The two functions are similar and can be unified by using explicit modes for parent irq domain retrieval. Add X86_IRQ_ALLOC_TYPE_IOAPIC/HPET_GET_PARENT and use it in the iommu implementations. Drop the parent domain retrieval for PCI_MSI/X as that is unused. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200826112331.436350257@linutronix.de
2020-09-16x86_irq_Rename_X86_IRQ_ALLOC_TYPE_MSI_to_reflect_PCI_dependencyThomas Gleixner1-3/+3
No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20200826112331.343103175@linutronix.de
2020-09-16x86/msi: Remove pointless vcpu_affinity callbackThomas Gleixner1-1/+0
Setting the irq_set_vcpu_affinity() callback to irq_chip_set_vcpu_affinity_parent() is a pointless exercise because the function which utilizes it searchs the domain hierarchy to find a parent domain which has such a callback. Remove the useless indirection. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112331.250130127@linutronix.de
2020-09-16x86/msi: Move compose message callback where it belongsThomas Gleixner2-9/+4
Composing the MSI message at the MSI chip level is wrong because the underlying parent domain is the one which knows how the message should be composed for the direct vector delivery or the interrupt remapping table entry. The interrupt remapping aware PCI/MSI chip does that already. Make the direct delivery chip do the same and move the composition of the direct delivery MSI message to the vector domain irq chip. This prepares for the upcoming device MSI support to avoid having architecture specific knowledge in the device MSI domain irq chips. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112331.157603198@linutronix.de
2020-09-16genirq/chip: Use the first chip in irq_chip_compose_msi_msg()Thomas Gleixner1-2/+5
The documentation of irq_chip_compose_msi_msg() claims that with hierarchical irq domains the first chip in the hierarchy which has an irq_compose_msi_msg() callback is chosen. But the code just keeps iterating after it finds a chip with a compose callback. The x86 HPET MSI implementation relies on that behaviour, but that does not make it more correct. The message should always be composed at the domain which manages the underlying resource (e.g. APIC or remap table) because that domain knows about the required layout of the message. On X86 the following hierarchies exist: 1) vector -------- PCI/MSI 2) vector -- IR -- PCI/MSI The vector domain has a different message format than the IR (remapping) domain. So obviously the PCI/MSI domain can't compose the message without having knowledge about the parent domain, which is exactly the opposite of what hierarchical domains want to achieve. X86 actually has two different PCI/MSI chips where #1 has a compose callback and #2 does not. #2 delegates the composition to the remap domain where it belongs, but #1 does it at the PCI/MSI level. For the upcoming device MSI support it's necessary to change this and just let the first domain which can compose the message take care of it. That way the top level chip does not have to worry about it and the device MSI code does not need special knowledge about topologies. It just sets the compose callback to NULL and lets the hierarchy pick the first chip which has one. Due to that the attempt to move the compose callback from the direct delivery PCI/MSI domain to the vector domain made the system fail to boot with interrupt remapping enabled because in the remapping case irq_chip_compose_msi_msg() keeps iterating and choses the compose callback of the vector domain which obviously creates the wrong format for the remap table. Break out of the loop when the first irq chip with a compose callback is found and fixup the HPET code temporarily. That workaround will be removed once the direct delivery compose callback is moved to the place where it belongs in the vector domain. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200826112331.047917603@linutronix.de
2020-09-16x86/init: Remove unused init opsThomas Gleixner2-26/+4
Some past platform removal forgot to get rid of this unused ballast. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200826112330.806095671@linutronix.de
2020-09-15x86/mce/dev-mcelog: Do not update kflags on AMD systemsSmita Koralahalli1-1/+3
The mcelog utility is not commonly used on AMD systems. Therefore, errors logged only by the dev_mce_log() notifier will be missed. This may occur if the EDAC modules are not loaded, in which case it's preferable to print the error record by the default notifier. However, the mce->kflags set by dev_mce_log() notifier makes the default notifier skip over the errors assuming they are processed by dev_mce_log(). Do not update kflags in the dev_mce_log() notifier on AMD systems. Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200903234531.162484-3-Smita.KoralahalliChannabasappa@amd.com
2020-09-14x86/mce: Stop mce_reign() from re-computing severity for every CPUTony Luck1-8/+8
Back in commit: 20d51a426fe9 ("x86/mce: Reuse one of the u16 padding fields in 'struct mce'") a field was added to "struct mce" to save the computed error severity. Make use of this in mce_reign() to avoid re-computing the severity for every CPU. In the case where the machine panics, one call to mce_severity() is still needed in order to provide the correct message giving the reason for the panic. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200908175519.14223-2-tony.luck@intel.com
2020-09-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-6/+20
Pull kvm fixes from Paolo Bonzini: "A bit on the bigger side, mostly due to me being on vacation, then busy, then on parental leave, but there's nothing worrisome. ARM: - Multiple stolen time fixes, with a new capability to match x86 - Fix for hugetlbfs mappings when PUD and PMD are the same level - Fix for hugetlbfs mappings when PTE mappings are enforced (dirty logging, for example) - Fix tracing output of 64bit values x86: - nSVM state restore fixes - Async page fault fixes - Lots of small fixes everywhere" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) KVM: emulator: more strict rsm checks. KVM: nSVM: more strict SMM checks when returning to nested guest SVM: nSVM: setup nested msr permission bitmap on nested state load SVM: nSVM: correctly restore GIF on vmexit from nesting after migration x86/kvm: don't forget to ACK async PF IRQ x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit KVM: SVM: avoid emulation with stale next_rip KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN KVM: SVM: Periodically schedule when unregistering regions on destroy KVM: MIPS: Change the definition of kvm type kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control KVM: fix memory leak in kvm_io_bus_unregister_dev() KVM: Check the allocation of pv cpu mask KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected KVM: arm64: Update page shift if stage 2 block mapping not supported KVM: arm64: Fix address truncation in traces KVM: arm64: Do not try to map PUDs when they are folded into PMD arm64/x86: KVM: Introduce steal-time cap ...
2020-09-12x86/kvm: don't forget to ACK async PF IRQVitaly Kuznetsov1-0/+2
Merge commit 26d05b368a5c0 ("Merge branch 'kvm-async-pf-int' into HEAD") tried to adapt the new interrupt based async PF mechanism to the newly introduced IDTENTRY magic but unfortunately it missed the fact that DEFINE_IDTENTRY_SYSVEC() doesn't call ack_APIC_irq() on its own and all DEFINE_IDTENTRY_SYSVEC() users have to call it manually. As the result all multi-CPU KVM guest hang on boot when KVM_FEATURE_ASYNC_PF_INT is present. The breakage went unnoticed because no KVM userspace (e.g. QEMU) currently set it (and thus async PF mechanism is currently disabled) but we're about to change that. Fixes: 26d05b368a5c0 ("Merge branch 'kvm-async-pf-int' into HEAD") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200908135350.355053-3-vkuznets@redhat.com> Tested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-12x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macroVitaly Kuznetsov1-4/+0
DEFINE_IDTENTRY_SYSVEC() already contains irqentry_enter()/ irqentry_exit(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200908135350.355053-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11KVM: Check the allocation of pv cpu maskHaiwei Li1-3/+19
check the allocation of per-cpu __pv_cpu_mask. Initialize ops only when successful. Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <d59f05df-e6d3-3d31-a036-cc25a2b2f33f@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11x86/cpu/centaur: Add Centaur family >=7 CPUs initialization supportTony W Wang-oc1-2/+6
Add Centaur family >=7 CPUs specific initialization support. Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1599562666-31351-3-git-send-email-TonyWWang-oc@zhaoxin.com
2020-09-11x86/cpu/centaur: Replace two-condition switch-case with an if statementTony W Wang-oc1-15/+8
Use a normal if statements instead of a two-condition switch-case. [ bp: Massage commit message. ] Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1599562666-31351-2-git-send-email-TonyWWang-oc@zhaoxin.com
2020-09-11dma-direct: remove dma_direct_{alloc,free}_pagesChristoph Hellwig1-3/+3
Just merge these helpers into the main dma_direct_{alloc,free} routines, as the additional checks are always false for the two callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11x86/mce: Make mce_rdmsrl() panic on an inaccessible MSRBorislav Petkov2-12/+70
If an exception needs to be handled while reading an MSR - which is in most of the cases caused by a #GP on a non-existent MSR - then this is most likely the incarnation of a BIOS or a hardware bug. Such bug violates the architectural guarantee that MCA banks are present with all MSRs belonging to them. The proper fix belongs in the hardware/firmware - not in the kernel. Handling an #MC exception which is raised while an NMI is being handled would cause the nasty NMI nesting issue because of the shortcoming of IRET of reenabling NMIs when executed. And the machine is in an #MC context already so <Deity> be at its side. Tracing MSR accesses while in #MC is another no-no due to tracing being inherently a bad idea in atomic context: vmlinux.o: warning: objtool: do_machine_check()+0x4a: call to mce_rdmsrl() leaves .noinstr.text section so remove all that "additional" functionality from mce_rdmsrl() and provide it with a special exception handler which panics the machine when that MSR is not accessible. The exception handler prints a human-readable message explaining what the panic reason is but, what is more, it panics while in the #GP handler and latter won't have executed an IRET, thus opening the NMI nesting issue in the case when the #MC has happened while handling an NMI. (#MC itself won't be reenabled until MCG_STATUS hasn't been cleared). Suggested-by: Andy Lutomirski <luto@kernel.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> [ Add missing prototypes for ex_handler_* ] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200906212130.GA28456@zn.tnic
2020-09-10x86/sev-es: Check required CPU features for SEV-ESMartin Radev2-0/+18
Make sure the machine supports RDRAND, otherwise there is no trusted source of randomness in the system. To also check this in the pre-decompression stage, make has_cpuflag() not depend on CONFIG_RANDOMIZE_BASE anymore. Signed-off-by: Martin Radev <martin.b.radev@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-73-joro@8bytes.org
2020-09-10x86/efi: Add GHCB mappings when SEV-ES is activeTom Lendacky1-0/+30
Calling down to EFI runtime services can result in the firmware performing VMGEXIT calls. The firmware is likely to use the GHCB of the OS (e.g., for setting EFI variables), so each GHCB in the system needs to be identity-mapped in the EFI page tables, as unencrypted, to avoid page faults. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Moved GHCB mapping loop to sev-es.c ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lkml.kernel.org/r/20200907131613.12703-72-joro@8bytes.org
2020-09-10x86/fpu: Allow multiple bits in clearcpuid= parameterArvind Sankar1-8/+22
Commit 0c2a3913d6f5 ("x86/fpu: Parse clearcpuid= as early XSAVE argument") changed clearcpuid parsing from __setup() to cmdline_find_option(). While the __setup() function would have been called for each clearcpuid= parameter on the command line, cmdline_find_option() will only return the last one, so the change effectively made it impossible to disable more than one bit. Allow a comma-separated list of bit numbers as the argument for clearcpuid to allow multiple bits to be disabled again. Log the bits being disabled for informational purposes. Also fix the check on the return value of cmdline_find_option(). It returns -1 when the option is not found, so testing as a boolean is incorrect. Fixes: 0c2a3913d6f5 ("x86/fpu: Parse clearcpuid= as early XSAVE argument") Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907213919.2423441-1-nivedita@alum.mit.edu
2020-09-10objtool: Make unwind hint definitions available to other architecturesJulien Thierry1-5/+6
Unwind hints are useful to provide objtool with information about stack states in non-standard functions/code. While the type of information being provided might be very arch specific, the mechanism to provide the information can be useful for other architectures. Move the relevant unwint hint definitions for all architectures to see. [ jpoimboe: REGS_IRET -> REGS_PARTIAL ] Signed-off-by: Julien Thierry <jthierry@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-09-10objtool: Rename frame.h -> objtool.hJulien Thierry3-3/+3
Header frame.h is getting more code annotations to help objtool analyze object files. Rename the file to objtool.h. [ jpoimboe: add objtool.h to MAINTAINERS ] Signed-off-by: Julien Thierry <jthierry@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-09-10x86/tsc: Use seqcount_latch_tAhmed S. Darwish1-5/+5
Latch sequence counters have unique read and write APIs, and thus seqcount_latch_t was recently introduced at seqlock.h. Use that new data type instead of plain seqcount_t. This adds the necessary type-safety and ensures that only latching-safe seqcount APIs are to be used. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> [peterz: unwreck cyc2ns_read_begin()] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200827114044.11173-7-a.darwish@linutronix.de
2020-09-09x86/sev-es: Handle NMI StateJoerg Roedel2-0/+24
When running under SEV-ES, the kernel has to tell the hypervisor when to open the NMI window again after an NMI was injected. This is done with an NMI-complete message to the hypervisor. Add code to the kernel's NMI handler to send this message right at the beginning of do_nmi(). This always allows nesting NMIs. [ bp: Mark __sev_es_nmi_complete() noinstr: vmlinux.o: warning: objtool: exc_nmi()+0x17: call to __sev_es_nmi_complete() leaves .noinstr.text section While at it, use __pa_nodebug() for the same reason due to CONFIG_DEBUG_VIRTUAL=y: vmlinux.o: warning: objtool: __sev_es_nmi_complete()+0xd9: call to __phys_addr() leaves .noinstr.text section ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-71-joro@8bytes.org
2020-09-09x86/sev-es: Support CPU offline/onlineJoerg Roedel1-0/+63
Add a play_dead handler when running under SEV-ES. This is needed because the hypervisor can't deliver an SIPI request to restart the AP. Instead, the kernel has to issue a VMGEXIT to halt the VCPU until the hypervisor wakes it up again. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-70-joro@8bytes.org
2020-09-09x86/head/64: Don't call verify_cpu() on starting APsJoerg Roedel1-0/+12
The APs are not ready to handle exceptions when verify_cpu() is called in secondary_startup_64(). Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-69-joro@8bytes.org
2020-09-09x86/smpboot: Load TSS and getcpu GDT entry before loading IDTJoerg Roedel2-1/+24
The IDT on 64-bit contains vectors which use paranoid_entry() and/or IST stacks. To make these vectors work, the TSS and the getcpu GDT entry need to be set up before the IDT is loaded. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-68-joro@8bytes.org
2020-09-09x86/realmode: Setup AP jump tableTom Lendacky1-0/+69
As part of the GHCB specification, the booting of APs under SEV-ES requires an AP jump table when transitioning from one layer of code to another (e.g. when going from UEFI to the OS). As a result, each layer that parks an AP must provide the physical address of an AP jump table to the next layer via the hypervisor. Upon booting of the kernel, read the AP jump table address from the hypervisor. Under SEV-ES, APs are started using the INIT-SIPI-SIPI sequence. Before issuing the first SIPI request for an AP, the start CS and IP is programmed into the AP jump table. Upon issuing the SIPI request, the AP will awaken and jump to that start CS:IP address. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: - Adapted to different code base - Moved AP table setup from SIPI sending path to real-mode setup code - Fix sparse warnings ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-67-joro@8bytes.org
2020-09-09x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ESDoug Covelli1-5/+45
Add VMware-specific handling for #VC faults caused by VMMCALL instructions. Signed-off-by: Doug Covelli <dcovelli@vmware.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: - Adapt to different paravirt interface ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-65-joro@8bytes.org
2020-09-09x86/kvm: Add KVM-specific VMMCALL handling under SEV-ESTom Lendacky1-6/+29
Implement the callbacks to copy the processor state required by KVM to the GHCB. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: - Split out of a larger patch - Adapt to different callback functions ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-64-joro@8bytes.org
2020-09-09x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ESJoerg Roedel1-0/+12
Add two new paravirt callbacks to provide hypervisor-specific processor state in the GHCB and to copy state from the hypervisor back to the processor. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-63-joro@8bytes.org
2020-09-09x86/sev-es: Handle #DB EventsJoerg Roedel1-0/+17
Handle #VC exceptions caused by #DB exceptions in the guest. Those must be handled outside of instrumentation_begin()/end() so that the handler will not be raised recursively. Handle them by calling the kernel's debug exception handler. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-62-joro@8bytes.org
2020-09-09x86/sev-es: Handle #AC EventsJoerg Roedel1-0/+19
Implement a handler for #VC exceptions caused by #AC exceptions. The #AC exception is just forwarded to do_alignment_check() and not pushed down to the hypervisor, as requested by the SEV-ES GHCB Standardization Specification. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-61-joro@8bytes.org
2020-09-09x86/sev-es: Handle VMMCALL EventsTom Lendacky1-0/+23
Implement a handler for #VC exceptions caused by VMMCALL instructions. This is only a starting point, VMMCALL emulation under SEV-ES needs further hypervisor-specific changes to provide additional state. [ bp: Drop "this patch". ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-60-joro@8bytes.org
2020-09-09x86/sev-es: Handle MWAIT/MWAITX EventsTom Lendacky1-0/+10
Implement a handler for #VC exceptions caused by MWAIT and MWAITX instructions. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-59-joro@8bytes.org
2020-09-09x86/sev-es: Handle MONITOR/MONITORX EventsTom Lendacky1-0/+13
Implement a handler for #VC exceptions caused by MONITOR and MONITORX instructions. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-58-joro@8bytes.org
2020-09-09x86/sev-es: Handle INVD EventsTom Lendacky1-0/+4
Implement a handler for #VC exceptions caused by INVD instructions. Since Linux should never use INVD, just mark it as unsupported. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-57-joro@8bytes.org
2020-09-09x86/sev-es: Handle RDPMC EventsTom Lendacky1-0/+22
Implement a handler for #VC exceptions caused by RDPMC instructions. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-56-joro@8bytes.org
2020-09-09x86/sev-es: Handle RDTSC(P) EventsTom Lendacky2-0/+27
Implement a handler for #VC exceptions caused by RDTSC and RDTSCP instructions. Also make it available in the pre-decompression stage because the KASLR code uses RDTSC/RDTSCP to gather entropy and some hypervisors intercept these instructions. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: - Adapt to #VC handling infrastructure - Make it available early ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-55-joro@8bytes.org
2020-09-09x86/sev-es: Handle WBINVD EventsTom Lendacky1-0/+9
Implement a handler for #VC exceptions caused by WBINVD instructions. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling framework ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-54-joro@8bytes.org
2020-09-09x86/sev-es: Handle DR7 read/write eventsTom Lendacky1-0/+85
Add code to handle #VC exceptions on DR7 register reads and writes. This is needed early because show_regs() reads DR7 to print it out. Under SEV-ES, there is currently no support for saving/restoring the DRx registers but software expects to be able to write to the DR7 register. For now, cache the value written to DR7 and return it on read attempts, but do not touch the real hardware DR7. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: - Adapt to #VC handling framework - Support early usage ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-53-joro@8bytes.org