summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2020-09-11KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detectedPeter Shier3-2/+11
When L2 uses PAE, L0 intercepts of L2 writes to CR0/CR3/CR4 call load_pdptrs to read the possibly updated PDPTEs from the guest physical address referenced by CR3. It loads them into vcpu->arch.walk_mmu->pdptrs and sets VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty. At the subsequent assumed reentry into L2, the mmu will call vmx_load_mmu_pgd which calls ept_load_pdptrs. ept_load_pdptrs sees VCPU_EXREG_PDPTR set in vcpu->arch.regs_dirty and loads VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. This all works if the L2 CRn write intercept always resumes L2. The resume path calls vmx_check_nested_events which checks for exceptions, MTF, and expired VMX preemption timers. If vmx_check_nested_events finds any of these conditions pending it will reflect the corresponding exit into L1. Live migration at this point would also cause a missed immediate reentry into L2. After L1 exits, vmx_vcpu_run calls vmx_register_cache_reset which clears VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty. When L2 next resumes, ept_load_pdptrs finds VCPU_EXREG_PDPTR clear in vcpu->arch.regs_dirty and does not load VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. prepare_vmcs02 will then load VMCS02.GUEST_PDPTRn from vmcs12->pdptr0/1/2/3 which contain the stale values stored at last L2 exit. A repro of this bug showed L2 entering triple fault immediately due to the bad VMCS02.GUEST_PDPTRn values. When L2 is in PAE paging mode add a call to ept_load_pdptrs before leaving L2. This will update VMCS02.GUEST_PDPTRn if they are dirty in vcpu->arch.walk_mmu->pdptrs[]. Tested: kvm-unit-tests with new directed test: vmx_mtf_pdpte_test. Verified that test fails without the fix. Also ran Google internal VMM with an Ubuntu 16.04 4.4.0-83 guest running a custom hypervisor with a 32-bit Windows XP L2 guest using PAE. Prior to fix would repro readily. Ran 14 simultaneous L2s for 140 iterations with no failures. Signed-off-by: Peter Shier <pshier@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200820230545.2411347-1-pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11Merge tag 'kvmarm-fixes-5.9-1' of ↵Paolo Bonzini73-1219/+470
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for Linux 5.9, take #1 - Multiple stolen time fixes, with a new capability to match x86 - Fix for hugetlbfs mappings when PUD and PMD are the same level - Fix for hugetlbfs mappings when PTE mappings are enforced (dirty logging, for example) - Fix tracing output of 64bit values
2020-09-11x86/cpu/centaur: Add Centaur family >=7 CPUs initialization supportTony W Wang-oc1-2/+6
Add Centaur family >=7 CPUs specific initialization support. Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1599562666-31351-3-git-send-email-TonyWWang-oc@zhaoxin.com
2020-09-11x86/cpu/centaur: Replace two-condition switch-case with an if statementTony W Wang-oc1-15/+8
Use a normal if statements instead of a two-condition switch-case. [ bp: Massage commit message. ] Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1599562666-31351-2-git-send-email-TonyWWang-oc@zhaoxin.com
2020-09-11dma-direct: remove dma_direct_{alloc,free}_pagesChristoph Hellwig1-3/+3
Just merge these helpers into the main dma_direct_{alloc,free} routines, as the additional checks are always false for the two callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2020-09-11x86/mce: Make mce_rdmsrl() panic on an inaccessible MSRBorislav Petkov2-12/+70
If an exception needs to be handled while reading an MSR - which is in most of the cases caused by a #GP on a non-existent MSR - then this is most likely the incarnation of a BIOS or a hardware bug. Such bug violates the architectural guarantee that MCA banks are present with all MSRs belonging to them. The proper fix belongs in the hardware/firmware - not in the kernel. Handling an #MC exception which is raised while an NMI is being handled would cause the nasty NMI nesting issue because of the shortcoming of IRET of reenabling NMIs when executed. And the machine is in an #MC context already so <Deity> be at its side. Tracing MSR accesses while in #MC is another no-no due to tracing being inherently a bad idea in atomic context: vmlinux.o: warning: objtool: do_machine_check()+0x4a: call to mce_rdmsrl() leaves .noinstr.text section so remove all that "additional" functionality from mce_rdmsrl() and provide it with a special exception handler which panics the machine when that MSR is not accessible. The exception handler prints a human-readable message explaining what the panic reason is but, what is more, it panics while in the #GP handler and latter won't have executed an IRET, thus opening the NMI nesting issue in the case when the #MC has happened while handling an NMI. (#MC itself won't be reenabled until MCG_STATUS hasn't been cleared). Suggested-by: Andy Lutomirski <luto@kernel.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> [ Add missing prototypes for ex_handler_* ] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200906212130.GA28456@zn.tnic
2020-09-11crypto: poly1305-x86_64 - Use XORL r32,32Uros Bizjak1-4/+4
x86_64 zero extends 32bit operations, so for 64bit operands, XORL r32,r32 is functionally equal to XORQ r64,r64, but avoids a REX prefix byte when legacy registers are used. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-11crypto: curve25519-x86_64 - Use XORL r32,32Uros Bizjak1-34/+34
x86_64 zero extends 32bit operations, so for 64bit operands, XORL r32,r32 is functionally equal to XORL r64,r64, but avoids a REX prefix byte when legacy registers are used. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-10x86/sev-es: Check required CPU features for SEV-ESMartin Radev5-6/+24
Make sure the machine supports RDRAND, otherwise there is no trusted source of randomness in the system. To also check this in the pre-decompression stage, make has_cpuflag() not depend on CONFIG_RANDOMIZE_BASE anymore. Signed-off-by: Martin Radev <martin.b.radev@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-73-joro@8bytes.org
2020-09-10x86/efi: Add GHCB mappings when SEV-ES is activeTom Lendacky4-0/+43
Calling down to EFI runtime services can result in the firmware performing VMGEXIT calls. The firmware is likely to use the GHCB of the OS (e.g., for setting EFI variables), so each GHCB in the system needs to be identity-mapped in the EFI page tables, as unencrypted, to avoid page faults. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Moved GHCB mapping loop to sev-es.c ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lkml.kernel.org/r/20200907131613.12703-72-joro@8bytes.org
2020-09-10x86/fpu: Allow multiple bits in clearcpuid= parameterArvind Sankar1-8/+22
Commit 0c2a3913d6f5 ("x86/fpu: Parse clearcpuid= as early XSAVE argument") changed clearcpuid parsing from __setup() to cmdline_find_option(). While the __setup() function would have been called for each clearcpuid= parameter on the command line, cmdline_find_option() will only return the last one, so the change effectively made it impossible to disable more than one bit. Allow a comma-separated list of bit numbers as the argument for clearcpuid to allow multiple bits to be disabled again. Log the bits being disabled for informational purposes. Also fix the check on the return value of cmdline_find_option(). It returns -1 when the option is not found, so testing as a boolean is incorrect. Fixes: 0c2a3913d6f5 ("x86/fpu: Parse clearcpuid= as early XSAVE argument") Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907213919.2423441-1-nivedita@alum.mit.edu
2020-09-10objtool: Make unwind hint definitions available to other architecturesJulien Thierry3-84/+17
Unwind hints are useful to provide objtool with information about stack states in non-standard functions/code. While the type of information being provided might be very arch specific, the mechanism to provide the information can be useful for other architectures. Move the relevant unwint hint definitions for all architectures to see. [ jpoimboe: REGS_IRET -> REGS_PARTIAL ] Signed-off-by: Julien Thierry <jthierry@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-09-10objtool: Rename frame.h -> objtool.hJulien Thierry8-8/+8
Header frame.h is getting more code annotations to help objtool analyze object files. Rename the file to objtool.h. [ jpoimboe: add objtool.h to MAINTAINERS ] Signed-off-by: Julien Thierry <jthierry@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-09-10swiotlb: Declare swiotlb_late_init_with_default_size() in headerAndy Shevchenko1-1/+0
Compiler is not happy about one function prototype: CC kernel/dma/swiotlb.o kernel/dma/swiotlb.c:275:1: warning: no previous prototype for ‘swiotlb_late_init_with_default_size’ [-Wmissing-prototypes] 275 | swiotlb_late_init_with_default_size(size_t default_size) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Since it's used outside of the module, move its declaration to the header from the user. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2020-09-10arch/x86/amd/ibs: Fix re-arming IBS FetchKim Phillips1-1/+14
Stephane Eranian found a bug in that IBS' current Fetch counter was not being reset when the driver would write the new value to clear it along with the enable bit set, and found that adding an MSR write that would first disable IBS Fetch would make IBS Fetch reset its current count. Indeed, the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54 - Sep 12, 2019 states "The periodic fetch counter is set to IbsFetchCnt [...] when IbsFetchEn is changed from 0 to 1." Explicitly set IbsFetchEn to 0 and then to 1 when re-enabling IBS Fetch, so the driver properly resets the internal counter to 0 and IBS Fetch starts counting again. A family 15h machine tested does not have this problem, and the extra wrmsr is also not needed on Family 19h, so only do the extra wrmsr on families 16h through 18h. Reported-by: Stephane Eranian <stephane.eranian@google.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> [peterz: optimized] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
2020-09-10perf/x86/rapl: Add AMD Fam19h RAPL supportKim Phillips1-0/+1
Family 19h RAPL support did not change from Family 17h; extend the existing Fam17h support to work on Family 19h too. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200908214740.18097-8-kim.phillips@amd.com
2020-09-10perf/x86/amd/ibs: Support 27-bit extended Op/cycle counterKim Phillips2-11/+32
IBS hardware with the OpCntExt feature gets a 7-bit wider internal counter. Both the maximum and current count bitfields in the IBS_OP_CTL register are extended to support reading and writing it. No changes are necessary to the driver for handling the extra contiguous current count bits (IbsOpCurCnt), as the driver already passes through 32 bits of that field. However, the driver has to do some extra bit manipulation when converting from a period to the non-contiguous (although conveniently aligned) extra bits in the IbsOpMaxCnt bitfield. This decreases IBS Op interrupt overhead when the period is over 1,048,560 (0xffff0), which would previously activate the driver's software counter. That threshold is now 134,217,712 (0x7fffff0). Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200908214740.18097-7-kim.phillips@amd.com
2020-09-10perf/x86/amd/ibs: Fix raw sample data accumulationKim Phillips2-10/+17
Neither IbsBrTarget nor OPDATA4 are populated in IBS Fetch mode. Don't accumulate them into raw sample user data in that case. Also, in Fetch mode, add saving the IBS Fetch Control Extended MSR. Technically, there is an ABI change here with respect to the IBS raw sample data format, but I don't see any perf driver version information being included in perf.data file headers, but, existing users can detect whether the size of the sample record has reduced by 8 bytes to determine whether the IBS driver has this fix. Fixes: 904cb3677f3a ("perf/x86/amd/ibs: Update IBS MSRs and feature definitions") Reported-by: Stephane Eranian <stephane.eranian@google.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200908214740.18097-6-kim.phillips@amd.com
2020-09-10perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()Kim Phillips1-4/+8
get_ibs_op_count() adds hardware's current count (IbsOpCurCnt) bits to its count regardless of hardware's valid status. According to the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54, if the counter rolls over, valid status is set, and the lower 7 bits of IbsOpCurCnt are randomized by hardware. Don't include those bits in the driver's event count. Fixes: 8b1e13638d46 ("perf/x86-ibs: Fix usage of IBS op current count") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
2020-09-10perf/x86/amd: Fix sampling Large Increment per Cycle eventsKim Phillips1-2/+2
Commit 5738891229a2 ("perf/x86/amd: Add support for Large Increment per Cycle Events") mistakenly zeroes the upper 16 bits of the count in set_period(). That's fine for counting with perf stat, but not sampling with perf record when only Large Increment events are being sampled. To enable sampling, we sign extend the upper 16 bits of the merged counter pair as described in the Family 17h PPRs: "Software wanting to preload a value to a merged counter pair writes the high-order 16-bit value to the low-order 16 bits of the odd counter and then writes the low-order 48-bit value to the even counter. Reading the even counter of the merged counter pair returns the full 64-bit value." Fixes: 5738891229a2 ("perf/x86/amd: Add support for Large Increment per Cycle Events") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
2020-09-10perf/amd/uncore: Set all slices and threads to restore perf stat -a behaviourKim Phillips1-20/+8
Commit 2f217d58a8a0 ("perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs") inadvertently changed the uncore driver's behaviour wrt perf tool invocations with or without a CPU list, specified with -C / --cpu=. Change the behaviour of the driver to assume the former all-cpu (-a) case, which is the more commonly desired default. This fixes '-a -A' invocations without explicit cpu lists (-C) to not count L3 events only on behalf of the first thread of the first core in the L3 domain. BEFORE: Activity performed by the first thread of the last core (CPU#43) in CPU#40's L3 domain is not reported by CPU#40: sudo perf stat -a -A -e l3_request_g1.caching_l3_cache_accesses taskset -c 43 perf bench mem memcpy -s 32mb -l 100 -f default ... CPU36 21,835 l3_request_g1.caching_l3_cache_accesses CPU40 87,066 l3_request_g1.caching_l3_cache_accesses CPU44 17,360 l3_request_g1.caching_l3_cache_accesses ... AFTER: The L3 domain activity is now reported by CPU#40: sudo perf stat -a -A -e l3_request_g1.caching_l3_cache_accesses taskset -c 43 perf bench mem memcpy -s 32mb -l 100 -f default ... CPU36 354,891 l3_request_g1.caching_l3_cache_accesses CPU40 1,780,870 l3_request_g1.caching_l3_cache_accesses CPU44 315,062 l3_request_g1.caching_l3_cache_accesses ... Fixes: 2f217d58a8a0 ("perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200908214740.18097-2-kim.phillips@amd.com
2020-09-10perf/x86/intel/ds: Fix x86_pmu_stop warning for large PEBSKan Liang1-12/+20
A warning as below may be triggered when sampling with large PEBS. [ 410.411250] perf: interrupt took too long (72145 > 71975), lowering kernel.perf_event_max_sample_rate to 2000 [ 410.724923] ------------[ cut here ]------------ [ 410.729822] WARNING: CPU: 0 PID: 16397 at arch/x86/events/core.c:1422 x86_pmu_stop+0x95/0xa0 [ 410.933811] x86_pmu_del+0x50/0x150 [ 410.937304] event_sched_out.isra.0+0xbc/0x210 [ 410.941751] group_sched_out.part.0+0x53/0xd0 [ 410.946111] ctx_sched_out+0x193/0x270 [ 410.949862] __perf_event_task_sched_out+0x32c/0x890 [ 410.954827] ? set_next_entity+0x98/0x2d0 [ 410.958841] __schedule+0x592/0x9c0 [ 410.962332] schedule+0x5f/0xd0 [ 410.965477] exit_to_usermode_loop+0x73/0x120 [ 410.969837] prepare_exit_to_usermode+0xcd/0xf0 [ 410.974369] ret_from_intr+0x2a/0x3a [ 410.977946] RIP: 0033:0x40123c [ 411.079661] ---[ end trace bc83adaea7bb664a ]--- In the non-overflow context, e.g., context switch, with large PEBS, perf may stop an event twice. An example is below. //max_samples_per_tick is adjusted to 2 //NMI is triggered intel_pmu_handle_irq() handle_pmi_common() drain_pebs() __intel_pmu_pebs_event() perf_event_overflow() __perf_event_account_interrupt() hwc->interrupts = 1 return 0 //A context switch happens right after the NMI. //In the same tick, the perf_throttled_seq is not changed. perf_event_task_sched_out() perf_pmu_sched_task() intel_pmu_drain_pebs_buffer() __intel_pmu_pebs_event() perf_event_overflow() __perf_event_account_interrupt() ++hwc->interrupts >= max_samples_per_tick return 1 x86_pmu_stop(); # First stop perf_event_context_sched_out() task_ctx_sched_out() ctx_sched_out() event_sched_out() x86_pmu_del() x86_pmu_stop(); # Second stop and trigger the warning Perf should only invoke the perf_event_overflow() in the overflow context. Current drain_pebs() is called from: - handle_pmi_common() -- overflow context - intel_pmu_pebs_sched_task() -- non-overflow context - intel_pmu_pebs_disable() -- non-overflow context - intel_pmu_auto_reload_read() -- possible overflow context With PERF_SAMPLE_READ + PERF_FORMAT_GROUP, the function may be invoked in the NMI handler. But, before calling the function, the PEBS buffer has already been drained. The __intel_pmu_pebs_event() will not be called in the possible overflow context. To fix the issue, an indicator is required to distinguish between the overflow context aka handle_pmi_common() and other cases. The dummy regs pointer can be used as the indicator. In the non-overflow context, perf should treat the last record the same as other PEBS records, and doesn't invoke the generic overflow handler. Fixes: 21509084f999 ("perf/x86/intel: Handle multiple records in the PEBS buffer") Reported-by: Like Xu <like.xu@linux.intel.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Like Xu <like.xu@linux.intel.com> Link: https://lkml.kernel.org/r/20200902210649.2743-1-kan.liang@linux.intel.com
2020-09-10x86/tsc: Use seqcount_latch_tAhmed S. Darwish1-5/+5
Latch sequence counters have unique read and write APIs, and thus seqcount_latch_t was recently introduced at seqlock.h. Use that new data type instead of plain seqcount_t. This adds the necessary type-safety and ensures that only latching-safe seqcount APIs are to be used. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> [peterz: unwreck cyc2ns_read_begin()] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200827114044.11173-7-a.darwish@linutronix.de
2020-09-09x86/sev-es: Handle NMI StateJoerg Roedel4-0/+32
When running under SEV-ES, the kernel has to tell the hypervisor when to open the NMI window again after an NMI was injected. This is done with an NMI-complete message to the hypervisor. Add code to the kernel's NMI handler to send this message right at the beginning of do_nmi(). This always allows nesting NMIs. [ bp: Mark __sev_es_nmi_complete() noinstr: vmlinux.o: warning: objtool: exc_nmi()+0x17: call to __sev_es_nmi_complete() leaves .noinstr.text section While at it, use __pa_nodebug() for the same reason due to CONFIG_DEBUG_VIRTUAL=y: vmlinux.o: warning: objtool: __sev_es_nmi_complete()+0xd9: call to __phys_addr() leaves .noinstr.text section ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-71-joro@8bytes.org
2020-09-09x86/sev-es: Support CPU offline/onlineJoerg Roedel2-0/+64
Add a play_dead handler when running under SEV-ES. This is needed because the hypervisor can't deliver an SIPI request to restart the AP. Instead, the kernel has to issue a VMGEXIT to halt the VCPU until the hypervisor wakes it up again. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-70-joro@8bytes.org
2020-09-09x86/head/64: Don't call verify_cpu() on starting APsJoerg Roedel3-0/+19
The APs are not ready to handle exceptions when verify_cpu() is called in secondary_startup_64(). Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200907131613.12703-69-joro@8bytes.org
2020-09-09x86/smpboot: Load TSS and getcpu GDT entry before loading IDTJoerg Roedel3-1/+25
The IDT on 64-bit contains vectors which use paranoid_entry() and/or IST stacks. To make these vectors work, the TSS and the getcpu GDT entry need to be set up before the IDT is loaded. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-68-joro@8bytes.org
2020-09-09x86/realmode: Setup AP jump tableTom Lendacky4-2/+93
As part of the GHCB specification, the booting of APs under SEV-ES requires an AP jump table when transitioning from one layer of code to another (e.g. when going from UEFI to the OS). As a result, each layer that parks an AP must provide the physical address of an AP jump table to the next layer via the hypervisor. Upon booting of the kernel, read the AP jump table address from the hypervisor. Under SEV-ES, APs are started using the INIT-SIPI-SIPI sequence. Before issuing the first SIPI request for an AP, the start CS and IP is programmed into the AP jump table. Upon issuing the SIPI request, the AP will awaken and jump to that start CS:IP address. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: - Adapted to different code base - Moved AP table setup from SIPI sending path to real-mode setup code - Fix sparse warnings ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-67-joro@8bytes.org
2020-09-09x86/realmode: Add SEV-ES specific trampoline entry pointJoerg Roedel3-0/+26
The code at the trampoline entry point is executed in real-mode. In real-mode, #VC exceptions can't be handled so anything that might cause such an exception must be avoided. In the standard trampoline entry code this is the WBINVD instruction and the call to verify_cpu(), which are both not needed anyway when running as an SEV-ES guest. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-66-joro@8bytes.org
2020-09-09x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ESDoug Covelli1-5/+45
Add VMware-specific handling for #VC faults caused by VMMCALL instructions. Signed-off-by: Doug Covelli <dcovelli@vmware.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: - Adapt to different paravirt interface ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-65-joro@8bytes.org
2020-09-09x86/kvm: Add KVM-specific VMMCALL handling under SEV-ESTom Lendacky1-6/+29
Implement the callbacks to copy the processor state required by KVM to the GHCB. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: - Split out of a larger patch - Adapt to different callback functions ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-64-joro@8bytes.org
2020-09-09x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ESJoerg Roedel2-1/+27
Add two new paravirt callbacks to provide hypervisor-specific processor state in the GHCB and to copy state from the hypervisor back to the processor. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-63-joro@8bytes.org
2020-09-09x86/sev-es: Handle #DB EventsJoerg Roedel1-0/+17
Handle #VC exceptions caused by #DB exceptions in the guest. Those must be handled outside of instrumentation_begin()/end() so that the handler will not be raised recursively. Handle them by calling the kernel's debug exception handler. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-62-joro@8bytes.org
2020-09-09x86/sev-es: Handle #AC EventsJoerg Roedel1-0/+19
Implement a handler for #VC exceptions caused by #AC exceptions. The #AC exception is just forwarded to do_alignment_check() and not pushed down to the hypervisor, as requested by the SEV-ES GHCB Standardization Specification. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-61-joro@8bytes.org
2020-09-09x86/sev-es: Handle VMMCALL EventsTom Lendacky1-0/+23
Implement a handler for #VC exceptions caused by VMMCALL instructions. This is only a starting point, VMMCALL emulation under SEV-ES needs further hypervisor-specific changes to provide additional state. [ bp: Drop "this patch". ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-60-joro@8bytes.org
2020-09-09x86/sev-es: Handle MWAIT/MWAITX EventsTom Lendacky1-0/+10
Implement a handler for #VC exceptions caused by MWAIT and MWAITX instructions. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-59-joro@8bytes.org
2020-09-09x86/sev-es: Handle MONITOR/MONITORX EventsTom Lendacky1-0/+13
Implement a handler for #VC exceptions caused by MONITOR and MONITORX instructions. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-58-joro@8bytes.org
2020-09-09x86/sev-es: Handle INVD EventsTom Lendacky1-0/+4
Implement a handler for #VC exceptions caused by INVD instructions. Since Linux should never use INVD, just mark it as unsupported. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-57-joro@8bytes.org
2020-09-09x86/sev-es: Handle RDPMC EventsTom Lendacky1-0/+22
Implement a handler for #VC exceptions caused by RDPMC instructions. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling infrastructure ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-56-joro@8bytes.org
2020-09-09x86/sev-es: Handle RDTSC(P) EventsTom Lendacky3-0/+31
Implement a handler for #VC exceptions caused by RDTSC and RDTSCP instructions. Also make it available in the pre-decompression stage because the KASLR code uses RDTSC/RDTSCP to gather entropy and some hypervisors intercept these instructions. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: - Adapt to #VC handling infrastructure - Make it available early ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-55-joro@8bytes.org
2020-09-09x86/sev-es: Handle WBINVD EventsTom Lendacky1-0/+9
Implement a handler for #VC exceptions caused by WBINVD instructions. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling framework ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-54-joro@8bytes.org
2020-09-09x86/sev-es: Handle DR7 read/write eventsTom Lendacky1-0/+85
Add code to handle #VC exceptions on DR7 register reads and writes. This is needed early because show_regs() reads DR7 to print it out. Under SEV-ES, there is currently no support for saving/restoring the DRx registers but software expects to be able to write to the DR7 register. For now, cache the value written to DR7 and return it on read attempts, but do not touch the real hardware DR7. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: - Adapt to #VC handling framework - Support early usage ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-53-joro@8bytes.org
2020-09-09x86/sev-es: Handle MSR eventsTom Lendacky1-0/+28
Implement a handler for #VC exceptions caused by RDMSR/WRMSR instructions. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to #VC handling infrastructure. ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-52-joro@8bytes.org
2020-09-09x86/sev-es: Handle MMIO String InstructionsJoerg Roedel1-0/+77
Add handling for emulation of the MOVS instruction on MMIO regions, as done by the memcpy_toio() and memcpy_fromio() functions. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-51-joro@8bytes.org
2020-09-09x86/sev-es: Handle MMIO eventsTom Lendacky2-0/+227
Add a handler for #VC exceptions caused by MMIO intercepts. These intercepts come along as nested page faults on pages with reserved bits set. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [ jroedel@suse.de: Adapt to VC handling framework ] Co-developed-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-50-joro@8bytes.org
2020-09-09x86/sev-es: Handle instruction fetches from user-spaceJoerg Roedel1-9/+22
When a #VC exception is triggered by user-space, the instruction decoder needs to read the instruction bytes from user addresses. Enhance vc_decode_insn() to safely fetch kernel and user instructions. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-49-joro@8bytes.org
2020-09-09x86/sev-es: Wire up existing #VC exit-code handlersJoerg Roedel2-4/+9
Re-use the handlers for CPUID- and IOIO-caused #VC exceptions in the early boot handler. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-48-joro@8bytes.org
2020-09-09x86/sev-es: Add a Runtime #VC Exception HandlerTom Lendacky3-8/+255
Add the handlers for #VC exceptions invoked at runtime. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-47-joro@8bytes.org
2020-09-09x86/entry/64: Add entry code for #VC handlerJoerg Roedel5-0/+171
The #VC handler needs special entry code because: 1. It runs on an IST stack 2. It needs to be able to handle nested #VC exceptions To make this work, the entry code is implemented to pretend it doesn't use an IST stack. When entered from user-mode or early SYSCALL entry path it switches to the task stack. If entered from kernel-mode it tries to switch back to the previous stack in the IRET frame. The stack found in the IRET frame is validated first, and if it is not safe to use it for the #VC handler, the code will switch to a fall-back stack (the #VC2 IST stack). From there, it can cause nested exceptions again. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-46-joro@8bytes.org
2020-09-09x86/dumpstack/64: Add noinstr version of get_stack_info()Joerg Roedel4-20/+30
The get_stack_info() functionality is needed in the entry code for the #VC exception handler. Provide a version of it in the .text.noinstr section which can be called safely from there. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-45-joro@8bytes.org