summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2019-04-18Merge branch 'linus' of ↵Linus Torvalds2-12/+24
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a bug in the implementation of the x86 accelerated version of poly1305" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/poly1305 - fix overflow during partial reduction
2019-04-18perf/x86/amd: Add event map for AMD Family 17hKim Phillips1-9/+26
Family 17h differs from prior families by: - Does not support an L2 cache miss event - It has re-enumerated PMC counters for: - L2 cache references - front & back end stalled cycles So we add a new amd_f17h_perfmon_event_map[] so that the generic perf event names will resolve to the correct h/w events on family 17h and above processors. Reference sections 2.1.13.3.3 (stalls) and 2.1.13.3.6 (L2): https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: <stable@vger.kernel.org> # v4.9+ Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors") [ Improved the formatting a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18x86/mm/KASLR: Fix the size of the direct mapping sectionBaoquan He1-1/+1
kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate the maximum amount of system RAM supported. The size of the direct mapping section is obtained from the smaller one of the below two values: (actual system RAM size + padding size) vs (max system RAM size supported) This calculation is wrong since commit b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52"). In it, __PHYSICAL_MASK_SHIFT was changed to be 52, regardless of whether the kernel is using 4-level or 5-level page tables. Thus, it will always use 4 PB as the maximum amount of system RAM, even in 4-level paging mode where it should actually be 64 TB. Thus, the size of the direct mapping section will always be the sum of the actual system RAM size plus the padding size. Even when the amount of system RAM is 64 TB, the following layout will still be used. Obviously KALSR will be weakened significantly. |____|_______actual RAM_______|_padding_|______the rest_______| 0 64TB ~120TB Instead, it should be like this: |____|_______actual RAM_______|_________the rest______________| 0 64TB ~120TB The size of padding region is controlled by CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default. The above issue only exists when CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value, which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise, using __PHYSICAL_MASK_SHIFT doesn't affect KASLR. Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS. [ bp: Massage commit message. ] Fixes: b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52") Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Garnier <thgarnie@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: frank.ramsay@hpe.com Cc: herbert@gondor.apana.org.au Cc: kirill@shutemov.name Cc: mike.travis@hpe.com Cc: thgarnie@google.com Cc: x86-ml <x86@kernel.org> Cc: yamada.masahiro@socionext.com Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srv
2019-04-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds15-181/+268
Pull KVM fixes from Paolo Bonzini: "5.1 keeps its reputation as a big bugfix release for KVM x86. - Fix for a memory leak introduced during the merge window - Fixes for nested VMX with ept=0 - Fixes for AMD (APIC virtualization, NMI injection) - Fixes for Hyper-V under KVM and KVM under Hyper-V - Fixes for 32-bit SMM and tests for SMM virtualization - More array_index_nospec peppering" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing KVM: fix spectrev1 gadgets KVM: x86: fix warning Using plain integer as NULL pointer selftests: kvm: add a selftest for SMM selftests: kvm: fix for compilers that do not support -no-pie selftests: kvm/evmcs_test: complete I/O before migrating guest state KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU KVM: x86: clear SMM flags before loading state while leaving SMM KVM: x86: Open code kvm_set_hflags KVM: x86: Load SMRAM in a single shot when leaving SMM KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU KVM: x86: Raise #GP when guest vCPU do not support PMU x86/kvm: move kvm_load/put_guest_xcr0 into atomic context KVM: x86: svm: make sure NMI is injected after nmi_singlestep svm/avic: Fix invalidate logical APIC id entry Revert "svm: Fix AVIC incomplete IPI emulation" kvm: mmu: Fix overflow on kvm mmu page limit calculation KVM: nVMX: always use early vmcs check when EPT is disabled KVM: nVMX: allow tests to use bad virtual-APIC page address ...
2019-04-16KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracingVitaly Kuznetsov1-2/+2
In __apic_accept_irq() interface trig_mode is int and actually on some code paths it is set above u8: kvm_apic_set_irq() extracts it from 'struct kvm_lapic_irq' where trig_mode is u16. This is done on purpose as e.g. kvm_set_msi_irq() sets it to (1 << 15) & e->msi.data kvm_apic_local_deliver sets it to reg & (1 << 15). Fix the immediate issue by making 'tm' into u16. We may also want to adjust __apic_accept_irq() interface and use proper sizes for vector, level, trig_mode but this is not urgent. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16KVM: fix spectrev1 gadgetsPaolo Bonzini1-1/+3
These were found with smatch, and then generalized when applicable. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16KVM: x86: fix warning Using plain integer as NULL pointerHariprasad Kelam1-1/+1
Changed passing argument as "0 to NULL" which resolves below sparse warning arch/x86/kvm/x86.c:3096:61: warning: Using plain integer as NULL pointer Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernelsSean Christopherson2-4/+16
Invoking the 64-bit variation on a 32-bit kenrel will crash the guest, trigger a WARN, and/or lead to a buffer overrun in the host, e.g. rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64. KVM allows userspace to report long mode support via CPUID, even though the guest is all but guaranteed to crash if it actually tries to enable long mode. But, a pure 32-bit guest that is ignorant of long mode will happily plod along. SMM complicates things as 64-bit CPUs use a different SMRAM save state area. KVM handles this correctly for 64-bit kernels, e.g. uses the legacy save state map if userspace has hid long mode from the guest, but doesn't fare well when userspace reports long mode support on a 32-bit host kernel (32-bit KVM doesn't support 64-bit guests). Since the alternative is to crash the guest, e.g. by not loading state or explicitly requesting shutdown, unconditionally use the legacy SMRAM save state map for 32-bit KVM. If a guest has managed to get far enough to handle SMIs when running under a weird/buggy userspace hypervisor, then don't deliberately crash the guest since there are no downsides (from KVM's perspective) to allow it to continue running. Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPUSean Christopherson1-10/+11
Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save state area, i.e. don't save/restore EFER across SMM transitions. KVM somewhat models this, e.g. doesn't clear EFER on entry to SMM if the guest doesn't support long mode. But during RSM, KVM unconditionally clears EFER so that it can get back to pure 32-bit mode in order to start loading CRs with their actual non-SMM values. Clear EFER only when it will be written when loading the non-SMM state so as to preserve bits that can theoretically be set on 32-bit vCPUs, e.g. KVM always emulates EFER_SCE. And because CR4.PAE is cleared only to play nice with EFER, wrap that code in the long mode check as well. Note, this may result in a compiler warning about cr4 being consumed uninitialized. Re-read CR4 even though it's technically unnecessary, as doing so allows for more readable code and RSM emulation is not a performance critical path. Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16KVM: x86: clear SMM flags before loading state while leaving SMMSean Christopherson3-16/+10
RSM emulation is currently broken on VMX when the interrupted guest has CR4.VMXE=1. Stop dancing around the issue of HF_SMM_MASK being set when loading SMSTATE into architectural state, e.g. by toggling it for problematic flows, and simply clear HF_SMM_MASK prior to loading architectural state (from SMRAM save state area). Reported-by: Jon Doron <arilou@gmail.com> Cc: Jim Mattson <jmattson@google.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Fixes: 5bea5123cbf0 ("KVM: VMX: check nested state and CR4.VMXE against SMM") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16KVM: x86: Open code kvm_set_hflagsSean Christopherson3-18/+19
Prepare for clearing HF_SMM_MASK prior to loading state from the SMRAM save state map, i.e. kvm_smm_changed() needs to be called after state has been loaded and so cannot be done automatically when setting hflags from RSM. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16KVM: x86: Load SMRAM in a single shot when leaving SMMSean Christopherson6-92/+92
RSM emulation is currently broken on VMX when the interrupted guest has CR4.VMXE=1. Rather than dance around the issue of HF_SMM_MASK being set when loading SMSTATE into architectural state, ideally RSM emulation itself would be reworked to clear HF_SMM_MASK prior to loading non-SMM architectural state. Ostensibly, the only motivation for having HF_SMM_MASK set throughout the loading of state from the SMRAM save state area is so that the memory accesses from GET_SMSTATE() are tagged with role.smm. Load all of the SMRAM save state area from guest memory at the beginning of RSM emulation, and load state from the buffer instead of reading guest memory one-by-one. This paves the way for clearing HF_SMM_MASK prior to loading state, and also aligns RSM with the enter_smm() behavior, which fills a buffer and writes SMRAM save state in a single go. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16KVM: nVMX: Expose RDPMC-exiting only when guest supports PMULiran Alon1-0/+25
Issue was discovered when running kvm-unit-tests on KVM running as L1 on top of Hyper-V. When vmx_instruction_intercept unit-test attempts to run RDPMC to test RDPMC-exiting, it is intercepted by L1 KVM which it's EXIT_REASON_RDPMC handler raise #GP because vCPU exposed by Hyper-V doesn't support PMU. Instead of unit-test expectation to be reflected with EXIT_REASON_RDPMC. The reason vmx_instruction_intercept unit-test attempts to run RDPMC even though Hyper-V doesn't support PMU is because L1 expose to L2 support for RDPMC-exiting. Which is reasonable to assume that is supported only in case CPU supports PMU to being with. Above issue can easily be simulated by modifying vmx_instruction_intercept config in x86/unittests.cfg to run QEMU with "-cpu host,+vmx,-pmu" and run unit-test. To handle issue, change KVM to expose RDPMC-exiting only when guest supports PMU. Reported-by: Saar Amar <saaramar@microsoft.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16KVM: x86: Raise #GP when guest vCPU do not support PMULiran Alon1-0/+4
Before this change, reading a VMware pseduo PMC will succeed even when PMU is not supported by guest. This can easily be seen by running kvm-unit-test vmware_backdoors with "-cpu host,-pmu" option. Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16x86/kvm: move kvm_load/put_guest_xcr0 into atomic contextWANG Chao4-6/+12
guest xcr0 could leak into host when MCE happens in guest mode. Because do_machine_check() could schedule out at a few places. For example: kvm_load_guest_xcr0 ... kvm_x86_ops->run(vcpu) { vmx_vcpu_run vmx_complete_atomic_exit kvm_machine_check do_machine_check do_memory_failure memory_failure lock_page In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule out, host cpu has guest xcr0 loaded (0xff). In __switch_to { switch_fpu_finish copy_kernel_to_fpregs XRSTORS If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in and tries to reinitialize fpu by restoring init fpu state. Same story as last #GP, except we get DOUBLE FAULT this time. Cc: stable@vger.kernel.org Signed-off-by: WANG Chao <chao.wang@ucloud.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16KVM: x86: svm: make sure NMI is injected after nmi_singlestepVitaly Kuznetsov1-0/+3
I noticed that apic test from kvm-unit-tests always hangs on my EPYC 7401P, the hanging test nmi-after-sti is trying to deliver 30000 NMIs and tracing shows that we're sometimes able to deliver a few but never all. When we're trying to inject an NMI we may fail to do so immediately for various reasons, however, we still need to inject it so enable_nmi_window() arms nmi_singlestep mode. #DB occurs as expected, but we're not checking for pending NMIs before entering the guest and unless there's a different event to process, the NMI will never get delivered. Make KVM_REQ_EVENT request on the vCPU from db_interception() to make sure pending NMIs are checked and possibly injected. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16svm/avic: Fix invalidate logical APIC id entrySuthikulpanit, Suravee1-1/+2
Only clear the valid bit when invalidate logical APIC id entry. The current logic clear the valid bit, but also set the rest of the bits (including reserved bits) to 1. Fixes: 98d90582be2e ('svm: Fix AVIC DFR and LDR handling') Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16Revert "svm: Fix AVIC incomplete IPI emulation"Suthikulpanit, Suravee1-4/+15
This reverts commit bb218fbcfaaa3b115d4cd7a43c0ca164f3a96e57. As Oren Twaig pointed out the old discussion: https://patchwork.kernel.org/patch/8292231/ that the change coud potentially cause an extra IPI to be sent to the destination vcpu because the AVIC hardware already set the IRR bit before the incomplete IPI #VMEXIT with id=1 (target vcpu is not running). Since writting to ICR and ICR2 will also set the IRR. If something triggers the destination vcpu to get scheduled before the emulation finishes, then this could result in an additional IPI. Also, the issue mentioned in the commit bb218fbcfaaa was misdiagnosed. Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reported-by: Oren Twaig <oren@scalemp.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16kvm: mmu: Fix overflow on kvm mmu page limit calculationBen Gardon4-16/+15
KVM bases its memory usage limits on the total number of guest pages across all memslots. However, those limits, and the calculations to produce them, use 32 bit unsigned integers. This can result in overflow if a VM has more guest pages that can be represented by a u32. As a result of this overflow, KVM can use a low limit on the number of MMU pages it will allocate. This makes KVM unable to map all of guest memory at once, prompting spurious faults. Tested: Ran all kvm-unit-tests on an Intel Haswell machine. This patch introduced no new failures. Signed-off-by: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16KVM: nVMX: always use early vmcs check when EPT is disabledPaolo Bonzini2-2/+21
The remaining failures of vmx.flat when EPT is disabled are caused by incorrectly reflecting VMfails to the L1 hypervisor. What happens is that nested_vmx_restore_host_state corrupts the guest CR3, reloading it with the host's shadow CR3 instead, because it blindly loads GUEST_CR3 from the vmcs01. For simplicity let's just always use hardware VMCS checks when EPT is disabled. This way, nested_vmx_restore_host_state is not reached at all (or at least shouldn't be reached). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16x86/Kconfig: Fix spelling mistake "effectivness" -> "effectiveness"Colin Ian King1-1/+1
The Kconfig text contains a spelling mistake, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20190416105751.18899-1-colin.king@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-16perf/x86: Fix incorrect PEBS_REGSKan Liang2-20/+20
PEBS_REGS used as mask for the supported registers for large PEBS. However, the mask cannot filter the sample_regs_user/sample_regs_intr correctly. (1ULL << PERF_REG_X86_*) should be used to replace PERF_REG_X86_*, which is only the index. Rename PEBS_REGS to PEBS_GP_REGS, because the mask is only for general purpose registers. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Fixes: 2fe1bc1f501d ("perf/x86: Enable free running PEBS for REGS_USER/INTR") Link: https://lkml.kernel.org/r/20190402194509.2832-2-kan.liang@linux.intel.com [ Renamed it to PEBS_GP_REGS - as 'GPRS' is used elsewhere ;-) ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-16KVM: nVMX: allow tests to use bad virtual-APIC page addressPaolo Bonzini3-10/+19
As mentioned in the comment, there are some special cases where we can simply clear the TPR shadow bit from the CPU-based execution controls in the vmcs02. Handle them so that we can remove some XFAILs from vmx.flat. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16x86/mm/tlb: Revert "x86/mm: Align TLB invalidation info"Peter Zijlstra1-1/+1
Revert the following commit: 515ab7c41306: ("x86/mm: Align TLB invalidation info") I found out (the hard way) that under some .config options (notably L1_CACHE_SHIFT=7) and compiler combinations this on-stack alignment leads to a 320 byte stack usage, which then triggers a KASAN stack warning elsewhere. Using 320 bytes of stack space for a 40 byte structure is ludicrous and clearly not right. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Nadav Amit <namit@vmware.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 515ab7c41306 ("x86/mm: Align TLB invalidation info") Link: http://lkml.kernel.org/r/20190416080335.GM7905@worktop.programming.kicks-ass.net [ Minor changelog edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-16x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51TJian-Hong Pan1-0/+21
Upon reboot, the Acer TravelMate X514-51T laptop appears to complete the shutdown process, but then it hangs in BIOS POST with a black screen. The problem is intermittent - at some points it has appeared related to Secure Boot settings or different kernel builds, but ultimately we have not been able to identify the exact conditions that trigger the issue to come and go. Besides, the EFI mode cannot be disabled in the BIOS of this model. However, after extensive testing, we observe that using the EFI reboot method reliably avoids the issue in all cases. So add a boot time quirk to use EFI reboot on such systems. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=203119 Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Cc: linux@endlessm.com Link: http://lkml.kernel.org/r/20190412080152.3718-1-jian-hong@endlessm.com [ Fix !CONFIG_EFI build failure, clarify the code and the changelog a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-16x86/mm: Prevent bogus warnings with "noexec=off"Thomas Gleixner2-2/+3
Xose Vazquez Perez reported boot warnings when NX is disabled on the kernel command line. __early_set_fixmap() triggers this warning: attempted to set unsupported pgprot: 8000000000000163 bits: 8000000000000000 supported: 7fffffffffffffff WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgtable.h:537 __early_set_fixmap+0xa2/0xff because it uses __default_kernel_pte_mask to mask out unsupported bits. Use __supported_pte_mask instead. Disabling NX on the command line also triggers the NX warning in the page table mapping check: WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:262 note_page+0x2ae/0x650 .... Make the warning depend on NX set in __supported_pte_mask. Reported-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Tested-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1904151037530.1729@nanos.tec.linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-16x86/build/lto: Fix truncated .bss with -fdata-sectionsSami Tolvanen1-1/+1
With CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y, we compile the kernel with -fdata-sections, which also splits the .bss section. The new section, with a new .bss.* name, which pattern gets missed by the main x86 linker script which only expects the '.bss' name. This results in the discarding of the second part and a too small, truncated .bss section and an unhappy, non-working kernel. Use the common BSS_MAIN macro in the linker script to properly capture and merge all the generated BSS sections. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190415164956.124067-1-samitolvanen@google.com [ Extended the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-15KVM: x86/mmu: Fix an inverted list_empty() check when zapping sptesSean Christopherson1-1/+1
A recently introduced helper for handling zap vs. remote flush incorrectly bails early, effectively leaking defunct shadow pages. Manifests as a slab BUG when exiting KVM due to the shadow pages being alive when their associated cache is destroyed. ========================================================================== BUG kvm_mmu_page_header: Objects remaining in kvm_mmu_page_header on ... -------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Slab 0x00000000fc436387 objects=26 used=23 fp=0x00000000d023caee ... CPU: 6 PID: 4315 Comm: rmmod Tainted: G B 5.1.0-rc2+ #19 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0x46/0x5b slab_err+0xad/0xd0 ? on_each_cpu_mask+0x3c/0x50 ? ksm_migrate_page+0x60/0x60 ? on_each_cpu_cond_mask+0x7c/0xa0 ? __kmalloc+0x1ca/0x1e0 __kmem_cache_shutdown+0x13a/0x310 shutdown_cache+0xf/0x130 kmem_cache_destroy+0x1d5/0x200 kvm_mmu_module_exit+0xa/0x30 [kvm] kvm_arch_exit+0x45/0x60 [kvm] kvm_exit+0x6f/0x80 [kvm] vmx_exit+0x1a/0x50 [kvm_intel] __x64_sys_delete_module+0x153/0x1f0 ? exit_to_usermode_loop+0x88/0xc0 do_syscall_64+0x4f/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: a21136345cb6f ("KVM: x86/mmu: Split remote_flush+zap case out of kvm_mmu_flush_or_zap()") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-15x86/speculation: Prevent deadlock on ssb_state::lockThomas Gleixner1-2/+6
Mikhail reported a lockdep splat related to the AMD specific ssb_state lock: CPU0 CPU1 lock(&st->lock); local_irq_disable(); lock(&(&sighand->siglock)->rlock); lock(&st->lock); <Interrupt> lock(&(&sighand->siglock)->rlock); *** DEADLOCK *** The connection between sighand->siglock and st->lock comes through seccomp, which takes st->lock while holding sighand->siglock. Make sure interrupts are disabled when __speculation_ctrl_update() is invoked via prctl() -> speculation_ctrl_update(). Add a lockdep assert to catch future offenders. Fixes: 1f50ddb4f418 ("x86/speculation: Handle HT correctly on AMD") Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1904141948200.4917@nanos.tec.linutronix.de
2019-04-14x86/resctrl: Do not repeat rdtgroup mode initializationXiaochen Shen1-1/+2
When cache allocation is supported and the user creates a new resctrl resource group, the allocations of the new resource group are initialized to all regions that it can possibly use. At this time these regions are all that are shareable by other resource groups as well as regions that are not currently used. The new resource group's mode is also initialized to reflect this initialization and set to "shareable". The new resource group's mode is currently repeatedly initialized within the loop that configures the hardware with the resource group's default allocations. Move the initialization of the resource group's mode outside the hardware configuration loop. The resource group's mode is now initialized only once as the final step to reflect that its configured allocations are "shareable". Fixes: 95f0b77efa57 ("x86/intel_rdt: Initialize new resource group with sane defaults") Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: pei.p.jia@intel.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1554839629-5448-1-git-send-email-xiaochen.shen@intel.com
2019-04-13Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-26/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix typos in user-visible resctrl parameters, and also fix assembly constraint bugs that might result in miscompilation" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Use stricter assembly constraints in bitops x86/resctrl: Fix typos in the mba_sc mount option
2019-04-13Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds3-16/+145
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Six kernel side fixes: three related to NMI handling on AMD systems, a race fix, a kexec initialization fix and a PEBS sampling fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix perf_event_disable_inatomic() race x86/perf/amd: Remove need to check "running" bit in NMI handler x86/perf/amd: Resolve NMI latency issues for active PMCs x86/perf/amd: Resolve race condition when disabling PMC perf/x86/intel: Initialize TFA MSR perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS
2019-04-10x86/perf/amd: Remove need to check "running" bit in NMI handlerLendacky, Thomas2-12/+22
Spurious interrupt support was added to perf in the following commit, almost a decade ago: 63e6be6d98e1 ("perf, x86: Catch spurious interrupts after disabling counters") The two previous patches (resolving the race condition when disabling a PMC and NMI latency mitigation) allow for the removal of this older spurious interrupt support. Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap is cleared before disabling the PMC, which sets up a race condition. This race condition was mitigated by introducing the running bitmap. That race condition can be eliminated by first disabling the PMC, waiting for PMC reset on overflow and then clearing the bit for the PMC in the active_mask bitmap. The NMI handler will not re-enable a disabled counter. If x86_pmu_stop() is called from the perf NMI handler, the NMI latency mitigation support will guard against any unhandled NMI messages. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-08crypto: x86/poly1305 - fix overflow during partial reductionEric Biggers2-12/+24
The x86_64 implementation of Poly1305 produces the wrong result on some inputs because poly1305_4block_avx2() incorrectly assumes that when partially reducing the accumulator, the bits carried from limb 'd4' to limb 'h0' fit in a 32-bit integer. This is true for poly1305-generic which processes only one block at a time. However, it's not true for the AVX2 implementation, which processes 4 blocks at a time and therefore can produce intermediate limbs about 4x larger. Fix it by making the relevant calculations use 64-bit arithmetic rather than 32-bit. Note that most of the carries already used 64-bit arithmetic, but the d4 -> h0 carry was different for some reason. To be safe I also made the same change to the corresponding SSE2 code, though that only operates on 1 or 2 blocks at a time. I don't think it's really needed for poly1305_block_sse2(), but it doesn't hurt because it's already x86_64 code. It *might* be needed for poly1305_2block_sse2(), but overflows aren't easy to reproduce there. This bug was originally detected by my patches that improve testmgr to fuzz algorithms against their generic implementation. But also add a test vector which reproduces it directly (in the AVX2 case). Fixes: b1ccc8f4b631 ("crypto: poly1305 - Add a four block AVX2 variant for x86_64") Fixes: c70f4abef07a ("crypto: poly1305 - Add a SSE2 SIMD variant for x86_64") Cc: <stable@vger.kernel.org> # v4.3+ Cc: Martin Willi <martin@strongswan.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Martin Willi <martin@strongswan.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-07Merge tag 'for-linus-5.1b-rc4-tag' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "One minor fix and a small cleanup for the xen privcmd driver" * tag 'for-linus-5.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Prevent buffer overflow in privcmd ioctl xen: use struct_size() helper in kzalloc()
2019-04-06x86/asm: Use stricter assembly constraints in bitopsAlexander Potapenko1-23/+18
There's a number of problems with how arch/x86/include/asm/bitops.h is currently using assembly constraints for the memory region bitops are modifying: 1) Use memory clobber in bitops that touch arbitrary memory Certain bit operations that read/write bits take a base pointer and an arbitrarily large offset to address the bit relative to that base. Inline assembly constraints aren't expressive enough to tell the compiler that the assembly directive is going to touch a specific memory location of unknown size, therefore we have to use the "memory" clobber to indicate that the assembly is going to access memory locations other than those listed in the inputs/outputs. To indicate that BTR/BTS instructions don't necessarily touch the first sizeof(long) bytes of the argument, we also move the address to assembly inputs. This particular change leads to size increase of 124 kernel functions in a defconfig build. For some of them the diff is in NOP operations, other end up re-reading values from memory and may potentially slow down the execution. But without these clobbers the compiler is free to cache the contents of the bitmaps and use them as if they weren't changed by the inline assembly. 2) Use byte-sized arguments for operations touching single bytes. Passing a long value to ANDB/ORB/XORB instructions makes the compiler treat sizeof(long) bytes as being clobbered, which isn't the case. This may theoretically lead to worse code in the case of heavy optimization. Practical impact: I've built a defconfig kernel and looked through some of the functions generated by GCC 7.3.0 with and without this clobber, and didn't spot any miscompilations. However there is a (trivial) theoretical case where this code leads to miscompilation: https://lkml.org/lkml/2019/3/28/393 using just GCC 8.3.0 with -O2. It isn't hard to imagine someone writes such a function in the kernel someday. So the primary motivation is to fix an existing misuse of the asm directive, which happens to work in certain configurations now, but isn't guaranteed to work under different circumstances. [ --mingo: Added -stable tag because defconfig only builds a fraction of the kernel and the trivial testcase looks normal enough to be used in existing or in-development code. ] Signed-off-by: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: James Y Knight <jyknight@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190402112813.193378-1-glider@google.com [ Edited the changelog, tidied up one of the defines. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-37/+59
Pull kvm fixes from Paolo Bonzini: "x86 fixes for overflows and other nastiness" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: nVMX: fix x2APIC VTPR read intercept KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887) KVM: SVM: prevent DBG_DECRYPT and DBG_ENCRYPT overflow kvm: svm: fix potential get_num_contig_pages overflow
2019-04-05KVM: x86: nVMX: fix x2APIC VTPR read interceptMarc Orr1-1/+1
Referring to the "VIRTUALIZING MSR-BASED APIC ACCESSES" chapter of the SDM, when "virtualize x2APIC mode" is 1 and "APIC-register virtualization" is 0, a RDMSR of 808H should return the VTPR from the virtual APIC page. However, for nested, KVM currently fails to disable the read intercept for this MSR. This means that a RDMSR exit takes precedence over "virtualize x2APIC mode", and KVM passes through L1's TPR to L2, instead of sourcing the value from L2's virtual APIC page. This patch fixes the issue by disabling the read intercept, in VMCS02, for the VTPR when "APIC-register virtualization" is 0. The issue described above and fix prescribed here, were verified with a related patch in kvm-unit-tests titled "Test VMX's virtualize x2APIC mode w/ nested". Signed-off-by: Marc Orr <marcorr@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Fixes: c992384bde84f ("KVM: vmx: speed up MSR bitmap merge") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-05KVM: x86: nVMX: close leak of L0's x2APIC MSRs (CVE-2019-3887)Marc Orr1-28/+44
The nested_vmx_prepare_msr_bitmap() function doesn't directly guard the x2APIC MSR intercepts with the "virtualize x2APIC mode" MSR. As a result, we discovered the potential for a buggy or malicious L1 to get access to L0's x2APIC MSRs, via an L2, as follows. 1. L1 executes WRMSR(IA32_SPEC_CTRL, 1). This causes the spec_ctrl variable, in nested_vmx_prepare_msr_bitmap() to become true. 2. L1 disables "virtualize x2APIC mode" in VMCS12. 3. L1 enables "APIC-register virtualization" in VMCS12. Now, KVM will set VMCS02's x2APIC MSR intercepts from VMCS12, and then set "virtualize x2APIC mode" to 0 in VMCS02. Oops. This patch closes the leak by explicitly guarding VMCS02's x2APIC MSR intercepts with VMCS12's "virtualize x2APIC mode" control. The scenario outlined above and fix prescribed here, were verified with a related patch in kvm-unit-tests titled "Add leak scenario to virt_x2apic_mode_test". Note, it looks like this issue may have been introduced inadvertently during a merge---see 15303ba5d1cd. Signed-off-by: Marc Orr <marcorr@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-05KVM: SVM: prevent DBG_DECRYPT and DBG_ENCRYPT overflowDavid Rientjes1-3/+9
This ensures that the address and length provided to DBG_DECRYPT and DBG_ENCRYPT do not cause an overflow. At the same time, pass the actual number of pages pinned in memory to sev_unpin_memory() as a cleanup. Reported-by: Cfir Cohen <cfir@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-05kvm: svm: fix potential get_num_contig_pages overflowDavid Rientjes1-5/+5
get_num_contig_pages() could potentially overflow int so make its type consistent with its usage. Reported-by: Cfir Cohen <cfir@google.com> Cc: stable@vger.kernel.org Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-05syscalls: Remove start and number from syscall_set_arguments() argsSteven Rostedt (VMware)1-53/+16
After removing the start and count arguments of syscall_get_arguments() it seems reasonable to remove them from syscall_set_arguments(). Note, as of today, there are no users of syscall_set_arguments(). But we are told that there will be soon. But for now, at least make it consistent with syscall_get_arguments(). Link: http://lkml.kernel.org/r/20190327222014.GA32540@altlinux.org Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Dave Martin <dave.martin@arm.com> Cc: "Dmitry V. Levin" <ldv@altlinux.org> Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: uclinux-h8-devel@lists.sourceforge.jp Cc: linux-hexagon@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: nios2-dev@lists.rocketboards.org Cc: openrisc@lists.librecores.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: linux-xtensa@linux-xtensa.org Cc: linux-arch@vger.kernel.org Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86 Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-05syscalls: Remove start and number from syscall_get_arguments() argsSteven Rostedt (Red Hat)1-56/+17
At Linux Plumbers, Andy Lutomirski approached me and pointed out that the function call syscall_get_arguments() implemented in x86 was horribly written and not optimized for the standard case of passing in 0 and 6 for the starting index and the number of system calls to get. When looking at all the users of this function, I discovered that all instances pass in only 0 and 6 for these arguments. Instead of having this function handle different cases that are never used, simply rewrite it to return the first 6 arguments of a system call. This should help out the performance of tracing system calls by ptrace, ftrace and perf. Link: http://lkml.kernel.org/r/20161107213233.754809394@goodmis.org Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Dave Martin <dave.martin@arm.com> Cc: "Dmitry V. Levin" <ldv@altlinux.org> Cc: x86@kernel.org Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-c6x-dev@linux-c6x.org Cc: uclinux-h8-devel@lists.sourceforge.jp Cc: linux-hexagon@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: nios2-dev@lists.rocketboards.org Cc: openrisc@lists.librecores.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-riscv@lists.infradead.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: linux-xtensa@linux-xtensa.org Cc: linux-arch@vger.kernel.org Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86 Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-05xen: Prevent buffer overflow in privcmd ioctlDan Carpenter1-0/+3
The "call" variable comes from the user in privcmd_ioctl_hypercall(). It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32) elements. We need to put an upper bound on it to prevent an out of bounds access. Cc: stable@vger.kernel.org Fixes: 1246ae0bb992 ("xen: add variable hypercall caller") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-04-03x86/perf/amd: Resolve NMI latency issues for active PMCsLendacky, Thomas1-1/+55
On AMD processors, the detection of an overflowed PMC counter in the NMI handler relies on the current value of the PMC. So, for example, to check for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed). When the perf NMI handler executes it does not know in advance which PMC counters have overflowed. As such, the NMI handler will process all active PMC counters that have overflowed. NMI latency in newer AMD processors can result in multiple overflowed PMC counters being processed in one NMI and then a subsequent NMI, that does not appear to be a back-to-back NMI, not finding any PMC counters that have overflowed. This may appear to be an unhandled NMI resulting in either a panic or a series of messages, depending on how the kernel was configured. To mitigate this issue, add an AMD handle_irq callback function, amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq() function and upon return perform some additional processing that will indicate if the NMI has been handled or would have been handled had an earlier NMI not handled the overflowed PMC. Using a per-CPU variable, a minimum value of the number of active PMCs or 2 will be set whenever a PMC is active. This is used to indicate the possible number of NMIs that can still occur. The value of 2 is used for when an NMI does not arrive at the LAPIC in time to be collapsed into an already pending NMI. Each time the function is called without having handled an overflowed counter, the per-CPU value is checked. If the value is non-zero, it is decremented and the NMI indicates that it handled the NMI. If the value is zero, then the NMI indicates that it did not handle the NMI. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-03x86/perf/amd: Resolve race condition when disabling PMCLendacky, Thomas1-3/+62
On AMD processors, the detection of an overflowed counter in the NMI handler relies on the current value of the counter. So, for example, to check for overflow on a 48 bit counter, bit 47 is checked to see if it is 1 (not overflowed) or 0 (overflowed). There is currently a race condition present when disabling and then updating the PMC. Increased NMI latency in newer AMD processors makes this race condition more pronounced. If the counter value has overflowed, it is possible to update the PMC value before the NMI handler can run. The updated PMC value is not an overflowed value, so when the perf NMI handler does run, it will not find an overflowed counter. This may appear as an unknown NMI resulting in either a panic or a series of messages, depending on how the kernel is configured. To eliminate this race condition, the PMC value must be checked after disabling the counter. Add an AMD function, amd_pmu_disable_all(), that will wait for the NMI handler to reset any active and overflowed counter after calling x86_pmu_disable_all(). Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-03perf/x86/intel: Initialize TFA MSRPeter Zijlstra1-0/+6
Stephane reported that the TFA MSR is not initialized by the kernel, but the TFA bit could set by firmware or as a leftover from a kexec, which makes the state inconsistent. Reported-by: Stephane Eranian <eranian@google.com> Tested-by: Nelson DSouza <nelson.dsouza@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: tonyj@suse.com Link: https://lkml.kernel.org/r/20190321123849.GN6521@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-03perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBSStephane Eranian1-1/+1
When an event is programmed with attr.wakeup_events=N (N>0), it means the caller is interested in getting a user level notification after N samples have been recorded in the kernel sampling buffer. With precise events on Intel processors, the kernel uses PEBS. The kernel tries minimize sampling overhead by verifying if the event configuration is compatible with multi-entry PEBS mode. If so, the kernel is notified only when the buffer has reached its threshold. Other PEBS operates in single-entry mode, the kenrel is notified for each PEBS sample. The problem is that the current implementation look at frequency mode and event sample_type but ignores the wakeup_events field. Thus, it may not be possible to receive a notification after each precise event. This patch fixes this problem by disabling multi-entry PEBS if wakeup_events is non-zero. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: https://lkml.kernel.org/r/20190306195048.189514-1-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-01x86/resctrl: Fix typos in the mba_sc mount optionXiaochen Shen1-3/+3
The user can control the MBA memory bandwidth in MBps (Mega Bytes per second) units of the MBA Software Controller (mba_sc) by using the "mba_MBps" mount option. For details, see Documentation/x86/resctrl_ui.txt. However, commit 23bf1b6be9c2 ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context") changed the mount option name from "mba_MBps" to "mba_mpbs" by mistake. Change it back from to "mba_MBps" because it is user-visible, and correct "Opt_mba_mpbs" spelling to "Opt_mba_mbps". [ bp: massage commit message. ] Fixes: 23bf1b6be9c2 ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context") Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: dhowells@redhat.com Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: pei.p.jia@intel.com Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1553896238-22130-1-git-send-email-xiaochen.shen@intel.com
2019-03-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds9-55/+138
Pull KVM fixes from Paolo Bonzini: "A collection of x86 and ARM bugfixes, and some improvements to documentation. On top of this, a cleanup of kvm_para.h headers, which were exported by some architectures even though they not support KVM at all. This is responsible for all the Kbuild changes in the diffstat" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION KVM: doc: Document the life cycle of a VM and its resources KVM: selftests: complete IO before migrating guest state KVM: selftests: disable stack protector for all KVM tests KVM: selftests: explicitly disable PIE for tests KVM: selftests: assert on exit reason in CR4/cpuid sync test KVM: x86: update %rip after emulating IO x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts kvm: don't redefine flags as something else kvm: mmu: Used range based flushing in slot_handle_level_range KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region() kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation) KVM: Reject device ioctls from processes other than the VM's creator KVM: doc: Fix incorrect word ordering regarding supported use of APIs KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size' KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT ...