summaryrefslogtreecommitdiff
path: root/arch/x86/include/asm
AgeCommit message (Collapse)AuthorFilesLines
2018-12-26Merge branch 'x86-cache-for-linus' of ↵Linus Torvalds1-14/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache control updates from Borislav Petkov: - The generalization of the RDT code to accommodate the addition of AMD's very similar implementation of the cache monitoring feature. This entails a subsystem move into a separate and generic arch/x86/kernel/cpu/resctrl/ directory along with adding vendor-specific initialization and feature detection helpers. Ontop of that is the unification of user-visible strings, both in the resctrl filesystem error handling and Kconfig. Provided by Babu Moger and Sherry Hurwitz. - Code simplifications and error handling improvements by Reinette Chatre. * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Fix rdt_find_domain() return value and checks x86/resctrl: Remove unnecessary check for cbm_validate() x86/resctrl: Use rdt_last_cmd_puts() where possible MAINTAINERS: Update resctrl filename patterns Documentation: Rename and update intel_rdt_ui.txt to resctrl_ui.txt x86/resctrl: Introduce AMD QOS feature x86/resctrl: Fixup the user-visible strings x86/resctrl: Add AMD's X86_FEATURE_MBA to the scattered CPUID features x86/resctrl: Rename the config option INTEL_RDT to RESCTRL x86/resctrl: Add vendor check for the MBA software controller x86/resctrl: Bring cbm_validate() into the resource structure x86/resctrl: Initialize the vendor-specific resource functions x86/resctrl: Move all the macros to resctrl/internal.h x86/resctrl: Re-arrange the RDT init code x86/resctrl: Rename the RDT functions and definitions x86/resctrl: Rename and move rdt files to a separate directory
2018-12-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds9-136/+333
Pull KVM updates from Paolo Bonzini: "ARM: - selftests improvements - large PUD support for HugeTLB - single-stepping fixes - improved tracing - various timer and vGIC fixes x86: - Processor Tracing virtualization - STIBP support - some correctness fixes - refactorings and splitting of vmx.c - use the Hyper-V range TLB flush hypercall - reduce order of vcpu struct - WBNOINVD support - do not use -ftrace for __noclone functions - nested guest support for PAUSE filtering on AMD - more Hyper-V enlightenments (direct mode for synthetic timers) PPC: - nested VFIO s390: - bugfixes only this time" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits) KVM: x86: Add CPUID support for new instruction WBNOINVD kvm: selftests: ucall: fix exit mmio address guessing Revert "compiler-gcc: disable -ftracer for __noclone functions" KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines KVM: VMX: Explicitly reference RCX as the vmx_vcpu pointer in asm blobs KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup MAINTAINERS: Add arch/x86/kvm sub-directories to existing KVM/x86 entry KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streams KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range() KVM/MMU: Flush tlb directly in kvm_set_pte_rmapp() KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte() KVM: Make kvm_set_spte_hva() return int KVM: Replace old tlb flush function with new one to flush a specified range. KVM/MMU: Add tlb flush with range helper function KVM/VMX: Add hv tlb range flush support x86/hyper-v: Add HvFlushGuestAddressList hypercall support KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops KVM: x86: Disable Intel PT when VMXON in L1 guest KVM: x86: Set intercept for Intel PT MSRs read/write KVM: x86: Implement Intel PT MSRs read/write emulation ...
2018-12-26Merge tag 'arm64-upstream' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 festive updates from Will Deacon: "In the end, we ended up with quite a lot more than I expected: - Support for ARMv8.3 Pointer Authentication in userspace (CRIU and kernel-side support to come later) - Support for per-thread stack canaries, pending an update to GCC that is currently undergoing review - Support for kexec_file_load(), which permits secure boot of a kexec payload but also happens to improve the performance of kexec dramatically because we can avoid the sucky purgatory code from userspace. Kdump will come later (requires updates to libfdt). - Optimisation of our dynamic CPU feature framework, so that all detected features are enabled via a single stop_machine() invocation - KPTI whitelisting of Cortex-A CPUs unaffected by Meltdown, so that they can benefit from global TLB entries when KASLR is not in use - 52-bit virtual addressing for userspace (kernel remains 48-bit) - Patch in LSE atomics for per-cpu atomic operations - Custom preempt.h implementation to avoid unconditional calls to preempt_schedule() from preempt_enable() - Support for the new 'SB' Speculation Barrier instruction - Vectorised implementation of XOR checksumming and CRC32 optimisations - Workaround for Cortex-A76 erratum #1165522 - Improved compatibility with Clang/LLD - Support for TX2 system PMUS for profiling the L3 cache and DMC - Reflect read-only permissions in the linear map by default - Ensure MMIO reads are ordered with subsequent calls to Xdelay() - Initial support for memory hotplug - Tweak the threshold when we invalidate the TLB by-ASID, so that mremap() performance is improved for ranges spanning multiple PMDs. - Minor refactoring and cleanups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (125 commits) arm64: kaslr: print PHYS_OFFSET in dump_kernel_offset() arm64: sysreg: Use _BITUL() when defining register bits arm64: cpufeature: Rework ptr auth hwcaps using multi_entry_cap_matches arm64: cpufeature: Reduce number of pointer auth CPU caps from 6 to 4 arm64: docs: document pointer authentication arm64: ptr auth: Move per-thread keys from thread_info to thread_struct arm64: enable pointer authentication arm64: add prctl control for resetting ptrauth keys arm64: perf: strip PAC when unwinding userspace arm64: expose user PAC bit positions via ptrace arm64: add basic pointer authentication support arm64/cpufeature: detect pointer authentication arm64: Don't trap host pointer auth use to EL2 arm64/kvm: hide ptrauth from guests arm64/kvm: consistently handle host HCR_EL2 flags arm64: add pointer authentication register bits arm64: add comments about EC exception levels arm64: perf: Treat EXCLUDE_EL* bit definitions as unsigned arm64: kpti: Whitelist Cortex-A CPUs that don't implement the CSV3 field arm64: enable per-task stack canaries ...
2018-12-26Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds2-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti updates from Thomas Gleixner: "No point in speculating what's in this parcel: - Drop the swap storage limit when L1TF is disabled so the full space is available - Add support for the new AMD STIBP always on mitigation mode - Fix a bunch of STIPB typos" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Add support for STIBP always-on preferred mode x86/speculation/l1tf: Drop the swap storage limit restriction when l1tf=off x86/speculation: Change misspelled STIPB to STIBP
2018-12-26Merge tag 'acpi-4.21-rc1' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update the ACPICA code in the kernel to the 20181213 upstream revision, make it possible to build the ACPI subsystem without PCI support, and a new OEM _OSI string, add a new device support to the ACPI driver for AMD SoCs and fix PM handling in the ACPI driver for Intel SoCs, fix the SPCR table handling and do some assorted fixes and cleanups. Specifics: - Update the ACPICA code in the kernel to the 20181213 upstream revision including: * New Windows _OSI strings (Bob Moore, Jung-uk Kim). * Buffers-to-string conversions update (Bob Moore). * Removal of support for expressions in package elements (Bob Moore). * New option to display method/object evaluation in debug output (Bob Moore). * Compiler improvements (Bob Moore, Erik Schmauss). * Minor debugger fix (Erik Schmauss). * Disassembler improvement (Erik Schmauss). * Assorted cleanups (Bob Moore, Colin Ian King, Erik Schmauss). - Add support for a new OEM _OSI string to indicate special handling of secondary graphics adapters on some systems (Alex Hung). - Make it possible to build the ACPI subystem without PCI support (Sinan Kaya). - Make the SPCR table handling regard baud rate 0 in accordance with the specification of it and make the DSDT override code support DSDT code names generated by recent ACPICA (Andy Shevchenko, Wang Dongsheng, Nathan Chancellor). - Add clock frequency for Hisilicon Hip08 SPI controller to the ACPI driver for AMD SoCs (APD) (Jay Fang). - Fix the PM handling during device init in the ACPI driver for Intel SoCs (LPSS) (Hans de Goede). - Avoid double panic()s by clearing the APEI GHES block_status before panic() (Lenny Szubowicz). - Clean up a function invocation in the ACPI core and get rid of some code duplication by using the DEFINE_SHOW_ATTRIBUTE macro in the APEI support code (Alexey Dobriyan, Yangtao Li)" * tag 'acpi-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (31 commits) ACPI / tables: Add an ifdef around amlcode and dsdt_amlcode ACPI/APEI: Clear GHES block_status before panic() ACPI: Make PCI slot detection driver depend on PCI ACPI/IORT: Stub out ACS functions when CONFIG_PCI is not set arm64: select ACPI PCI code only when both features are enabled PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set ACPICA: Remove PCI bits from ACPICA when CONFIG_PCI is unset ACPI: Allow CONFIG_PCI to be unset for reboot ACPI: Move PCI reset to a separate function ACPI / OSI: Add OEM _OSI string to enable dGPU direct output ACPI / tables: add DSDT AmlCode new declaration name support ACPICA: Update version to 20181213 ACPICA: change coding style to match ACPICA, no functional change ACPICA: Debug output: Add option to display method/object evaluation ACPICA: disassembler: disassemble OEMx tables as AML ACPICA: Add "Windows 2018.2" string in the _OSI support ACPICA: Expressions in package elements are not supported ACPICA: Update buffer-to-string conversions ACPICA: add comments, no functional change ACPICA: Remove defines that use deprecated flag ...
2018-12-21Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds10-236/+257
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "The biggest part is a series of reverts for the macro based GCC inlining workarounds. It caused regressions in distro build and other kernel tooling environments, and the GCC project was very receptive to fixing the underlying inliner weaknesses - so as time ran out we decided to do a reasonably straightforward revert of the patches. The plan is to rely on the 'asm inline' GCC 9 feature, which might be backported to GCC 8 and could thus become reasonably widely available on modern distros. Other than those reverts, there's misc fixes from all around the place. I wish our final x86 pull request for v4.20 was smaller..." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs" Revert "x86/objtool: Use asm macros to work around GCC inlining bugs" Revert "x86/refcount: Work around GCC inlining bug" Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs" Revert "x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs" Revert "x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops" Revert "x86/extable: Macrofy inline assembly code to work around GCC inlining bugs" Revert "x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs" Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs" x86/mtrr: Don't copy uninitialized gentry fields back to userspace x86/fsgsbase/64: Fix the base write helper functions x86/mm/cpa: Fix cpa_flush_array() TLB invalidation x86/vdso: Pass --eh-frame-hdr to the linker x86/mm: Fix decoy address handling vs 32-bit builds x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence x86/dump_pagetables: Fix LDT remap address marker x86/mm: Fix guard hole handling
2018-12-21KVM: x86: Add CPUID support for new instruction WBNOINVDRobert Hoo1-0/+1
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixupSean Christopherson1-1/+1
____kvm_handle_fault_on_reboot() provides a generic exception fixup handler that is used to cleanly handle faults on VMX/SVM instructions during reboot (or at least try to). If there isn't a reboot in progress, ____kvm_handle_fault_on_reboot() treats any exception as fatal to KVM and invokes kvm_spurious_fault(), which in turn generates a BUG() to get a stack trace and die. When it was originally added by commit 4ecac3fd6dc2 ("KVM: Handle virtualization instruction #UD faults during reboot"), the "call" to kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value is the RIP of the faulting instructing. The PUSH+JMP trickery is necessary because the exception fixup handler code lies outside of its associated function, e.g. right after the function. An actual CALL from the .fixup code would show a slightly bogus stack trace, e.g. an extra "random" function would be inserted into the trace, as the return RIP on the stack would point to no known function (and the unwinder will likely try to guess who owns the RIP). Unfortunately, the JMP was replaced with a CALL when the macro was reworked to not spin indefinitely during reboot (commit b7c4145ba2eb "KVM: Don't spin on virt instruction faults during reboot"). This causes the aforementioned behavior where a bogus function is inserted into the stack trace, e.g. my builds like to blame free_kvm_area(). Revert the CALL back to a JMP. The changelog for commit b7c4145ba2eb ("KVM: Don't spin on virt instruction faults during reboot") contains nothing that indicates the switch to CALL was deliberate. This is backed up by the fact that the PUSH <insn RIP> was left intact. Note that an alternative to the PUSH+JMP magic would be to JMP back to the "real" code and CALL from there, but that would require adding a JMP in the non-faulting path to avoid calling kvm_spurious_fault() and would add no value, i.e. the stack trace would be the same. Using CALL: ------------[ cut here ]------------ kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356! invalid opcode: 0000 [#1] SMP CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm] Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0 R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000 FS: 00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0 Call Trace: free_kvm_area+0x1044/0x43ea [kvm_intel] ? vmx_vcpu_run+0x156/0x630 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? __set_task_blocked+0x38/0x90 ? __set_current_blocked+0x50/0x60 ? __fpu__restore_sig+0x97/0x490 ? do_vfs_ioctl+0xa1/0x620 ? __x64_sys_futex+0x89/0x180 ? ksys_ioctl+0x66/0x70 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x4f/0x100 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc ---[ end trace 9775b14b123b1713 ]--- Using JMP: ------------[ cut here ]------------ kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356! invalid opcode: 0000 [#1] SMP CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm] Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41 RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0 R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40 FS: 00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0 Call Trace: vmx_vcpu_run+0x156/0x630 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm] ? __set_task_blocked+0x38/0x90 ? __set_current_blocked+0x50/0x60 ? __fpu__restore_sig+0x97/0x490 ? do_vfs_ioctl+0xa1/0x620 ? __x64_sys_futex+0x89/0x180 ? ksys_ioctl+0x66/0x70 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x4f/0x100 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc ---[ end trace f9daedb85ab3ddba ]--- Fixes: b7c4145ba2eb ("KVM: Don't spin on virt instruction faults during reboot") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM/x86: Use SVM assembly instruction mnemonics instead of .byte streamsUros Bizjak1-7/+0
Recently the minimum required version of binutils was changed to 2.20, which supports all SVM instruction mnemonics. The patch removes all .byte #defines and uses real instruction mnemonics instead. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: Make kvm_set_spte_hva() return intLan Tianyu1-1/+1
The patch is to make kvm_set_spte_hva() return int and caller can check return value to determine flush tlb or not. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21x86/hyper-v: Add HvFlushGuestAddressList hypercall supportLan Tianyu3-0/+61
Hyper-V provides HvFlushGuestAddressList() hypercall to flush EPT tlb with specified ranges. This patch is to add the hypercall support. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: Add tlb_remote_flush_with_range callback in kvm_x86_opsLan Tianyu1-0/+7
Add flush range call back in the kvm_x86_ops and platform can use it to register its associated function. The parameter "kvm_tlb_range" accepts a single range and flush list which contains a list of ranges. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: x86: Add Intel Processor Trace cpuid emulationChao Peng1-0/+1
Expose Intel Processor Trace to guest only when the PT works in Host-Guest mode. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: x86: Add Intel PT virtualization work modeChao Peng2-0/+9
Intel Processor Trace virtualization can be work in one of 2 possible modes: a. System-Wide mode (default): When the host configures Intel PT to collect trace packets of the entire system, it can leave the relevant VMX controls clear to allow VMX-specific packets to provide information across VMX transitions. KVM guest will not aware this feature in this mode and both host and KVM guest trace will output to host buffer. b. Host-Guest mode: Host can configure trace-packet generation while in VMX non-root operation for guests and root operation for native executing normally. Intel PT will be exposed to KVM guest in this mode, and the trace output to respective buffer of host and guest. In this mode, tht status of PT will be saved and disabled before VM-entry and restored after VM-exit if trace a virtual machine. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21perf/x86/intel/pt: add new capability for Intel PTLuwei Kang1-0/+1
This adds support for "output to Trace Transport subsystem" capability of Intel PT. It means that PT can output its trace to an MMIO address range rather than system memory buffer. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21perf/x86/intel/pt: Add new bit definitions for PT MSRsLuwei Kang1-0/+3
Add bit definitions for Intel PT MSRs to support trace output directed to the memeory subsystem and holds a count if packet bytes that have been sent out. These are required by the upcoming PT support in KVM guests for MSRs read/write emulation. Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21perf/x86/intel/pt: Introduce intel_pt_validate_cap()Luwei Kang1-0/+2
intel_pt_validate_hw_cap() validates whether a given PT capability is supported by the hardware. It checks the PT capability array which reflects the capabilities of the hardware on which the code is executed. For setting up PT for KVM guests this is not correct as the capability array for the guest can be different from the host array. Provide a new function to check against a given capability array. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21perf/x86/intel/pt: Export pt_cap_get()Chao Peng1-0/+23
pt_cap_get() is required by the upcoming PT support in KVM guests. Export it and move the capabilites enum to a global header. As a global functions, "pt_*" is already used for ptrace and other things, so it makes sense to use "intel_pt_*" as a prefix. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21perf/x86/intel/pt: Move Intel PT MSRs bit defines to global headerChao Peng1-0/+33
The Intel Processor Trace (PT) MSR bit defines are in a private header. The upcoming support for PT virtualization requires these defines to be accessible from KVM code. Move them to the global MSR header file. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-20PCI/ACPI: Allow ACPI to be built without CONFIG_PCI setSinan Kaya1-0/+7
We are compiling PCI code today for systems with ACPI and no PCI device present. Remove the useless code and reduce the tight dependency. Signed-off-by: Sinan Kaya <okaya@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # PCI parts Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-19Revert "x86/refcount: Work around GCC inlining bug"Ingo Molnar1-48/+33
This reverts commit 9e1725b410594911cc5981b6c7b4cea4ec054ca8. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") The conflict resolution for interaction with: 288e4521f0f6: ("x86/asm: 'Simplify' GEN_*_RMWcc() macros") was provided by Masahiro Yamada. Conflicts: arch/x86/include/asm/refcount.h Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining ↵Ingo Molnar2-16/+15
bugs" This reverts commit 77f48ec28e4ccff94d2e5f4260a83ac27a7f3099. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19Revert "x86/bug: Macrofy the BUG table section handling, to work around GCC ↵Ingo Molnar1-56/+42
inlining bugs" This reverts commit f81f8ad56fd1c7b99b2ed1c314527f7d9ac447c6. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19Revert "x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops"Ingo Molnar1-27/+29
This reverts commit 494b5168f2de009eb80f198f668da374295098dd. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19Revert "x86/extable: Macrofy inline assembly code to work around GCC ↵Ingo Molnar1-20/+33
inlining bugs" This reverts commit 0474d5d9d2f7f3b11262f7bf87d0e7314ead9200. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19Revert "x86/cpufeature: Macrofy inline assembly code to work around GCC ↵Ingo Molnar1-47/+35
inlining bugs" This reverts commit d5a581d84ae6b8a4a740464b80d8d9cf1e7947b2. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC ↵Ingo Molnar1-18/+54
inlining bugs" This reverts commit 5bdcd510c2ac9efaf55c4cbd8d46421d8e2320cd. The macro based workarounds for GCC's inlining bugs caused regressions: distcc and other distro build setups broke, and the fixes are not easy nor will they solve regressions on already existing installations. So we are reverting this patch and the 8 followup patches. What makes this revert easier is that GCC9 will likely include the new 'asm inline' syntax that makes inlining of assembly blocks a lot more robust. This is a superior method to any macro based hackeries - and might even be backported to GCC8, which would make all modern distros get the inlining fixes as well. Many thanks to Masahiro Yamada and others for helping sort out these problems. Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-19kvm: x86: Add AMD's EX_CFG to the list of ignored MSRsEduardo Habkost1-0/+1
Some guests OSes (including Windows 10) write to MSR 0xc001102c on some cases (possibly while trying to apply a CPU errata). Make KVM ignore reads and writes to that MSR, so the guest won't crash. The MSR is documented as "Execution Unit Configuration (EX_CFG)", at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors". Cc: stable@vger.kernel.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-18x86/fsgsbase/64: Fix the base write helper functionsChang S. Bae1-4/+11
Andy spotted a regression in the fs/gs base helpers after the patch series was committed. The helper functions which write fs/gs base are not just writing the base, they are also changing the index. That's wrong and needs to be separated because writing the base has not to modify the index. While the regression is not causing any harm right now because the only caller depends on that behaviour, it's a guarantee for subtle breakage down the road. Make the index explicitly changed from the caller, instead of including the code in the helpers. Subsequently, the task write helpers do not handle for the current task anymore. The range check for a base value is also factored out, to minimize code redundancy from the caller. Fixes: b1378a561fd1 ("x86/fsgsbase/64: Introduce FS/GS base helper functions") Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/20181126195524.32179-1-chang.seok.bae@intel.com
2018-12-18x86/speculation: Add support for STIBP always-on preferred modeThomas Lendacky2-0/+2
Different AMD processors may have different implementations of STIBP. When STIBP is conditionally enabled, some implementations would benefit from having STIBP always on instead of toggling the STIBP bit through MSR writes. This preference is advertised through a CPUID feature bit. When conditional STIBP support is requested at boot and the CPU advertises STIBP always-on mode as preferred, switch to STIBP "on" support. To show that this transition has occurred, create a new spectre_v2_user_mitigation value and a new spectre_v2_user_strings message. The new mitigation value is used in spectre_v2_user_select_mitigation() to print the new mitigation message as well as to return a new string from stibp_state(). Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20181213230352.6937.74943.stgit@tlendack-t1.amdoffice.net
2018-12-14kvm: x86: Dynamically allocate guest_fpuMarc Orr1-1/+2
Previously, the guest_fpu field was embedded in the kvm_vcpu_arch struct. Unfortunately, the field is quite large, (e.g., 4352 bytes on my current setup). This bloats the kvm_vcpu_arch struct for x86 into an order 3 memory allocation, which can become a problem on overcommitted machines. Thus, this patch moves the fpu state outside of the kvm_vcpu_arch struct. With this patch applied, the kvm_vcpu_arch struct is reduced to 15168 bytes for vmx on my setup when building the kernel with kvmconfig. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14kvm: x86: Use task structs fpu field for userMarc Orr1-4/+3
Previously, x86's instantiation of 'struct kvm_vcpu_arch' added an fpu field to save/restore fpu-related architectural state, which will differ from kvm's fpu state. However, this is redundant to the 'struct fpu' field, called fpu, embedded in the task struct, via the thread field. Thus, this patch removes the user_fpu field from the kvm_vcpu_arch struct and replaces it with the task struct's fpu field. This change is significant because the fpu struct is actually quite large. For example, on the system used to develop this patch, this change reduces the size of the vcpu_vmx struct from 23680 bytes down to 19520 bytes, when building the kernel with kvmconfig. This reduction in the size of the vcpu_vmx struct moves us closer to being able to allocate the struct at order 2, rather than order 3. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14x86/kvm/hyper-v: use stimer config definition from hyperv-tlfs.hVitaly Kuznetsov2-7/+1
As a preparation to implementing Direct Mode for Hyper-V synthetic timers switch to using stimer config definition from hyperv-tlfs.h. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14x86/hyper-v: move synic/stimer control structures definitions to hyperv-tlfs.hVitaly Kuznetsov1-0/+69
We implement Hyper-V SynIC and synthetic timers in KVM too so there's some room for code sharing. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helperVitaly Kuznetsov1-0/+1
The upcoming KVM_GET_SUPPORTED_HV_CPUID ioctl will need to return Enlightened VMCS version in HYPERV_CPUID_NESTED_FEATURES.EAX when it was enabled. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14x86/hyper-v: Drop HV_X64_CONFIGURE_PROFILER definitionVitaly Kuznetsov1-1/+0
BIT(13) in HYPERV_CPUID_FEATURES.EBX is described as "ConfigureProfiler" in TLFS v4.0 but starting 5.0 it is replaced with 'Reserved'. As we don't currently us it in kernel it can just be dropped. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14x86/hyper-v: Do some housekeeping in hyperv-tlfs.hVitaly Kuznetsov1-95/+91
hyperv-tlfs.h is a bit messy: CPUID feature bits are not always sorted, it's hard to get which CPUID they belong to, some items are duplicated (e.g. HV_X64_MSR_CRASH_CTL_NOTIFY/HV_CRASH_CTL_CRASH_NOTIFY). Do some housekeeping work. While on it, replace all (1 << X) with BIT(X) macro. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14x86/hyper-v: Mark TLFS structures packedVitaly Kuznetsov1-22/+25
The TLFS structures are used for hypervisor-guest communication and must exactly meet the specification. Compilers can add alignment padding to structures or reorder struct members for randomization and optimization, which would break the hypervisor ABI. Mark the structures as packed to prevent this. 'struct hv_vp_assist_page' and 'struct hv_enlightened_vmcs' need to be properly padded to support the change. Suggested-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Nadav Amit <nadav.amit@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14Merge branch 'khdr_fix' of ↵Paolo Bonzini9-25/+62
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest into HEAD Merge topic branch from Shuah.
2018-12-11x86/mm: Fix guard hole handlingKirill A. Shutemov1-0/+5
There is a guard hole at the beginning of the kernel address space, also used by hypervisors. It occupies 16 PGD entries. This reserved range is not defined explicitely, it is calculated relative to other entities: direct mapping and user space ranges. The calculation got broken by recent changes of the kernel memory layout: LDT remap range is now mapped before direct mapping and makes the calculation invalid. The breakage leads to crash on Xen dom0 boot[1]. Define the reserved range explicitely. It's part of kernel ABI (hypervisors expect it to be stable) and must not depend on changes in the rest of kernel memory layout. [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03313.html Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging") Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: bhe@redhat.com Cc: linux-mm@kvack.org Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20181130202328.65359-2-kirill.shutemov@linux.intel.com
2018-12-07preempt: Move PREEMPT_NEED_RESCHED definition into arch codeWill Deacon1-0/+3
PREEMPT_NEED_RESCHED is never used directly, so move it into the arch code where it can potentially be implemented using either a different bit in the preempt count or as an entirely separate entity. Cc: Robert Love <rml@tech9.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-03x86/boot: Clear RSDP address in boot_params for broken loadersJuergen Gross1-0/+1
Gunnar Krueger reported a systemd-boot failure and bisected it down to: e6e094e053af75 ("x86/acpi, x86/boot: Take RSDP address from boot params if available") In case a broken boot loader doesn't clear its 'struct boot_params', clear rsdp_addr in sanitize_boot_params(). Reported-by: Gunnar Krueger <taijian@posteo.de> Tested-by: Gunnar Krueger <taijian@posteo.de> Signed-off-by: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: sstabellini@kernel.org Fixes: e6e094e053af75 ("x86/acpi, x86/boot: Take RSDP address from boot params if available") Link: http://lkml.kernel.org/r/20181203103811.17056-1-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-01Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds6-22/+60
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull STIBP fallout fixes from Thomas Gleixner: "The performance destruction department finally got it's act together and came up with a cure for the STIPB regression: - Provide a command line option to control the spectre v2 user space mitigations. Default is either seccomp or prctl (if seccomp is disabled in Kconfig). prctl allows mitigation opt-in, seccomp enables the migitation for sandboxed processes. - Rework the code to handle the conditional STIBP/IBPB control and remove the now unused ptrace_may_access_sched() optimization attempt - Disable STIBP automatically when SMT is disabled - Optimize the switch_to() logic to avoid MSR writes and invocations of __switch_to_xtra(). - Make the asynchronous speculation TIF updates synchronous to prevent stale mitigation state. As a general cleanup this also makes retpoline directly depend on compiler support and removes the 'minimal retpoline' option which just pretended to provide some form of security while providing none" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/speculation: Provide IBPB always command line options x86/speculation: Add seccomp Spectre v2 user space protection mode x86/speculation: Enable prctl mode for spectre_v2_user x86/speculation: Add prctl() control for indirect branch speculation x86/speculation: Prepare arch_smt_update() for PRCTL mode x86/speculation: Prevent stale SPEC_CTRL msr content x86/speculation: Split out TIF update ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS x86/speculation: Prepare for conditional IBPB in switch_mm() x86/speculation: Avoid __switch_to_xtra() calls x86/process: Consolidate and simplify switch_to_xtra() code x86/speculation: Prepare for per task indirect branch speculation control x86/speculation: Add command line control for indirect branch speculation x86/speculation: Unify conditional spectre v2 print functions x86/speculataion: Mark command line parser data __initdata x86/speculation: Mark string arrays const correctly x86/speculation: Reorder the spec_v2 code x86/l1tf: Show actual SMT state x86/speculation: Rework SMT state change sched/smt: Expose sched_smt_present static key ...
2018-11-30Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - MCE related boot crash fix on certain AMD systems - FPU exception handling fix - FPU handling race fix - revert+rewrite of the RSDP boot protocol extension, use boot_params instead - documentation fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/MCE/AMD: Fix the thresholding machinery initialization order x86/fpu: Use the correct exception table macro in the XSTATE_OP wrapper x86/fpu: Disable bottom halves while loading FPU registers x86/acpi, x86/boot: Take RSDP address from boot params if available x86/boot: Mostly revert commit ae7e1238e68f2a ("Add ACPI RSDP address to setup_header") x86/ptrace: Fix documentation for tracehook_report_syscall_entry()
2018-11-28x86/speculation: Add seccomp Spectre v2 user space protection modeThomas Gleixner1-0/+1
If 'prctl' mode of user space protection from spectre v2 is selected on the kernel command-line, STIBP and IBPB are applied on tasks which restrict their indirect branch speculation via prctl. SECCOMP enables the SSBD mitigation for sandboxed tasks already, so it makes sense to prevent spectre v2 user space to user space attacks as well. The Intel mitigation guide documents how STIPB works: Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor prevents the predicted targets of indirect branches on any logical processor of that core from being controlled by software that executes (or executed previously) on another logical processor of the same core. Ergo setting STIBP protects the task itself from being attacked from a task running on a different hyper-thread and protects the tasks running on different hyper-threads from being attacked. While the document suggests that the branch predictors are shielded between the logical processors, the observed performance regressions suggest that STIBP simply disables the branch predictor more or less completely. Of course the document wording is vague, but the fact that there is also no requirement for issuing IBPB when STIBP is used points clearly in that direction. The kernel still issues IBPB even when STIBP is used until Intel clarifies the whole mechanism. IBPB is issued when the task switches out, so malicious sandbox code cannot mistrain the branch predictor for the next user space task on the same logical processor. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.de
2018-11-28x86/speculation: Add prctl() control for indirect branch speculationThomas Gleixner1-0/+1
Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of indirect branch speculation via STIBP and IBPB. Invocations: Check indirect branch speculation status with - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0); Enable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0); Disable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0); Force disable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0); See Documentation/userspace-api/spec_ctrl.rst. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de
2018-11-28x86/speculation: Prevent stale SPEC_CTRL msr contentThomas Gleixner2-6/+4
The seccomp speculation control operates on all tasks of a process, but only the current task of a process can update the MSR immediately. For the other threads the update is deferred to the next context switch. This creates the following situation with Process A and B: Process A task 2 and Process B task 1 are pinned on CPU1. Process A task 2 does not have the speculation control TIF bit set. Process B task 1 has the speculation control TIF bit set. CPU0 CPU1 MSR bit is set ProcB.T1 schedules out ProcA.T2 schedules in MSR bit is cleared ProcA.T1 seccomp_update() set TIF bit on ProcA.T2 ProcB.T1 schedules in MSR is not updated <-- FAIL This happens because the context switch code tries to avoid the MSR update if the speculation control TIF bits of the incoming and the outgoing task are the same. In the worst case ProcB.T1 and ProcA.T2 are the only tasks scheduling back and forth on CPU1, which keeps the MSR stale forever. In theory this could be remedied by IPIs, but chasing the remote task which could be migrated is complex and full of races. The straight forward solution is to avoid the asychronous update of the TIF bit and defer it to the next context switch. The speculation control state is stored in task_struct::atomic_flags by the prctl and seccomp updates already. Add a new TIF_SPEC_FORCE_UPDATE bit and set this after updating the atomic_flags. Check the bit on context switch and force a synchronous update of the speculation control if set. Use the same mechanism for updating the current task. Reported-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811272247140.1875@nanos.tec.linutronix.de
2018-11-28x86/speculation: Prepare for conditional IBPB in switch_mm()Thomas Gleixner2-2/+8
The IBPB speculation barrier is issued from switch_mm() when the kernel switches to a user space task with a different mm than the user space task which ran last on the same CPU. An additional optimization is to avoid IBPB when the incoming task can be ptraced by the outgoing task. This optimization only works when switching directly between two user space tasks. When switching from a kernel task to a user space task the optimization fails because the previous task cannot be accessed anymore. So for quite some scenarios the optimization is just adding overhead. The upcoming conditional IBPB support will issue IBPB only for user space tasks which have the TIF_SPEC_IB bit set. This requires to handle the following cases: 1) Switch from a user space task (potential attacker) which has TIF_SPEC_IB set to a user space task (potential victim) which has TIF_SPEC_IB not set. 2) Switch from a user space task (potential attacker) which has TIF_SPEC_IB not set to a user space task (potential victim) which has TIF_SPEC_IB set. This needs to be optimized for the case where the IBPB can be avoided when only kernel threads ran in between user space tasks which belong to the same process. The current check whether two tasks belong to the same context is using the tasks context id. While correct, it's simpler to use the mm pointer because it allows to mangle the TIF_SPEC_IB bit into it. The context id based mechanism requires extra storage, which creates worse code. When a task is scheduled out its TIF_SPEC_IB bit is mangled as bit 0 into the per CPU storage which is used to track the last user space mm which was running on a CPU. This bit can be used together with the TIF_SPEC_IB bit of the incoming task to make the decision whether IBPB needs to be issued or not to cover the two cases above. As conditional IBPB is going to be the default, remove the dubious ptrace check for the IBPB always case and simply issue IBPB always when the process changes. Move the storage to a different place in the struct as the original one created a hole. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.466447057@linutronix.de
2018-11-28x86/speculation: Avoid __switch_to_xtra() callsThomas Gleixner1-2/+11
The TIF_SPEC_IB bit does not need to be evaluated in the decision to invoke __switch_to_xtra() when: - CONFIG_SMP is disabled - The conditional STIPB mode is disabled The TIF_SPEC_IB bit still controls IBPB in both cases so the TIF work mask checks might invoke __switch_to_xtra() for nothing if TIF_SPEC_IB is the only set bit in the work masks. Optimize it out by masking the bit at compile time for CONFIG_SMP=n and at run time when the static key controlling the conditional STIBP mode is disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.374062201@linutronix.de
2018-11-28x86/process: Consolidate and simplify switch_to_xtra() codeThomas Gleixner1-3/+0
Move the conditional invocation of __switch_to_xtra() into an inline function so the logic can be shared between 32 and 64 bit. Remove the handthrough of the TSS pointer and retrieve the pointer directly in the bitmap handling function. Use this_cpu_ptr() instead of the per_cpu() indirection. This is a preparatory change so integration of conditional indirect branch speculation optimization happens only in one place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.280855518@linutronix.de