summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)AuthorFilesLines
2019-08-26x86/apic: Include the LDR when clearing out APIC registersBandan Das1-0/+4
Although APIC initialization will typically clear out the LDR before setting it, the APIC cleanup code should reset the LDR. This was discovered with a 32-bit KVM guest jumping into a kdump kernel. The stale bits in the LDR triggered a bug in the KVM APIC implementation which caused the destination mapping for VCPUs to be corrupted. Note that this isn't intended to paper over the KVM APIC bug. The kernel has to clear the LDR when resetting the APIC registers except when X2APIC is enabled. This lacks a Fixes tag because missing to clear LDR goes way back into pre git history. [ tglx: Made x2apic_enabled a function call as required ] Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190826101513.5080-3-bsd@redhat.com
2019-08-26x86/apic: Do not initialize LDR and DFR for bigsmpBandan Das1-22/+2
Legacy apic init uses bigsmp for smp systems with 8 and more CPUs. The bigsmp APIC implementation uses physical destination mode, but it nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with multiple bit being set. This does not cause a functional problem because LDR and DFR are ignored when physical destination mode is active, but it triggered a problem on a 32-bit KVM guest which jumps into a kdump kernel. The multiple bits set unearthed a bug in the KVM APIC implementation. The code which creates the logical destination map for VCPUs ignores the disabled state of the APIC and ends up overwriting an existing valid entry and as a result, APIC calibration hangs in the guest during kdump initialization. Remove the bogus LDR/DFR initialization. This is not intended to work around the KVM APIC bug. The LDR/DFR ininitalization is wrong on its own. The issue goes back into the pre git history. The fixes tag is the commit in the bitkeeper import which introduced bigsmp support in 2003. git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: db7b9e9f26b8 ("[PATCH] Clustered APIC setup for >8 CPU systems") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190826101513.5080-2-bsd@redhat.com
2019-08-26uprobes/x86: Fix detection of 32-bit user modeSebastian Mayr1-7/+10
32-bit processes running on a 64-bit kernel are not always detected correctly, causing the process to crash when uretprobes are installed. The reason for the crash is that in_ia32_syscall() is used to determine the process's mode, which only works correctly when called from a syscall. In the case of uretprobes, however, the function is called from a exception and always returns 'false' on a 64-bit kernel. In consequence this leads to corruption of the process's return address. Fix this by using user_64bit_mode() instead of in_ia32_syscall(), which is correct in any situation. [ tglx: Add a comment and the following historical info ] This should have been detected by the rename which happened in commit abfb9498ee13 ("x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()") which states in the changelog: The is_ia32_task()/is_x32_task() function names are a big misnomer: they suggests that the compat-ness of a system call is a task property, which is not true, the compatness of a system call purely depends on how it was invoked through the system call layer. ..... and then it went and blindly renamed every call site. Sadly enough this was already mentioned here: 8faaed1b9f50 ("uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()") where the changelog says: TODO: is_ia32_task() is not what we actually want, TS_COMPAT does not necessarily mean 32bit. Fortunately syscall-like insns can't be probed so it actually works, but it would be better to rename and use is_ia32_frame(). and goes all the way back to: 0326f5a94dde ("uprobes/core: Handle breakpoint and singlestep exceptions") Oh well. 7+ years until someone actually tried a uretprobe on a 32bit process on a 64bit kernel.... Fixes: 0326f5a94dde ("uprobes/core: Handle breakpoint and singlestep exceptions") Signed-off-by: Sebastian Mayr <me@sam.st> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190728152617.7308-1-me@sam.st
2019-08-26x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machinesThomas Gleixner1-1/+7
Rahul Tanwar reported the following bug on DT systems: > 'ioapic_dynirq_base' contains the virtual IRQ base number. Presently, it is > updated to the end of hardware IRQ numbers but this is done only when IOAPIC > configuration type is IOAPIC_DOMAIN_LEGACY or IOAPIC_DOMAIN_STRICT. There is > a third type IOAPIC_DOMAIN_DYNAMIC which applies when IOAPIC configuration > comes from devicetree. > > See dtb_add_ioapic() in arch/x86/kernel/devicetree.c > > In case of IOAPIC_DOMAIN_DYNAMIC (DT/OF based system), 'ioapic_dynirq_base' > remains to zero initialized value. This means that for OF based systems, > virtual IRQ base will get set to zero. Such systems will very likely not even boot. For DT enabled machines ioapic_dynirq_base is irrelevant and not updated, so simply map the IRQ base 1:1 instead. Reported-by: Rahul Tanwar <rahul.tanwar@linux.intel.com> Tested-by: Rahul Tanwar <rahul.tanwar@linux.intel.com> Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: alan@linux.intel.com Cc: bp@alien8.de Cc: cheol.yong.kim@intel.com Cc: qi-ming.wu@intel.com Cc: rahul.tanwar@intel.com Cc: rppt@linux.ibm.com Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/20190821081330.1187-1-rahul.tanwar@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26Merge tag 'v5.3-rc6' into x86/cpu, to pick up fixesIngo Molnar13-60/+345
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-23clocksource/drivers/hyperv: Add Hyper-V specific sched clock functionTianyu Lan1-0/+8
Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock on x86. But native_sched_clock() directly uses the raw TSC value, which can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock() to set the sched clock function appropriately. On x86, this sets pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is scaled and adjusted to be continuous. Also move the Hyper-V reference TSC initialization much earlier in the boot process so no discontinuity is observed when pv_ops.time.sched_clock calculates its offset. [ tglx: Folded build fix ] Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lkml.kernel.org/r/20190814123216.32245-3-Tianyu.Lan@microsoft.com
2019-08-23x86/dma: Get rid of iommu_pass_throughJoerg Roedel1-17/+3
This variable has no users anymore. Remove it and tell the IOMMU code via its new functions about requested DMA modes. Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-08-20x86/PCI: Remove superfluous returns from void functionsKrzysztof Wilczynski1-4/+0
Remove unnecessary empty return statements at the end of void functions in arch/x86/kernel/quirks.c. Signed-off-by: Krzysztof Wilczynski <kw@linux.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-pci@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190820065121.16594-1-kw@linux.com
2019-08-20acpi: Ignore acpi_rsdp kernel param when the kernel has been locked downJosh Boyer2-0/+6
This option allows userspace to pass the RSDP address to the kernel, which makes it possible for a user to modify the workings of hardware. Reject the option when the kernel is locked down. This requires some reworking of the existing RSDP command line logic, since the early boot code also makes use of a command-line passed RSDP when locating the SRAT table before the lockdown code has been initialised. This is achieved by separating the command line RSDP path in the early boot code from the generic RSDP path, and then copying the command line RSDP into boot params in the kernel proper if lockdown is not enabled. If lockdown is enabled and an RSDP is provided on the command line, this will only be used when parsing SRAT (which shouldn't permit kernel code execution) and will be ignored in the rest of the kernel. (Modified by Matthew Garrett in order to handle the early boot RSDP environment) Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: Dave Young <dyoung@redhat.com> cc: linux-acpi@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-20x86/msr: Restrict MSR access when the kernel is locked downMatthew Garrett1-0/+8
Writing to MSRs should not be allowed if the kernel is locked down, since it could lead to execution of arbitrary code in kernel mode. Based on a patch by Kees Cook. Signed-off-by: Matthew Garrett <mjg59@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> cc: x86@kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-20x86: Lock down IO port access when the kernel is locked downMatthew Garrett1-2/+5
IO port access would permit users to gain access to PCI configuration registers, which in turn (on a lot of hardware) give access to MMIO register space. This would potentially permit root to trigger arbitrary DMA, so lock it down by default. This also implicitly locks down the KDADDIO, KDDELIO, KDENABIO and KDDISABIO console ioctls. Signed-off-by: Matthew Garrett <mjg59@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: x86@kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-20kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCEJiri Bohac1-2/+2
This is a preparatory patch for kexec_file_load() lockdown. A locked down kernel needs to prevent unsigned kernel images from being loaded with kexec_file_load(). Currently, the only way to force the signature verification is compiling with KEXEC_VERIFY_SIG. This prevents loading usigned images even when the kernel is not locked down at runtime. This patch splits KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE. Analogous to the MODULE_SIG and MODULE_SIG_FORCE for modules, KEXEC_SIG turns on the signature verification but allows unsigned images to be loaded. KEXEC_SIG_FORCE disallows images without a valid signature. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> cc: kexec@lists.infradead.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-20lockdown: Copy secure_boot flag in boot params across kexec rebootDave Young1-0/+1
Kexec reboot in case secure boot being enabled does not keep the secure boot mode in new kernel, so later one can load unsigned kernel via legacy kexec_load. In this state, the system is missing the protections provided by secure boot. Adding a patch to fix this by retain the secure_boot flag in original kernel. secure_boot flag in boot_params is set in EFI stub, but kexec bypasses the stub. Fixing this issue by copying secure_boot flag across kexec reboot. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Garrett <mjg59@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> cc: kexec@lists.infradead.org Signed-off-by: James Morris <jmorris@namei.org>
2019-08-20x86/irq: Check for VECTOR_UNUSED directlyHeiner Kallweit1-1/+1
It's simpler and more intuitive to directly check for VECTOR_UNUSED than checking whether the other error codes are not set. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/caeaca93-5ee1-cea1-8894-3aa0d5b19241@gmail.com
2019-08-20x86/irq: Move IS_ERR_OR_NULL() check into common do_IRQ() codeHeiner Kallweit3-17/+7
Both the 64bit and the 32bit handle_irq() implementation check the irq descriptor pointer with IS_ERR_OR_NULL() and return failure. That can be done simpler in the common do_IRQ() code. This reduces the 64bit handle_irq() function to a wrapper around generic_handle_irq_desc(). Invoke it directly from do_IRQ() to spare the extra function call. [ tglx: Got rid of the #ifdef and massaged changelog ] Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/2ec758c7-9aaa-73ab-f083-cc44c86aa741@gmail.com
2019-08-19x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16hTom Lendacky1-0/+66
There have been reports of RDRAND issues after resuming from suspend on some AMD family 15h and family 16h systems. This issue stems from a BIOS not performing the proper steps during resume to ensure RDRAND continues to function properly. RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND support using CPUID, including the kernel, will believe that RDRAND is not supported. Update the CPU initialization to clear the RDRAND CPUID bit for any family 15h and 16h processor that supports RDRAND. If it is known that the family 15h or family 16h system does not have an RDRAND resume issue or that the system will not be placed in suspend, the "rdrand=force" kernel parameter can be used to stop the clearing of the RDRAND CPUID bit. Additionally, update the suspend and resume path to save and restore the MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in place after resuming from suspend. Note, that clearing the RDRAND CPUID bit does not prevent a processor that normally supports the RDRAND instruction from executing it. So any code that determined the support based on family and model won't #UD. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chen Yu <yu.c.chen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org> Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.com
2019-08-19x86/apic: Handle missing global clockevent gracefullyThomas Gleixner1-15/+53
Some newer machines do not advertise legacy timers. The kernel can handle that situation if the TSC and the CPU frequency are enumerated by CPUID or MSRs and the CPU supports TSC deadline timer. If the CPU does not support TSC deadline timer the local APIC timer frequency has to be known as well. Some Ryzens machines do not advertize legacy timers, but there is no reliable way to determine the bus frequency which feeds the local APIC timer when the machine allows overclocking of that frequency. As there is no legacy timer the local APIC timer calibration crashes due to a NULL pointer dereference when accessing the not installed global clock event device. Switch the calibration loop to a non interrupt based one, which polls either TSC (if frequency is known) or jiffies. The latter requires a global clockevent. As the machines which do not have a global clockevent installed have a known TSC frequency this is a non issue. For older machines where TSC frequency is not known, there is no known case where the legacy timers do not exist as that would have been reported long ago. Reported-by: Daniel Drake <drake@endlessm.com> Reported-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Daniel Drake <drake@endlessm.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908091443030.21433@nanos.tec.linutronix.de Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1142926#c12
2019-08-17x86/cpu: Use constant definitions for CPU modelsRahul Tanwar1-3/+3
Replace model numbers with their respective macro definitions when comparing CPU models. Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Rahul Tanwar <rahul.tanwar@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: alan@linux.intel.com Cc: cheol.yong.kim@intel.com Cc: Hans de Goede <hdegoede@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: qi-ming.wu@intel.com Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/f7a0e142faa953a53d5f81f78055e1b3c793b134.1565940653.git.rahul.tanwar@linux.intel.com
2019-08-12x86/apic/32: Fix yet another implicit fallthrough warningBorislav Petkov1-1/+2
Fix arch/x86/kernel/apic/probe_32.c: In function ‘default_setup_apic_routing’: arch/x86/kernel/apic/probe_32.c:146:7: warning: this statement may fall through [-Wimplicit-fallthrough=] if (!APIC_XAPIC(version)) { ^ arch/x86/kernel/apic/probe_32.c:151:3: note: here case X86_VENDOR_HYGON: ^~~~ for 32-bit builds. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190811154036.29805-1-bp@alien8.de
2019-08-12x86/umwait: Fix error handling in umwait_init()Fenghua Yu1-1/+38
Currently, failure of cpuhp_setup_state() is ignored and the syscore ops and the control interfaces can still be added even after the failure. But, this error handling will cause a few issues: 1. The CPUs may have different values in the IA32_UMWAIT_CONTROL MSR because there is no way to roll back the control MSR on the CPUs which already set the MSR before the failure. 2. If the sysfs interface is added successfully, there will be a mismatch between the global control value and the control MSR: - The interface shows the default global control value. But, the control MSR is not set to the value because the CPU online function, which is supposed to set the MSR to the value, is not installed. - If the sysadmin changes the global control value through the interface, the control MSR on all current online CPUs is set to the new value. But, the control MSR on newly onlined CPUs after the value change will not be set to the new value due to lack of the CPU online function. 3. On resume from suspend/hibernation, the boot CPU restores the control MSR to the global control value through the syscore ops. But, the control MSR on all APs is not set due to lake of the CPU online function. To solve the issues and enforce consistent behavior on the failure of the CPU hotplug setup, make the following changes: 1. Cache the original control MSR value which is configured by hardware or BIOS before kernel boot. This value is likely to be 0. But it could be a different number as well. Cache the control MSR only once before the MSR is changed. 2. Add the CPU offline function so that the MSR is restored to the original control value on all CPUs on the failure. 3. On the failure, exit from cpumait_init() so that the syscore ops and the control interfaces are not added. Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1565401237-60936-1-git-send-email-fenghua.yu@intel.com
2019-08-11Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A few fixes for x86: - Don't reset the carefully adjusted build flags for the purgatory and remove the unwanted flags instead. The 'reset all' approach led to build fails under certain circumstances. - Unbreak CLANG build of the purgatory by avoiding the builtin memcpy/memset implementations. - Address missing prototype warnings by including the proper header - Fix yet more fall-through issues" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/lib/cpu: Address missing prototypes warning x86/purgatory: Use CFLAGS_REMOVE rather than reset KBUILD_CFLAGS x86/purgatory: Do not use __builtin_memcpy and __builtin_memset x86: mtrr: cyrix: Mark expected switch fall-through x86/ptrace: Mark expected switch fall-through
2019-08-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-8/+0
Pull kvm fixes from Paolo Bonzini: "Bugfixes (arm and x86) and cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: selftests: kvm: Adding config fragments KVM: selftests: Update gitignore file for latest changes kvm: remove unnecessary PageReserved check KVM: arm/arm64: vgic: Reevaluate level sensitive interrupts on enable KVM: arm: Don't write junk to CP15 registers on reset KVM: arm64: Don't write junk to sysregs on reset KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block x86: kvm: remove useless calls to kvm_para_available KVM: no need to check return value of debugfs_create functions KVM: remove kvm_arch_has_vcpu_debugfs() KVM: Fix leak vCPU's VMCS value into other pCPU KVM: Check preempted_in_kernel for involuntary preemption KVM: LAPIC: Don't need to wakeup vCPU twice afer timer fire arm64: KVM: hyp: debug-sr: Mark expected switch fall-through KVM: arm64: Update kvm_arm_exception_class and esr_class_str for new EC KVM: arm: vgic-v3: Mark expected switch fall-through arm64: KVM: regmap: Fix unexpected switch fall-through KVM: arm/arm64: Introduce kvm_pmu_vcpu_init() to setup PMU counter index
2019-08-09Merge tag 'kvmarm-fixes-for-5.3' of ↵Paolo Bonzini17-58/+52
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm fixes for 5.3 - A bunch of switch/case fall-through annotation, fixing one actual bug - Fix PMU reset bug - Add missing exception class debug strings
2019-08-09fs/core/vmcore: Move sev_active() reference to x86 arch codeThiago Jung Bauermann1-0/+5
Secure Encrypted Virtualization is an x86-specific feature, so it shouldn't appear in generic kernel code because it forces non-x86 architectures to define the sev_active() function, which doesn't make a lot of sense. To solve this problem, add an x86 elfcorehdr_read() function to override the generic weak implementation. To do that, it's necessary to make read_from_oldmem() public so that it can be used outside of vmcore.c. Also, remove the export for sev_active() since it's only used in files that won't be built as modules. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190806044919.10622-6-bauerman@linux.ibm.com
2019-08-07x86/apic: Annotate global config variables as "read-only after init"Sean Christopherson1-13/+13
Mark the APIC's global config variables that are constant after boot as __ro_after_init to help document that the majority of the APIC config is not changed at runtime, and to harden the kernel a smidge. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190805212134.12001-1-sean.j.christopherson@intel.com
2019-08-07x86: mtrr: cyrix: Mark expected switch fall-throughGustavo A. R. Silva1-0/+1
Mark switch cases where we are expecting to fall through. Fix the following warning (Building: i386_defconfig i386): arch/x86/kernel/cpu/mtrr/cyrix.c:99:6: warning: this statement may fall through [-Wimplicit-fallthrough=] Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20190805201712.GA19927@embeddedor
2019-08-07x86/ptrace: Mark expected switch fall-throughGustavo A. R. Silva1-0/+1
Mark switch cases where we are expecting to fall through. Fix the following warning (Building: allnoconfig i386): arch/x86/kernel/ptrace.c:202:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (unlikely(value == 0)) ^ arch/x86/kernel/ptrace.c:206:2: note: here default: ^~~~~~~ Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20190805195654.GA17831@embeddedor
2019-08-06Merge tag 'drm-intel-next-2019-07-30' of ↵Dave Airlie1-0/+1
git://anongit.freedesktop.org/drm/drm-intel into drm-next - More changes on simplifying locking mechanisms (Chris) - Selftests fixes and improvements (Chris) - More work around engine tracking for better handling (Chris, Tvrtko) - HDCP debug and info improvements (Ram, Ashuman) - Add DSI properties (Vandita) - Rework on sdvo support for better debuggability before fixing bugs (Ville) - Display PLLs fixes and improvements, specially targeting Ice Lake (Imre, Matt, Ville) - Perf fixes and improvements (Lionel) - Enumerate scratch buffers (Lionel) - Add infra to hold off preemption on a request (Lionel) - Ice Lake color space fixes (Uma) - Type-C fixes and improvements (Lucas) - Fix and improvements around workarounds (Chris, John, Tvrtko) - GuC related fixes and improvements (Chris, Daniele, Michal, Tvrtko) - Fix on VLV/CHV display power domain (Ville) - Improvements around Watermark (Ville) - Favor intel_ types on intel_atomic functions (Ville) - Don’t pass stack garbage to pcode (Ville) - Improve display tracepoints (Steven) - Don’t overestimate 4:2:0 link symbol clock (Ville) - Add support for 4th pipe and transcoder (Lucas) - Introduce initial support for Tiger Lake platform (Daniele, Lucas, Mahesh, Jose, Imre, Mika, Vandita, Rodrigo, Michel) - PPGTT allocation simplification (Chris) - Standardize function names and suffixes to make clean, symmetric and let checkpatch happy (Janusz) - Skip SINK_COUNT read on CH7511 (Ville) - Fix on kernel documentation (Chris, Michal) - Add modular FIA (Anusha, Lucas) - Fix EHL display (Matt, Vivek) - Enable hotplug retry (Imre, Jose) - Disable preemption under GVT (Chris) - OA; Reconfigure context on the fly (Chris) - Fixes and improvements around engine reset. (Chris) - Small clean up on display pipe fault mask (Ville) - Make sure cdclk is high enough for DP audio on VLV/CHV (Ville) - Drop some wmb() and improve pwrite flush (Chris) - Fix critical PSR regression (DK) - Remove unused variables (YueHaibing) - Use dev_get_drvdata for simplification (Chunhong) - Use upstream version of header tests (Jani) drm-intel-next-2019-07-08: - Signal fence completion from i915_request_wait (Chris) - Fixes and improvements around rings pin/unpin (Chris) - Display uncore prep patches (Daniele) - Execlists preemption improvements (Chris) - Selftests fixes and improvements (Chris) - More Elkhartlake enabling work (Vandita, Jose, Matt, Vivek) - Defer address space cleanup to an RCU worker (Chris) - Implicit dev_priv removal and GT compartmentalization and other related follow-ups (Tvrtko, Chris) - Prevent dereference of engine before NULL check in error capture (Chris) - GuC related fixes (Daniele, Robert) - Many changes on active tracking, timelines and locking mechanisms (Chris) - Disable SAMPLER_STATE prefetching on Gen11 (HW W/a) (Kenneth) - I915_perf fixes (Lionel) - Add Ice Lake PCI ID (Mika) - eDP backlight fix (Lee) - Fix various gen2 tracepoints (Ville) - Some irq vfunc clean-up and improvements (Ville) - Move OA files to separated folder (Michal) - Display self contained headers clean-up (Jani) - Preparation for 4th pile (Lucas) - Move atomic commit, watermark and other places to use more intel_crtc_state (Maarten) - Many Ice Lake Type C and Thunderbolt fixes (Imre) - Fix some Ice Lake hw w/a whitelist regs (Lionel) - Fix memleak in runtime wakeref tracking (Mika) - Remove unused Private PPAT manager (Michal) - Don't check PPGTT presence on PPGTT-only platforms (Michal) - Fix ICL DSI suspend/resume (Chris) - Fix ICL Bandwidth issues (Ville) - Add N & CTS values for 10/12 bit deep color (Aditya) - Moving more GT related stuff under gt folder (Chris) - Forcewake related fixes (Chris) - Show support for accurate sw PMU busyness tracking (Chris) - Handle gtt double alloc failures (Chris) - Upgrade to new GuC version (Michal) - Improve w/a debug dumps and pull engine w/a initialization into a common (Chris) - Look for instdone on all engines at hangcheck (Tvrtko) - Engine lookup simplification (Chris) - Many plane color formats fixes and improvements (Ville) - Fix some compilation issues (YueHaibing) - GTT page directory clean up and improvements (Mika) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190801201314.GA23635@intel.com
2019-08-05x86: kvm: remove useless calls to kvm_para_availablePaolo Bonzini1-8/+0
Most code in arch/x86/kernel/kvm.c is called through x86_hyper_kvm, and thus only runs if KVM has been detected. There is no need to check again for the CPUID base. Cc: Sergio Lopez <slp@redhat.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-05x86/mce: Don't check for the overflow bit on action optional machine checksTony Luck1-2/+2
We currently do not process SRAO (Software Recoverable Action Optional) machine checks if they are logged with the overflow bit set to 1 in the machine check bank status register. This is overly conservative. There are two cases where we could end up with an SRAO+OVER log based on the SDM volume 3 overwrite rules in "Table 15-8. Overwrite Rules for UC, CE, and UCR Errors" 1) First a corrected error is logged, then the SRAO error overwrites. The second error overwrites the first because uncorrected errors have a higher severity than corrected errors. 2) The SRAO error was logged first, followed by a correcetd error. In this case the first error is retained in the bank. So in either case the machine check bank will contain the address of the SRAO error. So we can process that even if the overflow bit was set. Reported-by: Yongkai Wu <yongkaiwu@tencent.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190718182920.32621-1-tony.luck@intel.com
2019-07-31x86/kvm: Use CONFIG_PREEMPTIONThomas Gleixner1-1/+1
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the conditional for async pagefaults to use CONFIG_PREEMPTION. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190726212124.789755413@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-31x86/dumpstack: Indicate PREEMPT_RT in dumpsThomas Gleixner1-1/+6
Stack dumps print whether the kernel has preemption enabled or not. Extend it so a PREEMPT_RT enabled kernel can be identified. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190726212124.699136351@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-31x86: Use CONFIG_PREEMPTIONThomas Gleixner1-1/+1
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the entry code, preempt and kprobes conditionals over to CONFIG_PREEMPTION. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190726212124.608488448@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-30cpuidle-haltpoll: disable host side polling when kvm virtualizedMarcelo Tosatti1-0/+42
When performing guest side polling, it is not necessary to also perform host side polling. So disable host side polling, via the new MSR interface, when loading cpuidle-haltpoll driver. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-07-30add cpuidle-haltpoll driverMarcelo Tosatti1-1/+1
Add a cpuidle driver that calls the architecture default_idle routine. To be used in conjunction with the haltpoll governor. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-07-29Merge drm/drm-next into drm-intel-next-queuedRodrigo Vivi97-1540/+2663
Catching up with 5.3-rc* Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2019-07-28Merge branch master from ↵Thomas Gleixner38-189/+436
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Pick up the spectre documentation so the Grand Schemozzle can be added.
2019-07-28x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGSThomas Gleixner2-30/+32
Intel provided the following information: On all current Atom processors, instructions that use a segment register value (e.g. a load or store) will not speculatively execute before the last writer of that segment retires. Thus they will not use a speculatively written segment value. That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS entry paths can be excluded from the extra LFENCE if PTI is disabled. Create a separate bug flag for the through SWAPGS speculation and mark all out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs are excluded from the whole mitigation mess anyway. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-07-25x86/apic/x2apic: Implement IPI shorthands supportThomas Gleixner3-4/+13
All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain the decision logic for shorthand invocation already and invoke send_IPI_mask() if the prereqisites are not satisfied. Implement shorthand support for x2apic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105221.134696837@linutronix.de
2019-07-25x86/apic/flat64: Remove the IPI shorthand decision logicThomas Gleixner2-50/+6
All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain the decision logic for shorthand invocation already and invoke send_IPI_mask() if the prereqisites are not satisfied. Remove the now redundant decision logic in the APIC code and the duplicate helper in probe_64.c. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105221.042964120@linutronix.de
2019-07-25x86/apic: Share common IPI helpersThomas Gleixner2-18/+18
The 64bit implementations need the same wrappers around __default_send_IPI_shortcut() as 32bit. Move them out of the 32bit section. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.951534451@linutronix.de
2019-07-25x86/apic: Remove the shorthand decision logicThomas Gleixner1-24/+3
All callers of apic->send_IPI_all() and apic->send_IPI_allbutself() contain the decision logic for shorthand invocation already and invoke send_IPI_mask() if the prereqisites are not satisfied. Remove the now redundant decision logic in the 32bit implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.860244707@linutronix.de
2019-07-25x86/smp: Enhance native_send_call_func_ipi()Thomas Gleixner1-13/+11
Nadav noticed that the cpumask allocations in native_send_call_func_ipi() are noticeable in microbenchmarks. Use the new cpumask_or_equal() function to simplify the decision whether the supplied target CPU mask is either equal to cpu_online_mask or equal to cpu_online_mask except for the CPU on which the function is invoked. cpumask_or_equal() or's the target mask and the cpumask of the current CPU together and compares it to cpu_online_mask. If the result is false, use the mask based IPI function, otherwise check whether the current CPU is set in the target mask and invoke either the send_IPI_all() or the send_IPI_allbutselt() APIC callback. Make the shorthand decision also depend on the static key which enables shorthand mode. That allows to remove the extra cpumask comparison with cpu_callout_mask. Reported-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.768238809@linutronix.de
2019-07-25x86/smp: Move smp_function_call implementations into IPI codeThomas Gleixner2-40/+40
Move it where it belongs. That allows to keep all the shorthand logic in one place. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.677835995@linutronix.de
2019-07-25x86/apic: Provide and use helper for send_IPI_allbutself()Thomas Gleixner4-9/+16
To support IPI shorthands wrap invocations of apic->send_IPI_allbutself() in a helper function, so the static key controlling the shorthand mode is only in one place. Fixup all callers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.492691679@linutronix.de
2019-07-25x86/apic: Add static key to Control IPI shorthandsThomas Gleixner3-1/+31
The IPI shorthand functionality delivers IPI/NMI broadcasts to all CPUs in the system. This can have similar side effects as the MCE broadcasting when CPUs are waiting in the BIOS or are offlined. The kernel tracks already the state of offlined CPUs whether they have been brought up at least once so that the CR4 MCE bit is set to make sure that MCE broadcasts can't brick the machine. Utilize that information and compare it to the cpu_present_mask. If all present CPUs have been brought up at least once then the broadcast side effect is mitigated by disabling regular interrupt/IPI delivery in the APIC itself and by the cpu offline check at the begin of the NMI handler. Use a static key to switch between broadcasting via shorthands or sending the IPI/NMI one by one. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.386410643@linutronix.de
2019-07-25x86/apic: Move no_ipi_broadcast() out of 32bitThomas Gleixner3-29/+27
For the upcoming shorthand support for all APIC incarnations the command line option needs to be available for 64 bit as well. While at it, rename the control variable, make it static and mark it __ro_after_init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.278327940@linutronix.de
2019-07-25x86/apic: Add NMI_VECTOR wait to IPI shorthandThomas Gleixner1-1/+4
To support NMI shorthand broadcasts add the safe wait for ICR idle for NMI vector delivery. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.185838026@linutronix.de
2019-07-25x86/apic: Remove dest argument from __default_send_IPI_shortcut()Thomas Gleixner4-14/+11
The SDM states: "The destination shorthand field of the ICR allows the delivery mode to be by-passed in favor of broadcasting the IPI to all the processors on the system bus and/or back to itself (see Section 10.6.1, Interrupt Command Register (ICR)). Three destination shorthands are supported: self, all excluding self, and all including self. The destination mode is ignored when a destination shorthand is used." So there is no point to supply the destination mode to the shorthand delivery function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.094613426@linutronix.de
2019-07-25x86/hotplug: Silence APIC and NMI when CPU is deadThomas Gleixner3-12/+33
In order to support IPI/NMI broadcasting via the shorthand mechanism side effects of shorthands need to be mitigated: Shorthand IPIs and NMIs hit all CPUs including unplugged CPUs Neither of those can be handled on unplugged CPUs for obvious reasons. It would be trivial to just fully disable the APIC via the enable bit in MSR_APICBASE. But that's not possible because clearing that bit on systems based on the 3 wire APIC bus would require a hardware reset to bring it back as the APIC would lose track of bus arbitration. On systems with FSB delivery APICBASE could be disabled, but it has to be guaranteed that no interrupt is sent to the APIC while in that state and it's not clear from the SDM whether it still responds to INIT/SIPI messages. Therefore stay on the safe side and switch the APIC into soft disabled mode so it won't deliver any regular vector to the CPU. NMIs are still propagated to the 'dead' CPUs. To mitigate that add a check for the CPU being offline on early nmi entry and if so bail. Note, this cannot use the stop/restart_nmi() magic which is used in the alternatives code. A dead CPU cannot invoke nmi_enter() or anything else due to RCU and other reasons. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907241723290.1791@nanos.tec.linutronix.de