path: root/arch/x86/kernel
2020-04-25  x86/unwind/orc: Convert global variables to static  (Josh Poimboeuf; 1 file, -5/+5)

These variables aren't used outside of unwind_orc.c, make them static. Also annotate some of them with '__ro_after_init', as applicable.

Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/43ae310bf7822b9862e571f36ae3474cfde8f301.1587808742.git.jpoimboe@redhat.com
2020-04-24  x86/alternatives: Move temporary_mm helpers into C  (Thomas Gleixner; 1 file, -0/+55)

The only user of these inlines is the text poke code and this must not be exposed to the world. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092559.139069561@linutronix.de
2020-04-24  x86/cpu: Uninline CR4 accessors  (Thomas Gleixner; 2 files, -1/+33)

cpu_tlbstate is exported because various TLB-related functions need access to it, but cpu_tlbstate is sensitive information which should only be accessed by well-contained kernel functions and not be directly exposed to modules.

The various CR4 accessors require cpu_tlbstate as the CR4 shadow cache is located there.

In preparation for unexporting cpu_tlbstate, create a builtin function for manipulating CR4 and rework the various helpers to use it.

No functional change.

[ bp: push the export of native_write_cr4() only when CONFIG_LKTDM=m to the last patch in the series. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200421092558.939985695@linutronix.de
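The pattern being consolidated here is a shadow-cached control-register write: readers use a cached copy, and every writer funnels through a single builtin function that updates the cache and the register together. A minimal, self-contained sketch of that pattern (names and values are illustrative, not the actual kernel code):

    /* Illustrative sketch of a shadow-cached CR4 accessor, not kernel code. */
    #include <stdio.h>

    static unsigned long cr4_shadow;            /* cached copy of CR4 */

    static void hw_write_cr4(unsigned long val) /* stand-in for the mov-to-CR4 */
    {
        printf("CR4 <- %#lx\n", val);
    }

    /* Single builtin writer: updates the shadow, then the register. */
    static void write_cr4(unsigned long val)
    {
        cr4_shadow = val;
        hw_write_cr4(val);
    }

    static unsigned long read_cr4(void)            { return cr4_shadow; }
    static void cr4_set_bits(unsigned long mask)   { write_cr4(read_cr4() | mask); }
    static void cr4_clear_bits(unsigned long mask) { write_cr4(read_cr4() & ~mask); }

    int main(void)
    {
        cr4_set_bits(1UL << 7);      /* e.g. a PGE-like bit */
        cr4_clear_bits(1UL << 7);
        return 0;
    }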
2020-04-23  x86, sched: Move check for CPU type to caller function  (Giovanni Gherdovich; 1 file, -4/+2)

Improve readability of the function intel_set_max_freq_ratio() by moving the check for KNL CPUs there, together with checks for GLM and SKX.

Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200416054745.740-5-ggherdovich@suse.cz
2020-04-23  x86, sched: Don't enable static key when starting secondary CPUs  (Peter Zijlstra (Intel); 1 file, -7/+14)

The static key arch_scale_freq_key only needs to be enabled once (at boot). This change fixes a bug by which the key was enabled every time cpu0 is started, even as a secondary CPU during cpu hotplug. Secondary CPUs are started from the idle thread: setting a static key from there means acquiring a lock and may result in sleeping in the idle task, causing CPU lockup.

Another consequence of this change is that init_counter_refs() is now called on each CPU correctly; previously the function on_each_cpu() was used, but it was called at boot when the only online cpu is cpu0.

[ggherdovich@suse.cz: Tested and wrote changelog]

Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200416054745.740-4-ggherdovich@suse.cz
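The shape of the fix, reduced to a standalone sketch (function and variable names are illustrative, not the kernel's): do the one-time, possibly-sleeping work from the boot path only, and do the cheap per-CPU work from the CPU-online callback so it also runs for hotplugged CPUs:

    /* Illustrative sketch of "enable once at boot, init per CPU on online". */
    #include <stdbool.h>
    #include <stdio.h>

    static bool freq_invariance_enabled;        /* stand-in for the static key */

    static void init_counter_refs(int cpu)      /* cheap per-CPU setup */
    {
        printf("init counters on cpu %d\n", cpu);
    }

    /* Called exactly once, from the boot CPU's init path (may sleep). */
    static void bp_init_freq_invariance(void)
    {
        freq_invariance_enabled = true;         /* was wrongly redone per-CPU before */
        init_counter_refs(0);
    }

    /* CPU-online callback: only the per-CPU part, safe from the idle thread. */
    static void ap_init_freq_invariance(int cpu)
    {
        if (freq_invariance_enabled)
            init_counter_refs(cpu);
    }

    int main(void)
    {
        bp_init_freq_invariance();
        ap_init_freq_invariance(1);
        ap_init_freq_invariance(2);
        return 0;
    }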
2020-04-23  x86, sched: Account for CPUs with less than 4 cores in freq. invariance  (Giovanni Gherdovich; 1 file, -3/+8)

If a CPU has less than 4 physical cores, MSR_TURBO_RATIO_LIMIT will rightfully report that the 4C turbo ratio is zero. In such cases, use the 1C turbo ratio instead for frequency invariance calculations.

Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
Reported-by: Like Xu <like.xu@linux.intel.com>
Reported-by: Neil Rickert <nwr10cst-oslnx@yahoo.com>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Link: https://lkml.kernel.org/r/20200416054745.740-3-ggherdovich@suse.cz
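In MSR_TURBO_RATIO_LIMIT each byte holds the maximum turbo ratio for a given number of active cores (bits 7:0 for 1 core, bits 31:24 for 4 cores). A standalone sketch of the fallback logic, assuming that byte layout; the MSR value is passed in rather than read from hardware:

    /* Sketch: prefer the 4-core turbo ratio, fall back to 1-core if it reads 0. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t pick_turbo_ratio(uint64_t turbo_ratio_limit)
    {
        uint64_t ratio_4c = (turbo_ratio_limit >> 24) & 0xff;  /* 4 active cores */
        uint64_t ratio_1c = turbo_ratio_limit & 0xff;          /* 1 active core  */

        return ratio_4c ? ratio_4c : ratio_1c;
    }

    int main(void)
    {
        /* A 2-core part: the 4C field is rightfully zero, the 1C field is 0x2a. */
        printf("ratio = %llu\n",
               (unsigned long long)pick_turbo_ratio(0x2e2a));
        return 0;
    }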
2020-04-23  x86, sched: Bail out of frequency invariance if base frequency is unknown  (Giovanni Gherdovich; 1 file, -0/+9)

Some hypervisors such as VMWare ESXi 5.5 advertise support for X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular, MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base clock frequency of the CPU (highest non-turbo frequency), producing a division by zero when computing the ratio turbo_freq/base_freq necessary for frequency invariant accounting.

It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance couldn't be done in principle (not that it would make a lot of sense in a VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise that feature.

Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200416054745.740-2-ggherdovich@suse.cz
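The guard itself is an early return before the ratio is computed. A standalone sketch with illustrative names (the fixed-point shift is arbitrary here):

    /* Sketch: refuse to set up freq-invariance when base_freq reads as zero. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    static bool set_freq_ratio(uint64_t turbo_freq, uint64_t base_freq,
                               uint64_t *ratio_out)
    {
        if (base_freq == 0)            /* e.g. MSR_PLATFORM_INFO filled with zeroes */
            return false;              /* bail out instead of dividing by zero */

        *ratio_out = (turbo_freq << 10) / base_freq;   /* fixed-point ratio */
        return true;
    }

    int main(void)
    {
        uint64_t r;
        printf("%s\n", set_freq_ratio(4200, 0, &r) ? "ok" : "bailed out");
        return 0;
    }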
2020-04-22  x86/microcode: Fix return value for microcode late loading  (Mihai Carabas; 1 file, -8/+7)

The return value from stop_machine() might not be consistent.

stop_machine_cpuslocked() returns:
- zero if all functions have returned 0.
- a non-zero value if at least one of the functions returned a non-zero value.

There is no way to know if it is negative or positive. So make __reload_late() return 0 on success or negative otherwise.

[ bp: Unify ret val check and touch up. ]

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1587497318-4438-1-git-send-email-mihai.carabas@oracle.com
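The point is to collapse an "unknown-sign, non-zero on failure" result into the usual "0 or negative errno" convention. A standalone sketch of that normalization (stop_machine() is only stubbed by a variable here, and the chosen errno is illustrative):

    /* Sketch: normalize a "non-zero means some CPU failed" result to 0/-errno. */
    #include <errno.h>
    #include <stdio.h>

    static int stop_machine_result;     /* stand-in for stop_machine()'s return */

    static int reload_late(void)
    {
        int ret = stop_machine_result;  /* sign of a failure value is unspecified */

        if (ret)
            return ret < 0 ? ret : -EIO;   /* success -> 0, failure -> negative */
        return 0;
    }

    int main(void)
    {
        stop_machine_result = 3;        /* "positive failure" from one CPU */
        printf("reload_late() = %d\n", reload_late());
        return 0;
    }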
2020-04-22  x86,ftrace: Shrink ftrace_regs_caller() by one byte  (Peter Zijlstra; 1 file, -2/+2)

'Optimize' ftrace_regs_caller. Instead of comparing against an immediate, the more natural way to test for zero on x86 is: 'test %r,%r'.

    48 83 f8 00            cmp    $0x0,%rax
    74 49                  je     226 <ftrace_regs_call+0xa3>

    48 85 c0               test   %rax,%rax
    74 49                  je     225 <ftrace_regs_call+0xa2>

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.867411350@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22  x86,ftrace: Use SIZEOF_PTREGS  (Peter Zijlstra; 1 file, -2/+2)

There's a convenient macro for 'SS+8' called FRAME_SIZE. Use it to clarify things.

(entry/calling.h calls this SIZEOF_PTREGS but we're using asm/ptrace-abi.h)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.808485515@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22  x86,ftrace: Fix ftrace_regs_caller() unwind  (Peter Zijlstra; 2 files, -19/+25)

The ftrace_regs_caller() trampoline does something 'funny' when there is a direct-caller present. In that case it stuffs the 'direct-caller' address on the return stack and then exits the function. This then results in 'returning' to the direct-caller with the exact registers we came in with -- an indirect tail-call without using a register.

This however (rightfully) confuses objtool because the function shares a few instructions in order to have a single exit path, but the stack layout is different for them, depending through which path we came there.

This is currently kludged by forcing the stack state to the non-direct case, but this generates actively wrong (ORC) unwind information for the direct case, leading to potential broken unwinds.

Fix this issue by fully separating the exit paths. This results in having to poke a second RET into the trampoline copy, see ftrace_regs_caller_ret.

This brings us to a second objtool problem, in order for it to perceive the 'jmp ftrace_epilogue' as a function exit, it needs to be recognised as a tail call. In order to make that happen, ftrace_epilogue needs to be the start of an STT_FUNC, so re-arrange code to make this so.

Finally, a third issue is that objtool requires functions to exit with the same stack layout they started with, which is obviously violated in the direct case, employ the new HINT_RET_OFFSET to tell objtool this is an expected exception.

Together, this results in generating correct ORC unwind information for the ftrace_regs_caller() function and its trampoline copies.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.749606694@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-20  x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation  (Mark Gross; 3 files, -0/+138)

SRBDS is an MDS-like speculative side channel that can leak bits from the random number generator (RNG) across cores and threads. New microcode serializes the processor access during the execution of RDRAND and RDSEED. This ensures that the shared buffer is overwritten before it is released for reuse.

While it is present on all affected CPU models, the microcode mitigation is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the cases where TSX is not supported or has been disabled with TSX_CTRL.

The mitigation is activated by default on affected processors and it increases latency for RDRAND and RDSEED instructions. Among other effects this will reduce throughput from /dev/urandom.

* Enable administrator to configure the mitigation off when desired using either mitigations=off or srbds=off.
* Export vulnerability status via sysfs
* Rename file-scoped macros to apply for non-whitelist table initializations.

[ bp: Massage,
  - s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g,
  - do not read arch cap MSR a second time in tsx_fused_off() - just pass it in,
  - flip check in cpu_set_bug_bits() to save an indentation level,
  - reflow comments.
  jpoimboe: s/Mitigated/Mitigation/ in user-visible strings
  tglx: Dropped the fused off magic for now ]

Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
2020-04-20  x86/cpu: Add a steppings field to struct x86_cpu_id  (Mark Gross; 1 file, -1/+6)

Intel uses the same family/model for several CPUs. Sometimes the stepping must be checked to tell them apart. On x86 there can be at most 16 steppings.

Add a steppings bitmask to x86_cpu_id and an X86_MATCH_VENDOR_FAMILY_MODEL_STEPPING_FEATURE macro and support for matching against family/model/stepping.

[ bp: Massage. ]

Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
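The matching itself is just a bitmask test against the CPU's stepping (at most 16 steppings, so a 16-bit mask is enough). A self-contained sketch of that check, with simplified structures standing in for the real x86_cpu_id table:

    /* Sketch: match a CPU against a table entry that carries a steppings mask. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct cpu_id_entry {
        uint8_t  family, model;
        uint16_t steppings;       /* bit N set => stepping N matches */
    };

    static bool entry_matches(const struct cpu_id_entry *e,
                              uint8_t family, uint8_t model, uint8_t stepping)
    {
        return e->family == family &&
               e->model == model &&
               (e->steppings & (1u << stepping));
    }

    int main(void)
    {
        /* e.g. family 6, model 0x8e, steppings mask 0xC => steppings 2 and 3 only */
        struct cpu_id_entry e = { 6, 0x8e, 0x000c };
        printf("stepping 3: %d, stepping 9: %d\n",
               entry_matches(&e, 6, 0x8e, 3), entry_matches(&e, 6, 0x8e, 9));
        return 0;
    }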
2020-04-20  x86/cpu: Add 'table' argument to cpu_matches()  (Mark Gross; 1 file, -11/+14)

To make cpu_matches() reusable for other matching tables, have it take a pointer to an x86_cpu_id table as an argument.

[ bp: Flip arguments order. ]

Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-04-19  Merge tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 5 files, -20/+55)

Pull x86 and objtool fixes from Thomas Gleixner:
 "A set of fixes for x86 and objtool:

  objtool:

   - Ignore the double UD2 which is emitted in BUG() when CONFIG_UBSAN_TRAP is enabled.

   - Support clang non-section symbols in objtool ORC dump

   - Fix switch table detection in .text.unlikely

   - Make the BP scratch register warning more robust.

  x86:

   - Increase microcode maximum patch size for AMD to cope with new CPUs which have a larger patch size.

   - Fix a crash in the resource control filesystem when the removal of the default resource group is attempted.

   - Preserve Code and Data Prioritization enabled state across CPU hotplug.

   - Update split lock cpu matching to use the new X86_MATCH macros.

   - Change the split lock enumeration as Intel finally decided that the IA32_CORE_CAPABILITIES bits are not architectural contrary to what the SDM claims. !@#%$^!

   - Add Tremont CPU models to the split lock detection cpu match.

   - Add a missing static attribute to make sparse happy"

* tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/split_lock: Add Tremont family CPU models
  x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architectural
  x86/resctrl: Preserve CDP enable over CPU hotplug
  x86/resctrl: Fix invalid attempt at removing the default resource group
  x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL()
  x86/umip: Make umip_insns static
  x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
  objtool: Make BP scratch register warning more robust
  objtool: Fix switch table detection in .text.unlikely
  objtool: Support Clang non-section symbols in ORC generation
  objtool: Support Clang non-section symbols in ORC dump
  objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings
2020-04-18  x86/split_lock: Add Tremont family CPU models  (Tony Luck; 1 file, -0/+3)

Tremont CPUs support IA32_CORE_CAPABILITIES bits to indicate whether specific SKUs have support for split lock detection.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200416205754.21177-4-tony.luck@intel.com
2020-04-18  x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architectural  (Tony Luck; 1 file, -14/+31)

The Intel Software Developers' Manual erroneously listed bit 5 of the IA32_CORE_CAPABILITIES register as an architectural feature. It is not.

Features enumerated by IA32_CORE_CAPABILITIES are model specific and implementation details may vary in different cpu models. Thus it is only safe to trust features after checking the CPU model.

Icelake client and server models are known to implement the split lock detect feature even though they don't enumerate IA32_CORE_CAPABILITIES.

[ tglx: Use switch() for readability and massage comments ]

Fixes: 6650cdd9a8cc ("x86/split_lock: Enable split lock detection by kernel")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200416205754.21177-3-tony.luck@intel.com
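Because the bits are model-specific, the enumeration turns into a per-model switch: trust IA32_CORE_CAPABILITIES only on models known to implement it as documented, and force the feature on for models (like Icelake) known to have it without enumerating it. A standalone sketch of that shape, with made-up model constants:

    /* Sketch: model-gated feature detection instead of trusting an MSR bit. */
    #include <stdbool.h>
    #include <stdio.h>

    enum cpu_model { MODEL_TREMONT = 1, MODEL_ICELAKE, MODEL_OTHER };

    static bool has_split_lock_detect(enum cpu_model model, bool core_caps_bit)
    {
        switch (model) {
        case MODEL_TREMONT:              /* enumerates the bit correctly */
            return core_caps_bit;
        case MODEL_ICELAKE:              /* has the feature, doesn't enumerate it */
            return true;
        default:                         /* bit is not architectural: ignore it */
            return false;
        }
    }

    int main(void)
    {
        printf("%d %d %d\n",
               has_split_lock_detect(MODEL_TREMONT, true),
               has_split_lock_detect(MODEL_ICELAKE, false),
               has_split_lock_detect(MODEL_OTHER, true));
        return 0;
    }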
2020-04-17  x86/early_printk: Remove unused includes  (Andy Shevchenko; 1 file, -3/+0)

After 1bd187de5364 ("x86, intel-mid: remove Intel MID specific serial support") the Intel MID header is not needed anymore.

After 69c1f396f25b ("efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation") the EFI headers are not needed anymore.

Remove the respective includes.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200326175415.8618-1-andriy.shevchenko@linux.intel.com
2020-04-17  x86/resctrl: Preserve CDP enable over CPU hotplug  (James Morse; 3 files, -0/+16)

Resctrl assumes that all CPUs are online when the filesystem is mounted, and that CPUs remember their CDP-enabled state over CPU hotplug.

This goes wrong when resctrl's CDP-enabled state changes while all the CPUs in a domain are offline.

When a domain comes online, enable (or disable!) CDP to match resctrl's current setting.

Fixes: 5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200221162105.154163-1-james.morse@arm.com
2020-04-17  x86/resctrl: Fix invalid attempt at removing the default resource group  (Reinette Chatre; 1 file, -1/+2)

The default resource group ("rdtgroup_default") is associated with the root of the resctrl filesystem and should never be removed. New resource groups can be created as subdirectories of the resctrl filesystem and they can be removed from user space.

There exists a safeguard in the directory removal code (rdtgroup_rmdir()) that ensures that only subdirectories can be removed by testing that the directory to be removed has to be a child of the root directory.

A possible deadlock was recently fixed with 334b0f4e9b1b ("x86/resctrl: Fix a deadlock due to inaccurate reference"). This fix involved associating the private data of the "mon_groups" and "mon_data" directories to the resource group to which they belong instead of NULL as before. A consequence of this change was that the original safeguard code preventing removal of "mon_groups" and "mon_data" found in the root directory failed resulting in attempts to remove the default resource group that ends in a BUG:

  kernel BUG at mm/slub.c:3969!
  invalid opcode: 0000 [#1] SMP PTI

  Call Trace:
   rdtgroup_rmdir+0x16b/0x2c0
   kernfs_iop_rmdir+0x5c/0x90
   vfs_rmdir+0x7a/0x160
   do_rmdir+0x17d/0x1e0
   do_syscall_64+0x55/0x1d0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by improving the directory removal safeguard to ensure that subdirectories of the resctrl root directory can only be removed if they are a child of the resctrl filesystem's root _and_ not associated with the default resource group.

Fixes: 334b0f4e9b1b ("x86/resctrl: Fix a deadlock due to inaccurate reference")
Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/884cbe1773496b5dbec1b6bd11bb50cffa83603d.1584461853.git.reinette.chatre@intel.com
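The improved safeguard boils down to two conditions on the directory being removed: its parent must be the resctrl root, and the resource group backing it must not be the default group. A reduced sketch of that check (simplified types, illustrative names, not the kernfs/resctrl code):

    /* Sketch: only allow rmdir of root-level dirs not backed by the default group. */
    #include <stdbool.h>
    #include <stdio.h>

    struct rgroup { const char *name; };
    struct kn { struct kn *parent; struct rgroup *priv; };

    static struct rgroup default_group = { "default" };
    static struct kn root = { .parent = NULL, .priv = &default_group };

    static bool may_rmdir(const struct kn *dir)
    {
        if (dir->parent != &root)          /* must be a direct child of the root */
            return false;
        if (dir->priv == &default_group)   /* covers "mon_groups"/"mon_data", whose
                                              private data now points at the default
                                              resource group */
            return false;
        return true;
    }

    int main(void)
    {
        struct rgroup g = { "grp1" };
        struct kn mon_data = { &root, &default_group }, grp1 = { &root, &g };
        printf("mon_data: %d, grp1: %d\n", may_rmdir(&mon_data), may_rmdir(&grp1));
        return 0;
    }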
2020-04-17  x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL()  (Tony Luck; 1 file, -4/+2)

The SPLIT_LOCK_CPU() macro escaped the tree-wide sweep for old-style initialization. Update to use X86_MATCH_INTEL_FAM6_MODEL().

Fixes: 6650cdd9a8cc ("x86/split_lock: Enable split lock detection by kernel")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200416205754.21177-2-tony.luck@intel.com
2020-04-15  crash_dump: Remove no longer used saved_max_pfn  (Kairui Song; 1 file, -8/+0)

saved_max_pfn was originally introduced in commit 92aa63a5a1bf ("[PATCH] kdump: Retrieve saved max pfn")

It used to make sure that the user does not try to read the physical memory beyond saved_max_pfn. But since commit 921d58c0e699 ("vmcore: remove saved_max_pfn check") it's no longer used for the check. This variable doesn't have any users anymore so just remove it.

[ bp: Drop the Calgary IOMMU reference from the commit message. ]

Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://lkml.kernel.org/r/20200330181544.1595733-1-kasong@redhat.com
2020-04-15  x86/umip: Make umip_insns static  (Jason Yan; 1 file, -1/+1)

Fix the following sparse warning:

  arch/x86/kernel/umip.c:84:12: warning: symbol 'umip_insns' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lkml.kernel.org/r/20200413082213.22934-1-yanaijie@huawei.com
2020-04-14  Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux  (Linus Torvalds; 1 file, -2/+12)

Pull hyperv fixes from Wei Liu:

 - a series from Tianyu Lan to fix crash reporting on Hyper-V

 - three miscellaneous cleanup patches

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/Hyper-V: Report crash data in die() when panic_on_oops is set
  x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set
  x86/Hyper-V: Report crash register data or kmsg before running crash kernel
  x86/Hyper-V: Trigger crash enlightenment only once during system crash.
  x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump
  x86/Hyper-V: Unload vmbus channel in hv panic callback
  x86: hyperv: report value of misc_features
  hv_debugfs: Make hv_debug_root static
  hv: hyperv_vmbus.h: Replace zero-length array with flexible-array member
2020-04-14  x86/mce: Fixup exception only for the correct MCEs  (Borislav Petkov; 2 files, -3/+18)

The severity grading code returns IN_KERNEL_RECOV error context for errors which have happened in kernel space but from which the kernel can recover. Whether the recovery can happen is determined by the exception table entry having as handler ex_handler_fault() and which has been declared at build time using _ASM_EXTABLE_FAULT().

IN_KERNEL_RECOV is used in mce_severity_intel() to lookup the corresponding error severity in the severities table.

However, the mapping back from error severity to whether the error is IN_KERNEL_RECOV is ambiguous and in the very paranoid case - which might not be possible right now - but be better safe than sorry later, an exception fixup could be attempted for another MCE whose address is in the exception table and has the proper severity. Which would be unfortunate, to say the least.

Therefore, mark such MCEs explicitly as MCE_IN_KERNEL_RECOV so that the recovery attempt is done only for them.

Document the whole handling, while at it, as it is not trivial.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200407163414.18058-10-bp@alien8.de
2020-04-14  x86/mce: Add mce=print_all option  (Tony Luck; 2 files, -1/+7)

Sometimes, when logs are getting lost, it's nice to just have everything dumped to the serial console.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200214222720.13168-7-tony.luck@intel.com
2020-04-14  x86/mce: Change default MCE logger to check mce->kflags  (Tony Luck; 1 file, -16/+3)

Instead of keeping count of how many handlers are registered on the MCE notifier chain and printing if below some magic value, look at mce->kflags to see if anyone claims to have handled/logged this error.

[ bp: Do not print ->kflags in __print_mce(). ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200214222720.13168-6-tony.luck@intel.com
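The resulting pattern on the notifier chain (see also the following commit): each handler that consumed the error sets a bit in mce->kflags, and the default logger at the end prints only when no bits are set. A standalone sketch of that "did anyone claim it?" check, with illustrative bit names:

    /* Sketch: print an MCE only if no earlier notifier claimed it via kflags. */
    #include <stdio.h>

    #define MCE_HANDLED_EDAC  (1u << 0)    /* illustrative "handled by" bits */
    #define MCE_HANDLED_CEC   (1u << 1)

    struct mce_event { unsigned int kflags; };

    static void edac_notifier(struct mce_event *m)
    {
        /* ... decode and log the error ... */
        m->kflags |= MCE_HANDLED_EDAC;     /* claim it; no NOTIFY_STOP games */
    }

    static void default_logger(const struct mce_event *m)
    {
        if (m->kflags)                     /* someone already logged/handled it */
            return;
        printf("unhandled MCE, printing it here\n");
    }

    int main(void)
    {
        struct mce_event m = { 0 };
        edac_notifier(&m);
        default_logger(&m);                /* prints nothing: EDAC claimed it */
        return 0;
    }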
2020-04-14  x86/mce: Fix all mce notifiers to update the mce->kflags bitmask  (Tony Luck; 2 files, -1/+8)

If the handler took any action to log or deal with the error, set a bit in mce->kflags so that the default handler on the end of the machine check chain can see what has been done.

Get rid of NOTIFY_STOP returns. Make the EDAC and dev-mcelog handlers skip over errors already processed by CEC.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200214222720.13168-5-tony.luck@intel.com
2020-04-14  x86/mce: Convert the CEC to use the MCE notifier  (Tony Luck; 1 file, -19/+0)

The CEC code has its claws in a couple of routines in mce/core.c. Convert it to just register itself on the normal MCE notifier chain.

[ bp: Make cec_add_elem() and cec_init() static. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200214222720.13168-3-tony.luck@intel.com
2020-04-14  x86/mce: Rename "first" function as "early"  (Tony Luck; 1 file, -5/+5)

It isn't going to be first on the notifier chain when the CEC is moved to be a normal user of the notifier chain.

Fix the enum for the MCE_PRIO symbols to list them in reverse order so that the compiler can give them numbers from low to high priority. Add an entry for MCE_PRIO_CEC as the highest priority.

[ bp: Use passive voice, add comments. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200214222720.13168-2-tony.luck@intel.com
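Listing the enum in reverse order is just a trick to let the compiler hand out increasing values, so a numerically higher priority means "runs earlier", with the new CEC entry on top. A tiny sketch of that ordering (names and values here are illustrative, not the kernel's exact list):

    /* Sketch: declare priorities lowest-first so the enum counts upward. */
    #include <stdio.h>

    enum mce_notifier_prio {
        MCE_PRIO_LOWEST,      /* 0: default logger, runs last            */
        MCE_PRIO_EARLY,       /* 1: the handler formerly called "first"  */
        MCE_PRIO_CEC,         /* 2: highest priority, runs first         */
    };

    int main(void)
    {
        printf("early=%d cec=%d\n", MCE_PRIO_EARLY, MCE_PRIO_CEC);
        return 0;
    }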
2020-04-14  x86/mce/amd, edac: Remove report_gart_errors  (Borislav Petkov; 1 file, -2/+7)

... because no one should be interested in spurious MCEs anyway. Make the filtering unconditional and move it to amd_filter_mce().

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200407163414.18058-2-bp@alien8.de
2020-04-14  x86/mce/amd: Make threshold bank setting hotplug robust  (Thomas Gleixner; 1 file, -3/+11)

Handle the cases when the CPU goes offline before the bank setting/reading happens.

[ bp: Write commit message. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200403161943.1458-8-bp@alien8.de
2020-04-14  x86/mce/amd: Cleanup threshold device remove path  (Thomas Gleixner; 1 file, -42/+37)

Pass in the bank pointer directly to the cleaning up functions, obviating the need for per-CPU accesses.

Make the clean up path interrupt-safe by cleaning the bank pointer first so that the rest of the teardown happens safe from the thresholding interrupt.

No functional changes.

[ bp: Write commit message and reverse bank->shared test to save an indentation level in threshold_remove_bank(). ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200403161943.1458-7-bp@alien8.de
2020-04-14  x86/mce/amd: Straighten CPU hotplug path  (Thomas Gleixner; 1 file, -17/+15)

mce_threshold_create_device() hotplug callback runs on the plugged in CPU so:

 - use this_cpu_read() which is faster

 - pass in struct threshold_bank **bp to threshold_create_bank() and instead of doing per-CPU accesses

 - Use rdmsr_safe() instead of rdmsr_safe_on_cpu() which avoids an IPI.

No functional changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200403161943.1458-6-bp@alien8.de
2020-04-14  x86/mce/amd: Sanitize thresholding device creation hotplug path  (Thomas Gleixner; 2 files, -41/+27)

Drop the stupid threshold_init_device() initcall iterating over all online CPUs in favor of properly setting up everything on the CPU hotplug path, when each CPU's callback is invoked.

[ bp: Write commit message. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200403161943.1458-5-bp@alien8.de
2020-04-14  x86/mce/amd: Protect a not-fully initialized bank from the thresholding interrupt  (Thomas Gleixner; 1 file, -2/+17)

Make sure the thresholding bank descriptor is fully initialized when the thresholding interrupt fires after a hotplug event.

[ bp: Write commit message and document long-forgotten bank_map. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200403161943.1458-4-bp@alien8.de
2020-04-14  x86/mce/amd: Init thresholding machinery only on relevant vendors  (Thomas Gleixner; 3 files, -5/+17)

... and not unconditionally.

[ bp: Add a new vendor_flags bit for that. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200403161943.1458-3-bp@alien8.de
2020-04-14  x86/mce/amd: Do proper cleanup on error paths  (Thomas Gleixner; 1 file, -7/+8)

Drop kobject reference counts properly on error in the banks and blocks allocation functions.

[ bp: Write commit message. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200403161943.1458-2-bp@alien8.de
2020-04-14  x86/32: Remove CONFIG_DOUBLEFAULT  (Borislav Petkov; 3 files, -9/+1)

Make the doublefault exception handler unconditional on 32-bit. Yes, it is important to be able to catch #DF exceptions instead of silent reboots. Yes, the code size increase is worth every byte. And one less CONFIG symbol is just the cherry on top.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200404083646.8897-1-bp@alien8.de
2020-04-13  x86/smpboot: Remove the last ICPU() macro  (Borislav Petkov; 1 file, -8/+9)

Now all is using the shiny new macros. No code changed:

  # arch/x86/kernel/smpboot.o:

   text    data     bss     dec     hex filename
  16432    2649      40   19121    4ab1 smpboot.o.before
  16432    2649      40   19121    4ab1 smpboot.o.after

  md5:
   a58104003b72c1de533095bc5a4c30a9  smpboot.o.before.asm
   a58104003b72c1de533095bc5a4c30a9  smpboot.o.after.asm

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200324185836.GI22931@zn.tnic
2020-04-13  Merge tag 'v5.7-rc1' into locking/kcsan, to resolve conflicts and refresh  (Ingo Molnar; 58 files, -558/+1692)

Resolve these conflicts:

  arch/x86/Kconfig
  arch/x86/kernel/Makefile

Do a minor "evil merge" to move the KCSAN entry up a bit by a few lines in the Kconfig to reduce the probability of future conflicts.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-11  x86/Hyper-V: Report crash register data or kmsg before running crash kernel  (Tianyu Lan; 1 file, -0/+10)

We want to notify Hyper-V when a Linux guest VM crash occurs, so there is a record of the crash even when kdump is enabled. But crash_kexec_post_notifiers defaults to "false", so the kdump kernel runs before the notifiers and Hyper-V never gets notified. Fix this by always setting crash_kexec_post_notifiers to be true for Hyper-V VMs.

Fixes: 81b18bce48af ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic")
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20200406155331.2105-5-Tianyu.Lan@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-04-11  x86/split_lock: Provide handle_guest_split_lock()  (Thomas Gleixner; 1 file, -5/+28)

Without at least minimal handling for split lock detection induced #AC, VMX will just run into the same problem as the VMWare hypervisor, which was reported by Kenneth. It will inject the #AC blindly into the guest whether the guest is prepared or not.

Provide a function for guest mode which acts depending on the host SLD mode. If mode == sld_warn, treat it like user space, i.e. emit a warning, disable SLD and mark the task accordingly. Otherwise force SIGBUS.

[ bp: Add a !CPU_SUP_INTEL stub for handle_guest_split_lock(). ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200410115516.978037132@linutronix.de
Link: https://lkml.kernel.org/r/20200402123258.895628824@linutronix.de
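The guest-mode handler is essentially a decision on the host's SLD state. A reduced sketch of that decision (state names and return values are illustrative; the real function also disables SLD and marks the task when it warns):

    /* Sketch: act on a guest-triggered split-lock #AC based on host SLD mode. */
    #include <stdbool.h>
    #include <stdio.h>

    enum sld_mode { SLD_OFF, SLD_WARN, SLD_FATAL };

    /* true  -> #AC was handled (warn, disable SLD for this task)
     * false -> caller should force SIGBUS */
    static bool handle_guest_split_lock(enum sld_mode mode, unsigned long ip)
    {
        switch (mode) {
        case SLD_WARN:
            printf("split lock in guest at ip %#lx, disabling detection\n", ip);
            return true;          /* treated like a user-space offender */
        case SLD_FATAL:
        case SLD_OFF:             /* shouldn't trigger, but be safe */
        default:
            return false;         /* force SIGBUS */
        }
    }

    int main(void)
    {
        printf("%d %d\n",
               handle_guest_split_lock(SLD_WARN, 0x1000),
               handle_guest_split_lock(SLD_FATAL, 0x1000));
        return 0;
    }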
2020-04-11  Merge branch 'akpm' (patches from Andrew)  (Linus Torvalds; 2 files, -1/+6)

Merge yet more updates from Andrew Morton:

 - Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc, gup, hugetlb, pagemap, memremap)

 - Various other things (hfs, ocfs2, kmod, misc, seqfile)

* akpm: (34 commits)
  ipc/util.c: sysvipc_find_ipc() should increase position index
  kernel/gcov/fs.c: gcov_seq_next() should increase position index
  fs/seq_file.c: seq_read(): add info message about buggy .next functions
  drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings
  change email address for Pali Rohár
  selftests: kmod: test disabling module autoloading
  selftests: kmod: fix handling test numbers above 9
  docs: admin-guide: document the kernel.modprobe sysctl
  fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
  kmod: make request_module() return an error when autoloading is disabled
  mm/memremap: set caching mode for PCI P2PDMA memory to WC
  mm/memory_hotplug: add pgprot_t to mhp_params
  powerpc/mm: thread pgprot_t through create_section_mapping()
  x86/mm: introduce __set_memory_prot()
  x86/mm: thread pgprot_t through init_memory_mapping()
  mm/memory_hotplug: rename mhp_restrictions to mhp_params
  mm/memory_hotplug: drop the flags field from struct mhp_restrictions
  mm/special: create generic fallbacks for pte_special() and pte_mkspecial()
  mm/vma: introduce VM_ACCESS_FLAGS
  mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
  ...
2020-04-11  x86/mm: thread pgprot_t through init_memory_mapping()  (Logan Gunthorpe; 1 file, -1/+2)

In preparation to support a pgprot_t argument for arch_add_memory().

It's required to move the prototype of init_memory_mapping() seeing the original location came before the definition of pgprot_t.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-4-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
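"Threading" the pgprot_t just means every caller on the path passes the protection value down explicitly instead of it being hard-coded at the bottom. A tiny standalone sketch of that kind of signature change (types, names and the protection value are simplified stand-ins, not the real mm code):

    /* Sketch: pass page-protection flags down the call chain explicitly. */
    #include <stdio.h>

    typedef struct { unsigned long val; } pgprot_t;    /* simplified stand-in */
    #define PAGE_KERNEL ((pgprot_t){ 0x163 })           /* illustrative value  */

    /* before: init_memory_mapping(start, end) with PAGE_KERNEL assumed inside */
    static unsigned long init_memory_mapping(unsigned long start,
                                             unsigned long end, pgprot_t prot)
    {
        printf("map %#lx-%#lx with prot %#lx\n", start, end, prot.val);
        return end;
    }

    static void arch_add_memory(unsigned long start, unsigned long size,
                                pgprot_t prot)
    {
        init_memory_mapping(start, start + size, prot);   /* just forwards it */
    }

    int main(void)
    {
        arch_add_memory(0x100000UL, 0x200000UL, PAGE_KERNEL);
        return 0;
    }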
2020-04-11  mm: hugetlb: optionally allocate gigantic hugepages using cma  (Roman Gushchin; 1 file, -0/+4)

Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime") has added the run-time allocation of gigantic pages. However it actually works only at early stages of the system loading, when the majority of memory is free. After some time the memory gets fragmented by non-movable pages, so the chances to find a contiguous 1GB block are getting close to zero. Even dropping caches manually doesn't help a lot.

At large scale rebooting servers in order to allocate gigantic hugepages is quite expensive and complex. At the same time keeping some constant percentage of memory in reserved hugepages even if the workload isn't using it is a big waste: not all workloads can benefit from using 1 GB pages.

The following solution can solve the problem:

1) On boot time a dedicated cma area* is reserved. The size is passed as a kernel argument.
2) Run-time allocations of gigantic hugepages are performed using the cma allocator and the dedicated cma area.

In this case gigantic hugepages can be allocated successfully with a high probability, however the memory isn't completely wasted if nobody is using 1GB hugepages: it can be used for pagecache, anon memory, THPs, etc.

* On a multi-node machine a per-node cma area is allocated on each node. Following gigantic hugetlb allocation are using the first available numa node if the mask isn't specified by a user.

Usage:

1) configure the kernel to allocate a cma area for hugetlb allocations: pass hugetlb_cma=10G as a kernel argument
2) allocate hugetlb pages as usual, e.g. echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

If the option isn't enabled or the allocation of the cma area failed, the current behavior of the system is preserved.

x86 and arm-64 are covered by this patch, other architectures can be trivially added later.

The patch contains clean-ups and fixes proposed and implemented by Aslan Bakirov and Randy Dunlap. It also contains ideas and suggestions proposed by Rik van Riel, Michal Hocko and Mike Kravetz. Thanks!

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Andreas Schaufler <andreas.schaufler@gmx.de>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Aslan Bakirov <aslan@fb.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Link: http://lkml.kernel.org/r/20200407163840.92263-3-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10  Merge branches 'acpi-ec' and 'acpi-x86'  (Rafael J. Wysocki; 1 file, -1/+1)

* acpi-ec:
  ACPI: EC: Fix up fast path check in acpi_ec_add()

* acpi-x86:
  ACPI, x86/boot: make acpi_nobgrt static
2020-04-09  x86: hyperv: report value of misc_features  (Olaf Hering; 1 file, -2/+2)

A few kernel features depend on ms_hyperv.misc_features, but unlike its siblings ->features and ->hints, the value was never reported during boot.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Link: https://lore.kernel.org/r/20200407172739.31371-1-olaf@aepfle.de
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-04-08  ACPI, x86/boot: make acpi_nobgrt static  (Jason Yan; 1 file, -1/+1)

Fix the following sparse warning:

  arch/x86/kernel/acpi/boot.c:48:5: warning: symbol 'acpi_nobgrt' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-04-06  Merge tag 'acpi-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm  (Linus Torvalds; 1 file, -1/+2)

Pull more ACPI updates from Rafael Wysocki:
 "Additional ACPI updates.

  These update the ACPICA code in the kernel to the 20200326 upstream revision, fix an ACPI-related CPU hotplug deadlock on x86, update Intel Tiger Lake device IDs in some places, add a new ACPI backlight blacklist entry, update the "acpi_backlight" kernel command line switch documentation and clean up a CPPC library routine.

  Specifics:

   - Update the ACPICA code in the kernel to upstream revision 20200326 including:
       * Fix for a typo in a comment field (Bob Moore)
       * acpiExec namespace init file fixes (Bob Moore)
       * Addition of NHLT to the known tables list (Cezary Rojewski)
       * Conversion of PlatformCommChannel ASL keyword to PCC (Erik Kaneda)
       * acpiexec cleanup (Erik Kaneda)
       * WSMT-related typo fix (Erik Kaneda)
       * sprintf() utility function fix (John Levon)
       * IVRS IVHD type 11h parsing implementation (Michał Żygowski)
       * IVRS IVHD type 10h reserved field name fix (Michał Żygowski)

   - Fix ACPI-related CPU hotplug deadlock on x86 (Qian Cai)

   - Fix Intel Tiger Lake ACPI device IDs in several places (Gayatri Kammela)

   - Add ACPI backlight blacklist entry for Acer Aspire 5783z (Hans de Goede)

   - Fix documentation of the "acpi_backlight" kernel command line switch (Randy Dunlap)

   - Clean up the acpi_get_psd_map() CPPC library routine (Liguang Zhang)"

* tag 'acpi-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  x86: ACPI: fix CPU hotplug deadlock
  thermal: int340x_thermal: fix: Update Tiger Lake ACPI device IDs
  platform/x86: intel-hid: fix: Update Tiger Lake ACPI device ID
  ACPI: Update Tiger Lake ACPI device IDs
  ACPI: video: Use native backlight on Acer Aspire 5783z
  ACPI: video: Docs update for "acpi_backlight" kernel parameter options
  ACPICA: Update version 20200326
  ACPICA: Fixes for acpiExec namespace init file
  ACPICA: Add NHLT table signature
  ACPICA: WSMT: Fix typo, no functional change
  ACPICA: utilities: fix sprintf()
  ACPICA: acpiexec: remove redeclaration of acpi_gbl_db_opt_no_region_support
  ACPICA: Change PlatformCommChannel ASL keyword to PCC
  ACPICA: Fix IVRS IVHD type 10h reserved field name
  ACPICA: Implement IVRS IVHD type 11h parsing
  ACPICA: Fix a typo in a comment field
  ACPI: CPPC: clean up acpi_get_psd_map()