summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-04-26x86/cpu: Export native_write_cr4() only when CONFIG_LKTDM=mThomas Gleixner1-0/+2
Modules have no business poking into this but fixing this is for later. [ bp: Carve out from an earlier patch. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200421092558.939985695@linutronix.de
2020-04-26x86/tlb: Move __flush_tlb_one_user() out of lineThomas Gleixner1-5/+0
cpu_tlbstate is exported because various TLB-related functions need access to it, but cpu_tlbstate is sensitive information which should only be accessed by well-contained kernel functions and not be directly exposed to modules. As a third step, move _flush_tlb_one_user() out of line and hide the native function. The latter can be static when CONFIG_PARAVIRT is disabled. Consolidate the name space while at it and remove the pointless extra wrapper in the paravirt code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200421092559.428213098@linutronix.de
2020-04-26x86/tlb: Move __flush_tlb_global() out of lineThomas Gleixner1-9/+0
cpu_tlbstate is exported because various TLB-related functions need access to it, but cpu_tlbstate is sensitive information which should only be accessed by well-contained kernel functions and not be directly exposed to modules. As a second step, move __flush_tlb_global() out of line and hide the native function. The latter can be static when CONFIG_PARAVIRT is disabled. Consolidate the namespace while at it and remove the pointless extra wrapper in the paravirt code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200421092559.336916818@linutronix.de
2020-04-26x86/tlb: Move __flush_tlb() out of lineThomas Gleixner2-8/+3
cpu_tlbstate is exported because various TLB-related functions need access to it, but cpu_tlbstate is sensitive information which should only be accessed by well-contained kernel functions and not be directly exposed to modules. As a first step, move __flush_tlb() out of line and hide the native function. The latter can be static when CONFIG_PARAVIRT is disabled. Consolidate the namespace while at it and remove the pointless extra wrapper in the paravirt code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200421092559.246130908@linutronix.de
2020-04-25x86/unwind/orc: Fix premature unwind stoppage due to IRET framesJosh Poimboeuf1-11/+40
The following execution path is possible: fsnotify() [ realign the stack and store previous SP in R10 ] <IRQ> [ only IRET regs saved ] common_interrupt() interrupt_entry() <NMI> [ full pt_regs saved ] ... [ unwind stack ] When the unwinder goes through the NMI and the IRQ on the stack, and then sees fsnotify(), it doesn't have access to the value of R10, because it only has the five IRET registers. So the unwind stops prematurely. However, because the interrupt_entry() code is careful not to clobber R10 before saving the full regs, the unwinder should be able to read R10 from the previously saved full pt_regs associated with the NMI. Handle this case properly. When encountering an IRET regs frame immediately after a full pt_regs frame, use the pt_regs as a backup which can be used to get the C register values. Also, note that a call frame resets the 'prev_regs' value, because a function is free to clobber the registers. For this fix to work, the IRET and full regs frames must be adjacent, with no FUNC frames in between. So replace the FUNC hint in interrupt_entry() with an IRET_REGS hint. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com
2020-04-25x86/unwind/orc: Fix error path for bad ORC entry typeJosh Poimboeuf1-1/+1
If the ORC entry type is unknown, nothing else can be done other than reporting an error. Exit the function instead of breaking out of the switch statement. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/a7fa668ca6eabbe81ab18b2424f15adbbfdc810a.1587808742.git.jpoimboe@redhat.com
2020-04-25x86/unwind/orc: Prevent unwinding before ORC initializationJosh Poimboeuf1-3/+3
If the unwinder is called before the ORC data has been initialized, orc_find() returns NULL, and it tries to fall back to using frame pointers. This can cause some unexpected warnings during boot. Move the 'orc_init' check from orc_find() to __unwind_init(), so that it doesn't even try to unwind from an uninitialized state. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/069d1499ad606d85532eb32ce39b2441679667d5.1587808742.git.jpoimboe@redhat.com
2020-04-25x86/unwind/orc: Don't skip the first frame for inactive tasksMiroslav Benes1-1/+1
When unwinding an inactive task, the ORC unwinder skips the first frame by default. If both the 'regs' and 'first_frame' parameters of unwind_start() are NULL, 'state->sp' and 'first_frame' are later initialized to the same value for an inactive task. Given there is a "less than or equal to" comparison used at the end of __unwind_start() for skipping stack frames, the first frame is skipped. Drop the equal part of the comparison and make the behavior equivalent to the frame pointer unwinder. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/7f08db872ab59e807016910acdbe82f744de7065.1587808742.git.jpoimboe@redhat.com
2020-04-25x86/unwind: Prevent false warnings for non-current tasksJosh Poimboeuf3-18/+28
There's some daring kernel code out there which dumps the stack of another task without first making sure the task is inactive. If the task happens to be running while the unwinder is reading the stack, unusual unwinder warnings can result. There's no race-free way for the unwinder to know whether such a warning is legitimate, so just disable unwinder warnings for all non-current tasks. Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/ec424a2aea1d461eb30cab48a28c6433de2ab784.1587808742.git.jpoimboe@redhat.com
2020-04-25x86/unwind/orc: Convert global variables to staticJosh Poimboeuf1-5/+5
These variables aren't used outside of unwind_orc.c, make them static. Also annotate some of them with '__ro_after_init', as applicable. Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/43ae310bf7822b9862e571f36ae3474cfde8f301.1587808742.git.jpoimboe@redhat.com
2020-04-24x86/alternatives: Move temporary_mm helpers into CThomas Gleixner1-0/+55
The only user of these inlines is the text poke code and this must not be exposed to the world. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200421092559.139069561@linutronix.de
2020-04-24x86/cpu: Uninline CR4 accessorsThomas Gleixner2-1/+33
cpu_tlbstate is exported because various TLB-related functions need access to it, but cpu_tlbstate is sensitive information which should only be accessed by well-contained kernel functions and not be directly exposed to modules. The various CR4 accessors require cpu_tlbstate as the CR4 shadow cache is located there. In preparation for unexporting cpu_tlbstate, create a builtin function for manipulating CR4 and rework the various helpers to use it. No functional change. [ bp: push the export of native_write_cr4() only when CONFIG_LKTDM=m to the last patch in the series. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200421092558.939985695@linutronix.de
2020-04-23x86, sched: Move check for CPU type to caller functionGiovanni Gherdovich1-4/+2
Improve readability of the function intel_set_max_freq_ratio() by moving the check for KNL CPUs there, together with checks for GLM and SKX. Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200416054745.740-5-ggherdovich@suse.cz
2020-04-23x86, sched: Don't enable static key when starting secondary CPUsPeter Zijlstra (Intel)1-7/+14
The static key arch_scale_freq_key only needs to be enabled once (at boot). This change fixes a bug by which the key was enabled every time cpu0 is started, even as a secondary CPU during cpu hotplug. Secondary CPUs are started from the idle thread: setting a static key from there means acquiring a lock and may result in sleeping in the idle task, causing CPU lockup. Another consequence of this change is that init_counter_refs() is now called on each CPU correctly; previously the function on_each_cpu() was used, but it was called at boot when the only online cpu is cpu0. [ggherdovich@suse.cz: Tested and wrote changelog] Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance") Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200416054745.740-4-ggherdovich@suse.cz
2020-04-23x86, sched: Account for CPUs with less than 4 cores in freq. invarianceGiovanni Gherdovich1-3/+8
If a CPU has less than 4 physical cores, MSR_TURBO_RATIO_LIMIT will rightfully report that the 4C turbo ratio is zero. In such cases, use the 1C turbo ratio instead for frequency invariance calculations. Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance") Reported-by: Like Xu <like.xu@linux.intel.com> Reported-by: Neil Rickert <nwr10cst-oslnx@yahoo.com> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com> Link: https://lkml.kernel.org/r/20200416054745.740-3-ggherdovich@suse.cz
2020-04-23x86, sched: Bail out of frequency invariance if base frequency is unknownGiovanni Gherdovich1-0/+9
Some hypervisors such as VMWare ESXi 5.5 advertise support for X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular, MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base clock frequency of the CPU (highest non-turbo frequency), producing a division by zero when computing the ratio turbo_freq/base_freq necessary for frequency invariant accounting. It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance couldn't be done in principle (not that it would make a lot of sense in a VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise that feature. Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance") Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200416054745.740-2-ggherdovich@suse.cz
2020-04-22x86/microcode: Fix return value for microcode late loadingMihai Carabas1-8/+7
The return value from stop_machine() might not be consistent. stop_machine_cpuslocked() returns: - zero if all functions have returned 0. - a non-zero value if at least one of the functions returned a non-zero value. There is no way to know if it is negative or positive. So make __reload_late() return 0 on success or negative otherwise. [ bp: Unify ret val check and touch up. ] Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1587497318-4438-1-git-send-email-mihai.carabas@oracle.com
2020-04-22x86,ftrace: Shrink ftrace_regs_caller() by one bytePeter Zijlstra1-2/+2
'Optimize' ftrace_regs_caller. Instead of comparing against an immediate, the more natural way to test for zero on x86 is: 'test %r,%r'. 48 83 f8 00 cmp $0x0,%rax 74 49 je 226 <ftrace_regs_call+0xa3> 48 85 c0 test %rax,%rax 74 49 je 225 <ftrace_regs_call+0xa2> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20200416115118.867411350@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22x86,ftrace: Use SIZEOF_PTREGSPeter Zijlstra1-2/+2
There's a convenient macro for 'SS+8' called FRAME_SIZE. Use it to clarify things. (entry/calling.h calls this SIZEOF_PTREGS but we're using asm/ptrace-abi.h) Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20200416115118.808485515@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22x86,ftrace: Fix ftrace_regs_caller() unwindPeter Zijlstra2-19/+25
The ftrace_regs_caller() trampoline does something 'funny' when there is a direct-caller present. In that case it stuffs the 'direct-caller' address on the return stack and then exits the function. This then results in 'returning' to the direct-caller with the exact registers we came in with -- an indirect tail-call without using a register. This however (rightfully) confuses objtool because the function shares a few instruction in order to have a single exit path, but the stack layout is different for them, depending through which path we came there. This is currently cludged by forcing the stack state to the non-direct case, but this generates actively wrong (ORC) unwind information for the direct case, leading to potential broken unwinds. Fix this issue by fully separating the exit paths. This results in having to poke a second RET into the trampoline copy, see ftrace_regs_caller_ret. This brings us to a second objtool problem, in order for it to perceive the 'jmp ftrace_epilogue' as a function exit, it needs to be recognised as a tail call. In order to make that happen, ftrace_epilogue needs to be the start of an STT_FUNC, so re-arrange code to make this so. Finally, a third issue is that objtool requires functions to exit with the same stack layout they started with, which is obviously violated in the direct case, employ the new HINT_RET_OFFSET to tell objtool this is an expected exception. Together, this results in generating correct ORC unwind information for the ftrace_regs_caller() function and it's trampoline copies. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20200416115118.749606694@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-20x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigationMark Gross3-0/+138
SRBDS is an MDS-like speculative side channel that can leak bits from the random number generator (RNG) across cores and threads. New microcode serializes the processor access during the execution of RDRAND and RDSEED. This ensures that the shared buffer is overwritten before it is released for reuse. While it is present on all affected CPU models, the microcode mitigation is not needed on models that enumerate ARCH_CAPABILITIES[MDS_NO] in the cases where TSX is not supported or has been disabled with TSX_CTRL. The mitigation is activated by default on affected processors and it increases latency for RDRAND and RDSEED instructions. Among other effects this will reduce throughput from /dev/urandom. * Enable administrator to configure the mitigation off when desired using either mitigations=off or srbds=off. * Export vulnerability status via sysfs * Rename file-scoped macros to apply for non-whitelist table initializations. [ bp: Massage, - s/VULNBL_INTEL_STEPPING/VULNBL_INTEL_STEPPINGS/g, - do not read arch cap MSR a second time in tsx_fused_off() - just pass it in, - flip check in cpu_set_bug_bits() to save an indentation level, - reflow comments. jpoimboe: s/Mitigated/Mitigation/ in user-visible strings tglx: Dropped the fused off magic for now ] Signed-off-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
2020-04-20x86/cpu: Add a steppings field to struct x86_cpu_idMark Gross1-1/+6
Intel uses the same family/model for several CPUs. Sometimes the stepping must be checked to tell them apart. On x86 there can be at most 16 steppings. Add a steppings bitmask to x86_cpu_id and a X86_MATCH_VENDOR_FAMILY_MODEL_STEPPING_FEATURE macro and support for matching against family/model/stepping. [ bp: Massage. ] Signed-off-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-04-20x86/cpu: Add 'table' argument to cpu_matches()Mark Gross1-11/+14
To make cpu_matches() reusable for other matching tables, have it take a pointer to a x86_cpu_id table as an argument. [ bp: Flip arguments order. ] Signed-off-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2020-04-19Merge tag 'x86-urgent-2020-04-19' of ↵Linus Torvalds5-20/+55
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 and objtool fixes from Thomas Gleixner: "A set of fixes for x86 and objtool: objtool: - Ignore the double UD2 which is emitted in BUG() when CONFIG_UBSAN_TRAP is enabled. - Support clang non-section symbols in objtool ORC dump - Fix switch table detection in .text.unlikely - Make the BP scratch register warning more robust. x86: - Increase microcode maximum patch size for AMD to cope with new CPUs which have a larger patch size. - Fix a crash in the resource control filesystem when the removal of the default resource group is attempted. - Preserve Code and Data Prioritization enabled state accross CPU hotplug. - Update split lock cpu matching to use the new X86_MATCH macros. - Change the split lock enumeration as Intel finaly decided that the IA32_CORE_CAPABILITIES bits are not architectural contrary to what the SDM claims. !@#%$^! - Add Tremont CPU models to the split lock detection cpu match. - Add a missing static attribute to make sparse happy" * tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/split_lock: Add Tremont family CPU models x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architectural x86/resctrl: Preserve CDP enable over CPU hotplug x86/resctrl: Fix invalid attempt at removing the default resource group x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL() x86/umip: Make umip_insns static x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE objtool: Make BP scratch register warning more robust objtool: Fix switch table detection in .text.unlikely objtool: Support Clang non-section symbols in ORC generation objtool: Support Clang non-section symbols in ORC dump objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings
2020-04-18x86/split_lock: Add Tremont family CPU modelsTony Luck1-0/+3
Tremont CPUs support IA32_CORE_CAPABILITIES bits to indicate whether specific SKUs have support for split lock detection. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200416205754.21177-4-tony.luck@intel.com
2020-04-18x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architecturalTony Luck1-14/+31
The Intel Software Developers' Manual erroneously listed bit 5 of the IA32_CORE_CAPABILITIES register as an architectural feature. It is not. Features enumerated by IA32_CORE_CAPABILITIES are model specific and implementation details may vary in different cpu models. Thus it is only safe to trust features after checking the CPU model. Icelake client and server models are known to implement the split lock detect feature even though they don't enumerate IA32_CORE_CAPABILITIES [ tglx: Use switch() for readability and massage comments ] Fixes: 6650cdd9a8cc ("x86/split_lock: Enable split lock detection by kernel") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200416205754.21177-3-tony.luck@intel.com
2020-04-17x86/early_printk: Remove unused includesAndy Shevchenko1-3/+0
After 1bd187de5364 ("x86, intel-mid: remove Intel MID specific serial support") the Intel MID header is not needed anymore. After 69c1f396f25b ("efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation") the EFI headers are not needed anymore. Remove the respective includes. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200326175415.8618-1-andriy.shevchenko@linux.intel.com
2020-04-17x86/resctrl: Preserve CDP enable over CPU hotplugJames Morse3-0/+16
Resctrl assumes that all CPUs are online when the filesystem is mounted, and that CPUs remember their CDP-enabled state over CPU hotplug. This goes wrong when resctrl's CDP-enabled state changes while all the CPUs in a domain are offline. When a domain comes online, enable (or disable!) CDP to match resctrl's current setting. Fixes: 5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support") Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200221162105.154163-1-james.morse@arm.com
2020-04-17x86/resctrl: Fix invalid attempt at removing the default resource groupReinette Chatre1-1/+2
The default resource group ("rdtgroup_default") is associated with the root of the resctrl filesystem and should never be removed. New resource groups can be created as subdirectories of the resctrl filesystem and they can be removed from user space. There exists a safeguard in the directory removal code (rdtgroup_rmdir()) that ensures that only subdirectories can be removed by testing that the directory to be removed has to be a child of the root directory. A possible deadlock was recently fixed with 334b0f4e9b1b ("x86/resctrl: Fix a deadlock due to inaccurate reference"). This fix involved associating the private data of the "mon_groups" and "mon_data" directories to the resource group to which they belong instead of NULL as before. A consequence of this change was that the original safeguard code preventing removal of "mon_groups" and "mon_data" found in the root directory failed resulting in attempts to remove the default resource group that ends in a BUG: kernel BUG at mm/slub.c:3969! invalid opcode: 0000 [#1] SMP PTI Call Trace: rdtgroup_rmdir+0x16b/0x2c0 kernfs_iop_rmdir+0x5c/0x90 vfs_rmdir+0x7a/0x160 do_rmdir+0x17d/0x1e0 do_syscall_64+0x55/0x1d0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by improving the directory removal safeguard to ensure that subdirectories of the resctrl root directory can only be removed if they are a child of the resctrl filesystem's root _and_ not associated with the default resource group. Fixes: 334b0f4e9b1b ("x86/resctrl: Fix a deadlock due to inaccurate reference") Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/884cbe1773496b5dbec1b6bd11bb50cffa83603d.1584461853.git.reinette.chatre@intel.com
2020-04-17x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL()Tony Luck1-4/+2
The SPLIT_LOCK_CPU() macro escaped the tree-wide sweep for old-style initialization. Update to use X86_MATCH_INTEL_FAM6_MODEL(). Fixes: 6650cdd9a8cc ("x86/split_lock: Enable split lock detection by kernel") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200416205754.21177-2-tony.luck@intel.com
2020-04-15crash_dump: Remove no longer used saved_max_pfnKairui Song1-8/+0
saved_max_pfn was originally introduced in commit 92aa63a5a1bf ("[PATCH] kdump: Retrieve saved max pfn") It used to make sure that the user does not try to read the physical memory beyond saved_max_pfn. But since commit 921d58c0e699 ("vmcore: remove saved_max_pfn check") it's no longer used for the check. This variable doesn't have any users anymore so just remove it. [ bp: Drop the Calgary IOMMU reference from the commit message. ] Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lkml.kernel.org/r/20200330181544.1595733-1-kasong@redhat.com
2020-04-15x86/umip: Make umip_insns staticJason Yan1-1/+1
Fix the following sparse warning: arch/x86/kernel/umip.c:84:12: warning: symbol 'umip_insns' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Link: https://lkml.kernel.org/r/20200413082213.22934-1-yanaijie@huawei.com
2020-04-14Merge tag 'hyperv-fixes-signed' of ↵Linus Torvalds1-2/+12
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - a series from Tianyu Lan to fix crash reporting on Hyper-V - three miscellaneous cleanup patches * tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/Hyper-V: Report crash data in die() when panic_on_oops is set x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set x86/Hyper-V: Report crash register data or kmsg before running crash kernel x86/Hyper-V: Trigger crash enlightenment only once during system crash. x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump x86/Hyper-V: Unload vmbus channel in hv panic callback x86: hyperv: report value of misc_features hv_debugfs: Make hv_debug_root static hv: hyperv_vmbus.h: Replace zero-length array with flexible-array member
2020-04-14x86/mce: Fixup exception only for the correct MCEsBorislav Petkov2-3/+18
The severity grading code returns IN_KERNEL_RECOV error context for errors which have happened in kernel space but from which the kernel can recover. Whether the recovery can happen is determined by the exception table entry having as handler ex_handler_fault() and which has been declared at build time using _ASM_EXTABLE_FAULT(). IN_KERNEL_RECOV is used in mce_severity_intel() to lookup the corresponding error severity in the severities table. However, the mapping back from error severity to whether the error is IN_KERNEL_RECOV is ambiguous and in the very paranoid case - which might not be possible right now - but be better safe than sorry later, an exception fixup could be attempted for another MCE whose address is in the exception table and has the proper severity. Which would be unfortunate, to say the least. Therefore, mark such MCEs explicitly as MCE_IN_KERNEL_RECOV so that the recovery attempt is done only for them. Document the whole handling, while at it, as it is not trivial. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200407163414.18058-10-bp@alien8.de
2020-04-14x86/mce: Add mce=print_all optionTony Luck2-1/+7
Sometimes, when logs are getting lost, it's nice to just have everything dumped to the serial console. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200214222720.13168-7-tony.luck@intel.com
2020-04-14x86/mce: Change default MCE logger to check mce->kflagsTony Luck1-16/+3
Instead of keeping count of how many handlers are registered on the MCE notifier chain and printing if below some magic value, look at mce->kflags to see if anyone claims to have handled/logged this error. [ bp: Do not print ->kflags in __print_mce(). ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200214222720.13168-6-tony.luck@intel.com
2020-04-14x86/mce: Fix all mce notifiers to update the mce->kflags bitmaskTony Luck2-1/+8
If the handler took any action to log or deal with the error, set a bit in mce->kflags so that the default handler on the end of the machine check chain can see what has been done. Get rid of NOTIFY_STOP returns. Make the EDAC and dev-mcelog handlers skip over errors already processed by CEC. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200214222720.13168-5-tony.luck@intel.com
2020-04-14x86/mce: Convert the CEC to use the MCE notifierTony Luck1-19/+0
The CEC code has its claws in a couple of routines in mce/core.c. Convert it to just register itself on the normal MCE notifier chain. [ bp: Make cec_add_elem() and cec_init() static. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200214222720.13168-3-tony.luck@intel.com
2020-04-14x86/mce: Rename "first" function as "early"Tony Luck1-5/+5
It isn't going to be first on the notifier chain when the CEC is moved to be a normal user of the notifier chain. Fix the enum for the MCE_PRIO symbols to list them in reverse order so that the compiler can give them numbers from low to high priority. Add an entry for MCE_PRIO_CEC as the highest priority. [ bp: Use passive voice, add comments. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200214222720.13168-2-tony.luck@intel.com
2020-04-14x86/mce/amd, edac: Remove report_gart_errorsBorislav Petkov1-2/+7
... because no one should be interested in spurious MCEs anyway. Make the filtering unconditional and move it to amd_filter_mce(). Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200407163414.18058-2-bp@alien8.de
2020-04-14x86/mce/amd: Make threshold bank setting hotplug robustThomas Gleixner1-3/+11
Handle the cases when the CPU goes offline before the bank setting/reading happens. [ bp: Write commit message. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200403161943.1458-8-bp@alien8.de
2020-04-14x86/mce/amd: Cleanup threshold device remove pathThomas Gleixner1-42/+37
Pass in the bank pointer directly to the cleaning up functions, obviating the need for per-CPU accesses. Make the clean up path interrupt-safe by cleaning the bank pointer first so that the rest of the teardown happens safe from the thresholding interrupt. No functional changes. [ bp: Write commit message and reverse bank->shared test to save an indentation level in threshold_remove_bank(). ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200403161943.1458-7-bp@alien8.de
2020-04-14x86/mce/amd: Straighten CPU hotplug pathThomas Gleixner1-17/+15
mce_threshold_create_device() hotplug callback runs on the plugged in CPU so: - use this_cpu_read() which is faster - pass in struct threshold_bank **bp to threshold_create_bank() and instead of doing per-CPU accesses - Use rdmsr_safe() instead of rdmsr_safe_on_cpu() which avoids an IPI. No functional changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200403161943.1458-6-bp@alien8.de
2020-04-14x86/mce/amd: Sanitize thresholding device creation hotplug pathThomas Gleixner2-41/+27
Drop the stupid threshold_init_device() initcall iterating over all online CPUs in favor of properly setting up everything on the CPU hotplug path, when each CPU's callback is invoked. [ bp: Write commit message. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200403161943.1458-5-bp@alien8.de
2020-04-14x86/mce/amd: Protect a not-fully initialized bank from the thresholding ↵Thomas Gleixner1-2/+17
interrupt Make sure the thresholding bank descriptor is fully initialized when the thresholding interrupt fires after a hotplug event. [ bp: Write commit message and document long-forgotten bank_map. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200403161943.1458-4-bp@alien8.de
2020-04-14x86/mce/amd: Init thresholding machinery only on relevant vendorsThomas Gleixner3-5/+17
... and not unconditionally. [ bp: Add a new vendor_flags bit for that. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200403161943.1458-3-bp@alien8.de
2020-04-14x86/mce/amd: Do proper cleanup on error pathsThomas Gleixner1-7/+8
Drop kobject reference counts properly on error in the banks and blocks allocation functions. [ bp: Write commit message. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200403161943.1458-2-bp@alien8.de
2020-04-14x86/32: Remove CONFIG_DOUBLEFAULTBorislav Petkov3-9/+1
Make the doublefault exception handler unconditional on 32-bit. Yes, it is important to be able to catch #DF exceptions instead of silent reboots. Yes, the code size increase is worth every byte. And one less CONFIG symbol is just the cherry on top. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200404083646.8897-1-bp@alien8.de
2020-04-13x86/smpboot: Remove the last ICPU() macroBorislav Petkov1-8/+9
Now all is using the shiny new macros. No code changed: # arch/x86/kernel/smpboot.o: text data bss dec hex filename 16432 2649 40 19121 4ab1 smpboot.o.before 16432 2649 40 19121 4ab1 smpboot.o.after md5: a58104003b72c1de533095bc5a4c30a9 smpboot.o.before.asm a58104003b72c1de533095bc5a4c30a9 smpboot.o.after.asm Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200324185836.GI22931@zn.tnic
2020-04-13Merge tag 'v5.7-rc1' into locking/kcsan, to resolve conflicts and refreshIngo Molnar58-558/+1692
Resolve these conflicts: arch/x86/Kconfig arch/x86/kernel/Makefile Do a minor "evil merge" to move the KCSAN entry up a bit by a few lines in the Kconfig to reduce the probability of future conflicts. Signed-off-by: Ingo Molnar <mingo@kernel.org>