summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-05-15x86/unwind/orc: Fix error handling in __unwind_start()Josh Poimboeuf1-7/+9
The unwind_state 'error' field is used to inform the reliable unwinding code that the stack trace can't be trusted. Set this field for all errors in __unwind_start(). Also, move the zeroing out of the unwind_state struct to before the ORC table initialization check, to prevent the caller from reading uninitialized data if the ORC table is corrupted. Fixes: af085d9084b4 ("stacktrace/x86: add function for detecting reliable stack traces") Fixes: d3a09104018c ("x86/unwinder/orc: Dont bail on stack overflow") Fixes: 98d0c8ebf77e ("x86/unwind/orc: Prevent unwinding before ORC initialization") Reported-by: Pavel Machek <pavel@denx.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/d6ac7215a84ca92b895fdd2e1aa546729417e6e6.1589487277.git.jpoimboe@redhat.com
2020-05-14Merge tag 'trace-v5.7-rc5' of ↵Linus Torvalds1-1/+28
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing fixes from Steven Rostedt: "Various tracing fixes: - Fix a crash when having function tracing and function stack tracing on the command line. The ftrace trampolines are created as executable and read only. But the stack tracer tries to modify them with text_poke() which expects all kernel text to still be writable at boot. Keep the trampolines writable at boot, and convert them to read-only with the rest of the kernel. - A selftest was triggering in the ring buffer iterator code, that is no longer valid with the update of keeping the ring buffer writable while a iterator is reading. Just bail after three failed attempts to get an event and remove the warning and disabling of the ring buffer. - While modifying the ring buffer code, decided to remove all the unnecessary BUG() calls" * tag 'trace-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Remove all BUG() calls ring-buffer: Don't deactivate the ring buffer on failed iterator reads x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
2020-05-13x86/fpu/xstate: Update sanitize_restored_xstate() for supervisor xstatesYu-cheng Yu1-13/+24
The function sanitize_restored_xstate() sanitizes user xstates of an XSAVE buffer by clearing bits not in the input 'xfeatures' from the buffer's header->xfeatures, effectively resetting those features back to the init state. When supervisor xstates are introduced, it is necessary to make sure only user xstates are sanitized. Ensure supervisor bits in header->xfeatures stay set and supervisor states are not modified. To make names clear, also: - Rename the function to sanitize_restored_user_xstate(). - Rename input parameter 'xfeatures' to 'user_xfeatures'. - In __fpu__restore_sig(), rename 'xfeatures' to 'user_xfeatures'. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20200512145444.15483-7-yu-cheng.yu@intel.com
2020-05-13x86/fpu/xstate: Define new functions for clearing fpregs and xstatesFenghua Yu4-22/+39
Currently, fpu__clear() clears all fpregs and xstates. Once XSAVES supervisor states are introduced, supervisor settings (e.g. CET xstates) must remain active for signals; It is necessary to have separate functions: - Create fpu__clear_user_states(): clear only user settings for signals; - Create fpu__clear_all(): clear both user and supervisor settings in flush_thread(). Also modify copy_init_fpstate_to_fpregs() to take a mask from above two functions. Remove obvious side-comment in fpu__clear(), while at it. [ bp: Make the second argument of fpu__clear() bool after requesting it a bunch of times during review. - Add a comment about copy_init_fpstate_to_fpregs() locking needs. ] Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200512145444.15483-6-yu-cheng.yu@intel.com
2020-05-13x86/fpu/xstate: Introduce XSAVES supervisor statesYu-cheng Yu1-9/+19
Enable XSAVES supervisor states by setting MSR_IA32_XSS bits according to CPUID enumeration results. Also revise comments at various places. Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200512145444.15483-5-yu-cheng.yu@intel.com
2020-05-13x86/fpu/xstate: Separate user and supervisor xfeatures maskYu-cheng Yu2-35/+54
Before the introduction of XSAVES supervisor states, 'xfeatures_mask' is used at various places to determine XSAVE buffer components and XCR0 bits. It contains only user xstates. To support supervisor xstates, it is necessary to separate user and supervisor xstates: - First, change 'xfeatures_mask' to 'xfeatures_mask_all', which represents the full set of bits that should ever be set in a kernel XSAVE buffer. - Introduce xfeatures_mask_supervisor() and xfeatures_mask_user() to extract relevant xfeatures from xfeatures_mask_all. Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200512145444.15483-4-yu-cheng.yu@intel.com
2020-05-13x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot upSteven Rostedt (VMware)1-1/+28
Booting one of my machines, it triggered the following crash: Kernel/User page tables isolation: enabled ftrace: allocating 36577 entries in 143 pages Starting tracer 'function' BUG: unable to handle page fault for address: ffffffffa000005c #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD 2014067 P4D 2014067 PUD 2015063 PMD 7b253067 PTE 7b252061 Oops: 0003 [#1] PREEMPT SMP PTI CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-test+ #24 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 RIP: 0010:text_poke_early+0x4a/0x58 Code: 34 24 48 89 54 24 08 e8 bf 72 0b 00 48 8b 34 24 48 8b 4c 24 08 84 c0 74 0b 48 89 df f3 a4 48 83 c4 10 5b c3 9c 58 fa 48 89 df <f3> a4 50 9d 48 83 c4 10 5b e9 d6 f9 ff ff 0 41 57 49 RSP: 0000:ffffffff82003d38 EFLAGS: 00010046 RAX: 0000000000000046 RBX: ffffffffa000005c RCX: 0000000000000005 RDX: 0000000000000005 RSI: ffffffff825b9a90 RDI: ffffffffa000005c RBP: ffffffffa000005c R08: 0000000000000000 R09: ffffffff8206e6e0 R10: ffff88807b01f4c0 R11: ffffffff8176c106 R12: ffffffff8206e6e0 R13: ffffffff824f2440 R14: 0000000000000000 R15: ffffffff8206eac0 FS: 0000000000000000(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffa000005c CR3: 0000000002012000 CR4: 00000000000006b0 Call Trace: text_poke_bp+0x27/0x64 ? mutex_lock+0x36/0x5d arch_ftrace_update_trampoline+0x287/0x2d5 ? ftrace_replace_code+0x14b/0x160 ? ftrace_update_ftrace_func+0x65/0x6c __register_ftrace_function+0x6d/0x81 ftrace_startup+0x23/0xc1 register_ftrace_function+0x20/0x37 func_set_flag+0x59/0x77 __set_tracer_option.isra.19+0x20/0x3e trace_set_options+0xd6/0x13e apply_trace_boot_options+0x44/0x6d register_tracer+0x19e/0x1ac early_trace_init+0x21b/0x2c9 start_kernel+0x241/0x518 ? load_ucode_intel_bsp+0x21/0x52 secondary_startup_64+0xa4/0xb0 I was able to trigger it on other machines, when I added to the kernel command line of both "ftrace=function" and "trace_options=func_stack_trace". The cause is the "ftrace=function" would register the function tracer and create a trampoline, and it will set it as executable and read-only. Then the "trace_options=func_stack_trace" would then update the same trampoline to include the stack tracer version of the function tracer. But since the trampoline already exists, it updates it with text_poke_bp(). The problem is that text_poke_bp() called while system_state == SYSTEM_BOOTING, it will simply do a memcpy() and not the page mapping, as it would think that the text is still read-write. But in this case it is not, and we take a fault and crash. Instead, lets keep the ftrace trampolines read-write during boot up, and then when the kernel executable text is set to read-only, the ftrace trampolines get set to read-only as well. Link: https://lkml.kernel.org/r/20200430202147.4dc6e2de@oasis.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: stable@vger.kernel.org Fixes: 768ae4406a5c ("x86/ftrace: Use text_poke()") Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-12x86/fpu/xstate: Define new macros for supervisor and user xstatesFenghua Yu2-14/+15
XCNTXT_MASK is 'all supported xfeatures' before introducing supervisor xstates. Rename it to XFEATURE_MASK_USER_SUPPORTED to make clear that these are user xstates. Replace XFEATURE_MASK_SUPERVISOR with the following: - XFEATURE_MASK_SUPERVISOR_SUPPORTED: Currently nothing. ENQCMD and Control-flow Enforcement Technology (CET) will be introduced in separate series. - XFEATURE_MASK_SUPERVISOR_UNSUPPORTED: Currently only Processor Trace. - XFEATURE_MASK_SUPERVISOR_ALL: the combination of above. Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200512145444.15483-3-yu-cheng.yu@intel.com
2020-05-12x86/fpu/xstate: Rename validate_xstate_header() to validate_user_xstate_header()Fenghua Yu3-5/+5
The function validate_xstate_header() validates an xstate header coming from userspace (PTRACE or sigreturn). To make it clear, rename it to validate_user_xstate_header(). Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200512145444.15483-2-yu-cheng.yu@intel.com
2020-05-10Merge tag 'x86-urgent-2020-05-10' of ↵Linus Torvalds4-53/+93
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for x86: - Ensure that direct mapping alias is always flushed when changing page attributes. The optimization for small ranges failed to do so when the virtual address was in the vmalloc or module space. - Unbreak the trace event registration for syscalls without arguments caused by the refactoring of the SYSCALL_DEFINE0() macro. - Move the printk in the TSC deadline timer code to a place where it is guaranteed to only be called once during boot and cannot be rearmed by clearing warn_once after boot. If it's invoked post boot then lockdep rightfully complains about a potential deadlock as the calling context is different. - A series of fixes for objtool and the ORC unwinder addressing variety of small issues: - Stack offset tracking for indirect CFAs in objtool ignored subsequent pushs and pops - Repair the unwind hints in the register clearing entry ASM code - Make the unwinding in the low level exit to usermode code stop after switching to the trampoline stack. The unwind hint is no longer valid and the ORC unwinder emits a warning as it can't find the registers anymore. - Fix unwind hints in switch_to_asm() and rewind_stack_do_exit() which caused objtool to generate bogus ORC data. - Prevent unwinder warnings when dumping the stack of a non-current task as there is no way to be sure about the validity because the dumped stack can be a moving target. - Make the ORC unwinder behave the same way as the frame pointer unwinder when dumping an inactive tasks stack and do not skip the first frame. - Prevent ORC unwinding before ORC data has been initialized - Immediately terminate unwinding when a unknown ORC entry type is found. - Prevent premature stop of the unwinder caused by IRET frames. - Fix another infinite loop in objtool caused by a negative offset which was not catched. - Address a few build warnings in the ORC unwinder and add missing static/ro_after_init annotations" * tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES x86/apic: Move TSC deadline timer debug printk ftrace/x86: Fix trace event registration for syscalls without arguments x86/mm/cpa: Flush direct map alias during cpa objtool: Fix infinite loop in for_offset_range() x86/unwind/orc: Fix premature unwind stoppage due to IRET frames x86/unwind/orc: Fix error path for bad ORC entry type x86/unwind/orc: Prevent unwinding before ORC initialization x86/unwind/orc: Don't skip the first frame for inactive tasks x86/unwind: Prevent false warnings for non-current tasks x86/unwind/orc: Convert global variables to static x86/entry/64: Fix unwind hints in rewind_stack_do_exit() x86/entry/64: Fix unwind hints in __switch_to_asm() x86/entry/64: Fix unwind hints in kernel exit path x86/entry/64: Fix unwind hints in register clearing code objtool: Fix stack offset tracking for indirect CFAs
2020-05-08x86/module: Use text_mutex in apply_relocate_add()Josh Poimboeuf1-2/+7
Now that the livepatch code no longer needs the text_mutex for changing module permissions, move its usage down to apply_relocate_add(). Note the s390 version of apply_relocate_add() doesn't need to use the text_mutex because it already uses s390_kernel_write_lock, which accomplishes the same task. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-05-08x86/module: Use text_poke() for late relocationsPeter Zijlstra1-7/+31
Because of late module patching, a livepatch module needs to be able to apply some of its relocations well after it has been loaded. Instead of playing games with module_{dis,en}able_ro(), use existing text poking mechanisms to apply relocations after module loading. So far only x86, s390 and Power have HAVE_LIVEPATCH but only the first two also have STRICT_MODULE_RWX. This will allow removal of the last module_disable_ro() usage in livepatch. The ultimate goal is to completely disallow making executable mappings writable. [ jpoimboe: Split up patches. Use mod state to determine whether memcpy() can be used. Implement text_poke() for UML. ] Cc: x86@kernel.org Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-05-08livepatch: Remove .klp.archPeter Zijlstra2-54/+0
After the previous patch, vmlinux-specific KLP relocations are now applied early during KLP module load. This means that .klp.arch sections are no longer needed for *vmlinux-specific* KLP relocations. One might think they're still needed for *module-specific* KLP relocations. If a to-be-patched module is loaded *after* its corresponding KLP module is loaded, any corresponding KLP relocations will be delayed until the to-be-patched module is loaded. If any special sections (.parainstructions, for example) rely on those relocations, their initializations (apply_paravirt) need to be done afterwards. Thus the apparent need for arch_klp_init_object_loaded() and its corresponding .klp.arch sections -- it allows some of the special section initializations to be done at a later time. But... if you look closer, that dependency between the special sections and the module-specific KLP relocations doesn't actually exist in reality. Looking at the contents of the .altinstructions and .parainstructions sections, there's not a realistic scenario in which a KLP module's .altinstructions or .parainstructions section needs to access a symbol in a to-be-patched module. It might need to access a local symbol or even a vmlinux symbol; but not another module's symbol. When a special section needs to reference a local or vmlinux symbol, a normal rela can be used instead of a KLP rela. Since the special section initializations don't actually have any real dependency on module-specific KLP relocations, .klp.arch and arch_klp_init_object_loaded() no longer have a reason to exist. So remove them. As Peter said much more succinctly: So the reason for .klp.arch was that .klp.rela.* stuff would overwrite paravirt instructions. If that happens you're doing it wrong. Those RELAs are core kernel, not module, and thus should've happened in .rela.* sections at patch-module loading time. Reverting this removes the two apply_{paravirt,alternatives}() calls from the late patching path, and means we don't have to worry about them when removing module_disable_ro(). [ jpoimboe: Rewrote patch description. Tweaked klp_init_object_loaded() error path. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-05-07x86/cpu/amd: Make erratum #1054 a legacy erratumKim Phillips1-2/+1
Commit 21b5ee59ef18 ("x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF") mistakenly added erratum #1054 as an OS Visible Workaround (OSVW) ID 0. Erratum #1054 is not OSVW ID 0 [1], so make it a legacy erratum. There would never have been a false positive on older hardware that has OSVW bit 0 set, since the IRPERF feature was not available. However, save a couple of RDMSR executions per thread, on modern system configurations that correctly set non-zero values in their OSVW_ID_Length MSRs. [1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors. The revision guide is available from the bugzilla link below. Fixes: 21b5ee59ef18 ("x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200417143356.26054-1-kim.phillips@amd.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
2020-05-07x86/delay: Introduce TPAUSE delayKyung Min Park1-0/+3
TPAUSE instructs the processor to enter an implementation-dependent optimized state. The instruction execution wakes up when the time-stamp counter reaches or exceeds the implicit EDX:EAX 64-bit input value. The instruction execution also wakes up due to the expiration of the operating system time-limit or by an external interrupt or exceptions such as a debug exception or a machine check exception. TPAUSE offers a choice of two lower power states: 1. Light-weight power/performance optimized state C0.1 2. Improved power/performance optimized state C0.2 This way, it can save power with low wake-up latency in comparison to spinloop based delay. The selection between the two is governed by the input register. TPAUSE is available on processors with X86_FEATURE_WAITPKG. Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Kyung Min Park <kyung.min.park@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1587757076-30337-4-git-send-email-kyung.min.park@intel.com
2020-05-07x86/platform/uv: Unexport uv_apicid_hibitsChristoph Hellwig1-1/+0
This variable is not used by modular code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200504171527.2845224-11-hch@lst.de
2020-05-07x86/platform/uv: Remove _uv_hub_info_check()Christoph Hellwig1-6/+0
Neither this functions nor the helpers used to implement it are used anywhere in the kernel tree. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-10-hch@lst.de
2020-05-07x86/platform/uv: Simplify uv_send_IPI_one()Christoph Hellwig1-5/+14
Merge two helpers only used by uv_send_IPI_one() into the main function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-9-hch@lst.de
2020-05-07x86/platform/uv: Mark uv_min_hub_revision_id staticChristoph Hellwig1-2/+1
This variable is only used inside x2apic_uv_x and not even declared in a header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-8-hch@lst.de
2020-05-07x86/platform/uv: Mark is_uv_hubless() staticChristoph Hellwig1-2/+1
is_uv_hubless() is only used in x2apic_uv_x.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Not-acked-by: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Link: https://lkml.kernel.org/r/20200504171527.2845224-7-hch@lst.de
2020-05-07cpu/hotplug: Remove disable_nonboot_cpus()Qais Yousef1-2/+2
The single user could have called freeze_secondary_cpus() directly. Since this function was a source of confusion, remove it as it's just a pointless wrapper. While at it, rename enable_nonboot_cpus() to thaw_secondary_cpus() to preserve the naming symmetry. Done automatically via: git grep -l enable_nonboot_cpus | xargs sed -i 's/enable_nonboot_cpus/thaw_secondary_cpus/g' Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Link: https://lkml.kernel.org/r/20200430114004.17477-1-qais.yousef@arm.com
2020-05-07x86/apic: Convert the TSC deadline timer matching to steppings macroBorislav Petkov1-45/+12
... and get rid of the function pointers which would spit out the microcode revision based on the CPU stepping. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mark Gross <mgross.linux.intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200506071516.25445-4-bp@alien8.de
2020-05-07Merge 'x86/urgent' into x86/cpuBorislav Petkov4-53/+93
... to resolve conflicting changes to arch/x86/kernel/apic/apic.c Signed-off-by: Borislav Petkov <bp@suse.de>
2020-05-06x86/resctrl: Support wider MBM countersReinette Chatre2-2/+14
The original Memory Bandwidth Monitoring (MBM) architectural definition defines counters of up to 62 bits in the IA32_QM_CTR MSR while the first-generation MBM implementation uses statically defined 24 bit counters. The MBM CPUID enumeration properties have been expanded to include the MBM counter width, encoded as an offset from 24 bits. While eight bits are available for the counter width offset IA32_QM_CTR MSR only supports 62 bit counters. Add a sanity check, with warning printed when encountered, to ensure counters cannot exceed the 62 bit limit. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/69d52abd5b14794d3a0f05ba7c755ed1f4c0d5ed.1588715690.git.reinette.chatre@intel.com
2020-05-06x86/resctrl: Support CPUID enumeration of MBM counter widthReinette Chatre1-0/+5
The original Memory Bandwidth Monitoring (MBM) architectural definition defines counters of up to 62 bits in the IA32_QM_CTR MSR while the first-generation MBM implementation uses statically defined 24 bit counters. Expand the MBM CPUID enumeration properties to include the MBM counter width. The previously undefined EAX output register contains, in bits [7:0], the MBM counter width encoded as an offset from 24 bits. Enumerating this property is only specified for Intel CPUs. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/afa3af2f753f6bc301fb743bc8944e749cb24afa.1588715690.git.reinette.chatre@intel.com
2020-05-06x86/resctrl: Maintain MBM counter width per resourceReinette Chatre4-14/+24
The original Memory Bandwidth Monitoring (MBM) architectural definition defines counters of up to 62 bits in the IA32_QM_CTR MSR, and the first-generation MBM implementation uses 24 bit counters. Software is required to poll at 1 second or faster to ensure that data is retrieved before a counter rollover occurs more than once under worst conditions. As system bandwidths scale the software requirement is maintained with the introduction of a per-resource enumerable MBM counter width. In preparation for supporting hardware with an enumerable MBM counter width the current globally static MBM counter width is moved to a per-resource MBM counter width. Currently initialized to 24 always to result in no functional change. In essence there is one function, mbm_overflow_count() that needs to know the counter width to handle rollovers. The static value used within mbm_overflow_count() will be replaced with a value discovered from the hardware. Support for learning the MBM counter width from hardware is added in the change that follows. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/e36743b9800f16ce600f86b89127391f61261f23.1588715690.git.reinette.chatre@intel.com
2020-05-06x86/resctrl: Query LLC monitoring properties once during bootReinette Chatre4-2/+11
Cache and memory bandwidth monitoring are features that are part of x86 CPU resource control that is supported by the resctrl subsystem. The monitoring properties are obtained via CPUID from every CPU and only used within the resctrl subsystem where the properties are only read from boot_cpu_data. Obtain the monitoring properties once, placed in boot_cpu_data, via the ->c_bsp_init() helpers of the vendors that support X86_FEATURE_CQM_LLC. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/6d74a6ac3e69f4b7a8b4115835f9455faf0f468d.1588715690.git.reinette.chatre@intel.com
2020-05-06x86/resctrl: Remove unnecessary RMID checksReinette Chatre1-15/+0
The cache and memory bandwidth monitoring properties are read using CPUID on every CPU. After the information is read from the system a sanity check is run to (1) ensure that the RMID data is initialized for the boot CPU in case the information was not available on the boot CPU and (2) the boot CPU's RMID is set to the minimum of RMID obtained from all CPUs. Every known platform that supports resctrl has the same maximum RMID on all CPUs. Both sanity checks found in x86_init_cache_qos() can thus safely be removed. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/c9a3b60d34091840c8b0bd1c6fab15e5ba92cb17.1588715690.git.reinette.chatre@intel.com
2020-05-06x86/cpu: Move resctrl CPUID code to resctrl/Reinette Chatre2-25/+26
The function determining a platform's support and properties of cache occupancy and memory bandwidth monitoring (properties of X86_FEATURE_CQM_LLC) can be found among the common CPU code. After the feature's properties is populated in the per-CPU data the resctrl subsystem is the only consumer (via boot_cpu_data). Move the function that obtains the CPU information used by resctrl to the resctrl subsystem and rename it from init_cqm() to resctrl_cpu_detect(). The function continues to be called from the common CPU code. This move is done in preparation of the addition of some vendor specific code. No functional change. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/38433b99f9d16c8f4ee796f8cc42b871531fa203.1588715690.git.reinette.chatre@intel.com
2020-05-06x86/resctrl: Rename asm/resctrl_sched.h to asm/resctrl.hReinette Chatre5-5/+5
asm/resctrl_sched.h is dedicated to the code used for configuration of the CPU resource control state when a task is scheduled. Rename resctrl_sched.h to resctrl.h in preparation of additions that will no longer make this file dedicated to work done during scheduling. No functional change. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/6914e0ef880b539a82a6d889f9423496d471ad1d.1588715690.git.reinette.chatre@intel.com
2020-05-05signal: refactor copy_siginfo_to_user32Christoph Hellwig1-1/+27
Factor out a copy_siginfo_to_external32 helper from copy_siginfo_to_user32 that fills out the compat_siginfo, but does so on a kernel space data structure. With that we can let architectures override copy_siginfo_to_user32 with their own implementations using copy_siginfo_to_external32. That allows moving the x32 SIGCHLD purely to x86 architecture code. As a nice side effect copy_siginfo_to_external32 also comes in handy for avoiding a set_fs() call in the coredump code later on. Contains improvements from Eric W. Biederman <ebiederm@xmission.com> and Arnd Bergmann <arnd@arndb.de>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-04x86/mcelog: Add compat_ioctl for 32-bit mcelog supportHe Zhe1-0/+1
A 32-bit version of mcelog issuing ioctls on /dev/mcelog causes errors like the following: MCE_GET_RECORD_LEN: Inappropriate ioctl for device This is due to a missing compat_ioctl callback. Assign to it compat_ptr_ioctl() as a generic implementation of the .compat_ioctl file operation to ioctl functions that either ignore the argument or pass a pointer to a compatible data type. [ bp: Massage commit message. ] Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/1583303947-49858-1-git-send-email-zhe.he@windriver.com
2020-05-03x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULESJosh Poimboeuf1-4/+4
Fix the following warnings seen with !CONFIG_MODULES: arch/x86/kernel/unwind_orc.c:29:26: warning: 'cur_orc_table' defined but not used [-Wunused-variable] 29 | static struct orc_entry *cur_orc_table = __start_orc_unwind; | ^~~~~~~~~~~~~ arch/x86/kernel/unwind_orc.c:28:13: warning: 'cur_orc_ip_table' defined but not used [-Wunused-variable] 28 | static int *cur_orc_ip_table = __start_orc_unwind_ip; | ^~~~~~~~~~~~~~~~ Fixes: 153eb2223c79 ("x86/unwind/orc: Convert global variables to static") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linux Next Mailing List <linux-next@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200428071640.psn5m7eh3zt2in4v@treble
2020-05-01x86/apic: Move TSC deadline timer debug printkThomas Gleixner1-13/+14
Leon reported that the printk_once() in __setup_APIC_LVTT() triggers a lockdep splat due to a lock order violation between hrtimer_base::lock and console_sem, when the 'once' condition is reset via /sys/kernel/debug/clear_warn_once after boot. The initial printk cannot trigger this because that happens during boot when the local APIC timer is set up on the boot CPU. Prevent it by moving the printk to a place which is guaranteed to be only called once during boot. Mark the deadline timer check related functions and data __init while at it. Reported-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87y2qhoshi.fsf@nanos.tec.linutronix.de
2020-04-30x86/perf: Add hardware performance events support for Zhaoxin CPU.CodyYao-oc1-0/+8
Zhaoxin CPU has provided facilities for monitoring performance via PMU (Performance Monitor Unit), but the functionality is unused so far. Therefore, add support for zhaoxin pmu to make performance related hardware events available. The PMU is mostly an Intel Architectural PerfMon-v2 with a novel errata for the ZXC line. It supports the following events: ----------------------------------------------------------------------------------------------------------------------------------- Event | Event | Umask | Description | Select | | ----------------------------------------------------------------------------------------------------------------------------------- cpu-cycles | 82h | 00h | unhalt core clock instructions | 00h | 00h | number of instructions at retirement. cache-references | 15h | 05h | number of fillq pushs at the current cycle. cache-misses | 1ah | 05h | number of l2 miss pushed by fillq. branch-instructions | 28h | 00h | counts the number of branch instructions retired. branch-misses | 29h | 00h | mispredicted branch instructions at retirement. bus-cycles | 83h | 00h | unhalt bus clock stalled-cycles-frontend | 01h | 01h | Increments each cycle the # of Uops issued by the RAT to RS. stalled-cycles-backend | 0fh | 04h | RS0/1/2/3/45 empty L1-dcache-loads | 68h | 05h | number of retire/commit load. L1-dcache-load-misses | 4bh | 05h | retired load uops whose data source followed an L1 miss. L1-dcache-stores | 69h | 06h | number of retire/commit Store,no LEA L1-dcache-store-misses | 62h | 05h | cache lines in M state evicted out of L1D due to Snoop HitM or dirty line replacement. L1-icache-loads | 00h | 03h | number of l1i cache access for valid normal fetch,including un-cacheable access. L1-icache-load-misses | 01h | 03h | number of l1i cache miss for valid normal fetch,including un-cacheable miss. L1-icache-prefetches | 0ah | 03h | number of prefetch. L1-icache-prefetch-misses | 0bh | 03h | number of prefetch miss. dTLB-loads | 68h | 05h | number of retire/commit load dTLB-load-misses | 2ch | 05h | number of load operations miss all level tlbs and cause a tablewalk. dTLB-stores | 69h | 06h | number of retire/commit Store,no LEA dTLB-store-misses | 30h | 05h | number of store operations miss all level tlbs and cause a tablewalk. dTLB-prefetches | 64h | 05h | number of hardware pte prefetch requests dispatched out of the prefetch FIFO. dTLB-prefetch-misses | 65h | 05h | number of hardware pte prefetch requests miss the l1d data cache. iTLB-load | 00h | 00h | actually counter instructions. iTLB-load-misses | 34h | 05h | number of code operations miss all level tlbs and cause a tablewalk. ----------------------------------------------------------------------------------------------------------------------------------- Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: CodyYao-oc <CodyYao-oc@zhaoxin.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1586747669-4827-1-git-send-email-CodyYao-oc@zhaoxin.com
2020-04-30x86: Change {JMP,CALL}_NOSPEC argumentPeter Zijlstra2-3/+3
In order to change the {JMP,CALL}_NOSPEC macros to call out-of-line versions of the retpoline magic, we need to remove the '%' from the argument, such that we can paste it onto symbol names. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20200428191700.151623523@infradead.org
2020-04-30Merge branch 'x86/asm' of ↵Will Deacon3-9/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-next/asm As agreed with Boris, merge in the 'x86/asm' branch from -tip so that we can select the new 'ARCH_USE_SYM_ANNOTATIONS' Kconfig symbol, which is required by the BTI kernel patches. * 'x86/asm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Provide a Kconfig symbol for disabling old assembly annotations x86/32: Remove CONFIG_DOUBLEFAULT
2020-04-28Merge branch 'work.sysctl' of ↵Daniel Borkmann1-2/+1
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull in Christoph Hellwig's series that changes the sysctl's ->proc_handler methods to take kernel pointers instead. It gets rid of the set_fs address space overrides used by BPF. As per discussion, pull in the feature branch into bpf-next as it relates to BPF sysctl progs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200427071508.GV23230@ZenIV.linux.org.uk/T/
2020-04-28x86/tboot: Mark tboot staticChristoph Hellwig1-2/+6
This structure is only really used in tboot.c. The only exception is a single tboot_enabled check, but for that we don't need an inline function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200428051703.1625952-1-hch@lst.de
2020-04-27x86/setup: Add an initrdmem= option to specify initrd physical addressRonald G. Minnich1-0/+6
Add the initrdmem option: initrdmem=ss[KMG],nn[KMG] which is used to specify the physical address of the initrd, almost always an address in FLASH. Also add code for x86 to use the existing phys_init_start and phys_init_size variables in the kernel. This is useful in cases where a kernel and an initrd is placed in FLASH, but there is no firmware file system structure in the FLASH. One such situation occurs when unused FLASH space on UEFI systems has been reclaimed by, e.g., taking it from the Management Engine. For example, on many systems, the ME is given half the FLASH part; not only is 2.75M of an 8M part unused; but 10.75M of a 16M part is unused. This space can be used to contain an initrd, but need to tell Linux where it is. This space is "raw": due to, e.g., UEFI limitations: it can not be added to UEFI firmware volumes without rebuilding UEFI from source or writing a UEFI device driver. It can be referenced only as a physical address and size. At the same time, if a kernel can be "netbooted" or loaded from GRUB or syslinux, the option of not using the physical address specification should be available. Then, it is easy to boot the kernel and provide an initrd; or boot the the kernel and let it use the initrd in FLASH. In practice, this has proven to be very helpful when integrating Linux into FLASH on x86. Hence, the most flexible and convenient path is to enable the initrdmem command line option in a way that it is the last choice tried. For example, on the DigitalLoggers Atomic Pi, an image into FLASH can be burnt in with a built-in command line which includes: initrdmem=0xff968000,0x200000 which specifies a location and size. [ bp: Massage commit message, make it passive. ] [akpm@linux-foundation.org: coding style fixes] Signed-off-by: Ronald G. Minnich <rminnich@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Link: http://lkml.kernel.org/r/CAP6exYLK11rhreX=6QPyDQmW7wPHsKNEFtXE47pjx41xS6O7-A@mail.gmail.com Link: https://lkml.kernel.org/r/20200426011021.1cskg0AGd%akpm@linux-foundation.org
2020-04-27sysctl: pass kernel pointers to ->proc_handlerChristoph Hellwig1-2/+1
Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-26x86/cpu: Export native_write_cr4() only when CONFIG_LKTDM=mThomas Gleixner1-0/+2
Modules have no business poking into this but fixing this is for later. [ bp: Carve out from an earlier patch. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200421092558.939985695@linutronix.de
2020-04-26x86/tlb: Move __flush_tlb_one_user() out of lineThomas Gleixner1-5/+0
cpu_tlbstate is exported because various TLB-related functions need access to it, but cpu_tlbstate is sensitive information which should only be accessed by well-contained kernel functions and not be directly exposed to modules. As a third step, move _flush_tlb_one_user() out of line and hide the native function. The latter can be static when CONFIG_PARAVIRT is disabled. Consolidate the name space while at it and remove the pointless extra wrapper in the paravirt code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200421092559.428213098@linutronix.de
2020-04-26x86/tlb: Move __flush_tlb_global() out of lineThomas Gleixner1-9/+0
cpu_tlbstate is exported because various TLB-related functions need access to it, but cpu_tlbstate is sensitive information which should only be accessed by well-contained kernel functions and not be directly exposed to modules. As a second step, move __flush_tlb_global() out of line and hide the native function. The latter can be static when CONFIG_PARAVIRT is disabled. Consolidate the namespace while at it and remove the pointless extra wrapper in the paravirt code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200421092559.336916818@linutronix.de
2020-04-26x86/tlb: Move __flush_tlb() out of lineThomas Gleixner2-8/+3
cpu_tlbstate is exported because various TLB-related functions need access to it, but cpu_tlbstate is sensitive information which should only be accessed by well-contained kernel functions and not be directly exposed to modules. As a first step, move __flush_tlb() out of line and hide the native function. The latter can be static when CONFIG_PARAVIRT is disabled. Consolidate the namespace while at it and remove the pointless extra wrapper in the paravirt code. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200421092559.246130908@linutronix.de
2020-04-25x86/unwind/orc: Fix premature unwind stoppage due to IRET framesJosh Poimboeuf1-11/+40
The following execution path is possible: fsnotify() [ realign the stack and store previous SP in R10 ] <IRQ> [ only IRET regs saved ] common_interrupt() interrupt_entry() <NMI> [ full pt_regs saved ] ... [ unwind stack ] When the unwinder goes through the NMI and the IRQ on the stack, and then sees fsnotify(), it doesn't have access to the value of R10, because it only has the five IRET registers. So the unwind stops prematurely. However, because the interrupt_entry() code is careful not to clobber R10 before saving the full regs, the unwinder should be able to read R10 from the previously saved full pt_regs associated with the NMI. Handle this case properly. When encountering an IRET regs frame immediately after a full pt_regs frame, use the pt_regs as a backup which can be used to get the C register values. Also, note that a call frame resets the 'prev_regs' value, because a function is free to clobber the registers. For this fix to work, the IRET and full regs frames must be adjacent, with no FUNC frames in between. So replace the FUNC hint in interrupt_entry() with an IRET_REGS hint. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com
2020-04-25x86/unwind/orc: Fix error path for bad ORC entry typeJosh Poimboeuf1-1/+1
If the ORC entry type is unknown, nothing else can be done other than reporting an error. Exit the function instead of breaking out of the switch statement. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/a7fa668ca6eabbe81ab18b2424f15adbbfdc810a.1587808742.git.jpoimboe@redhat.com
2020-04-25x86/unwind/orc: Prevent unwinding before ORC initializationJosh Poimboeuf1-3/+3
If the unwinder is called before the ORC data has been initialized, orc_find() returns NULL, and it tries to fall back to using frame pointers. This can cause some unexpected warnings during boot. Move the 'orc_init' check from orc_find() to __unwind_init(), so that it doesn't even try to unwind from an uninitialized state. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/069d1499ad606d85532eb32ce39b2441679667d5.1587808742.git.jpoimboe@redhat.com
2020-04-25x86/unwind/orc: Don't skip the first frame for inactive tasksMiroslav Benes1-1/+1
When unwinding an inactive task, the ORC unwinder skips the first frame by default. If both the 'regs' and 'first_frame' parameters of unwind_start() are NULL, 'state->sp' and 'first_frame' are later initialized to the same value for an inactive task. Given there is a "less than or equal to" comparison used at the end of __unwind_start() for skipping stack frames, the first frame is skipped. Drop the equal part of the comparison and make the behavior equivalent to the frame pointer unwinder. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/7f08db872ab59e807016910acdbe82f744de7065.1587808742.git.jpoimboe@redhat.com
2020-04-25x86/unwind: Prevent false warnings for non-current tasksJosh Poimboeuf3-18/+28
There's some daring kernel code out there which dumps the stack of another task without first making sure the task is inactive. If the task happens to be running while the unwinder is reading the stack, unusual unwinder warnings can result. There's no race-free way for the unwinder to know whether such a warning is legitimate, so just disable unwinder warnings for all non-current tasks. Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/ec424a2aea1d461eb30cab48a28c6433de2ab784.1587808742.git.jpoimboe@redhat.com