summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)AuthorFilesLines
2021-06-18Merge branch 'sched/urgent' into sched/core, to resolve conflictsIngo Molnar6-91/+99
This commit in sched/urgent moved the cfs_rq_is_decayed() function: a7b359fc6a37: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle") and this fresh commit in sched/core modified it in the old location: 9e077b52d86a: ("sched/pelt: Check that *_avg are null when *_sum are") Merge the two variants. Conflicts: kernel/sched/fair.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-17hyperv: Detect Nested virtualization support for SVMVineeth Pillai1-3/+7
Previously, to detect nested virtualization enlightenment support, we were using HV_X64_ENLIGHTENED_VMCS_RECOMMENDED feature bit of HYPERV_CPUID_ENLIGHTMENT_INFO.EAX CPUID as docuemented in TLFS: "Bit 14: Recommend a nested hypervisor using the enlightened VMCS interface. Also indicates that additional nested enlightenments may be available (see leaf 0x4000000A)". Enlightened VMCS, however, is an Intel only feature so the above detection method doesn't work for AMD. So, use the HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS.EAX CPUID information ("The maximum input value for hypervisor CPUID information.") and this works for both AMD and Intel. Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Message-Id: <43b25ff21cd2d9a51582033c9bdd895afefac056.1622730232.git.viremana@linux.microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15x86/sgx: Add missing xa_destroy() when virtual EPC is destroyedKai Huang1-0/+1
xa_destroy() needs to be called to destroy a virtual EPC's page array before calling kfree() to free the virtual EPC. Currently it is not called so add the missing xa_destroy(). Fixes: 540745ddbc70 ("x86/sgx: Introduce virtual EPC for use by KVM guests") Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@intel.com> Tested-by: Yang Zhong <yang.zhong@intel.com> Link: https://lkml.kernel.org/r/20210615101639.291929-1-kai.huang@intel.com
2021-06-15x86/tsx: Clear CPUID bits when TSX always force abortsPawan Gupta3-3/+40
As a result of TSX deprecation, some processors always abort TSX transactions by default after a microcode update. When TSX feature cannot be used it is better to hide it. Clear CPUID.RTM and CPUID.HLE bits when TSX transactions always abort. [ bp: Massage commit message and comments. ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Link: https://lkml.kernel.org/r/5209b3d72ffe5bd3cafdcc803f5b883f785329c3.1623704845.git-series.pawan.kumar.gupta@linux.intel.com
2021-06-15x86/sev: Propagate #GP if getting linear instruction address failedJoerg Roedel1-1/+8
When an instruction is fetched from user-space, segmentation needs to be taken into account. This means that getting the linear address of an instruction can fail. Hardware would raise a #GP exception in that case, but the #VC exception handler would emulate it as a page-fault. The insn_fetch_from_user*() functions now provide the relevant information in case of a failure. Use that and propagate a #GP when the linear address of an instruction to fetch could not be calculated. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210614135327.9921-7-joro@8bytes.org
2021-06-15x86/insn: Extend error reporting from insn_fetch_from_user[_inatomic]()Joerg Roedel2-10/+8
The error reporting from the insn_fetch_from_user*() functions is not very verbose. Extend it to include information on whether the linear RIP could not be calculated or whether the memory access faulted. This will be used in the SEV-ES code to propagate the correct exception depending on what went wrong during instruction fetch. [ bp: Massage comments. ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210614135327.9921-6-joro@8bytes.org
2021-06-15x86/sev: Fix error message in runtime #VC handlerJoerg Roedel1-1/+1
The runtime #VC handler is not "early" anymore. Fix the copy&paste error and remove that word from the error message. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210614135327.9921-2-joro@8bytes.org
2021-06-12Merge tag 'perf-urgent-2021-06-12' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: - Fix the NMI watchdog on ancient Intel CPUs - Remove a misguided, NMI-unsafe KASAN callback from the NMI-safe irq_work path used by perf. - Fix uncore events on Ice Lake servers. - Someone booted maxcpus=1 on an SNB-EP, and the uncore driver emitted warnings and was probably buggy. Fix it. - KCSAN found a genuine data race in the core perf code. Somewhat ironically the bug was introduced through a recent race fix. :-/ In our defense, the new race window was much more narrow. Fix it" * tag 'perf-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/nmi_watchdog: Fix old-style NMI watchdog regression on old Intel CPUs irq_work: Make irq_work_queue() NMI-safe again perf/x86/intel/uncore: Fix M2M event umask for Ice Lake server perf/x86/intel/uncore: Fix a kernel WARNING triggered by maxcpus=1 perf: Fix data race between pin_count increment/decrement
2021-06-11x86/sgx: Correct kernel-doc's arg name in sgx_encl_release()ChenXiaoSong1-1/+1
Fix the following kernel-doc warning: arch/x86/kernel/cpu/sgx/encl.c:392: warning: Function parameter \ or member 'ref' not described in 'sgx_encl_release' [ bp: Massage commit message. ] Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210609035510.2083694-1-chenxiaosong2@huawei.com
2021-06-10x86/nmi_watchdog: Fix old-style NMI watchdog regression on old Intel CPUsCodyYao-oc1-2/+2
The following commit: 3a4ac121c2ca ("x86/perf: Add hardware performance events support for Zhaoxin CPU.") Got the old-style NMI watchdog logic wrong and broke it for basically every Intel CPU where it was active. Which is only truly old CPUs, so few people noticed. On CPUs with perf events support we turn off the old-style NMI watchdog, so it was pretty pointless to add the logic for X86_VENDOR_ZHAOXIN to begin with ... :-/ Anyway, the fix is to restore the old logic and add a 'break'. [ mingo: Wrote a new changelog. ] Fixes: 3a4ac121c2ca ("x86/perf: Add hardware performance events support for Zhaoxin CPU.") Signed-off-by: CodyYao-oc <CodyYao-oc@zhaoxin.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210607025335.9643-1-CodyYao-oc@zhaoxin.com
2021-06-10x86/fpu: Reset state for all signal restore failuresThomas Gleixner1-11/+15
If access_ok() or fpregs_soft_set() fails in __fpu__restore_sig() then the function just returns but does not clear the FPU state as it does for all other fatal failures. Clear the FPU state for these failures as well. Fixes: 72a671ced66d ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87mtryyhhz.ffs@nanos.tec.linutronix.de
2021-06-10Merge tag 'drm-intel-next-2021-06-09' of ↵Dave Airlie1-0/+1
git://anongit.freedesktop.org/drm/drm-intel into drm-next Cross-subsystem Changes: - x86/gpu: add JasperLake to gen11 early quirks (Although the patch lacks the Ack info, it has been Acked by Borislav) Driver Changes: - General DMC improves (Anusha) - More ADL-P enabling (Vandita, Matt, Jose, Mika, Anusha, Imre, Lucas, Jani, Manasi, Ville, Stanislav) - Introduce MBUS relative dbuf offset (Ville) - PSR fixes and improvements (Gwan, Jose, Ville) - Re-enable LTTPR non-transparent LT mode for DPCD_REV < 1.4 (Ville) - Remove duplicated declarations (Shaokun, Wan) - Check HDMI sink deep color capabilities during .mode_valid (Ville) - Fix display flicker screan related to console and FBC (Chris) - Remaining conversions of GRAPHICS_VER (Lucas) - Drop invalid FIXME (Jose) - Fix bigjoiner check in dsc_disable (Vandita) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YMEy2Ew82BeL/hDK@intel.com
2021-06-09x86/fpu: Add address range checks to copy_user_to_xstate()Andy Lutomirski1-3/+3
copy_user_to_xstate() uses __copy_from_user(), which provides a negligible speedup. Fortunately, both call sites are at least almost correct. __fpu__restore_sig() checks access_ok() with xstate_sigframe_size() length and ptrace regset access uses fpu_user_xstate_size. These should be valid upper bounds on the length, so, at worst, this would cause spurious failures and not accesses to kernel memory. Nonetheless, this is far more fragile than necessary and none of these callers are in a hotpath. Use copy_from_user() instead. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Rik van Riel <riel@surriel.com> Link: https://lkml.kernel.org/r/20210608144346.140254130@linutronix.de
2021-06-09x86/fpu: Invalidate FPU state after a failed XRSTOR from a user bufferAndy Lutomirski1-0/+19
Both Intel and AMD consider it to be architecturally valid for XRSTOR to fail with #PF but nonetheless change the register state. The actual conditions under which this might occur are unclear [1], but it seems plausible that this might be triggered if one sibling thread unmaps a page and invalidates the shared TLB while another sibling thread is executing XRSTOR on the page in question. __fpu__restore_sig() can execute XRSTOR while the hardware registers are preserved on behalf of a different victim task (using the fpu_fpregs_owner_ctx mechanism), and, in theory, XRSTOR could fail but modify the registers. If this happens, then there is a window in which __fpu__restore_sig() could schedule out and the victim task could schedule back in without reloading its own FPU registers. This would result in part of the FPU state that __fpu__restore_sig() was attempting to load leaking into the victim task's user-visible state. Invalidate preserved FPU registers on XRSTOR failure to prevent this situation from corrupting any state. [1] Frequent readers of the errata lists might imagine "complex microarchitectural conditions". Fixes: 1d731e731c4c ("x86/fpu: Add a fastpath to __fpu__restore_sig()") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Rik van Riel <riel@surriel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210608144345.758116583@linutronix.de
2021-06-09x86/fpu: Prevent state corruption in __fpu__restore_sig()Thomas Gleixner1-8/+1
The non-compacted slowpath uses __copy_from_user() and copies the entire user buffer into the kernel buffer, verbatim. This means that the kernel buffer may now contain entirely invalid state on which XRSTOR will #GP. validate_user_xstate_header() can detect some of that corruption, but that leaves the onus on callers to clear the buffer. Prior to XSAVES support, it was possible just to reinitialize the buffer, completely, but with supervisor states that is not longer possible as the buffer clearing code split got it backwards. Fixing that is possible but not corrupting the state in the first place is more robust. Avoid corruption of the kernel XSAVE buffer by using copy_user_to_xstate() which validates the XSAVE header contents before copying the actual states to the kernel. copy_user_to_xstate() was previously only called for compacted-format kernel buffers, but it works for both compacted and non-compacted forms. Using it for the non-compacted form is slower because of multiple __copy_from_user() operations, but that cost is less important than robust code in an already slow path. [ Changelog polished by Dave Hansen ] Fixes: b860eb8dce59 ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates") Reported-by: syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Rik van Riel <riel@surriel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210608144345.611833074@linutronix.de
2021-06-08x86/setup: Document that Windows reserves the first MiBBorislav Petkov1-10/+11
It does so unconditionally too, on Intel and AMD machines, to work around BIOS bugs, as confirmed by Microsoft folks (see Link for full details). Reflow the paragraph, while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/MWHPR21MB159330952629D36EEDE706B3D7379@MWHPR21MB1593.namprd21.prod.outlook.com
2021-06-08x86/gpu: add JasperLake to gen11 early quirksTejas Upadhyay1-0/+1
Let's reserve JSL stolen memory for graphics. JasperLake is a gen11 platform which is compatible with ICL/EHL changes. This was missed in commit 24ea098b7c0d ("drm/i915/jsl: Split EHL/JSL platform info and PCI ids") V2: - Added maintainer list in cc - Added patch ref in commit message V1: - Added Cc: x86@kernel.org Fixes: 24ea098b7c0d ("drm/i915/jsl: Split EHL/JSL platform info and PCI ids") Cc: <stable@vger.kernel.org> # v5.11+ Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210608053411.394166-1-tejaskumarx.surendrakumar.upadhyay@intel.com
2021-06-07x86/crash: Remove crash_reserve_low_1M()Mike Rapoport1-13/+0
The entire memory range under 1M is unconditionally reserved in setup_arch(), so there is no need for crash_reserve_low_1M() anymore. Remove this function. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210601075354.5149-4-rppt@kernel.org
2021-06-07x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= optionsMike Rapoport1-24/+0
The CONFIG_X86_RESERVE_LOW build time and reservelow= command line option allowed to control the amount of memory under 1M that would be reserved at boot to avoid using memory that can be potentially clobbered by BIOS. Since the entire range under 1M is always reserved there is no need for these options anymore and they can be removed. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210601075354.5149-3-rppt@kernel.org
2021-06-07Merge tag 'v5.13-rc5' into x86/cleanupsBorislav Petkov16-166/+230
Pick up dependent changes in order to base further cleanups ontop. Signed-off-by: Borislav Petkov <bp@suse.de>
2021-06-03x86/setup: Always reserve the first 1M of RAMMike Rapoport1-14/+21
There are BIOSes that are known to corrupt the memory under 1M, or more precisely under 640K because the memory above 640K is anyway reserved for the EGA/VGA frame buffer and BIOS. To prevent usage of the memory that will be potentially clobbered by the kernel, the beginning of the memory is always reserved. The exact size of the reserved area is determined by CONFIG_X86_RESERVE_LOW build time and the "reservelow=" command line option. The reserved range may be from 4K to 640K with the default of 64K. There are also configurations that reserve the entire 1M range, like machines with SandyBridge graphic devices or systems that enable crash kernel. In addition to the potentially clobbered memory, EBDA of unknown size may be as low as 128K and the memory above that EBDA start is also reserved early. It would have been possible to reserve the entire range under 1M unless for the real mode trampoline that must reside in that area. To accommodate placement of the real mode trampoline and keep the memory safe from being clobbered by BIOS, reserve the first 64K of RAM before memory allocations are possible and then, after the real mode trampoline is allocated, reserve the entire range from 0 to 1M. Update trim_snb_memory() and reserve_real_mode() to avoid redundant reservations of the same memory range. Also make sure the memory under 1M is not getting freed by efi_free_boot_services(). [ bp: Massage commit message and comments. ] Fixes: a799c2bd29d1 ("x86/setup: Consolidate early memory reservations") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Hugh Dickins <hughd@google.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=213177 Link: https://lkml.kernel.org/r/20210601075354.5149-2-rppt@kernel.org
2021-06-03Merge branch 'sched/urgent' into sched/core, to pick up fixesIngo Molnar11-77/+133
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-03x86/alternative: Optimize single-byte NOPs at an arbitrary positionBorislav Petkov1-18/+46
Up until now the assumption was that an alternative patching site would have some instructions at the beginning and trailing single-byte NOPs (0x90) padding. Therefore, the patching machinery would go and optimize those single-byte NOPs into longer ones. However, this assumption is broken on 32-bit when code like hv_do_hypercall() in hyperv_init() would use the ratpoline speculation killer CALL_NOSPEC. The 32-bit version of that macro would align certain insns to 16 bytes, leading to the compiler issuing a one or more single-byte NOPs, depending on the holes it needs to fill for alignment. That would lead to the warning in optimize_nops() to fire: ------------[ cut here ]------------ Not a NOP at 0xc27fb598 WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:211 optimize_nops.isra.13 due to that function verifying whether all of the following bytes really are single-byte NOPs. Therefore, carve out the NOP padding into a separate function and call it for each NOP range beginning with a single-byte NOP. Fixes: 23c1ad538f4f ("x86/alternatives: Optimize optimize_nops()") Reported-by: Richard Narron <richard@aaazen.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=213301 Link: https://lkml.kernel.org/r/20210601212125.17145-1-bp@alien8.de
2021-06-03x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()Thomas Gleixner1-57/+0
While digesting the XSAVE-related horrors which got introduced with the supervisor/user split, the recent addition of ENQCMD-related functionality got on the radar and turned out to be similarly broken. update_pasid(), which is only required when X86_FEATURE_ENQCMD is available, is invoked from two places: 1) From switch_to() for the incoming task 2) Via a SMP function call from the IOMMU/SMV code #1 is half-ways correct as it hacks around the brokenness of get_xsave_addr() by enforcing the state to be 'present', but all the conditionals in that code are completely pointless for that. Also the invocation is just useless overhead because at that point it's guaranteed that TIF_NEED_FPU_LOAD is set on the incoming task and all of this can be handled at return to user space. #2 is broken beyond repair. The comment in the code claims that it is safe to invoke this in an IPI, but that's just wishful thinking. FPU state of a running task is protected by fregs_lock() which is nothing else than a local_bh_disable(). As BH-disabled regions run usually with interrupts enabled the IPI can hit a code section which modifies FPU state and there is absolutely no guarantee that any of the assumptions which are made for the IPI case is true. Also the IPI is sent to all CPUs in mm_cpumask(mm), but the IPI is invoked with a NULL pointer argument, so it can hit a completely unrelated task and unconditionally force an update for nothing. Worse, it can hit a kernel thread which operates on a user space address space and set a random PASID for it. The offending commit does not cleanly revert, but it's sufficient to force disable X86_FEATURE_ENQCMD and to remove the broken update_pasid() code to make this dysfunctional all over the place. Anything more complex would require more surgery and none of the related functions outside of the x86 core code are blatantly wrong, so removing those would be overkill. As nothing enables the PASID bit in the IA32_XSS MSR yet, which is required to make this actually work, this cannot result in a regression except for related out of tree train-wrecks, but they are broken already today. Fixes: 20f0afd1fb3d ("x86/mmu: Allocate/free a PASID") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87mtsd6gr9.ffs@nanos.tec.linutronix.de
2021-06-03kprobes: Do not increment probe miss count in the fault handlerNaveen N. Rao1-8/+0
Kprobes has a counter 'nmissed', that is used to count the number of times a probe handler was not called. This generally happens when we hit a kprobe while handling another kprobe. However, if one of the probe handlers causes a fault, we are currently incrementing 'nmissed'. The comment in fault handler indicates that this can be used to account faults taken by the probe handlers. But, this has never been the intention as is evident from the comment above 'nmissed' in 'struct kprobe': /*count the number of times this probe was temporarily disarmed */ unsigned long nmissed; Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20210601120150.672652-1-naveen.n.rao@linux.vnet.ibm.com
2021-06-03x86/alternative: Align insn bytes verticallyBorislav Petkov1-2/+2
For easier inspection which bytes have changed. For example: feat: 7*32+12, old: (__x86_indirect_thunk_r10+0x0/0x20 (ffffffff81c02480) len: 17), repl: (ffffffff897813aa, len: 17) ffffffff81c02480: old_insn: 41 ff e2 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ffffffff897813aa: rpl_insn: e8 07 00 00 00 f3 90 0f ae e8 eb f9 4c 89 14 24 c3 ffffffff81c02480: final_insn: e8 07 00 00 00 f3 90 0f ae e8 eb f9 4c 89 14 24 c3 No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210601193713.16190-1-bp@alien8.de
2021-06-02x86/hyperv: fix logical processor creationPraveen Kumar1-1/+1
Microsoft Hypervisor expects the logical processor index to be the same as CPU's index during logical processor creation. Using cpu_physical_id confuses hypervisor's scheduler. That causes the root partition not boot when core scheduler is used. This patch removes the call to cpu_physical_id and uses the CPU index directly for bringing up logical processor. This scheme works for both classic scheduler and core scheduler. Fixes: 333abaf5abb3 (x86/hyperv: implement and use hv_smp_prepare_cpus) Signed-off-by: Praveen Kumar <kumarpraveen@linux.microsoft.com> Link: https://lore.kernel.org/r/20210531074046.113452-1-kumarpraveen@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-06-01perf/x86/rapl: Use CPUID bit on AMD and Hygon partsAndrew Cooper2-0/+8
AMD and Hygon CPUs have a CPUID bit for RAPL. Drop the fam17h suffix as it is stale already. Make use of this instead of a model check to work more nicely in virtual environments where RAPL typically isn't available. [ bp: drop the ../cpu/powerflags.c hunk which is superfluous as the "rapl" bit name appears already in flags. ] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210514135920.16093-1-andrew.cooper3@citrix.com
2021-06-01kprobes: Remove kprobe::fault_handlerPeter Zijlstra1-10/+0
The reason for kprobe::fault_handler(), as given by their comment: * We come here because instructions in the pre/post * handler caused the page_fault, this could happen * if handler tries to access user space by * copy_from_user(), get_user() etc. Let the * user-specified handler try to fix it first. Is just plain bad. Those other handlers are ran from non-preemptible context and had better use _nofault() functions. Also, there is no upstream usage of this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210525073213.561116662@infradead.org
2021-05-31x86/thermal: Fix LVT thermal setup for SMI delivery modeBorislav Petkov1-0/+9
There are machines out there with added value crap^WBIOS which provide an SMI handler for the local APIC thermal sensor interrupt. Out of reset, the BSP on those machines has something like 0x200 in that APIC register (timestamps left in because this whole issue is timing sensitive): [ 0.033858] read lvtthmr: 0x330, val: 0x200 which means: - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled - bits [10:8] have 010b which means SMI delivery mode. Now, later during boot, when the kernel programs the local APIC, it soft-disables it temporarily through the spurious vector register: setup_local_APIC: ... /* * If this comes from kexec/kcrash the APIC might be enabled in * SPIV. Soft disable it before doing further initialization. */ value = apic_read(APIC_SPIV); value &= ~APIC_SPIV_APIC_ENABLED; apic_write(APIC_SPIV, value); which means (from the SDM): "10.4.7.2 Local APIC State After It Has Been Software Disabled ... * The mask bits for all the LVT entries are set. Attempts to reset these bits will be ignored." And this happens too: [ 0.124111] APIC: Switch to symmetric I/O mode setup [ 0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0 [ 0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0 This results in CPU 0 soft lockups depending on the placement in time when the APIC soft-disable happens. Those soft lockups are not 100% reproducible and the reason for that can only be speculated as no one tells you what SMM does. Likely, it confuses the SMM code that the APIC is disabled and the thermal interrupt doesn't doesn't fire at all, leading to CPU 0 stuck in SMM forever... Now, before 4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()") due to how the APIC_LVTTHMR was read before APIC initialization in mcheck_intel_therm_init(), it would read the value with the mask bit 16 clear and then intel_init_thermal() would replicate it onto the APs and all would be peachy - the thermal interrupt would remain enabled. But that commit moved that reading to a later moment in intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too late and with its interrupt mask bit set. Thus, revert back to the old behavior of reading the thermal LVT register before the APIC gets initialized. Fixes: 4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()") Reported-by: James Feeney <james@nurealm.net> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
2021-05-31x86/cstate: Allow ACPI C1 FFH MWAIT use on Hygon systemsPu Wen1-1/+2
Hygon systems support the MONITOR/MWAIT instructions and these can be used for ACPI C1 in the same way as on AMD and Intel systems. The BIOS declares a C1 state in _CST to use FFH and CPUID_Fn00000005_EDX is non-zero on Hygon systems. Allow ffh_cstate_init() to succeed on Hygon systems to default using FFH MWAIT instead of HALT for ACPI C1. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210528081417.31474-1-puwen@hygon.cn
2021-05-29x86/apic: Mark _all_ legacy interrupts when IO/APIC is missingThomas Gleixner2-0/+21
PIC interrupts do not support affinity setting and they can end up on any online CPU. Therefore, it's required to mark the associated vectors as system-wide reserved. Otherwise, the corresponding irq descriptors are copied to the secondary CPUs but the vectors are not marked as assigned or reserved. This works correctly for the IO/APIC case. When the IO/APIC is disabled via config, kernel command line or lack of enumeration then all legacy interrupts are routed through the PIC, but nothing marks them as system-wide reserved vectors. As a consequence, a subsequent allocation on a secondary CPU can result in allocating one of these vectors, which triggers the BUG() in apic_update_vector() because the interrupt descriptor slot is not empty. Imran tried to work around that by marking those interrupts as allocated when a CPU comes online. But that's wrong in case that the IO/APIC is available and one of the legacy interrupts, e.g. IRQ0, has been switched to PIC mode because then marking them as allocated will fail as they are already marked as system vectors. Stay consistent and update the legacy vectors after attempting IO/APIC initialization and mark them as system vectors in case that no IO/APIC is available. Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Imran Khan <imran.f.khan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.com
2021-05-28x86/mce: Include a MCi_MISC value in faked mce logsTony Luck1-1/+2
When BIOS reports memory errors to Linux using the ACPI/APEI error reporting method Linux creates a "struct mce" to pass to the normal reporting code path. The constructed record doesn't include a value for the "misc" field of the structure, and so mce_usable_address() says this record doesn't include a valid address. Net result is that functions like uc_decode_notifier() will just ignore this record instead of taking action to offline a page. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210527222846.931851-1-tony.luck@intel.com
2021-05-27x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank typesMuralidhara M K1-21/+34
Add the (HWID, MCATYPE) tuples and names for new SMCA bank types. Also, add their respective error descriptions to the MCE decoding module edac_mce_amd. Also while at it, optimize the string names for some SMCA banks. [ bp: Drop repeated comments, explain why UMC_V2 is a separate entry. ] Signed-off-by: Muralidhara M K <muralimk@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lkml.kernel.org/r/20210526164601.66228-1-nchatrad@amd.com
2021-05-27Merge v5.13-rc3 into drm-nextDaniel Vetter3-53/+93
drm/i915 is extremely on fire without the below revert from -rc3: commit 293837b9ac8d3021657f44c9d7a14948ec01c5d0 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed May 19 05:55:57 2021 -1000 Revert "i915: fix remap_io_sg to verify the pgprot" Backmerge so we don't have a too wide bisect window for anything that's a more involved workload than booting the driver. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-05-23Merge tag 'x86_urgent_for_v5.13_rc3' of ↵Linus Torvalds2-51/+86
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Fix how SEV handles MMIO accesses by forwarding potential page faults instead of killing the machine and by using the accessors with the exact functionality needed when accessing memory. - Fix a confusion with Clang LTO compiler switches passed to the it - Handle the case gracefully when VMGEXIT has been executed in userspace * tag 'x86_urgent_for_v5.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev-es: Use __put_user()/__get_user() for data accesses x86/sev-es: Forward page-faults which happen during emulation x86/sev-es: Don't return NULL from sev_es_get_ghcb() x86/build: Fix location of '-plugin-opt=' flags x86/sev-es: Invalidate the GHCB after completing VMGEXIT x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch
2021-05-21Merge branch 'for-v5.13-rc3' of ↵Linus Torvalds1-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo fix from Eric Biederman: "During the merge window an issue with si_perf and the siginfo ABI came up. The alpha and sparc siginfo structure layout had changed with the addition of SIGTRAP TRAP_PERF and the new field si_perf. The reason only alpha and sparc were affected is that they are the only architectures that use si_trapno. Looking deeper it was discovered that si_trapno is used for only a few select signals on alpha and sparc, and that none of the other _sigfault fields past si_addr are used at all. Which means technically no regression on alpha and sparc. While the alignment concerns might be dismissed the abuse of si_errno by SIGTRAP TRAP_PERF does have the potential to cause regressions in existing userspace. While we still have time before userspace starts using and depending on the new definition siginfo for SIGTRAP TRAP_PERF this set of changes cleans up siginfo_t. - The si_trapno field is demoted from magic alpha and sparc status and made an ordinary union member of the _sigfault member of siginfo_t. Without moving it of course. - si_perf is replaced with si_perf_data and si_perf_type ending the abuse of si_errno. - Unnecessary additions to signalfd_siginfo are removed" * 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo signal: Deliver all of the siginfo perf data in _perf signal: Factor force_sig_perf out of perf_sigtrap signal: Implement SIL_FAULT_TRAPNO siginfo: Move si_trapno inside the union inside _si_fault
2021-05-21x86/kexec: Set_[gi]dt() -> native_[gi]dt_invalidate() in machine_kexec_*.cH. Peter Anvin (Intel)2-44/+4
These files contain private set_gdt() functions which are only used to invalid the gdt; machine_kexec_64.c also contains a set_idt() function to invalidate the idt. phys_to_virt(0) *really* doesn't make any sense for creating an invalid GDT. A NULL pointer (virtual 0) makes a lot more sense; although neither will allow any actual memory reference, a NULL pointer stands out more. Replace these calls with native_[gi]dt_invalidate(). Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210519212154.511983-7-hpa@zytor.com
2021-05-21x86/idt: Remove address argument from idt_invalidate()H. Peter Anvin (Intel)3-5/+4
There is no reason to specify any specific address to idt_invalidate(). It looks mostly like an artifact of unifying code done differently by accident. The most "sensible" address to set here is a NULL pointer - virtual address zero, just as a visual marker. This also makes it possible to mark the struct desc_ptr in idt_invalidate() as static const. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210519212154.511983-5-hpa@zytor.com
2021-05-21x86/amd_nb: Add AMD family 19h model 50h PCI idsDavid Bartley1-0/+3
This is required to support Zen3 APUs in k10temp. Signed-off-by: David Bartley <andareed@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Wei Huang <wei.huang2@amd.com> Link: https://lkml.kernel.org/r/20210520174130.94954-1-andareed@gmail.com
2021-05-21Merge tag 'drm-intel-next-2021-05-19-1' of ↵Dave Airlie1-0/+1
git://anongit.freedesktop.org/drm/drm-intel into drm-next Core Changes: - drm: Rename DP_PSR_SELECTIVE_UPDATE to better mach eDP spec (Jose). Driver Changes: - Display plane clock rates fixes and improvements (Ville). - Uninint DMC FW loader state during shutdown (Imre). - Convert snprintf to sysfs_emit (Xuezhi). - Fix invalid access to ACPI _DSM objects (Takashi). - A big refactor around how i915 addresses the graphics and display IP versions. (Matt, Lucas). - Backlight fix (Lyude). - Display watermark and DBUF fixes (Ville). - HDCP fix (Anshuman). - Improve cases where display is not available (Jose). - Defeature PSR2 for RKL and ALD-S (Jose). - VLV DSI panel power fixes and improvements (Hans). - display-12 workaround (Jose). - Fix modesetting (Imre). - Drop redundant address-of op before lttpr_common_caps array (Imre). - Fix compiler checks (Jose, Jason). - GLK display fixes (Ville). - Fix error code returns (Dan). - eDP novel: back again to slow and wide link training everywhere (Kai-Heng). - Abstract DMC FW path (Rodrigo). - Preparation and changes for upcoming XeLPD display IP (Jose, Matt, Ville, Juha-Pekka, Animesh). - Fix comment typo in DSI code (zuoqilin). - Simplify CCS and UV plane alignment handling (Imre). - PSR Fixes on TGL (Gwan-gyeong, Jose). - Add intel_dp_hdcp.h and rename init (Jani). - Move crtc and dpll declarations around (Jani). - Fix pre-skl DP AUX precharge length (Ville). - Remove stray newlines from random files (Ville). - crtc->index and intel_crtc+drm_crtc pointer clean-up (Ville). - Add frontbuffer tracking tracepoints (Ville). - ADL-S PCI ID updates (Anand). - Use unique backlight device names (Jani). - A few clean-ups on i915/audio (Jani). - Use intel_framebuffer instead of drm one on intel_fb functions (Imre). - Add the missing MC CCS/XYUV8888 format support on display >= 12 (Imre). - Nuke display error state (Ville). - ADL-P initial enablement patches starting to land (Clint, Imre, Jose, Umesh, Vandita, Mika). - Display clean-up around VBT and the strap bits (Lucas). - Try YCbCr420 color when RGB fails (Werner). - More PSR fixes and improvements (Jose). - Other generic display code clean-up (Jose, Ville). - Use correct downstream caps for check Src-Ctl mode for PCON (Ankit). - Disable HiZ Raw Stall Optimization on broken gen7 (Simon). Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YKVioeu0JkUAlR7y@intel.com
2021-05-19x86/sev-es: Use __put_user()/__get_user() for data accessesJoerg Roedel1-20/+46
The put_user() and get_user() functions do checks on the address which is passed to them. They check whether the address is actually a user-space address and whether its fine to access it. They also call might_fault() to indicate that they could fault and possibly sleep. All of these checks are neither wanted nor needed in the #VC exception handler, which can be invoked from almost any context and also for MMIO instructions from kernel space on kernel memory. All the #VC handler wants to know is whether a fault happened when the access was tried. This is provided by __put_user()/__get_user(), which just do the access no matter what. Also add comments explaining why __get_user() and __put_user() are the best choice here and why it is safe to use them in this context. Also explain why copy_to/from_user can't be used. In addition, also revert commit 7024f60d6552 ("x86/sev-es: Handle string port IO to kernel memory properly") because using __get_user()/__put_user() fixes the same problem while the above commit introduced several problems: 1) It uses access_ok() which is only allowed in task context. 2) It uses memcpy() which has no fault handling at all and is thus unsafe to use here. [ bp: Fix up commit ID of the reverted commit above. ] Fixes: f980f9c31a92 ("x86/sev-es: Compile early handler code into kernel image") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210519135251.30093-4-joro@8bytes.org
2021-05-19x86/sev-es: Forward page-faults which happen during emulationJoerg Roedel1-0/+4
When emulating guest instructions for MMIO or IOIO accesses, the #VC handler might get a page-fault and will not be able to complete. Forward the page-fault in this case to the correct handler instead of killing the machine. Fixes: 0786138c78e7 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210519135251.30093-3-joro@8bytes.org
2021-05-19x86/sev-es: Don't return NULL from sev_es_get_ghcb()Joerg Roedel1-13/+12
sev_es_get_ghcb() is called from several places but only one of them checks the return value. The reaction to returning NULL is always the same: calling panic() and kill the machine. Instead of adding checks to all call sites, move the panic() into the function itself so that it will no longer return NULL. Fixes: 0786138c78e7 ("x86/sev-es: Add a Runtime #VC Exception Handler") Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.10+ Link: https://lkml.kernel.org/r/20210519135251.30093-2-joro@8bytes.org
2021-05-19x86/signal: Detect and prevent an alternate signal stack overflowChang S. Bae1-4/+20
The kernel pushes context on to the userspace stack to prepare for the user's signal handler. When the user has supplied an alternate signal stack, via sigaltstack(2), it is easy for the kernel to verify that the stack size is sufficient for the current hardware context. Check if writing the hardware context to the alternate stack will exceed it's size. If yes, then instead of corrupting user-data and proceeding with the original signal handler, an immediate SIGSEGV signal is delivered. Refactor the stack pointer check code from on_sig_stack() and use the new helper. While the kernel allows new source code to discover and use a sufficient alternate signal stack size, this check is still necessary to protect binaries with insufficient alternate signal stack size from data corruption. Fixes: c2bc11f10a39 ("x86, AVX-512: Enable AVX-512 States Context Switch") Reported-by: Florian Weimer <fweimer@redhat.com> Suggested-by: Jann Horn <jannh@google.com> Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Len Brown <len.brown@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20210518200320.17239-6-chang.seok.bae@intel.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=153531
2021-05-19x86/elf: Support a new ELF aux vector AT_MINSIGSTKSZChang S. Bae1-0/+5
Historically, signal.h defines MINSIGSTKSZ (2KB) and SIGSTKSZ (8KB), for use by all architectures with sigaltstack(2). Over time, the hardware state size grew, but these constants did not evolve. Today, literal use of these constants on several architectures may result in signal stack overflow, and thus user data corruption. A few years ago, the ARM team addressed this issue by establishing getauxval(AT_MINSIGSTKSZ). This enables the kernel to supply a value at runtime that is an appropriate replacement on current and future hardware. Add getauxval(AT_MINSIGSTKSZ) support to x86, analogous to the support added for ARM in 94b07c1f8c39 ("arm64: signal: Report signal frame size to userspace via auxv"). Also, include a documentation to describe x86-specific auxiliary vectors. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Len Brown <len.brown@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20210518200320.17239-4-chang.seok.bae@intel.com
2021-05-19x86/signal: Introduce helpers to get the maximum signal frame sizeChang S. Bae3-2/+79
Signal frames do not have a fixed format and can vary in size when a number of things change: supported XSAVE features, 32 vs. 64-bit apps, etc. Add support for a runtime method for userspace to dynamically discover how large a signal stack needs to be. Introduce a new variable, max_frame_size, and helper functions for the calculation to be used in a new user interface. Set max_frame_size to a system-wide worst-case value, instead of storing multiple app-specific values. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Len Brown <len.brown@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H.J. Lu <hjl.tools@gmail.com> Link: https://lkml.kernel.org/r/20210518200320.17239-3-chang.seok.bae@intel.com
2021-05-19signal: Deliver all of the siginfo perf data in _perfEric W. Biederman1-2/+4
Don't abuse si_errno and deliver all of the perf data in _perf member of siginfo_t. Note: The data field in the perf data structures in a u64 to allow a pointer to be encoded without needed to implement a 32bit and 64bit version of the same structure. There already exists a 32bit and 64bit versions siginfo_t, and the 32bit version can not include a 64bit member as it only has 32bit alignment. So unsigned long is used in siginfo_t instead of a u64 as unsigned long can encode a pointer on all architectures linux supports. v1: https://lkml.kernel.org/r/m11rarqqx2.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210503203814.25487-10-ebiederm@xmission.com v3: https://lkml.kernel.org/r/20210505141101.11519-11-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-4-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-19siginfo: Move si_trapno inside the union inside _si_faultEric W. Biederman1-0/+3
It turns out that linux uses si_trapno very sparingly, and as such it can be considered extra information for a very narrow selection of signals, rather than information that is present with every fault reported in siginfo. As such move si_trapno inside the union inside of _si_fault. This results in no change in placement, and makes it eaiser to extend _si_fault in the future as this reduces the number of special cases. In particular with si_trapno included in the union it is no longer a concern that the union must be pointer aligned on most architectures because the union follows immediately after si_addr which is a pointer. This change results in a difference in siginfo field placement on sparc and alpha for the fields si_addr_lsb, si_lower, si_upper, si_pkey, and si_perf. These architectures do not implement the signals that would use si_addr_lsb, si_lower, si_upper, si_pkey, and si_perf. Further these architecture have not yet implemented the userspace that would use si_perf. The point of this change is in fact to correct these placement issues before sparc or alpha grow userspace that cares. This change was discussed[1] and the agreement is that this change is currently safe. [1]: https://lkml.kernel.org/r/CAK8P3a0+uKYwL1NhY6Hvtieghba2hKYGD6hcKx5n8=4Gtt+pHA@mail.gmail.com Acked-by: Marco Elver <elver@google.com> v1: https://lkml.kernel.org/r/m1tunns7yf.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-5-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-1-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-05-18x86/bus_lock: Set rate limit for bus lockFenghua Yu1-2/+40
A bus lock can be thousands of cycles slower than atomic operation within one cache line. It also disrupts performance on other cores. Malicious users can generate multiple bus locks to degrade the whole system performance. The current mitigation is to kill the offending process, but for certain scenarios it's desired to identify and throttle the offending application. Add a system wide rate limit for bus locks. When the system detects bus locks at a rate higher than N/sec (where N can be set by the kernel boot argument in the range [1..1000]) any task triggering a bus lock will be forced to sleep for at least 20ms until the overall system rate of bus locks drops below the threshold. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210419214958.4035512-3-fenghua.yu@intel.com