summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu
AgeCommit message (Collapse)AuthorFilesLines
2022-10-24x86/mce: Retrieve poison range from hardwareJane Chu1-1/+12
[ Upstream commit f9781bb18ed828e7b83b7bac4a4ad7cd497ee7d7 ] When memory poison consumption machine checks fire, MCE notifier handlers like nfit_handle_mce() record the impacted physical address range which is reported by the hardware in the MCi_MISC MSR. The error information includes data about blast radius, i.e. how many cachelines did the hardware determine are impacted. A recent change 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine poison granularity") updated nfit_handle_mce() to stop hard coding the blast radius value of 1 cacheline, and instead rely on the blast radius reported in 'struct mce' which can be up to 4K (64 cachelines). It turns out that apei_mce_report_mem_error() had a similar problem in that it hard coded a blast radius of 4K rather than reading the blast radius from the error information. Fix apei_mce_report_mem_error() to convey the proper poison granularity. Signed-off-by: Jane Chu <jane.chu@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8502@oracle.com Link: https://lore.kernel.org/r/20220826233851.1319100-1-jane.chu@oracle.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24x86/cpu: Include the header of init_ia32_feat_ctl()'s prototypeLuciano Leão1-1/+1
[ Upstream commit 30ea703a38ef76ca119673cd8bdd05c6e068e2ac ] Include the header containing the prototype of init_ia32_feat_ctl(), solving the following warning: $ make W=1 arch/x86/kernel/cpu/feat_ctl.o arch/x86/kernel/cpu/feat_ctl.c:112:6: warning: no previous prototype for ‘init_ia32_feat_ctl’ [-Wmissing-prototypes] 112 | void init_ia32_feat_ctl(struct cpuinfo_x86 *c) This warning appeared after commit 5d5103595e9e5 ("x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup") had moved the function init_ia32_feat_ctl()'s prototype from arch/x86/kernel/cpu/cpu.h to arch/x86/include/asm/cpu.h. Note that, before the commit mentioned above, the header include "cpu.h" (arch/x86/kernel/cpu/cpu.h) was added by commit 0e79ad863df43 ("x86/cpu: Fix a -Wmissing-prototypes warning for init_ia32_feat_ctl()") solely to fix init_ia32_feat_ctl()'s missing prototype. So, the header include "cpu.h" is no longer necessary. [ bp: Massage commit message. ] Fixes: 5d5103595e9e5 ("x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup") Signed-off-by: Luciano Leão <lucianorsleao@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nícolas F. R. A. Prado <n@nfraprado.net> Link: https://lore.kernel.org/r/20220922200053.1357470-1-lucianorsleao@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24x86/microcode/AMD: Track patch allocation size explicitlyKees Cook1-1/+2
[ Upstream commit 712f210a457d9c32414df246a72781550bc23ef6 ] In preparation for reducing the use of ksize(), record the actual allocation size for later memcpy(). This avoids copying extra (uninitialized!) bytes into the patch buffer when the requested allocation size isn't exactly the size of a kmalloc bucket. Additionally, fix potential future issues where runtime bounds checking will notice that the buffer was allocated to a smaller value than returned by ksize(). Fixes: 757885e94a22 ("x86, microcode, amd: Early microcode patch loading support for AMD") Suggested-by: Daniel Micay <danielmicay@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/lkml/CA+DvKQ+bp7Y7gmaVhacjv9uF6Ar-o4tet872h4Q8RPYPJjcJQA@mail.gmail.com/ Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-24x86/resctrl: Fix to restore to original value when re-enabling hardware ↵Kohei Tarumizu1-3/+9
prefetch register [ Upstream commit 499c8bb4693d1c8d8f3d6dd38e5bdde3ff5bd906 ] The current pseudo_lock.c code overwrites the value of the MSR_MISC_FEATURE_CONTROL to 0 even if the original value is not 0. Therefore, modify it to save and restore the original values. Fixes: 018961ae5579 ("x86/intel_rdt: Pseudo-lock region creation/removal core") Fixes: 443810fe6160 ("x86/intel_rdt: Create debugfs files for pseudo-locking testing") Fixes: 8a2fc0e1bc0c ("x86/intel_rdt: More precise L2 hit/miss measurements") Signed-off-by: Kohei Tarumizu <tarumizu.kohei@fujitsu.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/eb660f3c2010b79a792c573c02d01e8e841206ad.1661358182.git.reinette.chatre@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-05x86/sgx: Do not fail on incomplete sanitization on premature stop of ksgxdJarkko Sakkinen1-6/+9
commit 133e049a3f8c91b175029fb6a59b6039d5e79cba upstream. Unsanitized pages trigger WARN_ON() unconditionally, which can panic the whole computer, if /proc/sys/kernel/panic_on_warn is set. In sgx_init(), if misc_register() fails or misc_register() succeeds but neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be prematurely stopped. This may leave unsanitized pages, which will result a false warning. Refine __sgx_sanitize_pages() to return: 1. Zero when the sanitization process is complete or ksgxd has been requested to stop. 2. The number of unsanitized pages otherwise. Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u Link: https://lkml.kernel.org/r/20220906000221.34286-2-jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-31x86/bugs: Add "unknown" reporting for MMIO Stale DataPawan Gupta2-17/+39
commit 7df548840c496b0141fb2404b889c346380c2b22 upstream. Older Intel CPUs that are not in the affected processor list for MMIO Stale Data vulnerabilities currently report "Not affected" in sysfs, which may not be correct. Vulnerability status for these older CPUs is unknown. Add known-not-affected CPUs to the whitelist. Report "unknown" mitigation status for CPUs that are not in blacklist, whitelist and also don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware immunity to MMIO Stale Data vulnerabilities. Mitigation is not deployed when the status is unknown. [ bp: Massage, fixup. ] Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data") Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Suggested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17x86/bugs: Enable STIBP for IBPB mitigated RETBleedKim Phillips1-4/+6
commit e6cfcdda8cbe81eaf821c897369a65fec987b404 upstream. AMD's "Technical Guidance for Mitigating Branch Type Confusion, Rev. 1.0 2022-07-12" whitepaper, under section 6.1.2 "IBPB On Privileged Mode Entry / SMT Safety" says: Similar to the Jmp2Ret mitigation, if the code on the sibling thread cannot be trusted, software should set STIBP to 1 or disable SMT to ensure SMT safety when using this mitigation. So, like already being done for retbleed=unret, and now also for retbleed=ibpb, force STIBP on machines that have it, and report its SMT vulnerability status accordingly. [ bp: Remove the "we" and remove "[AMD]" applicability parameter which doesn't work here. ] Fixes: 3ebc17006888 ("x86/bugs: Add retbleed=ibpb") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # 5.10, 5.15, 5.19 Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Link: https://lore.kernel.org/r/20220804192201.439596-1-kim.phillips@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17x86/bus_lock: Don't assume the init value of DEBUGCTLMSR.BUS_LOCK_DETECT to ↵Chenyi Qiang1-13/+14
be zero [ Upstream commit ffa6482e461ff550325356ae705b79e256702ea9 ] It's possible that this kernel has been kexec'd from a kernel that enabled bus lock detection, or (hypothetically) BIOS/firmware has set DEBUGCTLMSR_BUS_LOCK_DETECT. Disable bus lock detection explicitly if not wanted. Fixes: ebb1064e7c2e ("x86/traps: Handle #DB for bus lock") Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20220802033206.21333-1-chenyi.qiang@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-11x86/speculation: Add RSB VM Exit protectionsDaniel Sneddon2-25/+73
commit 2b1299322016731d56807aa49254a5ea3080b6b3 upstream. tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as documented for RET instructions after VM exits. Mitigate it with a new one-entry RSB stuffing mechanism and a new LFENCE. == Background == Indirect Branch Restricted Speculation (IBRS) was designed to help mitigate Branch Target Injection and Speculative Store Bypass, i.e. Spectre, attacks. IBRS prevents software run in less privileged modes from affecting branch prediction in more privileged modes. IBRS requires the MSR to be written on every privilege level change. To overcome some of the performance issues of IBRS, Enhanced IBRS was introduced. eIBRS is an "always on" IBRS, in other words, just turn it on once instead of writing the MSR on every privilege level change. When eIBRS is enabled, more privileged modes should be protected from less privileged modes, including protecting VMMs from guests. == Problem == Here's a simplification of how guests are run on Linux' KVM: void run_kvm_guest(void) { // Prepare to run guest VMRESUME(); // Clean up after guest runs } The execution flow for that would look something like this to the processor: 1. Host-side: call run_kvm_guest() 2. Host-side: VMRESUME 3. Guest runs, does "CALL guest_function" 4. VM exit, host runs again 5. Host might make some "cleanup" function calls 6. Host-side: RET from run_kvm_guest() Now, when back on the host, there are a couple of possible scenarios of post-guest activity the host needs to do before executing host code: * on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not touched and Linux has to do a 32-entry stuffing. * on eIBRS hardware, VM exit with IBRS enabled, or restoring the host IBRS=1 shortly after VM exit, has a documented side effect of flushing the RSB except in this PBRSB situation where the software needs to stuff the last RSB entry "by hand". IOW, with eIBRS supported, host RET instructions should no longer be influenced by guest behavior after the host retires a single CALL instruction. However, if the RET instructions are "unbalanced" with CALLs after a VM exit as is the RET in #6, it might speculatively use the address for the instruction after the CALL in #3 as an RSB prediction. This is a problem since the (untrusted) guest controls this address. Balanced CALL/RET instruction pairs such as in step #5 are not affected. == Solution == The PBRSB issue affects a wide variety of Intel processors which support eIBRS. But not all of them need mitigation. Today, X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e., eIBRS systems which enable legacy IBRS explicitly. However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT and most of them need a new mitigation. Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT. The lighter-weight mitigation performs a CALL instruction which is immediately followed by a speculative execution barrier (INT3). This steers speculative execution to the barrier -- just like a retpoline -- which ensures that speculation can never reach an unbalanced RET. Then, ensure this CALL is retired before continuing execution with an LFENCE. In other words, the window of exposure is opened at VM exit where RET behavior is troublesome. While the window is open, force RSB predictions sampling for RET targets to a dead end at the INT3. Close the window with the LFENCE. There is a subset of eIBRS systems which are not vulnerable to PBRSB. Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB. Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO. [ bp: Massage, incorporate review comments from Andy Cooper. ] Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-29x86/bugs: Do not enable IBPB at firmware entry when IBPB is not availableThadeu Lima de Souza Cascardo1-0/+1
Some cloud hypervisors do not provide IBPB on very recent CPU processors, including AMD processors affected by Retbleed. Using IBPB before firmware calls on such systems would cause a GPF at boot like the one below. Do not enable such calls when IBPB support is not present. EFI Variables Facility v0.08 2004-May-17 general protection fault, maybe for address 0x1: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 24 Comm: kworker/u2:1 Not tainted 5.19.0-rc8+ #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Workqueue: efi_rts_wq efi_call_rts RIP: 0010:efi_call_rts Code: e8 37 33 58 ff 41 bf 48 00 00 00 49 89 c0 44 89 f9 48 83 c8 01 4c 89 c2 48 c1 ea 20 66 90 b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 e8 7b 9f 5d ff e8 f6 f8 ff ff 4c 89 f1 4c 89 ea 4c 89 e6 48 RSP: 0018:ffffb373800d7e38 EFLAGS: 00010246 RAX: 0000000000000001 RBX: 0000000000000006 RCX: 0000000000000049 RDX: 0000000000000000 RSI: ffff94fbc19d8fe0 RDI: ffff94fbc1b2b300 RBP: ffffb373800d7e70 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000000b R11: 000000000000000b R12: ffffb3738001fd78 R13: ffff94fbc2fcfc00 R14: ffffb3738001fd80 R15: 0000000000000048 FS: 0000000000000000(0000) GS:ffff94fc3da00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff94fc30201000 CR3: 000000006f610000 CR4: 00000000000406f0 Call Trace: <TASK> ? __wake_up process_one_work worker_thread ? rescuer_thread kthread ? kthread_complete_and_exit ret_from_fork </TASK> Modules linked in: Fixes: 28a99e95f55c ("x86/amd: Use IBPB for firmware calls") Reported-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220728122602.2500509-1-cascardo@canonical.com
2022-07-20x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS partsPawan Gupta1-0/+3
IBRS mitigation for spectre_v2 forces write to MSR_IA32_SPEC_CTRL at every kernel entry/exit. On Enhanced IBRS parts setting MSR_IA32_SPEC_CTRL[IBRS] only once at boot is sufficient. MSR writes at every kernel entry/exit incur unnecessary performance loss. When Enhanced IBRS feature is present, print a warning about this unnecessary performance loss. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/2a5eaf54583c2bfe0edc4fea64006656256cca17.1657814857.git.pawan.kumar.gupta@linux.intel.com
2022-07-18x86/amd: Use IBPB for firmware callsPeter Zijlstra1-1/+10
On AMD IBRS does not prevent Retbleed; as such use IBPB before a firmware call to flush the branch history state. And because in order to do an EFI call, the kernel maps a whole lot of the kernel page table into the EFI page table, do an IBPB just in case in order to prevent the scenario of poisoning the BTB and causing an EFI call using the unprotected RET there. [ bp: Massage. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220715194550.793957-1-cascardo@canonical.com
2022-07-16x86/bugs: Remove apostrophe typoKim Phillips1-1/+1
Remove a superfluous ' in the mitigation string. Fixes: e8ec1b6e08a2 ("x86/bugs: Enable STIBP for JMP2RET") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-14x86/bugs: Mark retbleed_strings staticJiapeng Chong1-1/+1
This symbol is not used outside of bugs.c, so mark it static. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220714072939.71162-1-jiapeng.chong@linux.alibaba.com
2022-07-09x86/speculation: Disable RRSBA behaviorPawan Gupta2-0/+27
Some Intel processors may use alternate predictors for RETs on RSB-underflow. This condition may be vulnerable to Branch History Injection (BHI) and intramode-BTI. Kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines, eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against such attacks. However, on RSB-underflow, RET target prediction may fallback to alternate predictors. As a result, RET's predicted target may get influenced by branch history. A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback behavior when in kernel mode. When set, RETs will not take predictions from alternate predictors, hence mitigating RETs as well. Support for this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2). For spectre v2 mitigation, when a user selects a mitigation that protects indirect CALLs and JMPs against BHI and intramode-BTI, set RRSBA_DIS_S also to protect RETs for RSB-underflow case. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-08x86/bugs: Do not enable IBPB-on-entry when IBPB is not supportedThadeu Lima de Souza Cascardo1-2/+5
There are some VM configurations which have Skylake model but do not support IBPB. In those cases, when using retbleed=ibpb, userspace is going to be killed and kernel is going to panic. If the CPU does not support IBPB, warn and proceed with the auto option. Also, do not fallback to IBPB on AMD/Hygon systems if it is not supported. Fixes: 3ebc17006888 ("x86/bugs: Add retbleed=ibpb") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-07x86/bugs: Add Cannon lake to RETBleed affected CPU listPawan Gupta1-0/+1
Cannon lake is also affected by RETBleed, add it to the list. Fixes: 6ad0ad2bf8a6 ("x86/bugs: Report Intel retbleed vulnerability") Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-29x86/retbleed: Add fine grained Kconfig knobsPeter Zijlstra2-15/+29
Do fine-grained Kconfig for all the various retbleed parts. NOTE: if your compiler doesn't support return thunks this will silently 'upgrade' your mitigation to IBPB, you might not like this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/cpu/amd: Enumerate BTC_NOAndrew Cooper2-8/+19
BTC_NO indicates that hardware is not susceptible to Branch Type Confusion. Zen3 CPUs don't suffer BTC. Hypervisors are expected to synthesise BTC_NO when it is appropriate given the migration pool, to prevent kernels using heuristics. [ bp: Massage. ] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/common: Stamp out the stepping madnessPeter Zijlstra1-21/+16
The whole MMIO/RETBLEED enumeration went overboard on steppings. Get rid of all that and simply use ANY. If a future stepping of these models would not be affected, it had better set the relevant ARCH_CAP_$FOO_NO bit in IA32_ARCH_CAPABILITIES. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27KVM: VMX: Prevent RSB underflow before vmenterJosh Poimboeuf1-2/+2
On VMX, there are some balanced returns between the time the guest's SPEC_CTRL value is written, and the vmenter. Balanced returns (matched by a preceding call) are usually ok, but it's at least theoretically possible an NMI with a deep call stack could empty the RSB before one of the returns. For maximum paranoia, don't allow *any* returns (balanced or otherwise) between the SPEC_CTRL write and the vmenter. [ bp: Fix 32-bit build. ] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/speculation: Fill RSB on vmexit for IBRSJosh Poimboeuf1-5/+58
Prevent RSB underflow/poisoning attacks with RSB. While at it, add a bunch of comments to attempt to document the current state of tribal knowledge about RSB attacks and what exactly is being mitigated. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27KVM: VMX: Prevent guest RSB poisoning attacks with eIBRSJosh Poimboeuf1-0/+4
On eIBRS systems, the returns in the vmexit return path from __vmx_vcpu_run() to vmx_vcpu_run() are exposed to RSB poisoning attacks. Fix that by moving the post-vmexit spec_ctrl handling to immediately after the vmexit. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/speculation: Remove x86_spec_ctrl_maskJosh Poimboeuf1-30/+1
This mask has been made redundant by kvm_spec_ctrl_test_value(). And it doesn't even work when MSR interception is disabled, as the guest can just write to SPEC_CTRL directly. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/speculation: Use cached host SPEC_CTRL value for guest entry/exitJosh Poimboeuf1-11/+1
There's no need to recalculate the host value for every entry/exit. Just use the cached value in spec_ctrl_current(). Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/speculation: Fix SPEC_CTRL write on SMT state changeJosh Poimboeuf1-1/+2
If the SMT state changes, SSBD might get accidentally disabled. Fix that. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/cpu/amd: Add Spectral ChickenPeter Zijlstra3-1/+30
Zen2 uarchs have an undocumented, unnamed, MSR that contains a chicken bit for some speculation behaviour. It needs setting. Note: very belatedly AMD released naming; it's now officially called MSR_AMD64_DE_CFG2 and MSR_AMD64_DE_CFG2_SUPPRESS_NOBR_PRED_BIT but shall remain the SPECTRAL CHICKEN. Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Do IBPB fallback check only onceJosh Poimboeuf1-10/+5
When booting with retbleed=auto, if the kernel wasn't built with CONFIG_CC_HAS_RETURN_THUNK, the mitigation falls back to IBPB. Make sure a warning is printed in that case. The IBPB fallback check is done twice, but it really only needs to be done once. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Add retbleed=ibpbPeter Zijlstra1-9/+34
jmp2ret mitigates the easy-to-attack case at relatively low overhead. It mitigates the long speculation windows after a mispredicted RET, but it does not mitigate the short speculation window from arbitrary instruction boundaries. On Zen2, there is a chicken bit which needs setting, which mitigates "arbitrary instruction boundaries" down to just "basic block boundaries". But there is no fix for the short speculation window on basic block boundaries, other than to flush the entire BTB to evict all attacker predictions. On the spectrum of "fast & blurry" -> "safe", there is (on top of STIBP or no-SMT): 1) Nothing System wide open 2) jmp2ret May stop a script kiddy 3) jmp2ret+chickenbit Raises the bar rather further 4) IBPB Only thing which can count as "safe". Tentative numbers put IBPB-on-entry at a 2.5x hit on Zen2, and a 10x hit on Zen1 according to lmbench. [ bp: Fixup feature bit comments, document option, 32-bit build fix. ] Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27intel_idle: Disable IBRS during long idlePeter Zijlstra1-0/+6
Having IBRS enabled while the SMT sibling is idle unnecessarily slows down the running sibling. OTOH, disabling IBRS around idle takes two MSR writes, which will increase the idle latency. Therefore, only disable IBRS around deeper idle states. Shallow idle states are bounded by the tick in duration, since NOHZ is not allowed for them by virtue of their short target residency. Only do this for mwait-driven idle, since that keeps interrupts disabled across idle, which makes disabling IBRS vs IRQ-entry a non-issue. Note: C6 is a random threshold, most importantly C1 probably shouldn't disable IBRS, benchmarking needed. Suggested-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Report Intel retbleed vulnerabilityPeter Zijlstra2-18/+45
Skylake suffers from RSB underflow speculation issues; report this vulnerability and it's mitigation (spectre_v2=ibrs). [jpoimboe: cleanups, eibrs] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Split spectre_v2_select_mitigation() and ↵Peter Zijlstra1-8/+17
spectre_v2_user_select_mitigation() retbleed will depend on spectre_v2, while spectre_v2_user depends on retbleed. Break this cycle. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRSPawan Gupta1-14/+52
Extend spectre_v2= boot option with Kernel IBRS. [jpoimboe: no STIBP with IBRS] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Optimize SPEC_CTRL MSR writesPeter Zijlstra1-6/+12
When changing SPEC_CTRL for user control, the WRMSR can be delayed until return-to-user when KERNEL_IBRS has been enabled. This avoids an MSR write during context switch. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Keep a per-CPU IA32_SPEC_CTRL valuePeter Zijlstra1-5/+23
Due to TIF_SSBD and TIF_SPEC_IB the actual IA32_SPEC_CTRL value can differ from x86_spec_ctrl_base. As such, keep a per-CPU value reflecting the current task's MSR content. [jpoimboe: rename] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Enable STIBP for JMP2RETKim Phillips1-12/+46
For untrained return thunks to be fully effective, STIBP must be enabled or SMT disabled. Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Add AMD retbleed= boot parameterAlexandre Chartre1-1/+107
Add the "retbleed=<value>" boot parameter to select a mitigation for RETBleed. Possible values are "off", "auto" and "unret" (JMP2RET mitigation). The default value is "auto". Currently, "retbleed=auto" will select the unret mitigation on AMD and Hygon and no mitigation on Intel (JMP2RET is not effective on Intel). [peterz: rebase; add hygon] [jpoimboe: cleanups] Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Report AMD retbleed vulnerabilityAlexandre Chartre2-0/+32
Report that AMD x86 CPUs are vulnerable to the RETBleed (Arbitrary Speculative Code Execution with Return Instructions) attack. [peterz: add hygon] [kim: invert parity; fam15h] Co-developed-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-14Merge tag 'x86-bugs-2022-06-01' of ↵Linus Torvalds2-39/+248
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MMIO stale data fixes from Thomas Gleixner: "Yet another hw vulnerability with a software mitigation: Processor MMIO Stale Data. They are a class of MMIO-related weaknesses which can expose stale data by propagating it into core fill buffers. Data which can then be leaked using the usual speculative execution methods. Mitigations include this set along with microcode updates and are similar to MDS and TAA vulnerabilities: VERW now clears those buffers too" * tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation/mmio: Print SMT warning KVM: x86/speculation: Disable Fill buffer clear within guests x86/speculation/mmio: Reuse SRBDS mitigation for SBDS x86/speculation/srbds: Update SRBDS mitigation selection x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data x86/speculation/mmio: Enable CPU Fill buffer clearing on idle x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data x86/speculation: Add a common function for MD_CLEAR mitigation update x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug Documentation: Add documentation for Processor MMIO Stale Data
2022-06-05Merge tag 'x86-urgent-2022-06-05' of ↵Linus Torvalds3-6/+115
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SGX fix from Thomas Gleixner: "A single fix for x86/SGX to prevent that memory which is allocated for an SGX enclave is accounted to the wrong memory control group" * tag 'x86-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Set active memcg prior to shmem allocation
2022-06-05Merge tag 'x86-microcode-2022-06-05' of ↵Linus Torvalds2-104/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode updates from Thomas Gleixner: - Disable late microcode loading by default. Unless the HW people get their act together and provide a required minimum version in the microcode header for making a halfways informed decision its just lottery and broken. - Warn and taint the kernel when microcode is loaded late - Remove the old unused microcode loader interface - Remove a redundant perf callback from the microcode loader * tag 'x86-microcode-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Remove unnecessary perf callback x86/microcode: Taint and warn on late loading x86/microcode: Default-disable late loading x86/microcode: Rip out the OLD_INTERFACE
2022-06-02x86/sgx: Set active memcg prior to shmem allocationKristen Carlson Accardi3-6/+115
When the system runs out of enclave memory, SGX can reclaim EPC pages by swapping to normal RAM. These backing pages are allocated via a per-enclave shared memory area. Since SGX allows unlimited over commit on EPC memory, the reclaimer thread can allocate a large number of backing RAM pages in response to EPC memory pressure. When the shared memory backing RAM allocation occurs during the reclaimer thread context, the shared memory is charged to the root memory control group, and the shmem usage of the enclave is not properly accounted for, making cgroups ineffective at limiting the amount of RAM an enclave can consume. For example, when using a cgroup to launch a set of test enclaves, the kernel does not properly account for 50% - 75% of shmem page allocations on average. In the worst case, when nearly all allocations occur during the reclaimer thread, the kernel accounts less than a percent of the amount of shmem used by the enclave's cgroup to the correct cgroup. SGX stores a list of mm_structs that are associated with an enclave. Pick one of them during reclaim and charge that mm's memcg with the shmem allocation. The one that gets picked is arbitrary, but this list almost always only has one mm. The cases where there is more than one mm with different memcg's are not worth considering. Create a new function - sgx_encl_alloc_backing(). This function is used whenever a new backing storage page needs to be allocated. Previously the same function was used for page allocation as well as retrieving a previously allocated page. Prior to backing page allocation, if there is a mm_struct associated with the enclave that is requesting the allocation, it is set as the active memory control group. [ dhansen: - fix merge conflict with ELDU fixes - check against actual ksgxd_tsk, not ->mm ] Cc: stable@vger.kernel.org Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lkml.kernel.org/r/20220520174248.4918-1-kristen@linux.intel.com
2022-06-01x86/speculation/mmio: Print SMT warningJosh Poimboeuf1-0/+11
Similar to MDS and TAA, print a warning if SMT is enabled for the MMIO Stale Data vulnerability. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2022-05-31x86/microcode: Remove unnecessary perf callbackBorislav Petkov1-3/+0
c93dc84cbe32 ("perf/x86: Add a microcode revision check for SNB-PEBS") checks whether the microcode revision has fixed PEBS issues. This can happen either: 1. At PEBS init time, where the early microcode has been loaded already 2. During late loading, in the microcode_check() callback. So remove the unnecessary call in the microcode loader init routine. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220525161232.14924-5-bp@alien8.de
2022-05-31x86/microcode: Taint and warn on late loadingBorislav Petkov1-0/+5
Warn before it is attempted and taint the kernel. Late loading microcode can lead to malfunction of the kernel when the microcode update changes behaviour. There is no way for the kernel to determine whether its safe or not. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220525161232.14924-4-bp@alien8.de
2022-05-31x86/microcode: Default-disable late loadingBorislav Petkov2-1/+8
It is dangerous and it should not be used anyway - there's a nice early loading already. Requested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220525161232.14924-3-bp@alien8.de
2022-05-31x86/microcode: Rip out the OLD_INTERFACEBorislav Petkov1-100/+0
Everything should be using the early initrd loading by now. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220525161232.14924-2-bp@alien8.de
2022-05-28Merge tag 'hyperv-next-signed-20220528' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Harden hv_sock driver (Andrea Parri) - Harden Hyper-V PCI driver (Andrea Parri) - Fix multi-MSI for Hyper-V PCI driver (Jeffrey Hugo) - Fix Hyper-V PCI to reduce boot time (Dexuan Cui) - Remove code for long EOL'ed Hyper-V versions (Michael Kelley, Saurabh Sengar) - Fix balloon driver error handling (Shradha Gupta) - Fix a typo in vmbus driver (Julia Lawall) - Ignore vmbus IMC device (Michael Kelley) - Add a new error message to Hyper-V DRM driver (Saurabh Sengar) * tag 'hyperv-next-signed-20220528' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (28 commits) hv_balloon: Fix balloon_probe() and balloon_remove() error handling scsi: storvsc: Removing Pre Win8 related logic Drivers: hv: vmbus: fix typo in comment PCI: hv: Fix synchronization between channel callback and hv_pci_bus_exit() PCI: hv: Add validation for untrusted Hyper-V values PCI: hv: Fix interrupt mapping for multi-MSI PCI: hv: Reuse existing IRTE allocation in compose_msi_msg() drm/hyperv: Remove support for Hyper-V 2008 and 2008R2/Win7 video: hyperv_fb: Remove support for Hyper-V 2008 and 2008R2/Win7 scsi: storvsc: Remove support for Hyper-V 2008 and 2008R2/Win7 Drivers: hv: vmbus: Remove support for Hyper-V 2008 and Hyper-V 2008R2/Win7 x86/hyperv: Disable hardlockup detector by default in Hyper-V guests drm/hyperv: Add error message for fb size greater than allocated PCI: hv: Do not set PCI_COMMAND_MEMORY to reduce VM boot time PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI Drivers: hv: vmbus: Refactor the ring-buffer iterator functions Drivers: hv: vmbus: Accept hv_sock offers in isolated guests hv_sock: Add validation for untrusted Hyper-V values hv_sock: Copy packets sent by Hyper-V out of the ring buffer hv_sock: Check hv_pkt_iter_first_raw()'s return value ...
2022-05-28Merge tag 'libnvdimm-for-5.19' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and DAX updates from Dan Williams: "New support for clearing memory errors when a file is in DAX mode, alongside with some other fixes and cleanups. Previously it was only possible to clear these errors using a truncate or hole-punch operation to trigger the filesystem to reallocate the block, now, any page aligned write can opportunistically clear errors as well. This change spans x86/mm, nvdimm, and fs/dax, and has received the appropriate sign-offs. Thanks to Jane for her work on this. Summary: - Add support for clearing memory error via pwrite(2) on DAX - Fix 'security overwrite' support in the presence of media errors - Miscellaneous cleanups and fixes for nfit_test (nvdimm unit tests)" * tag 'libnvdimm-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: pmem: implement pmem_recovery_write() pmem: refactor pmem_clear_poison() dax: add .recovery_write dax_operation dax: introduce DAX_RECOVERY_WRITE dax access mode mce: fix set_mce_nospec to always unmap the whole page x86/mce: relocate set{clear}_mce_nospec() functions acpi/nfit: rely on mce->misc to determine poison granularity testing: nvdimm: asm/mce.h is not needed in nfit.c testing: nvdimm: iomap: make __nfit_test_ioremap a macro nvdimm: Allow overwrite in the presence of disabled dimms tools/testing/nvdimm: remove unneeded flush_workqueue
2022-05-26Merge tag 'dma-mapping-5.19-2022-05-25' of ↵Linus Torvalds1-8/+0
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - don't over-decrypt memory (Robin Murphy) - takes min align mask into account for the swiotlb max mapping size (Tianyu Lan) - use GFP_ATOMIC in dma-debug (Mikulas Patocka) - fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me) - don't fail on highmem CMA pages in dma_direct_alloc_pages (me) - cleanup swiotlb initialization and share more code with swiotlb-xen (me, Stefano Stabellini) * tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping: (23 commits) dma-direct: don't over-decrypt memory swiotlb: max mapping size takes min align mask into account swiotlb: use the right nslabs-derived sizes in swiotlb_init_late swiotlb: use the right nslabs value in swiotlb_init_remap swiotlb: don't panic when the swiotlb buffer can't be allocated dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm x86: remove cruft from <asm/dma-mapping.h> swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl swiotlb: merge swiotlb-xen initialization into swiotlb swiotlb: provide swiotlb_init variants that remap the buffer swiotlb: pass a gfp_mask argument to swiotlb_init_late swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction swiotlb: make the swiotlb_init interface more useful x86: centralize setting SWIOTLB_FORCE when guest memory encryption is enabled x86: remove the IOMMU table infrastructure MIPS/octeon: use swiotlb_init instead of open coding it arm/xen: don't check for xen_initial_domain() in xen_create_contiguous_region swiotlb: rename swiotlb_late_init_with_default_size ...