summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)AuthorFilesLines
2023-11-28x86/cpu/hygon: Fix the CPU topology evaluation for realPu Wen1-2/+6
commit ee545b94d39a00c93dc98b1dbcbcf731d2eadeb4 upstream. Hygon processors with a model ID > 3 have CPUID leaf 0xB correctly populated and don't need the fixed package ID shift workaround. The fixup is also incorrect when running in a guest. Fixes: e0ceeae708ce ("x86/CPU/hygon: Fix phys_proc_id calculation logic for multi-die processors") Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/tencent_594804A808BD93A4EBF50A994F228E3A7F07@qq.com Link: https://lore.kernel.org/r/20230814085112.089607918@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-20x86/amd_nb: Use Family 19h Models 60h-7Fh Function 4 IDsYazen Ghannam1-0/+3
commit 2a565258b3f4bbdc7a3c09cd02082cb286a7bffc upstream. Three PCI IDs for DF Function 4 were defined but not used. Add them to the "link" list. Fixes: f8faf3496633 ("x86/amd_nb: Add AMD PCI IDs for SMN communication") Fixes: 23a5b8bb022c ("x86/amd_nb: Add PCI ID for family 19h model 78h") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230803150430.3542854-1-yazen.ghannam@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-20x86/sev: Change snp_guest_issue_request()'s fw_err argumentDionna Glaze1-7/+8
[ Upstream commit 0144e3b85d7b42e8a4cda991c0e81f131897457a ] The GHCB specification declares that the firmware error value for a guest request will be stored in the lower 32 bits of EXIT_INFO_2. The upper 32 bits are for the VMM's own error code. The fw_err argument to snp_guest_issue_request() is thus a misnomer, and callers will need access to all 64 bits. The type of unsigned long also causes problems, since sw_exit_info2 is u64 (unsigned long long) vs the argument's unsigned long*. Change this type for issuing the guest request. Pass the ioctl command struct's error field directly instead of in a local variable, since an incomplete guest request may not set the error code, and uninitialized stack memory would be written back to user space. The firmware might not even be called, so bookend the call with the no firmware call error and clear the error. Since the "fw_err" field is really exitinfo2 split into the upper bits' vmm error code and lower bits' firmware error code, convert the 64 bit value to a union. [ bp: - Massage commit message - adjust code - Fix a build issue as Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202303070609.vX6wp2Af-lkp@intel.com - print exitinfo2 in hex Tom: - Correct -EIO exit case. ] Signed-off-by: Dionna Glaze <dionnaglaze@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230214164638.1189804-5-dionnaglaze@google.com Link: https://lore.kernel.org/r/20230307192449.24732-12-bp@alien8.de Stable-dep-of: db10cb9b5746 ("virt: sevguest: Fix passing a stack buffer as a scatterlist target") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20x86/boot: Fix incorrect startup_gdt_descr.sizeYuntao Wang1-1/+1
[ Upstream commit 001470fed5959d01faecbd57fcf2f60294da0de1 ] Since the size value is added to the base address to yield the last valid byte address of the GDT, the current size value of startup_gdt_descr is incorrect (too large by one), fix it. [ mingo: This probably never mattered, because startup_gdt[] is only used in a very controlled fashion - but make it consistent nevertheless. ] Fixes: 866b556efa12 ("x86/head/64: Install startup GDT") Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20230807084547.217390-1-ytcoode@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20x86/srso: Fix SBPB enablement for (possible) future fixed HWJosh Poimboeuf1-1/+1
[ Upstream commit 1d1142ac51307145dbb256ac3535a1d43a1c9800 ] Make the SBPB check more robust against the (possible) case where future HW has SRSO fixed but doesn't have the SRSO_NO bit set. Fixes: 1b5277c0ea0b ("x86/srso: Add SRSO_NO support") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/cee5050db750b391c9f35f5334f8ff40e66c01b9.1693889988.git.jpoimboe@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-02x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibilityThomas Gleixner2-8/+33
commit 128b0c9781c9f2651bea163cb85e52a6c7be0f9e upstream. David and a few others reported that on certain newer systems some legacy interrupts fail to work correctly. Debugging revealed that the BIOS of these systems leaves the legacy PIC in uninitialized state which makes the PIC detection fail and the kernel switches to a dummy implementation. Unfortunately this fallback causes quite some code to fail as it depends on checks for the number of legacy PIC interrupts or the availability of the real PIC. In theory there is no reason to use the PIC on any modern system when IO/APIC is available, but the dependencies on the related checks cannot be resolved trivially and on short notice. This needs lots of analysis and rework. The PIC detection has been added to avoid quirky checks and force selection of the dummy implementation all over the place, especially in VM guest scenarios. So it's not an option to revert the relevant commit as that would break a lot of other scenarios. One solution would be to try to initialize the PIC on detection fail and retry the detection, but that puts the burden on everything which does not have a PIC. Fortunately the ACPI/MADT table header has a flag field, which advertises in bit 0 that the system is PCAT compatible, which means it has a legacy 8259 PIC. Evaluate that bit and if set avoid the detection routine and keep the real PIC installed, which then gets initialized (for nothing) and makes the rest of the code with all the dependencies work again. Fixes: e179f6914152 ("x86, irq, pic: Probe for legacy PIC and set legacy_pic appropriately") Reported-by: David Lazar <dlazar@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: David Lazar <dlazar@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218003 Link: https://lore.kernel.org/r/875y2u5s8g.ffs@tglx Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}Sean Christopherson1-4/+1
commit 8647c52e9504c99752a39f1d44f6268f82c40a5c upstream. Mask off xfeatures that aren't exposed to the guest only when saving guest state via KVM_GET_XSAVE{2} instead of modifying user_xfeatures directly. Preserving the maximal set of xfeatures in user_xfeatures restores KVM's ABI for KVM_SET_XSAVE, which prior to commit ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0") allowed userspace to load xfeatures that are supported by the host, irrespective of what xfeatures are exposed to the guest. There is no known use case where userspace *intentionally* loads xfeatures that aren't exposed to the guest, but the bug fixed by commit ad856280ddea was specifically that KVM_GET_SAVE{2} would save xfeatures that weren't exposed to the guest, e.g. would lead to userspace unintentionally loading guest-unsupported xfeatures when live migrating a VM. Restricting KVM_SET_XSAVE to guest-supported xfeatures is especially problematic for QEMU-based setups, as QEMU has a bug where instead of terminating the VM if KVM_SET_XSAVE fails, QEMU instead simply stops loading guest state, i.e. resumes the guest after live migration with incomplete guest state, and ultimately results in guest data corruption. Note, letting userspace restore all host-supported xfeatures does not fix setups where a VM is migrated from a host *without* commit ad856280ddea, to a target with a subset of host-supported xfeatures. However there is no way to safely address that scenario, e.g. KVM could silently drop the unsupported features, but that would be a clear violation of KVM's ABI and so would require userspace to opt-in, at which point userspace could simply be updated to sanitize the to-be-loaded XSAVE state. Reported-by: Tyler Stachecki <stachecki.tyler@gmail.com> Closes: https://lore.kernel.org/all/20230914010003.358162-1-tstachecki@bloomberg.net Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0") Cc: stable@vger.kernel.org Cc: Leonardo Bras <leobras@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Message-Id: <20230928001956.924301-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25x86/fpu: Allow caller to constrain xfeatures when copying to uabi bufferSean Christopherson3-5/+10
commit 18164f66e6c59fda15c198b371fa008431efdb22 upstream. Plumb an xfeatures mask into __copy_xstate_to_uabi_buf() so that KVM can constrain which xfeatures are saved into the userspace buffer without having to modify the user_xfeatures field in KVM's guest_fpu state. KVM's ABI for KVM_GET_XSAVE{2} is that features that are not exposed to guest must not show up in the effective xstate_bv field of the buffer. Saving only the guest-supported xfeatures allows userspace to load the saved state on a different host with a fewer xfeatures, so long as the target host supports the xfeatures that are exposed to the guest. KVM currently sets user_xfeatures directly to restrict KVM_GET_XSAVE{2} to the set of guest-supported xfeatures, but doing so broke KVM's historical ABI for KVM_SET_XSAVE, which allows userspace to load any xfeatures that are supported by the *host*. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230928001956.924301-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25x86/sev: Check for user-space IOIO pointing to kernel spaceJoerg Roedel1-2/+29
Upstream commit: 63e44bc52047f182601e7817da969a105aa1f721 Check the memory operand of INS/OUTS before emulating the instruction. The #VC exception can get raised from user-space, but the memory operand can be manipulated to access kernel memory before the emulation actually begins and after the exception handler has run. [ bp: Massage commit message. ] Fixes: 597cfe48212a ("x86/boot/compressed/64: Setup a GHCB-based VC Exception handler") Reported-by: Tom Dohrmann <erbse.13@gmx.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25x86/sev: Check IOBM for IOIO exceptions from user-spaceJoerg Roedel2-7/+42
Upstream commit: b9cb9c45583b911e0db71d09caa6b56469eb2bdf Check the IO permission bitmap (if present) before emulating IOIO #VC exceptions for user-space. These permissions are checked by hardware already before the #VC is raised, but due to the VC-handler decoding race it needs to be checked again in software. Fixes: 25189d08e516 ("x86/sev-es: Add support for handling IOIO exceptions") Reported-by: Tom Dohrmann <erbse.13@gmx.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Dohrmann <erbse.13@gmx.de> Cc: <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25x86/sev: Disable MMIO emulation from user modeBorislav Petkov (AMD)1-0/+3
Upstream commit: a37cd2a59d0cb270b1bba568fd3a3b8668b9d3ba A virt scenario can be constructed where MMIO memory can be user memory. When that happens, a race condition opens between when the hardware raises the #VC and when the #VC handler gets to emulate the instruction. If the MOVS is replaced with a MOVS accessing kernel memory in that small race window, then write to kernel memory happens as the access checks are not done at emulation time. Disable MMIO emulation in user mode temporarily until a sensible use case appears and justifies properly handling the race window. Fixes: 0118b604c2c9 ("x86/sev-es: Handle MMIO String Instructions") Reported-by: Tom Dohrmann <erbse.13@gmx.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Dohrmann <erbse.13@gmx.de> Cc: <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-20x86/alternatives: Disable KASAN in apply_alternatives()Kirill A. Shutemov1-0/+13
commit d35652a5fc9944784f6f50a5c979518ff8dacf61 upstream. Fei has reported that KASAN triggers during apply_alternatives() on a 5-level paging machine: BUG: KASAN: out-of-bounds in rcu_is_watching() Read of size 4 at addr ff110003ee6419a0 by task swapper/0/0 ... __asan_load4() rcu_is_watching() trace_hardirqs_on() text_poke_early() apply_alternatives() ... On machines with 5-level paging, cpu_feature_enabled(X86_FEATURE_LA57) gets patched. It includes KASAN code, where KASAN_SHADOW_START depends on __VIRTUAL_MASK_SHIFT, which is defined with cpu_feature_enabled(). KASAN gets confused when apply_alternatives() patches the KASAN_SHADOW_START users. A test patch that makes KASAN_SHADOW_START static, by replacing __VIRTUAL_MASK_SHIFT with 56, works around the issue. Fix it for real by disabling KASAN while the kernel is patching alternatives. [ mingo: updated the changelog ] Fixes: 6657fca06e3f ("x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y") Reported-by: Fei Yang <fei.yang@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231012100424.1456-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-20x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUsBorislav Petkov (AMD)1-0/+8
commit f454b18e07f518bcd0c05af17a2239138bff52de upstream. Fix erratum #1485 on Zen4 parts where running with STIBP disabled can cause an #UD exception. The performance impact of the fix is negligible. Reported-by: René Rebe <rene@exactcode.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: René Rebe <rene@exactcode.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/D99589F4-BC5D-430B-87B2-72C20370CF57@exactcode.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10x86/sev: Use the GHCB protocol when available for SNP CPUID requestsTom Lendacky1-14/+55
commit 6bc6f7d9d7ac3cdbe9e8b0495538b4a0cc11f032 upstream. SNP retrieves the majority of CPUID information from the SNP CPUID page. But there are times when that information needs to be supplemented by the hypervisor, for example, obtaining the initial APIC ID of the vCPU from leaf 1. The current implementation uses the MSR protocol to retrieve the data from the hypervisor, even when a GHCB exists. The problem arises when an NMI arrives on return from the VMGEXIT. The NMI will be immediately serviced and may generate a #VC requiring communication with the hypervisor. Since a GHCB exists in this case, it will be used. As part of using the GHCB, the #VC handler will write the GHCB physical address into the GHCB MSR and the #VC will be handled. When the NMI completes, processing resumes at the site of the VMGEXIT which is expecting to read the GHCB MSR and find a CPUID MSR protocol response. Since the NMI handling overwrote the GHCB MSR response, the guest will see an invalid reply from the hypervisor and self-terminate. Fix this problem by using the GHCB when it is available. Any NMI received is properly handled because the GHCB contents are copied into a backup page and restored on NMI exit, thus preserving the active GHCB request or result. [ bp: Touchups. ] Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/a5856fa1ebe3879de91a8f6298b6bbd901c61881.1690578565.git.thomas.lendacky@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-06x86/srso: Add SRSO mitigation for Hygon processorsPu Wen1-1/+1
commit a5ef7d68cea1344cf524f04981c2b3f80bedbb0d upstream. Add mitigation for the speculative return stack overflow vulnerability which exists on Hygon processors too. Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/tencent_4A14812842F104E93AA722EC939483CEFF05@qq.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-06x86/sgx: Resolves SECS reclaim vs. page fault for EAUG raceHaitao Huang1-5/+25
commit c6c2adcba50c2622ed25ba5d5e7f05f584711358 upstream. The SGX EPC reclaimer (ksgxd) may reclaim the SECS EPC page for an enclave and set secs.epc_page to NULL. The SECS page is used for EAUG and ELDU in the SGX page fault handler. However, the NULL check for secs.epc_page is only done for ELDU, not EAUG before being used. Fix this by doing the same NULL check and reloading of the SECS page as needed for both EAUG and ELDU. The SECS page holds global enclave metadata. It can only be reclaimed when there are no other enclave pages remaining. At that point, virtually nothing can be done with the enclave until the SECS page is paged back in. An enclave can not run nor generate page faults without a resident SECS page. But it is still possible for a #PF for a non-SECS page to race with paging out the SECS page: when the last resident non-SECS page A triggers a #PF in a non-resident page B, and then page A and the SECS both are paged out before the #PF on B is handled. Hitting this bug requires that race triggered with a #PF for EAUG. Following is a trace when it happens. BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:sgx_encl_eaug_page+0xc7/0x210 Call Trace: ? __kmem_cache_alloc_node+0x16a/0x440 ? xa_load+0x6e/0xa0 sgx_vma_fault+0x119/0x230 __do_fault+0x36/0x140 do_fault+0x12f/0x400 __handle_mm_fault+0x728/0x1110 handle_mm_fault+0x105/0x310 do_user_addr_fault+0x1ee/0x750 ? __this_cpu_preempt_check+0x13/0x20 exc_page_fault+0x76/0x180 asm_exc_page_fault+0x27/0x30 Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave") Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20230728051024.33063-1-haitao.huang%40linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-06x86/reboot: VMCLEAR active VMCSes before emergency rebootSean Christopherson2-31/+22
[ Upstream commit b23c83ad2c638420ec0608a9de354507c41bec29 ] VMCLEAR active VMCSes before any emergency reboot, not just if the kernel may kexec into a new kernel after a crash. Per Intel's SDM, the VMX architecture doesn't require the CPU to flush the VMCS cache on INIT. If an emergency reboot doesn't RESET CPUs, cached VMCSes could theoretically be kept and only be written back to memory after the new kernel is booted, i.e. could effectively corrupt memory after reboot. Opportunistically remove the setting of the global pointer to NULL to make checkpatch happy. Cc: Andrew Cooper <Andrew.Cooper3@citrix.com> Link: https://lore.kernel.org/r/20230721201859.2307736-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06x86/srso: Fix SBPB enablement for spec_rstack_overflow=offJosh Poimboeuf1-1/+1
[ Upstream commit 01b057b2f4cc2d905a0bd92195657dbd9a7005ab ] If the user has requested no SRSO mitigation, other mitigations can use the lighter-weight SBPB instead of IBPB. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/b20820c3cfd1003171135ec8d762a0b957348497.1693889988.git.jpoimboe@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06x86/srso: Fix srso_show_state() side effectJosh Poimboeuf1-1/+1
[ Upstream commit a8cf700c17d9ca6cb8ee7dc5c9330dbac3948237 ] Reading the 'spec_rstack_overflow' sysfs file can trigger an unnecessary MSR write, and possibly even a (handled) exception if the microcode hasn't been updated. Avoid all that by just checking X86_FEATURE_IBPB_BRTYPE instead, which gets set by srso_select_mitigation() if the updated microcode exists. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/27d128899cb8aee9eb2b57ddc996742b0c1d776b.1693889988.git.jpoimboe@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06x86/mm, kexec, ima: Use memblock_free_late() from ima_free_kexec_buffer()Rik van Riel1-6/+2
[ Upstream commit 34cf99c250d5cd2530b93a57b0de31d3aaf8685b ] The code calling ima_free_kexec_buffer() runs long after the memblock allocator has already been torn down, potentially resulting in a use after free in memblock_isolate_range(). With KASAN or KFENCE, this use after free will result in a BUG from the idle task, and a subsequent kernel panic. Switch ima_free_kexec_buffer() over to memblock_free_late() to avoid that bug. Fixes: fee3ff99bc67 ("powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c") Suggested-by: Mike Rappoport <rppt@kernel.org> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230817135558.67274c83@imladris.surriel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13x86/sgx: Break up long non-preemptible delays in sgx_vepc_release()Jack Wang1-0/+3
commit 3d7d72a34e05b23e21bafc8bfb861e73c86b31f3 upstream. On large enclaves we hit the softlockup warning with following call trace: xa_erase() sgx_vepc_release() __fput() task_work_run() do_exit() The latency issue is similar to the one fixed in: 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves") The test system has 64GB of enclave memory, and all is assigned to a single VM. Release of 'vepc' takes a longer time and causes long latencies, which triggers the softlockup warning. Add cond_resched() to give other tasks a chance to run and reduce latencies, which also avoids the softlockup detector. [ mingo: Rewrote the changelog. ] Fixes: 540745ddbc70 ("x86/sgx: Introduce virtual EPC for use by KVM guests") Reported-by: Yu Zhang <yu.zhang@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Yu Zhang <yu.zhang@ionos.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Haitao Huang <haitao.huang@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-13x86/MCE: Always save CS register on AMD Zen IF Poison errorsYazen Ghannam2-1/+30
commit 4240e2ebe67941ce2c4f5c866c3af4b5ac7a0c67 upstream. The Instruction Fetch (IF) units on current AMD Zen-based systems do not guarantee a synchronous #MC is delivered for poison consumption errors. Therefore, MCG_STATUS[EIPV|RIPV] will not be set. However, the microarchitecture does guarantee that the exception is delivered within the same context. In other words, the exact rIP is not known, but the context is known to not have changed. There is no architecturally-defined method to determine this behavior. The Code Segment (CS) register is always valid on such IF unit poison errors regardless of the value of MCG_STATUS[EIPV|RIPV]. Add a quirk to save the CS register for poison consumption from the IF unit banks. This is needed to properly determine the context of the error. Otherwise, the severity grading function will assume the context is IN_KERNEL due to the m->cs value being 0 (the initialized value). This leads to unnecessary kernel panics on data poison errors due to the kernel believing the poison consumption occurred in kernel context. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230814200853.29258-1-yazen.ghannam@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-13x86/sev: Make enc_dec_hypercall() accept a size instead of npagesSteve Rutherford1-3/+1
commit ac3f9c9f1b37edaa7d1a9b908bc79d843955a1a2 upstream. enc_dec_hypercall() accepted a page count instead of a size, which forced its callers to round up. As a result, non-page aligned vaddrs caused pages to be spuriously marked as decrypted via the encryption status hypercall, which in turn caused consistent corruption of pages during live migration. Live migration requires accurate encryption status information to avoid migrating pages from the wrong perspective. Fixes: 064ce6c550a0 ("mm: x86: Invoke hypercall when page encryption status is changed") Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Tested-by: Ben Hillier <bhillier@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230824223731.2055016-1-srutherford@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-13x86/speculation: Mark all Skylake CPUs as vulnerable to GDSDave Hansen1-4/+4
[ Upstream commit c9f4c45c8ec3f07f4f083f9750032a1ec3eab6b2 ] The Gather Data Sampling (GDS) vulnerability is common to all Skylake processors. However, the "client" Skylakes* are now in this list: https://www.intel.com/content/www/us/en/support/articles/000022396/processors.html which means they are no longer included for new vulnerabilities here: https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html or in other GDS documentation. Thus, they were not included in the original GDS mitigation patches. Mark SKYLAKE and SKYLAKE_L as vulnerable to GDS to match all the other Skylake CPUs (which include Kaby Lake). Also group the CPUs so that the ones that share the exact same vulnerabilities are next to each other. Last, move SRBDS to the end of each line. This makes it clear at a glance that SKYLAKE_X is unique. Of the five Skylakes, it is the only "server" CPU and has a different implementation from the clients of the "special register" hardware, making it immune to SRBDS. This makes the diff much harder to read, but the resulting table is worth it. I very much appreciate the report from Michael Zhivich about this issue. Despite what level of support a hardware vendor is providing, the kernel very much needs an accurate and up-to-date list of vulnerable CPUs. More reports like this are very welcome. * Client Skylakes are CPUID 406E3/506E3 which is family 6, models 0x4E and 0x5E, aka INTEL_FAM6_SKYLAKE and INTEL_FAM6_SKYLAKE_L. Reported-by: Michael Zhivich <mzhivich@akamai.com> Fixes: 8974eb588283 ("x86/speculation: Add Gather Data Sampling mitigation") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13x86/APM: drop the duplicate APM_MINOR_DEV macroRandy Dunlap1-6/+0
[ Upstream commit 4ba2909638a29630a346d6c4907a3105409bee7d ] This source file already includes <linux/miscdevice.h>, which contains the same macro. It doesn't need to be defined here again. Fixes: 874bcd00f520 ("apm-emulation: move APM_MINOR_DEV to include/linux/miscdevice.h") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: x86@kernel.org Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20230728011120.759-1-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4Feng Tang1-0/+7
commit 2c66ca3949dc701da7f4c9407f2140ae425683a5 upstream. 0-Day found a 34.6% regression in stress-ng's 'af-alg' test case, and bisected it to commit b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()"), which optimizes the FPU init order, and moves the CR4_OSXSAVE enabling into a later place: arch_cpu_finalize_init identify_boot_cpu identify_cpu generic_identify get_cpu_cap --> setup cpu capability ... fpu__init_cpu fpu__init_cpu_xstate cr4_set_bits(X86_CR4_OSXSAVE); As the FPU is not yet initialized the CPU capability setup fails to set X86_FEATURE_OSXSAVE. Many security module like 'camellia_aesni_avx_x86_64' depend on this feature and therefore fail to load, causing the regression. Cure this by setting X86_FEATURE_OSXSAVE feature right after OSXSAVE enabling. [ tglx: Moved it into the actual BSP FPU initialization code and added a comment ] Fixes: b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/202307192135.203ac24e-oliver.sang@intel.com Link: https://lore.kernel.org/lkml/20230823065747.92257-1-feng.tang@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30x86/fpu: Invalidate FPU state correctly on exec()Rick Edgecombe2-3/+2
commit 1f69383b203e28cf8a4ca9570e572da1699f76cd upstream. The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is valid and should be reloaded when returning to userspace. However, the kernel will skip doing this if the FPU registers are already valid as determined by fpregs_state_valid(). The logic embedded there considers the state valid if two cases are both true: 1: fpu_fpregs_owner_ctx points to the current tasks FPU state 2: the last CPU the registers were live in was the current CPU. This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to the current FPU during the fpregs_restore_userregs() operation, so it indicates that the registers have been restored on this CPU. But this alone doesn’t preclude that the task hasn’t been rescheduled to a different CPU, where the registers were modified, and then back to the current CPU. To verify that this was not the case the logic relies on the second condition. So the assumption is that if the registers have been restored, AND they haven’t had the chance to be modified (by being loaded on another CPU), then they MUST be valid on the current CPU. Besides the lazy FPU optimizations, the other cases where the FPU registers might not be valid are when the kernel modifies the FPU register state or the FPU saved buffer. In this case the operation modifying the FPU state needs to let the kernel know the correspondence has been broken. The comment in “arch/x86/kernel/fpu/context.h” has: /* ... * If the FPU register state is valid, the kernel can skip restoring the * FPU state from memory. * * Any code that clobbers the FPU registers or updates the in-memory * FPU state for a task MUST let the rest of the kernel know that the * FPU registers are no longer valid for this task. * * Either one of these invalidation functions is enough. Invalidate * a resource you control: CPU if using the CPU for something else * (with preemption disabled), FPU for the current task, or a task that * is prevented from running by the current task. */ However, this is not completely true. When the kernel modifies the registers or saved FPU state, it can only rely on __fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu tracking. The exec path instead relies on fpregs_deactivate(), which sets the CPU’s FPU context to NULL. This was observed to fail to restore the reset FPU state to the registers when returning to userspace in the following scenario: 1. A task is executing in userspace on CPU0 - CPU0’s FPU context points to tasks - fpu->last_cpu=CPU0 2. The task exec()’s 3. While in the kernel the task is preempted - CPU0 gets a thread executing in the kernel (such that no other FPU context is activated) - Scheduler sets task’s fpu->last_cpu=CPU0 when scheduling out 4. Task is migrated to CPU1 5. Continuing the exec(), the task gets to fpu_flush_thread()->fpu_reset_fpregs() - Sets CPU1’s fpu context to NULL - Copies the init state to the task’s FPU buffer - Sets TIF_NEED_FPU_LOAD on the task 6. The task reschedules back to CPU0 before completing the exec() and returning to userspace - During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set - Skips saving the registers and updating task’s fpu→last_cpu, because TIF_NEED_FPU_LOAD is the canonical source. 7. Now CPU0’s FPU context is still pointing to the task’s, and fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even though the reset FPU state has not been restored. So the root cause is that exec() is doing the wrong kind of invalidate. It should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further, fpu__drop() doesn't really seem appropriate as the task (and FPU) are not going away, they are just getting reset as part of an exec. So switch to __fpu_invalidate_fpregs_state(). Also, delete the misleading comment that says that either kind of invalidate will be enough, because it’s not always the case. Fixes: 33344368cb08 ("x86/fpu: Clean up the fpu__clear() variants") Reported-by: Lei Wang <lei4.wang@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Lijun Pan <lijun.pan@intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Acked-by: Lijun Pan <lijun.pan@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26x86/srso: Correct the mitigation status when SMT is disabledBorislav Petkov (AMD)1-3/+2
commit 6405b72e8d17bd1875a56ae52d23ec3cd51b9d66 upstream. Specify how is SRSO mitigated when SMT is disabled. Also, correct the SMT check for that. Fixes: e9fbc47b818b ("x86/srso: Disable the mitigation on unaffected configurations") Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20230814200813.p5czl47zssuej7nv@treble Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANGPetr Pavlu1-4/+4
commit 79cd2a11224eab86d6673fe8a11d2046ae9d2757 upstream. The linker script arch/x86/kernel/vmlinux.lds.S matches the thunk sections ".text.__x86.*" from arch/x86/lib/retpoline.S as follows: .text { [...] TEXT_TEXT [...] __indirect_thunk_start = .; *(.text.__x86.*) __indirect_thunk_end = .; [...] } Macro TEXT_TEXT references TEXT_MAIN which normally expands to only ".text". However, with CONFIG_LTO_CLANG, TEXT_MAIN becomes ".text .text.[0-9a-zA-Z_]*" which wrongly matches also the thunk sections. The output layout is then different than expected. For instance, the currently defined range [__indirect_thunk_start, __indirect_thunk_end] becomes empty. Prevent the problem by using ".." as the first separator, for example, ".text..__x86.indirect_thunk". This pattern is utilized by other explicit section names which start with one of the standard prefixes, such as ".text" or ".data", and that need to be individually selected in the linker script. [ nathan: Fix conflicts with SRSO and fold in fix issue brought up by Andrew Cooper in post-review: https://lore.kernel.org/20230803230323.1478869-1-andrew.cooper3@citrix.com ] Fixes: dc5723b02e52 ("kbuild: add support for Clang LTO") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230711091952.27944-2-petr.pavlu@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26x86/srso: Disable the mitigation on unaffected configurationsBorislav Petkov (AMD)1-1/+6
commit e9fbc47b818b964ddff5df5b2d5c0f5f32f4a147 upstream. Skip the srso cmd line parsing which is not needed on Zen1/2 with SMT disabled and with the proper microcode applied (latter should be the case anyway) as those are not affected. Fixes: 5a15d8348881 ("x86/srso: Tie SBPB bit setting to microcode patch detection") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230813104517.3346-1-bp@alien8.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26x86/CPU/AMD: Fix the DIV(0) initial fix attemptBorislav Petkov (AMD)2-2/+1
commit f58d6fbcb7c848b7f2469be339bc571f2e9d245b upstream. Initially, it was thought that doing an innocuous division in the #DE handler would take care to prevent any leaking of old data from the divider but by the time the fault is raised, the speculation has already advanced too far and such data could already have been used by younger operations. Therefore, do the innocuous division on every exit to userspace so that userspace doesn't see any potentially old data from integer divisions in kernel space. Do the same before VMRUN too, to protect host data from leaking into the guest too. Fixes: 77245f1c3c64 ("x86/CPU/AMD: Do not leak quotient data after a division by 0") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20230811213824.10025-1-bp@alien8.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26x86/static_call: Fix __static_call_fixup()Peter Zijlstra1-0/+13
commit 54097309620ef0dc2d7083783dc521c6a5fef957 upstream. Christian reported spurious module load crashes after some of Song's module memory layout patches. Turns out that if the very last instruction on the very last page of the module is a 'JMP __x86_return_thunk' then __static_call_fixup() will trip a fault and die. And while the module rework made this slightly more likely to happen, it's always been possible. Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding") Reported-by: Christian Bricart <christian@bricart.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lkml.kernel.org/r/20230816104419.GA982867@hirez.programming.kicks-ass.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26x86/cpu: Cleanup the untrain messPeter Zijlstra1-0/+1
commit e7c25c441e9e0fa75b4c83e0b26306b702cfe90d upstream. Since there can only be one active return_thunk, there only needs be one (matching) untrain_ret. It fundamentally doesn't make sense to allow multiple untrain_ret at the same time. Fold all the 3 different untrain methods into a single (temporary) helper stub. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230814121149.042774962@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26x86/cpu: Rename srso_(.*)_alias to srso_alias_\1Peter Zijlstra1-4/+4
commit 42be649dd1f2eee6b1fb185f1a231b9494cf095f upstream. For a more consistent namespace. [ bp: Fixup names in the doc too. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230814121148.976236447@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26x86/cpu: Rename original retbleed methodsPeter Zijlstra2-2/+2
commit d025b7bac07a6e90b6b98b487f88854ad9247c39 upstream. Rename the original retbleed return thunk and untrain_ret to retbleed_return_thunk() and retbleed_untrain_ret(). No functional changes. Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230814121148.909378169@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26x86/cpu: Clean up SRSO return thunk messPeter Zijlstra2-5/+16
commit d43490d0ab824023e11d0b57d0aeec17a6e0ca13 upstream. Use the existing configurable return thunk. There is absolute no justification for having created this __x86_return_thunk alternative. To clarify, the whole thing looks like: Zen3/4 does: srso_alias_untrain_ret: nop2 lfence jmp srso_alias_return_thunk int3 srso_alias_safe_ret: // aliasses srso_alias_untrain_ret just so add $8, %rsp ret int3 srso_alias_return_thunk: call srso_alias_safe_ret ud2 While Zen1/2 does: srso_untrain_ret: movabs $foo, %rax lfence call srso_safe_ret (jmp srso_return_thunk ?) int3 srso_safe_ret: // embedded in movabs instruction add $8,%rsp ret int3 srso_return_thunk: call srso_safe_ret ud2 While retbleed does: zen_untrain_ret: test $0xcc, %bl lfence jmp zen_return_thunk int3 zen_return_thunk: // embedded in the test instruction ret int3 Where Zen1/2 flush the BTB entry using the instruction decoder trick (test,movabs) Zen3/4 use BTB aliasing. SRSO adds a return sequence (srso_safe_ret()) which forces the function return instruction to speculate into a trap (UD2). This RET will then mispredict and execution will continue at the return site read from the top of the stack. Pick one of three options at boot (evey function can only ever return once). [ bp: Fixup commit message uarch details and add them in a comment in the code too. Add a comment about the srso_select_mitigation() dependency on retbleed_select_mitigation(). Add moar ifdeffery for 32-bit builds. Add a dummy srso_untrain_ret_alias() definition for 32-bit alternatives needing the symbol. ] Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230814121148.842775684@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-26x86/alternative: Make custom return thunk unconditionalPeter Zijlstra1-0/+2
commit 095b8303f3835c68ac4a8b6d754ca1c3b6230711 upstream. There is infrastructure to rewrite return thunks to point to any random thunk one desires, unwrap that from CALL_THUNKS, which up to now was the sole user of that. [ bp: Make the thunks visible on 32-bit and add ifdeffery for the 32-bit builds. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230814121148.775293785@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16x86/cpu/amd: Enable Zenbleed fix for AMD Custom APU 0405Cristian Ciocaltea1-0/+1
commit 6dbef74aeb090d6bee7d64ef3fa82ae6fa53f271 upstream. Commit 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix") provided a fix for the Zen2 VZEROUPPER data corruption bug affecting a range of CPU models, but the AMD Custom APU 0405 found on SteamDeck was not listed, although it is clearly affected by the vulnerability. Add this CPU variant to the Zenbleed erratum list, in order to unconditionally enable the fallback fix until a proper microcode update is available. Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230811203705.1699914-1-cristian.ciocaltea@collabora.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-16x86/srso: Fix build breakage with the LLVM linkerNick Desaulniers1-3/+9
commit cbe8ded48b939b9d55d2c5589ab56caa7b530709 upstream. The assertion added to verify the difference in bits set of the addresses of srso_untrain_ret_alias() and srso_safe_ret_alias() would fail to link in LLVM's ld.lld linker with the following error: ld.lld: error: ./arch/x86/kernel/vmlinux.lds:210: at least one side of the expression must be absolute ld.lld: error: ./arch/x86/kernel/vmlinux.lds:211: at least one side of the expression must be absolute Use ABSOLUTE to evaluate the expression referring to at least one of the symbols so that LLD can evaluate the linker script. Also, add linker version info to the comment about XOR being unsupported in either ld.bfd or ld.lld until somewhat recently. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Closes: https://lore.kernel.org/llvm/CA+G9fYsdUeNu-gwbs0+T6XHi4hYYk=Y9725-wFhZ7gJMspLDRA@mail.gmail.com/ Reported-by: Nathan Chancellor <nathan@kernel.org> Reported-by: Daniel Kolesa <daniel@octaforge.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Suggested-by: Sven Volkinsfeld <thyrc@gmx.net> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://github.com/ClangBuiltLinux/linux/issues/1907 Link: https://lore.kernel.org/r/20230809-gds-v1-1-eaac90b0cbcc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11x86/CPU/AMD: Do not leak quotient data after a division by 0Borislav Petkov (AMD)2-0/+21
commit 77245f1c3c6495521f6a3af082696ee2f8ce3921 upstream. Under certain circumstances, an integer division by 0 which faults, can leave stale quotient data from a previous division operation on Zen1 microarchitectures. Do a dummy division 0/1 before returning from the #DE exception handler in order to avoid any leaks of potentially sensitive data. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08x86: fix backwards merge of GDS/SRSO bitGreg Kroah-Hartman1-3/+3
Stable-tree-only change. Due to the way the GDS and SRSO patches flowed into the stable tree, it was a 50% chance that the merge of the which value GDS and SRSO should be. Of course, I lost that bet, and chose the opposite of what Linus chose in commit 64094e7e3118 ("Merge tag 'gds-for-linus-2023-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") Fix this up by switching the values to match what is now in Linus's tree as that is the correct value to mirror. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08x86/srso: Tie SBPB bit setting to microcode patch detectionBorislav Petkov (AMD)2-11/+15
commit 5a15d8348881e9371afdf9f5357a135489496955 upstream. The SBPB bit in MSR_IA32_PRED_CMD is supported only after a microcode patch has been applied so set X86_FEATURE_SBPB only then. Otherwise, guests would attempt to set that bit and #GP on the MSR write. While at it, make SMT detection more robust as some guests - depending on how and what CPUID leafs their report - lead to cpu_smt_control getting set to CPU_SMT_NOT_SUPPORTED but SRSO_NO should be set for any guest incarnation where one simply cannot do SMT, for whatever reason. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08x86/srso: Fix return thunks in generated codeJosh Poimboeuf2-3/+7
Upstream commit: 238ec850b95a02dcdff3edc86781aa913549282f Set X86_FEATURE_RETHUNK when enabling the SRSO mitigation so that generated code (e.g., ftrace, static call, eBPF) generates "jmp __x86_return_thunk" instead of RET. [ bp: Add a comment. ] Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08x86/srso: Add IBPB on VMEXITBorislav Petkov (AMD)1-0/+19
Upstream commit: d893832d0e1ef41c72cdae444268c1d64a2be8ad Add the option to flush IBPB only on VMEXIT in order to protect from malicious guests but one otherwise trusts the software that runs on the hypervisor. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08x86/srso: Add IBPBBorislav Petkov (AMD)1-0/+23
Upstream commit: 233d6f68b98d480a7c42ebe78c38f79d44741ca9 Add the option to mitigate using IBPB on a kernel entry. Pull in the Retbleed alternative so that the IBPB call from there can be used. Also, if Retbleed mitigation is done using IBPB, the same mitigation can and must be used here. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08x86/srso: Add SRSO_NO supportBorislav Petkov (AMD)3-12/+30
Upstream commit: 1b5277c0ea0b247393a9c426769fde18cff5e2f6 Add support for the CPUID flag which denotes that the CPU is not affected by SRSO. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08x86/srso: Add IBPB_BRTYPE supportBorislav Petkov (AMD)1-1/+11
Upstream commit: 79113e4060aba744787a81edb9014f2865193854 Add support for the synthetic CPUID flag which "if this bit is 1, it indicates that MSR 49h (PRED_CMD) bit 0 (IBPB) flushes all branch type predictions from the CPU branch predictor." This flag is there so that this capability in guests can be detected easily (otherwise one would have to track microcode revisions which is impossible for guests). It is also needed only for Zen3 and -4. The other two (Zen1 and -2) always flush branch type predictions by default. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08x86/srso: Add a Speculative RAS Overflow mitigationBorislav Petkov (AMD)5-3/+161
Upstream commit: fb3bd914b3ec28f5fb697ac55c4846ac2d542855 Add a mitigation for the speculative return address stack overflow vulnerability found on AMD processors. The mitigation works by ensuring all RET instructions speculate to a controlled location, similar to how speculation is controlled in the retpoline sequence. To accomplish this, the __x86_return_thunk forces the CPU to mispredict every function return using a 'safe return' sequence. To ensure the safety of this mitigation, the kernel must ensure that the safe return sequence is itself free from attacker interference. In Zen3 and Zen4, this is accomplished by creating a BTB alias between the untraining function srso_untrain_ret_alias() and the safe return function srso_safe_ret_alias() which results in evicting a potentially poisoned BTB entry and using that safe one for all function returns. In older Zen1 and Zen2, this is accomplished using a reinterpretation technique similar to Retbleed one: srso_untrain_ret() and srso_safe_ret(). Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08x86/cpu, kvm: Add support for CPUID_80000021_EAXKim Phillips1-0/+3
commit 8415a74852d7c24795007ee9862d25feb519007c upstream. Add support for CPUID leaf 80000021, EAX. The majority of the features will be used in the kernel and thus a separate leaf is appropriate. Include KVM's reverse_cpuid entry because features are used by VM guests, too. [ bp: Massage commit message. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230124163319.2277355-2-kim.phillips@amd.com [bwh: Backported to 6.1: adjust context] Signed-off-by: Ben Hutchings <benh@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08KVM: Add GDS_NO support to KVMDaniel Sneddon1-0/+7
commit 81ac7e5d741742d650b4ed6186c4826c1a0631a7 upstream Gather Data Sampling (GDS) is a transient execution attack using gather instructions from the AVX2 and AVX512 extensions. This attack allows malicious code to infer data that was previously stored in vector registers. Systems that are not vulnerable to GDS will set the GDS_NO bit of the IA32_ARCH_CAPABILITIES MSR. This is useful for VM guests that may think they are on vulnerable systems that are, in fact, not affected. Guests that are running on affected hosts where the mitigation is enabled are protected as if they were running on an unaffected system. On all hosts that are not affected or that are mitigated, set the GDS_NO bit. Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>