summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)AuthorFilesLines
2022-06-14x86/cpu: Elide KCSAN for cpu_has() and friendsPeter Zijlstra1-1/+1
[ Upstream commit a6a5eb269f6f3a2fe392f725a8d9052190c731e2 ] As x86 uses the <asm-generic/bitops/instrumented-*.h> headers, the regular forms of all bitops are instrumented with explicit calls to KASAN and KCSAN checks. As these are explicit calls, these are not suppressed by the noinstr function attribute. This can result in calls to those check functions in noinstr code, which objtool warns about: vmlinux.o: warning: objtool: enter_from_user_mode+0x24: call to __kcsan_check_access() leaves .noinstr.text section vmlinux.o: warning: objtool: syscall_enter_from_user_mode+0x28: call to __kcsan_check_access() leaves .noinstr.text section vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0x24: call to __kcsan_check_access() leaves .noinstr.text section vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0x24: call to __kcsan_check_access() leaves .noinstr.text section Prevent this by using the arch_*() bitops, which are the underlying bitops without explciit instrumentation. [null: Changelog] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220502111216.290518605@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()Jan Beulich1-1/+1
[ Upstream commit 1df931d95f4dc1c11db1123e85d4e08156e46ef9 ] As noted (and fixed) a couple of times in the past, "=@cc<cond>" outputs and clobbering of "cc" don't work well together. The compiler appears to mean to reject such, but doesn't - in its upstream form - quite manage to yet for "cc". Furthermore two similar macros don't clobber "cc", and clobbering "cc" is pointless in asm()-s for x86 anyway - the compiler always assumes status flags to be clobbered there. Fixes: 989b5db215a2 ("x86/uaccess: Implement macros for CMPXCHG on user addresses") Signed-off-by: Jan Beulich <jbeulich@suse.com> Message-Id: <485c0c0b-a3a7-0b7c-5264-7d00c01de032@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]Naveen N. Rao1-0/+8
commit 3e35142ef99fe6b4fe5d834ad43ee13cca10a2dc upstream. Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [1], binutils (v2.36+) started dropping section symbols that it thought were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol in .text.unlikely to generate a relocation record against. Address this by dropping the weak attribute from these functions. Instead, follow the existing pattern of having architectures #define the name of the function they want to override in their headers. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 [akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h] Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09x86/mce: relocate set{clear}_mce_nospec() functionsJane Chu1-52/+0
[ Upstream commit b3fdf9398a16f01dc013967a4ab25e99c3f4fc12 ] Relocate the twin mce functions to arch/x86/mm/pat/set_memory.c file where they belong. While at it, fixup a function name in a comment. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jane Chu <jane.chu@oracle.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> [sfr: gate {set,clear}_mce_nospec() by CONFIG_X86_64] Link: https://lore.kernel.org/r/165272527328.90175.8336008202048685278.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09x86/pm: Fix false positive kmemleak report in msr_build_context()Matthieu Baerts2-5/+9
[ Upstream commit b0b592cf08367719e1d1ef07c9f136e8c17f7ec3 ] Since e2a1256b17b1 ("x86/speculation: Restore speculation related MSRs during S3 resume") kmemleak reports this issue: unreferenced object 0xffff888009cedc00 (size 256): comm "swapper/0", pid 1, jiffies 4294693823 (age 73.764s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 48 00 00 00 00 00 00 00 ........H....... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: msr_build_context (include/linux/slab.h:621) pm_check_save_msr (arch/x86/power/cpu.c:520) do_one_initcall (init/main.c:1298) kernel_init_freeable (init/main.c:1370) kernel_init (init/main.c:1504) ret_from_fork (arch/x86/entry/entry_64.S:304) Reproducer: - boot the VM with a debug kernel config (see https://github.com/multipath-tcp/mptcp_net-next/issues/268) - wait ~1 minute - start a kmemleak scan The root cause here is alignment within the packed struct saved_context (from suspend_64.h). Kmemleak only searches for pointers that are aligned (see how pointers are scanned in kmemleak.c), but pahole shows that the saved_msrs struct member and all members after it in the structure are unaligned: struct saved_context { struct pt_regs regs; /* 0 168 */ /* --- cacheline 2 boundary (128 bytes) was 40 bytes ago --- */ u16 ds; /* 168 2 */ ... u64 misc_enable; /* 232 8 */ bool misc_enable_saved; /* 240 1 */ /* Note below odd offset values for the remainder of this struct */ struct saved_msrs saved_msrs; /* 241 16 */ /* --- cacheline 4 boundary (256 bytes) was 1 bytes ago --- */ long unsigned int efer; /* 257 8 */ u16 gdt_pad; /* 265 2 */ struct desc_ptr gdt_desc; /* 267 10 */ u16 idt_pad; /* 277 2 */ struct desc_ptr idt; /* 279 10 */ u16 ldt; /* 289 2 */ u16 tss; /* 291 2 */ long unsigned int tr; /* 293 8 */ long unsigned int safety; /* 301 8 */ long unsigned int return_address; /* 309 8 */ /* size: 317, cachelines: 5, members: 25 */ /* last cacheline: 61 bytes */ } __attribute__((__packed__)); Move misc_enable_saved to the end of the struct declaration so that saved_msrs fits in before the cacheline 4 boundary. The comment above the saved_context declaration says to fix wakeup_64.S file and __save/__restore_processor_state() if the struct is modified: it looks like all the accesses in wakeup_64.S are done through offsets which are computed at build-time. Update that comment accordingly. At the end, the false positive kmemleak report is due to a limitation from kmemleak but it is always good to avoid unaligned members for optimisation purposes. Please note that it looks like this issue is not new, e.g. https://lore.kernel.org/all/9f1bb619-c4ee-21c4-a251-870bd4db04fa@lwfinger.net/ https://lore.kernel.org/all/94e48fcd-1dbd-ebd2-4c91-f39941735909@molgen.mpg.de/ [ bp: Massage + cleanup commit message. ] Fixes: 7a9c2dd08ead ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume") Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20220426202138.498310-1-matthieu.baerts@tessares.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09ACPICA: Avoid cache flush inside virtual machinesKirill A. Shutemov1-1/+13
[ Upstream commit e2efb6359e620521d1e13f69b2257de8ceaa9475 ] While running inside virtual machine, the kernel can bypass cache flushing. Changing sleep state in a virtual machine doesn't affect the host system sleep state and cannot lead to data loss. Before entering sleep states, the ACPI code flushes caches to prevent data loss using the WBINVD instruction. This mechanism is required on bare metal. But, any use WBINVD inside of a guest is worthless. Changing sleep state in a virtual machine doesn't affect the host system sleep state and cannot lead to data loss, so most hypervisors simply ignore it. Despite this, the ACPI code calls WBINVD unconditionally anyway. It's useless, but also normally harmless. In TDX guests, though, WBINVD stops being harmless; it triggers a virtualization exception (#VE). If the ACPI cache-flushing WBINVD were left in place, TDX guests would need handling to recover from the exception. Avoid using WBINVD whenever running under a hypervisor. This both removes the useless WBINVDs and saves TDX from implementing WBINVD handling. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-30-kirill.shutemov@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-06x86/uaccess: Implement macros for CMPXCHG on user addressesPeter Zijlstra1-0/+142
commit 989b5db215a2f22f89d730b607b071d964780f10 upstream. Add support for CMPXCHG loops on userspace addresses. Provide both an "unsafe" version for tight loops that do their own uaccess begin/end, as well as a "safe" version for use cases where the CMPXCHG is not buried in a loop, e.g. KVM will resume the guest instead of looping when emulation of a guest atomic accesses fails the CMPXCHG. Provide 8-byte versions for 32-bit kernels so that KVM can do CMPXCHG on guest PAE PTEs, which are accessed via userspace addresses. Guard the asm_volatile_goto() variation with CC_HAS_ASM_GOTO_TIED_OUTPUT, the "+m" constraint fails on some compilers that otherwise support CC_HAS_ASM_GOTO_OUTPUT. Cc: stable@vger.kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220202004945.2540433-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-30x86/tsc: Use fallback for random_get_entropy() instead of zeroJason A. Donenfeld2-4/+12
commit 3bd4abc07a267e6a8b33d7f8717136e18f921c53 upstream. In the event that random_get_entropy() can't access a cycle counter or similar, falling back to returning 0 is suboptimal. Instead, fallback to calling random_get_entropy_fallback(), which isn't extremely high precision or guaranteed to be entropic, but is certainly better than returning zero all the time. If CONFIG_X86_TSC=n, then it's possible for the kernel to run on systems without RDTSC, such as 486 and certain 586, so the fallback code is only required for that case. As well, fix up both the new function and the get_cycles() function from which it was derived to use cpu_feature_enabled() rather than boot_cpu_has(), and use !IS_ENABLED() instead of #ifndef. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-09x86/cpu: Load microcode during restore_processor_state()Borislav Petkov1-0/+2
commit f9e14dbbd454581061c736bf70bf5cbb15ac927c upstream. When resuming from system sleep state, restore_processor_state() restores the boot CPU MSRs. These MSRs could be emulated by microcode. If microcode is not loaded yet, writing to emulated MSRs leads to unchecked MSR access error: ... PM: Calling lapic_suspend+0x0/0x210 unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0...0) at rIP: ... (native_write_msr) Call Trace: <TASK> ? restore_processor_state x86_acpi_suspend_lowlevel acpi_suspend_enter suspend_devices_and_enter pm_suspend.cold state_store kobj_attr_store sysfs_kf_write kernfs_fop_write_iter new_sync_write vfs_write ksys_write __x64_sys_write do_syscall_64 entry_SYSCALL_64_after_hwframe RIP: 0033:0x7fda13c260a7 To ensure microcode emulated MSRs are available for restoration, load the microcode on the boot CPU before restoring these MSRs. [ Pawan: write commit message and productize it. ] Fixes: e2a1256b17b1 ("x86/speculation: Restore speculation related MSRs during S3 resume") Reported-by: Kyle D. Pelton <kyle.d.pelton@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by: Kyle D. Pelton <kyle.d.pelton@intel.com> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=215841 Link: https://lore.kernel.org/r/4350dfbf785cd482d3fafa72b2b49c83102df3ce.1650386317.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27stat: fix inconsistency between struct stat and struct compat_statMikulas Patocka1-4/+2
[ Upstream commit 932aba1e169090357a77af18850a10c256b50819 ] struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit st_dev and st_rdev; struct compat_stat (defined in arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by a 16-bit padding. This patch fixes struct compat_stat to match struct stat. [ Historical note: the old x86 'struct stat' did have that 16-bit field that the compat layer had kept around, but it was changes back in 2003 by "struct stat - support larger dev_t": https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d and back in those days, the x86_64 port was still new, and separate from the i386 code, and had already picked up the old version with a 16-bit st_dev field ] Note that we can't change compat_dev_t because it is used by compat_loop_info. Also, if the st_dev and st_rdev values are 32-bit, we don't have to use old_valid_dev to test if the value fits into them. This fixes -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major number 259. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPUVitaly Kuznetsov1-3/+1
[ Upstream commit 42dcbe7d8bac997eef4c379e61d9121a15ed4e36 ] The following WARN is triggered from kvm_vm_ioctl_set_clock(): WARNING: CPU: 10 PID: 579353 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3161 mark_page_dirty_in_slot+0x6c/0x80 [kvm] ... CPU: 10 PID: 579353 Comm: qemu-system-x86 Tainted: G W O 5.16.0.stable #20 Hardware name: LENOVO 20UF001CUS/20UF001CUS, BIOS R1CET65W(1.34 ) 06/17/2021 RIP: 0010:mark_page_dirty_in_slot+0x6c/0x80 [kvm] ... Call Trace: <TASK> ? kvm_write_guest+0x114/0x120 [kvm] kvm_hv_invalidate_tsc_page+0x9e/0xf0 [kvm] kvm_arch_vm_ioctl+0xa26/0xc50 [kvm] ? schedule+0x4e/0xc0 ? __cond_resched+0x1a/0x50 ? futex_wait+0x166/0x250 ? __send_signal+0x1f1/0x3d0 kvm_vm_ioctl+0x747/0xda0 [kvm] ... The WARN was introduced by commit 03c0304a86bc ("KVM: Warn if mark_page_dirty() is called without an active vCPU") but the change seems to be correct (unlike Hyper-V TSC page update mechanism). In fact, there's no real need to actually write to guest memory to invalidate TSC page, this can be done by the first vCPU which goes through kvm_guest_time_update(). Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220407201013.963226-1-vkuznets@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20x86/tsx: Disable TSX development mode at bootPawan Gupta1-2/+2
commit 400331f8ffa3bec5c561417e5eec6848464e9160 upstream. A microcode update on some Intel processors causes all TSX transactions to always abort by default[*]. Microcode also added functionality to re-enable TSX for development purposes. With this microcode loaded, if tsx=on was passed on the cmdline, and TSX development mode was already enabled before the kernel boot, it may make the system vulnerable to TSX Asynchronous Abort (TAA). To be on safer side, unconditionally disable TSX development mode during boot. If a viable use case appears, this can be revisited later. [*]: Intel TSX Disable Update for Selected Processors, doc ID: 643557 [ bp: Drop unstable web link, massage heavily. ] Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/347bd844da3a333a9793c6687d4e4eb3b2419a3e.1646943780.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loadedSean Christopherson1-2/+3
commit 1d0e84806047f38027d7572adb4702ef7c09b317 upstream. Resolve nx_huge_pages to true/false when kvm.ko is loaded, leaving it as -1 is technically undefined behavior when its value is read out by param_get_bool(), as boolean values are supposed to be '0' or '1'. Alternatively, KVM could define a custom getter for the param, but the auto value doesn't depend on the vendor module in any way, and printing "auto" would be unnecessarily unfriendly to the user. In addition to fixing the undefined behavior, resolving the auto value also fixes the scenario where the auto value resolves to N and no vendor module is loaded. Previously, -1 would result in Y being printed even though KVM would ultimately disable the mitigation. Rename the existing MMU module init/exit helpers to clarify that they're invoked with respect to the vendor module, and add comments to document why KVM has two separate "module init" flows. ========================================================================= UBSAN: invalid-load in kernel/params.c:320:33 load of value 255 is not a valid value for type '_Bool' CPU: 6 PID: 892 Comm: tail Not tainted 5.17.0-rc3+ #799 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 ubsan_epilogue+0x5/0x40 __ubsan_handle_load_invalid_value.cold+0x43/0x48 param_get_bool.cold+0xf/0x14 param_attr_show+0x55/0x80 module_attr_show+0x1c/0x30 sysfs_kf_seq_show+0x93/0xc0 seq_read_iter+0x11c/0x450 new_sync_read+0x11b/0x1a0 vfs_read+0xf0/0x190 ksys_read+0x5f/0xe0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ========================================================================= Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") Cc: stable@vger.kernel.org Reported-by: Bruno Goncalves <bgoncalv@redhat.com> Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220331221359.3912754-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20static_call: Properly initialise DEFINE_STATIC_CALL_RET0()Christophe Leroy1-0/+2
[ Upstream commit 5517d500829c683a358a8de04ecb2e28af629ae5 ] When a static call is updated with __static_call_return0() as target, arch_static_call_transform() set it to use an optimised set of instructions which are meant to lay in the same cacheline. But when initialising a static call with DEFINE_STATIC_CALL_RET0(), we get a branch to the real __static_call_return0() function instead of getting the optimised setup: c00d8120 <__SCT__perf_snapshot_branch_stack>: c00d8120: 4b ff ff f4 b c00d8114 <__static_call_return0> c00d8124: 3d 80 c0 0e lis r12,-16370 c00d8128: 81 8c 81 3c lwz r12,-32452(r12) c00d812c: 7d 89 03 a6 mtctr r12 c00d8130: 4e 80 04 20 bctr c00d8134: 38 60 00 00 li r3,0 c00d8138: 4e 80 00 20 blr c00d813c: 00 00 00 00 .long 0x0 Add ARCH_DEFINE_STATIC_CALL_RET0_TRAMP() defined by each architecture to setup the optimised configuration, and rework DEFINE_STATIC_CALL_RET0() to call it: c00d8120 <__SCT__perf_snapshot_branch_stack>: c00d8120: 48 00 00 14 b c00d8134 <__SCT__perf_snapshot_branch_stack+0x14> c00d8124: 3d 80 c0 0e lis r12,-16370 c00d8128: 81 8c 81 3c lwz r12,-32452(r12) c00d812c: 7d 89 03 a6 mtctr r12 c00d8130: 4e 80 04 20 bctr c00d8134: 38 60 00 00 li r3,0 c00d8138: 4e 80 00 20 blr c00d813c: 00 00 00 00 .long 0x0 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/1e0a61a88f52a460f62a58ffc2a5f847d1f7d9d8.1647253456.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13x86/extable: Prefer local labels in .set directivesNick Desaulniers1-10/+10
commit 334865b2915c33080624e0d06f1c3e917036472c upstream. Bernardo reported an error that Nathan bisected down to (x86_64) defconfig+LTO_CLANG_FULL+X86_PMEM_LEGACY. LTO vmlinux.o ld.lld: error: <instantiation>:1:13: redefinition of 'found' .set found, 0 ^ <inline asm>:29:1: while in macro instantiation extable_type_reg reg=%eax, type=(17 | ((0) << 16)) ^ This appears to be another LTO specific issue similar to what was folded into commit 4b5305decc84 ("x86/extable: Extend extable functionality"), where the `.set found, 0` in DEFINE_EXTABLE_TYPE_REG in arch/x86/include/asm/asm.h conflicts with the symbol for the static function `found` in arch/x86/kernel/pmem.c. Assembler .set directive declare symbols with global visibility, so the assembler may not rename such symbols in the event of a conflict. LTO could rename static functions if there was a conflict in C sources, but it cannot see into symbols defined in inline asm. The symbols are also retained in the symbol table, regardless of LTO. Give the symbols .L prefixes making them locally visible, so that they may be renamed for LTO to avoid conflicts, and to drop them from the symbol table regardless of LTO. Fixes: 4b5305decc84 ("x86/extable: Extend extable functionality") Reported-by: Bernardo Meurer Costa <beme@google.com> Debugged-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20220329202148.2379697-1-ndesaulniers@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13x86/bug: Prevent shadowing in __WARN_FLAGSVincent Mailhol1-2/+2
commit 9ce02f0fc68326dd1f87a0a3a4c6ae7fdd39e6f6 upstream. The macro __WARN_FLAGS() uses a local variable named "f". This being a common name, there is a risk of shadowing other variables. For example, GCC would yield: | In file included from ./include/linux/bug.h:5, | from ./include/linux/cpumask.h:14, | from ./arch/x86/include/asm/cpumask.h:5, | from ./arch/x86/include/asm/msr.h:11, | from ./arch/x86/include/asm/processor.h:22, | from ./arch/x86/include/asm/timex.h:5, | from ./include/linux/timex.h:65, | from ./include/linux/time32.h:13, | from ./include/linux/time.h:60, | from ./include/linux/stat.h:19, | from ./include/linux/module.h:13, | from virt/lib/irqbypass.mod.c:1: | ./include/linux/rcupdate.h: In function 'rcu_head_after_call_rcu': | ./arch/x86/include/asm/bug.h:80:21: warning: declaration of 'f' shadows a parameter [-Wshadow] | 80 | __auto_type f = BUGFLAG_WARNING|(flags); \ | | ^ | ./include/asm-generic/bug.h:106:17: note: in expansion of macro '__WARN_FLAGS' | 106 | __WARN_FLAGS(BUGFLAG_ONCE | \ | | ^~~~~~~~~~~~ | ./include/linux/rcupdate.h:1007:9: note: in expansion of macro 'WARN_ON_ONCE' | 1007 | WARN_ON_ONCE(func != (rcu_callback_t)~0L); | | ^~~~~~~~~~~~ | In file included from ./include/linux/rbtree.h:24, | from ./include/linux/mm_types.h:11, | from ./include/linux/buildid.h:5, | from ./include/linux/module.h:14, | from virt/lib/irqbypass.mod.c:1: | ./include/linux/rcupdate.h:1001:62: note: shadowed declaration is here | 1001 | rcu_head_after_call_rcu(struct rcu_head *rhp, rcu_callback_t f) | | ~~~~~~~~~~~~~~~^ For reference, sparse also warns about it, c.f. [1]. This patch renames the variable from f to __flags (with two underscore prefixes as suggested in the Linux kernel coding style [2]) in order to prevent collisions. [1] https://lore.kernel.org/all/CAFGhKbyifH1a+nAMCvWM88TK6fpNPdzFtUXPmRGnnQeePV+1sw@mail.gmail.com/ [2] Linux kernel coding style, section 12) Macros, Enums and RTL, paragraph 5) namespace collisions when defining local variables in macros resembling functions https://www.kernel.org/doc/html/latest/process/coding-style.html#macros-enums-and-rtl Fixes: bfb1a7c91fb7 ("x86/bug: Merge annotate_reachable() into_BUG_FLAGS() asm") Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/20220324023742.106546-1-mailhol.vincent@wanadoo.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13perf/x86/intel: Don't extend the pseudo-encoding to GP countersKan Liang1-0/+5
commit 4a263bf331c512849062805ef1b4ac40301a9829 upstream. The INST_RETIRED.PREC_DIST event (0x0100) doesn't count on SPR. perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0 Performance counter stats for 'CPU(s) 0': 607,246 cpu/event=0xc0,umask=0x0/ 0 cpu/event=0x0,umask=0x1/ The encoding for INST_RETIRED.PREC_DIST is pseudo-encoding, which doesn't work on the generic counters. However, current perf extends its mask to the generic counters. The pseudo event-code for a fixed counter must be 0x00. Check and avoid extending the mask for the fixed counter event which using the pseudo-encoding, e.g., ref-cycles and PREC_DIST event. With the patch, perf stat -e cpu/event=0xc0,umask=0x0/,cpu/event=0x0,umask=0x1/ -C0 Performance counter stats for 'CPU(s) 0': 583,184 cpu/event=0xc0,umask=0x0/ 583,048 cpu/event=0x0,umask=0x1/ Fixes: 2de71ee153ef ("perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1648482543-14923-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13x86/msi: Fix msi message data shadow structReto Buerki1-8/+11
commit 59b18a1e65b7a2134814106d0860010e10babe18 upstream. The x86 MSI message data is 32 bits in total and is either in compatibility or remappable format, see Intel Virtualization Technology for Directed I/O, section 5.1.2. Fixes: 6285aa50736 ("x86/msi: Provide msi message shadow structs") Co-developed-by: Adrian-Ken Rueegsegger <ken@codelabs.ch> Signed-off-by: Adrian-Ken Rueegsegger <ken@codelabs.ch> Signed-off-by: Reto Buerki <reet@codelabs.ch> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220407110647.67372-1-reet@codelabs.ch Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-13x86: Annotate call_on_stack()Peter Zijlstra1-1/+2
[ Upstream commit be0075951fde739f14ee2b659e2fd6e2499c46c0 ] vmlinux.o: warning: objtool: page_fault_oops()+0x13c: unreachable instruction 0000 000000000005b460 <page_fault_oops>: ... 0128 5b588: 49 89 23 mov %rsp,(%r11) 012b 5b58b: 4c 89 dc mov %r11,%rsp 012e 5b58e: 4c 89 f2 mov %r14,%rdx 0131 5b591: 48 89 ee mov %rbp,%rsi 0134 5b594: 4c 89 e7 mov %r12,%rdi 0137 5b597: e8 00 00 00 00 call 5b59c <page_fault_oops+0x13c> 5b598: R_X86_64_PLT32 handle_stack_overflow-0x4 013c 5b59c: 5c pop %rsp vmlinux.o: warning: objtool: sysvec_reboot()+0x6d: unreachable instruction 0000 00000000000033f0 <sysvec_reboot>: ... 005d 344d: 4c 89 dc mov %r11,%rsp 0060 3450: e8 00 00 00 00 call 3455 <sysvec_reboot+0x65> 3451: R_X86_64_PLT32 irq_enter_rcu-0x4 0065 3455: 48 89 ef mov %rbp,%rdi 0068 3458: e8 00 00 00 00 call 345d <sysvec_reboot+0x6d> 3459: R_X86_64_PC32 .text+0x47d0c 006d 345d: e8 00 00 00 00 call 3462 <sysvec_reboot+0x72> 345e: R_X86_64_PLT32 irq_exit_rcu-0x4 0072 3462: 5c pop %rsp Both cases are due to a call_on_stack() calling a __noreturn function. Since that's an inline asm, GCC can't do anything about the instructions after the CALL. Therefore put in an explicit ASM_REACHABLE annotation to make sure objtool and gcc are consistently confused about control flow. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154319.468805622@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13KVM: x86/pmu: Use different raw event masks for AMD and IntelJim Mattson1-0/+1
[ Upstream commit 95b065bf5c431c06c68056a03a5853b660640ecc ] The third nybble of AMD's event select overlaps with Intel's IN_TX and IN_TXCP bits. Therefore, we can't use AMD64_RAW_EVENT_MASK on Intel platforms that support TSX. Declare a raw_event_mask in the kvm_pmu structure, initialize it in the vendor-specific pmu_refresh() functions, and use that mask for PERF_TYPE_RAW configurations in reprogram_gp_counter(). Fixes: 710c47651431 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW") Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20220308012452.3468611-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08KVM: x86: SVM: fix avic spec based definitions againMaxim Levitsky1-3/+5
commit 0dacc3df898e219fa774f39e5e10d686364e0a27 upstream. Due to wrong rebase, commit 4a204f7895878 ("KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255") moved avic spec #defines back to avic.c. Move them back, and while at it extend AVIC_DOORBELL_PHYSICAL_ID_MASK to 12 bits as well (it will be used in nested avic) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220322172449.235575-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255Suravee Suthikulpanit1-1/+1
commit 4a204f7895878363ca8211f50ec610408c8c70aa upstream. Expand KVM's mask for the AVIC host physical ID to the full 12 bits defined by the architecture. The number of bits consumed by hardware is model specific, e.g. early CPUs ignored bits 11:8, but there is no way for KVM to enumerate the "true" size. So, KVM must allow using all bits, else it risks rejecting completely legal x2APIC IDs on newer CPUs. This means KVM relies on hardware to not assign x2APIC IDs that exceed the "true" width of the field, but presumably hardware is smart enough to tie the width to the max x2APIC ID. KVM also relies on hardware to support at least 8 bits, as the legacy xAPIC ID is writable by software. But, those assumptions are unavoidable due to the lack of any way to enumerate the "true" width. Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Fixes: 44a95dae1d22 ("KVM: x86: Detect and Initialize AVIC support") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20220211000851.185799-1-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08Merge tag 'x86_bugs_for_v5.17' of ↵Linus Torvalds2-8/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 spectre fixes from Borislav Petkov: - Mitigate Spectre v2-type Branch History Buffer attacks on machines which support eIBRS, i.e., the hardware-assisted speculation restriction after it has been shown that such machines are vulnerable even with the hardware mitigation. - Do not use the default LFENCE-based Spectre v2 mitigation on AMD as it is insufficient to mitigate such attacks. Instead, switch to retpolines on all AMD by default. - Update the docs and add some warnings for the obviously vulnerable cmdline configurations. * tag 'x86_bugs_for_v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT x86/speculation: Warn about Spectre v2 LFENCE mitigation x86/speculation: Update link to AMD speculation whitepaper x86/speculation: Use generic retpoline by default on AMD x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting Documentation/hw-vuln: Update spectre doc x86/speculation: Add eIBRS + Retpoline options x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
2022-02-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+0
Pull kvm fixes from Paolo Bonzini: "x86 host: - Expose KVM_CAP_ENABLE_CAP since it is supported - Disable KVM_HC_CLOCK_PAIRING in TSC catchup mode - Ensure async page fault token is nonzero - Fix lockdep false negative - Fix FPU migration regression from the AMX changes x86 guest: - Don't use PV TLB/IPI/yield on uniprocessor guests PPC: - reserve capability id (topic branch for ppc/kvm)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled KVM: x86/mmu: make apf token non-zero to fix bug KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3 x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU x86/kvm: Fix compilation warning in non-x86_64 builds x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0 x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0 kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode KVM: Fix lockdep false negative during host resume KVM: x86: Add KVM_CAP_ENABLE_CAP to x86
2022-02-21x86/speculation: Add eIBRS + Retpoline optionsPeter Zijlstra1-1/+3
Thanks to the chaps at VUsec it is now clear that eIBRS is not sufficient, therefore allow enabling of retpolines along with eIBRS. Add spectre_v2=eibrs, spectre_v2=eibrs,lfence and spectre_v2=eibrs,retpoline options to explicitly pick your preferred means of mitigation. Since there's new mitigations there's also user visible changes in /sys/devices/system/cpu/vulnerabilities/spectre_v2 to reflect these new mitigations. [ bp: Massage commit message, trim error messages, do more precise eIBRS mode checking. ] Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Patrick Colp <patrick.colp@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2022-02-21x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCEPeter Zijlstra (Intel)2-7/+7
The RETPOLINE_AMD name is unfortunate since it isn't necessarily AMD only, in fact Hygon also uses it. Furthermore it will likely be sufficient for some Intel processors. Therefore rename the thing to RETPOLINE_LFENCE to better describe what it is. Add the spectre_v2=retpoline,lfence option as an alias to spectre_v2=retpoline,amd to preserve existing setups. However, the output of /sys/devices/system/cpu/vulnerabilities/spectre_v2 will be changed. [ bp: Fix typos, massage. ] Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2022-02-17x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0Leonardo Bras1-1/+0
kvm_vcpu_arch currently contains the guest supported features in both guest_supported_xcr0 and guest_fpu.fpstate->user_xfeatures field. Currently both fields are set to the same value in kvm_vcpu_after_set_cpuid() and are not changed anywhere else after that. Since it's not good to keep duplicated data, remove guest_supported_xcr0. To keep the code more readable, introduce kvm_guest_supported_xcr() and kvm_guest_supported_xfd() to replace the previous usages of guest_supported_xcr0. Signed-off-by: Leonardo Bras <leobras@redhat.com> Message-Id: <20220217053028.96432-3-leobras@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-0/+37
Pull kvm fixes from Paolo Bonzini: "ARM: - Read HW interrupt pending state from the HW x86: - Don't truncate the performance event mask on AMD - Fix Xen runstate updates to be atomic when preempting vCPU - Fix for AMD AVIC interrupt injection race - Several other AMD fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event KVM: SVM: fix race between interrupt delivery and AVIC inhibition KVM: SVM: set IRR in svm_deliver_interrupt KVM: SVM: extract avic_ring_doorbell selftests: kvm: Remove absent target file KVM: arm64: vgic: Read HW interrupt pending state from the HW KVM: x86/xen: Fix runstate updates to be atomic when preempting vCPU KVM: x86: SVM: move avic definitions from AMD's spec to svm.h KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them KVM: x86: nSVM: expose clean bit support to the guest KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state KVM: x86: nSVM: fix potential NULL derefernce on nested migration KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case Revert "svm: Add warning message for AVIC IPI invalid target"
2022-02-13Merge tag 'objtool_urgent_for_v5.17_rc4' of ↵Linus Torvalds1-9/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Borislav Petkov: "Fix a case where objtool would mistakenly warn about instructions being unreachable" * tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm
2022-02-12Merge tag 'for-linus-5.17a-rc4-tag' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - Two small cleanups - Another fix for addressing the EFI framebuffer above 4GB when running as Xen dom0 - A patch to let Xen guests use reserved bits in MSI- and IO-APIC- registers for extended APIC-IDs the same way KVM guests are doing it already * tag 'for-linus-5.17a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pci: Make use of the helper macro LIST_HEAD() xen/x2apic: Fix inconsistent indenting xen/x86: detect support for extended destination ID xen/x86: obtain full video frame buffer address for Dom0 also under EFI
2022-02-10xen/x86: detect support for extended destination IDRoger Pau Monne1-0/+7
Xen allows the usage of some previously reserved bits in the IO-APIC RTE and the MSI address fields in order to store high bits for the target APIC ID. Such feature is already implemented by QEMU/KVM and HyperV, so in order to enable it just add the handler that checks for it's presence. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220120152527.7524-3-roger.pau@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-02-08KVM: x86: SVM: move avic definitions from AMD's spec to svm.hMaxim Levitsky2-0/+37
asm/svm.h is the correct place for all values that are defined in the SVM spec, and that includes AVIC. Also add some values from the spec that were not defined before and will be soon useful. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220207155447.840194-10-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-05Merge tag 'for-linus-5.17a-rc3-tag' of ↵Linus Torvalds1-14/+0
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - documentation fixes related to Xen - enable x2apic mode when available when running as hardware virtualized guest under Xen - cleanup and fix a corner case of vcpu enumeration when running a paravirtualized Xen guest * tag 'for-linus-5.17a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/Xen: streamline (and fix) PV CPU enumeration xen: update missing ioctl magic numers documentation Improve docs for IOCTL_GNTDEV_MAP_GRANT_REF xen: xenbus_dev.h: delete incorrect file name xen/x2apic: enable x2apic mode when supported for HVM
2022-02-05Merge tag 'kvmarm-fixes-5.17-2' of ↵Paolo Bonzini44-291/+407
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.17, take #2 - A couple of fixes when handling an exception while a SError has been delivered - Workaround for Cortex-A510's single-step[ erratum
2022-02-03x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asmNick Desaulniers1-9/+11
In __WARN_FLAGS(), we had two asm statements (abbreviated): asm volatile("ud2"); asm volatile(".pushsection .discard.reachable"); These pair of statements are used to trigger an exception, but then help objtool understand that for warnings, control flow will be restored immediately afterwards. The problem is that volatile is not a compiler barrier. GCC explicitly documents this: > Note that the compiler can move even volatile asm instructions > relative to other code, including across jump instructions. Also, no clobbers are specified to prevent instructions from subsequent statements from being scheduled by compiler before the second asm statement. This can lead to instructions from subsequent statements being emitted by the compiler before the second asm statement. Providing a scheduling model such as via -march= options enables the compiler to better schedule instructions with known latencies to hide latencies from data hazards compared to inline asm statements in which latencies are not estimated. If an instruction gets scheduled by the compiler between the two asm statements, then objtool will think that it is not reachable, producing a warning. To prevent instructions from being scheduled in between the two asm statements, merge them. Also remove an unnecessary unreachable() asm annotation from BUG() in favor of __builtin_unreachable(). objtool is able to track that the ud2 from BUG() terminates control flow within the function. Link: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile Link: https://github.com/ClangBuiltLinux/linux/issues/1483 Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220202205557.2260694-1-ndesaulniers@google.com
2022-02-01KVM: x86: Move delivery of non-APICv interrupt into vendor codeSean Christopherson2-2/+3
Handle non-APICv interrupt delivery in vendor code, even though it means VMX and SVM will temporarily have duplicate code. SVM's AVIC has a race condition that requires KVM to fall back to legacy interrupt injection _after_ the interrupt has been logged in the vIRR, i.e. to fix the race, SVM will need to open code the full flow anyways[*]. Refactor the code so that the SVM bug without introducing other issues, e.g. SVM would return "success" and thus invoke trace_kvm_apicv_accept_irq() even when delivery through the AVIC failed, and to opportunistically prepare for using KVM_X86_OP to fill each vendor's kvm_x86_ops struct, which will rely on the vendor function matching the kvm_x86_op pointer name. No functional change intended. [*] https://lore.kernel.org/all/20211213104634.199141-4-mlevitsk@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220128005208.4008533-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-2/+6
Pull kvm fixes from Paolo Bonzini: "Two larger x86 series: - Redo incorrect fix for SEV/SMAP erratum - Windows 11 Hyper-V workaround Other x86 changes: - Various x86 cleanups - Re-enable access_tracking_perf_test - Fix for #GP handling on SVM - Fix for CPUID leaf 0Dh in KVM_GET_SUPPORTED_CPUID - Fix for ICEBP in interrupt shadow - Avoid false-positive RCU splat - Enable Enlightened MSR-Bitmap support for real ARM: - Correctly update the shadow register on exception injection when running in nVHE mode - Correctly use the mm_ops indirection when performing cache invalidation from the page-table walker - Restrict the vgic-v3 workaround for SEIS to the two known broken implementations Generic code changes: - Dead code cleanup" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) KVM: eventfd: Fix false positive RCU usage warning KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread() KVM: nVMX: Rename vmcs_to_field_offset{,_table} KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS selftests: kvm: check dynamic bits against KVM_X86_XCOMP_GUEST_SUPP KVM: x86: add system attribute to retrieve full set of supported xsave states KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr selftests: kvm: move vm_xsave_req_perm call to amx_test KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS KVM: x86: Keep MSR_IA32_XSS unchanged for INIT KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2} KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02 KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest KVM: x86: Check .flags in kvm_cpuid_check_equal() too KVM: x86: Forcibly leave nested virt when SMM state is toggled KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments() KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real ...
2022-01-28KVM: x86: add system attribute to retrieve full set of supported xsave statesPaolo Bonzini1-0/+3
Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that have not been enabled. Probing those, for example so that they can be passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl. The latter is in fact worse, even though that is what the rest of the API uses, because it would require supported_xcr0 to be moved from the KVM module to the kernel just for this use. In addition, the value would be nonsensical (or an error would have to be returned) until the KVM module is loaded in. Therefore, to limit the growth of system ioctls, add a /dev/kvm variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86 with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-28xen/x2apic: enable x2apic mode when supported for HVMRoger Pau Monne1-14/+0
There's no point in disabling x2APIC mode when running as a Xen HVM guest, just enable it when available. Remove some unneeded wrapping around the detection functions, and simply provide a xen_x2apic_available helper that's a wrapper around x2apic_supported. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220121090146.13697-1-roger.pau@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-01-26KVM: x86: Forcibly leave nested virt when SMM state is toggledSean Christopherson1-0/+1
Forcibly leave nested virtualization operation if userspace toggles SMM state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS. If userspace forces the vCPU out of SMM while it's post-VMXON and then injects an SMI, vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both vmxon=false and smm.vmxon=false, but all other nVMX state allocated. Don't attempt to gracefully handle the transition as (a) most transitions are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't sufficient information to handle all transitions, e.g. SVM wants access to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede KVM_SET_NESTED_STATE during state restore as the latter disallows putting the vCPU into L2 if SMM is active, and disallows tagging the vCPU as being post-VMXON in SMM if SMM is not active. Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU in an architecturally impossible state. WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Modules linked in: CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00 Call Trace: <TASK> kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123 kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline] kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460 kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline] kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline] kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250 kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273 __fput+0x286/0x9f0 fs/file_table.c:311 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0xb29/0x2a30 kernel/exit.c:806 do_group_exit+0xd2/0x2f0 kernel/exit.c:935 get_signal+0x4b0/0x28c0 kernel/signal.c:2862 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> Cc: stable@vger.kernel.org Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220125220358.2091737-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-26KVM: x86: Pass emulation type to can_emulate_instruction()Sean Christopherson1-1/+2
Pass the emulation type to kvm_x86_ops.can_emulate_insutrction() so that a future commit can harden KVM's SEV support to WARN on emulation scenarios that should never happen. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Message-Id: <20220120010719.711476-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-25KVM/X86: Make kvm_vcpu_reload_apic_access_page() staticQuanfa Fu1-1/+0
Make kvm_vcpu_reload_apic_access_page() static as it is no longer invoked directly by vmx and it is also no longer exported. No functional change intended. Signed-off-by: Quanfa Fu <quanfafu@gmail.com> Message-Id: <20211219091446.174584-1-quanfafu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-23Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linuxLinus Torvalds1-2/+0
Pull bitmap updates from Yury Norov: - introduce for_each_set_bitrange() - use find_first_*_bit() instead of find_next_*_bit() where possible - unify for_each_bit() macros * tag 'bitmap-5.17-rc1' of git://github.com/norov/linux: vsprintf: rework bitmap_list_string lib: bitmap: add performance test for bitmap_print_to_pagebuf bitmap: unify find_bit operations mm/percpu: micro-optimize pcpu_is_populated() Replace for_each_*_bit_from() with for_each_*_bit() where appropriate find: micro-optimize for_each_{set,clear}_bit() include/linux: move for_each_bit() macros from bitops.h to find.h cpumask: replace cpumask_next_* with cpumask_first_* where appropriate tools: sync tools/bitmap with mother linux all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate cpumask: use find_first_and_bit() lib: add find_first_and_bit() arch: remove GENERIC_FIND_FIRST_BIT entirely include: move find.h from asm_generic to linux bitops: move find_bit_*_le functions from le.h to find.h bitops: protect find_first_{,zero}_bit properly
2022-01-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-14/+2
Pull more kvm updates from Paolo Bonzini: "Generic: - selftest compilation fix for non-x86 - KVM: avoid warning on s390 in mark_page_dirty x86: - fix page write-protection bug and improve comments - use binary search to lookup the PMU event filter, add test - enable_pmu module parameter support for Intel CPUs - switch blocked_vcpu_on_cpu_lock to raw spinlock - cleanups of blocked vCPU logic - partially allow KVM_SET_CPUID{,2} after KVM_RUN (5.16 regression) - various small fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits) docs: kvm: fix WARNINGs from api.rst selftests: kvm/x86: Fix the warning in lib/x86_64/processor.c selftests: kvm/x86: Fix the warning in pmu_event_filter_test.c kvm: selftests: Do not indent with spaces kvm: selftests: sync uapi/linux/kvm.h with Linux header selftests: kvm: add amx_test to .gitignore KVM: SVM: Nullify vcpu_(un)blocking() hooks if AVIC is disabled KVM: SVM: Move svm_hardware_setup() and its helpers below svm_x86_ops KVM: SVM: Drop AVIC's intermediate avic_set_running() helper KVM: VMX: Don't do full kick when handling posted interrupt wakeup KVM: VMX: Fold fallback path into triggering posted IRQ helper KVM: VMX: Pass desired vector instead of bool for triggering posted IRQ KVM: VMX: Don't do full kick when triggering posted interrupt "fails" KVM: SVM: Skip AVIC and IRTE updates when loading blocking vCPU KVM: SVM: Use kvm_vcpu_is_blocking() in AVIC load to handle preemption KVM: SVM: Remove unnecessary APICv/AVIC update in vCPU unblocking path KVM: SVM: Don't bother checking for "running" AVIC when kicking for IPIs KVM: SVM: Signal AVIC doorbell iff vCPU is in guest mode KVM: x86: Remove defunct pre_block/post_block kvm_x86_ops hooks KVM: x86: Unexport LAPIC's switch_to_{hv,sw}_timer() helpers ...
2022-01-19KVM: x86: Remove defunct pre_block/post_block kvm_x86_ops hooksSean Christopherson2-14/+0
Drop kvm_x86_ops' pre/post_block() now that all implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19Merge branch 'kvm-pi-raw-spinlock' into HEADPaolo Bonzini2-4/+2
Bring in fix for VT-d posted interrupts before further changing the code in 5.17. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-19KVM: VMX: Reject KVM_RUN if emulation is required with pending exceptionSean Christopherson2-0/+2
Reject KVM_RUN if emulation is required (because VMX is running without unrestricted guest) and an exception is pending, as KVM doesn't support emulating exceptions except when emulating real mode via vm86. The vCPU is hosed either way, but letting KVM_RUN proceed triggers a WARN due to the impossible condition. Alternatively, the WARN could be removed, but then userspace and/or KVM bugs would result in the vCPU silently running in a bad state, which isn't very friendly to users. Originally, the bug was hit by syzkaller with a nested guest as that doesn't require kvm_intel.unrestricted_guest=0. That particular flavor is likely fixed by commit cd0e615c49e5 ("KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required"), but it's trivial to trigger the WARN with a non-nested guest, and userspace can likely force bad state via ioctls() for a nested guest as well. Checking for the impossible condition needs to be deferred until KVM_RUN because KVM can't force specific ordering between ioctls. E.g. clearing exception.pending in KVM_SET_SREGS doesn't prevent userspace from setting it in KVM_SET_VCPU_EVENTS, and disallowing KVM_SET_VCPU_EVENTS with emulation_required would prevent userspace from queuing an exception and then stuffing sregs. Note, if KVM were to try and detect/prevent the condition prior to KVM_RUN, handle_invalid_guest_state() and/or handle_emulation_failure() would need to be modified to clear the pending exception prior to exiting to userspace. ------------[ cut here ]------------ WARNING: CPU: 6 PID: 137812 at arch/x86/kvm/vmx/vmx.c:1623 vmx_queue_exception+0x14f/0x160 [kvm_intel] CPU: 6 PID: 137812 Comm: vmx_invalid_nes Not tainted 5.15.2-7cc36c3e14ae-pop #279 Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014 RIP: 0010:vmx_queue_exception+0x14f/0x160 [kvm_intel] Code: <0f> 0b e9 fd fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 RSP: 0018:ffffa45c83577d38 EFLAGS: 00010202 RAX: 0000000000000003 RBX: 0000000080000006 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000010002 RDI: ffff9916af734000 RBP: ffff9916af734000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000006 R13: 0000000000000000 R14: ffff9916af734038 R15: 0000000000000000 FS: 00007f1e1a47c740(0000) GS:ffff99188fb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1e1a6a8008 CR3: 000000026f83b005 CR4: 00000000001726e0 Call Trace: kvm_arch_vcpu_ioctl_run+0x13a2/0x1f20 [kvm] kvm_vcpu_ioctl+0x279/0x690 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Reported-by: syzbot+82112403ace4cbd780d8@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211228232437.1875318-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds8-44/+115
Pull kvm updates from Paolo Bonzini: "RISCV: - Use common KVM implementation of MMU memory caches - SBI v0.2 support for Guest - Initial KVM selftests support - Fix to avoid spurious virtual interrupts after clearing hideleg CSR - Update email address for Anup and Atish ARM: - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update s390: - fix sigp sense/start/stop/inconsistency - cleanups x86: - Clean up some function prototypes more - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery - completely remove potential TOC/TOU races in nested SVM consistency checks - update some PMCs on emulated instructions - Intel AMX support (joint work between Thomas and Intel) - large MMU cleanups - module parameter to disable PMU virtualization - cleanup register cache - first part of halt handling cleanups - Hyper-V enlightened MSR bitmap support for nested hypervisors Generic: - clean up Makefiles - introduce CONFIG_HAVE_KVM_DIRTY_RING - optimize memslot lookup using a tree - optimize vCPU array usage by converting to xarray" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits) x86/fpu: Fix inline prefix warnings selftest: kvm: Add amx selftest selftest: kvm: Move struct kvm_x86_state to header selftest: kvm: Reorder vcpu_load_state steps for AMX kvm: x86: Disable interception for IA32_XFD on demand x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state() kvm: selftests: Add support for KVM_CAP_XSAVE2 kvm: x86: Add support for getting/setting expanded xstate buffer x86/fpu: Add uabi_size to guest_fpu kvm: x86: Add CPUID support for Intel AMX kvm: x86: Add XCR0 support for Intel AMX kvm: x86: Disable RDMSR interception of IA32_XFD_ERR kvm: x86: Emulate IA32_XFD_ERR for guest kvm: x86: Intercept #NM for saving IA32_XFD_ERR x86/fpu: Prepare xfd_err in struct fpu_guest kvm: x86: Add emulation for IA32_XFD x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2 x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM x86/fpu: Add guest support to xfd_enable_feature() ...
2022-01-16Merge tag 'hyperv-next-signed-20220114' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - More patches for Hyper-V isolation VM support (Tianyu Lan) - Bug fixes and clean-up patches from various people * tag 'hyperv-next-signed-20220114' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: scsi: storvsc: Fix storvsc_queuecommand() memory leak x86/hyperv: Properly deal with empty cpumasks in hyperv_flush_tlb_multi() Drivers: hv: vmbus: Initialize request offers message for Isolation VM scsi: storvsc: Fix unsigned comparison to zero swiotlb: Add CONFIG_HAS_IOMEM check around swiotlb_mem_remap() x86/hyperv: Fix definition of hv_ghcb_pg variable Drivers: hv: Fix definition of hypercall input & output arg variables net: netvsc: Add Isolation VM support for netvsc driver scsi: storvsc: Add Isolation VM support for storvsc driver hyper-v: Enable swiotlb bounce buffer for Isolation VM x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has() swiotlb: Add swiotlb bounce buffer remap function for HV IVM
2022-01-16Merge tag 'pci-v5.17-changes' of ↵Linus Torvalds2-7/+33
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Use pci_find_vsec_capability() instead of open-coding it (Andy Shevchenko) - Convert pci_dev_present() stub from macro to static inline to avoid 'unused variable' errors (Hans de Goede) - Convert sysfs slot attributes from default_attrs to default_groups (Greg Kroah-Hartman) - Use DWORD accesses for LTR, L1 SS to avoid BayHub OZ711LV2 erratum (Rajat Jain) - Remove unnecessary initialization of static variables (Longji Guo) Resource management: - Always write Intel I210 ROM BAR on update to work around device defect (Bjorn Helgaas) PCIe native device hotplug: - Fix pciehp lockdep errors on Thunderbolt undock (Hans de Goede) - Fix infinite loop in pciehp IRQ handler on power fault (Lukas Wunner) Power management: - Convert amd64-agp, sis-agp, via-agp from legacy PCI power management to generic power management (Vaibhav Gupta) IOMMU: - Add function 1 DMA alias quirk for Marvell 88SE9125 SATA controller so it can work with an IOMMU (Yifeng Li) Error handling: - Add PCI_ERROR_RESPONSE and related definitions for signaling and checking for transaction errors on PCI (Naveen Naidu) - Fabricate PCI_ERROR_RESPONSE data (~0) in config read wrappers, instead of in host controller drivers, when transactions fail on PCI (Naveen Naidu) - Use PCI_POSSIBLE_ERROR() to check for possible failure of config reads (Naveen Naidu) Peer-to-peer DMA: - Add Logan Gunthorpe as P2PDMA maintainer (Bjorn Helgaas) ASPM: - Calculate link L0s and L1 exit latencies when needed instead of caching them (Saheed O. Bolarinwa) - Calculate device L0s and L1 acceptable exit latencies when needed instead of caching them (Saheed O. Bolarinwa) - Remove struct aspm_latency since it's no longer needed (Saheed O. Bolarinwa) APM X-Gene PCIe controller driver: - Fix IB window setup, which was broken by the fact that IB resources are now sorted in address order instead of DT dma-ranges order (Rob Herring) Apple PCIe controller driver: - Enable clock gating to save power (Hector Martin) - Fix REFCLK1 enable/poll logic (Hector Martin) Broadcom STB PCIe controller driver: - Declare bitmap correctly for use by bitmap interfaces (Christophe JAILLET) - Clean up computation of legacy and non-legacy MSI bitmasks (Florian Fainelli) - Update suspend/resume/remove error handling to warn about errors and not fail the operation (Jim Quinlan) - Correct the "pcie" and "msi" interrupt descriptions in DT binding (Jim Quinlan) - Add DT bindings for endpoint voltage regulators (Jim Quinlan) - Split brcm_pcie_setup() into two functions (Jim Quinlan) - Add mechanism for turning on voltage regulators for connected devices (Jim Quinlan) - Turn voltage regulators for connected devices on/off when bus is added or removed (Jim Quinlan) - When suspending, don't turn off voltage regulators for wakeup devices (Jim Quinlan) Freescale i.MX6 PCIe controller driver: - Add i.MX8MM support (Richard Zhu) Freescale Layerscape PCIe controller driver: - Use DWC common ops instead of layerscape-specific link-up functions (Hou Zhiqiang) Intel VMD host bridge driver: - Honor platform ACPI _OSC feature negotiation for Root Ports below VMD (Kai-Heng Feng) - Add support for Raptor Lake SKUs (Karthik L Gopalakrishnan) - Reset everything below VMD before enumerating to work around failure to enumerate NVMe devices when guest OS reboots (Nirmal Patel) Bridge emulation (used by Marvell Aardvark and MVEBU): - Make emulated ROM BAR read-only by default (Pali Rohár) - Make some emulated legacy PCI bits read-only for PCIe devices (Pali Rohár) - Update reserved bits in emulated PCIe Capability (Pali Rohár) - Allow drivers to emulate different PCIe Capability versions (Pali Rohár) - Set emulated Capabilities List bit for all PCIe devices, since they must have at least a PCIe Capability (Pali Rohár) Marvell Aardvark PCIe controller driver: - Add bridge emulation definitions for PCIe DEVCAP2, DEVCTL2, DEVSTA2, LNKCAP2, LNKCTL2, LNKSTA2, SLTCAP2, SLTCTL2, SLTSTA2 (Pali Rohár) - Add aardvark support for DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2 registers (Pali Rohár) - Clear all MSIs at setup to avoid spurious interrupts (Pali Rohár) - Disable bus mastering when unbinding host controller driver (Pali Rohár) - Mask all interrupts when unbinding host controller driver (Pali Rohár) - Fix memory leak in host controller unbind (Pali Rohár) - Assert PERST# when unbinding host controller driver (Pali Rohár) - Disable link training when unbinding host controller driver (Pali Rohár) - Disable common PHY when unbinding host controller driver (Pali Rohár) - Fix resource type checking to check only IORESOURCE_MEM, not IORESOURCE_MEM_64, which is a flavor of IORESOURCE_MEM (Pali Rohár) Marvell MVEBU PCIe controller driver: - Implement pci_remap_iospace() for ARM so mvebu can use devm_pci_remap_iospace() instead of the previous ARM-specific pci_ioremap_io() interface (Pali Rohár) - Use the standard pci_host_probe() instead of the device-specific mvebu_pci_host_probe() (Pali Rohár) - Replace all uses of ARM-specific pci_ioremap_io() with the ARM implementation of the standard pci_remap_iospace() interface and remove pci_ioremap_io() (Pali Rohár) - Skip initializing invalid Root Ports (Pali Rohár) - Check for errors from pci_bridge_emul_init() (Pali Rohár) - Ignore any bridges at non-zero function numbers (Pali Rohár) - Return ~0 data for invalid config read size (Pali Rohár) - Disallow mapping interrupts on emulated bridges (Pali Rohár) - Clear Root Port Memory & I/O Space Enable and Bus Master Enable at initialization (Pali Rohár) - Make type bits in Root Port I/O Base register read-only (Pali Rohár) - Disable Root Port windows when base/limit set to invalid values (Pali Rohár) - Set controller to Root Complex mode (Pali Rohár) - Set Root Port Class Code to PCI Bridge (Pali Rohár) - Update emulated Root Port secondary bus numbers to better reflect the actual topology (Pali Rohár) - Add PCI_BRIDGE_CTL_BUS_RESET support to emulated Root Ports so pci_reset_secondary_bus() can reset connected devices (Pali Rohár) - Add PCI_EXP_DEVCTL Error Reporting Enable support to emulated Root Ports (Pali Rohár) - Add PCI_EXP_RTSTA PME Status bit support to emulated Root Ports (Pali Rohár) - Add DEVCAP2, DEVCTL2 and LNKCTL2 support to emulated Root Ports on Armada XP and newer devices (Pali Rohár) - Export mvebu-mbus.c symbols to allow pci-mvebu.c to be a module (Pali Rohár) - Add support for compiling as a module (Pali Rohár) MediaTek PCIe controller driver: - Assert PERST# for 100ms to allow power and clock to stabilize (qizhong cheng) MediaTek PCIe Gen3 controller driver: - Disable Mediatek DVFSRC voltage request since lack of DVFSRC to respond to the request causes failure to exit L1 PM Substate (Jianjun Wang) MediaTek MT7621 PCIe controller driver: - Declare mt7621_pci_ops static (Sergio Paracuellos) - Give pcibios_root_bridge_prepare() access to host bridge windows (Sergio Paracuellos) - Move MIPS I/O coherency unit setup from driver to pcibios_root_bridge_prepare() (Sergio Paracuellos) - Add missing MODULE_LICENSE() (Sergio Paracuellos) - Allow COMPILE_TEST for all arches (Sergio Paracuellos) Microsoft Hyper-V host bridge driver: - Add hv-internal interfaces to encapsulate arch IRQ dependencies (Sunil Muthuswamy) - Add arm64 Hyper-V vPCI support (Sunil Muthuswamy) Qualcomm PCIe controller driver: - Undo PM setup in qcom_pcie_probe() error handling path (Christophe JAILLET) - Use __be16 type to store return value from cpu_to_be16() (Manivannan Sadhasivam) - Constify static dw_pcie_ep_ops (Rikard Falkeborn) Renesas R-Car PCIe controller driver: - Fix aarch32 abort handler so it doesn't check the wrong bus clock before accessing the host controller (Marek Vasut) TI Keystone PCIe controller driver: - Add register offset for ti,syscon-pcie-id and ti,syscon-pcie-mode DT properties (Kishon Vijay Abraham I) MicroSemi Switchtec management driver: - Add Gen4 automotive device IDs (Kelvin Cao) - Declare state_names[] as static so it's not allocated and initialized for every call (Kelvin Cao) Host controller driver cleanups: - Use of_device_get_match_data(), not of_match_device(), when we only need the device data in altera, artpec6, cadence, designware-plat, dra7xx, keystone, kirin (Fan Fei) - Drop pointless of_device_get_match_data() cast in j721e (Bjorn Helgaas) - Drop redundant struct device * from j721e since struct cdns_pcie already has one (Bjorn Helgaas) - Rename driver structs to *_pcie in intel-gw, iproc, ls-gen4, mediatek-gen3, microchip, mt7621, rcar-gen2, tegra194, uniphier, xgene, xilinx, xilinx-cpm for consistency across drivers (Fan Fei) - Fix invalid address space conversions in hisi, spear13xx (Bjorn Helgaas) Miscellaneous: - Sort Intel Device IDs by value (Andy Shevchenko) - Change Capability offsets to hex to match spec (Baruch Siach) - Correct misspellings (Krzysztof Wilczyński) - Terminate statement with semicolon in pci_endpoint_test.c (Ming Wang)" * tag 'pci-v5.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (151 commits) PCI: mt7621: Allow COMPILE_TEST for all arches PCI: mt7621: Add missing MODULE_LICENSE() PCI: mt7621: Move MIPS setup to pcibios_root_bridge_prepare() PCI: Let pcibios_root_bridge_prepare() access bridge->windows PCI: mt7621: Declare mt7621_pci_ops static PCI: brcmstb: Do not turn off WOL regulators on suspend PCI: brcmstb: Add control of subdevice voltage regulators PCI: brcmstb: Add mechanism to turn on subdev regulators PCI: brcmstb: Split brcm_pcie_setup() into two funcs dt-bindings: PCI: Add bindings for Brcmstb EP voltage regulators dt-bindings: PCI: Correct brcmstb interrupts, interrupt-map. PCI: brcmstb: Fix function return value handling PCI: brcmstb: Do not use __GENMASK PCI: brcmstb: Declare 'used' as bitmap, not unsigned long PCI: hv: Add arm64 Hyper-V vPCI support PCI: hv: Make the code arch neutral by adding arch specific interfaces PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors x86/PCI: Remove initialization of static variables to false PCI: Use DWORD accesses for LTR, L1 SS to avoid erratum misc: pci_endpoint_test: Terminate statement with semicolon ...