kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2023-02-04	KVM: x86: Simplify msr_filter update	Michal Luczaj	1	-4/+1
	Replace srcu_dereference()+rcu_assign_pointer() sequence with a single rcu_replace_pointer(). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230107001256.2365304-4-mhal@rbox.co Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-04	KVM: x86: Optimize kvm->lock and SRCU interaction (KVM_X86_SET_MSR_FILTER)	Michal Luczaj	1	-1/+1
	Reduce time spent holding kvm->lock: unlock mutex before calling synchronize_srcu(). There is no need to hold kvm->lock until all vCPUs have been kicked, KVM only needs to guarantee that all vCPUs will switch to the new filter before exiting to userspace. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230107001256.2365304-3-mhal@rbox.co [sean: expand changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-04	KVM: x86: Optimize kvm->lock and SRCU interaction (KVM_SET_PMU_EVENT_FILTER)	Michal Luczaj	1	-2/+1
	Reduce time spent holding kvm->lock: unlock mutex before calling synchronize_srcu_expedited(). There is no need to hold kvm->lock until all vCPUs have been kicked, KVM only needs to guarantee that all vCPUs will switch to the new filter before exiting to userspace. Protecting the write to __reprogram_pmi is also unnecessary as a vCPU may process a set bit before receiving the final KVM_REQ_PMU, but the per-vCPU writes are guaranteed to occur after all vCPUs have switched to the new filter. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230107001256.2365304-2-mhal@rbox.co [sean: expand changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-04	KVM: x86/emulator: Fix comment in __load_segment_descriptor()	Michal Luczaj	1	-1/+1
	The comment refers to the same condition twice. Make it reflect what the code actually does. No functional change intended. Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20230126013405.2967156-3-mhal@rbox.co Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-04	KVM: x86/emulator: Fix segment load privilege level validation	Michal Luczaj	1	-2/+2
	Intel SDM describes what steps are taken by the CPU to verify if a memory segment can actually be used at a given privilege level. Loading DS/ES/FS/GS involves checking segment's type as well as making sure that neither selector's RPL nor caller's CPL are greater than segment's DPL. Emulator implements Intel's pseudocode in __load_segment_descriptor(), even quoting the pseudocode in the comments. Although the pseudocode is correctly translated, the implementation is incorrect. This is most likely due to SDM, at the time, being wrong. Patch fixes emulator's logic and updates the pseudocode in the comment. Below are historical notes. Emulator code for handling segment descriptors appears to have been introduced in March 2010 in commit 38ba30ba51a0 ("KVM: x86 emulator: Emulate task switch in emulator.c"). Intel SDM Vol 2A: Instruction Set Reference, A-M (Order Number: 253666-034US, _March 2010_) lists the steps for loading segment registers in section related to MOV instruction: IF DS, ES, FS, or GS is loaded with non-NULL selector THEN IF segment selector index is outside descriptor table limits or segment is not a data or readable code segment or ((segment is a data or nonconforming code segment) and (both RPL and CPL > DPL)) <--- THEN #GP(selector); FI; This is precisely what __load_segment_descriptor() quotes and implements. But there's a twist; a few SDM revisions later (253667-044US), in August 2012, the snippet above becomes: IF DS, ES, FS, or GS is loaded with non-NULL selector THEN IF segment selector index is outside descriptor table limits or segment is not a data or readable code segment or ((segment is a data or nonconforming code segment) [note: missing or superfluous parenthesis?] or ((RPL > DPL) and (CPL > DPL)) <--- THEN #GP(selector); FI; Many SDMs later (253667-065US), in December 2017, pseudocode reaches what seems to be its final form: IF DS, ES, FS, or GS is loaded with non-NULL selector THEN IF segment selector index is outside descriptor table limits OR segment is not a data or readable code segment OR ((segment is a data or nonconforming code segment) AND ((RPL > DPL) or (CPL > DPL))) <--- THEN #GP(selector); FI; which also matches the behavior described in AMD's APM, which states that a #GP occurs if: The DS, ES, FS, or GS register was loaded and the segment pointed to was a data or non-conforming code segment, but the RPL or CPL was greater than the DPL. Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20230126013405.2967156-2-mhal@rbox.co [sean: add blurb to changelog calling out AMD agrees] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-02	KVM: selftests: Test Hyper-V extended hypercall exit to userspace	Vipin Sharma	3	-0/+99
	Hyper-V extended hypercalls by default exit to userspace. Verify userspace gets the call, update the result and then verify in guest correct result is received. Add KVM_EXIT_HYPERV to list of "known" hypercalls so errors generate pretty strings. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20221212183720.4062037-14-vipinsh@google.com [sean: add KVM_EXIT_HYPERV to exit_reasons_known] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-02	KVM: selftests: Replace hardcoded Linux OS id with HYPERV_LINUX_OS_ID	Vipin Sharma	1	-1/+1
	Use HYPERV_LINUX_OS_ID macro instead of hardcoded 0x8100 << 48 Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20221212183720.4062037-12-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-02	KVM: selftests: Test Hyper-V extended hypercall enablement	Vipin Sharma	2	-0/+14
	Test Hyper-V extended hypercall, HV_EXT_CALL_QUERY_CAPABILITIES (0x8001), access denied and invalid parameter cases. Access is denied if CPUID.0x40000003.EBX BIT(20) is not set. Invalid parameter if call has fast bit set. Signed-off-by: Vipin Sharma <vipinsh@google.com> Link: https://lore.kernel.org/r/20221212183720.4062037-11-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-02	KVM: x86: hyper-v: Add extended hypercall support in Hyper-v	Vipin Sharma	1	-0/+28
	Add support for extended hypercall in Hyper-v. Hyper-v TLFS 6.0b describes hypercalls above call code 0x8000 as extended hypercalls. A Hyper-v hypervisor's guest VM finds availability of extended hypercalls via CPUID.0x40000003.EBX BIT(20). If the bit is set then the guest can call extended hypercalls. All extended hypercalls will exit to userspace by default. This allows for easy support of future hypercalls without being dependent on KVM releases. If there will be need to process the hypercall in KVM instead of userspace then KVM can create a capability which userspace can query to know which hypercalls can be handled by the KVM and enable handling of those hypercalls. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20221212183720.4062037-10-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-02	KVM: x86: hyper-v: Use common code for hypercall userspace exit	Vipin Sharma	1	-16/+11
	Remove duplicate code to exit to userspace for hyper-v hypercalls and use a common place to exit. No functional change intended. Signed-off-by: Vipin Sharma <vipinsh@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20221212183720.4062037-9-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24	KVM: x86: Replace IS_ERR() with IS_ERR_VALUE()	ye xingchen	1	-1/+1
	Avoid type casts that are needed for IS_ERR() and use IS_ERR_VALUE() instead. Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Link: https://lore.kernel.org/r/202211161718436948912@zte.com.cn Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24	x86/reboot: Disable SVM, not just VMX, when stopping CPUs	Sean Christopherson	1	-3/+3
	Disable SVM and more importantly force GIF=1 when halting a CPU or rebooting the machine. Similar to VMX, SVM allows software to block INITs via CLGI, and thus can be problematic for a crash/reboot. The window for failure is smaller with SVM as INIT is only blocked while GIF=0, i.e. between CLGI and STGI, but the window does exist. Fixes: fba4f472b33a ("x86/reboot: Turn off KVM when halting a CPU") Cc: stable@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221130233650.1404148-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24	x86/reboot: Disable virtualization in an emergency if SVM is supported	Sean Christopherson	1	-12/+11
	Disable SVM on all CPUs via NMI shootdown during an emergency reboot. Like VMX, SVM can block INIT, e.g. if the emergency reboot is triggered between CLGI and STGI, and thus can prevent bringing up other CPUs via INIT-SIPI-SIPI. Cc: stable@vger.kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221130233650.1404148-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24	x86/virt: Force GIF=1 prior to disabling SVM (for reboot flows)	Sean Christopherson	1	-1/+15
	Set GIF=1 prior to disabling SVM to ensure that INIT is recognized if the kernel is disabling SVM in an emergency, e.g. if the kernel is about to jump into a crash kernel or may reboot without doing a full CPU RESET. If GIF is left cleared, the new kernel (or firmware) will be unabled to awaken APs. Eat faults on STGI (due to EFER.SVME=0) as it's possible that SVM could be disabled via NMI shootdown between reading EFER.SVME and executing STGI. Link: https://lore.kernel.org/all/cbcb6f35-e5d7-c1c9-4db9-fe5cc4de579a@amd.com Cc: stable@vger.kernel.org Cc: Andrew Cooper <Andrew.Cooper3@citrix.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221130233650.1404148-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24	x86/crash: Disable virt in core NMI crash handler to avoid double shootdown	Sean Christopherson	3	-28/+56
	Disable virtualization in crash_nmi_callback() and rework the emergency_vmx_disable_all() path to do an NMI shootdown if and only if a shootdown has not already occurred. NMI crash shootdown fundamentally can't support multiple invocations as responding CPUs are deliberately put into halt state without unblocking NMIs. But, the emergency reboot path doesn't have any work of its own, it simply cares about disabling virtualization, i.e. so long as a shootdown occurred, emergency reboot doesn't care who initiated the shootdown, or when. If "crash_kexec_post_notifiers" is specified on the kernel command line, panic() will invoke crash_smp_send_stop() and result in a second call to nmi_shootdown_cpus() during native_machine_emergency_restart(). Invoke the callback _before_ disabling virtualization, as the current VMCS needs to be cleared before doing VMXOFF. Note, this results in a subtle change in ordering between disabling virtualization and stopping Intel PT on the responding CPUs. While VMX and Intel PT do interact, VMXOFF and writes to MSR_IA32_RTIT_CTL do not induce faults between one another, which is all that matters when panicking. Harden nmi_shootdown_cpus() against multiple invocations to try and capture any such kernel bugs via a WARN instead of hanging the system during a crash/dump, e.g. prior to the recent hardening of register_nmi_handler(), re-registering the NMI handler would trigger a double list_add() and hang the system if CONFIG_BUG_ON_DATA_CORRUPTION=y. list_add double add: new=ffffffff82220800, prev=ffffffff8221cfe8, next=ffffffff82220800. WARNING: CPU: 2 PID: 1319 at lib/list_debug.c:29 __list_add_valid+0x67/0x70 Call Trace: __register_nmi_handler+0xcf/0x130 nmi_shootdown_cpus+0x39/0x90 native_machine_emergency_restart+0x1c9/0x1d0 panic+0x237/0x29b Extract the disabling logic to a common helper to deduplicate code, and to prepare for doing the shootdown in the emergency reboot path if SVM is supported. Note, prior to commit ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported"), nmi_shootdown_cpus() was subtly protected against a second invocation by a cpu_vmx_enabled() check as the kdump handler would disable VMX if it ran first. Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported") Cc: stable@vger.kernel.org Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/all/20220427224924.592546-2-gpiccoli@igalia.com Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221130233650.1404148-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24	KVM: x86/xen: update Xen CPUID Leaf 4 (tsc info) sub-leaves, if present	Paul Durrant	6	-1/+40
	The scaling information in subleaf 1 should match the values set by KVM in the 'vcpu_info' sub-structure 'time_info' (a.k.a. pvclock_vcpu_time_info) which is shared with the guest, but is not directly available to the VMM. The offset values are not set since a TSC offset is already applied. The TSC frequency should also be set in sub-leaf 2. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20230106103600.528-3-pdurrant@amazon.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24	KVM: x86/cpuid: generalize kvm_update_kvm_cpuid_base() and also capture limit	Paul Durrant	2	-12/+19
	A subsequent patch will need to acquire the CPUID leaf range for emulated Xen so explicitly pass the signature of the hypervisor we're interested in to the new function. Also introduce a new kvm_hypervisor_cpuid structure so we can neatly store both the base and limit leaf indices. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20230106103600.528-2-pdurrant@amazon.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24	KVM: x86: Replace cpu_dirty_logging_count with nr_memslots_dirty_logging	David Matlack	3	-9/+9
	Drop cpu_dirty_logging_count in favor of nr_memslots_dirty_logging. Both fields count the number of memslots that have dirty-logging enabled, with the only difference being that cpu_dirty_logging_count is only incremented when using PML. So while nr_memslots_dirty_logging is not a direct replacement for cpu_dirty_logging_count, it can be combined with enable_pml to get the same information. Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20230105214303.2919415-1-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24	KVM: x86: Replace 0-length arrays with flexible arrays	Kees Cook	1	-2/+3
	Zero-length arrays are deprecated[1]. Replace struct kvm_nested_state's "data" union 0-length arrays with flexible arrays. (How are the sizes of these arrays verified?) Detected with GCC 13, using -fstrict-flex-arrays=3: arch/x86/kvm/svm/nested.c: In function 'svm_get_nested_state': arch/x86/kvm/svm/nested.c:1536:17: error: array subscript 0 is outside array bounds of 'struct kvm_svm_nested_state_data[0]' [-Werror=array-bounds=] 1536 \| &user_kvm_nested_state->data.svm[0]; \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from include/uapi/linux/kvm.h:15, from include/linux/kvm_host.h:40, from arch/x86/kvm/svm/nested.c:18: arch/x86/include/uapi/asm/kvm.h:511:50: note: while referencing 'svm' 511 \| struct kvm_svm_nested_state_data svm[0]; \| ^~~ [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kvm@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230105190548.never.323-kees@kernel.org Link: https://lore.kernel.org/r/20230118195905.gonna.693-kees@kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24	KVM: x86: Advertise fast REP string features inherent to the CPU	Jim Mattson	1	-2/+3
	Fast zero-length REP MOVSB, fast short REP STOSB, and fast short REP {CMPSB,SCASB} are inherent features of the processor that cannot be hidden by the hypervisor. When these features are present on the host, enumerate them in KVM_GET_SUPPORTED_CPUID. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220901211811.2883855-2-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24	x86/cpufeatures: Add macros for Intel's new fast rep string features	Jim Mattson	1	-0/+3
	KVM_GET_SUPPORTED_CPUID should reflect these host CPUID bits. The bits are already cached in word 12. Give the bits X86_FEATURE names, so that they can be easily referenced. Hide these bits from /proc/cpuinfo, since the host kernel makes no use of them at present. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220901211811.2883855-1-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24	KVM: PPC: Fix refactoring goof in kvmppc_e500mc_init()	Sean Christopherson	2	-3/+3
	Fix a build error due to a mixup during a recent refactoring. The error was reported during code review, but the fixed up patch didn't make it into the final commit. Fixes: 474856bad921 ("KVM: PPC: Move processor compatibility check to module init") Link: https://lore.kernel.org/all/87cz93snqc.fsf@mpe.ellerman.id.au Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230119182158.4026656-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-24	Merge branch 'kvm-lapic-fix-and-cleanup' into HEAD	Paolo Bonzini	11	-356/+523
	The first half or so patches fix semi-urgent, real-world relevant APICv and AVIC bugs. The second half fixes a variety of AVIC and optimized APIC map bugs where KVM doesn't play nice with various edge cases that are architecturally legal(ish), but are unlikely to occur in most real world scenarios Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-24	Merge branch 'kvm-v6.2-rc4-fixes' into HEAD	Paolo Bonzini	367	-1920/+7290
	ARM: * Fix the PMCR_EL0 reset value after the PMU rework * Correctly handle S2 fault triggered by a S1 page table walk by not always classifying it as a write, as this breaks on R/O memslots * Document why we cannot exit with KVM_EXIT_MMIO when taking a write fault from a S1 PTW on a R/O memslot * Put the Apple M2 on the naughty list for not being able to correctly implement the vgic SEIS feature, just like the M1 before it * Reviewer updates: Alex is stepping down, replaced by Zenghui x86: * Fix various rare locking issues in Xen emulation and teach lockdep to detect them * Documentation improvements * Do not return host topology information from KVM_GET_SUPPORTED_CPUID
2023-01-24	Merge branch 'kvm-hw-enable-refactor' into HEAD	Paolo Bonzini	82	-816/+861
	The main theme of this series is to kill off kvm_arch_init(), kvm_arch_hardware_(un)setup(), and kvm_arch_check_processor_compat(), which all originated in x86 code from way back when, and needlessly complicate both common KVM code and architecture code. E.g. many architectures don't mark functions/data as __init/__ro_after_init purely because kvm_init() isn't marked __init to support x86's separate vendor modules. The idea/hope is that with those hooks gone (moved to arch code), it will be easier for x86 (and other architectures) to modify their module init sequences as needed without having to fight common KVM code. E.g. I'm hoping that ARM can build on this to simplify its hardware enabling logic, especially the pKVM side of things. There are bug fixes throughout this series. They are more scattered than I would usually prefer, but getting the sequencing correct was a gigantic pain for many of the x86 fixes due to needing to fix common code in order for the x86 fix to have any meaning. And while the bugs are often fatal, they aren't all that interesting for most users as they either require a malicious admin or broken hardware, i.e. aren't likely to be encountered by the vast majority of KVM users. So unless someone _really_ wants a particular fix isolated for backporting, I'm not planning on shuffling patches. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: x86: Add helpers to recalc physical vs. logical optimized APIC maps	Sean Christopherson	1	-117/+133
	Move the guts of kvm_recalculate_apic_map()'s main loop to two separate helpers to handle recalculating the physical and logical pieces of the optimized map. Having 100+ lines of code in the for-loop makes it hard to understand what is being calculated where. No functional change intended. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-34-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: x86: Allow APICv APIC ID inhibit to be cleared	Greg Edwards	1	-26/+15
	Legacy kernels prior to commit 4399c03c6780 ("x86/apic: Remove verify_local_APIC()") write the APIC ID of the boot CPU twice to verify a functioning local APIC. This results in APIC acceleration inhibited on these kernels for reason APICV_INHIBIT_REASON_APIC_ID_MODIFIED. Allow the APICV_INHIBIT_REASON_APIC_ID_MODIFIED inhibit reason to be cleared if/when all APICs in xAPIC mode set their APIC ID back to the expected vcpu_id value. Fold the functionality previously in kvm_lapic_xapic_id_updated() into kvm_recalculate_apic_map(), as this allows examining all APICs in one pass. Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") Signed-off-by: Greg Edwards <gedwards@ddn.com> Link: https://lore.kernel.org/r/20221117183247.94314-1-gedwards@ddn.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-33-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: x86: Track required APICv inhibits with variable, not callback	Sean Christopherson	7	-36/+29
	Track the per-vendor required APICv inhibits with a variable instead of calling into vendor code every time KVM wants to query the set of required inhibits. The required inhibits are a property of the vendor's virtualization architecture, i.e. are 100% static. Using a variable allows the compiler to inline the check, e.g. generate a single-uop TEST+Jcc, and thus eliminates any desire to avoid checking inhibits for performance reasons. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-32-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a ↵	Sean Christopherson	1	-0/+1
	running vcpu" Turns out that some warnings exist for good reasons. Restore the warning in avic_vcpu_load() that guards against calling avic_vcpu_load() on a running vCPU now that KVM avoids doing so when switching between x2APIC and xAPIC. The entire point of the WARN is to highlight that KVM should not be reloading an AVIC. Opportunistically convert the WARN_ON() to WARN_ON_ONCE() to avoid spamming the kernel if it does fire. This reverts commit c0caeee65af3944b7b8abbf566e7cc1fae15c775. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-31-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps	Sean Christopherson	1	-0/+3
	Drop writes to APIC_RRR, a.k.a. Remote Read Data Register, on AVIC unaccelerated write traps. The register is read-only and isn't emulated by KVM. Sending the register through kvm_apic_write_nodecode() will result in screaming when x2APIC is enabled due to the unexpected failure to retrieve the MSR (KVM expects that only "legal" accesses will trap). Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-30-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: SVM: Handle multiple logical targets in AVIC kick fastpath	Sean Christopherson	1	-48/+62
	Iterate over all target logical IDs in the AVIC kick fastpath instead of bailing if there is more than one target. Now that KVM inhibits AVIC if vCPUs aren't mapped 1:1 with logical IDs, each bit in the destination is guaranteed to match to at most one vCPU, i.e. iterating over the bitmap is guaranteed to kick each valid target exactly once. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-29-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: SVM: Require logical ID to be power-of-2 for AVIC entry	Sean Christopherson	1	-15/+15
	Do not modify AVIC's logical ID table if the logical ID portion of the LDR is not a power-of-2, i.e. if the LDR has multiple bits set. Taking only the first bit means that KVM will fail to match MDAs that intersect with "higher" bits in the "ID" The "ID" acts as a bitmap, but is referred to as an ID because there's an implicit, unenforced "requirement" that software only set one bit. This edge case is arguably out-of-spec behavior, but KVM cleanly handles it in all other cases, e.g. the optimized logical map (and AVIC!) is also disabled in this scenario. Refactor the code to consolidate the checks, and so that the code looks more like avic_kick_target_vcpus_fast(). Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC") Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-28-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad"	Sean Christopherson	1	-10/+4
	Update SVM's cache of the LDR even if the new value is "bad". Leaving stale information in the cache can result in KVM missing updates and/or invalidating the wrong entry, e.g. if avic_invalidate_logical_id_entry() is triggered after a different vCPU has "claimed" the old LDR. Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-27-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: SVM: Always update local APIC on writes to logical dest register	Sean Christopherson	1	-7/+4
	Update the vCPU's local (virtual) APIC on LDR writes even if the write "fails". The APIC needs to recalc the optimized logical map even if the LDR is invalid or zero, e.g. if the guest clears its LDR, the optimized map will be left as is and the vCPU will receive interrupts using its old LDR. Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-26-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode	Sean Christopherson	3	-1/+13
	Inhibit SVM's AVIC if multiple vCPUs are aliased to the same logical ID. Architecturally, all CPUs whose logical ID matches the MDA are supposed to receive the interrupt; overwriting existing entries in AVIC's logical=>physical map can result in missed IPIs. Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-25-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled	Sean Christopherson	4	-1/+20
	Inhibit APICv/AVIC if the optimized physical map is disabled so that KVM KVM provides consistent APIC behavior if xAPIC IDs are aliased due to vcpu_id being truncated and the x2APIC hotplug hack isn't enabled. If the hotplug hack is disabled, events that are emulated by KVM will follow architectural behavior (all matching vCPUs receive events, even if the "match" is due to truncation), whereas APICv and AVIC will deliver events only to the first matching vCPU, i.e. the vCPU that matches without truncation. Note, the "extra" inhibit is needed because KVM deliberately ignores mismatches due to truncation when applying the APIC_ID_MODIFIED inhibit so that large VMs (>255 vCPUs) can run with APICv/AVIC. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-24-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs	Sean Christopherson	2	-9/+52
	Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs for x2APIC. If 32-bit IDs are not enabled, disable the optimized map to honor x86 architectural behavior if multiple vCPUs shared a physical APIC ID. As called out in the changelog that added the hack, all CPUs whose (possibly truncated) APIC ID matches the target are supposed to receive the IPI. KVM intentionally differs from real hardware, because real hardware (Knights Landing) does just "x2apic_id & 0xff" to decide whether to accept the interrupt in xAPIC mode and it can deliver one interrupt to more than one physical destination, e.g. 0x123 to 0x123 and 0x23. Applying the hack even when x2APIC is not fully enabled means KVM doesn't correctly handle scenarios where the guest has aliased xAPIC IDs across multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any interrupts. It's extremely unlikely any real world guest aliases APIC IDs, or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which vCPU is "normal". Furthermore, the hack is _not_ guaranteed to work! The hack works if and only if the optimized APIC map is successfully allocated. If the map allocation fails (unlikely), KVM will fall back to its unoptimized behavior, which _does_ honor the architectural behavior. Pivot on 32-bit x2APIC IDs being enabled as that is required to take advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't break existing setups unless they are way, way off in the weeds. And an entry in KVM's errata to document the hack. Alternatively, KVM could provide an actual x2APIC quirk and document the hack that way, but there's unlikely to ever be a use case for disabling the quirk. Go the errata route to avoid having to validate a quirk no one cares about. Fixes: 5bd5db385b3e ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-23-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode	Sean Christopherson	1	-2/+3
	Disable the optimized APIC logical map if multiple vCPUs are aliased to the same logical ID. Architecturally, all CPUs whose logical ID matches the MDA are supposed to receive the interrupt; overwriting existing map entries can result in missed IPIs. Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-22-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs	Sean Christopherson	1	-2/+8
	Disable the optimized APIC logical map if a logical ID covers multiple MDAs, i.e. if a vCPU has multiple bits set in its ID. In logical mode, events match if "ID & MDA != 0", i.e. creating an entry for only the first bit can cause interrupts to be missed. Note, creating an entry for every bit is also wrong as KVM would generate IPIs for every matching bit. It would be possible to teach KVM to play nice with this edge case, but it is very much an edge case and probably not used in any real world OS, i.e. it's not worth optimizing. Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-21-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: x86: Skip redundant x2APIC logical mode optimized cluster setup	Sean Christopherson	1	-5/+17
	Skip the optimized cluster[] setup for x2APIC logical mode, as KVM reuses the optimized map's phys_map[] and doesn't actually need to insert the target apic into the cluster[]. The LDR is derived from the x2APIC ID, and both are read-only in KVM, thus the vCPU's cluster[ldr] is guaranteed to be the same entry as the vCPU's phys_map[x2apic_id] entry. Skipping the unnecessary setup will allow a future fix for aliased xAPIC logical IDs to simply require that cluster[ldr] is non-NULL, i.e. won't have to special case x2APIC. Alternatively, the future check could allow "cluster[ldr] == apic", but that ends up being terribly confusing because cluster[ldr] is only set at the very end, i.e. it's only possible due to x2APIC's shenanigans. Another alternative would be to send x2APIC down a separate path _after_ the calculation and then assert that all of the above, but the resulting code is rather messy, and it's arguably unnecessary since asserting that the actual LDR matches the expected LDR means that simply testing that interrupts are delivered correctly provides the same guarantees. Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: x86: Explicitly track all possibilities for APIC map's logical modes	Sean Christopherson	2	-17/+52
	Track all possibilities for the optimized APIC map's logical modes instead of overloading the pseudo-bitmap and treating any "unknown" value as "invalid". As documented by the now-stale comment above the mode values, the values did have meaning when the optimized map was originally added. That dependent logical was removed by commit e45115b62f9a ("KVM: x86: use physical LAPIC array for logical x2APIC"), but the obfuscated behavior and its comment were left behind. Opportunistically rename "mode" to "logical_mode", partly to make it clear that the "disabled" case applies only to the logical map, but also to prove that there is no lurking code that expects "mode" to be a bitmap. Functionally, this is a glorified nop. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0	Sean Christopherson	1	-1/+3
	Explicitly skip the optimized map setup if the vCPU's LDR is '0', i.e. if the vCPU will never respond to logical mode interrupts. KVM already skips setup in this case, but relies on kvm_apic_map_get_logical_dest() to generate mask==0. KVM still needs the mask=0 check as a non-zero LDR can yield mask==0 depending on the mode, but explicitly handling the LDR will make it simpler to clean up the logical mode tracking in the future. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU	Sean Christopherson	1	-12/+13
	Add a helper to perform the final kick, two instances of the ICR decoding is one too many. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch	Sean Christopherson	1	-3/+7
	Document that AVIC is inhibited if any vCPU's APIC ID diverges from its vCPU ID, i.e. that there's no need to check for a destination match in the AVIC kick fast path. Opportunistically tweak comments to remove "guest bug", as that suggests KVM is punting on error handling, which is not the case. Targeting a non-existent vCPU or no vCPUs _may_ be a guest software bug, but whether or not it's a guest bug is irrelevant. Such behavior is architecturally legal and thus needs to faithfully emulated by KVM (and it is). Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible"	Sean Christopherson	1	-18/+11
	Due to a likely mismerge of patches, KVM ended up with a superfluous commit to "enable" AVIC's fast path for x2AVIC mode. Even worse, the superfluous commit has several bugs and creates a nasty local shadow variable. Rather than fix the bugs piece-by-piece[] to achieve the same end result, revert the patch wholesale. Opportunistically add a comment documenting the x2AVIC dependencies. This reverts commit 8c9e639da435874fb845c4d296ce55664071ea7a. [] https://lore.kernel.org/all/YxEP7ZBRIuFWhnYJ@google.com Fixes: 8c9e639da435 ("KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible") Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: SVM: Fix x2APIC Logical ID calculation for avic_kick_target_vcpus_fast	Suravee Suthikulpanit	1	-1/+1
	For X2APIC ID in cluster mode, the logical ID is bit [15:0]. Fixes: 603ccef42ce9 ("KVM: x86: SVM: fix avic_kick_target_vcpus_fast") Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick	Sean Christopherson	1	-7/+1
	Compute the destination from ICRH using the sender's x2APIC status, not each (potential) target's x2APIC status. Fixes: c514d3a348ac ("KVM: SVM: Update avic_kick_target_vcpus to support 32-bit APIC ID") Cc: Li RongQing <lirongqing@baidu.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean	Sean Christopherson	3	-35/+24
	Replace the "avic_mode" enum with a single bool to track whether or not x2AVIC is enabled. KVM already has "apicv_enabled" that tracks if any flavor of AVIC is enabled, i.e. AVIC_MODE_NONE and AVIC_MODE_X1 are redundant and unnecessary noise. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled	Sean Christopherson	7	-14/+78
	Free the APIC access page memslot if any vCPU enables x2APIC and SVM's AVIC is enabled to prevent accesses to the virtual APIC on vCPUs with x2APIC enabled. On AMD, if its "hybrid" mode is enabled (AVIC is enabled when x2APIC is enabled even without x2AVIC support), keeping the APIC access page memslot results in the guest being able to access the virtual APIC page as x2APIC is fully emulated by KVM. I.e. hardware isn't aware that the guest is operating in x2APIC mode. Exempt nested SVM's update of APICv state from the new logic as x2APIC can't be toggled on VM-Exit. In practice, invoking the x2APIC logic should be harmless precisely because it should be a glorified nop, but play it safe to avoid latent bugs, e.g. with dropping the vCPU's SRCU lock. Intel doesn't suffer from the same issue as APICv has fully independent VMCS controls for xAPIC vs. x2APIC virtualization. Technically, KVM should provide bus error semantics and not memory semantics for the APIC page when x2APIC is enabled, but KVM already provides memory semantics in other scenarios, e.g. if APICv/AVIC is enabled and the APIC is hardware disabled (via APIC_BASE MSR). Note, checking apic_access_memslot_enabled without taking locks relies it being set during vCPU creation (before kvm_vcpu_reset()). vCPUs can race to set the inhibit and delete the memslot, i.e. can get false positives, but can't get false negatives as apic_access_memslot_enabled can't be toggled "on" once any vCPU reaches KVM_RUN. Opportunistically drop the "can" while updating avic_activate_vmcb()'s comment, i.e. to state that KVM _does_ support the hybrid mode. Move the "Note:" down a line to conform to preferred kernel/KVM multi-line comment style. Opportunistically update the apicv_update_lock comment, as it isn't actually used to protect apic_access_memslot_enabled (which is protected by slots_lock). Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-13	KVM: x86: Move APIC access page helper to common x86 code	Sean Christopherson	4	-68/+44
	Move the APIC access page allocation helper function to common x86 code, the allocation routine is virtually identical between APICv (VMX) and AVIC (SVM). Keep APICv's gfn_to_page() + put_page() sequence, which verifies that a backing page can be allocated, i.e. that the system isn't under heavy memory pressure. Forcing the backing page to be populated isn't strictly necessary, but skipping the effective prefetch only delays the inevitable. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>