Age | Commit message (Collapse) | Author | Files | Lines |
|
Clean up nested.c and vmx.c by using INVALID_GPA instead of "-1ull",
to denote an invalid address in nested VMX. Affected addresses are
the ones of VMXON region, current VMCS, VMCS link pointer, virtual-
APIC page, ENCLS-exiting bitmap, and IO bitmap etc.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Message-Id: <20210929175154.11396-2-yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When updating the host's mask for its MSR_IA32_TSX_CTRL user return entry,
clear the mask in the found uret MSR instead of vmx->guest_uret_msrs[i].
Modifying guest_uret_msrs directly is completely broken as 'i' does not
point at the MSR_IA32_TSX_CTRL entry. In fact, it's guaranteed to be an
out-of-bounds accesses as is always set to kvm_nr_uret_msrs in a prior
loop. By sheer dumb luck, the fallout is limited to "only" failing to
preserve the host's TSX_CTRL_CPUID_CLEAR. The out-of-bounds access is
benign as it's guaranteed to clear a bit in a guest MSR value, which are
always zero at vCPU creation on both x86-64 and i386.
Cc: stable@vger.kernel.org
Fixes: 8ea8b8d6f869 ("KVM: VMX: Use common x86's uret MSR list as the one true list")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210926015545.281083-1-zhenzhong.duan@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If L1 had invalid state on VM entry (can happen on SMM transactions
when we enter from real mode, straight to nested guest),
then after we load 'host' state from VMCS12, the state has to become
valid again, but since we load the segment registers with
__vmx_set_segment we weren't always updating emulation_required.
Update emulation_required explicitly at end of load_vmcs12_host_state.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-8-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
!from_vmentry
It is possible that when non root mode is entered via special entry
(!from_vmentry), that is from SMM or from loading the nested state,
the L2 state could be invalid in regard to non unrestricted guest mode,
but later it can become valid.
(for example when RSM emulation restores segment registers from SMRAM)
Thus delay the check to VM entry, where we will check this and fail.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-7-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Since no actual VM entry happened, the VM exit information is stale.
To avoid this, synthesize an invalid VM guest state VM exit.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Windows Server 2022 with Hyper-V role enabled failed to boot on KVM when
enlightened VMCS is advertised. Debugging revealed there are two exposed
secondary controls it is not happy with: SECONDARY_EXEC_ENABLE_VMFUNC and
SECONDARY_EXEC_SHADOW_VMCS. These controls are known to be unsupported,
as there are no corresponding fields in eVMCSv1 (see the comment above
EVMCS1_UNSUPPORTED_2NDEXEC definition).
Previously, commit 31de3d2500e4 ("x86/kvm/hyper-v: move VMX controls
sanitization out of nested_enable_evmcs()") introduced the required
filtering mechanism for VMX MSRs but for some reason put only known
to be problematic (and not full EVMCS1_UNSUPPORTED_* lists) controls
there.
Note, Windows Server 2022 seems to have gained some sanity check for VMX
MSRs: it doesn't even try to launch a guest when there's something it
doesn't like, nested_evmcs_check_controls() mechanism can't catch the
problem.
Let's be bold this time and instead of playing whack-a-mole just filter out
all unsupported controls from VMX MSRs.
Fixes: 31de3d2500e4 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210907163530.110066-1-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If we are emulating an invalid guest state, we don't have a correct
exit reason, and thus we shouldn't do anything in this function.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 95b5a48c4f2b ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set.
To prevent kvm_mmu_get_tdp_level() from incorrectly changing NPT level
on behalf of CPUs, add a new parameter in kvm_configure_mmu() to force
a fixed TDP level.
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Message-Id: <20210818165549.3771014-2-wei.huang2@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that nested VMX pulls KVM's desired VMCS controls from vmcs01 instead
of re-calculating on the fly, bury the helpers that do the calcluations
in vmx.c.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210810171952.2758100-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Remove the secondary execution controls cache now that it's effectively
dead code; it is only read immediately after it is written.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210810171952.2758100-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The commit efdab992813fb ("KVM: x86: fix escape of guest dr6 to the host")
fixed a bug by resetting DR6 unconditionally when the vcpu being scheduled out.
But writing to debug registers is slow, and it can be visible in perf results
sometimes, even if neither the host nor the guest activate breakpoints.
Since KVM_DEBUGREG_WONT_EXIT on Intel processors is the only case
where DR6 gets the guest value, and it never happens at all on SVM,
the register can be cleared in vmx.c right after reading it.
Reported-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit c77fb5fe6f03 ("KVM: x86: Allow the guest to run with dirty debug
registers") allows the guest accessing to DRs without exiting when
KVM_DEBUGREG_WONT_EXIT and we need to ensure that they are synchronized
on entry to the guest---including DR6 that was not synced before the commit.
But the commit sets the hardware DR6 not only when KVM_DEBUGREG_WONT_EXIT,
but also when KVM_DEBUGREG_BP_ENABLED. The second case is unnecessary
and just leads to a more case which leaks stale DR6 to the host which has
to be resolved by unconditionally reseting DR6 in kvm_arch_vcpu_put().
Even if KVM_DEBUGREG_WONT_EXIT, however, setting the host DR6 only matters
on VMX because SVM always uses the DR6 value from the VMCB. So move this
line to vmx.c and make it conditional on KVM_DEBUGREG_WONT_EXIT.
Reported-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move VMWRITE sequences in vmx_vcpu_reset() guarded by !init_event into
init_vmcs() to make it more obvious that they're, uh, initializing the
VMCS.
No meaningful functional change intended (though the order of VMWRITEs
and whatnot is different).
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-44-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop a call to vmx_clear_hlt() during vCPU INIT, the guest's activity
state is unconditionally set to "active" a few lines earlier in
vmx_vcpu_reset().
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-43-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Consolidate all of the dynamic MSR bitmap adjustments into
vmx_update_msr_bitmap_x2apic(), and rename the mode tracker to reflect
that it is x2APIC specific. If KVM gains more cases of dynamic MSR
pass-through, odds are very good that those new cases will be better off
with their own logic, e.g. see Intel PT MSRs and MSR_IA32_SPEC_CTRL.
Attempting to handle all updates in a common helper did more harm than
good, as KVM ended up collecting a large number of useless "updates".
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-42-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Don't bother initializing msr_bitmap_mode to 0, all of struct vcpu_vmx is
zero initialized.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-41-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop an explicit call to update the x2APIC MSRs when the userspace MSR
filter is modified. The x2APIC MSRs are deliberately exempt from
userspace filtering.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-40-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop unnecessary MSR bitmap updates during nested transitions, as L1's
APIC_BASE MSR is not modified by the standard VM-Enter/VM-Exit flows,
and L2's MSR bitmap is managed separately. In the unlikely event that L1
is pathological and loads APIC_BASE via the VM-Exit load list, KVM will
handle updating the bitmap in its normal WRMSR flows.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-39-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Remove an unnecessary MSR bitmap refresh during vCPU RESET/INIT. In both
cases, the MSR bitmap already has the desired values and state.
At RESET, the vCPU is guaranteed to be running with x2APIC disabled, the
x2APIC MSRs are guaranteed to be intercepted due to the MSR bitmap being
initialized to all ones by alloc_loaded_vmcs(), and vmx->msr_bitmap_mode
is guaranteed to be zero, i.e. reflecting x2APIC disabled.
At INIT, the APIC_BASE MSR is not modified, thus there can't be any
change in x2APIC state.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-38-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the setting of CR0, CR4, EFER, RFLAGS, and RIP from vendor code to
common x86. VMX and SVM now have near-identical sequences, the only
difference being that VMX updates the exception bitmap. Updating the
bitmap on SVM is unnecessary, but benign. Unfortunately it can't be left
behind in VMX due to the need to update exception intercepts after the
control registers are set.
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-37-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When emulating vCPU INIT, do not unconditionally refresh the list of user
return MSRs that need to be loaded into hardware when running the guest.
Unconditionally refreshing the list is confusing, as the vast majority of
MSRs are not modified on INIT. The real motivation is to handle the case
where an INIT during long mode obviates the need to load the SYSCALL MSRs,
and that is handled as needed by vmx_set_efer().
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-36-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
After a CPUID update, refresh the list of user return MSRs that are
loaded into hardware when running the vCPU. This is necessary to handle
the oddball case where userspace exposes X86_FEATURE_RDTSCP to the guest
after the vCPU is running.
Fixes: 0023ef39dc35 ("kvm: vmx: Set IA32_TSC_AUX for legacy mode guests")
Fixes: 4e47c7a6d714 ("KVM: VMX: Add instruction rdtscp support for guest")
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-35-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Split setup_msrs() into vmx_setup_uret_msrs() and an open coded refresh
of the MSR bitmap, and skip the latter when refreshing the user return
MSRs during an EFER load. Only the x2APIC MSRs are dynamically exposed
and hidden, and those are not affected by a change in EFER.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-34-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use the "internal" variants of setting segment registers when stuffing
state on nested VM-Exit in order to skip the "emulation required"
updates. VM-Exit must always go to protected mode, and all segments are
mostly hardcoded (to valid values) on VM-Exit. The bits of the segments
that aren't hardcoded are explicitly checked during VM-Enter, e.g. the
selector RPLs must all be zero.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-30-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Don't refresh "emulation required" when stuffing segments during
transitions to/from real mode when running without unrestricted guest.
The checks are unnecessary as vmx_set_cr0() unconditionally rechecks
"emulation required". They also happen to be broken, as enter_pmode()
and enter_rmode() run with a stale vcpu->arch.cr0.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-29-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the long mode and EPT w/o unrestricted guest side effect processing
down in vmx_set_cr0() so that the EPT && !URG case doesn't have to stuff
vcpu->arch.cr0 early. This also fixes an oddity where CR0 might not be
marked available, i.e. the early vcpu->arch.cr0 write would appear to be
in danger of being overwritten, though that can't actually happen in the
current code since CR0.TS is the only guest-owned bit, and CR0.TS is not
read by vmx_set_cr4().
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-28-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Tweak the logic for grabbing vmcs.GUEST_CR3 in vmx_cache_reg() to look
directly at the execution controls, as opposed to effectively inferring
the controls based on vCPUs. Inferring the controls isn't wrong, but it
creates a very subtle dependency between the caching logic, the state of
vcpu->arch.cr0 (via is_paging()), and the behavior of vmx_set_cr0().
Using the execution controls doesn't completely eliminate the dependency
in vmx_set_cr0(), e.g. neglecting to cache CR3 before enabling
interception would still break the guest, but it does reduce the
code dependency and mostly eliminate the logical dependency (that CR3
loads are intercepted in certain scenarios). Eliminating the subtle
read of vcpu->arch.cr0 will also allow for additional cleanup in
vmx_set_cr0().
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-26-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Keep CR3 load/store exiting enable as needed when running L2 in order to
honor L1's desires. This fixes a largely theoretical bug where L1 could
intercept CR3 but not CR0.PG and end up not getting the desired CR3 exits
when L2 enables paging. In other words, the existing !is_paging() check
inadvertantly handles the normal case for L2 where vmx_set_cr0() is
called during VM-Enter, which is guaranteed to run with paging enabled,
and thus will never clear the bits.
Removing the !is_paging() check will also allow future consolidation and
cleanup of the related code. From a performance perspective, this is
all a nop, as the VMCS controls shadow will optimize away the VMWRITE
when the controls are in the desired state.
Add a comment explaining why CR3 is intercepted, with a big disclaimer
about not querying the old CR3. Because vmx_set_cr0() is used for flows
that are not directly tied to MOV CR3, e.g. vCPU RESET/INIT and nested
VM-Enter, it's possible that is_paging() is not synchronized with CR3
load/store exiting. This is actually guaranteed in the current code, as
KVM starts with CR3 interception disabled. Obviously that can be fixed,
but there's no good reason to play whack-a-mole, and it tends to end
poorly, e.g. descriptor table exiting for UMIP emulation attempted to be
precise in the past and ended up botching the interception toggling.
Fixes: fe3ef05c7572 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-25-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the CR0/CR3/CR4 shenanigans for EPT without unrestricted guest back
into vmx_set_cr0(). This will allow a future patch to eliminate the
rather gross stuffing of vcpu->arch.cr0 in the paging transition cases
by snapshotting the old CR0.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-24-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Remove a bogus write to vcpu->arch.cr0 that immediately precedes
vmx_set_cr0() during vCPU RESET/INIT. For RESET, this is a nop since
the "old" CR0 value is meaningless. But for INIT, if the vCPU is coming
from paging enabled mode, crushing vcpu->arch.cr0 will cause the various
is_paging() checks in vmx_set_cr0() to get false negatives.
For the exit_lmode() case, the false negative is benign as vmx_set_efer()
is called immediately after vmx_set_cr0().
For EPT without unrestricted guest, the false negative will cause KVM to
unnecessarily run with CR3 load/store exiting. But again, this is
benign, albeit sub-optimal.
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-23-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Opt-in to forcing CR0.WP=1 for shadow paging, and stop lying about WP
being "always on" for unrestricted guest. In addition to making KVM a
wee bit more honest, this paves the way for additional cleanup.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-22-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the EDX initialization at vCPU RESET, which is now identical between
VMX and SVM, into common code.
No functional change intended.
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-20-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Consolidate the APIC base RESET logic, which is currently spread out
across both x86 and vendor code. For an in-kernel APIC, the vendor code
is redundant. But for a userspace APIC, KVM relies on the vendor code
to initialize vcpu->arch.apic_base. Hoist the vcpu->arch.apic_base
initialization above the !apic check so that it applies to both flavors
of APIC emulation, and delete the vendor code.
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-19-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Write vcpu->arch.apic_base directly instead of bouncing through
kvm_set_apic_base(). This is a glorified nop, and is a step towards
cleaning up the mess that is local APIC creation.
When using an in-kernel APIC, kvm_create_lapic() explicitly sets
vcpu->arch.apic_base to MSR_IA32_APICBASE_ENABLE to avoid its own
kvm_lapic_set_base() call in kvm_lapic_reset() from triggering state
changes. That call during RESET exists purely to set apic->base_address
to the default base value. As a result, by the time VMX gets control,
the only missing piece is the BSP bit being set for the reset BSP.
For a userspace APIC, there are no side effects to process (for the APIC).
In both cases, the call to kvm_update_cpuid_runtime() is a nop because
the vCPU hasn't yet been exposed to userspace, i.e. there can't be any
CPUID entries.
No functional change intended.
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-17-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop an explicit MMU reset when entering emulated real mode now that the
vCPU INIT/RESET path correctly handles conditional MMU resets, e.g. if
INIT arrives while the vCPU is in 64-bit mode.
Note, while there are multiple other direct calls to vmx_set_cr0(), i.e.
paths that change CR0 without invoking kvm_post_set_cr0(), only the INIT
emulation can reach enter_rmode(). CLTS emulation only toggles CR.TS,
VM-Exit (and late VM-Fail) emulation cannot architecturally transition to
Real Mode, and VM-Enter to Real Mode is possible if and only if
Unrestricted Guest is enabled (exposed to L1).
This effectively reverts commit 8668a3c468ed ("KVM: VMX: Reset mmu
context when entering real mode")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Set EDX at RESET/INIT based on the userspace-defined CPUID model when
possible, i.e. when CPUID.0x1.EAX is defind by userspace. At RESET/INIT,
all CPUs that support CPUID set EDX to the FMS enumerated in
CPUID.0x1.EAX. If no CPUID match is found, fall back to KVM's default
of 0x600 (Family '6'), which is the least awful approximation of KVM's
virtual CPU model.
Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210713163324.627647-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <0e8760a26151f47dc47052b25ca8b84fffe0641e.1625186503.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This better reflects the purpose of this variable on AMD, since
on AMD the AVIC's memory slot can be enabled and disabled dynamically.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210623113002.111448-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When the kvm_intel module unloads the module parameter
'allow_smaller_maxphyaddr' is not cleared because the backing variable is
defined in the kvm module. As a result, if the module parameter's state
was set before kvm_intel unloads, it will also be set when it reloads.
Explicitly clear the state in vmx_exit() to prevent this from happening.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Message-Id: <20210623203426.1891402-1-aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
|
|
Mark #ACs that won't be reinjected to the guest as wanted by L0 so that
KVM handles split-lock #AC from L2 instead of forwarding the exception to
L1. Split-lock #AC isn't yet virtualized, i.e. L1 will treat it like a
regular #AC and do the wrong thing, e.g. reinject it into L2.
Fixes: e6f8b6c12f03 ("KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guest")
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622172244.3561540-1-seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Failed VM-entry is often due to a faulty core. To help identify bad
cores, print the id of the last logical processor that attempted
VM-entry whenever dumping a VMCS or VMCB.
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20210621221648.1833148-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As part of smaller maxphyaddr emulation, kvm needs to intercept
present page faults to see if it needs to add the RSVD flag (bit 3) to
the error code. However, there is no need to intercept page faults
that already have the RSVD flag set. When setting up the page fault
intercept, add the RSVD flag into the #PF error code mask field (but
not the #PF error code match field) to skip the intercept when the
RSVD flag is already set.
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20210618235941.1041604-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Refuse to load KVM if NX support is not available and EPT is not enabled.
Shadow paging has assumed NX support since commit 9167ab799362 ("KVM:
vmx, svm: always run with EFER.NXE=1 when shadow paging is active"), so
for all intents and purposes this has been a de facto requirement for
over a year.
Do not require NX support if EPT is enabled purely because Intel CPUs let
firmware disable NX support via MSR_IA32_MISC_ENABLES. If not for that,
VMX (and KVM as a whole) could require NX support with minimal risk to
breaking userspace.
Fixes: 9167ab799362 ("KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20210615164535.2146172-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Instead of checking 'vmx->nested.hv_evmcs' use '-1' in
'vmx->nested.hv_evmcs_vmptr' to indicate 'evmcs is not in use' state. This
matches how we check 'vmx->nested.current_vmptr'. Introduce EVMPTR_INVALID
and evmptr_is_valid() and use it instead of raw '-1' check as a preparation
to adding other 'special' values.
No functional change intended.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210526132026.270394-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently the remote TLB flush logic is specific to VMX.
Move it to a common place so that SVM can use it as well.
Signed-off-by: Vineeth Pillai <viremana@linux.microsoft.com>
Message-Id: <4f4e4ca19778437dae502f44363a38e99e3ef5d1.1622730232.git.viremana@linux.microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
to guest code
Currently, the 'nested_run' statistic counts all guest-entry attempts,
including those that fail during vmentry checks on Intel and during
consistency checks on AMD. Convert this statistic to count only those
guest-entries that make it past these state checks and make it to guest
code. This will tell us the number of guest-entries that actually executed
or tried to execute guest code.
Signed-off-by: Krish Sadhukhan <Krish.Sadhukhan@oracle.com>
Message-Id: <20210609180340.104248-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that .post_leave_smm() is gone, drop "pre_" from the remaining
helpers. The helpers aren't invoked purely before SMI/RSM processing,
e.g. both helpers are invoked after state is snapshotted (from regs or
SMRAM), and the RSM helper is invoked after some amount of register state
has been stuffed.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609185619.992058-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that APICv/AVIC enablement is kept in common 'enable_apicv' variable,
there's no need to call kvm_apicv_init() from vendor specific code.
No functional change intended.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210609150911.1471882-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Unify VMX and SVM code by moving APICv/AVIC enablement tracking to common
'enable_apicv' variable. Note: unlike APICv, AVIC is disabled by default.
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210609150911.1471882-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently vmx_vcpu_load_vmcs() writes the TSC_MULTIPLIER field of the
VMCS every time the VMCS is loaded. Instead of doing this, set this
field from common code on initialization and whenever the scaling ratio
changes.
Additionally remove vmx->current_tsc_ratio. This field is redundant as
vcpu->arch.tsc_scaling_ratio already tracks the current TSC scaling
ratio. The vmx->current_tsc_ratio field is only used for avoiding
unnecessary writes but it is no longer needed after removing the code
from the VMCS load path.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Message-Id: <20210607105438.16541-1-ilstam@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|