| author | Paolo Bonzini <pbonzini@redhat.com> | 2026-05-30 19:55:43 +0300 |
|---|---|---|
| committer | Paolo Bonzini <pbonzini@redhat.com> | 2026-06-12 11:43:52 +0300 |
| commit | 62bad2b2ccf3ebfcf1055f0cd76673ea95f724bd (patch) | |
| tree | 62ffdd031d651acbf0335f370da116f35b3f32e3 | |
| parent | 9eff3e99a81c20148b934d5fe882bcfe7c6078ff (diff) | |
| download | linux-62bad2b2ccf3ebfcf1055f0cd76673ea95f724bd.tar.xz | |
KVM: nSVM: invalidate cached PDPTRs across nested NPT transitions
When L2 runs under nested NPT and uses PAE paging, KVM's cached PDPTRs
in mmu->pdptrs[] can hold stale or wrong values after nested
transitions and across migration restore, because both
nested_svm_load_cr3() and svm_get_nested_state_pages() only refresh
PDPTRs on the !nested_npt path.
The user-visible bug is on migration restore of an L2 running with nested
NPT and 32-bit PAE paging, if userspace uses KVM_SET_SREGS rather than
KVM_SET_SREGS2. In that case, load_pdptrs() leaves VCPU_EXREG_PDPTR
marked as available, and kvm_pdptr_read() will use a stale translation
that used L1 GPAs instead of L2 nGPAs. svm_get_nested_state_pages()
runs on first KVM_RUN but skips the refresh because nested_npt_enabled()
is true. The CPU itself reads L2's PDPTRs correctly from memory via
L1's NPT, but KVM-side walking of guest PAE page tables uses the bogus
cached values.
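
To illustrate why the two interpretations of CR3 disagree, here is a minimal user-space sketch. It is a toy model, not KVM code: `struct toy_machine`, `toy_read_pdpte_as_l1_gpa()` and `toy_read_pdpte_via_npt()` are hypothetical names, and each "page" holds a single 64-bit value standing in for a PDPTE.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGES 8

/* Hypothetical toy machine: one 64-bit "PDPTE" per page of L1 memory. */
struct toy_machine {
	uint64_t l1_mem[TOY_PAGES];  /* indexed by L1 guest-physical page */
	unsigned int npt[TOY_PAGES]; /* L1's nested table: nGPA page -> L1 GPA page */
};

/* Stale path: interprets L2's CR3 page directly as an L1 GPA page. */
static uint64_t toy_read_pdpte_as_l1_gpa(const struct toy_machine *m,
					 unsigned int cr3_page)
{
	assert(cr3_page < TOY_PAGES);
	return m->l1_mem[cr3_page];
}

/* Correct path under nested NPT: translate the nGPA through L1's table. */
static uint64_t toy_read_pdpte_via_npt(const struct toy_machine *m,
				       unsigned int cr3_page)
{
	assert(cr3_page < TOY_PAGES);
	assert(m->npt[cr3_page] < TOY_PAGES);
	return m->l1_mem[m->npt[cr3_page]];
}
```

Whenever L1's nested table maps the CR3 page to a different L1 page, the two helpers return different entries. In this toy model, that difference is the stale value a cached read would keep returning.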
Unlike Intel's GUEST_PDPTR0..3 fields in the VMCS, SVM has no
VMCB-cached PDPTR state: the in-memory PDPTEs at the current CR3 are
the only source of truth, and svm_cache_reg(VCPU_EXREG_PDPTR) simply
reloads them from memory via load_pdptrs().  Clearing the avail
bit (and the dirty bit, because !avail/dirty is an invalid combination)
to force a reload when the PDPTRs are next needed fixes the bug.
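
The avail/dirty mechanism described above can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel implementation: `struct model_vcpu`, `model_mark_for_reload()` and `model_pdptr_read()` are hypothetical names, plain bit masks stand in for the kernel's bitmaps, and a four-entry array stands in for the in-memory PDPTEs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model; names are illustrative, not KVM's. */
enum model_reg { MODEL_REG_PDPTR = 0, MODEL_NR_REGS };

struct model_vcpu {
	uint32_t regs_avail;          /* bit set: cached value is valid */
	uint32_t regs_dirty;          /* bit set: cached value was modified */
	uint64_t pdptrs[4];           /* cached copy of the PDPTEs */
	const uint64_t *guest_pdptes; /* stand-in for the in-memory PDPTEs */
	unsigned int reloads;         /* counts reloads, for illustration */
};

/* Drop the cached value; clear dirty too so !avail/dirty never occurs. */
static void model_mark_for_reload(struct model_vcpu *v, enum model_reg reg)
{
	v->regs_avail &= ~(1u << reg);
	v->regs_dirty &= ~(1u << reg);
}

/* Lazy read: reload from the source of truth only if not available. */
static uint64_t model_pdptr_read(struct model_vcpu *v, int idx)
{
	/* A register must never be dirty without also being available. */
	assert(!(v->regs_dirty & ~v->regs_avail));
	assert(idx >= 0 && idx < 4);

	if (!(v->regs_avail & (1u << MODEL_REG_PDPTR))) {
		for (int i = 0; i < 4; i++)
			v->pdptrs[i] = v->guest_pdptes[i];
		v->regs_avail |= 1u << MODEL_REG_PDPTR;
		v->reloads++;
	}
	return v->pdptrs[idx];
}
```

In this model, a read made while the avail bit is still set returns whatever is cached, stale or not. After `model_mark_for_reload()`, the next read refreshes all four entries once, and later reads hit the cache again.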
Do the same for nested_svm_load_cr3()'s nested_npt branch, so that
the invariant "PDPTRs need reloading" is handled similarly for both
immediate and deferred loading.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260530165545.25599-4-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| -rw-r--r-- | arch/x86/kvm/kvm_cache_regs.h | 8 | ||||
| -rw-r--r-- | arch/x86/kvm/svm/nested.c | 27 |
2 files changed, 26 insertions, 9 deletions
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 2ae492ad6412..6bae5db5a54e 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -77,6 +77,14 @@ static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
 	return test_bit(reg, vcpu->arch.regs_dirty);
 }
 
+static inline void kvm_register_mark_for_reload(struct kvm_vcpu *vcpu,
+						enum kvm_reg reg)
+{
+	kvm_assert_register_caching_allowed(vcpu);
+	__clear_bit(reg, vcpu->arch.regs_avail);
+	__clear_bit(reg, vcpu->arch.regs_dirty);
+}
+
 static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
 					       enum kvm_reg reg)
 {
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1bf3e4804ad0..e6aa54d43730 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -690,9 +690,12 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	if (CC(!kvm_vcpu_is_legal_cr3(vcpu, cr3)))
 		return -EINVAL;
 
-	if (reload_pdptrs && !nested_npt && is_pae_paging(vcpu) &&
-	    CC(!load_pdptrs(vcpu, cr3)))
-		return -EINVAL;
+	if (reload_pdptrs && is_pae_paging(vcpu)) {
+		if (nested_npt)
+			kvm_register_mark_for_reload(vcpu, VCPU_REG_PDPTR);
+		else if (CC(!load_pdptrs(vcpu, cr3)))
+			return -EINVAL;
+	}
 
 	vcpu->arch.cr3 = cr3;
 
@@ -2040,15 +2043,21 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 	if (WARN_ON(!is_guest_mode(vcpu)))
 		return true;
 
-	if (!vcpu->arch.pdptrs_from_userspace &&
-	    !nested_npt_enabled(to_svm(vcpu)) && is_pae_paging(vcpu))
+	if (is_pae_paging(vcpu)) {
 		/*
-		 * Reload the guest's PDPTRs since after a migration
-		 * the guest CR3 might be restored prior to setting the nested
-		 * state which can lead to a load of wrong PDPTRs.
+		 * After migration, CR3 may have been restored before
+		 * KVM_SET_NESTED_STATE, so the PDPTR load into mmu->pdptrs[]
+		 * may have treated CR3 as an L1 GPA.  For nNPT, drop the
+		 * cache so the next access reloads them with the proper
+		 * nGPA translation.  For !nNPT, reload eagerly unless userspace
+		 * already supplied authoritative PDPTRs via KVM_SET_SREGS2.
 		 */
-		if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
+		if (nested_npt_enabled(to_svm(vcpu)))
+			kvm_register_mark_for_reload(vcpu, VCPU_REG_PDPTR);
+		else if (!vcpu->arch.pdptrs_from_userspace &&
+			 CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
 			return false;
+	}
 
 	if (!nested_svm_merge_msrpm(vcpu)) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
