diff options
-rw-r--r-- | Documentation/virt/kvm/api.rst | 11 | ||||
-rw-r--r-- | Documentation/virt/kvm/review-checklist.rst | 95 | ||||
-rw-r--r-- | arch/arm64/kvm/sys_regs.c | 2 | ||||
-rw-r--r-- | arch/riscv/include/asm/kvm_aia.h | 4 | ||||
-rw-r--r-- | arch/riscv/include/asm/kvm_host.h | 3 | ||||
-rw-r--r-- | arch/riscv/kvm/aia.c | 51 | ||||
-rw-r--r-- | arch/riscv/kvm/aia_imsic.c | 45 | ||||
-rw-r--r-- | arch/riscv/kvm/vcpu.c | 10 | ||||
-rw-r--r-- | arch/riscv/kvm/vcpu_timer.c | 16 | ||||
-rw-r--r-- | arch/x86/kvm/vmx/tdx.c | 9 | ||||
-rw-r--r-- | arch/x86/kvm/x86.c | 4 |
11 files changed, 180 insertions, 70 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 43ed57e048a8..544fb11351d9 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -2008,6 +2008,13 @@ If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also be used as a vm ioctl to set the initial tsc frequency of subsequently created vCPUs. +For TSC protected Confidential Computing (CoCo) VMs where TSC frequency +is configured once at VM scope and remains unchanged during VM's +lifetime, the vm ioctl should be used to configure the TSC frequency +and the vcpu ioctl is not supported. + +Example of such CoCo VMs: TDX guests. + 4.56 KVM_GET_TSC_KHZ -------------------- @@ -7230,8 +7237,8 @@ inputs and outputs of the TDVMCALL. Currently the following values of placed in fields from ``r11`` to ``r14`` of the ``get_tdvmcall_info`` field of the union. -* ``TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT``: the guest has requested to -set up a notification interrupt for vector ``vector``. + * ``TDVMCALL_SETUP_EVENT_NOTIFY_INTERRUPT``: the guest has requested to + set up a notification interrupt for vector ``vector``. KVM may add support for more values in the future that may cause a userspace exit, even without calls to ``KVM_ENABLE_CAP`` or similar. In this case, diff --git a/Documentation/virt/kvm/review-checklist.rst b/Documentation/virt/kvm/review-checklist.rst index dc01aea4057b..debac54e14e7 100644 --- a/Documentation/virt/kvm/review-checklist.rst +++ b/Documentation/virt/kvm/review-checklist.rst @@ -7,7 +7,7 @@ Review checklist for kvm patches 1. The patch must follow Documentation/process/coding-style.rst and Documentation/process/submitting-patches.rst. -2. Patches should be against kvm.git master branch. +2. Patches should be against kvm.git master or next branches. 3. If the patch introduces or modifies a new userspace API: - the API must be documented in Documentation/virt/kvm/api.rst @@ -18,10 +18,10 @@ Review checklist for kvm patches 5. New features must default to off (userspace should explicitly request them). Performance improvements can and should default to on. -6. New cpu features should be exposed via KVM_GET_SUPPORTED_CPUID2 +6. New cpu features should be exposed via KVM_GET_SUPPORTED_CPUID2, + or its equivalent for non-x86 architectures -7. Emulator changes should be accompanied by unit tests for qemu-kvm.git - kvm/test directory. +7. The feature should be testable (see below). 8. Changes should be vendor neutral when possible. Changes to common code are better than duplicating changes to vendor code. @@ -36,6 +36,87 @@ Review checklist for kvm patches 11. New guest visible features must either be documented in a hardware manual or be accompanied by documentation. -12. Features must be robust against reset and kexec - for example, shared - host/guest memory must be unshared to prevent the host from writing to - guest memory that the guest has not reserved for this purpose. +Testing of KVM code +------------------- + +All features contributed to KVM, and in many cases bugfixes too, should be +accompanied by some kind of tests and/or enablement in open source guests +and VMMs. KVM is covered by multiple test suites: + +*Selftests* + These are low level tests that allow granular testing of kernel APIs. + This includes API failure scenarios, invoking APIs after specific + guest instructions, and testing multiple calls to ``KVM_CREATE_VM`` + within a single test. They are included in the kernel tree at + ``tools/testing/selftests/kvm``. + +``kvm-unit-tests`` + A collection of small guests that test CPU and emulated device features + from a guest's perspective. They run under QEMU or ``kvmtool``, and + are generally not KVM-specific: they can be run with any accelerator + that QEMU support or even on bare metal, making it possible to compare + behavior across hypervisors and processor families. + +Functional test suites + Various sets of functional tests exist, such as QEMU's ``tests/functional`` + suite and `avocado-vt <https://avocado-vt.readthedocs.io/en/latest/>`__. + These typically involve running a full operating system in a virtual + machine. + +The best testing approach depends on the feature's complexity and +operation. Here are some examples and guidelines: + +New instructions (no new registers or APIs) + The corresponding CPU features (if applicable) should be made available + in QEMU. If the instructions require emulation support or other code in + KVM, it is worth adding coverage to ``kvm-unit-tests`` or selftests; + the latter can be a better choice if the instructions relate to an API + that already has good selftest coverage. + +New hardware features (new registers, no new APIs) + These should be tested via ``kvm-unit-tests``; this more or less implies + supporting them in QEMU and/or ``kvmtool``. In some cases selftests + can be used instead, similar to the previous case, or specifically to + test corner cases in guest state save/restore. + +Bug fixes and performance improvements + These usually do not introduce new APIs, but it's worth sharing + any benchmarks and tests that will validate your contribution, + ideally in the form of regression tests. Tests and benchmarks + can be included in either ``kvm-unit-tests`` or selftests, depending + on the specifics of your change. Selftests are especially useful for + regression tests because they are included directly in Linux's tree. + +Large scale internal changes + While it's difficult to provide a single policy, you should ensure that + the changed code is covered by either ``kvm-unit-tests`` or selftests. + In some cases the affected code is run for any guests and functional + tests suffice. Explain your testing process in the cover letter, + as that can help identify gaps in existing test suites. + +New APIs + It is important to demonstrate your use case. This can be as simple as + explaining that the feature is already in use on bare metal, or it can be + a proof-of-concept implementation in userspace. The latter need not be + open source, though that is of course preferrable for easier testing. + Selftests should test corner cases of the APIs, and should also cover + basic host and guest operation if no open source VMM uses the feature. + +Bigger features, usually spanning host and guest + These should be supported by Linux guests, with limited exceptions for + Hyper-V features that are testable on Windows guests. It is strongly + suggested that the feature be usable with an open source host VMM, such + as at least one of QEMU or crosvm, and guest firmware. Selftests should + test at least API error cases. Guest operation can be covered by + either selftests of ``kvm-unit-tests`` (this is especially important for + paravirtualized and Windows-only features). Strong selftest coverage + can also be a replacement for implementation in an open source VMM, + but this is generally not recommended. + +Following the above suggestions for testing in selftests and +``kvm-unit-tests`` will make it easier for the maintainers to review +and accept your code. In fact, even before you contribute your changes +upstream it will make it easier for you to develop for KVM. + +Of course, the KVM maintainers reserve the right to require more tests, +though they may also waive the requirement from time to time. diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 76c2f0da821f..c20bd6f21e60 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -2624,7 +2624,7 @@ static bool access_mdcr(struct kvm_vcpu *vcpu, */ if (hpmn > vcpu->kvm->arch.nr_pmu_counters) { hpmn = vcpu->kvm->arch.nr_pmu_counters; - u64_replace_bits(val, hpmn, MDCR_EL2_HPMN); + u64p_replace_bits(&val, hpmn, MDCR_EL2_HPMN); } __vcpu_assign_sys_reg(vcpu, MDCR_EL2, val); diff --git a/arch/riscv/include/asm/kvm_aia.h b/arch/riscv/include/asm/kvm_aia.h index 3b643b9efc07..5acce285e56e 100644 --- a/arch/riscv/include/asm/kvm_aia.h +++ b/arch/riscv/include/asm/kvm_aia.h @@ -87,6 +87,9 @@ DECLARE_STATIC_KEY_FALSE(kvm_riscv_aia_available); extern struct kvm_device_ops kvm_riscv_aia_device_ops; +bool kvm_riscv_vcpu_aia_imsic_has_interrupt(struct kvm_vcpu *vcpu); +void kvm_riscv_vcpu_aia_imsic_load(struct kvm_vcpu *vcpu, int cpu); +void kvm_riscv_vcpu_aia_imsic_put(struct kvm_vcpu *vcpu); void kvm_riscv_vcpu_aia_imsic_release(struct kvm_vcpu *vcpu); int kvm_riscv_vcpu_aia_imsic_update(struct kvm_vcpu *vcpu); @@ -161,7 +164,6 @@ void kvm_riscv_aia_destroy_vm(struct kvm *kvm); int kvm_riscv_aia_alloc_hgei(int cpu, struct kvm_vcpu *owner, void __iomem **hgei_va, phys_addr_t *hgei_pa); void kvm_riscv_aia_free_hgei(int cpu, int hgei); -void kvm_riscv_aia_wakeon_hgei(struct kvm_vcpu *owner, bool enable); void kvm_riscv_aia_enable(void); void kvm_riscv_aia_disable(void); diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h index 85cfebc32e4c..bcbf8b1ec115 100644 --- a/arch/riscv/include/asm/kvm_host.h +++ b/arch/riscv/include/asm/kvm_host.h @@ -306,6 +306,9 @@ static inline bool kvm_arch_pmi_in_guest(struct kvm_vcpu *vcpu) return IS_ENABLED(CONFIG_GUEST_PERF_EVENTS) && !!vcpu; } +static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {} +static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {} + #define KVM_RISCV_GSTAGE_TLB_MIN_ORDER 12 void kvm_riscv_local_hfence_gvma_vmid_gpa(unsigned long vmid, diff --git a/arch/riscv/kvm/aia.c b/arch/riscv/kvm/aia.c index 19afd1f23537..dad318185660 100644 --- a/arch/riscv/kvm/aia.c +++ b/arch/riscv/kvm/aia.c @@ -30,28 +30,6 @@ unsigned int kvm_riscv_aia_nr_hgei; unsigned int kvm_riscv_aia_max_ids; DEFINE_STATIC_KEY_FALSE(kvm_riscv_aia_available); -static int aia_find_hgei(struct kvm_vcpu *owner) -{ - int i, hgei; - unsigned long flags; - struct aia_hgei_control *hgctrl = get_cpu_ptr(&aia_hgei); - - raw_spin_lock_irqsave(&hgctrl->lock, flags); - - hgei = -1; - for (i = 1; i <= kvm_riscv_aia_nr_hgei; i++) { - if (hgctrl->owners[i] == owner) { - hgei = i; - break; - } - } - - raw_spin_unlock_irqrestore(&hgctrl->lock, flags); - - put_cpu_ptr(&aia_hgei); - return hgei; -} - static inline unsigned long aia_hvictl_value(bool ext_irq_pending) { unsigned long hvictl; @@ -95,7 +73,6 @@ void kvm_riscv_vcpu_aia_sync_interrupts(struct kvm_vcpu *vcpu) bool kvm_riscv_vcpu_aia_has_interrupts(struct kvm_vcpu *vcpu, u64 mask) { - int hgei; unsigned long seip; if (!kvm_riscv_aia_available()) @@ -114,11 +91,7 @@ bool kvm_riscv_vcpu_aia_has_interrupts(struct kvm_vcpu *vcpu, u64 mask) if (!kvm_riscv_aia_initialized(vcpu->kvm) || !seip) return false; - hgei = aia_find_hgei(vcpu); - if (hgei > 0) - return !!(ncsr_read(CSR_HGEIP) & BIT(hgei)); - - return false; + return kvm_riscv_vcpu_aia_imsic_has_interrupt(vcpu); } void kvm_riscv_vcpu_aia_update_hvip(struct kvm_vcpu *vcpu) @@ -164,6 +137,9 @@ void kvm_riscv_vcpu_aia_load(struct kvm_vcpu *vcpu, int cpu) csr_write(CSR_HVIPRIO2H, csr->hviprio2h); #endif } + + if (kvm_riscv_aia_initialized(vcpu->kvm)) + kvm_riscv_vcpu_aia_imsic_load(vcpu, cpu); } void kvm_riscv_vcpu_aia_put(struct kvm_vcpu *vcpu) @@ -174,6 +150,9 @@ void kvm_riscv_vcpu_aia_put(struct kvm_vcpu *vcpu) if (!kvm_riscv_aia_available()) return; + if (kvm_riscv_aia_initialized(vcpu->kvm)) + kvm_riscv_vcpu_aia_imsic_put(vcpu); + if (kvm_riscv_nacl_available()) { nsh = nacl_shmem(); csr->vsiselect = nacl_csr_read(nsh, CSR_VSISELECT); @@ -472,22 +451,6 @@ void kvm_riscv_aia_free_hgei(int cpu, int hgei) raw_spin_unlock_irqrestore(&hgctrl->lock, flags); } -void kvm_riscv_aia_wakeon_hgei(struct kvm_vcpu *owner, bool enable) -{ - int hgei; - - if (!kvm_riscv_aia_available()) - return; - - hgei = aia_find_hgei(owner); - if (hgei > 0) { - if (enable) - csr_set(CSR_HGEIE, BIT(hgei)); - else - csr_clear(CSR_HGEIE, BIT(hgei)); - } -} - static irqreturn_t hgei_interrupt(int irq, void *dev_id) { int i; diff --git a/arch/riscv/kvm/aia_imsic.c b/arch/riscv/kvm/aia_imsic.c index 29ef9c2133a9..2ff865943ebb 100644 --- a/arch/riscv/kvm/aia_imsic.c +++ b/arch/riscv/kvm/aia_imsic.c @@ -676,6 +676,48 @@ static void imsic_swfile_update(struct kvm_vcpu *vcpu, imsic_swfile_extirq_update(vcpu); } +bool kvm_riscv_vcpu_aia_imsic_has_interrupt(struct kvm_vcpu *vcpu) +{ + struct imsic *imsic = vcpu->arch.aia_context.imsic_state; + unsigned long flags; + bool ret = false; + + /* + * The IMSIC SW-file directly injects interrupt via hvip so + * only check for interrupt when IMSIC VS-file is being used. + */ + + read_lock_irqsave(&imsic->vsfile_lock, flags); + if (imsic->vsfile_cpu > -1) + ret = !!(csr_read(CSR_HGEIP) & BIT(imsic->vsfile_hgei)); + read_unlock_irqrestore(&imsic->vsfile_lock, flags); + + return ret; +} + +void kvm_riscv_vcpu_aia_imsic_load(struct kvm_vcpu *vcpu, int cpu) +{ + /* + * No need to explicitly clear HGEIE CSR bits because the + * hgei interrupt handler (aka hgei_interrupt()) will always + * clear it for us. + */ +} + +void kvm_riscv_vcpu_aia_imsic_put(struct kvm_vcpu *vcpu) +{ + struct imsic *imsic = vcpu->arch.aia_context.imsic_state; + unsigned long flags; + + if (!kvm_vcpu_is_blocking(vcpu)) + return; + + read_lock_irqsave(&imsic->vsfile_lock, flags); + if (imsic->vsfile_cpu > -1) + csr_set(CSR_HGEIE, BIT(imsic->vsfile_hgei)); + read_unlock_irqrestore(&imsic->vsfile_lock, flags); +} + void kvm_riscv_vcpu_aia_imsic_release(struct kvm_vcpu *vcpu) { unsigned long flags; @@ -781,6 +823,9 @@ int kvm_riscv_vcpu_aia_imsic_update(struct kvm_vcpu *vcpu) * producers to the new IMSIC VS-file. */ + /* Ensure HGEIE CSR bit is zero before using the new IMSIC VS-file */ + csr_clear(CSR_HGEIE, BIT(new_vsfile_hgei)); + /* Zero-out new IMSIC VS-file */ imsic_vsfile_local_clear(new_vsfile_hgei, imsic->nr_hw_eix); diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index e0a01af426ff..0462863206ca 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -207,16 +207,6 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) return kvm_riscv_vcpu_timer_pending(vcpu); } -void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) -{ - kvm_riscv_aia_wakeon_hgei(vcpu, true); -} - -void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) -{ - kvm_riscv_aia_wakeon_hgei(vcpu, false); -} - int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) { return (kvm_riscv_vcpu_has_interrupts(vcpu, -1UL) && diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c index ff672fa71fcc..85a7262115e1 100644 --- a/arch/riscv/kvm/vcpu_timer.c +++ b/arch/riscv/kvm/vcpu_timer.c @@ -345,8 +345,24 @@ void kvm_riscv_vcpu_timer_save(struct kvm_vcpu *vcpu) /* * The vstimecmp CSRs are saved by kvm_riscv_vcpu_timer_sync() * upon every VM exit so no need to save here. + * + * If VS-timer expires when no VCPU running on a host CPU then + * WFI executed by such host CPU will be effective NOP resulting + * in no power savings. This is because as-per RISC-V Privileged + * specificaiton: "WFI is also required to resume execution for + * locally enabled interrupts pending at any privilege level, + * regardless of the global interrupt enable at each privilege + * level." + * + * To address the above issue, vstimecmp CSR must be set to -1UL + * over here when VCPU is scheduled-out or exits to user space. */ + csr_write(CSR_VSTIMECMP, -1UL); +#if defined(CONFIG_32BIT) + csr_write(CSR_VSTIMECMPH, -1UL); +#endif + /* timer should be enabled for the remaining operations */ if (unlikely(!t->init_done)) return; diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f31ccdeb905b..ec79aacc446f 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -173,7 +173,6 @@ static void td_init_cpuid_entry2(struct kvm_cpuid_entry2 *entry, unsigned char i tdx_clear_unsupported_cpuid(entry); } -#define TDVMCALLINFO_GET_QUOTE BIT(0) #define TDVMCALLINFO_SETUP_EVENT_NOTIFY_INTERRUPT BIT(1) static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf, @@ -192,7 +191,6 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf, caps->cpuid.nent = td_conf->num_cpuid_config; caps->user_tdvmcallinfo_1_r11 = - TDVMCALLINFO_GET_QUOTE | TDVMCALLINFO_SETUP_EVENT_NOTIFY_INTERRUPT; for (i = 0; i < td_conf->num_cpuid_config; i++) @@ -2271,25 +2269,26 @@ static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd) const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf; struct kvm_tdx_capabilities __user *user_caps; struct kvm_tdx_capabilities *caps = NULL; + u32 nr_user_entries; int ret = 0; /* flags is reserved for future use */ if (cmd->flags) return -EINVAL; - caps = kmalloc(sizeof(*caps) + + caps = kzalloc(sizeof(*caps) + sizeof(struct kvm_cpuid_entry2) * td_conf->num_cpuid_config, GFP_KERNEL); if (!caps) return -ENOMEM; user_caps = u64_to_user_ptr(cmd->data); - if (copy_from_user(caps, user_caps, sizeof(*caps))) { + if (get_user(nr_user_entries, &user_caps->cpuid.nent)) { ret = -EFAULT; goto out; } - if (caps->cpuid.nent < td_conf->num_cpuid_config) { + if (nr_user_entries < td_conf->num_cpuid_config) { ret = -E2BIG; goto out; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 357b9e3a6cef..93636f77c42d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6188,6 +6188,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp, u32 user_tsc_khz; r = -EINVAL; + + if (vcpu->arch.guest_tsc_protected) + goto out; + user_tsc_khz = (u32)arg; if (kvm_caps.has_tsc_control && |