From 042b6d11e00fd3614760454744ce492825201c2d Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 17 Apr 2019 10:15:31 -0700 Subject: KVM: lapic: Disable timer advancement if adaptive tuning goes haywire commit 57bf67e73ce9bcce2258890f5abf2adf5f619f1a upstream. To minimize the latency of timer interrupts as observed by the guest, KVM adjusts the values it programs into the host timers to account for the host's overhead of programming and handling the timer event. Now that the timer advancement is automatically tuned during runtime, it's effectively unbounded by default, e.g. if KVM is running as L1 the advancement can measure in hundreds of milliseconds. Disable timer advancement if adaptive tuning yields an advancement of more than 5000ns, as large advancements can break reasonable assumptions of the guest, e.g. that a timer configured to fire after 1ms won't arrive on the next instruction. Although KVM busy waits to mitigate the case of a timer event arriving too early, complications can arise when shifting the interrupt too far, e.g. kvm-unit-test's vmx.interrupt test will fail when its "host" exits on interrupts as KVM may inject the INTR before the guest executes STI+HLT. Arguably the unit test is "broken" in the sense that delaying a timer interrupt by 1ms doesn't technically guarantee the interrupt will arrive after STI+HLT, but it's a reasonable assumption that KVM should support. Furthermore, an unbounded advancement also effectively unbounds the time spent busy waiting, e.g. if the guest programs a timer with a very large delay. 5000ns is a somewhat arbitrary threshold. When running on bare metal, which is the intended use case, timer advancement is expected to be in the general vicinity of 1000ns. 5000ns is high enough that false positives are unlikely, while not being so high as to negatively affect the host's performance/stability. Note, a future patch will enable userspace to disable KVM's adaptive tuning, which will allow priveleged userspace will to specifying an advancement value in excess of this arbitrary threshold in order to satisfy an abnormal use case. Cc: Liran Alon Cc: Wanpeng Li Cc: stable@vger.kernel.org Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/lapic.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'arch/x86/kvm/lapic.c') diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 4b6c2da7265c..debd0410f10c 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1519,6 +1519,10 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) } if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) lapic_timer_advance_adjust_done = true; + if (unlikely(lapic_timer_advance_ns > 5000)) { + lapic_timer_advance_ns = 0; + lapic_timer_advance_adjust_done = true; + } } } -- cgit v1.2.3 From a72f524964fb0310156804ba121249f1eebd8c13 Mon Sep 17 00:00:00 2001 From: Liran Alon Date: Tue, 16 Apr 2019 20:36:34 +0300 Subject: KVM: x86: Consider LAPIC TSC-Deadline timer expired if deadline too short commit c09d65d9eab69985c75f98ed64541229f6fa9aa6 upstream. If guest sets MSR_IA32_TSCDEADLINE to value such that in host time-domain it's shorter than lapic_timer_advance_ns, we can reach a case that we call hrtimer_start() with expiration time set at the past. Because lapic_timer.timer is init with HRTIMER_MODE_ABS_PINNED, it is not allowed to run in softirq and therefore will never expire. To avoid such a scenario, verify that deadline expiration time is set on host time-domain further than (now + lapic_timer_advance_ns). A future patch can also consider adding a min_timer_deadline_ns module parameter, similar to min_timer_period_us to avoid races that amount of ns it takes to run logic could still call hrtimer_start() with expiration timer set at the past. Reviewed-by: Joao Martins Signed-off-by: Liran Alon Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/lapic.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'arch/x86/kvm/lapic.c') diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index debd0410f10c..34678ea56b4b 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1543,9 +1543,12 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic) now = ktime_get(); guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc()); - if (likely(tscdeadline > guest_tsc)) { - ns = (tscdeadline - guest_tsc) * 1000000ULL; - do_div(ns, this_tsc_khz); + + ns = (tscdeadline - guest_tsc) * 1000000ULL; + do_div(ns, this_tsc_khz); + + if (likely(tscdeadline > guest_tsc) && + likely(ns > lapic_timer_advance_ns)) { expire = ktime_add_ns(now, ns); expire = ktime_sub_ns(expire, lapic_timer_advance_ns); hrtimer_start(&apic->lapic_timer.timer, -- cgit v1.2.3 From a3ca780612e5ec67cb716c47d31404b5f313afb9 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 17 Apr 2019 10:15:32 -0700 Subject: KVM: lapic: Track lapic timer advance per vCPU commit 39497d7660d9866a47a2dc9055672358da57ad3d upstream. Automatically adjusting the globally-shared timer advancement could corrupt the timer, e.g. if multiple vCPUs are concurrently adjusting the advancement value. That could be partially fixed by using a local variable for the arithmetic, but it would still be susceptible to a race when setting timer_advance_adjust_done. And because virtual_tsc_khz and tsc_scaling_ratio are per-vCPU, the correct calibration for a given vCPU may not apply to all vCPUs. Furthermore, lapic_timer_advance_ns is marked __read_mostly, which is effectively violated when finding a stable advancement takes an extended amount of timer. Opportunistically change the definition of lapic_timer_advance_ns to a u32 so that it matches the style of struct kvm_timer. Explicitly pass the param to kvm_create_lapic() so that it doesn't have to be exposed to lapic.c, thus reducing the probability of unintentionally using the global value instead of the per-vCPU value. Cc: Liran Alon Cc: Wanpeng Li Reviewed-by: Liran Alon Cc: stable@vger.kernel.org Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/lapic.c | 36 +++++++++++++++++++----------------- arch/x86/kvm/lapic.h | 4 +++- arch/x86/kvm/vmx/vmx.c | 4 +++- arch/x86/kvm/x86.c | 7 +++---- arch/x86/kvm/x86.h | 2 -- 5 files changed, 28 insertions(+), 25 deletions(-) (limited to 'arch/x86/kvm/lapic.c') diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 34678ea56b4b..b6b4b44940f0 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -70,7 +70,6 @@ #define APIC_BROADCAST 0xFF #define X2APIC_BROADCAST 0xFFFFFFFFul -static bool lapic_timer_advance_adjust_done = false; #define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100 /* step-by-step approximation to mitigate fluctuation */ #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8 @@ -1482,6 +1481,7 @@ static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu) void wait_lapic_expire(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic = vcpu->arch.apic; + u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns; u64 guest_tsc, tsc_deadline, ns; if (!lapic_in_kernel(vcpu)) @@ -1501,34 +1501,36 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) /* __delay is delay_tsc whenever the hardware has TSC, thus always. */ if (guest_tsc < tsc_deadline) __delay(min(tsc_deadline - guest_tsc, - nsec_to_cycles(vcpu, lapic_timer_advance_ns))); + nsec_to_cycles(vcpu, timer_advance_ns))); - if (!lapic_timer_advance_adjust_done) { + if (!apic->lapic_timer.timer_advance_adjust_done) { /* too early */ if (guest_tsc < tsc_deadline) { ns = (tsc_deadline - guest_tsc) * 1000000ULL; do_div(ns, vcpu->arch.virtual_tsc_khz); - lapic_timer_advance_ns -= min((unsigned int)ns, - lapic_timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP); + timer_advance_ns -= min((u32)ns, + timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP); } else { /* too late */ ns = (guest_tsc - tsc_deadline) * 1000000ULL; do_div(ns, vcpu->arch.virtual_tsc_khz); - lapic_timer_advance_ns += min((unsigned int)ns, - lapic_timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP); + timer_advance_ns += min((u32)ns, + timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP); } if (abs(guest_tsc - tsc_deadline) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) - lapic_timer_advance_adjust_done = true; - if (unlikely(lapic_timer_advance_ns > 5000)) { - lapic_timer_advance_ns = 0; - lapic_timer_advance_adjust_done = true; + apic->lapic_timer.timer_advance_adjust_done = true; + if (unlikely(timer_advance_ns > 5000)) { + timer_advance_ns = 0; + apic->lapic_timer.timer_advance_adjust_done = true; } + apic->lapic_timer.timer_advance_ns = timer_advance_ns; } } static void start_sw_tscdeadline(struct kvm_lapic *apic) { - u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline; + struct kvm_timer *ktimer = &apic->lapic_timer; + u64 guest_tsc, tscdeadline = ktimer->tscdeadline; u64 ns = 0; ktime_t expire; struct kvm_vcpu *vcpu = apic->vcpu; @@ -1548,11 +1550,10 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic) do_div(ns, this_tsc_khz); if (likely(tscdeadline > guest_tsc) && - likely(ns > lapic_timer_advance_ns)) { + likely(ns > apic->lapic_timer.timer_advance_ns)) { expire = ktime_add_ns(now, ns); - expire = ktime_sub_ns(expire, lapic_timer_advance_ns); - hrtimer_start(&apic->lapic_timer.timer, - expire, HRTIMER_MODE_ABS_PINNED); + expire = ktime_sub_ns(expire, ktimer->timer_advance_ns); + hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS_PINNED); } else apic_timer_expired(apic); @@ -2259,7 +2260,7 @@ static enum hrtimer_restart apic_timer_fn(struct hrtimer *data) return HRTIMER_NORESTART; } -int kvm_create_lapic(struct kvm_vcpu *vcpu) +int kvm_create_lapic(struct kvm_vcpu *vcpu, u32 timer_advance_ns) { struct kvm_lapic *apic; @@ -2283,6 +2284,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu) hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED); apic->lapic_timer.timer.function = apic_timer_fn; + apic->lapic_timer.timer_advance_ns = timer_advance_ns; /* * APIC is created enabled. This will prevent kvm_lapic_set_base from diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index ff6ef9c3d760..3e97f8a68967 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -31,8 +31,10 @@ struct kvm_timer { u32 timer_mode_mask; u64 tscdeadline; u64 expired_tscdeadline; + u32 timer_advance_ns; atomic_t pending; /* accumulated triggered timers */ bool hv_timer_in_use; + bool timer_advance_adjust_done; }; struct kvm_lapic { @@ -62,7 +64,7 @@ struct kvm_lapic { struct dest_map; -int kvm_create_lapic(struct kvm_vcpu *vcpu); +int kvm_create_lapic(struct kvm_vcpu *vcpu, u32 timer_advance_ns); void kvm_free_lapic(struct kvm_vcpu *vcpu); int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index e7fe8c692362..72f1d17711f2 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7133,6 +7133,7 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc) { struct vcpu_vmx *vmx; u64 tscl, guest_tscl, delta_tsc, lapic_timer_advance_cycles; + struct kvm_timer *ktimer = &vcpu->arch.apic->lapic_timer; if (kvm_mwait_in_guest(vcpu->kvm)) return -EOPNOTSUPP; @@ -7141,7 +7142,8 @@ static int vmx_set_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc) tscl = rdtsc(); guest_tscl = kvm_read_l1_tsc(vcpu, tscl); delta_tsc = max(guest_deadline_tsc, guest_tscl) - guest_tscl; - lapic_timer_advance_cycles = nsec_to_cycles(vcpu, lapic_timer_advance_ns); + lapic_timer_advance_cycles = nsec_to_cycles(vcpu, + ktimer->timer_advance_ns); if (delta_tsc > lapic_timer_advance_cycles) delta_tsc -= lapic_timer_advance_cycles; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7e413ea19a9a..3244994b111d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -137,9 +137,8 @@ static u32 __read_mostly tsc_tolerance_ppm = 250; module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR); /* lapic timer advance (tscdeadline mode only) in nanoseconds */ -unsigned int __read_mostly lapic_timer_advance_ns = 1000; +static u32 __read_mostly lapic_timer_advance_ns = 1000; module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR); -EXPORT_SYMBOL_GPL(lapic_timer_advance_ns); static bool __read_mostly vector_hashing = true; module_param(vector_hashing, bool, S_IRUGO); @@ -7882,7 +7881,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) } trace_kvm_entry(vcpu->vcpu_id); - if (lapic_timer_advance_ns) + if (vcpu->arch.apic->lapic_timer.timer_advance_ns) wait_lapic_expire(vcpu); guest_enter_irqoff(); @@ -9070,7 +9069,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) goto fail_free_pio_data; if (irqchip_in_kernel(vcpu->kvm)) { - r = kvm_create_lapic(vcpu); + r = kvm_create_lapic(vcpu, lapic_timer_advance_ns); if (r < 0) goto fail_mmu_destroy; } else diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index de3d46769ee3..b457160dc7ba 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -294,8 +294,6 @@ extern u64 kvm_supported_xcr0(void); extern unsigned int min_timer_period_us; -extern unsigned int lapic_timer_advance_ns; - extern bool enable_vmware_backdoor; extern struct static_key kvm_no_apic_vcpu; -- cgit v1.2.3 From 63be90093cf54523dd158ca555319c413fc151b7 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 17 Apr 2019 10:15:33 -0700 Subject: KVM: lapic: Allow user to disable adaptive tuning of timer advancement commit c3941d9e0ccd48920e4811f133235b3597e5310b upstream. The introduction of adaptive tuning of lapic timer advancement did not allow for the scenario where userspace would want to disable adaptive tuning but still employ timer advancement, e.g. for testing purposes or to handle a use case where adaptive tuning is unable to settle on a suitable time. This is epecially pertinent now that KVM places a hard threshold on the maximum advancment time. Rework the timer semantics to accept signed values, with a value of '-1' being interpreted as "use adaptive tuning with KVM's internal default", and any other value being used as an explicit advancement time, e.g. a time of '0' effectively disables advancement. Note, this does not completely restore the original behavior of lapic_timer_advance_ns. Prior to tracking the advancement per vCPU, which is necessary to support autotuning, userspace could adjust lapic_timer_advance_ns for *running* vCPU. With per-vCPU tracking, the module params are snapshotted at vCPU creation, i.e. applying a new advancement effectively requires restarting a VM. Dynamically updating a running vCPU is possible, e.g. a helper could be added to retrieve the desired delay, choosing between the global module param and the per-VCPU value depending on whether or not auto-tuning is (globally) enabled, but introduces a great deal of complexity. The wrapper itself is not complex, but understanding and documenting the effects of dynamically toggling auto-tuning and/or adjusting the timer advancement is nigh impossible since the behavior would be dependent on KVM's implementation as well as compiler optimizations. In other words, providing stable behavior would require extremely careful consideration now and in the future. Given that the expected use of a manually-tuned timer advancement is to "tune once, run many", use the vastly simpler approach of recognizing changes to the module params only when creating a new vCPU. Cc: Liran Alon Cc: Wanpeng Li Reviewed-by: Liran Alon Cc: stable@vger.kernel.org Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/lapic.c | 11 +++++++++-- arch/x86/kvm/lapic.h | 2 +- arch/x86/kvm/x86.c | 9 +++++++-- 3 files changed, 17 insertions(+), 5 deletions(-) (limited to 'arch/x86/kvm/lapic.c') diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index b6b4b44940f0..cdff0248b596 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2260,7 +2260,7 @@ static enum hrtimer_restart apic_timer_fn(struct hrtimer *data) return HRTIMER_NORESTART; } -int kvm_create_lapic(struct kvm_vcpu *vcpu, u32 timer_advance_ns) +int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns) { struct kvm_lapic *apic; @@ -2284,7 +2284,14 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, u32 timer_advance_ns) hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED); apic->lapic_timer.timer.function = apic_timer_fn; - apic->lapic_timer.timer_advance_ns = timer_advance_ns; + if (timer_advance_ns == -1) { + apic->lapic_timer.timer_advance_ns = 1000; + apic->lapic_timer.timer_advance_adjust_done = false; + } else { + apic->lapic_timer.timer_advance_ns = timer_advance_ns; + apic->lapic_timer.timer_advance_adjust_done = true; + } + /* * APIC is created enabled. This will prevent kvm_lapic_set_base from diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 3e97f8a68967..d6d049ba3045 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -64,7 +64,7 @@ struct kvm_lapic { struct dest_map; -int kvm_create_lapic(struct kvm_vcpu *vcpu, u32 timer_advance_ns); +int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns); void kvm_free_lapic(struct kvm_vcpu *vcpu); int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3244994b111d..613dd3c89b94 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -136,8 +136,13 @@ EXPORT_SYMBOL_GPL(kvm_default_tsc_scaling_ratio); static u32 __read_mostly tsc_tolerance_ppm = 250; module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR); -/* lapic timer advance (tscdeadline mode only) in nanoseconds */ -static u32 __read_mostly lapic_timer_advance_ns = 1000; +/* + * lapic timer advance (tscdeadline mode only) in nanoseconds. '-1' enables + * adaptive tuning starting from default advancment of 1000ns. '0' disables + * advancement entirely. Any other value is used as-is and disables adaptive + * tuning, i.e. allows priveleged userspace to set an exact advancement time. + */ +static int __read_mostly lapic_timer_advance_ns = -1; module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR); static bool __read_mostly vector_hashing = true; -- cgit v1.2.3 From b2e3c7d43f71989189fe34d72f7158fb0ad3fb89 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 17 Apr 2019 10:15:34 -0700 Subject: KVM: lapic: Convert guest TSC to host time domain if necessary commit b6aa57c69cb26ea0160c51f7cf45f1af23542686 upstream. To minimize the latency of timer interrupts as observed by the guest, KVM adjusts the values it programs into the host timers to account for the host's overhead of programming and handling the timer event. In the event that the adjustments are too aggressive, i.e. the timer fires earlier than the guest expects, KVM busy waits immediately prior to entering the guest. Currently, KVM manually converts the delay from nanoseconds to clock cycles. But, the conversion is done in the guest's time domain, while the delay occurs in the host's time domain. This is perfectly ok when the guest and host are using the same TSC ratio, but if the guest is using a different ratio then the delay may not be accurate and could wait too little or too long. When the guest is not using the host's ratio, convert the delay from guest clock cycles to host nanoseconds and use ndelay() instead of __delay() to provide more accurate timing. Because converting to nanoseconds is relatively expensive, e.g. requires division and more multiplication ops, continue using __delay() directly when guest and host TSCs are running at the same ratio. Cc: Liran Alon Cc: Wanpeng Li Cc: stable@vger.kernel.org Fixes: 3b8a5df6c4dc6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically") Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/lapic.c | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) (limited to 'arch/x86/kvm/lapic.c') diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index cdff0248b596..4ae142eea96a 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1478,6 +1478,26 @@ static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu) return false; } +static inline void __wait_lapic_expire(struct kvm_vcpu *vcpu, u64 guest_cycles) +{ + u64 timer_advance_ns = vcpu->arch.apic->lapic_timer.timer_advance_ns; + + /* + * If the guest TSC is running at a different ratio than the host, then + * convert the delay to nanoseconds to achieve an accurate delay. Note + * that __delay() uses delay_tsc whenever the hardware has TSC, thus + * always for VMX enabled hardware. + */ + if (vcpu->arch.tsc_scaling_ratio == kvm_default_tsc_scaling_ratio) { + __delay(min(guest_cycles, + nsec_to_cycles(vcpu, timer_advance_ns))); + } else { + u64 delay_ns = guest_cycles * 1000000ULL; + do_div(delay_ns, vcpu->arch.virtual_tsc_khz); + ndelay(min_t(u32, delay_ns, timer_advance_ns)); + } +} + void wait_lapic_expire(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic = vcpu->arch.apic; @@ -1498,10 +1518,8 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc()); trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline); - /* __delay is delay_tsc whenever the hardware has TSC, thus always. */ if (guest_tsc < tsc_deadline) - __delay(min(tsc_deadline - guest_tsc, - nsec_to_cycles(vcpu, timer_advance_ns))); + __wait_lapic_expire(vcpu, tsc_deadline - guest_tsc); if (!apic->lapic_timer.timer_advance_adjust_done) { /* too early */ -- cgit v1.2.3 From 8f2b774cbf81603a8146fb2e2d92dc88001fafec Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Thu, 25 Apr 2019 19:01:09 -0700 Subject: KVM: lapic: Check for in-kernel LAPIC before deferencing apic pointer commit b904cb8dff824b79233e82c078837627ebd52717 upstream. ...to avoid dereferencing a null pointer when querying the per-vCPU timer advance. Fixes: 39497d7660d98 ("KVM: lapic: Track lapic timer advance per vCPU") Reported-by: syzbot+f7e65445a40d3e0e4ebf@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/lapic.c | 3 --- arch/x86/kvm/x86.c | 3 ++- 2 files changed, 2 insertions(+), 4 deletions(-) (limited to 'arch/x86/kvm/lapic.c') diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 4ae142eea96a..3339697de6e5 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1504,9 +1504,6 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns; u64 guest_tsc, tsc_deadline, ns; - if (!lapic_in_kernel(vcpu)) - return; - if (apic->lapic_timer.expired_tscdeadline == 0) return; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 613dd3c89b94..3eeb7183fc09 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7886,7 +7886,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) } trace_kvm_entry(vcpu->vcpu_id); - if (vcpu->arch.apic->lapic_timer.timer_advance_ns) + if (lapic_in_kernel(vcpu) && + vcpu->arch.apic->lapic_timer.timer_advance_ns) wait_lapic_expire(vcpu); guest_enter_irqoff(); -- cgit v1.2.3