summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/x86.c
AgeCommit message (Collapse)AuthorFilesLines
2012-08-13KVM: x86: fix pvclock guest stopped flag reportingMarcelo Tosatti1-3/+10
kvm_guest_time_update unconditionally clears hv_clock.flags field, so the notification never reaches the guest. Fix it by allowing PVCLOCK_GUEST_STOPPED to passthrough. Reviewed-by: Eric B Munson <emunson@mgebm.net> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-09KVM: correctly detect APIC SW state in kvm_apic_post_state_restore()Gleb Natapov1-2/+1
For apic_set_spiv() to track APIC SW state correctly it needs to see previous and next values of the spurious vector register, but currently memset() overwrite the old value before apic_set_spiv() get a chance to do tracking. Fix it by calling apic_set_spiv() before overwriting old value. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: use jump label to optimize checking for in kernel local apic presenceGleb Natapov1-1/+6
Usually all vcpus have local apic pointer initialized, so the check may be completely skipped. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: use jump label to optimize checking for HW enabled APIC in APIC_BASE MSRGleb Natapov1-0/+1
Usually all APICs are HW enabled so the check can be optimized out. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: clean up kvm_(set|get)_apic_baseGleb Natapov1-8/+2
kvm_get_apic_base() needlessly checks irqchip_in_kernel although it does the same no matter what result of the check is. kvm_set_apic_base() also checks for irqchip_in_kernel, but kvm_lapic_set_base() can handle this case. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: do not release the error pageXiao Guangrong1-6/+3
After commit a2766325cf9f9, the error page is replaced by the error code, it need not be released anymore [ The patch has been compiling tested for powerpc ] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: Push rmap into kvm_arch_memory_slotTakuya Yoshikawa1-23/+32
Two reasons: - x86 can integrate rmap and rmap_pde and remove heuristics in __gfn_to_rmap(). - Some architectures do not need rmap. Since rmap is one of the most memory consuming stuff in KVM, ppc'd better restrict the allocation to Book3S HV. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: Stop checking rmap to see if slot is being createdTakuya Yoshikawa1-2/+2
Instead, check npages consistently. This helps to make rmap architecture specific in a later patch. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-05KVM: x86: update KVM_SAVE_MSRS_BEGIN to correct valueGleb Natapov1-1/+1
When MSR_KVM_PV_EOI_EN was added to msrs_to_save array KVM_SAVE_MSRS_BEGIN was not updated accordingly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-05Merge remote-tracking branch 'upstream' into nextAvi Kivity1-0/+4
- bring back critical fixes (esp. aa67f6096c19bc) - provide an updated base for development * upstream: (4334 commits) missed mnt_drop_write() in do_dentry_open() UBIFS: nuke pdflush from comments gfs2: nuke pdflush from comments drbd: nuke pdflush from comments nilfs2: nuke write_super from comments hfs: nuke write_super from comments vfs: nuke pdflush from comments jbd/jbd2: nuke write_super from comments btrfs: nuke pdflush from comments btrfs: nuke write_super from comments ext4: nuke pdflush from comments ext4: nuke write_super from comments ext3: nuke write_super from comments Documentation: fix the VM knobs descritpion WRT pdflush Documentation: get rid of write_super vfs: kill write_super and sync_supers ACPI processor: Fix tick_broadcast_mask online/offline regression ACPI: Only count valid srat memory structures ACPI: Untangle a return statement for better readability Linux 3.6-rc1 ... Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-03KVM: x86: update KVM_SAVE_MSRS_BEGIN to correct valueGleb Natapov1-1/+1
When MSR_KVM_PV_EOI_EN was added to msrs_to_save array KVM_SAVE_MSRS_BEGIN was not updated accordingly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-02KVM: x86: apply kvmclock offset to guest wall clock timeBruce Rogers1-0/+4
When a guest migrates to a new host, the system time difference from the previous host is used in the updates to the kvmclock system time visible to the guest, resulting in a continuation of correct kvmclock based guest timekeeping. The wall clock component of the kvmclock provided time is currently not updated with this same time offset. Since the Linux guest caches the wall clock based time, this discrepency is not noticed until the guest is rebooted. After reboot the guest's time calculations are off. This patch adjusts the wall clock by the kvmclock_offset, resulting in correct guest time after a reboot. Cc: Zachary Amsden <zamsden@gmail.com> Signed-off-by: Bruce Rogers <brogers@suse.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-01KVM: fold kvm_pit_timer into kvm_kpit_stateAvi Kivity1-1/+1
One structure nests inside the other, providing no value at all. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-26KVM: Move KVM_IRQ_LINE to arch-generic codeChristoffer Dall1-23/+10
Handle KVM_IRQ_LINE and KVM_IRQ_LINE_STATUS in the generic kvm_vm_ioctl() function and call into kvm_vm_ioctl_irq_line(). This is even more relevant when KVM/ARM also uses this ioctl. Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-20KVM: x86: Fix typos in x86.cGuo Chao1-7/+7
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: x86: remove unnecessary mark_page_dirtyXiao Guangrong1-1/+0
fix: [ 132.474633] 3.5.0-rc1+ #50 Not tainted [ 132.474634] ------------------------------- [ 132.474635] include/linux/kvm_host.h:369 suspicious rcu_dereference_check() usage! [ 132.474636] [ 132.474636] other info that might help us debug this: [ 132.474636] [ 132.474638] [ 132.474638] rcu_scheduler_active = 1, debug_locks = 1 [ 132.474640] 1 lock held by qemu-kvm/2832: [ 132.474657] #0: (&vcpu->mutex){+.+.+.}, at: [<ffffffffa01e1636>] vcpu_load+0x1e/0x91 [kvm] [ 132.474658] [ 132.474658] stack backtrace: [ 132.474660] Pid: 2832, comm: qemu-kvm Not tainted 3.5.0-rc1+ #50 [ 132.474661] Call Trace: [ 132.474665] [<ffffffff81092f40>] lockdep_rcu_suspicious+0xfc/0x105 [ 132.474675] [<ffffffffa01e0c85>] kvm_memslots+0x6d/0x75 [kvm] [ 132.474683] [<ffffffffa01e0ca1>] gfn_to_memslot+0x14/0x4c [kvm] [ 132.474693] [<ffffffffa01e3575>] mark_page_dirty+0x17/0x2a [kvm] [ 132.474706] [<ffffffffa01f21ea>] kvm_arch_vcpu_ioctl+0xbcf/0xc07 [kvm] Actually, we do not write vcpu->arch.time at this time, mark_page_dirty should be removed. Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-18KVM: Separate rmap_pde from kvm_lpage_info->write_countTakuya Yoshikawa1-0/+11
This makes it possible to loop over rmap_pde arrays in the same way as we do over rmap so that we can optimize kvm_handle_hva_range() easily in the following patch. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-12KVM: VMX: Implement PCID/INVPCID for guests with EPTMao, Junjie1-3/+20
This patch handles PCID/INVPCID for guests. Process-context identifiers (PCIDs) are a facility by which a logical processor may cache information for multiple linear-address spaces so that the processor may retain cached information when software switches to a different linear address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual Volume 3A for details. For guests with EPT, the PCID feature is enabled and INVPCID behaves as running natively. For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD. Signed-off-by: Junjie Mao <junjie.mao@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: change ->get_cpuid() accessor to use the x86 semanticsAvi Kivity1-18/+2
Instead of getting an exact leaf, follow the spec and fall back to the last main leaf instead. This lets us easily emulate the cpuid instruction in the emulator. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-25KVM: host side for eoi optimizationMichael S. Tsirkin1-0/+7
Implementation of PV EOI using shared memory. This reduces the number of exits an interrupt causes as much as by half. The idea is simple: there's a bit, per APIC, in guest memory, that tells the guest that it does not need EOI. We set it before injecting an interrupt and clear before injecting a nested one. Guest tests it using a test and clear operation - this is necessary so that host can detect interrupt nesting - and if set, it can skip the EOI MSR. There's a new MSR to set the address of said register in guest memory. Otherwise not much changed: - Guest EOI is not required - Register is tested & ISR is automatically cleared on exit For testing results see description of previous patch 'kvm_para: guest side for eoi avoidance'. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-25KVM: rearrange injection cancelling codeMichael S. Tsirkin1-4/+6
Each time we need to cancel injection we invoke same code (cancel_injection callback). Move it towards the end of function using the familiar goto on error pattern. Will make it easier to do more cleanups for PV EOI. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-25KVM: only sync when attention bits setMichael S. Tsirkin1-1/+2
Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced apic attention bitmask but kvm still syncs lapic unconditionally. As that commit suggested and in anticipation of adding more attention bits, only sync lapic if(apic_attention). Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-19KVM: Use kvm_kvfree() to free memory allocated by kvm_kvzalloc()Takuya Yoshikawa1-1/+1
The following commit did not care about the error handling path: commit c1a7b32a14138f908df52d7c53b5ce3415ec6b50 KVM: Avoid wasting pages for small lpage_info arrays If memory allocation fails, vfree() will be called with the address returned by kzalloc(). This patch fixes this issue. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-06KVM: Cleanup the kvm_print functions and introduce pr_XX wrappersChristoffer Dall1-27/+27
Introduces a couple of print functions, which are essentially wrappers around standard printk functions, with a KVM: prefix. Functions introduced or modified are: - kvm_err(fmt, ...) - kvm_info(fmt, ...) - kvm_debug(fmt, ...) - kvm_pr_unimpl(fmt, ...) - pr_unimpl(vcpu, fmt, ...) -> vcpu_unimpl(vcpu, fmt, ...) Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-05KVM: Avoid wasting pages for small lpage_info arraysTakuya Yoshikawa1-2/+2
lpage_info is created for each large level even when the memory slot is not for RAM. This means that when we add one slot for a PCI device, we end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc(). To make things worse, there is an increasing number of devices which would result in more pages being wasted this way. This patch mitigates this problem by using kvm_kvzalloc(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-05-25Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-126/+154
Pull KVM changes from Avi Kivity: "Changes include additional instruction emulation, page-crossing MMIO, faster dirty logging, preventing the watchdog from killing a stopped guest, module autoload, a new MSI ABI, and some minor optimizations and fixes. Outside x86 we have a small s390 and a very large ppc update. Regarding the new (for kvm) rebaseless workflow, some of the patches that were merged before we switch trees had to be rebased, while others are true pulls. In either case the signoffs should be correct now." Fix up trivial conflicts in Documentation/feature-removal-schedule.txt arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h. I suspect the kvm_para.h resolution ends up doing the "do I have cpuid" check effectively twice (it was done differently in two different commits), but better safe than sorry ;) * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits) KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block KVM: s390: onereg for timer related registers KVM: s390: epoch difference and TOD programmable field KVM: s390: KVM_GET/SET_ONEREG for s390 KVM: s390: add capability indicating COW support KVM: Fix mmu_reload() clash with nested vmx event injection KVM: MMU: Don't use RCU for lockless shadow walking KVM: VMX: Optimize %ds, %es reload KVM: VMX: Fix %ds/%es clobber KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte() KVM: VMX: unlike vmcs on fail path KVM: PPC: Emulator: clean up SPR reads and writes KVM: PPC: Emulator: clean up instruction parsing kvm/powerpc: Add new ioctl to retreive server MMU infos kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler KVM: PPC: Book3S: Enable IRQs during exit handling KVM: PPC: Fix PR KVM on POWER7 bare metal KVM: PPC: Fix stbux emulation KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields ...
2012-05-17KVM: Fix mmu_reload() clash with nested vmx event injectionAvi Kivity1-4/+6
Currently the inject_pending_event() call during guest entry happens after kvm_mmu_reload(). This is for historical reasons - we used to inject_pending_event() in atomic context, while kvm_mmu_reload() needs task context. A problem is that nested vmx can cause the mmu context to be reset, if event injection is intercepted and causes a #VMEXIT instead (the #VMEXIT resets CR0/CR3/CR4). If this happens, we end up with invalid root_hpa, and since kvm_mmu_reload() has already run, no one will fix it and we end up entering the guest this way. Fix by reordering event injection to be before kvm_mmu_reload(). Use ->cancel_injection() to undo if kvm_mmu_reload() fails. https://bugzilla.kernel.org/show_bug.cgi?id=42980 Reported-by: Luke-Jr <luke-jr+linuxbugs@utopios.org> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-05-06KVM: ensure async PF event wakes up vcpu from haltGleb Natapov1-0/+1
If vcpu executes hlt instruction while async PF is waiting to be delivered vcpu can block and deliver async PF only after another even wakes it up. This happens because kvm_check_async_pf_completion() will remove completion event from vcpu->async_pf.done before entering kvm_vcpu_block() and this will make kvm_arch_vcpu_runnable() return false. The solution is to make vcpu runnable when processing completion. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-21kill mm argument of vm_munmap()Al Viro1-1/+1
it's always current->mm Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-21VM: add "vm_mmap()" helper functionLinus Torvalds1-3/+1
This continues the theme started with vm_brk() and vm_munmap(): vm_mmap() does the same thing as do_mmap(), but additionally does the required VM locking. This uninlines (and rewrites it to be clearer) do_mmap(), which sadly duplicates it in mm/mmap.c and mm/nommu.c. But that way we don't have to export our internal do_mmap_pgoff() function. Some day we hopefully don't have to export do_mmap() either, if all modular users can become the simpler vm_mmap() instead. We're actually very close to that already, with the notable exception of the (broken) use in i810, and a couple of stragglers in binfmt_elf. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-21VM: add "vm_munmap()" helper functionLinus Torvalds1-3/+1
Like the vm_brk() function, this is the same as "do_munmap()", except it does the VM locking for the caller. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-20KVM: Fix page-crossing MMIOAvi Kivity1-33/+81
MMIO that are split across a page boundary are currently broken - the code does not expect to be aborted by the exit to userspace for the first MMIO fragment. This patch fixes the problem by generalizing the current code for handling 16-byte MMIOs to handle a number of "fragments", and changes the MMIO code to create those fragments. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-04-08KVM: Switch to srcu-less get_dirty_log()Takuya Yoshikawa1-73/+43
We have seen some problems of the current implementation of get_dirty_log() which uses synchronize_srcu_expedited() for updating dirty bitmaps; e.g. it is noticeable that this sometimes gives us ms order of latency when we use VGA displays. Furthermore the recent discussion on the following thread "srcu: Implement call_srcu()" http://lkml.org/lkml/2012/1/31/211 also motivated us to implement get_dirty_log() without SRCU. This patch achieves this goal without sacrificing the performance of both VGA and live migration: in practice the new code is much faster than the old one unless we have too many dirty pages. Implementation: The key part of the implementation is the use of xchg() operation for clearing dirty bits atomically. Since this allows us to update only BITS_PER_LONG pages at once, we need to iterate over the dirty bitmap until every dirty bit is cleared again for the next call. Although some people may worry about the problem of using the atomic memory instruction many times to the concurrently accessible bitmap, it is usually accessed with mmu_lock held and we rarely see concurrent accesses: so what we need to care about is the pure xchg() overheads. Another point to note is that we do not use for_each_set_bit() to check which ones in each BITS_PER_LONG pages are actually dirty. Instead we simply use __ffs() in a loop. This is much faster than repeatedly call find_next_bit(). Performance: The dirty-log-perf unit test showed nice improvements, some times faster than before, except for some extreme cases; for such cases the speed of getting dirty page information is much faster than we process it in the userspace. For real workloads, both VGA and live migration, we have observed pure improvements: when the guest was reading a file during live migration, we originally saw a few ms of latency, but with the new method the latency was less than 200us. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08KVM: Avoid checking huge page mappings in get_dirty_log()Takuya Yoshikawa1-5/+3
Dropped such mappings when we enabled dirty logging and we will never create new ones until we stop the logging. For this we introduce a new function which can be used to write protect a range of PT level pages: although we do not need to care about a range of pages at this point, the following patch will need this feature to optimize the write protection of many pages. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08KVM: x86: Add ioctl for KVM_KVMCLOCK_CTRLEric B Munson1-0/+22
Now that we have a flag that will tell the guest it was suspended, create an interface for that communication using a KVM ioctl. Signed-off-by: Eric B Munson <emunson@mgebm.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-04-08KVM: Factor out kvm_vcpu_kick to arch-generic codeChristoffer Dall1-14/+2
The kvm_vcpu_kick function performs roughly the same funcitonality on most all architectures, so we shouldn't have separate copies. PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch structure and to accomodate this special need a __KVM_HAVE_ARCH_VCPU_GET_WQ define and accompanying function kvm_arch_vcpu_wq have been defined. For all other architectures this is a generic inline that just returns &vcpu->wq; Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-29Merge branch 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-90/+313
Pull kvm updates from Avi Kivity: "Changes include timekeeping improvements, support for assigning host PCI devices that share interrupt lines, s390 user-controlled guests, a large ppc update, and random fixes." This is with the sign-off's fixed, hopefully next merge window we won't have rebased commits. * 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: Convert intx_mask_lock to spin lock KVM: x86: fix kvm_write_tsc() TSC matching thinko x86: kvmclock: abstract save/restore sched_clock_state KVM: nVMX: Fix erroneous exception bitmap check KVM: Ignore the writes to MSR_K7_HWCR(3) KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask KVM: PMU: add proper support for fixed counter 2 KVM: PMU: Fix raw event check KVM: PMU: warn when pin control is set in eventsel msr KVM: VMX: Fix delayed load of shared MSRs KVM: use correct tlbs dirty type in cmpxchg KVM: Allow host IRQ sharing for assigned PCI 2.3 devices KVM: Ensure all vcpus are consistent with in-kernel irqchip settings KVM: x86 emulator: Allow PM/VM86 switch during task switch KVM: SVM: Fix CPL updates KVM: x86 emulator: VM86 segments must have DPL 3 KVM: x86 emulator: Fix task switch privilege checks arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation KVM: mmu_notifier: Flush TLBs before releasing mmu_lock ...
2012-03-22Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/fpu changes from Ingo Molnar. * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: i387: Split up <asm/i387.h> into exported and internal interfaces i387: Uninline the generic FP helpers that we expose to kernel modules
2012-03-20x86: remove the second argument of k[un]map_atomic()Cong Wang1-4/+4
Acked-by: Avi Kivity <avi@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20KVM: x86: fix kvm_write_tsc() TSC matching thinkoMarcelo Tosatti1-9/+10
kvm_write_tsc() converts from guest TSC to microseconds, not nanoseconds as intended. The result is that the window for matching is 1000 seconds, not 1 second. Microsecond precision is enough for checking whether the TSC write delta is within the heuristic values, so use it instead of nanoseconds. Noted by Avi Kivity. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Ignore the writes to MSR_K7_HWCR(3)Nicolae Mogoreanu1-0/+1
When CPUID Fn8000_0001_EAX reports 0x00100f22 Windows 7 x64 guest tries to set bit 3 in MSRC001_0015 in nt!KiDisableCacheErrataSource and fails. This patch will ignore this step and allow things to move on without having to fake CPUID value. Signed-off-by: Nicolae Mogoreanu <mogoreanu@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Allow host IRQ sharing for assigned PCI 2.3 devicesJan Kiszka1-0/+1
PCI 2.3 allows to generically disable IRQ sources at device level. This enables us to share legacy IRQs of such devices with other host devices when passing them to a guest. The new IRQ sharing feature introduced here is optional, user space has to request it explicitly. Moreover, user space can inform us about its view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the interrupt and signaling it if the guest masked it via the virtualized PCI config space. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Ensure all vcpus are consistent with in-kernel irqchip settingsAvi Kivity1-0/+8
If some vcpus are created before KVM_CREATE_IRQCHIP, then irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading to potential NULL pointer dereferences. Fix by: - ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called - ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP This is somewhat long winded because vcpu->arch.apic is created without kvm->lock held. Based on earlier patch by Michael Ellerman. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: x86 emulator: Allow PM/VM86 switch during task switchKevin Wolf1-0/+6
Task switches can switch between Protected Mode and VM86. The current mode must be updated during the task switch emulation so that the new segment selectors are interpreted correctly. In order to let privilege checks succeed, rflags needs to be updated in the vcpu struct as this causes a CPL update. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: x86 emulator: Fix task switch privilege checksKevin Wolf1-3/+3
Currently, all task switches check privileges against the DPL of the TSS. This is only correct for jmp/call to a TSS. If a task gate is used, the DPL of this take gate is used for the check instead. Exceptions, external interrupts and iret shouldn't perform any check. [avi: kill kvm-kmod remnants] Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Introduce kvm_memory_slot::arch and move lpage_info into itTakuya Yoshikawa1-0/+59
Some members of kvm_memory_slot are not used by every architecture. This patch is the first step to make this difference clear by introducing kvm_memory_slot::arch; lpage_info is moved into it. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Fix write protection race during dirty loggingTakuya Yoshikawa1-6/+5
This patch fixes a race introduced by: commit 95d4c16ce78cb6b7549a09159c409d52ddd18dae KVM: Optimize dirty logging by rmap_write_protect() During protecting pages for dirty logging, other threads may also try to protect a page in mmu_sync_children() or kvm_mmu_get_page(). In such a case, because get_dirty_log releases mmu_lock before flushing TLB's, the following race condition can happen: A (get_dirty_log) B (another thread) lock(mmu_lock) clear pte.w unlock(mmu_lock) lock(mmu_lock) pte.w is already cleared unlock(mmu_lock) skip TLB flush return ... TLB flush Though thread B assumes the page has already been protected when it returns, the remaining TLB entry will break that assumption. This patch fixes this problem by making get_dirty_log hold the mmu_lock until it flushes the TLB's. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Track TSC synchronization in generationsZachary Amsden1-8/+33
This allows us to track the original nanosecond and counter values at each phase of TSC writing by the guest. This gets us perfect offset matching for stable TSC systems, and perfect software computed TSC matching for machines with unstable TSC. Signed-off-by: Zachary Amsden <zamsden@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Dont mark TSC unstable due to S4 suspendZachary Amsden1-5/+88
During a host suspend, TSC may go backwards, which KVM interprets as an unstable TSC. Technically, KVM should not be marking the TSC unstable, which causes the TSC clocksource to go bad, but we need to be adjusting the TSC offsets in such a case. Dealing with this issue is a little tricky as the only place we can reliably do it is before much of the timekeeping infrastructure is up and running. On top of this, we are not in a KVM thread context, so we may not be able to safely access VCPU fields. Instead, we compute our best known hardware offset at power-up and stash it to be applied to all VCPUs when they actually start running. Signed-off-by: Zachary Amsden <zamsden@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-08KVM: Allow adjust_tsc_offset to be in host or guest cyclesMarcelo Tosatti1-1/+1
Redefine the API to take a parameter indicating whether an adjustment is in host or guest cycles. Signed-off-by: Zachary Amsden <zamsden@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>