summaryrefslogtreecommitdiff
path: root/virt/kvm
AgeCommit message (Collapse)AuthorFilesLines
2012-10-24Merge tag 'kvm-3.7-2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-3/+1
Pull kvm fixes from Avi Kivity: "KVM updates for 3.7-rc2" * tag 'kvm-3.7-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT KVM: apic: fix LDR calculation in x2apic mode KVM: MMU: fix release noslot pfn
2012-10-22KVM: MMU: fix release noslot pfnXiao Guangrong1-3/+1
We can not directly call kvm_release_pfn_clean to release the pfn since we can meet noslot pfn which is used to cache mmio info into spte Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-05kvm: replace test_and_set_bit_le() in mark_page_dirty_in_slot() with ↵Takuya Yoshikawa1-2/+1
set_bit_le() Now that we have defined generic set_bit_le() we do not need to use test_and_set_bit_le() for atomically setting a bit. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-04Merge tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds7-250/+525
Pull KVM updates from Avi Kivity: "Highlights of the changes for this release include support for vfio level triggered interrupts, improved big real mode support on older Intels, a streamlines guest page table walker, guest APIC speedups, PIO optimizations, better overcommit handling, and read-only memory." * tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits) KVM: s390: Fix vcpu_load handling in interrupt code KVM: x86: Fix guest debug across vcpu INIT reset KVM: Add resampling irqfds for level triggered interrupts KVM: optimize apic interrupt delivery KVM: MMU: Eliminate pointless temporary 'ac' KVM: MMU: Avoid access/dirty update loop if all is well KVM: MMU: Eliminate eperm temporary KVM: MMU: Optimize is_last_gpte() KVM: MMU: Simplify walk_addr_generic() loop KVM: MMU: Optimize pte permission checks KVM: MMU: Update accessed and dirty bits after guest pagetable walk KVM: MMU: Move gpte_access() out of paging_tmpl.h KVM: MMU: Optimize gpte_access() slightly KVM: MMU: Push clean gpte write protection out of gpte_access() KVM: clarify kvmclock documentation KVM: make processes waiting on vcpu mutex killable KVM: SVM: Make use of asm.h KVM: VMX: Make use of asm.h KVM: VMX: Make lto-friendly KVM: x86: lapic: Clean up find_highest_vector() and count_vectors() ... Conflicts: arch/s390/include/asm/processor.h arch/x86/kvm/i8259.c
2012-10-02Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-1/+1
Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...
2012-09-23KVM: Add resampling irqfds for level triggered interruptsAlex Williamson2-4/+152
To emulate level triggered interrupts, add a resample option to KVM_IRQFD. When specified, a new resamplefd is provided that notifies the user when the irqchip has been resampled by the VM. This may, for instance, indicate an EOI. Also in this mode, posting of an interrupt through an irqfd only asserts the interrupt. On resampling, the interrupt is automatically de-asserted prior to user notification. This enables level triggered interrupts to be posted and re-enabled from vfio with no userspace intervention. All resampling irqfds can make use of a single irq source ID, so we reserve a new one for this interface. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: optimize apic interrupt deliveryGleb Natapov1-1/+6
Most interrupt are delivered to only one vcpu. Use pre-build tables to find interrupt destination instead of looping through all vcpus. In case of logical mode loop only through vcpus in a logical cluster irq is sent to. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-17KVM: make processes waiting on vcpu mutex killableMichael S. Tsirkin1-3/+7
vcpu mutex can be held for unlimited time so taking it with mutex_lock on an ioctl is wrong: one process could be passed a vcpu fd and call this ioctl on the vcpu used by another process, it will then be unkillable until the owner exits. Call mutex_lock_killable instead and return status. Note: mutex_lock_interruptible would be even nicer, but I am not sure all users are prepared to handle EINTR from these ioctls. They might misinterpret it as an error. Cleanup paths expect a vcpu that can't be used by any userspace so this will always succeed - catch bugs by calling BUG_ON. Catch callers that don't check return state by adding __must_check. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-09-06KVM: move postcommit flush to x86, as mmio sptes are x86 specificMarcelo Tosatti1-7/+0
Other arches do not need this. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> v2: fix incorrect deletion of mmio sptes on gpa move (noticed by Takuya) Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-06KVM: perform an invalid memslot step for gpa base changeMarcelo Tosatti1-3/+3
PPC must flush all translations before the new memory slot is visible. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-06KVM: split kvm_arch_flush_shadowMarcelo Tosatti1-4/+4
Introducing kvm_arch_flush_shadow_memslot, to invalidate the translations of a single memory slot. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-27KVM: PPC: book3s: fix build error caused by gfn_to_hva_memslot()Gavin Shan1-6/+0
The build error was caused by that builtin functions are calling the functions implemented in modules. This error was introduced by commit 4d8b81abc4 ("KVM: introduce readonly memslot"). The patch fixes the build error by moving function __gfn_to_hva_memslot() from kvm_main.c to kvm_host.h and making that "inline" so that the builtin function (kvmppc_h_enter) can use that. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-26kvm: Fix nonsense handling of compat ioctlAlan Cox1-3/+4
KVM_SET_SIGNAL_MASK passed a NULL argument leaves the on stack signal sets uninitialized. It then passes them through to kvm_vcpu_ioctl_set_sigmask. We should be passing a NULL in this case not translated garbage. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-22KVM: introduce readonly memslotXiao Guangrong1-18/+78
In current code, if we map a readonly memory space from host to guest and the page is not currently mapped in the host, we will get a fault pfn and async is not allowed, then the vm will crash We introduce readonly memory region to map ROM/ROMD to the guest, read access is happy for readonly memslot, write access on readonly memslot will cause KVM_EXIT_MMIO exit Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-22KVM: introduce KVM_HVA_ERR_BADXiao Guangrong1-12/+1
Then, remove bad_hva and inline kvm_is_error_hva Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-22KVM: use 'writable' as a hint to map writable pfnXiao Guangrong1-1/+23
In current code, we always map writable pfn for the read fault, in order to support readonly memslot, we map writable pfn only if 'writable' is not NULL Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-22KVM: reorganize hva_to_pfnXiao Guangrong1-62/+97
We do too many things in hva_to_pfn, this patch reorganize the code, let it be better readable Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-22KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomicXiao Guangrong1-4/+23
This set of functions is only used to read data from host space, in the later patch, we will only get a readonly hva in gfn_to_hva_read, and the function name is a good hint to let gfn_to_hva_read to pair with kvm_read_hva()/kvm_read_hva_atomic() Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-22KVM: introduce gfn_to_pfn_memslot_atomicXiao Guangrong1-6/+8
It can instead of hva_to_pfn_atomic Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-22KVM: fix missing check for memslot flagsXiao Guangrong1-0/+12
Check flags when memslot is registered from userspace as Avi's suggestion Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-21workqueue: deprecate flush[_delayed]_work_sync()Tejun Heo1-1/+1
flush[_delayed]_work_sync() are now spurious. Mark them deprecated and convert all users to flush[_delayed]_work(). If you're cc'd and wondering what's going on: Now all workqueues are non-reentrant and the regular flushes guarantee that the work item is not pending or running on any CPU on return, so there's no reason to use the sync flushes at all and they're going away. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mattia Dongili <malattia@linux.it> Cc: Kent Yoder <key@linux.vnet.ibm.com> Cc: David Airlie <airlied@linux.ie> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Bryan Wu <bryan.wu@canonical.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-wireless@vger.kernel.org Cc: Anton Vorontsov <cbou@mail.ru> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Avi Kivity <avi@redhat.com>
2012-08-15KVM: x86: drop parameter validation in ioapic/picMichael S. Tsirkin1-18/+19
We validate irq pin number when routing is setup, so code handling illegal irq # in pic and ioapic on each injection is never called. Drop it, replace with BUG_ON to catch out of bounds access bugs. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-06KVM: do not release the error pageXiao Guangrong2-4/+5
After commit a2766325cf9f9, the error page is replaced by the error code, it need not be released anymore [ The patch has been compiling tested for powerpc ] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: do not release the error pfnXiao Guangrong2-7/+8
After commit a2766325cf9f9, the error pfn is replaced by the error code, it need not be released anymore [ The patch has been compiling tested for powerpc ] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: introduce KVM_ERR_PTR_BAD_PAGEXiao Guangrong2-13/+2
It is used to eliminate the overload of function call and cleanup the code Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: inline is_*_pfn functionsXiao Guangrong1-18/+0
These functions are exported and can not inline, move them to kvm_host.h to eliminate the overload of function call Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: introduce KVM_PFN_ERR_BADXiao Guangrong1-6/+1
Then, remove get_bad_pfn Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: introduce KVM_PFN_ERR_HWPOISONXiao Guangrong1-12/+1
Then, get_hwpoison_pfn and is_hwpoison_pfn can be removed Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: introduce KVM_PFN_ERR_FAULTXiao Guangrong1-9/+3
After that, the exported and un-inline function, get_fault_pfn, can be removed Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: iommu: fix releasing unmapped pageXiao Guangrong1-0/+7
There are two bugs: - the 'error page' is forgot to be released [ it is unneeded after commit a2766325cf9f9, for backport, we still do kvm_release_pfn_clean for the error pfn ] - guest pages are always released regardless of the unmapped page (e,g, caused by hwpoison) Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-06KVM: Push rmap into kvm_arch_memory_slotTakuya Yoshikawa1-10/+1
Two reasons: - x86 can integrate rmap and rmap_pde and remove heuristics in __gfn_to_rmap(). - Some architectures do not need rmap. Since rmap is one of the most memory consuming stuff in KVM, ppc'd better restrict the allocation to Book3S HV. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-26KVM: Move KVM_IRQ_LINE to arch-generic codeChristoffer Dall1-0/+23
Handle KVM_IRQ_LINE and KVM_IRQ_LINE_STATUS in the generic kvm_vm_ioctl() function and call into kvm_vm_ioctl_irq_line(). This is even more relevant when KVM/ARM also uses this ioctl. Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-26KVM: remove dummy pagesXiao Guangrong2-74/+50
Currently, kvm allocates some pages and use them as error indicators, it wastes memory and is not good for scalability Base on Avi's suggestion, we use the error codes instead of these pages to indicate the error conditions Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-26KVM: use kvm_release_page_clean to release the pageXiao Guangrong1-2/+2
In kvm_async_pf_wakeup_all, it uses bad_page to generate broadcast wakeup, and uses put_page to release bad_page, the work depends on the fact that bad_page is the normal page. But we will use the error code instead of bad_page, so use kvm_release_page_clean to release the page which will release the error code properly Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-26Merge branch 'queue' into nextAvi Kivity4-29/+72
Merge patches queued during the run-up to the merge window. * queue: (25 commits) KVM: Choose better candidate for directed yield KVM: Note down when cpu relax intercepted or pause loop exited KVM: Add config to support ple or cpu relax optimzation KVM: switch to symbolic name for irq_states size KVM: x86: Fix typos in pmu.c KVM: x86: Fix typos in lapic.c KVM: x86: Fix typos in cpuid.c KVM: x86: Fix typos in emulate.c KVM: x86: Fix typos in x86.c KVM: SVM: Fix typos KVM: VMX: Fix typos KVM: remove the unused parameter of gfn_to_pfn_memslot KVM: remove is_error_hpa KVM: make bad_pfn static to kvm_main.c KVM: using get_fault_pfn to get the fault pfn KVM: MMU: track the refcount when unmap the page KVM: x86: remove unnecessary mark_page_dirty KVM: MMU: Avoid handling same rmap_pde in kvm_handle_hva_range() KVM: MMU: Push trace_kvm_age_page() into kvm_age_rmapp() KVM: MMU: Add memslot parameter to hva handlers ... Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-24Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-43/+47
Pull KVM updates from Avi Kivity: "Highlights include - full big real mode emulation on pre-Westmere Intel hosts (can be disabled with emulate_invalid_guest_state=0) - relatively small ppc and s390 updates - PCID/INVPCID support in guests - EOI avoidance; 3.6 guests should perform better on 3.6 hosts on interrupt intensive workloads) - Lockless write faults during live migration - EPT accessed/dirty bits support for new Intel processors" Fix up conflicts in: - Documentation/virtual/kvm/api.txt: Stupid subchapter numbering, added next to each other. - arch/powerpc/kvm/booke_interrupts.S: PPC asm changes clashing with the KVM fixes - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c: Duplicated commits through the kvm tree and the s390 tree, with subsequent edits in the KVM tree. * tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) KVM: fix race with level interrupts x86, hyper: fix build with !CONFIG_KVM_GUEST Revert "apic: fix kvm build on UP without IOAPIC" KVM guest: switch to apic_set_eoi_write, apic_write apic: add apic_set_eoi_write for PV use KVM: VMX: Implement PCID/INVPCID for guests with EPT KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check KVM: PPC: Critical interrupt emulation support KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests KVM: PPC64: booke: Set interrupt computation mode for 64-bit host KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt KVM: PPC: bookehv64: Add support for std/ld emulation. booke: Added crit/mc exception handler for e500v2 booke/bookehv: Add host crit-watchdog exception support KVM: MMU: document mmu-lock and fast page fault KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint KVM: MMU: trace fast page fault KVM: MMU: fast path of handling guest page fault KVM: MMU: introduce SPTE_MMU_WRITEABLE bit KVM: MMU: fold tlb flush judgement into mmu_spte_update ...
2012-07-23KVM: Choose better candidate for directed yieldRaghavendra K T1-0/+42
Currently, on a large vcpu guests, there is a high probability of yielding to the same vcpu who had recently done a pause-loop exit or cpu relax intercepted. Such a yield can lead to the vcpu spinning again and hence degrade the performance. The patchset keeps track of the pause loop exit/cpu relax interception and gives chance to a vcpu which: (a) Has not done pause loop exit or cpu relax intercepted at all (probably he is preempted lock-holder) (b) Was skipped in last iteration because it did pause loop exit or cpu relax intercepted, and probably has become eligible now (next eligible lock holder) Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # on s390x Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-23KVM: Note down when cpu relax intercepted or pause loop exitedRaghavendra K T1-0/+5
Noting pause loop exited vcpu or cpu relax intercepted helps in filtering right candidate to yield. Wrong selection of vcpu; i.e., a vcpu that just did a pl-exit or cpu relax intercepted may contribute to performance degradation. Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # on s390x Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-23KVM: Add config to support ple or cpu relax optimzationRaghavendra K T1-0/+3
Suggested-by: Avi Kivity <avi@redhat.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # on s390x Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-20KVM: switch to symbolic name for irq_states sizeMichael S. Tsirkin1-2/+2
Use PIC_NUM_PINS instead of hard-coded 16 for pic pins. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: fix race with level interruptsMichael S. Tsirkin3-31/+23
When more than 1 source id is in use for the same GSI, we have the following race related to handling irq_states race: CPU 0 clears bit 0. CPU 0 read irq_state as 0. CPU 1 sets level to 1. CPU 1 calls kvm_ioapic_set_irq(1). CPU 0 calls kvm_ioapic_set_irq(0). Now ioapic thinks the level is 0 but irq_state is not 0. Fix by performing all irq_states bitmap handling under pic/ioapic lock. This also removes the need for atomics with irq_states handling. Reported-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: remove the unused parameter of gfn_to_pfn_memslotXiao Guangrong2-13/+12
The parameter, 'kvm', is not used in gfn_to_pfn_memslot, we can happily remove it Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: make bad_pfn static to kvm_main.cXiao Guangrong1-3/+3
bad_pfn is not used out of kvm_main.c, so mark it static, also move it near hwpoison_pfn and fault_pfn Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-20KVM: using get_fault_pfn to get the fault pfnXiao Guangrong1-9/+4
Using get_fault_pfn to cleanup the code Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-18KVM: Introduce kvm_unmap_hva_range() for ↵Takuya Yoshikawa1-2/+1
kvm_mmu_notifier_invalidate_range_start() When we tested KVM under memory pressure, with THP enabled on the host, we noticed that MMU notifier took a long time to invalidate huge pages. Since the invalidation was done with mmu_lock held, it not only wasted the CPU but also made the host harder to respond. This patch mitigates this by using kvm_handle_hva_range(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Cc: Alexander Graf <agraf@suse.de> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-11KVM: Fix device assignment threaded irq handlerAlex Williamson1-2/+13
The kernel no longer allows us to pass NULL for the hard handler without also specifying IRQF_ONESHOT. IRQF_ONESHOT imposes latency in the exit path that we don't need for MSI interrupts. Long term we'd like to inject these interrupts from the hard handler when possible. In the short term, we can create dummy hard handlers that return us to the previous behavior. Credit to Michael for original patch. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=43328 Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-06KVM: handle last_boosted_vcpu = 0 caseRik van Riel1-1/+1
If last_boosted_vcpu == 0, then we fall through all test cases and may end up with all VCPUs pouncing on vcpu 0. With a large enough guest, this can result in enormous runqueue lock contention, which can prevent vcpu0 from running, leading to a livelock. Changing < to <= makes sure we properly handle that case. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-04KVM: fix fault page leakXiao Guangrong1-0/+1
fault_page is forgot to be freed Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-03KVM: Sanitize KVM_IRQFD flagsAlex Williamson1-0/+3
We only know of one so far. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-03KVM: Pass kvm_irqfd to functionsAlex Williamson2-11/+11
Prune this down to just the struct kvm_irqfd so we can avoid changing function definition for every flag or field we use. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>