summaryrefslogtreecommitdiff
path: root/drivers/kvm/kvm_main.c
AgeCommit message (Collapse)AuthorFilesLines
2007-06-15KVM: Prevent guest fpu state from leaking into the hostAvi Kivity1-0/+22
The lazy fpu changes did not take into account that some vmexit handlers can sleep. Move loading the guest state into the inner loop so that it can be reloaded if necessary, and move loading the host state into vmx_vcpu_put() so it can be performed whenever we relinquish the vcpu. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-21Detach sched.h from mm.hAlexey Dobriyan1-0/+1
First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Add suspend-related notifications for CPU hotplugRafael J. Wysocki1-0/+3
Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-03KVM: Don't require explicit indication of completion of mmio or pioAvi Kivity1-22/+22
It is illegal not to return from a pio or mmio request without completing it, as mmio or pio is an atomic operation. Therefore, we can simplify the userspace interface by avoiding the completion indication. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Remove extraneous guest entry on mmio readAvi Kivity1-3/+13
When emulating an mmio read, we actually emulate twice: once to determine the physical address of the mmio, and, after we've exited to userspace to get the mmio value, we emulate again to place the value in the result register and update any flags. But we don't really need to enter the guest again for that, only to take an immediate vmexit. So, if we detect that we're doing an mmio read, emulate a single instruction before entering the guest again. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: VMX: Properly shadow the CR0 register in the vcpu structAnthony Liguori1-5/+3
Set all of the host mask bits for CR0 so that we can maintain a proper shadow of CR0. This exposes CR0.TS, paving the way for lazy fpu handling. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Allow passing 64-bit values to the emulated read/write APIAvi Kivity1-36/+9
This simplifies the API somewhat (by eliminating the special-case cmpxchg8b on i386). Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Per-vcpu statisticsAvi Kivity1-18/+43
Make the exit statistics per-vcpu instead of global. This gives a 3.5% boost when running one virtual machine per core on my two socket dual core (4 cores total) machine. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: VMX: Avoid unnecessary vcpu_load()/vcpu_put() cyclesYaozu Dong1-0/+2
By checking if a reschedule is needed, we avoid dropping the vcpu. [With changes by me, based on Anthony Liguori's observations] Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Handle guest page faults when emulating mmioAvi Kivity1-1/+3
Usually, guest page faults are detected by the kvm page fault handler, which detects if they are shadow faults, mmio faults, pagetable faults, or normal guest page faults. However, in ceratin circumstances, we can detect a page fault much later. One of these events is the following combination: - A two memory operand instruction (e.g. movsb) is executed. - The first operand is in mmio space (which is the fault reported to kvm) - The second operand is in an ummaped address (e.g. a guest page fault) The Windows 2000 installer does such an access, an promptly hangs. Fix by adding the missing page fault injection on that path. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Use slab caches to allocate mmu data structuresAvi Kivity1-0/+7
Better leak detection, statistics, memory use, speed -- goodness all around. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Initialize cr0 to indicate an fpu is presentAvi Kivity1-0/+1
Solaris panics if it sees a cpu with no fpu, and it seems to rely on this bit. Closes sourceforge bug 1698920. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Add fpu get/set operationsAvi Kivity1-0/+86
These are really helpful when migrating an floating point app to another machine. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Add physical memory aliasing featureAvi Kivity1-3/+86
With this, we can specify that accesses to one physical memory range will be remapped to another. This is useful for the vga window at 0xa0000 which is used as a movable window into the (much larger) framebuffer. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Simply gfn_to_page()Avi Kivity1-20/+25
Mapping a guest page to a host page is a common operation. Currently, one has first to find the memory slot where the page belongs (gfn_to_memslot), then locate the page itself (gfn_to_page()). This is clumsy, and also won't work well with memory aliases. So simplify gfn_to_page() not to require memory slot translation first, and instead do it internally. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Handle writes to MCG_STATUS msrSergey Kiselev1-0/+4
Some older (~2.6.7) kernels write MCG_STATUS register during kernel boot (mce_clear_all() function, called from mce_init()). It's not currently handled by kvm and will cause it to inject a GPF. Following patch adds a "nop" handler for this. Signed-off-by: Sergey Kiselev <sergey.kiselev@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Modify guest segments after potentially switching modesAvi Kivity1-10/+10
The SET_SREGS ioctl modifies both cr0.pe (real mode/protected mode) and guest segment registers. Since segment handling is modified by the mode on Intel procesors, update the segment registers after the mode switch has taken place. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Remove set_cr0_no_modeswitch() arch opAvi Kivity1-1/+1
set_cr0_no_modeswitch() was a hack to avoid corrupting segment registers. As we now cache the protected mode values on entry to real mode, this isn't an issue anymore, and it interferes with reboot (which usually _is_ a modeswitch). Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Avoid guest virtual addresses in string pio userspace interfaceAvi Kivity1-15/+168
The current string pio interface communicates using guest virtual addresses, relying on userspace to translate addresses and to check permissions. This interface cannot fully support guest smp, as the check needs to take into account two pages at one in case an unaligned string transfer straddles a page boundary. Change the interface not to communicate guest addresses at all; instead use a buffer page (mmaped by userspace) and do transfers there. The kernel manages the virtual to physical translation and can perform the checks atomically by taking the appropriate locks. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Future-proof argument-less ioctlsAvi Kivity1-0/+9
Some ioctls ignore their arguments. By requiring them to be zero now, we allow a nonzero value to have some special meaning in the future. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Allow kernel to select size of mmap() bufferAvi Kivity1-1/+7
This allows us to store offsets in the kernel/user kvm_run area, and be sure that userspace has them mapped. As offsets can be outside the kvm_run struct, userspace has no way of knowing how much to mmap. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Add guest mode signal maskAvi Kivity1-0/+41
Allow a special signal mask to be used while executing in guest mode. This allows signals to be used to interrupt a vcpu without requiring signal delivery to a userspace handler, which is quite expensive. Userspace still receives -EINTR and can get the signal via sigwait(). Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Fold kvm_run::exit_type into kvm_run::exit_reasonAvi Kivity1-2/+1
Currently, userspace is told about the nature of the last exit from the guest using two fields, exit_type and exit_reason, where exit_type has just two enumerations (and no need for more). So fold exit_type into exit_reason, reducing the complexity of determining what really happened. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Allow userspace to process hypercalls which have no kernel handlerAvi Kivity1-1/+17
This is useful for paravirtualized graphics devices, for example. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Add method to check for backwards-compatible API extensionsAvi Kivity1-0/+6
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Remove the 'emulated' field from the userspace interfaceAvi Kivity1-5/+0
We no longer emulate single instructions in userspace. Instead, we service mmio or pio requests. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Handle cpuid in the kernel instead of punting to userspaceAvi Kivity1-0/+69
KVM used to handle cpuid by letting userspace decide what values to return to the guest. We now handle cpuid completely in the kernel. We still let userspace decide which values the guest will see by having userspace set up the value table beforehand (this is necessary to allow management software to set the cpu features to the least common denominator, so that live migration can work). The motivation for the change is that kvm kernel code can be impacted by cpuid features, for example the x86 emulator. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Do not communicate to userspace through cpu registers during PIOAvi Kivity1-3/+45
Currently when passing the a PIO emulation request to userspace, we rely on userspace updating %rax (on 'in' instructions) and %rsi/%rdi/%rcx (on string instructions). This (a) requires two extra ioctls for getting and setting the registers and (b) is unfriendly to non-x86 archs, when they get kvm ports. So fix by doing the register fixups in the kernel and passing to userspace only an abstract description of the PIO to be done. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Use a shared page for kernel/user communication when runing a vcpuAvi Kivity1-14/+40
Instead of passing a 'struct kvm_run' back and forth between the kernel and userspace, allocate a page and allow the user to mmap() it. This reduces needless copying and makes the interface expandable by providing lots of free space. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Use own minor numberAvi Kivity1-1/+1
Use the minor number (232) allocated to kvm by lanana. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-05-03KVM: Fix guest register corruption on paravirt hypercallDor Laor1-2/+2
The hypercall code mixes up the ->cache_regs() and ->decache_regs() callbacks, resulting in guest register corruption. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-18KVM: Unset kvm_arch_ops if arch module loading failedAvi Kivity1-1/+3
Otherwise, the core module thinks the arch module is loaded, and won't let you reload it after you've fixed the bug. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Move kvmfs magic number to <linux/magic.h>Andrew Morton1-2/+2
Use the standard magic.h for kvmfs. Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Fix bogus failure in kvm.ko module initializationAvi Kivity1-1/+1
A bogus 'return r' can cause an otherwise successful module load to fail. This both denies users the use of kvm, and it also denies them the use of their machine, as it leaves a filesystem registered with its callbacks pointing into now-freed module memory. Fix by returning a zero like a good module. Thanks to Richard Lucassen <mailinglists@lucassen.org> (?) for reporting the problem and for providing access to a machine which exhibited it. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Remove write access permissions when dirty-page-logging is enabledUri Lublin1-0/+2
Enabling dirty page logging is done using KVM_SET_MEMORY_REGION ioctl. If the memory region already exists, we need to remove write accesses, so writes will be caught, and dirty pages will be logged. Signed-off-by: Uri Lublin <uril@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04kvm: move do_remove_write_access() upUri Lublin1-7/+7
To be called from kvm_vm_ioctl_set_memory_region() Signed-off-by: Uri Lublin <uril@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Fix dirty page log bitmap size/access calculationUri Lublin1-2/+2
Since dirty_bitmap is an unsigned long array, the alignment and size need to take that into account. Signed-off-by: Uri Lublin <uril@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Add missing calls to mark_page_dirty()Uri Lublin1-0/+6
A few places where we modify guest memory fail to call mark_page_dirty(), causing live migration to fail. This adds the missing calls. Signed-off-by: Uri Lublin <uril@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Per-vcpu inodesAvi Kivity1-114/+149
Allocate a distinct inode for every vcpu in a VM. This has the following benefits: - the filp cachelines are no longer bounced when f_count is incremented on every ioctl() - the API and internal code are distinctly clearer; for example, on the KVM_GET_REGS ioctl, there is no need to copy the vcpu number from userspace and then copy the registers back; the vcpu identity is derived from the fd used to make the call Right now the performance benefits are completely theoretical since (a) we don't support more than one vcpu per VM and (b) virtualization hardware inefficiencies completely everwhelm any cacheline bouncing effects. But both of these will change, and we need to prepare the API today. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Move kvm_vm_ioctl_create_vcpu() aroundAvi Kivity1-51/+51
In preparation of some hacking. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Rename some kvm_dev_ioctl_*() functions to kvm_vm_ioctl_*()Avi Kivity1-24/+24
This reflects the changed scope, from device-wide to single vm (previously every device open created a virtual machine). Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Create an inode per virtual machineAvi Kivity1-41/+171
This avoids having filp->f_op and the corresponding inode->i_fop different, which is a little unorthodox. The ioctl list is split into two: global kvm ioctls and per-vm ioctls. A new ioctl, KVM_CREATE_VM, is used to create VMs and return the VM fd. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Add internal filesystem for generating inodesAvi Kivity1-1/+32
The kvmfs inodes will represent virtual machines and vcpus, as necessary, reducing cacheline bouncing due to inodes and filps being shared. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: More 0 -> NULL conversionsAvi Kivity1-2/+2
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Wire up hypercall handlers to a central arch-independent locationAvi Kivity1-0/+36
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: add MSR based hypercall APIIngo Molnar1-0/+73
This adds a special MSR based hypercall API to KVM. This is to be used by paravirtual kernels and virtual drivers. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Use page_private()/set_page_private() apisMarkus Rechberger1-1/+1
Besides using an established api, this allows using kvm in older kernels. Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: CosmeticsAvi Kivity1-13/+8
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-03-04KVM: Move virtualization deactivation from CPU_DEAD state to CPU_DOWN_PREPAREJeremy Katz1-2/+6
This gives it more chances of surviving suspend. Signed-off-by: Jeremy Katz <katzj@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-02-12[PATCH] KVM: Host suspend/resume supportAvi Kivity1-1/+40
Add the necessary callbacks to suspend and resume a host running kvm. This is just a repeat of the cpu hotplug/unplug work. Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>