path: root/arch/powerpc/kvm/book3s_hv.c
2014-12-17  KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions  [Suresh E. Warrier, 1 file, -0/+19]
This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
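For readers unfamiliar with the tracing machinery, a tracepoint in this new kvm_hv subsystem would be declared roughly as below. This is a hedged sketch using the standard TRACE_EVENT() macro; the event name and fields are illustrative, not the exact ones added by the patch.

    #undef TRACE_SYSTEM
    #define TRACE_SYSTEM kvm_hv

    /* Fires on every guest exit; visible under
     * /sys/kernel/debug/tracing/events/kvm_hv/ once registered. */
    TRACE_EVENT(kvm_hv_guest_exit,
            TP_PROTO(struct kvm_vcpu *vcpu),
            TP_ARGS(vcpu),

            TP_STRUCT__entry(
                    __field(int,           vcpu_id)
                    __field(int,           trap)
                    __field(unsigned long, pc)
            ),

            TP_fast_assign(
                    __entry->vcpu_id = vcpu->vcpu_id;
                    __entry->trap    = vcpu->arch.trap;
                    __entry->pc      = kvmppc_get_pc(vcpu);
            ),

            TP_printk("VCPU %d: trap=%d pc=0x%lx",
                      __entry->vcpu_id, __entry->trap, __entry->pc)
    );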
2014-12-17  KVM: PPC: Book3S HV: Simplify locking around stolen time calculations  [Paul Mackerras, 1 file, -29/+31]
Currently the calculations of stolen time for PPC Book3S HV guests use fields in both the vcpu struct and the kvmppc_vcore struct. The fields in the kvmppc_vcore struct are protected by the vcpu->arch.tbacct_lock of the vcpu that has taken responsibility for running the virtual core. This works correctly but confuses lockdep, because it sees that the code takes the tbacct_lock for one vcpu in kvmppc_remove_runnable() and then takes another vcpu's tbacct_lock in vcore_stolen_time(), and it thinks there is a possibility of deadlock, causing it to print reports like this:

    =============================================
    [ INFO: possible recursive locking detected ]
    3.18.0-rc7-kvm-00016-g8db4bc6 #89 Not tainted
    ---------------------------------------------
    qemu-system-ppc/6188 is trying to acquire lock:
     (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb1fe8>] .vcore_stolen_time+0x48/0xd0 [kvm_hv]

    but task is already holding lock:
     (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]

    other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(&(&vcpu->arch.tbacct_lock)->rlock);
      lock(&(&vcpu->arch.tbacct_lock)->rlock);

     *** DEADLOCK ***

     May be due to missing lock nesting notation

    3 locks held by qemu-system-ppc/6188:
     #0: (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f98>] .vcpu_load+0x28/0xe0 [kvm]
     #1: (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecb41b0>] .kvmppc_vcpu_run_hv+0x530/0x1530 [kvm_hv]
     #2: (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]

    stack backtrace:
    CPU: 40 PID: 6188 Comm: qemu-system-ppc Not tainted 3.18.0-rc7-kvm-00016-g8db4bc6 #89
    Call Trace:
    [c000000b2754f3f0] [c000000000b31b6c] .dump_stack+0x88/0xb4 (unreliable)
    [c000000b2754f470] [c0000000000faeb8] .__lock_acquire+0x1878/0x2190
    [c000000b2754f600] [c0000000000fbf0c] .lock_acquire+0xcc/0x1a0
    [c000000b2754f6d0] [c000000000b2954c] ._raw_spin_lock_irq+0x4c/0x70
    [c000000b2754f760] [d00000000ecb1fe8] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
    [c000000b2754f7f0] [d00000000ecb25b4] .kvmppc_remove_runnable.part.3+0x44/0xd0 [kvm_hv]
    [c000000b2754f880] [d00000000ecb43ec] .kvmppc_vcpu_run_hv+0x76c/0x1530 [kvm_hv]
    [c000000b2754f9f0] [d00000000eb9f46c] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
    [c000000b2754fa60] [d00000000eb9c9a4] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
    [c000000b2754faf0] [d00000000eb94538] .kvm_vcpu_ioctl+0x498/0x760 [kvm]
    [c000000b2754fcb0] [c000000000267eb4] .do_vfs_ioctl+0x444/0x770
    [c000000b2754fd90] [c0000000002682a4] .SyS_ioctl+0xc4/0xe0
    [c000000b2754fe30] [c0000000000092e4] syscall_exit+0x0/0x98

In order to make the locking easier to analyse, we change the code to use a spinlock in the kvmppc_vcore struct to protect the stolen_tb and preempt_tb fields. This lock needs to be an irq-safe lock since it is used in the kvmppc_core_vcpu_load_hv() and kvmppc_core_vcpu_put_hv() functions, which are called with the scheduler rq lock held, which is itself an irq-safe lock.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
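The shape of the change, as a sketch: the stolen-time fields move into the vcore next to a dedicated irq-safe spinlock, so lockdep sees one lock class instead of one vcpu taking another vcpu's tbacct_lock. Field names here follow the description above but are not guaranteed to match the patch exactly.

    struct kvmppc_vcore {
            /* ... */
            spinlock_t stoltb_lock;  /* protects stolen_tb and preempt_tb */
            u64 stolen_tb;
            u64 preempt_tb;
    };

    static u64 vcore_stolen_time(struct kvmppc_vcore *vc, u64 now)
    {
            u64 p;
            unsigned long flags;

            /* irq-safe: callers may hold the scheduler rq lock */
            spin_lock_irqsave(&vc->stoltb_lock, flags);
            p = vc->stolen_tb;
            if (vc->vcore_state != VCORE_INACTIVE && vc->preempt_tb != TB_NIL)
                    p += now - vc->preempt_tb;
            spin_unlock_irqrestore(&vc->stoltb_lock, flags);
            return p;
    }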
2014-12-15  KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked  [Suresh E. Warrier, 1 file, -0/+20]
The kvmppc_vcore_blocked() code does not check for the wait condition after putting the process on the wait queue. This means that it is possible for an external interrupt to become pending, but the vcpu to remain asleep until the next decrementer interrupt. The fix is to make one last check for pending exceptions and ceded state before calling schedule(). Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
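This is the classic sleep pattern: after prepare_to_wait() puts the task on the wait queue, re-check the wake condition before schedule() so a wakeup racing with going to sleep is not lost. A sketch of the fixed flow; kvmppc_vcore_should_sleep() is a hypothetical stand-in for the actual checks of pending exceptions and ceded state:

    static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
    {
            DEFINE_WAIT(wait);

            prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);

            /* One last check after we are on the wait queue */
            if (!kvmppc_vcore_should_sleep(vc)) {   /* hypothetical helper */
                    finish_wait(&vc->wq, &wait);
                    return;
            }

            vc->vcore_state = VCORE_SLEEPING;
            spin_unlock(&vc->lock);
            schedule();
            finish_wait(&vc->wq, &wait);
            spin_lock(&vc->lock);
            vc->vcore_state = VCORE_INACTIVE;
    }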
2014-12-15  KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI  [Mahesh Salgaonkar, 1 file, -0/+2]
When we get an HMI (hypervisor maintenance interrupt) while in a guest, we see that the guest enters a paused state. The reason is that kvmppc_handle_exit_hv() falls through the default path and returns to the host instead of resuming the guest, which causes the guest to enter the paused state. HMI is a hypervisor-only interrupt and it is safe to resume the guest, since the host has already handled it. This patch adds a switch case to resume the guest. Without this patch we see the guest entering the paused state with the following console messages:

    [ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered]
    [ 3003.329356]  Error detail: Timer facility experienced an error
    [ 3003.329359]  HMER: 0840000000000000
    [ 3003.329360]  TFMR: 4a12000980a84000
    [ 3003.329366] vcpu c0000007c35094c0 (40):
    [ 3003.329368] pc  = c0000000000c2ba0  msr = 8000000000009032  trap = e60
    [ 3003.329370] r 0 = c00000000021ddc0  r16 = 0000000000000046
    [ 3003.329372] r 1 = c00000007a02bbd0  r17 = 00003ffff27d5d98
    [ 3003.329375] r 2 = c0000000010980b8  r18 = 00001fffffc9a0b0
    [ 3003.329377] r 3 = c00000000142d6b8  r19 = c00000000142d6b8
    [ 3003.329379] r 4 = 0000000000000002  r20 = 0000000000000000
    [ 3003.329381] r 5 = c00000000524a110  r21 = 0000000000000000
    [ 3003.329383] r 6 = 0000000000000001  r22 = 0000000000000000
    [ 3003.329386] r 7 = 0000000000000000  r23 = c00000000524a110
    [ 3003.329388] r 8 = 0000000000000000  r24 = 0000000000000001
    [ 3003.329391] r 9 = 0000000000000001  r25 = c00000007c31da38
    [ 3003.329393] r10 = c0000000014280b8  r26 = 0000000000000002
    [ 3003.329395] r11 = 746f6f6c2f68656c  r27 = c00000000524a110
    [ 3003.329397] r12 = 0000000028004484  r28 = c00000007c31da38
    [ 3003.329399] r13 = c00000000fe01400  r29 = 0000000000000002
    [ 3003.329401] r14 = 0000000000000046  r30 = c000000003011e00
    [ 3003.329403] r15 = ffffffffffffffba  r31 = 0000000000000002
    [ 3003.329404] ctr = c00000000041a670  lr  = c000000000272520
    [ 3003.329405] srr0 = c00000000007e8d8 srr1 = 9000000000001002
    [ 3003.329406] sprg0 = 0000000000000000 sprg1 = c00000000fe01400
    [ 3003.329407] sprg2 = c00000000fe01400 sprg3 = 0000000000000005
    [ 3003.329408] cr = 48004482  xer = 2000000000000000  dsisr = 42000000
    [ 3003.329409] dar = 0000010015020048
    [ 3003.329410] fault dar = 0000010015020048 dsisr = 42000000
    [ 3003.329411] SLB (8 entries):
    [ 3003.329412]   ESID = c000000008000000 VSID = 40016e7779000510
    [ 3003.329413]   ESID = d000000008000001 VSID = 400142add1000510
    [ 3003.329414]   ESID = f000000008000004 VSID = 4000eb1a81000510
    [ 3003.329415]   ESID = 00001f000800000b VSID = 40004fda0a000d90
    [ 3003.329416]   ESID = 00003f000800000c VSID = 400039f536000d90
    [ 3003.329417]   ESID = 000000001800000d VSID = 0001251b35150d90
    [ 3003.329417]   ESID = 000001000800000e VSID = 4001e46090000d90
    [ 3003.329418]   ESID = d000080008000019 VSID = 40013d349c000400
    [ 3003.329419] lpcr = c048800001847001 sdr1 = 0000001b19000006 last_inst = ffffffff
    [ 3003.329421] trap=0xe60 | pc=0xc0000000000c2ba0 | msr=0x8000000000009032
    [ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered]
    [ 3003.329526]  Error detail: Timer facility experienced an error
    [ 3003.329527]  HMER: 0840000000000000
    [ 3003.329527]  TFMR: 4a12000980a94000
    [ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered]
    [ 3006.359792]  Error detail: Timer facility experienced an error
    [ 3006.359795]  HMER: 0840000000000000
    [ 3006.359797]  TFMR: 4a12000980a84000

     Id    Name      State
    ----------------------------------------------------
     2     guest2    running
     3     guest3    paused
     4     guest4    running

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
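The essence of the two-line fix in kvmppc_handle_exit_hv(), sketched: treat the HMI as already handled by the host and re-enter the guest, rather than falling through to the default exit-to-host path.

    switch (vcpu->arch.trap) {
    /* ... */
    case BOOK3S_INTERRUPT_HMI:
            /* The host has already recovered from the HMI; just re-enter. */
            r = RESUME_GUEST;
            break;
    /* ... */
    }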
2014-09-22  powerpc/kvm: support to handle sw breakpoint  [Madhavan Srinivasan, 1 file, -4/+37]
This patch adds kernel-side support for software breakpoints. The design is that, by using an illegal instruction, we trap to the hypervisor via the Emulation Assistance interrupt, where we check for the illegal instruction and accordingly return to the host or the guest. The patch also adds support for software breakpoints in PR KVM.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
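The Emulation Assistance path then looks roughly like the sketch below: if the trapped instruction is the reserved breakpoint pattern, exit to userspace as a debug event; otherwise inject a program interrupt into the guest. Names and details here are approximate, not the exact patch.

    case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
            if (vcpu->arch.last_inst == KVMPPC_INST_SW_BREAKPOINT) {
                    /* Debugger-inserted breakpoint: hand control to userspace */
                    run->exit_reason = KVM_EXIT_DEBUG;
                    run->debug.arch.address = kvmppc_get_pc(vcpu);
                    r = RESUME_HOST;
            } else {
                    /* Genuinely illegal instruction: reflect it to the guest */
                    kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
                    r = RESUME_GUEST;
            }
            break;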
2014-09-22  KVM: PPC: Book3S HV: Only accept host PVR value for guest PVR  [Paul Mackerras, 1 file, -1/+3]
Since the guest can read the machine's PVR (Processor Version Register) directly and see the real value, we should disallow userspace from setting any value for the guest's PVR other than the real host value. Therefore this makes kvm_arch_vcpu_set_sregs_hv() check the supplied PVR value and return an error if it is different from the host value, which has been put into vcpu->arch.pvr at vcpu creation time. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
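The check itself is tiny; a sketch, assuming vcpu->arch.pvr was filled with the host PVR at vcpu creation as described:

    static int kvm_arch_vcpu_ioctl_set_sregs_hv(struct kvm_vcpu *vcpu,
                                                struct kvm_sregs *sregs)
    {
            /* Only accept the PVR the guest will observe anyway: the host's */
            if (sregs->pvr != vcpu->arch.pvr)
                    return -EINVAL;

            /* ... copy the remaining SLB/segment state as before ... */
            return 0;
    }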
2014-09-22  KVM: PPC: Book3S HV: Increase timeout for grabbing secondary threads  [Paul Mackerras, 1 file, -1/+1]
Occasional failures have been seen with split-core mode and migration where the message "KVM: couldn't grab cpu" appears. This increases the length of time that we wait from 1ms to 10ms, which seems to work around the issue. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28  KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page  [Aneesh Kumar K.V, 1 file, -6/+0]
When calculating the lower bits of AVA field, use the shift count based on the base page size. Also add the missing segment size and remove stale comment. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28  Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8  [Stewart Smith, 1 file, -1/+56]
The POWER8 processor has a Micro Partition Prefetch Engine, which is a fancy way of saying "a way to store and load the contents of the L2, or the L2 plus the MRU way of the L3, cache". We initiate storing of the log (list of addresses) using the logmpp instruction and start restore by writing to an SPR.

The logmpp instruction takes parameters in a single 64-bit register:
- starting address of the table in which to store the log of L2/L2+L3 cache contents
  - 32kb for L2
  - 128kb for L2+L3
  - aligned relative to the maximum size of the table (32kb or 128kb)
- log control (no-op, L2 only, L2 and L3, abort logout)

We should abort any ongoing logging before initiating a new one.

To initiate restore, we write to the MPPR SPR. The format of what to write to the SPR is similar to the logmpp instruction parameter:
- starting address of the table to read from (same alignment requirements)
- table size (no data, until end of table)
- prefetch rate (from fastest possible to slower, about every 8, 16, 24 or 32 cycles)

The idea behind loading and storing the contents of the L2/L3 cache is to reduce memory latency in a system that is frequently swapping vcores on a physical CPU.

The best-case scenario for doing this is when some vcores are running very cache-heavy workloads. The worst case is when they have about zero cache hits, so we just generate needless memory operations.

This implementation just does L2 store/load. In my benchmarks this proves to be useful.

Benchmark 1:
- 16-core POWER8
- 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
- no split core/SMT
- two guests running the sysbench memory test: sysbench --test=memory --num-threads=8 run
- one guest running apache bench (of the default HTML page): ab -n 490000 -c 400 http://localhost/

This benchmark aims to measure the performance of a real-world application (apache) where the other guests are cache-hot with their own workloads. The sysbench memory benchmark does pointer-sized writes to a (small) memory buffer in a loop. In this benchmark, with this patch, I can see an improvement both in requests per second (~5%) and in mean and median response times (again, about 5%). The spread of minimum and maximum response times was largely unchanged.

Benchmark 2:
- same VM config as benchmark 1
- all three guests running the sysbench memory benchmark

This benchmark aims to see whether there is a positive or negative effect on this cache-heavy workload. Due to the nature of the benchmark (stores), we may not see a difference in raw performance, but rather, hopefully, an improvement in consistency of performance (when a vcore is switched in, it doesn't have to wait many times for cachelines to be pulled in). The results of this benchmark are indeed improvements in consistency of performance rather than in performance itself. With this patch, the few outliers in duration go away and we get more consistent performance in each guest.

Benchmark 3:
- same 3 guests and CPU configuration as benchmarks 1 and 2
- two idle guests
- 1 guest running the STREAM benchmark

This scenario also saw a performance improvement with this patch. On the Copy and Scale workloads from STREAM, I got a 5-6% improvement with this patch. For Add and Triad, it was around 10% (or more).

Benchmark 4:
- same 3 guests as the previous benchmarks
- two guests running sysbench --memory, distinctly different cache-heavy workloads
- one guest running the STREAM benchmark

Similar improvements to benchmark 3.

Benchmark 5:
- 1 guest, 8 VCPUs, Ubuntu 14.04
- host configured with split core (SMT8, subcores-per-core=4)
- STREAM benchmark

In this benchmark, we see a 10-20% performance improvement across the board of the STREAM benchmark results with this patch.

Based on preliminary investigation and microbenchmarks by Prerna Saxena <prerna@linux.vnet.ibm.com>

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28  Split out struct kvmppc_vcore creation to separate function  [Stewart Smith, 1 file, -10/+21]
No code changes, just split it out to a function so that, with the addition of micro partition prefetch buffer allocation (in a subsequent patch), the code looks neater and doesn't require excessive indentation. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28  KVM: PPC: Book3S: Fix LPCR one_reg interface  [Alexey Kardashevskiy, 1 file, -2/+11]
Unfortunately, the LPCR got defined as a 32-bit register in the one_reg interface. This is unfortunate because KVM allows userspace to control the DPFD (default prefetch depth) field, which is in the upper 32 bits. The result is that DPFD always get set to 0, which reduces performance in the guest. We can't just change KVM_REG_PPC_LPCR to be a 64-bit register ID, since that would break existing userspace binaries. Instead we define a new KVM_REG_PPC_LPCR_64 id which is 64-bit. Userspace can still use the old KVM_REG_PPC_LPCR id, but it now only modifies those fields in the bottom 32 bits that userspace can modify (ILE, TC and AIL). If userspace uses the new KVM_REG_PPC_LPCR_64 id, it can modify DPFD as well. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: stable@vger.kernel.org Signed-off-by: Alexander Graf <agraf@suse.de>
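From the userspace side, using the new ID looks roughly like this sketch (error handling omitted); KVM_SET_ONE_REG takes a register ID plus a pointer to the value:

    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <stdint.h>

    /* Set the full 64-bit LPCR, including DPFD in the upper half */
    static int set_lpcr64(int vcpu_fd, uint64_t lpcr)
    {
            struct kvm_one_reg reg = {
                    .id   = KVM_REG_PPC_LPCR_64,    /* the new 64-bit ID */
                    .addr = (uintptr_t)&lpcr,
            };
            return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
    }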
2014-07-28  KVM: PPC: Book3S HV: Access guest VPA in BE  [Alexander Graf, 1 file, -11/+11]
There are a few shared data structures between the host and the guest. Most of them get registered through the VPA interface. These data structures are defined to always be in big endian byte order, so let's make sure we always access them in big endian. Signed-off-by: Alexander Graf <agraf@suse.de>
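The flavor of the conversion: each VPA field access gains an explicit byte swap instead of a plain load or store. An illustrative example in the style of the patch (the field shown is the dispatch accounting counter in the lppaca); a sketch, not the exact diff:

    static void update_vpa_stolen_time(struct kvm_vcpu *vcpu, u64 stolen)
    {
            struct lppaca *vpa = vcpu->arch.vpa.pinned_addr;

            if (vpa)
                    /* read-modify-write entirely in big-endian representation */
                    vpa->enqueue_dispatch_tb =
                            cpu_to_be64(be64_to_cpu(vpa->enqueue_dispatch_tb) +
                                        stolen);
    }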
2014-07-28  KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling  [Michael Neuling, 1 file, -1/+51]
This adds support for the H_SET_MODE hcall. This hcall is a multiplexer that has several functions, some of which are called rarely, and some which are potentially called very frequently. Here we add support for the functions that set the debug registers CIABR (Completed Instruction Address Breakpoint Register) and DAWR/DAWRX (Data Address Watchpoint Register and eXtension), since they could be updated by the guest as often as every context switch. This also adds a kvmppc_power8_compatible() function to test to see if a guest is compatible with POWER8 or not. The CIABR and DAWR/X only exist on POWER8. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
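A skeleton of the multiplexer, under the assumption that it follows the PAPR resource codes; the real handler does more validation (flags, reserved values) than shown here:

    static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
                                 unsigned long resource, unsigned long value1,
                                 unsigned long value2)
    {
            switch (resource) {
            case H_SET_MODE_RESOURCE_SET_CIABR:
                    if (!kvmppc_power8_compatible(vcpu))
                            return H_P2;    /* CIABR only exists on POWER8 */
                    /* ... validate value1, then ... */
                    vcpu->arch.ciabr = value1;
                    return H_SUCCESS;
            case H_SET_MODE_RESOURCE_SET_DAWR:
                    if (!kvmppc_power8_compatible(vcpu))
                            return H_P2;
                    vcpu->arch.dawr  = value1;
                    vcpu->arch.dawrx = value2;
                    return H_SUCCESS;
            default:
                    /* Anything else still goes out to userspace */
                    return H_TOO_HARD;
            }
    }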
2014-07-28  KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled  [Paul Mackerras, 1 file, -2/+29]
This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL capability is used to enable or disable in-kernel handling of an hcall, the hcall is actually implemented by the kernel. If not, an EINVAL error is returned. This also checks the default-enabled list of hcalls and prints a warning if any hcall there is not actually implemented. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28  KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling  [Paul Mackerras, 1 file, -0/+51]
This provides a way for userspace to control which sPAPR hcalls get handled in the kernel. Each hcall can be individually enabled or disabled for in-kernel handling, except for H_RTAS. The exception for H_RTAS is because userspace can already control whether individual RTAS functions are handled in-kernel or not via the KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for H_RTAS is out of the normal sequence of hcall numbers. Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM. The args field of the struct kvm_enable_cap specifies the hcall number in args[0] and the enable/disable flag in args[1]; 0 means disable in-kernel handling (so that the hcall will always cause an exit to userspace) and 1 means enable. Enabling or disabling in-kernel handling of an hcall is effective across the whole VM. The ability for KVM_ENABLE_CAP to be used on a VM file descriptor on PowerPC is new, added by this commit. The KVM_CAP_ENABLE_CAP_VM capability advertises that this ability exists. When a VM is created, an initial set of hcalls are enabled for in-kernel handling. The set that is enabled is the set that have an in-kernel implementation at this point. Any new hcall implementations from this point onwards should not be added to the default set without a good reason. No distinction is made between real-mode and virtual-mode hcall implementations; the one setting controls them both. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
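From userspace, enabling or disabling one hcall then looks like this sketch (per the args layout described above; error handling omitted):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    static int set_hcall_enabled(int vm_fd, unsigned long hcall, int enable)
    {
            struct kvm_enable_cap cap = {
                    .cap     = KVM_CAP_PPC_ENABLE_HCALL,
                    .args[0] = hcall,       /* e.g. H_CEDE */
                    .args[1] = enable,      /* 0 = always exit to userspace */
            };
            /* Note: issued on the VM fd, not a vcpu fd */
            return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    }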
2014-07-28  KVM: PPC: BOOK3S: PR: Emulate instruction counter  [Aneesh Kumar K.V, 1 file, -6/+0]
Writing to the IC (instruction counter) is not allowed in privileged mode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28  KVM: PPC: BOOK3S: PR: Emulate virtual timebase register  [Aneesh Kumar K.V, 1 file, -6/+0]
The virtual timebase (VTB) register is a per-VM, per-vCPU register that needs to be saved and restored on VM exit and entry. Writing to VTB is not allowed in privileged mode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: fix compile error] Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-11  Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc  [Linus Torvalds, 1 file, -14/+20]
Pull powerpc updates from Ben Herrenschmidt:
 "Here is the bulk of the powerpc changes for this merge window. It got a bit delayed in part because I wasn't paying attention, and in part because I discovered I had a core PCI change without a PCI maintainer ack in it. Bjorn eventually agreed it was ok to merge it though we'll probably improve it later and I didn't want to rebase to add his ack.

  There is going to be a bit more next week, essentially fixes that I still want to sort through and test.

  The biggest item this time is the support to build the ppc64 LE kernel with our new v2 ABI. We previously supported v2 userspace but the kernel itself was a tougher nut to crack. This is now sorted mostly thanks to Anton and Rusty.

  We also have a fairly big series from Cedric that add support for 64-bit LE zImage boot wrapper. This was made harder by the fact that traditionally our zImage wrapper was always 32-bit, but our new LE toolchains don't really support 32-bit anymore (it's somewhat there but not really "supported") so we didn't want to rely on it. This meant more churn that just endian fixes.

  This brings some more LE bits as well, such as the ability to run in LE mode without a hypervisor (ie. under OPAL firmware) by doing the right OPAL call to reinitialize the CPU to take HV interrupts in the right mode and the usual pile of endian fixes.

  There's another series from Gavin adding EEH improvements (one day we *will* have a release with less than 20 EEH patches, I promise!).

  Another highlight is the support for the "Split core" functionality on P8 by Michael. This allows a P8 core to be split into "sub cores" of 4 threads which allows the subcores to run different guests under KVM (the HW still doesn't support a partition per thread).

  And then the usual misc bits and fixes ..."

[ Further delayed by gmail deciding that BenH is a dirty spammer. Google knows. ]

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (155 commits)
  powerpc/powernv: Add missing include to LPC code
  selftests/powerpc: Test the THP bug we fixed in the previous commit
  powerpc/mm: Check paca psize is up to date for huge mappings
  powerpc/powernv: Pass buffer size to OPAL validate flash call
  powerpc/pseries: hcall functions are exported to modules, need _GLOBAL_TOC()
  powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC()
  powerpc/powernv: Set memory_block_size_bytes to 256MB
  powerpc: Allow ppc_md platform hook to override memory_block_size_bytes
  powerpc/powernv: Fix endian issues in memory error handling code
  powerpc/eeh: Skip eeh sysfs when eeh is disabled
  powerpc: 64bit sendfile is capped at 2GB
  powerpc/powernv: Provide debugfs access to the LPC bus via OPAL
  powerpc/serial: Use saner flags when creating legacy ports
  powerpc: Add cpu family documentation
  powerpc/xmon: Fix up xmon format strings
  powerpc/powernv: Add calls to support little endian host
  powerpc: Document sysfs DSCR interface
  powerpc: Fix regression of per-CPU DSCR setting
  powerpc: Split __SYSFS_SPRSETUP macro
  arch: powerpc/fadump: Cleaning up inconsistent NULL checks
  ...
2014-05-30  KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest  [Aneesh Kumar K.V, 1 file, -0/+7]
On recent IBM Power CPUs, while the hashed page table is looked up using the page size from the segmentation hardware (i.e. the SLB), it is possible to have the HPT entry indicate a larger page size. Thus for example it is possible to put a 16MB page in a 64kB segment, but since the hash lookup is done using a 64kB page size, it may be necessary to put multiple entries in the HPT for a single 16MB page. This capability is called mixed page-size segment (MPSS). With MPSS, there are two relevant page sizes: the base page size, which is the size used in searching the HPT, and the actual page size, which is the size indicated in the HPT entry. [ Note that the actual page size is always >= base page size ]. We use "ibm,segment-page-sizes" device tree node to advertise the MPSS support to PAPR guest. The penc encoding indicates whether we support a specific combination of base page size and actual page size in the same segment. We also use the penc value in the LP encoding of HPTE entry. This patch exposes MPSS support to KVM guest by advertising the feature via "ibm,segment-page-sizes". It also adds the necessary changes to decode the base page size and the actual page size correctly from the HPTE entry. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30  KVM: PPC: Book3S PR: Expose EBB registers  [Alexander Graf, 1 file, -18/+0]
POWER8 introduces a new facility called the "Event Based Branch" facility. It consists of a few registers that indicate where a guest should branch to when a defined event occurs while it's in PR mode. We don't want to really enable EBB, as it would create a big mess with !PR guest mode while the hardware is in PR, and we don't really emulate the PMU anyway. So instead, let's just leave it at emulation of all its registers. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30  KVM: PPC: Book3S PR: Expose TAR facility to guest  [Alexander Graf, 1 file, -6/+0]
POWER8 implements a new register called TAR. This register has to be enabled in FSCR and then from KVM's point of view is mere storage. This patch enables the guest to use TAR. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30  KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR  [Alexander Graf, 1 file, -6/+0]
POWER8 introduced a new interrupt type called "Facility unavailable interrupt" which contains its status message in a new register called FSCR. Handle these exits and try to emulate instructions for unhandled facilities. Follow-on patches enable KVM to expose specific facilities into the guest. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30  KVM: PPC: Make shared struct aka magic page guest endian  [Alexander Graf, 1 file, -0/+11]
The shared (magic) page is a data structure that contains often used supervisor privileged SPRs accessible via memory to the user to reduce the number of exits we have to take to read/write them. When we actually share this structure with the guest we have to maintain it in guest endianness, because some of the patch tricks only work with native endian load/store operations. Since we only share the structure with either host or guest in little endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv. For booke, the shared struct stays big endian. For book3s_64 hv we maintain the struct in host native endian, since it never gets shared with the guest. For book3s_64 pr we introduce a variable that tells us which endianness the shared struct is in and route every access to it through helper inline functions that evaluate this variable. Signed-off-by: Alexander Graf <agraf@suse.de>
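The accessor indirection has roughly this shape (simplified sketch; the real patch generates one such helper per shared-page field with a macro):

    static inline u64 kvmppc_get_msr(struct kvm_vcpu *vcpu)
    {
            if (vcpu->arch.shared_big_endian)
                    return be64_to_cpu(vcpu->arch.shared->msr);
            else
                    return le64_to_cpu(vcpu->arch.shared->msr);
    }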
2014-05-28  powerpc/kvm/book3s_hv: Use threads_per_subcore in KVM  [Michael Ellerman, 1 file, -10/+16]
To support split core on POWER8 we need to modify various parts of the KVM code to use threads_per_subcore instead of threads_per_core. On systems that do not support split core threads_per_subcore == threads_per_core and these changes are a nop. We use threads_per_subcore as the value reported by KVM_CAP_PPC_SMT. This communicates to userspace that guests can only be created with a value of threads_per_core that is less than or equal to the current threads_per_subcore. This ensures that guests can only be created with a thread configuration that we are able to run given the current split core mode. Although threads_per_subcore can change during the life of the system, the commit that enables that will ensure that threads_per_subcore does not change during the life of a KVM VM. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Alexander Graf <agraf@suse.de> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-05-28  powerpc/kvm/book3s_hv: Rework the secondary inhibit code  [Michael Ellerman, 1 file, -4/+4]
As part of the support for split core on POWER8, we want to be able to block splitting of the core while KVM VMs are active. The logic to do that would be exactly the same as the code we currently have for inhibiting onlining of secondaries. Instead of adding an identical mechanism to block split core, rework the secondary inhibit code to be a "HV KVM is active" check. We can then use that in both the cpu hotplug code and the upcoming split core code. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Alexander Graf <agraf@suse.de> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-29  KVM: PPC: Book3S HV: Return ENODEV error rather than EIO  [Paul Mackerras, 1 file, -1/+1]
If an attempt is made to load the kvm-hv module on a machine which doesn't have hypervisor mode available, return an ENODEV error, which is the conventional thing to return to indicate that this module is not applicable to the hardware of the current machine, rather than EIO, which causes a warning to be printed. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-29  KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state  [Michael Neuling, 1 file, -22/+125]
This adds code to get/set_one_reg to read and write the new transactional memory (TM) state. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-26  KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n  [Anton Blanchard, 1 file, -3/+1]
I noticed KVM is broken when KVM in-kernel XICS emulation (CONFIG_KVM_XICS) is disabled. The problem was introduced in 48eaef05 (KVM: PPC: Book3S HV: use xics_wake_cpu only when defined). It used CONFIG_KVM_XICS to wrap xics_wake_cpu, where CONFIG_PPC_ICP_NATIVE should have been used. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@vger.kernel.org Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-26  KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write  [Greg Kurz, 1 file, -3/+3]
When the guest does an MMIO write which is handled successfully by an ioeventfd, ioeventfd_write() returns 0 (success) and kvmppc_handle_store() returns EMULATE_DONE. Then kvmppc_emulate_mmio() converts EMULATE_DONE to RESUME_GUEST_NV and this causes an exit from the loop in kvmppc_vcpu_run_hv(), causing an exit back to userspace with a bogus exit reason code, typically causing userspace (e.g. qemu) to crash with a message about an unknown exit code. This adds handling of RESUME_GUEST_NV in kvmppc_vcpu_run_hv() in order to fix that. For generality, we define a helper to check for either of the return-to-guest codes we use, RESUME_GUEST and RESUME_GUEST_NV, to make it easy to check for either and provide one place to update if any other return-to-guest code gets defined in future. Since it only affects Book3S HV for now, the helper is added to the kvm_book3s.h header file. We use the helper in two places in kvmppc_run_core() as well for future-proofing, though we don't see RESUME_GUEST_NV in either place at present. [paulus@samba.org - combined 4 patches into one, rewrote description] Suggested-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
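The helper is a one-liner along these lines in kvm_book3s.h; both codes mean "re-enter the guest", RESUME_GUEST_NV additionally requesting a reload of non-volatile registers:

    static inline bool is_kvmppc_resume_guest(int r)
    {
            return (r == RESUME_GUEST || r == RESUME_GUEST_NV);
    }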
2014-01-29  Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queue  [Paolo Bonzini, 1 file, -78/+241]
Conflicts:
	arch/powerpc/kvm/book3s_hv_rmhandlers.S
	arch/powerpc/kvm/booke.c
2014-01-27  KVM: PPC: Book3S HV: Add new state for transactional memory  [Michael Neuling, 1 file, -0/+4]
Add new state for transactional memory (TM) to kvm_vcpu_arch. Also add asm-offset bits that are going to be required. This also moves the existing TFHAR, TFIAR and TEXASR SPRs into a CONFIG_PPC_TRANSACTIONAL_MEM section. This requires some code changes to ensure we still compile with CONFIG_PPC_TRANSACTIONAL_MEM=N. Most of the added #ifdefs are removed in a later patch when the bulk of the TM code is added. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix merge conflict] Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27  KVM: PPC: Book3S HV: Basic little-endian guest support  [Anton Blanchard, 1 file, -0/+22]
We create a guest MSR from scratch when delivering exceptions in a few places. Instead of extracting LPCR[ILE] and inserting it into MSR_LE each time, we simply create a new variable intr_msr which contains the entire MSR to use. For a little-endian guest, userspace needs to set the ILE (interrupt little-endian) bit in the LPCR for each vcpu (or at least one vcpu in each virtual core). [paulus@samba.org - removed H_SET_MODE implementation from original version of the patch, and made kvmppc_set_lpcr update vcpu->arch.intr_msr.] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
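Sketched, the two halves of the mechanism: kvmppc_set_lpcr() keeps intr_msr in sync with LPCR[ILE], and exception delivery consumes it instead of rebuilding an MSR from scratch. A sketch, not the exact diff:

    /* In kvmppc_set_lpcr(), when userspace updates the LPCR: */
    if (new_lpcr & LPCR_ILE)
            vcpu->arch.intr_msr |= MSR_LE;
    else
            vcpu->arch.intr_msr &= ~MSR_LE;

    /* At exception delivery, the whole guest MSR comes from one place: */
    vcpu->arch.shregs.msr = vcpu->arch.intr_msr;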
2014-01-27  KVM: PPC: Book3S HV: Add support for DABRX register on POWER7  [Paul Mackerras, 1 file, -0/+6]
The DABRX (DABR extension) register on POWER7 processors provides finer control over which accesses cause a data breakpoint interrupt. It contains 3 bits which indicate whether to enable accesses in user, kernel and hypervisor modes respectively to cause data breakpoint interrupts, plus one bit that enables both real mode and virtual mode accesses to cause interrupts. Currently, KVM sets DABRX to allow both kernel and user accesses to cause interrupts while in the guest. This adds support for the guest to specify other values for DABRX. PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR and DABRX with one call. This adds a real-mode implementation of H_SET_XDABR, which shares most of its code with the existing H_SET_DABR implementation. To support this, we add a per-vcpu field to store the DABRX value plus code to get and set it via the ONE_REG interface. For Linux guests to use this new hcall, userspace needs to add "hcall-xdabr" to the set of strings in the /chosen/hypertas-functions property in the device tree. If userspace does this and then migrates the guest to a host where the kernel doesn't include this patch, then userspace will need to implement H_SET_XDABR by writing the specified DABR value to the DABR using the ONE_REG interface. In that case, the old kernel will set DABRX to DABRX_USER | DABRX_KERNEL. That should still work correctly, at least for Linux guests, since Linux guests cope with getting data breakpoint interrupts in modes that weren't requested by just ignoring the interrupt, and Linux guests never set DABRX_BTI. The other thing this does is to make H_SET_DABR and H_SET_XDABR work on POWER8, which has the DAWR and DAWRX instead of DABR/X. Guests that know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but guests running in POWER7 compatibility mode will still use H_SET_[X]DABR. For them, this adds the logic to convert DABR/X values into DAWR/X values on POWER8. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27  KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells  [Paul Mackerras, 1 file, -0/+1]
POWER8 has support for hypervisor doorbell interrupts. Though the kernel doesn't use them for IPIs on the powernv platform yet, it probably will in future, so this makes KVM cope gracefully if a hypervisor doorbell interrupt arrives while in a guest. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27  KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8  [Paul Mackerras, 1 file, -0/+6]
POWER8 has a bit in the LPCR to enable or disable the PURR and SPURR registers to count when in the guest. Set this bit. POWER8 has a field in the LPCR called AIL (Alternate Interrupt Location) which is used to enable relocation-on interrupts. Allow userspace to set this field. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27  KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8  [Paul Mackerras, 1 file, -1/+15]
This allows us to select architecture 2.05 (POWER6) or 2.06 (POWER7) compatibility modes on a POWER8 processor. (Note that transactional memory is disabled for usermode if either or both of the PCR_TM_DIS and PCR_ARCH_206 bits are set.) Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27  KVM: PPC: Book3S HV: Add handler for HV facility unavailable  [Michael Ellerman, 1 file, -1/+10]
At present this should never happen, since the host kernel sets HFSCR to allow access to all facilities. It's better to be prepared to handle it cleanly if it does ever happen, though. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27  KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs  [Michael Neuling, 1 file, -2/+151]
This adds fields to the struct kvm_vcpu_arch to store the new guest-accessible SPRs on POWER8, adds code to the get/set_one_reg functions to allow userspace to access this state, and adds code to the guest entry and exit to context-switch these SPRs between host and guest. Note that DPDES (Directed Privileged Doorbell Exception State) is shared between threads on a core; hence we store it in struct kvmppc_vcore and have the master thread save and restore it. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27  KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers  [Paul Mackerras, 1 file, -29/+17]
On a threaded processor such as POWER7, we group VCPUs into virtual cores and arrange that the VCPUs in a virtual core run on the same physical core. Currently we don't enforce any correspondence between virtual thread numbers within a virtual core and physical thread numbers. Physical threads are allocated starting at 0 on a first-come first-served basis to runnable virtual threads (VCPUs). POWER8 implements a new "msgsndp" instruction which guest kernels can use to interrupt other threads in the same core or sub-core. Since the instruction takes the destination physical thread ID as a parameter, it becomes necessary to align the physical thread IDs with the virtual thread IDs, that is, to make sure virtual thread N within a virtual core always runs on physical thread N. This means that it's possible that thread 0, which is where we call __kvmppc_vcore_entry, may end up running some other vcpu than the one whose task called kvmppc_run_core(), or it may end up running no vcpu at all, if for example thread 0 of the virtual core is currently executing in userspace. However, we do need thread 0 to be responsible for switching the MMU -- a previous version of this patch that had other threads switching the MMU was found to be responsible for occasional memory corruption and machine check interrupts in the guest on POWER7 machines. To accommodate this, we no longer pass the vcpu pointer to __kvmppc_vcore_entry, but instead let the assembly code load it from the PACA. Since the assembly code will need to know the kvm pointer and the thread ID for threads which don't have a vcpu, we move the thread ID into the PACA and we add a kvm pointer to the virtual core structure. In the case where thread 0 has no vcpu to run, it still calls into kvmppc_hv_entry in order to do the MMU switch, and then naps until either its vcpu is ready to run in the guest, or some other thread needs to exit the guest. In the latter case, thread 0 jumps to the code that switches the MMU back to the host. This control flow means that now we switch the MMU before loading any guest vcpu state. Similarly, on guest exit we now save all the guest vcpu state before switching the MMU back to the host. This has required substantial code movement, making the diff rather large. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27  KVM: PPC: Book3S HV: use xics_wake_cpu only when defined  [Andreas Schwab, 1 file, -1/+6]
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> CC: stable@vger.kernel.org Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-23  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  [Linus Torvalds, 1 file, -1/+1]
Pull KVM updates from Paolo Bonzini:
 "First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits)
  kvm: make KVM_MMU_AUDIT help text more readable
  KVM: s390: Fix memory access error detection
  KVM: nVMX: Update guest activity state field on L2 exits
  KVM: nVMX: Fix nested_run_pending on activity state HLT
  KVM: nVMX: Clean up handling of VMX-related MSRs
  KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject
  KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
  KVM: nVMX: Leave VMX mode on clearing of feature control MSR
  KVM: VMX: Fix DR6 update on #DB exception
  KVM: SVM: Fix reading of DR6
  KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS
  add support for Hyper-V reference time counter
  KVM: remove useless write to vcpu->hv_clock.tsc_timestamp
  KVM: x86: fix tsc catchup issue with tsc scaling
  KVM: x86: limit PIT timer frequency
  KVM: x86: handle invalid root_hpa everywhere
  kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub
  kvm: vfio: silence GCC warning
  KVM: ARM: Remove duplicate include
  arm/arm64: KVM: relax the requirements of VMA alignment for THP
  ...
2014-01-09  KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structures  [Paul Mackerras, 1 file, -42/+0]
This uses struct thread_fp_state and struct thread_vr_state to store the floating-point, VMX/Altivec and VSX state, rather than flat arrays. This makes transferring the state to/from the thread_struct simpler and allows us to unify the get/set_one_reg implementations for the VSX registers. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09  KVM: PPC: Add devname:kvm aliases for modules  [Alexander Graf, 1 file, -0/+3]
Systems that support automatic loading of kernel modules through device aliases should try and automatically load kvm when /dev/kvm gets opened. Add code to support that magic for all PPC kvm targets, even the ones that don't support modules yet. Signed-off-by: Alexander Graf <agraf@suse.de>
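The "magic" is just module aliases tying the /dev/kvm device node back to the module, along these lines:

    /* Let udev/kmod resolve opens of /dev/kvm to this module */
    MODULE_ALIAS_MISCDEV(KVM_MINOR);
    MODULE_ALIAS("devname:kvm");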
2013-12-13  KVM: Use cond_resched() directly and remove useless kvm_resched()  [Takuya Yoshikawa, 1 file, -1/+1]
Since the commit 15ad7146 ("KVM: Use the scheduler preemption notifiers to make kvm preemptible"), the remaining stuff in this function is a simple cond_resched() call with an extra need_resched() check which was there to avoid dropping VCPUs unnecessarily. Now it is meaningless. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-11-21  powerpc: kvm: optimize "sc 1" as fast return  [Liu Ping Fan, 1 file, -6/+4]
In some scenarios, e.g. OpenStack CI, a PR guest can trigger "sc 1" frequently. This patch optimizes the path by directly delivering BOOK3S_INTERRUPT_SYSCALL to the HV guest, so powernv can return to the HV guest without a heavy exit, i.e. no need to swap the TLB, HTAB, etc. Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-11-19  KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call  [Paul Mackerras, 1 file, -0/+2]
Running a kernel with CONFIG_PROVE_RCU=y yields the following diagnostic:

    ===============================
    [ INFO: suspicious RCU usage. ]
    3.12.0-rc5-kvm+ #9 Not tainted
    -------------------------------
    include/linux/kvm_host.h:473 suspicious rcu_dereference_check() usage!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    1 lock held by qemu-system-ppc/4831:

    stack backtrace:
    CPU: 28 PID: 4831 Comm: qemu-system-ppc Not tainted 3.12.0-rc5-kvm+ #9
    Call Trace:
    [c000000be462b2a0] [c00000000001644c] .show_stack+0x7c/0x1f0 (unreliable)
    [c000000be462b370] [c000000000ad57c0] .dump_stack+0x88/0xb4
    [c000000be462b3f0] [c0000000001315e8] .lockdep_rcu_suspicious+0x138/0x180
    [c000000be462b480] [c00000000007862c] .gfn_to_memslot+0x13c/0x170
    [c000000be462b510] [c00000000007d384] .gfn_to_hva_prot+0x24/0x90
    [c000000be462b5a0] [c00000000007d420] .kvm_read_guest_page+0x30/0xd0
    [c000000be462b630] [c00000000007d528] .kvm_read_guest+0x68/0x110
    [c000000be462b6e0] [c000000000084594] .kvmppc_rtas_hcall+0x34/0x180
    [c000000be462b7d0] [c000000000097934] .kvmppc_pseries_do_hcall+0x74/0x830
    [c000000be462b880] [c0000000000990e8] .kvmppc_vcpu_run_hv+0xff8/0x15a0
    [c000000be462b9e0] [c0000000000839cc] .kvmppc_vcpu_run+0x2c/0x40
    [c000000be462ba50] [c0000000000810b4] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0
    [c000000be462bae0] [c00000000007b508] .kvm_vcpu_ioctl+0x478/0x730
    [c000000be462bca0] [c00000000025532c] .do_vfs_ioctl+0x4dc/0x7a0
    [c000000be462bd80] [c0000000002556b4] .SyS_ioctl+0xc4/0xe0
    [c000000be462be30] [c000000000009ee4] syscall_exit+0x0/0x98

To fix this, we take the SRCU read lock around the kvmppc_rtas_hcall() call.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
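The fix pattern, sketched: hold the kvm->srcu read lock across any path that dereferences memslots, which kvm_read_guest() does internally.

    int srcu_idx;

    srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
    rc = kvmppc_rtas_hcall(vcpu);
    srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);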
2013-11-19  KVM: PPC: Book3S HV: Make tbacct_lock irq-safe  [Paul Mackerras, 1 file, -10/+12]
Lockdep reported that there is a potential for deadlock because vcpu->arch.tbacct_lock is not irq-safe, and is sometimes taken inside the rq_lock (run-queue lock) in the scheduler, which is taken within interrupts. The lockdep splat looks like:

    ======================================================
    [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
    3.12.0-rc5-kvm+ #8 Not tainted
    ------------------------------------------------------
    qemu-system-ppc/4803 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
     (&(&vcpu->arch.tbacct_lock)->rlock){+.+...}, at: [<c0000000000947ac>] .kvmppc_core_vcpu_put_hv+0x2c/0xa0

    and this task is already holding:
     (&rq->lock){-.-.-.}, at: [<c000000000ac16c0>] .__schedule+0x180/0xaa0

    which would create a new lock dependency:
     (&rq->lock){-.-.-.} -> (&(&vcpu->arch.tbacct_lock)->rlock){+.+...}

    but this new dependency connects a HARDIRQ-irq-safe lock:
     (&rq->lock){-.-.-.}
    ... which became HARDIRQ-irq-safe at:
      [<c00000000013797c>] .lock_acquire+0xbc/0x190
      [<c000000000ac3c74>] ._raw_spin_lock+0x34/0x60
      [<c0000000000f8564>] .scheduler_tick+0x54/0x180
      [<c0000000000c2610>] .update_process_times+0x70/0xa0
      [<c00000000012cdfc>] .tick_periodic+0x3c/0xe0
      [<c00000000012cec8>] .tick_handle_periodic+0x28/0xb0
      [<c00000000001ef40>] .timer_interrupt+0x120/0x2e0
      [<c000000000002868>] decrementer_common+0x168/0x180
      [<c0000000001c7ca4>] .get_page_from_freelist+0x924/0xc10
      [<c0000000001c8e00>] .__alloc_pages_nodemask+0x200/0xba0
      [<c0000000001c9eb8>] .alloc_pages_exact_nid+0x68/0x110
      [<c000000000f4c3ec>] .page_cgroup_init+0x1e0/0x270
      [<c000000000f24480>] .start_kernel+0x3e0/0x4e4
      [<c000000000009d30>] .start_here_common+0x20/0x70

    to a HARDIRQ-irq-unsafe lock:
     (&(&vcpu->arch.tbacct_lock)->rlock){+.+...}
    ... which became HARDIRQ-irq-unsafe at:
      ...
      [<c00000000013797c>] .lock_acquire+0xbc/0x190
      [<c000000000ac3c74>] ._raw_spin_lock+0x34/0x60
      [<c0000000000946ac>] .kvmppc_core_vcpu_load_hv+0x2c/0x100
      [<c00000000008394c>] .kvmppc_core_vcpu_load+0x2c/0x40
      [<c000000000081000>] .kvm_arch_vcpu_load+0x10/0x30
      [<c00000000007afd4>] .vcpu_load+0x64/0xd0
      [<c00000000007b0f8>] .kvm_vcpu_ioctl+0x68/0x730
      [<c00000000025530c>] .do_vfs_ioctl+0x4dc/0x7a0
      [<c000000000255694>] .SyS_ioctl+0xc4/0xe0
      [<c000000000009ee4>] syscall_exit+0x0/0x98

Some users have reported this deadlock occurring in practice, though the reports have been primarily on 3.10.x-based kernels. This fixes the problem by making tbacct_lock be irq-safe.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
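The mechanical part of the fix: every tbacct_lock acquisition becomes irq-safe. A sketch of the shape of the change, e.g. in kvmppc_core_vcpu_load_hv():

    unsigned long flags;

    /* spin_lock() becomes spin_lock_irqsave() throughout */
    spin_lock_irqsave(&vcpu->arch.tbacct_lock, flags);
    /* ... update the busy/stolen time accounting ... */
    spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);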
2013-10-17  kvm: powerpc: book3s: drop is_hv_enabled  [Aneesh Kumar K.V, 1 file, -1/+0]
drop is_hv_enabled, because that should not be a callback property Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17  kvm: powerpc: book3s: Allow the HV and PR selection per virtual machine  [Aneesh Kumar K.V, 1 file, -8/+10]
This moves the kvmppc_ops callbacks to be a per-VM entity. This enables us to select HV or PR mode when creating a VM. We also allow both the kvm-hv and kvm-pr kernel modules to be loaded. To achieve this we move /dev/kvm ownership to the kvm.ko module. Depending on which KVM mode we select during VM creation, we take a reference count on the respective module. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: fix coding style] Signed-off-by: Alexander Graf <agraf@suse.de>
2013-10-17  kvm: powerpc: book3s: Support building HV and PR KVM as module  [Aneesh Kumar K.V, 1 file, -0/+2]
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: squash in compile fix] Signed-off-by: Alexander Graf <agraf@suse.de>