summaryrefslogtreecommitdiff
path: root/arch/s390/kvm/kvm-s390.c
AgeCommit message (Collapse)AuthorFilesLines
2016-06-16KVM: s390: use kvm->created_vcpusPaolo Bonzini1-5/+5
The new created_vcpus field avoids possible races between enabling capabilities and creating VCPUs. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-13s390/time: remove ETR supportMartin Schwidefsky1-1/+1
The External-Time-Reference (ETR) clock synchronization interface has been superseded by Server-Time-Protocol (STP). Remove the outdated ETR interface. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-10KVM: s390: storage keys fit into a charDavid Hildenbrand1-2/+1
No need to convert the storage key into an unsigned long, the target function expects a char as argument. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10s390/mm: return key via pointer in get_guest_storage_keyDavid Hildenbrand1-6/+2
Let's just split returning the key and reporting errors. This makes calling code easier and avoids bugs as happened already. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10s390/mm: set and get guest storage key mmap lockingMartin Schwidefsky1-10/+16
Move the mmap semaphore locking out of set_guest_storage_key and get_guest_storage_key. This makes the two functions more like the other ptep_xxx operations and allows to avoid repeated semaphore operations if multiple keys are read or written. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: turn on tx even without ctxDavid Hildenbrand1-1/+1
Constrained transactional execution is an addon of transactional execution. Let's enable the assist also if only TX is enabled for the guest. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: enable host-protection-interruption only with ESOPDavid Hildenbrand1-1/+3
host-protection-interruption control was introduced with ESOP. So let's enable it only if we have ESOP and add an explanatory comment why we can live without it. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: enable ibs only if availableDavid Hildenbrand1-0/+2
Let's enable interlock-and-broadcast suppression only if the facility is actually available. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: enable PFMFI only if availableDavid Hildenbrand1-1/+1
Let's enable interpretation of PFMFI only if the facility is actually available. Emulation code still works in case the guest is offered EDAT-1. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: enable cei only if availableDavid Hildenbrand1-1/+3
Let's only enable conditional-external-interruption if the facility is actually available. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: enable ib only if availableDavid Hildenbrand1-1/+3
Let's enable intervention bypass only if the facility is acutally available. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: handle missing guest-storage-limit-suppressionDavid Hildenbrand1-0/+4
If guest-storage-limit-suppression is not available, we would for now have a valid guest address space with size 0. So let's simply set the origin to 0 and the limit to hamax. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: provide CMMA attributes only if availableDavid Hildenbrand1-1/+6
Let's not provide the device attribute for cmma enabling and clearing if the hardware doesn't support it. This also helps getting rid of the undocumented return value "-EINVAL" in case CMMA is not available when trying to enable it. Also properly document the meaning of -EINVAL for CMMA clearing. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: enable CMMA if the interpration is availableDavid Hildenbrand1-2/+1
Now that we can detect if collaborative-memory-management interpretation is available, replace the heuristic by a real hardware detection. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: guestdbg: signal missing hardware supportDavid Hildenbrand1-0/+2
Without guest-PER enhancement, we can't provide any debugging support. Therefore act like kernel support is missing. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: handle missing 64-bit-SCAO facilityDavid Hildenbrand1-4/+8
Without that facility, we may only use scaol. So fallback to DMA allocation in that case, so we won't overwrite random memory via the SIE. Also disallow ESCA, so we don't have to handle that allocation case. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: interface to query and configure cpu subfunctionsDavid Hildenbrand1-0/+89
We have certain instructions that indicate available subfunctions via a query subfunction (crypto functions and ptff), or via a test bit function (plo). By exposing these "subfunction blocks" to user space, we allow user space to 1) query available subfunctions and make sure subfunctions won't get lost during migration - e.g. properly indicate them via a CPU model 2) change the subfunctions to be reported to the guest (even adding unavailable ones) This mechanism works just like the way we indicate the stfl(e) list to user space. This way, user space could even emulate some subfunctions in QEMU in the future. If this is ever applicable, we have to make sure later on, that unsupported subfunctions result in an intercept to QEMU. Please note that support to indicate them to the guest is still missing and requires hardware support. Usually, the IBC takes already care of these subfunctions for migration safety. QEMU should make sure to always set these bits properly according to the machine generation to be emulated. Available subfunctions are only valid in combination with STFLE bits retrieved via KVM_S390_VM_CPU_MACHINE and enabled via KVM_S390_VM_CPU_PROCESSOR. If the applicable bits are available, the indicated subfunctions are guaranteed to be correct. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: forward ESOP if availableDavid Hildenbrand1-0/+13
ESOP guarantees that during a protection exception, bit 61 of real location 168-175 will only be set to 1 if it was because of ALCP or DATP. If the exception is due to LAP or KCP, the bit will always be set to 0. The old SOP definition allowed bit 61 to be unpredictable in case of LAP or KCP in some conditions. So ESOP replaces this unpredictability by a guarantee. Therefore, we can directly forward ESOP if it is available on our machine. We don't have to do anything when ESOP is disabled - the guest will simply expect unpredictable values. Our guest access functions are already handling ESOP properly. Please note that future functionality in KVM will require knowledge about ESOP being enabled for a guest or not. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: interface to query and configure cpu featuresDavid Hildenbrand1-0/+63
For now, we only have an interface to query and configure facilities indicated via STFL(E). However, we also have features indicated via SCLP, that have to be indicated to the guest by user space and usually require KVM support. This patch allows user space to query and configure available cpu features for the guest. Please note that disabling a feature doesn't necessarily mean that it is completely disabled (e.g. ESOP is mostly handled by the SIE). We will try our best to disable it. Most features (e.g. SCLP) can't directly be forwarded, as most of them need in addition to hardware support, support in KVM. As we later on want to turn these features in KVM explicitly on/off (to simulate different behavior), we have to filter all features provided by the hardware and make them configurable. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: Limit sthyi executionJanosch Frank1-0/+2
Store hypervisor information is a valid instruction not only in supervisor state but also in problem state, i.e. the guest's userspace. Its execution is not only computational and memory intensive, but also has to get hold of the ipte lock to write to the guest's memory. This lock is not intended to be held often and long, especially not from the untrusted guest userspace. Therefore we apply rate limiting of sthyi executions per VM. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Acked-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: Add sthyi emulationJanosch Frank1-0/+6
Store Hypervisor Information is an emulated z/VM instruction that provides a guest with basic information about the layers it is running on. This includes information about the cpu configuration of both the machine and the lpar, as well as their names, machine model and machine type. This information enables an application to determine the maximum capacity of CPs and IFLs available to software. The instruction is available whenever the facility bit 74 is set, otherwise executing it results in an operation exception. It is important to check the validity flags in the sections before using data from any structure member. It is not guaranteed that all members will be valid on all machines / machine configurations. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: Add operation exception interception handlerJanosch Frank1-0/+1
This commit introduces code that handles operation exception interceptions. With this handler we can emulate instructions by using illegal opcodes. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: Add stats for PEI eventsAlexander Yarygin1-0/+1
Add partial execution intercepted events in kvm_stats_debugfs. Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10KVM: s390: ignore IBC if zeroDavid Hildenbrand1-1/+1
Looks like we forgot about the special IBC value of 0 meaning "no IBC". Let's fix that, otherwise it gets rounded up and suddenly an IBC is active with the lowest possible machine. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Fixes: commit 053dd2308d81 ("KVM: s390: force ibc into valid range") Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-05-13KVM: halt_polling: provide a way to qualify wakeups during pollChristian Borntraeger1-0/+6
Some wakeups should not be considered a sucessful poll. For example on s390 I/O interrupts are usually floating, which means that _ALL_ CPUs would be considered runnable - letting all vCPUs poll all the time for transactional like workload, even if one vCPU would be enough. This can result in huge CPU usage for large guests. This patch lets architectures provide a way to qualify wakeups if they should be considered a good/bad wakeups in regard to polls. For s390 the implementation will fence of halt polling for anything but known good, single vCPU events. The s390 implementation for floating interrupts does a wakeup for one vCPU, but the interrupt will be delivered by whatever CPU checks first for a pending interrupt. We prefer the woken up CPU by marking the poll of this CPU as "good" poll. This code will also mark several other wakeup reasons like IPI or expired timers as "good". This will of course also mark some events as not sucessful. As KVM on z runs always as a 2nd level hypervisor, we prefer to not poll, unless we are really sure, though. This patch successfully limits the CPU usage for cases like uperf 1byte transactional ping pong workload or wakeup heavy workload like OLTP while still providing a proper speedup. This also introduced a new vcpu stat "halt_poll_no_tuning" that marks wakeups that are considered not good for polling. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version) Cc: David Matlack <dmatlack@google.com> Cc: Wanpeng Li <kernellwp@gmail.com> [Rename config symbol. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-09KVM: s390: Populate mask of non-hypervisor managed facility bitsAlexander Yarygin1-3/+16
When a guest is initializing, KVM provides facility bits that can be successfully used by the guest. It's done by applying kvm_s390_fac_list_mask mask on host facility bits stored by the STFLE instruction. Facility bits can be one of two kinds: it's either a hypervisor managed bit or non-hypervisor managed. The hardware provides information which bits need special handling. Let's automatically passthrough to guests new facility bits, that don't require hypervisor support. Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-05-09KVM: s390: Enable all facility bits that are known good for passthroughAlexander Yarygin1-2/+2
Some facility bits are in a range that is defined to be "ok for guests without any necessary hypervisor changes". Enable those bits. Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-05-09KVM: s390: force ibc into valid rangeDavid Hildenbrand1-1/+11
Some hardware variants will round the ibc value up/down themselves, others will report a validity intercept. Let's always round it up/down. This patch will also make sure that the ibc is set to 0 in case we don't have ibc support (lowest_ibc == 0). Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-05-09KVM: s390: cleanup cpuid handlingDavid Hildenbrand1-8/+9
We only have one cpuid for all VCPUs, so let's directly use the one in the cpu model. Also always store it directly as u64, no need for struct cpuid. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-05-09KVM: s390: enable SRS only if enabled for the guestDavid Hildenbrand1-1/+3
If we don't have SIGP SENSE RUNNING STATUS enabled for the guest, let's not enable interpretation so we can correctly report an invalid order. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-05-09KVM: s390: enable PFMFI only if guest has EDAT1David Hildenbrand1-1/+2
Only enable PFMF interpretation if the necessary facility (EDAT1) is available, otherwise the pfmf handler in priv.c will inject an exception Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-16Merge branch 'for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - Add the CPU id for the new z13s machine - Add a s390 specific XOR template for RAID-5 checksumming based on the XC instruction. Remove all other alternatives, XC is always faster - The merge of our four different stack tracers into a single one - Tidy up the code related to page tables, several large inline functions are now out-of-line. Bloat-o-meter reports ~11K text size reduction - A binary interface for the priviledged CLP instruction to retrieve the hardware view of the installed PCI functions - Improvements for the dasd format code - Bug fixes and cleanups * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (31 commits) s390/pci: enforce fmb page boundary rule s390: fix floating pointer register corruption (again) s390/cpumf: add missing lpp magic initialization s390: Fix misspellings in comments s390/mm: split arch/s390/mm/pgtable.c s390/mm: uninline pmdp_xxx functions from pgtable.h s390/mm: uninline ptep_xxx functions from pgtable.h s390/pci: add ioctl interface for CLP s390: Use pr_warn instead of pr_warning s390/dasd: remove casts to dasd_*_private s390/dasd: Refactor dasd format functions s390/dasd: Simplify code in format logic s390/dasd: Improve dasd format code s390/percpu: remove this_cpu_cmpxchg_double_4 s390/cpumf: Improve guest detection heuristics s390/fault: merge report_user_fault implementations s390/dis: use correct escape sequence for '%' character s390/kvm: simplify set_guest_storage_key s390/oprofile: add z13/z13s model numbers s390: add z13s model number to z13 elf platform ...
2016-03-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-63/+172
Pull KVM updates from Paolo Bonzini: "One of the largest releases for KVM... Hardly any generic changes, but lots of architecture-specific updates. ARM: - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - various optimizations to the vgic save/restore code. PPC: - enabled KVM-VFIO integration ("VFIO device") - optimizations to speed up IPIs between vcpus - in-kernel handling of IOMMU hypercalls - support for dynamic DMA windows (DDW). s390: - provide the floating point registers via sync regs; - separated instruction vs. data accesses - dirty log improvements for huge guests - bugfixes and documentation improvements. x86: - Hyper-V VMBus hypercall userspace exit - alternative implementation of lowest-priority interrupts using vector hashing (for better VT-d posted interrupt support) - fixed guest debugging with nested virtualizations - improved interrupt tracking in the in-kernel IOAPIC - generic infrastructure for tracking writes to guest memory - currently its only use is to speedup the legacy shadow paging (pre-EPT) case, but in the future it will be used for virtual GPUs as well - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits) KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch KVM: x86: disable MPX if host did not enable MPX XSAVE features arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit arm64: KVM: vgic-v3: Reset LRs at boot time arm64: KVM: vgic-v3: Do not save an LR known to be empty arm64: KVM: vgic-v3: Save maintenance interrupt state only if required arm64: KVM: vgic-v3: Avoid accessing ICH registers KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit KVM: arm/arm64: vgic-v2: Reset LRs at boot time KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers KVM: s390: allocate only one DMA page per VM KVM: s390: enable STFLE interpretation only if enabled for the guest KVM: s390: wake up when the VCPU cpu timer expires KVM: s390: step the VCPU timer while in enabled wait KVM: s390: protect VCPU cpu timer with a seqcount KVM: s390: step VCPU cpu timer during kvm_run ioctl ...
2016-03-08s390/mm: split arch/s390/mm/pgtable.cMartin Schwidefsky1-1/+2
The pgtable.c file is quite big, before it grows any larger split it into pgtable.c, pgalloc.c and gmap.c. In addition move the gmap related header definitions into the new gmap.h header and all of the pgste helpers from pgtable.h to pgtable.c. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08s390/mm: uninline ptep_xxx functions from pgtable.hMartin Schwidefsky1-1/+1
The code in the various ptep_xxx functions has grown quite large, consolidate them to four out-of-line functions: ptep_xchg_direct to exchange a pte with another with immediate flushing ptep_xchg_lazy to exchange a pte with another in a batched update ptep_modify_prot_start to begin a protection flags update ptep_modify_prot_commit to commit a protection flags update Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08KVM: s390: allocate only one DMA page per VMDavid Hildenbrand1-37/+23
We can fit the 2k for the STFLE interpretation and the crypto control block into one DMA page. As we now only have to allocate one DMA page, we can clean up the code a bit. As a nice side effect, this also fixes a problem with crycbd alignment in case special allocation debug options are enabled, debugged by Sascha Silbe. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08KVM: s390: enable STFLE interpretation only if enabled for the guestDavid Hildenbrand1-1/+2
Not setting the facility list designation disables STFLE interpretation, this is what we want if the guest was told to not have it. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08KVM: s390: step the VCPU timer while in enabled waitDavid Hildenbrand1-2/+2
The cpu timer is a mean to measure task execution time. We want to account everything for a VCPU for which it is responsible. Therefore, if the VCPU wants to sleep, it shall be accounted for it. We can easily get this done by not disabling cpu timer accounting when scheduled out while sleeping because of enabled wait. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08KVM: s390: protect VCPU cpu timer with a seqcountDavid Hildenbrand1-8/+22
For now, only the owning VCPU thread (that has loaded the VCPU) can get a consistent cpu timer value when calculating the delta. However, other threads might also be interested in a more recent, consistent value. Of special interest will be the timer callback of a VCPU that executes without having the VCPU loaded and could run in parallel with the VCPU thread. The cpu timer has a nice property: it is only updated by the owning VCPU thread. And speaking about accounting, a consistent value can only be calculated by looking at cputm_start and the cpu timer itself in one shot, otherwise the result might be wrong. As we only have one writing thread at a time (owning VCPU thread), we can use a seqcount instead of a seqlock and retry if the VCPU refreshed its cpu timer. This avoids any heavy locking and only introduces a counter update/check plus a handful of smp_wmb(). The owning VCPU thread should never have to retry on reads, and also for other threads this might be a very rare scenario. Please note that we have to use the raw_* variants for locking the seqcount as lockdep will produce false warnings otherwise. The rq->lock held during vcpu_load/put is also acquired from hardirq context. Lockdep cannot know that we avoid potential deadlocks by disabling preemption and thereby disable concurrent write locking attempts (via vcpu_put/load). Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08KVM: s390: step VCPU cpu timer during kvm_run ioctlDavid Hildenbrand1-2/+74
Architecturally we should only provide steal time if we are scheduled away, and not if the host interprets a guest exit. We have to step the guest CPU timer in these cases. In the first shot, we will step the VCPU timer only during the kvm_run ioctl. Therefore all time spent e.g. in interception handlers or on irq delivery will be accounted for that VCPU. We have to take care of a few special cases: - Other VCPUs can test for pending irqs. We can only report a consistent value for the VCPU thread itself when adding the delta. - We have to take care of STP sync, therefore we have to extend kvm_clock_sync() and disable preemption accordingly - During any call to disable/enable/start/stop we could get premeempted and therefore get start/stop calls. Therefore we have to make sure we don't get into an inconsistent state. Whenever a VCPU is scheduled out, sleeping, in user space or just about to enter the SIE, the guest cpu timer isn't stepped. Please note that all primitives are prepared to be called from both environments (cpu timer accounting enabled or not), although not completely used in this patch yet (e.g. kvm_s390_set_cpu_timer() will never be called while cpu timer accounting is enabled). Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08KVM: s390: abstract access to the VCPU cpu timerDavid Hildenbrand1-8/+23
We want to manually step the cpu timer in certain scenarios in the future. Let's abstract any access to the cpu timer, so we can hide the complexity internally. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08KVM: s390: store cpu id in vcpu->cpu when scheduled inDavid Hildenbrand1-0/+2
By storing the cpu id, we have a way to verify if the current cpu is owning a VCPU. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUSDavid Hildenbrand1-1/+1
With MACHINE_HAS_VX, we convert the floating point registers from the vector registeres when storing the status. For other VCPUs, these are stored to vcpu->run->s.regs.vrs, but we are using current->thread.fpu.vxrs, which resolves to the currently loaded VCPU. So kvm_s390_store_status_unloaded() currently writes the wrong floating point registers (converted from the vector registers) when called from another VCPU on a z13. This is only the case for old user space not handling SIGP STORE STATUS and SIGP STOP AND STORE STATUS, but relying on the kernel implementation. All other calls come from the loaded VCPU via kvm_s390_store_status(). Fixes: 9abc2a08a7d6 (KVM: s390: fix memory overwrites when vx is disabled) Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-10KVM: s390: bail out early on fatal signal in dirty loggingChristian Borntraeger1-0/+2
A KVM_GET_DIRTY_LOG ioctl might take a long time. This can result in fatal signals seemingly being ignored. Lets bail out during the dirty bit sync, if a fatal signal is pending. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10KVM: s390: do not block CPU on dirty loggingChristian Borntraeger1-0/+1
When doing dirty logging on huge guests (e.g.600GB) we sometimes get rcu stall timeouts with backtraces like [ 2753.194083] ([<0000000000112fb2>] show_trace+0x12a/0x130) [ 2753.194092] [<0000000000113024>] show_stack+0x6c/0xe8 [ 2753.194094] [<00000000001ee6a8>] rcu_pending+0x358/0xa48 [ 2753.194099] [<00000000001f20cc>] rcu_check_callbacks+0x84/0x168 [ 2753.194102] [<0000000000167654>] update_process_times+0x54/0x80 [ 2753.194107] [<00000000001bdb5c>] tick_sched_handle.isra.16+0x4c/0x60 [ 2753.194113] [<00000000001bdbd8>] tick_sched_timer+0x68/0x90 [ 2753.194115] [<0000000000182a88>] __run_hrtimer+0x88/0x1f8 [ 2753.194119] [<00000000001838ba>] hrtimer_interrupt+0x122/0x2b0 [ 2753.194121] [<000000000010d034>] do_extint+0x16c/0x170 [ 2753.194123] [<00000000005e206e>] ext_skip+0x38/0x3e [ 2753.194129] [<000000000012157c>] gmap_test_and_clear_dirty+0xcc/0x118 [ 2753.194134] ([<00000000001214ea>] gmap_test_and_clear_dirty+0x3a/0x118) [ 2753.194137] [<0000000000132da4>] kvm_vm_ioctl_get_dirty_log+0xd4/0x1b0 [ 2753.194143] [<000000000012ac12>] kvm_vm_ioctl+0x21a/0x548 [ 2753.194146] [<00000000002b57f6>] do_vfs_ioctl+0x30e/0x518 [ 2753.194149] [<00000000002b5a9c>] SyS_ioctl+0x9c/0xb0 [ 2753.194151] [<00000000005e1ae6>] sysc_tracego+0x14/0x1a [ 2753.194153] [<000003ffb75f3972>] 0x3ffb75f3972 We should do a cond_resched in here. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10KVM: s390: do not take mmap_sem on dirty log queryChristian Borntraeger1-2/+0
Dirty log query can take a long time for huge guests. Holding the mmap_sem for very long times can cause some unwanted latencies. Turns out that we do not need to hold the mmap semaphore. We hold the slots_lock for gfn->hva translation and walk the page tables with that address, so no need to look at the VMAs. KVM also holds a reference to the mm, which should prevent other things going away. During the walk we take the necessary ptl locks. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10KVM: s390: instruction-fetching exceptions on SIE faultsDavid Hildenbrand1-2/+10
On instruction-fetch exceptions, we have to forward the PSW by any valid ilc and correctly use that ilc when injecting the irq. Injection will already take care of rewinding the PSW if we injected a nullifying program irq, so we don't need special handling prior to injection. Until now, autodetection would have guessed an ilc of 0. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10KVM: s390: provide prog irq ilc on SIE faultsDavid Hildenbrand1-4/+8
On SIE faults, the ilc cannot be detected automatically, as the icptcode is 0. The ilc indicated in the program irq will always be 0. Therefore we have to manually specify the ilc in order to tell the guest which ilen was used when forwarding the PSW. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10KVM: s390: read the correct opcode on SIE faultsDavid Hildenbrand1-2/+1
Let's use our fresh new function read_guest_instr() to access guest storage via the correct addressing schema. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10KVM: s390: gaccess: introduce access modesDavid Hildenbrand1-2/+4
We will need special handling when fetching instructions, so let's introduce new guest access modes GACC_FETCH and GACC_STORE instead of a write flag. An additional patch will then introduce GACC_IFETCH. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>