summaryrefslogtreecommitdiff
path: root/virt
AgeCommit message (Collapse)AuthorFilesLines
2017-11-06KVM: arm/arm64: Move timer/vgic flush/sync under disabled irqChristoffer Dall1-13/+13
As we are about to play tricks with the timer to be more lazy in saving and restoring state, we need to move the timer sync and flush functions under a disabled irq section and since we have to flush the vgic state after the timer and PMU state, we do the whole flush/sync sequence with disabled irqs. The only downside is a slightly longer delay before being able to process hardware interrupts and run softirqs. Signed-off-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
2017-11-06KVM: arm/arm64: Rename soft timer to bg_timerChristoffer Dall1-9/+9
As we are about to introduce a separate hrtimer for the physical timer, call this timer bg_timer, because we refer to this timer as the background timer in the code and comments elsewhere. Signed-off-by: Christoffer Dall <cdall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2017-11-06KVM: arm/arm64: Make timer_arm and timer_disarm helpers more genericChristoffer Dall1-22/+11
We are about to add an additional soft timer to the arch timer state for a VCPU and would like to be able to reuse the functions to program and cancel a timer, so we make them slightly more generic and rename to make it more clear that these functions work on soft timers and not the hardware resource that this code is managing. The armed flag on the timer state is only used to assert a condition, and we don't rely on this assertion in any meaningful way, so we can simply get rid of this flack and slightly reduce complexity. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-11-06KVM: arm/arm64: Support calling vgic_update_irq_pending from irq contextChristoffer Dall8-72/+107
We are about to optimize our timer handling logic which involves injecting irqs to the vgic directly from the irq handler. Unfortunately, the injection path can take any AP list lock and irq lock and we must therefore make sure to use spin_lock_irqsave where ever interrupts are enabled and we are taking any of those locks, to avoid deadlocking between process context and the ISR. This changes a lot of the VGIC code, but the good news are that the changes are mostly mechanical. Acked-by: Marc Zyngier <marc,zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-11-06KVM: arm/arm64: Guard kvm_vgic_map_is_active against !vgic_initializedChristoffer Dall1-0/+3
If the vgic is not initialized, don't try to grab its spinlocks or traverse its data structures. This is important because we soon have to start considering the active state of a virtual interrupts when doing vcpu_load, which may happen early on before the vgic is initialized. Signed-off-by: Christoffer Dall <cdall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
2017-11-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-42/+62
Pull KVM fixes from Paolo Bonzini: "Fixes for interrupt controller emulation in ARM/ARM64 and x86, plus a one-liner x86 KVM guest fix" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Update APICv on APIC reset KVM: VMX: Do not fully reset PI descriptor on vCPU reset kvm: Return -ENODEV from update_persistent_clock KVM: arm/arm64: vgic-its: Check GITS_BASER Valid bit before saving tables KVM: arm/arm64: vgic-its: Check CBASER/BASER validity before enabling the ITS KVM: arm/arm64: vgic-its: Fix vgic_its_restore_collection_table returned value KVM: arm/arm64: vgic-its: Fix return value for device table restore arm/arm64: kvm: Disable branch profiling in HYP code arm/arm64: kvm: Move initialization completion message arm/arm64: KVM: set right LR register value for 32 bit guest when inject abort KVM: arm64: its: Fix missing dynamic allocation check in scan_its_table
2017-11-03arm64/sve: KVM: Prevent guests from using SVEDave Martin1-0/+3
Until KVM has full SVE support, guests must not be allowed to execute SVE instructions. This patch enables the necessary traps, and also ensures that the traps are disabled again on exit from the guest so that the host can still use SVE if it wants to. On guest exit, high bits of the SVE Zn registers may have been clobbered as a side-effect the execution of FPSIMD instructions in the guest. The existing KVM host FPSIMD restore code is not sufficient to restore these bits, so this patch explicitly marks the CPU as not containing cached vector state for any task, thus forcing a reload on the next return to userspace. This is an interim measure, in advance of adding full SVE awareness to KVM. This marking of cached vector state in the CPU as invalid is done using __this_cpu_write(fpsimd_last_state, NULL) in fpsimd.c. Due to the repeated use of this rather obscure operation, it makes sense to factor it out as a separate helper with a clearer name. This patch factors it out as fpsimd_flush_cpu_state(), and ports all callers to use it. As a side effect of this refactoring, a this_cpu_write() in fpsimd_cpu_pm_notifier() is changed to __this_cpu_write(). This should be fine, since cpu_pm_enter() is supposed to be called only with interrupts disabled. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman6-0/+6
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-29KVM: arm/arm64: vgic-its: Check GITS_BASER Valid bit before saving tablesEric Auger1-9/+11
At the moment we don't properly check the GITS_BASER<n>.Valid bit before saving the collection and device tables. On vgic_its_save_collection_table() we use the GITS_BASER gpa field whereas the Valid bit should be used. On vgic_its_save_device_tables() there is no check. This can cause various bugs, among which a subsequent fault when accessing the table in guest memory. Let's systematically check the Valid bit before doing anything. We also uniformize the code between save and restore. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-10-29KVM: arm/arm64: vgic-its: Check CBASER/BASER validity before enabling the ITSEric Auger1-0/+11
The spec says it is UNPREDICTABLE to enable the ITS if any of the following conditions are true: - GITS_CBASER.Valid == 0. - GITS_BASER<n>.Valid == 0, for any GITS_BASER<n> register where the Type field indicates Device. - GITS_BASER<n>.Valid == 0, for any GITS_BASER<n> register where the Type field indicates Interrupt Collection and GITS_TYPER.HCC == 0. In that case, let's keep the ITS disabled. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reported-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-10-29KVM: arm/arm64: vgic-its: Fix vgic_its_restore_collection_table returned valueEric Auger1-0/+4
vgic_its_restore_cte returns +1 if the collection table entry is valid and properly decoded. As a consequence, if the collection table is fully filled with valid data that are decoded without error, vgic_its_restore_collection_table() returns +1. This is wrong. Let's return 0 in that case. Fixes: ea1ad53e1e31a3 (KVM: arm64: vgic-its: Collection table save/restore) Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-10-29KVM: arm/arm64: vgic-its: Fix return value for device table restorewanghaibin1-5/+15
If ITT only contains invalid entries, vgic_its_restore_itt returns 1 and this is considered as an an error in vgic_its_restore_dte. Also in case the device table only contains invalid entries, the table restore fails and this is not correct. This patch fixes those 2 issues: - vgic_its_restore_itt now returns <= 0 values. If all ITEs are invalid, this is considered as successful. - vgic_its_restore_device_tables also returns <= 0 values. We also simplify the returned value computation in handle_l1_dte. Signed-off-by: wanghaibin <wanghaibin.wang@huawei.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-10-25locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns ↵Mark Rutland1-1/+1
to READ_ONCE()/WRITE_ONCE() Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-21arm/arm64: kvm: Move initialization completion messageJulien Thierry1-17/+14
KVM is being a bit too optimistic, Hyp mode is said to be initialized when Hyp segments have only been mapped. Notify KVM's successful initialization only once it is really fully initialized. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-10-13KVM: arm64: its: Fix missing dynamic allocation check in scan_its_tableChristoffer Dall1-11/+7
We currently allocate an entry dynamically, but we never check if the allocation actually succeeded. We actually don't need a dynamic allocation, because we know the maximum size of an ITS table entry, so we can simply use an allocation on the stack. Cc: <stable@vger.kernel.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-10-12kvm, mm: account kvm related kmem slabs to kmemcgShakeel Butt1-1/+1
The kvm slabs can consume a significant amount of system memory and indeed in our production environment we have observed that a lot of machines are spending significant amount of memory that can not be left as system memory overhead. Also the allocations from these slabs can be triggered directly by user space applications which has access to kvm and thus a buggy application can leak such memory. So, these caches should be accounted to kmemcg. Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-20get_compat_sigset()Al Viro1-5/+2
similar to put_compat_sigset() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-19Revert "KVM: Don't accept obviously wrong gsi values via KVM_IRQFD"Jan H. Schönherr1-2/+0
This reverts commit 36ae3c0a36b7456432fedce38ae2f7bd3e01a563. The commit broke compilation on !CONFIG_HAVE_KVM_IRQ_ROUTING. Also, there may be cases with CONFIG_HAVE_KVM_IRQ_ROUTING, where larger gsi values make sense. As the commit was meant as an early indicator to user space that something is wrong, reverting just restores the previous behavior where overly large values are ignored when encountered (without any direct feedback). Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-15kvm: Serialize wq active checks in kvm_vcpu_wake_up()Davidlohr Bueso1-1/+1
This is a generic call and can be suceptible to races in reading the wq task_list while another task is adding itself to the list. Add a full barrier by using the swq_has_sleeper() helper. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15kvm,async_pf: Use swq_has_sleeper()Davidlohr Bueso1-5/+1
... as we've got the new helper now. This caller already does the right thing, hence no changes in semantics. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-15KVM: Don't accept obviously wrong gsi values via KVM_IRQFDJan H. Schönherr1-0/+2
We cannot add routes for gsi values >= KVM_MAX_IRQ_ROUTES -- see kvm_set_irq_routing(). Hence, there is no sense in accepting them via KVM_IRQFD. Prevent them from entering the system in the first place. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-13KVM: fix rcu warning on VM_CREATE errorsChristian Borntraeger1-0/+1
commit 3898da947bba ("KVM: avoid using rcu_dereference_protected") can trigger the following lockdep/rcu splat if the VM_CREATE ioctl fails, for example if kvm_arch_init_vm fails: WARNING: suspicious RCU usage 4.13.0+ #105 Not tainted ----------------------------- ./include/linux/kvm_host.h:481 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by qemu-system-s39/79. stack backtrace: CPU: 0 PID: 79 Comm: qemu-system-s39 Not tainted 4.13.0+ #105 Hardware name: IBM 2964 NC9 704 (KVM/Linux) Call Trace: ([<00000000001140b2>] show_stack+0xea/0xf0) [<00000000008a68a4>] dump_stack+0x94/0xd8 [<0000000000134c12>] kvm_dev_ioctl+0x372/0x7a0 [<000000000038f940>] do_vfs_ioctl+0xa8/0x6c8 [<0000000000390004>] SyS_ioctl+0xa4/0xb8 [<00000000008c7a8c>] system_call+0xc4/0x27c no locks held by qemu-system-s39/79. We have to reset the just created users_count back to 0 to tell the check to not trigger. Reported-by: Stefan Haberland <sth@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: 3898da947bba ("KVM: avoid using rcu_dereference_protected") Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-09Merge tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds7-40/+89
Pull KVM updates from Radim Krčmář: "First batch of KVM changes for 4.14 Common: - improve heuristic for boosting preempted spinlocks by ignoring VCPUs in user mode ARM: - fix for decoding external abort types from guests - added support for migrating the active priority of interrupts when running a GICv2 guest on a GICv3 host - minor cleanup PPC: - expose storage keys to userspace - merge kvm-ppc-fixes with a fix that missed 4.13 because of vacations - fixes s390: - merge of kvm/master to avoid conflicts with additional sthyi fixes - wire up the no-dat enhancements in KVM - multiple epoch facility (z14 feature) - Configuration z/Architecture Mode - more sthyi fixes - gdb server range checking fix - small code cleanups x86: - emulate Hyper-V TSC frequency MSRs - add nested INVPCID - emulate EPTP switching VMFUNC - support Virtual GIF - support 5 level page tables - speedup nested VM exits by packing byte operations - speedup MMIO by using hardware provided physical address - a lot of fixes and cleanups, especially nested" * tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits) KVM: arm/arm64: Support uaccess of GICC_APRn KVM: arm/arm64: Extract GICv3 max APRn index calculation KVM: arm/arm64: vITS: Drop its_ite->lpi field KVM: arm/arm64: vgic: constify seq_operations and file_operations KVM: arm/arm64: Fix guest external abort matching KVM: PPC: Book3S HV: Fix memory leak in kvm_vm_ioctl_get_htab_fd KVM: s390: vsie: cleanup mcck reinjection KVM: s390: use WARN_ON_ONCE only for checking KVM: s390: guestdbg: fix range check KVM: PPC: Book3S HV: Report storage key support to userspace KVM: PPC: Book3S HV: Fix case where HDEC is treated as 32-bit on POWER9 KVM: PPC: Book3S HV: Fix invalid use of register expression KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation KVM: PPC: Book3S HV: Fix setting of storage key in H_ENTER KVM: PPC: e500mc: Fix a NULL dereference KVM: PPC: e500: Fix some NULL dereferences on error KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list KVM: s390: we are always in czam mode KVM: s390: expose no-DAT to guest and migration support KVM: s390: sthyi: remove invalid guest write access ...
2017-09-05KVM: arm/arm64: Support uaccess of GICC_APRnChristoffer Dall1-1/+46
When migrating guests around we need to know the active priorities to ensure functional virtual interrupt prioritization by the GIC. This commit clarifies the API and how active priorities of interrupts in different groups are represented, and implements the accessor functions for the uaccess register range. We live with a slight layering violation in accessing GICv3 data structures from vgic-mmio-v2.c, because anything else just adds too much complexity for us to deal with (it's not like there's a benefit elsewhere in the code of an intermediate representation as is the case with the VMCR). We accept this, because while doing v3 processing from a file named something-v2.c can look strange at first, this really is specific to dealing with the user space interface for something that looks like a GICv2. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-09-05KVM: arm/arm64: Extract GICv3 max APRn index calculationChristoffer Dall1-0/+16
As we are about to access the APRs from the GICv2 uaccess interface, make this logic generally available. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-09-05KVM: arm/arm64: vITS: Drop its_ite->lpi fieldMarc Zyngier1-6/+4
For unknown reasons, the its_ite data structure carries an "lpi" field which contains the intid of the LPI. This is an obvious duplication of the vgic_irq->intid field, so let's fix the only user and remove the now useless field. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-09-05KVM: arm/arm64: vgic: constify seq_operations and file_operationsArvind Yadav1-2/+2
vgic_debug_seq_ops and file_operations are not supposed to change at runtime and none of the structures is modified. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-09-05KVM: arm/arm64: Fix guest external abort matchingJames Morse1-29/+11
The ARM-ARM has two bits in the ESR/HSR relevant to external aborts. A range of {I,D}FSC values (of which bit 5 is always set) and bit 9 'EA' which provides: > an IMPLEMENTATION DEFINED classification of External Aborts. This bit is in addition to the {I,D}FSC range, and has an implementation defined meaning. KVM should always ignore this bit when handling external aborts from a guest. Remove the ESR_ELx_EA definition and rewrite its helper kvm_vcpu_dabt_isextabt() to check the {I,D}FSC range. This merges kvm_vcpu_dabt_isextabt() and the recently added is_abort_sea() helper. CC: Tyler Baicar <tbaicar@codeaurora.org> Reported-by: gengdongjiu <gengdj.1984@gmail.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-09-01KVM: update to new mmu_notifier semantic v2Jérôme Glisse1-42/+0
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Changed since v1 (Linus Torvalds) - remove now useless kvm_arch_mmu_notifier_invalidate_page() Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Tested-by: Mike Galbraith <efault@gmx.de> Tested-by: Adam Borowski <kilobyte@angband.pl> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-15kvm: avoid uninitialized-variable warningsArnd Bergmann1-1/+2
When PAGE_OFFSET is not a compile-time constant, we run into warnings from the use of kvm_is_error_hva() that the compiler cannot optimize out: arch/arm/kvm/../../../virt/kvm/kvm_main.c: In function '__kvm_gfn_to_hva_cache_init': arch/arm/kvm/../../../virt/kvm/kvm_main.c:1978:14: error: 'nr_pages_avail' may be used uninitialized in this function [-Werror=maybe-uninitialized] arch/arm/kvm/../../../virt/kvm/kvm_main.c: In function 'gfn_to_page_many_atomic': arch/arm/kvm/../../../virt/kvm/kvm_main.c:1660:5: error: 'entry' may be used uninitialized in this function [-Werror=maybe-uninitialized] This adds fake initializations to the two instances I ran into. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-08KVM: arm: implements the kvm_arch_vcpu_in_kernel()Longpeng(Mike)1-1/+1
This implements the kvm_arch_vcpu_in_kernel() for ARM, and adjusts the calls to kvm_vcpu_on_spin(). Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-08KVM: add spinlock optimization frameworkLongpeng(Mike)2-1/+8
If a vcpu exits due to request a user mode spinlock, then the spinlock-holder may be preempted in user mode or kernel mode. (Note that not all architectures trap spin loops in user mode, only AMD x86 and ARM/ARM64 currently do). But if a vcpu exits in kernel mode, then the holder must be preempted in kernel mode, so we should choose a vcpu in kernel mode as a more likely candidate for the lock holder. This introduces kvm_arch_vcpu_in_kernel() to decide whether the vcpu is in kernel-mode when it's preempted. kvm_vcpu_on_spin's new argument says the same of the spinning VCPU. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-03Merge tag 'kvm-arm-for-v4.13-rc4' of ↵Radim Krčmář5-33/+22
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM Fixes for v4.13-rc4 - Yet another race with VM destruction plugged - A set of small vgic fixes
2017-08-03KVM: arm/arm64: vgic: Use READ_ONCE fo cmpxchgChristoffer Dall1-2/+2
There is a small chance that the compiler could generate separate loads for the dist->propbaser which could be modified from another CPU. As we want to make sure we atomically update the entire value, and don't race with other updates, guarantee that the cmpxchg operation compares against the original value. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-08-02KVM: avoid using rcu_dereference_protectedPaolo Bonzini1-7/+4
During teardown, accesses to memslots and buses are using rcu_dereference_protected with an always-true condition because these accesses are done outside the usual mutexes. This is because the last reference is gone and there cannot be any concurrent modifications, but rcu_dereference_protected is ugly and unobvious. Instead, check the refcount in kvm_get_bus and __kvm_memslots. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-26KVM: make pid available for uevents without debugfsClaudio Imbrenda1-23/+12
Simplify and improve the code so that the PID is always available in the uevent even when debugfs is not available. This adds a userspace_pid field to struct kvm, as per Radim's suggestion, so that the PID can be retrieved on destruction too. Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com> Fixes: 286de8f6ac9202 ("KVM: trigger uevents when creating or destroying a VM") Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-25KVM: arm/arm64: Handle hva aging while destroying the vmSuzuki K Poulose1-0/+4
The mmu_notifier_release() callback of KVM triggers cleaning up the stage2 page table on kvm-arm. However there could be other notifier callbacks in parallel with the mmu_notifier_release(), which could cause the call backs ending up in an empty stage2 page table. Make sure we check it for all the notifier callbacks. Cc: stable@vger.kernel.org Fixes: commit 293f29363 ("kvm-arm: Unmap shadow pagetables properly") Reported-by: Alex Graf <agraf@suse.de> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-07-25KVM: arm/arm64: PMU: Fix overflow interrupt injectionAndrew Jones1-28/+15
kvm_pmu_overflow_set() is called from perf's interrupt handler, making the call of kvm_vgic_inject_irq() from it introduced with "KVM: arm/arm64: PMU: remove request-less vcpu kick" a really bad idea, as it's quite easy to try and retake a lock that the interrupted context is already holding. The fix is to use a vcpu kick, leaving the interrupt injection to kvm_pmu_sync_hwstate(), like it was doing before the refactoring. We don't just revert, though, because before the kick was request-less, leaving the vcpu exposed to the request-less vcpu kick race, and also because the kick was used unnecessarily from register access handlers. Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-07-25KVM: arm/arm64: Fix bug in advertising KVM_CAP_MSI_DEVID capabilityShanker Donthineni2-3/+1
Commit 0e4e82f154e3 ("KVM: arm64: vgic-its: Enable ITS emulation as a virtual MSI controller") tried to advertise KVM_CAP_MSI_DEVID, but the code logic was not updating the dist->msis_require_devid field correctly. If hypervisor tool creates the ITS device after VGIC initialization then we don't advertise KVM_CAP_MSI_DEVID capability. Update the field msis_require_devid to true inside vgic_its_create() to fix the issue. Fixes: 0e4e82f154e3 ("vgic-its: Enable ITS emulation as a virtual MSI controller") Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-07-15Merge tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-26/+115
Pull more KVM updates from Radim Krčmář: "Second batch of KVM updates for v4.13 Common: - add uevents for VM creation/destruction - annotate and properly access RCU-protected objects s390: - rename IOCTL added in the first v4.13 merge x86: - emulate VMLOAD VMSAVE feature in SVM - support paravirtual asynchronous page fault while nested - add Hyper-V userspace interfaces for better migration - improve master clock corner cases - extend internal error reporting after EPT misconfig - correct single-stepping of emulated instructions in SVM - handle MCE during VM entry - fix nVMX VM entry checks and nVMX VMCS shadowing" * tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) kvm: x86: hyperv: make VP_INDEX managed by userspace KVM: async_pf: Let guest support delivery of async_pf from guest mode KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf KVM: async_pf: Add L1 guest async_pf #PF vmexit handler KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2 KVM: x86: make backwards_tsc_observed a per-VM variable KVM: trigger uevents when creating or destroying a VM KVM: SVM: Enable Virtual VMLOAD VMSAVE feature KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition KVM: SVM: Rename lbr_ctl field in the vmcb control area KVM: SVM: Prepare for new bit definition in lbr_ctl KVM: SVM: handle singlestep exception when skipping emulated instructions KVM: x86: take slots_lock in kvm_free_pit KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition kvm: vmx: Properly handle machine check during VM-entry KVM: x86: update master clock before computing kvmclock_offset kvm: nVMX: Shadow "high" parts of shadowed 64-bit VMCS fields kvm: nVMX: Fix nested_vmx_check_msr_bitmap_controls kvm: nVMX: Validate the I/O bitmaps on nested VM-entry ...
2017-07-13Merge tag 'vfio-v4.13-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds1-15/+25
Pull VFIO updates from Alex Williamson: - Include Intel XXV710 in INTx workaround (Alex Williamson) - Make use of ERR_CAST() for error return (Dan Carpenter) - Fix vfio_group release deadlock from iommu notifier (Alex Williamson) - Unset KVM-VFIO attributes only on group match (Alex Williamson) - Fix release path group/file matching with KVM-VFIO (Alex Williamson) - Remove unnecessary lock uses triggering lockdep splat (Alex Williamson) * tag 'vfio-v4.13-rc1' of git://github.com/awilliam/linux-vfio: vfio: Remove unnecessary uses of vfio_container.group_lock vfio: New external user group/file match kvm-vfio: Decouple only when we match a group vfio: Fix group release deadlock vfio: Use ERR_CAST() instead of open coding it vfio/pci: Add Intel XXV710 to hidden INTx devices
2017-07-12KVM: trigger uevents when creating or destroying a VMClaudio Imbrenda1-0/+69
This patch adds a few lines to the KVM common code to fire a KOBJ_CHANGE uevent whenever a KVM VM is created or destroyed. The event carries five environment variables: CREATED indicates how many times a new VM has been created. It is useful for example to trigger specific actions when the first VM is started COUNT indicates how many VMs are currently active. This can be used for logging or monitoring purposes PID has the pid of the KVM process that has been started or stopped. This can be used to perform process-specific tuning. STATS_PATH contains the path in debugfs to the directory with all the runtime statistics for this VM. This is useful for performance monitoring and profiling. EVENT described the type of event, its value can be either "create" or "destroy" Specific udev rules can be then set up in userspace to deal with the creation or destruction of VMs as needed. Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-10Merge branch 'annotations' of ↵Paolo Bonzini3-17/+31
git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux into kvm-master
2017-07-10kvm: avoid unused variable warning for UP buildsPaolo Bonzini1-9/+15
The uniprocessor version of smp_call_function_many does not evaluate all of its argument, and the compiler emits a warning about "wait" being unused. This breaks the build on architectures for which "-Werror" is enabled by default. Work around it by moving the invocation of smp_call_function_many to its own inline function. Reported-by: Paul Mackerras <paulus@ozlabs.org> Cc: stable@vger.kernel.org Fixes: 7a97cec26b94c909f4cbad2dc3186af3e457a522 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07KVM: mark memory slots as rcuChristian Borntraeger1-2/+4
we access the memslots array via srcu. Mark it as such and use the right access functions also for the freeing of memory slots. Found by sparse: ./include/linux/kvm_host.h:565:16: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07KVM: mark kvm->busses as rcu protectedChristian Borntraeger2-10/+15
mark kvm->busses as rcu protected and use the correct access function everywhere. found by sparse virt/kvm/kvm_main.c:3490:15: error: incompatible types in comparison expression (different address spaces) virt/kvm/kvm_main.c:3509:15: error: incompatible types in comparison expression (different address spaces) virt/kvm/kvm_main.c:3561:15: error: incompatible types in comparison expression (different address spaces) virt/kvm/kvm_main.c:3644:15: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-07-07KVM: use rcu access function for irq routingChristian Borntraeger1-1/+1
irq routing is rcu protected. Use the proper access functions. Found by sparse virt/kvm/irqchip.c:233:13: warning: incorrect type in assignment (different address spaces) virt/kvm/irqchip.c:233:13: expected struct kvm_irq_routing_table *old virt/kvm/irqchip.c:233:13: got struct kvm_irq_routing_table [noderef] <asn:4>*irq_routing Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07KVM: mark vcpu->pid pointer as rcu protectedChristian Borntraeger1-4/+11
We do use rcu to protect the pid pointer. Mark it as such and adopt all code to use the proper access methods. This was detected by sparse. "virt/kvm/kvm_main.c:2248:15: error: incompatible types in comparison expression (different address spaces)" Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2017-07-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds15-150/+1297
Pull KVM updates from Paolo Bonzini: "PPC: - Better machine check handling for HV KVM - Ability to support guests with threads=2, 4 or 8 on POWER9 - Fix for a race that could cause delayed recognition of signals - Fix for a bug where POWER9 guests could sleep with interrupts pending. ARM: - VCPU request overhaul - allow timer and PMU to have their interrupt number selected from userspace - workaround for Cavium erratum 30115 - handling of memory poisonning - the usual crop of fixes and cleanups s390: - initial machine check forwarding - migration support for the CMMA page hinting information - cleanups and fixes x86: - nested VMX bugfixes and improvements - more reliable NMI window detection on AMD - APIC timer optimizations Generic: - VCPU request overhaul + documentation of common code patterns - kvm_stat improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits) Update my email address kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12 kvm: x86: mmu: allow A/D bits to be disabled in an mmu x86: kvm: mmu: make spte mmio mask more explicit x86: kvm: mmu: dead code thanks to access tracking KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code KVM: PPC: Book3S HV: Close race with testing for signals on guest entry KVM: PPC: Book3S HV: Simplify dynamic micro-threading code KVM: x86: remove ignored type attribute KVM: LAPIC: Fix lapic timer injection delay KVM: lapic: reorganize restart_apic_timer KVM: lapic: reorganize start_hv_timer kvm: nVMX: Check memory operand to INVVPID KVM: s390: Inject machine check into the nested guest KVM: s390: Inject machine check into the guest tools/kvm_stat: add new interactive command 'b' tools/kvm_stat: add new command line switch '-i' tools/kvm_stat: fix error on interactive command 'g' KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit ...
2017-07-06Merge tag 'arm64-upstream' of ↵Linus Torvalds1-3/+33
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: - RAS reporting via GHES/APEI (ACPI) - Indirect ftrace trampolines for modules - Improvements to kernel fault reporting - Page poisoning - Sigframe cleanups and preparation for SVE context - Core dump fixes - Sparse fixes (mainly relating to endianness) - xgene SoC PMU v3 driver - Misc cleanups and non-critical fixes * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits) arm64: fix endianness annotation for 'struct jit_ctx' and friends arm64: cpuinfo: constify attribute_group structures. arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set() arm64: ptrace: Remove redundant overrun check from compat_vfp_set() arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails arm64: fix endianness annotation for __apply_alternatives()/get_alt_insn() arm64: fix endianness annotation in get_kaslr_seed() arm64: add missing conversion to __wsum in ip_fast_csum() arm64: fix endianness annotation in acpi_parking_protocol.c arm64: use readq() instead of readl() to read 64bit entry_point arm64: fix endianness annotation for reloc_insn_movw() & reloc_insn_imm() arm64: fix endianness annotation for aarch64_insn_write() arm64: fix endianness annotation in aarch64_insn_read() arm64: fix endianness annotation in call_undef_hook() arm64: fix endianness annotation for debug-monitors.c ras: mark stub functions as 'inline' arm64: pass endianness info to sparse arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernels arm64: signal: Allow expansion of the signal frame acpi: apei: check for pending errors when probing GHES entries ...