summaryrefslogtreecommitdiff
path: root/arch/arm/kvm
AgeCommit message (Collapse)AuthorFilesLines
2016-09-09kvm-arm: Unmap shadow pagetables properlySuzuki K Poulose2-2/+1
On arm/arm64, we depend on the kvm_unmap_hva* callbacks (via mmu_notifiers::invalidate_*) to unmap the stage2 pagetables when the userspace buffer gets unmapped. However, when the Hypervisor process exits without explicit unmap of the guest buffers, the only notifier we get is kvm_arch_flush_shadow_all() (via mmu_notifier::release ) which does nothing on arm. Later this causes us to access pages that were already released [via exit_mmap() -> unmap_vmas()] when we actually get to unmap the stage2 pagetable [via kvm_arch_destroy_vm() -> kvm_free_stage2_pgd()]. This triggers crashes with CONFIG_DEBUG_PAGEALLOC, which unmaps any free'd pages from the linear map. [ 757.644120] Unable to handle kernel paging request at virtual address ffff800661e00000 [ 757.652046] pgd = ffff20000b1a2000 [ 757.655471] [ffff800661e00000] *pgd=00000047fffe3003, *pud=00000047fcd8c003, *pmd=00000047fcc7c003, *pte=00e8004661e00712 [ 757.666492] Internal error: Oops: 96000147 [#3] PREEMPT SMP [ 757.672041] Modules linked in: [ 757.675100] CPU: 7 PID: 3630 Comm: qemu-system-aar Tainted: G D 4.8.0-rc1 #3 [ 757.683240] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board, BIOS 3.06.15 Aug 19 2016 [ 757.692938] task: ffff80069cdd3580 task.stack: ffff8006adb7c000 [ 757.698840] PC is at __flush_dcache_area+0x1c/0x40 [ 757.703613] LR is at kvm_flush_dcache_pmd+0x60/0x70 [ 757.708469] pc : [<ffff20000809dbdc>] lr : [<ffff2000080b4a70>] pstate: 20000145 ... [ 758.357249] [<ffff20000809dbdc>] __flush_dcache_area+0x1c/0x40 [ 758.363059] [<ffff2000080b6748>] unmap_stage2_range+0x458/0x5f0 [ 758.368954] [<ffff2000080b708c>] kvm_free_stage2_pgd+0x34/0x60 [ 758.374761] [<ffff2000080b2280>] kvm_arch_destroy_vm+0x20/0x68 [ 758.380570] [<ffff2000080aa330>] kvm_put_kvm+0x210/0x358 [ 758.385860] [<ffff2000080aa524>] kvm_vm_release+0x2c/0x40 [ 758.391239] [<ffff2000082ad234>] __fput+0x114/0x2e8 [ 758.396096] [<ffff2000082ad46c>] ____fput+0xc/0x18 [ 758.400869] [<ffff200008104658>] task_work_run+0x108/0x138 [ 758.406332] [<ffff2000080dc8ec>] do_exit+0x48c/0x10e8 [ 758.411363] [<ffff2000080dd5fc>] do_group_exit+0x6c/0x130 [ 758.416739] [<ffff2000080ed924>] get_signal+0x284/0xa18 [ 758.421943] [<ffff20000808a098>] do_signal+0x158/0x860 [ 758.427060] [<ffff20000808aad4>] do_notify_resume+0x6c/0x88 [ 758.432608] [<ffff200008083624>] work_pending+0x10/0x14 [ 758.437812] Code: 9ac32042 8b010001 d1000443 8a230000 (d50b7e20) This patch fixes the issue by moving the kvm_free_stage2_pgd() to kvm_arch_flush_shadow_all(). Cc: <stable@vger.kernel.org> # 3.9+ Tested-by: Itaru Kitayama <itaru.kitayama@riken.jp> Reported-by: Itaru Kitayama <itaru.kitayama@riken.jp> Reported-by: James Morse <james.morse@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-06arm: KVM: Fix idmap overlap detection when the kernel is idmap'edMarc Zyngier1-1/+2
We're trying hard to detect when the HYP idmap overlaps with the HYP va, as it makes the teardown of a cpu dangerous. But there is one case where an overlap is completely safe, which is when the whole of the kernel is idmap'ed, which is likely to happen on 32bit when RAM is at 0x8000000 and we're using a 2G/2G VA split. In that case, we can proceed safely. Reported-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-08-18Merge tag 'kvm-arm-for-v4.8-rc3' of ↵Paolo Bonzini1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Fixes for v4.8-rc3 This tag contains the following fixes on top of v4.8-rc1: - ITS init issues - ITS error handling issues - ITS IRQ leakage fix - Plug a couple of ITS race conditions - An erratum workaround for timers - Some removal of misleading use of errors and comments - A fix for GICv3 on 32-bit guests
2016-08-17KVM: arm/arm64: Change misleading use of is_error_pfnChristoffer Dall1-1/+1
When converting a gfn to a pfn, we call gfn_to_pfn_prot, which returns various kinds of error values. It turns out that is_error_pfn() only returns true when the gfn was found in a memory slot and could somehow not be used, but it does not return true if the gfn does not belong to any memory slot. Change use to is_error_noslot_pfn() which covers both cases. Note: Since we already check for kvm_is_error_hva(hva) explicitly in the caller of this function while holding the kvm->srcu lock protecting the memory slots, this should never be a problem, but nevertheless this change is warranted as it shows the intention of the code. Reported-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-08-12KVM: Protect device ops->create and list_add with kvm->lockChristoffer Dall1-1/+5
KVM devices were manipulating list data structures without any form of synchronization, and some implementations of the create operations also suffered from a lack of synchronization. Now when we've split the xics create operation into create and init, we can hold the kvm->lock mutex while calling the create operation and when manipulating the devices list. The error path in the generic code gets slightly ugly because we have to take the mutex again and delete the device from the list, but holding the mutex during anon_inode_getfd or releasing/locking the mutex in the common non-error path seemed wrong. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-04Merge tag 'kvm-arm-for-4.8-take2' of ↵Paolo Bonzini3-0/+22
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Changes for v4.8 - Take 2 Includes GSI routing support to go along with the new VGIC and a small fix that has been cooking in -next for a while.
2016-08-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds8-159/+104
Pull KVM updates from Paolo Bonzini: - ARM: GICv3 ITS emulation and various fixes. Removal of the old VGIC implementation. - s390: support for trapping software breakpoints, nested virtualization (vSIE), the STHYI opcode, initial extensions for CPU model support. - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups, preliminary to this and the upcoming support for hardware virtualization extensions. - x86: support for execute-only mappings in nested EPT; reduced vmexit latency for TSC deadline timer (by about 30%) on Intel hosts; support for more than 255 vCPUs. - PPC: bugfixes. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits) KVM: PPC: Introduce KVM_CAP_PPC_HTM MIPS: Select HAVE_KVM for MIPS64_R{2,6} MIPS: KVM: Reset CP0_PageMask during host TLB flush MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX() MIPS: KVM: Sign extend MFC0/RDHWR results MIPS: KVM: Fix 64-bit big endian dynamic translation MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase MIPS: KVM: Use 64-bit CP0_EBase when appropriate MIPS: KVM: Set CP0_Status.KX on MIPS64 MIPS: KVM: Make entry code MIPS64 friendly MIPS: KVM: Use kmap instead of CKSEG0ADDR() MIPS: KVM: Use virt_to_phys() to get commpage PFN MIPS: Fix definition of KSEGX() for 64-bit KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD kvm: x86: nVMX: maintain internal copy of current VMCS KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures KVM: arm64: vgic-its: Simplify MAPI error handling KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers KVM: arm64: vgic-its: Turn device_id validation into generic ID validation ...
2016-07-22Merge tag 'kvm-arm-for-4.8' of ↵Radim Krčmář5-151/+96
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next KVM/ARM changes for Linux 4.8 - GICv3 ITS emulation - Simpler idmap management that fixes potential TLB conflicts - Honor the kernel protection in HYP mode - Removal of the old vgic implementation
2016-07-22KVM: arm/arm64: Enable irqchip routingEric Auger3-0/+22
This patch adds compilation and link against irqchip. Main motivation behind using irqchip code is to enable MSI routing code. In the future irqchip routing may also be useful when targeting multiple irqchips. Routing standard callbacks now are implemented in vgic-irqfd: - kvm_set_routing_entry - kvm_set_irq - kvm_set_msi They only are supported with new_vgic code. Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined. KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed. So from now on IRQCHIP routing is enabled and a routing table entry must exist for irqfd injection to succeed for a given SPI. This patch builds a default flat irqchip routing table (gsi=irqchip.pin) covering all the VGIC SPI indexes. This routing table is overwritten by the first first user-space call to KVM_SET_GSI_ROUTING ioctl. MSI routing setup is not yet allowed. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-18KVM: arm64: vgic-its: Introduce new KVM ITS deviceAndre Przywara1-0/+1
Introduce a new KVM device that represents an ARM Interrupt Translation Service (ITS) controller. Since there can be multiple of this per guest, we can't piggy back on the existing GICv3 distributor device, but create a new type of KVM device. On the KVM_CREATE_DEVICE ioctl we allocate and initialize the ITS data structure and store the pointer in the kvm_device data. Upon an explicit init ioctl from userland (after having setup the MMIO address) we register the handlers with the kvm_io_bus framework. Any reference to an ITS thus has to go via this interface. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-18KVM: arm/arm64: Extend arch CAP checks to allow per-VM capabilitiesAndre Przywara1-1/+1
KVM capabilities can be a per-VM property, though ARM/ARM64 currently does not pass on the VM pointer to the architecture specific capability handlers. Add a "struct kvm*" parameter to those function to later allow proper per-VM capability reporting. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-07-04arm/arm64: Get rid of KERN_TO_HYPMarc Zyngier1-9/+9
We have both KERN_TO_HYP and kern_hyp_va, which do the exact same thing. Let's standardize on the latter. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-04arm/arm64: KVM: Check that IDMAP doesn't intersect with VA rangeMarc Zyngier1-0/+15
This is more of a safety measure than anything else: If we end-up with an idmap page that intersect with the range picked for the the HYP VA space, abort the KVM setup, as it is unsafe to go further. I cannot imagine it happening on 64bit (we have a mechanism to work around it), but could potentially occur on a 32bit system with the kernel loaded high enough in memory so that in conflicts with the kernel VA. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-04arm: KVM: Allow hyp teardownMarc Zyngier2-1/+17
So far, KVM was getting in the way of kexec on 32bit (and the arm64 kexec hackers couldn't be bothered to fix it on 32bit...). With simpler page tables, tearing KVM down becomes very easy, so let's just do it. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-04arm: KVM: Simplify HYP initMarc Zyngier1-40/+9
Just like for arm64, we can now make the HYP setup a lot simpler, and we can now initialise it in one go (instead of the two phases we currently have). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-04arm/arm64: KVM: Kill free_boot_hyp_pgdMarc Zyngier2-27/+7
There is no way to free the boot PGD, because it doesn't exist anymore as a standalone entity. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-04arm/arm64: KVM: Drop boot_pgdMarc Zyngier2-20/+3
Since we now only have one set of page tables, the concept of boot_pgd is useless and can be removed. We still keep it as an element of the "extended idmap" thing. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-04arm/arm64: KVM: Always have merged page tablesMarc Zyngier1-40/+34
We're in a position where we can now always have "merged" page tables, where both the runtime mapping and the idmap coexist. This results in some code being removed, but there is more to come. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-04arm/arm64: KVM: Export __hyp_text_start/end symbolsMarc Zyngier1-2/+0
Declare the __hyp_text_start/end symbols in asm/virt.h so that they can be reused without having to declare them locally. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-04KVM: arm/arm64: The GIC is dead, long live the GICMarc Zyngier2-13/+0
I don't think any single piece of the KVM/ARM code ever generated as much hatred as the GIC emulation. It was written by someone who had zero experience in modeling hardware (me), was riddled with design flaws, should have been scrapped and rewritten from scratch long before having a remote chance of reaching mainline, and yet we supported it for a good three years. No need to mention the names of those who suffered, the git log is singing their praises. Thankfully, we now have a much more maintainable implementation, and we can safely put the grumpy old GIC to rest. Fellow hackers, please raise your glass in memory of the GIC: The GIC is dead, long live the GIC! Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-07-01KVM: remove kvm_guest_enter/exit wrappersPaolo Bonzini1-4/+4
Use the functions from context_tracking.h directly. Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-29arm/arm64: KVM: Map the HYP text as read-onlyMarc Zyngier2-4/+4
There should be no reason for mapping the HYP text read/write. As such, let's have a new set of flags (PAGE_HYP_EXEC) that allows execution, but makes the page as read-only, and update the two call sites that deal with mapping code. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29arm/arm64: KVM: Enforce HYP read-only mapping of the kernel's rodata sectionMarc Zyngier1-1/+1
In order to be able to use C code in HYP, we're now mapping the kernel's rodata in HYP. It works absolutely fine, except that we're mapping it RWX, which is not what it should be. Add a new HYP_PAGE_RO protection, and pass it as the protection flags when mapping the rodata section. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-29arm/arm64: KVM: Add a protection parameter to create_hyp_mappingsMarc Zyngier2-8/+10
Currently, create_hyp_mappings applies a "one size fits all" page protection (PAGE_HYP). As we're heading towards separate protections for different sections, let's make this protection a parameter, and let the callers pass their prefered protection (PAGE_HYP for everyone for the time being). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-27KVM: arm/arm64: Stop leaking vcpu pid referencesJames Morse1-0/+1
kvm provides kvm_vcpu_uninit(), which amongst other things, releases the last reference to the struct pid of the task that was last running the vcpu. On arm64 built with CONFIG_DEBUG_KMEMLEAK, starting a guest with kvmtool, then killing it with SIGKILL results (after some considerable time) in: > cat /sys/kernel/debug/kmemleak > unreferenced object 0xffff80007d5ea080 (size 128): > comm "lkvm", pid 2025, jiffies 4294942645 (age 1107.776s) > hex dump (first 32 bytes): > 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > backtrace: > [<ffff8000001b30ec>] create_object+0xfc/0x278 > [<ffff80000071da34>] kmemleak_alloc+0x34/0x70 > [<ffff80000019fa2c>] kmem_cache_alloc+0x16c/0x1d8 > [<ffff8000000d0474>] alloc_pid+0x34/0x4d0 > [<ffff8000000b5674>] copy_process.isra.6+0x79c/0x1338 > [<ffff8000000b633c>] _do_fork+0x74/0x320 > [<ffff8000000b66b0>] SyS_clone+0x18/0x20 > [<ffff800000085cb0>] el0_svc_naked+0x24/0x28 > [<ffffffffffffffff>] 0xffffffffffffffff On x86 kvm_vcpu_uninit() is called on the path from kvm_arch_destroy_vm(), on arm no equivalent call is made. Add the call to kvm_arch_vcpu_free(). Signed-off-by: James Morse <james.morse@arm.com> Fixes: 749cf76c5a36 ("KVM: ARM: Initial skeleton to compile KVM support") Cc: <stable@vger.kernel.org> # 3.10+ Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-06-14KVM: ARM: Fix typosAndrea Gelmini4-4/+4
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-27/+52
Pull second batch of KVM updates from Radim Krčmář: "General: - move kvm_stat tool from QEMU repo into tools/kvm/kvm_stat (kvm_stat had nothing to do with QEMU in the first place -- the tool only interprets debugfs) - expose per-vm statistics in debugfs and support them in kvm_stat (KVM always collected per-vm statistics, but they were summarised into global statistics) x86: - fix dynamic APICv (VMX was improperly configured and a guest could access host's APIC MSRs, CVE-2016-4440) - minor fixes ARM changes from Christoffer Dall: - new vgic reimplementation of our horribly broken legacy vgic implementation. The two implementations will live side-by-side (with the new being the configured default) for one kernel release and then we'll remove the legacy one. - fix for a non-critical issue with virtual abort injection to guests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (70 commits) tools: kvm_stat: Add comments tools: kvm_stat: Introduce pid monitoring KVM: Create debugfs dir and stat files for each VM MAINTAINERS: Add kvm tools tools: kvm_stat: Powerpc related fixes tools: Add kvm_stat man page tools: Add kvm_stat vm monitor script kvm:vmx: more complete state update on APICv on/off KVM: SVM: Add more SVM_EXIT_REASONS KVM: Unify traced vector format svm: bitwise vs logical op typo KVM: arm/arm64: vgic-new: Synchronize changes to active state KVM: arm/arm64: vgic-new: enable build KVM: arm/arm64: vgic-new: implement mapped IRQ handling KVM: arm/arm64: vgic-new: Wire up irqfd injection KVM: arm/arm64: vgic-new: Add vgic_v2/v3_enable KVM: arm/arm64: vgic-new: vgic_init: implement map_resources KVM: arm/arm64: vgic-new: vgic_init: implement vgic_init KVM: arm/arm64: vgic-new: vgic_init: implement vgic_create KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init ...
2016-05-20KVM: arm/arm64: vgic-new: Synchronize changes to active stateChristoffer Dall1-1/+7
When modifying the active state of an interrupt via the MMIO interface, we should ensure that the write has the intended effect. If a guest sets an interrupt to active, but that interrupt is already flushed into a list register on a running VCPU, then that VCPU will write the active state back into the struct vgic_irq upon returning from the guest and syncing its state. This is a non-benign race, because the guest can observe that an interrupt is not active, and it can have a reasonable expectations that other VCPUs will not ack any IRQs, and then set the state to active, and expect it to stay that way. Currently we are not honoring this case. Thefore, change both the SACTIVE and CACTIVE mmio handlers to stop the world, change the irq state, potentially queue the irq if we're setting it to active, and then continue. We take this chance to slightly optimize these functions by not stopping the world when touching private interrupts where there is inherently no possible race. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: vgic-new: enable buildAndre Przywara2-0/+18
Now that the new VGIC implementation has reached feature parity with the old one, add the new files to the build system and add a Kconfig option to switch between the two versions. We set the default to the new version to get maximum test coverage, in case people experience problems they can switch back to the old behaviour if needed. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-20KVM: arm/arm64: Provide functionality to pause and resume a guestChristoffer Dall1-12/+13
For some rare corner cases in our VGIC emulation later we have to stop the guest to make sure the VGIC state is consistent. Provide the necessary framework to pause and resume a guest. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: Export mmio_read/write_busChristoffer Dall1-5/+5
Rename mmio_{read,write}_bus to kvm_mmio_{read,write}_bus and export them out of mmio.c. This will be needed later for the new VGIC implementation. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: Fix MMIO emulation data handlingChristoffer Dall1-7/+7
When the kernel was handling a guest MMIO read access internally, we need to copy the emulation result into the run->mmio structure in order for the kvm_handle_mmio_return() function to pick it up and inject the result back into the guest. Currently the only user of kvm_io_bus for ARM is the VGIC, which did this copying itself, so this was not causing issues so far. But with the upcoming new vgic implementation we need this done properly. Update the kvm_handle_mmio_return description and cleanup the code to only perform a single copying when needed. Code and commit message inspired by Andre Przywara. Reported-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com>
2016-05-20KVM: arm/arm64: Move timer IRQ map to latest possible timeChristoffer Dall1-3/+3
We are about to modify the VGIC to allocate all data structures dynamically and store mapped IRQ information on a per-IRQ struct, which is indeed allocated dynamically at init time. Therefore, we cannot record the mapped IRQ info from the timer at timer reset time like it's done now, because VCPU reset happens before timer init. A possible later time to do this is on the first run of a per VCPU, it just requires us to move the enable state to be a per-VCPU state and do the lookup of the physical IRQ number when we are about to run the VCPU. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
2016-05-19Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-188/+222
Pull KVM updates from Paolo Bonzini: "Small release overall. x86: - miscellaneous fixes - AVIC support (local APIC virtualization, AMD version) s390: - polling for interrupts after a VCPU goes to halted state is now enabled for s390 - use hardware provided information about facility bits that do not need any hypervisor activity, and other fixes for cpu models and facilities - improve perf output - floating interrupt controller improvements. MIPS: - miscellaneous fixes PPC: - bugfixes only ARM: - 16K page size support - generic firmware probing layer for timer and GIC Christoffer Dall (KVM-ARM maintainer) says: "There are a few changes in this pull request touching things outside KVM, but they should all carry the necessary acks and it made the merge process much easier to do it this way." though actually the irqchip maintainers' acks didn't make it into the patches. Marc Zyngier, who is both irqchip and KVM-ARM maintainer, later acked at http://mid.gmane.org/573351D1.4060303@arm.com ('more formally and for documentation purposes')" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (82 commits) KVM: MTRR: remove MSR 0x2f8 KVM: x86: make hwapic_isr_update and hwapic_irr_update look the same svm: Manage vcpu load/unload when enable AVIC svm: Do not intercept CR8 when enable AVIC svm: Do not expose x2APIC when enable AVIC KVM: x86: Introducing kvm_x86_ops.apicv_post_state_restore svm: Add VMEXIT handlers for AVIC svm: Add interrupt injection via AVIC KVM: x86: Detect and Initialize AVIC support svm: Introduce new AVIC VMCB registers KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks KVM: x86: Introducing kvm_x86_ops VM init/destroy hooks KVM: x86: Rename kvm_apic_get_reg to kvm_lapic_get_reg KVM: x86: Misc LAPIC changes to expose helper functions KVM: shrink halt polling even more for invalid wakeups KVM: s390: set halt polling to 80 microseconds KVM: halt_polling: provide a way to qualify wakeups during poll KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts kvm: Conditionally register IRQ bypass consumer ...
2016-05-17Merge tag 'arm64-upstream' of ↵Linus Torvalds2-47/+76
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: - virt_to_page/page_address optimisations - support for NUMA systems described using device-tree - support for hibernate/suspend-to-disk - proper support for maxcpus= command line parameter - detection and graceful handling of AArch64-only CPUs - miscellaneous cleanups and non-critical fixes * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits) arm64: do not enforce strict 16 byte alignment to stack pointer arm64: kernel: Fix incorrect brk randomization arm64: cpuinfo: Missing NULL terminator in compat_hwcap_str arm64: secondary_start_kernel: Remove unnecessary barrier arm64: Ensure pmd_present() returns false after pmd_mknotpresent() arm64: Replace hard-coded values in the pmd/pud_bad() macros arm64: Implement pmdp_set_access_flags() for hardware AF/DBM arm64: Fix typo in the pmdp_huge_get_and_clear() definition arm64: mm: remove unnecessary EXPORT_SYMBOL_GPL arm64: always use STRICT_MM_TYPECHECKS arm64: kvm: Fix kvm teardown for systems using the extended idmap arm64: kaslr: increase randomization granularity arm64: kconfig: drop CONFIG_RTC_LIB dependency arm64: make ARCH_SUPPORTS_DEBUG_PAGEALLOC depend on !HIBERNATION arm64: hibernate: Refuse to hibernate if the boot cpu is offline arm64: kernel: Add support for hibernate/suspend-to-disk PM / Hibernate: Call flush_icache_range() on pages restored in-place arm64: Add new asm macro copy_page arm64: Promote KERNEL_START/KERNEL_END definitions to a header file arm64: kernel: Include _AC definition in page.h ...
2016-05-09kvm: arm64: Enable hardware updates of the Access Flag for Stage 2 page tablesCatalin Marinas1-17/+29
The ARMv8.1 architecture extensions introduce support for hardware updates of the access and dirty information in page table entries. With VTCR_EL2.HA enabled (bit 21), when the CPU accesses an IPA with the PTE_AF bit cleared in the stage 2 page table, instead of raising an Access Flag fault to EL2 the CPU sets the actual page table entry bit (10). To ensure that kernel modifications to the page table do not inadvertently revert a bit set by hardware updates, certain Stage 2 software pte/pmd operations must be performed atomically. The main user of the AF bit is the kvm_age_hva() mechanism. The kvm_age_hva_handler() function performs a "test and clear young" action on the pte/pmd. This needs to be atomic in respect of automatic hardware updates of the AF bit. Since the AF bit is in the same position for both Stage 1 and Stage 2, the patch reuses the existing ptep_test_and_clear_young() functionality if __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG is defined. Otherwise, the existing pte_young/pte_mkold mechanism is preserved. The kvm_set_s2pte_readonly() (and the corresponding pmd equivalent) have to perform atomic modifications in order to avoid a race with updates of the AF bit. The arm64 implementation has been re-written using exclusives. Currently, kvm_set_s2pte_writable() (and pmd equivalent) take a pointer argument and modify the pte/pmd in place. However, these functions are only used on local variables rather than actual page table entries, so it makes more sense to follow the pte_mkwrite() approach for stage 1 attributes. The change to kvm_s2pte_mkwrite() makes it clear that these functions do not modify the actual page table entries. The (pte|pmd)_mkyoung() uses on Stage 2 entries (setting the AF bit explicitly) do not need to be modified since hardware updates of the dirty status are not supported by KVM, so there is no possibility of losing such information. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-05-06mm: thp: kvm: fix memory corruption in KVM with THP enabledAndrea Arcangeli1-1/+1
After the THP refcounting change, obtaining a compound pages from get_user_pages() no longer allows us to assume the entire compound page is immediately mappable from a secondary MMU. A secondary MMU doesn't want to call get_user_pages() more than once for each compound page, in order to know if it can map the whole compound page. So a secondary MMU needs to know from a single get_user_pages() invocation when it can map immediately the entire compound page to avoid a flood of unnecessary secondary MMU faults and spurious atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier users). Ideally instead of the page->_mapcount < 1 check, get_user_pages() should return the granularity of the "page" mapping in the "mm" passed to get_user_pages(). However it's non trivial change to pass the "pmd" status belonging to the "mm" walked by get_user_pages up the stack (up to the caller of get_user_pages). So the fix just checks if there is not a single pte mapping on the page returned by get_user_pages, and in turn if the caller can assume that the whole compound page is mapped in the current "mm" (in a pmd_trans_huge()). In such case the entire compound page is safe to map into the secondary MMU without additional get_user_pages() calls on the surrounding tail/head pages. In addition of being faster, not having to run other get_user_pages() calls also reduces the memory footprint of the secondary MMU fault in case the pmd split happened as result of memory pressure. Without this fix after a MADV_DONTNEED (like invoked by QEMU during postcopy live migration or balloning) or after generic swapping (with a failure in split_huge_page() that would only result in pmd splitting and not a physical page split), KVM would map the whole compound page into the shadow pagetables, despite regular faults or userfaults (like UFFDIO_COPY) may map regular pages into the primary MMU as result of the pte faults, leading to the guest mode and userland mode going out of sync and not working on the same memory at all times. Any other secondary MMU notifier manager (KVM is just one of the many MMU notifier users) will need the same information if it doesn't want to run a flood of get_user_pages_fast and it can support multiple granularity in the secondary MMU mappings, so I think it is justified to be exposed not just to KVM. The other option would be to move transparent_hugepage_adjust to mm/huge_memory.c but that currently has all kind of KVM data structures in it, so it's definitely not a cut-and-paste work, so I couldn't do a fix as cleaner as this one for 4.6. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: "Li, Liang Z" <liang.z.li@intel.com> Cc: Amit Shah <amit.shah@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-29arm/arm64: KVM: Enforce Break-Before-Make on Stage-2 page tablesMarc Zyngier1-6/+11
The ARM architecture mandates that when changing a page table entry from a valid entry to another valid entry, an invalid entry is first written, TLB invalidated, and only then the new entry being written. The current code doesn't respect this, directly writing the new entry and only then invalidating TLBs. Let's fix it up. Cc: <stable@vger.kernel.org> Reported-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-04-28arm64: kvm: allows kvm cpu hotplugAKASHI Takahiro2-48/+76
The current kvm implementation on arm64 does cpu-specific initialization at system boot, and has no way to gracefully shutdown a core in terms of kvm. This prevents kexec from rebooting the system at EL2. This patch adds a cpu tear-down function and also puts an existing cpu-init code into a separate function, kvm_arch_hardware_disable() and kvm_arch_hardware_enable() respectively. We don't need the arm64 specific cpu hotplug hook any more. Since this patch modifies common code between arm and arm64, one stub definition, __cpu_reset_hyp_mode(), is added on arm side to avoid compilation errors. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> [Rebase, added separate VHE init/exit path, changed resets use of kvm_call_hyp() to the __version, en/disabled hardware in init_subsystems(), added icache maintenance to __kvm_hyp_reset() and removed lr restore, removed guest-enter after teardown handling] Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-04-21kvm-arm: Cleanup stage2 pgd handlingSuzuki K Poulose2-32/+7
Now that we don't have any fake page table levels for arm64, cleanup the common code to get rid of the dead code. Cc: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21kvm-arm: Cleanup kvm_* wrappersSuzuki K Poulose1-8/+1
Now that we have switched to explicit page table routines, get rid of the obsolete kvm_* wrappers. Also, kvm_tlb_flush_vmid_by_ipa is now called only on stage2 page tables, hence get rid of the redundant check. Cc: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21kvm-arm: Add stage2 page table modifiersSuzuki K Poulose1-53/+44
Now that the hyp page table is handled by different set of routines, rename the original shared routines to stage2 handlers. Also make explicit use of the stage2 page table helpers. unmap_range has been merged to existing unmap_stage2_range. Cc: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21kvm-arm: Add explicit hyp page table modifiersSuzuki K Poulose1-5/+99
We have common routines to modify hyp and stage2 page tables based on the 'kvm' parameter. For a smoother transition to using separate routines for each, duplicate the routines and modify the copy to work on hyp. Marks the forked routines with _hyp_ and gets rid of the kvm parameter which is no longer needed and is NULL for hyp. Also, gets rid of calls to kvm_tlb_flush_by_vmid_ipa() calls from the hyp versions. Uses explicit host page table accessors instead of the kvm_* page table helpers. Suggested-by: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21kvm-arm: Use explicit stage2 helper routinesSuzuki K Poulose1-24/+24
We have stage2 page table helpers for both arm and arm64. Switch to the stage2 helpers for routines that only deal with stage2 page table. Cc: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21kvm-arm: Remove kvm_pud_huge()Suzuki K Poulose1-3/+1
Get rid of kvm_pud_huge() which falls back to pud_huge. Use pud_huge instead. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21kvm-arm: Replace kvm_pmd_huge with pmd_thp_or_hugeSuzuki K Poulose1-9/+8
Both arm and arm64 now provides a helper, pmd_thp_or_huge() to check if the given pmd represents a huge page. Use that instead of our own custom check. Suggested-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-21kvm arm: Move fake PGD handling to arch specific filesSuzuki K Poulose1-40/+7
Rearrange the code for fake pgd handling, which is applicable only for arm64. This will later be removed once we introduce the stage2 page table walker macros. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
2016-04-06arm64: KVM: unregister notifiers in hyp mode teardown pathSudeep Holla1-3/+10
Commit 1e947bad0b63 ("arm64: KVM: Skip HYP setup when already running in HYP") re-organized the hyp init code and ended up leaving the CPU hotplug and PM notifier even if hyp mode initialization fails. Since KVM is not yet supported with ACPI, the above mentioned commit breaks CPU hotplug in ACPI boot. This patch fixes teardown_hyp_mode to properly unregister both CPU hotplug and PM notifiers in the teardown path. Fixes: 1e947bad0b63 ("arm64: KVM: Skip HYP setup when already running in HYP") Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-31arm64: KVM: Register CPU notifiers when the kernel runs at HYPJames Morse1-19/+33
When the kernel is running at EL2, it doesn't need init_hyp_mode() to configure page tables for HYP. This function also registers the CPU hotplug and lower power notifiers that cause HYP to be re-initialised after the CPU has been reset. To avoid losing the register state that controls stage2 translation, move the registering of these notifiers into init_subsystems(), and add a is_kernel_in_hyp_mode() path to each callback. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Fixes: 1e947bad0b6 ("arm64: KVM: Skip HYP setup when already running in HYP") Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-03-21KVM: arm/arm64: disable preemption when calling smp_call_function_manyEric Auger1-0/+2
Preemption must be disabled when calling smp_call_function_many Reported-by: bartosz.wawrzyniak@tieto.com Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>