summaryrefslogtreecommitdiff
path: root/arch/x86/xen
AgeCommit message (Collapse)AuthorFilesLines
2014-01-06xen/mmu/p2m: Refactor the xen_pagetable_init code (v2).Konrad Rzeszutek Wilk1-33/+37
The revectoring and copying of the P2M only happens when !auto-xlat and on 64-bit builds. It is not obvious from the code, so lets have seperate 32 and 64-bit functions. We also invert the check for auto-xlat to make the code flow simpler. Suggested-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-06xen/pvh: Don't setup P2M tree.Konrad Rzeszutek Wilk2-4/+11
P2M is not available for PVH. Fortunatly for us the P2M code already has mostly the support for auto-xlat guest thanks to commit 3d24bbd7dddbea54358a9795abaf051b0f18973c "grant-table: call set_phys_to_machine after mapping grant refs" which: " introduces set_phys_to_machine calls for auto_translated guests (even on x86) in gnttab_map_refs and gnttab_unmap_refs. translated by swiotlb-xen... " so we don't need to muck much. with above mentioned "commit you'll get set_phys_to_machine calls from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do anything with them " (Stefano Stabellini) which is OK - we want them to be NOPs. This is because we assume that an "IOMMU is always present on the plaform and Xen is going to make the appropriate IOMMU pagetable changes in the hypercall implementation of GNTTABOP_map_grant_ref and GNTTABOP_unmap_grant_ref, then eveything should be transparent from PVH priviligied point of view and DMA transfers involving foreign pages keep working with no issues[sp] Otherwise we would need a P2M (and an M2P) for PVH priviligied to track these foreign pages .. (see arch/arm/xen/p2m.c)." (Stefano Stabellini). We still have to inhibit the building of the P2M tree. That had been done in the past by not calling xen_build_dynamic_phys_to_machine (which setups the P2M tree and gives us virtual address to access them). But we are missing a check for xen_build_mfn_list_list - which was continuing to setup the P2M tree and would blow up at trying to get the virtual address of p2m_missing (which would have been setup by xen_build_dynamic_phys_to_machine). Hence a check is needed to not call xen_build_mfn_list_list when running in auto-xlat mode. Instead of replicating the check for auto-xlat in enlighten.c do it in the p2m.c code. The reason is that the xen_build_mfn_list_list is called also in xen_arch_post_suspend without any checks for auto-xlat. So for PVH or PV with auto-xlat - we would needlessly allocate space for an P2M tree. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06xen/pvh: Early bootup changes in PV code (v4).Mukesh Rathor2-20/+46
We don't use the filtering that 'xen_cpuid' is doing because the hypervisor treats 'XEN_EMULATE_PREFIX' as an invalid instruction. This means that all of the filtering will have to be done in the hypervisor/toolstack. Without the filtering we expose to the guest the: - cpu topology (sockets, cores, etc); - the APERF (which the generic scheduler likes to use), see 5e626254206a709c6e937f3dda69bf26c7344f6f "xen/setup: filter APERFMPERF cpuid feature out" - and the inability to figure out whether MWAIT_LEAF should be exposed or not. See df88b2d96e36d9a9e325bfcd12eb45671cbbc937 "xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded." - x2apic, see 4ea9b9aca90cfc71e6872ed3522356755162932c "xen: mask x2APIC feature in PV" We also check for vector callback early on, as it is a required feature. PVH also runs at default kernel IOPL. Finally, pure PV settings are moved to a separate function that are only called for pure PV, ie, pv with pvmmu. They are also #ifdef with CONFIG_XEN_PVMMU. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2014-01-06xen/pvh/x86: Define what an PVH guest is (v3).Mukesh Rathor1-0/+5
Which is a PV guest with auto page translation enabled and with vector callback. It is a cross between PVHVM and PV. The Xen side defines PVH as (from docs/misc/pvh-readme.txt, with modifications): "* the guest uses auto translate: - p2m is managed by Xen - pagetables are owned by the guest - mmu_update hypercall not available * it uses event callback and not vlapic emulation, * IDT is native, so set_trap_table hcall is also N/A for a PVH guest. For a full list of hcalls supported for PVH, see pvh_hypercall64_table in arch/x86/hvm/hvm.c in xen. From the ABI prespective, it's mostly a PV guest with auto translate, although it does use hvm_op for setting callback vector." Also we use the PV cpuid, albeit we can use the HVM (native) cpuid. However, we do have a fair bit of filtering in the xen_cpuid and we can piggyback on that until the hypervisor/toolstack filters the appropiate cpuids. Once that is done we can swap over to use the native one. We setup a Kconfig entry that is disabled by default and cannot be enabled. Note that on ARM the concept of PVH is non-existent. As Ian put it: "an ARM guest is neither PV nor HVM nor PVHVM. It's a bit like PVH but is different also (it's further towards the H end of the spectrum than even PVH).". As such these options (PVHVM, PVH) are never enabled nor seen on ARM compilations. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-06xen/x86: set VIRQ_TIMER priority to maximumDavid Vrabel1-0/+1
Commit bee980d9e (xen/events: Handle VIRQ_TIMER before any other hardirq in event loop) effectively made the VIRQ_TIMER the highest priority event when using the 2-level ABI. Set the VIRQ_TIMER priority to the highest so this behaviour is retained when using the FIFO-based ABI. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2014-01-03xen/pvhvm: Remove the xen_platform_pci int.Konrad Rzeszutek Wilk1-3/+2
Since we have xen_has_pv_devices,xen_has_pv_disk_devices, xen_has_pv_nic_devices, and xen_has_pv_and_legacy_disk_devices to figure out the different 'unplug' behaviors - lets use those instead of this single int. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-01-03xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4).Konrad Rzeszutek Wilk1-0/+74
The user has the option of disabling the platform driver: 00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01) which is used to unplug the emulated drivers (IDE, Realtek 8169, etc) and allow the PV drivers to take over. If the user wishes to disable that they can set: xen_platform_pci=0 (in the guest config file) or xen_emul_unplug=never (on the Linux command line) except it does not work properly. The PV drivers still try to load and since the Xen platform driver is not run - and it has not initialized the grant tables, most of the PV drivers stumble upon: input: Xen Virtual Keyboard as /devices/virtual/input/input5 input: Xen Virtual Pointer as /devices/virtual/input/input6M ------------[ cut here ]------------ kernel BUG at /home/konrad/ssd/konrad/linux/drivers/xen/grant-table.c:1206! invalid opcode: 0000 [#1] SMP Modules linked in: xen_kbdfront(+) xenfs xen_privcmd CPU: 6 PID: 1389 Comm: modprobe Not tainted 3.13.0-rc1upstream-00021-ga6c892b-dirty #1 Hardware name: Xen HVM domU, BIOS 4.4-unstable 11/26/2013 RIP: 0010:[<ffffffff813ddc40>] [<ffffffff813ddc40>] get_free_entries+0x2e0/0x300 Call Trace: [<ffffffff8150d9a3>] ? evdev_connect+0x1e3/0x240 [<ffffffff813ddd0e>] gnttab_grant_foreign_access+0x2e/0x70 [<ffffffffa0010081>] xenkbd_connect_backend+0x41/0x290 [xen_kbdfront] [<ffffffffa0010a12>] xenkbd_probe+0x2f2/0x324 [xen_kbdfront] [<ffffffff813e5757>] xenbus_dev_probe+0x77/0x130 [<ffffffff813e7217>] xenbus_frontend_dev_probe+0x47/0x50 [<ffffffff8145e9a9>] driver_probe_device+0x89/0x230 [<ffffffff8145ebeb>] __driver_attach+0x9b/0xa0 [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230 [<ffffffff8145eb50>] ? driver_probe_device+0x230/0x230 [<ffffffff8145cf1c>] bus_for_each_dev+0x8c/0xb0 [<ffffffff8145e7d9>] driver_attach+0x19/0x20 [<ffffffff8145e260>] bus_add_driver+0x1a0/0x220 [<ffffffff8145f1ff>] driver_register+0x5f/0xf0 [<ffffffff813e55c5>] xenbus_register_driver_common+0x15/0x20 [<ffffffff813e76b3>] xenbus_register_frontend+0x23/0x40 [<ffffffffa0015000>] ? 0xffffffffa0014fff [<ffffffffa001502b>] xenkbd_init+0x2b/0x1000 [xen_kbdfront] [<ffffffff81002049>] do_one_initcall+0x49/0x170 .. snip.. which is hardly nice. This patch fixes this by having each PV driver check for: - if running in PV, then it is fine to execute (as that is their native environment). - if running in HVM, check if user wanted 'xen_emul_unplug=never', in which case bail out and don't load any PV drivers. - if running in HVM, and if PCI device 5853:0001 (xen_platform_pci) does not exist, then bail out and not load PV drivers. - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks', then bail out for all PV devices _except_ the block one. Ditto for the network one ('nics'). - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary' then load block PV driver, and also setup the legacy IDE paths. In (v3) make it actually load PV drivers. Reported-by: Sander Eikelenboom <linux@eikelenboom.it Reported-by: Anthony PERARD <anthony.perard@citrix.com> Reported-and-Tested-by: Fabio Fantoni <fabio.fantoni@m2r.biz> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: Add extra logic to handle the myrid ways 'xen_emul_unplug' can be used per Ian and Stefano suggestion] [v3: Make the unnecessary case work properly] [v4: s/disks/ide-disks/ spotted by Fabio] Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> [for PCI parts] CC: stable@vger.kernel.org
2013-11-15Merge tag 'stable/for-linus-3.13-rc0-tag' of ↵Linus Torvalds7-17/+25
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen updates from Konrad Rzeszutek Wilk: "This has tons of fixes and two major features which are concentrated around the Xen SWIOTLB library. The short <blurb> is that the tracing facility (just one function) has been added to SWIOTLB to make it easier to track I/O progress. Additionally under Xen and ARM (32 & 64) the Xen-SWIOTLB driver "is used to translate physical to machine and machine to physical addresses of foreign[guest] pages for DMA operations" (Stefano) when booting under hardware without proper IOMMU. There are also bug-fixes, cleanups, compile warning fixes, etc. The commit times for some of the commits is a bit fresh - that is b/c we wanted to make sure we have the Ack's from the ARM folks - which with the string of back-to-back conferences took a bit of time. Rest assured - the code has been stewing in #linux-next for some time. Features: - SWIOTLB has tracing added when doing bounce buffer. - Xen ARM/ARM64 can use Xen-SWIOTLB. This work allows Linux to safely program real devices for DMA operations when running as a guest on Xen on ARM, without IOMMU support. [*1] - xen_raw_printk works with PVHVM guests if needed. Bug-fixes: - Make memory ballooning work under HVM with large MMIO region. - Inform hypervisor of MCFG regions found in ACPI DSDT. - Remove deprecated IRQF_DISABLED. - Remove deprecated __cpuinit. [*1]: "On arm and arm64 all Xen guests, including dom0, run with second stage translation enabled. As a consequence when dom0 programs a device for a DMA operation is going to use (pseudo) physical addresses instead machine addresses. This work introduces two trees to track physical to machine and machine to physical mappings of foreign pages. Local pages are assumed mapped 1:1 (physical address == machine address). It enables the SWIOTLB-Xen driver on ARM and ARM64, so that Linux can translate physical addresses to machine addresses for dma operations when necessary. " (Stefano)" * tag 'stable/for-linus-3.13-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (32 commits) xen/arm: pfn_to_mfn and mfn_to_pfn return the argument if nothing is in the p2m arm,arm64/include/asm/io.h: define struct bio_vec swiotlb-xen: missing include dma-direction.h pci-swiotlb-xen: call pci_request_acs only ifdef CONFIG_PCI arm: make SWIOTLB available xen: delete new instances of added __cpuinit xen/balloon: Set balloon's initial state to number of existing RAM pages xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas. xen: remove deprecated IRQF_DISABLED x86/xen: remove deprecated IRQF_DISABLED swiotlb-xen: fix error code returned by xen_swiotlb_map_sg_attrs swiotlb-xen: static inline xen_phys_to_bus, xen_bus_to_phys, xen_virt_to_bus and range_straddles_page_boundary grant-table: call set_phys_to_machine after mapping grant refs arm,arm64: do not always merge biovec if we are running on Xen swiotlb: print a warning when the swiotlb is full swiotlb-xen: use xen_dma_map/unmap_page, xen_dma_sync_single_for_cpu/device xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device tracing/events: Fix swiotlb tracepoint creation swiotlb-xen: use xen_alloc/free_coherent_pages xen: introduce xen_alloc/free_coherent_pages ...
2013-11-15mm: dynamically allocate page->ptl if it cannot be embedded to struct pageKirill A. Shutemov1-1/+1
If split page table lock is in use, we embed the lock into struct page of table's page. We have to disable split lock, if spinlock_t is too big be to be embedded, like when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC enabled. This patch add support for dynamic allocation of split page table lock if we can't embed it to struct page. page->ptl is unsigned long now and we use it as spinlock_t if sizeof(spinlock_t) <= sizeof(long), otherwise it's pointer to spinlock_t. The spinlock_t allocated in pgtable_page_ctor() for PTE table and in pgtable_pmd_page_ctor() for PMD table. All other helpers converted to support dynamically allocated page->ptl. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15mm: rename USE_SPLIT_PTLOCKS to USE_SPLIT_PTE_PTLOCKSKirill A. Shutemov1-3/+3
We're going to introduce split page table lock for PMD level. Let's rename existing split ptlock for PTE level to avoid confusion. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Robin Holt <robinmholt@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-09Merge remote-tracking branch 'stefano/swiotlb-xen-9.1' into ↵Konrad Rzeszutek Wilk2-6/+11
stable/for-linus-3.13 * stefano/swiotlb-xen-9.1: swiotlb-xen: fix error code returned by xen_swiotlb_map_sg_attrs swiotlb-xen: static inline xen_phys_to_bus, xen_bus_to_phys, xen_virt_to_bus and range_straddles_page_boundary grant-table: call set_phys_to_machine after mapping grant refs arm,arm64: do not always merge biovec if we are running on Xen swiotlb: print a warning when the swiotlb is full swiotlb-xen: use xen_dma_map/unmap_page, xen_dma_sync_single_for_cpu/device xen: introduce xen_dma_map/unmap_page and xen_dma_sync_single_for_cpu/device swiotlb-xen: use xen_alloc/free_coherent_pages xen: introduce xen_alloc/free_coherent_pages arm64/xen: get_dma_ops: return xen_dma_ops if we are running as xen_initial_domain arm/xen: get_dma_ops: return xen_dma_ops if we are running as xen_initial_domain swiotlb-xen: introduce xen_swiotlb_set_dma_mask xen/arm,arm64: enable SWIOTLB_XEN xen: make xen_create_contiguous_region return the dma address xen/x86: allow __set_phys_to_machine for autotranslate guests arm/xen,arm64/xen: introduce p2m arm64: define DMA_ERROR_CODE arm: make SWIOTLB available Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Conflicts: arch/arm/include/asm/dma-mapping.h drivers/xen/swiotlb-xen.c [Conflicts arose b/c "arm: make SWIOTLB available" v8 was in Stefano's branch, while I had v9 + Ack from Russel. I also fixed up white-space issues]
2013-11-09pci-swiotlb-xen: call pci_request_acs only ifdef CONFIG_PCIStefano Stabellini1-0/+4
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-11-09xen: delete new instances of added __cpuinitPaul Gortmaker1-1/+1
commit 6efa20e49b9cb1db1ab66870cc37323474a75a13 ("xen: Support 64-bit PV guest receiving NMIs") and commit cd9151e26d31048b2b5e00fd02e110e07d2200c9 ( "xen/balloon: set a mapping for ballooned out pages") added new instances of __cpuinit usage. We removed this a couple versions ago; we now want to remove the compat no-op stubs. Introducing new users is not what we want to see at this point in time, as it will break once the stubs are gone. Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-11-07x86/xen: remove deprecated IRQF_DISABLEDMichael Opdenacker3-8/+7
This patch proposes to remove the IRQF_DISABLED flag from x86/xen code. It's a NOOP since 2.6.35 and it will be removed one day. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-10-10xen: Fix possible user space selector corruptionFrediano Ziglio1-0/+9
Due to the way kernel is initialized under Xen is possible that the ring1 selector used by the kernel for the boot cpu end up to be copied to userspace leading to segmentation fault in the userspace. Xen code in the kernel initialize no-boot cpus with correct selectors (ds and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen). On task context switch (switch_to) we assume that ds, es and cs already point to __USER_DS and __KERNEL_CSso these selector are not changed. If processor is an Intel that support sysenter instruction sysenter/sysexit is used so ds and es are not restored switching back from kernel to userspace. In the case the selectors point to a ring1 instead of __USER_DS the userspace code will crash on first memory access attempt (to be precise Xen on the emulated iret used to do sysexit will detect and set ds and es to zero which lead to GPF anyway). Now if an userspace process call kernel using sysenter and get rescheduled (for me it happen on a specific init calling wait4) could happen that the ring1 selector is set to ds and es. This is quite hard to detect cause after a while these selectors are fixed (__USER_DS seems sticky). Bisecting the code commit 7076aada1040de4ed79a5977dbabdb5e5ea5e249 appears to be the first one that have this issue. Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2013-10-10swiotlb-xen: use xen_alloc/free_coherent_pagesStefano Stabellini1-2/+5
Use xen_alloc_coherent_pages and xen_free_coherent_pages to allocate or free coherent pages. We need to be careful handling the pointer returned by xen_alloc_coherent_pages, because on ARM the pointer is not equal to phys_to_virt(*dma_handle). In fact virt_to_phys only works for kernel direct mapped RAM memory. In ARM case the pointer could be an ioremap address, therefore passing it to virt_to_phys would give you another physical address that doesn't correspond to it. Make xen_create_contiguous_region take a phys_addr_t as start parameter to avoid the virt_to_phys calls which would be incorrect. Changes in v6: - remove extra spaces. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-10-09xen: make xen_create_contiguous_region return the dma addressStefano Stabellini1-1/+3
Modify xen_create_contiguous_region to return the dma address of the newly contiguous buffer. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Changes in v4: - use virt_to_machine instead of virt_to_bus.
2013-10-10xen/x86: allow __set_phys_to_machine for autotranslate guestsStefano Stabellini1-3/+3
Allow __set_phys_to_machine to be called for autotranslate guests. It can be used to keep track of phys_to_machine changes, however we don't do anything with the information at the moment. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2013-09-27xen/mmu: Correct PAT MST setting.Konrad Rzeszutek Wilk1-2/+2
Jan Beulich spotted that the PAT MSR settings in the Xen public document that "the first (PAT6) column was wrong across the board, and the column for PAT7 was missing altogether." This updates it to be in sync. CC: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com>
2013-09-26Merge tag 'stable/for-linus-3.12-rc2-tag' of ↵Linus Torvalds2-8/+28
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen fixes from Konrad Rzeszutek Wilk: "Bug-fixes and one update to the kernel-paramters.txt documentation. - Fix PV spinlocks triggering jump_label code bug - Remove extraneous code in the tpm front driver - Fix ballooning out of pages when non-preemptible - Fix deadlock when using a 32-bit initial domain with large amount of memory - Add xen_nopvpsin parameter to the documentation" * tag 'stable/for-linus-3.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/spinlock: Document the xen_nopvspin parameter. xen/p2m: check MFN is in range before using the m2p table xen/balloon: don't alloc page while non-preemptible xen: Do not enable spinlocks before jump_label_init() has executed tpm: xen-tpmfront: Remove the locality sysfs attribute tpm: xen-tpmfront: Fix default durations
2013-09-25xen/p2m: check MFN is in range before using the m2p tableDavid Vrabel1-6/+4
On hosts with more than 168 GB of memory, a 32-bit guest may attempt to grant map an MFN that is error cannot lookup in its mapping of the m2p table. There is an m2p lookup as part of m2p_add_override() and m2p_remove_override(). The lookup falls off the end of the mapped portion of the m2p and (because the mapping is at the highest virtual address) wraps around and the lookup causes a fault on what appears to be a user space address. do_page_fault() (thinking it's a fault to a userspace address), tries to lock mm->mmap_sem. If the gntdev device is used for the grant map, m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem already locked. do_page_fault() then deadlocks. The deadlock would most commonly occur when a 64-bit guest is started and xenconsoled attempts to grant map its console ring. Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the mapped portion of the m2p table before accessing the table and use this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn() (which already had the correct range check). All faults caused by accessing the non-existant parts of the m2p are thus within the kernel address space and exception_fixup() is called without trying to lock mm->mmap_sem. This means that for MFNs that are outside the mapped range of the m2p then mfn_to_pfn() will always look in the m2p overrides. This is correct because it must be a foreign MFN (and the PFN in the m2p in this case is only relevant for the other domain). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Stefano Stabellini <stefano.stabellini@citrix.com> Cc: Jan Beulich <JBeulich@suse.com> -- v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides() v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of range as it's probably foreign. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2013-09-25xen: Do not enable spinlocks before jump_label_init() has executedKonrad Rzeszutek Wilk1-2/+24
xen_init_spinlocks() currently calls static_key_slow_inc() before jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is the case) the effect of this static_key_slow_inc() is deferred until after jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set, in which case the key is set immediately. Thus, depending on the value of config option, we may observe different behavior. In addition, when we come to __jump_label_transform() from jump_label_init(), the key (paravirt_ticketlocks_enabled) is already enabled. On processors where ideal_nop is not the same as default_nop this will cause a BUG() since it is expected that before a key is enabled the latter is replaced by the former during initialization. To address this problem we need to move static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called after jump_label_init(). We also need to make sure that this is done before other cpus start to boot. early_initcall appears to be a good place to do so. (Note that we cannot move whole xen_init_spinlocks() there since pv_lock_ops need to be set before alternative_instructions() runs.) Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: Added extra comments in the code] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
2013-09-11Merge tag 'stable/for-linus-3.12-rc0-tag-two' of ↵Linus Torvalds4-47/+37
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen bug-fixes from Konrad Rzeszutek Wilk: "This pull I usually do after rc1 is out but because we have a nice amount of fixes, some bootup related fixes for ARM, and it is early in the cycle we figured to do it now to help with tracking of potential regressions. The simple ones are the ARM ones - one of the patches fell through the cracks, other fixes a bootup issue (unconditionally using Xen functions). Then a fix for a regression causing preempt count being off (patch causing this went in v3.12). Lastly are the fixes to make Xen PVHVM guests use PV ticketlocks (Xen PV already does). The enablement of that was supposed to be part of the x86 spinlock merge in commit 816434ec4a67 ("The biggest change here are paravirtualized ticket spinlocks (PV spinlocks), which bring a nice speedup on various benchmarks...") but unfortunatly it would cause hang when booting Xen PVHVM guests. Yours truly got all of the bugs fixed last week and they (six of them) are included in this pull. Bug-fixes: - Boot on ARM without using Xen unconditionally - On Xen ARM don't run cpuidle/cpufreq - Fix regression in balloon driver, preempt count warnings - Fixes to make PVHVM able to use pv ticketlock. - Revert Xen PVHVM disabling pv ticketlock (aka, re-enable pv ticketlocks)" * tag 'stable/for-linus-3.12-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/spinlock: Don't use __initdate for xen_pv_spin Revert "xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM" xen/spinlock: Don't setup xen spinlock IPI kicker if disabled. xen/smp: Update pv_lock_ops functions before alternative code starts under PVHVM xen/spinlock: We don't need the old structure anymore xen/spinlock: Fix locking path engaging too soon under PVHVM. xen/arm: disable cpuidle and cpufreq when linux is running as dom0 xen/p2m: Don't call get_balloon_scratch_page() twice, keep interrupts disabled for multicalls ARM: xen: only set pm function ptrs for Xen guests
2013-09-09xen/spinlock: Don't use __initdate for xen_pv_spinKonrad Rzeszutek Wilk1-1/+1
As we get compile warnings about .init.data being used by non-init functions. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-09-09Revert "xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM"Konrad Rzeszutek Wilk1-20/+0
This reverts commit 70dd4998cb85f0ecd6ac892cc7232abefa432efb. Now that the bugs have been resolved we can re-enable the PV ticketlock implementation under PVHVM Xen guests. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2013-09-09xen/spinlock: Don't setup xen spinlock IPI kicker if disabled.Konrad Rzeszutek Wilk1-1/+10
There is no need to setup this kicker IPI if we are never going to use the paravirtualized ticketlock mechanism. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2013-09-09xen/smp: Update pv_lock_ops functions before alternative code starts under PVHVMKonrad Rzeszutek Wilk1-5/+14
Before this patch we would patch all of the pv_lock_ops sites using alternative assembler. Then later in the bootup cycle change the unlock_kick and lock_spinning to the Xen specific - without re patching. That meant that for the core of the kernel we would be running with the baremetal version of unlock_kick and lock_spinning while for modules we would have the proper Xen specific slowpaths. As most of the module uses some API from the core kernel that ended up with slowpath lockers waiting forever to be kicked (b/c they would be using the Xen specific slowpath logic). And the kick never came b/c the unlock path that was taken was the baremetal one. On PV we do not have the problem as we initialise before the alternative code kicks in. The fix is to make the updating of the pv_lock_ops function be done before the alternative code starts patching. Note that this patch fixes issues discovered by commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23. ("xen: disable PV spinlocks on HVM") wherein it mentioned PV spinlocks cannot possibly work with the current code because they are enabled after pvops patching has already been done, and because PV spinlocks use a different data structure than native spinlocks so we cannot switch between them dynamically. The first problem is solved by this patch. The second problem has been solved by commit 816434ec4a674fcdb3c2221a6dffdc8f34020550 (Merge branch 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip) P.S. There is still the commit 70dd4998cb85f0ecd6ac892cc7232abefa432efb (xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM) to revert but that can be done later after all other bugs have been fixed. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2013-09-09xen/spinlock: We don't need the old structure anymoreKonrad Rzeszutek Wilk1-18/+0
As we are using the generic ticketlock structs and these old structures are not needed anymore. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2013-09-09xen/spinlock: Fix locking path engaging too soon under PVHVM.Konrad Rzeszutek Wilk2-1/+9
The xen_lock_spinning has a check for the kicker interrupts and if it is not initialized it will spin normally (not enter the slowpath). But for PVHVM case we would initialize the kicker interrupt before the CPU came online. This meant that if the booting CPU used a spinlock and went in the slowpath - it would enter the slowpath and block forever. The forever part because during bootup: the spinlock would be taken _before_ the CPU sets itself to be online (more on this further), and we enter to poll on the event channel forever. The bootup CPU (see commit fc78d343fa74514f6fd117b5ef4cd27e4ac30236 "xen/smp: initialize IPI vectors before marking CPU online" for details) and the CPU that started the bootup consult the cpu_online_mask to determine whether the booting CPU should get an IPI. The booting CPU has to set itself in this mask via: set_cpu_online(smp_processor_id(), true); However, if the spinlock is taken before this (and it is) and it polls on an event channel - it will never be woken up as the kernel will never send an IPI to an offline CPU. Note that the PVHVM logic in sending IPIs is using the HVM path which has numerous checks using the cpu_online_mask and cpu_active_mask. See above mention git commit for details. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2013-09-09Merge tag 'v3.11-rc7' into stable/for-linus-3.12Konrad Rzeszutek Wilk2-2/+31
Linux 3.11-rc7 As we need the git commit 28817e9de4f039a1a8c1fe1df2fa2df524626b9e Author: Chuck Anderson <chuck.anderson@oracle.com> Date: Tue Aug 6 15:12:19 2013 -0700 xen/smp: initialize IPI vectors before marking CPU online * tag 'v3.11-rc7': (443 commits) Linux 3.11-rc7 ARC: [lib] strchr breakage in Big-endian configuration VFS: collect_mounts() should return an ERR_PTR bfs: iget_locked() doesn't return an ERR_PTR efs: iget_locked() doesn't return an ERR_PTR() proc: kill the extra proc_readfd_common()->dir_emit_dots() cope with potentially long ->d_dname() output for shmem/hugetlb usb: phy: fix build breakage USB: OHCI: add missing PCI PM callbacks to ohci-pci.c staging: comedi: bug-fix NULL pointer dereference on failed attach lib/lz4: correct the LZ4 license memcg: get rid of swapaccount leftovers nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error drivers/platform/olpc/olpc-ec.c: initialise earlier ipv4: expose IPV4_DEVCONF ipv6: handle Redirect ICMP Message with no Redirected Header option be2net: fix disabling TX in be_close() Revert "ACPI / video: Always call acpi_video_init_brightness() on init" Revert "genetlink: fix family dump race" ... Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-09-09Merge branch 'x86/spinlocks' of ↵Konrad Rzeszutek Wilk2-260/+129
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into stable/for-linus-3.12 * 'x86/spinlocks' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kvm/guest: Fix sparse warning: "symbol 'klock_waiting' was not declared as static" kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor kvm guest: Add configuration support to enable debug information for KVM Guests kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi xen, pvticketlock: Allow interrupts to be enabled while blocking x86, ticketlock: Add slowpath logic jump_label: Split jumplabel ratelimit x86, pvticketlock: When paravirtualizing ticket locks, increment by 2 x86, pvticketlock: Use callee-save for lock_spinning xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks xen, pvticketlock: Xen implementation for PV ticket locks xen: Defer spinlock setup until boot CPU setup x86, ticketlock: Collapse a layer of functions x86, ticketlock: Don't inline _spin_unlock when using paravirt spinlocks x86, spinlock: Replace pv spinlocks with pv ticketlocks
2013-09-09xen/p2m: Don't call get_balloon_scratch_page() twice, keep interrupts ↵Boris Ostrovsky1-4/+6
disabled for multicalls m2p_remove_override() calls get_balloon_scratch_page() in MULTI_update_va_mapping() even though it already has pointer to this page from the earlier call (in scratch_page). This second call doesn't have a matching put_balloon_scratch_page() thus not restoring preempt count back. (Also, there is no put_balloon_scratch_page() in the error path.) In addition, the second multicall uses __xen_mc_entry() which does not disable interrupts. Rearrange xen_mc_* calls to keep interrupts off while performing multicalls. This commit fixes a regression introduced by: commit ee0726407feaf504dff304fb603652fb2d778b42 Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Date: Tue Jul 23 17:23:54 2013 +0000 xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mapping Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2013-09-05Merge tag 'stable/for-linus-3.12-rc0-tag' of ↵Linus Torvalds5-32/+65
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen updates from Konrad Rzeszutek Wilk: "A couple of features and a ton of bug-fixes. There is also some maintership changes. Jeremy is enjoying the full-time work at the startup and as much as he would love to help - he can't find the time. I have a bunch of other things that I promised to work on - paravirt diet, get SWIOTLB working everywhere, etc, but haven't been able to find the time. As such both David Vrabel and Boris Ostrovsky have graciously volunteered to help with the maintership role. They will keep the lid on regressions, bug-fixes, etc. I will be in the background to help - but eventually there will be less of me doing the Xen GIT pulls and more of them. Stefano is still doing the ARM/ARM64 and will continue on doing so. Features: - Xen Trusted Platform Module (TPM) frontend driver - with the backend in MiniOS. - Scalability improvements in event channel. - Two extra Xen co-maintainers (David, Boris) and one going away (Jeremy) Bug-fixes: - Make the 1:1 mapping work during early bootup on selective regions. - Add scratch page to balloon driver to deal with unexpected code still holding on stale pages. - Allow NMIs on PV guests (64-bit only) - Remove unnecessary TLB flush in M2P code. - Fixes duplicate callbacks in Xen granttable code. - Fixes in PRIVCMD_MMAPBATCH ioctls to allow retries - Fix for events being lost due to rescheduling on different VCPUs. - More documentation" * tag 'stable/for-linus-3.12-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (23 commits) hvc_xen: Remove unnecessary __GFP_ZERO from kzalloc drivers/xen-tpmfront: Fix compile issue with missing option. xen/balloon: don't set P2M entry for auto translated guest xen/evtchn: double free on error Xen: Fix retry calls into PRIVCMD_MMAPBATCH*. xen/pvhvm: Initialize xen panic handler for PVHVM guests xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mapping xen: fix ARM build after 6efa20e4 MAINTAINERS: Remove Jeremy from the Xen subsystem. xen/events: document behaviour when scanning the start word for events x86/xen: during early setup, only 1:1 map the ISA region x86/xen: disable premption when enabling local irqs swiotlb-xen: replace dma_length with sg_dma_len() macro swiotlb: replace dma_length with sg_dma_len() macro xen/balloon: set a mapping for ballooned out pages xen/evtchn: improve scalability by using per-user locks xen/p2m: avoid unneccesary TLB flush in m2p_remove_override() MAINTAINERS: Add in two extra co-maintainers of the Xen tree. MAINTAINERS: Update the Xen subsystem's with proper mailing list. xen: replace strict_strtoul() with kstrtoul() ...
2013-09-04Merge branch 'x86-spinlocks-for-linus' of ↵Linus Torvalds2-260/+129
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 spinlock changes from Ingo Molnar: "The biggest change here are paravirtualized ticket spinlocks (PV spinlocks), which bring a nice speedup on various benchmarks. The KVM host side will come to you via the KVM tree" * 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kvm/guest: Fix sparse warning: "symbol 'klock_waiting' was not declared as static" kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor kvm guest: Add configuration support to enable debug information for KVM Guests kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi xen, pvticketlock: Allow interrupts to be enabled while blocking x86, ticketlock: Add slowpath logic jump_label: Split jumplabel ratelimit x86, pvticketlock: When paravirtualizing ticket locks, increment by 2 x86, pvticketlock: Use callee-save for lock_spinning xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks xen, pvticketlock: Xen implementation for PV ticket locks xen: Defer spinlock setup until boot CPU setup x86, ticketlock: Collapse a layer of functions x86, ticketlock: Don't inline _spin_unlock when using paravirt spinlocks x86, spinlock: Replace pv spinlocks with pv ticketlocks
2013-09-04Merge branch 'x86-paravirt-for-linus' of ↵Linus Torvalds1-6/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt changes from Ingo Molnar: "Hypervisor signature detection cleanup and fixes - the goal is to make KVM guests run better on MS/Hyperv and to generalize and factor out the code a bit" * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Correctly detect hypervisor x86, kvm: Switch to use hypervisor_cpuid_base() xen: Switch to use hypervisor_cpuid_base() x86: Introduce hypervisor_cpuid_base()
2013-09-04Merge branch 'x86-asmlinkage-for-linus' of ↵Linus Torvalds1-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asmlinkage changes from Ingo Molnar: "As a preparation for Andi Kleen's LTO patchset (link time optimizations using GCC's -flto which build time optimization has steadily increased in quality over the past few years and might eventually be usable for the kernel too) this tree includes a handful of preparatory patches that make function calling convention annotations consistent again: - Mark every function without arguments (or 64bit only) that is used by assembly code with asmlinkage() - Mark every function with parameters or variables that is used by assembly code as __visible. For the vanilla kernel this has documentation, consistency and debuggability advantages, for the time being" * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asmlinkage: Fix warning in xen asmlinkage change x86, asmlinkage, vdso: Mark vdso variables __visible x86, asmlinkage, power: Make various symbols used by the suspend asm code visible x86, asmlinkage: Make dump_stack visible x86, asmlinkage: Make 64bit checksum functions visible x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops x86, asmlinkage, apm: Make APM data structure used from assembler visible x86, asmlinkage: Make syscall tables visible x86, asmlinkage: Make several variables used from assembler/linker script visible x86, asmlinkage: Make kprobes code visible and fix assembler code x86, asmlinkage: Make various syscalls asmlinkage x86, asmlinkage: Make 32bit/64bit __switch_to visible x86, asmlinkage: Make _*_start_kernel visible x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible x86, asmlinkage: Change dotraplinkage into __visible on 32bit x86: Fix sys_call_table type in asm/syscall.h
2013-08-22x86/asmlinkage: Fix warning in xen asmlinkage changeAndi Kleen1-6/+6
Current code uses asmlinkage for functions without arguments. This adds an implicit regparm(0) which creates a warning when assigning the function to pointers. Use __visible for the functions without arguments. This avoids having to add regparm(0) to function pointers. Since they have no arguments it does not make any difference. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1377115662-4865-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-22Merge tag 'stable/for-linus-3.11-rc6-tag' of ↵Linus Torvalds2-2/+31
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen bug-fixes from Konrad Rzeszutek Wilk: - On ARM did not have balanced calls to get/put_cpu. - Fix to make tboot + Xen + Linux correctly. - Fix events VCPU binding issues. - Fix a vCPU online race where IPIs are sent to not-yet-online vCPU. * tag 'stable/for-linus-3.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/smp: initialize IPI vectors before marking CPU online xen/events: mask events when changing their VCPU binding xen/events: initialize local per-cpu mask for all possible events x86/xen: do not identity map UNUSABLE regions in the machine E820 xen/arm: missing put_cpu in xen_percpu_init
2013-08-20xen/pvhvm: Initialize xen panic handler for PVHVM guestsVaughan Cao1-0/+2
kernel use callback linked in panic_notifier_list to notice others when panic happens. NORET_TYPE void panic(const char * fmt, ...){ ... atomic_notifier_call_chain(&panic_notifier_list, 0, buf); } When Xen becomes aware of this, it will call xen_reboot(SHUTDOWN_crash) to send out an event with reason code - SHUTDOWN_crash. xen_panic_handler_init() is defined to register on panic_notifier_list but we only call it in xen_arch_setup which only be called by PV, this patch is necessary for PVHVM. Without this patch, setting 'on_crash=coredump-restart' in PVHVM guest config file won't lead a vmcore to be generate when the guest panics. It can be reproduced with 'echo c > /proc/sysrq-trigger'. Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Joe Jin <joe.jin@oracle.com>
2013-08-20xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mappingStefano Stabellini1-6/+15
GNTTABOP_unmap_grant_ref unmaps a grant and replaces it with a 0 mapping instead of reinstating the original mapping. Doing so separately would be racy. To unmap a grant and reinstate the original mapping atomically we use GNTTABOP_unmap_and_replace. GNTTABOP_unmap_and_replace doesn't work with GNTMAP_contains_pte, so don't use it for kmaps. GNTTABOP_unmap_and_replace zeroes the mapping passed in new_addr so we have to reinstate it, however that is a per-cpu mapping only used for balloon scratch pages, so we can be sure that it's not going to be accessed while the mapping is not valid. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: alex@alex.org.uk CC: dcrisan@flexiant.com [v1: Konrad fixed up the conflicts] Conflicts: arch/x86/xen/p2m.c
2013-08-20xen/smp: initialize IPI vectors before marking CPU onlineChuck Anderson1-2/+9
An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with: kernel BUG at drivers/xen/events.c:1328! RCU has detected that a CPU has not entered a quiescent state within the grace period. It needs to send the CPU a reschedule IPI if it is not offline. rcu_implicit_offline_qs() does this check: /* * If the CPU is offline, it is in a quiescent state. We can * trust its state not to change because interrupts are disabled. */ if (cpu_is_offline(rdp->cpu)) { rdp->offline_fqs++; return 1; } Else the CPU is online. Send it a reschedule IPI. The CPU is in the middle of being hot-plugged and has been marked online (!cpu_is_offline()). See start_secondary(): set_cpu_online(smp_processor_id(), true); ... per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE; start_secondary() then waits for the CPU bringing up the hot-plugged CPU to mark it as active: /* * Wait until the cpu which brought this one up marked it * online before enabling interrupts. If we don't do that then * we can end up waking up the softirq thread before this cpu * reached the active state, which makes the scheduler unhappy * and schedule the softirq thread on the wrong cpu. This is * only observable with forced threaded interrupts, but in * theory it could also happen w/o them. It's just way harder * to achieve. */ while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask)) cpu_relax(); /* enable local interrupts */ local_irq_enable(); The CPU being hot-plugged will be marked active after it has been fully initialized by the CPU managing the hot-plug. In the Xen PVHVM case xen_smp_intr_init() is called to set up the hot-plugged vCPU's XEN_RESCHEDULE_VECTOR. The hot-plugging CPU is marked online, not marked active and does not have its IPI vectors set up. rcu_implicit_offline_qs() sees the hot-plugging cpu is !cpu_is_offline() and tries to send it a reschedule IPI: This will lead to: kernel BUG at drivers/xen/events.c:1328! xen_send_IPI_one() xen_smp_send_reschedule() rcu_implicit_offline_qs() rcu_implicit_dynticks_qs() force_qs_rnp() force_quiescent_state() __rcu_process_callbacks() rcu_process_callbacks() __do_softirq() call_softirq() do_softirq() irq_exit() xen_evtchn_do_upcall() because xen_send_IPI_one() will attempt to use an uninitialized IRQ for the XEN_RESCHEDULE_VECTOR. There is at least one other place that has caused the same crash: xen_smp_send_reschedule() wake_up_idle_cpu() add_timer_on() clocksource_watchdog() call_timer_fn() run_timer_softirq() __do_softirq() call_softirq() do_softirq() irq_exit() xen_evtchn_do_upcall() xen_hvm_callback_vector() clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle a watchdog timer: /* * Cycle through CPUs to check if the CPUs stay synchronized * to each other. */ next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask); if (next_cpu >= nr_cpu_ids) next_cpu = cpumask_first(cpu_online_mask); watchdog_timer.expires += WATCHDOG_INTERVAL; add_timer_on(&watchdog_timer, next_cpu); This resulted in an attempt to send an IPI to a hot-plugging CPU that had not initialized its reschedule vector. One option would be to make the RCU code check to not check for CPU offline but for CPU active. As becoming active is done after a CPU is online (in older kernels). But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been completely reworked - in the online path, cpu_active is set *before* cpu_online, and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING notification instead of CPU_DOWN_PREPARE." Drilling in this the bring-up path: "[brought up CPU].. send out a CPU_STARTING notification, and in response to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask is better left to the scheduler alone, since it has the intelligence to use it judiciously." The conclusion was that: " 1. At the IPI sender side: It is incorrect to send an IPI to an offline CPU (cpu not present in the cpu_online_mask). There are numerous places where we check this and warn/complain. 2. At the IPI receiver side: It is incorrect to let the world know of our presence (by setting ourselves in global bitmasks) until our initialization steps are complete to such an extent that we can handle the consequences (such as receiving interrupts without crashing the sender etc.) " (from Srivatsa) As the native code enables the interrupts at some point we need to be able to service them. In other words a CPU must have valid IPI vectors if it has been marked online. It doesn't need to handle the IPI (interrupts may be disabled) but needs to have valid IPI vectors because another CPU may find it in cpu_online_mask and attempt to send it an IPI. This patch will change the order of the Xen vCPU bring-up functions so that Xen vectors have been set up before start_secondary() is called. It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails to initialize it. Orabug 13823853 Signed-off-by Chuck Anderson <chuck.anderson@oracle.com> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-08-20x86/xen: during early setup, only 1:1 map the ISA regionDavid Vrabel1-5/+11
During early setup, when the reserved regions and MMIO holes are being setup as 1:1 in the p2m, clear any mappings instead of making them 1:1 (execept for the ISA region which is expected to be mapped). This fixes a regression introduced in 3.5 by 83d51ab473dd (xen/setup: update VA mapping when releasing memory during setup) which caused hosts with tboot to fail to boot. tboot marks a region in the e820 map as unusable and the dom0 kernel would attempt to map this region and Xen does not permit unusable regions to be mapped by guests. (XEN) 0000000000000000 - 0000000000060000 (usable) (XEN) 0000000000060000 - 0000000000068000 (reserved) (XEN) 0000000000068000 - 000000000009e000 (usable) (XEN) 0000000000100000 - 0000000000800000 (usable) (XEN) 0000000000800000 - 0000000000972000 (unusable) tboot marked this region as unusable. (XEN) 0000000000972000 - 00000000cf200000 (usable) (XEN) 00000000cf200000 - 00000000cf38f000 (reserved) (XEN) 00000000cf38f000 - 00000000cf3ce000 (ACPI data) (XEN) 00000000cf3ce000 - 00000000d0000000 (reserved) (XEN) 00000000e0000000 - 00000000f0000000 (reserved) (XEN) 00000000fe000000 - 0000000100000000 (reserved) (XEN) 0000000100000000 - 0000000630000000 (usable) Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-08-20x86/xen: disable premption when enabling local irqsDavid Vrabel1-13/+12
If CONFIG_PREEMPT is enabled then xen_enable_irq() (and xen_restore_fl()) could be preempted and rescheduled on a different VCPU in between the clear of the mask and the check for pending events. This may result in events being lost as the upcall will check for pending events on the wrong VCPU. Fix this by disabling preemption around the unmask and check for events. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-08-20x86/xen: do not identity map UNUSABLE regions in the machine E820David Vrabel1-0/+22
If there are UNUSABLE regions in the machine memory map, dom0 will attempt to map them 1:1 which is not permitted by Xen and the kernel will crash. There isn't anything interesting in the UNUSABLE region that the dom0 kernel needs access to so we can avoid making the 1:1 mapping and treat it as RAM. We only do this for dom0, as that is where tboot case shows up. A PV domU could have an UNUSABLE region in its pseudo-physical map and would need to be handled in another patch. This fixes a boot failure on hosts with tboot. tboot marks a region in the e820 map as unusable and the dom0 kernel would attempt to map this region and Xen does not permit unusable regions to be mapped by guests. (XEN) 0000000000000000 - 0000000000060000 (usable) (XEN) 0000000000060000 - 0000000000068000 (reserved) (XEN) 0000000000068000 - 000000000009e000 (usable) (XEN) 0000000000100000 - 0000000000800000 (usable) (XEN) 0000000000800000 - 0000000000972000 (unusable) tboot marked this region as unusable. (XEN) 0000000000972000 - 00000000cf200000 (usable) (XEN) 00000000cf200000 - 00000000cf38f000 (reserved) (XEN) 00000000cf38f000 - 00000000cf3ce000 (ACPI data) (XEN) 00000000cf3ce000 - 00000000d0000000 (reserved) (XEN) 00000000e0000000 - 00000000f0000000 (reserved) (XEN) 00000000fe000000 - 0000000100000000 (reserved) (XEN) 0000000100000000 - 0000000630000000 (usable) Signed-off-by: David Vrabel <david.vrabel@citrix.com> [v1: Altered the patch and description with domU's with UNUSABLE regions] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-08-09xen/p2m: avoid unneccesary TLB flush in m2p_remove_override()David Vrabel1-1/+0
In m2p_remove_override() when removing the grant map from the kernel mapping and replacing with a mapping to the original page, the grant unmap will already have flushed the TLB and it is not necessary to do it again after updating the mapping. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2013-08-09xen: Support 64-bit PV guest receiving NMIsKonrad Rzeszutek Wilk3-7/+25
This is based on a patch that Zhenzhong Duan had sent - which was missing some of the remaining pieces. The kernel has the logic to handle Xen-type-exceptions using the paravirt interface in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME - pv_irq_ops.adjust_exception_frame and and INTERRUPT_RETURN - pv_cpu_ops.iret). That means the nmi handler (and other exception handlers) use the hypervisor iret. The other changes that would be neccessary for this would be to translate the NMI_VECTOR to one of the entries on the ipi_vector and make xen_send_IPI_mask_allbutself use different events. Fortunately for us commit 1db01b4903639fcfaec213701a494fe3fb2c490b (xen: Clean up apic ipi interface) implemented this and we piggyback on the cleanup such that the apic IPI interface will pass the right vector value for NMI. With this patch we can trigger NMIs within a PV guest (only tested x86_64). For this to work with normal PV guests (not initial domain) we need the domain to be able to use the APIC ops - they are already implemented to use the Xen event channels. For that to be turned on in a PV domU we need to remove the masking of X86_FEATURE_APIC. Incidentally that means kgdb will also now work within a PV guest without using the 'nokgdbroundup' workaround. Note that the 32-bit version is different and this patch does not enable that. CC: Lisa Nguyen <lisa@xenapiadmin.com> CC: Ben Guthro <benjamin.guthro@citrix.com> CC: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Fixed up per David Vrabel comments] Reviewed-by: Ben Guthro <benjamin.guthro@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
2013-08-09xen, pvticketlock: Allow interrupts to be enabled while blockingJeremy Fitzhardinge1-6/+40
If interrupts were enabled when taking the spinlock, we can leave them enabled while blocking to get the lock. If we can enable interrupts while waiting for the lock to become available, and we take an interrupt before entering the poll, and the handler takes a spinlock which ends up going into the slow state (invalidating the per-cpu "lock" and "want" values), then when the interrupt handler returns the event channel will remain pending so the poll will return immediately, causing it to return out to the main spinlock loop. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-12-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09x86, ticketlock: Add slowpath logicJeremy Fitzhardinge1-0/+6
Maintain a flag in the LSB of the ticket lock tail which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flags are set when the first locker enters the slowpath, and cleared when unlocking to an empty queue (ie, no contention). In the specific implementation of lock_spinning(), make sure to set the slowpath flags on the lock just before blocking. We must do this before the last-chance pickup test to prevent a deadlock with the unlocker: Unlocker Locker test for lock pickup -> fail unlock test slowpath -> false set slowpath flags block Whereas this works in any ordering: Unlocker Locker set slowpath flags test for lock pickup -> fail block unlock test slowpath -> true, kick If the unlocker finds that the lock has the slowpath flag set but it is actually uncontended (ie, head == tail, so nobody is waiting), then it clears the slowpath flag. The unlock code uses a locked add to update the head counter. This also acts as a full memory barrier so that its safe to subsequently read back the slowflag state, knowing that the updated lock is visible to the other CPUs. If it were an unlocked add, then the flag read may just be forwarded from the store buffer before it was visible to the other CPUs, which could result in a deadlock. Unfortunately this means we need to do a locked instruction when unlocking with PV ticketlocks. However, if PV ticketlocks are not enabled, then the old non-locked "add" is the only unlocking code. Note: this code relies on gcc making sure that unlikely() code is out of line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it doesn't the generated code isn't too bad, but its definitely suboptimal. Thanks to Srivatsa Vaddagiri for providing a bugfix to the original version of this change, which has been folded in. Thanks to Stephan Diestelhorst for commenting on some code which relied on an inaccurate reading of the x86 memory ordering rules. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.com Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09x86, pvticketlock: Use callee-save for lock_spinningJeremy Fitzhardinge1-1/+2
Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Attilio Rao <attilio.rao@citrix.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-09xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocksJeremy Fitzhardinge1-0/+14
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-7-git-send-email-raghavendra.kt@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>