summaryrefslogtreecommitdiff
path: root/arch/x86/xen
AgeCommit message (Collapse)AuthorFilesLines
2017-08-29x86/asm: Replace access to desc_struct:a/b fieldsThomas Gleixner1-1/+1
The union inside of desc_struct which allows access to the raw u32 parts of the descriptors. This raw access part is about to go away. Replace the few code parts which access those fields. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064958.120214366@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/idt: Unify gate_struct handling for 32/64-bit kernelsThomas Gleixner1-6/+6
The first 32 bits of gate struct are the same for 32 and 64 bit kernels. The 32-bit version uses desc_struct and no designated data structure, so we need different accessors for 32 and 64 bit kernels. Aside of that the macros which are necessary to build the 32-bit gate descriptor are horrible to read. Unify the gate structs and switch all code fiddling with it over. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.861974317@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-26Merge branch 'linus' into x86/mm to pick up fixes and to fix conflictsIngo Molnar3-24/+39
Conflicts: arch/x86/kernel/head64.c arch/x86/mm/mmap.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25Merge branch 'x86/asm' into x86/apicThomas Gleixner6-152/+48
Pick up dependent changes to avoid merge conflicts
2017-08-24x86/paravirt/xen: Remove xen_patch()Juergen Gross6-138/+21
Xen's paravirt patch function xen_patch() does some special casing for irq_ops functions to apply relocations when those functions can be patched inline instead of calls. Unfortunately none of the special case function replacements is small enough to be patched inline, so the special case never applies. As xen_patch() will call paravirt_patch_default() in all cases it can be just dropped. xen-asm.h doesn't seem necessary without xen_patch() as the only thing left in it would be the definition of XEN_EFLAGS_NMI used only once. So move that definition and remove xen-asm.h. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: lguest@lists.ozlabs.org Cc: rusty@rustcorp.com.au Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20170816173157.8633-2-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-15x86/xen/64: Fix the reported SS and CS in SYSCALLAndy Lutomirski1-0/+18
When I cleaned up the Xen SYSCALL entries, I inadvertently changed the reported segment registers. Before my patch, regs->ss was __USER(32)_DS and regs->cs was __USER(32)_CS. After the patch, they are FLAT_USER_CS/DS(32). This had a couple unfortunate effects. It confused the opportunistic fast return logic. It also significantly increased the risk of triggering a nasty glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21269 Update the Xen entry code to change it back. Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Fixes: 8a9949bc71a7 ("x86/xen/64: Rearrange the SYSCALL entries") Link: http://lkml.kernel.org/r/daba8351ea2764bb30272296ab9ce08a81bd8264.1502775273.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-11xen: fix hvm guest with kaslr enabledJuergen Gross1-2/+14
A Xen HVM guest running with KASLR enabled will die rather soon today because the shared info page mapping is using va() too early. This was introduced by commit a5d5f328b0e2baa5ee7c119fd66324eb79eeeb66 ("xen: allocate page for shared info page from low memory"). In order to fix this use early_memremap() to get a temporary virtual address for shared info until va() can be used safely. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-08-11xen: split up xen_hvm_init_shared_info()Juergen Gross1-21/+24
Instead of calling xen_hvm_init_shared_info() on boot and resume split it up into a boot time function searching for the pfn to use and a mapping function doing the hypervisor mapping call. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-08-10x86/xen/64: Rearrange the SYSCALL entriesAndy Lutomirski1-14/+9
Xen's raw SYSCALL entries are much less weird than native. Rather than fudging them to look like native entries, use the Xen-provided stack frame directly. This lets us eliminate entry_SYSCALL_64_after_swapgs and two uses of the SWAPGS_UNSAFE_STACK paravirt hook. The SYSENTER code would benefit from similar treatment. This makes one change to the native code path: the compat instruction that clears the high 32 bits of %rax is moved slightly later. I'd be surprised if this affects performance at all. Tested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/7c88ed36805d36841ab03ec3b48b4122c4418d71.1502164668.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-23xen/x86: fix cpu hotplugJuergen Gross1-1/+2
Commit dc6416f1d711eb4c1726e845d653235dcaae12e1 ("xen/x86: Call cpu_startup_entry(CPUHP_AP_ONLINE_IDLE) from xen_play_dead()") introduced an error leading to a stack overflow of the idle task when a cpu was brought offline/online many times: by calling cpu_startup_entry() instead of returning at the end of xen_play_dead() do_idle() would be entered again and again. Don't use cpu_startup_entry(), but cpuhp_online_idle() instead allowing to return from xen_play_dead(). Cc: <stable@vger.kernel.org> # 4.12 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-07-23xen/x86: Don't BUG on CPU0 offliningVitaly Kuznetsov1-1/+0
CONFIG_BOOTPARAM_HOTPLUG_CPU0 allows to offline CPU0 but Xen HVM guests BUG() in xen_teardown_timer(). Remove the BUG_ON(), this is probably a leftover from ancient times when CPU0 hotplug was impossible, it works just fine for HVM. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-07-21x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=yKirill A. Shutemov1-0/+5
Most of things are in place and we can enable support for 5-level paging. The patch makes XEN_PV and XEN_PVH dependent on !X86_5LEVEL. Both are not ready to work with 5-level paging. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170716225954.74185-9-kirill.shutemov@linux.intel.com [ Minor readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-21x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUDKirill A. Shutemov1-1/+1
XEN_ELFNOTE_INIT_P2M has to be 512GB for both 4- and 5-level paging. (PUD_SIZE * PTRS_PER_PUD) would do this. Unfortunately, we cannot use P4D_SIZE, which would fit here. With current headers structure it cannot be used in assembly, if p4d level is folded. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170716225954.74185-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18xen/x86: Remove SME feature in PV guestsTom Lendacky1-0/+1
Xen does not currently support SME for PV guests. Clear the SME CPU capability in order to avoid any ambiguity. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: <xen-devel@lists.xen.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/3b605622a9fae5e588e5a13967120a18ec18071b.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18Merge branch 'x86/boot' into x86/mm, to pick up interacting changesIngo Molnar12-155/+244
The SME patches we are about to apply add some E820 logic, so merge in pending E820 code changes first, to have a single code base. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-13kexec: move vmcoreinfo out of the kernel's .bss sectionXunlei Pang1-2/+2
As Eric said, "what we need to do is move the variable vmcoreinfo_note out of the kernel's .bss section. And modify the code to regenerate and keep this information in something like the control page. Definitely something like this needs a page all to itself, and ideally far away from any other kernel data structures. I clearly was not watching closely the data someone decided to keep this silly thing in the kernel's .bss section." This patch allocates extra pages for these vmcoreinfo_XXX variables, one advantage is that it enhances some safety of vmcoreinfo, because vmcoreinfo now is kept far away from other kernel data structures. Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.com Signed-off-by: Xunlei Pang <xlpang@redhat.com> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Reviewed-by: Juergen Gross <jgross@suse.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-07Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-14/+0
Pull dma-mapping infrastructure from Christoph Hellwig: "This is the first pull request for the new dma-mapping subsystem In this new subsystem we'll try to properly maintain all the generic code related to dma-mapping, and will further consolidate arch code into common helpers. This pull request contains: - removal of the DMA_ERROR_CODE macro, replacing it with calls to ->mapping_error so that the dma_map_ops instances are more self contained and can be shared across architectures (me) - removal of the ->set_dma_mask method, which duplicates the ->dma_capable one in terms of functionality, but requires more duplicate code. - various updates for the coherent dma pool and related arm code (Vladimir) - various smaller cleanups (me)" * tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits) ARM: dma-mapping: Remove traces of NOMMU code ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus ARM: NOMMU: Introduce dma operations for noMMU drivers: dma-mapping: allow dma_common_mmap() for NOMMU drivers: dma-coherent: Introduce default DMA pool drivers: dma-coherent: Account dma_pfn_offset when used with device tree dma: Take into account dma_pfn_offset dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs dma-mapping: remove dmam_free_noncoherent crypto: qat - avoid an uninitialized variable warning au1100fb: remove a bogus dma_free_nonconsistent call MAINTAINERS: add entry for dma mapping helpers powerpc: merge __dma_set_mask into dma_set_mask dma-mapping: remove the set_dma_mask method powerpc/cell: use the dma_supported method for ops switching powerpc/cell: clean up fixed mapping dma_ops initialization tile: remove dma_supported and mapping_error methods xen-swiotlb: remove xen_swiotlb_set_dma_mask arm: implement ->dma_supported instead of ->set_dma_mask mips/loongson64: implement ->dma_supported instead of ->set_dma_mask ...
2017-07-07Merge tag 'for-linus-4.13-rc1-tag' of ↵Linus Torvalds10-139/+242
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Other than fixes and cleanups it contains: - support > 32 VCPUs at domain restore - support for new sysfs nodes related to Xen - some performance tuning for Linux running as Xen guest" * tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: allow userspace access during hypercalls x86: xen: remove unnecessary variable in xen_foreach_remap_area() xen: allocate page for shared info page from low memory xen: avoid deadlock in xenbus driver xen: add sysfs node for hypervisor build id xen: sync include/xen/interface/version.h xen: add sysfs node for guest type doc,xen: document hypervisor sysfs nodes for xen xen/vcpu: Handle xen_vcpu_setup() failure at boot xen/vcpu: Handle xen_vcpu_setup() failure in hotplug xen/pv: Fix OOPS on restore for a PV, !SMP domain xen/pvh*: Support > 32 VCPUs at domain restore xen/vcpu: Simplify xen_vcpu related code xen-evtchn: Bind dyn evtchn:qemu-dm interrupt to next online VCPU xen: avoid type warning in xchg_xen_ulong xen: fix HYPERVISOR_dm_op() prototype xen: don't print error message in case of missing Xenstore entry arm/xen: Adjust one function call together with a variable assignment arm/xen: Delete an error message for a failed memory allocation in __set_phys_to_machine_multi() arm/xen: Improve a size determination in __set_phys_to_machine_multi()
2017-07-05x86/mm: Enable CR4.PCIDE on supported systemsAndy Lutomirski1-0/+6
We can use PCID if the CPU has PCID and PGE and we're not on Xen. By itself, this has no effect. A followup patch will start using PCID. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Nadav Amit <nadav.amit@gmail.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/6327ecd907b32f79d5aa0d466f04503bbec5df88.1498751203.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-05x86/mm: Rework lazy TLB mode and TLB freshness trackingAndy Lutomirski1-3/+2
x86's lazy TLB mode used to be fairly weak -- it would switch to init_mm the first time it tried to flush a lazy TLB. This meant an unnecessary CR3 write and, if the flush was remote, an unnecessary IPI. Rewrite it entirely. When we enter lazy mode, we simply remove the CPU from mm_cpumask. This means that we need a way to figure out whether we've missed a flush when we switch back out of lazy mode. I use the tlb_gen machinery to track whether a context is up to date. Note to reviewers: this patch, my itself, looks a bit odd. I'm using an array of length 1 containing (ctx_id, tlb_gen) rather than just storing tlb_gen, and making it at array isn't necessary yet. I'm doing this because the next few patches add PCID support, and, with PCID, we need ctx_id, and the array will end up with a length greater than 1. Making it an array now means that there will be less churn and therefore less stress on your eyeballs. NB: This is dubious but, AFAICT, still correct on Xen and UV. xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this patch changes the way that mm_cpumask() works. This should be okay, since Xen *also* iterates all online CPUs to find all the CPUs it needs to twiddle. The UV tlbflush code is rather dated and should be changed. Here are some benchmark results, done on a Skylake laptop at 2.3 GHz (turbo off, intel_pstate requesting max performance) under KVM with the guest using idle=poll (to avoid artifacts when bouncing between CPUs). I haven't done any real statistics here -- I just ran them in a loop and picked the fastest results that didn't look like outliers. Unpatched means commit a4eb8b993554, so all the bookkeeping overhead is gone. MADV_DONTNEED; touch the page; switch CPUs using sched_setaffinity. In an unpatched kernel, MADV_DONTNEED will send an IPI to the previous CPU. This is intended to be a nearly worst-case test. patched: 13.4µs unpatched: 21.6µs Vitaly's pthread_mmap microbenchmark with 8 threads (on four cores), nrounds = 100, 256M data patched: 1.1 seconds or so unpatched: 1.9 seconds or so The sleepup on Vitaly's test appearss to be because it spends a lot of time blocked on mmap_sem, and this patch avoids sending IPIs to blocked CPUs. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Nadav Amit <nadav.amit@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Banman <abanman@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Travis <travis@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/ddf2c92962339f4ba39d8fc41b853936ec0b44f1.1498751203.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-04Merge branch 'irq-core-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The irq department delivers: - Expand the generic infrastructure handling the irq migration on CPU hotplug and convert X86 over to it. (Thomas Gleixner) Aside of consolidating code this is a preparatory change for: - Finalizing the affinity management for multi-queue devices. The main change here is to shut down interrupts which are affine to a outgoing CPU and reenabling them when the CPU comes online again. That avoids moving interrupts pointlessly around and breaking and reestablishing affinities for no value. (Christoph Hellwig) Note: This contains also the BLOCK-MQ and NVME changes which depend on the rework of the irq core infrastructure. Jens acked them and agreed that they should go with the irq changes. - Consolidation of irq domain code (Marc Zyngier) - State tracking consolidation in the core code (Jeffy Chen) - Add debug infrastructure for hierarchical irq domains (Thomas Gleixner) - Infrastructure enhancement for managing generic interrupt chips via devmem (Bartosz Golaszewski) - Constification work all over the place (Tobias Klauser) - Two new interrupt controller drivers for MVEBU (Thomas Petazzoni) - The usual set of fixes, updates and enhancements all over the place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits) irqchip/or1k-pic: Fix interrupt acknowledgement irqchip/irq-mvebu-gicp: Allocate enough memory for spi_bitmap irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity nvme: Allocate queues for all possible CPUs blk-mq: Create hctx for each present CPU blk-mq: Include all present CPUs in the default queue mapping genirq: Avoid unnecessary low level irq function calls genirq: Set irq masked state when initializing irq_desc genirq/timings: Add infrastructure for estimating the next interrupt arrival time genirq/timings: Add infrastructure to track the interrupt timings genirq/debugfs: Remove pointless NULL pointer check irqchip/gic-v3-its: Don't assume GICv3 hardware supports 16bit INTID irqchip/gic-v3-its: Add ACPI NUMA node mapping irqchip/gic-v3-its-platform-msi: Make of_device_ids const irqchip/gic-v3-its: Make of_device_ids const irqchip/irq-mvebu-icu: Add new driver for Marvell ICU irqchip/irq-mvebu-gicp: Add new driver for Marvell GICP dt-bindings/interrupt-controller: Add DT binding for the Marvell ICU genirq/irqdomain: Remove auto-recursive hierarchy support irqchip/MSI: Use irq_domain_update_bus_token instead of an open coded access ...
2017-07-04Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2-45/+40
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main changes in this cycle were: - Continued work to add support for 5-level paging provided by future Intel CPUs. In particular we switch the x86 GUP code to the generic implementation. (Kirill A. Shutemov) - Continued work to add PCID CPU support to native kernels as well. In this round most of the focus is on reworking/refreshing the TLB flush infrastructure for the upcoming PCID changes. (Andy Lutomirski)" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/mm: Delete a big outdated comment about TLB flushing x86/mm: Don't reenter flush_tlb_func_common() x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging x86/ftrace: Exclude functions in head64.c from function-tracing x86/mmap, ASLR: Do not treat unlimited-stack tasks as legacy mmap x86/mm: Remove reset_lazy_tlbstate() x86/ldt: Simplify the LDT switching logic x86/boot/64: Put __startup_64() into .head.text x86/mm: Add support for 5-level paging for KASLR x86/mm: Make kernel_physical_mapping_init() support 5-level paging x86/mm: Add sync_global_pgds() for configuration with 5-level paging x86/boot/64: Add support of additional page table level during early boot x86/boot/64: Rename init_level4_pgt and early_level4_pgt x86/boot/64: Rewrite startup_64() in C x86/boot/compressed: Enable 5-level paging during decompression stage x86/boot/efi: Define __KERNEL32_CS GDT on 64-bit configurations x86/boot/efi: Fix __KERNEL_CS definition of GDT entry on 64-bit configurations x86/boot/efi: Cleanup initialization of GDT entries x86/asm: Fix comment in return_from_SYSCALL_64() x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation ...
2017-07-03Merge branch 'efi-core-for-linus' of ↵Linus Torvalds1-33/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this cycle were: - Rework the EFI capsule loader to allow for workarounds for non-compliant firmware (Ard Biesheuvel) - Implement a capsule loader quirk for Quark X102x (Jan Kiszka) - Enable SMBIOS/DMI support for the ARM architecture (Ard Biesheuvel) - Add CONFIG_EFI_PGT_DUMP=y support for x86-32 and kexec (Sai Praneeth) - Fixes for EFI support for Xen dom0 guests running under x86-64 hosts (Daniel Kiper)" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/xen/efi: Initialize only the EFI struct members used by Xen efi: Process the MEMATTR table only if EFI_MEMMAP is enabled efi/arm: Enable DMI/SMBIOS x86/efi: Extend CONFIG_EFI_PGT_DUMP support to x86_32 and kexec as well efi/efi_test: Use memdup_user() helper efi/capsule: Add support for Quark security header efi/capsule-loader: Use page addresses rather than struct page pointers efi/capsule-loader: Redirect calls to efi_capsule_setup_info() via weak alias efi/capsule: Remove NULL test on kmap() efi/capsule-loader: Use a cached copy of the capsule header efi/capsule: Adjust return type of efi_capsule_setup_info() efi/capsule: Clean up pr_err/_info() messages efi/capsule: Remove pr_debug() on ENOMEM or EFAULT efi/capsule: Fix return code on failing kmap/vmap
2017-07-03x86: xen: remove unnecessary variable in xen_foreach_remap_area()Gustavo A. R. Silva1-5/+2
Remove unnecessary variable mfn in function xen_foreach_remap_area() and, refactor the code. Variable mfn at line 518:mfn = xen_remap_buf.mfns[i]; is only being used to store a value to be passed as an argument to the xen_update_mem_tables() function. This value can be passed directly, which makes variable mfn unnecessary. Also, value assigned to variable mfn at line 534:mfn = xen_remap_mfn; is never used. Addresses-Coverity-ID: 1260110 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-30objtool, x86: Add several functions and files to the objtool whitelistJosh Poimboeuf1-0/+3
In preparation for an objtool rewrite which will have broader checks, whitelist functions and files which cause problems because they do unusual things with the stack. These whitelists serve as a TODO list for which functions and files don't yet have undwarf unwinder coverage. Eventually most of the whitelists can be removed in favor of manual CFI hint annotations or objtool improvements. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-25xen: allocate page for shared info page from low memoryJuergen Gross2-9/+24
In a HVM guest the kernel allocates the page for mapping the shared info structure via extend_brk() today. This will lead to a drop of performance as the underlying EPT entry will have to be split up into 4kB entries as the single shared info page is located in hypervisor memory. The issue has been detected by using the libmicro munmap test: unmapping 8kB of memory was faster by nearly a factor of two when no pv interfaces were active in the HVM guest. So instead of taking a page from memory which might be mapped via large EPT entries use a page which is already mapped via a 4kB EPT entry: we can take a page from the first 1MB of memory as the video memory at 640kB disallows using larger EPT entries. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-23x86/xen/efi: Initialize only the EFI struct members used by XenDaniel Kiper1-33/+12
The current approach, which is the wholesale efi struct initialization from a 'efi_xen' local template is not robust. Usually if new member is defined then it is properly initialized in drivers/firmware/efi/efi.c, but not in arch/x86/xen/efi.c. The effect is that the Xen initialization clears any fields the generic code might have set and the Xen code does not know about yet. I saw this happen a few times, so let's initialize only the EFI struct members used by Xen and maintain no local duplicate, to avoid such issues in the future. Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrew.cooper3@citrix.com Cc: jgross@suse.com Cc: linux-efi@vger.kernel.org Cc: matt@codeblueprint.co.uk Cc: stable@vger.kernel.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1498128697-12943-3-git-send-email-daniel.kiper@oracle.com [ Clarified the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-22x86/apic: Move cpumask and to core codeThomas Gleixner1-1/+1
All implementations of apic->cpu_mask_to_apicid_and() and the two incoming cpumasks to search for the target. Move that operation to the call site and rename it to cpu_mask_to_apicid() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170619235446.641575516@linutronix.de
2017-06-20xen-swiotlb: consolidate xen_swiotlb_dma_opsChristoph Hellwig1-14/+0
ARM and x86 had duplicated versions of the dma_ops structure, the only difference is that x86 hasn't wired up the set_dma_mask, mmap, and get_sgtable ops yet. On x86 all of them are identical to the generic version, so they aren't needed but harmless. All the symbols used only for xen_swiotlb_dma_ops can now be marked static as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2017-06-13xen/vcpu: Handle xen_vcpu_setup() failure at bootAnkur Arora4-5/+35
On PVH, PVHVM, at failure in the VCPUOP_register_vcpu_info hypercall we limit the number of cpus to to MAX_VIRT_CPUS. However, if this failure had occurred for a cpu beyond MAX_VIRT_CPUS, we continue to function with > MAX_VIRT_CPUS. This leads to problems at the next save/restore cycle when there are > MAX_VIRT_CPUS threads going into stop_machine() but coming back up there's valid state for only the first MAX_VIRT_CPUS. This patch pulls the excess CPUs down via cpu_down(). Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-13xen/vcpu: Handle xen_vcpu_setup() failure in hotplugAnkur Arora4-26/+45
The hypercall VCPUOP_register_vcpu_info can fail. This failure is handled by making per_cpu(xen_vcpu, cpu) point to its shared_info slot and those without one (cpu >= MAX_VIRT_CPUS) be NULL. For PVH/PVHVM, this is not enough, because we also need to pull these VCPUs out of circulation. Fix for PVH/PVHVM: on registration failure in the cpuhp prepare callback (xen_cpu_up_prepare_hvm()), return an error to the cpuhp state-machine so it can fail the CPU init. Fix for PV: the registration happens before smp_init(), so, in the failure case we clamp setup_max_cpus and limit the number of VCPUs that smp_init() will bring-up to MAX_VIRT_CPUS. This is functionally correct but it makes the code a bit simpler if we get rid of this explicit clamping: for VCPUs that don't have valid xen_vcpu, fail the CPU init in the cpuhp prepare callback (xen_cpu_up_prepare_pv()). Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-13xen/pv: Fix OOPS on restore for a PV, !SMP domainAnkur Arora1-11/+15
If CONFIG_SMP is disabled, xen_setup_vcpu_info_placement() is called from xen_setup_shared_info(). This is fine as far as boot goes, but it means that we also call it in the restore path. This results in an OOPS because we assign to pv_mmu_ops.read_cr2 which is __ro_after_init. Also, though less problematically, this means we call xen_vcpu_setup() twice at restore -- once from the vcpu info placement call and the second time from xen_vcpu_restore(). Fix by calling xen_setup_vcpu_info_placement() at boot only. Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-13xen/pvh*: Support > 32 VCPUs at domain restoreAnkur Arora4-34/+52
When Xen restores a PVHVM or PVH guest, its shared_info only holds up to 32 CPUs. The hypercall VCPUOP_register_vcpu_info allows us to setup per-page areas for VCPUs. This means we can boot PVH* guests with more than 32 VCPUs. During restore the per-cpu structure is allocated freshly by the hypervisor (vcpu_info_mfn is set to INVALID_MFN) so that the newly restored guest can make a VCPUOP_register_vcpu_info hypercall. However, we end up triggering this condition in Xen: /* Run this command on yourself or on other offline VCPUS. */ if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) ) which means we are unable to setup the per-cpu VCPU structures for running VCPUS. The Linux PV code paths makes this work by iterating over cpu_possible in xen_vcpu_restore() with: 1) is target CPU up (VCPUOP_is_up hypercall?) 2) if yes, then VCPUOP_down to pause it 3) VCPUOP_register_vcpu_info 4) if it was down, then VCPUOP_up to bring it back up With Xen commit 192df6f9122d ("xen/x86: allow HVM guests to use hypercalls to bring up vCPUs") this is available for non-PV guests. As such first check if VCPUOP_is_up is actually possible before trying this dance. As most of this dance code is done already in xen_vcpu_restore() let's make it callable on PV, PVH and PVHVM. Based-on-patch-by: Konrad Wilk <konrad.wilk@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-13xen/vcpu: Simplify xen_vcpu related codeAnkur Arora5-69/+89
Largely mechanical changes to aid unification of xen_vcpu_restore() logic for PV, PVH and PVHVM. xen_vcpu_setup(): the only change in logic is that clamp_max_cpus() is now handled inside the "if (!xen_have_vcpu_info_placement)" block. xen_vcpu_restore(): code movement from enlighten_pv.c to enlighten.c. xen_vcpu_info_reset(): pulls together all the code where xen_vcpu is set to default. Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-06-13x86/boot/64: Rename init_level4_pgt and early_level4_pgtKirill A. Shutemov2-9/+9
With CONFIG_X86_5LEVEL=y, level 4 is no longer top level of page tables. Let's give these variable more generic names: init_top_pgt and early_top_pgt. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-9-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3()Andy Lutomirski1-3/+3
The kernel has several code paths that read CR3. Most of them assume that CR3 contains the PGD's physical address, whereas some of them awkwardly use PHYSICAL_PAGE_MASK to mask off low bits. Add explicit mask macros for CR3 and convert all of the CR3 readers. This will keep them from breaking when PCID is enabled. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05x86/mm: Rework lazy TLB to track the actual loaded mmAndy Lutomirski1-28/+23
Lazy TLB state is currently managed in a rather baroque manner. AFAICT, there are three possible states: - Non-lazy. This means that we're running a user thread or a kernel thread that has called use_mm(). current->mm == current->active_mm == cpu_tlbstate.active_mm and cpu_tlbstate.state == TLBSTATE_OK. - Lazy with user mm. We're running a kernel thread without an mm and we're borrowing an mm_struct. We have current->mm == NULL, current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0). The current cpu is set in mm_cpumask(current->active_mm). CR3 points to current->active_mm->pgd. The TLB is up to date. - Lazy with init_mm. This happens when we call leave_mm(). We have current->mm == NULL, current->active_mm == cpu_tlbstate.active_mm, but that mm is only relelvant insofar as the scheduler is tracking it for refcounting. cpu_tlbstate.state != TLBSTATE_OK. The current cpu is clear in mm_cpumask(current->active_mm). CR3 points to swapper_pg_dir, i.e. init_mm->pgd. This patch simplifies the situation. Other than perf, x86 stops caring about current->active_mm at all. We have cpu_tlbstate.loaded_mm pointing to the mm that CR3 references. The TLB is always up to date for that mm. leave_mm() just switches us to init_mm. There are no longer any special cases for mm_cpumask, and switch_mm() switches mms without worrying about laziness. After this patch, cpu_tlbstate.state serves only to tell the TLB flush code whether it may switch to init_mm instead of doing a normal flush. This makes fairly extensive changes to xen_exit_mmap(), which used to look a bit like black magic. Perf is unchanged. With or without this change, perf may behave a bit erratically if it tries to read user memory in kernel thread context. We should build on this patch to teach perf to never look at user memory when cpu_tlbstate.loaded_mm != current->mm. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-05x86/mm: Pass flush_tlb_info to flush_tlb_others() etcAndy Lutomirski1-5/+5
Rather than passing all the contents of flush_tlb_info to flush_tlb_others(), pass a pointer to the structure directly. For consistency, this also removes the unnecessary cpu parameter from uv_flush_tlb_others() to make its signature match the other *flush_tlb_others() functions. This serves two purposes: - It will dramatically simplify future patches that change struct flush_tlb_info, which I'm planning to do. - struct flush_tlb_info is an adequate description of what to do for a local flush, too, so by reusing it we can remove duplicated code between local and remove flushes in a future patch. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org [ Fix build warning. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-19xen: make xen_flush_tlb_all() staticJuergen Gross1-1/+1
xen_flush_tlb_all() is used in arch/x86/xen/mmu.c only. Make it static. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-19xen: cleanup pvh leftovers from pv-only sourcesJuergen Gross2-75/+42
There are some leftovers testing for pvh guest mode in pv-only source files. Remove them. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-11xen: adjust early dom0 p2m handling to xen hypervisor behaviorJuergen Gross1-3/+4
When booted as pv-guest the p2m list presented by the Xen is already mapped to virtual addresses. In dom0 case the hypervisor might make use of 2M- or 1G-pages for this mapping. Unfortunately while being properly aligned in virtual and machine address space, those pages might not be aligned properly in guest physical address space. So when trying to obtain the guest physical address of such a page pud_pfn() and pmd_pfn() must be avoided as those will mask away guest physical address bits not being zero in this special case. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-11x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under XenJuergen Gross1-1/+0
When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set on AMD cpus. This bug/feature bit is kind of special as it will be used very early when switching threads. Setting the bit and clearing it a little bit later leaves a critical window where things can go wrong. This time window has enlarged a little bit by using setup_clear_cpu_cap() instead of the hypervisor's set_cpu_features callback. It seems this larger window now makes it rather easy to hit the problem. The proper solution is to never set the bit in case of Xen. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-05xen/x86: Do not call xen_init_time_ops() until shared_info is initializedBoris Ostrovsky2-3/+8
Routines that are set by xen_init_time_ops() use shared_info's pvclock_vcpu_time_info area. This area is not properly available until shared_info is mapped in xen_setup_shared_info(). This became especially problematic due to commit dd759d93f4dd ("x86/timers: Add simple udelay calibration") where we end up reading tsc_to_system_mul from xen_dummy_shared_info (i.e. getting zero value) and then trying to divide by it in pvclock_tsc_khz(). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-05x86/xen: fix xsave capability settingJuergen Gross1-23/+9
Commit 690b7f10b4f9f ("x86/xen: use capabilities instead of fake cpuid values for xsave") introduced a regression as it tried to make use of the fixup feature before it being available. Fall back to the old variant testing via cpuid(). Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-03xen: Move xen_have_vector_callback definition to enlighten.cBoris Ostrovsky2-3/+3
Commit 84d582d236dc ("xen: Revert commits da72ff5bfcb0 and 72a9b186292d") defined xen_have_vector_callback in enlighten_hvm.c. Since guest-type-neutral code refers to this variable this causes build failures when CONFIG_XEN_PVHVM is not defined. Moving xen_have_vector_callback definition to enlighten.c resolves this issue. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02xen: Implement EFI reset_system callbackJulien Grall1-1/+1
When rebooting DOM0 with ACPI on ARM64, the kernel is crashing with the stack trace [1]. This is happening because when EFI runtimes are enabled, the reset code (see machine_restart) will first try to use EFI restart method. However, the EFI restart code is expecting the reset_system callback to be always set. This is not the case for Xen and will lead to crash. The EFI restart helper is used in multiple places and some of them don't not have fallback (see machine_power_off). So implement reset_system callback as a call to xen_reboot when using EFI Xen. [ 36.999270] reboot: Restarting system [ 37.002921] Internal error: Attempting to execute userspace memory: 86000004 [#1] PREEMPT SMP [ 37.011460] Modules linked in: [ 37.014598] CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.11.0-rc1-00003-g1e248b60a39b-dirty #506 [ 37.023903] Hardware name: (null) (DT) [ 37.027734] task: ffff800902068000 task.stack: ffff800902064000 [ 37.033739] PC is at 0x0 [ 37.036359] LR is at efi_reboot+0x94/0xd0 [ 37.040438] pc : [<0000000000000000>] lr : [<ffff00000880f2c4>] pstate: 404001c5 [ 37.047920] sp : ffff800902067cf0 [ 37.051314] x29: ffff800902067cf0 x28: ffff800902068000 [ 37.056709] x27: ffff000008992000 x26: 000000000000008e [ 37.062104] x25: 0000000000000123 x24: 0000000000000015 [ 37.067499] x23: 0000000000000000 x22: ffff000008e6e250 [ 37.072894] x21: ffff000008e6e000 x20: 0000000000000000 [ 37.078289] x19: ffff000008e5d4c8 x18: 0000000000000010 [ 37.083684] x17: 0000ffffa7c27470 x16: 00000000deadbeef [ 37.089079] x15: 0000000000000006 x14: ffff000088f42bef [ 37.094474] x13: ffff000008f42bfd x12: ffff000008e706c0 [ 37.099870] x11: ffff000008e70000 x10: 0000000005f5e0ff [ 37.105265] x9 : ffff800902067a50 x8 : 6974726174736552 [ 37.110660] x7 : ffff000008cc6fb8 x6 : ffff000008cc6fb0 [ 37.116055] x5 : ffff000008c97dd8 x4 : 0000000000000000 [ 37.121453] x3 : 0000000000000000 x2 : 0000000000000000 [ 37.126845] x1 : 0000000000000000 x0 : 0000000000000000 [ 37.132239] [ 37.133808] Process systemd-shutdow (pid: 1, stack limit = 0xffff800902064000) [ 37.141118] Stack: (0xffff800902067cf0 to 0xffff800902068000) [ 37.146949] 7ce0: ffff800902067d40 ffff000008085334 [ 37.154869] 7d00: 0000000000000000 ffff000008f3b000 ffff800902067d40 ffff0000080852e0 [ 37.162787] 7d20: ffff000008cc6fb0 ffff000008cc6fb8 ffff000008c7f580 ffff000008c97dd8 [ 37.170706] 7d40: ffff800902067d60 ffff0000080e2c2c 0000000000000000 0000000001234567 [ 37.178624] 7d60: ffff800902067d80 ffff0000080e2ee8 0000000000000000 ffff0000080e2df4 [ 37.186544] 7d80: 0000000000000000 ffff0000080830f0 0000000000000000 00008008ff1c1000 [ 37.194462] 7da0: ffffffffffffffff 0000ffffa7c4b1cc 0000000000000000 0000000000000024 [ 37.202380] 7dc0: ffff800902067dd0 0000000000000005 0000fffff24743c8 0000000000000004 [ 37.210299] 7de0: 0000fffff2475f03 0000000000000010 0000fffff2474418 0000000000000005 [ 37.218218] 7e00: 0000fffff2474578 000000000000000a 0000aaaad6b722c0 0000000000000001 [ 37.226136] 7e20: 0000000000000123 0000000000000038 ffff800902067e50 ffff0000081e7294 [ 37.234055] 7e40: ffff800902067e60 ffff0000081e935c ffff800902067e60 ffff0000081e9388 [ 37.241973] 7e60: ffff800902067eb0 ffff0000081ea388 0000000000000000 00008008ff1c1000 [ 37.249892] 7e80: ffffffffffffffff 0000ffffa7c4a79c 0000000000000000 ffff000000020000 [ 37.257810] 7ea0: 0000010000000004 0000000000000000 0000000000000000 ffff0000080830f0 [ 37.265729] 7ec0: fffffffffee1dead 0000000028121969 0000000001234567 0000000000000000 [ 37.273651] 7ee0: ffffffffffffffff 8080000000800000 0000800000008080 feffa9a9d4ff2d66 [ 37.281567] 7f00: 000000000000008e feffa9a9d5b60e0f 7f7fffffffff7f7f 0101010101010101 [ 37.289485] 7f20: 0000000000000010 0000000000000008 000000000000003a 0000ffffa7ccf588 [ 37.297404] 7f40: 0000aaaad6b87d00 0000ffffa7c4b1b0 0000fffff2474be0 0000aaaad6b88000 [ 37.305326] 7f60: 0000fffff2474fb0 0000000001234567 0000000000000000 0000000000000000 [ 37.313240] 7f80: 0000000000000000 0000000000000001 0000aaaad6b70d4d 0000000000000000 [ 37.321159] 7fa0: 0000000000000001 0000fffff2474ea0 0000aaaad6b5e2e0 0000fffff2474e80 [ 37.329078] 7fc0: 0000ffffa7c4b1cc 0000000000000000 fffffffffee1dead 000000000000008e [ 37.336997] 7fe0: 0000000000000000 0000000000000000 9ce839cffee77eab fafdbf9f7ed57f2f [ 37.344911] Call trace: [ 37.347437] Exception stack(0xffff800902067b20 to 0xffff800902067c50) [ 37.353970] 7b20: ffff000008e5d4c8 0001000000000000 0000000080f82000 0000000000000000 [ 37.361883] 7b40: ffff800902067b60 ffff000008e17000 ffff000008f44c68 00000001081081b4 [ 37.369802] 7b60: ffff800902067bf0 ffff000008108478 0000000000000000 ffff000008c235b0 [ 37.377721] 7b80: ffff800902067ce0 0000000000000000 0000000000000000 0000000000000015 [ 37.385643] 7ba0: 0000000000000123 000000000000008e ffff000008992000 ffff800902068000 [ 37.393557] 7bc0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 37.401477] 7be0: 0000000000000000 ffff000008c97dd8 ffff000008cc6fb0 ffff000008cc6fb8 [ 37.409396] 7c00: 6974726174736552 ffff800902067a50 0000000005f5e0ff ffff000008e70000 [ 37.417318] 7c20: ffff000008e706c0 ffff000008f42bfd ffff000088f42bef 0000000000000006 [ 37.425234] 7c40: 00000000deadbeef 0000ffffa7c27470 [ 37.430190] [< (null)>] (null) [ 37.434982] [<ffff000008085334>] machine_restart+0x6c/0x70 [ 37.440550] [<ffff0000080e2c2c>] kernel_restart+0x6c/0x78 [ 37.446030] [<ffff0000080e2ee8>] SyS_reboot+0x130/0x228 [ 37.451337] [<ffff0000080830f0>] el0_svc_naked+0x24/0x28 [ 37.456737] Code: bad PC value [ 37.459891] ---[ end trace 76e2fc17e050aecd ]--- Signed-off-by: Julien Grall <julien.grall@arm.com> -- Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org The x86 code has theoritically a similar issue, altought EFI does not seem to be the preferred method. I have only built test it on x86. This should also probably be fixed in stable tree. Changes in v2: - Implement xen_efi_reset_system using xen_reboot - Move xen_efi_reset_system in drivers/xen/efi.c Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02xen: Export xen_rebootJulien Grall1-1/+0
The helper xen_reboot will be called by the EFI code in a later patch. Note that the ARM version does not yet exist and will be added in a later patch too. Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02xen/x86: Call xen_smp_intr_init_pv() on BSPBoris Ostrovsky1-1/+1
Recent code rework that split handling ov PV, HVM and PVH guests into separate files missed calling xen_smp_intr_init_pv() on CPU0. Add this call. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02xen: Revert commits da72ff5bfcb0 and 72a9b186292dBoris Ostrovsky3-4/+21
Recent discussion (http://marc.info/?l=xen-devel&m=149192184523741) established that commit 72a9b186292d ("xen: Remove event channel notification through Xen PCI platform device") (and thus commit da72ff5bfcb0 ("partially revert "xen: Remove event channel notification through Xen PCI platform device"")) are unnecessary and, in fact, prevent HVM guests from booting on Xen releases prior to 4.0 Therefore we revert both of those commits. The summary of that discussion is below: Here is the brief summary of the current situation: Before the offending commit (72a9b186292): 1) INTx does not work because of the reset_watches path. 2) The reset_watches path is only taken if you have Xen > 4.0 3) The Linux Kernel by default will use vector inject if the hypervisor support. So even INTx does not work no body running the kernel with Xen > 4.0 would notice. Unless he explicitly disabled this feature either in the kernel or in Xen (and this can only be disabled by modifying the code, not user-supported way to do it). After the offending commit (+ partial revert): 1) INTx is no longer support for HVM (only for PV guests). 2) Any HVM guest The kernel will not boot on Xen < 4.0 which does not have vector injection support. Since the only other mode supported is INTx which. So based on this summary, I think before commit (72a9b186292) we were in much better position from a user point of view. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2017-05-02xen/pvh: Do not fill kernel's e820 map in init_pvh_bootparams()Boris Ostrovsky1-14/+6
e820 map is updated with information from the zeropage (i.e. pvh_bootparams) by default_machine_specific_memory_setup(). With the way things are done now, we end up with a duplicated e820 map. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>