summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)AuthorFilesLines
2015-06-23Merge branch 'x86-core-for-linus' of ↵Linus Torvalds39-748/+453
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Ingo Molnar: "There were so many changes in the x86/asm, x86/apic and x86/mm topics in this cycle that the topical separation of -tip broke down somewhat - so the result is a more traditional architecture pull request, collected into the 'x86/core' topic. The topics were still maintained separately as far as possible, so bisectability and conceptual separation should still be pretty good - but there were a handful of merge points to avoid excessive dependencies (and conflicts) that would have been poorly tested in the end. The next cycle will hopefully be much more quiet (or at least will have fewer dependencies). The main changes in this cycle were: * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas Gleixner) - This is the second and most intrusive part of changes to the x86 interrupt handling - full conversion to hierarchical interrupt domains: [IOAPIC domain] ----- | [MSI domain] --------[Remapping domain] ----- [ Vector domain ] | (optional) | [HPET MSI domain] ----- | | [DMAR domain] ----------------------------- | [Legacy domain] ----------------------------- This now reflects the actual hardware and allowed us to distangle the domain specific code from the underlying parent domain, which can be optional in the case of interrupt remapping. It's a clear separation of functionality and removes quite some duct tape constructs which plugged the remap code between ioapic/msi/hpet and the vector management. - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt injection into guests (Feng Wu) * x86/asm changes: - Tons of cleanups and small speedups, micro-optimizations. This is in preparation to move a good chunk of the low level entry code from assembly to C code (Denys Vlasenko, Andy Lutomirski, Brian Gerst) - Moved all system entry related code to a new home under arch/x86/entry/ (Ingo Molnar) - Removal of the fragile and ugly CFI dwarf debuginfo annotations. Conversion to C will reintroduce many of them - but meanwhile they are only getting in the way, and the upstream kernel does not rely on them (Ingo Molnar) - NOP handling refinements. (Borislav Petkov) * x86/mm changes: - Big PAT and MTRR rework: making the code more robust and preparing to phase out exposing direct MTRR interfaces to drivers - in favor of using PAT driven interfaces (Toshi Kani, Luis R Rodriguez, Borislav Petkov) - New ioremap_wt()/set_memory_wt() interfaces to support Write-Through cached memory mappings. This is especially important for good performance on NVDIMM hardware (Toshi Kani) * x86/ras changes: - Add support for deferred errors on AMD (Aravind Gopalakrishnan) This is an important RAS feature which adds hardware support for poisoned data. That means roughly that the hardware marks data which it has detected as corrupted but wasn't able to correct, as poisoned data and raises an APIC interrupt to signal that in the form of a deferred error. It is the OS's responsibility then to take proper recovery action and thus prolonge system lifetime as far as possible. - Add support for Intel "Local MCE"s: upcoming CPUs will support CPU-local MCE interrupts, as opposed to the traditional system- wide broadcasted MCE interrupts (Ashok Raj) - Misc cleanups (Borislav Petkov) * x86/platform changes: - Intel Atom SoC updates ... and lots of other cleanups, fixlets and other changes - see the shortlog and the Git log for details" * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits) x86/hpet: Use proper hpet device number for MSI allocation x86/hpet: Check for irq==0 when allocating hpet MSI interrupts x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail genirq: Prevent crash in irq_move_irq() genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain iommu, x86: Properly handle posted interrupts for IOMMU hotplug iommu, x86: Provide irq_remapping_cap() interface iommu, x86: Setup Posted-Interrupts capability for Intel iommu iommu, x86: Add cap_pi_support() to detect VT-d PI capability iommu, x86: Avoid migrating VT-d posted interrupts iommu, x86: Save the mode (posted or remapped) of an IRTE iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip iommu: dmar: Provide helper to copy shared irte fields iommu: dmar: Extend struct irte for VT-d Posted-Interrupts iommu: Add new member capability to struct irq_remap_ops x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry() ...
2015-06-23Merge branch 'x86-microcode-for-linus' of ↵Linus Torvalds3-14/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pul x86 microcode updates from Ingo Molnar: "x86 microcode loader updates from Borislav Petkov: - early parsing of the built-in microcode - cleanups - misc smaller fixes" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Correct CPU family related variable types x86/microcode: Disable builtin microcode loading on 32-bit for now x86/microcode/intel: Rename get_matching_sig() x86/microcode/intel: Simplify get_matching_sig() x86/microcode/intel: Simplify update_match_cpu() x86/microcode/intel: Rename get_matching_microcode x86/cpu/microcode: Zap changelog x86/microcode: Parse built-in microcode early x86/microcode/intel: Remove unused @rev arg of get_matching_sig() x86/microcode/intel: Get rid of revision_is_newer()
2015-06-23Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds27-1243/+1357
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU updates from Ingo Molnar: "This tree contains two main changes: - The big FPU code rewrite: wide reaching cleanups and reorganization that pulls all the FPU code together into a clean base in arch/x86/fpu/. The resulting code is leaner and faster, and much easier to understand. This enables future work to further simplify the FPU code (such as removing lazy FPU restores). By its nature these changes have a substantial regression risk: FPU code related bugs are long lived, because races are often subtle and bugs mask as user-space failures that are difficult to track back to kernel side backs. I'm aware of no unfixed (or even suspected) FPU related regression so far. - MPX support rework/fixes. As this is still not a released CPU feature, there were some buglets in the code - should be much more robust now (Dave Hansen)" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (250 commits) x86/fpu: Fix double-increment in setup_xstate_features() x86/mpx: Allow 32-bit binaries on 64-bit kernels again x86/mpx: Do not count MPX VMAs as neighbors when unmapping x86/mpx: Rewrite the unmap code x86/mpx: Support 32-bit binaries on 64-bit kernels x86/mpx: Use 32-bit-only cmpxchg() for 32-bit apps x86/mpx: Introduce new 'directory entry' to 'addr' helper function x86/mpx: Add temporary variable to reduce masking x86: Make is_64bit_mm() widely available x86/mpx: Trace allocation of new bounds tables x86/mpx: Trace the attempts to find bounds tables x86/mpx: Trace entry to bounds exception paths x86/mpx: Trace #BR exceptions x86/mpx: Introduce a boot-time disable flag x86/mpx: Restrict the mmap() size check to bounds tables x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASK x86/mpx: Clean up the code by not passing a task pointer around when unnecessary x86/mpx: Use the new get_xsave_field_ptr()API x86/fpu/xstate: Wrap get_xsave_addr() to make it safer x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions ...
2015-06-23Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds2-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU features from Ingo Molnar: "Various CPU feature support related changes: in particular the /proc/cpuinfo model name sanitization change should be monitored, it has a chance to break stuff. (but really shouldn't and there are no regression reports)" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Give access to the number of nodes in a physical package x86/cpu: Trim model ID whitespace x86/cpu: Strip any /proc/cpuinfo model name field whitespace x86/cpu/amd: Set X86_FEATURE_EXTD_APICID for future processors x86/gart: Check for GART support before accessing GART registers
2015-06-23Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds1-41/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Clean up types in xlate_dev_mem_ptr() some more x86: Deinline dma_free_attrs() x86: Deinline dma_alloc_attrs() x86: Remove unused TI_cpu x86: Merge common 32-bit values in asm-offsets.c
2015-06-23Merge branch 'sched-core-for-linus' of ↵Linus Torvalds5-23/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes are: - lockless wakeup support for futexes and IPC message queues (Davidlohr Bueso, Peter Zijlstra) - Replace spinlocks with atomics in thread_group_cputimer(), to improve scalability (Jason Low) - NUMA balancing improvements (Rik van Riel) - SCHED_DEADLINE improvements (Wanpeng Li) - clean up and reorganize preemption helpers (Frederic Weisbecker) - decouple page fault disabling machinery from the preemption counter, to improve debuggability and robustness (David Hildenbrand) - SCHED_DEADLINE documentation updates (Luca Abeni) - topology CPU masks cleanups (Bartosz Golaszewski) - /proc/sched_debug improvements (Srikar Dronamraju)" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits) sched/deadline: Remove needless parameter in dl_runtime_exceeded() sched: Remove superfluous resetting of the p->dl_throttled flag sched/deadline: Drop duplicate init_sched_dl_class() declaration sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target sched/deadline: Make init_sched_dl_class() __init sched/deadline: Optimize pull_dl_task() sched/preempt: Add static_key() to preempt_notifiers sched/preempt: Fix preempt notifiers documentation about hlist_del() within unsafe iteration sched/stop_machine: Fix deadlock between multiple stop_two_cpus() sched/debug: Add sum_sleep_runtime to /proc/<pid>/sched sched/debug: Replace vruntime with wait_sum in /proc/sched_debug sched/debug: Properly format runnable tasks in /proc/sched_debug sched/numa: Only consider less busy nodes as numa balancing destinations Revert 095bebf61a46 ("sched/numa: Do not move past the balance point if unbalanced") sched/fair: Prevent throttling in early pick_next_task_fair() preempt: Reorganize the notrace definitions a bit preempt: Use preempt_schedule_context() as the official tracing preemption point sched: Make preempt_schedule_context() function-tracing safe x86: Remove cpu_sibling_mask() and cpu_core_mask() x86: Replace cpu_**_mask() with topology_**_cpumask() ...
2015-06-23Merge branch 'locking-core-for-linus' of ↵Linus Torvalds8-5/+112
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes are: - 'qspinlock' support, enabled on x86: queued spinlocks - these are now the spinlock variant used by x86 as they outperform ticket spinlocks in every category. (Waiman Long) - 'pvqspinlock' support on x86: paravirtualized variant of queued spinlocks. (Waiman Long, Peter Zijlstra) - 'qrwlock' support, enabled on x86: queued rwlocks. Similar to queued spinlocks, they are now the variant used by x86: CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y CONFIG_QUEUED_SPINLOCKS=y CONFIG_ARCH_USE_QUEUED_RWLOCKS=y CONFIG_QUEUED_RWLOCKS=y - various lockdep fixlets - various locking primitives cleanups, further WRITE_ONCE() propagation" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) locking/lockdep: Remove hard coded array size dependency locking/qrwlock: Don't contend with readers when setting _QW_WAITING lockdep: Do not break user-visible string locking/arch: Rename set_mb() to smp_store_mb() locking/arch: Add WRITE_ONCE() to set_mb() rtmutex: Warn if trylock is called from hard/softirq context arch: Remove __ARCH_HAVE_CMPXCHG locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS locking/pvqspinlock: Replace xchg() by the more descriptive set_mb() locking/pvqspinlock, x86: Enable PV qspinlock for Xen locking/pvqspinlock, x86: Enable PV qspinlock for KVM locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching locking/pvqspinlock: Implement simple paravirt support for the qspinlock locking/qspinlock: Revert to test-and-set on hypervisors locking/qspinlock: Use a simple write to grab the lock locking/qspinlock: Optimize for smaller NR_CPUS locking/qspinlock: Extract out code snippets for the next patch locking/qspinlock: Add pending bit ...
2015-06-22Merge branches 'x86/apic', 'x86/asm', 'x86/mm' and 'x86/platform' into ↵Ingo Molnar33-467/+272
x86/core, to merge last updates Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-18x86/cpu/amd: Give access to the number of nodes in a physical packageAravind Gopalakrishnan1-0/+1
Stash the number of nodes in a physical processor package locally and add an accessor to be called by interested parties. The first user is the MCE injection module which uses it to find the node base core in a package for injecting a certain type of errors. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Rewrote the commit message, merged it with the accessor patch and unified naming. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mchehab@osg.samsung.com Link: http://lkml.kernel.org/r/1433868317-18417-2-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-12iommu, x86: Provide irq_remapping_cap() interfaceFeng Wu1-0/+2
Add a new interface irq_remapping_cap() to detect whether irq remapping supports new features, such as VT-d Posted-Interrupts. Export the function, so that KVM code can check this and use this mechanism properly. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Joerg Roedel <joro@8bytes.org> Cc: iommu@lists.linux-foundation.org Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/1433827237-3382-10-git-send-email-feng.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-12iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chipFeng Wu1-0/+5
Interrupt chip callback to set the VCPU affinity for posted interrupts. [ tglx: Use the helper function to copy from the remap irte instead of open coding it. Massage the comment as well ] Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Cc: iommu@lists.linux-foundation.org Cc: joro@8bytes.org Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/1433827237-3382-5-git-send-email-feng.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-12iommu: Add new member capability to struct irq_remap_opsFeng Wu1-0/+4
Add a new member 'capability' to struct irq_remap_ops for storing information about available capabilities such as VT-d Posted-Interrupts. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Joerg Roedel <joro@8bytes.org> Cc: iommu@lists.linux-foundation.org Cc: dwmw2@infradead.org Link: http://lkml.kernel.org/r/1433827237-3382-2-git-send-email-feng.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-09x86/mpx: Support 32-bit binaries on 64-bit kernelsDave Hansen1-32/+30
Right now, the kernel can only switch between 64-bit and 32-bit binaries at compile time. This patch adds support for 32-bit binaries on 64-bit kernels when we support ia32 emulation. We essentially choose which set of table sizes to use when doing arithmetic for the bounds table calculations. This also uses a different approach for calculating the table indexes than before. I think the new one makes it much more clear what is going on, and allows us to share more code between the 32-bit and 64-bit cases. Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183705.E01F21E2@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Introduce new 'directory entry' to 'addr' helper functionDave Hansen1-1/+0
Currently, to get from a bounds directory entry to the virtual address of a bounds table, we simply mask off a few low bits. However, the set of bits we mask off is different for 32-bit and 64-bit binaries. This breaks the operation out in to a helper function and also adds a temporary variable to store the result until we are sure we are returning one. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.007686CE@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86: Make is_64bit_mm() widely availableDave Hansen1-0/+13
The uprobes code has a nice helper, is_64bit_mm(), that consults both the runtime and compile-time flags for 32-bit support. Instead of reinventing the wheel, pull it in to an x86 header so we can use it for MPX. I prefer passing the 'mm' around to test_thread_flag(TIF_IA32) because it makes it explicit where the context is coming from. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.F0209999@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Trace allocation of new bounds tablesDave Hansen1-0/+16
Bounds tables are a significant consumer of memory. It is important to know when they are being allocated. Add a trace point to trace whenever an allocation occurs and also its virtual address. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.EC23A93E@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Trace the attempts to find bounds tablesDave Hansen1-0/+32
There are two different events being traced here. They are doing similar things so share a trace "EVENT_CLASS" and are presented together. 1. Trace when MPX is zapping pages "mpx_unmap_zap": When MPX can not free an entire bounds table, it will instead try to zap unused parts of a bounds table to free the backing memory. This decreases RSS (resident set size) without decreasing the virtual space allocated for bounds tables. 2. Trace attempts to find bounds tables "mpx_unmap_search": This event traces any time we go looking to unmap a bounds table for a given virtual address range. This is useful to ensure that the kernel actually "tried" to free a bounds table versus times it succeeded in finding one. It might try and fail if it realized that a table was shared with an adjacent VMA which is not being unmapped. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.B9D2468B@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Trace entry to bounds exception pathsDave Hansen1-0/+34
There are two basic things that can happen as the result of a bounds exception (#BR): 1. We allocate a new bounds table 2. We pass up a bounds exception to userspace. This patch adds a trace point for the case where we are passing the exception up to userspace with a signal. We are also explicit that we're printing out the inverse of the 'upper' that we encounter. If you want to filter, for instance, you need to ~ the value first. The reason we do this is because of how 'upper' is stored in the bounds table. If a pointer's range is: 0x1000 -> 0x2000 it is stored in the bounds table as (32-bits here for brevity): lower: 0x00001000 upper: 0xffffdfff That is so that an all 0's entry: lower: 0x00000000 upper: 0x00000000 corresponds to the "init" bounds which store a *range* of: 0x00000000 -> 0xffffffff That is, by far, the common case, and that lets us use the zero page, or deduplicate the memory, etc... The 'upper' stored in the table is gibberish to print by itself, so we print ~upper to get the *actual*, logical, human-readable value printed out. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.027BB9B0@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Trace #BR exceptionsDave Hansen1-0/+50
This is the first in a series of MPX tracing patches. I've found these extremely useful in the process of debugging applications and the kernel code itself. This exception hooks in to the bounds (#BR) exception very early and allows capturing the key registers which would influence how the exception is handled. Note that bndcfgu/bndstatus are technically still 64-bit registers even in 32-bit mode. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.5FE2619A@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASKQiaowei Ren1-1/+0
MPX_BNDCFG_ADDR_MASK is defined two times, so this patch removes redundant one. Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183702.5F129376@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Clean up the code by not passing a task pointer around when unnecessaryDave Hansen2-12/+10
The MPX code can only work on the current task. You can not, for instance, enable MPX management in another process or thread. You can also not handle a fault for another process or thread. Despite this, we pass a task_struct around prolifically. This patch removes all of the task struct passing for code paths where the code can not deal with another task (which turns out to be all of them). This has no functional changes. It's just a cleanup. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20150607183702.6A81DA2C@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/mpx: Use the new get_xsave_field_ptr()APIDave Hansen1-4/+4
The MPX registers (bndcsr/bndcfgu/bndstatus) are not directly accessible via normal instructions. They essentially act as if they were floating point registers and are saved/restored along with those registers. There are two main paths in the MPX code where we care about the contents of these registers: 1. #BR (bounds) faults 2. the prctl() code where we are setting MPX up Both of those paths _might_ be called without the FPU having been used. That means that 'tsk->thread.fpu.state' might never be allocated. Also, fpu_save_init() is not preempt-safe. It was a bug to call it without disabling preemption. The new get_xsave_addr() calls unlazy_fpu() instead and properly disables preemption. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20150607183701.BC0D37CF@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-09x86/fpu/xstate: Wrap get_xsave_addr() to make it saferDave Hansen1-0/+1
The MPX code appears is calling a low-level FPU function (copy_fpregs_to_fpstate()). This function is not able to be called in all contexts, although it is safe to call directly in some cases. Although probably correct, the current code is ugly and potentially error-prone. So, add a wrapper that calls the (slightly) higher-level fpu__save() (which is preempt- safe) and also ensures that we even *have* an FPU context (in the case that this was called when in lazy FPU mode). Ingo had this to say about the details about when we need preemption disabled: > it's indeed generally unsafe to access/copy FPU registers with preemption enabled, > for two reasons: > > - on older systems that use FSAVE the instruction destroys FPU register > contents, which has to be handled carefully > > - even on newer systems if we copy to FPU registers (which this code doesn't) > then we don't want a context switch to occur in the middle of it, because a > context switch will write to the fpstate, potentially overwriting our new data > with old FPU state. > > But it's safe to access FPU registers with preemption enabled in a couple of > special cases: > > - potentially destructively saving FPU registers: the signal handling code does > this in copy_fpstate_to_sigframe(), because it can rely on the signal restore > side to restore the original FPU state. > > - reading FPU registers on modern systems: we don't do this anywhere at the > moment, mostly to keep symmetry with older systems where FSAVE is > destructive. > > - initializing FPU registers on modern systems: fpu__clear() does this. Here > it's safe because we don't copy from the fpstate. > > - directly writing FPU registers from user-space memory (!). We do this in > fpu__restore_sig(), and it's safe because neither context switches nor > irq-handler FPU use can corrupt the source context of the copy (which is > user-space memory). > > Note that the MPX code's current use of copy_fpregs_to_fpstate() was safe I think, > because: > > - MPX is predicated on eagerfpu, so the destructive F[N]SAVE instruction won't be > used. > > - the code was only reading FPU registers, and was doing it only in places that > guaranteed that an FPU state was already active (i.e. didn't do it in > kthreads) Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20150607183700.AA881696@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08Merge branch 'x86/asm' into x86/core, to prepare for new patchIngo Molnar6-430/+27
Collect all changes to arch/x86/entry/entry_64.S, before applying patch that changes most of the file. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08x86/asm/entry: Untangle 'system_call' into two entry points: ↵Ingo Molnar1-2/+3
entry_SYSCALL_64 and entry_INT80_32 The 'system_call' entry points differ starkly between native 32-bit and 64-bit kernels: on 32-bit kernels it defines the INT 0x80 entry point, while on 64-bit it's the SYSCALL entry point. This is pretty confusing when looking at generic code, and it also obscures the nature of the entry point at the assembly level. So unangle this by splitting the name into its two uses: system_call (32) -> entry_INT80_32 system_call (64) -> entry_SYSCALL_64 As per the generic naming scheme for x86 system call entry points: entry_MNEMONIC_qualifier where 'qualifier' is one of _32, _64 or _compat. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08x86/asm/entry: Untangle 'ia32_sysenter_target' into two entry points: ↵Ingo Molnar1-1/+2
entry_SYSENTER_32 and entry_SYSENTER_compat So the SYSENTER instruction is pretty quirky and it has different behavior depending on bitness and CPU maker. Yet we create a false sense of coherency by naming it 'ia32_sysenter_target' in both of the cases. Split the name into its two uses: ia32_sysenter_target (32) -> entry_SYSENTER_32 ia32_sysenter_target (64) -> entry_SYSENTER_compat As per the generic naming scheme for x86 system call entry points: entry_MNEMONIC_qualifier where 'qualifier' is one of _32, _64 or _compat. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-08x86/asm/entry: Rename compat syscall entry pointsIngo Molnar1-2/+2
Rename the following system call entry points: ia32_cstar_target -> entry_SYSCALL_compat ia32_syscall -> entry_INT80_compat The generic naming scheme for x86 system call entry points is: entry_MNEMONIC_qualifier where 'qualifier' is one of _32, _64 or _compat. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07preempt: Use preempt_schedule_context() as the official tracing preemption pointFrederic Weisbecker1-5/+3
preempt_schedule_context() is a tracing safe preemption point but it's only used when CONFIG_CONTEXT_TRACKING=y. Other configs have tracing recursion issues since commit: b30f0e3ffedf ("sched/preempt: Optimize preemption operations on __schedule() callers") introduced function based preemp_count_*() ops. Lets make it available on all configs and give it a more appropriate name for its new position. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433432349-1021-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07x86/microcode: Correct CPU family related variable typesAndy Shevchenko1-2/+2
Change the type of variables and function prototypes to be in alignment with what the x86_*() / __x86_*() family/model functions return. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-21-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07x86/uapi: Do not export <asm/msr-index.h> as part of the user API headersBorislav Petkov3-3/+2
This header containing all MSRs and respective bit definitions got exported to userspace in conjunction with the big UAPI shuffle. But, it doesn't belong in the UAPI headers because userspace can do its own MSR defines and exporting them from the kernel blocks us from doing cleanups/renames in that header. Which is ridiculous - it is not kernel's job to export such a header and keep MSRs list and their names stable. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-19-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07Merge branch 'x86/ras' into x86/core, to fix conflictsIngo Molnar8-8/+50
Conflicts: arch/x86/include/asm/irq_vectors.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07x86: Kill CONFIG_X86_HTBorislav Petkov1-1/+1
In talking to Aravind recently about making certain AMD topology attributes available to the MCE injection module, it seemed like that CONFIG_X86_HT thing is more or less superfluous. It is def_bool y, depends on SMP and gets enabled in the majority of .configs - distro and otherwise - out there. So let's kill it and make code behind it depend directly on SMP. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Walter <dwalter@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-18-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07x86/mce: Add infrastructure to support Local MCEAshok Raj1-0/+5
Initialize and prepare for handling LMCEs. Add a boot-time option to disable LMCEs. Signed-off-by: Ashok Raj <ashok.raj@intel.com> [ Simplify stuff, align statements for better readability, reflow comments; kill unused lmce_clear(); save us an MSR write if LMCE is already enabled. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1433436928-31903-16-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07x86/mce: Add Local MCE definitionsAshok Raj2-0/+7
Add required definitions to support Local Machine Check Exceptions. Historically, machine check exceptions on Intel x86 processors have been broadcast to all logical processors in the system. Upcoming CPUs will support an opt-in mechanism to request some machine check exceptions be delivered to a single logical processor experiencing the fault. See http://www.intel.com/sdm Volume 3, System Programming Guide, chapter 15 for more information on MSRs and documentation on Local MCE. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1433436928-31903-15-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07x86/mm/pat: Add set_memory_wt() for Write-Through typeToshi Kani1-1/+5
Now that reserve_ram_pages_type() accepts the WT type, add set_memory_wt(), set_memory_array_wt() and set_pages_array_wt() in order to be able to set memory to Write-Through page cache mode. Also, extend ioremap_change_attr() to accept the WT type. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-13-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07x86/mm/pat: Add pgprot_writethrough()Toshi Kani1-0/+3
Add pgprot_writethrough() for setting page protection flags to Write-Through mode. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-11-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07x86/mm, asm-generic: Add ioremap_wt() for creating Write-Through mappingsToshi Kani1-0/+2
Add ioremap_wt() for creating Write-Through mappings on x86. It follows the same model as ioremap_wc() for multi-arch support. Define ARCH_HAS_IOREMAP_WT in the x86 version of io.h to indicate that ioremap_wt() is implemented on x86. Also update the PAT documentation file to cover ioremap_wt(). Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-8-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07x86/mm: Teach is_new_memtype_allowed() about Write-Through typeToshi Kani1-1/+7
__ioremap_caller() calls reserve_memtype() and the passed down @new_pcm contains the actual page cache type it reserved in the success case. is_new_memtype_allowed() verifies if converting to the new page cache type is allowed when @pcm (the requested type) is different from @new_pcm. When WT is requested, the caller expects that writes are ordered and uncached. Therefore, enhance is_new_memtype_allowed() to disallow the following cases: - If the request is WT, mapping type cannot be WB - If the request is WT, mapping type cannot be WC Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-7-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07x86/mm/pat: Emulate PAT when it is disabledBorislav Petkov1-1/+1
In the case when PAT is disabled on the command line with "nopat" or when virtualization doesn't support PAT (correctly) - see 9d34cfdf4796 ("x86: Don't rely on VMWare emulating PAT MSR correctly"). we emulate it using the PWT and PCD cache attribute bits. Get rid of boot_pat_state while at it. Based on a conglomerate patch from Toshi Kani. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-05Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-3/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - early_idt_handlers[] fix that fixes the build with bleeding edge tooling - build warning fix on GCC 5.1 - vm86 fix plus self-test to make it harder to break it again" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers x86/asm/entry/32, selftests: Add a selftest for kernel entries from VM86 mode x86/boot: Add CONFIG_PARAVIRT_SPINLOCKS quirk to arch/x86/boot/compressed/misc.h x86/asm/entry/32: Really make user_mode() work correctly for VM86 mode
2015-06-05x86/asm/msr: Make wrmsrl_safe() a functionAndy Lutomirski1-2/+7
The wrmsrl_safe macro performs invalid shifts if the value argument is 32 bits. This makes it unnecessarily awkward to write code that puts an unsigned long into an MSR. Convert it to a real inline function. For inspiration, see: 7c74d5b7b7b6 ("x86/asm/entry/64: Fix MSR_IA32_SYSENTER_CS MSR value"). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: <linux-kernel@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> [ Applied small improvements. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-04x86/asm/entry: Move arch/x86/include/asm/calling.h to arch/x86/entry/Ingo Molnar1-243/+0
asm/calling.h is private to the entry code, make this more apparent by moving it to the new arch/x86/entry/ directory. Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03x86/mm: Decouple <linux/vmalloc.h> from <asm/io.h>Stephen Rothwell1-2/+1
Nothing in <asm/io.h> uses anything from <linux/vmalloc.h>, so remove it from there and fix up the resulting build problems triggered on x86 {64|32}-bit {def|allmod|allno}configs. The breakages were triggering in places where x86 builds relied on vmalloc() facilities but did not include <linux/vmalloc.h> explicitly and relied on the implicit inclusion via <asm/io.h>. Also add: - <linux/init.h> to <linux/io.h> - <asm/pgtable_types> to <asm/io.h> ... which were two other implicit header file dependencies. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> [ Tidied up the changelog. ] Acked-by: David Miller <davem@davemloft.net> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Vinod Koul <vinod.koul@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Colin Cross <ccross@android.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: James E.J. Bottomley <JBottomley@odin.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Suma Ramars <sramars@cisco.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03Merge branch 'locking/core' into x86/core, to prepare for dependent patchIngo Molnar8-5/+112
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-03Merge branches 'x86/mm', 'x86/build', 'x86/apic' and 'x86/platform' into ↵Ingo Molnar25-303/+355
x86/core, to apply dependent patch Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02x86/asm/entry/64: Use negative immediates for stack adjustmentsJan Beulich1-2/+2
Doing so allows adjustments by 128 bytes (occurring for REMOVE_PT_GPREGS_FROM_STACK 8 uses) to be expressed with a single byte immediate. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/556C660F020000780007FB60@mail.emea.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlersAndy Lutomirski1-2/+12
The early_idt_handlers asm code generates an array of entry points spaced nine bytes apart. It's not really clear from that code or from the places that reference it what's going on, and the code only works in the first place because GAS never generates two-byte JMP instructions when jumping to global labels. Clean up the code to generate the correct array stride (member size) explicitly. This should be considerably more robust against screw-ups, as GAS will warn if a .fill directive has a negative count. Using '. =' to advance would have been even more robust (it would generate an actual error if it tried to move backwards), but it would pad with nulls, confusing anyone who tries to disassemble the code. The new scheme should be much clearer to future readers. While we're at it, improve the comments and rename the array and common code. Binutils may start relaxing jumps to non-weak labels. If so, this change will fix our build, and we may need to backport this change. Before, on x86_64: 0000000000000000 <early_idt_handlers>: 0: 6a 00 pushq $0x0 2: 6a 00 pushq $0x0 4: e9 00 00 00 00 jmpq 9 <early_idt_handlers+0x9> 5: R_X86_64_PC32 early_idt_handler-0x4 ... 48: 66 90 xchg %ax,%ax 4a: 6a 08 pushq $0x8 4c: e9 00 00 00 00 jmpq 51 <early_idt_handlers+0x51> 4d: R_X86_64_PC32 early_idt_handler-0x4 ... 117: 6a 00 pushq $0x0 119: 6a 1f pushq $0x1f 11b: e9 00 00 00 00 jmpq 120 <early_idt_handler> 11c: R_X86_64_PC32 early_idt_handler-0x4 After: 0000000000000000 <early_idt_handler_array>: 0: 6a 00 pushq $0x0 2: 6a 00 pushq $0x0 4: e9 14 01 00 00 jmpq 11d <early_idt_handler_common> ... 48: 6a 08 pushq $0x8 4a: e9 d1 00 00 00 jmpq 120 <early_idt_handler_common> 4f: cc int3 50: cc int3 ... 117: 6a 00 pushq $0x0 119: 6a 1f pushq $0x1f 11b: eb 03 jmp 120 <early_idt_handler_common> 11d: cc int3 11e: cc int3 11f: cc int3 Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Binutils <binutils@sourceware.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02Merge branch 'linus' into sched/core, to resolve conflictIngo Molnar2-0/+4
Conflicts: arch/sparc/include/asm/topology_64.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02x86/debug: Remove perpetually broken, unmaintainable dwarf annotationsIngo Molnar3-224/+47
So the dwarf2 annotations in low level assembly code have become an increasing hindrance: unreadable, messy macros mixed into some of the most security sensitive code paths of the Linux kernel. These debug info annotations don't even buy the upstream kernel anything: dwarf driven stack unwinding has caused problems in the past so it's out of tree, and the upstream kernel only uses the much more robust framepointers based stack unwinding method. In addition to that there's a steady, slow bitrot going on with these annotations, requiring frequent fixups. There's no tooling and no functionality upstream that keeps it correct. So burn down the sick forest, allowing new, healthier growth: 27 files changed, 350 insertions(+), 1101 deletions(-) Someone who has the willingness and time to do this properly can attempt to reintroduce dwarf debuginfo in x86 assembly code plus dwarf unwinding from first principles, with the following conditions: - it should be maximally readable, and maximally low-key to 'ordinary' code reading and maintenance. - find a build time method to insert dwarf annotations automatically in the most common cases, for pop/push instructions that manipulate the stack pointer. This could be done for example via a preprocessing step that just looks for common patterns - plus special annotations for the few cases where we want to depart from the default. We have hundreds of CFI annotations, so automating most of that makes sense. - it should come with build tooling checks that ensure that CFI annotations are sensible. We've seen such efforts from the framepointer side, and there's no reason it couldn't be done on the dwarf side. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-29x86/asm/entry/32: Really make user_mode() work correctly for VM86 modeJan Beulich1-1/+1
While commit efa7045103 ("x86/asm/entry: Make user_mode() work correctly if regs came from VM86 mode") claims that "user_mode() is now identical to user_mode_vm()", this wasn't actually the case - no prior commit made it so. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5566EB0D020000780007E655@mail.emea.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>