summaryrefslogtreecommitdiff
path: root/arch/x86/include/asm
AgeCommit message (Collapse)AuthorFilesLines
2012-07-24Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds9-5/+59
Pull KVM updates from Avi Kivity: "Highlights include - full big real mode emulation on pre-Westmere Intel hosts (can be disabled with emulate_invalid_guest_state=0) - relatively small ppc and s390 updates - PCID/INVPCID support in guests - EOI avoidance; 3.6 guests should perform better on 3.6 hosts on interrupt intensive workloads) - Lockless write faults during live migration - EPT accessed/dirty bits support for new Intel processors" Fix up conflicts in: - Documentation/virtual/kvm/api.txt: Stupid subchapter numbering, added next to each other. - arch/powerpc/kvm/booke_interrupts.S: PPC asm changes clashing with the KVM fixes - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c: Duplicated commits through the kvm tree and the s390 tree, with subsequent edits in the KVM tree. * tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) KVM: fix race with level interrupts x86, hyper: fix build with !CONFIG_KVM_GUEST Revert "apic: fix kvm build on UP without IOAPIC" KVM guest: switch to apic_set_eoi_write, apic_write apic: add apic_set_eoi_write for PV use KVM: VMX: Implement PCID/INVPCID for guests with EPT KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check KVM: PPC: Critical interrupt emulation support KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests KVM: PPC64: booke: Set interrupt computation mode for 64-bit host KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt KVM: PPC: bookehv64: Add support for std/ld emulation. booke: Added crit/mc exception handler for e500v2 booke/bookehv: Add host crit-watchdog exception support KVM: MMU: document mmu-lock and fast page fault KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint KVM: MMU: trace fast page fault KVM: MMU: fast path of handling guest page fault KVM: MMU: introduce SPTE_MMU_WRITEABLE bit KVM: MMU: fold tlb flush judgement into mmu_spte_update ...
2012-07-23Merge branch 'x86-mce-for-linus' of ↵Linus Torvalds1-0/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/mce changes from Ingo Molnar: "This tree improves the AMD thresholding bank code and includes a memory fault signal handling fixlet." * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faults x86, MCE, AMD: Update copyrights and boilerplate x86, MCE, AMD: Give proper names to the thresholding banks x86, MCE, AMD: Make error_count read only x86, MCE, AMD: Cleanup reading of error_count x86, MCE, AMD: Print decimal thresholding values x86, MCE, AMD: Move shared bank to node descriptor x86, MCE, AMD: Remove local_allocate_... wrapper x86, MCE, AMD: Remove shared banks sysfs linking x86, amd_nb: Export model 0x10 and later PCI id
2012-07-22Merge branch 'x86-uv-for-linus' of ↵Linus Torvalds1-9/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/uv changes from Ingo Molnar: "UV2 BAU productization fixes. The BAU (Broadcast Assist Unit) is SGI's fancy out of line way on UV hardware to do TLB flushes, instead of the normal APIC IPI methods. The commits here fix / work around hangs in their latest hardware iteration (UV2). My understanding is that the main purpose of the out of line signalling channel is to improve scalability: the UV APIC hardware glue does not handle broadcasting to many CPUs very well, and this matters most for TLB shootdowns. [ I don't agree with all aspects of the current approach: in hindsight it would have been better to link the BAU at the IPI/APIC driver level instead of the TLB shootdown level, where TLB flushes are really just one of the uses of broadcast SMP messages. Doing that would improve scalability in some other ways and it would also remove a few uglies from the TLB path. It would also be nice to push more is_uv_system() tests into proper x86_init or x86_platform callbacks. Cliff? ]" * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/uv: Work around UV2 BAU hangs x86/uv: Implement UV BAU runtime enable and disable control via /proc/sgi_uv/ x86/uv: Fix the UV BAU destination timeout period
2012-07-22Merge branch 'x86-reboot-for-linus' of ↵Linus Torvalds3-5/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/reboot changes from Ingo Molnar: "Now that the revampted x86 real-mode trampoline code is upstream and seems to be working well, we can extend the 64-bit reboot code to be as capable as the 32-bit one." * 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, reboot: Be more paranoid in 64-bit reboot=bios x86, reboot: Drop redundant write of reboot_mode x86-64, reboot: Allow reboot=bios and reboot-cpu override on x86-64
2012-07-22Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds3-33/+50
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar: "This tree mostly involves various APIC driver cleanups/robustization, and vSMP motivated platform callback improvements/cleanups" Fix up trivial conflict due to printk cleanup right next to return value change. * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits) Revert "x86/early_printk: Replace obsolete simple_strtoul() usage with kstrtoint()" x86/apic/x2apic: Use multiple cluster members for the irq destination only with the explicit affinity x86/apic/x2apic: Limit the vector reservation to the user specified mask x86/apic: Optimize cpu traversal in __assign_irq_vector() using domain membership x86/vsmp: Fix vector_allocation_domain's return value irq/apic: Use config_enabled(CONFIG_SMP) checks to clean up irq_set_affinity() for UP x86/vsmp: Fix linker error when CONFIG_PROC_FS is not set x86/apic/es7000: Make apicid of a cluster (not CPU) from a cpumask x86/apic/es7000+summit: Always make valid apicid from a cpumask x86/apic/es7000+summit: Fix compile warning in cpu_mask_to_apicid() x86/apic: Fix ugly casting and branching in cpu_mask_to_apicid_and() x86/apic: Eliminate cpu_mask_to_apicid() operation x86/x2apic/cluster: Vector_allocation_domain() should return a value x86/apic/irq_remap: Silence a bogus pr_err() x86/vsmp: Ignore IOAPIC IRQ affinity if possible x86/apic: Make cpu_mask_to_apicid() operations check cpu_online_mask x86/apic: Make cpu_mask_to_apicid() operations return error code x86/apic: Avoid useless scanning thru a cpumask in assign_irq_vector() x86/apic: Try to spread IRQ vectors to different priority levels x86/apic: Factor out default vector_allocation_domain() operation ...
2012-07-22Merge branch 'x86-debug-for-linus' of ↵Linus Torvalds6-29/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull debug-for-linus git tree from Ingo Molnar. Fix up trivial conflict in arch/x86/kernel/cpu/perf_event_intel.c due to a printk() having changed to a pr_info() differently in the two branches. * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Move call to print_modules() out of show_regs() x86/mm: Mark free_initrd_mem() as __init x86/microcode: Mark microcode_id[] as __initconst x86/nmi: Clean up register_nmi_handler() usage x86: Save cr2 in NMI in case NMIs take a page fault (for i386) x86: Remove cmpxchg from i386 NMI nesting code x86: Save cr2 in NMI in case NMIs take a page fault x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
2012-07-22Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds2-16/+69
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asm changes from Ingo Molnar: "Assorted single-commit improvements, as usual" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/mtrr: Slightly simplify print_mtrr_state() x86/mm/mtrr: Fix alignment determination in range_to_mtrr() x86/copy_user_generic: Optimize copy_user_generic with CPU erms feature x86/alternatives: Use atomic_xchg() instead atomic_dec_and_test() for stop_machine_text_poke()
2012-07-22Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds1-5/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp/hotplug changes from Ingo Molnar: "Various cleanups to the SMP hotplug code - a continuing effort of Thomas et al" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smpboot: Remove leftover declaration smp: Remove num_booting_cpus() smp: Remove ipi_call_lock[_irq]()/ipi_call_unlock[_irq]() POWERPC: Smp: remove call to ipi_call_lock()/ipi_call_unlock() SPARC: SMP: Remove call to ipi_call_lock_irq()/ipi_call_unlock_irq() ia64: SMP: Remove call to ipi_call_lock_irq()/ipi_call_unlock_irq() x86-smp-remove-call-to-ipi_call_lock-ipi_call_unlock tile: SMP: Remove call to ipi_call_lock()/ipi_call_unlock() S390: Smp: remove call to ipi_call_lock()/ipi_call_unlock() parisc: Smp: remove call to ipi_call_lock()/ipi_call_unlock() mn10300: SMP: Remove call to ipi_call_lock()/ipi_call_unlock() hexagon: SMP: Remove call to ipi_call_lock()/ipi_call_unlock()
2012-07-20KVM: fix race with level interruptsMichael S. Tsirkin1-1/+14
When more than 1 source id is in use for the same GSI, we have the following race related to handling irq_states race: CPU 0 clears bit 0. CPU 0 read irq_state as 0. CPU 1 sets level to 1. CPU 1 calls kvm_ioapic_set_irq(1). CPU 0 calls kvm_ioapic_set_irq(0). Now ioapic thinks the level is 0 but irq_state is not 0. Fix by performing all irq_states bitmap handling under pic/ioapic lock. This also removes the need for atomics with irq_states handling. Reported-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-16Revert "apic: fix kvm build on UP without IOAPIC"Michael S. Tsirkin1-5/+0
This reverts commit f9808b7fd422b965cea52e05ba470e0a473c53d3. After commit 'kvm: switch to apic_set_eoi_write, apic_write' the stubs are no longer needed as kvm does not look at apicdrivers anymore. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-16apic: add apic_set_eoi_write for PV useMichael S. Tsirkin1-0/+3
KVM PV EOI optimization overrides eoi_write apic op with its own version. Add an API for this to avoid meddling with core x86 apic driver data structures directly. For KVM use, we don't need any guarantees about when the switch to the new op will take place, so it could in theory use this API after SMP init, but it currently doesn't, and restricting callers to early init makes it clear that it's safe as it won't race with actual APIC driver use. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-12KVM: VMX: Implement PCID/INVPCID for guests with EPTMao, Junjie3-1/+7
This patch handles PCID/INVPCID for guests. Process-context identifiers (PCIDs) are a facility by which a logical processor may cache information for multiple linear-address spaces so that the processor may retain cached information when software switches to a different linear address space. Refer to section 4.10.1 in IA32 Intel Software Developer's Manual Volume 3A for details. For guests with EPT, the PCID feature is enabled and INVPCID behaves as running natively. For guests without EPT, the PCID feature is disabled and INVPCID triggers #UD. Signed-off-by: Junjie Mao <junjie.mao@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform checkPrarit Bhargava1-0/+1
While debugging I noticed that unlike all the other hypervisor code in the kernel, kvm does not have an entry for x86_hyper which is used in detect_hypervisor_platform() which results in a nice printk in the syslog. This is only really a stub function but it does make kvm more consistent with the other hypervisors. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marcelo Tostatti <mtosatti@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11Merge tag 'v3.5-rc6' into x86/mceIngo Molnar5-21/+38
Merge Linux 3.5-rc6 before merging more code. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-09KVM: x86 emulator: initialize memopAvi Kivity1-1/+1
memop is not initialized; this can lead to a two-byte operation following a 4-byte operation to see garbage values. Usually truncation fixes things fot us later on, but at least in one case (call abs) it doesn't. Fix by moving memop to the auto-initialized field area. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-09KVM: x86 emulator: change ->get_cpuid() accessor to use the x86 semanticsAvi Kivity1-2/+2
Instead of getting an exact leaf, follow the spec and fall back to the last main leaf instead. This lets us easily emulate the cpuid instruction in the emulator. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-06x86/apic/x2apic: Limit the vector reservation to the user specified maskSuresh Siddha1-3/+6
For the x2apic cluster mode, vector for an interrupt is currently reserved on all the cpu's that are part of the x2apic cluster. But the interrupts will be routed only to the cluster (derived from the first cpu in the mask) members specified in the mask. So there is no need to reserve the vector in the unused cluster members. Modify __assign_irq_vector() to reserve the vectors based on the user specified irq destination mask. If the new mask is a proper subset of the currently used mask, cleanup the vector allocation on the unused cpu members. Also, allow the apic driver to tune the vector domain based on the affinity mask (which in most cases is the user-specified mask). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/r/1340656709-11423-3-git-send-email-suresh.b.siddha@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-06x86/apic: Optimize cpu traversal in __assign_irq_vector() using domain ↵Suresh Siddha1-5/+3
membership Currently __assign_irq_vector() goes through each cpu in the specified mask until it finds a free vector in all the cpu's that are part of the same interrupt domain. We visit all the interrupt domain sibling cpus to reserve the free vector. So, when we fail to find a free vector in an interrupt domain, it is safe to continue our search with a cpu belonging to a new interrupt domain. No need to go through each cpu, if the domain containing that cpu is already visited. Use the irq_cfg's old_domain to track the visited domains and optimize the cpu traversal while finding a free vector in the given cpumask. NOTE: We can also optimize the search by using for_each_cpu() and skip the current cpu, if it is not the first cpu in the mask returned by the vector_allocation_domain(). But re-using the cfg->old_domain to track the visited domains will be slightly faster. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/r/1340656709-11423-2-git-send-email-suresh.b.siddha@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Add a microcode revision check for SNB-PEBSPeter Zijlstra1-0/+2
Recent Intel microcode resolved the SNB-PEBS issues, so conditionally enable PEBS on SNB hardware depending on the microcode revision. Thanks to Stephane for figuring out the various microcode revisions. Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-v3672ziwh9damwqwh1uz3krm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86/amd: Unify AMD's generic and family 15h pmusRobert Richter1-2/+1
There is no need for keeping separate pmu structs. We can enable amd_{get,put}_event_constraints() functions also for family 15h event. The advantage is that there is only a single pmu struct for all AMD cpus. This patch introduces functions to setup the pmu to enabe core performance counters or counter constraints. Also, cpuid checks are used instead of family checks where possible. Thus, it enables the code independently of cpu families if the feature flag is set. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-4-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05perf/x86: Rename Intel specific macrosRobert Richter2-11/+10
There are macros that are Intel specific and not x86 generic. Rename them into INTEL_*. This patch removes X86_PMC_IDX_GENERIC and does: $ sed -i -e 's/X86_PMC_MAX_/INTEL_PMC_MAX_/g' \ arch/x86/include/asm/kvm_host.h \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_p4.c \ arch/x86/kvm/pmu.c $ sed -i -e 's/X86_PMC_IDX_FIXED/INTEL_PMC_IDX_FIXED/g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_intel_ds.c \ arch/x86/kvm/pmu.c $ sed -i -e 's/X86_PMC_MSK_/INTEL_PMC_MSK_/g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event.c Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340217996-2254-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05Merge branch 'x86/cpu' into perf/coreIngo Molnar3-82/+3
Merge this branch because we changed the wrmsr*_safe() API and there's a conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05Merge branch 'perf/urgent' into perf/coreIngo Molnar2-14/+18
Merge this branch to pick up a fixlet and to update to a more recent base. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-03apic: fix kvm build on UP without IOAPICMichael S. Tsirkin1-0/+5
On UP i386, when APIC is disabled # CONFIG_X86_UP_APIC is not set # CONFIG_PCI_IOAPIC is not set code looking at apicdrivers never has any effect but it still gets compiled in. In particular, this causes build failures with kvm, but it generally bloats the kernel unnecessarily. Fix by defining both __apicdrivers and __apicdrivers_end to be NULL when CONFIG_X86_LOCAL_APIC is unset: I verified that as the result any loop scanning __apicdrivers gets optimized out by the compiler. Warning: a .config with apic disabled doesn't seem to boot for me (even without this patch). Still verifying why, meanwhile this patch is compile-tested only. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-06-30x86/copy_user_generic: Optimize copy_user_generic with CPU erms featureFenghua Yu2-16/+69
According to Intel 64 and IA-32 SDM and Optimization Reference Manual, beginning with Ivybridge, REG string operation using MOVSB and STOSB can provide both flexible and high-performance REG string operations in cases like memory copy. Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/ STOSB). If CPU erms feature is detected, patch copy_user_generic with enhanced fast string version of copy_user_generic. A few new macros are defined to reduce duplicate code in ALTERNATIVE and ALTERNATIVE_2. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1337908785-14015-1-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-06-29Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpufeature: Remove stray %s, add -w to mkcapflags.pl x86, cpufeature: Catch duplicate CPU feature strings x86, cpufeature: Rename X86_FEATURE_DTS to X86_FEATURE_DTHERM x86: Fix kernel-doc warnings x86, compat: Use test_thread_flag(TIF_IA32) in compat signal delivery
2012-06-25x86, cpufeature: Rename X86_FEATURE_DTS to X86_FEATURE_DTHERMH. Peter Anvin1-1/+1
It makes sense to label "Digital Thermal Sensor" as "DTS", but unfortunately the string "dts" was already used for "Debug Store", and /proc/cpuinfo is a user space ABI. Therefore, rename this to "dtherm". This conflict went into mainline via the hwmon tree without any x86 maintainer ack, and without any kind of hint in the subject. a4659053 x86/hwmon: fix initialization of coretemp Reported-by: Jean Delvare <khali@linux-fr.org> Link: http://lkml.kernel.org/r/4FE34BCB.5050305@linux.intel.com Cc: Jan Beulich <JBeulich@suse.com> Cc: <stable@vger.kernel.org> v2.6.36..v3.4 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-06-25x86/uv: Work around UV2 BAU hangsCliff Wickman1-10/+18
On SGI's UV2 the BAU (Broadcast Assist Unit) driver can hang under a heavy load. To cure this: - Disable the UV2 extended status mode (see UV2_EXT_SHFT), as this mode changes BAU behavior in more ways then just delivering an extra bit of status. Revert status to just two meaningful bits, like UV1. - Use no IPI-style resets on UV2. Just give up the request for whatever the reason it failed and let it be accomplished with the legacy IPI method. - Use no alternate sending descriptor (the former UV2 workaround bcp->using_desc and handle_uv2_busy() stuff). Just disable the use of the BAU for a period of time in favor of the legacy IPI method when the h/w bug leaves a descriptor busy. -- new tunable: giveup_limit determines the threshold at which a hub is so plugged that it should do all requests with the legacy IPI method for a period of time -- generalize disable_for_congestion() (renamed disable_for_period()) for use whenever a hub should avoid using the BAU for a period of time Also: - Fix find_another_by_swack(), which is part of the UV2 bug workaround - Correct and clarify the statistics (new stats s_overipilimit, s_giveuplimit, s_enters, s_ipifordisabled, s_plugged, s_congested) Signed-off-by: Cliff Wickman <cpw@sgi.com> Link: http://lkml.kernel.org/r/20120622131459.GC31884@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-25x86/uv: Implement UV BAU runtime enable and disable control via /proc/sgi_uv/Cliff Wickman1-0/+2
This patch enables the BAU to be turned on or off dynamically. echo "on" > /proc/sgi_uv/ptc_statistics echo "off" > /proc/sgi_uv/ptc_statistics The system may be booted with or without the nobau option. Whether the system currently has the BAU off can be seen in the /proc file -- normally with the baustats script. Each cpu will have a 1 in the bauoff field if the BAU was turned off, so baustats will give a count of cpus that have it off. Signed-off-by: Cliff Wickman <cpw@sgi.com> Link: http://lkml.kernel.org/r/20120622131330.GB31884@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-25KVM: host side for eoi optimizationMichael S. Tsirkin1-0/+12
Implementation of PV EOI using shared memory. This reduces the number of exits an interrupt causes as much as by half. The idea is simple: there's a bit, per APIC, in guest memory, that tells the guest that it does not need EOI. We set it before injecting an interrupt and clear before injecting a nested one. Guest tests it using a test and clear operation - this is necessary so that host can detect interrupt nesting - and if set, it can skip the EOI MSR. There's a new MSR to set the address of said register in guest memory. Otherwise not much changed: - Guest EOI is not required - Register is tested & ISR is automatically cleared on exit For testing results see description of previous patch 'kvm_para: guest side for eoi avoidance'. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-25x86, bitops: note on __test_and_clear_bit atomicityMichael S. Tsirkin1-0/+7
__test_and_clear_bit is actually atomic with respect to the local CPU. Add a note saying that KVM on x86 relies on this behaviour so people don't accidentaly break it. Also warn not to rely on this in portable code. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-25KVM guest: guest side for eoi avoidanceMichael S. Tsirkin1-0/+7
The idea is simple: there's a bit, per APIC, in guest memory, that tells the guest that it does not need EOI. Guest tests it using a single est and clear operation - this is necessary so that host can detect interrupt nesting - and if set, it can skip the EOI MSR. I run a simple microbenchmark to show exit reduction (note: for testing, need to apply follow-up patch 'kvm: host side for eoi optimization' + a qemu patch I posted separately, on host): Before: Performance counter stats for 'sleep 1s': 47,357 kvm:kvm_entry [99.98%] 0 kvm:kvm_hypercall [99.98%] 0 kvm:kvm_hv_hypercall [99.98%] 5,001 kvm:kvm_pio [99.98%] 0 kvm:kvm_cpuid [99.98%] 22,124 kvm:kvm_apic [99.98%] 49,849 kvm:kvm_exit [99.98%] 21,115 kvm:kvm_inj_virq [99.98%] 0 kvm:kvm_inj_exception [99.98%] 0 kvm:kvm_page_fault [99.98%] 22,937 kvm:kvm_msr [99.98%] 0 kvm:kvm_cr [99.98%] 0 kvm:kvm_pic_set_irq [99.98%] 0 kvm:kvm_apic_ipi [99.98%] 22,207 kvm:kvm_apic_accept_irq [99.98%] 22,421 kvm:kvm_eoi [99.98%] 0 kvm:kvm_pv_eoi [99.99%] 0 kvm:kvm_nested_vmrun [99.99%] 0 kvm:kvm_nested_intercepts [99.99%] 0 kvm:kvm_nested_vmexit [99.99%] 0 kvm:kvm_nested_vmexit_inject [99.99%] 0 kvm:kvm_nested_intr_vmexit [99.99%] 0 kvm:kvm_invlpga [99.99%] 0 kvm:kvm_skinit [99.99%] 57 kvm:kvm_emulate_insn [99.99%] 0 kvm:vcpu_match_mmio [99.99%] 0 kvm:kvm_userspace_exit [99.99%] 2 kvm:kvm_set_irq [99.99%] 2 kvm:kvm_ioapic_set_irq [99.99%] 23,609 kvm:kvm_msi_set_irq [99.99%] 1 kvm:kvm_ack_irq [99.99%] 131 kvm:kvm_mmio [99.99%] 226 kvm:kvm_fpu [100.00%] 0 kvm:kvm_age_page [100.00%] 0 kvm:kvm_try_async_get_page [100.00%] 0 kvm:kvm_async_pf_doublefault [100.00%] 0 kvm:kvm_async_pf_not_present [100.00%] 0 kvm:kvm_async_pf_ready [100.00%] 0 kvm:kvm_async_pf_completed 1.002100578 seconds time elapsed After: Performance counter stats for 'sleep 1s': 28,354 kvm:kvm_entry [99.98%] 0 kvm:kvm_hypercall [99.98%] 0 kvm:kvm_hv_hypercall [99.98%] 1,347 kvm:kvm_pio [99.98%] 0 kvm:kvm_cpuid [99.98%] 1,931 kvm:kvm_apic [99.98%] 29,595 kvm:kvm_exit [99.98%] 24,884 kvm:kvm_inj_virq [99.98%] 0 kvm:kvm_inj_exception [99.98%] 0 kvm:kvm_page_fault [99.98%] 1,986 kvm:kvm_msr [99.98%] 0 kvm:kvm_cr [99.98%] 0 kvm:kvm_pic_set_irq [99.98%] 0 kvm:kvm_apic_ipi [99.99%] 25,953 kvm:kvm_apic_accept_irq [99.99%] 26,132 kvm:kvm_eoi [99.99%] 26,593 kvm:kvm_pv_eoi [99.99%] 0 kvm:kvm_nested_vmrun [99.99%] 0 kvm:kvm_nested_intercepts [99.99%] 0 kvm:kvm_nested_vmexit [99.99%] 0 kvm:kvm_nested_vmexit_inject [99.99%] 0 kvm:kvm_nested_intr_vmexit [99.99%] 0 kvm:kvm_invlpga [99.99%] 0 kvm:kvm_skinit [99.99%] 284 kvm:kvm_emulate_insn [99.99%] 68 kvm:vcpu_match_mmio [99.99%] 68 kvm:kvm_userspace_exit [99.99%] 2 kvm:kvm_set_irq [99.99%] 2 kvm:kvm_ioapic_set_irq [99.99%] 28,288 kvm:kvm_msi_set_irq [99.99%] 1 kvm:kvm_ack_irq [99.99%] 131 kvm:kvm_mmio [100.00%] 588 kvm:kvm_fpu [100.00%] 0 kvm:kvm_age_page [100.00%] 0 kvm:kvm_try_async_get_page [100.00%] 0 kvm:kvm_async_pf_doublefault [100.00%] 0 kvm:kvm_async_pf_not_present [100.00%] 0 kvm:kvm_async_pf_ready [100.00%] 0 kvm:kvm_async_pf_completed 1.002039622 seconds time elapsed We see that # of exits is almost halved. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-21thp: avoid atomic64_read in pmd_read_atomic for 32bit PAEAndrea Arcangeli1-13/+17
In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under Xen. So instead of dealing only with "consistent" pmdvals in pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals where the low 32bit and high 32bit could be inconsistent (to avoid having to use cmpxchg8b). The only guarantee we get from pmd_read_atomic is that if the low part of the pmd was found null, the high part will be null too (so the pmd will be considered unstable). And if the low part of the pmd is found "stable" later, then it means the whole pmd was read atomically (because after a pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore, and we read the high part after the low part). In the 32bit PAE x86 case, it is enough to read the low part of the pmdval atomically to declare the pmd as "stable" and that's true for THP and no THP, furthermore in the THP case we also have a barrier() that will prevent any inconsistent pmdvals to be cached by a later re-read of the *pmd. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Jonathan Nieder <jrnieder@gmail.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Tested-by: Andrew Jones <drjones@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-06-20x86/nmi: Clean up register_nmi_handler() usageLi Zhong1-17/+3
Implement a cleaner and easier to maintain version for the section warning fixes implemented in commit eeaaa96a3a21 ("x86/nmi: Fix section mismatch warnings on 32-bit"). Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Jan Beulich <JBeulich@suse.com> Link: http://lkml.kernel.org/r/1340049393-17771-1-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-20Merge commit 'v3.5-rc3' into x86/debugIngo Molnar3-7/+20
Merge it in to pick up a fix that we are going to clean up in this branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18KVM: Introduce __KVM_HAVE_IRQ_LINEChristoffer Dall1-0/+1
This is a preparatory patch for the KVM/ARM implementation. KVM/ARM will use the KVM_IRQ_LINE ioctl, which is currently conditional on __KVM_HAVE_IOAPIC, but ARM obviously doesn't have any IOAPIC support and we need a separate define. Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-18Merge branch 'x86/apic' into x86/platformIngo Molnar3-33/+47
Merge in x86/apic to solve a vector_allocation_domain() API change semantic merge conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-17x86-64, reboot: Allow reboot=bios and reboot-cpu override on x86-64H. Peter Anvin3-5/+4
With the revamped realmode trampoline code, it is trivial to extend support for reboot=bios to x86-64. Furthermore, while we are at it, remove the restriction that only we can only override the reboot CPU on 32 bits. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/n/tip-jopx7y6g6dbcx4tpal8q0jlr@git.kernel.org
2012-06-15Merge branch 'x86/cleanups' into x86/apicIngo Molnar1-2/+0
Merge in the cleanups because a followup x86/apic change relies on them. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14x86/apic: Eliminate cpu_mask_to_apicid() operationAlexander Gordeev1-25/+8
Since there are only two locations where cpu_mask_to_apicid() is called from, remove the operation and use only cpu_mask_to_apicid_and() instead. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Suggested-and-acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120614074935.GE3383@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-11Merge tag 'v3.5-rc2' into perf/coreIngo Molnar2-1/+14
Merge in Linux 3.5-rc2 - to pick up fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-1/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/nmi: Fix section mismatch warnings on 32-bit x86/uv: Fix UV2 BAU legacy mode x86/mm: Only add extra pages count for the first memory range during pre-allocation early page table space x86, efi stub: Add .reloc section back into image x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs x86/reboot: Fix a warning message triggered by stop_other_cpus() x86/intel/moorestown: Change intel_scu_devices_create() to __devinit x86/numa: Set numa_nodes_parsed at acpi_numa_memory_affinity_init() x86/gart: Fix kmemleak warning x86: mce: Add the dropped timer interval init back x86/mce: Fix the MCE poll timer logic
2012-06-08uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()Ananth N Mavinakayanahalli1-1/+1
On RISC architectures like powerpc, instructions are fixed size. Instruction analysis on such platforms is just a matter of (insn % 4). Pass the vaddr at which the uprobe is to be inserted so that arch_uprobe_analyze_insn() can flag misaligned registration requests. Signed-off-by: Ananth N Mavinakaynahalli <ananth@in.ibm.com> Cc: michael@ellerman.id.au Cc: antonb@thinktux.localdomain Cc: Paul Mackerras <paulus@samba.org> Cc: benh@kernel.crashing.org Cc: peterz@infradead.org Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: oleg@redhat.com Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20120608093257.GG13409@in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08x86/nmi: Fix section mismatch warnings on 32-bitDon Zickus1-0/+14
It was reported that compiling for 32-bit caused a bunch of section mismatch warnings: VDSOSYM arch/x86/vdso/vdso32-syms.lds LD arch/x86/vdso/built-in.o LD arch/x86/built-in.o WARNING: arch/x86/built-in.o(.data+0x5af0): Section mismatch in reference from the variable test_nmi_ipi_callback_na.10451 to the function .init.text:test_nmi_ipi_callback() [...] WARNING: arch/x86/built-in.o(.data+0x5b04): Section mismatch in reference from the variable nmi_unk_cb_na.10399 to the function .init.text:nmi_unk_cb() The variable nmi_unk_cb_na.10399 references the function __init nmi_unk_cb() [...] Both of these are attributed to the internal representation of the nmiaction struct created during register_nmi_handler. The reason for this is that those structs are not defined in the init section whereas the rest of the code in nmi_selftest.c is. To resolve this, I created a new #define, register_nmi_handler_initonly, that tags the struct as __initdata to resolve the mismatch. This #define should only be used in rare situations where the register/unregister is called during init of the kernel. Big thanks to Jan Beulich for decoding this for me as I didn't have a clue what was going on. Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl> Tested-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl> Cc: Jan Beulich <JBeulich@suse.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/1338991542-23000-1-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08x86/uv: Fix UV2 BAU legacy modeCliff Wickman1-1/+0
The SGI Altix UV2 BAU (Broadcast Assist Unit) as used for tlb-shootdown (selective broadcast mode) always uses UV2 broadcast descriptor format. There is no need to clear the 'legacy' (UV1) mode, because the hardware always uses UV2 mode for selective broadcast. But the BIOS uses general broadcast and legacy mode, and the hardware pays attention to the legacy mode bit for general broadcast. So the kernel must not clear that mode bit. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/E1SccoO-0002Lh-Cb@eag09.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08x86/apic: Make cpu_mask_to_apicid() operations check cpu_online_maskAlexander Gordeev1-4/+2
Currently cpu_mask_to_apicid() should not get a offline CPU with the cpumask. Otherwise some apic drivers might try to access non-existent per-cpu variables (i.e. x2apic). In that regard cpu_mask_to_apicid() and cpu_mask_to_apicid_and() operations are inconsistent. This fix makes the two operations do not rely on calling functions and always return the apicid for only online CPUs. As result, the meaning and implementations of cpu_mask_to_apicid() and cpu_mask_to_apicid_and() operations become straight. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120607131624.GG4759@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08x86/apic: Make cpu_mask_to_apicid() operations return error codeAlexander Gordeev1-13/+31
Current cpu_mask_to_apicid() and cpu_mask_to_apicid_and() implementations have few shortcomings: 1. A value returned by cpu_mask_to_apicid() is written to hardware registers unconditionally. Should BAD_APICID get ever returned it will be written to a hardware too. But the value of BAD_APICID is not universal across all hardware in all modes and might cause unexpected results, i.e. interrupts might get routed to CPUs that are not configured to receive it. 2. Because the value of BAD_APICID is not universal it is counter- intuitive to return it for a hardware where it does not make sense (i.e. x2apic). 3. cpu_mask_to_apicid_and() operation is thought as an complement to cpu_mask_to_apicid() that only applies a AND mask on top of a cpumask being passed. Yet, as consequence of 18374d8 commit the two operations are inconsistent in that of: cpu_mask_to_apicid() should not get a offline CPU with the cpumask cpu_mask_to_apicid_and() should not fail and return BAD_APICID These limitations are impossible to realize just from looking at the operations prototypes. Most of these shortcomings are resolved by returning a error code instead of BAD_APICID. As the result, faults are reported back early rather than possibilities to cause a unexpected behaviour exist (in case of [1]). The only exception is setup_timer_IRQ0_pin() routine. Although obviously controversial to this fix, its existing behaviour is preserved to not break the fragile check_timer() and would better addressed in a separate fix. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120607131559.GF4759@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08x86/apic: Avoid useless scanning thru a cpumask in assign_irq_vector()Alexander Gordeev1-3/+5
In case of static vector allocation domains (i.e. flat) if all vector numbers are exhausted, an attempt to assign a new vector will lead to useless scans through all CPUs in the cpumask, even though it is known that each new pass would fail. Make this corner case less painful by letting report whether the vector allocation domain depends on passed arguments or not and stop scanning early. The same could have been achived by introducing a static flag to the apic operations. But let's allow vector_allocation_domain() have more intelligence here and decide dynamically, in case we would need it in the future. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120607131542.GE4759@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08x86/apic: Factor out default vector_allocation_domain() operationAlexander Gordeev1-0/+21
Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20120607131449.GC4759@dhcp-26-207.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08x86, cpu: Rename checking_wrmsrl() to wrmsrl_safe()H. Peter Anvin1-1/+1
Rename checking_wrmsrl() to wrmsrl_safe(), to match the naming convention used by all the other MSR access functions/macros. Signed-off-by: H. Peter Anvin <hpa@zytor.com>