summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/process.c
AgeCommit message (Collapse)AuthorFilesLines
2011-01-13cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle ↵Thomas Renninger1-2/+4
events from the cpuidle layer Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle" events -> this patch fixes it and makes cpu_idle events throwing less complex. It also introduces cpu_idle events for all architectures which use the cpuidle subsystem, namely: - arch/arm/mach-at91/cpuidle.c - arch/arm/mach-davinci/cpuidle.c - arch/arm/mach-kirkwood/cpuidle.c - arch/arm/mach-omap2/cpuidle34xx.c - arch/drivers/acpi/processor_idle.c (for all cases, not only mwait) - arch/x86/kernel/process.c (did throw events before, but was a mess) - drivers/idle/intel_idle.c (did throw events before) Convention should be: Fire cpu_idle events inside the current pm_idle function (not somewhere down the the callee tree) to keep things easy. Current possible pm_idle functions in X86: c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle -> this is really easy is now. This affects userspace: The type field of the cpu_idle power event can now direclty get mapped to: /sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...} instead of throwing very CPU/mwait specific values. This change is not visible for the intel_idle driver. For the acpi_idle driver it should only be visible if the vendor misses out C-states in his BIOS. Another (perf timechart) patch reads out cpuidle info of cpu_idle events from: /sys/.../cpuidle/stateX/*, then the cpuidle events are mapped to the correct C-/cpuidle state again, even if e.g. vendors miss out C-states in their BIOS and for example only export C1 and C3. -> everything is fine. Signed-off-by: Thomas Renninger <trenn@suse.de> CC: Robert Schoene <robert.schoene@tu-dresden.de> CC: Jean Pihet <j-pihet@ti.com> CC: Arjan van de Ven <arjan@linux.intel.com> CC: Ingo Molnar <mingo@elte.hu> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: linux-pm@lists.linux-foundation.org CC: linux-acpi@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-perf-users@vger.kernel.org CC: linux-omap@vger.kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12ACPI, intel_idle: Cleanup idle= internal variablesThomas Renninger1-16/+8
Having four variables for the same thing: idle_halt, idle_nomwait, force_mwait and boot_option_idle_overrides is rather confusing and unnecessary complex. if idle= boot param is passed, only set up one variable: boot_option_idle_overrides Introduces following functional changes/fixes: - intel_idle driver does not register if any idle=xy boot param is passed. - processor_idle.c will also not register a cpuidle driver and get active if idle=halt is passed. Before a cpuidle driver with one (C1, halt) state got registered Now the default_idle function will be used which finally uses the same idle call to enter sleep state (safe_halt()), but without registering a whole cpuidle driver. That means idle= param will always avoid cpuidle drivers to register with one exception (same behavior as before): idle=nomwait may still register acpi_idle cpuidle driver, but C1 will not use mwait, but hlt. This can be a workaround for IO based deeper sleep states where C1 mwait causes problems. Signed-off-by: Thomas Renninger <trenn@suse.de> cc: x86@kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-08Merge branch 'for-2.6.38' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits) gameport: use this_cpu_read instead of lookup x86: udelay: Use this_cpu_read to avoid address calculation x86: Use this_cpu_inc_return for nmi counter x86: Replace uses of current_cpu_data with this_cpu ops x86: Use this_cpu_ops to optimize code vmstat: User per cpu atomics to avoid interrupt disable / enable irq_work: Use per cpu atomics instead of regular atomics cpuops: Use cmpxchg for xchg to avoid lock semantics x86: this_cpu_cmpxchg and this_cpu_xchg operations percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support percpu,x86: relocate this_cpu_add_return() and friends connector: Use this_cpu operations xen: Use this_cpu_inc_return taskstats: Use this_cpu_ops random: Use this_cpu_inc_return fs: Use this_cpu_inc_return in buffer.c highmem: Use this_cpu_xx_return() operations vmstat: Use this_cpu_inc_return for vm statistics x86: Support for this_cpu_add, sub, dec, inc_return percpu: Generic support for this_cpu_add, sub, dec, inc_return ... Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c} as per Tejun.
2011-01-04perf: Clean up power events by introducing new, more generic onesThomas Renninger1-1/+6
Add these new power trace events: power:cpu_idle power:cpu_frequency power:machine_suspend The old C-state/idle accounting events: power:power_start power:power_end Have now a replacement (but we are still keeping the old tracepoints for compatibility): power:cpu_idle and power:power_frequency is replaced with: power:cpu_frequency power:machine_suspend is newly introduced. Jean Pihet has a patch integrated into the generic layer (kernel/power/suspend.c) which will make use of it. the type= field got removed from both, it was never used and the type is differed by the event type itself. perf timechart userspace tool gets adjusted in a separate patch. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Jean Pihet <jean.pihet@newoldbits.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: rjw@sisk.pl LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
2010-12-30x86: Replace uses of current_cpu_data with this_cpu opsTejun Heo1-2/+2
Replace all uses of current_cpu_data with this_cpu operations on the per cpu structure cpu_info. The scala accesses are replaced with the matching this_cpu ops which results in smaller and more efficient code. In the long run, it might be a good idea to remove cpu_data() macro too and use per_cpu macro directly. tj: updated description Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-11-18x86: Eliminate bp argument from the stack tracing routinesSoeren Sandmann Pedersen1-2/+1
The various stack tracing routines take a 'bp' argument in which the caller is supposed to provide the base pointer to use, or 0 if doesn't have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not defined, this means all callers in principle should either always pass 0, or be conditional on CONFIG_FRAME_POINTER. However, there are only really three use cases for stack tracing: (a) Trace the current task, including IRQ stack if any (b) Trace the current task, but skip IRQ stack (c) Trace some other task In all cases, if CONFIG_FRAME_POINTER is not defined, bp should just be 0. If it _is_ defined, then - in case (a) bp should be gotten directly from the CPU's register, so the caller should pass NULL for regs, - in case (b) the caller should should pass the IRQ registers to dump_trace(), - in case (c) bp should be gotten from the top of the task's stack, so the caller should pass NULL for regs. Hence, the bp argument is not necessary because the combination of task and regs is sufficient to determine an appropriate value for bp. This patch introduces a new inline function stack_frame(task, regs) that computes the desired bp. This function is then called from the two versions of dump_stack(). Signed-off-by: Soren Sandmann <ssp@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@infradead.org>, Cc: Frederic Weisbecker <fweisbec@gmail.com>, Cc: Arnaldo Carvalho de Melo <acme@redhat.com>, LKML-Reference: <m3oc9rop28.fsf@dhcp-100-3-82.bos.redhat.com>> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-08-18Make do_execve() take a const filename pointerDavid Howells1-2/+3
Make do_execve() take a const filename pointer so that kernel_execve() compiles correctly on ARM: arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type This also requires the argv and envp arguments to be consted twice, once for the pointer array and once for the strings the array points to. This is because do_execve() passes a pointer to the filename (now const) to copy_strings_kernel(). A simpler alternative would be to cast the filename pointer in do_execve() when it's passed to copy_strings_kernel(). do_execve() may not change any of the strings it is passed as part of the argv or envp lists as they are some of them in .rodata, so marking these strings as const should be fine. Further kernel_execve() and sys_execve() need to be changed to match. This has been test built on x86_64, frv, arm and mips. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-14Mark arguments to certain syscalls as being constDavid Howells1-1/+1
Mark arguments to certain system calls as being const where they should be but aren't. The list includes: (*) The filename arguments of various stat syscalls, execve(), various utimes syscalls and some mount syscalls. (*) The filename arguments of some syscall helpers relating to the above. (*) The buffer argument of various write syscalls. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-06Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds1-39/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix keeping track of AMD C1E x86, cpu: Package Level Thermal Control, Power Limit Notification definitions x86, cpu: Export AMD errata definitions x86, cpu: Use AMD errata checking framework for erratum 383 x86, cpu: Clean up AMD erratum 400 workaround x86, cpu: AMD errata checking framework x86, cpu: Split addon_cpuid_features.c x86, cpu: Clean up formatting in cpufeature.h, remove override x86, cpu: Enumerate xsaveopt x86, cpu: Add xsaveopt cpufeature x86, cpu: Make init_scattered_cpuid_features() consider cpuid subleaves x86, cpu: Support the features flags in new CPUID leaf 7 x86, cpu: Add CPU flags for F16C and RDRND x86: Look for IA32_ENERGY_PERF_BIAS support x86, AMD: Extend support to future families x86, cacheinfo: Carve out L3 cache slot accessors x86, xsave: Cleanup return codes in check_for_xstate()
2010-08-04Merge branch 'next' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Remove pointless printk from p4-clockmod. [CPUFREQ] Fix section mismatch for powernow_cpu_init in powernow-k7.c [CPUFREQ] Fix section mismatch for longhaul_cpu_init. [CPUFREQ] Fix section mismatch for longrun_cpu_init. [CPUFREQ] powernow-k8: Fix misleading variable naming [CPUFREQ] Convert pci_table entries to PCI_VDEVICE (if PCI_ANY_ID is used) [CPUFREQ] arch/x86/kernel/cpu/cpufreq: use for_each_pci_dev() [CPUFREQ] fix brace coding style issue. [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent [CPUFREQ] acpi-cpufreq: Fix CPU_ANY CPUFREQ_{PRE,POST}CHANGE notification [CPUFREQ] ondemand: don't synchronize sample rate unless multiple cpus present [CPUFREQ] unexport (un)lock_policy_rwsem* functions [CPUFREQ] ondemand: Refactor frequency increase code [CPUFREQ] powernow-k8: On load failure, remind the user to enable support in BIOS setup [CPUFREQ] powernow-k8: Limit Pstate transition latency check [CPUFREQ] Fix PCC driver error path [CPUFREQ] fix double freeing in error path of pcc-cpufreq [CPUFREQ] pcc driver should check for pcch method before calling _OSC [CPUFREQ] fix memory leak in cpufreq_add_dev [CPUFREQ] revert "[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)" Manually fix up non-data merge conflict introduced by new calling conventions for trace_power_start() in commit 6f4f2723d085 ("x86 cpufreq: Make trace_power_frequency cpufreq driver independent"), which didn't update the intel_idle native hardware cpuidle driver.
2010-08-03[CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independentThomas Renninger1-4/+4
and fix the broken case if a core's frequency depends on others. trace_power_frequency was only implemented in a rather ungeneric way in acpi-cpufreq driver's target() function only. -> Move the call to trace_power_frequency to cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE notifier is triggered. This will support power frequency tracing by all cpufreq drivers trace_power_frequency did not trace frequency changes correctly when the userspace governor was used or when CPU cores' frequency depend on each other. -> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu which gets switched automatically fixes this. Robert Schoene provided some important fixes on top of my initial quick shot version which are integrated in this patch: - Forgot some changes in power_end trace (TP_printk/variable names) - Variable dummy in power_end must now be cpu_id - Use static 64 bit variable instead of unsigned int for cpu_id Signed-off-by: Thomas Renninger <trenn@suse.de> CC: davej@redhat.com CC: arjan@infradead.org CC: linux-kernel@vger.kernel.org CC: robert.schoene@tu-dresden.de Tested-by: robert.schoene@tu-dresden.de Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-02x86: Fix keeping track of AMD C1EMichal Schmidt1-3/+5
Accomodate the original C1E-aware idle routine to the different times during boot when the BIOS enables C1E. While at it, remove the synthetic CPUID flag in favor of a single global setting which denotes C1E status on the system. [ hpa: changed c1e_enabled to be a bool; clarified cpu bit 3:21 comment ] Signed-off-by: Michal Schmidt <mschmidt@redhat.com> LKML-Reference: <20100727165335.GA11630@aftab> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>
2010-08-01x86: Export FPU API for KVM useSheng Yang1-0/+1
Also add some constants. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-07-29x86, cpu: Clean up AMD erratum 400 workaroundHans Rosenfeld1-37/+2
Remove check_c1e_idle() and use the new AMD errata checking framework instead. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> LKML-Reference: <1280336972-865982-2-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-18Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds1-11/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, fpu: Use static_cpu_has() to implement use_xsave() x86: Add new static_cpu_has() function using alternatives x86, fpu: Use the proper asm constraint in use_xsave() x86, fpu: Unbreak FPU emulation x86: Introduce 'struct fpu' and related API x86: Eliminate TS_XSAVE x86-32: Don't set ignore_fpu_irq in simd exception x86: Merge kernel_math_error() into math_error() x86: Merge simd_math_error() into math_error() x86-32: Rework cache flush denied handler Fix trivial conflict in arch/x86/kernel/process.c
2010-05-18Merge branch 'perf-core-for-linus' of ↵Linus Torvalds1-8/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (311 commits) perf tools: Add mode to build without newt support perf symbols: symbol inconsistency message should be done only at verbose=1 perf tui: Add explicit -lslang option perf options: Type check all the remaining OPT_ variants perf options: Type check OPT_BOOLEAN and fix the offenders perf options: Check v type in OPT_U?INTEGER perf options: Introduce OPT_UINTEGER perf tui: Add workaround for slang < 2.1.4 perf record: Fix bug mismatch with -c option definition perf options: Introduce OPT_U64 perf tui: Add help window to show key associations perf tui: Make <- exit menus too perf newt: Add single key shortcuts for zoom into DSO and threads perf newt: Exit browser unconditionally when CTRL+C, q or Q is pressed perf newt: Fix the 'A'/'a' shortcut for annotate perf newt: Make <- exit the ui_browser x86, perf: P4 PMU - fix counters management logic perf newt: Make <- zoom out filters perf report: Report number of events, not samples perf hist: Clarify events_stats fields usage ... Fix up trivial conflicts in kernel/fork.c and tools/perf/builtin-record.c
2010-05-14x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRsAndreas Herrmann1-5/+7
If host CPU is exposed to a guest the OSVW MSRs are not guaranteed to be present and a GP fault occurs. Thus checking the feature flag is essential. Cc: <stable@kernel.org> # .32.x .33.x Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100427101348.GC4489@alberich.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-05-10x86: Introduce 'struct fpu' and related APIAvi Kivity1-12/+9
Currently all fpu state access is through tsk->thread.xstate. Since we wish to generalize fpu access to non-task contexts, wrap the state in a new 'struct fpu' and convert existing access to use an fpu API. Signal frame handlers are not converted to the API since they will remain task context only things. Signed-off-by: Avi Kivity <avi@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1273135546-29690-3-git-send-email-avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-02Merge branch 'perf/urgent' into perf/coreIngo Molnar1-8/+24
Conflicts: arch/x86/kernel/cpu/perf_event.c Merge reason: Resolve the conflict, pick up fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-26x86, ptrace: Fix block-stepPeter Zijlstra1-0/+11
Implement ptrace-block-step using TIF_BLOCKSTEP which will set DEBUGCTLMSR_BTF when set for a task while preserving any other DEBUGCTLMSR bits. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100325135414.017536066@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-26x86, perf, bts, mm: Delete the never used BTS-ptrace codePeter Zijlstra1-9/+0
Support for the PMU's BTS features has been upstreamed in v2.6.32, but we still have the old and disabled ptrace-BTS, as Linus noticed it not so long ago. It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without regard for other uses (perf) and doesn't provide the flexibility needed for perf either. Its users are ptrace-block-step and ptrace-bts, since ptrace-bts was never used and ptrace-block-step can be implemented using a much simpler approach. So axe all 3000 lines of it. That includes the *locked_memory*() APIs in mm/mlock.c as well. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Markus Metzger <markus.t.metzger@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20100325135413.938004390@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-20x86, amd: Restrict usage of c1e_idle()Andreas Herrmann1-8/+24
Currently c1e_idle returns true for all CPUs greater than or equal to family 0xf model 0x40. This covers too many CPUs. Meanwhile a respective erratum for the underlying problem was filed (#400). This patch adds the logic to check whether erratum #400 applies to a given CPU. Especially for CPUs where SMI/HW triggered C1e is not supported, c1e_idle() doesn't need to be used. We can check this by looking at the respective OSVW bit for erratum #400. Cc: <stable@kernel.org> # .32.x .33.x Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100319110922.GA19614@alberich.amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-03-14Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, k8 nb: Fix boot crash: enable k8_northbridges unconditionally on AMD systems x86, UV: Fix target_cpus() in x2apic_uv_x.c x86: Reduce per cpu warning boot up messages x86: Reduce per cpu MCA boot up messages x86_64, cpa: Don't work hard in preserving kernel 2M mappings when using 4K already
2010-03-11x86: Reduce per cpu warning boot up messagesMike Travis1-1/+1
Reduce warning message output to one line only instead of per cpu. Signed-of-by: Mike Travis <travis@sgi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: x86@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-28Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Mark atomic irq ops raw for 32bit legacy x86: Merge show_regs() x86: Macroise x86 cache descriptors x86-32: clean up rwsem inline asm statements x86: Merge asm/atomic_{32,64}.h x86: Sync asm/atomic_32.h and asm/atomic_64.h x86: Split atomic64_t functions into seperate headers x86-64: Modify memcpy()/memset() alternatives mechanism x86-64: Modify copy_user_generic() alternatives mechanism x86: Lift restriction on the location of FIX_BTMAP_* x86, core: Optimize hweight32()
2010-01-29x86: get rid of the insane TIF_ABI_PENDING bitH. Peter Anvin1-12/+0
Now that the previous commit made it possible to do the personality setting at the point of no return, we do just that for ELF binaries. And suddenly all the reasons for that insane TIF_ABI_PENDING bit go away, and we can just make SET_PERSONALITY() just do the obvious thing for a 32-bit compat process. Everything becomes much more straightforward this way. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-13x86: Merge show_regs()Brian Gerst1-0/+7
Using kernel_stack_pointer() allows 32-bit and 64-bit versions to be merged. This is more correct for 64-bit, since the old %rsp is always saved on the stack. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1263397555-27695-1-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-13x86: kernel_thread() -- initialize SS to a known stateCyrill Gorcunov1-0/+2
Before the kernel_thread was converted into "C" we had pt_regs::ss set to __KERNEL_DS (by SAVE_ALL asm macro). Though I must admit I didn't find any *explicit* load of %ss from this structure the better to be on a safe side and set it to a known value. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Christian Kujau <lists@nerdbynature.de> Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com> Cc: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1263377768-19600-1-git-send-email-ian.campbell@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-28x86: Use KERN_DEFAULT log-level in __show_regs()Pekka Enberg1-2/+2
Andrew Morton reported a strange looking kmemcheck warning: WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88004fba6c20) 0000000000000000310000000000000000000000000000002413000000c9ffff u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u [<ffffffff810af3aa>] kmemleak_scan+0x25a/0x540 [<ffffffff810afbcb>] kmemleak_scan_thread+0x5b/0xe0 [<ffffffff8104d0fe>] kthread+0x9e/0xb0 [<ffffffff81003074>] kernel_thread_helper+0x4/0x10 [<ffffffffffffffff>] 0xffffffffffffffff The above printout is missing register dump completely. The problem here is that the output comes from syslog which doesn't show KERN_INFO log-level messages. We didn't see this before because both of us were testing on 32-bit kernels which use the _default_ log-level. Fix that up by explicitly using KERN_DEFAULT log-level for __show_regs() printks. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1261988819.4641.2.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-15Merge branch 'x86/asm' into x86/urgentIngo Molnar1-0/+70
Merge reason: it's stable so lets push it upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-11x86: Merge kernel_thread()Brian Gerst1-0/+35
Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260380084-3707-6-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-10x86: Merge sys_cloneBrian Gerst1-0/+9
Change 32-bit sys_clone to new PTREGSCALL stub, and merge with 64-bit. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260403316-5679-7-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-10x86: Merge sys_execveBrian Gerst1-0/+26
Change 32-bit sys_execve to PTREGSCALL3, and merge with 64-bit. Signed-off-by: Brian Gerst <brgerst@gmail.com> LKML-Reference: <1260403316-5679-4-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09x86: Print DMI_BOARD_NAME as well as DMI_PRODUCT_NAME from __show_regs()Andy Isaacson1-4/+7
Robert Hancock observes that DMI_BOARD_NAME is often more useful than DMI_PRODUCT_NAME, especially on standalone motherboards. So, print both. Signed-off-by: Andy Isaacson <adi@hexapodia.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Robert Hancock <hancockrwd@gmail.com> Cc: Richard Zidlicky <rz@linux-m68k.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20091208083021.GB27174@hexapodia.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09x86: Factor duplicated code out of __show_regs() into show_regs_common()Andy Isaacson1-0/+18
Unify x86_32 and x86_64 implementations of __show_regs() header, standardizing on the x86_64 format string in the process. Also, 32-bit will now call print_modules. Signed-off-by: Andy Isaacson <adi@hexapodia.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Robert Hancock <hancockrwd@gmail.com> Cc: Richard Zidlicky <rz@linux-m68k.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20091208082942.GA27174@hexapodia.org> [ v2: resolved conflict ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-08Merge branch 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+2
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (84 commits) KVM: VMX: Fix comparison of guest efer with stale host value KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c KVM: Drop user return notifier when disabling virtualization on a cpu KVM: VMX: Disable unrestricted guest when EPT disabled KVM: x86 emulator: limit instructions to 15 bytes KVM: s390: Make psw available on all exits, not just a subset KVM: x86: Add KVM_GET/SET_VCPU_EVENTS KVM: VMX: Report unexpected simultaneous exceptions as internal errors KVM: Allow internal errors reported to userspace to carry extra data KVM: Reorder IOCTLs in main kvm.h KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG KVM: only clear irq_source_id if irqchip is present KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic KVM: x86: disallow multiple KVM_CREATE_IRQCHIP KVM: VMX: Remove vmx->msr_offset_efer KVM: MMU: update invlpg handler comment KVM: VMX: move CR3/PDPTR update to vmx_set_cr3 KVM: remove duplicated task_switch check KVM: powerpc: Fix BUILD_BUG_ON condition KVM: VMX: Use shared msr infrastructure ... Trivial conflicts due to new Kconfig options in arch/Kconfig and kernel/Makefile
2009-11-08hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf eventsFrederic Weisbecker1-5/+2
This patch rebase the implementation of the breakpoints API on top of perf events instances. Each breakpoints are now perf events that handle the register scheduling, thread/cpu attachment, etc.. The new layering is now made as follows: ptrace kgdb ftrace perf syscall \ | / / \ | / / / Core breakpoint API / / | / | / Breakpoints perf events | | Breakpoints PMU ---- Debug Register constraints handling (Part of core breakpoint API) | | Hardware debug registers Reasons of this rewrite: - Use the centralized/optimized pmu registers scheduling, implying an easier arch integration - More powerful register handling: perf attributes (pinned/flexible events, exclusive/non-exclusive, tunable period, etc...) Impact: - New perf ABI: the hardware breakpoints counters - Ptrace breakpoints setting remains tricky and still needs some per thread breakpoints references. Todo (in the order): - Support breakpoints perf counter events for perf tools (ie: implement perf_bpcounter_event()) - Support from perf tools Changes in v2: - Follow the perf "event " rename - The ptrace regression have been fixed (ptrace breakpoint perf events weren't released when a task ended) - Drop the struct hw_breakpoint and store generic fields in perf_event_attr. - Separate core and arch specific headers, drop asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h - Use new generic len/type for breakpoint - Handle off case: when breakpoints api is not supported by an arch Changes in v3: - Fix broken CONFIG_KVM, we need to propagate the breakpoint api changes to kvm when we exit the guest and restore the bp registers to the host. Changes in v4: - Drop the hw_breakpoint_restore() stub as it is only used by KVM - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a module - Restore the breakpoints unconditionally on kvm guest exit: TIF_DEBUG_THREAD doesn't anymore cover every cases of running breakpoints and vcpu->arch.switch_db_regs might not always be set when the guest used debug registers. (Waiting for a reliable optimization) Changes in v5: - Split-up the asm-generic/hw-breakpoint.h moving to linux/hw_breakpoint.h into a separate patch - Optimize the breakpoints restoring while switching from kvm guest to host. We only want to restore the state if we have active breakpoints to the host, otherwise we don't care about messed-up address registers. - Add asm/hw_breakpoint.h to Kbuild - Fix bad breakpoint type in trace_selftest.c Changes in v6: - Fix wrong header inclusion in trace.h (triggered a build error with CONFIG_FTRACE_SELFTEST Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jan Kiszka <jan.kiszka@web.de> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Avi Kivity <avi@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org>
2009-11-03x86/hw-breakpoints: Actually flush thread breakpoints in flush_thread().Paul Mundt1-2/+0
flush_thread() tries to do a TIF_DEBUG check before calling in to flush_thread_hw_breakpoint() (which subsequently clears the thread flag), but for some reason, the x86 code is manually clearing TIF_DEBUG immediately before the test, so this path will never be taken. This kills off the erroneous clear_tsk_thread_flag() and lets flush_thread_hw_breakpoint() actually get invoked. Presumably folks were getting lucky with testing and the free_thread_info() -> free_thread_xstate() path was taking care of the flush there. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: "K.Prasad" <prasad@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alan Stern <stern@rowland.harvard.edu> LKML-Reference: <20091005102306.GA7889@linux-sh.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-10-18Merge commit 'perf/core' into perf/hw-breakpointFrederic Weisbecker1-23/+8
Conflicts: kernel/Makefile kernel/trace/Makefile kernel/trace/trace.h samples/Makefile Merge reason: We need to be uptodate with the perf events development branch because we plan to rewrite the breakpoints API on top of perf events.
2009-10-01core, x86: Add user return notifiersAvi Kivity1-0/+2
Add a general per-cpu notifier that is called whenever the kernel is about to return to userspace. The notifier uses a thread_info flag and existing checks, so there is no impact on user return or context switch fast paths. This will be used initially to speed up KVM task switching by lazily updating MSRs. Signed-off-by: Avi Kivity <avi@redhat.com> LKML-Reference: <1253342422-13811-1-git-send-email-avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-24cpumask: use zalloc_cpumask_var() where possibleLi Zefan1-4/+2
Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-19tracing, x86, cpuidle: Move the end point of a C state in the power tracerArjan van de Ven1-3/+0
The "end of a C state" trace point currently happens before the code runs that corrects the TSC for having stopped during idle. The result of this is that the timestamp of the end-of-C-state event is garbage on cpus where the TSC stops during idle. This patch moves the end point of the C state to after the timekeeping engine of the kernel has been corrected. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Cc: fweisbec@gmail.com Cc: peterz@infradead.org Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090919133533.139c2a46@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-19tracing, perf: Convert the power tracer into an event tracerArjan van de Ven1-19/+9
This patch converts the existing power tracer into an event tracer, so that power events (C states and frequency changes) can be tracked via "perf". This also removes the perl script that was used to demo the tracer; its functionality is being replaced entirely with timechart. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20090912130542.6d314860@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-07Merge branch 'tracing/core' into tracing/hw-breakpointsIngo Molnar1-5/+1
Conflicts: arch/Kconfig kernel/trace/trace.h Merge reason: resolve the conflicts, plus adopt to the new ring-buffer APIs. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-19clockevent: Prevent dead lock on clockevents_lockSuresh Siddha1-5/+1
Currently clockevents_notify() is called with interrupts enabled at some places and interrupts disabled at some other places. This results in a deadlock in this scenario. cpu A holds clockevents_lock in clockevents_notify() with irqs enabled cpu B waits for clockevents_lock in clockevents_notify() with irqs disabled cpu C doing set_mtrr() which will try to rendezvous of all the cpus. This will result in C and A come to the rendezvous point and waiting for B. B is stuck forever waiting for the spinlock and thus not reaching the rendezvous point. Fix the clockevents code so that clockevents_lock is taken with interrupts disabled and thus avoid the above deadlock. Also call lapic_timer_propagate_broadcast() on the destination cpu so that we avoid calling smp_call_function() in the clockevents notifier chain. This issue left us wondering if we need to change the MTRR rendezvous logic to use stop machine logic (instead of smp_call_function) or add a check in spinlock debug code to see if there are other spinlocks which gets taken under both interrupts enabled/disabled conditions. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com> Cc: "Brown Len" <len.brown@intel.com> LKML-Reference: <1250544899.2709.210.camel@sbs-t61.sc.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-06-17Merge branch 'linus' into tracing/hw-breakpointsIngo Molnar1-1/+16
Conflicts: arch/x86/Kconfig arch/x86/kernel/traps.c arch/x86/power/cpu.c arch/x86/power/cpu_32.c kernel/Makefile Semantic conflict: arch/x86/kernel/hw_breakpoint.c Merge reason: Resolve the conflicts, move from put_cpu_no_sched() to put_cpu() in arch/x86/kernel/hw_breakpoint.c. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-15kmemcheck: add mm functionsVegard Nossum1-1/+1
With kmemcheck enabled, the slab allocator needs to do this: 1. Tell kmemcheck to allocate the shadow memory which stores the status of each byte in the allocation proper, e.g. whether it is initialized or uninitialized. 2. Tell kmemcheck which parts of memory that should be marked uninitialized. There are actually a few more states, such as "not yet allocated" and "recently freed". If a slab cache is set up using the SLAB_NOTRACK flag, it will never return memory that can take page faults because of kmemcheck. If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still request memory with the __GFP_NOTRACK flag. This does not prevent the page faults from occuring, however, but marks the object in question as being initialized so that no warnings will ever be produced for this object. In addition to (and in contrast to) __GFP_NOTRACK, the __GFP_NOTRACK_FALSE_POSITIVE flag indicates that the allocation should not be tracked _because_ it would produce a false positive. Their values are identical, but need not be so in the future (for example, we could now enable/disable false positives with a config option). Parts of this patch were contributed by Pekka Enberg but merged for atomicity. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-11Merge branch 'tracing-for-linus' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits) Revert "x86, bts: reenable ptrace branch trace support" tracing: do not translate event helper macros in print format ftrace/documentation: fix typo in function grapher name tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK tracing: add protection around module events unload tracing: add trace_seq_vprint interface tracing: fix the block trace points print size tracing/events: convert block trace points to TRACE_EVENT() ring-buffer: fix ret in rb_add_time_stamp ring-buffer: pass in lockdep class key for reader_lock tracing: add annotation to what type of stack trace is recorded tracing: fix multiple use of __print_flags and __print_symbolic tracing/events: fix output format of user stack tracing/events: fix output format of kernel stack tracing/trace_stack: fix the number of entries in the header ring-buffer: discard timestamps that are at the start of the buffer ring-buffer: try to discard unneeded timestamps ring-buffer: fix bug in ring_buffer_discard_commit ftrace: do not profile functions when disabled tracing: make trace pipe recognize latency format flag ...
2009-06-11Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds1-0/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits) x86: fix system without memory on node0 x86, mm: Fix node_possible_map logic mm, x86: remove MEMORY_HOTPLUG_RESERVE related code x86: make sparse mem work in non-NUMA mode x86: process.c, remove useless headers x86: merge process.c a bit x86: use sparse_memory_present_with_active_regions() on UMA x86: unify 64-bit UMA and NUMA paging_init() x86: Allow 1MB of slack between the e820 map and SRAT, not 4GB x86: Sanity check the e820 against the SRAT table using e820 map only x86: clean up and and print out initial max_pfn_mapped x86/pci: remove rounding quirk from e820_setup_gap() x86, e820, pci: reserve extra free space near end of RAM x86: fix typo in address space documentation x86: 46 bit physical address support on 64 bits x86, mm: fault.c, use printk_once() in is_errata93() x86: move per-cpu mmu_gathers to mm/init.c x86: move max_pfn_mapped and max_low_pfn_mapped to setup.c x86: unify noexec handling x86: remove (null) in /sys kernel_page_tables ...
2009-06-03hw-breakpoints: use the new wrapper routines to access debug registers in ↵K.Prasad1-16/+6
process/thread code This patch enables the use of abstract debug registers in process-handling routines, according to the new hardware breakpoint Api. [ Impact: adapt thread breakpoints handling code to the new breakpoint Api ] Original-patch-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>