summaryrefslogtreecommitdiff
path: root/drivers/cpufreq
AgeCommit message (Collapse)AuthorFilesLines
2010-08-03[CPUFREQ] unexport (un)lock_policy_rwsem* functionsAmerigo Wang1-7/+3
lock_policy_rwsem_* and unlock_policy_rwsem_* functions are scheduled to be unexported when 2.6.33. Now there are no other callers of them out of cpufreq.c, unexport them and make them static. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03[CPUFREQ] ondemand: Refactor frequency increase codeMike Chan1-13/+12
Make simpler to read and call. *** v3 - Always call when powersave_bias is enabled. Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Mike Chan <mike@android.com> Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03[CPUFREQ] fix memory leak in cpufreq_add_devXiaotian Feng1-0/+1
We didn't free policy->related_cpus in error path err_unlock_policy. This is catched by following kmemleak report: unreferenced object 0xffff88022a0b96d0 (size 512): comm "modprobe", pid 886, jiffies 4294689177 (age 780.694s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8111ebe5>] create_object+0x186/0x281 [<ffffffff814fad4f>] kmemleak_alloc+0x60/0xa7 [<ffffffff8111127a>] kmem_cache_alloc_node_notrace+0x120/0x142 [<ffffffff81262e4f>] alloc_cpumask_var_node+0x2c/0xd7 [<ffffffff81262f0b>] alloc_cpumask_var+0x11/0x13 [<ffffffff81262f1c>] zalloc_cpumask_var+0xf/0x11 [<ffffffff8140fac0>] cpufreq_add_dev+0x11f/0x547 [<ffffffff81334bda>] sysdev_driver_register+0xc2/0x11d [<ffffffff8140e334>] cpufreq_register_driver+0xcb/0x1b8 [<ffffffffa032e040>] 0xffffffffa032e040 [<ffffffff810021ba>] do_one_initcall+0x5e/0x15c [<ffffffff81087f94>] sys_init_module+0xa6/0x1e6 [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03[CPUFREQ] revert "[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call ↵Andrej Gelenberg1-10/+1
(second call site)" 395913d0b1db37092ea3d9d69b832183b1dd84c5 ("[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)") is not needed, because there is no rwsem lock in cpufreq_ondemand and cpufreq_conservative anymore. Lock should not be released until the work done. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=1594 Signed-off-by: Andrej Gelenberg <andrej.gelenberg@udo.edu> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Dave Jones <davej@redhat.com>
2010-05-18Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds3-87/+43
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hypervisor: add missing <linux/module.h> Modify the VMware balloon driver for the new x86_hyper API x86, hypervisor: Export the x86_hyper* symbols x86: Clean up the hypervisor layer x86, HyperV: fix up the license to mshyperv.c x86: Detect running on a Microsoft HyperV system x86, cpu: Make APERF/MPERF a normal table-driven flag x86, k8: Fix build error when K8_NB is disabled x86, cacheinfo: Disable index in all four subcaches x86, cacheinfo: Make L3 cache info per node x86, cacheinfo: Reorganize AMD L3 cache structure x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments x86, cacheinfo: Unify AMD L3 cache index disable checking cpufreq: Unify sysfs attribute definition macros powernow-k8: Fix frequency reporting x86, cpufreq: Add APERF/MPERF support for AMD processors x86: Unify APERF/MPERF support powernow-k8: Add core performance boost support x86, cpu: Add AMD core boosting feature flag to /proc/cpuinfo Fix up trivial conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c and drivers/cpufreq/cpufreq_ondemand.c
2010-05-09ondemand: Make the iowait-is-busy time a sysfs tunableArjan van de Ven1-1/+46
Pavel Machek pointed out that not all CPUs have an efficient idle at high frequency. Specifically, older Intel and various AMD cpus would get a higher powerusage when copying files from USB. Mike Chan pointed out that the same is true for various ARM chips as well. Thomas Renninger suggested to make this a sysfs tunable with a reasonable default. This patch adds a sysfs tunable for the new behavior, and uses a very simple function to determine a reasonable default, depending on the CPU vendor/type. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: davej@redhat.com LKML-Reference: <20100509082651.46914d04@infradead.org> [ minor tidyup ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-09ondemand: Solve a big performance issue by counting IOWAIT time as busyArjan van de Ven1-2/+28
The ondemand cpufreq governor uses CPU busy time (e.g. not-idle time) as a measure for scaling the CPU frequency up or down. If the CPU is busy, the CPU frequency scales up, if it's idle, the CPU frequency scales down. Effectively, it uses the CPU busy time as proxy variable for the more nebulous "how critical is performance right now" question. This algorithm falls flat on its face in the light of workloads where you're alternatingly disk and CPU bound, such as the ever popular "git grep", but also things like startup of programs and maildir using email clients... much to the chagarin of Andrew Morton. This patch changes the ondemand algorithm to count iowait time as busy, not idle, time. As shown in the breakdown cases above, iowait is performance critical often, and by counting iowait, the proxy variable becomes a more accurate representation of the "how critical is performance" question. The problem and fix are both verified with the "perf timechar" tool. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Jones <davej@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100509082606.3d9f00d0@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-09Merge commit 'v2.6.34-rc6' into x86/cpuH. Peter Anvin3-7/+21
2010-04-24Merge branch 'fixes' of ↵Linus Torvalds2-7/+20
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] use max load in conservative governor [CPUFREQ] fix a lockdep warning
2010-04-10cpufreq: Unify sysfs attribute definition macrosBorislav Petkov3-86/+42
Multiple modules used to define those which are with identical functionality and were needlessly replicated among the different cpufreq drivers. Push them into the header and remove duplication. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1270065406-1814-7-git-send-email-bp@amd64.org> Reviewed-by: Thomas Renninger <trenn@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-03-31[CPUFREQ] use max load in conservative governorDominik Brodowski1-2/+6
Instead of using the load of the last CPU in a package, use the maximum load of all CPUs in a package. Reported-by: Jean-Christian Goussard <jeanchristian.goussard@sfr.fr> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Dave Jones <davej@redhat.com>
2010-03-31[CPUFREQ] fix a lockdep warningAmerigo Wang1-5/+14
There is no need to do sysfs_remove_link() or kobject_put() etc. when policy_rwsem_write is held, move them after releasing the lock. This fixes the lockdep warning: halt/4071 is trying to acquire lock: (s_active){++++.+}, at: [<c0000000001ef868>] .sysfs_addrm_finish+0x58/0xc0 but task is already holding lock: (&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at: [<c0000000004cd6ac>] .lock_policy_rwsem_write+0x84/0xf4 Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo1-0/+1
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-08Driver core: Constify struct sysfs_ops in struct kobj_typeEmese Revfy1-1/+1
Constify struct sysfs_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: David Teigland <teigland@redhat.com> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Hans J. Koch <hjk@linutronix.de> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-13[CPUFREQ] Fix ondemand to not request targets outside policy limitsNagananda.Chumbalkar@hp.com1-0/+3
Dominik said: target_freq cannot be below policy->min or above policy->max. If it were, the whole cpufreq subsystem is broken. But (answer): I think the "ondemand" governor can ask for a target frequency that is below policy->min. ... A patch such as below may be needed to sanitize the target frequency requested by "ondemand". The "conservative" governor already has this check: Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com> # diff -bur x/drivers/cpufreq/cpufreq_ondemand.c.orig y/drivers/cpufreq/cpufreq_ondemand.c
2009-12-14Merge branch 'for-linus' of ↵Linus Torvalds2-14/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c
2009-11-24[ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interfaceThomas Renninger1-0/+21
This interface is mainly intended (and implemented) for ACPI _PPC BIOS frequency limitations, but other cpufreq drivers can also use it for similar use-cases. Why is this needed: Currently it's not obvious why cpufreq got limited. People see cpufreq/scaling_max_freq reduced, but this could have happened by: - any userspace prog writing to scaling_max_freq - thermal limitations - hardware (_PPC in ACPI case) limitiations Therefore export bios_limit (in kHz) to: - Point the user that it's the BIOS (broken or intended) which limits frequency - Export it as a sysfs interface for userspace progs. While this was a rarely used feature on laptops, there will appear more and more server implemenations providing "Green IT" features like allowing the service processor to limit the frequency. People want to know about HW/BIOS frequency limitations. All ACPI P-state driven cpufreq drivers are covered with this patch: - powernow-k8 - powernow-k7 - acpi-cpufreq Tested with a patched DSDT which limits the first two cores (_PPC returns 1) via _PPC, exposed by bios_limit: # echo 2200000 >cpu2/cpufreq/scaling_max_freq # cat cpu*/cpufreq/scaling_max_freq 2600000 2600000 2200000 2200000 # #scaling_max_freq shows general user/thermal/BIOS limitations # cat cpu*/cpufreq/bios_limit 2600000 2600000 2800000 2800000 # #bios_limit only shows the HW/BIOS limitation CC: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com> CC: Len Brown <lenb@kernel.org> CC: davej@codemonkey.org.uk CC: linux@dominikbrodowski.net Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-24[CPUFREQ] make internal cpufreq_add_dev_* staticAlex Chiang1-5/+8
No need to export these symbols; make them static. cpufreq_add_dev_policy cpufreq_add_dev_symlink cpufreq_add_dev_interface Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-24[CPUFREQ] Use global sysfs cpufreq structure for conservative governor tuningsThomas Renninger1-19/+110
Same adustments that have been added to the ondemand recently. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-18[CPUFREQ] Fix stale cpufreq_cpu_governor pointerPrarit Bhargava1-2/+30
Dave, Attached is an update of my patch against the cpufreq fixes branch. Before applying the patch I compiled and booted the tree to see if the panic was still there -- to my surprise it was not. This is because of the conversion of cpufreq_cpu_governor to a char[]. While the panic is kaput, the problem of stale data continues and my patch is still valid. It is possible to end up with the wrong governor after hotplug events because CPUFREQ_DEFAULT_GOVERNOR is statically linked to a default, while the cpu siblings may have had a different governor assigned by a user. ie) the patch is still needed in order to keep the governors assigned properly when hotplugging devices Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-18[CPUFREQ] Resolve time unit thinko in ondemand/conservative govsPallipadi, Venkatesh2-4/+4
ondemand and conservative governors are messing up time units in the code path where NO_HZ is not enabled and ignore_nice is set. The walltime idletime stored is in jiffies and nice time calculation is happening in microseconds. The problem was reported and diagnosed by Alexander here. http://marc.info/?l=linux-kernel&m=125752550404513&w=2 The patch below fixes this thinko. Reported-by: Alexander Miller <Miller@fmi.uni-stuttgart.de> Tested-by: Alexander Miller <Miller@fmi.uni-stuttgart.de> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-18[CPUFREQ] Fix use after free on governor restoreDmitry Monakhov1-6/+10
Currently on governer backup/restore path we storing governor's pointer. This is wrong because one may unload governor's module after cpu goes offline. As result use-after-free will take place on restored cpu. It is not easy to exploit this bug, but still we have to close this issue ASAP. Issue was introduced by following commit 084f34939424161669467c19280dbcf637730314 ##TESTCASE## #!/bin/sh -x modprobe acpi_cpufreq # Any non default governor, in may case it is "ondemand" modprobe cpufreq_ondemand echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor rmmod acpi_cpufreq rmmod cpufreq_ondemand modprobe acpi_cpufreq # << use-after-free here. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Dave Jones <davej@redhat.com>
2009-10-29percpu: make percpu symbols in cpufreq uniqueTejun Heo2-14/+14
This patch updates percpu related symbols in cpufreq such that percpu symbols are unique and don't clash with local symbols. This serves two purposes of decreasing the possibility of global percpu symbol collision and allowing dropping per_cpu__ prefix from percpu symbols. * drivers/cpufreq/cpufreq.c: s/policy_cpu/cpufreq_policy_cpu/ * drivers/cpufreq/freq_table.c: s/show_table/cpufreq_show_table/ * arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c: s/drv_data/acfreq_data/ s/old_perf/acfreq_old_perf/ Partly based on Rusty Russell's "alloc_percpu: rename percpu vars which cause name clashes" patch. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au>
2009-09-18Merge branch 'next' of ↵Linus Torvalds2-144/+300
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Fix NULL ptr regression in powernow-k8 [CPUFREQ] Create a blacklist for processors that should not load the acpi-cpufreq module. [CPUFREQ] Powernow-k8: Enable more than 2 low P-states [CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site) [CPUFREQ] ondemand - Use global sysfs dir for tuning settings [CPUFREQ] Introduce global, not per core: /sys/devices/system/cpu/cpufreq [CPUFREQ] Bail out of cpufreq_add_dev if the link for a managed CPU got created [CPUFREQ] Factor out policy setting from cpufreq_add_dev [CPUFREQ] Factor out interface creation from cpufreq_add_dev [CPUFREQ] Factor out symlink creation from cpufreq_add_dev [CPUFREQ] cleanup up -ENOMEM handling in cpufreq_add_dev [CPUFREQ] Reduce scope of cpu_sys_dev in cpufreq_add_dev [CPUFREQ] update Doc for cpuinfo_cur_freq and scaling_cur_freq
2009-09-15Merge branch 'for-linus' of ↵Linus Torvalds2-13/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c
2009-09-01[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)Mathieu Desnoyers1-1/+12
remove rwsem lock from CPUFREQ_GOV_STOP call (second call site) commit 42a06f2166f2f6f7bf04f32b4e823eacdceafdc9 Missed a call site for CPUFREQ_GOV_STOP to remove the rwlock taken around the teardown. To make a long story short, the rwlock write-lock causes a circular dependency with cancel_delayed_work_sync(), because the timer handler takes the read lock. Note that all callers to __cpufreq_set_policy are taking the rwsem. All sysfs callers (writers) hold the write rwsem at the earliest sysfs calling stage. However, the rwlock write-lock is not needed upon governor stop. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> CC: rjw@sisk.pl CC: mingo@elte.hu CC: Shaohua Li <shaohua.li@intel.com> CC: Pekka Enberg <penberg@cs.helsinki.fi> CC: Dave Young <hidave.darkstar@gmail.com> CC: "Rafael J. Wysocki" <rjw@sisk.pl> CC: Rusty Russell <rusty@rustcorp.com.au> CC: trenn@suse.de CC: sven.wegener@stealer.net CC: cpufreq@vger.kernel.org Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01[CPUFREQ] ondemand - Use global sysfs dir for tuning settingsThomas Renninger1-26/+113
Ondemand has only global variables for userspace tunings via sysfs. But they were exposed per CPU which wrongly implies to the user that his settings are applied per cpu. Also locking sysfs against concurrent access won't be necessary anymore after deprecation time. This means the ondemand config dir is moved: /sys/devices/system/cpu/cpu*/cpufreq/ondemand -> /sys/devices/system/cpu/cpufreq/ondemand The old files will still exist, but reading or writing to them will result in one (printk_once) deprecation msg to syslog per file. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01[CPUFREQ] Introduce global, not per core: /sys/devices/system/cpu/cpufreqThomas Renninger1-1/+8
Currently everything in the cpufreq layer is per core based. This does not reflect reality, for example ondemand on conservative governors have global sysfs variables. Introduce a global cpufreq directory and add the kobject to the governor struct, so that governors can easily access it. The directory is initialized in the cpufreq_core_init initcall and thus will always be created if cpufreq is compiled in, even if no cpufreq driver is active later. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01[CPUFREQ] Bail out of cpufreq_add_dev if the link for a managed CPU got createdThomas Renninger1-3/+17
Doing: echo 0 >cpu1/online echo 1 >cpu1/online on a managed CPU will result in: Jul 22 15:15:37 linux kernel: [ 80.013864] WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xcf/0xe6() Jul 22 15:15:37 linux kernel: [ 80.013866] Hardware name: To Be Filled By O.E.M. Jul 22 15:15:37 linux kernel: [ 80.013868] sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq' Jul 22 15:15:37 linux kernel: [ 80.013870] Modules linked in: powernow_k8 Jul 22 15:15:37 linux kernel: [ 80.013874] Pid: 5750, comm: bash Not tainted 2.6.31-rc2 #40 Jul 22 15:15:37 linux kernel: [ 80.013876] Call Trace: Jul 22 15:15:37 linux kernel: [ 80.013879] [<ffffffff8112ebda>] ? sysfs_add_one+0xcf/0xe6 Jul 22 15:15:37 linux kernel: [ 80.013884] [<ffffffff81041926>] warn_slowpath_common+0x77/0xa4 Jul 22 15:15:37 linux kernel: [ 80.013888] [<ffffffff810419a0>] warn_slowpath_fmt+0x3c/0x3e Jul 22 15:15:37 linux kernel: [ 80.013891] [<ffffffff8112ebda>] sysfs_add_one+0xcf/0xe6 Jul 22 15:15:37 linux kernel: [ 80.013894] [<ffffffff8112f213>] create_dir+0x58/0x87 Jul 22 15:15:37 linux kernel: [ 80.013898] [<ffffffff8112f27a>] sysfs_create_dir+0x38/0x4f Jul 22 15:15:37 linux kernel: [ 80.013902] [<ffffffff811ffb8a>] kobject_add_internal+0x11f/0x1de Jul 22 15:15:37 linux kernel: [ 80.013905] [<ffffffff811ffd21>] kobject_add_varg+0x41/0x4e Jul 22 15:15:37 linux kernel: [ 80.013908] [<ffffffff811ffd7a>] kobject_init_and_add+0x4c/0x57 Jul 22 15:15:37 linux kernel: [ 80.013913] [<ffffffff810667bc>] ? mark_lock+0x22/0x228 Jul 22 15:15:37 linux kernel: [ 80.013918] [<ffffffff813e8a3b>] cpufreq_add_dev_interface+0x40/0x1e4 ... This bug slipped in by git commit: 150b06f7f223cfd0f808737a5243cceca8ea47fa When splitting up cpufreq_add_dev, the whole cpufreq_add_dev function is not left anymore, only cpufreq_add_dev_policy. This patch should reconstruct the identical functionality again as it was before the split. CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01[CPUFREQ] Factor out policy setting from cpufreq_add_devDave Jones1-76/+90
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01[CPUFREQ] Factor out interface creation from cpufreq_add_devDave Jones1-37/+52
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01[CPUFREQ] Factor out symlink creation from cpufreq_add_devDave Jones1-20/+31
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01[CPUFREQ] cleanup up -ENOMEM handling in cpufreq_add_devDave Jones1-9/+6
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01[CPUFREQ] Reduce scope of cpu_sys_dev in cpufreq_add_devDave Jones1-1/+1
Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01[CPUFREQ] Re-enable cpufreq suspend and resume codeDominik Brodowski1-88/+7
Commit 4bc5d3413503 is broken and causes regressions: (1) cpufreq_driver->resume() and ->suspend() were only called on __powerpc__, but you could set them on all architectures. In fact, ->resume() was defined and used before the PPC-related commit 42d4dc3f4e1e complained about in 4bc5d3413503. (2) Therfore, the resume functions in acpi_cpufreq and speedstep-smi would never be called. (3) This means speedstep-smi would be unusuable after suspend or resume. The _real_ problem was calling cpufreq_driver->get() with interrupts off, but it re-enabling interrupts on some platforms. Why is ->get() necessary? Some systems like to change the CPU frequency behind our back, especially during BIOS-intensive operations like suspend or resume. If such systems also use a CPU frequency-dependant timing loop, delays might be off by large factors. Therefore, we need to ascertain as soon as possible that the CPU frequency is indeed at the speed we think it is. You can do this two ways: either setting it anew, or trying to get it. The latter is what was done, the former also has the same IRQ issue. So, let's try something different: defer the checking to after interrupts are re-enabled, by calling cpufreq_update_policy() (via schedule_work()). Timings may be off until this later stage, so let's watch out for resume regressions caused by the deferred handling of frequency changes behind the kernel's back. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Dave Jones <davej@redhat.com>
2009-08-14Merge branch 'percpu-for-linus' into percpu-for-nextTejun Heo3-101/+118
Conflicts: arch/sparc/kernel/smp_64.c arch/x86/kernel/cpu/perf_counter.c arch/x86/kernel/setup_percpu.c drivers/cpufreq/cpufreq_ondemand.c mm/percpu.c Conflicts in core and arch percpu codes are mostly from commit ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all the first chunk allocators into mm/percpu.c, the changes are moved from arch code to mm/percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-08-04[CPUFREQ] Make cpufreq suspend code conditional on powerpc.Dave Jones1-2/+19
The suspend code runs with interrupts disabled, and the powerpc workaround we do in the cpufreq suspend hook calls the drivers ->get method. powernow-k8's ->get does an smp_call_function_single which needs interrupts enabled cpufreq's suspend/resume code was added in 42d4dc3f4e1e to work around a hardware problem on ppc powerbooks. If we make all this code conditional on powerpc, we avoid the issue above. Signed-off-by: Dave Jones <davej@redhat.com>
2009-08-04[CPUFREQ] Fix a kobject reference bug related to managed CPUsThomas Renninger1-1/+3
The first offline/online cycle is successful, the second not. Doing: echo 0 >cpu1/online echo 1 >cpu1/online echo 0 >cpu1/online The last command will trigger: Jul 22 14:39:50 linux kernel: [ 593.210125] ------------[ cut here ]------------ Jul 22 14:39:50 linux kernel: [ 593.210139] WARNING: at lib/kref.c:43 kref_get+0x23/0x2b() Jul 22 14:39:50 linux kernel: [ 593.210144] Hardware name: To Be Filled By O.E.M. Jul 22 14:39:50 linux kernel: [ 593.210148] Modules linked in: powernow_k8 Jul 22 14:39:50 linux kernel: [ 593.210158] Pid: 378, comm: kondemand/2 Tainted: G W 2.6.31-rc2 #38 Jul 22 14:39:50 linux kernel: [ 593.210163] Call Trace: Jul 22 14:39:50 linux kernel: [ 593.210171] [<ffffffff812008e8>] ? kref_get+0x23/0x2b Jul 22 14:39:50 linux kernel: [ 593.210181] [<ffffffff81041926>] warn_slowpath_common+0x77/0xa4 Jul 22 14:39:50 linux kernel: [ 593.210190] [<ffffffff81041962>] warn_slowpath_null+0xf/0x11 Jul 22 14:39:50 linux kernel: [ 593.210198] [<ffffffff812008e8>] kref_get+0x23/0x2b Jul 22 14:39:50 linux kernel: [ 593.210206] [<ffffffff811ffa19>] kobject_get+0x1a/0x22 Jul 22 14:39:50 linux kernel: [ 593.210214] [<ffffffff813e815d>] cpufreq_cpu_get+0x8a/0xcb Jul 22 14:39:50 linux kernel: [ 593.210222] [<ffffffff813e87d1>] __cpufreq_driver_getavg+0x1d/0x67 Jul 22 14:39:50 linux kernel: [ 593.210231] [<ffffffff813ea18f>] do_dbs_timer+0x158/0x27f Jul 22 14:39:50 linux kernel: [ 593.210240] [<ffffffff810529ea>] worker_thread+0x200/0x313 ... The output continues on every do_dbs_timer ondemand freq checking poll. This regression was introduced by git commit: 3f4a782b5ce2698b1870b5a7b573cd721d4fce33 The policy is released when the cpufreq device is removed in: __cpufreq_remove_dev(): /* if this isn't the CPU which is the parent of the kobj, we * only need to unlink, put and exit */ Not creating the symlink is not sever at all. As long as: sysfs_remove_link(&sys_dev->kobj, "cpufreq"); handles it gracefully that the symlink did not exist. Possibly no error should be returned at all, because ondemand governor would still provide the same functionality. Userspace in userspace gov case might be confused if the link is missing. Resolves http://bugzilla.kernel.org/show_bug.cgi?id=13903 CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>
2009-08-04[CPUFREQ] Do not set policy for offline cpusPrarit Bhargava1-0/+2
Suspend/Resume fails on multi socket, multi core systems because the cpufreq code erroneously sets the per_cpu policy_cpu value when a logical cpu is offline. This most notably results in missing sysfs files that are used to set the cpu frequencies of the various cpus. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Dave Jones <davej@redhat.com>
2009-08-04[CPUFREQ] Fix NULL pointer dereference regression in conservative governorPallipadi, Venkatesh1-0/+6
Commit ee88415caf736b89500f16e0a545614541a45005 introduced this regression when it removed enable bit in cpu_dbs_info_s. That added a possibility of dbs_cpufreq_notifier getting called for a CPU that is not yet managed by conservative governor. That will happen as the transition notifier is set as soon as one CPU switches to conservative governor and other CPUs can get a NULL pointer dereference without the enable bit check. Add the enable bit back again. Reported-by: Lermytte Christophe <Christophe.Lermytte@thomson.net> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-09[CPUFREQ] Fix compile failure in cpufreq.cDave Jones1-3/+4
managed_policy is out of scope for the non-smp case. Declare it locally where used (twice) Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-07[CPUFREQ] fix (utter) cpufreq_add_dev messMathieu Desnoyers1-25/+40
OK, I've tried to clean it up the best I could, but please test this with concurrent cpu hotplug and cpufreq add/remove in loops. I'm sure we will make other interesting findings. This is step one of fixing the overall locking dependency mess in cpufreq. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> CC: rjw@sisk.pl CC: mingo@elte.hu CC: Shaohua Li <shaohua.li@intel.com> CC: Pekka Enberg <penberg@cs.helsinki.fi> CC: Dave Young <hidave.darkstar@gmail.com> CC: "Rafael J. Wysocki" <rjw@sisk.pl> CC: Rusty Russell <rusty@rustcorp.com.au> CC: sven.wegener@stealer.net CC: cpufreq@vger.kernel.org CC: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-07[CPUFREQ] Cleanup locking in conservative governorvenkatesh.pallipadi@intel.com1-21/+13
Redesign the locking inside conservative driver. Make dbs_mutex handle all the global state changes inside the driver and invent a new percpu mutex to serialize percpu timer and frequency limit change. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-07[CPUFREQ] Cleanup locking in ondemand governorvenkatesh.pallipadi@intel.com1-35/+27
Redesign the locking inside ondemand driver. Make dbs_mutex handle all the global state changes inside the driver and invent a new percpu mutex to serialize percpu timer and frequency limit change. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
2009-07-07[CPUFREQ] Eliminate the recent lockdep warnings in cpufreqvenkatesh.pallipadi@intel.com3-34/+24
Commit b14893a62c73af0eca414cfed505b8c09efc613c although it was very much needed to properly cleanup ondemand timer, opened-up a can of worms related to locking dependencies in cpufreq. Patch here defines the need for dbs_mutex and cleans up its usage in ondemand governor. This also resolves the lockdep warnings reported here http://lkml.indiana.edu/hypermail/linux/kernel/0906.1/01925.html http://lkml.indiana.edu/hypermail/linux/kernel/0907.0/00820.html and few others.. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-24percpu: clean up percpu variable definitionsTejun Heo2-13/+14
Percpu variable definition is about to be updated such that all percpu symbols including the static ones must be unique. Update percpu variable definitions accordingly. * as,cfq: rename ioc_count uniquely * cpufreq: rename cpu_dbs_info uniquely * xen: move nesting_count out of xen_evtchn_do_upcall() and rename it * mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and rename it * ipv4,6: rename cookie_scratch uniquely * x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to pmc_irq_entry and nmi_entry to pmc_nmi_entry * perf_counter: rename disable_count to perf_disable_count * ftrace: rename test_event_disable to ftrace_test_event_disable * kmemleak: rename test_pointer to kmemleak_test_pointer * mce: rename next_interval to mce_next_interval [ Impact: percpu usage cleanups, no duplicate static percpu var names ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm <linux-mm@kvack.org> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andi Kleen <andi@firstfloor.org>
2009-06-15[CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is usefulThomas Renninger2-31/+4
Update the documentation accordingly. Cleanup and use printk_once. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-15[CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ caseThomas Renninger2-53/+41
With this patch you have following minimal sampling rate restrictions: Kernel restrictions: If CONFIG_NO_HZ is set, the limit is 10ms fixed. If CONFIG_NO_HZ is not set or no_hz=off boot parameter is used, the limits depend on the CONFIG_HZ option: HZ=1000: min=20000us (20ms) HZ=250: min=80000us (80ms) HZ=100: min=200000us (200ms) HW restrictions: Do not sample/poll more often than HW latency * 100 exported by the low level cpufreq HW driver The higher value of above restrictions is the minimal sampling rate that can be set (and can be seen via ondemand/sampling_rate_min sysfs file) Default sampling rate still is HW latency * 1000, but this will now end up in lower values on latest (Intel and AMD) hardware as these can switch really fast and sampling rate mostly was limited to the 80ms or 200ms (depending on whether HZ=250 or HZ=1000 is used). Signed-off-by: Thomas Renninger <trenn@suse.de> Cc: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-09cpumask: alloc zeroed cpumask for static cpumask_var_tsYinghai Lu1-1/+1
These are defined as static cpumask_var_t so if MAXSMP is not used, they are cleared already. Avoid surprises when MAXSMP is enabled. Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-05-26[CPUFREQ] fix timer teardown in ondemand governorMathieu Desnoyers1-1/+4
* Rafael J. Wysocki (rjw@sisk.pl) wrote: > This message has been generated automatically as a part of a report > of regressions introduced between 2.6.28 and 2.6.29. > > The following bug entry is on the current list of known regressions > introduced between 2.6.28 and 2.6.29. Please verify if it still should > be listed and let me know (either way). > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13186 > Subject : cpufreq timer teardown problem > Submitter : Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> > Date : 2009-04-23 14:00 (24 days old) > References : http://marc.info/?l=linux-kernel&m=124049523515036&w=4 > Handled-By : Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> > Patch : http://patchwork.kernel.org/patch/19754/ > http://patchwork.kernel.org/patch/19753/ > (updated changelog) cpufreq fix timer teardown in ondemand governor The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the workqueue handler to exit. The ondemand governor does not seem to be affected because the "if (!dbs_info->enable)" check at the beginning of the workqueue handler returns immediately without rescheduling the work. The conservative governor in 2.6.30-rc has the same check as the ondemand governor, which makes things usually run smoothly. However, if the governor is quickly stopped and then started, this could lead to the following race : dbs_enable could be reenabled and multiple do_dbs_timer handlers would run. This is why a synchronized teardown is required. The following patch applies to, at least, 2.6.28.x, 2.6.29.1, 2.6.30-rc2. Depends on patch cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: Andrew Morton <akpm@linux-foundation.org> CC: gregkh@suse.de CC: stable@kernel.org CC: cpufreq@vger.kernel.org CC: Ingo Molnar <mingo@elte.hu> CC: rjw@sisk.pl CC: Ben Slusky <sluskyb@paranoiacs.org> Signed-off-by: Dave Jones <davej@redhat.com>