Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 7e7e8fe69820c6fa31395dbbd8e348e3c69cd2a9 upstream.
The pcc-cpufreq driver is not automatically loaded on systems where
the platform's power management setting requires this driver.
Instead, on those systems no CPU frequency driver is registered and
active.
Make the autoloading matching criteria for loading the pcc-cpufreq
driver the same as done in acpi-cpufreq by commit c655affbd524d01
("ACPI / cpufreq: Add ACPI processor device IDs to acpi-cpufreq").
x86 CPU frequency drivers are now typically autoloaded by specifying
MODULE_DEVICE_TABLE entries and x86cpu model specific matching.
But pcc-cpufreq was omitted when acpi-cpufreq and other drivers were
changed to use this approach.
Both acpi-cpufreq and pcc-cpufreq depend on a distinct and mutually
exclusive set of ACPI methods which are not directly tied to specific
processor model numbers. Both of these drivers have init routines
which look for their required ACPI methods. As a result, only the
appropriate driver registers as the cpu frequency driver and the other
one ends up being unloaded.
Tested on various systems where acpi-cpufreq, intel_pstate, and
pcc-cpufreq are the expected cpu frequency drivers.
Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
Signed-off-by: Joseph Szczypek <joseph.szczypek@hp.com>
Reported-by: Trinh Dao <trinh.dao@hp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit cb57720bf79688d64854a0a43565aa52303c1f3f upstream.
If ACPI _PPC changed notification happens before governor was initiated
while kernel is booting, a NULL pointer dereference will be triggered:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffff81470453>] __cpufreq_governor+0x23/0x1e0
PGD 0
Oops: 0000 [#1] SMP
... ...
RIP: 0010:[<ffffffff81470453>] [<ffffffff81470453>]
__cpufreq_governor+0x23/0x1e0
RSP: 0018:ffff881fcfbcfbb8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff881fd11b3980 RCX: ffff88407fc20000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff881fd11b3980
RBP: ffff881fcfbcfbd8 R08: 0000000000000000 R09: 000000000000000f
R10: ffffffff818068d0 R11: 0000000000000043 R12: 0000000000000004
R13: 0000000000000000 R14: ffffffff8196cae0 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff881fffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 00000000018ae000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:3 (pid: 750, threadinfo ffff881fcfbce000, task
ffff881fcf556400)
Stack:
ffff881fffc17d00 ffff881fcfbcfc18 ffff881fd11b3980 0000000000000000
ffff881fcfbcfc08 ffffffff81470d08 ffff881fd11b3980 0000000000000007
ffff881fcfbcfc18 ffff881fffc17d00 ffff881fcfbcfd28 ffffffff81472e9a
Call Trace:
[<ffffffff81470d08>] __cpufreq_set_policy+0x1b8/0x2e0
[<ffffffff81472e9a>] cpufreq_update_policy+0xca/0x150
[<ffffffff81472f20>] ? cpufreq_update_policy+0x150/0x150
[<ffffffff81324a96>] acpi_processor_ppc_has_changed+0x71/0x7b
[<ffffffff81320bcd>] acpi_processor_notify+0x55/0x115
[<ffffffff812f9c29>] acpi_device_notify+0x19/0x1b
[<ffffffff813084ca>] acpi_ev_notify_dispatch+0x41/0x5f
[<ffffffff812f64a4>] acpi_os_execute_deferred+0x27/0x34
The root cause is a race conditon -- cpufreq core and acpi-cpufreq driver
were initiated, but cpufreq_governor wasn't and _PPC changed notification
happened, __cpufreq_governor() was called within acpi_os_execute_deferred
kernel thread context.
To fix this panic issue, add pointer checking code in __cpufreq_governor()
before pointer policy->governor is to be dereferenced.
Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 61882b63171736571e1139ab5aa929e3bb336016 upstream.
The two functions s3c2416_cpufreq_driver_init and s3c_cpufreq_register
are marked init but are called from a context that might be run after
the __init sections are discarded, as the compiler points out:
WARNING: vmlinux.o(.data+0x1ad9dc): Section mismatch in reference from the variable s3c2416_cpufreq_driver to the function .init.text:s3c2416_cpufreq_driver_init()
WARNING: drivers/built-in.o(.text+0x35b5dc): Section mismatch in reference from the function s3c2410a_cpufreq_add() to the function .init.text:s3c_cpufreq_register()
This removes the __init markings.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit d4d4eda23794c701442e55129dd4f8f2fefd5e4d upstream.
On Dell Latitude C600 laptop with Pentium 3 850MHz processor, the
speedstep-smi driver sometimes loads and sometimes doesn't load with
"change to state X failed" message.
The hardware sometimes refuses to change frequency and in this case, we
need to retry later. I found out that we need to enable interrupts while
waiting. When we enable interrupts, the hardware blockage that prevents
frequency transition resolves and the transition is possible. With
disabled interrupts, the blockage doesn't resolve (no matter how long do
we wait). The exact reasons for this hardware behavior are unknown.
This patch enables interrupts in the function speedstep_set_state that can
be called with disabled interrupts. However, this function is called with
disabled interrupts only from speedstep_get_freqs, so it shouldn't cause
any problem.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 36b4bed5cd8f6e17019fa7d380e0836872c7b367 upstream.
Code which changes policy to powersave changes also max_policy_pct based on
max_freq. Code which change max_perf_pct has upper limit base on value
max_policy_pct. When policy is changing from powersave back to performance
then max_policy_pct is not changed. Which means that changing max_perf_pct is
not possible to high values if max_freq was too low in powersave policy.
Test case:
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
800000
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3300000
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100
$ echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
$ echo 20 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
800000
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
20
$ echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo 3300000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
$ echo 100 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3300000
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
24
And now intel_pstate driver allows to set maximal value for max_perf_pct based
on max_policy_pct which is 24 for previous powersave max_freq 800000.
This patch will set default value for max_policy_pct when setting policy to
performance so it will allow to set also max value for max_perf_pct.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c034b02e213d271b98c45c4a7b54af8f69aaac1e upstream.
Currently the core does not expose scaling_cur_freq for set_policy()
drivers this breaks some userspace monitoring tools.
Change the core to expose this file for all drivers and if the
set_policy() driver supports the get() callback use it to retrieve the
current frequency.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=73741
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 179e8471673ce0249cd4ecda796008f7757e5bad upstream.
Ensure that cpu->cpu is set before writing MSR_IA32_PERF_CTL during CPU
initialization. Otherwise only cpu0 has its P-state set and all other
cores are left with their values unchanged.
In most cases, this is not too serious because the P-states will be set
correctly when the timer function is run. But when the default governor
is set to performance, the per-CPU current_pstate stays the same forever
and no attempts are made to write the MSRs again.
Signed-off-by: Vincent Minet <vincent@vincent-minet.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 5a90af67c2126fe1d04ebccc1f8177e6ca70d3a9 upstream.
Since commtit 8a7b1227e303 (cpufreq: davinci: move cpufreq driver to
drivers/cpufreq) this added dependancy only for CONFIG_ARCH_DAVINCI_DA850
where as davinci_cpufreq_init() call is used by all davinci platform.
This patch fixes following build error:
arch/arm/mach-davinci/built-in.o: In function `davinci_init_late':
:(.init.text+0x928): undefined reference to `davinci_cpufreq_init'
make: *** [vmlinux] Error 1
Fixes: 8a7b1227e303 (cpufreq: davinci: move cpufreq driver to drivers/cpufreq)
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c5450db85b828d0c46ac8fc570fb8a51bf07ac40 upstream.
While accessing cur_policy during executing events
CPUFREQ_GOV_START, CPUFREQ_GOV_STOP, CPUFREQ_GOV_LIMITS,
same mutex lock is not taken, dbs_data->mutex, which leads
to race and data corruption while running continious suspend
resume test. This is seen with ondemand governor with suspend
resume test using rtcwake.
Unable to handle kernel NULL pointer dereference at virtual address 00000028
pgd = ed610000
[00000028] *pgd=adf11831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
Modules linked in: nvhost_vi
CPU: 1 PID: 3243 Comm: rtcwake Not tainted 3.10.24-gf5cf9e5 #1
task: ee708040 ti: ed61c000 task.ti: ed61c000
PC is at cpufreq_governor_dbs+0x400/0x634
LR is at cpufreq_governor_dbs+0x3f8/0x634
pc : [<c05652b8>] lr : [<c05652b0>] psr: 600f0013
sp : ed61dcb0 ip : 000493e0 fp : c1cc14f0
r10: 00000000 r9 : 00000000 r8 : 00000000
r7 : eb725280 r6 : c1cc1560 r5 : eb575200 r4 : ebad7740
r3 : ee708040 r2 : ed61dca8 r1 : 001ebd24 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c5387d Table: ad61006a DAC: 00000015
[<c05652b8>] (cpufreq_governor_dbs+0x400/0x634) from [<c055f700>] (__cpufreq_governor+0x98/0x1b4)
[<c055f700>] (__cpufreq_governor+0x98/0x1b4) from [<c0560770>] (__cpufreq_set_policy+0x250/0x320)
[<c0560770>] (__cpufreq_set_policy+0x250/0x320) from [<c0561dcc>] (cpufreq_update_policy+0xcc/0x168)
[<c0561dcc>] (cpufreq_update_policy+0xcc/0x168) from [<c0561ed0>] (cpu_freq_notify+0x68/0xdc)
[<c0561ed0>] (cpu_freq_notify+0x68/0xdc) from [<c008eff8>] (notifier_call_chain+0x4c/0x8c)
[<c008eff8>] (notifier_call_chain+0x4c/0x8c) from [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68)
[<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68) from [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28)
[<c008f40c>] (blocking_notifier_call_chain+0x20/0x28) from [<c00aac6c>] (pm_qos_update_bounded_target+0xd8/0x310)
[<c00aac6c>] (pm_qos_update_bounded_target+0xd8/0x310) from [<c00ab3b0>] (__pm_qos_update_request+0x64/0x70)
[<c00ab3b0>] (__pm_qos_update_request+0x64/0x70) from [<c004b4b8>] (tegra_pm_notify+0x114/0x134)
[<c004b4b8>] (tegra_pm_notify+0x114/0x134) from [<c008eff8>] (notifier_call_chain+0x4c/0x8c)
[<c008eff8>] (notifier_call_chain+0x4c/0x8c) from [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68)
[<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68) from [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28)
[<c008f40c>] (blocking_notifier_call_chain+0x20/0x28) from [<c00ac228>] (pm_notifier_call_chain+0x1c/0x34)
[<c00ac228>] (pm_notifier_call_chain+0x1c/0x34) from [<c00ad38c>] (enter_state+0xec/0x128)
[<c00ad38c>] (enter_state+0xec/0x128) from [<c00ad400>] (pm_suspend+0x38/0xa4)
[<c00ad400>] (pm_suspend+0x38/0xa4) from [<c00ac114>] (state_store+0x70/0xc0)
[<c00ac114>] (state_store+0x70/0xc0) from [<c027b1e8>] (kobj_attr_store+0x14/0x20)
[<c027b1e8>] (kobj_attr_store+0x14/0x20) from [<c019cd9c>] (sysfs_write_file+0x104/0x184)
[<c019cd9c>] (sysfs_write_file+0x104/0x184) from [<c0143038>] (vfs_write+0xd0/0x19c)
[<c0143038>] (vfs_write+0xd0/0x19c) from [<c0143414>] (SyS_write+0x4c/0x78)
[<c0143414>] (SyS_write+0x4c/0x78) from [<c000f080>] (ret_fast_syscall+0x0/0x30)
Code: e1a00006 eb084346 e59b0020 e5951024 (e5903028)
---[ end trace 0488523c8f6b0f9d ]---
Signed-off-by: Bibek Basu <bbasu@nvidia.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 22c73795b101597051924556dce019385a1e2fa0 upstream.
This patch reorders reported frequencies from the highest to the lowest,
just like in other frequency drivers.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit d82b922a4acc1781d368aceac2f9da43b038cab2 upstream.
The powernow-k6 driver used to read the initial multiplier from the
powernow register. However, there is a problem with this:
* If there was a frequency transition before, the multiplier read from the
register corresponds to the current multiplier.
* If there was no frequency transition since reset, the field in the
register always reads as zero, regardless of the current multiplier that
is set using switches on the mainboard and that the CPU is running at.
The zero value corresponds to multiplier 4.5, so as a consequence, the
powernow-k6 driver always assumes multiplier 4.5.
For example, if we have 550MHz CPU with bus frequency 100MHz and
multiplier 5.5, the powernow-k6 driver thinks that the multiplier is 4.5
and bus frequency is 122MHz. The powernow-k6 driver then sets the
multiplier to 4.5, underclocking the CPU to 450MHz, but reports the
current frequency as 550MHz.
There is no reliable way how to read the initial multiplier. I modified
the driver so that it contains a table of known frequencies (based on
parameters of existing CPUs and some common overclocking schemes) and sets
the multiplier according to the frequency. If the frequency is unknown
(because of unusual overclocking or underclocking), the user must supply
the bus speed and maximum multiplier as module parameters.
This patch should be backported to all stable kernels. If it doesn't
apply cleanly, change it, or ask me to change it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit e20e1d0ac02308e2211306fc67abcd0b2668fb8b upstream.
I found out that a system with k6-3+ processor is unstable during network
server load. The system locks up or the network card stops receiving. The
reason for the instability is the CPU frequency scaling.
During frequency transition the processor is in "EPM Stop Grant" state.
The documentation says that the processor doesn't respond to inquiry
requests in this state. Consequently, coherency of processor caches and
bus master devices is not maintained, causing the system instability.
This patch flushes the cache during frequency transition. It fixes the
instability.
Other minor changes:
* u64 invalue changed to unsigned long because the variable is 32-bit
* move the logic to set the multiplier to a separate function
powernow_k6_set_cpu_multiplier
* preserve lower 5 bits of the powernow port instead of 4 (the voltage
field has 5 bits)
* mask interrupts when reading the multiplier, so that the port is not
open during other activity (running other kernel code with the port open
shouldn't cause any misbehavior, but we should better be safe and keep
the port closed)
This patch should be backported to all stable kernels. If it doesn't
apply cleanly, change it, or ask me to change it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c3274763bfc3bf1ececa269ed6e6c4d7ec1c3e5e upstream.
The powernow-k8 driver maintains a per-cpu data-structure called
powernow_data that is used to perform the frequency transitions.
It initializes this data structure only for the policy->cpu. So,
accesses to this data structure by other CPUs results in various
problems because they would have been uninitialized.
Specifically, if a cpu (!= policy->cpu) invokes the drivers' ->get()
function, it returns 0 as the KHz value, since its per-cpu memory
doesn't point to anything valid. This causes problems during
suspend/resume since cpufreq_update_policy() tries to enforce this
(0 KHz) as the current frequency of the CPU, and this madness gets
propagated to adjust_jiffies() as well. Eventually, lots of things
start breaking down, including the r8169 ethernet card, in one
particularly interesting case reported by Pierre Ossman.
Fix this by initializing the per-cpu data-structures of all the CPUs
in the policy appropriately.
References: https://bugzilla.kernel.org/show_bug.cgi?id=70311
Reported-by: Pierre Ossman <pierre@ossman.eu>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 6cbd7ee10e2842a3d1f9b60abede1c8f3d1f1130 upstream.
KVM environments do not support APERF/MPERF MSRs. intel_pstate cannot
operate without these registers.
The previous validity checks in intel_pstate_msrs_not_valid() are
insufficent in nested KVMs.
References: https://bugzilla.redhat.com/show_bug.cgi?id=1046317
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 98a947abdd54e5de909bebadfced1696ccad30cf upstream.
If pstate.current_pstate is 0 after the initial
intel_pstate_get_cpu_pstates(), this means that we were unable to
obtain any useful P-state information and there is no reason to
continue, so free memory and return an error in that case.
This fixes the following divide error occuring in a nested KVM
guest:
Intel P-state driver initializing.
Intel pstate controlling: cpu 0
cpufreq: __cpufreq_add_dev: ->get() failed
divide error: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-0.rc4.git5.1.fc21.x86_64 #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff88001ea20000 ti: ffff88001e9bc000 task.ti: ffff88001e9bc000
RIP: 0010:[<ffffffff815c551d>] [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
RSP: 0000:ffff88001ee03e18 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88001a454348 RCX: 0000000000006100
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88001ee03e38 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88001ea20000 R11: 0000000000000000 R12: 00000c0a1ea20000
R13: 1ea200001ea20000 R14: ffffffff815c5400 R15: ffff88001a454348
FS: 0000000000000000(0000) GS:ffff88001ee00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 00000000000006f0
Stack:
fffffffb1a454390 ffffffff821a4500 ffff88001a454390 0000000000000100
ffff88001ee03ea8 ffffffff81083e9a ffffffff81083e15 ffffffff82d5ed40
ffffffff8258cc60 0000000000000000 ffffffff81ac39de 0000000000000000
Call Trace:
<IRQ>
[<ffffffff81083e9a>] call_timer_fn+0x8a/0x310
[<ffffffff81083e15>] ? call_timer_fn+0x5/0x310
[<ffffffff815c5400>] ? pid_param_set+0x130/0x130
[<ffffffff81084354>] run_timer_softirq+0x234/0x380
[<ffffffff8107aee4>] __do_softirq+0x104/0x430
[<ffffffff8107b5fd>] irq_exit+0xcd/0xe0
[<ffffffff81770645>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff8176efb2>] apic_timer_interrupt+0x72/0x80
<EOI>
[<ffffffff810e15cd>] ? vprintk_emit+0x1dd/0x5e0
[<ffffffff81757719>] printk+0x67/0x69
[<ffffffff815c1493>] __cpufreq_add_dev.isra.13+0x883/0x8d0
[<ffffffff815c14f0>] cpufreq_add_dev+0x10/0x20
[<ffffffff814a14d1>] subsys_interface_register+0xb1/0xf0
[<ffffffff815bf5cf>] cpufreq_register_driver+0x9f/0x210
[<ffffffff81fb19af>] intel_pstate_init+0x27d/0x3be
[<ffffffff81761e3e>] ? mutex_unlock+0xe/0x10
[<ffffffff81fb1732>] ? cpufreq_gov_dbs_init+0x12/0x12
[<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
[<ffffffff8109dbf5>] ? parse_args+0x225/0x3f0
[<ffffffff81f64193>] kernel_init_freeable+0x1fc/0x287
[<ffffffff81f638d0>] ? do_early_param+0x88/0x88
[<ffffffff8174b530>] ? rest_init+0x150/0x150
[<ffffffff8174b53e>] kernel_init+0xe/0x130
[<ffffffff8176e27c>] ret_from_fork+0x7c/0xb0
[<ffffffff8174b530>] ? rest_init+0x150/0x150
Code: c1 e0 05 48 63 bc 03 10 01 00 00 48 63 83 d0 00 00 00 48 63 d6 48 c1 e2 08 c1 e1 08 4c 63 c2 48 c1 e0 08 48 98 48 c1 e0 08 48 99 <49> f7 f8 48 98 48 0f af f8 48 c1 ff 08 29 f9 89 ca c1 fa 1f 89
RIP [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
RSP <ffff88001ee03e18>
---[ end trace f166110ed22cc37a ]---
Kernel panic - not syncing: Fatal exception in interrupt
Reported-and-tested-by: Kashyap Chamarthy <kchamart@redhat.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fbbc5bfb44a22e7a8ef753a1c8dfb448d7ac8b85 upstream.
Calxeda's new ECX-2000 part uses the same cpufreq interface as highbank,
so add it to the driver's compatibility list.
This is a minor change that can safely be applied to the 3.10 and 3.11
stable trees.
Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Make acpi_cpufreq_init() return error codes when the driver cannot be
registered so that the module doesn't stay useless in memory and so
that acpi_cpufreq_exit() doesn't attempt to unregister things that
have never been registered when the module is unloaded.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
The minimum pstate is supposed to be a percentage of the maximum P
state available. Calculate min using max pstate and not the
current max which may have been limited by the user
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This patch addresses Bug 60727
(https://bugzilla.kernel.org/show_bug.cgi?id=60727)
which was due to the truncation of intermediate values in the
calculations, which causes the code to consistently underestimate the
current cpu frequency, specifically 100% cpu utilization was truncated
down to the setpoint of 97%. This patch fixes the problem by keeping
the results of all intermediate calculations as fixed point numbers
rather scaling them back and forth between integers and fixed point.
References: https://bugzilla.kernel.org/show_bug.cgi?id=60727
Signed-off-by: Brennan Shacklett <bpshacklett@gmail.com>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The index field of cpufreq_frequency_table has been renamed to
driver_data by commit 5070158 (cpufreq: rename index as driver_data
in cpufreq_frequency_table).
This patch updates the s3c64xx driver to match.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The expression in line 398 of intel_pstate.c causes the following
warning to be emitted:
drivers/cpufreq/intel_pstate.c:398:3: warning: left shift count >= width of type
which happens because unsigned long is 32-bit on some architectures.
Fix that by using a helper u64 variable and simplify the code
slightly.
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
If the system is suspended while max_perf_pct is less than 100 percent
or no_turbo set policy->{min,max} will be set incorrectly with scaled
values which turn the scaled values into hard limits.
References: https://bugzilla.kernel.org/show_bug.cgi?id=61241
Reported-by: Patrick Bartels <petzicus@googlemail.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: 3.9+ <stable@vger.kernel.org> # 3.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
When sysfs for no_turbo is set, then also some p states in turbo regions
are observed. This patch will set IDA Engage bit when no_turbo is set to
explicitly disengage turbo.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Since the patch "cpufreq: cpufreq-cpu0: NULL is a valid regulator", cpu_reg
contains an error value if the regulator is not set, instead of NULL.
Accordingly, fix the remaining check for non-NULL cpu_reg.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
'clk_round_rate' returns a negative error code upon failure. This
will never get detected by unsigned 'newfreq'. Make it signed.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
If 'dvfs_info' is NULL (due to devm_kzalloc failure) the failure
error message would try to dereference it. Use 'pdev' instead.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
cpufreq_get()
cpufreq_get() can be called from external drivers which might not be aware if
cpufreq driver is registered or not. And so we should actually check if cpufreq
driver is registered or not and also if cpufreq is active or disabled, at the
beginning of cpufreq_get().
Otherwise call to lock_policy_rwsem_read() might hit BUG_ON(!policy).
Reported-and-tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
If the hw supports intel_pstate and acpi_cpufreq, intel_pstate will
get loaded first.
acpi_cpufreq_init() will call acpi_cpufreq_early_init()
and that will allocate perf data and init those perf data in ACPI core,
(that will cover all CPUs). But later it will free them as
cpufreq_register_driver(acpi_cpufreq) will fail as intel_pstate is
already registered
Use cpufreq_get_current_driver() to check if we can skip the
acpi_cpufreq loading.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
On systems that support intel_pstate, acpi_cpufreq fails to load, and
udev keeps trying until trace gets filled up and kernel crashes.
The root cause is driver return ret from cpufreq_register_driver(),
because when some other driver takes over before, it will return
EBUSY and then udev will keep trying ...
cpufreq_register_driver() should return EEXIST instead so that the
system can boot without appending intel_pstate=disable and still use
intel_pstate.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Commit cdc58d602d2e657602a90c190cbf745886c95977 "cpufreq: imx6q-cpufreq:
remove device tree parsing for cpu nodes" assumed the pdev->dev is set to
cpu0 device in the platform code. But it actually points to the virtual
cpufreq-cpu0 platform device which is not present in the device tree.
Most of the information needed by cpufreq is stored in cpu0 DT node.
So cpu_dev must point to cpu0 device.
This patch fixes the wrong assignment to cpu_dev.
Reported-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Commit f837a9b5ab05c52a07108c6f09ca66f2e0aee757 "cpufreq: cpufreq-cpu0:
remove device tree parsing for cpu nodes" assumed the pdev->dev is set to
cpu0 device in the platform code. But it actually points to the virtual
cpufreq-cpu0 platform device which is not present in the device tree.
Most of the information needed by cpufreq is stored in cpu0 DT node.
So cpu_dev must point to cpu0 device.
This patch fixes the wrong assignment to cpu_dev.
Reported-and-tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Current code looks like this:
WARN_ON(lock_policy_rwsem_write(cpu));
update_policy_cpu(policy, new_cpu);
unlock_policy_rwsem_write(cpu);
{lock|unlock}_policy_rwsem_write(cpu) takes/releases policy->cpu's rwsem.
Because cpu is changing with the call to update_policy_cpu(), the
unlock_policy_rwsem_write() will release the incorrect lock.
The right solution would be to release the same lock as was taken earlier. Also
update_policy_cpu() was also called from cpufreq_add_dev() without any locks and
so its better if we move this locking to inside update_policy_cpu().
This patch fixes a regression introduced in 3.12 by commit f9ba680d23
(cpufreq: Extract the handover of policy cpu to a helper function).
Reported-and-tested-by: Jon Medhurst<tixy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This broke after a recent change "cedb70a cpufreq: Split __cpufreq_remove_dev()
into two parts" from Srivatsa.
Consider a scenario where we have two CPUs in a policy (0 & 1) and we are
removing CPU 1. On the call to __cpufreq_remove_dev_prepare() we have cleared 1
from policy->cpus and now on a call to __cpufreq_remove_dev_finish() we read
cpumask_weight of policy->cpus, which will come as 1 and this code will behave
as if we are removing the last CPU from policy :)
Fix it by clearing the CPU mask in __cpufreq_remove_dev_finish() instead of
__cpufreq_remove_dev_prepare().
Tested-by: Stephen Warren <swarren@wwwdotorg.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
* pm-cpufreq:
cpufreq: Acquire the lock in cpufreq_policy_restore() for reading
cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu
cpufreq: Restructure if/else block to avoid unintended behavior
cpufreq: Fix crash in cpufreq-stats during suspend/resume
|
|
In cpufreq_policy_restore() before system suspend policy is read from
percpu's cpufreq_cpu_data_fallback. It's a read operation rather
than a write one, so take the lock for reading in there.
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
If update_policy_cpu() is invoked with the existing policy->cpu itself
as the new-cpu parameter, then a lot of things can go terribly wrong.
In its present form, update_policy_cpu() always assumes that the new-cpu
is different from policy->cpu and invokes other functions to perform their
respective updates. And those functions implement the actual update like
this:
per_cpu(..., new_cpu) = per_cpu(..., last_cpu);
per_cpu(..., last_cpu) = NULL;
Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu
references vanish into thin air! (memory leak). From there, it leads to more
problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference
and hence tries to create a new sysfs-group; but sysfs already had created
the group earlier, so it complains that it cannot create a duplicate filename.
In short, the repercussions of a rather innocuous invocation of
update_policy_cpu() can turn out to be pretty nasty.
Ideally update_policy_cpu() should handle this situation (new == last)
gracefully, and not lead to such severe problems. So fix it by adding an
appropriate check.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
In __cpufreq_remove_dev_prepare(), the code which decides whether to remove
the sysfs link or nominate a new policy cpu, is governed by an if/else block
with a rather complex set of conditionals. Worse, they harbor a subtlety
which leads to certain unintended behavior.
The code looks like this:
if (cpu != policy->cpu && !frozen) {
sysfs_remove_link(&dev->kobj, "cpufreq");
} else if (cpus > 1) {
new_cpu = cpufreq_nominate_new_policy_cpu(...);
...
update_policy_cpu(..., new_cpu);
}
The original intention was:
If the CPU going offline is not policy->cpu, just remove the link.
On the other hand, if the CPU going offline is the policy->cpu itself,
handover the policy->cpu job to some other surviving CPU in that policy.
But because the 'if' condition also includes the 'frozen' check, now there
are *two* possibilities by which we can enter the 'else' block:
1. cpu == policy->cpu (intended)
2. cpu != policy->cpu && frozen (unintended)
Due to the second (unintended) scenario, we end up spuriously nominating
a CPU as the policy->cpu, even when the existing policy->cpu is alive and
well. This can cause problems further down the line, especially when we end
up nominating the same policy->cpu as the new one (ie., old == new),
because it totally confuses update_policy_cpu().
To avoid this mess, restructure the if/else block to only do what was
originally intended, and thus prevent any unwelcome surprises.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Stephen Warren reported that the cpufreq-stats code hits a NULL pointer
dereference during the second attempt to suspend a system. He also
pin-pointed the problem to commit 5302c3f "cpufreq: Perform light-weight
init/teardown during suspend/resume".
That commit actually ensured that the cpufreq-stats table and the
cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during
suspend/resume, which makes it all the more surprising. However, it turns
out that the root-cause is not that we access an already freed memory, but
that the reference to the allocated memory gets moved around and we lose
track of that during resume, leading to the reported crash in a subsequent
suspend attempt.
In the suspend path, during CPU offline, the value of policy->cpu is updated
by choosing one of the surviving CPUs in that policy, as long as there is
atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is
invoked to update the reference to the stats structure by assigning it to
the new CPU. However, in the resume path, during CPU online, we end up
assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats
know about this. Thus the reference to the stats structure remains
(incorrectly) associated with the old CPU. So, in a subsequent suspend attempt,
during CPU offline, we end up accessing an incorrect location to get the
stats structure, which eventually leads to the NULL pointer dereference.
Fix this by letting cpufreq-stats know about the update of the policy->cpu
during CPU online in the resume path. (Also, move the update_policy_cpu()
function higher up in the file, so that __cpufreq_add_dev() can invoke
it).
Reported-and-tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
* pm-cpufreq:
intel_pstate: Add Haswell CPU models
Revert "cpufreq: make sure frequency transitions are serialized"
cpufreq: Use signed type for 'ret' variable, to store negative error values
cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug
cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock
cpufreq: Split __cpufreq_remove_dev() into two parts
cpufreq: Fix wrong time unit conversion
cpufreq: serialize calls to __cpufreq_governor()
cpufreq: don't allow governor limits to be changed when it is disabled
|
|
Enable the intel_pstate driver for Haswell CPUs. One missing Ivy Bridge
model (0x3E) is also included. Models referenced from
tools/power/x86/turbostat/turbostat.c:has_nehalem_turbo_ratio_limit
Signed-off-by: Nell Hardcastle <nell@spicious.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Commit 7c30ed5 (cpufreq: make sure frequency transitions are
serialized) attempted to serialize frequency transitions by
adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE
notifications. However, it assumed that the notifications will
always originate from the driver's .target() callback, but they
also can be triggered by cpufreq_out_of_sync() and that leads to
warnings like this on some systems:
WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317
__cpufreq_notify_transition+0x238/0x260()
In middle of another frequency transition
accompanied by a call trace similar to this one:
[<ffffffff81720daa>] dump_stack+0x46/0x58
[<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320
[<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50
[<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260
[<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70
[<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0
[<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160
[<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160
[<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5
[<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf
[<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0
[<ffffffff8159046a>] step_wise_throttle+0x5a/0x90
[<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140
[<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0
[<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30
[<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc
[<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b
[<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c
[<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32
[<ffffffff81081060>] process_one_work+0x170/0x4a0
[<ffffffff81082121>] worker_thread+0x121/0x390
[<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170
[<ffffffff81088fe0>] kthread+0xc0/0xd0
[<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
[<ffffffff8173582c>] ret_from_fork+0x7c/0xb0
[<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
For this reason, revert commit 7c30ed5 along with the fix 266c13d
(cpufreq: Fix serialization of frequency transitions) on top of it
and we will revisit the serialization problem later.
Reported-by: Alessandro Bono <alessandro.bono@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There are places where the variable 'ret' is declared as unsigned int
and then used to store negative return values such as -EINVAL. Fix them
by declaring the variable as a signed quantity.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary
and partial solution to the race condition between writing to a cpufreq sysfs
file and taking a CPU offline. Now that we have a proper and complete solution
to that problem, remove the temporary fix.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The functions that are used to write to cpufreq sysfs files (such as
store_scaling_max_freq()) are not hotplug safe. They can race with CPU
hotplug tasks and lead to problems such as trying to acquire an already
destroyed timer-mutex etc.
Eg:
__cpufreq_remove_dev()
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
policy->governor->governor(policy, CPUFREQ_GOV_STOP);
cpufreq_governor_dbs()
case CPUFREQ_GOV_STOP:
mutex_destroy(&cpu_cdbs->timer_mutex)
cpu_cdbs->cur_policy = NULL;
<PREEMPT>
store()
__cpufreq_set_policy()
__cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
case CPUFREQ_GOV_LIMITS:
mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
So use get_online_cpus()/put_online_cpus() in the store_*() functions, to
synchronize with CPU hotplug. However, there is an additional point to note
here: some parts of the CPU teardown in the cpufreq subsystem are done in
the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the
get/put_online_cpus() functions alone is insufficient; we should also ensure
that we don't race with those latter steps in the hotplug sequence. We can
easily achieve this by checking if the CPU is online before proceeding with
the store, since the CPU would have been marked offline by the time the
CPU_POST_DEAD notifiers are executed.
Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
__cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going
offline. But because we destroy the kobject towards the end of the CPU offline
phase, there are certain race windows where a task can try to write to a
cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking
that CPU offline, and this can bump up the kobject refcount, which in turn might
hinder the CPU offline task from running to completion. (It can also cause
other more serious problems such as trying to acquire a destroyed timer-mutex
etc., depending on the exact stage of the cleanup at which the task managed to
take a new refcount).
To fix the race window, we will need to synchronize those store_*() call-sites
with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that
in turn can cause a total deadlock because it can end up waiting for the
CPU offline task to complete, with incremented refcount!
Write to sysfs CPU offline task
-------------- ----------------
kobj_refcnt++
Acquire cpu_hotplug.lock
get_online_cpus();
Wait for kobj_refcnt to drop to zero
**DEADLOCK**
A simple way to avoid this problem is to perform the kobject cleanup in the
CPU offline path, with the cpu_hotplug.lock *released*. That is, we can
perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup
in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock
released. Doing this helps us avoid deadlocks due to holding kobject refcounts
and waiting on each other on the cpu_hotplug.lock.
(Note: We can't move all of the cpufreq CPU offline steps to the
CPU_POST_DEAD stage, because certain things such as stopping the governors
have to be done before the outgoing CPU is marked offline. So retain those
parts in the CPU_DOWN_PREPARE stage itself).
Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
During CPU offline, the cpufreq core invokes __cpufreq_remove_dev()
to perform work such as stopping the cpufreq governor, clearing the
CPU from the policy structure etc, and finally cleaning up the
kobject.
There are certain subtle issues related to the kobject cleanup, and
it would be much easier to deal with them if we separate that part
from the rest of the cleanup-work in the CPU offline phase. So split
the __cpufreq_remove_dev() function into 2 parts: one that handles
the kobject cleanup, and the other that handles the rest of the work.
Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The time spent by a CPU under a given frequency is stored in jiffies unit
in the cpu var cpufreq_stats_table->time_in_state[i], i being the index of
the frequency.
This is what is displayed in the following file on the right column:
cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
2301000 19835820
2300000 3172
[...]
Now cpufreq converts this jiffies unit delta to clock_t before returning it
to the user as in the above file. And that conversion is achieved using the API
cputime64_to_clock_t().
Although it accidentally works on traditional tick based cputime accounting, where
cputime_t maps directly to jiffies, it doesn't work with other types of cputime
accounting such as CONFIG_VIRT_CPU_ACCOUNTING_* where cputime_t can map to nsecs
or any granularity preffered by the architecture.
For example we get a buggy zero delta on full dyntick configurations:
cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
2301000 0
2300000 0
[...]
Fix this with using the proper jiffies_64_t to clock_t conversion.
Reported-and-tested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
We can't take a big lock around __cpufreq_governor() as this causes
recursive locking for some cases. But calls to this routine must be
serialized for every policy. Otherwise we can see some unpredictable
events.
For example, consider following scenario:
__cpufreq_remove_dev()
__cpufreq_governor(policy, CPUFREQ_GOV_STOP);
policy->governor->governor(policy, CPUFREQ_GOV_STOP);
cpufreq_governor_dbs()
case CPUFREQ_GOV_STOP:
mutex_destroy(&cpu_cdbs->timer_mutex)
cpu_cdbs->cur_policy = NULL;
<PREEMPT>
store()
__cpufreq_set_policy()
__cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
case CPUFREQ_GOV_LIMITS:
mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL
And so store() will eventually result in a crash if cur_policy is
NULL at this point.
Introduce an additional variable which would guarantee serialization
here.
Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
__cpufreq_governor() returns with -EBUSY when governor is already
stopped and we try to stop it again, but when it is stopped we must
not allow calls to CPUFREQ_GOV_LIMITS event as well.
This patch adds this check in __cpufreq_governor().
Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
1) ACPI-based PCI hotplug (ACPIPHP) subsystem rework and introduction
of Intel Thunderbolt support on systems that use ACPI for signalling
Thunderbolt hotplug events. This also should make ACPIPHP work in
some cases in which it was known to have problems. From
Rafael J Wysocki, Mika Westerberg and Kirill A Shutemov.
2) ACPI core code cleanups and dock station support cleanups from
Jiang Liu and Rafael J Wysocki.
3) Fixes for locking problems related to ACPI device hotplug from
Rafael J Wysocki.
4) ACPICA update to version 20130725 includig fixes, cleanups, support
for more than 256 GPEs per GPE block and a change to make the ACPI
PM Timer optional (we've seen systems without the PM Timer in the
field already). One of the fixes, related to the DeRefOf operator,
is necessary to prevent some Windows 8 oriented AML from causing
problems to happen. From Bob Moore, Lv Zheng, and Jung-uk Kim.
5) Removal of the old and long deprecated /proc/acpi/event interface
and related driver changes from Thomas Renninger.
6) ACPI and Xen changes to make the reduced hardware sleep work with
the latter from Ben Guthro.
7) ACPI video driver cleanups and a blacklist of systems that should
not tell the BIOS that they are compatible with Windows 8 (or ACPI
backlight and possibly other things will not work on them). From
Felipe Contreras.
8) Assorted ACPI fixes and cleanups from Aaron Lu, Hanjun Guo,
Kuppuswamy Sathyanarayanan, Lan Tianyu, Sachin Kamat, Tang Chen,
Toshi Kani, and Wei Yongjun.
9) cpufreq ondemand governor target frequency selection change to
reduce oscillations between min and max frequencies (essentially,
it causes the governor to choose target frequencies proportional
to load) from Stratos Karafotis.
10) cpufreq fixes allowing sysfs attributes file permissions to be
preserved over suspend/resume cycles Srivatsa S Bhat.
11) Removal of Device Tree parsing for CPU device nodes from multiple
cpufreq drivers that required some changes related to
of_get_cpu_node() to be made in a few architectures and in the
driver core. From Sudeep KarkadaNagesha.
12) cpufreq core fixes and cleanups related to mutual exclusion and
driver module references from Viresh Kumar, Lukasz Majewski and
Rafael J Wysocki.
13) Assorted cpufreq fixes and cleanups from Amit Daniel Kachhap,
Bartlomiej Zolnierkiewicz, Hanjun Guo, Jingoo Han, Joseph Lo,
Julia Lawall, Li Zhong, Mark Brown, Sascha Hauer, Stephen Boyd,
Stratos Karafotis, and Viresh Kumar.
14) Fixes to prevent race conditions in coupled cpuidle from happening
from Colin Cross.
15) cpuidle core fixes and cleanups from Daniel Lezcano and
Tuukka Tikkanen.
16) Assorted cpuidle fixes and cleanups from Daniel Lezcano,
Geert Uytterhoeven, Jingoo Han, Julia Lawall, Linus Walleij,
and Sahara.
17) System sleep tracing changes from Todd E Brandt and Shuah Khan.
18) PNP subsystem conversion to using struct dev_pm_ops for power
management from Shuah Khan.
* tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (217 commits)
cpufreq: Don't use smp_processor_id() in preemptible context
cpuidle: coupled: fix race condition between pokes and safe state
cpuidle: coupled: abort idle if pokes are pending
cpuidle: coupled: disable interrupts after entering safe state
ACPI / hotplug: Remove containers synchronously
driver core / ACPI: Avoid device hot remove locking issues
cpufreq: governor: Fix typos in comments
cpufreq: governors: Remove duplicate check of target freq in supported range
cpufreq: Fix timer/workqueue corruption due to double queueing
ACPI / EC: Add ASUSTEK L4R to quirk list in order to validate ECDT
ACPI / thermal: Add check of "_TZD" availability and evaluating result
cpufreq: imx6q: Fix clock enable balance
ACPI: blacklist win8 OSI for buggy laptops
cpufreq: tegra: fix the wrong clock name
cpuidle: Change struct menu_device field types
cpuidle: Add a comment warning about possible overflow
cpuidle: Fix variable domains in get_typical_interval()
cpuidle: Fix menu_device->intervals type
cpuidle: CodingStyle: Break up multiple assignments on single line
cpuidle: Check called function parameter in get_typical_interval()
...
|