path: root/drivers/cpufreq
Age | Commit message | Author | Files | Lines
2016-04-11 | intel_pstate: Use del_timer_sync in intel_pstate_cpu_stop | Dirk Brandewie | 1 | -1/+1
commit c2294a2f7853e6450361d078b65407bdaa6d1d11 upstream. Ensure that no timer callback is running since we are about to free the timer structure. We cannot guarantee that the callback is called on the CPU where the timer is running. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-06-23 | cpufreq: pcc: Enable autoload of pcc-cpufreq for ACPI processors | Lenny Szubowicz | 1 | -0/+7
commit 7e7e8fe69820c6fa31395dbbd8e348e3c69cd2a9 upstream. The pcc-cpufreq driver is not automatically loaded on systems where the platform's power management setting requires this driver. Instead, on those systems no CPU frequency driver is registered and active. Make the autoloading matching criteria for loading the pcc-cpufreq driver the same as done in acpi-cpufreq by commit c655affbd524d01 ("ACPI / cpufreq: Add ACPI processor device IDs to acpi-cpufreq"). x86 CPU frequency drivers are now typically autoloaded by specifying MODULE_DEVICE_TABLE entries and x86cpu model specific matching. But pcc-cpufreq was omitted when acpi-cpufreq and other drivers were changed to use this approach. Both acpi-cpufreq and pcc-cpufreq depend on a distinct and mutually exclusive set of ACPI methods which are not directly tied to specific processor model numbers. Both of these drivers have init routines which look for their required ACPI methods. As a result, only the appropriate driver registers as the cpu frequency driver and the other one ends up being unloaded. Tested on various systems where acpi-cpufreq, intel_pstate, and pcc-cpufreq are the expected cpu frequency drivers. Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com> Signed-off-by: Joseph Szczypek <joseph.szczypek@hp.com> Reported-by: Trinh Dao <trinh.dao@hp.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-04-27 | cpufreq: fix a NULL pointer dereference in __cpufreq_governor() | Ethan Zhao | 1 | -0/+7
commit cb57720bf79688d64854a0a43565aa52303c1f3f upstream. If ACPI _PPC changed notification happens before governor was initiated while kernel is booting, a NULL pointer dereference will be triggered: BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 IP: [<ffffffff81470453>] __cpufreq_governor+0x23/0x1e0 PGD 0 Oops: 0000 [#1] SMP ... ... RIP: 0010:[<ffffffff81470453>] [<ffffffff81470453>] __cpufreq_governor+0x23/0x1e0 RSP: 0018:ffff881fcfbcfbb8 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff881fd11b3980 RCX: ffff88407fc20000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff881fd11b3980 RBP: ffff881fcfbcfbd8 R08: 0000000000000000 R09: 000000000000000f R10: ffffffff818068d0 R11: 0000000000000043 R12: 0000000000000004 R13: 0000000000000000 R14: ffffffff8196cae0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff881fffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000030 CR3: 00000000018ae000 CR4: 00000000000407f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/0:3 (pid: 750, threadinfo ffff881fcfbce000, task ffff881fcf556400) Stack: ffff881fffc17d00 ffff881fcfbcfc18 ffff881fd11b3980 0000000000000000 ffff881fcfbcfc08 ffffffff81470d08 ffff881fd11b3980 0000000000000007 ffff881fcfbcfc18 ffff881fffc17d00 ffff881fcfbcfd28 ffffffff81472e9a Call Trace: [<ffffffff81470d08>] __cpufreq_set_policy+0x1b8/0x2e0 [<ffffffff81472e9a>] cpufreq_update_policy+0xca/0x150 [<ffffffff81472f20>] ? 
cpufreq_update_policy+0x150/0x150 [<ffffffff81324a96>] acpi_processor_ppc_has_changed+0x71/0x7b [<ffffffff81320bcd>] acpi_processor_notify+0x55/0x115 [<ffffffff812f9c29>] acpi_device_notify+0x19/0x1b [<ffffffff813084ca>] acpi_ev_notify_dispatch+0x41/0x5f [<ffffffff812f64a4>] acpi_os_execute_deferred+0x27/0x34 The root cause is a race condition: the cpufreq core and the acpi-cpufreq driver had been initialized, but the governor had not, when the _PPC changed notification arrived and __cpufreq_governor() was called within the acpi_os_execute_deferred kernel thread context. To fix this panic, add pointer checking code in __cpufreq_governor() before the policy->governor pointer is dereferenced. Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
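The guard described in this commit message can be sketched in user space as follows. This is an illustration under stated assumptions, not the upstream patch: `struct governor`, `struct policy`, `noop_event()` and `run_governor()` are hypothetical stand-ins for the kernel structures and for `__cpufreq_governor()`, and the `-EINVAL` return value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's governor and policy structures. */
struct governor {
    int (*handle_event)(unsigned int event);
};

struct policy {
    struct governor *governor; /* stays NULL until a governor is attached */
};

/* Trivial event handler used only to exercise the non-NULL path. */
static int noop_event(unsigned int event)
{
    (void)event;
    return 0;
}

/* Models the guarded entry point: bail out before touching ->governor. */
static int run_governor(struct policy *policy, unsigned int event)
{
    assert(policy != NULL);

    if (!policy->governor)
        return -EINVAL; /* assumed error code: the event arrived too early */

    return policy->governor->handle_event(event);
}
```

With the check in place, an early notification becomes a harmless error return instead of a dereference of a NULL governor pointer.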
2015-03-02 | cpufreq: s3c: remove incorrect __init annotations | Arnd Bergmann | 2 | -3/+3
commit 61882b63171736571e1139ab5aa929e3bb336016 upstream. The two functions s3c2416_cpufreq_driver_init and s3c_cpufreq_register are marked init but are called from a context that might be run after the __init sections are discarded, as the compiler points out: WARNING: vmlinux.o(.data+0x1ad9dc): Section mismatch in reference from the variable s3c2416_cpufreq_driver to the function .init.text:s3c2416_cpufreq_driver_init() WARNING: drivers/built-in.o(.text+0x35b5dc): Section mismatch in reference from the function s3c2410a_cpufreq_add() to the function .init.text:s3c_cpufreq_register() This removes the __init markings. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-03-02 | cpufreq: speedstep-smi: enable interrupts when waiting | Mikulas Patocka | 2 | -0/+15
commit d4d4eda23794c701442e55129dd4f8f2fefd5e4d upstream. On a Dell Latitude C600 laptop with a Pentium 3 850MHz processor, the speedstep-smi driver sometimes loads and sometimes fails to load with a "change to state X failed" message. The hardware sometimes refuses to change frequency, and in this case we need to retry later. I found out that we need to enable interrupts while waiting. When we enable interrupts, the hardware blockage that prevents the frequency transition resolves and the transition is possible. With interrupts disabled, the blockage doesn't resolve (no matter how long we wait). The exact reasons for this hardware behavior are unknown. This patch enables interrupts in the function speedstep_set_state, which can be called with interrupts disabled. However, this function is called with interrupts disabled only from speedstep_get_freqs, so it shouldn't cause any problem. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-11-13 | cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy | Pali Rohár | 1 | -0/+1
commit 36b4bed5cd8f6e17019fa7d380e0836872c7b367 upstream. The code that changes the policy to powersave also changes max_policy_pct based on max_freq. The code that changes max_perf_pct uses max_policy_pct as its upper limit. When the policy changes from powersave back to performance, max_policy_pct is not changed. This means that max_perf_pct cannot be raised to high values if max_freq was too low in the powersave policy. Test case:
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
800000
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3300000
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100
$ echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
$ echo 20 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
800000
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
20
$ echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo 3300000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
$ echo 100 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3300000
$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
24
And now the intel_pstate driver caps max_perf_pct at max_policy_pct, which is 24 for the previous powersave max_freq of 800000. This patch sets the default value for max_policy_pct when setting the policy to performance, so that max_perf_pct can also be set to its maximum value. Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
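The value 24 in the test case follows from integer arithmetic: 800000 * 100 / 3300000 = 24. A minimal user-space model of the clamp and of the fix is sketched below. `struct limits` and all three helper functions are hypothetical names chosen for illustration; only `max_policy_pct` and `max_perf_pct` come from the commit message, and the real driver code may be organized differently.

```c
#include <assert.h>

/* Hypothetical model of the driver's limit state. */
struct limits {
    int max_policy_pct; /* ceiling derived from scaling_max_freq */
    int max_perf_pct;   /* effective ceiling exposed through sysfs */
};

/* Percentage of the hardware maximum that a frequency cap represents. */
static int policy_pct(unsigned int max_freq_khz, unsigned int hw_max_khz)
{
    assert(hw_max_khz > 0);
    return (int)((unsigned long long)max_freq_khz * 100 / hw_max_khz);
}

/* A write to max_perf_pct is capped by max_policy_pct. */
static void store_max_perf_pct(struct limits *l, int requested)
{
    l->max_perf_pct = requested < l->max_policy_pct ? requested
                                                    : l->max_policy_pct;
}

/* The fix: switching to the performance policy restores the default. */
static void set_performance_policy(struct limits *l)
{
    l->max_policy_pct = 100;
}
```

Without the reset in `set_performance_policy()`, a later request for 100 stays capped at the stale ceiling of 24, which is the behavior the test case shows.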
2014-11-13 | cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers | Dirk Brandewie | 1 | -6/+17
commit c034b02e213d271b98c45c4a7b54af8f69aaac1e upstream. Currently the core does not expose scaling_cur_freq for set_policy() drivers, which breaks some userspace monitoring tools. Change the core to expose this file for all drivers and, if the set_policy() driver supports the get() callback, use it to retrieve the current frequency. Link: https://bugzilla.kernel.org/show_bug.cgi?id=73741 Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-18 | intel_pstate: Set CPU number before accessing MSRs | Vincent Minet | 1 | -1/+1
commit 179e8471673ce0249cd4ecda796008f7757e5bad upstream. Ensure that cpu->cpu is set before writing MSR_IA32_PERF_CTL during CPU initialization. Otherwise only cpu0 has its P-state set and all other cores are left with their values unchanged. In most cases, this is not too serious because the P-states will be set correctly when the timer function is run. But when the default governor is set to performance, the per-CPU current_pstate stays the same forever and no attempts are made to write the MSRs again. Signed-off-by: Vincent Minet <vincent@vincent-minet.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-18 | cpufreq: Makefile: fix compilation for davinci platform | Prabhakar Lad | 1 | -1/+1
commit 5a90af67c2126fe1d04ebccc1f8177e6ca70d3a9 upstream. Commit 8a7b1227e303 (cpufreq: davinci: move cpufreq driver to drivers/cpufreq) added the dependency only for CONFIG_ARCH_DAVINCI_DA850, whereas the davinci_cpufreq_init() call is used by all davinci platforms. This patch fixes the following build error: arch/arm/mach-davinci/built-in.o: In function `davinci_init_late': :(.init.text+0x928): undefined reference to `davinci_cpufreq_init' make: *** [vmlinux] Error 1 Fixes: 8a7b1227e303 (cpufreq: davinci: move cpufreq driver to drivers/cpufreq) Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-06-20 | cpufreq: remove race while accessing cur_policy | Bibek Basu | 1 | -0/+6
commit c5450db85b828d0c46ac8fc570fb8a51bf07ac40 upstream. While accessing cur_policy during executing events CPUFREQ_GOV_START, CPUFREQ_GOV_STOP, CPUFREQ_GOV_LIMITS, same mutex lock is not taken, dbs_data->mutex, which leads to race and data corruption while running continious suspend resume test. This is seen with ondemand governor with suspend resume test using rtcwake. Unable to handle kernel NULL pointer dereference at virtual address 00000028 pgd = ed610000 [00000028] *pgd=adf11831, *pte=00000000, *ppte=00000000 Internal error: Oops: 17 [#1] PREEMPT SMP ARM Modules linked in: nvhost_vi CPU: 1 PID: 3243 Comm: rtcwake Not tainted 3.10.24-gf5cf9e5 #1 task: ee708040 ti: ed61c000 task.ti: ed61c000 PC is at cpufreq_governor_dbs+0x400/0x634 LR is at cpufreq_governor_dbs+0x3f8/0x634 pc : [<c05652b8>] lr : [<c05652b0>] psr: 600f0013 sp : ed61dcb0 ip : 000493e0 fp : c1cc14f0 r10: 00000000 r9 : 00000000 r8 : 00000000 r7 : eb725280 r6 : c1cc1560 r5 : eb575200 r4 : ebad7740 r3 : ee708040 r2 : ed61dca8 r1 : 001ebd24 r0 : 00000000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 10c5387d Table: ad61006a DAC: 00000015 [<c05652b8>] (cpufreq_governor_dbs+0x400/0x634) from [<c055f700>] (__cpufreq_governor+0x98/0x1b4) [<c055f700>] (__cpufreq_governor+0x98/0x1b4) from [<c0560770>] (__cpufreq_set_policy+0x250/0x320) [<c0560770>] (__cpufreq_set_policy+0x250/0x320) from [<c0561dcc>] (cpufreq_update_policy+0xcc/0x168) [<c0561dcc>] (cpufreq_update_policy+0xcc/0x168) from [<c0561ed0>] (cpu_freq_notify+0x68/0xdc) [<c0561ed0>] (cpu_freq_notify+0x68/0xdc) from [<c008eff8>] (notifier_call_chain+0x4c/0x8c) [<c008eff8>] (notifier_call_chain+0x4c/0x8c) from [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68) [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68) from [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28) [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28) from [<c00aac6c>] (pm_qos_update_bounded_target+0xd8/0x310) [<c00aac6c>] 
(pm_qos_update_bounded_target+0xd8/0x310) from [<c00ab3b0>] (__pm_qos_update_request+0x64/0x70) [<c00ab3b0>] (__pm_qos_update_request+0x64/0x70) from [<c004b4b8>] (tegra_pm_notify+0x114/0x134) [<c004b4b8>] (tegra_pm_notify+0x114/0x134) from [<c008eff8>] (notifier_call_chain+0x4c/0x8c) [<c008eff8>] (notifier_call_chain+0x4c/0x8c) from [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68) [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68) from [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28) [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28) from [<c00ac228>] (pm_notifier_call_chain+0x1c/0x34) [<c00ac228>] (pm_notifier_call_chain+0x1c/0x34) from [<c00ad38c>] (enter_state+0xec/0x128) [<c00ad38c>] (enter_state+0xec/0x128) from [<c00ad400>] (pm_suspend+0x38/0xa4) [<c00ad400>] (pm_suspend+0x38/0xa4) from [<c00ac114>] (state_store+0x70/0xc0) [<c00ac114>] (state_store+0x70/0xc0) from [<c027b1e8>] (kobj_attr_store+0x14/0x20) [<c027b1e8>] (kobj_attr_store+0x14/0x20) from [<c019cd9c>] (sysfs_write_file+0x104/0x184) [<c019cd9c>] (sysfs_write_file+0x104/0x184) from [<c0143038>] (vfs_write+0xd0/0x19c) [<c0143038>] (vfs_write+0xd0/0x19c) from [<c0143414>] (SyS_write+0x4c/0x78) [<c0143414>] (SyS_write+0x4c/0x78) from [<c000f080>] (ret_fast_syscall+0x0/0x30) Code: e1a00006 eb084346 e59b0020 e5951024 (e5903028) ---[ end trace 0488523c8f6b0f9d ]--- Signed-off-by: Bibek Basu <bbasu@nvidia.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-13 | powernow-k6: reorder frequencies | Mikulas Patocka | 1 | -7/+10
commit 22c73795b101597051924556dce019385a1e2fa0 upstream. This patch reorders reported frequencies from the highest to the lowest, just like in other frequency drivers. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-13 | powernow-k6: correctly initialize default parameters | Mikulas Patocka | 1 | -4/+72
commit d82b922a4acc1781d368aceac2f9da43b038cab2 upstream. The powernow-k6 driver used to read the initial multiplier from the powernow register. However, there is a problem with this:
* If there was a frequency transition before, the multiplier read from the register corresponds to the current multiplier.
* If there was no frequency transition since reset, the field in the register always reads as zero, regardless of the current multiplier that is set using switches on the mainboard and that the CPU is running at.
The zero value corresponds to multiplier 4.5, so as a consequence, the powernow-k6 driver always assumes multiplier 4.5. For example, if we have a 550MHz CPU with bus frequency 100MHz and multiplier 5.5, the powernow-k6 driver thinks that the multiplier is 4.5 and the bus frequency is 122MHz. The powernow-k6 driver then sets the multiplier to 4.5, underclocking the CPU to 450MHz, but reports the current frequency as 550MHz. There is no reliable way to read the initial multiplier. I modified the driver so that it contains a table of known frequencies (based on parameters of existing CPUs and some common overclocking schemes) and sets the multiplier according to the frequency. If the frequency is unknown (because of unusual overclocking or underclocking), the user must supply the bus speed and maximum multiplier as module parameters. This patch should be backported to all stable kernels. If it doesn't apply cleanly, change it, or ask me to change it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
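The numbers in the example above can be checked with a few lines of integer arithmetic. The sketch below is illustrative only: `derive_bus_khz()` and `derive_cpu_khz()` are hypothetical helpers, not functions from the driver, and storing multipliers in tenths is an assumption made to avoid floating point.

```c
#include <assert.h>

/* Multipliers are stored in tenths (45 means 4.5) to stay in integers. */

/* Bus frequency implied by a CPU frequency and an assumed multiplier. */
static unsigned int derive_bus_khz(unsigned int cpu_khz, unsigned int mult_x10)
{
    assert(mult_x10 > 0);
    return cpu_khz * 10 / mult_x10;
}

/* CPU frequency that results from a real bus frequency and a multiplier. */
static unsigned int derive_cpu_khz(unsigned int bus_khz, unsigned int mult_x10)
{
    return bus_khz * mult_x10 / 10;
}
```

With a 550000 kHz CPU and the wrongly assumed multiplier 4.5, the derived bus frequency is about 122 MHz. Programming multiplier 4.5 on the real 100 MHz bus then yields 450 MHz, which matches the underclocking described in the message.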
2014-04-13 | powernow-k6: disable cache when changing frequency | Mikulas Patocka | 1 | -17/+39
commit e20e1d0ac02308e2211306fc67abcd0b2668fb8b upstream. I found out that a system with a k6-3+ processor is unstable during network server load. The system locks up or the network card stops receiving. The reason for the instability is the CPU frequency scaling. During a frequency transition the processor is in the "EPM Stop Grant" state. The documentation says that the processor doesn't respond to inquiry requests in this state. Consequently, coherency of processor caches and bus master devices is not maintained, causing the system instability. This patch flushes the cache during frequency transition. It fixes the instability. Other minor changes:
* u64 invalue changed to unsigned long because the variable is 32-bit
* move the logic to set the multiplier to a separate function powernow_k6_set_cpu_multiplier
* preserve lower 5 bits of the powernow port instead of 4 (the voltage field has 5 bits)
* mask interrupts when reading the multiplier, so that the port is not open during other activity (running other kernel code with the port open shouldn't cause any misbehavior, but it is better to be safe and keep the port closed)
This patch should be backported to all stable kernels. If it doesn't apply cleanly, change it, or ask me to change it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-05 | cpufreq: powernow-k8: Initialize per-cpu data-structures properly | Srivatsa S. Bhat | 1 | -3/+7
commit c3274763bfc3bf1ececa269ed6e6c4d7ec1c3e5e upstream. The powernow-k8 driver maintains a per-cpu data-structure called powernow_data that is used to perform the frequency transitions. It initializes this data structure only for the policy->cpu. So, accesses to this data structure by other CPUs results in various problems because they would have been uninitialized. Specifically, if a cpu (!= policy->cpu) invokes the drivers' ->get() function, it returns 0 as the KHz value, since its per-cpu memory doesn't point to anything valid. This causes problems during suspend/resume since cpufreq_update_policy() tries to enforce this (0 KHz) as the current frequency of the CPU, and this madness gets propagated to adjust_jiffies() as well. Eventually, lots of things start breaking down, including the r8169 ethernet card, in one particularly interesting case reported by Pierre Ossman. Fix this by initializing the per-cpu data-structures of all the CPUs in the policy appropriately. References: https://bugzilla.kernel.org/show_bug.cgi?id=70311 Reported-by: Pierre Ossman <pierre@ossman.eu> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-01-16 | intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters. | Dirk Brandewie | 1 | -1/+2
commit 6cbd7ee10e2842a3d1f9b60abede1c8f3d1f1130 upstream. KVM environments do not support APERF/MPERF MSRs. intel_pstate cannot operate without these registers. The previous validity checks in intel_pstate_msrs_not_valid() are insufficient in nested KVMs. References: https://bugzilla.redhat.com/show_bug.cgi?id=1046317 Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10 | intel_pstate: Fail initialization if P-state information is missing | Rafael J. Wysocki | 1 | -0/+5
commit 98a947abdd54e5de909bebadfced1696ccad30cf upstream. If pstate.current_pstate is 0 after the initial intel_pstate_get_cpu_pstates(), this means that we were unable to obtain any useful P-state information and there is no reason to continue, so free memory and return an error in that case. This fixes the following divide error occuring in a nested KVM guest: Intel P-state driver initializing. Intel pstate controlling: cpu 0 cpufreq: __cpufreq_add_dev: ->get() failed divide error: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-0.rc4.git5.1.fc21.x86_64 #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88001ea20000 ti: ffff88001e9bc000 task.ti: ffff88001e9bc000 RIP: 0010:[<ffffffff815c551d>] [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0 RSP: 0000:ffff88001ee03e18 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88001a454348 RCX: 0000000000006100 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88001ee03e38 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88001ea20000 R11: 0000000000000000 R12: 00000c0a1ea20000 R13: 1ea200001ea20000 R14: ffffffff815c5400 R15: ffff88001a454348 FS: 0000000000000000(0000) GS:ffff88001ee00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 00000000000006f0 Stack: fffffffb1a454390 ffffffff821a4500 ffff88001a454390 0000000000000100 ffff88001ee03ea8 ffffffff81083e9a ffffffff81083e15 ffffffff82d5ed40 ffffffff8258cc60 0000000000000000 ffffffff81ac39de 0000000000000000 Call Trace: <IRQ> [<ffffffff81083e9a>] call_timer_fn+0x8a/0x310 [<ffffffff81083e15>] ? call_timer_fn+0x5/0x310 [<ffffffff815c5400>] ? 
pid_param_set+0x130/0x130 [<ffffffff81084354>] run_timer_softirq+0x234/0x380 [<ffffffff8107aee4>] __do_softirq+0x104/0x430 [<ffffffff8107b5fd>] irq_exit+0xcd/0xe0 [<ffffffff81770645>] smp_apic_timer_interrupt+0x45/0x60 [<ffffffff8176efb2>] apic_timer_interrupt+0x72/0x80 <EOI> [<ffffffff810e15cd>] ? vprintk_emit+0x1dd/0x5e0 [<ffffffff81757719>] printk+0x67/0x69 [<ffffffff815c1493>] __cpufreq_add_dev.isra.13+0x883/0x8d0 [<ffffffff815c14f0>] cpufreq_add_dev+0x10/0x20 [<ffffffff814a14d1>] subsys_interface_register+0xb1/0xf0 [<ffffffff815bf5cf>] cpufreq_register_driver+0x9f/0x210 [<ffffffff81fb19af>] intel_pstate_init+0x27d/0x3be [<ffffffff81761e3e>] ? mutex_unlock+0xe/0x10 [<ffffffff81fb1732>] ? cpufreq_gov_dbs_init+0x12/0x12 [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0 [<ffffffff8109dbf5>] ? parse_args+0x225/0x3f0 [<ffffffff81f64193>] kernel_init_freeable+0x1fc/0x287 [<ffffffff81f638d0>] ? do_early_param+0x88/0x88 [<ffffffff8174b530>] ? rest_init+0x150/0x150 [<ffffffff8174b53e>] kernel_init+0xe/0x130 [<ffffffff8176e27c>] ret_from_fork+0x7c/0xb0 [<ffffffff8174b530>] ? rest_init+0x150/0x150 Code: c1 e0 05 48 63 bc 03 10 01 00 00 48 63 83 d0 00 00 00 48 63 d6 48 c1 e2 08 c1 e1 08 4c 63 c2 48 c1 e0 08 48 98 48 c1 e0 08 48 99 <49> f7 f8 48 98 48 0f af f8 48 c1 ff 08 29 f9 89 ca c1 fa 1f 89 RIP [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0 RSP <ffff88001ee03e18> ---[ end trace f166110ed22cc37a ]--- Kernel panic - not syncing: Fatal exception in interrupt Reported-and-tested-by: Kashyap Chamarthy <kchamart@redhat.com> Cc: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-04 | cpufreq: highbank-cpufreq: Enable Midway/ECX-2000 | Mark Langsdorf | 1 | -1/+2
commit fbbc5bfb44a22e7a8ef753a1c8dfb448d7ac8b85 upstream. Calxeda's new ECX-2000 part uses the same cpufreq interface as highbank, so add it to the driver's compatibility list. This is a minor change that can safely be applied to the 3.10 and 3.11 stable trees. Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-25 | acpi-cpufreq: Fail initialization if driver cannot be registered | Rafael J. Wysocki | 1 | -4/+4
Make acpi_cpufreq_init() return error codes when the driver cannot be registered so that the module doesn't stay useless in memory and so that acpi_cpufreq_exit() doesn't attempt to unregister things that have never been registered when the module is unloaded. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2013-10-22 | intel_pstate: Correct calculation of min pstate value | Dirk Brandewie | 1 | -2/+3
The minimum pstate is supposed to be a percentage of the maximum P-state available. Calculate min using the max pstate and not the current max, which may have been limited by the user. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
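The difference between the two reference values is easy to see with made-up numbers. In the sketch below, `min_pstate_for()` is a hypothetical helper and the P-state values (32 as the hardware maximum, 16 as a user-limited maximum, a 25 percent floor) are assumptions chosen for illustration, not values from the driver.

```c
#include <assert.h>

/* Hypothetical helper: lowest allowed P-state for a given floor percentage,
 * computed relative to whichever maximum the caller passes in. */
static int min_pstate_for(int reference_max_pstate, int min_pct)
{
    assert(min_pct >= 0 && min_pct <= 100);
    return reference_max_pstate * min_pct / 100;
}
```

Relative to the hardware maximum of 32, a 25 percent floor gives P-state 8. Relative to a user-limited maximum of 16, the same floor gives 4, which is lower than the user asked for. The commit message describes switching the reference to the former.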
2013-10-22 | intel_pstate: Improve accuracy by not truncating until final result | Brennan Shacklett | 1 | -18/+15
This patch addresses Bug 60727 (https://bugzilla.kernel.org/show_bug.cgi?id=60727), which was due to the truncation of intermediate values in the calculations. The truncation caused the code to consistently underestimate the current CPU frequency; specifically, 100% CPU utilization was truncated down to the setpoint of 97%. This patch fixes the problem by keeping the results of all intermediate calculations as fixed-point numbers rather than scaling them back and forth between integers and fixed point. References: https://bugzilla.kernel.org/show_bug.cgi?id=60727 Signed-off-by: Brennan Shacklett <bpshacklett@gmail.com> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
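The effect of truncating intermediate values can be reproduced in user space. The sketch below is a simplified model, not the driver's calculation: the helper names mirror the style of fixed-point helpers but are defined here for illustration, the 8 fractional bits are an assumption, and the two `scaled_pct_*` functions and their inputs are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 8 /* assumed number of fractional bits */

static int64_t int_tofp(int64_t x) { return x * (1 << FRAC_BITS); }
static int64_t fp_toint(int64_t x) { return x >> FRAC_BITS; }
static int64_t mul_fp(int64_t x, int64_t y) { return (x * y) >> FRAC_BITS; }

static int64_t div_fp(int64_t x, int64_t y)
{
    assert(y != 0);
    return (x * (1 << FRAC_BITS)) / y;
}

/* Truncates to an integer percentage after the first step. */
static int64_t scaled_pct_truncating(int64_t busy, int64_t total,
                                     int64_t num, int64_t den)
{
    int64_t pct = busy * 100 / total; /* 99.5 becomes 99 here */
    return pct * num / den;
}

/* Stays in fixed point until the final conversion. */
static int64_t scaled_pct_fixed(int64_t busy, int64_t total,
                                int64_t num, int64_t den)
{
    int64_t pct = div_fp(int_tofp(busy * 100), int_tofp(total));
    int64_t scale = div_fp(int_tofp(num), int_tofp(den));
    return fp_toint(mul_fp(pct, scale));
}
```

For a busy ratio of 995/1000 scaled by 36/24, the exact result is 149.25. The truncating variant returns 148 because the half percent is lost before scaling, while the fixed-point variant returns 149.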
2013-10-17 | cpufreq: s3c64xx: Rename index to driver_data | Charles Keepax | 1 | -1/+1
The index field of cpufreq_frequency_table has been renamed to driver_data by commit 5070158 (cpufreq: rename index as driver_data in cpufreq_frequency_table). This patch updates the s3c64xx driver to match. Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Cc: 3.11+ <stable@vger.kernel.org> # 3.11+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-17 | intel_pstate: Fix type mismatch warning | Rafael J. Wysocki | 1 | -3/+4
The expression in line 398 of intel_pstate.c causes the following warning to be emitted: drivers/cpufreq/intel_pstate.c:398:3: warning: left shift count >= width of type which happens because unsigned long is 32-bit on some architectures. Fix that by using a helper u64 variable and simplify the code slightly. Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
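The warning appears when a shift by 32 or more bits is applied to a type that is only 32 bits wide. The commit message does not quote the expression itself, so the following is a generic sketch of the "helper u64 variable" idea; `shift_left_wide()` is a hypothetical function, not code from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Widen the operand first so the shift is carried out in 64 bits,
 * regardless of how wide unsigned long is on the target. */
static uint64_t shift_left_wide(uint32_t value, unsigned int shift)
{
    uint64_t wide = value; /* helper variable with a guaranteed width */

    assert(shift < 64);
    return wide << shift;
}
```

Because the intermediate has a fixed 64-bit width, the same code behaves identically on 32-bit and 64-bit architectures.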
2013-10-16 | cpufreq / intel_pstate: Fix max_perf_pct on resume | Dirk Brandewie | 1 | -4/+3
If the system is suspended while max_perf_pct is less than 100 percent or no_turbo is set, policy->{min,max} will be set incorrectly with scaled values, which turns the scaled values into hard limits. References: https://bugzilla.kernel.org/show_bug.cgi?id=61241 Reported-by: Patrick Bartels <petzicus@googlemail.com> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Cc: 3.9+ <stable@vger.kernel.org> # 3.9+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-02 | intel_pstate: fix no_turbo | Srinivas Pandruvada | 1 | -1/+4
When no_turbo is set through sysfs, some P-states in the turbo range are still observed. This patch sets the IDA Engage bit when no_turbo is set to explicitly disengage turbo. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-30 | cpufreq: cpufreq-cpu0: NULL is a valid regulator, part 2 | Philipp Zabel | 1 | -1/+1
Since the patch "cpufreq: cpufreq-cpu0: NULL is a valid regulator", cpu_reg contains an error value if the regulator is not set, instead of NULL. Accordingly, fix the remaining check for non-NULL cpu_reg. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-30 | cpufreq: SPEAr: Fix incorrect variable type | Sachin Kamat | 1 | -1/+1
'clk_round_rate' returns a negative error code upon failure. This will never get detected by unsigned 'newfreq'. Make it signed. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
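Why the unsigned variable hides the failure can be shown with a small user-space model. Everything here is a sketch under assumptions: `fake_round_rate()` is a hypothetical stand-in for `clk_round_rate()` that rounds down to 1 MHz, and the two wrapper functions exist only to contrast the two variable types.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for clk_round_rate(): negative errno on failure. */
static long fake_round_rate(long requested_hz)
{
    if (requested_hz <= 0)
        return -EINVAL;
    return requested_hz - requested_hz % 1000000; /* round down to 1 MHz */
}

/* Unsigned storage: the error code silently becomes a huge "frequency". */
static unsigned long round_into_unsigned(long requested_hz)
{
    unsigned long newfreq = (unsigned long)fake_round_rate(requested_hz);
    return newfreq;
}

/* Signed storage: the failure stays visible and can be propagated. */
static int round_into_signed(long requested_hz, long *out_hz)
{
    long newfreq = fake_round_rate(requested_hz);

    assert(out_hz != NULL);
    if (newfreq < 0)
        return (int)newfreq;
    *out_hz = newfreq;
    return 0;
}
```

In the unsigned variant a failed call yields a value above `LONG_MAX` that looks like a valid rate, so no later check can catch it.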
2013-09-25 | cpufreq: exynos5440: Fix potential NULL pointer dereference | Sachin Kamat | 1 | -1/+1
If 'dvfs_info' is NULL (due to devm_kzalloc failure) the failure error message would try to dereference it. Use 'pdev' instead. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 | cpufreq: check cpufreq driver is valid and cpufreq isn't disabled in cpufreq_get() | Viresh Kumar | 1 | -0/+3
cpufreq_get() can be called from external drivers which might not be aware of whether a cpufreq driver is registered. So we should check, at the beginning of cpufreq_get(), whether a cpufreq driver is registered and whether cpufreq is active or disabled. Otherwise the call to lock_policy_rwsem_read() might hit BUG_ON(!policy). Reported-and-tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 | acpi-cpufreq: skip loading acpi_cpufreq after intel_pstate | Yinghai Lu | 1 | -0/+4
If the hardware supports both intel_pstate and acpi_cpufreq, intel_pstate will get loaded first. acpi_cpufreq_init() will call acpi_cpufreq_early_init(), which allocates perf data and initializes it in the ACPI core (covering all CPUs). But it will later free that data, because cpufreq_register_driver(acpi_cpufreq) fails when intel_pstate is already registered. Use cpufreq_get_current_driver() to check whether the acpi_cpufreq loading can be skipped. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-20 | cpufreq: return EEXIST instead of EBUSY for second registering | Yinghai Lu | 1 | -1/+1
On systems that support intel_pstate, acpi_cpufreq fails to load, and udev keeps trying until the trace gets filled up and the kernel crashes. The root cause is that the driver returns ret from cpufreq_register_driver(): when some other driver has already taken over, it returns EBUSY and then udev keeps trying. cpufreq_register_driver() should return EEXIST instead, so that the system can boot without appending intel_pstate=disable and still use intel_pstate. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-19 | cpufreq: imx6q-cpufreq: assign cpu_dev correctly to cpu0 device | Sudeep KarkadaNagesha | 1 | -1/+6
Commit cdc58d602d2e657602a90c190cbf745886c95977 "cpufreq: imx6q-cpufreq: remove device tree parsing for cpu nodes" assumed the pdev->dev is set to cpu0 device in the platform code. But it actually points to the virtual cpufreq-cpu0 platform device which is not present in the device tree. Most of the information needed by cpufreq is stored in cpu0 DT node. So cpu_dev must point to cpu0 device. This patch fixes the wrong assignment to cpu_dev. Reported-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Tested-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-19cpufreq: cpufreq-cpu0: assign cpu_dev correctly to cpu0 deviceSudeep KarkadaNagesha1-1/+6
Commit f837a9b5ab05c52a07108c6f09ca66f2e0aee757 "cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes" assumed the pdev->dev is set to cpu0 device in the platform code. But it actually points to the virtual cpufreq-cpu0 platform device which is not present in the device tree. Most of the information needed by cpufreq is stored in cpu0 DT node. So cpu_dev must point to cpu0 device. This patch fixes the wrong assignment to cpu_dev. Reported-and-tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Cc: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-18cpufreq: unlock correct rwsem while updating policy->cpuViresh Kumar1-2/+11
Current code looks like this:

    WARN_ON(lock_policy_rwsem_write(cpu));
    update_policy_cpu(policy, new_cpu);
    unlock_policy_rwsem_write(cpu);

{lock|unlock}_policy_rwsem_write(cpu) takes/releases policy->cpu's rwsem. Because cpu is changing with the call to update_policy_cpu(), the unlock_policy_rwsem_write() will release the incorrect lock. The right solution would be to release the same lock as was taken earlier. update_policy_cpu() was also called from cpufreq_add_dev() without any locks, so it's better to move this locking inside update_policy_cpu().

This patch fixes a regression introduced in 3.12 by commit f9ba680d23 (cpufreq: Extract the handover of policy cpu to a helper function).

Reported-and-tested-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
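The locking problem in this commit can be illustrated with a small userspace model. Everything here is hypothetical (`rwsem_held`, `lock_write()`, `unlock_write()`, `struct policy_model`, `update_policy_cpu_model()`); plain flags stand in for the kernel's per-CPU rwsems. The sketch only shows the principle of releasing the lock that was actually taken, by remembering the owning CPU before it changes.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical per-CPU lock state: 1 = write-held, 0 = free. */
static int rwsem_held[NR_CPUS];

struct policy_model {
    unsigned int cpu;   /* CPU that currently owns the policy */
};

static void lock_write(unsigned int cpu)
{
    assert(cpu < NR_CPUS && !rwsem_held[cpu]);
    rwsem_held[cpu] = 1;
}

static void unlock_write(unsigned int cpu)
{
    assert(cpu < NR_CPUS && rwsem_held[cpu]);
    rwsem_held[cpu] = 0;
}

/*
 * Hands the policy over to new_cpu. The CPU whose lock is taken is
 * remembered before policy->cpu changes, so the same lock is released.
 */
static void update_policy_cpu_model(struct policy_model *policy,
                                    unsigned int new_cpu)
{
    unsigned int locked_cpu = policy->cpu;

    lock_write(locked_cpu);
    policy->cpu = new_cpu;
    unlock_write(locked_cpu);   /* not unlock_write(policy->cpu) */
}
```

Keeping the lock and unlock inside the helper also means every caller gets the same locking, which is the second point the commit message makes.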
2013-09-18cpufreq: Clear policy->cpus bits in __cpufreq_remove_dev_finish()Viresh Kumar1-8/+8
This broke after a recent change "cedb70a cpufreq: Split __cpufreq_remove_dev() into two parts" from Srivatsa. Consider a scenario where we have two CPUs in a policy (0 & 1) and we are removing CPU 1. On the call to __cpufreq_remove_dev_prepare() we have cleared 1 from policy->cpus and now on a call to __cpufreq_remove_dev_finish() we read cpumask_weight of policy->cpus, which will come as 1 and this code will behave as if we are removing the last CPU from policy :) Fix it by clearing the CPU mask in __cpufreq_remove_dev_finish() instead of __cpufreq_remove_dev_prepare(). Tested-by: Stephen Warren <swarren@wwwdotorg.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
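The two-CPU scenario above can be replayed in a few lines of C. This is a sketch under assumptions: `struct policy_model`, `mask_weight()` and `remove_finish_model()` are hypothetical names, and a plain `unsigned long` bitmask stands in for `policy->cpus`. It shows why the weight has to be read before the bit is cleared.

```c
#include <assert.h>

/* Hypothetical policy; a plain bitmask stands in for policy->cpus. */
struct policy_model {
    unsigned long cpus;
};

/* Counts set bits, playing the role of cpumask_weight(). */
static unsigned int mask_weight(unsigned long mask)
{
    unsigned int n = 0;

    for (; mask; mask &= mask - 1)
        n++;
    return n;
}

/*
 * Finish step: read the weight first, clear the bit afterwards.
 * Returns 1 if the removed CPU was the last one in the policy.
 */
static int remove_finish_model(struct policy_model *policy, unsigned int cpu)
{
    unsigned int cpus;

    assert(cpu < 8 * sizeof(policy->cpus));

    cpus = mask_weight(policy->cpus);
    policy->cpus &= ~(1UL << cpu);
    return cpus == 1;
}
```

Had the bit been cleared in an earlier step, removing CPU 1 from a policy holding CPUs 0 and 1 would read a weight of 1 here and be treated as the last-CPU case.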
2013-09-12Merge branch 'pm-cpufreq'Rafael J. Wysocki1-17/+32
* pm-cpufreq:
  cpufreq: Acquire the lock in cpufreq_policy_restore() for reading
  cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu
  cpufreq: Restructure if/else block to avoid unintended behavior
  cpufreq: Fix crash in cpufreq-stats during suspend/resume
2013-09-12cpufreq: Acquire the lock in cpufreq_policy_restore() for readingLan Tianyu1-2/+2
In cpufreq_policy_restore() before system suspend policy is read from percpu's cpufreq_cpu_data_fallback. It's a read operation rather than a write one, so take the lock for reading in there. Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-12cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpuSrivatsa S. Bhat1-0/+3
If update_policy_cpu() is invoked with the existing policy->cpu itself as the new-cpu parameter, then a lot of things can go terribly wrong.

In its present form, update_policy_cpu() always assumes that the new-cpu is different from policy->cpu and invokes other functions to perform their respective updates. And those functions implement the actual update like this:

    per_cpu(..., new_cpu) = per_cpu(..., last_cpu);
    per_cpu(..., last_cpu) = NULL;

Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu references vanish into thin air! (memory leak). From there, it leads to more problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference and hence tries to create a new sysfs-group; but sysfs already had created the group earlier, so it complains that it cannot create a duplicate filename. In short, the repercussions of a rather innocuous invocation of update_policy_cpu() can turn out to be pretty nasty.

Ideally update_policy_cpu() should handle this situation (new == last) gracefully, and not lead to such severe problems. So fix it by adding an appropriate check.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
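The self-assignment hazard is easy to reproduce outside the kernel. The following is a minimal sketch, assuming an ordinary array (`stats_ref`) in place of per-CPU variables; `move_ref_model()` is a hypothetical name. It shows the kind of guard the commit message calls for, not the exact patch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical per-CPU slots holding a reference to some stats object. */
static void *stats_ref[NR_CPUS];

/*
 * Moves the reference from last_cpu to new_cpu. Without the early
 * return, new_cpu == last_cpu would copy the slot onto itself and then
 * overwrite it with NULL, losing the only reference.
 */
static void move_ref_model(unsigned int last_cpu, unsigned int new_cpu)
{
    assert(last_cpu < NR_CPUS && new_cpu < NR_CPUS);

    if (new_cpu == last_cpu)
        return;

    stats_ref[new_cpu] = stats_ref[last_cpu];
    stats_ref[last_cpu] = NULL;
}
```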
2013-09-12cpufreq: Restructure if/else block to avoid unintended behaviorSrivatsa S. Bhat1-2/+3
In __cpufreq_remove_dev_prepare(), the code which decides whether to remove the sysfs link or nominate a new policy cpu, is governed by an if/else block with a rather complex set of conditionals. Worse, they harbor a subtlety which leads to certain unintended behavior.

The code looks like this:

    if (cpu != policy->cpu && !frozen) {
        sysfs_remove_link(&dev->kobj, "cpufreq");
    } else if (cpus > 1) {
        new_cpu = cpufreq_nominate_new_policy_cpu(...);
        ...
        update_policy_cpu(..., new_cpu);
    }

The original intention was: If the CPU going offline is not policy->cpu, just remove the link. On the other hand, if the CPU going offline is the policy->cpu itself, handover the policy->cpu job to some other surviving CPU in that policy.

But because the 'if' condition also includes the 'frozen' check, now there are *two* possibilities by which we can enter the 'else' block:

    1. cpu == policy->cpu (intended)
    2. cpu != policy->cpu && frozen (unintended)

Due to the second (unintended) scenario, we end up spuriously nominating a CPU as the policy->cpu, even when the existing policy->cpu is alive and well. This can cause problems further down the line, especially when we end up nominating the same policy->cpu as the new one (ie., old == new), because it totally confuses update_policy_cpu().

To avoid this mess, restructure the if/else block to only do what was originally intended, and thus prevent any unwelcome surprises.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
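One way to express the intended branching is to nest the 'frozen' test inside the "not policy->cpu" case. The commit message does not show the final code, so the sketch below is an assumption about its shape; `enum offline_action` and `decide_model()` are hypothetical and return a decision instead of touching sysfs.

```c
#include <assert.h>
#include <stdbool.h>

enum offline_action {
    ACTION_NONE,          /* nothing to do at this step */
    ACTION_REMOVE_LINK,   /* drop the sysfs link */
    ACTION_NOMINATE_NEW   /* hand policy->cpu over to another CPU */
};

/*
 * Decides what to do for a CPU going offline. The 'frozen' check only
 * affects link removal; it can no longer divert a non-policy CPU into
 * the nomination branch.
 */
static enum offline_action decide_model(unsigned int cpu,
                                        unsigned int policy_cpu,
                                        unsigned int cpus, bool frozen)
{
    assert(cpus >= 1);

    if (cpu != policy_cpu) {
        if (!frozen)
            return ACTION_REMOVE_LINK;
        return ACTION_NONE;
    }

    if (cpus > 1)
        return ACTION_NOMINATE_NEW;
    return ACTION_NONE;
}
```

The previously unintended case (cpu != policy->cpu with frozen set) now yields `ACTION_NONE` rather than a spurious nomination.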
2013-09-12cpufreq: Fix crash in cpufreq-stats during suspend/resumeSrivatsa S. Bhat1-13/+24
Stephen Warren reported that the cpufreq-stats code hits a NULL pointer dereference during the second attempt to suspend a system. He also pin-pointed the problem to commit 5302c3f "cpufreq: Perform light-weight init/teardown during suspend/resume".

That commit actually ensured that the cpufreq-stats table and the cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during suspend/resume, which makes it all the more surprising. However, it turns out that the root-cause is not that we access already freed memory, but that the reference to the allocated memory gets moved around and we lose track of that during resume, leading to the reported crash in a subsequent suspend attempt.

In the suspend path, during CPU offline, the value of policy->cpu is updated by choosing one of the surviving CPUs in that policy, as long as there is at least one CPU in that policy. And cpufreq_stats_update_policy_cpu() is invoked to update the reference to the stats structure by assigning it to the new CPU. However, in the resume path, during CPU online, we end up assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats know about this. Thus the reference to the stats structure remains (incorrectly) associated with the old CPU. So, in a subsequent suspend attempt, during CPU offline, we end up accessing an incorrect location to get the stats structure, which eventually leads to the NULL pointer dereference.

Fix this by letting cpufreq-stats know about the update of the policy->cpu during CPU online in the resume path. (Also, move the update_policy_cpu() function higher up in the file, so that __cpufreq_add_dev() can invoke it).

Reported-and-tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-11Merge branch 'pm-cpufreq'Rafael J. Wysocki3-34/+76
* pm-cpufreq:
  intel_pstate: Add Haswell CPU models
  Revert "cpufreq: make sure frequency transitions are serialized"
  cpufreq: Use signed type for 'ret' variable, to store negative error values
  cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
  cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug
  cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock
  cpufreq: Split __cpufreq_remove_dev() into two parts
  cpufreq: Fix wrong time unit conversion
  cpufreq: serialize calls to __cpufreq_governor()
  cpufreq: don't allow governor limits to be changed when it is disabled
2013-09-11intel_pstate: Add Haswell CPU modelsNell Hardcastle1-0/+5
Enable the intel_pstate driver for Haswell CPUs. One missing Ivy Bridge model (0x3E) is also included. Models referenced from tools/power/x86/turbostat/turbostat.c:has_nehalem_turbo_ratio_limit Signed-off-by: Nell Hardcastle <nell@spicious.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10Revert "cpufreq: make sure frequency transitions are serialized"Rafael J. Wysocki1-15/+0
Commit 7c30ed5 (cpufreq: make sure frequency transitions are serialized) attempted to serialize frequency transitions by adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE notifications. However, it assumed that the notifications will always originate from the driver's .target() callback, but they also can be triggered by cpufreq_out_of_sync() and that leads to warnings like this on some systems:

    WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317
    __cpufreq_notify_transition+0x238/0x260()
    In middle of another frequency transition

accompanied by a call trace similar to this one:

    [<ffffffff81720daa>] dump_stack+0x46/0x58
    [<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0
    [<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320
    [<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50
    [<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260
    [<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70
    [<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0
    [<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160
    [<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160
    [<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5
    [<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf
    [<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0
    [<ffffffff8159046a>] step_wise_throttle+0x5a/0x90
    [<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140
    [<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0
    [<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30
    [<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc
    [<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b
    [<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c
    [<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32
    [<ffffffff81081060>] process_one_work+0x170/0x4a0
    [<ffffffff81082121>] worker_thread+0x121/0x390
    [<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170
    [<ffffffff81088fe0>] kthread+0xc0/0xd0
    [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
    [<ffffffff8173582c>] ret_from_fork+0x7c/0xb0
    [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0

For this reason, revert commit 7c30ed5 along with the fix 266c13d (cpufreq: Fix serialization of frequency transitions) on top of it and we will revisit the serialization problem later.

Reported-by: Alessandro Bono <alessandro.bono@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10cpufreq: Use signed type for 'ret' variable, to store negative error valuesSrivatsa S. Bhat1-2/+2
There are places where the variable 'ret' is declared as unsigned int and then used to store negative return values such as -EINVAL. Fix them by declaring the variable as a signed quantity. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
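The effect of the wrong type can be seen in a short userspace sketch. `failing_step()` and the two `seen_as_error_*()` helpers are hypothetical, and the example assumes `unsigned int` is narrower than `long long`, as on common platforms. Once `-EINVAL` is stored in an unsigned variable it becomes a large positive value, so a "negative means error" check no longer fires.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper that fails with a negative errno value. */
static int failing_step(void)
{
    return -EINVAL;
}

/*
 * Buggy shape: the error lands in an unsigned variable. Widening it
 * shows what any later "is it negative?" check really sees.
 */
static int seen_as_error_unsigned(void)
{
    unsigned int ret = (unsigned int)failing_step();
    long long widened = ret;    /* large positive number, not -EINVAL */

    return widened < 0;
}

/* Fixed shape: a signed variable keeps the error negative. */
static int seen_as_error_signed(void)
{
    int ret = failing_step();

    return ret < 0;
}
```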
2013-09-10cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writesSrivatsa S. Bhat1-6/+1
Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary and partial solution to the race condition between writing to a cpufreq sysfs file and taking a CPU offline. Now that we have a proper and complete solution to that problem, remove the temporary fix. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplugSrivatsa S. Bhat1-2/+9
The functions that are used to write to cpufreq sysfs files (such as store_scaling_max_freq()) are not hotplug safe. They can race with CPU hotplug tasks and lead to problems such as trying to acquire an already destroyed timer-mutex etc.

Eg:

    __cpufreq_remove_dev()
     __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
      policy->governor->governor(policy, CPUFREQ_GOV_STOP);
       cpufreq_governor_dbs()
        case CPUFREQ_GOV_STOP:
         mutex_destroy(&cpu_cdbs->timer_mutex)
         cpu_cdbs->cur_policy = NULL;
      <PREEMPT>
    store()
     __cpufreq_set_policy()
      __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
       policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
        case CPUFREQ_GOV_LIMITS:
         mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
          if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL

So use get_online_cpus()/put_online_cpus() in the store_*() functions, to synchronize with CPU hotplug.

However, there is an additional point to note here: some parts of the CPU teardown in the cpufreq subsystem are done in the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the get/put_online_cpus() functions alone is insufficient; we should also ensure that we don't race with those latter steps in the hotplug sequence. We can easily achieve this by checking if the CPU is online before proceeding with the store, since the CPU would have been marked offline by the time the CPU_POST_DEAD notifiers are executed.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lockSrivatsa S. Bhat1-0/+3
__cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going offline. But because we destroy the kobject towards the end of the CPU offline phase, there are certain race windows where a task can try to write to a cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking that CPU offline, and this can bump up the kobject refcount, which in turn might hinder the CPU offline task from running to completion. (It can also cause other more serious problems such as trying to acquire a destroyed timer-mutex etc., depending on the exact stage of the cleanup at which the task managed to take a new refcount).

To fix the race window, we will need to synchronize those store_*() call-sites with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that in turn can cause a total deadlock because it can end up waiting for the CPU offline task to complete, with incremented refcount!

    Write to sysfs                 CPU offline task
    --------------                 ----------------
    kobj_refcnt++

                                   Acquire cpu_hotplug.lock

    get_online_cpus();

                                   Wait for kobj_refcnt to
                                   drop to zero

                     **DEADLOCK**

A simple way to avoid this problem is to perform the kobject cleanup in the CPU offline path, with the cpu_hotplug.lock *released*. That is, we can perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock released. Doing this helps us avoid deadlocks due to holding kobject refcounts and waiting on each other on the cpu_hotplug.lock.

(Note: We can't move all of the cpufreq CPU offline steps to the CPU_POST_DEAD stage, because certain things such as stopping the governors have to be done before the outgoing CPU is marked offline. So retain those parts in the CPU_DOWN_PREPARE stage itself).

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10cpufreq: Split __cpufreq_remove_dev() into two partsSrivatsa S. Bhat1-12/+53
During CPU offline, the cpufreq core invokes __cpufreq_remove_dev() to perform work such as stopping the cpufreq governor, clearing the CPU from the policy structure etc, and finally cleaning up the kobject. There are certain subtle issues related to the kobject cleanup, and it would be much easier to deal with them if we separate that part from the rest of the cleanup-work in the CPU offline phase. So split the __cpufreq_remove_dev() function into 2 parts: one that handles the kobject cleanup, and the other that handles the rest of the work. Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10cpufreq: Fix wrong time unit conversionAndreas Schwab1-1/+1
The time spent by a CPU under a given frequency is stored in jiffies unit in the cpu var cpufreq_stats_table->time_in_state[i], i being the index of the frequency. This is what is displayed in the following file on the right column:

    cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
    2301000 19835820
    2300000 3172
    [...]

Now cpufreq converts this jiffies unit delta to clock_t before returning it to the user as in the above file. And that conversion is achieved using the API cputime64_to_clock_t(). Although it accidentally works on traditional tick based cputime accounting, where cputime_t maps directly to jiffies, it doesn't work with other types of cputime accounting such as CONFIG_VIRT_CPU_ACCOUNTING_* where cputime_t can map to nsecs or any granularity preferred by the architecture.

For example we get a buggy zero delta on full dyntick configurations:

    cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
    2301000 0
    2300000 0
    [...]

Fix this by using the proper jiffies-to-clock_t conversion, jiffies_64_to_clock_t().

Reported-and-tested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
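The unit mismatch can be worked through with concrete numbers. The sketch below is a model, not the kernel helpers: `MODEL_HZ` (1000) and `MODEL_USER_HZ` (100) are assumed rates, both function names are hypothetical, and the "wrong" path assumes a configuration where the cputime unit is nanoseconds. It shows why a jiffies count fed through a nanosecond-based conversion collapses to zero.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_HZ      1000u   /* assumed tick rate (jiffies per second) */
#define MODEL_USER_HZ 100u    /* assumed clock_t rate seen by userspace */

/* Intended conversion: scale a jiffies count to clock_t ticks. */
static uint64_t jiffies_to_clock_t_model(uint64_t jiffies)
{
    assert(jiffies <= UINT64_MAX / MODEL_USER_HZ);
    return jiffies * MODEL_USER_HZ / MODEL_HZ;
}

/*
 * Wrong conversion for this data: treats the same number as
 * nanoseconds, as a cputime helper might when cputime is in nsecs.
 */
static uint64_t nsecs_to_clock_t_model(uint64_t nsecs)
{
    return nsecs / (1000000000u / MODEL_USER_HZ);
}
```

Under these assumed rates, the 3172 jiffies from the sample output correspond to 317 clock_t ticks, while reading 3172 as nanoseconds rounds down to 0.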
2013-09-10cpufreq: serialize calls to __cpufreq_governor()Viresh Kumar1-1/+6
We can't take a big lock around __cpufreq_governor() as this causes recursive locking for some cases. But calls to this routine must be serialized for every policy. Otherwise we can see some unpredictable events.

For example, consider following scenario:

    __cpufreq_remove_dev()
     __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
      policy->governor->governor(policy, CPUFREQ_GOV_STOP);
       cpufreq_governor_dbs()
        case CPUFREQ_GOV_STOP:
         mutex_destroy(&cpu_cdbs->timer_mutex)
         cpu_cdbs->cur_policy = NULL;
      <PREEMPT>
    store()
     __cpufreq_set_policy()
      __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
       policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
        case CPUFREQ_GOV_LIMITS:
         mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
          if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL

And so store() will eventually result in a crash if cur_policy is NULL at this point.

Introduce an additional variable which would guarantee serialization here.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10cpufreq: don't allow governor limits to be changed when it is disabledViresh Kumar1-2/+3
__cpufreq_governor() returns with -EBUSY when governor is already stopped and we try to stop it again, but when it is stopped we must not allow calls to CPUFREQ_GOV_LIMITS event as well. This patch adds this check in __cpufreq_governor(). Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
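The added check can be modelled as a small state machine. This is a sketch under assumptions: `enum gov_event`, `struct policy_model` and `governor_event_model()` are hypothetical, and only the stopped-state behaviour named in the commit message is modelled (a stopped governor rejects both a second stop and a limits update with -EBUSY).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum gov_event { GOV_START, GOV_STOP, GOV_LIMITS };

/* Hypothetical policy that only tracks whether its governor runs. */
struct policy_model {
    bool governor_enabled;
};

/*
 * Dispatches an event to the (modelled) governor. While the governor
 * is stopped, STOP and LIMITS are both refused with -EBUSY.
 */
static int governor_event_model(struct policy_model *policy,
                                enum gov_event event)
{
    assert(policy != NULL);

    if (!policy->governor_enabled &&
        (event == GOV_STOP || event == GOV_LIMITS))
        return -EBUSY;

    if (event == GOV_START)
        policy->governor_enabled = true;
    else if (event == GOV_STOP)
        policy->governor_enabled = false;

    return 0;
}
```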