summaryrefslogtreecommitdiff
path: root/drivers/cpufreq
AgeCommit message (Collapse)AuthorFilesLines
2016-03-25Merge tag 'pm+acpi-4.6-rc1-2' of ↵Linus Torvalds5-118/+197
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management and ACPI updates from Rafael Wysocki: "The second batch of power management and ACPI updates for v4.6. Included are fixups on top of the previous PM/ACPI pull request and other material that didn't make into it but still should go into 4.6. Among other things, there's a fix for an intel_pstate driver issue uncovered by recent cpufreq changes, a workaround for a boot hang on Skylake-H related to the handling of deep C-states by the platform and a PCI/ACPI fix for the handling of IO port resources on non-x86 architectures plus some new device IDs and similar. Specifics: - Fix for an intel_pstate driver issue related to the handling of MSR updates uncovered by the recent cpufreq rework (Rafael Wysocki). - cpufreq core cleanups related to starting governors and frequency synchronization during resume from system suspend and a locking fix for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran). - acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang, Michael Neuling, Richard Cochran, Shilpasri Bhat). - intel_idle driver update preventing some Skylake-H systems from hanging during initialization by disabling deep C-states mishandled by the platform in the problematic configurations (Len Brown). - Intel Xeon Phi Processor x200 support for intel_idle (Dasaratharaman Chandramouli). - cpuidle menu governor updates to make it always honor PM QoS latency constraints (and prevent C1 from being used as the fallback C-state on x86 when they are set below its exit latency) and to restore the previous behavior to fall back to C1 if the next timer event is set far enough in the future that was changed in 4.4 which led to an energy consumption regression (Rik van Riel, Rafael Wysocki). - New device ID for a future AMD UART controller in the ACPI driver for AMD SoCs (Wang Hongcheng). - Rockchip rk3399 support for the rockchip-io-domain adaptive voltage scaling (AVS) driver (David Wu). - ACPI PCI resources management fix for the handling of IO space resources on architectures where the IO space is memory mapped (IA64 and ARM64) broken by the introduction of common ACPI resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi). - Fix for the ACPI backend of the generic device properties API to make it parse non-device (data node only) children of an ACPI device correctly (Irina Tirdea). - Fixes for the handling of global suspend flags (introduced in 4.4) during hibernation and resume from it (Lukas Wunner). - Support for obtaining configuration information from Device Trees in the PM clocks framework (Jon Hunter). - ACPI _DSM helper code and devfreq framework cleanups (Colin Ian King, Geert Uytterhoeven)" * tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits) PM / AVS: rockchip-io: add io selectors and supplies for rk3399 intel_idle: Support for Intel Xeon Phi Processor x200 Product Family intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled ACPI / PM: Runtime resume devices when waking from hibernate PM / sleep: Clear pm_suspend_global_flags upon hibernate cpufreq: governor: Always schedule work on the CPU running update cpufreq: Always update current frequency before startig governor cpufreq: Introduce cpufreq_update_current_freq() cpufreq: Introduce cpufreq_start_governor() cpufreq: powernv: Add sysfs attributes to show throttle stats cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static PCI: ACPI: IA64: fix IO port generic range check ACPI / util: cast data to u64 before shifting to fix sign extension cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path cpuidle: menu: Fall back to polling if next timer event is near cpufreq: acpi-cpufreq: Clean up hot plug notifier callback intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts cpufreq: Make cpufreq_quick_get() safe to call ACPI / property: fix data node parsing in acpi_get_next_subnode() ACPI / APD: Add device HID for future AMD UART controller ...
2016-03-25Merge branches 'pm-cpufreq' and 'pm-cpuidle'Rafael J. Wysocki5-118/+197
* pm-cpufreq: cpufreq: governor: Always schedule work on the CPU running update cpufreq: Always update current frequency before startig governor cpufreq: Introduce cpufreq_update_current_freq() cpufreq: Introduce cpufreq_start_governor() cpufreq: powernv: Add sysfs attributes to show throttle stats cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path cpufreq: acpi-cpufreq: Clean up hot plug notifier callback intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts cpufreq: Make cpufreq_quick_get() safe to call * pm-cpuidle: intel_idle: Support for Intel Xeon Phi Processor x200 Product Family intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled cpuidle: menu: Fall back to polling if next timer event is near cpuidle: menu: use high confidence factors only when considering polling
2016-03-23cpufreq: governor: Always schedule work on the CPU running updateRafael J. Wysocki1-1/+1
Modify dbs_irq_work() to always schedule the process-context work on the current CPU which also ran the dbs_update_util_handler() that the irq_work being handled came from. This causes the entire frequency update handling (involving the "ondemand" or "conservative" governors) to be carried out by the CPU whose frequency is to be updated and reduces the overall amount of inter-CPU noise related to cpufreq. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-23cpufreq: Always update current frequency before startig governorRafael J. Wysocki1-11/+3
Make policy->cur match the current frequency returned by the driver's ->get() callback before starting the governor in case they went out of sync in the meantime and drop the piece of code attempting to resync policy->cur with the real frequency of the boot CPU from cpufreq_resume() as it serves no purpose any more (and it's racy and super-ugly anyway). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-23cpufreq: Introduce cpufreq_update_current_freq()Rafael J. Wysocki1-9/+19
Move the part of cpufreq_update_policy() that obtains the current frequency from the driver and updates policy->cur if necessary to a separate function, cpufreq_get_current_freq(). That should not introduce functional changes and subsequent change set will need it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-23cpufreq: Introduce cpufreq_start_governor()Rafael J. Wysocki1-22/+22
Starting a governor in cpufreq always follows the same pattern involving two calls to cpufreq_governor(), one with the event argument set to CPUFREQ_GOV_START and one with that argument set to CPUFREQ_GOV_LIMITS. Introduce cpufreq_start_governor() that will carry out those two operations and make all places where governors are started use it. That slightly modifies the behavior of cpufreq_set_policy() which now also will go back to the old governor if the second call to cpufreq_governor() (the one with event equal to CPUFREQ_GOV_LIMITS) fails, but that really is how it should work in the first place. Also cpufreq_resume() will now pring an error message if the CPUFREQ_GOV_LIMITS call to cpufreq_governor() fails, but that makes it follow cpufreq_add_policy_cpu() and cpufreq_offline() in that respect. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-23cpufreq: powernv: Add sysfs attributes to show throttle statsShilpasri G Bhat1-2/+72
Create sysfs attributes to export throttle information in /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats directory. The newly added sysfs files are as follows: 1)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat 2)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub-turbo_stat 3)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/unthrottle 4)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/powercap 5)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overtemp 6)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/supply_fault 7)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overcurrent 8)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/occ_reset Detailed explanation of each attribute is added to Documentation/ABI/testing/sysfs-devices-system-cpu Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-23cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access staticJisheng Zhang1-6/+6
These frequency register read/write operations' implementations for the given processor (Intel/AMD MSR access or I/O port access) are only used internally in acpi-cpufreq, so make them static. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-22cpufreq: powernv: Define per_cpu chip pointer to optimize hot-pathMichael Neuling1-33/+17
Commit 96c4726f01cd "cpufreq: powernv: Remove cpu_to_chip_id() from hot-path" introduced a 'core_to_chip_map' array to cache the chip-ids of all cores. Replace this with a per-CPU variable that stores the pointer to the chip-array. This removes the linear lookup and provides a neater and simpler solution. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-21Merge tag 'armsoc-drivers' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC driver updates from Arnd Bergmann: "Driver updates for ARM SoCs, these contain various things that touch the drivers/ directory but got merged through arm-soc for practical reasons: - Rockchip rk3368 gains power domain support - Small updates for the ARM spmi driver - The Atmel PMC driver saw a larger rework, touching both arch/arm/mach-at91 and drivers/clk/at91 - All reset controller driver changes alway get merged through arm-soc, though this time the largest change is the addition of a MIPS pistachio reset driver - One bugfix for the NXP (formerly Freescale) i.MX weim bus driver" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (43 commits) bus: imx-weim: Take the 'status' property value into account clk: at91: remove useless includes clk: at91: pmc: remove useless capacities handling clk: at91: pmc: drop at91_pmc_base usb: gadget: atmel: access the PMC using regmap ARM: at91: remove useless includes and function prototypes ARM: at91: pm: move idle functions to pm.c ARM: at91: pm: find and remap the pmc ARM: at91: pm: simply call at91_pm_init clk: at91: pmc: move pmc structures to C file clk: at91: pmc: merge at91_pmc_init in atmel_pmc_probe clk: at91: remove IRQ handling and use polling clk: at91: make use of syscon/regmap internally clk: at91: make use of syscon to share PMC registers in several drivers hwmon: (scpi) add energy meter support firmware: arm_scpi: add support for 64-bit sensor values firmware: arm_scpi: decrease Tx timeout to 20ms firmware: arm_scpi: fix send_message and sensor_get_value for big-endian reset: sti: Make reset_control_ops const reset: zynq: Make reset_control_ops const ...
2016-03-21Merge tag 'armsoc-soc' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC platform updates from Arnd Bergmann: "Newly added support for additional SoCs: - Axis Artpec-6 SoC family - Allwinner A83T SoC - Mediatek MT7623 - NXP i.MX6QP SoC - ST Microelectronics stm32f469 microcontroller New features: - SMP support for Mediatek mt2701 - Big-endian support for NXP i.MX - DaVinci now uses the new DMA engine dma_slave_map - OMAP now uses the new DMA engine dma_slave_map - earlyprintk support for palmchip uart on mach-tango - delay timer support for orion Other: - Exynos PMU driver moved out to drivers/soc/ - Various smaller updates for Renesas, Xilinx, PXA, AT91, OMAP, uniphier" * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (83 commits) ARM: uniphier: rework SMP code to support new System Bus binding ARM: uniphier: add missing of_node_put() ARM: at91: avoid defining CONFIG_* symbols in source code ARM: DRA7: hwmod: Add data for eDMA tpcc, tptc0, tptc1 ARM: imx: Make reset_control_ops const ARM: imx: Do L2 errata only if the L2 cache isn't enabled ARM: imx: select ARM_CPU_SUSPEND only for imx6 dmaengine: pxa_dma: fix the maximum requestor line ARM: alpine: select the Alpine MSI controller driver ARM: pxa: add the number of DMA requestor lines dmaengine: mmp-pdma: add number of requestors dma: mmp_pdma: Add the #dma-requests DT property documentation ARM: OMAP2+: Add rtc hwmod configuration for ti81xx ARM: s3c24xx: Avoid warning for inb/outb ARM: zynq: Move early printk virtual address to vmalloc area ARM: DRA7: hwmod: Add custom reset handler for PCIeSS ARM: SAMSUNG: Remove unused register offset definition ARM: EXYNOS: Cleanup header files inclusion drivers: soc: samsung: Enable COMPILE_TEST MAINTAINERS: Add maintainers entry for drivers/soc/samsung ...
2016-03-20cpufreq: acpi-cpufreq: Clean up hot plug notifier callbackRichard Cochran1-2/+4
This driver has two issues. First, it tries to fiddle with the hot plugged CPU's MSR on the UP_PREPARE event, at a time when the CPU is not yet online. Second, the driver sets the "boost-disable" bit for a CPU when going down, but does not clear the bit again if the CPU comes up again due to DOWN_FAILED. This patch fixes the issues by changing the driver to react to the ONLINE/DOWN_FAILED events instead of UP_PREPARE. As an added benefit, the driver also becomes symmetric with respect to the hot plug mechanism. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-20intel_pstate: Do not call wrmsrl_on_cpu() with disabled interruptsRafael J. Wysocki1-30/+43
After commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) wrmsrl_on_cpu() cannot be called in the intel_pstate_adjust_busy_pstate() path as that is executed with disabled interrupts. However, atom_set_pstate() called from there via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in smp_call_function_single(). The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is because intel_pstate_set_pstate() calling it is also invoked during the initialization and cleanup of the driver and in those cases it is not guaranteed to be run on the CPU that is being updated. However, in the case when intel_pstate_set_pstate() is called by intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update the register safely. Moreover, intel_pstate_set_pstate() already contains code that only is executed if the function is called by intel_pstate_adjust_busy_pstate() and there is a special argument passed to it because of that. To fix the problem at hand, rearrange the code taking the above observations into account. First, replace the ->set() callback in struct pstate_funcs with a ->get_val() one that will return the value to be written to the IA32_PERF_CTL MSR without updating the register. Second, split intel_pstate_set_pstate() into two functions, intel_pstate_update_pstate() to be called by intel_pstate_adjust_busy_pstate() that will contain all of the intel_pstate_set_pstate() code which only needs to be executed in that case and will use wrmsrl() to update the MSR (after obtaining the value to write to it from the ->get_val() callback), and intel_pstate_set_min_pstate() to be invoked during the initialization and cleanup that will set the P-state to the minimum one and will update the MSR using wrmsrl_on_cpu(). Finally, move the code shared between intel_pstate_update_pstate() and intel_pstate_set_min_pstate() to a new static inline function intel_pstate_record_pstate() and make them both call it. Of course, that unifies the handling of the IA32_PERF_CTL MSR writes between Atom and Core. Fixes: a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) Reported-and-tested-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-18cpufreq: Make cpufreq_quick_get() safe to callRichard Cochran1-2/+10
The function, cpufreq_quick_get, accesses the global 'cpufreq_driver' and its fields without taking the associated lock, cpufreq_driver_lock. Without the locking, nothing guarantees that 'cpufreq_driver' remains consistent during the call. This patch fixes the issue by taking the lock before accessing the data structure. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-14Merge branch 'pm-cpufreq'Rafael J. Wysocki15-1676/+1344
* pm-cpufreq: (94 commits) intel_pstate: Do not skip samples partially intel_pstate: Remove freq calculation from intel_pstate_calc_busy() intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance() intel_pstate: Optimize calculation for max/min_perf_adj intel_pstate: Remove extra conversions in pid calculation cpufreq: Move scheduler-related code to the sched directory Revert "cpufreq: postfix policy directory with the first CPU in related_cpus" cpufreq: Reduce cpufreq_update_util() overhead a bit cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is set cpufreq: Remove 'policy->governor_enabled' cpufreq: Rename __cpufreq_governor() to cpufreq_governor() cpufreq: Relocate handle_update() to kill its declaration cpufreq: governor: Drop unnecessary checks from show() and store() cpufreq: governor: Fix race in dbs_update_util_handler() cpufreq: governor: Make gov_set_update_util() static cpufreq: governor: Narrow down the dbs_data_mutex coverage cpufreq: governor: Make dbs_data_mutex static cpufreq: governor: Relocate definitions of tuners structures cpufreq: governor: Move per-CPU data to the common code cpufreq: governor: Make governor private data per-policy ...
2016-03-11intel_pstate: Do not skip samples partiallyRafael J. Wysocki1-5/+7
If the current value of MPERF or the current value of TSC is the same as the previous one, respectively, intel_pstate_sample() bails out early and skips the sample. However, intel_pstate_adjust_busy_pstate() is still called in that case which is not correct, so modify intel_pstate_sample() to return a bool value indicating whether or not the sample has been taken and use it to decide whether or not to call intel_pstate_adjust_busy_pstate(). While at it, remove redundant parentheses from the MPERF/TSC check in intel_pstate_sample(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-03-11intel_pstate: Remove freq calculation from intel_pstate_calc_busy()Philippe Longepe1-8/+8
Use a helper function to compute the average pstate and call it only where it is needed (only when tracing or in intel_pstate_get). Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-11intel_pstate: Move intel_pstate_calc_busy() into ↵Philippe Longepe1-3/+2
get_target_pstate_use_performance() The cpu_load algorithm doesn't need to invoke intel_pstate_calc_busy(), so move that call from intel_pstate_sample() to get_target_pstate_use_performance(). Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-11intel_pstate: Optimize calculation for max/min_perf_adjPhilippe Longepe1-2/+2
mul_fp(int_tofp(A), B) expands to: ((A << FRAC_BITS) * B) >> FRAC_BITS, so the same result can be obtained via simple multiplication A * B. Apply this observation to max_perf * limits->max_perf and max_perf * limits->min_perf in intel_pstate_get_min_max()." Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-11intel_pstate: Remove extra conversions in pid calculationPhilippe Longepe1-4/+4
pid->setpoint and pid->deadband can be initialized in fixed point, so we can avoid the int_tofp in pid_calc. Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-10Merge branch 'pm-cpufreq-governor' into pm-cpufreqRafael J. Wysocki9-1142/+903
2016-03-10cpufreq: Move scheduler-related code to the sched directoryRafael J. Wysocki2-53/+1
Create cpufreq.c under kernel/sched/ and move the cpufreq code related to the scheduler to that file and to sched.h. Redefine cpufreq_update_util() as a static inline function to avoid function calls at its call sites in the scheduler code (as suggested by Peter Zijlstra). Also move the definition of struct update_util_data and declaration of cpufreq_set_update_util_data() from include/linux/cpufreq.h to include/linux/sched.h. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-03-09Revert "cpufreq: postfix policy directory with the first CPU in related_cpus"Viresh Kumar1-11/+10
Revert commit 3510fac45492 (cpufreq: postfix policy directory with the first CPU in related_cpus). Earlier, the policy->kobj was added to the kobject core, before ->init() callback was called for the cpufreq drivers. Which allowed those drivers to add or remove, driver dependent, sysfs files/directories to the same kobj from their ->init() and ->exit() callbacks. That isn't possible anymore after commit 3510fac45492. Now, there is no other clean alternative that people can adopt. Its better to revert the earlier commit to allow cpufreq drivers to create/remove sysfs files from ->init() and ->exit() callbacks. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09cpufreq: Reduce cpufreq_update_util() overhead a bitRafael J. Wysocki3-11/+20
Use the observation that cpufreq_update_util() is only called by the scheduler with rq->lock held, so the callers of cpufreq_set_update_util_data() can use synchronize_sched() instead of synchronize_rcu() to wait for cpufreq_update_util() to complete. Moreover, if they are updated to do that, rcu_read_(un)lock() calls in cpufreq_update_util() might be replaced with rcu_read_(un)lock_sched(), respectively, but those aren't really necessary, because the scheduler calls that function from RCU-sched read-side critical sections already. In addition to that, if cpufreq_set_update_util_data() checks the func field in the struct update_util_data before setting the per-CPU pointer to it, the data->func check may be dropped from cpufreq_update_util() as well. Make the above changes to reduce the overhead from cpufreq_update_util() in the scheduler paths invoking it and to make the cleanup after removing its callbacks less heavy-weight somewhat. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-03-09cpufreq: Select IRQ_WORK if CPU_FREQ_GOV_COMMON is setRafael J. Wysocki1-1/+1
Commit 0eb463be3436 (cpufreq: governor: Replace timers with utilization update callbacks) made CPU_FREQ select IRQ_WORK, but that's not necessary, as it is sufficient for IRQ_WORK to be selected by CPU_FREQ_GOV_COMMON, so modify the cpufreq Kconfig to that effect. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: Remove 'policy->governor_enabled'Viresh Kumar1-17/+0
The entire sequence of events (like INIT/START or STOP/EXIT) for which cpufreq_governor() is called, is guaranteed to be protected by policy->rwsem now. The additional checks that were added earlier (as we were forced to drop policy->rwsem before calling cpufreq_governor() for EXIT event), aren't required anymore. Over that, they weren't sufficient really. They just take care of START/STOP events, but not INIT/EXIT and the state machine was never maintained properly by them. Kill the unnecessary checks and policy->governor_enabled field. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09cpufreq: Rename __cpufreq_governor() to cpufreq_governor()Viresh Kumar1-23/+21
The __ at the beginning of the routine aren't really necessary at all. Rename it to cpufreq_governor() instead. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09cpufreq: Relocate handle_update() to kill its declarationViresh Kumar1-10/+9
handle_update() is declared at the top of the file as its user appear before its definition. Relocate the routine to get rid of this. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09cpufreq: governor: Drop unnecessary checks from show() and store()Viresh Kumar1-7/+3
The show() and store() routines in the cpufreq-governor core don't need to check if the struct governor_attr they want to use really provides the callbacks they need as expected (if that's not the case, it means a bug in the code anyway), so change them to avoid doing that. Also change the error value to -EBUSY, if the governor is getting removed and we aren't allowed to store any more changes. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09cpufreq: governor: Fix race in dbs_update_util_handler()Rafael J. Wysocki1-5/+16
There is a scenario that may lead to undesired results in dbs_update_util_handler(). Namely, if two CPUs sharing a policy enter the funtion at the same time, pass the sample delay check and then one of them is stalled until dbs_work_handler() (queued up by the other CPU) clears the work counter, it may update the work counter and queue up another work item prematurely. To prevent that from happening, use the observation that the CPU queuing up a work item in dbs_update_util_handler() updates the last sample time. This means that if another CPU was stalling after passing the sample delay check and now successfully updated the work counter as a result of the race described above, it will see the new value of the last sample time which is different from what it used in the sample delay check before. If that happens, the sample delay check passed previously is not valid any more, so the CPU should not continue. Fixes: f17cbb53783c (cpufreq: governor: Avoid atomic operations in hot paths) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Make gov_set_update_util() staticRafael J. Wysocki1-3/+2
The gov_set_update_util() routine is only used internally by the common governor code and it doesn't need to be exported, so make it static. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Narrow down the dbs_data_mutex coverageRafael J. Wysocki1-23/+23
Since cpufreq_governor_dbs() is now always called with policy->rwsem held, it cannot be executed twice in parallel for the same policy. Thus it is not necessary to hold dbs_data_mutex around the invocations of cpufreq_governor_start/stop/limits() from it as those functions never modify any data that can be shared between different policies. However, cpufreq_governor_dbs() may be executed twice in parallal for different policies using the same gov->gdbs_data object and dbs_data_mutex is still necessary to protect that object against concurrent updates. For this reason, narrow down the dbs_data_mutex locking to cpufreq_governor_init/exit() where it is needed and rename the mutex to gov_dbs_data_mutex to reflect its purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Make dbs_data_mutex staticRafael J. Wysocki2-3/+1
That mutex is only used by cpufreq_governor_dbs() and it doesn't need to be exported to modules, so make it static and drop the export incantation. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Relocate definitions of tuners structuresRafael J. Wysocki3-10/+9
Move the definitions of struct od_dbs_tuners and struct cs_dbs_tuners from the common governor header to the ondemand and conservative governor code, respectively, as they don't need to be in the common header any more. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Move per-CPU data to the common codeRafael J. Wysocki4-59/+25
After previous changes there is only one piece of code in the ondemand governor making references to per-CPU data structures, but it can be easily modified to avoid doing that, so modify it accordingly and move the definition of per-CPU data used by the ondemand and conservative governors to the common code. Next, change that code to access the per-CPU data structures directly rather than via a governor callback. This causes the ->get_cpu_cdbs governor callback to become unnecessary, so drop it along with the macro and function definitions related to it. Finally, drop the definitions of struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s that aren't necessary any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Make governor private data per-policyRafael J. Wysocki6-28/+87
Some fields in struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s are only used for a limited set of CPUs. Namely, if a policy is shared between multiple CPUs, those fields will only be used for one of them (policy->cpu). This means that they really are per-policy rather than per-CPU and holding room for them in per-CPU data structures is generally wasteful. Also moving those fields into per-policy data structures will allow some significant simplifications to be made going forward. For this reason, introduce struct cs_policy_dbs_info and struct od_policy_dbs_info to hold those fields. Define each of the new structures as an extension of struct policy_dbs_info (such that struct policy_dbs_info is embedded in each of them) and introduce new ->alloc and ->free governor callbacks to allocate and free those structures, respectively, such that ->alloc() will return a pointer to the struct policy_dbs_info embedded in the allocated data structure and ->free() will take that pointer as its argument. With that, modify the code accessing the data fields in question in per-CPU data objects to look for them in the new structures via the struct policy_dbs_info pointer available to it and drop them from struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: ondemand: Rework the handling of powersave bias updatesRafael J. Wysocki1-17/+13
The ondemand_powersave_bias_init() function used for resetting data fields related to the powersave bias tunable of the ondemand governor works by walking all of the online CPUs in the system and updating the od_cpu_dbs_info_s structures for all of them. However, if governor tunables are per policy, the update should not touch the CPUs that are not associated with the given dbs_data. Moreover, since the data fields in question are only ever used for policy->cpu in each policy governed by ondemand, the update can be limited to those specific CPUs. Rework the code to take the above observations into account. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Fix CPU load information updates via ->storeRafael J. Wysocki4-28/+40
The ->store() callbacks of some tunable sysfs attributes of the ondemand and conservative governors trigger immediate updates of the CPU load information for all CPUs "governed" by the given dbs_data by walking the cpu_dbs_info structures for all online CPUs in the system and updating them. This is questionable for two reasons. First, it may lead to a lot of extra overhead on a system with many CPUs if the given dbs_data is only associated with a few of them. Second, if governor tunables are per-policy, the CPUs associated with the other sets of governor tunables should not be updated. To address this issue, use the observation that in all of the places in question the update operation may be carried out in the same way (because all of the tunables involved are now located in struct dbs_data and readily available to the common code) and make the code in those places invoke the same (new) helper function that will carry out the update correctly. That new function always checks the ignore_nice_load tunable value and updates the CPUs' prev_cpu_nice data fields if that's set, which wasn't done by the original code in store_io_is_busy(), but it should have been done in there too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: ondemand: Drop one more callback from struct od_opsRafael J. Wysocki2-3/+1
The ->powersave_bias_init_cpu callback in struct od_ops is only used in one place and that invocation may be replaced with a direct call to the function pointed to by that callback, so change the code accordingly and drop the callback. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Drop unused governor callback and data fieldsRafael J. Wysocki3-19/+1
After some previous changes, the ->get_cpu_dbs_info_s governor callback and the "governor" field in struct dbs_governor (whose value represents the governor type) are not used any more, so drop them. Also drop the unused gov_ops field from struct dbs_governor. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Add a ->start callback for governorsRafael J. Wysocki4-14/+22
To avoid having to check the governor type explicitly in the common code in order to initialize data structures specific to the governor type properly, add a ->start callback to struct dbs_governor and use it to initialize those data structures for the ondemand and conservative governors. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Move io_is_busy to struct dbs_dataRafael J. Wysocki3-26/+15
The io_is_busy governor tunable is only used by the ondemand governor and is located in the ondemand-specific data structure, but it is looked at by the common governor code that has to do ugly things to get to that value, so move it to struct dbs_data and modify ondemand accordingly. Since the conservative governor never touches that field, it will be always 0 for that governor and it won't have any effect on the results of computations in that case. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Close dbs_data update race conditionRafael J. Wysocki1-1/+1
It is possible for a dbs_data object to be updated after its usage counter has become 0. That may happen if governor_store() runs (via a govenor tunable sysfs attribute write) in parallel with cpufreq_governor_exit() called for the last cpufreq policy associated with the dbs_data in question. In that case, if governor_store() acquires dbs_data->mutex right after cpufreq_governor_exit() has released it, the ->store() callback invoked by it may operate on dbs_data with no users. Although sysfs will cause the kobject_put() in cpufreq_governor_exit() to block until governor_store() has returned, that situation may lead to some unexpected results, depending on the implementation of the ->store callback, and therefore it should be avoided. To that end, modify governor_store() to check the dbs_data's usage count before invoking the ->store() callback and return an error if it is 0 at that point. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: ondemand: Drop unused callback from struct od_opsRafael J. Wysocki2-2/+0
The ->freq_increase callback in struct od_ops is never invoked, so drop it. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: ondemand: Simplify od_update() slightlyRafael J. Wysocki1-8/+5
Drop some lines of code from od_update() by arranging the statements in there in a more logical way. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Use microseconds in sample delay computationsRafael J. Wysocki4-32/+17
Do not convert microseconds to jiffies and the other way around in governor computations related to the sampling rate and sample delay and drop delay_for_sampling_rate() which isn't of any use then. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: ondemand: Simplify conditionals in od_dbs_timer()Rafael J. Wysocki1-13/+11
Reduce the indentation level in the conditionals in od_dbs_timer() and drop the delay variable from it. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Move rate_mult to struct policy_dbsRafael J. Wysocki3-25/+26
The rate_mult field in struct od_cpu_dbs_info_s is used by the code shared with the conservative governor and to access it that code has to do an ugly governor type check. However, first of all it is ever only used for policy->cpu, so it is per-policy rather than per-CPU and second, it is initialized to 1 by cpufreq_governor_start(), so if the conservative governor never modifies it, it will have no effect on the results of any computations. For these reasons, move rate_mult to struct policy_dbs_info (as a common field). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Reset sample delay in store_sampling_rate()Rafael J. Wysocki1-12/+4
If store_sampling_rate() updates the sample delay when the ondemand governor is in the middle of its high/low dance (OD_SUB_SAMPLE sample type is set), the governor will still do the bottom half of the previous sample which may take too much time. To prevent that from happening, change store_sampling_rate() to always reset the sample delay to 0 which also is consistent with the new behavior of cpufreq_governor_limits(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09cpufreq: governor: Get rid of the ->gov_check_cpu callbackRafael J. Wysocki4-32/+27
The way the ->gov_check_cpu governor callback is used by the ondemand and conservative governors is not really straightforward. Namely, the governor calls dbs_check_cpu() that updates the load information for the policy and the invokes ->gov_check_cpu() for the governor. To get rid of that entanglement, notice that cpufreq_governor_limits() doesn't need to call dbs_check_cpu() directly. Instead, it can simply reset the sample delay to 0 which will cause a sample to be taken immediately. The result of that is practically equivalent to calling dbs_check_cpu() except that it will trigger a full update of governor internal state and not just the ->gov_check_cpu() part. Following that observation, make cpufreq_governor_limits() reset the sample delay and turn dbs_check_cpu() into a function that will simply evaluate the load and return the result called dbs_update(). That function can now be called by governors from the routines that previously were pointed to by ->gov_check_cpu and those routines can be called directly by each governor instead of dbs_check_cpu(). This way ->gov_check_cpu becomes unnecessary, so drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>