kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2021-02-08	cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there	Rafael J. Wysocki	1	-0/+8
	If the maximum performance level taken for computing the arch_max_freq_ratio value used in the x86 scale-invariance code is higher than the one corresponding to the cpuinfo.max_freq value coming from the acpi_cpufreq driver, the scale-invariant utilization falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly faster, which causes the schedutil governor to select a frequency below cpuinfo.max_freq. That frequency corresponds to a frequency table entry below the maximum performance level necessary to get to the "boost" range of CPU frequencies which prevents "boost" frequencies from being used in some workloads. While this issue is related to scale-invariance, it may be amplified by commit db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") from the 5.10 development cycle which made it extremely easy to default to schedutil even if the preferred driver is acpi_cpufreq as long as intel_pstate is built too, because the mere presence of the latter effectively removes the ondemand governor from the defaults. Distro kernels are likely to include both intel_pstate and acpi_cpufreq on x86, so their users who cannot use intel_pstate or choose to use acpi_cpufreq may easily be affectecd by this issue. If CPPC is available, it can be used to address this issue by extending the frequency tables created by acpi_cpufreq to cover the entire available frequency range (including "boost" frequencies) for each CPU, but if CPPC is not there, acpi_cpufreq has no idea what the maximum "boost" frequency is and the frequency tables created by it cannot be extended in a meaningful way, so in that case make it ask the arch scale-invariance code to to use the "nominal" performance level for CPU utilization scaling in order to avoid the issue at hand. Fixes: db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2021-02-08	cpufreq: ACPI: Extend frequency tables to cover boost frequencies	Rafael J. Wysocki	1	-12/+95
	A severe performance regression on AMD EPYC processors when using the schedutil scaling governor was discovered by Phoronix.com and attributed to the following commits: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems") 976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC") The source of the problem is that the maximum performance level taken for computing the arch_max_freq_ratio value used in the x86 scale- invariance code is higher than the one corresponding to the cpuinfo.max_freq value coming from the acpi_cpufreq driver. This effectively causes the scale-invariant utilization to fall below 100% even if the CPU runs at cpuinfo.max_freq or slightly faster, so the schedutil governor selects a frequency below cpuinfo.max_freq then. That frequency corresponds to a frequency table entry below the maximum performance level necessary to get to the "boost" range of CPU frequencies. However, if the cpuinfo.max_freq value coming from acpi_cpufreq was higher, the schedutil governor would select higher frequencies which in turn would allow acpi_cpufreq to set more adequate performance levels and to get to the "boost" range of CPU frequencies more often. This issue affects any systems where acpi_cpufreq is used and the "boost" (or "turbo") frequencies are enabled, not just AMD EPYC. Moreover, commit db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") from the 5.10 development cycle made it extremely easy to default to schedutil even if the preferred driver is acpi_cpufreq as long as intel_pstate is built too, because the mere presence of the latter effectively removes the ondemand governor from the defaults. Distro kernels are likely to include both intel_pstate and acpi_cpufreq on x86, so their users who cannot use intel_pstate or choose to use acpi_cpufreq may easily be affectecd by this issue. To address this issue, extend the frequency table constructed by acpi_cpufreq for each CPU to cover the entire range of available frequencies (including the "boost" ones) if CPPC is available and indicates that "boost" (or "turbo") frequencies are enabled. That causes cpuinfo.max_freq to become the maximum "boost" frequency of the given CPU (instead of the maximum frequency returned by the ACPI _PSS object that corresponds to the "nominal" performance level). Fixes: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems") Fixes: 976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC") Fixes: db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate") Link: https://www.phoronix.com/scan.php?page=article&item=linux511-amd-schedutil&num=1 Link: https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/ Reported-by: Michael Larabel <Michael@phoronix.com> Diagnosed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz> Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Tested-by: Michael Larabel <Michael@phoronix.com>
2021-01-07	cpufreq: intel_pstate: remove obsolete functions	Lukas Bulwahn	1	-10/+0
	percent_fp() was used in intel_pstate_pid_reset(), which was removed in commit 9d0ef7af1f2d ("cpufreq: intel_pstate: Do not use PID-based P-state selection") and hence, percent_fp() is unused since then. percent_ext_fp() was last used in intel_pstate_update_perf_limits(), which was refactored in commit 1a4fe38add8b ("cpufreq: intel_pstate: Remove max/min fractions to limit performance"), and hence, percent_ext_fp() is unused since then. make CC=clang W=1 points us those unused functions: drivers/cpufreq/intel_pstate.c:79:23: warning: unused function 'percent_fp' [-Wunused-function] static inline int32_t percent_fp(int percent) ^ drivers/cpufreq/intel_pstate.c:94:23: warning: unused function 'percent_ext_fp' [-Wunused-function] static inline int32_t percent_ext_fp(int percent) ^ Remove those obsolete functions. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-07	cpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get()	Colin Ian King	1	-6/+3
	Currently there is an unlikely case where cpufreq_cpu_get() returns a NULL policy and this will cause a NULL pointer dereference later on. Fix this by passing the policy to transition_frequency_fidvid() from the caller and hence eliminating the need for the cpufreq_cpu_get() and cpufreq_cpu_put(). Thanks to Viresh Kumar for suggesting the fix. Addresses-Coverity: ("Dereference null return") Fixes: b43a7ffbf33b ("cpufreq: Notify all policy->cpus in cpufreq_notify_transition()") Suggested-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-01-07	cpufreq: intel_pstate: Use HWP capabilities in intel_cpufreq_adjust_perf()	Rafael J. Wysocki	1	-2/+3
	If turbo P-states cannot be used, either due to the configuration of the processor, or because intel_pstate is not allowed to used them, the maximum available P-state with HWP enabled corresponds to the HWP_CAP.GUARANTEED value which is not static. It can be adjusted by an out-of-band agent or during an Intel Speed Select performance level change, so long as it remains less than or equal to HWP_CAP.MAX. However, if turbo P-states cannot be used, intel_cpufreq_adjust_perf() always uses pstate.max_pstate (set during the initialization of the driver only) as the maximum available P-state, so it may miss a change of the HWP_CAP.GUARANTEED value. Prevent that from happening by modifyig intel_cpufreq_adjust_perf() to always read the "guaranteed" and "maximum turbo" performance levels from the cached HWP_CAP value. Fixes: a365ab6b9dfb ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2020-12-30	cpufreq: intel_pstate: Fix fast-switch fallback path	Rafael J. Wysocki	1	-1/+0
	When sugov_update_single_perf() falls back to the "frequency" path due to the missing scale-invariance, it will call cpufreq_driver_fast_switch() via sugov_fast_switch() and the driver's ->fast_switch() callback will be invoked, so it must not be NULL. However, after commit a365ab6b9dfb ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback") intel_pstate sets ->fast_switch() to NULL when it is going to use intel_cpufreq_adjust_perf(), which is a mistake, because on x86 the scale-invariance may be turned off dynamically, so modify it to retain the original ->adjust_perf() callback pointer. Fixes: a365ab6b9dfb ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback") Reported-by: Kenneth R. Crudup <kenny@panix.com> Tested-by: Kenneth R. Crudup <kenny@panix.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-22	Merge branch 'pm-cpufreq'	Rafael J. Wysocki	3	-103/+227
	* pm-cpufreq: cpufreq: intel_pstate: Use most recent guaranteed performance values cpufreq: intel_pstate: Implement the ->adjust_perf() callback cpufreq: Add special-purpose fast-switching callback for drivers cpufreq: schedutil: Add util to struct sg_cpu cppc_cpufreq: replace per-cpu data array with a list cppc_cpufreq: expose information on frequency domains cppc_cpufreq: clarify support for coordination types cppc_cpufreq: use policy->cpu as driver of frequency setting ACPI: processor: fix NONE coordination for domain mapping failure ACPI: processor: Drop duplicate setting of shared_cpu_map
2020-12-21	cpufreq: intel_pstate: Use most recent guaranteed performance values	Rafael J. Wysocki	1	-3/+13
	When turbo has been disabled by the BIOS, but HWP_CAP.GUARANTEED is changed later, user space may want to take advantage of this increased guaranteed performance. HWP_CAP.GUARANTEED is not a static value. It can be adjusted by an out-of-band agent or during an Intel Speed Select performance level change. The HWP_CAP.MAX is still the maximum achievable performance with turbo disabled by the BIOS, so HWP_CAP.GUARANTEED can still change as long as it remains less than or equal to HWP_CAP.MAX. When HWP_CAP.GUARANTEED is changed, the sysfs base_frequency attribute shows the most recent guaranteed frequency value. This attribute can be used by user space software to update the scaling min/max limits of the CPU. Currently, the ->setpolicy() callback already uses the latest HWP_CAP values when setting HWP_REQ, but the ->verify() callback will restrict the user settings to the to old guaranteed performance value which prevents user space from making use of the extra CPU capacity theoretically available to it after increasing HWP_CAP.GUARANTEED. To address this, read HWP_CAP in intel_pstate_verify_cpu_policy() to obtain the maximum P-state that can be used and use that to confine the policy max limit instead of using the cached and possibly stale pstate.max_freq value for this purpose. For consistency, update intel_pstate_update_perf_limits() to use the maximum available P-state returned by intel_pstate_get_hwp_max() to compute the maximum frequency instead of using the return value of intel_pstate_get_max_freq() which, again, may be stale. This issue is a side-effect of fixing the scaling frequency limits in commit eacc9c5a927e ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled") which corrected the setting of the reduced scaling frequency values, but caused stale HWP_CAP.GUARANTEED to be used in the case at hand. Fixes: eacc9c5a927e ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled") Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 5.8+ <stable@vger.kernel.org> # 5.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-15	cpufreq: intel_pstate: Implement the ->adjust_perf() callback	Rafael J. Wysocki	1	-12/+58
	Make intel_pstate expose the ->adjust_perf() callback when it operates in the passive mode with HWP enabled which causes the schedutil governor to use that callback instead of ->fast_switch(). The minimum and target performance-level values passed by the governor to ->adjust_perf() are converted to HWP.REQ.MIN and HWP.REQ.DESIRED, respectively, which allows the processor to adjust its configuration to maximize energy-efficiency while providing sufficient capacity. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-15	cpufreq: Add special-purpose fast-switching callback for drivers	Rafael J. Wysocki	1	-0/+40
	First off, some cpufreq drivers (eg. intel_pstate) can pass hints beyond the current target frequency to the hardware and there are no provisions for doing that in the cpufreq framework. In particular, today the driver has to assume that it should not allow the frequency to fall below the one requested by the governor (or the required capacity may not be provided) which may not be the case and which may lead to excessive energy usage in some scenarios. Second, the hints passed by these drivers to the hardware need not be in terms of the frequency, so representing the utilization numbers coming from the scheduler as frequency before passing them to those drivers is not really useful. Address the two points above by adding a special-purpose replacement for the ->fast_switch callback, called ->adjust_perf, allowing the governor to pass abstract performance level (rather than frequency) values for the minimum (required) and target (desired) performance along with the CPU capacity to compare them to. Also update the schedutil governor to use the new callback instead of ->fast_switch if present and if the utilization mertics are frequency-invariant (that is requisite for the direct mapping between the utilization and the CPU performance levels to be a reasonable approximation). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-15	cppc_cpufreq: replace per-cpu data array with a list	Ionela Voinescu	1	-83/+91
	The cppc_cpudata per-cpu storage was inefficient (1) additional to causing functional issues (2) when CPUs are hotplugged out, due to per-cpu data being improperly initialised. (1) The amount of information needed for CPPC performance control in its cpufreq driver depends on the domain (PSD) coordination type: ANY: One set of CPPC control and capability data (e.g desired performance, highest/lowest performance, etc) applies to all CPUs in the domain. ALL: Same as ANY. To be noted that this type is not currently supported. When supported, information about which CPUs belong to a domain is needed in order for frequency change requests to be sent to each of them. HW: It's necessary to store CPPC control and capability information for all the CPUs. HW will then coordinate the performance state based on their limitations and requests. NONE: Same as HW. No HW coordination is expected. Despite this, the previous initialisation code would indiscriminately allocate memory for all CPUs (all_cpu_data) and unnecessarily duplicate performance capabilities and the domain sharing mask and type for each possible CPU. (2) With the current per-cpu structure, when having ANY coordination, the cppc_cpudata cpu information is not initialised (will remain 0) for all CPUs in a policy, other than policy->cpu. When policy->cpu is hotplugged out, the driver will incorrectly use the uninitialised (0) value of the other CPUs when making frequency changes. Additionally, the previous values stored in the perf_ctrls.desired_perf will be lost when policy->cpu changes. Therefore replace the array of per cpu data with a list. The memory for each structure is allocated at policy init, where a single structure can be allocated per policy, not per cpu. In order to accommodate the struct list_head node in the cppc_cpudata structure, the now unused cpu and cur_policy variables are removed. For example, on a arm64 Juno platform with 6 CPUs: (0, 1, 2, 3) in PSD1, (4, 5) in PSD2 - ANY coordination, the memory allocation comparison shows: Before patch: - ANY coordination: total slack req alloc/free caller 0 0 0 0/1 _kernel_size_le_hi32+0x0xffff800008ff7810 0 0 0 0/6 _kernel_size_le_hi32+0x0xffff800008ff7808 128 80 48 1/0 _kernel_size_le_hi32+0x0xffff800008ffc070 768 0 768 6/0 _kernel_size_le_hi32+0x0xffff800008ffc0e4 After patch: - ANY coordination: total slack req alloc/free caller 256 0 256 2/0 _kernel_size_le_hi32+0x0xffff800008fed410 0 0 0 0/2 _kernel_size_le_hi32+0x0xffff800008fed274 Additional notes: - A pointer to the policy's cppc_cpudata is stored in policy->driver_data - Driver registration is skipped if _CPC entries are not present. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-15	cppc_cpufreq: expose information on frequency domains	Ionela Voinescu	1	-0/+14
	Use the existing sysfs attribute "freqdomain_cpus" to expose information to userspace about CPUs in the same frequency domain. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-15	cppc_cpufreq: clarify support for coordination types	Ionela Voinescu	1	-7/+12
	The previous coordination type handling in the cppc_cpufreq init code created some confusion: the comment mentioned "Support only SW_ANY for now" while only the SW_ALL/ALL case resulted in a failure. The other coordination types (HW_ALL/HW, NONE) were silently supported. Clarify support for coordination types while describing in comments the intended behavior. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-15	cppc_cpufreq: use policy->cpu as driver of frequency setting	Ionela Voinescu	1	-2/+3
	Considering only the currently supported coordination types (ANY, HW, NONE), this change only makes a difference for the ANY type, when policy->cpu is hotplugged out. In that case the new policy->cpu will be different from ((struct cppc_cpudata )policy->driver_data)->cpu. While in this case the controls of ANY* CPU could be used to drive frequency changes, it's more consistent to use policy->cpu as the leading CPU, as used in all other cppc_cpufreq functions. Additionally, the debug prints in cppc_set_perf() would no longer create confusion when referring to a CPU that is hotplugged out. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-15	Merge branch 'pm-cpufreq'	Rafael J. Wysocki	18	-223/+288
	* pm-cpufreq: (31 commits) cpufreq: Fix cpufreq_online() return value on errors cpufreq: Fix up several kerneldoc comments cpufreq: stats: Use local_clock() instead of jiffies cpufreq: schedutil: Simplify sugov_update_next_freq() cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate() cpufreq: arm_scmi: Discover the power scale in performance protocol firmware: arm_scmi: Add power_scale_mw_get() interface cpufreq: tegra194: Rename tegra194_get_speed_common function cpufreq: tegra194: Remove unnecessary frequency calculation cpufreq: tegra186: Simplify cluster information lookup cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning cpufreq: imx: fix NVMEM_IMX_OCOTP dependency cpufreq: vexpress-spc: Add missing MODULE_ALIAS cpufreq: scpi: Add missing MODULE_ALIAS cpufreq: loongson1: Add missing MODULE_ALIAS cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE cpufreq: st: Add missing MODULE_DEVICE_TABLE cpufreq: qcom: Add missing MODULE_DEVICE_TABLE cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE cpufreq: highbank: Add missing MODULE_DEVICE_TABLE ...
2020-12-14	Merge branch 'cpufreq/arm/linux-next' of ↵	Rafael J. Wysocki	14	-88/+150
	git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull ARM cpufreq updates for 5.11-rc1 from Viresh Kumar: "This contains the following updates: - Fix imx's NVMEM_IMX_OCOTP dependency (Arnd Bergmann). - Add support for mt8167 and blacklist mt8516 (Fabien Parent). - Some ->get() callback related cleanups to the tegra194 driver and some optimizations in tegra186 driver (Jon Hunter and Sumit Gupta). - Power scale improvements to arm_scmi driver (Lukasz Luba). - Add missing MODULE_DEVICE_TABLE and MODULE_ALIAS to several drivers (Pali Rohár). - Fix error path in mediatek driver (Qinglang Miao). - Fix memleak in ST's cpufreq driver (Yangtao Li)." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (22 commits) cpufreq: arm_scmi: Discover the power scale in performance protocol firmware: arm_scmi: Add power_scale_mw_get() interface cpufreq: tegra194: Rename tegra194_get_speed_common function cpufreq: tegra194: Remove unnecessary frequency calculation cpufreq: tegra186: Simplify cluster information lookup cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning cpufreq: imx: fix NVMEM_IMX_OCOTP dependency cpufreq: vexpress-spc: Add missing MODULE_ALIAS cpufreq: scpi: Add missing MODULE_ALIAS cpufreq: loongson1: Add missing MODULE_ALIAS cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE cpufreq: st: Add missing MODULE_DEVICE_TABLE cpufreq: qcom: Add missing MODULE_DEVICE_TABLE cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE cpufreq: highbank: Add missing MODULE_DEVICE_TABLE cpufreq: ap806: Add missing MODULE_DEVICE_TABLE cpufreq: mediatek: add missing platform_driver_unregister() on error in mtk_cpufreq_driver_init cpufreq: tegra194: get consistent cpuinfo_cur_freq cpufreq: blacklist mt8516 in cpufreq-dt-platdev cpufreq: mediatek: Add support for mt8167 ...
2020-12-14	Merge branch 'opp/linux-next' of ↵	Rafael J. Wysocki	2	-98/+72
	git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull OPP (Operating Performance Points) updates for 5.11-rc1 from Viresh Kumar: "This contains the following updates: - Allow empty (node-less) OPP tables in DT for passing just the dependency related information (Nicola Mazzucato). - Fix a potential lockdep in OPP core and other OPP core cleanups (Viresh Kumar). - Don't abuse dev_pm_opp_get_opp_table() to create an OPP table, fix cpufreq-dt driver for the same (Viresh Kumar). - dev_pm_opp_put_regulators() accepts a NULL argument now, updates to all the users as well (Viresh Kumar)." * 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: opp: of: Allow empty opp-table with opp-shared dt-bindings: opp: Allow empty OPP tables media: venus: dev_pm_opp_put_() accepts NULL argument drm/panfrost: dev_pm_opp_put_() accepts NULL argument drm/lima: dev_pm_opp_put_() accepts NULL argument PM / devfreq: exynos: dev_pm_opp_put_() accepts NULL argument cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_() accepts NULL argument cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument opp: Allow dev_pm_opp_put_() APIs to accept NULL opp_table opp: Don't create an OPP table from dev_pm_opp_get_opp_table() cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table opp: Reduce the size of critical section in _opp_kref_release() opp: Don't return opp_dev from _find_opp_dev() opp: Allocate the OPP table outside of opp_table_lock opp: Always add entries in dev_list with opp_table->lock held
2020-12-11	cpufreq: Fix cpufreq_online() return value on errors	Wang ShaoBo	1	-1/+4
	Make cpufreq_online() return negative error codes on all errors that cause the policy to be destroyed, as appropriate. Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-11	cpufreq: Fix up several kerneldoc comments	Rafael J. Wysocki	1	-35/+35
	Fix up the remaining kerneldoc comments that don't adhere to the expected format and clarify some of them a bit. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-11	cpufreq: stats: Use local_clock() instead of jiffies	Viresh Kumar	1	-8/+8
	local_clock() has better precision and accuracy as compared to jiffies, lets use it for time management in cpufreq stats. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-11	cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()	Rafael J. Wysocki	1	-5/+4
	Avoid doing the same assignment in both branches of a conditional, do it after the whole conditional instead. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-11	Merge back cpufreq material for v5.11.	Rafael J. Wysocki	2	-86/+87

2020-12-09	cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument	Viresh Kumar	1	-9/+6
	The dev_pm_opp_put_*() APIs now accepts a NULL opp_table pointer and so there is no need for us to carry the extra checks. Drop them. Reviewed-by: Ilia Lin <ilia.lin@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-09	cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument	Viresh Kumar	1	-4/+2
	The dev_pm_opp_put_*() APIs now accepts a NULL opp_table pointer and so there is no need for us to carry the extra checks. Drop them. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-09	cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table	Viresh Kumar	1	-89/+68
	Initially, the helper dev_pm_opp_get_opp_table() was supposed to be used only for the OPP core's internal use (it tries to find an existing OPP table and if it doesn't find one, then it allocates the OPP table). Sometime back, the cpufreq-dt driver started using it to make sure all the relevant resources required by the OPP core are available earlier during initialization process to properly propagate -EPROBE_DEFER. It worked but it also abused the API to create an OPP table, which should be created with the help of other helpers provided by the OPP core. The OPP core will be updated in a later commit to limit the scope of dev_pm_opp_get_opp_table() to only finding an existing OPP table and not create one. This commit updates the cpufreq-dt driver before that happens. Now the cpufreq-dt driver creates the OPP and cpufreq tables for all the CPUs from driver's init callback itself. Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-08	Merge branch 'cpufreq/scmi' into cpufreq/arm/linux-next	Viresh Kumar	7	-12/+27

2020-12-08	cpufreq: arm_scmi: Discover the power scale in performance protocol	Lukasz Luba	1	-1/+3
	Add mechanism to discover the power scale present in the performance protocol for all domains. Provide this information to Energy Model, which then can be checked in other frameworks, e.g. thermal. Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: tegra194: Rename tegra194_get_speed_common function	Jon Hunter	1	-2/+2
	The function tegra194_get_speed_common() uses hardware timers to calculate the current CPUFREQ and so rename this function to be tegra194_calculate_speed() to reflect what it does. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: tegra194: Remove unnecessary frequency calculation	Jon Hunter	1	-9/+3
	The Tegra194 CPUFREQ driver sets the CPUFREQ_NEED_INITIAL_FREQ_CHECK flag which means that the CPUFREQ framework will call the 'get' callback on boot to determine the current frequency of the CPUs. Therefore, it is not necessary for the Tegra194 CPUFREQ driver to internally call the tegra194_get_speed_common() during initialisation to query the current frequency as well. Fix this by removing the call to the tegra194_get_speed_common() during initialisation and simplify the code. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: tegra186: Simplify cluster information lookup	Jon Hunter	1	-65/+20
	The CPUFREQ driver framework references each individual CPUs when getting and setting the speed. Tegra186 has 3 clusters of A57 CPUs and 1 cluster of Denver CPUs. Hence, the Tegra186 CPUFREQ driver need to know which cluster a given CPU belongs to. The logic in the Tegra186 driver can be greatly simplified by storing the cluster ID associated with each CPU in the tegra186_cpufreq_cpu structure. This allow us to completely remove the Tegra cluster info structure from the driver and simplifiy the code. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning	Jon Hunter	1	-15/+46
	Sparse warns that the incorrect type is being assigned to the CPUFREQ driver_data variable in the Tegra186 CPUFREQ driver. The Tegra186 CPUFREQ driver is assigned a type of 'void __iomem ' to a pointer of type 'void ' ... drivers/cpufreq/tegra186-cpufreq.c:72:37: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected void driver_data @@ got void [noderef] __iomem @@ ... drivers/cpufreq/tegra186-cpufreq.c:87:40: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected void [noderef] __iomem edvd_reg @@ got void driver_data @@ The Tegra186 CPUFREQ driver is using the policy->driver_data variable to store and iomem pointer to a Tegra186 CPU register that is used to set the clock speed for the CPU. This is not necessary because the register base address is already stored in the driver data and the offset of the register for each CPU is static. Therefore, fix this by adding a new structure with the register offsets for each CPU and store this in the main driver data structure along with the register base address. Please note that a new structure has been added for storing the register offsets rather than a simple array, because this will permit further clean-ups and simplification of the driver. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: imx: fix NVMEM_IMX_OCOTP dependency	Arnd Bergmann	1	-1/+1
	A driver should not 'select' drivers from another subsystem. If NVMEM is disabled, this one results in a warning: WARNING: unmet direct dependencies detected for NVMEM_IMX_OCOTP Depends on [n]: NVMEM [=n] && (ARCH_MXC [=y] \|\| COMPILE_TEST [=y]) && HAS_IOMEM [=y] Selected by [y]: - ARM_IMX6Q_CPUFREQ [=y] && CPU_FREQ [=y] && (ARM \|\| ARM64 [=y]) && ARCH_MXC [=y] && REGULATOR_ANATOP [=y] Change the 'select' to 'depends on' to prevent it from going wrong, and allow compile-testing without that driver, since it is only a runtime dependency. Fixes: 2782ef34ed23 ("cpufreq: imx: Select NVMEM_IMX_OCOTP") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: vexpress-spc: Add missing MODULE_ALIAS	Pali Rohár	1	-0/+1
	This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: 47ac9aa165540 ("cpufreq: arm_big_little: add vexpress SPC interface driver") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: scpi: Add missing MODULE_ALIAS	Pali Rohár	1	-0/+1
	This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: 8def31034d033 ("cpufreq: arm_big_little: add SCPI interface driver") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: loongson1: Add missing MODULE_ALIAS	Pali Rohár	1	-0/+1
	This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: a0a22cf14472f ("cpufreq: Loongson1: Add cpufreq driver for Loongson1B") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE	Pali Rohár	1	-0/+1
	This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: f328584f7bff8 ("cpufreq: Add sun50i nvmem based CPU scaling driver") Reviewed-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: st: Add missing MODULE_DEVICE_TABLE	Pali Rohár	1	-0/+7
	This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: ab0ea257fc58d ("cpufreq: st: Provide runtime initialised driver for ST's platforms") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: qcom: Add missing MODULE_DEVICE_TABLE	Pali Rohár	1	-0/+1
	This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: 46e2856b8e188 ("cpufreq: Add Kryo CPU scaling driver") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE	Pali Rohár	1	-0/+1
	This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: 501c574f4e3a5 ("cpufreq: mediatek: Add support of cpufreq to MT2701/MT7623 SoC") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: highbank: Add missing MODULE_DEVICE_TABLE	Pali Rohár	1	-0/+7
	This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: 6754f556103be ("cpufreq / highbank: add support for highbank cpufreq") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: ap806: Add missing MODULE_DEVICE_TABLE	Pali Rohár	1	-0/+6
	This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: f525a670533d9 ("cpufreq: ap806: add cpufreq driver for Armada 8K") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: mediatek: add missing platform_driver_unregister() on error in ↵	Qinglang Miao	1	-0/+1
	mtk_cpufreq_driver_init Add the missing platform_driver_unregister() before return from mtk_cpufreq_driver_init in the error handling case when failed to register mtk-cpufreq platform device Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: tegra194: get consistent cpuinfo_cur_freq	Sumit Gupta	1	-9/+53
	Frequency returned by 'cpuinfo_cur_freq' using counters is not fixed and keeps changing slightly. This change returns a consistent value from freq_table. If the reconstructed frequency has acceptable delta from the last written value, then return the frequency corresponding to the last written ndiv value from freq_table. Otherwise, print a warning and return the reconstructed freq. Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: blacklist mt8516 in cpufreq-dt-platdev	Fabien Parent	1	-0/+1
	Add MT8516 to cpufreq-dt-platdev blacklist since the actual scaling is handled by the 'mediatek-cpufreq' driver. Signed-off-by: Fabien Parent <fparent@baylibre.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: mediatek: Add support for mt8167	Fabien Parent	2	-0/+2
	Add compatible string for mediatek mt8167 Signed-off-by: Fabien Parent <fparent@baylibre.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-07	cpufreq: sti-cpufreq: fix mem leak in sti_cpufreq_set_opp_info()	Yangtao Li	1	-1/+6
	Use dev_pm_opp_put_prop_name() to avoid mem leak, which free opp_table. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Yangtao Li <frank@allwinnertech.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-23	Merge branch 'cpufreq/arm/fixes' of ↵	Rafael J. Wysocki	1	-1/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull SCMI cpufreq driver fix for 5.10-rc6 from Viresh Kumar: "This fixes a build issues with SCMI cpufreq driver in the !CONFIG_COMMON_CLK case." * 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK
2020-11-23	cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK	Sudeep Holla	1	-1/+3
	Commit 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a dummy clock provider") registers a dummy clock provider using devm_of_clk_add_hw_provider. These *_hw_provider functions are defined only when CONFIG_COMMON_CLK=y. One possible fix is to add the Kconfig dependency, but since we plan to move away from the clock dependency for scmi cpufreq, it is preferrable to avoid that. Let us just conditionally compile out the offending call to devm_of_clk_add_hw_provider. It also uses the variable 'dev' outside of the #ifdef block to avoid build warning. Fixes: 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a dummy clock provider") Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-11-17	cppc_cpufreq: simplify use of performance capabilities	Ionela Voinescu	1	-17/+23
	The CPPC performance capabilities are used significantly throughout the driver. Simplify the use of them by introducing a local pointer "caps" to point to cpu_data->perf_caps, in functions that access performance capabilities often. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-17	cppc_cpufreq: clean up cpu, cpu_num and cpunum variable use	Ionela Voinescu	1	-74/+69
	In order to maintain the typical naming convention in the cpufreq framework: - replace the use of "cpu" variable name for cppc_cpudata pointers with "cpu_data" - replace variable names "cpu_num" and "cpunum" with "cpu" - make cpu variables unsigned int Where pertinent, also move the initialisation of cpu_data variable to its declaration and make consistent use of the local "cpu" variable. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>