diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2021-04-27 01:10:25 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2021-04-27 01:10:25 +0300 |
commit | 5469f160e6bf38b84eb237055868286e629b8d44 (patch) | |
tree | f4ca3ebd04e46af3e895f941ec76d714f92670ff /drivers/cpufreq | |
parent | d8f9176b4ece17e831306072678cd9ae49688cf5 (diff) | |
parent | 59e2c959f20f9f255a42de52cde54a2962fb726f (diff) | |
download | linux-5469f160e6bf38b84eb237055868286e629b8d44.tar.xz |
Merge tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add some new hardware support (for example, IceLake-D idle
states in intel_idle), fix some issues (for example, the handling of
negative "sleep length" values in cpuidle governors), add new
functionality to the existing drivers (for example, scale-invariance
support in the ACPI CPPC cpufreq driver) and clean up code all over.
Specifics:
- Add idle states table for IceLake-D to the intel_idle driver and
update IceLake-X C6 data in it (Artem Bityutskiy).
- Fix the C7 idle state on Tegra114 in the tegra cpuidle driver and
drop the unused do_idle() firmware call from it (Dmitry Osipenko).
- Fix cpuidle-qcom-spm Kconfig entry (He Ying).
- Fix handling of possible negative tick_nohz_get_next_hrtimer()
return values of in cpuidle governors (Rafael Wysocki).
- Add support for frequency-invariance to the ACPI CPPC cpufreq
driver and update the frequency-invariance engine (FIE) to use it
as needed (Viresh Kumar).
- Simplify the default delay_us setting in the ACPI CPPC cpufreq
driver (Tom Saeger).
- Clean up frequency-related computations in the intel_pstate cpufreq
driver (Rafael Wysocki).
- Fix TBG parent setting for load levels in the armada-37xx cpufreq
driver and drop the CPU PM clock .set_parent method for armada-37xx
(Marek Behún).
- Fix multiple issues in the armada-37xx cpufreq driver (Pali Rohár).
- Fix handling of dev_pm_opp_of_cpumask_add_table() return values in
cpufreq-dt to take the -EPROBE_DEFER one into acconut as
appropriate (Quanyang Wang).
- Fix format string in ia64-acpi-cpufreq (Sergei Trofimovich).
- Drop the unused for_each_policy() macro from cpufreq (Shaokun
Zhang).
- Simplify computations in the schedutil cpufreq governor to avoid
unnecessary overhead (Yue Hu).
- Fix typos in the s5pv210 cpufreq driver (Bhaskar Chowdhury).
- Fix cpufreq documentation links in Kconfig (Alexander Monakov).
- Fix PCI device power state handling in pci_enable_device_flags() to
avoid issuse in some cases when the device depends on an ACPI power
resource (Rafael Wysocki).
- Add missing documentation of pm_runtime_resume_and_get() (Alan
Stern).
- Add missing static inline stub for pm_runtime_has_no_callbacks() to
pm_runtime.h and drop the unused try_to_freeze_nowarn() definition
(YueHaibing).
- Drop duplicate struct device declaration from pm.h and fix a
structure type declaration in intel_rapl.h (Wan Jiabing).
- Use dev_set_name() instead of an open-coded equivalent of it in the
wakeup sources code and drop a redundant local variable
initialization from it (Andy Shevchenko, Colin Ian King).
- Use crc32 instead of md5 for e820 memory map integrity check during
resume from hibernation on x86 (Chris von Recklinghausen).
- Fix typos in comments in the system-wide and hibernation support
code (Lu Jialin).
- Modify the generic power domains (genpd) code to avoid resuming
devices in the "prepare" phase of system-wide suspend and
hibernation (Ulf Hansson).
- Add Hygon Fam18h RAPL support to the intel_rapl power capping
driver (Pu Wen).
- Add MAINTAINERS entry for the dynamic thermal power management
(DTPM) code (Daniel Lezcano).
- Add devm variants of operating performance points (OPP) API
functions and switch over some users of the OPP framework to the
new resource-managed API (Yangtao Li and Dmitry Osipenko).
- Update devfreq core:
* Register devfreq devices as cooling devices on demand (Daniel
Lezcano).
* Add missing unlock opeation in devfreq_add_device() (Lukasz
Luba).
* Use the next frequency as resume_freq instead of the previous
frequency when using the opp-suspend property (Dong Aisheng).
* Check get_dev_status in devfreq_update_stats() (Dong Aisheng).
* Fix set_freq path for the userspace governor in Kconfig (Dong
Aisheng).
* Remove invalid description of get_target_freq() (Dong Aisheng).
- Update devfreq drivers:
* imx8m-ddrc: Remove imx8m_ddrc_get_dev_status() and unneeded
of_match_ptr() (Dong Aisheng, Fabio Estevam).
* rk3399_dmc: dt-bindings: Add rockchip,pmu phandle and drop
references to undefined symbols (Enric Balletbo i Serra, Gaël
PORTAY).
* rk3399_dmc: Use dev_err_probe() to simplify the code (Krzysztof
Kozlowski).
* imx-bus: Remove unneeded of_match_ptr() (Fabio Estevam).
- Fix kernel-doc warnings in three places (Pierre-Louis Bossart).
- Fix typo in the pm-graph utility code (Ricardo Ribalda)"
* tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
PM: wakeup: remove redundant assignment to variable retval
PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check
cpufreq: Kconfig: fix documentation links
PM: wakeup: use dev_set_name() directly
PM: runtime: Add documentation for pm_runtime_resume_and_get()
cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits()
cpufreq: armada-37xx: Fix module unloading
cpufreq: armada-37xx: Remove cur_frequency variable
cpufreq: armada-37xx: Fix determining base CPU frequency
cpufreq: armada-37xx: Fix driver cleanup when registration failed
clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0
clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz
cpufreq: armada-37xx: Fix the AVS value for load L1
clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock
cpufreq: armada-37xx: Fix setting TBG parent for load levels
cpuidle: Fix ARM_QCOM_SPM_CPUIDLE configuration
cpuidle: tegra: Remove do_idle firmware call
cpuidle: tegra: Fix C7 idling state on Tegra114
PM: sleep: fix typos in comments
cpufreq: Remove unused for_each_policy macro
...
Diffstat (limited to 'drivers/cpufreq')
-rw-r--r-- | drivers/cpufreq/Kconfig | 23 | ||||
-rw-r--r-- | drivers/cpufreq/Kconfig.arm | 10 | ||||
-rw-r--r-- | drivers/cpufreq/armada-37xx-cpufreq.c | 111 | ||||
-rw-r--r-- | drivers/cpufreq/cppc_cpufreq.c | 259 | ||||
-rw-r--r-- | drivers/cpufreq/cpufreq-dt.c | 9 | ||||
-rw-r--r-- | drivers/cpufreq/cpufreq.c | 3 | ||||
-rw-r--r-- | drivers/cpufreq/ia64-acpi-cpufreq.c | 4 | ||||
-rw-r--r-- | drivers/cpufreq/intel_pstate.c | 107 | ||||
-rw-r--r-- | drivers/cpufreq/s5pv210-cpufreq.c | 14 |
9 files changed, 398 insertions, 142 deletions
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig index 85de313ddec2..c3038cdc6865 100644 --- a/drivers/cpufreq/Kconfig +++ b/drivers/cpufreq/Kconfig @@ -13,7 +13,8 @@ config CPU_FREQ clock speed, you need to either enable a dynamic cpufreq governor (see below) after boot, or use a userspace tool. - For details, take a look at <file:Documentation/cpu-freq>. + For details, take a look at + <file:Documentation/admin-guide/pm/cpufreq.rst>. If in doubt, say N. @@ -140,8 +141,6 @@ config CPU_FREQ_GOV_USERSPACE To compile this driver as a module, choose M here: the module will be called cpufreq_userspace. - For details, take a look at <file:Documentation/cpu-freq/>. - If in doubt, say Y. config CPU_FREQ_GOV_ONDEMAND @@ -158,7 +157,8 @@ config CPU_FREQ_GOV_ONDEMAND To compile this driver as a module, choose M here: the module will be called cpufreq_ondemand. - For details, take a look at linux/Documentation/cpu-freq. + For details, take a look at + <file:Documentation/admin-guide/pm/cpufreq.rst>. If in doubt, say N. @@ -182,7 +182,8 @@ config CPU_FREQ_GOV_CONSERVATIVE To compile this driver as a module, choose M here: the module will be called cpufreq_conservative. - For details, take a look at linux/Documentation/cpu-freq. + For details, take a look at + <file:Documentation/admin-guide/pm/cpufreq.rst>. If in doubt, say N. @@ -246,8 +247,6 @@ config IA64_ACPI_CPUFREQ This driver adds a CPUFreq driver which utilizes the ACPI Processor Performance States. - For details, take a look at <file:Documentation/cpu-freq/>. - If in doubt, say N. endif @@ -271,8 +270,6 @@ config LOONGSON2_CPUFREQ Loongson2F and it's successors support this feature. - For details, take a look at <file:Documentation/cpu-freq/>. - If in doubt, say N. config LOONGSON1_CPUFREQ @@ -282,8 +279,6 @@ config LOONGSON1_CPUFREQ This option adds a CPUFreq driver for loongson1 processors which support software configurable cpu frequency. - For details, take a look at <file:Documentation/cpu-freq/>. - If in doubt, say N. endif @@ -293,8 +288,6 @@ config SPARC_US3_CPUFREQ help This adds the CPUFreq driver for UltraSPARC-III processors. - For details, take a look at <file:Documentation/cpu-freq>. - If in doubt, say N. config SPARC_US2E_CPUFREQ @@ -302,8 +295,6 @@ config SPARC_US2E_CPUFREQ help This adds the CPUFreq driver for UltraSPARC-IIe processors. - For details, take a look at <file:Documentation/cpu-freq>. - If in doubt, say N. endif @@ -318,8 +309,6 @@ config SH_CPU_FREQ will also generate a notice in the boot log before disabling itself if the CPU in question is not capable of rate rounding. - For details, take a look at <file:Documentation/cpu-freq>. - If unsure, say N. endif diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm index e65e0a43be64..a5c5f70acfc9 100644 --- a/drivers/cpufreq/Kconfig.arm +++ b/drivers/cpufreq/Kconfig.arm @@ -19,6 +19,16 @@ config ACPI_CPPC_CPUFREQ If in doubt, say N. +config ACPI_CPPC_CPUFREQ_FIE + bool "Frequency Invariance support for CPPC cpufreq driver" + depends on ACPI_CPPC_CPUFREQ && GENERIC_ARCH_TOPOLOGY + default y + help + This extends frequency invariance support in the CPPC cpufreq driver, + by using CPPC delivered and reference performance counters. + + If in doubt, say N. + config ARM_ALLWINNER_SUN50I_CPUFREQ_NVMEM tristate "Allwinner nvmem based SUN50I CPUFreq driver" depends on ARCH_SUNXI diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c index b4af4094309b..3fc98a3ffd91 100644 --- a/drivers/cpufreq/armada-37xx-cpufreq.c +++ b/drivers/cpufreq/armada-37xx-cpufreq.c @@ -25,6 +25,10 @@ #include "cpufreq-dt.h" +/* Clk register set */ +#define ARMADA_37XX_CLK_TBG_SEL 0 +#define ARMADA_37XX_CLK_TBG_SEL_CPU_OFF 22 + /* Power management in North Bridge register set */ #define ARMADA_37XX_NB_L0L1 0x18 #define ARMADA_37XX_NB_L2L3 0x1C @@ -69,6 +73,8 @@ #define LOAD_LEVEL_NR 4 #define MIN_VOLT_MV 1000 +#define MIN_VOLT_MV_FOR_L1_1000MHZ 1108 +#define MIN_VOLT_MV_FOR_L1_1200MHZ 1155 /* AVS value for the corresponding voltage (in mV) */ static int avs_map[] = { @@ -80,6 +86,8 @@ static int avs_map[] = { }; struct armada37xx_cpufreq_state { + struct platform_device *pdev; + struct device *cpu_dev; struct regmap *regmap; u32 nb_l0l1; u32 nb_l2l3; @@ -120,10 +128,15 @@ static struct armada_37xx_dvfs *armada_37xx_cpu_freq_info_get(u32 freq) * will be configured then the DVFS will be enabled. */ static void __init armada37xx_cpufreq_dvfs_setup(struct regmap *base, - struct clk *clk, u8 *divider) + struct regmap *clk_base, u8 *divider) { + u32 cpu_tbg_sel; int load_lvl; - struct clk *parent; + + /* Determine to which TBG clock is CPU connected */ + regmap_read(clk_base, ARMADA_37XX_CLK_TBG_SEL, &cpu_tbg_sel); + cpu_tbg_sel >>= ARMADA_37XX_CLK_TBG_SEL_CPU_OFF; + cpu_tbg_sel &= ARMADA_37XX_NB_TBG_SEL_MASK; for (load_lvl = 0; load_lvl < LOAD_LEVEL_NR; load_lvl++) { unsigned int reg, mask, val, offset = 0; @@ -142,6 +155,11 @@ static void __init armada37xx_cpufreq_dvfs_setup(struct regmap *base, mask = (ARMADA_37XX_NB_CLK_SEL_MASK << ARMADA_37XX_NB_CLK_SEL_OFF); + /* Set TBG index, for all levels we use the same TBG */ + val = cpu_tbg_sel << ARMADA_37XX_NB_TBG_SEL_OFF; + mask = (ARMADA_37XX_NB_TBG_SEL_MASK + << ARMADA_37XX_NB_TBG_SEL_OFF); + /* * Set cpu divider based on the pre-computed array in * order to have balanced step. @@ -160,14 +178,6 @@ static void __init armada37xx_cpufreq_dvfs_setup(struct regmap *base, regmap_update_bits(base, reg, mask, val); } - - /* - * Set cpu clock source, for all the level we keep the same - * clock source that the one already configured. For this one - * we need to use the clock framework - */ - parent = clk_get_parent(clk); - clk_set_parent(clk, parent); } /* @@ -202,6 +212,8 @@ static u32 armada_37xx_avs_val_match(int target_vm) * - L2 & L3 voltage should be about 150mv smaller than L0 voltage. * This function calculates L1 & L2 & L3 AVS values dynamically based * on L0 voltage and fill all AVS values to the AVS value table. + * When base CPU frequency is 1000 or 1200 MHz then there is additional + * minimal avs value for load L1. */ static void __init armada37xx_cpufreq_avs_configure(struct regmap *base, struct armada_37xx_dvfs *dvfs) @@ -233,6 +245,19 @@ static void __init armada37xx_cpufreq_avs_configure(struct regmap *base, for (load_level = 1; load_level < LOAD_LEVEL_NR; load_level++) dvfs->avs[load_level] = avs_min; + /* + * Set the avs values for load L0 and L1 when base CPU frequency + * is 1000/1200 MHz to its typical initial values according to + * the Armada 3700 Hardware Specifications. + */ + if (dvfs->cpu_freq_max >= 1000*1000*1000) { + if (dvfs->cpu_freq_max >= 1200*1000*1000) + avs_min = armada_37xx_avs_val_match(MIN_VOLT_MV_FOR_L1_1200MHZ); + else + avs_min = armada_37xx_avs_val_match(MIN_VOLT_MV_FOR_L1_1000MHZ); + dvfs->avs[0] = dvfs->avs[1] = avs_min; + } + return; } @@ -252,6 +277,26 @@ static void __init armada37xx_cpufreq_avs_configure(struct regmap *base, target_vm = avs_map[l0_vdd_min] - 150; target_vm = target_vm > MIN_VOLT_MV ? target_vm : MIN_VOLT_MV; dvfs->avs[2] = dvfs->avs[3] = armada_37xx_avs_val_match(target_vm); + + /* + * Fix the avs value for load L1 when base CPU frequency is 1000/1200 MHz, + * otherwise the CPU gets stuck when switching from load L1 to load L0. + * Also ensure that avs value for load L1 is not higher than for L0. + */ + if (dvfs->cpu_freq_max >= 1000*1000*1000) { + u32 avs_min_l1; + + if (dvfs->cpu_freq_max >= 1200*1000*1000) + avs_min_l1 = armada_37xx_avs_val_match(MIN_VOLT_MV_FOR_L1_1200MHZ); + else + avs_min_l1 = armada_37xx_avs_val_match(MIN_VOLT_MV_FOR_L1_1000MHZ); + + if (avs_min_l1 > dvfs->avs[0]) + avs_min_l1 = dvfs->avs[0]; + + if (dvfs->avs[1] < avs_min_l1) + dvfs->avs[1] = avs_min_l1; + } } static void __init armada37xx_cpufreq_avs_setup(struct regmap *base, @@ -357,12 +402,17 @@ static int __init armada37xx_cpufreq_driver_init(void) struct armada_37xx_dvfs *dvfs; struct platform_device *pdev; unsigned long freq; - unsigned int cur_frequency, base_frequency; - struct regmap *nb_pm_base, *avs_base; + unsigned int base_frequency; + struct regmap *nb_clk_base, *nb_pm_base, *avs_base; struct device *cpu_dev; int load_lvl, ret; struct clk *clk, *parent; + nb_clk_base = + syscon_regmap_lookup_by_compatible("marvell,armada-3700-periph-clock-nb"); + if (IS_ERR(nb_clk_base)) + return -ENODEV; + nb_pm_base = syscon_regmap_lookup_by_compatible("marvell,armada-3700-nb-pm"); @@ -413,15 +463,7 @@ static int __init armada37xx_cpufreq_driver_init(void) return -EINVAL; } - /* Get nominal (current) CPU frequency */ - cur_frequency = clk_get_rate(clk); - if (!cur_frequency) { - dev_err(cpu_dev, "Failed to get clock rate for CPU\n"); - clk_put(clk); - return -EINVAL; - } - - dvfs = armada_37xx_cpu_freq_info_get(cur_frequency); + dvfs = armada_37xx_cpu_freq_info_get(base_frequency); if (!dvfs) { clk_put(clk); return -EINVAL; @@ -439,7 +481,7 @@ static int __init armada37xx_cpufreq_driver_init(void) armada37xx_cpufreq_avs_configure(avs_base, dvfs); armada37xx_cpufreq_avs_setup(avs_base, dvfs); - armada37xx_cpufreq_dvfs_setup(nb_pm_base, clk, dvfs->divider); + armada37xx_cpufreq_dvfs_setup(nb_pm_base, nb_clk_base, dvfs->divider); clk_put(clk); for (load_lvl = ARMADA_37XX_DVFS_LOAD_0; load_lvl < LOAD_LEVEL_NR; @@ -466,6 +508,9 @@ static int __init armada37xx_cpufreq_driver_init(void) if (ret) goto disable_dvfs; + armada37xx_cpufreq_state->cpu_dev = cpu_dev; + armada37xx_cpufreq_state->pdev = pdev; + platform_set_drvdata(pdev, dvfs); return 0; disable_dvfs: @@ -473,7 +518,7 @@ disable_dvfs: remove_opp: /* clean-up the already added opp before leaving */ while (load_lvl-- > ARMADA_37XX_DVFS_LOAD_0) { - freq = cur_frequency / dvfs->divider[load_lvl]; + freq = base_frequency / dvfs->divider[load_lvl]; dev_pm_opp_remove(cpu_dev, freq); } @@ -484,6 +529,26 @@ remove_opp: /* late_initcall, to guarantee the driver is loaded after A37xx clock driver */ late_initcall(armada37xx_cpufreq_driver_init); +static void __exit armada37xx_cpufreq_driver_exit(void) +{ + struct platform_device *pdev = armada37xx_cpufreq_state->pdev; + struct armada_37xx_dvfs *dvfs = platform_get_drvdata(pdev); + unsigned long freq; + int load_lvl; + + platform_device_unregister(pdev); + + armada37xx_cpufreq_disable_dvfs(armada37xx_cpufreq_state->regmap); + + for (load_lvl = ARMADA_37XX_DVFS_LOAD_0; load_lvl < LOAD_LEVEL_NR; load_lvl++) { + freq = dvfs->cpu_freq_max / dvfs->divider[load_lvl]; + dev_pm_opp_remove(armada37xx_cpufreq_state->cpu_dev, freq); + } + + kfree(armada37xx_cpufreq_state); +} +module_exit(armada37xx_cpufreq_driver_exit); + static const struct of_device_id __maybe_unused armada37xx_cpufreq_of_match[] = { { .compatible = "marvell,armada-3700-nb-pm" }, { }, diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index 8a482c434ea6..3848b4c222e1 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -10,14 +10,18 @@ #define pr_fmt(fmt) "CPPC Cpufreq:" fmt +#include <linux/arch_topology.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/delay.h> #include <linux/cpu.h> #include <linux/cpufreq.h> #include <linux/dmi.h> +#include <linux/irq_work.h> +#include <linux/kthread.h> #include <linux/time.h> #include <linux/vmalloc.h> +#include <uapi/linux/sched/types.h> #include <asm/unaligned.h> @@ -57,6 +61,204 @@ static struct cppc_workaround_oem_info wa_info[] = { } }; +#ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE + +/* Frequency invariance support */ +struct cppc_freq_invariance { + int cpu; + struct irq_work irq_work; + struct kthread_work work; + struct cppc_perf_fb_ctrs prev_perf_fb_ctrs; + struct cppc_cpudata *cpu_data; +}; + +static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv); +static struct kthread_worker *kworker_fie; +static bool fie_disabled; + +static struct cpufreq_driver cppc_cpufreq_driver; +static unsigned int hisi_cppc_cpufreq_get_rate(unsigned int cpu); +static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data, + struct cppc_perf_fb_ctrs fb_ctrs_t0, + struct cppc_perf_fb_ctrs fb_ctrs_t1); + +/** + * cppc_scale_freq_workfn - CPPC arch_freq_scale updater for frequency invariance + * @work: The work item. + * + * The CPPC driver register itself with the topology core to provide its own + * implementation (cppc_scale_freq_tick()) of topology_scale_freq_tick() which + * gets called by the scheduler on every tick. + * + * Note that the arch specific counters have higher priority than CPPC counters, + * if available, though the CPPC driver doesn't need to have any special + * handling for that. + * + * On an invocation of cppc_scale_freq_tick(), we schedule an irq work (since we + * reach here from hard-irq context), which then schedules a normal work item + * and cppc_scale_freq_workfn() updates the per_cpu arch_freq_scale variable + * based on the counter updates since the last tick. + */ +static void cppc_scale_freq_workfn(struct kthread_work *work) +{ + struct cppc_freq_invariance *cppc_fi; + struct cppc_perf_fb_ctrs fb_ctrs = {0}; + struct cppc_cpudata *cpu_data; + unsigned long local_freq_scale; + u64 perf; + + cppc_fi = container_of(work, struct cppc_freq_invariance, work); + cpu_data = cppc_fi->cpu_data; + + if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) { + pr_warn("%s: failed to read perf counters\n", __func__); + return; + } + + cppc_fi->prev_perf_fb_ctrs = fb_ctrs; + perf = cppc_perf_from_fbctrs(cpu_data, cppc_fi->prev_perf_fb_ctrs, + fb_ctrs); + + perf <<= SCHED_CAPACITY_SHIFT; + local_freq_scale = div64_u64(perf, cpu_data->perf_caps.highest_perf); + if (WARN_ON(local_freq_scale > 1024)) + local_freq_scale = 1024; + + per_cpu(arch_freq_scale, cppc_fi->cpu) = local_freq_scale; +} + +static void cppc_irq_work(struct irq_work *irq_work) +{ + struct cppc_freq_invariance *cppc_fi; + + cppc_fi = container_of(irq_work, struct cppc_freq_invariance, irq_work); + kthread_queue_work(kworker_fie, &cppc_fi->work); +} + +static void cppc_scale_freq_tick(void) +{ + struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, smp_processor_id()); + + /* + * cppc_get_perf_ctrs() can potentially sleep, call that from the right + * context. + */ + irq_work_queue(&cppc_fi->irq_work); +} + +static struct scale_freq_data cppc_sftd = { + .source = SCALE_FREQ_SOURCE_CPPC, + .set_freq_scale = cppc_scale_freq_tick, +}; + +static void cppc_freq_invariance_policy_init(struct cpufreq_policy *policy, + struct cppc_cpudata *cpu_data) +{ + struct cppc_perf_fb_ctrs fb_ctrs = {0}; + struct cppc_freq_invariance *cppc_fi; + int i, ret; + + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate) + return; + + if (fie_disabled) + return; + + for_each_cpu(i, policy->cpus) { + cppc_fi = &per_cpu(cppc_freq_inv, i); + cppc_fi->cpu = i; + cppc_fi->cpu_data = cpu_data; + kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn); + init_irq_work(&cppc_fi->irq_work, cppc_irq_work); + + ret = cppc_get_perf_ctrs(i, &fb_ctrs); + if (ret) { + pr_warn("%s: failed to read perf counters: %d\n", + __func__, ret); + fie_disabled = true; + } else { + cppc_fi->prev_perf_fb_ctrs = fb_ctrs; + } + } +} + +static void __init cppc_freq_invariance_init(void) +{ + struct sched_attr attr = { + .size = sizeof(struct sched_attr), + .sched_policy = SCHED_DEADLINE, + .sched_nice = 0, + .sched_priority = 0, + /* + * Fake (unused) bandwidth; workaround to "fix" + * priority inheritance. + */ + .sched_runtime = 1000000, + .sched_deadline = 10000000, + .sched_period = 10000000, + }; + int ret; + + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate) + return; + + if (fie_disabled) + return; + + kworker_fie = kthread_create_worker(0, "cppc_fie"); + if (IS_ERR(kworker_fie)) + return; + + ret = sched_setattr_nocheck(kworker_fie->task, &attr); + if (ret) { + pr_warn("%s: failed to set SCHED_DEADLINE: %d\n", __func__, + ret); + kthread_destroy_worker(kworker_fie); + return; + } + + /* Register for freq-invariance */ + topology_set_scale_freq_source(&cppc_sftd, cpu_present_mask); +} + +static void cppc_freq_invariance_exit(void) +{ + struct cppc_freq_invariance *cppc_fi; + int i; + + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate) + return; + + if (fie_disabled) + return; + + topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, cpu_present_mask); + + for_each_possible_cpu(i) { + cppc_fi = &per_cpu(cppc_freq_inv, i); + irq_work_sync(&cppc_fi->irq_work); + } + + kthread_destroy_worker(kworker_fie); + kworker_fie = NULL; +} + +#else +static inline void +cppc_freq_invariance_policy_init(struct cpufreq_policy *policy, + struct cppc_cpudata *cpu_data) +{ +} + +static inline void cppc_freq_invariance_init(void) +{ +} + +static inline void cppc_freq_invariance_exit(void) +{ +} +#endif /* CONFIG_ACPI_CPPC_CPUFREQ_FIE */ + /* Callback function used to retrieve the max frequency from DMI */ static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private) { @@ -216,26 +418,16 @@ static unsigned int cppc_cpufreq_get_transition_delay_us(unsigned int cpu) { unsigned long implementor = read_cpuid_implementor(); unsigned long part_num = read_cpuid_part_number(); - unsigned int delay_us = 0; switch (implementor) { case ARM_CPU_IMP_QCOM: switch (part_num) { case QCOM_CPU_PART_FALKOR_V1: case QCOM_CPU_PART_FALKOR: - delay_us = 10000; - break; - default: - delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC; - break; + return 10000; } - break; - default: - delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC; - break; } - - return delay_us; + return cppc_get_transition_latency(cpu) / NSEC_PER_USEC; } #else @@ -355,9 +547,12 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) cpu_data->perf_ctrls.desired_perf = caps->highest_perf; ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls); - if (ret) + if (ret) { pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n", caps->highest_perf, cpu, ret); + } else { + cppc_freq_invariance_policy_init(policy, cpu_data); + } return ret; } @@ -370,12 +565,12 @@ static inline u64 get_delta(u64 t1, u64 t0) return (u32)t1 - (u32)t0; } -static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data, - struct cppc_perf_fb_ctrs fb_ctrs_t0, - struct cppc_perf_fb_ctrs fb_ctrs_t1) +static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data, + struct cppc_perf_fb_ctrs fb_ctrs_t0, + struct cppc_perf_fb_ctrs fb_ctrs_t1) { u64 delta_reference, delta_delivered; - u64 reference_perf, delivered_perf; + u64 reference_perf; reference_perf = fb_ctrs_t0.reference_perf; @@ -384,12 +579,21 @@ static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data, delta_delivered = get_delta(fb_ctrs_t1.delivered, fb_ctrs_t0.delivered); - /* Check to avoid divide-by zero */ - if (delta_reference || delta_delivered) - delivered_perf = (reference_perf * delta_delivered) / - delta_reference; - else - delivered_perf = cpu_data->perf_ctrls.desired_perf; + /* Check to avoid divide-by zero and invalid delivered_perf */ + if (!delta_reference || !delta_delivered) + return cpu_data->perf_ctrls.desired_perf; + + return (reference_perf * delta_delivered) / delta_reference; +} + +static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data, + struct cppc_perf_fb_ctrs fb_ctrs_t0, + struct cppc_perf_fb_ctrs fb_ctrs_t1) +{ + u64 delivered_perf; + + delivered_perf = cppc_perf_from_fbctrs(cpu_data, fb_ctrs_t0, + fb_ctrs_t1); return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf); } @@ -514,6 +718,8 @@ static void cppc_check_hisi_workaround(void) static int __init cppc_cpufreq_init(void) { + int ret; + if ((acpi_disabled) || !acpi_cpc_valid()) return -ENODEV; @@ -521,7 +727,11 @@ static int __init cppc_cpufreq_init(void) cppc_check_hisi_workaround(); - return cpufreq_register_driver(&cppc_cpufreq_driver); + ret = cpufreq_register_driver(&cppc_cpufreq_driver); + if (!ret) + cppc_freq_invariance_init(); + + return ret; } static inline void free_cpu_data(void) @@ -538,6 +748,7 @@ static inline void free_cpu_data(void) static void __exit cppc_cpufreq_exit(void) { + cppc_freq_invariance_exit(); cpufreq_unregister_driver(&cppc_cpufreq_driver); free_cpu_data(); diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c index b1e1bdc63b01..ece52863ba62 100644 --- a/drivers/cpufreq/cpufreq-dt.c +++ b/drivers/cpufreq/cpufreq-dt.c @@ -255,10 +255,15 @@ static int dt_cpufreq_early_init(struct device *dev, int cpu) * before updating priv->cpus. Otherwise, we will end up creating * duplicate OPPs for the CPUs. * - * OPPs might be populated at runtime, don't check for error here. + * OPPs might be populated at runtime, don't fail for error here unless + * it is -EPROBE_DEFER. */ - if (!dev_pm_opp_of_cpumask_add_table(priv->cpus)) + ret = dev_pm_opp_of_cpumask_add_table(priv->cpus); + if (!ret) { priv->have_static_opps = true; + } else if (ret == -EPROBE_DEFER) { + goto out; + } /* * The OPP table must be initialized, statically or dynamically, by this diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 1d1b563cea4b..802abc925b2a 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -42,9 +42,6 @@ static LIST_HEAD(cpufreq_policy_list); #define for_each_inactive_policy(__policy) \ for_each_suitable_policy(__policy, false) -#define for_each_policy(__policy) \ - list_for_each_entry(__policy, &cpufreq_policy_list, policy_list) - /* Iterate over governors */ static LIST_HEAD(cpufreq_governor_list); #define for_each_governor(__governor) \ diff --git a/drivers/cpufreq/ia64-acpi-cpufreq.c b/drivers/cpufreq/ia64-acpi-cpufreq.c index 2efe7189ccc4..c6bdc455517f 100644 --- a/drivers/cpufreq/ia64-acpi-cpufreq.c +++ b/drivers/cpufreq/ia64-acpi-cpufreq.c @@ -54,7 +54,7 @@ processor_set_pstate ( retval = ia64_pal_set_pstate((u64)value); if (retval) { - pr_debug("Failed to set freq to 0x%x, with error 0x%lx\n", + pr_debug("Failed to set freq to 0x%x, with error 0x%llx\n", value, retval); return -ENODEV; } @@ -77,7 +77,7 @@ processor_get_pstate ( if (retval) pr_debug("Failed to get current freq with " - "error 0x%lx, idx 0x%x\n", retval, *value); + "error 0x%llx, idx 0x%x\n", retval, *value); return (int)retval; } diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 5175ae3cac44..f0401064d7aa 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -819,19 +819,21 @@ static struct freq_attr *hwp_cpufreq_attrs[] = { NULL, }; -static void intel_pstate_get_hwp_max(struct cpudata *cpu, int *phy_max, - int *current_max) +static void __intel_pstate_get_hwp_cap(struct cpudata *cpu) { u64 cap; rdmsrl_on_cpu(cpu->cpu, MSR_HWP_CAPABILITIES, &cap); WRITE_ONCE(cpu->hwp_cap_cached, cap); - if (global.no_turbo || global.turbo_disabled) - *current_max = HWP_GUARANTEED_PERF(cap); - else - *current_max = HWP_HIGHEST_PERF(cap); + cpu->pstate.max_pstate = HWP_GUARANTEED_PERF(cap); + cpu->pstate.turbo_pstate = HWP_HIGHEST_PERF(cap); +} - *phy_max = HWP_HIGHEST_PERF(cap); +static void intel_pstate_get_hwp_cap(struct cpudata *cpu) +{ + __intel_pstate_get_hwp_cap(cpu); + cpu->pstate.max_freq = cpu->pstate.max_pstate * cpu->pstate.scaling; + cpu->pstate.turbo_freq = cpu->pstate.turbo_pstate * cpu->pstate.scaling; } static void intel_pstate_hwp_set(unsigned int cpu) @@ -1195,12 +1197,13 @@ static ssize_t store_no_turbo(struct kobject *a, struct kobj_attribute *b, static void update_qos_request(enum freq_qos_req_type type) { - int max_state, turbo_max, freq, i, perf_pct; struct freq_qos_request *req; struct cpufreq_policy *policy; + int i; for_each_possible_cpu(i) { struct cpudata *cpu = all_cpu_data[i]; + unsigned int freq, perf_pct; policy = cpufreq_cpu_get(i); if (!policy) @@ -1213,9 +1216,7 @@ static void update_qos_request(enum freq_qos_req_type type) continue; if (hwp_active) - intel_pstate_get_hwp_max(cpu, &turbo_max, &max_state); - else - turbo_max = cpu->pstate.turbo_pstate; + intel_pstate_get_hwp_cap(cpu); if (type == FREQ_QOS_MIN) { perf_pct = global.min_perf_pct; @@ -1224,8 +1225,7 @@ static void update_qos_request(enum freq_qos_req_type type) perf_pct = global.max_perf_pct; } - freq = DIV_ROUND_UP(turbo_max * perf_pct, 100); - freq *= cpu->pstate.scaling; + freq = DIV_ROUND_UP(cpu->pstate.turbo_freq * perf_pct, 100); if (freq_qos_update_request(req, freq) < 0) pr_warn("Failed to update freq constraint: CPU%d\n", i); @@ -1715,21 +1715,17 @@ static void intel_pstate_get_cpu_pstates(struct cpudata *cpu) { cpu->pstate.min_pstate = pstate_funcs.get_min(); cpu->pstate.max_pstate_physical = pstate_funcs.get_max_physical(); - cpu->pstate.turbo_pstate = pstate_funcs.get_turbo(); cpu->pstate.scaling = pstate_funcs.get_scaling(); if (hwp_active && !hwp_mode_bdw) { - unsigned int phy_max, current_max; - - intel_pstate_get_hwp_max(cpu, &phy_max, ¤t_max); - cpu->pstate.turbo_freq = phy_max * cpu->pstate.scaling; - cpu->pstate.turbo_pstate = phy_max; - cpu->pstate.max_pstate = HWP_GUARANTEED_PERF(READ_ONCE(cpu->hwp_cap_cached)); + __intel_pstate_get_hwp_cap(cpu); } else { - cpu->pstate.turbo_freq = cpu->pstate.turbo_pstate * cpu->pstate.scaling; cpu->pstate.max_pstate = pstate_funcs.get_max(); + cpu->pstate.turbo_pstate = pstate_funcs.get_turbo(); } + cpu->pstate.max_freq = cpu->pstate.max_pstate * cpu->pstate.scaling; + cpu->pstate.turbo_freq = cpu->pstate.turbo_pstate * cpu->pstate.scaling; if (pstate_funcs.get_aperf_mperf_shift) cpu->aperf_mperf_shift = pstate_funcs.get_aperf_mperf_shift(); @@ -2199,41 +2195,34 @@ static void intel_pstate_update_perf_limits(struct cpudata *cpu, unsigned int policy_min, unsigned int policy_max) { + int scaling = cpu->pstate.scaling; int32_t max_policy_perf, min_policy_perf; - int max_state, turbo_max; - int max_freq; /* - * HWP needs some special consideration, because on BDX the - * HWP_REQUEST uses abstract value to represent performance - * rather than pure ratios. + * HWP needs some special consideration, because HWP_REQUEST uses + * abstract values to represent performance rather than pure ratios. */ - if (hwp_active) { - intel_pstate_get_hwp_max(cpu, &turbo_max, &max_state); - } else { - max_state = global.no_turbo || global.turbo_disabled ? - cpu->pstate.max_pstate : cpu->pstate.turbo_pstate; - turbo_max = cpu->pstate.turbo_pstate; - } - max_freq = max_state * cpu->pstate.scaling; + if (hwp_active) + intel_pstate_get_hwp_cap(cpu); - max_policy_perf = max_state * policy_max / max_freq; + max_policy_perf = policy_max / scaling; if (policy_max == policy_min) { min_policy_perf = max_policy_perf; } else { - min_policy_perf = max_state * policy_min / max_freq; + min_policy_perf = policy_min / scaling; min_policy_perf = clamp_t(int32_t, min_policy_perf, 0, max_policy_perf); } - pr_debug("cpu:%d max_state %d min_policy_perf:%d max_policy_perf:%d\n", - cpu->cpu, max_state, min_policy_perf, max_policy_perf); + pr_debug("cpu:%d min_policy_perf:%d max_policy_perf:%d\n", + cpu->cpu, min_policy_perf, max_policy_perf); /* Normalize user input to [min_perf, max_perf] */ if (per_cpu_limits) { cpu->min_perf_ratio = min_policy_perf; cpu->max_perf_ratio = max_policy_perf; } else { + int turbo_max = cpu->pstate.turbo_pstate; int32_t global_min, global_max; /* Global limits are in percent of the maximum turbo P-state. */ @@ -2322,10 +2311,9 @@ static void intel_pstate_verify_cpu_policy(struct cpudata *cpu, update_turbo_state(); if (hwp_active) { - int max_state, turbo_max; - - intel_pstate_get_hwp_max(cpu, &turbo_max, &max_state); - max_freq = max_state * cpu->pstate.scaling; + intel_pstate_get_hwp_cap(cpu); + max_freq = global.no_turbo || global.turbo_disabled ? + cpu->pstate.max_freq : cpu->pstate.turbo_freq; } else { max_freq = intel_pstate_get_max_freq(cpu); } @@ -2416,25 +2404,15 @@ static int __intel_pstate_cpu_init(struct cpufreq_policy *policy) cpu->max_perf_ratio = 0xFF; cpu->min_perf_ratio = 0; - policy->min = cpu->pstate.min_pstate * cpu->pstate.scaling; - policy->max = cpu->pstate.turbo_pstate * cpu->pstate.scaling; - /* cpuinfo and default policy values */ policy->cpuinfo.min_freq = cpu->pstate.min_pstate * cpu->pstate.scaling; update_turbo_state(); global.turbo_disabled_mf = global.turbo_disabled; policy->cpuinfo.max_freq = global.turbo_disabled ? - cpu->pstate.max_pstate : cpu->pstate.turbo_pstate; - policy->cpuinfo.max_freq *= cpu->pstate.scaling; - - if (hwp_active) { - unsigned int max_freq; - - max_freq = global.turbo_disabled ? cpu->pstate.max_freq : cpu->pstate.turbo_freq; - if (max_freq < policy->cpuinfo.max_freq) - policy->cpuinfo.max_freq = max_freq; - } + + policy->min = policy->cpuinfo.min_freq; + policy->max = policy->cpuinfo.max_freq; intel_pstate_init_acpi_perf_limits(policy); @@ -2683,10 +2661,10 @@ static void intel_cpufreq_adjust_perf(unsigned int cpunum, static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy) { - int max_state, turbo_max, min_freq, max_freq, ret; struct freq_qos_request *req; struct cpudata *cpu; struct device *dev; + int ret, freq; dev = get_cpu_device(policy->cpu); if (!dev) @@ -2711,30 +2689,31 @@ static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy) if (hwp_active) { u64 value; - intel_pstate_get_hwp_max(cpu, &turbo_max, &max_state); policy->transition_delay_us = INTEL_CPUFREQ_TRANSITION_DELAY_HWP; + + intel_pstate_get_hwp_cap(cpu); + rdmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, &value); WRITE_ONCE(cpu->hwp_req_cached, value); + cpu->epp_cached = intel_pstate_get_epp(cpu, value); } else { - turbo_max = cpu->pstate.turbo_pstate; policy->transition_delay_us = INTEL_CPUFREQ_TRANSITION_DELAY; } - min_freq = DIV_ROUND_UP(turbo_max * global.min_perf_pct, 100); - min_freq *= cpu->pstate.scaling; - max_freq = DIV_ROUND_UP(turbo_max * global.max_perf_pct, 100); - max_freq *= cpu->pstate.scaling; + freq = DIV_ROUND_UP(cpu->pstate.turbo_freq * global.min_perf_pct, 100); ret = freq_qos_add_request(&policy->constraints, req, FREQ_QOS_MIN, - min_freq); + freq); if (ret < 0) { dev_err(dev, "Failed to add min-freq constraint (%d)\n", ret); goto free_req; } + freq = DIV_ROUND_UP(cpu->pstate.turbo_freq * global.max_perf_pct, 100); + ret = freq_qos_add_request(&policy->constraints, req + 1, FREQ_QOS_MAX, - max_freq); + freq); if (ret < 0) { dev_err(dev, "Failed to add max-freq constraint (%d)\n", ret); goto remove_min_req; diff --git a/drivers/cpufreq/s5pv210-cpufreq.c b/drivers/cpufreq/s5pv210-cpufreq.c index 69786e5bbf05..ad7d4f272ddc 100644 --- a/drivers/cpufreq/s5pv210-cpufreq.c +++ b/drivers/cpufreq/s5pv210-cpufreq.c @@ -91,7 +91,7 @@ static DEFINE_MUTEX(set_freq_lock); /* Use 800MHz when entering sleep mode */ #define SLEEP_FREQ (800 * 1000) -/* Tracks if cpu freqency can be updated anymore */ +/* Tracks if CPU frequency can be updated anymore */ static bool no_cpufreq_access; /* @@ -190,7 +190,7 @@ static u32 clkdiv_val[5][11] = { /* * This function set DRAM refresh counter - * accoriding to operating frequency of DRAM + * according to operating frequency of DRAM * ch: DMC port number 0 or 1 * freq: Operating frequency of DRAM(KHz) */ @@ -320,7 +320,7 @@ static int s5pv210_target(struct cpufreq_policy *policy, unsigned int index) /* * 3. DMC1 refresh count for 133Mhz if (index == L4) is - * true refresh counter is already programed in upper + * true refresh counter is already programmed in upper * code. 0x287@83Mhz */ if (!bus_speed_changing) @@ -378,7 +378,7 @@ static int s5pv210_target(struct cpufreq_policy *policy, unsigned int index) /* * 6. Turn on APLL * 6-1. Set PMS values - * 6-2. Wait untile the PLL is locked + * 6-2. Wait until the PLL is locked */ if (index == L0) writel_relaxed(APLL_VAL_1000, S5P_APLL_CON); @@ -390,7 +390,7 @@ static int s5pv210_target(struct cpufreq_policy *policy, unsigned int index) } while (!(reg & (0x1 << 29))); /* - * 7. Change souce clock from SCLKMPLL(667Mhz) + * 7. Change source clock from SCLKMPLL(667Mhz) * to SCLKA2M(200Mhz) in MFC_MUX and G3D MUX * (667/4=166)->(200/4=50)Mhz */ @@ -439,8 +439,8 @@ static int s5pv210_target(struct cpufreq_policy *policy, unsigned int index) } /* - * L4 level need to change memory bus speed, hence onedram clock divier - * and memory refresh parameter should be changed + * L4 level needs to change memory bus speed, hence ONEDRAM clock + * divider and memory refresh parameter should be changed */ if (bus_speed_changing) { reg = readl_relaxed(S5P_CLK_DIV6); |