summaryrefslogtreecommitdiff
path: root/drivers/cpufreq/intel_pstate.c
AgeCommit message (Collapse)AuthorFilesLines
2024-06-27Merge back cpufreq material for v6.11.Rafael J. Wysocki1-48/+52
2024-06-25cpufreq: intel_pstate: Replace boot_cpu_has()Srinivas Pandruvada1-2/+2
Replace boot_cpu_has() with cpu_feature_enabled(). Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240624162714.1431182-1-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-24cpufreq: intel_pstate: Use HWP to initialize ITMT if CPPC is missingRafael J. Wysocki1-7/+6
It is reported that single-thread performance on some hybrid systems dropped significantly after commit 7feec7430edd ("ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked") which prevented _CPC from being used if the support for it had not been confirmed by the platform firmware. The problem is that if the platform firmware does not confirm CPPC v2 support, cppc_get_perf_caps() returns an error which prevents the intel_pstate driver from enabling ITMT. Consequently, the scheduler does not get any hints on CPU performance differences, so in a hybrid system some tasks may run on CPUs with lower capacity even though they should be running on high-capacity CPUs. To address this, modify intel_pstate to use the information from MSR_HWP_CAPABILITIES to enable ITMT if CPPC is not available (which is done already if the highest performance number coming from CPPC is not realistic). Fixes: 7feec7430edd ("ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked") Closes: https://lore.kernel.org/linux-acpi/d01b0a1f-bd33-47fe-ab41-43843d8a374f@kfocus.org Link: https://lore.kernel.org/linux-acpi/ZnD22b3Br1ng7alf@kf-XE Reported-by: Aaron Rainbolt <arainbolt@kfocus.org> Tested-by: Aaron Rainbolt <arainbolt@kfocus.org> Cc: 5.19+ <stable@vger.kernel.org> # 5.19+ Link: https://patch.msgid.link/12460110.O9o76ZdvQC@rjwysocki.net Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
2024-06-19cpufreq: intel_pstate: Update Lunar Lake hybrid scaling factorSrinivas Pandruvada1-0/+2
Change hybrid scaling factor for Lunar Lake. Scaling factor is 1.15 for P-cores compared to E-cores. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240618055221.446108-2-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-19cpufreq: intel_pstate: Update Arrow Lake hybrid scaling factorSrinivas Pandruvada1-0/+1
Arrow Lake uses the same scaling factor as Meteor Lake, so reuse the same scaling factor. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240618055221.446108-1-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-14Merge back new cpufreq material for v6.11.Rafael J. Wysocki1-46/+47
2024-06-12cpufreq: intel_pstate: Check turbo_is_disabled() in store_no_turbo()Rafael J. Wysocki1-7/+12
After recent changes in intel_pstate, global.turbo_disabled is only set at the initialization time and never changed. However, it turns out that on some systems the "turbo disabled" bit in MSR_IA32_MISC_ENABLE, the initial state of which is reflected by global.turbo_disabled, can be flipped later and there should be a way to take that into account (other than checking that MSR every time the driver runs which is costly and useless overhead on the vast majority of systems). For this purpose, notice that before the changes in question, store_no_turbo() contained a turbo_is_disabled() check that was used for updating global.turbo_disabled if the "turbo disabled" bit in MSR_IA32_MISC_ENABLE had been flipped and that functionality can be restored. Then, users will be able to reset global.turbo_disabled by writing 0 to no_turbo which used to work before on systems with flipping "turbo disabled" bit. This guarantees the driver state to remain in sync, but READ_ONCE() annotations need to be added in two places where global.turbo_disabled is accessed locklessly, so modify the driver to make that happen. Fixes: 0940f1a8011f ("cpufreq: intel_pstate: Do not update global.turbo_disabled after initialization") Closes: https://lore.kernel.org/linux-pm/bf3ebf1571a4788e97daf861eb493c12d42639a3.camel@xry111.site Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reported-by: Xi Ruoyao <xry111@xry111.site> Tested-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07cpufreq: intel_pstate: Support Emerald Rapids OOB modeSrinivas Pandruvada1-0/+1
Prevent intel_pstate from loading when OOB (Out Of Band) P-states mode is enabled in Emerald Rapids. The OOB identifying bits are same as for the prior generation CPUs like Sapphire Rapids servers, so also add Emerald Rapids to the intel_pstate_cpu_oob_ids[] list. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07cpufreq: intel_pstate: Use Meteor Lake EPPs for Arrow LakeSrinivas Pandruvada1-0/+2
Use the same default EPPs as Meteor Lake generation. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07cpufreq: intel_pstate: Update Meteor Lake EPPsSrinivas Pandruvada1-1/+1
Update the default balance_performance EPP to 64. This gives better performance and also perf/watt compared to current value of 115. For example: Speedometer 2.1 score: +19% Perf/watt: +5.25% Webxprt 4 score score: +12% Perf/watt: +6.12% 3DMark Wildlife extreme unlimited score score: +3.2% Perf/watt: +11.5% Geekbench6 MT score: +2.14% Perf/watt: +0.32% Also update balance_power EPP default to 179. With this change: Video Playback power is reduced by 52% Team video conference power is reduced by 35% With Power profile daemon now sets balance_power EPP on DC instead of balance_performance, updating balance_power EPP will help to extend battery life. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07cpufreq: intel_pstate: Switch to new Intel CPU model definesTony Luck1-46/+44
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-03cpufreq: intel_pstate: Fix unchecked HWP MSR accessSrinivas Pandruvada1-1/+2
Fix unchecked MSR access error for processors with no HWP support. On such processors, maximum frequency can be changed by the system firmware using ACPI event ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED. This results in accessing HWP MSR 0x771. Call Trace: <TASK> generic_exec_single+0x58/0x120 smp_call_function_single+0xbf/0x110 rdmsrl_on_cpu+0x46/0x60 intel_pstate_get_hwp_cap+0x1b/0x70 intel_pstate_update_limits+0x2a/0x60 acpi_processor_notify+0xb7/0x140 acpi_ev_notify_dispatch+0x3b/0x60 HWP MSR 0x771 can be only read on a CPU which supports HWP and enabled. Hence intel_pstate_get_hwp_cap() can only be called when hwp_active is true. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Closes: https://lore.kernel.org/linux-pm/20240529155740.Hq2Hw7be@linutronix.de/ Fixes: e8217b4bece3 ("cpufreq: intel_pstate: Update the maximum CPU frequency consistently") Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-07cpufreq: intel_pstate: fix struct cpudata::epp_cached kernel-docJeff Johnson1-1/+1
make C=1 currently gives the following warning: drivers/cpufreq/intel_pstate.c:262: warning: Function parameter or struct member 'epp_cached' not described in 'cpudata' Add the missing ":" to fix the trivial kernel-doc syntax error. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-03cpufreq: intel_pstate: hide unused intel_pstate_cpu_oob_ids[]Arnd Bergmann1-0/+2
The reference to this variable is hidden in an #ifdef: drivers/cpufreq/intel_pstate.c:2440:32: error: 'intel_pstate_cpu_oob_ids' defined but not used [-Werror=unused-const-variable=] Use the same check around the definition. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02cpufreq: intel_pstate: Update the maximum CPU frequency consistentlyRafael J. Wysocki1-5/+18
There are 3 places at which the maximum CPU frequency may change, store_no_turbo(), intel_pstate_update_limits() (when called by the cpufreq core) and intel_pstate_notify_work() (when handling a HWP change notification). Currently, cpuinfo.max_freq is only updated by store_no_turbo() and intel_pstate_notify_work(), although it principle it may be necessary to update it in intel_pstate_update_limits() either. Make all of them mutually consistent. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02cpufreq: intel_pstate: Replace three global.turbo_disabled checksRafael J. Wysocki1-4/+5
Replace the global.turbo_disabled in __intel_pstate_update_max_freq() with a global.no_turbo one to make store_no_turbo() actually update the maximum CPU frequency on the trubo preference changes, which needs to be consistent with arch_set_max_freq_ratio() called from there. For more consistency, replace the global.turbo_disabled checks in __intel_pstate_cpu_init() and intel_cpufreq_adjust_perf() with global.no_turbo checks either. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02cpufreq: intel_pstate: Read global.no_turbo under READ_ONCE()Rafael J. Wysocki1-6/+6
Because global.no_turbo is generally not read under intel_pstate_driver_lock make store_no_turbo() use WRITE_ONCE() for updating it (this is the only place at which it is updated except for the initialization) and make the majority of places reading it use READ_ONCE(). Also remove redundant global.turbo_disabled checks from places that depend on the 'true' value of global.no_turbo because it can only be 'true' if global.turbo_disabled is also 'true'. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02cpufreq: intel_pstate: Rearrange show_no_turbo() and store_no_turbo()Rafael J. Wysocki1-16/+18
Now that global.turbo_disabled can only change at the cpufreq driver registration time, initialize global.no_turbo at that time too so they are in sync to start with (if the former is set, the latter cannot be updated later anyway). That allows show_no_turbo() to be simlified because it does not need to check global.turbo_disabled and store_no_turbo() can be rearranged to avoid doing anything if the new value of global.no_turbo is equal to the current one and only return an error on attempts to clear global.no_turbo when global.turbo_disabled. While at it, eliminate the redundant ret variable from store_no_turbo(). No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02cpufreq: intel_pstate: Do not update global.turbo_disabled after initializationRafael J. Wysocki1-43/+8
The global.turbo_disabled is updated quite often, especially in the passive mode in which case it is updated every time the scheduler calls into the driver. However, this is generally not necessary and it adds MSR read overhead to scheduler code paths (and that particular MSR is slow to read). For this reason, make the driver read MSR_IA32_MISC_ENABLE_TURBO_DISABLE just once at the cpufreq driver registration time and remove all of the in-flight updates of global.turbo_disabled. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02cpufreq: intel_pstate: Fold intel_pstate_max_within_limits() into callerRafael J. Wysocki1-9/+4
Fold intel_pstate_max_within_limits() into its only caller. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02cpufreq: intel_pstate: Use __ro_after_init for three variablesRafael J. Wysocki1-3/+3
There are at least 3 variables in intel_pstate that do not get updated after they have been initialized, so annotate them with __ro_after_init. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02cpufreq: intel_pstate: Get rid of unnecessary READ_ONCE() annotationsRafael J. Wysocki1-22/+5
Drop two redundant checks involving READ_ONCE() from notify_hwp_interrupt() and make it check hwp_active without READ_ONCE() which is not necessary, because that variable is only set once during the early initialization of the driver. In order to make that clear, annotate hwp_active with __ro_after_init. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02cpufreq: intel_pstate: Wait for canceled delayed work to completeRafael J. Wysocki1-2/+6
Make intel_pstate_disable_hwp_interrupt() wait for canceled delayed work to complete to avoid leftover work items running when it returns which may be during driver unregistration and may confuse things going forward. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02cpufreq: intel_pstate: Simplify spinlock lockingRafael J. Wysocki1-8/+4
Because intel_pstate_enable/disable_hwp_interrupt() are only called from thread context, they need not save the IRQ flags when using a spinlock as interrupts are guaranteed to be enabled when they run, so make them use spin_lock/unlock_irq(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02cpufreq: intel_pstate: Drop redundant locking from intel_pstate_driver_cleanup()Rafael J. Wysocki1-2/+0
Remove the spinlock locking from intel_pstate_driver_cleanup() as it is not necessary because no other code accessing all_cpu_data[] can run in parallel with that function. Had the locking been necessary, though, it would have been incorrect because the lock in question is acquired from a hardirq handler and it cannot be acquired from thread context without disabling interrupts. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-11Merge back cpufreq material for 6.9-rc1.Rafael J. Wysocki1-9/+37
2024-02-24cpufreq: intel_pstate: Update default EPPs for Meteor LakeSrinivas Pandruvada1-0/+2
Update default balanced_performance EPP to 115 and performance EPP to 16. Changing the balanced_performance EPP has better performance/watt compared to default powerup EPP value of 128. Changing the performance EPP to 0x10 shows reduced power for similar performance as EPP 0. On small form factor devices this is beneficial as lower power results in lower CPU and skin temperature. This results in reduced thermal throttling and higher performance. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24cpufreq: intel_pstate: Allow model specific EPPsSrinivas Pandruvada1-6/+35
The current implementation allows model specific EPP override for balanced_performance. Add feature to allow model specific EPP for all predefined EPP strings. For example for some CPU models, even changing performance EPP has benefits Use a mask of EPPs as driver_data instead of just balanced_performance. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call backDoug Smythies1-0/+3
There is a loophole in pstate limit clamping for the intel_cpufreq CPU frequency scaling driver (intel_pstate in passive mode), schedutil CPU frequency scaling governor, HWP (HardWare Pstate) control enabled, when the adjust_perf call back path is used. Fix it. Fixes: a365ab6b9dfb cpufreq: intel_pstate: Implement the ->adjust_perf() callback Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13cpufreq: intel_pstate: remove cpudata::prev_cummulative_iowaitJiri Slaby (SUSE)1-3/+0
Commit 09c448d3c61f ("cpufreq: intel_pstate: Use IOWAIT flag in Atom algorithm") removed the last user of cpudata::prev_cummulative_iowait. Remove the member too. Found by https://github.com/jirislaby/clang-struct. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-22cpufreq: intel_pstate: Refine computation of P-state for given frequencyRafael J. Wysocki1-21/+34
On systems using HWP, if a given frequency is equal to the maximum turbo frequency or the maximum non-turbo frequency, the HWP performance level corresponding to it is already known and can be used directly without any computation. Accordingly, adjust the code to use the known HWP performance levels in the cases mentioned above. This also helps to avoid limiting CPU capacity artificially in some cases when the BIOS produces the HWP_CAP numbers using a different E-core-to-P-core performance scaling factor than expected by the kernel. Fixes: f5c8cf2a4992 ("cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores") Cc: 6.1+ <stable@vger.kernel.org> # 6.1+ Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-10cpufreq: intel_pstate: Update hybrid scaling factor for Meteor LakeSrinivas Pandruvada1-3/+18
On some Meteor Lake platforms, maximum one core turbo frequency is not observed. During hybrid performance to frequency conversion, the maximum frequency is 100 MHz less. This results in requesting maximum frequency 100 MHz less. For example when the max one core turbo is 4.9 GHz: MSR HWP_CAPABILITIES shows highest performance ratio for P-core is 0x3E. With the current scaling factor of 78741 (1.27x for converting frequency to performance) results in max frequency of 4.8 GHz. This results in capping the max scaling frequency as 4.8 GHz, which is 100 MHz less than the desired. Add capability to define per CPU model specific scaling factor and define scaling factor of 80000 (1.25x for converting frequency to performance for P-cores) for Meteor Lake. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Debug message adjustment, subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-19cpufreq: intel_pstate: Add Emerald Rapids support in no-HWP modeZhenguo Yao1-0/+1
Users may disable HWP in firmware, in which case intel_pstate will give up unless the CPU model is explicitly supported. See also the following past commits: - commit df51f287b5de ("cpufreq: intel_pstate: Add Sapphire Rapids support in no-HWP mode") - commit d8de7a44e11f ("cpufreq: intel_pstate: Add Skylake servers support") - commit 706c5328851d ("cpufreq: intel_pstate: Add Cometlake support in no-HWP mode") - commit fbdc21e9b038 ("cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode") - commit 71bb5c82aaae ("cpufreq: intel_pstate: Add Tigerlake support in no-HWP mode") Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-05cpufreq: intel_pstate: Prioritize firmware-provided balance performance EPPSrinivas Pandruvada1-7/+7
The platform firmware can provide a balance performance EPP value by enabling HWP and programming the EPP to the desired value. However, currently this only takes effect for processors listed in intel_epp_balance_perf[], so in order to enable a new processor model to utilize this mechanism, that table needs to be updated. It arguably should not be necessary to modify the kernel to work properly with every new generation of processors, though, and distributions that don't always ship the most recent kernels should be able to run reasonably well on new hardware without code changes. For this reason, move the check to avoid updating the EPP when the balance performance EPP is unmodified from the power-up default of 0x80 after the check that allows the firmware-provided balance performance EPP value to be retrieved. This will cause the code to always look for the firmware- provided value before consulting intel_epp_balance_perf[] and the handling of new hardware will not depend on whether or not that thable has been updated yet. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-11cpufreq: intel_pstate: Revise global turbo disable checkSrinivas Pandruvada1-5/+1
Setting global turbo flag based on CPU 0 P-state limits is problematic as it limits max P-state request on every CPU on the system just based on its P-state limits. There are two cases in which global.turbo_disabled flag is set: - When the MSR_IA32_MISC_ENABLE_TURBO_DISABLE bit is set to 1 in the MSR MSR_IA32_MISC_ENABLE. This bit can be only changed by the system BIOS before power up. - When the max non turbo P-state is same as max turbo P-state for CPU 0. The second check is not a valid to decide global turbo state based on the CPU 0. CPU 0 max turbo P-state can be same as max non turbo P-state, but for other CPUs this may not be true. There is no guarantee that max P-state limits are same for every CPU. This is possible that during fusing max P-state for a CPU is constrained. Also with the Intel Speed Select performance profile, CPU 0 may not be present in all profiles. In this case the max non turbo and turbo P-state can be set to the lowest possible P-state by the hardware when switched to such profile. Since max non turbo and turbo P-state is same, global.turbo_disabled flag will be set. Once global.turbo_disabled is set, any scaling max and min frequency update for any CPU will result in its max P-state constrained to the max non turbo P-state. Hence remove the check of max non turbo P-state equal to max turbo P-state of CPU 0 to set global turbo disabled flag. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-21cpufreq: intel_pstate: set stale CPU frequency to minimumDoug Smythies1-0/+5
The intel_pstate CPU frequency scaling driver does not use policy->cur and it is 0. When the CPU frequency is outdated arch_freq_get_on_cpu() will default to the nominal clock frequency when its call to cpufreq_quick_getpolicy_cur returns the never updated 0. Thus, the listed frequency might be outside of currently set limits. Some users are complaining about the high reported frequency, albeit stale, when their system is idle and/or it is above the reduced maximum they have set. This patch will maintain policy_cur for the intel_pstate driver at the current minimum CPU frequency. Reported-by: Yang Jie <yang.jie@linux.intel.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217597 Signed-off-by: Doug Smythies <dsmythies@telus.net> [ rjw: White space damage fixes and comment adjustment ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-30cpufreq: intel_pstate: Fix scaling for hybrid-capable systems with disabled ↵Srinivas Pandruvada1-10/+48
E-cores Some system BIOS configuration may provide option to disable E-cores. As part of this change, CPUID feature for hybrid (Leaf 7 sub leaf 0, EDX[15] = 0) may not be set. But HWP performance limits will still be using a scaling factor like any other hybrid enabled system. The current check for applying scaling factor will fail when hybrid CPUID feature is not set and the only way to make sure that scaling should be applied by checking CPPC nominal frequency and nominal performance. First, or systems predating Alder Lake, the CPPC nominal frequency and nominal performance are 0, which can be used to distinguish those systems from hybrid systems with disabled E-cores. Second, if the CPPC nominal frequency and nominal performance are defined, which indicates the need to use a special scaling factor, and the nominal performance value multiplied by 100 is not equal to the nominal frequency one, use hybrid scaling factor. This can be done for all HWP systems without additional CPU model check. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject and changelog edits, removal of unneeded parens, comment edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-21cpufreq: intel_pstate: Fix energy_performance_preference for passiveTero Kristo1-0/+2
If the intel_pstate driver is set to passive mode, then writing the same value to the energy_performance_preference sysfs twice will fail. This is caused by the wrong return value used (index of the matched energy_perf_string), instead of the length of the passed in parameter. Fix by forcing the internal return value to zero when the same preference is passed in by user. This same issue is not present when active mode is used for the driver. Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled") Reported-by: Niklas Neronin <niklas.neronin@intel.com> Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27Merge tag 'driver-core-6.4-rc1' of ↵Linus Torvalds1-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ...
2023-03-17cpufreq: intel_pstate: Enable HWP IO boost for all serversSrinivas Pandruvada1-10/+1
The HWP IO boost results in slight improvements for IO performance on both Ice Lake and Sapphire Rapid servers. Currently there is a CPU model check for Skylake desktop and server along with the ACPI PM profile for performance and enterprise servers to enable IO boost. Remove the CPU model check, so that all current server models enable HWP IO boost by default. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-17cpufreq: move to use bus_get_dev_root()Greg Kroah-Hartman1-2/+5
Direct access to the struct bus_type dev_root pointer is going away soon so replace that with a call to bus_get_dev_root() instead, which is what it is there for. Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: linux-pm@vger.kernel.org Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20230313182918.1312597-3-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-03Merge tag 'pm-6.3-rc1-2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These update power capping (new hardware support and cleanup) and cpufreq (bug fixes, cleanups and intel_pstate adjustment for a new platform). Specifics: - Fix error handling in the apple-soc cpufreq driver (Dan Carpenter) - Change the log level of a message in the amd-pstate cpufreq driver so it is more visible to users (Kai-Heng Feng) - Adjust the balance_performance EPP value for Sapphire Rapids in the intel_pstate cpufreq driver (Srinivas Pandruvada) - Remove MODULE_LICENSE from 3 pieces of non-modular code (Nick Alcock) - Make a read-only kobj_type structure in the schedutil cpufreq governor constant (Thomas Weißschuh) - Add Add Power Limit4 support for Meteor Lake SoC to the Intel RAPL power capping driver (Sumeet Pawnikar)" * tag 'pm-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: apple-soc: Fix an IS_ERR() vs NULL check powercap: remove MODULE_LICENSE in non-modules cpufreq: intel_pstate: remove MODULE_LICENSE in non-modules powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC cpufreq: amd-pstate: remove MODULE_LICENSE in non-modules cpufreq: schedutil: make kobj_type structure constant cpufreq: amd-pstate: Let user know amd-pstate is disabled cpufreq: intel_pstate: Adjust balance_performance EPP for Sapphire Rapids
2023-02-28cpufreq: intel_pstate: remove MODULE_LICENSE in non-modulesNick Alcock1-1/+0
Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-23cpufreq: intel_pstate: Adjust balance_performance EPP for Sapphire RapidsSrinivas Pandruvada1-0/+1
While the majority of server OS distributions are deployed with the "performance" governor as the default, some distributions like Ubuntu use the "powersave" governor by default. While using the "powersave" governor in its default configuration on Sapphire Rapids systems leads to much lower power, the performance is lower by more than 25% for several workloads relative to the "performance" governor. A 37% difference has been reported by www.Phoronix.com [1]. This is a consequence of using a relatively high EPP value in the default configuration of the "powersave" governor and the performance can be made much closer to the "performance" governor's level by adjusting the default EPP value. Based on experiments, with EPP of 0x00, 0x10, 0x20, the performance delta between the "powersave" governor and the "performance" one is around 12%. However, the EPP of 0x20 reduces average power by 18% with respect to the lower EPP values. [Note that raising min_perf_pct in sysfs as high as 50% in addition to adjusting EPP does not improve the performance any further.] For this reason, change the EPP value corresponding to the the default balance_performance setting for Sapphire Rapids to 0x20, which is straightforward, because analogous default EPP adjustment has been applied to Alder Lake and there is a way to set the balance_performance EPP value in intel_pstate based on the processor model already. The goal here is to limit the mean performance delta between the "powersave" governor in the default configuration and the "performance" governor for a wide variety of server workloadsto to around 10-12%. For some bursty workloads, this delta can be still large, as the frequency ramp-up will still lag when the "powersave" governor is in use irrespective of the EPP setting, because the performance governor always requests the maximum possible frequency. Link: https://www.phoronix.com/review/centos-clear-spr/6 # [1] Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30cpufreq: intel_pstate: Drop ACPI _PSS states table patchingRafael J. Wysocki1-14/+0
After making acpi_processor_get_platform_limit() use the "no limit" value for its frequency QoS request when _PPC returns 0, it is not necessary to replace the frequency corresponding to the first _PSS return package entry with the maximum turbo frequency of the given CPU in intel_pstate_init_acpi_perf_limits() any more, so drop the code doing that along with the comment explaining it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-30cpufreq: intel_pstate: Add Sapphire Rapids support in no-HWP modeGiovanni Gherdovich1-0/+1
Users may disable HWP in firmware, in which case intel_pstate wouldn't load unless the CPU model is explicitly supported. See also the following past commits: commit d8de7a44e11f ("cpufreq: intel_pstate: Add Skylake servers support") commit 706c5328851d ("cpufreq: intel_pstate: Add Cometlake support in no-HWP mode") commit fbdc21e9b038 ("cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode") commit 71bb5c82aaae ("cpufreq: intel_pstate: Add Tigerlake support in no-HWP mode") Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-03cpufreq: intel_pstate: Allow EPP 0x80 setting by the firmwareSrinivas Pandruvada1-4/+5
With the "commit 3d13058ed2a6 ("cpufreq: intel_pstate: Use firmware default EPP")" the firmware can set an EPP, which driver will not overwrite. But the driver has a valid range check for: 0x40 > firmware epp < 0x80. Hence firmware can't specify EPP of 0x80. If the firmware didn't specify in the valid range, the driver has a hard coded EPP of 102. But some Chrome hardware vendors don't want this overwrite and wants to boot with chipset default EPP of 0x80 as this improves battery life. In this case they want to have capability to specify EPP of 0x80 via the firmware. This require the valid range to include 0x80 also. But here the valid range can't be simply extended to include 0x80 as this is the chipset default EPP. Even without any firmware specifying EPP, the chipset will always boot with EPP of 0x80. To make sure that firmware specified EPP of 0x80 and not by the chipset default, it will require additional check to make sure HWP was enabled by the firmware before boot. Only way the firmware can update EPP, is to enable HWP and update EPP via MSR_HWP_REQUEST. This driver already checks, if the HWP is enabled by the firmware. Use the same flag and extend valid range to include 0x80. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-25cpufreq: intel_pstate: hybrid: Use known scaling factor for P-coresRafael J. Wysocki1-54/+15
Commit 46573fd6369f ("cpufreq: intel_pstate: hybrid: Rework HWP calibration") attempted to use the information from CPPC (the nominal performance in particular) to obtain the scaling factor allowing the frequency to be computed if the HWP performance level of the given CPU is known or vice versa. However, it turns out that on some platforms this doesn't work, because the CPPC information on them does not align with the contents of the MSR_HWP_CAPABILITIES registers. This basically means that the only way to make intel_pstate work on all of the hybrid platforms to date is to use the observation that on all of them the scaling factor between the HWP performance levels and frequency for P-cores is 78741 (approximately 100000/1.27). For E-cores it is 100000, which is the same as for all of the non-hybrid "core" platforms and does not require any changes. Accordingly, make intel_pstate use 78741 as the scaling factor between HWP performance levels and frequency for P-cores on all hybrid platforms and drop the dependency of the HWP calibration code on CPPC. Fixes: 46573fd6369f ("cpufreq: intel_pstate: hybrid: Rework HWP calibration") Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 5.15+ <stable@vger.kernel.org> # 5.15+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-25cpufreq: intel_pstate: Read all MSRs on the target CPURafael J. Wysocki1-33/+33
Some of the MSR accesses in intel_pstate are carried out on the CPU that is running the code, but the values coming from them are used for the performance scaling of the other CPUs. This is problematic, for example, on hybrid platforms where MSR_TURBO_RATIO_LIMIT for P-cores and E-cores is different, so the values read from it on a P-core are generally not applicable to E-cores and the other way around. For this reason, make the driver access all MSRs on the target CPU on platforms using the "core" pstate_funcs callbacks which is the case for all of the hybrid platforms released to date. For this purpose, pass a CPU argument to the ->get_max(), ->get_max_physical(), ->get_min() and ->get_turbo() pstate_funcs callbacks and from there pass it to rdmsrl_on_cpu() or rdmsrl_safe_on_cpu() to access the MSR on the target CPU. Fixes: 46573fd6369f ("cpufreq: intel_pstate: hybrid: Rework HWP calibration") Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 5.15+ <stable@vger.kernel.org> # 5.15+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-10cpufreq: intel_pstate: Add Tigerlake support in no-HWP modeDoug Smythies1-0/+1
Users may disable HWP in firmware, in which case intel_pstate wouldn't load unless the CPU model is explicitly supported. Add TIGERLAKE to the list of CPUs that can register intel_pstate while not advertising the HWP capability. Without this change, an TIGERLAKE in no-HWP mode could only use the acpi_cpufreq frequency scaling driver. See also commits: d8de7a44e11f: cpufreq: intel_pstate: Add Skylake servers support fbdc21e9b038: cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode 706c5328851d: cpufreq: intel_pstate: Add Cometlake support in no-HWP mode Reported by: M. Cargi Ari <cagriari@pm.me> Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>