summaryrefslogtreecommitdiff
path: root/drivers/thermal/intel
AgeCommit message (Collapse)AuthorFilesLines
2023-03-11thermal: intel: BXT_PMIC: select REGMAP instead of depending on itRandy Dunlap1-1/+2
[ Upstream commit 1467fb960349dfa5e300658f1a409dde2cfb0c51 ] REGMAP is a hidden (not user visible) symbol. Users cannot set it directly thru "make *config", so drivers should select it instead of depending on it if they need it. Consistently using "select" or "depends on" can also help reduce Kconfig circular dependency issues. Therefore, change the use of "depends on REGMAP" to "select REGMAP". Fixes: b474303ffd57 ("thermal: add Intel BXT WhiskeyCove PMIC thermal driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11thermal: intel: quark_dts: fix error pointer dereferenceDan Carpenter1-10/+2
[ Upstream commit f1b930e740811d416de4d2074da48b6633a672c8 ] If alloc_soc_dts() fails, then we can just return. Trying to free "soc_dts" will lead to an Oops. Fixes: 8c1876939663 ("thermal: intel Quark SoC X1000 DTS thermal driver") Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10thermal: intel: powerclamp: Fix cur_state for multi package systemSrinivas Pandruvada1-4/+16
commit 8e47363588377e1bdb65e2b020b409cfb44dd260 upstream. The powerclamp cooling device cur_state shows actual idle observed by package C-state idle counters. But the implementation is not sufficient for multi package or multi die system. The cur_state value is incorrect. On these systems, these counters must be read from each package/die and somehow aggregate them. But there is no good method for aggregation. It was not a problem when explicit CPU model addition was required to enable intel powerclamp. In this way certain CPU models could have been avoided. But with the removal of CPU model check with the availability of Package C-state counters, the driver is loaded on most of the recent systems. For multi package/die systems, just show the actual target idle state, the system is trying to achieve. In powerclamp this is the user set state minus one. Also there is no use of starting a worker thread for polling package C-state counters and applying any compensation for multiple package or multiple die systems. Fixes: b721ca0d1927 ("thermal/powerclamp: remove cpu whitelist") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10thermal: intel: intel_pch: Add support for Wellsburg PCHTim Zimmermann1-0/+8
[ Upstream commit 40dc1929089fc844ea06d9f8bdb6211ed4517c2e ] Add the PCI ID for the Wellsburg C610 series chipset PCH. The driver can read the temperature from the Wellsburg PCH with only the PCI ID added and no other modifications. Signed-off-by: Tim Zimmermann <tim@linux4.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10thermal: intel: Fix unsigned comparison with less than zeroYang Li1-1/+1
[ Upstream commit e7fcfe67f9f410736b758969477b17ea285e8e6c ] The return value from the call to intel_tcc_get_tjmax() is int, which can be a negative error code. However, the return value is being assigned to an u32 variable 'tj_max', so making 'tj_max' an int. Eliminate the following warning: ./drivers/thermal/intel/intel_soc_dts_iosf.c:394:5-11: WARNING: Unsigned expression compared with zero: tj_max < 0 Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3637 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01thermal: intel: int340x: Add locking to int340x_thermal_get_trip_type()Rafael J. Wysocki1-3/+7
[ Upstream commit acd7e9ee57c880b99671dd99680cb707b7b5b0ee ] In order to prevent int340x_thermal_get_trip_type() from possibly racing with int340x_thermal_read_trips() invoked by int3403_notify() add locking to it in analogy with int340x_thermal_get_trip_temp(). Fixes: 6757a7abe47b ("thermal: intel: int340x: Protect trip temperature from concurrent updates") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01thermal: intel: int340x: Protect trip temperature from concurrent updatesSrinivas Pandruvada2-3/+16
commit 6757a7abe47bcb12cb2d45661067e182424b0ee3 upstream. Trip temperatures are read using ACPI methods and stored in the memory during zone initializtion and when the firmware sends a notification for change. This trip temperature is returned when the thermal core calls via callback get_trip_temp(). But it is possible that while updating the memory copy of the trips when the firmware sends a notification for change, thermal core is reading the trip temperature via the callback get_trip_temp(). This may return invalid trip temperature. To address this add a mutex to protect the invalid temperature reads in the callback get_trip_temp() and int340x_thermal_read_trips(). Fixes: 5fbf7f27fa3d ("Thermal/int340x: Add common thermal zone handler") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 5.0+ <stable@vger.kernel.org> # 5.0+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-12thermal: int340x: Add missing attribute for data rate baseSrinivas Pandruvada1-0/+4
commit b878d3ba9bb41cddb73ba4b56e5552f0a638daca upstream. Commit 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver")' added rfi_restriction_data_rate_base string, mmio details and documentation, but missed adding attribute to sysfs. Add missing sysfs attribute. Fixes: 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver") Cc: 5.11+ <stable@vger.kernel.org> # v5.11+ Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-15thermal: intel_powerclamp: Use first online CPU as control_cpuRafael J. Wysocki1-5/+1
Commit 68b99e94a4a2 ("thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash") fixed an issue related to using smp_processor_id() in preemptible context by replacing it with a pair of get_cpu()/put_cpu(), but what is needed there really is any online CPU and not necessarily the one currently running the code. Arguably, getting the one that's running the code in there is confusing. For this reason, simply give the control CPU role to the first online one which automatically will be CPU0 if it is online, so one check can be dropped from the code for an added benefit. Link: https://lore.kernel.org/linux-pm/20221011113646.GA12080@duo.ucw.cz/ Fixes: 68b99e94a4a2 ("thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Chen Yu <yu.c.chen@intel.com>
2022-09-24thermal: int340x: processor_thermal: Use module_pci_driver() macroShang XiaoJing2-24/+2
Since PCI provides helper macro module_pci_driver(), the module_init/exit code can be replaced with it. Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-21thermal: intel_powerclamp: Remove accounting for IRQ wakesSrinivas Pandruvada1-18/+3
There is a static variable "idle_wakeup_counter", which accounts for number of wake ups because of IRQs and take actions to compensate idle injection. This is now read and reset to 0, but never incremented. So all the usage of this counter for idle injection has no use. Also another static variable "reduce_irq", which depends on "idle_wakeup_counter", so remove usage of "reduce_irq" also. Commit feb6cd6a0f9f ("thermal/intel_powerclamp: stop sched tick in forced idle") replaced the local use of "mwait_idle_with_hints" with play_idle(). This removed possibility of updating "idle_wakeup_counter" without change in play_idle(). This change was made in Linux 4.10. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-21thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to ↵Srinivas Pandruvada1-2/+4
avoid crash When CPU 0 is offline and intel_powerclamp is used to inject idle, it generates kernel BUG: BUG: using smp_processor_id() in preemptible [00000000] code: bash/15687 caller is debug_smp_processor_id+0x17/0x20 CPU: 4 PID: 15687 Comm: bash Not tainted 5.19.0-rc7+ #57 Call Trace: <TASK> dump_stack_lvl+0x49/0x63 dump_stack+0x10/0x16 check_preemption_disabled+0xdd/0xe0 debug_smp_processor_id+0x17/0x20 powerclamp_set_cur_state+0x7f/0xf9 [intel_powerclamp] ... ... Here CPU 0 is the control CPU by default and changed to the current CPU, if CPU 0 offlined. This check has to be performed under cpus_read_lock(), hence the above warning. Use get_cpu() instead of smp_processor_id() to avoid this BUG. Suggested-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-03thermal: int340x_thermal: Consolidate priv->data_vault checksRafael J. Wysocki1-3/+2
It is sufficient to check priv->data_vault once in the error code path of int3400_thermal_probe(), so do that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-23thermal/int340x_thermal: handle data_vault when the value is ZERO_SIZE_PTRLee, Chun-Yi1-4/+5
In some case, the GDDV returns a package with a buffer which has zero length. It causes that kmemdup() returns ZERO_SIZE_PTR (0x10). Then the data_vault_read() got NULL point dereference problem when accessing the 0x10 value in data_vault. [ 71.024560] BUG: kernel NULL pointer dereference, address: 0000000000000010 This patch uses ZERO_OR_NULL_PTR() for checking ZERO_SIZE_PTR or NULL value in data_vault. Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-03thermal: intel: Add TCC cooling support for Alder Lake-N and Raptor Lake-PSumeet Pawnikar1-0/+2
Add Alder Lake-N and Raptor Lake-P to the list of processor models supported by the Intel TCC cooling driver. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-22intel: thermal: PCH: Drop ACPI_FADT_LOW_POWER_S0 checkRafael J. Wysocki1-8/+0
If ACPI_FADT_LOW_POWER_S0 is not set, this doesn't mean that low-power S0 idle is not usable. It merely means that using S3 on the given system is more beneficial from the energy saving perspective than using low-power S0 idle, as long as S3 is supported. Suspend-to-idle is still a valid suspend mode if ACPI_FADT_LOW_POWER_S0 is not set and the pm_suspend_via_firmware() check in pch_wpt_suspend() is sufficient to distinguish suspend-to-idle from S3, so drop the confusing ACPI_FADT_LOW_POWER_S0 check. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com>
2022-07-12thermal: intel: x86_pkg_temp_thermal: Drop duplicate 'is' from commentJiang Jian1-1/+1
There is an unexpected word 'is' in a comments that need to be dropped file: ./drivers/thermal/intel/x86_pkg_temp_thermal.c line: 108 * tj-max is is interesting because threshold is set relative to this changed to: * tj-max is interesting because threshold is set relative to this Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-01Merge tag 'thermal-5.19-rc5' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "Add a new CPU ID to the list of supported processors in the intel_tcc_cooling driver (Sumeet Pawnikar)" * tag 'thermal-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel_tcc_cooling: Add TCC cooling support for RaptorLake
2022-06-30thermal: intel_tcc_cooling: Add TCC cooling support for RaptorLakeSumeet Pawnikar1-0/+1
Add RaptorLake to the list of processor models supported by the Intel TCC cooling driver. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> [ rjw: Subject edits, new changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-30Merge tag 'thermal-5.19-rc1-2' of ↵Linus Torvalds2-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull additional thermal control update from Rafael Wysocki: "Add Meteor Lake PCI device ID to the int340x thermal control driver (Sumeet Pawnikar)" * tag 'thermal-5.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: int340x: Add Meteor Lake PCI device ID
2022-05-25thermal: int340x: Add Meteor Lake PCI device IDSumeet Pawnikar2-0/+2
Add Meteor Lake PCI ID for processor thermal device. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-25ACPI: DPTF: Support Meteor LakeSumeet Pawnikar2-0/+2
Add Meteor Lake ACPI IDs for DPTF devices. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-23Merge branches 'thermal-int340x', 'thermal-pch' and 'thermal-misc'Rafael J. Wysocki2-30/+37
Merge int340x thermal driver updates, PCH thermal driver updates and miscellaneous thermal control updates for 5.19-rc1: - Clean up _OSC handling in int340x (Davidlohr Bueso). - Improve overheat condition handling during suspend-to-idle in the Intel PCH thermal driver (Zhang Rui). - Use local ops instead of global ops in devfreq_cooling (Kant Fan). - Switch hisi_termal from CONFIG_PM_SLEEP guards to pm_sleep_ptr() (Hesham Almatary) * thermal-int340x: thermal: int340x: Clean up _OSC context init thermal: int340x: Consolidate freeing of acpi_buffer pointer thermal: int340x: Clean up unnecessary acpi_buffer pointer freeing * thermal-pch: thermal: intel: pch: improve the cooling delay log thermal: intel: pch: enhance overheat handling thermal: intel: pch: move cooling delay to suspend_noirq phase PM: wakeup: expose pm_wakeup_pending to modules * thermal-misc: thermal: devfreq_cooling: use local ops instead of global ops thermal: hisi_termal: Switch from CONFIG_PM_SLEEP guards to pm_sleep_ptr()
2022-05-23Merge back earlier thermal control updates for 5.19-rc1.Rafael J. Wysocki1-16/+32
2022-05-19thermal: intel: pch: improve the cooling delay logZhang Rui1-11/+20
Previously, during suspend, intel_pch_thermal driver logs for every cooling iteration, about the current PCH temperature and number of cooling iterations that have been tried, like below [ 100.955526] intel_pch_thermal 0000:00:14.2: CPU-PCH current temp [53C] higher than the threshold temp [50C], sleep 1 times for 100 ms duration [ 101.064156] intel_pch_thermal 0000:00:14.2: CPU-PCH current temp [53C] higher than the threshold temp [50C], sleep 2 times for 100 ms duration After changing the default delay_cnt to 600, in practice, it is common to see tens of the above messages if the system is suspended when PCH overheats. Thus, change this log message from dev_warn to dev_dbg because it is only useful when we want to check the temperature trend. At the same time, there is always a one-line message given by the driver with the patch applied, with below four possibilities. 1. PCH is cool, no cooling delay needed [ 1791.902853] intel_pch_thermal 0000:00:12.0: CPU-PCH is cool [48C] 2. PCH overheats and becomes cool after the cooling delays [ 1475.511617] intel_pch_thermal 0000:00:12.0: CPU-PCH is cool [49C] after 30700 ms delay 3. PCH still overheats after the overall cooling timeout [ 2250.157487] intel_pch_thermal 0000:00:12.0: CPU-PCH is hot [60C] after 60000 ms delay. S0ix might fail 4. PCH aborts cooling because of wakeup event detected during the delay [ 1933.639509] intel_pch_thermal 0000:00:12.0: Wakeup event detected, abort cooling Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-19thermal: intel: pch: enhance overheat handlingZhang Rui1-5/+8
Commit ef63b043ac86 ("thermal: intel: pch: fix S0ix failure due to PCH temperature above threshold") introduces delay loop mechanism that allows PCH temperature to go down below threshold during suspend so it won't block S0ix. And the default overall delay timeout is 1 second. However, in practice, we found that the time it takes to cool the PCH down below threshold highly depends on the initial PCH temperature when the delay starts, as well as the ambient temperature. And in some cases, the 1 second delay is not sufficient. As a result, the system stays in a shallower power state like PCx instead of S0ix, and drains the battery power, without user' notice. To make sure S0ix is not blocked by the PCH overheating, we 1. expand the default overall timeout to 60 seconds. 2. make sure the temperature is below threshold rather than equal to it. At the same time, as the cooling delay can be much longer and many wakeup events (ACPI Power Button press, USB mouse move, etc) becomes valid in the suspend_noirq phase, add detection of wakeup event so that the driver does not delay blindly when the system suspend is likely to abort soon. This patch may introduce longer suspend time, but only in the cases when the system overheats and Linux used to enter a shallower S2idle state, say, PCx instead of S0ix. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-19thermal: intel: pch: move cooling delay to suspend_noirq phaseZhang Rui1-2/+3
Move the PCH Thermal driver suspend callback to suspend_noirq to do cooling while the system is more quiescent. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-18thermal: intel: hfi: remove NULL check after container_of() callHaowen Bai1-2/+0
container_of() will never return NULL, so remove useless code. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-18Merge back earlier int340x driver changes for 5.19.Rafael J. Wysocki1-15/+9
2022-05-11thermal: int340x: Mode setting with new OS handshakeSrinivas Pandruvada1-16/+32
With the new OS handshake introduced by commit: "c7ff29763989 ("thermal: int340x: Update OS policy capability handshake")", the "enabled" thermal zone mode doesn't work in the same way as previously. The "enabled" mode fails with -EINVAL when the new handshake is used. To address this issue, when the new OS UUID mask is set: - When the mode is "enabled", return 0 as the firmware already has the latest policy mask. - When the mode is "disabled", update the firmware with the UUID mask of zero. This way, the firmware can take over the thermal control. Also reset the OS UUID mask, which allows user space to update with new set of policies. Fixes: c7ff29763989 ("thermal: int340x: Update OS policy capability handshake") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog edits, removed unneeded parens ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-05Merge back earlier int340x thermal driver changes for 5.19.Rafael J. Wysocki1-15/+9
2022-04-21thermal: int340x: Fix attr.show callback prototypeKees Cook1-2/+2
Control Flow Integrity (CFI) instrumentation of the kernel noticed that the caller, dev_attr_show(), and the callback, odvp_show(), did not have matching function prototypes, which would cause a CFI exception to be raised. Correct the prototype by using struct device_attribute instead of struct kobj_attribute. Reported-and-tested-by: Joao Moreira <joao@overdrivepizza.com> Link: https://lore.kernel.org/lkml/067ce8bd4c3968054509831fa2347f4f@overdrivepizza.com/ Fixes: 006f006f1e5c ("thermal/int340x_thermal: Export OEM vendor variables") Cc: 5.8+ <stable@vger.kernel.org> # 5.8+ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-05thermal: int340x: Clean up _OSC context initDavidlohr Bueso1-5/+2
Now that the UUID is already sanitized by the caller, lets trivially clean up some of the context arming. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Zhang Rui <rui.zhang@intel.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-05thermal: int340x: Consolidate freeing of acpi_buffer pointerDavidlohr Bueso1-8/+5
Introduce a single point of freeing/exit after ensuring no error in int3400_setup_gddv(). Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-05thermal: int340x: Clean up unnecessary acpi_buffer pointer freeingDavidlohr Bueso1-2/+2
It is the caller's responsibility to free only upon ACPI_SUCCESS. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Zhang Rui <rui.zhang@intel.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-18Merge branch 'thermal-hfi'Rafael J. Wysocki5-0/+623
Merge Intel Hardware Feedback Interface (HFI) thermal driver for 5.18-rc1 and update the intel-speed-select utility to support that driver. * thermal-hfi: tools/power/x86/intel-speed-select: v1.12 release tools/power/x86/intel-speed-select: HFI support tools/power/x86/intel-speed-select: OOB daemon mode thermal: intel: hfi: INTEL_HFI_THERMAL depends on NET thermal: netlink: Fix parameter type of thermal_genl_cpu_capability_event() stub thermal: intel: hfi: Notify user space for HFI events thermal: netlink: Add a new event to notify CPU capabilities change thermal: intel: hfi: Enable notification interrupt thermal: intel: hfi: Handle CPU hotplug events thermal: intel: hfi: Minimally initialize the Hardware Feedback Interface x86/cpu: Add definitions for the Intel Hardware Feedback Interface x86/Documentation: Describe the Intel Hardware Feedback Interface
2022-03-18Merge branches 'thermal-powerclamp', 'thermal-int340x' and 'thermal-docs'Rafael J. Wysocki3-72/+113
Merge powerclamp thermal driver changes, int340x thermal driver changes and thermal documentation changes for 5.18-rc1: - Don't use bitmap_weight() in end_power_clamp() in the powerclamp driver (Yury Norov). - Update the OS policy capabilities handshake in the int340x thermal driver (Srinivas Pandruvada). - Increase the policies bitmap size in int340x (Srinivas Pandruvada). - Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the int340x thermal driver (Rafael Wysocki). - Check for NULL after calling kmemdup() in int340x (Jiasheng Jiang). - Add Intel Dynamic Power and Thermal Framework (DPTF) kernel interface documentation (Srinivas Pandruvada). - Fix bullet list warning in the thermal documentation (Randy Dunlap). * thermal-powerclamp: thermal: intel_powerclamp: don't use bitmap_weight() in end_power_clamp() * thermal-int340x: thermal: int340x: Update OS policy capability handshake thermal: int340x: Increase bitmap size thermal: Replace acpi_bus_get_device() thermal: int340x: Check for NULL after calling kmemdup() * thermal-docs: Documentation: thermal: DPTF Documentation thermal: fix Documentation bullet list warning
2022-03-16thermal: int340x: Update OS policy capability handshakeSrinivas Pandruvada1-49/+97
Update the firmware with OS supported policies mask, so that firmware can relinquish its internal controls. Without this update several Tiger Lake laptops gets performance limited with in few seconds of executing in turbo region. The existing way of enumerating firmware policies via IDSP method and selecting policy by directly writing those policy UUIDS via _OSC method is not supported in newer generation of hardware. There is a new UUID "B23BA85D-C8B7-3542-88DE-8DE2FFCFD698" is defined for updating policy capabilities. As part of ACPI _OSC method: Arg0 - UUID: B23BA85D-C8B7-3542-88DE-8DE2FFCFD698 Arg1 - Rev ID: 1 Arg2 - Count: 2 Arg3 - Capability buffers: Array of Arg2 DWORDS DWORD1: As defined in the ACPI 5.0 Specification - Bit 0: Query Flag - Bits 1-3: Always 0 - Bits 4-31: Reserved DWORD2 and beyond: - Bit0: set to 1 to indicate Intel(R) Dynamic Tuning is active, 0 to indicate it is disabled and legacy thermal mechanism should be enabled. - Bit1: set to 1 to indicate Intel(R) Dynamic Tuning is controlling active cooling, 0 to indicate bios shall enable legacy thermal zone with active trip point. - Bit2: set to 1 to indicate Intel(R) Dynamic Tuning is controlling passive cooling, 0 to indicate bios shall enable legacy thermal zone with passive trip point. - Bit3: set to 1 to indicate Intel(R) Dynamic Tuning is handling critical trip point, 0 to indicate bios shall enable legacy thermal zone with critical trip point. - Bits 4:31: Reserved From sysfs interface, there is an existing interface to update policy UUID using attribute "current_uuid". User space can write the same UUID for ACTIVE, PASSIVE and CRITICAL policy. Driver converts these UUIDs to DWORD2 Bit 1 to Bit 3. When any of the policy is activated by user space it is assumed that dynamic tuning is active. For example $cd /sys/bus/platform/devices/INTC1040:00/uuids To support active policy $echo "3A95C389-E4B8-4629-A526-C52C88626BAE" > current_uuid To support passive policy $echo "42A441D6-AE6A-462b-A84B-4A8CE79027D3" > current_uuid To support critical policy $echo "97C68AE7-15FA-499c-B8C9-5DA81D606E0A" > current_uuid To check all the supported policies $cat current_uuid 3A95C389-E4B8-4629-A526-C52C88626BAE 42A441D6-AE6A-462b-A84B-4A8CE79027D3 97C68AE7-15FA-499c-B8C9-5DA81D606E0A To match the bit format for DWORD2, rearranged enum int3400_thermal_uuid and int3400_thermal_uuids[] by swapping current INT3400_THERMAL_ACTIVE and INT3400_THERMAL_PASSIVE_1. If the policies are enumerated via IDSP method then legacy method is used, if not the new method is used to update policy support. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-16thermal: int340x: Increase bitmap sizeSrinivas Pandruvada1-1/+1
The number of policies are 10, so can't be supported by the bitmap size of u8. Even though there are no platfoms with these many policies, but for correctness increase to u32. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Fixes: 16fc8eca1975 ("thermal/int340x_thermal: Add additional UUIDs") Cc: 5.1+ <stable@vger.kernel.org> # 5.1+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-28Merge back int340x thermal driver changes for v5.18.Rafael J. Wysocki2-16/+12
2022-02-24thermal: int340x: fix memory leak in int3400_notify()Chuansheng Liu1-0/+4
It is easy to hit the below memory leaks in my TigerLake platform: unreferenced object 0xffff927c8b91dbc0 (size 32): comm "kworker/0:2", pid 112, jiffies 4294893323 (age 83.604s) hex dump (first 32 bytes): 4e 41 4d 45 3d 49 4e 54 33 34 30 30 20 54 68 65 NAME=INT3400 The 72 6d 61 6c 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 rmal.kkkkkkkkkk. backtrace: [<ffffffff9c502c3e>] __kmalloc_track_caller+0x2fe/0x4a0 [<ffffffff9c7b7c15>] kvasprintf+0x65/0xd0 [<ffffffff9c7b7d6e>] kasprintf+0x4e/0x70 [<ffffffffc04cb662>] int3400_notify+0x82/0x120 [int3400_thermal] [<ffffffff9c8b7358>] acpi_ev_notify_dispatch+0x54/0x71 [<ffffffff9c88f1a7>] acpi_os_execute_deferred+0x17/0x30 [<ffffffff9c2c2c0a>] process_one_work+0x21a/0x3f0 [<ffffffff9c2c2e2a>] worker_thread+0x4a/0x3b0 [<ffffffff9c2cb4dd>] kthread+0xfd/0x130 [<ffffffff9c201c1f>] ret_from_fork+0x1f/0x30 Fix it by calling kfree() accordingly. Fixes: 38e44da59130 ("thermal: int3400_thermal: process "thermal table changed" event") Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-10thermal: intel: hfi: INTEL_HFI_THERMAL depends on NETRandy Dunlap1-0/+1
THERMAL_NETLINK depends on NET and since 'select' does not follow any dependency chain, INTEL_HFI_THERMAL also should depend on NET. Fix one Kconfig warning and 48 subsequent build errors: WARNING: unmet direct dependencies detected for THERMAL_NETLINK Depends on [n]: THERMAL [=y] && NET [=n] Selected by [y]: - INTEL_HFI_THERMAL [=y] && THERMAL [=y] && (X86 [=y] || X86_INTEL_QUARK [=n] || COMPILE_TEST [=y]) && CPU_SUP_INTEL [=y] && X86_THERMAL_VECTOR [=y] Fixes: bd30cdfd9bd7 ("thermal: intel: hfi: Notify user space for HFI events") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-04thermal: Replace acpi_bus_get_device()Rafael J. Wysocki1-16/+7
Replace acpi_bus_get_device() that is going to be dropped with acpi_fetch_acpi_dev(). No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-04thermal: intel_powerclamp: don't use bitmap_weight() in end_power_clamp()Yury Norov1-6/+3
Don't call bitmap_weight() if the following code can get by without it. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-04thermal: int340x: Check for NULL after calling kmemdup()Jiasheng Jiang1-0/+5
As the potential failure of the allocation, kmemdup() may return NULL. Then, 'bin_attr_data_vault.private' will be NULL, but 'bin_attr_data_vault.size' is not 0, which is not consistent. Therefore, it is better to check the return value of kmemdup() to avoid the confusion. Fixes: 0ba13c763aac ("thermal/int340x_thermal: Export GDDV") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03thermal: intel: hfi: Notify user space for HFI eventsSrinivas Pandruvada2-1/+75
When the hardware issues an HFI event, relay a notification to user space. This allows user space to respond by reading performance and efficiency of each CPU and take appropriate action. For example, when the performance and efficiency of a CPU is 0, user space can either offline the CPU or inject idle. Also, if user space notices a downward trend in performance, it may proactively adjust power limits to avoid future situations in which performance drops to 0. To avoid excessive notifications, the rate is limited by one HZ per event. To limit the netlink message size, send parameters for up to 16 CPUs in a single message. If there are more than 16 CPUs, issue as many messages as needed to notify the status of all CPUs. In the HFI specification, both performance and efficiency capabilities are defined in the [0, 255] range. The existing implementations of HFI hardware do not scale the maximum values to 255. Since userspace cares about capability values that are either 0 or show a downward/upward trend, this fact does not matter much. Relative changes in capabilities are enough. To comply with the thermal netlink ABI, scale both performance and efficiency capabilities to the [0, 1023] interval. Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03thermal: intel: hfi: Enable notification interruptRicardo Neri3-0/+122
When hardware wants to inform the operating system about updates in the HFI table, it issues a package-level thermal event interrupt. For this, hardware has new interrupt and status bits in the IA32_PACKAGE_THERM_ INTERRUPT and IA32_PACKAGE_THERM_STATUS registers. The existing thermal throttle driver already handles thermal event interrupts: it initializes the thermal vector of the local APIC as well as per-CPU and package-level interrupt reporting. It also provides routines to service such interrupts. Extend its functionality to also handle HFI interrupts. The frequency of the thermal HFI interrupt is specific to each processor model. On some processors, a single interrupt happens as soon as the HFI is enabled and hardware will never update HFI capabilities afterwards. On other processors, thermal and power constraints may cause thermal HFI interrupts every tens of milliseconds. To not overwhelm consumers of the HFI data, use delayed work to throttle the rate at which HFI updates are processed. Use a dedicated workqueue to not overload system_wq if hardware issues many HFI updates. Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03thermal: intel: hfi: Handle CPU hotplug eventsRicardo Neri3-0/+218
All CPUs in a package are represented in an HFI table. There exists an HFI table per package. Thus, CPUs in a package need to coordinate to initialize and access the table. Do such coordination during CPU hotplug. Use the first CPU to come online in a package to initialize the HFI instance and the data structure representing it. Other CPUs in the same package need only to register or unregister themselves in that data structure. The HFI depends on both the package-level thermal management and the local APIC thermal local vector. Thus, to ensure that a CPU coming online has an associated HFI instance when the hardware issues an HFI event, enable the HFI only after having enabled the local APIC thermal vector. The thermal throttle driver takes care of the needed package-level initialization. Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03thermal: intel: hfi: Minimally initialize the Hardware Feedback InterfaceRicardo Neri5-0/+208
The Intel Hardware Feedback Interface provides guidance to the operating system about the performance and energy efficiency capabilities of each CPU in the system. Capabilities are numbers between 0 and 255 where a higher number represents a higher capability. For each CPU, energy efficiency and performance are reported as separate capabilities. Hardware computes these capabilities based on the operating conditions of the system such as power and thermal limits. These capabilities are shared with the operating system in a table resident in memory. Each package in the system has its own HFI instance. Every logical CPU in the package is represented in the table. More than one logical CPUs may be represented in a single table entry. When the hardware updates the table, it generates a package-level thermal interrupt. The size and format of the HFI table depend on the supported features and can only be determined at runtime. To minimally initialize the HFI, parse its features and allocate one instance per package of a data structure with the necessary parameters to read and navigate a local copy (i.e., owned by the driver) of individual HFI tables. A subsequent changeset will provide per-CPU initialization and interrupt handling. Reviewed-by: Len Brown <len.brown@intel.com> Co-developed by: Aubrey Li <aubrey.li@linux.intel.com> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-17thermal: int340x: Add Raptor Lake PCI device idSrinivas Pandruvada2-0/+2
Add Raptor Lake PCI ID for processor thermal device. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>