summaryrefslogtreecommitdiff
path: root/drivers/thermal/intel
AgeCommit message (Collapse)AuthorFilesLines
2024-12-14thermal: int3400: Fix reading of current_uuid for active policySrinivas Pandruvada1-1/+1
commit 7082503622986537f57bdb5ef23e69e70cfad881 upstream. When the current_uuid attribute is set to the active policy UUID, reading back the same attribute is returning "INVALID" instead of the active policy UUID on some platforms before Ice Lake. In platforms before Ice Lake, firmware provides a list of supported thermal policies. In this case, user space can select any of the supported thermal policies via a write to attribute "current_uuid". In commit c7ff29763989 ("thermal: int340x: Update OS policy capability handshake")', the OS policy handshake was updated to support Ice Lake and later platforms and it treated priv->current_uuid_index=0 as invalid. However, priv->current_uuid_index=0 is for the active policy, only priv->current_uuid_index=-1 is invalid. Fix this issue by updating the priv->current_uuid_index check. Fixes: c7ff29763989 ("thermal: int340x: Update OS policy capability handshake") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 5.18+ <stable@vger.kernel.org> # 5.18+ Link: https://patch.msgid.link/20241114200213.422303-1-srinivas.pandruvada@linux.intel.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17thermal: intel: int340x: processor: Fix warning during module unloadZhang Rui1-2/+0
[ Upstream commit 99ca0b57e49fb73624eede1c4396d9e3d10ccf14 ] The processor_thermal driver uses pcim_device_enable() to enable a PCI device, which means the device will be automatically disabled on driver detach. Thus there is no need to call pci_disable_device() again on it. With recent PCI device resource management improvements, e.g. commit f748a07a0b64 ("PCI: Remove legacy pcim_release()"), this problem is exposed and triggers the warining below. [ 224.010735] proc_thermal_pci 0000:00:04.0: disabling already-disabled device [ 224.010747] WARNING: CPU: 8 PID: 4442 at drivers/pci/pci.c:2250 pci_disable_device+0xe5/0x100 ... [ 224.010844] Call Trace: [ 224.010845] <TASK> [ 224.010847] ? show_regs+0x6d/0x80 [ 224.010851] ? __warn+0x8c/0x140 [ 224.010854] ? pci_disable_device+0xe5/0x100 [ 224.010856] ? report_bug+0x1c9/0x1e0 [ 224.010859] ? handle_bug+0x46/0x80 [ 224.010862] ? exc_invalid_op+0x1d/0x80 [ 224.010863] ? asm_exc_invalid_op+0x1f/0x30 [ 224.010867] ? pci_disable_device+0xe5/0x100 [ 224.010869] ? pci_disable_device+0xe5/0x100 [ 224.010871] ? kfree+0x21a/0x2b0 [ 224.010873] pcim_disable_device+0x20/0x30 [ 224.010875] devm_action_release+0x16/0x20 [ 224.010878] release_nodes+0x47/0xc0 [ 224.010880] devres_release_all+0x9f/0xe0 [ 224.010883] device_unbind_cleanup+0x12/0x80 [ 224.010885] device_release_driver_internal+0x1ca/0x210 [ 224.010887] driver_detach+0x4e/0xa0 [ 224.010889] bus_remove_driver+0x6f/0xf0 [ 224.010890] driver_unregister+0x35/0x60 [ 224.010892] pci_unregister_driver+0x44/0x90 [ 224.010894] proc_thermal_pci_driver_exit+0x14/0x5f0 [processor_thermal_device_pci] ... [ 224.010921] ---[ end trace 0000000000000000 ]--- Remove the excess pci_disable_device() calls. Fixes: acd65d5d1cf4 ("thermal/drivers/int340x/processor_thermal: Add PCI MMIO based thermal driver") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240930081801.28502-3-rui.zhang@intel.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17thermal: int340x: processor_thermal: Set feature mask before proc_thermal_addSrinivas Pandruvada1-11/+10
[ Upstream commit 6ebc25d8b053a208786295bab58abbb66b39c318 ] The function proc_thermal_add() adds sysfs entries for power limits. The feature mask of available features is not present at that time, so it cannot be used by proc_thermal_add() to selectively create sysfs attributes. The feature mask is set by proc_thermal_mmio_add(), so modify the code to call it before proc_thermal_add() so as to allow the latter to use the feature mask. There is no functional impact with this change. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 99ca0b57e49f ("thermal: intel: int340x: processor: Fix warning during module unload") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01thermal: intel: hfi: Add syscore callbacks for system-wide PMRicardo Neri1-0/+28
[ Upstream commit 97566d09fd02d2ab329774bb89a2cdf2267e86d9 ] The kernel allocates a memory buffer and provides its location to the hardware, which uses it to update the HFI table. This allocation occurs during boot and remains constant throughout runtime. When resuming from hibernation, the restore kernel allocates a second memory buffer and reprograms the HFI hardware with the new location as part of a normal boot. The location of the second memory buffer may differ from the one allocated by the image kernel. When the restore kernel transfers control to the image kernel, its HFI buffer becomes invalid, potentially leading to memory corruption if the hardware writes to it (the hardware continues to use the buffer from the restore kernel). It is also possible that the hardware "forgets" the address of the memory buffer when resuming from "deep" suspend. Memory corruption may also occur in such a scenario. To prevent the described memory corruption, disable HFI when preparing to suspend or hibernate. Enable it when resuming. Add syscore callbacks to handle the package of the boot CPU (packages of non-boot CPUs are handled via CPU offline). Syscore ops always run on the boot CPU. Additionally, HFI only needs to be disabled during "deep" suspend and hibernation. Syscore ops only run in these cases. Cc: 6.1+ <stable@vger.kernel.org> # 6.1+ Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> [ rjw: Comment adjustment, subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01thermal: intel: hfi: Disable an HFI instance when all its CPUs go offlineRicardo Neri1-0/+35
[ Upstream commit 1c53081d773c2cb4461636559b0d55b46559ceec ] In preparation to support hibernation, add functionality to disable an HFI instance during CPU offline. The last CPU of an instance that goes offline will disable such instance. The Intel Software Development Manual states that the operating system must wait for the hardware to set MSR_IA32_PACKAGE_THERM_STATUS[26] after disabling an HFI instance to ensure that it will no longer write on the HFI memory. Some processors, however, do not ever set such bit. Wait a minimum of 2ms to give time hardware to complete any pending memory writes. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 97566d09fd02 ("thermal: intel: hfi: Add syscore callbacks for system-wide PM") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01thermal: intel: hfi: Refactor enabling code into helper functionsRicardo Neri1-21/+22
[ Upstream commit 8a8b6bb93c704776c4b05cb517c3fa8baffb72f5 ] In preparation for the addition of a suspend notifier, wrap the logic to enable HFI and program its memory buffer into helper functions. Both the CPU hotplug callback and the suspend notifier will use them. This refactoring does not introduce functional changes. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 97566d09fd02 ("thermal: intel: hfi: Add syscore callbacks for system-wide PM") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11thermal: intel: BXT_PMIC: select REGMAP instead of depending on itRandy Dunlap1-1/+2
[ Upstream commit 1467fb960349dfa5e300658f1a409dde2cfb0c51 ] REGMAP is a hidden (not user visible) symbol. Users cannot set it directly thru "make *config", so drivers should select it instead of depending on it if they need it. Consistently using "select" or "depends on" can also help reduce Kconfig circular dependency issues. Therefore, change the use of "depends on REGMAP" to "select REGMAP". Fixes: b474303ffd57 ("thermal: add Intel BXT WhiskeyCove PMIC thermal driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11thermal: intel: quark_dts: fix error pointer dereferenceDan Carpenter1-10/+2
[ Upstream commit f1b930e740811d416de4d2074da48b6633a672c8 ] If alloc_soc_dts() fails, then we can just return. Trying to free "soc_dts" will lead to an Oops. Fixes: 8c1876939663 ("thermal: intel Quark SoC X1000 DTS thermal driver") Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10thermal: intel: powerclamp: Fix cur_state for multi package systemSrinivas Pandruvada1-4/+16
commit 8e47363588377e1bdb65e2b020b409cfb44dd260 upstream. The powerclamp cooling device cur_state shows actual idle observed by package C-state idle counters. But the implementation is not sufficient for multi package or multi die system. The cur_state value is incorrect. On these systems, these counters must be read from each package/die and somehow aggregate them. But there is no good method for aggregation. It was not a problem when explicit CPU model addition was required to enable intel powerclamp. In this way certain CPU models could have been avoided. But with the removal of CPU model check with the availability of Package C-state counters, the driver is loaded on most of the recent systems. For multi package/die systems, just show the actual target idle state, the system is trying to achieve. In powerclamp this is the user set state minus one. Also there is no use of starting a worker thread for polling package C-state counters and applying any compensation for multiple package or multiple die systems. Fixes: b721ca0d1927 ("thermal/powerclamp: remove cpu whitelist") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10thermal: intel: intel_pch: Add support for Wellsburg PCHTim Zimmermann1-0/+8
[ Upstream commit 40dc1929089fc844ea06d9f8bdb6211ed4517c2e ] Add the PCI ID for the Wellsburg C610 series chipset PCH. The driver can read the temperature from the Wellsburg PCH with only the PCI ID added and no other modifications. Signed-off-by: Tim Zimmermann <tim@linux4.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10thermal: intel: Fix unsigned comparison with less than zeroYang Li1-1/+1
[ Upstream commit e7fcfe67f9f410736b758969477b17ea285e8e6c ] The return value from the call to intel_tcc_get_tjmax() is int, which can be a negative error code. However, the return value is being assigned to an u32 variable 'tj_max', so making 'tj_max' an int. Eliminate the following warning: ./drivers/thermal/intel/intel_soc_dts_iosf.c:394:5-11: WARNING: Unsigned expression compared with zero: tj_max < 0 Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3637 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01thermal: intel: int340x: Add locking to int340x_thermal_get_trip_type()Rafael J. Wysocki1-3/+7
[ Upstream commit acd7e9ee57c880b99671dd99680cb707b7b5b0ee ] In order to prevent int340x_thermal_get_trip_type() from possibly racing with int340x_thermal_read_trips() invoked by int3403_notify() add locking to it in analogy with int340x_thermal_get_trip_temp(). Fixes: 6757a7abe47b ("thermal: intel: int340x: Protect trip temperature from concurrent updates") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01thermal: intel: int340x: Protect trip temperature from concurrent updatesSrinivas Pandruvada2-3/+16
commit 6757a7abe47bcb12cb2d45661067e182424b0ee3 upstream. Trip temperatures are read using ACPI methods and stored in the memory during zone initializtion and when the firmware sends a notification for change. This trip temperature is returned when the thermal core calls via callback get_trip_temp(). But it is possible that while updating the memory copy of the trips when the firmware sends a notification for change, thermal core is reading the trip temperature via the callback get_trip_temp(). This may return invalid trip temperature. To address this add a mutex to protect the invalid temperature reads in the callback get_trip_temp() and int340x_thermal_read_trips(). Fixes: 5fbf7f27fa3d ("Thermal/int340x: Add common thermal zone handler") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 5.0+ <stable@vger.kernel.org> # 5.0+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-12thermal: int340x: Add missing attribute for data rate baseSrinivas Pandruvada1-0/+4
commit b878d3ba9bb41cddb73ba4b56e5552f0a638daca upstream. Commit 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver")' added rfi_restriction_data_rate_base string, mmio details and documentation, but missed adding attribute to sysfs. Add missing sysfs attribute. Fixes: 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver") Cc: 5.11+ <stable@vger.kernel.org> # v5.11+ Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-15thermal: intel_powerclamp: Use first online CPU as control_cpuRafael J. Wysocki1-5/+1
Commit 68b99e94a4a2 ("thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash") fixed an issue related to using smp_processor_id() in preemptible context by replacing it with a pair of get_cpu()/put_cpu(), but what is needed there really is any online CPU and not necessarily the one currently running the code. Arguably, getting the one that's running the code in there is confusing. For this reason, simply give the control CPU role to the first online one which automatically will be CPU0 if it is online, so one check can be dropped from the code for an added benefit. Link: https://lore.kernel.org/linux-pm/20221011113646.GA12080@duo.ucw.cz/ Fixes: 68b99e94a4a2 ("thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Chen Yu <yu.c.chen@intel.com>
2022-09-24thermal: int340x: processor_thermal: Use module_pci_driver() macroShang XiaoJing2-24/+2
Since PCI provides helper macro module_pci_driver(), the module_init/exit code can be replaced with it. Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-21thermal: intel_powerclamp: Remove accounting for IRQ wakesSrinivas Pandruvada1-18/+3
There is a static variable "idle_wakeup_counter", which accounts for number of wake ups because of IRQs and take actions to compensate idle injection. This is now read and reset to 0, but never incremented. So all the usage of this counter for idle injection has no use. Also another static variable "reduce_irq", which depends on "idle_wakeup_counter", so remove usage of "reduce_irq" also. Commit feb6cd6a0f9f ("thermal/intel_powerclamp: stop sched tick in forced idle") replaced the local use of "mwait_idle_with_hints" with play_idle(). This removed possibility of updating "idle_wakeup_counter" without change in play_idle(). This change was made in Linux 4.10. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-21thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to ↵Srinivas Pandruvada1-2/+4
avoid crash When CPU 0 is offline and intel_powerclamp is used to inject idle, it generates kernel BUG: BUG: using smp_processor_id() in preemptible [00000000] code: bash/15687 caller is debug_smp_processor_id+0x17/0x20 CPU: 4 PID: 15687 Comm: bash Not tainted 5.19.0-rc7+ #57 Call Trace: <TASK> dump_stack_lvl+0x49/0x63 dump_stack+0x10/0x16 check_preemption_disabled+0xdd/0xe0 debug_smp_processor_id+0x17/0x20 powerclamp_set_cur_state+0x7f/0xf9 [intel_powerclamp] ... ... Here CPU 0 is the control CPU by default and changed to the current CPU, if CPU 0 offlined. This check has to be performed under cpus_read_lock(), hence the above warning. Use get_cpu() instead of smp_processor_id() to avoid this BUG. Suggested-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-03thermal: int340x_thermal: Consolidate priv->data_vault checksRafael J. Wysocki1-3/+2
It is sufficient to check priv->data_vault once in the error code path of int3400_thermal_probe(), so do that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-23thermal/int340x_thermal: handle data_vault when the value is ZERO_SIZE_PTRLee, Chun-Yi1-4/+5
In some case, the GDDV returns a package with a buffer which has zero length. It causes that kmemdup() returns ZERO_SIZE_PTR (0x10). Then the data_vault_read() got NULL point dereference problem when accessing the 0x10 value in data_vault. [ 71.024560] BUG: kernel NULL pointer dereference, address: 0000000000000010 This patch uses ZERO_OR_NULL_PTR() for checking ZERO_SIZE_PTR or NULL value in data_vault. Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-03thermal: intel: Add TCC cooling support for Alder Lake-N and Raptor Lake-PSumeet Pawnikar1-0/+2
Add Alder Lake-N and Raptor Lake-P to the list of processor models supported by the Intel TCC cooling driver. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-22intel: thermal: PCH: Drop ACPI_FADT_LOW_POWER_S0 checkRafael J. Wysocki1-8/+0
If ACPI_FADT_LOW_POWER_S0 is not set, this doesn't mean that low-power S0 idle is not usable. It merely means that using S3 on the given system is more beneficial from the energy saving perspective than using low-power S0 idle, as long as S3 is supported. Suspend-to-idle is still a valid suspend mode if ACPI_FADT_LOW_POWER_S0 is not set and the pm_suspend_via_firmware() check in pch_wpt_suspend() is sufficient to distinguish suspend-to-idle from S3, so drop the confusing ACPI_FADT_LOW_POWER_S0 check. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com>
2022-07-12thermal: intel: x86_pkg_temp_thermal: Drop duplicate 'is' from commentJiang Jian1-1/+1
There is an unexpected word 'is' in a comments that need to be dropped file: ./drivers/thermal/intel/x86_pkg_temp_thermal.c line: 108 * tj-max is is interesting because threshold is set relative to this changed to: * tj-max is interesting because threshold is set relative to this Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-01Merge tag 'thermal-5.19-rc5' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "Add a new CPU ID to the list of supported processors in the intel_tcc_cooling driver (Sumeet Pawnikar)" * tag 'thermal-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel_tcc_cooling: Add TCC cooling support for RaptorLake
2022-06-30thermal: intel_tcc_cooling: Add TCC cooling support for RaptorLakeSumeet Pawnikar1-0/+1
Add RaptorLake to the list of processor models supported by the Intel TCC cooling driver. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> [ rjw: Subject edits, new changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-30Merge tag 'thermal-5.19-rc1-2' of ↵Linus Torvalds2-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull additional thermal control update from Rafael Wysocki: "Add Meteor Lake PCI device ID to the int340x thermal control driver (Sumeet Pawnikar)" * tag 'thermal-5.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: int340x: Add Meteor Lake PCI device ID
2022-05-25thermal: int340x: Add Meteor Lake PCI device IDSumeet Pawnikar2-0/+2
Add Meteor Lake PCI ID for processor thermal device. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-25ACPI: DPTF: Support Meteor LakeSumeet Pawnikar2-0/+2
Add Meteor Lake ACPI IDs for DPTF devices. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-23Merge branches 'thermal-int340x', 'thermal-pch' and 'thermal-misc'Rafael J. Wysocki2-30/+37
Merge int340x thermal driver updates, PCH thermal driver updates and miscellaneous thermal control updates for 5.19-rc1: - Clean up _OSC handling in int340x (Davidlohr Bueso). - Improve overheat condition handling during suspend-to-idle in the Intel PCH thermal driver (Zhang Rui). - Use local ops instead of global ops in devfreq_cooling (Kant Fan). - Switch hisi_termal from CONFIG_PM_SLEEP guards to pm_sleep_ptr() (Hesham Almatary) * thermal-int340x: thermal: int340x: Clean up _OSC context init thermal: int340x: Consolidate freeing of acpi_buffer pointer thermal: int340x: Clean up unnecessary acpi_buffer pointer freeing * thermal-pch: thermal: intel: pch: improve the cooling delay log thermal: intel: pch: enhance overheat handling thermal: intel: pch: move cooling delay to suspend_noirq phase PM: wakeup: expose pm_wakeup_pending to modules * thermal-misc: thermal: devfreq_cooling: use local ops instead of global ops thermal: hisi_termal: Switch from CONFIG_PM_SLEEP guards to pm_sleep_ptr()
2022-05-23Merge back earlier thermal control updates for 5.19-rc1.Rafael J. Wysocki1-16/+32
2022-05-19thermal: intel: pch: improve the cooling delay logZhang Rui1-11/+20
Previously, during suspend, intel_pch_thermal driver logs for every cooling iteration, about the current PCH temperature and number of cooling iterations that have been tried, like below [ 100.955526] intel_pch_thermal 0000:00:14.2: CPU-PCH current temp [53C] higher than the threshold temp [50C], sleep 1 times for 100 ms duration [ 101.064156] intel_pch_thermal 0000:00:14.2: CPU-PCH current temp [53C] higher than the threshold temp [50C], sleep 2 times for 100 ms duration After changing the default delay_cnt to 600, in practice, it is common to see tens of the above messages if the system is suspended when PCH overheats. Thus, change this log message from dev_warn to dev_dbg because it is only useful when we want to check the temperature trend. At the same time, there is always a one-line message given by the driver with the patch applied, with below four possibilities. 1. PCH is cool, no cooling delay needed [ 1791.902853] intel_pch_thermal 0000:00:12.0: CPU-PCH is cool [48C] 2. PCH overheats and becomes cool after the cooling delays [ 1475.511617] intel_pch_thermal 0000:00:12.0: CPU-PCH is cool [49C] after 30700 ms delay 3. PCH still overheats after the overall cooling timeout [ 2250.157487] intel_pch_thermal 0000:00:12.0: CPU-PCH is hot [60C] after 60000 ms delay. S0ix might fail 4. PCH aborts cooling because of wakeup event detected during the delay [ 1933.639509] intel_pch_thermal 0000:00:12.0: Wakeup event detected, abort cooling Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-19thermal: intel: pch: enhance overheat handlingZhang Rui1-5/+8
Commit ef63b043ac86 ("thermal: intel: pch: fix S0ix failure due to PCH temperature above threshold") introduces delay loop mechanism that allows PCH temperature to go down below threshold during suspend so it won't block S0ix. And the default overall delay timeout is 1 second. However, in practice, we found that the time it takes to cool the PCH down below threshold highly depends on the initial PCH temperature when the delay starts, as well as the ambient temperature. And in some cases, the 1 second delay is not sufficient. As a result, the system stays in a shallower power state like PCx instead of S0ix, and drains the battery power, without user' notice. To make sure S0ix is not blocked by the PCH overheating, we 1. expand the default overall timeout to 60 seconds. 2. make sure the temperature is below threshold rather than equal to it. At the same time, as the cooling delay can be much longer and many wakeup events (ACPI Power Button press, USB mouse move, etc) becomes valid in the suspend_noirq phase, add detection of wakeup event so that the driver does not delay blindly when the system suspend is likely to abort soon. This patch may introduce longer suspend time, but only in the cases when the system overheats and Linux used to enter a shallower S2idle state, say, PCx instead of S0ix. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-19thermal: intel: pch: move cooling delay to suspend_noirq phaseZhang Rui1-2/+3
Move the PCH Thermal driver suspend callback to suspend_noirq to do cooling while the system is more quiescent. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-18thermal: intel: hfi: remove NULL check after container_of() callHaowen Bai1-2/+0
container_of() will never return NULL, so remove useless code. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-18Merge back earlier int340x driver changes for 5.19.Rafael J. Wysocki1-15/+9
2022-05-11thermal: int340x: Mode setting with new OS handshakeSrinivas Pandruvada1-16/+32
With the new OS handshake introduced by commit: "c7ff29763989 ("thermal: int340x: Update OS policy capability handshake")", the "enabled" thermal zone mode doesn't work in the same way as previously. The "enabled" mode fails with -EINVAL when the new handshake is used. To address this issue, when the new OS UUID mask is set: - When the mode is "enabled", return 0 as the firmware already has the latest policy mask. - When the mode is "disabled", update the firmware with the UUID mask of zero. This way, the firmware can take over the thermal control. Also reset the OS UUID mask, which allows user space to update with new set of policies. Fixes: c7ff29763989 ("thermal: int340x: Update OS policy capability handshake") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog edits, removed unneeded parens ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-05Merge back earlier int340x thermal driver changes for 5.19.Rafael J. Wysocki1-15/+9
2022-04-21thermal: int340x: Fix attr.show callback prototypeKees Cook1-2/+2
Control Flow Integrity (CFI) instrumentation of the kernel noticed that the caller, dev_attr_show(), and the callback, odvp_show(), did not have matching function prototypes, which would cause a CFI exception to be raised. Correct the prototype by using struct device_attribute instead of struct kobj_attribute. Reported-and-tested-by: Joao Moreira <joao@overdrivepizza.com> Link: https://lore.kernel.org/lkml/067ce8bd4c3968054509831fa2347f4f@overdrivepizza.com/ Fixes: 006f006f1e5c ("thermal/int340x_thermal: Export OEM vendor variables") Cc: 5.8+ <stable@vger.kernel.org> # 5.8+ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-05thermal: int340x: Clean up _OSC context initDavidlohr Bueso1-5/+2
Now that the UUID is already sanitized by the caller, lets trivially clean up some of the context arming. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Zhang Rui <rui.zhang@intel.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-05thermal: int340x: Consolidate freeing of acpi_buffer pointerDavidlohr Bueso1-8/+5
Introduce a single point of freeing/exit after ensuring no error in int3400_setup_gddv(). Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-05thermal: int340x: Clean up unnecessary acpi_buffer pointer freeingDavidlohr Bueso1-2/+2
It is the caller's responsibility to free only upon ACPI_SUCCESS. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Zhang Rui <rui.zhang@intel.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-18Merge branch 'thermal-hfi'Rafael J. Wysocki5-0/+623
Merge Intel Hardware Feedback Interface (HFI) thermal driver for 5.18-rc1 and update the intel-speed-select utility to support that driver. * thermal-hfi: tools/power/x86/intel-speed-select: v1.12 release tools/power/x86/intel-speed-select: HFI support tools/power/x86/intel-speed-select: OOB daemon mode thermal: intel: hfi: INTEL_HFI_THERMAL depends on NET thermal: netlink: Fix parameter type of thermal_genl_cpu_capability_event() stub thermal: intel: hfi: Notify user space for HFI events thermal: netlink: Add a new event to notify CPU capabilities change thermal: intel: hfi: Enable notification interrupt thermal: intel: hfi: Handle CPU hotplug events thermal: intel: hfi: Minimally initialize the Hardware Feedback Interface x86/cpu: Add definitions for the Intel Hardware Feedback Interface x86/Documentation: Describe the Intel Hardware Feedback Interface
2022-03-18Merge branches 'thermal-powerclamp', 'thermal-int340x' and 'thermal-docs'Rafael J. Wysocki3-72/+113
Merge powerclamp thermal driver changes, int340x thermal driver changes and thermal documentation changes for 5.18-rc1: - Don't use bitmap_weight() in end_power_clamp() in the powerclamp driver (Yury Norov). - Update the OS policy capabilities handshake in the int340x thermal driver (Srinivas Pandruvada). - Increase the policies bitmap size in int340x (Srinivas Pandruvada). - Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the int340x thermal driver (Rafael Wysocki). - Check for NULL after calling kmemdup() in int340x (Jiasheng Jiang). - Add Intel Dynamic Power and Thermal Framework (DPTF) kernel interface documentation (Srinivas Pandruvada). - Fix bullet list warning in the thermal documentation (Randy Dunlap). * thermal-powerclamp: thermal: intel_powerclamp: don't use bitmap_weight() in end_power_clamp() * thermal-int340x: thermal: int340x: Update OS policy capability handshake thermal: int340x: Increase bitmap size thermal: Replace acpi_bus_get_device() thermal: int340x: Check for NULL after calling kmemdup() * thermal-docs: Documentation: thermal: DPTF Documentation thermal: fix Documentation bullet list warning
2022-03-16thermal: int340x: Update OS policy capability handshakeSrinivas Pandruvada1-49/+97
Update the firmware with OS supported policies mask, so that firmware can relinquish its internal controls. Without this update several Tiger Lake laptops gets performance limited with in few seconds of executing in turbo region. The existing way of enumerating firmware policies via IDSP method and selecting policy by directly writing those policy UUIDS via _OSC method is not supported in newer generation of hardware. There is a new UUID "B23BA85D-C8B7-3542-88DE-8DE2FFCFD698" is defined for updating policy capabilities. As part of ACPI _OSC method: Arg0 - UUID: B23BA85D-C8B7-3542-88DE-8DE2FFCFD698 Arg1 - Rev ID: 1 Arg2 - Count: 2 Arg3 - Capability buffers: Array of Arg2 DWORDS DWORD1: As defined in the ACPI 5.0 Specification - Bit 0: Query Flag - Bits 1-3: Always 0 - Bits 4-31: Reserved DWORD2 and beyond: - Bit0: set to 1 to indicate Intel(R) Dynamic Tuning is active, 0 to indicate it is disabled and legacy thermal mechanism should be enabled. - Bit1: set to 1 to indicate Intel(R) Dynamic Tuning is controlling active cooling, 0 to indicate bios shall enable legacy thermal zone with active trip point. - Bit2: set to 1 to indicate Intel(R) Dynamic Tuning is controlling passive cooling, 0 to indicate bios shall enable legacy thermal zone with passive trip point. - Bit3: set to 1 to indicate Intel(R) Dynamic Tuning is handling critical trip point, 0 to indicate bios shall enable legacy thermal zone with critical trip point. - Bits 4:31: Reserved From sysfs interface, there is an existing interface to update policy UUID using attribute "current_uuid". User space can write the same UUID for ACTIVE, PASSIVE and CRITICAL policy. Driver converts these UUIDs to DWORD2 Bit 1 to Bit 3. When any of the policy is activated by user space it is assumed that dynamic tuning is active. For example $cd /sys/bus/platform/devices/INTC1040:00/uuids To support active policy $echo "3A95C389-E4B8-4629-A526-C52C88626BAE" > current_uuid To support passive policy $echo "42A441D6-AE6A-462b-A84B-4A8CE79027D3" > current_uuid To support critical policy $echo "97C68AE7-15FA-499c-B8C9-5DA81D606E0A" > current_uuid To check all the supported policies $cat current_uuid 3A95C389-E4B8-4629-A526-C52C88626BAE 42A441D6-AE6A-462b-A84B-4A8CE79027D3 97C68AE7-15FA-499c-B8C9-5DA81D606E0A To match the bit format for DWORD2, rearranged enum int3400_thermal_uuid and int3400_thermal_uuids[] by swapping current INT3400_THERMAL_ACTIVE and INT3400_THERMAL_PASSIVE_1. If the policies are enumerated via IDSP method then legacy method is used, if not the new method is used to update policy support. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-16thermal: int340x: Increase bitmap sizeSrinivas Pandruvada1-1/+1
The number of policies are 10, so can't be supported by the bitmap size of u8. Even though there are no platfoms with these many policies, but for correctness increase to u32. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Fixes: 16fc8eca1975 ("thermal/int340x_thermal: Add additional UUIDs") Cc: 5.1+ <stable@vger.kernel.org> # 5.1+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-28Merge back int340x thermal driver changes for v5.18.Rafael J. Wysocki2-16/+12
2022-02-24thermal: int340x: fix memory leak in int3400_notify()Chuansheng Liu1-0/+4
It is easy to hit the below memory leaks in my TigerLake platform: unreferenced object 0xffff927c8b91dbc0 (size 32): comm "kworker/0:2", pid 112, jiffies 4294893323 (age 83.604s) hex dump (first 32 bytes): 4e 41 4d 45 3d 49 4e 54 33 34 30 30 20 54 68 65 NAME=INT3400 The 72 6d 61 6c 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 rmal.kkkkkkkkkk. backtrace: [<ffffffff9c502c3e>] __kmalloc_track_caller+0x2fe/0x4a0 [<ffffffff9c7b7c15>] kvasprintf+0x65/0xd0 [<ffffffff9c7b7d6e>] kasprintf+0x4e/0x70 [<ffffffffc04cb662>] int3400_notify+0x82/0x120 [int3400_thermal] [<ffffffff9c8b7358>] acpi_ev_notify_dispatch+0x54/0x71 [<ffffffff9c88f1a7>] acpi_os_execute_deferred+0x17/0x30 [<ffffffff9c2c2c0a>] process_one_work+0x21a/0x3f0 [<ffffffff9c2c2e2a>] worker_thread+0x4a/0x3b0 [<ffffffff9c2cb4dd>] kthread+0xfd/0x130 [<ffffffff9c201c1f>] ret_from_fork+0x1f/0x30 Fix it by calling kfree() accordingly. Fixes: 38e44da59130 ("thermal: int3400_thermal: process "thermal table changed" event") Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-10thermal: intel: hfi: INTEL_HFI_THERMAL depends on NETRandy Dunlap1-0/+1
THERMAL_NETLINK depends on NET and since 'select' does not follow any dependency chain, INTEL_HFI_THERMAL also should depend on NET. Fix one Kconfig warning and 48 subsequent build errors: WARNING: unmet direct dependencies detected for THERMAL_NETLINK Depends on [n]: THERMAL [=y] && NET [=n] Selected by [y]: - INTEL_HFI_THERMAL [=y] && THERMAL [=y] && (X86 [=y] || X86_INTEL_QUARK [=n] || COMPILE_TEST [=y]) && CPU_SUP_INTEL [=y] && X86_THERMAL_VECTOR [=y] Fixes: bd30cdfd9bd7 ("thermal: intel: hfi: Notify user space for HFI events") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-04thermal: Replace acpi_bus_get_device()Rafael J. Wysocki1-16/+7
Replace acpi_bus_get_device() that is going to be dropped with acpi_fetch_acpi_dev(). No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-04thermal: intel_powerclamp: don't use bitmap_weight() in end_power_clamp()Yury Norov1-6/+3
Don't call bitmap_weight() if the following code can get by without it. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>