summaryrefslogtreecommitdiff
path: root/drivers/thermal
AgeCommit message (Collapse)AuthorFilesLines
2025-10-15thermal/drivers/qcom/lmh: Add missing IRQ includesDmitry Baryshkov1-0/+2
[ Upstream commit b50b2c53f98fcdb6957e184eb488c16502db9575 ] As reported by LKP, the Qualcomm LMH driver needs to include several IRQ-related headers, which decrlare necessary IRQ functionality. Currently driver builds on ARM64 platforms, where the headers are pulled in implicitly by other headers, but fails to build on other platforms. Fixes: 53bca371cdf7 ("thermal/drivers/qcom: Add support for LMh driver") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507270042.KdK0KKht-lkp@intel.com/ Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20250728-lmh-scm-v2-2-33bc58388ca5@oss.qualcomm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15thermal/drivers/qcom: Make LMH select QCOM_SCMDmitry Baryshkov1-1/+2
[ Upstream commit 57eda47bd14b0c2876f2db42e757c57b7a671965 ] The QCOM_SCM symbol is not user-visible, so it makes little sense to depend on it. Make LMH driver select QCOM_SCM as all other drivers do and, as the dependecy is now correctly handled, enable || COMPILE_TEST in order to include the driver into broader set of build tests. Fixes: 9e5a4fb84230 ("thermal/drivers/qcom/lmh: make QCOM_LMH depends on QCOM_SCM") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20250728-lmh-scm-v2-1-33bc58388ca5@oss.qualcomm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-09thermal/drivers/mediatek/lvts: Disable low offset IRQ for minimum thresholdNícolas F. R. A. Prado1-14/+36
[ Upstream commit fa17ff8e325a657c84be1083f06e54ee7eea82e4 ] In order to get working interrupts, a low offset value needs to be configured. The minimum value for it is 20 Celsius, which is what is configured when there's no lower thermal trip (ie the thermal core passes -INT_MAX as low trip temperature). However, when the temperature gets that low and fluctuates around that value it causes an interrupt storm. Prevent that interrupt storm by not enabling the low offset interrupt if the low threshold is the minimum one. Cc: stable@vger.kernel.org Fixes: 77354eaef821 ("thermal/drivers/mediatek/lvts_thermal: Don't leave threshold zeroed") Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://lore.kernel.org/r/20250113-mt8192-lvts-filtered-suspend-fix-v2-3-07a25200c7c6@collabora.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> [ Adapted interrupt mask definitions ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28thermal: sysfs: Return ENODATA instead of EAGAIN for readsHsin-Te Yuan1-3/+6
[ Upstream commit 1a4aabc27e95674837f2e25f4ef340c0469e6203 ] According to POSIX spec, EAGAIN returned by read with O_NONBLOCK set means the read would block. Hence, the common implementation in nonblocking model will poll the file when the nonblocking read returns EAGAIN. However, when the target file is thermal zone, this mechanism will totally malfunction because thermal zone doesn't implement sysfs notification and thus the poll will never return. For example, the read in Golang implemnts such method and sometimes hangs at reading some thermal zones via sysfs. Change to return -ENODATA instead of -EAGAIN to userspace. Signed-off-by: Hsin-Te Yuan <yuanhsinte@chromium.org> Link: https://patch.msgid.link/20250620-temp-v3-1-6becc6aeb66c@chromium.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28thermal/drivers/qcom-spmi-temp-alarm: Enable stage 2 shutdown when requiredDavid Collins1-9/+34
[ Upstream commit f8e157ff2df46ddabd930815d196895976227831 ] Certain TEMP_ALARM GEN2 PMIC peripherals need over-temperature stage 2 automatic PMIC partial shutdown. This will ensure that in the event of reaching the hotter stage 3 over-temperature threshold, repeated faults will be avoided during the automatic PMIC hardware full shutdown. Modify the stage 2 shutdown control logic to ensure that stage 2 shutdown is enabled on all affected PMICs. Read the digital major and minor revision registers to identify these PMICs. Signed-off-by: David Collins <david.collins@oss.qualcomm.com> Signed-off-by: Anjelique Melendez <anjelique.melendez@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250710224555.3047790-2-anjelique.melendez@oss.qualcomm.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04thermal: intel: x86_pkg_temp_thermal: Fix bogus trip temperatureZhang Rui1-0/+1
commit cf948c8e274e8b406e846cdf6cc48fe47f98cf57 upstream. The tj_max value obtained from the Intel TCC library are in Celsius, whereas the thermal subsystem operates in milli-Celsius. This discrepancy leads to incorrect trip temperature calculations. Fix bogus trip temperature by converting tj_max to milli-Celsius Unit. Fixes: 8ef0ca4a177d ("Merge back other thermal control material for 6.3.") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reported-by: zhang ning <zhangn1985@outlook.com> Closes: https://lore.kernel.org/all/TY2PR01MB3786EF0FE24353026293F5ACCD97A@TY2PR01MB3786.jpnprd01.prod.outlook.com/ Tested-by: zhang ning <zhangn1985@outlook.com> Cc: 6.3+ <stable@vger.kernel.org> # 6.3+ Link: https://patch.msgid.link/20250519070901.1031233-1-rui.zhang@intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-04thermal/drivers/qoriq: Power down TMU on system suspendAlice Guo1-0/+13
[ Upstream commit 229f3feb4b0442835b27d519679168bea2de96c2 ] Enable power-down of TMU (Thermal Management Unit) for TMU version 2 during system suspend to save power. Save approximately 4.3mW on VDD_ANA_1P8 on i.MX93 platforms. Signed-off-by: Alice Guo <alice.guo@nxp.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20241209164859.3758906-2-Frank.Li@nxp.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25thermal/drivers/rockchip: Add missing rk3328 mapping entryTrevor Woerner1-0/+1
commit ee022e5cae052e0c67ca7c5fec0f2e7bc897c70e upstream. The mapping table for the rk3328 is missing the entry for -25C which is found in the TRM section 9.5.2 "Temperature-to-code mapping". NOTE: the kernel uses the tsadc_q_sel=1'b1 mode which is defined as: 4096-<code in table>. Whereas the table in the TRM gives the code "3774" for -25C, the kernel uses 4096-3774=322. [Dragan Simic] : "After going through the RK3308 and RK3328 TRMs, as well as through the downstream kernel code, it seems we may have some troubles at our hands. Let me explain, please. To sum it up, part 1 of the RK3308 TRM v1.1 says on page 538 that the equation for the output when tsadc_q_sel equals 1 is (4096 - tsadc_q), while part 1 of the RK3328 TRM v1.2 says that the output equation is (1024 - tsadc_q) in that case. The downstream kernel code, however, treats the RK3308 and RK3328 tables and their values as being the same. It even mentions 1024 as the "offset" value in a comment block for the rk_tsadcv3_control() function, just like the upstream code does, which is obviously wrong "offset" value when correlated with the table on page 544 of part 1 of the RK3308 TRM v1.1. With all this in mind, it's obvious that more work is needed to make it clear where's the actual mistake (it could be that the TRM is wrong), which I'll volunteer for as part of the SoC binning project. In the meantime, this patch looks fine as-is to me, by offering what's a clear improvement to the current state of the upstream code" Link: https://opensource.rock-chips.com/images/9/97/Rockchip_RK3328TRM_V1.1-Part1-20170321.pdf Cc: stable@vger.kernel.org Fixes: eda519d5f73e ("thermal: rockchip: Support the RK3328 SOC in thermal driver") Signed-off-by: Trevor Woerner <twoerner@gmail.com> Reviewed-by: Dragan Simic <dsimic@manjaro.org> Link: https://lore.kernel.org/r/20250207175048.35959-1-twoerner@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10thermal: int340x: Add NULL check for adevChenyuan Yang1-0/+3
[ Upstream commit 2542a3f70e563a9e70e7ded314286535a3321bdb ] Not all devices have an ACPI companion fwnode, so adev might be NULL. This is similar to the commit cd2fd6eab480 ("platform/x86: int3472: Check for adev == NULL"). Add a check for adev not being set and return -ENODEV in that case to avoid a possible NULL pointer deref in int3402_thermal_probe(). Note, under the same directory, int3400_thermal_probe() has such a check. Fixes: 77e337c6e23e ("Thermal: introduce INT3402 thermal driver") Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Acked-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/20250313043611.1212116-1-chenyuan0y@gmail.com [ rjw: Subject edit, added Fixes: ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22thermal/cpufreq_cooling: Remove structure member documentationDaniel Lezcano1-2/+0
[ Upstream commit a6768c4f92e152265590371975d44c071a5279c7 ] The structure member documentation refers to a member which does not exist any more. Remove it. Link: https://lore.kernel.org/all/202501220046.h3PMBCti-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501220046.h3PMBCti-lkp@intel.com/ Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20250211084712.2746705-1-daniel.lezcano@linaro.org [ rjw: Minor changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-17thermal: of: fix OF node leak in of_thermal_zone_find()Joe Hattori1-0/+1
[ Upstream commit 9164e0912af206a72ddac4915f7784e470a04ace ] of_thermal_zone_find() calls of_parse_phandle_with_args(), but does not release the OF node reference obtained by it. Add a of_node_put() call when the call is successful. Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization") Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp> Link: https://patch.msgid.link/20241224031809.950461-1-joe@pf.is.s.u-tokyo.ac.jp [ rjw: Changelog edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14thermal/drivers/qcom/tsens-v1: Add support for MSM8937 tsensBarnabás Czémán3-8/+18
[ Upstream commit e2ffb6c3a40ee714160e35e61f0a984028b5d550 ] Add support for tsens v1.4 block what can be found in MSM8937 and MSM8917. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org> Link: https://lore.kernel.org/r/20241113-msm8917-v6-5-c348fb599fef@mainlining.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09thermal: int3400: Fix reading of current_uuid for active policySrinivas Pandruvada1-1/+1
commit 7082503622986537f57bdb5ef23e69e70cfad881 upstream. When the current_uuid attribute is set to the active policy UUID, reading back the same attribute is returning "INVALID" instead of the active policy UUID on some platforms before Ice Lake. In platforms before Ice Lake, firmware provides a list of supported thermal policies. In this case, user space can select any of the supported thermal policies via a write to attribute "current_uuid". In commit c7ff29763989 ("thermal: int340x: Update OS policy capability handshake")', the OS policy handshake was updated to support Ice Lake and later platforms and it treated priv->current_uuid_index=0 as invalid. However, priv->current_uuid_index=0 is for the active policy, only priv->current_uuid_index=-1 is invalid. Fix this issue by updating the priv->current_uuid_index check. Fixes: c7ff29763989 ("thermal: int340x: Update OS policy capability handshake") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 5.18+ <stable@vger.kernel.org> # 5.18+ Link: https://patch.msgid.link/20241114200213.422303-1-srinivas.pandruvada@linux.intel.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09thermal: core: Initialize thermal zones before registering themRafael J. Wysocki1-1/+1
[ Upstream commit 662f920f7e390db5d1a6792a2b0ffa59b6c962fc ] Since user space can start interacting with a new thermal zone as soon as device_register() called by thermal_zone_device_register_with_trips() returns, it is better to initialize the thermal zone before calling device_register() on it. Fixes: d0df264fbd3c ("thermal/core: Remove pointless thermal_zone_device_reset() function") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/3336146.44csPzL39Z@rjwysocki.net Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-14thermal/drivers/qcom/lmh: Remove false lockdep backtraceDmitry Baryshkov1-0/+7
commit f16beaaee248eaa37ad40b5905924fcf70ae02e3 upstream. Annotate LMH IRQs with lockdep classes so that the lockdep doesn't report possible recursive locking issue between LMH and GIC interrupts. For the reference: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** Call trace: dump_backtrace+0x98/0xf0 show_stack+0x18/0x24 dump_stack_lvl+0x90/0xd0 dump_stack+0x18/0x24 print_deadlock_bug+0x258/0x348 __lock_acquire+0x1078/0x1f44 lock_acquire+0x1fc/0x32c _raw_spin_lock_irqsave+0x60/0x88 __irq_get_desc_lock+0x58/0x98 enable_irq+0x38/0xa0 lmh_enable_interrupt+0x2c/0x38 irq_enable+0x40/0x8c __irq_startup+0x78/0xa4 irq_startup+0x78/0x168 __enable_irq+0x70/0x7c enable_irq+0x4c/0xa0 qcom_cpufreq_ready+0x20/0x2c cpufreq_online+0x2a8/0x988 cpufreq_add_dev+0x80/0x98 subsys_interface_register+0x104/0x134 cpufreq_register_driver+0x150/0x234 qcom_cpufreq_hw_driver_probe+0x2a8/0x388 platform_probe+0x68/0xc0 really_probe+0xbc/0x298 __driver_probe_device+0x78/0x12c driver_probe_device+0x3c/0x160 __device_attach_driver+0xb8/0x138 bus_for_each_drv+0x84/0xe0 __device_attach+0x9c/0x188 device_initial_probe+0x14/0x20 bus_probe_device+0xac/0xb0 deferred_probe_work_func+0x8c/0xc8 process_one_work+0x20c/0x62c worker_thread+0x1bc/0x36c kthread+0x120/0x124 ret_from_fork+0x10/0x20 Fixes: 53bca371cdf7 ("thermal/drivers/qcom: Add support for LMh driver") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20241011-lmh-lockdep-v1-1-495cbbe6fef1@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-14thermal/of: support thermal zones w/o trips subnodeIcenowy Zheng1-11/+10
[ Upstream commit 725f31f300e300a9d94976bd8f1db6e746f95f63 ] Although the current device tree binding of thermal zones require the trips subnode, the binding in kernel v5.15 does not require it, and many device trees shipped with the kernel, for example, allwinner/sun50i-a64.dtsi and mediatek/mt8183-kukui.dtsi in ARM64, still comply to the old binding and contain no trips subnode. Allow the code to successfully register thermal zones w/o trips subnode for DT binding compatibility now. Furtherly, the inconsistency between DTs and bindings should be resolved by either adding empty trips subnode or dropping the trips subnode requirement. Fixes: d0c75fa2c17f ("thermal/of: Initialize trip points separately") Signed-off-by: Icenowy Zheng <uwu@icenowy.me> [wenst@chromium.org: Reworked logic and kernel log messages] Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20241018073139.1268995-1-wenst@chromium.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08thermal: intel: int340x: processor: Add MMIO RAPL PL4 supportZhang Rui1-2/+2
[ Upstream commit 3fb0eea8a1c4be5884e0731ea76cbd3ce126e1f3 ] Similar to the MSR RAPL interface, MMIO RAPL supports PL4 too, so add MMIO RAPL PL4d support to the processor_thermal driver. As a result, the powercap sysfs for MMIO RAPL will show a new "peak power" constraint. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240930081801.28502-7-rui.zhang@intel.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08thermal: intel: int340x: processor: Remove MMIO RAPL CPU hotplug supportZhang Rui1-44/+22
[ Upstream commit bfc6819e4bf56a55df6178f93241b5845ad672eb ] CPU0/package0 is always online and the MMIO RAPL driver runs on single package systems only, so there is no need to handle CPU hotplug in it. Always register a RAPL package device for package 0 and remove the unnecessary CPU hotplug support. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240930081801.28502-6-rui.zhang@intel.com [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08thermal: core: Free tzp copy along with the thermal zoneRafael J. Wysocki1-3/+1
[ Upstream commit 827a07525c099f54d3b15110408824541ec66b3c ] The object pointed to by tz->tzp may still be accessed after being freed in thermal_zone_device_unregister(), so move the freeing of it to the point after the removal completion has been completed at which it cannot be accessed any more. Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure") Cc: 6.8+ <stable@vger.kernel.org> # 6.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/4623516.LvFx2qVVIh@rjwysocki.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08thermal: core: Rework thermal zone availability checkRafael J. Wysocki1-3/+12
[ Upstream commit b38aa87f67931e23ebc32c0ca00a86dfa4688719 ] In order to avoid running __thermal_zone_device_update() for thermal zones going away, the thermal zone lock is held around device_del() in thermal_zone_device_unregister() and thermal_zone_device_update() passes the given thermal zone device to device_is_registered(). This allows thermal_zone_device_update() to skip the __thermal_zone_device_update() if device_del() has already run for the thermal zone at hand. However, instead of looking at driver core internals, the thermal subsystem may as well rely on its own data structures for this purpose. Namely, if the thermal zone is not present in thermal_tz_list, it can be regarded as unavailable, which in fact is already the case in thermal_zone_device_unregister(). Accordingly, the device_is_registered() check in thermal_zone_device_update() can be replaced with checking whether or not the node list_head in struct thermal_zone_device is empty, in which case it is not there in thermal_tz_list. To make this work, though, it is necessary to initialize tz->node in thermal_zone_device_register_with_trips() before registering the thermal zone device and it needs to be added to thermal_tz_list and deleted from it under its zone lock. After the above modifications, the zone lock does not need to be held around device_del() in thermal_zone_device_unregister() any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Stable-dep-of: 827a07525c09 ("thermal: core: Free tzp copy along with the thermal zone") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08thermal: core: Make thermal_zone_device_unregister() return after freeing ↵Rafael J. Wysocki1-1/+5
the zone [ Upstream commit 4649620d9404d3aceb25891c24bab77143e3f21c ] Make thermal_zone_device_unregister() wait until all of the references to the given thermal zone object have been dropped and free it before returning. This guarantees that when thermal_zone_device_unregister() returns, there is no leftover activity regarding the thermal zone in question which is required by some of its callers (for instance, modular driver code that wants to know when it is safe to let the module go away). Subsequently, this will allow some confusing device_is_registered() checks to be dropped from the thermal sysfs and core code. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Stable-dep-of: 827a07525c09 ("thermal: core: Free tzp copy along with the thermal zone") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17thermal: intel: int340x: processor: Fix warning during module unloadZhang Rui1-2/+0
[ Upstream commit 99ca0b57e49fb73624eede1c4396d9e3d10ccf14 ] The processor_thermal driver uses pcim_device_enable() to enable a PCI device, which means the device will be automatically disabled on driver detach. Thus there is no need to call pci_disable_device() again on it. With recent PCI device resource management improvements, e.g. commit f748a07a0b64 ("PCI: Remove legacy pcim_release()"), this problem is exposed and triggers the warining below. [ 224.010735] proc_thermal_pci 0000:00:04.0: disabling already-disabled device [ 224.010747] WARNING: CPU: 8 PID: 4442 at drivers/pci/pci.c:2250 pci_disable_device+0xe5/0x100 ... [ 224.010844] Call Trace: [ 224.010845] <TASK> [ 224.010847] ? show_regs+0x6d/0x80 [ 224.010851] ? __warn+0x8c/0x140 [ 224.010854] ? pci_disable_device+0xe5/0x100 [ 224.010856] ? report_bug+0x1c9/0x1e0 [ 224.010859] ? handle_bug+0x46/0x80 [ 224.010862] ? exc_invalid_op+0x1d/0x80 [ 224.010863] ? asm_exc_invalid_op+0x1f/0x30 [ 224.010867] ? pci_disable_device+0xe5/0x100 [ 224.010869] ? pci_disable_device+0xe5/0x100 [ 224.010871] ? kfree+0x21a/0x2b0 [ 224.010873] pcim_disable_device+0x20/0x30 [ 224.010875] devm_action_release+0x16/0x20 [ 224.010878] release_nodes+0x47/0xc0 [ 224.010880] devres_release_all+0x9f/0xe0 [ 224.010883] device_unbind_cleanup+0x12/0x80 [ 224.010885] device_release_driver_internal+0x1ca/0x210 [ 224.010887] driver_detach+0x4e/0xa0 [ 224.010889] bus_remove_driver+0x6f/0xf0 [ 224.010890] driver_unregister+0x35/0x60 [ 224.010892] pci_unregister_driver+0x44/0x90 [ 224.010894] proc_thermal_pci_driver_exit+0x14/0x5f0 [processor_thermal_device_pci] ... [ 224.010921] ---[ end trace 0000000000000000 ]--- Remove the excess pci_disable_device() calls. Fixes: acd65d5d1cf4 ("thermal/drivers/int340x/processor_thermal: Add PCI MMIO based thermal driver") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240930081801.28502-3-rui.zhang@intel.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17thermal: int340x: processor_thermal: Set feature mask before proc_thermal_addSrinivas Pandruvada1-11/+10
[ Upstream commit 6ebc25d8b053a208786295bab58abbb66b39c318 ] The function proc_thermal_add() adds sysfs entries for power limits. The feature mask of available features is not present at that time, so it cannot be used by proc_thermal_add() to selectively create sysfs attributes. The feature mask is set by proc_thermal_mmio_add(), so modify the code to call it before proc_thermal_add() so as to allow the latter to use the feature mask. There is no functional impact with this change. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 99ca0b57e49f ("thermal: intel: int340x: processor: Fix warning during module unload") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-04thermal: of: Fix OF node leak in of_thermal_zone_find() error pathsKrzysztof Kozlowski1-6/+7
[ Upstream commit c0a1ef9c5be72ff28a5413deb1b3e1a066593c13 ] Terminating for_each_available_child_of_node() loop requires dropping OF node reference, so bailing out on errors misses this. Solve the OF node reference leak with scoped for_each_available_child_of_node_scoped(). Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20240814195823.437597-3-krzysztof.kozlowski@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-04thermal: of: Fix OF node leak in thermal_of_trips_init() error pathKrzysztof Kozlowski1-2/+2
[ Upstream commit afc954fd223ded70b1fa000767e2531db55cce58 ] Terminating for_each_child_of_node() loop requires dropping OF node reference, so bailing out after thermal_of_populate_trip() error misses this. Solve the OF node reference leak with scoped for_each_child_of_node_scoped(). Fixes: d0c75fa2c17f ("thermal/of: Initialize trip points separately") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20240814195823.437597-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-11thermal/drivers/broadcom: Fix race between removal and clock disableKrzysztof Kozlowski1-15/+4
[ Upstream commit e90c369cc2ffcf7145a46448de101f715a1f5584 ] During the probe, driver enables clocks necessary to access registers (in get_temp()) and then registers thermal zone with managed-resources (devm) interface. Removal of device is not done in reversed order, because: 1. Clock will be disabled in driver remove() callback - thermal zone is still registered and accessible to users, 2. devm interface will unregister thermal zone. This leaves short window between (1) and (2) for accessing the get_temp() callback with disabled clock. Fix this by enabling clock also via devm-interface, so entire cleanup path will be in proper, reversed order. Fixes: 8454c8c09c77 ("thermal/drivers/bcm2835: Remove buggy call to thermal_of_zone_unregister") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-1-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-11thermal: bcm2835: Convert to platform remove callback returning voidUwe Kleine-König1-4/+2
[ Upstream commit f29ecd3748a28d0b52512afc81b3c13fd4a00c9b ] The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: e90c369cc2ff ("thermal/drivers/broadcom: Fix race between removal and clock disable") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-11thermal/drivers/mediatek/lvts_thermal: Check NULL ptr on lvts_dataJulien Panis1-0/+2
[ Upstream commit a1191a77351e25ddf091bb1a231cae12ee598b5d ] Verify that lvts_data is not NULL before using it. Signed-off-by: Julien Panis <jpanis@baylibre.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240502-mtk-thermal-lvts-data-v1-1-65f1b0bfad37@baylibre.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-27thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse ↵Julien Panis1-1/+5
data [ Upstream commit 72cacd06e47d86d89b0e7179fbc9eb3a0f39cd93 ] This patch prevents from registering thermal entries and letting the driver misbehave if efuse data is invalid. A device is not properly calibrated if the golden temperature is zero. Fixes: f5f633b18234 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver") Signed-off-by: Julien Panis <jpanis@baylibre.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240604-mtk-thermal-calib-check-v2-1-8f258254051d@baylibre.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-16thermal/drivers/qcom/lmh: Check for SCM availability at probeKonrad Dybcio1-0/+3
commit d9d3490c48df572edefc0b64655259eefdcbb9be upstream. Up until now, the necessary scm availability check has not been performed, leading to possible null pointer dereferences (which did happen for me on RB1). Fix that. Fixes: 53bca371cdf7 ("thermal/drivers/qcom: Add support for LMh driver") Cc: <stable@vger.kernel.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240308-topic-rb1_lmh-v2-2-bac3914b0fe3@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12thermal/drivers/tsens: Fix null pointer dereferenceAleksandr Mishin1-1/+1
[ Upstream commit d998ddc86a27c92140b9f7984ff41e3d1d07a48f ] compute_intercept_slope() is called from calibrate_8960() (in tsens-8960.c) as compute_intercept_slope(priv, p1, NULL, ONE_PT_CALIB) which lead to null pointer dereference (if DEBUG or DYNAMIC_DEBUG set). Fix this bug by adding null pointer check. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: dfc1193d4dbd ("thermal/drivers/tsens: Replace custom 8960 apis with generic apis") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240411114021.12203-1-amishin@t-argos.ru Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13thermal/of: Assume polling-delay(-passive) 0 when absentKonrad Dybcio1-4/+8
[ Upstream commit 488164006a281986d95abbc4b26e340c19c4c85b ] Currently, thermal zones associated with providers that have interrupts for signaling hot/critical trips are required to set a polling-delay of 0 to indicate no polling. This feels a bit backwards. Change the code such that "no polling delay" also means "no polling". Suggested-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240125-topic-thermal-v1-2-3c9d4dced138@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-03thermal: devfreq_cooling: Fix perf state when calculate dfc res_utilYe Zhang1-1/+1
commit a26de34b3c77ae3a969654d94be49e433c947e3b upstream. The issue occurs when the devfreq cooling device uses the EM power model and the get_real_power() callback is provided by the driver. The EM power table is sorted ascending,can't index the table by cooling device state,so convert cooling state to performance state by dfc->max_state - dfc->capped_state. Fixes: 615510fe13bd ("thermal: devfreq_cooling: remove old power model and use EM") Cc: 5.11+ <stable@vger.kernel.org> # 5.11+ Signed-off-by: Ye Zhang <ye.zhang@rock-chips.com> Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03thermal/drivers/mediatek: Fix control buffer enablement on MT7896Frank Wunderlich1-0/+3
[ Upstream commit 371ed6263e2403068b359f0c07188548c2d70827 ] Reading thermal sensor on mt7986 devices returns invalid temperature: bpi-r3 ~ # cat /sys/class/thermal/thermal_zone0/temp -274000 Fix this by adding missing members in mtk_thermal_data struct which were used in mtk_thermal_turn_on_buffer after commit 33140e668b10. Cc: stable@vger.kernel.org Fixes: 33140e668b10 ("thermal/drivers/mediatek: Control buffer enablement tweaks") Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com> Reviewed-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20230907112018.52811-1-linux@fw-web.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-03powercap: intel_rapl: Fix locking in TPMI RAPLZhang Rui1-4/+4
[ Upstream commit 1aa09b9379a7a644cd2f75ae0bac82b8783df600 ] The RAPL framework uses CPU hotplug locking to protect the rapl_packages list and rp->lead_cpu to guarantee that 1. the RAPL package device is not unprobed and freed 2. the cached rp->lead_cpu is always valid for operations like powercap sysfs accesses. Current RAPL APIs assume being called from CPU hotplug callbacks which hold the CPU hotplug lock, but TPMI RAPL driver invokes the APIs in the driver's .probe() function without acquiring the CPU hotplug lock. Fix the problem by providing both locked and lockless versions of RAPL APIs. Fixes: 9eef7f9da928 ("powercap: intel_rapl: Introduce RAPL TPMI interface driver") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Cc: 6.5+ <stable@vger.kernel.org> # 6.5+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-03thermal/intel: Fix intel_tcc_get_temp() to support negative CPU temperatureZhang Rui3-14/+14
[ Upstream commit 7251b9e8a007ddd834aa81f8c7ea338884629fec ] CPU temperature can be negative in some cases. Thus the negative CPU temperature should not be considered as a failure. Fix intel_tcc_get_temp() and its users to support negative CPU temperature. Fixes: a3c1f066e1c5 ("thermal/intel: Introduce Intel TCC library") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Cc: 6.3+ <stable@vger.kernel.org> # 6.3+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27thermal/drivers/qoriq: Fix getting tmu rangePeng Fan1-4/+8
[ Upstream commit 4d0642074c67ed9928e9d68734ace439aa06e403 ] TMU Version 1 has 4 TTRCRs, while TMU Version >=2 has 16 TTRCRs. So limit the len to 4 will report "invalid range data" for i.MX93. This patch drop the local array with allocated ttrcr array and able to support larger tmu ranges. Fixes: f12d60c81fce ("thermal/drivers/qoriq: Support version 2.1") Tested-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240226003657.3012880-1-peng.fan@oss.nxp.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27thermal/drivers/mediatek/lvts_thermal: Fix a memory leak in an error ↵Christophe JAILLET1-1/+3
handling path [ Upstream commit ca93bf607a44c1f009283dac4af7df0d9ae5e357 ] If devm_krealloc() fails, then 'efuse' is leaking. So free it to avoid a leak. Fixes: f5f633b18234 ("thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/481d345233862d58c3c305855a93d0dbc2bbae7e.1706431063.git.christophe.jaillet@wanadoo.fr Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05thermal: core: Fix thermal zone suspend-resume synchronizationRafael J. Wysocki1-7/+23
[ Upstream commit 4e814173a8c4f432fd068b1c796f0416328c9d99 ] There are 3 synchronization issues with thermal zone suspend-resume during system-wide transitions: 1. The resume code runs in a PM notifier which is invoked after user space has been thawed, so it can run concurrently with user space which can trigger a thermal zone device removal. If that happens, the thermal zone resume code may use a stale pointer to the next list element and crash, because it does not hold thermal_list_lock while walking thermal_tz_list. 2. The thermal zone resume code calls thermal_zone_device_init() outside the zone lock, so user space or an update triggered by the platform firmware may see an inconsistent state of a thermal zone leading to unexpected behavior. 3. Clearing the in_suspend global variable in thermal_pm_notify() allows __thermal_zone_device_update() to continue for all thermal zones and it may as well run before the thermal_tz_list walk (or at any point during the list walk for that matter) and attempt to operate on a thermal zone that has not been resumed yet. It may also race destructively with thermal_zone_device_init(). To address these issues, add thermal_list_lock locking to thermal_pm_notify(), especially arount the thermal_tz_list, make it call thermal_zone_device_init() back-to-back with __thermal_zone_device_update() under the zone lock and replace in_suspend with per-zone bool "suspend" indicators set and unset under the given zone's lock. Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/ Reported-by: Bo Ye <bo.ye@mediatek.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01thermal: trip: Drop lockdep assertion from thermal_zone_trip_id()Rafael J. Wysocki1-2/+0
commit 108ffd12be24ba1d74b3314df8db32a0a6d55ba5 upstream. The lockdep assertion in thermal_zone_trip_id() triggers when the trip point sysfs attribute of a thermal instance is read, because there is no thermal zone locking in that code path. This is not verly useful, though, because there is no mechanism by which the location of the trips[] table in a thermal zone or its size can change after binding cooling devices to the trips in that thermal zone and before those cooling devices are unbound from them. Thus it is not in fact necessary to hold the thermal zone lock when thermal_zone_trip_id() is called from trip_point_show() and so the lockdep asserion in the former is invalid. Accordingly, drop that lockdep assertion. Fixes: 2c7b4bfadef0 ("thermal: core: Store trip pointer in struct thermal_instance") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-01thermal: gov_power_allocator: avoid inability to reset a cdevDi Shen1-1/+1
[ Upstream commit e95fa7404716f6e25021e66067271a4ad8eb1486 ] Commit 0952177f2a1f ("thermal/core/power_allocator: Update once cooling devices when temp is low") adds an update flag to avoid triggering a thermal event when there is no need, and the thermal cdev is updated once when the temperature is low. But when the trips are writable, and switch_on_temp is set to be a higher value, the cooling device state may not be reset to 0, because last_temperature is smaller than switch_on_temp. For example: First: switch_on_temp=70 control_temp=85; Then userspace change the trip_temp: switch_on_temp=45 control_temp=55 cur_temp=54 Then userspace reset the trip_temp: switch_on_temp=70 control_temp=85 cur_temp=57 last_temp=54 At this time, the cooling device state should be reset to 0. However, because cur_temp(57) < switch_on_temp(70) last_temp(54) < switch_on_temp(70) ----> update = false, update is false, the cooling device state can not be reset. Using the observation that tz->passive can also be regarded as the temperature status, set the update flag to the tz->passive value. When the temperature drops below switch_on for the first time, the states of cooling devices can be reset once, and tz->passive is updated to 0. In the next round, because tz->passive is 0, cdev->state will not be updated. By using the tz->passive value as the "update" flag, the issue above can be solved, and the cooling devices can be updated only once when the temperature is low. Fixes: 0952177f2a1f ("thermal/core/power_allocator: Update once cooling devices when temp is low") Cc: 5.13+ <stable@vger.kernel.org> # 5.13+ Suggested-by: Wei Wang <wvw@google.com> Signed-off-by: Di Shen <di.shen@unisoc.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01thermal: core: Store trip pointer in struct thermal_instanceRafael J. Wysocki9-37/+60
[ Upstream commit 2c7b4bfadef08cc0995c24a7b9eb120fe897165f ] Replace the integer trip number stored in struct thermal_instance with a pointer to the relevant trip and adjust the code using the structure in question accordingly. The main reason for making this change is to allow the trip point to cooling device binding code more straightforward, as illustrated by subsequent modifications of the ACPI thermal driver, but it also helps to clarify the overall design and allows the governor code overhead to be reduced (through subsequent modifications). The only case in which it adds complexity is trip_point_show() that needs to walk the trips[] table to find the index of the given trip point, but this is not a critical path and the interface that trip_point_show() belongs to is problematic anyway (for instance, it doesn't cover the case when the same cooling devices is associated with multiple trip points). This is a preliminary change and the affected code will be refined by a series of subsequent modifications of thermal governors, the core and the ACPI thermal driver. The general functionality is not expected to be affected by this change. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Stable-dep-of: e95fa7404716 ("thermal: gov_power_allocator: avoid inability to reset a cdev") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01thermal: trip: Drop redundant trips check from for_each_thermal_trip()Rafael J. Wysocki1-3/+0
[ Upstream commit a15ffa783ea4210877886c59566a0d20f6b2bc09 ] It is invalid to call for_each_thermal_trip() on an unregistered thermal zone anyway, and as per thermal_zone_device_register_with_trips(), the trips[] table must be present if num_trips is greater than zero for the given thermal zone. Hence, the trips check in for_each_thermal_trip() is redundant and so it can be dropped. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Stable-dep-of: e95fa7404716 ("thermal: gov_power_allocator: avoid inability to reset a cdev") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01thermal: intel: hfi: Add syscore callbacks for system-wide PMRicardo Neri1-0/+28
[ Upstream commit 97566d09fd02d2ab329774bb89a2cdf2267e86d9 ] The kernel allocates a memory buffer and provides its location to the hardware, which uses it to update the HFI table. This allocation occurs during boot and remains constant throughout runtime. When resuming from hibernation, the restore kernel allocates a second memory buffer and reprograms the HFI hardware with the new location as part of a normal boot. The location of the second memory buffer may differ from the one allocated by the image kernel. When the restore kernel transfers control to the image kernel, its HFI buffer becomes invalid, potentially leading to memory corruption if the hardware writes to it (the hardware continues to use the buffer from the restore kernel). It is also possible that the hardware "forgets" the address of the memory buffer when resuming from "deep" suspend. Memory corruption may also occur in such a scenario. To prevent the described memory corruption, disable HFI when preparing to suspend or hibernate. Enable it when resuming. Add syscore callbacks to handle the package of the boot CPU (packages of non-boot CPUs are handled via CPU offline). Syscore ops always run on the boot CPU. Additionally, HFI only needs to be disabled during "deep" suspend and hibernation. Syscore ops only run in these cases. Cc: 6.1+ <stable@vger.kernel.org> # 6.1+ Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> [ rjw: Comment adjustment, subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01thermal: intel: hfi: Disable an HFI instance when all its CPUs go offlineRicardo Neri1-0/+35
[ Upstream commit 1c53081d773c2cb4461636559b0d55b46559ceec ] In preparation to support hibernation, add functionality to disable an HFI instance during CPU offline. The last CPU of an instance that goes offline will disable such instance. The Intel Software Development Manual states that the operating system must wait for the hardware to set MSR_IA32_PACKAGE_THERM_STATUS[26] after disabling an HFI instance to ensure that it will no longer write on the HFI memory. Some processors, however, do not ever set such bit. Wait a minimum of 2ms to give time hardware to complete any pending memory writes. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 97566d09fd02 ("thermal: intel: hfi: Add syscore callbacks for system-wide PM") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01thermal: intel: hfi: Refactor enabling code into helper functionsRicardo Neri1-21/+22
[ Upstream commit 8a8b6bb93c704776c4b05cb517c3fa8baffb72f5 ] In preparation for the addition of a suspend notifier, wrap the logic to enable HFI and program its memory buffer into helper functions. Both the CPU hotplug callback and the suspend notifier will use them. This refactoring does not introduce functional changes. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 97566d09fd02 ("thermal: intel: hfi: Add syscore callbacks for system-wide PM") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26drivers/thermal/loongson2_thermal: Fix incorrect PTR_ERR() judgmentBinbin Zhou1-1/+1
[ Upstream commit 15ef92e9c41124ee9d88b01208364f3fe1f45f84 ] PTR_ERR() returns -ENODEV when thermal-zones are undefined, and we need -ENODEV as the right value for comparison. Otherwise, tz->type is NULL when thermal-zones is undefined, resulting in the following error: [ 12.290030] CPU 1 Unable to handle kernel paging request at virtual address fffffffffffffff1, era == 900000000355f410, ra == 90000000031579b8 [ 12.302877] Oops[#1]: [ 12.305190] CPU: 1 PID: 181 Comm: systemd-udevd Not tainted 6.6.0-rc7+ #5385 [ 12.312304] pc 900000000355f410 ra 90000000031579b8 tp 90000001069e8000 sp 90000001069eba10 [ 12.320739] a0 0000000000000000 a1 fffffffffffffff1 a2 0000000000000014 a3 0000000000000001 [ 12.329173] a4 90000001069eb990 a5 0000000000000001 a6 0000000000001001 a7 900000010003431c [ 12.337606] t0 fffffffffffffff1 t1 54567fd5da9b4fd4 t2 900000010614ec40 t3 00000000000dc901 [ 12.346041] t4 0000000000000000 t5 0000000000000004 t6 900000010614ee20 t7 900000000d00b790 [ 12.354472] t8 00000000000dc901 u0 54567fd5da9b4fd4 s9 900000000402ae10 s0 900000010614ec40 [ 12.362916] s1 90000000039fced0 s2 ffffffffffffffed s3 ffffffffffffffed s4 9000000003acc000 [ 12.362931] s5 0000000000000004 s6 fffffffffffff000 s7 0000000000000490 s8 90000001028b2ec8 [ 12.362938] ra: 90000000031579b8 thermal_add_hwmon_sysfs+0x258/0x300 [ 12.386411] ERA: 900000000355f410 strscpy+0xf0/0x160 [ 12.391626] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 12.397898] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 12.403678] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 12.409859] ECFG: 00071c1c (LIE=2-4,10-12 VS=7) [ 12.415882] ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) [ 12.415907] BADV: fffffffffffffff1 [ 12.415911] PRID: 0014a000 (Loongson-64bit, Loongson-2K1000) [ 12.415917] Modules linked in: loongson2_thermal(+) vfat fat uio_pdrv_genirq uio fuse zram zsmalloc [ 12.415950] Process systemd-udevd (pid: 181, threadinfo=00000000358b9718, task=00000000ace72fe3) [ 12.415961] Stack : 0000000000000dc0 54567fd5da9b4fd4 900000000402ae10 9000000002df9358 [ 12.415982] ffffffffffffffed 0000000000000004 9000000107a10aa8 90000001002a3410 [ 12.415999] ffffffffffffffed ffffffffffffffed 9000000107a11268 9000000003157ab0 [ 12.416016] 9000000107a10aa8 ffffff80020fc0c8 90000001002a3410 ffffffffffffffed [ 12.416032] 0000000000000024 ffffff80020cc1e8 900000000402b2a0 9000000003acc000 [ 12.416048] 90000001002a3410 0000000000000000 ffffff80020f4030 90000001002a3410 [ 12.416065] 0000000000000000 9000000002df6808 90000001002a3410 0000000000000000 [ 12.416081] ffffff80020f4030 0000000000000000 90000001002a3410 9000000002df2ba8 [ 12.416097] 00000000000000b4 90000001002a34f4 90000001002a3410 0000000000000002 [ 12.416114] ffffff80020f4030 fffffffffffffff0 90000001002a3410 9000000002df2f30 [ 12.416131] ... [ 12.416138] Call Trace: [ 12.416142] [<900000000355f410>] strscpy+0xf0/0x160 [ 12.416167] [<90000000031579b8>] thermal_add_hwmon_sysfs+0x258/0x300 [ 12.416183] [<9000000003157ab0>] devm_thermal_add_hwmon_sysfs+0x50/0xe0 [ 12.416200] [<ffffff80020cc1e8>] loongson2_thermal_probe+0x128/0x200 [loongson2_thermal] [ 12.416232] [<9000000002df6808>] platform_probe+0x68/0x140 [ 12.416249] [<9000000002df2ba8>] really_probe+0xc8/0x3c0 [ 12.416269] [<9000000002df2f30>] __driver_probe_device+0x90/0x180 [ 12.416286] [<9000000002df3058>] driver_probe_device+0x38/0x160 [ 12.416302] [<9000000002df33a8>] __driver_attach+0xa8/0x200 [ 12.416314] [<9000000002deffec>] bus_for_each_dev+0x8c/0x120 [ 12.416330] [<9000000002df198c>] bus_add_driver+0x10c/0x2a0 [ 12.416346] [<9000000002df46b4>] driver_register+0x74/0x160 [ 12.416358] [<90000000022201a4>] do_one_initcall+0x84/0x220 [ 12.416372] [<90000000022f3ab8>] do_init_module+0x58/0x2c0 [ 12.416386] [<90000000022f6538>] init_module_from_file+0x98/0x100 [ 12.416399] [<90000000022f67f0>] sys_finit_module+0x230/0x3c0 [ 12.416412] [<900000000358f7c8>] do_syscall+0x88/0xc0 [ 12.416431] [<900000000222137c>] handle_syscall+0xbc/0x158 Fixes: e7e3a7c35791 ("thermal/drivers/loongson-2: Add thermal management support") Cc: Yinbo Zhu <zhuyinbo@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/343c14de98216636a47b43e8bfd47b70d0a8e068.1700817227.git.zhoubinbin@loongson.cn Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26thermal: core: Fix NULL pointer dereference in zone registration error pathRafael J. Wysocki1-1/+0
[ Upstream commit 04e6ccfc93c5a1aa1d75a537cf27e418895e20ea ] If device_register() in thermal_zone_device_register_with_trips() returns an error, the tz variable is set to NULL and subsequently dereferenced in kfree(tz->tzp). Commit adc8749b150c ("thermal/drivers/core: Use put_device() if device_register() fails") added the tz = NULL assignment in question to avoid a possible double-free after dropping the reference to the zone device. However, after commit 4649620d9404 ("thermal: core: Make thermal_zone_device_unregister() return after freeing the zone"), that assignment has become redundant, because dropping the reference to the zone device does not cause the zone object to be freed any more. Drop it to address the NULL pointer dereference. Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28thermal: intel: powerclamp: fix mismatch in get function for max_idleDavid Arcari1-1/+1
commit fae633cfb729da2771b5433f6b84ae7e8b4aa5f7 upstream. KASAN reported this [ 444.853098] BUG: KASAN: global-out-of-bounds in param_get_int+0x77/0x90 [ 444.853111] Read of size 4 at addr ffffffffc16c9220 by task cat/2105 ... [ 444.853442] The buggy address belongs to the variable: [ 444.853443] max_idle+0x0/0xffffffffffffcde0 [intel_powerclamp] There is a mismatch between the param_get_int and the definition of max_idle. Replacing param_get_int with param_get_byte resolves this issue. Fixes: ebf519710218 ("thermal: intel: powerclamp: Add two module parameters") Cc: 6.3+ <stable@vger.kernel.org> # 6.3+ Signed-off-by: David Arcari <darcari@redhat.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-20thermal: core: Don't update trip points inside the hysteresis rangeNícolas F. R. A. Prado1-2/+17
[ Upstream commit cf3986f8c01d355490d0ac6024391b989a9d1e9d ] When searching for the trip points that need to be set, the nearest higher trip point's temperature is used for the high trip, while the nearest lower trip point's temperature minus the hysteresis is used for the low trip. The issue with this logic is that when the current temperature is inside a trip point's hysteresis range, both high and low trips will come from the same trip point. As a consequence instability can still occur like this: * the temperature rises slightly and enters the hysteresis range of a trip point * polling happens and updates the trip points to the hysteresis range * the temperature falls slightly, exiting the hysteresis range, crossing the trip point and triggering an IRQ, the trip points are updated * repeat So even though the current hysteresis implementation prevents instability from happening due to IRQs triggering on the same temperature value, both ways, it doesn't prevent it from happening due to an IRQ on one way and polling on the other. To properly implement a hysteresis behavior, when inside the hysteresis range, don't update the trip points. This way, the previously set trip points will stay in effect, which will in a way remember the previous state (if the temperature signal came from above or below the range) and therefore have the right trip point already set. The exception is if there was no previous trip point set, in which case a previous state doesn't exist, and so it's sensible to allow the hysteresis range as trip points. The following logs show the current behavior when running on a real machine: [ 202.524658] thermal thermal_zone0: new temperature boundaries: -2147483647 < x < 40000 203.562817: thermal_temperature: thermal_zone=vpu0-thermal id=0 temp_prev=36986 temp=37979 [ 203.562845] thermal thermal_zone0: new temperature boundaries: 37000 < x < 40000 204.176059: thermal_temperature: thermal_zone=vpu0-thermal id=0 temp_prev=37979 temp=40028 [ 204.176089] thermal thermal_zone0: new temperature boundaries: 37000 < x < 100000 205.226813: thermal_temperature: thermal_zone=vpu0-thermal id=0 temp_prev=40028 temp=38652 [ 205.226842] thermal thermal_zone0: new temperature boundaries: 37000 < x < 40000 And with this patch applied: [ 184.933415] thermal thermal_zone0: new temperature boundaries: -2147483647 < x < 40000 185.981182: thermal_temperature: thermal_zone=vpu0-thermal id=0 temp_prev=36986 temp=37872 186.744685: thermal_temperature: thermal_zone=vpu0-thermal id=0 temp_prev=37872 temp=40058 [ 186.744716] thermal thermal_zone0: new temperature boundaries: 37000 < x < 100000 187.773284: thermal_temperature: thermal_zone=vpu0-thermal id=0 temp_prev=40058 temp=38698 Fixes: 060c034a9741 ("thermal: Add support for hardware-tracked trip points") Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Co-developed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>