summaryrefslogtreecommitdiff
path: root/drivers/thermal/thermal_core.c
AgeCommit message (Collapse)AuthorFilesLines
2023-11-20thermal: core: prevent potential string overflowDan Carpenter1-2/+4
[ Upstream commit c99626092efca3061b387043d4a7399bf75fbdd5 ] The dev->id value comes from ida_alloc() so it's a number between zero and INT_MAX. If it's too high then these sprintf()s will overflow. Fixes: 203d3d4aa482 ("the generic thermal sysfs driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09thermal/core: Fix memory leak in the error pathDaniel Lezcano1-0/+1
commit d44616c6cc3e35eea03ecfe9040edfa2b486a059 upstream. Fix the following error: smatch warnings: drivers/thermal/thermal_core.c:1020 __thermal_cooling_device_register() warn: possible memory leak of 'cdev' by freeing the cdev when exiting the function in the error path. Fixes: 584837618100 ("thermal/drivers/core: Use a char pointer for the cooling device name") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210319202257.890848-1-daniel.lezcano@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09thermal/core: fix a UAF bug in __thermal_cooling_device_register()Ziyang Xuan1-2/+4
commit 0a5c26712f963f0500161a23e0ffff8d29f742ab upstream. When device_register() return failed, program will goto out_kfree_type to release 'cdev->device' by put_device(). That will call thermal_release() to free 'cdev'. But the follow-up processes access 'cdev' continually. That trggers the UAF bug. ==================================================================== BUG: KASAN: use-after-free in __thermal_cooling_device_register+0x75b/0xa90 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: dump_stack_lvl+0xe2/0x152 print_address_description.constprop.0+0x21/0x140 ? __thermal_cooling_device_register+0x75b/0xa90 kasan_report.cold+0x7f/0x11b ? __thermal_cooling_device_register+0x75b/0xa90 __thermal_cooling_device_register+0x75b/0xa90 ? memset+0x20/0x40 ? __sanitizer_cov_trace_pc+0x1d/0x50 ? __devres_alloc_node+0x130/0x180 devm_thermal_of_cooling_device_register+0x67/0xf0 max6650_probe.cold+0x557/0x6aa ...... Freed by task 258: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0x109/0x140 kfree+0x117/0x4c0 thermal_release+0xa0/0x110 device_release+0xa7/0x240 kobject_put+0x1ce/0x540 put_device+0x20/0x30 __thermal_cooling_device_register+0x731/0xa90 devm_thermal_of_cooling_device_register+0x67/0xf0 max6650_probe.cold+0x557/0x6aa [max6650] Do not use 'cdev' again after put_device() to fix the problem like doing in thermal_zone_device_register(). [dlezcano]: as requested by Rafael, change the affectation into two statements. Fixes: 584837618100 ("thermal/drivers/core: Use a char pointer for the cooling device name") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20211015024504.947520-1-william.xuanziyang@huawei.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09thermal/core: Fix memory leak in __thermal_cooling_device_register()Yang Yingliang1-0/+1
[ Upstream commit 98a160e898c0f4a979af9de3ab48b4b1d42d1dbb ] I got memory leak as follows when doing fault injection test: unreferenced object 0xffff888010080000 (size 264312): comm "182", pid 102533, jiffies 4296434960 (age 10.100s) hex dump (first 32 bytes): 00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N.......... ff ff ff ff ff ff ff ff 40 7f 1f b9 ff ff ff ff ........@....... backtrace: [<0000000038b2f4fc>] kmalloc_order_trace+0x1d/0x110 mm/slab_common.c:969 [<00000000ebcb8da5>] __kmalloc+0x373/0x420 include/linux/slab.h:510 [<0000000084137f13>] thermal_cooling_device_setup_sysfs+0x15d/0x2d0 include/linux/slab.h:586 [<00000000352b8755>] __thermal_cooling_device_register+0x332/0xa60 drivers/thermal/thermal_core.c:927 [<00000000fb9f331b>] devm_thermal_of_cooling_device_register+0x6b/0xf0 drivers/thermal/thermal_core.c:1041 [<000000009b8012d2>] max6650_probe.cold+0x557/0x6aa drivers/hwmon/max6650.c:211 [<00000000da0b7e04>] i2c_device_probe+0x472/0xac0 drivers/i2c/i2c-core-base.c:561 If device_register() fails, thermal_cooling_device_destroy_sysfs() need be called to free the memory allocated in thermal_cooling_device_setup_sysfs(). Fixes: 8ea229511e06 ("thermal: Add cooling device's statistics in sysfs") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220511020605.3096734-1-yangyingliang@huawei.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09thermal/drivers/core: Use a char pointer for the cooling device nameDaniel Lezcano1-16/+22
[ Upstream commit 58483761810087e5ffdf36e84ac1bf26df909097 ] We want to have any kind of name for the cooling devices as we do no longer want to rely on auto-numbering. Let's replace the cooling device's fixed array by a char pointer to be allocated dynamically when registering the cooling device, so we don't limit the length of the name. Rework the error path at the same time as we have to rollback the allocations in case of error. Tested with a dummy device having the name: "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch" A village on the island of Anglesey (Wales), known to have the longest name in Europe. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20210314111333.16551-1-daniel.lezcano@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08thermal: core: Reset previous low and high trip during thermal zone initManaf Meethalavalappu Pallikunhi1-0/+2
[ Upstream commit 99b63316c39988039965693f5f43d8b4ccb1c86c ] During the suspend is in process, thermal_zone_device_update bails out thermal zone re-evaluation for any sensor trip violation without setting next valid trip to that sensor. It assumes during resume it will re-evaluate same thermal zone and update trip. But when it is in suspend temperature goes down and on resume path while updating thermal zone if temperature is less than previously violated trip, thermal zone set trip function evaluates the same previous high and previous low trip as new high and low trip. Since there is no change in high/low trip, it bails out from thermal zone set trip API without setting any trip. It leads to a case where sensor high trip or low trip is disabled forever even though thermal zone has a valid high or low trip. During thermal zone device init, reset thermal zone previous high and low trip. It resolves above mentioned scenario. Signed-off-by: Manaf Meethalavalappu Pallikunhi <manafm@codeaurora.org> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30thermal/core: Potential buffer overflow in thermal_build_list_of_policies()Dan Carpenter1-4/+3
[ Upstream commit 1bb30b20b49773369c299d4d6c65227201328663 ] After printing the list of thermal governors, then this function prints a newline character. The problem is that "size" has not been updated after printing the last governor. This means that it can write one character (the NUL terminator) beyond the end of the buffer. Get rid of the "size" variable and just use "PAGE_SIZE - count" directly. Fixes: 1b4f48494eb2 ("thermal: core: group functions related to governor handling") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210916131342.GB25094@kili Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-25thermal/core: Correct function name thermal_zone_device_unregister()Yang Yingliang1-1/+1
[ Upstream commit a052b5118f13febac1bd901fe0b7a807b9d6b51c ] Fix the following make W=1 kernel build warning: drivers/thermal/thermal_core.c:1376: warning: expecting prototype for thermal_device_unregister(). Prototype was for thermal_zone_device_unregister() instead Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210517051020.3463536-1-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-12thermal: cooling: Remove unused variable *tzzhuguangqing1-7/+5
1. devfreq_cooling.c: The variable *tz is not used in devfreq_cooling_get_requested_power(), devfreq_cooling_state2power() and devfreq_cooling_power2state(). 2. cpufreq_cooling.c: After 84fe2cab48590, the variable *tz is not used anymore in cpufreq_get_requested_power(), cpufreq_state2power() and cpufreq_power2state(). Remove the variable *tz. Signed-off-by: zhuguangqing <zhuguangqing@xiaomi.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200914071101.13575-1-zhuguangqing83@gmail.com
2020-10-12thermal: core: remove unnecessary mutex_init()Qinglang Miao1-1/+0
The mutex poweroff_lock is initialized statically. It is unnecessary to initialize by mutex_init(). Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200916062139.191233-1-miaoqinglang@huawei.com
2020-09-04thermal: core: Fix use-after-free in thermal_zone_device_unregister()Dmitry Osipenko1-2/+3
The user-after-free bug in thermal_zone_device_unregister() is reported by KASAN. It happens because struct thermal_zone_device is released during of device_unregister() invocation, and hence the "tz" variable shouldn't be touched by thermal_notify_tz_delete(tz->id). Fixes: 55cdf0a283b8 ("thermal: core: Add notifications call in the framework") Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200817235854.26816-1-digetx@gmail.com
2020-07-29thermal: core: Add thermal zone enable/disable notificationDaniel Lezcano1-0/+5
Now the calls to enable/disable a thermal zone are centralized in a call to a function, we can add in these the corresponding netlink notifications. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Link: https://lore.kernel.org/r/20200727231033.26512-1-daniel.lezcano@linaro.org
2020-07-24thermal: core: Fix thermal zone lookup by IDThierry Reding1-3/+5
When a thermal zone is looked up by an ID and no zone is found matching that ID, the thermal_zone_get_by_id() function will return a pointer to the thermal zone list head which isn't actually a valid thermal zone. This can lead to a subsequent crash because a valid pointer is returned to the called, but dereferencing that pointer as struct thermal_zone is not safe. Fixes: 329b064fbd13 ("thermal: core: Get thermal zone by id") Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200724170105.2705467-1-thierry.reding@gmail.com
2020-07-21thermal: core: Move initialization after core initcallDaniel Lezcano1-1/+1
The generic netlink is initialized at subsys_initcall, so far after the thermal init routine and the thermal generic netlink family initialization. On ŝome platforms, that leads to a memory corruption. The fix was sent to netdev@ to move the genetlink framework initialization at core_initcall. Move the thermal core initialization to postcore level which is very close to core level. Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Link: https://lore.kernel.org/r/20200717164217.18819-2-daniel.lezcano@linaro.org
2020-07-21thermal: netlink: Improve the initcall orderingDaniel Lezcano1-0/+4
The initcalls like to play joke. In our case, the thermal-netlink initcall is called after the thermal-core initcall but this one sends a notification before the former is initialized. No issue was spotted, but it could lead to a memory corruption, so instead of relying on the core_initcall for the thermal-netlink, let's initialize directly from the thermal-core init routine, so we have full control of the init ordering. Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Link: https://lore.kernel.org/r/20200717164217.18819-1-daniel.lezcano@linaro.org
2020-07-07thermal: core: Add notifications call in the frameworkDaniel Lezcano1-0/+21
The generic netlink protocol is implemented but the different notification functions are not yet connected to the core code. These changes add the notification calls in the different corresponding places. Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20200706105538.2159-4-daniel.lezcano@linaro.org
2020-07-07thermal: core: Get thermal zone by idDaniel Lezcano1-0/+14
The next patch will introduce the generic netlink protocol to handle events, sampling and command from the thermal framework. In order to deal with the thermal zone, it uses its unique identifier to characterize it in the message. Passing an integer is more efficient than passing an entire string. This change provides a function returning back a thermal zone pointer corresponding to the identifier passed as parameter. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Acked-by: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20200706105538.2159-2-daniel.lezcano@linaro.org
2020-07-07thermal: core: Add helpers to browse the cdev, tz and governor listDaniel Lezcano1-0/+51
The cdev, tz and governor list, as well as their respective locks are statically defined in the thermal_core.c file. In order to give a sane access to these list, like browsing all the thermal zones or all the cooling devices, let's define a set of helpers where we pass a callback as a parameter to be called for each thermal entity. We keep the self-encapsulation and ensure the locks are correctly taken when looking at the list. Acked-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200706105538.2159-1-daniel.lezcano@linaro.org
2020-07-07thermal: Make thermal_zone_device_is_enabled() available to core onlyAndrzej Pietrasiewicz1-1/+0
This function is not needed by drivers. Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200703104354.19657-4-andrzej.p@collabora.com
2020-06-29thermal: Rename set_mode() to change_mode()Andrzej Pietrasiewicz1-2/+2
set_mode() is only called when tzd's mode is about to change. Actual setting is performed in thermal_core, in thermal_zone_device_set_mode(). The meaning of set_mode() callback is actually to notify the driver about the mode being changed and giving the driver a chance to oppose such change. To better reflect the purpose of the method rename it to change_mode() Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> [for acerhdf] Acked-by: Peter Kaestle <peter@piie.net> Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200629122925.21729-12-andrzej.p@collabora.com
2020-06-29thermal: core: Stop polling DISABLED thermal devicesAndrzej Pietrasiewicz1-2/+14
Polling DISABLED devices is not desired, as all such "disabled" devices are meant to be handled by userspace. This patch introduces and uses should_stop_polling() to decide whether the device should be polled or not. Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200629122925.21729-10-andrzej.p@collabora.com
2020-06-29thermal: Use mode helpers in driversAndrzej Pietrasiewicz1-1/+1
Use thermal_zone_device_{en|dis}able() and thermal_zone_device_is_enabled(). Consequently, all set_mode() implementations in drivers: - can stop modifying tzd's "mode" member, - shall stop taking tzd's lock, as it is taken in the helpers - shall stop calling thermal_zone_device_update() as it is called in the helpers - can assume they are called when the mode truly changes, so checks to verify that can be dropped Not providing set_mode() by a driver no longer prevents the core from being able to set tzd's mode, so the relevant check in mode_store() is removed. Other comments: - acpi/thermal.c: tz->thermal_zone->mode will be updated only after we return from set_mode(), so use function parameter in thermal_set_mode() instead, no need to call acpi_thermal_check() in set_mode() - thermal/imx_thermal.c: regmap writes and mode assignment are done in thermal_zone_device_{en|dis}able() and set_mode() callback - thermal/intel/intel_quark_dts_thermal.c: soc_dts_{en|dis}able() are a part of set_mode() callback, so they don't need to modify tzd->mode, and don't need to fall back to the opposite mode if unsuccessful, as the return value will be propagated to thermal_zone_device_{en|dis}able() and ultimately tzd's member will not be changed in thermal_zone_device_set_mode(). - thermal/of-thermal.c: no need to set zone->mode to DISABLED in of_parse_thermal_zones() as a tzd is kzalloc'ed so mode is DISABLED anyway Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> [for acerhdf] Acked-by: Peter Kaestle <peter@piie.net> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200629122925.21729-8-andrzej.p@collabora.com
2020-06-29thermal: Add mode helpersAndrzej Pietrasiewicz1-0/+53
Prepare for making the drivers not access tzd's private members. Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> [staticize thermal_zone_device_set_mode()] Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200629122925.21729-7-andrzej.p@collabora.com
2020-06-29thermal: remove get_mode() operation of driversAndrzej Pietrasiewicz1-6/+1
get_mode() is now redundant, as the state is stored in struct thermal_zone_device. Consequently the "mode" attribute in sysfs can always be visible, because it is always possible to get the mode from struct tzd. Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> [for acerhdf] Acked-by: Peter Kaestle <peter@piie.net> Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200629122925.21729-6-andrzej.p@collabora.com
2020-05-22thermal/core: Replace module.h with export.hAmit Kucheria1-1/+1
Thermal core cannot be modular, remove the unnecessary module.h include and replace with export.h to handle EXPORT_SYMBOL family of macros. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/33af23406dcdb0c62dae1e6401446b997ccb449f.1589199124.git.amit.kucheria@linaro.org
2020-05-22thermal/core: Get rid of MODULE_* tagsAmit Kucheria1-4/+0
The thermal framework can no longer be compiled as a module as of commit 554b3529fe01 ("thermal/drivers/core: Remove the module Kconfig's option"). Remove the MODULE_* tags. Rui is mentioned in the copyright line at the top of the file and the license is mentioned in the SPDX tags. So no loss of information. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/74339a09a55f8f3d86c4074fc2bf853a302d6186.1589199124.git.amit.kucheria@linaro.org
2020-04-14thermal: core: Remove pointless debug tracesDaniel Lezcano1-6/+0
The last temperature and the current temperature are show via a dev_debug. The line before, those temperature are also traced. It is pointless to duplicate the traces for the temperatures, remove the dev_dbg traces. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Link: https://lore.kernel.org/r/20200331165449.30355-2-daniel.lezcano@linaro.org
2019-11-14thermal: Fix deadlock in thermal thermal_zone_device_checkWei Wang1-2/+2
1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone device") changed cancel_delayed_work to cancel_delayed_work_sync to avoid a use-after-free issue. However, cancel_delayed_work_sync could be called insides the WQ causing deadlock. [54109.642398] c0 1162 kworker/u17:1 D 0 11030 2 0x00000000 [54109.642437] c0 1162 Workqueue: thermal_passive_wq thermal_zone_device_check [54109.642447] c0 1162 Call trace: [54109.642456] c0 1162 __switch_to+0x138/0x158 [54109.642467] c0 1162 __schedule+0xba4/0x1434 [54109.642480] c0 1162 schedule_timeout+0xa0/0xb28 [54109.642492] c0 1162 wait_for_common+0x138/0x2e8 [54109.642511] c0 1162 flush_work+0x348/0x40c [54109.642522] c0 1162 __cancel_work_timer+0x180/0x218 [54109.642544] c0 1162 handle_thermal_trip+0x2c4/0x5a4 [54109.642553] c0 1162 thermal_zone_device_update+0x1b4/0x25c [54109.642563] c0 1162 thermal_zone_device_check+0x18/0x24 [54109.642574] c0 1162 process_one_work+0x3cc/0x69c [54109.642583] c0 1162 worker_thread+0x49c/0x7c0 [54109.642593] c0 1162 kthread+0x17c/0x1b0 [54109.642602] c0 1162 ret_from_fork+0x10/0x18 [54109.643051] c0 1162 kworker/u17:2 D 0 16245 2 0x00000000 [54109.643067] c0 1162 Workqueue: thermal_passive_wq thermal_zone_device_check [54109.643077] c0 1162 Call trace: [54109.643085] c0 1162 __switch_to+0x138/0x158 [54109.643095] c0 1162 __schedule+0xba4/0x1434 [54109.643104] c0 1162 schedule_timeout+0xa0/0xb28 [54109.643114] c0 1162 wait_for_common+0x138/0x2e8 [54109.643122] c0 1162 flush_work+0x348/0x40c [54109.643131] c0 1162 __cancel_work_timer+0x180/0x218 [54109.643141] c0 1162 handle_thermal_trip+0x2c4/0x5a4 [54109.643150] c0 1162 thermal_zone_device_update+0x1b4/0x25c [54109.643159] c0 1162 thermal_zone_device_check+0x18/0x24 [54109.643167] c0 1162 process_one_work+0x3cc/0x69c [54109.643177] c0 1162 worker_thread+0x49c/0x7c0 [54109.643186] c0 1162 kthread+0x17c/0x1b0 [54109.643195] c0 1162 ret_from_fork+0x10/0x18 [54109.644500] c0 1162 cat D 0 7766 1 0x00000001 [54109.644515] c0 1162 Call trace: [54109.644524] c0 1162 __switch_to+0x138/0x158 [54109.644536] c0 1162 __schedule+0xba4/0x1434 [54109.644546] c0 1162 schedule_preempt_disabled+0x80/0xb0 [54109.644555] c0 1162 __mutex_lock+0x3a8/0x7f0 [54109.644563] c0 1162 __mutex_lock_slowpath+0x14/0x20 [54109.644575] c0 1162 thermal_zone_get_temp+0x84/0x360 [54109.644586] c0 1162 temp_show+0x30/0x78 [54109.644609] c0 1162 dev_attr_show+0x5c/0xf0 [54109.644628] c0 1162 sysfs_kf_seq_show+0xcc/0x1a4 [54109.644636] c0 1162 kernfs_seq_show+0x48/0x88 [54109.644656] c0 1162 seq_read+0x1f4/0x73c [54109.644664] c0 1162 kernfs_fop_read+0x84/0x318 [54109.644683] c0 1162 __vfs_read+0x50/0x1bc [54109.644692] c0 1162 vfs_read+0xa4/0x140 [54109.644701] c0 1162 SyS_read+0xbc/0x144 [54109.644708] c0 1162 el0_svc_naked+0x34/0x38 [54109.845800] c0 1162 D 720.000s 1->7766->7766 cat [panic] Fixes: 1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone device") Cc: stable@vger.kernel.org Signed-off-by: Wei Wang <wvw@google.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-11-07thermal: Initialize thermal subsystem earlierAmit Kucheria1-1/+1
Now that the thermal framework is built-in, in order to facilitate thermal mitigation as early as possible in the boot cycle, move the thermal framework initialization to core_initcall. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/f8ff0ab4a8e9c2eca5a26fb2256365b26cb326ce.1571656015.git.amit.kucheria@linaro.org
2019-11-07thermal: Remove netlink supportAmit Kucheria1-100/+1
There are no users of netlink messages for thermal inside the kernel. Remove the code and adjust the documentation. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/8ff02cf62186c7a54fff325fad40a2e9ca3affa6.1571656014.git.amit.kucheria@linaro.org
2019-09-24thermal: Add some error messagesAmit Kucheria1-4/+13
When registering a thermal zone device, we currently return -EINVAL in four cases. This makes it a little hard to debug the real cause of the failure. Print some error messages to make it easier for developer to figure out what happened. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-09-24thermal: Fix use-after-free when unregistering thermal zone deviceIdo Schimmel1-1/+1
thermal_zone_device_unregister() cancels the delayed work that polls the thermal zone, but it does not wait for it to finish. This is racy with respect to the freeing of the thermal zone device, which can result in a use-after-free [1]. Fix this by waiting for the delayed work to finish before freeing the thermal zone device. Note that thermal_zone_device_set_polling() is never invoked from an atomic context, so it is safe to call cancel_delayed_work_sync() that can block. [1] [ +0.002221] ================================================================== [ +0.000064] BUG: KASAN: use-after-free in __mutex_lock+0x1076/0x11c0 [ +0.000016] Read of size 8 at addr ffff8881e48e0450 by task kworker/1:0/17 [ +0.000023] CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.2.0-rc6-custom-02495-g8e73ca3be4af #1701 [ +0.000010] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 [ +0.000016] Workqueue: events_freezable_power_ thermal_zone_device_check [ +0.000012] Call Trace: [ +0.000021] dump_stack+0xa9/0x10e [ +0.000020] print_address_description.cold.2+0x9/0x25e [ +0.000018] __kasan_report.cold.3+0x78/0x9d [ +0.000016] kasan_report+0xe/0x20 [ +0.000016] __mutex_lock+0x1076/0x11c0 [ +0.000014] step_wise_throttle+0x72/0x150 [ +0.000018] handle_thermal_trip+0x167/0x760 [ +0.000019] thermal_zone_device_update+0x19e/0x5f0 [ +0.000019] process_one_work+0x969/0x16f0 [ +0.000017] worker_thread+0x91/0xc40 [ +0.000014] kthread+0x33d/0x400 [ +0.000015] ret_from_fork+0x3a/0x50 [ +0.000020] Allocated by task 1: [ +0.000015] save_stack+0x19/0x80 [ +0.000015] __kasan_kmalloc.constprop.4+0xc1/0xd0 [ +0.000014] kmem_cache_alloc_trace+0x152/0x320 [ +0.000015] thermal_zone_device_register+0x1b4/0x13a0 [ +0.000015] mlxsw_thermal_init+0xc92/0x23d0 [ +0.000014] __mlxsw_core_bus_device_register+0x659/0x11b0 [ +0.000013] mlxsw_core_bus_device_register+0x3d/0x90 [ +0.000013] mlxsw_pci_probe+0x355/0x4b0 [ +0.000014] local_pci_probe+0xc3/0x150 [ +0.000013] pci_device_probe+0x280/0x410 [ +0.000013] really_probe+0x26a/0xbb0 [ +0.000013] driver_probe_device+0x208/0x2e0 [ +0.000013] device_driver_attach+0xfe/0x140 [ +0.000013] __driver_attach+0x110/0x310 [ +0.000013] bus_for_each_dev+0x14b/0x1d0 [ +0.000013] driver_register+0x1c0/0x400 [ +0.000015] mlxsw_sp_module_init+0x5d/0xd3 [ +0.000014] do_one_initcall+0x239/0x4dd [ +0.000013] kernel_init_freeable+0x42b/0x4e8 [ +0.000012] kernel_init+0x11/0x18b [ +0.000013] ret_from_fork+0x3a/0x50 [ +0.000015] Freed by task 581: [ +0.000013] save_stack+0x19/0x80 [ +0.000014] __kasan_slab_free+0x125/0x170 [ +0.000013] kfree+0xf3/0x310 [ +0.000013] thermal_release+0xc7/0xf0 [ +0.000014] device_release+0x77/0x200 [ +0.000014] kobject_put+0x1a8/0x4c0 [ +0.000014] device_unregister+0x38/0xc0 [ +0.000014] thermal_zone_device_unregister+0x54e/0x6a0 [ +0.000014] mlxsw_thermal_fini+0x184/0x35a [ +0.000014] mlxsw_core_bus_device_unregister+0x10a/0x640 [ +0.000013] mlxsw_devlink_core_bus_device_reload+0x92/0x210 [ +0.000015] devlink_nl_cmd_reload+0x113/0x1f0 [ +0.000014] genl_family_rcv_msg+0x700/0xee0 [ +0.000013] genl_rcv_msg+0xca/0x170 [ +0.000013] netlink_rcv_skb+0x137/0x3a0 [ +0.000012] genl_rcv+0x29/0x40 [ +0.000013] netlink_unicast+0x49b/0x660 [ +0.000013] netlink_sendmsg+0x755/0xc90 [ +0.000013] __sys_sendto+0x3de/0x430 [ +0.000013] __x64_sys_sendto+0xe2/0x1b0 [ +0.000013] do_syscall_64+0xa4/0x4d0 [ +0.000013] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ +0.000017] The buggy address belongs to the object at ffff8881e48e0008 which belongs to the cache kmalloc-2k of size 2048 [ +0.000012] The buggy address is located 1096 bytes inside of 2048-byte region [ffff8881e48e0008, ffff8881e48e0808) [ +0.000007] The buggy address belongs to the page: [ +0.000012] page:ffffea0007923800 refcount:1 mapcount:0 mapping:ffff88823680d0c0 index:0x0 compound_mapcount: 0 [ +0.000020] flags: 0x200000000010200(slab|head) [ +0.000019] raw: 0200000000010200 ffffea0007682008 ffffea00076ab808 ffff88823680d0c0 [ +0.000016] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000 [ +0.000007] page dumped because: kasan: bad access detected [ +0.000012] Memory state around the buggy address: [ +0.000012] ffff8881e48e0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] ffff8881e48e0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] >ffff8881e48e0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000008] ^ [ +0.000012] ffff8881e48e0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] ffff8881e48e0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000007] ================================================================== Fixes: b1569e99c795 ("ACPI: move thermal trip handling to generic thermal layer") Reported-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-09-24thermal/drivers/core: Use put_device() if device_register() failsYue Hu1-12/+13
Never directly free @dev after calling device_register(), even if it returned an error! Always use put_device() to give up the reference initialized. Clean up the rollback block also. Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-06-27thermal/drivers/core: Use governor table to initializeDaniel Lezcano1-23/+29
Now that the governor table is in place and the macro allows to browse the table, declare the governor so the entry is added in the governor table in the init section. The [un]register_thermal_governors function does no longer need to use the exported [un]register thermal governor's specific function which in turn call the [un]register_thermal_governor. The governors are fully self-encapsulated. The cyclic dependency is no longer needed, remove it. Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-05-17Merge branch 'next' of ↵Linus Torvalds1-19/+12
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management updates from Zhang Rui: - Remove the 'module' Kconfig option for thermal subsystem framework because the thermal framework are required to be ready as early as possible to avoid overheat at boot time (Daniel Lezcano) - Fix a bug that thermal framework pokes disabled thermal zones upon resume (Wei Wang) - A couple of cleanups and trivial fixes on int340x thermal drivers (Srinivas Pandruvada, Zhang Rui, Sumeet Pawnikar) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: drivers: thermal: processor_thermal: Downgrade error message mlxsw: Remove obsolete dependency on THERMAL=m hwmon/drivers/core: Simplify complex dependency thermal/drivers/core: Fix typo in the option name thermal/drivers/core: Remove depends on THERMAL in Kconfig thermal/drivers/core: Remove module unload code thermal/drivers/core: Remove the module Kconfig's option thermal: core: skip update disabled thermal zones after suspend thermal: make device_register's type argument const thermal: intel: int340x: processor_thermal_device: simplify to get driver data thermal/int3403_thermal: favor _TMP instead of PTYP
2019-05-14thermal: Introduce devm_thermal_of_cooling_device_registerGuenter Roeck1-0/+49
thermal_of_cooling_device_register() and thermal_cooling_device_register() are typically called from driver probe functions, and thermal_cooling_device_unregister() is called from remove functions. This makes both a perfect candidate for device managed functions. Introduce devm_thermal_of_cooling_device_register(). This function can also be used to replace thermal_cooling_device_register() by passing a NULL pointer as device node. The new function requires both struct device * and struct device_node * as parameters since the struct device_node * parameter is not always identical to dev->of_node. Don't introduce a device managed remove function since it is not needed at this point. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-05-07Merge branches 'thermal-core', 'thermal-built-it' and 'thermal-intel' into nextZhang Rui1-16/+1
2019-05-06thermal/drivers/core: Remove module unload codeDaniel Lezcano1-16/+1
Now the thermal core is no longer compiled as a module. Remove the unloading module code and move the unregister function to the __init section. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-05-06thermal: core: skip update disabled thermal zones after suspendWei Wang1-0/+8
It is unnecessary to update disabled thermal zones post suspend and sometimes leads error/warning in bad behaved thermal drivers. Signed-off-by: Wei Wang <wvw@google.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-05-06thermal: make device_register's type argument constJean-Francois Dagenais1-3/+3
...because it can be, the buffer is strlcpy'd into a local buffer in a thermal struct member. Signed-off-by: Jean-Francois Dagenais <jeff.dagenais@gmail.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2018-11-30Thermal: do not clear passive state during system sleepWei Wang1-4/+8
commit ff140fea847e ("Thermal: handle thermal zone device properly during system sleep") added PM hook to call thermal zone reset during sleep. However resetting thermal zone will also clear the passive state and thus cancel the polling queue which leads the passive cooling device state not being cleared properly after sleep. thermal_pm_notify => thermal_zone_device_reset set passive to 0 thermal_zone_trip_update will skip update passive as `old_target == instance->target'. monitor_thermal_zone => thermal_zone_device_set_polling will cancel tz->poll_queue, so the cooling device state will not be changed afterwards. Reported-by: Kame Wang <kamewang@google.com> Signed-off-by: Wei Wang <wvw@google.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2018-11-30thermal: remove unused function parameterLukasz Luba1-4/+2
Clean unused parameter from internal framework function. Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2018-10-10thermal: core: using power_efficient_wq for thermal workerJeson Gao1-2/+4
For SMP systems, thermal worker should use power_efficient_wq in power saving mode, that will make scheduler more flexible on selecting an active core for running work handler to avoid keeping work handler always running on a single core, that will save some power. Even if 'power_efficient_wq' relevant configs are disabled 'system_freezable_power_efficient_wq' is identical to system_freezable_wq, behavior is unchanged. Signed-off-by: Jeson Gao <jeson.gao@unisoc.com> Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2018-10-10thermal: core: Fix use-after-free in thermal_cooling_device_destroy_sysfsDmitry Osipenko1-1/+2
This patch fixes use-after-free that was detected by KASAN. The bug is triggered on a CPUFreq driver module unload by freeing 'cdev' on device unregister and then using the freed structure during of the cdev's sysfs data destruction. The solution is to unregister the sysfs at first, then destroy sysfs data and finally release the cooling device. Cc: <stable@vger.kernel.org> # v4.17+ Fixes: 8ea229511e06 ("thermal: Add cooling device's statistics in sysfs") Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2018-05-30drivers: thermal: Update license to SPDX formatLina Iyer1-4/+1
Update licences format for core thermal files. Signed-off-by: Lina Iyer <ilina@codeaurora.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2018-05-22thermal: Shorten name of sysfs callbacksViresh Kumar1-3/+3
The naming isn't consistent across all sysfs callbacks in the thermal core, some have a short name like type_show() and others have long names like thermal_cooling_device_weight_show(). This patch tries to make it consistent by shortening the name of sysfs callbacks. Some of the sysfs files are named similarly for both thermal zone and cooling device (like: type) and to avoid name clash between their show/store routines, the cooling device specific sysfs callbacks are prefixed with "cdev_". Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2018-04-02thermal: Add cooling device's statistics in sysfsViresh Kumar1-1/+2
This extends the sysfs interface for thermal cooling devices and exposes some pretty useful statistics. These statistics have proven to be quite useful specially while doing benchmarks related to the task scheduler, where we want to make sure that nothing has disrupted the test, specially the cooling device which may have put constraints on the CPUs. The information exposed here tells us to what extent the CPUs were constrained by the thermal framework. The write-only "reset" file is used to reset the statistics. The read-only "time_in_state_ms" file shows the time (in msec) spent by the device in the respective cooling states, and it prints one line per cooling state. The read-only "total_trans" file shows single positive integer value showing the total number of cooling state transitions the device has gone through since the time the cooling device is registered or the time when statistics were reset last. The read-only "trans_table" file shows a two dimensional matrix, where an entry <i,j> (row i, column j) represents the number of transitions from State_i to State_j. This is how the directory structure looks like for a single cooling device: $ ls -R /sys/class/thermal/cooling_device0/ /sys/class/thermal/cooling_device0/: cur_state max_state power stats subsystem type uevent /sys/class/thermal/cooling_device0/power: autosuspend_delay_ms runtime_active_time runtime_suspended_time control runtime_status /sys/class/thermal/cooling_device0/stats: reset time_in_state_ms total_trans trans_table This is tested on ARM 64-bit Hisilicon hikey620 board running Ubuntu and ARM 64-bit Hisilicon hikey960 board running Android. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2017-08-11thermal: core: Fix resources release in error paths in ↵Christophe Jaillet1-10/+13
thermal_zone_device_register() Reorder error handling code in order to fix some resources leaks in some cases: - 'tz' would leak if 'thermal_zone_create_device_groups()' fails - memory allocated by 'thermal_zone_create_device_groups()' would leak if 'device_register()' fails With this patch, we now have 2 error handling paths: one before 'device_register()', and one after it. This is needed because some resources are released in 'thermal_release()'. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2017-08-11thermal: core: Use the new 'thermal_zone_destroy_device_groups()' helper ↵Christophe Jaillet1-5/+1
function Simplify code by using the new 'thermal_zone_destroy_device_groups()' helper function. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2017-08-08thermal: core: fix some format issues on critical shutdown stringIcenowy Zheng1-1/+1
The critical shutdown notice string used to have some spaces missing, which makes it not so pretty. Add the spaces to satisfy usual English space rules. Reported-by: Mingcong Bai <jeffbai@aosc.io> Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Signed-off-by: Zhang Rui <rui.zhang@intel.com>