<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/thermal, branch v4.14.217</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.14.217</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.14.217'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2020-09-09T17:03:10+00:00</updated>
<entry>
<title>thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430</title>
<updated>2020-09-09T17:03:10+00:00</updated>
<author>
<name>Tony Lindgren</name>
<email>tony@atomide.com</email>
</author>
<published>2020-07-06T18:33:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=905f44ec6e0eaca2c73b2a92b02c0731530384ca'/>
<id>urn:sha1:905f44ec6e0eaca2c73b2a92b02c0731530384ca</id>
<content type='text'>
[ Upstream commit 30d24faba0532d6972df79a1bf060601994b5873 ]

We can sometimes get bogus thermal shutdowns on omap4430 at least with
droid4 running idle with a battery charger connected:

thermal thermal_zone0: critical temperature reached (143 C), shutting down

Dumping out the register values shows we can occasionally get a 0x7f value
that is outside the TRM listed values in the ADC conversion table. And then
we get a normal value when reading again after that. Reading the register
multiple times does not seem help avoiding the bogus values as they stay
until the next sample is ready.

Looking at the TRM chapter "18.4.10.2.3 ADC Codes Versus Temperature", we
should have values from 13 to 107 listed with a total of 95 values. But
looking at the omap4430_adc_to_temp array, the values are off, and the
end values are missing. And it seems that the 4430 ADC table is similar
to omap3630 rather than omap4460.

Let's fix the issue by using values based on the omap3630 table and just
ignoring invalid values. Compared to the 4430 TRM, the omap3630 table has
the missing values added while the TRM table only shows every second
value.

Note that sometimes the ADC register values within the valid table can
also be way off for about 1 out of 10 values. But it seems that those
just show about 25 C too low values rather than too high values. So those
do not cause a bogus thermal shutdown.

Fixes: 1a31270e54d7 ("staging: omap-thermal: add OMAP4 data structures")
Cc: Merlijn Wajer &lt;merlijn@wizzup.org&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Sebastian Reichel &lt;sebastian.reichel@collabora.com&gt;
Signed-off-by: Tony Lindgren &lt;tony@atomide.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20200706183338.25622-1-tony@atomide.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>thermal: ti-soc-thermal: Fix reversed condition in ti_thermal_expose_sensor()</title>
<updated>2020-08-21T07:48:10+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2020-06-16T09:19:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=de3661a5687b209fdd2803c5c1769aca0f5d2ac6'/>
<id>urn:sha1:de3661a5687b209fdd2803c5c1769aca0f5d2ac6</id>
<content type='text'>
[ Upstream commit 0f348db01fdf128813fdd659fcc339038fb421a4 ]

This condition is reversed and will cause breakage.

Fixes: 7440f518dad9 ("thermal/drivers/ti-soc-thermal: Avoid dereferencing ERR_PTR")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20200616091949.GA11940@mwanda
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from power</title>
<updated>2020-07-22T07:22:28+00:00</updated>
<author>
<name>Finley Xiao</name>
<email>finley.xiao@rock-chips.com</email>
</author>
<published>2020-06-19T09:08:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=39e0651cac9c80865b2838f297f95ffc0f34a1d8'/>
<id>urn:sha1:39e0651cac9c80865b2838f297f95ffc0f34a1d8</id>
<content type='text'>
commit 371a3bc79c11b707d7a1b7a2c938dc3cc042fffb upstream.

The function cpu_power_to_freq is used to find a frequency and set the
cooling device to consume at most the power to be converted. For example,
if the power to be converted is 80mW, and the em table is as follow.
struct em_cap_state table[] = {
	/* KHz     mW */
	{ 1008000, 36, 0 },
	{ 1200000, 49, 0 },
	{ 1296000, 59, 0 },
	{ 1416000, 72, 0 },
	{ 1512000, 86, 0 },
};
The target frequency should be 1416000KHz, not 1512000KHz.

Fixes: 349d39dc5739 ("thermal: cpu_cooling: merge frequency and power tables")
Cc: &lt;stable@vger.kernel.org&gt; # v4.13+
Signed-off-by: Finley Xiao &lt;finley.xiao@rock-chips.com&gt;
Acked-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Reviewed-by: Amit Kucheria &lt;amit.kucheria@linaro.org&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20200619090825.32747-1-finley.xiao@rock-chips.com
Signed-off-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Revert "thermal: mediatek: fix register index error"</title>
<updated>2020-07-22T07:22:24+00:00</updated>
<author>
<name>Enric Balletbo i Serra</name>
<email>enric.balletbo@collabora.com</email>
</author>
<published>2020-07-07T10:34:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c5b5101aeb7a65a9b67ba3f4f7437b951cf2bc59'/>
<id>urn:sha1:c5b5101aeb7a65a9b67ba3f4f7437b951cf2bc59</id>
<content type='text'>
[ Upstream commit a8f62f183021be389561570ab5f8c701a5e70298 ]

This reverts commit eb9aecd90d1a39601e91cd08b90d5fee51d321a6

The above patch is supposed to fix a register index error on mt2701. It
is not clear if the problem solved is a hang or just an invalid value
returned, my guess is the second. The patch introduces, though, a new
hang on MT8173 device making them unusable. So, seems reasonable, revert
the patch because introduces a worst issue.

The reason I send a revert instead of trying to fix the issue for MT8173
is because the information needed to fix the issue is in the datasheet
and is not public. So I am not really able to fix it.

Fixes the following bug when CONFIG_MTK_THERMAL is set on MT8173
devices.

[    2.222488] Unable to handle kernel paging request at virtual address ffff8000125f5001
[    2.230421] Mem abort info:
[    2.233207]   ESR = 0x96000021
[    2.236261]   EC = 0x25: DABT (current EL), IL = 32 bits
[    2.241571]   SET = 0, FnV = 0
[    2.244623]   EA = 0, S1PTW = 0
[    2.247762] Data abort info:
[    2.250640]   ISV = 0, ISS = 0x00000021
[    2.254473]   CM = 0, WnR = 0
[    2.257544] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041850000
[    2.264251] [ffff8000125f5001] pgd=000000013ffff003, pud=000000013fffe003, pmd=000000013fff9003, pte=006800001100b707
[    2.274867] Internal error: Oops: 96000021 [#1] PREEMPT SMP
[    2.280432] Modules linked in:
[    2.283483] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc6+ #162
[    2.289914] Hardware name: Google Elm (DT)
[    2.294003] pstate: 20000005 (nzCv daif -PAN -UAO)
[    2.298792] pc : mtk_read_temp+0xb8/0x1c8
[    2.302793] lr : mtk_read_temp+0x7c/0x1c8
[    2.306794] sp : ffff80001003b930
[    2.310100] x29: ffff80001003b930 x28: 0000000000000000
[    2.315404] x27: 0000000000000002 x26: ffff0000f9550b10
[    2.320709] x25: ffff0000f9550a80 x24: 0000000000000090
[    2.326014] x23: ffff80001003ba24 x22: 00000000610344c0
[    2.331318] x21: 0000000000002710 x20: 00000000000001f4
[    2.336622] x19: 0000000000030d40 x18: ffff800011742ec0
[    2.341926] x17: 0000000000000001 x16: 0000000000000001
[    2.347230] x15: ffffffffffffffff x14: ffffff0000000000
[    2.352535] x13: ffffffffffffffff x12: 0000000000000028
[    2.357839] x11: 0000000000000003 x10: ffff800011295ec8
[    2.363143] x9 : 000000000000291b x8 : 0000000000000002
[    2.368447] x7 : 00000000000000a8 x6 : 0000000000000004
[    2.373751] x5 : 0000000000000000 x4 : ffff800011295cb0
[    2.379055] x3 : 0000000000000002 x2 : ffff8000125f5001
[    2.384359] x1 : 0000000000000001 x0 : ffff0000f9550a80
[    2.389665] Call trace:
[    2.392105]  mtk_read_temp+0xb8/0x1c8
[    2.395760]  of_thermal_get_temp+0x2c/0x40
[    2.399849]  thermal_zone_get_temp+0x78/0x160
[    2.404198]  thermal_zone_device_update.part.0+0x3c/0x1f8
[    2.409589]  thermal_zone_device_update+0x34/0x48
[    2.414286]  of_thermal_set_mode+0x58/0x88
[    2.418375]  thermal_zone_of_sensor_register+0x1a8/0x1d8
[    2.423679]  devm_thermal_zone_of_sensor_register+0x64/0xb0
[    2.429242]  mtk_thermal_probe+0x690/0x7d0
[    2.433333]  platform_drv_probe+0x5c/0xb0
[    2.437335]  really_probe+0xe4/0x448
[    2.440901]  driver_probe_device+0xe8/0x140
[    2.445077]  device_driver_attach+0x7c/0x88
[    2.449252]  __driver_attach+0xac/0x178
[    2.453082]  bus_for_each_dev+0x78/0xc8
[    2.456909]  driver_attach+0x2c/0x38
[    2.460476]  bus_add_driver+0x14c/0x230
[    2.464304]  driver_register+0x6c/0x128
[    2.468131]  __platform_driver_register+0x50/0x60
[    2.472831]  mtk_thermal_driver_init+0x24/0x30
[    2.477268]  do_one_initcall+0x50/0x298
[    2.481098]  kernel_init_freeable+0x1ec/0x264
[    2.485450]  kernel_init+0x1c/0x110
[    2.488931]  ret_from_fork+0x10/0x1c
[    2.492502] Code: f9401081 f9400402 b8a67821 8b010042 (b9400042)
[    2.498599] ---[ end trace e43e3105ed27dc99 ]---
[    2.503367] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    2.511020] SMP: stopping secondary CPUs
[    2.514941] Kernel Offset: disabled
[    2.518421] CPU features: 0x090002,25006005
[    2.522595] Memory Limit: none
[    2.525644] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--

Cc: Michael Kao &lt;michael.kao@mediatek.com&gt;
Fixes: eb9aecd90d1a ("thermal: mediatek: fix register index error")
Signed-off-by: Enric Balletbo i Serra &lt;enric.balletbo@collabora.com&gt;
Reviewed-by: Matthias Brugger &lt;matthias.bgg@gmail.com&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20200707103412.1010823-1-enric.balletbo@collabora.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>thermal/drivers/ti-soc-thermal: Avoid dereferencing ERR_PTR</title>
<updated>2020-06-25T13:41:51+00:00</updated>
<author>
<name>Sudip Mukherjee</name>
<email>sudipm.mukherjee@gmail.com</email>
</author>
<published>2020-04-24T16:19:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8bb69793a78668bbb2b73db42ecb688db17bf456'/>
<id>urn:sha1:8bb69793a78668bbb2b73db42ecb688db17bf456</id>
<content type='text'>
[ Upstream commit 7440f518dad9d861d76c64956641eeddd3586f75 ]

On error the function ti_bandgap_get_sensor_data() returns the error
code in ERR_PTR() but we only checked if the return value is NULL or
not. And, so we can dereference an error code inside ERR_PTR.
While at it, convert a check to IS_ERR_OR_NULL.

Signed-off-by: Sudip Mukherjee &lt;sudipm.mukherjee@gmail.com&gt;
Reviewed-by: Amit Kucheria &lt;amit.kucheria@linaro.org&gt;
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Link: https://lore.kernel.org/r/20200424161944.6044-1-sudipm.mukherjee@gmail.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>thermal: cpu_cooling: Actually trace CPU load in thermal_power_cpu_get_power</title>
<updated>2020-01-27T13:46:34+00:00</updated>
<author>
<name>Matthias Kaehlcke</name>
<email>mka@chromium.org</email>
</author>
<published>2019-05-02T18:32:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0450254e832c40401c074c84e664d2f37bae7155'/>
<id>urn:sha1:0450254e832c40401c074c84e664d2f37bae7155</id>
<content type='text'>
[ Upstream commit bf45ac18b78038e43af3c1a273cae4ab5704d2ce ]

The CPU load values passed to the thermal_power_cpu_get_power
tracepoint are zero for all CPUs, unless, unless the
thermal_power_cpu_limit tracepoint is enabled too:

  irq/41-rockchip-98    [000] ....   290.972410: thermal_power_cpu_get_power:
  cpus=0000000f freq=1800000 load={{0x0,0x0,0x0,0x0}} dynamic_power=4815

vs

  irq/41-rockchip-96    [000] ....    95.773585: thermal_power_cpu_get_power:
  cpus=0000000f freq=1800000 load={{0x56,0x64,0x64,0x5e}} dynamic_power=4959
  irq/41-rockchip-96    [000] ....    95.773596: thermal_power_cpu_limit:
  cpus=0000000f freq=408000 cdev_state=10 power=416

There seems to be no good reason for omitting the CPU load information
depending on another tracepoint. My guess is that the intention was to
check whether thermal_power_cpu_get_power is (still) enabled, however
'load_cpu != NULL' already indicates that it was at least enabled when
cpufreq_get_requested_power() was entered, there seems little gain
from omitting the assignment if the tracepoint was just disabled, so
just remove the check.

Fixes: 6828a4711f99 ("thermal: add trace events to the power allocator governor")
Signed-off-by: Matthias Kaehlcke &lt;mka@chromium.org&gt;
Reviewed-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Acked-by: Javi Merino &lt;javi.merino@kernel.org&gt;
Acked-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Signed-off-by: Eduardo Valentin &lt;edubezval@gmail.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>thermal: mediatek: fix register index error</title>
<updated>2020-01-27T13:46:17+00:00</updated>
<author>
<name>Michael Kao</name>
<email>michael.kao@mediatek.com</email>
</author>
<published>2019-02-01T07:38:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=032c10fb504cf15575a69e2f7e673c72c720ade7'/>
<id>urn:sha1:032c10fb504cf15575a69e2f7e673c72c720ade7</id>
<content type='text'>
[ Upstream commit eb9aecd90d1a39601e91cd08b90d5fee51d321a6 ]

The index of msr and adcpnp should match the sensor
which belongs to the selected bank in the for loop.

Fixes: b7cf0053738c ("thermal: Add Mediatek thermal driver for mt2701.")
Signed-off-by: Michael Kao &lt;michael.kao@mediatek.com&gt;
Signed-off-by: Eduardo Valentin &lt;edubezval@gmail.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>thermal: Fix deadlock in thermal thermal_zone_device_check</title>
<updated>2019-12-17T19:38:57+00:00</updated>
<author>
<name>Wei Wang</name>
<email>wvw@google.com</email>
</author>
<published>2019-11-12T20:42:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3be5da34757a684c239e7164c5bd783b6e9e724f'/>
<id>urn:sha1:3be5da34757a684c239e7164c5bd783b6e9e724f</id>
<content type='text'>
commit 163b00cde7cf2206e248789d2780121ad5e6a70b upstream.

1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone
device") changed cancel_delayed_work to cancel_delayed_work_sync to avoid
a use-after-free issue. However, cancel_delayed_work_sync could be called
insides the WQ causing deadlock.

[54109.642398] c0   1162 kworker/u17:1   D    0 11030      2 0x00000000
[54109.642437] c0   1162 Workqueue: thermal_passive_wq thermal_zone_device_check
[54109.642447] c0   1162 Call trace:
[54109.642456] c0   1162  __switch_to+0x138/0x158
[54109.642467] c0   1162  __schedule+0xba4/0x1434
[54109.642480] c0   1162  schedule_timeout+0xa0/0xb28
[54109.642492] c0   1162  wait_for_common+0x138/0x2e8
[54109.642511] c0   1162  flush_work+0x348/0x40c
[54109.642522] c0   1162  __cancel_work_timer+0x180/0x218
[54109.642544] c0   1162  handle_thermal_trip+0x2c4/0x5a4
[54109.642553] c0   1162  thermal_zone_device_update+0x1b4/0x25c
[54109.642563] c0   1162  thermal_zone_device_check+0x18/0x24
[54109.642574] c0   1162  process_one_work+0x3cc/0x69c
[54109.642583] c0   1162  worker_thread+0x49c/0x7c0
[54109.642593] c0   1162  kthread+0x17c/0x1b0
[54109.642602] c0   1162  ret_from_fork+0x10/0x18
[54109.643051] c0   1162 kworker/u17:2   D    0 16245      2 0x00000000
[54109.643067] c0   1162 Workqueue: thermal_passive_wq thermal_zone_device_check
[54109.643077] c0   1162 Call trace:
[54109.643085] c0   1162  __switch_to+0x138/0x158
[54109.643095] c0   1162  __schedule+0xba4/0x1434
[54109.643104] c0   1162  schedule_timeout+0xa0/0xb28
[54109.643114] c0   1162  wait_for_common+0x138/0x2e8
[54109.643122] c0   1162  flush_work+0x348/0x40c
[54109.643131] c0   1162  __cancel_work_timer+0x180/0x218
[54109.643141] c0   1162  handle_thermal_trip+0x2c4/0x5a4
[54109.643150] c0   1162  thermal_zone_device_update+0x1b4/0x25c
[54109.643159] c0   1162  thermal_zone_device_check+0x18/0x24
[54109.643167] c0   1162  process_one_work+0x3cc/0x69c
[54109.643177] c0   1162  worker_thread+0x49c/0x7c0
[54109.643186] c0   1162  kthread+0x17c/0x1b0
[54109.643195] c0   1162  ret_from_fork+0x10/0x18
[54109.644500] c0   1162 cat             D    0  7766      1 0x00000001
[54109.644515] c0   1162 Call trace:
[54109.644524] c0   1162  __switch_to+0x138/0x158
[54109.644536] c0   1162  __schedule+0xba4/0x1434
[54109.644546] c0   1162  schedule_preempt_disabled+0x80/0xb0
[54109.644555] c0   1162  __mutex_lock+0x3a8/0x7f0
[54109.644563] c0   1162  __mutex_lock_slowpath+0x14/0x20
[54109.644575] c0   1162  thermal_zone_get_temp+0x84/0x360
[54109.644586] c0   1162  temp_show+0x30/0x78
[54109.644609] c0   1162  dev_attr_show+0x5c/0xf0
[54109.644628] c0   1162  sysfs_kf_seq_show+0xcc/0x1a4
[54109.644636] c0   1162  kernfs_seq_show+0x48/0x88
[54109.644656] c0   1162  seq_read+0x1f4/0x73c
[54109.644664] c0   1162  kernfs_fop_read+0x84/0x318
[54109.644683] c0   1162  __vfs_read+0x50/0x1bc
[54109.644692] c0   1162  vfs_read+0xa4/0x140
[54109.644701] c0   1162  SyS_read+0xbc/0x144
[54109.644708] c0   1162  el0_svc_naked+0x34/0x38
[54109.845800] c0   1162 D 720.000s 1-&gt;7766-&gt;7766 cat [panic]

Fixes: 1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone device")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Wang &lt;wvw@google.com&gt;
Signed-off-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>thermal: rcar_thermal: Prevent hardware access during system suspend</title>
<updated>2019-12-01T08:13:47+00:00</updated>
<author>
<name>Geert Uytterhoeven</name>
<email>geert+renesas@glider.be</email>
</author>
<published>2018-10-12T07:20:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=986a4216501036e6855e8170c4d12971565624c0'/>
<id>urn:sha1:986a4216501036e6855e8170c4d12971565624c0</id>
<content type='text'>
[ Upstream commit 3a31386217628ffe2491695be2db933c25dde785 ]

On r8a7791/koelsch, sometimes the following message is printed during
system suspend:

    rcar_thermal e61f0000.thermal: thermal sensor was broken

This happens if the workqueue runs while the device is already
suspended.  Fix this by using the freezable system workqueue instead,
cfr. commit 51e20d0e3a60cf46 ("thermal: Prevent polling from happening
during system suspend").

Fixes: e0a5172e9eec7f0d ("thermal: rcar: add interrupt support")
Signed-off-by: Geert Uytterhoeven &lt;geert+renesas@glider.be&gt;
Reviewed-by: Niklas Söderlund &lt;niklas.soderlund+renesas@ragnatech.se&gt;
Signed-off-by: Eduardo Valentin &lt;edubezval@gmail.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>thermal: Fix use-after-free when unregistering thermal zone device</title>
<updated>2019-10-11T16:18:41+00:00</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@mellanox.com</email>
</author>
<published>2019-07-10T10:14:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=414755a4bf32adef628cc1f5d569f05c968de138'/>
<id>urn:sha1:414755a4bf32adef628cc1f5d569f05c968de138</id>
<content type='text'>
[ Upstream commit 1851799e1d2978f68eea5d9dff322e121dcf59c1 ]

thermal_zone_device_unregister() cancels the delayed work that polls the
thermal zone, but it does not wait for it to finish. This is racy with
respect to the freeing of the thermal zone device, which can result in a
use-after-free [1].

Fix this by waiting for the delayed work to finish before freeing the
thermal zone device. Note that thermal_zone_device_set_polling() is
never invoked from an atomic context, so it is safe to call
cancel_delayed_work_sync() that can block.

[1]
[  +0.002221] ==================================================================
[  +0.000064] BUG: KASAN: use-after-free in __mutex_lock+0x1076/0x11c0
[  +0.000016] Read of size 8 at addr ffff8881e48e0450 by task kworker/1:0/17

[  +0.000023] CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 5.2.0-rc6-custom-02495-g8e73ca3be4af #1701
[  +0.000010] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  +0.000016] Workqueue: events_freezable_power_ thermal_zone_device_check
[  +0.000012] Call Trace:
[  +0.000021]  dump_stack+0xa9/0x10e
[  +0.000020]  print_address_description.cold.2+0x9/0x25e
[  +0.000018]  __kasan_report.cold.3+0x78/0x9d
[  +0.000016]  kasan_report+0xe/0x20
[  +0.000016]  __mutex_lock+0x1076/0x11c0
[  +0.000014]  step_wise_throttle+0x72/0x150
[  +0.000018]  handle_thermal_trip+0x167/0x760
[  +0.000019]  thermal_zone_device_update+0x19e/0x5f0
[  +0.000019]  process_one_work+0x969/0x16f0
[  +0.000017]  worker_thread+0x91/0xc40
[  +0.000014]  kthread+0x33d/0x400
[  +0.000015]  ret_from_fork+0x3a/0x50

[  +0.000020] Allocated by task 1:
[  +0.000015]  save_stack+0x19/0x80
[  +0.000015]  __kasan_kmalloc.constprop.4+0xc1/0xd0
[  +0.000014]  kmem_cache_alloc_trace+0x152/0x320
[  +0.000015]  thermal_zone_device_register+0x1b4/0x13a0
[  +0.000015]  mlxsw_thermal_init+0xc92/0x23d0
[  +0.000014]  __mlxsw_core_bus_device_register+0x659/0x11b0
[  +0.000013]  mlxsw_core_bus_device_register+0x3d/0x90
[  +0.000013]  mlxsw_pci_probe+0x355/0x4b0
[  +0.000014]  local_pci_probe+0xc3/0x150
[  +0.000013]  pci_device_probe+0x280/0x410
[  +0.000013]  really_probe+0x26a/0xbb0
[  +0.000013]  driver_probe_device+0x208/0x2e0
[  +0.000013]  device_driver_attach+0xfe/0x140
[  +0.000013]  __driver_attach+0x110/0x310
[  +0.000013]  bus_for_each_dev+0x14b/0x1d0
[  +0.000013]  driver_register+0x1c0/0x400
[  +0.000015]  mlxsw_sp_module_init+0x5d/0xd3
[  +0.000014]  do_one_initcall+0x239/0x4dd
[  +0.000013]  kernel_init_freeable+0x42b/0x4e8
[  +0.000012]  kernel_init+0x11/0x18b
[  +0.000013]  ret_from_fork+0x3a/0x50

[  +0.000015] Freed by task 581:
[  +0.000013]  save_stack+0x19/0x80
[  +0.000014]  __kasan_slab_free+0x125/0x170
[  +0.000013]  kfree+0xf3/0x310
[  +0.000013]  thermal_release+0xc7/0xf0
[  +0.000014]  device_release+0x77/0x200
[  +0.000014]  kobject_put+0x1a8/0x4c0
[  +0.000014]  device_unregister+0x38/0xc0
[  +0.000014]  thermal_zone_device_unregister+0x54e/0x6a0
[  +0.000014]  mlxsw_thermal_fini+0x184/0x35a
[  +0.000014]  mlxsw_core_bus_device_unregister+0x10a/0x640
[  +0.000013]  mlxsw_devlink_core_bus_device_reload+0x92/0x210
[  +0.000015]  devlink_nl_cmd_reload+0x113/0x1f0
[  +0.000014]  genl_family_rcv_msg+0x700/0xee0
[  +0.000013]  genl_rcv_msg+0xca/0x170
[  +0.000013]  netlink_rcv_skb+0x137/0x3a0
[  +0.000012]  genl_rcv+0x29/0x40
[  +0.000013]  netlink_unicast+0x49b/0x660
[  +0.000013]  netlink_sendmsg+0x755/0xc90
[  +0.000013]  __sys_sendto+0x3de/0x430
[  +0.000013]  __x64_sys_sendto+0xe2/0x1b0
[  +0.000013]  do_syscall_64+0xa4/0x4d0
[  +0.000013]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  +0.000017] The buggy address belongs to the object at ffff8881e48e0008
               which belongs to the cache kmalloc-2k of size 2048
[  +0.000012] The buggy address is located 1096 bytes inside of
               2048-byte region [ffff8881e48e0008, ffff8881e48e0808)
[  +0.000007] The buggy address belongs to the page:
[  +0.000012] page:ffffea0007923800 refcount:1 mapcount:0 mapping:ffff88823680d0c0 index:0x0 compound_mapcount: 0
[  +0.000020] flags: 0x200000000010200(slab|head)
[  +0.000019] raw: 0200000000010200 ffffea0007682008 ffffea00076ab808 ffff88823680d0c0
[  +0.000016] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
[  +0.000007] page dumped because: kasan: bad access detected

[  +0.000012] Memory state around the buggy address:
[  +0.000012]  ffff8881e48e0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000012]  ffff8881e48e0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000012] &gt;ffff8881e48e0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000008]                                                  ^
[  +0.000012]  ffff8881e48e0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000012]  ffff8881e48e0500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  +0.000007] ==================================================================

Fixes: b1569e99c795 ("ACPI: move thermal trip handling to generic thermal layer")
Reported-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
