Age | Commit message (Collapse) | Author | Files | Lines |
|
Add some helper functions to make it easier introducing the support
for thermal reboot.
No functional change.
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20231129124330.519423-2-festevam@gmail.com
|
|
PTR_ERR() returns -ENODEV when thermal-zones are undefined, and we need
-ENODEV as the right value for comparison.
Otherwise, tz->type is NULL when thermal-zones is undefined, resulting
in the following error:
[ 12.290030] CPU 1 Unable to handle kernel paging request at virtual address fffffffffffffff1, era == 900000000355f410, ra == 90000000031579b8
[ 12.302877] Oops[#1]:
[ 12.305190] CPU: 1 PID: 181 Comm: systemd-udevd Not tainted 6.6.0-rc7+ #5385
[ 12.312304] pc 900000000355f410 ra 90000000031579b8 tp 90000001069e8000 sp 90000001069eba10
[ 12.320739] a0 0000000000000000 a1 fffffffffffffff1 a2 0000000000000014 a3 0000000000000001
[ 12.329173] a4 90000001069eb990 a5 0000000000000001 a6 0000000000001001 a7 900000010003431c
[ 12.337606] t0 fffffffffffffff1 t1 54567fd5da9b4fd4 t2 900000010614ec40 t3 00000000000dc901
[ 12.346041] t4 0000000000000000 t5 0000000000000004 t6 900000010614ee20 t7 900000000d00b790
[ 12.354472] t8 00000000000dc901 u0 54567fd5da9b4fd4 s9 900000000402ae10 s0 900000010614ec40
[ 12.362916] s1 90000000039fced0 s2 ffffffffffffffed s3 ffffffffffffffed s4 9000000003acc000
[ 12.362931] s5 0000000000000004 s6 fffffffffffff000 s7 0000000000000490 s8 90000001028b2ec8
[ 12.362938] ra: 90000000031579b8 thermal_add_hwmon_sysfs+0x258/0x300
[ 12.386411] ERA: 900000000355f410 strscpy+0xf0/0x160
[ 12.391626] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
[ 12.397898] PRMD: 00000004 (PPLV0 +PIE -PWE)
[ 12.403678] EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
[ 12.409859] ECFG: 00071c1c (LIE=2-4,10-12 VS=7)
[ 12.415882] ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
[ 12.415907] BADV: fffffffffffffff1
[ 12.415911] PRID: 0014a000 (Loongson-64bit, Loongson-2K1000)
[ 12.415917] Modules linked in: loongson2_thermal(+) vfat fat uio_pdrv_genirq uio fuse zram zsmalloc
[ 12.415950] Process systemd-udevd (pid: 181, threadinfo=00000000358b9718, task=00000000ace72fe3)
[ 12.415961] Stack : 0000000000000dc0 54567fd5da9b4fd4 900000000402ae10 9000000002df9358
[ 12.415982] ffffffffffffffed 0000000000000004 9000000107a10aa8 90000001002a3410
[ 12.415999] ffffffffffffffed ffffffffffffffed 9000000107a11268 9000000003157ab0
[ 12.416016] 9000000107a10aa8 ffffff80020fc0c8 90000001002a3410 ffffffffffffffed
[ 12.416032] 0000000000000024 ffffff80020cc1e8 900000000402b2a0 9000000003acc000
[ 12.416048] 90000001002a3410 0000000000000000 ffffff80020f4030 90000001002a3410
[ 12.416065] 0000000000000000 9000000002df6808 90000001002a3410 0000000000000000
[ 12.416081] ffffff80020f4030 0000000000000000 90000001002a3410 9000000002df2ba8
[ 12.416097] 00000000000000b4 90000001002a34f4 90000001002a3410 0000000000000002
[ 12.416114] ffffff80020f4030 fffffffffffffff0 90000001002a3410 9000000002df2f30
[ 12.416131] ...
[ 12.416138] Call Trace:
[ 12.416142] [<900000000355f410>] strscpy+0xf0/0x160
[ 12.416167] [<90000000031579b8>] thermal_add_hwmon_sysfs+0x258/0x300
[ 12.416183] [<9000000003157ab0>] devm_thermal_add_hwmon_sysfs+0x50/0xe0
[ 12.416200] [<ffffff80020cc1e8>] loongson2_thermal_probe+0x128/0x200 [loongson2_thermal]
[ 12.416232] [<9000000002df6808>] platform_probe+0x68/0x140
[ 12.416249] [<9000000002df2ba8>] really_probe+0xc8/0x3c0
[ 12.416269] [<9000000002df2f30>] __driver_probe_device+0x90/0x180
[ 12.416286] [<9000000002df3058>] driver_probe_device+0x38/0x160
[ 12.416302] [<9000000002df33a8>] __driver_attach+0xa8/0x200
[ 12.416314] [<9000000002deffec>] bus_for_each_dev+0x8c/0x120
[ 12.416330] [<9000000002df198c>] bus_add_driver+0x10c/0x2a0
[ 12.416346] [<9000000002df46b4>] driver_register+0x74/0x160
[ 12.416358] [<90000000022201a4>] do_one_initcall+0x84/0x220
[ 12.416372] [<90000000022f3ab8>] do_init_module+0x58/0x2c0
[ 12.416386] [<90000000022f6538>] init_module_from_file+0x98/0x100
[ 12.416399] [<90000000022f67f0>] sys_finit_module+0x230/0x3c0
[ 12.416412] [<900000000358f7c8>] do_syscall+0x88/0xc0
[ 12.416431] [<900000000222137c>] handle_syscall+0xbc/0x158
Fixes: e7e3a7c35791 ("thermal/drivers/loongson-2: Add thermal management support")
Cc: Yinbo Zhu <zhuyinbo@loongson.cn>
Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/343c14de98216636a47b43e8bfd47b70d0a8e068.1700817227.git.zhoubinbin@loongson.cn
|
|
Correct one misuse of kernel-doc notation and one spelling error as
reported by codespell.
cpuidle_cooling.c:152: warning: cannot understand function prototype: 'struct thermal_cooling_device_ops cpuidle_cooling_ops = '
For the kernel-doc warning, don't use "/**" for a comment on data.
kernel-doc can be used for structure declarations but not definitions.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
If device_register() in thermal_zone_device_register_with_trips()
returns an error, the tz variable is set to NULL and subsequently
dereferenced in kfree(tz->tzp).
Commit adc8749b150c ("thermal/drivers/core: Use put_device() if
device_register() fails") added the tz = NULL assignment in question to
avoid a possible double-free after dropping the reference to the zone
device. However, after commit 4649620d9404 ("thermal: core: Make
thermal_zone_device_unregister() return after freeing the zone"), that
assignment has become redundant, because dropping the reference to the
zone device does not cause the zone object to be freed any more.
Drop it to address the NULL pointer dereference.
Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
|
|
Initially the check against the get_temp ops in the
thermal_zone_device_update() was put in there in order to catch
drivers not providing this method.
Instead of checking again and again the function if the ops exists in
the update function, let's do the check at registration time, so it is
checked one time and for all.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The _store callbacks of the trip point temperature and hysteresis sysfs
attributes invoke thermal_notify_tz_trip_change() to send a notification
regarding the trip point change, but when trip points are updated by the
platform firmware, trip point change notifications are not sent.
To make the behavior after a trip point change more consistent,
modify all of the 3 places where trip point temperature is updated
to use a new function called thermal_zone_set_trip_temp() for this
purpose and make that function call thermal_notify_tz_trip_change().
Note that trip point hysteresis can only be updated via sysfs and
trip_point_hyst_store() calls thermal_notify_tz_trip_change() already,
so this code path need not be changed.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Make thermal_genl_cmd_tz_get_trip() use for_each_trip() instead of an open-
coded loop over trip indices.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
|
|
Make __thermal_zone_get_temp() use for_each_trip() instead of an open-
coded loop over trip indices.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
|
|
Make __thermal_zone_set_trips() use for_each_trip() instead of an open-
coded loop over trip indices.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
|
|
The __thermal_zone_get_trip() header in drivers/thermal/thermal_core.h
is redundant, because there is one already in thermal.h, so drop it.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
In order to avoid running __thermal_zone_device_update() for thermal
zones going away, the thermal zone lock is held around device_del()
in thermal_zone_device_unregister() and thermal_zone_device_update()
passes the given thermal zone device to device_is_registered().
This allows thermal_zone_device_update() to skip the
__thermal_zone_device_update() if device_del() has already run for
the thermal zone at hand.
However, instead of looking at driver core internals, the thermal
subsystem may as well rely on its own data structures for this
purpose. Namely, if the thermal zone is not present in
thermal_tz_list, it can be regarded as unavailable, which in fact is
already the case in thermal_zone_device_unregister(). Accordingly,
the device_is_registered() check in thermal_zone_device_update() can
be replaced with checking whether or not the node list_head in struct
thermal_zone_device is empty, in which case it is not there in
thermal_tz_list.
To make this work, though, it is necessary to initialize tz->node
in thermal_zone_device_register_with_trips() before registering the
thermal zone device and it needs to be added to thermal_tz_list and
deleted from it under its zone lock.
After the above modifications, the zone lock does not need to be
held around device_del() in thermal_zone_device_unregister() any more.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Multiple places in the thermal subsystem (most importantly, sysfs
attribute callback functions) check if the given thermal zone device is
still registered in order to return early in case the device_del() in
thermal_zone_device_unregister() has run already.
However, after thermal_zone_device_unregister() has been made wait for
all of the zone-related activity to complete before returning, it is
not necessary to do that any more, because all of the code holding a
reference to the thermal zone device object will be waited for even if
it does not do anything special to enforce this.
Accordingly, drop all of the device_is_registered() checks that are now
redundant and get rid of the zone locking that is not necessary any more
after dropping them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
the zone
Make thermal_zone_device_unregister() wait until all of the references
to the given thermal zone object have been dropped and free it before
returning.
This guarantees that when thermal_zone_device_unregister() returns,
there is no leftover activity regarding the thermal zone in question
which is required by some of its callers (for instance, modular driver
code that wants to know when it is safe to let the module go away).
Subsequently, this will allow some confusing device_is_registered()
checks to be dropped from the thermal sysfs and core code.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Rework the _show() callback functions for the trip point temperature,
hysteresis and type attributes to avoid copying the values of struct
thermal_trip fields that they do not use and make them carry out the
same validation checks as the corresponding _store() callback functions.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Both trip_point_temp_store() and trip_point_hyst_store() use
thermal_zone_set_trip() to update a given trip point, but none of them
actually needs to change more than one field in struct thermal_trip
representing it. However, each of them effectively calls
__thermal_zone_get_trip() twice in a row for the same trip index value,
once directly and once via thermal_zone_set_trip(), which is not
particularly efficient, and the way in which thermal_zone_set_trip()
carries out the update is not particularly straightforward.
Moreover, input processing need not be done under the thermal zone lock
in any of these functions.
Rework trip_point_temp_store() and trip_point_hyst_store() to address
the above, move the part of thermal_zone_set_trip() that is still
useful to a new function called thermal_zone_trip_updated() and drop
the rest of it.
While at it, make trip_point_hyst_store() reject negative hysteresis
values.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
After recent changes in the thermal framework, a trip points array is
required for registering a thermal zone that is not tripless, so the
tz->trips pointer in thermal_zone_set_trip() is never NULL and the
check involving it is redundant. Drop that check.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Rearrange the initialization of local variables in allocate_power() so
as to improve code clarity and the visibility of the initial values.
This change is not expected to alter the general functionality.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Local variable 'ret' in allocate_power() is only used in the return
statement, so drop it.
Local variable 'trip_max' in allocate_power() is only used for caching
the params->trip_max value which may as well be accessed directly as
needed, so drop it either.
This change is not expected to alter the general functionality.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The 'cdev' pointer in allow_maximum_power() is valid, so there is no
need to use 'instance->cdev' instead of it.
This change is not expected to alter the general functionality.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Rearrange the order of local variable definitions in multiple functions
so as to follow the kernel coding style in that respect.
Also, move local variable definitions located in nested code blocks to
the beginning of each function to improve the visibility of all local
variables in use.
This change is not expected to alter the general functionality.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The throttling logic only cares about the last passive trip point and
the cooling devices attached to it.
Therefore, there is no need to bail out if other trip points have
cooling devices which are not a supported by the IPA.
Check the cooling devices only for 'trip_max' during the binding.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Set up the trip points at the beginning of the binding function.
This simplifies the code a bit and allows for further cleanups.
Also add a check to fail the binding if the last passive trip point is
not found.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Refactor the code and rename the last passive trip point field.
There is a comment describing the field properly. Use shorter field name
so as to allow to clarify the code.
This change is not expected to alter the general functionality.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The trip crossing detection in handle_thermal_trip() does not work
correctly in the cases when a trip point is crossed on the way up and
then the zone temperature stays above its low temperature (that is, its
temperature decreased by its hysteresis). The trip temperature may
be passed by the zone temperature subsequently in that case, even
multiple times, but that does not count as the trip crossing as long as
the zone temperature does not fall below the trip's low temperature or,
in other words, until the trip is crossed on the way down.
|-----------low--------high------------|
|<--------->|
| hyst |
| |
| -|--> crossed on the way up
|
<---|-- crossed on the way down
However, handle_thermal_trip() will invoke thermal_notify_tz_trip_up()
every time the trip temperature is passed by the zone temperature on
the way up regardless of whether or not the trip has been crossed on
the way down yet. Moreover, it will not call thermal_notify_tz_trip_down()
if the last zone temperature was between the trip's temperature and its
low temperature, so some "trip crossed on the way down" events may not
be reported.
To address this issue, introduce trip thresholds equal to either the
temperature of the given trip, or its low temperature, such that if
the trip's threshold is passed by the zone temperature on the way up,
its value will be set to the trip's low temperature and
thermal_notify_tz_trip_up() will be called, and if the trip's threshold
is passed by the zone temperature on the way down, its value will be set
to the trip's temperature (high) and thermal_notify_tz_trip_down() will
be called. Accordingly, if the threshold is passed on the way up, it
cannot be passed on the way up again until its passed on the way down
and if it is passed on the way down, it cannot be passed on the way down
again until it is passed on the way up which guarantees correct
triggering of trip crossing notifications.
If the last temperature of the zone is invalid, the trip's threshold
will be set depending of the zone's current temperature: If that
temperature is above the trip's temperature, its threshold will be
set to its low temperature or otherwise its threshold will be set to
its (high) temperature. Because the zone temperature is initially
set to invalid and tz->last_temperature is only updated by
update_temperature(), this is sufficient to set the correct initial
threshold values for all trips.
Link: https://lore.kernel.org/all/20220718145038.1114379-4-daniel.lezcano@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:
- Flush the translation service tables to prevent unpredictable
behavior on non-coherent GIC devices
* tag 'irq_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Flush ITS tables correctly in non-coherent GIC designs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Seven small fixes, six in drivers and one in sd.
The sd fix is so large because it changes a struct pointer to a struct
but otherwise is fairly simple"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: qcom-ufs: dt-bindings: Document the SM8650 UFS Controller
scsi: sd: Fix sshdr use in sd_suspend_common()
scsi: scsi_debug: Delete some bogus error checking
scsi: scsi_debug: Fix some bugs in sdebug_error_write()
scsi: ufs: core: Fix racing issue between ufshcd_mcq_abort() and ISR
scsi: ufs: core: Expand MCQ queue slot to DeviceQueueDepth + 1
scsi: qla2xxx: Fix system crash due to bad pointer access
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"On parisc we still sometimes need writeable stacks, e.g. if programs
aren't compiled with gcc-14. To avoid issues with the upcoming
systemd-254 we therefore have to disable prctl(PR_SET_MDWE) for now
(for parisc only).
The other two patches are minor: a bugfix for the soft power-off on
qemu with 64-bit kernel and prefer strscpy() over strlcpy():
- Fix power soft-off on qemu
- Disable prctl(PR_SET_MDWE) since parisc sometimes still needs
writeable stacks
- Use strscpy instead of strlcpy in show_cpuinfo()"
* tag 'parisc-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
prctl: Disable prctl(PR_SET_MDWE) on parisc
parisc/power: Fix power soft-off when running on qemu
parisc: Replace strlcpy() with strscpy()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Various fixes for the DM delay target to address regressions
introduced during the 6.7 merge window
- Fixes to both DM bufio and the verity target for no-sleep mode,
to address sleeping while atomic issues
- Update DM crypt target in response to the treewide change that
made MAX_ORDER inclusive
* tag 'for-6.7/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-crypt: start allocating with MAX_ORDER
dm-verity: don't use blocking calls from tasklets
dm-bufio: fix no-sleep mode
dm-delay: avoid duplicate logic
dm-delay: fix bugs introduced by kthread mode
dm-delay: fix a race between delay_presuspend and delay_bio
|
|
Firmware returns the physical address of the power switch,
so need to use gsc_writel() instead of direct memory access.
Fixes: d0c219472980 ("parisc/power: Add power soft-off when running on qemu")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v6.0+
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Revert a not-working conversion to generic recovery for PXA,
use proper IO accessors for designware, and use proper PM level
for ocores to allow accessing interrupt providers late"
* tag 'i2c-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: ocores: Move system PM hooks to the NOIRQ phase
i2c: designware: Fix corrupted memory seen in the ISR
Revert "i2c: pxa: move to generic GPIO recovery"
|
|
Pull drm fixes from Daniel Vetter:
"This is a 'blast from the bast' fixes pull, because it contains a
bunch of AGP fixes for amdgpu. Otherwise nothing out of the ordinary.
Next week is back to Dave unless he's knocked out by some conference
bug.
- amdgpu: fixes all over, including a set of AGP fixes
- nouvea: GSP + other bugfixes
- ivpu build fix
- lenovo legion go panel orientation quirk"
* tag 'drm-fixes-2023-11-17' of git://anongit.freedesktop.org/drm/drm: (30 commits)
drm/amdgpu/gmc9: disable AGP aperture
drm/amdgpu/gmc10: disable AGP aperture
drm/amdgpu/gmc11: disable AGP aperture
drm/amdgpu: add a module parameter to control the AGP aperture
drm/amdgpu/gmc11: fix logic typo in AGP check
drm/amd/display: Fix encoder disable logic
drm/amd/display: Change the DMCUB mailbox memory location from FB to inbox
drm/amdgpu: add and populate the port num into xgmi topology info
drm/amd/display: Negate IPS allow and commit bits
drm/amd/pm: Don't send unload message for reset
drm/amdgpu: fix ras err_data null pointer issue in amdgpu_ras.c
drm/amd/display: Clear dpcd_sink_ext_caps if not set
drm/amd/display: Enable fast plane updates on DCN3.2 and above
drm/amd/display: fix NULL dereference
drm/amd/display: fix a NULL pointer dereference in amdgpu_dm_i2c_xfer()
drm/amd/display: Add null checks for 8K60 lightup
drm/amd/pm: Fill pcie error counters for gpu v1_4
drm/amd/pm: Update metric table for smu v13_0_6
drm/amdgpu: correct chunk_ptr to a pointer to chunk.
drm/amd/display: Fix DSC not Enabled on Direct MST Sink
...
|
|
Commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely")
changed the meaning of MAX_ORDER from exclusive to inclusive. So, we
can allocate compound pages with up to 1 << MAX_ORDER pages.
Reflect this change in dm-crypt and start trying to allocate compound
pages with MAX_ORDER.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
The commit 5721d4e5a9cd enhanced dm-verity, so that it can verify blocks
from tasklets rather than from workqueues. This reportedly improves
performance significantly.
However, dm-verity was using the flag CRYPTO_TFM_REQ_MAY_SLEEP from
tasklets which resulted in warnings about sleeping function being called
from non-sleeping context.
BUG: sleeping function called from invalid context at crypto/internal.h:206
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 14, name: ksoftirqd/0
preempt_count: 100, expected: 0
RCU nest depth: 0, expected: 0
CPU: 0 PID: 14 Comm: ksoftirqd/0 Tainted: G W 6.7.0-rc1 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x32/0x50
__might_resched+0x110/0x160
crypto_hash_walk_done+0x54/0xb0
shash_ahash_update+0x51/0x60
verity_hash_update.isra.0+0x4a/0x130 [dm_verity]
verity_verify_io+0x165/0x550 [dm_verity]
? free_unref_page+0xdf/0x170
? psi_group_change+0x113/0x390
verity_tasklet+0xd/0x70 [dm_verity]
tasklet_action_common.isra.0+0xb3/0xc0
__do_softirq+0xaf/0x1ec
? smpboot_thread_fn+0x1d/0x200
? sort_range+0x20/0x20
run_ksoftirqd+0x15/0x30
smpboot_thread_fn+0xed/0x200
kthread+0xdc/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x28/0x40
? kthread_complete_and_exit+0x20/0x20
ret_from_fork_asm+0x11/0x20
</TASK>
This commit fixes dm-verity so that it doesn't use the flags
CRYPTO_TFM_REQ_MAY_SLEEP and CRYPTO_TFM_REQ_MAY_BACKLOG from tasklets. The
crypto API would do GFP_ATOMIC allocation instead, it could return -ENOMEM
and we catch -ENOMEM in verity_tasklet and requeue the request to the
workqueue.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v6.0+
Fixes: 5721d4e5a9cd ("dm verity: Add optional "try_verify_in_tasklet" feature")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
dm-bufio has a no-sleep mode. When activated (with the
DM_BUFIO_CLIENT_NO_SLEEP flag), the bufio client is read-only and we
could call dm_bufio_get from tasklets. This is used by dm-verity.
Unfortunately, commit 450e8dee51aa ("dm bufio: improve concurrent IO
performance") broke this and the kernel would warn that cache_get()
was calling down_read() from no-sleeping context. The bug can be
reproduced by using "veritysetup open" with the "--use-tasklets"
flag.
This commit fixes dm-bufio, so that the tasklet mode works again, by
expanding use of the 'no_sleep_enabled' static_key to conditionally
use either a rw_semaphore or rwlock_t (which are colocated in the
buffer_tree structure using a union).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v6.4
Fixes: 450e8dee51aa ("dm bufio: improve concurrent IO performance")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
This is small refactoring of dm-delay - we avoid duplicate logic in
flush_delayed_bios and flush_delayed_bios_fast and join these two
functions into one.
We also add cond_resched() to flush_delayed_bios because the list may have
unbounded number of entries.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
This commit fixes the following bugs introduced by commit 70bbeb29fab0
("dm delay: for short delays, use kthread instead of timers and wq"):
* the function flush_worker_fn has no exit path - on unload, this
function will just loop and consume 100% CPU without any progress
* the wake-up mechanism in flush_worker_fn is racy - a wake up will be
missed if the process adds entries to the delayed_bios list just
before set_current_state(TASK_INTERRUPTIBLE)
* flush_delayed_bios_fast submits a bio while holding a global mutex;
this may deadlock if we have multiple stacked dm-delay devices and
the underlying device attempts to acquire the mutex too
* if the target constructor fails, it will call delay_dtr. delay_dtr
would attempt to free dc->timer_lock without it being initialized by
the constructor.
* if the target constructor's kthread allocation fails, delay_dtr
would crash trying to dereference dc->worker because it is non-NULL
due to ERR_PTR.
Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
In delay_presuspend, we set the atomic variable may_delay and then stop
the timer and flush pending bios. The intention here is to prevent the
delay target from re-arming the timer again.
However, this test is racy. Suppose that one thread goes to delay_bio,
sees that dc->may_delay is one and proceeds; now, another thread executes
delay_presuspend, it sets dc->may_delay to zero, deletes the timer and
flushes pending bios. Then, the first thread continues and adds the bio to
delayed->list despite the fact that dc->may_delay is false.
Fix this bug by changing may_delay's type from atomic_t to bool and
only access it while holding the delayed_bios_lock mutex. Note that we
don't have to grab the mutex in delay_resume because there are no bios
in flight at this point.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.7-2023-11-17:
amdgpu:
- DMCUB fixes
- SR-IOV fix
- GMC9 fix
- Documentation fix
- DSC MST fix
- CS chunk parsing fix
- SMU13.0.6 fixes
- 8K tiled display fix
- Fix potential NULL pointer dereferences
- Cursor lag fix
- Backlight fix
- DCN s0ix fix
- XGMI fix
- DCN encoder disable logic fix
- AGP aperture fixes
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231117063441.4883-1-alexander.deucher@amd.com
|
|
We've had misc reports of random IOMMU page faults when
this is used. It's just a rarely used optimization anyway, so
let's just disable it. It can still be toggled via the
module parameter for testing.
v2: leave it configurable via module parameter
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1)
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We've had misc reports of random IOMMU page faults when
this is used. It's just a rarely used optimization anyway, so
let's just disable it. It can still be toggled via the
module parameter for testing.
v2: leave it configurable via module parameter
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1)
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We've had misc reports of random IOMMU page faults when
this is used. It's just a rarely used optimization anyway, so
let's just disable it. It can still be toggled via the
module parameter for testing.
v2: leave it configurable via module parameter
Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11")
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1)
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add a module parameter to control the AGP aperture. The AGP
aperture is an aperture in the GPU's internal address space
which provides direct non-paged access to the platform address
space. This access is non-snooped so only uncached memory
can be accessed.
Add a knob so that we can toggle this for debugging.
Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11")
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Should be && rather than ||.
Fixes: b2e1cbe6281f ("drm/amdgpu/gmc11: disable AGP on GC 11.5")
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY]
DENTIST hangs when OTG is off and encoder is on. We were not
disabling the encoder properly when switching from extended mode to
external monitor only.
[HOW]
Disable the encoder using an existing enable/disable fifo helper instead
of enc35_stream_encoder_enable.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Nicholas Susanto <nicholas.susanto@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY]
Flush command sent to DMCUB spends more time for execution on
a dGPU than on an APU. This causes cursor lag when using high
refresh rate mouses.
[HOW]
1. Change the DMCUB mailbox memory location from FB to inbox.
2. Only change windows memory to inbox.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Lewis Huang <lewis.huang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The port num info is firstly introduced with 20.00.01.13 xgmi ta and
make them as part of topology info.
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY]
On s0i3, IPS mask isn't saved and restored.
It is reset to zero on exit.
If it is cleared unexpectedly, driver will
proceed operations while DCN is in IPS2 and
cause a hang.
[HOW]
Negate the bit logic. Default value of
zero indicates it is still in IPS2. Driver
must poll for the bit to assert.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Duncan Ma <duncan.ma@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
No need to notify about unload during reset. Also remove the FW version
check.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
fix ras err_data null pointer issue in amdgpu_ras.c
Fixes: 8cc0f5669eb6 ("drm/amdgpu: Support multiple error query modes")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY]
Some eDP panels' ext caps don't set initial values
and the value of dpcd_addr (0x317) is random.
It means that sometimes the eDP can be OLED, miniLED and etc,
and cause incorrect backlight control interface.
[HOW]
Add remove_sink_ext_caps to remove sink ext caps (HDR, OLED and etc)
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Anthony Koo <anthony.koo@amd.com>
Acked-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Paul Hsieh <paul.hsieh@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|