kernel/linux.git/drivers/accel/amdxdna, branch v6.19.12

accel/amdxdna: Fix runtime suspend deadlock when there is pending job

2026-03-19T15:14:56+00:00

[ Upstream commit 6b13cb8f48a42ddf6dd98865b673a82e37ff238b ]

The runtime suspend callback drains the running job workqueue before suspending the device. If a job is still executing and calls pm_runtime_resume_and_get(), it can deadlock with the runtime suspend path.

Fix this by moving pm_runtime_resume_and_get() from the job execution routine to the job submission routine, ensuring the device is resumed before the job is queued and avoiding the deadlock during runtime suspend.

Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management")
Reviewed-by: Mario Limonciello (AMD)
Signed-off-by: Lizhi Hou
Link: https://patch.msgid.link/20260310180058.336348-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin
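
The ordering described above can be sketched in userspace C. This is an illustration only, not the driver code: fake_pm_get(), fake_pm_put(), struct job and the point at which the reference is dropped are all assumptions, with a plain counter standing in for the runtime PM usage count.

```c
/* Stand-in for the runtime PM usage count (hypothetical). */
int pm_usage;

/* Stand-in for pm_runtime_resume_and_get(); always succeeds here. */
int fake_pm_get(void)
{
	pm_usage++;
	return 0;
}

/* Stand-in for dropping the runtime PM reference. */
void fake_pm_put(void)
{
	pm_usage--;
}

struct job {
	int queued;
	int done;
};

/* Submission: resume the device before the job is queued. */
int submit_job(struct job *job)
{
	int ret = fake_pm_get();

	if (ret)
		return ret;
	job->queued = 1;
	return 0;
}

/*
 * Execution: no PM call here. The suspend callback may be draining the
 * workqueue this runs on, so waiting for a resume could never finish.
 */
void run_job(struct job *job)
{
	job->done = 1;
}

/* Completion: drop the reference taken at submission (assumed placement). */
void complete_job(struct job *job)
{
	job->queued = 0;
	fake_pm_put();
}
```

The reference is held from submission to completion, so the device cannot runtime suspend underneath a queued job and the execution routine never has to wait on power management.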

accel/amdxdna: Fix NULL pointer dereference of mgmt_chann

2026-03-12T11:09:49+00:00

[ Upstream commit 6270ee26e1edd862ea17e3eba148ca8fb2c99dc9 ]

mgmt_chann may be set to NULL if the firmware returns an unexpected error in aie2_send_mgmt_msg_wait(). This can later lead to a NULL pointer dereference in aie2_hw_stop().

Fix this by introducing a dedicated helper to destroy mgmt_chann and by adding proper NULL checks before accessing it.

Fixes: b87f920b9344 ("accel/amdxdna: Support hardware mailbox")
Reviewed-by: Mario Limonciello (AMD)
Signed-off-by: Lizhi Hou
Link: https://patch.msgid.link/20260226213857.3068474-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin
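
A minimal userspace sketch of the helper pattern follows. The type and function names (struct mailbox_channel, struct npu_device, destroy_mgmt_chann(), hw_stop()) are hypothetical stand-ins, not the driver's definitions; the point is only that teardown goes through one NULL-safe helper.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; the real driver types differ. */
struct mailbox_channel {
	int id;
};

struct npu_device {
	struct mailbox_channel *mgmt_chann;
};

/* Counts real teardowns, for illustration only. */
int channels_destroyed;

/*
 * Tear down the management channel at most once. Safe to call when the
 * channel was already cleared by an earlier error path.
 */
void destroy_mgmt_chann(struct npu_device *ndev)
{
	if (!ndev->mgmt_chann)
		return;
	free(ndev->mgmt_chann);
	ndev->mgmt_chann = NULL;
	channels_destroyed++;
}

/* Stop path: never dereferences mgmt_chann directly. */
void hw_stop(struct npu_device *ndev)
{
	destroy_mgmt_chann(ndev);
}
```

Calling hw_stop() after an error path has already destroyed the channel is then a no-op rather than a NULL dereference.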

accel/amdxdna: Fill invalid payload for failed command

2026-03-12T11:09:45+00:00

[ Upstream commit 89ff45359abbf9d8d3c4aa3f5a57ed0be82b5a12 ]

Newer userspace applications may read the payload of a failed command to obtain detailed error information. However, the driver and old firmware versions may not support returning advanced error information.

In this case, initialize the command payload with an invalid value so userspace can detect that no detailed error information is available.

Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD)
Signed-off-by: Lizhi Hou
Link: https://patch.msgid.link/20260227004841.3080241-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin
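
The idea can be sketched as a sentinel fill. The sentinel value PAYLOAD_INVALID, the 32-bit word layout and both function names are assumptions made for illustration; the commit message does not state which invalid value the driver writes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel; the value the driver actually uses is not given. */
#define PAYLOAD_INVALID 0xFFFFFFFFu

/* Driver side: mark the payload of a failed command as carrying no detail. */
void fill_invalid_payload(uint32_t *payload, size_t words)
{
	for (size_t i = 0; i < words; i++)
		payload[i] = PAYLOAD_INVALID;
}

/* Userspace side: detail is present only if the sentinel was overwritten. */
int payload_has_error_detail(const uint32_t *payload, size_t words)
{
	return words > 0 && payload[0] != PAYLOAD_INVALID;
}
```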

accel/amdxdna: Validate command buffer payload count

2026-03-12T11:09:15+00:00

[ Upstream commit 901ec3470994006bc8dd02399e16b675566c3416 ]

The count field in the command header is used to determine the valid payload size. Verify that the valid payload does not exceed the remaining buffer space.

Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD)
Signed-off-by: Lizhi Hou
Link: https://patch.msgid.link/20260219211946.1920485-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin
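
A bounds check of this kind might look like the sketch below. The header layout, the assumption that count is expressed in 32-bit words, and the name validate_payload() are all hypothetical; only the check itself reflects the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: count is assumed to be the payload length in words. */
struct cmd_header {
	uint32_t count;
};

/* Return 0 if the payload fits in the buffer, -EINVAL otherwise. */
int validate_payload(const struct cmd_header *hdr, size_t buf_size)
{
	size_t remaining;

	/* The buffer must at least hold the header itself. */
	if (buf_size < sizeof(*hdr))
		return -EINVAL;

	remaining = buf_size - sizeof(*hdr);

	/* Compare in words so the untrusted count is never multiplied. */
	if (hdr->count > remaining / sizeof(uint32_t))
		return -EINVAL;

	return 0;
}
```

Dividing the remaining space instead of multiplying count keeps the comparison free of integer overflow even for a hostile count value.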

accel/amdxdna: Prevent ubuf size overflow

2026-03-12T11:09:15+00:00

[ Upstream commit 03808abb1d868aed7478a11a82e5bb4b3f1ca6d6 ]

The ubuf size calculation may overflow, resulting in an undersized allocation and possible memory corruption.

Use check_add_overflow() helpers to validate the size calculation before allocation.

Fixes: bd72d4acda10 ("accel/amdxdna: Support user space allocated buffer")
Reviewed-by: Mario Limonciello (AMD)
Signed-off-by: Lizhi Hou
Link: https://patch.msgid.link/20260217192815.1784689-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin
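
The pattern can be tried outside the kernel with the GCC/Clang builtin that the kernel helper is built on. In this sketch the macro is a userspace stand-in for check_add_overflow(), and ubuf_total_size() with its array-of-lengths input is a hypothetical shape for the calculation, not the driver's function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's check_add_overflow(): stores a + b
 * in *d and returns true if the addition overflowed.
 */
#define check_add_overflow(a, b, d) __builtin_add_overflow((a), (b), (d))

/* Sum the entry lengths; return false if the total would wrap. */
bool ubuf_total_size(const uint64_t *lens, size_t n, uint64_t *total)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++) {
		if (check_add_overflow(sum, lens[i], &sum))
			return false;
	}

	*total = sum;
	return true;
}
```

A caller would refuse to allocate when the function returns false, instead of allocating a buffer sized from a wrapped total.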

accel/amdxdna: Fix out-of-bounds memset in command slot handling

2026-03-12T11:09:15+00:00

[ Upstream commit 1110a949675ebd56b3f0286e664ea543f745801c ]

The remaining space in a command slot may be smaller than the size of the command header. Clearing the command header with memset() before verifying the available slot space can result in an out-of-bounds write and memory corruption.

Fix this by moving the memset() call after the size validation.

Fixes: 3d32eb7a5ecf ("accel/amdxdna: Fix cu_idx being cleared by memset() during command setup")
Reviewed-by: Mario Limonciello (AMD)
Signed-off-by: Lizhi Hou
Link: https://patch.msgid.link/20260217185415.1781908-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin
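
The corrected ordering, validate then clear, can be sketched as follows. struct slot_header, its fields and fill_slot() are hypothetical; the real slot layout is defined by the firmware interface.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slot header; the real layout is firmware-defined. */
struct slot_header {
	uint32_t cu_idx;
	uint32_t payload_words;
};

/*
 * Fill a slot header. On entry *size is the space available in the slot;
 * on success it is updated to the space consumed.
 */
int fill_slot(void *slot, size_t *size, uint32_t cu_idx)
{
	struct slot_header *hdr = slot;

	/* Validate first: the slot may be smaller than the header. */
	if (*size < sizeof(*hdr))
		return -EINVAL;

	/* Only now is it safe to clear the header. */
	memset(hdr, 0, sizeof(*hdr));
	hdr->cu_idx = cu_idx;
	*size = sizeof(*hdr);
	return 0;
}
```

With the check ahead of memset(), an undersized slot is rejected without a single byte being written past its end.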

accel/amdxdna: Fix command hang on suspended hardware context

2026-03-12T11:09:14+00:00

[ Upstream commit 07efce5a6611af6714ea3ef65694e0c8dd7e44f5 ]

When a hardware context is suspended, the job scheduler is stopped. If a command is submitted while the context is suspended, the job is queued in the scheduler but aie2_sched_job_run() is never invoked to restart the hardware context. As a result, the command hangs.

Fix this by modifying the hardware context suspend routine to keep the job scheduler running so that queued jobs can trigger context restart properly.

Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD)
Signed-off-by: Lizhi Hou
Link: https://patch.msgid.link/20260211205341.722982-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin

accel/amdxdna: Fix suspend failure after enabling turbo mode

2026-03-12T11:09:14+00:00

[ Upstream commit fdb65acfe655f844ae1e88696b9656d3ef5bb8fb ]

Enabling turbo mode disables hardware clock gating. Suspend requires hardware clock gating to be re-enabled, otherwise suspend will fail.

Fix this by calling aie2_runtime_cfg() from aie2_hw_stop() to re-enable clock gating during suspend. Also ensure that firmware is initialized in aie2_hw_start() before modifying clock-gating settings during resume.

Fixes: f4d7b8a6bc8c ("accel/amdxdna: Enhance power management settings")
Reviewed-by: Mario Limonciello (AMD)
Signed-off-by: Lizhi Hou
Link: https://patch.msgid.link/20260211204716.722788-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin

accel/amdxdna: Fix dead lock for suspend and resume

2026-03-12T11:09:14+00:00

[ Upstream commit 1aa82181a3c285c7351523d587f7981ae4c015c8 ]

When an application issues a query IOCTL while auto suspend is running, a deadlock can occur. The query path holds dev_lock and then calls pm_runtime_resume_and_get(), which waits for the ongoing suspend to complete. Meanwhile, the suspend callback attempts to acquire dev_lock and blocks, resulting in a deadlock.

Fix this by releasing dev_lock before calling pm_runtime_resume_and_get() and reacquiring it after the call completes. Also acquire dev_lock in the resume callback to keep the locking consistent.

Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management")
Reviewed-by: Mario Limonciello (AMD)
Signed-off-by: Lizhi Hou
Link: https://patch.msgid.link/20260211204644.722758-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin
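
The unlock, resume, relock sequence can be sketched in userspace with a pthread mutex standing in for dev_lock. fake_resume_and_get() and query_locked() are hypothetical; the stub only records whether the lock was free while the resume ran, which is what a concurrent suspend callback would need.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the driver's dev_lock. */
pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set by the stub: was dev_lock available while the resume was running? */
bool lock_was_free_during_resume;

/*
 * Stub for pm_runtime_resume_and_get(). A real resume may wait for an
 * in-flight suspend callback, which itself needs dev_lock.
 */
int fake_resume_and_get(void)
{
	if (pthread_mutex_trylock(&dev_lock) == 0) {
		lock_was_free_during_resume = true;
		pthread_mutex_unlock(&dev_lock);
	} else {
		lock_was_free_during_resume = false;
	}
	return 0;
}

/* Query path: entered with dev_lock held, returns with it held. */
int query_locked(void)
{
	int ret;

	/* Drop the lock so a pending suspend callback can make progress. */
	pthread_mutex_unlock(&dev_lock);
	ret = fake_resume_and_get();
	pthread_mutex_lock(&dev_lock);
	if (ret)
		return ret;

	/* ... query the hardware under dev_lock ... */
	return 0;
}
```

Any state read under dev_lock before the unlock has to be treated as stale after the relock, since other paths may have run in between.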

accel/amdxdna: Reduce log noise during process termination

2026-03-12T11:09:14+00:00

[ Upstream commit 57aa3917a3b3bd805a3679371f97a1ceda3c5510 ]

During process termination, several error messages are logged that are not actual errors but expected conditions when a process is killed or interrupted. This creates unnecessary noise in the kernel log.

The specific scenarios are:

1. HMM invalidation returns -ERESTARTSYS when the wait is interrupted by a signal during process cleanup. This is expected when a process is being terminated and should not be logged as an error.

2. Context destruction returns -ENODEV when the firmware or device has already stopped, which commonly occurs during cleanup if the device was already torn down. This is also an expected condition during orderly shutdown.

Downgrade these expected error conditions from error level to debug level to reduce log noise while still keeping genuine errors visible.

Fixes: 97f27573837e ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup")
Reviewed-by: Lizhi Hou
Signed-off-by: Mario Limonciello
Signed-off-by: Lizhi Hou
Link: https://patch.msgid.link/20260210164521.1094274-3-mario.limonciello@amd.com
Signed-off-by: Sasha Levin
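
The two downgrade rules can be expressed as a small level-selection sketch. The enum and both function names are hypothetical, and ERESTARTSYS is defined locally because it is a kernel-internal code that userspace errno.h does not provide.

```c
#include <errno.h>

/* Kernel-internal error code (512 in the kernel's linux/errno.h). */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical log levels for illustration. */
enum log_level {
	LOG_DEBUG,
	LOG_ERROR,
};

/* HMM invalidation: an interrupted wait is expected during termination. */
enum log_level hmm_invalidate_log_level(int err)
{
	return err == -ERESTARTSYS ? LOG_DEBUG : LOG_ERROR;
}

/* Context destruction: -ENODEV means the device has already stopped. */
enum log_level destroy_ctx_log_level(int err)
{
	return err == -ENODEV ? LOG_DEBUG : LOG_ERROR;
}
```

Every other error code still maps to the error level, so genuine failures remain visible in the log.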