| Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 7bbf6d15e935abbb3d604c1fa157350e84a26f98 ]
SVA support is required, which isn't configured by hypervisor
solutions.
Closes: https://github.com/QubesOS/qubes-issues/issues/10275
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4656
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251213054513.87925-1-superm1@kernel.org
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ca2583412306ceda9304a7c4302fd9efbf43e963 ]
Hardware context destroy function holds dev_lock while waiting for all jobs
to complete. The timeout job also needs to acquire dev_lock, this leads to
a deadlock.
Fix the issue by temporarily releasing dev_lock before waiting for all
jobs to finish, and reacquiring it afterward.
Fixes: 4fd6ca90fc7f ("accel/amdxdna: Refactor hardware context destroy routine")
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251107181050.1293125-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6ff9385c07aa311f01f87307e6256231be7d8675 ]
The mailbox interrupt register is not always cleared when a mailbox channel
is created. This can leave stale interrupt states from previous operations.
Fix this by explicitly clearing the interrupt register in the mailbox
channel creation function.
Fixes: b87f920b9344 ("accel/amdxdna: Support hardware mailbox")
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251107181115.1293158-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dea9f84776b96a703f504631ebe9fea07bd2c181 ]
Currently, dma_fence_put(job->fence) is called in job notification
callback. However, if a job is canceled, the notification callback is never
invoked, leading to a memory leak. Move dma_fence_put(job->fence)
to the job cleanup function to ensure the fence is always released.
Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251105194140.1004314-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6fb7f298883246e21f60f971065adcb789ae6eba ]
When a command times out, mark it as ERT_CMD_STATE_TIMEOUT. Any other
commands that are canceled due to this timeout should be marked as
ERT_CMD_STATE_ABORT.
Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251029193423.2430463-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 00812636df370bedf4e44a0c81b86ea96bca8628 ]
Fix 'Memory manager not clean during takedown' warning that occurs
when ivpu_gem_bo_free() removes the BO from the BOs list before it
gets unmapped. Then file_priv_unbind() triggers a warning in
drm_mm_takedown() during context teardown.
Protect the unmapping sequence with bo_list_lock to ensure the BO is
always fully unmapped when removed from the list. This ensures the BO
is either fully unmapped at context teardown time or present on the
list and unmapped by file_priv_unbind().
Fixes: 48aea7f2a2ef ("accel/ivpu: Fix locking in ivpu_bo_remove_all_bos_from_context()")
Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20251029071451.184243-1-karol.wachowski@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c063c1bbee67391f12956d2ffdd5da00eb87ff79 ]
Rework of imported buffers introduced in the commit
e0c0891cd63b ("accel/ivpu: Rework bind/unbind of imported buffers")
switched the logic of imported buffers by dma mapping/unmapping
them just as the regular buffers.
The commit didn't include removal of skipping dma unmap of imported
buffers which results in them being mapped without unmapping.
Fixes: e0c0891cd63b ("accel/ivpu: Rework bind/unbind of imported buffers")
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Link: https://patch.msgid.link/20251027150933.2384538-1-maciej.falkowski@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 81233d5419cf20e4f5ef505882951616888a2ef9 ]
In aie2_get_hwctx_status() and aie2_query_ctx_status_array(), the
functions could return an uninitialized value in some cases. Update them
to always return 0. The amount of valid results is indicated by the
returned buffer_size, element_size, and num_element fields.
Fixes: 2f509fe6a42c ("accel/amdxdna: Add ioctl DRM_IOCTL_AMDXDNA_GET_ARRAY")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251024165503.1548131-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 63c7870fab67b2ab2bfe75e8b46f3c37b88c47a8 ]
Fix a race that can occur when multiple jobs submit the same dmabuf.
This could cause the sg_table to be mapped twice, leading to undefined
behavior.
Fixes: e0c0891cd63b ("accel/ivpu: Rework bind/unbind of imported buffers")
Signed-off-by: Wludzik, Jozef <jozef.wludzik@intel.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://lore.kernel.org/r/20251014071725.3047287-1-karol.wachowski@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit aa1c2b073ad23847dd2e7bdc7d30009f34ed7f59 ]
The pcode MAILBOX STATUS register PARAM2 field expects DCT active
percent in U1.7 value format. Convert percentage value to this
format before writing to the register.
Fixes: a19bffb10c46 ("accel/ivpu: Implement DCT handling")
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://lore.kernel.org/r/20251001104322.1249896-1-karol.wachowski@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8b694b405a84696f1d964f6da7cf9721e68c4714 ]
Don't add BO to the vdev->bo_list in ivpu_gem_create_object().
When failure happens inside drm_gem_shmem_create(), the BO is not
fully created and ivpu_gem_bo_free() callback will not be called
causing a deleted BO to be left on the list.
Fixes: 8d88e4cdce4f ("accel/ivpu: Use GEM shmem helper for all buffers")
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://lore.kernel.org/r/20250925145114.1446283-1-maciej.falkowski@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e0c0891cd63bf8338c25c423e28a5a93aed3d74c ]
Ensure that imported buffers are properly mapped and unmapped in
the same way as regular buffers to properly handle buffers during
device's bind and unbind operations to prevent resource leaks and
inconsistent buffer states.
Imported buffers are now dma_mapped before submission and
dma_unmapped in ivpu_bo_unbind(), guaranteeing they are unmapped
when the device is unbound.
Add also imported buffers to vdev->bo_list for consistent unmapping
on device unbind. The bo->ctx_id is set in open() so imported
buffers have a valid context ID.
Debug logs have been updated to match the new code structure.
The function ivpu_bo_pin() has been renamed to ivpu_bo_bind()
to better reflect its purpose, and unbind tests have been refactored
for improved coverage and clarity.
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://lore.kernel.org/r/20250925145059.1446243-1-maciej.falkowski@linux.intel.com
Stable-dep-of: 8b694b405a84 ("accel/ivpu: Fix page fault in ivpu_bo_unbind_all_bos_from_context()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9f6c63285737b141ca25a619add80a96111b8b96 ]
Previously, aborting work could return early after engine reset or resume
failure, skipping the necessary runtime_put cleanup leaving the device
with incorrect reference count breaking runtime power management state.
Replace early returns with goto statements to ensure runtime_put is always
executed.
Fixes: a47e36dc5d90 ("accel/ivpu: Trigger device recovery on engine reset/resume failure")
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://lore.kernel.org/r/20250916084809.850073-1-karol.wachowski@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 457f4393d02fdb612a93912fb09cef70e6e545c9 ]
In amdxdna_gem_obj_vmap(), calling dma_buf_vmap() triggers a kernel
warning if LOCKDEP is enabled. So for imported object, use
dma_buf_vmap_unlocked(). Then, use drm_gem_vmap() for other objects.
The similar change applies to vunmap code.
Fixes: bd72d4acda10 ("accel/amdxdna: Support user space allocated buffer")
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://lore.kernel.org/r/20250916174842.234709-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9e16c8bf9aebf629344cfd4cd5e3dc7d8c3f7d82 ]
The unpublished smatch static checker reported a warning.
drivers/accel/amdxdna/aie2_pci.c:904 aie2_query_ctx_status_array()
warn: potential user controlled sizeof overflow
'args->num_element * args->element_size' '1-u32max(user) * 1-u32max(user)'
Even this will not cause a real issue, it is better to put a reasonable
limitation for element_size and num_element. Add condition to make sure
the input element_size <= 4K and num_element <= 1K.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/dri-devel/aL56ZCLyl3tLQM1e@stanley.mountain/
Fixes: 2f509fe6a42c ("accel/amdxdna: Add ioctl DRM_IOCTL_AMDXDNA_GET_ARRAY")
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://lore.kernel.org/r/20250909154531.3469979-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
Two threads of the same process can potential read and write parallelly to
head and tail pointers of the same DBC request queue. This could lead to a
race condition and corrupt the DBC request queue.
Fixes: ff13be830333 ("accel/qaic: Add datapath")
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com>
[jhugo: Add fixes tag]
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251007061837.206132-1-youssef.abdulrahman@oss.qualcomm.com
|
|
Currently, if find_and_map_user_pages() takes a DMA xfer request from the
user with a length field set to 0, or in a rare case, the host receives
QAIC_TRANS_DMA_XFER_CONT from the device where resources->xferred_dma_size
is equal to the requested transaction size, the function will return 0
before allocating an sgt or setting the fields of the dma_xfer struct.
In that case, encode_addr_size_pairs() will try to access the sgt which
will lead to a general protection fault.
Return an EINVAL in case the user provides a zero-sized ALP, or the device
requests continuation after all of the bytes have been transferred.
Fixes: 96d3c1cadedb ("accel/qaic: Clean up integer overflow checking in map_user_pages()")
Signed-off-by: Youssef Samir <quic_yabdulra@quicinc.com>
Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com>
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251007122320.339654-1-youssef.abdulrahman@oss.qualcomm.com
|
|
As soon as we queue MHI buffers to receive the bootlog from the device,
we could be receiving data. Therefore all the resources needed to
process that data need to be setup prior to queuing the buffers.
We currently initialize some of the resources after queuing the buffers
which creates a race between the probe() and any data that comes back
from the device. If the uninitialized resources are accessed, we could
see page faults.
Fix the init ordering to close the race.
Fixes: 5f8df5c6def6 ("accel/qaic: Add bootlog debugfs")
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com>
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20251007115750.332169-1-youssef.abdulrahman@oss.qualcomm.com
|
|
On HL338 ASICs, the Infineon first‑stage firmware is not present and
the reported version is 0. In this case printing a version number is
misleading, as it suggests valid firmware when it does not exist.
Fix this by printing the first‑stage Infineon firmware version only
if the reported value is non‑zero. This avoids confusing or incorrect
log messages on devices where the first stage is not applicable.
Signed-off-by: Pavan S <pavan.sreenivas@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
Dirty state can occur when the host VM undergoes a reset while the
device does not. In such a case, the driver must reset the device before
it can be used again. As part of this reset, the device capabilities
are zeroed. Therefore, the driver must read the Preboot status again to
learn the Preboot state, capabilities, and security configuration.
Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
Add a new passthrough type HL_GET_P_STATE to the cpucp generic ioctl
to allow userspace to read the device performance state via firmware.
Signed-off-by: Ariel Aviad <ariel.aviad@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
Add debugfs files for NVMe Direct I/O (HLDIO) functionality.
This interface allows userspace access to direct SSD ↔ device transfers
through debugfs nodes.
Four debugfs files are created under /sys/kernel/debug/habanalabs/hlN/:
- dio_ssd2hl : trigger SSD-to-device transfers
- dio_hl2ssd : trigger device-to-SSD transfers
(placeholder, not yet implemented)
- dio_stats : show transfer statistics
- dio_reset : reset statistics counters
Usage examples:
# Perform SSD → device transfer
echo "fd=3 va=0x10000 off=0 len=4096" > \
/sys/kernel/debug/habanalabs/hl0/dio_ssd2hl
# View statistics
cat /sys/kernel/debug/habanalabs/hl0/dio_stats
# Reset counters
echo 1 > /sys/kernel/debug/habanalabs/hl0/dio_reset
This interface provides access to HLDIO functionality for validation
and diagnostics.
Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Farah Kassabri <farah.kassabri@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
Introduce NVMe Direct I/O (HLDIO) infrastructure to support
peer‑to‑peer DMA in the habanalabs driver. This adds internal helpers
and data structures to enable direct transfers between NVMe storage
and device memory.
The feature is built only when CONFIG_HL_HLDIO is enabled. A debugfs
interface is also provided for functional validation.
Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Farah Kassabri <farah.kassabri@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
When IOMMU is enabled, dma_alloc_coherent() with GFP_USER may return
addresses from the vmalloc range. If such an address is mapped without
VM_MIXEDMAP, vm_insert_page() will trigger a BUG_ON due to the
VM_PFNMAP restriction.
Fix this by checking for vmalloc addresses and setting VM_MIXEDMAP
in the VMA before mapping. This ensures safe mapping and avoids kernel
crashes. The memory is still driver-allocated and cannot be accessed
directly by userspace.
Signed-off-by: Moti Haimovski <moti.haimovski@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
The access_ok() API no longer requires the VERIFY_WRITE argument,
and the use of the old interface with VERIFY_WRITE is deprecated.
Clean up the habanalabs memory manager to use the modern access_ok()
interface consistently. This removes old #ifdef guards and aligns the
driver with current upstream kernel APIs.
Signed-off-by: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
After CPLD shutdown event the device is not usable anymore. The common
CPLD_SHUTDOWN event handler disables any subsequent device access.
Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
After a CPLD shutdown event the device becomes unusable. Prevent further
device access once this event is received.
Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
In hl_release_dmabuf(), ctx is dereferenced after calling hl_ctx_put()
to obtain the compute device file.
This is safe because the dma-buf object holds a file reference taken in
export_dmabuf(), and the file release (which drops another ctx reference)
can only happen after we drop that file reference via fput(). Thus, this
hl_ctx_put() call cannot be the last one at this point.
Add a comment explaining this to avoid confusion.
Signed-off-by: Tomer Tayar <tomer.tayar@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
Add infrastructure for logging the last configuration register accesses
that occur via debugfs read/write operations. At interrupt time, these
log entries can be dumped to dmesg, which helps in diagnosing the cause
of RAZWI and ADDR_DEC interrupts.
The logging is implemented as a ring buffer of access entries, with each
entry recording timestamp and access details. To ensure correctness
under concurrent access, operations are now protected using spinlocks.
Entries are copied under lock and then printed after releasing it, which
minimizes time spent in the critical section.
Signed-off-by: Sharley Calzolari <sharley.calzolari@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
Print engine/queue names instead of numerical engine/queue IDs to make
logs and debug output more readable.
Signed-off-by: Ariel Suller <ariel.suller@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
Add a new CPUCP generic message type to retrieve HBM, SRAM and critical
error counters from the device.
Signed-off-by: Vitaly Margolin <vitaly.margolin@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
Change the BMON_CR register value back to its original state before
enabling, so that BMON does not continue to collect information
after being disabled.
Signed-off-by: Vered Yavniely <vered.yavniely@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
EFAULT is currently returned if less than requested user pages are
pinned. This value means a "bad address" which might be confusing to
the user, as the address of the given user memory is not necessarily
"bad".
Modify the return value to ENOMEM, as "out of memory" is more suitable
in this case.
Signed-off-by: Tomer Tayar <tomer.tayar@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
This is a backmerge of Linux 6.17-rc6, needed for msm,
also requested by misc.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Add interface for applications to get information array. The application
provides a buffer pointer along with information type, maximum number of
entries and maximum size of each entry. The buffer may also contain match
conditions based on the information type. After the ioctl completes, the
actual number of entries and entry size are returned. (see [1], used by
driver runtime library)
[1] https://github.com/amd/xdna-driver/blob/main/src/shim/host/platform_host.cpp#L337
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://lore.kernel.org/r/20250903053402.2103196-1-lizhi.hou@amd.com
|
|
Use disable_work_sync() instead of cancel_work_sync() in ivpu_dev_fini()
to ensure that no new recovery work items can be queued after device
removal has started. Previously, recovery work could be scheduled even
after canceling existing work, potentially leading to use-after-free
bugs if recovery accessed freed resources.
Rename ivpu_pm_cancel_recovery() to ivpu_pm_disable_recovery() to better
reflect its new behavior.
Fixes: 58cde80f45a2 ("accel/ivpu: Use dedicated work for job timeout detection")
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250808110939.328366-1-jacek.lawrynowicz@linux.intel.com
|
|
Make ivpu_hw_btrs_dct_set_status() and ivpu_fw_boot_params_setup()
declaration and definition parameter names consistent.
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250808111014.328607-1-jacek.lawrynowicz@linux.intel.com
|
|
This change removes the unnecessary condition, makes the code clearer,
and silences clang-tidy warning.
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250808111044.328800-1-jacek.lawrynowicz@linux.intel.com
|
|
The problem is that pm_runtime_get_sync() can return 1 on success so
checking for zero doesn't work. Use the pm_runtime_resume_and_get()
function instead. The pm_runtime_resume_and_get() function does
additional cleanup as well so that's a bonus as well.
Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Link: https://lore.kernel.org/r/aKcRW6fsRP_o5C_y@stanley.mountain
|
|
Right now, the code checks the DMA_READ_ERROR state 2 times, while
I guess it was supposed to warn about both read and write errors.
Change the 2nd check to look at the write-error flag.
Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL")
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Link: https://lore.kernel.org/r/20250818185658.2585696-1-heiko@sntech.de
|
|
Replace usages of kfree() with kvfree() for pointers which were
allocated using kvmalloc(), as required by the kernel memory management
API.
Use sizeof() on the type that a pointer references instead of the
pointer itself. In this case, scheds and *scheds both happen to be
pointers, so sizeof() will expand to the same value in either case, but
using *scheds is more technically correct since scheds is an array of
drm_gpu_scheduler *.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202508120730.PLbjlKbI-lkp@intel.com/
Signed-off-by: Brigham Campbell <me@brighamcampbell.com>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Link: https://lore.kernel.org/r/20250813-rocket-free-fix-v1-1-51f00a7a1271@brighamcampbell.com
Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL")
|
|
With the current dependency on only DRM, a config of
CONFIG_DRM_ACCEL_ROCKET=y
is possible, but of course wrong, because without DRM_ACCEL the build-
system will never even enter drivers/accel/* .
So depend on DRM_ACCEL instead of just DRM.
Fixes: ed98261b4168 ("accel/rocket: Add a new driver for Rockchip's NPU")
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Link: https://lore.kernel.org/r/20250814113519.1551855-3-heiko@sntech.de
|
|
The general indentation for the Kconfig lines is one tab, so adapt the
lines accordingly.
The description is correctly indented (1 tab + 2 spaces) so doesn't need
changes.
Fixes: ed98261b4168 ("accel/rocket: Add a new driver for Rockchip's NPU")
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Link: https://lore.kernel.org/r/20250814113519.1551855-2-heiko@sntech.de
|
|
Change the 'ret' variable from u32 to int to store -EINVAL. Storing the
negative error codes in unsigned type, doesn't cause an issue at runtime
but it's ugly as pants.
Additionally, assigning -EINVAL to u32 ret (i.e., u32 ret = -EINVAL) may
trigger a GCC warning when the -Wsign-conversion flag is enabled.
Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://lore.kernel.org/r/20250828033917.113364-1-rongqianfeng@vivo.com
|
|
drivers/accel/amdxdna/aie2_pci.c:794:13: sparse: sparse: incorrect type in assignment (different address spaces)
Fixes: c8cea4371e5e ("accel/amdxdna: Add a function to walk hardware contexts")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508230855.0b9efFl6-lkp@intel.com/
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://lore.kernel.org/r/20250826171951.801585-1-lizhi.hou@amd.com
|
|
Update drm-misc-fixes to -rc2.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Bring v6.17-rc2 in to unstuck for-linux-next.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.18:
UAPI Changes:
- Add DRM_IOCTL_GEM_CHANGE_HANDLE for reassigning GEM handles
- Document DRM_MODE_PAGE_FLIP_EVENT
Cross-subsystem Changes:
fbcon:
- Add missing declarations in fbcon.h
Core Changes:
bridge:
- Fix ref counting
panel:
- Replace and remove mipi_dsi_generic_write_{seq/_chatty}()
sched:
- Fixes
Rust:
- Drop Opaque<> from ioctl arguments
Driver Changes:
amdxdma:
- Support buffers allocated by user space
- Streamline PM interfaces
- Fixes
bridge:
- cdns-dsi: Various improvements to mode setting
- Support Solomon SSD2825 plus DT bindings
- Support Waveshare DSI2DPI plus DT bindings
gud:
- Fixes
ivpu:
- Fixes
nouveau:
- Use GSP firmware by default
- Fixes
panel:
- panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64;
Support SHP LQ134Z1; Fixes
- panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings
- Support Samsung AMS561RA01
- Support Hydis HV101HD1 plus DT bindings
panthor:
- Print task/pid on errors
- Fixes
renesas:
- convert to RUNTIME_PM_OPS
repaper:
- Use shadow-plane helpers
rocket:
- Add driver for Rockchip NPU plus DT bindings
sharp-memory:
- Use shadow-plane helpers
simpledrm:
- Use of_reserved_mem_region_to_resource() helper
tidss:
- Use crtc_ fields for programming display mode
- Remove other drivers from aperture
v3d:
- Support querying nubmer of GPU resets for KHR_robustness
vmwgfx:
- Fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20250814072454.GA18104@linux.fritz.box
|
|
Walking hardware contexts created by a process is duplicated in multiple
spots. Add a function, amdxdna_hwctx_walk(), and replace all spots.
hwctx_srcu and dev_lock are good enough to protect hardware context list.
Remove hwctx_lock.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://lore.kernel.org/r/20250815171634.3417487-1-lizhi.hou@amd.com
|
|
Use kvfree() to fix the following Coccinelle/coccicheck warning reported
by kfree_mismatch.cocci:
WARNING kvmalloc is used to allocate this memory at line 10398
Fixes: f728c17fc97a ("accel/habanalabs/gaudi2: move HMMU page tables to device memory")
Reported-by: Qianfeng Rong <rongqianfeng@vivo.com>
Closes: https://patch.msgid.link/20250808085530.233737-1-rongqianfeng@vivo.com
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
[lukas: acknowledge Qianfeng, adjust Thorsten's domain, add Fixes tag]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Cc: stable@vger.kernel.org # v6.9+
Link: https://patch.msgid.link/20240820231028.136126-1-thorsten.blum@toblux.com
|