summaryrefslogtreecommitdiff
path: root/drivers/accel
AgeCommit message (Collapse)AuthorFilesLines
3 daysaccel/amdxdna: Block running under a hypervisorMario Limonciello (AMD)1-0/+6
[ Upstream commit 7bbf6d15e935abbb3d604c1fa157350e84a26f98 ] SVA support is required, which isn't configured by hypervisor solutions. Closes: https://github.com/QubesOS/qubes-issues/issues/10275 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4656 Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251213054513.87925-1-superm1@kernel.org Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/amdxdna: Fix deadlock between context destroy and job timeoutLizhi Hou1-2/+4
[ Upstream commit ca2583412306ceda9304a7c4302fd9efbf43e963 ] Hardware context destroy function holds dev_lock while waiting for all jobs to complete. The timeout job also needs to acquire dev_lock, this leads to a deadlock. Fix the issue by temporarily releasing dev_lock before waiting for all jobs to finish, and reacquiring it afterward. Fixes: 4fd6ca90fc7f ("accel/amdxdna: Refactor hardware context destroy routine") Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251107181050.1293125-1-lizhi.hou@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/amdxdna: Clear mailbox interrupt register during channel creationLizhi Hou1-0/+1
[ Upstream commit 6ff9385c07aa311f01f87307e6256231be7d8675 ] The mailbox interrupt register is not always cleared when a mailbox channel is created. This can leave stale interrupt states from previous operations. Fix this by explicitly clearing the interrupt register in the mailbox channel creation function. Fixes: b87f920b9344 ("accel/amdxdna: Support hardware mailbox") Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251107181115.1293158-1-lizhi.hou@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/amdxdna: Fix dma_fence leak when job is canceledLizhi Hou2-1/+1
[ Upstream commit dea9f84776b96a703f504631ebe9fea07bd2c181 ] Currently, dma_fence_put(job->fence) is called in job notification callback. However, if a job is canceled, the notification callback is never invoked, leading to a memory leak. Move dma_fence_put(job->fence) to the job cleanup function to ensure the fence is always released. Fixes: aac243092b70 ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251105194140.1004314-1-lizhi.hou@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/amdxdna: Fix incorrect command state for timed out jobLizhi Hou2-2/+14
[ Upstream commit 6fb7f298883246e21f60f971065adcb789ae6eba ] When a command times out, mark it as ERT_CMD_STATE_TIMEOUT. Any other commands that are canceled due to this timeout should be marked as ERT_CMD_STATE_ABORT. Fixes: aac243092b70 ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251029193423.2430463-1-lizhi.hou@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/ivpu: Fix race condition when unbinding BOsTomasz Rusinowicz1-1/+2
[ Upstream commit 00812636df370bedf4e44a0c81b86ea96bca8628 ] Fix 'Memory manager not clean during takedown' warning that occurs when ivpu_gem_bo_free() removes the BO from the BOs list before it gets unmapped. Then file_priv_unbind() triggers a warning in drm_mm_takedown() during context teardown. Protect the unmapping sequence with bo_list_lock to ensure the BO is always fully unmapped when removed from the list. This ensures the BO is either fully unmapped at context teardown time or present on the list and unmapped by file_priv_unbind(). Fixes: 48aea7f2a2ef ("accel/ivpu: Fix locking in ivpu_bo_remove_all_bos_from_context()") Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://patch.msgid.link/20251029071451.184243-1-karol.wachowski@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/ivpu: Remove skip of dma unmap for imported buffersMaciej Falkowski1-3/+0
[ Upstream commit c063c1bbee67391f12956d2ffdd5da00eb87ff79 ] Rework of imported buffers introduced in the commit e0c0891cd63b ("accel/ivpu: Rework bind/unbind of imported buffers") switched the logic of imported buffers by dma mapping/unmapping them just as the regular buffers. The commit didn't include removal of skipping dma unmap of imported buffers which results in them being mapped without unmapping. Fixes: e0c0891cd63b ("accel/ivpu: Rework bind/unbind of imported buffers") Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Link: https://patch.msgid.link/20251027150933.2384538-1-maciej.falkowski@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/amdxdna: Fix uninitialized return valueLizhi Hou1-2/+2
[ Upstream commit 81233d5419cf20e4f5ef505882951616888a2ef9 ] In aie2_get_hwctx_status() and aie2_query_ctx_status_array(), the functions could return an uninitialized value in some cases. Update them to always return 0. The amount of valid results is indicated by the returned buffer_size, element_size, and num_element fields. Fixes: 2f509fe6a42c ("accel/amdxdna: Add ioctl DRM_IOCTL_AMDXDNA_GET_ARRAY") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251024165503.1548131-1-lizhi.hou@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/ivpu: Fix race condition when mapping dmabufWludzik, Jozef1-1/+2
[ Upstream commit 63c7870fab67b2ab2bfe75e8b46f3c37b88c47a8 ] Fix a race that can occur when multiple jobs submit the same dmabuf. This could cause the sg_table to be mapped twice, leading to undefined behavior. Fixes: e0c0891cd63b ("accel/ivpu: Rework bind/unbind of imported buffers") Signed-off-by: Wludzik, Jozef <jozef.wludzik@intel.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://lore.kernel.org/r/20251014071725.3047287-1-karol.wachowski@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/ivpu: Fix DCT active percent formatKarol Wachowski3-4/+9
[ Upstream commit aa1c2b073ad23847dd2e7bdc7d30009f34ed7f59 ] The pcode MAILBOX STATUS register PARAM2 field expects DCT active percent in U1.7 value format. Convert percentage value to this format before writing to the register. Fixes: a19bffb10c46 ("accel/ivpu: Implement DCT handling") Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://lore.kernel.org/r/20251001104322.1249896-1-karol.wachowski@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/ivpu: Fix page fault in ivpu_bo_unbind_all_bos_from_context()Jacek Lawrynowicz1-6/+16
[ Upstream commit 8b694b405a84696f1d964f6da7cf9721e68c4714 ] Don't add BO to the vdev->bo_list in ivpu_gem_create_object(). When failure happens inside drm_gem_shmem_create(), the BO is not fully created and ivpu_gem_bo_free() callback will not be called causing a deleted BO to be left on the list. Fixes: 8d88e4cdce4f ("accel/ivpu: Use GEM shmem helper for all buffers") Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://lore.kernel.org/r/20250925145114.1446283-1-maciej.falkowski@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/ivpu: Rework bind/unbind of imported buffersJacek Lawrynowicz3-34/+60
[ Upstream commit e0c0891cd63bf8338c25c423e28a5a93aed3d74c ] Ensure that imported buffers are properly mapped and unmapped in the same way as regular buffers to properly handle buffers during device's bind and unbind operations to prevent resource leaks and inconsistent buffer states. Imported buffers are now dma_mapped before submission and dma_unmapped in ivpu_bo_unbind(), guaranteeing they are unmapped when the device is unbound. Add also imported buffers to vdev->bo_list for consistent unmapping on device unbind. The bo->ctx_id is set in open() so imported buffers have a valid context ID. Debug logs have been updated to match the new code structure. The function ivpu_bo_pin() has been renamed to ivpu_bo_bind() to better reflect its purpose, and unbind tests have been refactored for improved coverage and clarity. Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://lore.kernel.org/r/20250925145059.1446243-1-maciej.falkowski@linux.intel.com Stable-dep-of: 8b694b405a84 ("accel/ivpu: Fix page fault in ivpu_bo_unbind_all_bos_from_context()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/ivpu: Ensure rpm_runtime_put in case of engine reset/resume failKarol Wachowski1-2/+2
[ Upstream commit 9f6c63285737b141ca25a619add80a96111b8b96 ] Previously, aborting work could return early after engine reset or resume failure, skipping the necessary runtime_put cleanup leaving the device with incorrect reference count breaking runtime power management state. Replace early returns with goto statements to ensure runtime_put is always executed. Fixes: a47e36dc5d90 ("accel/ivpu: Trigger device recovery on engine reset/resume failure") Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://lore.kernel.org/r/20250916084809.850073-1-karol.wachowski@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/amdxdna: Call dma_buf_vmap_unlocked() for imported objectLizhi Hou1-27/+20
[ Upstream commit 457f4393d02fdb612a93912fb09cef70e6e545c9 ] In amdxdna_gem_obj_vmap(), calling dma_buf_vmap() triggers a kernel warning if LOCKDEP is enabled. So for imported object, use dma_buf_vmap_unlocked(). Then, use drm_gem_vmap() for other objects. The similar change applies to vunmap code. Fixes: bd72d4acda10 ("accel/amdxdna: Support user space allocated buffer") Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://lore.kernel.org/r/20250916174842.234709-1-lizhi.hou@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18accel/amdxdna: Fix an integer overflow in aie2_query_ctx_status_array()Lizhi Hou1-0/+6
[ Upstream commit 9e16c8bf9aebf629344cfd4cd5e3dc7d8c3f7d82 ] The unpublished smatch static checker reported a warning. drivers/accel/amdxdna/aie2_pci.c:904 aie2_query_ctx_status_array() warn: potential user controlled sizeof overflow 'args->num_element * args->element_size' '1-u32max(user) * 1-u32max(user)' Even this will not cause a real issue, it is better to put a reasonable limitation for element_size and num_element. Add condition to make sure the input element_size <= 4K and num_element <= 1K. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/dri-devel/aL56ZCLyl3tLQM1e@stanley.mountain/ Fixes: 2f509fe6a42c ("accel/amdxdna: Add ioctl DRM_IOCTL_AMDXDNA_GET_ARRAY") Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://lore.kernel.org/r/20250909154531.3469979-1-lizhi.hou@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-14accel/qaic: Synchronize access to DBC request queue head & tail pointerPranjal Ramajor Asha Kanojiya3-2/+15
Two threads of the same process can potential read and write parallelly to head and tail pointers of the same DBC request queue. This could lead to a race condition and corrupt the DBC request queue. Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com> [jhugo: Add fixes tag] Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251007061837.206132-1-youssef.abdulrahman@oss.qualcomm.com
2025-10-14accel/qaic: Treat remaining == 0 as error in find_and_map_user_pages()Youssef Samir1-1/+1
Currently, if find_and_map_user_pages() takes a DMA xfer request from the user with a length field set to 0, or in a rare case, the host receives QAIC_TRANS_DMA_XFER_CONT from the device where resources->xferred_dma_size is equal to the requested transaction size, the function will return 0 before allocating an sgt or setting the fields of the dma_xfer struct. In that case, encode_addr_size_pairs() will try to access the sgt which will lead to a general protection fault. Return an EINVAL in case the user provides a zero-sized ALP, or the device requests continuation after all of the bytes have been transferred. Fixes: 96d3c1cadedb ("accel/qaic: Clean up integer overflow checking in map_user_pages()") Signed-off-by: Youssef Samir <quic_yabdulra@quicinc.com> Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251007122320.339654-1-youssef.abdulrahman@oss.qualcomm.com
2025-10-14accel/qaic: Fix bootlog initialization orderingJeffrey Hugo1-2/+3
As soon as we queue MHI buffers to receive the bootlog from the device, we could be receiving data. Therefore all the resources needed to process that data need to be setup prior to queuing the buffers. We currently initialize some of the resources after queuing the buffers which creates a race between the probe() and any data that comes back from the device. If the uninitialized resources are accessed, we could see page faults. Fix the init ordering to close the race. Fixes: 5f8df5c6def6 ("accel/qaic: Add bootlog debugfs") Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251007115750.332169-1-youssef.abdulrahman@oss.qualcomm.com
2025-09-25accel/habanalabs: add Infineon version checkPavan S1-2/+9
On HL338 ASICs, the Infineon first‑stage firmware is not present and the reported version is 0. In this case printing a version number is misleading, as it suggests valid firmware when it does not exist. Fix this by printing the first‑stage Infineon firmware version only if the reported value is non‑zero. This avoids confusing or incorrect log messages on devices where the first stage is not applicable. Signed-off-by: Pavan S <pavan.sreenivas@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs/gaudi2: read preboot status after recovering from dirty stateKonstantin Sinyuk1-1/+7
Dirty state can occur when the host VM undergoes a reset while the device does not. In such a case, the driver must reset the device before it can be used again. As part of this reset, the device capabilities are zeroed. Therefore, the driver must read the Preboot status again to learn the Preboot state, capabilities, and security configuration. Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: add HL_GET_P_STATE passthrough typeAriel Aviad1-0/+3
Add a new passthrough type HL_GET_P_STATE to the cpucp generic ioctl to allow userspace to read the device performance state via firmware. Signed-off-by: Ariel Aviad <ariel.aviad@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: add debugfs interface for HLDIO testingKonstantin Sinyuk2-0/+214
Add debugfs files for NVMe Direct I/O (HLDIO) functionality. This interface allows userspace access to direct SSD ↔ device transfers through debugfs nodes. Four debugfs files are created under /sys/kernel/debug/habanalabs/hlN/: - dio_ssd2hl : trigger SSD-to-device transfers - dio_hl2ssd : trigger device-to-SSD transfers (placeholder, not yet implemented) - dio_stats : show transfer statistics - dio_reset : reset statistics counters Usage examples: # Perform SSD → device transfer echo "fd=3 va=0x10000 off=0 len=4096" > \ /sys/kernel/debug/habanalabs/hl0/dio_ssd2hl # View statistics cat /sys/kernel/debug/habanalabs/hl0/dio_stats # Reset counters echo 1 > /sys/kernel/debug/habanalabs/hl0/dio_reset This interface provides access to HLDIO functionality for validation and diagnostics. Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com> Reviewed-by: Farah Kassabri <farah.kassabri@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: add NVMe Direct I/O (HLDIO) infrastructureKonstantin Sinyuk6-2/+624
Introduce NVMe Direct I/O (HLDIO) infrastructure to support peer‑to‑peer DMA in the habanalabs driver. This adds internal helpers and data structures to enable direct transfers between NVMe storage and device memory. The feature is built only when CONFIG_HL_HLDIO is enabled. A debugfs interface is also provided for functional validation. Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com> Reviewed-by: Farah Kassabri <farah.kassabri@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: support mapping cb with vmalloc-backed coherent memoryMoti Haimovski2-0/+26
When IOMMU is enabled, dma_alloc_coherent() with GFP_USER may return addresses from the vmalloc range. If such an address is mapped without VM_MIXEDMAP, vm_insert_page() will trigger a BUG_ON due to the VM_PFNMAP restriction. Fix this by checking for vmalloc addresses and setting VM_MIXEDMAP in the VMA before mapping. This ensures safe mapping and avoids kernel crashes. The memory is still driver-allocated and cannot be accessed directly by userspace. Signed-off-by: Moti Haimovski <moti.haimovski@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: remove old interface variation of 'access_ok()'Ilia Levi1-5/+0
The access_ok() API no longer requires the VERIFY_WRITE argument, and the use of the old interface with VERIFY_WRITE is deprecated. Clean up the habanalabs memory manager to use the modern access_ok() interface consistently. This removes old #ifdef guards and aligns the driver with current upstream kernel APIs. Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs/gaudi2: use the CPLD_SHUTDOWN event handlerKonstantin Sinyuk2-4/+2
After CPLD shutdown event the device is not usable anymore. The common CPLD_SHUTDOWN event handler disables any subsequent device access. Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: disable device access after CPLD_SHUTDOWNKonstantin Sinyuk2-0/+28
After a CPLD shutdown event the device becomes unusable. Prevent further device access once this event is received. Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: clarify ctx use after hl_ctx_put() in dmabuf releaseTomer Tayar1-1/+6
In hl_release_dmabuf(), ctx is dereferenced after calling hl_ctx_put() to obtain the compute device file. This is safe because the dma-buf object holds a file reference taken in export_dmabuf(), and the file release (which drops another ctx reference) can only happen after we drop that file reference via fput(). Thus, this hl_ctx_put() call cannot be the last one at this point. Add a comment explaining this to avoid confusion. Signed-off-by: Tomer Tayar <tomer.tayar@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs/gaudi2: add support for logging register accesses from debugfsSharley Calzolari3-1/+148
Add infrastructure for logging the last configuration register accesses that occur via debugfs read/write operations. At interrupt time, these log entries can be dumped to dmesg, which helps in diagnosing the cause of RAZWI and ADDR_DEC interrupts. The logging is implemented as a ring buffer of access entries, with each entry recording timestamp and access details. To ensure correctness under concurrent access, operations are now protected using spinlocks. Entries are copied under lock and then printed after releasing it, which minimizes time spent in the critical section. Signed-off-by: Sharley Calzolari <sharley.calzolari@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs/gaudi2: stringify engine/queue idsAriel Suller2-8/+367
Print engine/queue names instead of numerical engine/queue IDs to make logs and debug output more readable. Signed-off-by: Ariel Suller <ariel.suller@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: add generic message type to get error countersVitaly Margolin1-0/+3
Add a new CPUCP generic message type to retrieve HBM, SRAM and critical error counters from the device. Signed-off-by: Vitaly Margolin <vitaly.margolin@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs/gaudi2: fix BMON disable configurationVered Yavniely1-1/+1
Change the BMON_CR register value back to its original state before enabling, so that BMON does not continue to collect information after being disabled. Signed-off-by: Vered Yavniely <vered.yavniely@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: return ENOMEM if less than requested pages were pinnedTomer Tayar1-1/+1
EFAULT is currently returned if less than requested user pages are pinned. This value means a "bad address" which might be confusing to the user, as the address of the given user memory is not necessarily "bad". Modify the return value to ENOMEM, as "out of memory" is more suitable in this case. Signed-off-by: Tomer Tayar <tomer.tayar@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-15Merge tag 'v6.17-rc6' into drm-nextDave Airlie4-5/+5
This is a backmerge of Linux 6.17-rc6, needed for msm, also requested by misc. Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-09-04accel/amdxdna: Add ioctl DRM_IOCTL_AMDXDNA_GET_ARRAYLizhi Hou3-26/+118
Add interface for applications to get information array. The application provides a buffer pointer along with information type, maximum number of entries and maximum size of each entry. The buffer may also contain match conditions based on the information type. After the ioctl completes, the actual number of entries and entry size are returned. (see [1], used by driver runtime library) [1] https://github.com/amd/xdna-driver/blob/main/src/shim/host/platform_host.cpp#L337 Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://lore.kernel.org/r/20250903053402.2103196-1-lizhi.hou@amd.com
2025-09-01accel/ivpu: Prevent recovery work from being queued during device removalKarol Wachowski3-4/+4
Use disable_work_sync() instead of cancel_work_sync() in ivpu_dev_fini() to ensure that no new recovery work items can be queued after device removal has started. Previously, recovery work could be scheduled even after canceling existing work, potentially leading to use-after-free bugs if recovery accessed freed resources. Rename ivpu_pm_cancel_recovery() to ivpu_pm_disable_recovery() to better reflect its new behavior. Fixes: 58cde80f45a2 ("accel/ivpu: Use dedicated work for job timeout detection") Cc: stable@vger.kernel.org # v6.8+ Signed-off-by: Karol Wachowski <karol.wachowski@intel.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://lore.kernel.org/r/20250808110939.328366-1-jacek.lawrynowicz@linux.intel.com
2025-09-01accel/ivpu: Make function parameter names consistentJacek Lawrynowicz2-2/+2
Make ivpu_hw_btrs_dct_set_status() and ivpu_fw_boot_params_setup() declaration and definition parameter names consistent. Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://lore.kernel.org/r/20250808111014.328607-1-jacek.lawrynowicz@linux.intel.com
2025-09-01accel/ivpu: Remove unused PLL_CONFIG_DEFAULTJacek Lawrynowicz1-2/+1
This change removes the unnecessary condition, makes the code clearer, and silences clang-tidy warning. Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://lore.kernel.org/r/20250808111044.328800-1-jacek.lawrynowicz@linux.intel.com
2025-09-01accel/rocket: Fix some error checking in rocket_core_init()Dan Carpenter1-1/+1
The problem is that pm_runtime_get_sync() can return 1 on success so checking for zero doesn't work. Use the pm_runtime_resume_and_get() function instead. The pm_runtime_resume_and_get() function does additional cleanup as well so that's a bonus as well. Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Link: https://lore.kernel.org/r/aKcRW6fsRP_o5C_y@stanley.mountain
2025-09-01accel/rocket: Check the correct DMA irq status to warn aboutHeiko Stuebner1-1/+1
Right now, the code checks the DMA_READ_ERROR state 2 times, while I guess it was supposed to warn about both read and write errors. Change the 2nd check to look at the write-error flag. Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL") Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Link: https://lore.kernel.org/r/20250818185658.2585696-1-heiko@sntech.de
2025-09-01accel/rocket: Fix usages of kfree() and sizeof()Brigham Campbell1-3/+4
Replace usages of kfree() with kvfree() for pointers which were allocated using kvmalloc(), as required by the kernel memory management API. Use sizeof() on the type that a pointer references instead of the pointer itself. In this case, scheds and *scheds both happen to be pointers, so sizeof() will expand to the same value in either case, but using *scheds is more technically correct since scheds is an array of drm_gpu_scheduler *. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Closes: https://lore.kernel.org/r/202508120730.PLbjlKbI-lkp@intel.com/ Signed-off-by: Brigham Campbell <me@brighamcampbell.com> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Link: https://lore.kernel.org/r/20250813-rocket-free-fix-v1-1-51f00a7a1271@brighamcampbell.com Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL")
2025-09-01accel/rocket: Depend on DRM_ACCEL not just DRMHeiko Stuebner1-1/+1
With the current dependency on only DRM, a config of CONFIG_DRM_ACCEL_ROCKET=y is possible, but of course wrong, because without DRM_ACCEL the build- system will never even enter drivers/accel/* . So depend on DRM_ACCEL instead of just DRM. Fixes: ed98261b4168 ("accel/rocket: Add a new driver for Rockchip's NPU") Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Link: https://lore.kernel.org/r/20250814113519.1551855-3-heiko@sntech.de
2025-09-01accel/rocket: Fix indentation of Kconfig entryHeiko Stuebner1-8/+8
The general indentation for the Kconfig lines is one tab, so adapt the lines accordingly. The description is correctly indented (1 tab + 2 spaces) so doesn't need changes. Fixes: ed98261b4168 ("accel/rocket: Add a new driver for Rockchip's NPU") Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Link: https://lore.kernel.org/r/20250814113519.1551855-2-heiko@sntech.de
2025-08-29accel/amdxdna: Use int instead of u32 to store error codesQianfeng Rong1-3/+3
Change the 'ret' variable from u32 to int to store -EINVAL. Storing the negative error codes in unsigned type, doesn't cause an issue at runtime but it's ugly as pants. Additionally, assigning -EINVAL to u32 ret (i.e., u32 ret = -EINVAL) may trigger a GCC warning when the -Wsign-conversion flag is enabled. Fixes: aac243092b70 ("accel/amdxdna: Add command execution") Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://lore.kernel.org/r/20250828033917.113364-1-rongqianfeng@vivo.com
2025-08-26accel/amdxdna: Fix incorrect type used for a local variableLizhi Hou1-1/+2
drivers/accel/amdxdna/aie2_pci.c:794:13: sparse: sparse: incorrect type in assignment (different address spaces) Fixes: c8cea4371e5e ("accel/amdxdna: Add a function to walk hardware contexts") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202508230855.0b9efFl6-lkp@intel.com/ Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://lore.kernel.org/r/20250826171951.801585-1-lizhi.hou@amd.com
2025-08-20Merge drm/drm-fixes into drm-misc-fixesMaxime Ripard1-16/+7
Update drm-misc-fixes to -rc2. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-08-20Merge drm/drm-next into drm-misc-nextMaxime Ripard1-16/+7
Bring v6.17-rc2 in to unstuck for-linux-next. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-08-19Merge tag 'drm-misc-next-2025-08-14' of ↵Dave Airlie27-166/+6370
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.18: UAPI Changes: - Add DRM_IOCTL_GEM_CHANGE_HANDLE for reassigning GEM handles - Document DRM_MODE_PAGE_FLIP_EVENT Cross-subsystem Changes: fbcon: - Add missing declarations in fbcon.h Core Changes: bridge: - Fix ref counting panel: - Replace and remove mipi_dsi_generic_write_{seq/_chatty}() sched: - Fixes Rust: - Drop Opaque<> from ioctl arguments Driver Changes: amdxdma: - Support buffers allocated by user space - Streamline PM interfaces - Fixes bridge: - cdns-dsi: Various improvements to mode setting - Support Solomon SSD2825 plus DT bindings - Support Waveshare DSI2DPI plus DT bindings gud: - Fixes ivpu: - Fixes nouveau: - Use GSP firmware by default - Fixes panel: - panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64; Support SHP LQ134Z1; Fixes - panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings - Support Samsung AMS561RA01 - Support Hydis HV101HD1 plus DT bindings panthor: - Print task/pid on errors - Fixes renesas: - convert to RUNTIME_PM_OPS repaper: - Use shadow-plane helpers rocket: - Add driver for Rockchip NPU plus DT bindings sharp-memory: - Use shadow-plane helpers simpledrm: - Use of_reserved_mem_region_to_resource() helper tidss: - Use crtc_ fields for programming display mode - Remove other drivers from aperture v3d: - Support querying nubmer of GPU resets for KHR_robustness vmwgfx: - Fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250814072454.GA18104@linux.fritz.box
2025-08-18accel/amdxdna: Add a function to walk hardware contextsLizhi Hou8-95/+102
Walking hardware contexts created by a process is duplicated in multiple spots. Add a function, amdxdna_hwctx_walk(), and replace all spots. hwctx_srcu and dev_lock are good enough to protect hardware context list. Remove hwctx_lock. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://lore.kernel.org/r/20250815171634.3417487-1-lizhi.hou@amd.com
2025-08-16accel/habanalabs/gaudi2: Use kvfree() for memory allocated with kvcalloc()Thorsten Blum1-1/+1
Use kvfree() to fix the following Coccinelle/coccicheck warning reported by kfree_mismatch.cocci: WARNING kvmalloc is used to allocate this memory at line 10398 Fixes: f728c17fc97a ("accel/habanalabs/gaudi2: move HMMU page tables to device memory") Reported-by: Qianfeng Rong <rongqianfeng@vivo.com> Closes: https://patch.msgid.link/20250808085530.233737-1-rongqianfeng@vivo.com Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> [lukas: acknowledge Qianfeng, adjust Thorsten's domain, add Fixes tag] Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Cc: stable@vger.kernel.org # v6.9+ Link: https://patch.msgid.link/20240820231028.136126-1-thorsten.blum@toblux.com