path: root/drivers/accel
Age | Commit message | Author | Files | Lines
2025-12-18 | accel/ivpu: Fix DCT active percent format | Karol Wachowski | 3 | -4/+9
[ Upstream commit aa1c2b073ad23847dd2e7bdc7d30009f34ed7f59 ] The pcode MAILBOX STATUS register PARAM2 field expects DCT active percent in U1.7 value format. Convert percentage value to this format before writing to the register. Fixes: a19bffb10c46 ("accel/ivpu: Implement DCT handling") Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://lore.kernel.org/r/20251001104322.1249896-1-karol.wachowski@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
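The percent-to-U1.7 conversion mentioned above can be sketched in plain C. This is a minimal illustration, not the driver's code: the helper name `percent_to_u1_7`, the clamping, and the round-to-nearest choice are assumptions. It only relies on U1.7 meaning one integer bit and seven fractional bits, so that 100% corresponds to 0x80.

```c
#include <stdint.h>

/* Hypothetical helper: convert a percentage (0..100) to unsigned U1.7
 * fixed point, where 0x80 (128) represents 1.0, i.e. 100%. */
static uint8_t percent_to_u1_7(unsigned int percent)
{
    if (percent > 100)
        percent = 100;  /* clamp so the result never exceeds 1.0 */

    /* Scale by 2^7 and round to the nearest integer. */
    return (uint8_t)((percent * 128u + 50u) / 100u);
}
```

With this scaling, writing the raw value 30 into the field would be read as roughly 23% (30/128), which is the kind of mismatch the fix addresses.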
2025-12-18 | accel/ivpu: Make function parameter names consistent | Jacek Lawrynowicz | 2 | -2/+2
[ Upstream commit cf87f93847dea607e8a35983cb006ef8493f8065 ] Make ivpu_hw_btrs_dct_set_status() and ivpu_fw_boot_params_setup() declaration and definition parameter names consistent. Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://lore.kernel.org/r/20250808111014.328607-1-jacek.lawrynowicz@linux.intel.com Stable-dep-of: aa1c2b073ad2 ("accel/ivpu: Fix DCT active percent format") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18 | accel/ivpu: Ensure rpm_runtime_put in case of engine reset/resume fail | Karol Wachowski | 1 | -2/+2
[ Upstream commit 9f6c63285737b141ca25a619add80a96111b8b96 ] Previously, the abort work could return early after an engine reset or resume failure, skipping the necessary runtime_put cleanup. That left the device with an incorrect reference count and broke runtime power management state. Replace early returns with goto statements to ensure runtime_put is always executed. Fixes: a47e36dc5d90 ("accel/ivpu: Trigger device recovery on engine reset/resume failure") Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://lore.kernel.org/r/20250916084809.850073-1-karol.wachowski@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
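The goto-based cleanup pattern described above can be shown with a small userspace model. Everything here is hypothetical (`struct fake_dev`, `fake_rpm_get`, `fake_rpm_put`, `fake_abort_work`); the sketch only illustrates why a single exit label keeps the get/put pair balanced on every path.

```c
#include <errno.h>

/* Hypothetical stand-in for a device with a runtime PM usage counter. */
struct fake_dev {
    int usage_count;   /* models the runtime PM reference count */
    int reset_fails;   /* non-zero: simulate an engine reset failure */
    int resume_fails;  /* non-zero: simulate an engine resume failure */
};

static void fake_rpm_get(struct fake_dev *d) { d->usage_count++; }
static void fake_rpm_put(struct fake_dev *d) { d->usage_count--; }

static int fake_abort_work(struct fake_dev *d)
{
    int ret = 0;

    fake_rpm_get(d);

    if (d->reset_fails) {
        ret = -EIO;
        goto out_put;  /* not "return": the put below must still run */
    }
    if (d->resume_fails) {
        ret = -EIO;
        goto out_put;
    }
    /* ... abort the jobs of the faulty context here ... */

out_put:
    fake_rpm_put(d);
    return ret;
}
```

An early `return` in either failure branch would leave `usage_count` at 1, which corresponds to the leaked reference the commit describes.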
2025-12-18 | accel/ivpu: Prevent runtime suspend during context abort work | Andrzej Kacprowski | 1 | -1/+9
[ Upstream commit 7806bad76ac397a767f0c369534133c71c73b157 ] Increment the runtime PM counter when entering ivpu_context_abort_work_fn() to prevent the device from suspending while the function is executing. Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250204084622.2422544-3-jacek.lawrynowicz@linux.intel.com Stable-dep-of: 9f6c63285737 ("accel/ivpu: Ensure rpm_runtime_put in case of engine reset/resume fail") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-13 | accel/habanalabs: support mapping cb with vmalloc-backed coherent memory | Moti Haimovski | 2 | -0/+26
[ Upstream commit 513024d5a0e34fd34247043f1876b6138ca52847 ] When IOMMU is enabled, dma_alloc_coherent() with GFP_USER may return addresses from the vmalloc range. If such an address is mapped without VM_MIXEDMAP, vm_insert_page() will trigger a BUG_ON due to the VM_PFNMAP restriction. Fix this by checking for vmalloc addresses and setting VM_MIXEDMAP in the VMA before mapping. This ensures safe mapping and avoids kernel crashes. The memory is still driver-allocated and cannot be accessed directly by userspace. Signed-off-by: Moti Haimovski <moti.haimovski@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-13 | accel/habanalabs/gaudi2: read preboot status after recovering from dirty state | Konstantin Sinyuk | 1 | -1/+7
[ Upstream commit a0d866bab184161ba155b352650083bf6695e50e ] Dirty state can occur when the host VM undergoes a reset while the device does not. In such a case, the driver must reset the device before it can be used again. As part of this reset, the device capabilities are zeroed. Therefore, the driver must read the Preboot status again to learn the Preboot state, capabilities, and security configuration. Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-13 | accel/habanalabs: return ENOMEM if less than requested pages were pinned | Tomer Tayar | 1 | -1/+1
[ Upstream commit 9f5067531c9b79318c4e48a933cb2694f53f3de2 ] EFAULT is currently returned if fewer user pages than requested are pinned. This value means "bad address", which might confuse the user, as the address of the given user memory is not necessarily "bad". Modify the return value to ENOMEM, as "out of memory" is more suitable in this case. Signed-off-by: Tomer Tayar <tomer.tayar@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
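The error-code choice above can be sketched as a small check. The function name `check_pinned` and its parameters are hypothetical and do not come from the driver; the sketch assumes a pinning call that returns either the number of pages pinned or a negative errno.

```c
#include <errno.h>

/* Hypothetical check: map the result of a page-pinning call to an errno.
 * `pinned` is the number of pages actually pinned, or a negative errno. */
static int check_pinned(long pinned, long requested)
{
    if (pinned < 0)
        return (int)pinned;  /* propagate the pinning error unchanged */
    if (pinned < requested)
        return -ENOMEM;      /* partial pin: reported as "out of memory" */
    return 0;
}
```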
2025-11-13 | accel/habanalabs/gaudi2: fix BMON disable configuration | Vered Yavniely | 1 | -1/+1
[ Upstream commit b4fd8e56c9a3b614370fde2d45aec1032eb67ddd ] Change the BMON_CR register value back to its original state before enabling, so that BMON does not continue to collect information after being disabled. Signed-off-by: Vered Yavniely <vered.yavniely@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23 | accel/qaic: Synchronize access to DBC request queue head & tail pointer | Pranjal Ramajor Asha Kanojiya | 3 | -2/+15
[ Upstream commit 52e59f7740ba23bbb664914967df9a00208ca10c ] Two threads of the same process can potentially read and write the head and tail pointers of the same DBC request queue in parallel. This could lead to a race condition and corrupt the DBC request queue. Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com> [jhugo: Add fixes tag] Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251007061837.206132-1-youssef.abdulrahman@oss.qualcomm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23 | accel/qaic: Treat remaining == 0 as error in find_and_map_user_pages() | Youssef Samir | 1 | -1/+1
[ Upstream commit 11f08c30a3e4157305ba692f1d44cca5fc9a8fca ] Currently, if find_and_map_user_pages() takes a DMA xfer request from the user with a length field set to 0, or in a rare case, the host receives QAIC_TRANS_DMA_XFER_CONT from the device where resources->xferred_dma_size is equal to the requested transaction size, the function will return 0 before allocating an sgt or setting the fields of the dma_xfer struct. In that case, encode_addr_size_pairs() will try to access the sgt which will lead to a general protection fault. Return an EINVAL in case the user provides a zero-sized ALP, or the device requests continuation after all of the bytes have been transferred. Fixes: 96d3c1cadedb ("accel/qaic: Clean up integer overflow checking in map_user_pages()") Signed-off-by: Youssef Samir <quic_yabdulra@quicinc.com> Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251007122320.339654-1-youssef.abdulrahman@oss.qualcomm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
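The "remaining == 0 is an error" rule above can be illustrated with a small validation helper. The name `remaining_bytes` and its signature are hypothetical; the sketch only shows rejecting a zero-length remainder before any later step would dereference a table that was never allocated.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical validation: compute how many bytes are still to be
 * transferred. Returns 0 and stores the count, or -EINVAL when nothing
 * remains (zero-sized request, or continuation after a full transfer). */
static int remaining_bytes(uint64_t total, uint64_t xferred,
                           uint64_t *remaining)
{
    if (xferred > total)
        return -EINVAL;          /* subtraction would underflow */

    *remaining = total - xferred;
    if (*remaining == 0)
        return -EINVAL;          /* nothing left to map: treat as error */

    return 0;
}
```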
2025-10-23 | accel/qaic: Fix bootlog initialization ordering | Jeffrey Hugo | 1 | -2/+3
[ Upstream commit fd6e385528d8f85993b7bfc6430576136bb14c65 ] As soon as we queue MHI buffers to receive the bootlog from the device, we could be receiving data. Therefore all the resources needed to process that data need to be setup prior to queuing the buffers. We currently initialize some of the resources after queuing the buffers which creates a race between the probe() and any data that comes back from the device. If the uninitialized resources are accessed, we could see page faults. Fix the init ordering to close the race. Fixes: 5f8df5c6def6 ("accel/qaic: Add bootlog debugfs") Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251007115750.332169-1-youssef.abdulrahman@oss.qualcomm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-09 | accel/ivpu: Prevent recovery work from being queued during device removal | Karol Wachowski | 3 | -4/+4
commit 69a79ada8eb034ce016b5b78fb7d08d8687223de upstream. Use disable_work_sync() instead of cancel_work_sync() in ivpu_dev_fini() to ensure that no new recovery work items can be queued after device removal has started. Previously, recovery work could be scheduled even after canceling existing work, potentially leading to use-after-free bugs if recovery accessed freed resources. Rename ivpu_pm_cancel_recovery() to ivpu_pm_disable_recovery() to better reflect its new behavior. Fixes: 58cde80f45a2 ("accel/ivpu: Use dedicated work for job timeout detection") Cc: stable@vger.kernel.org # v6.8+ Signed-off-by: Karol Wachowski <karol.wachowski@intel.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://lore.kernel.org/r/20250808110939.328366-1-jacek.lawrynowicz@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
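The difference between cancelling and disabling a work item can be modelled with a toy state machine. This is not the kernel workqueue API: `struct toy_work` and the `toy_*` functions are hypothetical, and the model ignores waiting for a running handler. It only shows that a cancelled item can be queued again while a disabled one cannot.

```c
#include <stdbool.h>

/* Toy model of a work item; not the kernel's struct work_struct. */
struct toy_work {
    bool pending;   /* queued and waiting to run */
    bool disabled;  /* set once teardown has started */
};

/* Returns true if the work was newly queued. */
static bool toy_queue(struct toy_work *w)
{
    if (w->disabled || w->pending)
        return false;
    w->pending = true;
    return true;
}

/* Cancel semantics: drop pending work, but allow later re-queueing. */
static void toy_cancel(struct toy_work *w)
{
    w->pending = false;
}

/* Disable semantics: drop pending work and refuse all later queueing. */
static void toy_disable(struct toy_work *w)
{
    w->disabled = true;
    w->pending = false;
}
```

In the cancel case, a late trigger can still queue the work after teardown has begun, which matches the window the commit message describes.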
2025-08-28 | accel/habanalabs/gaudi2: Use kvfree() for memory allocated with kvcalloc() | Thorsten Blum | 1 | -1/+1
commit a44458dfd5bc0c79c6739c3f4c658361d3a5126b upstream. Use kvfree() to fix the following Coccinelle/coccicheck warning reported by kfree_mismatch.cocci: WARNING kvmalloc is used to allocate this memory at line 10398 Fixes: f728c17fc97a ("accel/habanalabs/gaudi2: move HMMU page tables to device memory") Reported-by: Qianfeng Rong <rongqianfeng@vivo.com> Closes: https://patch.msgid.link/20250808085530.233737-1-rongqianfeng@vivo.com Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> [lukas: acknowledge Qianfeng, adjust Thorsten's domain, add Fixes tag] Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Cc: stable@vger.kernel.org # v6.9+ Link: https://patch.msgid.link/20240820231028.136126-1-thorsten.blum@toblux.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-20 | habanalabs: fix UAF in export_dmabuf() | Al Viro | 1 | -16/+7
[ Upstream commit 33927f3d0ecdcff06326d6e4edb6166aed42811c ] As soon as we'd inserted a file reference into descriptor table, another thread could close it. That's fine for the case when all we are doing is returning that descriptor to userland (it's a race, but it's a userland race and there's nothing the kernel can do about it). However, if we follow fd_install() with any kind of access to objects that would be destroyed on close (be it the struct file itself or anything destroyed by its ->release()), we have a UAF. dma_buf_fd() is a combination of reserving a descriptor and fd_install(). habanalabs export_dmabuf() calls it and then proceeds to access the objects destroyed on close. In particular, it grabs an extra reference to another struct file that will be dropped as part of ->release() for ours; that "will be" is actually "might have already been". Fix that by reserving descriptor before anything else and do fd_install() only when everything had been set up. As a side benefit, we no longer have the failure exit with file already created, but reference to underlying file (as well as ->dmabuf_export_cnt, etc.) not grabbed yet; unlike dma_buf_fd(), fd_install() can't fail. Fixes: db1a8dd916aa ("habanalabs: add support for dma-buf exporter") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15 | accel/ivpu: Fix reset_engine debugfs file logic | Andrzej Kacprowski | 1 | -34/+8
commit 541a137254c71822e7a3ebdf8309c5a37b7de465 upstream. The current reset_engine implementation unconditionally resets all engines. Improve implementation to reset only the engine requested by the user space to allow more granular testing. Also use DEFINE_DEBUGFS_ATTRIBUTE() to simplify implementation. Same changes applied to resume_engine debugfs file for consistency. Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-22-jacek.lawrynowicz@linux.intel.com Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06 | accel/ivpu: Trigger device recovery on engine reset/resume failure | Karol Wachowski | 2 | -4/+11
[ Upstream commit a47e36dc5d90dc664cac87304c17d50f1595d634 ] Trigger full device recovery when the driver fails to restore device state via engine reset and resume operations. This is necessary because, even if submissions from a faulty context are blocked, the NPU may still process previously submitted faulty jobs if the engine reset fails to abort them. Such jobs can continue to generate faults and occupy device resources. When engine reset is ineffective, the only way to recover is to perform a full device recovery. Fixes: dad945c27a42 ("accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW") Cc: stable@vger.kernel.org # v6.15+ Signed-off-by: Karol Wachowski <karol.wachowski@intel.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://lore.kernel.org/r/20250528154253.500556-1-jacek.lawrynowicz@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06 | accel/ivpu: Add debugfs interface for setting HWS priority bands | Karol Wachowski | 4 | -18/+121
[ Upstream commit 320323d2e5456df9d6236ac1ce9c030b1a74aa5b ] Add debugfs interface to modify following priority bands properties: * grace period * process grace period * process quantum This allows for the adjustment of hardware scheduling algorithm parameters for each existing priority band, facilitating validation and fine-tuning. Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Karol Wachowski <karol.wachowski@intel.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250204084622.2422544-4-jacek.lawrynowicz@linux.intel.com Stable-dep-of: a47e36dc5d90 ("accel/ivpu: Trigger device recovery on engine reset/resume failure") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06 | accel/ivpu: Separate DB ID and CMDQ ID allocations from CMDQ allocation | Karol Wachowski | 1 | -24/+64
[ Upstream commit 950942b4813f8c44dbec683fdb140cf4a238516b ] Move doorbell ID and command queue ID XArray allocations from command queue memory allocation function. This will allow ID allocations to be done without the need for actual memory allocation. Signed-off-by: Karol Wachowski <karol.wachowski@intel.com> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-2-maciej.falkowski@linux.intel.com Stable-dep-of: a47e36dc5d90 ("accel/ivpu: Trigger device recovery on engine reset/resume failure") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06 | accel/ivpu: Make command queue ID allocated on XArray | Karol Wachowski | 4 | -46/+60
[ Upstream commit 76ad741ec7349bb1112f3a0ff27adf1ca75cf025 ] Use XArray for dynamic command queue ID allocations instead of fixed ones. This is required by upcoming changes to UAPI that will allow to manage command queues by user space instead of having predefined number of queues in a context. Signed-off-by: Karol Wachowski <karol.wachowski@intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241017145817.121590-8-jacek.lawrynowicz@linux.intel.com Stable-dep-of: a47e36dc5d90 ("accel/ivpu: Trigger device recovery on engine reset/resume failure") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06 | accel/ivpu: Remove copy engine support | Andrzej Kacprowski | 3 | -36/+20
[ Upstream commit 94b2a2c0e7cba3f163609dbd94120ee533ad2a07 ] Copy engine was deprecated by the FW and is no longer supported. Compute engine includes all copy engine functionality and should be used instead. This change does not affect user space as the copy engine was never used outside of a couple of tests. Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241017145817.121590-4-jacek.lawrynowicz@linux.intel.com Stable-dep-of: a47e36dc5d90 ("accel/ivpu: Trigger device recovery on engine reset/resume failure") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-06 | accel/ivpu: Do not fail on cmdq if failed to allocate preemption buffers | Karol Wachowski | 1 | -11/+16
[ Upstream commit 08eb99ce911d3ea202f79b42b96cd6e8498f7f69 ] Allow to proceed with job command queue creation even if preemption buffers failed to be allocated, print warning that preemption on such command queue will be disabled. Signed-off-by: Karol Wachowski <karol.wachowski@intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-26-jacek.lawrynowicz@linux.intel.com Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Stable-dep-of: a47e36dc5d90 ("accel/ivpu: Trigger device recovery on engine reset/resume failure") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27 | accel/ivpu: Fix warning in ivpu_gem_bo_free() | Jacek Lawrynowicz | 1 | -1/+2
commit 91274fd4ed9ba110b02c53d71d2778b7d13b49ac upstream. Don't WARN if imported buffers are in use in ivpu_gem_bo_free() as they can be indeed used in the original context/driver. Fixes: 647371a6609d ("accel/ivpu: Add GEM buffer object management") Cc: stable@vger.kernel.org # v6.3 Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://lore.kernel.org/r/20250528171220.513225-1-jacek.lawrynowicz@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27 | accel/ivpu: Use dma_resv_lock() instead of a custom mutex | Jacek Lawrynowicz | 2 | -30/+34
commit 98d3f772ca7d6822bdfc8c960f5f909574db97c9 upstream. This fixes potential race conditions in: - ivpu_bo_unbind_locked() where we modified the shmem->sgt without holding the dma_resv_lock(). - ivpu_bo_print_info() where we read the shmem->pages without holding the dma_resv_lock(). Using dma_resv_lock() also protects against future synchronisation issues that may arise when accessing drm_gem_shmem_object or drm_gem_object members. Fixes: 42328003ecb6 ("accel/ivpu: Refactor BO creation functions") Cc: stable@vger.kernel.org # v6.9+ Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://lore.kernel.org/r/20250528154325.500684-1-jacek.lawrynowicz@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27 | accel/ivpu: Use firmware names from upstream repo | Jacek Lawrynowicz | 1 | -6/+6
commit 1c2c0e29f24360b3130c005a3c261cb8c7b363c6 upstream. Use FW names from linux-firmware repo instead of deprecated ones. The vpu_37xx.bin style names were never released and were only used for internal testing, so it is safe to remove them. Fixes: c140244f0cfb ("accel/ivpu: Add initial Panther Lake support") Cc: stable@vger.kernel.org # v6.13+ Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://lore.kernel.org/r/20250506092030.280276-1-jacek.lawrynowicz@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27 | accel/ivpu: Improve buffer object logging | Jacek Lawrynowicz | 2 | -8/+18
commit a01e93ee44f7ed76f872d0ede82f8d31bf0a048a upstream. - Fix missing alloc log when drm_gem_handle_create() fails in drm_vma_node_allow() and open callback is not called - Add ivpu_bo->ctx_id, which makes it possible to log the actual context id instead of using 0 as default - Add a couple of WARNs and errors so we can catch more memory corruption issues Fixes: 37dee2a2f433 ("accel/ivpu: Improve buffer object debug logs") Cc: stable@vger.kernel.org # v6.8+ Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://lore.kernel.org/r/20250506091303.262034-1-jacek.lawrynowicz@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-10 | accel/ivpu: Update power island delays | Karol Wachowski | 2 | -17/+34
commit 88bdd1644ca28d48591b2a1e6e8b8c2b13f4bd3f upstream. Apply Hardware Architecture Specification compatible delays for main island power delivery for 50xx and above. Signed-off-by: Karol Wachowski <karol.wachowski@intel.com> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241004162505.1695605-3-maciej.falkowski@linux.intel.com Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-10 | accel/ivpu: Add initial Panther Lake support | Maciej Falkowski | 3 | -3/+11
commit c140244f0cfb9601dbc35e7ab90914954a76b3d1 upstream. Add support for the 5th generation of Intel NPU that is going to be present in PTL_P (Panther Lake) CPUs. NPU5 code reuses almost all of previous driver code. Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241004162505.1695605-2-maciej.falkowski@linux.intel.com Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-29 | accel/qaic: Mask out SR-IOV PCI resources | Youssef Samir | 1 | -1/+1
[ Upstream commit 8685520474bfc0fe4be83c3cbfe3fb3e1ca1514a ] During the initialization of the qaic device, pci_select_bars() is used to fetch a bitmask of the BARs exposed by the device. On devices that have Virtual Functions capabilities, the bitmask includes SR-IOV BARs. Use a mask to filter out SR-IOV BARs if they exist. Signed-off-by: Youssef Samir <quic_yabdulra@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250117170943.2643280-6-quic_jhugo@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
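The BAR filtering described above comes down to a bitwise AND. The sketch below is an illustration under a stated assumption, namely that the six standard PCI BARs occupy bits 0..5 of the bitmask and that any SR-IOV BARs appear at higher bit positions. The names `STD_BAR_MASK` and `filter_std_bars` are hypothetical.

```c
#include <stdint.h>

/* Assumption: standard BARs 0..5 map to bits 0..5 of the bitmask;
 * SR-IOV BARs, when present, are reported at higher bit positions. */
#define STD_BAR_MASK 0x3fu

/* Keep only the standard BAR bits from a resource bitmask. */
static uint32_t filter_std_bars(uint32_t bars)
{
    return bars & STD_BAR_MASK;
}
```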
2025-05-22 | accel/ivpu: Fix fw log printing | Jacek Lawrynowicz | 1 | -17/+32
commit 4bc988b47019536b3b1f7d9c5b83893c712d94d6 upstream. - Fix empty log detection that couldn't work without read_wrap_count - Start printing wrapped log from correct position (log_start) - Properly handle logs that are wrapped multiple times in reference to reader position - Don't add a newline when log buffer is wrapped - Always add a newline after printing a log buffer in case log does not end with one Reviewed-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-6-jacek.lawrynowicz@linux.intel.com Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22 | accel/ivpu: Refactor functions in ivpu_fw_log.c | Jacek Lawrynowicz | 3 | -31/+35
commit 1fc1251149a76d3b75d7f4c94d9c4e081b7df6b4 upstream. Make function names more consistent and (arguably) readable in fw log code. Add fw_log_print_all_in_bo() that removes duplicated code in ivpu_fw_log_print(). Reviewed-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-5-jacek.lawrynowicz@linux.intel.com Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22 | accel/ivpu: Reset fw log on cold boot | Tomasz Rusinowicz | 3 | -0/+16
commit 4b4d9e394b6f45ac26ac6144b31604c76b7e3705 upstream. Add ivpu_fw_log_reset() that resets the read_index of all FW logs on cold boot so logs are properly read. Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-4-jacek.lawrynowicz@linux.intel.com Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22 | accel/ivpu: Rename ivpu_log_level to fw_log_level | Jacek Lawrynowicz | 3 | -11/+11
commit 3a3fb8110c65d361cd9d750c9e16520f740c93f2 upstream. Rename module param ivpu_log_level to fw_log_level, so it is clear what log level is actually changed. Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-3-jacek.lawrynowicz@linux.intel.com Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18 | accel/ivpu: Increase state dump msg timeout | Jacek Lawrynowicz | 1 | -1/+1
commit c4eb2f88d2796ab90c5430e11c48709716181364 upstream. Increase JSM message state dump command timeout to 100 ms. On some platforms, the FW may take a bit longer than 50 ms to dump its state to the log buffer and we don't want to miss any debug info during TDR. Fixes: 5e162f872d7a ("accel/ivpu: Add FW state dump on TDR") Cc: stable@vger.kernel.org # v6.13+ Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://lore.kernel.org/r/20250425092822.2194465-1-jacek.lawrynowicz@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-09 | accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW | Karol Wachowski | 1 | -0/+25
commit dad945c27a42dfadddff1049cf5ae417209a8996 upstream. Mark as invalid context of a job that returned HW context violation error and queue work that aborts jobs from faulty context. Add engine reset to the context abort thread handler to not only abort currently executing jobs but also to ensure NPU invalid state recovery. Signed-off-by: Karol Wachowski <karol.wachowski@intel.com> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-13-maciej.falkowski@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-09 | accel/ivpu: Fix locking order in ivpu_job_submit | Karol Wachowski | 1 | -9/+6
commit ab680dc6c78aa035e944ecc8c48a1caab9f39924 upstream. Fix deadlock in job submission and abort handling. When a thread aborts currently executing jobs due to a fault, it first locks the global lock protecting submitted_jobs (#1). After the last job is destroyed, it proceeds to release the related context and locks file_priv (#2). Meanwhile, in the job submission thread, the file_priv lock (#2) is taken first, and then the submitted_jobs lock (#1) is obtained when a job is added to the submitted jobs list.

    CPU0                                  CPU1
    ----                                  ----
    (for example due to a fault)          (jobs submissions keep coming)

    lock(&vdev->submitted_jobs_lock) #1
    ivpu_jobs_abort_all()
    job_destroy()
                                          lock(&file_priv->lock)           #2
                                          lock(&vdev->submitted_jobs_lock) #1
    file_priv_release()
    lock(&vdev->context_list_lock)
    lock(&file_priv->lock)           #2

This order of locking causes a deadlock. To resolve this issue, change the order of locking in ivpu_job_submit(). Signed-off-by: Karol Wachowski <karol.wachowski@intel.com> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-12-maciej.falkowski@linux.intel.com [ This backport required small adjustments to ivpu_job_submit(), which lacks support for explicit command queue creation added in 6.15. ] Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-09 | accel/ivpu: Abort all jobs after command queue unregister | Karol Wachowski | 6 | -48/+77
commit 5bbccadaf33eea2b879d8326ad59ae0663be47d1 upstream. With hardware scheduler it is not expected to receive JOB_DONE notifications from NPU FW for the jobs aborted due to command queue destroy JSM command. Remove jobs submitted to unregistered command queue from submitted_jobs_xa to avoid triggering a TDR in such case. Add explicit submitted_jobs_lock that protects access to list of submitted jobs which is now used to find jobs to abort. Move context abort procedure to separate work queue not to slow down handling of IPCs or DCT requests in case where job abort takes longer, especially when destruction of the last job of a specific context results in context release. Signed-off-by: Karol Wachowski <karol.wachowski@intel.com> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-4-maciej.falkowski@linux.intel.com [ This backport removes all the lines from upstream commit related to the command queue UAPI, as it is not present in the 6.12 kernel and should not be backported. ] Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-09 | accel/ivpu: Update VPU FW API headers | Andrzej Kacprowski | 4 | -59/+292
commit a4293cc75348409f998c991c48cbe5532c438114 upstream. This commit bumps: - Boot API from 3.24.0 to 3.26.3 - JSM API from 3.16.0 to 3.25.0 Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com> Co-developed-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com> Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-2-jacek.lawrynowicz@linux.intel.com Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-09 | accel/ivpu: Fix a typo | Andrew Kreimer | 1 | -1/+1
commit 284a8908f5ec25355a831e3e2d87975d748e98dc upstream. Fix a typo in comments. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20240909135655.45938-1-algonell@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-09  accel/ivpu: Use xa_alloc_cyclic() instead of custom function  Karol Wachowski  3  -37/+12
commit ae7af7d8dc2a13a427aa90d003fe4fb2c168342a upstream.

Remove custom ivpu_id_alloc() wrapper used for ID allocations and replace it with standard xa_alloc_cyclic() API.

The idea behind ivpu_id_alloc() was to have monotonic IDs, so the driver is easier to debug because same IDs are not reused all over. The same can be achieved just by using appropriate Linux API.

Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017145817.121590-7-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-09  accel/ivpu: Make DB_ID and JOB_ID allocations incremental  Tomasz Rusinowicz  3  -10/+43
commit c3b0ec0fe0c7ebc4eb42ba60f7340ecdb7aae1a2 upstream.

Save last used ID and use it to limit the possible values for the ID. This should decrease the rate at which the IDs are reused, which will make debugging easier.

Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-19-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
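Note: the two ID allocation commits above both aim to hand out IDs incrementally so that a freed ID is not reused right away. The following is a minimal userspace sketch of that cyclic idea, not the driver code and not the XArray implementation. The names `id_pool`, `id_alloc_cyclic()`, `id_free()` and the 1..8 range are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ID range, kept tiny so the wrap-around is easy to follow. */
#define ID_MIN 1u
#define ID_MAX 8u

struct id_pool {
	bool used[ID_MAX + 1];	/* used[id] is true while id is allocated */
	uint32_t next;		/* where the next search starts */
};

/* Return the first free ID at or after pool->next, wrapping once; -1 if full. */
static int id_alloc_cyclic(struct id_pool *pool)
{
	uint32_t span = ID_MAX - ID_MIN + 1;
	uint32_t start = pool->next;

	/* A zero-initialized pool starts searching at ID_MIN. */
	if (start < ID_MIN || start > ID_MAX)
		start = ID_MIN;

	for (uint32_t i = 0; i < span; i++) {
		uint32_t id = ID_MIN + (start - ID_MIN + i) % span;

		if (!pool->used[id]) {
			pool->used[id] = true;
			/* Remember the successor so freed IDs are not reused at once. */
			pool->next = (id == ID_MAX) ? ID_MIN : id + 1;
			return (int)id;
		}
	}

	return -1;
}

/* Release an ID; it becomes available again only when the search wraps to it. */
static void id_free(struct id_pool *pool, uint32_t id)
{
	if (id >= ID_MIN && id <= ID_MAX)
		pool->used[id] = false;
}
```

With a zero-initialized pool, allocating twice, freeing the first ID and allocating again yields a new ID rather than the one just freed, which is the debugging property the commit messages describe.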
2025-05-09  accel/ivpu: Correct DCT interrupt handling  Karol Wachowski  2  -9/+11
[ Upstream commit e53e004e346062e15df9511bd4b5a19e34701384 ]

Fix improper use of dct_active_percent field in DCT interrupt handler causing DCT to never get enabled.

Set dct_active_percent internally before IPC to ensure correct driver value even if IPC fails.

Set default DCT value to 30 according to HW architecture specification.

Fixes: a19bffb10c46 ("accel/ivpu: Implement DCT handling")
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250416102616.384577-1-maciej.falkowski@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02  accel/ivpu: Fix the NPU's DPU frequency calculation  Andrzej Kacprowski  5  -86/+64
[ Upstream commit 6c2b75404d33caa46a582f2791a70f92232adb71 ]

Fix the frequency returned to the user space by the DRM_IVPU_PARAM_CORE_CLOCK_RATE GET_PARAM IOCTL. The kernel driver returned CPU frequency for MTL and bare PLL frequency for LNL - this was inconsistent and incorrect for both platforms. With this fix the driver returns maximum frequency of the NPU data processing unit (DPU) for all HW generations. This is what user space always expected.

Also do not set CPU frequency in boot params - the firmware does not use frequency passed from the driver, it was only used by the early pre-production firmware. With that we can remove CPU frequency calculation code.

Show NPU frequency in FREQ_CHANGE interrupt when frequency tracking is enabled.

Fixes: 8a27ad81f7d3 ("accel/ivpu: Split IP and buttress code")
Cc: stable@vger.kernel.org # v6.11+
Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250401155912.4049340-2-maciej.falkowski@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02  accel/ivpu: Add auto selection logic for job scheduler  Jacek Lawrynowicz  8  -15/+52
[ Upstream commit 436b67d6936b5658426e40d0df8f147239bc532b ]

Add ivpu_fw_sched_mode_select() function that can select scheduling mode based on HW and FW versions. This prepares for a switch to HWS on selected platforms.

Reviewed-by: Karol Wachowski <karol.wachowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-17-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Stable-dep-of: 6c2b75404d33 ("accel/ivpu: Fix the NPU's DPU frequency calculation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20  accel/ivpu: Fix deadlock in ivpu_ms_cleanup()  Jacek Lawrynowicz  1  -0/+6
commit 9a6f56762d23a1f3af15e67901493c927caaf882 upstream.

Fix deadlock in ivpu_ms_cleanup() by preventing runtime resume after file_priv->ms_lock is acquired.

During a failure in runtime resume, a cold boot is executed, which calls ivpu_ms_cleanup_all(). This function calls ivpu_ms_cleanup() that acquires file_priv->ms_lock and causes the deadlock.

Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support")
Cc: stable@vger.kernel.org # v6.11+
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250325114306.3740022-2-maciej.falkowski@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20  accel/ivpu: Fix warning in ivpu_ipc_send_receive_internal()  Jacek Lawrynowicz  1  -1/+2
commit 6b4568b675b14cf890c0c21779773c3e08e80ce5 upstream.

Warn if device is suspended only when runtime PM is enabled. Runtime PM is disabled during reset/recovery and it is not an error to use ivpu_ipc_send_receive_internal() in such cases.

Fixes: 5eaa49741119 ("accel/ivpu: Prevent recovery invocation during probe and resume")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250325114219.3739951-1-maciej.falkowski@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20  accel/ivpu: Fix PM related deadlocks in MS IOCTLs  Jacek Lawrynowicz  2  -2/+20
commit d893da85e06edf54737bb80648bb58ba8fd56d9f upstream.

Prevent runtime resume/suspend while MS IOCTLs are in progress. Failed suspend will call ivpu_ms_cleanup() that would try to acquire file_priv->ms_lock, which is already held by the IOCTLs.

Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support")
Cc: stable@vger.kernel.org # v6.11+
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://lore.kernel.org/r/20250325114306.3740022-3-maciej.falkowski@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-29  accel/qaic: Fix integer overflow in qaic_validate_req()  Dan Carpenter  1  -1/+3
commit 67d15c7aa0864dfd82325c7e7e7d8548b5224c7b upstream.

These are u64 variables that come from the user via qaic_attach_slice_bo_ioctl(). Use check_add_overflow() to ensure that the math doesn't have an integer wrapping bug.

Cc: stable@vger.kernel.org
Fixes: ff13be830333 ("accel/qaic: Add datapath")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/176388fa-40fe-4cb4-9aeb-2c91c22130bd@stanley.mountain
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
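Note: the integer wrapping bug described above can be illustrated outside the kernel. The sketch below is a userspace approximation, not the actual qaic_validate_req() change. It assumes GCC or Clang and calls the compiler builtin __builtin_add_overflow() directly, which is the primitive the kernel's check_add_overflow() wraps. The helper name `slice_in_bounds()` and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether the byte range
 * [offset, offset + size) fits inside a buffer of total_size bytes.
 * Both offset and size are treated as untrusted 64-bit user input.
 */
static bool slice_in_bounds(uint64_t offset, uint64_t size, uint64_t total_size)
{
	uint64_t end;

	/* The builtin returns true when offset + size wrapped around. */
	if (__builtin_add_overflow(offset, size, &end))
		return false;

	return end <= total_size;
}
```

A plain `offset + size > total_size` comparison would accept an offset near UINT64_MAX, because the sum wraps to a small value; checking for the wrap first rejects it.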
2025-03-29  accel/qaic: Fix possible data corruption in BOs > 2G  Jeffrey Hugo  1  -1/+4
[ Upstream commit 84a833d90635e4b846333e2df0ae72f9cbecac39 ]

When slicing a BO, we need to iterate through the BO's sgt to find the right pieces to construct the slice. Some of the data types chosen for this process are incorrectly too small, and can overflow. This can result in the incorrect slice construction, which can lead to data corruption in workload execution.

The device can only handle 32-bit sized transfers, and the scatterlist struct only supports 32-bit buffer sizes, so our upper limit for an individual transfer is an unsigned int. Using an int is incorrect due to the reservation of the sign bit. Upgrade the length of a scatterlist entry and the offsets into a scatterlist entry to unsigned int for a correct representation.

While each transfer may be limited to 32-bits, the overall BO may exceed that size. For counting the total length of the BO, we need a type that can represent the largest allocation possible on the system. That is the definition of size_t, so use it.

Fixes: ff13be830333 ("accel/qaic: Add datapath")
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Reviewed-by: Troy Hanson <quic_thanson@quicinc.com>
Reviewed-by: Youssef Samir <quic_yabdulra@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306171959.853466-1-jeff.hugo@oss.qualcomm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
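Note: the type reasoning in the commit above can be sketched in plain C. This is an illustrative userspace example, not the qaic code. It assumes a platform where `unsigned int` is 32 bits wide. `struct seg`, `total_len()` and `fits_in_int()` are hypothetical stand-ins for a scatterlist entry and the driver's length accounting.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry: only the length matters here. */
struct seg {
	unsigned int len;	/* one transfer, limited to 32 bits */
};

/* Sum the segment lengths in size_t so the total can exceed 32 bits. */
static size_t total_len(const struct seg *segs, size_t count)
{
	size_t total = 0;

	for (size_t i = 0; i < count; i++)
		total += segs[i].len;

	return total;
}

/* A signed int loses the top bit, so lengths of 2 GiB and above do not fit. */
static int fits_in_int(unsigned int len)
{
	return len <= (unsigned int)INT_MAX;
}
```

On a 64-bit build, three 2 GiB segments add up to 6 GiB in `total_len()`, while any single one of them already fails `fits_in_int()`.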
2025-02-27  accel/ivpu: Fix error handling in recovery/reset  Jacek Lawrynowicz  1  -36/+43
[ Upstream commit 41a2d8286c905614f29007f1bc8e652d54654b82 ]

Disable runtime PM for the duration of reset/recovery so it is possible to set the correct runtime PM state depending on the outcome of the `ivpu_resume()`. Don’t suspend or reset the HW if the NPU is suspended when the reset/recovery is requested. Also, move common reset/recovery code to separate functions for better code readability.

Fixes: 27d19268cf39 ("accel/ivpu: Improve recovery and reset support")
Cc: stable@vger.kernel.org # v6.8+
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129124009.1039982-4-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-27  accel/ivpu: Add FW state dump on TDR  Tomasz Rusinowicz  7  -0/+43
[ Upstream commit 5e162f872d7af8f041b143536617ab2563ea7de5 ]

Send JSM state dump message at the beginning of TDR handler. This allows FW to collect debug info in the FW log before the state of the NPU is lost, making it possible to analyze the cause of a TDR.

Wait a predefined timeout (10 ms) so the FW has a chance to write debug logs. We cannot wait for JSM response at this point because IRQs are already disabled before TDR handler is invoked.

Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930195322.461209-9-jacek.lawrynowicz@linux.intel.com
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Stable-dep-of: 41a2d8286c90 ("accel/ivpu: Fix error handling in recovery/reset")
Signed-off-by: Sasha Levin <sashal@kernel.org>