kernel/linux.git/drivers/gpu/host1x/hw, branch v6.19.11

gpu: host1x: Syncpoint interrupt performance optimization

2025-11-14T17:27:19+00:00

Optimize performance of syncpoint interrupt handling by reading the status register in 64-bit chunks when possible, and skipping processing when the read value is zero.

Signed-off-by: Mikko Perttunen
Signed-off-by: Thierry Reding
Link: https://patch.msgid.link/20250917-host1x-syncpt-irq-perf-v2-1-736ef69b1347@nvidia.com
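The idea described above can be sketched in plain C. This is only an illustration under stated assumptions, not the driver's code: the status registers are modeled as an array of 32-bit words, and `scan_status`, `log_pending` and `struct pending_log` are hypothetical names introduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log of pending syncpoint IDs, used only by this sketch. */
struct pending_log {
	unsigned int ids[16];
	unsigned int count;
};

/* Hypothetical per-syncpoint handler: record the ID if there is room. */
static void log_pending(unsigned int id, void *data)
{
	struct pending_log *log = data;

	if (log->count < 16)
		log->ids[log->count++] = id;
}

/*
 * Scan 32-bit status words two at a time. The array stands in for the
 * hardware status registers (an assumption of this sketch). Returns the
 * number of set bits that were handed to the handler.
 */
static unsigned int scan_status(const uint32_t *regs, size_t num_regs,
				void (*handler)(unsigned int, void *),
				void *data)
{
	unsigned int handled = 0;
	size_t i;

	for (i = 0; i < num_regs; i += 2) {
		uint64_t status = regs[i];
		unsigned int bit;

		/* Combine with the next word when one is available. */
		if (i + 1 < num_regs)
			status |= (uint64_t)regs[i + 1] << 32;

		/* Nothing pending in this chunk: skip all per-bit work. */
		if (status == 0)
			continue;

		for (bit = 0; status != 0; bit++, status >>= 1) {
			if (status & 1) {
				handler((unsigned int)(i * 32) + bit, data);
				handled++;
			}
		}
	}

	return handled;
}
```

With three words, the first two are combined into one 64-bit value and the last is processed alone. A chunk that reads as zero costs one comparison.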

gpu: host1x: Wait prefences outside MLOCK

2025-09-11T16:56:35+00:00

The current submission opcode sequence first takes the engine MLOCK, and then switches to the HOST1X class to wait for prefences. This is fine while we only use a single channel per engine and there is no virtualization, since jobs are serialized on that one channel anyway. However, when that assumption doesn't hold, we keep the engine locked while waiting for prefences to complete, even though nothing is running on it. To resolve this, execute wait commands at the beginning of the job, outside the engine MLOCK. We still take the HOST1X MLOCK because recent hardware requires register opcodes to be executed within some MLOCK, but the hardware also allows any number of channels to take the HOST1X MLOCK at the same time.

Signed-off-by: Mikko Perttunen
Signed-off-by: Thierry Reding
Link: https://lore.kernel.org/r/20250708-host1x-wait-prefences-outside-mlock-v1-1-13e98044e35a@nvidia.com

gpu: host1x: Add MLOCK recovery for rest of engines

2024-08-29T18:14:29+00:00

Add class IDs / MLOCKs for MLOCK recovery for rest of engines present on Tegra234.

Signed-off-by: Mikko Perttunen
Signed-off-by: Thierry Reding
Link: https://patchwork.freedesktop.org/patch/msgid/20240425050238.2943404-4-cyndis@kapsi.fi

gpu: host1x: Handle CDMA wraparound when debug printing

2024-08-29T18:14:29+00:00

During channel debug information dump, when printing CDMA opcodes, the circular nature of the CDMA pushbuffer wasn't being taken into account, so the dump sometimes read past the end of the buffer. Change the printing to take wraparound into account.

Signed-off-by: Mikko Perttunen
Signed-off-by: Thierry Reding
Link: https://patchwork.freedesktop.org/patch/msgid/20240425050238.2943404-2-cyndis@kapsi.fi
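A minimal sketch of wraparound-aware reading from a circular buffer is shown here. It is not taken from the driver: the pushbuffer is modeled as an array of 32-bit words, and `copy_wrapped` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy up to `count` words out of a circular buffer of `pb_words` words,
 * starting at word index `start`. Indices wrap back to 0 instead of
 * running past the end. Returns the number of words copied.
 */
static size_t copy_wrapped(const uint32_t *pb, size_t pb_words,
			   size_t start, size_t count, uint32_t *out)
{
	size_t i;

	if (pb_words == 0)
		return 0;

	/* Never report more words than the buffer can hold. */
	if (count > pb_words)
		count = pb_words;

	for (i = 0; i < count; i++)
		out[i] = pb[(start + i) % pb_words];

	return count;
}
```

Reading three words starting at the last slot of a four-word buffer yields the last word followed by the first two, rather than touching memory beyond the buffer.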

gpu: host1x: Request syncpoint IRQs only during probe

2024-08-28T15:28:48+00:00

Syncpoint IRQs are currently requested in a code path that runs during resume. Due to this, we get multiple overlapping registered interrupt handlers as host1x is suspended and resumed. Rearrange interrupt code to only request IRQs during initialization.

Signed-off-by: Mikko Perttunen
Signed-off-by: Thierry Reding
Link: https://patchwork.freedesktop.org/patch/msgid/20240531070719.2138-1-cyndis@kapsi.fi

gpu: host1x: Syncpoint interrupt sharding

2023-10-11T20:52:44+00:00

Support sharded syncpoint interrupts on Tegra234+. This feature allows specifying one of eight interrupt lines for each syncpoint to lower processing latency of syncpoint threshold interrupts.

Signed-off-by: Mikko Perttunen
Signed-off-by: Thierry Reding
Link: https://patchwork.freedesktop.org/patch/msgid/20230901114008.672433-1-cyndis@kapsi.fi
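One way to picture sharding is a fixed mapping from syncpoint ID to interrupt line. The sketch below assumes a simple round-robin policy, which the commit message does not specify and which may differ from what the driver does. `SKETCH_NUM_IRQ_LINES`, `sketch_syncpt_irq_line` and `sketch_count_per_line` are hypothetical names.

```c
#include <stddef.h>

/* Eight interrupt lines, as stated in the commit message. */
#define SKETCH_NUM_IRQ_LINES 8u

/* Assumed policy: spread syncpoints over the lines round-robin by ID. */
static unsigned int sketch_syncpt_irq_line(unsigned int syncpt_id)
{
	return syncpt_id % SKETCH_NUM_IRQ_LINES;
}

/* Count how many of the first `num_syncpts` syncpoints land on each line. */
static void sketch_count_per_line(unsigned int num_syncpts,
				  unsigned int counts[SKETCH_NUM_IRQ_LINES])
{
	unsigned int id;
	size_t line;

	for (line = 0; line < SKETCH_NUM_IRQ_LINES; line++)
		counts[line] = 0;

	for (id = 0; id < num_syncpts; id++)
		counts[sketch_syncpt_irq_line(id)]++;
}
```

Under this assumed policy each handler only has to look at roughly one eighth of the syncpoints, which is where the latency benefit would come from.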

gpu: host1x: Use tegra_dev_iommu_get_stream_id()

2023-01-27T16:41:49+00:00

Use the newly implemented tegra_dev_iommu_get_stream_id() helper to encapsulate and centralize the IOMMU stream ID access.

Signed-off-by: Thierry Reding

gpu: host1x: External timeout/cancellation for fences

2023-01-26T14:55:38+00:00

Currently all fences have a 30 second timeout to ensure they are cleaned up if the fence never completes otherwise. However, this one-size-fits-all solution doesn't fit every case, such as syncpoint waiting, where we want to be able to have timeouts longer than 30 seconds. As such, we want to give control over fence cancellation to the caller (and maybe eventually get rid of the internal timeout altogether). Here we add this cancellation mechanism by adding a function that enters the timeout path directly, and changing the syncpoint wait function to use it.

Signed-off-by: Mikko Perttunen
Signed-off-by: Thierry Reding
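The caller-driven cancellation described above can be illustrated with a small state machine in which signaling and cancellation race to complete a fence, and only the first one wins. This is a standalone sketch using C11 atomics, not the kernel's dma_fence API. Every name in it (`sketch_fence`, `sketch_fence_signal`, `sketch_fence_cancel`, and so on) is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum fence_result { FENCE_PENDING, FENCE_SIGNALED, FENCE_CANCELLED };

/* Hypothetical fence: just a completion state for this sketch. */
struct sketch_fence {
	atomic_int state;
};

static void sketch_fence_init(struct sketch_fence *f)
{
	atomic_init(&f->state, FENCE_PENDING);
}

/* Move PENDING -> result exactly once; later attempts return false. */
static bool sketch_fence_complete(struct sketch_fence *f,
				  enum fence_result result)
{
	int expected = FENCE_PENDING;

	return atomic_compare_exchange_strong(&f->state, &expected,
					      (int)result);
}

/* Normal completion path, e.g. from an interrupt handler. */
static bool sketch_fence_signal(struct sketch_fence *f)
{
	return sketch_fence_complete(f, FENCE_SIGNALED);
}

/* Caller-driven path: the waiter decides its own timeout, then cancels. */
static bool sketch_fence_cancel(struct sketch_fence *f)
{
	return sketch_fence_complete(f, FENCE_CANCELLED);
}
```

A waiter that wants a timeout longer than any built-in default would wait for as long as it likes and then call the cancel function. If the fence was signaled in the meantime, the cancel call reports failure and the waiter treats the wait as successful.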

gpu: host1x: Rewrite syncpoint interrupt handling

2023-01-26T14:55:38+00:00

Move from the old, complex intr handling code to a new implementation based on dma_fences. While there is a fair bit of churn to get there, the new implementation is much simpler and likely faster as well due to allowing signaling directly from interrupt context.

Signed-off-by: Mikko Perttunen
Signed-off-by: Thierry Reding

gpu: host1x: Implement job tracking using DMA fences

2023-01-26T14:55:38+00:00

In anticipation of removal of the intr API, implement job tracking using DMA fences instead. The two main changes are making cdma_update schedule the work, since fence completion can now be called from interrupt context, and some added complexity to ensure the callback is not running when we free the fence.

Signed-off-by: Mikko Perttunen
Signed-off-by: Thierry Reding