kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c, branch linux-6.9.y

drm/amdgpu: Fix the warning info in mode1 reset

2024-01-31T22:34:05+00:00

Fix the warning info below during mode1 reset. [ +0.000004] Call Trace: [ +0.000004] [ +0.000006] ? show_regs+0x6e/0x80 [ +0.000011] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000005] ? __warn+0x91/0x150 [ +0.000009] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000006] ? report_bug+0x19d/0x1b0 [ +0.000013] ? handle_bug+0x46/0x80 [ +0.000012] ? exc_invalid_op+0x1d/0x80 [ +0.000011] ? asm_exc_invalid_op+0x1f/0x30 [ +0.000014] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000007] ? __flush_work.isra.0+0x208/0x390 [ +0.000007] ? _prb_read_valid+0x216/0x290 [ +0.000008] __cancel_work_timer+0x11d/0x1a0 [ +0.000007] ? try_to_grab_pending+0xe8/0x190 [ +0.000012] cancel_work_sync+0x14/0x20 [ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched] [ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu] This warning info was printed after applying the patch "drm/sched: Convert drm scheduler to use a work queue rather than kthread". The root cause is that amdgpu driver tries to use the uninitialized work_struct in the struct drm_gpu_scheduler v2: - Rename the function to amdgpu_ring_sched_ready and move it to amdgpu_ring.c (Alex) v3: - Fix a few more checks based on Vitaly's patch (Alex) v4: - squash in fix noticed by Bert in https://gitlab.freedesktop.org/drm/amd/-/issues/3139 Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling") Reviewed-by: Alex Deucher Signed-off-by: Vitaly Prosyak Signed-off-by: Ma Jun Signed-off-by: Alex Deucher

Merge tag 'amd-drm-next-6.8-2023-12-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

2023-12-05T02:11:41+00:00

amd-drm-next-6.8-2023-12-01: amdgpu: - Add new 64 bit sequence number infrastructure. This will ultimately be used for user queue synchronization. - GPUVM updates - Misc code cleanups - RAS updates - DCN 3.5 updates - Rework PCIe link speed handling - Document GPU reset types - DMUB fixes - eDP fixes - NBIO 7.9 updates - NBIO 7.11 updates - SubVP updates - DCN 3.1.4 fixes - ABM fixes - AGP aperture fix - DCN 3.1.5 fix - Fix some potential error path memory leaks - Enable PCIe PMEs - Add XGMI, PCIe state dumping for aqua vanjaram - GFX11 golden register updates - Misc display fixes amdkfd: - Migrate TLB flushing logic to amdgpu - Trap handler fixes - Fix restore workers handling on suspend and reset - Fix possible memory leak in pqm_uninit() radeon: - Fix some possible overflows in command buffer checking - Check for errors in ring_lock From: Alex Deucher Link: https://patchwork.freedesktop.org/patch/msgid/20231201181743.5313-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie

amdgpu: Adjust kmalloc_array calls for new -Walloc-size

2023-11-17T14:29:54+00:00

GCC 14 introduces a new -Walloc-size included in -Wextra which errors out on various files in drivers/gpu/drm/amd/amdgpu like: ``` amdgpu_amdkfd_gfx_v8.c:241:15: error: allocation of insufficient size ‘4’ for type ‘uint32_t[2]’ {aka ‘unsigned int[2]'} with size ‘8’ [-Werror=alloc-size] ``` This is because each HQD_N_REGS is actually a uint32_t[2]. Move the * 2 to the size argument so GCC sees we're allocating enough. Originally did 'sizeof(uint32_t) * 2' for the size but a friend suggested 'sizeof(**dump)' better communicates the intent. Link: https://lore.kernel.org/all/87wmuwo7i3.fsf@gentoo.org/ Signed-off-by: Sam James Signed-off-by: Alex Deucher

drm/sched: Add drm_sched_wqueue_* helpers

2023-11-01T21:29:20+00:00

Add scheduler wqueue ready, stop, and start helpers to hide the implementation details of the scheduler from the drivers. v2: - s/sched_wqueue/sched_wqueue (Luben) - Remove the extra white line after the return-statement (Luben) - update drm_sched_wqueue_ready comment (Luben) Cc: Luben Tuikov Signed-off-by: Matthew Brost Reviewed-by: Luben Tuikov Link: https://lore.kernel.org/r/20231031032439.1558703-2-matthew.brost@intel.com Signed-off-by: Luben Tuikov

drm/amdgpu: fix debug wait on idle for gfx9.4.1

2023-06-09T16:37:50+00:00

Wait calls for amd_ip_block_type not amd_hw_ip_block_type. Reported-by: Hamza Mahfooz Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: add debug set and clear address watch points operation

2023-06-09T16:36:46+00:00

Shader read, write and atomic memory operations can be alerted to the debugger as an address watch exception. Allow the debugger to pass in a watch point to a particular memory address per device. Note that there exists only 4 watch points per devices to date, so have the KFD keep track of what watch points are allocated or not. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: add debug wave launch mode operation

2023-06-09T16:36:39+00:00

Allow the debugger to set wave behaviour on to either normally operate, halt at launch, trap on every instruction, terminate immediately or stall on allocation. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: add debug wave launch override operation

2023-06-09T16:36:37+00:00

This operation allows the debugger to override the enabled HW exceptions on the device. On debug devices that only support the debugging of a single process, the HW exceptions are global and set through the SPI_GDBG_TRAP_MASK register. Because they are global, only address watch exceptions are allowed to be enabled. In other words, the debugger must preserve all non-address watch exception states in normal mode operation by barring a full replacement override or a non-address watch override request. For multi-process debugging, all HW exception overrides are per-VMID so all exceptions can be overridden or fully replaced. In order for the debugger to know what is permissible, returned the supported override mask back to the debugger along with the previously enable overrides. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdgpu: add configurable grace period for unmap queues

2023-06-09T16:35:31+00:00

The HWS schedule allows a grace period for wave completion prior to preemption for better performance by avoiding CWSR on waves that can potentially complete quickly. The debugger, on the other hand, will want to inspect wave status immediately after it actively triggers preemption (a suspend function to be provided). To minimize latency between preemption and debugger wave inspection, allow immediate preemption by setting the grace period to 0. Note that setting the preepmtion grace period to 0 will result in an infinite grace period being set due to a CP FW bug so set it to 1 for now. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

2023-06-09T16:35:15+00:00

On GFX9.4.1, the implicit wait count instruction on s_barrier is disabled by default in the driver during normal operation for performance requirements. There is a hardware bug in GFX9.4.1 where if the implicit wait count instruction after an s_barrier instruction is disabled, any wave that hits an exception may step over the s_barrier when returning from the trap handler with the barrier logic having no ability to be aware of this, thereby causing other waves to wait at the barrier indefinitely resulting in a shader hang. This bug has been corrected for GFX9.4.2 and onward. Since the debugger subscribes to hardware exceptions, in order to avoid this bug, the debugger must enable implicit wait count on s_barrier for a debug session and disable it on detach. In order to change this setting in the in the device global SQ_CONFIG register, the GFX pipeline must be idle. GFX9.4.1 as a compute device will either dispatch work through the compute ring buffers used for image post processing or through the hardware scheduler by the KFD. Have the KGD suspend and drain the compute ring buffer, then suspend the hardware scheduler and block any future KFD process job requests before changing the implicit wait count setting. Once set, resume all work. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher