path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
2025-05-13  drm/amdgpu/userq: Fix lock contention in userq fence  (Arunpravin Paneer Selvam, 1 file, -2/+2)
Fix lockdep warnings.

[ +0.000637] ================================
[ +0.000004] WARNING: inconsistent lock state
[ +0.000004] 6.12.0+ #18 Tainted: G W OE
[ +0.000004] --------------------------------
[ +0.000004] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ +0.000004] Xwayland/1952 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ +0.000005] ffff8884636f4740 (&fence_drv->fence_list_lock){?...}-{2:2}, at: amdgpu_userq_fence_driver_destroy+0xb8/0x540 [amdgpu]
[ +0.000208] {IN-HARDIRQ-W} state was registered at:
[ +0.000004]   lock_acquire.part.0+0x116/0x360
[ +0.000005]   lock_acquire+0x7c/0xc0
[ +0.000005]   _raw_spin_lock+0x2f/0x60
[ +0.000005]   amdgpu_userq_fence_driver_process+0x75/0x400 [amdgpu]
[ +0.000185]   gfx_v12_0_eop_irq+0x29f/0x420 [amdgpu]
[ +0.000210]   amdgpu_irq_dispatch+0x2a4/0x7b0 [amdgpu]
[ +0.000191]   amdgpu_ih_process+0x1e1/0x3d0 [amdgpu]
[ +0.000185]   amdgpu_irq_handler+0x28/0xc0 [amdgpu]
[ +0.000183]   __handle_irq_event_percpu+0x1bb/0x590
[ +0.000005]   handle_irq_event+0xab/0x1d0
[ +0.000005]   handle_edge_irq+0x1fd/0xc10
[ +0.000005]   __common_interrupt+0x83/0x190
[ +0.000004]   common_interrupt+0xb1/0xe0
[ +0.000005]   asm_common_interrupt+0x27/0x40
[ +0.000004]   cpuidle_enter_state+0x2ba/0x530
[ +0.000005]   cpuidle_enter+0x4f/0xb0
[ +0.000006]   call_cpuidle+0x46/0xd0
[ +0.000005]   do_idle+0x367/0x430
[ +0.000004]   cpu_startup_entry+0x58/0x70
[ +0.000005]   start_secondary+0x224/0x2b0
[ +0.000005]   common_startup_64+0x13e/0x141
[ +0.000005] irq event stamp: 88271
[ +0.000004] hardirqs last enabled at (88271): [<ffffffffad9ca7a1>] _raw_spin_unlock_irqrestore+0x51/0x80
[ +0.000005] hardirqs last disabled at (88270): [<ffffffffad9ca424>] _raw_spin_lock_irqsave+0x74/0x80
[ +0.000005] softirqs last enabled at (87858): [<ffffffffaa67377e>] __irq_exit_rcu+0x17e/0x1d0
[ +0.000005] softirqs last disabled at (87849): [<ffffffffaa67377e>] __irq_exit_rcu+0x17e/0x1d0
[ +0.000005] other info that might help us debug this:
[ +0.000004]  Possible unsafe locking scenario:
[ +0.000003]        CPU0
[ +0.000004]        ----
[ +0.000003]   lock(&fence_drv->fence_list_lock);

v2: Drop fence_list_flags and use the xa_lock_irqsave() flags
    parameter (Christian).
    Fix merge conflicts.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
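A minimal sketch of the IRQ-safe locking pattern this fix applies. The
structure and lock names follow the splat above; the function body and
the doorbell_index member are illustrative, not the driver's actual
code:

	static void fence_driver_destroy_sketch(struct amdgpu_device *adev,
						struct amdgpu_userq_fence_driver *fence_drv)
	{
		struct amdgpu_userq_fence *userq_fence, *tmp;
		unsigned long flags;

		/* fence_list_lock is also taken by
		 * amdgpu_userq_fence_driver_process() from the EOP interrupt,
		 * so process context must disable IRQs when acquiring it.
		 */
		spin_lock_irqsave(&fence_drv->fence_list_lock, flags);
		list_for_each_entry_safe(userq_fence, tmp, &fence_drv->fences, link) {
			list_del(&userq_fence->link);	/* tear down leftovers */
			dma_fence_put(&userq_fence->base);
		}
		spin_unlock_irqrestore(&fence_drv->fence_list_lock, flags);

		/* Per the v2 note: no separately stored fence_list_flags; the
		 * xa_lock_irqsave() variant supplies the flags parameter directly.
		 */
		xa_lock_irqsave(&adev->userq_xa, flags);
		__xa_erase(&adev->userq_xa, fence_drv->doorbell_index);
		xa_unlock_irqrestore(&adev->userq_xa, flags);
	}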
2025-05-13  drm/amdgpu: Fix amdgpu_userq_wait_ioctl() warn missing error code 'r'  (Arvind Yadav, 1 file, -1/+3)
To resolve the warning regarding the missing error code 'r' in
amdgpu_userq_wait_ioctl(), assign the value 'r = -EINVAL'.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202505080458.rnV8YfiY-lkp@intel.com/
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-05  drm/amdgpu: only keep most recent fence for each context  (Arvind Yadav, 1 file, -0/+6)
Keep only the latest fence for each context to reduce the number of
values given back to userspace.

v2: Export this code from dma-fence-unwrap.c (Christian).
v3: Split this into a dma_buf patch and an amd userq patch (Sunil).
    No need to add a new function, just re-use the existing one
    (Christian).
v4: Export the dma_fence_dedub_array function and use it (Christian).

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
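The merged series exports a dedup helper from dma-fence-unwrap.c (see
v4 above); the loop below is an illustrative stand-in for the idea of
keeping only the latest fence per context, not that helper's actual
implementation:

	#include <linux/dma-fence.h>

	/* Compact the array in place, keeping the chronologically latest
	 * fence for each fence context and dropping superseded references.
	 */
	static int dedup_fences_sketch(struct dma_fence **fences, int num)
	{
		int i, j, kept = 0;

		for (i = 0; i < num; i++) {
			struct dma_fence *f = fences[i];

			for (j = 0; j < kept; j++) {
				if (fences[j]->context != f->context)
					continue;
				if (dma_fence_is_later(f, fences[j])) {
					dma_fence_put(fences[j]);
					fences[j] = f;
				} else {
					dma_fence_put(f);
				}
				break;
			}
			if (j == kept)
				fences[kept++] = f;
		}
		return kept;	/* number of unique contexts retained */
	}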
2025-05-01  drm/amdgpu/userq: Call unreserve on error in amdgpu_userq_fence_read_wptr()  (Dan Carpenter, 1 file, -0/+1)
This error path should call amdgpu_bo_unreserve() before returning.

Fixes: d8675102ba32 ("drm/amdgpu: add vm root BO lock before accessing the vm")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
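A sketch of the corrected error path, with the body and helper name
illustrative: once the vm root BO has been reserved, every return must
unreserve it first.

	static int read_wptr_sketch(struct amdgpu_vm *vm, u64 wptr_addr, u64 *wptr)
	{
		struct amdgpu_bo_va_mapping *mapping;
		int r;

		r = amdgpu_bo_reserve(vm->root.bo, false);
		if (r)
			return r;

		mapping = amdgpu_vm_bo_lookup_mapping(vm, wptr_addr >> AMDGPU_GPU_PAGE_SHIFT);
		if (!mapping) {
			amdgpu_bo_unreserve(vm->root.bo);	/* the call this fix adds */
			return -EINVAL;
		}

		/* ... map the BO behind the mapping and read *wptr ... */
		amdgpu_bo_unreserve(vm->root.bo);
		return 0;
	}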
2025-05-01  drm/amdgpu: remove DRM_AMDGPU_NAVI3X_USERQ config for UQ  (Arvind Yadav, 1 file, -18/+0)
The DRM_AMDGPU_NAVI3X_USERQ config option is not required for usermode
queues.

v2: rebase.

Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22  drm/amdgpu/userq: use consistent function naming  (Alex Deucher, 1 file, -2/+2)
s/userqueue/userq/

1. removes the mix of amdgpu_userqueue and amdgpu_userq
2. consistent with the rest of amdgpu_userq_fence.c
3. it's shorter

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22  drm/amdgpu: Add queue id support to the user queue wait IOCTL  (Arunpravin Paneer Selvam, 1 file, -8/+12)
Add queue id support to the user queue wait IOCTL's
drm_amdgpu_userq_wait structure.  This is required to retrieve the
waiting user queue and maintain the fence driver references in it, so
that user queues in the same context release their references to the
fence drivers at some point before queue destruction.  Otherwise, we
would accumulate those references until we have no more space left and
crash.

v2: Modify the UAPI comment as per the mesa and libdrm UAPI comments.

Libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/408
Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34493

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21  drm/amdgpu: Clean up error handling in amdgpu_userq_fence_driver_alloc()  (Dan Carpenter, 1 file, -5/+2)
1) Checkpatch complains if we print an error message for kzalloc()
   failure.  The kzalloc() failure already has its own error messages
   built in.  Also, this allocation is small enough that it is
   guaranteed to succeed.

2) Return directly instead of doing a goto free_fence_drv.  The
   "fence_drv" pointer is still NULL at that point, so no cleanup is
   necessary.

Reviewed-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21  drm/amdgpu: Fix double free in amdgpu_userq_fence_driver_alloc()  (Dan Carpenter, 1 file, -5/+2)
The goto frees "fence_drv", so this is a double free bug.  There is no
need to call amdgpu_seq64_free(adev, fence_drv->va) since the seq64
allocation failed, so change the goto to "goto free_fence_drv".  Also
propagate the error code from amdgpu_seq64_alloc() instead of hard
coding it to -ENOMEM.

Fixes: e7cf21fbb277 ("drm/amdgpu: Few optimization and fixes for userq fence driver")
Reviewed-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
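A sketch of the corrected unwind, with signatures simplified per the
commits in this log (the gpu_addr parameter follows the seq64 change
below; the function body is illustrative): when amdgpu_seq64_alloc()
fails there is nothing from seq64 to free, so only the fence_drv
allocation is undone, and the error code is propagated.

	static int fence_driver_alloc_sketch(struct amdgpu_device *adev,
					     struct amdgpu_userq_fence_driver **out)
	{
		struct amdgpu_userq_fence_driver *fence_drv;
		int r;

		fence_drv = kzalloc(sizeof(*fence_drv), GFP_KERNEL);
		if (!fence_drv)
			return -ENOMEM;	/* small alloc: no extra error message needed */

		r = amdgpu_seq64_alloc(adev, &fence_drv->va, &fence_drv->gpu_addr,
				       &fence_drv->cpu_addr);
		if (r)
			goto free_fence_drv;	/* not a seq64_free path: nothing to free there */

		/* ... remaining setup; later failures would unwind via seq64_free ... */
		*out = fence_drv;
		return 0;

	free_fence_drv:
		kfree(fence_drv);
		return r;	/* propagate, don't hard-code -ENOMEM */
	}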
2025-04-21  drm/amdgpu/userq: move some code around  (Alex Deucher, 1 file, -0/+26)
Move some userq fence handling code into amdgpu_userq_fence.c.  This
matches the other code in that file.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08  drm/amdgpu: Fix display freeze lockup error  (Arvind Yadav, 1 file, -25/+46)
A deadlock situation has arisen between the userq signal ioctl and the
eviction fence.  In this scenario, amdgpu_userq_signal_ioctl() has
acquired a reservation lock on the read/write buffer object (BO)
through drm_exec.  Subsequently, it calls
amdgpu_userqueue_ensure_ev_fence(), which waits for the userq resume
work.  Meanwhile, the userq suspend worker has initiated the userq
resume work (amdgpu_userqueue_resume_worker).  This resume work
attempts to validate the vm->done BO, leading
amdgpu_userqueue_validate_bos to also attempt to take the reservation
lock on the same write BO that is already locked by
amdgpu_userq_signal_ioctl.  As a result, the resume work becomes
stalled, causing amdgpu_userqueue_ensure_ev_fence to remain in a
waiting state.

Call Trace:
[  242.836469] INFO: task gnome-shel:cs0:1288 blocked for more than 120 seconds.
[  242.836486] Tainted: G OE 6.12.0-rc2rebased-oct-24+ #4
[  242.836491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.836494] task:gnome-shel:cs0 state:D stack:0 pid:1288 tgid:1282 ppid:1180 flags:0x00000002
[  242.836503] Call Trace:
[  242.836508]  <TASK>
[  242.836517]  __schedule+0x3e0/0xb10
[  242.836530]  ? srso_return_thunk+0x5/0x5f
[  242.836541]  schedule+0x31/0x120
[  242.836546]  schedule_timeout+0x150/0x160
[  242.836551]  ? srso_return_thunk+0x5/0x5f
[  242.836555]  ? sysvec_call_function+0x69/0xd0
[  242.836562]  ? srso_return_thunk+0x5/0x5f
[  242.836567]  ? preempt_count_add+0x7f/0xd0
[  242.836577]  __wait_for_common+0x91/0x180
[  242.836582]  ? __pfx_schedule_timeout+0x10/0x10
[  242.836590]  wait_for_completion+0x28/0x30
[  242.836595]  __flush_work+0x16c/0x290
[  242.836602]  ? __pfx_wq_barrier_func+0x10/0x10
[  242.836611]  flush_delayed_work+0x3a/0x60
[  242.836621]  amdgpu_userqueue_ensure_ev_fence+0x2d/0xb0 [amdgpu]
[  242.836966]  amdgpu_userq_signal_ioctl+0x959/0xec0 [amdgpu]
[  242.837171]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[  242.837365]  drm_ioctl_kernel+0xae/0x100 [drm]
[  242.837398]  drm_ioctl+0x2a1/0x500 [drm]
[  242.837420]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[  242.837622]  ? srso_return_thunk+0x5/0x5f
[  242.837627]  ? srso_return_thunk+0x5/0x5f
[  242.837630]  ? _raw_spin_unlock_irqrestore+0x2b/0x50
[  242.837635]  amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[  242.837811]  __x64_sys_ioctl+0x99/0xd0
[  242.837820]  x64_sys_call+0x1209/0x20d0
[  242.837825]  do_syscall_64+0x51/0x120
[  242.837830]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  242.837835] RIP: 0033:0x7f2f33f1a94f
[  242.837838] RSP: 002b:00007f2f24ffea30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  242.837842] RAX: ffffffffffffffda RBX: 00007f2f24ffebd0 RCX: 00007f2f33f1a94f
[  242.837845] RDX: 00007f2f24ffebd0 RSI: 00000000c0306457 RDI: 000000000000000d
[  242.837847] RBP: 00007f2f24ffeab0 R08: 0000000000000000 R09: 0000000000000000
[  242.837849] R10: 00007f2f24ffecd0 R11: 0000000000000246 R12: 00007f2f25000640
[  242.837851] R13: 00000000c0306457 R14: 000000000000000d R15: 00007fff3b39c1e0
[  242.837858]  </TASK>
[  242.837865] INFO: task Xwayland:cs0:1517 blocked for more than 120 seconds.
[  242.837869] Tainted: G OE 6.12.0-rc2rebased-oct-24+ #4
[  242.837872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.837874] task:Xwayland:cs0 state:D stack:0 pid:1517 tgid:1338 ppid:1282 flags:0x00004002
[  242.837878] Call Trace:
[  242.837880]  <TASK>
[  242.837883]  __schedule+0x3e0/0xb10
[  242.837890]  schedule+0x31/0x120
[  242.837894]  schedule_preempt_disabled+0x1c/0x30
[  242.837897]  __mutex_lock.constprop.0+0x386/0x6e0
[  242.837902]  ? srso_return_thunk+0x5/0x5f
[  242.837905]  ? __timer_delete_sync+0x81/0xe0
[  242.837911]  __mutex_lock_slowpath+0x13/0x20
[  242.837915]  mutex_lock+0x3b/0x50
[  242.837919]  amdgpu_userqueue_ensure_ev_fence+0x35/0xb0 [amdgpu]
[  242.838138]  amdgpu_userq_signal_ioctl+0x959/0xec0 [amdgpu]
[  242.838340]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[  242.838531]  drm_ioctl_kernel+0xae/0x100 [drm]
[  242.838559]  drm_ioctl+0x2a1/0x500 [drm]
[  242.838580]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[  242.838778]  ? srso_return_thunk+0x5/0x5f
[  242.838783]  ? srso_return_thunk+0x5/0x5f
[  242.838786]  ? _raw_spin_unlock_irqrestore+0x2b/0x50
[  242.838791]  amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[  242.838967]  __x64_sys_ioctl+0x99/0xd0
[  242.838972]  x64_sys_call+0x1209/0x20d0
[  242.838975]  do_syscall_64+0x51/0x120
[  242.838979]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  242.838982] RIP: 0033:0x7f9118b1a94f
[  242.838985] RSP: 002b:00007f910cdff760 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  242.838989] RAX: ffffffffffffffda RBX: 00007f910cdff910 RCX: 00007f9118b1a94f
[  242.838991] RDX: 00007f910cdff910 RSI: 00000000c0306457 RDI: 000000000000000c
[  242.838993] RBP: 00007f910cdff7e0 R08: 0000000000000000 R09: 0000000000000001
[  242.838995] R10: 00007f910cdff9d4 R11: 0000000000000246 R12: 00007f910ce00640
[  242.838997] R13: 00000000c0306457 R14: 000000000000000c R15: 00007fff9dd11d10
[  242.839004]  </TASK>

v2: Addressed review comments from Christian.
v3/v4: Addressed review comments from Christian.
    - Move the drm_exec loop after userq fence create.
    - Clean up the newly created userq fence in case of error.
v5: Addressed review comments from Christian.
    - Create a new amdgpu_userq_fence_alloc() function for allocation.
    - Call dma_fence_put for the cleanup procedure.
    - Make the amdgpu_userq_fence_create() function static.
    - Call drm_exec_init after mutex_unlock.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08  drm/amdgpu: Fix out-of-bounds issue in user fence  (Arunpravin Paneer Selvam, 1 file, -2/+4)
Fix an out-of-bounds issue in userq fence create when accessing the
userq xa structure.  Add a lock to protect against the race condition.

v2: (Christian)
    - Allocate memory with GFP_ATOMIC.
v3:
    - Moved to a 2 xa approach.
v4: (Christian)
    - Lock the xa_for_each blocks and the memory allocation part as
      well, to make sure that the xa is not modified in between the two
      xa_for_each blocks.

BUG: KASAN: slab-out-of-bounds in amdgpu_userq_fence_create+0x726/0x880 [amdgpu]
[ +0.000006] Call Trace:
[ +0.000005]  <TASK>
[ +0.000005]  dump_stack_lvl+0x6c/0x90
[ +0.000011]  print_report+0xc4/0x5e0
[ +0.000009]  ? srso_return_thunk+0x5/0x5f
[ +0.000008]  ? kasan_complete_mode_report_info+0x26/0x1d0
[ +0.000007]  ? amdgpu_userq_fence_create+0x726/0x880 [amdgpu]
[ +0.000405]  kasan_report+0xdf/0x120
[ +0.000009]  ? amdgpu_userq_fence_create+0x726/0x880 [amdgpu]
[ +0.000405]  __asan_report_store8_noabort+0x17/0x20
[ +0.000007]  amdgpu_userq_fence_create+0x726/0x880 [amdgpu]
[ +0.000406]  ? __pfx_amdgpu_userq_fence_create+0x10/0x10 [amdgpu]
[ +0.000408]  ? srso_return_thunk+0x5/0x5f
[ +0.000008]  ? ttm_resource_move_to_lru_tail+0x235/0x4f0 [ttm]
[ +0.000013]  ? srso_return_thunk+0x5/0x5f
[ +0.000008]  amdgpu_userq_signal_ioctl+0xd29/0x1c70 [amdgpu]
[ +0.000412]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ +0.000404]  ? try_to_wake_up+0x165/0x1840
[ +0.000010]  ? __pfx_futex_wake_mark+0x10/0x10
[ +0.000011]  drm_ioctl_kernel+0x178/0x2f0 [drm]
[ +0.000050]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ +0.000404]  ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
[ +0.000043]  ? __kasan_check_read+0x11/0x20
[ +0.000007]  ? srso_return_thunk+0x5/0x5f
[ +0.000007]  ? __kasan_check_write+0x14/0x20
[ +0.000008]  drm_ioctl+0x513/0xd20 [drm]
[ +0.000040]  ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ +0.000407]  ? __pfx_drm_ioctl+0x10/0x10 [drm]
[ +0.000044]  ? srso_return_thunk+0x5/0x5f
[ +0.000007]  ? _raw_spin_lock_irqsave+0x99/0x100
[ +0.000007]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ +0.000006]  ? __rseq_handle_notify_resume+0x188/0xc30
[ +0.000008]  ? srso_return_thunk+0x5/0x5f
[ +0.000008]  ? srso_return_thunk+0x5/0x5f
[ +0.000006]  ? _raw_spin_unlock_irqrestore+0x27/0x50
[ +0.000010]  amdgpu_drm_ioctl+0xcd/0x1d0 [amdgpu]
[ +0.000388]  __x64_sys_ioctl+0x135/0x1b0
[ +0.000009]  x64_sys_call+0x1205/0x20d0
[ +0.000007]  do_syscall_64+0x4d/0x120
[ +0.000008]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000007] RIP: 0033:0x7f7c3d31a94f

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
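A sketch of the v4 approach with illustrative names: hold the xarray
lock across both iteration passes and the allocation between them, so
the number of entries cannot change between counting and copying (the
source of the out-of-bounds write).  GFP_ATOMIC is needed because the
allocation now happens under a spinlock.

	static int collect_fence_drvs_sketch(struct xarray *xa,
					     struct amdgpu_userq_fence_driver ***out)
	{
		struct amdgpu_userq_fence_driver *fence_drv, **array;
		unsigned long index;
		int cnt = 0, i = 0;

		xa_lock(xa);
		xa_for_each(xa, index, fence_drv)
			cnt++;

		/* GFP_ATOMIC: allocating while holding the xarray spinlock */
		array = kmalloc_array(cnt, sizeof(*array), GFP_ATOMIC);
		if (!array) {
			xa_unlock(xa);
			return -ENOMEM;
		}

		xa_for_each(xa, index, fence_drv)
			array[i++] = fence_drv;	/* count and copy can no longer diverge */
		xa_unlock(xa);

		*out = array;
		return cnt;
	}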
2025-04-08  drm/amdgpu: Fix NULL ptr dereference issue for non userq fences  (Arunpravin Paneer Selvam, 1 file, -1/+1)
Use the correct fence count variable [num_fences] in the fences array
iteration to handle both userq and non-userq fences.

v2: (Christian)
    - All fences in the array come either from some reservation object
      or from a drm_syncobj.  If any of those are NULL then there is a
      bug somewhere else.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08  drm/amdgpu: simplify eviction fence suspend/resume  (Shashank Sharma, 1 file, -16/+8)
The basic idea in this redesign is to add an eviction fence only in the
UQ resume path.  When no userqueue is present, keep ev_fence as NULL.

Main changes are:
- do not create the eviction fence during evf_mgr_init, keeping
  evf_mgr->ev_fence = NULL until a UQ becomes active.
- do not replace the ev_fence in the evf_resume path; replace it only
  in the uq_resume path, and remove all the now-unnecessary code from
  ev_fence_resume.
- add a new helper function (amdgpu_userqueue_ensure_ev_fence), sketched
  after this list, which does the following:
  - flush any pending uq_resume work, so that it can create an
    eviction_fence
  - if there is no pending uq_resume work, add a uq_resume work and
    wait for it to execute, so that we always have a valid ev_fence
- call this helper function from two places, to ensure we always have a
  valid ev_fence:
  - when a new uq is created
  - when a new uq completion fence is created

v2: Worked on review comments by Christian.
v3: Addressed a few more review comments by Christian.
v4: Moved the mutex lock outside of the amdgpu_userqueue_suspend()
    function (Christian).
v5: squash in build fix (Alex)

Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
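A sketch of that helper, assuming a delayed work item named resume_work
on the userq manager (the field names and flow here are illustrative,
not the driver's actual code):

	static struct amdgpu_eviction_fence *
	ensure_ev_fence_sketch(struct amdgpu_userq_mgr *uq_mgr,
			       struct amdgpu_eviction_fence_mgr *evf_mgr)
	{
		/* Flush any pending uq resume work so it can create the fence */
		flush_delayed_work(&uq_mgr->resume_work);
		if (evf_mgr->ev_fence)
			return evf_mgr->ev_fence;

		/* None was pending: queue the resume work and wait for it */
		schedule_delayed_work(&uq_mgr->resume_work, 0);
		flush_delayed_work(&uq_mgr->resume_work);
		return evf_mgr->ev_fence;	/* NULL only if resume failed */
	}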
2025-04-08  drm/amdgpu: suspend gfx userqueues  (Shashank Sharma, 1 file, -11/+26)
This patch adds suspend support for gfx userqueues.  It typically does
the following:
- adds an enable_signaling function for the eviction fence, so that it
  can trigger the userqueue suspend,
- adds a delayed work to handle suspending of the eviction_fence,
- adds a suspend function which suspends all the queues under this
  userq manager and signals the eviction fence,
- adds a function to replace the old eviction fence with a new one and
  attach it to each of the objects,
- adds a reference to the userq manager in the eviction fence container
  so that it can be used in the suspend function.

V2: Addressed Christian's review comments:
    - schedule suspend work immediately
V4: Addressed Christian's review comments:
    - wait for pending uq fences before starting suspend; added
      queue->last_fence for the same
    - accommodate ev_fence_mgr into existing code
    - some bug fixes and NULL checks
V5: Addressed Christian's review comments (gitlab):
    - Wait for the eviction fence to get signaled in destroy, don't
      signal it
    - Wait for the eviction fence to get signaled in replace fence,
      don't signal it
V6: Addressed Christian's review comments:
    - Do not destroy the old eviction fence until we have it replaced
    - Change the sequence of fence replacement sub-tasks
    - reuse the ev_fence delayed work for userqueue suspend as well
      (Shashank)
V7: Addressed Christian's review comments:
    - give evf_mgr as argument (instead of fpriv) to replace_fence()
    - save a ptr to evf_mgr in ev_fence (instead of uq_mgr)
    - modify the suspend_all_queues logic to reflect errors properly
    - remove the garbage drm_exec_lock section in wait_for_signal
    - grab the userqueue mutex before starting the wait for fence
    - remove the unrelated gobj check from signal_ioctl
V8: Added race condition fixes

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08  drm/amdgpu: Modify userq signal/wait struct field names  (Arunpravin Paneer Selvam, 1 file, -10/+10)
Modify the kernel UAPI userq signal/wait struct field names and
descriptions corresponding to the libdrm UAPI review comments.

libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/392

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08  drm/amdgpu: add userq specific kernel config for fence ioctls  (Arunpravin Paneer Selvam, 1 file, -0/+16)
Keep the user queue fence signal and wait IOCTLs under the kernel
config option CONFIG_DRM_AMDGPU_NAVI3X_USERQ.

v2: (Christian)
    - Remove the userq-specific config added for the kernel queues
      fence init function.
v3: (Alex)
    - It is better to return an error (-ENOTSUPP) in these cases.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08  drm/amdgpu: Add gpu_addr support to seq64 allocation  (Arunpravin Paneer Selvam, 1 file, -4/+4)
Add gpu address support to the seq64 alloc function.

v1: (Christian)
    - Add the user of this new interface change to the same patch.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08  drm/amdgpu: Add separate array of read and write for BO handles  (Arunpravin Paneer Selvam, 1 file, -65/+181)
Drop AMDGPU_USERQ_BO_WRITE, as this should not be a global option of
the IOCTL; it should be an option per buffer.  Hence, add separate
arrays for read and write BO handles.

v2: (Marek)
    - Internal kernel details shouldn't be here.  This file should only
      document the observed behavior, not the implementation.
v3: Fix DAL CI clang issue.
v4: Added Alex's RB to merge the kernel UAPI changes since he has
    already approved the amdgpu_drm.h changes.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Suggested-by: Marek Olšák <marek.olsak@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08  drm/amdgpu: add vm root BO lock before accessing the vm  (Arunpravin Paneer Selvam, 1 file, -9/+15)
Add a vm root BO lock before accessing the userqueue VM.

v1: (Christian)
    - Keep the VM locked until you are done with the mapping.
    - Grab a temporary BO reference, drop the VM lock and acquire the
      BO.  When you are done with everything, just drop the BO lock and
      then the temporary BO reference.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08  drm/amdgpu: Add the missing error handling for xa_store() call  (Arunpravin Paneer Selvam, 1 file, -2/+4)
Add the missing error handling for the xa_store() call in the function
amdgpu_userq_fence_driver_alloc().

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08  drm/amdgpu: Few optimization and fixes for userq fence driver  (Arunpravin Paneer Selvam, 1 file, -17/+27)
A few optimizations and fixes for the userq fence driver.

v1: (Christian)
    - Remove unnecessary comments.
    - In the drm_exec_init call, give num_bo_handles as the last
      parameter; it makes allocation of the array more efficient.
    - Handle the return value of __xa_store() and improve the error
      handling of amdgpu_userq_fence_driver_alloc().
v2: (Christian)
    - Revert the userq_xa xarray init to XA_FLAGS_LOCK_IRQ.
    - Move the xa_unlock before the error check of the
      xa_err(__xa_store()) call, and move this change to a separate
      patch since it adds missing error handling.
    - Removed the unnecessary comments.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08  drm/amdgpu: Enable userq fence interrupt support  (Arunpravin Paneer Selvam, 1 file, -0/+6)
Add support to handle the userqueue protected fence signal hardware
interrupt.

Create a xarray which maps the doorbell index to the fence driver
address.  This helps to retrieve the fence driver information when a
userq fence interrupt is triggered.  The firmware sends the doorbell
offset value, and this info is compared with the queue's mqd doorbell
offset value.  If they are the same, we process the userq fence
interrupt.

v1: (Christian)
    - use xa_load to extract the fence driver.
    - move the amdgpu_userq_fence_driver_process call within the
      xa_lock, as there is a chance that fence_drv might be freed.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
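A sketch of that interrupt path (function and xarray names follow the
commits in this log; the body is illustrative): map the doorbell index
reported by the firmware to its fence driver, and process it without
dropping the xarray lock, since the entry could otherwise be freed
concurrently.

	static void userq_eop_irq_sketch(struct amdgpu_device *adev,
					 u32 doorbell_index)
	{
		struct amdgpu_userq_fence_driver *fence_drv;

		xa_lock(&adev->userq_xa);
		fence_drv = xa_load(&adev->userq_xa, doorbell_index);
		if (fence_drv)
			amdgpu_userq_fence_driver_process(fence_drv);
		xa_unlock(&adev->userq_xa);
	}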
2025-04-08  drm/amdgpu: Add wait IOCTL timeline syncobj support  (Arunpravin Paneer Selvam, 1 file, -7/+84)
Add user fence wait IOCTL timeline syncobj support.

v2: (Christian)
    - handle the dma_fence_wait() return value.
    - shorten the variable name syncobj_timeline_points a bit.
    - move num_points up to avoid padding issues.
v3: (Christian)
    - Handle errors from the timeline drm_syncobj_find_fence() call.
    - Use dma_fence_unwrap_for_each() on the timeline fence, as there
      could be more than one fence.
v4: (Christian)
    - Drop the first num_fences, since 'fence' is always included in
      the dma_fence_unwrap_for_each() iteration; when fence != f, then
      fence is most likely just a container.
v5: Added Alex's RB to merge the kernel UAPI changes since he has
    already approved the amdgpu_drm.h changes.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
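An illustrative sketch of the v3 point: a fence found for a timeline
point may be a chain/array container holding several fences, so the
unwrapped leaves are iterated rather than assuming a single fence.  The
helper name is hypothetical.

	#include <linux/dma-fence-unwrap.h>

	static u32 count_leaf_fences_sketch(struct dma_fence *fence)
	{
		struct dma_fence_unwrap iter;
		struct dma_fence *f;
		u32 num_fences = 0;

		dma_fence_unwrap_for_each(f, &iter, fence)
			num_fences++;	/* each leaf contributes one wait entry */

		return num_fences;
	}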
2025-04-08  drm/amdgpu: Implement userqueue signal/wait IOCTL  (Arunpravin Paneer Selvam, 1 file, -7/+439)
This patch introduces new IOCTLs for the userqueue secure semaphore.

The signal IOCTL, called from a userspace application, creates a drm
syncobj and an array of BO GEM handles, passed in as parameters to the
driver to install the fence into them.

The wait IOCTL gets an array of drm syncobjs, finds the fences attached
to the drm syncobjs and obtains an array of memory_address/fence_value
combinations which are returned to userspace.

v2: (Christian)
    - Install fence into GEM BO object.
    - Lock all BOs using the dma resv subsystem.
    - Reorder the sequence in the signal IOCTL function.
    - Get the write pointer from the shadow wptr.
    - use userq_fence to fetch the va/value in the wait IOCTL.
v3: (Christian)
    - Use the drm_exec helper for proper BO drm reserve and to avoid BO
      lock/unlock issues.
    - fence/fence driver reference count logic for signal/wait IOCTLs.
v4: (Christian)
    - Fixed the drm_exec calling sequence.
    - use dma_resv_for_each_fence_unlock if BOs are not locked.
    - Modified the fence_info array storing logic.
v5: (Christian)
    - Keep fence_drv until wait queue execution.
    - Add dma_fence_wait for other fences.
    - Lock BOs using drm_exec, as the number of fences in them could
      change.
    - Install signaled fences as well into BO/Syncobj.
    - Move the Syncobj fence installation code after
      drm_exec_prepare_array.
    - Directly add dma_resv_usage_rw(args->bo_flags....
    - remove unnecessary dma_fence_put.
v6: (Christian)
    - Add xarray stuff to store the fence_drv.
    - Implement a function to iterate over the xarray and drop the
      fence_drv references.
    - Add a drm_exec_until_all_locked() wrapper.
    - Add a check that we haven't exceeded the user-allocated
      num_fences before adding a dma_fence to the fences array.
v7: (Christian)
    - Use memdup_user() for kmalloc_array + copy_from_user.
    - Move the fence_drv references from the xarray into the newly
      created fence, and drop the fence_drv references when we signal
      this fence.
    - Move the locking of BOs before the "if (!wait_info->num_fences)";
      this way the code block is needed only once.
    - Merge the error handling code and the cleanup + return 0 code.
    - Initializing the xa should probably be done in the userq code.
    - Remove the userq back pointer stored in fence_drv.
    - Pass the xarray as a parameter in
      amdgpu_userq_walk_and_drop_fence_drv().
v8: (Christian)
    - fence_drv references must be taken before adding the fence to the
      list.
    - Use xa_lock_irqsave_nested for nested spinlock operations.
    - userq_mgr should be per fpriv and not one per device.
    - Restructure the interrupt process code for the early exit of the
      loop.
    - The reference acquired in the syncobj fence replace code needs to
      be kept around.
    - Modify the dma_fence acquire placement in the wait IOCTL.
    - Move the USERQ_BO_WRITE flag to the UAPI header file.
    - drop the fence_drv reference after telling the hw to stop
      accessing it.
    - Add multi sync object support to the userq signal IOCTL.
v9: (Christian)
    - Store all the fence_drv refs to other drivers and not ourselves.
    - Remove the userq fence xa implementation and replace it with
      kvmalloc_array.
v10: (Christian)
    - Add a comment for the userq_xa xarray.
    - drop the if check on userq_fence->fence_drv_array.
    - use the i variable to initialize
      userq_fence->fence_drv_array_count.
    - drop the fence references before freeing the array in the error
      handling, otherwise some references could leak.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08  drm/amdgpu: screen freeze and userq driver crash  (Arunpravin Paneer Selvam, 1 file, -3/+4)
Fix a screen freeze and userq fence driver crash while playing Xonotic.

v2: (Christian)
    - There is a chance that the fence might signal in between testing
      it and grabbing the lock.  Hence, move the lock above the if..else
      check and use dma_fence_is_signaled_locked().

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
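A sketch of the v2 change, with the list and field names illustrative:
take fence->lock first and test with dma_fence_is_signaled_locked(), so
a fence signaling between the check and the lock cannot be missed.

	static void track_if_unsignaled_sketch(struct amdgpu_userq_fence *userq_fence,
					       struct dma_fence *fence,
					       struct list_head *pending)
	{
		unsigned long flags;

		spin_lock_irqsave(fence->lock, flags);
		if (!dma_fence_is_signaled_locked(fence))	/* requires fence->lock held */
			list_add_tail(&userq_fence->link, pending);
		spin_unlock_irqrestore(fence->lock, flags);
	}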
2025-04-08  drm/amdgpu: Implement a new userqueue fence driver  (Arunpravin Paneer Selvam, 1 file, -0/+257)
Developed a userqueue fence driver for userqueue process shared BO
synchronization.

Create a dma fence having the write pointer as its seqno and allocate a
seq64 memory slot for each user queue process, feeding this memory
address into the firmware/hardware; the firmware then writes the read
pointer into the given address when the process completes its
execution.  Compare wptr and rptr: if rptr >= wptr, signal the fences
for the waiting process to consume the buffers.

v2: Worked on review comments from Christian for the following
    modifications:
    - Add wptr as the sequence number into the fence.
    - Add a reference count for the fence driver.
    - Add dma_fence_put below the list_del, as it might free the userq
      fence.
    - Trim unnecessary code in the interrupt handler.
    - Check the dma fence signaled state in the dma fence creation
      function, for the potential problem of the hardware completing
      the job processing beforehand.
    - Add the necessary locks.
    - Create a list and process all the unsignaled fences.
    - Clean up fences in the destroy function.
    - Implement the .signaled callback function.
v3: Worked on review comments from Christian:
    - Modify the naming convention for reference counted objects.
    - Fix a fence driver reference drop issue.
    - Drop the amdgpu_userq_fence_driver_process() function return
      value.
v4: Worked on review comments from Christian:
    - Moved fence driver allocation into
      amdgpu_userq_fence_driver_alloc().
    - Added a detailed doc mentioning the differences between the two
      declared spinlocks.
v5: Worked on review comments from Christian:
    - Check before upcast and remove the local variable.
    - Add error handling in the fence_drv alloc function.
    - Move the rptr read outside of the loop and remove the WARN_ON in
      the destroy function.
v6:
    - clear the seq64 memory in the user fence driver (Christian)
    - fix the wptr va bo mapping (Christian)
    - move the fence_drv xa entry erase code from the interrupt handler
      into the user fence destroy function

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
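A sketch of the signalling loop described above (field names are
illustrative): read the rptr the firmware wrote into the shared seq64
slot and signal every pending fence whose wptr-based seqno has been
consumed.

	static void fence_driver_process_sketch(struct amdgpu_userq_fence_driver *fence_drv)
	{
		struct amdgpu_userq_fence *userq_fence, *tmp;
		unsigned long flags;
		u64 rptr;

		spin_lock_irqsave(&fence_drv->fence_list_lock, flags);
		rptr = *fence_drv->cpu_addr;	/* read pointer written by the firmware */

		list_for_each_entry_safe(userq_fence, tmp, &fence_drv->fences, link) {
			if (rptr < userq_fence->base.seqno)
				break;	/* fences are queued in wptr order */

			dma_fence_signal(&userq_fence->base);
			list_del(&userq_fence->link);
			/* put below the list_del, as it might free the userq fence */
			dma_fence_put(&userq_fence->base);
		}
		spin_unlock_irqrestore(&fence_drv->fence_list_lock, flags);
	}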