summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
AgeCommit message (Collapse)AuthorFilesLines
2025-07-21Merge tag 'amd-drm-next-6.17-2025-07-17' of ↵Dave Airlie1-5/+6
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.17-2025-07-17: amdgpu: - Partition fixes - Reset fixes - RAS fixes - i2c fix - MPC updates - DSC cleanup - EDID fixes - Display idle D3 update - IPS updates - DMUB updates - Retimer fix - Replay fixes - Fix DC memory leak - Initial support for smartmux - DCN 4.0.1 degamma LUT fix - Per queue reset cleanups - Track ring state associated with a fence - SR-IOV fixes - SMU fixes - Per queue reset improvements for GC 9+ compute - Per queue reset improvements for GC 10+ gfx - Per queue reset improvements for SDMA 5+ - Per queue reset improvements for JPEG 2+ - Per queue reset improvements for VCN 2+ - GC 8 fix - ISP updates amdkfd: - Enable KFD on LoongArch radeon: - Drop console lock during suspend/resume UAPI: - Add userq slot info to INFO IOCTL Used for IGT userq validation tests (https://lists.freedesktop.org/archives/igt-dev/2025-July/093228.html) From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250717213827.2061581-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-07-21Merge tag 'drm-misc-next-2025-07-17' of ↵Dave Airlie1-0/+17
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.17: UAPI Changes: Cross-subsystem Changes: Core Changes: - mode_config: Change fb_create prototype to pass the drm_format_info and avoid redundant lookups in drivers - sched: kunit improvements, memory leak fixes, reset handling improvements - tests: kunit EDID update Driver Changes: - amdgpu: Hibernation fixes, structure lifetime fixes - nouveau: sched improvements - sitronix: Add Sitronix ST7567 Support - bridge: - Make connector available to bridge detect hook - panel: - More refcounting changes - New panels: BOE NE14QDM Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://lore.kernel.org/r/20250717-efficient-kudu-of-fantasy-ff95e0@houat
2025-07-16drm/amdgpu: make compute timeouts consistentAlex Deucher1-5/+5
For kernel compute queues, align the timeout with other kernel queues (10 sec). This had previously been set higher for OpenCL when it used kernel queues, but now OpenCL uses KFD user queues which don't have a timeout limitation. This also aligns with SR-IOV which already used a shorter timeout. Additionally the longer timeout negatively impacts the user experience with kernel queues for interactive applications. Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-15drm/amdgpu: refine eeprom data checkganglxie1-0/+1
add eeprom data checksum check before driver unload. reset eeprom and save correct data to eeprom when check failed Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-10drm/amdgpu: do not resume device in thaw for normal hibernationSamuel Zhang1-0/+17
For normal hibernation, GPU do not need to be resumed in thaw since it is not involved in writing the hibernation image. Skip resume in this case can reduce the hibernation time. On VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory, this can save 50 minutes. Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20250710062313.3226149-6-guoqing.zhang@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2025-07-07Revert "drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini"Vitaly Prosyak1-1/+15
This reverts commit 5fb90421fa0fbe0a968274912101fe917bf1c47b. The original patch moved `amdgpu_userq_mgr_fini()` to the driver's `postclose` callback, which is called after `drm_gem_release()` in the DRM file cleanup sequence.If a user application crashes or aborts without cleaning up its user queues, 'drm_gem_release()` may free GEM objects that are still referenced by active user queues, leading to use-after-free. By reverting, we ensure that user queues are disabled and cleaned up before any GEM objects are released, preventing this class of bug. However, this reintroduces a race during PCI hot-unplug, where device removal can race with per-file cleanup, leading to use-after-free in suspend/unplug paths. This will be fixed in the next patch. Fixes: 5fb90421fa0f ("drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini+0x70c") Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-07drm/amdgpu: Pass adev pointer to functionsLijo Lazar1-8/+7
Pass amdgpu device context instead of drm device context to some amdgpu_device_* functions. DRM device context is not required in those functions. No functional change. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-30drm/amdgpu: Fix error with dev_info_once usageLijo Lazar1-1/+1
Fixes error with dev_info_once usage in amdgpu_device_asic_has_dc_support. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506281140.mXfWT3EN-lkp@intel.com/ Fixes: a3e510fd69c3 ("drm/amdgpu: Convert from DRM_* to dev_*") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amd: Fix spelling mistake "correctalbe" -> "correctable"Colin Ian King1-1/+1
There is a spelling mistake in a pr_info message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini+0x70cVitaly Prosyak1-15/+1
The issue was reproduced on NV10 using IGT pci_unplug test. It is expected that `amdgpu_driver_postclose_kms()` is called prior to `amdgpu_drm_release()`. However, the bug is that `amdgpu_fpriv` was freed in `amdgpu_driver_postclose_kms()`, and then later accessed in `amdgpu_drm_release()` via a call to `amdgpu_userq_mgr_fini()`. As a result, KASAN detected a use-after-free condition, as shown in the log below. The proposed fix is to move the calls to `amdgpu_eviction_fence_destroy()` and `amdgpu_userq_mgr_fini()` into `amdgpu_driver_postclose_kms()`, so they are invoked before `amdgpu_fpriv` is freed. This also ensures symmetry with the initialization path in `amdgpu_driver_open_kms()`, where the following components are initialized: - `amdgpu_userq_mgr_init()` - `amdgpu_eviction_fence_init()` - `amdgpu_ctx_mgr_init()` Correspondingly, in `amdgpu_driver_postclose_kms()` we should clean up using: - `amdgpu_userq_mgr_fini()` - `amdgpu_eviction_fence_destroy()` - `amdgpu_ctx_mgr_fini()` This change eliminates the use-after-free and improves consistency in resource management between open and close paths. [ +0.094367] ================================================================== [ +0.000026] BUG: KASAN: slab-use-after-free in amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu] [ +0.000866] Write of size 8 at addr ffff88811c068c60 by task amd_pci_unplug/1737 [ +0.000026] CPU: 3 UID: 0 PID: 1737 Comm: amd_pci_unplug Not tainted 6.14.0+ #2 [ +0.000008] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000003] dump_stack_lvl+0x76/0xa0 [ +0.000010] print_report+0xce/0x600 [ +0.000009] ? amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu] [ +0.000790] ? srso_return_thunk+0x5/0x5f [ +0.000007] ? kasan_complete_mode_report_info+0x76/0x200 [ +0.000008] ? amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu] [ +0.000684] kasan_report+0xbe/0x110 [ +0.000007] ? amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu] [ +0.000601] __asan_report_store8_noabort+0x17/0x30 [ +0.000007] amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu] [ +0.000801] ? __pfx_amdgpu_userq_mgr_fini+0x10/0x10 [amdgpu] [ +0.000819] ? srso_return_thunk+0x5/0x5f [ +0.000008] amdgpu_drm_release+0xa3/0xe0 [amdgpu] [ +0.000604] __fput+0x354/0xa90 [ +0.000010] __fput_sync+0x59/0x80 [ +0.000005] __x64_sys_close+0x7d/0xe0 [ +0.000006] x64_sys_call+0x2505/0x26f0 [ +0.000006] do_syscall_64+0x7c/0x170 [ +0.000004] ? kasan_record_aux_stack+0xae/0xd0 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? kmem_cache_free+0x398/0x580 [ +0.000006] ? __fput+0x543/0xa90 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __fput+0x543/0xa90 [ +0.000004] ? __kasan_check_read+0x11/0x20 [ +0.000007] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? __kasan_check_read+0x11/0x20 [ +0.000003] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? fpregs_assert_state_consistent+0x21/0xb0 [ +0.000006] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? syscall_exit_to_user_mode+0x4e/0x240 [ +0.000005] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? do_syscall_64+0x88/0x170 [ +0.000003] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? do_syscall_64+0x88/0x170 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? irqentry_exit+0x43/0x50 [ +0.000004] ? srso_return_thunk+0x5/0x5f [ +0.000004] ? exc_page_fault+0x7c/0x110 [ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7ffff7b14f67 [ +0.000005] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 73 ba f7 ff [ +0.000004] RSP: 002b:00007fffffffe358 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ +0.000006] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff7b14f67 [ +0.000003] RDX: 0000000000000000 RSI: 00007ffff7f5755a RDI: 0000000000000003 [ +0.000003] RBP: 00007fffffffe380 R08: 0000555555568170 R09: 0000000000000000 [ +0.000003] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5c8 [ +0.000003] R13: 00005555555552a9 R14: 0000555555557d48 R15: 00007ffff7ffd040 [ +0.000007] </TASK> [ +0.000286] Allocated by task 425 on cpu 11 at 29.751192s: [ +0.000013] kasan_save_stack+0x28/0x60 [ +0.000008] kasan_save_track+0x18/0x70 [ +0.000006] kasan_save_alloc_info+0x38/0x60 [ +0.000006] __kasan_kmalloc+0xc1/0xd0 [ +0.000005] __kmalloc_cache_noprof+0x1bd/0x430 [ +0.000006] amdgpu_driver_open_kms+0x172/0x760 [amdgpu] [ +0.000521] drm_file_alloc+0x569/0x9a0 [ +0.000008] drm_client_init+0x1b7/0x410 [ +0.000007] drm_fbdev_client_setup+0x174/0x470 [ +0.000007] drm_client_setup+0x8a/0xf0 [ +0.000006] amdgpu_pci_probe+0x50b/0x10d0 [amdgpu] [ +0.000482] local_pci_probe+0xe7/0x1b0 [ +0.000008] pci_device_probe+0x5bf/0x890 [ +0.000005] really_probe+0x1fd/0x950 [ +0.000007] __driver_probe_device+0x307/0x410 [ +0.000005] driver_probe_device+0x4e/0x150 [ +0.000006] __driver_attach+0x223/0x510 [ +0.000005] bus_for_each_dev+0x102/0x1a0 [ +0.000006] driver_attach+0x3d/0x60 [ +0.000005] bus_add_driver+0x309/0x650 [ +0.000005] driver_register+0x13d/0x490 [ +0.000006] __pci_register_driver+0x1ee/0x2b0 [ +0.000006] xfrm_ealg_get_byidx+0x43/0x50 [xfrm_algo] [ +0.000008] do_one_initcall+0x9c/0x3e0 [ +0.000007] do_init_module+0x29e/0x7f0 [ +0.000006] load_module+0x5c75/0x7c80 [ +0.000006] init_module_from_file+0x106/0x180 [ +0.000007] idempotent_init_module+0x377/0x740 [ +0.000006] __x64_sys_finit_module+0xd7/0x180 [ +0.000006] x64_sys_call+0x1f0b/0x26f0 [ +0.000006] do_syscall_64+0x7c/0x170 [ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000013] Freed by task 1737 on cpu 9 at 76.455063s: [ +0.000010] kasan_save_stack+0x28/0x60 [ +0.000006] kasan_save_track+0x18/0x70 [ +0.000005] kasan_save_free_info+0x3b/0x60 [ +0.000006] __kasan_slab_free+0x54/0x80 [ +0.000005] kfree+0x127/0x470 [ +0.000006] amdgpu_driver_postclose_kms+0x455/0x760 [amdgpu] [ +0.000485] drm_file_free.part.0+0x5b1/0xba0 [ +0.000007] drm_file_free+0x13/0x30 [ +0.000006] drm_client_release+0x1c4/0x2b0 [ +0.000006] drm_fbdev_ttm_fb_destroy+0xd2/0x120 [drm_ttm_helper] [ +0.000007] put_fb_info+0x97/0xe0 [ +0.000006] unregister_framebuffer+0x197/0x380 [ +0.000005] drm_fb_helper_unregister_info+0x94/0x100 [ +0.000005] drm_fbdev_client_unregister+0x3c/0x80 [ +0.000007] drm_client_dev_unregister+0x144/0x330 [ +0.000006] drm_dev_unregister+0x49/0x1b0 [ +0.000006] drm_dev_unplug+0x4c/0xd0 [ +0.000006] amdgpu_pci_remove+0x58/0x130 [amdgpu] [ +0.000482] pci_device_remove+0xae/0x1e0 [ +0.000006] device_remove+0xc7/0x180 [ +0.000006] device_release_driver_internal+0x3d4/0x5a0 [ +0.000007] device_release_driver+0x12/0x20 [ +0.000006] pci_stop_bus_device+0x104/0x150 [ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40 [ +0.000005] remove_store+0xd7/0xf0 [ +0.000007] dev_attr_store+0x3f/0x80 [ +0.000006] sysfs_kf_write+0x125/0x1d0 [ +0.000005] kernfs_fop_write_iter+0x2ea/0x490 [ +0.000007] vfs_write+0x90d/0xe70 [ +0.000006] ksys_write+0x119/0x220 [ +0.000006] __x64_sys_write+0x72/0xc0 [ +0.000006] x64_sys_call+0x18ab/0x26f0 [ +0.000005] do_syscall_64+0x7c/0x170 [ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000013] The buggy address belongs to the object at ffff88811c068000 which belongs to the cache kmalloc-rnd-01-4k of size 4096 [ +0.000016] The buggy address is located 3168 bytes inside of freed 4096-byte region [ffff88811c068000, ffff88811c069000) [ +0.000022] The buggy address belongs to the physical page: [ +0.000010] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff88811c06e000 pfn:0x11c068 [ +0.000006] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ +0.000006] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) [ +0.000007] page_type: f5(slab) [ +0.000007] raw: 0017ffffc0000040 ffff88810004c140 dead000000000122 0000000000000000 [ +0.000005] raw: ffff88811c06e000 0000000080040002 00000000f5000000 0000000000000000 [ +0.000006] head: 0017ffffc0000040 ffff88810004c140 dead000000000122 0000000000000000 [ +0.000005] head: ffff88811c06e000 0000000080040002 00000000f5000000 0000000000000000 [ +0.000006] head: 0017ffffc0000003 ffffea0004701a01 ffffffffffffffff 0000000000000000 [ +0.000005] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 [ +0.000004] page dumped because: kasan: bad access detected [ +0.000011] Memory state around the buggy address: [ +0.000009] ffff88811c068b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000012] ffff88811c068b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] >ffff88811c068c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ^ [ +0.000010] ffff88811c068c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ffff88811c068d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ +0.000011] ================================================================== Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Jesse Zhang <Jesse.Zhang@amd.com> Cc: Arvind Yadav <arvind.yadav@amd.com> v2: drop amdgpu_drm_release() and assign drm_release() as the callback directly.(Alex) Fixes: adba0929736a ("drm/amdgpu: Fix Illegal opcode in command stream Error") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24drm/amdgpu: remove fence slabAlex Deucher1-5/+0
Just use kmalloc for the fences in the rare case we need an independent fence. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amd: Add support for a complete pmops actionMario Limonciello1-1/+1
complete() callbacks are supposed to handle reversing anything that occurred during prepare() callbacks. They'll be called on every power state transition, and will also be called if the sequence is failed (such as an aborted suspend). Add support for IP blocks to support this action. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250602014432.3538345-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-18drm/amdgpu: Add debug mask to disable CE logsXiang Liu1-0/+6
Add debug mask to disable kernel logs of RAS correctable errors, including both ACA and CE error counter kernel messages. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-22drm/amdgpu: Fix eviction fence worker race during fd closeJesse.Zhang1-1/+1
The current cleanup order during file descriptor close can lead to a race condition where the eviction fence worker attempts to access a destroyed mutex from the user queue manager: [ 517.294055] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ 517.294060] WARNING: CPU: 8 PID: 2030 at kernel/locking/mutex.c:564 [ 517.294094] Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu] The issue occurs because: 1. We destroy the user queue manager (including its mutex) first 2. Then try to destroy eviction fences which may have pending work 3. The eviction fence worker may try to access the already-destroyed mutex Fix this by reordering the cleanup to: 1. First mark the fd as closing and destroy eviction fences, which flushes any pending work 2. Then safely destroy the user queue manager after we're certain no more fence work will be executed The copy in amdgpu_driver_postclose_kms() needs to be removed (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-08drm/amdgpu: Add debug bit for userptr usageShane Xiao1-0/+5
In VM debug mode, it is desirable to notify the application to correct the freeing sequence by unmapping the memory before destroying the userptr in the old userptr path. Add a bitmask to decide whether to send gpu vm fault to the applition. Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-08drm/amdgpu: fix pm notifier handlingAlex Deucher1-9/+1
Set the s3/s0ix and s4 flags in the pm notifier so that we can skip the resource evictions properly in pm prepare based on whether we are suspending or hibernating. Drop the eviction as processes are not frozen at this time, we we can end up getting stuck trying to evict VRAM while applications continue to submit work which causes the buffers to get pulled back into VRAM. v2: Move suspend flags out of pm notifier (Mario) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178 Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support") Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-05drm/amdgpu: Add Support for enforcing isolation without Cleaner ShaderSrinivasan Shanmugam1-2/+2
Adjusted the enforce isolation setting handling to include the ability to disable the cleaner shader without affecting isolation between tasks. v2: Updated enforce isolation documentation and parameters. (Alex) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-01drm/amdgpu/userq: fix user_queue parameters listBagas Sanjaya1-5/+6
Sphinx reports htmldocs warning: Documentation/gpu/amdgpu/module-parameters:7: drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1119: ERROR: Unexpected indentation. [docutils] Fix the warning by using reST bullet list syntax for user_queue parameter options, separated from preceding paragraph by a blank line. Fixes: fb20954c9717 ("drm/amdgpu/userq: rework driver parameter") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/linux-next/20250422202956.176fb590@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22drm/amdgpu/userq: use consistent function namingAlex Deucher1-1/+1
s/userqueue/userq/ 1. remove the mix of amdgpu_userqueue and amdgpu_userq 2. to be consistent with other amdgpu_userq_fence.c 3. it's shorter Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21drm/amdgpu/userq: rework driver parameterAlex Deucher1-6/+9
Replace disable_kq parameter with user_queue parameter. The parameter has the following logic: -1 = auto (ASIC specific default) 0 = user queues disabled 1 = user queues enabled and kernel queues enabled (if supported) 2 = user queues enabled and kernel queues disabled The default behavior (-1) is currently the same as 0 for current ASICs. To enable user queues (in addition to kernel queues) set user_queue=1. To enable user queues and disable kernel queues (to make all resources available to user queues), set user_queue=2. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amdgpu: adjust enforce_isolation handlingAlex Deucher1-5/+7
Switch from a bool to an enum and allow more options for enforce isolation. There are now 3 modes of operation: - Disabled (0) - Enabled (serialization and cleaner shader) (1) - Enabled in legacy mode (no serialization or cleaner shader) (2) This provides better flexibility for more use cases. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11drm/amd: Forbid suspending into non-default suspend statesMario Limonciello1-1/+13
On systems that default to 'deep' some userspace software likes to try to suspend in 'deep' first. If there is a failure for any reason (such as -ENOMEM) the failure is ignored and then it will try to use 's2idle' as a fallback. This fails, but more importantly it leads to graphical problems. Forbid this behavior and only allow suspending in the last state supported by the system. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4093 Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250408180957.4027643-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08drm/amdgpu: add parameter to disable kernel queuesAlex Deucher1-0/+9
On chips that support user queues, setting this option will disable kernel queues to be used to validate user queues without kernel queues. Reviewed-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08drm/amdgpu/userq: prevent runtime pm when userqs are activeAlex Deucher1-0/+30
Similar to KFD, prevent runtime pm while user queues are active. Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08drm/amdgpu: bump version for user queue IP support queryAlex Deucher1-1/+2
Add the user queue IP support query to the drm_amdgpu_info_device query. Cc: marek.olsak@amd.com Cc: prike.liang@amd.com Cc: sunil.khatri@amd.com Cc: yogesh.mohanmarimuthu@amd.com Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08drm/amdgpu: Fix Illegal opcode in command stream ErrorArvind Yadav1-1/+15
When applications closes, it triggers the drm_file_free function which subsequently releases all allocated buffer objects. Concurrently, the resume_worker thread will attempt to map the usermode queue. However, since the wptr buffer object has already been deallocated, this will result in an Illegal opcode error being raised in the command stream. Now replacing drm_release() with a new function amdgpu_drm_release(). This function will set the flag to prevent the scheduling of any new queue resume/map, stop all queues and then call drm_release(). V2: - Replace drm_release with amdgpu_drm_release(Christian). Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08drm/amdgpu: Implement userqueue signal/wait IOCTLArunpravin Paneer Selvam1-0/+2
This patch introduces new IOCTL for userqueue secure semaphore. The signal IOCTL called from userspace application creates a drm syncobj and array of bo GEM handles and passed in as parameter to the driver to install the fence into it. The wait IOCTL gets an array of drm syncobjs, finds the fences attached to the drm syncobjs and obtain the array of memory_address/fence_value combintion which are returned to userspace. v2: (Christian) - Install fence into GEM BO object. - Lock all BO's using the dma resv subsystem - Reorder the sequence in signal IOCTL function. - Get write pointer from the shadow wptr - use userq_fence to fetch the va/value in wait IOCTL. v3: (Christian) - Use drm_exec helper for the proper BO drm reserve and avoid BO lock/unlock issues. - fence/fence driver reference count logic for signal/wait IOCTLs. v4: (Christian) - Fixed the drm_exec calling sequence - use dma_resv_for_each_fence_unlock if BO's are not locked - Modified the fence_info array storing logic. v5: (Christian) - Keep fence_drv until wait queue execution. - Add dma_fence_wait for other fences. - Lock BO's using drm_exec as the number of fences in them could change. - Install signaled fences as well into BO/Syncobj. - Move Syncobj fence installation code after the drm_exec_prepare_array. - Directly add dma_resv_usage_rw(args->bo_flags.... - remove unnecessary dma_fence_put. v6: (Christian) - Add xarray stuff to store the fence_drv - Implement a function to iterate over the xarray and drop the fence_drv references. - Add drm_exec_until_all_locked() wrapper - Add a check that if we haven't exceeded the user allocated num_fences before adding dma_fence to the fences array. v7: (Christian) - Use memdup_user() for kmalloc_array + copy_from_user - Move the fence_drv references from the xarray into the newly created fence and drop the fence_drv references when we signal this fence. - Move this locking of BOs before the "if (!wait_info->num_fences)", this way you need this code block only once. - Merge the error handling code and the cleanup + return 0 code. - Initializing the xa should probably be done in the userq code. - Remove the userq back pointer stored in fence_drv. - Pass xarray as parameter in amdgpu_userq_walk_and_drop_fence_drv() v8: (Christian) - Move fence_drv references must come before adding the fence to the list. - Use xa_lock_irqsave_nested for nested spinlock operations. - userq_mgr should be per fpriv and not one per device. - Restructure the interrupt process code for the early exit of the loop. - The reference acquired in the syncobj fence replace code needs to be kept around. - Modify the dma_fence acquire placement in wait IOCTL. - Move USERQ_BO_WRITE flag to UAPI header file. - drop the fence drv reference after telling the hw to stop accessing it. - Add multi sync object support to userq signal IOCTL. V9: (Christian) - Store all the fence_drv ref to other drivers and not ourself. - Remove the userq fence xa implementation and replace with kvmalloc_array. v10: (Christian) - Add a comment for the userq_xa xarray - drop the if check of userq_fence->fence_drv_array - use the i variable to initialize userq_fence->fence_drv_array_count - drop the fence reference before you free the array in the error handling, otherwise it could be that some references leaked. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08drm/amdgpu: Implement a new userqueue fence driverArunpravin Paneer Selvam1-0/+6
Developed a userqueue fence driver for the userqueue process shared BO synchronization. Create a dma fence having write pointer as the seqno and allocate a seq64 memory for each user queue process and feed this memory address into the firmware/hardware, thus the firmware writes the read pointer into the given address when the process completes it execution. Compare wptr and rptr, if rptr >= wptr, signal the fences for the waiting process to consume the buffers. v2: Worked on review comments from Christian for the following modifications - Add wptr as sequence number into the fence - Add a reference count for the fence driver - Add dma_fence_put below the list_del as it might frees the userq fence. - Trim unnecessary code in interrupt handler. - Check dma fence signaled state in dma fence creation function for a potential problem of hardware completing the job processing beforehand. - Add necessary locks. - Create a list and process all the unsignaled fences. - clean up fences in destroy function. - implement .signaled callback function v3: Worked on review comments from Christian - Modify naming convention for reference counted objects - Fix fence driver reference drop issue - Drop amdgpu_userq_fence_driver_process() function return value v4: Worked on review comments from Christian - Moved fence driver allocation into amdgpu_userq_fence_driver_alloc() - Added detail doc mentioning the differences b/w two spinlocks declared. v5: Worked on review comments from Christian - Check before upcast and remove local variable - Add error handling in fence_drv alloc function. - Move rptr read fn outside of the loop and remove WARN_ON in destroy function. v6: - clear the seq64 memory in user fence driver(Christian) - fix for the wptr va bo mapping(Christian) - move the fence_drv xa entry erase code from the interrupt handler into user fence destroy function Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08drm/amdgpu: add new IOCTL for usermode queueShashank Sharma1-0/+1
This patch adds: - A new IOCTL function to create and destroy - A new structure to keep all the user queue data in one place. - A function to generate unique index for the queue. V1: Worked on review comments from RFC patch series: - Alex: Keep a list of queues, instead of single queue per process. - Christian: Use the queue manager instead of global ptrs, Don't keep the queue structure in amdgpu_ctx V2: Worked on review comments: - Christian: - Formatting of text - There is no need for queuing of userqueues, with idr in place - Alex: - Remove use_doorbell, its unnecessary - Reuse amdgpu_mqd_props for saving mqd fields - Code formatting and re-arrangement V3: - Integration with doorbell manager V4: - Accommodate MQD union related changes in UAPI (Alex) - Do not set the queue size twice (Bas) V5: - Remove wrapper functions for queue indexing (Christian) - Do not save the queue id/idr in queue itself (Christian) - Move the idr allocation in the IP independent generic space (Christian) V6: - Check the validity of input IP type (Christian) V7: - Move uq_func from uq_mgr to adev (Alex) - Add missing free(queue) for error cases (Yifan) V9: - Rebase V10: Addressed review comments from Christian, and added R-B: - Do not initialize the local variable - Convert DRM_ERROR to DEBUG. V11: - check the input flags to be zero (Alex) Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Reviewed-by: Christian Koenig <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08drm/amdgpu: add usermode queue base codeShashank Sharma1-0/+1
This patch adds IP independent skeleton code for amdgpu usermode queue. It contains: - A new files with init functions of usermode queues. - A queue context manager in driver private data. V1: Worked on design review comments from RFC patch series: (https://patchwork.freedesktop.org/series/112214/) - Alex: Keep a list of queues, instead of single queue per process. - Christian: Use the queue manager instead of global ptrs, Don't keep the queue structure in amdgpu_ctx V2: - Reformatted code, split the big patch into two V3: - Integration with doorbell manager V4: - Align the structure member names to the largest member's column (Luben) - Added SPDX license (Luben) V5: - Do not add amdgpu.h in amdgpu_userqueue.h (Christian). - Move struct amdgpu_userq_mgr into amdgpu_userqueue.h (Christian). V6: Rebase V9: Rebase V10: Rebase + Alex's R-B Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-07drm/amdgpu: add rebar parameterAlex Deucher1-0/+11
Add a new parameter to disable BAR resizing. Note that this only disables the driver from attempting to resize the BAR, The BIOS may have resized the BAR at boot. Some teams have found this useful in debugging P2P DMA issues on systems where the available MMIO space did not allow for all of the GPUs present to resize their BARs. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-27drm/amd: Handle being compiled without SI or CIK support betterMario Limonciello1-20/+24
If compiled without SI or CIK support but amdgpu tries to load it will run into failures with uninitialized callbacks. Show a nicer message in this case and fail probe instead. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4050 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-03-19Documentation/amdgpu: Add debug_mask documentationLijo Lazar1-0/+5
Add description for debug_mask bit options. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-19drm/amd/pm: Add debug bit for smu pool allocationLijo Lazar1-0/+5
In certain cases, it's desirable to avoid PMFW log transactions to system memory. Add a mask bit to decide whether to allocate smu pool in device memory or system memory. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-18drm/amdgpu: adjust drm_firmware_drivers_only() handlingAlex Deucher1-0/+14
Move to probe so we can check the PCI device type and only apply the drm_firmware_drivers_only() check for PCI DISPLAY classes. Also add a module parameter to override the nomodeset kernel parameter as a workaround for platforms that have this hardcoded on their kernel command lines. Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-18drm/amdgpu: drop drm_firmware_drivers_only()Alex Deucher1-3/+0
There are a number of systems and cloud providers out there that have nomodeset hardcoded in their kernel parameters to block nouveau for the nvidia driver. This prevents the amdgpu driver from loading. Unfortunately the end user cannot easily change this. The preferred way to block modules from loading is to use modprobe.blacklist=<driver>. That is what providers should be using to block specific drivers. Drop the check to allow the driver to load even when nomodeset is specified on the kernel command line. Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-10drm/amd/display: allow 256B DCC max compressed block sizes on gfx12Marek Olšák1-1/+2
The hw supports it. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-07drm/amd: Keep display off while going into S4Mario Limonciello1-2/+9
When userspace invokes S4 the flow is: 1) amdgpu_pmops_prepare() 2) amdgpu_pmops_freeze() 3) Create hibernation image 4) amdgpu_pmops_thaw() 5) Write out image to disk 6) Turn off system Then on resume amdgpu_pmops_restore() is called. This flow has a problem that because amdgpu_pmops_thaw() is called it will call amdgpu_device_resume() which will resume all of the GPU. This includes turning the display hardware back on and discovering connectors again. This is an unexpected experience for the display to turn back on. Adjust the flow so that during the S4 sequence display hardware is not turned back on. Reported-by: Xaver Hugl <xaver.hugl@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038 Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-28drm/amdgpu: Create a debug option to disable ring resetAndré Almeida1-0/+6
Prior to the addition of ring reset, the debug option `debug_disable_soft_recovery` could be used to force a full device reset. Now that we have ring reset, create a debug option to disable them in amdgpu, forcing the driver to go with the full device reset path again when both options are combined. This option is useful for testing and debugging purposes when one wants to test the full reset from userspace. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-13drm/amdgpu: Add flags to distinguish vf/pf/pt modeAsad Kamal1-1/+2
Add extra flag definition for ids_flag field to distinguish between vf/pf/pt modes v2: Updated kms driver minor version & removed pf check as default is 0 v3: Fix up version (Alex) v4: rebase (Alex) Proposed userspace: https://github.com/ROCm/amdsmi/commit/e663bed7d6b3df79f5959e73981749b1f22ec698 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-13drm/amdgpu: Update usage for bad page thresholdHawking Zhang1-1/+1
The driver's behavior varies based on the configuration of amdgpu_bad_page_threshold setting Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-13drm/amd: Mark amdgpu.gttsize parameter as deprecated and show warnings on useMario Limonciello1-0/+1
When not set `gttsize` module parameter by default will get the value to use for the GTT pool from the TTM page limit, which is set by a separate module parameter. This inevitably leads to people not sure which one to set when they want more addressable memory for the GPU, and you'll end up seeing instructions online saying to set both. Add some messages to try to guide people both who are using or misusing the parameters and mark the parameter as deprecated with the plan to drop it after the next LTS kernel release. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-13drm/amdgpu: bump version for RV/PCO compute fixAlex Deucher1-1/+2
Bump the driver version for RV/PCO compute stability fix so mesa can use this check to enable compute queues on RV/PCO. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x
2025-02-03drm/amdgpu: add a BO metadata flag to disable write compression for VulkanMarek Olšák1-1/+2
Vulkan can't support DCC and Z/S compression on GFX12 without WRITE_COMPRESS_DISABLE in this commit or a completely different DCC interface. AMDGPU_TILING_GFX12_SCANOUT is added because it's already used by userspace. Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-24drm/amd: Clarify kdoc for amdgpu.gttsizeMario Limonciello1-1/+1
Effectively amdgpu.gttsize gets set to ~1/2 of RAM, but that's controlled by what the TTM page limit is set to. Clarify the kdoc. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-14drm/amdgpu: Mark debug KFD module params as unsafeKent Russell1-5/+5
Mark options only meant to be used for debugging as unsafe so that the kernel is tainted when they are used. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-14drm/amdgpu: mark a bunch of module parameters unsafeChristian König1-7/+7
We sometimes have people trying to use debugging options in production environments. Mark options only meant to be used for debugging as unsafe so that the kernel is tainted when they are used. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-20Merge tag 'amd-drm-next-6.14-2024-12-18' of ↵Dave Airlie1-1/+0
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.14-2024-12-18: amdgpu: - RAS updates - ISP updates - SDMA queue reset support - Rework DPM powergating interfaces - Documentation updates and cleanups - Panel replay fixes - DCN 3.5 updates - DP tunneling fixes - Use a pm notifier to more gracefully handle VRAM eviction on suspend or hibernate - Add debugfs interfaces for forcing scheduling to specific engine instances - GG 9.5 updates - IH 4.4 updates - Make missing optional firmware less noisy - PSP 13.x updates - SMU 13.x updates - VCN 5.x updates - JPEG 5.x updates - Misc cleanups - GC 12.x updates - DRM panic support - DC FAMS updates - DSC fixes - job handling fixes amdkfd: - GG 9.5 updates - Logging improvements - Misc cleanups - Various Optimizations Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241218201758.2580723-1-alexander.deucher@amd.com
2024-12-10drm/amd: Add Suspend/Hibernate notification callback supportMario Limonciello1-1/+0
As part of the suspend sequence VRAM needs to be evicted on dGPUs. In order to make suspend/resume more reliable we moved this into the pmops prepare() callback so that the suspend sequence would fail but the system could remain operational under high memory usage suspend. Another class of issues exist though where due to memory fragementation there isn't a large enough contiguous space and swap isn't accessible. Add support for a suspend/hibernate notification callback that could evict VRAM before tasks are frozen. This should allow paging out to swap if necessary. Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3781 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20241128032656.2090059-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-05drm: remove driver date from struct drm_driver and all driversJani Nikula1-2/+0
We stopped using the driver initialized date in commit 7fb8af6798e8 ("drm: deprecate driver date") and (eventually) started returning "0" for drm_version ioctl instead. Finish the job, and remove the unused date member from struct drm_driver, its initialization from drivers, along with the common DRIVER_DATE macros. v2: Also update drivers/accel (kernel test robot) Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Simon Ser <contact@emersion.fr> Acked-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> # msm Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/1f2bf2543aed270a06f6c707fd6ed1b78bf16712.1733322525.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>