summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2023-01-05Revert "drm/amd/display: Enable Freesync Video Mode by default"Michel Dänzer2-0/+28
This reverts commit de05abe6b9d0fe08f65d744f7f75a4cba4df27ad. The bug referenced below was bisected to this commit. There has been no activity toward fixing it in 3 months, so let's revert for now. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-01-04Merge tag 'drm-misc-next-2023-01-03' of ↵Daniel Vetter14-15/+18
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v6.3: UAPI Changes: * connector: Support analog-TV mode property * media: Add MEDIA_BUS_FMT_RGB565_1X24_CPADHI, MEDIA_BUS_FMT_RGB666_1X18 and MEDIA_BUS_FMT_RGB666_1X24_CPADHI Cross-subsystem Changes: * dma-buf: Documentation fixes * i2c: Introduce i2c_client_get_device_id() helper Core Changes: * Improve support for analog TV output * bridge: Remove unused drm_bridge_chain functions * debugfs: Add per-device helpers and convert various DRM drivers * dp-mst: Various fixes * fbdev emulation: Always pick 32 bpp as default * KUnit: Add tests for managed helpers; Various cleanups * panel-orientation: Add quirks for Lenovo Yoga Tab 3 X90F and DynaBook K50 * TTM: Open-code ttm_bo_wait() and remove the helper Driver Changes: * Fix preferred depth and bpp values throughout DRM drivers * Remove #CONFIG_PM guards throughout DRM drivers * ast: Various fixes * bridge: Implement i2c's probe_new in various drivers; Fixes; ite-it6505: Locking fixes, Cache EDID data; ite-it66121: Support IT6610 chip, Cleanups; lontium-tl9611: Fix HDMI on DragonBoard 845c; parade-ps8640: Use atomic bridge functions * gud: Convert to DRM shadow-plane helpers; Perform flushing synchronously during atomic update * ili9486: Support 16-bit pixel data * imx: Split off IPUv3 driver; Various fixes * mipi-dbi: Convert to DRM shadow-plane helpers plus rsp driver changes;i Support separate I/O-voltage supply * mxsfb: Depend on ARCH_MXS or ARCH_MXC * omapdrm: Various fixes * panel: Use ktime_get_boottime() to measure power-down delay in various drivers; Fix auto-suspend delay in various drivers; orisetech-ota5601a: Add support * sprd: Cleanups * sun4i: Convert to new TV-mode property * tidss: Various fixes * v3d: Various fixes * vc4: Convert to new TV-mode property; Support Kunit tests; Cleanups; dpi: Support RGB565 and RGB666 formats; dsi: Convert DSI driver to bridge * virtio: Improve tracing * vkms: Support small cursors in IGT tests; Various fixes Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/Y7QIwlfElAYWxRcR@linux-uq9g
2023-01-04Revert "drm/amd/display: Enable Freesync Video Mode by default"Michel Dänzer2-0/+28
This reverts commit de05abe6b9d0fe08f65d744f7f75a4cba4df27ad. The bug referenced below was bisected to this commit. There has been no activity toward fixing it in 3 months, so let's revert for now. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-04drm/amdgpu: remove enable ras cmd call traceStanley.Yang1-3/+13
[Why] [ 41.285804] RIP: 0010:amdgpu_ras_feature_enable+0x15c/0x310 [amdgpu] [ 41.285945] Code: 48 89 c1 48 c7 c2 b9 f2 88 c1 48 c7 c0 c0 f2 88 c1 49 8b 3c 24 48 0f 44 d0 48 c7 c6 98 33 80 c1 e8 5f 52 75 d9 e9 fa fe ff ff <0f> 0b e9 66 ff ff ff 48 8b 3d 86 8c 0f da ba 00 04 00 00 be c0 0d [ 41.285946] RSP: 0018:ffffbccdc72efc90 EFLAGS: 00010246 [ 41.285948] RAX: 0000000000000004 RBX: ffff931897406980 RCX: 0000000000000002 [ 41.285949] RDX: 0000000000000dc0 RSI: 0000000000000002 RDI: ffff931500042b00 [ 41.285950] RBP: ffffbccdc72efcc0 R08: 0000000000000002 R09: ffff931885b87000 [ 41.285951] R10: 0000000000ffff10 R11: 0000000000000001 R12: ffff931893e20000 [ 41.285952] R13: 0000000000000001 R14: ffff931885b87000 R15: 0000000000000000 [ 41.285953] FS: 0000000000000000(0000) GS:ffff931c6f200000(0000) knlGS:0000000000000000 [ 41.285954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.285955] CR2: 000055dd6f532008 CR3: 000000061b010006 CR4: 00000000003706e0 [ 41.285956] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 41.285957] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 41.285958] Call Trace: [ 41.285959] <TASK> [ 41.285963] ? gfx_v11_0_early_init+0x250/0x250 [amdgpu] [ 41.286117] gfx_v11_0_late_init+0x8c/0xb0 [amdgpu] [ 41.286271] amdgpu_device_ip_late_init+0x8d/0x3c0 [amdgpu] [ 41.286401] amdgpu_device_init.cold+0x1677/0x1fda [amdgpu] [ 41.286616] ? pci_bus_read_config_word+0x4a/0x70 [ 41.286621] ? do_pci_enable_device+0xdb/0x110 [ 41.286625] amdgpu_driver_load_kms+0x1a/0x160 [amdgpu] [ 41.286762] amdgpu_pci_probe+0x18d/0x3a0 [amdgpu] [ 41.286898] local_pci_probe+0x4b/0x90 [ 41.286901] work_for_cpu_fn+0x1a/0x30 [ 41.286903] process_one_work+0x22b/0x3d0 [ 41.286905] worker_thread+0x223/0x420 [ 41.286907] ? process_one_work+0x3d0/0x3d0 [ 41.286908] kthread+0x12a/0x150 [ 41.286911] ? set_kthread_struct+0x50/0x50 [ 41.286913] ret_from_fork+0x22/0x30 [How] For specific asic, only mem ecc is enabled, sram ecc is not enabled, but it still need to send ras enable cmd to gfx block to support poison mode, so add check posion mode. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-04drm/amdgpu: correct umc poison mode set valueStanley.Yang1-20/+4
For GFX 11.0.3, Due to security policy, there is no way to check UcFatalEn field of UMCCH0_0_GeccCtrl to identify UMC poison mode. This is workaround force set umc poison mode as 1 for GFX 11.0.3 Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-04drm/amdgpu: allow zero as vram limitChristian König3-3/+3
This allows testing the driver without any VRAM. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-04drm/amdgpu: cleanup visible vram size handlingChristian König6-19/+7
Centralize the limit handling and validation in one place instead of spreading that around in different hw generations. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-04drm/amdgpu: rename vram_scratch into mem_scratchChristian König18-32/+32
Rename vram_scratch into mem_scratch and allow allocating it into GTT as well. The only problem with that is that we won't have a default page for the system aperture any more. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-04drm/amdgpu: use VRAM|GTT for a bunch of kernel allocationsChristian König16-49/+101
Technically all of those can use GTT as well, no need to force things into VRAM. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-04drm/amdgpu: enable VCN DPG for GC IP v11.0.4Saleemkhan Jamadar1-0/+1
Enable VCN Dynamic Power Gating control for GC IP v11.0.4. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-04drm/amdgpu: adjust the sequence to check soft resetLikun Gao1-6/+2
1.Drop soft reset check when do should recover gpu check. (As it will skip gpu reset operation if some ip is hang but not support soft reset) 2.Check soft reset status before do soft reset when pre asic reset. a. If check soft reset return true, it means: some ip is hang and it also support soft reset, will try soft reset first. b. If check soft reset return false, it means: I. All the ip are not hang, will skip gpu reset. II. Some ip is hang but not support soft reset, will skip soft reset and retry with full reset later. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-23Merge tag 'drm-next-2022-12-23' of git://anongit.freedesktop.org/drm/drmLinus Torvalds22-234/+355
Pull drm fixes from Dave Airlie: "Holiday fixes! Two batches from amd, and one group of i915 changes. amdgpu: - Spelling fix - BO pin fix - Properly handle polaris 10/11 overlap asics - GMC9 fix - SR-IOV suspend fix - DCN 3.1.4 fix - KFD userptr locking fix - SMU13.x fixes - GDS/GWS/OA handling fix - Reserved VMID handling fixes - FRU EEPROM fix - BO validation fixes - Avoid large variable on the stack - S0ix fixes - SMU 13.x fixes - VCN fix - Add missing fence reference amdkfd: - Fix init vm error handling - Fix double release of compute pasid i915 - Documentation fixes - OA-perf related fix - VLV/CHV HDMI/DP audio fix - Display DDI/Transcoder fix - Migrate fixes" * tag 'drm-next-2022-12-23' of git://anongit.freedesktop.org/drm/drm: (39 commits) drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependency drm/amdgpu: enable VCN DPG for GC IP v11.0.4 drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0 drm/amd/pm: correct the fan speed retrieving in PWM for some SMU13 asics drm/amd/pm: bump SMU13.0.0 driver_if header to version 0x34 drm/amdgpu: skip MES for S0ix as well since it's part of GFX drm/amd/pm: avoid large variable on kernel stack drm/amdkfd: Fix double release compute pasid drm/amdkfd: Fix kfd_process_device_init_vm error handling drm/amd/pm: update SMU13.0.0 reported maximum shader clock drm/amd/pm: correct SMU13.0.0 pstate profiling clock settings drm/amd/pm: enable GPO dynamic control support for SMU13.0.7 drm/amd/pm: enable GPO dynamic control support for SMU13.0.0 drm/amdgpu: revert "generally allow over-commit during BO allocation" drm/amdgpu: Remove unnecessary domain argument drm/amdgpu: Fix size validation for non-exclusive domains (v4) drm/amdgpu: Check if fru_addr is not NULL (v2) drm/i915/ttm: consider CCS for backup objects drm/i915/migrate: fix corner case in CCS aux copying drm/amdgpu: rework reserved VMID handling ...
2022-12-21drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependencyChristian König1-0/+2
That function consumes the reference. Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Reported-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: aab9cf7b6954 ("drm/amdgpu: use scheduler dependencies for VM updates") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-21drm/amdgpu: enable VCN DPG for GC IP v11.0.4Saleemkhan Jamadar1-0/+1
Enable VCN Dynamic Power Gating control for GC IP v11.0.4. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0, 6.1
2022-12-20drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0Tim Huang1-1/+2
MES is part of gfxoff and MES suspend and resume are skipped for S0i3. But the mes_self_test call path is still in the amdgpu_device_ip_late_init. it's should also be skipped for s0ix as no hardware re-initialization happened. Besides, mes_self_test will free the BO that triggers a lot of warning messages while in the suspend state. [ 81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu] [ 81.679435] Call Trace: [ 81.679726] <TASK> [ 81.679981] amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu] [ 81.680857] amdgpu_mes_self_test+0x390/0x430 [amdgpu] [ 81.681665] mes_v11_0_late_init+0x37/0x50 [amdgpu] [ 81.682423] amdgpu_device_ip_late_init+0x53/0x280 [amdgpu] [ 81.683257] amdgpu_device_resume+0xae/0x2a0 [amdgpu] [ 81.684043] amdgpu_pmops_resume+0x37/0x70 [amdgpu] [ 81.684818] pci_pm_resume+0x5c/0xa0 [ 81.685247] ? pci_pm_thaw+0x90/0x90 [ 81.685658] dpm_run_callback+0x4e/0x160 [ 81.686110] device_resume+0xad/0x210 [ 81.686529] async_resume+0x1e/0x40 [ 81.686931] async_run_entry_fn+0x33/0x120 [ 81.687405] process_one_work+0x21d/0x3f0 [ 81.687869] worker_thread+0x4a/0x3c0 [ 81.688293] ? process_one_work+0x3f0/0x3f0 [ 81.688777] kthread+0xff/0x130 [ 81.689157] ? kthread_complete_and_exit+0x20/0x20 [ 81.689707] ret_from_fork+0x22/0x30 [ 81.690118] </TASK> [ 81.690380] ---[ end trace 0000000000000000 ]--- v2: make the comment clean and use adev->in_s0ix instead of adev->suspend Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0, 6.1
2022-12-20drm/amdgpu: skip MES for S0ix as well since it's part of GFXAlex Deucher1-2/+3
It's also part of gfxoff. Cc: stable@vger.kernel.org # 6.0, 6.1 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20drm/amdkfd: Fix double release compute pasidPhilip Yang2-12/+31
If kfd_process_device_init_vm returns failure after vm is converted to compute vm and vm->pasid set to compute pasid, KFD will not take pdd->drm_file reference. As a result, drm close file handler maybe called to release the compute pasid before KFD process destroy worker to release the same pasid and set vm->pasid to zero, this generates below WARNING backtrace and NULL pointer access. Add helper amdgpu_amdkfd_gpuvm_set_vm_pasid and call it at the last step of kfd_process_device_init_vm, to ensure vm pasid is the original pasid if acquiring vm failed or is the compute pasid with pdd->drm_file reference taken to avoid double release same pasid. amdgpu: Failed to create process VM object ida_free called for id=32770 which is not allocated. WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140 RIP: 0010:ida_free+0x96/0x140 Call Trace: amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu] amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu] drm_file_free.part.13+0x216/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 ____fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10 BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:ida_free+0x76/0x140 Call Trace: amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu] amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu] drm_file_free.part.13+0x216/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 ____fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0Tim Huang1-1/+2
MES is part of gfxoff and MES suspend and resume are skipped for S0i3. But the mes_self_test call path is still in the amdgpu_device_ip_late_init. it's should also be skipped for s0ix as no hardware re-initialization happened. Besides, mes_self_test will free the BO that triggers a lot of warning messages while in the suspend state. [ 81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu] [ 81.679435] Call Trace: [ 81.679726] <TASK> [ 81.679981] amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu] [ 81.680857] amdgpu_mes_self_test+0x390/0x430 [amdgpu] [ 81.681665] mes_v11_0_late_init+0x37/0x50 [amdgpu] [ 81.682423] amdgpu_device_ip_late_init+0x53/0x280 [amdgpu] [ 81.683257] amdgpu_device_resume+0xae/0x2a0 [amdgpu] [ 81.684043] amdgpu_pmops_resume+0x37/0x70 [amdgpu] [ 81.684818] pci_pm_resume+0x5c/0xa0 [ 81.685247] ? pci_pm_thaw+0x90/0x90 [ 81.685658] dpm_run_callback+0x4e/0x160 [ 81.686110] device_resume+0xad/0x210 [ 81.686529] async_resume+0x1e/0x40 [ 81.686931] async_run_entry_fn+0x33/0x120 [ 81.687405] process_one_work+0x21d/0x3f0 [ 81.687869] worker_thread+0x4a/0x3c0 [ 81.688293] ? process_one_work+0x3f0/0x3f0 [ 81.688777] kthread+0xff/0x130 [ 81.689157] ? kthread_complete_and_exit+0x20/0x20 [ 81.689707] ret_from_fork+0x22/0x30 [ 81.690118] </TASK> [ 81.690380] ---[ end trace 0000000000000000 ]--- v2: make the comment clean and use adev->in_s0ix instead of adev->suspend Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20drm/amdgpu: skip MES for S0ix as well since it's part of GFXAlex Deucher1-2/+3
It's also part of gfxoff. Cc: stable@vger.kernel.org # 6.0, 6.1 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20Revert "drm/amdgpu: force exit gfxoff on sdma resume for rmb s0ix"Alex Deucher1-8/+0
This reverts commit e5d59cfa330523e47cba62a496864acc3948fc27. This is no longer needed since we no longer suspend SDMA during S0ix. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20Revert "drm/amdgpu: disallow gfxoff until GC IP blocks complete s2idle resume"Alex Deucher1-16/+0
This reverts commit f543d28687480fad06b708bc6e0b0b6ec953b078. This is no longer needed since we no longer touch SDMA 5.x for s0i3. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20drm/amdgpu: for S0ix, skip SDMA 5.x+ suspend/resumeAlex Deucher1-0/+6
SDMA 5.x is part of the GFX block so it's controlled via GFXOFF. Skip suspend as it should be handled the same as GFX. v2: drop SDMA 4.x. That requires special handling. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20drm/amdgpu: don't mess with SDMA clock or powergating in S0ixAlex Deucher1-4/+6
It's handled by GFXOFF for SDMA 5.x and SMU saves the state on SDMA 4.x. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20drm/amdgpu/gmc11: don't touch gfxhub registers during S0ixAlex Deucher1-2/+14
gfxhub registers are part of gfx IP and should not need to be changed. Doing so without disabling gfxoff can hang the gfx IP. v2: add comments explaining why we can skip the interrupt control for S0i3 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20drm/amdgpu/gmc10: don't touch gfxhub registers during S0ixAlex Deucher1-9/+27
gfxhub registers are part of gfx IP and should not need to be changed. Doing so without disabling gfxoff can hang the gfx IP. v2: add comments explaining why we can skip the interrupt control for S0i3 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20drm/amdgpu/gmc9: don't touch gfxhub registers during S0ixAlex Deucher1-6/+30
gfxhub registers are part of gfx IP and should not need to be changed. Doing so without disabling gfxoff can hang the gfx IP. v2: add comments explaining why we can skip the interrupt control for S0i3 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-20drm/amdkfd: Fix double release compute pasidPhilip Yang2-12/+31
If kfd_process_device_init_vm returns failure after vm is converted to compute vm and vm->pasid set to compute pasid, KFD will not take pdd->drm_file reference. As a result, drm close file handler maybe called to release the compute pasid before KFD process destroy worker to release the same pasid and set vm->pasid to zero, this generates below WARNING backtrace and NULL pointer access. Add helper amdgpu_amdkfd_gpuvm_set_vm_pasid and call it at the last step of kfd_process_device_init_vm, to ensure vm pasid is the original pasid if acquiring vm failed or is the compute pasid with pdd->drm_file reference taken to avoid double release same pasid. amdgpu: Failed to create process VM object ida_free called for id=32770 which is not allocated. WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140 RIP: 0010:ida_free+0x96/0x140 Call Trace: amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu] amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu] drm_file_free.part.13+0x216/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 ____fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10 BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:ida_free+0x76/0x140 Call Trace: amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu] amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu] drm_file_free.part.13+0x216/0x270 [drm] drm_close_helper.isra.14+0x60/0x70 [drm] drm_release+0x6e/0xf0 [drm] __fput+0xcc/0x280 ____fput+0xe/0x20 task_work_run+0x96/0xc0 do_exit+0x3d0/0xc10 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: Add poison mode query for df v4_3Candice Li4-1/+98
Add poison mode query support on df v4_3. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: bump minor version number for DEV_INFO and SENSOR IOCTLs updateEvan Quan1-2/+4
Update AMDGPU_INFO_DEV_INFO IOCTL for minimum engine and memory clock. And update AMDGPU_INFO_SENSOR IOCTL for PEAK_PSTATE engine and memory clock. User applications can better utilize these IOCTLs to get needed informations. Increase the minor version number to indicate that the new flags are available. Proposed mesa patch: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/278 Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: expose the minimum shader/memory clock frequencyEvan Quan1-2/+8
Otherwise, some UMD tools will treate them as 0 at default while actually they are not. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: expose peak profiling mode shader/memory clocksEvan Quan1-0/+18
Expose those informations to UMD who need them as for standard profiling mode. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: define RAS query poison mode functionTao Zhou1-21/+33
1. no need to query poison mode on SRIOV guest side, host can handle it. 2. define the function to simplify code. v2: rename amdgpu_ras_poison_mode_query to amdgpu_ras_query_poison_mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: update VCN/JPEG RAS settingTao Zhou1-11/+13
Support VCN/JPEG RAS in both bare metal and SRIOV environment. v2: update commit description. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: skip RAS error injection in SRIOVTao Zhou1-0/+4
Injection on guest is not allowed. v2: return directly in SRIOV environment. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: add VCN poison consumption handler for SRIOVTao Zhou1-2/+10
Inform host and let host handle consumption interrupt. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: add RAS poison consumption handler for SRIOVTao Zhou1-18/+26
Send message to PF if VF receives RAS poison consumption interrupt. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: add RAS poison consumption handler for NV SRIOVTao Zhou2-0/+7
Send handling request to host. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: add RAS poison consumption handler for AI SRIOVTao Zhou3-0/+8
Send message to host and host will handle it. v2: split the patch into two parts, one is for mxgpu ai and another one is for common poison consumption handler. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: revert "generally allow over-commit during BO allocation"Christian König2-4/+18
This reverts commit f9d00a4a8dc8fff951c97b3213f90d6bc7a72175. This causes problem for KFD because when we overcommit we accidentially bind the BO to GTT for moving it into VRAM. We also need to make sure that this is done only as fallback after trying to evict first. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: Remove unnecessary domain argumentLuben Tuikov4-14/+6
Remove the "domain" argument to amdgpu_bo_create_kernel_at() since this function takes an "offset" argument which is the offset off of VRAM, and as such allocation always takes place in VRAM. Thus, the "domain" argument is unnecessary. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: Fix size validation for non-exclusive domains (v4)Luben Tuikov1-11/+8
Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the requested memory exists, else we get a kernel oops when dereferencing "man". v2: Make the patch standalone, i.e. not dependent on local patches. v3: Preserve old behaviour and just check that the manager pointer is not NULL. v4: Complain if GTT domain requested and it is uninitialized--most likely a bug. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: Check if fru_addr is not NULL (v2)Luben Tuikov1-2/+4
Always check if fru_addr is not NULL. This commit also fixes a "smatch" warning. v2: Add a Fixes tag. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Dan Carpenter <error27@gmail.com> Cc: kernel test robot <lkp@intel.com> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Fixes: afbe5d1e4bd7c7 ("drm/amdgpu: Bug-fix: Reading I2C FRU data on newer ASICs") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: rework reserved VMID handlingChristian König3-30/+24
Instead of reserving a VMID for a single process allow that many processes use the reserved ID. This allows for proper isolation between the processes. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: stop waiting for the VM during unreserveChristian König1-16/+0
This is completely pointless since the VMID always stays allocated until the VM is idle. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: cleanup SPM support a bitChristian König3-4/+7
This should probably not access job->vm and also emit the SPM switch under the conditional execute. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: fix GDS/GWS/OA switch handlingChristian König3-43/+54
Bas pointed out that this isn't working as expected and could cause crashes. Fix the handling by storing the marker that a switch is needed inside the job instead. Reported-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: Add notifier lock for KFD userptrsFelix Kuehling6-91/+172
Add a per-process MMU notifier lock for processing notifiers from userptrs. Use that lock to properly synchronize page table updates with MMU notifiers. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Xiaogang Chen<Xiaogang.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: WARN when freeing kernel memory during suspendChristian König1-0/+2
When buffers are freed during suspend there is no guarantee that they can be re-allocated during resume. The PSP subsystem seems to be quite buggy regarding this, so add a WARN_ON() to point out those bugs. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexdeucher@amd.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14drm/amdgpu: fixx NULL pointer deref in gmc_v9_0_get_vm_pteChristian König1-1/+3
We not only need to make sure that we have a BO, but also that the BO has some backing store. Fixes: d1a372af1c3d ("drm/amdgpu: Set MTYPE in PTE based on BO flags") Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-14Merge tag 'mm-stable-2022-12-13' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - More userfaultfs work from Peter Xu - Several convert-to-folios series from Sidhartha Kumar and Huang Ying - Some filemap cleanups from Vishal Moola - David Hildenbrand added the ability to selftest anon memory COW handling - Some cpuset simplifications from Liu Shixin - Addition of vmalloc tracing support by Uladzislau Rezki - Some pagecache folioifications and simplifications from Matthew Wilcox - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it - Miguel Ojeda contributed some cleanups for our use of the __no_sanitize_thread__ gcc keyword. This series should have been in the non-MM tree, my bad - Naoya Horiguchi improved the interaction between memory poisoning and memory section removal for huge pages - DAMON cleanups and tuneups from SeongJae Park - Tony Luck fixed the handling of COW faults against poisoned pages - Peter Xu utilized the PTE marker code for handling swapin errors - Hugh Dickins reworked compound page mapcount handling, simplifying it and making it more efficient - Removal of the autonuma savedwrite infrastructure from Nadav Amit and David Hildenbrand - zram support for multiple compression streams from Sergey Senozhatsky - David Hildenbrand reworked the GUP code's R/O long-term pinning so that drivers no longer need to use the FOLL_FORCE workaround which didn't work very well anyway - Mel Gorman altered the page allocator so that local IRQs can remnain enabled during per-cpu page allocations - Vishal Moola removed the try_to_release_page() wrapper - Stefan Roesch added some per-BDI sysfs tunables which are used to prevent network block devices from dirtying excessive amounts of pagecache - David Hildenbrand did some cleanup and repair work on KSM COW breaking - Nhat Pham and Johannes Weiner have implemented writeback in zswap's zsmalloc backend - Brian Foster has fixed a longstanding corner-case oddity in file[map]_write_and_wait_range() - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen - Shiyang Ruan has done some work on fsdax, to make its reflink mode work better under xfstests. Better, but still not perfect - Christoph Hellwig has removed the .writepage() method from several filesystems. They only need .writepages() - Yosry Ahmed wrote a series which fixes the memcg reclaim target beancounting - David Hildenbrand has fixed some of our MM selftests for 32-bit machines - Many singleton patches, as usual * tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits) mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio mm: mmu_gather: allow more than one batch of delayed rmaps mm: fix typo in struct pglist_data code comment kmsan: fix memcpy tests mm: add cond_resched() in swapin_walk_pmd_entry() mm: do not show fs mm pc for VM_LOCKONFAULT pages selftests/vm: ksm_functional_tests: fixes for 32bit selftests/vm: cow: fix compile warning on 32bit selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem mm,thp,rmap: fix races between updates of subpages_mapcount mm: memcg: fix swapcached stat accounting mm: add nodes= arg to memory.reclaim mm: disable top-tier fallback to reclaim on proactive reclaim selftests: cgroup: make sure reclaim target memcg is unprotected selftests: cgroup: refactor proactive reclaim code to reclaim_until() mm: memcg: fix stale protection of reclaim target memcg mm/mmap: properly unaccount memory on mas_preallocate() failure omfs: remove ->writepage jfs: remove ->writepage ...