summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2023-12-13drm/amdgpu: Do not program VF copy regs in mmhub v1.8 under SRIOV (v2)Victor Lu1-3/+3
[ Upstream commit 0288603040c38ccfeb5342f34a52673366d90038 ] MC_VM_AGP_* registers should not be programmed by guest driver. v2: move early return outside of loop Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Samir Dhume <samir.dhume@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-08drm/amdgpu: Update EEPROM I2C address for smu v13_0_0Candice Li1-0/+6
commit e0409021e34af50e7b6f31635c8d21583d7c43dd upstream. Check smu v13_0_0 SKU type to select EEPROM I2C address. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08drm/amdgpu: fix memory overflow in the IB testTim Huang4-7/+7
commit 6b0b7789a7a5f3e69185449f891beea58e563f9b upstream. Fix a memory overflow issue in the gfx IB test for some ASICs. At least 20 bytes are needed for the IB test packet. v2: correct code indentation errors. (Christian) Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08drm/amdgpu: Force order between a read and write to the same addressAlex Sierra1-0/+8
commit 4b27a33c3b173bef1d19ba89e0b9b812b4fddd25 upstream. Setting register to force ordering to prevent read/write or write/read hazards for un-cached modes. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08drm/amdgpu: correct the amdgpu runtime dereference usage countPrike Liang1-6/+3
commit c6df7f313794c3ad41a49b9a7c95da369db607f3 upstream. Fix the amdgpu runpm dereference usage count. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08drm/amd: Enable PCIe PME from D3Mario Limonciello1-0/+2
commit 6967741d26c87300a51b5e50d4acd104bc1a9759 upstream. When dGPU is put into BOCO it may be in D3cold but still able send PME on display hotplug event. For this to work it must be enabled as wake source from D3. When runpm is enabled use pci_wake_from_d3() to mark wakeup as enabled by default. Cc: stable@vger.kernel.org # 6.1+ Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28drm/amdgpu: Fix possible null pointer dereferenceFelix Kuehling1-2/+2
commit 256503071c2de2b5b5c20e06654aa9a44f13aa62 upstream. mem = bo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28drm/amdgpu: lower CS errors to debug severityChristian König1-1/+1
commit 17daf01ab4e3e5a5929747aa05cc15eb2bad5438 upstream. Otherwise userspace can spam the logs by using incorrect input values. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28drm/amdgpu: fix error handling in amdgpu_bo_list_get()Christian König1-0/+1
commit 12f76050d8d4d10dab96333656b821bd4620d103 upstream. We should not leak the pointer where we couldn't grab the reference on to the caller because it can be that the error handling still tries to put the reference then. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28drm/amdgpu: fix error handling in amdgpu_vm_initChristian König1-15/+16
commit 8473bfdcb5b1a32fd05629c4535ccacd73bc5567 upstream. When clearing the root PD fails we need to properly release it again. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28drm/amdgpu: don't use ATRM for external devicesAlex Deucher1-0/+5
commit 432e664e7c98c243fab4c3c95bd463bea3aeed28 upstream. The ATRM ACPI method is for fetching the dGPU vbios rom image on laptops and all-in-one systems. It should not be used for external add in cards. If the dGPU is thunderbolt connected, don't try ATRM. v2: pci_is_thunderbolt_attached only works for Intel. Use pdev->external_facing instead. v3: dev_is_removable() seems to be what we want Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28drm/amdgpu: add a retry for IP discovery initAlex Deucher1-2/+21
commit 3938eb956e383ef88b8fc7d556492336ebee52df upstream. AMD dGPUs have integrated FW that runs as soon as the device gets power and initializes the board (determines the amount of memory, provides configuration details to the driver, etc.). For direct PCIe attached cards this happens as soon as power is applied and normally completes well before the OS has even started loading. However, with hotpluggable ports like USB4, the driver needs to wait for this to complete before initializing the device. This normally takes 60-100ms, but could take longer on some older boards periodically due to memory training. Retry for up to a second. In the non-hotplug case, there should be no change in behavior and this should complete on the first try. v2: adjust test criteria v3: adjust checks for the masks, only enable on removable devices v4: skip bif_fb_en check Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28drm/amdgpu: fix GRBM read timeout when do mes_self_testTim Huang1-0/+16
commit 36e7ff5c13cb15cb7b06c76d42bb76cbf6b7ea75 upstream. Use a proper MEID to make sure the CP_HQD_* and CP_GFX_HQD_* registers can be touched when initialize the compute and gfx mqd in mes_self_test. Otherwise, we expect no response from CP and an GRBM eventual timeout. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28drm/amdgpu: don't use pci_is_thunderbolt_attached()Alex Deucher2-6/+7
commit 7b1c6263eaf4fd64ffe1cafdc504a42ee4bfbb33 upstream. It's only valid on Intel systems with the Intel VSEC. Use dev_is_removable() instead. This should do the right thing regardless of the platform. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28drm/amdgpu: fix software pci_unplug on some chipsVitaly Prosyak1-3/+6
[ Upstream commit 4638e0c29a3f2294d5de0d052a4b8c9f33ccb957 ] When software 'pci unplug' using IGT is executed we got a sysfs directory entry is NULL for differant ras blocks like hdp, umc, etc. Before call 'sysfs_remove_file_from_group' and 'sysfs_remove_group' check that 'sd' is not NULL. [ +0.000001] RIP: 0010:sysfs_remove_group+0x83/0x90 [ +0.000002] Code: 31 c0 31 d2 31 f6 31 ff e9 9a a8 b4 00 4c 89 e7 e8 f2 a2 ff ff eb c2 49 8b 55 00 48 8b 33 48 c7 c7 80 65 94 82 e8 cd 82 bb ff <0f> 0b eb cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 [ +0.000001] RSP: 0018:ffffc90002067c90 EFLAGS: 00010246 [ +0.000002] RAX: 0000000000000000 RBX: ffffffff824ea180 RCX: 0000000000000000 [ +0.000001] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ +0.000001] RBP: ffffc90002067ca8 R08: 0000000000000000 R09: 0000000000000000 [ +0.000001] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ +0.000001] R13: ffff88810a395f48 R14: ffff888101aab0d0 R15: 0000000000000000 [ +0.000001] FS: 00007f5ddaa43a00(0000) GS:ffff88841e800000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000001] CR2: 00007f8ffa61ba50 CR3: 0000000106432000 CR4: 0000000000350ef0 [ +0.000001] Call Trace: [ +0.000001] <TASK> [ +0.000001] ? show_regs+0x72/0x90 [ +0.000002] ? sysfs_remove_group+0x83/0x90 [ +0.000002] ? __warn+0x8d/0x160 [ +0.000001] ? sysfs_remove_group+0x83/0x90 [ +0.000001] ? report_bug+0x1bb/0x1d0 [ +0.000003] ? handle_bug+0x46/0x90 [ +0.000001] ? exc_invalid_op+0x19/0x80 [ +0.000002] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000003] ? sysfs_remove_group+0x83/0x90 [ +0.000001] dpm_sysfs_remove+0x61/0x70 [ +0.000002] device_del+0xa3/0x3d0 [ +0.000002] ? ktime_get_mono_fast_ns+0x46/0xb0 [ +0.000002] device_unregister+0x18/0x70 [ +0.000001] i2c_del_adapter+0x26d/0x330 [ +0.000002] arcturus_i2c_control_fini+0x25/0x50 [amdgpu] [ +0.000236] smu_sw_fini+0x38/0x260 [amdgpu] [ +0.000241] amdgpu_device_fini_sw+0x116/0x670 [amdgpu] [ +0.000186] ? mutex_lock+0x13/0x50 [ +0.000003] amdgpu_driver_release_kms+0x16/0x40 [amdgpu] [ +0.000192] drm_minor_release+0x4f/0x80 [drm] [ +0.000025] drm_release+0xfe/0x150 [drm] [ +0.000027] __fput+0x9f/0x290 [ +0.000002] ____fput+0xe/0x20 [ +0.000002] task_work_run+0x61/0xa0 [ +0.000002] exit_to_user_mode_prepare+0x150/0x170 [ +0.000002] syscall_exit_to_user_mode+0x2a/0x50 Cc: Hawking Zhang <hawking.zhang@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28drm/amd: Disable PP_PCIE_DPM_MASK when dynamic speed switching not supportedMario Limonciello1-0/+2
[ Upstream commit fbf1035b033a51eee48d5f42e781b02fff272ca0 ] Rather than individual ASICs checking for the quirk, set the quirk at the driver level. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULLQu Huang1-0/+6
[ Upstream commit 5104fdf50d326db2c1a994f8b35dcd46e63ae4ad ] In certain types of chips, such as VEGA20, reading the amdgpu_regs_smc file could result in an abnormal null pointer access when the smc_rreg pointer is NULL. Below are the steps to reproduce this issue and the corresponding exception log: 1. Navigate to the directory: /sys/kernel/debug/dri/0 2. Execute command: cat amdgpu_regs_smc 3. Exception Log:: [4005007.702554] BUG: kernel NULL pointer dereference, address: 0000000000000000 [4005007.702562] #PF: supervisor instruction fetch in kernel mode [4005007.702567] #PF: error_code(0x0010) - not-present page [4005007.702570] PGD 0 P4D 0 [4005007.702576] Oops: 0010 [#1] SMP NOPTI [4005007.702581] CPU: 4 PID: 62563 Comm: cat Tainted: G OE 5.15.0-43-generic #46-Ubunt u [4005007.702590] RIP: 0010:0x0 [4005007.702598] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [4005007.702600] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206 [4005007.702605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68 [4005007.702609] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000 [4005007.702612] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980 [4005007.702615] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000 [4005007.702618] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000 [4005007.702622] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000 [4005007.702626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [4005007.702629] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0 [4005007.702633] Call Trace: [4005007.702636] <TASK> [4005007.702640] amdgpu_debugfs_regs_smc_read+0xb0/0x120 [amdgpu] [4005007.703002] full_proxy_read+0x5c/0x80 [4005007.703011] vfs_read+0x9f/0x1a0 [4005007.703019] ksys_read+0x67/0xe0 [4005007.703023] __x64_sys_read+0x19/0x20 [4005007.703028] do_syscall_64+0x5c/0xc0 [4005007.703034] ? do_user_addr_fault+0x1e3/0x670 [4005007.703040] ? exit_to_user_mode_prepare+0x37/0xb0 [4005007.703047] ? irqentry_exit_to_user_mode+0x9/0x20 [4005007.703052] ? irqentry_exit+0x19/0x30 [4005007.703057] ? exc_page_fault+0x89/0x160 [4005007.703062] ? asm_exc_page_fault+0x8/0x30 [4005007.703068] entry_SYSCALL_64_after_hwframe+0x44/0xae [4005007.703075] RIP: 0033:0x7f5e07672992 [4005007.703079] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 e c 28 48 89 54 24 [4005007.703083] RSP: 002b:00007ffe03097898 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [4005007.703088] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f5e07672992 [4005007.703091] RDX: 0000000000020000 RSI: 00007f5e06753000 RDI: 0000000000000003 [4005007.703094] RBP: 00007f5e06753000 R08: 00007f5e06752010 R09: 00007f5e06752010 [4005007.703096] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000 [4005007.703099] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [4005007.703105] </TASK> [4005007.703107] Modules linked in: nf_tables libcrc32c nfnetlink algif_hash af_alg binfmt_misc nls_ iso8859_1 ipmi_ssif ast intel_rapl_msr intel_rapl_common drm_vram_helper drm_ttm_helper amd64_edac t tm edac_mce_amd kvm_amd ccp mac_hid k10temp kvm acpi_ipmi ipmi_si rapl sch_fq_codel ipmi_devintf ipm i_msghandler msr parport_pc ppdev lp parport mtd pstore_blk efi_pstore ramoops pstore_zone reed_solo mon ip_tables x_tables autofs4 ib_uverbs ib_core amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) iommu_v 2 amd_sched(OE) amdkcl(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm igb ahci xhci_pci libahci i2c_piix4 i2c_algo_bit xhci_pci_renesas dca [4005007.703184] CR2: 0000000000000000 [4005007.703188] ---[ end trace ac65a538d240da39 ]--- [4005007.800865] RIP: 0010:0x0 [4005007.800871] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [4005007.800874] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206 [4005007.800878] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68 [4005007.800881] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000 [4005007.800883] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980 [4005007.800886] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000 [4005007.800888] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000 [4005007.800891] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000 [4005007.800895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [4005007.800898] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0 Signed-off-by: Qu Huang <qu.huang@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28drm/amdgpu/vkms: fix a possible null pointer dereferenceMa Ke1-0/+2
[ Upstream commit cd90511557fdfb394bb4ac4c3b539b007383914c ] In amdgpu_vkms_conn_get_modes(), the return value of drm_cvt_mode() is assigned to mode, which will lead to a NULL pointer dereference on failure of drm_cvt_mode(). Add a check to avoid null pointer dereference. Signed-off-by: Ma Ke <make_ruc2021@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28drm/amdgpu: Fix potential null pointer derefernceStanley.Yang1-1/+2
[ Upstream commit 80285ae1ec8717b597b20de38866c29d84d321a1 ] The amdgpu_ras_get_context may return NULL if device not support ras feature, so add check before using. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28drm/amdgpu: update retry times for psp vmbx waitTao Zhou1-1/+4
[ Upstream commit fc598890715669ff794b253fdf387cd02b9396f8 ] Increase the retry loops and replace the constant number with macro. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28drm/amdgpu: not to save bo in the case of RAS err_event_athubDavid (Ming Qiang) Wu1-0/+7
[ Upstream commit fa1f1cc09d588a90c8ce3f507c47df257461d148 ] err_event_athub will corrupt VCPU buffer and not good to be restored in amdgpu_vcn_resume() and in this case the VCPU buffer needs to be cleared for VCN firmware to work properly. Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64Alex Deucher1-0/+2
[ Upstream commit ba0fb4b48c19a2d2380fc16ca4af236a0871d279 ] Issues were reported with commit 1cfb4d612127 ("drm/amdgpu: put MQDs in VRAM") on an ADLINK Ampere Altra Developer Platform (AVA developer platform). Various ARM systems seem to have problems related to PCIe and MMIO access. In this case, I'm not sure if this is specific to the ADLINK platform or ARM in general. Seems to be some coherency issue with VRAM. For now, just don't put MQDs in VRAM on ARM. Link: https://lists.freedesktop.org/archives/amd-gfx/2023-October/100453.html Fixes: 1cfb4d612127 ("drm/amdgpu: put MQDs in VRAM") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: alexey.klimov@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20drm/amdgpu/gfx10,11: use memcpy_to/fromio for MQDsAlex Deucher2-12/+12
[ Upstream commit b3c942bb6c32a8ddc1d52ee6bc24b8cf732dddf4 ] Since they were moved to VRAM, we need to use the IO variants of memcpy. Fixes: 1cfb4d612127 ("drm/amdgpu: put MQDs in VRAM") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20drm/amdgpu: Increase IH soft ring size for GFX v9.4.3 dGPUPhilip Yang1-1/+1
[ Upstream commit bcfb9cee61207b80f37663ffa08c135657a27ad5 ] On GFX v9.4.3 dGPU, applications have random timeout failure when XNACK on, dmesg log has "amdgpu: IH soft ring buffer overflow 0x900, 0x900", because dGPU mode has 272 cam entries. After increasing IH soft ring to 512 entries, no more IH soft ring overflow message and application passed. Fixes: bf80d34b6c58 ("drm/amdgpu: Increase soft IH ring size") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-27Merge tag 'amd-drm-fixes-6.6-2023-10-25' of ↵Dave Airlie1-1/+1
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.6-2023-10-25: amdgpu: - Extend VI APSM quirks to more platforms Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231026035452.14921-1-alexander.deucher@amd.com
2023-10-27Merge tag 'drm-misc-fixes-2023-10-26' of ↵Dave Airlie3-9/+11
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: amdgpu: - ignore duplicated BOs in CS parser - remove redundant call to amdgpu_ctx_priority_is_valid() amdkfd: - reserve fence slot while locking BO dp_mst: - Fix NULL deref in get_mst_branch_device_by_guid_helper() logicvc: - Kconfig: Select REGMAP and REGMAP_MMIO ivpu: - Fix missing VPUIP interrupts Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20231026110132.GA10591@linux-uq9g.fritz.box
2023-10-25drm/amd: Disable ASPM for VI w/ all Intel systemsMario Limonciello1-1/+1
Originally we were quirking ASPM disabled specifically for VI when used with Alder Lake, but it appears to have problems with Rocket Lake as well. Like we've done in the case of dpm for newer platforms, disable ASPM for all Intel systems. Cc: stable@vger.kernel.org # 5.15+ Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") Reported-and-tested-by: Paolo Gentili <paolo.gentili@canonical.com> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-23drm/amdkfd: reserve a fence slot while locking the BOChristian König1-1/+1
Looks like the KFD still needs this. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 8abc1eb2987a ("drm/amdkfd: switch over to using drm_exec v3") Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231020123306.43978-1-christian.koenig@amd.com
2023-10-22drm/amdgpu: Remove redundant call to priority_is_valid()Luben Tuikov1-7/+8
Remove a redundant call to amdgpu_ctx_priority_is_valid() from amdgpu_ctx_priority_permit(), which is called from amdgpu_ctx_init() which is called from amdgpu_ctx_alloc() which is called from amdgpu_ctx_ioctl(), where we've called amdgpu_ctx_priority_is_valid() already first thing in the function. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20231018010359.30393-1-luben.tuikov@amd.com
2023-10-20Merge tag 'drm-misc-fixes-2023-10-19' of ↵Dave Airlie1-2/+3
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: amdgpu: - Disable AMD_CTX_PRIORITY_UNSET bridge: - ti-sn65dsi86: Fix device lifetime edid: - Add quirk for BenQ GW2765 ivpu: - Extend address range for MMU mmap nouveau: - DP-connector fixes - Documentation fixes panel: - Move AUX B116XW03 into panel-simple scheduler: - Eliminate DRM_SCHED_PRIORITY_UNSET ttm: - Fix possible NULL-ptr deref in cleanup Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20231019114605.GA22540@linux-uq9g
2023-10-20drm/amdgpu: Reserve fences for VM updateFelix Kuehling1-1/+4
In amdgpu_dma_buf_move_notify reserve fences for the page table updates in amdgpu_vm_clear_freed and amdgpu_vm_handle_moved. This fixes a BUG_ON in dma_resv_add_fence when using SDMA for page table updates. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20drm/amdgpu: Fix possible null pointer dereferenceFelix Kuehling1-1/+2
abo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19drm/amdgpu: ignore duplicate BOs againChristian König1-1/+2
Looks like RADV is actually hitting this. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: ca6c1e210aa7 ("drm/amdgpu: use the new drm_exec object for CS v3") Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017121015.1336786-1-christian.koenig@amd.com
2023-10-18gpu/drm: Eliminate DRM_SCHED_PRIORITY_UNSETLuben Tuikov1-1/+2
Eliminate DRM_SCHED_PRIORITY_UNSET, value of -2, whose only user was amdgpu. Furthermore, eliminate an index bug, in that when amdgpu boots, it calls drm_sched_entity_init() with DRM_SCHED_PRIORITY_UNSET, which uses it to index sched->sched_rq[]. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alex Deucher <Alexander.Deucher@amd.com> Link: https://lore.kernel.org/r/20231017035656.8211-2-luben.tuikov@amd.com
2023-10-18drm/amdgpu: Unset context priority is now invalidLuben Tuikov1-1/+1
A context priority value of AMD_CTX_PRIORITY_UNSET is now invalid--instead of carrying it around and passing it to the Direct Rendering Manager--and it becomes AMD_CTX_PRIORITY_NORMAL in amdgpu_ctx_ioctl(), the gateway to context creation. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alex Deucher <Alexander.Deucher@amd.com> Link: https://lore.kernel.org/r/20231017035656.8211-1-luben.tuikov@amd.com
2023-10-10drm/amdgpu: fix SI failure due to doorbells allocationIcenowy Zheng1-0/+4
SI hardware does not have doorbells at all, however currently the code will try to do the allocation and thus fail, makes SI AMDGPU not usable. Fix this failure by skipping doorbells allocation when doorbells count is zero. Fixes: 54c30d2a8def ("drm/amdgpu: create kernel doorbell pages") Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-10drm/amdgpu: add missing NULL checkChristian König1-1/+1
bo->tbo.resource can easily be NULL here. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2902 Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org
2023-10-05drm/amd: Fix detection of _PR3 on the PCIe root portMario Limonciello1-1/+1
On some systems with Navi3x dGPU will attempt to use BACO for runtime PM but fails to resume properly. This is because on these systems the root port goes into D3cold which is incompatible with BACO. This happens because in this case dGPU is connected to a bridge between root port which causes BOCO detection logic to fail. Fix the intent of the logic by looking at root port, not the immediate upstream bridge for _PR3. Cc: stable@vger.kernel.org Suggested-by: Jun Ma <Jun.Ma2@amd.com> Tested-by: David Perry <David.Perry@amd.com> Fixes: b10c1c5b3a4e ("drm/amdgpu: add check for ACPI power resources") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05drm/amdgpu: Fix a memory leakLuben Tuikov1-0/+1
Fix a memory leak in amdgpu_fru_get_product_info(). Cc: Alex Deucher <Alexander.Deucher@amd.com> Reported-by: Yang Wang <kevinyang.wang@amd.com> Fixes: 0dbf2c562625 ("drm/amdgpu: Interpret IPMI data for product information (v2)") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-21drm/amdgpu: fix a memory leak in amdgpu_ras_feature_enableCong Liu1-0/+1
This patch fixes a memory leak in the amdgpu_ras_feature_enable() function. The leak occurs when the function sends a command to the firmware to enable or disable a RAS feature for a GFX block. If the command fails, the kfree() function is not called to free the info memory. Fixes: 9f051d6ff13f ("drm/amdgpu: Free ras cmd input buffer properly") Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Cong Liu <liucong2@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-21Revert "drm/amdgpu: Report vbios version instead of PN"Lijo Lazar1-1/+1
This reverts commit 7748ce5b69581325cae40c2134088820f0957902. vbios_version sysfs node is used to identify Part Number also. Revert to the same so that it doesn't break scripts/software which parse this. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-15Merge tag 'amd-drm-fixes-6.6-2023-09-13' of ↵Dave Airlie19-98/+69
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.6-2023-09-13: amdgpu: - GC 9.4.3 fixes - Fix white screen issues with S/G display on system with >= 64G of ram - Replay fixes - SMU 13.0.6 fixes - AUX backlight fix - NBIO 4.3 SR-IOV fixes for HDP - RAS fixes - DP MST resume fix - Fix segfault on systems with no vbios - DPIA fixes amdkfd: - CWSR grace period fix - Unaligned doorbell fix - CRIU fix for GFX11 - Add missing TLB flush on gfx10 and newer Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230913195009.7714-1-alexander.deucher@amd.com
2023-09-14Merge tag 'drm-misc-fixes-2023-09-07' of ↵Daniel Vetter1-1/+1
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes One doc fix for drm/connector, one fix for amdgpu for an crash when VRAM usage is high, and one fix in gm12u320 to fix the timeout units in the code Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/w5nlld5ukeh6bgtljsxmkex3e7s7f4qquuqkv5lv4cv3uxzwqr@pgokpejfsyef
2023-09-12Merge drm/drm-fixes into drm-misc-fixesThomas Zimmermann145-2415/+4352
Forwarding to v6.6-rc1. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2023-09-12drm/amdgpu: Handle null atom context in VBIOS info ioctlDavid Francis1-6/+11
On some APU systems, there is no atom context and so the atom_context struct is null. Add a check to the VBIOS_INFO branch of amdgpu_info_ioctl to handle this case, returning all zeroes. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-12drm/amdgpu: fallback to old RAS error message for aqua_vanjaramHawking Zhang1-2/+4
So driver doesn't generate incorrect message until the new format is settled down for aqua_vanjaram Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-12drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOVAlex Deucher1-0/+3
Needed for HDP flush to work correctly. Reviewed-by: Timmy Tsai <timmtsai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-12drm/amdgpu/soc21: don't remap HDP registers for SR-IOVAlex Deucher1-1/+1
This matches the behavior for soc15 and nv. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Timmy Tsai <timmtsai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-12Revert "drm/amd: Disable S/G for APUs when 64GB or more host memory"Hamza Mahfooz2-27/+0
This reverts commit 70e64c4d522b732e31c6475a3be2349de337d321. Since, we now have an actual fix for this issue, we can get rid of this workaround as it can cause pin failures if enough VRAM isn't carved out by the BIOS. Cc: stable@vger.kernel.org # 6.1+ Acked-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-12drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3Mukul Joshi10-52/+45
Currently, we store CU info only for a single XCC assuming that it is the same for all XCCs. However, that may not be true. As a result, store CU info for all XCCs. This info is later used for CU masking. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>