| Age | Commit message (Collapse) | Author | Files | Lines |
|
commit c0dd8a9253fadfb8e5357217d085f1989da4ef0a upstream.
The page_link lower bits of the first sg could contain something like
SG_END, if we are mapping a single VRAM page or contiguous blob which
fits into one sg entry. Rather pull out the struct page, and use that in
our check to know if we mapped struct pages vs VRAM.
Fixes: f44ffd677fb3 ("drm/amdgpu: add support for exporting VRAM using DMA-buf v3")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5f054ddead33c1622ea9c0c0aaf07c6843fc7ab0 upstream.
If compiled without SI or CIK support but amdgpu tries to load it
will run into failures with uninitialized callbacks.
Show a nicer message in this case and fail probe instead.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4050
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0d9a95099dcb05b5f4719c830d15bf4fdcad0dc2 ]
We keep the gang submission fence around in adev, make sure that it
stays alive.
v2: fix memory leak on retry
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4161050d47e1b083a7e1b0b875c9907e1a6f1f1f ]
GC11 only has 1 mec.
Fixes: 3d879e81f0f9 ("drm/amdgpu: add init support for GFX11 (v2)")
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4afacc9948e1f8fdbca401d259ae65ad93d298c0 ]
When userspace invokes S4 the flow is:
1) amdgpu_pmops_prepare()
2) amdgpu_pmops_freeze()
3) Create hibernation image
4) amdgpu_pmops_thaw()
5) Write out image to disk
6) Turn off system
Then on resume amdgpu_pmops_restore() is called.
This flow has a problem that because amdgpu_pmops_thaw() is called
it will call amdgpu_device_resume() which will resume all of the GPU.
This includes turning the display hardware back on and discovering
connectors again.
This is an unexpected experience for the display to turn back on.
Adjust the flow so that during the S4 sequence display hardware is
not turned back on.
Reported-by: Xaver Hugl <xaver.hugl@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 68bfdc8dc0a1a7fdd9ab61e69907ae71a6fd3d91)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit ec33964d9d88488fa954a03d476a8b811efc6e85 upstream.
8192x8192 is the maximum supported resolution.
Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6e0d2fde3ae8fdb5b47e10389f23ed2cb4daec5d)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f0105e173103c9d30a2bb959f7399437d536c848 upstream.
1920x1088 is the maximum supported resolution.
Signed-off-by: David Rosca <david.rosca@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1a0807feb97082bff2b1342dbbe55a2a9a8bdb88)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 099bffc7cadff40bfab1517c3461c53a7a38a0d7 ]
There was a quirk added to add a workaround for a Sapphire
RX 5600 XT Pulse that didn't allow BAR resizing. However,
the quirk caused a regression with runtime pm on Dell laptops
using those chips, rather than narrowing the scope of the
resizing quirk, add a quirk to prevent amdgpu from resizing
the BAR on those Dell platforms unless runtime pm is disabled.
v2: update commit message, add runpm check
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707
Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5235053f443cef4210606e5fb71f99b915a9723d)
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
large bar
[ Upstream commit e372baeb3d336b20fd9463784c577fd8824497cd ]
Some customer platforms do not enable mmconfig for various reasons,
such as bios bug, and therefore cannot access the GPU extend configuration
space through mmio.
When the system enters the d3cold state and resumes, the amdgpu driver
fails to resume because the extend configuration space registers of
GPU can't be restored. At this point, Usually we only see some failure
dmesg log printed by amdgpu driver, it is difficult to find the root
cause.
Therefor print a warnning message if the system can't access the
extended configuration space register when using large bar.
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 099bffc7cadf ("drm/amdgpu: disable BAR resize on Dell G5 SE")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a0a455b4bc7483ad60e8b8a50330c1e05bb7bfcf ]
In function psp_init_cap_microcode(), it should bail out when failed to
load firmware, otherwise it may cause invalid memory access.
Fixes: 07dbfc6b102e ("drm/amd: Use `amdgpu_ucode_*` helpers for PSP")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 60a2c0c12b644450e420ffc42291d1eb248bacb7 ]
Tear down ttm range manager for doorbell in function amdgpu_ttm_fini(),
to avoid memory leakage.
Fixes: 792b84fb9038 ("drm/amdgpu: initialize ttm for doorbells")
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3676f37a88432132bcff55a17dc48911239b6d98 ]
- The previous patch only considered the case for baremetal
and is not applicable for SRIOV code path. We also need to
init fw_share for SRIOV VF
Fixes: 928cd772e18f ("drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
This reverts commit 2daba7d857e48035d71cdd95964350b6d0d51545 which is
commit 73dae652dcac776296890da215ee7dec357a1032 upstream.
The original patch 73dae652dcac (drm/amdgpu: rework resume handling for
display (v2)), was only targeted at kernels 6.11 and newer. It did not
apply cleanly to 6.12 so I backported it and it backport landed as
99a02eab8251 ("drm/amdgpu: rework resume handling for display (v2)"),
however there was a bug in the backport that was subsequently fixed in
063d380ca28e ("drm/amdgpu: fix backport of commit 73dae652dcac"). None
of this was intended for kernels older than 6.11, however the original
backport eventually landed in 6.6, 6.1, and 5.15.
Please revert the change from kernels 6.6, 6.1, and 5.15.
Link: https://lore.kernel.org/r/BL1PR12MB5144D5363FCE6F2FD3502534F7E72@BL1PR12MB5144.namprd12.prod.outlook.com
Link: https://lore.kernel.org/r/BL1PR12MB51449ADCFBF2314431F8BCFDF7132@BL1PR12MB5144.namprd12.prod.outlook.com
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Reported-by: Christian König <christian.koenig@amd.com>
Reported-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit af04b320c71c4b59971f021615876808a36e5038 upstream.
That is needed to enforce isolation between contexts.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit def59436fb0d3ca0f211d14873d0273d69ebb405)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0021d70a0654e668d457758110abec33dfbd3ba5 ]
I think this was an abstraction back from when
kfd supported both radeon and amdgpu. Since we just
support amdgpu now, there is no more need for this and
we can use the amdgpu structures directly.
This also avoids having the kfd_cu_info structures on
the stack when inlining which can blow up the stack.
Cc: Arnd Bergmann <arnd@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 438b39ac74e2 ("drm/amdkfd: pause autosuspend when creating pdd")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit abe1cbaec6cfe9fde609a15cd6a12c812282ce77 ]
Need to read back to make sure the write goes through.
Cc: David Belanger <david.belanger@amd.com>
Reviewed-by: Frank Min <frank.min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cf424020e040be35df05b682b546b255e74a420f ]
Need to read back to make sure the write goes through.
Cc: David Belanger <david.belanger@amd.com>
Reviewed-by: Frank Min <frank.min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c9b8dcabb52afe88413ff135a0953e3cc4128483 ]
Need to read back to make sure the write goes through.
Cc: David Belanger <david.belanger@amd.com>
Reviewed-by: Frank Min <frank.min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bf2bc61638033d118c9ef4ab1204295ba6694401 ]
when use cpu to do page table update under sriov runtime, since mmio
access is blocked, kiq has to be used to flush hdp.
change WREG32_NO_KIQ to WREG32 to allow kiq.
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: c9b8dcabb52a ("drm/amdgpu/hdp4.0: do a posting read when flushing HDP")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 85230ee36d88e7a09fb062d43203035659dd10a5 upstream.
Third time's the charm, I hope?
Fixes: d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3837
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 695c2c745e5dff201b75da8a1d237ce403600d04)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a93b1020eb9386d7da11608477121b10079c076a ]
Since 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()")
accessing job->base.sched can produce unexpected results as the initialisation
of (*job)->base.sched done in amdgpu_job_alloc is overwritten by the
memset.
This commit fixes an issue when a CS would fail validation and would
be rejected after job->num_ibs is incremented. In this case,
amdgpu_ib_free(ring->adev, ...) will be called, which would crash the
machine because the ring value is bogus.
To fix this, pass a NULL pointer to amdgpu_ib_free(): we can do this
because the device is actually not used in this function.
The next commit will remove the ring argument completely.
Fixes: 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2ae520cb12831d264ceb97c61f72c59d33c0dbd7)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 47f402a3e08113e0f5d8e1e6fcc197667a16022f ]
base.sched may not be set for each instance and should not
be used for cases such as non-IB tests.
Fixes: 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()")
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 73dae652dcac776296890da215ee7dec357a1032 upstream.
Split resume into a 3rd step to handle displays when DCC is
enabled on DCN 4.0.1. Move display after the buffer funcs
have been re-enabled so that the GPU will do the move and
properly set the DCC metadata for DCN.
v2: fix fence irq resume ordering
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 928cd772e18ffbd7723cb2361db4a8ccf2222235 ]
It is not necessarily corrupted. When there is RAS fatal error, device
memory access is blocked. Hence vcpu bo cannot be saved to system memory
as in a regular suspend sequence before going for reset. In other full
device reset cases, that gets saved and restored during resume.
v2: Remove redundant code like vcn_v4_0 did
v2: Refine commit message
v3: Drop the volatile
v3: Refine commit message
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e2e97435783979124ba92d6870415c57ecfef6a5 ]
The driver needs to set the correct max_segment_size;
otherwise debug_dma_map_sg() will complain about the
over-mapping of the AMDGPU sg length as following:
WARNING: CPU: 6 PID: 1964 at kernel/dma/debug.c:1178 debug_dma_map_sg+0x2dc/0x370
[ 364.049444] Modules linked in: veth amdgpu(OE) amdxcp drm_exec gpu_sched drm_buddy drm_ttm_helper ttm(OE) drm_suballoc_helper drm_display_helper drm_kms_helper i2c_algo_bit rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter br_netfilter nvme_fabrics overlay nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bridge stp llc amd_atl intel_rapl_msr intel_rapl_common sunrpc sch_fq_codel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg edac_mce_amd binfmt_misc snd_hda_codec snd_pci_acp6x snd_hda_core snd_acp_config snd_hwdep snd_soc_acpi kvm_amd snd_pcm kvm snd_seq_midi snd_seq_midi_event crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 snd_rawmidi sha256_ssse3 sha1_ssse3 aesni_intel snd_seq nls_iso8859_1 crypto_simd snd_seq_device cryptd snd_timer rapl input_leds snd
[ 364.049532] ipmi_devintf wmi_bmof ccp serio_raw k10temp sp5100_tco soundcore ipmi_msghandler cm32181 industrialio mac_hid msr parport_pc ppdev lp parport drm efi_pstore ip_tables x_tables pci_stub crc32_pclmul nvme ahci libahci i2c_piix4 r8169 nvme_core i2c_designware_pci realtek i2c_ccgx_ucsi video wmi hid_generic cdc_ether usbnet usbhid hid r8152 mii
[ 364.049576] CPU: 6 PID: 1964 Comm: rocminfo Tainted: G OE 6.10.0-custom #492
[ 364.049579] Hardware name: AMD Majolica-RN/Majolica-RN, BIOS RMJ1009A 06/13/2021
[ 364.049582] RIP: 0010:debug_dma_map_sg+0x2dc/0x370
[ 364.049585] Code: 89 4d b8 e8 36 b1 86 00 8b 4d b8 48 8b 55 b0 44 8b 45 a8 4c 8b 4d a0 48 89 c6 48 c7 c7 00 4b 74 bc 4c 89 4d b8 e8 b4 73 f3 ff <0f> 0b 4c 8b 4d b8 8b 15 c8 2c b8 01 85 d2 0f 85 ee fd ff ff 8b 05
[ 364.049588] RSP: 0018:ffff9ca600b57ac0 EFLAGS: 00010286
[ 364.049590] RAX: 0000000000000000 RBX: ffff88b7c132b0c8 RCX: 0000000000000027
[ 364.049592] RDX: ffff88bb0f521688 RSI: 0000000000000001 RDI: ffff88bb0f521680
[ 364.049594] RBP: ffff9ca600b57b20 R08: 000000000000006f R09: ffff9ca600b57930
[ 364.049596] R10: ffff9ca600b57928 R11: ffffffffbcb46328 R12: 0000000000000000
[ 364.049597] R13: 0000000000000001 R14: ffff88b7c19c0700 R15: ffff88b7c9059800
[ 364.049599] FS: 00007fb2d3516e80(0000) GS:ffff88bb0f500000(0000) knlGS:0000000000000000
[ 364.049601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 364.049603] CR2: 000055610bd03598 CR3: 00000001049f6000 CR4: 0000000000350ef0
[ 364.049605] Call Trace:
[ 364.049607] <TASK>
[ 364.049609] ? show_regs+0x6d/0x80
[ 364.049614] ? __warn+0x8c/0x140
[ 364.049618] ? debug_dma_map_sg+0x2dc/0x370
[ 364.049621] ? report_bug+0x193/0x1a0
[ 364.049627] ? handle_bug+0x46/0x80
[ 364.049631] ? exc_invalid_op+0x1d/0x80
[ 364.049635] ? asm_exc_invalid_op+0x1f/0x30
[ 364.049642] ? debug_dma_map_sg+0x2dc/0x370
[ 364.049647] __dma_map_sg_attrs+0x90/0xe0
[ 364.049651] dma_map_sgtable+0x25/0x40
[ 364.049654] amdgpu_bo_move+0x59a/0x850 [amdgpu]
[ 364.049935] ? srso_return_thunk+0x5/0x5f
[ 364.049939] ? amdgpu_ttm_tt_populate+0x5d/0xc0 [amdgpu]
[ 364.050095] ttm_bo_handle_move_mem+0xc3/0x180 [ttm]
[ 364.050103] ttm_bo_validate+0xc1/0x160 [ttm]
[ 364.050108] ? amdgpu_ttm_tt_get_user_pages+0xe5/0x1b0 [amdgpu]
[ 364.050263] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0xa12/0xc90 [amdgpu]
[ 364.050473] kfd_ioctl_alloc_memory_of_gpu+0x16b/0x3b0 [amdgpu]
[ 364.050680] kfd_ioctl+0x3c2/0x530 [amdgpu]
[ 364.050866] ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu]
[ 364.051054] ? srso_return_thunk+0x5/0x5f
[ 364.051057] ? tomoyo_file_ioctl+0x20/0x30
[ 364.051063] __x64_sys_ioctl+0x9c/0xd0
[ 364.051068] x64_sys_call+0x1219/0x20d0
[ 364.051073] do_syscall_64+0x51/0x120
[ 364.051077] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 364.051081] RIP: 0033:0x7fb2d2f1a94f
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit afe260df55ac280cd56306248cb6d8a6b0db095c ]
Under sriov, host driver will save and restore vf pci cfg space during
reset. And during device init, under sriov, pci_restore_state happens after
fullaccess released, and it can have race condition with mmio protection
enable from host side leading to missing interrupts.
So skip amdgpu_device_cache_pci_state for sriov.
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 46186667f98fb7158c98f4ff5da62c427761ffcd ]
Free sg table when dma_map_sgtable() failed to avoid memory leak.
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 32e7ee293ff476c67b51be006e986021967bc525 ]
Need to dereference the atcs acpi buffer after
the method is executed, otherwise it will result in
a memory leak.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8b22f048331dfd45fdfbf0efdfb1d43deff7518d ]
Port this change to vega20_ih.c:
commit afbf7955ff01 ("drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts")
Original commit message:
"Why:
Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit
if RB_ENABLE is not set.
How to fix:
Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set.
The RB_ENABLE bit is required to be set, together with
WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit
would clear the RB_OVERFLOW."
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit f756dbac1ce1d5f9a2b35e3b55fa429cf6336437 upstream.
Need to read back to make sure the write goes through.
Cc: David Belanger <david.belanger@amd.com>
Reviewed-by: Frank Min <frank.min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b61badd20b443eabe132314669bb51a263982e5c upstream.
[ +0.000021] BUG: KASAN: slab-use-after-free in drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[ +0.000027] Read of size 8 at addr ffff8881b8605f88 by task amd_pci_unplug/2147
[ +0.000023] CPU: 6 PID: 2147 Comm: amd_pci_unplug Not tainted 6.10.0+ #1
[ +0.000016] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
[ +0.000016] Call Trace:
[ +0.000008] <TASK>
[ +0.000009] dump_stack_lvl+0x76/0xa0
[ +0.000017] print_report+0xce/0x5f0
[ +0.000017] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[ +0.000019] ? srso_return_thunk+0x5/0x5f
[ +0.000015] ? kasan_complete_mode_report_info+0x72/0x200
[ +0.000016] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[ +0.000019] kasan_report+0xbe/0x110
[ +0.000015] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[ +0.000023] __asan_report_load8_noabort+0x14/0x30
[ +0.000014] drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[ +0.000020] ? srso_return_thunk+0x5/0x5f
[ +0.000013] ? __kasan_check_write+0x14/0x30
[ +0.000016] ? __pfx_drm_sched_entity_flush+0x10/0x10 [gpu_sched]
[ +0.000020] ? srso_return_thunk+0x5/0x5f
[ +0.000013] ? __kasan_check_write+0x14/0x30
[ +0.000013] ? srso_return_thunk+0x5/0x5f
[ +0.000013] ? enable_work+0x124/0x220
[ +0.000015] ? __pfx_enable_work+0x10/0x10
[ +0.000013] ? srso_return_thunk+0x5/0x5f
[ +0.000014] ? free_large_kmalloc+0x85/0xf0
[ +0.000016] drm_sched_entity_destroy+0x18/0x30 [gpu_sched]
[ +0.000020] amdgpu_vce_sw_fini+0x55/0x170 [amdgpu]
[ +0.000735] ? __kasan_check_read+0x11/0x20
[ +0.000016] vce_v4_0_sw_fini+0x80/0x110 [amdgpu]
[ +0.000726] amdgpu_device_fini_sw+0x331/0xfc0 [amdgpu]
[ +0.000679] ? mutex_unlock+0x80/0xe0
[ +0.000017] ? __pfx_amdgpu_device_fini_sw+0x10/0x10 [amdgpu]
[ +0.000662] ? srso_return_thunk+0x5/0x5f
[ +0.000014] ? __kasan_check_write+0x14/0x30
[ +0.000013] ? srso_return_thunk+0x5/0x5f
[ +0.000013] ? mutex_unlock+0x80/0xe0
[ +0.000016] amdgpu_driver_release_kms+0x16/0x80 [amdgpu]
[ +0.000663] drm_minor_release+0xc9/0x140 [drm]
[ +0.000081] drm_release+0x1fd/0x390 [drm]
[ +0.000082] __fput+0x36c/0xad0
[ +0.000018] __fput_sync+0x3c/0x50
[ +0.000014] __x64_sys_close+0x7d/0xe0
[ +0.000014] x64_sys_call+0x1bc6/0x2680
[ +0.000014] do_syscall_64+0x70/0x130
[ +0.000014] ? srso_return_thunk+0x5/0x5f
[ +0.000014] ? irqentry_exit_to_user_mode+0x60/0x190
[ +0.000015] ? srso_return_thunk+0x5/0x5f
[ +0.000014] ? irqentry_exit+0x43/0x50
[ +0.000012] ? srso_return_thunk+0x5/0x5f
[ +0.000013] ? exc_page_fault+0x7c/0x110
[ +0.000015] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000014] RIP: 0033:0x7ffff7b14f67
[ +0.000013] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 73 ba f7 ff
[ +0.000026] RSP: 002b:00007fffffffe378 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ +0.000019] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff7b14f67
[ +0.000014] RDX: 0000000000000000 RSI: 00007ffff7f6f47a RDI: 0000000000000003
[ +0.000014] RBP: 00007fffffffe3a0 R08: 0000555555569890 R09: 0000000000000000
[ +0.000014] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5c8
[ +0.000013] R13: 00005555555552a9 R14: 0000555555557d48 R15: 00007ffff7ffd040
[ +0.000020] </TASK>
[ +0.000016] Allocated by task 383 on cpu 7 at 26.880319s:
[ +0.000014] kasan_save_stack+0x28/0x60
[ +0.000008] kasan_save_track+0x18/0x70
[ +0.000007] kasan_save_alloc_info+0x38/0x60
[ +0.000007] __kasan_kmalloc+0xc1/0xd0
[ +0.000007] kmalloc_trace_noprof+0x180/0x380
[ +0.000007] drm_sched_init+0x411/0xec0 [gpu_sched]
[ +0.000012] amdgpu_device_init+0x695f/0xa610 [amdgpu]
[ +0.000658] amdgpu_driver_load_kms+0x1a/0x120 [amdgpu]
[ +0.000662] amdgpu_pci_probe+0x361/0xf30 [amdgpu]
[ +0.000651] local_pci_probe+0xe7/0x1b0
[ +0.000009] pci_device_probe+0x248/0x890
[ +0.000008] really_probe+0x1fd/0x950
[ +0.000008] __driver_probe_device+0x307/0x410
[ +0.000007] driver_probe_device+0x4e/0x150
[ +0.000007] __driver_attach+0x223/0x510
[ +0.000006] bus_for_each_dev+0x102/0x1a0
[ +0.000007] driver_attach+0x3d/0x60
[ +0.000006] bus_add_driver+0x2ac/0x5f0
[ +0.000006] driver_register+0x13d/0x490
[ +0.000008] __pci_register_driver+0x1ee/0x2b0
[ +0.000007] llc_sap_close+0xb0/0x160 [llc]
[ +0.000009] do_one_initcall+0x9c/0x3e0
[ +0.000008] do_init_module+0x241/0x760
[ +0.000008] load_module+0x51ac/0x6c30
[ +0.000006] __do_sys_init_module+0x234/0x270
[ +0.000007] __x64_sys_init_module+0x73/0xc0
[ +0.000006] x64_sys_call+0xe3/0x2680
[ +0.000006] do_syscall_64+0x70/0x130
[ +0.000007] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000015] Freed by task 2147 on cpu 6 at 160.507651s:
[ +0.000013] kasan_save_stack+0x28/0x60
[ +0.000007] kasan_save_track+0x18/0x70
[ +0.000007] kasan_save_free_info+0x3b/0x60
[ +0.000007] poison_slab_object+0x115/0x1c0
[ +0.000007] __kasan_slab_free+0x34/0x60
[ +0.000007] kfree+0xfa/0x2f0
[ +0.000007] drm_sched_fini+0x19d/0x410 [gpu_sched]
[ +0.000012] amdgpu_fence_driver_sw_fini+0xc4/0x2f0 [amdgpu]
[ +0.000662] amdgpu_device_fini_sw+0x77/0xfc0 [amdgpu]
[ +0.000653] amdgpu_driver_release_kms+0x16/0x80 [amdgpu]
[ +0.000655] drm_minor_release+0xc9/0x140 [drm]
[ +0.000071] drm_release+0x1fd/0x390 [drm]
[ +0.000071] __fput+0x36c/0xad0
[ +0.000008] __fput_sync+0x3c/0x50
[ +0.000007] __x64_sys_close+0x7d/0xe0
[ +0.000007] x64_sys_call+0x1bc6/0x2680
[ +0.000007] do_syscall_64+0x70/0x130
[ +0.000007] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000014] The buggy address belongs to the object at ffff8881b8605f80
which belongs to the cache kmalloc-64 of size 64
[ +0.000020] The buggy address is located 8 bytes inside of
freed 64-byte region [ffff8881b8605f80, ffff8881b8605fc0)
[ +0.000028] The buggy address belongs to the physical page:
[ +0.000011] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1b8605
[ +0.000008] anon flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ +0.000007] page_type: 0xffffefff(slab)
[ +0.000009] raw: 0017ffffc0000000 ffff8881000428c0 0000000000000000 dead000000000001
[ +0.000006] raw: 0000000000000000 0000000000200020 00000001ffffefff 0000000000000000
[ +0.000006] page dumped because: kasan: bad access detected
[ +0.000012] Memory state around the buggy address:
[ +0.000011] ffff8881b8605e80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ +0.000015] ffff8881b8605f00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ +0.000015] >ffff8881b8605f80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ +0.000013] ^
[ +0.000011] ffff8881b8606000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
[ +0.000014] ffff8881b8606080: fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb fb
[ +0.000013] ==================================================================
The issue reproduced on VG20 during the IGT pci_unplug test.
The root cause of the issue is that the function drm_sched_fini is called before drm_sched_entity_kill.
In drm_sched_fini, the drm_sched_rq structure is freed, but this structure is later accessed by
each entity within the run queue, leading to invalid memory access.
To resolve this, the order of cleanup calls is updated:
Before:
amdgpu_fence_driver_sw_fini
amdgpu_device_ip_fini
After:
amdgpu_device_ip_fini
amdgpu_fence_driver_sw_fini
This updated order ensures that all entities in the IPs are cleaned up first, followed by proper
cleanup of the schedulers.
Additional Investigation:
During debugging, another issue was identified in the amdgpu_vce_sw_fini function. The vce.vcpu_bo
buffer must be freed only as the final step in the cleanup process to prevent any premature
access during earlier cleanup stages.
v2: Using Christian suggestion call drm_sched_entity_destroy before drm_sched_fini.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7013a8268d311fded6c7a6528fc1de82668e75f6 upstream.
There is a strapping issue on NBIO 7.7.0 that can lead to spurious PME
events while in the D0 state.
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20241112161142.28974-1-mario.limonciello@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 447a54a0f79c9a409ceaa17804bdd2e0206397b9)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a6dd15981c03f2cdc9a351a278f09b5479d53d2e upstream.
acpi_evaluate_object() may return AE_NOT_FOUND (failure), which
would result in dereferencing buffer.pointer (obj) while being NULL.
Although this case may be unrealistic for the current code, it is
still better to protect against possible bugs.
Bail out also when status is AE_NOT_FOUND.
This fixes 1 FORWARD_NULL issue reported by Coverity
Report: CID 1600951: Null pointer dereferences (FORWARD_NULL)
Signed-off-by: Antonio Quartulli <antonio@mandelbit.com>
Fixes: c9b7c809b89f ("drm/amd: Guard against bad data for ATIF ACPI method")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 91c9e221fe2553edf2db71627d8453f083de87a1)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3ce3f85787352fa48fc02ef6cbd7a5e5aba93347 upstream.
For DPX mode, the number of memory partitions supported should be less
than or equal to 2.
Fixes: 1589c82a1085 ("drm/amdgpu: Check memory ranges for valid xcp mode")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 990c4f580742de7bb78fa57420ffd182fc3ab4cd)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b46dadf7e3cfe26d0b109c9c3d81b278d6c75361 upstream.
Regular users shouldn't have read access.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c0cfd2e652553d607b910be47d0cc5a7f3a78641)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4d75b9468021c73108b4439794d69e892b1d24e3 upstream.
Avoid a possible buffer overflow if size is larger than 4K.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f5d873f5825b40d886d03bd2aede91d4cf002434)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f790a2c494c4ef587eeeb9fca20124de76a1646f upstream.
Users should not be able to run these.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7ba9395430f611cfc101b1c2687732baafa239d5)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bf58f03931fdcf7b3c45cb76ac13244477a60f44 upstream.
If a BIOS provides bad data in response to an ATIF method call
this causes a NULL pointer dereference in the caller.
```
? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1))
? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434)
? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2))
? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1))
? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642)
? exc_page_fault (arch/x86/mm/fault.c:1542)
? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu
```
It has been encountered on at least one system, so guard for it.
Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c9b7c809b89f24e9372a4e7f02d64c950b07fdee)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e7457532cb7167516263150ceae86f36d6ef9683 ]
This patch addresses a double unlock issue in the amdgpu_mes_add_ring
function. The mutex was being unlocked twice under certain error
conditions, which could lead to undefined behavior.
The fix ensures that the mutex is unlocked only once before jumping to
the clean_up_memory label. The unlock operation is moved to just before
the goto statement within the conditional block that checks the return
value of amdgpu_ring_init. This prevents the second unlock attempt after
the clean_up_memory label, which is no longer necessary as the mutex is
already unlocked by this point in the code flow.
This change resolves the potential double unlock and maintains the
correct mutex handling throughout the function.
Fixes below:
Commit d0c423b64765 ("drm/amdgpu/mes: use ring for kernel queue
submission"), leads to the following Smatch static checker warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1240 amdgpu_mes_add_ring()
warn: double unlock '&adev->mes.mutex_hidden' (orig line 1213)
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
1143 int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id,
1144 int queue_type, int idx,
1145 struct amdgpu_mes_ctx_data *ctx_data,
1146 struct amdgpu_ring **out)
1147 {
1148 struct amdgpu_ring *ring;
1149 struct amdgpu_mes_gang *gang;
1150 struct amdgpu_mes_queue_properties qprops = {0};
1151 int r, queue_id, pasid;
1152
1153 /*
1154 * Avoid taking any other locks under MES lock to avoid circular
1155 * lock dependencies.
1156 */
1157 amdgpu_mes_lock(&adev->mes);
1158 gang = idr_find(&adev->mes.gang_id_idr, gang_id);
1159 if (!gang) {
1160 DRM_ERROR("gang id %d doesn't exist\n", gang_id);
1161 amdgpu_mes_unlock(&adev->mes);
1162 return -EINVAL;
1163 }
1164 pasid = gang->process->pasid;
1165
1166 ring = kzalloc(sizeof(struct amdgpu_ring), GFP_KERNEL);
1167 if (!ring) {
1168 amdgpu_mes_unlock(&adev->mes);
1169 return -ENOMEM;
1170 }
1171
1172 ring->ring_obj = NULL;
1173 ring->use_doorbell = true;
1174 ring->is_mes_queue = true;
1175 ring->mes_ctx = ctx_data;
1176 ring->idx = idx;
1177 ring->no_scheduler = true;
1178
1179 if (queue_type == AMDGPU_RING_TYPE_COMPUTE) {
1180 int offset = offsetof(struct amdgpu_mes_ctx_meta_data,
1181 compute[ring->idx].mec_hpd);
1182 ring->eop_gpu_addr =
1183 amdgpu_mes_ctx_get_offs_gpu_addr(ring, offset);
1184 }
1185
1186 switch (queue_type) {
1187 case AMDGPU_RING_TYPE_GFX:
1188 ring->funcs = adev->gfx.gfx_ring[0].funcs;
1189 ring->me = adev->gfx.gfx_ring[0].me;
1190 ring->pipe = adev->gfx.gfx_ring[0].pipe;
1191 break;
1192 case AMDGPU_RING_TYPE_COMPUTE:
1193 ring->funcs = adev->gfx.compute_ring[0].funcs;
1194 ring->me = adev->gfx.compute_ring[0].me;
1195 ring->pipe = adev->gfx.compute_ring[0].pipe;
1196 break;
1197 case AMDGPU_RING_TYPE_SDMA:
1198 ring->funcs = adev->sdma.instance[0].ring.funcs;
1199 break;
1200 default:
1201 BUG();
1202 }
1203
1204 r = amdgpu_ring_init(adev, ring, 1024, NULL, 0,
1205 AMDGPU_RING_PRIO_DEFAULT, NULL);
1206 if (r)
1207 goto clean_up_memory;
1208
1209 amdgpu_mes_ring_to_queue_props(adev, ring, &qprops);
1210
1211 dma_fence_wait(gang->process->vm->last_update, false);
1212 dma_fence_wait(ctx_data->meta_data_va->last_pt_update, false);
1213 amdgpu_mes_unlock(&adev->mes);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1214
1215 r = amdgpu_mes_add_hw_queue(adev, gang_id, &qprops, &queue_id);
1216 if (r)
1217 goto clean_up_ring;
^^^^^^^^^^^^^^^^^^
1218
1219 ring->hw_queue_id = queue_id;
1220 ring->doorbell_index = qprops.doorbell_off;
1221
1222 if (queue_type == AMDGPU_RING_TYPE_GFX)
1223 sprintf(ring->name, "gfx_%d.%d.%d", pasid, gang_id, queue_id);
1224 else if (queue_type == AMDGPU_RING_TYPE_COMPUTE)
1225 sprintf(ring->name, "compute_%d.%d.%d", pasid, gang_id,
1226 queue_id);
1227 else if (queue_type == AMDGPU_RING_TYPE_SDMA)
1228 sprintf(ring->name, "sdma_%d.%d.%d", pasid, gang_id,
1229 queue_id);
1230 else
1231 BUG();
1232
1233 *out = ring;
1234 return 0;
1235
1236 clean_up_ring:
1237 amdgpu_ring_fini(ring);
1238 clean_up_memory:
1239 kfree(ring);
--> 1240 amdgpu_mes_unlock(&adev->mes);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1241 return r;
1242 }
Fixes: d0c423b64765 ("drm/amdgpu/mes: use ring for kernel queue submission")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Jack Xiao <Jack.Xiao@amd.com>
Reported by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bfaf1883605fd0c0dbabacd67ed49708470d5ea4)
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit c0ec082f10b7a1fd25e8c1e2a686440da913b7a3 upstream.
Before this patch, if multiple BO_HANDLES chunks were submitted,
the error -EINVAL would be correctly set but could be overwritten
by the return value from amdgpu_cs_p1_bo_handles(). This patch
ensures that if there are multiple BO_HANDLES, we stop.
Fixes: fec5f8e8c6bc ("drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit")
Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 40f2cd98828f454bdc5006ad3d94330a5ea164b7)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ead60e9c4e29c8574cae1be4fe3af1d9a978fb0f ]
Protect the MMIO access with safe mode.
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3f2d35c325534c1b7ac5072173f0dc7ca969dec2 ]
Protect the MMIO access with safe mode.
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3ec2ad7c34c412bd9264cd1ff235d0812be90e82 ]
Protect the MMIO access with safe mode.
Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9e823f307074c0f82b5f6044943b0086e3079bed ]
Register access from userspace should be blocked until
reset is complete.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c0277b9d7c2ee9ee5dbc948548984f0fbb861301 ]
This resolves the unchecded return value warning reported by Coverity.
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2c7795e245d993bcba2f716a8c93a5891ef910c9 ]
Enabling gfxoff quirk results in perfectly usable
graphical user interface on HP 705G4 DM with R5 2400G.
Without the quirk, X server is completely unusable as
every few seconds there is gpu reset due to ring gfx timeout.
Signed-off-by: Peng Liu <liupeng01@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0126c0ae11e8b52ecfde9d1b174ee2f32d6c3a5d ]
Fix screen corruption with openkylin.
Link: https://bbs.openkylin.top/t/topic/171497
Signed-off-by: Peng Liu <liupeng01@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c86ad39140bbcb9dc75a10046c2221f657e8083b ]
Pass pointer reference to amdgpu_bo_unref to clear the correct pointer,
otherwise amdgpu_bo_unref clear the local variable, the original pointer
not set to NULL, this could cause use-after-free bug.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fec5f8e8c6bcf83ed7a392801d7b44c5ecfc1e82 ]
Before this commit, only submits with both a BO_HANDLES chunk and a
'bo_list_handle' would be rejected (by amdgpu_cs_parser_bos).
But if UMD sent multiple BO_HANDLES, what would happen is:
* only the last one would be really used
* all the others would leak memory as amdgpu_cs_p1_bo_handles would
overwrite the previous p->bo_list value
This commit rejects submissions with multiple BO_HANDLES chunks to
match the implementation of the parser.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ef126c06a98bde1a41303970eb0fc0ac33c3cc02 ]
Fix get each xcp macro to loop over each partition correctly
Fixes: 4bdca2057933 ("drm/amdgpu: Add utility functions for xcp")
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|