| Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 31ab27b14daaa75541a415c6794d6f3567fea44a upstream.
Submitting a cs with 0 chunks, causes an oops later, found trying
to execute the wrong userspace driver.
MESA_LOADER_DRIVER_OVERRIDE=v3d glxinfo
[172536.665184] BUG: kernel NULL pointer dereference, address: 00000000000001d8
[172536.665188] #PF: supervisor read access in kernel mode
[172536.665189] #PF: error_code(0x0000) - not-present page
[172536.665191] PGD 6712a0067 P4D 6712a0067 PUD 5af9ff067 PMD 0
[172536.665195] Oops: 0000 [#1] SMP NOPTI
[172536.665197] CPU: 7 PID: 2769838 Comm: glxinfo Tainted: P O 5.10.81 #1-NixOS
[172536.665199] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CROSSHAIR V FORMULA-Z, BIOS 2201 03/23/2015
[172536.665272] RIP: 0010:amdgpu_cs_ioctl+0x96/0x1ce0 [amdgpu]
[172536.665274] Code: 75 18 00 00 4c 8b b2 88 00 00 00 8b 46 08 48 89 54 24 68 49 89 f7 4c 89 5c 24 60 31 d2 4c 89 74 24 30 85 c0 0f 85 c0 01 00 00 <48> 83 ba d8 01 00 00 00 48 8b b4 24 90 00 00 00 74 16 48 8b 46 10
[172536.665276] RSP: 0018:ffffb47c0e81bbe0 EFLAGS: 00010246
[172536.665277] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[172536.665278] RDX: 0000000000000000 RSI: ffffb47c0e81be28 RDI: ffffb47c0e81bd68
[172536.665279] RBP: ffff936524080010 R08: 0000000000000000 R09: ffffb47c0e81be38
[172536.665281] R10: ffff936524080010 R11: ffff936524080000 R12: ffffb47c0e81bc40
[172536.665282] R13: ffffb47c0e81be28 R14: ffff9367bc410000 R15: ffffb47c0e81be28
[172536.665283] FS: 00007fe35e05d740(0000) GS:ffff936c1edc0000(0000) knlGS:0000000000000000
[172536.665284] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[172536.665286] CR2: 00000000000001d8 CR3: 0000000532e46000 CR4: 00000000000406e0
[172536.665287] Call Trace:
[172536.665322] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
[172536.665332] drm_ioctl_kernel+0xaa/0xf0 [drm]
[172536.665338] drm_ioctl+0x201/0x3b0 [drm]
[172536.665369] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
[172536.665372] ? selinux_file_ioctl+0x135/0x230
[172536.665399] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[172536.665403] __x64_sys_ioctl+0x83/0xb0
[172536.665406] do_syscall_64+0x33/0x40
[172536.665409] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2018
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ab0cd4a9ae5b4679b714d8dbfedc0901fecdce9f ]
When psp_hw_init failed, it will set the load_type to AMDGPU_FW_LOAD_DIRECT.
During amdgpu_device_ip_fini, amdgpu_ucode_free_bo checks that load_type is
AMDGPU_FW_LOAD_DIRECT and skips deallocating fw_buf causing memory leak.
Remove load_type check in amdgpu_ucode_free_bo.
Signed-off-by: Alice Wong <shiwei.wong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 555238d92ac32dbad2d77ad2bafc48d17391990c ]
Fix the compile warning below:
drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:1641
kv_get_acp_boot_level() warn: always true condition '(table->entries[i]->clk >= 0) => (0-u32max >= 0)'
Reported-by: kernel test robot <lkp@intel.com>
CC: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f3fa2becf2fc25b6ac7cf8d8b1a2e4a86b3b72bd ]
In function si_parse_power_table(), array adev->pm.dpm.ps and its member
is allocated. If the allocation of each member fails, the array itself
is freed and returned with an error code. However, the array is later
freed again in si_dpm_fini() function which is called when the function
returns an error.
This leads to potential double free of the array adev->pm.dpm.ps, as
well as leak of its array members, since the members are not freed in
the allocation function and the array is not nulled when freed.
In addition adev->pm.dpm.num_ps, which keeps track of the allocated
array member, is not updated until the member allocation is
successfully finished, this could also lead to either use after free,
or uninitialized variable access in si_dpm_fini().
Fix this by postponing the free of the array until si_dpm_fini() and
increment adev->pm.dpm.num_ps everytime the array member is allocated.
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit b40a6ab2cf9213923bf8e821ce7fa7f6a0a26990 upstream.
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu needs the drm_priv to allow mmap
to access the BO through the corresponding file descriptor. The VM can
also be extracted from drm_priv, so drm_priv can replace the vm parameter
in the kfd2kgd interface.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[This is a partial cherry-pick of the upstream commit.]
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b7dfbd2e601f3fee545bc158feceba4f340fe7cf ]
Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix
this by passing correct number of VMIDs to HWS
v2: squash in warning fix (Alex)
Signed-off-by: Tushar Patel <tushar.patel@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ]
[Why&How] Add a dedicated AMDGPU specific ID for use with
newer ASICs that support USB-C output
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
This patch is for linux-5.4.y only, it has no equivalent change
upstream.
When building x86_64 allmodconfig with tip of tree clang, there is an
instance of -Wstrict-prototypes:
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c:168:59: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions()
^
void
1 error generated.
amdgpu_amdkfd_gfx_10_0_get_functions() is prototyped properly in
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h but its definition in
amdgpu_amdkfd_gfx_v10.c does not have the argument types specified,
which causes the warning. GCC does not warn because it permits an
old-style definition if the prototype has the argument types.
This code was eliminated by commit e392c887df97 ("drm/amdkfd: Use array
to probe kfd2kgd_calls"), which was a part of a larger series that does
not look very suitable for stable. Just fix this one location, as it was
the only instance of this new warning across a variety of builds.
Fixes: 6bdadb207224 ("drm/amdgpu: Add navi10 kfd support for amdgpu (v3)")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1647b54ed55d4d48c7199d439f8834626576cbe9 ]
This post-op should be a pre-op so that we do not pass -1 as the bit
number to test_bit(). The current code will loop downwards from 63 to
-1. After changing to a pre-op, it loops from 63 to 0.
Fixes: 71c37505e7ea ("drm/amdgpu/gfx: move more common KIQ code to amdgpu_gfx.c")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 447c7997b62a5115ba4da846dcdee4fc12298a6a ]
Noticed the below warning while running a pytorch workload on vega10
GPUs. Change to trylock to avoid conflicts with already held reservation
locks.
[ +0.000003] WARNING: possible recursive locking detected
[ +0.000003] 5.13.0-kfd-rajneesh #1030 Not tainted
[ +0.000004] --------------------------------------------
[ +0.000002] python/4822 is trying to acquire lock:
[ +0.000004] ffff932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000203]
but task is already holding lock:
[ +0.000003] ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[ +0.000017]
other info that might help us debug this:
[ +0.000002] Possible unsafe locking scenario:
[ +0.000003] CPU0
[ +0.000002] ----
[ +0.000002] lock(reservation_ww_class_mutex);
[ +0.000004] lock(reservation_ww_class_mutex);
[ +0.000003]
*** DEADLOCK ***
[ +0.000002] May be due to missing lock nesting notation
[ +0.000003] 7 locks held by python/4822:
[ +0.000003] #0: ffff932c4ac028d0 (&process->mutex){+.+.}-{3:3}, at:
kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu]
[ +0.000232] #1: ffff932c55e830a8 (&info->lock#2){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu]
[ +0.000241] #2: ffff932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu]
[ +0.000236] #3: ffffb2b35606fd28
(reservation_ww_class_acquire){+.+.}-{0:0}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu]
[ +0.000235] #4: ffff932cbb7181f8
(reservation_ww_class_mutex){+.+.}-{3:3}, at:
ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[ +0.000015] #5: ffffffffc045f700 (*(sspp++)){....}-{0:0}, at:
drm_dev_enter+0x5/0xa0 [drm]
[ +0.000038] #6: ffff932c52da7078 (&vm->eviction_lock){+.+.}-{3:3},
at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu]
[ +0.000195]
stack backtrace:
[ +0.000003] CPU: 11 PID: 4822 Comm: python Not tainted
5.13.0-kfd-rajneesh #1030
[ +0.000005] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02
08/29/2018
[ +0.000003] Call Trace:
[ +0.000003] dump_stack+0x6d/0x89
[ +0.000010] __lock_acquire+0xb93/0x1a90
[ +0.000009] lock_acquire+0x25d/0x2d0
[ +0.000005] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000184] ? lock_is_held_type+0xa2/0x110
[ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000184] __ww_mutex_lock.constprop.17+0xca/0x1060
[ +0.000007] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000183] ? lock_release+0x13f/0x270
[ +0.000005] ? lock_is_held_type+0xa2/0x110
[ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000183] amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[ +0.000185] ttm_bo_release+0x4c6/0x580 [ttm]
[ +0.000010] amdgpu_bo_unref+0x1a/0x30 [amdgpu]
[ +0.000183] amdgpu_vm_free_table+0x76/0xa0 [amdgpu]
[ +0.000189] amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu]
[ +0.000189] amdgpu_vm_update_ptes+0x411/0x770 [amdgpu]
[ +0.000191] amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu]
[ +0.000191] amdgpu_vm_bo_update+0x251/0x610 [amdgpu]
[ +0.000191] update_gpuvm_pte+0xcc/0x290 [amdgpu]
[ +0.000229] ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu]
[ +0.000190] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60
[amdgpu]
[ +0.000234] kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu]
[ +0.000218] kfd_ioctl+0x2b9/0x600 [amdgpu]
[ +0.000216] ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu]
[ +0.000216] ? lock_release+0x13f/0x270
[ +0.000006] ? __fget_files+0x107/0x1e0
[ +0.000007] __x64_sys_ioctl+0x8b/0xd0
[ +0.000007] do_syscall_64+0x36/0x70
[ +0.000004] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ +0.000007] RIP: 0033:0x7fbff90a7317
[ +0.000004] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[ +0.000005] RSP: 002b:00007fbe301fe648 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ +0.000006] RAX: ffffffffffffffda RBX: 00007fbcc402d820 RCX:
00007fbff90a7317
[ +0.000003] RDX: 00007fbe301fe690 RSI: 00000000c0184b18 RDI:
0000000000000004
[ +0.000003] RBP: 00007fbe301fe690 R08: 0000000000000000 R09:
00007fbcc402d880
[ +0.000003] R10: 0000000002001000 R11: 0000000000000246 R12:
00000000c0184b18
[ +0.000003] R13: 0000000000000004 R14: 00007fbf689593a0 R15:
00007fbcc402d820
Cc: Christian König <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dfced44f122c500004a48ecc8db516bb6a295a1b ]
This issue takes place in an error path in
amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into
default case, the function simply returns -EINVAL, forgetting to
decrement the reference count of a dma_fence obj, which is bumped
earlier by amdgpu_cs_get_fence(). This may result in reference count
leaks.
Fix it by decreasing the refcount of specific object before returning
the error code.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit f626dd0ff05043e5a7154770cc7cda66acee33a3 upstream.
MMHUB PG needs to be disabled for Picasso for stability reasons.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e8ae38720e1a685fd98cfa5ae118c9d07b45ca79 ]
We probably never trigger this, but the logic inside the check is
inverted.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 11544d77e3974924c5a9c8a8320b996a3e9b2f8b ]
Some boards(like RX550) seem to have garbage in the upper
16 bits of the vram size register. Check for
this and clamp the size properly. Fixes
boards reporting bogus amounts of vram.
after add this patch,the maximum GPU VRAM size is 64GB,
otherwise only 64GB vram size will be used.
Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b220110e4cd442156f36e1d9b4914bb9e87b0d00 ]
In amdgpu_connector_lcd_native_mode(), the return value of
drm_mode_duplicate() is assigned to mode, and there is a dereference
of it in amdgpu_connector_lcd_native_mode(), which will lead to a NULL
pointer dereference on failure of drm_mode_duplicate().
Fix this bug add a check of mode.
This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.
Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.
Builds with CONFIG_DRM_AMDGPU=m show no new warnings, and
our static analyzer no longer warns about this code.
Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit f3a8076eb28cae1553958c629aecec479394bbe2 upstream.
should count on GC IP base address
Signed-off-by: Le Ma <le.ma@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
and dvi connectors
commit bf552083916a7f8800477b5986940d1c9a31b953 upstream.
amdgpu_connector_vga_get_modes missed function amdgpu_get_native_mode
which assign amdgpu_encoder->native_mode with *preferred_mode result in
amdgpu_encoder->native_mode.clock always be 0. That will cause
amdgpu_connector_set_property returned early on:
if ((rmx_type != DRM_MODE_SCALE_NONE) &&
(amdgpu_encoder->native_mode.clock == 0))
when we try to set scaling mode Full/Full aspect/Center.
Add the missing function to amdgpu_connector_vga_get_mode can fix this.
It also works on dvi connectors because
amdgpu_connector_dvi_helper_funcs.get_mode use the same method.
Signed-off-by: hongao <hongao@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 403475be6d8b122c3e6b8a47e075926d7299e5ef ]
The DMA mask on SI parts is 40 bits not 44. Copy
paste typo.
Fixes: 244511f386ccb9 ("drm/amdgpu: simplify and cleanup setting the dma mask")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1762
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 335aea75b0d95518951cad7c4c676e6f1c02c150 ]
The overflow check in amdgpu_bo_list_create() causes a warning with
clang-14 on 64-bit architectures, since the limit can never be
exceeded.
drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c:74:18: error: result of comparison of constant 256204778801521549 with expression of type 'unsigned int' is always false [-Werror,-Wtautological-constant-out-of-range-compare]
if (num_entries > (SIZE_MAX - sizeof(struct amdgpu_bo_list))
~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The check remains useful for 32-bit architectures, so just avoid the
warning by using size_t as the type for the count.
Fixes: 920990cb080a ("drm/amdgpu: allocate the bo_list array after the list")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 66805763a97f8f7bdf742fc0851d85c02ed9411f ]
gmc_v{9,10}_0_gart_disable() isn't called matched with
correspoding gart_enbale function in SRIOV case. This will
lead to gart.bo pin_count leak on driver unload.
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 67a44e659888569a133a8f858c8230e9d7aad1d5 upstream.
Seems like newer cards can have even more instances now.
Found by UBSAN: array-index-out-of-bounds in
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:318:29
index 8 is out of range for type 'uint32_t *[8]'
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1697
Cc: stable@vger.kernel.org
Signed-off-by: Ernst Sjöstrand <ernstp@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ea7acd7c5967542353430947f3faf699e70602e5 upstream.
With added CPU domain to placement you can have
now 3 placemnts at once.
CC: stable@kernel.org
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210622162339.761651-5-andrey.grodzovsky@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
access in amdgpu_i2c_router_select_ddc_port()
[ Upstream commit a211260c34cfadc6068fece8c9e99e0fe1e2a2b6 ]
The variable val is declared without initialization, and its address is
passed to amdgpu_i2c_get_byte(). In this function, the value of val is
accessed in:
DRM_DEBUG("i2c 0x%02x 0x%02x read failed\n",
addr, *val);
Also, when amdgpu_i2c_get_byte() returns, val may remain uninitialized,
but it is accessed in:
val &= ~amdgpu_connector->router.ddc_mux_control_pin;
To fix this possible uninitialized-variable access, initialize val to 0 in
amdgpu_i2c_router_select_ddc_port().
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dce4400e6516d18313d23de45b5be8a18980b00e ]
No need to account for the 2 bytes of EEPROM
address--this is now well abstracted away by
the fixes the the lower layers.
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit aff890288de2d818e4f83ec40c9315e2d735df07 ]
Devices created by mfd_add_hotplug_devices() don't really increase the
index of its name, so get_mfd_cell_dev() cannot find any device, hence a
NULL dev is passed to pm_genpd_add_device():
[ 56.974926] (NULL device *): amdgpu: device acp_audio_dma.0.auto added to pm domain
[ 56.974933] (NULL device *): amdgpu: Failed to add dev to genpd
[ 56.974941] [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <acp_ip> failed -22
[ 56.975810] amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed
[ 56.975839] amdgpu 0000:00:01.0: amdgpu: Fatal error during GPU init
[ 56.977136] ------------[ cut here ]------------
[ 56.977143] kernel BUG at mm/slub.c:4206!
[ 56.977158] invalid opcode: 0000 [#1] SMP NOPTI
[ 56.977167] CPU: 1 PID: 1648 Comm: modprobe Not tainted 5.12.0-051200rc8-generic #202104182230
[ 56.977175] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A68M-HD+, BIOS P5.20 02/13/2019
[ 56.977180] RIP: 0010:kfree+0x3bf/0x410
[ 56.977195] Code: 89 e7 48 d3 e2 f7 da e8 5f 0d 02 00 80 e7 02 75 3e 44 89 ee 4c 89 e7 e8 ef 5f fd ff e9 fa fe ff ff 49 8b 44 24 08 a8 01 75 b7 <0f> 0b 4c 8b 4d b0 48 8b 4d a8 48 89 da 4c 89 e6 41 b8 01 00 00 00
[ 56.977202] RSP: 0018:ffffa48640ff79f0 EFLAGS: 00010246
[ 56.977210] RAX: 0000000000000000 RBX: ffff9286127d5608 RCX: 0000000000000000
[ 56.977215] RDX: 0000000000000000 RSI: ffffffffc099d0fb RDI: ffff9286127d5608
[ 56.977220] RBP: ffffa48640ff7a48 R08: 0000000000000001 R09: 0000000000000001
[ 56.977224] R10: 0000000000000000 R11: ffff9286087d8458 R12: fffff3ae0449f540
[ 56.977229] R13: 0000000000000000 R14: dead000000000122 R15: dead000000000100
[ 56.977234] FS: 00007f9de5929540(0000) GS:ffff928612e80000(0000) knlGS:0000000000000000
[ 56.977240] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 56.977245] CR2: 00007f697dd97160 CR3: 00000001110f0000 CR4: 00000000001506e0
[ 56.977251] Call Trace:
[ 56.977261] amdgpu_dm_encoder_destroy+0x1b/0x30 [amdgpu]
[ 56.978056] drm_mode_config_cleanup+0x4f/0x2e0 [drm]
[ 56.978147] ? kfree+0x3dd/0x410
[ 56.978157] ? drm_managed_release+0xc8/0x100 [drm]
[ 56.978232] drm_mode_config_init_release+0xe/0x10 [drm]
[ 56.978311] drm_managed_release+0x9d/0x100 [drm]
[ 56.978388] devm_drm_dev_init_release+0x4d/0x70 [drm]
[ 56.978450] devm_action_release+0x15/0x20
[ 56.978459] release_nodes+0x77/0xc0
[ 56.978469] devres_release_all+0x3f/0x50
[ 56.978477] really_probe+0x245/0x460
[ 56.978485] driver_probe_device+0xe9/0x160
[ 56.978492] device_driver_attach+0xab/0xb0
[ 56.978499] __driver_attach+0x8f/0x150
[ 56.978506] ? device_driver_attach+0xb0/0xb0
[ 56.978513] bus_for_each_dev+0x7e/0xc0
[ 56.978521] driver_attach+0x1e/0x20
[ 56.978528] bus_add_driver+0x135/0x1f0
[ 56.978534] driver_register+0x91/0xf0
[ 56.978540] __pci_register_driver+0x54/0x60
[ 56.978549] amdgpu_init+0x77/0x1000 [amdgpu]
[ 56.979246] ? 0xffffffffc0dbc000
[ 56.979254] do_one_initcall+0x48/0x1d0
[ 56.979265] ? kmem_cache_alloc_trace+0x120/0x230
[ 56.979274] ? do_init_module+0x28/0x280
[ 56.979282] do_init_module+0x62/0x280
[ 56.979288] load_module+0x71c/0x7a0
[ 56.979296] __do_sys_finit_module+0xc2/0x120
[ 56.979305] __x64_sys_finit_module+0x1a/0x20
[ 56.979311] do_syscall_64+0x38/0x90
[ 56.979319] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 56.979328] RIP: 0033:0x7f9de54f989d
[ 56.979335] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
[ 56.979342] RSP: 002b:00007ffe3c395a28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 56.979350] RAX: ffffffffffffffda RBX: 0000560df3ef4330 RCX: 00007f9de54f989d
[ 56.979355] RDX: 0000000000000000 RSI: 0000560df3a07358 RDI: 000000000000000f
[ 56.979360] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
[ 56.979365] R10: 000000000000000f R11: 0000000000000246 R12: 0000560df3a07358
[ 56.979369] R13: 0000000000000000 R14: 0000560df3ef4460 R15: 0000560df3ef4330
[ 56.979377] Modules linked in: amdgpu(+) iommu_v2 gpu_sched drm_ttm_helper ttm drm_kms_helper cec rc_core i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt nft_counter xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables libcrc32c nfnetlink ip6_tables iptable_filter bpfilter input_leds binfmt_misc edac_mce_amd kvm_amd ccp kvm snd_hda_codec_realtek snd_hda_codec_generic crct10dif_pclmul snd_hda_codec_hdmi ledtrig_audio ghash_clmulni_intel aesni_intel snd_hda_intel snd_intel_dspcfg snd_seq_midi crypto_simd snd_intel_sdw_acpi cryptd snd_hda_codec snd_seq_midi_event snd_rawmidi snd_hda_core snd_hwdep snd_seq fam15h_power k10temp snd_pcm snd_seq_device snd_timer snd mac_hid soundcore sch_fq_codel nct6775 hwmon_vid drm ip_tables x_tables autofs4 dm_mirror dm_region_hash dm_log hid_generic usbhid hid uas usb_storage r8169 crc32_pclmul realtek ahci xhci_pci i2c_piix4
[ 56.979521] xhci_pci_renesas libahci video
[ 56.979541] ---[ end trace cb8f6a346f18da7b ]---
Instead of finding MFD hotplugged device by its name, simply iterate
over the child devices to avoid the issue.
Squash in unused variable removal (Alex)
BugLink: https://bugs.launchpad.net/bugs/1920674
Fixes: 25030321ba28 ("drm/amd: add pm domain for ACP IP sub blocks")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bc05716d4fdd065013633602c5960a2bf1511b9c ]
Fixes handling when page tables are in system memory.
v3: remove struct amdgpu_vm_parser.
v2: remove unwanted variable.
change amdgpu_amdkfd_validate instead of amdgpu_amdkfd_bo_validate.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 95ea3dbc4e9548d35ab6fbf67675cef8c293e2f5 ]
Disable all ip's hw status to false before any hw_init.
Only set it to true until its hw_init is executed.
The old 5.9 branch has this change but somehow the 5.11 kernrel does
not have this fix.
Without this change, sriov tdr have gfx IB test fail.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Review-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
doorbell."
commit baacf52a473b24e10322b67757ddb92ab8d86717 upstream.
This reverts commit 1c0b0efd148d5b24c4932ddb3fa03c8edd6097b3.
Reason for revert: Side effect of enlarging CP_MEC_DOORBELL_RANGE may
cause some APUs fail to enter gfxoff in certain user cases.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ee5468b9f1d3bf48082eed351dace14598e8ca39 upstream.
This reverts commit 4cbbe34807938e6e494e535a68d5ff64edac3f20.
Reason for revert: side effect of enlarging CP_MEC_DOORBELL_RANGE may
cause some APUs fail to enter gfxoff in certain user cases.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4cbbe34807938e6e494e535a68d5ff64edac3f20 upstream.
If GC has entered CGPG, ringing doorbell > first page doesn't wakeup GC.
Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround this issue.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1c0b0efd148d5b24c4932ddb3fa03c8edd6097b3 upstream.
If GC has entered CGPG, ringing doorbell > first page doesn't wakeup GC.
Enlarge CP_MEC_DOORBELL_RANGE_UPPER to workaround this issue.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 07438603a07e52f1c6aa731842bd298d2725b7be upstream.
Releasing pinned BOs is illegal now. UVD 6 was missing from:
commit 2f40801dc553 ("drm/amdgpu: make sure we unpin the UVD BO")
Fixes: 2f40801dc553 ("drm/amdgpu: make sure we unpin the UVD BO")
Cc: stable@vger.kernel.org
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dce3d8e1d070900e0feeb06787a319ff9379212c upstream.
On QUERY2 IOCTL don't query counts of correctable
and uncorrectable errors, since when RAS is
enabled and supported on Vega20 server boards,
this takes insurmountably long time, in O(n^3),
which slows the system down to the point of it
being unusable when we have GUI up.
Fixes: ae363a212b14 ("drm/amdgpu: Add a new flag to AMDGPU_CTX_OP_QUERY_STATE2")
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9c2876d56f1ce9b6b2072f1446fb1e8d1532cb3d ]
When amdgpu_ib_ring_tests failed, the reset logic called
amdgpu_device_ip_suspend twice, then deadlock occurred.
Deadlock log:
[ 805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[ 806.290952] [drm] free PSP TMR buffer
[ 806.319406] ============================================
[ 806.320315] WARNING: possible recursive locking detected
[ 806.321225] 5.11.0-custom #1 Tainted: G W OEL
[ 806.322135] --------------------------------------------
[ 806.323043] cat/2593 is trying to acquire lock:
[ 806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.325668]
but task is already holding lock:
[ 806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.328430]
other info that might help us debug this:
[ 806.329539] Possible unsafe locking scenario:
[ 806.330549] CPU0
[ 806.330983] ----
[ 806.331416] lock(&adev->dm.dc_lock);
[ 806.332086] lock(&adev->dm.dc_lock);
[ 806.332738]
*** DEADLOCK ***
[ 806.333747] May be due to missing lock nesting notation
[ 806.334899] 3 locks held by cat/2593:
[ 806.335537] #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[ 806.337009] #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[ 806.339018] #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.340869]
stack backtrace:
[ 806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G W OEL 5.11.0-custom #1
[ 806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
[ 806.344413] Call Trace:
[ 806.344849] dump_stack+0x93/0xbd
[ 806.345435] __lock_acquire.cold+0x18a/0x2cf
[ 806.346179] lock_acquire+0xca/0x390
[ 806.346807] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.347813] __mutex_lock+0x9b/0x930
[ 806.348454] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.349434] ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
[ 806.350581] ? _raw_spin_unlock_irqrestore+0x47/0x50
[ 806.351437] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.352437] ? rcu_read_lock_sched_held+0x4f/0x80
[ 806.353252] ? rcu_read_lock_sched_held+0x4f/0x80
[ 806.354064] mutex_lock_nested+0x1b/0x20
[ 806.354747] ? mutex_lock_nested+0x1b/0x20
[ 806.355457] dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.356427] ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
[ 806.357736] amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
[ 806.360394] amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[ 806.362926] amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
[ 806.365560] amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Christian KÃnig <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1e5c37385097c35911b0f8a0c67ffd10ee1af9a2 ]
looks like we forget to set ttm->sg to NULL.
Hit panic below
[ 1235.844104] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b7b4b: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
[ 1235.989074] Call Trace:
[ 1235.991751] sg_free_table+0x17/0x20
[ 1235.995667] amdgpu_ttm_backend_unbind.cold+0x4d/0xf7 [amdgpu]
[ 1236.002288] amdgpu_ttm_backend_destroy+0x29/0x130 [amdgpu]
[ 1236.008464] ttm_tt_destroy+0x1e/0x30 [ttm]
[ 1236.013066] ttm_bo_cleanup_memtype_use+0x51/0xa0 [ttm]
[ 1236.018783] ttm_bo_release+0x262/0xa50 [ttm]
[ 1236.023547] ttm_bo_put+0x82/0xd0 [ttm]
[ 1236.027766] amdgpu_bo_unref+0x26/0x50 [amdgpu]
[ 1236.032809] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x7aa/0xd90 [amdgpu]
[ 1236.040400] kfd_ioctl_alloc_memory_of_gpu+0xe2/0x330 [amdgpu]
[ 1236.046912] kfd_ioctl+0x463/0x690 [amdgpu]
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fa7e6abc75f3d491bc561734312d065dc9dc2a77 ]
[Why]
the gem object rfb->base.obj[0] is get according to num_planes
in amdgpufb_create, but is not put according to num_planes
[How]
put rfb->base.obj[0] in amdgpu_fbdev_destroy according to num_planes
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 2fb536ea42d557f39f70c755f68e1aa1ad466c55 upstream.
Add cancel_delayed_work_sync before set power gating state
to avoid race condition issue when power gating.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0c6013377b4027e69d8f3e63b6bf556b6cb87802 upstream.
Add cancel_delayed_work_sync before set power gating state
to avoid race condition issue when power gating.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b95f045ea35673572ef46d6483ad8bd6d353d63c upstream.
Add cancel_delayed_work_sync before set power gating state
to avoid race condition issue when power gating.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 77194d8642dd4cb7ea8ced77bfaea55610574c38 upstream.
Current golden setting is out of date.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 99c45ba5799d6b938bd9bd20edfeb6f3e3e039b9 upstream.
Current golden setting is out of date.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dbd1003d1252db5973dddf20b24bb0106ac52aa2 upstream.
There is problem with 3DCGCG firmware and it will cause compute test
hang on picasso/raven1. It needs to disable 3DCGCG in driver to avoid
compute hang.
Signed-off-by: Changfeng <Changfeng.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3c3dc654333f6389803cdcaf03912e94173ae510 ]
ttm->sg needs to be checked before accessing its child member.
Call Trace:
amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
ttm_bo_cleanup_memtype_use+0x3a/0x60 [ttm]
ttm_bo_release+0x17d/0x300 [ttm]
amdgpu_bo_unref+0x1a/0x30 [amdgpu]
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x78b/0x8b0 [amdgpu]
kfd_ioctl_alloc_memory_of_gpu+0x118/0x220 [amdgpu]
kfd_ioctl+0x222/0x400 [amdgpu]
? kfd_dev_is_large_bar+0x90/0x90 [amdgpu]
__x64_sys_ioctl+0x8e/0xd0
? __context_tracking_exit+0x52/0x90
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f97f264d317
Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffdb402c338 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f97f3cc63a0 RCX: 00007f97f264d317
RDX: 00007ffdb402c380 RSI: 00000000c0284b16 RDI: 0000000000000003
RBP: 00007ffdb402c380 R08: 00007ffdb402c428 R09: 00000000c4000004
R10: 00000000c4000004 R11: 0000000000000246 R12: 00000000c0284b16
R13: 0000000000000003 R14: 00007f97f3cc63a0 R15: 00007f8836200000
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7d98d416c2cc1c1f7d9508e887de4630e521d797 ]
clang points out that the %hu format string does not match the type
of the variables here:
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:263:7: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat]
version_major, version_minor);
^~~~~~~~~~~~~
include/drm/drm_print.h:498:19: note: expanded from macro 'DRM_ERROR'
__drm_err(fmt, ##__VA_ARGS__)
~~~ ^~~~~~~~~~~
Change it to a regular %u, the same way a previous patch did for
another instance of the same warning.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c8941550aa66b2a90f4b32c45d59e8571e33336e ]
This recent change introduce SDMA interrupt info printing with irq->process function.
These functions do not require a set function to enable/disable the irq
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4ac5617c4b7d0f0a8f879997f8ceaa14636d7554 ]
The psp supplies the link type in the upper 2 bits of the psp xgmi node
information num_hops field. With a new link type, Aldebaran has these
bits set to a non-zero value (1 = xGMI3) so the KFD topology will report
the incorrect IO link weights without proper masking.
The actual number of hops is located in the 3 least significant bits of
this field so mask if off accordingly before passing it to the KFD.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Amber Lin <amber.lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit e3512fb67093fabdf27af303066627b921ee9bd8 upstream.
The page table of AMDGPU requires an alignment to CPU page so we should
check ioctl parameters for it. Return -EINVAL if some parameter is
unaligned to CPU page, instead of corrupt the page table sliently.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5e61b84f9d3ddfba73091f9fbc940caae1c9eb22 upstream.
Offset calculation wasn't correct as start addresses are in pfn
not in bytes.
CC: stable@vger.kernel.org
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 521f04f9e3ffc73ef96c776035f8a0a31b4cdd81 ]
FB BO should not be ttm_bo_type_kernel type and
amdgpufb_create_pinned_object() pins the FB BO anyway.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 1aa46901ee51c1c5779b3b239ea0374a50c6d9ff upstream.
the register offset isn't needed division by 4 to pass RREG32_PCIE()
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|