kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2020-09-03	drm/amdgpu/gfx10: refine mgcg setting	Jiansong Chen	1	-4/+2
	commit de7a1b0b8753e1b0000084f0e339ffab295d27ef upstream. 1. enable ENABLE_CGTS_LEGACY to fix specviewperf11 random hang. 2. remove obsolete RLC_CGTT_SCLK_OVERRIDE workaround. Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-03	drm/amdgpu: Fix buffer overflow in INFO ioctl	Alex Deucher	1	-0/+4
	commit b5b97cab55eb71daba3283c8b1d2cce456d511a1 upstream. The values for "se_num" and "sh_num" come from the user in the ioctl. They can be in the 0-255 range but if they're more than AMDGPU_GFX_MAX_SE (4) or AMDGPU_GFX_MAX_SH_PER_SE (2) then it results in an out of bounds read. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-03	drm/amdgpu/display: fix ref count leak when pm_runtime_get_sync fails	Navid Emamdoost	1	-4/+12
	[ Upstream commit f79f94765f8c39db0b7dec1d335ab046aac03f20 ] The call to pm_runtime_get_sync increments the counter even in case of failure, leading to incorrect ref count. In case of failure, decrement the ref count before returning. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03	drm/amdgpu: fix ref count leak in amdgpu_display_crtc_set_config	Navid Emamdoost	1	-2/+3
	[ Upstream commit e008fa6fb41544b63973a529b704ef342f47cc65 ] in amdgpu_display_crtc_set_config, the call to pm_runtime_get_sync increments the counter even in case of failure, leading to incorrect ref count. In case of failure, decrement the ref count before returning. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03	drm/amd/display: fix ref count leak in amdgpu_drm_ioctl	Navid Emamdoost	1	-1/+2
	[ Upstream commit 5509ac65f2fe5aa3c0003237ec629ca55024307c ] in amdgpu_drm_ioctl the call to pm_runtime_get_sync increments the counter even in case of failure, leading to incorrect ref count. In case of failure, decrement the ref count before returning. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03	drm/amdgpu: fix ref count leak in amdgpu_driver_open_kms	Navid Emamdoost	1	-1/+2
	[ Upstream commit 9ba8923cbbe11564dd1bf9f3602add9a9cfbb5c6 ] in amdgpu_driver_open_kms the call to pm_runtime_get_sync increments the counter even in case of failure, leading to incorrect ref count. In case of failure, decrement the ref count before returning. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19	drm/amdgpu: avoid dereferencing a NULL pointer	Jack Xiao	1	-7/+12
	[ Upstream commit 55611b507fd6453d26030c0c0619fdf0c262766d ] Check if irq_src is NULL to avoid dereferencing a NULL pointer, for MES ring is uneccessary to recieve an interrupt notification. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-05	drm/amdgpu: Prevent kernel-infoleak in amdgpu_info_ioctl()	Peilin Ye	1	-1/+2
	commit 543e8669ed9bfb30545fd52bc0e047ca4df7fb31 upstream. Compiler leaves a 4-byte hole near the end of `dev_info`, causing amdgpu_info_ioctl() to copy uninitialized kernel stack memory to userspace when `size` is greater than 356. In 2015 we tried to fix this issue by doing `= {};` on `dev_info`, which unfortunately does not initialize that 4-byte hole. Fix it by using memset() instead. Cc: stable@vger.kernel.org Fixes: c193fa91b918 ("drm/amdgpu: information leak in amdgpu_info_ioctl()") Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)") Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-05	Revert "drm/amdgpu: Fix NULL dereference in dpm sysfs handlers"	Alex Deucher	1	-3/+6
	commit 87004abfbc27261edd15716515d89ab42198b405 upstream. This regressed some working configurations so revert it. Will fix this properly for 5.9 and backport then. This reverts commit 38e0c89a19fd13f28d2b4721035160a3e66e270b. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	drm/amdgpu: Fix NULL dereference in dpm sysfs handlers	Paweł Gronowski	1	-6/+3
	commit 38e0c89a19fd13f28d2b4721035160a3e66e270b upstream. NULL dereference occurs when string that is not ended with space or newline is written to some dpm sysfs interface (for example pp_dpm_sclk). This happens because strsep replaces the tmp with NULL if the delimiter is not present in string, which is then dereferenced by tmp[0]. Reproduction example: sudo sh -c 'echo -n 1 > /sys/class/drm/card0/device/pp_dpm_sclk' Signed-off-by: Paweł Gronowski <me@woland.xyz> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29	drm/amdgpu: fix preemption unit test	Jack Xiao	1	-5/+15
	[ Upstream commit d845a2051b6b673fab4229b920ea04c7c4352b51 ] Remove signaled jobs from job list and ensure the job was indeed preempted. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-29	drm/amdgpu/gfx10: fix race condition for kiq	Jack Xiao	1	-1/+8
	[ Upstream commit 7d65a577bb58d4f27a3398a4c0cb0b00ab7d0511 ] During preemption test for gfx10, it uses kiq to trigger gfx preemption, which would result in race condition with flushing TLB for kiq. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-22	drm/amdgpu/sdma5: fix wptr overwritten in ->get_wptr()	Xiaojie Yuan	1	-18/+8
	commit 05051496b2622e4d12e2036b35165969aa502f89 upstream. "u64 wptr" points to the the wptr value in write back buffer and "wptr = (*wptr) >> 2;" results in the value being overwritten each time when ->get_wptr() is called. umr uses /sys/kernel/debug/dri/0/amdgpu_ring_sdma0 to get rptr/wptr and decode ring content and it is affected by this issue. fix and simplify the logic similar as sdma_v4_0_ring_get_wptr(). v2: fix for sdma5.2 as well v3: drop sdma 5.2 changes for 5.8 and stable Suggested-by: Le Ma <le.ma@amd.com> Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-16	drm/amdgpu: don't do soft recovery if gpu_recovery=0	Marek Olšák	1	-1/+2
	commit f4892c327a8e5df7ce16cab40897daf90baf6bec upstream. It's impossible to debug shader hangs with soft recovery. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-09	drm/amdgpu/atomfirmware: fix vram_info fetching for renoir	Alex Deucher	1	-0/+1
	commit d7a6634a4cfba073ff6a526cb4265d6e58ece234 upstream. Renoir uses integrated_system_info table v12. The table has the same layout as v11 with respect to this data. Just reuse the existing code for v12 for stable. Fixes incorrectly reported vram info in the driver output. Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-09	drm/amdgpu: use %u rather than %d for sclk/mclk	Alex Deucher	1	-2/+2
	commit beaf10efca64ac824240838ab1f054dfbefab5e6 upstream. Large clock values may overflow and show up as negative. Reported by prOMiNd on IRC. Acked-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30	drm/amdgpu: add fw release for sdma v5_0	Wenhui Sheng	1	-1/+5
	commit edfaf6fa73f15568d4337f208b2333f647c35810 upstream. sdma fw isn't released when module exit Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-22	drm/amdgpu: Sync with VM root BO when switching VM to CPU update mode	Felix Kuehling	1	-2/+9
	[ Upstream commit 90ca78deb004abe75b5024968a199acb96bb70f9 ] This fixes an intermittent bug where a root PD clear operation still in progress could overwrite a PDE update done by the CPU, resulting in a VM fault. Fixes: 108b4d928c03 ("drm/amd/amdgpu: Update VM function pointer") Reported-by: Jay Cornwall <Jay.Cornwall@amd.com> Tested-by: Jay Cornwall <Jay.Cornwall@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22	drm/amd/powerpay: Disable gfxoff when setting manual mode on picasso and raven	chen gong	1	-0/+9
	[ Upstream commit cbd2d08c7463e78d625a69e9db27ad3004cbbd99 ] [Problem description] 1. Boot up picasso platform, launches desktop, Don't do anything (APU enter into "gfxoff" state) 2. Remote login to platform using SSH, then type the command line: sudo su -c "echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level" sudo su -c "echo 2 > /sys/class/drm/card0/device/pp_dpm_sclk" (fix SCLK to 1400MHz) 3. Move the mouse around in Window 4. Phenomenon : The screen frozen Tester will switch sclk level during glmark2 run time. APU will enter "gfxoff" state intermittently during glmark2 run time. The system got hanged if fix GFXCLK to 1400MHz when APU is in "gfxoff" state. [Debug] 1. Fix SCLK to X MHz 1400: screen frozen, screen black, then OS will reboot. 1300: screen frozen. 1200: screen frozen, screen black. 1100: screen frozen, screen black, then OS will reboot. 1000: screen frozen, screen black. 900: screen frozen, screen black, then OS will reboot. 800: Situation Nomal, issue disappear. 700: Situation Nomal, issue disappear. 2. SBIOS setting: AMD CBS --> SMU Debug Options -->SMU Debug --> "GFX DLDO Psm Margin Control": 50 : Situation Nomal, issue disappear. 45 : Situation Nomal, issue disappear. 40 : Situation Nomal, issue disappear. 35 : Situation Nomal, issue disappear. 30 : screen black. 25 : screen frozen, then blurred screen. 20 : screen frozen. 15 : screen black. 10 : screen frozen. 5 : screen frozen, then blurred screen. 3. Disable GFXOFF feature Situation Nomal, issue disappear. [Why] Through a period of time debugging with Sys Eng team and SMU team, Sys Eng team said this is voltage/frequency marginal issue not a F/W or H/W bug. This experiment proves that default targetPsm [for f=1400MHz] is not sufficient when GFXOFF is enabled on Picasso. SMU team think it is an odd test conditions to force sclk="1400MHz" when GPU is in "gfxoff" state，then wake up the GFX. SCLK should be in the "lowest frequency" when gfxoff. [How] Disable gfxoff when setting manual mode. Enable gfxoff when setting other mode(exiting manual mode) again. By the way, from the user point of view, now that user switch to manual mode and force SCLK Frequency, he don't want SCLK be controlled by workload.It becomes meaningless to "switch to manual mode" if APU enter "gfxoff" due to lack of workload at this point. Tips: Same issue observed on Raven. Signed-off-by: chen gong <curry.gong@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22	drm/amdgpu: Init data to avoid oops while reading pp_num_states.	limingyu	1	-1/+4
	[ Upstream commit 6f81b2d047c59eb77cd04795a44245d6a52cdaec ] For chip like CHIP_OLAND with si enabled(amdgpu.si_support=1), the amdgpu will expose pp_num_states to the /sys directory. In this moment, read the pp_num_states file will excute the amdgpu_get_pp_num_states func. In our case, the data hasn't been initialized, so the kernel will access some ilegal address, trigger the segmentfault and system will reboot soon: uos@uos-PC:~$ cat /sys/devices/pci0000\:00/0000\:00\:00.0/0000\:01\:00 .0/pp_num_states Message from syslogd@uos-PC at Apr 22 09:26:20 ... kernel:[ 82.154129] Internal error: Oops: 96000004 [#1] SMP This patch aims to fix this problem, avoid that reading file triggers the kernel sementfault. Signed-off-by: limingyu <limingyu@uniontech.com> Signed-off-by: zhoubinbin <zhoubinbin@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22	drm/amdgpu: fix and cleanup amdgpu_gem_object_close v4	Christian König	1	-18/+25
	[ Upstream commit 82c416b13cb7d22b96ec0888b296a48dff8a09eb ] The problem is that we can't add the clear fence to the BO when there is an exclusive fence on it since we can't guarantee the the clear fence will complete after the exclusive one. To fix this refactor the function and also add the exclusive fence as shared to the resv object. v2: fix warning v3: add excl fence as shared instead v4: squash in fix for fence handling in amdgpu_gem_object_close Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-03	drm/amdgpu: Use GEM obj reference for KFD BOs	Felix Kuehling	1	-2/+3
	[ Upstream commit 39b3128d7ffd44e400e581e6f49e88cb42bef9a1 ] Releasing the AMDGPU BO ref directly leads to problems when BOs were exported as DMA bufs. Releasing the GEM reference makes sure that the AMDGPU/TTM BO is not freed too early. Also take a GEM reference when importing BOs from DMABufs to keep references to imported BOs balances properly. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-03	drm/amdgpu: drop unnecessary cancel_delayed_work_sync on PG ungate	Evan Quan	2	-14/+4
	[ Upstream commit 1fe48ec08d9f2e26d893a6c05bd6c99a3490f9ef ] As this is already properly handled in amdgpu_gfx_off_ctrl(). In fact, this unnecessary cancel_delayed_work_sync may leave a small time window for race condition and is dangerous. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20	drm/amdgpu: force fbdev into vram	Alex Deucher	1	-2/+1
	[ Upstream commit a6aacb2b26e85aa619cf0c6f98d0ca77314cd2a1 ] We set the fb smem pointer to the offset into the BAR, so keep the fbdev bo in vram. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207581 Fixes: 6c8d74caa2fa33 ("drm/amdgpu: Enable scatter gather display support") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20	drm/amdgpu: invalidate L2 before SDMA IBs (v2)	Marek Olšák	2	-1/+29
	[ Upstream commit fdf83646c0542ecfb9adc4db8f741a1f43dca058 ] This fixes GPU hangs due to cache coherency issues. v2: Split the version bump to a separate patch Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20	drm/amdgpu: simplify padding calculations (v2)	Luben Tuikov	5	-13/+20
	[ Upstream commit ce73516d42c9ab011fad498168b892d28e181db4 ] Simplify padding calculations. v2: Comment update and spacing. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-14	drm/amdgpu: drop redundant cg/pg ungate on runpm enter	Evan Quan	1	-3/+0
	[ Upstream commit f7b52890daba570bc8162d43c96b5583bbdd4edd ] CG/PG ungate is already performed in ip_suspend_phase1. Otherwise, the CG/PG ungate will be performed twice. That will cause gfxoff disablement is performed twice also on runpm enter while gfxoff enablemnt once on rump exit. That will put gfxoff into disabled state. Fixes: b2a7e9735ab286 ("drm/amdgpu: fix the hw hang during perform system reboot and reset") Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-14	drm/amdgpu: move kfd suspend after ip_suspend_phase1	Evan Quan	1	-2/+2
	[ Upstream commit c457a273e118bb96e1db8d1825f313e6cafe4258 ] This sequence change should be safe as what did in ip_suspend_phase1 is to suspend DCE only. And this is a prerequisite for coming redundant cg/pg ungate dropping. Fixes: 487eca11a321ef ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)") Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-10	drm/amdgpu: Fix oops when pp_funcs is unset in ACPI event	Aaron Ma	1	-1/+2
	commit 5932d260a8d85a103bd6c504fbb85ff58b156bf9 upstream. On ARCTURUS and RENOIR, powerplay is not supported yet. When plug in or unplug power jack, ACPI event will issue. Then kernel NULL pointer BUG will be triggered. Check for NULL pointers before calling. Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-21	drm/amdgpu: fix the hw hang during perform system reboot and reset	Prike Liang	1	-0/+2
	commit b2a7e9735ab2864330be9d00d7f38c961c28de5d upstream. The system reboot failed as some IP blocks enter power gate before perform hw resource destory. Meanwhile use unify interface to set device CGPG to ungate state can simplify the amdgpu poweroff or reset ungate guard. Fixes: 487eca11a321ef ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)") Signed-off-by: Prike Liang <Prike.Liang@amd.com> Tested-by: Mengbing Wang <Mengbing.Wang@amd.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17	drm/amdgpu: fix gfx hang during suspend with video playback (v2)	Prike Liang	1	-2/+3
	[ Upstream commit 487eca11a321ef33bcf4ca5adb3c0c4954db1b58 ] The system will be hang up during S3 suspend because of SMU is pending for GC not respose the register CP_HQD_ACTIVE access request.This issue root cause of accessing the GC register under enter GFX CGGPG and can be fixed by disable GFX CGPG before perform suspend. v2: Use disable the GFX CGPG instead of RLC safe mode guard. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Tested-by: Mengbing Wang <Mengbing.Wang@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17	drm/amdgpu: unify fw_write_wait for new gfx9 asics	Aaron Liu	1	-0/+2
	commit 2960758cce2310774de60bbbd8d6841d436c54d9 upstream. Make the fw_write_wait default case true since presumably all new gfx9 asics will have updated firmware. That is using unique WAIT_REG_MEM packet with opration=1. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Yuxian Dai <Yuxian.Dai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-08	drm/amdgpu: fix typo for vcn1 idle check	James Zhu	1	-1/+1
	[ Upstream commit acfc62dc68770aa665cc606891f6df7d6d1e52c0 ] fix typo for vcn1 idle check Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-01	drm/amdgpu: correct ROM_INDEX/DATA offset for VEGA20	Hawking Zhang	1	-2/+23
	[ Upstream commit f1c2cd3f8fb959123a9beba18c0e8112dcb2e137 ] The ROMC_INDEX/DATA offset was changed to e4/e5 since from smuio_v11 (vega20/arcturus). Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Tested-by: Candice Li <Candice.Li@amd.com> Reviewed-by: Candice Li <Candice.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-25	drm/amd/amdgpu: Fix GPR read from debugfs (v2)	Tom St Denis	1	-3/+3
	commit 5bbc6604a62814511c32f2e39bc9ffb2c1b92cbe upstream. The offset into the array was specified in bytes but should be in terms of 32-bit words. Also prevent large reads that would also cause a buffer overread. v2: Read from correct offset from internal storage buffer. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25	drm/amdgpu: clean wptr on wb when gpu recovery	Yintian Tao	2	-0/+2
	[ Upstream commit 2ab7e274b86739f4ceed5d94b6879f2d07b2802f ] The TDR will be randomly failed due to compute ring test failure. If the compute ring wptr & 0x7ff(ring_buf_mask) is 0x100 then after map mqd the compute ring rptr will be synced with 0x100. And the ring test packet size is also 0x100. Then after invocation of amdgpu_ring_commit, the cp will not really handle the packet on the ring buffer because rptr is equal to wptr. Signed-off-by: Yintian Tao <yttao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-21	drm/amdgpu: Fix TLB invalidation request when using semaphore	Felix Kuehling	2	-6/+7
	[ Upstream commit 37c58ddf57364d1a636850bb8ba6acbe1e16195e ] Use a more meaningful variable name for the invalidation request that is distinct from the tmp variable that gets overwritten when acquiring the invalidation semaphore. Fixes: 4ed8a03740d0 ("drm/amdgpu: invalidate mmhub semaphore workaround in gmc9/gmc10") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Yong Zhao <Yong.Zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-18	drm/amd/display: remove duplicated assignment to grph_obj_type	Colin Ian King	1	-2/+1
	commit d785476c608c621b345dd9396e8b21e90375cb0e upstream. Variable grph_obj_type is being assigned twice, one of these is redundant so remove it. Addresses-Coverity: ("Evaluation order violation") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: <nobuhiro1.iwamatsu@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05	amdgpu/gmc_v9: save/restore sdpif regs during S3	Shirish S	2	-1/+37
	commit a3ed353cf8015ba84a0407a5dc3ffee038166ab0 upstream. fixes S3 issue with IOMMU + S/G enabled @ 64M VRAM. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Shirish S <shirish.s@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05	drm/amdgpu: Drop DRIVER_USE_AGP	Daniel Vetter	1	-1/+1
	commit 8a3bddf67ce88b96531fb22c5a75d7f4dc41d155 upstream. This doesn't do anything except auto-init drm_agp support when you call drm_get_pci_dev(). Which amdgpu stopped doing with commit b58c11314a1706bf094c489ef5cb28f76478c704 Author: Alex Deucher <alexander.deucher@amd.com> Date: Fri Jun 2 17:16:31 2017 -0400 drm/amdgpu: drop deprecated drm_get_pci_dev and drm_put_dev No idea whether this was intentional or accidental breakage, but I guess anyone who manages to boot a this modern gpu behind an agp bridge deserves a price. A price I never expect anyone to ever collect :-) Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Xiaojie Yuan <xiaojie.yuan@amd.com> Cc: Evan Quan <evan.quan@amd.com> Cc: "Tianci.Yin" <tianci.yin@amd.com> Cc: "Marek Olšák" <marek.olsak@amd.com> Cc: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28	drm/amdgpu/gfx10: disable gfxoff when reading rlc clock	Alex Deucher	1	-0/+2
	commit b08c3ed609aabc4e76e74edc4404f0c26279d7ed upstream. Otherwise we readback all ones. Fixes rlc counter readback while gfxoff is active. Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28	drm/amdgpu/gfx9: disable gfxoff when reading rlc clock	Alex Deucher	1	-0/+2
	commit 120cf959308e1bda984e40a9edd25ee2d6262efd upstream. Otherwise we readback all ones. Fixes rlc counter readback while gfxoff is active. Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28	drm/amdgpu/soc15: fix xclk for raven	Alex Deucher	1	-1/+6
	commit c657b936ea98630ef5ba4f130ab1ad5c534d0165 upstream. It's 25 Mhz (refclk / 4). This fixes the interpretation of the rlc clock counter. Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-24	drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV	Monk Liu	1	-2/+0
	[ Upstream commit 5a7489a7e189ee2be889485f90c8cf24ea4b9a40 ] issues: MEC is ruined by the amdkfd_pre_reset after VF FLR done fix: amdkfd_pre_reset() would ruin MEC after hypervisor finished the VF FLR, the correct sequence is do amdkfd_pre_reset before VF FLR but there is a limitation to block this sequence: if we do pre_reset() before VF FLR, it would go KIQ way to do register access and stuck there, because KIQ probably won't work by that time (e.g. you already made GFX hang) so the best way right now is to simply remove it. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24	drm/amdgpu: Ensure ret is always initialized when using SOC15_WAIT_ON_RREG	Nathan Chancellor	1	-0/+1
	[ Upstream commit a63141e31764f8daf3f29e8e2d450dcf9199d1c8 ] Commit b0f3cd3191cd ("drm/amdgpu: remove unnecessary JPEG2.0 code from VCN2.0") introduced a new clang warning in the vcn_v2_0_stop function: ../drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1082:2: warning: variable 'r' is used uninitialized whenever 'while' loop exits because its condition is false [-Wsometimes-uninitialized] SOC15_WAIT_ON_RREG(VCN, 0, mmUVD_STATUS, UVD_STATUS__IDLE, 0x7, r); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/gpu/drm/amd/amdgpu/../amdgpu/soc15_common.h:55:10: note: expanded from macro 'SOC15_WAIT_ON_RREG' while ((tmp_ & (mask)) != (expected_value)) { \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1083:6: note: uninitialized use occurs here if (r) ^ ../drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1082:2: note: remove the condition if it is always true SOC15_WAIT_ON_RREG(VCN, 0, mmUVD_STATUS, UVD_STATUS__IDLE, 0x7, r); ^ ../drivers/gpu/drm/amd/amdgpu/../amdgpu/soc15_common.h:55:10: note: expanded from macro 'SOC15_WAIT_ON_RREG' while ((tmp_ & (mask)) != (expected_value)) { \ ^ ../drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1072:7: note: initialize the variable 'r' to silence this warning int r; ^ = 0 1 warning generated. To prevent warnings like this from happening in the future, make the SOC15_WAIT_ON_RREG macro initialize its ret variable before the while loop that can time out. This macro's return value is always checked so it should set ret in both the success and fail path. Link: https://github.com/ClangBuiltLinux/linux/issues/776 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24	drm/amdgpu: remove 4 set but not used variable in ↵	yu kuai	1	-17/+2
	amdgpu_atombios_get_connector_info_from_object_table [ Upstream commit bae028e3e521e8cb8caf2cc16a455ce4c55f2332 ] Fixes gcc '-Wunused-but-set-variable' warning: drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c: In function 'amdgpu_atombios_get_connector_info_from_object_table': drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c:376:26: warning: variable 'grph_obj_num' set but not used [-Wunused-but-set-variable] drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c:376:13: warning: variable 'grph_obj_id' set but not used [-Wunused-but-set-variable] drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c:341:37: warning: variable 'con_obj_type' set but not used [-Wunused-but-set-variable] drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c:341:24: warning: variable 'con_obj_num' set but not used [-Wunused-but-set-variable] They are never used, so can be removed. Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)") Signed-off-by: yu kuai <yukuai3@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24	drm/amdgpu/sriov: workaround on rev_id for Navi12 under sriov	Tiecheng Zhou	1	-0/+6
	[ Upstream commit df5e984c8bd414561c320d6cbbb66d53abf4c7e2 ] guest vm gets 0xffffffff when reading RCC_DEV0_EPF0_STRAP0, as a consequence, the rev_id and external_rev_id are wrong. workaround it by hardcoding the rev_id to 0, which is the default value. v2. add comment in the code Signed-off-by: Tiecheng Zhou <Tiecheng.Zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-01	drm/amdgpu/SRIOV: add navi12 pci id for SRIOV (v2)	Jiange Zhao	1	-0/+1
	[ Upstream commit 57d4f3b7fd65b56f98b62817f27c461142c0bc2a ] Add Navi12 PCI id support. v2: flag as experimental for now (Alex) Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-26	drm/amdgpu: remove excess function parameter description	yu kuai	1	-2/+0
	[ Upstream commit d0580c09c65cff211f589a40e08eabc62da463fb ] Fixes gcc warning: drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:431: warning: Excess function parameter 'sw' description in 'vcn_v2_5_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:550: warning: Excess function parameter 'sw' description in 'vcn_v2_5_enable_clock_gating' Fixes: cbead2bdfcf1 ("drm/amdgpu: add VCN2.5 VCPU start and stop") Signed-off-by: yu kuai <yukuai3@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-23	drm/amdgpu: allow direct upload save restore list for raven2	changzhu	1	-1/+3
	commit eebc7f4d7ffa09f2a620bd1e2c67ddd579118af9 upstream. It will cause modprobe atombios stuck problem in raven2 if it doesn't allow direct upload save restore list from gfx driver. So it needs to allow direct upload save restore list for raven2 temporarily. Signed-off-by: changzhu <Changfeng.Zhu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>