summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2020-06-22drm/amdgpu: fix and cleanup amdgpu_gem_object_close v4Christian König1-18/+25
[ Upstream commit 82c416b13cb7d22b96ec0888b296a48dff8a09eb ] The problem is that we can't add the clear fence to the BO when there is an exclusive fence on it since we can't guarantee the the clear fence will complete after the exclusive one. To fix this refactor the function and also add the exclusive fence as shared to the resv object. v2: fix warning v3: add excl fence as shared instead v4: squash in fix for fence handling in amdgpu_gem_object_close Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-03drm/amdgpu: Use GEM obj reference for KFD BOsFelix Kuehling1-2/+3
[ Upstream commit 39b3128d7ffd44e400e581e6f49e88cb42bef9a1 ] Releasing the AMDGPU BO ref directly leads to problems when BOs were exported as DMA bufs. Releasing the GEM reference makes sure that the AMDGPU/TTM BO is not freed too early. Also take a GEM reference when importing BOs from DMABufs to keep references to imported BOs balances properly. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-03drm/amdgpu: drop unnecessary cancel_delayed_work_sync on PG ungateEvan Quan2-14/+4
[ Upstream commit 1fe48ec08d9f2e26d893a6c05bd6c99a3490f9ef ] As this is already properly handled in amdgpu_gfx_off_ctrl(). In fact, this unnecessary cancel_delayed_work_sync may leave a small time window for race condition and is dangerous. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20drm/amdgpu: force fbdev into vramAlex Deucher1-2/+1
[ Upstream commit a6aacb2b26e85aa619cf0c6f98d0ca77314cd2a1 ] We set the fb smem pointer to the offset into the BAR, so keep the fbdev bo in vram. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207581 Fixes: 6c8d74caa2fa33 ("drm/amdgpu: Enable scatter gather display support") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20drm/amdgpu: invalidate L2 before SDMA IBs (v2)Marek Olšák2-1/+29
[ Upstream commit fdf83646c0542ecfb9adc4db8f741a1f43dca058 ] This fixes GPU hangs due to cache coherency issues. v2: Split the version bump to a separate patch Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20drm/amdgpu: simplify padding calculations (v2)Luben Tuikov5-13/+20
[ Upstream commit ce73516d42c9ab011fad498168b892d28e181db4 ] Simplify padding calculations. v2: Comment update and spacing. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-14drm/amdgpu: drop redundant cg/pg ungate on runpm enterEvan Quan1-3/+0
[ Upstream commit f7b52890daba570bc8162d43c96b5583bbdd4edd ] CG/PG ungate is already performed in ip_suspend_phase1. Otherwise, the CG/PG ungate will be performed twice. That will cause gfxoff disablement is performed twice also on runpm enter while gfxoff enablemnt once on rump exit. That will put gfxoff into disabled state. Fixes: b2a7e9735ab286 ("drm/amdgpu: fix the hw hang during perform system reboot and reset") Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-14drm/amdgpu: move kfd suspend after ip_suspend_phase1Evan Quan1-2/+2
[ Upstream commit c457a273e118bb96e1db8d1825f313e6cafe4258 ] This sequence change should be safe as what did in ip_suspend_phase1 is to suspend DCE only. And this is a prerequisite for coming redundant cg/pg ungate dropping. Fixes: 487eca11a321ef ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)") Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-10drm/amdgpu: Fix oops when pp_funcs is unset in ACPI eventAaron Ma1-1/+2
commit 5932d260a8d85a103bd6c504fbb85ff58b156bf9 upstream. On ARCTURUS and RENOIR, powerplay is not supported yet. When plug in or unplug power jack, ACPI event will issue. Then kernel NULL pointer BUG will be triggered. Check for NULL pointers before calling. Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-21drm/amdgpu: fix the hw hang during perform system reboot and resetPrike Liang1-0/+2
commit b2a7e9735ab2864330be9d00d7f38c961c28de5d upstream. The system reboot failed as some IP blocks enter power gate before perform hw resource destory. Meanwhile use unify interface to set device CGPG to ungate state can simplify the amdgpu poweroff or reset ungate guard. Fixes: 487eca11a321ef ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)") Signed-off-by: Prike Liang <Prike.Liang@amd.com> Tested-by: Mengbing Wang <Mengbing.Wang@amd.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17drm/amdgpu: fix gfx hang during suspend with video playback (v2)Prike Liang1-2/+3
[ Upstream commit 487eca11a321ef33bcf4ca5adb3c0c4954db1b58 ] The system will be hang up during S3 suspend because of SMU is pending for GC not respose the register CP_HQD_ACTIVE access request.This issue root cause of accessing the GC register under enter GFX CGGPG and can be fixed by disable GFX CGPG before perform suspend. v2: Use disable the GFX CGPG instead of RLC safe mode guard. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Tested-by: Mengbing Wang <Mengbing.Wang@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17drm/amdgpu: unify fw_write_wait for new gfx9 asicsAaron Liu1-0/+2
commit 2960758cce2310774de60bbbd8d6841d436c54d9 upstream. Make the fw_write_wait default case true since presumably all new gfx9 asics will have updated firmware. That is using unique WAIT_REG_MEM packet with opration=1. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Yuxian Dai <Yuxian.Dai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-08drm/amdgpu: fix typo for vcn1 idle checkJames Zhu1-1/+1
[ Upstream commit acfc62dc68770aa665cc606891f6df7d6d1e52c0 ] fix typo for vcn1 idle check Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-01drm/amdgpu: correct ROM_INDEX/DATA offset for VEGA20Hawking Zhang1-2/+23
[ Upstream commit f1c2cd3f8fb959123a9beba18c0e8112dcb2e137 ] The ROMC_INDEX/DATA offset was changed to e4/e5 since from smuio_v11 (vega20/arcturus). Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Tested-by: Candice Li <Candice.Li@amd.com> Reviewed-by: Candice Li <Candice.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-25drm/amd/amdgpu: Fix GPR read from debugfs (v2)Tom St Denis1-3/+3
commit 5bbc6604a62814511c32f2e39bc9ffb2c1b92cbe upstream. The offset into the array was specified in bytes but should be in terms of 32-bit words. Also prevent large reads that would also cause a buffer overread. v2: Read from correct offset from internal storage buffer. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25drm/amdgpu: clean wptr on wb when gpu recoveryYintian Tao2-0/+2
[ Upstream commit 2ab7e274b86739f4ceed5d94b6879f2d07b2802f ] The TDR will be randomly failed due to compute ring test failure. If the compute ring wptr & 0x7ff(ring_buf_mask) is 0x100 then after map mqd the compute ring rptr will be synced with 0x100. And the ring test packet size is also 0x100. Then after invocation of amdgpu_ring_commit, the cp will not really handle the packet on the ring buffer because rptr is equal to wptr. Signed-off-by: Yintian Tao <yttao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-21drm/amdgpu: Fix TLB invalidation request when using semaphoreFelix Kuehling2-6/+7
[ Upstream commit 37c58ddf57364d1a636850bb8ba6acbe1e16195e ] Use a more meaningful variable name for the invalidation request that is distinct from the tmp variable that gets overwritten when acquiring the invalidation semaphore. Fixes: 4ed8a03740d0 ("drm/amdgpu: invalidate mmhub semaphore workaround in gmc9/gmc10") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Yong Zhao <Yong.Zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-18drm/amd/display: remove duplicated assignment to grph_obj_typeColin Ian King1-2/+1
commit d785476c608c621b345dd9396e8b21e90375cb0e upstream. Variable grph_obj_type is being assigned twice, one of these is redundant so remove it. Addresses-Coverity: ("Evaluation order violation") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: <nobuhiro1.iwamatsu@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05amdgpu/gmc_v9: save/restore sdpif regs during S3Shirish S2-1/+37
commit a3ed353cf8015ba84a0407a5dc3ffee038166ab0 upstream. fixes S3 issue with IOMMU + S/G enabled @ 64M VRAM. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Shirish S <shirish.s@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-05drm/amdgpu: Drop DRIVER_USE_AGPDaniel Vetter1-1/+1
commit 8a3bddf67ce88b96531fb22c5a75d7f4dc41d155 upstream. This doesn't do anything except auto-init drm_agp support when you call drm_get_pci_dev(). Which amdgpu stopped doing with commit b58c11314a1706bf094c489ef5cb28f76478c704 Author: Alex Deucher <alexander.deucher@amd.com> Date: Fri Jun 2 17:16:31 2017 -0400 drm/amdgpu: drop deprecated drm_get_pci_dev and drm_put_dev No idea whether this was intentional or accidental breakage, but I guess anyone who manages to boot a this modern gpu behind an agp bridge deserves a price. A price I never expect anyone to ever collect :-) Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Xiaojie Yuan <xiaojie.yuan@amd.com> Cc: Evan Quan <evan.quan@amd.com> Cc: "Tianci.Yin" <tianci.yin@amd.com> Cc: "Marek Olšák" <marek.olsak@amd.com> Cc: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28drm/amdgpu/gfx10: disable gfxoff when reading rlc clockAlex Deucher1-0/+2
commit b08c3ed609aabc4e76e74edc4404f0c26279d7ed upstream. Otherwise we readback all ones. Fixes rlc counter readback while gfxoff is active. Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28drm/amdgpu/gfx9: disable gfxoff when reading rlc clockAlex Deucher1-0/+2
commit 120cf959308e1bda984e40a9edd25ee2d6262efd upstream. Otherwise we readback all ones. Fixes rlc counter readback while gfxoff is active. Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28drm/amdgpu/soc15: fix xclk for ravenAlex Deucher1-1/+6
commit c657b936ea98630ef5ba4f130ab1ad5c534d0165 upstream. It's 25 Mhz (refclk / 4). This fixes the interpretation of the rlc clock counter. Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-24drm/amdgpu: fix KIQ ring test fail in TDR of SRIOVMonk Liu1-2/+0
[ Upstream commit 5a7489a7e189ee2be889485f90c8cf24ea4b9a40 ] issues: MEC is ruined by the amdkfd_pre_reset after VF FLR done fix: amdkfd_pre_reset() would ruin MEC after hypervisor finished the VF FLR, the correct sequence is do amdkfd_pre_reset before VF FLR but there is a limitation to block this sequence: if we do pre_reset() before VF FLR, it would go KIQ way to do register access and stuck there, because KIQ probably won't work by that time (e.g. you already made GFX hang) so the best way right now is to simply remove it. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24drm/amdgpu: Ensure ret is always initialized when using SOC15_WAIT_ON_RREGNathan Chancellor1-0/+1
[ Upstream commit a63141e31764f8daf3f29e8e2d450dcf9199d1c8 ] Commit b0f3cd3191cd ("drm/amdgpu: remove unnecessary JPEG2.0 code from VCN2.0") introduced a new clang warning in the vcn_v2_0_stop function: ../drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1082:2: warning: variable 'r' is used uninitialized whenever 'while' loop exits because its condition is false [-Wsometimes-uninitialized] SOC15_WAIT_ON_RREG(VCN, 0, mmUVD_STATUS, UVD_STATUS__IDLE, 0x7, r); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/gpu/drm/amd/amdgpu/../amdgpu/soc15_common.h:55:10: note: expanded from macro 'SOC15_WAIT_ON_RREG' while ((tmp_ & (mask)) != (expected_value)) { \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1083:6: note: uninitialized use occurs here if (r) ^ ../drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1082:2: note: remove the condition if it is always true SOC15_WAIT_ON_RREG(VCN, 0, mmUVD_STATUS, UVD_STATUS__IDLE, 0x7, r); ^ ../drivers/gpu/drm/amd/amdgpu/../amdgpu/soc15_common.h:55:10: note: expanded from macro 'SOC15_WAIT_ON_RREG' while ((tmp_ & (mask)) != (expected_value)) { \ ^ ../drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1072:7: note: initialize the variable 'r' to silence this warning int r; ^ = 0 1 warning generated. To prevent warnings like this from happening in the future, make the SOC15_WAIT_ON_RREG macro initialize its ret variable before the while loop that can time out. This macro's return value is always checked so it should set ret in both the success and fail path. Link: https://github.com/ClangBuiltLinux/linux/issues/776 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24drm/amdgpu: remove 4 set but not used variable in ↵yu kuai1-17/+2
amdgpu_atombios_get_connector_info_from_object_table [ Upstream commit bae028e3e521e8cb8caf2cc16a455ce4c55f2332 ] Fixes gcc '-Wunused-but-set-variable' warning: drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c: In function 'amdgpu_atombios_get_connector_info_from_object_table': drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c:376:26: warning: variable 'grph_obj_num' set but not used [-Wunused-but-set-variable] drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c:376:13: warning: variable 'grph_obj_id' set but not used [-Wunused-but-set-variable] drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c:341:37: warning: variable 'con_obj_type' set but not used [-Wunused-but-set-variable] drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c:341:24: warning: variable 'con_obj_num' set but not used [-Wunused-but-set-variable] They are never used, so can be removed. Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)") Signed-off-by: yu kuai <yukuai3@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24drm/amdgpu/sriov: workaround on rev_id for Navi12 under sriovTiecheng Zhou1-0/+6
[ Upstream commit df5e984c8bd414561c320d6cbbb66d53abf4c7e2 ] guest vm gets 0xffffffff when reading RCC_DEV0_EPF0_STRAP0, as a consequence, the rev_id and external_rev_id are wrong. workaround it by hardcoding the rev_id to 0, which is the default value. v2. add comment in the code Signed-off-by: Tiecheng Zhou <Tiecheng.Zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-01drm/amdgpu/SRIOV: add navi12 pci id for SRIOV (v2)Jiange Zhao1-0/+1
[ Upstream commit 57d4f3b7fd65b56f98b62817f27c461142c0bc2a ] Add Navi12 PCI id support. v2: flag as experimental for now (Alex) Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-26drm/amdgpu: remove excess function parameter descriptionyu kuai1-2/+0
[ Upstream commit d0580c09c65cff211f589a40e08eabc62da463fb ] Fixes gcc warning: drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:431: warning: Excess function parameter 'sw' description in 'vcn_v2_5_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:550: warning: Excess function parameter 'sw' description in 'vcn_v2_5_enable_clock_gating' Fixes: cbead2bdfcf1 ("drm/amdgpu: add VCN2.5 VCPU start and stop") Signed-off-by: yu kuai <yukuai3@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-23drm/amdgpu: allow direct upload save restore list for raven2changzhu1-1/+3
commit eebc7f4d7ffa09f2a620bd1e2c67ddd579118af9 upstream. It will cause modprobe atombios stuck problem in raven2 if it doesn't allow direct upload save restore list from gfx driver. So it needs to allow direct upload save restore list for raven2 temporarily. Signed-off-by: changzhu <Changfeng.Zhu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-17drm/amdgpu: enable gfxoff for raven1 refreshchangzhu1-11/+4
[ Upstream commit e0c63812352298efbce2a71483c1dab627d0c288 ] When smu version is larger than 0x41e2b, it will load raven_kicker_rlc.bin.To enable gfxoff for raven_kicker_rlc.bin,it needs to avoid adev->pm.pp_feature &= ~PP_GFXOFF_MASK when it loads raven_kicker_rlc.bin. Signed-off-by: changzhu <Changfeng.Zhu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-17drm/amdgpu/discovery: reserve discovery data at the top of VRAMXiaojie Yuan4-2/+22
commit 5f6a556f98de425fcb7928456839a06f02156633 upstream. IP Discovery data is TMR fenced by the latest PSP BL, so we need to reserve this region. Tested on navi10/12/14 with VBIOS integrated with latest PSP BL. v2: use DISCOVERY_TMR_SIZE macro as bo size use amdgpu_bo_create_kernel_at() to allocate bo Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-17drm/amdgpu: cleanup creating BOs at fixed location (v2)Christian König4-148/+83
commit de7b45babd9be25138ff5e4a0c34eefffbb226ff upstream. The placement is something TTM/BO internal and the RAS code should avoid touching that directly. Add a helper to create a BO at a fixed location and use that instead. v2: squash in fixes (Alex) Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14Revert "drm/amdgpu: Set no-retry as default."Alex Deucher1-2/+2
commit 7aec9ec1cf324d5c5a8d17b9c78a34c388e5f17b upstream. This reverts commit 51bfac71cade386966791a8db87a5912781d249f. This causes stability issues on some raven boards. Revert for now until a proper fix is completed. Bug: https://gitlab.freedesktop.org/drm/amd/issues/934 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206017 Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09drm/amdgpu: add cache flush workaround to gfx8 emit_fencePierre-Eric Pelloux-Prayer1-3/+19
[ Upstream commit bf26da927a1cd57c9deb2db29ae8cf276ba8b17b ] The same workaround is used for gfx7. Both PAL and Mesa use it for gfx8 too, so port this commit to gfx_v8_0_ring_emit_fence_gfx. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-09drm/amdgpu: add check before enabling/disabling broadcast modeGuchun Chen1-16/+22
[ Upstream commit 6e807535dae5dbbd53bcc5e81047a20bf5eb08ea ] When security violation from new vbios happens, data fabric is risky to stop working. So prevent the direct access to DF mmFabricConfigAccessControl from the new vbios and onwards. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04drm/amdgpu: Call find_vma under mmap_semJason Gunthorpe1-16/+21
[ Upstream commit a9ae8731e6e52829a935d81a65d7f925cb95dbac ] find_vma() must be called under the mmap_sem, reorganize this code to do the vma check after entering the lock. Further, fix the unlocked use of struct task_struct's mm, instead use the mm from hmm_mirror which has an active mm_grab. Also the mm_grab must be converted to a mm_get before acquiring mmap_sem or calling find_vma(). Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers") Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads") Link: https://lore.kernel.org/r/20191112202231.3856-11-jgg@ziepe.ca Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31drm/amdgpu: fix uninitialized variable pasid_mapping_neededColin Ian King1-1/+1
[ Upstream commit 17cf678a33c6196a3df4531fe5aec91384c9eeb5 ] The boolean variable pasid_mapping_needed is not initialized and there are code paths that do not assign it any value before it is is read later. Fix this by initializing pasid_mapping_needed to false. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: 6817bf283b2b ("drm/amdgpu: grab the id mgr lock while accessing passid_mapping") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31drm/amdgpu: fix bad DMA from INTERRUPT_CNTL2Sam Bobroff1-1/+2
[ Upstream commit 3d0e3ce52ce3eb4b9de3caf9c38dbb5a4d3e13c3 ] The INTERRUPT_CNTL2 register expects a valid DMA address, but is currently set with a GPU MC address. This can cause problems on systems that detect the resulting DMA read from an invalid address (found on a Power8 guest). Instead, use the DMA address of the dummy page because it will always be safe. Fixes: 27ae10641e9c ("drm/amdgpu: add interupt handler implementation for si v3") Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31drm/amdgpu: Avoid accidental thread reactivation.Andrey Grodzovsky1-0/+10
[ Upstream commit a28fda312a9fabdf0e5f5652449d6197c9fb0a90 ] Problem: During GPU reset we call the GPU scheduler to suspend it's thread, those two functions in amdgpu also suspend and resume the sceduler for their needs but this can collide with GPU reset in progress and accidently restart a suspended thread before time. Fix: Serialize with GPU reset. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31drm/amdgpu: fix potential double drop fence referencePan Bian1-0/+2
[ Upstream commit 946ab8db6953535a3a88c957db8328beacdfed9d ] The object fence is not set to NULL after its reference is dropped. As a result, its reference may be dropped again if error occurs after that, which may lead to a use after free bug. To avoid the issue, fence is explicitly set to NULL after dropping its reference. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31drm/amdgpu: disallow direct upload save restore list from gfx driverHawking Zhang1-1/+2
[ Upstream commit 58f46d4b65021083ef4b4d49c6e2c58e5783f626 ] Direct uploading save/restore list via mmio register writes breaks the security policy. Instead, the driver should pass s&r list to psp. For all the ASICs that use rlc v2_1 headers, the driver actually upload s&r list twice, in non-psp ucode front door loading phase and gfx pg initialization phase. The latter is not allowed. VG12 is the only exception where the driver still keeps legacy approach for S&R list uploading. In theory, this can be elimnated if we have valid srcntl ucode for VG12. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Candice Li <Candice.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31drm/amdgpu: fix amdgpu trace event print string format errorKevin Wang1-9/+9
[ Upstream commit 2c2fdb8bca290c439e383cfb6857b0c65e528964 ] the trace event print string format error. (use integer type to handle string) before: amdgpu_test_kev-1556 [002] 138.508781: amdgpu_cs_ioctl: sched_job=8, timeline=gfx_0.0.0, context=177, seqno=1, ring_name=ffff94d01c207bf0, num_ibs=2 after: amdgpu_test_kev-1506 [004] 370.703783: amdgpu_cs_ioctl: sched_job=12, timeline=gfx_0.0.0, context=234, seqno=2, ring_name=gfx_0.0.0, num_ibs=1 change trace event list: 1.amdgpu_cs_ioctl 2.amdgpu_sched_run_job 3.amdgpu_ib_pipe_sync Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31drm/amdgpu: grab the id mgr lock while accessing passid_mappingChristian König1-3/+9
[ Upstream commit 6817bf283b2b851095825ec7f0e9f10398e09125 ] Need to make sure that we actually dropping the right fence. Could be done with RCU as well, but to complicated for a fix. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-31drm/amdgpu/sriov: add ring_stop before ring_create in psp v11 codeJack Zhang1-27/+34
[ Upstream commit 51c0f58e9f6af3a387d14608033e6796a7ad90ee ] psp v11 code missed ring stop in ring create function(VMR) while psp v3.1 code had the code. This will cause VM destroy1 fail and psp ring create fail. For SIOV-VF, ring_stop should not be deleted in ring_create function. Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-21drm/amdgpu: add invalidate semaphore limit for SRIOV and picasso in gmc9changzhu1-20/+24
commit 90f6452ca58d436de4f69b423ecd75a109aa9766 upstream. It may fail to load guest driver in round 2 or cause Xstart problem when using invalidate semaphore for SRIOV or picasso. So it needs avoid using invalidate semaphore for SRIOV and picasso. Signed-off-by: changzhu <Changfeng.Zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21drm/amdgpu: avoid using invalidate semaphore for picassochangzhu1-8/+20
commit 413fc385a594ea6eb08843be33939057ddfdae76 upstream. It may cause timeout waiting for sem acquire in VM flush when using invalidate semaphore for picasso. So it needs to avoid using invalidate semaphore for piasso. Signed-off-by: changzhu <Changfeng.Zhu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21drm/amdgpu/gfx10: re-init clear state buffer after gpu resetXiaojie Yuan1-6/+37
commit 210b3b3c7563df391bd81d49c51af303b928de4a upstream. This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x. clear state buffer (resides in vram) is corrupted after 1st baco reset, upon gfxoff exit, CPF gets garbage header in CSIB and hangs. Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21drm/amdgpu/gfx10: explicitly wait for cp idle after halt/unhaltXiaojie Yuan1-2/+12
commit 1e902a6d32d73e4a6b3bc9d7cd43d4ee2b242dea upstream. 50us is not enough to wait for cp ready after gpu reset on some navi asics. Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Suggested-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21drm/amdgpu: invalidate mmhub semaphore workaround in gmc9/gmc10changzhu3-2/+116
commit f920d1bb9c4e77efb08c41d70b6d442f46fd8902 upstream. It may lose gpuvm invalidate acknowldege state across power-gating off cycle. To avoid this issue in gmc9/gmc10 invalidation, add semaphore acquire before invalidation and semaphore release after invalidation. After adding semaphore acquire before invalidation, the semaphore register become read-only if another process try to acquire semaphore. Then it will not be able to release this semaphore. Then it may cause deadlock problem. If this deadlock problem happens, it needs a semaphore firmware fix. Signed-off-by: changzhu <Changfeng.Zhu@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>