<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c, branch v5.13.5</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.13.5</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.13.5'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-07-19T08:04:30+00:00</updated>
<entry>
<title>drm/amdgpu: change the default timeout for kernel compute queues</title>
<updated>2021-07-19T08:04:30+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2021-05-04T15:00:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3eb50ab2c5ac3e25c4ea056491474baac78a4fb3'/>
<id>urn:sha1:3eb50ab2c5ac3e25c4ea056491474baac78a4fb3</id>
<content type='text'>
[ Upstream commit 67387dfe0f6630f2d4f412ce77debec23a49db7a ]

Change to 60s.  This matches what we already do in virtualization.
Infinite timeout can lead to deadlocks in the kernel.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amd/amdgpu/sriov disable all ip hw status by default</title>
<updated>2021-07-19T08:04:30+00:00</updated>
<author>
<name>Jack Zhang</name>
<email>Jack.Zhang1@amd.com</email>
</author>
<published>2021-04-27T09:08:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e684e0ee1153f40509926ead0da8007e8061d9ce'/>
<id>urn:sha1:e684e0ee1153f40509926ead0da8007e8061d9ce</id>
<content type='text'>
[ Upstream commit 95ea3dbc4e9548d35ab6fbf67675cef8c293e2f5 ]

Disable all ip's hw status to false before any hw_init.
Only set it to true until its hw_init is executed.

The old 5.9 branch has this change but somehow the 5.11 kernrel does
not have this fix.

Without this change, sriov tdr have gfx IB test fail.

Signed-off-by: Jack Zhang &lt;Jack.Zhang1@amd.com&gt;
Review-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add judgement for dc support</title>
<updated>2021-06-02T21:49:15+00:00</updated>
<author>
<name>Asher Song</name>
<email>Asher.Song@amd.com</email>
</author>
<published>2021-05-21T09:11:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=147feb007685cbb765b16a834d4f00675d589bb4'/>
<id>urn:sha1:147feb007685cbb765b16a834d4f00675d589bb4</id>
<content type='text'>
Drop DC initialization when DCN is harvested in VBIOS. The way
doesn't affect virtual display ip initialization.

Signed-off-by: Likun Gao  &lt;Likun.Gao@amd.com&gt;
Signed-off-by: Asher Song &lt;Asher.Song@amd.com&gt;
Reviewed-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd/amdgpu: fix a potential deadlock in gpu reset</title>
<updated>2021-05-19T22:07:23+00:00</updated>
<author>
<name>Lang Yu</name>
<email>Lang.Yu@amd.com</email>
</author>
<published>2021-05-17T04:47:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9c2876d56f1ce9b6b2072f1446fb1e8d1532cb3d'/>
<id>urn:sha1:9c2876d56f1ce9b6b2072f1446fb1e8d1532cb3d</id>
<content type='text'>
When amdgpu_ib_ring_tests failed, the reset logic called
amdgpu_device_ip_suspend twice, then deadlock occurred.
Deadlock log:

[  805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[  806.290952] [drm] free PSP TMR buffer

[  806.319406] ============================================
[  806.320315] WARNING: possible recursive locking detected
[  806.321225] 5.11.0-custom #1 Tainted: G        W  OEL
[  806.322135] --------------------------------------------
[  806.323043] cat/2593 is trying to acquire lock:
[  806.323825] ffff888136b1cdc8 (&amp;adev-&gt;dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.325668]
               but task is already holding lock:
[  806.326664] ffff888136b1cdc8 (&amp;adev-&gt;dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.328430]
               other info that might help us debug this:
[  806.329539]  Possible unsafe locking scenario:

[  806.330549]        CPU0
[  806.330983]        ----
[  806.331416]   lock(&amp;adev-&gt;dm.dc_lock);
[  806.332086]   lock(&amp;adev-&gt;dm.dc_lock);
[  806.332738]
                *** DEADLOCK ***

[  806.333747]  May be due to missing lock nesting notation

[  806.334899] 3 locks held by cat/2593:
[  806.335537]  #0: ffff888100d3f1b8 (&amp;attr-&gt;mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[  806.337009]  #1: ffff888136b1fd78 (&amp;adev-&gt;reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[  806.339018]  #2: ffff888136b1cdc8 (&amp;adev-&gt;dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.340869]
               stack backtrace:
[  806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G        W  OEL    5.11.0-custom #1
[  806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
[  806.344413] Call Trace:
[  806.344849]  dump_stack+0x93/0xbd
[  806.345435]  __lock_acquire.cold+0x18a/0x2cf
[  806.346179]  lock_acquire+0xca/0x390
[  806.346807]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.347813]  __mutex_lock+0x9b/0x930
[  806.348454]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.349434]  ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
[  806.350581]  ? _raw_spin_unlock_irqrestore+0x47/0x50
[  806.351437]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.352437]  ? rcu_read_lock_sched_held+0x4f/0x80
[  806.353252]  ? rcu_read_lock_sched_held+0x4f/0x80
[  806.354064]  mutex_lock_nested+0x1b/0x20
[  806.354747]  ? mutex_lock_nested+0x1b/0x20
[  806.355457]  dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.356427]  ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
[  806.357736]  amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
[  806.360394]  amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[  806.362926]  amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
[  806.365560]  amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]

Signed-off-by: Lang Yu &lt;Lang.Yu@amd.com&gt;
Acked-by: Christian KÃnig &lt;christian.koenig@amd.com&gt;
Reviewed-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add judgement when add ip blocks (v2)</title>
<updated>2021-05-13T14:46:58+00:00</updated>
<author>
<name>Likun GAO</name>
<email>Likun.Gao@amd.com</email>
</author>
<published>2021-04-29T06:08:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=83a0b8639185f40ab7fc9dd291a057150eb9d238'/>
<id>urn:sha1:83a0b8639185f40ab7fc9dd291a057150eb9d238</id>
<content type='text'>
Judgement whether to add an sw ip according to the harvest info.

v2: fix indentation (Alex)

Signed-off-by: Likun Gao &lt;Likun.Gao@amd.com&gt;
Reviewed-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Register VGA clients after init can no longer fail</title>
<updated>2021-04-29T04:03:21+00:00</updated>
<author>
<name>Kai-Heng Feng</name>
<email>kai.heng.feng@canonical.com</email>
</author>
<published>2021-04-26T10:50:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8c3dd61cfa05a65a7e1a8a028000fc95856156c4'/>
<id>urn:sha1:8c3dd61cfa05a65a7e1a8a028000fc95856156c4</id>
<content type='text'>
When an amdgpu device fails to init, it makes another VGA device cause
kernel splat:
kernel: amdgpu 0000:08:00.0: amdgpu: amdgpu_device_ip_init failed
kernel: amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init
kernel: amdgpu: probe of 0000:08:00.0 failed with error -110
...
kernel: amdgpu 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000018
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0000 [#1] SMP NOPTI
kernel: CPU: 6 PID: 1080 Comm: Xorg Tainted: G        W         5.12.0-rc8+ #12
kernel: Hardware name: HP HP EliteDesk 805 G6/872B, BIOS S09 Ver. 02.02.00 12/30/2020
kernel: RIP: 0010:amdgpu_device_vga_set_decode+0x13/0x30 [amdgpu]
kernel: Code: 06 31 c0 c3 b8 ea ff ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 8b 87 90 06 00 00 48 89 e5 53 89 f3 &lt;48&gt; 8b 40 18 40 0f b6 f6 e8 40 58 39 fd 80 fb 01 5b 5d 19 c0 83 e0
kernel: RSP: 0018:ffffae3c0246bd68 EFLAGS: 00010002
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: ffff8dd1af5a8560 RSI: 0000000000000000 RDI: ffff8dce8c160000
kernel: RBP: ffffae3c0246bd70 R08: ffff8dd1af5985c0 R09: ffffae3c0246ba38
kernel: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000246
kernel: R13: 0000000000000000 R14: 0000000000000003 R15: ffff8dce81490000
kernel: FS:  00007f9303d8fa40(0000) GS:ffff8dd1af580000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000018 CR3: 0000000103cfa000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel:  vga_arbiter_notify_clients.part.0+0x4a/0x80
kernel:  vga_get+0x17f/0x1c0
kernel:  vga_arb_write+0x121/0x6a0
kernel:  ? apparmor_file_permission+0x1c/0x20
kernel:  ? security_file_permission+0x30/0x180
kernel:  vfs_write+0xca/0x280
kernel:  ksys_write+0x67/0xe0
kernel:  __x64_sys_write+0x1a/0x20
kernel:  do_syscall_64+0x38/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f93041e02f7
kernel: Code: 75 05 48 83 c4 58 c3 e8 f7 33 ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 &lt;48&gt; 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
kernel: RSP: 002b:00007fff60e49b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f93041e02f7
kernel: RDX: 000000000000000b RSI: 00007fff60e49b40 RDI: 000000000000000f
kernel: RBP: 00007fff60e49b40 R08: 00000000ffffffff R09: 00007fff60e499d0
kernel: R10: 00007f93049350b5 R11: 0000000000000246 R12: 000056111d45e808
kernel: R13: 0000000000000000 R14: 000056111d45e7f8 R15: 000056111d46c980
kernel: Modules linked in: nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq input_leds snd_seq_device snd_timer snd soundcore joydev kvm_amd serio_raw k10temp mac_hid hp_wmi ccp kvm sparse_keymap wmi_bmof ucsi_acpi efi_pstore typec_ucsi rapl typec video wmi sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt crc32_pclmul fb_sys_fops ghash_clmulni_intel cec rc_core aesni_intel crypto_simd psmouse cryptd r8169 i2c_piix4 drm ahci xhci_pci realtek libahci xhci_pci_renesas gpio_amdpt gpio_generic
kernel: CR2: 0000000000000018
kernel: ---[ end trace 76d04313d4214c51 ]---

Commit 4192f7b57689 ("drm/amdgpu: unmap register bar on device init
failure") makes amdgpu_driver_unload_kms() skips amdgpu_device_fini(),
so the VGA clients remain registered. So when
vga_arbiter_notify_clients() iterates over registered clients, it causes
NULL pointer dereference.

Since there's no reason to register VGA clients that early, so solve
the issue by putting them after all the goto cleanups.

v2:
 - Remove redundant vga_switcheroo cleanup in failed: label.

Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure")
Signed-off-by: Kai-Heng Feng &lt;kai.heng.feng@canonical.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: split mmhub callbacks into ras and non-ras ones</title>
<updated>2021-04-09T20:51:19+00:00</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2021-03-19T07:50:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8bc7b360ad4b0a090380d7548dbf24a627f0b035'/>
<id>urn:sha1:8bc7b360ad4b0a090380d7548dbf24a627f0b035</id>
<content type='text'>
mmhub ras is only avaiable in cerntain mmhub ip
generation.

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Reviewed-by: John Clements &lt;John.Clements@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: indirect register access for nv12 sriov</title>
<updated>2021-04-09T20:50:17+00:00</updated>
<author>
<name>Peng Ju Zhou</name>
<email>PengJu.Zhou@amd.com</email>
</author>
<published>2021-03-22T07:18:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5e025531b773ee9789a9a9948fc7e74e6077ddd5'/>
<id>urn:sha1:5e025531b773ee9789a9a9948fc7e74e6077ddd5</id>
<content type='text'>
1. expand rlcg interface for gc &amp; mmhub indirect access
2. add rlcg interface for no kiq

v2: squash in fix for gfx9 (Changfeng)

Signed-off-by: Peng Ju Zhou &lt;PengJu.Zhou@amd.com&gt;
Reviewed-by: Emily.Deng &lt;Emily.Deng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: indirect register access for nv12 sriov</title>
<updated>2021-04-09T20:50:09+00:00</updated>
<author>
<name>Peng Ju Zhou</name>
<email>PengJu.Zhou@amd.com</email>
</author>
<published>2021-03-29T07:47:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=77eabc6f5975dafeb76f7c7c2451282b91e9f5b6'/>
<id>urn:sha1:77eabc6f5975dafeb76f7c7c2451282b91e9f5b6</id>
<content type='text'>
get pf2vf msg info at it's earliest time so that
guest driver can use these info to decide whether
register indirect access enabled.

Signed-off-by: Peng Ju Zhou &lt;PengJu.Zhou@amd.com&gt;
Reviewed-by: Emily.Deng &lt;Emily.Deng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Reset error code for 'no handler' case</title>
<updated>2021-04-09T20:47:37+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2021-03-26T09:47:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=404b277bbe4945830e5ebc01a93ff9fe8403702f'/>
<id>urn:sha1:404b277bbe4945830e5ebc01a93ff9fe8403702f</id>
<content type='text'>
If reset handler is not implemented, reset error before proceeding.

Fixes issue with the following trace -
[  106.508592] amdgpu 0000:b1:00.0: amdgpu: ASIC reset failed with error, -38 for drm dev, 0000:b1:00.0
[  106.508972] amdgpu 0000:b1:00.0: amdgpu: GPU reset succeeded, trying to resume
[  106.509116] [drm] PCIE GART of 512M enabled.
[  106.509120] [drm] PTB located at 0x0000008000000000
[  106.509136] [drm] VRAM is lost due to GPU reset!
[  106.509332] [drm] PSP is resuming...

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-and-tested-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
