<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c, branch v6.6.40</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.40</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.40'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-04-03T13:28:57+00:00</updated>
<entry>
<title>drm/amdgpu: fix deadlock while reading mqd from debugfs</title>
<updated>2024-04-03T13:28:57+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2024-03-07T22:07:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=197f6d6987c55860f6eea1c93e4f800c59078874'/>
<id>urn:sha1:197f6d6987c55860f6eea1c93e4f800c59078874</id>
<content type='text'>
commit 8678b1060ae2b75feb60b87e5b75e17374e3c1c5 upstream.

An errant disk backup on my desktop got into debugfs and triggered the
following deadlock scenario in the amdgpu debugfs files. The machine
also hard-resets immediately after those lines are printed (although I
wasn't able to reproduce that part when reading by hand):

[ 1318.016074][ T1082] ======================================================
[ 1318.016607][ T1082] WARNING: possible circular locking dependency detected
[ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted
[ 1318.017598][ T1082] ------------------------------------------------------
[ 1318.018096][ T1082] tar/1082 is trying to acquire lock:
[ 1318.018585][ T1082] ffff98c44175d6a0 (&amp;mm-&gt;mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80
[ 1318.019084][ T1082]
[ 1318.019084][ T1082] but task is already holding lock:
[ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.020607][ T1082]
[ 1318.020607][ T1082] which lock already depends on the new lock.
[ 1318.020607][ T1082]
[ 1318.022081][ T1082]
[ 1318.022081][ T1082] the existing dependency chain (in reverse order) is:
[ 1318.023083][ T1082]
[ 1318.023083][ T1082] -&gt; #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 1318.024114][ T1082]        __ww_mutex_lock.constprop.0+0xe0/0x12f0
[ 1318.024639][ T1082]        ww_mutex_lock+0x32/0x90
[ 1318.025161][ T1082]        dma_resv_lockdep+0x18a/0x330
[ 1318.025683][ T1082]        do_one_initcall+0x6a/0x350
[ 1318.026210][ T1082]        kernel_init_freeable+0x1a3/0x310
[ 1318.026728][ T1082]        kernel_init+0x15/0x1a0
[ 1318.027242][ T1082]        ret_from_fork+0x2c/0x40
[ 1318.027759][ T1082]        ret_from_fork_asm+0x11/0x20
[ 1318.028281][ T1082]
[ 1318.028281][ T1082] -&gt; #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
[ 1318.029297][ T1082]        dma_resv_lockdep+0x16c/0x330
[ 1318.029790][ T1082]        do_one_initcall+0x6a/0x350
[ 1318.030263][ T1082]        kernel_init_freeable+0x1a3/0x310
[ 1318.030722][ T1082]        kernel_init+0x15/0x1a0
[ 1318.031168][ T1082]        ret_from_fork+0x2c/0x40
[ 1318.031598][ T1082]        ret_from_fork_asm+0x11/0x20
[ 1318.032011][ T1082]
[ 1318.032011][ T1082] -&gt; #0 (&amp;mm-&gt;mmap_lock){++++}-{3:3}:
[ 1318.032778][ T1082]        __lock_acquire+0x14bf/0x2680
[ 1318.033141][ T1082]        lock_acquire+0xcd/0x2c0
[ 1318.033487][ T1082]        __might_fault+0x58/0x80
[ 1318.033814][ T1082]        amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu]
[ 1318.034181][ T1082]        full_proxy_read+0x55/0x80
[ 1318.034487][ T1082]        vfs_read+0xa7/0x360
[ 1318.034788][ T1082]        ksys_read+0x70/0xf0
[ 1318.035085][ T1082]        do_syscall_64+0x94/0x180
[ 1318.035375][ T1082]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.035664][ T1082]
[ 1318.035664][ T1082] other info that might help us debug this:
[ 1318.035664][ T1082]
[ 1318.036487][ T1082] Chain exists of:
[ 1318.036487][ T1082]   &amp;mm-&gt;mmap_lock --&gt; reservation_ww_class_acquire --&gt; reservation_ww_class_mutex
[ 1318.036487][ T1082]
[ 1318.037310][ T1082]  Possible unsafe locking scenario:
[ 1318.037310][ T1082]
[ 1318.037838][ T1082]        CPU0                    CPU1
[ 1318.038101][ T1082]        ----                    ----
[ 1318.038350][ T1082]   lock(reservation_ww_class_mutex);
[ 1318.038590][ T1082]                                lock(reservation_ww_class_acquire);
[ 1318.038839][ T1082]                                lock(reservation_ww_class_mutex);
[ 1318.039083][ T1082]   rlock(&amp;mm-&gt;mmap_lock);
[ 1318.039328][ T1082]
[ 1318.039328][ T1082]  *** DEADLOCK ***
[ 1318.039328][ T1082]
[ 1318.040029][ T1082] 1 lock held by tar/1082:
[ 1318.040259][ T1082]  #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.040560][ T1082]
[ 1318.040560][ T1082] stack backtrace:
[ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2
[ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023
[ 1318.041614][ T1082] Call Trace:
[ 1318.041895][ T1082]  &lt;TASK&gt;
[ 1318.042175][ T1082]  dump_stack_lvl+0x4a/0x80
[ 1318.042460][ T1082]  check_noncircular+0x145/0x160
[ 1318.042743][ T1082]  __lock_acquire+0x14bf/0x2680
[ 1318.043022][ T1082]  lock_acquire+0xcd/0x2c0
[ 1318.043301][ T1082]  ? __might_fault+0x40/0x80
[ 1318.043580][ T1082]  ? __might_fault+0x40/0x80
[ 1318.043856][ T1082]  __might_fault+0x58/0x80
[ 1318.044131][ T1082]  ? __might_fault+0x40/0x80
[ 1318.044408][ T1082]  amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab]
[ 1318.044749][ T1082]  full_proxy_read+0x55/0x80
[ 1318.045042][ T1082]  vfs_read+0xa7/0x360
[ 1318.045333][ T1082]  ksys_read+0x70/0xf0
[ 1318.045623][ T1082]  do_syscall_64+0x94/0x180
[ 1318.045913][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.046201][ T1082]  ? lockdep_hardirqs_on+0x7d/0x100
[ 1318.046487][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.046773][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.047057][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.047337][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.047611][ T1082]  entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d
[ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 &lt;48&gt; 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
[ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d
[ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008
[ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007
[ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00
[ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800
[ 1318.050638][ T1082]  &lt;/TASK&gt;

amdgpu_debugfs_mqd_read() holds a reservation when it calls
put_user(), which may fault and acquire the mmap_sem. This violates
the established locking order.

Bounce the mqd data through a kernel buffer to get put_user() out of
the illegal section.

Fixes: 445d85e3c1df ("drm/amdgpu: add debugfs interface for reading MQDs")
Cc: stable@vger.kernel.org # v6.5+
Reviewed-by: Shashank Sharma &lt;shashank.sharma@amd.com&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: mark soft recovered fences with -ENODATA</title>
<updated>2023-06-15T15:37:55+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2023-04-17T11:04:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=89fae8dc41d0a9bfc9fc1ea7ec03bf36e680774d'/>
<id>urn:sha1:89fae8dc41d0a9bfc9fc1ea7ec03bf36e680774d</id>
<content type='text'>
Set the fence error code before trying to soft-recover it.

It gets overwritten when a hard recovery is required.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add amdgpu_error_* debugfs file</title>
<updated>2023-06-15T15:37:54+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2023-04-19T10:51:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b13eb02ba8ba7617d41212121891756da31f1d8b'/>
<id>urn:sha1:b13eb02ba8ba7617d41212121891756da31f1d8b</id>
<content type='text'>
This allows us to insert some error codes into the bottom of the pipeline
on an engine.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Modify indirect buffer packages for resubmission</title>
<updated>2023-06-09T16:44:15+00:00</updated>
<author>
<name>Jiadong Zhu</name>
<email>Jiadong.Zhu@amd.com</email>
</author>
<published>2023-05-25T08:52:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8ff865be93e642d0ad66ca7369f42fbe36dc6a90'/>
<id>urn:sha1:8ff865be93e642d0ad66ca7369f42fbe36dc6a90</id>
<content type='text'>
When the preempted IB frame resubmitted to cp, we need to modify the frame
data including:
1. set PRE_RESUME 1 in CONTEXT_CONTROL.
2. use meta data(DE and CE) read from CSA in WRITE_DATA.

Add functions to save the location the first time IBs emitted and callback
to patch the package when resubmission happens.

Signed-off-by: Jiadong Zhu &lt;Jiadong.Zhu@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd/amdgpu: Fix warnings in amdgpu _object, _ring.c</title>
<updated>2023-06-09T13:38:20+00:00</updated>
<author>
<name>Srinivasan Shanmugam</name>
<email>srinivasan.shanmugam@amd.com</email>
</author>
<published>2023-05-09T13:38:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1d6ecab1ac0fdff8e62ff3ba506b606177010d08'/>
<id>urn:sha1:1d6ecab1ac0fdff8e62ff3ba506b606177010d08</id>
<content type='text'>
Fix below warnings reported by checkpatch:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: static const char * array should probably be static const char * const
WARNING: space prohibited between function name and open parenthesis '('
WARNING: braces {} are not necessary for single statement blocks
WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'.

Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add debugfs interface for reading MQDs</title>
<updated>2023-04-24T22:36:45+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2023-03-21T17:59:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=445d85e3c1dfd8c45b24be6f1527f1e117256d0e'/>
<id>urn:sha1:445d85e3c1dfd8c45b24be6f1527f1e117256d0e</id>
<content type='text'>
Provide a debugfs interface to access the MQD.  Useful for
debugging issues with the CP and MES hardware scheduler.

v2: fix missing unreserve/unmap when pos &gt;= size (Alex)

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix memory leak in mes self test</title>
<updated>2023-04-24T22:16:02+00:00</updated>
<author>
<name>Jack Xiao</name>
<email>Jack.Xiao@amd.com</email>
</author>
<published>2023-04-21T06:20:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=31d7c3a4fc3d312a0646990767647925d5bde540'/>
<id>urn:sha1:31d7c3a4fc3d312a0646990767647925d5bde540</id>
<content type='text'>
The fences associated with mes queue have to be freed
up during amdgpu_ring_fini.

Signed-off-by: Jack Xiao &lt;Jack.Xiao@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add a max ibs per submission limit.</title>
<updated>2023-04-18T20:28:54+00:00</updated>
<author>
<name>Bas Nieuwenhuizen</name>
<email>bas@basnieuwenhuizen.nl</email>
</author>
<published>2023-04-13T14:22:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c30ddcece3a0a86853862a7d92678a79525ca1fb'/>
<id>urn:sha1:c30ddcece3a0a86853862a7d92678a79525ca1fb</id>
<content type='text'>
And ensure each ring supports that many submissions. This makes
sure that we don't get surprises after the submission has been
scheduled where the ring allocation actually gets rejected.

My calculations on the existing limits:
COMPUTE v10: 128
COMPUTE v11: 128
COMPUTE v6: 157
COMPUTE v7: 133
COMPUTE v8: 130
COMPUTE v9: 125
GFX v10: 208
GFX v11: 213
GFX v6: 154 (doubling this in the previous patch)
GFX v7: 226
GFX v8: 213
GFX v9: 208
GFX v9 (SW): 208
SDMA CIK: 87
SDMA SI: 97
SDMA v2.4: 74
SDMA v3.0: 74
SDMA v4.0: 72
SDMA v5.0: 51
SDMA v6.0: 52
UVD ENC v6.0: 98
UVD ENC v7.0: 92
UVD v3.1: 124
UVD v4.2: 124
UVD v5.0: 83
UVD v6.0  (VM): 55
UVD v7.0: 51
VCE v2.0: 126
VCE v3.0 (VM): 98
VCE v4.0: 93
VCN DEC v1.0: 49
VCN DEC v2.0: 51
VCN DEC v3.0: 51
VCN ENC v1.0: 58
VCN ENC v2.0: 93
VCN ENC v3.0: 93
VCN ENC v4.0: 93
VCN JPEG v1.0: 17
VCN JPEG v2.0: 16
VCN JPEG v2.5: 17
VCN JPEG v3.0: 17
VCN JPEG v4.0: 17

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2498
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Bas Nieuwenhuizen &lt;bas@basnieuwenhuizen.nl&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: MCBP based on DRM scheduler (v9)</title>
<updated>2022-12-02T15:04:51+00:00</updated>
<author>
<name>Jiadong.Zhu</name>
<email>Jiadong.Zhu@amd.com</email>
</author>
<published>2022-09-07T02:24:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3f4c175d62d89819121cbbd5a0a30f4b80862025'/>
<id>urn:sha1:3f4c175d62d89819121cbbd5a0a30f4b80862025</id>
<content type='text'>
Trigger Mid-Command Buffer Preemption according to the priority of the software
rings and the hw fence signalling condition.

The muxer saves the locations of the indirect buffer frames from the software
ring together with the fence sequence number in its fifo queue, and pops out
those records when the fences are signalled. The locations are used to resubmit
packages in preemption scenarios by coping the chunks from the software ring.

v2: Update comment style.
v3: Fix conflict caused by previous modifications.
v4: Remove unnecessary prints.
v5: Fix corner cases for resubmission cases.
v6: Refactor functions for resubmission, calling fence_process in irq handler.
v7: Solve conflict for removing amdgpu_sw_ring.c.
v8: Add time threshold to judge if preemption request is needed.
v9: Correct comment spelling. Set fence emit timestamp before rsu assignment.

Cc: Christian Koenig &lt;Christian.Koenig@amd.com&gt;
Cc: Luben Tuikov &lt;Luben.Tuikov@amd.com&gt;
Cc: Andrey Grodzovsky &lt;Andrey.Grodzovsky@amd.com&gt;
Cc: Michel Dänzer &lt;michel@daenzer.net&gt;
Signed-off-by: Jiadong.Zhu &lt;Jiadong.Zhu@amd.com&gt;
Acked-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Acked-by: Huang Rui &lt;ray.huang@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>Revert "drm/amdgpu: add debugfs amdgpu_reset_level"</title>
<updated>2022-10-19T02:08:25+00:00</updated>
<author>
<name>Victor Zhao</name>
<email>Victor.Zhao@amd.com</email>
</author>
<published>2022-10-13T02:42:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=afbaa15501125ae0b7de9dd16c6f00c85de14218'/>
<id>urn:sha1:afbaa15501125ae0b7de9dd16c6f00c85de14218</id>
<content type='text'>
This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc.

This commit breaks the reset logic for aldebaran, revert it for now.
Will move the mask inside the reset handler.

Fixes: 5bd8d53f6fa53e ("drm/amdgpu: add debugfs amdgpu_reset_level")
Signed-off-by: Victor Zhao &lt;Victor.Zhao@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
