<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/scheduler, branch v5.8.10</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.8.10</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.8.10'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2020-04-15T09:09:13+00:00</updated>
<entry>
<title>drm/scheduler: fix drm_sched_get_cleanup_job</title>
<updated>2020-04-15T09:09:13+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2020-04-11T09:54:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8623b5255ae7ccaf276aac3920787bf575fa6b37'/>
<id>urn:sha1:8623b5255ae7ccaf276aac3920787bf575fa6b37</id>
<content type='text'>
We are racing to initialize sched-&gt;thread here, just always check the
current thread.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Kent Russell &lt;kent.russell@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/361303/
</content>
</entry>
<entry>
<title>drm/scheduler: fix rare NULL ptr race</title>
<updated>2020-03-25T21:00:11+00:00</updated>
<author>
<name>Yintian Tao</name>
<email>yttao@amd.com</email>
</author>
<published>2020-03-23T11:19:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=77bb2f204f1f0a53a602a8fd15816d6826212077'/>
<id>urn:sha1:77bb2f204f1f0a53a602a8fd15816d6826212077</id>
<content type='text'>
There is one one corner case at dma_fence_signal_locked
which will raise the NULL pointer problem just like below.
-&gt;dma_fence_signal
    -&gt;dma_fence_signal_locked
	-&gt;test_and_set_bit
here trigger dma_fence_release happen due to the zero of fence refcount.

-&gt;dma_fence_put
    -&gt;dma_fence_release
	-&gt;drm_sched_fence_release_scheduled
	    -&gt;call_rcu
here make the union fled “cb_list” at finished fence
to NULL because struct rcu_head contains two pointer
which is same as struct list_head cb_list

Therefore, to hold the reference of finished fence at drm_sched_process_job
to prevent the null pointer during finished fence dma_fence_signal

[  732.912867] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  732.914815] #PF: supervisor write access in kernel mode
[  732.915731] #PF: error_code(0x0002) - not-present page
[  732.916621] PGD 0 P4D 0
[  732.917072] Oops: 0002 [#1] SMP PTI
[  732.917682] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           OE     5.4.0-rc7 #1
[  732.918980] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[  732.920906] RIP: 0010:dma_fence_signal_locked+0x3e/0x100
[  732.938569] Call Trace:
[  732.939003]  &lt;IRQ&gt;
[  732.939364]  dma_fence_signal+0x29/0x50
[  732.940036]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[  732.940996]  drm_sched_process_job+0x34/0xa0 [gpu_sched]
[  732.941910]  dma_fence_signal_locked+0x85/0x100
[  732.942692]  dma_fence_signal+0x29/0x50
[  732.943457]  amdgpu_fence_process+0x99/0x120 [amdgpu]
[  732.944393]  sdma_v4_0_process_trap_irq+0x81/0xa0 [amdgpu]

v2: hold the finished fence at drm_sched_process_job instead of
    amdgpu_fence_process
v3: resume the blank line

Signed-off-by: Yintian Tao &lt;yttao@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/sched: implement and export drm_sched_pick_best</title>
<updated>2020-03-16T20:21:32+00:00</updated>
<author>
<name>Nirmoy Das</name>
<email>nirmoy.das@amd.com</email>
</author>
<published>2020-03-13T10:39:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ec2edcc2796c892aa0cc4740ce00a22fe57d2a1c'/>
<id>urn:sha1:ec2edcc2796c892aa0cc4740ce00a22fe57d2a1c</id>
<content type='text'>
Remove drm_sched_entity_get_free_sched() and use the logic of picking
the least loaded drm scheduler from a drm scheduler list to implement
drm_sched_pick_best(). This patch also exports drm_sched_pick_best() so
that it can be utilized by other drm drivers.

Signed-off-by: Nirmoy Das &lt;nirmoy.das@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>Revert "drm/scheduler: improve job distribution with multiple queues"</title>
<updated>2020-03-16T20:21:32+00:00</updated>
<author>
<name>changzhu</name>
<email>Changfeng.Zhu@amd.com</email>
</author>
<published>2020-03-11T11:12:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d164bebb95516c9dd2a63cf8c8e9fe0b13d7474e'/>
<id>urn:sha1:d164bebb95516c9dd2a63cf8c8e9fe0b13d7474e</id>
<content type='text'>
It needs to revert this patch to avoid amdgpu_test compute hang problem
on picasso.

This reverts commit 56822db194232c089601728d68ed078dccb97f8b.

Signed-off-by: changzhu &lt;Changfeng.Zhu@amd.com&gt;
Reviewed-by: Feifei Xu &lt;Feifei.Xu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/scheduler: fix inconsistent locking of job_list_lock</title>
<updated>2020-03-13T15:52:36+00:00</updated>
<author>
<name>Lucas Stach</name>
<email>l.stach@pengutronix.de</email>
</author>
<published>2020-01-20T10:51:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a7fbb630c5485f5095146df46f04c2ca1a24c299'/>
<id>urn:sha1:a7fbb630c5485f5095146df46f04c2ca1a24c299</id>
<content type='text'>
1db8c142b6c5 (drm/scheduler: Add drm_sched_suspend/resume_timeout()) made
the job_list_lock IRQ safe in as the suspend/resume calls were expected to
be called from IRQ context. This usage never materialized in upstream.
Instead amdgpu started locking the job_list_lock in an IRQ unsafe way in
amdgpu_ib_preempt_mark_partial_job() and amdgpu_ib_preempt_job_recovery(),
which leads to potential deadlock if one would actually start to call the
drm_sched_suspend/resume_timeout functions from IRQ context.

As no current user needs the locking to be IRQ safe, the local IRQ
disable/enable is pure overhead. Fix the inconsistent locking by changing
all uses of job_list_lock to use the IRQ unsafe locking primitives.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Lucas Stach &lt;l.stach@pengutronix.de&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/sched: add run job trace</title>
<updated>2020-03-13T15:52:36+00:00</updated>
<author>
<name>Robert Beckett</name>
<email>bob.beckett@collabora.com</email>
</author>
<published>2020-03-13T09:56:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c2c91828fbdbc5a31616f956834c85ab011392e1'/>
<id>urn:sha1:c2c91828fbdbc5a31616f956834c85ab011392e1</id>
<content type='text'>
Add a new trace event to show when jobs are run on the HW.

Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Robert Beckett &lt;bob.beckett@collabora.com&gt;
Signed-off-by: Lucas Stach &lt;l.stach@pengutronix.de&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/scheduler: implement a function to modify sched list</title>
<updated>2020-03-09T17:51:35+00:00</updated>
<author>
<name>Nirmoy Das</name>
<email>nirmoy.das@amd.com</email>
</author>
<published>2020-02-27T14:34:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b37aced31eb08757875306137a0ac00004f2f783'/>
<id>urn:sha1:b37aced31eb08757875306137a0ac00004f2f783</id>
<content type='text'>
Implement drm_sched_entity_modify_sched() which modifies existing
sched_list with a different one. This is going to be helpful when
userspace changes priority of a ctx/entity then the driver can switch
to the corresponding HW scheduler list for that priority.

Signed-off-by: Nirmoy Das &lt;nirmoy.das@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix doc by clarifying sched_list definition</title>
<updated>2020-01-27T21:46:44+00:00</updated>
<author>
<name>Nirmoy Das</name>
<email>nirmoy.das@amd.com</email>
</author>
<published>2020-01-22T09:37:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2639f453f28e71dc4149fb06c71bcf6f93eb468f'/>
<id>urn:sha1:2639f453f28e71dc4149fb06c71bcf6f93eb468f</id>
<content type='text'>
expand sched_list definition for better understanding.
Also fix a typo atleast -&gt; at least

Signed-off-by: Nirmoy Das &lt;nirmoy.das@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/scheduler: fix documentation by replacing rq_list with sched_list</title>
<updated>2020-01-16T18:41:00+00:00</updated>
<author>
<name>Nirmoy Das</name>
<email>nirmoy.das@amd.com</email>
</author>
<published>2020-01-14T09:38:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9e3e90c50dd34fe961dc662f37ee9640e04cba97'/>
<id>urn:sha1:9e3e90c50dd34fe961dc662f37ee9640e04cba97</id>
<content type='text'>
This also replaces old artifacts with a correct one in drm_sched_entity_init()
declaration

Signed-off-by: Nirmoy Das &lt;nirmoy.das@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/scheduler: improve job distribution with multiple queues</title>
<updated>2020-01-16T18:37:54+00:00</updated>
<author>
<name>Nirmoy Das</name>
<email>nirmoy.das@amd.com</email>
</author>
<published>2020-01-15T14:06:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=56822db194232c089601728d68ed078dccb97f8b'/>
<id>urn:sha1:56822db194232c089601728d68ed078dccb97f8b</id>
<content type='text'>
This patch uses score based logic to select a new rq for better
loadbalance between multiple rq/scheds instead of num_jobs.

Below are test results after running amdgpu_test from mesa drm

Before this patch:

sched_name     num of many times it got scheduled
=========      ==================================
sdma0          314
sdma1          32
comp_1.0.0     56
comp_1.0.1     0
comp_1.1.0     0
comp_1.1.1     0
comp_1.2.0     0
comp_1.2.1     0
comp_1.3.0     0
comp_1.3.1     0
After this patch:

sched_name     num of many times it got scheduled
=========      ==================================
sdma0          216
sdma1          185
comp_1.0.0     39
comp_1.0.1     9
comp_1.1.0     12
comp_1.1.1     0
comp_1.2.0     12
comp_1.2.1     0
comp_1.3.0     12
comp_1.3.1     0

Signed-off-by: Nirmoy Das &lt;nirmoy.das@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
