<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/scheduler, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-25T10:05:41+00:00</updated>
<entry>
<title>drm/sched: Fix kernel-doc warning for drm_sched_job_done()</title>
<updated>2026-03-25T10:05:41+00:00</updated>
<author>
<name>Yujie Liu</name>
<email>yujie.liu@intel.com</email>
</author>
<published>2026-02-27T08:24:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5d53fe502ef484821b41eed231bae87ce7c9fc78'/>
<id>urn:sha1:5d53fe502ef484821b41eed231bae87ce7c9fc78</id>
<content type='text'>
[ Upstream commit 61ded1083b264ff67ca8c2de822c66b6febaf9a8 ]

There is a kernel-doc warning for the scheduler:

Warning: drivers/gpu/drm/scheduler/sched_main.c:367 function parameter 'result' not described in 'drm_sched_job_done'

Fix the warning by describing the undocumented error code.

Fixes: 539f9ee4b52a ("drm/scheduler: properly forward fence errors")
Signed-off-by: Yujie Liu &lt;yujie.liu@intel.com&gt;
[phasta: Flesh out commit message]
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://patch.msgid.link/20260227082452.1802922-1-yujie.liu@intel.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb</title>
<updated>2025-11-24T09:29:54+00:00</updated>
<author>
<name>Pierre-Eric Pelloux-Prayer</name>
<email>pierre-eric.pelloux-prayer@amd.com</email>
</author>
<published>2025-11-04T09:53:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=70150b9443dddf02157d821c68abf438f55a2e8e'/>
<id>urn:sha1:70150b9443dddf02157d821c68abf438f55a2e8e</id>
<content type='text'>
commit 487df8b698345dd5a91346335f05170ed5f29d4e upstream.

The Mesa issue referenced below pointed out a possible deadlock:

[ 1231.611031]  Possible interrupt unsafe locking scenario:

[ 1231.611033]        CPU0                    CPU1
[ 1231.611034]        ----                    ----
[ 1231.611035]   lock(&amp;xa-&gt;xa_lock#17);
[ 1231.611038]                                local_irq_disable();
[ 1231.611039]                                lock(&amp;fence-&gt;lock);
[ 1231.611041]                                lock(&amp;xa-&gt;xa_lock#17);
[ 1231.611044]   &lt;Interrupt&gt;
[ 1231.611045]     lock(&amp;fence-&gt;lock);
[ 1231.611047]
                *** DEADLOCK ***

In this example, CPU0 would be any function accessing job-&gt;dependencies
through the xa_* functions that don't disable interrupts (eg:
drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()).

CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling
callback so in an interrupt context. It will deadlock when trying to
grab the xa_lock which is already held by CPU0.

Replacing all xa_* usage by their xa_*_irq counterparts would fix
this issue, but Christian pointed out another issue: dma_fence_signal
takes fence.lock and so does dma_fence_add_callback.

  dma_fence_signal() // locks f1.lock
  -&gt; drm_sched_entity_kill_jobs_cb()
  -&gt; foreach dependencies
     -&gt; dma_fence_add_callback() // locks f2.lock

This will deadlock if f1 and f2 share the same spinlock.

To fix both issues, the code iterating on dependencies and re-arming them
is moved out to drm_sched_entity_kill_jobs_work().

Cc: stable@vger.kernel.org # v6.2+
Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini")
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908
Reported-by: Mikhail Gavrilov &lt;mikhail.v.gavrilov@gmail.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Pierre-Eric Pelloux-Prayer &lt;pierre-eric.pelloux-prayer@amd.com&gt;
[phasta: commit message nits]
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://patch.msgid.link/20251104095358.15092-1-pierre-eric.pelloux-prayer@amd.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/sched: Fix race in drm_sched_entity_select_rq()</title>
<updated>2025-11-24T09:29:17+00:00</updated>
<author>
<name>Philipp Stanner</name>
<email>phasta@kernel.org</email>
</author>
<published>2025-11-03T14:59:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b5493968ac7a9b3356e80e18933b4d203aef2019'/>
<id>urn:sha1:b5493968ac7a9b3356e80e18933b4d203aef2019</id>
<content type='text'>
[ Upstream commit d25e3a610bae03bffc5c14b5d944a5d0cd844678 ]

In a past bug fix it was forgotten that entity access must be protected
by the entity lock. That's a data race and potentially UB.

Move the spin_unlock() to the appropriate position.

Cc: stable@vger.kernel.org # v5.13+
Fixes: ac4eb83ab255 ("drm/sched: select new rq even if there is only one v3")
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://patch.msgid.link/20251022063402.87318-2-phasta@kernel.org
[ adapted lock field name from entity-&gt;lock to entity-&gt;rq_lock ]
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/sched: Fix potential double free in drm_sched_job_add_resv_dependencies</title>
<updated>2025-10-23T14:16:25+00:00</updated>
<author>
<name>Tvrtko Ursulin</name>
<email>tvrtko.ursulin@igalia.com</email>
</author>
<published>2025-10-15T08:40:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=57239762aa90ad768dac055021f27705dae73344'/>
<id>urn:sha1:57239762aa90ad768dac055021f27705dae73344</id>
<content type='text'>
commit 5801e65206b065b0b2af032f7f1eef222aa2fd83 upstream.

When adding dependencies with drm_sched_job_add_dependency(), that
function consumes the fence reference both on success and failure, so in
the latter case the dma_fence_put() on the error path (xarray failed to
expand) is a double free.

Interestingly this bug appears to have been present ever since
commit ebd5f74255b9 ("drm/sched: Add dependency tracking"), since the code
back then looked like this:

drm_sched_job_add_implicit_dependencies():
...
       for (i = 0; i &lt; fence_count; i++) {
               ret = drm_sched_job_add_dependency(job, fences[i]);
               if (ret)
                       break;
       }

       for (; i &lt; fence_count; i++)
               dma_fence_put(fences[i]);

Which means for the failing 'i' the dma_fence_put was already a double
free. Possibly there were no users at that time, or the test cases were
insufficient to hit it.

The bug was then only noticed and fixed after
commit 9c2ba265352a ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2")
landed, with its fixup of
commit 4eaf02d6076c ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies").

At that point it was a slightly different flavour of a double free, which
commit 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
noticed and attempted to fix.

But it only moved the double free from happening inside the
drm_sched_job_add_dependency(), when releasing the reference not yet
obtained, to the caller, when releasing the reference already released by
the former in the failure case.

As such it is not easy to identify the right target for the fixes tag so
lets keep it simple and just continue the chain.

While fixing we also improve the comment and explain the reason for taking
the reference and not dropping it.

Signed-off-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Fixes: 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
Reported-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Closes: https://lore.kernel.org/dri-devel/aNFbXq8OeYl3QSdm@stanley.mountain/
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Rob Clark &lt;robdclark@chromium.org&gt;
Cc: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Danilo Krummrich &lt;dakr@kernel.org&gt;
Cc: Philipp Stanner &lt;phasta@kernel.org&gt;
Cc: Christian König &lt;ckoenig.leichtzumerken@gmail.com&gt;
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://lore.kernel.org/r/20251015084015.6273-1-tvrtko.ursulin@igalia.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/sched: Remove optimization that causes hang when killing dependent jobs</title>
<updated>2025-08-01T08:47:33+00:00</updated>
<author>
<name>Lin.Cao</name>
<email>lincao12@amd.com</email>
</author>
<published>2025-07-29T15:40:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aed4053c506be9f73a713a4162ccad5429745b37'/>
<id>urn:sha1:aed4053c506be9f73a713a4162ccad5429745b37</id>
<content type='text'>
[ Upstream commit 15f77764e90a713ee3916ca424757688e4f565b9 ]

When application A submits jobs and application B submits a job with a
dependency on A's fence, the normal flow wakes up the scheduler after
processing each job. However, the optimization in
drm_sched_entity_add_dependency_cb() uses a callback that only clears
dependencies without waking up the scheduler.

When application A is killed before its jobs can run, the callback gets
triggered but only clears the dependency without waking up the scheduler,
causing the scheduler to enter sleep state and application B to hang.

Remove the optimization by deleting drm_sched_entity_clear_dep() and its
usage, ensuring the scheduler is always woken up when dependencies are
cleared.

Fixes: 777dbd458c89 ("drm/amdgpu: drop a dummy wakeup scheduler")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Lin.Cao &lt;lincao12@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://lore.kernel.org/r/20250717084453.921097-1-lincao12@amd.com
[ replaced drm_sched_wakeup() calls with drm_sched_wakeup_if_can_queue() ]
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/scheduler: signal scheduled fence when kill job</title>
<updated>2025-07-06T09:00:07+00:00</updated>
<author>
<name>Lin.Cao</name>
<email>lincao12@amd.com</email>
</author>
<published>2025-05-15T02:07:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c5734f9bab6f0d40577ad0633af4090a5fda2407'/>
<id>urn:sha1:c5734f9bab6f0d40577ad0633af4090a5fda2407</id>
<content type='text'>
[ Upstream commit 471db2c2d4f80ee94225a1ef246e4f5011733e50 ]

When an entity from application B is killed, drm_sched_entity_kill()
removes all jobs belonging to that entity through
drm_sched_entity_kill_jobs_work(). If application A's job depends on a
scheduled fence from application B's job, and that fence is not properly
signaled during the killing process, application A's dependency cannot be
cleared.

This leads to application A hanging indefinitely while waiting for a
dependency that will never be resolved. Fix this issue by ensuring that
scheduled fences are properly signaled when an entity is killed, allowing
dependent applications to continue execution.

Signed-off-by: Lin.Cao &lt;lincao12@amd.com&gt;
Reviewed-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://lore.kernel.org/r/20250515020713.1110476-1-lincao12@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/sched: Fix fence reference count leak</title>
<updated>2025-03-28T20:59:55+00:00</updated>
<author>
<name>qianyi liu</name>
<email>liuqianyi125@gmail.com</email>
</author>
<published>2025-03-11T06:02:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c76bd3c99293834de7d1dca5de536616d5655e38'/>
<id>urn:sha1:c76bd3c99293834de7d1dca5de536616d5655e38</id>
<content type='text'>
commit a952f1ab696873be124e31ce5ef964d36bce817f upstream.

The last_scheduled fence leaks when an entity is being killed and adding
the cleanup callback fails.

Decrement the reference count of prev when dma_fence_add_callback()
fails, ensuring proper balance.

Cc: stable@vger.kernel.org	# v6.2+
[phasta: add git tag info for stable kernel]
Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini")
Signed-off-by: qianyi liu &lt;liuqianyi125@gmail.com&gt;
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250311060251.4041101-1-liuqianyi125@gmail.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>drm/sched: Fix preprocessor guard</title>
<updated>2025-03-13T11:58:30+00:00</updated>
<author>
<name>Philipp Stanner</name>
<email>phasta@kernel.org</email>
</author>
<published>2025-02-18T12:41:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4f5cc62f9a81d3467abb3c42dec9658a394a5b54'/>
<id>urn:sha1:4f5cc62f9a81d3467abb3c42dec9658a394a5b54</id>
<content type='text'>
[ Upstream commit 23e0832d6d7be2d3c713f9390c060b6f1c48bf36 ]

When writing the header guard for gpu_scheduler_trace.h, a typo,
apparently, occurred.

Fix the typo and document the scope of the guard.

Fixes: 353da3c520b4 ("drm/amdgpu: add tracepoint for scheduler (v2)")
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250218124149.118002-2-phasta@kernel.org
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/sched: memset() 'job' in drm_sched_job_init()</title>
<updated>2024-12-14T19:00:06+00:00</updated>
<author>
<name>Philipp Stanner</name>
<email>pstanner@redhat.com</email>
</author>
<published>2024-10-21T10:50:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=87210234e5a273ebf9c4110a6aa82b8221478daa'/>
<id>urn:sha1:87210234e5a273ebf9c4110a6aa82b8221478daa</id>
<content type='text'>
[ Upstream commit 2320c9e6a768d135c7b0039995182bb1a4e4fd22 ]

drm_sched_job_init() has no control over how users allocate struct
drm_sched_job. Unfortunately, the function can also not set some struct
members such as job-&gt;sched.

This could theoretically lead to UB by users dereferencing the struct's
pointer members too early.

It is easier to debug such issues if these pointers are initialized to
NULL, so dereferencing them causes a NULL pointer exception.
Accordingly, drm_sched_entity_init() does precisely that and initializes
its struct with memset().

Initialize parameter "job" to 0 in drm_sched_job_init().

Signed-off-by: Philipp Stanner &lt;pstanner@redhat.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20241021105028.19794-2-pstanner@redhat.com
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/sched: Add locking to drm_sched_entity_modify_sched</title>
<updated>2024-10-10T09:58:00+00:00</updated>
<author>
<name>Tvrtko Ursulin</name>
<email>tvrtko.ursulin@igalia.com</email>
</author>
<published>2024-09-13T16:05:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=579a0a84e3c0174f296004ac4af83cd9819b38a9'/>
<id>urn:sha1:579a0a84e3c0174f296004ac4af83cd9819b38a9</id>
<content type='text'>
commit 4286cc2c953983d44d248c9de1c81d3a9643345c upstream.

Without the locking amdgpu currently can race between
amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and
drm_sched_job_arm(), leading to the latter accesing potentially
inconsitent entity-&gt;sched_list and entity-&gt;num_sched_list pair.

v2:
 * Improve commit message. (Philipp)

Signed-off-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: David Airlie &lt;airlied@gmail.com&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: dri-devel@lists.freedesktop.org
Cc: Philipp Stanner &lt;pstanner@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; # v5.7+
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240913160559.49054-2-tursulin@igalia.com
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
