<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/drm/gpu_scheduler.h, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-09-17T12:58:33+00:00</updated>
<entry>
<title>drm/sched: backend_ops doc fix</title>
<updated>2025-09-17T12:58:33+00:00</updated>
<author>
<name>Luc Ma</name>
<email>onion0709@gmail.com</email>
</author>
<published>2025-09-15T13:23:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=091767ee7510765886b8475c44fd81b2e6e9da87'/>
<id>urn:sha1:091767ee7510765886b8475c44fd81b2e6e9da87</id>
<content type='text'>
Function drm_sched_entity_do_release() has been renamed in
commit 180fc134d712 ("drm/scheduler: Rename cleanup functions v2.").

Refer to the correct function in the documentation.

Signed-off-by: Luc Ma &lt;onion0709@gmail.com&gt;
[phasta: commit message]
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://lore.kernel.org/r/20250915132327.6293-1-onion0709@gmail.com
</content>
</entry>
<entry>
<title>drm/sched: Allow drivers to skip the reset and keep on running</title>
<updated>2025-07-15T11:27:07+00:00</updated>
<author>
<name>Maíra Canal</name>
<email>mcanal@igalia.com</email>
</author>
<published>2025-07-14T22:07:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0b1217bfdfddf664c15954d1d51ee18ed88a2ccf'/>
<id>urn:sha1:0b1217bfdfddf664c15954d1d51ee18ed88a2ccf</id>
<content type='text'>
When the DRM scheduler times out, it's possible that the GPU isn't hung;
instead, a job just took unusually long (longer than the timeout) but is
still running, and there is, thus, no reason to reset the hardware. This
can occur in two scenarios:

  1. The job is taking longer than the timeout, but the driver determined
     through a GPU-specific mechanism that the hardware is still making
     progress. Hence, the driver would like the scheduler to skip the
     timeout and treat the job as still pending from then onward. This
     happens in v3d, Etnaviv, and Xe.
  2. Timeout has fired before the free-job worker. Consequently, the
     scheduler calls `sched-&gt;ops-&gt;timedout_job()` for a job that isn't
     timed out.

These two scenarios are problematic because the job was removed from the
`sched-&gt;pending_list` before calling `sched-&gt;ops-&gt;timedout_job()`, which
means that when the job finishes, it won't be freed by the scheduler
though `sched-&gt;ops-&gt;free_job()` - leading to a memory leak.

To solve these problems, create a new `drm_gpu_sched_stat`, called
DRM_GPU_SCHED_STAT_NO_HANG, which allows a driver to skip the reset. The
new status will indicate that the job must be reinserted into
`sched-&gt;pending_list`, and the hardware / driver will still complete that
job.

Reviewed-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-2-5c5ba4f55039@igalia.com
Signed-off-by: Maíra Canal &lt;mcanal@igalia.com&gt;
</content>
</entry>
<entry>
<title>drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET</title>
<updated>2025-07-15T11:27:00+00:00</updated>
<author>
<name>Maíra Canal</name>
<email>mcanal@igalia.com</email>
</author>
<published>2025-07-14T22:07:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0a5dc1b67ef5c7e851b57764a2aab8cc4341a7b7'/>
<id>urn:sha1:0a5dc1b67ef5c7e851b57764a2aab8cc4341a7b7</id>
<content type='text'>
Among the scheduler's statuses, the only one that indicates an error is
DRM_GPU_SCHED_STAT_ENODEV. Any status other than DRM_GPU_SCHED_STAT_ENODEV
signifies that the operation succeeded and the GPU is in a nominal state.

However, to provide more information about the GPU's status, it is needed
to convey more information than just "OK".

Therefore, rename DRM_GPU_SCHED_STAT_NOMINAL to
DRM_GPU_SCHED_STAT_RESET, which better communicates the meaning of this
status. The status DRM_GPU_SCHED_STAT_RESET indicates that the GPU has
hung, but it has been successfully reset and is now in a nominal state
again.

Reviewed-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-1-5c5ba4f55039@igalia.com
Signed-off-by: Maíra Canal &lt;mcanal@igalia.com&gt;
</content>
</entry>
<entry>
<title>drm/sched: Avoid memory leaks with cancel_job() callback</title>
<updated>2025-07-10T15:07:08+00:00</updated>
<author>
<name>Philipp Stanner</name>
<email>phasta@kernel.org</email>
</author>
<published>2025-07-10T12:54:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bf8bbaefaa6ae0a07971ea57b3208df60e8ad0a4'/>
<id>urn:sha1:bf8bbaefaa6ae0a07971ea57b3208df60e8ad0a4</id>
<content type='text'>
Since its inception, the GPU scheduler can leak memory if the driver
calls drm_sched_fini() while there are still jobs in flight.

The simplest way to solve this in a backwards compatible manner is by
adding a new callback, drm_sched_backend_ops.cancel_job(), which
instructs the driver to signal the hardware fence associated with the
job. Afterwards, the scheduler can safely use the established free_job()
callback for freeing the job.

Implement the new backend_ops callback cancel_job().

Suggested-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Link: https://lore.kernel.org/dri-devel/20250418113211.69956-1-tvrtko.ursulin@igalia.com/
Reviewed-by: Maíra Canal &lt;mcanal@igalia.com&gt;
Acked-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://lore.kernel.org/r/20250710125412.128476-4-phasta@kernel.org
</content>
</entry>
<entry>
<title>drm: Get rid of drm_sched_job.id</title>
<updated>2025-05-28T14:16:15+00:00</updated>
<author>
<name>Pierre-Eric Pelloux-Prayer</name>
<email>pierre-eric.pelloux-prayer@amd.com</email>
</author>
<published>2025-05-26T12:54:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4f7fa5fa414cc2990afbdffc0333f7c7ba3756b1'/>
<id>urn:sha1:4f7fa5fa414cc2990afbdffc0333f7c7ba3756b1</id>
<content type='text'>
Its only purpose was for trace events, but jobs can already be
uniquely identified using their fence.

The downside of using the fence is that it's only available
after 'drm_sched_job_arm' was called which is true for all trace
events that used job.id so they can safely switch to using it.

Suggested-by: Tvrtko Ursulin &lt;tursulin@igalia.com&gt;
Signed-off-by: Pierre-Eric Pelloux-Prayer &lt;pierre-eric.pelloux-prayer@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Arvind Yadav &lt;arvind.yadav@amd.com&gt;
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://lore.kernel.org/r/20250526125505.2360-9-pierre-eric.pelloux-prayer@amd.com
</content>
</entry>
<entry>
<title>drm/sched: Store the drm client_id in drm_sched_fence</title>
<updated>2025-05-28T14:15:58+00:00</updated>
<author>
<name>Pierre-Eric Pelloux-Prayer</name>
<email>pierre-eric.pelloux-prayer@amd.com</email>
</author>
<published>2025-05-26T12:54:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2956554823cedb390b7ec4534afa898176317638'/>
<id>urn:sha1:2956554823cedb390b7ec4534afa898176317638</id>
<content type='text'>
This will be used in a later commit to trace the drm client_id in
some of the gpu_scheduler trace events.

This requires changing all the users of drm_sched_job_init to
add an extra parameter.

The newly added drm_client_id field in the drm_sched_fence is a bit
of a duplicate of the owner one. One suggestion I received was to
merge those 2 fields - this can't be done right now as amdgpu uses
some special values (AMDGPU_FENCE_OWNER_*) that can't really be
translated into a client id. Christian is working on getting rid of
those; when it's done we should be able to squash owner/drm_client_id
together.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Pierre-Eric Pelloux-Prayer &lt;pierre-eric.pelloux-prayer@amd.com&gt;
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://lore.kernel.org/r/20250526125505.2360-3-pierre-eric.pelloux-prayer@amd.com
</content>
</entry>
<entry>
<title>drm/sched: Fix outdated comments referencing thread</title>
<updated>2025-05-13T13:39:48+00:00</updated>
<author>
<name>Philipp Stanner</name>
<email>phasta@kernel.org</email>
</author>
<published>2025-03-14T10:10:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1773ea5caf0b6640aada70411eba6d33fef8e1d5'/>
<id>urn:sha1:1773ea5caf0b6640aada70411eba6d33fef8e1d5</id>
<content type='text'>
The GPU scheduler's comments refer to a "thread" at various places.
Those are leftovers from commit a6149f039369 ("drm/sched: Convert drm
scheduler to use a work queue rather than kthread").

Replace all references to kthreads.

Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://lore.kernel.org/r/20250314101023.111248-2-phasta@kernel.org
</content>
</entry>
<entry>
<title>drm/sched: Update timedout_job()'s documentation</title>
<updated>2025-03-06T15:36:32+00:00</updated>
<author>
<name>Philipp Stanner</name>
<email>phasta@kernel.org</email>
</author>
<published>2025-03-05T13:05:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2eeed61db4550171a32854e4d3fdf974b749c214'/>
<id>urn:sha1:2eeed61db4550171a32854e4d3fdf974b749c214</id>
<content type='text'>
drm_sched_backend_ops.timedout_job()'s documentation is outdated. It
mentions the deprecated function drm_sched_resubmit_jobs(). Furthermore,
it does not point out the important distinction between hardware and
firmware schedulers.

Since firmware schedulers typically only use one entity per scheduler,
timeout handling is significantly more simple because the entity the
faulted job came from can just be killed without affecting innocent
processes.

Update the documentation with that distinction and other details.

Reformat the docstring to work to a unified style with the other
handles.

Acked-by: Danilo Krummrich &lt;dakr@kernel.org&gt;
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250305130551.136682-5-phasta@kernel.org
</content>
</entry>
<entry>
<title>drm/sched: Adjust outdated docu for run_job()</title>
<updated>2025-03-06T15:35:48+00:00</updated>
<author>
<name>Philipp Stanner</name>
<email>phasta@kernel.org</email>
</author>
<published>2025-03-05T13:05:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=87edca6261c1327977d86c16857e74f4fc7c3ae8'/>
<id>urn:sha1:87edca6261c1327977d86c16857e74f4fc7c3ae8</id>
<content type='text'>
The documentation for drm_sched_backend_ops.run_job() mentions a certain
function called drm_sched_job_recovery(). This function does not exist.
What's actually meant is drm_sched_resubmit_jobs(), which is by now also
deprecated.

Furthermore, the scheduler expects to "inherit" a reference on the fence
from the run_job() callback. This, so far, is also not documented.

Remove the mention of the removed function.

Discourage the behavior of drm_sched_backend_ops.run_job() being called
multiple times for the same job.

Document the necessity of incrementing the refcount in run_job().

Acked-by: Danilo Krummrich &lt;dakr@kernel.org&gt;
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250305130551.136682-3-phasta@kernel.org
</content>
</entry>
<entry>
<title>drm/sched: Group exported prototypes by object type</title>
<updated>2025-02-24T09:17:42+00:00</updated>
<author>
<name>Tvrtko Ursulin</name>
<email>tvrtko.ursulin@igalia.com</email>
</author>
<published>2025-02-21T10:50:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=27d4815149ba0c80ef2db2a82f0512f647e76d62'/>
<id>urn:sha1:27d4815149ba0c80ef2db2a82f0512f647e76d62</id>
<content type='text'>
Do a bit of house keeping in gpu_scheduler.h by grouping the API by type
of object it operates on.

Signed-off-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Danilo Krummrich &lt;dakr@kernel.org&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Philipp Stanner &lt;phasta@kernel.org&gt;
Signed-off-by: Philipp Stanner &lt;phasta@kernel.org&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250221105038.79665-7-tvrtko.ursulin@igalia.com
</content>
</entry>
</feed>
