author Philipp Stanner <phasta@kernel.org> 2025-11-07 16:57:00 +0300
committer Philipp Stanner <phasta@kernel.org> 2025-12-02 12:40:37 +0300
commit 9d56cbaf12037e8ce7ead9f8f8f9000e4784f2eb
tree aa6b205443b83bc74b3cbb2720afb5c28b4fca34
parent 7068d42048dab5eb71a0d65388f64f1e0ca5b9ee
drm/todo: Add section with task for GPU scheduler

The GPU scheduler has a great many problems and deserves its own TODO section.
Add a section and a first task describing the problem of
drm_sched_resubmit_jobs() being deprecated without a successor.

Acked-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20251107135701.244659-3-phasta@kernel.org

-rw-r--r-- Documentation/gpu/todo.rst 31
1 file changed, 31 insertions, 0 deletions
diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst
index 9013ced318cb..572a5611dd0c 100644
--- a/Documentation/gpu/todo.rst
+++ b/Documentation/gpu/todo.rst
@@ -878,6 +878,37 @@ Contact: Christian König
Level: Starter
+DRM GPU Scheduler
+=================
+
+Provide a universal successor for drm_sched_resubmit_jobs()
+-----------------------------------------------------------
+
+drm_sched_resubmit_jobs() is deprecated, mainly because it leads to
+reinitializing dma_fences. See that function's documentation for details. The
+better approach for valid resubmissions by amdgpu and Xe is (apparently) to
+figure out which job (and, by association, which entity) caused the hang.
+Then the buffer data of that job, together with that of all other jobs
+currently in the same hardware ring, must be invalidated, for example by
+overwriting it. amdgpu currently determines which jobs are in the ring and
+need to be overwritten by keeping copies of the jobs. Xe obtains that
+information by directly accessing drm_sched's pending_list.
+
+Tasks:
+
+1. Implement scheduler functionality through which the driver can obtain
+   information about which *broken* jobs are currently in the hardware ring.
+2. Such infrastructure would then typically be used in
+   drm_sched_backend_ops.timedout_job(). Document that.
+3. Port a driver as the first user.
+4. Document the new alternative in the documentation of the deprecated
+   drm_sched_resubmit_jobs().
+
+Contact: Christian König <christian.koenig@amd.com>
+         Philipp Stanner <phasta@kernel.org>
+
+Level: Advanced
+
Outside DRM
===========
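The invalidation approach described in the TODO entry can be illustrated with a small userspace model in C. Everything in it is hypothetical: `struct sketch_job`, `sketch_find_ring_jobs()`, `sketch_overwrite_jobs()` and `SKETCH_NOP` are not part of drm_sched or of any driver, the plain array merely stands in for drm_sched's pending_list, and a real "no operation" encoding is hardware-specific. The sketch follows the entry's wording literally by selecting every not-yet-signaled job in the guilty job's ring; it does not model locking, dma_fences or the actual GPU reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command word the "hardware" treats as a no-op. */
#define SKETCH_NOP 0x00000000u

/* Hypothetical, heavily simplified job; NOT struct drm_sched_job. */
struct sketch_job {
	int ring_id;     /* hardware ring the job was pushed to */
	int signaled;    /* nonzero once the job's fence has signaled */
	uint32_t *cmds;  /* the job's command data inside the ring */
	size_t ncmds;    /* number of command words */
};

/*
 * "Scheduler side" (hypothetical, cf. task 1): report which unfinished jobs
 * share the guilty job's hardware ring. The guilty job itself is included.
 * Stops early once out_cap entries have been stored.
 */
static size_t sketch_find_ring_jobs(struct sketch_job *pending, size_t npending,
				    const struct sketch_job *guilty,
				    struct sketch_job **out, size_t out_cap)
{
	size_t n = 0;

	assert(guilty != NULL);
	for (size_t i = 0; i < npending && n < out_cap; i++) {
		struct sketch_job *job = &pending[i];

		/* Finished jobs and jobs on other rings are left alone. */
		if (job->signaled || job->ring_id != guilty->ring_id)
			continue;
		out[n++] = job;
	}
	return n;
}

/*
 * "Driver side" (hypothetical, cf. task 2): invalidate the reported jobs by
 * overwriting their command data with no-ops.
 */
static void sketch_overwrite_jobs(struct sketch_job **jobs, size_t njobs)
{
	for (size_t i = 0; i < njobs; i++)
		for (size_t k = 0; k < jobs[i]->ncmds; k++)
			jobs[i]->cmds[k] = SKETCH_NOP;
}
```

Splitting the lookup from the overwrite mirrors tasks 1 and 2: the scheduler would only report the affected jobs, while the driver, typically in its timedout_job() callback, decides how to neutralize them.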