kernel/linux.git/drivers/gpu/drm/scheduler, branch v5.15.89

drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr

2021-07-01T06:53:25+00:00

Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU reset. This leads to extra complexity when we need to synchronize timeout works with the reset work. One solution to address that is to have an ordered workqueue at the driver level that will be used by the different schedulers to queue their timeout work. Thanks to the serialization provided by the ordered workqueue we are guaranteed that timeout handlers are executed sequentially, and can thus easily reset the GPU from the timeout handler without extra synchronization.

v5:
* Add a new paragraph to the timedout_job() method

v3:
* New patch

v4:
* Actually use the timeout_wq to queue the timeout work

Suggested-by: Daniel Vetter
Signed-off-by: Boris Brezillon
Reviewed-by: Steven Price
Reviewed-by: Lucas Stach
Acked-by: Daniel Vetter
Acked-by: Christian König
Cc: Qiang Yu
Cc: Emma Anholt
Cc: Alex Deucher
Cc: "Christian König"
Link: https://patchwork.freedesktop.org/patch/msgid/20210630062751.2832545-3-boris.brezillon@collabora.com
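The serialization argument in this commit can be illustrated with a small userspace model. This is only a sketch, not kernel code: `ordered_wq_model`, `timeout_work`, `gpu_model` and the helper functions are hypothetical names invented for the illustration, and the "workqueue" is a plain single-threaded FIFO. It shows the one property the commit relies on: timeout works queued by several schedulers run one at a time, in queueing order, so each handler can reset the shared GPU without further locking.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 8

/* Hypothetical stand-in for a GPU with one global reset. */
struct gpu_model {
	int resets;             /* number of global resets performed */
	int handled[MAX_WORK];  /* hw queue ids, in the order handled */
	size_t nhandled;
};

/* One timeout work item, as queued by one scheduler (one hw queue). */
struct timeout_work {
	int hw_queue;
	struct gpu_model *gpu;
};

/* Model of an ordered workqueue: strict FIFO, one item at a time. */
struct ordered_wq_model {
	struct timeout_work *items[MAX_WORK];
	size_t head, tail;
};

/* Queue a timeout work; fails once the fixed-size model is full. */
static bool wq_queue(struct ordered_wq_model *wq, struct timeout_work *w)
{
	if (wq->tail == MAX_WORK)
		return false;
	wq->items[wq->tail++] = w;
	return true;
}

/* Timeout handler: records the queue and resets the whole GPU. */
static void timeout_handler(struct timeout_work *w)
{
	struct gpu_model *gpu = w->gpu;

	gpu->handled[gpu->nhandled++] = w->hw_queue;
	gpu->resets++;
}

/* Run queued works sequentially; no two handlers ever overlap. */
static void wq_drain(struct ordered_wq_model *wq)
{
	while (wq->head < wq->tail)
		timeout_handler(wq->items[wq->head++]);
}
```

Queueing works for hardware queues 2, 0 and 1 and then calling `wq_drain()` handles them in exactly that order. In the kernel, the equivalent guarantee would come from an ordered workqueue shared by the schedulers through the `timeout_wq` mentioned in the changelog.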

drm/sched: Declare entity idle only after HW submission

2021-06-28T11:16:49+00:00

The panfrost driver tries to kill in-flight jobs on FD close after destroying the FD scheduler entities. For this to work properly, we need to make sure the jobs popped from the scheduler entities have been queued at the HW level before declaring the entity idle, otherwise we might iterate over a list that doesn't contain those jobs.

Suggested-by: Lucas Stach
Signed-off-by: Boris Brezillon
Cc: Lucas Stach
Reviewed-by: Steven Price
Reviewed-by: Lucas Stach
Link: https://patchwork.freedesktop.org/patch/msgid/20210624140850.2229697-1-boris.brezillon@collabora.com

drm/sched: Avoid data corruptions

2021-05-20T03:50:28+00:00

Wait for all dependencies of a job to complete before killing it to avoid data corruptions.

Signed-off-by: Andrey Grodzovsky
Reviewed-by: Christian König
Link: https://patchwork.freedesktop.org/patch/msgid/20210519141407.88444-1-andrey.grodzovsky@amd.com

drm/scheduler: Fix hang when sched_entity released

2021-05-20T03:50:28+00:00

Problem: If the scheduler is already stopped by the time a sched_entity is released and the entity's job_queue is not empty, I encountered a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle never becomes true.

Fix: In drm_sched_fini detach all sched_entities from the scheduler's run queues. This will satisfy drm_sched_entity_is_idle. Also wake up all those processes stuck in sched_entity flushing, as the scheduler main thread which wakes them up is stopped by now.

v2: Reverse order of drm_sched_rq_remove_entity and marking s_entity as stopped to prevent reinsertion back to rq due to race.

v3: Drop drm_sched_rq_remove_entity, only modify entity->stopped and check for it in drm_sched_entity_is_idle

Signed-off-by: Andrey Grodzovsky
Reviewed-by: Christian König
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-14-andrey.grodzovsky@amd.com
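The v3 approach (mark the entity as stopped and test that flag in the idle check) can be sketched in a userspace model. The struct and function names below are hypothetical stand-ins, not the kernel's definitions; only the `stopped` flag and the idea of checking it in the idle test come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a scheduler entity. */
struct entity_model {
	bool on_run_queue;   /* still attached to a run queue */
	size_t queued_jobs;  /* jobs waiting in the entity's job queue */
	bool stopped;        /* set once the scheduler is torn down */
};

/*
 * Idle check with the extra "stopped" condition: a stopped entity is
 * reported idle even though jobs are still queued, so a flush that
 * waits for idleness can finish.
 */
static bool entity_is_idle(const struct entity_model *e)
{
	return !e->on_run_queue || e->queued_jobs == 0 || e->stopped;
}

/*
 * Model of scheduler teardown: mark every entity stopped. The real fix
 * also wakes up waiters stuck in the flush; that part is not modelled.
 */
static void sched_fini_model(struct entity_model *entities, size_t n)
{
	for (size_t i = 0; i < n; i++)
		entities[i].stopped = true;
}
```

An entity that is on a run queue with two queued jobs is not idle before `sched_fini_model()` and is idle afterwards, which is the condition the hanging flush was waiting for.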

drm/sched: Make timeout timer rearm conditional.

2021-05-20T03:50:28+00:00

We don't want to rearm the timer if the driver hook reports that the device is gone.

v5: Update drm_gpu_sched_stat values in code.

Signed-off-by: Andrey Grodzovsky
Reviewed-by: Christian König
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-11-andrey.grodzovsky@amd.com
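A minimal userspace sketch of the conditional rearm follows. The enum values loosely mirror the `drm_gpu_sched_stat` values mentioned in the changelog, but the names here (`STAT_NOMINAL`, `STAT_ENODEV`, `sched_model`, `job_timedout`) and the "only rearm while jobs are pending" condition are assumptions made for the illustration, not the kernel's code.

```c
#include <stdbool.h>

/* Hypothetical status codes returned by the driver's timeout hook. */
enum sched_stat_model {
	STAT_NONE,
	STAT_NOMINAL,  /* device recovered, scheduling continues */
	STAT_ENODEV,   /* device is gone */
};

/* Hypothetical, simplified scheduler state. */
struct sched_model {
	bool timer_armed;
	int pending_jobs;
};

typedef enum sched_stat_model (*timedout_job_fn)(void);

/* Example driver hooks used to exercise both outcomes. */
static enum sched_stat_model hook_recovered(void)
{
	return STAT_NOMINAL;
}

static enum sched_stat_model hook_device_gone(void)
{
	return STAT_ENODEV;
}

/*
 * Timeout path: the timer has fired, so it is no longer armed. It is
 * rearmed only if the hook did not report the device as gone and there
 * is still work to watch.
 */
static void job_timedout(struct sched_model *sched, timedout_job_fn hook)
{
	enum sched_stat_model status;

	sched->timer_armed = false;
	status = hook();

	if (status != STAT_ENODEV && sched->pending_jobs > 0)
		sched->timer_armed = true;
}
```

Calling `job_timedout()` with `hook_recovered` leaves the timer armed when jobs are pending, while `hook_device_gone` leaves it disarmed.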

drm/scheduler: Change scheduled fence track v2

2021-05-05T07:26:36+00:00

Update the timestamp of the scheduled fence on HW completion of the previous fences. This allows more accurate tracking of the fence execution in HW.

v2 (chk): drop the flag check and improve the comment

Signed-off-by: David M Nieto
Signed-off-by: Roy Sun
Reviewed-by: Christian König
Signed-off-by: Christian König
Link: https://patchwork.freedesktop.org/patch/msgid/20210426062701.39732-1-Roy.Sun@amd.com

Merge drm/drm-next into drm-misc-next

2021-04-26T12:03:09+00:00

Christian needs some patches from drm/next

Signed-off-by: Maxime Ripard

drm/scheduler/sched_entity: Fix some function name disparity

2021-04-22T12:05:17+00:00

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/scheduler/sched_entity.c:204: warning: expecting prototype for drm_sched_entity_kill_jobs(). Prototype was for drm_sched_entity_kill_jobs_cb() instead
drivers/gpu/drm/scheduler/sched_entity.c:262: warning: expecting prototype for drm_sched_entity_cleanup(). Prototype was for drm_sched_entity_fini() instead
drivers/gpu/drm/scheduler/sched_entity.c:305: warning: expecting prototype for drm_sched_entity_fini(). Prototype was for drm_sched_entity_destroy() instead

Cc: David Airlie
Cc: Daniel Vetter
Cc: Sumit Semwal
Cc: "Christian König"
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones
Link: https://patchwork.freedesktop.org/patch/msgid/20210416143725.2769053-25-lee.jones@linaro.org
Reviewed-by: Christian König
Signed-off-by: Christian König

drm/amd/amdgpu implement tdr advanced mode

2021-04-09T20:45:45+00:00

[Why]
The previous tdr design treats the first job in job_timeout as the bad job. But sometimes a later bad compute job can block a good gfx job and cause an unexpected gfx job timeout, because the gfx and compute rings share internal GC HW.

[How]
This patch implements an advanced tdr mode. It involves an additional synchronous pre-resubmit step (Step0 Resubmit) before the normal resubmit step in order to find the real bad job.

1. At the Step0 Resubmit stage, it synchronously submits the first job and waits for it to be signaled. If it times out, we identify it as guilty and do a hw reset. After that, we do the normal resubmit step to resubmit the remaining jobs.
2. For a whole gpu reset (vram lost), resubmit the old way.

v2: squash in build fix (Alex)

Signed-off-by: Jack Zhang
Reviewed-by: Andrey Grodzovsky
Signed-off-by: Alex Deucher
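The two-stage flow described above can be sketched as a userspace model. Everything here is hypothetical (`job_model`, `ring_model`, `submit_and_wait`, `advanced_tdr`), and applying Step0 to the first pending job of each ring is an assumption of this sketch rather than something the commit message spells out. A job's `hangs` flag stands in for "the fence was not signaled before the timeout".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a job on a ring's pending list. */
struct job_model {
	int id;
	bool hangs;        /* true if the job never signals when run alone */
	bool guilty;       /* set when Step0 identifies it as the bad job */
	bool resubmitted;  /* set once the job has been submitted again */
};

/* Hypothetical model of a ring with its pending jobs, oldest first. */
struct ring_model {
	struct job_model *pending;
	size_t count;
};

/* Synchronous submit: returns true if the job signaled in time. */
static bool submit_and_wait(const struct job_model *job)
{
	return !job->hangs;
}

/*
 * Step0 Resubmit: run the first pending job of each ring on its own and
 * wait for it. A job that times out is marked guilty and triggers a hw
 * reset. The remaining jobs are then resubmitted normally.
 * Returns the id of the last job found guilty, or -1 if none was.
 */
static int advanced_tdr(struct ring_model *rings, size_t nrings,
			int *hw_resets)
{
	int guilty_id = -1;

	for (size_t r = 0; r < nrings; r++) {
		struct job_model *first;

		if (rings[r].count == 0)
			continue;

		first = &rings[r].pending[0];
		first->resubmitted = true;
		if (!submit_and_wait(first)) {
			first->guilty = true;
			guilty_id = first->id;
			(*hw_resets)++;
		}
	}

	/* Normal resubmit step for everything after the first job. */
	for (size_t r = 0; r < nrings; r++)
		for (size_t i = 1; i < rings[r].count; i++)
			rings[r].pending[i].resubmitted = true;

	return guilty_id;
}
```

With a gfx ring holding two good jobs and a compute ring whose first job hangs, the model blames the compute job and performs one reset, whereas a "first timed-out job is guilty" policy would have blamed the gfx job.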

drm/sched: select new rq even if there is only one v3

2021-03-08T13:06:17+00:00

This is necessary when changing priorities of an entity.

v2: test the sched_list instead of num_sched.
v3: set the sched_list to NULL when there is only one entry

Signed-off-by: Christian König
Reviewed-by: Sonny Jiang
Link: https://patchwork.freedesktop.org/patch/msgid/20210305125155.2312-1-christian.koenig@amd.com