diff options
author | Jack Zhang <Jack.Zhang1@amd.com> | 2021-03-08 07:41:27 +0300 |
---|---|---|
committer | Alex Deucher <alexander.deucher@amd.com> | 2021-04-09 23:45:45 +0300 |
commit | e6c6338f393b74ac0b303d567bb918b44ae7ad75 (patch) | |
tree | 589404ebe414ab0b537e4bbfe5388dce0e444557 /include/drm/gpu_scheduler.h | |
parent | 030bb4addb36ee94e286eb51486f990cac433825 (diff) | |
download | linux-e6c6338f393b74ac0b303d567bb918b44ae7ad75.tar.xz |
drm/amd/amdgpu implement tdr advanced mode
[Why]
Previous tdr design treats the first job in job_timeout as the bad job.
But sometimes a later bad compute job can block a good gfx job and
cause an unexpected gfx job timeout because gfx and compute ring share
internal GC HW mutually.
[How]
This patch implements an advanced tdr mode.It involves an additinal
synchronous pre-resubmit step(Step0 Resubmit) before normal resubmit
step in order to find the real bad job.
1. At Step0 Resubmit stage, it synchronously submits and pends for the
first job being signaled. If it gets timeout, we identify it as guilty
and do hw reset. After that, we would do the normal resubmit step to
resubmit left jobs.
2. For whole gpu reset(vram lost), do resubmit as the old way.
v2: squash in build fix (Alex)
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Diffstat (limited to 'include/drm/gpu_scheduler.h')
-rw-r--r-- | include/drm/gpu_scheduler.h | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index f888b5e9583a..10225a0a35d0 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -322,7 +322,10 @@ void drm_sched_wakeup(struct drm_gpu_scheduler *sched); void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad); void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery); void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched); +void drm_sched_resubmit_jobs_ext(struct drm_gpu_scheduler *sched, int max); void drm_sched_increase_karma(struct drm_sched_job *bad); +void drm_sched_reset_karma(struct drm_sched_job *bad); +void drm_sched_increase_karma_ext(struct drm_sched_job *bad, int type); bool drm_sched_dependency_optimized(struct dma_fence* fence, struct drm_sched_entity *entity); void drm_sched_fault(struct drm_gpu_scheduler *sched); |