kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c, branch v4.18.18

drm/amdgpu: Skip drm_sched_entity related ops for KIQ ring.

2018-05-18T21:08:16+00:00

Following change 75fbed2 we never initialize or use the GPU scheduler for KIQ and hence we need to skip KIQ ring when iterating amdgpu_ctx's scheduler entites. Signed-off-by: Andrey Grodzovsky Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/scheduler: remove unused parameter

2018-05-15T18:44:27+00:00

this patch also effect the amdgpu and etnaviv drivers which use the function drm_sched_entity_init Signed-off-by: Nayan Deshmukh Suggested-by: Christian König Acked-by: Lucas Stach Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: Switch to interruptable wait to recover from ring hang.

2018-05-15T18:44:18+00:00

v2: Use dma_fence_wait instead of dma_fence_wait_timeout(...,MAX_SCHEDULE_TIMEOUT) Avoid printing error message for ERESTARTSYS Originally-by: David Panariti Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/gpu-sched: fix force APP kill hang(v4)

2018-05-15T18:43:17+00:00

issue: there are VMC page fault occurred if force APP kill during 3dmark test, the cause is in entity_fini we manually signal all those jobs in entity's queue which confuse the sync/dep mechanism: 1)page fault occurred in sdma's clear job which operate on shadow buffer, and shadow buffer's Gart table is cleaned by ttm_bo_release since the fence in its reservation was fake signaled by entity_fini() under the case of SIGKILL received. 2)page fault occurred in gfx' job because during the lifetime of gfx job we manually fake signal all jobs from its entity in entity_fini(), thus the unmapping/clear PTE job depend on those result fence is satisfied and sdma start clearing the PTE and lead to GFX page fault. fix: 1)should at least wait all jobs already scheduled complete in entity_fini() if SIGKILL is the case. 2)if a fence signaled and try to clear some entity's dependency, should set this entity guilty to prevent its job really run since the dependency is fake signaled. v2: splitting drm_sched_entity_fini() into two functions: 1)The first one is does the waiting, removes the entity from the runqueue and returns an error when the process was killed. 2)The second one then goes over the entity, install it as completion signal for the remaining jobs and signals all jobs with an error code. v3: 1)Replace the fini1 and fini2 with better name 2)Call the first part before the VM teardown in amdgpu_driver_postclose_kms() and the second part after the VM teardown 3)Keep the original function drm_sched_entity_fini to refine the code. v4: 1)Rename entity->finished to entity->last_scheduled; 2)Rename drm_sched_entity_fini_job_cb() to drm_sched_entity_kill_jobs_cb(); 3)Pass NULL to drm_sched_entity_fini_job_cb() if -ENOENT; 4)Replace the type of entity->fini_status with "int"; 5)Remove the check about entity->finished. Signed-off-by: Monk Liu Signed-off-by: Emily Deng Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm: move amd_gpu_scheduler into common location

2017-12-07T16:51:56+00:00

This moves and renames the AMDGPU scheduler to a common location in DRM in order to facilitate re-use by other drivers. This is mostly a straight forward rename with no code changes. One notable exception is the function to_drm_sched_fence(), which is no longer a inline header function to avoid the need to export the drm_sched_fence_ops_scheduled and drm_sched_fence_ops_finished structures. Reviewed-by: Chunming Zhou Tested-by: Dieter Nützel Acked-by: Alex Deucher Signed-off-by: Lucas Stach Signed-off-by: Alex Deucher

drm/amdgpu:implement ctx query2

2017-12-04T21:33:12+00:00

this query will give flag bits to indicate what happend on the given context Signed-off-by: Monk Liu Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu:don't change ctx->reset_couner upon query

2017-12-04T21:33:11+00:00

reset_counter marks the reset counter number once the context is created, shouldn't be changed due to query. To keep U/K interface on the ctx_query and keep ctx's reset_counter logic compatible with GPU RESET feature, now use another var named "reset_counter_query" to replace the original checked & updated in amdgpu_ctx_query. Signed-off-by: Monk Liu Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu:pass ctx->guilty address to entity init

2017-12-04T21:33:09+00:00

this way the real interested guilty is connected to entity->guilty pointer, and we can use entity->pointer later in gpu recovery procedure Signed-off-by: Monk Liu Reviewed-by: Chunming Zhou Signed-off-by: Alex Deucher

drm/amd/scheduler:introduce guilty pointer member

2017-12-04T21:33:09+00:00

this member will be used later, it will points to the real var inside of context and CS_SUBMIT & gpu schdduler can decide if skip a job depends on context->guilty or *entity->guilty Signed-off-by: Monk Liu Reviewed-by: Chunming Zhou Signed-off-by: Alex Deucher

drm/amdgpu: move the VRAM lost counter per context

2017-10-19T19:27:04+00:00

Instead of per device track the VRAM lost per context and return ECANCELED instead of ENODEV. Signed-off-by: Christian König Reviewed-by: Nicolai Hähnle Signed-off-by: Alex Deucher