kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c, branch v4.14.152

drm/amdgpu: bypass lru touch for KIQ ring submission

2017-12-20T09:10:23+00:00

[ Upstream commit dce1e131dd4dc68099ff1b70aa03cd2d0acf8639 ]
KIQ ring submission is used for register access on SR-IOV VFs, which can happen with IRQs either enabled or disabled. A lock inversion can therefore occur on adev->ring_lru_list_lock, and the LRU touch is useless for the KIQ ring anyway and only adds overhead.
Signed-off-by: Pixel Ding
Reviewed-by: Monk Liu
Reviewed-by: Christian König
Signed-off-by: Alex Deucher
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

drm/amdgpu: set sched_hw_submission higher for KIQ (v3)

2017-08-24T15:48:45+00:00

KIQ doesn't really use the GPU scheduler. The base drivers generally use the KIQ ring directly rather than submitting IBs. However, amdgpu_sched_hw_submission (which defaults to 2) limits the number of outstanding fences to 2. KFD uses the KIQ for TLB flushes, and the 2-fence limit hurts performance when several KFD processes are running.
v2: move some expressions to one line; change KIQ sched_hw_submission to at least 16
v3: bump to 256
Reviewed-by: Christian König
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
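The effect of this change can be sketched in standalone C. This is a minimal illustration and not the driver code: the enum, the constant name, and hw_submission_limit() are hypothetical, and only the default of 2 and the KIQ floor of 256 are taken from the commit message.

```c
/* Hypothetical ring types; the real driver defines its own enum. */
enum ring_type { RING_TYPE_GFX, RING_TYPE_COMPUTE, RING_TYPE_KIQ };

/* Floor for the KIQ ring, per the v3 note in the commit message. */
#define KIQ_MIN_HW_SUBMISSION 256u

/*
 * Return the number of outstanding fences allowed on a ring.
 * `configured` stands in for amdgpu_sched_hw_submission (default 2).
 * Only the KIQ ring is raised to the floor; other rings are untouched.
 */
static unsigned int hw_submission_limit(enum ring_type type,
                                        unsigned int configured)
{
    if (type == RING_TYPE_KIQ && configured < KIQ_MIN_HW_SUBMISSION)
        return KIQ_MIN_HW_SUBMISSION;
    return configured;
}
```

Taking the larger of the two values, rather than hard-coding 256, keeps a user-supplied setting above 256 intact.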

drm/amdgpu: don't finish the ring if not initialized

2017-08-15T18:46:17+00:00

If a ring is not initialized, it should not be finished either. For example, in Vega10's SR-IOV environment the UVD decode ring is not initialized, but it is still finished in amdgpu_uvd_sw_fini, because the UVD driver puts the UVD decode ring's finish operation into amdgpu_uvd_sw_fini rather than uvd_vXXX_0_sw_fini. This causes unloading of the amdgpu module to fail.
Signed-off-by: Trigger Huang
Reviewed-by: Monk Liu
Reviewed-by: Christian König
Signed-off-by: Alex Deucher
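The guard described here can be sketched as follows. This is a reduced, hypothetical model: struct ring, its initialized flag, and fini_count are invented for illustration, and the commit message does not say how the driver itself detects an uninitialized ring.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced ring state. */
struct ring {
    bool initialized;   /* set once initialization succeeds */
    int  fini_count;    /* counts teardown runs, for illustration only */
};

static void ring_init(struct ring *ring)
{
    ring->initialized = true;
}

/* Tear the ring down only if it was set up; otherwise do nothing. */
static void ring_fini(struct ring *ring)
{
    if (ring == NULL || !ring->initialized)
        return;
    ring->fini_count++;
    ring->initialized = false;
}
```

With the early return, a shared teardown path can call ring_fini() on every ring without knowing which ones were skipped during initialization.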

drm/amdgpu: use 256 bit buffers for all wb allocations (v2)

2017-08-15T18:46:08+00:00

May waste a bit of memory, but simplifies the interface significantly.
v2: convert internal accounting to use 256bit slots
Reviewed-by: Christian König
Signed-off-by: Alex Deucher
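A minimal sketch of slot-based writeback accounting, assuming a bitmap with one bit per 256-bit slot. The pool size and the names wb_pool, wb_get, and wb_free are hypothetical and are not the driver's interface; the only fixed fact is that 256 bits equal 8 dwords.

```c
#include <stdint.h>

#define WB_SLOTS        32u  /* hypothetical pool size */
#define DWORDS_PER_SLOT 8u   /* 256 bits / 32 bits per dword */

struct wb_pool {
    uint32_t used;  /* one bit per 256-bit slot */
};

/* Claim the first free slot and report its dword offset. Returns 0 or -1. */
static int wb_get(struct wb_pool *pool, uint32_t *dw_offset)
{
    for (uint32_t slot = 0; slot < WB_SLOTS; slot++) {
        if (!(pool->used & (UINT32_C(1) << slot))) {
            pool->used |= UINT32_C(1) << slot;
            *dw_offset = slot * DWORDS_PER_SLOT;
            return 0;
        }
    }
    return -1;  /* pool exhausted */
}

/* Release the slot that contains the given dword offset. */
static void wb_free(struct wb_pool *pool, uint32_t dw_offset)
{
    uint32_t slot = dw_offset / DWORDS_PER_SLOT;

    if (slot < WB_SLOTS)
        pool->used &= ~(UINT32_C(1) << slot);
}
```

Because every allocation has the same size, callers need only one get/free pair, at the price of unused dwords in slots that hold a single 32-bit value.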

drm/amdgpu: make wb 256bit function names consistent

2017-08-15T18:45:59+00:00

Use a lower case b to be consistent with the other wb functions.
Reviewed-by: Christian König
Signed-off-by: Alex Deucher

drm/amdgpu:fix gfx fence allocate size

2017-07-25T20:29:26+00:00

1. For SR-IOV, we need 8 dwords for the gfx fence due to CP behaviour.
2. Clean up the wrong logic in the wptr/rptr wb alloc and free.
Change-Id: Ifbfed17a4621dae57244942ffac7de1743de0294
Signed-off-by: Monk Liu
Signed-off-by: Xiangliang Yu
Reviewed-by: Alex Deucher
Reviewed-by: Christian König
Signed-off-by: Alex Deucher

drm/amdgpu: Move compute vm bug logic to amdgpu_vm.c

2017-06-01T20:00:20+00:00

In review, Christian asked to keep the logic inside amdgpu_vm.c, at the cost of it being slightly slower. The loop is still optimized out with this patch.
v2: remove the if statement. Now it is not slower.
Signed-off-by: Alex Xie
Reviewed-by: Christian König
Signed-off-by: Alex Deucher

drm/amdgpu: guarantee bijective mapping of ring ids for LRU v3

2017-05-31T20:49:03+00:00

Depending on usage patterns, the current LRU policy may create a non-injective mapping between userspace ring ids and kernel rings. This behaviour is undesired, as apps that attempt to fill all HW blocks would be unable to reach some of them. This change forces the LRU policy to create bijective mappings only.
v2: compress ring_blacklist
v3: simplify amdgpu_ring_is_blacklisted() logic
Signed-off-by: Andres Rodriguez
Reviewed-by: Nicolai Hähnle
Signed-off-by: Alex Deucher
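One way to picture the blacklist idea is the sketch below. It is a simplified, hypothetical model: struct mapper, mapper_init(), is_blacklisted(), and map_ring() are invented names, timestamps replace the driver's LRU list, and the ring count of 4 is arbitrary. A queue that already backs one user ring id is skipped when another id of the same client is mapped, so no two ids share a queue.

```c
#include <stdbool.h>

#define NUM_RINGS 4  /* hypothetical: user ring ids and HW queues, one-to-one */

struct mapper {
    int           mapped[NUM_RINGS];     /* user id -> HW queue, -1 if unmapped */
    unsigned long last_used[NUM_RINGS];  /* per-queue timestamp, 0 = never used */
    unsigned long clock;                 /* monotonically increasing tick */
};

static void mapper_init(struct mapper *m)
{
    for (int i = 0; i < NUM_RINGS; i++) {
        m->mapped[i] = -1;
        m->last_used[i] = 0;
    }
    m->clock = 0;
}

/* A queue is blacklisted if some user ring id already maps to it. */
static bool is_blacklisted(const struct mapper *m, int queue)
{
    for (int id = 0; id < NUM_RINGS; id++) {
        if (m->mapped[id] == queue)
            return true;
    }
    return false;
}

/* Map a user ring id to the least recently used queue not yet taken. */
static int map_ring(struct mapper *m, int user_id)
{
    int best = -1;

    if (user_id < 0 || user_id >= NUM_RINGS)
        return -1;

    if (m->mapped[user_id] >= 0) {
        /* Existing mapping is stable; just refresh its timestamp. */
        m->last_used[m->mapped[user_id]] = ++m->clock;
        return m->mapped[user_id];
    }

    for (int q = 0; q < NUM_RINGS; q++) {
        if (is_blacklisted(m, q))
            continue;
        if (best < 0 || m->last_used[q] < m->last_used[best])
            best = q;
    }
    if (best < 0)
        return -1;

    m->mapped[user_id] = best;
    m->last_used[best] = ++m->clock;
    return best;
}
```

Without the is_blacklisted() check, a queue already mapped to one id could still be the least recently used and be handed out a second time, which is the non-injective case the commit message describes.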

drm/amdgpu: implement lru amdgpu_queue_mgr policy for compute v4

2017-05-31T20:49:02+00:00

Use an LRU policy to map usermode rings to HW compute queues. Most compute clients use one queue, and usually the first queue available. This results in poor pipe/queue work distribution when multiple compute apps are running. In most cases pipe 0 queue 0 is the only queue that gets used. In order to better distribute work across multiple HW queues, we adopt a policy to map the usermode ring ids to the LRU HW queue. This fixes a large majority of multi-app compute workloads sharing the same HW queue, even though 7 other queues are available.
v2: use ring->funcs->type instead of ring->hw_ip
v3: remove amdgpu_queue_mapper_funcs
v4: change ring_lru_list_lock to spinlock, grab only once in lru_get()
Signed-off-by: Andres Rodriguez
Signed-off-by: Alex Deucher
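The LRU selection can be sketched with per-queue timestamps. This is an assumption-laden illustration: the driver keeps an LRU list protected by a spinlock, which is omitted here, and struct queue_lru and this lru_get() are hypothetical. The queue count of 8 is inferred from the "7 other queues" remark in the commit message.

```c
#include <stddef.h>

#define NUM_HW_QUEUES 8  /* inferred from "7 other queues" in the message */

struct queue_lru {
    unsigned long last_used[NUM_HW_QUEUES]; /* 0 = never used */
    unsigned long clock;                    /* monotonically increasing tick */
};

/* Pick the least recently used queue and mark it as just used. */
static size_t lru_get(struct queue_lru *lru)
{
    size_t best = 0;

    for (size_t i = 1; i < NUM_HW_QUEUES; i++) {
        if (lru->last_used[i] < lru->last_used[best])
            best = i;
    }
    lru->last_used[best] = ++lru->clock;
    return best;
}
```

Successive clients that each ask for "the first queue" are spread across all queues in turn instead of piling onto queue 0.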

drm/amdgpu: Optimize a function called by every IB sheduling

2017-05-31T18:16:38+00:00

Move several if statements and a loop statement from run time to initialization time.
Signed-off-by: Alex Xie
Reviewed-by: Chunming Zhou
Signed-off-by: Alex Deucher