<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/msm/msm_ringbuffer.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-02-01T23:24:10+00:00</updated>
<entry>
<title>Revert "drm/msm/gpu: Push gpu lock down past runpm"</title>
<updated>2024-02-01T23:24:10+00:00</updated>
<author>
<name>Rob Clark</name>
<email>robdclark@chromium.org</email>
</author>
<published>2024-01-09T18:22:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=917e9b7c2350e3e53162fcf5035e5f2d68e2cbed'/>
<id>urn:sha1:917e9b7c2350e3e53162fcf5035e5f2d68e2cbed</id>
<content type='text'>
This reverts commit abe2023b4cea192ab266b351fd38dc9dbd846df0.

Changing the locking order means that scheduler/msm_job_run() can race
with the recovery kthread worker, with the result that the GPU gets an
extra runpm get when we are trying to power it off.  Leaving the GPU in
an unrecovered state.

I'll need to come up with a different scheme for appeasing lockdep.

Signed-off-by: Rob Clark &lt;robdclark@chromium.org&gt;
Patchwork: https://patchwork.freedesktop.org/patch/573835/
</content>
</entry>
<entry>
<title>drm/msm/gem: Split out submit_unpin_objects() helper</title>
<updated>2023-12-10T18:23:13+00:00</updated>
<author>
<name>Rob Clark</name>
<email>robdclark@chromium.org</email>
</author>
<published>2023-11-21T00:38:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2d7d2c4e84802485a1e765bd0732d41526dcf25c'/>
<id>urn:sha1:2d7d2c4e84802485a1e765bd0732d41526dcf25c</id>
<content type='text'>
Untangle unpinning from unlock/unref loop.  The unpin only happens in
error paths so it is easier to decouple from the normal unlock path.

Since we never have an intermediate state where a subset of buffers
are pinned (ie. we never bail out of the pin or unpin loops) we can
replace the bo state flag bit with a global flag in the submit.

Signed-off-by: Rob Clark &lt;robdclark@chromium.org&gt;
Reviewed-by: Dmitry Baryshkov &lt;dmitry.baryshkov@linaro.org&gt;
Patchwork: https://patchwork.freedesktop.org/patch/568335/
</content>
</entry>
<entry>
<title>drm/sched: Convert drm scheduler to use a work queue rather than kthread</title>
<updated>2023-11-01T21:29:21+00:00</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2023-10-31T03:24:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a6149f0393699308fb00149be913044977bceb56'/>
<id>urn:sha1:a6149f0393699308fb00149be913044977bceb56</id>
<content type='text'>
In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd but let us explain the reasoning below.

1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to be the same completion even if targeting the same hardware
engine. This is because in Xe we have a firmware scheduler, the GuC,
which allowed to reorder, timeslice, and preempt submissions. If a using
shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
apart as the TDR expects submission order == completion order. Using a
dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.

2. In Xe submissions are done via programming a ring buffer (circular
buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the
limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow
control on the ring for free.

A problem with this design is currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_scheduler are used. To work around the scaling issue,
use a worker rather than kthread for submission / job cleanup.

v2:
  - (Rob Clark) Fix msm build
  - Pass in run work queue
v3:
  - (Boris) don't have loop in worker
v4:
  - (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
  - (Boris) default to ordered work queue
v6:
  - (Luben / checkpatch) fix alignment in msm_ringbuffer.c
  - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
  - (Luben) Update comment for drm_sched_wqueue_enqueue
  - (Luben) Positive check for submit_wq in drm_sched_init
  - (Luben) s/alloc_submit_wq/own_submit_wq
v7:
  - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
  - (Luben) Adjust var names / comments

Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com
Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
</content>
</entry>
<entry>
<title>drm/sched: Convert the GPU scheduler to variable number of run-queues</title>
<updated>2023-10-26T16:03:47+00:00</updated>
<author>
<name>Luben Tuikov</name>
<email>luben.tuikov@amd.com</email>
</author>
<published>2023-10-15T01:15:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=56e449603f0ac580700621a356d35d5716a62ce5'/>
<id>urn:sha1:56e449603f0ac580700621a356d35d5716a62ce5</id>
<content type='text'>
The GPU scheduler has now a variable number of run-queues, which are set up at
drm_sched_init() time. This way, each driver announces how many run-queues it
requires (supports) per each GPU scheduler it creates. Note, that run-queues
correspond to scheduler "priorities", thus if the number of run-queues is set
to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
i.e. single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler and
a scheduled entity.

Cc: Lucas Stach &lt;l.stach@pengutronix.de&gt;
Cc: Russell King &lt;linux+etnaviv@armlinux.org.uk&gt;
Cc: Qiang Yu &lt;yuq825@gmail.com&gt;
Cc: Rob Clark &lt;robdclark@gmail.com&gt;
Cc: Abhinav Kumar &lt;quic_abhinavk@quicinc.com&gt;
Cc: Dmitry Baryshkov &lt;dmitry.baryshkov@linaro.org&gt;
Cc: Danilo Krummrich &lt;dakr@redhat.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Boris Brezillon &lt;boris.brezillon@collabora.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Emma Anholt &lt;emma@anholt.net&gt;
Cc: etnaviv@lists.freedesktop.org
Cc: lima@lists.freedesktop.org
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com
</content>
</entry>
<entry>
<title>drm/msm/gpu: Push gpu lock down past runpm</title>
<updated>2023-08-15T17:09:30+00:00</updated>
<author>
<name>Rob Clark</name>
<email>robdclark@chromium.org</email>
</author>
<published>2023-08-10T21:31:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=abe2023b4cea192ab266b351fd38dc9dbd846df0'/>
<id>urn:sha1:abe2023b4cea192ab266b351fd38dc9dbd846df0</id>
<content type='text'>
Avoid holding gpu lock when calling runpm, to avoid this lockdep splat:

   ======================================================
   WARNING: possible circular locking dependency detected
   6.4.3-debug+ #14 Not tainted
   ------------------------------------------------------
   ring0/373 is trying to acquire lock:
   ffffffead86efb98 (prepare_lock){+.+.}-{3:3}, at: clk_prepare_lock+0x70/0x98

   but task is already holding lock:
   ffffff809cd19170 (&amp;gpu-&gt;lock){+.+.}-{3:3}, at: msm_job_run+0x7c/0x128 [msm]

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -&gt; #4 (&amp;gpu-&gt;lock){+.+.}-{3:3}:
          __mutex_lock+0xc8/0x388
          mutex_lock_nested+0x2c/0x38
          msm_job_run+0x7c/0x128 [msm]
          drm_sched_main+0x264/0x354 [gpu_sched]
          kthread+0xf0/0x100
          ret_from_fork+0x10/0x20

   -&gt; #3 (dma_fence_map){++++}-{0:0}:
          __dma_fence_might_wait+0x74/0xc0
          dma_resv_lockdep+0x1f0/0x2e8
          do_one_initcall+0xb4/0x214
          kernel_init_freeable+0x338/0x33c
          kernel_init+0x30/0x134
          ret_from_fork+0x10/0x20

   -&gt; #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
          fs_reclaim_acquire+0x7c/0x9c
          slab_pre_alloc_hook.constprop.0+0x40/0x250
          __kmem_cache_alloc_node+0x60/0x18c
          kmalloc_node_trace+0x40/0x84
          alloc_worker+0x2c/0x64
          init_rescuer+0x34/0xe0
          workqueue_init+0x168/0x1fc
          kernel_init_freeable+0x15c/0x33c
          kernel_init+0x30/0x134
          ret_from_fork+0x10/0x20

   -&gt; #1 (fs_reclaim){+.+.}-{0:0}:
          __fs_reclaim_acquire+0x3c/0x48
          fs_reclaim_acquire+0x50/0x9c
          slab_pre_alloc_hook.constprop.0+0x40/0x250
          __kmem_cache_alloc_node+0x60/0x18c
          kmalloc_trace+0x44/0x88
          clk_rcg2_dfs_determine_rate+0x60/0x214
          clk_core_determine_round_nolock+0xb8/0xf0
          clk_core_round_rate_nolock+0x84/0x118
          clk_core_round_rate_nolock+0xd8/0x118
          clk_round_rate+0x6c/0xd0
          geni_se_clk_tbl_get+0x78/0xc0
          geni_se_clk_freq_match+0x44/0xe4
          get_spi_clk_cfg+0x50/0xf4
          geni_spi_set_clock_and_bw+0x54/0x104
          spi_geni_prepare_message+0x130/0x174
          __spi_pump_transfer_message+0x200/0x4d8
          __spi_sync+0x13c/0x23c
          spi_sync_locked+0x18/0x24
          do_cros_ec_pkt_xfer_spi+0x124/0x3f0
          cros_ec_xfer_high_pri_work+0x28/0x3c
          kthread_worker_fn+0x14c/0x27c
          kthread+0xf0/0x100
          ret_from_fork+0x10/0x20

   -&gt; #0 (prepare_lock){+.+.}-{3:3}:
          __lock_acquire+0xdf8/0x109c
          lock_acquire+0x234/0x284
          __mutex_lock+0xc8/0x388
          mutex_lock_nested+0x2c/0x38
          clk_prepare_lock+0x70/0x98
          clk_prepare+0x24/0x50
          clk_bulk_prepare+0x50/0x9c
          a6xx_gmu_resume+0x94/0x800 [msm]
          a6xx_gmu_pm_resume+0x38/0x158 [msm]
          adreno_runtime_resume+0x2c/0x38 [msm]
          pm_generic_runtime_resume+0x30/0x44
          __rpm_callback+0x4c/0x134
          rpm_callback+0x78/0x7c
          rpm_resume+0x3a4/0x46c
          __pm_runtime_resume+0x78/0xbc
          pm_runtime_get_sync.isra.0+0x14/0x20 [msm]
          msm_gpu_submit+0x4c/0x12c [msm]
          msm_job_run+0x88/0x128 [msm]
          drm_sched_main+0x264/0x354 [gpu_sched]
          kthread+0xf0/0x100
          ret_from_fork+0x10/0x20

   other info that might help us debug this:
   Chain exists of:
     prepare_lock --&gt; dma_fence_map --&gt; &amp;gpu-&gt;lock
    Possible unsafe locking scenario:
          CPU0                    CPU1
          ----                    ----
     lock(&amp;gpu-&gt;lock);
                                  lock(dma_fence_map);
                                  lock(&amp;gpu-&gt;lock);
     lock(prepare_lock);

    *** DEADLOCK ***
   2 locks held by ring0/373:
    #0: ffffffead875ae50 (dma_fence_map){++++}-{0:0}, at: drm_sched_main+0x54/0x354 [gpu_sched]
    #1: ffffff809cd19170 (&amp;gpu-&gt;lock){+.+.}-{3:3}, at: msm_job_run+0x7c/0x128 [msm]

   stack backtrace:
   CPU: 2 PID: 373 Comm: ring0 Not tainted 6.4.3-debug+ #14
   Hardware name: Google Villager (rev1+) with LTE (DT)
   Call trace:
    dump_backtrace+0xb4/0xf0
    show_stack+0x20/0x30
    dump_stack_lvl+0x60/0x84
    dump_stack+0x18/0x24
    print_circular_bug+0x1cc/0x234
    check_noncircular+0x78/0xac
    __lock_acquire+0xdf8/0x109c
    lock_acquire+0x234/0x284
    __mutex_lock+0xc8/0x388
    mutex_lock_nested+0x2c/0x38
    clk_prepare_lock+0x70/0x98
    clk_prepare+0x24/0x50
    clk_bulk_prepare+0x50/0x9c
    a6xx_gmu_resume+0x94/0x800 [msm]
    a6xx_gmu_pm_resume+0x38/0x158 [msm]
    adreno_runtime_resume+0x2c/0x38 [msm]
    pm_generic_runtime_resume+0x30/0x44
    __rpm_callback+0x4c/0x134
    rpm_callback+0x78/0x7c
    rpm_resume+0x3a4/0x46c
    __pm_runtime_resume+0x78/0xbc
    pm_runtime_get_sync.isra.0+0x14/0x20 [msm]
    msm_gpu_submit+0x4c/0x12c [msm]
    msm_job_run+0x88/0x128 [msm]
    drm_sched_main+0x264/0x354 [gpu_sched]
    kthread+0xf0/0x100
    ret_from_fork+0x10/0x20

Signed-off-by: Rob Clark &lt;robdclark@chromium.org&gt;
Patchwork: https://patchwork.freedesktop.org/patch/552298/
</content>
</entry>
<entry>
<title>drm/msm: Remove vma use tracking</title>
<updated>2023-08-10T20:08:03+00:00</updated>
<author>
<name>Rob Clark</name>
<email>robdclark@chromium.org</email>
</author>
<published>2023-08-02T22:21:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7391c282ba0f0e82ac131658e2faf712215ed6a2'/>
<id>urn:sha1:7391c282ba0f0e82ac131658e2faf712215ed6a2</id>
<content type='text'>
This was not strictly necessary, as page unpinning (ie. shrinker) only
cares about the resv.  It did give us some extra sanity checking for
userspace controlled iova, and was useful to catch issues on kernel and
userspace side when enabling userspace iova.  But if userspace screws
this up, it just corrupts it's own gpu buffers and/or gets iova faults.
So we can just let userspace shoot it's own foot and drop the extra per-
buffer SUBMIT overhead.

Signed-off-by: Rob Clark &lt;robdclark@chromium.org&gt;
Acked-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Patchwork: https://patchwork.freedesktop.org/patch/551023/
</content>
</entry>
<entry>
<title>drm/msm: Use drm_gem_object in submit bos table</title>
<updated>2023-08-10T17:44:02+00:00</updated>
<author>
<name>Rob Clark</name>
<email>robdclark@chromium.org</email>
</author>
<published>2023-08-02T22:21:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6ba5daa5d5ad54b78aeac8912092f986e8d4c38f'/>
<id>urn:sha1:6ba5daa5d5ad54b78aeac8912092f986e8d4c38f</id>
<content type='text'>
Basically everywhere wants the base ptr type.  So store that instead of
msm_gem_object.

Signed-off-by: Rob Clark &lt;robdclark@chromium.org&gt;
Patchwork: https://patchwork.freedesktop.org/patch/551021/
</content>
</entry>
<entry>
<title>drm/msm: Take lru lock once per job_run</title>
<updated>2023-08-10T17:44:01+00:00</updated>
<author>
<name>Rob Clark</name>
<email>robdclark@chromium.org</email>
</author>
<published>2023-08-02T22:21:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1a8b612ef09bcba3708443339adfad9802d3e9d8'/>
<id>urn:sha1:1a8b612ef09bcba3708443339adfad9802d3e9d8</id>
<content type='text'>
Rather than acquiring it and dropping it for each individual obj.

Signed-off-by: Rob Clark &lt;robdclark@chromium.org&gt;
Patchwork: https://patchwork.freedesktop.org/patch/551019/
</content>
</entry>
<entry>
<title>drm/msm/gem: Avoid obj lock in job_run()</title>
<updated>2023-03-25T23:31:44+00:00</updated>
<author>
<name>Rob Clark</name>
<email>robdclark@chromium.org</email>
</author>
<published>2023-03-20T14:43:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=17b704f1c0fb9150551567cb7a5414fb761b57ea'/>
<id>urn:sha1:17b704f1c0fb9150551567cb7a5414fb761b57ea</id>
<content type='text'>
Now that everything that controls which LRU an obj lives in *except* the
backing pages is protected by the LRU lock, add a special path to unpin
in the job_run() path, where we are assured that we already have backing
pages and will not be racing against eviction (because the GEM object's
dma_resv contains the fence that will be signaled when the submit/job
completes).

Signed-off-by: Rob Clark &lt;robdclark@chromium.org&gt;
Patchwork: https://patchwork.freedesktop.org/patch/527845/
Link: https://lore.kernel.org/r/20230320144356.803762-10-robdclark@gmail.com
</content>
</entry>
<entry>
<title>drm/msm: Decouple vma tracking from obj lock</title>
<updated>2023-03-25T23:31:44+00:00</updated>
<author>
<name>Rob Clark</name>
<email>robdclark@chromium.org</email>
</author>
<published>2023-03-20T14:43:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b14b8c5f0eaf363e33c5410c96a6259dee6bd78e'/>
<id>urn:sha1:b14b8c5f0eaf363e33c5410c96a6259dee6bd78e</id>
<content type='text'>
We need to use the inuse count to track that a BO is pinned until
we have the hw_fence.  But we want to remove the obj lock from the
job_run() path as this could deadlock against reclaim/shrinker
(because it is blocking the hw_fence from eventually being signaled).
So split that tracking out into a per-vma lock with narrower scope.

Signed-off-by: Rob Clark &lt;robdclark@chromium.org&gt;
Patchwork: https://patchwork.freedesktop.org/patch/527839/
Link: https://lore.kernel.org/r/20230320144356.803762-5-robdclark@gmail.com
</content>
</entry>
</feed>
