<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-04T12:19:54+00:00</updated>
<entry>
<title>drm/amdgpu: Skip loading SDMA_RS64 in VF</title>
<updated>2026-03-04T12:19:54+00:00</updated>
<author>
<name>YuBiao Wang</name>
<email>YuBiao.Wang@amd.com</email>
</author>
<published>2025-11-12T07:16:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2b9e44b3849fdd20cfd88ec51102a03e60f0af6b'/>
<id>urn:sha1:2b9e44b3849fdd20cfd88ec51102a03e60f0af6b</id>
<content type='text'>
[ Upstream commit 39c21b81112321cbe1267b02c77ecd2161ce19aa ]

VFs use the PF SDMA ucode and are unable to load SDMA_RS64.

Signed-off-by: YuBiao Wang &lt;YuBiao.Wang@amd.com&gt;
Signed-off-by: Victor Skvortsov &lt;Victor.Skvortsov@amd.com&gt;
Reviewed-by: Gavin Wan &lt;gavin.wan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Fix NULL pointer dereference in VRAM logic for APU devices</title>
<updated>2025-10-13T18:14:15+00:00</updated>
<author>
<name>Jesse.Zhang</name>
<email>Jesse.Zhang@amd.com</email>
</author>
<published>2025-10-13T05:46:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=883f309add55060233bf11c1ea6947140372920f'/>
<id>urn:sha1:883f309add55060233bf11c1ea6947140372920f</id>
<content type='text'>
Previously, APU platforms (and other scenarios with uninitialized VRAM managers)
triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root
cause is not that the `struct ttm_resource_manager *man` pointer itself is NULL,
but that `man-&gt;bdev` (the backing device pointer within the manager) remains
uninitialized (NULL) on APUs—since APUs lack dedicated VRAM and do not fully
set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to
acquire `man-&gt;bdev-&gt;lru_lock`, it dereferences the NULL `man-&gt;bdev`, leading to
a kernel OOPS.

1. **amdgpu_cs.c**: Extend the existing bandwidth control check in
   `amdgpu_cs_get_threshold_for_moves()` to include a check for
   `ttm_resource_manager_used()`. If the manager is not used (uninitialized
   `bdev`), return 0 for migration thresholds immediately—skipping VRAM-specific
   logic that would trigger the NULL dereference.

2. **amdgpu_kms.c**: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info
   reporting to use a conditional: if the manager is used, return the real VRAM
   usage; otherwise, return 0. This avoids accessing `man-&gt;bdev` when it is
   NULL.

3. **amdgpu_virt.c**: Modify the vf2pf (virtual function to physical function)
   data write path. Use `ttm_resource_manager_used()` to check validity: if the
   manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set
   `fb_usage` to 0 (APUs have no discrete framebuffer to report).

This approach is more robust than APU-specific checks because it:
- Works for all scenarios where the VRAM manager is uninitialized (not just APUs),
- Aligns with TTM's design by using its native helper function,
- Preserves correct behavior for discrete GPUs (which have fully initialized
  `man-&gt;bdev` and pass the `ttm_resource_manager_used()` check).

v4: use ttm_resource_manager_used(&amp;adev-&gt;mman.vram_mgr.manager) instead of checking the adev-&gt;gmc.is_app_apu flag (Christian)

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Suggested-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Introduce VF critical region check for RAS poison injection</title>
<updated>2025-09-18T13:43:02+00:00</updated>
<author>
<name>Xiang Liu</name>
<email>xiang.liu@amd.com</email>
</author>
<published>2025-08-19T04:51:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f1fdeb3d07a4c2dcfc8cf0e20c0bddb68e8bc607'/>
<id>urn:sha1:f1fdeb3d07a4c2dcfc8cf0e20c0bddb68e8bc607</id>
<content type='text'>
The SRIOV guest send requet to host to check whether the poison
injection address is in VF critical region or not via mabox.

Signed-off-by: Xiang Liu &lt;xiang.liu@amd.com&gt;
Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Implement Runtime Bad Page query for VFs</title>
<updated>2025-05-07T21:41:49+00:00</updated>
<author>
<name>Ellen Pan</name>
<email>yunru.pan@amd.com</email>
</author>
<published>2025-04-29T20:48:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5da3d8820dd3202d5ae871050e45facb2787235d'/>
<id>urn:sha1:5da3d8820dd3202d5ae871050e45facb2787235d</id>
<content type='text'>
Host will send a notification when new bad pages are available.

Uopn guest request, the first 256 bad page addresses
will be placed into the PF2VF region.
Guest should pause the PF2VF worker thread while
the copy is in progress.

Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Signed-off-by: Ellen Pan &lt;yunru.pan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Fix CPER error handling on VFs</title>
<updated>2025-04-07T22:00:40+00:00</updated>
<author>
<name>Victor Skvortsov</name>
<email>Victor.Skvortsov@amd.com</email>
</author>
<published>2025-03-30T18:54:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f9fbc338811c8d123f0c05cd699ffa3ae7f08d34'/>
<id>urn:sha1:f9fbc338811c8d123f0c05cd699ffa3ae7f08d34</id>
<content type='text'>
CPER read will loop infinitely if an error is encountered and
the more bit is set. Add error checks to break upon failure.

v2: added function pointer checks

Suggested-by: Tony Yi &lt;Tony.Yi@amd.com&gt;
Signed-off-by: Victor Skvortsov &lt;Victor.Skvortsov@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd/amdgpu: Fix MES init sequence</title>
<updated>2025-03-14T03:16:08+00:00</updated>
<author>
<name>Shaoyun Liu</name>
<email>shaoyun.liu@amd.com</email>
</author>
<published>2025-03-10T16:38:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f81cd793119e7f4b426a825435d49cc10a081c7a'/>
<id>urn:sha1:f81cd793119e7f4b426a825435d49cc10a081c7a</id>
<content type='text'>
When MES is been used , the set_hw_resource_1 API is required to
initialize MES internal context correctly

Signed-off-by: Shaoyun Liu &lt;shaoyun.liu@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add support for CPERs on virtualization</title>
<updated>2025-03-05T15:47:03+00:00</updated>
<author>
<name>Tony Yi</name>
<email>Tony.Yi@amd.com</email>
</author>
<published>2025-02-26T22:03:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a91d91b6004796b868374394962331a1322da7ab'/>
<id>urn:sha1:a91d91b6004796b868374394962331a1322da7ab</id>
<content type='text'>
Add support for CPERs on VFs.

VFs do not receive PMFW messages directly; as such, they need to
query them from the host. To avoid hitting host event guard,
CPER queries need to be rate limited. CPER queries share the same
RAS telemetry buffer as error count query, so a mutex protecting
the shared buffer was added as well.

For readability, the amdgpu_detect_virtualization was refactored
into multiple individual functions.

Signed-off-by: Tony Yi &lt;Tony.Yi@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV</title>
<updated>2025-02-19T20:14:38+00:00</updated>
<author>
<name>Srinivasan Shanmugam</name>
<email>srinivasan.shanmugam@amd.com</email>
</author>
<published>2025-02-14T08:52:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dc0297f3198bd60108ccbd167ee5d9fa4af31ed0'/>
<id>urn:sha1:dc0297f3198bd60108ccbd167ee5d9fa4af31ed0</id>
<content type='text'>
RLCG Register Access is a way for virtual functions to safely access GPU
registers in a virtualized environment., including TLB flushes and
register reads. When multiple threads or VFs try to access the same
registers simultaneously, it can lead to race conditions. By using the
RLCG interface, the driver can serialize access to the registers. This
means that only one thread can access the registers at a time,
preventing conflicts and ensuring that operations are performed
correctly. Additionally, when a low-priority task holds a mutex that a
high-priority task needs, ie., If a thread holding a spinlock tries to
acquire a mutex, it can lead to priority inversion. register access in
amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.

The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
called, which attempts to acquire the mutex. This function is invoked
from amdgpu_sriov_wreg, which in turn is called from
gmc_v11_0_flush_gpu_tlb.

The [ BUG: Invalid wait context ] indicates that a thread is trying to
acquire a mutex while it is in a context that does not allow it to sleep
(like holding a spinlock).

Fixes the below:

[  253.013423] =============================
[  253.013434] [ BUG: Invalid wait context ]
[  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G     U     OE
[  253.013464] -----------------------------
[  253.013475] kworker/0:1/10 is trying to lock:
[  253.013487] ffff9f30542e3cf8 (&amp;adev-&gt;virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.013815] other info that might help us debug this:
[  253.013827] context-{4:4}
[  253.013835] 3 locks held by kworker/0:1/10:
[  253.013847]  #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[  253.013877]  #1: ffffb789c008be40 ((work_completion)(&amp;wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[  253.013905]  #2: ffff9f3054281838 (&amp;adev-&gt;gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[  253.014154] stack backtrace:
[  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G     U     OE      6.12.0-amdstaging-drm-next-lol-050225 #14
[  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[  253.014224] Workqueue: events work_for_cpu_fn
[  253.014241] Call Trace:
[  253.014250]  &lt;TASK&gt;
[  253.014260]  dump_stack_lvl+0x9b/0xf0
[  253.014275]  dump_stack+0x10/0x20
[  253.014287]  __lock_acquire+0xa47/0x2810
[  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014321]  lock_acquire+0xd1/0x300
[  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014562]  ? __lock_acquire+0xa6b/0x2810
[  253.014578]  __mutex_lock+0x85/0xe20
[  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014782]  ? sched_clock_noinstr+0x9/0x10
[  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014808]  ? local_clock_noinstr+0xe/0xc0
[  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.015029]  mutex_lock_nested+0x1b/0x30
[  253.015044]  ? mutex_lock_nested+0x1b/0x30
[  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]
[  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[  253.017057]  amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
[  253.017493]  local_pci_probe+0x4b/0xb0
[  253.017746]  work_for_cpu_fn+0x1a/0x30
[  253.017995]  process_one_work+0x21e/0x680
[  253.018248]  worker_thread+0x190/0x330
[  253.018500]  ? __pfx_worker_thread+0x10/0x10
[  253.018746]  kthread+0xe7/0x120
[  253.018988]  ? __pfx_kthread+0x10/0x10
[  253.019231]  ret_from_fork+0x3c/0x60
[  253.019468]  ? __pfx_kthread+0x10/0x10
[  253.019701]  ret_from_fork_asm+0x1a/0x30
[  253.019939]  &lt;/TASK&gt;

v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian).

Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface")
Cc: lin cao &lt;lin.cao@amd.com&gt;
Cc: Jingwen Chen &lt;Jingwen.Chen2@amd.com&gt;
Cc: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Cc: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Suggested-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Skip err_count sysfs creation on VF unsupported RAS blocks</title>
<updated>2025-02-13T02:02:59+00:00</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2025-01-21T03:00:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=04893397766a2b2f1bc7fe5c6414e4c0846ed171'/>
<id>urn:sha1:04893397766a2b2f1bc7fe5c6414e4c0846ed171</id>
<content type='text'>
VFs are not able to query error counts for all RAS blocks. Rather than
returning error for queries on these blocks, skip sysfs the creation
all together.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/admgpu: replace kmalloc() and memcpy() with kmemdup()</title>
<updated>2024-12-18T17:39:08+00:00</updated>
<author>
<name>Mirsad Todorovac</name>
<email>mtodorovac69@gmail.com</email>
</author>
<published>2024-12-17T22:58:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a21ab06b8c2d8d25c4a83bdf39542834b1f3beae'/>
<id>urn:sha1:a21ab06b8c2d8d25c4a83bdf39542834b1f3beae</id>
<content type='text'>
The static analyser tool gave the following advice:

./drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:1266:7-14: WARNING opportunity for kmemdup

 → 1266         tmp = kmalloc(used_size, GFP_KERNEL);
   1267         if (!tmp)
   1268                 return -ENOMEM;
   1269
 → 1270         memcpy(tmp, &amp;host_telemetry-&gt;body.error_count, used_size);

Replacing kmalloc() + memcpy() with kmemdump() doesn't change semantics.
Original code works without fault, so this is not a bug fix but proposed improvement.

Link: https://lwn.net/Articles/198928/
Fixes: 84a2947ecc85 ("drm/amdgpu: Implement virt req_ras_err_count")
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: "Christian König" &lt;christian.koenig@amd.com&gt;
Cc: Xinhui Pan &lt;Xinhui.Pan@amd.com&gt;
Cc: David Airlie &lt;airlied@gmail.com&gt;
Cc: Simona Vetter &lt;simona@ffwll.ch&gt;
Cc: Zhigang Luo &lt;Zhigang.Luo@amd.com&gt;
Cc: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Cc: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Cc: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Cc: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Cc: Jack Xiao &lt;Jack.Xiao@amd.com&gt;
Cc: Vignesh Chander &lt;Vignesh.Chander@amd.com&gt;
Cc: Danijel Slivka &lt;danijel.slivka@amd.com&gt;
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Mirsad Todorovac &lt;mtodorovac69@gmail.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
