<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-09-18T13:43:02+00:00</updated>
<entry>
<title>drm/amdgpu: Introduce VF critical region check for RAS poison injection</title>
<updated>2025-09-18T13:43:02+00:00</updated>
<author>
<name>Xiang Liu</name>
<email>xiang.liu@amd.com</email>
</author>
<published>2025-08-19T04:51:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f1fdeb3d07a4c2dcfc8cf0e20c0bddb68e8bc607'/>
<id>urn:sha1:f1fdeb3d07a4c2dcfc8cf0e20c0bddb68e8bc607</id>
<content type='text'>
The SRIOV guest send requet to host to check whether the poison
injection address is in VF critical region or not via mabox.

Signed-off-by: Xiang Liu &lt;xiang.liu@amd.com&gt;
Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add virtual device capabilities</title>
<updated>2025-09-15T20:55:48+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2025-09-04T12:17:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=780f7a45e56e9fd2bbae146687c054fcf6a08743'/>
<id>urn:sha1:780f7a45e56e9fd2bbae146687c054fcf6a08743</id>
<content type='text'>
Add a member to define the capabilities of virtual device.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: refactor bad_page_work for corner case handling</title>
<updated>2025-08-15T17:07:30+00:00</updated>
<author>
<name>Chenglei Xie</name>
<email>Chenglei.Xie@amd.com</email>
</author>
<published>2025-08-07T20:52:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d2fa0ec6e0aea6ffbd41939d0c7671db16991ca4'/>
<id>urn:sha1:d2fa0ec6e0aea6ffbd41939d0c7671db16991ca4</id>
<content type='text'>
When a poison is consumed on the guest before the guest receives the host's poison creation msg, a corner case may occur to have poison_handler complete processing earlier than it should to cause the guest to hang waiting for the req_bad_pages reply during a VF FLR, resulting in the VM becoming inaccessible in stress tests.

To fix this issue, this patch refactored the mailbox sequence by seperating the bad_page_work into two parts req_bad_pages_work and handle_bad_pages_work.
Old sequence:
  1.Stop data exchange work
  2.Guest sends MB_REQ_RAS_BAD_PAGES to host and keep polling for IDH_RAS_BAD_PAGES_READY
  3.If the IDH_RAS_BAD_PAGES_READY arrives within timeout limit, re-init the data exchange region for updated bad page info
    else timeout with error message
New sequence:
req_bad_pages_work:
  1.Stop data exhange work
  2.Guest sends MB_REQ_RAS_BAD_PAGES to host
Once Guest receives IDH_RAS_BAD_PAGES_READY event
handle_bad_pages_work:
  3.re-init the data exchange region for updated bad page info

Signed-off-by: Chenglei Xie &lt;Chenglei.Xie@amd.com&gt;
Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Check SQ_CONFIG register support on SRIOV</title>
<updated>2025-07-16T20:14:21+00:00</updated>
<author>
<name>Tony Yi</name>
<email>Tony.Yi@amd.com</email>
</author>
<published>2025-06-09T19:09:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=991f2e0c63a7513202faab90a470ebb46e227541'/>
<id>urn:sha1:991f2e0c63a7513202faab90a470ebb46e227541</id>
<content type='text'>
On SRIOV environments, check if RLCG supports
SQ_CONFIG register programming.

Signed-off-by: Tony Yi &lt;Tony.Yi@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: update xgmi info and vram_base_offset on resume</title>
<updated>2025-06-18T16:19:01+00:00</updated>
<author>
<name>Samuel Zhang</name>
<email>guoqing.zhang@amd.com</email>
</author>
<published>2025-04-11T07:59:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=855a2a029a2eda88317cdccde9b3a3168ad193dc'/>
<id>urn:sha1:855a2a029a2eda88317cdccde9b3a3168ad193dc</id>
<content type='text'>
For SRIOV VM env with XGMI enabled systems, XGMI physical node id may
change when hibernate and resume with different VF.

Update XGMI info and vram_base_offset on resume for gfx444 SRIOV env.
Add amdgpu_virt_xgmi_migrate_enabled() as the feature flag.

Signed-off-by: Jiang Liu &lt;gerry@linux.alibaba.com&gt;
Signed-off-by: Samuel Zhang &lt;guoqing.zhang@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Implement Runtime Bad Page query for VFs</title>
<updated>2025-05-07T21:41:49+00:00</updated>
<author>
<name>Ellen Pan</name>
<email>yunru.pan@amd.com</email>
</author>
<published>2025-04-29T20:48:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5da3d8820dd3202d5ae871050e45facb2787235d'/>
<id>urn:sha1:5da3d8820dd3202d5ae871050e45facb2787235d</id>
<content type='text'>
Host will send a notification when new bad pages are available.

Uopn guest request, the first 256 bad page addresses
will be placed into the PF2VF region.
Guest should pause the PF2VF worker thread while
the copy is in progress.

Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Signed-off-by: Ellen Pan &lt;yunru.pan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add indirect L1_TLB_CNTL reg programming for VFs</title>
<updated>2025-04-07T19:18:58+00:00</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2025-03-27T15:38:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0c6e39ce6da20104900b11bad64464a12fb47320'/>
<id>urn:sha1:0c6e39ce6da20104900b11bad64464a12fb47320</id>
<content type='text'>
VFs on some IP versions are unable to access this register directly.

This register must be programmed before PSP ring is setup,
so use PSP VF mailbox directly. PSP will broadcast the register
value to all VF assigned instances.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;Zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add amdgpu_sriov_multi_vf_mode function</title>
<updated>2025-03-14T03:12:52+00:00</updated>
<author>
<name>Emily Deng</name>
<email>Emily.Deng@amd.com</email>
</author>
<published>2025-02-06T05:09:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8d5e70ba5da21452735474b70322446aeb442c94'/>
<id>urn:sha1:8d5e70ba5da21452735474b70322446aeb442c94</id>
<content type='text'>
Use amdgpu_sriov_multi_vf_mode to replace amdgpu_sriov_vf(adev) &amp;&amp; !amdgpu_sriov_is_pp_one_vf(adev).

Signed-off-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add support for CPERs on virtualization</title>
<updated>2025-03-05T15:47:03+00:00</updated>
<author>
<name>Tony Yi</name>
<email>Tony.Yi@amd.com</email>
</author>
<published>2025-02-26T22:03:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a91d91b6004796b868374394962331a1322da7ab'/>
<id>urn:sha1:a91d91b6004796b868374394962331a1322da7ab</id>
<content type='text'>
Add support for CPERs on VFs.

VFs do not receive PMFW messages directly; as such, they need to
query them from the host. To avoid hitting host event guard,
CPER queries need to be rate limited. CPER queries share the same
RAS telemetry buffer as error count query, so a mutex protecting
the shared buffer was added as well.

For readability, the amdgpu_detect_virtualization was refactored
into multiple individual functions.

Signed-off-by: Tony Yi &lt;Tony.Yi@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV</title>
<updated>2025-02-19T20:14:38+00:00</updated>
<author>
<name>Srinivasan Shanmugam</name>
<email>srinivasan.shanmugam@amd.com</email>
</author>
<published>2025-02-14T08:52:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dc0297f3198bd60108ccbd167ee5d9fa4af31ed0'/>
<id>urn:sha1:dc0297f3198bd60108ccbd167ee5d9fa4af31ed0</id>
<content type='text'>
RLCG Register Access is a way for virtual functions to safely access GPU
registers in a virtualized environment., including TLB flushes and
register reads. When multiple threads or VFs try to access the same
registers simultaneously, it can lead to race conditions. By using the
RLCG interface, the driver can serialize access to the registers. This
means that only one thread can access the registers at a time,
preventing conflicts and ensuring that operations are performed
correctly. Additionally, when a low-priority task holds a mutex that a
high-priority task needs, ie., If a thread holding a spinlock tries to
acquire a mutex, it can lead to priority inversion. register access in
amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.

The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
called, which attempts to acquire the mutex. This function is invoked
from amdgpu_sriov_wreg, which in turn is called from
gmc_v11_0_flush_gpu_tlb.

The [ BUG: Invalid wait context ] indicates that a thread is trying to
acquire a mutex while it is in a context that does not allow it to sleep
(like holding a spinlock).

Fixes the below:

[  253.013423] =============================
[  253.013434] [ BUG: Invalid wait context ]
[  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G     U     OE
[  253.013464] -----------------------------
[  253.013475] kworker/0:1/10 is trying to lock:
[  253.013487] ffff9f30542e3cf8 (&amp;adev-&gt;virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.013815] other info that might help us debug this:
[  253.013827] context-{4:4}
[  253.013835] 3 locks held by kworker/0:1/10:
[  253.013847]  #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[  253.013877]  #1: ffffb789c008be40 ((work_completion)(&amp;wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[  253.013905]  #2: ffff9f3054281838 (&amp;adev-&gt;gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[  253.014154] stack backtrace:
[  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G     U     OE      6.12.0-amdstaging-drm-next-lol-050225 #14
[  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[  253.014224] Workqueue: events work_for_cpu_fn
[  253.014241] Call Trace:
[  253.014250]  &lt;TASK&gt;
[  253.014260]  dump_stack_lvl+0x9b/0xf0
[  253.014275]  dump_stack+0x10/0x20
[  253.014287]  __lock_acquire+0xa47/0x2810
[  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014321]  lock_acquire+0xd1/0x300
[  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014562]  ? __lock_acquire+0xa6b/0x2810
[  253.014578]  __mutex_lock+0x85/0xe20
[  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014782]  ? sched_clock_noinstr+0x9/0x10
[  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014808]  ? local_clock_noinstr+0xe/0xc0
[  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.015029]  mutex_lock_nested+0x1b/0x30
[  253.015044]  ? mutex_lock_nested+0x1b/0x30
[  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]
[  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[  253.017057]  amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
[  253.017493]  local_pci_probe+0x4b/0xb0
[  253.017746]  work_for_cpu_fn+0x1a/0x30
[  253.017995]  process_one_work+0x21e/0x680
[  253.018248]  worker_thread+0x190/0x330
[  253.018500]  ? __pfx_worker_thread+0x10/0x10
[  253.018746]  kthread+0xe7/0x120
[  253.018988]  ? __pfx_kthread+0x10/0x10
[  253.019231]  ret_from_fork+0x3c/0x60
[  253.019468]  ? __pfx_kthread+0x10/0x10
[  253.019701]  ret_from_fork_asm+0x1a/0x30
[  253.019939]  &lt;/TASK&gt;

v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian).

Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface")
Cc: lin cao &lt;lin.cao@amd.com&gt;
Cc: Jingwen Chen &lt;Jingwen.Chen2@amd.com&gt;
Cc: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Cc: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Suggested-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
