<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-09-18T13:43:02+00:00</updated>
<entry>
<title>drm/amdgpu: Introduce VF critical region check for RAS poison injection</title>
<updated>2025-09-18T13:43:02+00:00</updated>
<author>
<name>Xiang Liu</name>
<email>xiang.liu@amd.com</email>
</author>
<published>2025-08-19T04:51:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f1fdeb3d07a4c2dcfc8cf0e20c0bddb68e8bc607'/>
<id>urn:sha1:f1fdeb3d07a4c2dcfc8cf0e20c0bddb68e8bc607</id>
<content type='text'>
The SRIOV guest send requet to host to check whether the poison
injection address is in VF critical region or not via mabox.

Signed-off-by: Xiang Liu &lt;xiang.liu@amd.com&gt;
Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Implement unrecoverable error message handling for VFs</title>
<updated>2025-05-07T21:43:13+00:00</updated>
<author>
<name>Ellen Pan</name>
<email>yunru.pan@amd.com</email>
</author>
<published>2025-04-29T21:18:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=086809c82c96116aed78f762e4f1d61a4e0210a7'/>
<id>urn:sha1:086809c82c96116aed78f762e4f1d61a4e0210a7</id>
<content type='text'>
This notification may arrive in VF mailbox while polling for response from
another event.

This patches covers the following scenarios:

- If VF is already in RMA state, then do not attempt to contact the host.
  Host will ignore the VF after sending the notification.

- If the notification is detected during polling, then set the RMA status,
  and return error to caller.

- If the notification arrives by interrupt, then set the RMA status and
  queue a reset.  This reset will fail and VF will stop runtime services.

Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Signed-off-by: Ellen Pan &lt;yunru.pan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Implement Runtime Bad Page query for VFs</title>
<updated>2025-05-07T21:41:49+00:00</updated>
<author>
<name>Ellen Pan</name>
<email>yunru.pan@amd.com</email>
</author>
<published>2025-04-29T20:48:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5da3d8820dd3202d5ae871050e45facb2787235d'/>
<id>urn:sha1:5da3d8820dd3202d5ae871050e45facb2787235d</id>
<content type='text'>
Host will send a notification when new bad pages are available.

Uopn guest request, the first 256 bad page addresses
will be placed into the PF2VF region.
Guest should pause the PF2VF worker thread while
the copy is in progress.

Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Signed-off-by: Ellen Pan &lt;yunru.pan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Update headers for CPER support on SRIOV</title>
<updated>2025-03-05T15:46:40+00:00</updated>
<author>
<name>Tony Yi</name>
<email>Tony.Yi@amd.com</email>
</author>
<published>2025-02-26T21:56:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d4c60219ac10242a1d5a621e7ba673d6128b7e13'/>
<id>urn:sha1:d4c60219ac10242a1d5a621e7ba673d6128b7e13</id>
<content type='text'>
Update amdgv_sriovmsg.h and mxgpu_nv.h to add new definitions for
CPER support on VFs. PMFW ACA messages are not available on VFs,
and VFs must query CPERs from host.

Signed-off-by: Tony Yi &lt;Tony.Yi@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support</title>
<updated>2024-11-11T16:55:01+00:00</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-10-30T13:33:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=60c58d72afb81d2dc3f52f638eff5197511ac114'/>
<id>urn:sha1:60c58d72afb81d2dc3f52f638eff5197511ac114</id>
<content type='text'>
The SRIOV PF/VF Data exchange is extended by 64KB for VF RAS Telemetry data.
Add Host RAS Telemetry enable capabilities bitfields.
Add a new VF msg REQ_RAS_ERROR_COUNT, the host response data will be populated
in the RAS Telemetry region.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd/sriov: extend NV_MAILBOX_POLL_MSG_TIMEDOUT</title>
<updated>2024-08-13T16:12:51+00:00</updated>
<author>
<name>Victor Zhao</name>
<email>Victor.Zhao@amd.com</email>
</author>
<published>2024-08-07T09:32:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ef6c2cb349c708676b7820c36a5beb75868ad544'/>
<id>urn:sha1:ef6c2cb349c708676b7820c36a5beb75868ad544</id>
<content type='text'>
on MI300/MI308 UBB products, when doing mode1 reset, since 1 gpu need to
wait all 8 gpus finish mode1 reset and then do re-init. As observed,
sometimes the gpu which triggered the reset need to wait 15s for all
gpus to finish.

If poll msg timeout, guest driver will send the reset message again, and
may mess up the following reinit sequence on other gpus.

So extend the time to cover the maximum time needed to recover.

Signed-off-by: Victor Zhao &lt;Victor.Zhao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: process RAS fatal error MB notification</title>
<updated>2024-06-27T21:31:37+00:00</updated>
<author>
<name>Vignesh Chander</name>
<email>Vignesh.Chander@amd.com</email>
</author>
<published>2024-06-24T21:44:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cbda2758d8bfae323b846210a3e52f0ad5fe7164'/>
<id>urn:sha1:cbda2758d8bfae323b846210a3e52f0ad5fe7164</id>
<content type='text'>
For RAS error scenario, VF guest driver will check mailbox
and set fed flag to avoid unnecessary HW accesses.
additionally, poll for reset completion message first
to avoid accidentally spamming multiple reset requests to host.

v2: add another mailbox check for handling case where kfd detects
timeout first

v3: set host_flr bit and use wait_for_reset

Signed-off-by: Vignesh Chander &lt;Vignesh.Chander@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;Zhigang.Luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add RAS_POISON_READY host response message</title>
<updated>2024-01-25T19:58:03+00:00</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-01-21T15:25:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2474414c60b7ed1f90293facdc4d94ef7cf61a3b'/>
<id>urn:sha1:2474414c60b7ed1f90293facdc4d94ef7cf61a3b</id>
<content type='text'>
In a non-FLR page avoidance scenario, the host driver will
provide the bad pages in the pf2vf exchange region.

Adding a new host response message to indicate when the
pf2vf exchange region has been updated.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add RAS poison consumption handler for NV SRIOV</title>
<updated>2022-12-15T17:18:19+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2022-12-06T09:04:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ae844dd79ffc60f419b32a8d6026128f18021650'/>
<id>urn:sha1:ae844dd79ffc60f419b32a8d6026128f18021650</id>
<content type='text'>
Send handling request to host.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add MB_REQ_MSG_READY_TO_RESET response when VF get FLR notification.</title>
<updated>2021-08-16T19:17:57+00:00</updated>
<author>
<name>Jiange Zhao</name>
<email>Jiange.Zhao@amd.com</email>
</author>
<published>2021-03-19T02:32:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3e183e2faea97fb284f82861286de09aa16e3630'/>
<id>urn:sha1:3e183e2faea97fb284f82861286de09aa16e3630</id>
<content type='text'>
When guest received FLR notification from host, it would
lock adapter into reset state. There will be no more
job submission and hardware access after that.

Then it should send a response to host that it has prepared
for host reset.

Signed-off-by: Jiange Zhao &lt;Jiange.Zhao@amd.com&gt;
Signed-off-by: Peng Ju Zhou &lt;PengJu.Zhou@amd.com&gt;
Reviewed-by: Emily.Deng &lt;Emily.Deng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
