<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c, branch v6.13.6</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.13.6</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.13.6'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-11-20T15:03:05+00:00</updated>
<entry>
<title>drm/amdgpu: Use reset recovery state checks</title>
<updated>2024-11-20T15:03:05+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2024-11-15T06:05:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e283f4fb0862647f4bb02e78d728bc8fb9eef18d'/>
<id>urn:sha1:e283f4fb0862647f4bb02e78d728bc8fb9eef18d</id>
<content type='text'>
Some in_reset checks are infact checking whether the state is
reinitialization after reset. Replace with reset_in_recovery calls to
identify that it's really checking for recovery stage after reset.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Acked-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Implement virt req_ras_err_count</title>
<updated>2024-11-11T16:55:42+00:00</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-10-30T14:18:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=84a2947ecc85c67f433f2cc2186e54cdb9047b61'/>
<id>urn:sha1:84a2947ecc85c67f433f2cc2186e54cdb9047b61</id>
<content type='text'>
Enable RAS late init  if VF RAS Telemetry is supported.

When enabled, the VF can use this interface to query total
RAS error counts from the host.

The VF FB access may abruptly end due to a fatal error,
therefore the VF must cache and sanitize the input.

The Host allows 15 Telemetry messages every 60 seconds, afterwhich
the host will ignore any more in-coming telemetry messages. The VF will
rate limit its msg calling to once every 5 seconds (12 times in 60 seconds).
While the VF is rate limited, it will continue to report the last
good cached data.

v2: Flip generate report &amp; update statistics order for VF

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Acked-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: VF Query RAS Caps from Host if supported</title>
<updated>2024-11-11T16:55:36+00:00</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-10-30T13:58:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=907fec2dfd061ca422d8b121f4af1b6062e098ba'/>
<id>urn:sha1:907fec2dfd061ca422d8b121f4af1b6062e098ba</id>
<content type='text'>
If VF RAS Capability support is enabled, guest is able to
retrieve the real RAS support from the host.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Skip IP coredump for RAS errors</title>
<updated>2024-11-04T17:06:23+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2024-11-04T04:33:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=81db4eab2847094137a266616954e5f1c6e33575'/>
<id>urn:sha1:81db4eab2847094137a266616954e5f1c6e33575</id>
<content type='text'>
For RAS errors, source of error is known. Skip the core dump of IP
states.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Refactor XGMI reset on init handling</title>
<updated>2024-09-26T21:06:46+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2024-08-26T13:22:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=631af731ee9cc7f5a5c0ab1de94da68195920214'/>
<id>urn:sha1:631af731ee9cc7f5a5c0ab1de94da68195920214</id>
<content type='text'>
Use XGMI hive information to rely on resetting XGMI devices on
initialization rather than using mgpu structure. mgpu structure may have
other devices as well.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Feifei Xu &lt;feifxu@amd.com&gt;
Acked-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Tested-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add helper to initialize badpage info</title>
<updated>2024-09-26T21:06:38+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2024-08-30T05:51:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b17f87329d49860130a524ab424ecefd3332600f'/>
<id>urn:sha1:b17f87329d49860130a524ab424ecefd3332600f</id>
<content type='text'>
Add a separate function to read badpage data during initialization.
Reading bad pages will need hardware access and cannot be done during
reset. Hence in cases where device needs a full reset during
init itself, attempting to read will cause a deadlock.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Feifei Xu &lt;Feifei.Xu@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Acked-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Tested-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Use init level for pending_reset flag</title>
<updated>2024-09-26T21:06:18+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2024-08-19T07:53:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5839d27d5b2dad160e402bfac16ab61b481c47f3'/>
<id>urn:sha1:5839d27d5b2dad160e402bfac16ab61b481c47f3</id>
<content type='text'>
Drop pending_reset flag in gmc block. Instead use init level to
determine which type of init is preferred - in this case MINIMAL.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Acked-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Tested-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>amd/amdgpu: Reduce unnecessary repetitive GPU resets</title>
<updated>2024-09-26T21:06:18+00:00</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2024-09-20T07:22:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9e0feb79469233bc91339bcfd1ae1d940e121eca'/>
<id>urn:sha1:9e0feb79469233bc91339bcfd1ae1d940e121eca</id>
<content type='text'>
In multiple GPUs case, after a GPU has started
resetting all GPUs on hive, other GPUs do not
need to trigger GPU reset again.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix typo in the comment</title>
<updated>2024-09-18T20:14:26+00:00</updated>
<author>
<name>Yan Zhen</name>
<email>yanzhen@vivo.com</email>
</author>
<published>2024-09-11T04:27:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0110ac11952f06419d267f51a3989e989b17e67a'/>
<id>urn:sha1:0110ac11952f06419d267f51a3989e989b17e67a</id>
<content type='text'>
Correctly spelled comments make it easier for the reader to understand
the code.

Replace 'udpate' with 'update' in the comment &amp;
replace 'recieved' with 'received' in the comment &amp;
replace 'dsiable' with 'disable' in the comment &amp;
replace 'Initiailize' with 'Initialize' in the comment &amp;
replace 'disble' with 'disable' in the comment &amp;
replace 'Disbale' with 'Disable' in the comment &amp;
replace 'enogh' with 'enough' in the comment &amp;
replace 'availabe' with 'available' in the comment.

Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Yan Zhen &lt;yanzhen@vivo.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: disable GPU RAS bad page feature for specific ASIC</title>
<updated>2024-09-17T14:04:57+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2024-09-09T10:51:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c389a0604cfbcdb1f8f53a76560eb31e0700e206'/>
<id>urn:sha1:c389a0604cfbcdb1f8f53a76560eb31e0700e206</id>
<content type='text'>
The feature is not applicable to specific app platform.

v2: update the disablement condition and commit description
v3: move the setting to amdgpu_ras_check_supported

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
