<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c, branch linux-7.0.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-7.0.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-7.0.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-05-23T11:09:27+00:00</updated>
<entry>
<title>drm/amdgpu: Only send RMA CPER when threshold is exceeded</title>
<updated>2026-05-23T11:09:27+00:00</updated>
<author>
<name>Kent Russell</name>
<email>kent.russell@amd.com</email>
</author>
<published>2026-04-22T13:34:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c78ececfea7e43c199607e7fd14349b1eaf17942'/>
<id>urn:sha1:c78ececfea7e43c199607e7fd14349b1eaf17942</id>
<content type='text'>
[ Upstream commit b56922fc37454633b831a2a04a1537616742977d ]

According to our documentation, the RMA should only occur when the
threshold has been exceeded, not met.

Fixes: 5028a24aa89a ("drm/amdgpu: Send applicable RMA CPERs at end of RAS init")
Signed-off-by: Kent Russell &lt;kent.russell@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 8bc09a7d0e90ec45a0b4865661cf45cbbce1c3d7)
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: return when ras table checksum is error</title>
<updated>2026-02-12T20:21:56+00:00</updated>
<author>
<name>Gangliang Xie</name>
<email>ganglxie@amd.com</email>
</author>
<published>2026-02-09T09:32:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=044f8d3b1fac6ac89c560f61415000e6bdab3a03'/>
<id>urn:sha1:044f8d3b1fac6ac89c560f61415000e6bdab3a03</id>
<content type='text'>
end the function flow when ras table checksum is error

Signed-off-by: Gangliang Xie &lt;ganglxie@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Kent Russell &lt;kent.russell@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Send applicable RMA CPERs at end of RAS init</title>
<updated>2026-02-05T22:28:34+00:00</updated>
<author>
<name>Kent Russell</name>
<email>kent.russell@amd.com</email>
</author>
<published>2026-02-03T14:48:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5028a24aa89a2c91b44964191ee8184e5f5c8cb2'/>
<id>urn:sha1:5028a24aa89a2c91b44964191ee8184e5f5c8cb2</id>
<content type='text'>
Firmware and monitoring tools may not be ready to receive a CPER when we
read the bad pages, so send the CPERs at the end of RAS initialization
to ensure that the FW is ready to receive and process the CPER. This
removes the previous CPER submission that was added during bad page
load, and sends both in-band and out-of-band at the same time.

Signed-off-by: Kent Russell &lt;kent.russell@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Send RMA CPER at bad page loading</title>
<updated>2026-01-27T23:13:24+00:00</updated>
<author>
<name>Kent Russell</name>
<email>kent.russell@amd.com</email>
</author>
<published>2026-01-22T15:19:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e0d11bdb294cd1325eeccfec05e07d8c2534b43e'/>
<id>urn:sha1:e0d11bdb294cd1325eeccfec05e07d8c2534b43e</id>
<content type='text'>
Some older builds weren't sending RMA CPERs when the bad page threshold
was exceeded. Newer builds have resolved this, but there could be
systems out there with bad page numbers higher than the threshold, that
haven't sent out an RMA CPER. To be thorough and safe, send an RMA CPER
when we load the table, if the threshold is met or exceeded, instead of
waiting for the next UE to trigger the CPER.

Signed-off-by: Kent Russell &lt;kent.russell@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: optimize timeout implemention in ras_eeprom_update_record_num</title>
<updated>2025-11-12T02:54:14+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2025-11-06T08:26:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7fb41ab3c94828ad48e1a6d2237e8a7e682c74b9'/>
<id>urn:sha1:7fb41ab3c94828ad48e1a6d2237e8a7e682c74b9</id>
<content type='text'>
The busy status returned by ras_eeprom_update_record_num may not be
an error, increase timeout to exclude false busy status. Also add more
comments to make the code readable.

v2: define a macro for the timeout value.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add RAS bad page threshold handling for PMFW manages eeprom</title>
<updated>2025-11-12T02:54:14+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2025-09-24T09:52:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=eed30152746ec1d8b6e8ab31e349f1eb8d8bd666'/>
<id>urn:sha1:eed30152746ec1d8b6e8ab31e349f1eb8d8bd666</id>
<content type='text'>
Check if bad page threshold is reached and take actions accordingly.

v2: remove rma message sent to smu when pmfw manages eeprom.
v3: add null pointer check for con.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: try for more times if RAS bad page number is not updated</title>
<updated>2025-11-12T02:54:14+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2025-08-27T11:33:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=334b27bf712b5ddd19908aba318175e4b9bcf839'/>
<id>urn:sha1:334b27bf712b5ddd19908aba318175e4b9bcf839</id>
<content type='text'>
RAS info update in PMFW is time cost, wait for it.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: get RAS bad page address from MCA address</title>
<updated>2025-11-12T02:54:14+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2025-08-27T07:48:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e84835940e60a7d5263767ee92acc08f9877cb26'/>
<id>urn:sha1:e84835940e60a7d5263767ee92acc08f9877cb26</id>
<content type='text'>
Instead of from physical address.

v2: add comment to make the code more readable

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: skip writing eeprom when PMFW manages RAS data</title>
<updated>2025-11-06T15:02:15+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2025-09-08T12:39:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=541414065c59db785000d7661d3d07184e104ec2'/>
<id>urn:sha1:541414065c59db785000d7661d3d07184e104ec2</id>
<content type='text'>
Only update bad page number in legacy eeprom write path.

v2: add null pointer check for con.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add ras_eeprom_read_idx interface</title>
<updated>2025-11-06T14:58:35+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2025-07-23T11:04:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7f34ddf77d30959fcc24cd279074f7e0d4f732df'/>
<id>urn:sha1:7f34ddf77d30959fcc24cd279074f7e0d4f732df</id>
<content type='text'>
PMFW will manage RAS eeprom data by itself, add new interface to read
eeprom data via PMFW, we can read part of records by setting index.

v2: use IPID parse interface.
    pa is not used and set it to a fixed value.
v3: optimize the null pointer check for IPID parse interface.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
