<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-08-27T17:57:47+00:00</updated>
<entry>
<title>drm/amdgpu: Correct the loss of aca bank reg info</title>
<updated>2025-08-27T17:57:47+00:00</updated>
<author>
<name>Ce Sun</name>
<email>cesun102@amd.com</email>
</author>
<published>2025-08-20T09:18:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d8442bcad0764c5613e9f8b2356f3e0a48327e20'/>
<id>urn:sha1:d8442bcad0764c5613e9f8b2356f3e0a48327e20</id>
<content type='text'>
Poll the ACA bank count to ensure that valid
ACA bank reg info can be obtained.

v2: add a corresponding delay before sending the msg to SMU to query mca bank info
(Stanley)

v3: the loop cannot exit. (Thomas)

v4: remove amdgpu_aca_clear_bank_count. (Kevin)

v5: if a creation interruption occurs while ce is being injected
continuously, bank reg info will be lost. (Thomas)
v5: each cycle is delayed by 100ms. (Tao)

Signed-off-by: Ce Sun &lt;cesun102@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add a mutex lock to protect poison injection</title>
<updated>2025-08-27T17:57:47+00:00</updated>
<author>
<name>Ce Sun</name>
<email>cesun102@amd.com</email>
</author>
<published>2025-08-19T06:47:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0989b764f43d4de2c6665c15165c251d9cfde9c0'/>
<id>urn:sha1:0989b764f43d4de2c6665c15165c251d9cfde9c0</id>
<content type='text'>
When poison is triggered multiple times, a race condition occurs.
Add a mutex lock to protect poison injection.

Signed-off-by: Ce Sun &lt;cesun102@amd.com&gt;
Reviewed-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Avoid rma causes GPU duplicate reset</title>
<updated>2025-08-04T18:27:47+00:00</updated>
<author>
<name>Ce Sun</name>
<email>cesun102@amd.com</email>
</author>
<published>2025-07-27T04:06:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=21c0ffa612c98bcc6dab5bd9d977a18d565ee28e'/>
<id>urn:sha1:21c0ffa612c98bcc6dab5bd9d977a18d565ee28e</id>
<content type='text'>
Try to ensure that poison creation handling is completed in time
to set the device rma value.

Signed-off-by: Ce Sun &lt;cesun102@amd.com&gt;
Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: support ras critical address check</title>
<updated>2025-07-28T20:40:06+00:00</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2025-07-24T07:34:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f3486918979030f8982e1af901561dbd6e2cd1bc'/>
<id>urn:sha1:f3486918979030f8982e1af901561dbd6e2cd1bc</id>
<content type='text'>
Support ras critical address check.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: adjust the update of RAS bad page number</title>
<updated>2025-07-28T20:40:06+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2025-07-04T09:12:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d45c5e6845a76169ef3d6076f0f04487e5776905'/>
<id>urn:sha1:d45c5e6845a76169ef3d6076f0f04487e5776905</id>
<content type='text'>
One eeprom record may not map to a unit number of bad pages; the accurate
bad page number is obtained only after the bad page address check.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add command to check address validity</title>
<updated>2025-07-28T20:40:06+00:00</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2025-07-16T03:16:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a813437c33842c4e28a0656a9d8f20c3a8d35d6d'/>
<id>urn:sha1:a813437c33842c4e28a0656a9d8f20c3a8d35d6d</id>
<content type='text'>
Add command to check address validity and remove
unused command codes.

v2:
 The command interface adds new parameters to support
 multiple address check strategies.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Update ta ras block</title>
<updated>2025-03-26T21:44:34+00:00</updated>
<author>
<name>Stanley.Yang</name>
<email>Stanley.Yang@amd.com</email>
</author>
<published>2025-03-25T03:10:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cc11dffc14bd2322d92a896a639cc42145401515'/>
<id>urn:sha1:cc11dffc14bd2322d92a896a639cc42145401515</id>
<content type='text'>
Update ta ras block to keep in sync with RAS TA.

Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Report generic instead of unknown boot time errors</title>
<updated>2025-02-27T21:50:03+00:00</updated>
<author>
<name>Xiang Liu</name>
<email>xiang.liu@amd.com</email>
</author>
<published>2025-02-26T06:27:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d4bd7a50ca7c6199438cf19063464b4d6327a6c1'/>
<id>urn:sha1:d4bd7a50ca7c6199438cf19063464b4d6327a6c1</id>
<content type='text'>
Change the DMESG reporting of unknown errors to "Boot Controller
Generic Error" to align with the RAS SPEC and provide more clarity
to customers.

Signed-off-by: Xiang Liu &lt;xiang.liu@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Update usage for bad page threshold</title>
<updated>2025-02-13T02:02:59+00:00</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2025-01-22T11:34:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=16b85a0942c0b0f1611bcaa42cc98f020e34b1cf'/>
<id>urn:sha1:16b85a0942c0b0f1611bcaa42cc98f020e34b1cf</id>
<content type='text'>
The driver's behavior varies based on
the configuration of the amdgpu_bad_page_threshold setting.

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: parse legacy RAS bad page mixed with new data in various NPS modes</title>
<updated>2024-12-10T15:26:48+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2024-10-31T07:48:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a8d133e625ceb147a173b6cafc862a9bd4312894'/>
<id>urn:sha1:a8d133e625ceb147a173b6cafc862a9bd4312894</id>
<content type='text'>
All legacy RAS bad pages are generated in NPS1 mode, but new bad pages
can be generated in any NPS mode, so we can't use the retired_page stored
on eeprom directly in non-nps1 mode, even for legacy data. We need to
take different actions for different data; new data can be distinguished
from old data by the UMC_CHANNEL_IDX_V2 flag.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
