<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h, branch v6.6.132</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-06-30T17:12:15+00:00</updated>
<entry>
<title>drm/amdgpu: gpu recovers from fatal error in poison mode</title>
<updated>2023-06-30T17:12:15+00:00</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2023-06-25T02:18:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2c7cd280e5c4a626690315a6fbb70b49124d8354'/>
<id>urn:sha1:2c7cd280e5c4a626690315a6fbb70b49124d8354</id>
<content type='text'>
Fatal error occurs in ras poison mode, mode1 reset
is used to recover gpu.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: perform mode2 reset for sdma fed error on gfx v11_0_3</title>
<updated>2023-06-09T14:38:19+00:00</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2023-05-16T09:34:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6c47a79b3b8ba91faf89f9866da2ec16aac979e7'/>
<id>urn:sha1:6c47a79b3b8ba91faf89f9866da2ec16aac979e7</id>
<content type='text'>
perform mode2 reset for sdma fed error on gfx v11_0_3.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add instance mask for RAS inject</title>
<updated>2023-06-09T14:37:10+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-02-27T10:25:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2c22ed0bdb0cb6da9408593eafa6137325576017'/>
<id>urn:sha1:2c22ed0bdb0cb6da9408593eafa6137325576017</id>
<content type='text'>
User can specify injected instances by the mask. For backward
compatibility, the mask value is incorporated into sub block index
without interface change of RAS TA.
User uses logical mask and driver should convert it to physical value
before sending it to RAS TA.

v2: update parameter name.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add common helper to reset ras error</title>
<updated>2023-06-09T13:52:54+00:00</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-02-03T08:10:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e53a3250f76b8a0dd5b533bd0ce0dc821055e77d'/>
<id>urn:sha1:e53a3250f76b8a0dd5b533bd0ce0dc821055e77d</id>
<content type='text'>
Add common helper to reset ras error status. It
applies to IP blocks that follow the new ras error
logging register design, and need to write 0 to
reset the error status. For IP blocks that don't
support the new design, please still implement ip
specific helper.

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add common helper to query ras error (v2)</title>
<updated>2023-06-09T13:52:50+00:00</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-02-02T12:54:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=322a7e005db78b8a46ead91b7e3df3514cb658f0'/>
<id>urn:sha1:322a7e005db78b8a46ead91b7e3df3514cb658f0</id>
<content type='text'>
Add common helper to query ras error status and
log error information, including memory block id
and erorr count. The helpers are applicable to IP
blocks that follow the new ras error logging design.
For IP blocks that don't support the new design,
please still implement ip specific helper to query
ras error.

v2: optimize struct amdgpu_ras_err_status_reg_entry
and the implementaion in helper (Lijo/Tao)

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix unexpected block id</title>
<updated>2023-04-11T22:03:45+00:00</updated>
<author>
<name>Stanley.Yang</name>
<email>Stanley.Yang@amd.com</email>
</author>
<published>2023-04-10T10:20:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=caa4dffa9abd80f3360432cf89236f018be355ca'/>
<id>urn:sha1:caa4dffa9abd80f3360432cf89236f018be355ca</id>
<content type='text'>
Aldebaran supports VCN and JPEG RAS, it reports unexpected
block id message during VCN and JPEG RAS initialization if VCN
and JPEG block id not defined.

Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: exclude duplicate pages from UMC RAS UE count</title>
<updated>2023-02-23T22:35:59+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-02-10T08:33:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4d33e0f1340b3d08002ff8f9bcbf256cfdc4f3ba'/>
<id>urn:sha1:4d33e0f1340b3d08002ff8f9bcbf256cfdc4f3ba</id>
<content type='text'>
If a UMC bad page is reserved but not freed by an application, the
application may trigger uncorrectable error repeatly by accessing the page.

v2: add specific function to do the check.
v3: remove duplicate pages, calculate new added bad page number.
v4: reuse save_bad_pages to calculate new added bad page number.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: allow query error counters for specific IP block</title>
<updated>2023-01-05T16:42:14+00:00</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-01-03T15:41:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4a1c9a444b5e0f276f43f77e1723088bbedb1687'/>
<id>urn:sha1:4a1c9a444b5e0f276f43f77e1723088bbedb1687</id>
<content type='text'>
amdgpu_ras_block_late_init will be invoked in IP
specific ras_late_init call as a common helper for
all the IP blocks.

However, when amdgpu_ras_block_late_init call
amdgpu_ras_query_error_count to query ras error
counters, amdgpu_ras_query_error_count queries
all the IP blocks that support ras query interface.

This results to wrong error counters cached in
software copies when there are ras errors detected
at time zero or warm reset procedure. i.e., in
sdma_ras_late_init phase, it counts on sdma/mmhub
errors, while, in mmhub_ras_late_init phase, it
still counts on sdma/mmhub errors.

The change updates amdgpu_ras_query_error_count
interface to allow query specific ip error counter.
It introduces a new input parameter: query_info. if
query_info is NULL,  it means query all the IP blocks,
otherwise, only query the ip block specified by
query_info.

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: print umc correctable error address</title>
<updated>2022-06-03T20:44:15+00:00</updated>
<author>
<name>Stanley.Yang</name>
<email>Stanley.Yang@amd.com</email>
</author>
<published>2022-05-20T13:03:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cbd3e8440e2e6a4d83479235c9bf278b89360946'/>
<id>urn:sha1:cbd3e8440e2e6a4d83479235c9bf278b89360946</id>
<content type='text'>
Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Acked-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu/pm: support mca_ceumc_addr in ecctable</title>
<updated>2022-06-03T20:43:36+00:00</updated>
<author>
<name>Stanley.Yang</name>
<email>Stanley.Yang@amd.com</email>
</author>
<published>2022-05-20T10:22:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2f6247dad2c56cfe2df3c6e00586ead5ee905b46'/>
<id>urn:sha1:2f6247dad2c56cfe2df3c6e00586ead5ee905b46</id>
<content type='text'>
SMU add a new variable mca_ceumc_addr to record
umc correctable error address in EccInfo table,
driver side add EccInfo_V2_t to support this feature

Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
