<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c, branch v6.4.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.4.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.4.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-07-19T14:36:05+00:00</updated>
<entry>
<title>drm/amdgpu: Fix usage of UMC fill record in RAS</title>
<updated>2023-07-19T14:36:05+00:00</updated>
<author>
<name>Luben Tuikov</name>
<email>luben.tuikov@amd.com</email>
</author>
<published>2023-06-10T10:19:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9f40de4df625b3b7f26bc7c0ee0a853815aa6d23'/>
<id>urn:sha1:9f40de4df625b3b7f26bc7c0ee0a853815aa6d23</id>
<content type='text'>
[ Upstream commit 71344a718a9fda8c551cdc4381d354f9a9907f6f ]

The fixed commit listed in the Fixes tag below, introduced a bug in
amdgpu_ras.c::amdgpu_reserve_page_direct(), in that when introducing the new
amdgpu_umc_fill_error_record() and internally in that new function the physical
address (argument "uint64_t retired_page"--wrong name) is right-shifted by
AMDGPU_GPU_PAGE_SHIFT. Thus, in amdgpu_reserve_page_direct() when we pass
"address" to that new function, we should NOT right-shift it, since this
results, erroneously, in the page address to be 0 for first
2^(2*AMDGPU_GPU_PAGE_SHIFT) memory addresses.

This commit fixes this bug.

Cc: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Cc: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Cc: Alex Deucher &lt;Alexander.Deucher@amd.com&gt;
Fixes: 400013b268cb ("drm/amdgpu: add umc_fill_error_record to make code more simple")
Signed-off-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Link: https://lore.kernel.org/r/20230610113536.10621-1-luben.tuikov@amd.com
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Revert "drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV"</title>
<updated>2023-04-14T17:47:49+00:00</updated>
<author>
<name>Jane Jian</name>
<email>Jane.Jian@amd.com</email>
</author>
<published>2023-04-14T03:33:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b805d8d785e49cb3ee9279dad1402d5dcf902166'/>
<id>urn:sha1:b805d8d785e49cb3ee9279dad1402d5dcf902166</id>
<content type='text'>
This reverts commit fe120b9f5ce873516a2604e4ff0c19084be94e8c.
This patch impacts sriov multi-vf stability

Signed-off-by: Jane Jian &lt;Jane.Jian@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: correct ras enabled flag</title>
<updated>2023-04-11T22:03:45+00:00</updated>
<author>
<name>Stanley.Yang</name>
<email>Stanley.Yang@amd.com</email>
</author>
<published>2023-04-10T11:43:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=58bc2a9cbfdd4abdbfaafd835a0cd78bdad11423'/>
<id>urn:sha1:58bc2a9cbfdd4abdbfaafd835a0cd78bdad11423</id>
<content type='text'>
XGMI RAS should be according to the gmc xgmi physical nodes number,
XGMI RAS should not be enabled if xgmi num_physical_nodes is zero.

Signed-off-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add fatal error handling in nbio v4_3</title>
<updated>2023-03-31T15:18:32+00:00</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-03-23T02:21:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9af357bc3e05400eb632f3975986e1eac196f159'/>
<id>urn:sha1:9af357bc3e05400eb632f3975986e1eac196f159</id>
<content type='text'>
GPU will stop working once fatal error is detected.
it will inform driver to do reset to recover from
the fatal error.

v2: squash in logic fix (Srinivasan)
v3: squash in logic fix (Dan)

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Candice Li &lt;candice.li@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV</title>
<updated>2023-03-22T04:58:37+00:00</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2023-03-03T02:01:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fe120b9f5ce873516a2604e4ff0c19084be94e8c'/>
<id>urn:sha1:fe120b9f5ce873516a2604e4ff0c19084be94e8c</id>
<content type='text'>
Enable ras for mp0 v13_0_10 on SRIOV.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: drop ras check at asic level for new blocks</title>
<updated>2023-03-15T22:45:27+00:00</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-03-04T12:22:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=370808876b5cab365f8fc6dbaf8cae13a2bc6efa'/>
<id>urn:sha1:370808876b5cab365f8fc6dbaf8cae13a2bc6efa</id>
<content type='text'>
amdgpu_ras_register_ras_block should always be invoked
by ras_sw_init, where driver needs to check ras caps
at ip level, instead of asic level.

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Stanley Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Rework pcie_bif ras sw_init</title>
<updated>2023-03-15T22:45:27+00:00</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2023-03-13T06:18:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fdc94d3a8c887e4e06a7ff8dcb51d55cd70e16cf'/>
<id>urn:sha1:fdc94d3a8c887e4e06a7ff8dcb51d55cd70e16cf</id>
<content type='text'>
pcie_bif ras blocks needs to be initialized as early
as possible to handle fatal error detected in hw_init
phase. also align the pcie_bif ras sw_init with other
ras blocks

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Stanley Yang &lt;Stanley.Yang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: change default behavior of bad_page_threshold parameter</title>
<updated>2023-02-23T22:35:59+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-02-21T07:25:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f3cbe70e215a87dcfdf028582a2fa94b24a08efe'/>
<id>urn:sha1:f3cbe70e215a87dcfdf028582a2fa94b24a08efe</id>
<content type='text'>
Ignore ras umc bad page threshold by default, GPU initialization won't
be stopped in this mode.

v2: refine the description of bad_page_threshold.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: exclude duplicate pages from UMC RAS UE count</title>
<updated>2023-02-23T22:35:59+00:00</updated>
<author>
<name>Tao Zhou</name>
<email>tao.zhou1@amd.com</email>
</author>
<published>2023-02-10T08:33:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4d33e0f1340b3d08002ff8f9bcbf256cfdc4f3ba'/>
<id>urn:sha1:4d33e0f1340b3d08002ff8f9bcbf256cfdc4f3ba</id>
<content type='text'>
If a UMC bad page is reserved but not freed by an application, the
application may trigger uncorrectable error repeatly by accessing the page.

v2: add specific function to do the check.
v3: remove duplicate pages, calculate new added bad page number.
v4: reuse save_bad_pages to calculate new added bad page number.

Signed-off-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Stanley.Yang &lt;Stanley.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Adjust ras support check condition for special asic</title>
<updated>2023-01-17T21:11:51+00:00</updated>
<author>
<name>YiPeng Chai</name>
<email>YiPeng.Chai@amd.com</email>
</author>
<published>2023-01-06T12:16:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8f453c51cfae92fded6e232985f6943c51b7829c'/>
<id>urn:sha1:8f453c51cfae92fded6e232985f6943c51b7829c</id>
<content type='text'>
[Why]:
     Amdgpu ras uses amdgpu_ras_is_supported to check whether
  the ras block supports the ras function. amdgpu_ras_is_supported
  uses .ras_enabled to determine whether the ras function of the
  block is enabled.
     But for special asic with mem ecc enabled but sram ecc not
  enabled, some ras blocks support poison mode but their ras function
  is not enabled on .ras_enabled, these ras blocks will run abnormally.

[How]:
    If the ras block is not supported on .ras_enabled but the asic
  supports poison mode and the ras block has ras configuration, it
  can be considered that the ras block supports ras function.

Signed-off-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
