<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-08-06T15:11:01+00:00</updated>
<entry>
<title>drm/amdgpu: remove RAS unused parameter 'err_addr'</title>
<updated>2024-08-06T15:11:01+00:00</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2024-08-02T02:11:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=671af06690e7f79db51b475a35c3b2619f345abc'/>
<id>urn:sha1:671af06690e7f79db51b475a35c3b2619f345abc</id>
<content type='text'>
- amdgpu_ras_error_statistic_ue_count()
- amdgpu_ras_error_statistic_ce_count()
- amdgpu_ras_error_statistic_de_count()

The 'err_addr' parameter has been unused since the following patch.

Fixes: a7e8467fbeee ("drm/amdgpu: Remove unused code")
Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: refine amdgpu ras event id core code</title>
<updated>2024-07-08T20:55:11+00:00</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2024-06-25T06:23:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=75ac6a250632d2fff62039ae728c842033dceddb'/>
<id>urn:sha1:75ac6a250632d2fff62039ae728c842033dceddb</id>
<content type='text'>
v1:
- use unified event id to manage ras events
- add a new function amdgpu_ras_query_error_status_with_event() to accept
  event type as parameter.

v2:
add a warning log to show where the failure occurred
when calling amdgpu_ras_mark_event(). (Tao Zhou)

v3:
change RAS_EVENT_TYPE_ISR to RAS_EVENT_TYPE_FATAL.

v4:
rename amdgpu_ras_get_recovery_event() to
amdgpu_ras_get_fatal_error_event().

Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>Revert "drm/amdgpu: change bank cache lock type to spinlock"</title>
<updated>2024-06-19T16:50:31+00:00</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2024-06-18T01:47:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8c9ee180196fb2a04e28891578ae608f772eab9c'/>
<id>urn:sha1:8c9ee180196fb2a04e28891578ae608f772eab9c</id>
<content type='text'>
This reverts commit 258ed689bc3163f86204f75df6c23f92b59b3fad

Revert this patch, changing the lock type back to 'mutex', to avoid the
kernel call trace below.

[  602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu]
[  602.668939] Call Trace:
[  602.668940]  &lt;TASK&gt;
[  602.668941]  dump_stack_lvl+0x4c/0x70
[  602.668945]  dump_stack+0x14/0x20
[  602.668946]  __schedule_bug+0x5a/0x70
[  602.668950]  __schedule+0x940/0xb30
[  602.668952]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.668955]  ? hrtimer_reprogram+0x77/0xb0
[  602.668957]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.668959]  ? hrtimer_start_range_ns+0x126/0x370
[  602.668961]  schedule+0x39/0xe0
[  602.668962]  schedule_hrtimeout_range_clock+0xb1/0x140
[  602.668964]  ? __pfx_hrtimer_wakeup+0x10/0x10
[  602.668966]  schedule_hrtimeout_range+0x17/0x20
[  602.668967]  usleep_range_state+0x69/0x90
[  602.668970]  psp_cmd_submit_buf+0x132/0x570 [amdgpu]
[  602.669066]  psp_ras_invoke+0x75/0x1a0 [amdgpu]
[  602.669156]  psp_ras_query_address+0x9c/0x120 [amdgpu]
[  602.669245]  umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu]
[  602.669337]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669339]  ? stack_depot_save+0x12/0x20
[  602.669342]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669343]  ? set_track_prepare+0x52/0x70
[  602.669346]  ? kmemleak_alloc+0x4f/0x90
[  602.669348]  ? __kmalloc_node+0x34b/0x450
[  602.669352]  amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu]
[  602.669438]  mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu]
[  602.669554]  mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu]
[  602.669655]  amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu]
[  602.669743]  ? kmemleak_free+0x36/0x60
[  602.669745]  ? kvfree+0x32/0x40
[  602.669747]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669749]  ? kfree+0x15d/0x2a0
[  602.669752]  amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu]
[  602.669839]  amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu]
[  602.669924]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669925]  ? __call_rcu_common.constprop.0+0xa6/0x2b0
[  602.669929]  amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu]
[  602.670014]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.670017]  amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu]
[  602.670103]  amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu]
[  602.670187]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.670189]  ? __schedule+0x37d/0xb30
[  602.670191]  process_one_work+0x176/0x350
[  602.670194]  worker_thread+0x2f7/0x420
[  602.670197]  ?

Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: YiPeng Chai &lt;YiPeng.Chai@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: move aca/mca init functions into ras_init() stage</title>
<updated>2024-06-14T20:17:12+00:00</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2024-06-04T08:30:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9817f06173cfa74b78e9e701a064864d62e88cd2'/>
<id>urn:sha1:9817f06173cfa74b78e9e701a064864d62e88cd2</id>
<content type='text'>
Move the aca/mca init calls so that they better match the aca/mca fini code in ras_fini().

Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: change bank cache lock type to spinlock</title>
<updated>2024-05-17T21:40:39+00:00</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2024-05-16T23:56:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=258ed689bc3163f86204f75df6c23f92b59b3fad'/>
<id>urn:sha1:258ed689bc3163f86204f75df6c23f92b59b3fad</id>
<content type='text'>
Change the lock type to 'spinlock' to avoid scheduling issues
in interrupt context.

Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: ignoring unsupported ras blocks when MCA bank dispatches</title>
<updated>2024-05-08T19:17:05+00:00</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2024-05-06T02:41:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=85a24a3ea09e93c05ff59609fde6c3d825a014c2'/>
<id>urn:sha1:85a24a3ea09e93c05ff59609fde6c3d825a014c2</id>
<content type='text'>
This patch fixes incorrect parsing of error counts.
When a UE triggers a GPU reset, the driver attempts to parse all possible RAS blocks.
For RAS blocks that are not supported by the current ASIC, the driver should ignore the resulting error.

Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Candice Li &lt;candice.li@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add smu v13_0_14 ip block</title>
<updated>2024-05-02T19:49:11+00:00</updated>
<author>
<name>Hawking Zhang</name>
<email>Hawking.Zhang@amd.com</email>
</author>
<published>2024-04-30T16:12:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a6bcffa596770b0c54b3ddccbc115bdab4df08e9'/>
<id>urn:sha1:a6bcffa596770b0c54b3ddccbc115bdab4df08e9</id>
<content type='text'>
Add smu v13_0_14 ip block support

Signed-off-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Le Ma &lt;Le.Ma@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: avoid dumping mca bank log multiple times during ras ISR</title>
<updated>2024-04-30T13:58:47+00:00</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2024-04-23T02:14:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5eccab32c15f1e5cf9651d865fb20012d3563c96'/>
<id>urn:sha1:5eccab32c15f1e5cf9651d865fb20012d3563c96</id>
<content type='text'>
The UE valid MCA count is only cleared after a GPU reset,
so dump the MCA log only the first time the MCA bank is fetched after a RAS interrupt is received.

Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add MCA smu cache support</title>
<updated>2024-04-30T13:58:41+00:00</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2024-04-18T07:46:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=76ad30f51aa0d1bd99f12658d4775a86df6e4282'/>
<id>urn:sha1:76ad30f51aa0d1bd99f12658d4775a86df6e4282</id>
<content type='text'>
v1:
Because the SMU CE valid MCA bank is cleared after reading,
this patch adds an MCA cache at the driver level to ensure that the MCA bank is not lost.

v2:
refine amdgpu_mca_init/fini/reset() function names.

v3:
add mca_cache.lock support.
only add CE banks to the mca bank cache.

Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add amdgpu MCA bank dispatch function support</title>
<updated>2024-04-30T13:58:34+00:00</updated>
<author>
<name>Yang Wang</name>
<email>kevinyang.wang@amd.com</email>
</author>
<published>2024-04-18T05:12:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8fb20d9551368f0b1ef5b31fa0b4634f4be37157'/>
<id>urn:sha1:8fb20d9551368f0b1ef5b31fa0b4634f4be37157</id>
<content type='text'>
- Refine mca driver code.
- Centralize mca bank dispatch code logic.

Signed-off-by: Yang Wang &lt;kevinyang.wang@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
