<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/acpi/apei, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-04T12:20:33+00:00</updated>
<entry>
<title>APEI/GHES: ensure that won't go past CPER allocated record</title>
<updated>2026-03-04T12:20:33+00:00</updated>
<author>
<name>Mauro Carvalho Chehab</name>
<email>mchehab+huawei@kernel.org</email>
</author>
<published>2026-01-08T11:35:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e0ec99115e135dbb58e11a0df007c7d4771d4a17'/>
<id>urn:sha1:e0ec99115e135dbb58e11a0df007c7d4771d4a17</id>
<content type='text'>
[ Upstream commit fa2408a24f8f0db14d9cfc613ef162dc267d7ad4 ]

The logic in ghes_new() prevents allocating overly large records by
checking whether they are bigger than GHES_ESTATUS_MAX_SIZE (currently
64KB). However, the allocation is done with the actual number of pages
from the CPER BIOS table location, which can be smaller.

Bad firmware could then send data with a different size, which might
be bigger than the allocated memory, causing an Oops:

    Unable to handle kernel paging request at virtual address fff00000f9b40000
    Mem abort info:
      ESR = 0x0000000096000007
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x07: level 3 translation fault
    Data abort info:
      ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
      CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
    swapper pgtable: 4k pages, 52-bit VAs, pgdp=000000008ba16000
    [fff00000f9b40000] pgd=180000013ffff403, p4d=180000013fffe403, pud=180000013f85b403, pmd=180000013f68d403, pte=0000000000000000
    Internal error: Oops: 0000000096000007 [#1]  SMP
    Modules linked in:
    CPU: 0 UID: 0 PID: 303 Comm: kworker/0:1 Not tainted 6.19.0-rc1-00002-gda407d200220 #34 PREEMPT
    Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
    Workqueue: kacpi_notify acpi_os_execute_deferred
    pstate: 214020c5 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
    pc : hex_dump_to_buffer+0x30c/0x4a0
    lr : hex_dump_to_buffer+0x328/0x4a0
    sp : ffff800080e13880
    x29: ffff800080e13880 x28: ffffac9aba86f6a8 x27: 0000000000000083
    x26: fff00000f9b3fffc x25: 0000000000000004 x24: 0000000000000004
    x23: ffff800080e13905 x22: 0000000000000010 x21: 0000000000000083
    x20: 0000000000000001 x19: 0000000000000008 x18: 0000000000000010
    x17: 0000000000000001 x16: 00000007c7f20fec x15: 0000000000000020
    x14: 0000000000000008 x13: 0000000000081020 x12: 0000000000000008
    x11: ffff800080e13905 x10: ffff800080e13988 x9 : 0000000000000000
    x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000020
    x5 : 0000000000000030 x4 : 00000000fffffffe x3 : 0000000000000000
    x2 : ffffac9aba78c1c8 x1 : ffffac9aba76d0a8 x0 : 0000000000000008
    Call trace:
     hex_dump_to_buffer+0x30c/0x4a0 (P)
     print_hex_dump+0xac/0x170
     cper_estatus_print_section+0x90c/0x968
     cper_estatus_print+0xf0/0x158
     __ghes_print_estatus+0xa0/0x148
     ghes_proc+0x1bc/0x220
     ghes_notify_hed+0x5c/0xb8
     notifier_call_chain+0x78/0x148
     blocking_notifier_call_chain+0x4c/0x80
     acpi_hed_notify+0x28/0x40
     acpi_ev_notify_dispatch+0x50/0x80
     acpi_os_execute_deferred+0x24/0x48
     process_one_work+0x15c/0x3b0
     worker_thread+0x2d0/0x400
     kthread+0x148/0x228
     ret_from_fork+0x10/0x20
    Code: 6b14033f 540001ad a94707e2 f100029f (b8747b44)
    ---[ end trace 0000000000000000 ]---

Prevent that by taking the actual allocated area into account when
checking for CPER length.
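
As an illustration only, the bound check described above can be modelled
as follows. This is a sketch, not the kernel code: usable_length() and
its parameters are hypothetical names, and only the 64KB cap is taken
from the text above.

```python
# Hypothetical model of the length check; not the kernel implementation.
GHES_ESTATUS_MAX_SIZE = 64 * 1024  # 64KB cap named in the commit message


def usable_length(reported_len, allocated_len):
    """Return reported_len if it fits the allocated buffer, else None."""
    # The effective limit is the smaller of the global cap and the
    # number of bytes that were really allocated for this record.
    limit = min(allocated_len, GHES_ESTATUS_MAX_SIZE)
    if reported_len in range(limit + 1):
        return reported_len
    return None
```

With an 8192-byte allocation, a reported length of 16384 is rejected
even though it is below the 64KB cap.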

Signed-off-by: Mauro Carvalho Chehab &lt;mchehab+huawei@kernel.org&gt;
Reviewed-by: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Reviewed-by: Hanjun Guo &lt;guohanjun@huawei.com&gt;
[ rjw: Subject tweaks ]
Link: https://patch.msgid.link/4e70310a816577fabf37d94ed36cde4ad62b1e0a.1767871950.git.mchehab+huawei@kernel.org
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs</title>
<updated>2026-01-11T14:21:40+00:00</updated>
<author>
<name>Mauro Carvalho Chehab</name>
<email>mchehab+huawei@kernel.org</email>
</author>
<published>2025-08-14T16:52:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=94a4c58d7195c360f6719506931bbe3186ae91f4'/>
<id>urn:sha1:94a4c58d7195c360f6719506931bbe3186ae91f4</id>
<content type='text'>
[ Upstream commit 96b010536ee020e716d28d9b359a4bcd18800aeb ]

Up to UEFI spec 2.9, the type byte of the CPER struct for ARM
processors was defined simply as:

Type at byte offset 4:

	- Cache error
	- TLB Error
	- Bus Error
	- Micro-architectural Error
	All other values are reserved

Yet, there was no information about how this would be encoded.

Spec 2.9A errata corrected it by defining:

	- Bit 1 - Cache Error
	- Bit 2 - TLB Error
	- Bit 3 - Bus Error
	- Bit 4 - Micro-architectural Error
	All other values are reserved

That actually aligns with the values already defined in older
versions at N.2.4.1. Generic Processor Error Section.

Spec 2.10 also preserves the same encoding as 2.9A.

Adjust CPER and GHES handling code for both generic and ARM
processors to properly handle UEFI 2.9A and 2.10 encoding.
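
For illustration, a bitmask encoding like the 2.9A one can be decoded as
sketched below. The class and function names are hypothetical, and the
values assume zero-based bit numbering (bit 1 has value 0x02); this is
not the kernel code.

```python
import enum


class ArmErrType(enum.IntFlag):
    """Hypothetical names; values assume zero-based bit numbering."""
    CACHE = 0x02       # bit 1
    TLB = 0x04         # bit 2
    BUS = 0x08         # bit 3
    MICRO_ARCH = 0x10  # bit 4


def decode_type(type_byte):
    """List the names of every defined error bit set in type_byte."""
    flags = ArmErrType(type_byte)
    return [member.name for member in ArmErrType if member in flags]
```

Because the field is a bitmask, one record can report several error
types at once: decode_type(0x0a) lists both CACHE and BUS.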

Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information
Signed-off-by: Mauro Carvalho Chehab &lt;mchehab+huawei@kernel.org&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Acked-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ACPI: APEI: GHES: add TAINT_MACHINE_CHECK on GHES panic path</title>
<updated>2025-08-28T14:28:17+00:00</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2025-07-02T15:39:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d9d611639beab4dd0322b714f58d0fe0f46104c3'/>
<id>urn:sha1:d9d611639beab4dd0322b714f58d0fe0f46104c3</id>
<content type='text'>
[ Upstream commit 4734c8b46b901cff2feda8b82abc710b65dc31c1 ]

When a GHES (Generic Hardware Error Source) triggers a panic, add the
TAINT_MACHINE_CHECK taint flag to the kernel. This explicitly marks the
kernel as tainted due to a machine check event, improving diagnostics
and post-mortem analysis. The taint is set with LOCKDEP_STILL_OK to
indicate lockdep remains valid.

In large-scale deployments, this helps to quickly identify panics
caused by hardware failures.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://patch.msgid.link/20250702-add_tain-v1-1-9187b10914b9@debian.org
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered</title>
<updated>2025-08-28T14:28:16+00:00</updated>
<author>
<name>Shuai Xue</name>
<email>xueshuai@linux.alibaba.com</email>
</author>
<published>2025-07-14T11:42:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=082735fbcdb6cd0cf20fbec94516ab2996f1cdd5'/>
<id>urn:sha1:082735fbcdb6cd0cf20fbec94516ab2996f1cdd5</id>
<content type='text'>
[ Upstream commit 79a5ae3c4c5eb7e38e0ebe4d6bf602d296080060 ]

If a synchronous error is detected as a result of a user-space process
triggering a 2-bit uncorrected error, the CPU will take a synchronous
error exception such as Synchronous External Abort (SEA) on Arm64. The
kernel will queue a memory_failure() work which poisons the related
page, unmaps the page, and then sends a SIGBUS to the process, so that
a system wide panic can be avoided.

However, no memory_failure() work will be queued when abnormal
synchronous errors occur. These errors can include situations like
invalid PA, unexpected severity, no memory failure config support,
invalid GUID section, etc. In such a case, the user-space process will
trigger SEA again.  This loop can potentially exceed the platform
firmware threshold or even trigger a kernel hard lockup, leading to a
system reboot.

Fix it by performing a force kill if no memory_failure() work is queued
for synchronous errors.

Signed-off-by: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Reviewed-by: Jarkko Sakkinen &lt;jarkko@kernel.org&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Reviewed-by: Jane Chu &lt;jane.chu@oracle.com&gt;
Reviewed-by: Hanjun Guo &lt;guohanjun@huawei.com&gt;
Link: https://patch.msgid.link/20250714114212.31660-2-xueshuai@linux.alibaba.com
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>firmware: SDEI: Allow sdei initialization without ACPI_APEI_GHES</title>
<updated>2025-06-19T13:28:08+00:00</updated>
<author>
<name>Huang Yiwei</name>
<email>quic_hyiwei@quicinc.com</email>
</author>
<published>2025-05-07T04:57:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=99d4011a0a322e3eb0a7828630145311b8a0b58e'/>
<id>urn:sha1:99d4011a0a322e3eb0a7828630145311b8a0b58e</id>
<content type='text'>
[ Upstream commit 59529bbe642de4eb2191a541d9b4bae7eb73862e ]

SDEI usually initializes with the ACPI table, but on platforms where
ACPI is not used, the SDEI feature can still be used to handle
specific firmware calls or other customized purposes. Therefore, it
is not necessary for ARM_SDE_INTERFACE to depend on ACPI_APEI_GHES.

In commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES
in acpi_init()"), to make APEI ready earlier, sdei_init was moved
into acpi_ghes_init instead of being a standalone initcall, adding
ACPI_APEI_GHES dependency to ARM_SDE_INTERFACE. This restricts the
flexibility and usability of SDEI.

This patch corrects the dependency in Kconfig and splits sdei_init()
into two separate functions: sdei_init() and acpi_sdei_init().
sdei_init() will be called by arch_initcall and will only initialize
the platform driver, while acpi_sdei_init() will initialize the
device from acpi_ghes_init() when ACPI is ready. This allows the
initialization of SDEI without ACPI_APEI_GHES enabled.

Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
Cc: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Signed-off-by: Huang Yiwei &lt;quic_hyiwei@quicinc.com&gt;
Reviewed-by: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Reviewed-by: Gavin Shan &lt;gshan@redhat.com&gt;
Acked-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Link: https://lore.kernel.org/r/20250507045757.2658795-1-quic_hyiwei@quicinc.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>APEI: GHES: Have GHES honor the panic= setting</title>
<updated>2025-02-17T08:40:08+00:00</updated>
<author>
<name>Borislav Petkov</name>
<email>bp@alien8.de</email>
</author>
<published>2025-01-13T12:52:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c09a05b3a3941578a588b49e12c55e84f2ae75cb'/>
<id>urn:sha1:c09a05b3a3941578a588b49e12c55e84f2ae75cb</id>
<content type='text'>
[ Upstream commit 5c0e00a391dd0099fe95991bb2f962848d851916 ]

The GHES driver overrides the panic= setting by force-rebooting the
system after a fatal hw error has been reported. The intent is that
such an error would be reported earlier.

However, this is not optimal when a hard-to-debug issue takes a long
time to reproduce: when it finally happens, the box gets rebooted after
30 seconds, destroying the whole hw context of when the error
happened.

So rip out the default GHES panic timeout and honor the global one.

In the panic disabled (panic=0) case, the error will still be logged to
dmesg for later inspection and if panic after a hw error is really
required, then that can be controlled the usual way - use panic= on the
cmdline or set it in the kernel .config's CONFIG_PANIC_TIMEOUT.

Reported-by: Feng Tang &lt;feng.tang@linux.alibaba.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Feng Tang &lt;feng.tang@linux.alibaba.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Link: https://patch.msgid.link/20250113125224.GFZ4UMiNtWIJvgpveU@fat_crate.local
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events</title>
<updated>2024-02-05T20:14:15+00:00</updated>
<author>
<name>Shuai Xue</name>
<email>xueshuai@linux.alibaba.com</email>
</author>
<published>2023-12-18T06:45:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=410063c9e100cb694f83591704fa69f28696dfa1'/>
<id>urn:sha1:410063c9e100cb694f83591704fa69f28696dfa1</id>
<content type='text'>
[ Upstream commit a70297d2213253853e95f5b49651f924990c6d3b ]

There are two major types of uncorrected recoverable (UCR) errors:

 - Synchronous error: The error is detected and raised at the point of
   the consumption in the execution flow, e.g. when a CPU tries to
   access a poisoned cache line. The CPU will take a synchronous error
   exception such as Synchronous External Abort (SEA) on Arm64 and
   Machine Check Exception (MCE) on X86. The OS is required to take
   action (for example, offline the failing page or kill the failing
   thread) to recover from this uncorrectable error.

 - Asynchronous error: The error is detected out of processor execution
   context, e.g. when an error is detected by a background scrubber.
   Some data in memory is corrupted, but it has not been consumed. The
   OS may optionally take action to recover from this uncorrectable
   error.

When APEI firmware-first is enabled, a platform may describe one error
source for handling synchronous errors (e.g. MCE or SEA notification)
or for handling asynchronous errors (e.g. SCI or External Interrupt
notification). In other words, we can distinguish synchronous errors by
APEI notification. For synchronous errors, the kernel will kill the
current process that is accessing the poisoned page by sending SIGBUS
with BUS_MCEERR_AR. For asynchronous errors, the kernel will notify the
process that owns the poisoned page by sending SIGBUS with
BUS_MCEERR_AO in early kill mode. However, the GHES driver always sets
mf_flags to 0, so all synchronous errors are handled as asynchronous
errors in memory failure.

To this end, set memory failure flags as MF_ACTION_REQUIRED on synchronous
events.
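
A minimal sketch of that mapping, for illustration only. The set of
synchronous notification types is taken from the examples above and may
not match the check the kernel performs; the function name and the
string labels are hypothetical stand-ins for the real flag values.

```python
# Notification types the text above gives as synchronous examples.
SYNC_NOTIFY_TYPES = frozenset({"SEA", "MCE"})


def memory_failure_flags(notify_type):
    """Return flag names for memory failure handling (labels only)."""
    if notify_type in SYNC_NOTIFY_TYPES:
        # Synchronous consumption: the current task must be acted on.
        return ["MF_ACTION_REQUIRED"]
    # Asynchronous detection: no flags, as before the change.
    return []
```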

Signed-off-by: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Tested-by: Ma Wupeng &lt;mawupeng1@huawei.com&gt;
Reviewed-by: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Reviewed-by: Xiaofei Tan &lt;tanxiaofei@huawei.com&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: James Morse &lt;james.morse@arm.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ACPI: APEI: Fix AER info corruption when error status data has multiple sections</title>
<updated>2023-11-28T17:19:37+00:00</updated>
<author>
<name>Shiju Jose</name>
<email>shiju.jose@huawei.com</email>
</author>
<published>2023-09-20T18:03:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b7765b0a034553018f0d815e27f3e9d4178a31a5'/>
<id>urn:sha1:b7765b0a034553018f0d815e27f3e9d4178a31a5</id>
<content type='text'>
[ Upstream commit e2abc47a5a1a9f641e7cacdca643fdd40729bf6e ]

ghes_handle_aer() passes AER data to the PCI core for logging and
recovery by calling aer_recover_queue() with a pointer to struct
aer_capability_regs.

The problem was that aer_recover_queue() queues the pointer directly
without copying the aer_capability_regs data.  The pointer was to
the ghes-&gt;estatus buffer, which could be reused before
aer_recover_work_func() reads the data.

To avoid this problem, allocate a new aer_capability_regs structure
from the ghes_estatus_pool, copy the AER data from the ghes-&gt;estatus
buffer into it, pass a pointer to the new struct to
aer_recover_queue(), and free it after aer_recover_work_func() has
processed it.
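
The hazard and the fix can be modelled in a few lines. This is a toy
sketch, not the kernel code: the dictionary stands in for the shared
estatus buffer, the deque for the recovery queue, and all names are
hypothetical.

```python
import copy
from collections import deque


def queue_then_reuse(take_copy):
    """Queue a record, overwrite the shared buffer, then read the queue."""
    shared_buffer = {"uncor_status": 0x4000}  # stands in for the estatus buffer
    work_queue = deque()
    # Either queue a private snapshot or a reference to the shared buffer.
    entry = copy.deepcopy(shared_buffer) if take_copy else shared_buffer
    work_queue.append(entry)
    shared_buffer["uncor_status"] = 0  # buffer reused by a later error
    # The deferred worker only runs now, after the buffer was reused.
    return work_queue.popleft()["uncor_status"]
```

Queuing the reference makes the worker read the overwritten value 0,
while queuing a copy preserves the original 0x4000.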

Reported-by: Bjorn Helgaas &lt;helgaas@kernel.org&gt;
Acked-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Signed-off-by: Shiju Jose &lt;shiju.jose@huawei.com&gt;
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>APEI: GHES: correctly return NULL for ghes_get_devices()</title>
<updated>2023-06-12T17:31:48+00:00</updated>
<author>
<name>Li Yang</name>
<email>leoyang.li@nxp.com</email>
</author>
<published>2023-05-19T20:12:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9368aa1882ac7178adcd936cee5f0899dbf76dc4'/>
<id>urn:sha1:9368aa1882ac7178adcd936cee5f0899dbf76dc4</id>
<content type='text'>
Since 315bada690e0 ("EDAC: Check for GHES preference in the
chipset-specific EDAC drivers"), vendor-specific EDAC drivers will not
probe correctly when CONFIG_ACPI_APEI_GHES is enabled but no GHES device
is present.  Make ghes_get_devices() return NULL when the GHES device
list is empty to fix the problem.

Fixes: 9057a3f7ac36 ("EDAC/ghes: Prepare to make ghes_edac a proper module")
Signed-off-by: Li Yang &lt;leoyang.li@nxp.com&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>ACPI: APEI: mark bert_disable as __initdata</title>
<updated>2023-06-12T17:23:25+00:00</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2023-06-06T12:28:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d38f6bcea88836bfb346a6d567c452065417e2ed'/>
<id>urn:sha1:d38f6bcea88836bfb346a6d567c452065417e2ed</id>
<content type='text'>
It's only used inside the __init section. Mark it __initdata.

Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
</feed>
