<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/edac/mce_amd.c, branch v6.19.12</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.12</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.12'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-05-02T08:23:47+00:00</updated>
<entry>
<title>x86/msr: Add explicit includes of &lt;asm/msr.h&gt;</title>
<updated>2025-05-02T08:23:47+00:00</updated>
<author>
<name>Xin Li (Intel)</name>
<email>xin@zytor.com</email>
</author>
<published>2025-05-01T05:42:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=efef7f184f2eaf29a1ca676712d0e6e851cd0191'/>
<id>urn:sha1:efef7f184f2eaf29a1ca676712d0e6e851cd0191</id>
<content type='text'>
For historic reasons there are some TSC-related functions in the
&lt;asm/msr.h&gt; header, even though there's an &lt;asm/tsc.h&gt; header.

To facilitate the relocation of rdtsc{,_ordered}() from &lt;asm/msr.h&gt;
to &lt;asm/tsc.h&gt; and to eventually eliminate the inclusion of
&lt;asm/msr.h&gt; in &lt;asm/tsc.h&gt;, add an explicit &lt;asm/msr.h&gt; dependency
to the source files that reference definitions from &lt;asm/msr.h&gt;.

[ mingo: Clarified the changelog. ]

Signed-off-by: Xin Li (Intel) &lt;xin@zytor.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Ilpo Järvinen &lt;ilpo.jarvinen@linux.intel.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com
</content>
</entry>
<entry>
<title>EDAC/mce_amd: Add support for FRU text in MCA</title>
<updated>2024-10-31T09:53:04+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2024-10-22T19:36:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=612c2addff367ee461dc99ffca2bc786f105d2ec'/>
<id>urn:sha1:612c2addff367ee461dc99ffca2bc786f105d2ec</id>
<content type='text'>
A new "FRU Text in MCA" feature is defined where the Field Replaceable
Unit (FRU) Text for a device is represented by a string in the new
MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).

The FRU Text is populated dynamically for each individual error state
(MCA_STATUS, MCA_ADDR, et al.). Handle the case where an MCA bank covers
multiple devices, for example, a Unified Memory Controller (UMC) bank
that manages two DIMMs.

  [ Yazen: Add Avadhut as co-developer for wrapper changes. ]
  [ bp: Do not expose MCA_CONFIG to userspace yet. ]

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Co-developed-by: Avadhut Naik &lt;avadhut.naik@amd.com&gt;
Signed-off-by: Avadhut Naik &lt;avadhut.naik@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com
</content>
</entry>
<entry>
<title>x86/MCE/AMD: Add support for new MCA_SYND{1,2} registers</title>
<updated>2024-10-31T09:36:07+00:00</updated>
<author>
<name>Avadhut Naik</name>
<email>avadhut.naik@amd.com</email>
</author>
<published>2024-10-22T19:36:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d4fca1358ea9096f2f6ed942e2cb3a820073dfc1'/>
<id>urn:sha1:d4fca1358ea9096f2f6ed942e2cb3a820073dfc1</id>
<content type='text'>
Starting with Zen4, AMD's Scalable MCA systems incorporate two new registers:
MCA_SYND1 and MCA_SYND2.

These registers will include supplemental error information in addition to the
existing MCA_SYND register. The data within these registers is considered
valid if MCA_STATUS[SyndV] is set.

Userspace error decoding tools like rasdaemon gather related hardware error
information through the tracepoints.

Therefore, export these two registers through the mce_record tracepoint so
that tools like rasdaemon can parse them and output the supplemental error
information like FRU text contained in them.

  [ bp: Massage. ]

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Avadhut Naik &lt;avadhut.naik@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Link: https://lore.kernel.org/r/20241022194158.110073-4-avadhut.naik@amd.com
</content>
</entry>
<entry>
<title>x86/cpu/amd: Provide a separate accessor for Node ID</title>
<updated>2024-02-15T21:07:37+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2024-02-13T21:04:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7e3ec6286753b404666af9a58d283690302c9321'/>
<id>urn:sha1:7e3ec6286753b404666af9a58d283690302c9321</id>
<content type='text'>
AMD (ab)uses topology_die_id() to store the Node ID information and
topology_max_dies_per_pkg to store the number of nodes per package.

This collides with the proper processor die level enumeration which is
coming on AMD with CPUID 8000_0026, unless there is a correlation between
the two. There is zero documentation about that.

So provide new storage and new accessors which for now still access die_id
and topology_max_die_per_pkg(). Will be mopped up after AMD and HYGON are
converted over.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Juergen Gross &lt;jgross@suse.com&gt;
Tested-by: Sohil Mehta &lt;sohil.mehta@intel.com&gt;
Tested-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Tested-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Tested-by: Wang Wendy &lt;wendy.wang@intel.com&gt;
Tested-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Link: https://lore.kernel.org/r/20240212153624.956116738@linutronix.de

</content>
</entry>
<entry>
<title>EDAC/mce_amd: Remove SMCA Extended Error code descriptions</title>
<updated>2023-11-28T14:17:09+00:00</updated>
<author>
<name>Muralidhara M K</name>
<email>muralidhara.mk@amd.com</email>
</author>
<published>2023-11-02T11:42:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9f988030e85fafa2b03910d467302853ad29a300'/>
<id>urn:sha1:9f988030e85fafa2b03910d467302853ad29a300</id>
<content type='text'>
On AMD systems with Scalable MCA, each machine check error of an SMCA bank
type has an associated bit position in the bank's control (CTL)
register.

An error's bit position in the CTL register is used during error decoding
for offsetting into the corresponding bank's error description structure.
As new errors are being added in newer AMD systems for existing SMCA bank
types, the underlying SMCA architecture guarantees that the bit positions
of existing errors are not altered.

However, on some AMD systems, some of the existing bit definitions in the
CTL register of an SMCA bank type are reassigned without defining a new HWID
and McaType. Consequently, the errors whose bit definitions have been
reassigned in the CTL register are being erroneously decoded.

Remove the SMCA Extended Error Code descriptions. This avoids decoding
issues for incorrectly reassigned bits and avoids the related
maintenance burden in the kernel. The bank type and Extended Error
Code value for an error will continue to be printed as a convenience.

The SMCA Extended Error Code descriptions can be decoded by referring to
AMD documentation or by using external tools such as rasdaemon.

Offline decoding can be done using the rasdaemon option below. For example:

  $ rasdaemon -p --status &lt;STATUS&gt; --ipid &lt;IPID&gt; --smca

Also, the user can pass a particular family and model to decode the error
string:

  $ rasdaemon -p --status &lt;STATUS&gt; --ipid &lt;IPID&gt; --smca --family &lt;CPU Family&gt; \
      --model &lt;CPU Model&gt; --bank &lt;BANK_NUM&gt;

Refer to the rasdaemon commit for details:

  https://github.com/mchehab/rasdaemon/commit/932118b04a04104dfac6b8536

Signed-off-by: Muralidhara M K &lt;muralidhara.mk@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Link: https://lore.kernel.org/r/20231102114225.2006878-2-muralimk@amd.com
</content>
</entry>
<entry>
<title>x86/mce/amd, EDAC/mce_amd: Move long names to decoder module</title>
<updated>2023-11-27T11:16:51+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2023-11-18T19:32:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ff03ff328fbd0a2b3a43e8b9bbc2a1d84265e77e'/>
<id>urn:sha1:ff03ff328fbd0a2b3a43e8b9bbc2a1d84265e77e</id>
<content type='text'>
The long names of the SMCA banks are only used by the MCE decoder
module.

Move them out of the arch code and into the decoder module.

  [ bp: Name the long names array "smca_long_names", drop local ptr in
    decode_smca_error(), constify arrays. ]

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20231118193248.1296798-5-yazen.ghannam@amd.com
</content>
</entry>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors</title>
<updated>2023-06-05T10:27:11+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2023-05-15T11:35:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c35977b00fa76ce5f3fe9afdb9cffda970c943d5'/>
<id>urn:sha1:c35977b00fa76ce5f3fe9afdb9cffda970c943d5</id>
<content type='text'>
The MI200 (Aldebaran) series of devices introduced a new SMCA bank type
for Unified Memory Controllers. The MCE subsystem already has support
for this new type. The MCE decoder module will decode the common MCA
error information for the new bank type, but it will not pass the
information to the AMD64 EDAC module for detailed memory error decoding.

Have the MCE decoder module recognize the new bank type as an SMCA UMC
memory error and pass the MCA information to AMD64 EDAC.

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Co-developed-by: Muralidhara M K &lt;muralidhara.mk@amd.com&gt;
Signed-off-by: Muralidhara M K &lt;muralidhara.mk@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20230515113537.1052146-3-muralimk@amd.com
</content>
</entry>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration</title>
<updated>2021-12-22T16:22:09+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2021-12-16T16:29:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=91f75eb481cfaee5c4ed8fb5214bf2fbfa04bd7b'/>
<id>urn:sha1:91f75eb481cfaee5c4ed8fb5214bf2fbfa04bd7b</id>
<content type='text'>
AMD systems currently lay out MCA bank types such that the type of bank
number "i" is either the same across all CPUs or is Reserved/Read-as-Zero.

For example:

  Bank # | CPUx | CPUy
    0      LS     LS
    1      RAZ    UMC
    2      CS     CS
    3      SMU    RAZ

Future AMD systems will lay out MCA bank types such that the type of
bank number "i" may be different across CPUs.

For example:

  Bank # | CPUx | CPUy
    0      LS     LS
    1      RAZ    UMC
    2      CS     NBIO
    3      SMU    RAZ

Change the structures that cache MCA bank types to be per-CPU and update
smca_get_bank_type() to handle this change.

Move some SMCA-specific structures to amd.c from mce.h, since they no
longer need to be global.

Break out the "count" for bank types from struct smca_hwid, since this
should provide a per-CPU count rather than a system-wide count.

Apply the "const" qualifier to the struct smca_hwid_mcatypes array. The
values in this array should not change at runtime.

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lore.kernel.org/r/20211216162905.4132657-3-yazen.ghannam@amd.com
</content>
</entry>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types</title>
<updated>2021-12-22T16:19:18+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2021-12-16T16:29:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5176a93ab27aef1b9f4496fc68e6c303a011d7cc'/>
<id>urn:sha1:5176a93ab27aef1b9f4496fc68e6c303a011d7cc</id>
<content type='text'>
Add HWID and McaType values for new SMCA bank types, and add their error
descriptions to edac_mce_amd.

The "PHY" bank types all have the same error descriptions, and the NBIF
and SHUB bank types have the same error descriptions. So reuse the same
arrays where appropriate.

  [ bp: Remove useless comments over hwid types. ]

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lore.kernel.org/r/20211216162905.4132657-2-yazen.ghannam@amd.com
</content>
</entry>
<entry>
<title>EDAC/mce_amd: Do not load edac_mce_amd module on guests</title>
<updated>2021-08-09T10:35:43+00:00</updated>
<author>
<name>Smita Koralahalli</name>
<email>Smita.KoralahalliChannabasappa@amd.com</email>
</author>
<published>2021-06-28T17:27:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=767f4b620edadac579c9b8b6660761d4285fa6f9'/>
<id>urn:sha1:767f4b620edadac579c9b8b6660761d4285fa6f9</id>
<content type='text'>
Hypervisors likely do not expose the SMCA feature to the guest and
loading this module leads to false warnings. This module should not be
loaded in guests to begin with, but people tend to do so, especially
when testing kernels in VMs. And then they complain about those false
warnings.

Do the practical thing and do not load this module when running as a
guest to avoid all that complaining.

  [ bp: Rewrite commit message. ]

Suggested-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Smita Koralahalli &lt;Smita.KoralahalliChannabasappa@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Tested-by: Kim Phillips &lt;kim.phillips@amd.com&gt;
Link: https://lkml.kernel.org/r/20210628172740.245689-1-Smita.KoralahalliChannabasappa@amd.com
</content>
</entry>
</feed>
