<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/edac/mce_amd.c, branch v6.12.91</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.91</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.91'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-02-15T21:07:37+00:00</updated>
<entry>
<title>x86/cpu/amd: Provide a separate accessor for Node ID</title>
<updated>2024-02-15T21:07:37+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2024-02-13T21:04:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7e3ec6286753b404666af9a58d283690302c9321'/>
<id>urn:sha1:7e3ec6286753b404666af9a58d283690302c9321</id>
<content type='text'>
AMD (ab)uses topology_die_id() to store the Node ID information and
topology_max_dies_per_pkg to store the number of nodes per package.

This collides with the proper processor die level enumeration which is
coming on AMD with CPUID 8000_0026, unless there is a correlation between
the two. There is zero documentation about that.

So provide new storage and new accessors which for now still access die_id
and topology_max_die_per_pkg(). Will be mopped up after AMD and HYGON are
converted over.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Juergen Gross &lt;jgross@suse.com&gt;
Tested-by: Sohil Mehta &lt;sohil.mehta@intel.com&gt;
Tested-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Tested-by: Zhang Rui &lt;rui.zhang@intel.com&gt;
Tested-by: Wang Wendy &lt;wendy.wang@intel.com&gt;
Tested-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Link: https://lore.kernel.org/r/20240212153624.956116738@linutronix.de

</content>
</entry>
<entry>
<title>EDAC/mce_amd: Remove SMCA Extended Error code descriptions</title>
<updated>2023-11-28T14:17:09+00:00</updated>
<author>
<name>Muralidhara M K</name>
<email>muralidhara.mk@amd.com</email>
</author>
<published>2023-11-02T11:42:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9f988030e85fafa2b03910d467302853ad29a300'/>
<id>urn:sha1:9f988030e85fafa2b03910d467302853ad29a300</id>
<content type='text'>
On AMD systems with Scalable MCA each machine check error of a SMCA bank
type has an associated bit position in the bank's control (CTL)
register.

An error's bit position in the CTL register is used during error decoding
for offsetting into the corresponding bank's error description structure.
As new errors are being added in newer AMD systems for existing SMCA bank
types, the underlying SMCA architecture guarantees that the bit positions
of existing errors are not altered.

However, on some AMD systems some of the existing bit definitions in the
CTL register of SMCA bank type are reassigned without defining new HWID
and McaType. Consequently, the errors whose bit definitions have been
reassigned in the CTL register are being erroneously decoded.

Remove SMCA Extended Error Code descriptions, this avoids decoding
issues for incorrectly reassigned bits, and avoids the related
maintenance burden in the kernel. But the bank type and Extended Error
Code value for an error will continue to be printed as a convenience.

The decoding of SMCA Extended Error Code description can be done by
referring to AMD documentation or use external tools such as rasdaemon.

Offline decoding can be done using below option in rasdaemon. For example:

  $ rasdaemon -p --status &lt;STATUS&gt; --ipid &lt;IPID&gt; --smca

Also, the user can pass particular family and model to decode the error
string.

$ rasdaemon -p --status &lt;STATUS&gt; --ipid &lt;IPID&gt; --smca --family &lt;CPU Family&gt;
	--model &lt;CPU Model&gt; --bank &lt;BANK_NUM&gt;

Refer to the rasdaemon commit for details:

  https://github.com/mchehab/rasdaemon/commit/932118b04a04104dfac6b8536

Signed-off-by: Muralidhara M K &lt;muralidhara.mk@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Link: https://lore.kernel.org/r/20231102114225.2006878-2-muralimk@amd.com
</content>
</entry>
<entry>
<title>x86/mce/amd, EDAC/mce_amd: Move long names to decoder module</title>
<updated>2023-11-27T11:16:51+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2023-11-18T19:32:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ff03ff328fbd0a2b3a43e8b9bbc2a1d84265e77e'/>
<id>urn:sha1:ff03ff328fbd0a2b3a43e8b9bbc2a1d84265e77e</id>
<content type='text'>
The long names of the SMCA banks are only used by the MCE decoder
module.

Move them out of the arch code and into the decoder module.

  [ bp: Name the long names array "smca_long_names", drop local ptr in
    decode_smca_error(), constify arrays. ]

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20231118193248.1296798-5-yazen.ghannam@amd.com
</content>
</entry>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors</title>
<updated>2023-06-05T10:27:11+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2023-05-15T11:35:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c35977b00fa76ce5f3fe9afdb9cffda970c943d5'/>
<id>urn:sha1:c35977b00fa76ce5f3fe9afdb9cffda970c943d5</id>
<content type='text'>
The MI200 (Aldebaran) series of devices introduced a new SMCA bank type
for Unified Memory Controllers. The MCE subsystem already has support
for this new type. The MCE decoder module will decode the common MCA
error information for the new bank type, but it will not pass the
information to the AMD64 EDAC module for detailed memory error decoding.

Have the MCE decoder module recognize the new bank type as an SMCA UMC
memory error and pass the MCA information to AMD64 EDAC.

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Co-developed-by: Muralidhara M K &lt;muralidhara.mk@amd.com&gt;
Signed-off-by: Muralidhara M K &lt;muralidhara.mk@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20230515113537.1052146-3-muralimk@amd.com
</content>
</entry>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration</title>
<updated>2021-12-22T16:22:09+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2021-12-16T16:29:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=91f75eb481cfaee5c4ed8fb5214bf2fbfa04bd7b'/>
<id>urn:sha1:91f75eb481cfaee5c4ed8fb5214bf2fbfa04bd7b</id>
<content type='text'>
AMD systems currently lay out MCA bank types such that the type of bank
number "i" is either the same across all CPUs or is Reserved/Read-as-Zero.

For example:

  Bank # | CPUx | CPUy
    0      LS     LS
    1      RAZ    UMC
    2      CS     CS
    3      SMU    RAZ

Future AMD systems will lay out MCA bank types such that the type of
bank number "i" may be different across CPUs.

For example:

  Bank # | CPUx | CPUy
    0      LS     LS
    1      RAZ    UMC
    2      CS     NBIO
    3      SMU    RAZ

Change the structures that cache MCA bank types to be per-CPU and update
smca_get_bank_type() to handle this change.

Move some SMCA-specific structures to amd.c from mce.h, since they no
longer need to be global.

Break out the "count" for bank types from struct smca_hwid, since this
should provide a per-CPU count rather than a system-wide count.

Apply the "const" qualifier to the struct smca_hwid_mcatypes array. The
values in this array should not change at runtime.

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lore.kernel.org/r/20211216162905.4132657-3-yazen.ghannam@amd.com
</content>
</entry>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types</title>
<updated>2021-12-22T16:19:18+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2021-12-16T16:29:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5176a93ab27aef1b9f4496fc68e6c303a011d7cc'/>
<id>urn:sha1:5176a93ab27aef1b9f4496fc68e6c303a011d7cc</id>
<content type='text'>
Add HWID and McaType values for new SMCA bank types, and add their error
descriptions to edac_mce_amd.

The "PHY" bank types all have the same error descriptions, and the NBIF
and SHUB bank types have the same error descriptions. So reuse the same
arrays where appropriate.

  [ bp: Remove useless comments over hwid types. ]

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lore.kernel.org/r/20211216162905.4132657-2-yazen.ghannam@amd.com
</content>
</entry>
<entry>
<title>EDAC/mce_amd: Do not load edac_mce_amd module on guests</title>
<updated>2021-08-09T10:35:43+00:00</updated>
<author>
<name>Smita Koralahalli</name>
<email>Smita.KoralahalliChannabasappa@amd.com</email>
</author>
<published>2021-06-28T17:27:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=767f4b620edadac579c9b8b6660761d4285fa6f9'/>
<id>urn:sha1:767f4b620edadac579c9b8b6660761d4285fa6f9</id>
<content type='text'>
Hypervisors likely do not expose the SMCA feature to the guest and
loading this module leads to false warnings. This module should not be
loaded in guests to begin with, but people tend to do so, especially
when testing kernels in VMs. And then they complain about those false
warnings.

Do the practical thing and do not load this module when running as a
guest to avoid all that complaining.

 [ bp: Rewrite commit message. ]

Suggested-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Smita Koralahalli &lt;Smita.KoralahalliChannabasappa@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Tested-by: Kim Phillips &lt;kim.phillips@amd.com&gt;
Link: https://lkml.kernel.org/r/20210628172740.245689-1-Smita.KoralahalliChannabasappa@amd.com
</content>
</entry>
<entry>
<title>EDAC/mce_amd: Fix typo "FIfo" -&gt; "Fifo"</title>
<updated>2021-06-04T13:44:25+00:00</updated>
<author>
<name>Colin Ian King</name>
<email>colin.king@canonical.com</email>
</author>
<published>2021-06-03T10:33:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=429b2ba70812fc8ce7c591e787ec0f2b48d13319'/>
<id>urn:sha1:429b2ba70812fc8ce7c591e787ec0f2b48d13319</id>
<content type='text'>
There is an uppercase letter I in one of the MCE error descriptions
instead of a lowercase one. Fix it.

Signed-off-by: Colin Ian King &lt;colin.king@canonical.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Link: https://lkml.kernel.org/r/20210603103349.79117-1-colin.king@canonical.com
</content>
</entry>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types</title>
<updated>2021-05-27T18:08:14+00:00</updated>
<author>
<name>Muralidhara M K</name>
<email>muralimk@amd.com</email>
</author>
<published>2021-05-26T16:46:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=94a311ce248e0b53c76e110fd00511af47b72ffb'/>
<id>urn:sha1:94a311ce248e0b53c76e110fd00511af47b72ffb</id>
<content type='text'>
Add the (HWID, MCATYPE) tuples and names for new SMCA bank types.

Also, add their respective error descriptions to the MCE decoding module
edac_mce_amd. Also while at it, optimize the string names for some SMCA
banks.

 [ bp: Drop repeated comments, explain why UMC_V2 is a separate entry. ]

Signed-off-by: Muralidhara M K &lt;muralimk@amd.com&gt;
Signed-off-by: Naveen Krishna Chatradhi  &lt;nchatrad@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Link: https://lkml.kernel.org/r/20210526164601.66228-1-nchatrad@amd.com
</content>
</entry>
<entry>
<title>EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId</title>
<updated>2020-11-19T10:43:21+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2020-11-09T21:06:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8de0c9917cc1297bc5543b61992d5bdee4ce621a'/>
<id>urn:sha1:8de0c9917cc1297bc5543b61992d5bdee4ce621a</id>
<content type='text'>
The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and
later systems. This function is used in amd64_edac_mod to do
system-specific decoding for DRAM ECC errors. The function takes a
"NodeId" as a parameter.

In AMD documentation, NodeId is used to identify a physical die in a
system. This can be used to identify a node in the AMD_NB code and also
it is used with umc_normaddr_to_sysaddr().

However, the input used for decode_dram_ecc() is currently the NUMA node
of a logical CPU. In the default configuration, the NUMA node and
physical die will be equivalent, so this doesn't have an impact.

But the NUMA node configuration can be adjusted with optional memory
interleaving modes. This will cause the NUMA node enumeration to not
match the physical die enumeration. The mismatch will cause the address
translation function to fail or report incorrect results.

Use struct cpuinfo_x86.cpu_die_id for the node_id parameter to ensure the
physical ID is used.

Fixes: fbe63acf62f5 ("EDAC, mce_amd: Use cpu_to_node() to find the node ID")
Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20201109210659.754018-4-Yazen.Ghannam@amd.com
</content>
</entry>
</feed>
