<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/edac/mce_amd.c, branch v6.6.141</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.141</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.141'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-06-05T10:27:11+00:00</updated>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors</title>
<updated>2023-06-05T10:27:11+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2023-05-15T11:35:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c35977b00fa76ce5f3fe9afdb9cffda970c943d5'/>
<id>urn:sha1:c35977b00fa76ce5f3fe9afdb9cffda970c943d5</id>
<content type='text'>
The MI200 (Aldebaran) series of devices introduced a new SMCA bank type
for Unified Memory Controllers. The MCE subsystem already has support
for this new type. The MCE decoder module will decode the common MCA
error information for the new bank type, but it will not pass the
information to the AMD64 EDAC module for detailed memory error decoding.

Have the MCE decoder module recognize the new bank type as an SMCA UMC
memory error and pass the MCA information to AMD64 EDAC.

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Co-developed-by: Muralidhara M K &lt;muralidhara.mk@amd.com&gt;
Signed-off-by: Muralidhara M K &lt;muralidhara.mk@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20230515113537.1052146-3-muralimk@amd.com
</content>
</entry>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration</title>
<updated>2021-12-22T16:22:09+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2021-12-16T16:29:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=91f75eb481cfaee5c4ed8fb5214bf2fbfa04bd7b'/>
<id>urn:sha1:91f75eb481cfaee5c4ed8fb5214bf2fbfa04bd7b</id>
<content type='text'>
AMD systems currently lay out MCA bank types such that the type of bank
number "i" is either the same across all CPUs or is Reserved/Read-as-Zero.

For example:

  Bank # | CPUx | CPUy
    0      LS     LS
    1      RAZ    UMC
    2      CS     CS
    3      SMU    RAZ

Future AMD systems will lay out MCA bank types such that the type of
bank number "i" may be different across CPUs.

For example:

  Bank # | CPUx | CPUy
    0      LS     LS
    1      RAZ    UMC
    2      CS     NBIO
    3      SMU    RAZ

Change the structures that cache MCA bank types to be per-CPU and update
smca_get_bank_type() to handle this change.

Move some SMCA-specific structures to amd.c from mce.h, since they no
longer need to be global.

Break out the "count" for bank types from struct smca_hwid, since this
should provide a per-CPU count rather than a system-wide count.

Apply the "const" qualifier to the struct smca_hwid_mcatypes array. The
values in this array should not change at runtime.

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lore.kernel.org/r/20211216162905.4132657-3-yazen.ghannam@amd.com
</content>
</entry>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types</title>
<updated>2021-12-22T16:19:18+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2021-12-16T16:29:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5176a93ab27aef1b9f4496fc68e6c303a011d7cc'/>
<id>urn:sha1:5176a93ab27aef1b9f4496fc68e6c303a011d7cc</id>
<content type='text'>
Add HWID and McaType values for new SMCA bank types, and add their error
descriptions to edac_mce_amd.

The "PHY" bank types all have the same error descriptions, and the NBIF
and SHUB bank types have the same error descriptions. So reuse the same
arrays where appropriate.

  [ bp: Remove useless comments over hwid types. ]

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lore.kernel.org/r/20211216162905.4132657-2-yazen.ghannam@amd.com
</content>
</entry>
<entry>
<title>EDAC/mce_amd: Do not load edac_mce_amd module on guests</title>
<updated>2021-08-09T10:35:43+00:00</updated>
<author>
<name>Smita Koralahalli</name>
<email>Smita.KoralahalliChannabasappa@amd.com</email>
</author>
<published>2021-06-28T17:27:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=767f4b620edadac579c9b8b6660761d4285fa6f9'/>
<id>urn:sha1:767f4b620edadac579c9b8b6660761d4285fa6f9</id>
<content type='text'>
Hypervisors likely do not expose the SMCA feature to the guest and
loading this module leads to false warnings. This module should not be
loaded in guests to begin with, but people tend to do so, especially
when testing kernels in VMs. And then they complain about those false
warnings.

Do the practical thing and do not load this module when running as a
guest to avoid all that complaining.

 [ bp: Rewrite commit message. ]

Suggested-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Smita Koralahalli &lt;Smita.KoralahalliChannabasappa@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Tested-by: Kim Phillips &lt;kim.phillips@amd.com&gt;
Link: https://lkml.kernel.org/r/20210628172740.245689-1-Smita.KoralahalliChannabasappa@amd.com
</content>
</entry>
<entry>
<title>EDAC/mce_amd: Fix typo "FIfo" -&gt; "Fifo"</title>
<updated>2021-06-04T13:44:25+00:00</updated>
<author>
<name>Colin Ian King</name>
<email>colin.king@canonical.com</email>
</author>
<published>2021-06-03T10:33:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=429b2ba70812fc8ce7c591e787ec0f2b48d13319'/>
<id>urn:sha1:429b2ba70812fc8ce7c591e787ec0f2b48d13319</id>
<content type='text'>
There is an uppercase letter I in one of the MCE error descriptions
instead of a lowercase one. Fix it.

Signed-off-by: Colin Ian King &lt;colin.king@canonical.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Link: https://lkml.kernel.org/r/20210603103349.79117-1-colin.king@canonical.com
</content>
</entry>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types</title>
<updated>2021-05-27T18:08:14+00:00</updated>
<author>
<name>Muralidhara M K</name>
<email>muralimk@amd.com</email>
</author>
<published>2021-05-26T16:46:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=94a311ce248e0b53c76e110fd00511af47b72ffb'/>
<id>urn:sha1:94a311ce248e0b53c76e110fd00511af47b72ffb</id>
<content type='text'>
Add the (HWID, MCATYPE) tuples and names for new SMCA bank types.

Also, add their respective error descriptions to the MCE decoding module
edac_mce_amd. Also while at it, optimize the string names for some SMCA
banks.

 [ bp: Drop repeated comments, explain why UMC_V2 is a separate entry. ]

Signed-off-by: Muralidhara M K &lt;muralimk@amd.com&gt;
Signed-off-by: Naveen Krishna Chatradhi  &lt;nchatrad@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Link: https://lkml.kernel.org/r/20210526164601.66228-1-nchatrad@amd.com
</content>
</entry>
<entry>
<title>EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId</title>
<updated>2020-11-19T10:43:21+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2020-11-09T21:06:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8de0c9917cc1297bc5543b61992d5bdee4ce621a'/>
<id>urn:sha1:8de0c9917cc1297bc5543b61992d5bdee4ce621a</id>
<content type='text'>
The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and
later systems. This function is used in amd64_edac_mod to do
system-specific decoding for DRAM ECC errors. The function takes a
"NodeId" as a parameter.

In AMD documentation, NodeId is used to identify a physical die in a
system. This can be used to identify a node in the AMD_NB code and also
it is used with umc_normaddr_to_sysaddr().

However, the input used for decode_dram_ecc() is currently the NUMA node
of a logical CPU. In the default configuration, the NUMA node and
physical die will be equivalent, so this doesn't have an impact.

But the NUMA node configuration can be adjusted with optional memory
interleaving modes. This will cause the NUMA node enumeration to not
match the physical die enumeration. The mismatch will cause the address
translation function to fail or report incorrect results.

Use struct cpuinfo_x86.cpu_die_id for the node_id parameter to ensure the
physical ID is used.

Fixes: fbe63acf62f5 ("EDAC, mce_amd: Use cpu_to_node() to find the node ID")
Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20201109210659.754018-4-Yazen.Ghannam@amd.com
</content>
</entry>
<entry>
<title>x86/CPU/AMD: Remove amd_get_nb_id()</title>
<updated>2020-11-19T10:43:17+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2020-11-09T21:06:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db970bd231c2264a062e0de4dcf4ead5e6669e7a'/>
<id>urn:sha1:db970bd231c2264a062e0de4dcf4ead5e6669e7a</id>
<content type='text'>
The Last Level Cache ID is returned by amd_get_nb_id(). In practice,
this value is the same as the AMD NodeId for callers of this function.
The NodeId is saved in struct cpuinfo_x86.cpu_die_id.

Replace calls to amd_get_nb_id() with the logical CPU's cpu_die_id and
remove the function.

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20201109210659.754018-3-Yazen.Ghannam@amd.com
</content>
</entry>
<entry>
<title>Merge tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-10-12T17:14:38+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-10-12T17:14:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ca1b66922a702316734bcd5ea2100e5fb8f3caa3'/>
<id>urn:sha1:ca1b66922a702316734bcd5ea2100e5fb8f3caa3</id>
<content type='text'>
Pull RAS updates from Borislav Petkov:

 - Extend the recovery from MCE in kernel space also to processes which
   encounter an MCE in kernel space but while copying from user memory
   by sending them a SIGBUS on return to user space and umapping the
   faulty memory, by Tony Luck and Youquan Song.

 - memcpy_mcsafe() rework by splitting the functionality into
   copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables
   support for new hardware which can recover from a machine check
   encountered during a fast string copy and makes that the default and
   lets the older hardware which does not support that advance recovery,
   opt in to use the old, fragile, slow variant, by Dan Williams.

 - New AMD hw enablement, by Yazen Ghannam and Akshay Gupta.

 - Do not use MSR-tracing accessors in #MC context and flag any fault
   while accessing MCA architectural MSRs as an architectural violation
   with the hope that such hw/fw misdesigns are caught early during the
   hw eval phase and they don't make it into production.

 - Misc fixes, improvements and cleanups, as always.

* tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Allow for copy_mc_fragile symbol checksum to be generated
  x86/mce: Decode a kernel instruction to determine if it is copying from user
  x86/mce: Recover from poison found while copying from user space
  x86/mce: Avoid tail copy when machine check terminated a copy from user
  x86/mce: Add _ASM_EXTABLE_CPY for copy user access
  x86/mce: Provide method to find out the type of an exception handler
  x86/mce: Pass pointer to saved pt_regs to severity calculation routines
  x86/copy_mc: Introduce copy_mc_enhanced_fast_string()
  x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
  x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list
  x86/mce: Add Skylake quirk for patrol scrub reported errors
  RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE()
  x86/mce: Annotate mce_rd/wrmsrl() with noinstr
  x86/mce/dev-mcelog: Do not update kflags on AMD systems
  x86/mce: Stop mce_reign() from re-computing severity for every CPU
  x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR
  x86/mce: Increase maximum number of banks to 64
  x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check()
  x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap
  RAS/CEC: Fix cec_init() prototype
</content>
</entry>
<entry>
<title>x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap</title>
<updated>2020-08-20T08:34:38+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2020-07-20T14:53:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=368d1887200d68075c064a62a9aa191168cf1eed'/>
<id>urn:sha1:368d1887200d68075c064a62a9aa191168cf1eed</id>
<content type='text'>
The Extended Error Code Bitmap (xec_bitmap) for a Scalable MCA bank type
was intended to be used by the kernel to filter out invalid error codes
on a system. However, this is unnecessary after a few product releases
because the hardware will only report valid error codes. Thus, there's
no need for it with future systems.

Remove the xec_bitmap field and all references to it.

Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20200720145353.43924-1-Yazen.Ghannam@amd.com
</content>
</entry>
</feed>
