summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)AuthorFilesLines
2022-07-27EDAC/ghes: Set the DIMM label unconditionallyToshi Kani1-3/+8
The commit cb51a371d08e ("EDAC/ghes: Setup DIMM label from DMI and use it in error reports") enforced that both the bank and device strings passed to dimm_setup_label() are not NULL. However, there are BIOSes, for example on a HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019 which don't populate both strings: Handle 0x0020, DMI type 17, 84 bytes Memory Device Array Handle: 0x0013 Error Information Handle: Not Provided Total Width: 72 bits Data Width: 64 bits Size: 32 GB Form Factor: DIMM Set: None Locator: PROC 1 DIMM 1 <===== device Bank Locator: Not Specified <===== bank This results in a buffer overflow because ghes_edac_register() calls strlen() on an uninitialized label, which had non-zero values left over from krealloc_array(): detected buffer overflow in __fortify_strlen ------------[ cut here ]------------ kernel BUG at lib/string_helpers.c:983! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 1 Comm: swapper/0 Tainted: G I 5.18.6-200.fc36.x86_64 #1 Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019 RIP: 0010:fortify_panic ... Call Trace: <TASK> ghes_edac_register.cold ghes_probe platform_probe really_probe __driver_probe_device driver_probe_device __driver_attach ? __device_attach_driver bus_for_each_dev bus_add_driver driver_register acpi_ghes_init acpi_init ? acpi_sleep_proc_init do_one_initcall The label contains garbage because the commit in Fixes reallocs the DIMMs array while scanning the system but doesn't clear the newly allocated memory. Change dimm_setup_label() to always initialize the label to fix the issue. Set it to the empty string in case BIOS does not provide both bank and device so that ghes_edac_register() can keep the default label given by edac_mc_alloc_dimms(). [ bp: Rewrite commit message. ] Fixes: b9cae27728d1f ("EDAC/ghes: Scan the system once on driver init") Co-developed-by: Robert Richter <rric@kernel.org> Signed-off-by: Robert Richter <rric@kernel.org> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Robert Elliott <elliott@hpe.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220719220124.760359-1-toshi.kani@hpe.com
2022-07-22EDAC/synopsys: Re-enable the error interrupts on v3 hwSherry Sun1-22/+25
zynqmp_get_error_info() writes 0 to the ECC_CLR_OFST register after an interrupt for a {un-,}correctable error is raised, which disables the error interrupts. Then the interrupt handler will be called only once. Therefore, re-enable the error interrupt line at the end of intr_handler() for v3.x Synopsys EDAC DDR. Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Shubhrajyoti Datta <Shubhrajyoti.datta@xilinx.com> Acked-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220427015137.8406-3-sherry.sun@nxp.com
2022-07-22EDAC/synopsys: Use the correct register to disable the error interrupt on v3 hwSherry Sun1-2/+5
v3.x Synopsys EDAC DDR doesn't have the QOS Interrupt register. Use the ECC Clear Register to disable the error interrupts instead. Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Shubhrajyoti Datta <Shubhrajyoti.datta@xilinx.com> Acked-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220427015137.8406-2-sherry.sun@nxp.com
2022-05-24Merge tag 'x86_misc_for_v5.19_rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Borislav Petkov: "A variety of fixes which don't fit any other tip bucket: - Remove unnecessary function export - Correct asm constraint - Fix __setup handlers retval" * tag 'x86_misc_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Cleanup the control_va_addr_alignment() __setup handler x86: Fix return value of __setup handlers x86/delay: Fix the wrong asm constraint in delay_loop() x86/amd_nb: Unexport amd_cache_northbridges()
2022-05-24Merge tag 'edac_updates_for_v5.19_rc1' of ↵Linus Torvalds14-398/+135
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Switch ghes_edac to use the CPER error reporting routines and simplify the code considerably this way - Rip out the silly edac_align_ptr() contraption which was computing the size of the private structures of each driver and thus allowing for a one-shot memory allocation. This was clearly unnecessary and confusing so switch to simple and boring kmalloc* calls. - Last but not least, the usual garden variety of fixes, cleanups and improvements all over EDAC land * tag 'edac_updates_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/xgene: Fix typo processsors -> processors EDAC/i5100: Remove unused inline function i5100_nrecmema_dm_buf_id() EDAC: Use kcalloc() EDAC/ghes: Change ghes_hw from global to static EDAC/armada_xp: Use devm_platform_ioremap_resource() EDAC/synopsys: Add a SPDX identifier EDAC/synopsys: Add driver support for i.MX platforms EDAC/dmc520: Don't print an error for each unconfigured interrupt line EDAC/mc: Get rid of edac_align_ptr() EDAC/device: Sanitize edac_device_alloc_ctl_info() definition EDAC/device: Get rid of the silly one-shot memory allocation in edac_device_alloc_ctl_info() EDAC/pci: Get rid of the silly one-shot memory allocation in edac_pci_alloc_ctl_info() EDAC/mc: Get rid of silly one-shot struct allocation in edac_mc_alloc() efi/cper: Reformat CPER memory error location to more readable EDAC/ghes: Unify CPER memory error location reporting efi/cper: Add a cper_mem_err_status_str() to decode error description powerpc/85xx: Remove fsl,85... bindings
2022-05-23Merge branches 'edac-misc' and 'edac-alloc-cleanup' into edac-updates-for-v5.19Borislav Petkov6-182/+90
Combine all collected EDAC changes for submission into v5.19: * edac-misc: EDAC/xgene: Fix typo processsors -> processors EDAC/i5100: Remove unused inline function i5100_nrecmema_dm_buf_id() EDAC/ghes: Change ghes_hw from global to static EDAC/armada_xp: Use devm_platform_ioremap_resource() EDAC/synopsys: Add a SPDX identifier EDAC/synopsys: Add driver support for i.MX platforms EDAC/dmc520: Don't print an error for each unconfigured interrupt line efi/cper: Reformat CPER memory error location to more readable EDAC/ghes: Unify CPER memory error location reporting efi/cper: Add a cper_mem_err_status_str() to decode error description powerpc/85xx: Remove fsl,85... bindings * edac-alloc-cleanup: EDAC: Use kcalloc() EDAC/mc: Get rid of edac_align_ptr() EDAC/device: Sanitize edac_device_alloc_ctl_info() definition EDAC/device: Get rid of the silly one-shot memory allocation in edac_device_alloc_ctl_info() EDAC/pci: Get rid of the silly one-shot memory allocation in edac_pci_alloc_ctl_info() EDAC/mc: Get rid of silly one-shot struct allocation in edac_mc_alloc() Signed-off-by: Borislav Petkov <bp@suse.de>
2022-05-21EDAC/xgene: Fix typo processsors -> processorsJulia Lawall1-1/+1
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220521111145.81697-39-Julia.Lawall@inria.fr
2022-05-17EDAC/i5100: Remove unused inline function i5100_nrecmema_dm_buf_id()YueHaibing1-5/+0
Commit a4972b1b9a04 ("edac: i5100_edac: Remove unused i5100_recmema_dm_buf_id") left this function unused. Remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220514080433.29944-1-yuehaibing@huawei.com
2022-05-02EDAC: Use kcalloc()Borislav Petkov2-10/+5
It is syntactic sugar anyway: # drivers/edac/edac_mc.o: text data bss dec hex filename 13378 324 8 13710 358e edac_mc.o.before 13378 324 8 13710 358e edac_mc.o.after md5: 70a53ee3ac7f867730e35c2be9110748 edac_mc.o.before.asm 70a53ee3ac7f867730e35c2be9110748 edac_mc.o.after.asm # drivers/edac/edac_device.o: text data bss dec hex filename 5704 120 4 5828 16c4 edac_device.o.before 5704 120 4 5828 16c4 edac_device.o.after md5: 880563c859da6eb9aca85ec431fdbaeb edac_device.o.before.asm 880563c859da6eb9aca85ec431fdbaeb edac_device.o.after.asm No functional changes. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220412211957.28899-1-bp@alien8.de
2022-04-29EDAC/ghes: Change ghes_hw from global to staticTom Rix1-1/+1
Smatch reports this issue ghes_edac.c:44:3: warning: symbol 'ghes_hw' was not declared. Should it be static? ghes_hw is used only in ghes_edac.c so change its storage-class specifier to static. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220421135319.1508754-1-trix@redhat.com
2022-04-29EDAC/armada_xp: Use devm_platform_ioremap_resource()Lv Ruyi1-16/+2
Use the devm_platform_ioremap_resource() helper instead of calling platform_get_resource() and devm_ioremap_resource() separately. Make the code simpler without functional changes. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jan Luebbe <jlu@pengutronix.de> Link: https://lore.kernel.org/r/20220421084621.2615517-1-lv.ruyi@zte.com.cn
2022-04-28EDAC/synopsys: Add a SPDX identifierShubhrajyoti Datta1-14/+1
Replace the copyright boilerplate with a SPDX identifier. [ bp: Rewrite commit message. ] Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220428044051.2842687-1-shubhrajyoti.datta@xilinx.com
2022-04-28EDAC/synopsys: Add driver support for i.MX platformsSherry Sun1-1/+1
i.MX8MP use Synopsys v3.70a DDR controller IP so add support for it with the Synopsys driver. Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Michal Simek <michal.simek@amd.com> Link: https://lore.kernel.org/r/20220428023209.18087-1-sherry.sun@nxp.com
2022-04-19EDAC/dmc520: Don't print an error for each unconfigured interrupt lineTyler Hicks1-1/+1
The dmc520 driver requires that at least one interrupt line, out of the ten possible, is configured. The driver prints an error and returns -EINVAL from its .probe function if there are no interrupt lines configured. Don't print a KERN_ERR level message for each interrupt line that's unconfigured as that can confuse users into thinking that there is an error condition. Before this change, the following KERN_ERR level messages would be reported if only dram_ecc_errc and dram_ecc_errd were configured in the device tree: dmc520 68000000.dmc: IRQ ram_ecc_errc not found dmc520 68000000.dmc: IRQ ram_ecc_errd not found dmc520 68000000.dmc: IRQ failed_access not found dmc520 68000000.dmc: IRQ failed_prog not found dmc520 68000000.dmc: IRQ link_err not dmc520 68000000.dmc: IRQ temperature_event not found dmc520 68000000.dmc: IRQ arch_fsm not found dmc520 68000000.dmc: IRQ phy_request not found Fixes: 1088750d7839 ("EDAC: Add EDAC driver for DMC520") Reported-by: Sinan Kaya <okaya@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220111163800.22362-1-tyhicks@linux.microsoft.com
2022-04-14EDAC/synopsys: Read the error count from the correct registerShubhrajyoti Datta1-5/+11
Currently, the error count is read wrongly from the status register. Read the count from the proper error count register (ERRCNT). [ bp: Massage. ] Fixes: b500b4a029d5 ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller") Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220414102813.4468-1-shubhrajyoti.datta@xilinx.com
2022-04-11EDAC/mc: Get rid of edac_align_ptr()Borislav Petkov2-57/+0
Get rid of it now that it is unused. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220310095254.1510-6-bp@alien8.de
2022-04-11EDAC/device: Sanitize edac_device_alloc_ctl_info() definitionBorislav Petkov1-16/+16
Shorten argument names, correct formatting, sort local vars in reverse x-mas tree order. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220310095254.1510-5-bp@alien8.de
2022-04-11EDAC/device: Get rid of the silly one-shot memory allocation in ↵Borislav Petkov3-71/+57
edac_device_alloc_ctl_info() Use boring kzalloc() instead. Add pointers to the different allocated members in struct edac_device_ctl_info for easier freeing later. One of the reasons, perhaps, why it was done this way is to be able to do a single kfree(ctl_info) without having to kfree() the other parts of the struct too but that is not nearly a sensible reason as to why there should be this obscure pointer alignment. There should be no functional changes resulting from this. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220310095254.1510-4-bp@alien8.de
2022-04-11EDAC/pci: Get rid of the silly one-shot memory allocation in ↵Borislav Petkov1-13/+12
edac_pci_alloc_ctl_info() Use boring kzalloc() instead. There should be no functional changes resulting from this. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220310095254.1510-3-bp@alien8.de
2022-04-11EDAC/mc: Get rid of silly one-shot struct allocation in edac_mc_alloc()Borislav Petkov1-28/+13
This has probably meant something at some point but there's no need for it anymore - the struct mem_ctl_info allocation can happen with normal, boring k*alloc() calls like everyone else does it. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220310095254.1510-2-bp@alien8.de
2022-04-08EDAC/ghes: Unify CPER memory error location reportingShuai Xue2-163/+38
Switch the GHES EDAC memory error reporting functions to use the common CPER ones and get rid of code duplication. [ bp: - rewrite commit message, remove useless text - rip out useless reformatting - align function params on the opening brace - rename function to a more descriptive name - drop useless function exports - handle buffer lengths properly when printing other detail - remove useless casting ] Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220308144053.49090-3-xueshuai@linux.alibaba.com
2022-04-05powerpc/85xx: Remove fsl,85... bindingsChristophe Leroy1-14/+0
Since 8a4ab218ef70 ("powerpc/85xx: Change deprecated binding for 85xx-based boards") those bindings are not used anymore. A comment in drivers/edac/mpc85xx_edac.c say they are to be removed with kernel 2.6.30. Remove them now. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Scott Wood <oss@buserror.net> Link: https://lore.kernel.org/r/82a8bc4450a4daee50ee5fada75621fecb3703ff.1648721299.git.christophe.leroy@csgroup.eu
2022-04-05x86/amd_nb: Unexport amd_cache_northbridges()Muralidhara M K1-1/+1
amd_cache_northbridges() is exported by amd_nb.c and is called by amd64-agp.c and amd64_edac.c modules at module_init() time so that NB descriptors are properly cached before those drivers can use them. However, the init_amd_nbs() initcall already does call amd_cache_northbridges() unconditionally and thus makes sure the NB descriptors are enumerated. That initcall is a fs_initcall type which is on the 5th group (starting from 0) of initcalls that gets run in increasing numerical order by the init code. The module_init() call is turned into an __initcall() in the MODULE=n case and those are device-level initcalls, i.e., group 6. Therefore, the northbridges caching is already finished by the time module initialization starts and thus the correct initialization order is retained. Unexport amd_cache_northbridges(), update dependent modules to call amd_nb_num() instead. While at it, simplify the checks in amd_cache_northbridges(). [ bp: Heavily massage and *actually* explain why the change is ok. ] Signed-off-by: Muralidhara M K <muralimk@amd.com> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220324122729.221765-1-nchatrad@amd.com
2022-03-21Merge branch 'edac-amd64' into edac-updates-for-v5.18Borislav Petkov5-25/+114
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-03-16EDAC/altera: Add SDRAM ECC check for U-BootRabara Niravkumar L1-1/+39
A bug in legacy U-Boot causes a crash during SDRAM boot if ECC is not enabled in the bitstream but enabled in the Linux config. Memory mapped read of the ECC Enabled bit was only enabled if U-Boot determined ECC was enabled in the bitstream. The Linux driver checks the ECC enable bit using a memory map read. In the ECC disabled bitstream case, U-Boot didn't enable ECC register memory map reads and since they are not allowed this results in a crash. Always read the ECC Enable register through a SMC call which is always allowed and it works with legacy and current U-Boot. [ bp: Massage commit message. ] Signed-off-by: Rabara Niravkumar L <niravkumar.l.rabara@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Link: https://lore.kernel.org/r/20220305014118.4794-1-niravkumar.l.rabara@intel.com
2022-02-24EDAC/amd64: Add new register offset support and related changesYazen Ghannam2-16/+78
Introduce a "family flags" bitmask that can be used to indicate any special behavior needed on a per-family basis. Add a flag to indicate a system uses the new register offsets introduced with Family 19h Model 10h. Use this flag to account for register offset changes, a new bitfield indicating DDR5 use on a memory controller, and to set the proper number of chip select masks. Rework f17_addr_mask_to_cs_size() to properly handle the change in chip select masks. And update code comments to reflect the updated Chip Select, DIMM, and Mask relationships. [uninitialized variable warning] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: William Roche <william.roche@oracle.com> Link: https://lore.kernel.org/r/20220202144307.2678405-3-yazen.ghannam@amd.com
2022-02-23EDAC/amd64: Set memory type per DIMMYazen Ghannam2-13/+40
Current AMD systems allow mixing of DIMM types within a system. However, DIMMs within a channel, i.e. managed by a single Unified Memory Controller (UMC), must be of the same type. Handle this possible configuration by checking and setting the memory type for each individual "UMC" structure. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: William Roche <william.roche@oracle.com> Link: https://lore.kernel.org/r/20220202144307.2678405-2-yazen.ghannam@amd.com
2022-02-15EDAC: Fix calculation of returned address and next offset in edac_align_ptr()Eliav Farber1-1/+1
Do alignment logic properly and use the "ptr" local variable for calculating the remainder of the alignment. This became an issue because struct edac_mc_layer has a size that is not zero modulo eight, and the next offset that was prepared for the private data was unaligned, causing an alignment exception. The patch in Fixes: which broke this actually wanted to "what we actually care about is the alignment of the actual pointer that's about to be returned." But it didn't check that alignment. Use the correct variable "ptr" for that. [ bp: Massage commit message. ] Fixes: 8447c4d15e35 ("edac: Do alignment logic properly in edac_align_ptr()") Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220113100622.12783-2-farbere@amazon.com
2022-01-30EDAC/xgene: Fix deferred probingSergey Shtylyov1-1/+1
The driver overrides error codes returned by platform_get_irq_optional() to -EINVAL for some strange reason, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the proper error codes to platform driver code upwards. [ bp: Massage commit message. ] Fixes: 0d4429301c4a ("EDAC: Add APM X-Gene SoC EDAC driver") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220124185503.6720-3-s.shtylyov@omp.ru
2022-01-28EDAC/altera: Fix deferred probingSergey Shtylyov1-1/+1
The driver overrides the error codes returned by platform_get_irq() to -ENODEV for some strange reason, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the proper error codes to platform driver code upwards. [ bp: Massage commit message. ] Fixes: 71bcada88b0f ("edac: altera: Add Altera SDRAM EDAC support") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220124185503.6720-2-s.shtylyov@omp.ru
2022-01-26EDAC/mc: Remove unnecessary cast to char * in edac_align_ptr()Eliav Farber1-2/+2
Remove the forgotten (char *) casts as that function returns void *. [ bp: Rewrite commit message. ] Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220113100622.12783-3-farbere@amazon.com
2022-01-23EDAC: Use default_groups in kobj_typeGreg Kroah-Hartman2-10/+15
There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the edac sysfs code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that the obsolete default_attrs field can be removed soon. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220104112401.1067148-2-gregkh@linuxfoundation.org
2022-01-23EDAC: Use proper list of struct attribute for attributesGreg Kroah-Hartman2-26/+26
The EDAC sysfs code is doing some crazy casting of the list of attributes that is not necessary at all. Instead, properly point to the correct attribute structure in the lists, which removes the need to cast anything and the code is now properly typesafe (as much as sysfs attribute logic is typesafe...) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220104112401.1067148-1-gregkh@linuxfoundation.org
2022-01-10Merge tag 'edac_updates_for_v5.17_rc1' of ↵Linus Torvalds7-14/+90
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Add support for version 3 of the Synopsys DDR controller to synopsys_edac - Add support for DRR5 and new models 0x10-0x1f and 0x50-0x5f of AMD family 0x19 CPUs to amd64_edac - The usual set of fixes and cleanups * tag 'edac_updates_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/amd64: Add support for family 19h, models 50h-5fh EDAC/sb_edac: Remove redundant initialization of variable rc RAS/CEC: Remove a repeated 'an' in a comment EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh EDAC: Add RDDR5 and LRDDR5 memory types EDAC/sifive: Fix non-kernel-doc comment dt-bindings: memory: Add entry for version 3.80a EDAC/synopsys: Enable the driver on Intel's N5X platform EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR EDAC/synopsys: Use the quirk for version instead of ddr version
2022-01-10Merge tag 'ras_core_for_v5.17_rc1' of ↵Linus Torvalds2-16/+405
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: "A relatively big amount of movements in RAS-land this time around: - First part of a series to move the AMD address translation code from arch/x86/ to amd64_edac as that is its only user anyway - Some MCE error injection improvements to the AMD side - Reorganization of the #MC handler code and the facilities it calls to make it noinstr-safe - Add support for new AMD MCA bank types and non-uniform banks layout - The usual set of cleanups and fixes" * tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/mce: Reduce number of machine checks taken during recovery x86/mce/inject: Avoid out-of-bounds write when setting flags x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types x86/mce: Check regs before accessing it x86/mce: Mark mce_start() noinstr x86/mce: Mark mce_timed_out() noinstr x86/mce: Move the tainting outside of the noinstr region x86/mce: Mark mce_read_aux() noinstr x86/mce: Mark mce_end() noinstr x86/mce: Mark mce_panic() noinstr x86/mce: Prevent severity computation from being instrumented x86/mce: Allow instrumentation during task work queueing x86/mce: Remove noinstr annotation from mce_setup() x86/mce: Use mce_rdmsrl() in severity checking code x86/mce: Remove function-local cpus variables x86/mce: Do not use memset to clear the banks bitmaps x86/mce/inject: Set the valid bit in MCA_STATUS before error injection x86/mce/inject: Check if a bank is populated before injecting x86/mce: Get rid of cpu_missing ...
2022-01-10Merge branches 'edac-misc' and 'edac-amd64' into edac-updates-for-v5.17Borislav Petkov5-4/+46
Signed-off-by: Borislav Petkov <bp@suse.de>
2022-01-04EDAC/i10nm: Release mdev/mbase when failing to detect HBMQiuxu Zhuo1-0/+9
On systems without HBM (High Bandwidth Memory) mdev/mbase are not released/unmapped. Add the code to release mdev/mbase when failing to detect HBM. [Tony: re-word commit message] Cc: <stable@vger.kernel.org> Fixes: c945088384d0 ("EDAC/i10nm: Add support for high bandwidth memory") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211224091126.1246-1-qiuxu.zhuo@intel.com
2021-12-24EDAC/amd64: Add support for family 19h, models 50h-5fhMarc Bevand2-0/+18
Add the new family 19h models 50h-5fh PCI IDs (device 18h functions 0 and 6) to support Ryzen 5000 APUs ("Cezanne"). Signed-off-by: Marc Bevand <m@zorinaq.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20211221233112.556927-1-m@zorinaq.com
2021-12-22x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumerationYazen Ghannam1-9/+2
AMD systems currently lay out MCA bank types such that the type of bank number "i" is either the same across all CPUs or is Reserved/Read-as-Zero. For example: Bank # | CPUx | CPUy 0 LS LS 1 RAZ UMC 2 CS CS 3 SMU RAZ Future AMD systems will lay out MCA bank types such that the type of bank number "i" may be different across CPUs. For example: Bank # | CPUx | CPUy 0 LS LS 1 RAZ UMC 2 CS NBIO 3 SMU RAZ Change the structures that cache MCA bank types to be per-CPU and update smca_get_bank_type() to handle this change. Move some SMCA-specific structures to amd.c from mce.h, since they no longer need to be global. Break out the "count" for bank types from struct smca_hwid, since this should provide a per-CPU count rather than a system-wide count. Apply the "const" qualifier to the struct smca_hwid_mcatypes array. The values in this array should not change at runtime. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211216162905.4132657-3-yazen.ghannam@amd.com
2021-12-22x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank typesYazen Ghannam1-7/+128
Add HWID and McaType values for new SMCA bank types, and add their error descriptions to edac_mce_amd. The "PHY" bank types all have the same error descriptions, and the NBIF and SHUB bank types have the same error descriptions. So reuse the same arrays where appropriate. [ bp: Remove useless comments over hwid types. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211216162905.4132657-2-yazen.ghannam@amd.com
2021-12-21EDAC/sb_edac: Remove redundant initialization of variable rcColin Ian King1-1/+1
The variable rc is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and thus remove it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211126221848.1125321-1-colin.i.king@gmail.com
2021-12-10EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFhYazen Ghannam2-2/+24
Add a new family type for AMD Family 19h Models 10h to 1Fh. Use this new family type for Models A0h to AFh also. Increase the maximum number of controllers from 8 to 12. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208174356.1997855-3-yazen.ghannam@amd.com
2021-12-10EDAC: Add RDDR5 and LRDDR5 memory typesYazen Ghannam1-0/+2
Include Registered-DDR5 and Load-Reduced DDR5 in the list of memory types. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211208174356.1997855-2-yazen.ghannam@amd.com
2021-12-05EDAC/sifive: Fix non-kernel-doc commentRandy Dunlap1-1/+1
scripts/kernel-doc complains about a comment that begins with "/**" but is not in kernel-doc format, so correct it. Prevents this warning: drivers/edac/sifive_edac.c:23: warning: This comment starts with '/**', \ but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * EDAC error callback Fixes: 91abaeaaff35 ("EDAC/sifive: Add EDAC platform driver for SiFive SoCs") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211201030913.10283-1-rdunlap@infradead.org
2021-11-20EDAC/synopsys: Enable the driver on Intel's N5X platformDinh Nguyen1-1/+1
Intel's N5X platform is also using the Synopsys EDAC controller. Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-3-dinguyen@kernel.org
2021-11-20EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDRDinh Nguyen1-7/+42
Add support for version 3.80a of the Synopsys DDR controller. This version of the controller has the following differences: - UE/CE are auto cleared - Interrupts are supported by default Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-2-dinguyen@kernel.org
2021-11-20EDAC/synopsys: Use the quirk for version instead of ddr versionDinh Nguyen1-2/+1
Version 2.40a supports DDR_ECC_INTR_SUPPORT for a quirk, so use that quirk to determine a call to setup_address_map(). Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-1-dinguyen@kernel.org
2021-11-15EDAC/amd64: Add context structYazen Ghannam1-42/+55
Define an address translation context struct. This will hold values that will be passed between multiple functions. Save return address, Node ID, and the Instance ID number to start. Currently, the UMC number is used as the Instance ID, but future DF versions may use another value. Also include a "tmp" field to use when reading registers. This is to avoid having to define a temporary variable in multiple functions. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211028175728.121452-5-yazen.ghannam@amd.com
2021-11-15EDAC/amd64: Allow for DF Indirect Broadcast readsYazen Ghannam1-8/+21
The DF Indirect Access method allows for "Broadcast" accesses in which case no specific instance is targeted. Add support using a reserved instance ID of 0xFF to indicate a broadcast access. Set the FICAA register appropriately. Define helpers functions for instance and broadcast reads and use them where appropriate. Drop the "amd_" prefix since these functions are all static. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211028175728.121452-4-yazen.ghannam@amd.com
2021-11-15x86/amd_nb, EDAC/amd64: Move DF Indirect Read to AMD64 EDACYazen Ghannam1-0/+50
df_indirect_read() is used only for address translation. Move it to EDAC along with the translation code. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211028175728.121452-3-yazen.ghannam@amd.com