summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)AuthorFilesLines
2025-12-07EDAC/altera: Use INTTEST register for Ethernet and USB SBE injectionNiravkumar L Rabara1-2/+2
commit 281326be67252ac5794d1383f67526606b1d6b13 upstream. The current single-bit error injection mechanism flips bits directly in ECC RAM by performing write and read operations. When the ECC RAM is actively used by the Ethernet or USB controller, this approach sometimes trigger a false double-bit error. Switch both Ethernet and USB EDAC devices to use the INTTEST register (altr_edac_a10_device_inject_fops) for single-bit error injection, similar to the existing double-bit error injection method. Fixes: 064acbd4f4ab ("EDAC, altera: Add Stratix10 peripheral support") Signed-off-by: Niravkumar L Rabara <niravkumarlaxmidas.rabara@altera.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251111081333.1279635-1-niravkumarlaxmidas.rabara@altera.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-12-07EDAC/altera: Handle OCRAM ECC enable after warm resetNiravkumar L Rabara1-3/+15
commit fd3ecda38fe0cb713d167b5477d25f6b350f0514 upstream. The OCRAM ECC is always enabled either by the BootROM or by the Secure Device Manager (SDM) during a power-on reset on SoCFPGA. However, during a warm reset, the OCRAM content is retained to preserve data, while the control and status registers are reset to their default values. As a result, ECC must be explicitly re-enabled after a warm reset. Fixes: 17e47dc6db4f ("EDAC/altera: Add Stratix10 OCRAM ECC support") Signed-off-by: Niravkumar L Rabara <niravkumarlaxmidas.rabara@altera.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251111080801.1279401-1-niravkumarlaxmidas.rabara@altera.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19minmax: make generic MIN() and MAX() macros available everywhereLinus Torvalds1-1/+0
[ Upstream commit 1a251f52cfdc417c84411a056bc142cbd77baef4 ] This just standardizes the use of MIN() and MAX() macros, with the very traditional semantics. The goal is to use these for C constant expressions and for top-level / static initializers, and so be able to simplify the min()/max() macros. These macro names were used by various kernel code - they are very traditional, after all - and all such users have been fixed up, with a few different approaches: - trivial duplicated macro definitions have been removed Note that 'trivial' here means that it's obviously kernel code that already included all the major kernel headers, and thus gets the new generic MIN/MAX macros automatically. - non-trivial duplicated macro definitions are guarded with #ifndef This is the "yes, they define their own versions, but no, the include situation is not entirely obvious, and maybe they don't get the generic version automatically" case. - strange use case #1 A couple of drivers decided that the way they want to describe their versioning is with #define MAJ 1 #define MIN 2 #define DRV_VERSION __stringify(MAJ) "." __stringify(MIN) which adds zero value and I just did my Alexander the Great impersonation, and rewrote that pointless Gordian knot as #define DRV_VERSION "1.2" instead. - strange use case #2 A couple of drivers thought that it's a good idea to have a random 'MIN' or 'MAX' define for a value or index into a table, rather than the traditional macro that takes arguments. These values were re-written as C enum's instead. The new function-line macros only expand when followed by an open parenthesis, and thus don't clash with enum use. Happily, there weren't really all that many of these cases, and a lot of users already had the pattern of using '#ifndef' guarding (or in one case just using '#undef MIN') before defining their own private version that does the same thing. I left such cases alone. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19minmax: add a few more MIN_T/MAX_T usersLinus Torvalds1-2/+2
[ Upstream commit 4477b39c32fdc03363affef4b11d48391e6dc9ff ] Commit 3a7e02c040b1 ("minmax: avoid overly complicated constant expressions in VM code") added the simpler MIN_T/MAX_T macros in order to avoid some excessive expansion from the rather complicated regular min/max macros. The complexity of those macros stems from two issues: (a) trying to use them in situations that require a C constant expression (in static initializers and for array sizes) (b) the type sanity checking and MIN_T/MAX_T avoids both of these issues. Now, in the whole (long) discussion about all this, it was pointed out that the whole type sanity checking is entirely unnecessary for min_t/max_t which get a fixed type that the comparison is done in. But that still leaves min_t/max_t unnecessarily complicated due to worries about the C constant expression case. However, it turns out that there really aren't very many cases that use min_t/max_t for this, and we can just force-convert those. This does exactly that. Which in turn will then allow for much simpler implementations of min_t()/max_t(). All the usual "macros in all upper case will evaluate the arguments multiple times" rules apply. We should do all the same things for the regular min/max() vs MIN/MAX() cases, but that has the added complexity of various drivers defining their own local versions of MIN/MAX, so that needs another level of fixes first. Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/ Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-02EDAC/altera: Delete an inappropriate dma_free_coherent() callSalah Triki1-1/+0
commit ff2a66d21fd2364ed9396d151115eec59612b200 upstream. dma_free_coherent() must only be called if the corresponding dma_alloc_coherent() call has succeeded. Calling it when the allocation fails leads to undefined behavior. Delete the wrong call. [ bp: Massage commit message. ] Fixes: 71bcada88b0f3 ("edac: altera: Add Altera SDRAM EDAC support") Signed-off-by: Salah Triki <salah.triki@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/aIrfzzqh4IzYtDVC@pc Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28EDAC/synopsys: Clear the ECC counters on initShubhrajyoti Datta1-51/+46
[ Upstream commit b1dc7f097b78eb8d25b071ead2384b07a549692b ] Clear the ECC error and counter registers during initialization/probe to avoid reporting stale errors that may have occurred before EDAC registration. For that, unify the Zynq and ZynqMP ECC state reading paths and simplify the code. [ bp: Massage commit message. Fix an -Wsometimes-uninitialized warning as reported by Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507141048.obUv3ZUm-lkp@intel.com ] Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250713050753.7042-1-shubhrajyoti.datta@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27EDAC/altera: Use correct write width with the INTTEST registerNiravkumar L Rabara1-3/+3
commit e5ef4cd2a47f27c0c9d8ff6c0f63a18937c071a3 upstream. On the SoCFPGA platform, the INTTEST register supports only 16-bit writes. A 32-bit write triggers an SError to the CPU so do 16-bit accesses only. [ bp: AI-massage the commit message. ] Fixes: c7b4be8db8bc ("EDAC, altera: Add Arria10 OCRAM ECC support") Signed-off-by: Niravkumar L Rabara <niravkumar.l.rabara@intel.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: stable@kernel.org Link: https://lore.kernel.org/20250527145707.25458-1-matthew.gerlach@altera.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27EDAC/skx_common: Fix general protection faultQiuxu Zhuo1-0/+1
[ Upstream commit 20d2d476b3ae18041be423671a8637ed5ffd6958 ] After loading i10nm_edac (which automatically loads skx_edac_common), if unload only i10nm_edac, then reload it and perform error injection testing, a general protection fault may occur: mce: [Hardware Error]: Machine check events logged Oops: general protection fault ... ... Workqueue: events mce_gen_pool_process RIP: 0010:string+0x53/0xe0 ... Call Trace: <TASK> ? die_addr+0x37/0x90 ? exc_general_protection+0x1e7/0x3f0 ? asm_exc_general_protection+0x26/0x30 ? string+0x53/0xe0 vsnprintf+0x23e/0x4c0 snprintf+0x4d/0x70 skx_adxl_decode+0x16a/0x330 [skx_edac_common] skx_mce_check_error.part.0+0xf8/0x220 [skx_edac_common] skx_mce_check_error+0x17/0x20 [skx_edac_common] ... The issue arose was because the variable 'adxl_component_count' (inside skx_edac_common), which counts the ADXL components, was not reset. During the reloading of i10nm_edac, the count was incremented by the actual number of ADXL components again, resulting in a count that was double the real number of ADXL components. This led to an out-of-bounds reference to the ADXL component array, causing the general protection fault above. Fix this issue by resetting the 'adxl_component_count' in adxl_put(), which is called during the unloading of {skx,i10nm}_edac. Fixes: 123b15863550 ("EDAC, i10nm: make skx_common.o a separate module") Reported-by: Feng Xu <feng.f.xu@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Tested-by: Feng Xu <feng.f.xu@intel.com> Link: https://lore.kernel.org/r/20250417150724.1170168-2-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04EDAC/ie31200: work around false positive build warningArnd Bergmann1-15/+13
[ Upstream commit c29dfd661fe2f8d1b48c7f00590929c04b25bf40 ] gcc-14 produces a bogus warning in some configurations: drivers/edac/ie31200_edac.c: In function 'ie31200_probe1.isra': drivers/edac/ie31200_edac.c:412:26: error: 'dimm_info' is used uninitialized [-Werror=uninitialized] 412 | struct dimm_data dimm_info[IE31200_CHANNELS][IE31200_DIMMS_PER_CHANNEL]; | ^~~~~~~~~ drivers/edac/ie31200_edac.c:412:26: note: 'dimm_info' declared here 412 | struct dimm_data dimm_info[IE31200_CHANNELS][IE31200_DIMMS_PER_CHANNEL]; | ^~~~~~~~~ I don't see any way the unintialized access could really happen here, but I can see why the compiler gets confused by the two loops. Instead, rework the two nested loops to only read the addr_decode registers and then keep only one instance of the dimm info structure. [Tony: Qiuxu pointed out that the "populate DIMM info" comment was left behind in the refactor and suggested moving it. I deleted the comment as unnecessry in front os a call to populate_dimm_info(). That seems pretty self-describing.] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20250122065031.1321015-1-arnd@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-09EDAC/altera: Set DDR and SDMMC interrupt mask before registrationNiravkumar L Rabara2-3/+6
commit 6dbe3c5418c4368e824bff6ae4889257dd544892 upstream. Mask DDR and SDMMC in probe function to avoid spurious interrupts before registration. Removed invalid register write to system manager. Fixes: 1166fde93d5b ("EDAC, altera: Add Arria10 ECC memory init functions") Signed-off-by: Niravkumar L Rabara <niravkumar.l.rabara@altera.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: stable@kernel.org Link: https://lore.kernel.org/20250425142640.33125-3-matthew.gerlach@altera.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-09EDAC/altera: Test the correct error reg offsetNiravkumar L Rabara1-1/+1
commit 4fb7b8fceb0beebbe00712c3daf49ade0386076a upstream. Test correct structure member, ecc_cecnt_offset, before using it. [ bp: Massage commit message. ] Fixes: 73bcc942f427 ("EDAC, altera: Add Arria10 EDAC support") Signed-off-by: Niravkumar L Rabara <niravkumar.l.rabara@altera.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: stable@kernel.org Link: https://lore.kernel.org/20250425142640.33125-2-matthew.gerlach@altera.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10EDAC/ie31200: Fix the error path order of ie31200_init()Qiuxu Zhuo1-5/+7
[ Upstream commit 231e341036d9988447e3b3345cf741a98139199e ] The error path order of ie31200_init() is incorrect, fix it. Fixes: 709ed1bcef12 ("EDAC/ie31200: Fallback if host bridge device is already initialized") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Tested-by: Gary Wang <gary.c.wang@intel.com> Link: https://lore.kernel.org/r/20250310011411.31685-4-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10EDAC/ie31200: Fix the DIMM size mask for several SoCsQiuxu Zhuo1-1/+2
[ Upstream commit 3427befbbca6b19fe0e37f91d66ce5221de70bf1 ] The DIMM size mask for {Sky, Kaby, Coffee} Lake is not bits{7:0}, but bits{5:0}. Fix it. Fixes: 953dee9bbd24 ("EDAC, ie31200_edac: Add Skylake support") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Tested-by: Gary Wang <gary.c.wang@intel.com> Link: https://lore.kernel.org/r/20250310011411.31685-3-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10EDAC/ie31200: Fix the size of EDAC_MC_LAYER_CHIP_SELECT layerQiuxu Zhuo1-3/+1
[ Upstream commit d59d844e319d97682c8de29b88d2d60922a683b3 ] The EDAC_MC_LAYER_CHIP_SELECT layer pertains to the rank, not the DIMM. Fix its size to reflect the number of ranks instead of the number of DIMMs. Also delete the unused macros IE31200_{DIMMS,RANKS}. Fixes: 7ee40b897d18 ("ie31200_edac: Introduce the driver") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Tested-by: Gary Wang <gary.c.wang@intel.com> Link: https://lore.kernel.org/r/20250310011411.31685-2-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14EDAC/igen6: Avoid segmentation fault on module unloadOrange Kao1-0/+2
[ Upstream commit fefaae90398d38a1100ccd73b46ab55ff4610fba ] The segmentation fault happens because: During modprobe: 1. In igen6_probe(), igen6_pvt will be allocated with kzalloc() 2. In igen6_register_mci(), mci->pvt_info will point to &igen6_pvt->imc[mc] During rmmod: 1. In mci_release() in edac_mc.c, it will kfree(mci->pvt_info) 2. In igen6_remove(), it will kfree(igen6_pvt); Fix this issue by setting mci->pvt_info to NULL to avoid the double kfree. Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219360 Signed-off-by: Orange Kao <orange@aiven.io> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20241104124237.124109-2-orange@aiven.io Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14EDAC/fsl_ddr: Fix bad bit shift operationsPriyanka Singh1-9/+13
[ Upstream commit 9ec22ac4fe766c6abba845290d5139a3fbe0153b ] Fix undefined behavior caused by left-shifting a negative value in the expression: cap_high ^ (1 << (bad_data_bit - 32)) The variable bad_data_bit ranges from 0 to 63. When it is less than 32, bad_data_bit - 32 becomes negative, and left-shifting by a negative value in C is undefined behavior. Fix this by combining cap_high and cap_low into a 64-bit variable. [ bp: Massage commit message, simplify error bits handling. ] Fixes: ea2eb9a8b620 ("EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx") Signed-off-by: Priyanka Singh <priyanka.singh@nxp.com> Signed-off-by: Li Yang <leoyang.li@nxp.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20241016-imx95_edac-v3-3-86ae6fc2756a@nxp.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14EDAC/bluefield: Fix potential integer overflowDavid Thompson1-1/+1
[ Upstream commit 1fe774a93b46bb029b8f6fa9d1f25affa53f06c6 ] The 64-bit argument for the "get DIMM info" SMC call consists of mem_ctrl_idx left-shifted 16 bits and OR-ed with DIMM index. With mem_ctrl_idx defined as 32-bits wide the left-shift operation truncates the upper 16 bits of information during the calculation of the SMC argument. The mem_ctrl_idx stack variable must be defined as 64-bits wide to prevent any potential integer overflow, i.e. loss of data from upper 16 bits. Fixes: 82413e562ea6 ("EDAC, mellanox: Add ECC support for BlueField DDR4") Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shravan Kumar Ramani <shravankr@nvidia.com> Link: https://lore.kernel.org/r/20240930151056.10158-1-davthompson@nvidia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17EDAC/igen6: Fix conversion of system address to physical memory addressQiuxu Zhuo1-1/+1
commit 0ad875f442e95d69a1145a38aabac2fd29984fe3 upstream. The conversion of system address to physical memory address (as viewed by the memory controller) by igen6_edac is incorrect when the system address is above the TOM (Total amount Of populated physical Memory) for Elkhart Lake and Ice Lake (Neural Network Processor). Fix this conversion. Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/stable/20240814061011.43545-1-qiuxu.zhuo%40intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17EDAC/synopsys: Fix error injection on Zynq UltraScale+Shubhrajyoti Datta1-1/+34
[ Upstream commit 35e6dbfe1846caeafabb49b7575adb36b0aa2269 ] The Zynq UltraScale+ MPSoC DDR has a disjoint memory from 2GB to 32GB. The DDR host interface has a contiguous memory so while injecting errors, the driver should remove the hole else the injection fails as the address translation is incorrect. Introduce a get_mem_info() function pointer and set it for Zynq UltraScale+ platform to return host address. Fixes: 1a81361f75d8 ("EDAC, synopsys: Add Error Injection support for ZynqMP DDR controller") Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240711100656.31376-1-shubhrajyoti.datta@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17EDAC/synopsys: Fix ECC status and IRQ control race conditionSerge Semin1-13/+37
[ Upstream commit 591c946675d88dcc0ae9ff54be9d5caaee8ce1e3 ] The race condition around the ECCCLR register access happens in the IRQ disable method called in the device remove() procedure and in the ECC IRQ handler: 1. Enable IRQ: a. ECCCLR = EN_CE | EN_UE 2. Disable IRQ: a. ECCCLR = 0 3. IRQ handler: a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT b. ECCCLR = 0 c. ECCCLR = EN_CE | EN_UE So if the IRQ disabling procedure is called concurrently with the IRQ handler method the IRQ might be actually left enabled due to the statement 3c. The root cause of the problem is that ECCCLR register (which since v3.10a has been called as ECCCTL) has intermixed ECC status data clear flags and the IRQ enable/disable flags. Thus the IRQ disabling (clear EN flags) and handling (write 1 to clear ECC status data) procedures must be serialised around the ECCCTL register modification to prevent the race. So fix the problem described above by adding the spin-lock around the ECCCLR modifications and preventing the IRQ-handler from modifying the IRQs enable flags (there is no point in disabling the IRQ and then re-enabling it again within a single IRQ handler call, see the statements 3a/3b and 3c above). Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240222181324.28242-2-fancer.lancer@gmail.com Stable-dep-of: 35e6dbfe1846 ("EDAC/synopsys: Fix error injection on Zynq UltraScale+") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17EDAC/synopsys: Re-enable the error interrupts on v3 hwSherry Sun1-22/+25
[ Upstream commit 4bcffe941758ee17becb43af3b25487f848f6512 ] zynqmp_get_error_info() writes 0 to the ECC_CLR_OFST register after an interrupt for a {un-,}correctable error is raised, which disables the error interrupts. Then the interrupt handler will be called only once. Therefore, re-enable the error interrupt line at the end of intr_handler() for v3.x Synopsys EDAC DDR. Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Shubhrajyoti Datta <Shubhrajyoti.datta@xilinx.com> Acked-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220427015137.8406-3-sherry.sun@nxp.com Stable-dep-of: 35e6dbfe1846 ("EDAC/synopsys: Fix error injection on Zynq UltraScale+") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17EDAC/synopsys: Use the correct register to disable the error interrupt on v3 hwSherry Sun1-2/+5
[ Upstream commit be76ceaf03bc04e74be5e28f608316b73c2b04ad ] v3.x Synopsys EDAC DDR doesn't have the QOS Interrupt register. Use the ECC Clear Register to disable the error interrupts instead. Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Shubhrajyoti Datta <Shubhrajyoti.datta@xilinx.com> Acked-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220427015137.8406-2-sherry.sun@nxp.com Stable-dep-of: 35e6dbfe1846 ("EDAC/synopsys: Fix error injection on Zynq UltraScale+") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDRDinh Nguyen1-7/+42
[ Upstream commit f7824ded41491d7ebc156a3a2f6fa05cd89da7c2 ] Add support for version 3.80a of the Synopsys DDR controller. This version of the controller has the following differences: - UE/CE are auto cleared - Interrupts are supported by default Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-2-dinguyen@kernel.org Stable-dep-of: 35e6dbfe1846 ("EDAC/synopsys: Fix error injection on Zynq UltraScale+") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-19EDAC, i10nm: make skx_common.o a separate moduleArnd Bergmann3-8/+27
[ Upstream commit 123b158635505c89ed0d3ef45c5845ff9030a466 ] Commit 598afa050403 ("kbuild: warn objects shared among multiple modules") was added to track down cases where the same object is linked into multiple modules. This can cause serious problems if some modules are builtin while others are not. That test triggers this warning: scripts/Makefile.build:236: drivers/edac/Makefile: skx_common.o is added to multiple modules: i10nm_edac skx_edac Make this a separate module instead. [Tony: Added more background details to commit message] Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20240529095132.1929397-1-arnd@kernel.org/ Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-16EDAC/igen6: Convert PCIBIOS_* return codes to errnosIlpo Järvinen1-2/+2
commit f8367a74aebf88dc8b58a0db6a6c90b4cb8fc9d3 upstream. errcmd_enable_error_reporting() uses pci_{read,write}_config_word() that return PCIBIOS_* codes. The return code is then returned all the way into the probe function igen6_probe() that returns it as is. The probe functions, however, should return normal errnos. Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal errno before returning it from errcmd_enable_error_reporting(). Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240527132236.13875-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-26EDAC/thunderx: Fix possible out-of-bounds string accessArnd Bergmann1-5/+5
[ Upstream commit 475c58e1a471e9b873e3e39958c64a2d278275c8 ] Enabling -Wstringop-overflow globally exposes a warning for a common bug in the usage of strncat(): drivers/edac/thunderx_edac.c: In function 'thunderx_ocx_com_threaded_isr': drivers/edac/thunderx_edac.c:1136:17: error: 'strncat' specified bound 1024 equals destination size [-Werror=stringop-overflow=] 1136 | strncat(msg, other, OCX_MESSAGE_SIZE); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... 1145 | strncat(msg, other, OCX_MESSAGE_SIZE); ... 1150 | strncat(msg, other, OCX_MESSAGE_SIZE); ... Apparently the author of this driver expected strncat() to behave the way that strlcat() does, which uses the size of the destination buffer as its third argument rather than the length of the source buffer. The result is that there is no check on the size of the allocated buffer. Change it to strlcat(). [ bp: Trim compiler output, fixup commit message. ] Fixes: 41003396f932 ("EDAC, thunderx: Add Cavium ThunderX EDAC driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20231122222007.3199885-1-arnd@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-19EDAC/igen6: Fix the issue of no error eventsQiuxu Zhuo1-4/+4
[ Upstream commit ce53ad81ed36c24aff075f94474adecfabfcf239 ] Current igen6_edac checks for pending errors before the registration of the error handler. However, there is a possibility that the error occurs during the registration process, leading to unhandled pending errors and no future error events. This issue can be reproduced by repeatedly injecting errors during the loading of the igen6_edac. Fix this issue by moving the pending error handler after the registration of the error handler, ensuring that no pending errors are left unhandled. Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Reported-by: Ee Wey Lim <ee.wey.lim@intel.com> Tested-by: Ee Wey Lim <ee.wey.lim@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20230725080427.23883-1-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11EDAC/skx: Fix overflows on the DRAM row address mapping arraysQiuxu Zhuo1-2/+2
[ Upstream commit 71b1e3ba3fed5a34c5fac6d3a15c2634b04c1eb7 ] The current DRAM row address mapping arrays skx_{open,close}_row[] only support ranks with sizes up to 16G. Decoding a rank address to a DRAM row address for a 32G rank by using either one of the above arrays by the skx_edac driver, will result in an overflow on the array. For a 32G rank, the most significant DRAM row address bit (the bit17) is mapped from the bit34 of the rank address. Add this new mapping item to both arrays to fix the overflow issue. Fixes: 4ec656bdf43a ("EDAC, skx_edac: Add EDAC driver for Skylake") Reported-by: Feng Xu <feng.f.xu@intel.com> Tested-by: Feng Xu <feng.f.xu@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230211011728.71764-1-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_infoManivannan Sadhasivam1-3/+2
commit 977c6ba624f24ae20cf0faee871257a39348d4a9 upstream. The memory for llcc_driv_data is allocated by the LLCC driver. But when it is passed as the private driver info to the EDAC core, it will get freed during the qcom_edac driver release. So when the qcom_edac driver gets probed again, it will try to use the freed data leading to the use-after-free bug. Hence, do not pass llcc_driv_data as pvt_info but rather reference it using the platform_data pointer in the qcom_edac driver. Fixes: 27450653f1db ("drivers: edac: Add EDAC driver support for QCOM SoCs") Reported-by: Steev Klimaszewski <steev@kali.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Cc: <stable@vger.kernel.org> # 4.20 Link: https://lore.kernel.org/r/20230118150904.26913-4-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-01EDAC/device: Respect any driver-supplied workqueue polling valueManivannan Sadhasivam1-8/+7
commit cec669ff716cc83505c77b242aecf6f7baad869d upstream. The EDAC drivers may optionally pass the poll_msec value. Use that value if available, else fall back to 1000ms. [ bp: Touchups. ] Fixes: e27e3dac6517 ("drivers/edac: add edac_device class") Reported-by: Luca Weiss <luca.weiss@fairphone.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Cc: <stable@vger.kernel.org> # 4.9 Link: https://lore.kernel.org/r/COZYL8MWN97H.MROQ391BGA09@otso Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-01EDAC/highbank: Fix memory leak in highbank_mc_probe()Miaoqian Lin1-2/+5
[ Upstream commit e7a293658c20a7945014570e1921bf7d25d68a36 ] When devres_open_group() fails, it returns -ENOMEM without freeing memory allocated by edac_mc_alloc(). Call edac_mc_free() on the error handling path to avoid a memory leak. [ bp: Massage commit message. ] Fixes: a1b01edb2745 ("edac: add support for Calxeda highbank memory controller") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20221229054825.1361993-1-linmq006@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18EDAC/device: Fix period calculation in edac_device_reset_delay_period()Eliav Farber2-10/+9
commit e84077437902ec99eba0a6b516df772653f142c7 upstream. Fix period calculation in case user sets a value of 1000. The input of round_jiffies_relative() should be in jiffies and not in milli-seconds. [ bp: Use the same code pattern as in edac_device_workq_setup() for clarity. ] Fixes: c4cf3b454eca ("EDAC: Rework workqueue handling") Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20221020124458.22153-1-farbere@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-31EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper()Yang Yingliang1-2/+1
[ Upstream commit 9c8921555907f4d723f01ed2d859b66f2d14f08e ] As the comment of pci_get_domain_bus_and_slot() says, it returns a PCI device with refcount incremented, so it doesn't need to call an extra pci_dev_get() in pci_get_dev_wrapper(), and the PCI device needs to be put in the error path. Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20221128065512.3572550-1-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03EDAC/ghes: Set the DIMM label unconditionallyToshi Kani1-3/+8
commit 5e2805d5379619c4a2e3ae4994e73b36439f4bad upstream. The commit cb51a371d08e ("EDAC/ghes: Setup DIMM label from DMI and use it in error reports") enforced that both the bank and device strings passed to dimm_setup_label() are not NULL. However, there are BIOSes, for example on a HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019 which don't populate both strings: Handle 0x0020, DMI type 17, 84 bytes Memory Device Array Handle: 0x0013 Error Information Handle: Not Provided Total Width: 72 bits Data Width: 64 bits Size: 32 GB Form Factor: DIMM Set: None Locator: PROC 1 DIMM 1 <===== device Bank Locator: Not Specified <===== bank This results in a buffer overflow because ghes_edac_register() calls strlen() on an uninitialized label, which had non-zero values left over from krealloc_array(): detected buffer overflow in __fortify_strlen ------------[ cut here ]------------ kernel BUG at lib/string_helpers.c:983! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 1 Comm: swapper/0 Tainted: G I 5.18.6-200.fc36.x86_64 #1 Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019 RIP: 0010:fortify_panic ... Call Trace: <TASK> ghes_edac_register.cold ghes_probe platform_probe really_probe __driver_probe_device driver_probe_device __driver_attach ? __device_attach_driver bus_for_each_dev bus_add_driver driver_register acpi_ghes_init acpi_init ? acpi_sleep_proc_init do_one_initcall The label contains garbage because the commit in Fixes reallocs the DIMMs array while scanning the system but doesn't clear the newly allocated memory. Change dimm_setup_label() to always initialize the label to fix the issue. Set it to the empty string in case BIOS does not provide both bank and device so that ghes_edac_register() can keep the default label given by edac_mc_alloc_dimms(). [ bp: Rewrite commit message. ] Fixes: b9cae27728d1f ("EDAC/ghes: Scan the system once on driver init") Co-developed-by: Robert Richter <rric@kernel.org> Signed-off-by: Robert Richter <rric@kernel.org> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Robert Elliott <elliott@hpe.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220719220124.760359-1-toshi.kani@hpe.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-09EDAC/dmc520: Don't print an error for each unconfigured interrupt lineTyler Hicks1-1/+1
[ Upstream commit ad2df24732e8956a45a00894d2163c4ee8fb0e1f ] The dmc520 driver requires that at least one interrupt line, out of the ten possible, is configured. The driver prints an error and returns -EINVAL from its .probe function if there are no interrupt lines configured. Don't print a KERN_ERR level message for each interrupt line that's unconfigured as that can confuse users into thinking that there is an error condition. Before this change, the following KERN_ERR level messages would be reported if only dram_ecc_errc and dram_ecc_errd were configured in the device tree: dmc520 68000000.dmc: IRQ ram_ecc_errc not found dmc520 68000000.dmc: IRQ ram_ecc_errd not found dmc520 68000000.dmc: IRQ failed_access not found dmc520 68000000.dmc: IRQ failed_prog not found dmc520 68000000.dmc: IRQ link_err not dmc520 68000000.dmc: IRQ temperature_event not found dmc520 68000000.dmc: IRQ arch_fsm not found dmc520 68000000.dmc: IRQ phy_request not found Fixes: 1088750d7839 ("EDAC: Add EDAC driver for DMC520") Reported-by: Sinan Kaya <okaya@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220111163800.22362-1-tyhicks@linux.microsoft.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-27EDAC/synopsys: Read the error count from the correct registerShubhrajyoti Datta1-5/+11
commit e2932d1f6f055b2af2114c7e64a26dc1b5593d0c upstream. Currently, the error count is read wrongly from the status register. Read the count from the proper error count register (ERRCNT). [ bp: Massage. ] Fixes: b500b4a029d5 ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller") Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220414102813.4468-1-shubhrajyoti.datta@xilinx.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23EDAC: Fix calculation of returned address and next offset in edac_align_ptr()Eliav Farber1-1/+1
commit f8efca92ae509c25e0a4bd5d0a86decea4f0c41e upstream. Do alignment logic properly and use the "ptr" local variable for calculating the remainder of the alignment. This became an issue because struct edac_mc_layer has a size that is not zero modulo eight, and the next offset that was prepared for the private data was unaligned, causing an alignment exception. The patch in Fixes: which broke this actually wanted to "what we actually care about is the alignment of the actual pointer that's about to be returned." But it didn't check that alignment. Use the correct variable "ptr" for that. [ bp: Massage commit message. ] Fixes: 8447c4d15e35 ("edac: Do alignment logic properly in edac_align_ptr()") Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220113100622.12783-2-farbere@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08EDAC/xgene: Fix deferred probingSergey Shtylyov1-1/+1
commit dfd0dfb9a7cc04acf93435b440dd34c2ca7b4424 upstream. The driver overrides error codes returned by platform_get_irq_optional() to -EINVAL for some strange reason, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the proper error codes to platform driver code upwards. [ bp: Massage commit message. ] Fixes: 0d4429301c4a ("EDAC: Add APM X-Gene SoC EDAC driver") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220124185503.6720-3-s.shtylyov@omp.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08EDAC/altera: Fix deferred probingSergey Shtylyov1-1/+1
commit 279eb8575fdaa92c314a54c0d583c65e26229107 upstream. The driver overrides the error codes returned by platform_get_irq() to -ENODEV for some strange reason, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the proper error codes to platform driver code upwards. [ bp: Massage commit message. ] Fixes: 71bcada88b0f ("edac: altera: Add Altera SDRAM EDAC support") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220124185503.6720-2-s.shtylyov@omp.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27EDAC/synopsys: Use the quirk for version instead of ddr versionDinh Nguyen1-2/+1
[ Upstream commit bd1d6da17c296bd005bfa656952710d256e77dd3 ] Version 2.40a supports DDR_ECC_INTR_SUPPORT for a quirk, so use that quirk to determine a call to setup_address_map(). Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-1-dinguyen@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-11EDAC/i10nm: Release mdev/mbase when failing to detect HBMQiuxu Zhuo1-0/+9
commit c370baa328022cbd46c59c821d1b467a97f047be upstream. On systems without HBM (High Bandwidth Memory) mdev/mbase are not released/unmapped. Add the code to release mdev/mbase when failing to detect HBM. [Tony: re-word commit message] Cc: <stable@vger.kernel.org> Fixes: c945088384d0 ("EDAC/i10nm: Add support for high bandwidth memory") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211224091126.1246-1-qiuxu.zhuo@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18EDAC/amd64: Handle three rank interleaving modeYazen Ghannam1-1/+21
[ Upstream commit 9f4873fb6af7966de8fcbd95c36b61351c1c4b1f ] AMD Rome systems and later support interleaving between three identical ranks within a channel. Check for this mode by counting the number of enabled chip selects and comparing their masks. If there are exactly three enabled chip selects and their masks are identical, then three rank interleaving is enabled. The size of a rank is determined from its mask value. However, three rank interleaving doesn't follow the method of swapping an interleave bit with the most significant bit. Rather, the interleave bit is flipped and the most significant bit remains the same. There is only a single interleave bit in this case. Account for this when determining the chip select size by keeping the most significant bit at its original value and ignoring any zero bits. This will return a full bitmask in [MSB:1]. Fixes: e53a3b267fb0 ("EDAC/amd64: Find Chip Select memory size using Address Mask") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211005154419.2060504-1-yazen.ghannam@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/HaswellEric Badger1-1/+1
commit 537bddd069c743759addf422d0b8f028ff0f8dbc upstream. The computation of TOHM is off by one bit. This missed bit results in too low a value for TOHM, which can cause errors in regular memory to incorrectly report: EDAC MC0: 1 CE Error at MMIOH area, on addr 0x000000207fffa680 on any memory Fixes: 50d1bb93672f ("sb_edac: add support for Haswell based systems") Cc: stable@vger.kernel.org Reported-by: Meeta Saggi <msaggi@purestorage.com> Signed-off-by: Eric Badger <ebadger@purestorage.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211010170127.848113-1-ebadger@purestorage.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-14EDAC/armada-xp: Fix output of uncorrectable error counterHans Potsch1-1/+1
The number of correctable errors is displayed as uncorrectable errors because the "SBE" error count is passed to both calls of edac_mc_handle_error(). Pass the correct uncorrectable error count to the second edac_mc_handle_error() call when logging uncorrectable errors. [ bp: Massage commit message. ] Fixes: 7f6998a41257 ("ARM: 8888/1: EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC") Signed-off-by: Hans Potsch <hans.potsch@nokia.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20211006121332.58788-1-hans.potsch@nokia.com
2021-09-16EDAC/dmc520: Assign the proper type to dimm->edac_modeBorislav Petkov1-1/+1
dimm->edac_mode contains values of type enum edac_type - not the corresponding capability flags. Fix that. Fixes: 1088750d7839 ("EDAC: Add EDAC driver for DMC520") Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210916085258.7544-1-bp@alien8.de
2021-09-16EDAC/synopsys: Fix wrong value type assignment for edac_modeSai Krishna Potthuri1-1/+1
dimm->edac_mode contains values of type enum edac_type - not the corresponding capability flags. Fix that. Issue caught by Coverity check "enumerated type mixed with another type." [ bp: Rewrite commit message, add tags. ] Fixes: ae9b56e3996d ("EDAC, synps: Add EDAC support for zynq ddr ecc controller") Signed-off-by: Sai Krishna Potthuri <lakshmi.sai.krishna.potthuri@xilinx.com> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210818072315.15149-1-shubhrajyoti.datta@xilinx.com
2021-08-31Merge tag 'irq-core-2021-08-30' of ↵Linus Torvalds1-5/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates to the interrupt core and driver subsystems: Core changes: - The usual set of small fixes and improvements all over the place, but nothing stands out MSI changes: - Further consolidation of the PCI/MSI interrupt chip code - Make MSI sysfs code independent of PCI/MSI and expose the MSI interrupts of platform devices in the same way as PCI exposes them. Driver changes: - Support for ARM GICv3 EPPI partitions - Treewide conversion to generic_handle_domain_irq() for all chained interrupt controllers - Conversion to bitmap_zalloc() throughout the irq chip drivers - The usual set of small fixes and improvements" * tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) platform-msi: Add ABI to show msi_irqs of platform devices genirq/msi: Move MSI sysfs handling from PCI to MSI core genirq/cpuhotplug: Demote debug printk to KERN_DEBUG irqchip/qcom-pdc: Trim unused levels of the interrupt hierarchy irqdomain: Export irq_domain_disconnect_hierarchy() irqchip/gic-v3: Fix priority comparison when non-secure priorities are used irqchip/apple-aic: Fix irq_disable from within irq handlers pinctrl/rockchip: drop the gpio related codes gpio/rockchip: drop irq_gc_lock/irq_gc_unlock for irq set type gpio/rockchip: support next version gpio controller gpio/rockchip: use struct rockchip_gpio_regs for gpio controller gpio/rockchip: add driver for rockchip gpio dt-bindings: gpio: change items restriction of clock for rockchip,gpio-bank pinctrl/rockchip: add pinctrl device to gpio bank struct pinctrl/rockchip: separate struct rockchip_pin_bank to a head file pinctrl/rockchip: always enable clock for gpio controller genirq: Fix kernel doc indentation EDAC/altera: Convert to generic_handle_domain_irq() powerpc: Bulk conversion to generic_handle_domain_irq() nios2: Bulk conversion to generic_handle_domain_irq() ...
2021-08-30Merge tag 'edac_updates_for_v5.15' of ↵Linus Torvalds8-38/+202
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: "The usual EDAC stuff which managed to trickle in for 5.15: - Add new HBM2 (High Bandwidth Memory Gen 2) type and add support for it to the Intel SKx drivers - Print additional useful per-channel error information on i10nm, like on SKL - Don't load the AMD EDAC decoder in virtual images - The usual round of fixes and cleanups" * tag 'edac_updates_for_v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/i10nm: Retrieve and print retry_rd_err_log registers EDAC/i10nm: Fix NVDIMM detection EDAC/skx_common: Set the memory type correctly for HBM memory EDAC/altera: Skip defining unused structures for specific configs EDAC/mce_amd: Do not load edac_mce_amd module on guests EDAC/mc: Add new HBM2 memory type EDAC/amd64: Use DEVICE_ATTR helper macros
2021-08-23EDAC/i10nm: Retrieve and print retry_rd_err_log registersYouquan Song4-3/+157
Retrieve and print retry_rd_err_log registers like the earlier change: commit e80634a75aba ("EDAC, skx: Retrieve and print retry_rd_err_log registers") This is a little trickier than on Skylake because of potential interference with BIOS use of the same registers. The default behavior is to ignore these registers. A module parameter retry_rd_err_log(default=0) controls the mode of operation: - 0=off : Default. - 1=bios : Linux doesn't reset any control bits, but just reports values. This is "no harm" mode, but it may miss reporting some data. - 2=linux: Linux tries to take control and resets mode bits, clears valid/UC bits after reading. This should be more reliable (especially if BIOS interference is reduced by disabling eMCA reporting mode in BIOS setup). Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210818175701.1611513-3-tony.luck@intel.com
2021-08-23EDAC/i10nm: Fix NVDIMM detectionQiuxu Zhuo1-3/+3
MCDDRCFG is a per-channel register and uses bit{0,1} to indicate the NVDIMM presence on DIMM slot{0,1}. Current i10nm_edac driver wrongly uses MCDDRCFG as per-DIMM register and fails to detect the NVDIMM. Fix it by reading MCDDRCFG as per-channel register and using its bit{0,1} to check whether the NVDIMM is populated on DIMM slot{0,1}. Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Reported-by: Fan Du <fan.du@intel.com> Tested-by: Wen Jin <wen.jin@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210818175701.1611513-2-tony.luck@intel.com