summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)AuthorFilesLines
2014-04-03Merge branch 'mips-for-linux-next' of ↵Linus Torvalds1-6/+173
git://git.linux-mips.org/pub/scm/ralf/upstream-sfr Pull MIPS updates from Ralf Baechle: - Support for Imgtec's Aptiv family of MIPS cores. - Improved detection of BCM47xx configurations. - Fix hiberation for certain configurations. - Add support for the Chinese Loongson 3 CPU, a MIPS64 R2 core and systems. - Detection and support for the MIPS P5600 core. - A few more random fixes that didn't make 3.14. - Support for the EVA Extended Virtual Addressing - Switch Alchemy to the platform PATA driver - Complete unification of Alchemy support - Allow availability of I/O cache coherency to be runtime detected - Improvments to multiprocessing support for Imgtec platforms - A few microoptimizations - Cleanups of FPU support - Paul Gortmaker's fixes for the init stuff - Support for seccomp * 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr: (165 commits) MIPS: CPC: Use __raw_ memory access functions MIPS: CM: use __raw_ memory access functions MIPS: Fix warning when including smp-ops.h with CONFIG_SMP=n MIPS: Malta: GIC IPIs may be used without MT MIPS: smp-mt: Use common GIC IPI implementation MIPS: smp-cmp: Remove incorrect core number probe MIPS: Fix gigaton of warning building with microMIPS. MIPS: Fix core number detection for MT cores MIPS: MT: core_nvpes function to retrieve VPE count MIPS: Provide empty mips_mt_set_cpuoptions when CONFIG_MIPS_MT=n MIPS: Lasat: Replace del_timer by del_timer_sync MIPS: Malta: Setup PM I/O region on boot MIPS: Loongson: Add a Loongson-3 default config file MIPS: Loongson 3: Add CPU hotplug support MIPS: Loongson 3: Add Loongson-3 SMP support MIPS: Loongson: Add Loongson-3 Kconfig options MIPS: Loongson: Add swiotlb to support All-Memory DMA MIPS: Loongson 3: Add serial port support MIPS: Loongson 3: Add IRQ init and dispatch support MIPS: Loongson 3: Add HT-linked PCI support ...
2014-04-02Merge tag 'edac_for_3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds12-91/+130
Pull EDAC updates from Borislav Petkov: "A bunch of EDAC updates all over the place: - Support for new AMD models, along with more graceful fallback for unsupported hw. - Bunch of fixes from SUSE accumulated from bug reports - Misc other fixes and cleanups" * tag 'edac_for_3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: amd64_edac: Add support for newer F16h models i7core_edac: Drop unused variable i82875p_edac: Drop redundant call to pci_get_device() amd8111_edac: Fix leaks in probe error paths e752x_edac: Drop pvt->bridge_ck MCE, AMD: Fix decoding module loading on unsupported hw i5100_edac: Remove an unneeded condition in i5100_init_csrows() sb_edac: Degrade log level for device registration amd64_edac: Fix logic to determine channel for F15 M30h processors edac/85xx: Remove deprecated IRQF_DISABLED i3200_edac: Add a missing pci_disable_device() on the exit path i5400_edac: Disable device when unloading module e752x_edac: Simplify call to pci_get_device()
2014-03-31EDAC: Octeon: Add error injection supportDaniel Walker1-6/+171
This adds an ad-hoc error injection method. Octeon II doesn't have hardware support for injection, so this simulates it. Signed-off-by: Daniel Walker <dwalker@fifo99.com> Cc: David Daney <david.daney@cavium.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: linux-edac@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/5873/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-03-31EDAC: Octeon: Fix lack of opstate_initDaniel Walker1-0/+2
If the opstate_init() isn't called the driver won't start properly. I just added it in what appears to be an appropriate place. Signed-off-by: Daniel Walker <dwalker@fifo99.com> Cc: David Daney <david.daney@cavium.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: linux-edac@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/5872/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-02-27amd64_edac: Add support for newer F16h modelsAravind Gopalakrishnan2-0/+27
Extend ECC decoding support for F16h M30h. Tested on F16h M30h with ECC turned on using mce_amd_inj module and the patch works fine. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1392913726-16961-1-git-send-email-Aravind.Gopalakrishnan@amd.com Tested-by: Arindam Nath <Arindam.Nath@amd.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25i7core_edac: Drop unused variableJean Delvare1-7/+3
Fix the following warning: drivers/edac/i7core_edac.c: In function "core_mce_output_error": drivers/edac/i7core_edac.c:1711:8: warning: variable "type" set but not used [-Wunused-but-set-variable] char *type, *optype, *err; ^ According to Mauro, type can just be dropped, as tp_event now maps if the error is corrected, uncorrected non-fatal or uncorrected fatal one. Signed-off-by: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/20140224171358.692d7e5a@endymion.delvare Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25i82875p_edac: Drop redundant call to pci_get_device()Jean Delvare1-2/+0
The first call to pci_get_device() in i82875p_probe1() is useless. The result is immediately reset at the beginning of i82875p_setup_overfl_dev(), which then issues the same pci_get_device() call. Dropping the redundant call avoids a device reference leak. Signed-off-by: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/20140224160316.60e55fb6@endymion.delvare Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25amd8111_edac: Fix leaks in probe error pathsJean Delvare1-14/+30
Both probe error paths are incomplete and leak memory and device references. Add the missing cleanups. Signed-off-by: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/20140224154534.5a3b797a@endymion.delvare Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25e752x_edac: Drop pvt->bridge_ckJean Delvare1-18/+10
pvt->bridge_ck always points to the same device as pvt->dev_d0f1, so get rid of the former and only use the latter. Signed-off-by: Jean Delvare <jdelvare@suse.de> Tested-by: Aristeu Rozanski <aris@redhat.com> Link: http://lkml.kernel.org/r/20140224151544.16ba28a0@endymion.delvare Cc: Mark Gross <mark.gross@intel.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25i7300_edac: Fix device reference countJean Delvare1-18/+20
pci_get_device() decrements the reference count of "from" (last argument) so when we break off the loop successfully we have only one device reference - and we don't know which device we have. If we want a reference to each device, we must take them explicitly and let the pci_get_device() walk complete to avoid duplicate references. This is serious, as over-putting device references will cause the device to eventually disappear. Without this fix, the kernel crashes after a few insmod/rmmod cycles. Tested on an Intel S7000FC4UR system with a 7300 chipset. Signed-off-by: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: stable@vger.kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25i7core_edac: Fix PCI device reference countJean Delvare1-2/+7
The reference count changes done by pci_get_device can be a little misleading when the usage diverges from the most common scheme. The reference count of the device passed as the last parameter is always decreased, even if the function returns no new device. So if we are going to try alternative device IDs, we must manually increment the device reference count before each retry. If we don't, we end up decreasing the reference count, and after a few modprobe/rmmod cycles the PCI devices will vanish. In other words and as Alan put it: without this fix the EDAC code corrupts the PCI device list. This fixes kernel bug #50491: https://bugzilla.kernel.org/show_bug.cgi?id=50491 Signed-off-by: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare Reviewed-by: Alan Cox <alan@linux.intel.com> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: stable@vger.kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-24MCE, AMD: Fix decoding module loading on unsupported hwBorislav Petkov1-32/+33
We want to still be able to issue some error information on systems for which there is no decoding support (think older distro kernels here, for example). Therefore, we allow module registration but skip the per-family bank-specific decoders and issue the general information only, i.e.: [ 46.822828] [Hardware Error]: Error Status: Uncorrected, software containable error. [ 46.822846] [Hardware Error]: CPU:0 (15:30:0) MC0_STATUS[-|UE|-|-|-|-|-]: 0xa000000000010f0f [ 46.822858] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out) with the hope that it still contains helpful useful bits. Suggested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1392659391-2411-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-20i5100_edac: Remove an unneeded condition in i5100_init_csrows()Dan Carpenter1-10/+7
We checked that "npages" was not zero a couple lines earlier so we can remove this and pull the code in one indent level. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: http://lkml.kernel.org/r/20140213111738.GB15549@elgon.mountain Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-20sb_edac: Degrade log level for device registrationJiang Liu1-1/+1
On a system with four Intel processors, it generates too many messages "EDAC sbridge: Seeking for: dev 1d.3 PCI ID xxxx". And it doesn't give many useful information for normal users, so change log level from INFO to DEBUG. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Link: http://lkml.kernel.org/r/1392613824-11230-1-git-send-email-jiang.liu@linux.intel.com Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-14EDAC: Correct workqueue setup pathBorislav Petkov1-4/+7
We're using edac_mc_workq_setup() both on the init path, when we load an edac driver and when we change the polling period (edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec. On that second path we don't need to init the workqueue which has been initialized already. Thanks to Tejun for workqueue insights. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com Cc: <stable@vger.kernel.org>
2014-02-14EDAC: Poll timeout cannot be zero, p2Borislav Petkov3-7/+9
Sanitize code even more to accept unsigned longs only and to not allow polling intervals below 1 second as this is unnecessary and doesn't make much sense anyway for polling errors. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com Cc: Doug Thompson <dougthompson@xmission.com> Cc: <stable@vger.kernel.org>
2014-02-11drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zeroPrarit Bhargava1-1/+1
If you do echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec the following stack trace is output because the edac module is not designed to poll with a timeout of zero. WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0() list_add corruption. prev->next should be next (ffff8808291dd1b8), but was (null). (prev=ffff8808286fe3f8). Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Call Trace: <IRQ> __list_add+0xac/0xc0 __internal_add_timer+0xab/0x130 internal_add_timer+0x17/0x40 mod_timer_pinned+0xca/0x170 intel_pstate_timer_func+0x28a/0x380 call_timer_fn+0x36/0x100 run_timer_softirq+0x1ff/0x2f0 __do_softirq+0xf5/0x2e0 irq_exit+0x10d/0x120 smp_apic_timer_interrupt+0x45/0x60 apic_timer_interrupt+0x6d/0x80 <EOI> cpuidle_idle_call+0xb9/0x1f0 arch_cpu_idle+0xe/0x30 cpu_startup_entry+0x9e/0x240 start_secondary+0x1e4/0x290 kernel BUG at kernel/timer.c:1084! invalid opcode: 0000 [#1] SMP Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 0 Comm: swapper/12 Tainted: G W 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Call Trace: <IRQ> run_timer_softirq+0x245/0x2f0 __do_softirq+0xf5/0x2e0 irq_exit+0x10d/0x120 smp_apic_timer_interrupt+0x45/0x60 apic_timer_interrupt+0x6d/0x80 <EOI> cpuidle_idle_call+0xb9/0x1f0 arch_cpu_idle+0xe/0x30 cpu_startup_entry+0x9e/0x240 start_secondary+0x1e4/0x290 RIP cascade+0x93/0xa0 WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0() Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G W 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Workqueue: edac-poller edac_mc_workq_function [edac_core] Call Trace: dump_stack+0x45/0x56 warn_slowpath_common+0x7d/0xa0 warn_slowpath_null+0x1a/0x20 __queue_delayed_work+0xed/0x1a0 queue_delayed_work_on+0x27/0x50 edac_mc_workq_function+0x72/0xa0 [edac_core] process_one_work+0x17b/0x460 worker_thread+0x11b/0x400 kthread+0xd2/0xf0 ret_from_fork+0x7c/0xb0 This patch adds a range check in the edac_mc_poll_msec code to check for 0. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-07amd64_edac: Fix logic to determine channel for F15 M30h processorsAravind Gopalakrishnan1-3/+11
Update current channel selection logic to include F15h, M30h memory controllers. Refer F15 M30h BKDG D18F2x110[7:6] (DRAM Controller Select Low) (Link:http://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf) Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1390338216-3873-1-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-07edac/85xx: Remove deprecated IRQF_DISABLEDJohannes Thumshirn1-3/+3
Remove IRQF_DISABLED as it is a NOOP. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de> Link: http://lkml.kernel.org/r/1390293747-11185-1-git-send-email-johannes.thumshirn@men.de Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-07i3200_edac: Add a missing pci_disable_device() on the exit pathHitoshi Mitake1-0/+2
Currently, i3200_edac.c forgets to call pci_disable_device() during a process of disabling device. This patch adds that call. The problem was detected by Huqiu Liu. Reported-by: Huqiu Liu <liuhq11@mails.tsinghua.edu.cn> Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Link: http://lkml.kernel.org/r/1389587178-10167-1-git-send-email-mitake.hitoshi@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-07i5400_edac: Disable device when unloading moduleAristeu Rozanski1-0/+2
This was found by Huqiu Liu using a static analysis. Reported-by: Huqiu Liu <liuhq11@mails.tsinghua.edu.cn> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Link: http://lkml.kernel.org/r/20140116162021.GY15716@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-03e752x_edac: Simplify call to pci_get_device()Jean Delvare1-1/+1
The last parameter, pvt->bridge_ck, is always NULL as far as I can see, so just use that. Signed-off-by: Jean Delvare <jdelvare@suse.de> Acked-by: Aristeu Rozanski <aris@redhat.com> Link: http://lkml.kernel.org/r/20140130091101.22f2b550@endymion.delvare Cc: Mark Gross <mark.gross@intel.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2014-01-21Merge branch 'x86-ras-for-linus' of ↵Linus Torvalds2-1/+24
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - SCI reporting for other error types not only correctable ones - GHES cleanups - Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant - PCIe AER tracepoint severity levels fix - Error path correction for the mce device init - MCE timer fix - Add more flexibility to the error injection (EINJ) debugfs interface * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix mce_start_timer semantics ACPI, APEI, GHES: Cleanup ghes memory error handling ACPI, APEI: Cleanup alignment-aware accesses ACPI, APEI, GHES: Do not report only correctable errors with SCI ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface ACPI, eMCA: Combine eMCA/EDAC event reporting priority EDAC, sb_edac: Modify H/W event reporting policy EDAC: Add an edac_report parameter to EDAC PCI, AER: Fix severity usage in aer trace event x86, mce: Call put_device on device_register failure
2014-01-20Merge tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds22-104/+181
Pull EDAC updates from Borislav Petkov: - mpc85xx PCIe error interrupt support - misc small enhancements/fixes all over the place. * tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC: Don't try to cancel workqueue when it's never setup e752x_edac: Fix pci_dev usage count sb_edac: Mark get_mci_for_node_id as static EDAC: Mark edac_create_debug_nodes as static amd64_edac: Remove "amd64" prefix from static functions amd64_edac: Simplify code around decode_bus_error amd64_edac: Mark amd64_decode_bus_error as static EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro amd64_edac: Fix condition to verify max channels allowed for F15 M30h edac/85xx: Add PCIe error interrupt edac support
2014-01-10EDAC: Don't try to cancel workqueue when it's never setupStephen Boyd1-0/+3
We only setup a workqueue for edac devices that use the polling method. We still try to cancel the workqueue if an edac_device uses the irq method though. This causes a warning from debug objects when we remove an edac device: WARNING: CPU: 0 PID: 56 at lib/debugobjects.c:260 debug_print_object+0x98/0xc0() ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x28 Modules linked in: krait_edac(-) CPU: 0 PID: 56 Comm: rmmod Not tainted 3.12.0-rc2-00035-g15292a0 #69 (unwind_backtrace+0x0/0x144) (show_stack+0x20/0x24) (dump_stack+0x74/0xb4) (warn_slowpath_common+0x78/0x9c) (warn_slowpath_fmt+0x40/0x48) (debug_print_object+0x98/0xc0) (debug_object_assert_init+0xdc/0xec) (del_timer+0x24/0x7c) (try_to_grab_pending+0xc0/0x1b0) (cancel_delayed_work+0x2c/0xa0) (edac_device_workq_teardown+0x1c/0x38) (edac_device_del_device+0xb8/0xe4) (krait_edac_remove+0x50/0x70 [krait_edac]) (platform_drv_remove+0x24/0x28) (__device_release_driver+0x68/0xc0) (driver_detach+0xc4/0xc8) (bus_remove_driver+0xac/0x114) (driver_unregister+0x38/0x58) (platform_driver_unregister+0x1c/0x20) (krait_edac_driver_exit+0x14/0x1c [krait_edac]) (SyS_delete_module+0x178/0x2b4) Fix it by skipping the workqueue teardown for such devices. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/1388434457-4194-2-git-send-email-sboyd@codeaurora.org Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-21e752x_edac: Fix pci_dev usage countAristeu Rozanski1-1/+3
In case the device 0, function 1 is not found using pci_get_device(), pci_scan_single_device() will be used but, differently than pci_get_device(), it allocates a pci_dev but doesn't does bump the usage count on the pci_dev and after few module removals and loads the pci_dev will be freed. Signed-off-by: Aristeu Rozanski <aris@redhat.com> Reviewed-by: mark gross <mark.gross@intel.com> Link: http://lkml.kernel.org/r/20131205153755.GL4545@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-16Merge tag 'ras_for_3.14' of ↵Ingo Molnar2-1/+24
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras Pull RAS updates from Borislav Petkov: * Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant, from Gong Chen. * PCIe AER tracepoint severity levels fix, from Rui Wang. * Error path correction for the mce device init, from Levente Kurusa. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-15sb_edac: Mark get_mci_for_node_id as staticRashika Kheria1-1/+1
This patch marks the function get_mci_for_node_id() as static because it is not used outside of sb_edac.c. Thus, it also eliminates the following warning: drivers/edac/sb_edac.c:918:22: warning: no previous prototype for ‘get_mci_for_node_id’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/0441f508186fc4eeabc8e9c3e4dde013d99405d4.1387029387.git.rashika.kheria@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15EDAC: Mark edac_create_debug_nodes as staticRashika Kheria1-1/+1
This patch marks the function edac_create_debug_nodes() as static because it is not used outside of edac_mc_sysfs.c. Thus, it also eliminates the following warning: drivers/edac/edac_mc_sysfs.c:917:5: warning: no previous prototype for ‘edac_create_debug_nodes’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/a1c863b08c0d6f67d03280cf908c771bf26a3239.1387029387.git.rashika.kheria@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15amd64_edac: Remove "amd64" prefix from static functionsBorislav Petkov1-62/+56
No need for the namespace tagging there. Cleanup setup_pci_device while at it. Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15amd64_edac: Simplify code around decode_bus_errorBorislav Petkov1-9/+4
Drop wrapper function and prefixes. Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15amd64_edac: Mark amd64_decode_bus_error as staticRashika Kheria1-1/+1
This patch marks the function amd64_decode_bus_error() as static because it is not used outside of amd64_edac.c. It also eliminates the following warning: drivers/edac/amd64_edac.c:2038:6: warning: no previous prototype for ‘amd64_decode_bus_error’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/7cddbd4c69ed493f183383e98853181aaf75b26b.1387029387.git.rashika.kheria@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-11EDAC, sb_edac: Modify H/W event reporting policyChen, Gong1-1/+5
Newer Intel platforms support more than one method to report H/W event. On this kind of platform, H/W event report can adopt new method and traditional EDAC method should be disabled. Moreover, if EDAC event report method is set to *force*, it means event must be reported via EDAC interface. IOW, it overrides the default event report policy. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1386310630-12529-3-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit and error messages ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-11EDAC: Add an edac_report parameter to EDACChen, Gong1-0/+19
This new parameter is used to control how to report HW error reporting, especially for newer Intel platform, like Ivybridge-EX, which contains an enhanced error decoding functionality in the firmware, i.e. eMCA. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1386310630-12529-2-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06EDAC: Remove DEFINE_PCI_DEVICE_TABLE macroJingoo Han18-18/+18
Currently, there is no other bus that has something like this macro for their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com [ Boris: swap commit message with better one. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06amd64_edac: Fix condition to verify max channels allowed for F15 M30hAravind Gopalakrishnan1-1/+1
The value returned from 'f15_m30h_determine_channel' will always be 0x3 max. The condition (channel > 4 || channel < 0) works as hardware never returns a value of 4, but it leads to static checker analysis errors like http://marc.info/?l=linux-edac&m=138607615131951&w=2. Fix that. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20131203130857.GA32170@elgon.mountain [ Boris: massage commit message a bit. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-30sb_edac: Shut up compiler warning when EDAC_DEBUG is enabledAristeu Rozanski1-1/+1
Fix this: In file included from drivers/edac/sb_edac.c:27:0: drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’: drivers/edac/edac_core.h:50:8: warning: ‘limit’ may be used uninitialized in this function [-Wmaybe-uninitialized] printk(level "EDAC " prefix ": " fmt, ##arg) ^ drivers/edac/sb_edac.c:948:25: note: ‘limit’ was declared here u64 ch_addr, offset, limit, prv = 0; Limit can be initialized to 0. The only way limit wouldn't be initialized is if there are no DIMMs present (which would be a bug of course) and it'd fail on the next test. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Link: http://lkml.kernel.org/r/20131121122021.GD26009@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-25edac/85xx: Add PCIe error interrupt edac supportChunhe Lan2-11/+94
Add pcie error interrupt edac support for mpc85xx, p3041, p4080, and p5020. The mpc85xx uses the legacy interrupt report mechanism - the error interrupts are reported directly to mpic. While the p3041/ p4080/p5020 attaches the most of error interrupts to interrupt zero. And report error interrupts to mpic via interrupt 0. This patch can handle both of them. Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com> Link: http://lkml.kernel.org/r/1384712714-8826-3-git-send-email-morbidrsa@gmail.com Cc: Doug Thompson <dougthompson@xmission.com> Cc: Dave Jiang <dave.jiang@gmail.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-19Merge branch 'linux_next' of ↵Linus Torvalds2-129/+465
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC driver updates from Mauro Carvalho Chehab: - sb_edac: add support for Ivy Bridge support - cell_edac: add a missing of_node_put() call * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: cell_edac: fix missing of_node_put sb_edac: add support for Ivy Bridge sb_edac: avoid decoding the same error multiple times sb_edac: rename mci_bind_devs() sb_edac: enable multiple PCI id tables to be used sb_edac: rework sad_pkg sb_edac: allow different interleave lists sb_edac: allow different dram_rule arrays sb_edac: isolate TOHM retrieval sb_edac: rename pci_br sb_edac: isolate TOLM retrieval sb_edac: make RANK_CFG_A value part of sbridge_info
2013-11-19Merge tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds6-125/+128
Pull EDAC updates from Borislav Petkov: "Following up on last week's discussion, here's my part of the EDAC pile, highlights in the signed tag. The last two patches have a date from just now because I've just applied them to the tree after Johannes sent them to me earlier. I decided to forward them now because they're trivial. There's a third one for MPC85xx which adds PCIe error interrupt support but since it is not so trivial and hasn't seen any linux-next time, I'm deferring it to 3.14 EDAC update highlights: - Support for Calxeda ECX-2000 memory controller, from Robert Richter - Misc Calxeda Highbank drivers and EDAC core cleanups, from Rob Herring and Robert Richter - New maintainer for Freescale's MPC85xx EDAC driver: Johannes Thumshirn" * tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: edac/85xx: Remove mpc85xx_pci_err_remove EDAC: Add edac-mpc85xx driver to MAINTAINERS edac, highbank: Moving error injection to sysfs for edac edac, highbank: Add MAINTAINERS entry edac: Unify reporting of device info for device, mc and pci edac, highbank: Improve and unify naming edac, highbank: Add Calxeda ECX-2000 support ARM: dts: calxeda: move memory-controller node out of ecx-common.dtsi edac, highbank: Fix interrupt setup of mem and l2 controller
2013-11-17edac/85xx: Remove mpc85xx_pci_err_removeJohannes Thumshirn1-22/+0
Remove mpc85xx_pci_err_remove(...) which is obsolete, this removes the compiler warning which can be seen when building the driver either statically or as a module. Signed-off-by: Johannes Thumshirn <morbidrsa@gmail.com> Link: https://lkml.kernel.org/r/20131112161901.GA15637@jtlinux Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-14cell_edac: fix missing of_node_putLibo Chen1-0/+1
Decrease device_node refcount np after task completion. Signed-off-by: Libo Chen <libo.chen@huawei.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: add support for Ivy BridgeAristeu Rozanski1-70/+376
Since Ivy Bridge memory controller is very similar to Sandy Bridge, it's wiser to modify sb_edac to support both instead of creating another driver. [m.chehab@samsung.com: Fix CodingStyle] Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: avoid decoding the same error multiple timesAristeu Rozanski1-0/+4
Whenever the extended error reporting is active, multiple MCEs will be generated for the same event, which will lead to multiple repeated errors to be reported. So check ADDRV and only decode the error if the MCE address is valid. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: rename mci_bind_devs()Aristeu Rozanski1-3/+3
This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: enable multiple PCI id tables to be usedAristeu Rozanski1-9/+13
This is needed to allow separated PCI id tables for Sandy Bridge and Ivy Bridge. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: rework sad_pkgAristeu Rozanski1-36/+30
This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: allow different interleave listsAristeu Rozanski1-5/+8
This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: allow different dram_rule arraysAristeu Rozanski1-11/+14
This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: isolate TOHM retrievalAristeu Rozanski1-3/+11
This is preparation of Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>