summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)AuthorFilesLines
2016-03-21EDAC, ppc4xx: Access mci->csrows array elements properlyMichael Walle1-1/+1
commit 5c16179b550b9fd8114637a56b153c9768ea06a5 upstream. The commit de3910eb79ac ("edac: change the mem allocation scheme to make Documentation/kobject.txt happy") changed the memory allocation for the csrows member. But ppc4xx_edac was forgotten in the patch. Fix it. Signed-off-by: Michael Walle <michael@walle.cc> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.cc Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-06-19sb_edac: Fix erroneous bytes->gigabytes conversionJim Snow1-19/+19
commit 8c009100295597f23978c224aec5751a365bc965 upstream. Signed-off-by: Jim Snow <jim.snow@intel.com> Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Vinson Lee <vlee@twopensource.com> [lizf: Backported to 3.4: - adjust context - use debugf0() instead of edac_dbg()] Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-02-02mpc85xx_edac: Make L2 interrupt shared tooBorislav Petkov1-1/+2
commit a18c3f16a907b8977ef65fc8dd71ed3f7b751748 upstream. The other two interrupt handlers in this driver are shared, except this one. When loading the driver, it fails like this. So make the IRQ line shared. Freescale(R) MPC85xx EDAC driver, (C) 2006 Montavista Software mpc85xx_mc_err_probe: No ECC DIMMs discovered EDAC DEVICE0: Giving out device to module MPC85xx_edac controller mpc85xx_l2_err: DEV mpc85xx_l2_err (INTERRUPT) genirq: Flags mismatch irq 16. 00000000 ([EDAC] L2 err) vs. 00000080 ([EDAC] PCI err) mpc85xx_l2_err_probe: Unable to request irq 16 for MPC85xx L2 err remove_proc_entry: removing non-empty directory 'irq/16', leaking at least 'aerdrv' ------------[ cut here ]------------ WARNING: at fs/proc/generic.c:521 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc5-dirty #1 task: ee058000 ti: ee046000 task.ti: ee046000 NIP: c016c0c4 LR: c016c0c4 CTR: c037b51c REGS: ee047c10 TRAP: 0700 Not tainted (3.17.0-rc5-dirty) MSR: 00029000 <CE,EE,ME> CR: 22008022 XER: 20000000 GPR00: c016c0c4 ee047cc0 ee058000 00000053 00029000 00000000 c037c744 00000003 GPR08: c09aab28 c09aab24 c09aab28 00000156 20008028 00000000 c0002ac8 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000139 c0950394 GPR24: c09f0000 ee5585b0 ee047d08 c0a10000 ee047d08 ee15f808 00000002 ee03f660 NIP [c016c0c4] remove_proc_entry LR [c016c0c4] remove_proc_entry Call Trace: remove_proc_entry (unreliable) unregister_irq_proc free_desc irq_free_descs mpc85xx_l2_err_probe platform_drv_probe really_probe __driver_attach bus_for_each_dev bus_add_driver driver_register mpc85xx_mc_init do_one_initcall kernel_init_freeable kernel_init ret_from_kernel_thread Instruction dump: ... Reported-and-tested-by: <lpb_098@163.com> Acked-by: Johannes Thumshirn <johannes.thumshirn@men.de> Signed-off-by: Borislav Petkov <bp@suse.de> [lizf: Backported to 3.4: IRQF_DISABLED hasn't been removed in 3.4] Signed-off-by: Zefan Li <lizefan@huawei.com>
2014-06-08i82975x_edac: Fix dimm label initializationMauro Carvalho Chehab1-7/+4
commit 479696840239e0cc43efb3c917bdcad2174d2215 upstream. The driver has only 4 hardcoded labels, but allows much more memory. Fix it by removing the hardcoded logic, using snprintf() instead. [ 19.833972] general protection fault: 0000 [#1] SMP [ 19.837733] Modules linked in: i82975x_edac(+) edac_core firewire_ohci firewire_core crc_itu_t nouveau mxm_wmi wmi video i2c_algo_bit drm_kms_helper ttm drm i2c_core [ 19.837733] CPU 0 [ 19.837733] Pid: 390, comm: udevd Not tainted 3.6.1-1.fc17.x86_64.debug #1 Dell Inc. Precision WorkStation 390 /0MY510 [ 19.837733] RIP: 0010:[<ffffffff813463a8>] [<ffffffff813463a8>] strncpy+0x18/0x30 [ 19.837733] RSP: 0018:ffff880078535b68 EFLAGS: 00010202 [ 19.837733] RAX: ffff880069fa9708 RBX: ffff880078588000 RCX: ffff880069fa9708 [ 19.837733] RDX: 000000000000001f RSI: 5f706f5f63616465 RDI: ffff880069fa9708 [ 19.837733] RBP: ffff880078535b68 R08: ffff880069fa9727 R09: 000000000000fffe [ 19.837733] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003 [ 19.837733] R13: 0000000000000000 R14: ffff880069fa9290 R15: ffff880079624a80 [ 19.837733] FS: 00007f3de01ee840(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000 [ 19.837733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 19.837733] CR2: 00007f3de00b9000 CR3: 0000000078dbc000 CR4: 00000000000007f0 [ 19.837733] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 19.837733] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 19.837733] Process udevd (pid: 390, threadinfo ffff880078534000, task ffff880079642450) [ 19.837733] Stack: [ 19.837733] ffff880078535c18 ffffffffa017c6b8 00040000816d627f ffff880079624a88 [ 19.837733] ffffc90004cd6000 ffff880079624520 ffff88007ac21148 0000000000000000 [ 19.837733] 0000000000000000 0004000000000000 feda000078535bc8 ffffffff810d696d [ 19.837733] Call Trace: [ 19.837733] [<ffffffffa017c6b8>] i82975x_init_one+0x2e6/0x3e6 [i82975x_edac] ... Fix bug reported at: https://bugzilla.redhat.com/show_bug.cgi?id=848149 And, very likely: https://bbs.archlinux.org/viewtopic.php?id=148033 https://bugzilla.kernel.org/show_bug.cgi?id=47171 Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> [bwh: Backported to 3.2: - Adjust context - Use csrow->channels[chan].label not csrow->channels[chan]->dimm->label] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Qiang Huang <h.huangqiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14sb_edac: Avoid overflow errors at memory size calculationMauro Carvalho Chehab1-3/+4
commit deb09ddaff1435f72dd598d38f9b58354c68a5ec upstream. Sandy bridge EDAC is calculating the memory size with overflow. Basically, the size field and the integer calculation is using 32 bits. More bits are needed, when the DIMM memories have high density. The net result is that memories are improperly reported there, when high-density DIMMs are used: EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 0, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 1, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 As the number of pages value is handled at the EDAC core as unsigned ints, the driver shows the 16 GB memories at sysfs interface as 16760832 MB! The fix is simple: calculate the number of pages as unsigned 64-bits integer. After the patch, the memory size (16 GB) is properly detected: EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> [bwh: Backported to 3.2: - Adjust context - Debug log function is debugf0(), not edac_dbg()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Qiang Huang <h.huangqiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-31i7300_edac: Fix device reference countJean Delvare1-18/+20
commit 75135da0d68419ef8a925f4c1d5f63d8046e314d upstream. pci_get_device() decrements the reference count of "from" (last argument) so when we break off the loop successfully we have only one device reference - and we don't know which device we have. If we want a reference to each device, we must take them explicitly and let the pci_get_device() walk complete to avoid duplicate references. This is serious, as over-putting device references will cause the device to eventually disappear. Without this fix, the kernel crashes after a few insmod/rmmod cycles. Tested on an Intel S7000FC4UR system with a 7300 chipset. Signed-off-by: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: stable@vger.kernel.org Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-12i7core_edac: Fix PCI device reference countJean Delvare1-2/+7
commit c0f5eeed0f4cef4f05b74883a7160e7edde58b6a upstream. The reference count changes done by pci_get_device can be a little misleading when the usage diverges from the most common scheme. The reference count of the device passed as the last parameter is always decreased, even if the function returns no new device. So if we are going to try alternative device IDs, we must manually increment the device reference count before each retry. If we don't, we end up decreasing the reference count, and after a few modprobe/rmmod cycles the PCI devices will vanish. In other words and as Alan put it: without this fix the EDAC code corrupts the PCI device list. This fixes kernel bug #50491: https://bugzilla.kernel.org/show_bug.cgi?id=50491 Signed-off-by: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare Reviewed-by: Alan Cox <alan@linux.intel.com> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06e752x_edac: Fix pci_dev usage countAristeu Rozanski1-1/+3
commit 90ed4988b8c030d65b41b7d13140e9376dc6ec5a upstream. In case the device 0, function 1 is not found using pci_get_device(), pci_scan_single_device() will be used but, differently than pci_get_device(), it allocates a pci_dev but doesn't does bump the usage count on the pci_dev and after few module removals and loads the pci_dev will be freed. Signed-off-by: Aristeu Rozanski <aris@redhat.com> Reviewed-by: mark gross <mark.gross@intel.com> Link: http://lkml.kernel.org/r/20131205153755.GL4545@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Jean Delvare <jdelvare@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-04EDAC: Test correct variable in ->store functionDan Carpenter1-1/+1
commit 8024c4c0b1057d1cd811fc9c3f88f81de9729fcd upstream. We're testing for ->show but calling ->store(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-10i7300_edac: Fix error flag testingJean Delvare1-4/+4
commit 7e06b7a3333f5c7a0cec12aff20d39c5c87c0795 upstream. * Right-shift the values in GET_FBD_FAT_IDX and GET_FBD_NF_IDX, so that the callers get the result they expect. * Fix definition of FERR_FAT_FBD_ERR_MASK. * Call GET_FBD_NF_IDX, not GET_FBD_FAT_IDX, when operating on register FERR_NF_FBD. We were lucky they have the same definition. This fixes kernel bug #44131: https://bugzilla.kernel.org/show_bug.cgi?id=44131 Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-28amd64_edac:__amd64_set_scrub_rate(): avoid overindexing scrubrates[]Andrew Morton1-7/+4
commit 168bfeef7bba3f9784f7540b053e4ac72b769ce9 upstream. If none of the elements in scrubrates[] matches, this loop will cause __amd64_set_scrub_rate() to incorrectly use the n+1th element. As the function is designed to use the final scrubrates[] element in the case of no match, we can fix this bug by simply terminating the array search at the n-1th element. Boris: this code is fragile anyway, see here why: http://marc.info/?l=linux-kernel&m=135102834131236&w=2 It will be rewritten more robustly soonish. Reported-by: Denis Kirjanov <kirjanov@gmail.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22edac: fix the error about memory type detection on SandyBridgeChen Gong1-1/+1
commit 2cbb587d3bc41a305168e91b4f3c5b6944a12566 upstream. On SandyBridge, DDRIOA(Dev: 17 Func: 0 Offset: 328) is used to detect whether DIMM is RDIMM/LRDIMM, not TA(Dev: 15 Func: 0). Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22edac: avoid mce decoding crash after edac driver unloadedChen Gong2-15/+8
commit e35fca4791fcdd43dc1fd769797df40c562ab491 upstream. Some edac drivers register themselves as mce decoders via notifier_chain. But in current notifier_chain implementation logic, it doesn't accept same notifier registered twice. If so, it will be wrong when adding/removing the element from the list. For example, on one SandyBridge platform, remove module sb_edac and then trigger one error, it will hit oops because it has no mce decoder registered but related notifier_chain still points to an invalid callback function. Here is an example: Call Trace: [<ffffffff8150ef6a>] atomic_notifier_call_chain+0x1a/0x20 [<ffffffff8102b936>] mce_log+0x46/0x180 [<ffffffff8102eaea>] apei_mce_report_mem_error+0x4a/0x60 [<ffffffff812e19d2>] ghes_do_proc+0x192/0x210 [<ffffffff812e2066>] ghes_proc+0x46/0x70 [<ffffffff812e20d8>] ghes_notify_sci+0x48/0x80 [<ffffffff8150ef05>] notifier_call_chain+0x55/0x80 [<ffffffff81076f1a>] __blocking_notifier_call_chain+0x5a/0x80 [<ffffffff812aea11>] ? acpi_os_wait_events_complete+0x23/0x23 [<ffffffff81076f56>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff812ddc4d>] acpi_hed_notify+0x19/0x1b [<ffffffff812b16bd>] acpi_device_notify+0x19/0x1b [<ffffffff812beb38>] acpi_ev_notify_dispatch+0x67/0x7f [<ffffffff812aea3a>] acpi_os_execute_deferred+0x29/0x36 [<ffffffff81069dc2>] process_one_work+0x132/0x450 [<ffffffff8106bbcb>] worker_thread+0x17b/0x3c0 [<ffffffff8106ba50>] ? manage_workers+0x120/0x120 [<ffffffff81070aee>] kthread+0x9e/0xb0 [<ffffffff81514724>] kernel_thread_helper+0x4/0x10 [<ffffffff81070a50>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff81514720>] ? gs_change+0x13/0x13 Code: f3 49 89 d4 45 85 ed 4d 89 c6 48 8b 0f 74 48 48 85 c9 75 17 eb 41 0f 1f 80 00 00 00 00 41 83 ed 01 4c 89 f9 74 22 4d 85 ff 74 1d <4c> 8b 79 08 4c 89 e2 48 89 de 48 89 cf ff 11 4d 85 f6 74 04 41 RIP [<ffffffff8150eef6>] notifier_call_chain+0x46/0x80 RSP <ffff88042868fb20> CR2: ffffffffa01af838 ---[ end trace 0100930068e73e6f ]--- BUG: unable to handle kernel paging request at fffffffffffffff8 IP: [<ffffffff810705b0>] kthread_data+0x10/0x20 PGD 1a0d067 PUD 1a0e067 PMD 0 Oops: 0000 [#2] SMP Only i7core_edac and sb_edac have such issues because they have more than one memory controller which means they have to register mce decoder many times. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-07Merge branch 'stable' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile Pull arch/tile bug fixes from Chris Metcalf: "This includes Paul Gortmaker's change to fix the <asm/system.h> disintegration issues on tile, a fix to unbreak the tilepro ethernet driver, and a backlog of bugfix-only changes from internal Tilera development over the last few months. They have all been to LKML and on linux-next for the last few days. The EDAC change to MAINTAINERS is an oddity but discussion on the linux-edac list suggested I ask you to pull that change through my tree since they don't have a tree to pull edac changes from at the moment." * 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (39 commits) drivers/net/ethernet/tile: fix netdev_alloc_skb() bombing MAINTAINERS: update EDAC information tilepro ethernet driver: fix a few minor issues tile-srom.c driver: minor code cleanup edac: say "TILEGx" not "TILEPro" for the tilegx edac driver arch/tile: avoid accidentally unmasking NMI-type interrupt accidentally arch/tile: remove bogus performance optimization arch/tile: return SIGBUS for addresses that are unaligned AND invalid arch/tile: fix finv_buffer_remote() for tilegx arch/tile: use atomic exchange in arch_write_unlock() arch/tile: stop mentioning the "kvm" subdirectory arch/tile: export the page_home() function. arch/tile: fix pointer cast in cacheflush.c arch/tile: fix single-stepping over swint1 instructions on tilegx arch/tile: implement panic_smp_self_stop() arch/tile: add "nop" after "nap" to help GX idle power draw arch/tile: use proper memparse() for "maxmem" options arch/tile: fix up locking in pgtable.c slightly arch/tile: don't leak kernel memory when we unload modules arch/tile: fix bug in delay_backoff() ...
2012-04-04MCE, AMD: Drop too granulary family model checksBorislav Petkov1-4/+2
MCA details seldom change inbetween the models of a family so don't be too conservative and enable decoding on everything starting from K8 onwards. Minor adjustments can come in later but most importantly, we have some decoding infrastructure in place for upcoming models by default. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-04-02edac: say "TILEGx" not "TILEPro" for the tilegx edac driverChris Metcalf1-0/+4
This is just an aesthetic change but it was silly to say TILEPro when booting up on the tilegx architecture. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-03-29Merge branch 'linux_next' of ↵Linus Torvalds6-51/+80
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC fixes from Mauro Carvalho Chehab: "A series of EDAC driver fixes. It also has one core fix at the documentation, and a rename patch, fixing the name of the struct that contains the rank information." * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: edac: rename channel_info to rank_info i5400_edac: Avoid calling pci_put_device() twice edac: i5100 ack error detection register after each read edac: i5100 fix erroneous define for M1Err edac: sb_edac: Fix a wrong value setting for the previous value edac: sb_edac: Fix a INTERLEAVE_MODE() misuse edac: sb_edac: Let the driver depend on PCI_MMCONFIG edac: Improve the comments to better describe the memory concepts edac/ppc4xx_edac: Fix compilation Fix sb_edac compilation with 32 bits kernels
2012-03-24Merge tag 'device-for-3.4' of ↵Linus Torvalds2-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull <linux/device.h> avoidance patches from Paul Gortmaker: "Nearly every subsystem has some kind of header with a proto like: void foo(struct device *dev); and yet there is no reason for most of these guys to care about the sub fields within the device struct. This allows us to significantly reduce the scope of headers including headers. For this instance, a reduction of about 40% is achieved by replacing the include with the simple fact that the device is some kind of a struct. Unlike the much larger module.h cleanup, this one is simply two commits. One to fix the implicit <linux/device.h> users, and then one to delete the device.h includes from the linux/include/ dir wherever possible." * tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: device.h: audit and cleanup users in main include dir device.h: cleanup users outside of linux/include (C files)
2012-03-24Merge tag 'amd64-edac-updates-for-3.4' of ↵Linus Torvalds21-184/+125
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull AMD64 EDAC fixes from Borislav Petkov: "A bunch of fixes/updates for the AMD side of EDAC including * MCE decoding updates * tree-wide EDAC sweep making pci_device_ids __devinitconst * Scrub rate API correction * two amd64_edac corrections for K8 boxes and sysfs csrow nodes" * tag 'amd64-edac-updates-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: MCE, AMD: Constify error tables MCE, AMD: Correct bank 5 error signatures MCE, AMD: Rework NB MCE signatures MCE, AMD: Correct VB data error description MCE, AMD: Correct ucode patch buffer description MCE, AMD: Correct some MC0 error types EDAC: Make pci_device_id tables __devinitconst. EDAC: Correct scrub rate API amd64_edac: Fix K8 revD and later chip select sizes amd64_edac: Fix missing csrows sysfs nodes
2012-03-21edac: rename channel_info to rank_infoMauro Carvalho Chehab1-3/+3
What it is pointed by a csrow/channel vector is a rank information, and not a channel information. On a traditional architecture, the memory controller directly access the memory ranks, via chip select rows. Different ranks at the same DIMM is selected via different chip select rows. So, typically, one csrow/channel pair means one different DIMM. On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced Memory Buffer (AMB) that serves as the interface between the memory controller and the memory chips. The AMB selection is via the DIMM slot, and not via a csrow. It is up to the AMB to talk with the csrows of the DRAM chips. So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM rank. RAMBUS is similar. Newer memory controllers, like the ones found on Intel Sandy Bridge and Nehalem, even working with normal DDR3 DIMM's, don't use the usual channel A/channel B interleaving schema to provide 128 bits data access. Instead, they have more channels (3 or 4 channels), and they can use several interleaving schemas. Such memory controllers see the DIMMs directly on their registers, instead of the ranks, which is better for the driver, as its main usageis to point to a broken DIMM stick (the Field Repleceable Unit), and not to point to a broken DRAM chip. The drivers that support such such newer memory architecture models currently need to fake information and to abuse on EDAC structures, as the subsystem was conceived with the idea that the csrow would always be visible by the CPU. To make things a little worse, those drivers don't currently fake csrows/channels on a consistent way, as the concepts there don't apply to the memory controllers they're talking with. So, each driver author interpreted the concepts using a different logic. In order to fix it, let's rename the data structure that points into a DIMM rank to "rank_info", in order to be clearer about what's stored there. Latter patches will provide a better way to represent the memory hierarchy for the other types of memory controller. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21i5400_edac: Avoid calling pci_put_device() twiceMauro Carvalho Chehab1-16/+38
When i5400_edac driver is removed and re-loaded a few times, it causes an OOPS, as it is currently decrementing some PCI device usage two times. When called inside a loop, pci_get_device() will call pci_put_device(). That mangles the error count. In this specific case, it seems easier to just duplicate the call. Also fixes the error logic when pci_get_device fails. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21edac: i5100 ack error detection register after each readNiklas Söderlund1-7/+4
If I only ack the detection register after a error have been detected I'm unable to reliably detect errors. I have verified this behavior using both an error injection DIMM and software to inject errors. I can't find any documentation supporting this behavior in Intel 5100 Memory Controller Hub Chipset, see 1. So this is all based on experimentation. [1] Intel® 5100 Memory Controller Hub Chipset http://www.intel.com/content/dam/doc/datasheet/5100- memory-controller-hub-chipset-datasheet.pdf Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21edac: i5100 fix erroneous define for M1ErrNiklas Söderlund1-1/+1
According to [1] the define for M1Err in the FERR_NF_MEM register is wrong. It should be at position 1 not 0. [1] Intel 5100 Memory Controller Hub Chipset Doc.Nr: 318378 http://www.intel.com/content/dam/doc/datasheet/5100- memory-controller-hub-chipset-datasheet.pdf Reported-by: Ba Thang Nguyen <thang.b.nguyen@dektech.com.au> Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21edac: sb_edac: Fix a wrong value setting for the previous valueHui Wang1-1/+1
>From the driver design, the variable limit wants to compare with its previous value, we should set the value of limit instead of the value of tmp_mb to the variable prev. Signed-off-by: Hui Wang <jason77.wang@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21edac: sb_edac: Fix a INTERLEAVE_MODE() misuseHui Wang1-1/+1
We can identify dram interleave mode from the Dram Rule register rather than Dram Interleave list register. In this context, the reg of INTERLEAVE_MODE(reg) contains the Dram Interleave list register, we can't get interleave mode from the reg, while the variable interleave_mode saves the the mode got from the Dram Rule register, so we use the variable to replace INTERLEAVE_MDDE(reg) here. Signed-off-by: Hui Wang <jason77.wang@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21edac: sb_edac: Let the driver depend on PCI_MMCONFIGHui Wang1-2/+2
This driver needs to access PCIe Extended Configuration Space Registers (0x100~0xfff), to correctly access those registers, we need to enable PCI_MMCONFIG option. Since this option is not enabled for X86_64 by default, we let the driver depend on it to prevent users forgetting to enable this option. Signed-off-by: Hui Wang <jason77.wang@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21edac/ppc4xx_edac: Fix compilationMauro Carvalho Chehab1-2/+2
It seems that nobody is cross-compiling for this arch anymore... drivers/edac/ppc4xx_edac.c: In function 'ppc4xx_edac_probe': drivers/edac/ppc4xx_edac.c:188:12: error: storage class specified for parameter 'ppc4xx_edac_remove' ... drivers/edac/ppc4xx_edac.c:1068:19: error: 'match' undeclared (first use in this function) drivers/edac/ppc4xx_edac.c:1068:19: note: each undeclared identifier is reported only once for each function it appears in drivers/edac/ppc4xx_edac.c:1068:36: warning: left-hand operand of comma expression has no effect [-Wunused-value] Acked-by: Josh Boyer <jwboyer@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21Fix sb_edac compilation with 32 bits kernelsMauro Carvalho Chehab2-20/+30
As reported by Josh Boyer <jwboyer@redhat.com>: > drivers/edac/sb_edac.c: In function 'get_memory_error_data': > drivers/edac/sb_edac.c:861:2: warning: left shift count >= width of type > [enabled by default] > <snip> > ERROR: "__udivdi3" [drivers/edac/sb_edac.ko] undefined! > make[1]: *** [__modpost] Error 1 > make: *** [modules] Error 2 PS.: compile-tested only Reported-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-20edac: remove the second argument of k[un]map_atomic()Cong Wang1-2/+2
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-19MCE, AMD: Constify error tablesBorislav Petkov2-13/+13
... so that checkpatch can chill out. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
2012-03-19MCE, AMD: Correct bank 5 error signaturesBorislav Petkov1-4/+1
... and remove superfluous ErrorCodeExt check. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
2012-03-19MCE, AMD: Rework NB MCE signaturesBorislav Petkov2-129/+48
Correct their formulation, replace per-family functions with a single, unified lookup table. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
2012-03-19MCE, AMD: Correct VB data error descriptionBorislav Petkov1-1/+1
Sync with latest BKDG error types. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
2012-03-19MCE, AMD: Correct ucode patch buffer descriptionBorislav Petkov1-2/+6
This MC1 error signature is called differently now, fix it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
2012-03-19MCE, AMD: Correct some MC0 error typesBorislav Petkov1-3/+2
Use "System Read Data Error" as a more general name for MC0 bus errors on F15h and update some error definitions. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
2012-03-19EDAC: Make pci_device_id tables __devinitconst.Lionel Debroux18-18/+18
These const tables are currently marked __devinitdata, but Documentation/PCI/pci.txt says: "o The ID table array should be marked __devinitconst; this is done automatically if the table is declared with DEFINE_PCI_DEVICE_TABLE()." So use DEFINE_PCI_DEVICE_TABLE(x). Based on PaX and earlier work by Andi Kleen. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-03-19EDAC: Correct scrub rate APIBorislav Petkov1-2/+2
The original scrub rate API definition states that if scrub rate accessors are not implemented, a negative value (-1) should be written to the sysfs file (/sys/devices/system/edac/mc/mc<N>/sdram_scrub_rate, where N is the memory controller number on the system). This is counter-intuitive and awkward at the very least because, when setting the scrub rate, userspace has to write to sysfs and then read it back to check error status of the operation. As Tony notes, best it would be to not have the sdram_scrub_rate in sysfs if scrub rate support is not implemented. It is too late about that and a bunch of drivers on a bunch of arches would need to be changed and tested which is not a trivial task ATM. Instead, settle for the next best thing of returning -ENODEV when implementation is missing and -EINVAL when there was an error encountered while setting the scrub rate. Reported-by: Han Pingtian <phan@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20110916105856.GA13253@hpt.nay.redhat.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-03-19amd64_edac: Fix K8 revD and later chip select sizesBorislav Petkov1-4/+28
Fix DRAM chip select sizes calculation for K8, revisions D and E. Reported-by: Niklas Söderlund <niklas.soderlund@ericsson.com Link: http://lkml.kernel.org/r/1320849178-23340-1-git-send-email-niklas.soderlund@ericsson.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-03-19amd64_edac: Fix missing csrows sysfs nodesAshish Shenoy1-9/+7
While initializing the array of csrow attribute instances, a few csrows were uninitialized. This happened because the module only performed a check for DRAM base ctl register0's and not DRAM base ctl register1's chip select enable bit. There could be systems with DIMMs populated on only single memory channel whereas the module also assumed that a dual channel dimm had double the memory size of a single memory channel instead of checking the memory on each channel. This patch fixes these above issues. Signed-off-by: Ashish Shenoy <ashenoy@riverbed.com> Signed-off-by: Prasanna S. Panchamukhi <ppanchamukhi@riverbed.com> Link: http://lkml.kernel.org/r/4F459CFA.5090604@riverbed.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-03-11device.h: cleanup users outside of linux/include (C files)Paul Gortmaker2-0/+2
For files that are actively using linux/device.h, make sure that they call it out. This will allow us to clean up some of the implicit uses of linux/device.h within include/* without introducing build regressions. Yes, this was created by "cheating" -- i.e. the headers were cleaned up, and then the fallout was found and fixed, and then the two commits were reordered. This ensures we don't introduce build regressions into the git history. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-22asm-generic: architecture independent readq/writeq for 32bit environmentHitoshi Mitake1-13/+2
This provides unified readq()/writeq() helper functions for 32-bit drivers. For some cases, readq/writeq without atomicity is harmful, and order of io access has to be specified explicitly. So in this patch, new two header files which contain non-atomic readq/writeq are added. - <asm-generic/io-64-nonatomic-lo-hi.h> provides non-atomic readq/ writeq with the order of lower address -> higher address - <asm-generic/io-64-nonatomic-hi-lo.h> provides non-atomic readq/ writeq with reversed order This allows us to remove some readq()s that were added drivers when the default non-atomic ones were removed in commit dbee8a0affd5 ("x86: remove 32-bit versions of readq()/writeq()") The drivers which need readq/writeq but can do with the non-atomic ones must add the line: #include <asm-generic/io-64-nonatomic-lo-hi.h> /* or hi-lo.h */ But this will be nop in 64-bit environments, and no other #ifdefs are required. So I believe that this patch can solve the problem of 1. driver-specific readq/writeq 2. atomicity and order of io access This patch is tested with building allyesconfig and allmodconfig as ARCH=x86 and ARCH=i386 on top of tip/master. Cc: Kashyap Desai <Kashyap.Desai@lsi.com> Cc: Len Brown <lenb@kernel.org> Cc: Ravi Anand <ravi.anand@qlogic.com> Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: James Bottomley <James.Bottomley@parallels.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Roland Dreier <roland@purestorage.com> Cc: James Bottomley <jbottomley@parallels.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-13module_param: make bool parameters really bool (drivers & misc)Rusty Russell1-1/+1
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-09Merge branch 'for-linus' of ↵Linus Torvalds2-12/+20
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits) Kconfig: acpi: Fix typo in comment. misc latin1 to utf8 conversions devres: Fix a typo in devm_kfree comment btrfs: free-space-cache.c: remove extra semicolon. fat: Spelling s/obsolate/obsolete/g SCSI, pmcraid: Fix spelling error in a pmcraid_err() call tools/power turbostat: update fields in manpage mac80211: drop spelling fix types.h: fix comment spelling for 'architectures' typo fixes: aera -> area, exntension -> extension devices.txt: Fix typo of 'VMware'. sis900: Fix enum typo 'sis900_rx_bufer_status' decompress_bunzip2: remove invalid vi modeline treewide: Fix comment and string typo 'bufer' hyper-v: Update MAINTAINERS treewide: Fix typos in various parts of the kernel, and fix some comments. clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR gpio: Kconfig: drop unknown symbol 'CS5535_GPIO' leds: Kconfig: Fix typo 'D2NET_V2' sound: Kconfig: drop unknown symbol ARCH_CLPS7500 ... Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)
2012-01-08Merge branch 'driver-core-next' of ↵Linus Torvalds10-55/+49
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits) arm: fix up some samsung merge sysdev conversion problems firmware: Fix an oops on reading fw_priv->fw in sysfs loading file Drivers:hv: Fix a bug in vmbus_driver_unregister() driver core: remove __must_check from device_create_file debugfs: add missing #ifdef HAS_IOMEM arm: time.h: remove device.h #include driver-core: remove sysdev.h usage. clockevents: remove sysdev.h arm: convert sysdev_class to a regular subsystem arm: leds: convert sysdev_class to a regular subsystem kobject: remove kset_find_obj_hinted() m86k: gpio - convert sysdev_class to a regular subsystem mips: txx9_sram - convert sysdev_class to a regular subsystem mips: 7segled - convert sysdev_class to a regular subsystem sh: dma - convert sysdev_class to a regular subsystem sh: intc - convert sysdev_class to a regular subsystem power: suspend - convert sysdev_class to a regular subsystem power: qe_ic - convert sysdev_class to a regular subsystem power: cmm - convert sysdev_class to a regular subsystem s390: time - convert sysdev_class to a regular subsystem ... Fix up conflicts with 'struct sysdev' removal from various platform drivers that got changed: - arch/arm/mach-exynos/cpu.c - arch/arm/mach-exynos/irq-eint.c - arch/arm/mach-s3c64xx/common.c - arch/arm/mach-s3c64xx/cpu.c - arch/arm/mach-s5p64x0/cpu.c - arch/arm/mach-s5pv210/common.c - arch/arm/plat-samsung/include/plat/cpu.h - arch/powerpc/kernel/sysfs.c and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
2012-01-07Merge branch 'x86-mce-for-linus' of ↵Linus Torvalds3-8/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: add IRQ context simulation in module mce-inject x86, mce, therm_throt: Don't report power limit and package level thermal throttle events in mcelog x86, MCE: Drain mcelog buffer x86, mce: Add wrappers for registering on the decode chain
2012-01-06Merge branch 'driver-core-next' into Linux 3.2Greg Kroah-Hartman10-55/+49
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file, and it fixes the build error in the arch/x86/kernel/microcode_core.c file, that the merge did not catch. The microcode_core.c patch was provided by Stephen Rothwell <sfr@canb.auug.org.au> who was invaluable in the merge issues involved with the large sysdev removal process in the driver-core tree. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-21x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86'Kevin Winchester1-2/+0
Several fields in struct cpuinfo_x86 were not defined for the !SMP case, likely to save space. However, those fields still have some meaning for UP, and keeping them allows some #ifdef removal from other files. The additional size of the UP kernel from this change is not significant enough to worry about keeping up the distinction: text data bss dec hex filename 4737168 506459 972040 6215667 5ed7f3 vmlinux.o.before 4737444 506459 972040 6215943 5ed907 vmlinux.o.after for a difference of 276 bytes for an example UP config. If someone wants those 276 bytes back badly then it should be implemented in a cleaner way. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Cc: Steffen Persvold <sp@numascale.com> Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-15edac: convert sysdev_class to a regular subsystemKay Sievers10-55/+49
After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Cc: Doug Thompson <dougthompson@xmission.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-14x86, mce: Add wrappers for registering on the decode chainBorislav Petkov3-8/+6
No functionality change, this is done so that in a follow-on patch all queued-up MCEs can be decoded after registering on the chain. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-12-02treewide: Fix typos in various parts of the kernel, and fix some comments.Justin P. Mattock1-1/+1
The below patch fixes some typos in various parts of the kernel, as well as fixes some comments. Please let me know if I missed anything, and I will try to get it changed and resent. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>