kernel/linux.git/drivers/edac, branch v3.4.105

i82975x_edac: Fix dimm label initialization

2014-06-07T23:02:06+00:00

commit 479696840239e0cc43efb3c917bdcad2174d2215 upstream.

The driver has only 4 hardcoded labels, but allows much more memory. Fix it by removing the hardcoded logic, using snprintf() instead.

[ 19.833972] general protection fault: 0000 [#1] SMP
[ 19.837733] Modules linked in: i82975x_edac(+) edac_core firewire_ohci firewire_core crc_itu_t nouveau mxm_wmi wmi video i2c_algo_bit drm_kms_helper ttm drm i2c_core
[ 19.837733] CPU 0
[ 19.837733] Pid: 390, comm: udevd Not tainted 3.6.1-1.fc17.x86_64.debug #1 Dell Inc. Precision WorkStation 390 /0MY510
[ 19.837733] RIP: 0010:[] [] strncpy+0x18/0x30
[ 19.837733] RSP: 0018:ffff880078535b68 EFLAGS: 00010202
[ 19.837733] RAX: ffff880069fa9708 RBX: ffff880078588000 RCX: ffff880069fa9708
[ 19.837733] RDX: 000000000000001f RSI: 5f706f5f63616465 RDI: ffff880069fa9708
[ 19.837733] RBP: ffff880078535b68 R08: ffff880069fa9727 R09: 000000000000fffe
[ 19.837733] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
[ 19.837733] R13: 0000000000000000 R14: ffff880069fa9290 R15: ffff880079624a80
[ 19.837733] FS: 00007f3de01ee840(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[ 19.837733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 19.837733] CR2: 00007f3de00b9000 CR3: 0000000078dbc000 CR4: 00000000000007f0
[ 19.837733] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 19.837733] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 19.837733] Process udevd (pid: 390, threadinfo ffff880078534000, task ffff880079642450)
[ 19.837733] Stack:
[ 19.837733] ffff880078535c18 ffffffffa017c6b8 00040000816d627f ffff880079624a88
[ 19.837733] ffffc90004cd6000 ffff880079624520 ffff88007ac21148 0000000000000000
[ 19.837733] 0000000000000000 0004000000000000 feda000078535bc8 ffffffff810d696d
[ 19.837733] Call Trace:
[ 19.837733] [] i82975x_init_one+0x2e6/0x3e6 [i82975x_edac]
...

Fix bug reported at:
https://bugzilla.redhat.com/show_bug.cgi?id=848149

And, very likely:
https://bbs.archlinux.org/viewtopic.php?id=148033
https://bugzilla.kernel.org/show_bug.cgi?id=47171

Cc: Alan Cox
Signed-off-by: Mauro Carvalho Chehab
[bwh: Backported to 3.2:
 - Adjust context
 - Use csrow->channels[chan].label not csrow->channels[chan]->dimm->label]
Signed-off-by: Ben Hutchings
Cc: Qiang Huang
Signed-off-by: Greg Kroah-Hartman
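
The idea of generating labels with snprintf() instead of copying from a fixed table can be illustrated with a small userspace sketch. This is not the driver code: make_dimm_label(), DEMO_LABEL_LEN and the "DIMM %c%d" format are assumptions chosen for illustration only.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_LABEL_LEN 32	/* hypothetical buffer size for this sketch */

/*
 * Build a label from a channel number and a row index. Because the text
 * is generated, no lookup table can be overrun, and snprintf() never
 * writes more than 'len' bytes, terminating NUL included.
 */
static void make_dimm_label(char *buf, size_t len, int chan, int index)
{
	snprintf(buf, len, "DIMM %c%d", chan == 0 ? 'A' : 'B', index);
}
```

Any index produces a valid label with this approach, so the number of labels no longer limits the amount of memory the driver can describe.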

sb_edac: Avoid overflow errors at memory size calculation

2014-04-14T13:44:32+00:00

commit deb09ddaff1435f72dd598d38f9b58354c68a5ec upstream.

Sandy Bridge EDAC calculates the memory size with an overflow. Both the size field and the integer calculation use 32 bits. More bits are needed when the DIMMs have high density.

The net result is that memory sizes are improperly reported when high-density DIMMs are used:

EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 0, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 1, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800

Because the EDAC core handles the number of pages as an unsigned int, the driver reports the 16 GB DIMMs in the sysfs interface as 16760832 MB!

The fix is simple: calculate the number of pages as an unsigned 64-bit integer.

After the patch, the memory size (16 GB) is properly detected:

EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800

Signed-off-by: Mauro Carvalho Chehab
[bwh: Backported to 3.2:
 - Adjust context
 - Debug log function is debugf0(), not edac_dbg()]
Signed-off-by: Ben Hutchings
Cc: Qiang Huang
Signed-off-by: Greg Kroah-Hartman
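
The numbers in the debug output can be reproduced with a short userspace sketch. It assumes 8 bytes per row/column address and 4 KiB pages, which is what makes the logged values consistent; the helper names are hypothetical and this is not the driver's code.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Number of addressable cells, widened to 64 bits before multiplying. */
static uint64_t dimm_cells(uint32_t rows, uint32_t cols,
			   uint32_t banks, uint32_t ranks)
{
	return (uint64_t)rows * cols * banks * ranks;
}

/* Size in MiB, assuming 8 bytes per cell: (cells << 3) >> 20. */
static uint64_t dimm_size_mb(uint32_t rows, uint32_t cols,
			     uint32_t banks, uint32_t ranks)
{
	return dimm_cells(rows, cols, banks, ranks) >> (20 - 3);
}

/* Convert MiB to pages. */
static uint64_t mb_to_pages(uint64_t mb)
{
	return mb << (20 - DEMO_PAGE_SHIFT);
}

/* Convert a 32-bit unsigned page count back to MiB. */
static uint32_t pages_to_mb_u32(uint32_t pages)
{
	return pages >> (20 - DEMO_PAGE_SHIFT);
}
```

For row 0x10000, col 0x800, bank 8 and rank 2 the cell count is 2^31, one more than a signed 32-bit integer can hold, which is why the 32-bit calculation goes negative. Reading the negative page count -4194304 as a 32-bit unsigned value and converting it to MiB gives the 16760832 figure quoted above.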

i7300_edac: Fix device reference count

2014-03-31T04:40:30+00:00

commit 75135da0d68419ef8a925f4c1d5f63d8046e314d upstream.

pci_get_device() decrements the reference count of "from" (last argument) so when we break off the loop successfully we have only one device reference - and we don't know which device we have. If we want a reference to each device, we must take them explicitly and let the pci_get_device() walk complete to avoid duplicate references.

This is serious, as over-putting device references will cause the device to eventually disappear. Without this fix, the kernel crashes after a few insmod/rmmod cycles.

Tested on an Intel S7000FC4UR system with a 7300 chipset.

Signed-off-by: Jean Delvare
Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare
Cc: Mauro Carvalho Chehab
Cc: Doug Thompson
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov
Signed-off-by: Greg Kroah-Hartman

i7core_edac: Fix PCI device reference count

2014-03-11T23:10:01+00:00

commit c0f5eeed0f4cef4f05b74883a7160e7edde58b6a upstream.

The reference count changes done by pci_get_device can be a little misleading when the usage diverges from the most common scheme. The reference count of the device passed as the last parameter is always decreased, even if the function returns no new device. So if we are going to try alternative device IDs, we must manually increment the device reference count before each retry. If we don't, we end up decreasing the reference count, and after a few modprobe/rmmod cycles the PCI devices will vanish.

In other words and as Alan put it: without this fix the EDAC code corrupts the PCI device list.

This fixes kernel bug #50491:
https://bugzilla.kernel.org/show_bug.cgi?id=50491

Signed-off-by: Jean Delvare
Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare
Reviewed-by: Alan Cox
Cc: Mauro Carvalho Chehab
Cc: Doug Thompson
Signed-off-by: Borislav Petkov
Signed-off-by: Greg Kroah-Hartman
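
The reference counting pitfall described above can be modelled in userspace. The sketch below is a toy, not kernel code: struct mock_dev, mock_get_device() and find_with_fallback() are hypothetical names, and the only behaviour copied from the description is that the lookup always drops the reference on its last argument, even when it finds nothing.

```c
#include <stddef.h>

/* Toy stand-in for a PCI device: an ID and a reference count. */
struct mock_dev {
	unsigned int id;
	int refcount;
};

static void mock_dev_get(struct mock_dev *d) { if (d) d->refcount++; }
static void mock_dev_put(struct mock_dev *d) { if (d) d->refcount--; }

/*
 * Search the entries after 'from' for 'id' and take a reference on a
 * match. The reference on 'from' is dropped whether or not one is found.
 */
static struct mock_dev *mock_get_device(struct mock_dev *bus, size_t n,
					unsigned int id, struct mock_dev *from)
{
	size_t start = from ? (size_t)(from - bus) + 1 : 0;
	struct mock_dev *found = NULL;

	for (size_t i = start; i < n; i++) {
		if (bus[i].id == id) {
			found = &bus[i];
			mock_dev_get(found);
			break;
		}
	}
	mock_dev_put(from);
	return found;
}

/* Look up 'id' after 'prev', retrying with 'alt_id' on failure. */
static struct mock_dev *find_with_fallback(struct mock_dev *bus, size_t n,
					   unsigned int id, unsigned int alt_id,
					   struct mock_dev *prev, int fixed)
{
	struct mock_dev *d = mock_get_device(bus, n, id, prev);

	if (!d) {
		if (fixed)
			mock_dev_get(prev);	/* the retry will put it again */
		d = mock_get_device(bus, n, alt_id, prev);
	}
	return d;
}
```

With fixed set to 0, a failed first lookup followed by a retry drops two references on prev although the caller held only one; with fixed set to 1 the counts balance.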

e752x_edac: Fix pci_dev usage count

2014-02-06T19:05:46+00:00

commit 90ed4988b8c030d65b41b7d13140e9376dc6ec5a upstream.

If device 0, function 1 is not found using pci_get_device(), pci_scan_single_device() is used instead. Unlike pci_get_device(), it allocates a pci_dev but doesn't bump the usage count on it, so after a few module removals and loads the pci_dev will be freed.

Signed-off-by: Aristeu Rozanski
Reviewed-by: mark gross
Link: http://lkml.kernel.org/r/20131205153755.GL4545@redhat.com
Signed-off-by: Borislav Petkov
Cc: Jean Delvare
Signed-off-by: Greg Kroah-Hartman

EDAC: Test correct variable in ->store function

2013-02-04T00:24:41+00:00

commit 8024c4c0b1057d1cd811fc9c3f88f81de9729fcd upstream.

We're testing for ->show but calling ->store().

Signed-off-by: Dan Carpenter
Signed-off-by: Borislav Petkov
Signed-off-by: Greg Kroah-Hartman
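
A minimal userspace sketch of the corrected pattern: test the same callback pointer that is about to be called. struct demo_attr and the stub functions are hypothetical and only mimic the shape of a sysfs-style attribute.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute with optional read and write callbacks. */
struct demo_attr {
	long (*show)(char *buf);
	long (*store)(const char *buf, size_t count);
};

/*
 * Dispatch a write. Checking ->show here instead of ->store would call
 * a NULL pointer for a read-only attribute.
 */
static long demo_store(const struct demo_attr *attr,
		       const char *buf, size_t count)
{
	if (attr->store)
		return attr->store(buf, count);
	return -EIO;
}

/* Stub callbacks so the dispatcher can be exercised. */
static long demo_show_stub(char *buf)
{
	buf[0] = '\0';
	return 0;
}

static long demo_store_stub(const char *buf, size_t count)
{
	(void)buf;
	return (long)count;
}
```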

i7300_edac: Fix error flag testing

2012-12-10T18:59:40+00:00

commit 7e06b7a3333f5c7a0cec12aff20d39c5c87c0795 upstream.

* Right-shift the values in GET_FBD_FAT_IDX and GET_FBD_NF_IDX, so that the callers get the result they expect.
* Fix definition of FERR_FAT_FBD_ERR_MASK.
* Call GET_FBD_NF_IDX, not GET_FBD_FAT_IDX, when operating on register FERR_NF_FBD. We were lucky they have the same definition.

This fixes kernel bug #44131:
https://bugzilla.kernel.org/show_bug.cgi?id=44131

Signed-off-by: Jean Delvare
Cc: Mauro Carvalho Chehab
Cc: Doug Thompson
Signed-off-by: Mauro Carvalho Chehab
Signed-off-by: Greg Kroah-Hartman
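
Why a missing right shift matters can be seen in a generic sketch. The field position (bits 28..29) and all DEMO_ names are hypothetical and are not taken from the i7300 register layout.

```c
#include <stdint.h>

/* Hypothetical layout: a 2-bit index stored in bits 28..29. */
#define DEMO_IDX_SHIFT	28
#define DEMO_IDX_MASK	0x3u

/* Mask only: the field stays in place, e.g. 0x20000000 for index 2. */
#define DEMO_IDX_UNSHIFTED(reg)	((reg) & (DEMO_IDX_MASK << DEMO_IDX_SHIFT))

/* Shift, then mask: yields the small integer callers can index with. */
#define DEMO_GET_IDX(reg)	(((reg) >> DEMO_IDX_SHIFT) & DEMO_IDX_MASK)

static const char *const demo_names[] = { "zero", "one", "two", "three" };

/* Look up a name for the index field of 'reg'. */
static const char *demo_name(uint32_t reg)
{
	return demo_names[DEMO_GET_IDX(reg)];
}
```

Using the unshifted value as an array index or comparing it against a small integer would give the wrong result, which is the class of problem the first bullet describes.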

amd64_edac:__amd64_set_scrub_rate(): avoid overindexing scrubrates[]

2012-10-28T17:14:15+00:00

commit 168bfeef7bba3f9784f7540b053e4ac72b769ce9 upstream.

If none of the elements in scrubrates[] matches, this loop will cause __amd64_set_scrub_rate() to incorrectly use the n+1th element.

As the function is designed to use the final scrubrates[] element in the case of no match, we can fix this bug by simply terminating the array search at the n-1th element.

Boris: this code is fragile anyway, see here why:
http://marc.info/?l=linux-kernel&m=135102834131236&w=2

It will be rewritten more robustly soonish.

Reported-by: Denis Kirjanov
Cc: Doug Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Borislav Petkov
Signed-off-by: Greg Kroah-Hartman
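
The loop bound fix can be sketched in userspace as follows. The table contents, the matching conditions and the names demo_rates and pick_rate_index() are assumptions for illustration; only the idea of stopping the search one element early, so that the final element serves as the fallback, comes from the description above.

```c
#include <stddef.h>

struct demo_rate {
	unsigned int scrubval;
	unsigned int bandwidth;
};

/* Hypothetical table, highest bandwidth first; the last entry is the fallback. */
static const struct demo_rate demo_rates[] = {
	{ 0x01, 1600 },
	{ 0x02,  800 },
	{ 0x03,  400 },
	{ 0x00,    0 },
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Return the index of the first acceptable entry. The search stops
 * before the final element, so on no match 'i' is the last valid index
 * rather than one past the end of the array.
 */
static size_t pick_rate_index(unsigned int new_bw, unsigned int min_scrubval)
{
	size_t i;

	for (i = 0; i < DEMO_ARRAY_SIZE(demo_rates) - 1; i++) {
		if (demo_rates[i].scrubval < min_scrubval)
			continue;
		if (demo_rates[i].bandwidth <= new_bw)
			break;
	}
	return i;
}
```

With a bound of DEMO_ARRAY_SIZE(demo_rates) instead, a search that matches nothing would leave i equal to the array size and the caller would read past the table.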

edac: fix the error about memory type detection on SandyBridge

2012-06-22T18:37:15+00:00

commit 2cbb587d3bc41a305168e91b4f3c5b6944a12566 upstream.

On SandyBridge, DDRIOA (Dev: 17 Func: 0 Offset: 328) is used to detect whether DIMM is RDIMM/LRDIMM, not TA (Dev: 15 Func: 0).

Signed-off-by: Chen Gong
Signed-off-by: Mauro Carvalho Chehab
Signed-off-by: Greg Kroah-Hartman

edac: avoid mce decoding crash after edac driver unloaded

2012-06-22T18:37:15+00:00

commit e35fca4791fcdd43dc1fd769797df40c562ab491 upstream.

Some edac drivers register themselves as mce decoders via notifier_chain. But the current notifier_chain implementation does not support registering the same notifier twice. If that happens, the list is handled incorrectly when the element is added or removed. For example, on one SandyBridge platform, removing module sb_edac and then triggering an error hits an oops, because no mce decoder is registered any more but the notifier_chain still points to an invalid callback function. Here is an example:

Call Trace:
 [] atomic_notifier_call_chain+0x1a/0x20
 [] mce_log+0x46/0x180
 [] apei_mce_report_mem_error+0x4a/0x60
 [] ghes_do_proc+0x192/0x210
 [] ghes_proc+0x46/0x70
 [] ghes_notify_sci+0x48/0x80
 [] notifier_call_chain+0x55/0x80
 [] __blocking_notifier_call_chain+0x5a/0x80
 [] ? acpi_os_wait_events_complete+0x23/0x23
 [] blocking_notifier_call_chain+0x16/0x20
 [] acpi_hed_notify+0x19/0x1b
 [] acpi_device_notify+0x19/0x1b
 [] acpi_ev_notify_dispatch+0x67/0x7f
 [] acpi_os_execute_deferred+0x29/0x36
 [] process_one_work+0x132/0x450
 [] worker_thread+0x17b/0x3c0
 [] ? manage_workers+0x120/0x120
 [] kthread+0x9e/0xb0
 [] kernel_thread_helper+0x4/0x10
 [] ? kthread_freezable_should_stop+0x70/0x70
 [] ? gs_change+0x13/0x13
Code: f3 49 89 d4 45 85 ed 4d 89 c6 48 8b 0f 74 48 48 85 c9 75 17 eb 41 0f 1f 80 00 00 00 00 41 83 ed 01 4c 89 f9 74 22 4d 85 ff 74 1d <4c> 8b 79 08 4c 89 e2 48 89 de 48 89 cf ff 11 4d 85 f6 74 04 41
RIP [] notifier_call_chain+0x46/0x80
 RSP
CR2: ffffffffa01af838
---[ end trace 0100930068e73e6f ]---
BUG: unable to handle kernel paging request at fffffffffffffff8
IP: [] kthread_data+0x10/0x20
PGD 1a0d067 PUD 1a0e067 PMD 0
Oops: 0000 [#2] SMP

Only i7core_edac and sb_edac have such issues, because they have more than one memory controller, which means they have to register the mce decoder many times.

Signed-off-by: Chen Gong
Signed-off-by: Mauro Carvalho Chehab
Signed-off-by: Greg Kroah-Hartman
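
How a double registration can leave a stale entry behind can be shown with a simplified userspace model of a priority-ordered, singly linked chain. This is an assumption about the general shape of such a list, not the kernel's notifier code; all demo_ names are hypothetical.

```c
#include <stddef.h>

/* Minimal stand-in for a notifier block: a priority and a next pointer. */
struct demo_notifier {
	int priority;
	struct demo_notifier *next;
};

/* Insert 'n' before the first entry with a lower priority. */
static void demo_register(struct demo_notifier **head,
			  struct demo_notifier *n)
{
	while (*head && n->priority <= (*head)->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Unlink the first occurrence of 'n'; return 0 on success, -1 if absent. */
static int demo_unregister(struct demo_notifier **head,
			   struct demo_notifier *n)
{
	while (*head) {
		if (*head == n) {
			*head = n->next;
			return 0;
		}
		head = &(*head)->next;
	}
	return -1;
}
```

In this model, registering the same block twice makes it point to itself, and no number of unregister calls empties the chain afterwards: the head keeps pointing at the block. If the block lives in a module that has been unloaded, the next walk of the chain calls into freed memory, which matches the kind of failure in the trace above. Registering once per driver rather than once per memory controller avoids the problem.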