summaryrefslogtreecommitdiff
path: root/drivers/edac/edac_device.c
AgeCommit message (Collapse)AuthorFilesLines
2019-10-09EDAC/device: Rework error logging APIHanna Hawa1-21/+29
Make the main workhorse the "count" functions which can log a @count of errors. Have the current APIs edac_device_handle_{ce,ue}() call the _count() variants and this way keep the exported symbols number unchanged. [ bp: Rewrite. ] Signed-off-by: Hanna Hawa <hhhawa@amazon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: benh@amazon.com Cc: dwmw@amazon.co.uk Cc: hanochu@amazon.com Cc: James Morse <james.morse@arm.com> Cc: jonnyc@amazon.com Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: ronenk@amazon.com Cc: talel@amazon.com Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190923191741.29322-2-hhhawa@amazon.com
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds1-1/+1
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15edac: move documentation from edac_device to edac_core.hMauro Carvalho Chehab1-58/+0
Several functions are documented at edac_device.c. As we'll be including edac_core.h at drivers-api book, move those, in order for the kernel-doc markups be part of the API documentation book. As several of those kernel-doc macros are not in the right format, fix them. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15edac: move EDAC device definitions to drivers/edac/edac_device.hMauro Carvalho Chehab1-12/+9
The edac_core.h header contain data structures and function definitions for both EDAC MC and EDAC device. Let's move the devices ones to a separate header file, as part of a header reorganization. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2015-12-11EDAC: Rework workqueue handlingBorislav Petkov1-17/+11
Hide the EDAC workqueue pointer in a separate compilation unit and add accessors for the workqueue manipulations needed. Remove edac_pci_reset_delay_period() which wasn't used by anything. It seems it got added without a user with 91b99041c1d5 ("drivers/edac: updated PCI monitoring") Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC: Make edac_device workqueue setup/teardown functions staticBorislav Petkov1-3/+3
They're not used anywhere else. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC: Robustify workqueues destructionBorislav Petkov1-7/+4
EDAC workqueue destruction is really fragile. We cancel delayed work but if it is still running and requeues itself, we still go ahead and destroy the workqueue and the queued work explodes when workqueue core attempts to run it. Make the destruction more robust by switching op_state to offline so that requeuing stops. Cancel any pending work *synchronously* too. EDAC i7core: Driver loaded. general protection fault: 0000 [#1] SMP CPU 12 Modules linked in: Supported: Yes Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7 RIP: 0010:[<ffffffff8107dcd7>] [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0 < ... regs ...> Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600) Stack: ... Call Trace: call_timer_fn run_timer_softirq __do_softirq call_softirq do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt intel_idle cpuidle_idle_call cpu_idle Code: ... RIP __queue_work RSP <...> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org>
2014-01-10EDAC: Don't try to cancel workqueue when it's never setupStephen Boyd1-0/+3
We only setup a workqueue for edac devices that use the polling method. We still try to cancel the workqueue if an edac_device uses the irq method though. This causes a warning from debug objects when we remove an edac device: WARNING: CPU: 0 PID: 56 at lib/debugobjects.c:260 debug_print_object+0x98/0xc0() ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x28 Modules linked in: krait_edac(-) CPU: 0 PID: 56 Comm: rmmod Not tainted 3.12.0-rc2-00035-g15292a0 #69 (unwind_backtrace+0x0/0x144) (show_stack+0x20/0x24) (dump_stack+0x74/0xb4) (warn_slowpath_common+0x78/0x9c) (warn_slowpath_fmt+0x40/0x48) (debug_print_object+0x98/0xc0) (debug_object_assert_init+0xdc/0xec) (del_timer+0x24/0x7c) (try_to_grab_pending+0xc0/0x1b0) (cancel_delayed_work+0x2c/0xa0) (edac_device_workq_teardown+0x1c/0x38) (edac_device_del_device+0xb8/0xe4) (krait_edac_remove+0x50/0x70 [krait_edac]) (platform_drv_remove+0x24/0x28) (__device_release_driver+0x68/0xc0) (driver_detach+0xc4/0xc8) (bus_remove_driver+0xac/0x114) (driver_unregister+0x38/0x58) (platform_driver_unregister+0x1c/0x20) (krait_edac_driver_exit+0x14/0x1c [krait_edac]) (SyS_delete_module+0x178/0x2b4) Fix it by skipping the workqueue teardown for such devices. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/1388434457-4194-2-git-send-email-sboyd@codeaurora.org Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-05edac: Unify reporting of device info for device, mc and pciRobert Richter1-6/+3
Log messages slightly differ between edac subsystems. Unifying it. Signed-off-by: Robert Richter <robert.richter@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rric@kernel.org>
2012-07-30Merge branch 'devel'Mauro Carvalho Chehab1-25/+22
* devel: (33 commits) edac i5000, i5400: fix pointer math in i5000_get_mc_regs() edac: allow specifying the error count with fake_inject edac: add support for Calxeda highbank L2 cache ecc edac: add support for Calxeda highbank memory controller edac: create top-level debugfs directory sb_edac: properly handle error count i7core_edac: properly handle error count edac: edac_mc_handle_error(): add an error_count parameter edac: remove arch-specific parameter for the error handler amd64_edac: Don't pass driver name as an error parameter edac_mc: check for allocation failure in edac_mc_alloc() edac: Increase version to 3.0.0 edac_mc: Cleanup per-dimm_info debug messages edac: Convert debugfX to edac_dbg(X, edac: Use more normal debugging macro style edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs Edac: Add ABI Documentation for the new device nodes edac: move documentation ABI to ABI/testing/sysfs-devices-edac i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy edac: change the mem allocation scheme to make Documentation/kobject.txt happy ...
2012-06-11edac: Convert debugfX to edac_dbg(X,Joe Perches1-25/+22
Use a more common debugging style. Remove __FILE__ uses, add missing newlines, coalesce formats and align arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11edac: Don't add __func__ or __FILE__ for debugf[0-9] msgsMauro Carvalho Chehab1-14/+14
The debug macro already adds that. Most of the work here was made by this small script: $f .=$_ while (<>); $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*": /\1"/g; $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*/\1/g; $f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*"MC: /\1"/g; $f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g; $f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g; $f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g; $f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g; $f =~ s/\"MC\: \\n\"/"MC:\\n"/g; print $f; After running the script, manual cleanups were done to fix it the remaining places. While here, removed the __LINE__ on most places, as it doesn't actually give useful info on most places. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edacLinus Torvalds1-16/+11
Pull EDAC internal API changes from Mauro Carvalho Chehab: "This changeset is the first part of a series of patches that fixes the EDAC sybsystem. On this set, it changes the Kernel EDAC API in order to properly represent the Intel i3/i5/i7, Xeon 3xxx/5xxx/7xxx, and Intel E5-xxxx memory controllers. The EDAC core used to assume that: - the DRAM chip select pin is directly accessed by the memory controller - when multiple channels are used, they're all filled with the same type of memory. None of the above premises is true on Intel memory controllers since 2002, when RAMBUS and FB-DIMMs were introduced, and Advanced Memory Buffer or by some similar technologies hides the direct access to the DRAM pins. So, the existing drivers for those chipsets had to lie to the EDAC core, in general telling that just one channel is filled. That produces some hard to understand error messages like: EDAC MC0: CE row 3, channel 0, label "DIMM1": 1 Unknown error(s): memory read error on FATAL area : cpu=0 Err=0008:00c2 (ch=2), addr = 0xad1f73480 => socket=0, Channel=0(mask=2), rank=1 The location information there (row3 channel 0) is completely bogus: it has no physical meaning, and are just some random values that the driver uses to talk with the EDAC core. The error actually happened at CPU socket 0, channel 0, slot 1, but this is not reported anywhere, as the EDAC core doesn't know anything about the memory layout. So, only advanced users that know how the EDAC driver works and that tests their systems to see how DIMMs are mapped can actually benefit for such error logs. This patch series fixes the error report logic, in order to allow the EDAC to expose the memory architecture used by them to the EDAC core. So, as the EDAC core now understands how the memory is organized, it can provide an useful report: EDAC MC0: CE memory read error on DIMM1 (channel:0 slot:1 page:0x364b1b offset:0x600 grain:32 syndrome:0x0 - count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:4) The location of the DIMM where the error happened is reported by "MC0" (cpu socket #0), at "channel:0 slot:1" location, and matches the physical location of the DIMM. There are two remaining issues not covered by this patch series: - The EDAC sysfs API will still report bogus values. So, userspace tools like edac-utils will still use the bogus data; - Add a new tracepoint-based way to get the binary information about the errors. Those are on a second series of patches (also at -next), but will probably miss the train for 3.5, due to the slow review process." Fix up trivial conflict (due to spelling correction of removed code) in drivers/edac/edac_device.c * git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (42 commits) i7core: fix ranks information at the per-channel struct i5000: Fix the fatal error handling i5100_edac: Fix a warning when compiled with 32 bits i82975x_edac: Test nr_pages earlier to save a few CPU cycles e752x_edac: provide more info about how DIMMS/ranks are mapped i5000_edac: Fix the logic that retrieves memory information i5400_edac: improve debug messages to better represent the filled memory edac: Cleanup the logs for i7core and sb edac drivers edac: Initialize the dimm label with the known information edac: Remove the legacy EDAC ABI x38_edac: convert driver to use the new edac ABI tile_edac: convert driver to use the new edac ABI sb_edac: convert driver to use the new edac ABI r82600_edac: convert driver to use the new edac ABI ppc4xx_edac: convert driver to use the new edac ABI pasemi_edac: convert driver to use the new edac ABI mv64x60_edac: convert driver to use the new edac ABI mpc85xx_edac: convert driver to use the new edac ABI i82975x_edac: convert driver to use the new edac ABI i82875p_edac: convert driver to use the new edac ABI ...
2012-05-29edac: rewrite edac_align_ptr()Mauro Carvalho Chehab1-16/+11
The edac_align_ptr() function is used to prepare data for a single memory allocation kzalloc() call. It counts how many bytes are needed by some data structure. Using it as-is is not that trivial, as the quantity of memory elements reserved is not there, but, instead, it is on a next call. In order to avoid mistakes when using it, move the number of allocated elements into it, making easier to use it. Reviewed-by: Borislav Petkov <bp@amd64.org> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-22edac, mips: don't change code that has been removed in edac/mips treeJiri Kosina1-1/+1
This is a partial revert of 15ed103a9800 ("edac: Fix spelling errors") 6997991ab0db ("mips: Fix printk typos in arc/mips") which change code that doesn't exist any more in edac/mips trees. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-04-30edac: Fix spelling errors.David Mackey1-5/+5
Signed-off-by: David Mackey <tdmackey@twitter.com> Signed-off-by: Vinson Lee <vlee@twitter.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-12-15edac: convert sysdev_class to a regular subsystemKay Sievers1-1/+0
After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Cc: Doug Thompson <dougthompson@xmission.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-27edac,rcu: use synchronize_rcu() instead of call_rcu()+rcu_barrier()Lai Jiangshan1-18/+6
synchronize_rcu() does the stuff as needed. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-31Fix common misspellingsLucas De Marchi1-2/+2
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2009-09-24edac: core: remove completion-wait for complete with rcu_barrierJesper Dangaard Brouer1-4/+1
Module edac_core.ko uses call_rcu() callbacks in edac_device.c, edac_mc.c and edac_pci.c. They all use a wait_for_completion() scheme, but this scheme it not 100% safe on multiple CPUs. See the _rcu_barrier() implementation which explains why extra precausion is needed. The patch adds a comment about rcu_barrier() and as a precausion calls rcu_barrier(). A maintainer needs to look at removing the wait_for_completion code. [dougthompson@xmission.com: remove the wait_for_completion code] Signed-off-by Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-19edac: add edac_device_alloc_index()Harry Ciao1-0/+14
Add edac_device_alloc_index(), because for MAPLE platform there may exist several EDAC driver modules that could make use of edac_device_ctl_info structure at the same time. The index allocation for these structures should be taken care of by EDAC core. [akpm@linux-foundation.org: cleanups] Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-14edac: use to_delayed_work()Jean Delvare1-1/+1
The edac-core driver includes code which assumes that the work_struct which is included in every delayed_work is the first member of that structure. This is currently the case but might change in the future, so use to_delayed_work() instead, which doesn't make such an assumption. linux-2.6.30-rc1 has the to_delayed_work() function that will allow this patch to work Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-07edac: struct device: replace bus_id with dev_name(), dev_set_name()Kay Sievers1-1/+1
This patch is part of a larger patch series which will remove the "char bus_id[20]" name string from struct device. The device name is managed in the kobject anyway, and without any size limitation, and just needlessly copied into "struct device". [akpm@linux-foundation.org: coding-style fixes] Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-24edac: fix edac core deadlock when removing a deviceHarry Ciao1-3/+9
When deleting an edac device, we have to wait for its edac_dev.work to be completed before deleting the whole edac_dev structure. Since we have no idea which work in current edac_poller's workqueue is the work we are conerned about, we wait for all work in the edac_poller's workqueue to be proceseed. This is done via flush_cpu_workqueue() which inserts a wq_barrier into the tail of the workqueue and then sleeping on the completion of this wq_barrier. The edac_poller will wake up sleepers when it is found. EDAC core creates only one kernel worker thread, edac_poller, to run the works of all current edac devices. They share the same callback function of edac_device_workq_function(), which would grab the mutex of device_ctls_mutex first before it checks the device. This is exactly where edac_poller and rmmod would have a great chance to deadlock. In below call trace of rmmod > ... > edac_device_del_device > edac_device_workq_teardown > flush_workqueue > flush_cpu_workqueue, device_ctls_mutex would have already been grabbed by edac_device_del_device(). So, on one hand rmmod would sleep on the completion of a wq_barrier, holding device_ctls_mutex; on the other hand edac_poller would be blocked on the same mutex when it's running any one of works of existing edac evices(Note, this edac_dev.work is likely to be totally irrelevant to the one that is being removed right now)and never would have a chance to run the work of above wq_barrier to wake rmmod up. edac_device_workq_teardown() should not be called within the critical region of device_ctls_mutex. Just like is done in edac_pci_del_device() and edac_mc_del_mc(), where edac_pci_workq_teardown() and edac_mc_workq_teardown() are called after related mutex are released. Moreover, an edac_dev.work should check first if it is being removed. If this is the case, then it should bail out immediately. Since not all of existing edac devices are to be removed, this "shutting flag" should be contained to edac device being removed. The current edac_dev.op_state can be used to serve this purpose. The original deadlock problem and the solution have been witnessed and tested on actual hardware. Without the solution, rmmod an edac driver would result in below deadlock: root@localhost:/root> rmmod mv64x60_edac EDAC DEBUG: mv64x60_dma_err_remove() EDAC DEBUG: edac_device_del_device() EDAC DEBUG: find_edac_device_by_dev() (hang for a moment) INFO: task edac-poller:2030 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. edac-poller D 00000000 0 2030 2 Call Trace: [df159dc0] [c0071e3c] free_hot_cold_page+0x17c/0x304 (unreliable) [df159e80] [c000a024] __switch_to+0x6c/0xa0 [df159ea0] [c03587d8] schedule+0x2f4/0x4d8 [df159f00] [c03598a8] __mutex_lock_slowpath+0xa0/0x174 [df159f40] [e1030434] edac_device_workq_function+0x28/0xd8 [edac_core] [df159f60] [c003beb4] run_workqueue+0x114/0x218 [df159f90] [c003c674] worker_thread+0x5c/0xc8 [df159fd0] [c004106c] kthread+0x5c/0xa0 [df159ff0] [c0013538] original_kernel_thread+0x44/0x60 INFO: task rmmod:2062 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. rmmod D 0ff2c9fc 0 2062 1839 Call Trace: [df119c00] [c0437a74] 0xc0437a74 (unreliable) [df119cc0] [c000a024] __switch_to+0x6c/0xa0 [df119ce0] [c03587d8] schedule+0x2f4/0x4d8 [df119d40] [c03591dc] schedule_timeout+0xb0/0xf4 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-06dev_name introduction fall out fixStephen Rothwell1-3/+3
Commit 06916639e2fed9ee475efef2747a1b7429f8fe76 ("driver-core: add dev_name() to help transition away from using bus_id") added a static inline dev_name() and used it in dev_printk. Unfortunately, drivers/edac/edac_core.h defines a macro called dev_name(). Rename the latter. Diagnosis by Tony Breeds and Michael Ellerman. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29edac: remove unneeded functions and add static accessorAdrian Bunk1-31/+0
Collection of patches, merged into one, from Adrian that do the following: 1) This patch makes the following needlessly global functions static: - edac_pci_get_log_pe() - edac_pci_get_log_npe() - edac_pci_get_panic_on_pe() - edac_pci_unregister_sysfs_instance_kobj() - edac_pci_main_kobj_setup() 2) Remove unneeded function edac_device_find() 3) Added #if 0 around function edac_pci_find() 4) make the needlessly global edac_pci_generic_check() static 5) Removed function edac_check_mc_devices() Doug Thompson modified Adrian's patches, to bettern represent the direction of EDAC, and make them one patch. Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29edac: use the shorter LIST_HEAD for brevityRobert P. J. Day1-1/+1
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07drivers-edac: use round_jiffies_relativeAnton Blanchard1-2/+2
When rounding a relative timeout we need to use round_jiffies_relative(). Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07drivers-edac: turn on edac device error loggingDoug Thompson1-0/+4
ENABLE the 'logging' of CE and UE events for the EDAC_DEVICE class of error harvester in EDAC Cc: Alan Cox <alan@lxorguk.ukuu.org.uk Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: fix edac_device sysfs corner case bugDoug Thompson1-14/+30
Some simple fixes to properly reference counter values from the block attribute level of edac_device objects. Properly sequencing the array pointer was added, resulting in correct identification of block level attributes from their base class functions. Added more verbose debug statement for event tracking. Also during some corner testing, found a bug in the store/show sequence of operations for the block attribute/controls management. An old intermediate structure for 'blocks' was still in the processing pipeline. This patch removes that old structure and correctly utilizes the new struct edac_dev_sysfs_block_attribute for passing control from the sysfs to the low level store/show function of the edac driver. Now the proper kobj pointer to passed downward to the store/show functions. Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Greg KH <greg@kroah.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: fix workq reset deadlockDoug Thompson1-9/+47
Fix mutex locking deadlock on the device controller linked list. Was calling a lock then a function that could call the same lock. Moved the cancel workq function to outside the lock Added some short circuit logic in the workq code Added comments of description Code tidying Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Greg KH <greg@kroah.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: fix edac_device sysfs completion codeDouglas Thompson1-9/+28
With feedback, this patch corrects operation of the kobject release operation on kobjects, attributes and controls for the edac_device. Cc: Alan Cox alan@lxorguk.ukuu.org.uk Signed-off-by: Doug Thompson <dougthompson@xmission.com> Acked-by: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: fix edac_device init apisDoug Thompson1-4/+4
Refactoring of sysfs code necessitated the refactoring of the edac_device_alloc() and edac_device_add_device() apis, of moving the index value to the alloc() function. This patch alters the in tree drivers to utilize this new api signature. Having the index value performed later created a chicken-and-the-egg issue. Moving it to the alloc() function allows for creating the necessary sysfs entries with the proper index number Cc: Alan Cox alan@lxorguk.ukuu.org.uk Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: fix leaf sysfs attributeDouglas Thompson1-31/+63
This patch fixes and enhances the driver level set of sysfs attributes that can be added to the 'block' level of an edac_device type of driver. There is a controller information structure, which contains one or more instances of device. Each instance will have one or more blocks of device specific counters. This patch fixes the ability to have more detailed attributes/controls for each of the 'blocks', providing for the addition of controls/attributes from the low level driver to user space via sysfs. Cc: Alan Cox alan@lxorguk.ukuu.org.uk Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: fix edac_device semaphore to mutexDoug Thompson1-11/+11
A previous patch changed the edac_mc src file from semaphore usage to mutex This patch changes the edac_device src file as well, from semaphore use to mutex operation. Use a mutex primitive for mutex operations, as it does not require a semaphore Cc: Alan Cox alan@lxorguk.ukuu.org.uk Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: mod edac_opt_state_to_string functionDouglas Thompson1-1/+1
Refactored the function edac_op_state_toString() to be edac_op_state_to_string() for consistent style, and its callers Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: mod edac_align_ptr functionDouglas Thompson1-6/+3
Refactor the edac_align_ptr() function to reduce the noise of casting the aligned pointer to the various types of data objects and modified its callers to its new signature Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: edac_device code tidyingDouglas Thompson1-51/+54
For the file edac_device.c perform some coding style enhancements Add some function header comments Made for better readability commands Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: cleanup spaces-gotos after Lindent messupDouglas Thompson1-54/+56
This patch fixes some remnant spaces inserted by the use of Lindent. Seems Lindent adds some spaces when it shoulded. These have been fixed. In addition, goto targets have issues, these have been fixed in this patch. Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: device output clenaupDouglas Thompson1-4/+4
The error handling output strings needed to be refactored for better displaying of the error informaton. Also needed to added offset_value for output as well Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: cleanup workq ifdefsDouglas Thompson1-11/+0
The origin of this code comes from patches at sourceforge, that allow EDAC to be updated to various kernels. With kernel version 2.6.20 a new workq system was installed, thus the patches needed to be modified based on the kernel version. For submitting to the latest kernel.org those #ifdefs are removed Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: core Lindent cleanupDouglas Thompson1-94/+92
Run the EDAC CORE files through Lindent for cleanup Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: updated PCI monitoringDave Jiang1-22/+1
Moving PCI to a per-instance device model This should include the correct sysfs setup as well. Please review. Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: mod MC to use workq instead of kthreadDave Jiang1-16/+20
Move the memory controller object to work queue based implementation from the kernel thread based. Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: add dev_name getter functionDave Jiang1-3/+3
Move dev_name() macro to a more generic interface since it's not possible to determine whether a device is pci, platform, or of_device easily. Now each low level driver sets the name into the control structure, and the EDAC core references the control structure for the information. Better abstraction. Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19drivers/edac: add edac_device classDouglas Thompson1-0/+669
This patch adds the new 'class' of object to be managed, named: 'edac_device'. As a peer of the 'edac_mc' class of object, it provides a non-memory centric view of an ERROR DETECTING device in hardware. It provides a sysfs interface and an abstraction for varioius EDAC type devices. Multiple 'instances' within the class are possible, with each 'instance' able to have multiple 'blocks', and each 'block' having 'attributes'. At the 'block' level there are the 'ce_count' and 'ue_count' fields which the device driver can update and/or call edac_device_handle_XX() functions. At each higher level are additional 'total' count fields, which are a summation of counts below that level. This 'edac_device' has been used to capture and present ECC errors which are found in a a L1 and L2 system on a per CORE/CPU basis. Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>