summaryrefslogtreecommitdiff
path: root/drivers/edac/amd64_edac.c
AgeCommit message (Collapse)AuthorFilesLines
2010-01-15amd64_edac: Ensure index stays within bounds in amd64_get_scrub_rateRoel Kluin1-1/+1
Add a missing iterator variable thus fixing the conditional of the for-loop in amd64_get_scrub_rate(). Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24amd64_edac: restrict PCI config space accessBorislav Petkov1-1/+1
Do not access F2x19[0,4] on K8 since they're undefined there. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24amd64_edac: fix forcing module load/unloadBorislav Petkov1-2/+1
Clear the override flag after force-loading the module. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24amd64_edac: make driver loading more robustBorislav Petkov1-7/+16
Currently, the module does not initialize fully when the DIMMs aren't ECC but remains still loaded. Propagate the error when no instance of the driver is properly initialized and prevent further loading. Reorganize and polish error handling in amd64_edac_init() while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24amd64_edac: fix driver instance freeingBorislav Petkov1-5/+4
Fix use-after-free errors by pushing all memory-freeing calls to the end of amd64_remove_one_instance(). Reported-by: Darren Jenkins <darrenrjenkins@gmail.com> LKML-Reference: <1261370306.11354.52.camel@ICE-BOX> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24amd64_edac: fix K8 chip select reportingBorislav Petkov1-2/+6
Fix the case when amd64_debug_display_dimm_sizes() reports only half the amount of DRAM on it because it doesn't account for when the single DCT operates in 128-bit mode and merges chip selects from different DIMMs. Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> LKML-Reference: <200912112202.48173.johannes.hirte@fem.tu-ilmenau.de> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-11x86, msr: Add support for non-contiguous cpumasksBorislav Petkov1-29/+17
The current rd/wrmsr_on_cpus helpers assume that the supplied cpumasks are contiguous. However, there are machines out there like some K8 multinode Opterons which have a non-contiguous core enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see http://www.gossamer-threads.com/lists/linux/kernel/1160268. This patch fixes out-of-bounds writes (see URL above) by adding per-CPU msr structs which are used on the respective cores. Additionally, two helpers, msrs_{alloc,free}, are provided for use by the callers of the MSR accessors. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091211171440.GD31998@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-08amd64_edac: fix use-uninitialised bugAndrew Morton1-1/+1
drivers/edac/amd64_edac.c: In function 'amd64_edac_init': drivers/edac/amd64_edac.c:2840: warning: 'ret' may be used uninitialized in this function Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-08amd64_edac: correct sys address to chip select mappingBorislav Petkov1-31/+27
The routine does the reverse mapping of the error address of a CECC back to the node id, DRAM controller and chip select of the DIMM which caused the error. We should lookup the channel using the syndromes _only_ when the DCTs are ganged so fix that. Also, add an early exit when there's an error while scanning for the csrow thus decreasing indentation levels for better readability. Finally, fixup comments. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-08amd64_edac: add a leaner syndrome decoding algorithmBorislav Petkov1-126/+154
Instead of using the whole syndrome tables for channel decoding, use a set of eigenvectors with which the tables can be generated to search for the syndrome in error. The algorithm operates independently of symbol size and can be used for both x4 and x8 syndromes. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: remove early hw support checkBorislav Petkov1-46/+1
The .probe_valid_hardware low_ops member checked whether the DCTs are in DDR3 mode and bailed out if so. Now that all the needed changes for DDR3 support is in place, remove it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: detect DDR3 memory typeBorislav Petkov1-3/+4
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07edac: add memory types strings for debuggingBorislav Petkov1-4/+1
Instead of using deeply-nested conditionals for dumping the DIMM type in debug mode, add a strings array of the supported DIMM types. This is useful in cases where an edac driver supports multiple DRAM types and is only defined in debug builds. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: remove unneeded extract_error_address wrapperBorislav Petkov1-18/+4
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: rename StinkyIdentifierBorislav Petkov1-18/+18
SystemAddress -> sys_addr Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: remove superfluous dbg printkBorislav Petkov1-3/+0
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: enhance address to DRAM bank mappingBorislav Petkov1-100/+94
Add cs mode to cs size mapping tables for DDR2 and DDR3 and F10 and all K8 flavors and remove klugdy table of pseudo values. Add a low_ops->dbam_to_cs member which is family-specific and replaces low_ops->dbam_map_to_pages since the pages calculation is a one liner now. Further cleanups, while at it: - shorten family name defines - align amd64_family_types struct members Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: cleanup f10_early_channel_countBorislav Petkov1-11/+7
Do not read DCLR[01] again since this is done in amd64_read_mc_registers() earlier. There can be more than two physical DIMMs present so clamp the channels value to max 2. Also, do not report DCT data width - it is also done earlier. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: dump DIMM sizes on K8 tooBorislav Petkov1-20/+22
Extend f10_debug_display_dimm_sizes to dump the logical DIMMs configuration on K8 revF too. Remove the ganged arg since we print the DCT operating mode (ganged vs unganged) earlier. Also, DCT csrow configuration is relevant therefore dump it as KERN_DEBUG instead of only on debug builds. Remove misleading DIMM output since there's no reliable way of mapping of chip selects to actual physical DIMMs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: cleanup rest of amd64_dump_misc_regsBorislav Petkov1-21/+12
Clarify bitfields description, add PCI config function/offset names to registers for easy reference, simplify code layout, remove unneeded info. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: cleanup DRAM cfg low debug outputBorislav Petkov1-33/+32
Carve out the register-specific debug statements into a separate function, clarify meanings of the single bitfields in the register, remove irrelevant output and macros. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: wrap-up pci config read error handlingBorislav Petkov1-159/+54
Add a pci config read wrapper for signaling pci config space access errors instead of them being visible only on a debug build. This is important on amd64_edac since it uses all those pci config register values to access the DRAM/DIMM configuration of the nodes. In addition, the wrapper makes a _lot_ (look at the diffstat!) of error handling code superfluous and improves much of the overall code readability by removing error handling details out of the way. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: unify MCGCTL ECC switchingBorislav Petkov1-91/+113
Unify almost identical code into one function and remove NUMA-specific usage (specifically cpumask_of_node()) in favor of generic topology methods. Remove unused defines, while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07cpumask: use modern cpumask style in drivers/edac/amd64_edac.cRusty Russell1-9/+15
cpumask_t -> struct cpumask, and don't put one on the stack. (Note: this is actually on the stack unless CONFIG_CPUMASK_OFFSTACK=y). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: make DRAM regions output more human-readableBorislav Petkov1-12/+9
Do not shift the TOP_MEM and TOP_MEM2 values by 23 but rather save the whole 64-bit value read from the MSR. Although the TOP_MEM/TOP_MEM2 bits are only a subset of the 64bit register, the values are correct since the remaining bits are Read-As-Zero and no shifting is needed. Also, cleanup DRAM base/limit debug output. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07amd64_edac: clarify DRAM CTL debug reportingBorislav Petkov1-14/+23
Make debug info formulations about the DRAM and DCT configuration of the machine more human readable. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-11-04amd64_edac: fix CECCs reportingBorislav Petkov1-1/+1
Shift error type bits properly. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-11-04amd64_edac: fix a wrong goto clause in amd64_edac.cLi Hong1-3/+1
In amd64_edac_init(void) in amd64_edac.c, cache_k8_northbridges() is called before pci_register_driver. If it fails, should exit with err directly. Signed-off-by: Li Hong <lihong.hi@gmail.com> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-16amd64_edac: fix DRAM base and limit extraction masks, v2Borislav Petkov1-4/+4
This is a proper fix as a follow-up to 66216a7 and 916d11b. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07amd64_edac: fix DRAM base and limit extractionBorislav Petkov1-5/+5
On Fam10h and above, F1x[1, 0][7C:40] are DRAM Base/Limit registers which specify the destination node of a DRAM address. Those address boundaries are being extracted into ->dram_base[] and ->dram_limit[]. Correct the extraction masks to match the respective address bits. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07amd64_edac: fix chip select handlingBorislav Petkov1-31/+26
Different processor families support a different number of chip selects. Handle this in a family-dependent way with the proper values assigned at init time (see amd64_set_dct_base_and_mask). Remove _DCSM_COUNT defines since they're used at one place and originate from public documentation. CC: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07amd64_edac: simple fix to allow reporting of CECC errorsKeith Mannthey1-1/+1
This allows the errors to be further decoded and mapped to csrows. Tested with ECC debug dimms and an Rev F cpu based system. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07amd64_edac: fix K8 intlv_sel checkBorislav Petkov1-11/+8
The check when DRAM interleaving is enabled should be done against the pvt->dram_IntlvSel field and not against the ->dram_limit. Simplify first loop and fixup printk formatting while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07amd64_edac: fix interleave enable testsBorislav Petkov1-4/+4
The pvt->dram_IntlvEn saves the 3 "Interleave Enable" bits already right-shifted by 8 so the check in find_mc_by_sys_addr() by shifting the values to the left 8 bits is wrong. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07amd64_edac: fix DRAM base and limit address extractionBorislav Petkov1-2/+2
K8 DRAM base and limit addresses from F1x40 +8*i and F1x44 + 8*i, where i in (0..7) are both bits 39-24 and therefore the shifting should be done by 24 and not by 8. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07amd64_edac: fix driver instance lookup table allocationBorislav Petkov1-2/+2
Allocate memory statically for 8-node machines max for simplicity instead of relying on MAX_NUMNODES which is 0 on !CONFIG_NUMA builds. Spotted by Jan Beulich. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-16amd64_edac: check NB MCE bank enable on the current node properlyBorislav Petkov1-21/+45
The old code was using smp_call_function_many which skips the current cpu if it is in the supplied cpumask. Switch to the rdmsr_on_cpus() interface which takes care of that. In addition, add get_cpus_on_this_dct_cpumask helper which computes a cpumask of all the cores on a node and thus on a DCT. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-16amd64_edac: Rewrite unganged mode code of f10_early_channel_countWan Wei1-35/+10
Simplify the procedure by checking if there is any DIMM in each channel. This patch will fix the bugs such as when there is no DIMMs under certain node, two DIMMs in the same channel, and only one DIMM in each channel of the node. Borislav: minor fixups Signed-off-by: Wan Wei <wanwei@mail.dawning.com.cn> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-16amd64_edac: cleanup amd64_check_ecc_enabledBorislav Petkov1-51/+25
Simplify code flow and make sure return value is always valid since further driver init depends on it. Carve out long warning string and make code more readable. Shorten some names, while at it. There should be no functional change resulting from this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14EDAC, AMD: carve out decoding of MCi_STATUS ErrorCodeBorislav Petkov1-4/+0
This is the MCE error code from the MCi_STATUS banks, bits [15:0] which describe what type of error was encountered: GART TLB, Memory or Bus error. The semantics of those bits are identical across all MCE banks so decode those separately, irrespectively of MCE type. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14EDAC, AMD: carve out MCi_STATUS decodingBorislav Petkov1-4/+4
The MCi_STATUS registers have most field definitions in common so decode them in the general path. Do not pass ecc_type along and compute it in __amd64_decode_bus_error instead. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14x86, mce: pass mce info to EDAC for decodingBorislav Petkov1-74/+24
Move NB decoder along with required defines to EDAC MCE core. Add registration routines for further decoding of the MCE info in the AMD64 EDAC module. CC: Andi Kleen <andi@firstfloor.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14amd64_edac: cleanup amd64_decode_bus_errorBorislav Petkov1-26/+9
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14amd64_edac: remove memory and GART TLB error decodersBorislav Petkov1-29/+7
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14amd64_edac: cleanup/complete NB MCE decodingBorislav Petkov1-79/+46
* don't dump info which mcheck already does * update to newest BKDG * mv amd64_process_error_info -> amd64_decode_nb_mce * shorten error struct names * remove redundant info ptr in amd64_process_error_info * remove unused ErrorCodeExt[19:16] (MCx_STATUS) defines Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14amd64_edac: cleanup amd64_process_error_infoBorislav Petkov1-24/+20
* mv amd64_error_info_regs -> err_regs * remove redundant info ptr Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14EDAC: move MCE error descriptions to EDAC coreBorislav Petkov1-62/+78
This is in preparation of adding AMD-specific MCE decoding functionality to the EDAC core. The error decoding macros originate from the AMD64 EDAC driver albeit in a simplified and cleaned up version here. While at it, add macros to generate the error description strings and use them in the error type decoders directly which removes a bunch of code and makes the decoding functions much more readable. Also, fix strings and shorten macro names. Remove superfluous htlink_msgs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-08-04amd64_edac: print debug statements only on errorDoug Thompson1-0/+4
Add forgotten return calls for the successful cases. Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-08-03amd64_edac: fix ECC checkingDoug Thompson1-0/+3
On the good path of BIOS enabled ECC and no override, the value returned is 1 by omission and thus is deemed failing by the probe-function. Allow proper module initialization by clearing the retval explicitly. Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-07-27amd64_edac: read the right F2 maskoffset regWan Wei1-1/+1
Signed-off-by: Wan Wei <onewayforever@gmail.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>