<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/edac, branch v4.19.112</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.19.112</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.19.112'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2020-03-11T13:14:45+00:00</updated>
<entry>
<title>EDAC/amd64: Set grain per DIMM</title>
<updated>2020-03-11T13:14:45+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2020-03-05T16:30:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=17f86cfbe738dee566167642c988ba8dc2c63c03'/>
<id>urn:sha1:17f86cfbe738dee566167642c988ba8dc2c63c03</id>
<content type='text'>
[ Upstream commit 466503d6b1b33be46ab87c6090f0ade6c6011cbc ]

The following commit introduced a warning on error reports without a
non-zero grain value.

  3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")

The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.

Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.

Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: "linux-edac@vger.kernel.org" &lt;linux-edac@vger.kernel.org&gt;
Cc: James Morse &lt;james.morse@arm.com&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
Cc: Robert Richter &lt;rrichter@marvell.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC/mc: Fix edac_mc_find() in case no device is found</title>
<updated>2020-01-27T13:50:48+00:00</updated>
<author>
<name>Robert Richter</name>
<email>rrichter@marvell.com</email>
</author>
<published>2019-05-14T10:49:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a0a4643f1899b6f9339957d3c7c8e749735be494'/>
<id>urn:sha1:a0a4643f1899b6f9339957d3c7c8e749735be494</id>
<content type='text'>
[ Upstream commit 29a0c843973bc385918158c6976e4dbe891df969 ]

The function should return NULL in case no device is found, but it
always returns the last checked mc device from the list even if the
index did not match. Fix that.

I did some analysis why this did not raise any issues for about 3 years
and the reason is that edac_mc_find() is mostly used to search for
existing devices. Thus, the bug is not triggered.

 [ bp: Drop the if (mci-&gt;mc_idx &gt; idx) test in favor of readability. ]

Fixes: c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list")
Signed-off-by: Robert Richter &lt;rrichter@marvell.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: "linux-edac@vger.kernel.org" &lt;linux-edac@vger.kernel.org&gt;
Cc: James Morse &lt;james.morse@arm.com&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC/ghes: Fix grain calculation</title>
<updated>2019-12-31T15:35:58+00:00</updated>
<author>
<name>Robert Richter</name>
<email>rrichter@marvell.com</email>
</author>
<published>2019-11-06T09:33:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=93056ca4e713eec06b614ba666386005c6629245'/>
<id>urn:sha1:93056ca4e713eec06b614ba666386005c6629245</id>
<content type='text'>
[ Upstream commit 7088e29e0423d3195e09079b4f849ec4837e5a75 ]

The current code to convert a physical address mask to a grain
(defined as granularity in bytes) is:

	e-&gt;grain = ~(mem_err-&gt;physical_addr_mask &amp; ~PAGE_MASK);

This is broken in several ways:

1) It calculates to wrong grain values. E.g., a physical address mask
of ~0xfff should give a grain of 0x1000. Without considering
PAGE_MASK, there is an off-by-one. Things are worse when also
filtering it with ~PAGE_MASK. This will calculate to a grain with the
upper bits set. In the example it even calculates to ~0.

2) The grain does not depend on and is unrelated to the kernel's
page-size. The page-size only matters when unmapping memory in
memory_failure(). Smaller grains are wrongly rounded up to the
page-size, on architectures with a configurable page-size (e.g. arm64)
this could round up to the even bigger page-size of the hypervisor.

Fix this with:

	e-&gt;grain = ~mem_err-&gt;physical_addr_mask + 1;

The grain_bits are defined as:

	grain = 1 &lt;&lt; grain_bits;

Change also the grain_bits calculation accordingly, it is the same
formula as in edac_mc.c now and the code can be unified.

The value in -&gt;physical_addr_mask coming from firmware is assumed to
be contiguous, but this is not sanity-checked. However, in case the
mask is non-contiguous, a conversion to grain_bits effectively
converts the grain bit mask to a power of 2 by rounding it up.

Suggested-by: James Morse &lt;james.morse@arm.com&gt;
Signed-off-by: Robert Richter &lt;rrichter@marvell.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Reviewed-by: Mauro Carvalho Chehab &lt;mchehab+samsung@kernel.org&gt;
Cc: "linux-edac@vger.kernel.org" &lt;linux-edac@vger.kernel.org&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC, thunderx: Fix memory leak in thunderx_l2c_threaded_isr()</title>
<updated>2019-12-01T08:16:18+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2018-10-13T10:28:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b95998fb6c500790fc4f13ae5feadac72cfa4e1d'/>
<id>urn:sha1:b95998fb6c500790fc4f13ae5feadac72cfa4e1d</id>
<content type='text'>
[ Upstream commit d8c27ba86a2fd806d3957e5a9b30e66dfca2a61d ]

Fix memory leak in L2c threaded interrupt handler.

 [ bp: Rewrite commit message. ]

Fixes: 41003396f932 ("EDAC, thunderx: Add Cavium ThunderX EDAC driver")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
CC: David Daney &lt;david.daney@cavium.com&gt;
CC: Jan Glauber &lt;jglauber@cavium.com&gt;
CC: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
CC: Sergey Temerkhanov &lt;s.temerkhanov@gmail.com&gt;
CC: linux-edac &lt;linux-edac@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/20181013102843.GG16086@mwanda
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC: Correct DIMM capacity unit symbol</title>
<updated>2019-11-20T17:47:15+00:00</updated>
<author>
<name>Qiuxu Zhuo</name>
<email>qiuxu.zhuo@intel.com</email>
</author>
<published>2018-09-19T00:34:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3b9528b302a60023347480c3b880e681704113b6'/>
<id>urn:sha1:3b9528b302a60023347480c3b880e681704113b6</id>
<content type='text'>
[ Upstream commit 6f6da136046294a1e8d2944336eb97412751f653 ]

The {i3200|i7core|sb|skx}_edac drivers show DIMM capacity using the
wrong unit symbol: 'Mb' - megabit. Fix them by replacing 'Mb' with
'MiB' - mebibyte.

[Tony: These are all "edac_dbg()" messages, so this won't break scripts
       that parse console logs.]

Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Acked-by: Aristeu Rozanski &lt;aris@redhat.com&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20180919003433.16475-1-tony.luck@intel.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC, sb_edac: Return early on ADDRV bit and address type test</title>
<updated>2019-11-20T17:46:17+00:00</updated>
<author>
<name>Qiuxu Zhuo</name>
<email>qiuxu.zhuo@intel.com</email>
</author>
<published>2018-09-07T23:08:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0773f03aebdd8533b425fc1eb6fdb43f9b721f81'/>
<id>urn:sha1:0773f03aebdd8533b425fc1eb6fdb43f9b721f81</id>
<content type='text'>
[ Upstream commit dcc960b225ceb2bd66c45e0845d03e577f7010f9 ]

Users of the mce_register_decode_chain() are called for every logged
error. EDAC drivers should check:

1) Is this a memory error? [bit 7 in status register]
2) Is there a valid address? [bit 58 in status register]
3) Is the address a system address? [bitfield 8:6 in misc register]

The sb_edac driver performed test "1" twice. Waited far too long to
perform check "2". Didn't do check "3" at all.

Fix it by moving the test for valid address from
sbridge_mce_output_error() into sbridge_mce_check_error() and add a test
for the type immediately after. Delete the redundant check for the type
of the error from sbridge_mce_output_error().

Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Cc: Aristeu Rozanski &lt;aris@redhat.com&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
Cc: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Cc: linux-edac &lt;linux-edac@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/20180907230828.13901-2-tony.luck@intel.com
[ Re-word commit message. ]
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC/ghes: Fix Use after free in ghes_edac remove path</title>
<updated>2019-10-29T08:20:01+00:00</updated>
<author>
<name>James Morse</name>
<email>james.morse@arm.com</email>
</author>
<published>2019-10-14T17:19:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d97e4a6d2b2f7c0895bbf0ea90be4fcd46df910c'/>
<id>urn:sha1:d97e4a6d2b2f7c0895bbf0ea90be4fcd46df910c</id>
<content type='text'>
commit 1e72e673b9d102ff2e8333e74b3308d012ddf75b upstream.

ghes_edac models a single logical memory controller, and uses a global
ghes_init variable to ensure only the first ghes_edac_register() will
do anything.

ghes_edac is registered the first time a GHES entry in the HEST is
probed. There may be multiple entries, so subsequent attempts to
register ghes_edac are silently ignored as the work has already been
done.

When a GHES entry is unregistered, it calls ghes_edac_unregister(),
which free()s the memory behind the global variables in ghes_edac.

But there may be multiple GHES entries, the next call to
ghes_edac_unregister() will dereference the free()d memory, and attempt
to free it a second time.

This may also be triggered on a platform with one GHES entry, if the
driver is unbound/re-bound and unbound. The re-bind step will do
nothing because of ghes_init, the second unbind will then do the same
work as the first.

Doing the unregister work on the first call is unsafe, as another
CPU may be processing a notification in ghes_edac_report_mem_error(),
using the memory we are about to free.

ghes_init is already half of the reference counting. We only need
to do the register work for the first call, and the unregister work
for the last. Add the unregister check.

This means we no longer free ghes_edac's memory while there are
GHES entries that may receive a notification.

This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE.

 [ bp: merge into a single patch. ]

Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller")
Reported-by: John Garry &lt;john.garry@huawei.com&gt;
Signed-off-by: James Morse &lt;james.morse@arm.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: linux-edac &lt;linux-edac@vger.kernel.org&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
Cc: Robert Richter &lt;rrichter@marvell.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com
Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>EDAC/amd64: Decode syndrome before translating address</title>
<updated>2019-10-05T11:09:48+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2019-08-22T00:00:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f9de170eaf7ee2cc50f5e3808d6fbb2dbbc3b6f9'/>
<id>urn:sha1:f9de170eaf7ee2cc50f5e3808d6fbb2dbbc3b6f9</id>
<content type='text'>
[ Upstream commit 8a2eaab7daf03b23ac902481218034ae2fae5e16 ]

AMD Family 17h systems currently require address translation in order to
report the system address of a DRAM ECC error. This is currently done
before decoding the syndrome information. The syndrome information does
not depend on the address translation, so the proper EDAC csrow/channel
reporting can function without the address. However, the syndrome
information will not be decoded if the address translation fails.

Decode the syndrome information before doing the address translation.
The syndrome information is architecturally defined in MCA_SYND and can
be considered robust. The address translation is system-specific and may
fail on newer systems without proper updates to the translation
algorithm.

Fixes: 713ad54675fd ("EDAC, amd64: Define and register UMC error decode function")
Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: "linux-edac@vger.kernel.org" &lt;linux-edac@vger.kernel.org&gt;
Cc: James Morse &lt;james.morse@arm.com&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20190821235938.118710-6-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC/amd64: Recognize DRAM device type ECC capability</title>
<updated>2019-10-05T11:09:48+00:00</updated>
<author>
<name>Yazen Ghannam</name>
<email>yazen.ghannam@amd.com</email>
</author>
<published>2019-08-21T23:59:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6f80e91a66e06959e68160cc16f3b2a42f71fc51'/>
<id>urn:sha1:6f80e91a66e06959e68160cc16f3b2a42f71fc51</id>
<content type='text'>
[ Upstream commit f8be8e5680225ac9caf07d4545f8529b7395327f ]

AMD Family 17h systems support x4 and x16 DRAM devices. However, the
device type is not checked when setting mci.edac_ctl_cap.

Set the appropriate capability flag based on the device type.

Default to x8 DRAM device when neither the x4 or x16 bits are set.

 [ bp: reverse cpk_en check to save an indentation level. ]

Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")
Signed-off-by: Yazen Ghannam &lt;yazen.ghannam@amd.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: "linux-edac@vger.kernel.org" &lt;linux-edac@vger.kernel.org&gt;
Cc: James Morse &lt;james.morse@arm.com&gt;
Cc: Mauro Carvalho Chehab &lt;mchehab@kernel.org&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20190821235938.118710-3-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()</title>
<updated>2019-10-05T11:09:39+00:00</updated>
<author>
<name>Stephen Douthit</name>
<email>stephend@silicom-usa.com</email>
</author>
<published>2019-08-09T14:18:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4410b85179504a08a9c0953b6615c458870e9273'/>
<id>urn:sha1:4410b85179504a08a9c0953b6615c458870e9273</id>
<content type='text'>
[ Upstream commit 29a3388bfcce7a6d087051376ea02bf8326a957b ]

Depending on how BIOS has marked the reserved region containing the 32KB
MCHBAR you can get warnings like:

resource sanity check: requesting [mem 0xfed10000-0xfed1ffff], which spans more than reserved [mem 0xfed10000-0xfed17fff]
caller dnv_rd_reg+0xc8/0x240 [pnd2_edac] mapping multiple BARs

Not all of the mmio regions used in dnv_rd_reg() are the same size.  The
MCHBAR window is 32KB and the sideband ports are 64KB.  Pass the correct
size to ioremap() depending on which resource we're reading from.

Signed-off-by: Stephen Douthit &lt;stephend@silicom-usa.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
