Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 7d444522303177f3a3c09b9abb104ddeea470a70 ]
The SDIO firmware does not provide RSSI value to the host, it's only set to
zero. In that case don't report the value to mac80211. One risk here is that
value zero might be a valid value with other firmware, currently there's no way
to detect that.
Without the fix, the rssi value indicated by iw changes between the actual
value and -95.
Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-00005-QCARMSWP-1.
Co-developed-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Alagu Sankar <alagusankar@silex-india.com>
Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dcd5fb82ffb484124203aa339733663ac0b059f3 ]
Reference counting in amdgpu_dm_connector for amdgpu_dm_connector::dc_sink
and amdgpu_dm_connector::dc_em_sink as well as in dc_link::local_sink seems
to be out of shape. Thus make reference counting consistent for these
members and just plain increment the reference count when the variable
gets assigned and decrement when the pointer is set to zero or replaced.
Also simplify reference counting in selected function sopes to be sure the
reference is released in any case. In some cases add NULL pointer check
before dereferencing.
At a hand full of places a comment is placed to stat that the reference
increment happened already somewhere else.
This actually fixes the following kernel bug on my system when enabling
display core in amdgpu. There are some more similar bug reports around,
so it probably helps at more places.
kernel BUG at mm/slub.c:294!
invalid opcode: 0000 [#1] SMP PTI
CPU: 9 PID: 1180 Comm: Xorg Not tainted 5.0.0-rc1+ #2
Hardware name: Supermicro X10DAi/X10DAI, BIOS 3.0a 02/05/2018
RIP: 0010:__slab_free+0x1e2/0x3d0
Code: 8b 54 24 30 48 89 4c 24 28 e8 da fb ff ff 4c 8b 54 24 28 85 c0 0f 85 67 fe ff ff 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 49 3b 5c 24 28 75 ab 48 8b 44 24 30 49 89 4c 24 28 49 89 44
RSP: 0018:ffffb0978589fa90 EFLAGS: 00010246
RAX: ffff92f12806c400 RBX: 0000000080200019 RCX: ffff92f12806c400
RDX: ffff92f12806c400 RSI: ffffdd6421a01a00 RDI: ffff92ed2f406e80
RBP: ffffb0978589fb40 R08: 0000000000000001 R09: ffffffffc0ee4748
R10: ffff92f12806c400 R11: 0000000000000001 R12: ffffdd6421a01a00
R13: ffff92f12806c400 R14: ffff92ed2f406e80 R15: ffffdd6421a01a20
FS: 00007f4170be0ac0(0000) GS:ffff92ed2fb40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000562818aaa000 CR3: 000000045745a002 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? drm_dbg+0x87/0x90 [drm]
dc_stream_release+0x28/0x50 [amdgpu]
amdgpu_dm_connector_mode_valid+0xb4/0x1f0 [amdgpu]
drm_helper_probe_single_connector_modes+0x492/0x6b0 [drm_kms_helper]
drm_mode_getconnector+0x457/0x490 [drm]
? drm_connector_property_set_ioctl+0x60/0x60 [drm]
drm_ioctl_kernel+0xa9/0xf0 [drm]
drm_ioctl+0x201/0x3a0 [drm]
? drm_connector_property_set_ioctl+0x60/0x60 [drm]
amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
do_vfs_ioctl+0xa4/0x630
? __sys_recvmsg+0x83/0xa0
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f417110809b
Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffdd8d1c268 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000562818a8ebc0 RCX: 00007f417110809b
RDX: 00007ffdd8d1c2a0 RSI: 00000000c05064a7 RDI: 0000000000000012
RBP: 00007ffdd8d1c2a0 R08: 0000562819012280 R09: 0000000000000007
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c05064a7
R13: 0000000000000012 R14: 0000000000000012 R15: 00007ffdd8d1c2a0
Modules linked in: nfsv4 dns_resolver nfs lockd grace fscache fuse vfat fat amdgpu intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul chash gpu_sched crc32_pclmul snd_hda_codec_realtek ghash_clmulni_intel amd_iommu_v2 iTCO_wdt iTCO_vendor_support ttm snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel drm_kms_helper snd_hda_codec intel_cstate snd_hda_core drm snd_hwdep snd_seq snd_seq_device intel_uncore snd_pcm intel_rapl_perf snd_timer snd soundcore ioatdma pcspkr intel_wmi_thunderbolt mxm_wmi i2c_i801 lpc_ich pcc_cpufreq auth_rpcgss sunrpc igb crc32c_intel i2c_algo_bit dca wmi hid_cherry analog gameport joydev
This patch is based on agd5f/drm-next-5.1-wip. This patch does not require
all of that, but agd5f/drm-next-5.1-wip contains at least one more dc_sink
counting fix that I could spot.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit aea6f028d01d629eda2e958ccd1133e805cda159 ]
Previously we only updated the drop_progress key if we were in the
DROP_REFERENCE stage of snapshot deletion. This is because the
UPDATE_BACKREF stage checks the flags of the blocks it's converting to
FULL_BACKREF, so if we go over a block we processed before it doesn't
matter, we just don't do anything.
The problem is in do_walk_down() we will go ahead and drop the roots
reference to any blocks that we know we won't need to walk into.
Given subvolume A and snapshot B. The root of B points to all of the
nodes that belong to A, so all of those nodes have a refcnt > 1. If B
did not modify those blocks it'll hit this condition in do_walk_down
if (!wc->update_ref ||
generation <= root->root_key.offset)
goto skip;
and in "goto skip" we simply do a btrfs_free_extent() for that bytenr
that we point at.
Now assume we modified some data in B, and then took a snapshot of B and
call it C. C points to all the nodes in B, making every node the root
of B points to have a refcnt > 1. This assumes the root level is 2 or
higher.
We delete snapshot B, which does the above work in do_walk_down,
free'ing our ref for nodes we share with A that we didn't modify. Now
we hit a node we _did_ modify, thus we own. We need to walk down into
this node and we set wc->stage == UPDATE_BACKREF. We walk down to level
0 which we also own because we modified data. We can't walk any further
down and thus now need to walk up and start the next part of the
deletion. Now walk_up_proc is supposed to put us back into
DROP_REFERENCE, but there's an exception to this
if (level < wc->shared_level)
goto out;
we are at level == 0, and our shared_level == 1. We skip out of this
one and go up to level 1. Since path->slots[1] < nritems we
path->slots[1]++ and break out of walk_up_tree to stop our transaction
and loop back around. Now in btrfs_drop_snapshot we have this snippet
if (wc->stage == DROP_REFERENCE) {
level = wc->level;
btrfs_node_key(path->nodes[level],
&root_item->drop_progress,
path->slots[level]);
root_item->drop_level = level;
}
our stage == UPDATE_BACKREF still, so we don't update the drop_progress
key. This is a problem because we would have done btrfs_free_extent()
for the nodes leading up to our current position. If we crash or
unmount here and go to remount we'll start over where we were before and
try to free our ref for blocks we've already freed, and thus abort()
out.
Fix this by keeping track of the last place we dropped a reference for
our block in do_walk_down. Then if wc->stage == UPDATE_BACKREF we know
we'll start over from a place we meant to, and otherwise things continue
to work as they did before.
I have a complicated reproducer for this problem, without this patch
we'll fail to fsck the fs when replaying the log writes log. With this
patch we can replay the whole log without any fsck or mount failures.
The steps to reproduce this easily are sort of tricky, I had to add a
couple of debug patches to the kernel in order to make it easy,
basically I just needed to make sure we did actually commit the
transaction every time we finished a walk_down_tree/walk_up_tree combo.
The reproducer:
1) Creates a base subvolume.
2) Creates 100k files in the subvolume.
3) Snapshots the base subvolume (snap1).
4) Touches files 5000-6000 in snap1.
5) Snapshots snap1 (snap2).
6) Deletes snap1.
I do this with dm-log-writes, and then replay to every FUA in the log
and fsck the fs.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ copy reproducer steps ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 3812b8c5c5d527239ac015f1f2c7654da7fcfbba ]
Adding -rR to MAKEFLAGS is important because we do not want to
be bothered by built-in implicit rules or variables.
One problem that used to exist in older GNU Make versions is
MAKEFLAGS += -rR
... does not become effective in the current Makefile. When you are
building with O= option, it becomes effective in the top Makefile
since it recurses via 'sub-make' target. Otherwise, the top Makefile
tries implicit rules. That is why we explicitly add empty rules for
Makefiles, but we often miss to do that.
In fact, adding -d option to older GNU Make versions shows it is
trying a bunch of implicit pattern rules.
Considering target file `scripts/Makefile.kcov'.
Looking for an implicit rule for `scripts/Makefile.kcov'.
Trying pattern rule with stem `Makefile.kcov'.
Trying implicit prerequisite `scripts/Makefile.kcov.o'.
Trying pattern rule with stem `Makefile.kcov'.
Trying implicit prerequisite `scripts/Makefile.kcov.c'.
Trying pattern rule with stem `Makefile.kcov'.
Trying implicit prerequisite `scripts/Makefile.kcov.cc'.
Trying pattern rule with stem `Makefile.kcov'.
Trying implicit prerequisite `scripts/Makefile.kcov.C'.
...
This issue was fixed by GNU Make commit 58dae243526b ("[Savannah #20501]
Handle adding -r/-R to MAKEFLAGS in the makefile"). So, it is no longer
a problem if you use GNU Make 4.0 or later. However, older versions are
still widely used.
So, I decided to patch the kernel Makefile to invoke sub-make regardless
of O= option. This will allow further cleanups.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9390dff66a52d1a60c6e517d8fa6cdbdffc83cb1 ]
If include/config/auto.conf.cmd is lost for some reasons, it is not
self-healing, so the top Makefile misses to run syncconfig.
Move include/config/auto.conf.cmd to the target side.
I used a pattern rule instead of a normal rule here although it is
a bit gross.
If the rule were written with a normal rule like this,
include/config/auto.conf \
include/config/auto.conf.cmd \
include/config/tristate.conf: $(KCONFIG_CONFIG)
$(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
... syncconfig would be executed per target.
Using a pattern rule makes sure that syncconfig is executed just once
because Make assumes the recipe will create all of the targets.
Here is a quote from the GNU Make manual [1]:
"Pattern rules may have more than one target. Unlike normal rules,
this does not act as many different rules with the same prerequisites
and recipe. If a pattern rule has multiple targets, make knows that
the rule's recipe is responsible for making all of the targets. The
recipe is executed only once to make all the targets. When searching
for a pattern rule to match a target, the target patterns of a rule
other than the one that matches the target in need of a rule are
incidental: make worries only about giving a recipe and prerequisites
to the file presently in question. However, when this file's recipe is
run, the other targets are marked as having been updated themselves."
[1]: https://www.gnu.org/software/make/manual/html_node/Pattern-Intro.html
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1749ef00f7312679f76d5e9104c5d1e22a829038 ]
We had a test-report where, under memory pressure, adding LUNs to the
systems would fail (the tests add LUNs strictly in sequence):
[ 5525.853432] scsi 0:0:1:1088045124: Direct-Access IBM 2107900 .148 PQ: 0 ANSI: 5
[ 5525.853826] scsi 0:0:1:1088045124: alua: supports implicit TPGS
[ 5525.853830] scsi 0:0:1:1088045124: alua: device naa.6005076303ffd32700000000000044da port group 0 rel port 43
[ 5525.853931] sd 0:0:1:1088045124: Attached scsi generic sg10 type 0
[ 5525.854075] sd 0:0:1:1088045124: [sdk] Disabling DIF Type 1 protection
[ 5525.855495] sd 0:0:1:1088045124: [sdk] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[ 5525.855606] sd 0:0:1:1088045124: [sdk] Write Protect is off
[ 5525.855609] sd 0:0:1:1088045124: [sdk] Mode Sense: ed 00 00 08
[ 5525.855795] sd 0:0:1:1088045124: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5525.857838] sdk: sdk1
[ 5525.859468] sd 0:0:1:1088045124: [sdk] Attached SCSI disk
[ 5525.865073] sd 0:0:1:1088045124: alua: transition timeout set to 60 seconds
[ 5525.865078] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.015070] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.015213] sd 0:0:1:1088045124: alua: port group 00 state A preferred supports tolusnA
[ 5526.587439] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 5526.588562] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
Looking at the code of scsi_alloc_sdev(), and all the calling contexts,
there seems to be no reason to use GFP_ATMOIC here. All the different
call-contexts use a mutex at some point, and nothing in between that
requires no sleeping, as far as I could see. Additionally, the code that
later allocates the block queue for the device (scsi_mq_alloc_queue())
already uses GFP_KERNEL.
There are similar allocations in two other functions:
scsi_probe_and_add_lun(), and scsi_add_lun(),; that can also be done with
GFP_KERNEL.
Here is the contexts for the three functions so far:
scsi_alloc_sdev()
scsi_probe_and_add_lun()
scsi_sequential_lun_scan()
__scsi_scan_target()
scsi_scan_target()
mutex_lock()
scsi_scan_channel()
scsi_scan_host_selected()
mutex_lock()
scsi_report_lun_scan()
__scsi_scan_target()
...
__scsi_add_device()
mutex_lock()
__scsi_scan_target()
...
scsi_report_lun_scan()
...
scsi_get_host_dev()
mutex_lock()
scsi_probe_and_add_lun()
...
scsi_add_lun()
scsi_probe_and_add_lun()
...
So replace all these, and give them a bit of a better chance to succeed,
with more chances of reclaim.
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 11f5acce2fa43b015a8120fa7620fa4efd0a2952 ]
We store 2 multilevel tables in iommu_table - one for the hardware and
one with the corresponding userspace addresses. Before allocating
the tables, the iommu_table_group_ops::get_table_size() hook returns
the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
the locked_vm counter correctly. When the table is actually allocated,
the amount of allocated memory is stored in iommu_table::it_allocated_size
and used to decrement the locked_vm counter when we release the memory
used by the table; .get_table_size() and .create_table() calculate it
independently but the result is expected to be the same.
However the allocator does not add the userspace table size to
.it_allocated_size so when we destroy the table because of VFIO PCI
unplug (i.e. VFIO container is gone but the userspace keeps running),
we decrement locked_vm by just a half of size of memory we are
releasing.
To make things worse, since we enabled on-demand allocation of
indirect levels, it_allocated_size contains only the amount of memory
actually allocated at the table creation time which can just be a
fraction. It is not a problem with incrementing locked_vm (as
get_table_size() value is used) but it is with decrementing.
As the result, we leak locked_vm and may not be able to allocate more
IOMMU tables after few iterations of hotplug/unplug.
This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table()
hook to what pnv_pci_ioda2_get_table_size() returns so from now on we
have a single place which calculates the maximum memory a table can
occupy. The original meaning of it_allocated_size is somewhat lost now
though.
We do not ditch it_allocated_size whatsoever here and we do not call
get_table_size() from vfio_iommu_spapr_tce.c when decrementing
locked_vm as we may have multiple IOMMU groups per container and even
though they all are supposed to have the same get_table_size()
implementation, there is a small chance for failure or confusion.
Fixes: 090bad39b237 ("powerpc/powernv: Add indirect levels to it_userspace")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 68ef236274793066b9ba3154b16c0acc1c891e5c ]
According to the chipidea driver bindings, the USB PHY is specified via
the "phys" phandle node. However, this only takes effect for USB PHYs
that use the common PHY framework. For legacy USB PHYs, a simple lookup
based on the USB PHY type is done instead.
This does not play out well when more than one USB PHY is registered,
since the first registered PHY matching the type will always be
returned regardless of what the driver was bound to.
Fix this by looking up the PHY based on the "phys" phandle node.
Although generic PHYs are rather matched by their "phys-name" and not
the "phys" phandle directly, there is no helper for similar lookup on
legacy PHYs and it's probably not worth the effort to add it.
When no legacy USB PHY is found by phandle, fallback to grabbing any
registered USB2 PHY. This ensures backward compatibility if some users
were actually relying on this mechanism.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9eca5083757b679b37f210092c871916c2c222d0 ]
The bpf_map_lookup_elem is added in the bpf program.
Without previous patch, the test change will trigger the
following error:
$ ./test_maps
...
; value_p = bpf_map_lookup_elem(map, &key);
20: (bf) r1 = r7
21: (bf) r2 = r8
22: (85) call bpf_map_lookup_elem#1
; if (!value_p || *value_p != 123)
23: (15) if r0 == 0x0 goto pc+16
R0=map_value(id=2,off=0,ks=4,vs=4,imm=0) R6=inv1 R7=map_ptr(id=0,off=0,ks=4,vs=4,imm=0)
R8=fp-8,call_-1 R10=fp0,call_-1 fp-8=mmmmmmmm
; if (!value_p || *value_p != 123)
24: (61) r1 = *(u32 *)(r0 +0)
R0=map_value(id=2,off=0,ks=4,vs=4,imm=0) R6=inv1 R7=map_ptr(id=0,off=0,ks=4,vs=4,imm=0)
R8=fp-8,call_-1 R10=fp0,call_-1 fp-8=mmmmmmmm
bpf_spin_lock cannot be accessed directly by load/store
With the kernel fix in the previous commit, the error goes away.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 41798036430015ad45137db2d4c213cd77fd0251 ]
The cavium/zip implementation of the deflate compression algorithm is
incorrectly being registered under the generic driver name, which
prevents the generic implementation from being registered with the
crypto API when CONFIG_CRYPTO_DEV_CAVIUM_ZIP=y. Similarly the lzs
algorithm (which does not currently have a generic implementation...)
is incorrectly being registered as lzs-generic.
Fix the naming collision by adding a suffix "-cavium" to the
cra_driver_name of the cavium/zip algorithms.
Fixes: 640035a2dc55 ("crypto: zip - Add ThunderX ZIP driver core")
Cc: Mahipal Challa <mahipalreddy2006@gmail.com>
Cc: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8c2b43d2d85b48a97d2f8279278a4aac5b45f925 ]
Add an of_node_put when a tested device node is not available.
The semantic patch that fixes this problem is as follows
(http://coccinelle.lip6.fr):
// <smpl>
@@
identifier f;
local idexpression e;
expression x;
@@
e = f(...);
... when != of_node_put(e)
when != x = e
when != e = x
when any
if (<+...of_device_is_available(e)...+>) {
... when != of_node_put(e)
(
return e;
|
+ of_node_put(e);
return ...;
)
}
// </smpl>
Fixes: 5343e674f32fb ("crypto4xx: integrate ppc4xx-rng into crypto4xx")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d93ac78bf7b37db36fa00225f8e9a14c7ed1b2ba ]
Apparently the execute bits were set for the tests/*.sh scripts on my
test setup but these are not set in the kernel tree. Fix this by adding
the interpreter path in front of the script paths.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: johannes.berg@intel.com
Cc: tj@kernel.org
Fixes: 5ecb8e94b494 ("tools/lib/lockdep/tests: Improve testing accuracy") # v5.0-rc1
Link: https://lkml.kernel.org/r/20190214230058.196511-23-bvanassche@acm.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ef9051c72ab7bc664e8047c55ac74bdb1c7fa3ee ]
Currently, the bandwidth is updated wrongly in BW table in tx_stats
debugfs per sta as there is difference in number of bandwidth type
in mac80211 and driver stats table. This leads to bandwidth getting
updated at wrong index in bandwidth table in tx_stats.
Fix this index mismatch between mac80211 and driver stats table (BW table)
by making the number of bandwidth type in driver compatible with mac80211.
Tested HW: WCN3990
Tested FW: WLAN.HL.3.1-00784-QCAHLSWMTPLZ-1
Fixes: a904417fc876 ("ath10k: add extended per sta tx statistics support")
Signed-off-by: Surabhi Vishnoi <svishnoi@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 34e022d8b780a03902d82fb3997ba7c7b1f40c81 ]
The call to of_find_node_by_phandle returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/net/wireless/mediatek/mt76/eeprom.c:58:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
./drivers/net/wireless/mediatek/mt76/eeprom.c:61:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
./drivers/net/wireless/mediatek/mt76/eeprom.c:67:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
./drivers/net/wireless/mediatek/mt76/eeprom.c:70:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
./drivers/net/wireless/mediatek/mt76/eeprom.c:72:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 48, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mediatek@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit de77a53c2d1e8fb3621e63e8e1f0f0c9a1a99ff7 ]
ies1 or ies2 might be null when code inside
_wil_cfg80211_merge_extra_ies access them.
Add explicit check for null and make sure ies1/ies2 are not
accessed in such a case.
spos might be null and be accessed inside
_wil_cfg80211_merge_extra_ies.
Add explicit check for null in the while condition statement
and make sure spos is not accessed in such a case.
Signed-off-by: Alexei Avshalom Lazar <ailizaro@codeaurora.org>
Signed-off-by: Maya Erez <merez@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 95c80bc6952b6a5badc7b702d23e5bf14d251e7c ]
Dongdong reported a deadlock triggered by a hotplug event during a sysfs
"remove" operation:
pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
# echo 1 > 0000:00:0c.0/remove
PME and hotplug share an MSI/MSI-X vector. The sysfs "remove" side is:
remove_store
pci_stop_and_remove_bus_device_locked
pci_lock_rescan_remove
pci_stop_and_remove_bus_device
...
pcie_pme_remove
pcie_pme_suspend
synchronize_irq # wait for hotplug IRQ handler
pci_unlock_rescan_remove
The hotplug side is:
pciehp_ist
pciehp_handle_presence_or_link_change
pciehp_configure_device
pci_lock_rescan_remove # wait for pci_unlock_rescan_remove()
INFO: task bash:10913 blocked for more than 120 seconds.
# ps -ax |grep D
PID TTY STAT TIME COMMAND
10913 ttyAMA0 Ds+ 0:00 -bash
14022 ? D 0:00 [irq/745-pciehp]
# cat /proc/14022/stack
__switch_to+0x94/0xd8
pci_lock_rescan_remove+0x20/0x28
pciehp_configure_device+0x30/0x140
pciehp_handle_presence_or_link_change+0x324/0x458
pciehp_ist+0x1dc/0x1e0
# cat /proc/10913/stack
__switch_to+0x94/0xd8
synchronize_irq+0x8c/0xc0
pcie_pme_suspend+0xa4/0x118
pcie_pme_remove+0x20/0x40
pcie_port_remove_service+0x3c/0x58
...
pcie_port_device_remove+0x2c/0x48
pcie_portdrv_remove+0x68/0x78
pci_device_remove+0x48/0x120
...
pci_stop_bus_device+0x84/0xc0
pci_stop_and_remove_bus_device_locked+0x24/0x40
remove_store+0xa4/0xb8
dev_attr_store+0x44/0x60
sysfs_kf_write+0x58/0x80
It is incorrect to call pcie_pme_suspend() from pcie_pme_remove() for two
reasons.
First, pcie_pme_suspend() calls synchronize_irq(), which will wait for the
native hotplug interrupt handler as well as for the PME one, because they
share one IRQ (as per the spec). That may deadlock if hotplug is signaled
while pcie_pme_remove() is running and the latter calls
pci_lock_rescan_remove() before the former.
Second, if pcie_pme_suspend() figures out that wakeup needs to be enabled
for the port, it will return without disabling the interrupt as expected by
pcie_pme_remove() which was overlooked by commit c7b5a4e6e8fb ("PCI / PM:
Fix native PME handling during system suspend/resume").
To fix that, rework pcie_pme_remove() to disable the PME interrupt, clear
its status and prevent the PME worker function from re-enabling it before
calling free_irq() on it, which should be sufficient.
Fixes: c7b5a4e6e8fb ("PCI / PM: Fix native PME handling during system suspend/resume")
Link: https://lore.kernel.org/linux-pci/c7697e7c-e1af-13e4-8491-0a3996e6ab5d@huawei.com
Reported-by: Dongdong Liu <liudongdong3@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bhelgaas: add URL and deadlock details from Dongdong]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5cd401ace914dc68556c6d2fcae0c349444d5f86 ]
walk_system_ram_range() can return an error code either becuase
*it* failed, or because the 'func' that it calls returned an
error. The memory hotplug does the following:
ret = walk_system_ram_range(..., func);
if (ret)
return ret;
and 'ret' makes it out to userspace, eventually. The problem
s, walk_system_ram_range() failues that result from *it* failing
(as opposed to 'func') return -1. That leads to a very odd
-EPERM (-1) return code out to userspace.
Make walk_system_ram_range() return -EINVAL for internal
failures to keep userspace less confused.
This return code is compatible with all the callers that I
audited.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: linux-nvdimm@lists.01.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Huang Ying <ying.huang@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7c5b019e3a638a5a290b0ec020f6ca83d2ec2aaa ]
Fix buffer overflow observed when running perf test.
The overflow is when trying to evaluate "1ULL << (64 - 1)" which is
resulting in -9223372036854775808 which overflows the 20 character
buffer.
If is possible this bug has been reported before but I still don't see
any fix checked in:
See: https://www.spinics.net/lists/linux-perf-users/msg07714.html
Reported-by: Michael Sartain <mikesart@fastmail.com>
Reported-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Tony Jones <tonyj@suse.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Fixes: f7d82350e597 ("tools/events: Add files to create libtraceevent.a")
Link: http://lkml.kernel.org/r/20190228015532.8941-1-tonyj@suse.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dce30ca9e3b676fb288c33c1f4725a0621361185 ]
guard_bio_eod() can truncate a segment in bio to allow it to do IO on
odd last sectors of a device.
It already checks if the IO starts past EOD, but it does not consider
the possibility of an IO request starting within device boundaries can
contain more than one segment past EOD.
In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
underflow bvec->bv_len.
Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
This situation has been found on filesystems such as isofs and vfat,
which doesn't check the device size before mount, if the device is
smaller than the filesystem itself, a readahead on such filesystem,
which spans EOD, can trigger this situation, leading a call to
zero_user() with a wrong size possibly corrupting memory.
I didn't see any crash, or didn't let the system run long enough to
check if memory corruption will be hit somewhere, but adding
instrumentation to guard_bio_end() to check truncated_bytes size, was
enough to see the error.
The following script can trigger the error.
MNT=/mnt
IMG=./DISK.img
DEV=/dev/loop0
mkfs.vfat $IMG
mount $IMG $MNT
cp -R /etc $MNT &> /dev/null
umount $MNT
losetup -D
losetup --find --show --sizelimit 16247280 $IMG
mount $DEV $MNT
find $MNT -type f -exec cat {} + >/dev/null
Kudos to Eric Sandeen for coming up with the reproducer above
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7bd75230b43727b258a4f7a59d62114cffe1b6c8 ]
Ext4 may not free clusters correctly when punching holes in bigalloc
file systems under high load conditions. If it's not possible to
extend and restart the journal in ext4_ext_rm_leaf() when preparing to
remove blocks from a punched region, a retry of the entire punch
operation is triggered in ext4_ext_remove_space(). This causes a
partial cluster to be set to the first cluster in the extent found to
the right of the punched region. However, if the punch operation
prior to the retry had made enough progress to delete one or more
extents and a partial cluster candidate for freeing had already been
recorded, the retry would overwrite the partial cluster. The loss of
this information makes it impossible to correctly free the original
partial cluster in all cases.
This bug can cause generic/476 to fail when run as part of
xfstests-bld's bigalloc and bigalloc_1k test cases. The failure is
reported when e2fsck detects bad iblocks counts greater than expected
in units of whole clusters and also detects a number of negative block
bitmap differences equal to the iblocks discrepancy in cluster units.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 6e876c3dd205d30b0db6850e97a03d75457df007 ]
In jbd2_journal_commit_transaction(), if we are in abort mode,
we may flush the buffer without setting descriptor block checksum
by goto start_journal_io. Then fs is mounted,
jbd2_descriptor_block_csum_verify() failed.
[ 271.379811] EXT4-fs (vdd): shut down requested (2)
[ 271.381827] Aborting journal on device vdd-8.
[ 271.597136] JBD2: Invalid checksum recovering block 22199 in log
[ 271.598023] JBD2: recovery failed
[ 271.598484] EXT4-fs (vdd): error loading journal
Fix this problem by keep setting descriptor block checksum if the
descriptor buffer is not NULL.
This checksum problem can be reproduced by xfstests generic/388.
Signed-off-by: luojiajun <luojiajun3@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d8b8591054575f33237556c32762d54e30774d28 ]
Commit fb58fdcd295b9 ("iommu/vt-d: Do not enable ATS for untrusted
devices") disables ATS support on the devices which have been marked
as untrusted. Unfortunately this is not enough to fix the DMA attack
vulnerabiltiies because IOMMU driver allows translated requests as
long as a device advertises the ATS capability. Hence a malicious
peripheral device could use this to bypass IOMMU.
This disables the ATS support on untrusted devices by clearing the
internal per-device ATS mark. As the result, IOMMU driver will block
any translated requests from any device marked as untrusted.
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Fixes: fb58fdcd295b9 ("iommu/vt-d: Do not enable ATS for untrusted devices")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit be0502a3f2e94211a8809a09ecbc3a017189b8fb ]
TCP resets cause instant transition from established to closed state
provided the reset is in-window. Endpoints that implement RFC 5961
require resets to match the next expected sequence number.
RST segments that are in-window (but that do not match RCV.NXT) are
ignored, and a "challenge ACK" is sent back.
Main problem for conntrack is that its a middlebox, i.e. whereas an end
host might have ACK'd SEQ (and would thus accept an RST with this
sequence number), conntrack might not have seen this ACK (yet).
Therefore we can't simply flag RSTs with non-exact match as invalid.
This updates RST processing as follows:
1. If the connection is in a state other than ESTABLISHED, nothing is
changed, RST is subject to normal in-window check.
2. If the RSTs sequence number either matches exactly RCV.NXT,
connection state moves to CLOSE.
3. The same applies if the RST sequence number aligns with a previous
packet in the same direction.
In all other cases, the connection remains in ESTABLISHED state.
If the normal-in-window check passes, the timeout will be lowered
to that of CLOSE.
If the peer sends a challenge ack, connection timeout will be reset.
If the challenge ACK triggers another RST (RST was valid after all),
this 2nd RST will match expected sequence and conntrack state changes to
CLOSE.
If no challenge ACK is received, the connection will time out after
CLOSE seconds (10 seconds by default), just like without this patch.
Packetdrill test case:
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0
0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
// Receive a segment.
0.210 < P. 1:1001(1000) ack 1 win 46
0.210 > . 1:1(0) ack 1001
// Application writes 1000 bytes.
0.250 write(4, ..., 1000) = 1000
0.250 > P. 1:1001(1000) ack 1001
// First reset, old sequence. Conntrack (correctly) considers this
// invalid due to failed window validation (regardless of this patch).
0.260 < R 2:2(0) ack 1001 win 260
// 2nd reset, but too far ahead sequence. Same: correctly handled
// as invalid.
0.270 < R 99990001:99990001(0) ack 1001 win 260
// in-window, but not exact sequence.
// Current Linux kernels might reply with a challenge ack, and do not
// remove connection.
// Without this patch, conntrack state moves to CLOSE.
// With patch, timeout is lowered like CLOSE, but connection stays
// in ESTABLISHED state.
0.280 < R 1010:1010(0) ack 1001 win 260
// Expect challenge ACK
0.281 > . 1001:1001(0) ack 1001 win 501
// With or without this patch, RST will cause connection
// to move to CLOSE (sequence number matches)
// 0.282 < R 1001:1001(0) ack 1001 win 260
// ACK
0.300 < . 1001:1001(0) ack 1001 win 257
// more data could be exchanged here, connection
// is still established
// Client closes the connection.
0.610 < F. 1001:1001(0) ack 1001 win 260
0.650 > . 1001:1001(0) ack 1002
// Close the connection without reading outstanding data
0.700 close(4) = 0
// so one more reset. Will be deemed acceptable with patch as well:
// connection is already closing.
0.701 > R. 1001:1001(0) ack 1002 win 501
// End packetdrill test case.
With patch, this generates following conntrack events:
[NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
Without patch, first RST moves connection to close, whereas socket state
does not change until FIN is received.
[NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
[UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
[UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
[UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c61df57343bf05743f8abbb31eec9a6f05820dd1 ]
Mediatek's HW assigns a MMIO address range (typically starts from
0x20000000 to 0x2fffffff for both mt2712 and mt7622) for PCI usage.
This MMIO address space represents the address space that can
be allocated to PCI devices through Base Address Registers.
Even though the full MMIO address range is available to be allocated, it
should be enabled by the PCIE_AHB_TRANS_BASE register in the host
controller and the size that is enabled is determined by AHB2PCIE_SIZE
bits in this register.
Owing to a bug in the MMIO window size computation, current code does
not enable the full size of the available MMIO address range in the
PCI host controller; if the PCI devices BARs requested size exceeds the
size enabled through the PCIE_AHB_TRANS_BASE register the requests
targeting the disabled address address space will be blocked by the root
complex causing a system error.
Existing code has never run into a system error in production because
even half of the enabled MMIO range (128MB) is big enough for typical
devices BAR requests (4MB) but the full MMIO address range should
be enabled regardless.
Fix the MMIO window size computation by using resource_size(mem) instead
of mem->end - mem->start.
Since the MMIO window size for both MT2712 and MT7622 is 0x10000000,
this change will update the parameter passed to fls() from 0xfffffff to
0x10000000 and calculate the whole memory mapped IO range size
correctly.
Detected through coccinelle semantic patch (and related warning):
scripts/coccinelle/api/resource_size.cocci:
pcie-mediatek.c:720:13-16: WARNING: Suspicious code. resource_size is maybe missing with mem
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
[lorenzo.pieralisi@arm.com: rewrote the commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a9f5e78c403d2d62ade4f4c85040efc85f4049b8 ]
Check the result of dereferencing base_chain->stats, instead of result
of this_cpu_ptr with NULL.
base_chain->stats maybe be changed to NULL when a chain is updated and a
new NULL counter can be attached.
And we do not need to check returning of this_cpu_ptr since
base_chain->stats is from percpu allocator if it is non-NULL,
this_cpu_ptr returns a valid value.
And fix two sparse error by replacing rcu_access_pointer and
rcu_dereference with READ_ONCE under rcu_read_lock.
Thanks for Eric's help to finish this patch.
Fixes: 009240940e84c1 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c3b81a500f35241a4c16febe0a015e572cf2c492 ]
When the prefix suppresion/enabling logic was added, I forgot to add an
extra %, which ended up chopping off the strings:
Before:
# perf trace -e *mmsg --map-dump syscalls
[299] = 1,
[307] = 1,
DNS Res~ver #3/14587 sendmmsg(106<socket:[3462393]>, 0x7f252b0fcaf0, 2, MSG_) = 2
chronyd/1053 recvmmsg(4, 0x558542ca5740, 4, MSG_, NULL) = 1
DNS Res~ver #2/14445 sendmmsg(106<socket:[3461475]>, 0x7f252ab09af0, 2, MSG_) = 2
DNS Res~ver #2/14444 sendmmsg(146<socket:[3457863]>, 0x7f2521a7aaf0, 2, MSG_) = 2
DNS Res~ver #2/14445 sendmmsg(106<socket:[3461475]>, 0x7f252ab09af0, 2, MSG_) = 2
DNS Res~ver #3/14587 sendmmsg(148<socket:[3460636]>, 0x7f252b0fcaf0, 2, MSG_) = 2
DNS Res~ver #2/14444 sendmmsg(146<socket:[3457863]>, 0x7f2521a7aaf0, 2, MSG_) = 2
^C#
After:
# perf trace -e *mmsg --map-dump syscalls
[299] = 1,
[307] = 1,
NetworkManager/17467 sendmmsg(22<socket:[3466493]>, 0x7f28927f9bb0, 2, MSG_NOSIGNAL) = 2
pool/17478 sendmmsg(10<socket:[3466523]>, 0x7f2769f95e90, 2, MSG_NOSIGNAL) = 2
DNS Res~ver #3/14587 sendmmsg(121<socket:[3466132]>, 0x7f252b0fcaf0, 2, MSG_NOSIGNAL) = 2
chronyd/1053 recvmmsg(4, 0x558542ca5740, 4, MSG_DONTWAIT, NULL) = 1
Socket Thread/17433 sendmmsg(121<socket:[3460903]>, 0x7f252668baf0, 2, MSG_NOSIGNAL) = 2
^C#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: c65c83ffe904 ("perf trace: Allow asking for not suppressing common string prefixes")
Link: https://lkml.kernel.org/n/tip-t2eu1rqx710k6jr4814mlzg7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 68e2672f8fbd1e04982b8d2798dd318bf2515dd2 ]
There is a NULL pointer dereference of devname in strspn()
The oops looks something like:
CIFS: Attempting to mount (null)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
RIP: 0010:strspn+0x0/0x50
...
Call Trace:
? cifs_parse_mount_options+0x222/0x1710 [cifs]
? cifs_get_volume_info+0x2f/0x80 [cifs]
cifs_setup_volume_info+0x20/0x190 [cifs]
cifs_get_volume_info+0x50/0x80 [cifs]
cifs_smb3_do_mount+0x59/0x630 [cifs]
? ida_alloc_range+0x34b/0x3d0
cifs_do_mount+0x11/0x20 [cifs]
mount_fs+0x52/0x170
vfs_kern_mount+0x6b/0x170
do_mount+0x216/0xdc0
ksys_mount+0x83/0xd0
__x64_sys_mount+0x25/0x30
do_syscall_64+0x65/0x220
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fix this by adding a NULL check on devname in cifs_parse_devname()
Signed-off-by: Yao Liu <yotta.liu@ucloud.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 969ae8e8d4ee54c99134d3895f2adf96047f5bee ]
Old windows version or Netapp SMB server will return
NT_STATUS_NOT_SUPPORTED since they do not allow or implement
FSCTL_VALIDATE_NEGOTIATE_INFO. The client should accept the response
provided it's properly signed.
See
https://blogs.msdn.microsoft.com/openspecification/2012/06/28/smb3-secure-dialect-negotiation/
and
MS-SMB2 validate negotiate response processing:
https://msdn.microsoft.com/en-us/library/hh880630.aspx
Samba client had already handled it.
https://bugzilla.samba.org/attachment.cgi?id=13285&action=edit
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 500e0b28ecd3c5aade98f3c3a339d18dcb166bb6 ]
We use below condition to check inline_xattr_size boundary:
if (!F2FS_OPTION(sbi).inline_xattr_size ||
F2FS_OPTION(sbi).inline_xattr_size >=
DEF_ADDRS_PER_INODE -
F2FS_TOTAL_EXTRA_ATTR_SIZE -
DEF_INLINE_RESERVED_SIZE -
DEF_MIN_INLINE_SIZE)
There is there problems in that check:
- we should allow inline_xattr_size equaling to min size of inline
{data,dentry} area.
- F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
different size unit, previous one is 4 bytes, latter one is 1 bytes.
- DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
however, we need to consider min size of inline dentry area as well,
minimal inline dentry should at least contain two entries: '.' and
'..', so that min inline_dentry size is 40 bytes.
.bitmap 1 * 1 = 1
.reserved 1 * 1 = 1
.dentry 11 * 2 = 22
.filename 8 * 2 = 16
total 40
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 70de2cbda8a5d788284469e755f8b097d339c240 ]
Invoking dm_get_device() twice on the same device path with different
modes is dangerous. Because in that case, upgrade_mode() will alloc a
new 'dm_dev' and free the old one, which may be referenced by a previous
caller. Dereferencing the dangling pointer will trigger kernel NULL
pointer dereference.
The following two cases can reproduce this issue. Actually, they are
invalid setups that must be disallowed, e.g.:
1. Creating a thin-pool with read_only mode, and the same device as
both metadata and data.
dmsetup create thinp --table \
"0 41943040 thin-pool /dev/vdb /dev/vdb 128 0 1 read_only"
BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
...
Call Trace:
new_read+0xfb/0x110 [dm_bufio]
dm_bm_read_lock+0x43/0x190 [dm_persistent_data]
? kmem_cache_alloc_trace+0x15c/0x1e0
__create_persistent_data_objects+0x65/0x3e0 [dm_thin_pool]
dm_pool_metadata_open+0x8c/0xf0 [dm_thin_pool]
pool_ctr.cold.79+0x213/0x913 [dm_thin_pool]
? realloc_argv+0x50/0x70 [dm_mod]
dm_table_add_target+0x14e/0x330 [dm_mod]
table_load+0x122/0x2e0 [dm_mod]
? dev_status+0x40/0x40 [dm_mod]
ctl_ioctl+0x1aa/0x3e0 [dm_mod]
dm_ctl_ioctl+0xa/0x10 [dm_mod]
do_vfs_ioctl+0xa2/0x600
? handle_mm_fault+0xda/0x200
? __do_page_fault+0x26c/0x4f0
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x55/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9
2. Creating a external snapshot using the same thin-pool device.
dmsetup create thinp --table \
"0 41943040 thin-pool /dev/vdc /dev/vdb 128 0 2 ignore_discard"
dmsetup message /dev/mapper/thinp 0 "create_thin 0"
dmsetup create snap --table \
"0 204800 thin /dev/mapper/thinp 0 /dev/mapper/thinp"
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
Call Trace:
? __alloc_pages_nodemask+0x13c/0x2e0
retrieve_status+0xa5/0x1f0 [dm_mod]
? dm_get_live_or_inactive_table.isra.7+0x20/0x20 [dm_mod]
table_status+0x61/0xa0 [dm_mod]
ctl_ioctl+0x1aa/0x3e0 [dm_mod]
dm_ctl_ioctl+0xa/0x10 [dm_mod]
do_vfs_ioctl+0xa2/0x600
ksys_ioctl+0x60/0x90
? ksys_write+0x4f/0xb0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x55/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Jason Cai (Xiang Feng) <jason.cai@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 259594bea574e515a148171b5cd84ce5cbdc028a ]
When compiling with -Wformat, clang emits the following warnings:
fs/cifs/smb1ops.c:312:20: warning: format specifies type 'unsigned
short' but the argument has type 'unsigned int' [-Wformat]
tgt_total_cnt, total_in_tgt);
^~~~~~~~~~~~
fs/cifs/cifs_dfs_ref.c:289:4: warning: format specifies type 'short'
but the argument has type 'int' [-Wformat]
ref->flags, ref->server_type);
^~~~~~~~~~
fs/cifs/cifs_dfs_ref.c:289:16: warning: format specifies type 'short'
but the argument has type 'int' [-Wformat]
ref->flags, ref->server_type);
^~~~~~~~~~~~~~~~
fs/cifs/cifs_dfs_ref.c:291:4: warning: format specifies type 'short'
but the argument has type 'int' [-Wformat]
ref->ref_flag, ref->path_consumed);
^~~~~~~~~~~~~
fs/cifs/cifs_dfs_ref.c:291:19: warning: format specifies type 'short'
but the argument has type 'int' [-Wformat]
ref->ref_flag, ref->path_consumed);
^~~~~~~~~~~~~~~~~~
The types of these arguments are unconditionally defined, so this patch
updates the format character to the correct ones for ints and unsigned
ints.
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Louis Taylor <louis@kragniz.eu>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bcf6f55a0d05eedd8ebb6ecc60ae3f93205ad833 ]
Building little-endian allmodconfig kernels on arm64 started failing
with the generated atomic.h implementation, since we now try to call
kasan helpers from the EFI stub:
aarch64-linux-gnu-ld: drivers/firmware/efi/libstub/arm-stub.stub.o: in function `atomic_set':
include/generated/atomic-instrumented.h:44: undefined reference to `__efistub_kasan_check_write'
I suspect that we get similar problems in other files that explicitly
disable KASAN for some reason but call atomic_t based helper functions.
We can fix this by checking the predefined __SANITIZE_ADDRESS__ macro
that the compiler sets instead of checking CONFIG_KASAN, but this in
turn requires a small hack in mm/kasan/common.c so we do see the extern
declaration there instead of the inline function.
Link: http://lkml.kernel.org/r/20181211133453.2835077-1-arnd@arndb.de
Fixes: b1864b828644 ("locking/atomics: build atomic headers as required")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4117992df66a26fa33908b4969e04801534baab1 ]
KASAN does not play well with the page poisoning (CONFIG_PAGE_POISONING).
It triggers false positives in the allocation path:
BUG: KASAN: use-after-free in memchr_inv+0x2ea/0x330
Read of size 8 at addr ffff88881f800000 by task swapper/0
CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc1+ #54
Call Trace:
dump_stack+0xe0/0x19a
print_address_description.cold.2+0x9/0x28b
kasan_report.cold.3+0x7a/0xb5
__asan_report_load8_noabort+0x19/0x20
memchr_inv+0x2ea/0x330
kernel_poison_pages+0x103/0x3d5
get_page_from_freelist+0x15e7/0x4d90
because KASAN has not yet unpoisoned the shadow page for allocation
before it checks memchr_inv() but only found a stale poison pattern.
Also, false positives in free path,
BUG: KASAN: slab-out-of-bounds in kernel_poison_pages+0x29e/0x3d5
Write of size 4096 at addr ffff8888112cc000 by task swapper/0/1
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc1+ #55
Call Trace:
dump_stack+0xe0/0x19a
print_address_description.cold.2+0x9/0x28b
kasan_report.cold.3+0x7a/0xb5
check_memory_region+0x22d/0x250
memset+0x28/0x40
kernel_poison_pages+0x29e/0x3d5
__free_pages_ok+0x75f/0x13e0
due to KASAN adds poisoned redzones around slab objects, but the page
poisoning needs to poison the whole page.
Link: http://lkml.kernel.org/r/20190114233405.67843-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 5704a06810682683355624923547b41540e2801a ]
(Taken from https://bugzilla.kernel.org/show_bug.cgi?id=200647)
'get_unused_fd_flags' in kthread cause kernel crash. It works fine on
4.1, but causes crash after get 64 fds. It also cause crash on
ubuntu1404/1604/1804, centos7.5, and the crash messages are almost the
same.
The crash message on centos7.5 shows below:
start fd 61
start fd 62
start fd 63
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: __wake_up_common+0x2e/0x90
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: test(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink sunrpc kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg ppdev pcspkr virtio_balloon parport_pc parport i2c_piix4 joydev ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_scsi virtio_console virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm ata_piix serio_raw libata virtio_pci virtio_ring i2c_core
virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CPU: 2 PID: 1820 Comm: test_fd Kdump: loaded Tainted: G OE ------------ 3.10.0-862.3.3.el7.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
task: ffff8e92b9431fa0 ti: ffff8e94247a0000 task.ti: ffff8e94247a0000
RIP: 0010:__wake_up_common+0x2e/0x90
RSP: 0018:ffff8e94247a2d18 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffffffff9d09daa0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffff9d09daa0
RBP: ffff8e94247a2d50 R08: 0000000000000000 R09: ffff8e92b95dfda8
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9d09daa8
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff8e9434e80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000017c686000 CR4: 00000000000207e0
Call Trace:
__wake_up+0x39/0x50
expand_files+0x131/0x250
__alloc_fd+0x47/0x170
get_unused_fd_flags+0x30/0x40
test_fd+0x12a/0x1c0 [test]
kthread+0xd1/0xe0
ret_from_fork_nospec_begin+0x21/0x21
Code: 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 49 89 fc 49 83 c4 08 53 48 83 ec 10 48 8b 47 08 89 55 cc 4c 89 45 d0 <48> 8b 08 49 39 c4 48 8d 78 e8 4c 8d 69 e8 75 08 eb 3b 4c 89 ef
RIP __wake_up_common+0x2e/0x90
RSP <ffff8e94247a2d18>
CR2: 0000000000000000
This issue exists since CentOS 7.5 3.10.0-862 and CentOS 7.4
(3.10.0-693.21.1 ) is ok. Root cause: the item 'resize_wait' is not
initialized before being used.
Reported-by: Richard Zhang <zhang.zijian@h3c.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a0770e13c8da83bdb64738c0209ab02dd3cfff8b ]
v4: Rearrange the previous three versions.
The following scenario could lead to data block override by mistake.
TASK A | TASK kworker | TASK B | TASK C
| | |
open | | |
write | | |
close | | |
| f2fs_write_data_pages | |
| f2fs_write_cache_pages | |
| f2fs_outplace_write_data | |
| f2fs_allocate_data_block (get block in seg S, | |
| S is full, and only | |
| have this valid data | |
| block) | |
| allocate_segment | |
| locate_dirty_segment (mark S as PRE) | |
| f2fs_submit_page_write (submit but is not | |
| written on dev) | |
unlink | | |
iput_final | | |
f2fs_drop_inode | | |
f2fs_truncate | | |
(not evict) | | |
| | write_checkpoint |
| | flush merged bio but not wait file data writeback |
| | set_prefree_as_free (mark S as FREE) |
| | | update NODE/DATA
| | | allocate_segment (select S)
| writeback done | |
So we need to guarantee io complete before truncate inode in f2fs_drop_inode.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9083977dabf3833298ddcd40dee28687f1e6b483 ]
Fix below warning coming because of using mutex lock in atomic context.
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:98
in_atomic(): 1, irqs_disabled(): 0, pid: 585, name: sh
Preemption disabled at: __radix_tree_preload+0x28/0x130
Call trace:
dump_backtrace+0x0/0x2b4
show_stack+0x20/0x28
dump_stack+0xa8/0xe0
___might_sleep+0x144/0x194
__might_sleep+0x58/0x8c
mutex_lock+0x2c/0x48
f2fs_trace_pid+0x88/0x14c
f2fs_set_node_page_dirty+0xd0/0x184
Do not use f2fs_radix_tree_insert() to avoid doing cond_resched() with
spin_lock() acquired.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cc725ef3cb202ef2019a3c67c8913efa05c3cce6 ]
In the process of creating a node, it will cause NULL pointer
dereference in kernel if o2cb_ctl failed in the interval (mkdir,
o2cb_set_node_attribute(node_num)] in function o2cb_add_node.
The node num is initialized to 0 in function o2nm_node_group_make_item,
o2nm_node_group_drop_item will mistake the node number 0 for a valid
node number when we delete the node before the node number is set
correctly. If the local node number of the current host happens to be
0, cluster->cl_local_node will be set to O2NM_INVALID_NODE_NUM while
o2hb_thread still running. The panic stack is generated as follows:
o2hb_thread
\-o2hb_do_disk_heartbeat
\-o2hb_check_own_slot
|-slot = ®->hr_slots[o2nm_this_node()];
//o2nm_this_node() return O2NM_INVALID_NODE_NUM
We need to check whether the node number is set when we delete the node.
Link: http://lkml.kernel.org/r/133d8045-72cc-863e-8eae-5013f9f6bc51@huawei.com
Signed-off-by: Jia Guo <guojia12@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 92d1d07daad65c300c7d0b68bbef8867e9895d54 ]
Kmemleak throws endless warnings during boot due to in
__alloc_alien_cache(),
alc = kmalloc_node(memsize, gfp, node);
init_arraycache(&alc->ac, entries, batch);
kmemleak_no_scan(ac);
Kmemleak does not track the array cache (alc->ac) but the alien cache
(alc) instead, so let it track the latter by lifting kmemleak_no_scan()
out of init_arraycache().
There is another place that calls init_arraycache(), but
alloc_kmem_cache_cpus() uses the percpu allocation where will never be
considered as a leak.
kmemleak: Found object by alias at 0xffff8007b9aa7e38
CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
Call trace:
dump_backtrace+0x0/0x168
show_stack+0x24/0x30
dump_stack+0x88/0xb0
lookup_object+0x84/0xac
find_and_get_object+0x84/0xe4
kmemleak_no_scan+0x74/0xf4
setup_kmem_cache_node+0x2b4/0x35c
__do_tune_cpucache+0x250/0x2d4
do_tune_cpucache+0x4c/0xe4
enable_cpucache+0xc8/0x110
setup_cpu_cache+0x40/0x1b8
__kmem_cache_create+0x240/0x358
create_cache+0xc0/0x198
kmem_cache_create_usercopy+0x158/0x20c
kmem_cache_create+0x50/0x64
fsnotify_init+0x58/0x6c
do_one_initcall+0x194/0x388
kernel_init_freeable+0x668/0x688
kernel_init+0x18/0x124
ret_from_fork+0x10/0x18
kmemleak: Object 0xffff8007b9aa7e00 (size 256):
kmemleak: comm "swapper/0", pid 1, jiffies 4294697137
kmemleak: min_count = 1
kmemleak: count = 0
kmemleak: flags = 0x1
kmemleak: checksum = 0
kmemleak: backtrace:
kmemleak_alloc+0x84/0xb8
kmem_cache_alloc_node_trace+0x31c/0x3a0
__kmalloc_node+0x58/0x78
setup_kmem_cache_node+0x26c/0x35c
__do_tune_cpucache+0x250/0x2d4
do_tune_cpucache+0x4c/0xe4
enable_cpucache+0xc8/0x110
setup_cpu_cache+0x40/0x1b8
__kmem_cache_create+0x240/0x358
create_cache+0xc0/0x198
kmem_cache_create_usercopy+0x158/0x20c
kmem_cache_create+0x50/0x64
fsnotify_init+0x58/0x6c
do_one_initcall+0x194/0x388
kernel_init_freeable+0x668/0x688
kernel_init+0x18/0x124
kmemleak: Not scanning unknown object at 0xffff8007b9aa7e38
CPU: 190 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc2+ #2
Call trace:
dump_backtrace+0x0/0x168
show_stack+0x24/0x30
dump_stack+0x88/0xb0
kmemleak_no_scan+0x90/0xf4
setup_kmem_cache_node+0x2b4/0x35c
__do_tune_cpucache+0x250/0x2d4
do_tune_cpucache+0x4c/0xe4
enable_cpucache+0xc8/0x110
setup_cpu_cache+0x40/0x1b8
__kmem_cache_create+0x240/0x358
create_cache+0xc0/0x198
kmem_cache_create_usercopy+0x158/0x20c
kmem_cache_create+0x50/0x64
fsnotify_init+0x58/0x6c
do_one_initcall+0x194/0x388
kernel_init_freeable+0x668/0x688
kernel_init+0x18/0x124
ret_from_fork+0x10/0x18
Link: http://lkml.kernel.org/r/20190129184518.39808-1-cai@lca.pw
Fixes: 1fe00d50a9e8 ("slab: factor out initialization of array cache")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit afd07389d3f4933c7f7817a92fb5e053d59a3182 ]
One of the vmalloc stress test case triggers the kernel BUG():
<snip>
[60.562151] ------------[ cut here ]------------
[60.562154] kernel BUG at mm/vmalloc.c:512!
[60.562206] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[60.562247] CPU: 0 PID: 430 Comm: vmalloc_test/0 Not tainted 4.20.0+ #161
[60.562293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[60.562351] RIP: 0010:alloc_vmap_area+0x36f/0x390
<snip>
it can happen due to big align request resulting in overflowing of
calculated address, i.e. it becomes 0 after ALIGN()'s fixup.
Fix it by checking if calculated address is within vstart/vend range.
Link: http://lkml.kernel.org/r/20190124115648.9433-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2e25644e8da4ed3a27e7b8315aaae74660be72dc ]
Syzbot with KMSAN reports (excerpt):
==================================================================
BUG: KMSAN: uninit-value in mpol_rebind_policy mm/mempolicy.c:353 [inline]
BUG: KMSAN: uninit-value in mpol_rebind_mm+0x249/0x370 mm/mempolicy.c:384
CPU: 1 PID: 17420 Comm: syz-executor4 Not tainted 4.20.0-rc7+ #15
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x173/0x1d0 lib/dump_stack.c:113
kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
__msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:295
mpol_rebind_policy mm/mempolicy.c:353 [inline]
mpol_rebind_mm+0x249/0x370 mm/mempolicy.c:384
update_tasks_nodemask+0x608/0xca0 kernel/cgroup/cpuset.c:1120
update_nodemasks_hier kernel/cgroup/cpuset.c:1185 [inline]
update_nodemask kernel/cgroup/cpuset.c:1253 [inline]
cpuset_write_resmask+0x2a98/0x34b0 kernel/cgroup/cpuset.c:1728
...
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
kmem_cache_alloc+0x572/0xb90 mm/slub.c:2777
mpol_new mm/mempolicy.c:276 [inline]
do_mbind mm/mempolicy.c:1180 [inline]
kernel_mbind+0x8a7/0x31a0 mm/mempolicy.c:1347
__do_sys_mbind mm/mempolicy.c:1354 [inline]
As it's difficult to report where exactly the uninit value resides in
the mempolicy object, we have to guess a bit. mm/mempolicy.c:353
contains this part of mpol_rebind_policy():
if (!mpol_store_user_nodemask(pol) &&
nodes_equal(pol->w.cpuset_mems_allowed, *newmask))
"mpol_store_user_nodemask(pol)" is testing pol->flags, which I couldn't
ever see being uninitialized after leaving mpol_new(). So I'll guess
it's actually about accessing pol->w.cpuset_mems_allowed on line 354,
but still part of statement starting on line 353.
For w.cpuset_mems_allowed to be not initialized, and the nodes_equal()
reachable for a mempolicy where mpol_set_nodemask() is called in
do_mbind(), it seems the only possibility is a MPOL_PREFERRED policy
with empty set of nodes, i.e. MPOL_LOCAL equivalent, with MPOL_F_LOCAL
flag. Let's exclude such policies from the nodes_equal() check. Note
the uninit access should be benign anyway, as rebinding this kind of
policy is always a no-op. Therefore no actual need for stable
inclusion.
Link: http://lkml.kernel.org/r/a71997c3-e8ae-a787-d5ce-3db05768b27c@suse.cz
Link: http://lkml.kernel.org/r/73da3e9c-cc84-509e-17d9-0c434bb9967d@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: syzbot+b19c2dc2c990ea657a71@syzkaller.appspotmail.com
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7775face207922ea62a4e96b9cd45abfdc7b9840 ]
If a memory cgroup contains a single process with many threads
(including different process group sharing the mm) then it is possible
to trigger a race when the oom killer complains that there are no oom
elible tasks and complain into the log which is both annoying and
confusing because there is no actual problem. The race looks as
follows:
P1 oom_reaper P2
try_charge try_charge
mem_cgroup_out_of_memory
mutex_lock(oom_lock)
out_of_memory
oom_kill_process(P1,P2)
wake_oom_reaper
mutex_unlock(oom_lock)
oom_reap_task
mutex_lock(oom_lock)
select_bad_process # no victim
The problem is more visible with many threads.
Fix this by checking for fatal_signal_pending from
mem_cgroup_out_of_memory when the oom_lock is already held.
The oom bypass is safe because we do the same early in the try_charge
path already. The situation migh have changed in the mean time. It
should be safe to check for fatal_signal_pending and tsk_is_oom_victim
but for a better code readability abstract the current charge bypass
condition into should_force_charge and reuse it from that path. "
Link: http://lkml.kernel.org/r/01370f70-e1f6-ebe4-b95e-0df21a0bc15e@i-love.sakura.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d342a0b38674867ea67fde47b0e1e60ffe9f17a2 ]
Since setting global init process to some memory cgroup is technically
possible, oom_kill_memcg_member() must check it.
Tasks in /test1 are going to be killed due to memory.oom.group set
Memory cgroup out of memory: Killed process 1 (systemd) total-vm:43400kB, anon-rss:1228kB, file-rss:3992kB, shmem-rss:0kB
oom_reaper: reaped process 1 (systemd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char *argv[])
{
static char buffer[10485760];
static int pipe_fd[2] = { EOF, EOF };
unsigned int i;
int fd;
char buf[64] = { };
if (pipe(pipe_fd))
return 1;
if (chdir("/sys/fs/cgroup/"))
return 1;
fd = open("cgroup.subtree_control", O_WRONLY);
write(fd, "+memory", 7);
close(fd);
mkdir("test1", 0755);
fd = open("test1/memory.oom.group", O_WRONLY);
write(fd, "1", 1);
close(fd);
fd = open("test1/cgroup.procs", O_WRONLY);
write(fd, "1", 1);
snprintf(buf, sizeof(buf) - 1, "%d", getpid());
write(fd, buf, strlen(buf));
close(fd);
snprintf(buf, sizeof(buf) - 1, "%lu", sizeof(buffer) * 5);
fd = open("test1/memory.max", O_WRONLY);
write(fd, buf, strlen(buf));
close(fd);
for (i = 0; i < 10; i++)
if (fork() == 0) {
char c;
close(pipe_fd[1]);
read(pipe_fd[0], &c, 1);
memset(buffer, 0, sizeof(buffer));
sleep(3);
_exit(0);
}
close(pipe_fd[0]);
close(pipe_fd[1]);
sleep(3);
return 0;
}
[ 37.052923][ T9185] a.out invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[ 37.056169][ T9185] CPU: 4 PID: 9185 Comm: a.out Kdump: loaded Not tainted 5.0.0-rc4-next-20190131 #280
[ 37.059205][ T9185] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 37.062954][ T9185] Call Trace:
[ 37.063976][ T9185] dump_stack+0x67/0x95
[ 37.065263][ T9185] dump_header+0x51/0x570
[ 37.066619][ T9185] ? trace_hardirqs_on+0x3f/0x110
[ 37.068171][ T9185] ? _raw_spin_unlock_irqrestore+0x3d/0x70
[ 37.069967][ T9185] oom_kill_process+0x18d/0x210
[ 37.071515][ T9185] out_of_memory+0x11b/0x380
[ 37.072936][ T9185] mem_cgroup_out_of_memory+0xb6/0xd0
[ 37.074601][ T9185] try_charge+0x790/0x820
[ 37.076021][ T9185] mem_cgroup_try_charge+0x42/0x1d0
[ 37.077629][ T9185] mem_cgroup_try_charge_delay+0x11/0x30
[ 37.079370][ T9185] do_anonymous_page+0x105/0x5e0
[ 37.080939][ T9185] __handle_mm_fault+0x9cb/0x1070
[ 37.082485][ T9185] handle_mm_fault+0x1b2/0x3a0
[ 37.083819][ T9185] ? handle_mm_fault+0x47/0x3a0
[ 37.085181][ T9185] __do_page_fault+0x255/0x4c0
[ 37.086529][ T9185] do_page_fault+0x28/0x260
[ 37.087788][ T9185] ? page_fault+0x8/0x30
[ 37.088978][ T9185] page_fault+0x1e/0x30
[ 37.090142][ T9185] RIP: 0033:0x7f8b183aefe0
[ 37.091433][ T9185] Code: 20 f3 44 0f 7f 44 17 d0 f3 44 0f 7f 47 30 f3 44 0f 7f 44 17 c0 48 01 fa 48 83 e2 c0 48 39 d1 74 a3 66 0f 1f 84 00 00 00 00 00 <66> 44 0f 7f 01 66 44 0f 7f 41 10 66 44 0f 7f 41 20 66 44 0f 7f 41
[ 37.096917][ T9185] RSP: 002b:00007fffc5d329e8 EFLAGS: 00010206
[ 37.098615][ T9185] RAX: 00000000006010e0 RBX: 0000000000000008 RCX: 0000000000c30000
[ 37.100905][ T9185] RDX: 00000000010010c0 RSI: 0000000000000000 RDI: 00000000006010e0
[ 37.103349][ T9185] RBP: 0000000000000000 R08: 00007f8b188f4740 R09: 0000000000000000
[ 37.105797][ T9185] R10: 00007fffc5d32420 R11: 00007f8b183aef40 R12: 0000000000000005
[ 37.108228][ T9185] R13: 0000000000000000 R14: ffffffffffffffff R15: 0000000000000000
[ 37.110840][ T9185] memory: usage 51200kB, limit 51200kB, failcnt 125
[ 37.113045][ T9185] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[ 37.115808][ T9185] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[ 37.117660][ T9185] Memory cgroup stats for /test1: cache:0KB rss:49484KB rss_huge:30720KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:49700KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 37.123371][ T9185] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test1,task_memcg=/test1,task=a.out,pid=9188,uid=0
[ 37.128158][ T9185] Memory cgroup out of memory: Killed process 9188 (a.out) total-vm:14456kB, anon-rss:10324kB, file-rss:504kB, shmem-rss:0kB
[ 37.132710][ T9185] Tasks in /test1 are going to be killed due to memory.oom.group set
[ 37.132833][ T54] oom_reaper: reaped process 9188 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.135498][ T9185] Memory cgroup out of memory: Killed process 1 (systemd) total-vm:43400kB, anon-rss:1228kB, file-rss:3992kB, shmem-rss:0kB
[ 37.143434][ T9185] Memory cgroup out of memory: Killed process 9182 (a.out) total-vm:14456kB, anon-rss:76kB, file-rss:588kB, shmem-rss:0kB
[ 37.144328][ T54] oom_reaper: reaped process 1 (systemd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.147585][ T9185] Memory cgroup out of memory: Killed process 9183 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
[ 37.157222][ T9185] Memory cgroup out of memory: Killed process 9184 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:508kB, shmem-rss:0kB
[ 37.157259][ T9185] Memory cgroup out of memory: Killed process 9185 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
[ 37.157291][ T9185] Memory cgroup out of memory: Killed process 9186 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:508kB, shmem-rss:0kB
[ 37.157306][ T54] oom_reaper: reaped process 9183 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.157328][ T9185] Memory cgroup out of memory: Killed process 9187 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:512kB, shmem-rss:0kB
[ 37.157452][ T9185] Memory cgroup out of memory: Killed process 9189 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
[ 37.158733][ T9185] Memory cgroup out of memory: Killed process 9190 (a.out) total-vm:14456kB, anon-rss:552kB, file-rss:512kB, shmem-rss:0kB
[ 37.160083][ T54] oom_reaper: reaped process 9186 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.160187][ T54] oom_reaper: reaped process 9189 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.206941][ T54] oom_reaper: reaped process 9185 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.212300][ T9185] Memory cgroup out of memory: Killed process 9191 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:512kB, shmem-rss:0kB
[ 37.212317][ T54] oom_reaper: reaped process 9190 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.218860][ T9185] Memory cgroup out of memory: Killed process 9192 (a.out) total-vm:14456kB, anon-rss:1080kB, file-rss:512kB, shmem-rss:0kB
[ 37.227667][ T54] oom_reaper: reaped process 9192 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.292323][ T9193] abrt-hook-ccpp (9193) used greatest stack depth: 10480 bytes left
[ 37.351843][ T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[ 37.354833][ T1] CPU: 7 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.0.0-rc4-next-20190131 #280
[ 37.357876][ T1] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 37.361685][ T1] Call Trace:
[ 37.363239][ T1] dump_stack+0x67/0x95
[ 37.365010][ T1] panic+0xfc/0x2b0
[ 37.366853][ T1] do_exit+0xd55/0xd60
[ 37.368595][ T1] do_group_exit+0x47/0xc0
[ 37.370415][ T1] get_signal+0x32a/0x920
[ 37.372449][ T1] ? _raw_spin_unlock_irqrestore+0x3d/0x70
[ 37.374596][ T1] do_signal+0x32/0x6e0
[ 37.376430][ T1] ? exit_to_usermode_loop+0x26/0x9b
[ 37.378418][ T1] ? prepare_exit_to_usermode+0xa8/0xd0
[ 37.380571][ T1] exit_to_usermode_loop+0x3e/0x9b
[ 37.382588][ T1] prepare_exit_to_usermode+0xa8/0xd0
[ 37.384594][ T1] ? page_fault+0x8/0x30
[ 37.386453][ T1] retint_user+0x8/0x18
[ 37.388160][ T1] RIP: 0033:0x7f42c06974a8
[ 37.389922][ T1] Code: Bad RIP value.
[ 37.391788][ T1] RSP: 002b:00007ffc3effd388 EFLAGS: 00010213
[ 37.394075][ T1] RAX: 000000000000000e RBX: 00007ffc3effd390 RCX: 0000000000000000
[ 37.396963][ T1] RDX: 000000000000002a RSI: 00007ffc3effd390 RDI: 0000000000000004
[ 37.399550][ T1] RBP: 00007ffc3effd680 R08: 0000000000000000 R09: 0000000000000000
[ 37.402334][ T1] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000001
[ 37.404890][ T1] R13: ffffffffffffffff R14: 0000000000000884 R15: 000056460b1ac3b0
Link: http://lkml.kernel.org/r/201902010336.x113a4EO027170@www262.sakura.ne.jp
Fixes: 3d8b38eb81cac813 ("mm, oom: introduce memory.oom.group")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bc8ff3ca6589d63c6d10f5ee8bed38f74851b469 ]
The descriptions of userspace memory access functions had minor issues
with formatting that made kernel-doc unable to properly detect the
function/macro names and the return value sections:
./arch/x86/include/asm/uaccess.h:80: info: Scanning doc for
./arch/x86/include/asm/uaccess.h:139: info: Scanning doc for
./arch/x86/include/asm/uaccess.h:231: info: Scanning doc for
./arch/x86/include/asm/uaccess.h:505: info: Scanning doc for
./arch/x86/include/asm/uaccess.h:530: info: Scanning doc for
./arch/x86/lib/usercopy_32.c:58: info: Scanning doc for
./arch/x86/lib/usercopy_32.c:69: warning: No description found for return
value of 'clear_user'
./arch/x86/lib/usercopy_32.c:78: info: Scanning doc for
./arch/x86/lib/usercopy_32.c:90: warning: No description found for return
value of '__clear_user'
Fix the formatting.
Link: http://lkml.kernel.org/r/1549549644-4903-3-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c10d38cc8d3e43f946b6c2bf4602c86791587f30 ]
Dan Carpenter reports a potential NULL dereference in
get_swap_page_of_type:
Smatch complains that the NULL checks on "si" aren't consistent. This
seems like a real bug because we have not ensured that the type is
valid and so "si" can be NULL.
Add the missing check for NULL, taking care to use a read barrier to
ensure CPU1 observes CPU0's updates in the correct order:
CPU0 CPU1
alloc_swap_info() if (type >= nr_swapfiles)
swap_info[type] = p /* handle invalid entry */
smp_wmb() smp_rmb()
++nr_swapfiles p = swap_info[type]
Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
CPU0's write to swap_info[type] and read NULL from swap_info[type].
Ying Huang noticed other places in swapfile.c don't order these reads
properly. Introduce swap_type_to_swap_info to encourage correct usage.
Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
(see tools/memory-model/Documentation/explanation.txt).
This ordering need not be enforced in places where swap_lock is held
(e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
and the swap_info array.
Link: http://lkml.kernel.org/r/20190131024410.29859-1-daniel.m.jordan@oracle.com
Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0c81585499601acd1d0e1cbf424cabfaee60628c ]
After offlining a memory block, kmemleak scan will trigger a crash, as
it encounters a page ext address that has already been freed during
memory offlining. At the beginning in alloc_page_ext(), it calls
kmemleak_alloc(), but it does not call kmemleak_free() in
free_page_ext().
BUG: unable to handle kernel paging request at ffff888453d00000
PGD 128a01067 P4D 128a01067 PUD 128a04067 PMD 47e09e067 PTE 800ffffbac2ff060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
CPU: 1 PID: 1594 Comm: bash Not tainted 5.0.0-rc8+ #15
Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
RIP: 0010:scan_block+0xb5/0x290
Code: 85 6e 01 00 00 48 b8 00 00 30 f5 81 88 ff ff 48 39 c3 0f 84 5b 01 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 0f 85 87 01 00 00 <4c> 8b 3b e8 f3 0c fa ff 4c 39 3d 0c 6b 4c 01 0f 87 08 01 00 00 4c
RSP: 0018:ffff8881ec57f8e0 EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffff888453d00000 RCX: ffffffffa61e5a54
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888453d00000
RBP: ffff8881ec57f920 R08: fffffbfff4ed588d R09: fffffbfff4ed588c
R10: fffffbfff4ed588c R11: ffffffffa76ac463 R12: dffffc0000000000
R13: ffff888453d00ff9 R14: ffff8881f80cef48 R15: ffff8881f80cef48
FS: 00007f6c0e3f8740(0000) GS:ffff8881f7680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888453d00000 CR3: 00000001c4244003 CR4: 00000000001606a0
Call Trace:
scan_gray_list+0x269/0x430
kmemleak_scan+0x5a8/0x10f0
kmemleak_write+0x541/0x6ca
full_proxy_write+0xf8/0x190
__vfs_write+0xeb/0x980
vfs_write+0x15a/0x4f0
ksys_write+0xd2/0x1b0
__x64_sys_write+0x73/0xb0
do_syscall_64+0xeb/0xaaa
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f6c0dad73b8
Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 63 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
RSP: 002b:00007ffd5b863cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f6c0dad73b8
RDX: 0000000000000005 RSI: 000055a9216e1710 RDI: 0000000000000001
RBP: 000055a9216e1710 R08: 000000000000000a R09: 00007ffd5b863840
R10: 000000000000000a R11: 0000000000000246 R12: 00007f6c0dda9780
R13: 0000000000000005 R14: 00007f6c0dda4740 R15: 0000000000000005
Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci libahci igb i2c_algo_bit libata i2c_core dm_mirror dm_region_hash dm_log dm_mod efivarfs
CR2: ffff888453d00000
---[ end trace ccf646c7456717c5 ]---
Kernel panic - not syncing: Fatal exception
Shutting down cpus with NMI
Kernel Offset: 0x24c00000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---
Link: http://lkml.kernel.org/r/20190227173147.75650-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 0d3bd18a5efd66097ef58622b898d3139790aa9d ]
In case cma_init_reserved_mem failed, need to free the memblock
allocated by memblock_reserve or memblock_alloc_range.
Quote Catalin's comments:
https://lkml.org/lkml/2019/2/26/482
Kmemleak is supposed to work with the memblock_{alloc,free} pair and it
ignores the memblock_reserve() as a memblock_alloc() implementation
detail. It is, however, tolerant to memblock_free() being called on
a sub-range or just a different range from a previous memblock_alloc().
So the original patch looks fine to me. FWIW:
Link: http://lkml.kernel.org/r/20190227144631.16708-1-peng.fan@nxp.com
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d778015ac95bc036af73342c878ab19250e01fe1 ]
next_present_section_nr() could only return an unsigned number -1, so
just check it specifically where compilers will convert -1 to unsigned
if needed.
mm/sparse.c: In function 'sparse_init_nid':
mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
((section_nr >= 0) && \
^~
mm/sparse.c:478:2: note: in expansion of macro
'for_each_present_section_nr'
for_each_present_section_nr(pnum_begin, pnum) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
((section_nr >= 0) && \
^~
mm/sparse.c:497:2: note: in expansion of macro
'for_each_present_section_nr'
for_each_present_section_nr(pnum_begin, pnum) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/sparse.c: In function 'sparse_init':
mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
((section_nr >= 0) && \
^~
mm/sparse.c:520:2: note: in expansion of macro
'for_each_present_section_nr'
for_each_present_section_nr(pnum_begin + 1, pnum_end) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
Link: http://lkml.kernel.org/r/20190228181839.86504-1-cai@lca.pw
Fixes: c4e1be9ec113 ("mm, sparsemem: break out of loops early")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e34c940245437f36d2c492edd1f8237eff391064 ]
Ravi Bangoria reported that we fail with an empty NUMA node with the
following message:
$ lscpu
NUMA node0 CPU(s):
NUMA node1 CPU(s): 0-4
$ sudo ./perf c2c report
node/cpu topology bugFailed setup nodes
Fix this by detecting the empty node and keeping its CPU set empty.
Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190305152536.21035-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 179fb36abb097976997f50733d5b122a29158cba ]
After commit 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments"),
kexec fails with a kernel panic:
kexec_core: Starting new kernel
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 03/02/2018
RIP: 0010:0xffffc9000001d000
Call Trace:
? __send_ipi_mask+0x1c6/0x2d0
? hv_send_ipi_mask_allbutself+0x6d/0xb0
? mp_save_irq+0x70/0x70
? __ioapic_read_entry+0x32/0x50
? ioapic_read_entry+0x39/0x50
? clear_IO_APIC_pin+0xb8/0x110
? native_stop_other_cpus+0x6e/0x170
? native_machine_shutdown+0x22/0x40
? kernel_kexec+0x136/0x156
That happens if hypercall based IPIs are used because the hypercall page is
reset very early upon kexec reboot, but kexec sends IPIs to stop CPUs,
which invokes the hypercall and dereferences the unusable page.
To fix his, reset hv_hypercall_pg to NULL before the page is reset to avoid
any misuse, IPI sending will fall back to the non hypercall based
method. This only happens on kexec / kdump so just setting the pointer to
NULL is good enough.
Fixes: 68bb7bfb7985 ("X86/Hyper-V: Enable IPI enlightenments")
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: devel@linuxdriverproject.org
Link: https://lkml.kernel.org/r/20190306111827.14131-1-kasong@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e0f0ae838a25464179d37f355d763f9ec139fc15 ]
The pm8xxx_get_channel() implementation is unclear, and causes gcc to
suddenly generate odd warnings. The trigger for the warning (at least
for me) was the entirely unrelated commit 79a4e91d1bb2 ("device.h: Add
__cold to dev_<level> logging functions"), which apparently changes gcc
code generation in the caller function enough to cause this:
drivers/iio/adc/qcom-pm8xxx-xoadc.c: In function ‘pm8xxx_xoadc_probe’:
drivers/iio/adc/qcom-pm8xxx-xoadc.c:633:8: warning: ‘ch’ may be used uninitialized in this function [-Wmaybe-uninitialized]
ret = pm8xxx_read_channel_rsv(adc, ch, AMUX_RSV4,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
&read_nomux_rsv4, true);
~~~~~~~~~~~~~~~~~~~~~~~
drivers/iio/adc/qcom-pm8xxx-xoadc.c:426:27: note: ‘ch’ was declared here
struct pm8xxx_chan_info *ch;
^~
because gcc for some reason then isn't able to see that the termination
condition for the "for( )" loop in that function is also the condition
for returning NULL.
So it's not _actually_ uninitialized, but the function is admittedly
just unnecessarily oddly written.
Simplify and clarify the function, making gcc also see that it always
returns a valid initialized value.
Cc: Joe Perches <joe@perches.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andy Gross <andy.gross@linaro.org>
Cc: David Brown <david.brown@linaro.org>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|