| Age | Commit message (Collapse) | Author | Files | Lines |
|
Tested-by: Pavel Machek (CIP) <pavel@denx.de>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20201120104539.534424264@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 51b958e5aeb1e18c00332e0b37c5d4e95a3eff84 upstream.
The instruction emulator ignores clflush instructions, yet fails to
support clflushopt. Treat both similarly.
Fixes: 13e457e0eebf ("KVM: x86: Emulator does not decode clflush well")
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20201103120400.240882-1-david.edmondson@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dcd479e10a0510522a5d88b29b8f79ea3467d501 upstream.
When (for example) an IBSS station is pre-moved to AUTHORIZED
before it's inserted, and then the insertion fails, we don't
clean up the fast RX/TX states that might already have been
created, since we don't go through all the state transitions
again on the way down.
Do that, if it hasn't been done already, when the station is
freed. I considered only freeing the fast TX/RX state there,
but we might add more state so it's more robust to wind down
the state properly.
Note that we warn if the station was ever inserted, it should
have been properly cleaned up in that case, and the driver
will probably not like things happening out of order.
Reported-by: syzbot+2e293dbd67de2836ba42@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20201009141710.7223b322a955.I95bd08b9ad0e039c034927cce0b75beea38e059b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 77e70d351db7de07a46ac49b87a6c3c7a60fca7e upstream.
We need to make sure we cancel the reinit work before we tear down the
driver structures.
Reported-by: Bodong Zhao <nopitydays@gmail.com>
Tested-by: Bodong Zhao <nopitydays@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 29daf869cbab69088fe1755d9dd224e99ba78b56 upstream.
The kernel expects pte_young() to work regardless of CONFIG_SWAP.
Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.
This adds at least 3 instructions to the TLB miss exception
handlers fast path. Following patch will reduce this overhead.
Also update the rotation instruction to the correct number of bits
to reflect all changes done to _PAGE_ACCESSED over time.
Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.")
Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU")
Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits")
Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC")
Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit afca6c5b2595fc44383919fba740c194b0b76aff upstream
A recent fuzzed filesystem image cached random dcache corruption
when the reproducer was run. This often showed up as panics in
lookup_slow() on a null inode->i_ops pointer when doing pathwalks.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
....
Call Trace:
lookup_slow+0x44/0x60
walk_component+0x3dd/0x9f0
link_path_walk+0x4a7/0x830
path_lookupat+0xc1/0x470
filename_lookup+0x129/0x270
user_path_at_empty+0x36/0x40
path_listxattr+0x98/0x110
SyS_listxattr+0x13/0x20
do_syscall_64+0xf5/0x280
entry_SYSCALL_64_after_hwframe+0x42/0xb7
but had many different failure modes including deadlocks trying to
lock the inode that was just allocated or KASAN reports of
use-after-free violations.
The cause of the problem was a corrupt INOBT on a v4 fs where the
root inode was marked as free in the inobt record. Hence when we
allocated an inode, it chose the root inode to allocate, found it in
the cache and re-initialised it.
We recently fixed a similar inode allocation issue caused by inobt
record corruption problem in xfs_iget_cache_miss() in commit
ee457001ed6c ("xfs: catch inode allocation state mismatch
corruption"). This change adds similar checks to the cache-hit path
to catch it, and turns the reproducer into a corruption shutdown
situation.
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix typos in comment]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sudip: use ip->i_d.di_mode instead of VFS_I(ip)->i_mode]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ee457001ed6c6f31ddad69c24c1da8f377d8472d upstream
We recently came across a V4 filesystem causing memory corruption
due to a newly allocated inode being setup twice and being added to
the superblock inode list twice. From code inspection, the only way
this could happen is if a newly allocated inode was not marked as
free on disk (i.e. di_mode wasn't zero).
Running the metadump on an upstream debug kernel fails during inode
allocation like so:
XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inod=
e.c, line: 838
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:114!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/0=
1/2014
RIP: 0010:assfail+0x28/0x30
RSP: 0018:ffffc9000236fc80 EFLAGS: 00010202
RAX: 00000000ffffffea RBX: 0000000000004000 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff8227211b
RBP: ffffc9000236fce8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000bec R11: f000000000000000 R12: ffffc9000236fd30
R13: ffff8805c76bab80 R14: ffff8805c77ac800 R15: ffff88083fb12e10
FS: 00007fac8cbff040(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000=
000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffa6783ff8 CR3: 00000005c6e2b003 CR4: 00000000000606e0
Call Trace:
xfs_ialloc+0x383/0x570
xfs_dir_ialloc+0x6a/0x2a0
xfs_create+0x412/0x670
xfs_generic_create+0x1f7/0x2c0
? capable_wrt_inode_uidgid+0x3f/0x50
vfs_mkdir+0xfb/0x1b0
SyS_mkdir+0xcf/0xf0
do_syscall_64+0x73/0x1a0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Extracting the inode number we crashed on from an event trace and
looking at it with xfs_db:
xfs_db> inode 184452204
xfs_db> p
core.magic = 0x494e
core.mode = 0100644
core.version = 2
core.format = 2 (extents)
core.nlinkv2 = 1
core.onlink = 0
.....
Confirms that it is not a free inode on disk. xfs_repair
also trips over this inode:
.....
zero length extent (off = 0, fsbno = 0) in ino 184452204
correcting nextents for inode 184452204
bad attribute fork in inode 184452204, would clear attr fork
bad nblocks 1 for inode 184452204, would reset to 0
bad anextents 1 for inode 184452204, would reset to 0
imap claims in-use inode 184452204 is free, would correct imap
would have cleared inode 184452204
.....
disconnected inode 184452204, would move to lost+found
And so we have a situation where the directory structure and the
inobt thinks the inode is free, but the inode on disk thinks it is
still in use. Where this corruption came from is not possible to
diagnose, but we can detect it and prevent the kernel from oopsing
on lookup. The reproducer now results in:
$ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5}
mkdir: cannot create directory =E2=80=98/mnt/scratch/00=E2=80=99: File ex=
ists
mkdir: cannot create directory =E2=80=98/mnt/scratch/01=E2=80=99: File ex=
ists
mkdir: cannot create directory =E2=80=98/mnt/scratch/03=E2=80=99: Structu=
re needs cleaning
mkdir: cannot create directory =E2=80=98/mnt/scratch/04=E2=80=99: Input/o=
utput error
mkdir: cannot create directory =E2=80=98/mnt/scratch/05=E2=80=99: Input/o=
utput error
....
And this corruption shutdown:
[ 54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not=
marked free on disk
[ 54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 =
of file fs/xfs/xfs_trans.c. Caller xfs_create+0x425/0x670
[ 54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #=
443
[ 54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO=
S 1.10.2-1 04/01/2014
[ 54.852859] Call Trace:
[ 54.853531] dump_stack+0x85/0xc5
[ 54.854385] xfs_trans_cancel+0x197/0x1c0
[ 54.855421] xfs_create+0x425/0x670
[ 54.856314] xfs_generic_create+0x1f7/0x2c0
[ 54.857390] ? capable_wrt_inode_uidgid+0x3f/0x50
[ 54.858586] vfs_mkdir+0xfb/0x1b0
[ 54.859458] SyS_mkdir+0xcf/0xf0
[ 54.860254] do_syscall_64+0x73/0x1a0
[ 54.861193] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 54.862492] RIP: 0033:0x7fb73bddf547
[ 54.863358] RSP: 002b:00007ffdaa553338 EFLAGS: 00000246 ORIG_RAX: 0000=
000000000053
[ 54.865133] RAX: ffffffffffffffda RBX: 00007ffdaa55449a RCX: 00007fb73=
bddf547
[ 54.866766] RDX: 0000000000000001 RSI: 00000000000001ff RDI: 00007ffda=
a55449a
[ 54.868432] RBP: 00007ffdaa55449a R08: 00000000000001ff R09: 00005623a=
8670dd0
[ 54.870110] R10: 00007fb73be72d5b R11: 0000000000000246 R12: 000000000=
00001ff
[ 54.871752] R13: 00007ffdaa5534b0 R14: 0000000000000000 R15: 00007ffda=
a553500
[ 54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1=
024 of file fs/xfs/xfs_trans.c. Return address = ffffffff814cd050
[ 54.882790] XFS (loop0): Corruption of in-memory data detected. Shutt=
ing down filesystem
[ 54.884597] XFS (loop0): Please umount the filesystem and rectify the =
problem(s)
Note that this crash is only possible on v4 filesystemsi or v5
filesystems mounted with the ikeep mount option. For all other V5
filesystems, this problem cannot occur because we don't read inodes
we are allocating from disk - we simply overwrite them with the new
inode information.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sudip: use ip->i_d.di_mode instead of VFS_I(ip)->i_mode]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e50e4f0b85be308a01b830c5fbdffc657e1a6dd0 upstream
If interrupt comes late, during probe error path or device remove (could
be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler
i2c_imx_isr() will access registers with the clock being disabled. This
leads to external abort on non-linefetch on Toradex Colibri VF50 module
(with Vybrid VF5xx):
Unhandled fault: external abort on non-linefetch (0x1008) at 0x8882d003
Internal error: : 1008 [#1] ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 5.7.0 #607
Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
(i2c_imx_isr) from [<8017009c>] (free_irq+0x25c/0x3b0)
(free_irq) from [<805844ec>] (release_nodes+0x178/0x284)
(release_nodes) from [<80580030>] (really_probe+0x10c/0x348)
(really_probe) from [<80580380>] (driver_probe_device+0x60/0x170)
(driver_probe_device) from [<80580630>] (device_driver_attach+0x58/0x60)
(device_driver_attach) from [<805806bc>] (__driver_attach+0x84/0xc0)
(__driver_attach) from [<8057e228>] (bus_for_each_dev+0x68/0xb4)
(bus_for_each_dev) from [<8057f3ec>] (bus_add_driver+0x144/0x1ec)
(bus_add_driver) from [<80581320>] (driver_register+0x78/0x110)
(driver_register) from [<8010213c>] (do_one_initcall+0xa8/0x2f4)
(do_one_initcall) from [<80c0100c>] (kernel_init_freeable+0x178/0x1dc)
(kernel_init_freeable) from [<80807048>] (kernel_init+0x8/0x110)
(kernel_init) from [<80100114>] (ret_from_fork+0x14/0x20)
Additionally, the i2c_imx_isr() could wake up the wait queue
(imx_i2c_struct->queue) before its initialization happens.
The resource-managed framework should not be used for interrupt handling,
because the resource will be released too late - after disabling clocks.
The interrupt handler is not prepared for such case.
Fixes: 1c4b6c3bcf30 ("i2c: imx: implement bus recovery")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
[sudip: manual backport with extra label for goto]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9a32a7e78bd0cd9a9b6332cbdc345ee5ffd0c5de upstream.
IBM Power9 processors can speculatively operate on data in the L1 cache before
it has been completely validated, via a way-prediction mechanism. It is not possible
for an attacker to determine the contents of impermissible memory using this method,
since these systems implement a combination of hardware and software security measures
to prevent scenarios where protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that the
attacker controls. This can be used for example to speculatively bypass "kernel
user access prevention" techniques, as discovered by Anthony Steinhauser of
Google's Safeside Project. This is not an attack by itself, but there is a possibility
it could be used in conjunction with side-channels or other weaknesses in the
privileged code to construct an attack.
This issue can be mitigated by flushing the L1 cache between privilege boundaries
of concern. This patch flushes the L1 cache after user accesses.
This is part of the fix for CVE-2020-4788.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d02f6b7dab8228487268298ea1f21081c0b4b3eb upstream.
get/put_user() can be called with nontrivial arguments. fs/proc/page.c
has a good example:
if (put_user(stable_page_flags(ppage), out)) {
stable_page_flags() is quite a lot of code, including spin locks in
the page allocator.
Ensure these arguments are evaluated before user access is allowed.
This improves security by reducing code with access to userspace, but
it also fixes a PREEMPT bug with KUAP on powerpc/64s:
stable_page_flags() is currently called with AMR set to allow writes,
it ends up calling spin_unlock(), which can call preempt_schedule. But
the task switch code can not be called with AMR set (it relies on
interrupts saving the register), so this blows up.
It's fine if the code inside allow_user_access() is preemptible,
because a timer or IPI will save the AMR, but it's not okay to
explicitly cause a reschedule.
Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200407041245.600651-1-npiggin@gmail.com
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 61e3acd8c693a14fc69b824cb5b08d02cb90a6e7 upstream.
The KUAP implementation adds calls in clear_user() to enable and
disable access to userspace memory. However, it doesn't add these to
__clear_user(), which is used in the ptrace regset code.
As there's only one direct user of __clear_user() (the regset code),
and the time taken to set the AMR for KUAP purposes is going to
dominate the cost of a quick access_ok(), there's not much point
having a separate path.
Rename __clear_user() to __arch_clear_user(), and make __clear_user()
just call clear_user().
Reported-by: syzbot+f25ecf4b2982d8c7a640@syzkaller-ppc64.appspotmail.com
Reported-by: Daniel Axtens <dja@axtens.net>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection")
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
[mpe: Use __arch_clear_user() for the asm version like arm64 & nds32]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191209132221.15328-1-ajd@linux.ibm.com
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5cd623333e7cf4e3a334c70529268b65f2a6c2c7 upstream.
Today, when a function like strncpy_from_user() is called,
the userspace access protection is de-activated and re-activated
for every word read.
By implementing user_access_begin and friends, the protection
is de-activated at the beginning of the copy and re-activated at the
end.
Implement user_access_begin(), user_access_end() and
unsafe_get_user(), unsafe_put_user() and unsafe_copy_to_user()
For the time being, we keep user_access_save() and
user_access_restore() as nops.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/36d4fbf9e56a75994aca4ee2214c77b26a5a8d35.1579866752.git.christophe.leroy@c-s.fr
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Backported from commit de78a9c42a79 ("powerpc: Add a framework
for Kernel Userspace Access Protection"). Here we don't try to
add the KUAP framework, we just want the helper functions
because we want to put uaccess flush helpers in them.
In terms of fixes, we don't need commit 1d8f739b07bd ("powerpc/kuap:
Fix set direction in allow/prevent_user_access()") as we don't have
real KUAP. Likewise as all our allows are noops and all our prevents
are just flushes, we don't need commit 9dc086f1e9ef ("powerpc/futex:
Fix incorrect user access blocking") The other 2 fixes we do need.
The original description is:
This patch implements a framework for Kernel Userspace Access
Protection.
Then subarches will have the possibility to provide their own
implementation by providing setup_kuap() and
allow/prevent_user_access().
Some platforms will need to know the area accessed and whether it is
accessed from read, write or both. Therefore source, destination and
size and handed over to the two functions.
mpe: Rename to allow/prevent rather than unlock/lock, and add
read/write wrappers. Drop the 32-bit code for now until we have an
implementation for it. Add kuap to pt_regs for 64-bit as well as
32-bit. Don't split strings, use pr_crit_ratelimited().
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f79643787e0a0762d2409b7b8334e83f22d85695 upstream.
IBM Power9 processors can speculatively operate on data in the L1 cache before
it has been completely validated, via a way-prediction mechanism. It is not possible
for an attacker to determine the contents of impermissible memory using this method,
since these systems implement a combination of hardware and software security measures
to prevent scenarios where protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that the
attacker controls. This can be used for example to speculatively bypass "kernel
user access prevention" techniques, as discovered by Anthony Steinhauser of
Google's Safeside Project. This is not an attack by itself, but there is a possibility
it could be used in conjunction with side-channels or other weaknesses in the
privileged code to construct an attack.
This issue can be mitigated by flushing the L1 cache between privilege boundaries
of concern. This patch flushes the L1 cache on kernel entry.
This is part of the fix for CVE-2020-4788.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
(backport only)
We're about to grow the exception handlers, which will make a bunch of them
no longer fit within the space available. We move them out of line.
This is a fiddly and error-prone business, so in the interests of reviewability
I haven't merged this in with the addition of the entry flush.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add a definition provided by mpe and fixed up for 4.4. It doesn't exist
for 4.4 and we'd quite like to use it.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Tested-by: Pavel Machek (CIP) <pavel@denx.de>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20201117122106.144800239@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 57c176074057531b249cf522d90c22313fa74b0b upstream.
When converting trailing spaces and periods in paths, do so
for every component of the path, not just the last component.
If the conversion is not done for every path component, then
subsequent operations in directories with trailing spaces or
periods (e.g. create(), mkdir()) will fail with ENOENT. This
is because on the server, the directory will have a special
symbol in its name, and the client needs to provide the same.
Signed-off-by: Boris Protopopov <pboris@amazon.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cb8d53d2c97369029cc638c9274ac7be0a316c75 upstream.
ext4_unregister_sysfs() only deletes the kobject. The reference to it
needs to be put separately, like ext4_put_super() does.
This addresses the syzbot report
"memory leak in kobject_set_name_vargs (3)"
(https://syzkaller.appspot.com/bug?extid=9f864abad79fae7c17e1).
Reported-by: syzbot+9f864abad79fae7c17e1@syzkaller.appspotmail.com
Fixes: 72ba74508b28 ("ext4: release sysfs kobject when failing to enable quotas on mount")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20200922162456.93657-1-ebiggers@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit df5b0ab3e08a156701b537809914b339b0daa526 upstream.
Limit the CPU number to num_possible_cpus(), because setting it to a
value lower than INT_MAX but higher than NR_CPUS produces the following
error on reboot and shutdown:
BUG: unable to handle page fault for address: ffffffff90ab1bb0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1c09067 P4D 1c09067 PUD 1c0a063 PMD 0
Oops: 0000 [#1] SMP
CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 5.9.0-rc8-kvm #110
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
RIP: 0010:migrate_to_reboot_cpu+0xe/0x60
Code: ea ea 00 48 89 fa 48 c7 c7 30 57 f1 81 e9 fa ef ff ff 66 2e 0f 1f 84 00 00 00 00 00 53 8b 1d d5 ea ea 00 e8 14 33 fe ff 89 da <48> 0f a3 15 ea fc bd 00 48 89 d0 73 29 89 c2 c1 e8 06 65 48 8b 3c
RSP: 0018:ffffc90000013e08 EFLAGS: 00010246
RAX: ffff88801f0a0000 RBX: 0000000077359400 RCX: 0000000000000000
RDX: 0000000077359400 RSI: 0000000000000002 RDI: ffffffff81c199e0
RBP: ffffffff81c1e3c0 R08: ffff88801f41f000 R09: ffffffff81c1e348
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 00007f32bedf8830 R14: 00000000fee1dead R15: 0000000000000000
FS: 00007f32bedf8980(0000) GS:ffff88801f480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff90ab1bb0 CR3: 000000001d057000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__do_sys_reboot.cold+0x34/0x5b
do_syscall_64+0x2d/0x40
Fixes: 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic kernel")
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201103214025.116799-3-mcroce@linux.microsoft.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[sudip: use reboot_mode instead of mode]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8b92c4ff4423aa9900cf838d3294fcade4dbda35 upstream.
Patch series "fix parsing of reboot= cmdline", v3.
The parsing of the reboot= cmdline has two major errors:
- a missing bound check can crash the system on reboot
- parsing of the cpu number only works if specified last
Fix both.
This patch (of 2):
This reverts commit 616feab753972b97.
kstrtoint() and simple_strtoul() have a subtle difference which makes
them non interchangeable: if a non digit character is found amid the
parsing, the former will return an error, while the latter will just
stop parsing, e.g. simple_strtoul("123xyx") = 123.
The kernel cmdline reboot= argument allows to specify the CPU used for
rebooting, with the syntax `s####` among the other flags, e.g.
"reboot=warm,s31,force", so if this flag is not the last given, it's
silently ignored as well as the subsequent ones.
Fixes: 616feab75397 ("kernel/reboot.c: convert simple_strtoul to kstrtoint")
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201103214025.116799-2-mcroce@linux.microsoft.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[sudip: use reboot_mode instead of mode]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f91072ed1b7283b13ca57fcfbece5a3b92726143 upstream.
There's a possible race in perf_mmap_close() when checking ring buffer's
mmap_count refcount value. The problem is that the mmap_count check is
not atomic because we call atomic_dec() and atomic_read() separately.
perf_mmap_close:
...
atomic_dec(&rb->mmap_count);
...
if (atomic_read(&rb->mmap_count))
goto out_put;
<ring buffer detach>
free_uid
out_put:
ring_buffer_put(rb); /* could be last */
The race can happen when we have two (or more) events sharing same ring
buffer and they go through atomic_dec() and then they both see 0 as refcount
value later in atomic_read(). Then both will go on and execute code which
is meant to be run just once.
The code that detaches ring buffer is probably fine to be executed more
than once, but the problem is in calling free_uid(), which will later on
demonstrate in related crashes and refcount warnings, like:
refcount_t: addition on 0; use-after-free.
...
RIP: 0010:refcount_warn_saturate+0x6d/0xf
...
Call Trace:
prepare_creds+0x190/0x1e0
copy_creds+0x35/0x172
copy_process+0x471/0x1a80
_do_fork+0x83/0x3a0
__do_sys_wait4+0x83/0x90
__do_sys_clone+0x85/0xa0
do_syscall_64+0x5b/0x1e0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Using atomic decrease and check instead of separated calls.
Tested-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Wade Mealing <wmealing@redhat.com>
Fixes: 9bb5d40cd93c ("perf: Fix mmap() accounting hole");
Link: https://lore.kernel.org/r/20200916115311.GE2301783@krava
[sudip: backport to v4.9.y by using ring_buffer]
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5f7f77400ab5b357b5fdb7122c3442239672186c upstream.
In order to avoid high dom0 load due to rogue guests sending events at
high frequency, block those events in case there was no action needed
in dom0 to handle the events.
This is done by adding a per-event counter, which set to zero in case
an EOI without the XEN_EOI_FLAG_SPURIOUS is received from a backend
driver, and incremented when this flag has been set. In case the
counter is 2 or higher delay the EOI by 1 << (cnt - 2) jiffies, but
not more than 1 second.
In order not to waste memory shorten the per-event refcnt to two bytes
(it should normally never exceed a value of 2). Add an overflow check
to evtchn_get() to make sure the 2 bytes really won't overflow.
This is part of XSA-332.
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e99502f76271d6bc4e374fe368c50c67a1fd3070 upstream.
In case rogue guests are sending events at high frequency it might
happen that xen_evtchn_do_upcall() won't stop processing events in
dom0. As this is done in irq handling a crash might be the result.
In order to avoid that, delay further inter-domain events after some
time in xen_evtchn_do_upcall() by forcing eoi processing into a
worker on the same cpu, thus inhibiting new events coming in.
The time after which eoi processing is to be delayed is configurable
via a new module parameter "event_loop_timeout" which specifies the
maximum event loop time in jiffies (default: 2, the value was chosen
after some tests showing that a value of 2 was the lowest with an
only slight drop of dom0 network throughput while multiple guests
performed an event storm).
How long eoi processing will be delayed can be specified via another
parameter "event_eoi_delay" (again in jiffies, default 10, again the
value was chosen after testing with different delay values).
This is part of XSA-332.
Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7beb290caa2adb0a399e735a1e175db9aae0523a upstream.
Today only fifo event channels have a cpu hotplug callback. In order
to prepare for more percpu (de)init work move that callback into
events_base.c and add percpu_init() and percpu_deinit() hooks to
struct evtchn_ops.
This is part of XSA-332.
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c44b849cee8c3ac587da3b0980e01f77500d158c upstream.
Instead of disabling the irq when an event is received and enabling
it again when handled by the user process use the lateeoi model.
This is part of XSA-332.
Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c2711441bc961b37bba0615dd7135857d189035f upstream.
In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving pcifront use the lateeoi irq
binding for pciback and unmask the event channel only just before
leaving the event handling function.
Restructure the handling to support that scheme. Basically an event can
come in for two reasons: either a normal request for a pciback action,
which is handled in a worker, or in case the guest has finished an AER
request which was requested by pciback.
When an AER request is issued to the guest and a normal pciback action
is currently active issue an EOI early in order to be able to receive
another event when the AER request has been finished by the guest.
Let the worker processing the normal requests run until no further
request is pending, instead of starting a new worker ion that case.
Issue the EOI only just before leaving the worker.
This scheme allows to drop calling the generic function
xen_pcibk_test_and_schedule_op() after processing of any request as
the handling of both request types is now separated more cleanly.
This is part of XSA-332.
Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 86991b6e7ea6c613b7692f65106076943449b6b7 upstream.
In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving scsifront use the lateeoi
irq binding for scsiback and unmask the event channel only just before
leaving the event handling function.
In case of a ring protocol error don't issue an EOI in order to avoid
the possibility to use that for producing an event storm. This at once
will result in no further call of scsiback_irq_fn(), so the ring_error
struct member can be dropped and scsiback_do_cmd_fn() can signal the
protocol error via a negative return value.
This is part of XSA-332.
Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 23025393dbeb3b8b3b60ebfa724cdae384992e27 upstream.
In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving netfront use the lateeoi
irq binding for netback and unmask the event channel only just before
going to sleep waiting for new events.
Make sure not to issue an EOI when none is pending by introducing an
eoi_pending element to struct xenvif_queue.
When no request has been consumed set the spurious flag when sending
the EOI for an interrupt.
This is part of XSA-332.
Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 01263a1fabe30b4d542f34c7e2364a22587ddaf2 upstream.
In order to reduce the chance for the system becoming unresponsive due
to event storms triggered by a misbehaving blkfront use the lateeoi
irq binding for blkback and unmask the event channel only after
processing all pending requests.
As the thread processing requests is used to do purging work in regular
intervals an EOI may be sent only after having received an event. If
there was no pending I/O request flag the EOI as spurious.
This is part of XSA-332.
Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 54c9de89895e0a36047fcc4ae754ea5b8655fb9d upstream.
In order to avoid tight event channel related IRQ loops add a new
framework of "late EOI" handling: the IRQ the event channel is bound
to will be masked until the event has been handled and the related
driver is capable to handle another event. The driver is responsible
for unmasking the event channel via the new function xen_irq_lateeoi().
This is similar to binding an event channel to a threaded IRQ, but
without having to structure the driver accordingly.
In order to support a future special handling in case a rogue guest
is sending lots of unsolicited events, add a flag to xen_irq_lateeoi()
which can be set by the caller to indicate the event was a spurious
one.
This is part of XSA-332.
Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f01337197419b7e8a492e83089552b77d3b5fb90 upstream.
Unmasking a fifo event channel can result in unmasking it twice, once
directly in the kernel and once via a hypercall in case the event was
pending.
Fix that by doing the local unmask only if the event is not pending.
This is part of XSA-332.
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4d3fe31bd993ef504350989786858aefdb877daa upstream.
A follow-up patch will require certain write to happen before an event
channel is unmasked.
While the memory barrier is not strictly necessary for all the callers,
the main one will need it. In order to avoid an extra memory barrier
when using fifo event channels, mandate evtchn_unmask() to provide
write ordering.
The 2-level event handling unmask operation is missing an appropriate
barrier, so add it. Fifo event channels are fine in this regard due to
using sync_cmpxchg().
This is part of XSA-332.
Cc: stable@vger.kernel.org
Suggested-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 073d0552ead5bfc7a3a9c01de590e924f11b5dd2 upstream.
Today it can happen that an event channel is being removed from the
system while the event handling loop is active. This can lead to a
race resulting in crashes or WARN() splats when trying to access the
irq_info structure related to the event channel.
Fix this problem by using a rwlock taken as reader in the event
handling loop and as writer when deallocating the irq_info structure.
As the observed problem was a NULL dereference in evtchn_from_irq()
make this function more robust against races by testing the irq_info
pointer to be not NULL before dereferencing it.
And finally make all accesses to evtchn_to_irq[row][col] atomic ones
in order to avoid seeing partial updates of an array element in irq
handling. Note that irq handling can be entered only for event channels
which have been valid before, so any not populated row isn't a problem
in this regard, as rows are only ever added and never removed.
This is XSA-331.
Cc: stable@vger.kernel.org
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reported-by: Jinoh Kang <luke1337@theori.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
always-on STIBP
commit 1978b3a53a74e3230cd46932b149c6e62e832e9a upstream.
On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON,
STIBP is set to on and
spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED
At the same time, IBPB can be set to conditional.
However, this leads to the case where it's impossible to turn on IBPB
for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the
spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED
condition leads to a return before the task flag is set. Similarly,
ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to
conditional.
More generally, the following cases are possible:
1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb
2. STIBP = on && IBPB = conditional for AMD CPUs with
X86_FEATURE_AMD_STIBP_ALWAYS_ON
The first case functions correctly today, but only because
spectre_v2_user_ibpb isn't updated to reflect the IBPB mode.
At a high level, this change does one thing. If either STIBP or IBPB
is set to conditional, allow the prctl to change the task flag.
Also, reflect that capability when querying the state. This isn't
perfect since it doesn't take into account if only STIBP or IBPB is
unconditionally on. But it allows the conditional feature to work as
expected, without affecting the unconditional one.
[ bp: Massage commit message and comment; space out statements for
better readability. ]
Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
Signed-off-by: Anand K Mistry <amistry@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c51f8f88d705e06bd696d7510aff22b33eb8e638 upstream.
Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output. An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.
It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack. Oops.
This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key. (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.) Speed is prioritized over security;
attacks are rare, while performance is always wanted.
Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.
Commit f227e3ec3b5c ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution. This patch replaces
it.
Reported-by: Amit Klein <aksecurity@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
Signed-off-by: George Spelvin <lkml@sdf.org>
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
to prandom.h for later use; merged George's prandom_seed() proposal;
inlined siprand_u32(); replaced the net_rand_state[] array with 4
members to fix a build issue; cosmetic cleanups to make checkpatch
happy; fixed RANDOM32_SELFTEST build ]
[wt: backported to 4.4 -- no latent_entropy, drop prandom_reseed_late]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 909172a149749242990a6e64cb55d55460d4e417 ]
When net.ipv4.tcp_syncookies=1 and syn flood is happened,
cookie_v4_check or cookie_v6_check tries to redo what
tcp_v4_send_synack or tcp_v6_send_synack did,
rsk_window_clamp will be changed if SOCK_RCVBUF is set,
which will make rcv_wscale is different, the client
still operates with initial window scale and can overshot
granted window, the client use the initial scale but local
server use new scale to advertise window value, and session
work abnormally.
Fixes: e88c64f0a425 ("tcp: allow effective reduction of TCP's rcv-buffer via setsockopt")
Signed-off-by: Mao Wenan <wenan.mao@linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1604967391-123737-1-git-send-email-wenan.mao@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 361182308766a265b6c521879b34302617a8c209 ]
This fixes a regression for blocking connects introduced by commit
4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect").
The x25->neighbour is already set to "NULL" by x25_disconnect() now,
while a blocking connect is waiting in
x25_wait_for_connection_establishment(). Therefore x25->neighbour must
not be accessed here again and x25->state is also already set to
X25_STATE_0 by x25_disconnect().
Fixes: 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect")
Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Reviewed-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20201109065449.9014-1-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4031eeafa71eaf22ae40a15606a134ae86345daf ]
syzbot reported the following KASAN finding:
BUG: KASAN: nullptr-dereference in iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385
Read of size 2 at addr 000000000000021e by task syz-executor907/519
CPU: 0 PID: 519 Comm: syz-executor907 Not tainted 5.9.0-syzkaller-07043-gbcf9877ad213 #0
Hardware name: IBM 3906 M04 701 (KVM/Linux)
Call Trace:
[<00000000c576af60>] unwind_start arch/s390/include/asm/unwind.h:65 [inline]
[<00000000c576af60>] show_stack+0x180/0x228 arch/s390/kernel/dumpstack.c:135
[<00000000c9dcd1f8>] __dump_stack lib/dump_stack.c:77 [inline]
[<00000000c9dcd1f8>] dump_stack+0x268/0x2f0 lib/dump_stack.c:118
[<00000000c5fed016>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:383
[<00000000c5fec82a>] __kasan_report mm/kasan/report.c:517 [inline]
[<00000000c5fec82a>] kasan_report+0x11a/0x168 mm/kasan/report.c:534
[<00000000c98b5b60>] iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385
[<00000000c98b6262>] iucv_sock_shutdown+0x44a/0x4c0 net/iucv/af_iucv.c:1457
[<00000000c89d3a54>] __sys_shutdown+0x12c/0x1c8 net/socket.c:2204
[<00000000c89d3b70>] __do_sys_shutdown net/socket.c:2212 [inline]
[<00000000c89d3b70>] __s390x_sys_shutdown+0x38/0x48 net/socket.c:2210
[<00000000c9e36eac>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415
There is nothing to shutdown if a connection has never been established.
Besides that iucv->hs_dev is not yet initialized if a socket is in
IUCV_OPEN state and iucv->path is not yet initialized if socket is in
IUCV_BOUND state.
So, just skip the shutdown calls for a socket in these states.
Fixes: eac3731bd04c ("[S390]: Add AF_IUCV socket support")
Fixes: 82492a355fac ("af_iucv: add shutdown for HS transport")
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
[jwi: correct one Fixes tag]
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8ef9ba4d666614497a057d09b0a6eafc1e34eadf ]
Due to the legacy usage of hard_header_len for SIT tunnels while
already using infrastructure from net/ipv4/ip_tunnel.c the
calculation of the path MTU in tnl_update_pmtu is incorrect.
This leads to unnecessary creation of MTU exceptions for any
flow going over a SIT tunnel.
As SIT tunnels do not have a header themsevles other than their
transport (L3, L2) headers we're leaving hard_header_len set to zero
as tnl_update_pmtu is already taking care of the transport headers
sizes.
This will also help avoiding unnecessary IPv6 GC runs and spinlock
contention seen when using SIT tunnels and for more than
net.ipv6.route.gc_thresh flows.
Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20201103104133.GA1573211@tws
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e9696d259d0fb5d239e8c28ca41089838ea76d13 upstream.
kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to
allocate a buffer for the swiotlb. It does so by calling
memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
If the allocation must fail, no_iotlb_memory is set.
Later during initialization swiotlb-xen comes in
(drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and given that io_tlb_start
is != 0, it thinks the memory is ready to use when actually it is not.
When the swiotlb is actually needed, swiotlb_tbl_map_single gets called
and since no_iotlb_memory is set the kernel panics.
Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been
initialized, it would do the initialization itself, which might still
succeed.
Fix the panic by setting io_tlb_start to 0 on swiotlb initialization
failure, and also by setting no_iotlb_memory to false on swiotlb
initialization success.
Fixes: ac2cbab21f31 ("x86: Don't panic if can not alloc buffer for swiotlb")
Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
Tested-by: Elliott Mitchell <ehem+xen@m5p.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 06abe8291bc31839950f7d0362d9979edc88a666 upstream.
The correct way to disable debounce filter is to clear bit 5 and 6
of the register.
Cc: stable@vger.kerne.org
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/linux-gpio/df2c008b-e7b5-4fdd-42ea-4d1c62b52139@redhat.com/
Link: https://lore.kernel.org/r/20201105231912.69527-2-coiby.xu@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c64a6a0d4a928c63e5bc3b485552a8903a506c36 upstream.
RTC is 32.768kHz thus 512 RtcClk equals 15625 usec. The documentation
likely has dropped precision and that's why the driver mistakenly took
the slightly deviated value.
Cc: stable@vger.kernel.org
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Suggested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/linux-gpio/2f4706a1-502f-75f0-9596-cc25b4933b6c@redhat.com/
Link: https://lore.kernel.org/r/20201105231912.69527-3-coiby.xu@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 06ad8d339524bf94b89859047822c31df6ace239 upstream.
The gma500 driver expects 3 pipelines in several it's IRQ functions.
Accessing struct drm_device.vblank[], this fails with devices that only
have 2 pipelines. An example KASAN report is shown below.
[ 62.267688] ==================================================================
[ 62.268856] BUG: KASAN: slab-out-of-bounds in psb_irq_postinstall+0x250/0x3c0 [gma500_gfx]
[ 62.269450] Read of size 1 at addr ffff8880012bc6d0 by task systemd-udevd/285
[ 62.269949]
[ 62.270192] CPU: 0 PID: 285 Comm: systemd-udevd Tainted: G E 5.10.0-rc1-1-default+ #572
[ 62.270807] Hardware name: /DN2800MT, BIOS MTCDT10N.86A.0164.2012.1213.1024 12/13/2012
[ 62.271366] Call Trace:
[ 62.271705] dump_stack+0xae/0xe5
[ 62.272180] print_address_description.constprop.0+0x17/0xf0
[ 62.272987] ? psb_irq_postinstall+0x250/0x3c0 [gma500_gfx]
[ 62.273474] __kasan_report.cold+0x20/0x38
[ 62.273989] ? psb_irq_postinstall+0x250/0x3c0 [gma500_gfx]
[ 62.274460] kasan_report+0x3a/0x50
[ 62.274891] psb_irq_postinstall+0x250/0x3c0 [gma500_gfx]
[ 62.275380] drm_irq_install+0x131/0x1f0
<...>
[ 62.300751] Allocated by task 285:
[ 62.301223] kasan_save_stack+0x1b/0x40
[ 62.301731] __kasan_kmalloc.constprop.0+0xbf/0xd0
[ 62.302293] drmm_kmalloc+0x55/0x100
[ 62.302773] drm_vblank_init+0x77/0x210
Resolve the issue by only handling vblank entries up to the number of
CRTCs.
I'm adding a Fixes tag for reference, although the bug has been present
since the driver's initial commit.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes: 5c49fd3aa0ab ("gma500: Add the core DRM files and headers")
Cc: Alan Cox <alan@linux.intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org#v3.3+
Link: https://patchwork.freedesktop.org/patch/msgid/20201105190256.3893-1-tzimmermann@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 77f6ab8b7768cf5e6bdd0e72499270a0671506ee upstream.
Coredump logics needs to report not only the registers of the dumping
thread, but (since 2.5.43) those of other threads getting killed.
Doing that might require extra state saved on the stack in asm glue at
kernel entry; signal delivery logics does that (we need to be able to
save sigcontext there, at the very least) and so does seccomp.
That covers all callers of do_coredump(). Secondary threads get hit with
SIGKILL and caught as soon as they reach exit_mm(), which normally happens
in signal delivery, so those are also fine most of the time. Unfortunately,
it is possible to end up with secondary zapped when it has already entered
exit(2) (or, worse yet, is oopsing). In those cases we reach exit_mm()
when mm->core_state is already set, but the stack contents is not what
we would have in signal delivery.
At least on two architectures (alpha and m68k) it leads to infoleaks - we
end up with a chunk of kernel stack written into coredump, with the contents
consisting of normal C stack frames of the call chain leading to exit_mm()
instead of the expected copy of userland registers. In case of alpha we
leak 312 bytes of stack. Other architectures (including the regset-using
ones) might have similar problems - the normal user of regsets is ptrace
and the state of tracee at the time of such calls is special in the same
way signal delivery is.
Note that had the zapper gotten to the exiting thread slightly later,
it wouldn't have been included into coredump anyway - we skip the threads
that have already cleared their ->mm. So let's pretend that zapper always
loses the race. IOW, have exit_mm() only insert into the dumper list if
we'd gotten there from handling a fatal signal[*]
As the result, the callers of do_exit() that have *not* gone through get_signal()
are not seen by coredump logics as secondary threads. Which excludes voluntary
exit()/oopsen/traps/etc. The dumper thread itself is unaffected by that,
so seccomp is fine.
[*] originally I intended to add a new flag in tsk->flags, but ebiederman pointed
out that PF_SIGNALED is already doing just what we need.
Cc: stable@vger.kernel.org
Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f5785283dd64867a711ca1fb1f5bb172f252ecdf upstream.
Though problem if found on a lower 4.1.12 kernel, I think upstream has
same issue.
In one node in the cluster, there is the following callback trace:
# cat /proc/21473/stack
__ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2]
ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2]
ocfs2_evict_inode+0x152/0x820 [ocfs2]
evict+0xae/0x1a0
iput+0x1c6/0x230
ocfs2_orphan_filldir+0x5d/0x100 [ocfs2]
ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2]
ocfs2_dir_foreach+0x29/0x30 [ocfs2]
ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2]
ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2]
process_one_work+0x169/0x4a0
worker_thread+0x5b/0x560
kthread+0xcb/0xf0
ret_from_fork+0x61/0x90
The above stack is not reasonable, the final iput shouldn't happen in
ocfs2_orphan_filldir() function. Looking at the code,
2067 /* Skip inodes which are already added to recover list, since dio may
2068 * happen concurrently with unlink/rename */
2069 if (OCFS2_I(iter)->ip_next_orphan) {
2070 iput(iter);
2071 return 0;
2072 }
2073
The logic thinks the inode is already in recover list on seeing
ip_next_orphan is non-NULL, so it skip this inode after dropping a
reference which incremented in ocfs2_iget().
While, if the inode is already in recover list, it should have another
reference and the iput() at line 2070 should not be the final iput
(dropping the last reference). So I don't think the inode is really in
the recover list (no vmcore to confirm).
Note that ocfs2_queue_orphans(), though not shown up in the call back
trace, is holding cluster lock on the orphan directory when looking up
for unlinked inodes. The on disk inode eviction could involve a lot of
IOs which may need long time to finish. That means this node could hold
the cluster lock for very long time, that can lead to the lock requests
(from other nodes) to the orhpan directory hang for long time.
Looking at more on ip_next_orphan, I found it's not initialized when
allocating a new ocfs2_inode_info structure.
This causes te reflink operations from some nodes hang for very long
time waiting for the cluster lock on the orphan directory.
Fix: initialize ip_next_orphan as NULL.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bcbc0b2e275f0a797de11a10eff495b4571863fc upstream.
A receive callback is queued while the client is still connected
but can still be called after the client was disconnected. Upon
disconnect cl->me_cl is set to NULL, hence we need to check
that ME client is not-NULL in mei_cl_mtu to avoid
null dereference.
Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20201029095444.957924-2-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6d853c9e4104b4fc8d55dc9cd3b99712aa347174 upstream.
Renesas R-Car and RZ/G SoCs have a firmware download mode over USB.
However, on reset a banner string is transmitted out which is not expected
to be echoed back and will corrupt the protocol.
Cc: stable <stable@vger.kernel.org>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Link: https://lore.kernel.org/r/20201111131209.3977903-1-chris.brandt@renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7067b2619017d51e71686ca9756b454de0e5826a upstream.
It takes xattr_sem to check inline data again but without unlock it
in case not have. So unlock it before return.
Fixes: aef1c8513c1f ("ext4: let ext4_truncate handle inline data correctly")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/1604370542-124630-1-git-send-email-joseph.qi@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 174fe5ba2d1ea0d6c5ab2a7d4aa058d6d497ae4d upstream.
The macro MOPT_Q is used to indicates the mount option is related to
quota stuff and is defined to be MOPT_NOSUPPORT when CONFIG_QUOTA is
disabled. Normally the quota options are handled explicitly, so it
didn't matter that the MOPT_STRING flag was missing, even though the
usrjquota and grpjquota mount options take a string argument. It's
important that's present in the !CONFIG_QUOTA case, since without
MOPT_STRING, the mount option matcher will match usrjquota= followed
by an integer, and will otherwise skip the table entry, and so "mount
option not supported" error message is never reported.
[ Fixed up the commit description to better explain why the fix
works. --TYT ]
Fixes: 26092bf52478 ("ext4: use a table-driven handler for mount options")
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Link: https://lore.kernel.org/r/1603986396-28917-1-git-send-email-kaixuxia@tencent.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|