Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a regression in the conversion of the 64-bit BookE interrupt
entry to C.
- Fix KVM hosts running with the hash MMU since the recent KVM gfn
changes.
- Fix a deadlock in our paravirt spinlocks when hcall tracing is
enabled.
- Several fixes for oopses in our runtime code patching for security
mitigations.
- A couple of minor fixes for the recent conversion of 32-bit interrupt
entry/exit to C.
- Fix __get_user() causing spurious crashes in sigreturn due to a bad
inline asm constraint, spotted with GCC 11.
- A fix for the way we track IRQ masking state vs NMI interrupts when
using the new scv system call entry path.
- A couple more minor fixes.
Thanks to Cédric Le Goater, Christian Zigotzky, Christophe Leroy,
Naveen N. Rao, Nicholas Piggin Paul Menzel, and Sean Christopherson.
* tag 'powerpc-5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64e/interrupt: Fix nvgprs being clobbered
powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled
powerpc/64s: Fix stf mitigation patching w/strict RWX & hash
powerpc/64s: Fix entry flush patching w/strict RWX & hash
powerpc/64s: Fix crashes when toggling entry flush barrier
powerpc/64s: Fix crashes when toggling stf barrier
KVM: PPC: Book3S HV: Fix kvm_unmap_gfn_range_hv() for Hash MMU
powerpc/legacy_serial: Fix UBSAN: array-index-out-of-bounds
powerpc/signal: Fix possible build failure with unsafe_copy_fpr_{to/from}_user
powerpc/uaccess: Fix __get_user() with CONFIG_CC_HAS_ASM_GOTO_OUTPUT
powerpc/pseries: warn if recursing into the hcall tracing code
powerpc/pseries: use notrace hcall variant for H_CEDE idle
powerpc/pseries: Don't trace hcall tracing wrapper
powerpc/pseries: Fix hcall tracing recursion in pv queued spinlocks
powerpc/syscall: Calling kuap_save_and_lock() is wrong
powerpc/interrupts: Fix kuep_unlock() call
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Fix an idle CPU selection bug, and an AMD Ryzen maximum frequency
enumeration bug"
* tag 'sched-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, sched: Fix the AMD CPPC maximum performance value on certain AMD Ryzen generations
sched/fair: Fix clearing of has_idle_cores flag in select_idle_cpu()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"Fix a couple of endianness bugs that crept in"
* tag 'objtool-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool/x86: Fix elf_add_alternative() endianness
objtool: Fix elf_create_undef_symbol() endianness
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"Fix build warning on SH"
* tag 'irq-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sh: Remove unused variable
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 stack randomization fix from Ingo Molnar:
"Fix an assembly constraint that affected LLVM up to version 12"
* tag 'core-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stack: Replace "o" output with "r" input constraint
|
|
Merge misc fixes from Andrew Morton:
"13 patches.
Subsystems affected by this patch series: resource, squashfs, hfsplus,
modprobe, and mm (hugetlb, slub, userfaultfd, ksm, pagealloc, kasan,
pagemap, and ioremap)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/ioremap: fix iomap_max_page_shift
docs: admin-guide: update description for kernel.modprobe sysctl
hfsplus: prevent corruption in shrinking truncate
mm/filemap: fix readahead return types
kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
mm: fix struct page layout on 32-bit systems
ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()"
userfaultfd: release page in error path to avoid BUG_ON
squashfs: fix divide error in calculate_skip()
kernel/resource: fix return code check in __request_free_mem_region
mm, slub: move slub_debug static key enabling outside slab_mutex
mm/hugetlb: fix cow where page writtable in child
mm/hugetlb: fix F_SEAL_FUTURE_WRITE
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- PAE fixes
- syscall num check off-by-one bug
- misc fixes
* tag 'arc-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: mm: Use max_high_pfn as a HIGHMEM zone border
ARC: mm: PAE: use 40-bit physical page mask
ARC: entry: fix off-by-one error in syscall number validation
ARC: kgdb: add 'fallthrough' to prevent a warning
arc: Fix typos/spellos
|
|
Pull block fixes from Jens Axboe:
- Fix for shared tag set exit (Bart)
- Correct ioctl range for zoned ioctls (Damien)
- Removed dead/unused function (Lin)
- Fix perf regression for shared tags (Ming)
- Fix out-of-bounds issue with kyber and preemption (Omar)
- BFQ merge fix (Paolo)
- Two error handling fixes for nbd (Sun)
- Fix weight update in blk-iocost (Tejun)
- NVMe pull request (Christoph):
- correct the check for using the inline bio in nvmet (Chaitanya
Kulkarni)
- demote unsupported command warnings (Chaitanya Kulkarni)
- fix corruption due to double initializing ANA state (me, Hou Pu)
- reset ns->file when open fails (Daniel Wagner)
- fix a NULL deref when SEND is completed with error in nvmet-rdma
(Michal Kalderon)
- Fix kernel-doc warning (Bart)
* tag 'block-5.13-2021-05-14' of git://git.kernel.dk/linux-block:
block/partitions/efi.c: Fix the efi_partition() kernel-doc header
blk-mq: Swap two calls in blk_mq_exit_queue()
blk-mq: plug request for shared sbitmap
nvmet: use new ana_log_size instead the old one
nvmet: seset ns->file when open fails
nbd: share nbd_put and return by goto put_nbd
nbd: Fix NULL pointer in flush_workqueue
blkdev.h: remove unused codes blk_account_rq
block, bfq: avoid circular stable merges
blk-iocost: fix weight updates of inner active iocgs
nvmet: demote fabrics cmd parse err msg to debug
nvmet: use helper to remove the duplicate code
nvmet: demote discovery cmd parse err msg to debug
nvmet-rdma: Fix NULL deref when SEND is completed with error
nvmet: fix inline bio check for passthru
nvmet: fix inline bio check for bdev-ns
nvme-multipath: fix double initialization of ANA state
kyber: fix out of bounds access when preempted
block: uapi: fix comment about block device ioctl
|
|
Pull io_uring fixes from Jens Axboe:
"Just a few minor fixes/changes:
- Fix issue with double free race for linked timeout completions
- Fix reference issue with timeouts
- Remove last few places that make SQPOLL special, since it's just an
io thread now.
- Bump maximum allowed registered buffers, as we don't allocate as
much anymore"
* tag 'io_uring-5.13-2021-05-14' of git://git.kernel.dk/linux-block:
io_uring: increase max number of reg buffers
io_uring: further remove sqpoll limits on opcodes
io_uring: fix ltout double free on completion race
io_uring: fix link timeout refs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"This mainly fixes 1 lcluster-sized pclusters for the big pcluster
feature, which can be forcely generated by mkfs as a specific on-disk
case for per-(sub)file compression strategies but missed to handle in
runtime properly.
Also, documentation updates are included to fix the broken
illustration due to the ReST conversion by accident and complete the
big pcluster introduction.
Summary:
- update documentation to fix the broken illustration due to ReST
conversion by accident at that time and complete the big pcluster
introduction
- fix 1 lcluster-sized pclusters for the big pcluster feature"
* tag 'erofs-for-5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix 1 lcluster-sized pcluster for big pcluster
erofs: update documentation about data compression
erofs: fix broken illustration in documentation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A regression fix for a bootup crash condition introduced in this merge
window and some other minor fixups:
- Fix regression in ACPI NFIT table handling leading to crashes and
driver load failures.
- Move the nvdimm mailing list
- Miscellaneous minor fixups"
* tag 'libnvdimm-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
ACPI: NFIT: Fix support for variable 'SPA' structure size
MAINTAINERS: Move nvdimm mailing list
tools/testing/nvdimm: Make symbol '__nfit_test_ioremap' static
libnvdimm: Remove duplicate struct declaration
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax fixes from Dan Williams:
"A fix for a hang condition due to missed wakeups in the filesystem-dax
core when exercised by virtiofs.
This bug has been there from the beginning, but the condition has
not triggered on other filesystems since they hold a lock over
invalidation events"
* tag 'dax-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: Wake up all waiters after invalidating dax entry
dax: Add a wakeup mode parameter to put_unlocked_entry()
dax: Add an enum for specifying dax wakup mode
|
|
Pull more drm fixes from Dave Airlie:
"Looks like I wasn't the only one not fully switched on this week. The
msm pull has a missing tag so I missed it, and i915 team were a bit
late. In my defence I did have a day with the roof of my home office
removed, so was sitting at my kids desk.
msm:
- dsi regression fix
- dma-buf pinning fix
- displayport fixes
- llc fix
i915:
- Fix active callback alignment annotations and subsequent crashes
- Retract link training strategy to slow and wide, again
- Avoid division by zero on gen2
- Use correct width reads for C0DRB3/C1DRB3 registers
- Fix double free in pdp allocation failure path
- Fix HDMI 2.1 PCON downstream caps check"
* tag 'drm-fixes-2021-05-15' of git://anongit.freedesktop.org/drm/drm:
drm/i915: Use correct downstream caps for check Src-Ctl mode for PCON
drm/i915/overlay: Fix active retire callback alignment
drm/i915: Fix crash in auto_retire
drm/i915/gt: Fix a double free in gen8_preallocate_top_level_pdp
drm/i915: Read C0DRB3/C1DRB3 as 16 bits again
drm/i915: Avoid div-by-zero on gen2
drm/i915/dp: Use slow and wide link training for everything
drm/msm/dp: initialize audio_comp when audio starts
drm/msm/dp: check sink_count before update is_connected status
drm/msm: fix minor version to indicate MSM_PARAM_SUSPENDS support
drm/msm/dsi: fix msm_dsi_phy_get_clk_provider return code
drm/msm/dsi: dsi_phy_28nm_8960: fix uninitialized variable access
drm/msm: fix LLC not being enabled for mmu500 targets
drm/msm: Do not unpin/evict exported dma-buf's
|
|
syzbot is reporting OOB write at vga16fb_imageblit() [1], for
resize_screen() from ioctl(VT_RESIZE) returns 0 without checking whether
requested rows/columns fit the amount of memory reserved for the graphical
screen if current mode is KD_GRAPHICS.
----------
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kd.h>
#include <linux/vt.h>
int main(int argc, char *argv[])
{
const int fd = open("/dev/char/4:1", O_RDWR);
struct vt_sizes vt = { 0x4100, 2 };
ioctl(fd, KDSETMODE, KD_GRAPHICS);
ioctl(fd, VT_RESIZE, &vt);
ioctl(fd, KDSETMODE, KD_TEXT);
return 0;
}
----------
Allow framebuffer drivers to return -EINVAL, by moving vc->vc_mode !=
KD_GRAPHICS check from resize_screen() to fbcon_resize().
Link: https://syzkaller.appspot.com/bug?extid=1f29e126cf461c4de3b3 [1]
Reported-by: syzbot <syzbot+1f29e126cf461c4de3b3@syzkaller.appspotmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+1f29e126cf461c4de3b3@syzkaller.appspotmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
iomap_max_page_shift is expected to contain a page shift, so it can't be a
'bool', has to be an 'unsigned int'
And fix the default values: P4D_SHIFT is when huge iomap is allowed.
However, on some architectures (eg: powerpc book3s/64), P4D_SHIFT is not a
constant so it can't be used to initialise a static variable. So,
initialise iomap_max_page_shift with a maximum shift supported by the
architecture, it is gated by P4D_SHIFT in vmap_try_huge_p4d() anyway.
Link: https://lkml.kernel.org/r/ad2d366015794a9f21320dcbdd0a8eb98979e9df.1620898113.git.christophe.leroy@csgroup.eu
Fixes: bbc180a5adb0 ("mm: HUGE_VMAP arch support cleanup")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When I added CONFIG_MODPROBE_PATH, I neglected to update Documentation/.
It's still true that this defaults to /sbin/modprobe, but now via a level
of indirection. So document that the kernel might have been built with
something other than /sbin/modprobe as the initial value.
Link: https://lkml.kernel.org/r/20210420125324.1246826-1-linux@rasmusvillemoes.dk
Fixes: 17652f4240f7a ("modules: add CONFIG_MODPROBE_PATH")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
I believe there are some issues introduced by commit 31651c607151
("hfsplus: avoid deadlock on file truncation")
HFS+ has extent records which always contains 8 extents. In case the
first extent record in catalog file gets full, new ones are allocated from
extents overflow file.
In case shrinking truncate happens to middle of an extent record which
locates in extents overflow file, the logic in hfsplus_file_truncate() was
changed so that call to hfs_brec_remove() is not guarded any more.
Right action would be just freeing the extents that exceed the new size
inside extent record by calling hfsplus_free_extents(), and then check if
the whole extent record should be removed. However since the guard
(blk_cnt > start) is now after the call to hfs_brec_remove(), this has
unfortunate effect that the last matching extent record is removed
unconditionally.
To reproduce this issue, create a file which has at least 10 extents, and
then perform shrinking truncate into middle of the last extent record, so
that the number of remaining extents is not under or divisible by 8. This
causes the last extent record (8 extents) to be removed totally instead of
truncating into middle of it. Thus this causes corruption, and lost data.
Fix for this is simply checking if the new truncated end is below the
start of this extent record, making it safe to remove the full extent
record. However call to hfs_brec_remove() can't be moved to it's previous
place since we're dropping ->tree_lock and it can cause a race condition
and the cached info being invalidated possibly corrupting the node data.
Another issue is related to this one. When entering into the block
(blk_cnt > start) we are not holding the ->tree_lock. We break out from
the loop not holding the lock, but hfs_find_exit() does unlock it. Not
sure if it's possible for someone else to take the lock under our feet,
but it can cause hard to debug errors and premature unlocking. Even if
there's no real risk of it, the locking should still always be kept in
balance. Thus taking the lock now just before the check.
Link: https://lkml.kernel.org/r/20210429165139.3082828-1-jouni.roivas@tuxera.com
Fixes: 31651c607151f ("hfsplus: avoid deadlock on file truncation")
Signed-off-by: Jouni Roivas <jouni.roivas@tuxera.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Cc: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A readahead request will not allocate more memory than can be represented
by a size_t, even on systems that have HIGHMEM available. Change the
length functions from returning an loff_t to a size_t.
Link: https://lkml.kernel.org/r/20210510201201.1558972-1-willy@infradead.org
Fixes: 32c0a6bcaa1f57 ("btrfs: add and use readahead_batch_length")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
These tests deliberately access these arrays out of bounds, which will
cause the dynamic local bounds checks inserted by
CONFIG_UBSAN_LOCAL_BOUNDS to fail and panic the kernel. To avoid this
problem, access the arrays via volatile pointers, which will prevent the
compiler from being able to determine the array bounds.
These accesses use volatile pointers to char (char *volatile) rather than
the more conventional pointers to volatile char (volatile char *) because
we want to prevent the compiler from making inferences about the pointer
itself (i.e. its array bounds), not the data that it refers to.
Link: https://lkml.kernel.org/r/20210507025915.1464056-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/I90b1713fbfa1bf68ff895aef099ea77b98a7c3b9
Signed-off-by: Peter Collingbourne <pcc@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: George Popescu <georgepope@android.com>
Cc: Elena Petrova <lenaptr@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
32-bit architectures which expect 8-byte alignment for 8-byte integers and
need 64-bit DMA addresses (arm, mips, ppc) had their struct page
inadvertently expanded in 2019. When the dma_addr_t was added, it forced
the alignment of the union to 8 bytes, which inserted a 4 byte gap between
'flags' and the union.
Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
This restores the alignment to that of an unsigned long. We always
store the low bits in the first word to prevent the PageTail bit from
being inadvertently set on a big endian platform. If that happened,
get_user_pages_fast() racing against a page which was freed and
reallocated to the page_pool could dereference a bogus compound_head(),
which would be hard to trace back to this cause.
Link: https://lkml.kernel.org/r/20210510153211.1504886-1-willy@infradead.org
Fixes: c25fff7171be ("mm: add dma_addr_t to struct page")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Matteo Croce <mcroce@linux.microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
remove_rmap_item_from_tree()"
This reverts commit 3e96b6a2e9ad929a3230a22f4d64a74671a0720b. General
Protection Fault in rmap_walk_ksm() under memory pressure:
remove_rmap_item_from_tree() needs to take page lock, of course.
Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2105092253500.1127@eggly.anvils
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Consider the following sequence of events:
1. Userspace issues a UFFD ioctl, which ends up calling into
shmem_mfill_atomic_pte(). We successfully account the blocks, we
shmem_alloc_page(), but then the copy_from_user() fails. We return
-ENOENT. We don't release the page we allocated.
2. Our caller detects this error code, tries the copy_from_user() after
dropping the mmap_lock, and retries, calling back into
shmem_mfill_atomic_pte().
3. Meanwhile, let's say another process filled up the tmpfs being used.
4. So shmem_mfill_atomic_pte() fails to account blocks this time, and
immediately returns - without releasing the page.
This triggers a BUG_ON in our caller, which asserts that the page
should always be consumed, unless -ENOENT is returned.
To fix this, detect if we have such a "dangling" page when accounting
fails, and if so, release it before returning.
Link: https://lkml.kernel.org/r/20210428230858.348400-1-axelrasmussen@google.com
Fixes: cb658a453b93 ("userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY")
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reported-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Sysbot has reported a "divide error" which has been identified as being
caused by a corrupted file_size value within the file inode. This value
has been corrupted to a much larger value than expected.
Calculate_skip() is passed i_size_read(inode) >> msblk->block_log. Due to
the file_size value corruption this overflows the int argument/variable in
that function, leading to the divide error.
This patch changes the function to use u64. This will accommodate any
unexpectedly large values due to corruption.
The value returned from calculate_skip() is clamped to be never more than
SQUASHFS_CACHED_BLKS - 1, or 7. So file_size corruption does not lead to
an unexpectedly large return result here.
Link: https://lkml.kernel.org/r/20210507152618.9447-1-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: <syzbot+e8f781243ce16ac2f962@syzkaller.appspotmail.com>
Reported-by: <syzbot+7b98870d4fec9447b951@syzkaller.appspotmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Splitting an earlier version of a patch that allowed calling
__request_region() while holding the resource lock into a series of
patches required changing the return code for the newly introduced
__request_region_locked().
Unfortunately this change was not carried through to a subsequent commit
56fd94919b8b ("kernel/resource: fix locking in request_free_mem_region")
in the series. This resulted in a use-after-free due to freeing the
struct resource without properly releasing it. Fix this by correcting the
return code check so that the struct is not freed if the request to add it
was successful.
Link: https://lkml.kernel.org/r/20210512073528.22334-1-apopple@nvidia.com
Fixes: 56fd94919b8b ("kernel/resource: fix locking in request_free_mem_region")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Muchun Song <smuchun@gmail.com>
Cc: Oliver Sang <oliver.sang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Paul E. McKenney reported [1] that commit 1f0723a4c0df ("mm, slub: enable
slub_debug static key when creating cache with explicit debug flags")
results in the lockdep complaint:
======================================================
WARNING: possible circular locking dependency detected
5.12.0+ #15 Not tainted
------------------------------------------------------
rcu_torture_sta/109 is trying to acquire lock:
ffffffff96063cd0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x9/0x20
but task is already holding lock:
ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (slab_mutex){+.+.}-{3:3}:
lock_acquire+0xb9/0x3a0
__mutex_lock+0x8d/0x920
slub_cpu_dead+0x15/0xf0
cpuhp_invoke_callback+0x17a/0x7c0
cpuhp_invoke_callback_range+0x3b/0x80
_cpu_down+0xdf/0x2a0
cpu_down+0x2c/0x50
device_offline+0x82/0xb0
remove_cpu+0x1a/0x30
torture_offline+0x80/0x140
torture_onoff+0x147/0x260
kthread+0x10a/0x140
ret_from_fork+0x22/0x30
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
check_prev_add+0x8f/0xbf0
__lock_acquire+0x13f0/0x1d80
lock_acquire+0xb9/0x3a0
cpus_read_lock+0x21/0xa0
static_key_enable+0x9/0x20
__kmem_cache_create+0x38d/0x430
kmem_cache_create_usercopy+0x146/0x250
kmem_cache_create+0xd/0x10
rcu_torture_stats+0x79/0x280
kthread+0x10a/0x140
ret_from_fork+0x22/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(slab_mutex);
lock(cpu_hotplug_lock);
lock(slab_mutex);
lock(cpu_hotplug_lock);
*** DEADLOCK ***
1 lock held by rcu_torture_sta/109:
#0: ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250
stack backtrace:
CPU: 3 PID: 109 Comm: rcu_torture_sta Not tainted 5.12.0+ #15
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
dump_stack+0x6d/0x89
check_noncircular+0xfe/0x110
? lock_is_held_type+0x98/0x110
check_prev_add+0x8f/0xbf0
__lock_acquire+0x13f0/0x1d80
lock_acquire+0xb9/0x3a0
? static_key_enable+0x9/0x20
? mark_held_locks+0x49/0x70
cpus_read_lock+0x21/0xa0
? static_key_enable+0x9/0x20
static_key_enable+0x9/0x20
__kmem_cache_create+0x38d/0x430
kmem_cache_create_usercopy+0x146/0x250
? rcu_torture_stats_print+0xd0/0xd0
kmem_cache_create+0xd/0x10
rcu_torture_stats+0x79/0x280
? rcu_torture_stats_print+0xd0/0xd0
kthread+0x10a/0x140
? kthread_park+0x80/0x80
ret_from_fork+0x22/0x30
This is because there's one order of locking from the hotplug callbacks:
lock(cpu_hotplug_lock); // from hotplug machinery itself
lock(slab_mutex); // in e.g. slab_mem_going_offline_callback()
And commit 1f0723a4c0df made the reverse sequence possible:
lock(slab_mutex); // in kmem_cache_create_usercopy()
lock(cpu_hotplug_lock); // kmem_cache_open() -> static_key_enable()
The simplest fix is to move static_key_enable() to a place before slab_mutex is
taken. That means kmem_cache_create_usercopy() in mm/slab_common.c which is not
ideal for SLUB-specific code, but the #ifdef CONFIG_SLUB_DEBUG makes it
at least self-contained and obvious.
[1] https://lore.kernel.org/lkml/20210502171827.GA3670492@paulmck-ThinkPad-P17-Gen-1/
Link: https://lkml.kernel.org/r/20210504120019.26791-1-vbabka@suse.cz
Fixes: 1f0723a4c0df ("mm, slub: enable slub_debug static key when creating cache with explicit debug flags")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When rework early cow of pinned hugetlb pages, we moved huge_ptep_get()
upper but overlooked a side effect that the huge_ptep_get() will fetch the
pte after wr-protection. After moving it upwards, we need explicit
wr-protect of child pte or we will keep the write bit set in the child
process, which could cause data corrution where the child can write to the
original page directly.
This issue can also be exposed by "memfd_test hugetlbfs" kselftest.
Link: https://lkml.kernel.org/r/20210503234356.9097-3-peterx@redhat.com
Fixes: 4eae4efa2c299 ("hugetlb: do early cow when page pinned on src mm")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2.
Hugh reported issue with F_SEAL_FUTURE_WRITE not applied correctly to
hugetlbfs, which I can easily verify using the memfd_test program, which
seems that the program is hardly run with hugetlbfs pages (as by default
shmem).
Meanwhile I found another probably even more severe issue on that hugetlb
fork won't wr-protect child cow pages, so child can potentially write to
parent private pages. Patch 2 addresses that.
After this series applied, "memfd_test hugetlbfs" should start to pass.
This patch (of 2):
F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day.
There is a test program for that and it fails constantly.
$ ./memfd_test hugetlbfs
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
mmap() didn't fail as expected
Aborted (core dumped)
I think it's probably because no one is really running the hugetlbfs test.
Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we
do in shmem_mmap(). Generalize a helper for that.
Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com
Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
https://gitlab.freedesktop.org/drm/msm into drm-fixes
- dsi regression fix
- dma-buf pinning fix
- displayport fixes
- llc fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGuqLZDAEJwUFKb6m+h3kyxgjDEKa3DPA1fHA69vxbXH=g@mail.gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix trace_check_vprintf() for %.*s
The sanity check of all strings being read from the ring buffer to
make sure they are in safe memory space did not account for the %.*s
notation having another parameter to process (the length).
Add that to the check"
* tag 'trace-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Handle %.*s in trace_check_vprintf()
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.13-rc2:
- Fix active callback alignment annotations and subsequent crashes
- Retract link training strategy to slow and wide, again
- Avoid division by zero on gen2
- Use correct width reads for C0DRB3/C1DRB3 registers
- Fix double free in pdp allocation failure path
- Fix HDMI 2.1 PCON downstream caps check
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87a6oxu9ao.fsf@intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"Fixes and cpucaps.h automatic generation:
- Generate cpucaps.h at build time rather than carrying lots of
#defines. Merged at -rc1 to avoid some conflicts during the merge
window.
- Initialise RGSR_EL1.SEED in __cpu_setup() as it may be left as 0
out of reset and the IRG instruction would not function as expected
if only the architected pseudorandom number generator is
implemented.
- Fix potential race condition in __sync_icache_dcache() where the
PG_dcache_clean page flag is set before the actual cache
maintenance.
- Fix header include in BTI kselftests"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()
arm64: tools: Add __ASM_CPUCAPS_H to the endif in cpucaps.h
arm64: mte: initialize RGSR_EL1.SEED in __cpu_setup
kselftest/arm64: Add missing stddef.h include to BTI tests
arm64: Generate cpucaps.h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
"This fixes some critical bugs such as memory leak in compression
flows, kernel panic when handling errors, and swapon failure due to
newly added condition check"
* tag 'f2fs-5.13-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: return EINVAL for hole cases in swap file
f2fs: avoid swapon failure by giving a warning first
f2fs: compress: fix to assign cc.cluster_idx correctly
f2fs: compress: fix race condition of overwrite vs truncate
f2fs: compress: fix to free compress page correctly
f2fs: support iflag change given the mask
f2fs: avoid null pointer access when handling IPU error
|
|
Pull drm fixes from Dave Airlie:
"Not much here, mostly amdgpu fixes, with a couple of radeon, and a
cosmetic vc4.
Two MAINTAINERS file updates also.
amdgpu:
- Fixes for flexible array conversions
- Fix sysfs attribute init
- Harvesting fixes
- VCN CG/PG fixes for Picasso
radeon:
- Fixes for flexible array conversions
- Fix for flickering on Oland with multiple 4K displays
vc4:
- drop unused function"
* tag 'drm-fixes-2021-05-14' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: update vcn1.0 Non-DPG suspend sequence
drm/amdgpu: set vcn mgcg flag for picasso
drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
drm/amdgpu: update the method for harvest IP for specific SKU
drm/amdgpu: add judgement when add ip blocks (v2)
drm/amd/display: Initialize attribute for hdcp_srm sysfs file
drm/amd/pm: Fix out-of-bounds bug
drm/radeon/si_dpm: Fix SMU power state load
drm/radeon/ni_dpm: Fix booting bug
MAINTAINERS: Update address for Emma Anholt
MAINTAINERS: Update my e-mail
drm/vc4: remove unused function
drm/ttm: Do not add non-system domain BO into swap list
|
|
To ensure that instructions are observable in a new mapping, the arm64
set_pte_at() implementation cleans the D-cache and invalidates the
I-cache to the PoU. As an optimisation, this is only done on executable
mappings and the PG_dcache_clean page flag is set to avoid future cache
maintenance on the same page.
When two different processes map the same page (e.g. private executable
file or shared mapping) there's a potential race on checking and setting
PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
fault paths the page is locked (PG_locked), mprotect() does not take the
page lock. The result is that one process may see the PG_dcache_clean
flag set but the I/D cache maintenance not yet performed.
Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit()
and set_bit(). In the rare event of a race, the cache maintenance is
done twice.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210514095001.13236-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Fix the following kernel-doc warning:
block/partitions/efi.c:685: warning: wrong kernel-doc identifier on line:
* efi_partition(struct parsed_partitions *state)
Cc: Alexander Viro <viro@math.psu.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210513171708.8391-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If a tag set is shared across request queues (e.g. SCSI LUNs) then the
block layer core keeps track of the number of active request queues in
tags->active_queues. blk_mq_tag_busy() and blk_mq_tag_idle() update that
atomic counter if the hctx flag BLK_MQ_F_TAG_QUEUE_SHARED is set. Make
sure that blk_mq_exit_queue() calls blk_mq_tag_idle() before that flag is
cleared by blk_mq_del_queue_tag_set().
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Fixes: 0d2602ca30e4 ("blk-mq: improve support for shared tags maps")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210513171529.7977-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In case of shared sbitmap, request won't be held in plug list any more
sine commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per
tagset"), this way makes request merge from flush plug list & batching
submission not possible, so cause performance regression.
Yanhui reports performance regression when running sequential IO
test(libaio, 16 jobs, 8 depth for each job) in VM, and the VM disk
is emulated with image stored on xfs/megaraid_sas.
Fix the issue by recovering original behavior to allow to hold request
in plug list.
Cc: Yanhui Ma <yama@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: kashyap.desai@broadcom.com
Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210514022052.1047665-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since recent changes instead of storing a large array of struct
io_mapped_ubuf, we store pointers to them, that is 4 times slimmer and
we should not to so worry about restricting max number of registererd
buffer slots, increase the limit 4 times.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d3dee1da37f46da416aa96a16bf9e5094e10584d.1620990371.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There are three types of requests that left disabled for sqpoll, namely
epoll ctx, statx, and resources update. Since SQPOLL task is now closely
mimics a userspace thread, remove the restrictions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/909b52d70c45636d8d7897582474ea5aab5eed34.1620990306.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Always remove linked timeout on io_link_timeout_fn() from the master
request link list, otherwise we may get use-after-free when first
io_link_timeout_fn() puts linked timeout in the fail path, and then
will be found and put on master's free.
Cc: stable@vger.kernel.org # 5.10+
Fixes: 90cd7e424969d ("io_uring: track link timeout's master explicitly")
Reported-and-tested-by: syzbot+5a864149dd970b546223@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/69c46bf6ce37fec4fdcd98f0882e18eb07ce693a.1620990121.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some interrupt handlers have an "extra" that saves 1 or 2
registers (r14, r15) in the paca save area and makes them available to
use by the handler.
The change to always save nvgprs in exception handlers lead to some
interrupt handlers saving those scratch r14 / r15 registers into the
interrupt frame's GPR saves, which get restored on interrupt exit.
Fix this by always reloading those scratch registers from paca before
the EXCEPTION_COMMON that saves nvgprs.
Fixes: 4228b2c3d20e ("powerpc/64e/interrupt: always save nvgprs on interrupt")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210514044008.1955783-1-npiggin@gmail.com
|
|
scv support introduced the notion of code that implicitly soft-masks
irqs due to the instruction addresses. This is required because scv
enters the kernel with MSR[EE]=1.
If a NMI (including soft-NMI) interrupt hits when we are implicitly
soft-masked then its regs->softe does not reflect this because it is
derived from the explicit soft mask state (paca->irq_soft_mask). This
makes arch_irq_disabled_regs(regs) return false.
This can trigger a warning in the soft-NMI watchdog code (shown below).
Fix it by having NMI interrupts set regs->softe to disabled in case of
interrupting an implicit soft-masked region.
------------[ cut here ]------------
WARNING: CPU: 41 PID: 1103 at arch/powerpc/kernel/watchdog.c:259 soft_nmi_interrupt+0x3e4/0x5f0
CPU: 41 PID: 1103 Comm: (spawn) Not tainted
NIP: c000000000039534 LR: c000000000039234 CTR: c000000000009a00
REGS: c000007fffbcf940 TRAP: 0700 Not tainted
MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 22042482 XER: 200400ad
CFAR: c000000000039260 IRQMASK: 3
GPR00: c000000000039204 c000007fffbcfbe0 c000000001d6c300 0000000000000003
GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020
GPR08: 0000007ffd4e0000 0000000000000000 c000007ffffceb00 7265677368657265
GPR12: 9000000000009033 c000007ffffceb00 00000f7075bf4480 000000000000002a
GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000
GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040
GPR24: 0000000000000000 0000000000000029 c000000001dae058 0000000000000029
GPR28: 0000000000000000 0000000000000800 0000000000000009 c000007fffbcfd60
NIP [c000000000039534] soft_nmi_interrupt+0x3e4/0x5f0
LR [c000000000039234] soft_nmi_interrupt+0xe4/0x5f0
Call Trace:
[c000007fffbcfbe0] [c000000000039204] soft_nmi_interrupt+0xb4/0x5f0 (unreliable)
[c000007fffbcfcf0] [c00000000000c0e8] soft_nmi_common+0x138/0x1c4
--- interrupt: 900 at end_real_trampolines+0x0/0x1000
NIP: c000000000003000 LR: 00007ca426adb03c CTR: 900000000280f033
REGS: c000007fffbcfd60 TRAP: 0900
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44042482 XER: 200400ad
CFAR: 00007ca426946020 IRQMASK: 0
GPR00: 00000000000000ad 00007ffffa45d050 00007ca426b07f00 0000000000000035
GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020
GPR08: 0000000000000000 0000000000100000 0000000010000000 00007ffffa45d110
GPR12: 0000000000000001 00007ca426d4e680 00000f7075bf4480 000000000000002a
GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000
GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040
GPR24: 0000000000000000 00000f7057473f68 0000000000000003 000000000000041b
GPR28: 00007ffffa45d4c4 0000000000000035 0000000000000000 00000f7057473f68
NIP [c000000000003000] end_real_trampolines+0x0/0x1000
LR [00007ca426adb03c] 0x7ca426adb03c
--- interrupt: 900
Instruction dump:
60000000 60000000 60420000 38600001 482b3ae5 60000000 e93f0138 a36d0008
7daa6b78 71290001 7f7907b4 4082fd34 <0fe00000> 4bfffd2c 60420000 ea6100a8
---[ end trace dc75f67d819779da ]---
Fixes: 118178e62e2e ("powerpc: move NMI entry/exit code into wrapper")
Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503111708.758261-1-npiggin@gmail.com
|
|
The stf entry barrier fallback is unsafe to execute in a semi-patched
state, which can happen when enabling/disabling the mitigation with
strict kernel RWX enabled and using the hash MMU.
See the previous commit for more details.
Fix it by changing the order in which we patch the instructions.
Note the stf barrier fallback is only used on Power6 or earlier.
Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210513140800.1391706-2-mpe@ellerman.id.au
|
|
The entry flush mitigation can be enabled/disabled at runtime. When this
happens it results in the kernel patching its own instructions to
enable/disable the mitigation sequence.
With strict kernel RWX enabled instruction patching happens via a
secondary mapping of the kernel text, so that we don't have to make the
primary mapping writable. With the hash MMU this leads to a hash fault,
which causes us to execute the exception entry which contains the entry
flush mitigation.
This means we end up executing the entry flush in a semi-patched state,
ie. after we have patched the first instruction but before we patch the
second or third instruction of the sequence.
On machines with updated firmware the entry flush is a series of special
nops, and it's safe to to execute in a semi-patched state.
However when using the fallback flush the sequence is mflr/branch/mtlr,
and so it's not safe to execute if we have patched out the mflr but not
the other two instructions. Doing so leads to us corrputing LR, leading
to an oops, for example:
# echo 0 > /sys/kernel/debug/powerpc/entry_flush
kernel tried to execute exec-protected page (c000000002971000) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xc000000002971000
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 0 PID: 2215 Comm: bash Not tainted 5.13.0-rc1-00010-gda3bb206c9ce #1
NIP: c000000002971000 LR: c000000002971000 CTR: c000000000120c40
REGS: c000000013243840 TRAP: 0400 Not tainted (5.13.0-rc1-00010-gda3bb206c9ce)
MSR: 8000000010009033 <SF,EE,ME,IR,DR,RI,LE> CR: 48428482 XER: 00000000
...
NIP 0xc000000002971000
LR 0xc000000002971000
Call Trace:
do_patch_instruction+0xc4/0x340 (unreliable)
do_entry_flush_fixups+0x100/0x3b0
entry_flush_set+0x50/0xe0
simple_attr_write+0x160/0x1a0
full_proxy_write+0x8c/0x110
vfs_write+0xf0/0x340
ksys_write+0x84/0x140
system_call_exception+0x164/0x2d0
system_call_common+0xec/0x278
The simplest fix is to change the order in which we patch the
instructions, so that the sequence is always safe to execute. For the
non-fallback flushes it doesn't matter what order we patch in.
Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210513140800.1391706-1-mpe@ellerman.id.au
|
|
The entry flush mitigation can be enabled/disabled at runtime via a
debugfs file (entry_flush), which causes the kernel to patch itself to
enable/disable the relevant mitigations.
However depending on which mitigation we're using, it may not be safe to
do that patching while other CPUs are active. For example the following
crash:
sleeper[15639]: segfault (11) at c000000000004c20 nip c000000000004c20 lr c000000000004c20
Shows that we returned to userspace with a corrupted LR that points into
the kernel, due to executing the partially patched call to the fallback
entry flush (ie. we missed the LR restore).
Fix it by doing the patching under stop machine. The CPUs that aren't
doing the patching will be spinning in the core of the stop machine
logic. That is currently sufficient for our purposes, because none of
the patching we do is to that code or anywhere in the vicinity.
Fixes: f79643787e0a ("powerpc/64s: flush L1D on kernel entry")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210506044959.1298123-2-mpe@ellerman.id.au
|
|
The STF (store-to-load forwarding) barrier mitigation can be
enabled/disabled at runtime via a debugfs file (stf_barrier), which
causes the kernel to patch itself to enable/disable the relevant
mitigations.
However depending on which mitigation we're using, it may not be safe to
do that patching while other CPUs are active. For example the following
crash:
User access of kernel address (c00000003fff5af0) - exploit attempt? (uid: 0)
segfault (11) at c00000003fff5af0 nip 7fff8ad12198 lr 7fff8ad121f8 code 1
code: 40820128 e93c00d0 e9290058 7c292840 40810058 38600000 4bfd9a81 e8410018
code: 2c030006 41810154 3860ffb6 e9210098 <e94d8ff0> 7d295279 39400000 40820a3c
Shows that we returned to userspace without restoring the user r13
value, due to executing the partially patched STF exit code.
Fix it by doing the patching under stop machine. The CPUs that aren't
doing the patching will be spinning in the core of the stop machine
logic. That is currently sufficient for our purposes, because none of
the patching we do is to that code or anywhere in the vicinity.
Fixes: a048a07d7f45 ("powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit")
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210506044959.1298123-1-mpe@ellerman.id.au
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.13-2021-05-13:
amdgpu:
- Fixes for flexible array conversions
- Fix sysfs attribute init
- Harvesting fixes
- VCN CG/PG fixes for Picasso
radeon:
- Fixes for flexible array conversions
- Fix for flickering on Oland with multiple 4K displays
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210513163228.3963-1-alexander.deucher@amd.com
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Remove an unused function and a MAINTAINERS update.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20210513133617.xq77wwrehpuh7yn2@hendrix
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These close a coverage gap in the intel_pstate driver and fix runtime
PM child count imbalance related to interactions with system-wide
suspend.
Specifics:
- Make intel_pstate work as expected on systems where the platform
firmware enables HWP even though the HWP EPP support is not
advertised (Rafael Wysocki).
- Fix possible runtime PM child count imbalance that may occur if
other runtime PM functions are called after invoking
pm_runtime_force_suspend() and before pm_runtime_force_resume()
is called (Tony Lindgren)"
* tag 'pm-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: runtime: Fix unpaired parent child_count for force_resume
cpufreq: intel_pstate: Use HWP if enabled by platform firmware
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert an unnecessary revert of an ACPI power management commit,
add a missing device ID to one of the lists and fix a possible memory
leak in an error path.
Specifics:
- Revert a revert of a recent ACPI power management change that does
not need to be reverted after all (Rafael Wysocki).
- Add missing fan device ID to the list of device IDs for which the
devices should not be put into the ACPI PM domain (Sumeet
Pawnikar).
- Fix possible memory leak in an error path in the ACPI device
enumeration code (Christophe JAILLET)"
* tag 'acpi-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: Add ACPI ID of Alder Lake Fan
ACPI: scan: Fix a memory leak in an error handling path
Revert "Revert "ACPI: scan: Turn off unused power resources during initialization""
|