summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-07-11Linux 6.4.3v6.4.3Greg Kroah-Hartman1-1/+1
Link: https://lore.kernel.org/r/20230709111345.297026264@linuxfoundation.org Tested-by: Ronald Warsow <rwarsow@gmx.de Link: https://lore.kernel.org/r/20230709203826.141774942@linuxfoundation.org Tested-by: Ronald Warsow <rwarsow@gmx.de> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Chris Paterson (CIP) <chris.paterson2@renesas.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com> Tested-by: Ron Economos <re@w6rz.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-11fork: lock VMAs of the parent process when forkingSuren Baghdasaryan1-0/+1
commit fb49c455323ff8319a123dd312be9082c49a23a5 upstream. When forking a child process, the parent write-protects anonymous pages and COW-shares them with the child being forked using copy_present_pte(). We must not take any concurrent page faults on the source vma's as they are being processed, as we expect both the vma and the pte's behind it to be stable. For example, the anon_vma_fork() expects the parents vma->anon_vma to not change during the vma copy. A concurrent page fault on a page newly marked read-only by the page copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the source vma, defeating the anon_vma_clone() that wasn't done because the parent vma originally didn't have an anon_vma, but we now might end up copying a pte entry for a page that has one. Before the per-vma lock based changes, the mmap_lock guaranteed exclusion with concurrent page faults. But now we need to do a vma_start_write() to make sure no concurrent faults happen on this vma while it is being processed. This fix can potentially regress some fork-heavy workloads. Kernel build time did not show noticeable regression on a 56-core machine while a stress test mapping 10000 VMAs and forking 5000 times in a tight loop shows ~5% regression. If such fork time regression is unacceptable, disabling CONFIG_PER_VMA_LOCK should restore its performance. Further optimizations are possible if this regression proves to be problematic. Suggested-by: David Hildenbrand <david@redhat.com> Reported-by: Jiri Slaby <jirislaby@kernel.org> Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/ Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com> Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/ Reported-by: Jacob Young <jacobly.alt@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624 Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first") Cc: stable@vger.kernel.org Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-11bootmem: remove the vmemmap pages from kmemleak in free_bootmem_pageLiu Shixin1-0/+2
commit 028725e73375a1ff080bbdf9fb503306d0116f28 upstream. commit dd0ff4d12dd2 ("bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem") fix an overlaps existing problem of kmemleak. But the problem still existed when HAVE_BOOTMEM_INFO_NODE is disabled, because in this case, free_bootmem_page() will call free_reserved_page() directly. Fix the problem by adding kmemleak_free_part() in free_bootmem_page() when HAVE_BOOTMEM_INFO_NODE is disabled. Link: https://lkml.kernel.org/r/20230704101942.2819426-1-liushixin2@huawei.com Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-11mm: call arch_swap_restore() from do_swap_page()Peter Collingbourne1-0/+7
commit 6dca4ac6fc91fd41ea4d6c4511838d37f4e0eab2 upstream. Commit c145e0b47c77 ("mm: streamline COW logic in do_swap_page()") moved the call to swap_free() before the call to set_pte_at(), which meant that the MTE tags could end up being freed before set_pte_at() had a chance to restore them. Fix it by adding a call to the arch_swap_restore() hook before the call to swap_free(). Link: https://lkml.kernel.org/r/20230523004312.1807357-2-pcc@google.com Link: https://linux-review.googlesource.com/id/I6470efa669e8bd2f841049b8c61020c510678965 Fixes: c145e0b47c77 ("mm: streamline COW logic in do_swap_page()") Signed-off-by: Peter Collingbourne <pcc@google.com> Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com> Closes: https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com/ Acked-by: David Hildenbrand <david@redhat.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> [6.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-11mm: lock newly mapped VMA with corrected orderingHugh Dickins1-2/+2
commit 1c7873e3364570ec89343ff4877e0f27a7b21a61 upstream. Lockdep is certainly right to complain about (&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f but task is already holding lock: (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db Invert those to the usual ordering. Fixes: 33313a747e81 ("mm: lock newly mapped VMA which can be modified after it becomes visible") Cc: stable@vger.kernel.org Signed-off-by: Hugh Dickins <hughd@google.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-11mm: lock newly mapped VMA which can be modified after it becomes visibleSuren Baghdasaryan1-0/+2
commit 33313a747e81af9f31d0d45de78c9397fa3655eb upstream. mmap_region adds a newly created VMA into VMA tree and might modify it afterwards before dropping the mmap_lock. This poses a problem for page faults handled under per-VMA locks because they don't take the mmap_lock and can stumble on this VMA while it's still being modified. Currently this does not pose a problem since post-addition modifications are done only for file-backed VMAs, which are not handled under per-VMA lock. However, once support for handling file-backed page faults with per-VMA locks is added, this will become a race. Fix this by write-locking the VMA before inserting it into the VMA tree. Other places where a new VMA is added into VMA tree do not modify it after the insertion, so do not need the same locking. Cc: stable@vger.kernel.org Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-11mm: lock a vma before stack expansionSuren Baghdasaryan1-0/+4
commit c137381f71aec755fbf47cd4e9bd4dce752c054c upstream. With recent changes necessitating mmap_lock to be held for write while expanding a stack, per-VMA locks should follow the same rules and be write-locked to prevent page faults into the VMA being expanded. Add the necessary locking. Cc: stable@vger.kernel.org Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05Linux 6.4.2v6.4.2Greg Kroah-Hartman1-1/+1
Link: https://lore.kernel.org/r/20230703184519.261119397@linuxfoundation.org Link: https://lore.kernel.org/r/20230704084611.900603362@linuxfoundation.org Tested-by: Ronald Warsow <rwarsow@gmx.de> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Rudi Heitbaum <rudi@heitbaum.com> Tested-by: Markus Reichelt <lkt+2023@mareichelt.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05arch/arm64/mm/fault: Fix undeclared variable error in do_page_fault()SeongJae Park1-2/+0
commit 24be4d0b46bb0c3c1dc7bacd30957d6144a70dfc upstream. Commit ae870a68b5d1 ("arm64/mm: Convert to using lock_mm_and_find_vma()") made do_page_fault() to use 'vma' even if CONFIG_PER_VMA_LOCK is not defined, but the declaration is still in the ifdef. As a result, building kernel without the config fails with undeclared variable error as below: arch/arm64/mm/fault.c: In function 'do_page_fault': arch/arm64/mm/fault.c:624:2: error: 'vma' undeclared (first use in this function); did you mean 'vmap'? 624 | vma = lock_mm_and_find_vma(mm, addr, regs); | ^~~ | vmap Fix it by moving the declaration out of the ifdef. Fixes: ae870a68b5d1 ("arm64/mm: Convert to using lock_mm_and_find_vma()") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05drm/amdgpu: Validate VM ioctl flags.Bas Nieuwenhuizen1-0/+4
commit a2b308044dcaca8d3e580959a4f867a1d5c37fac upstream. None have been defined yet, so reject anybody setting any. Mesa sets it to 0 anyway. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05dm ioctl: Avoid double-fetch of versionDemi Marie Obenour1-12/+21
commit 249bed821b4db6d95a99160f7d6d236ea5fe6362 upstream. The version is fetched once in check_version(), which then does some validation and then overwrites the version in userspace with the API version supported by the kernel. copy_params() then fetches the version from userspace *again*, and this time no validation is done. The result is that the kernel's version number is completely controllable by userspace, provided that userspace can win a race condition. Fix this flaw by not copying the version back to the kernel the second time. This is not exploitable as the version is not further used in the kernel. However, it could become a problem if future patches start relying on the version field. Cc: stable@vger.kernel.org Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05docs: Set minimal gtags / GNU GLOBAL version to 6.6.5Ahmed S. Darwish1-0/+7
commit b230235b386589d8f0d631b1c77a95ca79bb0732 upstream. Kernel build now uses the gtags "-C (--directory)" option, available since GNU GLOBAL v6.6.5. Update the documentation accordingly. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lists.gnu.org/archive/html/info-global/2020-09/msg00000.html Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05scripts/tags.sh: Resolve gtags empty index generationAhmed S. Darwish1-1/+8
commit e1b37563caffc410bb4b55f153ccb14dede66815 upstream. gtags considers any file outside of its current working directory "outside the source tree" and refuses to index it. For O= kernel builds, or when "make" is invoked from a directory other then the kernel source tree, gtags ignores the entire kernel source and generates an empty index. Force-set gtags current working directory to the kernel source tree. Due to commit 9da0763bdd82 ("kbuild: Use relative path when building in a subdir of the source tree"), if the kernel build is done in a sub-directory of the kernel source tree, the kernel Makefile will set the kernel's $srctree to ".." for shorter compile-time and run-time warnings. Consequently, the list of files to be indexed will be in the "../*" form, rendering all such paths invalid once gtags switches to the kernel source tree as its current working directory. If gtags indexing is requested and the build directory is not the kernel source tree, index all files in absolute-path form. Note, indexing in absolute-path form will not affect the generated index, as paths in gtags indices are always relative to the gtags "root directory" anyway (as evidenced by "gtags --dump"). Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05hugetlb: revert use of page_cache_next_miss()Mike Kravetz2-11/+9
commit fd4aed8d985a3236d0877ff6d0c80ad39d4ce81a upstream. Ackerley Tng reported an issue with hugetlbfs fallocate as noted in the Closes tag. The issue showed up after the conversion of hugetlb page cache lookup code to use page_cache_next_miss. User visible effects are: - hugetlbfs fallocate incorrectly returns -EEXIST if pages are presnet in the file. - hugetlb pages will not be included in core dumps if they need to be brought in via GUP. - userfaultfd UFFDIO_COPY will not notice pages already present in the cache. It may try to allocate a new page and potentially return ENOMEM as opposed to EEXIST. Revert the use page_cache_next_miss() in hugetlb code. IMPORTANT NOTE FOR STABLE BACKPORTS: This patch will apply cleanly to v6.3. However, due to the change of filemap_get_folio() return values, it will not function correctly. This patch must be modified for stable backports. [dan.carpenter@linaro.org: fix hugetlbfs_pagecache_present()] Link: https://lkml.kernel.org/r/efa86091-6a2c-4064-8f55-9b44e1313015@moroto.mountain Link: https://lkml.kernel.org/r/20230621212403.174710-2-mike.kravetz@oracle.com Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reported-by: Ackerley Tng <ackerleytng@google.com> Closes: https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Erdem Aktas <erdemaktas@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Vishal Annapurve <vannapurve@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05nubus: Partially revert proc_create_single_data() conversionFinn Thain1-5/+17
commit 0e96647cff9224db564a1cee6efccb13dbe11ee2 upstream. The conversion to proc_create_single_data() introduced a regression whereby reading a file in /proc/bus/nubus results in a seg fault: # grep -r . /proc/bus/nubus/e/ Data read fault at 0x00000020 in Super Data (pc=0x1074c2) BAD KERNEL BUSERR Oops: 00000000 Modules linked in: PC: [<001074c2>] PDE_DATA+0xc/0x16 SR: 2010 SP: 38284958 a2: 01152370 d0: 00000001 d1: 01013000 d2: 01002790 d3: 00000000 d4: 00000001 d5: 0008ce2e a0: 00000000 a1: 00222a40 Process grep (pid: 45, task=142f8727) Frame format=B ssw=074d isc=2008 isb=4e5e daddr=00000020 dobuf=01199e70 baddr=001074c8 dibuf=ffffffff ver=f Stack from 01199e48: 01199e70 00222a58 01002790 00000000 011a3000 01199eb0 015000c0 00000000 00000000 01199ec0 01199ec0 000d551a 011a3000 00000001 00000000 00018000 d003f000 00000003 00000001 0002800d 01052840 01199fa8 c01f8000 00000000 00000029 0b532b80 00000000 00000000 00000029 0b532b80 01199ee4 00103640 011198c0 d003f000 00018000 01199fa8 00000000 011198c0 00000000 01199f4c 000b3344 011198c0 d003f000 00018000 01199fa8 00000000 00018000 011198c0 Call Trace: [<00222a58>] nubus_proc_rsrc_show+0x18/0xa0 [<000d551a>] seq_read+0xc4/0x510 [<00018000>] fp_fcos+0x2/0x82 [<0002800d>] __sys_setreuid+0x115/0x1c6 [<00103640>] proc_reg_read+0x5c/0xb0 [<00018000>] fp_fcos+0x2/0x82 [<000b3344>] __vfs_read+0x2c/0x13c [<00018000>] fp_fcos+0x2/0x82 [<00018000>] fp_fcos+0x2/0x82 [<000b8aa2>] sys_statx+0x60/0x7e [<000b34b6>] vfs_read+0x62/0x12a [<00018000>] fp_fcos+0x2/0x82 [<00018000>] fp_fcos+0x2/0x82 [<000b39c2>] ksys_read+0x48/0xbe [<00018000>] fp_fcos+0x2/0x82 [<000b3a4e>] sys_read+0x16/0x1a [<00018000>] fp_fcos+0x2/0x82 [<00002b84>] syscall+0x8/0xc [<00018000>] fp_fcos+0x2/0x82 [<0000c016>] not_ext+0xa/0x18 Code: 4e5e 4e75 4e56 0000 206e 0008 2068 ffe8 <2068> 0020 2008 4e5e 4e75 4e56 0000 2f0b 206e 0008 2068 0004 2668 0020 206b ffe8 Disabling lock debugging due to kernel taint Segmentation fault The proc_create_single_data() conversion does not work because single_open(file, nubus_proc_rsrc_show, PDE_DATA(inode)) is not equivalent to the original code. Fixes: 3f3942aca6da ("proc: introduce proc_create_single{,_data}") Cc: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org # 5.6+ Signed-off-by: Finn Thain <fthain@linux-m68k.org> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/d4e2a586e793cc8d9442595684ab8a077c0fe726.1678783919.git.fthain@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05Revert "cxl/port: Enable the HDM decoder capability for switch ports"Dan Williams5-49/+9
commit 8f0220af58c3b73e9041377a23708d37600b33c1 upstream. commit eb0764b822b9 ("cxl/port: Enable the HDM decoder capability for switch ports") ...was added on the observation of CXL memory not being accessible after setting up a region on a "cold-plugged" device. A "cold-plugged" CXL device is one that was not present at boot, so platform-firmware/BIOS has no chance to set it up. While it is true that the debug found the enable bit clear in the host-bridge's instance of the global control register (CXL 3.0 8.2.4.19.2 CXL HDM Decoder Global Control Register), that bit is described as: "This bit is only applicable to CXL.mem devices and shall return 0 on CXL Host Bridges and Upstream Switch Ports." So it is meant to be zero, and further testing confirmed that this "fix" had no effect on the failure. Revert it, and be more vigilant about proposed fixes in the future. Since the original copied stable@, flag this revert for stable@ as well. Cc: <stable@vger.kernel.org> Fixes: eb0764b822b9 ("cxl/port: Enable the HDM decoder capability for switch ports") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168685882012.3475336.16733084892658264991.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05nfs: don't report STATX_BTIME in ->getattrJeff Layton1-1/+1
commit cded49ba366220ae7009d71c5804baa01acfb860 upstream. NFS doesn't properly support reporting the btime in getattr (yet), but 61a968b4f05e mistakenly added it to the request_mask. This causes statx for STATX_BTIME to report a zeroed out btime instead of properly clearing the flag. Cc: stable@vger.kernel.org # v6.3+ Fixes: 61a968b4f05e ("nfs: report the inode version in getattr if requested") Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2214134 Reported-by: Boyang Xue <bxue@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05execve: always mark stack as growing down during early stack setupLinus Torvalds1-1/+3
commit f66066bc5136f25e36a2daff4896c768f18c211e upstream. While our user stacks can grow either down (all common architectures) or up (parisc and the ia64 register stack), the initial stack setup when we copy the argument and environment strings to the new stack at execve() time is always done by extending the stack downwards. But it turns out that in commit 8d7071af8907 ("mm: always expand the stack with the mmap write lock held"), as part of making the stack growing code more robust, 'expand_downwards()' was now made to actually check the vma flags: if (!(vma->vm_flags & VM_GROWSDOWN)) return -EFAULT; and that meant that this execve-time stack expansion started failing on parisc, because on that architecture, the stack flags do not contain the VM_GROWSDOWN bit. At the same time the new check in expand_downwards() is clearly correct, and simplified the callers, so let's not remove it. The solution is instead to just codify the fact that yes, during execve(), the stack grows down. This not only matches reality, it ends up being particularly simple: we already have special execve-time flags for the stack (VM_STACK_INCOMPLETE_SETUP) and use those flags to avoid page migration during this setup time (see vma_is_temporary_stack() and invalid_migration_vma()). So just add VM_GROWSDOWN to that set of temporary flags, and now our stack flags automatically match reality, and the parisc stack expansion works again. Note that the VM_STACK_INCOMPLETE_SETUP bits will be cleared when the stack is finalized, so we only add the extra VM_GROWSDOWN bit on CONFIG_STACK_GROWSUP architectures (ie parisc) rather than adding it in general. Link: https://lore.kernel.org/all/612eaa53-6904-6e16-67fc-394f4faa0e16@bell.net/ Link: https://lore.kernel.org/all/5fd98a09-4792-1433-752d-029ae3545168@gmx.de/ Fixes: 8d7071af8907 ("mm: always expand the stack with the mmap write lock held") Reported-by: John David Anglin <dave.anglin@bell.net> Reported-and-tested-by: Helge Deller <deller@gmx.de> Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05PCI/ACPI: Call _REG when transitioning D-statesMario Limonciello1-0/+22
commit 112a7f9c8edbf76f7cb83856a6cb6b60a210b659 upstream. ACPI r6.5, sec 6.5.4, describes how AML is unable to access an OperationRegion unless _REG has been called to connect a handler: The OS runs _REG control methods to inform AML code of a change in the availability of an operation region. When an operation region handler is unavailable, AML cannot access data fields in that region. (Operation region writes will be ignored and reads will return indeterminate data.) The PCI core does not call _REG at any time, leading to the undefined behavior mentioned in the spec. The spec explains that _REG should be executed to indicate whether a given region can be accessed: Once _REG has been executed for a particular operation region, indicating that the operation region handler is ready, a control method can access fields in the operation region. Conversely, control methods must not access fields in operation regions when _REG method execution has not indicated that the operation region handler is ready. An example included in the spec demonstrates calling _REG when devices are turned off: "when the host controller or bridge controller is turned off or disabled, PCI Config Space Operation Regions for child devices are no longer available. As such, ETH0’s _REG method will be run when it is turned off and will again be run when PCI1 is turned off." It is reported that ASMedia PCIe GPIO controllers fail functional tests after the system has returning from suspend (S3 or s2idle). This is because the BIOS checks whether the OSPM has called the _REG method to determine whether it can interact with the OperationRegion assigned to the device as part of the other AML called for the device. To fix this issue, call acpi_evaluate_reg() when devices are transitioning to D3cold or D0. [bhelgaas: split pci_power_t checking to preliminary patch] Link: https://uefi.org/specs/ACPI/6.5/06_Device_Configuration.html#reg-region Link: https://lore.kernel.org/r/20230620140451.21007-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05PCI/ACPI: Validate acpi_pci_set_power_state() parameterBjorn Helgaas1-13/+18
commit 5557b62634abbd55bab7b154ce4bca348ad7f96f upstream. Previously acpi_pci_set_power_state() assumed the requested power state was valid (PCI_D0 ... PCI_D3cold). If a caller supplied something else, we could index outside the state_conv[] array and pass junk to acpi_device_set_power(). Validate the pci_power_t parameter and return -EINVAL if it's invalid. Link: https://lore.kernel.org/r/20230621222857.GA122930@bhelgaas Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05tools/nolibc: x86_64: disable stack protector for _startThomas Weißschuh1-1/+1
commit 7a9b2345202a14dfec9081994486156f7a691513 upstream. This was forgotten in the original submission. It is unknown why it worked for x86_64 on some compiler without this attribute. Reported-by: Willy Tarreau <w@1wt.eu> Closes: https://lore.kernel.org/lkml/20230520133237.GA27501@1wt.eu/ Fixes: 0d8c461adbc4 ("tools/nolibc: x86_64: add stackprotector support") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-05xtensa: fix lock_mm_and_find_vma in case VMA not foundMax Filippov1-1/+6
commit 03f889378f33aa9a9d8e5f49ba94134cf6158090 upstream. MMU version of lock_mm_and_find_vma releases the mm lock before returning when VMA is not found. Do the same in noMMU version. This fixes hang on an attempt to handle protection fault. Fixes: d85a143b69ab ("xtensa: fix NOMMU build with lock_mm_and_find_vma() conversion") Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01Linux 6.4.1v6.4.1Greg Kroah-Hartman1-1/+1
Link: https://lore.kernel.org/r/20230629184151.888604958@linuxfoundation.org Tested-by: Ronald Warsow <rwarsow@gmx.de> Link: https://lore.kernel.org/r/20230630055626.202608973@linuxfoundation.org Link: https://lore.kernel.org/r/20230630072101.040486316@linuxfoundation.org Tested-by: Ron Economos <re@w6rz.net> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Ronald Warsow <rwarsow@gmx.de> Tested-by: Rudi Heitbaum <rudi@heitbaum.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01xtensa: fix NOMMU build with lock_mm_and_find_vma() conversionLinus Torvalds2-2/+14
commit d85a143b69abb4d7544227e26d12c4c7735ab27d upstream. It turns out that xtensa has a really odd configuration situation: you can do a no-MMU config, but still have the page fault code enabled. Which doesn't sound all that sensible, but it turns out that xtensa can have protection faults even without the MMU, and we have this: config PFAULT bool "Handle protection faults" if EXPERT && !MMU default y help Handle protection faults. MMU configurations must enable it. noMMU configurations may disable it if used memory map never generates protection faults or faults are always fatal. If unsure, say Y. which completely violated my expectations of the page fault handling. End result: Guenter reports that the xtensa no-MMU builds all fail with arch/xtensa/mm/fault.c: In function ‘do_page_fault’: arch/xtensa/mm/fault.c:133:8: error: implicit declaration of function ‘lock_mm_and_find_vma’ because I never exposed the new lock_mm_and_find_vma() function for the no-MMU case. Doing so is simple enough, and fixes the problem. Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Fixes: a050ba1e7422 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01csky: fix up lock_mm_and_find_vma() conversionLinus Torvalds1-1/+1
commit e55e5df193d247a38a5e1ac65a5316a0adcc22fa upstream. As already mentioned in my merge message for the 'expand-stack' branch, we have something like 24 different versions of the page fault path for all our different architectures, all just _slightly_ different due to various historical reasons (usually related to exactly when they branched off the original i386 version, and the details of the other architectures they had in their history). And a few of them had some silly mistake in the conversion. Most of the architectures call the faulting address 'address' in the fault path. But not all. Some just call it 'addr'. And if you end up doing a bit too much copy-and-paste, you end up with the wrong version in the places that do it differently. In this case it was csky. Fixes: a050ba1e7422 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01parisc: fix expand_stack() conversionLinus Torvalds1-1/+1
commit ea3f8272876f2958463992f6736ab690fde7fa9c upstream. In commit 8d7071af8907 ("mm: always expand the stack with the mmap write lock held") I tried to deal with the remaining odd page fault handling cases. The oddest one is ia64, which has stacks that grow both up and down. And because ia64 was _so_ odd, I asked people to verify the end result. But a close second oddity is parisc, which is the only one that has a main stack growing up (our "CONFIG_STACK_GROWSUP" config option). But it looked obvious enough that I didn't worry about it. I should have worried a bit more. Not because it was particularly complex, but because I just used the wrong variable name. The previous vma isn't called "prev", it's called "prev_vma". Blush. Fixes: 8d7071af8907 ("mm: always expand the stack with the mmap write lock held") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01sparc32: fix lock_mm_and_find_vma() conversionLinus Torvalds1-1/+1
commit 0b26eadbf200abf6c97c6d870286c73219cdac65 upstream. The sparc32 conversion to lock_mm_and_find_vma() in commit a050ba1e7422 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()") missed the fact that we didn't actually have a 'regs' pointer available in the 'force_user_fault()' case. It's there in the regular page fault path ("do_sparc_fault()"), but not the window underflow/overflow paths. Which is all fine - we can just pass in a NULL pointer. The register state is only used to avoid deadlock with kernel faults, which is not the case for any of these register window faults. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: a050ba1e7422 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01Revert "thermal/drivers/mediatek: Use devm_of_iomap to avoid resource leak ↵Ricardo Cañuelo1-12/+2
in mtk_thermal_probe" commit 86edac7d3888c715fe3a81bd61f3617ecfe2e1dd upstream. This reverts commit f05c7b7d9ea9477fcc388476c6f4ade8c66d2d26. That change was causing a regression in the generic-adc-thermal-probed bootrr test as reported in the kernelci-results list [1]. A proper rework will take longer, so revert it for now. [1] https://groups.io/g/kernelci-results/message/42660 Fixes: f05c7b7d9ea9 ("thermal/drivers/mediatek: Use devm_of_iomap to avoid resource leak in mtk_thermal_probe") Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com> Suggested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20230525121811.3360268-1-ricardo.canuelo@collabora.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01HID: logitech-hidpp: add HIDPP_QUIRK_DELAYED_INIT for the T651.Mike Hommey1-1/+1
commit 5fe251112646d8626818ea90f7af325bab243efa upstream. commit 498ba2069035 ("HID: logitech-hidpp: Don't restart communication if not necessary") put restarting communication behind that flag, and this was apparently necessary on the T651, but the flag was not set for it. Fixes: 498ba2069035 ("HID: logitech-hidpp: Don't restart communication if not necessary") Cc: stable@vger.kernel.org Signed-off-by: Mike Hommey <mh@glandium.org> Link: https://lore.kernel.org/r/20230617230957.6mx73th4blv7owqk@glandium.org Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01HID: hidraw: fix data race on device refcountLudvig Michaelsson1-2/+7
commit 944ee77dc6ec7b0afd8ec70ffc418b238c92f12b upstream. The hidraw_open() function increments the hidraw device reference counter. The counter has no dedicated synchronization mechanism, resulting in a potential data race when concurrently opening a device. The race is a regression introduced by commit 8590222e4b02 ("HID: hidraw: Replace hidraw device table mutex with a rwsem"). While minors_rwsem is intended to protect the hidraw_table itself, by instead acquiring the lock for writing, the reference counter is also protected. This is symmetrical to hidraw_release(). Link: https://github.com/systemd/systemd/issues/27947 Fixes: 8590222e4b02 ("HID: hidraw: Replace hidraw device table mutex with a rwsem") Cc: stable@vger.kernel.org Signed-off-by: Ludvig Michaelsson <ludvig.michaelsson@yubico.com> Link: https://lore.kernel.org/r/20230621-hidraw-race-v1-1-a58e6ac69bab@yubico.com Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01fbdev: fix potential OOB read in fast_imageblit()Zhang Shurong1-1/+1
commit c2d22806aecb24e2de55c30a06e5d6eb297d161d upstream. There is a potential OOB read at fast_imageblit, for "colortab[(*src >> 4)]" can become a negative value due to "const char *s = image->data, *src". This change makes sure the index for colortab always positive or zero. Similar commit: https://patchwork.kernel.org/patch/11746067 Potential bug report: https://groups.google.com/g/syzkaller-bugs/c/9ubBXKeKXf4/m/k-QXy4UgAAAJ Signed-off-by: Zhang Shurong <zhang_shurong@foxmail.com> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01mm/khugepaged: fix regression in collapse_file()Hugh Dickins1-4/+3
commit e8c716bc6812202ccf4ce0f0bad3428b794fb39c upstream. There is no xas_pause(&xas) in collapse_file()'s main loop, at the points where it does xas_unlock_irq(&xas) and then continues. That would explain why, once two weeks ago and twice yesterday, I have hit the VM_BUG_ON_PAGE(page != xas_load(&xas), page) since "mm/khugepaged: fix iteration in collapse_file" removed the xas_set(&xas, index) just before it: xas.xa_node could be left pointing to a stale node, if there was concurrent activity on the file which transformed its xarray. I tried inserting xas_pause()s, but then even bootup crashed on that VM_BUG_ON_PAGE(): there appears to be a subtle "nextness" implicit in xas_pause(). xas_next() and xas_pause() are good for use in simple loops, but not in this one: xas_set() worked well until now, so use xas_set(&xas, index) explicitly at the head of the loop; and change that VM_BUG_ON_PAGE() not to need its own xas_set(), and not to interfere with the xa_state (which would probably stop the crashes from xas_pause(), but I trust that less). The user-visible effects of this bug (if VM_BUG_ONs are configured out) would be data loss and data leak - potentially - though in practice I expect it is more likely that a subsequent check (e.g. on mapping or on nr_none) would notice an inconsistency, and just abandon the collapse. Link: https://lore.kernel.org/linux-mm/f18e4b64-3f88-a8ab-56cc-d1f5f9c58d4@google.com/ Fixes: c8a8f3b4a95a ("mm/khugepaged: fix iteration in collapse_file") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Stevens <stevensd@chromium.org> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01gup: add warning if some caller would seem to want stack expansionLinus Torvalds1-2/+10
commit a425ac5365f6cb3cc47bf83e6bff0213c10445f7 upstream. It feels very unlikely that anybody would want to do a GUP in an unmapped area under the stack pointer, but real users sometimes do some really strange things. So add a (temporary) warning for the case where a GUP fails and expanding the stack might have made it work. It's trivial to do the expansion in the caller as part of getting the mm lock in the first place - see __access_remote_vm() for ptrace, for example - it's just that it's unnecessarily painful to do it deep in the guts of the GUP lookup when we might have to drop and re-take the lock. I doubt anybody actually does anything quite this strange, but let's be proactive: adding these warnings is simple, and will make debugging it much easier if they trigger. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01HID: wacom: Use ktime_t rather than int when dealing with timestampsJason Gerecke2-4/+4
commit 9a6c0e28e215535b2938c61ded54603b4e5814c5 upstream. Code which interacts with timestamps needs to use the ktime_t type returned by functions like ktime_get. The int type does not offer enough space to store these values, and attempting to use it is a recipe for problems. In this particular case, overflows would occur when calculating/storing timestamps leading to incorrect values being reported to userspace. In some cases these bad timestamps cause input handling in userspace to appear hung. Link: https://gitlab.freedesktop.org/libinput/libinput/-/issues/901 Fixes: 17d793f3ed53 ("HID: wacom: insert timestamp to packed Bluetooth (BT) events") CC: stable@vger.kernel.org Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20230608213828.2108-1-jason.gerecke@wacom.com Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01mm: always expand the stack with the mmap write lock heldLinus Torvalds17-116/+169
commit 8d7071af890768438c14db6172cc8f9f4d04e184 upstream. This finishes the job of always holding the mmap write lock when extending the user stack vma, and removes the 'write_locked' argument from the vm helper functions again. For some cases, we just avoid expanding the stack at all: drivers and page pinning really shouldn't be extending any stacks. Let's see if any strange users really wanted that. It's worth noting that architectures that weren't converted to the new lock_mm_and_find_vma() helper function are left using the legacy "expand_stack()" function, but it has been changed to drop the mmap_lock and take it for writing while expanding the vma. This makes it fairly straightforward to convert the remaining architectures. As a result of dropping and re-taking the lock, the calling conventions for this function have also changed, since the old vma may no longer be valid. So it will now return the new vma if successful, and NULL - and the lock dropped - if the area could not be extended. Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # ia64 Tested-by: Frank Scheiner <frank.scheiner@web.de> # ia64 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01execve: expand new process stack manually ahead of timeLinus Torvalds1-16/+21
commit f313c51d26aa87e69633c9b46efb37a930faca71 upstream. This is a small step towards a model where GUP itself would not expand the stack, and any user that needs GUP to not look up existing mappings, but actually expand on them, would have to do so manually before-hand, and with the mm lock held for writing. It turns out that execve() already did almost exactly that, except it didn't take the mm lock at all (it's single-threaded so no locking technically needed, but it could cause lockdep errors). And it only did it for the CONFIG_STACK_GROWSUP case, since in that case GUP has obviously never expanded the stack downwards. So just make that CONFIG_STACK_GROWSUP case do the right thing with locking, and enable it generally. This will eventually help GUP, and in the meantime avoids a special case and the lockdep issue. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01mm: make find_extend_vma() fail if write lock not heldLiam R. Howlett6-27/+49
commit f440fa1ac955e2898893f9301568435eb5cdfc4b upstream. Make calls to extend_vma() and find_extend_vma() fail if the write lock is required. To avoid making this a flag-day event, this still allows the old read-locking case for the trivial situations, and passes in a flag to say "is it write-locked". That way write-lockers can say "yes, I'm being careful", and legacy users will continue to work in all the common cases until they have been fully converted to the new world order. Co-Developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma()Linus Torvalds1-11/+3
commit 2cd76c50d0b41cec5c87abfcdf25b236a2793fb6 upstream. This is one of the simple cases, except there's no pt_regs pointer. Which is fine, as lock_mm_and_find_vma() is set up to work fine with a NULL pt_regs. Powerpc already enabled LOCK_MM_AND_FIND_VMA for the main CPU faulting, so we can just use the helper without any extra work. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01mm/fault: convert remaining simple cases to lock_mm_and_find_vma()Linus Torvalds18-124/+45
commit a050ba1e7422f2cc60ff8bfde3f96d34d00cb585 upstream. This does the simple pattern conversion of alpha, arc, csky, hexagon, loongarch, nios2, sh, sparc32, and xtensa to the lock_mm_and_find_vma() helper. They all have the regular fault handling pattern without odd special cases. The remaining architectures all have something that keeps us from a straightforward conversion: ia64 and parisc have stacks that can grow both up as well as down (and ia64 has special address region checks). And m68k, microblaze, openrisc, sparc64, and um end up having extra rules about only expanding the stack down a limited amount below the user space stack pointer. That is something that x86 used to do too (long long ago), and it probably could just be skipped, but it still makes the conversion less than trivial. Note that this conversion was done manually and with the exception of alpha without any build testing, because I have a fairly limited cross- building environment. The cases are all simple, and I went through the changes several times, but... Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01arm/mm: Convert to using lock_mm_and_find_vma()Ben Hutchings3-50/+16
commit 8b35ca3e45e35a26a21427f35d4093606e93ad0a upstream. arm has an additional check for address < FIRST_USER_ADDRESS before expanding the stack. Since FIRST_USER_ADDRESS is defined everywhere (generally as 0), move that check to the generic expand_downwards(). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01riscv/mm: Convert to using lock_mm_and_find_vma()Ben Hutchings2-18/+14
commit 7267ef7b0b77f4ed23b7b3c87d8eca7bd9c2d007 upstream. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01mips/mm: Convert to using lock_mm_and_find_vma()Ben Hutchings2-10/+3
commit 4bce37a68ff884e821a02a731897a8119e0c37b7 upstream. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01powerpc/mm: Convert to using lock_mm_and_find_vma()Michael Ellerman2-36/+4
commit e6fe228c4ffafdfc970cf6d46883a1f481baf7ea upstream. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01arm64/mm: Convert to using lock_mm_and_find_vma()Linus Torvalds2-39/+9
commit ae870a68b5d13d67cf4f18d47bb01ee3fee40acb upstream. This converts arm64 to use the new page fault helper. It was very straightforward, but still needed a fix for the "obvious" conversion I initially did. Thanks to Suren for the fix and testing. Fixed-and-tested-by: Suren Baghdasaryan <surenb@google.com> Unnecessary-code-removal-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01mm: make the page fault mmap locking killableLinus Torvalds1-4/+2
commit eda0047296a16d65a7f2bc60a408f70d178b2014 upstream. This is done as a separate patch from introducing the new lock_mm_and_find_vma() helper, because while it's an obvious change, it's not what x86 used to do in this area. We already abort the page fault on fatal signals anyway, so why should we wait for the mmap lock only to then abort later? With the new helper function that returns without the lock held on failure anyway, this is particularly easy and straightforward. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01mm: introduce new 'lock_mm_and_find_vma()' page fault helperLinus Torvalds5-50/+130
commit c2508ec5a58db67093f4fb8bf89a9a7c53a109e9 upstream. .. and make x86 use it. This basically extracts the existing x86 "find and expand faulting vma" code, but extends it to also take the mmap lock for writing in case we actually do need to expand the vma. We've historically short-circuited that case, and have some rather ugly special logic to serialize the stack segment expansion (since we only hold the mmap lock for reading) that doesn't match the normal VM locking. That slight violation of locking worked well, right up until it didn't: the maple tree code really does want proper locking even for simple extension of an existing vma. So extract the code for "look up the vma of the fault" from x86, fix it up to do the necessary write locking, and make it available as a helper function for other architectures that can use the common helper. Note: I say "common helper", but it really only handles the normal stack-grows-down case. Which is all architectures except for PA-RISC and IA64. So some rare architectures can't use the helper, but if they care they'll just need to open-code this logic. It's also worth pointing out that this code really would like to have an optimistic "mmap_upgrade_trylock()" to make it quicker to go from a read-lock (for the common case) to taking the write lock (for having to extend the vma) in the normal single-threaded situation where there is no other locking activity. But that _is_ all the very uncommon special case, so while it would be nice to have such an operation, it probably doesn't matter in reality. I did put in the skeleton code for such a possible future expansion, even if it only acts as pseudo-documentation for what we're doing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01maple_tree: fix potential out-of-bounds access in mas_wr_end_piv()Peng Zhang1-5/+6
commit cd00dd2585c4158e81fdfac0bbcc0446afbad26d upstream. Check the write offset end bounds before using it as the offset into the pivot array. This avoids a possible out-of-bounds access on the pivot array if the write extends to the last slot in the node, in which case the node maximum should be used as the end pivot. akpm: this doesn't affect any current callers, but new users of mapletree may encounter this problem if backported into earlier kernels, so let's fix it in -stable kernels in case of this. Link: https://lkml.kernel.org/r/20230506024752.2550-1-zhangpeng.00@bytedance.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01can: isotp: isotp_sendmsg(): fix return error fix on TX pathOliver Hartkopp1-2/+3
commit e38910c0072b541a91954682c8b074a93e57c09b upstream. With commit d674a8f123b4 ("can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path") the missing correct return value in the case of a protocol error was introduced. But the way the error value has been read and sent to the user space does not follow the common scheme to clear the error after reading which is provided by the sock_error() function. This leads to an error report at the following write() attempt although everything should be working. Fixes: d674a8f123b4 ("can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path") Reported-by: Carsten Schmidt <carsten.schmidt-achim@t-online.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230607072708.38809-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01cpufreq: amd-pstate: Make amd-pstate EPP driver name hyphenatedWyes Karny1-1/+1
commit f4aad639302a07454dcb23b408dcadf8a9efb031 upstream. amd-pstate passive mode driver is hyphenated. So make amd-pstate active mode driver consistent with that rename "amd_pstate_epp" to "amd-pstate-epp". Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors") Cc: All applicable <stable@vger.kernel.org> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Wyes Karny <wyes.karny@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-01x86/smp: Cure kexec() vs. mwait_play_dead() breakageThomas Gleixner3-0/+66
commit d7893093a7417527c0d73c9832244e65c9d0114f upstream. TLDR: It's a mess. When kexec() is executed on a system with offline CPUs, which are parked in mwait_play_dead() it can end up in a triple fault during the bootup of the kexec kernel or cause hard to diagnose data corruption. The reason is that kexec() eventually overwrites the previous kernel's text, page tables, data and stack. If it writes to the cache line which is monitored by a previously offlined CPU, MWAIT resumes execution and ends up executing the wrong text, dereferencing overwritten page tables or corrupting the kexec kernels data. Cure this by bringing the offlined CPUs out of MWAIT into HLT. Write to the monitored cache line of each offline CPU, which makes MWAIT resume execution. The written control word tells the offlined CPUs to issue HLT, which does not have the MWAIT problem. That does not help, if a stray NMI, MCE or SMI hits the offlined CPUs as those make it come out of HLT. A follow up change will put them into INIT, which protects at least against NMI and SMI. Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case") Reported-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>