summaryrefslogtreecommitdiff
path: root/arch/x86/entry/vdso
AgeCommit message (Collapse)AuthorFilesLines
2024-06-12kbuild: unify vdso_install rulesMasahiro Yamada1-27/+0
[ Upstream commit 56769ba4b297a629148eb24d554aef72d1ddfd9e ] Currently, there is no standard implementation for vdso_install, leading to various issues: 1. Code duplication Many architectures duplicate similar code just for copying files to the install destination. Some architectures (arm, sparc, x86) create build-id symlinks, introducing more code duplication. 2. Unintended updates of in-tree build artifacts The vdso_install rule depends on the vdso files to install. It may update in-tree build artifacts. This can be problematic, as explained in commit 19514fc665ff ("arm, kbuild: make "make install" not depend on vmlinux"). 3. Broken code in some architectures Makefile code is often copied from one architecture to another without proper adaptation. 'make vdso_install' for parisc does not work. 'make vdso_install' for s390 installs vdso64, but not vdso32. To address these problems, this commit introduces a generic vdso_install rule. Architectures that support vdso_install need to define vdso-install-y in arch/*/Makefile. vdso-install-y lists the files to install. For example, arch/x86/Makefile looks like this: vdso-install-$(CONFIG_X86_64) += arch/x86/entry/vdso/vdso64.so.dbg vdso-install-$(CONFIG_X86_X32_ABI) += arch/x86/entry/vdso/vdsox32.so.dbg vdso-install-$(CONFIG_X86_32) += arch/x86/entry/vdso/vdso32.so.dbg vdso-install-$(CONFIG_IA32_EMULATION) += arch/x86/entry/vdso/vdso32.so.dbg These files will be installed to $(MODLIB)/vdso/ with the .dbg suffix, if exists, stripped away. vdso-install-y can optionally take the second field after the colon separator. This is needed because some architectures install a vdso file as a different base name. The following is a snippet from arch/arm64/Makefile. vdso-install-$(CONFIG_COMPAT_VDSO) += arch/arm64/kernel/vdso32/vdso.so.dbg:vdso32.so This will rename vdso.so.dbg to vdso32.so during installation. If such architectures change their implementation so that the base names match, this workaround will go away. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Sven Schnelle <svens@linux.ibm.com> # s390 Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Reviewed-by: Guo Ren <guoren@kernel.org> Acked-by: Helge Deller <deller@gmx.de> # parisc Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Stable-dep-of: fc2f5f10f9bc ("s390/vdso: Create .build-id links for unstripped vdso files") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-09x86/mm: Fix VDSO and VVAR placement on 5-level paging machinesKirill A. Shutemov1-2/+2
Yingcong has noticed that on the 5-level paging machine, VDSO and VVAR VMAs are placed above the 47-bit border: 8000001a9000-8000001ad000 r--p 00000000 00:00 0 [vvar] 8000001ad000-8000001af000 r-xp 00000000 00:00 0 [vdso] This might confuse users who are not aware of 5-level paging and expect all userspace addresses to be under the 47-bit border. So far problem has only been triggered with ASLR disabled, although it may also occur with ASLR enabled if the layout is randomized in a just right way. The problem happens due to custom placement for the VMAs in the VDSO code: vdso_addr() tries to place them above the stack and checks the result against TASK_SIZE_MAX, which is wrong. TASK_SIZE_MAX is set to the 56-bit border on 5-level paging machines. Use DEFAULT_MAP_WINDOW instead. Fixes: b569bab78d8d ("x86/mm: Prepare to expose larger address space to userspace") Reported-by: Yingcong Wu <yingcong.wu@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20230803151609.22141-1-kirill.shutemov%40linux.intel.com
2023-05-18x86/vdso: Include vdso/processor.hArnd Bergmann1-0/+1
__vdso_getcpu is declared in a header but this is not included before the definition, causing a W=1 warning: arch/x86/entry/vdso/vgetcpu.c:13:1: error: no previous prototype for '__vdso_getcpu' [-Werror=missing-prototypes] arch/x86/entry/vdso/vdso32/../vgetcpu.c:13:1: error: no previous prototype for '__vdso_getcpu' [-Werror=missing-prototypes] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/all/20230516193549.544673-17-arnd%40kernel.org
2023-04-28Merge tag 'x86_cleanups_for_v6.4_rc1' of ↵Linus Torvalds1-10/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - Unify duplicated __pa() and __va() definitions - Simplify sysctl tables registration - Remove unused symbols - Correct function name in comment * tag 'x86_cleanups_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Centralize __pa()/__va() definitions x86: Simplify one-level sysctl registration for itmt_kern_table x86: Simplify one-level sysctl registration for abi_table2 x86/platform/intel-mid: Remove unused definitions from intel-mid.h x86/uaccess: Remove memcpy_page_flushcache() x86/entry: Change stale function name in comment to error_return()
2023-03-22x86: Simplify one-level sysctl registration for abi_table2Luis Chamberlain1-10/+1
There is no need to declare an extra tables to just create directory, this can be easily be done with a prefix path with register_sysctl(). Simplify this registration. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20230310233248.3965389-2-mcgrof%40kernel.org
2023-03-21vdso: Improve cmd_vdso_check to check all dynamic relocationsFangrui Song1-4/+1
The actual intention is that no dynamic relocation exists in the VDSO. For this the VDSO build validates that the resulting .so file does not have any relocations which are specified via $(ARCH_REL_TYPE_ABS) per architecture, which is fragile as e.g. ARM64 lacks an entry for R_AARCH64_RELATIVE. Aside of that ARCH_REL_TYPE_ABS is a misnomer as it checks for relative relocations too. However, some GNU ld ports produce unneeded R_*_NONE relocation entries. If a port fails to determine the exact .rel[a].dyn size, the trailing zeros become R_*_NONE relocations. E.g. ld's powerpc port recently fixed https://sourceware.org/bugzilla/show_bug.cgi?id=29540). R_*_NONE are generally a no-op in the dynamic loaders. So just ignore them. Remove the ARCH_REL_TYPE_ABS defines and just validate that the resulting .so file does not contain any R_* relocation entries except R_*_NONE. Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for aarch64 Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for vDSO, aarch64 Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lore.kernel.org/r/20230310190750.3323802-1-maskray@google.com
2023-02-24Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds1-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-07x86/vdso: Fake 32bit VDSO build on 64bit compile for vgetcpuSebastian Andrzej Siewior3-26/+27
The 64bit register constrains in __arch_hweight64() cannot be fulfilled in a 32-bit build. The function is only declared but not used within vclock_gettime.c and gcc does not care. LLVM complains and aborts. Reportedly because it validates extended asm even if latter would get compiled out, see https://lore.kernel.org/r/Y%2BJ%2BUQ1vAKr6RHuH@dev-arch.thelio-3990X i.e., a long standing design difference between gcc and LLVM. Move the "fake a 32 bit kernel configuration" bits from vclock_gettime.c into a common header file. Use this from vclock_gettime.c and vgetcpu.c. [ bp: Add background info from Nathan. ] Fixes: 92d33063c081a ("x86/vdso: Provide getcpu for x86-32.") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/Y+IsCWQdXEr8d9Vy@linutronix.de
2023-02-06x86/vdso: Provide getcpu for x86-32.Sebastian Andrzej Siewior4-3/+6
Wire up __vdso_getcpu() for x86-32. The 64bit version is reused with trivial modifications. Contrary to vclock_gettime.c there is no requirement to fake any defines in the case of 32bit VDSO on a 64bit kernel because the GDT entry from which the CPU and node information is read is always the native one. Adopt vdso_getcpu.c by: - removing the unneeded time* header files which lead to compile errors for 32bit. - adding segment.h which provides vdso_read_cpunode() and the defines required by it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221125094216.3663444-3-bigeasy@linutronix.de
2023-01-25x86/vdso: Move VDSO image init to vdso2c generated codeBrian Gerst3-24/+10
Generate an init function for each VDSO image, replacing init_vdso() and sysenter_setup(). Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230124184019.26850-1-brgerst@gmail.com
2023-01-19mm: remove zap_page_range and create zap_vma_pagesMike Kravetz1-3/+1
zap_page_range was originally designed to unmap pages within an address range that could span multiple vmas. While working on [1], it was discovered that all callers of zap_page_range pass a range entirely within a single vma. In addition, the mmu notification call within zap_page range does not correctly handle ranges that span multiple vmas. When crossing a vma boundary, a new mmu_notifier_range_init/end call pair with the new vma should be made. Instead of fixing zap_page_range, do the following: - Create a new routine zap_vma_pages() that will remove all pages within the passed vma. Most users of zap_page_range pass the entire vma and can use this new routine. - For callers of zap_page_range not passing the entire vma, instead call zap_page_range_single(). - Remove zap_page_range. [1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/ Link: https://lkml.kernel.org/r/20230104002732.232573-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Suggested-by: Peter Xu <peterx@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-15Merge tag 'x86_core_for_v6.2' of ↵Linus Torvalds1-6/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Add the call depth tracking mitigation for Retbleed which has been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups * tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) x86/paravirt: Use common macro for creating simple asm paravirt functions x86/paravirt: Remove clobber bitmask from .parainstructions x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit x86/Kconfig: Enable kernel IBT by default x86,pm: Force out-of-line memcpy() objtool: Fix weak hole vs prefix symbol objtool: Optimize elf_dirty_reloc_sym() x86/cfi: Add boot time hash randomization x86/cfi: Boot time selection of CFI scheme x86/ibt: Implement FineIBT objtool: Add --cfi to generate the .cfi_sites section x86: Add prefix symbols for function padding objtool: Add option to generate prefix symbols objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf objtool: Slice up elf_create_section_symbol() kallsyms: Revert "Take callthunks into account" x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces x86/retpoline: Fix crash printing warning x86/paravirt: Fix a !PARAVIRT build warning ...
2022-12-13Merge tag 'random-6.2-rc1-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: - Replace prandom_u32_max() and various open-coded variants of it, there is now a new family of functions that uses fast rejection sampling to choose properly uniformly random numbers within an interval: get_random_u32_below(ceil) - [0, ceil) get_random_u32_above(floor) - (floor, U32_MAX] get_random_u32_inclusive(floor, ceil) - [floor, ceil] Coccinelle was used to convert all current users of prandom_u32_max(), as well as many open-coded patterns, resulting in improvements throughout the tree. I'll have a "late" 6.1-rc1 pull for you that removes the now unused prandom_u32_max() function, just in case any other trees add a new use case of it that needs to converted. According to linux-next, there may be two trivial cases of prandom_u32_max() reintroductions that are fixable with a 's/.../.../'. So I'll have for you a final conversion patch doing that alongside the removal patch during the second week. This is a treewide change that touches many files throughout. - More consistent use of get_random_canary(). - Updates to comments, documentation, tests, headers, and simplification in configuration. - The arch_get_random*_early() abstraction was only used by arm64 and wasn't entirely useful, so this has been replaced by code that works in all relevant contexts. - The kernel will use and manage random seeds in non-volatile EFI variables, refreshing a variable with a fresh seed when the RNG is initialized. The RNG GUID namespace is then hidden from efivarfs to prevent accidental leakage. These changes are split into random.c infrastructure code used in the EFI subsystem, in this pull request, and related support inside of EFISTUB, in Ard's EFI tree. These are co-dependent for full functionality, but the order of merging doesn't matter. - Part of the infrastructure added for the EFI support is also used for an improvement to the way vsprintf initializes its siphash key, replacing an sleep loop wart. - The hardware RNG framework now always calls its correct random.c input function, add_hwgenerator_randomness(), rather than sometimes going through helpers better suited for other cases. - The add_latent_entropy() function has long been called from the fork handler, but is a no-op when the latent entropy gcc plugin isn't used, which is fine for the purposes of latent entropy. But it was missing out on the cycle counter that was also being mixed in beside the latent entropy variable. So now, if the latent entropy gcc plugin isn't enabled, add_latent_entropy() will expand to a call to add_device_randomness(NULL, 0), which adds a cycle counter, without the absent latent entropy variable. - The RNG is now reseeded from a delayed worker, rather than on demand when used. Always running from a worker allows it to make use of the CPU RNG on platforms like S390x, whose instructions are too slow to do so from interrupts. It also has the effect of adding in new inputs more frequently with more regularity, amounting to a long term transcript of random values. Plus, it helps a bit with the upcoming vDSO implementation (which isn't yet ready for 6.2). - The jitter entropy algorithm now tries to execute on many different CPUs, round-robining, in hopes of hitting even more memory latencies and other unpredictable effects. It also will mix in a cycle counter when the entropy timer fires, in addition to being mixed in from the main loop, to account more explicitly for fluctuations in that timer firing. And the state it touches is now kept within the same cache line, so that it's assured that the different execution contexts will cause latencies. * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits) random: include <linux/once.h> in the right header random: align entropy_timer_state to cache line random: mix in cycle counter when jitter timer fires random: spread out jitter callback to different CPUs random: remove extraneous period and add a missing one in comments efi: random: refresh non-volatile random seed when RNG is initialized vsprintf: initialize siphash key using notifier random: add back async readiness notifier random: reseed in delayed work rather than on-demand random: always mix cycle counter in add_latent_entropy() hw_random: use add_hwgenerator_randomness() for early entropy random: modernize documentation comment on get_random_bytes() random: adjust comment to account for removed function random: remove early archrandom abstraction random: use random.trust_{bootloader,cpu} command line option only stackprotector: actually use get_random_canary() stackprotector: move get_random_canary() into stackprotector.h treewide: use get_random_u32_inclusive() when possible treewide: use get_random_u32_{above,below}() instead of manual loop treewide: use get_random_u32_below() instead of deprecated function ...
2022-12-12Merge tag 'timers-core-2022-12-10' of ↵Linus Torvalds1-23/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timers, timekeeping and drivers: Core: - The timer_shutdown[_sync]() infrastructure: Tearing down timers can be tedious when there are circular dependencies to other things which need to be torn down. A prime example is timer and workqueue where the timer schedules work and the work arms the timer. What needs to prevented is that pending work which is drained via destroy_workqueue() does not rearm the previously shutdown timer. Nothing in that shutdown sequence relies on the timer being functional. The conclusion was that the semantics of timer_shutdown_sync() should be: - timer is not enqueued - timer callback is not running - timer cannot be rearmed Preventing the rearming of shutdown timers is done by discarding rearm attempts silently. A warning for the case that a rearm attempt of a shutdown timer is detected would not be really helpful because it's entirely unclear how it should be acted upon. The only way to address such a case is to add 'if (in_shutdown)' conditionals all over the place. This is error prone and in most cases of teardown not required all. - The real fix for the bluetooth HCI teardown based on timer_shutdown_sync(). A larger scale conversion to timer_shutdown_sync() is work in progress. - Consolidation of VDSO time namespace helper functions - Small fixes for timer and timerqueue Drivers: - Prevent integer overflow on the XGene-1 TVAL register which causes an never ending interrupt storm. - The usual set of new device tree bindings - Small fixes and improvements all over the place" * tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) dt-bindings: timer: renesas,cmt: Add r8a779g0 CMT support dt-bindings: timer: renesas,tmu: Add r8a779g0 support clocksource/drivers/arm_arch_timer: Use kstrtobool() instead of strtobool() clocksource/drivers/timer-ti-dm: Fix missing clk_disable_unprepare in dmtimer_systimer_init_clock() clocksource/drivers/timer-ti-dm: Clear settings on probe and free clocksource/drivers/timer-ti-dm: Make timer_get_irq static clocksource/drivers/timer-ti-dm: Fix warning for omap_timer_match clocksource/drivers/arm_arch_timer: Fix XGene-1 TVAL register math error clocksource/drivers/timer-npcm7xx: Enable timer 1 clock before use dt-bindings: timer: nuvoton,npcm7xx-timer: Allow specifying all clocks dt-bindings: timer: rockchip: Add rockchip,rk3128-timer clockevents: Repair kernel-doc for clockevent_delta2ns() clocksource/drivers/ingenic-ost: Define pm functions properly in platform_driver struct clocksource/drivers/sh_cmt: Access registers according to spec vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copy Bluetooth: hci_qca: Fix the teardown problem for real timers: Update the documentation to reflect on the new timer_shutdown() API timers: Provide timer_shutdown[_sync]() timers: Add shutdown mechanism to the internal functions timers: Split [try_to_]del_timer[_sync]() to prepare for shutdown mode ...
2022-12-12Merge tag 'x86-urgent-2022-12-12' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Three small x86 fixes which did not make it into 6.1: - Remove a superfluous noinline which prevents GCC-7.3 to optimize a stub function away - Allow uprobes on REP NOP and do not treat them like word-sized branch instructions - Make the VDSO symbol export of __vdso_sgx_enter_enclave() depend on CONFIG_X86_SGX to prevent build failures with newer LLVM versions which rightfully detect that there is no function behind the symbol" * tag 'x86-urgent-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Conditionally export __vdso_sgx_enter_enclave() uprobes/x86: Allow to probe a NOP instruction with 0x66 prefix x86/alternative: Remove noinline from __ibt_endbr_seal[_end]() stubs
2022-12-09x86/vdso: Conditionally export __vdso_sgx_enter_enclave()Nathan Chancellor1-0/+2
Recently, ld.lld moved from '--undefined-version' to '--no-undefined-version' as the default, which breaks building the vDSO when CONFIG_X86_SGX is not set: ld.lld: error: version script assignment of 'LINUX_2.6' to symbol '__vdso_sgx_enter_enclave' failed: symbol not defined __vdso_sgx_enter_enclave is only included in the vDSO when CONFIG_X86_SGX is set. Only export it if it will be present in the final object, which clears up the error. Fixes: 8466436952017 ("x86/vdso: Implement a vDSO for Intel SGX enclave call") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1756 Link: https://lore.kernel.org/r/20221109000306.1407357-1-nathan@kernel.org
2022-12-01vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copyJann Horn1-23/+0
find_timens_vvar_page() is not architecture-specific, as can be seen from how all five per-architecture versions of it are the same. (arm64, powerpc and riscv are exactly the same; x86 and s390 have two characters difference inside a comment, less blank lines, and mark the !CONFIG_TIME_NS version as inline.) Refactor the five copies into a central copy in kernel/time/namespace.c. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221130115320.2918447-1-jannh@google.com
2022-11-28clocksource: hyper-v: Use TSC PFN getter to map vvar pageStanislav Kinsburskiy1-4/+3
Instead of converting the virtual address to physical directly. This is a precursor patch for the upcoming support for TSC page mapping into Microsoft Hypervisor root partition, where TSC PFN will be defined by the hypervisor and thus can't be obtained by linear translation of the physical address. Signed-off-by: Stanislav Kinsburskiy <stanislav.kinsburskiy@gmail.com> CC: Andy Lutomirski <luto@kernel.org> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: Borislav Petkov <bp@alien8.de> CC: Dave Hansen <dave.hansen@linux.intel.com> CC: x86@kernel.org CC: "H. Peter Anvin" <hpa@zytor.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Wei Liu <wei.liu@kernel.org> CC: Dexuan Cui <decui@microsoft.com> CC: Daniel Lezcano <daniel.lezcano@linaro.org> CC: linux-kernel@vger.kernel.org CC: linux-hyperv@vger.kernel.org Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Link: https://lore.kernel.org/r/166749833939.218190.14095015146003109462.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-18treewide: use get_random_u32_below() instead of deprecated functionJason A. Donenfeld1-1/+1
This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-17x86/Kconfig: Introduce function paddingThomas Gleixner1-1/+2
Now that all functions are 16 byte aligned, add 16 bytes of NOP padding in front of each function. This prepares things for software call stack tracking and kCFI/FineIBT. This significantly increases kernel .text size, around 5.1% on a x86_64-defconfig-ish build. However, per the random access argument used for alignment, these 16 extra bytes are code that wouldn't be used. Performance measurements back this up by showing no significant performance regressions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111146.950884492@infradead.org
2022-10-17x86/vdso: Ensure all kernel code is seen by objtoolThomas Gleixner1-5/+6
extable.c is kernel code and not part of the VDSO Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111143.512144110@infradead.org
2022-10-12treewide: use prandom_u32_max() when possible, part 1Jason A. Donenfeld1-1/+1
Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11Merge tag 'mm-stable-2022-10-08' of ↵Linus Torvalds2-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...
2022-10-04x86: kmsan: disable instrumentation of unsupported codeAlexander Potapenko1-0/+3
Instrumenting some files with KMSAN will result in kernel being unable to link, boot or crashing at runtime for various reasons (e.g. infinite recursion caused by instrumentation hooks calling instrumented code again). Completely omit KMSAN instrumentation in the following places: - arch/x86/boot and arch/x86/realmode/rm, as KMSAN doesn't work for i386; - arch/x86/entry/vdso, which isn't linked with KMSAN runtime; - three files in arch/x86/kernel - boot problems; - arch/x86/mm/cpu_entry_area.c - recursion. Link: https://lkml.kernel.org/r/20220915150417.722975-33-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-27x86: remove vma linked list walksMatthew Wilcox (Oracle)1-4/+5
Use the VMA iterator instead. Link: https://lkml.kernel.org/r/20220906194824.2110408-36-Liam.Howlett@oracle.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26treewide: Filter out CC_FLAGS_CFISami Tolvanen1-1/+2
In preparation for removing CC_FLAGS_CFI from CC_FLAGS_LTO, explicitly filter out CC_FLAGS_CFI in all the makefiles where we currently filter out CC_FLAGS_LTO. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220908215504.3686827-2-samitolvanen@google.com
2022-08-11x86: link vdso and boot with -z noexecstack --no-warn-rwx-segmentsNick Desaulniers1-1/+1
Users of GNU ld (BFD) from binutils 2.39+ will observe multiple instances of a new warning when linking kernels in the form: ld: warning: arch/x86/boot/pmjump.o: missing .note.GNU-stack section implies executable stack ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker ld: warning: arch/x86/boot/compressed/vmlinux has a LOAD segment with RWX permissions Generally, we would like to avoid the stack being executable. Because there could be a need for the stack to be executable, assembler sources have to opt-in to this security feature via explicit creation of the .note.GNU-stack feature (which compilers create by default) or command line flag --noexecstack. Or we can simply tell the linker the production of such sections is irrelevant and to link the stack as --noexecstack. LLVM's LLD linker defaults to -z noexecstack, so this flag isn't strictly necessary when linking with LLD, only BFD, but it doesn't hurt to be explicit here for all linkers IMO. --no-warn-rwx-segments is currently BFD specific and only available in the current latest release, so it's wrapped in an ld-option check. While the kernel makes extensive usage of ELF sections, it doesn't use permissions from ELF segments. Link: https://lore.kernel.org/linux-block/3af4127a-f453-4cf7-f133-a181cce06f73@kernel.dk/ Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=ba951afb99912da01a6e8434126b8fac7aa75107 Link: https://github.com/llvm/llvm-project/issues/57009 Reported-and-tested-by: Jens Axboe <axboe@kernel.dk> Suggested-by: Fangrui Song <maskray@google.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-04Merge tag 'spdx-6.0-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX updates from Greg KH: "Here is the set of SPDX comment updates for 6.0-rc1. Nothing huge here, just a number of updated SPDX license tags and cleanups based on the review of a number of common patterns in GPLv2 boilerplate text. Also included in here are a few other minor updates, two USB files, and one Documentation file update to get the SPDX lines correct. All of these have been in the linux-next tree for a very long time" * tag 'spdx-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (28 commits) Documentation: samsung-s3c24xx: Add blank line after SPDX directive x86/crypto: Remove stray comment terminator treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_406.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_398.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_391.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_390.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_385.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_320.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_319.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_318.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_298.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_292.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_179.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_168.RULE (part 2) treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_168.RULE (part 1) treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_160.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_152.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_149.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_147.RULE treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_133.RULE ...
2022-06-27x86: Use return-thunk in asm codePeter Zijlstra1-0/+1
Use the return thunk in asm code. If the thunk isn't needed, it will get patched into a RET instruction during boot by apply_returns(). Since alternatives can't handle relocations outside of the first instruction, putting a 'jmp __x86_return_thunk' in one is not valid, therefore carve out the memmove ERMS path into a separate label and jump to it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-10treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_385.RULEThomas Gleixner1-1/+1
Based on the normalized pattern: licensed under the gpl v2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference. Reviewed-by: Allison Randal <allison@lohutok.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-24Merge tag 'kernel-hardening-v5.19-rc1' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening updates from Kees Cook: - usercopy hardening expanded to check other allocation types (Matthew Wilcox, Yuanzheng Song) - arm64 stackleak behavioral improvements (Mark Rutland) - arm64 CFI code gen improvement (Sami Tolvanen) - LoadPin LSM block dev API adjustment (Christoph Hellwig) - Clang randstruct support (Bill Wendling, Kees Cook) * tag 'kernel-hardening-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (34 commits) loadpin: stop using bdevname mm: usercopy: move the virt_addr_valid() below the is_vmalloc_addr() gcc-plugins: randstruct: Remove cast exception handling af_unix: Silence randstruct GCC plugin warning niu: Silence randstruct warnings big_keys: Use struct for internal payload gcc-plugins: Change all version strings match kernel randomize_kstack: Improve docs on requirements/rationale lkdtm/stackleak: fix CONFIG_GCC_PLUGIN_STACKLEAK=n arm64: entry: use stackleak_erase_on_task_stack() stackleak: add on/off stack variants lkdtm/stackleak: check stack boundaries lkdtm/stackleak: prevent unexpected stack usage lkdtm/stackleak: rework boundary management lkdtm/stackleak: avoid spurious failure stackleak: rework poison scanning stackleak: rework stack high bound handling stackleak: clarify variable names stackleak: rework stack low bound handling stackleak: remove redundant check ...
2022-05-08randstruct: Split randstruct Makefile and CFLAGSKees Cook1-1/+2
To enable the new Clang randstruct implementation[1], move randstruct into its own Makefile and split the CFLAGS from GCC_PLUGINS_CFLAGS into RANDSTRUCT_CFLAGS. [1] https://reviews.llvm.org/D121556 Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220503205503.3054173-5-keescook@chromium.org
2022-05-04x86: Fix return value of __setup handlersRandy Dunlap1-1/+1
__setup() handlers should return 1 to obsolete_checksetup() in init/main.c to indicate that the boot option has been handled. A return of 0 causes the boot option/value to be listed as an Unknown kernel parameter and added to init's (limited) argument (no '=') or environment (with '=') strings. So return 1 from these x86 __setup handlers. Examples: Unknown kernel command line parameters "apicpmtimer BOOT_IMAGE=/boot/bzImage-517rc8 vdso=1 ring3mwait=disable", will be passed to user space. Run /sbin/init as init process with arguments: /sbin/init apicpmtimer with environment: HOME=/ TERM=linux BOOT_IMAGE=/boot/bzImage-517rc8 vdso=1 ring3mwait=disable Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu") Fixes: 77b52b4c5c66 ("x86: add "debugpat" boot option") Fixes: e16fd002afe2 ("x86/cpufeature: Enable RING3MWAIT for Knights Landing") Fixes: b8ce33590687 ("x86_64: convert to clock events") Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru Link: https://lore.kernel.org/r/20220314012725.26661-1-rdunlap@infradead.org
2022-01-13Merge tag 'x86_core_for_v5.17_rc1' of ↵Linus Torvalds3-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Get rid of all the .fixup sections because this generates misleading/wrong stacktraces and confuse RELIABLE_STACKTRACE and LIVEPATCH as the backtrace misses the function which is being fixed up. - Add Straight Line Speculation mitigation support which uses a new compiler switch -mharden-sls= which sticks an INT3 after a RET or an indirect branch in order to block speculation after them. Reportedly, CPUs do speculate behind such insns. - The usual set of cleanups and improvements * tag 'x86_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) x86/entry_32: Fix segment exceptions objtool: Remove .fixup handling x86: Remove .fixup section x86/word-at-a-time: Remove .fixup usage x86/usercopy: Remove .fixup usage x86/usercopy_32: Simplify __copy_user_intel_nocache() x86/sgx: Remove .fixup usage x86/checksum_32: Remove .fixup usage x86/vmx: Remove .fixup usage x86/kvm: Remove .fixup usage x86/segment: Remove .fixup usage x86/fpu: Remove .fixup usage x86/xen: Remove .fixup usage x86/uaccess: Remove .fixup usage x86/futex: Remove .fixup usage x86/msr: Remove .fixup usage x86/extable: Extend extable functionality x86/entry_32: Remove .fixup usage x86/entry_64: Remove .fixup usage x86/copy_mc_64: Remove .fixup usage ...
2021-12-30x86/vdso: Remove -nostdlib compiler flagMasahiro Yamada1-1/+1
The -nostdlib option requests the compiler to not use the standard system startup files or libraries when linking. It is effective only when $(CC) is used as a linker driver. Since 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link") $(LD) is directly used, hence -nostdlib is unneeded. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20211107162641.324688-1-masahiroy@kernel.org
2021-12-11x86: Remove .fixup sectionPeter Zijlstra1-1/+0
No moar users, kill it dead. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20211110101326.201590122@infradead.org
2021-12-08x86: Prepare asm files for straight-line-speculationPeter Zijlstra2-2/+2
Replace all ret/retq instructions with RET in preparation of making RET a macro. Since AS is case insensitive it's a big no-op without RET defined. find arch/x86/ -name \*.S | while read file do sed -i 's/\<ret[q]*\>/RET/' $file done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134907.905503893@infradead.org
2021-09-03x86/build/vdso: fix missing FORCE for *.so build ruleMasahiro Yamada1-1/+1
Add FORCE so that if_changed can detect the command line change. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-04-26Merge tag 'x86-vdso-2021-04-26' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso update from Thomas Gleixner: "A single fix for the x86 VDSO build infrastructure to address a compiler warning on 32bit hosts due to a fprintf() modifier/argument mismatch." * tag 'x86-vdso-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Use proper modifier for len's format specifier in extract()
2021-04-26Merge tag 'x86_cleanups_for_v5.13' of ↵Linus Torvalds4-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 cleanups from Borislav Petkov: "Trivial cleanups and fixes all over the place" * tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Remove me from IDE/ATAPI section x86/pat: Do not compile stubbed functions when X86_PAT is off x86/asm: Ensure asm/proto.h can be included stand-alone x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files x86/msr: Make locally used functions static x86/cacheinfo: Remove unneeded dead-store initialization x86/process/64: Move cpu_current_top_of_stack out of TSS tools/turbostat: Unmark non-kernel-doc comment x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL() x86/fpu/math-emu: Fix function cast warning x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes x86: Fix various typos in comments, take #2 x86: Remove unusual Unicode characters from comments x86/kaslr: Return boolean values from a function returning bool x86: Fix various typos in comments x86/setup: Remove unused RESERVE_BRK_ARRAY() stacktrace: Move documentation for arch_stack_walk_reliable() to header x86: Remove duplicate TSC DEADLINE MSR definitions
2021-03-22x86: Fix various typos in comments, take #2Ingo Molnar4-4/+4
Fix another ~42 single-word typos in arch/x86/ code comments, missed a few in the first pass, in particular in .S files. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-03-11x86/alternative: Merge include filesJuergen Gross1-1/+1
Merge arch/x86/include/asm/alternative-asm.h into arch/x86/include/asm/alternative.h in order to make it easier to use common definitions later. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210311142319.4723-2-jgross@suse.com
2021-03-06x86/vdso: Use proper modifier for len's format specifier in extract()Jiri Slaby1-1/+1
Commit 8382c668ce4f ("x86/vdso: Add support for exception fixup in vDSO functions") prints length "len" which is size_t. Compilers now complain when building on a 32-bit host: HOSTCC arch/x86/entry/vdso/vdso2c ... In file included from arch/x86/entry/vdso/vdso2c.c:162: arch/x86/entry/vdso/vdso2c.h: In function 'extract64': arch/x86/entry/vdso/vdso2c.h:38:52: warning: format '%lu' expects argument of \ type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} So use proper modifier (%zu) for size_t. [ bp: Massage commit message. ] Fixes: 8382c668ce4f ("x86/vdso: Add support for exception fixup in vDSO functions") Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/20210303064357.17056-1-jslaby@suse.cz
2021-02-23x86, vdso: disable LTO only for vDSOSami Tolvanen1-1/+2
Disable LTO for the vDSO. Note that while we could use Clang's LTO for the 64-bit vDSO, it won't add noticeable benefit for the small amount of C code. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org>
2020-12-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-17/+0
Merge misc updates from Andrew Morton: - a few random little subsystems - almost all of the MM patches which are staged ahead of linux-next material. I'll trickle to post-linux-next work in as the dependents get merged up. Subsystems affected by this patch series: kthread, kbuild, ide, ntfs, ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation, kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction, oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc, uaccess, zram, and cleanups). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits) mm: cleanup kstrto*() usage mm: fix fall-through warnings for Clang mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at mm: shmem: convert shmem_enabled_show to use sysfs_emit_at mm:backing-dev: use sysfs_emit in macro defining functions mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening mm: use sysfs_emit for struct kobject * uses mm: fix kernel-doc markups zram: break the strict dependency from lzo zram: add stat to gather incompressible pages since zram set up zram: support page writeback mm/process_vm_access: remove redundant initialization of iov_r mm/zsmalloc.c: rework the list_add code in insert_zspage() mm/zswap: move to use crypto_acomp API for hardware acceleration mm/zswap: fix passing zero to 'PTR_ERR' warning mm/zswap: make struct kernel_param_ops definitions const userfaultfd/selftests: hint the test runner on required privilege userfaultfd/selftests: fix retval check for userfaultfd_open() userfaultfd/selftests: always dump something in modes userfaultfd: selftests: make __{s,u}64 format specifiers portable ...
2020-12-15mm: forbid splitting special mappingsDmitry Safonov1-17/+0
Don't allow splitting of vm_special_mapping's. It affects vdso/vvar areas. Uprobes have only one page in xol_area so they aren't affected. Those restrictions were enforced by checks in .mremap() callbacks. Restrict resizing with generic .split() callback. Link: https://lkml.kernel.org/r/20201013013416.390574-7-dima@arista.com Signed-off-by: Dmitry Safonov <dima@arista.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15Merge tag 'core-entry-2020-12-14' of ↵Linus Torvalds3-0/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core entry/exit updates from Thomas Gleixner: "A set of updates for entry/exit handling: - More generalization of entry/exit functionality - The consolidation work to reclaim TIF flags on x86 and also for non-x86 specific TIF flags which are solely relevant for syscall related work and have been moved into their own storage space. The x86 specific part had to be merged in to avoid a major conflict. - The TIF_NOTIFY_SIGNAL work which replaces the inefficient signal delivery mode of task work and results in an impressive performance improvement for io_uring. The non-x86 consolidation of this is going to come seperate via Jens. - The selective syscall redirection facility which provides a clean and efficient way to support the non-Linux syscalls of WINE by catching them at syscall entry and redirecting them to the user space emulation. This can be utilized for other purposes as well and has been designed carefully to avoid overhead for the regular fastpath. This includes the core changes and the x86 support code. - Simplification of the context tracking entry/exit handling for the users of the generic entry code which guarantee the proper ordering and protection. - Preparatory changes to make the generic entry code accomodate S390 specific requirements which are mostly related to their syscall restart mechanism" * tag 'core-entry-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) entry: Add syscall_exit_to_user_mode_work() entry: Add exit_to_user_mode() wrapper entry_Add_enter_from_user_mode_wrapper entry: Rename exit_to_user_mode() entry: Rename enter_from_user_mode() docs: Document Syscall User Dispatch selftests: Add benchmark for syscall user dispatch selftests: Add kselftest for syscall user dispatch entry: Support Syscall User Dispatch on common syscall entry kernel: Implement selective syscall userspace redirection signal: Expose SYS_USER_DISPATCH si_code type x86: vdso: Expose sigreturn address on vdso to the kernel MAINTAINERS: Add entry for common entry code entry: Fix boot for !CONFIG_GENERIC_ENTRY x86: Support HAVE_CONTEXT_TRACKING_OFFSTACK context_tracking: Only define schedule_user() on !HAVE_CONTEXT_TRACKING_OFFSTACK archs sched: Detect call to schedule from critical entry code context_tracking: Don't implement exception_enter/exit() on CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK context_tracking: Introduce HAVE_CONTEXT_TRACKING_OFFSTACK x86: Reclaim unused x86 TI flags ...
2020-12-15Merge tag 'x86_cleanups_for_v5.11' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: "Another branch with a nicely negative diffstat, just the way I like 'em: - Remove all uses of TIF_IA32 and TIF_X32 and reclaim the two bits in the end (Gabriel Krisman Bertazi) - All kinds of minor cleanups all over the tree" * tag 'x86_cleanups_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/ia32_signal: Propagate __user annotation properly x86/alternative: Update text_poke_bp() kernel-doc comment x86/PCI: Make a kernel-doc comment a normal one x86/asm: Drop unused RDPID macro x86/boot/compressed/64: Use TEST %reg,%reg instead of CMP $0,%reg x86/head64: Remove duplicate include x86/mm: Declare 'start' variable where it is used x86/head/64: Remove unused GET_CR2_INTO() macro x86/boot: Remove unused finalize_identity_maps() x86/uaccess: Document copy_from_user_nmi() x86/dumpstack: Make show_trace_log_lvl() static x86/mtrr: Fix a kernel-doc markup x86/setup: Remove unused MCA variables x86, libnvdimm/test: Remove COPY_MC_TEST x86: Reclaim TIF_IA32 and TIF_X32 x86/mm: Convert mmu context ia32_compat into a proper flags field x86/elf: Use e_machine to check for x32/ia32 in setup_additional_pages() elf: Expose ELF header on arch_setup_additional_pages() x86/elf: Use e_machine to select start_thread for x32 elf: Expose ELF header in compat_start_thread() ...
2020-12-02x86: vdso: Expose sigreturn address on vdso to the kernelGabriel Krisman Bertazi3-0/+19
Syscall user redirection requires the signal trampoline code to not be captured, in order to support returning with a locked selector while avoiding recursion back into the signal handler. For ia-32, which has the trampoline in the vDSO, expose the entry points to the kernel, such that it can avoid dispatching syscalls from that region to userspace. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20201127193238.821364-2-krisman@collabora.com
2020-11-18x86/vdso: Implement a vDSO for Intel SGX enclave callSean Christopherson3-0/+154
Enclaves encounter exceptions for lots of reasons: everything from enclave page faults to NULL pointer dereferences, to system calls that must be “proxied” to the kernel from outside the enclave. In addition to the code contained inside an enclave, there is also supporting code outside the enclave called an “SGX runtime”, which is virtually always implemented inside a shared library. The runtime helps build the enclave and handles things like *re*building the enclave if it got destroyed by something like a suspend/resume cycle. The rebuilding has traditionally been handled in SIGSEGV handlers, registered by the library. But, being process-wide, shared state, signal handling and shared libraries do not mix well. Introduce a vDSO function call that wraps the enclave entry functions (EENTER/ERESUME functions of the ENCLU instruciton) and returns information about any exceptions to the caller in the SGX runtime. Instead of generating a signal, the kernel places exception information in RDI, RSI and RDX. The kernel-provided userspace portion of the vDSO handler will place this information in a user-provided buffer or trigger a user-provided callback at the time of the exception. The vDSO function calling convention uses the standard RDI RSI, RDX, RCX, R8 and R9 registers. This makes it possible to declare the vDSO as a C prototype, but other than that there is no specific support for SystemV ABI. Things like storing XSAVE are the responsibility of the enclave and the runtime. [ bp: Change vsgx.o build dependency to CONFIG_X86_SGX. ] Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Cedric Xing <cedric.xing@intel.com> Signed-off-by: Cedric Xing <cedric.xing@intel.com> Co-developed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-20-jarkko@kernel.org