summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2025-09-19bpf: table based bpf_insn_successors()Eduard Zingerman1-0/+1
Converting bpf_insn_successors() to use lookup table makes it ~1.5 times faster. Also remove unnecessary conditionals: - `idx + 1 < prog->len` is unnecessary because after check_cfg() all jump targets are guaranteed to be within a program; - `i == 0 || succ[0] != dst` is unnecessary because any client of bpf_insn_successors() can handle duplicate edges: - compute_live_registers() - compute_scc() Moving bpf_insn_successors() to liveness.c allows its inlining in liveness.c:__update_stack_liveness(). Such inlining speeds up __update_stack_liveness() by ~40%. bpf_insn_successors() is used in both verifier.c and liveness.c. perf shows such move does not negatively impact users in verifier.c, as these are executed only once before main varification pass. Unlike __update_stack_liveness() which can be triggered multiple times. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-10-c3cd27bacc60@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19bpf: disable and remove registers chain based livenessEduard Zingerman1-25/+0
Remove register chain based liveness tracking: - struct bpf_reg_state->{parent,live} fields are no longer needed; - REG_LIVE_WRITTEN marks are superseded by bpf_mark_stack_write() calls; - mark_reg_read() calls are superseded by bpf_mark_stack_read(); - log.c:print_liveness() is superseded by logging in liveness.c; - propagate_liveness() is superseded by bpf_update_live_stack(); - no need to establish register chains in is_state_visited() anymore; - fix a bunch of tests expecting "_w" suffixes in verifier log messages. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-9-c3cd27bacc60@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19bpf: signal error if old liveness is more conservative than newEduard Zingerman1-0/+1
Unlike the new algorithm, register chain based liveness tracking is fully path sensitive, and thus should be strictly more accurate. Validate the new algorithm by signaling an error whenever it considers a stack slot dead while the old algorithm considers it alive. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-8-c3cd27bacc60@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19bpf: callchain sensitive stack liveness tracking using CFGEduard Zingerman1-0/+14
This commit adds a flow-sensitive, context-sensitive, path-insensitive data flow analysis for live stack slots: - flow-sensitive: uses program control flow graph to compute data flow values; - context-sensitive: collects data flow values for each possible call chain in a program; - path-insensitive: does not distinguish between separate control flow graph paths reaching the same instruction. Compared to the current path-sensitive analysis, this approach trades some precision for not having to enumerate every path in the program. This gives a theoretical capability to run the analysis before main verification pass. See cover letter for motivation. The basic idea is as follows: - Data flow values indicate stack slots that might be read and stack slots that are definitely written. - Data flow values are collected for each (call chain, instruction number) combination in the program. - Within a subprogram, data flow values are propagated using control flow graph. - Data flow values are transferred from entry instructions of callee subprograms to call sites in caller subprograms. In other words, a tree of all possible call chains is constructed. Each node of this tree represents a subprogram. Read and write marks are collected for each instruction of each node. Live stack slots are first computed for lower level nodes. Then, information about outer stack slots that might be read or are definitely written by a subprogram is propagated one level up, to the corresponding call instructions of the upper nodes. Procedure repeats until root node is processed. In the absence of value range analysis, stack read/write marks are collected during main verification pass, and data flow computation is triggered each time verifier.c:states_equal() needs to query the information. Implementation details are documented in kernel/bpf/liveness.c. Quantitative data about verification performance changes and memory consumption is in the cover letter. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-6-c3cd27bacc60@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19bpf: compute instructions postorder per subprogramEduard Zingerman1-1/+5
The next patch would require doing postorder traversal of individual subprograms. Facilitate this by moving env->cfg.insn_postorder computation from check_cfg() to a separate pass, as check_cfg() descends into called subprograms (and it needs to, because of merge_callee_effects() logic). env->cfg.insn_postorder is used only by compute_live_registers(), this function does not track cross subprogram dependencies, thus the change does not affect it's operation. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-5-c3cd27bacc60@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19bpf: declare a few utility functions as internal apiEduard Zingerman1-0/+5
Namely, rename the following functions and add prototypes to bpf_verifier.h: - find_containing_subprog -> bpf_find_containing_subprog - insn_successors -> bpf_insn_successors - calls_callback -> bpf_calls_callback - fmt_stack_mask -> bpf_fmt_stack_mask Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-4-c3cd27bacc60@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19bpf: bpf_verifier_state->cleaned flag instead of REG_LIVE_DONEEduard Zingerman1-1/+1
Prepare for bpf_reg_state->live field removal by introducing a separate flag to track if clean_verifier_state() had been applied to the state. No functional changes. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-1-c3cd27bacc60@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19bpf: Move the signature kfuncs to helpers.cKP Singh1-0/+32
No functional changes, except for the addition of the headers for the kfuncs so that they can be used for signature verification. Signed-off-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/20250914215141.15144-8-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19bpf: Return hashes of maps in BPF_OBJ_GET_INFO_BY_FDKP Singh1-0/+3
Currently only array maps are supported, but the implementation can be extended for other maps and objects. The hash is memoized only for exclusive and frozen maps as their content is stable until the exclusive program modifies the map. This is required for BPF signing, enabling a trusted loader program to verify a map's integrity. The loader retrieves the map's runtime hash from the kernel and compares it against an expected hash computed at build time. Signed-off-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/20250914215141.15144-7-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19bpf: Implement exclusive map creationKP Singh1-0/+1
Exclusive maps allow maps to only be accessed by program with a program with a matching hash which is specified in the excl_prog_hash attr. For the signing use-case, this allows the trusted loader program to load the map and verify the integrity Signed-off-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/20250914215141.15144-3-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-19bpf: Update the bpf_prog_calc_tag to use SHA256KP Singh1-1/+5
Exclusive maps restrict map access to specific programs using a hash. The current hash used for this is SHA1, which is prone to collisions. This patch uses SHA256, which is more resilient against collisions. This new hash is stored in bpf_prog and used by the verifier to determine if a program can access a given exclusive map. The original 64-bit tags are kept, as they are used by users as a short, possibly colliding program identifier for non-security purposes. Signed-off-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/20250914215141.15144-2-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-15bpf: Do not limit bpf_cgroup_from_id to current's namespaceKumar Kartikeya Dwivedi1-0/+1
The bpf_cgroup_from_id kfunc relies on cgroup_get_from_id to obtain the cgroup corresponding to a given cgroup ID. This helper can be called in a lot of contexts where the current thread can be random. A recent example was its use in sched_ext's ops.tick(), to obtain the root cgroup pointer. Since the current task can be whatever random user space task preempted by the timer tick, this makes the behavior of the helper unreliable. Refactor out __cgroup_get_from_id as the non-namespace aware version of cgroup_get_from_id, and change bpf_cgroup_from_id to make use of it. There is no compatibility breakage here, since changing the namespace against which the lookup is being done to the root cgroup namespace only permits a wider set of lookups to succeed now. The cgroup IDs across namespaces are globally unique, and thus don't need to be retranslated. Reported-by: Dan Schatzberg <dschatzberg@meta.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20250915032618.1551762-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-11bpf: Report arena faults to BPF stderrPuranjay Mohan1-0/+6
Begin reporting arena page faults and the faulting address to BPF program's stderr, this patch adds support in the arm64 and x86-64 JITs, support for other archs can be added later. The fault handlers receive the 32 bit address in the arena region so the upper 32 bits of user_vm_start is added to it before printing the address. This is what the user would expect to see as this is what is printed by bpf_printk() is you pass it an address returned by bpf_arena_alloc_pages(); Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250911145808.58042-4-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-11bpf: core: introduce main_prog_aux for stream accessPuranjay Mohan1-0/+1
BPF streams are only valid for the main programs, to make it easier to access streams from subprogs, introduce main_prog_aux in struct bpf_prog_aux. prog->aux->main_prog_aux = prog->aux, for main programs and prog->aux->main_prog_aux = main_prog->aux, for subprograms. Make bpf_prog_find_from_stack() use the added main_prog_aux to return the mainprog when a subprog is found on the stack. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250911145808.58042-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc5Alexei Starovoitov30-110/+171
Cross-merge BPF and other fixes after downstream PR. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-11Merge tag 'pm-6.17-rc6' of ↵Linus Torvalds1-0/+10
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix a nasty hibernation regression introduced during the 6.16 cycle, an issue related to energy model management occurring on Intel hybrid systems where some CPUs are offline to start with, and two regressions in the amd-pstate driver: - Restore a pm_restrict_gfp_mask() call in hibernation_snapshot() that was removed incorrectly during the 6.16 development cycle (Rafael Wysocki) - Introduce a function for registering a perf domain without triggering a system-wide CPU capacity update and make the intel_pstate driver use it to avoid reocurring unsuccessful attempts to update capacities of all CPUs in the system (Rafael Wysocki) - Fix setting of CPPC.min_perf in the active mode with performance governor in the amd-pstate driver to restore its expected behavior changed recently (Gautham Shenoy) - Avoid mistakenly setting EPP to 0 in the amd-pstate driver after system resume as a result of recent code changes (Mario Limonciello)" * tag 'pm-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: hibernate: Restrict GFP mask in hibernation_snapshot() PM: EM: Add function for registering a PD without capacity update cpufreq/amd-pstate: Fix a regression leading to EPP 0 after resume cpufreq/amd-pstate: Fix setting of CPPC.min_perf in active mode for performance governor
2025-09-11Merge tag 'mm-hotfixes-stable-2025-09-10-20-00' of ↵Linus Torvalds2-8/+27
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "20 hotfixes. 15 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 14 of these fixes are for MM. This includes - kexec fixes from Breno for a recently introduced use-uninitialized bug - DAMON fixes from Quanmin Yan to avoid div-by-zero crashes which can occur if the operator uses poorly-chosen insmod parameters and misc singleton fixes" * tag 'mm-hotfixes-stable-2025-09-10-20-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: add tree entry to numa memblocks and emulation block mm/damon/sysfs: fix use-after-free in state_show() proc: fix type confusion in pde_set_flags() compiler-clang.h: define __SANITIZE_*__ macros only when undefined mm/vmalloc, mm/kasan: respect gfp mask in kasan_populate_vmalloc() ocfs2: fix recursive semaphore deadlock in fiemap call mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory mm/mremap: fix regression in vrm->new_addr check percpu: fix race on alloc failed warning limit mm/memory-failure: fix redundant updates for already poisoned pages s390: kexec: initialize kexec_buf struct riscv: kexec: initialize kexec_buf struct arm64: kexec: initialize kexec_buf struct in load_other_segments() mm/damon/reclaim: avoid divide-by-zero in damon_reclaim_apply_parameters() mm/damon/lru_sort: avoid divide-by-zero in damon_lru_sort_apply_parameters() mm/damon/core: set quota->charged_from to jiffies at first charge window mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range() init/main.c: fix boot time tracing crash mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range() mm/khugepaged: fix the address passed to notifier on testing young
2025-09-11Merge tag 'vmscape-for-linus-20250904' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull vmescape mitigation fixes from Dave Hansen: "Mitigate vmscape issue with indirect branch predictor flushes. vmscape is a vulnerability that essentially takes Spectre-v2 and attacks host userspace from a guest. It particularly affects hypervisors like QEMU. Even if a hypervisor may not have any sensitive data like disk encryption keys, guest-userspace may be able to attack the guest-kernel using the hypervisor as a confused deputy. There are many ways to mitigate vmscape using the existing Spectre-v2 defenses like IBRS variants or the IBPB flushes. This series focuses solely on IBPB because it works universally across vendors and all vulnerable processors. Further work doing vendor and model-specific optimizations can build on top of this if needed / wanted. Do the normal issue mitigation dance: - Add the CPU bug boilerplate - Add a list of vulnerable CPUs - Use IBPB to flush the branch predictors after running guests" * tag 'vmscape-for-linus-20250904' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vmscape: Add old Intel CPUs to affected list x86/vmscape: Warn when STIBP is disabled with SMT x86/bugs: Move cpu_bugs_smt_update() down x86/vmscape: Enable the mitigation x86/vmscape: Add conditional IBPB mitigation x86/vmscape: Enumerate VMSCAPE bug Documentation/hw-vuln: Add VMSCAPE documentation
2025-09-10PM: EM: Add function for registering a PD without capacity updateRafael J. Wysocki1-0/+10
The intel_pstate driver manages CPU capacity changes itself and it does not need an update of the capacity of all CPUs in the system to be carried out after registering a PD. Moreover, in some configurations (for instance, an SMT-capable hybrid x86 system booted with nosmt in the kernel command line) the em_check_capacity_update() call at the end of em_dev_register_perf_domain() always fails and reschedules itself to run once again in 1 s, so effectively it runs in vain every 1 s forever. To address this, introduce a new variant of em_dev_register_perf_domain(), called em_dev_register_pd_no_update(), that does not invoke em_check_capacity_update(), and make intel_pstate use it instead of the original. Fixes: 7b010f9b9061 ("cpufreq: intel_pstate: EAS support for hybrid platforms") Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.com/ Reported-by: Kenneth R. Crudup <kenny@panix.com> Tested-by: Kenneth R. Crudup <kenny@panix.com> Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-09compiler-clang.h: define __SANITIZE_*__ macros only when undefinedNathan Chancellor1-5/+24
Clang 22 recently added support for defining __SANITIZE__ macros similar to GCC [1], which causes warnings (or errors with CONFIG_WERROR=y or W=e) with the existing defines that the kernel creates to emulate this behavior with existing clang versions. In file included from <built-in>:3: In file included from include/linux/compiler_types.h:171: include/linux/compiler-clang.h:37:9: error: '__SANITIZE_THREAD__' macro redefined [-Werror,-Wmacro-redefined] 37 | #define __SANITIZE_THREAD__ | ^ <built-in>:352:9: note: previous definition is here 352 | #define __SANITIZE_THREAD__ 1 | ^ Refactor compiler-clang.h to only define the sanitizer macros when they are undefined and adjust the rest of the code to use these macros for checking if the sanitizers are enabled, clearing up the warnings and allowing the kernel to easily drop these defines when the minimum supported version of LLVM for building the kernel becomes 22.0.0 or newer. Link: https://lkml.kernel.org/r/20250902-clang-update-sanitize-defines-v1-1-cf3702ca3d92@kernel.org Link: https://github.com/llvm/llvm-project/commit/568c23bbd3303518c5056d7f03444dae4fdc8a9c [1] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Justin Stitt <justinstitt@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Bill Wendling <morbo@google.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-09mm/vmalloc, mm/kasan: respect gfp mask in kasan_populate_vmalloc()Uladzislau Rezki (Sony)1-3/+3
kasan_populate_vmalloc() and its helpers ignore the caller's gfp_mask and always allocate memory using the hardcoded GFP_KERNEL flag. This makes them inconsistent with vmalloc(), which was recently extended to support GFP_NOFS and GFP_NOIO allocations. Page table allocations performed during shadow population also ignore the external gfp_mask. To preserve the intended semantics of GFP_NOFS and GFP_NOIO, wrap the apply_to_page_range() calls into the appropriate memalloc scope. xfs calls vmalloc with GFP_NOFS, so this bug could lead to deadlock. There was a report here https://lkml.kernel.org/r/686ea951.050a0220.385921.0016.GAE@google.com This patch: - Extends kasan_populate_vmalloc() and helpers to take gfp_mask; - Passes gfp_mask down to alloc_pages_bulk() and __get_free_page(); - Enforces GFP_NOFS/NOIO semantics with memalloc_*_save()/restore() around apply_to_page_range(); - Updates vmalloc.c and percpu allocator call sites accordingly. Link: https://lkml.kernel.org/r/20250831121058.92971-1-urezki@gmail.com Fixes: 451769ebb7e7 ("mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc") Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reported-by: syzbot+3470c9ffee63e4abafeb@syzkaller.appspotmail.com Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-08Merge tag 'vfs-6.17-rc6.fixes' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "fuse: - Prevent opening of non-regular backing files. Fuse doesn't support non-regular files anyway. - Check whether copy_file_range() returns a larger size than requested. - Prevent overflow in copy_file_range() as fuse currently only supports 32-bit sized copies. - Cache the blocksize value if the server returned a new value as inode->i_blkbits isn't modified directly anymore. - Fix i_blkbits handling for iomap partial writes. By default i_blkbits is set to PAGE_SIZE which causes iomap to mark the whole folio as uptodate even on a partial write. But fuseblk filesystems support choosing a blocksize smaller than PAGE_SIZE risking data corruption. Simply enforce PAGE_SIZE as blocksize for fuseblk's internal inode for now. - Prevent out-of-bounds acces in fuse_dev_write() when the number of bytes to be retrieved is truncated to the fc->max_pages limit. virtiofs: - Fix page faults for DAX page addresses. Misc: - Tighten file handle decoding from userns. Check that the decoded dentry itself has a valid idmapping in the user namespace. - Fix mount-notify selftests. - Fix some indentation errors. - Add an FMODE_ flag to indicate IOCB_HAS_METADATA availability. This will be moved to an FOP_* flag with a bit more rework needed for that to happen not suitable for a fix. - Don't silently ignore metadata for sync read/write. - Don't pointlessly log warning when reading coredump sysctls" * tag 'vfs-6.17-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fuse: virtio_fs: fix page fault for DAX page address selftests/fs/mount-notify: Fix compilation failure. fhandle: use more consistent rules for decoding file handle from userns fuse: Block access to folio overlimit fuse: fix fuseblk i_blkbits for iomap partial writes fuse: reflect cached blocksize if blocksize was changed fuse: prevent overflow in copy_file_range return value fuse: check if copy_file_range() returns larger than requested size fuse: do not allow mapping a non-regular backing file coredump: don't pointlessly check and spew warnings fs: fix indentation style block: don't silently ignore metadata for sync read/write fs: add a FMODE_ flag to indicate IOCB_HAS_METADATA availability Please enter a commit message to explain why this merge is necessary, especially if it merges an updated upstream into a topic branch.
2025-09-07Merge tag 'timers-urgent-2025-09-07' of ↵Linus Torvalds1-1/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a severe slowdown regression in the timer vDSO code related to the while() loop in __iter_div_u64_rem(), when the AUX-clock is enabled" * tag 'timers-urgent-2025-09-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: vdso/vsyscall: Avoid slow division loop in auxiliary clock update
2025-09-05Merge branch 'bpf-next/skb-meta-dynptr' into 'bpf-next/master'Martin KaFai Lau1-1/+1
Merge skb-meta-dynptr branch into master branch after fixing a compiler warning. No conflict. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-09-04bpf: Return an error pointer for skb metadata when CONFIG_NET=nJakub Sitnicki1-1/+1
Kernel Test Robot reported a compiler warning - a null pointer may be passed to memmove in __bpf_dynptr_{read,write} when building without networking support. The warning is correct from a static analysis standpoint, but not actually reachable. Without CONFIG_NET, creating dynptrs to skb metadata is impossible since the constructor kfunc is missing. Silence the false-postive diagnostic message by returning an error pointer from bpf_skb_meta_pointer stub when CONFIG_NET=n. Fixes: 6877cd392bae ("bpf: Enable read/write access to skb metadata through a dynptr") Closes: https://lore.kernel.org/oe-kbuild-all/202508212031.ir9b3B6Q-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250901-dynptr-skb-meta-no-net-v2-1-ce607fcb6091@cloudflare.com
2025-09-04Merge tag 'net-6.17-rc5' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, wireless and Bluetooth. We're reverting the removal of a Sundance driver, a user has appeared. This makes the PR rather large in terms of LoC. There's a conspicuous absence of real, user-reported 6.17 issues. Slightly worried that the summer distracted people from testing. Previous releases - regressions: - ax25: properly unshare skbs in ax25_kiss_rcv() Previous releases - always broken: - phylink: disable autoneg for interfaces that have no inband, fix regression on pcs-lynx (NXP LS1088) - vxlan: fix null-deref when using nexthop objects - batman-adv: fix OOB read/write in network-coding decode - icmp: icmp_ndo_send: fix reversing address translation for replies - tcp: fix socket ref leak in TCP-AO failure handling for IPv6 - mctp: - mctp_fraq_queue should take ownership of passed skb - usb: initialise mac header in RX path, avoid WARN - wifi: mac80211: do not permit 40 MHz EHT operation on 5/6 GHz, respect device limitations - wifi: wilc1000: avoid buffer overflow in WID string configuration - wifi: mt76: - fix regressions from mt7996 MLO support rework - fix offchannel handling issues on mt7996 - fix multiple wcid linked list corruption issues - mt7921: don't disconnect when AP requests switch to a channel which requires radar detection - mt7925u: use connac3 tx aggr check in tx complete - wifi: intel: - improve validation of ACPI DSM data - cfg: restore some 1000 series configs - wifi: ath: - ath11k: a fix for GTK rekeying - ath12k: a missed WiFi7 capability (multi-link EMLSR) - eth: intel: - ice: fix races in "low latency" firmware interface for Tx timestamps - idpf: set mac type when adding and removing MAC filters - i40e: remove racy read access to some debugfs files Misc: - Revert "eth: remove the DLink/Sundance (ST201) driver" - netfilter: conntrack: helper: Replace -EEXIST by -EBUSY, avoid confusing modprobe" * tag 'net-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) phy: mscc: Stop taking ts_lock for tx_queue and use its own lock selftest: net: Fix weird setsockopt() in bind_bhash.c. MAINTAINERS: add Sabrina to TLS maintainers gve: update MAINTAINERS ppp: fix memory leak in pad_compress_skb net: xilinx: axienet: Add error handling for RX metadata pointer retrieval net: atm: fix memory leak in atm_register_sysfs when device_register fail netfilter: nf_tables: Introduce NFTA_DEVICE_PREFIX selftests: netfilter: fix udpclash tool hang ax25: properly unshare skbs in ax25_kiss_rcv() mctp: return -ENOPROTOOPT for unknown getsockopt options net/smc: Remove validation of reserved bits in CLC Decline message ipv4: Fix NULL vs error pointer check in inet_blackhole_dev_init() net: thunder_bgx: decrement cleanup index before use net: thunder_bgx: add a missing of_node_put net: phylink: move PHY interrupt request to non-fail path net: lockless sock_i_ino() tools: ynl-gen: fix nested array counting wifi: wilc1000: avoid buffer overflow in WID string configuration wifi: cfg80211: sme: cap SSID length in __cfg80211_connect_result() ...
2025-09-03vdso/vsyscall: Avoid slow division loop in auxiliary clock updateThomas Weißschuh1-1/+8
The call to __iter_div_u64_rem() in vdso_time_update_aux() is a wrapper around subtraction. It cannot be used to divide large numbers, as that introduces long, computationally expensive delays. A regular u64 division is also not possible in the timekeeper update path as it can be too slow. Instead of splitting the ktime_t offset into into second and subsecond components during the timekeeper update fast-path, do it together with the adjustment of tk->offs_aux in the slow-path. Equivalent to the handling of offs_boot and monotonic_to_boot. Reuse the storage of monotonic_to_boot for the new field, as it is not used by auxiliary timekeepers. Fixes: 380b84e168e5 ("vdso/vsyscall: Update auxiliary clock data in the datapage") Reported-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250825-vdso-auxclock-division-v1-1-a1d32a16a313@linutronix.de Closes: https://lore.kernel.org/lkml/aKwsNNWsHJg8IKzj@localhost/
2025-09-03net: phy: add phy_interface_weight()Russell King (Oracle)1-0/+5
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/E1uslwn-00000001SOx-0a7H@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-02Merge tag 'mm-hotfixes-stable-2025-09-01-17-20' of ↵Linus Torvalds4-21/+52
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes. 13 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 11 of these fixes are for MM. This includes a three-patch series from Harry Yoo which fixes an intermittent boot failure which can occur on x86 systems. And a two-patch series from Alexander Gordeev which fixes a KASAN crash on S390 systems" * tag 'mm-hotfixes-stable-2025-09-01-17-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: fix possible deadlock in kmemleak x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() mm: introduce and use {pgd,p4d}_populate_kernel() mm: move page table sync declarations to linux/pgtable.h proc: fix missing pde_set_flags() for net proc files mm: fix accounting of memmap pages mm/damon/core: prevent unnecessary overflow in damos_set_effective_quota() kexec: add KEXEC_FILE_NO_CMA as a legal flag kasan: fix GCC mem-intrinsic prefix with sw tags mm/kasan: avoid lazy MMU mode hazards mm/kasan: fix vmalloc shadow memory (de-)population races kunit: kasan_test: disable fortify string checker on kasan_strings() test selftests/mm: fix FORCE_READ to read input value correctly mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE ocfs2: prevent release journal inode after journal shutdown rust: mm: mark VmaNew as transparent of_numa: fix uninitialized memory nodes causing kernel panic
2025-09-01Merge tag 'fuse-fixes-6.17-rc5' of ↵Christian Brauner8-61/+29
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse into vfs.fixes fuse fixes for 6.17-rc5 * tag 'fuse-fixes-6.17-rc5' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (6 commits) fuse: Block access to folio overlimit fuse: fix fuseblk i_blkbits for iomap partial writes fuse: reflect cached blocksize if blocksize was changed fuse: prevent overflow in copy_file_range return value fuse: check if copy_file_range() returns larger than requested size fuse: do not allow mapping a non-regular backing file Link: https://lore.kernel.org/CAJfpeguEVMMyw_zCb+hbOuSxdE2Z3Raw=SJsq=Y56Ae6dn2W3g@mail.gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-29Merge tag 'net-6.17-rc4' of ↵Linus Torvalds2-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth. Current release - regressions: - ipv4: fix regression in local-broadcast routes - vsock: fix error-handling regression introduced in v6.17-rc1 Previous releases - regressions: - bluetooth: - mark connection as closed during suspend disconnect - fix set_local_name race condition - eth: - ice: fix NULL pointer dereference on reset - mlx5: fix memory leak in hws_pool_buddy_init error path - bnxt_en: fix stats context reservation logic - hv: fix loss of receive events from host during channel open Previous releases - always broken: - page_pool: fix incorrect mp_ops error handling - sctp: initialize more fields in sctp_v6_from_sk() - eth: - octeontx2-vf: fix max packet length errors - idpf: fix Tx flow scheduling to avoid Tx timeouts - bnxt_en: fix memory corruption during ifdown - ice: fix incorrect counter for buffer allocation failures - mlx5: fix lockdep assertion on sync reset unload event - fbnic: fixup rtnl_lock and devl_lock handling - xgmac: do not enable RX FIFO overflow interrupts - phy: mscc: fix when PTP clock is register and unregister Misc: - add Telit Cinterion LE910C4-WWX new compositions" * tag 'net-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits) net: ipv4: fix regression in local-broadcast routes net: macb: Disable clocks once fbnic: Move phylink resume out of service_task and into open/close fbnic: Fixup rtnl_lock and devl_lock handling related to mailbox code net: rose: fix a typo in rose_clear_routes() l2tp: do not use sock_hold() in pppol2tp_session_get_sock() sctp: initialize more fields in sctp_v6_from_sk() MAINTAINERS: rmnet: Update email addresses net: rose: include node references in rose_neigh refcount net: rose: convert 'use' field to refcount_t net: rose: split remove and free operations in rose_remove_neigh() net: hv_netvsc: fix loss of early receive events from host during channel open. net: stmmac: Set CIC bit only for TX queues with COE net: stmmac: xgmac: Correct supported speed modes net: stmmac: xgmac: Do not enable RX FIFO Overflow interrupts net/mlx5e: Set local Xoff after FW update net/mlx5e: Update and set Xon/Xoff upon port speed set net/mlx5e: Update and set Xon/Xoff upon MTU set net/mlx5: Prevent flow steering mode changes in switchdev mode net/mlx5: Nack sync reset when SFs are present ...
2025-08-29Merge tag 'dma-mapping-6.17-2025-08-28' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: - another small fix for arm64 systems with memory encryption (Shanker Donthineni) - fix for arm32 systems with non-standard CMA configuration (Oreoluwa Babatunde) * tag 'dma-mapping-6.17-2025-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma/pool: Ensure DMA_DIRECT_REMAP allocations are decrypted of: reserved_mem: Restructure call site for dma_contiguous_early_fixup()
2025-08-29Merge tag 'fixes-2025-08-28' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock fixes from Mike Rapoport: - printk cleanups in memblock and numa_memblks - update kernel-doc for MEMBLOCK_RSRV_NOINIT to be more accurate and detailed * tag 'fixes-2025-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock: fix kernel-doc for MEMBLOCK_RSRV_NOINIT mm: numa,memblock: Use SZ_1M macro to denote bytes to MB conversion mm/numa_memblks: Use pr_debug instead of printk(KERN_DEBUG)
2025-08-28mm: introduce and use {pgd,p4d}_populate_kernel()Harry Yoo2-6/+36
Introduce and use {pgd,p4d}_populate_kernel() in core MM code when populating PGD and P4D entries for the kernel address space. These helpers ensure proper synchronization of page tables when updating the kernel portion of top-level page tables. Until now, the kernel has relied on each architecture to handle synchronization of top-level page tables in an ad-hoc manner. For example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes"). However, this approach has proven fragile for following reasons: 1) It is easy to forget to perform the necessary page table synchronization when introducing new changes. For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") overlooked the need to synchronize page tables for the vmemmap area. 2) It is also easy to overlook that the vmemmap and direct mapping areas must not be accessed before explicit page table synchronization. For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")) caused crashes by accessing the vmemmap area before calling sync_global_pgds(). To address this, as suggested by Dave Hansen, introduce _kernel() variants of the page table population helpers, which invoke architecture-specific hooks to properly synchronize page tables. These are introduced in a new header file, include/linux/pgalloc.h, so they can be called from common code. They reuse existing infrastructure for vmalloc and ioremap. Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK, and the actual synchronization is performed by arch_sync_kernel_mappings(). This change currently targets only x86_64, so only PGD and P4D level helpers are introduced. Currently, these helpers are no-ops since no architecture sets PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK. In theory, PUD and PMD level helpers can be added later if needed by other architectures. For now, 32-bit architectures (x86-32 and arm) only handle PGTBL_PMD_MODIFIED, so p*d_populate_kernel() will never affect them unless we introduce a PMD level helper. [harry.yoo@oracle.com: fix KASAN build error due to p*d_populate_kernel()] Link: https://lkml.kernel.org/r/20250822020727.202749-1-harry.yoo@oracle.com Link: https://lkml.kernel.org/r/20250818020206.4517-3-harry.yoo@oracle.com Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges") Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: bibo mao <maobibo@loongson.cn> Cc: Borislav Betkov <bp@alien8.de> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-28mm: move page table sync declarations to linux/pgtable.hHarry Yoo2-16/+16
During our internal testing, we started observing intermittent boot failures when the machine uses 4-level paging and has a large amount of persistent memory: BUG: unable to handle page fault for address: ffffe70000000034 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP NOPTI RIP: 0010:__init_single_page+0x9/0x6d Call Trace: <TASK> __init_zone_device_page+0x17/0x5d memmap_init_zone_device+0x154/0x1bb pagemap_range+0x2e0/0x40f memremap_pages+0x10b/0x2f0 devm_memremap_pages+0x1e/0x60 dev_dax_probe+0xce/0x2ec [device_dax] dax_bus_probe+0x6d/0xc9 [... snip ...] </TASK> It turns out that the kernel panics while initializing vmemmap (struct page array) when the vmemmap region spans two PGD entries, because the new PGD entry is only installed in init_mm.pgd, but not in the page tables of other tasks. And looking at __populate_section_memmap(): if (vmemmap_can_optimize(altmap, pgmap)) // does not sync top level page tables r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap); else // sync top level page tables in x86 r = vmemmap_populate(start, end, nid, altmap); In the normal path, vmemmap_populate() in arch/x86/mm/init_64.c synchronizes the top level page table (See commit 9b861528a801 ("x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes")) so that all tasks in the system can see the new vmemmap area. However, when vmemmap_can_optimize() returns true, the optimized path skips synchronization of top-level page tables. This is because vmemmap_populate_compound_pages() is implemented in core MM code, which does not handle synchronization of the top-level page tables. Instead, the core MM has historically relied on each architecture to perform this synchronization manually. We're not the first party to encounter a crash caused by not-sync'd top level page tables: earlier this year, Gwan-gyeong Mun attempted to address the issue [1] [2] after hitting a kernel panic when x86 code accessed the vmemmap area before the corresponding top-level entries were synced. At that time, the issue was believed to be triggered only when struct page was enlarged for debugging purposes, and the patch did not get further updates. It turns out that current approach of relying on each arch to handle the page table sync manually is fragile because 1) it's easy to forget to sync the top level page table, and 2) it's also easy to overlook that the kernel should not access the vmemmap and direct mapping areas before the sync. # The solution: Make page table sync more code robust and harder to miss To address this, Dave Hansen suggested [3] [4] introducing {pgd,p4d}_populate_kernel() for updating kernel portion of the page tables and allow each architecture to explicitly perform synchronization when installing top-level entries. With this approach, we no longer need to worry about missing the sync step, reducing the risk of future regressions. The new interface reuses existing ARCH_PAGE_TABLE_SYNC_MASK, PGTBL_P*D_MODIFIED and arch_sync_kernel_mappings() facility used by vmalloc and ioremap to synchronize page tables. pgd_populate_kernel() looks like this: static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd, p4d_t *p4d) { pgd_populate(&init_mm, pgd, p4d); if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED) arch_sync_kernel_mappings(addr, addr); } It is worth noting that vmalloc() and apply_to_range() carefully synchronizes page tables by calling p*d_alloc_track() and arch_sync_kernel_mappings(), and thus they are not affected by this patch series. This series was hugely inspired by Dave Hansen's suggestion and hence added Suggested-by: Dave Hansen. Cc stable because lack of this series opens the door to intermittent boot failures. This patch (of 3): Move ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to linux/pgtable.h so that they can be used outside of vmalloc and ioremap. Link: https://lkml.kernel.org/r/20250818020206.4517-1-harry.yoo@oracle.com Link: https://lkml.kernel.org/r/20250818020206.4517-2-harry.yoo@oracle.com Link: https://lore.kernel.org/linux-mm/20250220064105.808339-1-gwan-gyeong.mun@intel.com [1] Link: https://lore.kernel.org/linux-mm/20250311114420.240341-1-gwan-gyeong.mun@intel.com [2] Link: https://lore.kernel.org/linux-mm/d1da214c-53d3-45ac-a8b6-51821c5416e4@intel.com [3] Link: https://lore.kernel.org/linux-mm/4d800744-7b88-41aa-9979-b245e8bf794b@intel.com [4] Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges") Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Kiryl Shutsemau <kas@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: bibo mao <maobibo@loongson.cn> Cc: Borislav Betkov <bp@alien8.de> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-28kexec: add KEXEC_FILE_NO_CMA as a legal flagBrian Mak1-1/+2
Commit 07d24902977e ("kexec: enable CMA based contiguous allocation") introduces logic to use CMA-based allocation in kexec by default. As part of the changes, it introduces a kexec_file_load flag to disable the use of CMA allocations from userspace. However, this flag is broken since it is missing from the list of legal flags for kexec_file_load. kexec_file_load returns EINVAL when attempting to use the flag. Fix this by adding the KEXEC_FILE_NO_CMA flag to the list of legal flags for kexec_file_load. Without this fix, kexec_file_load syscall will failed and return '-EINVAL' when KEXEC_FILE_NO_CMA is specified. Link: https://lkml.kernel.org/r/20250805211527.122367-2-makb@juniper.net Fixes: 07d24902977e ("kexec: enable CMA based contiguous allocation") Signed-off-by: Brian Mak <makb@juniper.net> Acked-by: Baoquan He <bhe@redhat.com> Cc: Alexander Graf <graf@amazon.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Rob Herring <robh@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-28bpf: Improve the general precision of tnum_mulNandakumar Edamana1-0/+3
Drop the value-mask decomposition technique and adopt straightforward long-multiplication with a twist: when LSB(a) is uncertain, find the two partial products (for LSB(a) = known 0 and LSB(a) = known 1) and take a union. Experiment shows that applying this technique in long multiplication improves the precision in a significant number of cases (at the cost of losing precision in a relatively lower number of cases). Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Reviewed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20250826034524.2159515-1-nandakumar@nandakumar.co.in
2025-08-27Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-2/+0
Pull virtio/vhost fixes from Michael Tsirkin: "More small fixes. Most notably this fixes a messed up ioctl number, and a regression in shmem affecting drm users" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_net: adjust the execution order of function `virtnet_close` during freeze virtio_input: Improve freeze handling vhost: Fix ioctl # for VHOST_[GS]ET_FORK_FROM_OWNER Revert "virtio: reject shm region if length is zero" vhost/net: Protect ubufs with rcu read lock in vhost_net_ubuf_put() virtio_pci: Fix misleading comment for queue vector
2025-08-27Merge tag 'media/v6.17-2' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - drop the redundant pm_runtime_mark_last_busy() in rkvdec - fix probing error handling in rkvdec - fix an issue affecting lt6911uxe/lt6911uxc related to CSI-2 GPIO pins in int3472 * tag 'media/v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: Remove redundant pm_runtime_mark_last_busy() calls platform/x86: int3472: add hpd pin support media: rkvdec: Remove redundant pm_runtime_mark_last_busy() calls media: rkvdec: Fix an error handling path in rkvdec_probe() media: rkvdec: Fix a NULL vs IS_ERR() bug in probe()
2025-08-26memblock: fix kernel-doc for MEMBLOCK_RSRV_NOINITMike Rapoport (Microsoft)1-2/+3
The kernel-doc description of MEMBLOCK_RSRV_NOINIT and memblock_reserved_mark_noinit() do not accurately describe their functionality. Expand their kernel doc to make it clear that the user of MEMBLOCK_RSRV_NOINIT is responsible to properly initialize the struct pages for such regions and add more details about effects of using this flag. Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/f8140a17-c4ec-489b-b314-d45abe48bf36@redhat.com Link: https://lore.kernel.org/r/20250826071947.1949725-1-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2025-08-26Revert "virtio: reject shm region if length is zero"Igor Torrente1-2/+0
The commit 206cc44588f7 ("virtio: reject shm region if length is zero") breaks the Virtio-gpu `host_visible` feature. As you can see in the snippet below, host_visible_region is zero because of the `kzalloc`. It's using the `vm_get_shm_region` (drivers/virtio/virtio_mmio.c:536) to read the `addr` and `len` from qemu/crosvm. ``` drivers/gpu/drm/virtio/virtgpu_kms.c 132 vgdev = drmm_kzalloc(dev, sizeof(struct virtio_gpu_device), GFP_KERNEL); [...] 177 if (virtio_get_shm_region(vgdev->vdev, &vgdev->host_visible_region, 178 VIRTIO_GPU_SHM_ID_HOST_VISIBLE)) { ``` Now it always fails. To fix, revert the offending commit. Fixes: 206cc44588f7 ("virtio: reject shm region if length is zero") Signed-off-by: Igor Torrente <igor.torrente@collabora.com> Message-Id: <20250807124145.81816-1-igor.torrente@collabora.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-26rcu: add rcu_read_lock_dont_migrate()Menglong Dong1-0/+14
migrate_disable() is called to disable migration in the kernel, and it is often used together with rcu_read_lock(). However, with PREEMPT_RCU disabled, it's unnecessary, as rcu_read_lock() will always disable preemption, which will also disable migration. Introduce rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate(), which will do the migration enable and disable only when PREEMPT_RCU. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20250821090609.42508-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-23Merge tag 'usb-6.17-rc3' of ↵Linus Torvalds1-38/+9
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB driver fixes for 6.17-rc3 to resolve a bunch of reported issues. Included in here are: - typec driver fixes - dwc3 new device id - dwc3 driver fixes - new usb-storage driver quirks - xhci driver fixes - other tiny USB driver fixes to resolve bugs All of these have been in linux-next this week with no reported issues" * tag 'usb-6.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: xhci: fix host not responding after suspend and resume usb: xhci: Fix slot_id resource race conflict usb: typec: fusb302: Revert incorrect threaded irq fix USB: core: Update kerneldoc for usb_hcd_giveback_urb() usb: typec: maxim_contaminant: re-enable cc toggle if cc is open and port is clean usb: typec: maxim_contaminant: disable low power mode when reading comparator values usb: dwc3: Remove WARN_ON for device endpoint command timeouts USB: storage: Ignore driver CD mode for Realtek multi-mode Wi-Fi dongles usb: storage: realtek_cr: Use correct byte order for bcs->Residue usb: chipidea: imx: improve usbmisc_imx7d_pullup() kcov, usb: Don't disable interrupts in kcov_remote_start_usb_softirq() usb: dwc3: pci: add support for the Intel Wildcat Lake usb: dwc3: Ignore late xferNotReady event to prevent halt timeout USB: storage: Add unusual-devs entry for Novatek NTK96550-based camera usb: core: hcd: fix accessing unmapped memory in SINGLE_STEP_SET_FEATURE test usb: renesas-xhci: Fix External ROM access timeouts usb: gadget: tegra-xudc: fix PM use count underflow usb: quirks: Add DELAY_INIT quick for another SanDisk 3.2Gen1 Flash Drive
2025-08-23atm: atmtcp: Prevent arbitrary write in atmtcp_recv_control().Kuniyuki Iwashima1-0/+1
syzbot reported the splat below. [0] When atmtcp_v_open() or atmtcp_v_close() is called via connect() or close(), atmtcp_send_control() is called to send an in-kernel special message. The message has ATMTCP_HDR_MAGIC in atmtcp_control.hdr.length. Also, a pointer of struct atm_vcc is set to atmtcp_control.vcc. The notable thing is struct atmtcp_control is uAPI but has a space for an in-kernel pointer. struct atmtcp_control { struct atmtcp_hdr hdr; /* must be first */ ... atm_kptr_t vcc; /* both directions */ ... } __ATM_API_ALIGN; typedef struct { unsigned char _[8]; } __ATM_API_ALIGN atm_kptr_t; The special message is processed in atmtcp_recv_control() called from atmtcp_c_send(). atmtcp_c_send() is vcc->dev->ops->send() and called from 2 paths: 1. .ndo_start_xmit() (vcc->send() == atm_send_aal0()) 2. vcc_sendmsg() The problem is sendmsg() does not validate the message length and userspace can abuse atmtcp_recv_control() to overwrite any kptr by atmtcp_control. Let's add a new ->pre_send() hook to validate messages from sendmsg(). [0]: Oops: general protection fault, probably for non-canonical address 0xdffffc00200000ab: 0000 [#1] SMP KASAN PTI KASAN: probably user-memory-access in range [0x0000000100000558-0x000000010000055f] CPU: 0 UID: 0 PID: 5865 Comm: syz-executor331 Not tainted 6.17.0-rc1-syzkaller-00215-gbab3ce404553 #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 RIP: 0010:atmtcp_recv_control drivers/atm/atmtcp.c:93 [inline] RIP: 0010:atmtcp_c_send+0x1da/0x950 drivers/atm/atmtcp.c:297 Code: 4d 8d 75 1a 4c 89 f0 48 c1 e8 03 42 0f b6 04 20 84 c0 0f 85 15 06 00 00 41 0f b7 1e 4d 8d b7 60 05 00 00 4c 89 f0 48 c1 e8 03 <42> 0f b6 04 20 84 c0 0f 85 13 06 00 00 66 41 89 1e 4d 8d 75 1c 4c RSP: 0018:ffffc90003f5f810 EFLAGS: 00010203 RAX: 00000000200000ab RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88802a510000 RSI: 00000000ffffffff RDI: ffff888030a6068c RBP: ffff88802699fb40 R08: ffff888030a606eb R09: 1ffff1100614c0dd R10: dffffc0000000000 R11: ffffffff8718fc40 R12: dffffc0000000000 R13: ffff888030a60680 R14: 000000010000055f R15: 00000000ffffffff FS: 00007f8d7e9236c0(0000) GS:ffff888125c1c000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000045ad50 CR3: 0000000075bde000 CR4: 00000000003526f0 Call Trace: <TASK> vcc_sendmsg+0xa10/0xc60 net/atm/common.c:645 sock_sendmsg_nosec net/socket.c:714 [inline] __sock_sendmsg+0x219/0x270 net/socket.c:729 ____sys_sendmsg+0x505/0x830 net/socket.c:2614 ___sys_sendmsg+0x21f/0x2a0 net/socket.c:2668 __sys_sendmsg net/socket.c:2700 [inline] __do_sys_sendmsg net/socket.c:2705 [inline] __se_sys_sendmsg net/socket.c:2703 [inline] __x64_sys_sendmsg+0x19b/0x260 net/socket.c:2703 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f8d7e96a4a9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f8d7e923198 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f8d7e9f4308 RCX: 00007f8d7e96a4a9 RDX: 0000000000000000 RSI: 0000200000000240 RDI: 0000000000000005 RBP: 00007f8d7e9f4300 R08: 65732f636f72702f R09: 65732f636f72702f R10: 65732f636f72702f R11: 0000000000000246 R12: 00007f8d7e9c10ac R13: 00007f8d7e9231a0 R14: 0000200000000200 R15: 0000200000000250 </TASK> Modules linked in: Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+1741b56d54536f4ec349@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68a6767c.050a0220.3d78fd.0011.GAE@google.com/ Tested-by: syzbot+1741b56d54536f4ec349@syzkaller.appspotmail.com Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250821021901.2814721-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-23Merge tag 'drm-fixes-2025-08-23-1' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds1-6/+1
Pull drm fixes from Dave Airlie: "Weekly drm fixes. Looks like things did indeed get busier after rc2, nothing seems too major, but stuff scattered all over the place, amdgpu, xe, i915, hibmc, rust support code, and other small fixes. rust: - drm device memory layout and safety fixes tests: - Endianness fixes gpuvm: - docs warning fix panic: - fix division on 32-bit arm i915: - TypeC DP display Fixes - Silence rpm wakeref asserts on GEN11_GU_MISC_IIR access - Relocate compression repacking WA for JSL/EHL xe: - xe_vm_create fixes - fix vm bind ioctl double free amdgpu: - Replay fixes - SMU14 fix - Null check DC fixes - DCE6 DC fixes - Misc DC fixes bridge: - analogix_dp: devm_drm_bridge_alloc() error handling fix habanalabs: - Memory deallocation fix hibmc: - modesetting black screen fixes - fix UAF on irq - fix leak on i2c failure path nouveau: - memory leak fixes - typos rockchip: - Kconfig fix - register caching fix" * tag 'drm-fixes-2025-08-23-1' of https://gitlab.freedesktop.org/drm/kernel: (49 commits) drm/xe: Fix vm_bind_ioctl double free bug drm/xe: Move ASID allocation and user PT BO tracking into xe_vm_create drm/xe: Assign ioctl xe file handler to vm in xe_vm_create drm/i915/gt: Relocate compression repacking WA for JSL/EHL drm/i915: silence rpm wakeref asserts on GEN11_GU_MISC_IIR access drm/amd/display: Fix DP audio DTO1 clock source on DCE 6. drm/amd/display: Fix fractional fb divider in set_pixel_clock_v3 drm/amd/display: Don't print errors for nonexistent connectors drm/amd/display: Don't warn when missing DCE encoder caps drm/amd/display: Fill display clock and vblank time in dce110_fill_display_configs drm/amd/display: Find first CRTC and its line time in dce110_fill_display_configs drm/amd/display: Adjust DCE 8-10 clock, don't overclock by 15% drm/amd/display: Don't overclock DCE 6 by 15% drm/amd/display: Add null pointer check in mod_hdcp_hdcp1_create_session() drm/amd/display: Fix Xorg desktop unresponsive on Replay panel drm/amd/display: Avoid a NULL pointer dereference drm/amdgpu/swm14: Update power limit logic drm/amd/display: Revert Add HPO encoder support to Replay drm/i915/icl+/tc: Convert AUX powered WARN to a debug message drm/i915/lnl+/tc: Use the cached max lane count value ...
2025-08-22bpf: Use sha1() instead of sha1_transform() in bpf_prog_calc_tag()Eric Biggers1-6/+0
Now that there's a proper SHA-1 library API, just use that instead of the low-level SHA-1 compression function. This eliminates the need for bpf_prog_calc_tag() to implement the SHA-1 padding itself. No functional change; the computed tags remain the same. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20250811201615.564461-1-ebiggers@kernel.org
2025-08-22bpf: Use tnums for JEQ/JNE is_branch_taken logicPaul Chaignon1-0/+3
In the following toy program (reg states minimized for readability), R0 and R1 always have different values at instruction 6. This is obvious when reading the program but cannot be guessed from ranges alone as they overlap (R0 in [0; 0xc0000000], R1 in [1024; 0xc0000400]). 0: call bpf_get_prandom_u32#7 ; R0_w=scalar() 1: w0 = w0 ; R0_w=scalar(var_off=(0x0; 0xffffffff)) 2: r0 >>= 30 ; R0_w=scalar(var_off=(0x0; 0x3)) 3: r0 <<= 30 ; R0_w=scalar(var_off=(0x0; 0xc0000000)) 4: r1 = r0 ; R1_w=scalar(var_off=(0x0; 0xc0000000)) 5: r1 += 1024 ; R1_w=scalar(var_off=(0x400; 0xc0000000)) 6: if r1 != r0 goto pc+1 Looking at tnums however, we can deduce that R1 is always different from R0 because their tnums don't agree on known bits. This patch uses this logic to improve is_scalar_branch_taken in case of BPF_JEQ and BPF_JNE. This change has a tiny impact on complexity, which was measured with the Cilium complexity CI test. That test covers 72 programs with various build and load time configurations for a total of 970 test cases. For 80% of test cases, the patch has no impact. On the other test cases, the patch decreases complexity by only 0.08% on average. In the best case, the verifier needs to walk 3% less instructions and, in the worst case, 1.5% more. Overall, the patch has a small positive impact, especially for our largest programs. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/be3ee70b6e489c49881cb1646114b1d861b5c334.1755694147.git.paul.chaignon@gmail.com
2025-08-22Merge tag 'for-linus-6.17-rc3-tag' of ↵Linus Torvalds1-8/+0
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Two small cleanups which are both relevant only when running as a Xen guest" * tag 'for-linus-6.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: drivers/xen/xenbus: remove quirk for Xen 3.x compiler: remove __ADDRESSABLE_ASM{_STR,}() again
2025-08-22Merge tag 'block-6.17-20250822' of git://git.kernel.dk/linuxLinus Torvalds1-0/+1
Pull block fixes from Jens Axboe: "A set of fixes for block that should go into this tree. A bit larger than what I usually have at this point in time, a lot of that is the continued fixing of the lockdep annotation for queue freezing that we recently added, which has highlighted a number of little issues here and there. This contains: - MD pull request via Yu: - Add a legacy_async_del_gendisk mode, to prevent a user tools regression. New user tools releases will not use such a mode, the old release with a new kernel now will have warning about deprecated behavior, and we prepare to remove this legacy mode after about a year later - The rename in kernel causing user tools build failure, revert the rename in mdp_superblock_s - Fix a regression that interrupted resync can be shown as recover from mdstat or sysfs - Improve file size detection for loop, particularly for networked file systems, by using getattr to get the size rather than the cached inode size. - Hotplug CPU lock vs queue freeze fix - Lockdep fix while updating the number of hardware queues - Fix stacking for PI devices - Silence bio_check_eod() for the known case of device removal where the size is truncated to 0 sectors" * tag 'block-6.17-20250822' of git://git.kernel.dk/linux: block: avoid cpu_hotplug_lock depedency on freeze_lock block: decrement block_rq_qos static key in rq_qos_del() block: skip q->rq_qos check in rq_qos_done_bio() blk-mq: fix lockdep warning in __blk_mq_update_nr_hw_queues block: tone down bio_check_eod loop: use vfs_getattr_nosec for accurate file size loop: Consolidate size calculation logic into lo_calculate_size() block: remove newlines from the warnings in blk_validate_integrity_limits block: handle pi_tuple_size in queue_limits_stack_integrity selftests: ublk: Use ARRAY_SIZE() macro to improve code md: fix sync_action incorrect display during resync md: add helper rdev_needs_recovery() md: keep recovery_cp in mdp_superblock_s md: add legacy_async_del_gendisk mode
2025-08-22Merge tag 'nfs-for-6.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-0/+1
Pull NFS client fix from Trond Myklebust: - NFS: Fix a data corrupting race when updating an existing write * tag 'nfs-for-6.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Fix a race when updating an existing write