kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message	Author	Files	Lines
2026-01-27	Merge branch 'bpf-fix-fionread-and-copied_seq-issues'	Alexei Starovoitov	6	-17/+439
	Jiayuan Chen says: ==================== bpf: Fix FIONREAD and copied_seq issues syzkaller reported a bug [1] where a socket using sockmap, after being unloaded, exposed incorrect copied_seq calculation. The selftest I provided can be used to reproduce the issue reported by syzkaller. TCP recvmsg seq # bug 2: copied E92C873, seq E68D125, rcvnxt E7CEB7C, fl 40 WARNING: CPU: 1 PID: 5997 at net/ipv4/tcp.c:2724 tcp_recvmsg_locked+0xb2f/0x2910 net/ipv4/tcp.c:2724 Call Trace: <TASK> receive_fallback_to_copy net/ipv4/tcp.c:1968 [inline] tcp_zerocopy_receive+0x131a/0x2120 net/ipv4/tcp.c:2200 do_tcp_getsockopt+0xe28/0x26c0 net/ipv4/tcp.c:4713 tcp_getsockopt+0xdf/0x100 net/ipv4/tcp.c:4812 do_sock_getsockopt+0x34d/0x440 net/socket.c:2421 __sys_getsockopt+0x12f/0x260 net/socket.c:2450 __do_sys_getsockopt net/socket.c:2457 [inline] __se_sys_getsockopt net/socket.c:2454 [inline] __x64_sys_getsockopt+0xbd/0x160 net/socket.c:2454 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f A sockmap socket maintains its own receive queue (ingress_msg) which may contain data from either its own protocol stack or forwarded from other sockets. FD1:read() -- FD1->copied_seq++ \| [read data] \| [enqueue data] v [sockmap] -> ingress to self -> ingress_msg queue FD1 native stack ------> ^ -- FD1->rcv_nxt++ -> redirect to other \| [enqueue data] \| \| \| ingress to FD1 v ^ ... \| [sockmap] FD2 native stack The issue occurs when reading from ingress_msg: we update tp->copied_seq by default, but if the data comes from other sockets (not the socket's own protocol stack), tcp->rcv_nxt remains unchanged. Later, when converting back to a native socket, reads may fail as copied_seq could be significantly larger than rcv_nxt. Additionally, FIONREAD calculation based on copied_seq and rcv_nxt is insufficient for sockmap sockets, requiring separate field tracking. 
[1] https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983 --- v7 -> v9: Address Jakub Sitnicki's feedback: - Remove sk_receive_queue check in tcp_bpf_ioctl, only report ingress_msg data length for FIONREAD - Minor nits fixes - Add Reviewed-by tag from John Fastabend - Fix ci error https://lore.kernel.org/bpf/20260113025121.197535-1-jiayuan.chen@linux.dev/ v5 -> v7: Some modifications suggested by Jakub Sitnicki, and added Reviewed-by tag. https://lore.kernel.org/bpf/20260106051458.279151-1-jiayuan.chen@linux.dev/ v1 -> v5: Use skmsg.sk instead of extending BPF_F_XXX macro and fix CI failure reported by CI v1: https://lore.kernel.org/bpf/20251117110736.293040-1-jiayuan.chen@linux.dev/ ==================== Link: https://patch.msgid.link/20260124113314.113584-1-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-27	selftests/bpf: Add tests for FIONREAD and copied_seq	Jiayuan Chen	2	-6/+302
	This commit adds two new test functions: one to reproduce the bug reported by syzkaller [1], and another to cover the calculation of copied_seq. The tests primarily involve installing and uninstalling sockmap on sockets, then reading data to verify proper functionality. Additionally, extend the do_test_sockmap_skb_verdict_fionread() function to support UDP FIONREAD testing. [1] https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983 Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20260124113314.113584-4-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-27	bpf, sockmap: Fix FIONREAD for sockmap	Jiayuan Chen	4	-6/+108
	A socket using sockmap has its own independent receive queue: ingress_msg. This queue may contain data from its own protocol stack or from other sockets. Therefore, for sockmap, relying solely on copied_seq and rcv_nxt to calculate FIONREAD is not enough.
	This patch adds a new msg_tot_len field in the psock structure to record the data length in ingress_msg. Additionally, we implement new ioctl interfaces for TCP and UDP to intercept FIONREAD operations.
	Note that we intentionally do not include sk_receive_queue data in the FIONREAD result. Data in sk_receive_queue has not yet been processed by the BPF verdict program, and may be redirected to other sockets or dropped. Including it would create semantic ambiguity since this data may never be readable by the user.
	Unix and VSOCK sockets have similar issues, but fixing them is outside the scope of this patch as it would require more intrusive changes.
	Previous work by John Fastabend made some efforts towards FIONREAD support: commit e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq"). Although the current patch is based on the previous work by John Fastabend, it is acceptable for our Fixes tag to point to the same commit.
	                                                      FD1:read()
	                                                      --  FD1->copied_seq++
	                                                          |  [read data]
	                                                          |
	                                   [enqueue data]         v
	                  [sockmap]     -> ingress to self ->  ingress_msg queue
	FD1 native stack  ------>                                 ^
	-- FD1->rcv_nxt++               -> redirect to other      | [enqueue data]
	                                       |                  |
	                                       |             ingress to FD1
	                                       v                  ^
	                                      ...                 |  [sockmap]
	                                                     FD2 native stack
	Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
	Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
	Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
	Link: https://lore.kernel.org/r/20260124113314.113584-3-jiayuan.chen@linux.dev
	Signed-off-by: Alexei Starovoitov <ast@kernel.org>
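The accounting described above can be pictured with a small user-space model. This is only a sketch under stated assumptions: `struct psock_model` and the `model_*` functions are hypothetical stand-ins, and only the field name `msg_tot_len` is taken from the commit message. It shows FIONREAD reporting the ingress_msg byte count while ignoring data that has not yet passed the verdict program.

```c
#include <stddef.h>

/* Hypothetical stand-in for per-socket sockmap state. Only the name
 * msg_tot_len comes from the commit message; the rest is illustrative. */
struct psock_model {
	size_t msg_tot_len;   /* bytes currently queued in ingress_msg */
	size_t receive_queue; /* bytes not yet seen by the verdict program */
};

/* Data passed the verdict program and was queued for this socket. */
static void model_enqueue_ingress(struct psock_model *p, size_t len)
{
	p->msg_tot_len += len;
}

/* The user reads up to 'len' bytes from ingress_msg; returns bytes consumed. */
static size_t model_read(struct psock_model *p, size_t len)
{
	size_t n = len < p->msg_tot_len ? len : p->msg_tot_len;

	p->msg_tot_len -= n;
	return n;
}

/* FIONREAD reports only ingress_msg data, never receive_queue data. */
static size_t model_fionread(const struct psock_model *p)
{
	return p->msg_tot_len;
}
```

In this model, bytes sitting in `receive_queue` never change the FIONREAD result, which mirrors the stated reasoning that such data may still be redirected or dropped.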
2026-01-27	bpf, sockmap: Fix incorrect copied_seq calculation	Jiayuan Chen	3	-5/+29
	A socket using sockmap has its own independent receive queue: ingress_msg. This queue may contain data from its own protocol stack or from other sockets.
	The issue is that when reading from ingress_msg, we update tp->copied_seq by default. However, if the data is not from its own protocol stack, tp->rcv_nxt is not increased. Later, if we convert this socket to a native socket, reading from this socket may fail because copied_seq might be significantly larger than rcv_nxt.
	This fix also addresses the syzkaller-reported bug referenced in the Closes tag.
	This patch marks the skmsg objects in ingress_msg. When reading, we update copied_seq only if the data is from its own protocol stack.
	                                                      FD1:read()
	                                                      --  FD1->copied_seq++
	                                                          |  [read data]
	                                                          |
	                                   [enqueue data]         v
	                  [sockmap]     -> ingress to self ->  ingress_msg queue
	FD1 native stack  ------>                                 ^
	-- FD1->rcv_nxt++               -> redirect to other      | [enqueue data]
	                                       |                  |
	                                       |             ingress to FD1
	                                       v                  ^
	                                      ...                 |  [sockmap]
	                                                     FD2 native stack
	Closes: https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983
	Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
	Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
	Reviewed-by: John Fastabend <john.fastabend@gmail.com>
	Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
	Link: https://lore.kernel.org/r/20260124113314.113584-2-jiayuan.chen@linux.dev
	Signed-off-by: Alexei Starovoitov <ast@kernel.org>
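A minimal user-space model of the rule "advance copied_seq only for data from the socket's own stack" might look like the following. The struct and function names are hypothetical and are not the kernel's; the `from_own_stack` flag stands in for whatever marking the patch applies to skmsg objects.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two TCP sequence counters involved. */
struct tcp_model {
	uint32_t copied_seq; /* next byte the user will read */
	uint32_t rcv_nxt;    /* next byte expected from the peer */
};

/* Hypothetical model of one entry in the ingress_msg queue. */
struct msg_model {
	uint32_t len;
	bool from_own_stack; /* false when redirected from another socket */
};

/* The socket's own protocol stack received 'len' bytes. */
static void model_stack_receive(struct tcp_model *tp, uint32_t len)
{
	tp->rcv_nxt += len;
}

/* The user reads one message; only native data advances copied_seq. */
static void model_read_msg(struct tcp_model *tp, const struct msg_model *m)
{
	if (m->from_own_stack)
		tp->copied_seq += m->len;
}

/* True while copied_seq has not run ahead of rcv_nxt (wrap-safe compare). */
static bool model_seq_consistent(const struct tcp_model *tp)
{
	return (int32_t)(tp->rcv_nxt - tp->copied_seq) >= 0;
}
```

Without the `from_own_stack` check, reading a redirected message would push `copied_seq` past `rcv_nxt`, which is the inconsistent state the commit message describes.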
2026-01-27	selftests/bpf: cover BPF_CGROUP_ITER_CHILDREN control option	Matt Bobrowski	3	-4/+25
	Extend some of the existing CSS iterator selftests such that they cover the newly introduced BPF_CGROUP_ITER_CHILDREN iterator control option. Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20260127085112.3608687-2-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-27	bpf: add new BPF_CGROUP_ITER_CHILDREN control option	Matt Bobrowski	3	-5/+37
	Currently, the BPF cgroup iterator supports walking descendants in either pre-order (BPF_CGROUP_ITER_DESCENDANTS_PRE) or post-order (BPF_CGROUP_ITER_DESCENDANTS_POST). These modes perform an exhaustive depth-first search (DFS) of the hierarchy. In scenarios where a BPF program may need to inspect only the direct children of a given parent cgroup, a full DFS is unnecessarily expensive. This patch introduces a new BPF cgroup iterator control option, BPF_CGROUP_ITER_CHILDREN. This control option restricts the traversal to the immediate children of a specified parent cgroup, allowing for more targeted and efficient iteration, particularly when exhaustive depth-first search (DFS) traversal is not required. Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20260127085112.3608687-1-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
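The cost difference between the two traversal styles can be sketched on a generic tree. The code below is not the cgroup iterator; `struct node` and both walk functions are hypothetical, and the sketch starts below the parent without modelling whether the real iterator also visits the starting cgroup.

```c
#include <stddef.h>

/* Hypothetical tree node using first-child / next-sibling links. */
struct node {
	int id;
	struct node *first_child;
	struct node *next_sibling;
};

/* Pre-order walk of every descendant of 'parent'. Appends ids to 'out'
 * starting at index 'n' and returns the new count. The caller must make
 * sure 'out' is large enough. */
static size_t walk_descendants_pre(const struct node *parent, int *out, size_t n)
{
	for (const struct node *c = parent->first_child; c; c = c->next_sibling) {
		out[n++] = c->id;
		n = walk_descendants_pre(c, out, n);
	}
	return n;
}

/* Walk of the immediate children only; no recursion into grandchildren. */
static size_t walk_children(const struct node *parent, int *out)
{
	size_t n = 0;

	for (const struct node *c = parent->first_child; c; c = c->next_sibling)
		out[n++] = c->id;
	return n;
}
```

The children-only walk touches one node per direct child regardless of how deep the subtree is, which is the saving the commit message refers to.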
2026-01-27	Merge branch 'selftests-bpf-migrate-a-few-bpftool-testing-scripts'	Alexei Starovoitov	7	-486/+602
	Alexis Lothoré says: ==================== selftests/bpf: migrate a few bpftool testing scripts this is the v4 for some bpftool tests conversion. The new tests are being integrated in test_progs so that they can be executed on each CI run. - First commit introduces a few dedicated helpers to execute bpftool commands, with or without retrieving the generated stdout output - Second commit integrates test_bpftool_metadata.sh into test_progs - Third commit integrates test_bpftool_map.sh into test_progs Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> --- Changes in v4: - Port missing map access test in bpftool_metadata - Link to v3: https://lore.kernel.org/r/20260121-bpftool-tests-v3-0-368632f377e5@bootlin.com Changes in v3: - Drop commit reordering objects in Makefile - Rebased series on ci/bpf-next_base to fix conflict - Link to v2: https://lore.kernel.org/r/20260121-bpftool-tests-v2-0-64edb47e91ae@bootlin.com Changes in v2: - drop standalone runner in favor of test_progs - Link to v1: https://lore.kernel.org/r/20260114-bpftool-tests-v1-0-cfab1cc9beaf@bootlin.com ==================== Link: https://patch.msgid.link/20260123-bpftool-tests-v4-0-a6653a7f28e7@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-27	selftests/bpf: convert test_bpftool_map_access.sh into test_progs framework	Alexis Lothoré (eBPF Foundation)	3	-399/+371
	The test_bpftool_map.sh script tests that maps read/write accesses are being properly allowed/refused by the kernel depending on a specific fmod_ret program being attached on security_bpf_map function. Rewrite this test to integrate it in the test_progs. The new test spawns a few subtests: #36/1 bpftool_maps_access/unprotected_unpinned:OK #36/2 bpftool_maps_access/unprotected_pinned:OK #36/3 bpftool_maps_access/protected_unpinned:OK #36/4 bpftool_maps_access/protected_pinned:OK #36/5 bpftool_maps_access/nested_maps:OK #36/6 bpftool_maps_access/btf_list:OK #36 bpftool_maps_access:OK Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Acked-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/r/20260123-bpftool-tests-v4-3-a6653a7f28e7@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-27	selftests/bpf: convert test_bpftool_metadata.sh into test_progs framework	Alexis Lothoré (eBPF Foundation)	3	-86/+144
	The test_bpftool_metadata.sh script validates that bpftool properly returns in its output any metadata generated by bpf programs through some .rodata sections. Port this test to the test_progs framework so that it can be executed automatically in CI. The new test, similarly to the former script, checks that valid data appears both for textual output and json output, as well as for both data not used at all and used data. For the json check part, the expected json string is hardcoded to avoid bringing a new external dependency (eg: a json deserializer) for test_progs. As the test is now converted into test_progs, remove the former script. The newly converted test brings two new subtests: #37/1 bpftool_metadata/metadata_unused:OK #37/2 bpftool_metadata/metadata_used:OK #37 bpftool_metadata:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20260123-bpftool-tests-v4-2-a6653a7f28e7@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-27	selftests/bpf: Add a few helpers for bpftool testing	Alexis Lothoré (eBPF Foundation)	3	-1/+87
	In order to integrate some bpftool tests into test_progs, define a few specific helpers that execute bpftool commands and optionally retrieve the command output. Those helpers most notably set the path to the bpftool binary under test. This version checks different possible paths relative to the directories where the different test_progs runners are executed, as we want to make sure not to accidentally use a bootstrap version of the binary. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20260123-bpftool-tests-v4-1-a6653a7f28e7@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-26	selftests/bpf: Harden cpu flags test for lru_percpu_hash map	Leon Hwang	1	-3/+10
	CI occasionally reports failures in the percpu_alloc/cpu_flag_lru_percpu_hash selftest, for example: First test_progs failure (test_progs_no_alu32-x86_64-llvm-21): #264/15 percpu_alloc/cpu_flag_lru_percpu_hash ... test_percpu_map_op_cpu_flag:FAIL:bpf_map_lookup_batch value on specified cpu unexpected bpf_map_lookup_batch value on specified cpu: actual 0 != expected 3735929054 The unexpected value indicates that an element was removed from the map. However, the test never calls delete_elem(), so the only possible cause is LRU eviction. This can happen when the current task migrates to another CPU: an update_elem() triggers eviction because there is no available LRU node on local freelist and global freelist. Harden the test against this behavior by provisioning sufficient spare elements. Set max_entries to 'nr_cpus * 2' and restrict the test to using the first nr_cpus entries, ensuring that updates do not spuriously trigger LRU eviction. Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260119133417.19739-1-leon.hwang@linux.dev
2026-01-25	Merge branch 'selftests-bpf-introduce-execution-context-detection-helpers'	Alexei Starovoitov	6	-0/+202
	Changwoo Min says: ==================== selftests/bpf: Introduce execution context detection helpers This series introduces four new BPF-native inline helpers -- bpf_in_nmi(), bpf_in_hardirq(), bpf_in_serving_softirq(), and bpf_in_task() -- to allow BPF programs to query the current execution context. Following the feedback on v1, these are implemented in bpf_experimental.h as inline helpers wrapping get_preempt_count(). This approach allows the logic to be JIT-inlined for better performance compared to a kfunc call, while providing the granular context detection (e.g., hardirq vs. softirq) required by subsystems like sched_ext. The series includes a new selftest suite, exe_ctx, which uses bpf_testmod to verify context detection across Task, HardIRQ, and SoftIRQ boundaries via irq_work and tasklets. NMI context testing is omitted as NMIs cannot be triggered deterministically within software-only BPF CI environments. ChangeLog v2 -> v3: - Added exe_ctx to DENYLIST.s390x since new helpers are supported only on x86 and arm64 (patch 2). - Added comments to helpers describing supported architectures (patch 1). ChangeLog v1 -> v2: - Dropped the core kernel kfunc implementations, and implemented context detection as inline BPF helpers in bpf_experimental.h. - Renamed the selftest suite from ctx_kfunc to exe_ctx to reflect the change from kfuncs to helpers. - Updated BPF programs to use the new inline helpers. - Swapped clean-up order between tasklet and irqwork in bpf_testmod to avoid re-scheduling the already-killed tasklet (reported by bot+bpf-ci). ==================== Link: https://patch.msgid.link/20260125115413.117502-1-changwoo@igalia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	selftests/bpf: Add tests for execution context helpers	Changwoo Min	5	-0/+144
	Add a new selftest suite `exe_ctx` to verify the accuracy of the bpf_in_task(), bpf_in_hardirq(), and bpf_in_serving_softirq() helpers introduced in bpf_experimental.h. Testing these execution contexts deterministically requires crossing context boundaries within a single CPU. To achieve this, the test implements a "Trigger-Observer" pattern using bpf_testmod: 1. Trigger: A BPF syscall program calls a new bpf_testmod kfunc bpf_kfunc_trigger_ctx_check(). 2. Task to HardIRQ: The kfunc uses irq_work_queue() to trigger a self-IPI on the local CPU. 3. HardIRQ to SoftIRQ: The irq_work handler calls a dummy function (observed by BPF fentry) and then schedules a tasklet to transition into SoftIRQ context. The user-space runner ensures determinism by pinning itself to CPU 0 before execution, forcing the entire interrupt chain to remain on a single core. Dummy noinline functions with compiler barriers are added to bpf_testmod.c to serve as stable attachment points for fentry programs. A retry loop is used in user-space to wait for the asynchronous SoftIRQ to complete. Note that testing on s390x is avoided because supporting those helpers purely in BPF on s390x is not possible at this point. Reviewed-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Changwoo Min <changwoo@igalia.com> Link: https://lore.kernel.org/r/20260125115413.117502-3-changwoo@igalia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	selftests/bpf: Introduce execution context detection helpers	Changwoo Min	1	-0/+58
	Introduce bpf_in_nmi(), bpf_in_hardirq(), bpf_in_serving_softirq(), and bpf_in_task() inline helpers in bpf_experimental.h. These allow BPF programs to query the current execution context with higher granularity than the existing bpf_in_interrupt() helper. While BPF programs can often infer their context from attachment points, subsystems like sched_ext may call the same BPF logic from multiple contexts (e.g., task-to-task wake-ups vs. interrupt-to-task wake-ups). These helpers provide a reliable way for logic to branch based on the current CPU execution state. Implementing these as BPF-native inline helpers wrapping get_preempt_count() allows the compiler and JIT to inline the logic. The implementation accounts for differences in preempt_count layout between standard and PREEMPT_RT kernels. Reviewed-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Changwoo Min <changwoo@igalia.com> Link: https://lore.kernel.org/r/20260125115413.117502-2-changwoo@igalia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
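How such helpers can classify the execution context from a preempt count can be sketched in plain C. The bit masks below are an assumption about the non-PREEMPT_RT preempt_count layout and are not taken from this commit; the `model_*` names are hypothetical, and the PREEMPT_RT variant mentioned above is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: bit layout of preempt_count on a non-PREEMPT_RT kernel.
 * These constants are illustrative and do not come from the commit.
 */
#define MODEL_SOFTIRQ_OFFSET 0x00000100u /* lowest softirq bit: serving a softirq */
#define MODEL_HARDIRQ_MASK   0x000f0000u
#define MODEL_NMI_MASK       0x00f00000u

static bool model_in_nmi(uint32_t pc)
{
	return (pc & MODEL_NMI_MASK) != 0;
}

static bool model_in_hardirq(uint32_t pc)
{
	return (pc & MODEL_HARDIRQ_MASK) != 0;
}

/* Only the lowest softirq bit means "serving"; higher softirq bits
 * in this model only mean that bottom halves are disabled. */
static bool model_in_serving_softirq(uint32_t pc)
{
	return (pc & MODEL_SOFTIRQ_OFFSET) != 0;
}

/* Task context is whatever is left when none of the above applies. */
static bool model_in_task(uint32_t pc)
{
	return !(model_in_nmi(pc) || model_in_hardirq(pc) ||
		 model_in_serving_softirq(pc));
}
```

A caller could branch on these predicates to pick different logic for, say, a wake-up issued from task context versus one issued from an interrupt handler.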
2026-01-25	Merge branch 'bpf-fsession-support'	Alexei Starovoitov	26	-101/+654
	Menglong Dong says: ==================== bpf: fsession support overall ------- Sometimes, we need to hook both the entry and exit of a function with TRACING. Therefore, we need to define a FENTRY and a FEXIT for the target function, which is not convenient. Therefore, we add tracing session support for TRACING. Generally speaking, it's similar to kprobe session, which can hook both the entry and exit of a function with a single BPF program. We allow the usage of bpf_get_func_ret() to get the return value in the fentry of the tracing session, as it will always get "0", which is safe enough and is OK. Session cookie is also supported with the kfunc bpf_session_cookie(). In order to limit the stack usage, we limit the maximum number of cookies to 4. kfunc design ------------ In order to keep consistency with existing kfuncs, we don't introduce new kfuncs for fsession. Instead, we reuse the existing kfuncs bpf_session_cookie() and bpf_session_is_return(). The prototypes of bpf_session_cookie() and bpf_session_is_return() don't satisfy our needs, so we change their prototypes by adding the argument "void *ctx" to them. We inline bpf_session_cookie() and bpf_session_is_return() for fsession in the verifier directly. Therefore, we don't need to introduce new functions for them. architecture ------------ The fsession stuff is arch related, so -EOPNOTSUPP will be returned if it is not supported yet by the arch. In this series, we only support x86_64. Other architectures will be supported later. 
Changes v12 -> v13: fix the selftests fail on !x86_64 in the 11th patch * v12: https://lore.kernel.org/bpf/20260124033119.28682-1-dongml2@chinatelecom.cn/ Changes v11 -> v12: * update the variable "delta" in the 2nd patch * improve the fsession testcase by adding the 11th patch, which will test bpf_get_func_* for fsession * v11: https://lore.kernel.org/bpf/20260123073532.238985-1-dongml2@chinatelecom.cn/ Changes v10 -> v11: * rebase and fix the conflicts in the 2nd patch * use "volatile" in the 11th patch * rename BPF_TRAMP_SHIFT_* to BPF_TRAMP_*_SHIFT * v10: https://lore.kernel.org/bpf/20260115112246.221082-1-dongml2@chinatelecom.cn/ Changes v9 -> v10: * 1st patch: some small adjustment, such as use switch in bpf_prog_has_trampoline() * 2nd patch: some adjustment to the commit log and comment * 3rd patch: - drop the declaration of bpf_session_is_return() and bpf_session_cookie() - use vmlinux.h instead of bpf_kfuncs.h in uprobe_multi_session.c, kprobe_multi_session_cookie.c and uprobe_multi_session_cookie.c * 4th patch: - some adjustment to the comment and commit log - rename the prefix from BPF_TRAMP_M_ to BPF_TRAMP_SHIFT_ - remove the definition of BPF_TRAMP_M_NR_ARGS - check the program type in bpf_session_filter() * 5th patch: some adjustment to the commit log * 6th patch: - add the "reg" to the function arguments of emit_store_stack_imm64() - use the positive offset in emit_store_stack_imm64() * 7th patch: - use "|" for func_meta instead of "+" - pass the "func_meta_off" to invoke_bpf() explicitly, instead of computing it with "stack_size + 8" - pass the "cookie_off" to invoke_bpf() instead of computing the current cookie index with "func_meta" * 8th patch: - split the modification to bpftool to a separate patch * v9: https://lore.kernel.org/bpf/20260110141115.537055-1-dongml2@chinatelecom.cn/ Changes v8 -> v9: * remove the definition of bpf_fsession_cookie and bpf_fsession_is_return in the 4th and 5th patch * rename emit_st_r0_imm64() to 
emit_store_stack_imm64() in the 6th patch * v8: https://lore.kernel.org/bpf/20260108022450.88086-1-dongml2@chinatelecom.cn/ Changes v7 -> v8: * use the last byte of nr_args for bpf_get_func_arg_cnt() in the 2nd patch * v7: https://lore.kernel.org/bpf/20260107064352.291069-1-dongml2@chinatelecom.cn/ Changes v6 -> v7: * change the prototype of bpf_session_cookie() and bpf_session_is_return(), and reuse them instead of introduce new kfunc for fsession. * v6: https://lore.kernel.org/bpf/20260104122814.183732-1-dongml2@chinatelecom.cn/ Changes v5 -> v6: * No changes in this version, just a rebase to deal with conflicts. * v5: https://lore.kernel.org/bpf/20251224130735.201422-1-dongml2@chinatelecom.cn/ Changes v4 -> v5: * use fsession terminology consistently in all patches * 1st patch: - use more explicit way in __bpf_trampoline_link_prog() * 4th patch: - remove "cookie_cnt" in struct bpf_trampoline * 6th patch: - rename nr_regs to func_md - define cookie_off in a new line * 7th patch: - remove the handling of BPF_TRACE_SESSION in legacy fallback path for BPF_RAW_TRACEPOINT_OPEN * v4: https://lore.kernel.org/bpf/20251217095445.218428-1-dongml2@chinatelecom.cn/ Changes v3 -> v4: * instead of adding a new hlist to progs_hlist in trampoline, add the bpf program to both the fentry hlist and the fexit hlist. * introduce the 2nd patch to reuse the nr_args field in the stack to store all the information we need(except the session cookies). * limit the maximum number of cookies to 4. * remove the logic to skip fexit if the fentry return non-zero. * v3: https://lore.kernel.org/bpf/20251026030143.23807-1-dongml2@chinatelecom.cn/ Changes v2 -> v3: * squeeze some patches: - the 2 patches for the kfunc bpf_tracing_is_exit() and bpf_fsession_cookie() are merged into the second patch. - the testcases for fsession are also squeezed. 
* fix the CI error by move the testcase for bpf_get_func_ip to fsession_test.c * v2: https://lore.kernel.org/bpf/20251022080159.553805-1-dongml2@chinatelecom.cn/ Changes v1 -> v2: * session cookie support. In this version, session cookie is implemented, and the kfunc bpf_fsession_cookie() is added. * restructure the layout of the stack. In this version, the session stuff that stored in the stack is changed, and we locate them after the return value to not break bpf_get_func_ip(). * testcase enhancement. Some nits in the testcase that suggested by Jiri is fixed. Meanwhile, the testcase for get_func_ip and session cookie is added too. * v1: https://lore.kernel.org/bpf/20251018142124.783206-1-dongml2@chinatelecom.cn/ ==================== Link: https://patch.msgid.link/20260124062008.8657-1-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	selftests/bpf: test fsession mixed with fentry and fexit	Menglong Dong	1	-0/+16
	Test the fsession when it is used together with fentry, fexit. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260124062008.8657-14-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	selftests/bpf: add testcases for fsession cookie	Menglong Dong	2	-0/+100
	Test session cookies for fsession. Multiple fsession BPF programs are attached to bpf_fentry_test1(), and the session cookie is read and written in the testcase. bpf_get_func_ip() influences the layout of the session cookies, so we test the cookie in two cases: with and without bpf_get_func_ip(). Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260124062008.8657-13-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	selftests/bpf: test bpf_get_func_* for fsession	Menglong Dong	4	-1/+65
	Test following bpf helper for fsession: bpf_get_func_arg() bpf_get_func_arg_cnt() bpf_get_func_ret() bpf_get_func_ip() Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260124062008.8657-12-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	selftests/bpf: add testcases for fsession	Menglong Dong	2	-0/+187
	Add testcases for BPF_TRACE_FSESSION. The function arguments and return value are tested at both entry and exit. The kfunc bpf_session_is_return() is also tested. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260124062008.8657-11-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	bpftool: add fsession support	Menglong Dong	1	-0/+1
	Add BPF_TRACE_FSESSION to bpftool. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260124062008.8657-10-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	libbpf: add fsession support	Menglong Dong	2	-0/+4
	Add BPF_TRACE_FSESSION to libbpf. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260124062008.8657-9-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	bpf,x86: add fsession support for x86_64	Menglong Dong	1	-12/+40
	Add BPF_TRACE_FSESSION support for x86_64, including:
	1. clear the return value in the stack before fentry, so that the fentry of the fsession can only get 0 with bpf_get_func_ret().
	2. clear all the session cookies' values in the stack.
	3. store the index of the cookie to ctx[-1] before calling the fsession program.
	4. store the "is_return" flag to ctx[-1] before calling the fexit of the fsession.
	Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
	Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
	Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
	Link: https://lore.kernel.org/r/20260124062008.8657-8-dongml2@chinatelecom.cn
	Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	bpf,x86: introduce emit_store_stack_imm64() for trampoline	Menglong Dong	1	-12/+14
	Introduce the helper emit_store_stack_imm64(), which is used to store an imm64 to the stack with the help of a register. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260124062008.8657-7-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	bpf: support fsession for bpf_session_cookie	Menglong Dong	2	-0/+35
	Implement session cookie for fsession. The session cookies will be stored in the stack, and the layout of the stack will look like this:
	    return value   -> 8 bytes
	    argN           -> 8 bytes
	    ...
	    arg1           -> 8 bytes
	    nr_args        -> 8 bytes
	    ip (optional)  -> 8 bytes
	    cookie2        -> 8 bytes
	    cookie1        -> 8 bytes
	The offset of the cookie for the current bpf program, which is in 8-byte units, is stored in "(((u64 *)ctx)[-1] >> BPF_TRAMP_COOKIE_INDEX_SHIFT) & 0xFF". Therefore, we can get the session cookie with ((u64 *)ctx)[-offset].
	Implement and inline the bpf_session_cookie() for the fsession in the verifier.
	Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
	Link: https://lore.kernel.org/r/20260124062008.8657-6-dongml2@chinatelecom.cn
	Signed-off-by: Alexei Starovoitov <ast@kernel.org>
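The cookie lookup described above can be tried out in user space by treating a plain array as the trampoline stack. This is a sketch only: `MODEL_COOKIE_INDEX_SHIFT` is a hypothetical value chosen for illustration (the real value of BPF_TRAMP_COOKIE_INDEX_SHIFT is not given here), and `model_session_cookie()` is not the kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL shift value, chosen only so the example is concrete. */
#define MODEL_COOKIE_INDEX_SHIFT 8

/*
 * Return a pointer to the session cookie of the current program.
 * 'ctx' points at arg1; ctx[-1] is the metadata word that carries the
 * cookie offset (in 8-byte units) in the byte selected by the shift.
 */
static uint64_t *model_session_cookie(uint64_t *ctx)
{
	ptrdiff_t off = (ptrdiff_t)((ctx[-1] >> MODEL_COOKIE_INDEX_SHIFT) & 0xFF);

	return &ctx[-off];
}
```

With a modelled stack of `{cookie1, cookie2, ip, metadata, arg1, ...}` and `ctx` pointing at arg1, an offset of 4 selects cookie1 and an offset of 3 selects cookie2, matching the layout listed in the commit message when the optional ip slot is present.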
2026-01-25	bpf: support fsession for bpf_session_is_return	Menglong Dong	3	-13/+41
	If fsession exists, we will use the bit (1 << BPF_TRAMP_IS_RETURN_SHIFT) in ((u64 *)ctx)[-1] to store the "is_return" flag. The logic of bpf_session_is_return() for fsession is implemented in the verifier by inlining the following code:
	    bool bpf_session_is_return(void *ctx)
	    {
	        return (((u64 *)ctx)[-1] >> BPF_TRAMP_IS_RETURN_SHIFT) & 1;
	    }
	Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
	Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
	Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
	Link: https://lore.kernel.org/r/20260124062008.8657-5-dongml2@chinatelecom.cn
	Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	bpf: change prototype of bpf_session_{cookie,is_return}	Menglong Dong	7	-32/+29
	Add the function argument "void *ctx" to bpf_session_cookie() and bpf_session_is_return(), in preparation for the next patch. The two kfuncs are seldom used at present, so changing their prototypes should have little impact. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20260124062008.8657-4-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	bpf: use the least significant byte for the nr_args in trampoline	Menglong Dong	2	-19/+26
	For now, ((u64 *)ctx)[-1] is used to store the nr_args in the trampoline. However, 1 byte is enough to store such information. Therefore, we use only the least significant byte of ((u64 *)ctx)[-1] to store the nr_args, and reserve the rest for other usages. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260124062008.8657-3-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
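Packing the argument count into the low byte frees the remaining bits of the word for flags. The sketch below illustrates the idea with hypothetical names; in particular `MODEL_IS_RETURN_SHIFT` is an assumed bit position used only to show that a flag can share the word, not the kernel's actual value.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_ARGS_MASK    0xFFull
#define MODEL_IS_RETURN_SHIFT 63 /* HYPOTHETICAL bit position */

/* Build the metadata word: nr_args in the low byte, a flag higher up. */
static uint64_t model_pack_meta(uint8_t nr_args, bool is_return)
{
	return (uint64_t)nr_args |
	       ((uint64_t)is_return << MODEL_IS_RETURN_SHIFT);
}

/* Readers must mask: the word is no longer equal to nr_args itself. */
static uint8_t model_nr_args(uint64_t meta)
{
	return (uint8_t)(meta & MODEL_NR_ARGS_MASK);
}

static bool model_is_return(uint64_t meta)
{
	return ((meta >> MODEL_IS_RETURN_SHIFT) & 1) != 0;
}
```

The practical consequence is that any code which previously read the whole 64-bit word as the argument count has to apply the mask once other bits are in use.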
2026-01-25	bpf: add fsession support	Menglong Dong	10	-13/+97
	The fsession is similar to kprobe session. It allows attaching a single BPF program to both the entry and the exit of the target functions. Introduce the struct bpf_fsession_link, which allows adding the link to both the fentry and fexit progs_hlist of the trampoline. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Co-developed-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260124062008.8657-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	selftests/bpf: Fix xdp_pull_data failure with 64K page	Yonghong Song	1	-7/+9
	If the argument 'pull_len' of run_test() is 'PULL_MAX' or 'PULL_MAX | PULL_PLUS_ONE', the eventual pull_len size will be close to the page size. On arm64 systems with 64K pages, the pull_len size will be close to 64K. But the existing buffer is close to 9000, which is not enough to pull. For those failed run_tests(), make the buffer size pg_sz + (pg_sz / 2). This way, there will be enough buffer space to pull regardless of page size. Tested-by: Alan Maguire <alan.maguire@oracle.com> Cc: Amery Hung <ameryhung@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260123055128.495265-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-25	selftests/bpf: Fix task_local_data failure with 64K page	Yonghong Song	3	-4/+4
	On arm64 systems with 64K pages, the selftest task_local_data has the following failures: ... test_task_local_data_basic:PASS:tld_create_key 0 nsec test_task_local_data_basic:FAIL:tld_create_key unexpected tld_create_key: actual 0 != expected -28 ... test_task_local_data_basic_thread:PASS:run task_main 0 nsec test_task_local_data_basic_thread:FAIL:task_main retval unexpected error: 2 (errno 0) test_task_local_data_basic_thread:FAIL:tld_get_data value0 unexpected tld_get_data value0: actual 0 != expected 6268 ... #447/1 task_local_data/task_local_data_basic:FAIL ... #447/2 task_local_data/task_local_data_race:FAIL #447 task_local_data:FAIL When TLD_DYN_DATA_SIZE is the 64K page size, the field 'cnt' in struct tld_meta_u { _Atomic __u8 cnt; __u16 size; struct tld_metadata metadata[]; }; would overflow. For example, for a 4K page, 'cnt' will be 4096/64 = 64. But for a 64K page, 'cnt' will be 65536/64 = 1024, and a __u8 'cnt' cannot hold 1024. To accommodate 64K pages, '_Atomic __u8 cnt' becomes '_Atomic __u16 cnt'. A few other places are adjusted accordingly. In test_task_local_data.c, the value for TLD_DYN_DATA_SIZE is changed from 4096 to (getpagesize() - 8) since the maximum buffer size for TLD_DYN_DATA_SIZE is (getpagesize() - 8). Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Tested-by: Alan Maguire <alan.maguire@oracle.com> Cc: Amery Hung <ameryhung@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260123055122.494352-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
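The overflow arithmetic above can be reproduced in a few lines. This is a sketch only: the 64-byte entry size is inferred from the 4096/64 example in the message, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed entry size, inferred from the "4096/64 = 64" example. */
#define TLD_ENTRY_SIZE 64u

/* Number of metadata entries that fit in one page. */
static unsigned int tld_max_entries(unsigned int page_size)
{
	return page_size / TLD_ENTRY_SIZE;
}

/* Returns 1 if cnt survives a round trip through an 8-bit counter. */
static int fits_in_u8(unsigned int cnt)
{
	return (uint8_t)cnt == cnt;
}

/* Returns 1 if cnt survives a round trip through a 16-bit counter. */
static int fits_in_u16(unsigned int cnt)
{
	return (uint16_t)cnt == cnt;
}
```

For a 64K page the entry count is 1024, which truncates to 0 in 8 bits but is representable in 16 bits.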
2026-01-23	rqspinlock: Fix TAS fallback lock entry creation	Kumar Kartikeya Dwivedi	2	-4/+5
	The TAS fallback can be invoked directly when queued spin locks are disabled, and through the slow path when paravirt is enabled for queued spin locks. In the latter case, the res_spin_lock macro will attempt the fast path and already hold the entry when entering the slow path. This will lead to creation of extraneous entries that are not released, which may cause false positives for deadlock detection. Fix this by always preceding invocation of the TAS fallback in every case with the grabbing of the held lock entry, and add a comment to make note of this. Fixes: c9102a68c070 ("rqspinlock: Add a test-and-set fallback") Reported-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Tested-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260122115911.3668985-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-22	selftests/bpf: Fix resource leak in serial_test_wq on attach failure	Kery Qi	1	-2/+3
	When wq__attach() fails, serial_test_wq() returns early without calling wq__destroy(), leaking the skeleton resources allocated by wq__open_and_load(). This causes ASAN leak reports in selftests runs. Fix this by jumping to a common clean_up label that calls wq__destroy() on all exit paths after successful open_and_load. Note that the early return after wq__open_and_load() failure is correct and doesn't need fixing, since that function returns NULL on failure (after internally cleaning up any partial allocations). Fixes: 8290dba51910 ("selftests/bpf: wq: add bpf_wq_start() checks") Signed-off-by: Kery Qi <qikeyu2017@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20260121094114.1801-3-qikeyu2017@gmail.com
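The common-label cleanup pattern described above looks roughly like the following. The demo_skel type and the demo__*() functions are hypothetical stand-ins for a generated skeleton; they are not the wq skeleton API.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a generated BPF skeleton. */
struct demo_skel { int attached; };

static int live_skels; /* skeletons allocated but not yet destroyed */

static struct demo_skel *demo__open_and_load(void)
{
	struct demo_skel *skel = calloc(1, sizeof(*skel));

	if (skel)
		live_skels++;
	return skel;
}

/* 'fail' forces an attach error so the failure path can be exercised. */
static int demo__attach(struct demo_skel *skel, int fail)
{
	if (fail)
		return -1;
	skel->attached = 1;
	return 0;
}

static void demo__destroy(struct demo_skel *skel)
{
	if (!skel)
		return;
	free(skel);
	live_skels--;
}

static int run_demo_test(int fail_attach)
{
	struct demo_skel *skel;
	int err;

	skel = demo__open_and_load();
	if (!skel)
		return -1; /* nothing was allocated, so an early return is fine */

	err = demo__attach(skel, fail_attach);
	if (err)
		goto clean_up; /* previously an early return would leak skel */

	/* ... test body would go here ... */

clean_up:
	demo__destroy(skel);
	return err;
}
```

Every exit after a successful open_and_load goes through clean_up, so the skeleton is destroyed on both the success and the failure path.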
2026-01-21	scripts/gen-btf.sh: Use CONFIG_SHELL for execution	Ihor Solodrai	3	-6/+6
	According to the docs [1], kernel build scripts should be executed via CONFIG_SHELL, which is sh by default. Fixup gen-btf.sh to be runnable with sh, and use CONFIG_SHELL at every invocation site. See relevant discussion for context [2]. [1] https://docs.kernel.org/kbuild/makefiles.html#script-invocation [2] https://lore.kernel.org/bpf/CAADnVQ+dxmSNoJAGb6xV89ffUCKXe5CJXovXZt22nv5iYFV5mw@mail.gmail.com/ Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Tested-by: Gary Guo <gary@garyguo.net> Reported-by: Gary Guo <gary@garyguo.net> Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Fixes: 522397d05e7d ("resolve_btfids: Change in-place update with raw binary output") Link: https://lore.kernel.org/r/20260121181617.820300-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	Merge branch 'bpf-add-kfunc-bpf_strncasecmp'	Alexei Starovoitov	5	-5/+40
	Yuzuki Ishiyama says: ==================== bpf: Add kfunc bpf_strncasecmp() This patchset introduces bpf_strncasecmp to allow case-insensitive and limited-length string comparison. This is useful for parsing protocol headers like HTTP. --- Changes in v5: - Fixed the test function numbering Changes in v4: - Updated the loop variable to maintain style consistency Changes in v3: - Use ternary operator to maintain style consistency - Reverted unnecessary doc comment about XATTR_SIZE_MAX Changes in v2: - Compute max_sz upfront and remove len check from the loop body - Document that @len is limited by XATTR_SIZE_MAX ==================== Link: https://patch.msgid.link/20260121033328.1850010-1-ishiyama@hpc.is.uec.ac.jp Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	selftests/bpf: Test kfunc bpf_strncasecmp	Yuzuki Ishiyama	4	-0/+15
	Add testsuites for kfunc bpf_strncasecmp. Signed-off-by: Yuzuki Ishiyama <ishiyama@hpc.is.uec.ac.jp> Acked-by: Viktor Malik <vmalik@redhat.com> Link: https://lore.kernel.org/r/20260121033328.1850010-3-ishiyama@hpc.is.uec.ac.jp Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	bpf: add bpf_strncasecmp kfunc	Yuzuki Ishiyama	1	-5/+25
	The bpf_strncasecmp() function behaves like bpf_strcasecmp() except that it limits the comparison to a specific length. Signed-off-by: Yuzuki Ishiyama <ishiyama@hpc.is.uec.ac.jp> Acked-by: Viktor Malik <vmalik@redhat.com> Acked-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> Link: https://lore.kernel.org/r/20260121033328.1850010-2-ishiyama@hpc.is.uec.ac.jp Signed-off-by: Alexei Starovoitov <ast@kernel.org>
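As a userspace analogue of the semantics described here, the sketch below uses POSIX strncasecmp() to match a header name case-insensitively over a bounded length. It does not call the kfunc, whose exact signature is not shown in this log; header_has_name() is a hypothetical helper.

```c
#include <string.h>
#include <strings.h> /* POSIX strncasecmp() */

/* Returns 1 if 'line' starts with 'name', ignoring ASCII case. */
static int header_has_name(const char *line, const char *name)
{
	return strncasecmp(line, name, strlen(name)) == 0;
}
```

Bounding the comparison by the length of the name is what makes prefix matching possible without copying or terminating the header line first.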
2026-01-21	bpf: Revert "bpf: drop KF_ACQUIRE flag on BPF kfunc bpf_get_root_mem_cgroup()"	Matt Bobrowski	1	-7/+7
	This reverts commit e463b6de9da1 ("bpf: drop KF_ACQUIRE flag on BPF kfunc bpf_get_root_mem_cgroup()"). The original commit removed the KF_ACQUIRE flag from bpf_get_root_mem_cgroup() under the assumption that it resulted in simplified usage. This stemmed from the fact that bpf_get_root_mem_cgroup() inherently returns a reference to an object which technically isn't reference counted, therefore there is no strong requirement to call a matching bpf_put_mem_cgroup() on the returned reference. Although technically correct, as per the arguments in the thread [0], dropping the KF_ACQUIRE flag and losing reference tracking semantics negatively impacted the usability of bpf_get_root_mem_cgroup() in practice. [0] https://lore.kernel.org/bpf/878qdx6yut.fsf@linux.dev/ Link: https://lore.kernel.org/bpf/CAADnVQ+6d1Lj4dteAv8u62d7kj3Ze5io6bqM0xeQd-UPk9ZgJQ@mail.gmail.com/ Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20260121090001.240166-1-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	Merge branch 'bpf-support-bpf_get_func_arg-for-bpf_trace_raw_tp'	Alexei Starovoitov	6	-6/+87
	Menglong Dong says: ==================== bpf: support bpf_get_func_arg() for BPF_TRACE_RAW_TP Support bpf_get_func_arg() for BPF_TRACE_RAW_TP by getting the function argument count from "prog->aux->attach_func_proto" during verifier inline. Changes v5 -> v4: * some format adjustment in the 1st patch * v4: https://lore.kernel.org/bpf/20260120073046.324342-1-dongml2@chinatelecom.cn/ Changes v4 -> v3: * fix the error of using bpf_get_func_arg() for BPF_TRACE_ITER * v3: https://lore.kernel.org/bpf/20260119023732.130642-1-dongml2@chinatelecom.cn/ Changes v3 -> v2: * remove unnecessary NULL checking for prog->aux->attach_func_proto * v2: https://lore.kernel.org/bpf/20260116071739.121182-1-dongml2@chinatelecom.cn/ Changes v2 -> v1: * for nr_args, skip first 'void *__data' argument in btf_trace_##name typedef check the result4 and result5 in the selftests * v1: https://lore.kernel.org/bpf/20260116035024.98214-1-dongml2@chinatelecom.cn/ ==================== Link: https://patch.msgid.link/20260121044348.113201-1-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	selftests/bpf: test bpf_get_func_arg() for tp_btf	Menglong Dong	4	-0/+61
	Test bpf_get_func_arg() and bpf_get_func_arg_cnt() for tp_btf. The code is mostly copied from test1 and test2. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20260121044348.113201-3-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	bpf: support bpf_get_func_arg() for BPF_TRACE_RAW_TP	Menglong Dong	2	-6/+26
	For now, bpf_get_func_arg() and bpf_get_func_arg_cnt() are not supported for BPF_TRACE_RAW_TP, which makes it inconvenient to get the arguments of a tracepoint, especially when the position of the arguments in a tracepoint can change. The target tracepoint BTF type id is specified at load time, therefore we can get the function argument count from the function prototype instead of the stack. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20260121044348.113201-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	Merge branch 'bpf-x86-inline-bpf_get_current_task-for-x86_64'	Alexei Starovoitov	3	-0/+44
	Menglong Dong says: ==================== bpf, x86: inline bpf_get_current_task() for x86_64 Inline bpf_get_current_task() and bpf_get_current_task_btf() for x86_64 to obtain better performance, and add the testcase for it. Changes since v5: * remove unnecessary 'ifdef' and __description in the selftests * v5: https://lore.kernel.org/bpf/20260119070246.249499-1-dongml2@chinatelecom.cn/ Changes since v4: * don't support the !CONFIG_SMP case * v4: https://lore.kernel.org/bpf/20260112104529.224645-1-dongml2@chinatelecom.cn/ Changes since v3: * handle the !CONFIG_SMP case * ignore the !CONFIG_SMP case in the testcase, as we enable CONFIG_SMP for x86_64 in the selftests Changes since v2: * implement it in the verifier with BPF_MOV64_PERCPU_REG() instead of in x86_64 JIT (Alexei). Changes since v1: * add the testcase * remove the usage of const_current_task ==================== Link: https://patch.msgid.link/20260120070555.233486-1-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	selftests/bpf: test the jited inline of bpf_get_current_task	Menglong Dong	2	-0/+22
	Add the testcase for the jited inline of bpf_get_current_task(). Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260120070555.233486-3-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	bpf, x86: inline bpf_get_current_task() for x86_64	Menglong Dong	1	-0/+22
	Inline bpf_get_current_task() and bpf_get_current_task_btf() for x86_64 to obtain better performance. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260120070555.233486-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	bpf: Simplify bpf_timer_cancel()	Mykyta Yatsenko	1	-16/+11
	Remove the lock from the bpf_timer_cancel() helper. The lock does not protect from concurrent modification of the bpf_async_cb data fields as those are modified in the callback without locking. Use guard(rcu)() instead of a pair of explicit lock()/unlock() calls. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260120-timer_nolock-v6-4-670ffdd787b4@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	bpf: Introduce lock-free bpf_async_update_prog_callback()	Mykyta Yatsenko	1	-30/+37
	Introduce bpf_async_update_prog_callback(): lock-free update of cb->prog and cb->callback_fn. This function allows updating the prog and callback_fn fields of struct bpf_async_cb without holding a lock. For now it is used under the lock from __bpf_async_set_callback(); in the next patches that lock will be removed. Lock-free algorithm: * Acquire a guard reference on prog to prevent it from being freed during the retry loop. * Retry loop: 1. Each iteration acquires a new prog reference and stores it in cb->prog via xchg. The previous prog is released. 2. The loop condition checks if both cb->prog and cb->callback_fn match what we just wrote. If either differs, a concurrent writer overwrote our value, and we must retry. 3. When we retry, our previously-stored prog was already released by the concurrent writer or will be released by us after overwriting. * Release guard reference. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260120-timer_nolock-v6-3-670ffdd787b4@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
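The retry loop described above can be modelled in userspace with C11 atomics. This is a simplified sketch under stated assumptions, not the kernel implementation: struct prog, struct async_cb, prog_get()/prog_put() and the plain atomic store of callback_fn are all hypothetical, and real code must also handle a prog whose refcount has already dropped to zero.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted program object. */
struct prog { atomic_int refcnt; };

typedef int (*callback_fn_t)(void);

/* Hypothetical stand-in for the two fields being updated. */
struct async_cb {
	_Atomic(struct prog *) prog;
	_Atomic(callback_fn_t) callback_fn;
};

static int demo_cb(void) { return 0; }

static void prog_init(struct prog *p) { atomic_init(&p->refcnt, 1); }

static void async_cb_init(struct async_cb *cb)
{
	atomic_init(&cb->prog, (struct prog *)NULL);
	atomic_init(&cb->callback_fn, (callback_fn_t)0);
}

static void prog_get(struct prog *p) { if (p) atomic_fetch_add(&p->refcnt, 1); }
static void prog_put(struct prog *p) { if (p) atomic_fetch_sub(&p->refcnt, 1); }

static void async_update_prog_callback(struct async_cb *cb, struct prog *prog,
				       callback_fn_t fn)
{
	struct prog *prev;

	prog_get(prog); /* guard reference held across the retry loop */

	do {
		prog_get(prog); /* reference that cb->prog will own */
		prev = atomic_exchange(&cb->prog, prog);
		atomic_store(&cb->callback_fn, fn);
		prog_put(prev); /* release whatever was stored before */
		/* Retry if a concurrent writer replaced either field. */
	} while (atomic_load(&cb->prog) != prog ||
		 atomic_load(&cb->callback_fn) != fn);

	prog_put(prog); /* drop the guard reference */
}
```

In this model each stored pointer carries exactly one reference, so a writer that overwrites a value is responsible for releasing it, which matches step 3 of the description.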
2026-01-21	bpf: Remove unnecessary arguments from bpf_async_set_callback()	Mykyta Yatsenko	1	-5/+4
	Remove unused arguments from __bpf_async_set_callback(). Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260120-timer_nolock-v6-2-670ffdd787b4@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	bpf: Factor out timer deletion helper	Mykyta Yatsenko	1	-11/+18
	Move the timer deletion logic into a dedicated bpf_timer_delete() helper so it can be reused by later patches. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260120-timer_nolock-v6-1-670ffdd787b4@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	selftests/bpf: update verifier test for default trusted pointer semantics	Matt Bobrowski	5	-34/+52
	Replace the verifier test for default trusted pointer semantics, which previously relied on BPF kfunc bpf_get_root_mem_cgroup(), with a new test utilizing dedicated BPF kfuncs defined within the bpf_testmod. bpf_get_root_mem_cgroup() was modified such that it again relies on KF_ACQUIRE semantics, therefore no longer making it a suitable candidate to test BPF verifier default trusted pointer semantics against. Link: https://lore.kernel.org/bpf/20260113083949.2502978-2-mattbobrowski@google.com Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20260120091630.3420452-1-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	Merge branch 'bpf-fix-memory-access-flags-in-helper-prototypes'	Alexei Starovoitov	5	-15/+32
	Zesen Liu says: ==================== bpf: Fix memory access flags in helper prototypes This series adds missing memory access flags (MEM_RDONLY or MEM_WRITE) to several bpf helper function prototypes that use ARG_PTR_TO_MEM but lack the correct flag. It also adds a new check in verifier to ensure the flag is specified. Missing memory access flags in helper prototypes can lead to critical correctness issues when the verifier tries to perform code optimization. After commit 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking"), the verifier relies on the memory access flags, rather than treating all arguments in helper functions as potentially modifying the pointed-to memory. Using ARG_PTR_TO_MEM alone without flags does not make sense because: - If the helper does not change the argument, missing MEM_RDONLY causes the verifier to incorrectly reject a read-only buffer. - If the helper does change the argument, missing MEM_WRITE causes the verifier to incorrectly assume the memory is unchanged, leading to errors in code optimization. We have already seen several reports regarding this: - commit ac44dcc788b9 ("bpf: Fix verifier assumptions of bpf_d_path's output buffer") adds MEM_WRITE to bpf_d_path; - commit 2eb7648558a7 ("bpf: Specify access type of bpf_sysctl_get_name args") adds MEM_WRITE to bpf_sysctl_get_name. This series looks through all prototypes in the kernel and completes the flags. It also adds check_mem_arg_rw_flag_ok() and wires it into check_func_proto() to statically restrict ARG_PTR_TO_MEM from appearing without memory access flags. Changelog ========= v3: - Rebased to bpf-next to address check_func_proto() signature changes, as suggested by Eduard Zingerman. v2: - Add missing MEM_RDONLY flags to protos with ARG_PTR_TO_FIXED_SIZE_MEM. ==================== Link: https://patch.msgid.link/20260120-helper_proto-v3-0-27b0180b4e77@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21	bpf: Require ARG_PTR_TO_MEM with memory flag	Zesen Liu	1	-0/+17
	Add check to ensure that ARG_PTR_TO_MEM is used with either MEM_WRITE or MEM_RDONLY. Using ARG_PTR_TO_MEM alone without flags does not make sense because: - If the helper does not change the argument, missing MEM_RDONLY causes the verifier to incorrectly reject a read-only buffer. - If the helper does change the argument, missing MEM_WRITE causes the verifier to incorrectly assume the memory is unchanged, leading to errors in code optimization. Co-developed-by: Shuran Liu <electronlsr@gmail.com> Signed-off-by: Shuran Liu <electronlsr@gmail.com> Co-developed-by: Peili Gao <gplhust955@gmail.com> Signed-off-by: Peili Gao <gplhust955@gmail.com> Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Zesen Liu <ftyghome@gmail.com> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260120-helper_proto-v3-2-27b0180b4e77@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
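A rough sketch of such a static check follows. The numeric encodings, the DEMO_MAX_ARGS limit and the function name demo_mem_arg_rw_flag_ok() are hypothetical; the kernel's real constants and prototype layout differ.

```c
#include <stdbool.h>

/* Hypothetical encodings: base type in the low byte, flags above it. */
enum { ARG_ANYTHING = 1, ARG_PTR_TO_MEM = 2, ARG_BASE_MASK = 0xff };
enum { MEM_RDONLY = 1 << 8, MEM_WRITE = 1 << 9 };

#define DEMO_MAX_ARGS 5

/* Reject any ARG_PTR_TO_MEM argument that carries neither access flag. */
static bool demo_mem_arg_rw_flag_ok(const unsigned int arg_type[DEMO_MAX_ARGS])
{
	for (int i = 0; i < DEMO_MAX_ARGS; i++) {
		unsigned int t = arg_type[i];

		if ((t & ARG_BASE_MASK) != ARG_PTR_TO_MEM)
			continue; /* not a memory pointer, nothing to check */
		if (!(t & (MEM_RDONLY | MEM_WRITE)))
			return false; /* access type left unspecified */
	}
	return true;
}
```

Checking this once per prototype means a missing flag is caught when the helper is validated rather than surfacing later as a verifier misjudgment.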