summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2 daysMerge tag 'vfs-7.1-rc1.fixes' of ↵HEADmasterLinus Torvalds7-109/+125
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - eventpoll: fix ep_remove() UAF and follow-up cleanup - fs: aio: set VMA_DONTCOPY_BIT in mmap to fix NULL-pointer-dereference error - writeback: Fix use after free in inode_switch_wbs_work_fn() - fuse: reject oversized dirents in page cache - fs: aio: reject partial mremap to avoid Null-pointer-dereference error - nstree: fix func. parameter kernel-doc warnings - fs: Handle multiply claimed blocks more gracefully with mmb * tag 'vfs-7.1-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: eventpoll: drop vestigial epi->dying flag eventpoll: drop dead bool return from ep_remove_epi() eventpoll: refresh eventpoll_release() fast-path comment eventpoll: move f_lock acquisition into ep_remove_file() eventpoll: fix ep_remove struct eventpoll / struct file UAF eventpoll: move epi_fget() up eventpoll: rename ep_remove_safe() back to ep_remove() eventpoll: drop vestigial __ prefix from ep_remove_{file,epi}() eventpoll: kill __ep_remove() eventpoll: split __ep_remove() eventpoll: use hlist_is_singular_node() in __ep_remove() fs: Handle multiply claimed blocks more gracefully with mmb nstree: fix func. parameter kernel-doc warnings fs: aio: reject partial mremap to avoid Null-pointer-dereference error fuse: reject oversized dirents in page cache writeback: Fix use after free in inode_switch_wbs_work_fn() fs: aio: set VMA_DONTCOPY_BIT in mmap to fix NULL-pointer-dereference error
2 daysMerge tag 'v7.1-rc-part2-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds40-215/+165
Pull more smb server updates from Steve French: - move fs/smb/common/smbdirect to fs/smb/smbdirect - change signature calc to use AES-CMAC library, simpler and faster - invalid signature fix - multichannel fix - open create options fix - fix durable handle leak - cap maximum lock count to avoid potential denial of service - four connection fixes: connection free and session destroy IDA fixes, refcount fix, connection leak fix, max_connections off by one fix - IPC validation fix - fix out of bounds write in getting xattrs - fix use after free in durable handle reconnect - three ACL fixes: fix potential ACL overflow, harden num_aces check, and fix minimum ACE size check * tag 'v7.1-rc-part2-ksmbd-fixes' of git://git.samba.org/ksmbd: smb: smbdirect: move fs/smb/common/smbdirect/ to fs/smb/smbdirect/ smb: server: stop sending fake security descriptors ksmbd: scope conn->binding slowpath to bound sessions only ksmbd: fix CreateOptions sanitization clobbering the whole field ksmbd: fix durable fd leak on ClientGUID mismatch in durable v2 open ksmbd: fix O(N^2) DoS in smb2_lock via unbounded LockCount ksmbd: destroy async_ida in ksmbd_conn_free() ksmbd: destroy tree_conn_ida in ksmbd_session_destroy() ksmbd: Use AES-CMAC library for SMB3 signature calculation ksmbd: reset rcount per connection in ksmbd_conn_wait_idle_sess_id() ksmbd: fix out-of-bounds write in smb2_get_ea() EA alignment ksmbd: use check_add_overflow() to prevent u16 DACL size overflow ksmbd: fix use-after-free in smb2_open during durable reconnect ksmbd: validate num_aces and harden ACE walk in smb_inherit_dacl() smb: server: fix max_connections off-by-one in tcp accept path ksmbd: require minimum ACE size in smb_check_perm_dacl() ksmbd: validate response sizes in ipc_validate_msg() smb: server: fix active_num_conn leak on transport allocation failure
2 daysMerge tag 'v7.1-rc1-part3-smb3-client-fixes' of ↵Linus Torvalds24-392/+438
git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - Four bug fixes: OOB read in ioctl query info, 3 ACL fixes - SMB1 Unix extensions mount fix - Four crypto improvements: move to AES-CMAC library, simpler and faster - Remove drop_dir_cache to avoid potential crash, and move to /procfs - Seven SMB3.1.1 compression fixes * tag 'v7.1-rc1-part3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: Drop 'allocate_crypto' arg from smb*_calc_signature() smb: client: Make generate_key() return void smb: client: Remove obsolete cmac(aes) allocation smb: client: Use AES-CMAC library for SMB3 signature calculation smb: common: add SMB3_COMPRESS_MAX_ALGS smb: client: compress: add code docs to lz77.c smb: client: compress: LZ77 optimizations smb: client: compress: increase LZ77_MATCH_MAX_DIST smb: client: compress: fix counting in LZ77 match finding smb: client: compress: fix buffer overrun in lz77_compress() smb: client: scope end_of_dacl to CIFS_DEBUG2 use in parse_dacl smb: client: fix (remove) drop_dir_cache module parameter smb: client: require a full NFS mode SID before reading mode bits smb: client: validate the whole DACL before rewriting it in cifsacl smb: client: fix OOB read in smb2_ioctl_query_info QUERY_INFO path cifs: update internal module version number smb: client: compress: fix bad encoding on last LZ77 flag smb: client: fix dir separator in SMB1 UNIX mounts
2 daysMerge tag 'net-7.1-rc1' of ↵Linus Torvalds172-1414/+3534
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Netfilter. Steady stream of fixes. Last two weeks feel comparable to the two weeks before the merge window. Lots of AI-aided bug discovery. A newer big source is Sashiko/Gemini (Roman Gushchin's system), which points out issues in existing code during patch review (maybe 25% of fixes here likely originating from Sashiko). Nice thing is these are often fixed by the respective maintainers, not drive-bys. Current release - new code bugs: - kconfig: MDIO_PIC64HPSC should depend on ARCH_MICROCHIP Previous releases - regressions: - add async ndo_set_rx_mode and switch drivers which we promised to be called under the per-netdev mutex to it - dsa: remove duplicate netdev_lock_ops() for conduit ethtool ops - hv_sock: report EOF instead of -EIO for FIN - vsock/virtio: fix MSG_PEEK calculation on bytes to copy Previous releases - always broken: - ipv6: fix possible UAF in icmpv6_rcv() - icmp: validate reply type before using icmp_pointers - af_unix: drop all SCM attributes for SOCKMAP - netfilter: fix a number of bugs in the osf (OS fingerprinting) - eth: intel: fix timestamp interrupt configuration for E825C Misc: - bunch of data-race annotations" * tag 'net-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (148 commits) rxrpc: Fix error handling in rxgk_extract_token() rxrpc: Fix re-decryption of RESPONSE packets rxrpc: Fix rxrpc_input_call_event() to only unshare DATA packets rxrpc: Fix missing validation of ticket length in non-XDR key preparsing rxgk: Fix potential integer overflow in length check rxrpc: Fix conn-level packet handling to unshare RESPONSE packets rxrpc: Fix potential UAF after skb_unshare() failure rxrpc: Fix rxkad crypto unalignment handling rxrpc: Fix memory leaks in rxkad_verify_response() net: rds: fix MR cleanup on copy error m68k: mvme147: Make me the maintainer net: txgbe: fix firmware version check selftests/bpf: check epoll readiness during reuseport migration tcp: call sk_data_ready() after listener migration vhost_net: fix sleeping with preempt-disabled in vhost_net_busy_poll() ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim tipc: fix double-free in tipc_buf_append() llc: Return -EINPROGRESS from llc_ui_connect() ipv4: icmp: validate reply type before using icmp_pointers selftests/net: packetdrill: cover RFC 5961 5.2 challenge ACK on both edges ...
2 daysMerge tag 'i2c-for-7.1-rc1-part2' of ↵Linus Torvalds6-33/+76
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull more i2c updates from Wolfram Sang: - cx92755: convert I2C bindings to DT schema - mediatek: add optional bus power management during transfers - pxa: handle early bus busy condition - MAINTAINERS: update I2C RUST entry * tag 'i2c-for-7.1-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: MAINTAINERS: add Rust I2C tree and update Igor Korotin's email i2c: mediatek: add bus regulator control for power saving dt-bindings: i2c: cnxt,cx92755-i2c: Convert to DT schema i2c: pxa: handle 'Early Bus Busy' condition on Armada 3700
2 daysMerge tag 'xtensa-20260422' of https://github.com/jcmvbkbc/linux-xtensaLinus Torvalds5-43/+13
Pull Xtensa updates from Max Filippov: - use register_sys_off_handler(SYS_OFF_MODE_RESTART) instead of the deprecated register_restart_handler() - drop custom ucontext.h and reuse asm-generic ucontext.h * tag 'xtensa-20260422' of https://github.com/jcmvbkbc/linux-xtensa: xtensa: uapi: Reuse asm-generic ucontext.h xtensa: xtfpga: Use register_sys_off_handler(SYS_OFF_MODE_RESTART) xtensa: xt2000: Use register_sys_off_handler(SYS_OFF_MODE_RESTART) xtensa: ISS: Use register_sys_off_handler(SYS_OFF_MODE_RESTART)
2 daysMerge patch series "eventpoll: fix ep_remove() UAF and follow-up cleanup"Christian Brauner2-86/+88
Christian Brauner <brauner@kernel.org> says: ep_remove() (via __ep_remove_file()) cleared file->f_ep under file->f_lock but then kept using @file in the same critical section: is_file_epoll(), hlist_del_rcu() through the head, spin_unlock. A concurrent __fput() on the watched eventpoll caught the transient NULL in eventpoll_release()'s lockless fast path, skipped eventpoll_release_file() entirely, and ran to ep_eventpoll_release() -> ep_clear_and_put() -> ep_free(). That kfree()s the struct eventpoll whose embedded ->refs hlist_head is exactly where epi->fllink.pprev points and the subsequent hlist_del_rcu()'s "*pprev = next" scribbles into freed kmalloc-192 memory, which is the slab-use-after-free KASAN caught. struct file is SLAB_TYPESAFE_BY_RCU on top of that so the same window also lets the slot recycle while ep_remove() is still nominally inside file->f_lock. The upshot is an attacker-influencable kmem_cache_free() against the wrong slab cache. The comment on eventpoll_release()'s fast path - "False positives simply cannot happen because the file in on the way to be removed and nobody ( but eventpoll ) has still a reference to this file" - was itself the wrong invariant this race exploits. The fix pins @file via epi_fget() at the top of ep_remove() and gates the f_ep clear / hlist_del_rcu() on the pin succeeding. With the pin held __fput() cannot start which transitively keeps the watched struct eventpoll alive across the critical section and also prevents the struct file slot from recycling. Both UAFs are closed. If the pin fails __fput() is already in flight on @file. Because we bail before clearing f_ep that path takes eventpoll_release()'s slow path into eventpoll_release_file() which blocks on ep->mtx until ep_clear_and_put() drops it and then cleans up the orphaned epi. The bailed epi's share of ep->refcount stays intact so ep_clear_and_put()'s trailing ep_refcount_dec_and_test() cannot free the eventpoll out from under eventpoll_release_file(). With epi_fget() now gating every ep_remove() call the epi->dying flag becomes vestigial. epi->dying == true always coincides with file_ref_get() == false because __fput() is reachable only once the refcount hits zero and the refcount is monotone there. The last patch drops the flag and leaves a single coordination mechanism instead of two. * patches from https://patch.msgid.link/20260423-work-epoll-uaf-v1-0-2470f9eec0f5@kernel.org: eventpoll: drop vestigial epi->dying flag eventpoll: drop dead bool return from __ep_remove_epi() eventpoll: refresh eventpoll_release() fast-path comment eventpoll: move f_lock acquisition into __ep_remove_file() eventpoll: fix ep_remove struct eventpoll / struct file UAF eventpoll: move epi_fget() up eventpoll: rename ep_remove_safe() back to ep_remove() eventpoll: kill __ep_remove() eventpoll: split __ep_remove() eventpoll: use hlist_is_singular_node() in __ep_remove() Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-0-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2 dayseventpoll: drop vestigial epi->dying flagChristian Brauner1-20/+7
With ep_remove() now pinning @file via epi_fget() across the f_ep clear and hlist_del_rcu(), the dying flag no longer orchestrates anything: it was set in eventpoll_release_file() (which only runs from __fput(), i.e. after @file's refcount has reached zero) and read in __ep_remove() / ep_remove() as a cheap bail before attempting the same synchronization epi_fget() now provides unconditionally. The implication is simple: epi->dying == true always coincides with file_ref_get(&file->f_ref) == false, because __fput() is reachable only once the refcount hits zero and the refcount is monotone in that state. The READ_ONCE(epi->dying) in ep_remove() therefore selects exactly the same callers that epi_fget() would reject, just one atomic cheaper. That's not worth a struct field, a second coordination mechanism, and the comments on both. Refresh the eventpoll_release_file() comment to describe what actually makes the path race-free now (the pin in ep_remove()). No functional change: the correctness argument is unchanged, only the mechanism is now a single one instead of two. Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-10-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2 dayseventpoll: drop dead bool return from ep_remove_epi()Christian Brauner1-8/+5
ep_remove_epi() always returns true -- the "can be disposed" answer was meaningful back when the dying-check lived inside the pre-split __ep_remove(), but after that check moved to ep_remove() the return value is just noise. Both callers gate on it unconditionally: if (ep_remove_epi(ep, epi)) WARN_ON_ONCE(ep_refcount_dec_and_test(ep)); dispose = ep_remove_epi(ep, epi); ... if (dispose && ep_refcount_dec_and_test(ep)) ep_free(ep); Make ep_remove_epi() return void, drop the dispose local in eventpoll_release_file(), and the useless conditionals at both callers. No functional change. Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-9-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2 dayseventpoll: refresh eventpoll_release() fast-path commentChristian Brauner1-6/+10
The old comment justified the lockless READ_ONCE(file->f_ep) check with "False positives simply cannot happen because the file is on the way to be removed and nobody ( but eventpoll ) has still a reference to this file." That reasoning was the root of the UAF fixed in "eventpoll: fix ep_remove struct eventpoll / struct file UAF": __ep_remove() could clear f_ep while another close raced past the fast path and freed the watched eventpoll / recycled the struct file slot. With ep_remove() now pinning @file via epi_fget() across the f_ep clear and hlist_del_rcu(), the invariant is re-established for the right reason: anyone who might clear f_ep holds @file alive for the duration, so a NULL observation really does mean no concurrent eventpoll path has work left on this file. Refresh the comment accordingly so the next reader doesn't inherit the broken model. Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-8-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2 dayseventpoll: move f_lock acquisition into ep_remove_file()Christian Brauner1-6/+4
Let the helper own its critical section end-to-end: take &file->f_lock at the top, read file->f_ep inside the lock, release on exit. Callers (ep_remove() and eventpoll_release_file()) no longer need to wrap the call, and the function-comment lock-handoff contract is gone. Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-7-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2 dayseventpoll: fix ep_remove struct eventpoll / struct file UAFChristian Brauner1-6/+10
ep_remove() (via ep_remove_file()) cleared file->f_ep under file->f_lock but then kept using @file inside the critical section (is_file_epoll(), hlist_del_rcu() through the head, spin_unlock). A concurrent __fput() taking the eventpoll_release() fastpath in that window observed the transient NULL, skipped eventpoll_release_file() and ran to f_op->release / file_free(). For the epoll-watches-epoll case, f_op->release is ep_eventpoll_release() -> ep_clear_and_put() -> ep_free(), which kfree()s the watched struct eventpoll. Its embedded ->refs hlist_head is exactly where epi->fllink.pprev points, so the subsequent hlist_del_rcu()'s "*pprev = next" scribbles into freed kmalloc-192 memory. In addition, struct file is SLAB_TYPESAFE_BY_RCU, so the slot backing @file could be recycled by alloc_empty_file() -- reinitializing f_lock and f_ep -- while ep_remove() is still nominally inside that lock. The upshot is an attacker-controllable kmem_cache_free() against the wrong slab cache. Pin @file via epi_fget() at the top of ep_remove() and gate the critical section on the pin succeeding. With the pin held @file cannot reach refcount zero, which holds __fput() off and transitively keeps the watched struct eventpoll alive across the hlist_del_rcu() and the f_lock use, closing both UAFs. If the pin fails @file has already reached refcount zero and its __fput() is in flight. Because we bailed before clearing f_ep, that path takes the eventpoll_release() slow path into eventpoll_release_file() and blocks on ep->mtx until the waiter side's ep_clear_and_put() drops it. The bailed epi's share of ep->refcount stays intact, so the trailing ep_refcount_dec_and_test() in ep_clear_and_put() cannot free the eventpoll out from under eventpoll_release_file(); the orphaned epi is then cleaned up there. A successful pin also proves we are not racing eventpoll_release_file() on this epi, so drop the now-redundant re-check of epi->dying under f_lock. The cheap lockless READ_ONCE(epi->dying) fast-path bailout stays. Fixes: 58c9b016e128 ("epoll: use refcount to reduce ep_mutex contention") Reported-by: Jaeyoung Chung <jjy600901@snu.ac.kr> Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-6-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2 dayseventpoll: move epi_fget() upChristian Brauner1-28/+28
We'll need it when removing files so move it up. No functional change. Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-5-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2 dayseventpoll: rename ep_remove_safe() back to ep_remove()Christian Brauner1-8/+8
The current name is just confusing and doesn't clarify anything. Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-4-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2 dayseventpoll: drop vestigial __ prefix from ep_remove_{file,epi}()Christian Brauner1-6/+6
With __ep_remove() gone, the double-underscore on __ep_remove_file() and __ep_remove_epi() no longer contrasts with a __-less parent and just reads as noise. Rename both to ep_remove_file() and ep_remove_epi(). No functional change. Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2 dayseventpoll: kill __ep_remove()Christian Brauner1-37/+30
Remove the boolean conditional in __ep_remove() and restructure the code so the check for racing with eventpoll_release_file() are only done in the ep_remove_safe() path where they belong. Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-3-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2 dayseventpoll: split __ep_remove()Christian Brauner1-4/+23
Split __ep_remove() to delineate file removal from epoll item removal. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-2-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2 dayseventpoll: use hlist_is_singular_node() in __ep_remove()Christian Brauner1-1/+1
Replace the open-coded "epi is the only entry in file->f_ep" check with hlist_is_singular_node(). Same semantics, and the helper avoids the head-cacheline access in the common false case. Link: https://patch.msgid.link/20260423-work-epoll-uaf-v1-1-2470f9eec0f5@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2 daysfs: Handle multiply claimed blocks more gracefully with mmbJan Kara1-1/+8
When a metadata block is referenced by multiple inodes and tracked by metadata bh infrastructure (which is forbidden and generally indicates filesystem corruption), it can happen that mmb_mark_buffer_dirty() is called for two different mmb structures in parallel. This can lead to a corruption of mmb linked list. Handle that situation gracefully (at least from mmb POV) by serializing on setting bh->b_mmb. Reported-by: Ruikai Peng <ruikai@pwno.io> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260423090311.10955-2-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2 daysnstree: fix func. parameter kernel-doc warningsRandy Dunlap1-3/+3
Use the correct parameter name ("__ns") for function parameter kernel-doc to avoid 3 warnings: Warning: include/linux/nstree.h:68 function parameter '__ns' not described in 'ns_tree_add_raw' Warning: include/linux/nstree.h:77 function parameter '__ns' not described in 'ns_tree_add' Warning: include/linux/nstree.h:88 function parameter '__ns' not described in 'ns_tree_remove' Fixes: 885fc8ac0a4d ("nstree: make iterator generic") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260416215429.948898-1-rdunlap@infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2 daysfs: aio: reject partial mremap to avoid Null-pointer-dereference errorZizhi Wo1-1/+2
[BUG] Recently, our internal syzkaller testing uncovered a null pointer dereference issue: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... [ 51.111664] filemap_read_folio+0x25/0xe0 [ 51.112410] filemap_fault+0xad7/0x1250 [ 51.113112] __do_fault+0x4b/0x460 [ 51.113699] do_pte_missing+0x5bc/0x1db0 [ 51.114250] ? __pte_offset_map+0x23/0x170 [ 51.114822] __handle_mm_fault+0x9f8/0x1680 ... Crash analysis showed the file involved was an AIO ring file. The phenomenon triggered is the same as the issue described in [1]. [CAUSE] Consider the following scenario: userspace sets up an AIO context via io_setup(), which creates a VMA covering the entire ring buffer. Then userspace calls mremap() with the AIO ring address as the source, a smaller old_len (less than the full ring size), MREMAP_MAYMOVE set, and without MREMAP_DONTUNMAP. The kernel will relocate the requested portion to a new destination address. During this move, __split_vma() splits the original AIO ring VMA. The requested portion is unmapped from the source and re-established at the destination, while the remainder stays at the original source address as an orphan VMA. The aio_ring_mremap() callback fires on the new destination VMA, updating ctx->mmap_base to the destination address. But the callback is unaware that only a partial region was moved and that an orphan VMA still exists at the source: source(AIO): +-------------------+---------------------+ | moved to dest | orphan VMA (AIO) | +-------------------+---------------------+ A A+partial_len A+ctx->mmap_size dest: +-------------------+ | moved VMA (AIO) | +-------------------+ B B+partial_len Later, io_destroy() calls vm_munmap(ctx->mmap_base, ctx->mmap_size), which unmaps the destination. This not only fails to unmap the orphan VMA at the source, but also overshoots the destination VMA and may unmap unrelated mappings adjacent to it! After put_aio_ring_file() calls truncate_setsize() to remove all pages from the pagecache, any subsequent access to the orphan VMA triggers filemap_fault(), which calls a_ops->read_folio(). Since aio does not implement read_folio, this results in a NULL pointer dereference. [FIX] Note that expanding mremap (new_len > old_len) is already rejected because AIO ring VMAs are created with VM_DONTEXPAND. The only problematic case is a partial move where "old_len == new_len" but both are smaller than the full ring size. Fix this by checking in aio_ring_mremap() that the new VMA covers the entire ring. This ensures the AIO ring is always moved as a whole, preventing orphan VMAs and the subsequent crash. [1]: https://lore.kernel.org/all/20260413010814.548568-1-wozizhi@huawei.com/ Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com> Link: https://patch.msgid.link/20260418060634.3713620-1-wozizhi@huaweicloud.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2 daysfuse: reject oversized dirents in page cacheSamuel Page1-0/+4
fuse_add_dirent_to_cache() computes a serialized dirent size from the server-controlled namelen field and copies the dirent into a single page-cache page. The existing logic only checks whether the dirent fits in the remaining space of the current page and advances to a fresh page if not. It never checks whether the dirent itself exceeds PAGE_SIZE. As a result, a malicious FUSE server can return a dirent with namelen=4095, producing a serialized record size of 4120 bytes. On 4 KiB page systems this causes memcpy() to overflow the cache page by 24 bytes into the following kernel page. Reject dirents that cannot fit in a single page before copying them into the readdir cache. Fixes: 69e34551152a ("fuse: allow caching readdir") Cc: stable@vger.kernel.org # v6.16+ Assisted-by: Bynario AI Signed-off-by: Samuel Page <sam@bynar.io> Reported-by: Qi Tang <tpluszz77@gmail.com> Reported-by: Zijun Hu <nightu@northwestern.edu> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260420090139.662772-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2 dayswriteback: Fix use after free in inode_switch_wbs_work_fn()Jan Kara1-17/+19
inode_switch_wbs_work_fn() has a loop like: wb_get(new_wb); while (1) { list = llist_del_all(&new_wb->switch_wbs_ctxs); /* Nothing to do? */ if (!list) break; ... process the items ... } Now adding of items to the list looks like: wb_queue_isw() if (llist_add(&isw->list, &wb->switch_wbs_ctxs)) queue_work(isw_wq, &wb->switch_work); Because inode_switch_wbs_work_fn() loops when processing isw items, it can happen that wb->switch_work is pending while wb->switch_wbs_ctxs is empty. This is a problem because in that case wb can get freed (no isw items -> no wb reference) while the work is still pending causing use-after-free issues. We cannot just fix this by cancelling work when freeing wb because that could still trigger problematic 0 -> 1 transitions on wb refcount due to wb_get() in inode_switch_wbs_work_fn(). It could be all handled with more careful code but that seems unnecessarily complex so let's avoid that until it is proven that the looping actually brings practical benefit. Just remove the loop from inode_switch_wbs_work_fn() instead. That way when wb_queue_isw() queues work, we are guaranteed we have added the first item to wb->switch_wbs_ctxs and nobody is going to remove it (and drop the wb reference it holds) until the queued work runs. Fixes: e1b849cfa6b6 ("writeback: Avoid contention on wb->list_lock when switching inodes") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260413093618.17244-2-jack@suse.cz Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2 daysfs: aio: set VMA_DONTCOPY_BIT in mmap to fix NULL-pointer-dereference errorZizhi Wo1-1/+1
[BUG] Recently, our internal syzkaller testing uncovered a null pointer dereference issue: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... [ 51.111664] filemap_read_folio+0x25/0xe0 [ 51.112410] filemap_fault+0xad7/0x1250 [ 51.113112] __do_fault+0x4b/0x460 [ 51.113699] do_pte_missing+0x5bc/0x1db0 [ 51.114250] ? __pte_offset_map+0x23/0x170 [ 51.114822] __handle_mm_fault+0x9f8/0x1680 [ 51.115408] handle_mm_fault+0x24c/0x570 [ 51.115958] do_user_addr_fault+0x226/0xa50 ... Crash analysis showed the file involved was an AIO ring file. [CAUSE] PARENT process CHILD process t=0 io_setup(1, &ctx) [access ctx addr] fork() io_destroy vm_munmap // not affect child vma percpu_ref_put ... put_aio_ring_file t=1 [access ctx addr] // pagefault ... __do_fault filemap_fault max_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE) t=2 truncate_setsize truncate_pagecache t=3 filemap_get_folio // no folio, create folio __filemap_get_folio(..., FGP_CREAT, ...) // page_not_uptodate filemap_read_folio(file, mapping->a_ops->read_folio, folio) // oops! At t=0, the parent process calls io_setup and then fork. The child process gets its own VMA but without any PTEs. The parent then calls io_destroy. Before i_size is truncated to 0, at t=1 the child process accesses this AIO ctx address and triggers a pagefault. After the max_idx check passes, at t=2 the parent calls truncate_setsize and truncate_pagecache. At t=3 the child fails to obtain the folio, falls into the "page_not_uptodate" path, and hits this problem because AIO does not implement "read_folio". [Fix] Fix this by marking the AIO ring buffer VMA with VM_DONTCOPY so that fork()'s dup_mmap() skips it entirely. This is the correct semantic because: 1) The child's ioctx_table is already reset to NULL by mm_init_aio() during fork(), so the child has no AIO context and no way to perform any AIO operations on this mapping. 2) The AIO ring VMA is only meaningful in conjunction with its associated kioctx, which is never inherited across fork(). So child process with no AIO context has no legitimate reason to access the ring buffer. Delivering SIGSEGV on such an erroneous access is preferable to a kernel crash. Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com> Link: https://patch.msgid.link/20260413010814.548568-1-wozizhi@huawei.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2 daysMerge branch 'rxrpc-miscellaneous-fixes'Jakub Kicinski4-14/+5
David Howells says: ==================== rxrpc: Miscellaneous fixes Here are some fixes for rxrpc, as found by Sashiko[1]: (1) Fix rxrpc_input_call_event() to only unshare DATA packets. (2) Fix re-decryption of RESPONSE packets where a partially decrypted skbuff gets requeued if there was a failure due to ENOMEM. (3) Fix error handling in rxgk_extract_token() where the ENOMEM case is unhandled. Link: https://sashiko.dev/#/patchset/20260422161438.2593376-4-dhowells@redhat.com [1] ==================== Link: https://patch.msgid.link/20260423200909.3049438-1-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysrxrpc: Fix error handling in rxgk_extract_token()David Howells1-0/+1
Fix a missing bit of error handling in rxgk_extract_token(): in the event that rxgk_decrypt_skb() returns -ENOMEM, it should just return that rather than continuing on (for anything else, it generates an abort). Fixes: 64863f4ca494 ("rxrpc: Fix unhandled errors in rxgk_verify_packet_integrity()") Closes: https://sashiko.dev/#/patchset/20260422161438.2593376-4-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260423200909.3049438-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysrxrpc: Fix re-decryption of RESPONSE packetsDavid Howells2-13/+2
If a RESPONSE packet gets a temporary failure during processing, it may end up in a partially decrypted state - and then get requeued for a retry. Fix this by just discarding the packet; we will send another CHALLENGE packet and thereby elicit a further response. Similarly, discard an incoming CHALLENGE packet if we get an error whilst generating a RESPONSE; the server will send another CHALLENGE. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Closes: https://sashiko.dev/#/patchset/20260422161438.2593376-4-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260423200909.3049438-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysrxrpc: Fix rxrpc_input_call_event() to only unshare DATA packetsDavid Howells1-1/+2
Fix rxrpc_input_call_event() to only unshare DATA packets and not ACK, ABORT, etc.. And with that, rxrpc_input_packet() doesn't need to take a pointer to the pointer to the packet, so change that to just a pointer. Fixes: 1f2740150f90 ("rxrpc: Fix potential UAF after skb_unshare() failure") Closes: https://sashiko.dev/#/patchset/20260422161438.2593376-4-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260423200909.3049438-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysMerge branch 'rxrpc-miscellaneous-fixes'Jakub Kicinski10-100/+106
David Howells says: ==================== rxrpc: Miscellaneous fixes Here are some fixes for rxrpc, as found by Sashiko[1]: (1) Fix leaks in rxkad_verify_response(). (2) Fix handling of rxkad-encrypted packets with crypto-misaligned lengths. (3) Fix problem with unsharing DATA packets potentially causing a crash in the caller. (4) Fix lack of unsharing of RESPONSE packets. (5) Fix integer overflow in RxGK ticket length check. (6) Fix missing length check in RxKAD tickets. ==================== Link: https://patch.msgid.link/20260422161438.2593376-1-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysrxrpc: Fix missing validation of ticket length in non-XDR key preparsingAnderson Nascimento1-0/+4
In rxrpc_preparse(), there are two paths for parsing key payloads: the XDR path (for large payloads) and the non-XDR path (for payloads <= 28 bytes). While the XDR path (rxrpc_preparse_xdr_rxkad()) correctly validates the ticket length against AFSTOKEN_RK_TIX_MAX, the non-XDR path fails to do so. This allows an unprivileged user to provide a very large ticket length. When this key is later read via rxrpc_read(), the total token size (toksize) calculation results in a value that exceeds AFSTOKEN_LENGTH_MAX, triggering a WARN_ON(). [ 2001.302904] WARNING: CPU: 2 PID: 2108 at net/rxrpc/key.c:778 rxrpc_read+0x109/0x5c0 [rxrpc] Fix this by adding a check in the non-XDR parsing path of rxrpc_preparse() to ensure the ticket length does not exceed AFSTOKEN_RK_TIX_MAX, bringing it into parity with the XDR parsing logic. Fixes: 8a7a3eb4ddbe ("KEYS: RxRPC: Use key preparsing") Fixes: 84924aac08a4 ("rxrpc: Fix checker warning") Reported-by: Anderson Nascimento <anderson@allelesecurity.com> Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260422161438.2593376-7-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysrxgk: Fix potential integer overflow in length checkDavid Howells2-1/+2
Fix potential integer overflow in rxgk_extract_token() when checking the length of the ticket. Rather than rounding up the value to be tested (which might overflow), round down the size of the available data. Fixes: 2429a1976481 ("rxrpc: Fix untrusted unsigned subtract") Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260422161438.2593376-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysrxrpc: Fix conn-level packet handling to unshare RESPONSE packetsDavid Howells1-1/+28
The security operations that verify the RESPONSE packets decrypt bits of it in place - however, the sk_buff may be shared with a packet sniffer, which would lead to the sniffer seeing an apparently corrupt packet (actually decrypted). Fix this by handing a copy of the packet off to the specific security handler if the packet was cloned. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260422161438.2593376-5-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysrxrpc: Fix potential UAF after skb_unshare() failureDavid Howells5-35/+22
If skb_unshare() fails to unshare a packet due to allocation failure in rxrpc_input_packet(), the skb pointer in the parent (rxrpc_io_thread()) will be NULL'd out. This will likely cause the call to trace_rxrpc_rx_done() to oops. Fix this by moving the unsharing down to where rxrpc_input_call_event() calls rxrpc_input_call_packet(). There are a number of places prior to that where we ignore DATA packets for a variety of reasons (such as the call already being complete) for which an unshare is then avoided. And with that, rxrpc_input_packet() doesn't need to take a pointer to the pointer to the packet, so change that to just a pointer. Fixes: 2d1faf7a0ca3 ("rxrpc: Simplify skbuff accounting in receive path") Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260422161438.2593376-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysrxrpc: Fix rxkad crypto unalignment handlingDavid Howells2-2/+8
Fix handling of a packet with a misaligned crypto length. Also handle non-ENOMEM errors from decryption by aborting. Further, remove the WARN_ON_ONCE() so that it can't be remotely triggered (a trace line can still be emitted). Fixes: f93af41b9f5f ("rxrpc: Fix missing error checks for rxkad encryption/decryption failure") Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260422161438.2593376-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysrxrpc: Fix memory leaks in rxkad_verify_response()David Howells1-61/+42
Fix rxkad_verify_response() to free the ticket and the server key under all circumstances by initialising the ticket pointer to NULL and then making all paths through the function after the first allocation has been done go through a single common epilogue that just releases everything - where all the releases skip on a NULL pointer. Fixes: 57af281e5389 ("rxrpc: Tidy up abort generation infrastructure") Fixes: ec832bd06d6f ("rxrpc: Don't retain the server key in the connection") Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260422161438.2593376-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysMerge tag 'acpi-7.1-rc1-2' of ↵Linus Torvalds5-11/+26
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI support fixes from Rafael Wysocki: "These fix two potential refcount leaks in error code paths in the ACPI core code, address a recently introduced build breakage related to the CPU UID handling consolidation, fix up a recently added MAINTAINERS entry, fix the quirk list in the ACPI video bus driver, and add a new quirk to it: - Add an acpi_get_cpu_uid() stub helper to address an x86 Xen support build breakage (Arnd Bergmann) - Use acpi_dev_put() in object add error paths in the ACPI core to avoid refcount leaks (Guangshuo Li) - Adjust the file entry in the recently added NVIDIA GHES HANDLER entry in MAINTAINERS to the actual existing file (Lukas Bulwahn) - Add backlight=native quirk for Dell OptiPlex 7770 AIO to the ACPI video bus driver (Jan Schär) - Move Lenovo Legion S7 15ACH6 quirk to the right section of the quirk list in the ACPI video bus driver (Hans de Goede)" * tag 'acpi-7.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: video: Move Lenovo Legion S7 15ACH6 quirk to the right section ACPI: video: Add backlight=native quirk for Dell OptiPlex 7770 AIO ACPI: add acpi_get_cpu_uid() stub helper MAINTAINERS: adjust file entry in NVIDIA GHES HANDLER ACPI: scan: Use acpi_dev_put() in object add error paths
2 daysnet: rds: fix MR cleanup on copy errorAo Zhou1-4/+0
__rds_rdma_map() hands sg/pages ownership to the transport after get_mr() succeeds. If copying the generated cookie back to user space fails after that point, the error path must not free those resources again before dropping the MR reference. Remove the duplicate unpin/free from the put_user() failure branch so that MR teardown is handled only through the existing final cleanup path. Fixes: 0d4597c8c5ab ("net/rds: Track user mapped pages through special API") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Ao Zhou <draw51280@163.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/79c8ef73ec8e5844d71038983940cc2943099baf.1776764247.git.draw51280@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysm68k: mvme147: Make me the maintainerDaniel Palmer1-0/+7
I'm actively using mainline + patches on this board as a bootloader for another VME board and as a terminal server using a multiport serial board in the same VME backplane. I even have mainline u-boot on real EPROMs. Make me the maintainer of its ethernet, scsi and arch code so I get an email before one or more of them get deleted. Signed-off-by: Daniel Palmer <daniel@thingy.jp> Link: https://patch.msgid.link/20260422132710.2855826-1-daniel@thingy.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysnet: txgbe: fix firmware version checkJiawen Wu1-1/+2
For the device SP, the firmware version is a 32-bit value where the lower 20 bits represent the base version number. And the customized firmware version populates the upper 12 bits with a specific identification number. For other devices AML 25G and 40G, the upper 12 bits of the firmware version is always non-zero, and they have other naming conventions. Only SP devices need to check this to tell if XPCS will work properly. So the judgement of MAC type is added here. And the original logic compared the entire 32-bit value against 0x20010, which caused the outdated base firmwares bypass the version check without a warning. Apply a mask 0xfffff to isolate the lower 20 bits for an accurate base version comparison. Fixes: ab928c24e6cd ("net: txgbe: add FW version warning") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/C787AA5C07598B13+20260422071837.372731-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysMerge branches 'acpi-scan', 'acpi-apei' and 'acpi-video'Rafael J. Wysocki4-11/+20
Merge an ACPI core fix, a fix for the new NVIDIA GHES HANDLER entry in MAINTAINERS, a new quirk for the ACPI video bus driver and a quirk list fix for that driver for 7.1-rc1: - Use acpi_dev_put() in object add error paths in the ACPI core to avoid refcount leaks (Guangshuo Li) - Adjust the file entry in the recently added NVIDIA GHES HANDLER entry in MAINTAINERS to the actual existing file (Lukas Bulwahn) - Add backlight=native quirk for Dell OptiPlex 7770 AIO to the ACPI video bus driver (Jan Schär) - Move Lenovo Legion S7 15ACH6 quirk to the right section of the quirk list in the ACPI video bus driver (Hans de Goede) * acpi-scan: ACPI: scan: Use acpi_dev_put() in object add error paths * acpi-apei: MAINTAINERS: adjust file entry in NVIDIA GHES HANDLER * acpi-video: ACPI: video: Move Lenovo Legion S7 15ACH6 quirk to the right section ACPI: video: Add backlight=native quirk for Dell OptiPlex 7770 AIO
2 daysMerge branch 'tcp-fix-listener-wakeup-after-reuseport-migration'Jakub Kicinski2-7/+45
Zhenzhong Wu says: ==================== tcp: fix listener wakeup after reuseport migration This series fixes a missing wakeup when inet_csk_listen_stop() migrates an established child socket from a closing listener to another socket in the same SO_REUSEPORT group after the child has already been queued for accept. The target listener receives the migrated accept-queue entry via inet_csk_reqsk_queue_add(), but its waiters are not notified. Nonblocking accept() still succeeds because it checks the accept queue directly, but readiness-based waiters can remain asleep until another connection generates a wakeup. Patch 1 notifies the target listener after a successful migration in inet_csk_listen_stop() and protects the post-queue_add() nsk accesses with rcu_read_lock()/rcu_read_unlock(). Patch 2 extends the existing migrate_reuseport BPF selftest with epoll readiness checks inside migrate_dance(), around shutdown() where the migration happens. The test now verifies that the target listener is not ready before migration and becomes ready immediately after it, for both TCP_ESTABLISHED and TCP_SYN_RECV. TCP_NEW_SYN_RECV remains excluded because it still depends on later handshake completion. Testing: - On a local unpatched kernel, the focused migrate_reuseport test fails for the listener-migration cases and passes for the TCP_NEW_SYN_RECV cases: not ok 1 IPv4 TCP_ESTABLISHED inet_csk_listen_stop not ok 2 IPv4 TCP_SYN_RECV inet_csk_listen_stop ok 3 IPv4 TCP_NEW_SYN_RECV reqsk_timer_handler ok 4 IPv4 TCP_NEW_SYN_RECV inet_csk_complete_hashdance not ok 5 IPv6 TCP_ESTABLISHED inet_csk_listen_stop not ok 6 IPv6 TCP_SYN_RECV inet_csk_listen_stop ok 7 IPv6 TCP_NEW_SYN_RECV reqsk_timer_handler ok 8 IPv6 TCP_NEW_SYN_RECV inet_csk_complete_hashdance - On a patched kernel booted under QEMU, the full migrate_reuseport selftest passes: ok 1 IPv4 TCP_ESTABLISHED inet_csk_listen_stop ok 2 IPv4 TCP_SYN_RECV inet_csk_listen_stop ok 3 IPv4 TCP_NEW_SYN_RECV reqsk_timer_handler ok 4 IPv4 TCP_NEW_SYN_RECV inet_csk_complete_hashdance ok 5 IPv6 TCP_ESTABLISHED inet_csk_listen_stop ok 6 IPv6 TCP_SYN_RECV inet_csk_listen_stop ok 7 IPv6 TCP_NEW_SYN_RECV reqsk_timer_handler ok 8 IPv6 TCP_NEW_SYN_RECV inet_csk_complete_hashdance SELFTEST_RC=0 ==================== Link: https://patch.msgid.link/20260422024554.130346-1-jt26wzz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysselftests/bpf: check epoll readiness during reuseport migrationZhenzhong Wu1-7/+42
Inside migrate_dance(), add epoll checks around shutdown() to verify that the target listener is not ready before shutdown() and becomes ready immediately after shutdown() triggers migration. Cover TCP_ESTABLISHED and TCP_SYN_RECV. Exclude TCP_NEW_SYN_RECV as it depends on later handshake completion. Suggested-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com> Link: https://patch.msgid.link/20260422024554.130346-3-jt26wzz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daystcp: call sk_data_ready() after listener migrationZhenzhong Wu1-0/+3
When inet_csk_listen_stop() migrates an established child socket from a closing listener to another socket in the same SO_REUSEPORT group, the target listener gets a new accept-queue entry via inet_csk_reqsk_queue_add(), but that path never notifies the target listener's waiters. A nonblocking accept() still works because it checks the queue directly, but poll()/epoll_wait() waiters and blocking accept() callers can also remain asleep indefinitely. Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration in inet_csk_listen_stop(). However, after inet_csk_reqsk_queue_add() succeeds, the ref acquired in reuseport_migrate_sock() is effectively transferred to nreq->rsk_listener. Another CPU can then dequeue nreq via accept() or listener shutdown, hit reqsk_put(), and drop that listener ref. Since listeners are SOCK_RCU_FREE, wrap the post-queue_add() dereferences of nsk in rcu_read_lock()/rcu_read_unlock(), which also covers the existing sock_net(nsk) access in that path. The reqsk_timer_handler() path does not need the same changes for two reasons: half-open requests become readable only after the final ACK, where tcp_child_process() already wakes the listener; and once nreq is visible via inet_ehash_insert(), the success path no longer touches nsk directly. Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.") Cc: stable@vger.kernel.org Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260422024554.130346-2-jt26wzz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysvhost_net: fix sleeping with preempt-disabled in vhost_net_busy_poll()Kohei Enju1-2/+2
syzbot reported "sleeping function called from invalid context" in vhost_net_busy_poll(). Commit 030881372460 ("vhost_net: basic polling support") introduced a busy-poll loop and preempt_{disable,enable}() around it, where each iteration calls a sleepable function inside the loop. The purpose of disabling preemption was to keep local_clock()-based timeout accounting on a single CPU, rather than as a requirement of busy-poll itself: https://lore.kernel.org/1448435489-5949-4-git-send-email-jasowang@redhat.com From this perspective, migrate_disable() is sufficient here, so replace preempt_disable() with migrate_disable(), avoiding sleepable accesses from a preempt-disabled context. Fixes: 030881372460 ("vhost_net: basic polling support") Tested-by: syzbot+6985cb8e543ea90ba8ee@syzkaller.appspotmail.com Reported-by: syzbot+6985cb8e543ea90ba8ee@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69e6a414.050a0220.24bfd3.002d.GAE@google.com/T/ Signed-off-by: Kohei Enju <kohei@enjuk.jp> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_limDaniel Borkmann1-0/+6
Commit 47d3d7ac656a ("ipv6: Implement limits on Hop-by-Hop and Destination options") added net.ipv6.max_{hbh,dst}_opts_{cnt,len} and applied them in ip6_parse_tlv(), the generic TLV walker invoked from ipv6_destopt_rcv() and ipv6_parse_hopopts(). ip6_tnl_parse_tlv_enc_lim() does not go through ip6_parse_tlv(); it has its own hand-rolled TLV scanner inside its NEXTHDR_DEST branch which looks for IPV6_TLV_TNL_ENCAP_LIMIT. That inner loop is bounded only by optlen, which can be up to 2048 bytes. Stuffing the Destination Options header with 2046 Pad1 (type=0) entries advances the scanner a single byte at a time, yielding ~2000 TLV iterations per extension header. Reusing max_dst_opts_cnt to bound the TLV iterations, matching the semantics from 47d3d7ac656a, would require duplicating ip6_parse_tlv() to also validate Pad1/PadN payload. It would also mandate enforcing max_dst_opts_len, since otherwise an attacker shifts the axis to few options with a giant PadN and recovers the original DoS. Allowing up to 8 options before the tunnel encapsulation limit TLV is liberal enough; in practice encap limit is the first TLV. Thus, go with a hard-coded limit IP6_TUNNEL_MAX_DEST_TLVS (8). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daystipc: fix double-free in tipc_buf_append()Lee Jones1-1/+13
tipc_msg_validate() can potentially reallocate the skb it is validating, freeing the old one. In tipc_buf_append(), it was being called with a pointer to a local variable which was a copy of the caller's skb pointer. If the skb was reallocated and validation subsequently failed, the error handling path would free the original skb pointer, which had already been freed, leading to double-free. Fix this by checking if head now points to a newly allocated reassembled skb. If it does, reassign *headbuf for later freeing operations. Fixes: d618d09a68e4 ("tipc: enforce valid ratio between skb truesize and contents") Suggested-by: Tung Nguyen <tung.quang.nguyen@est.tech> Signed-off-by: Lee Jones <lee@kernel.org> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysllc: Return -EINPROGRESS from llc_ui_connect()Ernestas Kulik1-1/+3
Given a zero sk_sndtimeo, llc_ui_connect() skips waiting for state change and returns 0, confusing userspace applications that will assume the socket is connected, making e.g. getpeername() calls error out. More specifically, the issue was discovered in libcoap, where newly-added AF_LLC socket support was behaving differently from AF_INET connections due to EINPROGRESS handling being skipped. Set rc to -EINPROGRESS if connect() would not block, akin to AF_INET sockets. Signed-off-by: Ernestas Kulik <ernestas.k@iconn-networks.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260421060304.285419-1-ernestas.k@iconn-networks.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysipv4: icmp: validate reply type before using icmp_pointersRuide Cao1-1/+4
Extended echo replies use ICMP_EXT_ECHOREPLY as the outbound reply type. That value is outside the range covered by icmp_pointers[], which only describes the traditional ICMP types up to NR_ICMP_TYPES. Avoid consulting icmp_pointers[] for reply types outside that range, and use array_index_nospec() for the remaining in-range lookup. Normal ICMP replies keep their existing behavior unchanged. Fixes: d329ea5bd884 ("icmp: add response to RFC 8335 PROBE messages") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Ruide Cao <caoruide123@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/0dace90c01a5978e829ca741ef684dbd7304ce62.1776628519.git.caoruide123@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 daysMerge tag 'pcmcia-7.1-rc1' of ↵Linus Torvalds13-3169/+9
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull PCMCIA updates from Dominik Brodowski: "A number of minor PCMCIA bugfixes and cleanups, and a patch removing obsolete host controller drivers" * tag 'pcmcia-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: pcmcia: remove obsolete host controller drivers pcmcia: Convert to use less arguments in pci_bus_for_each_resource() PCMCIA: Fix garbled log messages for KERN_CONT
2 daysMerge branch 'tcp-symmetric-challenge-ack-for-seg-ack-snd-nxt'Jakub Kicinski3-4/+58
Jiayuan Chen says: ==================== tcp: symmetric challenge ACK for SEG.ACK > SND.NXT Commit 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation") quotes RFC 5961 Section 5.2 in full, which requires that any incoming segment whose ACK value falls outside [SND.UNA - MAX.SND.WND, SND.NXT] MUST be discarded and an ACK sent back. Linux currently sends that challenge ACK only on the lower edge (SEG.ACK < SND.UNA - MAX.SND.WND); on the symmetric upper edge (SEG.ACK > SND.NXT) the segment is silently dropped with SKB_DROP_REASON_TCP_ACK_UNSENT_DATA. Patch 1 completes the mitigation by emitting a rate-limited challenge ACK on that branch, reusing tcp_send_challenge_ack() and honouring FLAG_NO_CHALLENGE_ACK for consistency with the lower-edge case. It also updates the existing tcp_ts_recent_invalid_ack.pkt selftest, which drives this exact path, to consume the new challenge ACK so bisect stays clean. Patch 2 adds a new packetdrill selftest that exercises RFC 5961 Section 5.2 on both edges of the acceptable window, filling a gap in the selftests tree (neither edge had dedicated coverage before). ==================== Link: https://patch.msgid.link/20260422123605.320000-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>