summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
27 hourstcp: secure_seq: add back ports to TS offsetEric Dumazet2-9/+42
[ Upstream commit 165573e41f2f66ef98940cf65f838b2cb575d9d1 ] This reverts 28ee1b746f49 ("secure_seq: downgrade to per-host timestamp offsets") tcp_tw_recycle went away in 2017. Zhouyan Deng reported off-path TCP source port leakage via SYN cookie side-channel that can be fixed in multiple ways. One of them is to bring back TCP ports in TS offset randomization. As a bonus, we perform a single siphash() computation to provide both an ISN and a TS offset. Fixes: 28ee1b746f49 ("secure_seq: downgrade to per-host timestamp offsets") Reported-by: Zhouyan Deng <dengzhouyan_nwpu@163.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260302205527.1982836-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit 165573e41f2f66ef98940cf65f838b2cb575d9d1) [kept the DCCP functions in the header, as DCCP was not retired yet in 6.12] Signed-off-by: Heiko Stuebner <heiko.stuebner@cherry.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
27 hoursnet: introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL()Eric Dumazet1-0/+8
[ Upstream commit 54568a84c95bdea20227cf48d41f198d083e78dd ] We have many EXPORT_SYMBOL(x) in networking tree because IPv6 can be built as a module. CONFIG_IPV6=y is becoming the norm. Define a EXPORT_IPV6_MOD(x) which only exports x for modular IPv6. Same principle applies to EXPORT_IPV6_MOD_GPL() Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Link: https://patch.msgid.link/20250212132418.1524422-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> (cherry picked from commit 54568a84c95bdea20227cf48d41f198d083e78dd) [needed as dependency for tcp: secure_seq: add back ports to TS offset] Signed-off-by: Heiko Stuebner <heiko.stuebner@cherry.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
27 hoursRDMA: Move DMA block iterator logic into dedicated filesLeon Romanovsky3-80/+88
[ Upstream commit 6094ea64c69520ed1e770e7c79c43412de202bfa ] The DMA iterator logic was mixed into verbs and umem-specific code, forcing all users to include rdma/ib_umem.h. Move the block iterator logic into iter.c and rdma/iter.h so that rdma/ib_umem.h and rdma/ib_verbs.h can be separated in a follow-up patch. Link: https://patch.msgid.link/20260213-refactor-umem-v1-1-f3be85847922@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Stable-dep-of: 15fe76e23615 ("RDMA/umem: Fix truncation for block sizes >= 4G") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
27 hoursRDMA/umem: fix kernel-doc warningsRandy Dunlap1-1/+5
[ Upstream commit ff46d1392750444fab5ae5a0194764ffdc4ac0d2 ] Add or correct kernel-doc comments to eliminate warnings: Warning: include/rdma/ib_umem.h:104 function parameter 'biter' not described in 'rdma_umem_for_each_dma_block' Warning: include/rdma/ib_umem.h:140 function parameter 'pgsz_bitmap' not described in 'ib_umem_find_best_pgoff' Warning: include/rdma/ib_umem.h:141 No description found for return value of 'ib_umem_find_best_pgoff' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260224003120.3173892-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: 15fe76e23615 ("RDMA/umem: Fix truncation for block sizes >= 4G") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
27 hoursRDMA: During rereg_mr ensure that REREG_ACCESS is compatibleJason Gunthorpe1-0/+8
[ Upstream commit badad6fad60def1b9805559dd81dbab3d97b82aa ] If IB_MR_REREG_ACCESS changes from RO to RW then the umem has to be re-evaluated to ensure it is properly pinned as RW. Since the umem is hidden inside each driver's mr struct add a ib_umem_check_rereg() function that each driver has to call before processing IB_MR_REREG_ACCESS. mlx4 has to retain its duplicate ib_access_writable check because it implements IB_MR_REREG_ACCESS | IB_MR_REREG_TRANS by changing both items in place sequentially while the MR is live, so it will continue to not support this combination. Cc: stable@vger.kernel.org Fixes: b40656aa7d55 ("RDMA/umem: remove FOLL_FORCE usage") Link: https://patch.msgid.link/r/0-v1-06fb1a2d6cf5+107-rereg_access_jgg@nvidia.com Reported-by: Philip Tsukerman <philiptsukerman@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
27 hoursRDMA/umem: Add helpers for umem dmabuf revoke lockJacob Moroni1-0/+4
[ Upstream commit 3a0b171302eea1732a168e26db3b8461f51cc1f9 ] Added helpers to acquire and release the umem dmabuf revoke lock. The intent is to avoid the need for drivers to peek into the ib_umem_dmabuf internals to get the dma_resv_lock and bring us one step closer to abstracting ib_umem_dmabuf away from drivers in general. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260305170826.3803155-5-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: badad6fad60d ("RDMA: During rereg_mr ensure that REREG_ACCESS is compatible") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
27 hoursmm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoisonWupeng Ma2-16/+0
[ Upstream commit 3c2d42b8ee345b17a4ba56b0f6492d1ff4c1178e ] Two concurrent madvise(MADV_HWPOISON) calls on the same hugetlb page can trigger a recursive spinlock self-deadlock (AA deadlock) on hugetlb_lock when racing with a concurrent unmap: thread#0 thread#1 -------- -------- madvise(folio, MADV_HWPOISON) -> poisons the folio successfully madvise(folio, MADV_HWPOISON) unmap(folio) try_memory_failure_hugetlb get_huge_page_for_hwpoison spin_lock_irq(&hugetlb_lock) <- held __get_huge_page_for_hwpoison hugetlb_update_hwpoison() -> MF_HUGETLB_FOLIO_PRE_POISONED goto out: folio_put() refcount: 1 -> 0 free_huge_folio() spin_lock_irqsave(&hugetlb_lock) -> AA DEADLOCK! The out: path in __get_huge_page_for_hwpoison() calls folio_put() to drop the GUP reference while the hugetlb_lock is still held by the hugetlb.c wrapper get_huge_page_for_hwpoison(). If concurrent unmap has released the page table mapping reference, folio_put() drops the folio refcount to zero, triggering free_huge_folio() which attempts to re-acquire the non-recursive hugetlb_lock. Fix this by moving hugetlb_lock acquisition from the hugetlb.c wrapper into get_huge_page_for_hwpoison(). Place spin_unlock_irq() before the folio_put() at the out: label so the folio is always released outside the lock. [akpm@linux-foundation.org: fix race, rename label per Miaohe] Link: https://sashiko.dev/#/patchset/20260522010305.4099834-1-mawupeng1@huawei.com Link: https://lore.kernel.org/f39f405e-4b4b-8f79-70fe-a2b5b62114eb@huawei.com Link: https://lore.kernel.org/20260522010305.4099834-1-mawupeng1@huawei.com Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()") Signed-off-by: Wupeng Ma <mawupeng1@huawei.com> Acked-by: Oscar Salvador (SUSE) <osalvador@kernel.org> Acked-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
27 hoursmailbox: Fix NULL message support in mbox_send_message()Jassi Brar1-0/+3
commit c58e9456e30c7098cbcd9f04571992be8a2e4e63 upstream. The active_req field serves double duty as both the "is a TX in flight" flag (NULL means idle) and the storage for the in-flight message pointer. When a client sends NULL via mbox_send_message(), active_req is set to NULL, which the framework misinterprets as "no active request". This breaks the TX state machine by: - tx_tick() short-circuits on (!mssg), skipping the tx_done callback and the tx_complete completion - txdone_hrtimer() skips the channel entirely since active_req is NULL, so poll-based TX-done detection never fires. Fix this by introducing a MBOX_NO_MSG sentinel value that means "no active request," freeing NULL to be valid message data. The sentinel is defined in the subsystem-internal mailbox.h so that controller drivers within drivers/mailbox/ can reference it, but it is not exposed to clients outside the subsystem. Fifteen in-tree callers send NULL (doorbell-style IPCs on Qualcomm, Tegra, TI, Xilinx, i.MX, SCMI, and PCC platforms). All were audited for regression: - Most already work around the bug via knows_txdone=true with a manual mbox_client_txdone() call, making the framework's tracking irrelevant. These are unaffected. - Poll-based callers (Xilinx zynqmp/r5) are strictly better off: the poll timer now correctly detects NULL-active channels instead of silently skipping them. - irq-qcom-mpm.c was a pre-existing bug -- the only Qualcomm caller that omitted the knows_txdone + mbox_client_txdone() pattern. Fixed in a companion commit ("irqchip/qcom-mpm: Fix missing mailbox TX done acknowledgment"). - No caller sets both a tx_done callback and sends NULL, nor combines tx_block=true with NULL sends, so the newly reachable callback/completion paths are never exercised. Also update tegra-hsp's flush callback, which directly inspects active_req to wait for the channel to drain: the old "!= NULL" check becomes "!= MBOX_NO_MSG", otherwise flush spins until timeout since the sentinel is non-NULL. The only tradeoff is that 'MBOX_NO_MSG' can not be used as a message by clients. Reported-by: Joonwon Kang <joonwonkang@google.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com> Signed-off-by: Joonwon Kang <joonwonkang@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
27 hoursBluetooth: L2CAP: reject BR/EDR signaling packets over MTUsigMichael Bommarito1-0/+1
commit dd214733544427587a95f66dbf3adff072568990 upstream. net/bluetooth/l2cap_core.c:l2cap_sig_channel() accepts BR/EDR signaling packets up to the channel MTU and dispatches each command without enforcing the signaling MTU (MTUsig). A Bluetooth BR/EDR peer within radio range can send a fixed-channel CID 0x0001 packet that is larger than MTUsig and contains many L2CAP_ECHO_REQ commands before pairing. In a real-radio stock-kernel run, one 681-byte signaling packet containing 168 zero-length ECHO_REQ commands made the target transmit 168 ECHO_RSP frames over about 220 ms. Impact: a Bluetooth BR/EDR peer within radio range, before pairing, can force 168 ECHO_RSP frames from one 681-byte fixed-channel signaling packet containing packed ECHO_REQ commands. Define Linux's BR/EDR signaling MTU as the spec minimum of 48 bytes and reject any larger signaling packet with one L2CAP_COMMAND_REJECT_RSP carrying L2CAP_REJ_MTU_EXCEEDED before any command is dispatched. The Bluetooth Core spec wording for MTUExceeded says the reject identifier shall match the first request command in the packet, and that packets containing only responses shall be silently discarded. Linux intentionally deviates from that prescription: silently discarding desynchronizes the peer because the remote stack never learns its responses were dropped, and locating the first request command requires walking command headers past MTUsig, i.e. processing bytes from a packet we have already decided is too large to process. We therefore always emit one reject and use the identifier from the first command header, a single fixed-offset byte read. The unrestricted BR/EDR signaling parser and ECHO_REQ response path both trace to the initial git import; no later introducing commit is available for a Fixes tag. Cc: stable@vger.kernel.org Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Link: https://lore.kernel.org/r/20260518002800.1361430-1-michael.bommarito@gmail.com Link: https://lore.kernel.org/r/20260520135034.1060859-1-michael.bommarito@gmail.com Link: https://lore.kernel.org/r/20260521000555.3712030-1-michael.bommarito@gmail.com Assisted-by: Claude:claude-opus-4-7 Assisted-by: Codex:gpt-5-5-xhigh Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
27 hourswriteback: Avoid contention on wb->list_lock when switching inodesJan Kara2-0/+6
[ Upstream commit e1b849cfa6b61f1c866a908c9e8dd9b5aaab820b ] There can be multiple inode switch works that are trying to switch inodes to / from the same wb. This can happen in particular if some cgroup exits which owns many (thousands) inodes and we need to switch them all. In this case several inode_switch_wbs_work_fn() instances will be just spinning on the same wb->list_lock while only one of them makes forward progress. This wastes CPU cycles and quickly leads to softlockup reports and unusable system. Instead of running several inode_switch_wbs_work_fn() instances in parallel switching to the same wb and contending on wb->list_lock, run just one work item per wb and manage a queue of isw items switching to this wb. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
27 hoursnetfilter: ctnetlink: ensure safe access to master conntrackPablo Neira Ayuso1-0/+5
[ Upstream commit bffcaad9afdfe45d7fc777397d3b83c1e3ebffe5 ] Holding reference on the expectation is not sufficient, the master conntrack object can just go away, making exp->master invalid. To access exp->master safely: - Grab the nf_conntrack_expect_lock, this gets serialized with clean_from_lists() which also holds this lock when the master conntrack goes away. - Hold reference on master conntrack via nf_conntrack_find_get(). Not so easy since the master tuple to look up for the master conntrack is not available in the existing problematic paths. This patch goes for extending the nf_conntrack_expect_lock section to address this issue for simplicity, in the cases that are described below this is just slightly extending the lock section. The add expectation command already holds a reference to the master conntrack from ctnetlink_create_expect(). However, the delete expectation command needs to grab the spinlock before looking up for the expectation. Expand the existing spinlock section to address this to cover the expectation lookup. Note that, the nf_ct_expect_iterate_net() calls already grabs the spinlock while iterating over the expectation table, which is correct. The get expectation command needs to grab the spinlock to ensure master conntrack does not go away. This also expands the existing spinlock section to cover the expectation lookup too. I needed to move the netlink skb allocation out of the spinlock to keep it GFP_KERNEL. For the expectation events, the IPEXP_DESTROY event is already delivered under the spinlock, just move the delivery of IPEXP_NEW under the spinlock too because the master conntrack event cache is reached through exp->master. While at it, add lockdep notations to help identify what codepaths need to grab the spinlock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> [ fix timer_delete -> del_timer in diff context lines since 8fa7292 ("treewide: Switch/rename to timer_delete[_sync]()") landed in 6.15 ] Signed-off-by: Mark Bundschuh <mkbund@amazon.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
27 hoursnetfilter: nf_conntrack: destroy stale expectfn expectations on unregisterWeiming Shi1-0/+1
[ Upstream commit c3009418f9fa1dcb3eb86f4d8c92583537b5faa3 ] NAT helpers such as nf_nat_h323 store a raw pointer to module text in exp->expectfn (e.g. ip_nat_q931_expect). nf_ct_helper_expectfn_unregister() only unlinks the callback descriptor and never walks the expectation table, so an expectation pending at module removal survives with a dangling exp->expectfn into freed module text. When the expected connection arrives, init_conntrack() invokes exp->expectfn(), now a stale pointer into the unloaded module. Reproduced on a KASAN build by loading the H.323 helpers, creating a Q.931 expectation, unloading nf_nat_h323, then connecting to the expected port: Oops: int3: 0000 [#1] SMP KASAN NOPTI RIP: 0010:0xffffffffa06102d1 init_conntrack.isra.0 (net/netfilter/nf_conntrack_core.c:1862) nf_conntrack_in (net/netfilter/nf_conntrack_core.c:2049) ipv4_conntrack_local (net/netfilter/nf_conntrack_proto.c:223) nf_hook_slow (net/netfilter/core.c:619) __ip_local_out (net/ipv4/ip_output.c:120) __tcp_transmit_skb (net/ipv4/tcp_output.c:1715) tcp_connect (net/ipv4/tcp_output.c:4374) tcp_v4_connect (net/ipv4/tcp_ipv4.c:345) __sys_connect (net/socket.c:2167) Modules linked in: nf_conntrack_h323 [last unloaded: nf_nat_h323] Reaching the dangling state requires CAP_SYS_MODULE in the initial user namespace to remove a NAT helper that still has live expectations, so this is a robustness fix; leaving an expectation pointing at freed text is wrong regardless. Add nf_ct_helper_expectfn_destroy(), which walks the expectation table and drops every expectation whose ->expectfn matches the descriptor being torn down. Call it from each NAT helper's exit path after the existing RCU grace period, so no expectation outlives the code it points at and no extra synchronize_rcu() is introduced. With the fix, the same reproducer runs to completion without the Oops. Fixes: f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port") Reported-by: Xiang Mei <xmei5@asu.edu> Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
27 hoursnet: guard timestamp cmsgs to real error queue skbsKyle Zeng1-0/+1
[ Upstream commit 1ee90b77b727df903033db873c75caac5c27ec98 ] skb_is_err_queue() treats PACKET_OUTGOING as the sole marker for an skb from sk_error_queue. That assumption is not true for AF_PACKET sockets: outgoing packet taps are also delivered to packet sockets with skb->pkt_type == PACKET_OUTGOING, but their skb->cb is owned by AF_PACKET instead of struct sock_exterr_skb. If such an skb is received with timestamping enabled, the generic timestamp cmsg path can read AF_PACKET control-buffer state as sock_exterr_skb::opt_stats. With SO_RXQ_OVFL enabled, the packet drop counter overlaps opt_stats. An odd drop count makes the path emit SCM_TIMESTAMPING_OPT_STATS with skb->len and skb->data. For non-linear skbs this copies past the linear head and can trigger hardened usercopy or disclose adjacent heap contents. Keep skb_is_err_queue() local to net/socket.c, but make it verify that the PACKET_OUTGOING marker is paired with the sock_rmem_free destructor installed by sock_queue_err_skb(). AF_PACKET receive skbs use normal receive ownership and no longer pass as error-queue skbs, while legitimate sk_error_queue entries keep the PACKET_OUTGOING marker and sock_rmem_free ownership. Fixes: 8605330aac5a ("tcp: fix SCM_TIMESTAMPING_OPT_STATS for normal skbs") Signed-off-by: Kyle Zeng <kylebot@openai.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260607021819.49698-1-kylebot@openai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
27 hoursnet/mlx5: Fix slab-out-of-bounds in mlx5_query_nic_vport_mac_listDragos Tatulea1-2/+2
[ Upstream commit 894e036a24a26a6dd7b17d8d3fb5c53ab48a6074 ] mlx5_query_nic_vport_mac_list() sizes its firmware command buffer using the PF's log_max_current_uc/mc_list capabilities. When querying a VF vport with a larger configured max (via devlink), the firmware response can overflow this buffer: BUG: KASAN: slab-out-of-bounds in mlx5_query_nic_vport_mac_list+0x453/0x4c0 [mlx5_core] Read of size 4 at addr ff1100013ffc8a12 by task kworker/u96:2/385 CPU: 12 UID: 0 PID: 385 Comm: kworker/u96:2 Not tainted 7.0.0-rc6+ #1 PREEMPT Hardware name: QEMU Standard PC (Q35 + ICH9, 2009) Workqueue: mlx5_esw_wq esw_vport_change_handler [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x69/0xa0 print_report+0x176/0x4e4 kasan_report+0xc8/0x100 mlx5_query_nic_vport_mac_list+0x453/0x4c0 [mlx5_core] esw_update_vport_addr_list+0x2e3/0xda0 [mlx5_core] esw_vport_change_handle_locked+0xa1f/0x1060 [mlx5_core] esw_vport_change_handler+0x6a/0x90 [mlx5_core] process_one_work+0x87f/0x15e0 worker_thread+0x62b/0x1020 kthread+0x375/0x490 ret_from_fork+0x4dc/0x810 ret_from_fork_asm+0x11/0x20 </TASK> Fix by querying the vport's own HCA caps to size the buffer correctly. Refactor the function to allocate and return the MAC list internally, removing the caller's dependency on knowing the correct max. Fixes: e16aea2744ab ("net/mlx5: Introduce access functions to modify/query vport mac lists") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260604135849.458060-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
27 hoursima: kexec: skip IMA segment validation after kexec soft rebootSteven Chen1-0/+3
[ Upstream commit 9ee8888a80fe2bd20ce929ffbc1dedd57607a778 ] Currently, the function kexec_calculate_store_digests() calculates and stores the digest of the segment during the kexec_file_load syscall, where the IMA segment is also allocated. Later, the IMA segment will be updated with the measurement log at the kexec execute stage when a kexec reboot is initiated. Therefore, the digests should be updated for the IMA segment in the normal case. The problem is that the content of memory segments carried over to the new kernel during the kexec systemcall can be changed at kexec 'execute' stage, but the size and the location of the memory segments cannot be changed at kexec 'execute' stage. To address this, skip the calculation and storage of the digest for the IMA segment in kexec_calculate_store_digests() so that it is not added to the purgatory_sha_regions. With this change, the IMA segment is not included in the digest calculation, storage, and verification. Cc: Eric Biederman <ebiederm@xmission.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Co-developed-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: Steven Chen <chenste@linux.microsoft.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm [zohar@linux.ibm.com: Fixed Signed-off-by tag to match author's email ] Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> (cherry picked from commit 9ee8888a80fe2bd20ce929ffbc1dedd57607a778) Signed-off-by: Sherry Yang <sherry.yang@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
27 hoursnet/sched: fix pedit partial COW leading to page cache corruptionRajat Gupta1-1/+0
[ Upstream commit 899ee91156e57784090c5565e4f31bd7dbffbc5a ] tcf_pedit_act() computes the COW range for skb_ensure_writable() once before the key loop using tcfp_off_max_hint, but the hint does not account for the runtime header offset added by typed keys. This can leave part of the write region un-COW'd. Fix by moving skb_ensure_writable() inside the per-key loop where the actual write offset is known, and add overflow checking on the offset arithmetic. For negative offsets (e.g. Ethernet header edits at ingress), use skb_cow() to COW the headroom instead. Guard offset_valid() against INT_MIN, where negation is undefined. Fixes: 8b796475fd78 ("net/sched: act_pedit: really ensure the skb is writable") Reported-by: Yiming Qian <yimingqian591@gmail.com> Reported-by: Keenan Dong <keenanat2000@gmail.com> Reported-by: Han Guidong <2045gemini@gmail.com> Reported-by: Zhang Cen <rollkingzzc@gmail.com> Reviewed-by: Han Guidong <2045gemini@gmail.com> Tested-by: Han Guidong <2045gemini@gmail.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Tested-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Tested-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com> Link: https://patch.msgid.link/20260531123221.48732-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
27 hoursnet_sched: act_pedit: use RCU in tcf_pedit_dump()Eric Dumazet1-0/+1
[ Upstream commit 9d096746572616a50cac4906f528a1959c0ee1c2 ] Also storing tcf_action into struct tcf_pedit_params makes sure there is no discrepancy in tcf_pedit_act(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250709090204.797558-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 899ee91156e5 ("net/sched: fix pedit partial COW leading to page cache corruption") Signed-off-by: Sasha Levin <sashal@kernel.org>
27 hoursBluetooth: ISO: Fix not using bc_sid as advertisement SIDLuiz Augusto von Dentz2-5/+8
[ Upstream commit 5842c01a9ed1d515c8ba2d6d3733eac78ace89c1 ] Currently bc_sid is being ignore when acting as Broadcast Source role, so this fix it by passing the bc_sid and then use it when programming the PA: < HCI Command: LE Set Exte.. (0x08|0x0036) plen 25 Handle: 0x01 Properties: 0x0000 Min advertising interval: 140.000 msec (0x00e0) Max advertising interval: 140.000 msec (0x00e0) Channel map: 37, 38, 39 (0x07) Own address type: Random (0x01) Peer address type: Public (0x00) Peer address: 00:00:00:00:00:00 (OUI 00-00-00) Filter policy: Allow Scan Request from Any, Allow Connect Request from Any (0x00) TX power: Host has no preference (0x7f) Primary PHY: LE 1M (0x01) Secondary max skip: 0x00 Secondary PHY: LE 2M (0x02) SID: 0x01 Scan request notifications: Disabled (0x00) Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Stable-dep-of: 9ca7053d6215 ("Bluetooth: ISO: Fix data-race on iso_pi fields in hci_get_route calls") Signed-off-by: Sasha Levin <sashal@kernel.org>
27 hoursnet/sched: act_api: use RCU with deferred freeing for action lifecycleJamal Hadi Salim1-0/+1
[ Upstream commit 5057e1aca011e51ef51498c940ef96f3d3e8a305 ] When NEWTFILTER and DELFILTER are run concurrently it is possible to create a race with an associated action. Let's illustrate with CPU0 running NEWTFILTER and CPU1 running DELFILTER: 0: mutex_lock() <-- holds the idr lock 0: rcu_read_lock() 0: p = idr_find(idr, index) <-- action p is valid (RCU protects IDR) 0: mutex_unlock() <-- releases the idr lock 1: refcount_dec_and_mutex_lock() <-- refcnt 1->0, mutex held 1: idr_remove(idr, index) <-- Action removed from IDR 1: mutex_unlock() <-- mutex released allowing us to delete the action 1: tcf_action_cleanup(p); kfree(p) <-- Kfrees p immediately, no deferral 0: refcount_inc_not_zero(&p->tcfa_refcnt) <-- ouch, UAF p points to freed memory This patch fixes the race condition between NEWTFILTER and DELFILTER by adding struct rcu_head to tc_action used in the deferral and introducing a call_rcu() in the delete path to defer the final kfree(). Note: this is a revert of commit d7fb60b9cafb ("net_sched: get rid of tcfa_rcu") but also modernization/simplification to directly use kfree_rcu(). Let's illustrate the new restored code path: 0: rcu_read_lock() 1: refcount_dec_and_mutex_lock() <-- refcnt 1->0, mutex held 1: idr_remove(idr, index) 1: mutex_unlock() 1: call_rcu(&p->tcfa_rcu, tcf_action_rcu_free) <-- defer kfree after grace period 0: p = idr_find(idr, index) 0: refcount_inc_not_zero(&p->tcfa_refcnt) <-- fails, refcnt already 0 1: rcu_read_unlock() <-- release so freeing can run after grace period After CPU1 calls idr_remove(), the object is no longer reachable through the IDR. CPU0's subsequent idr_find() will return NULL, and even if it still held a stale pointer, the immediate kfree() is now deferred until after the RCU grace period, so no UAF can occur. Fixes: d7fb60b9cafb ("net_sched: get rid of tcfa_rcu") Suggested-by: Jakub Kicinski <kuba@kernel.org> Reported-by: Kyle Zeng <kylebot@openai.com> Tested-by: Victor Nogueira <victor@mojatatu.com> Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Tested-by: Kyle Zeng <kylebot@openai.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20260531160812.68020-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
27 hoursipvs: clear the svc scheduler ptr early on editJulian Anastasov1-2/+1
[ Upstream commit 193989cc6d80dd8e0460fb3992e69fa03bf0ff9b ] ip_vs_edit_service() while unbinding the old scheduler clears the svc->scheduler ptr after the scheduler module initiates RCU callbacks. This can cause packets to use the old scheduler at the time when svc->sched_data is already freed after RCU grace period. Fix it by clearing the ptr early in ip_vs_unbind_scheduler(), before the done_service method schedules any RCU callbacks. Also, if the new scheduler fails to initialize when replacing the old scheduler, try to restore the old scheduler while still returning the error code. Link: https://sashiko.dev/#/patchset/20260519015506.634185-1-rosenp%40gmail.com Fixes: 05f00505a89a ("ipvs: fix crash if scheduler is changed") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
27 hourswifi: remove zero-length arraysJohannes Berg1-9/+9
commit a85b8544d46390469b6ca72d6bfd3ecb7be985ff upstream. All of these are really meant to be variable-length, and in the case of s1g_beacon it's actually accessed. Make that one in particular, and a couple of others (that aren't used as arrays now), actually variable. Reported-by: syzbot+fd222bb38e916df26fa4@syzkaller.appspotmail.com Fixes: 1e1f706fc2ce ("wifi: cfg80211/mac80211: correctly parse S1G beacon optional elements") Link: https://patch.msgid.link/20250614003037.a3e82e882251.I2e8b58e56ff2a9f8b06c66f036578b7c1d4e4685@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysmm: perform all memfd seal checks in a single placeLorenzo Stoakes2-67/+11
[ Upstream commit fa00b8ef1803fe133b4897c25227aa0d298dd093 ] We no longer actually need to perform these checks in the f_op->mmap() hook any longer. We already moved the operation which clears VM_MAYWRITE on a read-only mapping of a write-sealed memfd in order to work around the restrictions imposed by commit 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour"). There is no reason for us not to simply go ahead and additionally check to see if any pre-existing seals are in place here rather than defer this to the f_op->mmap() hook. By doing this we remove more logic from shmem_mmap() which doesn't belong there, as well as doing the same for hugetlbfs_file_mmap(). We also remove dubious shared logic in mm.h which simply does not belong there either. It makes sense to do these checks at the earliest opportunity, we know these are shmem (or hugetlbfs) mappings whose relevant VMA flags will not change from the invoking do_mmap() so there is simply no need to wait. This also means the implementation of further memfd seal flags can be done within mm/memfd.c and also have the opportunity to modify VMA flags as necessary early in the mapping logic. [lorenzo.stoakes@oracle.com: fix typos in !memfd inline stub] Link: https://lkml.kernel.org/r/7dee6c5d-480b-4c24-b98e-6fa47dbd8a23@lucifer.local Link: https://lkml.kernel.org/r/20241206212846.210835-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jeff Xu <jeffxu@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 3b041514cb6e ("memfd: deny writeable mappings when implying SEAL_WRITE") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysring-buffer: Flush and stop persistent ring buffer on panicMasami Hiramatsu (Google)1-0/+13
[ Upstream commit a494d3c8d5392bcdff83c2a593df0c160ff9f322 ] On real hardware, panic and machine reboot may not flush hardware cache to memory. This means the persistent ring buffer, which relies on a coherent state of memory, may not have its events written to the buffer and they may be lost. Moreover, there may be inconsistency with the counters which are used for validation of the integrity of the persistent ring buffer which may cause all data to be discarded. To avoid this issue, stop recording of the ring buffer on panic and flush the cache of the ring buffer's memory. Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance") Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Link: https://patch.msgid.link/177751969602.2136606.12031934362587643488.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysserdev: Provide a bustype shutdown functionUwe Kleine-König1-0/+1
[ Upstream commit 6d71c62b13c33ea858ab298fe20beaec5736edc7 ] To prepare serdev driver to migrate away from struct device_driver::shutdown (and then eventually remove that callback) create a serdev driver shutdown callback and migration code to keep the existing behaviour. Note this introduces a warning for each driver at register time that isn't converted yet to that callback. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/ab518883e3ed0976a19cb5b5b5faf42bd3a655b7.1765526117.git.u.kleine-koenig@baylibre.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 375ba7484132 ("Bluetooth: hci_qca: Convert timeout from jiffies to ms") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysxfrm: route MIGRATE notifications to caller's netnsMaoyi Xie1-1/+2
commit 7e2a4f7ca0952820731ef7bdadfc9a9e9d3571b4 upstream. xfrm_send_migrate() in net/xfrm/xfrm_user.c and pfkey_send_migrate() in net/key/af_key.c both hardcode &init_net for the multicast that announces a successful XFRM_MSG_MIGRATE / SADB_X_MIGRATE. XFRM_MSG_MIGRATE arrives on a per-netns NETLINK_XFRM socket, and the rest of the xfrm/af_key netlink path was made netns-aware in 2008. The other 14 multicast paths in xfrm_user.c route their event using xs_net(x), xp_net(xp) or sock_net(skb->sk); only the migrate path was missed. Two consequences of the init_net hardcoding: 1. The notification (selector, old/new endpoint addresses, and the km_address) is delivered to listeners on init_net's XFRMNLGRP_MIGRATE / pfkey BROADCAST_ALL groups rather than on the issuing netns. An IKE daemon running in init_net therefore receives migration notifications originating from any other netns on the host. 2. An IKE daemon running inside a non-init netns and subscribed to its own XFRMNLGRP_MIGRATE / pfkey groups never receives the notification of its own migration. IKEv2 MOBIKE / address-update handling inside a netns is silently broken. Thread struct net through km_migrate() and the xfrm_mgr.migrate function pointer, drop the &init_net override in xfrm_send_migrate() and pfkey_send_migrate(), and pass the caller's net (already in scope in xfrm_migrate() via sock_net(skb->sk)) all the way down. struct xfrm_mgr is in-tree only and not exported as a stable API, so the function-pointer signature change is internal. pfkey_broadcast() is already netns-aware via net_generic(net, pfkey_net_id) since the pernet conversion. The five other pfkey_broadcast() callers in af_key.c already pass xs_net(x), sock_net(sk) or a per-netns net, so this only removes the &init_net outlier. Fixes: 5c79de6e79cd ("[XFRM]: User interface for handling XFRM_MSG_MIGRATE") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysiommu, debugobjects: avoid gcc-16.1 section mismatch warningsArnd Bergmann1-0/+11
commit 4c9ad387aa2d6785299722e54224d34764edaeb3 upstream. gcc-16 has gained some more advanced inter-procedual optimization techniques that enable it to inline the dummy_tlb_add_page() and dummy_tlb_flush() function pointers into a specialized version of __arm_v7s_unmap: WARNING: modpost: vmlinux: section mismatch in reference: __arm_v7s_unmap+0x2cc (section: .text) -> dummy_tlb_add_page (section: .init.text) ERROR: modpost: Section mismatches detected. >From what I can tell, the transformation is correct, as this is only called when __arm_v7s_unmap() is called from arm_v7s_do_selftests(), which is also __init. Since __arm_v7s_unmap() however is not __init, gcc cannot inline the inner function calls directly. In debug_objects_selftest(), the same thing happens. Both the caller and the leaf function are __init, but the IPA pulls it into a non-init one: WARNING: modpost: vmlinux: section mismatch in reference: lookup_object_or_alloc+0x7c (section: .text.lookup_object_or_alloc) -> is_static_object (section: .init.text) Marking the affected functions as not "__init" would reliably avoid this issue but is not a good solution because it removes an otherwise correct annotation. I tried marking the functions as 'noinline', but that ended up not covering all the affected configurations. With some more experimenting, I found that marking these functions as __attribute__((noipa)) is both logical and reliable. In order to keep the syntax readable, add a custom macro for this in include/linux/compiler_attributes.h next to other related macros and use it to annotate both files. Link: https://lore.kernel.org/all/abRB6g-48ZX6Yl2r@willie-the-truck/ Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: linux-kbuild@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Thomas Gleixner <tglx@kernel.org> Acked-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysDisable -Wattribute-alias for clang-23 and newerNathan Chancellor4-0/+18
commit 175db11786bde9061db526bf1ac5107d915f5163 upstream. Clang recently added support for -Wattribute-alias [1], which results in the same warnings that necessitated commit bee20031772a ("disable -Wattribute-alias warning for SYSCALL_DEFINEx()") for GCC. kernel/time/itimer.c:325:1: error: alias and aliasee have different types 'long (unsigned int)' and 'long (typeof (__builtin_choose_expr((__builtin_types_compatible_p(typeof ((unsigned int)0), typeof (0LL)) || __builtin_types_compatible_p(typeof ((unsigned int)0), typeof (0ULL))), 0LL, 0L)))' (aka 'long (long)') [-Werror,-Wattribute-alias] 325 | SYSCALL_DEFINE1(alarm, unsigned int, seconds) | ^ include/linux/syscalls.h:225:36: note: expanded from macro 'SYSCALL_DEFINE1' 225 | #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__) | ^ include/linux/syscalls.h:236:2: note: expanded from macro 'SYSCALL_DEFINEx' 236 | __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) | ^ include/linux/syscalls.h:251:18: note: expanded from macro '__SYSCALL_DEFINEx' 251 | __attribute__((alias(__stringify(__se_sys##name)))); \ | ^ kernel/time/itimer.c:325:1: note: aliasee is declared here include/linux/syscalls.h:225:36: note: expanded from macro 'SYSCALL_DEFINE1' 225 | #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__) | ^ include/linux/syscalls.h:236:2: note: expanded from macro 'SYSCALL_DEFINEx' 236 | __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) | ^ include/linux/syscalls.h:255:18: note: expanded from macro '__SYSCALL_DEFINEx' 255 | asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ | ^ <scratch space>:16:1: note: expanded from here 16 | __se_sys_alarm | ^ Disable the warnings in the same way for clang-23 and newer. Disable the warning about unknown warning options to avoid breaking the build for versions of clang-23 that do not have -Wattribute-alias, such as ones deployed by vendors like Android or CI systems or when bisecting LLVM between llvmorg-23-init and release/23.x. Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/2163 Link: https://github.com/llvm/llvm-project/commit/40da6920a0d71d49dfa2392b09153600b0759f5e [1] Link: https://patch.msgid.link/20260515-syscall-disable-attribute-alias-for-clang-v1-1-9a9d95d41df6@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysparport: Fix race between port and client registrationBen Hutchings1-0/+1
commit ef15ccbb3e8640a723c42ad90eaf81d66ae02017 upstream. The parport subsystem registers port devices before they are fully initialised, resulting in a race condition where client drivers such as lp can attach to ports that are not completely initialised or even being torn down. When the port and client drivers are built as modules and loaded around the same time during boot, this occasionally results in a crash. I was able to make this happen reliably in a VM with a PC-style parallel port by patching parport_pc to fail probing: > --- a/drivers/parport/parport_pc.c > +++ b/drivers/parport/parport_pc.c > @@ -2069,7 +2069,7 @@ static struct parport *__parport_pc_probe_port(unsigned long int base, > if (!p) > goto out3; > > - base_res = request_region(base, 3, p->name); > + base_res = NULL; > if (!base_res) > goto out4; > and then running: while true; do modprobe lp & modprobe parport_pc wait rmmod lp parport_pc done for a few seconds. In the long term I think port registration should be changed to put the call to device_add() inside parport_announce_port(), but since the latter currently cannot fail this will require changing all port drivers. For now, add a flag to indicate whether a port has been "announced" and only try to attach client drivers to ports when the flag is set. Fixes: 6fa45a226897 ("parport: add device-model to parport subsystem") Closes: https://bugs.debian.org/1130365 Closes: https://lore.kernel.org/all/6ba903ad-9897-42bb-8c2d-337385cc3746@molgen.mpg.de/ Cc: stable <stable@kernel.org> Signed-off-by: Ben Hutchings <benh@debian.org> Acked-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://patch.msgid.link/afo6uBv68GDevbMD@decadent.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/dp: Add eDP 1.5 bit definitionSuraj Kandpal1-0/+1
commit 5dfc37a6b77bf6beedbd30d70184b54e1a08ccac upstream. Add the eDP revision bit value for 1.5. Spec: eDPv1.5 Table 16-5 Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Tested-by: Ben Kao <ben.kao@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250206063253.2827017-2-suraj.kandpal@intel.com Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysHID: core: introduce hid_safe_input_report()Benjamin Tissoires1-0/+2
[ Upstream commit 206342541fc887ae919774a43942dc883161fece ] hid_input_report() is used in too many places to have a commit that doesn't cross subsystem borders. Instead of changing the API, introduce a new one when things matters in the transport layers: - usbhid - i2chid This effectively revert to the old behavior for those two transport layers. Fixes: 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing bogus memset()") Cc: stable@vger.kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.com> (cherry picked from commit 301338b8edadc67a42b1c86add975091e66768d9) Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysHID: pass the buffer size to hid_report_raw_eventBenjamin Tissoires2-7/+11
[ Upstream commit 2c85c61d1332e1e16f020d76951baf167dcb6f7a ] commit 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing bogus memset()") enforced the provided data to be at least the size of the declared buffer in the report descriptor to prevent a buffer overflow. However, we can try to be smarter by providing both the buffer size and the data size, meaning that hid_report_raw_event() can make better decision whether we should plaining reject the buffer (buffer overflow attempt) or if we can safely memset it to 0 and pass it to the rest of the stack. Fixes: 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing bogus memset()") Cc: stable@vger.kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Acked-by: Johan Hovold <johan@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.com> Stable-dep-of: 206342541fc8 ("HID: core: introduce hid_safe_input_report()") (cherry picked from commit 509c2605065004fc4cd86ee50a9350d402785307) [Lee: Backported to linux-6.12.y and beyond] Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysHID: core: Add printk_ratelimited variants to hid_warn() etcVicki Pfau1-0/+11
[ Upstream commit 1d64624243af8329b4b219d8c39e28ea448f9929 ] hid_warn_ratelimited() is needed. Add the others as part of the block. Signed-off-by: Vicki Pfau <vi@endrift.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysinet: frags: flush pending skbs in fqdir_pre_exit()Jakub Kicinski2-15/+7
[ Upstream commit 006a5035b495dec008805df249f92c22c89c3d2e ] We have been seeing occasional deadlocks on pernet_ops_rwsem since September in NIPA. The stuck task was usually modprobe (often loading a driver like ipvlan), trying to take the lock as a Writer. lockdep does not track readers for rwsems so the read wasn't obvious from the reports. On closer inspection the Reader holding the lock was conntrack looping forever in nf_conntrack_cleanup_net_list(). Based on past experience with occasional NIPA crashes I looked thru the tests which run before the crash and noticed that the crash follows ip_defrag.sh. An immediate red flag. Scouring thru (de)fragmentation queues reveals skbs sitting around, holding conntrack references. The problem is that since conntrack depends on nf_defrag_ipv6, nf_defrag_ipv6 will load first. Since nf_defrag_ipv6 loads first its netns exit hooks run _after_ conntrack's netns exit hook. Flush all fragment queue SKBs during fqdir_pre_exit() to release conntrack references before conntrack cleanup runs. Also flush the queues in timer expiry handlers when they discover fqdir->dead is set, in case packet sneaks in while we're running the pre_exit flush. The commit under Fixes is not exactly the culprit, but I think previously the timer firing would eventually unblock the spinning conntrack. Fixes: d5dd88794a13 ("inet: fix various use-after-free in defrags units") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Rajani Kantha <681739313@139.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysinet: frags: add inet_frag_queue_flush()Jakub Kicinski1-3/+2
[ Upstream commit 1231eec6994be29d6bb5c303dfa54731ed9fc0e6 ] Instead of exporting inet_frag_rbtree_purge() which requires that caller takes care of memory accounting, add a new helper. We will need to call it from a few places in the next patch. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Rajani Kantha <681739313@139.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysmedia: rc: fix race between unregister and urb/irq callbacksSean Young1-2/+0
[ Upstream commit dccc0c3ddf8f16071736f98a7d6dd46a2d43e037 ] Some rc device drivers have a race condition between rc_unregister_device() and irq or urb callbacks. This is because rc_unregister_device() does two things, it marks the device as unregistered so no new commands can be issued and then it calls rc_free_device(). This means the driver has no chance to cancel any pending urb callbacks or interrupts after the device has been marked as unregistered. Those callbacks may access struct rc_dev or its members (e.g. struct ir_raw_event_ctrl), which have been freed by rc_free_device(). This change removes the implicit call to rc_free_device() from rc_unregister_device(). This means that device drivers can call rc_unregister_device() in their remove or disconnect function, then cancel all the urbs and interrupts before explicitly calling rc_free_device(). Note this is an alternative fix for an issue found by Haotian Zhang, see the Closes: tags. Reported-by: Haotian Zhang <vulab@iscas.ac.cn> Closes: https://lore.kernel.org/linux-media/20251114101432.2566-1-vulab@iscas.ac.cn/ Closes: https://lore.kernel.org/linux-media/20251114101418.2548-1-vulab@iscas.ac.cn/ Closes: https://lore.kernel.org/linux-media/20251114101346.2530-1-vulab@iscas.ac.cn/ Closes: https://lore.kernel.org/linux-media/20251114090605.2413-1-vulab@iscas.ac.cn/ Reviewed-by: Patrice Chotard <patrice.chotard@foss.st.com> Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> Stable-dep-of: 646ebdd31058 ("media: rc: ttusbir: fix inverted error logic") Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysnet: Introduce skb tc depth field to track packet loopsJamal Hadi Salim1-0/+2
[ Upstream commit 98b34f3e8c3492cfc89ff943c9d92b4d52863d1d ] Add a 2-bit per-skb tc depth field to track packet loops across the stack. The previous per-CPU loop counters like MIRRED_NEST_LIMIT assume a single call stack and lose state in two cases: 1) When a packet is queued and reprocessed later (e.g., egress->ingress via backlog), the per-cpu state is gone by the time it is dequeued. 2) With XPS/RPS a packet may arrive on one CPU and be processed on another. A per-skb field solves both by travelling with the packet itself. The field fits in existing padding, using 2 bits that were previously a hole: pahole before(-) and after (+) diff looks like: __u8 slow_gro:1; /* 132: 3 1 */ __u8 csum_not_inet:1; /* 132: 4 1 */ __u8 unreadable:1; /* 132: 5 1 */ + __u8 tc_depth:2; /* 132: 6 1 */ - /* XXX 2 bits hole, try to pack */ /* XXX 1 byte hole, try to pack */ __u16 tc_index; /* 134 2 */ There used to be a ttl field which was removed as part of tc_verd in commit aec745e2c520 ("net-tc: remove unused tc_verd fields"). It was already unused by that time, due to remove earlier in commit c19ae86a510c ("tc: remove unused redirect ttl"). The first user of this field is netem, which increments tc_depth on duplicated packets before re-enqueueing them at the root qdisc. On re-entry, netem skips duplication for any skb with tc_depth already set, bounding recursion to a single level regardless of tree topology. The other user is mirred which increments it on each pass and limits to depth to MIRRED_DEFER_LIMIT (3). The new field was called ttl in earlier versions of this patch but renamed to tc_depth to avoid confusion with IP ttl. Note (looking at you Sashiko! Dont ignore me and continue bringing this up): 1. Since both mirred and netem utilize the same 2-bit tc_depth field it is possible when netem and mirred are used together that netem qdisc to skip the duplication step. This is a known trade-off, as a 2-bit field cannot independently track both features' recursion depths and it is not considered sane to have a setup that addresses both features on at the same time. 2. skb_scrub_packet does not clear tc_depth. This means a packet's loop history is preserved even across namespaces. While this might be restrictive for some topologies, it is also design intent to provide robustness against loops across namespaces. Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260525122556.973584-2-jhs@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: e80ad525fc7e ("net/sched: act_mirred: Fix return code in early mirred redirect error paths") Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysnet/sched: act_mirred: add loop detectionEric Dumazet1-1/+8
[ Upstream commit fe946a751d9b52b7c45ca34899723b314b79b249 ] Commit 0f022d32c3ec ("net/sched: Fix mirred deadlock on device recursion") added code in the fast path, even when act_mirred is not used. Prepare its revert by implementing loop detection in act_mirred. Adds an array of device pointers in struct netdev_xmit. tcf_mirred_is_act_redirect() can detect if the array already contains the target device. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20251014171907.3554413-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: e80ad525fc7e ("net/sched: act_mirred: Fix return code in early mirred redirect error paths") Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysnet/sched: act_mirred: Move the recursion counter struct netdev_xmitSebastian Andrzej Siewior1-0/+3
[ Upstream commit 7fe70c06a182a140be9996b02256d907e114479a ] mirred_nest_level is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Move mirred_nest_level to struct netdev_xmit as u8, provide wrappers. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Link: https://patch.msgid.link/20250512092736.229935-11-bigeasy@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: e80ad525fc7e ("net/sched: act_mirred: Fix return code in early mirred redirect error paths") Signed-off-by: Sasha Levin <sashal@kernel.org>
11 dayskunit: fix use-after-free in debugfs when using kunit.filterFlorian Schmaus1-0/+1
[ Upstream commit fb6988b83b4cafe8db63999c1ddff1b7c66d2ff5 ] When the kernel is booted with a kunit filter (e.g., kunit.filter="speed!=slow"), the kunit executor dynamically allocates copies of the filtered test suites using kmalloc/kmemdup. During the initial boot execution, kunit_debugfs_create_suite() creates debugfs files (such as /sys/kernel/debug/kunit/<suite>/run) and permanently stores a pointer to the dynamically allocated suite in the inode's i_private field. Previously, the executor freed this dynamically allocated suite_set immediately after executing the boot-time tests. Because the debugfs nodes were not destroyed, any subsequent interaction with the debugfs `run` file from userspace triggered a use-after-free (UAF). On systems with architectural capabilities, like CHERI RISC-V, this resulted in an immediate fatal hardware exception due to the invalidation of the capability tags on the reclaimed memory. On other architectures, it resulted in silent memory corruption. Fix this UAF by properly coupling the lifetime of the filtered suite memory allocation to the lifetime of the kunit subsystem and its associated VFS nodes. Ownership of the boot-time suite_set is now transferred to a global tracker ('kunit_boot_suites'), and the memory is cleanly released in kunit_exit() during module teardown. Link: https://lore.kernel.org/r/20260507084854.233984-1-florian.schmaus@codasip.com Fixes: e2219db280e3 ("kunit: add debugfs /sys/kernel/debug/kunit/<suite>/results display") Signed-off-by: Florian Schmaus <florian.schmaus@codasip.com> Reviewed-by: Martin Kaiser <martin@kaiser.cx> Reviewed-by: David Gow <david@davidgow.net> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01block: make bio_integrity_map_user() static inlineJens Axboe1-1/+1
[ Upstream commit 546d191427cf5cf3215529744c2ea8558f0279db ] If CONFIG_BLK_DEV_INTEGRITY isn't set, then the dummy helper must be static inline to avoid complaints about the function being unused. Fixes: fe8f4ca7107e ("block: modify bio_integrity_map_user to accept iov_iter as argument") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202411300229.y7h60mDg-lkp@intel.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01block: modify bio_integrity_map_user to accept iov_iter as argumentAnuj Gupta1-3/+2
[ Upstream commit fe8f4ca7107e968b0eb7328155c8811f2a19424a ] This patch refactors bio_integrity_map_user to accept iov_iter as argument. This is a prep patch. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20241128112240.8867-4-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 8582792cf23b ("block: bio-integrity: Fix null-ptr-deref in bio_integrity_map_user()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01blk-integrity: remove seed for user mapped buffersKeith Busch2-5/+4
[ Upstream commit 133008e84b99e4f5f8cf3d8b768c995732df9406 ] The seed is only used for kernel generation and verification. That doesn't happen for user buffers, so passing the seed around doesn't accomplish anything. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20241016201309.1090320-1-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 637ad3a56a3b ("block: don't overwrite bip_vcnt in bio_integrity_copy_user()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01netfs: Fix folio->private handling in netfs_perform_write()David Howells1-0/+1
[ Upstream commit ccde2ac757c713535b224233a296de40efe5212d ] Under some circumstances, netfs_perform_write() doesn't correctly manipulate folio->private between NULL, NETFS_FOLIO_COPY_TO_CACHE, pointing to a group and pointing to a netfs_folio struct, leading to potential multiple attachments of private data with associated folio ref leaks and also leaks of netfs_folio structs or netfs_group refs. Fix this by consolidating the place at which a folio is marked uptodate in one place and having that look at what's attached to folio->private and decide how to clean it up and then set the new group. Also, the content shouldn't be flushed if group is NULL, even if a group is specified in the netfs_group parameter, as that would be the case for a new folio. A filesystem should always specify netfs_group or never specify netfs_group. The Sashiko auto-review tool noted that it was theoretically possible that the fpos >= ctx->zero_point section might leak if it modified a streaming write folio. This is unlikely, but with a network filesystem, third party changes can happen. It also pointed out that __netfs_set_group() would leak if called multiple times on the same folio from the "whole folio modify section". Fixes: 8f52de0077ba ("netfs: Reduce number of conditional branches in netfs_perform_write()") Closes: https://sashiko.dev/#/patchset/20260414082004.3756080-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-22-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01netfs: Fix streaming write being overwrittenDavid Howells1-0/+3
[ Upstream commit 7b4dcf1b9455a6e52ac7478b4057dbe10359576d ] In order to avoid reading whilst writing, netfslib will allow "streaming writes" in which dirty data is stored directly into folios without reading them first. Such folios are marked dirty but may not be marked uptodate. If a folio is entirely written by a streaming write, uptodate will be set, otherwise it will have a netfs_folio struct attached to ->private recording the dirty region. In the event that a partially written streaming write page is to be overwritten entirely by a single write(), netfs_perform_write() will try to copy over it, but doesn't discard the netfs_folio if it succeeds; further, it doesn't correctly handle a partial copy that overwrites some of the dirty data. Fix this by the following: (1) If the folio is successfully overwritten, free the netfs_folio struct before marking the page uptodate. (2) If the copy to the folio partially fails, but short of the dirty data, just ignore the copy. (3) If the copy partially fails and overwrites some of the dirty data, accept the copy, update the netfs_folio struct to record the new data. If the folio is now filled, free the netfs_folio and set uptodate, otherwise return a partial write. Found with: fsx -q -N 1000000 -p 10000 -o 128000 -l 600000 \ /xfstest.test/junk --replay-ops=junk.fsxops using the following as junk.fsxops: truncate 0x0 0 0x927c0 write 0x63fb8 0x53c8 0 copy_range 0xb704 0x19b9 0x24429 0x79380 write 0x2402b 0x144a2 0x90660 * write 0x204d5 0x140a0 0x927c0 * copy_range 0x1f72c 0x137d0 0x7a906 0x927c0 * read 0x00000 0x20000 0x9157c read 0x20000 0x20000 0x9157c read 0x40000 0x20000 0x9157c read 0x60000 0x20000 0x9157c read 0x7e1a0 0xcfb9 0x9157c on cifs with the default cache option. It shows folio 0x24 misbehaving if the FMODE_READ check is commented out in netfs_perform_write(): if (//(file->f_mode & FMODE_READ) || netfs_is_cache_enabled(ctx)) { and no fscache. This was initially found with the generic/522 xfstest. Fixes: 8f52de0077ba ("netfs: Reduce number of conditional branches in netfs_perform_write()") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-14-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01netfs: Fix netfs_invalidate_folio() to clear dirty bit if all changes goneDavid Howells1-0/+4
[ Upstream commit 156ac2ec2ee77c44c4eb7439d6d165247ba12247 ] If a streaming write is made, this will leave the relevant modified folio in a not-uptodate, but dirty state with a netfs_folio struct hung off of folio->private indicating the dirty range. Subsequently truncating the file such that the dirty data in the folio is removed, but the first part of the folio theoretically remains will cause the netfs_folio struct to be discarded... but will leave the dirty flag set. If the folio is then read via mmap(), netfs_read_folio() will see that the page is dirty and jump to netfs_read_gaps() to fill in the missing bits. netfs_read_gaps(), however, expects there to be a netfs_folio struct present and can oops because truncate removed it. Fix this by calling folio_cancel_dirty() in netfs_invalidate_folio() in the event that all the dirty data in the folio is erased (as nfs does). Also add some tracepoints to log modifications to a dirty page. This can be reproduced with something like: dd if=/dev/zero of=/xfstest.test/foo bs=1M count=1 umount /xfstest.test mount /xfstest.test xfs_io -c "w 0xbbbf 0xf96c" \ -c "truncate 0xbbbf" \ -c "mmap -r 0xb000 0x11000" \ -c "mr 0xb000 0x11000" \ /xfstest.test/foo with fscaching disabled (otherwise streaming writes are suppressed) and a change to netfs_perform_write() to disallow streaming writes if the fd is open O_RDWR: if (//(file->f_mode & FMODE_READ) || <--- comment this out netfs_is_cache_enabled(ctx)) { It should be reproducible even without this change, but if prevents the above trivial xfs_io command from reproducing it. Note that the initial dd is important: the file must start out sufficiently large that the zero-point logic doesn't just clear the gaps because it knows there's nothing in the file to read yet. Unmounting and mounting is needed to clear the pagecache (there are other ways to do that that may also work). This was initially reproduced with the generic/522 xfstest on some patches that remove the FMODE_READ restriction. Fixes: 9ebff83e6481 ("netfs: Prep to use folio->private for write grouping and streaming write") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260512123404.719402-12-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01kprobes: skip non-symbol addresses in kprobe_add_ksym_blacklist()Jianpeng Chang1-1/+1
[ Upstream commit 307abfac04a254c09c5705d816b33354acee97a0 ] When kprobe_add_area_blacklist() iterates through a section like .kprobes.text, the start address may not correspond to a named symbol. On ARM64 with CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS=y (introduced by commit baaf553d3bc3 ("arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS")), the compiler flag -fpatchable-function-entry=4,2 inserts 2 NOPs before each function entry point for ftrace call_ops. These pre-function NOPs sit at the section base address, before the first named function symbol. The compiler emits a $x mapping symbol at offset 0x00 to mark the start of code, but find_kallsyms_symbol() ignores mapping symbols. Without CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS (e.g. defconfig), no pre-function NOPs are inserted, the first function starts at offset 0x00, and the bug does not trigger. This only affects modules that have a .kprobes.text section (i.e. those using the __kprobes annotation). Modules using NOKPROBE_SYMBOL() instead (like kretprobe_example.ko) blacklist exact function addresses via the _kprobe_blacklist section and are not affected. For kprobe_example.ko on ARM64 with -fpatchable-function-entry=4,2, the .kprobes.text section layout is: offset 0x00: $x + 2 NOPs (mapping symbol + ftrace preamble) offset 0x08: handler_post (64 bytes) offset 0x50: handler_pre (68 bytes) kprobe_add_area_blacklist() starts iterating from the section base address (offset 0x00), which only has the $x mapping symbol. kprobe_add_ksym_blacklist() then calls kallsyms_lookup_size_offset() for this address, which goes through: kallsyms_lookup_size_offset() -> module_address_lookup() -> find_kallsyms_symbol() find_kallsyms_symbol() scans all module symbols to find the closest preceding symbol. Since no named text symbol exists at offset 0x00, find_kallsyms_symbol() picks __UNIQUE_ID_vermagic (a .modinfo symbol whose address is in the temporary image) as the "best" match. The computed "size" = next_text_symbol - modinfo_symbol spans across these two unrelated memory regions, creating a blacklist entry with a bogus range of tens of terabytes. Whether this causes a visible failure depends on address randomization, here is what happens on Raspberry Pi 4/5: - On RPi5, the bogus size was ~35 TB. start + size stayed within 64-bit range, so the blacklist entry covered the entire kernel text. register_kprobe() in the module's own init function failed with -EINVAL. - On RPi4, the bogus size was ~75 TB. start + size overflowed 64 bits and wrapped to a small address near zero. The range check (addr >= start && addr < end) then failed because end wrapped around, so the bogus entry was accidentally harmless and kprobes worked by luck. The same bug exists on both machines, but randomization determines whether the integer overflow masks it or not. Fix this by adding notrace to the __kprobes macro. Functions in .kprobes.text are kprobe infrastructure handlers that should never be traced by ftrace. With notrace, the compiler stops inserting them and the non-symbol gap at the section start disappears entirely. Link: https://lore.kernel.org/all/20260506012706.2785785-1-jianpeng.chang.cn@windriver.com/ Fixes: baaf553d3bc3 ("arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS") Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01btrfs: tracepoints: fix sleep while in atomic context in btrfs_sync_file()Filipe Manana1-3/+1
[ Upstream commit c73370c677646e86fc4b1780fb07027bdf847375 ] The trace event btrfs_sync_file() is called in an atomic context (all trace events are) and its call to dput(), which is needed due to the call to dget_parent(), can sleep, triggering a kernel splat. This can be reproduced by enabling the trace event and running btrfs/056 from fstests for example. The splat shown in dmesg is the following: [53.919] BUG: sleeping function called from invalid context at fs/dcache.c:970 [53.947] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 32773, name: xfs_io [53.988] preempt_count: 2, expected: 0 [53.967] RCU nest depth: 0, expected: 0 [53.943] Preemption disabled at: [53.944] [<0000000000000000>] 0x0 [54.078] CPU: 0 UID: 0 PID: 32773 Comm: xfs_io Tainted: G W 7.1.0-rc1-btrfs-next-232+ #1 PREEMPT(full) [54.070] Tainted: [W]=WARN [54.071] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [54.072] Call Trace: [54.074] <TASK> [54.076] dump_stack_lvl+0x56/0x80 [54.079] __might_resched.cold+0xd6/0x10f [54.072] dput.part.0+0x24/0x110 [54.078] trace_event_raw_event_btrfs_sync_file+0x75/0x140 [btrfs] [54.089] btrfs_sync_file+0x1ed/0x530 [btrfs] [54.087] ? __handle_mm_fault+0x8ae/0xed0 [54.089] btrfs_do_write_iter+0x172/0x210 [btrfs] [54.091] vfs_write+0x21f/0x450 [54.094] __x64_sys_pwrite64+0x8d/0xc0 [54.096] ? do_user_addr_fault+0x20c/0x670 [54.099] do_syscall_64+0x60/0xf20 [54.092] ? clear_bhb_loop+0x60/0xb0 [54.094] entry_SYSCALL_64_after_hwframe+0x76/0x7e So stop using dget_parent() and dput() and access the parent dentry directly as dentry->d_parent. This is also what ext4 is doing in its equivalent trace event ext4_sync_file_enter(). Fixes: a85b46db143f ("btrfs: tracepoints: get correct superblock from dentry in event btrfs_sync_file()") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01firmware: arm_ffa: Unregister the FF-A devices when cleaning up the partitionsSudeep Holla1-0/+3
[ Upstream commit 46dcd68aaccac0812c12ec3f4e59c8963e2760ad ] Both the FF-A core and the bus were in a single module before the commit 18c250bd7ed0 ("firmware: arm_ffa: Split bus and driver into distinct modules"). The arm_ffa_bus_exit() takes care of unregistering all the FF-A devices. Now that there are 2 distinct modules, if the core driver is unloaded and reloaded, it will end up adding duplicate FF-A devices as the previously registered devices weren't unregistered when we cleaned up the modules. Fix the same by unregistering all the FF-A devices on the FF-A bus during the cleaning up of the partitions and hence the cleanup of the module. Fixes: 18c250bd7ed0 ("firmware: arm_ffa: Split bus and driver into distinct modules") Tested-by: Viresh Kumar <viresh.kumar@linaro.org> Message-Id: <20250217-ffa_updates-v3-8-bd1d9de615e7@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Stable-dep-of: 6d3daa9b8d31 ("firmware: arm_ffa: Unregister bus notifier on teardown for FF-A v1.0") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-06-01device property: set fwnode->secondary to NULL in fwnode_init()Bartosz Golaszewski1-0/+1
commit 215c90ee656114f5e8c32408228d97082f8e0eef upstream. If a firmware node is allocated on the stack (for instance: temporary software node whose life-time we control) or on the heap - but using a non-zeroing allocation function - and initialized using fwnode_init(), its secondary pointer will contain uninitalized memory which likely will be neither NULL nor IS_ERR() and so may end up being dereferenced (for example: in dev_to_swnode()). Set fwnode->secondary to NULL on initialization. Cc: stable <stable@kernel.org> Fixes: 01bb86b380a3 ("driver core: Add fwnode_init()") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://patch.msgid.link/20260506115701.23035-1-bartosz.golaszewski@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-06-01netfilter: nf_queue: hold bridge skb->dev while queuedHaoze Xie1-0/+1
commit e196115ec330a18de415bdb9f5071aa9f08e53ce upstream. br_pass_frame_up() rewrites skb->dev from the ingress port to the bridge master before queueing bridge LOCAL_IN packets. NFQUEUE only holds references on state.in/out and bridge physdevs, so a queued bridge packet can retain a freed bridge master in skb->dev until reinjection. When the verdict is reinjected later, br_netif_receive_skb() re-enters the receive path with skb->dev still pointing at the freed bridge master, triggering a use-after-free. Store skb->dev in the queue entry, hold a reference on it for the queue lifetime, and use the saved device when dropping queued packets during NETDEV_DOWN handling. Fixes: ac2863445686 ("netfilter: bridge: add nf_afinfo to enable queuing to userspace") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Haoze Xie <royenheart@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>