summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
39 hoursstring: add mem_is_zero() helper to check if memory area is all zerosJani Nikula1-0/+12
[ Upstream commit 3942bb49728ad9e1f94d953a88af169a8f5d8099 ] Almost two thirds of the memchr_inv() usages check if the memory area is all zeros, with no interest in where in the buffer the first non-zero byte is located. Checking for !memchr_inv(s, 0, n) is also not very intuitive or discoverable. Add an explicit mem_is_zero() helper for this use case. Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240814100035.3100852-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> Stable-dep-of: 3e6ccd790ed6 ("gpio: cdev: check if uAPI v2 config attributes are correctly zeroed") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursdevice property: set fwnode->secondary to NULL in fwnode_init()Bartosz Golaszewski1-0/+1
commit 215c90ee656114f5e8c32408228d97082f8e0eef upstream. If a firmware node is allocated on the stack (for instance: temporary software node whose life-time we control) or on the heap - but using a non-zeroing allocation function - and initialized using fwnode_init(), its secondary pointer will contain uninitalized memory which likely will be neither NULL nor IS_ERR() and so may end up being dereferenced (for example: in dev_to_swnode()). Set fwnode->secondary to NULL on initialization. Cc: stable <stable@kernel.org> Fixes: 01bb86b380a3 ("driver core: Add fwnode_init()") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://patch.msgid.link/20260506115701.23035-1-bartosz.golaszewski@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hoursnetfilter: nf_queue: hold bridge skb->dev while queuedHaoze Xie1-0/+1
commit e196115ec330a18de415bdb9f5071aa9f08e53ce upstream. br_pass_frame_up() rewrites skb->dev from the ingress port to the bridge master before queueing bridge LOCAL_IN packets. NFQUEUE only holds references on state.in/out and bridge physdevs, so a queued bridge packet can retain a freed bridge master in skb->dev until reinjection. When the verdict is reinjected later, br_netif_receive_skb() re-enters the receive path with skb->dev still pointing at the freed bridge master, triggering a use-after-free. Store skb->dev in the queue entry, hold a reference on it for the queue lifetime, and use the saved device when dropping queued packets during NETDEV_DOWN handling. Fixes: ac2863445686 ("netfilter: bridge: add nf_afinfo to enable queuing to userspace") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Haoze Xie <royenheart@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hoursbtrfs: tracepoints: fix sleep while in atomic context in btrfs_sync_file()Filipe Manana1-3/+1
[ Upstream commit c73370c677646e86fc4b1780fb07027bdf847375 ] The trace event btrfs_sync_file() is called in an atomic context (all trace events are) and its call to dput(), which is needed due to the call to dget_parent(), can sleep, triggering a kernel splat. This can be reproduced by enabling the trace event and running btrfs/056 from fstests for example. The splat shown in dmesg is the following: [53.919] BUG: sleeping function called from invalid context at fs/dcache.c:970 [53.947] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 32773, name: xfs_io [53.988] preempt_count: 2, expected: 0 [53.967] RCU nest depth: 0, expected: 0 [53.943] Preemption disabled at: [53.944] [<0000000000000000>] 0x0 [54.078] CPU: 0 UID: 0 PID: 32773 Comm: xfs_io Tainted: G W 7.1.0-rc1-btrfs-next-232+ #1 PREEMPT(full) [54.070] Tainted: [W]=WARN [54.071] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [54.072] Call Trace: [54.074] <TASK> [54.076] dump_stack_lvl+0x56/0x80 [54.079] __might_resched.cold+0xd6/0x10f [54.072] dput.part.0+0x24/0x110 [54.078] trace_event_raw_event_btrfs_sync_file+0x75/0x140 [btrfs] [54.089] btrfs_sync_file+0x1ed/0x530 [btrfs] [54.087] ? __handle_mm_fault+0x8ae/0xed0 [54.089] btrfs_do_write_iter+0x172/0x210 [btrfs] [54.091] vfs_write+0x21f/0x450 [54.094] __x64_sys_pwrite64+0x8d/0xc0 [54.096] ? do_user_addr_fault+0x20c/0x670 [54.099] do_syscall_64+0x60/0xf20 [54.092] ? clear_bhb_loop+0x60/0xb0 [54.094] entry_SYSCALL_64_after_hwframe+0x76/0x7e So stop using dget_parent() and dput() and access the parent dentry directly as dentry->d_parent. This is also what ext4 is doing in its equivalent trace event ext4_sync_file_enter(). Fixes: a85b46db143f ("btrfs: tracepoints: get correct superblock from dentry in event btrfs_sync_file()") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursbonding: 3ad: implement proper RCU rules for port->aggregatorEric Dumazet1-1/+1
[ Upstream commit c4f050ce06c56cfb5993268af4a5cb66ed1cd04e ] syzbot found a data-race in bond_3ad_get_active_agg_info / bond_3ad_state_machine_handler [1] which hints at lack of proper RCU implementation. Add __rcu qualifier to port->aggregator, and add proper RCU API. [1] BUG: KCSAN: data-race in bond_3ad_get_active_agg_info / bond_3ad_state_machine_handler write to 0xffff88813cf5c4b0 of 8 bytes by task 36 on cpu 0: ad_port_selection_logic drivers/net/bonding/bond_3ad.c:1659 [inline] bond_3ad_state_machine_handler+0x9d5/0x2d60 drivers/net/bonding/bond_3ad.c:2569 process_one_work kernel/workqueue.c:3302 [inline] process_scheduled_works+0x4f0/0x9c0 kernel/workqueue.c:3385 worker_thread+0x58a/0x780 kernel/workqueue.c:3466 kthread+0x22a/0x280 kernel/kthread.c:436 ret_from_fork+0x146/0x330 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 read to 0xffff88813cf5c4b0 of 8 bytes by task 22063 on cpu 1: __bond_3ad_get_active_agg_info drivers/net/bonding/bond_3ad.c:2858 [inline] bond_3ad_get_active_agg_info+0x8c/0x230 drivers/net/bonding/bond_3ad.c:2881 bond_fill_info+0xe0f/0x10f0 drivers/net/bonding/bond_netlink.c:853 rtnl_link_info_fill net/core/rtnetlink.c:906 [inline] rtnl_link_fill+0x1d7/0x4e0 net/core/rtnetlink.c:927 rtnl_fill_ifinfo+0xf8e/0x1380 net/core/rtnetlink.c:2168 rtmsg_ifinfo_build_skb+0x11c/0x1b0 net/core/rtnetlink.c:4453 rtmsg_ifinfo_event net/core/rtnetlink.c:4486 [inline] rtmsg_ifinfo+0x6d/0x110 net/core/rtnetlink.c:4495 __dev_notify_flags+0x76/0x390 net/core/dev.c:9790 netif_change_flags+0xac/0xd0 net/core/dev.c:9823 do_setlink+0x905/0x2950 net/core/rtnetlink.c:3180 rtnl_group_changelink net/core/rtnetlink.c:3813 [inline] __rtnl_newlink net/core/rtnetlink.c:3981 [inline] rtnl_newlink+0xf55/0x1400 net/core/rtnetlink.c:4109 rtnetlink_rcv_msg+0x64b/0x720 net/core/rtnetlink.c:6995 netlink_rcv_skb+0x123/0x220 net/netlink/af_netlink.c:2550 rtnetlink_rcv+0x1c/0x30 net/core/rtnetlink.c:7022 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x5a8/0x680 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x5c8/0x6f0 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:787 [inline] __sock_sendmsg net/socket.c:802 [inline] ____sys_sendmsg+0x563/0x5b0 net/socket.c:2698 ___sys_sendmsg+0x195/0x1e0 net/socket.c:2752 __sys_sendmsg net/socket.c:2784 [inline] __do_sys_sendmsg net/socket.c:2789 [inline] __se_sys_sendmsg net/socket.c:2787 [inline] __x64_sys_sendmsg+0xd4/0x160 net/socket.c:2787 x64_sys_call+0x194c/0x3020 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x12c/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f value changed: 0x0000000000000000 -> 0xffff88813cf5c400 Reported by Kernel Concurrency Sanitizer on: CPU: 1 UID: 0 PID: 22063 Comm: syz.0.31122 Tainted: G W syzkaller #0 PREEMPT(full) Tainted: [W]=WARN Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026 Fixes: 47e91f56008b ("bonding: use RCU protection for 3ad xmit path") Reported-by: syzbot+9bb2ff2a4ab9e17307e1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69f0a82f.050a0220.3aadc4.0000.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Link: https://patch.msgid.link/20260428123207.3809211-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursbonding: print churn state via netlinkHangbin Liu1-0/+2
[ Upstream commit 4916f2e2f3fc9aef289fcd07949301e5c29094c2 ] Currently, the churn state is printed only in sysfs. Add netlink support so users could get the state via netlink. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260224020215.6012-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: c4f050ce06c5 ("bonding: 3ad: implement proper RCU rules for port->aggregator") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursbonding: add support for per-port LACP actor priorityHangbin Liu3-0/+3
[ Upstream commit 6b6dc81ee7e8ca87c71a533e1d69cf96a4f1e986 ] Introduce a new netlink attribute 'actor_port_prio' to allow setting the LACP actor port priority on a per-slave basis. This extends the existing bonding infrastructure to support more granular control over LACP negotiations. The priority value is embedded in LACPDU packets and will be used by subsequent patches to influence aggregator selection policies. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250902064501.360822-2-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: c4f050ce06c5 ("bonding: 3ad: implement proper RCU rules for port->aggregator") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursnet: bonding: add broadcast_neighbor option for 802.3adTonghao Zhang2-0/+4
[ Upstream commit ce7a381697cb3958ffe0b45e5028ac69444e9288 ] Stacking technology is a type of technology used to expand ports on Ethernet switches. It is widely used as a common access method in large-scale Internet data center architectures. Years of practice have proved that stacking technology has advantages and disadvantages in high-reliability network architecture scenarios. For instance, in stacking networking arch, conventional switch system upgrades require multiple stacked devices to restart at the same time. Therefore, it is inevitable that the business will be interrupted for a while. It is for this reason that "no-stacking" in data centers has become a trend. Additionally, when the stacking link connecting the switches fails or is abnormal, the stack will split. Although it is not common, it still happens in actual operation. The problem is that after the split, it is equivalent to two switches with the same configuration appearing in the network, causing network configuration conflicts and ultimately interrupting the services carried by the stacking system. To improve network stability, "non-stacking" solutions have been increasingly adopted, particularly by public cloud providers and tech companies like Alibaba, Tencent, and Didi. "non-stacking" is a method of mimicing switch stacking that convinces a LACP peer, bonding in this case, connected to a set of "non-stacked" switches that all of its ports are connected to a single switch (i.e., LACP aggregator), as if those switches were stacked. This enables the LACP peer's ports to aggregate together, and requires (a) special switch configuration, described in the linked article, and (b) modifications to the bonding 802.3ad (LACP) mode to send all ARP/ND packets across all ports of the active aggregator. Note that, with multiple aggregators, the current broadcast mode logic will send only packets to the selected aggregator(s). +-----------+ +-----------+ | switch1 | | switch2 | +-----------+ +-----------+ ^ ^ | | +-----------------+ | bond4 lacp | +-----------------+ | | | NIC1 | NIC2 +-----------------+ | server | +-----------------+ - https://www.ruijie.com/fr-fr/support/tech-gallery/de-stack-data-center-network-architecture/ Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com> Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com> Link: https://patch.msgid.link/84d0a044514157bb856a10b6d03a1028c4883561.1751031306.git.tonghao@bamaicloud.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: c4f050ce06c5 ("bonding: 3ad: implement proper RCU rules for port->aggregator") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursipv6: rename and move ip6_dst_lookup_tunnel()Beniamino Galvani2-6/+7
[ Upstream commit fc47e86dbfb75a864c0c9dd8e78affb6506296bb ] At the moment ip6_dst_lookup_tunnel() is used only by bareudp. Ideally, other UDP tunnel implementations should use it, but to do so the function needs to accept new parameters that are specific for UDP tunnels, such as the ports. Prepare for these changes by renaming the function to udp_tunnel6_dst_lookup() and move it to file net/ipv6/ip6_udp_tunnel.c. This is similar to what already done for IPv4 in commit bf3fcbf7e7a0 ("ipv4: rename and move ip_route_output_tunnel()"). Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: aa6c6d9ee064 ("bareudp: fix NULL pointer dereference in bareudp_fill_metadata_dst()") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursipv4: add new arguments to udp_tunnel_dst_lookup()Beniamino Galvani1-3/+5
[ Upstream commit 72fc68c6356b663a8763f02d9b0ec773d59a4949 ] We want to make the function more generic so that it can be used by other UDP tunnel implementations such as geneve and vxlan. To do that, add the following arguments: - source and destination UDP port; - ifindex of the output interface, needed by vxlan; - the tos, because in some cases it is not taken from struct ip_tunnel_info (for example, when it's inherited from the inner packet); - the dst cache, because not all tunnel types (e.g. vxlan) want to use the one from struct ip_tunnel_info. With these parameters, the function no longer needs the full struct ip_tunnel_info as argument and we can pass only the relevant part of it (struct ip_tunnel_key). Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: aa6c6d9ee064 ("bareudp: fix NULL pointer dereference in bareudp_fill_metadata_dst()") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursipv4: remove "proto" argument from udp_tunnel_dst_lookup()Beniamino Galvani1-1/+1
[ Upstream commit 78f3655adcb52412275f282267ee771421731632 ] The function is now UDP-specific, the protocol is always IPPROTO_UDP. Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: aa6c6d9ee064 ("bareudp: fix NULL pointer dereference in bareudp_fill_metadata_dst()") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursipv4: rename and move ip_route_output_tunnel()Beniamino Galvani2-6/+6
[ Upstream commit bf3fcbf7e7a08015d3b169bad6281b29d45c272d ] At the moment ip_route_output_tunnel() is used only by bareudp. Ideally, other UDP tunnel implementations should use it, but to do so the function needs to accept new parameters that are specific for UDP tunnels, such as the ports. Prepare for these changes by renaming the function to udp_tunnel_dst_lookup() and move it to file net/ipv4/udp_tunnel_core.c. Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: aa6c6d9ee064 ("bareudp: fix NULL pointer dereference in bareudp_fill_metadata_dst()") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hourscdrom, scsi: sr: propagate read-only status to block layer via set_disk_ro()Daan De Meyer1-0/+1
[ Upstream commit 0898a817621a2f0cddca8122d9b974003fe5036d ] The cdrom core never calls set_disk_ro() for a registered device, so BLKROGET on a CD-ROM device always returns 0 (writable), even when the drive has no write capabilities and writes will inevitably fail. This causes problems for userspace that relies on BLKROGET to determine whether a block device is read-only. For example, systemd's loop device setup uses BLKROGET to decide whether to create a loop device with LO_FLAGS_READ_ONLY. Without the read-only flag, writes pass through the loop device to the CD-ROM and fail with I/O errors. systemd-fsck similarly checks BLKROGET to decide whether to run fsck in no-repair mode (-n). The write-capability bits in cdi->mask come from two different sources: CDC_DVD_RAM and CDC_CD_RW are populated by the driver from the MODE SENSE capabilities page (page 0x2A) before register_cdrom() is called, while CDC_MRW_W and CDC_RAM require the MMC GET CONFIGURATION command and were only probed by cdrom_open_write() at device open time. This meant that any attempt to compute the writable state from the full mask at probe time was incorrect, because the GET CONFIGURATION bits were still unset (and cdi->mask is initialized such that capabilities are assumed present). Fix this by factoring the GET CONFIGURATION probing out of cdrom_open_write() into a new exported helper, cdrom_probe_write_features(), and having sr call it from sr_probe() right after get_capabilities() has populated the MODE SENSE bits. register_cdrom() then calls set_disk_ro() based on the full write-capability mask (CDC_DVD_RAM | CDC_MRW_W | CDC_RAM | CDC_CD_RW) so the block layer reflects the drive's actual write support. The feature queries used (CDF_MRW and CDF_RWRT via GET CONFIGURATION with RT=00) report drive-level capabilities that are persistent across media, so a single probe before register_cdrom() is sufficient and the redundant probe at open time is dropped. With set_disk_ro() now accurate, the long-vestigial cd->writeable flag in sr can go: get_capabilities() used to set cd->writeable based on the same four mask bits, but because CDC_MRW_W and CDC_RAM default to "capability present" in cdi->mask and aren't touched by MODE SENSE, the condition that gated cd->writeable was always true, making it unconditionally 1. Replace the corresponding gate in sr_init_command() with get_disk_ro(cd->disk), which turns a previously no-op check into a real one and also catches kernel-internal bio writers that bypass blkdev_write_iter()'s bdev_read_only() check. The sd driver (SCSI disks) does not have this problem because it checks the MODE SENSE Write Protect bit and calls set_disk_ro() accordingly. The sr driver cannot use the same approach because the MMC specification does not define the WP bit in the MODE SENSE device-specific parameter byte for CD-ROM devices. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daan De Meyer <daan@amutable.com> Reviewed-by: Phillip Potter <phil@philpotter.co.uk> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Link: https://patch.msgid.link/20260427210139.1400-2-phil@philpotter.co.uk Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursnet/sched: sch_pie: annotate data-races in pie_dump_stats()Eric Dumazet1-1/+1
[ Upstream commit 5154561d9b119f781249f8e845fecf059b38b483 ] pie_dump_stats() only runs with RTNL held, reading fields that can be changed in qdisc fast path. Add READ_ONCE()/WRITE_ONCE() annotations. Alternative would be to acquire the qdisc spinlock, but our long-term goal is to make qdisc dump operations lockless as much as we can. tc_pie_xstats fields don't need to be latched atomically, otherwise this bug would have been caught earlier. Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260421142944.4009941-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hourspppoe: drop PFC framesQingfang Deng1-0/+16
[ Upstream commit cc1ff87bce1ccd38410ab10960f576dcd17db679 ] RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT RECOMMENDED for PPPoE. In practice, pppd does not support negotiating PFC for PPPoE sessions, and the current PPPoE driver assumes an uncompressed (2-byte) protocol field. However, the generic PPP layer function ppp_input() is not aware of the negotiation result, and still accepts PFC frames. If a peer with a broken implementation or an attacker sends a frame with a compressed (1-byte) protocol field, the subsequent PPP payload is shifted by one byte. This causes the network header to be 4-byte misaligned, which may trigger unaligned access exceptions on some architectures. To reduce the attack surface, drop PPPoE PFC frames. Introduce ppp_skb_is_compressed_proto() helper function to be used in both ppp_generic.c and pppoe.c to avoid open-coding. Fixes: 7fb1b8ca8fa1 ("ppp: Move PFC decompression to PPP generic layer") Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260415022456.141758-2-qingfang.deng@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hourstcp: preserve const qualifier in tcp_sk()Eric Dumazet2-5/+7
[ Upstream commit e9d9da91548b21e189fcd0259a0f2d26d1afc509 ] We can change tcp_sk() to propagate its argument const qualifier, thanks to container_of_const(). We have two places where a const sock pointer has to be upgraded to a write one. We have been using const qualifier for lockless listeners to clearly identify points where writes could happen. Add tcp_sk_rw() helper to better document these. tcp_inbound_md5_hash(), __tcp_grow_window(), tcp_reset_check() and tcp_rack_reo_wnd() get an additional const qualififer for their @tp local variables. smc_check_reset_syn_req() also needs a similar change. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 21e92a38cfd8 ("tcp: add data-race annotations around tp->data_segs_out and tp->total_retrans") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hourscontainer_of: add container_of_const() that preserves const-ness of the pointerGreg Kroah-Hartman1-0/+13
[ Upstream commit 64f6a5d1922bf6d2b2d845de20d4563a6f328e2d ] container_of does not preserve the const-ness of a pointer that is passed into it, which can cause C code that passes in a const pointer to get a pointer back that is not const and then scribble all over the data in it. To prevent this, container_of_const() will preserve the const status of the pointer passed into it using the newly available _Generic() method. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Suggested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20221205121206.166576-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 21e92a38cfd8 ("tcp: add data-race annotations around tp->data_segs_out and tp->total_retrans") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hourscontainer_of: remove container_of_safe()Greg Kroah-Hartman1-16/+0
[ Upstream commit 848dba781f1951636c966c9f3a6a41a5b2f8b572 ] It came in from a staging driver that has been long removed from the tree, and there are no in-kernel users of the macro, and it's very dubious if anyone should ever use this thing, so just remove it entirely. Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20221024123933.3331116-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 21e92a38cfd8 ("tcp: add data-race annotations around tp->data_segs_out and tp->total_retrans") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hourslib/hexdump: print_hex_dump_bytes() calls print_hex_dump_debug()Geert Uytterhoeven1-2/+3
[ Upstream commit 36776b7f8a8955b4e75b5d490a75fee0c7a2a7ef ] print_hex_dump_bytes() claims to be a simple wrapper around print_hex_dump(), but it actally calls print_hex_dump_debug(), which means no output is printed if (dynamic) DEBUG is disabled. Update the documentation to match the implementation. Fixes: 091cb0994edd20d6 ("lib/hexdump: make print_hex_dump_bytes() a nop on !DEBUG builds") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://patch.msgid.link/3d5c3069fd9102ecaf81d044b750cd613eb72a08.1774970392.git.geert+renesas@glider.be Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursdt-bindings: clock: qcom,dispcc-sc7180: Define MDSS resetsKonrad Dybcio1-1/+6
[ Upstream commit fc6e29d42872680dca017f2e5169eefe971f8d89 ] The MDSS resets have so far been left undescribed. Fix that. Fixes: 75616da71291 ("dt-bindings: clock: Introduce QCOM sc7180 display clock bindings") Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Taniya Das <taniya.das@oss.qualcomm.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Tested-by: Val Packett <val@packett.cool> # sc7180-ecs-liva-qc710 Link: https://lore.kernel.org/r/20260120-topic-7180_dispcc_bcr-v1-1-0b1b442156c3@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Stable-dep-of: b0bc6011c549 ("clk: qcom: dispcc-sc7180: Add missing MDSS resets") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursdt-bindings: clock: qcom,gcc-sc8180x: Add missing GDSCsVal Packett1-0/+5
[ Upstream commit 76404ffbf07f28a5ec04748e18fce3dac2e78ef6 ] There are 5 more GDSCs that we were ignoring and not putting to sleep, which are listed in downstream DTS. Add them. Signed-off-by: Val Packett <val@packett.cool> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260312112321.370983-2-val@packett.cool Signed-off-by: Bjorn Andersson <andersson@kernel.org> Stable-dep-of: 3565741eb985 ("clk: qcom: gcc-sc8180x: Add missing GDSCs") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursdev_printk: add new dev_err_probe() helpersNuno Sa1-0/+8
[ Upstream commit dbbe7eaf0e4795bf003ac06872aaf52b6b6b1310 ] This is similar to dev_err_probe() but for cases where an ERR_PTR() or ERR_CAST() is to be returned simplifying patterns like: dev_err_probe(dev, ret, ...); return ERR_PTR(ret) or dev_err_probe(dev, PTR_ERR(ptr), ...); return ERR_CAST(ptr) Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://patch.msgid.link/20240606-dev-add_dev_errp_probe-v3-1-51bb229edd79@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Stable-dep-of: 797cc011ae02 ("backlight: sky81452-backlight: Check return value of devm_gpiod_get_optional() in sky81452_bl_parse_dt()") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursdriver core: Move dev_err_probe() to where it belogsAndy Shevchenko2-2/+2
[ Upstream commit 9e0cace7a6254070159ebd86497eadc29ea307ca ] dev_err_probe() belongs to the printing API, hence move the definition from device.h to dev_printk.h. There is no change to the callers at all, since: 1) implementation is located in the same core.c; 2) dev_printk.h is guaranteed to be included by device.h. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/20230721131309.16821-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 797cc011ae02 ("backlight: sky81452-backlight: Check return value of devm_gpiod_get_optional() in sky81452_bl_parse_dt()") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursdriver core: device.h: remove extern from function prototypesGreg Kroah-Hartman1-2/+1
[ Upstream commit f43243c66e5e9ad839d235f82a58e73a7e7612af ] The kernel coding style does not require 'extern' in function prototypes in .h files, so remove them from include/linux/device.h as they are not needed. Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20230324122711.2664537-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: 797cc011ae02 ("backlight: sky81452-backlight: Check return value of devm_gpiod_get_optional() in sky81452_bl_parse_dt()") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursquota: Fix race of dquot_scan_active() with quota deactivationJan Kara1-8/+1
[ Upstream commit e93ab401da4b2e2c1b8ef2424de2f238d51c8b2d ] dquot_scan_active() can race with quota deactivation in quota_release_workfn() like: CPU0 (quota_release_workfn) CPU1 (dquot_scan_active) ============================== ============================== spin_lock(&dq_list_lock); list_replace_init( &releasing_dquots, &rls_head); /* dquot X on rls_head, dq_count == 0, DQ_ACTIVE_B still set */ spin_unlock(&dq_list_lock); synchronize_srcu(&dquot_srcu); spin_lock(&dq_list_lock); list_for_each_entry(dquot, &inuse_list, dq_inuse) { /* finds dquot X */ dquot_active(X) -> true atomic_inc(&X->dq_count); } spin_unlock(&dq_list_lock); spin_lock(&dq_list_lock); dquot = list_first_entry(&rls_head); WARN_ON_ONCE(atomic_read(&dquot->dq_count)); The problem is not only a cosmetic one as under memory pressure the caller of dquot_scan_active() can end up working on freed dquot. Fix the problem by making sure the dquot is removed from releasing list when we acquire a reference to it. Fixes: 869b6ea1609f ("quota: Fix slow quotaoff") Reported-by: Sam Sun <samsun1006219@gmail.com> Link: https://lore.kernel.org/all/CAEkJfYPTt3uP1vAYnQ5V2ZWn5O9PLhhGi5HbOcAzyP9vbXyjeg@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hourspadata: Put CPU offline callback in ONLINE section to allow failureDaniel Jordan2-5/+4
[ Upstream commit c8c4a2972f83c8b68ff03b43cecdb898939ff851 ] syzbot reported the following warning: DEAD callback error for CPU1 WARNING: kernel/cpu.c:1463 at _cpu_down+0x759/0x1020 kernel/cpu.c:1463, CPU#0: syz.0.1960/14614 at commit 4ae12d8bd9a8 ("Merge tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux") which tglx traced to padata_cpu_dead() given it's the only sub-CPUHP_TEARDOWN_CPU callback that returns an error. Failure isn't allowed in hotplug states before CPUHP_TEARDOWN_CPU so move the CPU offline callback to the ONLINE section where failure is possible. Fixes: 894c9ef9780c ("padata: validate cpumask without removed CPU during offline") Reported-by: syzbot+123e1b70473ce213f3af@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69af0a05.050a0220.310d8.002f.GAE@google.com/ Debugged-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursmodule: Fix freeing of charp module parameters when CONFIG_SYSFS=nPetr Pavlu1-8/+3
[ Upstream commit deffe1edba626d474fef38007c03646ca5876a0e ] When setting a charp module parameter, the param_set_charp() function allocates memory to store a copy of the input value. Later, when the module is potentially unloaded, the destroy_params() function is called to free this allocated memory. However, destroy_params() is available only when CONFIG_SYSFS=y, otherwise only a dummy variant is present. In the unlikely case that the kernel is configured with CONFIG_MODULES=y and CONFIG_SYSFS=n, this results in a memory leak of charp values when a module is unloaded. Fix this issue by making destroy_params() always available when CONFIG_MODULES=y. Rename the function to module_destroy_params() to clarify that it is intended for use by the module loader. Fixes: e180a6b7759a ("param: fix charp parameters set via sysfs") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hourskernel: globalize lookup_or_create_module_kobject()Shyam Saini1-0/+2
[ Upstream commit 7c76c813cfc42a7376378a0c4b7250db2eebab81 ] lookup_or_create_module_kobject() is marked as static and __init, to make it global drop static keyword. Since this function can be called from non-init code, use __modinit instead of __init, __modinit marker will make it __init if CONFIG_MODULES is not defined. Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com> Link: https://lore.kernel.org/r/20250227184930.34163-4-shyamsaini@linux.microsoft.com Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Stable-dep-of: deffe1edba62 ("module: Fix freeing of charp module parameters when CONFIG_SYSFS=n") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursbpf: Add CHECKSUM_COMPLETE to bpf test progsVadim Fedorenko1-0/+2
[ Upstream commit a3cfe84cca28f205761a0450016593b0d728165e ] Add special flag to validate that TC BPF program properly updates checksum information in skb. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240606145851.229116-1-vadfed@meta.com Stable-dep-of: 972787479ee7 ("bpf: test_run: Fix the null pointer dereference issue in bpf_lwt_xmit_push_encap") Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursfirmware: dmi: Correct an indexing error in dmi.hMario Limonciello (AMD)1-0/+5
[ Upstream commit c064abc68e009d2cc18416e7132d9c25e03125b6 ] The entries later in enum dmi_entry_type don't match the SMBIOS specification¹. The entry for type 33: `64-Bit Memory Error Information` is not present and thus the index for all later entries is incorrect. Add it. Also, add missing entry types 43-46, while at it. ¹ Search for "System Management BIOS (SMBIOS) Reference Specification" [ bp: Drop the flaky SMBIOS spec URL. ] Fixes: 93c890dbe5287 ("firmware: Add DMI entry types to the headers") Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://patch.msgid.link/20260307141024.819807-2-superm1@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hourslocking: Fix rwlock support in <linux/spinlock_up.h>Bart Van Assche1-10/+10
[ Upstream commit 756a0e011cfca0b45a48464aa25b05d9a9c2fb0b ] Architecture support for rwlocks must be available whether or not CONFIG_DEBUG_SPINLOCK has been defined. Move the definitions of the arch_{read,write}_{lock,trylock,unlock}() macros such that these become visbile if CONFIG_DEBUG_SPINLOCK=n. This patch prepares for converting do_raw_{read,write}_trylock() into inline functions. Without this patch that conversion triggers a build failure for UP architectures, e.g. arm-ep93xx. I used the following kernel configuration to build the kernel for that architecture: CONFIG_ARCH_MULTIPLATFORM=y CONFIG_ARCH_MULTI_V7=n CONFIG_ATAGS=y CONFIG_MMU=y CONFIG_ARCH_MULTI_V4T=y CONFIG_CPU_LITTLE_ENDIAN=y CONFIG_ARCH_EP93XX=y Fixes: fb1c8f93d869 ("[PATCH] spinlock consolidation") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260313171510.230998-2-bvanassche@acm.org Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hourshrtimer: Reduce trace noise in hrtimer_start()Thomas Gleixner1-4/+7
[ Upstream commit f2e388a019e4cf83a15883a3d1f1384298e9a6aa ] hrtimer_start() when invoked with an already armed timer traces like: <comm>-.. [032] d.h2. 5.002263: hrtimer_cancel: hrtimer= .... <comm>-.. [032] d.h1. 5.002263: hrtimer_start: hrtimer= .... Which is incorrect as the timer doesn't get canceled. Just the expiry time changes. The internal dequeue operation which is required for that is not really interesting for trace analysis. But it makes it tedious to keep real cancellations and the above case apart. Remove the cancel tracing in hrtimer_start() and add a 'was_armed' indicator to the hrtimer start tracepoint, which clearly indicates what the state of the hrtimer is when hrtimer_start() is invoked: <comm>-.. [032] d.h1. 6.200103: hrtimer_start: hrtimer= .... was_armed=0 <comm>-.. [032] d.h1. 6.200558: hrtimer_start: hrtimer= .... was_armed=1 Fixes: c6a2a1770245 ("hrtimer: Add tracepoint for hrtimers") Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.208491877@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursfanotify: fix false positive on permission eventsMiklos Szeredi1-0/+1
commit 7746e3bd4cc19b5092e00d32d676e329bfcb6900 upstream. fsnotify_get_mark_safe() may return false for a mark on an unrelated group, which results in bypassing the permission check. Fix by skipping over detached marks that are not in the current group. CC: stable@vger.kernel.org Fixes: abc77577a669 ("fsnotify: Provide framework for dropping SRCU lock in ->handle_event") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260410144950.156160-1-mszeredi@redhat.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hoursBluetooth: hci_sync: Remove remaining dependencies of hci_requestLuiz Augusto von Dentz1-0/+17
[ Upstream commit f2d89775358606c7ab6b6b6c4a02fe1e8cd270b1 ] This removes the dependencies of hci_req_init and hci_request_cancel_all from hci_sync.c. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Fang Wang <32840572@qq.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursfbdev: udlfb: add vm_ops to dlfb_ops_mmap to prevent use-after-freeRajat Gupta1-0/+1
commit 8de779dc40d35d39fa07387b6f921eb11df0f511 upstream. dlfb_ops_mmap() uses remap_pfn_range() to map vmalloc framebuffer pages to userspace but sets no vm_ops on the VMA. This means the kernel cannot track active mmaps. When dlfb_realloc_framebuffer() replaces the backing buffer via FBIOPUT_VSCREENINFO, existing mmap PTEs are not invalidated. On USB disconnect, dlfb_ops_destroy() calls vfree() on the old pages while userspace PTEs still reference them, resulting in a use-after-free: the process retains read/write access to freed kernel pages. Add vm_operations_struct with open/close callbacks that maintain an atomic mmap_count on struct dlfb_data. In dlfb_realloc_framebuffer(), check mmap_count and return -EBUSY if the buffer is currently mapped, preventing buffer replacement while userspace holds stale PTEs. Tested with PoC using dummy_hcd + raw_gadget USB device emulation. Signed-off-by: Rajat Gupta <rajgupt@qti.qualcomm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hoursdriver core: Add kernel-doc for DEV_FLAG_COUNT enum valueDouglas Anderson1-0/+1
commit 5b484311507b5d403c1f7a45f6aa3778549e268b upstream. Even though nobody should use this value (except when declaring the "flags" bitmap), kernel-doc still gets upset that it's not documented. It reports: WARNING: ../include/linux/device.h:519 Enum value 'DEV_FLAG_COUNT' not described in enum 'struct_device_flags' Add the description of DEV_FLAG_COUNT. Fixes: a2225b6e834a ("driver core: Don't let a device probe until it's ready") Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/f318cd43-81fd-48b9-abf7-92af85f12f91@infradead.org Signed-off-by: Douglas Anderson <dianders@chromium.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260413195910.1.I23aca74fe2d3636a47df196a80920fecb2643220@changeid Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hoursrandomize_kstack: Maintain kstack_offset per taskRyan Roberts2-11/+19
commit 37beb42560165869838e7d91724f3e629db64129 upstream. kstack_offset was previously maintained per-cpu, but this caused a couple of issues. So let's instead make it per-task. Issue 1: add_random_kstack_offset() and choose_random_kstack_offset() expected and required to be called with interrupts and preemption disabled so that it could manipulate per-cpu state. But arm64, loongarch and risc-v are calling them with interrupts and preemption enabled. I don't _think_ this causes any functional issues, but it's certainly unexpected and could lead to manipulating the wrong cpu's state, which could cause a minor performance degradation due to bouncing the cache lines. By maintaining the state per-task those functions can safely be called in preemptible context. Issue 2: add_random_kstack_offset() is called before executing the syscall and expands the stack using a previously chosen random offset. choose_random_kstack_offset() is called after executing the syscall and chooses and stores a new random offset for the next syscall. With per-cpu storage for this offset, an attacker could force cpu migration during the execution of the syscall and prevent the offset from being updated for the original cpu such that it is predictable for the next syscall on that cpu. By maintaining the state per-task, this problem goes away because the per-task random offset is updated after the syscall regardless of which cpu it is executing on. Fixes: 39218ff4c625 ("stack: Optionally randomize kernel stack offset each syscall") Closes: https://lore.kernel.org/all/dd8c37bc-795f-4c7a-9086-69e584d8ab24@arm.com/ Cc: stable@vger.kernel.org Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://patch.msgid.link/20260303150840.3789438-2-ryan.roberts@arm.com Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hourstpm: avoid -Wunused-but-set-variableArnd Bergmann1-2/+7
commit 6f1d4d2ecfcd1b577dc87350ea965fe81f272e83 upstream. Outside of the EFI tpm code, the TPM_MEMREMAP()/TPM_MEMUNMAP functions are defined as trivial macros, leading to the mapping_size variable ending up unused: In file included from drivers/char/tpm/tpm-sysfs.c:16: In file included from drivers/char/tpm/tpm.h:28: include/linux/tpm_eventlog.h:167:6: error: variable 'mapping_size' set but not used [-Werror,-Wunused-but-set-variable] 167 | int mapping_size; Turn the stubs into inline functions to avoid this warning. Cc: stable@vger.kernel.org # v5.3+ Fixes: c46f3405692d ("tpm: Reserve the TPM final events table") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hoursf2fs: fix to detect potential corrupted nid in free_nid_listChao Yu1-0/+1
[ Upstream commit 8fc6056dcf79937c46c97fa4996cda65956437a9 ] As reported, on-disk footer.ino and footer.nid is the same and out-of-range, let's add sanity check on f2fs_alloc_nid() to detect any potential corruption in free_nid_list. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Robert Garcia <rob_garcia@163.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursdriver core: Don't let a device probe until it's readyDouglas Anderson1-0/+44
[ Upstream commit a2225b6e834a838ae3c93709760edc0a169eb2f2 ] The moment we link a "struct device" into the list of devices for the bus, it's possible probe can happen. This is because another thread can load the driver at any time and that can cause the device to probe. This has been seen in practice with a stack crawl that looks like this [1]: really_probe() __driver_probe_device() driver_probe_device() __driver_attach() bus_for_each_dev() driver_attach() bus_add_driver() driver_register() __platform_driver_register() init_module() [some module] do_one_initcall() do_init_module() load_module() __arm64_sys_finit_module() invoke_syscall() As a result of the above, it was seen that device_links_driver_bound() could be called for the device before "dev->fwnode->dev" was assigned. This prevented __fw_devlink_pickup_dangling_consumers() from being called which meant that other devices waiting on our driver's sub-nodes were stuck deferring forever. It's believed that this problem is showing up suddenly for two reasons: 1. Android has recently (last ~1 year) implemented an optimization to the order it loads modules [2]. When devices opt-in to this faster loading, modules are loaded one-after-the-other very quickly. This is unlike how other distributions do it. The reproduction of this problem has only been seen on devices that opt-in to Android's "parallel module loading". 2. Android devices typically opt-in to fw_devlink, and the most noticeable issue is the NULL "dev->fwnode->dev" in device_links_driver_bound(). fw_devlink is somewhat new code and also not in use by all Linux devices. Even though the specific symptom where "dev->fwnode->dev" wasn't assigned could be fixed by moving that assignment higher in device_add(), other parts of device_add() (like the call to device_pm_add()) are also important to run before probe. Only moving the "dev->fwnode->dev" assignment would likely fix the current symptoms but lead to difficult-to-debug problems in the future. Fix the problem by preventing probe until device_add() has run far enough that the device is ready to probe. If somehow we end up trying to probe before we're allowed, __driver_probe_device() will return -EPROBE_DEFER which will make certain the device is noticed. In the race condition that was seen with Android's faster module loading, we will temporarily add the device to the deferred list and then take it off immediately when device_add() probes the device. Instead of adding another flag to the bitfields already in "struct device", instead add a new "flags" field and use that. This allows us to freely change the bit from different thread without worrying about corrupting nearby bits (and means threads changing other bit won't corrupt us). [1] Captured on a machine running a downstream 6.6 kernel [2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libmodprobe/libmodprobe.cpp?q=LoadModulesParallel Cc: stable@vger.kernel.org Fixes: 2023c610dc54 ("Driver core: add new device to bus's list before probing") Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patch.msgid.link/20260406162231.v5.1.Id750b0fbcc94f23ed04b7aecabcead688d0d8c17@changeid Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursdevice property: Make modifications of fwnode "flags" thread safeDouglas Anderson1-11/+33
commit f72e77c33e4b5657af35125e75bab249256030f3 upstream. In various places in the kernel, we modify the fwnode "flags" member by doing either: fwnode->flags |= SOME_FLAG; fwnode->flags &= ~SOME_FLAG; This type of modification is not thread-safe. If two threads are both mucking with the flags at the same time then one can clobber the other. While flags are often modified while under the "fwnode_link_lock", this is not universally true. Create some accessor functions for setting, clearing, and testing the FWNODE flags and move all users to these accessor functions. New accessor functions use set_bit() and clear_bit(), which are thread-safe. Cc: stable@vger.kernel.org Fixes: c2c724c868c4 ("driver core: Add fw_devlink_parse_fwtree()") Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mark Brown <broonie@kernel.org> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Saravana Kannan <saravanak@kernel.org> Link: https://patch.msgid.link/20260317090112.v2.1.I0a4d03104ecd5103df3d76f66c8d21b1d15a2e38@changeid [ Fix fwnode_clear_flag() argument alignment, restore dropped blank line in fwnode_dev_initialized(), and remove unnecessary parentheses around fwnode_test_flag() calls. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hourspadata: Remove comment for reorder_workHerbert Xu1-1/+0
[ Upstream commit 82a0302e7167d0b7c6cde56613db3748f8dd806d ] Remove comment for reorder_work which no longer exists. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 71203f68c774 ("padata: Fix pd UAF once and for all") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Bin Lan <lanbincn@139.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hourspadata: Fix pd UAF once and for allHerbert Xu1-3/+0
[ Upstream commit 71203f68c7749609d7fc8ae6ad054bdedeb24f91 ] There is a race condition/UAF in padata_reorder that goes back to the initial commit. A reference count is taken at the start of the process in padata_do_parallel, and released at the end in padata_serial_worker. This reference count is (and only is) required for padata_replace to function correctly. If padata_replace is never called then there is no issue. In the function padata_reorder which serves as the core of padata, as soon as padata is added to queue->serial.list, and the associated spin lock released, that padata may be processed and the reference count on pd would go away. Fix this by getting the next padata before the squeue->serial lock is released. In order to make this possible, simplify padata_reorder by only calling it once the next padata arrives. Fixes: 16295bec6398 ("padata: Generic parallelization/serialization interface") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [ Adjust context of padata_find_next(). Replace cpumask_next_wrap(cpu, pd->cpumask.pcpu) with cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1, false) in padata_reorder() in v6.1 according to dc5bb9b769c9 ("cpumask: deprecate cpumask_next_wrap()") and f954a2d37637 ("padata: switch padata_find_next() to using cpumask_next_wrap()") . ] Signed-off-by: Bin Lan <lanbincn@139.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursusb: xhci: Make usb_host_endpoint.hcpriv survive endpoint_disable()Michal Pecio1-1/+2
commit 25e531b422dc2ac90cdae3b6e74b5cdeb081440d upstream. xHCI hardware maintains its endpoint state between add_endpoint() and drop_endpoint() calls followed by successful check_bandwidth(). So does the driver. Core may call endpoint_disable() during xHCI endpoint life, so don't clear host_ep->hcpriv then, because this breaks endpoint_reset(). If a driver calls usb_set_interface(), submits URBs which make host sequence state non-zero and calls usb_clear_halt(), the device clears its sequence state but xhci_endpoint_reset() bails out. The next URB malfunctions: USB2 loses one packet, USB3 gets Transaction Error or may not complete at all on some (buggy?) HCs from ASMedia and AMD. This is triggered by uvcvideo on bulk video devices. The code was copied from ehci_endpoint_disable() but it isn't needed here - hcpriv should only be NULL on emulated root hub endpoints. It might prevent resetting and inadvertently enabling a disabled and dropped endpoint, but core shouldn't try to reset dropped endpoints. Document xhci requirements regarding hcpriv. They are currently met. Fixes: 18b74067ac78 ("xhci: Fix use-after-free regression in xhci clear hub TT implementation") Cc: stable@vger.kernel.org Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://patch.msgid.link/20260402131342.2628648-26-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hoursscsi: ufs: core: Fix use-after free in init error and remove pathsAndré Draszik1-1/+0
[ Upstream commit f8fb2403ddebb5eea0033d90d9daae4c88749ada ] devm_blk_crypto_profile_init() registers a cleanup handler to run when the associated (platform-) device is being released. For UFS, the crypto private data and pointers are stored as part of the ufs_hba's data structure 'struct ufs_hba::crypto_profile'. This structure is allocated as part of the underlying ufshcd and therefore Scsi_host allocation. During driver release or during error handling in ufshcd_pltfrm_init(), this structure is released as part of ufshcd_dealloc_host() before the (platform-) device associated with the crypto call above is released. Once this device is released, the crypto cleanup code will run, using the just-released 'struct ufs_hba::crypto_profile'. This causes a use-after-free situation: Call trace: kfree+0x60/0x2d8 (P) kvfree+0x44/0x60 blk_crypto_profile_destroy_callback+0x28/0x70 devm_action_release+0x1c/0x30 release_nodes+0x6c/0x108 devres_release_all+0x98/0x100 device_unbind_cleanup+0x20/0x70 really_probe+0x218/0x2d0 In other words, the initialisation code flow is: platform-device probe ufshcd_pltfrm_init() ufshcd_alloc_host() scsi_host_alloc() allocation of struct ufs_hba creation of scsi-host devices devm_blk_crypto_profile_init() devm registration of cleanup handler using platform-device and during error handling of ufshcd_pltfrm_init() or during driver removal: ufshcd_dealloc_host() scsi_host_put() put_device(scsi-host) release of struct ufs_hba put_device(platform-device) crypto cleanup handler To fix this use-after free, change ufshcd_alloc_host() to register a devres action to automatically cleanup the underlying SCSI device on ufshcd destruction, without requiring explicit calls to ufshcd_dealloc_host(). This way: * the crypto profile and all other ufs_hba-owned resources are destroyed before SCSI (as they've been registered after) * a memleak is plugged in tc-dwc-g210-pci.c remove() as a side-effect * EXPORT_SYMBOL_GPL(ufshcd_dealloc_host) can be removed fully as it's not needed anymore * no future drivers using ufshcd_alloc_host() could ever forget adding the cleanup Fixes: cb77cb5abe1f ("blk-crypto: rename blk_keyslot_manager to blk_crypto_profile") Fixes: d76d9d7d1009 ("scsi: ufs: use devm_blk_ksm_init()") Cc: stable@vger.kernel.org Signed-off-by: André Draszik <andre.draszik@linaro.org> Link: https://lore.kernel.org/r/20250124-ufshcd-fix-v4-1-c5d0144aae59@linaro.org Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> [ Delete modifications about ufshcd_parse_operating_points() for it's added from commit 72208ebe181e3("scsi: ufs: core: Add support for parsing OPP") and that in ufshcd_pltfrm_remove() for it's added from commit 897df60c16d54("scsi: ufs: pltfrm: Dellocate HBA during ufshcd_pltfrm_remove()"). ] Signed-off-by: Robert Garcia <rob_garcia@163.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hoursrxrpc: Fix recvmsg() unconditional requeueDavid Howells1-0/+4
[ Upstream commit 2c28769a51deb6022d7fbd499987e237a01dd63a ] If rxrpc_recvmsg() fails because MSG_DONTWAIT was specified but the call at the front of the recvmsg queue already has its mutex locked, it requeues the call - whether or not the call is already queued. The call may be on the queue because MSG_PEEK was also passed and so the call was not dequeued or because the I/O thread requeued it. The unconditional requeue may then corrupt the recvmsg queue, leading to things like UAFs or refcount underruns. Fix this by only requeuing the call if it isn't already on the queue - and moving it to the front if it is already queued. If we don't queue it, we have to put the ref we obtained by dequeuing it. Also, MSG_PEEK doesn't dequeue the call so shouldn't call rxrpc_notify_socket() for the call if we didn't use up all the data on the queue, so fix that also. Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg") Reported-by: Faith <faith@zellic.io> Reported-by: Pumpkin Chang <pumpkin@devco.re> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Cc: stable@vger.kernel.org [Adapted to 6.1: use write_lock_bh/write_unlock_bh, trace_rxrpc_call directly for see-call tracing, and 6.1 trace enum naming convention.] Signed-off-by: Jay Wang <wanjay@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hourssoc: qcom: apr: make remove callback of apr driver void returnedDawei Li1-1/+1
[ Upstream commit 33ae3d0955943ac5bacfcb6911cf7cb74822bf8c ] Since commit fc7a6209d571 ("bus: Make remove callback return void") forces bus_type::remove be void-returned, it doesn't make much sense for any bus based driver implementing remove callbalk to return non-void to its caller. As such, change the remove function for apr bus based drivers to return void. Signed-off-by: Dawei Li <set_pte_at@outlook.com> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/TYCP286MB23232B7968D34DB8323B0F16CAFB9@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM Stable-dep-of: 6ec1235fc941 ("ASoC: qcom: q6apm: move component registration to unmanaged version") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hoursKVM: x86: Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAsDavid Woodhouse1-5/+6
[ Upstream commit 2619da73bb2f10d88f7e1087125c40144fdf0987 ] Commit 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with flexible-array members") broke the userspace API for C++. These structures ending in VLAs are typically a *header*, which can be followed by an arbitrary number of entries. Userspace typically creates a larger structure with some non-zero number of entries, for example in QEMU's kvm_arch_get_supported_msr_feature(): struct { struct kvm_msrs info; struct kvm_msr_entry entries[1]; } msr_data = {}; While that works in C, it fails in C++ with an error like: flexible array member 'kvm_msrs::entries' not at end of 'struct msr_data' Fix this by using __DECLARE_FLEX_ARRAY() for the VLA, which uses [0] for C++ compilation. Fixes: 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with flexible-array members") Cc: stable@vger.kernel.org Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://patch.msgid.link/3abaf6aefd6e5efeff3b860ac38421d9dec908db.camel@infradead.org [sean: tag for stable@] Signed-off-by: Sean Christopherson <seanjc@google.com> [ applied `__DECLARE_FLEX_ARRAY(char, name)` change directly instead of inside missing `#ifdef __KERNEL__` else branch ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
39 hourswifi: mac80211: always free skb on ieee80211_tx_prepare_skb() failureFelix Fietkau1-0/+4
[ Upstream commit d5ad6ab61cbd89afdb60881f6274f74328af3ee9 ] ieee80211_tx_prepare_skb() has three error paths, but only two of them free the skb. The first error path (ieee80211_tx_prepare() returning TX_DROP) does not free it, while invoke_tx_handlers() failure and the fragmentation check both do. Add kfree_skb() to the first error path so all three are consistent, and remove the now-redundant frees in callers (ath9k, mt76, mac80211_hwsim) to avoid double-free. Document the skb ownership guarantee in the function's kdoc. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/20260314065455.2462900-1-nbd@nbd.name Fixes: 06be6b149f7e ("mac80211: add ieee80211_tx_prepare_skb() helper function") Signed-off-by: Johannes Berg <johannes.berg@intel.com> [ Exclude changes to drivers/net/wireless/mediatek/mt76/scan.c as this file is first introduced by commit 31083e38548f("wifi: mt76: add code for emulating hardware scanning") after linux-6.14.] Signed-off-by: Li hongliang <1468888505@139.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
39 hoursKVM: x86: Use scratch field in MMIO fragment to hold small write valuesSean Christopherson1-1/+2
commit 0b16e69d17d8c35c5c9d5918bf596c75a44655d3 upstream. When exiting to userspace to service an emulated MMIO write, copy the to-be-written value to a scratch field in the MMIO fragment if the size of the data payload is 8 bytes or less, i.e. can fit in a single chunk, instead of pointing the fragment directly at the source value. This fixes a class of use-after-free bugs that occur when the emulator initiates a write using an on-stack, local variable as the source, the write splits a page boundary, *and* both pages are MMIO pages. Because KVM's ABI only allows for physically contiguous MMIO requests, accesses that split MMIO pages are separated into two fragments, and are sent to userspace one at a time. When KVM attempts to complete userspace MMIO in response to KVM_RUN after the first fragment, KVM will detect the second fragment and generate a second userspace exit, and reference the on-stack variable. The issue is most visible if the second KVM_RUN is performed by a separate task, in which case the stack of the initiating task can show up as truly freed data. ================================================================== BUG: KASAN: use-after-free in complete_emulated_mmio+0x305/0x420 Read of size 1 at addr ffff888009c378d1 by task syz-executor417/984 CPU: 1 PID: 984 Comm: syz-executor417 Not tainted 5.10.0-182.0.0.95.h2627.eulerosv2r13.x86_64 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0xbe/0xfd print_address_description.constprop.0+0x19/0x170 __kasan_report.cold+0x6c/0x84 kasan_report+0x3a/0x50 check_memory_region+0xfd/0x1f0 memcpy+0x20/0x60 complete_emulated_mmio+0x305/0x420 kvm_arch_vcpu_ioctl_run+0x63f/0x6d0 kvm_vcpu_ioctl+0x413/0xb20 __se_sys_ioctl+0x111/0x160 do_syscall_64+0x30/0x40 entry_SYSCALL_64_after_hwframe+0x67/0xd1 RIP: 0033:0x42477d Code: <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007faa8e6890e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000004d7338 RCX: 000000000042477d RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005 RBP: 00000000004d7330 R08: 00007fff28d546df R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d733c R13: 0000000000000000 R14: 000000000040a200 R15: 00007fff28d54720 The buggy address belongs to the page: page:0000000029f6a428 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9c37 flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc0000000 0000000000000000 ffffea0000270dc8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888009c37780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888009c37800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff888009c37880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff888009c37900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888009c37980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== The bug can also be reproduced with a targeted KVM-Unit-Test by hacking KVM to fill a large on-stack variable in complete_emulated_mmio(), i.e. by overwrite the data value with garbage. Limit the use of the scratch fields to 8-byte or smaller accesses, and to just writes, as larger accesses and reads are not affected thanks to implementation details in the emulator, but add a sanity check to ensure those details don't change in the future. Specifically, KVM never uses on-stack variables for accesses larger that 8 bytes, e.g. uses an operand in the emulator context, and *all* reads are buffered through the mem_read cache. Note! Using the scratch field for reads is not only unnecessary, it's also extremely difficult to handle correctly. As above, KVM buffers all reads through the mem_read cache, and heavily relies on that behavior when re-emulating the instruction after a userspace MMIO read exit. If a read splits a page, the first page is NOT an MMIO page, and the second page IS an MMIO page, then the MMIO fragment needs to point at _just_ the second chunk of the destination, i.e. its position in the mem_read cache. Taking the "obvious" approach of copying the fragment value into the destination when re-emulating the instruction would clobber the first chunk of the destination, i.e. would clobber the data that was read from guest memory. Fixes: f78146b0f923 ("KVM: Fix page-crossing MMIO") Suggested-by: Yashu Zhang <zhangjiaji1@huawei.com> Reported-by: Yashu Zhang <zhangjiaji1@huawei.com> Closes: https://lore.kernel.org/all/369eaaa2b3c1425c85e8477066391bc7@huawei.com Cc: stable@vger.kernel.org Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>