summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2025-07-18neighbour: Annotate neigh_table.phash_buckets and pneigh_entry.next with __rcu.Kuniyuki Iwashima1-2/+2
The next patch will free pneigh_entry with call_rcu(). Then, we need to annotate neigh_table.phash_buckets[] and pneigh_entry.next with __rcu. To make the next patch cleaner, let's annotate the fields in advance. Currently, all accesses to the fields are under the neigh table lock, so rcu_dereference_protected() is used with 1 for now, but most of them (except in pneigh_delete() and pneigh_ifdown_and_unlock()) will be replaced with rcu_dereference() and rcu_dereference_check(). Note that pneigh_ifdown_and_unlock() changes pneigh_entry.next to a local list, which is illegal because the RCU iterator could be moved to another list. This part will be fixed in the next patch. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18neighbour: Split pneigh_lookup().Kuniyuki Iwashima1-2/+3
pneigh_lookup() has ASSERT_RTNL() in the middle of the function, which is confusing. When called with the last argument, creat, 0, pneigh_lookup() literally looks up a proxy neighbour entry. This is the case of the reader path as the fast path and RTM_GETNEIGH. pneigh_lookup(), however, creates a pneigh_entry when called with creat 1 from RTM_NEWNEIGH and SIOCSARP, which require RTNL. Let's split pneigh_lookup() into two functions. We will convert all the reader paths to RCU, and read_lock_bh(&tbl->lock) in the new pneigh_lookup() will be dropped. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250716221221.442239-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-18ethtool: rss: initial RSS_SET (indirection table handling)Jakub Kicinski1-0/+1
Add initial support for RSS_SET, for now only operations on the indirection table are supported. Unlike the ioctl don't check if at least one parameter is being changed. This is how other ethtool-nl ops behave, so pick the ethtool-nl consistency vs copying ioctl behavior. There are two special cases here: 1) resetting the table to defaults; 2) support for tables of different size. For (1) I use an empty Netlink attribute (array of size 0). (2) may require some background. AFAICT a lot of modern devices allow allocating RSS tables of different sizes. mlx5 can upsize its tables, bnxt has some "table size calculation", and Intel folks asked about RSS table sizing in context of resource allocation in the past. The ethtool IOCTL API has a concept of table size, but right now the user is expected to provide a table exactly the size the device requests. Some drivers may change the table size at runtime (in response to queue count changes) but the user is not in control of this. What's not great is that all RSS contexts share the same table size. For example a device with 128 queues enabled, 16 RSS contexts 8 queues in each will likely have 256 entry tables for each of the 16 contexts, while 32 would be more than enough given each context only has 8 queues. To address this the Netlink API should avoid enforcing table size at the uAPI level, and should allow the user to express the min table size they expect. To fully solve (2) we will need more driver plumbing but at the uAPI level this patch allows the user to specify a table size smaller than what the device advertises. The device table size must be a multiple of the user requested table size. We then replicate the user-provided table to fill the full device size table. This addresses the "allow the user to express the min table size" objective, while not enforcing any fixed size. From Netlink perspective .get_rxfh_indir_size() is now de facto the "max" table size supported by the device. We may choose to support table replication in ethtool, too, when we actually plumb this thru the device APIs. Initially I was considering moving full pattern generation to the kernel (which queues to use, at which frequency and what min sequence length). I don't think this complexity would buy us much and most if not all devices have pow-2 table sizes, which simplifies the replication a lot. Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250716000331.1378807-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17PCI: Add pci_is_display() to check if device is a display controllerMario Limonciello1-0/+15
Several places in the kernel do class shifting to match whether a PCI device is display class. Add pci_is_display() for those places to use. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Daniel Dadap <ddadap@nvidia.com> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patch.msgid.link/20250717173812.3633478-2-superm1@kernel.org
2025-07-17stop_machine: Improve kernel-doc function-header commentsPaul E. McKenney1-23/+41
Add more detail to the kernel-doc function-header comments for stop_machine(), stop_machine_cpuslocked(), and stop_core_cpuslocked(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2025-07-17Merge back earlier cpufreq material for 6.17-rc1Rafael J. Wysocki1-1/+0
2025-07-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski16-50/+71
Cross-merge networking fixes after downstream PR (net-6.16-rc7). Conflicts: Documentation/netlink/specs/ovpn.yaml 880d43ca9aa4 ("netlink: specs: clean up spaces in brackets") af52020fc599 ("ovpn: reject unexpected netlink attributes") drivers/net/phy/phy_device.c a44312d58e78 ("net: phy: Don't register LEDs for genphy") f0f2b992d818 ("net: phy: Don't register LEDs for genphy") https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org drivers/net/wireless/intel/iwlwifi/fw/regulatory.c drivers/net/wireless/intel/iwlwifi/mld/regulatory.c 5fde0fcbd760 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap") ea045a0de3b9 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware") net/ipv6/mcast.c ae3264a25a46 ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()") a8594c956cc9 ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()") https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Merge back earlier material related to system sleepRafael J. Wysocki3-4/+40
2025-07-17cgroup: llist: avoid memory tears for llist_nodeShakeel Butt1-3/+3
Before the commit 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi safe"), the struct llist_node is expected to be private to the one inserting the node to the lockless list or the one removing the node from the lockless list. After the mentioned commit, the llist_node in the rstat code is per-cpu shared between the stacked contexts i.e. process, softirq, hardirq & nmi. It is possible the compiler may tear the loads or stores of llist_node. Let's avoid that. KCSAN reported the following race: Reported by Kernel Concurrency Sanitizer on: CPU: 60 UID: 0 PID: 5425 ... 6.16.0-rc3-next-20250626 #1 NONE Tainted: [E]=UNSIGNED_MODULE Hardware name: ... ================================================================== ================================================================== BUG: KCSAN: data-race in css_rstat_flush / css_rstat_updated write to 0xffffe8fffe1c85f0 of 8 bytes by task 1061 on cpu 1: css_rstat_flush+0x1b8/0xeb0 __mem_cgroup_flush_stats+0x184/0x190 flush_memcg_stats_dwork+0x22/0x50 process_one_work+0x335/0x630 worker_thread+0x5f1/0x8a0 kthread+0x197/0x340 ret_from_fork+0xd3/0x110 ret_from_fork_asm+0x11/0x20 read to 0xffffe8fffe1c85f0 of 8 bytes by task 3551 on cpu 15: css_rstat_updated+0x81/0x180 mod_memcg_lruvec_state+0x113/0x2d0 __mod_lruvec_state+0x3d/0x50 lru_add+0x21e/0x3f0 folio_batch_move_lru+0x80/0x1b0 __folio_batch_add_and_move+0xd7/0x160 folio_add_lru_vma+0x42/0x50 do_anonymous_page+0x892/0xe90 __handle_mm_fault+0xfaa/0x1520 handle_mm_fault+0xdc/0x350 do_user_addr_fault+0x1dc/0x650 exc_page_fault+0x5c/0x110 asm_exc_page_fault+0x22/0x30 value changed: 0xffffe8fffe18e0d0 -> 0xffffe8fffe1c85f0 $ ./scripts/faddr2line vmlinux css_rstat_flush+0x1b8/0xeb0 css_rstat_flush+0x1b8/0xeb0: init_llist_node at include/linux/llist.h:86 (inlined by) llist_del_first_init at include/linux/llist.h:308 (inlined by) css_process_update_tree at kernel/cgroup/rstat.c:148 (inlined by) css_rstat_updated_list at kernel/cgroup/rstat.c:258 (inlined by) css_rstat_flush at kernel/cgroup/rstat.c:389 $ ./scripts/faddr2line vmlinux css_rstat_updated+0x81/0x180 css_rstat_updated+0x81/0x180: css_rstat_updated at kernel/cgroup/rstat.c:90 (discriminator 1) These are expected race and a simple READ_ONCE/WRITE_ONCE resolves these reports. However let's add comments to explain the race and the need for memory barriers if stronger guarantees are needed. More specifically the rstat updater and the flusher can race and cause a scenario where the stats updater skips adding the css to the lockless list but the flusher might not see those updates done by the skipped updater. This is benign race and the subsequent flusher will flush those stats and at the moment there aren't any rstat users which are not fine with this kind of race. However some future user might want more stricter guarantee, so let's add appropriate comments to ease the job of future users. Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Fixes: 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi safe") Signed-off-by: Tejun Heo <tj@kernel.org>
2025-07-17Merge tag 'net-6.16-rc7' of ↵Linus Torvalds8-44/+48
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Bluetooth, CAN, WiFi and Netfilter. More code here than I would have liked. That said, better now than next week. Nothing particularly scary stands out. The improvement to the OpenVPN input validation is a bit large but better get them in before the code makes it to a final release. Some of the changes we got from sub-trees could have been split better between the fix and -next refactoring, IMHO, that has been communicated. We have one known regression in a TI AM65 board not getting link. The investigation is going a bit slow, a number of people are on vacation. We'll try to wrap it up, but don't think it should hold up the release. Current release - fix to a fix: - Bluetooth: L2CAP: fix attempting to adjust outgoing MTU, it broke some headphones and speakers Current release - regressions: - wifi: ath12k: fix packets received in WBM error ring with REO LUT enabled, fix Rx performance regression - wifi: iwlwifi: - fix crash due to a botched indexing conversion - mask reserved bits in chan_state_active_bitmap, avoid FW assert() Current release - new code bugs: - nf_conntrack: fix crash due to removal of uninitialised entry - eth: airoha: fix potential UaF in airoha_npu_get() Previous releases - regressions: - net: fix segmentation after TCP/UDP fraglist GRO - af_packet: fix the SO_SNDTIMEO constraint not taking effect and a potential soft lockup waiting for a completion - rpl: fix UaF in rpl_do_srh_inline() for sneaky skb geometry - virtio-net: fix recursive rtnl_lock() during probe() - eth: stmmac: populate entire system_counterval_t in get_time_fn() - eth: libwx: fix a number of crashes in the driver Rx path - hv_netvsc: prevent IPv6 addrconf after IFF_SLAVE lost that meaning Previous releases - always broken: - mptcp: fix races in handling connection fallback to pure TCP - rxrpc: assorted error handling and race fixes - sched: another batch of "security" fixes for qdiscs (QFQ, HTB) - tls: always refresh the queue when reading sock, avoid UaF - phy: don't register LEDs for genphy, avoid deadlock - Bluetooth: btintel: check if controller is ISO capable on btintel_classify_pkt_type(), work around FW returning incorrect capabilities Misc: - make OpenVPN Netlink input checking more strict before it makes it to a final release - wifi: cfg80211: remove scan request n_channels __counted_by, it's only yielding false positives" * tag 'net-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits) rxrpc: Fix to use conn aborts for conn-wide failures rxrpc: Fix transmission of an abort in response to an abort rxrpc: Fix notification vs call-release vs recvmsg rxrpc: Fix recv-recv race of completed call rxrpc: Fix irq-disabled in local_bh_enable() selftests/tc-testing: Test htb_dequeue_tree with deactivation and row emptying net/sched: Return NULL when htb_lookup_leaf encounters an empty rbtree net: bridge: Do not offload IGMP/MLD messages selftests: Add test cases for vlan_filter modification during runtime net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime tls: always refresh the queue when reading sock virtio-net: fix recursived rtnl_lock() during probe() net/mlx5: Update the list of the PCI supported devices hv_netvsc: Set VF priv_flags to IFF_NO_ADDRCONF before open to prevent IPv6 addrconf phonet/pep: Move call to pn_skb_get_dst_sockaddr() earlier in pep_sock_accept() Bluetooth: L2CAP: Fix attempting to adjust outgoing MTU netfilter: nf_conntrack: fix crash due to removal of uninitialised entry net: fix segmentation after TCP/UDP fraglist GRO ipv6: mcast: Delay put pmc->idev in mld_del_delrec() net: airoha: fix potential use-after-free in airoha_npu_get() ...
2025-07-17Merge tag 'for-net-2025-07-17' of ↵Jakub Kicinski2-23/+29
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_sync: fix connectable extended advertising when using static random address - hci_core: fix typos in macros - hci_core: add missing braces when using macro parameters - hci_core: replace 'quirks' integer by 'quirk_flags' bitmap - SMP: If an unallowed command is received consider it a failure - SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeout - L2CAP: Fix null-ptr-deref in l2cap_sock_resume_cb() - L2CAP: Fix attempting to adjust outgoing MTU - btintel: Check if controller is ISO capable on btintel_classify_pkt_type - btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant without board ID * tag 'for-net-2025-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: Fix attempting to adjust outgoing MTU Bluetooth: btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant without board ID Bluetooth: hci_dev: replace 'quirks' integer by 'quirk_flags' bitmap Bluetooth: hci_core: add missing braces when using macro parameters Bluetooth: hci_core: fix typos in macros Bluetooth: SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeout Bluetooth: SMP: If an unallowed command is received consider it a failure Bluetooth: btintel: Check if controller is ISO capable on btintel_classify_pkt_type Bluetooth: hci_sync: fix connectable extended advertising when using static random address Bluetooth: Fix null-ptr-deref in l2cap_sock_resume_cb() ==================== Link: https://patch.msgid.link/20250717142849.537425-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17rxrpc: Fix notification vs call-release vs recvmsgDavid Howells1-1/+2
When a call is released, rxrpc takes the spinlock and removes it from ->recvmsg_q in an effort to prevent racing recvmsg() invocations from seeing the same call. Now, rxrpc_recvmsg() only takes the spinlock when actually removing a call from the queue; it doesn't, however, take it in the lead up to that when it checks to see if the queue is empty. It *does* hold the socket lock, which prevents a recvmsg/recvmsg race - but this doesn't prevent sendmsg from ending the call because sendmsg() drops the socket lock and relies on the call->user_mutex. Fix this by firstly removing the bit in rxrpc_release_call() that dequeues the released call and, instead, rely on recvmsg() to simply discard released calls (done in a preceding fix). Secondly, rxrpc_notify_socket() is abandoned if the call is already marked as released rather than trying to be clever by setting both pointers in call->recvmsg_link to NULL to trick list_empty(). This isn't perfect and can still race, resulting in a released call on the queue, but recvmsg() will now clean that up. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250717074350.3767366-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17rxrpc: Fix recv-recv race of completed callDavid Howells1-0/+3
If a call receives an event (such as incoming data), the call gets placed on the socket's queue and a thread in recvmsg can be awakened to go and process it. Once the thread has picked up the call off of the queue, further events will cause it to be requeued, and once the socket lock is dropped (recvmsg uses call->user_mutex to allow the socket to be used in parallel), a second thread can come in and its recvmsg can pop the call off the socket queue again. In such a case, the first thread will be receiving stuff from the call and the second thread will be blocked on call->user_mutex. The first thread can, at this point, process both the event that it picked call for and the event that the second thread picked the call for and may see the call terminate - in which case the call will be "released", decoupling the call from the user call ID assigned to it (RXRPC_USER_CALL_ID in the control message). The first thread will return okay, but then the second thread will wake up holding the user_mutex and, if it sees that the call has been released by the first thread, it will BUG thusly: kernel BUG at net/rxrpc/recvmsg.c:474! Fix this by just dequeuing the call and ignoring it if it is seen to be already released. We can't tell userspace about it anyway as the user call ID has become stale. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Reported-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250717074350.3767366-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Merge tag 'wireless-next-2025-07-17' of ↵Paolo Abeni1-1/+5
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Another set of changes, notably: - cfg80211: fix double-free introduced earlier - mac80211: fix RCU iteration in CSA - iwlwifi: many cleanups (unused FW APIs, PCIe code, WoWLAN) - mac80211: some work around how FIPS affects wifi, which was wrong (RC4 is used by TKIP, not only WEP) - cfg/mac80211: improvements for unsolicated probe response handling * tag 'wireless-next-2025-07-17' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (64 commits) wifi: cfg80211: fix double free for link_sinfo in nl80211_station_dump() wifi: cfg80211: fix off channel operation allowed check for MLO wifi: mac80211: use RCU-safe iteration in ieee80211_csa_finish wifi: mac80211_hwsim: Update comments in header wifi: mac80211: parse unsolicited broadcast probe response data wifi: cfg80211: parse attribute to update unsolicited probe response template wifi: mac80211: don't use TPE data from assoc response wifi: mac80211: handle WLAN_HT_ACTION_NOTIFY_CHANWIDTH async wifi: mac80211: simplify __ieee80211_rx_h_amsdu() loop wifi: mac80211: don't mark keys for inactive links as uploaded wifi: mac80211: only assign chanctx in reconfig wifi: mac80211_hwsim: Declare support for AP scanning wifi: mac80211: clean up cipher suite handling wifi: mac80211: don't send keys to driver when fips_enabled wifi: cfg80211: Fix interface type validation wifi: mac80211: remove ieee80211_link_unreserve_chanctx() return value wifi: mac80211: don't unreserve never reserved chanctx mwl8k: Add missing check after DMA map wifi: mac80211: make VHT opmode NSS ignore a debug message wifi: iwlwifi: remove support of several iwl_ppag_table_cmd versions ... ==================== Link: https://patch.msgid.link/20250717094610.20106-47-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-17Merge tag 'wireless-2025-07-17' of ↵Paolo Abeni1-1/+1
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple of fixes: - ath12k performance regression from -rc1 - cfg80211 counted_by() removal for scan request as it doesn't match usage and keeps complaining - iwlwifi crash with certain older devices - iwlwifi missing an error path unlock - iwlwifi compatibility with certain BIOS updates * tag 'wireless-2025-07-17' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: Fix botched indexing conversion wifi: cfg80211: remove scan request n_channels counted_by wifi: ath12k: Fix packets received in WBM error ring with REO LUT enabled wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap wifi: iwlwifi: pcie: fix locking on invalid TOP reset ==================== Link: https://patch.msgid.link/20250717091831.18787-5-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-17ilog2: add max_pow_of_two_factor()John Garry1-0/+14
Relocate the function max_pow_of_two_factor() to common ilog2.h from the xfs code, as it will be used elsewhere. Also simplify the function, as advised by Mikulas Patocka. Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250711105258.3135198-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-17nvme: fix typo in status code constant for self-test in progressAlok Tiwari1-1/+1
Correct a typo error in the NVMe status code constant from NVME_SC_SELT_TEST_IN_PROGRESS to NVME_SC_SELF_TEST_IN_PROGRESS to accurately reflect its meaning. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-07-17Merge branch 'for-next' of ↵Paolo Abeni1-0/+55
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux Tony Nguyen says: ==================== Add RDMA support for Intel IPU E2000 in idpf Tatyana Nikolova says: This idpf patch series is the second part of the staged submission for introducing RDMA RoCEv2 support for the IPU E2000 line of products, referred to as GEN3. To support RDMA GEN3 devices, the idpf driver uses common definitions of the IIDC interface and implements specific device functionality in iidc_rdma_idpf.h. The IPU model can host one or more logical network endpoints called vPorts per PCI function that are flexibly associated with a physical port or an internal communication port. Other features as it pertains to GEN3 devices include: * MMIO learning * RDMA capability negotiation * RDMA vectors discovery between idpf and control plane These patches are split from the submission "Add RDMA support for Intel IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts and platforms with a variety of general RDMA applications which include standalone verbs (rping, perftest, etc.), storage and HPC applications. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ This idpf patch series is the second part of the staged submission for introducing RDMA RoCEv2 support for the IPU E2000 line of products, referred to as GEN3. To support RDMA GEN3 devices, the idpf driver uses common definitions of the IIDC interface and implements specific device functionality in iidc_rdma_idpf.h. The IPU model can host one or more logical network endpoints called vPorts per PCI function that are flexibly associated with a physical port or an internal communication port. Other features as it pertains to GEN3 devices include: * MMIO learning * RDMA capability negotiation * RDMA vectors discovery between idpf and control plane These patches are split from the submission "Add RDMA support for Intel IPU E2000 (GEN3)" [1]. The patches have been tested on a range of hosts and platforms with a variety of general RDMA applications which include standalone verbs (rping, perftest, etc.), storage and HPC applications. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [1] https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ IWL reviews: v3: https://lore.kernel.org/all/20250708210554.1662-1-tatyana.e.nikolova@intel.com/ v2: https://lore.kernel.org/all/20250612220002.1120-1-tatyana.e.nikolova@intel.com/ v1 (split from previous series): https://lore.kernel.org/all/20250523170435.668-1-tatyana.e.nikolova@intel.com/ v3: https://lore.kernel.org/all/20250207194931.1569-1-tatyana.e.nikolova@intel.com/ RFC v2: https://lore.kernel.org/all/20240824031924.421-1-tatyana.e.nikolova@intel.com/ RFC: https://lore.kernel.org/all/20240724233917.704-1-tatyana.e.nikolova@intel.com/ * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux: idpf: implement get LAN MMIO memory regions idpf: implement IDC vport aux driver MTU change handler idpf: implement remaining IDC RDMA core callbacks and handlers idpf: implement RDMA vport auxiliary dev create, init, and destroy idpf: implement core RDMA auxiliary dev create, init, and destroy idpf: use reserved RDMA vectors from control plane ==================== Link: https://patch.msgid.link/20250714181002.2865694-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-17watchdog: Don't use "proxy" headersAndy Shevchenko1-4/+8
Update header inclusions to follow IWYU (Include What You Use) principle. Note that kernel.h is discouraged to be included as it's written at the top of that file. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20250708133646.70384-3-andriy.shevchenko@linux.intel.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2025-07-17netfilter: nf_conntrack: fix crash due to removal of uninitialised entryFlorian Westphal1-2/+13
A crash in conntrack was reported while trying to unlink the conntrack entry from the hash bucket list: [exception RIP: __nf_ct_delete_from_lists+172] [..] #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack] #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack] #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack] [..] The nf_conn struct is marked as allocated from slab but appears to be in a partially initialised state: ct hlist pointer is garbage; looks like the ct hash value (hence crash). ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected ct->timeout is 30000 (=30s), which is unexpected. Everything else looks like normal udp conntrack entry. If we ignore ct->status and pretend its 0, the entry matches those that are newly allocated but not yet inserted into the hash: - ct hlist pointers are overloaded and store/cache the raw tuple hash - ct->timeout matches the relative time expected for a new udp flow rather than the absolute 'jiffies' value. If it were not for the presence of IPS_CONFIRMED, __nf_conntrack_find_get() would have skipped the entry. Theory is that we did hit following race: cpu x cpu y cpu z found entry E found entry E E is expired <preemption> nf_ct_delete() return E to rcu slab init_conntrack E is re-inited, ct->status set to 0 reply tuplehash hnnode.pprev stores hash value. cpu y found E right before it was deleted on cpu x. E is now re-inited on cpu z. cpu y was preempted before checking for expiry and/or confirm bit. ->refcnt set to 1 E now owned by skb ->timeout set to 30000 If cpu y were to resume now, it would observe E as expired but would skip E due to missing CONFIRMED bit. nf_conntrack_confirm gets called sets: ct->status |= CONFIRMED This is wrong: E is not yet added to hashtable. cpu y resumes, it observes E as expired but CONFIRMED: <resumes> nf_ct_expired() -> yes (ct->timeout is 30s) confirmed bit set. cpu y will try to delete E from the hashtable: nf_ct_delete() -> set DYING bit __nf_ct_delete_from_lists Even this scenario doesn't guarantee a crash: cpu z still holds the table bucket lock(s) so y blocks: wait for spinlock held by z CONFIRMED is set but there is no guarantee ct will be added to hash: "chaintoolong" or "clash resolution" logic both skip the insert step. reply hnnode.pprev still stores the hash value. unlocks spinlock return NF_DROP <unblocks, then crashes on hlist_nulls_del_rcu pprev> In case CPU z does insert the entry into the hashtable, cpu y will unlink E again right away but no crash occurs. Without 'cpu y' race, 'garbage' hlist is of no consequence: ct refcnt remains at 1, eventually skb will be free'd and E gets destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy. To resolve this, move the IPS_CONFIRMED assignment after the table insertion but before the unlock. Pablo points out that the confirm-bit-store could be reordered to happen before hlist add resp. the timeout fixup, so switch to set_bit and before_atomic memory barrier to prevent this. It doesn't matter if other CPUs can observe a newly inserted entry right before the CONFIRMED bit was set: Such event cannot be distinguished from above "E is the old incarnation" case: the entry will be skipped. Also change nf_ct_should_gc() to first check the confirmed bit. The gc sequence is: 1. Check if entry has expired, if not skip to next entry 2. Obtain a reference to the expired entry. 3. Call nf_ct_should_gc() to double-check step 1. nf_ct_should_gc() is thus called only for entries that already failed an expiry check. After this patch, once the confirmed bit check passes ct->timeout has been altered to reflect the absolute 'best before' date instead of a relative time. Step 3 will therefore not remove the entry. Without this change to nf_ct_should_gc() we could still get this sequence: 1. Check if entry has expired. 2. Obtain a reference. 3. Call nf_ct_should_gc() to double-check step 1: 4 - entry is still observed as expired 5 - meanwhile, ct->timeout is corrected to absolute value on other CPU and confirm bit gets set 6 - confirm bit is seen 7 - valid entry is removed again First do check 6), then 4) so the gc expiry check always picks up either confirmed bit unset (entry gets skipped) or expiry re-check failure for re-inited conntrack objects. This change cannot be backported to releases before 5.19. Without commit 8a75a2c17410 ("netfilter: conntrack: remove unconfirmed list") |= IPS_CONFIRMED line cannot be moved without further changes. Cc: Razvan Cojocaru <rzvncj@gmail.com> Link: https://lore.kernel.org/netfilter-devel/20250627142758.25664-1-fw@strlen.de/ Link: https://lore.kernel.org/netfilter-devel/4239da15-83ff-4ca4-939d-faef283471bb@gmail.com/ Fixes: 1397af5bfd7d ("netfilter: conntrack: remove the percpu dying list") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-17dt-bindings: clock: qcom: document the Milos Video Clock ControllerLuca Weiss1-0/+36
Add bindings documentation for the Milos (e.g. SM7635) Video Clock Controller. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Link: https://lore.kernel.org/r/20250715-sm7635-clocks-v3-10-18f9faac4984@fairphone.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17dt-bindings: clock: qcom: document the Milos GPU Clock ControllerLuca Weiss1-0/+56
Add bindings documentation for the Milos (e.g. SM7635) Graphics Clock Controller. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Link: https://lore.kernel.org/r/20250715-sm7635-clocks-v3-8-18f9faac4984@fairphone.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17dt-bindings: clock: qcom: document the Milos Display Clock ControllerLuca Weiss1-0/+61
Add bindings documentation for the Milos (e.g. SM7635) Display Clock Controller. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Link: https://lore.kernel.org/r/20250715-sm7635-clocks-v3-6-18f9faac4984@fairphone.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17dt-bindings: clock: qcom: document the Milos Camera Clock ControllerLuca Weiss1-0/+131
Add bindings documentation for the Milos (e.g. SM7635) Camera Clock Controller. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Link: https://lore.kernel.org/r/20250715-sm7635-clocks-v3-4-18f9faac4984@fairphone.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17dt-bindings: clock: qcom: document the Milos Global Clock ControllerLuca Weiss1-0/+210
Add bindings documentation for the Milos (e.g. SM7635) Global Clock Controller. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Link: https://lore.kernel.org/r/20250715-sm7635-clocks-v3-2-18f9faac4984@fairphone.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17dt-bindings: clock: qcom,x1e80100-gcc: Add missing video resetsStephan Gerhold1-0/+2
Add the missing video resets that are needed for the iris video codec. Acked-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org> Link: https://lore.kernel.org/r/20250709-x1e-videocc-v2-4-ad1acf5674b4@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17dt-bindings: clock: Add Qualcomm QCS615 Video clock controllerTaniya Das1-0/+30
Add DT bindings for the Video clock on QCS615 platforms. Add the relevant DT include definitions as well. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Taniya Das <quic_tdas@quicinc.com> Link: https://lore.kernel.org/r/20250702-qcs615-mm-v10-clock-controllers-v11-8-9c216e1615ab@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17dt-bindings: clock: Add Qualcomm QCS615 Graphics clock controllerTaniya Das1-0/+39
Add DT bindings for the Graphics clock on QCS615 platforms. Add the relevant DT include definitions as well. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Taniya Das <quic_tdas@quicinc.com> Link: https://lore.kernel.org/r/20250702-qcs615-mm-v10-clock-controllers-v11-6-9c216e1615ab@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17dt-bindings: clock: Add Qualcomm QCS615 Display clock controllerTaniya Das1-0/+52
Add DT bindings for the Display clock on QCS615 platforms. Add the relevant DT include definitions as well. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Taniya Das <quic_tdas@quicinc.com> Link: https://lore.kernel.org/r/20250702-qcs615-mm-v10-clock-controllers-v11-4-9c216e1615ab@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17dt-bindings: clock: Add Qualcomm QCS615 Camera clock controllerTaniya Das1-0/+110
Add DT bindings for the Camera clock on QCS615 platforms. Add the relevant DT include definitions as well. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Taniya Das <quic_tdas@quicinc.com> Link: https://lore.kernel.org/r/20250702-qcs615-mm-v10-clock-controllers-v11-2-9c216e1615ab@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17Merge branch '20250516-ipq5018-cmn-pll-v4-2-389a6b30e504@outlook.com' into ↵Bjorn Andersson1-0/+16
clk-for-6.17 Merge the IPQ5018 CMN PLL binding through a topic branch, to allow merging the clock defines into DeviceTree branch as well.
2025-07-17dt-bindings: clock: qcom: Add CMN PLL support for IPQ5018 SoCGeorge Moussalem1-0/+16
The CMN PLL block in the IPQ5018 SoC takes 96 MHZ as the reference input clock. Its output clocks are the XO (24Mhz), sleep (32Khz), and ethernet (50Mhz) clocks. Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: George Moussalem <george.moussalem@outlook.com> Link: https://lore.kernel.org/r/20250516-ipq5018-cmn-pll-v4-2-389a6b30e504@outlook.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17soc: qcom: spmi-pmic: add more PMIC SUBTYPE IDsRakesh Kota1-0/+2
Add the PMM8650AU and PMM8650AU_PSAIL PMIC SUBTYPE IDs and These PMICs are used by the qcs8300 and qcs9100 platforms. Signed-off-by: Rakesh Kota <rakesh.kota@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250704113036.1627695-1-rakesh.kota@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17dt-bindings: arm: qcom,ids: Add SoC IDs for SM7635 familyLuca Weiss1-0/+5
Add the SoC IDs of the 'volcano' family, namely SM7635, SM6650, SM6650P, QCM6690 and QCS6690. Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Link: https://lore.kernel.org/r/20250625-sm7635-socinfo-v1-1-be09d5c697b8@fairphone.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17firmware: qcom: scm: take struct device as argument in SHM bridge enableBartosz Golaszewski1-1/+0
qcom_scm_shm_bridge_enable() is used early in the SCM initialization routine. It makes an SCM call and so expects the internal __scm pointer in the SCM driver to be assigned. For this reason the tzmem memory pool is allocated *after* this pointer is assigned. However, this can lead to a crash if another consumer of the SCM API makes a call using the memory pool between the assignment of the __scm pointer and the initialization of the tzmem memory pool. As qcom_scm_shm_bridge_enable() is a special case, not meant to be called by ordinary users, pull it into the local SCM header. Make it take struct device as argument. This is the device that will be used to make the SCM call as opposed to the global __scm pointer. This will allow us to move the tzmem initialization *before* the __scm assignment in the core SCM driver. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250630-qcom-scm-race-v2-2-fa3851c98611@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17firmware: qcom: scm: remove unused arguments from SHM bridge routinesBartosz Golaszewski1-2/+2
qcom_scm_shm_bridge_create() and qcom_scm_shm_bridge_delete() take struct device as argument but don't use it. Remove it from these functions' prototypes. Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20250630-qcom-scm-race-v2-1-fa3851c98611@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-17bpf: Add struct bpf_token_infoTao Chen2-0/+19
The 'commit 35f96de04127 ("bpf: Introduce BPF token object")' added BPF token as a new kind of BPF kernel object. And BPF_OBJ_GET_INFO_BY_FD already used to get BPF object info, so we can also get token info with this cmd. One usage scenario, when program runs failed with token, because of the permission failure, we can report what BPF token is allowing with this API for debugging. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Tao Chen <chen.dylane@linux.dev> Link: https://lore.kernel.org/r/20250716134654.1162635-1-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16drm/amdgpu: Replace HQD terminology with slots namingJesse Zhang1-2/+2
The term "HQD" is CP-specific and doesn't accurately describe the queue resources for other IP blocks like SDMA, VCN, or VPE. This change: 1. Renames `num_hqds` to `num_slots` in amdgpu_kms.c to better reflect the generic nature of the resource counting 2. Updates the UAPI struct member from `userq_num_hqds` to `userq_num_slots` 3. Maintains the same functionality while using more appropriate terminology Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu: Add user queue instance count in HW IP infoJesse Zhang1-0/+2
This change exposes the number of available user queue instances for each hardware IP type (GFX, COMPUTE, SDMA) through the drm_amdgpu_info_hw_ip interface. Key changes: 1. Added userq_num_instance field to drm_amdgpu_info_hw_ip structure 2. Implemented counting of available HQD slots using: - mes.gfx_hqd_mask for GFX queues - mes.compute_hqd_mask for COMPUTE queues - mes.sdma_hqd_mask for SDMA queues 3. Only counts available instances when user queues are enabled (!disable_uq) v2: using the adev->mes.gfx_hqd_mask[]/compute_hqd_mask[]/sdma_hqd_mask[] masks to determine the number of queue slots available for each engine type (Alex) v3: rename userq_num_instance to userq_num_hqds (Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amd/amdgpu: Add helper functions for isp buffersPratap Nirujogi1-0/+51
Accessing amdgpu internal data structures "struct amdgpu_device" and "struct amdgpu_bo" in ISP V4L2 driver to alloc/free GART buffers is not recommended. Add new amdgpu_isp helper functions that takes opaque params from ISP V4L2 driver and calls the amdgpu internal functions amdgpu_bo_create_isp_user() and amdgpu_bo_create_kernel() to alloc/free GART buffers. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16Bluetooth: hci_dev: replace 'quirks' integer by 'quirk_flags' bitmapChristian Eggers2-12/+18
The 'quirks' member already ran out of bits on some platforms some time ago. Replace the integer member by a bitmap in order to have enough bits in future. Replace raw bit operations by accessor macros. Fixes: ff26b2dd6568 ("Bluetooth: Add quirk for broken READ_VOICE_SETTING") Fixes: 127881334eaa ("Bluetooth: Add quirk for broken READ_PAGE_SCAN_TYPE") Suggested-by: Pauli Virtanen <pav@iki.fi> Tested-by: Ivan Pravdin <ipravdin.official@gmail.com> Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: hci_core: add missing braces when using macro parametersChristian Eggers1-11/+11
Macro parameters should always be put into braces when accessing it. Fixes: 4fc9857ab8c6 ("Bluetooth: hci_sync: Add check simultaneous roles support") Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16Bluetooth: hci_core: fix typos in macrosChristian Eggers1-2/+2
The provided macro parameter is named 'dev' (rather than 'hdev', which may be a variable on the stack where the macro is used). Fixes: a9a830a676a9 ("Bluetooth: hci_event: Fix sending HCI_OP_READ_ENC_KEY_SIZE") Fixes: 6126ffabba6b ("Bluetooth: Introduce HCI_CONN_FLAG_DEVICE_PRIVACY device flag") Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16ACPI: APEI: handle synchronous exceptions in task workShuai Xue2-4/+0
The memory uncorrected error could be signaled by asynchronous interrupt (specifically, SPI in arm64 platform), e.g. when an error is detected by a background scrubber, or signaled by synchronous exception (specifically, data abort exception in arm64 platform), e.g. when a CPU tries to access a poisoned cache line. Currently, both synchronous and asynchronous errors use memory_failure_queue() to schedule memory_failure() to exectute in a kworker context. As a result, when a user-space process is accessing a poisoned data, a data abort is taken and the memory_failure() is executed in the kworker context, which: - will send wrong si_code by SIGBUS signal in early_kill mode, and - can not kill the user-space in some cases resulting a synchronous error infinite loop Issue 1: send wrong si_code in early_kill mode Since commit a70297d22132 ("ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events")', the flag MF_ACTION_REQUIRED could be used to determine whether a synchronous exception occurs on ARM64 platform. When a synchronous exception is detected, the kernel is expected to terminate the current process which has accessed a poisoned page. This is done by sending a SIGBUS signal with error code BUS_MCEERR_AR, indicating an action-required machine check error on read. However, when kill_proc() is called to terminate the processes who has the poisoned page mapped, it sends the incorrect SIGBUS error code BUS_MCEERR_AO because the context in which it operates is not the one where the error was triggered. To reproduce this problem: #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1 # STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 5 addr 0xffffb0d75000 page not present Test passed The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error and it is not factually correct. After this change: # STEP1: enable early kill mode #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1 # STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed The si_code (code 4) from einj_mem_uc indicates that it is a BUS_MCEERR_AR error as expected. Issue 2: a synchronous error infinite loop If a user-space process, e.g. devmem, accesses a poisoned page for which the HWPoison flag is set, kill_accessing_process() is called to send SIGBUS to current processs with error info. Since the memory_failure() is executed in the kworker context, it will just do nothing but return EFAULT. So, devmem will access the posioned page and trigger an exception again, resulting in a synchronous error infinite loop. Such exception loop may cause platform firmware to exceed some threshold and reboot when Linux could have recovered from this error. To reproduce this problem: # STEP 1: inject an UCE error, and kernel will set HWPosion flag for related page #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed # STEP 2: access the same page and it will trigger a synchronous error infinite loop devmem 0x4092d55b400 To fix above two issues, queue memory_failure() as a task_work so that it runs in the context of the process that is actually consuming the poisoned data. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Tested-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://patch.msgid.link/20250714114212.31660-3-xueshuai@linux.alibaba.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-07-16pmdomain: core: introduce dev_pm_genpd_is_on()Hiago De Franco1-0/+6
This helper function returns the current power status of a given generic power domain. As example, remoteproc/imx_rproc.c can now use this function to check the power status of the remote core to properly set "attached" or "offline" modes. Suggested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Hiago De Franco <hiago.franco@toradex.com> Link: https://lore.kernel.org/r/20250629172512.14857-2-hiagofranco@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-07-16cxl: Convert to ACQUIRE() for conditional rwsem lockingDan Williams1-0/+1
Use ACQUIRE() to cleanup conditional locking paths in the CXL driver The ACQUIRE() macro and its associated ACQUIRE_ERR() helpers, like scoped_cond_guard(), arrange for scoped-based conditional locking. Unlike scoped_cond_guard(), these macros arrange for an ERR_PTR() to be retrieved representing the state of the conditional lock. The goal of this conversion is to complete the removal of all explicit unlock calls in the subsystem. I.e. the methods to acquire a lock are solely via guard(), scoped_guard() (for limited cases), or ACQUIRE(). All unlock is implicit / scope-based. In order to make sure all lock sites are converted, the existing rwsem's are consolidated and renamed in 'struct cxl_rwsem'. While that makes the patch noisier it gives a clean cut-off between old-world (explicit unlock allowed), and new world (explicit unlock deleted). Cc: David Lechner <dlechner@baylibre.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Shiju Jose <shiju.jose@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250711234932.671292-9-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-07-16cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locksPeter Zijlstra3-16/+83
scoped_cond_guard(), automatic cleanup for conditional locks, has a couple pain points: * It causes existing straight-line code to be re-indented into a new bracketed scope. While this can be mitigated by a new helper function to contain the scope, that is not always a comfortable conversion. * The return code from the conditional lock is tossed in favor of a scheme to pass a 'return err;' statement to the macro. Other attempts to clean this up, to behave more like guard() [1], got hung up trying to both establish and evaluate the conditional lock in one statement. ACQUIRE() solves this by reflecting the result of the condition in the automatic variable established by the lock CLASS(). The result is separately retrieved with the ACQUIRE_ERR() helper, effectively a PTR_ERR() operation. Link: http://lore.kernel.org/all/Z1LBnX9TpZLR5Dkf@gmail.com [1] Link: http://patch.msgid.link/20250512105026.GP4439@noisy.programming.kicks-ass.net Link: http://patch.msgid.link/20250512185817.GA1808@noisy.programming.kicks-ass.net Cc: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: David Lechner <dlechner@baylibre.com> Cc: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [djbw: wrap Peter's proposal with changelog and comments] Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20250711234932.671292-2-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-07-16drm/gem/afbc: Eliminate redundant drm_get_format_info()Ville Syrjälä1-0/+1
Pass along the format info from .fb_create() to aliminate the redundant drm_get_format_info() calls from the afbc code. Cc: Sandy Huang <hjc@rock-chips.com> Cc: "Heiko Stübner" <heiko@sntech.de> Cc: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250701090722.13645-9-ville.syrjala@linux.intel.com
2025-07-16drm/gem: Pass along the format info from .fb_create() to ↵Ville Syrjälä1-0/+2
drm_helper_mode_fill_fb_struct() Pass along the format info from .fb_create() to eliminate the redundant drm_get_format_info() calls from the gem fb code. v2: Fix kernel docs (Laurent) Cc: Dave Airlie <airlied@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Sandy Huang <hjc@rock-chips.com> Cc: "Heiko Stübner" <heiko@sntech.de> Cc: Andy Yan <andy.yan@rock-chips.com> Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Cc: virtualization@lists.linux.dev Cc: spice-devel@lists.freedesktop.org Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250701090722.13645-8-ville.syrjala@linux.intel.com
2025-07-16drm: Allow the caller to pass in the format info to ↵Ville Syrjälä1-0/+2
drm_helper_mode_fill_fb_struct() Soon all drivers should have the format info already available in the places where they call drm_helper_mode_fill_fb_struct(). Allow it to be passed along into drm_helper_mode_fill_fb_struct() instead of doing yet another redundant lookup. Start by always passing in NULL and still doing the extra lookup. The actual changes to avoid the lookup will follow. Done with cocci (with some manual fixups): @@ identifier dev, fb, mode_cmd; expression get_format_info; @@ void drm_helper_mode_fill_fb_struct(struct drm_device *dev, struct drm_framebuffer *fb, + const struct drm_format_info *info, const struct drm_mode_fb_cmd2 *mode_cmd) { ... - fb->format = get_format_info; + fb->format = info ?: get_format_info; ... } @@ identifier dev, fb, mode_cmd; @@ void drm_helper_mode_fill_fb_struct(struct drm_device *dev, struct drm_framebuffer *fb, + const struct drm_format_info *info, const struct drm_mode_fb_cmd2 *mode_cmd); @@ expression dev, fb, mode_cmd; @@ drm_helper_mode_fill_fb_struct(dev, fb + ,NULL ,mode_cmd); Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Liviu Dudau <liviu.dudau@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Inki Dae <inki.dae@samsung.com> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <lumag@kernel.org> Cc: Sean Paul <sean@poorly.run> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Mikko Perttunen <mperttunen@nvidia.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Dmitry Osipenko <dmitry.osipenko@collabora.com> Cc: Gurchetan Singh <gurchetansingh@chromium.org> Cc: Chia-I Wu <olvaffe@gmail.com> Cc: Zack Rusin <zack.rusin@broadcom.com> Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> Cc: amd-gfx@lists.freedesktop.org Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: linux-tegra@vger.kernel.org Cc: virtualization@lists.linux.dev Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250701090722.13645-6-ville.syrjala@linux.intel.com