summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2026-01-30igc: Reduce TSN TX packet buffer from 7KB to 5KB per queueChwee-Lin Choong1-2/+3
[ Upstream commit 8ad1b6c1e63d25f5465b7a8aa403bdcee84b86f9 ] The previous 7 KB per queue caused TX unit hangs under heavy timestamping load. Reducing to 5 KB avoids these hangs and matches the TSN recommendation in I225/I226 SW User Manual Section 7.5.4. The 8 KB "freed" by this change is currently unused. This reduction is not expected to impact throughput, as the i226 is PCIe-limited for small TSN packets rather than TX-buffer-limited. Fixes: 0d58cdc902da ("igc: optimize TX packet buffer utilization for TSN mode") Reported-by: Zdenek Bouska <zdenek.bouska@siemens.com> Closes: https://lore.kernel.org/netdev/AS1PR10MB5675DBFE7CE5F2A9336ABFA4EBEAA@AS1PR10MB5675.EURPRD10.PROD.OUTLOOK.COM/ Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Chwee-Lin Choong <chwee.lin.choong@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30igc: fix race condition in TX timestamp read for register 0Chwee-Lin Choong1-18/+25
[ Upstream commit 6990dc392a9ab10e52af37e0bee8c7b753756dc4 ] The current HW bug workaround checks the TXTT_0 ready bit first, then reads TXSTMPL_0 twice (before and after reading TXSTMPH_0) to detect whether a new timestamp was captured by timestamp register 0 during the workaround. This sequence has a race: if a new timestamp is captured after checking the TXTT_0 bit but before the first TXSTMPL_0 read, the detection fails because both the "old" and "new" values come from the same timestamp. Fix by reading TXSTMPL_0 first to establish a baseline, then checking the TXTT_0 bit. This ensures any timestamp captured during the race window will be detected. Old sequence: 1. Check TXTT_0 ready bit 2. Read TXSTMPL_0 (baseline) 3. Read TXSTMPH_0 (interrupt workaround) 4. Read TXSTMPL_0 (detect changes vs baseline) New sequence: 1. Read TXSTMPL_0 (baseline) 2. Check TXTT_0 ready bit 3. Read TXSTMPH_0 (interrupt workaround) 4. Read TXSTMPL_0 (detect changes vs baseline) Fixes: c789ad7cbebc ("igc: Work around HW bug causing missing timestamps") Suggested-by: Avi Shalev <avi.shalev@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Co-developed-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Chwee-Lin Choong <chwee.lin.choong@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30igc: Restore default Qbv schedule when changing channelsKurt Kanzenbach2-2/+7
[ Upstream commit 41a9a6826f20a524242a6c984845c4855f629841 ] The Multi-queue Priority (MQPRIO) and Earliest TxTime First (ETF) offloads utilize the Time Sensitive Networking (TSN) Tx mode. This mode is always coupled to IEEE 802.1Qbv time aware shaper (Qbv). Therefore, the driver sets a default Qbv schedule of all gates opened and a cycle time of 1s. This schedule is set during probe. However, the following sequence of events lead to Tx issues: - Boot a dual core system igc_probe(): igc_tsn_clear_schedule(): -> Default Schedule is set Note: At this point the driver has allocated two Tx/Rx queues, because there are only two CPUs. - ethtool -L enp3s0 combined 4 igc_ethtool_set_channels(): igc_reinit_queues() -> Default schedule is gone, per Tx ring start and end time are zero - tc qdisc replace dev enp3s0 handle 100 parent root mqprio \ num_tc 4 map 3 3 2 2 0 1 1 1 3 3 3 3 3 3 3 3 \ queues 1@0 1@1 1@2 1@3 hw 1 igc_tsn_offload_apply(): igc_tsn_enable_offload(): -> Writes zeros to IGC_STQT(i) and IGC_ENDQT(i), causing Tx to stall/fail Therefore, restore the default Qbv schedule after changing the number of channels. Furthermore, add a restriction to not allow queue reconfiguration when TSN/Qbv is enabled, because it may lead to inconsistent states. Fixes: c814a2d2d48f ("igc: Use default cycle 'start' and 'end' values for queues") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ice: Fix incorrect timeout ice_release_res()Ding Hui1-1/+1
[ Upstream commit 01139a2ce532d77379e1593230127caa261a8036 ] The commit 5f6df173f92e ("ice: implement and use rd32_poll_timeout for ice_sq_done timeout") converted ICE_CTL_Q_SQ_CMD_TIMEOUT from jiffies to microseconds. But the ice_release_res() function was missed, and its logic still treats ICE_CTL_Q_SQ_CMD_TIMEOUT as a jiffies value. So correct the issue by usecs_to_jiffies(). Found by inspection of the DDP downloading process. Compile and modprobe tested only. Fixes: 5f6df173f92e ("ice: implement and use rd32_poll_timeout for ice_sq_done timeout") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ice: Avoid detrimental cleanup for bond during interface stopDave Ertman1-8/+17
[ Upstream commit a9d45c22ed120cdd15ff56d0a6e4700c46451901 ] When the user issues an administrative down to an interface that is the primary for an aggregate bond, the prune lists are being purged. This breaks communication to the secondary interface, which shares a prune list on the main switch block while bonded together. For the primary interface of an aggregate, avoid deleting these prune lists during stop, and since they are hardcoded to specific values for the default vlan and QinQ vlans, the attempt to re-add them during the up phase will quietly fail without any additional problem. Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ice: initialize ring_stats->syncpJacob Keller1-0/+4
[ Upstream commit 8439016c3b8b5ab687c2420317b1691585106611 ] The u64_stats_sync structure is empty on 64-bit systems. However, on 32-bit systems it contains a seqcount_t which needs to be initialized. While the memory is zero-initialized, a lack of u64_stats_init means that lockdep won't get initialized properly. Fix this by adding u64_stats_init() calls to the rings just after allocation. Fixes: 2b245cb29421 ("ice: Implement transmit and NAPI support") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30wifi: ath12k: Fix wrong P2P device link id issueYingying Tang1-1/+4
[ Upstream commit 31707572108da55a005e7fed32cc3869c16b7c16 ] Wrong P2P device link id value of 0 was introduced in ath12k_mac_op_tx() by [1]. During the P2P negotiation process, there is only one scan vdev with link ID 15. Currently, the device link ID is incorrectly set to 0 in ath12k_mac_op_tx() during the P2P negotiation process, which leads to TX failures. Set the correct P2P device link ID to 15 to fix the TX failure issue. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3 Fixes: 648a121bafa3 ("wifi: ath12k: ath12k_mac_op_tx(): MLO support") # [1] Signed-off-by: Yingying Tang <yingying.tang@oss.qualcomm.com> Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Cc: linux-next@vger.kernel.org Cc: netdev@vger.kernel.org Link: https://patch.msgid.link/20260113054636.2620035-1-yingying.tang@oss.qualcomm.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2026-01-30wifi: ath12k: fix dead lock while flushing management framesBaochen Qiang1-3/+3
[ Upstream commit f88e9fc30a261d63946ddc6cc6a33405e6aa27c3 ] Commit [1] converted the management transmission work item into a wiphy work. Since a wiphy work can only run under wiphy lock protection, a race condition happens in below scenario: 1. a management frame is queued for transmission. 2. ath12k_mac_op_flush() gets called to flush pending frames associated with the hardware (i.e, vif being NULL). Then in ath12k_mac_flush() the process waits for the transmission done. 3. Since wiphy lock has been taken by the flush process, the transmission work item has no chance to run, hence the dead lock. >From user view, this dead lock results in below issue: wlp8s0: authenticate with xxxxxx (local address=xxxxxx) wlp8s0: send auth to xxxxxx (try 1/3) wlp8s0: authenticate with xxxxxx (local address=xxxxxx) wlp8s0: send auth to xxxxxx (try 1/3) wlp8s0: authenticated wlp8s0: associate with xxxxxx (try 1/3) wlp8s0: aborting association with xxxxxx by local choice (Reason: 3=DEAUTH_LEAVING) ath12k_pci 0000:08:00.0: failed to flush mgmt transmit queue, mgmt pkts pending 1 The dead lock can be avoided by invoking wiphy_work_flush() to proactively run the queued work item. Note actually it is already present in ath12k_mac_op_flush(), however it does not protect the case where vif being NULL. Hence move it ahead to cover this case as well. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3 Fixes: 56dcbf0b5207 ("wifi: ath12k: convert struct ath12k::wmi_mgmt_tx_work to struct wiphy_work") # [1] Reported-by: Stuart Hayhurst <stuart.a.hayhurst@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220959 Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Link: https://patch.msgid.link/20260113-ath12k-fix-dead-lock-while-flushing-v1-1-9713621f3a0f@oss.qualcomm.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30wifi: ath12k: Fix scan state stuck in ABORTING after cancel_remain_on_channelYingying Tang1-1/+1
[ Upstream commit 8b8d6ee53dfdee61b0beff66afe3f712456e707a ] Scan finish workqueue was introduced in __ath12k_mac_scan_finish() by [1]. During ath12k_mac_op_cancel_remain_on_channel(), scan state is set to ABORTING and should be reset to IDLE in the queued work. However, wiphy_work_cancel() is called before exiting ath12k_mac_op_cancel_remain_on_channel(), which prevents the work from running and leaves the state in ABORTING. This blocks all subsequent scan requests. Replace wiphy_work_cancel() with wiphy_work_flush() to ensure the queued work runs and scan state is reset to IDLE. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3 Fixes: 3863f014ad23 ("wifi: ath12k: symmetrize scan vdev creation and deletion during HW scan") # [1] Signed-off-by: Yingying Tang <yingying.tang@oss.qualcomm.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Link: https://patch.msgid.link/20260112115516.2144219-1-yingying.tang@oss.qualcomm.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30wifi: ath12k: cancel scan only on active scan vdevManish Dharanenthiran1-1/+2
[ Upstream commit 39c90b1a1dbe6d7c49d19da6e5aec00980c55d8b ] Cancel the scheduled scan request only on the vdev that has an active scan running. Currently, ahvif->links_map is used to obtain the links, but this includes links for which no scan is scheduled. In failure cases where the scan fails due to an invalid channel definition, other links which are not yet brought up (vdev not created) may also be accessed, leading to the following trace: Unable to handle kernel paging request at virtual address 0000000000004c8c pc : _raw_spin_lock_bh+0x1c/0x54 lr : ath12k_scan_abort+0x20/0xc8 [ath12k] Call trace: _raw_spin_lock_bh+0x1c/0x54 (P) ath12k_mac_op_cancel_hw_scan+0xac/0xc4 [ath12k] ieee80211_scan_cancel+0xcc/0x12c [mac80211] ieee80211_do_stop+0x6c4/0x7a8 [mac80211] ieee80211_stop+0x60/0xd8 [mac80211] Skip links that are not created or are not the current scan vdev. This ensures only the scan for the matching links is aborted and avoids aborting unrelated links during cancellation, thus aligning with how start/cleanup manage ar->scan.arvif. Also, remove the redundant arvif->is_started check from ath12k_mac_op_cancel_hw_scan() that was introduced in commit 3863f014ad23 ("wifi: ath12k: symmetrize scan vdev creation and deletion during HW scan") to avoid deleting the scan interface if the scan is triggered on the existing AP vdev as this use case is already handled in ath12k_scan_vdev_clean_work(). Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Fixes: feed05f1526e ("wifi: ath12k: Split scan request for split band device") Signed-off-by: Manish Dharanenthiran <manish.dharanenthiran@oss.qualcomm.com> Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Link: https://patch.msgid.link/20260107-scan_vdev-v1-1-b600aedc645a@qti.qualcomm.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ata: libata: Print features also for ATAPI devicesNiklas Cassel1-0/+3
[ Upstream commit c8c6fb886f57d5bf71fb6de6334a143608d35707 ] Commit d633b8a702ab ("libata: print feature list on device scan") added a print of the features supported by the device for ATA_DEV_ATA and ATA_DEV_ZAC devices, but not for ATA_DEV_ATAPI devices. Fix this by printing the features also for ATAPI devices. Before changes: ata1.00: ATAPI: Slimtype DVD A DU8AESH, 6C2M, max UDMA/133 After changes: ata1.00: ATAPI: Slimtype DVD A DU8AESH, 6C2M, max UDMA/133 ata1.00: Features: Dev-Attention HIPM DIPM Fixes: d633b8a702ab ("libata: print feature list on device scan") Signed-off-by: Niklas Cassel <cassel@kernel.org> Tested-by: Wolf <wolf@yoxt.cc> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ata: libata: Add DIPM and HIPM to ata_dev_print_features() early returnNiklas Cassel1-1/+2
[ Upstream commit 89531b68fc293e91187bf0992147e8d22c65cff3 ] ata_dev_print_features() is supposed to return early and not print anything if there are no features supported. However, commit b1f5af54f1f5 ("ata: libata-core: Advertize device support for DIPM and HIPM features") added additional features to ata_dev_print_features() without updating the early return conditional. Add the missing features to the early return conditional. Fixes: b1f5af54f1f5 ("ata: libata-core: Advertize device support for DIPM and HIPM features") Signed-off-by: Niklas Cassel <cassel@kernel.org> Tested-by: Wolf <wolf@yoxt.cc> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ata: libata: Add cpr_log to ata_dev_print_features() early returnNiklas Cassel1-1/+1
[ Upstream commit a6bee5e5243ad02cae575becc4c83df66fc29573 ] ata_dev_print_features() is supposed to return early and not print anything if there are no features supported. However, commit fe22e1c2f705 ("libata: support concurrent positioning ranges log") added another feature to ata_dev_print_features() without updating the early return conditional. Add the missing feature to the early return conditional. Fixes: fe22e1c2f705 ("libata: support concurrent positioning ranges log") Signed-off-by: Niklas Cassel <cassel@kernel.org> Tested-by: Wolf <wolf@yoxt.cc> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ata: libata-sata: Improve link_power_management_supported sysfs attributeNiklas Cassel1-1/+1
[ Upstream commit ce83767ea323baf8509a75eb0c783cd203e14789 ] The link_power_management_supported sysfs attribute is currently set as true even for ata ports that lack a .set_lpm() callback, e.g. dummy ports. This is a bit silly, because while writing to the link_power_management_policy sysfs attribute will make ata_scsi_lpm_store() update ap->target_lpm_policy (thus sysfs will reflect the new value) and call ata_port_schedule_eh() for the port, it is essentially a no-op. This is because for a port without a .set_lpm() callback, once EH gets to run, the ata_eh_link_set_lpm() will simply return, since the port does not provide a .set_lpm() callback. Thus, make sure that the link_power_management_supported sysfs attribute is set to false for ports that lack a .set_lpm() callback. This way the link_power_management_policy sysfs attribute will no longer be writable, so we will no longer be misleading users to think that their sysfs write actually does something. Fixes: 0060beec0bfa ("ata: libata-sata: Add link_power_management_supported sysfs attribute") Signed-off-by: Niklas Cassel <cassel@kernel.org> Tested-by: Wolf <wolf@yoxt.cc> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ata: libata: Call ata_dev_config_lpm() for ATAPI devicesNiklas Cassel1-0/+2
[ Upstream commit 8f3fb33f8f3f825c708ece800c921977c157f9b6 ] Commit d360121832d8 ("ata: libata-core: Introduce ata_dev_config_lpm()") introduced ata_dev_config_lpm(). However, it only called this function for ATA_DEV_ATA and ATA_DEV_ZAC devices, not for ATA_DEV_ATAPI devices. Additionally, commit d99a9142e782 ("ata: libata-core: Move device LPM quirk settings to ata_dev_config_lpm()") moved the LPM quirk application from ata_dev_configure() to ata_dev_config_lpm(), causing LPM quirks for ATAPI devices to no longer be applied. Call ata_dev_config_lpm() also for ATAPI devices, such that LPM quirks are applied for ATAPI devices with an entry in __ata_dev_quirks once again. Fixes: d360121832d8 ("ata: libata-core: Introduce ata_dev_config_lpm()") Fixes: d99a9142e782 ("ata: libata-core: Move device LPM quirk settings to ata_dev_config_lpm()") Signed-off-by: Niklas Cassel <cassel@kernel.org> Tested-by: Wolf <wolf@yoxt.cc> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30ata: ahci: Do not read the per port area for unimplemented portsNiklas Cassel1-5/+5
[ Upstream commit ea4d4ea6d10a561043922d285f1765c7e4bfd32a ] An AHCI HBA specifies the number of ports it supports using CAP.NP. The HBA is free to only make a subset of the number of ports available using the PI (Ports Implemented) register. libata currently creates dummy ports for HBA ports that are provided by the HBA, but which are marked as "unavailable" using the PI register. Each port will have a per port area of registers in the HBA, regardless if the port is marked as "unavailable" or not. ahci_mark_external_port() currently reads this per port area of registers using readl() to see if the port is marked as external/hotplug-capable. However, AHCI 1.3.1, section "3.1.4 Offset 0Ch: PI – Ports Implemented" states: "Software must not read or write to registers within unavailable ports." Thus, make sure that we only call ahci_mark_external_port() and ahci_update_initial_lpm_policy() for ports that are implemented. From a libata perspective, this should not change anything related to LPM, as dummy ports do not provide any ap->ops (they do not have a .set_lpm() callback), so even if EH were to call .set_lpm() on a dummy port, it was already a no-op. Fixes: f7131935238d ("ata: ahci: move marking of external port earlier") Signed-off-by: Niklas Cassel <cassel@kernel.org> Tested-by: Wolf <wolf@yoxt.cc> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30wifi: ath12k: don't force radio frequency check in freq_to_idx()Baochen Qiang1-8/+1
[ Upstream commit 1fed08c5519d2f929457f354d3c06c6a8c33829c ] freq_to_idx() is used to map a channel to a survey index. Commit acc152f9be20 ("wifi: ath12k: combine channel list for split-phy devices in single-wiphy") adds radio specific frequency range check in this helper to make sure an invalid index is returned if the channel falls outside that range. However, this check introduces a race, resulting in below warnings as reported in [1]. ath12k_pci 0000:08:00.0: chan info: invalid frequency 6455 (idx 101 out of bounds) ath12k_pci 0000:08:00.0: chan info: invalid frequency 6535 (idx 101 out of bounds) ath12k_pci 0000:08:00.0: chan info: invalid frequency 6615 (idx 101 out of bounds) ath12k_pci 0000:08:00.0: chan info: invalid frequency 6695 (idx 101 out of bounds) ath12k_pci 0000:08:00.0: chan info: invalid frequency 6775 (idx 101 out of bounds) ath12k_pci 0000:08:00.0: chan info: invalid frequency 6855 (idx 101 out of bounds) ath12k_pci 0000:08:00.0: chan info: invalid frequency 6935 (idx 101 out of bounds) ath12k_pci 0000:08:00.0: chan info: invalid frequency 7015 (idx 101 out of bounds) ath12k_pci 0000:08:00.0: chan info: invalid frequency 7095 (idx 101 out of bounds) ath12k_pci 0000:08:00.0: chan info: invalid frequency 6435 (idx 101 out of bounds) Race scenario: 1) A regdomain covering below frequency range is uploaded to host via WMI_REG_CHAN_LIST_CC_EXT_EVENTID event: Country 00, CFG Regdomain UNSET FW Regdomain 0, num_reg_rules 6 1. (2402 - 2472 @ 40) (0, 20) (0 ms) (FLAGS 360448) (0, 0) 2. (2457 - 2477 @ 20) (0, 20) (0 ms) (FLAGS 360576) (0, 0) 3. (5170 - 5330 @ 160) (0, 20) (0 ms) (FLAGS 264320) (0, 0) 4. (5490 - 5730 @ 160) (0, 20) (0 ms) (FLAGS 264320) (0, 0) 5. (5735 - 5895 @ 160) (0, 20) (0 ms) (FLAGS 264320) (0, 0) 6. (5925 - 7125 @ 320) (0, 24) (0 ms) (FLAGS 2056) (0, 255) As a result, radio frequency range is updated as [2402, 7125] ath12k_pci 0000:08:00.0: mac pdev 0 freq limit updated. New range 2402->7125 MHz If no scan in progress or after scan finished, command WMI_SCAN_CHAN_LIST_CMDID is sent to firmware notifying that firmware is allowed to do scan on all channels within that range. The running path is: /* redomain uploaded */ 1. WMI_REG_CHAN_LIST_CC_EXT_EVENTID 2. ath12k_reg_chan_list_event() 3. ath12k_reg_handle_chan_list() 4. queue_work(..., &ar->regd_update_work) 5. ath12k_regd_update_work() 6. ath12k_regd_update() /* update radio frequency range */ 7. ath12k_mac_update_freq_range() 8. regulatory_set_wiphy_regd() 9. ath12k_reg_notifier() 10. ath12k_reg_update_chan_list() 11. queue_work(..., &ar->regd_channel_update_work) 12. ath12k_regd_update_chan_list_work() /* wait scan finishes */ 13. wait_for_completion_timeout(&ar->scan.completed, ...) /* command notifying list of valid channels */ 14. ath12k_wmi_send_scan_chan_list_cmd() 2) Hardware scan is triggered on all allowed channels. 3) Before scan completed, 11D mechanism detects a new country code ath12k_pci 0000:08:00.0: wmi 11d new cc GB With this code sent to firmware, firmware uploads a new regdomain Country GB, CFG Regdomain ETSI FW Regdomain 2, num_reg_rules 9 1. (2402 - 2482 @ 40) (0, 20) (0 ms) (FLAGS 360448) (0, 0) 2. (5170 - 5250 @ 80) (0, 23) (0 ms) (FLAGS 264192) (0, 0) 3. (5250 - 5330 @ 80) (0, 23) (0 ms) (FLAGS 264216) (0, 0) 4. (5490 - 5590 @ 80) (0, 30) (0 ms) (FLAGS 264208) 5. (5590 - 5650 @ 40) (0, 30) (600000 ms) (FLAGS 264208) 6. (5650 - 5730 @ 80) (0, 30) (0 ms) (FLAGS 264208) 7. (5735 - 5875 @ 80) (0, 14) (0 ms) (FLAGS 264192) (0, 0) 8. (5855 - 5875 @ 20) (0, 14) (0 ms) (FLAGS 264192) (0, 0) 9. (5945 - 6425 @ 320) (0, 24) (0 ms) (FLAGS 2056) (0, 11) Then radio frequency range is updated as [2402, 6425] ath12k_pci 0000:08:00.0: mac pdev 0 freq limit updated. New range 2402->6425 MHz Please note this is a smaller range than the previous one. Later host runs the same path for the purpose of notifying the new channel list. However since scan not completed, host just waits there. Meanwhile, firmware is possibly scanning channels outside the new range. As a result, WMI_CHAN_INFO_EVENTID events for those channels fail freq_to_idx() check and triggers warnings above. Fix this issue by removing radio frequency check in freq_to_idx(). This is valid because channels being scanned do not synchronize with frequency range update. Besides, this won't cause any problem, since freq_to_idx() is only used for survey data. Even out-of-range channels filled in the survey, they won't get delivered to userspace due to the range check already there in ath12k_mac_op_get_survey(). Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3 Fixes: acc152f9be20 ("wifi: ath12k: combine channel list for split-phy devices in single-wiphy") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220871 # 1 Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Link: https://patch.msgid.link/20260108-ath12k-fix-freq-to-idx-v1-1-b2458cf7aa0d@oss.qualcomm.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30pmdomain: qcom: rpmhpd: Add MXC to SC8280XPKonrad Dybcio1-0/+4
[ Upstream commit 5bc3e720e725cd5fa34875fa1e5434d565858067 ] This was apparently accounted for in dt-bindings, but never made its way into the driver. Fix it for SC8280XP and its VDD_GFX-less cousin, SA8540P. Fixes: f68f1cb3437d ("soc: qcom: rpmhpd: add sc8280xp & sa8540p rpmh power-domains") Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20251202-topic-8280_mxc-v2-2-46cdf47a829e@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30dt-bindings: power: qcom,rpmpd: Add SC8280XP_MXC_AOKonrad Dybcio1-0/+1
[ Upstream commit 45e1be5ddec98db71e7481fa7a3005673200d85c ] Not sure how useful it's gonna be in practice, but the definition is missing (unlike the previously-unused SC8280XP_MXC-non-_AO), so add it to allow the driver to create the corresponding pmdomain. Fixes: dbfb5f94e084 ("dt-bindings: power: rpmpd: Add sc8280xp RPMh power-domains") Acked-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20251202-topic-8280_mxc-v2-1-46cdf47a829e@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30btrfs: fix missing fields in superblock backup with BLOCK_GROUP_TREEMark Harmstone1-1/+1
[ Upstream commit 1d8f69f453c2e8a2d99b158e58e02ed65031fa6d ] When the BLOCK_GROUP_TREE compat_ro flag is set, the extent root and csum root fields are getting missed. This is because EXTENT_TREE_V2 treated these differently, and when they were split off this special-casing was mistakenly assigned to BGT rather than the rump EXTENT_TREE_V2. There's no reason why the existence of the block group tree should mean that we don't record the details of the last commit's extent root and csum root. Fix the code in backup_super_roots() so that the correct check gets made. Fixes: 1c56ab991903 ("btrfs: separate BLOCK_GROUP_TREE compat RO flag from EXTENT_TREE_V2") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30Drivers: hv: Always do Hyper-V panic notification in hv_kmsg_dump()Michael Kelley1-5/+7
[ Upstream commit 49f49d47af67f8a7b221db1d758fc634242dc91a ] hv_kmsg_dump() currently skips the panic notification entirely if it doesn't get any message bytes to pass to Hyper-V due to an error from kmsg_dump_get_buffer(). Skipping the notification is undesirable because it leaves the Hyper-V host uncertain about the state of a panic'ed guest. Fix this by always doing the panic notification, even if bytes_written is zero. Also ensure that bytes_written is initialized, which fixes a kernel test robot warning. The warning is actually bogus because kmsg_dump_get_buffer() happens to set bytes_written even if it fails, and in the kernel test robot's CONFIG_PRINTK not set case, hv_kmsg_dump() is never called. But do the initialization for robustness and to quiet the static checker. Fixes: 9c318a1d9b50 ("Drivers: hv: move panic report code from vmbus to hv early init code") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/202512172103.OcUspn1Z-lkp@intel.com/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Roman Kisel <vdso@mailbox.org> Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30perf parse-events: Fix evsel allocation failureFaisal Bukhari1-2/+5
[ Upstream commit 1eb217ab2e737609f8a861b517649e82e7236d05 ] If evsel__new_idx() returns NULL, the function currently jumps to label 'out_err'. Here, references to `cpus` and `pmu_cpus` are dropped. Also, resources held by evsel->name and evsel->metric_id are freed. But if evsel__new_idx() returns NULL, it can lead to NULL pointer dereference. Fixes: cd63c22168257a0b ("perf parse-events: Minor __add_event refactoring") Signed-off-by: Faisal Bukhari <faisalbukhari523@gmail.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30arm64: dts: rockchip: Fix wrong register range of rk3576 gpuChaoyi Chen1-1/+1
[ Upstream commit 955b263c421c6fe5075369c52199f278289ec8c4 ] According to RK3576 TRM part1 Table 1-1 Address Mapping, the size of the GPU registers is 128 KB. The current mapping incorrectly includes the addresses of multiple following IP like the eInk interface at 0x27900000. This has not been detected by the DT tooling as none of the extra mapped IP is described in the upstream RK3576 DT so far. Fixes: 57b1ce903966 ("arm64: dts: rockchip: Add rk3576 SoC base DT") Signed-off-by: Chaoyi Chen <chaoyi.chen@rock-chips.com> Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://patch.msgid.link/20260106071513.209-1-kernel@airkyi.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30arm64: dts: qcom: sm8650: Fix compile warnings in USB controller nodeKrishna Kurapati1-3/+0
[ Upstream commit 1f6ca557088eb96c8c554f853eb7c60862f8a0a8 ] With W=1, the following error comes up: Warning (avoid_unnecessary_addr_size): /soc@0/usb@a600000: unnecessary #address-cells/#size-cells without "ranges", "dma-ranges" or child "reg" or "ranges" property This is because the child node being removed during flattening and moving to latest bindings. Fixes: 77e1f16b9302 ("arm64: dts: qcom: sm8650: Flatten the USB nodes") Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251203144856.2711440-3-krishna.kurapati@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30arm64: dts: qcom: sm8550: Fix compile warnings in USB controller nodeKrishna Kurapati1-2/+0
[ Upstream commit 9dbc9bed01837717b8ab755cf5067a6f8d35b00f ] With W=1, the following error comes up: Warning (avoid_unnecessary_addr_size): /soc@0/usb@a600000: unnecessary #address-cells/#size-cells without "ranges", "dma-ranges" or child "reg" or "ranges" property This is because the child node being removed during flattening and moving to latest bindings. Fixes: 33450878adfc ("arm64: dts: qcom: sm8550: Flatten the USB nodes") Signed-off-by: Krishna Kurapati <krishna.kurapati@oss.qualcomm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251203144856.2711440-2-krishna.kurapati@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-30arm64: dts: qcom: sc8280xp: Add missing VDD_MXC linksKonrad Dybcio1-4/+12
[ Upstream commit 868b979c5328b867c95a6d5a93ba13ad0d3cd2f1 ] To make sure that power rail is voted for, wire it up to its consumers. Fixes: 152d1faf1e2f ("arm64: dts: qcom: add SC8280XP platform") Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20251202-topic-8280_mxc-v2-3-46cdf47a829e@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-23Linux 6.18.7v6.18.7Greg Kroah-Hartman1-1/+1
Link: https://lore.kernel.org/r/20260121181418.537774329@linuxfoundation.org Tested-by: Salvatore Bonaccorso <carnil@debian.org> Tested-by: Ronald Warsow <rwarsow@gmx.de> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Justin M. Forbes <jforbes@fedoraproject.org> Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com> Tested-by: Brett A C Sheffield <bacs@librecast.net> Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Brett Mastbergen <bmastbergen@ciq.com> Tested-by: Peter Schneider <pschneider1968@googlemail.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23iommu/sva: include mmu_notifier.h headerCarlos Llamas1-0/+1
commit 4b5c493ff762bb0433529ca6870b284f0a2a5ca8 upstream. A call to mmu_notifier_arch_invalidate_secondary_tlbs() was introduced in commit e37d5a2d60a3 ("iommu/sva: invalidate stale IOTLB entries for kernel address space") but without explicitly adding its corresponding header file <linux/mmu_notifier.h>. This was evidenced while trying to enable compile testing support for IOMMU_SVA: config IOMMU_SVA select IOMMU_MM_DATA - bool + bool "Shared Virtual Addressing" if COMPILE_TEST The thing is for certain architectures this header file is indirectly included via <asm/tlbflush.h>. However, for others such as 32-bit arm the header is missing and it results in a build failure: $ make ARCH=arm allmodconfig [...] drivers/iommu/iommu-sva.c:340:3: error: call to undeclared function 'mmu_notifier_arch_invalidate_secondary_tlbs' [...] 340 | mmu_notifier_arch_invalidate_secondary_tlbs(iommu_mm->mm, start, end); | ^ Fix this by including the appropriate header file. Link: https://lkml.kernel.org/r/20260105190747.625082-1-cmllamas@google.com Fixes: e37d5a2d60a3 ("iommu/sva: invalidate stale IOTLB entries for kernel address space") Signed-off-by: Carlos Llamas <cmllamas@google.com> Cc: Baolu Lu <baolu.lu@linux.intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joerg Roedel <joro@8bytes.org> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23Revert "functionfs: fix the open/removal races"Greg Kroah-Hartman1-43/+10
This reverts commit b49c766856fb5901490de577e046149ebf15e39d which is commit e5bf5ee266633cb18fff6f98f0b7d59a62819eee upstream. It has been reported to cause test problems in Android devices. As the other functionfs changes were not also backported at the same time, something is out of sync. So just revert this one for now and it can come back in the future as a patch series if it is tested. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23mm/page_alloc: prevent pcp corruption with SMP=nVlastimil Babka1-8/+39
commit 038a102535eb49e10e93eafac54352fcc5d78847 upstream. The kernel test robot has reported: BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28 lock: 0xffff888807e35ef0, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0 CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-rc5-00127-ga06157804399 #1 PREEMPT 8cc09ef94dcec767faa911515ce9e609c45db470 Call Trace: <IRQ> __dump_stack (lib/dump_stack.c:95) dump_stack_lvl (lib/dump_stack.c:123) dump_stack (lib/dump_stack.c:130) spin_dump (kernel/locking/spinlock_debug.c:71) do_raw_spin_trylock (kernel/locking/spinlock_debug.c:?) _raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138) __free_frozen_pages (mm/page_alloc.c:2973) ___free_pages (mm/page_alloc.c:5295) __free_pages (mm/page_alloc.c:5334) tlb_remove_table_rcu (include/linux/mm.h:? include/linux/mm.h:3122 include/asm-generic/tlb.h:220 mm/mmu_gather.c:227 mm/mmu_gather.c:290) ? __cfi_tlb_remove_table_rcu (mm/mmu_gather.c:289) ? rcu_core (kernel/rcu/tree.c:?) rcu_core (include/linux/rcupdate.h:341 kernel/rcu/tree.c:2607 kernel/rcu/tree.c:2861) rcu_core_si (kernel/rcu/tree.c:2879) handle_softirqs (arch/x86/include/asm/jump_label.h:36 include/trace/events/irq.h:142 kernel/softirq.c:623) __irq_exit_rcu (arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:725) irq_exit_rcu (kernel/softirq.c:741) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052) </IRQ> <TASK> RIP: 0010:_raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) free_pcppages_bulk (mm/page_alloc.c:1494) drain_pages_zone (include/linux/spinlock.h:391 mm/page_alloc.c:2632) __drain_all_pages (mm/page_alloc.c:2731) drain_all_pages (mm/page_alloc.c:2747) kcompactd (mm/compaction.c:3115) kthread (kernel/kthread.c:465) ? __cfi_kcompactd (mm/compaction.c:3166) ? __cfi_kthread (kernel/kthread.c:412) ret_from_fork (arch/x86/kernel/process.c:164) ? __cfi_kthread (kernel/kthread.c:412) ret_from_fork_asm (arch/x86/entry/entry_64.S:255) </TASK> Matthew has analyzed the report and identified that in drain_page_zone() we are in a section protected by spin_lock(&pcp->lock) and then get an interrupt that attempts spin_trylock() on the same lock. The code is designed to work this way without disabling IRQs and occasionally fail the trylock with a fallback. However, the SMP=n spinlock implementation assumes spin_trylock() will always succeed, and thus it's normally a no-op. Here the enabled lock debugging catches the problem, but otherwise it could cause a corruption of the pcp structure. The problem has been introduced by commit 574907741599 ("mm/page_alloc: leave IRQs enabled for per-cpu page allocations"). The pcp locking scheme recognizes the need for disabling IRQs to prevent nesting spin_trylock() sections on SMP=n, but the need to prevent the nesting in spin_lock() has not been recognized. Fix it by introducing local wrappers that change the spin_lock() to spin_lock_iqsave() with SMP=n and use them in all places that do spin_lock(&pcp->lock). [vbabka@suse.cz: add pcp_ prefix to the spin_lock_irqsave wrappers, per Steven] Link: https://lkml.kernel.org/r/20260105-fix-pcp-up-v1-1-5579662d2071@suse.cz Fixes: 574907741599 ("mm/page_alloc: leave IRQs enabled for per-cpu page allocations") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202512101320.e2f2dd6f-lkp@intel.com Analyzed-by: Matthew Wilcox <willy@infradead.org> Link: https://lore.kernel.org/all/aUW05pyc9nZkvY-1@casper.infradead.org/ Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23mm/page_alloc: batch page freeing in decay_pcp_highJoshua Hahn1-3/+6
commit fc4b909c368f3a7b08c895dd5926476b58e85312 upstream. It is possible for pcp->count - pcp->high to exceed pcp->batch by a lot. When this happens, we should perform batching to ensure that free_pcppages_bulk isn't called with too many pages to free at once and starve out other threads that need the pcp or zone lock. Since we are still only freeing the difference between the initial pcp->count and pcp->high values, there should be no change to how many pages are freed. Link: https://lkml.kernel.org/r/20251014145011.3427205-3-joshua.hahnjy@gmail.com Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Suggested-by: Chris Mason <clm@fb.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Co-developed-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 038a102535eb ("mm/page_alloc: prevent pcp corruption with SMP=n") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23mm/page_alloc/vmstat: simplify refresh_cpu_vm_stats change detectionJoshua Hahn3-18/+20
commit 0acc67c4030c39f39ac90413cc5d0abddd3a9527 upstream. Patch series "mm/page_alloc: Batch callers of free_pcppages_bulk", v5. Motivation & Approach ===================== While testing workloads with high sustained memory pressure on large machines in the Meta fleet (1Tb memory, 316 CPUs), we saw an unexpectedly high number of softlockups. Further investigation showed that the zone lock in free_pcppages_bulk was being held for a long time, and was called to free 2k+ pages over 100 times just during boot. This causes starvation in other processes for the zone lock, which can lead to the system stalling as multiple threads cannot make progress without the locks. We can see these issues manifesting as warnings: [ 4512.591979] rcu: INFO: rcu_sched self-detected stall on CPU [ 4512.604370] rcu: 20-....: (9312 ticks this GP) idle=a654/1/0x4000000000000000 softirq=309340/309344 fqs=5426 [ 4512.626401] rcu: hardirqs softirqs csw/system [ 4512.638793] rcu: number: 0 145 0 [ 4512.651177] rcu: cputime: 30 10410 174 ==> 10558(ms) [ 4512.666657] rcu: (t=21077 jiffies g=783665 q=1242213 ncpus=316) While these warnings don't indicate a crash or a kernel panic, they do point to the underlying issue of lock contention. To prevent starvation in both locks, batch the freeing of pages using pcp->batch. Because free_pcppages_bulk is called with the pcp lock and acquires the zone lock, relinquishing and reacquiring the locks are only effective when both of them are broken together (unless the system was built with queued spinlocks). Thus, instead of modifying free_pcppages_bulk to break both locks, batch the freeing from its callers instead. A similar fix has been implemented in the Meta fleet, and we have seen significantly less softlockups. Testing ======= The following are a few synthetic benchmarks, made on three machines. The first is a large machine with 754GiB memory and 316 processors. The second is a relatively smaller machine with 251GiB memory and 176 processors. The third and final is the smallest of the three, which has 62GiB memory and 36 processors. On all machines, I kick off a kernel build with -j$(nproc). Negative delta is better (faster compilation). Large machine (754GiB memory, 316 processors) make -j$(nproc) +------------+---------------+-----------+ | Metric (s) | Variation (%) | Delta(%) | +------------+---------------+-----------+ | real | 0.8070 | - 1.4865 | | user | 0.2823 | + 0.4081 | | sys | 5.0267 | -11.8737 | +------------+---------------+-----------+ Medium machine (251GiB memory, 176 processors) make -j$(nproc) +------------+---------------+----------+ | Metric (s) | Variation (%) | Delta(%) | +------------+---------------+----------+ | real | 0.2806 | +0.0351 | | user | 0.0994 | +0.3170 | | sys | 0.6229 | -0.6277 | +------------+---------------+----------+ Small machine (62GiB memory, 36 processors) make -j$(nproc) +------------+---------------+----------+ | Metric (s) | Variation (%) | Delta(%) | +------------+---------------+----------+ | real | 0.1503 | -2.6585 | | user | 0.0431 | -2.2984 | | sys | 0.1870 | -3.2013 | +------------+---------------+----------+ Here, variation is the coefficient of variation, i.e. standard deviation / mean. Based on these results, it seems like there are varying degrees to how much lock contention this reduces. For the largest and smallest machines that I ran the tests on, it seems like there is quite some significant reduction. There is also some performance increases visible from userspace. Interestingly, the performance gains don't scale with the size of the machine, but rather there seems to be a dip in the gain there is for the medium-sized machine. One possible theory is that because the high watermark depends on both memory and the number of local CPUs, what impacts zone contention the most is not these individual values, but rather the ratio of mem:processors. This patch (of 5): Currently, refresh_cpu_vm_stats returns an int, indicating how many changes were made during its updates. Using this information, callers like vmstat_update can heuristically determine if more work will be done in the future. However, all of refresh_cpu_vm_stats's callers either (a) ignore the result, only caring about performing the updates, or (b) only care about whether changes were made, but not *how many* changes were made. Simplify the code by returning a bool instead to indicate if updates were made. In addition, simplify fold_diff and decay_pcp_high to return a bool for the same reason. Link: https://lkml.kernel.org/r/20251014145011.3427205-1-joshua.hahnjy@gmail.com Link: https://lkml.kernel.org/r/20251014145011.3427205-2-joshua.hahnjy@gmail.com Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: Chris Mason <clm@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 038a102535eb ("mm/page_alloc: prevent pcp corruption with SMP=n") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23btrfs: fix deadlock in wait_current_trans() due to ignored transaction typeRobbie Ko1-5/+6
commit 5037b342825df7094a4906d1e2a9674baab50cb2 upstream. When wait_current_trans() is called during start_transaction(), it currently waits for a blocked transaction without considering whether the given transaction type actually needs to wait for that particular transaction state. The btrfs_blocked_trans_types[] array already defines which transaction types should wait for which transaction states, but this check was missing in wait_current_trans(). This can lead to a deadlock scenario involving two transactions and pending ordered extents: 1. Transaction A is in TRANS_STATE_COMMIT_DOING state 2. A worker processing an ordered extent calls start_transaction() with TRANS_JOIN 3. join_transaction() returns -EBUSY because Transaction A is in TRANS_STATE_COMMIT_DOING 4. Transaction A moves to TRANS_STATE_UNBLOCKED and completes 5. A new Transaction B is created (TRANS_STATE_RUNNING) 6. The ordered extent from step 2 is added to Transaction B's pending ordered extents 7. Transaction B immediately starts commit by another task and enters TRANS_STATE_COMMIT_START 8. The worker finally reaches wait_current_trans(), sees Transaction B in TRANS_STATE_COMMIT_START (a blocked state), and waits unconditionally 9. However, TRANS_JOIN should NOT wait for TRANS_STATE_COMMIT_START according to btrfs_blocked_trans_types[] 10. Transaction B is waiting for pending ordered extents to complete 11. Deadlock: Transaction B waits for ordered extent, ordered extent waits for Transaction B This can be illustrated by the following call stacks: CPU0 CPU1 btrfs_finish_ordered_io() start_transaction(TRANS_JOIN) join_transaction() # -EBUSY (Transaction A is # TRANS_STATE_COMMIT_DOING) # Transaction A completes # Transaction B created # ordered extent added to # Transaction B's pending list btrfs_commit_transaction() # Transaction B enters # TRANS_STATE_COMMIT_START # waiting for pending ordered # extents wait_current_trans() # waits for Transaction B # (should not wait!) Task bstore_kv_sync in btrfs_commit_transaction waiting for ordered extents: __schedule+0x2e7/0x8a0 schedule+0x64/0xe0 btrfs_commit_transaction+0xbf7/0xda0 [btrfs] btrfs_sync_file+0x342/0x4d0 [btrfs] __x64_sys_fdatasync+0x4b/0x80 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Task kworker in wait_current_trans waiting for transaction commit: Workqueue: btrfs-syno_nocow btrfs_work_helper [btrfs] __schedule+0x2e7/0x8a0 schedule+0x64/0xe0 wait_current_trans+0xb0/0x110 [btrfs] start_transaction+0x346/0x5b0 [btrfs] btrfs_finish_ordered_io.isra.0+0x49b/0x9c0 [btrfs] btrfs_work_helper+0xe8/0x350 [btrfs] process_one_work+0x1d3/0x3c0 worker_thread+0x4d/0x3e0 kthread+0x12d/0x150 ret_from_fork+0x1f/0x30 Fix this by passing the transaction type to wait_current_trans() and checking btrfs_blocked_trans_types[cur_trans->state] against the given type before deciding to wait. This ensures that transaction types which are allowed to join during certain blocked states will not unnecessarily wait and cause deadlocks. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Cc: Motiejus Jakštys <motiejus@jakstys.lt> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23HID: intel-ish-hid: Fix -Wcast-function-type-strict in ↵Nathan Chancellor1-2/+6
devm_ishtp_alloc_workqueue() commit 3644f4411713f52bf231574aa8759e3d8e20b341 upstream. Clang warns (or errors with CONFIG_WERROR=y / W=e): drivers/hid/intel-ish-hid/ipc/ipc.c:935:36: error: cast from 'void (*)(struct workqueue_struct *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] 935 | if (devm_add_action_or_reset(dev, (void (*)(void *))destroy_workqueue, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/device/devres.h:168:34: note: expanded from macro 'devm_add_action_or_reset' 168 | __devm_add_action_or_ireset(dev, action, data, #action) | ^~~~~~ This warning is pointing out a kernel control flow integrity (kCFI / CONFIG_CFI=y) violation will occur due to this function cast when the destroy_workqueue() is indirectly called via devm_action_release() because the prototype of destroy_workqueue() does not match the prototype of (*action)(). Use a local function with the correct prototype to wrap destroy_workqueue() to resolve the warning and CFI violation. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202510190103.qTZvfdjj-lkp@intel.com/ Closes: https://github.com/ClangBuiltLinux/linux/issues/2139 Fixes: 0d30dae38fe0 ("HID: intel-ish-hid: Use dedicated unbound workqueues to prevent resume blocking") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Zhang Lixu <lixu.zhang@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23HID: intel-ish-hid: Use dedicated unbound workqueues to prevent resume blockingZhang Lixu7-7/+47
commit 0d30dae38fe01cd1de358c6039a0b1184689fe51 upstream. During suspend/resume tests with S2IDLE, some ISH functional failures were observed because of delay in executing ISH resume handler. Here schedule_work() is used from resume handler to do actual work. schedule_work() uses system_wq, which is a per CPU work queue. Although the queuing is not bound to a CPU, but it prefers local CPU of the caller, unless prohibited. Users of this work queue are not supposed to queue long running work. But in practice, there are scenarios where long running work items are queued on other unbound workqueues, occupying the CPU. As a result, the ISH resume handler may not get a chance to execute in a timely manner. In one scenario, one of the ish_resume_handler() executions was delayed nearly 1 second because another work item on an unbound workqueue occupied the same CPU. This delay causes ISH functionality failures. A similar issue was previously observed where the ISH HID driver timed out while getting the HID descriptor during S4 resume in the recovery kernel, likely caused by the same workqueue contention problem. Create dedicated unbound workqueues for all ISH operations to allow work items to execute on any available CPU, eliminating CPU-specific bottlenecks and improving resume reliability under varying system loads. Also ISH has three different components, a bus driver which implements ISH protocols, a PCI interface layer and HID interface. Use one dedicated work queue for all of them. Signed-off-by: Zhang Lixu <lixu.zhang@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23iommu/sva: invalidate stale IOTLB entries for kernel address spaceLu Baolu4-4/+35
commit e37d5a2d60a338c5917c45296bac65da1382eda5 upstream. Introduce a new IOMMU interface to flush IOTLB paging cache entries for the CPU kernel address space. This interface is invoked from the x86 architecture code that manages combined user and kernel page tables, specifically before any kernel page table page is freed and reused. This addresses the main issue with vfree() which is a common occurrence and can be triggered by unprivileged users. While this resolves the primary problem, it doesn't address some extremely rare case related to memory unplug of memory that was present as reserved memory at boot, which cannot be triggered by unprivileged users. The discussion can be found at the link below. Enable SVA on x86 architecture since the IOMMU can now receive notification to flush the paging cache before freeing the CPU kernel page table pages. Link: https://lkml.kernel.org/r/20251022082635.2462433-9-baolu.lu@linux.intel.com Link: https://lore.kernel.org/linux-iommu/04983c62-3b1d-40d4-93ae-34ca04b827e5@intel.com/ Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Suggested-by: Jann Horn <jannh@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yi Lai <yi1.lai@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23mm: introduce deferred freeing for kernel page tablesDave Hansen3-3/+53
commit 5ba2f0a1556479638ac11a3c201421f5515e89f5 upstream. This introduces a conditional asynchronous mechanism, enabled by CONFIG_ASYNC_KERNEL_PGTABLE_FREE. When enabled, this mechanism defers the freeing of pages that are used as page tables for kernel address mappings. These pages are now queued to a work struct instead of being freed immediately. This deferred freeing allows for batch-freeing of page tables, providing a safe context for performing a single expensive operation (TLB flush) for a batch of kernel page tables instead of performing that expensive operation for each page table. Link: https://lkml.kernel.org/r/20251022082635.2462433-8-baolu.lu@linux.intel.com Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yi Lai <yi1.lai@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23x86/mm: use pagetable_free()Lu Baolu2-2/+2
commit bf9e4e30f3538391745a99bc2268ec4f5e4a401e upstream. The kernel's memory management subsystem provides a dedicated interface, pagetable_free(), for freeing page table pages. Updates two call sites to use pagetable_free() instead of the lower-level __free_page() or free_pages(). This improves code consistency and clarity, and ensures the correct freeing mechanism is used. Link: https://lkml.kernel.org/r/20251022082635.2462433-7-baolu.lu@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yi Lai <yi1.lai@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23mm: introduce pure page table freeing functionDave Hansen1-3/+8
commit 01894295672335ff304beed4359f30d14d5765f2 upstream. The pages used for ptdescs are currently freed back to the allocator in a single location. They will shortly be freed from a second location. Create a simple helper that just frees them back to the allocator. Link: https://lkml.kernel.org/r/20251022082635.2462433-6-baolu.lu@linux.intel.com Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yi Lai <yi1.lai@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23x86/mm: use 'ptdesc' when freeing PMD pagesDave Hansen1-6/+6
commit 412d000346ea38ac4b9bb715a86c73ef89d90dea upstream. There are a billion ways to refer to a physical memory address. One of the x86 PMD freeing code location chooses to use a 'pte_t *' to point to a PMD page and then call a PTE-specific freeing function for it. That's a bit wonky. Just use a 'struct ptdesc *' instead. Its entire purpose is to refer to page table pages. It also means being able to remove an explicit cast. Right now, pte_free_kernel() is a one-liner that calls pagetable_dtor_free(). Effectively, all this patch does is remove one superfluous __pa(__va(paddr)) conversion and then call pagetable_dtor_free() directly instead of through a helper. Link: https://lkml.kernel.org/r/20251022082635.2462433-5-baolu.lu@linux.intel.com Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yi Lai <yi1.lai@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23mm: actually mark kernel page table pagesDave Hansen2-0/+21
commit 977870522af34359b461060597ee3a86f27450d6 upstream. Now that the API is in place, mark kernel page table pages just after they are allocated. Unmark them just before they are freed. Note: Unconditionally clearing the 'kernel' marking (via ptdesc_clear_kernel()) would be functionally identical to what is here. But having the if() makes it logically clear that this function can be used for kernel and non-kernel page tables. Link: https://lkml.kernel.org/r/20251022082635.2462433-4-baolu.lu@linux.intel.com Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yi Lai <yi1.lai@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23mm: add a ptdesc flag to mark kernel page tablesDave Hansen1-0/+41
commit 27bfafac65d87c58639f5d7af1353ec1e7886963 upstream. The page tables used to map the kernel and userspace often have very different handling rules. There are frequently *_kernel() variants of functions just for kernel page tables. That's not great and has lead to code duplication. Instead of having completely separate call paths, allow a 'ptdesc' to be marked as being for kernel mappings. Introduce helpers to set and clear this status. Note: this uses the PG_referenced bit. Page flags are a great fit for this since it is truly a single bit of information. Use PG_referenced itself because it's a fairly benign flag (as opposed to things like PG_lock). It's also (according to Willy) unlikely to go away any time soon. PG_referenced is not in PAGE_FLAGS_CHECK_AT_FREE. It does not need to be cleared before freeing the page, and pages coming out of the allocator should have it cleared. Regardless, introduce an API to clear it anyway. Having symmetry in the API makes it easier to change the underlying implementation later, like if there was a need to move to a PAGE_FLAGS_CHECK_AT_FREE bit. Link: https://lkml.kernel.org/r/20251022082635.2462433-3-baolu.lu@linux.intel.com Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yi Lai <yi1.lai@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23dmaengine: ti: k3-udma: fix device leak on udma lookupJohan Hovold1-1/+1
commit 430f7803b69cd5e5694e5dfc884c6628870af36e upstream. Make sure to drop the reference taken when looking up the UDMA platform device. Note that holding a reference to a platform device does not prevent its driver data from going away so there is no point in keeping the reference after the lookup helper returns. Fixes: d70241913413 ("dmaengine: ti: k3-udma: Add glue layer for non DMAengine users") Fixes: 1438cde8fe9c ("dmaengine: ti: k3-udma: add missing put_device() call in of_xudma_dev_get()") Cc: stable@vger.kernel.org # 5.6: 1438cde8fe9c Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20251117161258.10679-17-johan@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23dmaengine: ti: dma-crossbar: fix device leak on am335x route allocationJohan Hovold1-6/+10
commit 4fc17b1c6d2e04ad13fd6c21cfbac68043ec03f9 upstream. Make sure to drop the reference taken when looking up the crossbar platform device during am335x route allocation. Fixes: 42dbdcc6bf96 ("dmaengine: ti-dma-crossbar: Add support for crossbar on AM33xx/AM43xx") Cc: stable@vger.kernel.org # 4.4 Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20251117161258.10679-15-johan@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23dmaengine: ti: dma-crossbar: fix device leak on dra7x route allocationJohan Hovold1-0/+2
commit dc7e44db01fc2498644e3106db3e62a9883a93d5 upstream. Make sure to drop the reference taken when looking up the crossbar platform device during dra7x route allocation. Note that commit 615a4bfc426e ("dmaengine: ti: Add missing put_device in ti_dra7_xbar_route_allocate") fixed the leak in the error paths but the reference is still leaking on successful allocation. Fixes: a074ae38f859 ("dmaengine: Add driver for TI DMA crossbar on DRA7x") Fixes: 615a4bfc426e ("dmaengine: ti: Add missing put_device in ti_dra7_xbar_route_allocate") Cc: stable@vger.kernel.org # 4.2: 615a4bfc426e Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Cc: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20251117161258.10679-14-johan@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23dmaengine: stm32: dmamux: fix OF node leak on route allocation failureJohan Hovold1-1/+3
commit b1b590a590af13ded598e70f0b72bc1e515787a1 upstream. Make sure to drop the reference taken to the DMA master OF node also on late route allocation failures. Fixes: df7e762db5f6 ("dmaengine: Add STM32 DMAMUX driver") Cc: stable@vger.kernel.org # 4.15 Cc: Pierre-Yves MORDRET <pierre-yves.mordret@foss.st.com> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Amelie Delaunay <amelie.delaunay@foss.st.com> Link: https://patch.msgid.link/20251117161258.10679-12-johan@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23dmaengine: stm32: dmamux: fix device leak on route allocationJohan Hovold1-6/+12
commit dd6e4943889fb354efa3f700e42739da9bddb6ef upstream. Make sure to drop the reference taken when looking up the DMA mux platform device during route allocation. Note that holding a reference to a device does not prevent its driver data from going away so there is no point in keeping the reference. Fixes: df7e762db5f6 ("dmaengine: Add STM32 DMAMUX driver") Cc: stable@vger.kernel.org # 4.15 Cc: Pierre-Yves MORDRET <pierre-yves.mordret@foss.st.com> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Amelie Delaunay <amelie.delaunay@foss.st.com> Link: https://patch.msgid.link/20251117161258.10679-11-johan@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23dmaengine: sh: rz-dmac: Fix rz_dmac_terminate_all()Biju Das1-0/+5
commit 747213b08a1ab6a76e3e3b3e7a209cc1d402b5d0 upstream. After audio full duplex testing, playing the recorded file contains a few playback frames from the previous time. The rz_dmac_terminate_all() does not reset all the hardware descriptors queued previously, leading to the wrong descriptor being picked up during the next DMA transfer. Fix the above issue by resetting all the descriptor headers for a channel in rz_dmac_terminate_all() as rz_dmac_lmdesc_recycle() points to the proper descriptor header filled by the rz_dmac_prepare_descs_for_slave_sg(). Cc: stable@kernel.org Fixes: 5000d37042a6 ("dmaengine: sh: Add DMAC driver for RZ/G2L SoC") Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Tested-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Link: https://patch.msgid.link/20251113195052.564338-1-biju.das.jz@bp.renesas.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23dmaengine: sh: rz-dmac: fix device leak on probe failureJohan Hovold1-2/+11
commit 9fb490323997dcb6f749cd2660a17a39854600cd upstream. Make sure to drop the reference taken when looking up the ICU device during probe also on probe failures (e.g. probe deferral). Fixes: 7de873201c44 ("dmaengine: sh: rz-dmac: Add RZ/V2H(P) support") Cc: stable@vger.kernel.org # 6.16 Cc: Fabrizio Castro <fabrizio.castro.jz@renesas.com> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Fabrizio Castro <fabrizio.castro.jz@renesas.com> Link: https://patch.msgid.link/20251117161258.10679-10-johan@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23dmaengine: qcom: gpi: Fix memory leak in gpi_peripheral_config()Miaoqian Lin1-2/+4
commit 3f747004bbd641131d9396d87b5d2d3d1e182728 upstream. Fix a memory leak in gpi_peripheral_config() where the original memory pointed to by gchan->config could be lost if krealloc() fails. The issue occurs when: 1. gchan->config points to previously allocated memory 2. krealloc() fails and returns NULL 3. The function directly assigns NULL to gchan->config, losing the reference to the original memory 4. The original memory becomes unreachable and cannot be freed Fix this by using a temporary variable to hold the krealloc() result and only updating gchan->config when the allocation succeeds. Found via static analysis and code review. Fixes: 5d0c3533a19f ("dmaengine: qcom: Add GPI dma driver") Cc: stable@vger.kernel.org Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Link: https://patch.msgid.link/20251029123421.91973-1-linmq006@gmail.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>