summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2025-05-22tpm: Mask TPM RC in tpm2_start_auth_session()Jarkko Sakkinen1-0/+19
commit 539fbab37881e32ba6a708a100de6db19e1e7e7d upstream. tpm2_start_auth_session() does not mask TPM RC correctly from the callers: [ 28.766528] tpm tpm0: A TPM error (2307) occurred start auth session Process TPM RCs inside tpm2_start_auth_session(), and map them to POSIX error codes. Cc: stable@vger.kernel.org # v6.10+ Fixes: 699e3efd6c64 ("tpm: Add HMAC session start and end functions") Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Closes: https://lore.kernel.org/linux-integrity/Z_NgdRHuTKP6JK--@gondor.apana.org.au/ Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22scsi: sd_zbc: block: Respect bio vector limits for REPORT ZONES bufferSteve Siwinski1-0/+1
commit e8007fad5457ea547ca63bb011fdb03213571c7e upstream. The REPORT ZONES buffer size is currently limited by the HBA's maximum segment count to ensure the buffer can be mapped. However, the block layer further limits the number of iovec entries to 1024 when allocating a bio. To avoid allocation of buffers too large to be mapped, further restrict the maximum buffer size to BIO_MAX_INLINE_VECS. Replace the UIO_MAXIOV symbolic name with the more contextually appropriate BIO_MAX_INLINE_VECS. Fixes: b091ac616846 ("sd_zbc: Fix report zones buffer allocation") Cc: stable@vger.kernel.org Signed-off-by: Steve Siwinski <ssiwinski@atto.com> Link: https://lore.kernel.org/r/20250508200122.243129-1-ssiwinski@atto.com Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22net: phy: micrel: remove KSZ9477 EEE quirks now handled by phylinkOleksij Rempel1-1/+0
commit 8c619eb21b8e87ae95877e9cca9fcb0e3115776e upstream. The KSZ9477 PHY driver contained workarounds for broken EEE capability advertisements by manually masking supported EEE modes and forcibly disabling EEE if MICREL_NO_EEE was set. With proper MAC-side EEE handling implemented via phylink, these quirks are no longer necessary. Remove MICREL_NO_EEE handling and the use of ksz9477_get_features(). This simplifies the PHY driver and avoids duplicated EEE management logic. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org # v6.14+ Link: https://patch.msgid.link/20250504081434.424489-3-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()Michael Kelley1-7/+0
commit 45a442fe369e6c4e0b4aa9f63b31c3f2f9e2090e upstream. With the netvsc driver changed to use vmbus_sendpacket_mpb_desc() instead of vmbus_sendpacket_pagebuffer(), the latter has no remaining callers. Remove it. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-6-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22tpm: tis: Double the timeout B to 4sMichal Suchanek1-1/+1
[ Upstream commit 2f661f71fda1fc0c42b7746ca5b7da529eb6b5be ] With some Infineon chips the timeouts in tpm_tis_send_data (both B and C) can reach up to about 2250 ms. Timeout C is retried since commit de9e33df7762 ("tpm, tpm_tis: Workaround failed command reception on Infineon devices") Timeout B still needs to be extended. The problem is most commonly encountered with context related operation such as load context/save context. These are issued directly by the kernel, and there is no retry logic for them. When a filesystem is set up to use the TPM for unlocking the boot fails, and restarting the userspace service is ineffective. This is likely because ignoring a load context/save context result puts the real TPM state and the TPM state expected by the kernel out of sync. Chips known to be affected: tpm_tis IFX1522:00: 2.0 TPM (device-id 0x1D, rev-id 54) Description: SLB9672 Firmware Revision: 15.22 tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id 22) Firmware Revision: 7.83 tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1A, rev-id 16) Firmware Revision: 5.63 Link: https://lore.kernel.org/linux-integrity/Z5pI07m0Muapyu9w@kitsune.suse.cz/ Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18x86/its: Use dynamic thunks for indirect branchesPeter Zijlstra2-0/+8
commit 872df34d7c51a79523820ea6a14860398c639b87 upstream. ITS mitigation moves the unsafe indirect branches to a safe thunk. This could degrade the prediction accuracy as the source address of indirect branches becomes same for different execution paths. To improve the predictions, and hence the performance, assign a separate thunk for each indirect callsite. This is also a defense-in-depth measure to avoid indirect branches aliasing with each other. As an example, 5000 dynamic thunks would utilize around 16 bits of the address space, thereby gaining entropy. For a BTB that uses 32 bits for indexing, dynamic thunks could provide better prediction accuracy over fixed thunks. Have ITS thunks be variable sized and use EXECMEM_MODULE_TEXT such that they are both more flexible (got to extend them later) and live in 2M TLBs, just like kernel code, avoiding undue TLB pressure. [ pawan: CONFIG_EXECMEM_ROX is not supported on backport kernel, made adjustments to set memory to RW and ROX ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18x86/its: Enable Indirect Target Selection mitigationPawan Gupta1-0/+2
commit f4818881c47fd91fcb6d62373c57c7844e3de1c0 upstream. Indirect Target Selection (ITS) is a bug in some pre-ADL Intel CPUs with eIBRS. It affects prediction of indirect branch and RETs in the lower half of cacheline. Due to ITS such branches may get wrongly predicted to a target of (direct or indirect) branch that is located in the upper half of the cacheline. Scope of impact =============== Guest/host isolation -------------------- When eIBRS is used for guest/host isolation, the indirect branches in the VMM may still be predicted with targets corresponding to branches in the guest. Intra-mode ---------- cBPF or other native gadgets can be used for intra-mode training and disclosure using ITS. User/kernel isolation --------------------- When eIBRS is enabled user/kernel isolation is not impacted. Indirect Branch Prediction Barrier (IBPB) ----------------------------------------- After an IBPB, indirect branches may be predicted with targets corresponding to direct branches which were executed prior to IBPB. This is mitigated by a microcode update. Add cmdline parameter indirect_target_selection=off|on|force to control the mitigation to relocate the affected branches to an ITS-safe thunk i.e. located in the upper half of cacheline. Also add the sysfs reporting. When retpoline mitigation is deployed, ITS safe-thunks are not needed, because retpoline sequence is already ITS-safe. Similarly, when call depth tracking (CDT) mitigation is deployed (retbleed=stuff), ITS safe return thunk is not used, as CDT prevents RSB-underflow. To not overcomplicate things, ITS mitigation is not supported with spectre-v2 lfence;jmp mitigation. Moreover, it is less practical to deploy lfence;jmp mitigation on ITS affected parts anyways. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18timekeeping: Prevent coarse clocks going backwardsThomas Gleixner1-3/+5
[ Upstream commit b71f9804f66c2592d4c3a2397b7374a4039005a5 ] Lei Chen raised an issue with CLOCK_MONOTONIC_COARSE seeing time inconsistencies. Lei tracked down that this was being caused by the adjustment: tk->tkr_mono.xtime_nsec -= offset; which is made to compensate for the unaccumulated cycles in offset when the multiplicator is adjusted forward, so that the non-_COARSE clockids don't see inconsistencies. However, the _COARSE clockid getter functions use the adjusted xtime_nsec value directly and do not compensate the negative offset via the clocksource delta multiplied with the new multiplicator. In that case the caller can observe time going backwards in consecutive calls. By design, this negative adjustment should be fine, because the logic run from timekeeping_adjust() is done after it accumulated approximately multiplicator * interval_cycles into xtime_nsec. The accumulated value is always larger then the mult_adj * offset value, which is subtracted from xtime_nsec. Both operations are done together under the tk_core.lock, so the net change to xtime_nsec is always always be positive. However, do_adjtimex() calls into timekeeping_advance() as well, to apply the NTP frequency adjustment immediately. In this case, timekeeping_advance() does not return early when the offset is smaller then interval_cycles. In that case there is no time accumulated into xtime_nsec. But the subsequent call into timekeeping_adjust(), which modifies the multiplicator, subtracts from xtime_nsec to correct for the new multiplicator. Here because there was no accumulation, xtime_nsec becomes smaller than before, which opens a window up to the next accumulation, where the _COARSE clockid getters, which don't compensate for the offset, can observe the inconsistency. This has been tried to be fixed by forwarding the timekeeper in the case that adjtimex() adjusts the multiplier, which resets the offset to zero: 757b000f7b93 ("timekeeping: Fix possible inconsistencies in _COARSE clockids") That works correctly, but unfortunately causes a regression on the adjtimex() side. There are two issues: 1) The forwarding of the base time moves the update out of the original period and establishes a new one. 2) The clearing of the accumulated NTP error is changing the behaviour as well. User-space expects that multiplier/frequency updates are in effect, when the syscall returns, so delaying the update to the next tick is not solving the problem either. Commit 757b000f7b93 was reverted so that the established expectations of user space implementations (ntpd, chronyd) are restored, but that obviously brought the inconsistencies back. One of the initial approaches to fix this was to establish a separate storage for the coarse time getter nanoseconds part by calculating it from the offset. That was dropped on the floor because not having yet another state to maintain was simpler. But given the result of the above exercise, this solution turns out to be the right one. Bring it back in a slightly modified form. Thus introduce timekeeper::coarse_nsec and store that nanoseconds part in it, switch the time getter functions and the VDSO update to use that value. coarse_nsec is set on operations which forward or initialize the timekeeper and after time was accumulated during a tick. If there is no accumulation the timestamp is unchanged. This leaves the adjtimex() behaviour unmodified and prevents coarse time from going backwards. [ jstultz: Simplified the coarse_nsec calculation and kept behavior so coarse clockids aren't adjusted on each inter-tick adjtimex call, slightly reworked the comments and commit message ] Fixes: da15cfdae033 ("time: Introduce CLOCK_REALTIME_COARSE") Reported-by: Lei Chen <lei.chen@smartx.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/all/20250419054706.2319105-1-jstultz@google.com Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/ Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18mm: vmalloc: support more granular vrealloc() sizingKees Cook1-0/+1
commit a0309faf1cb0622cac7c820150b7abf2024acff5 upstream. Introduce struct vm_struct::requested_size so that the requested (re)allocation size is retained separately from the allocated area size. This means that KASAN will correctly poison the correct spans of requested bytes. This also means we can support growing the usable portion of an allocation that can already be supported by the existing area's existing allocation. Link: https://lkml.kernel.org/r/20250426001105.it.679-kees@kernel.org Fixes: 3ddc2fefe6f3 ("mm: vmalloc: implement vrealloc()") Signed-off-by: Kees Cook <kees@kernel.org> Reported-by: Erhard Furtner <erhard_f@mailbox.org> Closes: https://lore.kernel.org/all/20250408192503.6149a816@outsider.home/ Reviewed-by: Danilo Krummrich <dakr@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18uio_hv_generic: Fix sysfs creation path for ring bufferNaman Jain1-0/+6
commit f31fe8165d365379d858c53bef43254c7d6d1cfd upstream. On regular bootup, devices get registered to VMBus first, so when uio_hv_generic driver for a particular device type is probed, the device is already initialized and added, so sysfs creation in hv_uio_probe() works fine. However, when the device is removed and brought back, the channel gets rescinded and the device again gets registered to VMBus. However this time, the uio_hv_generic driver is already registered to probe for that device and in this case sysfs creation is tried before the device's kobject gets initialized completely. Fix this by moving the core logic of sysfs creation of ring buffer, from uio_hv_generic to HyperV's VMBus driver, where the rest of the sysfs attributes for the channels are defined. While doing that, make use of attribute groups and macros, instead of creating sysfs directly, to ensure better error handling and code flow. Problematic path: vmbus_process_offer (A new offer comes for the VMBus device) vmbus_add_channel_work vmbus_device_register |-> device_register | |... | |-> hv_uio_probe | |... | |-> sysfs_create_bin_file (leads to a warning as | the primary channel's kobject, which is used to | create the sysfs file, is not yet initialized) |-> kset_create_and_add |-> vmbus_add_channel_kobj (initialization of the primary channel's kobject happens later) Above code flow is sequential and the warning is always reproducible in this path. Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel") Cc: stable@kernel.org Suggested-by: Saurabh Sengar <ssengar@linux.microsoft.com> Suggested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Link: https://lore.kernel.org/r/20250502074811.2022-2-namjain@linux.microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18wifi: mac80211: fix the type of status_code for negotiated TID to Link MappingMichael-CY Lee1-1/+1
[ Upstream commit e12a42f64fc3d74872b349eedd47f90c6676b78a ] The status code should be type of __le16. Fixes: 83e897a961b8 ("wifi: ieee80211: add definitions for negotiated TID to Link map") Fixes: 8f500fbc6c65 ("wifi: mac80211: process and save negotiated TID to Link mapping request") Signed-off-by: Michael-CY Lee <michael-cy.lee@mediatek.com> Link: https://patch.msgid.link/20250505081946.3927214-1-michael-cy.lee@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-09kernel: globalize lookup_or_create_module_kobject()Shyam Saini1-0/+2
[ Upstream commit 7c76c813cfc42a7376378a0c4b7250db2eebab81 ] lookup_or_create_module_kobject() is marked as static and __init, to make it global drop static keyword. Since this function can be called from non-init code, use __modinit instead of __init, __modinit marker will make it __init if CONFIG_MODULES is not defined. Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Shyam Saini <shyamsaini@linux.microsoft.com> Link: https://lore.kernel.org/r/20250227184930.34163-4-shyamsaini@linux.microsoft.com Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Stable-dep-of: f95bbfe18512 ("drivers: base: handle module_kobject creation") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-09block: introduce zone capacity helperNaohiro Aota1-22/+45
[ Upstream commit c1a79b1a583654f24b17da81ba868b0064077243 ] {bdev,disk}_zone_capacity() takes block_device or gendisk and sector position and returns the zone capacity of the corresponding zone. With that, move disk_nr_zones() and blk_zone_plug_bio() to consolidate them in the same #ifdef block. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 866bafae59ec ("btrfs: zoned: skip reporting zone for new block group") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-09cpufreq: Introduce policy->boost_supported flagViresh Kumar1-0/+3
[ Upstream commit 1f7d1bab50e6ae517f8b6699e56d709d61ae13e5 ] It is possible to have a scenario where not all cpufreq policies support boost frequencies. And letting sysfs (or other parts of the kernel) enable boost feature for that policy isn't correct. Add a new flag, boost_supported, which will be set to true by the cpufreq core only if the freq table contains valid boost frequencies. Some cpufreq drivers though don't have boost frequencies in the freq-table, they can set this flag from their ->init() callbacks. Once all the drivers are updated to set the flag correctly, we can check it before enabling boost feature for a policy. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Stable-dep-of: 3d59224947b0 ("cpufreq: ACPI: Re-sync CPU boost state on system resume") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-09cpufreq: Fix setting policy limits when frequency tables are usedRafael J. Wysocki1-29/+54
commit b79028039f440e7d2c4df6ab243060c4e3803e84 upstream. Commit 7491cdf46b5c ("cpufreq: Avoid using inconsistent policy->min and policy->max") overlooked the fact that policy->min and policy->max were accessed directly in cpufreq_frequency_table_target() and in the functions called by it. Consequently, the changes made by that commit led to problems with setting policy limits. Address this by passing the target frequency limits to __resolve_freq() and cpufreq_frequency_table_target() and propagating them to the functions called by the latter. Fixes: 7491cdf46b5c ("cpufreq: Avoid using inconsistent policy->min and policy->max") Cc: 5.16+ <stable@vger.kernel.org> # 5.16+ Closes: https://lore.kernel.org/linux-pm/aAplED3IA_J0eZN0@linaro.org/ Reported-by: Stephan Gerhold <stephan.gerhold@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Stephan Gerhold <stephan.gerhold@linaro.org> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/5896780.DvuYhMxLoT@rjwysocki.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-09iommu: Fix two issues in iommu_copy_struct_from_user()Nicolin Chen1-4/+4
commit 30a3f2f3e4bd6335b727c83c08a982d969752bc1 upstream. In the review for iommu_copy_struct_to_user() helper, Matt pointed out that a NULL pointer should be rejected prior to dereferencing it: https://lore.kernel.org/all/86881827-8E2D-461C-BDA3-FA8FD14C343C@nvidia.com And Alok pointed out a typo at the same time: https://lore.kernel.org/all/480536af-6830-43ce-a327-adbd13dc3f1d@oracle.com Since both issues were copied from iommu_copy_struct_from_user(), fix them first in the current header. Fixes: e9d36c07bb78 ("iommu: Add iommu_copy_struct_from_user helper") Cc: stable@vger.kernel.org Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com> Link: https://lore.kernel.org/r/20250414191635.450472-1-nicolinc@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02crypto: ecdsa - Harden against integer overflows in DIV_ROUND_UP()Lukas Wunner1-0/+12
[ Upstream commit b16510a530d1e6ab9683f04f8fb34f2e0f538275 ] Herbert notes that DIV_ROUND_UP() may overflow unnecessarily if an ecdsa implementation's ->key_size() callback returns an unusually large value. Herbert instead suggests (for a division by 8): X / 8 + !!(X & 7) Based on this formula, introduce a generic DIV_ROUND_UP_POW2() macro and use it in lieu of DIV_ROUND_UP() for ->key_size() return values. Additionally, use the macro in ecc_digits_from_bytes(), whose "nbytes" parameter is a ->key_size() return value in some instances, or a user-specified ASN.1 length in the case of ecdsa_get_signature_rs(). Link: https://lore.kernel.org/r/Z3iElsILmoSu6FuC@gondor.apana.org.au/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02net: phylink: add functions to block/unblock rx clock stopRussell King (Oracle)1-0/+3
commit ddf4bd3f738485c84edb98ff96a5759904498e70 upstream. Some MACs require the PHY receive clock to be running to complete setup actions. This may fail if the PHY has negotiated EEE, the MAC supports receive clock stop, and the link has entered LPI state. Provide a pair of APIs that MAC drivers can use to temporarily block the PHY disabling the receive clock. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6k-008Vjt-MZ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org> Cc: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02net: phylink: add phylink_prepare_resume()Russell King (Oracle)1-0/+1
commit 367f1854d442b33c4a0305b068ae40d67ccd7d6a upstream. When the system is suspended, the PHY may be placed in low-power mode by setting the BMCR 0.11 Power down bit. IEEE 802.3 states that the behaviour of the PHY in this state is implementation specific, and the PHY is not required to meet the RX_CLK and TX_CLK requirements. Essentially, this means that a PHY may stop the clocks that it is generating while in power down state. However, MACs exist which require the clocks from the PHY to be running in order to properly resume. phylink_prepare_resume() provides them with a way to clear the Power down bit early. Note, however, that IEEE 802.3 gives PHYs up to 500ms grace before the transmit and receive clocks meet the requirements after clearing the power down bit. Add a resume preparation function, which will ensure that the receive clock from the PHY is appropriately configured while resuming. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1tvO6V-008Vjb-AP@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org> Cc: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02block: move blkdev_{get,put} _no_open prototypes out of blkdev.hChristoph Hellwig1-4/+0
[ Upstream commit c63202140d4b411d27380805c4d68eb11407b7f2 ] These are only to be used by block internal code. Remove the comment as we grew more users due to reworking block device node opening. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20250423053810.1683309-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 5f33b5226c9d ("block: don't autoload drivers on stat") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02net: phy: Add helper for getting tx amplitude gainDimitri Fedrau1-0/+4
[ Upstream commit 961ee5aeea048aa292f28d61f3a96a48554e91af ] Add helper which returns the tx amplitude gain defined in device tree. Modifying it can be necessary to compensate losses on the PCB and connector, so the voltages measured on the RJ45 pins are conforming. Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250214-dp83822-tx-swing-v5-2-02ca72620599@liebherr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 607b310ada5e ("net: dp83822: Fix OF_MDIO config check") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02PCI/MSI: Add an option to write MSIX ENTRY_DATA before any readsJonathan Currier1-0/+2
[ Upstream commit cf761e3dacc6ad5f65a4886d00da1f9681e6805a ] Commit 7d5ec3d36123 ("PCI/MSI: Mask all unused MSI-X entries") introduced a readl() from ENTRY_VECTOR_CTRL before the writel() to ENTRY_DATA. This is correct, however some hardware, like the Sun Neptune chips, the NIU module, will cause an error and/or fatal trap if any MSIX table entry is read before the corresponding ENTRY_DATA field is written to. Add an optional early writel() in msix_prepare_msi_desc(). Fixes: 7d5ec3d36123 ("PCI/MSI: Mask all unused MSI-X entries") Signed-off-by: Jonathan Currier <dullfire@yahoo.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241117234843.19236-2-dullfire@yahoo.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flagRoger Pau Monne1-1/+2
[ Upstream commit c3164d2e0d181027da8fc94f8179d8607c3d440f ] Setting pci_msi_ignore_mask inhibits the toggling of the mask bit for both MSI and MSI-X entries globally, regardless of the IRQ chip they are using. Only Xen sets the pci_msi_ignore_mask when routing physical interrupts over event channels, to prevent PCI code from attempting to toggle the maskbit, as it's Xen that controls the bit. However, the pci_msi_ignore_mask being global will affect devices that use MSI interrupts but are not routing those interrupts over event channels (not using the Xen pIRQ chip). One example is devices behind a VMD PCI bridge. In that scenario the VMD bridge configures MSI(-X) using the normal IRQ chip (the pIRQ one in the Xen case), and devices behind the bridge configure the MSI entries using indexes into the VMD bridge MSI table. The VMD bridge then demultiplexes such interrupts and delivers to the destination device(s). Having pci_msi_ignore_mask set in that scenario prevents (un)masking of MSI entries for devices behind the VMD bridge. Move the signaling of no entry masking into the MSI domain flags, as that allows setting it on a per-domain basis. Set it for the Xen MSI domain that uses the pIRQ chip, while leaving it unset for the rest of the cases. Remove pci_msi_ignore_mask at once, since it was only used by Xen code, and with Xen dropping usage the variable is unneeded. This fixes using devices behind a VMD bridge on Xen PV hardware domains. Albeit Devices behind a VMD bridge are not known to Xen, that doesn't mean Linux cannot use them. By inhibiting the usage of VMD_FEAT_CAN_BYPASS_MSI_REMAP and the removal of the pci_msi_ignore_mask bodge devices behind a VMD bridge do work fine when use from a Linux Xen hardware domain. That's the whole point of the series. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Juergen Gross <jgross@suse.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Message-ID: <20250219092059.90850-4-roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Stable-dep-of: cf761e3dacc6 ("PCI/MSI: Add an option to write MSIX ENTRY_DATA before any reads") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02PM: EM: Address RCU-related sparse warningsRafael J. Wysocki1-6/+6
[ Upstream commit 3ee7be9e10dd5f79448788b899591d4bd2bf0c19 ] The usage of __rcu in the Energy Model code is quite inconsistent which causes the following sparse warnings to trigger: kernel/power/energy_model.c:169:15: warning: incorrect type in assignment (different address spaces) kernel/power/energy_model.c:169:15: expected struct em_perf_table [noderef] __rcu *table kernel/power/energy_model.c:169:15: got struct em_perf_table * kernel/power/energy_model.c:171:9: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:171:9: expected struct callback_head *head kernel/power/energy_model.c:171:9: got struct callback_head [noderef] __rcu * kernel/power/energy_model.c:171:9: warning: cast removes address space '__rcu' of expression kernel/power/energy_model.c:182:19: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:182:19: expected struct kref *kref kernel/power/energy_model.c:182:19: got struct kref [noderef] __rcu * kernel/power/energy_model.c:200:15: warning: incorrect type in assignment (different address spaces) kernel/power/energy_model.c:200:15: expected struct em_perf_table [noderef] __rcu *table kernel/power/energy_model.c:200:15: got void *[assigned] _res kernel/power/energy_model.c:204:20: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:204:20: expected struct kref *kref kernel/power/energy_model.c:204:20: got struct kref [noderef] __rcu * kernel/power/energy_model.c:320:19: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:320:19: expected struct kref *kref kernel/power/energy_model.c:320:19: got struct kref [noderef] __rcu * kernel/power/energy_model.c:325:45: warning: incorrect type in argument 2 (different address spaces) kernel/power/energy_model.c:325:45: expected struct em_perf_state *table kernel/power/energy_model.c:325:45: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:425:45: warning: incorrect type in argument 3 (different address spaces) kernel/power/energy_model.c:425:45: expected struct em_perf_state *table kernel/power/energy_model.c:425:45: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:442:15: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:442:15: expected void const *objp kernel/power/energy_model.c:442:15: got struct em_perf_table [noderef] __rcu *[assigned] em_table kernel/power/energy_model.c:626:55: warning: incorrect type in argument 2 (different address spaces) kernel/power/energy_model.c:626:55: expected struct em_perf_state *table kernel/power/energy_model.c:626:55: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:681:16: warning: incorrect type in assignment (different address spaces) kernel/power/energy_model.c:681:16: expected struct em_perf_state *new_ps kernel/power/energy_model.c:681:16: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:699:37: warning: incorrect type in argument 2 (different address spaces) kernel/power/energy_model.c:699:37: expected struct em_perf_state *table kernel/power/energy_model.c:699:37: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:733:38: warning: incorrect type in argument 3 (different address spaces) kernel/power/energy_model.c:733:38: expected struct em_perf_state *table kernel/power/energy_model.c:733:38: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:855:53: warning: dereference of noderef expression kernel/power/energy_model.c:864:32: warning: dereference of noderef expression This is because the __rcu annotation for sparse is only applicable to pointers that need rcu_dereference() or equivalent for protection, which basically means pointers assigned with rcu_assign_pointer(). Make all of the above sparse warnings go away by cleaning up the usage of __rcu and using rcu_dereference_protected() where applicable. Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/5885405.DvuYhMxLoT@rjwysocki.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25nfs: add missing selections of CONFIG_CRC32Eric Biggers1-7/+0
[ Upstream commit cd35b6cb46649750b7dbd0df0e2d767415d8917b ] nfs.ko, nfsd.ko, and lockd.ko all use crc32_le(), which is available only when CONFIG_CRC32 is enabled. But the only NFS kconfig option that selected CONFIG_CRC32 was CONFIG_NFS_DEBUG, which is client-specific and did not actually guard the use of crc32_le() even on the client. The code worked around this bug by only actually calling crc32_le() when CONFIG_CRC32 is built-in, instead hard-coding '0' in other cases. This avoided randconfig build errors, and in real kernels the fallback code was unlikely to be reached since CONFIG_CRC32 is 'default y'. But, this really needs to just be done properly, especially now that I'm planning to update CONFIG_CRC32 to not be 'default y'. Therefore, make CONFIG_NFS_FS, CONFIG_NFSD, and CONFIG_LOCKD select CONFIG_CRC32. Then remove the fallback code that becomes unnecessary, as well as the selection of CONFIG_CRC32 from CONFIG_NFS_DEBUG. Fixes: 1264a2f053a3 ("NFS: refactor code for calculating the crc32 hash of a filehandle") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25writeback: fix false warning in inode_to_wb()Andreas Gruenbacher1-0/+1
commit 9e888998ea4d22257b07ce911576509486fa0667 upstream. inode_to_wb() is used also for filesystems that don't support cgroup writeback. For these filesystems inode->i_wb is stable during the lifetime of the inode (it points to bdi->wb) and there's no need to hold locks protecting the inode->i_wb dereference. Improve the warning in inode_to_wb() to not trigger for these filesystems. Link: https://lkml.kernel.org/r/20250412163914.3773459-3-agruenba@redhat.com Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25firmware: cs_dsp: test_bin_error: Fix uninitialized data used as fw versionRichard Fitzgerald1-1/+0
[ Upstream commit 285b2c74cf9982e873ef82a2cb1328d9e9406f65 ] Call cs_dsp_mock_xm_header_get_fw_version() to get the firmware version from the dummy XM header data in cs_dsp_bin_err_test_common_init(). Make the same change to cs_dsp_bin_test_common_init() and remove the cs_dsp_mock_xm_header_get_fw_version_from_regmap() function. The code in cs_dsp_test_bin.c was correctly calling cs_dsp_mock_xm_header_get_fw_version_from_regmap() to fetch the fw version from a dummy header it wrote to XM registers. However in cs_dsp_test_bin_error.c the test doesn't stuff a dummy header into XM, it populates it the normal way using a wmfw file. It should have called cs_dsp_mock_xm_header_get_fw_version() to get the data from its blob buffer, but was calling cs_dsp_mock_xm_header_get_fw_version_from_regmap(). As nothing had been written to the registers this returned the value of uninitialized data. The only other use of cs_dsp_mock_xm_header_get_fw_version_from_regmap() was cs_dsp_test_bin.c, but it doesn't need to use it. It already has a blob buffer containing the dummy XM header so it can use cs_dsp_mock_xm_header_get_fw_version() to read from that. Fixes: cd8c058499b6 ("firmware: cs_dsp: Add KUnit testing of bin error cases") Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20250410132129.1312541-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20KVM: Allow building irqbypass.ko as as module when kvm.ko is a moduleSean Christopherson1-1/+1
commit 459a35111b0a890172a78d51c01b204e13a34a18 upstream. Convert HAVE_KVM_IRQ_BYPASS into a tristate so that selecting IRQ_BYPASS_MANAGER follows KVM={m,y}, i.e. doesn't force irqbypass.ko to be built-in. Note, PPC allows building KVM as a module, but selects HAVE_KVM_IRQ_BYPASS from a boolean Kconfig, i.e. KVM PPC unnecessarily forces irqbpass.ko to be built-in. But that flaw is a longstanding PPC specific issue. Fixes: 61df71ee992d ("kvm: move "select IRQ_BYPASS_MANAGER" to common code") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250315024623.2363994-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20mm/hwpoison: introduce folio_contain_hwpoisoned_page() helperJinjiang Tu1-0/+6
commit 5f5ee52d4f58605330b09851273d6e56aaadd29e upstream. Patch series "mm/vmscan: don't try to reclaim hwpoison folio". Fix a bug during memory reclaim if folio is hwpoisoned. This patch (of 2): Introduce helper folio_contain_hwpoisoned_page() to check if the entire folio is hwpoisoned or it contains hwpoisoned pages. Link: https://lkml.kernel.org/r/20250318083939.987651-1-tujinjiang@huawei.com Link: https://lkml.kernel.org/r/20250318083939.987651-2-tujinjiang@huawei.com Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20mm: fix lazy mmu docs and usageRyan Roberts1-6/+8
commit 691ee97e1a9de0cdb3efb893c1f180e3f4a35e32 upstream. Patch series "Fix lazy mmu mode", v2. I'm planning to implement lazy mmu mode for arm64 to optimize vmalloc. As part of that, I will extend lazy mmu mode to cover kernel mappings in vmalloc table walkers. While lazy mmu mode is already used for kernel mappings in a few places, this will extend it's use significantly. Having reviewed the existing lazy mmu implementations in powerpc, sparc and x86, it looks like there are a bunch of bugs, some of which may be more likely to trigger once I extend the use of lazy mmu. So this series attempts to clarify the requirements and fix all the bugs in advance of that series. See patch #1 commit log for all the details. This patch (of 5): The docs, implementations and use of arch_[enter|leave]_lazy_mmu_mode() is a bit of a mess (to put it politely). There are a number of issues related to nesting of lazy mmu regions and confusion over whether the task, when in a lazy mmu region, is preemptible or not. Fix all the issues relating to the core-mm. Follow up commits will fix the arch-specific implementations. 3 arches implement lazy mmu; powerpc, sparc and x86. When arch_[enter|leave]_lazy_mmu_mode() was first introduced by commit 6606c3e0da53 ("[PATCH] paravirt: lazy mmu mode hooks.patch"), it was expected that lazy mmu regions would never nest and that the appropriate page table lock(s) would be held while in the region, thus ensuring the region is non-preemptible. Additionally lazy mmu regions were only used during manipulation of user mappings. Commit 38e0edb15bd0 ("mm/apply_to_range: call pte function with lazy updates") started invoking the lazy mmu mode in apply_to_pte_range(), which is used for both user and kernel mappings. For kernel mappings the region is no longer protected by any lock so there is no longer any guarantee about non-preemptibility. Additionally, for RT configs, the holding the PTL only implies no CPU migration, it doesn't prevent preemption. Commit bcc6cc832573 ("mm: add default definition of set_ptes()") added arch_[enter|leave]_lazy_mmu_mode() to the default implementation of set_ptes(), used by x86. So after this commit, lazy mmu regions can be nested. Additionally commit 1a10a44dfc1d ("sparc64: implement the new page table range API") and commit 9fee28baa601 ("powerpc: implement the new page table range API") did the same for the sparc and powerpc set_ptes() overrides. powerpc couldn't deal with preemption so avoids it in commit b9ef323ea168 ("powerpc/64s: Disable preemption in hash lazy mmu mode"), which explicitly disables preemption for the whole region in its implementation. x86 can support preemption (or at least it could until it tried to add support nesting; more on this below). Sparc looks to be totally broken in the face of preemption, as far as I can tell. powerpc can't deal with nesting, so avoids it in commit 47b8def9358c ("powerpc/mm: Avoid calling arch_enter/leave_lazy_mmu() in set_ptes"), which removes the lazy mmu calls from its implementation of set_ptes(). x86 attempted to support nesting in commit 49147beb0ccb ("x86/xen: allow nesting of same lazy mode") but as far as I can tell, this breaks its support for preemption. In short, it's all a mess; the semantics for arch_[enter|leave]_lazy_mmu_mode() are not clearly defined and as a result the implementations all have different expectations, sticking plasters and bugs. arm64 is aiming to start using these hooks, so let's clean everything up before adding an arm64 implementation. Update the documentation to state that lazy mmu regions can never be nested, must not be called in interrupt context and preemption may or may not be enabled for the duration of the region. And fix the generic implementation of set_ptes() to avoid nesting. arch-specific fixes to conform to the new spec will proceed this one. These issues were spotted by code review and I have no evidence of issues being reported in the wild. Link: https://lkml.kernel.org/r/20250303141542.3371656-1-ryan.roberts@arm.com Link: https://lkml.kernel.org/r/20250303141542.3371656-2-ryan.roberts@arm.com Fixes: bcc6cc832573 ("mm: add default definition of set_ptes()") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juegren Gross <jgross@suse.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20mm/damon: avoid applying DAMOS action to same entity multiple timesSeongJae Park1-0/+11
commit 94ba17adaba0f651fdcf745c8891a88e2e028cfa upstream. 'paddr' DAMON operations set can apply a DAMOS scheme's action to a large folio multiple times in single DAMOS-regions-walk if the folio is laid on multiple DAMON regions. Add a field for DAMOS scheme object that can be used by the underlying ops to know what was the last entity that the scheme's action has applied. The core layer unsets the field when each DAMOS-regions-walk is done for the given scheme. And update 'paddr' ops to use the infrastructure to avoid the problem. Link: https://lkml.kernel.org/r/20250207212033.45269-3-sj@kernel.org Fixes: 57223ac29584 ("mm/damon/paddr: support the pageout scheme") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Usama Arif <usamaarif642@gmail.com> Closes: https://lore.kernel.org/20250203225604.44742-3-usamaarif642@gmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20mtd: spinand: Fix build with gcc < 7.5Miquel Raynal1-1/+1
commit 1c1fd374a2fe72b8a6dde62d3c3a9fd153e7581c upstream. __VA_OPT__ is a macro that is useful when some arguments can be present or not to entirely skip some part of a definition. Unfortunately, it is a too recent addition that some of the still supported old GCC versions do not know about, and is anyway not part of C11 that is the version used in the kernel. Find a trick to remove this macro, typically '__VA_ARGS__ + 0' is a workaround used in netlink.h which works very well here, as we either expect: - 0 - A positive value - No value, which means the field should be 0. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503181330.YcDXGy7F-lkp@intel.com/ Fixes: 7ce0d16d5802 ("mtd: spinand: Add an optional frequency to read from cache macros") Cc: stable@vger.kernel.org Tested-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20io_uring/net: fix io_req_post_cqe abuse by send bundlePavel Begunkov1-0/+3
commit 6889ae1b4df1579bcdffef023e2ea9a982565dff upstream. [ 114.987980][ T5313] WARNING: CPU: 6 PID: 5313 at io_uring/io_uring.c:872 io_req_post_cqe+0x12e/0x4f0 [ 114.991597][ T5313] RIP: 0010:io_req_post_cqe+0x12e/0x4f0 [ 115.001880][ T5313] Call Trace: [ 115.002222][ T5313] <TASK> [ 115.007813][ T5313] io_send+0x4fe/0x10f0 [ 115.009317][ T5313] io_issue_sqe+0x1a6/0x1740 [ 115.012094][ T5313] io_wq_submit_work+0x38b/0xed0 [ 115.013223][ T5313] io_worker_handle_work+0x62a/0x1600 [ 115.013876][ T5313] io_wq_worker+0x34f/0xdf0 As the comment states, io_req_post_cqe() should only be used by multishot requests, i.e. REQ_F_APOLL_MULTISHOT, which bundled sends are not. Add a flag signifying whether a request wants to post multiple CQEs. Eventually REQ_F_APOLL_MULTISHOT should imply the new flag, but that's left out for simplicity. Cc: stable@vger.kernel.org Fixes: a05d1f625c7aa ("io_uring/net: support bundles for send") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8b611dbb54d1cd47a88681f5d38c84d0c02bc563.1743067183.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20HID: pidff: Move all hid-pidff definitions to a dedicated headerTomasz Pakuła1-15/+0
[ Upstream commit 0d24d4b1da96df9fc5ff36966f40f980ef864d46 ] Do not clutter hid includes with stuff not needed outside of the kernel. Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Reviewed-by: Michał Kopeć <michal@nozomi.space> Reviewed-by: Paul Dino Jones <paul@spacefreak18.xyz> Tested-by: Paul Dino Jones <paul@spacefreak18.xyz> Tested-by: Cristóferson Bueno <cbueno81@gmail.com> Tested-by: Pablo Cisneros <patchkez@protonmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20tpm, tpm_tis: Workaround failed command reception on Infineon devicesJonathan McDowell1-0/+1
[ Upstream commit de9e33df7762abbfc2a1568291f2c3a3154c6a9d ] Some Infineon devices have a issue where the status register will get stuck with a quick REQUEST_USE / COMMAND_READY sequence. This is not simply a matter of requiring a longer timeout; the work around is to retry the command submission. Add appropriate logic to do this in the send path. This is fixed in later firmware revisions, but those are not always available, and cannot generally be easily updated from outside a firmware environment. Testing has been performed with a simple repeated loop of doing a TPM2_CC_GET_CAPABILITY for TPM_CAP_PROP_MANUFACTURER using the Go code at: https://the.earth.li/~noodles/tpm-stuff/timeout-reproducer-simple.go It can take several hours to reproduce, and several million operations. Signed-off-by: Jonathan McDowell <noodles@meta.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20PCI: Add Rockchip Vendor IDShawn Lin1-0/+2
[ Upstream commit 20bbb083bbc9d3f8db390f2e35e168f1b23dae8a ] Move PCI_VENDOR_ID_ROCKCHIP from pci_endpoint_test.c to pci_ids.h and reuse it in pcie-rockchip-host.c. Link: https://lore.kernel.org/r/20250218092120.2322784-2-cassel@kernel.org Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20HID: pidff: Add PERIODIC_SINE_ONLY quirkTomasz Pakuła1-0/+1
[ Upstream commit abdbf8764f4962af2a910abb3a213ecf304a73d3 ] Some devices only support SINE periodic effect although they advertise support for all PERIODIC effect in their HID descriptor. Some just do nothing when trying to play such an effect (upload goes fine), some express undefined behavior like turning to one side. This quirk forces all the periodic effects to be uploaded as SINE. This is acceptable as all these effects are similar in nature and are mostly used as rumble. SINE is the most popular with others seldom used (especially SAW_UP and SAW_DOWN). Fixes periodic effects for PXN and LITE STAR wheels Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Reviewed-by: Michał Kopeć <michal@nozomi.space> Reviewed-by: Paul Dino Jones <paul@spacefreak18.xyz> Tested-by: Cristóferson Bueno <cbueno81@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20HID: pidff: Add FIX_WHEEL_DIRECTION quirkTomasz Pakuła1-0/+1
[ Upstream commit 3051bf5ec773b803c474ea556b57d678a8885be3 ] Most steering wheels simply ignore DIRECTION field, but some try to be compliant with the PID standard and use it in force calculations. Games often ignore setting this field properly and/or there can be issues with dinput8 -> wine -> SDL -> Linux API translation, and this value can be incorrect. This can lead to partial/complete loss of Force Feedback or even unexpected force reversal. Sadly, this quirk can't be detected automatically without sending out effects that would move an axis. This fixes FFB on Moza Racing devices and others where effect direction is not simply ignored. Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Reviewed-by: Michał Kopeć <michal@nozomi.space> Reviewed-by: Paul Dino Jones <paul@spacefreak18.xyz> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20HID: pidff: Add hid_pidff_init_with_quirks and export as GPL symbolTomasz Pakuła1-0/+2
[ Upstream commit 36de0164bbaff1484288e84ac5df5cff00580263 ] This lays out a way to provide an initial set of quirks to enable before device initialization takes place. GPL symbol export needed for the possibility of building HID drivers which use this function as modules. Adding a wrapper function to ensure compatibility with the old behavior of hid_pidff_init. Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Reviewed-by: Michał Kopeć <michal@nozomi.space> Reviewed-by: Paul Dino Jones <paul@spacefreak18.xyz> Tested-by: Paul Dino Jones <paul@spacefreak18.xyz> Tested-by: Cristóferson Bueno <cbueno81@gmail.com> Tested-by: Pablo Cisneros <patchkez@protonmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20HID: pidff: Add PERMISSIVE_CONTROL quirkTomasz Pakuła1-2/+3
[ Upstream commit a4119108d2530747e61c7cbf52e2affd089cb1f6 ] With this quirk, a PID device isn't required to have a strict logical_minimum of 1 for the the PID_DEVICE_CONTROL usage page. Some devices come with weird values in their device descriptors and this quirk enables their initialization even if the logical minimum of the DEVICE_CONTROL page is not 1. Fixes initialization of VRS Direct Force Pro Changes in v6: - Change quirk name to better reflect it's intention Co-developed-by: Makarenko Oleg <oleg@makarenk.ooo> Signed-off-by: Makarenko Oleg <oleg@makarenk.ooo> Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Reviewed-by: Michał Kopeć <michal@nozomi.space> Reviewed-by: Paul Dino Jones <paul@spacefreak18.xyz> Tested-by: Paul Dino Jones <paul@spacefreak18.xyz> Tested-by: Cristóferson Bueno <cbueno81@gmail.com> Tested-by: Pablo Cisneros <patchkez@protonmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20HID: pidff: Add MISSING_PBO quirk and its detectionTomasz Pakuła1-0/+1
[ Upstream commit fc7c154e9bb3c2b98875cfc565406f4787e3b7a4 ] Some devices with only one axis are missing PARAMETER_BLOCK_OFFSET field for conditional effects. They can only have one axis, so we're limiting the max_axis when setting the report for those effects. Automatic detection ensures compatibility even if such device won't be explicitly defined in the kernel. Fixes initialization of VRS DirectForce PRO and possibly other devices. Changes in v6: - Fixed NULL pointer dereference. When PBO is missing, make sure not to set it anyway Co-developed-by: Makarenko Oleg <oleg@makarenk.ooo> Signed-off-by: Makarenko Oleg <oleg@makarenk.ooo> Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Reviewed-by: Michał Kopeć <michal@nozomi.space> Reviewed-by: Paul Dino Jones <paul@spacefreak18.xyz> Tested-by: Paul Dino Jones <paul@spacefreak18.xyz> Tested-by: Cristóferson Bueno <cbueno81@gmail.com> Tested-by: Pablo Cisneros <patchkez@protonmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20HID: pidff: Add MISSING_DELAY quirk and its detectionTomasz Pakuła1-0/+3
[ Upstream commit 2d5c7ce5bf4cc27db41632f357f682d0ee4518e7 ] A lot of devices do not include this field, and it's seldom used in force feedback implementations. I tested about three dozen applications and none of them make use of the delay. This fixes initialization of a lot of PID wheels like Cammus, VRS, FFBeast This change has no effect on fully compliant devices Co-developed-by: Makarenko Oleg <oleg@makarenk.ooo> Signed-off-by: Makarenko Oleg <oleg@makarenk.ooo> Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Reviewed-by: Michał Kopeć <michal@nozomi.space> Reviewed-by: Paul Dino Jones <paul@spacefreak18.xyz> Tested-by: Paul Dino Jones <paul@spacefreak18.xyz> Tested-by: Cristóferson Bueno <cbueno81@gmail.com> Tested-by: Pablo Cisneros <patchkez@protonmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20Flush console log from kernel_power_off()Paul E. McKenney1-0/+6
[ Upstream commit 6ea9a1781c70a8be1fcdc49134fc1bf4baba8bca ] Kernels built with CONFIG_PREEMPT_RT=y can lose significant console output and shutdown time, which hides shutdown-time RCU issues from rcutorture. Therefore, make pr_flush() public and invoke it after then last print in kernel_power_off(). [ paulmck: Apply John Ogness feedback. ] [ paulmck: Appy Sebastian Andrzej Siewior feedback. ] [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Link: https://lore.kernel.org/r/5f743488-dc2a-4f19-bdda-cf50b9314832@paulmck-laptop Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20perf: Fix hang while freeing sigtrap eventFrederic Weisbecker1-1/+0
[ Upstream commit 56799bc035658738f362acec3e7647bb84e68933 ] Perf can hang while freeing a sigtrap event if a related deferred signal hadn't managed to be sent before the file got closed: perf_event_overflow() task_work_add(perf_pending_task) fput() task_work_add(____fput()) task_work_run() ____fput() perf_release() perf_event_release_kernel() _free_event() perf_pending_task_sync() task_work_cancel() -> FAILED rcuwait_wait_event() Once task_work_run() is running, the list of pending callbacks is removed from the task_struct and from this point on task_work_cancel() can't remove any pending and not yet started work items, hence the task_work_cancel() failure and the hang on rcuwait_wait_event(). Task work could be changed to remove one work at a time, so a work running on the current task can always cancel a pending one, however the wait / wake design is still subject to inverted dependencies when remote targets are involved, as pictured by Oleg: T1 T2 fd = perf_event_open(pid => T2->pid); fd = perf_event_open(pid => T1->pid); close(fd) close(fd) <IRQ> <IRQ> perf_event_overflow() perf_event_overflow() task_work_add(perf_pending_task) task_work_add(perf_pending_task) </IRQ> </IRQ> fput() fput() task_work_add(____fput()) task_work_add(____fput()) task_work_run() task_work_run() ____fput() ____fput() perf_release() perf_release() perf_event_release_kernel() perf_event_release_kernel() _free_event() _free_event() perf_pending_task_sync() perf_pending_task_sync() rcuwait_wait_event() rcuwait_wait_event() Therefore the only option left is to acquire the event reference count upon queueing the perf task work and release it from the task work, just like it was done before 3a5465418f5f ("perf: Fix event leak upon exec and file release") but without the leaks it fixed. Some adjustments are necessary to make it work: * A child event might dereference its parent upon freeing. Care must be taken to release the parent last. * Some places assuming the event doesn't have any reference held and therefore can be freed right away must instead put the reference and let the reference counting to its job. Reported-by: "Yi Lai" <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/all/Zx9Losv4YcJowaP%2F@ly-workstation/ Reported-by: syzbot+3c4321e10eea460eb606@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/673adf75.050a0220.87769.0024.GAE@google.com/ Fixes: 3a5465418f5f ("perf: Fix event leak upon exec and file release") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250304135446.18905-1-frederic@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20perf/core: Simplify the perf_event_alloc() error pathPeter Zijlstra1-7/+9
[ Upstream commit c70ca298036c58a88686ff388d3d367e9d21acf0 ] The error cleanup sequence in perf_event_alloc() is a subset of the existing _free_event() function (it must of course be). Split this out into __free_event() and simplify the error path. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20241104135517.967889521@infradead.org Stable-dep-of: 56799bc03565 ("perf: Fix hang while freeing sigtrap event") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20cgroup/cpuset: Fix race between newly created partition and dying oneWaiman Long2-1/+2
[ Upstream commit a22b3d54de94f82ca057cc2ebf9496fa91ebf698 ] There is a possible race between removing a cgroup diectory that is a partition root and the creation of a new partition. The partition to be removed can be dying but still online, it doesn't not currently participate in checking for exclusive CPUs conflict, but the exclusive CPUs are still there in subpartitions_cpus and isolated_cpus. These two cpumasks are global states that affect the operation of cpuset partitions. The exclusive CPUs in dying cpusets will only be removed when cpuset_css_offline() function is called after an RCU delay. As a result, it is possible that a new partition can be created with exclusive CPUs that overlap with those of a dying one. When that dying partition is finally offlined, it removes those overlapping exclusive CPUs from subpartitions_cpus and maybe isolated_cpus resulting in an incorrect CPU configuration. This bug was found when a warning was triggered in remote_partition_disable() during testing because the subpartitions_cpus mask was empty. One possible way to fix this is to iterate the dying cpusets as well and avoid using the exclusive CPUs in those dying cpusets. However, this can still cause random partition creation failures or other anomalies due to racing. A better way to fix this race is to reset the partition state at the moment when a cpuset is being killed. Introduce a new css_killed() CSS function pointer and call it, if defined, before setting CSS_DYING flag in kill_css(). Also update the css_is_dying() helper to use the CSS_DYING flag introduced by commit 33c35aa48178 ("cgroup: Prevent kill_css() from being called more than once") for proper synchronization. Add a new cpuset_css_killed() function to reset the partition state of a valid partition root if it is being killed. Fixes: ee8dde0cd2ce ("cpuset: Add new v2 cpuset.sched.partition flag") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.hJuri Lelli2-2/+5
commit 34929a070b7fd06c386080c926b61ee844e6ad34 upstream. dl_rebuild_rd_accounting() is defined in cpuset.c, so it makes more sense to move related declarations to cpuset.h. Implement the move. Suggested-by: Waiman Long <llong@redhat.com> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <llong@redhat.com> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Waiman Long <longman@redhat.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/Z9MSOVMpU7jpVrMU@jlelli-thinkpadt14gen4.remote.csb Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10uprobes/x86: Harden uretprobe syscall trampoline checkJiri Olsa1-0/+2
commit fa6192adc32f4fdfe5b74edd5b210e12afd6ecc0 upstream. Jann reported a possible issue when trampoline_check_ip returns address near the bottom of the address space that is allowed to call into the syscall if uretprobes are not set up: https://lore.kernel.org/bpf/202502081235.5A6F352985@keescook/T/#m9d416df341b8fbc11737dacbcd29f0054413cbbf Though the mmap minimum address restrictions will typically prevent creating mappings there, let's make sure uretprobe syscall checks for that. Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Kees Cook <kees@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250212220433.3624297-1-jolsa@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10rcu-tasks: Always inline rcu_irq_work_resched()Josh Poimboeuf1-1/+1
[ Upstream commit 6309a5c43b0dc629851f25b2e5ef8beff61d08e5 ] Thanks to CONFIG_DEBUG_SECTION_MISMATCH, empty functions can be generated out of line. rcu_irq_work_resched() can be called from noinstr code, so make sure it's always inlined. Fixes: 564506495ca9 ("rcu/context-tracking: Move deferred nocb resched to context tracking") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/e84f15f013c07e4c410d972e75620c53b62c1b3e.1743481539.git.jpoimboe@kernel.org Closes: https://lore.kernel.org/d1eca076-fdde-484a-b33e-70e0d167c36d@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10context_tracking: Always inline ct_{nmi,irq}_{enter,exit}()Josh Poimboeuf1-4/+4
[ Upstream commit 9ac50f7311dc8b39e355582f14c1e82da47a8196 ] Thanks to CONFIG_DEBUG_SECTION_MISMATCH, empty functions can be generated out of line. These can be called from noinstr code, so make sure they're always inlined. Fixes the following warnings: vmlinux.o: warning: objtool: irqentry_nmi_enter+0xa2: call to ct_nmi_enter() leaves .noinstr.text section vmlinux.o: warning: objtool: irqentry_nmi_exit+0x16: call to ct_nmi_exit() leaves .noinstr.text section vmlinux.o: warning: objtool: irqentry_exit+0x78: call to ct_irq_exit() leaves .noinstr.text section Fixes: 6f0e6c1598b1 ("context_tracking: Take IRQ eqs entrypoints over RCU") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/8509bce3f536bcd4ae7af3a2cf6930d48c5e631a.1743481539.git.jpoimboe@kernel.org Closes: https://lore.kernel.org/d1eca076-fdde-484a-b33e-70e0d167c36d@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>