summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
15 hoursLinux 6.12.43v6.12.43linux-6.12.yGreg Kroah-Hartman1-1/+1
Link: https://lore.kernel.org/r/20250818124448.879659024@linuxfoundation.org Tested-by: Salvatore Bonaccorso <carnil@debian.org> Tested-by: Brett A C Sheffield <bacs@librecast.net> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Hardik Garg <hargar@linux.microsoft.com> Tested-by: Peter Schneider <pschneider1968@googlemail.com> Tested-by: Ron Economos <re@w6rz.net> Link: https://lore.kernel.org/r/20250819122820.553053307@linuxfoundation.org Tested-by: Peter Schneider <pschneider1968@googlemail.com> Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Tested-by: Florian Fainelli <floria.fainelli@broadcom.com> Tested-by: Pavel Machek (CIP) <pavel@denx.de> Tested-by: Hardik Garg <hargar@linux.microsoft.com> Tested-by: Brett A C Sheffield <bacs@librecast.net> Tested-by: Brett Mastbergen <bmastbergen@ciq.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursACPI: Return -ENODEV from acpi_parse_spcr() when SPCR support is disabledLi Chen1-1/+1
commit b9f58d3572a8e1ef707b941eae58ec4014b9269d upstream. If CONFIG_ACPI_SPCR_TABLE is disabled, acpi_parse_spcr() currently returns 0, which may incorrectly suggest that SPCR parsing was successful. This patch changes the behavior to return -ENODEV to clearly indicate that SPCR support is not available. This prepares the codebase for future changes that depend on acpi_parse_spcr() failure detection, such as suppressing misleading console messages. Signed-off-by: Li Chen <chenl311@chinatelecom.cn> Acked-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/20250620131309.126555-2-me@linux.beauty Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursPCI: Honor Max Link Speed when determining supported speedsLukas Wunner1-2/+4
commit 3202ca221578850f34e0fea39dc6cfa745ed7aac upstream. The Supported Link Speeds Vector in the Link Capabilities 2 Register indicates the *supported* link speeds. The Max Link Speed field in the Link Capabilities Register indicates the *maximum* of those speeds. pcie_get_supported_speeds() neglects to honor the Max Link Speed field and will thus incorrectly deem higher speeds as supported. Fix it. One user-visible issue addressed here is an incorrect value in the sysfs attribute "max_link_speed". But the main motivation is a boot hang reported by Niklas: Intel JHL7540 "Titan Ridge 2018" Thunderbolt controllers supports 2.5-8 GT/s speeds, but indicate 2.5 GT/s as maximum. Ilpo recalls seeing this on more devices. It can be explained by the controller's Downstream Ports supporting 8 GT/s if an Endpoint is attached, but limiting to 2.5 GT/s if the port interfaces to a PCIe Adapter, in accordance with USB4 v2 sec 11.2.1: "This section defines the functionality of an Internal PCIe Port that interfaces to a PCIe Adapter. [...] The Logical sub-block shall update the PCIe configuration registers with the following characteristics: [...] Max Link Speed field in the Link Capabilities Register set to 0001b (data rate of 2.5 GT/s only). Note: These settings do not represent actual throughput. Throughput is implementation specific and based on the USB4 Fabric performance." The present commit is not sufficient on its own to fix Niklas' boot hang, but it is a prerequisite: A subsequent commit will fix the boot hang by enabling bandwidth control only if more than one speed is supported. The GENMASK() macro used herein specifies 0 as lowest bit, even though the Supported Link Speeds Vector ends at bit 1. This is done on purpose to avoid a GENMASK(0, 1) macro if Max Link Speed is zero. That macro would be invalid as the lowest bit is greater than the highest bit. Ilpo has witnessed a zero Max Link Speed on Root Complex Integrated Endpoints in particular, so it does occur in practice. (The Link Capabilities Register is optional on RCiEPs per PCIe r6.2 sec 7.5.3.) Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds") Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org Link: https://lore.kernel.org/r/fe03941e3e1cc42fb9bf4395e302bff53ee2198b.1734428762.git.lukas@wunner.de Reported-by: Niklas Schnelle <niks@kernel.org> Tested-by: Niklas Schnelle <niks@kernel.org> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursdm: split write BIOs on zone boundaries when zone append is not emulatedShin'ichiro Kawasaki1-11/+7
commit 675f940576351bb049f5677615140b9d0a7712d0 upstream. Commit 2df7168717b7 ("dm: Always split write BIOs to zoned device limits") updates the device-mapper driver to perform splits for the write BIOs. However, it did not address the cases where DM targets do not emulate zone append, such as in the cases of dm-linear or dm-flakey. For these targets, when the write BIOs span across zone boundaries, they trigger WARN_ON_ONCE(bio_straddles_zones(bio)) in blk_zone_wplug_handle_write(). This results in I/O errors. The errors are reproduced by running blktests test case zbd/004 using zoned dm-linear or dm-flakey devices. To avoid the I/O errors, handle the write BIOs regardless whether DM targets emulate zone append or not, so that all write BIOs are split at zone boundaries. For that purpose, drop the check for zone append emulation in dm_zone_bio_needs_split(). Its argument 'md' is no longer used then drop it also. Fixes: 2df7168717b7 ("dm: Always split write BIOs to zoned device limits") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Link: https://lore.kernel.org/r/20250717103539.37279-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursrcu: Fix racy re-initialization of irq_work causing hangsFrederic Weisbecker3-2/+9
commit 61399e0c5410567ef60cb1cda34cca42903842e3 upstream. RCU re-initializes the deferred QS irq work everytime before attempting to queue it. However there are situations where the irq work is attempted to be queued even though it is already queued. In that case re-initializing messes-up with the irq work queue that is about to be handled. The chances for that to happen are higher when the architecture doesn't support self-IPIs and irq work are then all lazy, such as with the following sequence: 1) rcu_read_unlock() is called when IRQs are disabled and there is a grace period involving blocked tasks on the node. The irq work is then initialized and queued. 2) The related tasks are unblocked and the CPU quiescent state is reported. rdp->defer_qs_iw_pending is reset to DEFER_QS_IDLE, allowing the irq work to be requeued in the future (note the previous one hasn't fired yet). 3) A new grace period starts and the node has blocked tasks. 4) rcu_read_unlock() is called when IRQs are disabled again. The irq work is re-initialized (but it's queued! and its node is cleared) and requeued. Which means it's requeued to itself. 5) The irq work finally fires with the tick. But since it was requeued to itself, it loops and hangs. Fix this with initializing the irq work only once before the CPU boots. Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursdrm/amd/display: Allow DCN301 to clear update flagsIvan Lipski1-1/+2
commit 2d418e4fd9f1eca7dfce80de86dd702d36a06a25 upstream. [Why & How] Not letting DCN301 to clear after surface/stream update results in artifacts when switching between active overlay planes. The issue is known and has been solved initially. See below: (https://gitlab.freedesktop.org/drm/amd/-/issues/3441) Fixes: f354556e29f4 ("drm/amd/display: limit clear_update_flags t dcn32 and above") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursfirmware: arm_scmi: Convert to SYSTEM_SLEEP_PM_OPSArnd Bergmann1-2/+2
commit 62d6b81e8bd207ad44eff39d1a0fe17f0df510a5 upstream. The old SET_SYSTEM_SLEEP_PM_OPS() macro leads to a warning about an unused function: | drivers/firmware/arm_scmi/scmi_power_control.c:363:12: error: | 'scmi_system_power_resume' defined but not used [-Werror=unused-function] | static int scmi_system_power_resume(struct device *dev) The proper way to do this these days is to use SYSTEM_SLEEP_PM_OPS() and pm_sleep_ptr(). Fixes: 9a0658d3991e ("firmware: arm_scmi: power_control: Ensure SCMI_SYSPOWER_IDLE is set early during resume") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peng Fan <peng.fan@nxp.com> Message-Id: <20250709070107.1388512-1-arnd@kernel.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursio_uring/rw: cast rw->flags assignment to rwf_tJens Axboe1-1/+1
commit 825aea662b492571877b32aeeae13689fd9fbee4 upstream. kernel test robot reports that a recent change of the sqe->rw_flags field throws a sparse warning on 32-bit archs: >> io_uring/rw.c:291:19: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __kernel_rwf_t [usertype] flags @@ got unsigned int @@ io_uring/rw.c:291:19: sparse: expected restricted __kernel_rwf_t [usertype] flags io_uring/rw.c:291:19: sparse: got unsigned int Force cast it to rwf_t to silence that new sparse warning. Fixes: cf73d9970ea4 ("io_uring: don't use int for ABI") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507032211.PwSNPNSP-lkp@intel.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursata: libata-sata: Add link_power_management_supported sysfs attributeDamien Le Moal4-12/+44
commit 0060beec0bfa647c4b510df188b1c4673a197839 upstream. A port link power management (LPM) policy can be controlled using the link_power_management_policy sysfs host attribute. However, this attribute exists also for hosts that do not support LPM and in such case, attempting to change the LPM policy for the host (port) will fail with -EOPNOTSUPP. Introduce the new sysfs link_power_management_supported host attribute to indicate to the user if a the port and the devices connected to the port for the host support LPM, which implies that the link_power_management_policy attribute can be used. Since checking that a port and its devices support LPM is common between the new ata_scsi_lpm_supported_show() function and the existing ata_scsi_lpm_store() function, the new helper ata_scsi_lpm_supported() is introduced. Fixes: 413e800cadbf ("ata: libata-sata: Disallow changing LPM state if not supported") Reported-by: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202507251014.a5becc3b-lkp@intel.com Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursrust: workaround `rustdoc` target modifiers bugMiguel Ojeda1-0/+6
commit abbf9a44944171ca99c150adad9361a2f517d3b6 upstream. Starting with Rust 1.88.0 (released 2025-06-26), `rustdoc` complains about a target modifier mismatch in configurations where `-Zfixed-x18` is passed: error: mixing `-Zfixed-x18` will cause an ABI mismatch in crate `rust_out` | = help: the `-Zfixed-x18` flag modifies the ABI so Rust crates compiled with different values of this flag cannot be used together safely = note: unset `-Zfixed-x18` in this crate is incompatible with `-Zfixed-x18=` in dependency `core` = help: set `-Zfixed-x18=` in this crate or unset `-Zfixed-x18` in `core` = help: if you are sure this will not cause problems, you may use `-Cunsafe-allow-abi-mismatch=fixed-x18` to silence this error The reason is that `rustdoc` was not passing the target modifiers when configuring the session options, and thus it would report a mismatch that did not exist as soon as a target modifier is used in a dependency. We did not notice it in the kernel until now because `-Zfixed-x18` has been a target modifier only since 1.88.0 (and it is the only one we use so far). The issue has been reported upstream [1] and a fix has been submitted [2], including a test similar to the kernel case. [ This is now fixed upstream (thanks Guillaume for the quick review), so it will be fixed in Rust 1.90.0 (expected 2025-09-18). - Miguel ] Meanwhile, conditionally pass `-Cunsafe-allow-abi-mismatch=fixed-x18` to workaround the issue on our side. Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs). Reported-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Closes: https://lore.kernel.org/rust-for-linux/36cdc798-524f-4910-8b77-d7b9fac08d77@oss.qualcomm.com/ Link: https://github.com/rust-lang/rust/issues/144521 [1] Link: https://github.com/rust-lang/rust/pull/144523 [2] Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250727092317.2930617-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursrust: kbuild: clean output before running `rustdoc`Miguel Ojeda1-2/+5
commit 252fea131e15aba2cd487119d1a8f546471199e2 upstream. `rustdoc` can get confused when generating documentation into a folder that contains generated files from other `rustdoc` versions. For instance, running something like: rustup default 1.78.0 make LLVM=1 rustdoc rustup default 1.88.0 make LLVM=1 rustdoc may generate errors like: error: couldn't generate documentation: invalid template: last line expected to start with a comment | = note: failed to create or modify "./Documentation/output/rust/rustdoc/src-files.js" Thus just always clean the output folder before generating the documentation -- we are anyway regenerating it every time the `rustdoc` target gets called, at least for the time being. Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs). Reported-by: Daniel Almeida <daniel.almeida@collabora.com> Closes: https://rust-for-linux.zulipchat.com/#narrow/channel/288089/topic/x/near/527201113 Reviewed-by: Tamir Duberstein <tamird@kernel.org> Link: https://lore.kernel.org/r/20250726133435.2460085-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursarm64: dts: ti: k3-j722s-evm: Fix USB gpio-hog level for Type-CSiddharth Vadapalli1-1/+1
[ Upstream commit 65ba2a6e77e9e5c843a591055789050e77b5c65e ] According to the "GPIO Expander Map / Table" section of the J722S EVM Schematic within the Evaluation Module Design Files package [0], the GPIO Pin P05 located on the GPIO Expander 1 (I2C0/0x23) has to be pulled down to select the Type-C interface. Since commit under Fixes claims to enable the Type-C interface, update the property within "p05-hog" from "output-high" to "output-low", thereby switching from the Type-A interface to the Type-C interface. [0]: https://www.ti.com/lit/zip/sprr495 Cc: stable@vger.kernel.org Fixes: 485705df5d5f ("arm64: dts: ti: k3-j722s: Enable PCIe and USB support on J722S-EVM") Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Link: https://lore.kernel.org/r/20250623100657.4082031-1-s-vadapalli@ti.com Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursarm64: dts: ti: k3-j722s-evm: Fix USB2.0_MUX_SEL to select Type-CHrushikesh Salunke1-1/+1
[ Upstream commit bc8d9e6b5821c40ab5dd3a81e096cb114939de50 ] J722S SOC has two usb controllers USB0 and USB1. USB0 is brought out on the EVM as a stacked USB connector which has one Type-A and one Type-C port. These Type-A and Type-C ports are connected to MUX so only one of them can be enabled at a time. Commit under Fixes, tries to enable the USB0 instance of USB to interface with the Type-C port via the USB hub, by configuring the USB2.0_MUX_SEL to GPIO_ACTIVE_HIGH. But it is observed on J722S-EVM that Type-A port is enabled instead of Type-C port. Fix this by setting USB2.0_MUX_SEL to GPIO_ACTIVE_LOW to enable Type-C port. Fixes: 485705df5d5f ("arm64: dts: ti: k3-j722s: Enable PCIe and USB support on J722S-EVM") Signed-off-by: Hrushikesh Salunke <h-salunke@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://lore.kernel.org/r/20250116125726.2549489-1-h-salunke@ti.com Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Stable-dep-of: 65ba2a6e77e9 ("arm64: dts: ti: k3-j722s-evm: Fix USB gpio-hog level for Type-C") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursPCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable portsLukas Wunner4-5/+13
[ Upstream commit 6cff20ce3b92ffbf2fc5eb9e5a030b3672aa414a ] pci_bridge_d3_possible() is called from both pcie_portdrv_probe() and pcie_portdrv_remove() to determine whether runtime power management shall be enabled (on probe) or disabled (on remove) on a PCIe port. The underlying assumption is that pci_bridge_d3_possible() always returns the same value, else a runtime PM reference imbalance would occur. That assumption is not given if the PCIe port is inaccessible on remove due to hot-unplug: pci_bridge_d3_possible() calls pciehp_is_native(), which accesses Config Space to determine whether the port is Hot-Plug Capable. An inaccessible port returns "all ones", which is converted to "all zeroes" by pcie_capability_read_dword(). Hence the port no longer seems Hot-Plug Capable on remove even though it was on probe. The resulting runtime PM ref imbalance causes warning messages such as: pcieport 0000:02:04.0: Runtime PM usage count underflow! Avoid the Config Space access (and thus the runtime PM ref imbalance) by caching the Hot-Plug Capable bit in struct pci_dev. The struct already contains an "is_hotplug_bridge" flag, which however is not only set on Hot-Plug Capable PCIe ports, but also Conventional PCI Hot-Plug bridges and ACPI slots. The flag identifies bridges which are allocated additional MMIO and bus number resources to allow for hierarchy expansion. The kernel is somewhat sloppily using "is_hotplug_bridge" in a number of places to identify Hot-Plug Capable PCIe ports, even though the flag encompasses other devices. Subsequent commits replace these occurrences with the new flag to clearly delineate Hot-Plug Capable PCIe ports from other kinds of hotplug bridges. Document the existing "is_hotplug_bridge" and the new "is_pciehp" flag and document the (non-obvious) requirement that pci_bridge_d3_possible() always returns the same value across the entire lifetime of a bridge, including its hot-removal. Fixes: 5352a44a561d ("PCI: pciehp: Make pciehp_is_native() stricter") Reported-by: Laurent Bigonville <bigon@bigon.be> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220216 Reported-by: Mario Limonciello <mario.limonciello@amd.com> Closes: https://lore.kernel.org/r/20250609020223.269407-3-superm1@kernel.org/ Link: https://lore.kernel.org/all/20250620025535.3425049-3-superm1@kernel.org/T/#u Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Cc: stable@vger.kernel.org # v4.18+ Link: https://patch.msgid.link/fe5dcc3b2e62ee1df7905d746bde161eb1b3291c.1752390101.git.lukas@wunner.de Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursPCI: Allow PCI bridges to go to D3Hot on all non-x86Manivannan Sadhasivam1-4/+4
[ Upstream commit a5fb3ff632876d63ee1fc5ed3af2464240145a00 ] Currently, pci_bridge_d3_possible() encodes a variety of decision factors when deciding whether a given bridge can be put into D3. A particular one of note is for "recent enough PCIe ports." Per Rafael [0]: "There were hardware issues related to PM on x86 platforms predating the introduction of Connected Standby in Windows. For instance, programming a port into D3hot by writing to its PMCSR might cause the PCIe link behind it to go down and the only way to revive it was to power cycle the Root Complex. And similar." Thus, this function contains a DMI-based check for post-2015 BIOS. The above factors (Windows, x86) don't really apply to non-x86 systems, and also, many such systems don't have BIOS or DMI. However, we'd like to be able to suspend bridges on non-x86 systems too. Restrict the "recent enough" check to x86. If we find further incompatibilities, it probably makes sense to expand on the deny-list approach (i.e., bridge_d3_blacklist or similar). Link: https://lore.kernel.org/r/20250320110604.v6.1.Id0a0e78ab0421b6bce51c4b0b87e6aebdfc69ec7@changeid Link: https://lore.kernel.org/linux-pci/CAJZ5v0j_6jeMAQ7eFkZBe5Yi+USGzysxAgfemYh=-zq4h5W+Qg@mail.gmail.com/ [0] Link: https://lore.kernel.org/linux-pci/20240227225442.GA249898@bhelgaas/ [1] Link: https://lore.kernel.org/linux-pci/20240828210705.GA37859@bhelgaas/ [2] [Brian: rewrite to !X86 based on Rafael's suggestions] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Stable-dep-of: 6cff20ce3b92 ("PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursPCI: Store all PCIe Supported Link SpeedsIlpo Järvinen5-17/+56
[ Upstream commit d2bd39c0456b75be9dfc7d774b8d021355c26ae3 ] The PCIe bandwidth controller added by a subsequent commit will require selecting PCIe Link Speeds that are lower than the Maximum Link Speed. The struct pci_bus only stores max_bus_speed. Even if PCIe r6.1 sec 8.2.1 currently disallows gaps in supported Link Speeds, the Implementation Note in PCIe r6.1 sec 7.5.3.18, recommends determining supported Link Speeds using the Supported Link Speeds Vector in the Link Capabilities 2 Register (when available) to "avoid software being confused if a future specification defines Links that do not require support for all slower speeds." Reuse code in pcie_get_speed_cap() to add pcie_get_supported_speeds() to query the Supported Link Speeds Vector of a PCIe device. The value is taken directly from the Supported Link Speeds Vector or synthesized from the Max Link Speed in the Link Capabilities Register when the Link Capabilities 2 Register is not available. The Supported Link Speeds Vector in the Link Capabilities Register 2 corresponds to the bus below on Root Ports and Downstream Ports, whereas it corresponds to the bus above on Upstream Ports and Endpoints (PCIe r6.1 sec 7.5.3.18): Supported Link Speeds Vector - This field indicates the supported Link speed(s) of the associated Port. Add supported_speeds into the struct pci_dev that caches the Supported Link Speeds Vector. supported_speeds contains a set of Link Speeds only in the case where PCIe Link Speed can be determined. Root Complex Integrated Endpoints do not have a well-defined Link Speed because they do not implement either of the Link Capabilities Registers, which is allowed by PCIe r6.1 sec 7.5.3 (the same limitation applies to determining cur_bus_speed and max_bus_speed that are PCI_SPEED_UNKNOWN in such case). This is of no concern from PCIe bandwidth controller point of view because such devices are not attached into a PCIe Root Port that could be controlled. The supported_speeds field keeps the extra reserved zero at the least significant bit to match the Link Capabilities 2 Register layout. An attempt was made to store supported_speeds field into the struct pci_bus as an intersection of both ends of the Link, however, the subordinate struct pci_bus is not available early enough. The Target Speed quirk (in pcie_failed_link_retrain()) can run either during initial scan or later, requiring it to use the API provided by the PCIe bandwidth controller to set the Target Link Speed in order to co-exist with the bandwidth controller. When the Target Speed quirk is calling the bandwidth controller during initial scan, the struct pci_bus is not yet initialized. As such, storing supported_speeds into the struct pci_bus is not viable. Suggested-by: Lukas Wunner <lukas@wunner.de> Link: https://lore.kernel.org/r/20241018144755.7875-4-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: move pcie_get_supported_speeds() decl to drivers/pci/pci.h] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Stable-dep-of: 6cff20ce3b92 ("PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hourssmb: client: fix netns refcount leak after net_passive changesWang Zhaolong1-6/+3
[ Upstream commit 59b33fab4ca4d7dacc03367082777627e05d0323 ] After commit 5c70eb5c593d ("net: better track kernel sockets lifetime"), kernel sockets now use net_passive reference counting. However, commit 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"") restored the manual socket refcount manipulation without adapting to this new mechanism, causing a memory leak. The issue can be reproduced by[1]: 1. Creating a network namespace 2. Mounting and Unmounting CIFS within the namespace 3. Deleting the namespace Some memory leaks may appear after a period of time following step 3. unreferenced object 0xffff9951419f6b00 (size 256): comm "ip", pid 447, jiffies 4294692389 (age 14.730s) hex dump (first 32 bytes): 1b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 80 77 c2 44 51 99 ff ff .........w.DQ... backtrace: __kmem_cache_alloc_node+0x30e/0x3d0 __kmalloc+0x52/0x120 net_alloc_generic+0x1d/0x30 copy_net_ns+0x86/0x200 create_new_namespaces+0x117/0x300 unshare_nsproxy_namespaces+0x60/0xa0 ksys_unshare+0x148/0x360 __x64_sys_unshare+0x12/0x20 do_syscall_64+0x59/0x110 entry_SYSCALL_64_after_hwframe+0x78/0xe2 ... unreferenced object 0xffff9951442e7500 (size 32): comm "mount.cifs", pid 475, jiffies 4294693782 (age 13.343s) hex dump (first 32 bytes): 40 c5 38 46 51 99 ff ff 18 01 96 42 51 99 ff ff @.8FQ......BQ... 01 00 00 00 6f 00 c5 07 6f 00 d8 07 00 00 00 00 ....o...o....... backtrace: __kmem_cache_alloc_node+0x30e/0x3d0 kmalloc_trace+0x2a/0x90 ref_tracker_alloc+0x8e/0x1d0 sk_alloc+0x18c/0x1c0 inet_create+0xf1/0x370 __sock_create+0xd7/0x1e0 generic_ip_connect+0x1d4/0x5a0 [cifs] cifs_get_tcp_session+0x5d0/0x8a0 [cifs] cifs_mount_get_session+0x47/0x1b0 [cifs] dfs_mount_share+0xfa/0xa10 [cifs] cifs_mount+0x68/0x2b0 [cifs] cifs_smb3_do_mount+0x10b/0x760 [cifs] smb3_get_tree+0x112/0x2e0 [cifs] vfs_get_tree+0x29/0xf0 path_mount+0x2d4/0xa00 __se_sys_mount+0x165/0x1d0 Root cause: When creating kernel sockets, sk_alloc() calls net_passive_inc() for sockets with sk_net_refcnt=0. The CIFS code manually converts kernel sockets to user sockets by setting sk_net_refcnt=1, but doesn't call the corresponding net_passive_dec(). This creates an imbalance in the net_passive counter, which prevents the network namespace from being destroyed when its last user reference is dropped. As a result, the entire namespace and all its associated resources remain allocated. Timeline of patches leading to this issue: - commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") in v6.12 fixed the original netns UAF by manually managing socket refcounts - commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") in v6.13 attempted to use kernel sockets but introduced TCP timer issues - commit 5c70eb5c593d ("net: better track kernel sockets lifetime") in v6.14-rc5 introduced the net_passive mechanism with sk_net_refcnt_upgrade() for proper socket conversion - commit 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"") in v6.15-rc3 reverted to manual refcount management without adapting to the new net_passive changes Fix this by using sk_net_refcnt_upgrade() which properly handles the net_passive counter when converting kernel sockets to user sockets. Link: https://bugzilla.kernel.org/show_bug.cgi?id=220343 [1] Fixes: 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"") Cc: stable@vger.kernel.org Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursnet: better track kernel sockets lifetimeEric Dumazet8-39/+30
[ Upstream commit 5c70eb5c593d64d93b178905da215a9fd288a4b5 ] While kernel sockets are dismantled during pernet_operations->exit(), their freeing can be delayed by any tx packets still held in qdisc or device queues, due to skb_set_owner_w() prior calls. This then trigger the following warning from ref_tracker_dir_exit() [1] To fix this, make sure that kernel sockets own a reference on net->passive. Add sk_net_refcnt_upgrade() helper, used whenever a kernel socket is converted to a refcounted one. [1] [ 136.263918][ T35] ref_tracker: net notrefcnt@ffff8880638f01e0 has 1/2 users at [ 136.263918][ T35] sk_alloc+0x2b3/0x370 [ 136.263918][ T35] inet6_create+0x6ce/0x10f0 [ 136.263918][ T35] __sock_create+0x4c0/0xa30 [ 136.263918][ T35] inet_ctl_sock_create+0xc2/0x250 [ 136.263918][ T35] igmp6_net_init+0x39/0x390 [ 136.263918][ T35] ops_init+0x31e/0x590 [ 136.263918][ T35] setup_net+0x287/0x9e0 [ 136.263918][ T35] copy_net_ns+0x33f/0x570 [ 136.263918][ T35] create_new_namespaces+0x425/0x7b0 [ 136.263918][ T35] unshare_nsproxy_namespaces+0x124/0x180 [ 136.263918][ T35] ksys_unshare+0x57d/0xa70 [ 136.263918][ T35] __x64_sys_unshare+0x38/0x40 [ 136.263918][ T35] do_syscall_64+0xf3/0x230 [ 136.263918][ T35] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 136.263918][ T35] [ 136.343488][ T35] ref_tracker: net notrefcnt@ffff8880638f01e0 has 1/2 users at [ 136.343488][ T35] sk_alloc+0x2b3/0x370 [ 136.343488][ T35] inet6_create+0x6ce/0x10f0 [ 136.343488][ T35] __sock_create+0x4c0/0xa30 [ 136.343488][ T35] inet_ctl_sock_create+0xc2/0x250 [ 136.343488][ T35] ndisc_net_init+0xa7/0x2b0 [ 136.343488][ T35] ops_init+0x31e/0x590 [ 136.343488][ T35] setup_net+0x287/0x9e0 [ 136.343488][ T35] copy_net_ns+0x33f/0x570 [ 136.343488][ T35] create_new_namespaces+0x425/0x7b0 [ 136.343488][ T35] unshare_nsproxy_namespaces+0x124/0x180 [ 136.343488][ T35] ksys_unshare+0x57d/0xa70 [ 136.343488][ T35] __x64_sys_unshare+0x38/0x40 [ 136.343488][ T35] do_syscall_64+0xf3/0x230 [ 136.343488][ T35] entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 0cafd77dcd03 ("net: add a refcount tracker for kernel sockets") Reported-by: syzbot+30a19e01a97420719891@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/67b72aeb.050a0220.14d86d.0283.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250220131854.4048077-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursnet: Add net_passive_inc() and net_passive_dec().Kuniyuki Iwashima2-4/+20
[ Upstream commit e57a6320215c3967f51ab0edeff87db2095440e4 ] net_drop_ns() is NULL when CONFIG_NET_NS is disabled. The next patch introduces a function that increments and decrements net->passive. As a prep, let's rename and export net_free() to net_passive_dec() and add net_passive_inc(). Suggested-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/CANn89i+oUCt2VGvrbrweniTendZFEh+nwS=uonc004-aPkWy-Q@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250217191129.19967-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 59b33fab4ca4 ("smb: client: fix netns refcount leak after net_passive changes") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursmfd: cros_ec: Separate charge-control probing from USB-PDThomas Weißschuh1-1/+9
commit e40fc1160d491c3bcaf8e940ae0dde0a7c5e8e14 upstream. The charge-control subsystem in the ChromeOS EC is not strictly tied to its USB-PD subsystem. Since commit 7613bc0d116a ("mfd: cros_ec: Don't load charger with UCSI") the presence of EC_FEATURE_UCSI_PPM would inhibit the probing of the charge-control driver. Furthermore recent versions of the EC firmware in Framework laptops hard-disable EC_FEATURE_USB_PD to avoid probing cros-usbpd-charger, which then also breaks cros-charge-control. Instead use the dedicated EC_FEATURE_CHARGER. Cc: stable@vger.kernel.org Link: https://github.com/FrameworkComputer/EmbeddedController/commit/1d7bcf1d50137c8c01969eb65880bc83e424597e Fixes: 555b5fcdb844 ("mfd: cros_ec: Register charge control subdevice") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Tested-by: Tom Vincent <linux@tlvince.com> Link: https://lore.kernel.org/r/20250521-cros-ec-mfd-chctl-probe-v1-1-6ebfe3a6efa7@weissschuh.net Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursHID: apple: avoid setting up battery timer for devices without batteryAditya Garg1-6/+11
commit c061046fe9ce3ff31fb9a807144a2630ad349c17 upstream. Currently, the battery timer is set up for all devices using hid-apple, irrespective of whether they actually have a battery or not. APPLE_RDESC_BATTERY is a quirk that indicates the device has a battery and needs the battery timer. This patch checks for this quirk before setting up the timer, ensuring that only devices with a battery will have the timer set up. Fixes: 6e143293e17a ("HID: apple: Report Magic Keyboard battery over USB") Cc: stable@vger.kernel.org Signed-off-by: Aditya Garg <gargaditya08@live.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hourstools/hv: fcopy: Fix irregularities with size of ring bufferNaman Jain1-10/+81
commit a4131a50d072b369bfed0b41e741c41fd8048641 upstream. Size of ring buffer, as defined in uio_hv_generic driver, is no longer fixed to 16 KB. This creates a problem in fcopy, since this size was hardcoded. With the change in place to make ring sysfs node actually reflect the size of underlying ring buffer, it is safe to get the size of ring sysfs file and use it for ring buffer size in fcopy daemon. Fix the issue of disparity in ring buffer size, by making it dynamic in fcopy uio daemon. Cc: stable@vger.kernel.org Fixes: 0315fef2aff9 ("uio_hv_generic: Align ring size to system page") Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/20250711060846.9168-1-namjain@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250711060846.9168-1-namjain@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hourswifi: mac80211: check basic rates validity in sta_link_apply_parametersMikhail Lobanov1-6/+6
commit 16ee3ea8faef8ff042acc15867a6c458c573de61 upstream. When userspace sets supported rates for a new station via NL80211_CMD_NEW_STATION, it might send a list that's empty or contains only invalid values. Currently, we process these values in sta_link_apply_parameters() without checking the result of ieee80211_parse_bitrates(), which can lead to an empty rates bitmap. A similar issue was addressed for NL80211_CMD_SET_BSS in commit ce04abc3fcc6 ("wifi: mac80211: check basic rates validity"). This patch applies the same approach in sta_link_apply_parameters() for NL80211_CMD_NEW_STATION, ensuring there is at least one valid rate by inspecting the result of ieee80211_parse_bitrates(). Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: b95eb7f0eee4 ("wifi: cfg80211/mac80211: separate link params from station params") Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru> Link: https://patch.msgid.link/20250317103139.17625-1-m.lobanov@rosa.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Hanne-Lotta Mäenpää <hannelotta@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursHID: magicmouse: avoid setting up battery timer when not neededAditya Garg1-19/+37
commit 9bdc30e35cbc1aa78ccf01040354209f1e11ca22 upstream. Currently, the battery timer is set up for all devices using hid-magicmouse, irrespective of whether they actually need it or not. The current implementation requires the battery timer for Magic Mouse 2 and Magic Trackpad 2 when connected via USB only. Add checks to ensure that the battery timer is only set up when they are connected via USB. Fixes: 0b91b4e4dae6 ("HID: magicmouse: Report battery level over USB") Cc: stable@vger.kernel.org Signed-off-by: Aditya Garg <gargaditya08@live.com> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursRDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpagesPedro Falcato1-3/+2
commit c18646248fed07683d4cee8a8af933fc4fe83c0d upstream. Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"), we have been doing this: static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset, size_t size) [...] /* Calculate the number of bytes we need to push, for this page * specifically */ size_t bytes = min_t(size_t, PAGE_SIZE - offset, size); /* If we can't splice it, then copy it in, as normal */ if (!sendpage_ok(page[i])) msg.msg_flags &= ~MSG_SPLICE_PAGES; /* Set the bvec pointing to the page, with len $bytes */ bvec_set_page(&bvec, page[i], bytes, offset); /* Set the iter to $size, aka the size of the whole sendpages (!!!) */ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); try_page_again: lock_sock(sk); /* Sendmsg with $size size (!!!) */ rv = tcp_sendmsg_locked(sk, &msg, size); This means we've been sending oversized iov_iters and tcp_sendmsg calls for a while. This has a been a benign bug because sendpage_ok() always returned true. With the recent slab allocator changes being slowly introduced into next (that disallow sendpage on large kmalloc allocations), we have recently hit out-of-bounds crashes, due to slight differences in iov_iter behavior between the MSG_SPLICE_PAGES and "regular" copy paths: (MSG_SPLICE_PAGES) skb_splice_from_iter iov_iter_extract_pages iov_iter_extract_bvec_pages uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere skb_splice_from_iter gets a "short" read (!MSG_SPLICE_PAGES) skb_copy_to_page_nocache copy=iov_iter_count [...] copy_from_iter /* this doesn't help */ if (unlikely(iter->count < len)) len = iter->count; iterate_bvec ... and we run off the bvecs Fix this by properly setting the iov_iter's byte count, plus sending the correct byte count to tcp_sendmsg_locked. Link: https://patch.msgid.link/r/20250729120348.495568-1-pfalcato@suse.de Cc: stable@vger.kernel.org Fixes: c2ff29e99a76 ("siw: Inline do_tcp_sendpages()") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Bernard Metzler <bernard.metzler@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hourstools/nolibc: fix spelling of FD_SETBITMASK in FD_* macrosWilly Tarreau1-2/+2
commit a477629baa2a0e9991f640af418e8c973a1c08e3 upstream. While nolibc-test does test syscalls, it doesn't test as much the rest of the macros, and a wrong spelling of FD_SETBITMASK in commit feaf75658783a broke programs using either FD_SET() or FD_CLR() without being noticed. Let's fix these macros. Fixes: feaf75658783a ("nolibc: fix fd_set type") Cc: stable@vger.kernel.org # v6.2+ Acked-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursmedia: v4l2: Add support for NV12M tiled variants to v4l2_format_info()Marek Szyprowski1-0/+6
commit f7546da1d6eb8928efb89b7faacbd6c2f8f0de5c upstream. Commit 6f1466123d73 ("media: s5p-mfc: Add YV12 and I420 multiplanar format support") added support for the new formats to s5p-mfc driver, what in turn required some internal calls to the v4l2_format_info() function while setting up formats. This in turn broke support for the "old" tiled NV12MT* formats, which are not recognized by this function. Fix this by adding those variants of NV12M pixel format to v4l2_format_info() function database. Fixes: 6f1466123d73 ("media: s5p-mfc: Add YV12 and I420 multiplanar format support") Cc: stable@vger.kernel.org Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursmedia: uvcvideo: Do not mark valid metadata as invalidRicardo Ribalda1-6/+6
commit bda2859bff0b9596a19648f3740c697ce4c71496 upstream. Currently, the driver performs a length check of the metadata buffer before the actual metadata size is known and before the metadata is decided to be copied. This results in valid metadata buffers being incorrectly marked as invalid. Move the length check to occur after the metadata size is determined and is decided to be copied. Cc: stable@vger.kernel.org Fixes: 088ead255245 ("media: uvcvideo: Add a metadata device node") Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Hans de Goede <hansg@kernel.org> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Link: https://lore.kernel.org/r/20250707-uvc-meta-v8-1-ed17f8b1218b@chromium.org Signed-off-by: Hans de Goede <hansg@kernel.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursmedia: venus: Fix OOB read due to missing payload bound checkVedang Nagar1-25/+58
commit 06d6770ff0d8cc8dfd392329a8cc03e2a83e7289 upstream. Currently, The event_seq_changed() handler processes a variable number of properties sent by the firmware. The number of properties is indicated by the firmware and used to iterate over the payload. However, the payload size is not being validated against the actual message length. This can lead to out-of-bounds memory access if the firmware provides a property count that exceeds the data available in the payload. Such a condition can result in kernel crashes or potential information leaks if memory beyond the buffer is accessed. Fix this by properly validating the remaining size of the payload before each property access and updating bounds accordingly as properties are parsed. This ensures that property parsing is safely bounded within the received message buffer and protects against malformed or malicious firmware behavior. Fixes: 09c2845e8fe4 ("[media] media: venus: hfi: add Host Firmware Interface (HFI)") Cc: stable@vger.kernel.org Signed-off-by: Vedang Nagar <quic_vnagar@quicinc.com> Reviewed-by: Vikash Garodia <quic_vgarodia@quicinc.com> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Co-developed-by: Dikshita Agarwal <quic_dikshita@quicinc.com> Signed-off-by: Dikshita Agarwal <quic_dikshita@quicinc.com> Signed-off-by: Bryan O'Donoghue <bod@kernel.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursmedia: uvcvideo: Fix 1-byte out-of-bounds read in uvc_parse_format()Youngjun Lee1-0/+3
commit 782b6a718651eda3478b1824b37a8b3185d2740c upstream. The buffer length check before calling uvc_parse_format() only ensured that the buffer has at least 3 bytes (buflen > 2), buf the function accesses buffer[3], requiring at least 4 bytes. This can lead to an out-of-bounds read if the buffer has exactly 3 bytes. Fix it by checking that the buffer has at least 4 bytes in uvc_parse_format(). Signed-off-by: Youngjun Lee <yjjuny.lee@samsung.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Fixes: c0efd232929c ("V4L/DVB (8145a): USB Video Class driver") Cc: stable@vger.kernel.org Reviewed-by: Ricardo Ribalda <ribalda@chromium.org> Link: https://lore.kernel.org/r/20250610124107.37360-1-yjjuny.lee@samsung.com Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursmm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lockBreno Leitao1-1/+4
commit 47b0f6d8f0d2be4d311a49e13d2fd5f152f492b2 upstream. When netpoll is enabled, calling pr_warn_once() while holding kmemleak_lock in mem_pool_alloc() can cause a deadlock due to lock inversion with the netconsole subsystem. This occurs because pr_warn_once() may trigger netpoll, which eventually leads to __alloc_skb() and back into kmemleak code, attempting to reacquire kmemleak_lock. This is the path for the deadlock. mem_pool_alloc() -> raw_spin_lock_irqsave(&kmemleak_lock, flags); -> pr_warn_once() -> netconsole subsystem -> netpoll -> __alloc_skb -> __create_object -> raw_spin_lock_irqsave(&kmemleak_lock, flags); Fix this by setting a flag and issuing the pr_warn_once() after kmemleak_lock is released. Link: https://lkml.kernel.org/r/20250731-kmemleak_lock-v1-1-728fd470198f@debian.org Fixes: c5665868183f ("mm: kmemleak: use the memory pool for early allocations") Signed-off-by: Breno Leitao <leitao@debian.org> Reported-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursmm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()Waiman Long1-0/+5
commit d1534ae23c2b6be350c8ab060803fbf6e9682adc upstream. A soft lockup warning was observed on a relative small system x86-64 system with 16 GB of memory when running a debug kernel with kmemleak enabled. watchdog: BUG: soft lockup - CPU#8 stuck for 33s! [kworker/8:1:134] The test system was running a workload with hot unplug happening in parallel. Then kemleak decided to disable itself due to its inability to allocate more kmemleak objects. The debug kernel has its CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE set to 40,000. The soft lockup happened in kmemleak_do_cleanup() when the existing kmemleak objects were being removed and deleted one-by-one in a loop via a workqueue. In this particular case, there are at least 40,000 objects that need to be processed and given the slowness of a debug kernel and the fact that a raw_spinlock has to be acquired and released in __delete_object(), it could take a while to properly handle all these objects. As kmemleak has been disabled in this case, the object removal and deletion process can be further optimized as locking isn't really needed. However, it is probably not worth the effort to optimize for such an edge case that should rarely happen. So the simple solution is to call cond_resched() at periodic interval in the iteration loop to avoid soft lockup. Link: https://lkml.kernel.org/r/20250728190248.605750-1-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursmm/ptdump: take the memory hotplug lock inside ptdump_walk_pgd()Anshuman Khandual4-8/+2
commit 59305202c67fea50378dcad0cc199dbc13a0e99a upstream. Memory hot remove unmaps and tears down various kernel page table regions as required. The ptdump code can race with concurrent modifications of the kernel page tables. When leaf entries are modified concurrently, the dump code may log stale or inconsistent information for a VA range, but this is otherwise not harmful. But when intermediate levels of kernel page table are freed, the dump code will continue to use memory that has been freed and potentially reallocated for another purpose. In such cases, the ptdump code may dereference bogus addresses, leading to a number of potential problems. To avoid the above mentioned race condition, platforms such as arm64, riscv and s390 take memory hotplug lock, while dumping kernel page table via the sysfs interface /sys/kernel/debug/kernel_page_tables. Similar race condition exists while checking for pages that might have been marked W+X via /sys/kernel/debug/kernel_page_tables/check_wx_pages which in turn calls ptdump_check_wx(). Instead of solving this race condition again, let's just move the memory hotplug lock inside generic ptdump_check_wx() which will benefit both the scenarios. Drop get_online_mems() and put_online_mems() combination from all existing platform ptdump code paths. Link: https://lkml.kernel.org/r/20250620052427.2092093-1-anshuman.khandual@arm.com Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove") Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursmm, slab: restore NUMA policy support for large kmallocVlastimil Babka1-1/+6
commit e2d18cbf178775ad377ad88ee55e6e183c38d262 upstream. The slab allocator observes the task's NUMA policy in various places such as allocating slab pages. Large kmalloc() allocations used to do that too, until an unintended change by c4cab557521a ("mm/slab_common: cleanup kmalloc_large()") resulted in ignoring mempolicy and just preferring the local node. Restore the NUMA policy support. Fixes: c4cab557521a ("mm/slab_common: cleanup kmalloc_large()") Cc: <stable@vger.kernel.org> Acked-by: Christoph Lameter (Ampere) <cl@gentwo.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursparisc: Makefile: fix a typo in palo.confRandy Dunlap1-1/+1
commit 963f1b20a8d2a098954606b9725cd54336a2a86c upstream. Correct "objree" to "objtree". "objree" is not defined. Fixes: 75dd47472b92 ("kbuild: remove src and obj from the top Makefile") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: linux-parisc@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hourshv_netvsc: Fix panic during namespace deletion with VFHaiyang Zhang2-1/+31
commit 33caa208dba6fa639e8a92fd0c8320b652e5550c upstream. The existing code move the VF NIC to new namespace when NETDEV_REGISTER is received on netvsc NIC. During deletion of the namespace, default_device_exit_batch() >> default_device_exit_net() is called. When netvsc NIC is moved back and registered to the default namespace, it automatically brings VF NIC back to the default namespace. This will cause the default_device_exit_net() >> for_each_netdev_safe loop unable to detect the list end, and hit NULL ptr: [ 231.449420] mana 7870:00:00.0 enP30832s1: Moved VF to namespace with: eth0 [ 231.449656] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 231.450246] #PF: supervisor read access in kernel mode [ 231.450579] #PF: error_code(0x0000) - not-present page [ 231.450916] PGD 17b8a8067 P4D 0 [ 231.451163] Oops: Oops: 0000 [#1] SMP NOPTI [ 231.451450] CPU: 82 UID: 0 PID: 1394 Comm: kworker/u768:1 Not tainted 6.16.0-rc4+ #3 VOLUNTARY [ 231.452042] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024 [ 231.452692] Workqueue: netns cleanup_net [ 231.452947] RIP: 0010:default_device_exit_batch+0x16c/0x3f0 [ 231.453326] Code: c0 0c f5 b3 e8 d5 db fe ff 48 85 c0 74 15 48 c7 c2 f8 fd ca b2 be 10 00 00 00 48 8d 7d c0 e8 7b 77 25 00 49 8b 86 28 01 00 00 <48> 8b 50 10 4c 8b 2a 4c 8d 62 f0 49 83 ed 10 4c 39 e0 0f 84 d6 00 [ 231.454294] RSP: 0018:ff75fc7c9bf9fd00 EFLAGS: 00010246 [ 231.454610] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 61c8864680b583eb [ 231.455094] RDX: ff1fa9f71462d800 RSI: ff75fc7c9bf9fd38 RDI: 0000000030766564 [ 231.455686] RBP: ff75fc7c9bf9fd78 R08: 0000000000000000 R09: 0000000000000000 [ 231.456126] R10: 0000000000000001 R11: 0000000000000004 R12: ff1fa9f70088e340 [ 231.456621] R13: ff1fa9f70088e340 R14: ffffffffb3f50c20 R15: ff1fa9f7103e6340 [ 231.457161] FS: 0000000000000000(0000) GS:ff1faa6783a08000(0000) knlGS:0000000000000000 [ 231.457707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 231.458031] CR2: 0000000000000010 CR3: 0000000179ab2006 CR4: 0000000000b73ef0 [ 231.458434] Call Trace: [ 231.458600] <TASK> [ 231.458777] ops_undo_list+0x100/0x220 [ 231.459015] cleanup_net+0x1b8/0x300 [ 231.459285] process_one_work+0x184/0x340 To fix it, move the ns change to a workqueue, and take rtnl_lock to avoid changing the netdev list when default_device_exit_net() is using it. Cc: stable@vger.kernel.org Fixes: 4c262801ea60 ("hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1754511711-11188-1-git-send-email-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursnet/sched: ets: use old 'nbands' while purging unused classesDavide Caratti1-5/+6
commit 87c6efc5ce9c126ae4a781bc04504b83780e3650 upstream. Shuang reported sch_ets test-case [1] crashing in ets_class_qlen_notify() after recent changes from Lion [2]. The problem is: in ets_qdisc_change() we purge unused DWRR queues; the value of 'q->nbands' is the new one, and the cleanup should be done with the old one. The problem is here since my first attempts to fix ets_qdisc_change(), but it surfaced again after the recent qdisc len accounting fixes. Fix it purging idle DWRR queues before assigning a new value of 'q->nbands', so that all purge operations find a consistent configuration: - old 'q->nbands' because it's needed by ets_class_find() - old 'q->nstrict' because it's needed by ets_class_is_strict() BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 62 UID: 0 PID: 39457 Comm: tc Kdump: loaded Not tainted 6.12.0-116.el10.x86_64 #1 PREEMPT(voluntary) Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS 2.12.2 07/09/2021 RIP: 0010:__list_del_entry_valid_or_report+0x4/0x80 Code: ff 4c 39 c7 0f 84 39 19 8e ff b8 01 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b 17 48 8b 4f 08 48 85 d2 0f 84 56 19 8e ff 48 85 c9 0f 84 ab RSP: 0018:ffffba186009f400 EFLAGS: 00010202 RAX: 00000000000000d6 RBX: 0000000000000000 RCX: 0000000000000004 RDX: ffff9f0fa29b69c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffffffc12c2400 R08: 0000000000000008 R09: 0000000000000004 R10: ffffffffffffffff R11: 0000000000000004 R12: 0000000000000000 R13: ffff9f0f8cfe0000 R14: 0000000000100005 R15: 0000000000000000 FS: 00007f2154f37480(0000) GS:ffff9f269c1c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001530be001 CR4: 00000000007726f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ets_class_qlen_notify+0x65/0x90 [sch_ets] qdisc_tree_reduce_backlog+0x74/0x110 ets_qdisc_change+0x630/0xa40 [sch_ets] __tc_modify_qdisc.constprop.0+0x216/0x7f0 tc_modify_qdisc+0x7c/0x120 rtnetlink_rcv_msg+0x145/0x3f0 netlink_rcv_skb+0x53/0x100 netlink_unicast+0x245/0x390 netlink_sendmsg+0x21b/0x470 ____sys_sendmsg+0x39d/0x3d0 ___sys_sendmsg+0x9a/0xe0 __sys_sendmsg+0x7a/0xd0 do_syscall_64+0x7d/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f2155114084 Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d 25 f0 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89 RSP: 002b:00007fff1fd7a988 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000560ec063e5e0 RCX: 00007f2155114084 RDX: 0000000000000000 RSI: 00007fff1fd7a9f0 RDI: 0000000000000003 RBP: 00007fff1fd7aa60 R08: 0000000000000010 R09: 000000000000003f R10: 0000560ee9b3a010 R11: 0000000000000202 R12: 00007fff1fd7aae0 R13: 000000006891ccde R14: 0000560ec063e5e0 R15: 00007fff1fd7aad0 </TASK> [1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/ [2] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/ Cc: stable@vger.kernel.org Fixes: 103406b38c60 ("net/sched: Always pass notifications when child class becomes empty") Fixes: c062f2a0b04d ("net/sched: sch_ets: don't remove idle classes from the round-robin list") Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc") Reported-by: Li Shuang <shuali@redhat.com> Closes: https://issues.redhat.com/browse/RHEL-108026 Reviewed-by: Petr Machata <petrm@nvidia.com> Co-developed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/7928ff6d17db47a2ae7cc205c44777b1f1950545.1755016081.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursfbdev: Fix vmalloc out-of-bounds write in fast_imageblitSravan Kumar Gundu1-4/+5
commit af0db3c1f898144846d4c172531a199bb3ca375d upstream. This issue triggers when a userspace program does an ioctl FBIOPUT_CON2FBMAP by passing console number and frame buffer number. Ideally this maps console to frame buffer and updates the screen if console is visible. As part of mapping it has to do resize of console according to frame buffer info. if this resize fails and returns from vc_do_resize() and continues further. At this point console and new frame buffer are mapped and sets display vars. Despite failure still it continue to proceed updating the screen at later stages where vc_data is related to previous frame buffer and frame buffer info and display vars are mapped to new frame buffer and eventully leading to out-of-bounds write in fast_imageblit(). This bheviour is excepted only when fg_console is equal to requested console which is a visible console and updates screen with invalid struct references in fbcon_putcs(). Reported-and-tested-by: syzbot+c4b7aa0513823e2ea880@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c4b7aa0513823e2ea880 Signed-off-by: Sravan Kumar Gundu <sravankumarlpu@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursuserfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entrySuren Baghdasaryan1-7/+10
commit aba6faec0103ed8f169be8dce2ead41fcb689446 upstream. When UFFDIO_MOVE encounters a migration PMD entry, it proceeds with obtaining a folio and accessing it even though the entry is swp_entry_t. Add the missing check and let split_huge_pmd() handle migration entries. While at it also remove unnecessary folio check. [surenb@google.com: remove extra folio check, per David] Link: https://lkml.kernel.org/r/20250807200418.1963585-1-surenb@google.com Link: https://lkml.kernel.org/r/20250806220022.926763-1-surenb@google.com Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: syzbot+b446dbe27035ef6bd6c2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68794b5c.a70a0220.693ce.0050.GAE@google.com/ Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursxfs: fix scrub trace with null pointer in quotacheckAndrey Albershteyn1-1/+1
commit 5d94b19f066480addfcdcb5efde66152ad5a7c0e upstream. The quotacheck doesn't initialize sc->ip. Cc: stable@vger.kernel.org # v6.8 Fixes: 21d7500929c8a0 ("xfs: improve dquot iteration for scrub") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursbtrfs: do not allow relocation of partially dropped subvolumesQu Wenruo1-0/+19
commit 4289b494ac553e74e86fed1c66b2bf9530bc1082 upstream. [BUG] There is an internal report that balance triggered transaction abort, with the following call trace: item 85 key (594509824 169 0) itemoff 12599 itemsize 33 extent refs 1 gen 197740 flags 2 ref#0: tree block backref root 7 item 86 key (594558976 169 0) itemoff 12566 itemsize 33 extent refs 1 gen 197522 flags 2 ref#0: tree block backref root 7 ... BTRFS error (device loop0): extent item not found for insert, bytenr 594526208 num_bytes 16384 parent 449921024 root_objectid 934 owner 1 offset 0 BTRFS error (device loop0): failed to run delayed ref for logical 594526208 num_bytes 16384 type 182 action 1 ref_mod 1: -117 ------------[ cut here ]------------ BTRFS: Transaction aborted (error -117) WARNING: CPU: 1 PID: 6963 at ../fs/btrfs/extent-tree.c:2168 btrfs_run_delayed_refs+0xfa/0x110 [btrfs] And btrfs check doesn't report anything wrong related to the extent tree. [CAUSE] The cause is a little complex, firstly the extent tree indeed doesn't have the backref for 594526208. The extent tree only have the following two backrefs around that bytenr on-disk: item 65 key (594509824 METADATA_ITEM 0) itemoff 13880 itemsize 33 refs 1 gen 197740 flags TREE_BLOCK tree block skinny level 0 (176 0x7) tree block backref root CSUM_TREE item 66 key (594558976 METADATA_ITEM 0) itemoff 13847 itemsize 33 refs 1 gen 197522 flags TREE_BLOCK tree block skinny level 0 (176 0x7) tree block backref root CSUM_TREE But the such missing backref item is not an corruption on disk, as the offending delayed ref belongs to subvolume 934, and that subvolume is being dropped: item 0 key (934 ROOT_ITEM 198229) itemoff 15844 itemsize 439 generation 198229 root_dirid 256 bytenr 10741039104 byte_limit 0 bytes_used 345571328 last_snapshot 198229 flags 0x1000000000001(RDONLY) refs 0 drop_progress key (206324 EXTENT_DATA 2711650304) drop_level 2 level 2 generation_v2 198229 And that offending tree block 594526208 is inside the dropped range of that subvolume. That explains why there is no backref item for that bytenr and why btrfs check is not reporting anything wrong. But this also shows another problem, as btrfs will do all the orphan subvolume cleanup at a read-write mount. So half-dropped subvolume should not exist after an RW mount, and balance itself is also exclusive to subvolume cleanup, meaning we shouldn't hit a subvolume half-dropped during relocation. The root cause is, there is no orphan item for this subvolume. In fact there are 5 subvolumes from around 2021 that have the same problem. It looks like the original report has some older kernels running, and caused those zombie subvolumes. Thankfully upstream commit 8d488a8c7ba2 ("btrfs: fix subvolume/snapshot deletion not triggered on mount") has long fixed the bug. [ENHANCEMENT] For repairing such old fs, btrfs-progs will be enhanced. Considering how delayed the problem will show up (at run delayed ref time) and at that time we have to abort transaction already, it is too late. Instead here we reject any half-dropped subvolume for reloc tree at the earliest time, preventing confusion and extra time wasted on debugging similar bugs. CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursbtrfs: fix iteration bug in __qgroup_excl_accounting()Boris Burkov1-2/+1
commit 7b632596188e1973c6b3ac1c9f8252f735e1039f upstream. __qgroup_excl_accounting() uses the qgroup iterator machinery to update the account of one qgroups usage for all its parent hierarchy, when we either add or remove a relation and have only exclusive usage. However, there is a small bug there: we loop with an extra iteration temporary qgroup called `cur` but never actually refer to that in the body of the loop. As a result, we redundantly account the same usage to the first qgroup in the list. This can be reproduced in the following way: mkfs.btrfs -f -O squota <dev> mount <dev> <mnt> btrfs subvol create <mnt>/sv dd if=/dev/zero of=<mnt>/sv/f bs=1M count=1 sync btrfs qgroup create 1/100 <mnt> btrfs qgroup create 2/200 <mnt> btrfs qgroup assign 1/100 2/200 <mnt> btrfs qgroup assign 0/256 1/100 <mnt> btrfs qgroup show <mnt> and the broken result is (note the 2MiB on 1/100 and 0Mib on 2/100): Qgroupid Referenced Exclusive Path -------- ---------- --------- ---- 0/5 16.00KiB 16.00KiB <toplevel> 0/256 1.02MiB 1.02MiB sv Qgroupid Referenced Exclusive Path -------- ---------- --------- ---- 0/5 16.00KiB 16.00KiB <toplevel> 0/256 1.02MiB 1.02MiB sv 1/100 2.03MiB 2.03MiB 2/100<1 member qgroup> 2/100 0.00B 0.00B <0 member qgroups> With this fix, which simply re-uses `qgroup` as the iteration variable, we see the expected result: Qgroupid Referenced Exclusive Path -------- ---------- --------- ---- 0/5 16.00KiB 16.00KiB <toplevel> 0/256 1.02MiB 1.02MiB sv Qgroupid Referenced Exclusive Path -------- ---------- --------- ---- 0/5 16.00KiB 16.00KiB <toplevel> 0/256 1.02MiB 1.02MiB sv 1/100 1.02MiB 1.02MiB 2/100<1 member qgroup> 2/100 1.02MiB 1.02MiB <0 member qgroups> The existing fstests did not exercise two layer inheritance so this bug was missed. I intend to add that testing there, as well. Fixes: a0bdc04b0732 ("btrfs: qgroup: use qgroup_iterator in __qgroup_excl_accounting()") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursbtrfs: zoned: do not select metadata BG as finish targetNaohiro Aota1-1/+1
commit 3a931e9b39c7ff8066657042f5f00d3b7e6ad315 upstream. We call btrfs_zone_finish_one_bg() to zone finish one block group and make room to activate another block group. Currently, we can choose a metadata block group as a target. But, as we reserve an active metadata block group, we no longer want to select a metadata block group. So, skip it in the loop. CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursbtrfs: error on missing block group when unaccounting log tree extent buffersFilipe Manana1-12/+7
commit fc5799986fbca957e2e3c0480027f249951b7bcf upstream. Currently we only log an error message if we can't find the block group for a log tree extent buffer when unaccounting it (while freeing a log tree). A missing block group means something is seriously wrong and we end up leaking space from the metadata space info. So return -ENOENT in case we don't find the block group. CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursbtrfs: fix log tree replay failure due to file with 0 links and extentsFilipe Manana1-18/+30
commit 0a32e4f0025a74c70dcab4478e9b29c22f5ecf2f upstream. If we log a new inode (not persisted in a past transaction) that has 0 links and extents, then log another inode with an higher inode number, we end up with failing to replay the log tree with -EINVAL. The steps for this are: 1) create new file A 2) write some data to file A 3) open an fd on file A 4) unlink file A 5) fsync file A using the previously open fd 6) create file B (has higher inode number than file A) 7) fsync file B 8) power fail before current transaction commits Now when attempting to mount the fs, the log replay will fail with -ENOENT at replay_one_extent() when attempting to replay the first extent of file A. The failure comes when trying to open the inode for file A in the subvolume tree, since it doesn't exist. Before commit 5f61b961599a ("btrfs: fix inode lookup error handling during log replay"), the returned error was -EIO instead of -ENOENT, since we converted any errors when attempting to read an inode during log replay to -EIO. The reason for this is that the log replay procedure fails to ignore the current inode when we are at the stage LOG_WALK_REPLAY_ALL, our current inode has 0 links and last inode we processed in the previous stage has a non 0 link count. In other words, the issue is that at replay_one_extent() we only update wc->ignore_cur_inode if the current replay stage is LOG_WALK_REPLAY_INODES. Fix this by updating wc->ignore_cur_inode whenever we find an inode item regardless of the current replay stage. This is a simple solution and easy to backport, but later we can do other alternatives like avoid logging extents or inode items other than the inode item for inodes with a link count of 0. The problem with the wc->ignore_cur_inode logic has been around since commit f2d72f42d5fa ("Btrfs: fix warning when replaying log after fsync of a tmpfile") but it only became frequent to hit since the more recent commit 5e85262e542d ("btrfs: fix fsync of files with no hard links not persisting deletion"), because we stopped skipping inodes with a link count of 0 when logging, while before the problem would only be triggered if trying to replay a log tree created with an older kernel which has a logged inode with 0 links. A test case for fstests will be submitted soon. Reported-by: Peter Jung <ptr1337@cachyos.org> Link: https://lore.kernel.org/linux-btrfs/fce139db-4458-4788-bb97-c29acf6cb1df@cachyos.org/ Reported-by: burneddi <burneddi@protonmail.com> Link: https://lore.kernel.org/linux-btrfs/lh4W-Lwc0Mbk-QvBhhQyZxf6VbM3E8VtIvU3fPIQgweP_Q1n7wtlUZQc33sYlCKYd-o6rryJQfhHaNAOWWRKxpAXhM8NZPojzsJPyHMf2qY=@protonmail.com/#t Reported-by: Russell Haley <yumpusamongus@gmail.com> Link: https://lore.kernel.org/linux-btrfs/598ecc75-eb80-41b3-83c2-f2317fbb9864@gmail.com/ Fixes: f2d72f42d5fa ("Btrfs: fix warning when replaying log after fsync of a tmpfile") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursbtrfs: clear dirty status from extent buffer on error at insert_new_root()Filipe Manana1-0/+1
commit c0d013495a80cbb53e2288af7ae0ec4170aafd7c upstream. If we failed to insert the tree mod log operation, we are not removing the dirty status from the allocated and dirtied extent buffer before we free it. Removing the dirty status is needed for several reasons such as to adjust the fs_info->dirty_metadata_bytes counter and remove the dirty status from the respective folios. So add the missing call to btrfs_clear_buffer_dirty(). Fixes: f61aa7ba08ab ("btrfs: do not BUG_ON() on tree mod log failure at insert_new_root()") CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursbtrfs: don't skip remaining extrefs if dir not found during log replayFilipe Manana1-4/+14
commit 24e066ded45b8147b79c7455ac43a5bff7b5f378 upstream. During log replay, at add_inode_ref(), if we have an extref item that contains multiple extrefs and one of them points to a directory that does not exist in the subvolume tree, we are supposed to ignore it and process the remaining extrefs encoded in the extref item, since each extref can point to a different parent inode. However when that happens we just return from the function and ignore the remaining extrefs. The problem has been around since extrefs were introduced, in commit f186373fef00 ("btrfs: extended inode refs"), but it's hard to hit in practice because getting extref items encoding multiple extref requires getting a hash collision when computing the offset of the extref's key. The offset if computed like this: key.offset = btrfs_extref_hash(dir_ino, name->name, name->len); and btrfs_extref_hash() is just a wrapper around crc32c(). Fix this by moving to next iteration of the loop when we don't find the parent directory that an extref points to. Fixes: f186373fef00 ("btrfs: extended inode refs") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursbtrfs: qgroup: fix qgroup create ioctl returning success after quotas disabledFilipe Manana2-5/+4
commit 08530d6e638427e7e1344bd67bacc03882ba95b9 upstream. When quotas are disabled qgroup ioctls are supposed to return -ENOTCONN, but the qgroup create ioctl stopped doing that when it races with a quota disable operation, returning 0 instead. This change of behaviour happened in commit 6ed05643ddb1 ("btrfs: create qgroup earlier in snapshot creation"). The issue happens as follows: 1) Task A enters btrfs_ioctl_qgroup_create(), qgroups are enabled and so qgroup_enabled() returns true since fs_info->quota_root is not NULL; 2) Task B enters btrfs_ioctl_quota_ctl() -> btrfs_quota_disable() and disables qgroups, so now fs_info->quota_root is NULL; 3) Task A enters btrfs_create_qgroup() and calls btrfs_qgroup_mode(), which returns BTRFS_QGROUP_MODE_DISABLED since quotas are disabled, and then btrfs_create_qgroup() returns 0 to the caller, which makes the ioctl return 0 instead of -ENOTCONN. The check for fs_info->quota_root and returning -ENOTCONN if it's NULL is made only after the call btrfs_qgroup_mode(). Fix this by moving the check for disabled quotas with btrfs_qgroup_mode() into transaction.c:create_pending_snapshot(), so that we don't abort the transaction if btrfs_create_qgroup() returns -ENOTCONN and quotas are disabled. Fixes: 6ed05643ddb1 ("btrfs: create qgroup earlier in snapshot creation") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursbtrfs: populate otime when logging an inode itemQu Wenruo1-0/+3
commit 1ef94169db0958d6de39f9ea6e063ce887342e2d upstream. [TEST FAILURE WITH EXPERIMENTAL FEATURES] When running test case generic/508, the test case will fail with the new btrfs shutdown support: generic/508 - output mismatch (see /home/adam/xfstests/results//generic/508.out.bad) # --- tests/generic/508.out 2022-05-11 11:25:30.806666664 +0930 # +++ /home/adam/xfstests/results//generic/508.out.bad 2025-07-02 14:53:22.401824212 +0930 # @@ -1,2 +1,6 @@ # QA output created by 508 # Silence is golden # +Before: # +After : stat.btime = Thu Jan 1 09:30:00 1970 # +Before: # +After : stat.btime = Wed Jul 2 14:53:22 2025 # ... # (Run 'diff -u /home/adam/xfstests/tests/generic/508.out /home/adam/xfstests/results//generic/508.out.bad' to see the entire diff) Ran: generic/508 Failures: generic/508 Failed 1 of 1 tests Please note that the test case requires shutdown support, thus the test case will be skipped using the current upstream kernel, as it doesn't have shutdown ioctl support. [CAUSE] The direct cause the 0 time stamp in the log tree: leaf 30507008 items 2 free space 16057 generation 9 owner TREE_LOG leaf 30507008 flags 0x1(WRITTEN) backref revision 1 checksum stored e522548d checksum calced e522548d fs uuid 57d45451-481e-43e4-aa93-289ad707a3a0 chunk uuid d52bd3fd-5163-4337-98a7-7986993ad398 item 0 key (257 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 9 transid 9 size 0 nbytes 0 block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0 sequence 1 flags 0x0(none) atime 1751432947.492000000 (2025-07-02 14:39:07) ctime 1751432947.492000000 (2025-07-02 14:39:07) mtime 1751432947.492000000 (2025-07-02 14:39:07) otime 0.0 (1970-01-01 09:30:00) <<< But the old fs tree has all the correct time stamp: btrfs-progs v6.12 fs tree key (FS_TREE ROOT_ITEM 0) leaf 30425088 items 2 free space 16061 generation 5 owner FS_TREE leaf 30425088 flags 0x1(WRITTEN) backref revision 1 checksum stored 48f6c57e checksum calced 48f6c57e fs uuid 57d45451-481e-43e4-aa93-289ad707a3a0 chunk uuid d52bd3fd-5163-4337-98a7-7986993ad398 item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 3 transid 0 size 0 nbytes 16384 block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 sequence 0 flags 0x0(none) atime 1751432947.0 (2025-07-02 14:39:07) ctime 1751432947.0 (2025-07-02 14:39:07) mtime 1751432947.0 (2025-07-02 14:39:07) otime 1751432947.0 (2025-07-02 14:39:07) <<< The root cause is that fill_inode_item() in tree-log.c is only populating a/c/m time, not the otime (or btime in statx output). Part of the reason is that, the vfs inode only has a/c/m time, no native btime support yet. [FIX] Thankfully btrfs has its otime stored in btrfs_inode::i_otime_sec and btrfs_inode::i_otime_nsec. So what we really need is just fill the otime time stamp in fill_inode_item() of tree-log.c There is another fill_inode_item() in inode.c, which is doing the proper otime population. Fixes: 94edf4ae43a5 ("Btrfs: don't bother committing delayed inode updates when fsyncing") CC: stable@vger.kernel.org Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
15 hoursbtrfs: fix ssd_spread overallocationBoris Burkov1-16/+17
commit 807d9023e75fc20bfd6dd2ac0408ce4af53f1648 upstream. If the ssd_spread mount option is enabled, then we run the so called clustered allocator for data block groups. In practice, this results in creating a btrfs_free_cluster which caches a block_group and borrows its free extents for allocation. Since the introduction of allocation size classes in 6.1, there has been a bug in the interaction between that feature and ssd_spread. find_free_extent() has a number of nested loops. The loop going over the allocation stages, stored in ffe_ctl->loop and managed by find_free_extent_update_loop(), the loop over the raid levels, and the loop over all the block_groups in a space_info. The size class feature relies on the block_group loop to ensure it gets a chance to see a block_group of a given size class. However, the clustered allocator uses the cached cluster block_group and breaks that loop. Each call to do_allocation() will really just go back to the same cached block_group. Normally, this is OK, as the allocation either succeeds and we don't want to loop any more or it fails, and we clear the cluster and return its space to the block_group. But with size classes, the allocation can succeed, then later fail, outside of do_allocation() due to size class mismatch. That latter failure is not properly handled due to the highly complex multi loop logic. The result is a painful loop where we continue to allocate the same num_bytes from the cluster in a tight loop until it fails and releases the cluster and lets us try a new block_group. But by then, we have skipped great swaths of the available block_groups and are likely to fail to allocate, looping the outer loop. In pathological cases like the reproducer below, the cached block_group is often the very last one, in which case we don't perform this tight bg loop but instead rip through the ffe stages to LOOP_CHUNK_ALLOC and allocate a chunk, which is now the last one, and we enter the tight inner loop until an allocation failure. Then allocation succeeds on the final block_group and if the next allocation is a size mismatch, the exact same thing happens again. Triggering this is as easy as mounting with -o ssd_spread and then running: mount -o ssd_spread $dev $mnt dd if=/dev/zero of=$mnt/big bs=16M count=1 &>/dev/null dd if=/dev/zero of=$mnt/med bs=4M count=1 &>/dev/null sync if you do the two writes + sync in a loop, you can force btrfs to spin an excessive amount on semi-successful clustered allocations, before ultimately failing and advancing to the stage where we force a chunk allocation. This results in 2G of data allocated per iteration, despite only using ~20M of data. By using a small size classed extent, the inner loop takes longer and we can spin for longer. The simplest, shortest term fix to unbreak this is to make the clustered allocator size_class aware in the dumbest way, where it fails on size class mismatch. This may hinder the operation of the clustered allocator, but better hindered than completely broken and terribly overallocating. Further re-design improvements are also in the works. Fixes: 52bb7a2166af ("btrfs: introduce size class to block group allocator") CC: stable@vger.kernel.org # 6.1+ Reported-by: David Sterba <dsterba@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>