summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
5 daysMerge v6.16.3linux-rolling-stableGreg Kroah-Hartman8-141/+250
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 daysLinux 6.16.3v6.16.3linux-6.16.yGreg Kroah-Hartman1-1/+1
Link: https://lore.kernel.org/r/20250822123516.780248736@linuxfoundation.org Tested-by: Ronald Warsow <rwarsow@gmx.de> Tested-by: Markus Reichelt <lkt+2023@mareichelt.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Peter Schneider <pschneider1968@googlemail.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Brett A C Sheffield <bacs@librecast.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 daysext4: replace ext4_writepage_trans_blocks()Zhang Yi6-27/+25
commit 57661f28756c59510e31543520b5b8f5e591f384 upstream. After ext4 supports large folios, the semantics of reserving credits in pages is no longer applicable. In most scenarios, reserving credits in extents is sufficient. Therefore, introduce ext4_chunk_trans_extent() to replace ext4_writepage_trans_blocks(). move_extent_per_page() is the only remaining location where we are still processing extents in pages. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250707140814.542883-10-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 daysext4: reserved credits for one extent during the folio writebackZhang Yi1-17/+8
commit bbbf150f3f85619569ac19dc6458cca7c492e715 upstream. After ext4 supports large folios, reserving journal credits for one maximum-ordered folio based on the worst case cenario during the writeback process can easily exceed the maximum transaction credits. Additionally, reserving journal credits for one page is also no longer appropriate. Currently, the folio writeback process can either extend the journal credits or initiate a new transaction if the currently reserved journal credits are insufficient. Therefore, it can be modified to reserve credits for only one extent at the outset. In most cases involving continuous mapping, these credits are generally adequate, and we may only need to perform some basic credit expansion. However, in extreme cases where the block size and folio size differ significantly, or when the folios are sufficiently discontinuous, it may be necessary to restart a new transaction and resubmit the folios. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250707140814.542883-9-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 daysext4: correct the reserved credits for extent conversionZhang Yi1-3/+3
commit 95ad8ee45cdbc321c135a2db895d48b374ef0f87 upstream. Now, we reserve journal credits for converting extents in only one page to written state when the I/O operation is complete. This is insufficient when large folio is enabled. Fix this by reserving credits for converting up to one extent per block in the largest 2MB folio, this calculation should only involve extents index and leaf blocks, so it should not estimate too many credits. Fixes: 7ac67301e82f ("ext4: enable large folio for regular file") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250707140814.542883-8-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 daysext4: enhance tracepoints during the folios writebackZhang Yi2-6/+41
commit 6b132759b0fe78e518abafb62190c294100db6d6 upstream. After mpage_map_and_submit_extent() supports restarting handle if credits are insufficient during allocating blocks, it is more likely to exit the current mapping iteration and continue to process the current processing partially mapped folio again. The existing tracepoints are not sufficient to track this situation, so enhance the tracepoints to track the writeback position and the return value before and after submitting the folios. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250707140814.542883-7-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 daysext4: restart handle if credits are insufficient during allocating blocksZhang Yi1-5/+36
commit e2c4c49dee64ca2f42ad2958cbe1805de96b6732 upstream. After large folios are supported on ext4, writing back a sufficiently large and discontinuous folio may consume a significant number of journal credits, placing considerable strain on the journal. For example, in a 20GB filesystem with 1K block size and 1MB journal size, writing back a 2MB folio could require thousands of credits in the worst-case scenario (when each block is discontinuous and distributed across different block groups), potentially exceeding the journal size. This issue can also occur in ext4_write_begin() and ext4_page_mkwrite() when delalloc is not enabled. Fix this by ensuring that there are sufficient journal credits before allocating an extent in mpage_map_one_extent() and ext4_block_write_begin(). If there are not enough credits, return -EAGAIN, exit the current mapping loop, restart a new handle and a new transaction, and allocating blocks on this folio again in the next iteration. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250707140814.542883-6-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 daysext4: refactor the block allocation process of ext4_page_mkwrite()Zhang Yi1-45/+50
commit 2bddafea3d0d85ee9ac3cf5ba9a4b2f2d2f50257 upstream. The block allocation process and error handling in ext4_page_mkwrite() is complex now. Refactor it by introducing a new helper function, ext4_block_page_mkwrite(). It will call ext4_block_write_begin() to allocate blocks instead of directly calling block_page_mkwrite(). Preparing to implement retry logic in a subsequent patch to address situations where the reserved journal credits are insufficient. Additionally, this modification will help prevent potential deadlocks that may occur when waiting for folio writeback while holding the transaction handle. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250707140814.542883-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 daysext4: fix stale data if it bail out of the extents mapping loopZhang Yi1-1/+50
commit ded2d726a3041fce8afd88005cbfe15cd4737702 upstream. During the process of writing back folios, if mpage_map_and_submit_extent() exits the extent mapping loop due to an ENOSPC or ENOMEM error, it may result in stale data or filesystem inconsistency in environments where the block size is smaller than the folio size. When mapping a discontinuous folio in mpage_map_and_submit_extent(), some buffers may have already be mapped. If we exit the mapping loop prematurely, the folio data within the mapped range will not be written back, and the file's disk size will not be updated. Once the transaction that includes this range of extents is committed, this can lead to stale data or filesystem inconsistency. Fix this by submitting the current processing partially mapped folio. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250707140814.542883-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 daysext4: move the calculation of wbc->nr_to_write to mpage_folio_done()Zhang Yi1-2/+1
commit f922c8c2461b022a2efd9914484901fb358a5b2a upstream. mpage_folio_done() should be a more appropriate place than mpage_submit_folio() for updating the wbc->nr_to_write after we have submitted a fully mapped folio. Preparing to make mpage_submit_folio() allows to submit partially mapped folio that is still under processing. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250707140814.542883-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 daysext4: process folios writeback in bytesZhang Yi2-41/+42
commit 1bfe6354e0975fe89c3d25e81b6546d205556a4b upstream. Since ext4 supports large folios, processing writebacks in pages is no longer appropriate, it can be modified to process writebacks in bytes. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250707140814.542883-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysMerge v6.16.2Greg Kroah-Hartman12673-248911/+576209
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysLinux 6.16.2v6.16.2Greg Kroah-Hartman1-1/+1
Link: https://lore.kernel.org/r/20250818124505.781598737@linuxfoundation.org Tested-by: Ronald Warsow <rwarsow@gmx.de> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Tested-by: Brett A C Sheffield <bacs@librecast.net> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Hardik Garg <hargar@linux.microsoft.com> Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Markus Reichelt <lkt+2023@mareichelt.com> Link: https://lore.kernel.org/r/20250819122844.483737955@linuxfoundation.org Tested-by: Ronald Warsow <rwarsow@gmx.de> Tested-by: Markus Reichelt <lkt+2023@mareichelt.com> Tested-by: Peter Schneider <pschneider1968@googlemail.com> Tested-by: Florian Fainelli <floria.fainelli@broadcom.com> Tested-by: Justin M. Forbes <jforbes@fedoraproject.org> Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com> Tested-by: Hardik Garg <hargar@linux.microsoft.com> Tested-by: Brett A C Sheffield <bacs@librecast.net> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Mark Brown <broonie@kernel.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysACPI: Return -ENODEV from acpi_parse_spcr() when SPCR support is disabledLi Chen1-1/+1
commit b9f58d3572a8e1ef707b941eae58ec4014b9269d upstream. If CONFIG_ACPI_SPCR_TABLE is disabled, acpi_parse_spcr() currently returns 0, which may incorrectly suggest that SPCR parsing was successful. This patch changes the behavior to return -ENODEV to clearly indicate that SPCR support is not available. This prepares the codebase for future changes that depend on acpi_parse_spcr() failure detection, such as suppressing misleading console messages. Signed-off-by: Li Chen <chenl311@chinatelecom.cn> Acked-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/20250620131309.126555-2-me@linux.beauty Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysio_uring/zcrx: don't leak pages on account failurePavel Begunkov1-4/+2
commit 6bbd3411ff87df1ca38ff32d36eb5dc673ca8021 upstream. Someone needs to release pinned pages in io_import_umem() if accounting fails. Assign them to the area but return an error, the following io_zcrx_free_area() will clean them up. Fixes: 262ab205180d2 ("io_uring/zcrx: account area memory") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e19f283a912f200c0d427e376cb789fc3f3d69bc.1753091564.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysio_uring/zcrx: fix null ifq on area destructionPavel Begunkov1-3/+2
commit 720df2310b89cf76c1dc1a05902536282506f8bf upstream. Dan reports that ifq can be null when infering arguments for io_unaccount_mem() from io_zcrx_free_area(). Fix it by always setting a correct ifq. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202507180628.gBxrOgqr-lkp@intel.com/ Fixes: 262ab205180d2 ("io_uring/zcrx: account area memory") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20670d163bb90dba2a81a4150f1125603cefb101.1753091564.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysdm: split write BIOs on zone boundaries when zone append is not emulatedShin'ichiro Kawasaki1-11/+7
commit 675f940576351bb049f5677615140b9d0a7712d0 upstream. Commit 2df7168717b7 ("dm: Always split write BIOs to zoned device limits") updates the device-mapper driver to perform splits for the write BIOs. However, it did not address the cases where DM targets do not emulate zone append, such as in the cases of dm-linear or dm-flakey. For these targets, when the write BIOs span across zone boundaries, they trigger WARN_ON_ONCE(bio_straddles_zones(bio)) in blk_zone_wplug_handle_write(). This results in I/O errors. The errors are reproduced by running blktests test case zbd/004 using zoned dm-linear or dm-flakey devices. To avoid the I/O errors, handle the write BIOs regardless whether DM targets emulate zone append or not, so that all write BIOs are split at zone boundaries. For that purpose, drop the check for zone append emulation in dm_zone_bio_needs_split(). Its argument 'md' is no longer used then drop it also. Fixes: 2df7168717b7 ("dm: Always split write BIOs to zoned device limits") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Link: https://lore.kernel.org/r/20250717103539.37279-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysirqchip/mvebu-gicp: Use resource_size() for ioremap()Thomas Gleixner1-1/+1
commit 9f7488f24c7571d349d938061e0ede7a39b65d6b upstream. 0-day reported an off by one in the ioremap() sizing: drivers/irqchip/irq-mvebu-gicp.c:240:45-48: WARNING: Suspicious code. resource_size is maybe missing with gicp -> res Convert it to resource_size(), which does the right thing. Fixes: 3c3d7dbab2c7 ("irqchip/mvebu-gicp: Clear pending interrupts on init") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Closes: https://lore.kernel.org/oe-kbuild-all/202508062150.mtFQMTXc-lkp@intel.com/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysrcu: Fix racy re-initialization of irq_work causing hangsFrederic Weisbecker3-2/+9
commit 61399e0c5410567ef60cb1cda34cca42903842e3 upstream. RCU re-initializes the deferred QS irq work everytime before attempting to queue it. However there are situations where the irq work is attempted to be queued even though it is already queued. In that case re-initializing messes-up with the irq work queue that is about to be handled. The chances for that to happen are higher when the architecture doesn't support self-IPIs and irq work are then all lazy, such as with the following sequence: 1) rcu_read_unlock() is called when IRQs are disabled and there is a grace period involving blocked tasks on the node. The irq work is then initialized and queued. 2) The related tasks are unblocked and the CPU quiescent state is reported. rdp->defer_qs_iw_pending is reset to DEFER_QS_IDLE, allowing the irq work to be requeued in the future (note the previous one hasn't fired yet). 3) A new grace period starts and the node has blocked tasks. 4) rcu_read_unlock() is called when IRQs are disabled again. The irq work is re-initialized (but it's queued! and its node is cleared) and requeued. Which means it's requeued to itself. 5) The irq work finally fires with the tick. But since it was requeued to itself, it loops and hangs. Fix this with initializing the irq work only once before the CPU boots. Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysdrm/amd/display: Allow DCN301 to clear update flagsIvan Lipski1-1/+2
commit 2d418e4fd9f1eca7dfce80de86dd702d36a06a25 upstream. [Why & How] Not letting DCN301 to clear after surface/stream update results in artifacts when switching between active overlay planes. The issue is known and has been solved initially. See below: (https://gitlab.freedesktop.org/drm/amd/-/issues/3441) Fixes: f354556e29f4 ("drm/amd/display: limit clear_update_flags t dcn32 and above") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysfirmware: arm_scmi: Convert to SYSTEM_SLEEP_PM_OPSArnd Bergmann1-2/+2
commit 62d6b81e8bd207ad44eff39d1a0fe17f0df510a5 upstream. The old SET_SYSTEM_SLEEP_PM_OPS() macro leads to a warning about an unused function: | drivers/firmware/arm_scmi/scmi_power_control.c:363:12: error: | 'scmi_system_power_resume' defined but not used [-Werror=unused-function] | static int scmi_system_power_resume(struct device *dev) The proper way to do this these days is to use SYSTEM_SLEEP_PM_OPS() and pm_sleep_ptr(). Fixes: 9a0658d3991e ("firmware: arm_scmi: power_control: Ensure SCMI_SYSPOWER_IDLE is set early during resume") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peng Fan <peng.fan@nxp.com> Message-Id: <20250709070107.1388512-1-arnd@kernel.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysio_uring/rw: cast rw->flags assignment to rwf_tJens Axboe1-1/+1
commit 825aea662b492571877b32aeeae13689fd9fbee4 upstream. kernel test robot reports that a recent change of the sqe->rw_flags field throws a sparse warning on 32-bit archs: >> io_uring/rw.c:291:19: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __kernel_rwf_t [usertype] flags @@ got unsigned int @@ io_uring/rw.c:291:19: sparse: expected restricted __kernel_rwf_t [usertype] flags io_uring/rw.c:291:19: sparse: got unsigned int Force cast it to rwf_t to silence that new sparse warning. Fixes: cf73d9970ea4 ("io_uring: don't use int for ABI") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507032211.PwSNPNSP-lkp@intel.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysata: libata-sata: Add link_power_management_supported sysfs attributeDamien Le Moal4-12/+44
commit 0060beec0bfa647c4b510df188b1c4673a197839 upstream. A port link power management (LPM) policy can be controlled using the link_power_management_policy sysfs host attribute. However, this attribute exists also for hosts that do not support LPM and in such case, attempting to change the LPM policy for the host (port) will fail with -EOPNOTSUPP. Introduce the new sysfs link_power_management_supported host attribute to indicate to the user if a the port and the devices connected to the port for the host support LPM, which implies that the link_power_management_policy attribute can be used. Since checking that a port and its devices support LPM is common between the new ata_scsi_lpm_supported_show() function and the existing ata_scsi_lpm_store() function, the new helper ata_scsi_lpm_supported() is introduced. Fixes: 413e800cadbf ("ata: libata-sata: Disallow changing LPM state if not supported") Reported-by: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202507251014.a5becc3b-lkp@intel.com Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysKVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guestMaxim Levitsky6-3/+41
[ Upstream commit 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a ] Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting while running the guest. When running with the "default treatment of SMIs" in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that is visible to host (non-SMM) software, and instead transitions directly from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched by hardware on SMI or RSM, i.e. SMM will run with whatever value was resident in hardware at the time of the SMI. Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting events while the CPU is executing in SMM, which can pollute profiling and potentially leak information into the guest. Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner run loop, as the bit can be toggled in IRQ context via IPI callback (SMP function call), by way of /sys/devices/cpu/freeze_on_smi. Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs, i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and at worst could lead to undesirable behavior in the future if AMD CPUs ever happened to pick up a collision with the bit. Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module owns and controls GUEST_IA32_DEBUGCTL. WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the lack of handling isn't a KVM bug (TDX already WARNs on any run_flag). Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state(). Doing so avoids the need to track host_debugctl on a per-VMCS basis, as GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't have actually entered the guest, vcpu_enter_guest() will have run with vmcs02 active and thus could result in vmcs01 being run with a stale value. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysKVM: VMX: Wrap all accesses to IA32_DEBUGCTL with getter/setter APIsMaxim Levitsky4-12/+24
[ Upstream commit 7d0cce6cbe71af6e9c1831bff101a2b9c249c4a2 ] Introduce vmx_guest_debugctl_{read,write}() to handle all accesses to vmcs.GUEST_IA32_DEBUGCTL. This will allow stuffing FREEZE_IN_SMM into GUEST_IA32_DEBUGCTL based on the host setting without bleeding the state into the guest, and without needing to copy+paste the FREEZE_IN_SMM logic into every patch that accesses GUEST_IA32_DEBUGCTL. No functional change intended. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> [sean: massage changelog, make inline, use in all prepare_vmcs02() cases] Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250610232010.162191-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: 6b1dd26544d0 ("KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysKVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-EnterMaxim Levitsky3-5/+15
[ Upstream commit 095686e6fcb4150f0a55b1a25987fad3d8af58d6 ] Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports a subset of hardware functionality, i.e. KVM can't rely on hardware to detect illegal/unsupported values. Failure to check the vmcs12 value would allow the guest to load any harware-supported value while running L2. Take care to exempt BTF and LBR from the validity check in order to match KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR are being intercepted. Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set *and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but that would incur non-trivial complexity and wouldn't change the fact that KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity is not worth carrying. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: 6b1dd26544d0 ("KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysKVM: VMX: Extract checking of guest's DEBUGCTL into helperSean Christopherson1-12/+17
[ Upstream commit 8a4351ac302cd8c19729ba2636acfd0467c22ae8 ] Move VMX's logic to check DEBUGCTL values into a standalone helper so that the code can be used by nested VM-Enter to apply the same logic to the value being loaded from vmcs12. KVM needs to explicitly check vmcs12->guest_ia32_debugctl on nested VM-Enter, as hardware may support features that KVM does not, i.e. relying on hardware to detect invalid guest state will result in false negatives. Unfortunately, that means applying KVM's funky suppression of BTF and LBR to vmcs12 so as not to break existing guests. No functional change intended. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250610232010.162191-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: 6b1dd26544d0 ("KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysRDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpagesPedro Falcato1-3/+2
commit c18646248fed07683d4cee8a8af933fc4fe83c0d upstream. Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"), we have been doing this: static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset, size_t size) [...] /* Calculate the number of bytes we need to push, for this page * specifically */ size_t bytes = min_t(size_t, PAGE_SIZE - offset, size); /* If we can't splice it, then copy it in, as normal */ if (!sendpage_ok(page[i])) msg.msg_flags &= ~MSG_SPLICE_PAGES; /* Set the bvec pointing to the page, with len $bytes */ bvec_set_page(&bvec, page[i], bytes, offset); /* Set the iter to $size, aka the size of the whole sendpages (!!!) */ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); try_page_again: lock_sock(sk); /* Sendmsg with $size size (!!!) */ rv = tcp_sendmsg_locked(sk, &msg, size); This means we've been sending oversized iov_iters and tcp_sendmsg calls for a while. This has a been a benign bug because sendpage_ok() always returned true. With the recent slab allocator changes being slowly introduced into next (that disallow sendpage on large kmalloc allocations), we have recently hit out-of-bounds crashes, due to slight differences in iov_iter behavior between the MSG_SPLICE_PAGES and "regular" copy paths: (MSG_SPLICE_PAGES) skb_splice_from_iter iov_iter_extract_pages iov_iter_extract_bvec_pages uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere skb_splice_from_iter gets a "short" read (!MSG_SPLICE_PAGES) skb_copy_to_page_nocache copy=iov_iter_count [...] copy_from_iter /* this doesn't help */ if (unlikely(iter->count < len)) len = iter->count; iterate_bvec ... and we run off the bvecs Fix this by properly setting the iov_iter's byte count, plus sending the correct byte count to tcp_sendmsg_locked. Link: https://patch.msgid.link/r/20250729120348.495568-1-pfalcato@suse.de Cc: stable@vger.kernel.org Fixes: c2ff29e99a76 ("siw: Inline do_tcp_sendpages()") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Bernard Metzler <bernard.metzler@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daystools/nolibc: fix spelling of FD_SETBITMASK in FD_* macrosWilly Tarreau1-2/+2
commit a477629baa2a0e9991f640af418e8c973a1c08e3 upstream. While nolibc-test does test syscalls, it doesn't test as much the rest of the macros, and a wrong spelling of FD_SETBITMASK in commit feaf75658783a broke programs using either FD_SET() or FD_CLR() without being noticed. Let's fix these macros. Fixes: feaf75658783a ("nolibc: fix fd_set type") Cc: stable@vger.kernel.org # v6.2+ Acked-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daystracing: fprobe: Fix infinite recursion using preempt_*_notrace()Masami Hiramatsu (Google)1-2/+2
commit a3e892ab0fc287389176eabdcd74234508f6e52d upstream. Since preempt_count_add/del() are tracable functions, it is not allowed to use preempt_disable/enable() in ftrace handlers. Without this fix, probing on `preempt_count_add%return` will cause an infinite recursion of fprobes. To fix this problem, use preempt_disable/enable_notrace() in fprobe_return(). Link: https://lore.kernel.org/all/175374642359.1471729.1054175011228386560.stgit@mhiramat.tok.corp.google.com/ Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysmedia: i2c: vd55g1: Fix return code in vd55g1_enable_streams error pathBenjamin Mugnier1-1/+1
commit 5931eed35cb632ff8b7e0b0cc91abc6014c64045 upstream. Enable stream was returning success even if an error occurred, fix it by modifying the err_rpm_put return value to -EINVAL. Signed-off-by: Benjamin Mugnier <benjamin.mugnier@foss.st.com> Fixes: e56616d7b23c ("media: i2c: Add driver for ST VD55G1 camera sensor") Cc: stable@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysmedia: uvcvideo: Turn on the camera if V4L2_EVENT_SUB_FL_SEND_INITIALRicardo Ribalda1-1/+9
commit a03e32e60141058d46ea8cf4631654c43c740fdb upstream. If we subscribe to an event with V4L2_EVENT_SUB_FL_SEND_INITIAL, the driver needs to report back some values that require the camera to be powered on. But VIDIOC_SUBSCRIBE_EVENT is not part of the ioctls that turn on the camera. We could unconditionally turn on the camera during VIDIOC_SUBSCRIBE_EVENT, but it is more efficient to turn it on only during V4L2_EVENT_SUB_FL_SEND_INITIAL, which we believe is not a common usecase. To avoid a list_del if uvc_pm_get() fails, we move list_add_tail to the end of the function. Reviewed-by: Hans de Goede <hansg@kernel.org> Fixes: d1b618e79548 ("media: uvcvideo: Do not turn on the camera for some ioctls") Cc: stable@vger.kernel.org Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Link: https://lore.kernel.org/r/20250701-uvc-grannular-invert-v4-5-8003b9b89f68@chromium.org Signed-off-by: Hans de Goede <hansg@kernel.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysmedia: v4l2: Add support for NV12M tiled variants to v4l2_format_info()Marek Szyprowski1-0/+6
commit f7546da1d6eb8928efb89b7faacbd6c2f8f0de5c upstream. Commit 6f1466123d73 ("media: s5p-mfc: Add YV12 and I420 multiplanar format support") added support for the new formats to s5p-mfc driver, what in turn required some internal calls to the v4l2_format_info() function while setting up formats. This in turn broke support for the "old" tiled NV12MT* formats, which are not recognized by this function. Fix this by adding those variants of NV12M pixel format to v4l2_format_info() function database. Fixes: 6f1466123d73 ("media: s5p-mfc: Add YV12 and I420 multiplanar format support") Cc: stable@vger.kernel.org Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysmedia: uvcvideo: Do not mark valid metadata as invalidRicardo Ribalda1-6/+6
commit bda2859bff0b9596a19648f3740c697ce4c71496 upstream. Currently, the driver performs a length check of the metadata buffer before the actual metadata size is known and before the metadata is decided to be copied. This results in valid metadata buffers being incorrectly marked as invalid. Move the length check to occur after the metadata size is determined and is decided to be copied. Cc: stable@vger.kernel.org Fixes: 088ead255245 ("media: uvcvideo: Add a metadata device node") Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Hans de Goede <hansg@kernel.org> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Link: https://lore.kernel.org/r/20250707-uvc-meta-v8-1-ed17f8b1218b@chromium.org Signed-off-by: Hans de Goede <hansg@kernel.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysmedia: venus: Fix OOB read due to missing payload bound checkVedang Nagar1-25/+58
commit 06d6770ff0d8cc8dfd392329a8cc03e2a83e7289 upstream. Currently, The event_seq_changed() handler processes a variable number of properties sent by the firmware. The number of properties is indicated by the firmware and used to iterate over the payload. However, the payload size is not being validated against the actual message length. This can lead to out-of-bounds memory access if the firmware provides a property count that exceeds the data available in the payload. Such a condition can result in kernel crashes or potential information leaks if memory beyond the buffer is accessed. Fix this by properly validating the remaining size of the payload before each property access and updating bounds accordingly as properties are parsed. This ensures that property parsing is safely bounded within the received message buffer and protects against malformed or malicious firmware behavior. Fixes: 09c2845e8fe4 ("[media] media: venus: hfi: add Host Firmware Interface (HFI)") Cc: stable@vger.kernel.org Signed-off-by: Vedang Nagar <quic_vnagar@quicinc.com> Reviewed-by: Vikash Garodia <quic_vgarodia@quicinc.com> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Co-developed-by: Dikshita Agarwal <quic_dikshita@quicinc.com> Signed-off-by: Dikshita Agarwal <quic_dikshita@quicinc.com> Signed-off-by: Bryan O'Donoghue <bod@kernel.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysmedia: uvcvideo: Fix 1-byte out-of-bounds read in uvc_parse_format()Youngjun Lee1-0/+3
commit 782b6a718651eda3478b1824b37a8b3185d2740c upstream. The buffer length check before calling uvc_parse_format() only ensured that the buffer has at least 3 bytes (buflen > 2), buf the function accesses buffer[3], requiring at least 4 bytes. This can lead to an out-of-bounds read if the buffer has exactly 3 bytes. Fix it by checking that the buffer has at least 4 bytes in uvc_parse_format(). Signed-off-by: Youngjun Lee <yjjuny.lee@samsung.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Fixes: c0efd232929c ("V4L/DVB (8145a): USB Video Class driver") Cc: stable@vger.kernel.org Reviewed-by: Ricardo Ribalda <ribalda@chromium.org> Link: https://lore.kernel.org/r/20250610124107.37360-1-yjjuny.lee@samsung.com Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysmm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lockBreno Leitao1-1/+4
commit 47b0f6d8f0d2be4d311a49e13d2fd5f152f492b2 upstream. When netpoll is enabled, calling pr_warn_once() while holding kmemleak_lock in mem_pool_alloc() can cause a deadlock due to lock inversion with the netconsole subsystem. This occurs because pr_warn_once() may trigger netpoll, which eventually leads to __alloc_skb() and back into kmemleak code, attempting to reacquire kmemleak_lock. This is the path for the deadlock. mem_pool_alloc() -> raw_spin_lock_irqsave(&kmemleak_lock, flags); -> pr_warn_once() -> netconsole subsystem -> netpoll -> __alloc_skb -> __create_object -> raw_spin_lock_irqsave(&kmemleak_lock, flags); Fix this by setting a flag and issuing the pr_warn_once() after kmemleak_lock is released. Link: https://lkml.kernel.org/r/20250731-kmemleak_lock-v1-1-728fd470198f@debian.org Fixes: c5665868183f ("mm: kmemleak: use the memory pool for early allocations") Signed-off-by: Breno Leitao <leitao@debian.org> Reported-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysmm/kmemleak: avoid soft lockup in __kmemleak_do_cleanup()Waiman Long1-0/+5
commit d1534ae23c2b6be350c8ab060803fbf6e9682adc upstream. A soft lockup warning was observed on a relative small system x86-64 system with 16 GB of memory when running a debug kernel with kmemleak enabled. watchdog: BUG: soft lockup - CPU#8 stuck for 33s! [kworker/8:1:134] The test system was running a workload with hot unplug happening in parallel. Then kemleak decided to disable itself due to its inability to allocate more kmemleak objects. The debug kernel has its CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE set to 40,000. The soft lockup happened in kmemleak_do_cleanup() when the existing kmemleak objects were being removed and deleted one-by-one in a loop via a workqueue. In this particular case, there are at least 40,000 objects that need to be processed and given the slowness of a debug kernel and the fact that a raw_spinlock has to be acquired and released in __delete_object(), it could take a while to properly handle all these objects. As kmemleak has been disabled in this case, the object removal and deletion process can be further optimized as locking isn't really needed. However, it is probably not worth the effort to optimize for such an edge case that should rarely happen. So the simple solution is to call cond_resched() at periodic interval in the iteration loop to avoid soft lockup. Link: https://lkml.kernel.org/r/20250728190248.605750-1-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysmm/shmem, swap: improve cached mTHP handling and fix potential hangKairui Song1-9/+30
commit 5c241ed8d031693dadf33dd98ed2e7cc363e9b66 upstream. The current swap-in code assumes that, when a swap entry in shmem mapping is order 0, its cached folios (if present) must be order 0 too, which turns out not always correct. The problem is shmem_split_large_entry is called before verifying the folio will eventually be swapped in, one possible race is: CPU1 CPU2 shmem_swapin_folio /* swap in of order > 0 swap entry S1 */ folio = swap_cache_get_folio /* folio = NULL */ order = xa_get_order /* order > 0 */ folio = shmem_swap_alloc_folio /* mTHP alloc failure, folio = NULL */ <... Interrupted ...> shmem_swapin_folio /* S1 is swapped in */ shmem_writeout /* S1 is swapped out, folio cached */ shmem_split_large_entry(..., S1) /* S1 is split, but the folio covering it has order > 0 now */ Now any following swapin of S1 will hang: `xa_get_order` returns 0, and folio lookup will return a folio with order > 0. The `xa_get_order(&mapping->i_pages, index) != folio_order(folio)` will always return false causing swap-in to return -EEXIST. And this looks fragile. So fix this up by allowing seeing a larger folio in swap cache, and check the whole shmem mapping range covered by the swapin have the right swap value upon inserting the folio. And drop the redundant tree walks before the insertion. This will actually improve performance, as it avoids two redundant Xarray tree walks in the hot path, and the only side effect is that in the failure path, shmem may redundantly reallocate a few folios causing temporary slight memory pressure. And worth noting, it may seems the order and value check before inserting might help reducing the lock contention, which is not true. The swap cache layer ensures raced swapin will either see a swap cache folio or failed to do a swapin (we have SWAP_HAS_CACHE bit even if swap cache is bypassed), so holding the folio lock and checking the folio flag is already good enough for avoiding the lock contention. The chance that a folio passes the swap entry value check but the shmem mapping slot has changed should be very low. Link: https://lkml.kernel.org/r/20250728075306.12704-1-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20250728075306.12704-2-ryncsn@gmail.com Fixes: 809bc86517cc ("mm: shmem: support large folio swap out") Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Dev Jain <dev.jain@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysmm/ptdump: take the memory hotplug lock inside ptdump_walk_pgd()Anshuman Khandual4-8/+2
commit 59305202c67fea50378dcad0cc199dbc13a0e99a upstream. Memory hot remove unmaps and tears down various kernel page table regions as required. The ptdump code can race with concurrent modifications of the kernel page tables. When leaf entries are modified concurrently, the dump code may log stale or inconsistent information for a VA range, but this is otherwise not harmful. But when intermediate levels of kernel page table are freed, the dump code will continue to use memory that has been freed and potentially reallocated for another purpose. In such cases, the ptdump code may dereference bogus addresses, leading to a number of potential problems. To avoid the above mentioned race condition, platforms such as arm64, riscv and s390 take memory hotplug lock, while dumping kernel page table via the sysfs interface /sys/kernel/debug/kernel_page_tables. Similar race condition exists while checking for pages that might have been marked W+X via /sys/kernel/debug/kernel_page_tables/check_wx_pages which in turn calls ptdump_check_wx(). Instead of solving this race condition again, let's just move the memory hotplug lock inside generic ptdump_check_wx() which will benefit both the scenarios. Drop get_online_mems() and put_online_mems() combination from all existing platform ptdump code paths. Link: https://lkml.kernel.org/r/20250620052427.2092093-1-anshuman.khandual@arm.com Fixes: bbd6ec605c0f ("arm64/mm: Enable memory hot remove") Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysmm/huge_memory: don't ignore queried cachemode in vmf_insert_pfn_pud()David Hildenbrand1-4/+3
commit 09fefdca80aebd1023e827cb0ee174983d829d18 upstream. Patch series "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes", v3. While working on improving vm_normal_page() and friends, I stumbled over this issues: refcounted "normal" folios must not be marked using pmd_special() / pud_special(). Otherwise, we're effectively telling the system that these folios are no "normal", violating the rules we documented for vm_normal_page(). Fortunately, there are not many pmd_special()/pud_special() users yet. So far there doesn't seem to be serious damage. Tested using the ndctl tests ("ndctl:dax" suite). This patch (of 3): We set up the cache mode but ... don't forward the updated pgprot to insert_pfn_pud(). Only a problem on x86-64 PAT when mapping PFNs using PUDs that require a special cachemode. Fix it by using the proper pgprot where the cachemode was setup. It is unclear in which configurations we would get the cachemode wrong: through vfio seems possible. Getting cachemodes wrong is usually ... bad. As the fix is easy, let's backport it to stable. Identified by code inspection. Link: https://lkml.kernel.org/r/20250613092702.1943533-1-david@redhat.com Link: https://lkml.kernel.org/r/20250613092702.1943533-2-david@redhat.com Fixes: 7b806d229ef1 ("mm: remove vmf_insert_pfn_xxx_prot() for huge page-table entries") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Tested-by: Dan Williams <dan.j.williams@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysmm, slab: restore NUMA policy support for large kmallocVlastimil Babka1-1/+6
commit e2d18cbf178775ad377ad88ee55e6e183c38d262 upstream. The slab allocator observes the task's NUMA policy in various places such as allocating slab pages. Large kmalloc() allocations used to do that too, until an unintended change by c4cab557521a ("mm/slab_common: cleanup kmalloc_large()") resulted in ignoring mempolicy and just preferring the local node. Restore the NUMA policy support. Fixes: c4cab557521a ("mm/slab_common: cleanup kmalloc_large()") Cc: <stable@vger.kernel.org> Acked-by: Christoph Lameter (Ampere) <cl@gentwo.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysparisc: Makefile: fix a typo in palo.confRandy Dunlap1-1/+1
commit 963f1b20a8d2a098954606b9725cd54336a2a86c upstream. Correct "objree" to "objtree". "objree" is not defined. Fixes: 75dd47472b92 ("kbuild: remove src and obj from the top Makefile") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: linux-parisc@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysi2c: core: Fix double-free of fwnode in i2c_unregister_device()Hans de Goede1-1/+7
commit 1c24e5fc0c7096e00c202a6a3e0c342c1afb47c2 upstream. Before commit df6d7277e552 ("i2c: core: Do not dereference fwnode in struct device"), i2c_unregister_device() only called fwnode_handle_put() on of_node-s in the form of calling of_node_put(client->dev.of_node). But after this commit the i2c_client's fwnode now unconditionally gets fwnode_handle_put() on it. When the i2c_client has no primary (ACPI / OF) fwnode but it does have a software fwnode, the software-node will be the primary node and fwnode_handle_put() will put() it. But for the software fwnode device_remove_software_node() will also put() it leading to a double free: [ 82.665598] ------------[ cut here ]------------ [ 82.665609] refcount_t: underflow; use-after-free. [ 82.665808] WARNING: CPU: 3 PID: 1502 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x11 ... [ 82.666830] RIP: 0010:refcount_warn_saturate+0xba/0x110 ... [ 82.666962] <TASK> [ 82.666971] i2c_unregister_device+0x60/0x90 Fix this by not calling fwnode_handle_put() when the primary fwnode is a software-node. Fixes: df6d7277e552 ("i2c: core: Do not dereference fwnode in struct device") Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede <hansg@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 dayshv_netvsc: Fix panic during namespace deletion with VFHaiyang Zhang2-1/+31
commit 33caa208dba6fa639e8a92fd0c8320b652e5550c upstream. The existing code move the VF NIC to new namespace when NETDEV_REGISTER is received on netvsc NIC. During deletion of the namespace, default_device_exit_batch() >> default_device_exit_net() is called. When netvsc NIC is moved back and registered to the default namespace, it automatically brings VF NIC back to the default namespace. This will cause the default_device_exit_net() >> for_each_netdev_safe loop unable to detect the list end, and hit NULL ptr: [ 231.449420] mana 7870:00:00.0 enP30832s1: Moved VF to namespace with: eth0 [ 231.449656] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 231.450246] #PF: supervisor read access in kernel mode [ 231.450579] #PF: error_code(0x0000) - not-present page [ 231.450916] PGD 17b8a8067 P4D 0 [ 231.451163] Oops: Oops: 0000 [#1] SMP NOPTI [ 231.451450] CPU: 82 UID: 0 PID: 1394 Comm: kworker/u768:1 Not tainted 6.16.0-rc4+ #3 VOLUNTARY [ 231.452042] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024 [ 231.452692] Workqueue: netns cleanup_net [ 231.452947] RIP: 0010:default_device_exit_batch+0x16c/0x3f0 [ 231.453326] Code: c0 0c f5 b3 e8 d5 db fe ff 48 85 c0 74 15 48 c7 c2 f8 fd ca b2 be 10 00 00 00 48 8d 7d c0 e8 7b 77 25 00 49 8b 86 28 01 00 00 <48> 8b 50 10 4c 8b 2a 4c 8d 62 f0 49 83 ed 10 4c 39 e0 0f 84 d6 00 [ 231.454294] RSP: 0018:ff75fc7c9bf9fd00 EFLAGS: 00010246 [ 231.454610] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 61c8864680b583eb [ 231.455094] RDX: ff1fa9f71462d800 RSI: ff75fc7c9bf9fd38 RDI: 0000000030766564 [ 231.455686] RBP: ff75fc7c9bf9fd78 R08: 0000000000000000 R09: 0000000000000000 [ 231.456126] R10: 0000000000000001 R11: 0000000000000004 R12: ff1fa9f70088e340 [ 231.456621] R13: ff1fa9f70088e340 R14: ffffffffb3f50c20 R15: ff1fa9f7103e6340 [ 231.457161] FS: 0000000000000000(0000) GS:ff1faa6783a08000(0000) knlGS:0000000000000000 [ 231.457707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 231.458031] CR2: 0000000000000010 CR3: 0000000179ab2006 CR4: 0000000000b73ef0 [ 231.458434] Call Trace: [ 231.458600] <TASK> [ 231.458777] ops_undo_list+0x100/0x220 [ 231.459015] cleanup_net+0x1b8/0x300 [ 231.459285] process_one_work+0x184/0x340 To fix it, move the ns change to a workqueue, and take rtnl_lock to avoid changing the netdev list when default_device_exit_net() is using it. Cc: stable@vger.kernel.org Fixes: 4c262801ea60 ("hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/1754511711-11188-1-git-send-email-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysnet/sched: ets: use old 'nbands' while purging unused classesDavide Caratti1-5/+6
commit 87c6efc5ce9c126ae4a781bc04504b83780e3650 upstream. Shuang reported sch_ets test-case [1] crashing in ets_class_qlen_notify() after recent changes from Lion [2]. The problem is: in ets_qdisc_change() we purge unused DWRR queues; the value of 'q->nbands' is the new one, and the cleanup should be done with the old one. The problem is here since my first attempts to fix ets_qdisc_change(), but it surfaced again after the recent qdisc len accounting fixes. Fix it purging idle DWRR queues before assigning a new value of 'q->nbands', so that all purge operations find a consistent configuration: - old 'q->nbands' because it's needed by ets_class_find() - old 'q->nstrict' because it's needed by ets_class_is_strict() BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 62 UID: 0 PID: 39457 Comm: tc Kdump: loaded Not tainted 6.12.0-116.el10.x86_64 #1 PREEMPT(voluntary) Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS 2.12.2 07/09/2021 RIP: 0010:__list_del_entry_valid_or_report+0x4/0x80 Code: ff 4c 39 c7 0f 84 39 19 8e ff b8 01 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b 17 48 8b 4f 08 48 85 d2 0f 84 56 19 8e ff 48 85 c9 0f 84 ab RSP: 0018:ffffba186009f400 EFLAGS: 00010202 RAX: 00000000000000d6 RBX: 0000000000000000 RCX: 0000000000000004 RDX: ffff9f0fa29b69c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffffffc12c2400 R08: 0000000000000008 R09: 0000000000000004 R10: ffffffffffffffff R11: 0000000000000004 R12: 0000000000000000 R13: ffff9f0f8cfe0000 R14: 0000000000100005 R15: 0000000000000000 FS: 00007f2154f37480(0000) GS:ffff9f269c1c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001530be001 CR4: 00000000007726f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ets_class_qlen_notify+0x65/0x90 [sch_ets] qdisc_tree_reduce_backlog+0x74/0x110 ets_qdisc_change+0x630/0xa40 [sch_ets] __tc_modify_qdisc.constprop.0+0x216/0x7f0 tc_modify_qdisc+0x7c/0x120 rtnetlink_rcv_msg+0x145/0x3f0 netlink_rcv_skb+0x53/0x100 netlink_unicast+0x245/0x390 netlink_sendmsg+0x21b/0x470 ____sys_sendmsg+0x39d/0x3d0 ___sys_sendmsg+0x9a/0xe0 __sys_sendmsg+0x7a/0xd0 do_syscall_64+0x7d/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f2155114084 Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d 25 f0 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89 RSP: 002b:00007fff1fd7a988 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000560ec063e5e0 RCX: 00007f2155114084 RDX: 0000000000000000 RSI: 00007fff1fd7a9f0 RDI: 0000000000000003 RBP: 00007fff1fd7aa60 R08: 0000000000000010 R09: 000000000000003f R10: 0000560ee9b3a010 R11: 0000000000000202 R12: 00007fff1fd7aae0 R13: 000000006891ccde R14: 0000560ec063e5e0 R15: 00007fff1fd7aad0 </TASK> [1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/ [2] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/ Cc: stable@vger.kernel.org Fixes: 103406b38c60 ("net/sched: Always pass notifications when child class becomes empty") Fixes: c062f2a0b04d ("net/sched: sch_ets: don't remove idle classes from the round-robin list") Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc") Reported-by: Li Shuang <shuali@redhat.com> Closes: https://issues.redhat.com/browse/RHEL-108026 Reviewed-by: Petr Machata <petrm@nvidia.com> Co-developed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/7928ff6d17db47a2ae7cc205c44777b1f1950545.1755016081.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysocfs2: reset folio to NULL when get folio failsLizhi Xu1-0/+1
commit 2ae826799932ff89409f56636ad3c25578fe7cf5 upstream. The reproducer uses FAULT_INJECTION to make memory allocation fail, which causes __filemap_get_folio() to fail, when initializing w_folios[i] in ocfs2_grab_folios_for_write(), it only returns an error code and the value of w_folios[i] is the error code, which causes ocfs2_unlock_and_free_folios() to recycle the invalid w_folios[i] when releasing folios. Link: https://lkml.kernel.org/r/20250616013140.3602219-1-lizhi.xu@windriver.com Reported-by: syzbot+c2ea94ae47cd7e3881ec@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c2ea94ae47cd7e3881ec Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysfbdev: nvidiafb: add depends on HAS_IOPORTRandy Dunlap1-1/+1
commit ecdd7df997fd992f0ec70b788e3b12258008a2bf upstream. The nvidiafb driver uses inb()/outb() without depending on HAS_IOPORT, which leads to build errors since kernel v6.13-rc1: commit 6f043e757445 ("asm-generic/io.h: Remove I/O port accessors for HAS_IOPORT=n") Add the HAS_IOPORT dependency to prevent the build errors. (Found in ARCH=um allmodconfig builds) drivers/video/fbdev/nvidia/nv_accel.c: In function ‘NVDmaWait’: include/asm-generic/io.h:596:15: error: call to ‘_outb’ declared with attribute error: outb() requires CONFIG_HAS_IOPORT 596 | #define _outb _outb Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v6.13+ Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysfbdev: Fix vmalloc out-of-bounds write in fast_imageblitSravan Kumar Gundu1-4/+5
commit af0db3c1f898144846d4c172531a199bb3ca375d upstream. This issue triggers when a userspace program does an ioctl FBIOPUT_CON2FBMAP by passing console number and frame buffer number. Ideally this maps console to frame buffer and updates the screen if console is visible. As part of mapping it has to do resize of console according to frame buffer info. if this resize fails and returns from vc_do_resize() and continues further. At this point console and new frame buffer are mapped and sets display vars. Despite failure still it continue to proceed updating the screen at later stages where vc_data is related to previous frame buffer and frame buffer info and display vars are mapped to new frame buffer and eventully leading to out-of-bounds write in fast_imageblit(). This bheviour is excepted only when fg_console is equal to requested console which is a visible console and updates screen with invalid struct references in fbcon_putcs(). Reported-and-tested-by: syzbot+c4b7aa0513823e2ea880@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c4b7aa0513823e2ea880 Signed-off-by: Sravan Kumar Gundu <sravankumarlpu@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysuserfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entrySuren Baghdasaryan1-7/+10
commit aba6faec0103ed8f169be8dce2ead41fcb689446 upstream. When UFFDIO_MOVE encounters a migration PMD entry, it proceeds with obtaining a folio and accessing it even though the entry is swp_entry_t. Add the missing check and let split_huge_pmd() handle migration entries. While at it also remove unnecessary folio check. [surenb@google.com: remove extra folio check, per David] Link: https://lkml.kernel.org/r/20250807200418.1963585-1-surenb@google.com Link: https://lkml.kernel.org/r/20250806220022.926763-1-surenb@google.com Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: syzbot+b446dbe27035ef6bd6c2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68794b5c.a70a0220.693ce.0050.GAE@google.com/ Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>