kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2 days	gpio: pca953x: fix wrong error probe return value	Sascha Hauer	1	-1/+1
	commit 0a1db19f66c0960eb00e1f2ccd40708b6747f5b1 upstream. The second argument to dev_err_probe() is the error value. Pass the return value of devm_request_threaded_irq() there instead of the irq number. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Fixes: c47f7ff0fe61 ("gpio: pca953x: Utilise dev_err_probe() where it makes sense") Link: https://lore.kernel.org/r/20250616134503.1201138-1-s.hauer@pengutronix.de Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 days	scsi: sg: Fix occasional bogus elapsed time that exceeds timeout	Michal Rábek	1	-7/+13
	[ Upstream commit 0e1677654259a2f3ccf728de1edde922a3c4ba57 ] A race condition was found in sg_proc_debug_helper(). It was observed on a system using an IBM LTO-9 SAS Tape Drive (ULTRIUM-TD9) and monitoring /proc/scsi/sg/debug every second. A very large elapsed time would sometimes appear. This is caused by two race conditions. We reproduced the issue with an IBM ULTRIUM-HH9 tape drive on an x86_64 architecture. A patched kernel was built, and the race condition could not be observed anymore after the application of this patch. A reproducer C program utilising the scsi_debug module was also built by Changhui Zhong and can be viewed here: https://github.com/MichaelRabek/linux-tests/blob/master/drivers/scsi/sg/sg_race_trigger.c The first race happens between the reading of hp->duration in sg_proc_debug_helper() and request completion in sg_rq_end_io(). The hp->duration member variable may hold either of two types of information: #1 - The start time of the request. This value is present while the request is not yet finished. #2 - The total execution time of the request (end_time - start_time). If sg_proc_debug_helper() executes after the value of hp->duration was changed from #1 to #2, but before srp->done is set to 1 in sg_rq_end_io(), a fresh timestamp is taken in the else branch, and the elapsed time (value type #2) is subtracted from a timestamp, which cannot yield a valid elapsed time (which is a type #2 value as well). To fix this issue, the value of hp->duration must change under the protection of the sfp->rq_list_lock in sg_rq_end_io(). Since sg_proc_debug_helper() takes this read lock, the change to srp->done and srp->header.duration will happen atomically from the perspective of sg_proc_debug_helper() and the race condition is thus eliminated. The second race condition happens between sg_proc_debug_helper() and sg_new_write(). Even though hp->duration is set to the current time stamp in sg_add_request() under the write lock's protection, it gets overwritten by a call to get_sg_io_hdr(), which calls copy_from_user() to copy struct sg_io_hdr from userspace into kernel space. hp->duration is set to the start time again in sg_common_write(). If sg_proc_debug_helper() is called between these two calls, an arbitrary value set by userspace (usually zero) is used to compute the elapsed time. To fix this issue, hp->duration must be set to the current timestamp again after get_sg_io_hdr() returns successfully. A small race window still exists between get_sg_io_hdr() and setting hp->duration, but this window is only a few instructions wide and does not result in observable issues in practice, as confirmed by testing. Additionally, we fix the format specifier from %d to %u for printing unsigned int values in sg_proc_debug_helper(). Signed-off-by: Michal Rábek <mrabek@redhat.com> Suggested-by: Tomas Henzl <thenzl@redhat.com> Tested-by: Changhui Zhong <czhong@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Link: https://patch.msgid.link/20251212160900.64924-1-mrabek@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	drm/amd/display: Fix DP no audio issue	Charlene Liu	1	-4/+4
	[ Upstream commit 3886b198bd6e49c801fe9552fcfbfc387a49fbbc ] [why] need to enable APG_CLOCK_ENABLE enable first also need to wake up az from D3 before access az block Reviewed-by: Swapnil Patel <swapnil.patel@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bf5e396957acafd46003318965500914d5f4edfa) Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	powercap: fix sscanf() error return value handling	Sumeet Pawnikar	1	-3/+3
	[ Upstream commit efc4c35b741af973de90f6826bf35d3b3ac36bf1 ] Fix inconsistent error handling for sscanf() return value check. Implicit boolean conversion is used instead of explicit return value checks. The code checks if (!sscanf(...)) which is incorrect because: 1. sscanf returns the number of successfully parsed items 2. On success, it returns 1 (one item passed) 3. On failure, it returns 0 or EOF 4. The check 'if (!sscanf(...))' is wrong because it treats success (1) as failure All occurrences of sscanf() now uses explicit return value check. With this behavior it returns '-EINVAL' when parsing fails (returns 0 or EOF), and continues when parsing succeeds (returns 1). Signed-off-by: Sumeet Pawnikar <sumeet4linux@gmail.com> [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20251207151549.202452-1-sumeet4linux@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	powercap: fix race condition in register_control_type()	Sumeet Pawnikar	1	-5/+11
	[ Upstream commit 7bda1910c4bccd4b8d4726620bb3d6bbfb62286e ] The device becomes visible to userspace via device_register() even before it fully initialized by idr_init(). If userspace or another thread tries to register a zone immediately after device_register(), the control_type_valid() will fail because the control_type is not yet in the list. The IDR is not yet initialized, so this race condition causes zone registration failure. Move idr_init() and list addition before device_register() fix the race condition. Signed-off-by: Sumeet Pawnikar <sumeet4linux@gmail.com> [ rjw: Subject adjustment, empty line added ] Link: https://patch.msgid.link/20251205190216.5032-1-sumeet4linux@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	net: enetc: fix build warning when PAGE_SIZE is greater than 128K	Wei Fang	1	-2/+2
	[ Upstream commit 4b5bdabb5449b652122e43f507f73789041d4abe ] The max buffer size of ENETC RX BD is 0xFFFF bytes, so if the PAGE_SIZE is greater than 128K, ENETC_RXB_DMA_SIZE and ENETC_RXB_DMA_SIZE_XDP will be greater than 0xFFFF, thus causing a build warning. This will not cause any practical issues because ENETC is currently only used on the ARM64 platform, and the max PAGE_SIZE is 64K. So this patch is only for fixing the build warning that occurs when compiling ENETC drivers for other platforms. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601050637.kHEKKOG7-lkp@intel.com/ Fixes: e59bc32df2e9 ("net: enetc: correct the value of ENETC_RXB_TRUESIZE") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260107091204.1980222-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	net: usb: pegasus: fix memory leak in update_eth_regs_async()	Petko Manolov	1	-0/+2
	[ Upstream commit afa27621a28af317523e0836dad430bec551eb54 ] When asynchronously writing to the device registers and if usb_submit_urb() fail, the code fail to release allocated to this point resources. Fixes: 323b34963d11 ("drivers: net: usb: pegasus: fix control urb submission") Signed-off-by: Petko Manolov <petkan@nucleusys.com> Link: https://patch.msgid.link/20260106084821.3746677-1-petko.manolov@konsulko.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	HID: quirks: work around VID/PID conflict for appledisplay	René Rebe	1	-0/+9
	[ Upstream commit c7fabe4ad9219866c203164a214c474c95b36bf2 ] For years I wondered why the Apple Cinema Display driver would not just work for me. Turns out the hidraw driver instantly takes it over. Fix by adding appledisplay VID/PIDs to hid_have_special_driver. Fixes: 069e8a65cd79 ("Driver for Apple Cinema Display") Signed-off-by: René Rebe <rene@exactco.de> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	bnxt_en: Fix potential data corruption with HW GRO/LRO	Srijit Bose	2	-6/+13
	[ Upstream commit ffeafa65b2b26df2f5b5a6118d3174f17bd12ec5 ] Fix the max number of bits passed to find_first_zero_bit() in bnxt_alloc_agg_idx(). We were incorrectly passing the number of long words. find_first_zero_bit() may fail to find a zero bit and cause a wrong ID to be used. If the wrong ID is already in use, this can cause data corruption. Sometimes an error like this can also be seen: bnxt_en 0000:83:00.0 enp131s0np0: TPA end agg_buf 2 != expected agg_bufs 1 Fix it by passing the correct number of bits MAX_TPA_P5. Use DECLARE_BITMAP() to more cleanly define the bitmap. Add a sanity check to warn if a bit cannot be found and reset the ring [MChan]. Fixes: ec4d8e7cf024 ("bnxt_en: Add TPA ID mapping logic for 57500 chips.") Reviewed-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Srijit Bose <srijit.bose@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20251231083625.3911652-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	net: wwan: iosm: Fix memory leak in ipc_mux_deinit()	Zilin Guan	1	-0/+6
	[ Upstream commit 92e6e0a87f6860a4710f9494f8c704d498ae60f8 ] Commit 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support") allocated memory for pp_qlt in ipc_mux_init() but did not free it in ipc_mux_deinit(). This results in a memory leak when the driver is unloaded. Free the allocated memory in ipc_mux_deinit() to fix the leak. Fixes: 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support") Co-developed-by: Jianhao Xu <jianhao.xu@seu.edu.cn> Signed-off-by: Jianhao Xu <jianhao.xu@seu.edu.cn> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Link: https://patch.msgid.link/20251230071853.1062223-1-zilin@seu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	net/mlx5e: Don't print error message due to invalid module	Gal Pressman	1	-1/+2
	[ Upstream commit 144297e2a24e3e54aee1180ec21120ea38822b97 ] Dumping module EEPROM on newer modules is supported through the netlink interface only. Querying with old userspace ethtool (or other tools, such as 'lshw') which still uses the ioctl interface results in an error message that could flood dmesg (in addition to the expected error return value). The original message was added under the assumption that the driver should be able to handle all module types, but now that such flows are easily triggered from userspace, it doesn't serve its purpose. Change the log level of the print in mlx5_query_module_eeprom() to debug. Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20251225132717.358820-5-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	net: mscc: ocelot: Fix crash when adding interface under a lag	Jerry Wu	1	-2/+4
	[ Upstream commit 34f3ff52cb9fa7dbf04f5c734fcc4cb6ed5d1a95 ] Commit 15faa1f67ab4 ("lan966x: Fix crash when adding interface under a lag") fixed a similar issue in the lan966x driver caused by a NULL pointer dereference. The ocelot_set_aggr_pgids() function in the ocelot driver has similar logic and is susceptible to the same crash. This issue specifically affects the ocelot_vsc7514.c frontend, which leaves unused ports as NULL pointers. The felix_vsc9959.c frontend is unaffected as it uses the DSA framework which registers all ports. Fix this by checking if the port pointer is valid before accessing it. Fixes: 528d3f190c98 ("net: mscc: ocelot: drop the use of the "lags" array") Signed-off-by: Jerry Wu <w.7erry@foxmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/tencent_75EF812B305E26B0869C673DD1160866C90A@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	net: marvell: prestera: fix NULL dereference on devlink_alloc() failure	Alok Tiwari	1	-0/+2
	[ Upstream commit a428e0da1248c353557970848994f35fd3f005e2 ] devlink_alloc() may return NULL on allocation failure, but prestera_devlink_alloc() unconditionally calls devlink_priv() on the returned pointer. This leads to a NULL pointer dereference if devlink allocation fails. Add a check for a NULL devlink pointer and return NULL early to avoid the crash. Fixes: 34dd1710f5a3 ("net: marvell: prestera: Add basic devlink support") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Acked-by: Elad Nachman <enachman@marvell.com> Link: https://patch.msgid.link/20251230052124.897012-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	gpio: pca953x: handle short interrupt pulses on PCAL devices	Ernest Van Hoecke	1	-1/+24
	[ Upstream commit 014a17deb41201449f76df2b20c857a9c3294a7c ] GPIO drivers with latch input support may miss short pulses on input pins even when input latching is enabled. The generic interrupt logic in the pca953x driver reports interrupts by comparing the current input value against the previously sampled one and only signals an event when a level change is observed between two reads. For short pulses, the first edge is captured when the input register is read, but if the signal returns to its previous level before the read, the second edge is not observed. As a result, successive pulses can produce identical input values at read time and no level change is detected, causing interrupts to be missed. Below timing diagram shows this situation where the top signal is the input pin level and the bottom signal indicates the latched value. ─────┐ ┌─────────────────┐ ┌───────────────────┐ ┌───── │ │ . │ │ . │ │ . │ │ │ │ │ │ │ │ │ └────┘ │ └────┘ │ └────┘ │ Input │ │ │ │ │ │ ▼ │ ▼ │ ▼ │ IRQ │ IRQ │ IRQ │ . . . ─────┐ .┌──────────────┐ .┌────────────────┐ .┌── │ │ │ │ │ │ │ │ │ │ │ │ └────────┘ └────────┘ └────────┘ Latched │ │ │ ▼ ▼ ▼ READ 0 READ 0 READ 0 NO CHANGE NO CHANGE PCAL variants provide an interrupt status register that records which pins triggered an interrupt, but the status and input registers cannot be read atomically. The interrupt status is only cleared when the input port is read, and the input value must also be read to determine the triggering edge. If another interrupt occurs on a different line after the status register has been read but before the input register is sampled, that event will not be reflected in the earlier status snapshot, so relying solely on the interrupt status register is also insufficient. Support for input latching and interrupt status handling was previously added by [1], but the interrupt status-based logic was reverted by [2] due to these issues. This patch addresses the original problem by combining both sources of information. Events indicated by the interrupt status register are merged with events detected through the existing level-change logic. As a result: short pulses, whose second edges are invisible, are detected via the interrupt status register, and * interrupts that occur between the status and input reads are still caught by the generic level-change logic. This significantly improves robustness on devices that signal interrupts as short pulses, while avoiding the issues that led to the earlier reversion. In practice, even if only the first edge of a pulse is observable, the interrupt is reliably detected. This fixes missed interrupts from an Ilitek touch controller with its interrupt line connected to a PCAL6416A, where active-low pulses are approximately 200 us long. [1] commit 44896beae605 ("gpio: pca953x: add PCAL9535 interrupt support for Galileo Gen2") [2] commit d6179f6c6204 ("gpio: pca953x: Improve interrupt support") Fixes: d6179f6c6204 ("gpio: pca953x: Improve interrupt support") Signed-off-by: Ernest Van Hoecke <ernest.vanhoecke@toradex.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20251217153050.142057-1-ernestvanhoecke@gmail.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	gpio: pca953x: Add support for level-triggered interrupts	Potin Lai	1	-5/+27
	[ Upstream commit 417b0f8d08f878615de9481c6e8827fbc8b57ed2 ] Adds support for level-triggered interrupts in the PCA953x GPIO expander driver. Previously, the driver only supported edge-triggered interrupts, which could lead to missed events in scenarios where an interrupt condition persists until it is explicitly cleared. By enabling level-triggered interrupts, the driver can now detect and respond to sustained interrupt conditions more reliably. Signed-off-by: Potin Lai <potin.lai.pt@gmail.com> Link: https://lore.kernel.org/r/20250409-gpio-pca953x-level-triggered-irq-v3-1-7f184d814934@gmail.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Stable-dep-of: 014a17deb412 ("gpio: pca953x: handle short interrupt pulses on PCAL devices") Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	gpio: pca953x: Utilise temporary variable for struct device	Andy Shevchenko	1	-16/+14
	[ Upstream commit 6811886ac91eb414b1b74920e05e6590c3f2a688 ] We have a temporary variable to keep pointer to struct device. Utilise it where it makes sense. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Stable-dep-of: 014a17deb412 ("gpio: pca953x: handle short interrupt pulses on PCAL devices") Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	gpio: pca953x: Utilise dev_err_probe() where it makes sense	Andy Shevchenko	1	-5/+3
	[ Upstream commit c47f7ff0fe61738a40b1b4fef3cd8317ec314079 ] At least in pca953x_irq_setup() we may use dev_err_probe(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Stable-dep-of: 014a17deb412 ("gpio: pca953x: handle short interrupt pulses on PCAL devices") Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure ↵	Xingui Yang	1	-14/+0
	scanned in again after probe failed" [ Upstream commit 278712d20bc8ec29d1ad6ef9bdae9000ef2c220c ] This reverts commit ab2068a6fb84751836a84c26ca72b3beb349619d. When probing the exp-attached sata device, libsas/libata will issue a hard reset in sas_probe_sata() -> ata_sas_async_probe(), then a broadcast event will be received after the disk probe fails, and this commit causes the probe will be re-executed on the disk, and a faulty disk may get into an indefinite loop of probe. Therefore, revert this commit, although it can fix some temporary issues with disk probe failure. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://patch.msgid.link/20251202065627.140361-1-yangxingui@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	scsi: ufs: core: Fix EH failure after W-LUN resume error	Brian Kao	1	-8/+28
	[ Upstream commit b4bb6daf4ac4d4560044ecdd81e93aa2f6acbb06 ] When a W-LUN resume fails, its parent devices in the SCSI hierarchy, including the scsi_target, may be runtime suspended. Subsequently, the error handler in ufshcd_recover_pm_error() fails to set the W-LUN device back to active because the parent target is not active. This results in the following errors: google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ... ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000 ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5 ... ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active Address this by: 1. Ensuring the W-LUN's parent scsi_target is runtime resumed before attempting to set the W-LUN to active within ufshcd_recover_pm_error(). 2. Explicitly checking for power.runtime_error on the HBA and W-LUN devices before calling pm_runtime_set_active() to clear the error state. 3. Adding pm_runtime_get_sync(hba->dev) in ufshcd_err_handling_prepare() to ensure the HBA itself is active during error recovery, even if a child device resume failed. These changes ensure the device power states are managed correctly during error recovery. Signed-off-by: Brian Kao <powenkao@google.com> Tested-by: Brian Kao <powenkao@google.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251112063214.1195761-1-powenkao@google.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset	Wen Xiong	1	-1/+27
	[ Upstream commit 6ac3484fb13b2fc7f31cfc7f56093e7d0ce646a5 ] A dynamic remove/add storage adapter test hits EEH on PowerPC: EEH: [c00000000004f75c] __eeh_send_failure_event+0x7c/0x160 EEH: [c000000000048444] eeh_dev_check_failure.part.0+0x254/0x650 EEH: [c008000001650678] eeh_readl+0x60/0x90 [ipr] EEH: [c00800000166746c] ipr_cancel_op+0x2b8/0x524 [ipr] EEH: [c008000001656524] ipr_eh_abort+0x6c/0x130 [ipr] EEH: [c000000000ab0d20] scmd_eh_abort_handler+0x140/0x440 EEH: [c00000000017e558] process_one_work+0x298/0x590 EEH: [c00000000017eef8] worker_thread+0xa8/0x620 EEH: [c00000000018be34] kthread+0x124/0x130 EEH: [c00000000000cd64] ret_from_kernel_thread+0x5c/0x64 A PCIe bus trace reveals that a vector of MSI-X is cleared to 0 by irqbalance daemon. If we disable irqbalance daemon, we won't see the issue. With debug enabled in ipr driver: [ 44.103071] ipr: Entering __ipr_remove [ 44.103083] ipr: Entering ipr_initiate_ioa_bringdown [ 44.103091] ipr: Entering ipr_reset_shutdown_ioa [ 44.103099] ipr: Leaving ipr_reset_shutdown_ioa [ 44.103105] ipr: Leaving ipr_initiate_ioa_bringdown [ 44.149918] ipr: Entering ipr_reset_ucode_download [ 44.149935] ipr: Entering ipr_reset_alert [ 44.150032] ipr: Entering ipr_reset_start_timer [ 44.150038] ipr: Leaving ipr_reset_alert [ 44.244343] scsi 1:2:3:0: alua: Detached [ 44.254300] ipr: Entering ipr_reset_start_bist [ 44.254320] ipr: Entering ipr_reset_start_timer [ 44.254325] ipr: Leaving ipr_reset_start_bist [ 44.364329] scsi 1:2:4:0: alua: Detached [ 45.134341] scsi 1:2:5:0: alua: Detached [ 45.860949] ipr: Entering ipr_reset_shutdown_ioa [ 45.860962] ipr: Leaving ipr_reset_shutdown_ioa [ 45.860966] ipr: Entering ipr_reset_alert [ 45.861028] ipr: Entering ipr_reset_start_timer [ 45.861035] ipr: Leaving ipr_reset_alert [ 45.964302] ipr: Entering ipr_reset_start_bist [ 45.964309] ipr: Entering ipr_reset_start_timer [ 45.964313] ipr: Leaving ipr_reset_start_bist [ 46.264301] ipr: Entering ipr_reset_bist_done [ 46.264309] ipr: Leaving ipr_reset_bist_done During adapter reset, ipr device driver blocks config space access but can't block MMIO access for MSI-X entries. There is very small window: irqbalance daemon kicks in during adapter reset before ipr driver calls pci_restore_state(pdev) to restore MSI-X table. irqbalance daemon reads back all 0 for that MSI-X vector in __pci_read_msi_msg(). irqbalance daemon: msi_domain_set_affinity() ->irq_chip_set_affinity_patent() ->xive_irq_set_affinity() ->irq_chip_compose_msi_msg() ->pseries_msi_compose_msg() ->__pci_read_msi_msg(): read all 0 since didn't call pci_restore_state ->irq_chip_write_msi_msg() -> pci_write_msg_msi(): write 0 to the msix vector entry When ipr driver calls pci_restore_state(pdev) in ipr_reset_restore_cfg_space(), the MSI-X vector entry has been cleared by irqbalance daemon in pci_write_msg_msix(). pci_restore_state() ->__pci_restore_msix_state() Below is the MSI-X table for ipr adapter after irqbalance daemon kicked in during adapter reset: Dump MSIx table: index=0 address_lo=c800 address_hi=10000000 msg_data=0 Dump MSIx table: index=1 address_lo=c810 address_hi=10000000 msg_data=0 Dump MSIx table: index=2 address_lo=c820 address_hi=10000000 msg_data=0 Dump MSIx table: index=3 address_lo=c830 address_hi=10000000 msg_data=0 Dump MSIx table: index=4 address_lo=c840 address_hi=10000000 msg_data=0 Dump MSIx table: index=5 address_lo=c850 address_hi=10000000 msg_data=0 Dump MSIx table: index=6 address_lo=c860 address_hi=10000000 msg_data=0 Dump MSIx table: index=7 address_lo=c870 address_hi=10000000 msg_data=0 Dump MSIx table: index=8 address_lo=0 address_hi=0 msg_data=0 ---------> Hit EEH since msix vector of index=8 are 0 Dump MSIx table: index=9 address_lo=c890 address_hi=10000000 msg_data=0 Dump MSIx table: index=10 address_lo=c8a0 address_hi=10000000 msg_data=0 Dump MSIx table: index=11 address_lo=c8b0 address_hi=10000000 msg_data=0 Dump MSIx table: index=12 address_lo=c8c0 address_hi=10000000 msg_data=0 Dump MSIx table: index=13 address_lo=c8d0 address_hi=10000000 msg_data=0 Dump MSIx table: index=14 address_lo=c8e0 address_hi=10000000 msg_data=0 Dump MSIx table: index=15 address_lo=c8f0 address_hi=10000000 msg_data=0 [ 46.264312] ipr: Entering ipr_reset_restore_cfg_space [ 46.267439] ipr: Entering ipr_fail_all_ops [ 46.267447] ipr: Leaving ipr_fail_all_ops [ 46.267451] ipr: Leaving ipr_reset_restore_cfg_space [ 46.267454] ipr: Entering ipr_ioa_bringdown_done [ 46.267458] ipr: Leaving ipr_ioa_bringdown_done [ 46.267467] ipr: Entering ipr_worker_thread [ 46.267470] ipr: Leaving ipr_worker_thread IRQ balancing is not required during adapter reset. Enable "IRQ_NO_BALANCING" flag before starting adapter reset and disable it after calling pci_restore_state(). The irqbalance daemon is disabled for this short period of time (~2s). Co-developed-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com> Signed-off-by: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com> Signed-off-by: Wen Xiong <wenxiong@linux.ibm.com> Link: https://patch.msgid.link/20251028142427.3969819-2-wenxiong@linux.ibm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	dm-snapshot: fix 'scheduling while atomic' on real-time kernels	Mikulas Patocka	2	-40/+35
	[ Upstream commit 8581b19eb2c5ccf06c195d3b5468c3c9d17a5020 ] There is reported 'scheduling while atomic' bug when using dm-snapshot on real-time kernels. The reason for the bug is that the hlist_bl code does preempt_disable() when taking the lock and the kernel attempts to take other spinlocks while holding the hlist_bl lock. Fix this by converting a hlist_bl spinlock into a regular spinlock. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Jiping Ma <jiping.ma2@windriver.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2 days	pinctrl: qcom: lpass-lpi: mark the GPIO controller as sleeping	Bartosz Golaszewski	1	-1/+1
	commit ebc18e9854e5a2b62a041fb57b216a903af45b85 upstream. The gpio_chip settings in this driver say the controller can't sleep but it actually uses a mutex for synchronization. This triggers the following BUG(): [ 9.233659] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281 [ 9.233665] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 554, name: (udev-worker) [ 9.233669] preempt_count: 1, expected: 0 [ 9.233673] RCU nest depth: 0, expected: 0 [ 9.233688] Tainted: [W]=WARN [ 9.233690] Hardware name: Dell Inc. Latitude 7455/0FK7MX, BIOS 2.10.1 05/20/2025 [ 9.233694] Call trace: [ 9.233696] show_stack+0x24/0x38 (C) [ 9.233709] dump_stack_lvl+0x40/0x88 [ 9.233716] dump_stack+0x18/0x24 [ 9.233722] __might_resched+0x148/0x160 [ 9.233731] __might_sleep+0x38/0x98 [ 9.233736] mutex_lock+0x30/0xd8 [ 9.233749] lpi_config_set+0x2e8/0x3c8 [pinctrl_lpass_lpi] [ 9.233757] lpi_gpio_direction_output+0x58/0x90 [pinctrl_lpass_lpi] [ 9.233761] gpiod_direction_output_raw_commit+0x110/0x428 [ 9.233772] gpiod_direction_output_nonotify+0x234/0x358 [ 9.233779] gpiod_direction_output+0x38/0xd0 [ 9.233786] gpio_shared_proxy_direction_output+0xb8/0x2a8 [gpio_shared_proxy] [ 9.233792] gpiod_direction_output_raw_commit+0x110/0x428 [ 9.233799] gpiod_direction_output_nonotify+0x234/0x358 [ 9.233806] gpiod_configure_flags+0x2c0/0x580 [ 9.233812] gpiod_find_and_request+0x358/0x4f8 [ 9.233819] gpiod_get_index+0x7c/0x98 [ 9.233826] devm_gpiod_get+0x34/0xb0 [ 9.233829] reset_gpio_probe+0x58/0x128 [reset_gpio] [ 9.233836] auxiliary_bus_probe+0xb0/0xf0 [ 9.233845] really_probe+0x14c/0x450 [ 9.233853] __driver_probe_device+0xb0/0x188 [ 9.233858] driver_probe_device+0x4c/0x250 [ 9.233863] __driver_attach+0xf8/0x2a0 [ 9.233868] bus_for_each_dev+0xf8/0x158 [ 9.233872] driver_attach+0x30/0x48 [ 9.233876] bus_add_driver+0x158/0x2b8 [ 9.233880] driver_register+0x74/0x118 [ 9.233886] __auxiliary_driver_register+0x94/0xe8 [ 9.233893] init_module+0x34/0xfd0 [reset_gpio] [ 9.233898] do_one_initcall+0xec/0x300 [ 9.233903] do_init_module+0x64/0x260 [ 9.233910] load_module+0x16c4/0x1900 [ 9.233915] __arm64_sys_finit_module+0x24c/0x378 [ 9.233919] invoke_syscall+0x4c/0xe8 [ 9.233925] el0_svc_common+0x8c/0xf0 [ 9.233929] do_el0_svc+0x28/0x40 [ 9.233934] el0_svc+0x38/0x100 [ 9.233938] el0t_64_sync_handler+0x84/0x130 [ 9.233943] el0t_64_sync+0x17c/0x180 Mark the controller as sleeping. Fixes: 6e261d1090d6 ("pinctrl: qcom: Add sm8250 lpass lpi pinctrl driver") Cc: stable@vger.kernel.org Reported-by: Val Packett <val@packett.cool> Closes: https://lore.kernel.org/all/98c0f185-b0e0-49ea-896c-f3972dd011ca@packett.cool/ Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Linus Walleij <linusw@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 days	gpio: rockchip: mark the GPIO controller as sleeping	Bartosz Golaszewski	1	-0/+1
	commit 20cf2aed89ac6d78a0122e31c875228e15247194 upstream. The GPIO controller is configured as non-sleeping but it uses generic pinctrl helpers which use a mutex for synchronization. This can cause the following lockdep splat with shared GPIOs enabled on boards which have multiple devices using the same GPIO: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 12, name: kworker/u16:0 preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 6 locks held by kworker/u16:0/12: #0: ffff0001f0018d48 ((wq_completion)events_unbound#2){+.+.}-{0:0}, at: process_one_work+0x18c/0x604 #1: ffff8000842dbdf0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1b4/0x604 #2: ffff0001f18498f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x38/0x1b0 #3: ffff0001f75f1e90 (&gdev->srcu){.+.?}-{0:0}, at: gpiod_direction_output_raw_commit+0x0/0x360 #4: ffff0001f46e3db8 (&shared_desc->spinlock){....}-{3:3}, at: gpio_shared_proxy_direction_output+0xd0/0x144 [gpio_shared_proxy] #5: ffff0001f180ee90 (&gdev->srcu){.+.?}-{0:0}, at: gpiod_direction_output_raw_commit+0x0/0x360 irq event stamp: 81450 hardirqs last enabled at (81449): [<ffff8000813acba4>] _raw_spin_unlock_irqrestore+0x74/0x78 hardirqs last disabled at (81450): [<ffff8000813abfb8>] _raw_spin_lock_irqsave+0x84/0x88 softirqs last enabled at (79616): [<ffff8000811455fc>] __alloc_skb+0x17c/0x1e8 softirqs last disabled at (79614): [<ffff8000811455fc>] __alloc_skb+0x17c/0x1e8 CPU: 2 UID: 0 PID: 12 Comm: kworker/u16:0 Not tainted 6.19.0-rc4-next-20260105+ #11975 PREEMPT Hardware name: Hardkernel ODROID-M1 (DT) Workqueue: events_unbound deferred_probe_work_func Call trace: show_stack+0x18/0x24 (C) dump_stack_lvl+0x90/0xd0 dump_stack+0x18/0x24 __might_resched+0x144/0x248 __might_sleep+0x48/0x98 __mutex_lock+0x5c/0x894 mutex_lock_nested+0x24/0x30 pinctrl_get_device_gpio_range+0x44/0x128 pinctrl_gpio_direction+0x3c/0xe0 pinctrl_gpio_direction_output+0x14/0x20 rockchip_gpio_direction_output+0xb8/0x19c gpiochip_direction_output+0x38/0x94 gpiod_direction_output_raw_commit+0x1d8/0x360 gpiod_direction_output_nonotify+0x7c/0x230 gpiod_direction_output+0x34/0xf8 gpio_shared_proxy_direction_output+0xec/0x144 [gpio_shared_proxy] gpiochip_direction_output+0x38/0x94 gpiod_direction_output_raw_commit+0x1d8/0x360 gpiod_direction_output_nonotify+0x7c/0x230 gpiod_configure_flags+0xbc/0x480 gpiod_find_and_request+0x1a0/0x574 gpiod_get_index+0x58/0x84 devm_gpiod_get_index+0x20/0xb4 devm_gpiod_get_optional+0x18/0x30 rockchip_pcie_probe+0x98/0x380 platform_probe+0x5c/0xac really_probe+0xbc/0x298 Fixes: 936ee2675eee ("gpio/rockchip: add driver for rockchip gpio") Cc: stable@vger.kernel.org Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Closes: https://lore.kernel.org/all/d035fc29-3b03-4cd6-b8ec-001f93540bc6@samsung.com/ Acked-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20260106090011.21603-1-bartosz.golaszewski@oss.qualcomm.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 days	drm/radeon: Remove __counted_by from ClockInfoArray.clockInfo[]	Alex Deucher	1	-1/+1
	commit 19158c7332468bc28572bdca428e89c7954ee1b1 upstream. clockInfo[] is a generic uchar pointer to variable sized structures which vary from ASIC to ASIC. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4374 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dc135aa73561b5acc74eadf776e48530996529a3) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 days	drm/pl111: Fix error handling in pl111_amba_probe	Miaoqian Lin	1	-1/+1
	commit 0ddd3bb4b14c9102c0267b3fd916c81fe5ab89c1 upstream. Jump to the existing dev_put label when devm_request_irq() fails so drm_dev_put() and of_reserved_mem_device_release() run instead of returning early and leaking resources. Found via static analysis and code review. Fixes: bed41005e617 ("drm/pl111: Initial drm/kms driver for pl111") Cc: stable@vger.kernel.org Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20251211123345.2392065-1-linmq006@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 days	counter: interrupt-cnt: Drop IRQF_NO_THREAD flag	Alexander Sverdlin	1	-2/+1
	commit 23f9485510c338476b9735d516c1d4aacb810d46 upstream. An IRQ handler can either be IRQF_NO_THREAD or acquire spinlock_t, as CONFIG_PROVE_RAW_LOCK_NESTING warns: ============================= [ BUG: Invalid wait context ] 6.18.0-rc1+git... #1 ----------------------------- some-user-space-process/1251 is trying to lock: (&counter->events_list_lock){....}-{3:3}, at: counter_push_event [counter] other info that might help us debug this: context-{2:2} no locks held by some-user-space-process/.... stack backtrace: CPU: 0 UID: 0 PID: 1251 Comm: some-user-space-process 6.18.0-rc1+git... #1 PREEMPT Call trace: show_stack (C) dump_stack_lvl dump_stack __lock_acquire lock_acquire _raw_spin_lock_irqsave counter_push_event [counter] interrupt_cnt_isr [interrupt_cnt] __handle_irq_event_percpu handle_irq_event handle_simple_irq handle_irq_desc generic_handle_domain_irq gpio_irq_handler handle_irq_desc generic_handle_domain_irq gic_handle_irq call_on_irq_stack do_interrupt_handler el0_interrupt __el0_irq_handler_common el0t_64_irq_handler el0t_64_irq ... and Sebastian correctly points out. Remove IRQF_NO_THREAD as an alternative to switching to raw_spinlock_t, because the latter would limit all potential nested locks to raw_spinlock_t only. Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20251117151314.xwLAZrWY@linutronix.de/ Fixes: a55ebd47f21f ("counter: add IRQ or GPIO based counter") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20251118083603.778626-1-alexander.sverdlin@siemens.com Signed-off-by: William Breathitt Gray <wbg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 days	counter: 104-quad-8: Fix incorrect return value in IRQ handler	Haotian Zhang	1	-6/+14
	commit 9517d76dd160208b7a432301ce7bec8fc1ddc305 upstream. quad8_irq_handler() should return irqreturn_t enum values, but it directly returns negative errno codes from regmap operations on error. Return IRQ_NONE if the interrupt status cannot be read. If clearing the interrupt fails, return IRQ_HANDLED to prevent the kernel from disabling the IRQ line due to a spurious interrupt storm. Also, log these regmap failures with dev_WARN_ONCE. Fixes: 98ffe0252911 ("counter: 104-quad-8: Migrate to the regmap API") Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Link: https://lore.kernel.org/r/20251215020114.1913-1-vulab@iscas.ac.cn Cc: stable@vger.kernel.org Signed-off-by: William Breathitt Gray <wbg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 days	mei: me: add nova lake point S DID	Alexander Usyskin	2	-0/+4
	commit 420f423defcf6d0af2263d38da870ca4a20c0990 upstream. Add Nova Lake S device id. Cc: stable <stable@kernel.org> Co-developed-by: Tomas Winkler <tomasw@gmail.com> Signed-off-by: Tomas Winkler <tomasw@gmail.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Link: https://patch.msgid.link/20251215105915.1672659-1-alexander.usyskin@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 days	net: 3com: 3c59x: fix possible null dereference in vortex_probe1()	Thomas Fourier	1	-1/+1
	commit a4e305ed60f7c41bbf9aabc16dd75267194e0de3 upstream. pdev can be null and free_ring: can be called in 1297 with a null pdev. Fixes: 55c82617c3e8 ("3c59x: convert to generic DMA API") Cc: <stable@vger.kernel.org> Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Link: https://patch.msgid.link/20260106094731.25819-2-fourier.thomas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2 days	atm: Fix dma_free_coherent() size	Thomas Fourier	1	-1/+2
	commit 4d984b0574ff708e66152763fbfdef24ea40933f upstream. The size of the buffer is not the same when alloc'd with dma_alloc_coherent() in he_init_tpdrq() and freed. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: <stable@vger.kernel.org> Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Link: https://patch.msgid.link/20260107090141.80900-2-fourier.thomas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	Revert "iommu/amd: Skip enabling command/event buffers for kdump"	Greg Kroah-Hartman	1	-19/+9
	This reverts commit 44a764aec64b3f3235b9cbac2525222f69685418 which is commit 9be15fbfc6c5c89c22cf6e209f66ea43ee0e58bb upstream. This causes problems in older kernel trees as SNP host kdump is not supported in them, so drop it from the stable branches. Reported-by: Ashish Kalra <ashish.kalra@amd.com> Link: https://lore.kernel.org/r/dacdff7f-0606-4ed5-b056-2de564404d51@amd.com Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Sairaj Kodilkar <sarunkod@amd.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	firmware: arm_scmi: Fix unused notifier-block in unregister	Amitai Gottlieb	1	-0/+1
	In scmi_devm_notifier_unregister(), the notifier-block argument was ignored and never passed to devres_release(). As a result, the function always returned -ENOENT and failed to unregister the notifier. Drivers that depend on this helper for teardown could therefore hit unexpected failures, including kernel panics. Commit 264a2c520628 ("firmware: arm_scmi: Simplify scmi_devm_notifier_unregister") removed the faulty code path during refactoring and hence this fix is not required upstream. Cc: <stable@vger.kernel.org> # 5.15.x, 6.1.x, and 6.6.x Fixes: 5ad3d1cf7d34 ("firmware: arm_scmi: Introduce new devres notification ops") Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Amitai Gottlieb <amitaig@hailo.ai> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	tty: fix tty_port_tty_*hangup() kernel-doc	Jiri Slaby (SUSE)	1	-5/+0
	commit 6241b49540a65a6d5274fa938fd3eb4cbfe2e076 upstream. The commit below added a new helper, but omitted to move (and add) the corressponding kernel-doc. Do it now. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Fixes: 2b5eac0f8c6e ("tty: introduce and use tty_port_tty_vhangup() helper") Link: https://lore.kernel.org/all/b23d566c-09dc-7374-cc87-0ad4660e8b2e@linux.intel.com/ Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Link: https://lore.kernel.org/r/20250624080641.509959-6-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	pwm: stm32: Always program polarity	Sean Nyekjaer	1	-2/+1
	Commit 7346e7a058a2 ("pwm: stm32: Always do lazy disabling") triggered a regression where PWM polarity changes could be ignored. stm32_pwm_set_polarity() was skipped due to a mismatch between the cached pwm->state.polarity and the actual hardware state, leaving the hardware polarity unchanged. Fixes: 7edf7369205b ("pwm: Add driver for STM32 plaftorm") Cc: stable@vger.kernel.org # <= 6.12 Signed-off-by: Sean Nyekjaer <sean@geanix.com> Co-developed-by: Uwe Kleine-König <ukleinek@kernel.org> Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
8 days	net: stmmac: make sure that ptp_rate is not 0 before configuring EST	Alexis Lothoré	2	-0/+10
	commit cbefe2ffa7784525ec5d008ba87c7add19ec631a upstream. If the ptp_rate recorded earlier in the driver happens to be 0, this bogus value will propagate up to EST configuration, where it will trigger a division by 0. Prevent this division by 0 by adding the corresponding check and error code. Suggested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Fixes: 8572aec3d0dc ("net: stmmac: Add basic EST support for XGMAC") Link: https://patch.msgid.link/20250529-stmmac_tstamp_div-v4-2-d73340a794d5@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ The context change is due to the commit c3f3b97238f6 ("net: stmmac: Refactor EST implementation") which is irrelevant to the logic of this patch. ] Signed-off-by: Rahul Sharma <black.hawk@163.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	virtio_console: fix order of fields cols and rows	Maximilian Immanuel Brandtner	1	-1/+1
	commit 5326ab737a47278dbd16ed3ee7380b26c7056ddd upstream. According to section 5.3.6.2 (Multiport Device Operation) of the virtio spec(version 1.2) a control buffer with the event VIRTIO_CONSOLE_RESIZE is followed by a virtio_console_resize struct containing cols then rows. The kernel implements this the wrong way around (rows then cols) resulting in the two values being swapped. Signed-off-by: Maximilian Immanuel Brandtner <maxbr@linux.ibm.com> Message-Id: <20250324144300.905535-1-maxbr@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: Filip Hejsek <filip.hejsek@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	RDMA/rxe: Fix the failure of ibv_query_device() and ibv_query_device_ex() tests	Zhu Yanjun	1	-19/+6
	[ Upstream commit 8ce2eb9dfac8743d1c423b86339336a5b6a6069e ] In rdma-core, the following failures appear. " $ ./build/bin/run_tests.py -k device ssssssss....FF........s ====================================================================== FAIL: test_query_device (tests.test_device.DeviceTest.test_query_device) Test ibv_query_device() ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/ubuntu/rdma-core/tests/test_device.py", line 63, in test_query_device self.verify_device_attr(attr, dev) File "/home/ubuntu/rdma-core/tests/test_device.py", line 200, in verify_device_attr assert attr.sys_image_guid != 0 ^^^^^^^^^^^^^^^^^^^^^^^^ AssertionError ====================================================================== FAIL: test_query_device_ex (tests.test_device.DeviceTest.test_query_device_ex) Test ibv_query_device_ex() ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/ubuntu/rdma-core/tests/test_device.py", line 222, in test_query_device_ex self.verify_device_attr(attr_ex.orig_attr, dev) File "/home/ubuntu/rdma-core/tests/test_device.py", line 200, in verify_device_attr assert attr.sys_image_guid != 0 ^^^^^^^^^^^^^^^^^^^^^^^^ AssertionError " The root cause is: before a net device is set with rxe, this net device is used to generate a sys_image_guid. Fixes: 2ac5415022d1 ("RDMA/rxe: Remove the direct link to net_device") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20250302215444.3742072-1-yanjun.zhu@linux.dev Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Tested-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> [Shivani: Modified to apply on 6.6.y] Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	RDMA/rxe: Remove the direct link to net_device	Zhu Yanjun	7	-20/+91
	[ Upstream commit 2ac5415022d16d63d912a39a06f32f1f51140261 ] The similar patch in siw is in the link: https://git.kernel.org/rdma/rdma/c/16b87037b48889 This problem also occurred in RXE. The following analyze this problem. In the following Call Traces: " BUG: KASAN: slab-use-after-free in dev_get_flags+0x188/0x1d0 net/core/dev.c:8782 Read of size 4 at addr ffff8880554640b0 by task kworker/1:4/5295 CPU: 1 UID: 0 PID: 5295 Comm: kworker/1:4 Not tainted 6.12.0-rc3-syzkaller-00399-g9197b73fd7bb #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: infiniband ib_cache_event_task Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 dev_get_flags+0x188/0x1d0 net/core/dev.c:8782 rxe_query_port+0x12d/0x260 drivers/infiniband/sw/rxe/rxe_verbs.c:60 __ib_query_port drivers/infiniband/core/device.c:2111 [inline] ib_query_port+0x168/0x7d0 drivers/infiniband/core/device.c:2143 ib_cache_update+0x1a9/0xb80 drivers/infiniband/core/cache.c:1494 ib_cache_event_task+0xf3/0x1e0 drivers/infiniband/core/cache.c:1568 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa65/0x1850 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f2/0x390 kernel/kthread.c:389 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> " 1). In the link [1], " infiniband syz2: set down " This means that on 839.350575, the event ib_cache_event_task was sent andi queued in ib_wq. 2). In the link [1], " team0 (unregistering): Port device team_slave_0 removed " It indicates that before 843.251853, the net device should be freed. 3). In the link [1], " BUG: KASAN: slab-use-after-free in dev_get_flags+0x188/0x1d0 " This means that on 850.559070, this slab-use-after-free problem occurred. In all, on 839.350575, the event ib_cache_event_task was sent and queued in ib_wq, before 843.251853, the net device veth was freed. on 850.559070, this event was executed, and the mentioned freed net device was called. Thus, the above call trace occurred. [1] https://syzkaller.appspot.com/x/log.txt?x=12e7025f980000 Reported-by: syzbot+4b87489410b4efd181bf@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4b87489410b4efd181bf Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20241220222325.2487767-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> [Shivani: - exported ib_device_get_netdev() function. - added ib_device_get_netdev() to ib_verbs.h.] Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	RDMA/core: Fix "KASAN: slab-use-after-free Read in ib_register_device" problem	Zhu Yanjun	1	-0/+5
	commit d0706bfd3ee40923c001c6827b786a309e2a8713 upstream. Call Trace: __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:408 [inline] print_report+0xc3/0x670 mm/kasan/report.c:521 kasan_report+0xe0/0x110 mm/kasan/report.c:634 strlen+0x93/0xa0 lib/string.c:420 __fortify_strlen include/linux/fortify-string.h:268 [inline] get_kobj_path_length lib/kobject.c:118 [inline] kobject_get_path+0x3f/0x2a0 lib/kobject.c:158 kobject_uevent_env+0x289/0x1870 lib/kobject_uevent.c:545 ib_register_device drivers/infiniband/core/device.c:1472 [inline] ib_register_device+0x8cf/0xe00 drivers/infiniband/core/device.c:1393 rxe_register_device+0x275/0x320 drivers/infiniband/sw/rxe/rxe_verbs.c:1552 rxe_net_add+0x8e/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:550 rxe_newlink+0x70/0x190 drivers/infiniband/sw/rxe/rxe.c:225 nldev_newlink+0x3a3/0x680 drivers/infiniband/core/nldev.c:1796 rdma_nl_rcv_msg+0x387/0x6e0 drivers/infiniband/core/netlink.c:195 rdma_nl_rcv_skb.constprop.0.isra.0+0x2e5/0x450 netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline] netlink_unicast+0x53a/0x7f0 net/netlink/af_netlink.c:1339 netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1883 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg net/socket.c:727 [inline] ____sys_sendmsg+0xa95/0xc70 net/socket.c:2566 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2620 __sys_sendmsg+0x16d/0x220 net/socket.c:2652 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x260 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f This problem is similar to the problem that the commit 1d6a9e7449e2 ("RDMA/core: Fix use-after-free when rename device name") fixes. The root cause is: the function ib_device_rename() renames the name with lock. But in the function kobject_uevent(), this name is accessed without lock protection at the same time. The solution is to add the lock protection when this name is accessed in the function kobject_uevent(). Fixes: 779e0bf47632 ("RDMA/core: Do not indicate device ready when device enablement fails") Link: https://patch.msgid.link/r/20250506151008.75701-1-yanjun.zhu@linux.dev Reported-by: syzbot+e2ce9e275ecc70a30b72@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e2ce9e275ecc70a30b72 Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org> [ Ajay: Modified to apply on v5.10.y-v6.6.y ib_device_notify_register() not present in v5.10.y-v6.6.y, so directly added lock for kobject_uevent() ] Signed-off-by: Ajay Kaher <ajay.kaher@broadcom.com> Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	media: mediatek: vcodec: Use spinlock for context list protection lock	Chen-Yu Tsai	7	-20/+28
	[ Upstream commit a5844227e0f030d2af2d85d4aed10c5eca6ca176 ] Previously a mutex was added to protect the encoder and decoder context lists from unexpected changes originating from the SCP IP block, causing the context pointer to go invalid, resulting in a NULL pointer dereference in the IPI handler. Turns out on the MT8173, the VPU IPI handler is called from hard IRQ context. This causes a big warning from the scheduler. This was first reported downstream on the ChromeOS kernels, but is also reproducible on mainline using Fluster with the FFmpeg v4l2m2m decoders. Even though the actual capture format is not supported, the affected code paths are triggered. Since this lock just protects the context list and operations on it are very fast, it should be OK to switch to a spinlock. Fixes: 6467cda18c9f ("media: mediatek: vcodec: adding lock to protect decoder context list") Fixes: afaaf3a0f647 ("media: mediatek: vcodec: adding lock to protect encoder context list") Cc: Yunfei Dong <yunfei.dong@mediatek.com> Cc: stable@vger.kernel.org Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Fei Shao <fshao@chromium.org> Reviewed-by: Tomasz Figa <tfiga@chromium.org> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> [ adapted file_to_dec_ctx() and file_to_enc_ctx() helper calls to equivalent fh_to_dec_ctx(file->private_data) and fh_to_enc_ctx(file->private_data) pattern ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	media: amphion: Remove vpu_vb_is_codecconfig	Ming Qian	3	-40/+3
	[ Upstream commit 634c2cd17bd021487c57b95973bddb14be8002ff ] Currently the function vpu_vb_is_codecconfig() always returns 0. Delete it and its related code. Fixes: 3cd084519c6f ("media: amphion: add vpu v4l2 m2m support") Cc: stable@vger.kernel.org Signed-off-by: Ming Qian <ming.qian@oss.nxp.com> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	media: amphion: Make some vpu_v4l2 functions static	Laurent Pinchart	2	-11/+9
	[ Upstream commit 5d1e54bb4dc6741284a3ed587e994308ddee2f16 ] Some functions defined in vpu_v4l2.c are never used outside of that compilation unit. Make them static. Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Ming Qian <ming.qian@oss.nxp.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> Stable-dep-of: 634c2cd17bd0 ("media: amphion: Remove vpu_vb_is_codecconfig") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	media: amphion: Add a frame flush mode for decoder	Ming Qian	1	-1/+13
	[ Upstream commit 9ea16ba6eaf93f25f61855751f71e2e701709ddf ] By default the amphion decoder will pre-parse 3 frames before starting to decode the first frame. Alternatively, a block of flush padding data can be appended to the frame, which will ensure that the decoder can start decoding immediately after parsing the flush padding data, thus potentially reducing decoding latency. This mode was previously only enabled, when the display delay was set to 0. Allow the user to manually toggle the use of that mode via a module parameter called low_latency, which enables the mode without changing the display order. Signed-off-by: Ming Qian <ming.qian@oss.nxp.com> Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Stable-dep-of: 634c2cd17bd0 ("media: amphion: Remove vpu_vb_is_codecconfig") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	mm/balloon_compaction: convert balloon_page_delete() to balloon_page_finalize()	David Hildenbrand	2	-5/+2
	[ Upstream commit 15504b1163007bbfbd9a63460d5c14737c16e96d ] Let's move the removal of the page from the balloon list into the single caller, to remove the dependency on the PG_isolated flag and clarify locking requirements. Note that for now, balloon_page_delete() was used on two paths: (1) Removing a page from the balloon for deflation through balloon_page_list_dequeue() (2) Removing an isolated page from the balloon for migration in the per-driver migration handlers. Isolated pages were already removed from the balloon list during isolation. So instead of relying on the flag, we can just distinguish both cases directly and handle it accordingly in the caller. We'll shuffle the operations a bit such that they logically make more sense (e.g., remove from the list before clearing flags). In balloon migration functions we can now move the balloon_page_finalize() out of the balloon lock and perform the finalization just before dropping the balloon reference. Document that the page lock is currently required when modifying the movability aspects of a page; hopefully we can soon decouple this from the page lock. Link: https://lkml.kernel.org/r/20250704102524.326966-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brendan Jackman <jackmanb@google.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pé rez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Gregory Price <gourry@gourry.net> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 0da2ba35c0d5 ("powerpc/pseries/cmm: adjust BALLOON_MIGRATE when migrating pages") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	media: verisilicon: Fix CPU stalls on G2 bus error	Nicolas Dufresne	6	-23/+85
	[ Upstream commit 19c286b755072a22a063052f530a6b1fac8a1f63 ] In some seek stress tests, we are getting IRQ from the G2 decoder where the dec_bus_int and the dec_e bits are high, meaning the decoder is still running despite the error. Fix this by reworking the IRQ handler to only finish the job once we have reached completion and move the software reset to when our software watchdog triggers. This way, we let the hardware continue on errors when it did not self reset and in worse case scenario the hardware timeout will automatically stop it. The actual error will be fixed in a follow up patch. Fixes: 3385c514ecc5a ("media: hantro: Convert imx8m_vpu_g2_irq to helper") Cc: stable@vger.kernel.org Reviewed-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	media: verisilicon: g2: Use common helpers to compute chroma and mv offsets	Benjamin Gaignard	4	-38/+23
	[ Upstream commit 3eeaee737dcee3c32e256870dbc2687a2a6fe970 ] HEVC and VP9 are running on the same hardware and share the same chroma and motion vectors offset constraint. Create common helpers functions for these computation. Source and destination buffer height may not be the same because alignment constraint are different so use destination height to compute chroma offset because we target this buffer as hardware output. To be able to use the helpers in both VP9 HEVC code remove dec_params and use context->bit_depth instead. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Reviewed-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> CC: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> CC: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Stable-dep-of: 19c286b75507 ("media: verisilicon: Fix CPU stalls on G2 bus error") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	media: verisilicon: Store chroma and motion vectors offset	Benjamin Gaignard	2	-2/+6
	[ Upstream commit 545bf944f978b7468d3a6bd668d9ff6953bc542e ] Store computed values of chroma and motion vectors offset because they depends on width and height values which change if the resolution change. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Reviewed-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> CC: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> CC: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Stable-dep-of: 19c286b75507 ("media: verisilicon: Fix CPU stalls on G2 bus error") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	net: dsa: sja1105: fix kasan out-of-bounds warning in ↵	Vladimir Oltean	1	-2/+4
	sja1105_table_delete_entry() [ Upstream commit 5f2b28b79d2d1946ee36ad8b3dc0066f73c90481 ] There are actually 2 problems: - deleting the last element doesn't require the memmove of elements [i + 1, end) over it. Actually, element i+1 is out of bounds. - The memmove itself should move size - i - 1 elements, because the last element is out of bounds. The out-of-bounds element still remains out of bounds after being accessed, so the problem is only that we touch it, not that it becomes in active use. But I suppose it can lead to issues if the out-of-bounds element is part of an unmapped page. Fixes: 6666cebc5e30 ("net: dsa: sja1105: Add support for VLAN operations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250318115716.2124395-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Chen Yu <xnguchen@sina.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	drm/amd/display: Fix null pointer deref in dcn20_resource.c	Aurabindo Pillai	1	-4/+5
	[ Upstream commit ecbf60782662f0a388493685b85a645a0ba1613c ] Fixes a hang thats triggered when MPV is run on a DCN401 dGPU: mpv --hwdec=vaapi --vo=gpu --hwdec-codecs=all and then enabling fullscreen playback (double click on the video) The following calltrace will be seen: [ 181.843989] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 181.843997] #PF: supervisor instruction fetch in kernel mode [ 181.844003] #PF: error_code(0x0010) - not-present page [ 181.844009] PGD 0 P4D 0 [ 181.844020] Oops: 0010 [#1] PREEMPT SMP NOPTI [ 181.844028] CPU: 6 PID: 1892 Comm: gnome-shell Tainted: G W OE 6.5.0-41-generic #41~22.04.2-Ubuntu [ 181.844038] Hardware name: System manufacturer System Product Name/CROSSHAIR VI HERO, BIOS 6302 10/23/2018 [ 181.844044] RIP: 0010:0x0 [ 181.844079] Code: Unable to access opcode bytes at 0xffffffffffffffd6. [ 181.844084] RSP: 0018:ffffb593c2b8f7b0 EFLAGS: 00010246 [ 181.844093] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004 [ 181.844099] RDX: ffffb593c2b8f804 RSI: ffffb593c2b8f7e0 RDI: ffff9e3c8e758400 [ 181.844105] RBP: ffffb593c2b8f7b8 R08: ffffb593c2b8f9c8 R09: ffffb593c2b8f96c [ 181.844110] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb593c2b8f9c8 [ 181.844115] R13: 0000000000000001 R14: ffff9e3c88000000 R15: 0000000000000005 [ 181.844121] FS: 00007c6e323bb5c0(0000) GS:ffff9e3f85f80000(0000) knlGS:0000000000000000 [ 181.844128] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 181.844134] CR2: ffffffffffffffd6 CR3: 0000000140fbe000 CR4: 00000000003506e0 [ 181.844141] Call Trace: [ 181.844146] <TASK> [ 181.844153] ? show_regs+0x6d/0x80 [ 181.844167] ? __die+0x24/0x80 [ 181.844179] ? page_fault_oops+0x99/0x1b0 [ 181.844192] ? do_user_addr_fault+0x31d/0x6b0 [ 181.844204] ? exc_page_fault+0x83/0x1b0 [ 181.844216] ? asm_exc_page_fault+0x27/0x30 [ 181.844237] dcn20_get_dcc_compression_cap+0x23/0x30 [amdgpu] [ 181.845115] amdgpu_dm_plane_validate_dcc.constprop.0+0xe5/0x180 [amdgpu] [ 181.845985] amdgpu_dm_plane_fill_plane_buffer_attributes+0x300/0x580 [amdgpu] [ 181.846848] fill_dc_plane_info_and_addr+0x258/0x350 [amdgpu] [ 181.847734] fill_dc_plane_attributes+0x162/0x350 [amdgpu] [ 181.848748] dm_update_plane_state.constprop.0+0x4e3/0x6b0 [amdgpu] [ 181.849791] ? dm_update_plane_state.constprop.0+0x4e3/0x6b0 [amdgpu] [ 181.850840] amdgpu_dm_atomic_check+0xdfe/0x1760 [amdgpu] Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David Nyström <david.nystrom@est.tech> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 days	usb: xhci: Apply the link chain quirk on NEC isoc endpoints	Michal Pecio	1	-2/+11
	commit bb0ba4cb1065e87f9cc75db1fa454e56d0894d01 upstream. Two clearly different specimens of NEC uPD720200 (one with start/stop bug, one without) were seen to cause IOMMU faults after some Missed Service Errors. Faulting address is immediately after a transfer ring segment and patched dynamic debug messages revealed that the MSE was received when waiting for a TD near the end of that segment: [ 1.041954] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ffa08fe0 [ 1.042120] xhci_hcd: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09000 flags=0x0000] [ 1.042146] xhci_hcd: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09040 flags=0x0000] It gets even funnier if the next page is a ring segment accessible to the HC. Below, it reports MSE in segment at ff1e8000, plows through a zero-filled page at ff1e9000 and starts reporting events for TRBs in page at ff1ea000 every microframe, instead of jumping to seg ff1e6000. [ 7.041671] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0 [ 7.041999] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0 [ 7.042011] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint [ 7.042028] xhci_hcd: All TDs skipped for slot 1 ep 2. Clear skip flag. [ 7.042134] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint [ 7.042138] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31 [ 7.042144] xhci_hcd: Looking for event-dma 00000000ff1ea040 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 [ 7.042259] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint [ 7.042262] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31 [ 7.042266] xhci_hcd: Looking for event-dma 00000000ff1ea050 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 At some point completion events change from Isoch Buffer Overrun to Short Packet and the HC finally finds cycle bit mismatch in ff1ec000. [ 7.098130] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 [ 7.098132] xhci_hcd: Looking for event-dma 00000000ff1ecc50 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 [ 7.098254] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 [ 7.098256] xhci_hcd: Looking for event-dma 00000000ff1ecc60 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 [ 7.098379] xhci_hcd: Overrun event on slot 1 ep 2 It's possible that data from the isochronous device were written to random buffers of pending TDs on other endpoints (either IN or OUT), other devices or even other HCs in the same IOMMU domain. Lastly, an error from a different USB device on another HC. Was it caused by the above? I don't know, but it may have been. The disk was working without any other issues and generated PCIe traffic to starve the NEC of upstream BW and trigger those MSEs. The two HCs shared one x1 slot by means of a commercial "PCIe splitter" board. [ 7.162604] usb 10-2: reset SuperSpeed USB device number 3 using xhci_hcd [ 7.178990] sd 9:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s [ 7.179001] sd 9:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 04 02 ae 00 00 02 00 00 [ 7.179004] I/O error, dev sdb, sector 67284480 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0 Fortunately, it appears that this ridiculous bug is avoided by setting the chain bit of Link TRBs on isochronous rings. Other ancient HCs are known which also expect the bit to be set and they ignore Link TRBs if it's not. Reportedly, 0.95 spec guaranteed that the bit is set. The bandwidth-starved NEC HC running a 32KB/uframe UVC endpoint reports tens of MSEs per second and runs into the bug within seconds. Chaining Link TRBs allows the same workload to run for many minutes, many times. No negative side effects seen in UVC recording and UAC playback with a few devices at full speed, high speed and SuperSpeed. The problem doesn't reproduce on the newer Renesas uPD720201/uPD720202 and on old Etron EJ168 and VIA VL805 (but the VL805 has other bug). [shorten line length of log snippets in commit messge -Mathias] Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-14-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [Shivani: Modified to apply on 6.6.y] Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>