summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)AuthorFilesLines
3 daysmedia: vidtv: fix NULL pointer dereference in vidtv_mux_push_siRuslan Valiyev1-1/+7
commit 7d8bf3d8f91073f4db347ed3aa6302b56107499c upstream. syzbot reported a general protection fault in vidtv_psi_ts_psi_write_into [1]. vidtv_mux_get_pid_ctx() can return NULL, but vidtv_mux_push_si() does not check for this before dereferencing the returned pointer to access the continuity counter. This leads to a general protection fault when accessing a near-NULL address. The root cause is that vidtv_mux_pid_ctx_init() does not check the return value of vidtv_mux_create_pid_ctx_once() for PMT section PIDs. If the allocation fails, the PID context is never created, but init returns success. The subsequent vidtv_mux_push_si() call then gets NULL from vidtv_mux_get_pid_ctx() and crashes. Fix both the root cause (add error check in vidtv_mux_pid_ctx_init for PMT PIDs) and add defensive NULL checks in vidtv_mux_push_si for all vidtv_mux_get_pid_ctx() calls. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] Workqueue: events vidtv_mux_tick RIP: 0010:vidtv_psi_ts_psi_write_into+0x54a/0xbc0 drivers/media/test-drivers/vidtv/vidtv_psi.c:197 Call Trace: <TASK> vidtv_psi_table_header_write_into drivers/media/test-drivers/vidtv/vidtv_psi.c:799 [inline] vidtv_psi_pmt_write_into+0x3b2/0xa70 drivers/media/test-drivers/vidtv/vidtv_psi.c:1231 vidtv_mux_push_si+0x932/0xe80 drivers/media/test-drivers/vidtv/vidtv_mux.c:196 vidtv_mux_tick+0xe9b/0x1480 drivers/media/test-drivers/vidtv/vidtv_mux.c:408 Fixes: f90cf6079bf67 ("media: vidtv: add a bridge driver") Cc: stable@vger.kernel.org Reported-by: syzbot+814c351d094f4f1a1b86@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=814c351d094f4f1a1b86 Signed-off-by: Ruslan Valiyev <linuxoid@gmail.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysdrivers/base/memory: set mem->altmap after successful device registrationGeorgi Djakov1-1/+2
commit a2b8d7827f48ee54a686cb80e4a1d0ff954ec42a upstream. If __add_memory_block() fails at xa_store() (under memory pressure for example), device_unregister() is called, which eventually triggers memory_block_release() with mem->altmap still set, causing a WARN_ON(mem->altmap). This was triggered by modifying virtio-mem driver. Fix this by delaying the assignment of mem->altmap until after __add_memory_block() has succeeded. Link: https://lore.kernel.org/20260514092657.3057141-1-georgi.djakov@oss.qualcomm.com Fixes: 1a8c64e11043 ("mm/memory_hotplug: embed vmem_altmap details in memory block") Signed-off-by: Georgi Djakov <georgi.djakov@oss.qualcomm.com> Acked-by: Oscar Salvador (SUSE) <osalvador@kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Richard Cheng <icheng@nvidia.com> Cc: David Hildenbrand <david@kernel.org> Cc: Georgi Djakov <djakov@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysserial: qcom_geni: Fix RX DMA stall when SE_DMA_RX_LEN_IN is zeroViken Dadhaniya1-6/+3
commit b93062b6d8a1b2d9bad235cac25558a909819026 upstream. In qcom_geni_serial_handle_rx_dma(), geni_se_rx_dma_unprep() clears port->rx_dma_addr before SE_DMA_RX_LEN_IN is read. If the register is zero, for example when the RX stale counter fires on an idle line, the handler returns without calling geni_se_rx_dma_prep(). The next RX DMA interrupt then hits the !port->rx_dma_addr guard and returns immediately, so the RX DMA buffer is never rearmed and later input is lost. Keep the handler on the rearm path when rx_in is zero. Warn about the unexpected zero-length DMA completion, skip received-data handling, and always call geni_se_rx_dma_prep(). Fixes: 2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA") Cc: stable@vger.kernel.org Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com> Link: https://patch.msgid.link/20260528-serial-rx-0-byte-fix-v2-1-b4195cfe342f@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysvc_screen: fix null-ptr-deref in vcs_notifier() during concurrent vcs_writeYi Yang1-1/+1
commit a287620312dc6dcb9a093417a0e589bf30fcf38a upstream. A KASAN null-ptr-deref was observed in vcs_notifier(): BUG: KASAN: null-ptr-deref in vcs_notifier+0x98/0x130 Read of size 2 at addr qmp_cmd_name: qmp_capabilities, arguments: {} The issue is a race condition in vcs_write(). When the console_lock is temporarily dropped (to copy data from userspace), the vc_data pointer obtained from vcs_vc() may become stale. After re-acquiring the lock, vcs_vc() is called again to re-validate the pointer. If the vc has been deallocated in the meantime, vcs_vc() returns NULL, and the while loop breaks (with written > 0). However, after the loop, vcs_scr_updated(vc) is still called with the now-NULL vc pointer, leading to a null pointer dereference in the notifier chain (vcs_notifier dereferences param->vc). Fix this by adding a NULL check for vc before calling vcs_scr_updated(). Fixes: 8fb9ea65c9d1 ("vc_screen: reload load of struct vc_data pointer in vcs_write() to avoid UAF") Cc: stable@vger.kernel.org Signed-off-by: Yi Yang <yiyang13@huawei.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://patch.msgid.link/20260604060734.2914976-1-yiyang13@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 dayscrypto: qat - remove unused character device and IOCTLsGiovanni Cabiddu7-557/+1
commit d237230728c567297f2f98b425d63156ab2ed17f upstream. The QAT driver exposes a character device (qat_adf_ctl) with IOCTLs for device configuration, start, stop, status query and enumeration. These IOCTLs are not part of any public uAPI header and have no known in-tree or out-of-tree users. Device lifecycle is already managed via sysfs. The ioctl interface also increases the attack surface and is the subject of a number of bug reports. Remove the character device, the IOCTL definitions, and the related data structures (adf_dev_status_info, adf_user_cfg_key_val, adf_user_cfg_section, adf_user_cfg_ctl_data). Drop the now-unused adf_cfg_user.h header and strip adf_ctl_drv.c down to the minimal module_init/module_exit hooks for workqueue, AER, and crypto/compression algorithm registration. Clean up leftover dead code that was only reachable from the removed IOCTL paths: adf_cfg_del_all(), adf_devmgr_verify_id(), adf_devmgr_get_num_dev(), adf_devmgr_get_dev_by_id(), adf_get_vf_real_id() and the unused ADF_CFG macros. Additionally, drop the entry associated to QAT IOCTLs in ioctl-number.rst. Cc: stable@vger.kernel.org Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework") Reported-by: Zhi Wang <wangzhi@stu.xidian.edu.cn> Reported-by: Bin Yu <byu@xidian.edu.cn> Reported-by: MingYu Wang <w15303746062@163.com> Closes: https://lore.kernel.org/all/61d6d499.ab89.19b9b7f3186.Coremail.wangzhi_xd@stu.xidian.edu.cn/ Link: https://lore.kernel.org/all/20260508034841.256794-1-w15303746062@163.com/ Link: https://lore.kernel.org/all/20260508023542.256299-1-w15303746062@163.com/ Link: https://lore.kernel.org/all/20260504025120.98242-1-w15303746062@163.com/ Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Ahsan Atta <ahsan.atta@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysiio: adc: ti-ads1298: add bounds check to pga_settings indexSam Daly1-1/+6
commit 95e8a48d7a85d4226934020e57815a3316d3a14b upstream. ads1298_pga_settings has 7 elements but ADS1298_MASK_CH_PGA can yield values 0-7. If it yields a value >= 7, this causes an out-of-bounds array access. Add a bounds check and return -EINVAL if the index is out of range. Note that the remaining value b111 is reserved so should not be seen in a correctly functioning system. Assisted-by: gkh_clanker_2000 Cc: stable <stable@kernel.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: David Lechner <dlechner@baylibre.com> Cc: "Nuno Sá" <nuno.sa@analog.com> Cc: Andy Shevchenko <andy@kernel.org> Signed-off-by: Sam Daly <sam@samdaly.ie> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysiio: light: veml6075: add bounds check to veml6075_it_ms indexSam Daly1-2/+6
commit 307dc4240bd41852d9e0912921e298160db1c109 upstream. veml6075_it_ms has 5 elements but VEML6075_CONF_IT can yield values 0-7. If it returns a value >= 5, this causes an out-of-bounds array access. Add a bounds check and return -EINVAL if the index is out of range. The problem values are reserved so should never be read from the register. Hence this is hardening against fault device, missprogramming or bus corruption. Assisted-by: gkh_clanker_2000 Cc: stable <stable@kernel.org> Signed-off-by: Sam Daly <sam@samdaly.ie> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysnet: net_failover: Fix the deadlock in slave registerFaicker Mo1-6/+6
commit b84c5632c7b31f8910167075a8128cfb9e50fcfe upstream. There is netdev_lock_ops() before the NETDEV_REGISTER notifier in register_netdevice(), so use the non-locking functions in net_failover_slave_register(). failover_slave_register() in failover_existing_slave_register() adds lock and unlock ops too. Call Trace: <TASK> __schedule+0x30d/0x7a0 schedule+0x27/0x90 schedule_preempt_disabled+0x15/0x30 __mutex_lock.constprop.0+0x538/0x9e0 __mutex_lock_slowpath+0x13/0x20 mutex_lock+0x3b/0x50 dev_set_mtu+0x40/0xe0 net_failover_slave_register+0x24/0x280 failover_slave_register+0x103/0x1b0 failover_event+0x15e/0x210 ? dropmon_net_event+0xac/0xe0 notifier_call_chain+0x5e/0xe0 raw_notifier_call_chain+0x16/0x30 call_netdevice_notifiers_info+0x52/0xa0 register_netdevice+0x5f4/0x7c0 register_netdev+0x1e/0x40 _mlx5e_probe+0xe2/0x370 [mlx5_core] mlx5e_probe+0x59/0x70 [mlx5_core] ? __pfx_mlx5e_probe+0x10/0x10 [mlx5_core] Fixes: 4c975fd70002 ("net: hold instance lock during NETDEV_REGISTER/UP") Signed-off-by: Faicker Mo <faicker.mo@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysfirmware: samsung: acpm: Fix cross-thread RX length corruptionTudor Ambarus1-7/+7
[ Upstream commit f133bd4b5daf71bccdde0ad1a4f47fac76a6bfb1 ] Sashiko identified a cross-thread RX length corruption bug when reviewing the thermal addition to ACPM [1]. When multiple threads concurrently send IPC requests, the ACPM polling mechanism can encounter responses belonging to other threads. To drain the queue, the driver saves these concurrent responses into an internal cache (`rx_data->cmd`) to be retrieved later by the owning thread. Previously, the driver incorrectly used `xfer->rxcnt` (the expected receive length of the *current* polling thread) when copying data for *other* threads into this cache. If the threads expected responses of different lengths, this resulted in buffer underflows (leading to reads of uninitialized memory) or potential buffer overflows. Fix this by replacing the boolean `response` flag in `struct acpm_rx_data` with `rxcnt`, caching the exact expected receive length for each specific transaction during transfer preparation. Use this cached length when saving concurrent responses. Consequently, ensure that `xfer->rxcnt` is explicitly zeroed in driver helpers (e.g., `acpm_dvfs_set_xfer`) for fire-and-forget messages to prevent uninitialized stack garbage from being interpreted as a massive expected receive length. Cc: stable@vger.kernel.org Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver") Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1] Reported-by: Titouan Ameline de Cadeville <titouan.ameline@gmail.com> Closes: https://lore.kernel.org/r/20260426210255.73674-1-titouan.ameline@gmail.com/ Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Link: https://patch.msgid.link/20260505-acpm-fixes-sashiko-reports-v5-1-43b5ee7f1674@linaro.org Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysDrivers: hv: vmbus: Improve the logic of reserving fb_mmio on Gen2 VMsDexuan Cui1-3/+26
[ Upstream commit 016a25e4b0df4d77e7c258edee4aaf982e4ee809 ] If vmbus_reserve_fb() in the kdump/kexec kernel fails to properly reserve the framebuffer MMIO range (which is below 4GB) due to a Gen2 VM's screen.lfb_base being zero [1], there is an MMIO conflict between the drivers hyperv-drm and pci-hyperv: when the driver pci-hyperv's hv_allocate_config_window() calls vmbus_allocate_mmio() to get an MMIO range, typically it gets a 32-bit MMIO range that overlaps with the framebuffer MMIO range, and later hv_pci_enter_d0() fails with an error message "PCI Pass-through VSP failed D0 Entry with status" since the host thinks that PCI devices must not use MMIO space that the host has assigned to the framebuffer. This is especially an issue if pci-hyperv is built-in and hyperv-drm is built as a module. Consequently, the kdump/kexec kernel fails to detect PCI devices via pci-hyperv, and may fail to mount the root file system, which may reside in a NVMe disk. The issue described here has existed for SR-IOV VF NICs since day one of the pci-hyperv driver, and has been worked around on x64 when possible. With the recent introduction of ARM64 VMs that boot from NVMe, there is no workaround, so we need a formal fix. On Gen2 VMs, if the screen.lfb_base is 0 in the kdump/kexec kernel [1], fall back to the low MMIO base, which should be equal to the framebuffer MMIO base [2] (the statement is true according to my testing on x64 Windows Server 2016, and on x64 and ARM64 Windows Server 2025 and on Azure. I checked with the Hyper-V team and they said the statement should continue to be true for Gen2 VMs). In the first kernel, screen.lfb_base is not 0; if the user specifies a very high resolution, it's not enough to only reserve 8MB: let's always reserve half of the space below 4GB, but cap the reservation to 128MB, which is the required framebuffer size of the highest resolution 7680*4320 supported by Hyper-V. While at it, fix the comparison "end > VTPM_BASE_ADDRESS" by changing the > to >=. Here the 'end' is an inclusive end (typically, it's 0xFFFF_FFFF for the low MMIO range). Note: vmbus_reserve_fb() now also reserves an MMIO range at the beginning of the low MMIO range on CVMs, which have no framebuffers (the 'screen.lfb_base' in vmbus_reserve_fb() is 0 for CVMs), just in case the host might treat the beginning of the low MMIO range specially [3]. BTW, the OpenHCL kernel is not affected by the change, because that kernel boots with DeviceTree rather than ACPI (so vmbus_reserve_fb() won't run there), and there is no framebuffer device for that kernel. Note: normally Gen1 VMs don't have the MMIO conflict issue because the framebuffer MMIO range (which is hardcoded to base=4GB-128MB and size=64MB for Gen1 VMs by the host) is always reported via the legacy PCI graphics device's BAR, so the kdump/kexec kernel can reserve the 64MB MMIO range; however, if the VM is configured to use a very high resolution and the required framebuffer size exceeds 64MB (AFAIK, in practice, this isn't a typical configuration by users), the hyperv-drm driver may need to allocate an MMIO range above 4GB and change the framebuffer MMIO location to the allocated MMIO range -- in this case, there can still be issues [4] which can't be easily fixed: any possible affected Gen1 users would have to use a resolution whose framebuffer size is <= 64MB, or switch to Gen2 VMs. [1] https://lore.kernel.org/all/SA1PR21MB692176C1BC53BFC9EAE5CF8EBF51A@SA1PR21MB6921.namprd21.prod.outlook.com/ [2] https://lore.kernel.org/all/SA1PR21MB69218F955B62DFF62E3E88D2BF222@SA1PR21MB6921.namprd21.prod.outlook.com/ [3] https://lore.kernel.org/all/SN6PR02MB415726B17D5A6027CD1717E8D4342@SN6PR02MB4157.namprd02.prod.outlook.com/ [4] https://lore.kernel.org/all/SA1PR21MB69213486F821CA5A2C793C81BF342@SA1PR21MB6921.namprd21.prod.outlook.com/ Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") CC: stable@vger.kernel.org Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Krister Johansen <kjlx@templeofstupid.com> Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 dayshv: utils: handle and propagate errors in kvp_registerThorsten Blum1-12/+13
[ Upstream commit 3fcf923302a8f5c0dc3af3d2ca2657cb5fae4297 ] Make kvp_register() return an error code instead of silently ignoring failures, and propagate the error from kvp_handle_handshake() instead of returning success. This propagates both kzalloc_obj() and hvutil_transport_send() failures to kvp_handle_handshake() and thus to kvp_on_msg(). Fixes: 245ba56a52a3 ("Staging: hv: Implement key/value pair (KVP)") Cc: stable@vger.kernel.org Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysregulator: core: fix locking in regulator_resolve_supply() error pathAndré Draszik1-1/+9
commit 497330b203d2c59c5ff3fa4c34d14494d7203bc3 upstream. If late enabling of a supply regulator fails in regulator_resolve_supply(), the code currently triggers a lockdep warning: WARNING: drivers/regulator/core.c:2649 at _regulator_put+0x80/0xa0, CPU#6: kworker/u32:4/596 ... Call trace: _regulator_put+0x80/0xa0 (P) regulator_resolve_supply+0x7cc/0xbe0 regulator_register_resolve_supply+0x28/0xb8 as the regulator_list_mutex must be held when calling _regulator_put(). To solve this, simply switch to using regulator_put(). While at it, we should also make sure that no concurrent access happens to our rdev while we clear out the supply pointer. Add appropriate locking to ensure that. While the code in question will be removed altogether in a follow-up commit, I believe it is still beneficial to have this corrected before removal for future reference. Fixes: 36a1f1b6ddc6 ("regulator: core: Fix memory leak in regulator_resolve_supply()") Fixes: 8e5356a73604 ("regulator: core: Clear the supply pointer if enabling fails") Signed-off-by: André Draszik <andre.draszik@linaro.org> Link: https://patch.msgid.link/20260109-regulators-defer-v2-2-1a25dc968e60@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Nazar Kalashnikov <nazarkalashnikov0@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysACPI: scan: Use async schedule function in acpi_scan_clear_dep_fn()Yicong Yang1-26/+15
[ Upstream commit 7cf28b3797a81b616bb7eb3e90cf131afc452919 ] The device object rescan in acpi_scan_clear_dep_fn() is scheduled on a system workqueue which is not guaranteed to be finished before entering userspace. This may cause some key devices to be missing when userspace init task tries to find them. Two issues observed on RISCV platforms: - Kernel panic due to userspace init cannot have an opened console. The console device scanning is queued by acpi_scan_clear_dep_queue() and not finished by the time userspace init process running, thus by the time userspace init runs, no console is present. - Entering rescue shell due to the lack of root devices (PCIe nvme in our case). Same reason as above, the PCIe host bridge scanning is queued on a system workqueue and finished after init process runs. The reason is because both devices (console, PCIe host bridge) depend on riscv-aplic irqchip to serve their interrupts (console's wired interrupt and PCI's INTx interrupts). In order to keep the dependency, these devices are scanned and created after initializing riscv-aplic. The riscv-aplic is initialized in device_initcall() and a device scan work is queued via acpi_scan_clear_dep_queue(), which is close to the time userspace init process is run. Since system_dfl_wq is used in acpi_scan_clear_dep_queue() with no synchronization, the issues will happen if userspace init runs before these devices are ready. The solution is to wait for the queued work to complete before entering userspace init. One possible way would be to use a dedicated workqueue instead of system_dfl_wq, and explicitly flush it somewhere in the initcall stage before entering userspace. Another way is to use async_schedule_dev_nocall() for scanning these devices. It's designed for asynchronous initialization and will work in the same way as before because it's using a dedicated unbound workqueue as well, but the kernel init code calls async_synchronize_full() right before entering userspace init which will wait for the work to complete. Compared to a dedicated workqueue, the second approach is simpler because the async schedule framework takes care of all of the details. The ACPI code only needs to focus on its job. A dedicated workqueue for this could also be redundant because some platforms don't need acpi_scan_clear_dep_queue() for their device scanning. Signed-off-by: Yicong Yang <yang.yicong@picoheart.com> [ rjw: Subject adjustment, changelog edits ] Link: https://patch.msgid.link/20260128132848.93638-1-yang.yicong@picoheart.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ Vivian: Adjust system_dfl_wq -> system_unbound_wq in removed lines ] Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
3 daysagp/amd64: Fix broken error propagation in agp_amd64_probe()Mingyu Wang1-1/+1
commit b08472db93b1ccff84a7adec5779d47f0e9d3a30 upstream. A NULL pointer dereference was observed in the AMD64 AGP driver when running in a virtualized environment (e.g. qemu/kvm) without a physical AMD northbridge. The crash occurs in amd64_fetch_size() when attempting to dereference the pointer returned by node_to_amd_nb(0). The root cause of this crash is broken error propagation in agp_amd64_probe(): When no AMD northbridges are found, cache_nbs() correctly returns -ENODEV. However, the probe function erroneously checks the return value against exactly -1, rather than < 0. As a result, the hardware absence error is masked, allowing the driver to improperly proceed with initialization. It eventually calls agp_add_bridge(), which invokes amd64_fetch_size(). Since the hardware does not exist, node_to_amd_nb(0) returns NULL, leading to a General Protection Fault (GPF) when accessing its ->misc member. Fix the issue by correcting the error check in agp_amd64_probe() to abort properly when cache_nbs() returns any negative error code. This prevents the driver from erroneously proceeding without hardware, thereby avoiding the subsequent NULL pointer dereference at its source. Fixes: a32073bffc65 ("[PATCH] x86_64: Clean and enhance up K8 northbridge access code") Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn> Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v2.6.18+ Link: https://patch.msgid.link/20260504074823.99377-1-w15303746062@163.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysnet: qualcomm: rmnet: fix endpoint use-after-free in rmnet_dellink()Weiming Shi2-4/+5
commit d00c953a8f69921f484b629801766da68f27f658 upstream. rmnet_dellink() removes the endpoint from the hash table with hlist_del_init_rcu() and then immediately frees it with kfree(). However, RCU readers on the receive path (rmnet_rx_handler -> __rmnet_map_ingress_handler) may still hold a reference to the endpoint and dereference ep->egress_dev after the memory has been freed. The endpoint is a kmalloc-32 object, and the stale read at offset 8 corresponds to the egress_dev pointer. BUG: unable to handle page fault for address: ffffffffde942eef Oops: 0002 [#1] SMP NOPTI CPU: 1 UID: 0 PID: 137 Comm: poc_write Not tainted 7.0.0+ #4 PREEMPTLAZY RIP: 0010:rmnet_vnd_rx_fixup (rmnet_vnd.c:27) Call Trace: <TASK> __rmnet_map_ingress_handler (rmnet_handlers.c:48 rmnet_handlers.c:101) rmnet_rx_handler (rmnet_handlers.c:129 rmnet_handlers.c:235) __netif_receive_skb_core.constprop.0 (net/core/dev.c:6096) __netif_receive_skb_one_core (net/core/dev.c:6208) netif_receive_skb (net/core/dev.c:6467) tun_get_user (drivers/net/tun.c:1955) tun_chr_write_iter (drivers/net/tun.c:2003) vfs_write (fs/read_write.c:688) ksys_write (fs/read_write.c:740) </TASK> Add an rcu_head field to struct rmnet_endpoint and replace kfree() with kfree_rcu() so the endpoint memory remains valid through the RCU grace period. Also remove the rmnet_vnd_dellink() call and inline only the nr_rmnet_devs decrement, since rmnet_vnd_dellink() would set ep->egress_dev to NULL during the grace period, creating a data race with lockless readers. Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Link: https://patch.msgid.link/20260514122511.3083479-2-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysi2c: stub: Reject I2C block transfers with invalid lengthWeiming Shi1-0/+5
commit 6036b5067a8199ba7a2dc7b377d4b9dd276d5f9e upstream. The I2C_SMBUS_I2C_BLOCK_DATA case in stub_xfer() uses data->block[0] as the transfer length. The existing check only clamps it to avoid overrunning the chip->words[256] register array, but does not validate it against I2C_SMBUS_BLOCK_MAX (32), which is the limit of the union i2c_smbus_data.block buffer (34 bytes total). The driver is a development/test tool (CONFIG_I2C_STUB=m, not built by default) that must be loaded with a chip_addr= parameter. A local user with access to /dev/i2c-* can issue an I2C_SMBUS ioctl with I2C_SMBUS_I2C_BLOCK_DATA and data->block[0] > 32, causing stub_xfer() to read or write past the end of the union i2c_smbus_data.block buffer: BUG: KASAN: stack-out-of-bounds in stub_xfer (drivers/i2c/i2c-stub.c:223) Read of size 1 at addr ffff88800abcfd92 by task exploit/81 Call Trace: <TASK> stub_xfer (drivers/i2c/i2c-stub.c:223) __i2c_smbus_xfer (drivers/i2c/i2c-core-smbus.c:593) i2c_smbus_xfer (drivers/i2c/i2c-core-smbus.c:536) i2cdev_ioctl_smbus (drivers/i2c/i2c-dev.c:391) i2cdev_ioctl (drivers/i2c/i2c-dev.c:478) __x64_sys_ioctl (fs/ioctl.c:583) do_syscall_64 (arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) </TASK> The bug exists because i2c-stub implements .smbus_xfer directly, bypassing the I2C_SMBUS_BLOCK_MAX validation in i2c_smbus_xfer_emulated(). The I2C_SMBUS_BLOCK_DATA case in the same function correctly validates against I2C_SMBUS_BLOCK_MAX, but the I2C_SMBUS_I2C_BLOCK_DATA case does not. Fix by rejecting transfers with data->block[0] == 0 or data->block[0] > I2C_SMBUS_BLOCK_MAX with -EINVAL, consistent with both the I2C_SMBUS_BLOCK_DATA case in the same function and the I2C_SMBUS_I2C_BLOCK_DATA validation in i2c_smbus_xfer_emulated(). Fixes: 4710317891e4 ("i2c-stub: Implement I2C block support") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysRDMA/bnxt_re: zero shared page before exposing to userspaceLord Ulf Henrik Holmberg1-1/+1
commit f6b079629becfa977f9c51fe53ad2e6dcc55ef44 upstream. bnxt_re_alloc_ucontext() allocates uctx->shpg via __get_free_page(GFP_KERNEL). The buddy allocator does not zero pages without __GFP_ZERO, so the page contains stale kernel data from whatever object most recently freed it. The page is then mapped into userspace via vm_insert_page() under BNXT_RE_MMAP_SH_PAGE in bnxt_re_mmap(). The driver only ever writes 4 bytes (a u32 AVID) at offset BNXT_RE_AVID_OFFT (0x10) inside bnxt_re_create_ah(); the remaining 4092 bytes of the page are exposed to userspace unsanitised, leaking kernel memory contents. Any user with access to /dev/infiniband/uverbsX on a host with a bnxt_re device (typically rdma group membership) can read this data via a single mmap() at pgoff 0 after IB_USER_VERBS_CMD_GET_CONTEXT. Other shared pages in the same file already use get_zeroed_page() correctly: drivers/infiniband/hw/bnxt_re/ib_verbs.c srq->uctx_srq_page = (void *)get_zeroed_page(GFP_KERNEL); cq->uctx_cq_page = (void *)get_zeroed_page(GFP_KERNEL); uctx->shpg is the only outlier. Bring it in line with the existing convention by switching to get_zeroed_page(). Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Lord Ulf Henrik Holmberg <henrik.holmberg@defensify.se> Link: https://patch.msgid.link/20260509084011.11971-1-pomzm67@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 daysnet: stmmac: fix stm32 (and potentially others) resume regressionRussell King (Oracle)1-1/+2
[ Upstream commit dbbec8c5a79f4c7aa8d07da8c0b5a34d76c50699 ] Marek reported that suspending stm32 causes the following errors when the interface is administratively down: $ echo devices > /sys/power/pm_test $ echo mem > /sys/power/state ... ck_ker_eth2stp already disabled ... ck_ker_eth2stp already unprepared ... On suspend, stm32 starts the eth2stp clock in its suspend method, and stops it in the resume method. This is because the blamed commit omits the call to the platform glue ->suspend() method, but does make the call to the platform glue ->resume() method. This problem affects all other converted drivers as well - e.g. looking at the PCIe drivers, pci_save_state() will not be called, but pci_restore_state() will be. Similar issues affect all other drivers. Fix this by always calling the ->suspend() method, even when the network interface is down. This fixes all the conversions to the platform glue ->suspend() and ->resume() methods. Link: https://lore.kernel.org/r/20260114081809.12758-1-marex@nabladev.com Fixes: 07bbbfe7addf ("net: stmmac: add suspend()/resume() platform ops") Reported-by: Marek Vasut <marex@nabladev.com> Tested-by: Marek Vasut <marex@nabladev.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vlujh-00000007Hkw-2p6r@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
11 daysRDMA/umem: Fix truncation for block sizes >= 4GJason Gunthorpe1-2/+2
[ Upstream commit 15fe76e23615f502d051ef0768f86babaf08746c ] When the iommu is used the linearization of the mapping can give a single block that is very large split across multiple SG entries. When __rdma_block_iter_next() reassembles the split SG entries it is overflowing the 32 bit stack values and computed the wrong DMA addresses for blocks after the truncation. Use the right types to hold DMA addresses. Link: https://patch.msgid.link/r/1-v1-88303e9e509f+f7-ib_umem_types_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: a808273a495c ("RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysRDMA: Move DMA block iterator logic into dedicated filesLeon Romanovsky18-51/+59
[ Upstream commit 6094ea64c69520ed1e770e7c79c43412de202bfa ] The DMA iterator logic was mixed into verbs and umem-specific code, forcing all users to include rdma/ib_umem.h. Move the block iterator logic into iter.c and rdma/iter.h so that rdma/ib_umem.h and rdma/ib_verbs.h can be separated in a follow-up patch. Link: https://patch.msgid.link/20260213-refactor-umem-v1-1-f3be85847922@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Stable-dep-of: 15fe76e23615 ("RDMA/umem: Fix truncation for block sizes >= 4G") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysRDMA: During rereg_mr ensure that REREG_ACCESS is compatibleJason Gunthorpe6-0/+37
[ Upstream commit badad6fad60def1b9805559dd81dbab3d97b82aa ] If IB_MR_REREG_ACCESS changes from RO to RW then the umem has to be re-evaluated to ensure it is properly pinned as RW. Since the umem is hidden inside each driver's mr struct add a ib_umem_check_rereg() function that each driver has to call before processing IB_MR_REREG_ACCESS. mlx4 has to retain its duplicate ib_access_writable check because it implements IB_MR_REREG_ACCESS | IB_MR_REREG_TRANS by changing both items in place sequentially while the MR is live, so it will continue to not support this combination. Cc: stable@vger.kernel.org Fixes: b40656aa7d55 ("RDMA/umem: remove FOLL_FORCE usage") Link: https://patch.msgid.link/r/0-v1-06fb1a2d6cf5+107-rereg_access_jgg@nvidia.com Reported-by: Philip Tsukerman <philiptsukerman@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysRDMA/umem: Add helpers for umem dmabuf revoke lockJacob Moroni1-0/+16
[ Upstream commit 3a0b171302eea1732a168e26db3b8461f51cc1f9 ] Added helpers to acquire and release the umem dmabuf revoke lock. The intent is to avoid the need for drivers to peek into the ib_umem_dmabuf internals to get the dma_resv_lock and bring us one step closer to abstracting ib_umem_dmabuf away from drivers in general. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260305170826.3803155-5-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: badad6fad60d ("RDMA: During rereg_mr ensure that REREG_ACCESS is compatible") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysRDMA/umem: Move umem dmabuf revoke logic into helper functionJacob Moroni1-9/+17
[ Upstream commit 797291a66ce346c96114b72222fc290d402da005 ] This same logic will eventually be reused from within the invalidate_mappings callback which already has the dma_resv_lock held, so break it out into a separate function so it can be reused. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260305170826.3803155-3-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: badad6fad60d ("RDMA: During rereg_mr ensure that REREG_ACCESS is compatible") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysRDMA/umem: Add ib_umem_dmabuf_get_pinned_and_lock helperJacob Moroni1-9/+26
[ Upstream commit 553dfa8cbd0c6d36adae042d9738ddf8f8765ac7 ] Move the inner logic of ib_umem_dmabuf_get_pinned_with_dma_device() to a new static function that returns with the lock held upon success. The intent is to allow reuse for the future get_pinned_revocable_and_lock function. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260305170826.3803155-2-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Stable-dep-of: badad6fad60d ("RDMA: During rereg_mr ensure that REREG_ACCESS is compatible") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdriver core: reject devices with unregistered busesJohan Hovold1-2/+9
commit 36f35b8df6972167102a1c3d4361e0afb6a84534 upstream. Trying to register a device on a bus which has not yet been registered used to trigger a NULL-pointer dereference, but since the const bus structure rework registration instead succeeds without the device being added to the bus. This specifically means that the device will never bind to a driver and that the bus sysfs attributes are not created (i.e. as if the device had no bus). Reject devices with unregistered buses to catch any callers that get the ordering wrong and to handle bus registration failures more gracefully. Fixes: 5221b82d46f2 ("driver core: bus: bus_add/probe/remove_device() cleanups") Cc: stable@vger.kernel.org # 6.3 Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20260430091718.230228-1-johan@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amd/display: Use krealloc_array() in dal_vector_reserve()Harry Wentland1-2/+2
commit da48bc4461b8a5ebfb9264c9b191a701d8e99009 upstream. [Why & How] dal_vector_reserve() computes the allocation size as "capacity * vector->struct_size" using uint32_t arithmetic, which can silently wrap to a small value on overflow. This would cause krealloc to return a smaller buffer than expected, leading to heap overflows on subsequent vector appends. Replace krealloc() with krealloc_array() which performs an internal overflow check and returns NULL on wrap, preventing the issue. Fixes: 2004f45ef83f ("drm/amd/display: Use kernel alloc/free") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 37668568641ccc4cc1dbca4923d0a16609dd5707) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amd/display: Fix out-of-bounds read in dp_get_eq_aux_rd_interval()Harry Wentland1-1/+1
commit e8b4d37eba05141ee01794fc6b7f2da808cee83b upstream. [Why & How] The aux_rd_interval array in struct dc_lttpr_caps is declared with MAX_REPEATER_CNT - 1 (7) elements, indexed 0..6. However, the offset parameter passed to dp_get_eq_aux_rd_interval() can be as large as MAX_REPEATER_CNT (8) when a sink reports 8 LTTPR repeaters via DPCD. This leads to an out-of-bounds read of aux_rd_interval[7] when offset is 8. Fix this by growing aux_rd_interval to MAX_REPEATER_CNT elements to accommodate the full range of valid repeater counts defined by the DP spec. Assisted-by: GitHub Copilot:Claude claude-4-opus Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a55a458a8df37a65ffda5cf721d554a8f74f6b04) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amd/display: Fix NULL deref and buffer over-read in SDP debugfsHarry Wentland1-0/+5
commit adf67034b1f61f7119295208085bfd43f85f56af upstream. [Why & How] dp_sdp_message_debugfs_write() dereferences connector->base.state->crtc without checking for NULL. A connector can be connected but not bound to any CRTC (e.g. after hot-plug before the next atomic commit), causing a kernel crash when writing to the sdp_message debugfs node. The function also ignores the user-provided size argument and always passes 36 bytes to copy_from_user(), reading past the user buffer when size < 36. Fix both issues by: - Returning -ENODEV when connector->base.state or state->crtc is NULL - Clamping write_size to min(size, sizeof(data)) Fixes: c7ba3653e977 ("drm/amd/display: Generic SDP message access in amdgpu") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6ab4c36a522842ff70474a1c0af2e40e50fc8300) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amd/display: add missing CSC entries for BT.2020 for DCE IPsLeorize2-2/+18
commit 6590fe323ce2807f5d9454e7fccf3fab875d4352 upstream. DCE-based hardware does not have the CSC matrices for BT.2020, which causes the driver to fallback to the GPU built-in matrices. This does not appear to cause any issues for RGB sinks, but causes major color artifacts for YCbCr ones (e.g. black becomes green). This commit adds the missing CSC matrices (taken from DC common) to DCE CSC tables, resolving the issue. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/3358 Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5333 Assisted-by: oh-my-pi:GPT-5.5 Signed-off-by: Leorize <leorize+oss@disroot.org> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 51e6668ab4baf55b082c376318d51ef965757196) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amd/display: Clamp VBIOS HDMI retimer register count to array sizeHarry Wentland1-16/+32
commit fb0707ce00eef4e2d60c3020e1c0432739703e4a upstream. [Why & How] The VBIOS integrated info tables (v1_11 and v2_1) contain HdmiRegNum and Hdmi6GRegNum fields that are used as loop bounds when copying retimer I2C register settings into fixed-size arrays (dp*_ext_hdmi_reg_settings[9] and dp*_ext_hdmi_6g_reg_settings[3]). These u8 fields are not validated before use, so a malformed VBIOS can specify values up to 255, causing an out-of-bounds heap write during driver probe. Clamp each register count to the destination array size using min_t() before the copy loops, in both get_integrated_info_v11() and get_integrated_info_v2_1(). Assisted-by: GitHub Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5a7f0ef90195940c54b0f5bb85b87da55f038c69) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amd/display: Clamp HDMI HDCP2 rx_id_list read to buffer sizeHarry Wentland1-1/+2
commit f0f3981c43b32cadfe373d636d9e9ca522bb3702 upstream. [Why & How] During HDCP 2.x repeater authentication over HDMI, the driver reads the sink's RxStatus register and extracts a 10-bit message size field (max value 1023). This value is used as the read length for the ReceiverID list without being clamped to the size of the destination buffer rx_id_list[177]. A malicious HDMI repeater could advertise a message size larger than the buffer, causing an out-of-bounds write during the I2C read. Clamp the read length in mod_hdcp_read_rx_id_list() to the size of the rx_id_list buffer, matching the approach already used in the DP branch. Fixes: eff682f83c9c ("drm/amd/display: Add DDC handles for HDCP2.2") Assisted-by: Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 229212219e4247d9486f8ba41ef087358490be09) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amd/display: Bound VBIOS record-chain walk loopsHarry Wentland3-14/+33
commit ff287df16a1a58aca78b08d1f3ee09fc44da0351 upstream. [Why & How] All record-chain walk loops in bios_parser.c and bios_parser2.c use for(;;) and only terminate on a 0xFF record_type sentinel or zero record_size. A malformed VBIOS image missing the terminator record causes unbounded iteration at probe time, potentially hundreds of thousands of iterations with record_size=1. In the final iterations near the BIOS image boundary, struct casts beyond the 2-byte header validated by GET_IMAGE can also read out of bounds. Cap all 14 record-chain walk loops to BIOS_MAX_NUM_RECORD (256) iterations. The atombios.h defines up to 22 distinct record types and atomfirmware.h has 13. Assuming an average of less than 10 records per type (which is reasonable since most are connector- based) 256 is a generous upper bound. Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)") Assisted-by: Copilot:claude-opus-4.6 Mythos Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 95700a3d660287ed657d6892f7be9ffc0e294a93) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amd/pm: smu_v14_0_0: use SoftMin for gfxclk in set_soft_freq_limited_rangePriya Hosur1-1/+2
commit 03b70e0d8aa26bab89a0f1394c1c80a871925e42 upstream. In smu_v14_0_0_set_soft_freq_limited_range(), the gfxclk floor is programmed via SetHardMinGfxClk together with SetSoftMaxGfxClk. Under power_dpm_force_performance_level=high this pins HardMin to peak gfxclk. In PMFW arbitration HardMin has higher priority than SoftMax, so the firmware thermal/PPT throttler cannot clamp gfxclk via SoftMax once HardMin is set to peak. Replace SetHardMinGfxClk with SetSoftMinGfxclk so the driver still requests peak performance but the firmware throttler retains the ability to clamp gfxclk under thermal/PPT pressure. SoftMax handling is unchanged and no other clock domains are affected. Signed-off-by: Priya Hosur <Priya.Hosur@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3ea273267fd29cbf6d83ee72329f59eb5042605b) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amd/pm: mark metrics.energy_accumulator is invalid for smu 14.0.2Yang Wang1-1/+0
commit ee193c5bbd5e2b56bbeb54ef554414b43a6fc896 upstream. EnergyAccumulator is unsupported on SMU 14.0.2, mark it invalid. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 646b05043eeed04b51c14aad22a400a8250af4b7) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amd/pm: fix smu13 power limit default/cap calculationYang Wang2-29/+35
commit bb204f19e4a115f094a6a3c4d82fcf48862d0766 upstream. smu_v13_0_0_get_power_limit() and smu_v13_0_7_get_power_limit() mix runtime power_limit with PP table limits when reporting default/min/max. When current power limit query succeeds, default_power_limit was set to the runtime value instead of the PP table default, and min/max could be derived from inconsistent bases (MsgLimits/runtime), leading to incorrect cap info. Use SocketPowerLimitAc/Dc as the PP default base (pp_limit), keep current_power_limit as runtime value, and derive min/max from pp_limit with OD percentages. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5227 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1eaf26db95901ca70737503a89b831dd763c8453) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amdgpu: set noretry=1 as default for GFX 10.1.x (Navi10/12/14)Vitaly Prosyak1-1/+1
commit e47b0056a08dc70430ffc44bbf62197e7d1ff8ea upstream. Problem: While developing the amd_close_race IGT test (which intentionally triggers execute permission faults by removing VM_PAGE_EXECUTABLE from GPU page table entries), we discovered that on Navi10 (GFX 10.1.x) these faults produce zero diagnostic output. The GPU simply hangs silently for ~10s until the scheduler timeout fires. There is no way to distinguish an execute permission fault from any other type of GPU hang. Root cause: GFX 10.1.x defaults to noretry=0, which sets RETRY_PERMISSION_OR_INVALID_PAGE_FAULT=1 in the GFXHUB UTCL2 registers (gfxhub_v2_0.c line 313). With this bit set, permission faults (valid PTE, wrong R/W/X bits) are handled entirely within the UTCL1/UTCL2 hardware loop: UTCL2 returns an XNACK to UTCL1, and UTCL1 re-requests the translation indefinitely, expecting software to eventually fix the permission bits (as happens in SVM/HMM recovery). No interrupt of any kind reaches the IH ring. This is different from invalid-page faults (V=0) which DO generate a retry fault interrupt that the driver can escalate to a no-retry fault. Permission faults with valid PTEs loop silently forever in hardware. GFX 10.3+ already defaults to noretry=1, which makes permission faults generate immediate L2 protection fault interrupts. GFX 10.1.x was inadvertently left out of this default. Fix: Change the noretry=1 threshold from IP_VERSION(10, 3, 0) to IP_VERSION(10, 1, 0) in amdgpu_gmc_noretry_set(). This is a one-line change that aligns GFX 10.1.x behavior with GFX 10.3+ and all newer generations. With noretry=1, the existing non-retry fault handler (gmc_v10_0_process_interrupt) already decodes and prints the full GCVM_L2_PROTECTION_FAULT_STATUS register including PERMISSION_FAULTS, faulting address, VMID, PASID, and process name. No additional logging code is needed — the fix is purely routing permission faults to the existing, fully-capable non-retry interrupt handler. v2: Dropped GFX10-specific logging from gmc_v10_0.c and kfd_int_process_v10.c (Felix Kuehling). v1 added logging in the retry fault handler, but with noretry=1 permission faults take the non-retry path — the v1 retry handler code was dead and would never execute. Tested on Navi10 (GFX 10.1.10): - Execute permission faults now produce immediate, clear output: [gfxhub] page fault (src_id:0 ring:64 vmid:4 pasid:592) Process amd_close_race pid 13380 thread amd_close_race pid 13384 in page at address 0x40001000 from client 0x1b (UTCL2) GCVM_L2_PROTECTION_FAULT_STATUS:0x00700881 PERMISSION_FAULTS: 0x8 - No regressions with properly-mapped GPU workloads Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit eb21edd24c40d81066753f8ac6f23bce15745395) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amdgpu: restart the CS if some parts of the VM are still invalidatedChristian König1-1/+3
commit 40396ffdf6120e2380706c59e1a84d7e765a37b6 upstream. Make sure that we only submit work with full up to date VM page tables. Backport to 7.1 and older. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 59720bfd8c6dbebeb8d5a7ab64241b007efd9213) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amdgpu: fix waiting for all submissions for userptrsChristian König1-2/+4
commit 58bafc666c484b21839a2d27e923ae1b2727a1df upstream. Wait for all submissions when userptrs need to be invalidated by the MMU notifier, not just the one the userptr was involved into. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91250893cbaa25c86872deca95a540d08de1f91e) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/v3d: Skip CSD when it has zeroed workgroupsMaíra Canal1-3/+13
commit 7f93fad5ea0affc9e1505dd0f7596c0fdb496213 upstream. A compute shader dispatch encodes its workgroup counts in the CFG0..CFG2 registers. Kicking off a dispatch with a zero count in any of the three dimensions is invalid. First, the hardware will process 0 as 65536, while the user-space driver exposes a maximum of 65535. Over that, a submission with a zeroed workgroup dimension should be a no-op. These zeroed counts can reach the dispatch path through an indirect CSD job, whose workgroup counts are only known once the indirect buffer is read and may legitimately be zero, but such scenario should only result in a no-op. Overwrite the indirect CSD job workgroup counts with the indirect BO ones, even if they are zeroed, and don't submit the job to the hardware when any of the workgroup counts is zero, so the job completes immediately instead of running the shader. Cc: stable@vger.kernel.org Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.") Suggested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patch.msgid.link/20260602-v3d-fix-indirect-csd-v4-2-654309e32bc0@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/v3d: Fix vaddr leak when indirect CSD has zeroed workgroupsMaíra Canal1-1/+2
commit ae7676952790f421c40918e2586a2c9f12a682b6 upstream. v3d_rewrite_csd_job_wg_counts_from_indirect() maps both the indirect buffer and the workgroup buffer and is expected to release them before returning. When any of the workgroup counts read from the buffer is zero, the function bailed out early and skipped the cleanup, leaking the vaddr mappings of both BOs. Jump to the cleanup path instead of returning directly, so the mappings are always dropped. Cc: stable@vger.kernel.org Fixes: 18b8413b25b7 ("drm/v3d: Create a CPU job extension for a indirect CSD job") Suggested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patch.msgid.link/20260602-v3d-fix-indirect-csd-v4-1-654309e32bc0@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/v3d: Fix global performance monitor reference countingMaíra Canal1-5/+19
commit 6bf7e2affc6e62da7add393d7f352d4040f5bc27 upstream. In the SET_GLOBAL ioctl, v3d_perfmon_find() bumps the reference count on the perfmon it returns, but v3d_perfmon_set_global_ioctl() and v3d_perfmon_delete() fail to release that reference on several paths: 1. v3d_perfmon_set_global_ioctl() leaks the reference on its error paths. 2. CLEAR_GLOBAL leaks both the find reference and the reference previously stashed in v3d->global_perfmon by the SET_GLOBAL ioctl that configured it. 3. Destroying a perfmon that is the current global perfmon leaks the reference stashed by the SET_GLOBAL ioctl. Release each of these references explicitly. Cc: stable@vger.kernel.org Fixes: c6eabbab359c ("drm/v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://patch.msgid.link/20260531-v3d-perfmon-lifetime-v2-1-60ed4485a203@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/v3d: Wait for pending L2T flush before cleaning cachesMaíra Canal1-0/+8
commit abf888b03a9805a3bc37948a0df443553b1c0910 upstream. v3d_clean_caches() starts the cache-clean sequence by writing V3D_L2TCACTL_TMUWCF to V3D_CTL_L2TCACTL and then polling for that bit to clear. It does not, however, check for an L2T flush (L2TFLS) that may still be in flight from a previous operation. On pre-V3D 7.1 hardware, kicking off the TMU write-combiner flush while an L2T flush is still pending can clobber bits in L2TCACTL and cause cache inconsistencies. Poll for L2TFLS to clear before writing L2TCACTL on V3D < 7.1, ensuring any pending flush has completed before a new clean is issued. Cc: stable@vger.kernel.org Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.") Link: https://patch.msgid.link/20260530-v3d-fix-rpi4-freezes-v1-1-c2c8307da6ce@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/xe: Clear pending_disable before signaling suspend fenceTangudu Tilak Tirumalesh1-1/+1
commit 54f2a0442a30fe7a0f6bc8345e81f8b2db8effbd upstream. In the schedule-disable done path for suspend, we signal the suspend fence before clearing pending_disable. That wakeup can let suspend_wait complete and resume be queued immediately. The resume path may then reach enable_scheduling() while pending_disable is still set and hit the !exec_queue_pending_disable(q) assertion. Fix this by clearing pending_disable before signaling the suspend fence, so any resumed transition observes a consistent state. Fixes: 87651f31ae4e ("drm/xe/guc_submit: fix race around suspend_pending") Cc: stable@vger.kernel.org # v7.0+ Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260603065217.3131066-3-tilak.tirumalesh.tangudu@intel.com (cherry picked from commit 4b1ae138b0e103d753773956a84eebc2edbf62c4) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/xe/display: fix oops in suspend/shutdown without displayJani Nikula1-2/+9
commit 68938cc08e23a94fd881e845837ff918de005ce7 upstream. The xe driver keeps track of whether to probe display, and whether display hardware is there, using xe->info.probe_display. It gets set to false if there's no display after intel_display_device_probe(). However, the display may also be disabled via fuses, detected at a later time in intel_display_device_info_runtime_init(). In this case, the xe driver does for_each_intel_crtc() on uninitialized mode config in xe_display_flush_cleanup_work(), leading to a NULL pointer dereference, and generally calls display code with display info cleared. Check for intel_display_device_present() after intel_display_device_info_runtime_init(), and reset xe->info.probe_display as necessary. Also do unset_display_features() for completeness, although display runtime init has already done that. This will need to be unified across all cases later. Move intel_display_device_info_runtime_init() call slightly earlier, similar to i915, to avoid a bunch of unnecessary setup for no display cases. Note #1: The xe driver has no business doing low level display plumbing like for_each_intel_crtc() to begin with. It all needs to happen in display code. Note #2: The actual bug is present already in commit 44e694958b95 ("drm/xe/display: Implement display support"), but the oops was likely introduced later at commit ddf6492e0e50 ("drm/xe/display: Make display suspend/resume work on discrete"). Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7904 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/6150 Cc: stable@vger.kernel.org # v6.8+ Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260515160920.1082842-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 7c3eb9f47533220888a67266448185fd0775d4da) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amdkfd: Fix buffer overflow in SDMA queue checkpoint/restore on GFX11Andrew Martin1-8/+41
commit 352ea59028ea48a6fff77f19ae28f98f71946a80 upstream. The v11 MQD manager incorrectly assigned the CP-compute variants of checkpoint_mqd/restore_mqd for KFD_MQD_TYPE_SDMA queues. These functions use sizeof(struct v11_compute_mqd) (2048 bytes) instead of sizeof(struct v11_sdma_mqd) (512 bytes), causing a 1536-byte overflow. During CRIU checkpoint of an SDMA queue on Navi3x: - checkpoint_mqd() reads 2048 bytes from a 512-byte SDMA MQD buffer, leaking 1536 bytes of adjacent GTT memory to userspace During CRIU restore: - restore_mqd() writes 2048 bytes into a 512-byte SDMA MQD buffer, corrupting 1536 bytes of adjacent GTT memory (often the ring buffer or neighboring MQDs) This is a copy-paste regression unique to v11. All other ASIC backends (cik, vi, v9, v10, v12) correctly use the SDMA-specific variants. Add checkpoint_mqd_sdma() and restore_mqd_sdma() functions that properly handle the smaller v11_sdma_mqd structure, matching the pattern used in other MQD managers. Fixes: cc009e613de6 ("drm/amdkfd: Add KFD support for soc21 v3") Assisted-by: Claude:Sonnet 4-5 Signed-off-by: Andrew Martin <andrew.martin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6fa41db7ffdec97d62433adf03b7b9b759af8c2c) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/amdkfd: fix NULL dereference in get_queue_ids()Muhammad Bilal1-1/+1
commit 2bd550b547deabef98bd3b017ff743b7c34d3a6d upstream. When usr_queue_id_array is NULL and num_queues is non-zero, get_queue_ids() returns NULL. The callers check only IS_ERR() on the return value; since IS_ERR(NULL) == false the check passes, and suspend_queues() calls q_array_invalidate() which immediately dereferences NULL while iterating num_queues times. Userspace can trigger this via kfd_ioctl_set_debug_trap() by supplying num_queues > 0 with a zero queue_array_ptr, causing a kernel panic. A NULL usr_queue_id_array with num_queues == 0 is a legitimate no-op (q_array_invalidate never executes, and resume_queues already guards all queue_ids dereferences behind a NULL check). Return ERR_PTR(-EINVAL) only when num_queues is non-zero and the pointer is absent; both callers already propagate IS_ERR() returns correctly to userspace. Fixes: a70a93fa568b ("drm/amdkfd: add debug suspend and resume process queues operation") Signed-off-by: Muhammad Bilal <meatuni001@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f165a82cdf503884bb1797771c61b2fcc72113d4) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysdrm/gem: Try to fix change_handle ioctl, attempt 4Simona Vetter2-41/+37
commit 1a4f03d22fb655e5f192244fb2c87d8066fcfca2 upstream. [airlied: just added some comments on how to reenable] On-list because the cat is out of the bag and we're clearly not good enough to figure this out in private. The story thus far: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") tried to fix a race condition between the gem_close and gem_change_handle ioctls, but got a few things wrong: - There's a confusion with the local variable handle, which is actually the new handle, and so the two-stage trick was actually applied to the wrong idr slot. 7164d78559b0 ("drm/gem: fix race between change_handle and handle_delete") tried to fix that by adding yet another code block, but forgot to add the error handling. Which meant we now have two paths, both kinda wrong. - dc366607c41c ("drm: Replace old pointer to new idr") tried to apply another fix, but inconsistently, again because of the handle confusion - this would be the right fix (kinda, somewhat, it's a mess) if we'd do the two-stage approach for the new handle. Except that wasn't the intent of the original fix. We also didn't have an igt merged for the original ioctl, which is a big no-go. This was attempted to address off-list in the original bugfix, and amd QA people claimed the bug was fixed now. Very clearly that's not the case. Here's my attempt to sort this out: - Rename the local variable to new_handle, the old aliasing with args->handle is just too dangerously confusing. - Merge the gem obj lookup with the two-stage idr_replace so that we avoid getting ourselves confused there. - This means we don't have a surplus temporary reference anymore, only an inherited from the idr. A concurrent gem_close on the new_handle could steal that. Fix that with the same two-stage approach create_tail uses. This is a bit overkill as documented in the comment, but I also don't trust my ability to understand this all correctly, so go with the established pattern we have from other ioctls instead for maximum paranoia. - Adjust error paths. I've tried to make the error and success paths common, because they are identical except for which handle is removed and on which we call idr_replace to (re)install the object again. But that made things messier to read, so I've left it at the more verbose version, which unfortunately hides the symmetry in the entire code flow a bit. - While at it, also replace the 7 space indent with 1 tab. And finally, because I flat out don't trust my abilities here at all anymore: - Disable the ioctl until we have the igt situation and everything else sorted out on-list and with full consensus. v2: Sashiko noticed that I didn't handle the error path for idr_replace correctly, it must be checked with IS_ERR_OR_NULL like in gem_handle_delete. So yeah, definitely should just the existing paths 1:1 because this is endless amounts of tricky. Also add the Fixes: line for the original ioctl, I forgot that too. Reported-by: DARKNAVY (@DarkNavyOrg) <vr@darknavy.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Fixes: dc366607c41c ("drm: Replace old pointer to new idr") Cc: syzbot+d7c9eed171647e421013@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Edward Adam Davis <eadavis@qq.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") Cc: David Francis <David.Francis@amd.com> Cc: Puttimet Thammasaeng <pwn8official@gmail.com> Cc: Christian Koenig <Christian.Koenig@amd.com> Fixes: 7164d78559b0 ("drm/gem: fix race between change_handle and handle_delete") Cc: Zhenghang Xiao <kipreyyy@gmail.com> Fixes: 5e28b7b94408 ("drm: Set old handle to NULL before prime swap in change_handle") Reviewed-by: David Francis <David.Francis@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20260604194437.1725314-1-simona.vetter@ffwll.ch Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysslimbus: qcom-ngd-ctrl: Avoid ABBA on tx_lock/ctrl->lockBjorn Andersson1-3/+0
commit 55f2ea9ff83cc27a85526b14bc9b32f96a08d6ec upstream. During the SSR/PDR down notification the tx_lock is taken with the intent to provide synchronization with active DMA transfers. But during this period qcom_slim_ngd_down() is invoked, which ends up in slim_report_absent(), which takes the slim_controller lock. In multiple other codepaths these two locks are taken in the opposite order (i.e. slim_controller then tx_lock). The result is a lockdep splat, and a possible deadlock: rprocctl/449 is trying to acquire lock: ffff00009793e620 (&ctrl->lock){+.+.}-{4:4}, at: slim_report_absent (drivers/slimbus/core.c:322) slimbus but task is already holding lock: ffff00009793fb50 (&ctrl->tx_lock){+.+.}-{4:4}, at: qcom_slim_ngd_ssr_pdr_notify (drivers/slimbus/qcom-ngd-ctrl.c:1475) slim_qcom_ngd_ctrl which lock already depends on the new lock. Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ctrl->tx_lock); lock(&ctrl->lock); lock(&ctrl->tx_lock); lock(&ctrl->lock); The assumption is that the comment refers to the desire to not call qcom_slim_ngd_exit_dma() while we have an ongoing DMA TX transaction. But any such transaction is initiated and completed within a single qcom_slim_ngd_xfer_msg(). Prior to calling qcom_slim_ngd_exit_dma() the slim_controller is torn down, all child devices are notified that the slimbus is gone and the child devices are removed. Stop taking the tx_lock in qcom_slim_ngd_ssr_pdr_notify() to avoid the deadlock. Fixes: a899d324863a ("slimbus: qcom-ngd-ctrl: add Sub System Restart support") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://patch.msgid.link/20260530204421.116824-9-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysslimbus: qcom-ngd-ctrl: Balance pm_runtime enablement for NGDBjorn Andersson1-1/+5
commit 6a003446b725c44b9e3ffa111b0effbaa2d43085 upstream. The pm_runtime_enable() and pm_runtime_use_autosuspend() calls are supposed to be balanced on exit, add these calls. Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver") Cc: stable@vger.kernel.org Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://patch.msgid.link/20260530204421.116824-8-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 daysslimbus: qcom-ngd-ctrl: Correct PDR and SSR cleanup ownershipBjorn Andersson1-2/+3
commit 960b53a3f76fa214c2fc493734ae7b3c5e713bbf upstream. PDR and SSR callbacks are registred from the controller probe function, but currently released from the child device's remove function. The remove() function should only be unwinding what was done in the same device's probe() function. Fixes: 917809e2280b ("slimbus: ngd: Add qcom SLIMBus NGD driver") Cc: stable@vger.kernel.org Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com> Signed-off-by: Srinivas Kandagatla <srini@kernel.org> Link: https://patch.msgid.link/20260530204421.116824-5-srini@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>