summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2025-08-01Merge branch 'pci/controller/xgene'Bjorn Helgaas1-1/+0
- Teach handle_simple_irq() to resend an in-progress interrupt (Marc Zyngier) - Defer probing if the MSI widget driver hasn't probed yet (Marc Zyngier) - Drop useless conditional compilation, since pci-xgene.c is only compiled when CONFIG_PCI_XGENE is selected (Marc Zyngier) - Drop useless XGENE_PCIE_IP_VER_UNKN IP version (Marc Zyngier) - Simplify and make per-CPU interrupt setup robust (Marc Zyngier) - Drop superfluous struct xgene_msi fields (Marc Zyngier) - Use device-managed memory allocations (Marc Zyngier) - Drop intermediate xgene_msi_group tracking structure (Marc Zyngier) - Rewrite pci-xgene-msi.c to fix MSI CPU affinity and clean things up (Marc Zyngier) - Resend an MSI racing with itself on a different CPU (Marc Zyngier) - Probe xgene-msi as a standard platform driver rather than a subsys_initcall (Marc Zyngier) - Simplify MSI handler setup/teardown by dropping useless CPU hotplug bits (Marc Zyngier) - Remove unused cpuhp_state CPUHP_PCI_XGENE_DEAD (Marc Zyngier) * pci/controller/xgene: cpu/hotplug: Remove unused cpuhp_state CPUHP_PCI_XGENE_DEAD PCI: xgene-msi: Restructure handler setup/teardown PCI: xgene-msi: Probe as a standard platform driver PCI: xgene-msi: Resend an MSI racing with itself on a different CPU PCI: xgene-msi: Sanitise MSI allocation and affinity setting PCI: xgene-msi: Get rid of intermediate tracking structure PCI: xgene-msi: Use device-managed memory allocations PCI: xgene-msi: Drop superfluous fields from xgene_msi structure PCI: xgene-msi: Make per-CPU interrupt setup robust PCI: xgene: Drop XGENE_PCIE_IP_VER_UNKN PCI: xgene: Drop useless conditional compilation PCI: xgene: Defer probing if the MSI widget driver hasn't probed yet genirq: Teach handle_simple_irq() to resend an in-progress interrupt
2025-08-01Merge branch 'pci/endpoint/doorbell'Bjorn Helgaas2-0/+46
- Add RC-to-EP doorbell support using platform MSI controller (Frank Li) - Check for MSI parent and mutability since we currently don't support mutable MSI controllers (Frank Li) - Add pci_epf_align_inbound_addr() helper (Frank Li) - Add a doorbell test (Frank Li) * pci/endpoint/doorbell: selftests: pci_endpoint: Add doorbell test case misc: pci_endpoint_test: Add doorbell test case PCI: endpoint: pci-epf-test: Add doorbell test support PCI: endpoint: Add pci_epf_align_inbound_addr() helper for inbound address alignment PCI: endpoint: pci-ep-msi: Add checks for MSI parent and mutability PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
2025-08-01Merge branch 'pci/resources'Bjorn Helgaas1-0/+6
- Restore VF resizable BAR state after reset (Michał Winiarski) - Add pci_resource_num_to_vf_bar() and pci_resource_num_from_vf_bar() to convert between VF BAR number and the dev->resource[] index (Michał Winiarski) - Allow IOV resources (VF BARs) to be resized (Michał Winiarski) - Add pci_iov_vf_bar_set_size() so drivers can control VF BAR size (Michał Winiarski) * pci/resources: PCI/IOV: Allow drivers to control VF BAR size PCI/IOV: Check that VF BAR fits within the reservation PCI/IOV: Allow IOV resources to be resized in pci_resize_resource() PCI/IOV: Add pci_resource_num_to_vf_bar() to convert VF BAR number to/from IOV resource PCI/IOV: Restore VF resizable BAR state after reset
2025-08-01Merge branch 'pci/pwrctrl'Bjorn Helgaas1-1/+1
- Add optional slot clock for cases where the PCIe host controller and the slot are supplied by different clocks (Marek Vasut) - Fix kerneldoc tag for private fields (Bartosz Golaszewski) * pci/pwrctrl: PCI/pwrctrl: Fix the kerneldoc tag for private fields PCI/pwrctrl: Add optional slot clock for PCI slots
2025-08-01Merge branch 'pci/hotplug'Bjorn Helgaas2-1/+8
- Fix runtime PM ref imbalance on Hot-Plug Capable ports caused by misinterpreting a config read failure after a device has been removed (Lukas Wunner) - Avoid creating a useless PCIe port service device for pciehp if the slot is handled by the ACPI hotplug driver (Lukas Wunner) - Ignore ACPI hotplug slots when calculating depth of pciehp hotplug ports (Lukas Wunner) - Simplify pci_bridge_d3_possible() and clarify comments (Lukas Wunner) * pci/hotplug: PCI: Move is_pciehp check out of pciehp_is_native() PCI: pciehp: Use is_pciehp instead of is_hotplug_bridge PCI/portdrv: Use is_pciehp instead of is_hotplug_bridge PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports
2025-08-01Merge branch 'pci/enumeration'Bjorn Helgaas1-0/+3
- Allow 'isolated PCI functions' (multi-function devices without a function 0) for LoongArch, similar to s390 and jailhouse (Huacai Chen) - Mask out unrelated bits in PCIE_LNKCAP_SLS2SPEED() and PCIE_LNKCTL2_TLS2SPEED(), which makes them more robust and fixes a WARN_ON_ONCE() in pcie_set_target_speed() (Jiwei Sun) - Read Link Control 2 again when retraining a link after a training failure so we try to increase the link speed (Jiwei Sun) - Allow built-in drivers, not just modular drivers, to use async initial probing (Lukas Wunner) - Support Immediate Readiness even on devices with no PM Capability (Sean Christopherson) * pci/enumeration: PCI: Support Immediate Readiness on devices without PM capabilities PCI: Allow built-in drivers to use async initial probing PCI: Adjust the position of reading the Link Control 2 register PCI: Fix link speed calculation on retrain failure PCI: Extend isolated function probing to LoongArch
2025-08-01block: Fix default IO priority if there is no IO contextGuenter Roeck1-1/+2
Upstream commit 53889bcaf536 ("block: make __get_task_ioprio() easier to read") changes the IO priority returned to the caller if no IO context is defined for the task. Prior to this commit, the returned IO priority was determined by task_nice_ioclass() and task_nice_ioprio(). Now it is always IOPRIO_DEFAULT, which translates to IOPRIO_CLASS_NONE with priority 0. However, task_nice_ioclass() returns IOPRIO_CLASS_IDLE, IOPRIO_CLASS_RT, or IOPRIO_CLASS_BE depending on the task scheduling policy, and task_nice_ioprio() returns a value determined by task_nice(). This causes regressions in test code checking the IO priority and class of IO operations on tasks with no IO context. Fix the problem by returning the IO priority calculated from task_nice_ioclass() and task_nice_ioprio() if no IO context is defined to match earlier behavior. Fixes: 53889bcaf536 ("block: make __get_task_ioprio() easier to read") Cc: Jens Axboe <axboe@kernel.dk> Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250731044953.1852690-1-linux@roeck-us.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-31Merge branch 'for-6.17/selftests' into for-linusJiri Kosina12-12/+60
- upgrade the python scripts in hid-selftests to match hid-tools version 0.10 (Benjamin Tissoires)
2025-07-31Merge branch 'for-6.17/core' into for-linusJiri Kosina1-1/+7
- hardening of HID core parser against conversion to 0 bits in s32ton() by buggy/malicious devices (Alan Stern)
2025-07-31Merge tag 'mtd/for-6.17' of ↵Linus Torvalds3-33/+52
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd updates from Miquel Raynal: "MTD changes: - Apart from a binding conversion to yaml, only minor changes/small fixes have been merged. Raw NAND changes: - Minor fixes for various controller drivers like DMA mapping checks, better timing derivations or bitflip statistics. - some Hynix NAND flashes were not supporting read-retries, so don't even try to do it SPI NAND changes: - In order to support high-speed modes, certain chips need extra configuration like adding more dummy cycles. This is now possible, especially on Winbond chips. - Aside from that, Gigadevice gets support for a new chip (GD5F1GM9). SPI NOR changes: - A notable changes is the fix for exiting 4-byte addressing on Infineon SEMPER flashes. These flashes do not support the standard EX4B opcode (E9h), and use a vendor-specific opcode (B8h) instead. - There is also a fix for unlocking flashes that are write-protected at power-on. This was caused by using an uninitialized mtd_info in spi_nor_try_unlock_all()" * tag 'mtd/for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (26 commits) mtd: spinand: winbond: Add comment about the maximum frequency mtd: spinand: winbond: Enable high-speed modes on w35n0xjw mtd: spinand: winbond: Enable high-speed modes on w25n0xjw mtd: spinand: Add a ->configure_chip() hook mtd: spinand: Add a frequency field to all READ_FROM_CACHE variants mtd: spinand: Fix macro alignment spi: spi-mem: Take into account the actual maximum frequency spi: spi-mem: Use picoseconds for calculating the op durations mtd: rawnand: atmel: set pmecc data setup time mtd: spinand: propagate spinand_wait() errors from spinand_write_page() mtd: rawnand: fsmc: Add missing check after DMA map mtd: rawnand: rockchip: Add missing check after DMA map mtd: rawnand: hynix: don't try read-retry on SLC NANDs mtd: rawnand: atmel: Fix dma_mapping_error() address mtd: nand: brcmnand: fix mtd corrected bits stat mtd: rawnand: renesas: Add missing check after DMA map mtd: spinand: gigadevice: Add support for GD5F1GM9 chips mtd: nand: brcmnand: replace manual string choices with standard helpers mtd: map: Don't use "proxy" headers mtd: spi-nor: Fix spi_nor_try_unlock_all() ...
2025-07-31Merge tag 'clk-for-linus' of ↵Linus Torvalds3-23/+82
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Stephen Boyd: "This is the usual collection of primarily clk driver updates. The big part of the diff is all the new Qualcomm clk drivers added for a few SoCs they're working on. The other two vendors with significant work this cycle are Renesas and Amlogic. Renesas adds a bunch of clks to existing drivers and supports some new SoCs while Amlogic is starting a significant refactoring to simplify their code. The core framework gained a pair of helpers to get the 'struct device' or 'struct device_node' associated with a 'struct clk_hw'. Some associated KUnit tests were added for these simple helpers as well. Beyond that core change there are lots of little fixes throughout the clk drivers for the stuff we see every day, wrong clk driver data that affects tree topology or supported frequencies, etc. They're not found until the clks are actually used by some consumer device driver. New Drivers: - Global, display, gpu, video, camera, tcsr, and rpmh clock controller for the Qualcomm Milos SoC - Camera, display, GPU, and video clock controllers for Qualcomm QCS615 - Video clock controller driver for Qualcomm SM6350 - Camera clock controller driver for Qualcomm SC8180X - I3C clocks and resets on Renesas RZ/G3E - Expanded Serial Peripheral Interface (xSPI) clocks and resets on Renesas RZ/V2H(P) and RZ/V2N - SPI (RSPI) clocks and resets on Renesas RZ/V2H(P) - SDHI and I2C clocks on Renesas RZ/T2H and RZ/N2H - Ethernet clocks and resets on Renesas RZ/G3E - Initial support for the Renesas RZ/T2H (R9A09G077) and RZ/N2H (R9A09G087) SoCs - Ethernet clocks and resets on Renesas RZ/V2H and RZ/V2N - Timer, I2C, watchdog, GPU, and USB2.0 clocks and resets on Renesas RZ/V2N Updates: - Support atomic PWMs in the PWM clk driver - clk_hw_get_dev() and clk_hw_get_of_node() helpers - Replace round_rate() with determine_rate() in various clk drivers - Convert clk DT bindings to DT schema format for DT validation - Various clk driver cleanups and refactorings from static analysis tools and possibly real humans - A lot of little fixes here and there to things like clk tree topology, missing frequencies, flagging clks as critical, etc" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (216 commits) clk: clocking-wizard: Fix the round rate handling for versal clk: Fix typos clk: spacemit: ccu_pll: fix error return value in recalc_rate callback clk: tegra: periph: Make tegra_clk_periph_ops static clk: tegra: periph: Fix error handling and resolve unsigned compare warning clk: imx: scu: convert from round_rate() to determine_rate() clk: imx: pllv4: convert from round_rate() to determine_rate() clk: imx: pllv3: convert from round_rate() to determine_rate() clk: imx: pllv2: convert from round_rate() to determine_rate() clk: imx: pll14xx: convert from round_rate() to determine_rate() clk: imx: pfd: convert from round_rate() to determine_rate() clk: imx: frac-pll: convert from round_rate() to determine_rate() clk: imx: fracn-gppll: convert from round_rate() to determine_rate() clk: imx: fixup-div: convert from round_rate() to determine_rate() clk: imx: cpu: convert from round_rate() to determine_rate() clk: imx: busy: convert from round_rate() to determine_rate() clk: imx: composite-93: remove round_rate() in favor of determine_rate() clk: imx: composite-8m: remove round_rate() in favor of determine_rate() clk: qcom: Remove redundant pm_runtime_mark_last_busy() calls clk: imx: Remove redundant pm_runtime_mark_last_busy() calls ...
2025-07-31Merge tag 'hwmon-for-v6.17' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: "Updated chip support in existing drivers: - ina238: Support for INA228 - pmbus/tps53679: Support for TPS53685 - pmbus/adp1050: Support for adp1051, adp1055 and ltp8800 - corsair-psu: Support for HX1200i Series 2025 - pmbus/isl68137: Support for RAA229621 - asus-ec-sensors: Support for ProArt X870E-CREATOR WIFI and Other notable changes: - adt7475: Support for #pwm-cells = <3> - amc6821: Cooling device support - emc2305: Support for PWM frequency, polarity and output - Add missing compatible entries to various devicetree bindings And various other minor fixes and improvements" * tag 'hwmon-for-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (34 commits) dt-bindings: hwmon: Replace bouncing Alexandru Tachici emails hwmon: (ina238) Add support for INA228 dt-bindings: Add INA228 to ina2xx devicetree bindings hwmon: (ina238) Fix inconsistent whitespace dt-bindings: hwmon: adt7475: Allow and recommend #pwm-cells = <3> hwmon: (adt7475) Implement support for #pwm-cells = <3> hwmon: (pmbus/tps53679) Add support for TPS53685 dt-bindings: trivial: Add tps53685 support hwmon: (pmbus/adp1050) Add regulator support for ltp8800 hwmon: (pmbus/adp1050) Add support for adp1051, adp1055 and ltp8800 dt-bindings: hwmon: pmbus/adp1050: Add adp1051, adp1055 and ltp8800 hwmon: (max31827) use sysfs_emit() in temp1_resolution_show() hwmon: (ltc4282) convert from round_rate() to determine_rate() hwmon: (corsair-psu) add support for HX1200i Series 2025 dt-bindings: hwmon: pmbus: ti,ucd90320: Add missing compatibles dt-bindings: hwmon: maxim,max20730: Add maxim,max20710 compatible dt-bindings: hwmon: lltc,ltc2978: Add lltc,ltc713 compatible dt-bindings: hwmon: ti,lm87: Add adi,adm1024 compatible dt-bindings: hwmon: national,lm90: Add missing Dallas max6654 and onsemi nct72, nct214, and nct218 hwmon: (w83627ehf) make the read-only arrays 'bit' static const ...
2025-07-31Merge tag 'media/v6.17-1' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - v4l2 core: - sub-device framework routing improvements - NV12M tiled variants added to v4l2_format_info - some fixes at control handler freeing logic - fixed H264 SEPARATE_COLOUR_PLANE check - new staging driver: Intel IPU7 PCI - Rockchip video decoder driver got promoted from staging - iris: added HEVC/VP9 encoder/decoder support - vsp1: driver has gained Renesas VSPX support - uvc: - switched to vb2 ioctl helpers - added MSXU 1.5 metadata support - atomisp: GC0310 sensor driver cleanups in preparation for moving it out of staging - Lots of cleanup, fixes and improvements * tag 'media/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (310 commits) media: rkvdec: Unstage the driver media: rkvdec: Remove TODO file media: dt-bindings: rockchip: Add RK3576 Video Decoder bindings media: dt-bindings: rockchip: Document RK3588 Video Decoder bindings media: amphion: Support dmabuf and v4l2 buffer without binding media: verisilicon: postproc: 4K support media: v4l2: Add support for NV12M tiled variants to v4l2_format_info() media: uvcvideo: Use a count variable for meta_formats instead of 0 terminating media: uvcvideo: Auto-set UVC_QUIRK_MSXU_META media: uvcvideo: Introduce V4L2_META_FMT_UVC_MSXU_1_5 media: uvcvideo: Introduce dev->meta_formats media: Documentation: Add note about UVCH length field media: uvcvideo: Do not mark valid metadata as invalid media: uvcvideo: uvc_v4l2_unlocked_ioctl: Invert PM logic media: core: export v4l2_translate_cmd media: uvcvideo: Turn on the camera if V4L2_EVENT_SUB_FL_SEND_INITIAL media: uvcvideo: Remove stream->is_streaming field media: uvcvideo: Split uvc_stop_streaming() media: uvcvideo: Handle locks in uvc_queue_return_buffers media: uvcvideo: Use vb2 ioctl and fop helpers ...
2025-07-31Merge tag 'libnvdimm-for-6.17' of ↵Linus Torvalds1-5/+10
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Ira Weiny: "Header file cleanups. Specifically use specific headers rather than just kernel.h in libnvdimm.h to reduce header file usage. The second patch is a fix to CXL but is going through my tree as a libnvdimm patch caused the issue" * tag 'libnvdimm-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: cxl: Include range.h in cxl.h libnvdimm: Don't use "proxy" headers
2025-07-31Merge tag 'for-linus-iommufd' of ↵Linus Torvalds2-36/+234
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "This broadly brings the assigned HW command queue support to iommufd. This feature is used to improve SVA performance in VMs by avoiding paravirtualization traps during SVA invalidations. Along the way I think some of the core logic is in a much better state to support future driver backed features. Summary: - IOMMU HW now has features to directly assign HW command queues to a guest VM. In this mode the command queue operates on a limited set of invalidation commands that are suitable for improving guest invalidation performance and easy for the HW to virtualize. This brings the generic infrastructure to allow IOMMU drivers to expose such command queues through the iommufd uAPI, mmap the doorbell pages, and get the guest physical range for the command queue ring itself. - An implementation for the NVIDIA SMMUv3 extension "cmdqv" is built on the new iommufd command queue features. It works with the existing SMMU driver support for cmdqv in guest VMs. - Many precursor cleanups and improvements to support the above cleanly, changes to the general ioctl and object helpers, driver support for VDEVICE, and mmap pgoff cookie infrastructure. - Sequence VDEVICE destruction to always happen before VFIO device destruction. When using the above type features, and also in future confidential compute, the internal virtual device representation becomes linked to HW or CC TSM configuration and objects. If a VFIO device is removed from iommufd those HW objects should also be cleaned up to prevent a sort of UAF. This became important now that we have HW backing the VDEVICE. - Fix one syzkaller found error related to math overflows during iova allocation" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (57 commits) iommu/arm-smmu-v3: Replace vsmmu_size/type with get_viommu_size iommu/arm-smmu-v3: Do not bother impl_ops if IOMMU_VIOMMU_TYPE_ARM_SMMUV3 iommufd: Rename some shortterm-related identifiers iommufd/selftest: Add coverage for vdevice tombstone iommufd/selftest: Explicitly skip tests for inapplicable variant iommufd/vdevice: Remove struct device reference from struct vdevice iommufd: Destroy vdevice on idevice destroy iommufd: Add a pre_destroy() op for objects iommufd: Add iommufd_object_tombstone_user() helper iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice iommufd/selftest: Test reserved regions near ULONG_MAX iommufd: Prevent ALIGN() overflow iommu/tegra241-cmdqv: import IOMMUFD module namespace iommufd: Do not allow _iommufd_object_alloc_ucmd if abort op is set iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support iommu/tegra241-cmdqv: Add user-space use support iommu/tegra241-cmdqv: Do not statically map LVCMDQs iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf() iommu/tegra241-cmdqv: Use request_threaded_irq iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops ...
2025-07-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds4-2/+50
Pull rdma updates from Jason Gunthorpe: - Various minor code cleanups and fixes for hns, iser, cxgb4, hfi1, rxe, erdma, mana_ib - Prefetch supprot for rxe ODP - Remove memory window support from hns as new device FW is no longer support it - Remove qib, it is very old and obsolete now, Cornelis wishes to restructure the hfi1/qib shared layer - Fix a race in destroying CQs where we can still end up with work running because the work is cancled before the driver stops triggering it - Improve interaction with namespaces: * Follow the devlink namespace for newly spawned RDMA devices * Create iopoib net devces in the parent IB device's namespace * Allow CAP_NET_RAW checks to pass in user namespaces - A new flow control scheme for IB MADs to try and avoid queue overflows in the network - Fix 2G message sizes in bnxt_re - Optimize mkey layout for mlx5 DMABUF - New "DMA Handle" concept to allow controlling PCI TPH and steering tags * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits) RDMA/siw: Change maintainer email address RDMA/mana_ib: add support of multiple ports RDMA/mlx5: Refactor optional counters steering code RDMA/mlx5: Add DMAH support for reg_user_mr/reg_user_dmabuf_mr IB: Extend UVERBS_METHOD_REG_MR to get DMAH RDMA/mlx5: Add DMAH object support RDMA/core: Introduce a DMAH object and its alloc/free APIs IB/core: Add UVERBS_METHOD_REG_MR on the MR object net/mlx5: Add support for device steering tag net/mlx5: Expose IFC bits for TPH PCI/TPH: Expose pcie_tph_get_st_table_size() RDMA/mlx5: Fix incorrect MKEY masking RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey() RDMA/mlx5: remove redundant check on err on return expression RDMA/mana_ib: add additional port counters RDMA/mana_ib: Fix DSCP value in modify QP RDMA/efa: Add CQ with external memory support RDMA/core: Add umem "is_contiguous" and "start_dma_addr" helpers RDMA/uverbs: Add a common way to create CQ with umem RDMA/mlx5: Optimize DMABUF mkey page size ...
2025-07-31Merge tag 'leds-next-6.17' of ↵Linus Torvalds2-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds Pull LED updates from Lee Jones: "Improvements & Fixes: - A fix for QCOM Flash to prevent incorrect register access when the driver is re-bound. This is solved by duplicating a static register array during the probe function, which prevents register addresses from being miscalculated after multiple binds - The LP50xx driver now correctly handles the 'reg' property in device tree child nodes to ensure the multi_index is set correctly. This prevents issues where LEDs could be controlled incorrectly if the device tree nodes were processed in a different order. An error is returned if the reg property is missing or out of range - Add a Kconfig dependency on between TPS6131x and V4L2_FLASH_LED_CLASS to prevent a build failure when the driver is built-in and the V4L2 flash infrastructure is a loadable module - Fix a potential buffer overflow warning in PCA955x reported by older GCC versions by using a more precise format specifier when creating the default LED label. Cleanups & Refactoring: - Correct the MAINTAINERS file entry for the TPS6131X flash LED driver to point to the correct device tree binding file name - Fix a comment in the Flash Class for the flash_timeout setter to "flash timeout" from "flash duration" for accuracy - The of_led_get() function is no longer exported as it has no users outside of its own module. Removals: - Revert the commit to configure LED blink intervals for hardware offload in the Netdev Trigger. This change was found to break existing PHY drivers by putting their LEDs into a permanent, unconditional blinking state. Device Tree Bindings Updates: - Update the binding for LP50xx to document that the child reg property is the index within the LED bank. The example was also updated to use correct values - Update the JNCP5623 binding to add 0x39 as a valid I2C address, as it is used by the NCP5623C variant" * tag 'leds-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: dt-bindings: leds: ncp5623: Add 0x39 as a valid I2C address Revert "leds: trigger: netdev: Configure LED blink interval for HW offload" leds: pca955x: Avoid potential overflow when filling default_label (take 2) leds: Unexport of_led_get() leds: tps6131x: Add V4L2_FLASH_LED_CLASS dependency dt-bindings: leds: lp50xx: Document child reg, fix example leds: leds-lp50xx: Handle reg to get correct multi_index leds: led-class-flash:: Fix flash_timeout comment MAINTAINERS: Adjust file entry in TPS6131X FLASH LED DRIVER leds: flash: leds-qcom-flash: Fix registry access after re-bind
2025-07-31Merge tag 'mfd-next-6.17' of ↵Linus Torvalds8-265/+21
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "New Support & Features: - Add extensive support for the Analog Devices ADP5589 I/O expander, including core MFD, GPIO, PWM, and a new keypad matrix input driver. This also adds support for handling various events including GPI, keypad, reset and unlock ev ents - Add support for the TI TPS652G1 PMIC, a stripped-down version of the TPS65224, including core MFD, PFSM, pinctrl, and GPIO support - Add support for the Apple Silicon System Management Controller (SMC), including the core MFD driver which handles the RTKit-based protocol, a new GPIO driver for PMU GPIOs, and a new reboot/power-off driver. Improvements & Fixes: - Dynamically add ADP5585 sub-devices based on device tree properties - Move ADP5585 oscillator control from the child PWM driver to the main MFD driver to better handle shared resources - Add support for a hardware reset pin and VDD regulator to the ADP5585 driver - Update the TPS65219 MFD cell's GPIO compatible string for the TPS65214 to reflect hardware capabilities correctly - Separate the ChromeOS EC charge-control probing from the USB-PD subsystem, allowing it to probe independently based on the dedicated EC_FEATURE_CHARGER - Fix an interrupt naming typo in the MT6370 driver - Fix RK806 PMIC reset behavior by allowing the reset mode to be customized via a new device tree property - Fix AXP20X regulator cell ID conflicts for secondary PMICs on boards without an IRQ line connected - Fix MT6397 keypad sub-device creation to use specific names instead of a generic one, ensuring correct driver binding - Fix a build warning in the stm32-timers driver by adding a missing include for export.h. Cleanups & Refactoring: - Refactor the ADP5585 driver to simplify how regmap defaults are handled, making it easier to add new chip variants - Introduce per-chip register map structures for the ADP5585/ADP5589 family to handle differences between the devices - Convert several drivers to use dev_fwnode() instead of of_fwnode_handle() - Make various static structures const in the cs40l50, rohm-bd71828, tps65219, and twl6040 drivers - Remove redundant pm_runtime_mark_last_busy() calls from several drivers - Alphabetize Kconfig entries for Cirrus Logic and Maxim drivers - Remove unused fields from the 'tps65219' struct - Update several MFD-related headers to follow the 'Include What You Use' (IWYU) principle. Removals: - Remove the old, platform-data-based adp5589-keys input driver, which is now superseded by the new MFD-based adp5585-keys driver - Remove the unused twl6030_mmc_card_detect() functions and associated header declarations - Remove the now unused pcf50633/core.h header file - Remove the fsl,imx8qxp-csr device tree binding, which was being used incorrectly. Device Tree Bindings Updates: - Add support for the Analog Devices ADP5589 I/O expander to the adi,adp5585.yaml binding - Add new properties to the adi,adp5585.yaml binding for input events, including keypad pins, unlock events, and reset events - Add a reset-gpios property to the adi,adp5585.yaml binding - Add the TI TPS652G1 PMIC to the ti,tps6594.yaml binding - Add new bindings for the Apple Mac System Management Controller (SMC) and its sub-devices: apple,smc.yaml, apple,smc-gpio.yaml, and apple,smc-reboot.yaml - Convert the Freescale MXS LRADC binding (mxs-lradc) to YAML schema format - Convert and combine the NXP LPC1850 CREG, DMAMUX, and USB OTG PHY bindings into a single YAML schema file - Convert the TI TPS65910 binding to YAML schema format - Add a comment to the samsung,s2mps11.yaml binding to clarify the use of 'oneOf' for interrupt properties - Add the rockchip,reset-mode property to the rockchip,rk806.yaml binding to allow customization of the PMIC's reset behavior" * tag 'mfd-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (28 commits) mfd: dt-bindings: Convert TPS65910 to DT schema mfd: Minor Cirrus/Maxim Kconfig order fixes mfd: Remove redundant pm_runtime_mark_last_busy() calls mfd: mt6397: Do not use generic name for keypad sub-devices mfd: axp20x: Set explicit ID for regulator cell if no IRQ line is present mfd: mt6370: Fix the interrupt naming typo mfd: rk8xx-core: Allow to customize RK806 reset mode dt-bindings: mfd: rk806: Allow to customize PMIC reset mode mfd: syscon: atmel-smc: Don't use "proxy" headers mfd: madera: Don't use "proxy" headers mfd: wm8350-core: Don't use "proxy" headers dt-bindings: mfd: samsung,s2mps11: Add comment about interrupts properties mfd: davinci_voicecodec: Don't use "proxy" headers mfd: pcf50633: Remove the header file core.h mfd: tps65219: Remove another unused field from 'struct tps65219' mfd: tps65219: Remove an unused field from 'struct tps65219' mfd: tps65219: Constify struct regmap_irq_sub_irq_map and tps65219_chip_data mfd: rohm-bd71828: Constify some structures dt-bindings: mfd: fsl,imx8qxp-csr: Remove binding documentation mfd: axp20x: Set explicit ID for AXP313 regulator ...
2025-07-31bpf: Fix oob access in cgroup local storageDaniel Borkmann1-0/+1
Lonial reported that an out-of-bounds access in cgroup local storage can be crafted via tail calls. Given two programs each utilizing a cgroup local storage with a different value size, and one program doing a tail call into the other. The verifier will validate each of the indivial programs just fine. However, in the runtime context the bpf_cg_run_ctx holds an bpf_prog_array_item which contains the BPF program as well as any cgroup local storage flavor the program uses. Helpers such as bpf_get_local_storage() pick this up from the runtime context: ctx = container_of(current->bpf_ctx, struct bpf_cg_run_ctx, run_ctx); storage = ctx->prog_item->cgroup_storage[stype]; if (stype == BPF_CGROUP_STORAGE_SHARED) ptr = &READ_ONCE(storage->buf)->data[0]; else ptr = this_cpu_ptr(storage->percpu_buf); For the second program which was called from the originally attached one, this means bpf_get_local_storage() will pick up the former program's map, not its own. With mismatching sizes, this can result in an unintended out-of-bounds access. To fix this issue, we need to extend bpf_map_owner with an array of storage_cookie[] to match on i) the exact maps from the original program if the second program was using bpf_get_local_storage(), or ii) allow the tail call combination if the second program was not using any of the cgroup local storage maps. Fixes: 7d9c3427894f ("bpf: Make cgroup storages shared between programs on the same cgroup") Reported-by: Lonial Con <kongln9170@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250730234733.530041-4-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-31bpf: Move cgroup iterator helpers to bpf.hDaniel Borkmann2-13/+14
Move them into bpf.h given we also need them in core code. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250730234733.530041-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-31bpf: Move bpf map owner out of common structDaniel Borkmann1-12/+24
Given this is only relevant for BPF tail call maps, it is adding up space and penalizing other map types. We also need to extend this with further objects to track / compare to. Therefore, lets move this out into a separate structure and dynamically allocate it only for BPF tail call maps. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250730234733.530041-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-31bpf: Add cookie object to bpf mapsDaniel Borkmann1-0/+1
Add a cookie to BPF maps to uniquely identify BPF maps for the timespan when the node is up. This is different to comparing a pointer or BPF map id which could get rolled over and reused. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250730234733.530041-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-31Merge tag 'mips_6.17' of ↵Linus Torvalds2-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS updates from Thomas Bogendoerfer: - DT updates for ralink, mobileye and atheros/qualcomm - Clean up of mc146818 usage - Speed up delay calibration for CPS - Other cleanups and fixes * tag 'mips_6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (50 commits) MIPS: Don't use %pK through printk MIPS: Update Joshua Kinard's e-mail address MIPS: mobileye: dts: eyeq5,eyeq6h: rename the emmc controller MIPS: mm: tlb-r4k: Uniquify TLB entries on init MIPS: SGI-IP27: Delete an unnecessary check before kfree() in hub_domain_free() mips/malta,loongson2ef: use generic mc146818_get_time function mips: remove redundant macro mc146818_decode_year mips/mach-rm: remove custom mc146818rtc.h file mips: remove unused function mc146818_set_rtc_mmss MIPS: CPS: Optimise delay CPU calibration for SMP MIPS: CPS: Improve mips_cps_first_online_in_cluster() MIPS: disable MMID when not supported by the hardware MIPS: eyeq5_defconfig: add I2C subsystem, driver and temp sensor driver MIPS: eyeq5_defconfig: add GPIO subsystem & driver MIPS: mobileye: eyeq5: add two GPIO bank nodes MIPS: mobileye: eyeq5: add evaluation board I2C temp sensor MIPS: mobileye: eyeq5: add 5 I2C controller nodes MIPS: eyeq5_defconfig: Update for v6.16-rc1 MIPS: vpe-mt: add missing prototypes for vpe_{alloc,start,stop,free} mips: boot: use 'targets' instead of extra-y in Makefile ...
2025-07-31Merge tag 'fsnotify_for_v6.17-rc1' of ↵Linus Torvalds2-36/+11
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "A couple of small improvements for fsnotify subsystem. The most interesting is probably Amir's change modifying the meaning of fsnotify fmode bits (and I spell it out specifically because I know you care about those). There's no change for the common cases of no fsnotify watches or no permission event watches. But when there are permission watches (either for open or for pre-content events) but no FAN_ACCESS_PERM watch (which nobody uses in practice) we are now able optimize away unnecessary cache loads from the read path" * tag 'fsnotify_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: optimize FMODE_NONOTIFY_PERM for the common cases fsnotify: merge file_set_fsnotify_mode_from_watchers() with open perm hook samples: fix building fs-monitor on musl systems fanotify: sanitize handle_type values when reporting fid
2025-07-31Merge tag 'ubifs-for-linus-6.17-rc1' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI and UBIFS updates from Richard Weinberger: "UBIFS: - No longer use write_cache_pages() UBI: - Remove an unused function" * tag 'ubifs-for-linus-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: stop using write_cache_pages mtd: ubi: Remove unused ubi_flush
2025-07-31Merge tag 'nand/for-6.17' into mtd/nextMiquel Raynal20-70/+147
* Raw NAND changes: Various controller drivers received minor fixes like DMA mapping checks, better timing derivations or bitflip statistics. It has also been discovered that some Hynix NAND flashes were not supporting read-retries, which is not properly supported. * SPI NAND changes: In order to support high-speed modes, certain chips need extra configuration like adding more dummy cycles. This is now possible, especially on Winbond chips. Aside from that, Gigadevice gets support for a new chip (GD5F1GM9). Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-07-31Merge tag 'v6.17-p1' of ↵Linus Torvalds3-6/+5
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Allow hash drivers without fallbacks (e.g., hardware key) Algorithms: - Add hmac hardware key support (phmac) on s390 - Re-enable sha384 in FIPS mode - Disable sha1 in FIPS mode - Convert zstd to acomp Drivers: - Lower priority of qat skcipher and aead - Convert aspeed to partial block API - Add iMX8QXP support in caam - Add rate limiting support for GEN6 devices in qat - Enable telemetry for GEN6 devices in qat - Implement full backlog mode for hisilicon/sec2" * tag 'v6.17-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits) crypto: keembay - Use min() to simplify ocs_create_linked_list_from_sg() crypto: hisilicon/hpre - fix dma unmap sequence crypto: qat - make adf_dev_autoreset() static crypto: ccp - reduce stack usage in ccp_run_aes_gcm_cmd crypto: qat - refactor ring-related debug functions crypto: qat - fix seq_file position update in adf_ring_next() crypto: qat - fix DMA direction for compression on GEN2 devices crypto: jitter - replace ARRAY_SIZE definition with header include crypto: engine - remove {prepare,unprepare}_crypt_hardware callbacks crypto: engine - remove request batching support crypto: qat - flush misc workqueue during device shutdown crypto: qat - enable rate limiting feature for GEN6 devices crypto: qat - add compression slice count for rate limiting crypto: qat - add get_svc_slice_cnt() in device data structure crypto: qat - add adf_rl_get_num_svc_aes() in rate limiting crypto: qat - relocate service related functions crypto: qat - consolidate service enums crypto: qat - add decompression service for rate limiting crypto: qat - validate service in rate limiting sysfs api crypto: hisilicon/sec2 - implement full backlog mode for sec ...
2025-07-31Merge tag 'docs-6.17' of git://git.lwn.net/linuxLinus Torvalds1-0/+8
Pull documentation updates from Jonathan Corbet: "It has been a relatively busy cycle for docs, especially the build system: - The Perl kernel-doc script was added to 2.3.52pre1 just after the turn of the millennium. Over the following 25 years, it accumulated a vast amount of cruft, all in a language few people want to deal with anymore. Mauro's Python replacement in 6.16 faithfully reproduced all of the cruft in the hope of avoiding regressions. Now that we have a more reasonable code base, though, we can work on cleaning it up; many of the changes this time around are toward that end. - A reorganization of the ext4 docs into the usual TOC format. - Various Chinese translations and updates. - A new script from Mauro to help with docs-build testing. - A new document for linked lists - A sweep through MAINTAINERS fixing broken GitHub git:// repository links. ...and lots of fixes and updates" * tag 'docs-6.17' of git://git.lwn.net/linux: (147 commits) scripts: add origin commit identification based on specific patterns sphinx: kernel_abi: fix performance regression with O=<dir> Documentation: core-api: entry: Replace deprecated KVM entry/exit functions docs: fault-injection: drop reference to md-faulty docs: document linked lists scripts: kdoc: make it backward-compatible with Python 3.7 docs: kernel-doc: emit warnings for ancient versions of Python Documentation/rtla: Describe exit status Documentation/rtla: Add include common_appendix.rst docs: kernel: Clarify printk_ratelimit_burst reset behavior Documentation: ioctl-number: Don't repeat macro names Documentation: ioctl-number: Shorten macros table Documentation: ioctl-number: Correct full path to papr-physical-attestation.h Documentation: ioctl-number: Extend "Include File" column width Documentation: ioctl-number: Fix linuxppc-dev mailto link overlayfs.rst: fix typos docs: kdoc: emit a warning for ancient versions of Python docs: kdoc: clean up check_sections() docs: kdoc: directly access the always-there KdocItem fields docs: kdoc: straighten up dump_declaration() ...
2025-07-31bitfield: Ensure the return values of helper functions are checkedBen Horgan1-4/+4
As type##_replace_bits() has no side effects it is only useful if its return value is checked. Add __must_check to enforce this usage. To have the bits replaced in-place typep##_replace_bits() can be used instead. Although, type_##_get_bits() and type_##_encode_bits() are harder to misuse they are still only useful if the return value is checked. For consistency, also add __must_check to these. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
2025-07-31bits: unify the non-asm GENMASK*()Vincent Mailhol1-22/+4
The newly introduced GENMASK_TYPE() macro can also be used to generate the pre-existing non-asm GENMASK*() variants. Apply GENMASK_TYPE() to GENMASK(), GENMASK_ULL() and GENMASK_U128(). Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
2025-07-31bits: split the definition of the asm and non-asm GENMASK*()Vincent Mailhol1-6/+7
In an upcoming change, the non-asm GENMASK*() will all be unified to depend on GENMASK_TYPE() which indirectly depend on sizeof(), something not available in asm. Instead of adding further complexity to GENMASK_TYPE() to make it work for both asm and non asm, just split the definition of the two variants. Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
2025-07-31cpumask: Remove unnecessary cpumask_nth_andnot()Shaopeng Tan2-43/+0
Commit 94f753143028("x86/resctrl: Optimize cpumask_any_housekeeping()") switched the only user of cpumask_nth_andnot() to other cpumask functions, but left the function cpumask_nth_andnot() unused. This makes function find_nth_andnot_bit() unused as well. Delete them. Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
2025-07-31unwind: Finish up unwind when a task exitsSteven Rostedt1-0/+3
On do_exit() when a task is exiting, if a unwind is requested and the deferred user stacktrace is deferred via the task_work, the task_work callback is called after exit_mm() is called in do_exit(). This means that the user stack trace will not be retrieved and an empty stack is created. Instead, add a function unwind_deferred_task_exit() and call it just before exit_mm() so that the unwinder can call the requested callbacks with the user space stack. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182406.504259474@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-31unwind: Add USED bit to only have one conditional on way back to user spaceSteven Rostedt1-9/+9
On the way back to user space, the function unwind_reset_info() is called unconditionally (but always inlined). It currently has two conditionals. One that checks the unwind_mask which is set whenever a deferred trace is called and is used to know that the mask needs to be cleared. The other checks if the cache has been allocated, and if so, it resets the nr_entries so that the unwinder knows it needs to do the work to get a new user space stack trace again (it only does it once per entering the kernel). Use one of the bits in the unwind mask as a "USED" bit that gets set whenever a trace is created. This will make it possible to only check the unwind_mask in the unwind_reset_info() to know if it needs to do work or not and eliminates a conditional that happens every time the task goes back to user space. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182406.155422551@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-31unwind deferred: Add unwind_completed mask to stop spurious callbacksSteven Rostedt2-1/+4
If there's more than one registered tracer to the unwind deferred infrastructure, it is currently possible that one tracer could cause extra callbacks to happen for another tracer if the former requests a deferred stacktrace after the latter's callback was executed and before the task went back to user space. Here's an example of how this could occur: [Task enters kernel] tracer 1 request -> add cookie to its buffer tracer 1 request -> add cookie to its buffer <..> [ task work executes ] tracer 1 callback -> add trace + cookie to its buffer [tracer 2 requests and triggers the task work again] [ task work executes again ] tracer 1 callback -> add trace + cookie to its buffer tracer 2 callback -> add trace + cookie to its buffer [Task exits back to user space] This is because the bit for tracer 1 gets set in the task's unwind_mask when it did its request and does not get cleared until the task returns back to user space. But if another tracer were to request another deferred stacktrace, then the next task work will executed all tracer's callbacks that have their bits set in the task's unwind_mask. To fix this issue, add another mask called unwind_completed and place it into the task's info->cache structure. The cache structure is allocated on the first occurrence of a deferred stacktrace and this unwind_completed mask is not needed until then. It's better to have it in the cache than to permanently waste space in the task_struct. After a tracer's callback is executed, it's bit gets set in this unwind_completed mask. When the task_work enters, it will AND the task's unwind_mask with the inverse of the unwind_completed which will eliminate any work that already had its callback executed since the task entered the kernel. When the task leaves the kernel, it will reset this unwind_completed mask just like it resets the other values as it enters user space. Link: https://lore.kernel.org/all/20250716142609.47f0e4a5@batman.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.989222722@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-31unwind deferred: Use bitmask to determine which callbacks to callSteven Rostedt2-4/+24
In order to know which registered callback requested a stacktrace for when the task goes back to user space, add a bitmask to keep track of all registered tracers. The bitmask is the size of long, which means that on a 32 bit machine, it can have at most 32 registered tracers, and on 64 bit, it can have at most 64 registered tracers. This should not be an issue as there should not be more than 10 (unless BPF can abuse this?). When a tracer registers with unwind_deferred_init() it will get a bit number assigned to it. When a tracer requests a stacktrace, it will have its bit set within the task_struct. When the task returns back to user space, it will call the callbacks for all the registered tracers where their bits are set in the task's mask. When a tracer is removed by the unwind_deferred_cancel() all current tasks will clear the associated bit, just in case another tracer gets registered immediately afterward and then gets their callback called unexpectedly. To prevent live locks from happening if an event that happens between the task_work and when the task goes back to user space, triggers the deferred unwind, have the unwind_mask get cleared on exit to user space and not after the callback is made. Move the pending bit from a value on the task_struct to bit zero of the unwind_mask (saves space on the task_struct). This will allow modifying the pending bit along with the work bits atomically. Instead of clearing a work's bit after its callback is called, it is delayed until exit. If the work is requested again, the task_work is not queued again and the request will be notified that the task has already been called by returning a positive number (the same as if it was already pending). The pending bit is cleared before calling the callback functions but the current work bits remain. If one of the called works registers again, it will not trigger a task_work if its bit is still present in the task's unwind_mask. If a new work requests a deferred unwind, then it will set both the pending bit and its own bit. Note this will also cause any work that was previously queued and had their callback already executed to be executed again. Future work will remove these spurious callbacks. The use of atomic_long bit operations were suggested by Peter Zijlstra: Link: https://lore.kernel.org/all/20250715102912.GQ1613200@noisy.programming.kicks-ass.net/ The unwind_mask could not be converted to atomic_long_t do to atomic_long not having all the bit operations needed by unwind_mask. Instead it follows other use cases in the kernel and just typecasts the unwind_mask to atomic_long_t when using the two atomic_long functions. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.822789300@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-31unwind_user/deferred: Add deferred unwinding interfaceJosh Poimboeuf2-0/+48
Add an interface for scheduling task work to unwind the user space stack before returning to user space. This solves several problems for its callers: - Ensure the unwind happens in task context even if the caller may be running in interrupt context. - Avoid duplicate unwinds, whether called multiple times by the same caller or by different callers. - Create a "context cookie" which allows trace post-processing to correlate kernel unwinds/traces with the user unwind. A concept of a "cookie" is created to detect when the stacktrace is the same. A cookie is generated the first time a user space stacktrace is requested after the task enters the kernel. As the stacktrace is saved on the task_struct while the task is in the kernel, if another request comes in, if the cookie is still the same, it will use the saved stacktrace, and not have to regenerate one. The cookie is passed to the caller on request, and when the stacktrace is generated upon returning to user space, it calls the requester's callback with the cookie as well as the stacktrace. The cookie is cleared when it goes back to user space. Note, this currently adds another conditional to the unwind_reset_info() path that is always called returning to user space, but future changes will put this back to a single conditional. A global list is created and protected by a global mutex that holds tracers that register with the unwind infrastructure. The number of registered tracers will be limited in future changes. Each perf program or ftrace instance will register its own descriptor to use for deferred unwind stack traces. Note, in the function unwind_deferred_task_work() that gets called when returning to user space, it uses a global mutex for synchronization which will cause a big bottleneck. This will be replaced by SRCU, but that change adds some complex synchronization that deservers its own commit. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Jens Remus <jremus@linux.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.488066537@kernel.org Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-31unwind_user/deferred: Add unwind cacheJosh Poimboeuf3-1/+16
Cache the results of the unwind to ensure the unwind is only performed once, even when called by multiple tracers. The cache nr_entries gets cleared every time the task exits the kernel. When a stacktrace is requested, nr_entries gets set to the number of entries in the stacktrace. If another stacktrace is requested, if nr_entries is not zero, then it contains the same stacktrace that would be retrieved so it is not processed again and the entries is given to the caller. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.319691167@kernel.org Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-By: Indu Bhagat <indu.bhagat@oracle.com> Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-31module: Rename MAX_PARAM_PREFIX_LEN to __MODULE_NAME_LENPetr Pavlu2-5/+9
The maximum module name length (MODULE_NAME_LEN) is somewhat confusingly defined in terms of the maximum parameter prefix length (MAX_PARAM_PREFIX_LEN), when in fact the dependency is in the opposite direction. This split originates from commit 730b69d22525 ("module: check kernel param length at compile time, not runtime"). The code needed to use MODULE_NAME_LEN in moduleparam.h, but because module.h requires moduleparam.h, this created a circular dependency. It was resolved by introducing MAX_PARAM_PREFIX_LEN in moduleparam.h and defining MODULE_NAME_LEN in module.h in terms of MAX_PARAM_PREFIX_LEN. Rename MAX_PARAM_PREFIX_LEN to __MODULE_NAME_LEN for clarity. This matches the similar approach of defining MODULE_INFO in module.h and __MODULE_INFO in moduleparam.h. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Link: https://lore.kernel.org/r/20250630143535.267745-6-petr.pavlu@suse.com Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
2025-07-31module: Restore the moduleparam prefix length checkPetr Pavlu1-3/+2
The moduleparam code allows modules to provide their own definition of MODULE_PARAM_PREFIX, instead of using the default KBUILD_MODNAME ".". Commit 730b69d22525 ("module: check kernel param length at compile time, not runtime") added a check to ensure the prefix doesn't exceed MODULE_NAME_LEN, as this is what param_sysfs_builtin() expects. Later, commit 58f86cc89c33 ("VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.") removed this check, but there is no indication this was intentional. Since the check is still useful for param_sysfs_builtin() to function properly, reintroduce it in __module_param_call(), but in a modernized form using static_assert(). While here, clean up the __module_param_call() comments. In particular, remove the comment "Default value instead of permissions?", which comes from commit 9774a1f54f17 ("[PATCH] Compile-time check re world-writeable module params"). This comment was related to the test variable __param_perm_check_##name, which was removed in the previously mentioned commit 58f86cc89c33. Fixes: 58f86cc89c33 ("VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Link: https://lore.kernel.org/r/20250630143535.267745-4-petr.pavlu@suse.com Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
2025-07-31module: make structure definitions always visibleThomas Weißschuh1-10/+10
To write code that works with both CONFIG_MODULES=y and CONFIG_MODULES=n it is convenient to use "if (IS_ENABLED(CONFIG_MODULES))" over raw #ifdef. The code will still fully typechecked but the unreachable parts are discarded by the compiler. This prevents accidental breakage when a certain kconfig combination was not specifically tested by the developer. This pattern is already supported to some extend by module.h defining empty stub functions if CONFIG_MODULES=n. However some users of module.h work on the structured defined by module.h. Therefore these structure definitions need to be visible, too. Many structure members are still gated by specific configuration settings. The assumption for those is that the code using them will be gated behind the same configuration setting anyways. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Link: https://lore.kernel.org/r/20250711-kunit-ifdef-modules-v2-2-39443decb1f8@linutronix.de Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
2025-07-31module: move 'struct module_use' to internal.hThomas Weißschuh1-7/+0
The struct was moved to the public header file in commit c8e21ced08b3 ("module: fix kdb's illicit use of struct module_use."). Back then the structure was used outside of the module core. Nowadays this is not true anymore, so the structure can be made internal. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Link: https://lore.kernel.org/r/20250711-kunit-ifdef-modules-v2-1-39443decb1f8@linutronix.de Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
2025-07-31ata: libata-sata: Add link_power_management_supported sysfs attributeDamien Le Moal1-0/+1
A port link power management (LPM) policy can be controlled using the link_power_management_policy sysfs host attribute. However, this attribute exists also for hosts that do not support LPM and in such case, attempting to change the LPM policy for the host (port) will fail with -EOPNOTSUPP. Introduce the new sysfs link_power_management_supported host attribute to indicate to the user if a the port and the devices connected to the port for the host support LPM, which implies that the link_power_management_policy attribute can be used. Since checking that a port and its devices support LPM is common between the new ata_scsi_lpm_supported_show() function and the existing ata_scsi_lpm_store() function, the new helper ata_scsi_lpm_supported() is introduced. Fixes: 413e800cadbf ("ata: libata-sata: Disallow changing LPM state if not supported") Reported-by: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202507251014.a5becc3b-lkp@intel.com Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-07-31Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds6-15/+146
Pull drm updates from Dave Airlie: "Highlights: - Intel xe enable Panthor Lake, started adding WildCat Lake - amdgpu has a bunch of reset improvments along with the usual IP updates - msm got VM_BIND support which is important for vulkan sparse memory - more drm_panic users - gpusvm common code to handle a bunch of core SVM work outside drivers. Detail summary: Changes outside drm subdirectory: - 'shrink_shmem_memory()' for better shmem/hibernate interaction - Rust support infrastructure: - make ETIMEDOUT available - add size constants up to SZ_2G - add DMA coherent allocation bindings - mtd driver for Intel GPU non-volatile storage - i2c designware quirk for Intel xe core: - atomic helpers: tune enable/disable sequences - add task info to wedge API - refactor EDID quirks - connector: move HDR sink to drm_display_info - fourcc: half-float and 32-bit float formats - mode_config: pass format info to simplify dma-buf: - heaps: Give CMA heap a stable name ci: - add device tree validation and kunit displayport: - change AUX DPCD access probe address - add quirk for DPCD probe - add panel replay definitions - backlight control helpers fbdev: - make CONFIG_FIRMWARE_EDID available on all arches fence: - fix UAF issues format-helper: - improve tests gpusvm: - introduce devmem only flag for allocation - add timeslicing support to GPU SVM ttm: - improve eviction sched: - tracing improvements - kunit improvements - memory leak fixes - reset handling improvements color mgmt: - add hardware gamma LUT handling helpers bridge: - add destroy hook - switch to reference counted drm_bridge allocations - tc358767: convert to devm_drm_bridge_alloc - improve CEC handling panel: - switch to reference counter drm_panel allocations - fwnode panel lookup - Huiling hl055fhv028c support - Raspberry Pi 7" 720x1280 support - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK - simple: AUO P238HAN01 - st7701: Winstar wf40eswaa6mnn0 - visionox: rm69299-shift - Renesas R61307, Renesas R69328 support - DJN HX83112B hdmi: - add CEC handling - YUV420 output support xe: - WildCat Lake support - Enable PanthorLake by default - mark BMG as SRIOV capable - update firmware recommendations - Expose media OA units - aux-bux support for non-volatile memory - MTD intel-dg driver for non-volatile memory - Expose fan control and voltage regulator in sysfs - restructure migration for multi-device - Restore GuC submit UAF fix - make GEM shrinker drm managed - SRIOV VF Post-migration recovery of GGTT nodes - W/A additions/reworks - Prefetch support for svm ranges - Don't allocate managed BO for each policy change - HWMON fixes for BMG - Create LRC BO without VM - PCI ID updates - make SLPC debugfs files optional - rework eviction rejection of bound external BOs - consolidate PAT programming logic for pre/post Xe2 - init changes for flicker-free boot - Enable GuC Dynamic Inhibit Context switch i915: - drm_panic support for i915/xe - initial flip queue off by default for LNL/PNL - Wildcat Lake Display support - Support for DSC fractional link bpp - Support for simultaneous Panel Replay and Adaptive sync - Support for PTL+ double buffer LUT - initial PIPEDMC event handling - drm_panel_follower support - DPLL interface renames - allocate struct intel_display dynamically - flip queue preperation - abstract DRAM detection better - avoid GuC scheduling stalls - remove DG1 force probe requirement - fix MEI interrupt handler on RT kernels - use backlight control helpers for eDP - more shared display code refactoring amdgpu: - add userq slot to INFO ioctl - SR-IOV hibernation support - Suspend improvements - Backlight improvements - Use scaling for non-native eDP modes - cleaner shader updates for GC 9.x - Remove fence slab - SDMA fw checks for userq support - RAS updates - DMCUB updates - DP tunneling fixes - Display idle D3 support - Per queue reset improvements - initial smartmux support amdkfd: - enable KFD on loongarch - mtype fix for ext coherent system memory radeon: - CS validation additional GL extensions - drop console lock during suspend/resume - bump driver version msm: - VM BIND support - CI: infrastructure updates - UBWC single source of truth - decouple GPU and KMS support - DP: rework I/O accessors - DPU: SM8750 support - DSI: SM8750 support - GPU: X1-45 support and speedbin support for X1-85 - MDSS: SM8750 support nova: - register! macro improvements - DMA object abstraction - VBIOS parser + fwsec lookup - sysmem flush page support - falcon: generic falcon boot code and HAL - FWSEC-FRTS: fb setup and load/execute ivpu: - Add Wildcat Lake support - Add turbo flag ast: - improve hardware generations implementation imx: - IMX8qxq Display Controller support lima: - Rockchip RK3528 GPU support nouveau: - fence handling cleanup panfrost: - MT8370 support - bo labeling - 64-bit register access qaic: - add RAS support rockchip: - convert inno_hdmi to a bridge rz-du: - add RZ/V2H(P) support - MIPI-DSI DCS support sitronix: - ST7567 support sun4i: - add H616 support tidss: - add TI AM62L support - AM65x OLDI bridge support bochs: - drm panic support vkms: - YUV and R* format support - use faux device vmwgfx: - fence improvements hyperv: - move out of simple - add drm_panic support" * tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits) drm/tidss: oldi: convert to devm_drm_bridge_alloc() API drm/tidss: encoder: convert to devm_drm_bridge_alloc() drm/amdgpu: move reset support type checks into the caller drm/amdgpu/sdma7: re-emit unprocessed state on ring reset drm/amdgpu/sdma6: re-emit unprocessed state on ring reset drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset drm/amdgpu/sdma5: re-emit unprocessed state on ring reset drm/amdgpu/gfx12: re-emit unprocessed state on ring reset drm/amdgpu/gfx11: re-emit unprocessed state on ring reset drm/amdgpu/gfx10: re-emit unprocessed state on ring reset drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset drm/amdgpu: Add WARN_ON to the resource clear function drm/amd/pm: Use cached metrics data on SMUv13.0.6 drm/amd/pm: Use cached data for min/max clocks gpu: nova-core: fix bounds check in PmuLookupTableEntry::new drm/amdgpu: Replace HQD terminology with slots naming drm/amdgpu: Add user queue instance count in HW IP info drm/amd/amdgpu: Add helper functions for isp buffers drm/amd/amdgpu: Initialize swnode for ISP MFD device ...
2025-07-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds13-85/+472
Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ...
2025-07-31i3c: Add more parameters for controllers to the headerWolfram Sang1-0/+4
Add standard timing value definition from specification. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250724094146.6443-3-wsa+renesas@sang-engineering.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-07-31i3c: Standardize defines for specification parametersWolfram Sang1-4/+5
Align existing defines to follow the consistent pattern: I3C_BUS_<PARAM>_<MAX|MIN|TYP>_<UNIT>. Prepare the codebase for adding new parameters and help avoid duplication. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250724094146.6443-2-wsa+renesas@sang-engineering.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-07-31i3c: fix module_i3c_i2c_driver() with I3C=nArnd Bergmann1-2/+2
When CONFIG_I3C is disabled and the i3c_i2c_driver_register() happens to not be inlined, any driver calling it still references the i3c_driver instance, which then causes a link failure: x86_64-linux-ld: drivers/hwmon/lm75.o: in function `lm75_i3c_reg_read': lm75.c:(.text+0xc61): undefined reference to `i3cdev_to_dev' x86_64-linux-ld: lm75.c:(.text+0xd25): undefined reference to `i3c_device_do_priv_xfers' x86_64-linux-ld: lm75.c:(.text+0xdd8): undefined reference to `i3c_device_do_priv_xfers' This issue was part of the original i3c code, but only now caused problems when i3c support got added to lm75. Change the 'inline' annotations in the header to '__always_inline' to ensure that the dead-code-elimination pass in the compiler can optimize it out as intended. Fixes: 6071d10413ff ("hwmon: (lm75) add I3C support for P3T1755") Fixes: 3a379bbcea0a ("i3c: Add core I3C infrastructure") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250725090609.2456262-1-arnd@kernel.org Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-07-31Merge tag 'trace-rv-6.17' of ↵Linus Torvalds4-18/+93
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verification updates from Steven Rostedt: - Added Linear temporal logic monitors for RT application Real-time applications may have design flaws causing them to have unexpected latency. For example, the applications may raise page faults, or may be blocked trying to take a mutex without priority inheritance. However, while attempting to implement DA monitors for these real-time rules, deterministic automaton is found to be inappropriate as the specification language. The automaton is complicated, hard to understand, and error-prone. For these cases, linear temporal logic is found to be more suitable. The LTL is more concise and intuitive. - Make printk_deferred() public The new monitors needed access to printk_deferred(). Make them visible for the entire kernel. - Add a vpanic() to allow for va_list to be passed to panic. - Add rtapp container monitor. A collection of monitors that check for common problems with real-time applications that cause unexpected latency. - Add page fault tracepoints to risc-v These tracepoints are necessary to for the RV monitor to run on risc-v. - Fix the behaviour of the rv tool with -s and idle tasks. - Allow the rv tool to gracefully terminate with SIGTERM - Adjusts dot2c not to create lines over 100 columns - Properly order nested monitors in the RV Kconfig file - Return the registration error in all DA monitor instead of 0 - Update and add new sched collection monitors Replace tss and sncid monitors with more complete sts: Not only prove that switches occur in scheduling context and scheduling needs interrupt disabled but also that each call to the scheduler disables interrupts to (optionally) switch. New monitor: nrp Preemption requires need resched which is cleared by any switch (includes a non optimal workaround for /nested/ preemptions) New monitor: sssw suspension requires setting the task to sleepable and, after the switch occurs, the task requires a wakeup to come back to runnable New monitor: opid waking and need-resched operations occur with interrupts and preemption disabled or in IRQ without explicitly disabling preemption" * tag 'trace-rv-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (48 commits) rv: Add opid per-cpu monitor rv: Add nrp and sssw per-task monitors rv: Replace tss and sncid monitors with more complete sts sched: Adapt sched tracepoints for RV task model rv: Retry when da monitor detects race conditions rv: Adjust monitor dependencies rv: Use strings in da monitors tracepoints rv: Remove trailing whitespace from tracepoint string rv: Add da_handle_start_run_event_ to per-task monitors rv: Fix wrong type cast in reactors_show() and monitor_reactor_show() rv: Fix wrong type cast in monitors_show() rv: Remove struct rv_monitor::reacting rv: Remove rv_reactor's reference counter rv: Merge struct rv_reactor_def into struct rv_reactor rv: Merge struct rv_monitor_def into struct rv_monitor rv: Remove unused field in struct rv_monitor_def rv: Return init error when registering monitors verification/rvgen: Organise Kconfig entries for nested monitors tools/dot2c: Fix generated files going over 100 column limit tools/rv: Stop gracefully also on SIGTERM ...
2025-07-31Merge tag 'trace-ringbuffer-v6.17' of ↵Linus Torvalds1-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer updates from Steven Rostedt: - Rewind persistent ring buffer on boot When the persistent ring buffer is being used for live kernel tracing and the system crashes, the tool that is reading the trace may not have recorded the data when the system crashed. Although the persistent ring buffer still has that data, when reading it after a reboot, it will start where it left off. That is, what was read will not be accessible. Instead, on reboot, have the persistent ring buffer restart where the data starts and this will allow the tooling to recover what was lost when the crash occurred. - Remove the ring_buffer_read_prepare_sync() logic Reading the trace file required stopping writing to the ring buffer as the trace file is only an iterator and does not consume what it read. It was originally not safe to read the ring buffer in this mode and required disabling writing. The ring_buffer_read_prepare_sync() logic was used to stop each per_cpu ring buffer, call synchronize_rcu() and then start the iterator. This was used instead of calling synchronize_rcu() for each per_cpu buffer. Today, the iterator has been updated where it is safe to read the trace file while writing to the ring buffer is still occurring. There is no more need to do this synchronization and it is causing large delays on machines with many CPUs. Remove this unneeded synchronization. - Make static string array a constant in show_irq_str() Making the string array into a constant has shown to decrease code text/data size. * tag 'trace-ringbuffer-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Make the const read-only 'type' static ring-buffer: Remove ring_buffer_read_prepare_sync() tracing: ring_buffer: Rewind persistent ring buffer on reboot