summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-10-12thermal: int340x: processor_thermal: Handle power floor interruptsSrinivas Pandruvada1-1/+8
On thermal device interrupt, if the interrupt is generated for passing power floor status, call the callback to pass notification to the user space. First call proc_thermal_check_power_floor_intr() to check interrupt, if this callback returns true, wake the IRQ thread to call proc_thermal_power_floor_intr_callback() to notify user space. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-12thermal: int340x: processor_thermal: Support power floor notificationsSrinivas Pandruvada5-1/+212
When the hardware reduces the power to the minimum possible, the power floor is notified via an interrupt. This can happen when user space requests a power limit via powercap RAPL interface, which forces the system to enter to the lowest power. This power floor indication can be used as a hint to resort to other methods of reducing power than via RAPL power limit. Before power floor status can be read or the firmware can trigger notifications regarding it, it needs to be configured via a mailbox command. The actual power floor status is read via bit 39 of MMIO offset 0x5B18 of the processor thermal PCI device. To show the current power floor status and get notification on a sysfs attribute, add 2 new attributes to /sys/bus/pci/devices/0000\:00\:04.0/power_limits/ power_floor_enable : This attribute is present when power floor notifications are supported. This attribute allows to enable/disable power floor notifications. power_floor_status : This attribute is present when power floor notifications are supported. When enabled via power_floor_enable, this attribute shows the current power floor status. The power floor implementation provides interfaces which are called from the sysfs callbacks to enable/disable and read power floor status. It also provides two additional interfaces to check if the current processor thermal device interrupt is for power floor status and to send notifications to user space. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog and documentation changes edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-12thermal: int340x: processor_thermal: Set feature mask before proc_thermal_addSrinivas Pandruvada1-11/+10
The function proc_thermal_add() adds sysfs entries for power limits. The feature mask of available features is not present at that time, so it cannot be used by proc_thermal_add() to selectively create sysfs attributes. The feature mask is set by proc_thermal_mmio_add(), so modify the code to call it before proc_thermal_add() so as to allow the latter to use the feature mask. There is no functional impact with this change. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-12thermal: int340x: processor_thermal: Common function to clear SOC interruptSrinivas Pandruvada3-2/+14
The SOC interrupt status register contains multiple interrupt sources (workload hint interrupt and power floor interrupt). It is not possible to clear individual interrupt source with read-modify-write, as it may clear the new interrupt from the firmware after a read operation. It is also not possible to set the interrupt status bit to 1 for the other interrupt source, which is not part of clearing. Hence, create a common function, to clear all status bits at once. Call this function after processing all interrupt sources. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-12thermal: int340x: processor_thermal: Move interrupt status MMIO offset to ↵Srinivas Pandruvada2-1/+2
common header Move define SOC_WT_RES_INT_STATUS_OFFSET to processor_thermal_device.h. This way it can be reused in other modules. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-05thermal: intel: powerclamp: fix mismatch in get function for max_idleDavid Arcari1-1/+1
KASAN reported this [ 444.853098] BUG: KASAN: global-out-of-bounds in param_get_int+0x77/0x90 [ 444.853111] Read of size 4 at addr ffffffffc16c9220 by task cat/2105 ... [ 444.853442] The buggy address belongs to the variable: [ 444.853443] max_idle+0x0/0xffffffffffffcde0 [intel_powerclamp] There is a mismatch between the param_get_int and the definition of max_idle. Replacing param_get_int with param_get_byte resolves this issue. Fixes: ebf519710218 ("thermal: intel: powerclamp: Add two module parameters") Cc: 6.3+ <stable@vger.kernel.org> # 6.3+ Signed-off-by: David Arcari <darcari@redhat.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-05thermal: int340x: Use thermal_zone_for_each_trip()Rafael J. Wysocki1-36/+42
Modify int340x_thermal_update_trips() to use thermal_zone_for_each_trip() for walking trips instead of using the trips[] table passed to the thermal zone registration function. For this purpose, store active trip point indices in the priv fieids of the corresponding thermal_trip structures. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-10-05Merge earlier changes in Intel thermal drivers for v6.7.Rafael J. Wysocki12-157/+764
2023-10-05thermal: core: Add function to walk trips under zone lockRafael J. Wysocki2-0/+17
Add a wrapper around for_each_thermal_trip(), called thermal_zone_for_each_trip(), that will invoke the former under the thermal zone lock and pass its return value to the caller. Two drivers will be modified subsequently to use this new function. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-09-28thermal: core: Allow trip pointers to be used for cooling device bindingRafael J. Wysocki2-20/+42
Add new helper functions, thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip(), to allow a trip pointer to be used for binding a cooling device to a trip point and unbinding it, respectively, and redefine the existing helpers, thermal_zone_bind_cooling_device() and thermal_zone_unbind_cooling_device(), as wrappers around the new ones, respectively. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-09-28thermal: core: Store trip pointer in struct thermal_instanceRafael J. Wysocki9-37/+60
Replace the integer trip number stored in struct thermal_instance with a pointer to the relevant trip and adjust the code using the structure in question accordingly. The main reason for making this change is to allow the trip point to cooling device binding code more straightforward, as illustrated by subsequent modifications of the ACPI thermal driver, but it also helps to clarify the overall design and allows the governor code overhead to be reduced (through subsequent modifications). The only case in which it adds complexity is trip_point_show() that needs to walk the trips[] table to find the index of the given trip point, but this is not a critical path and the interface that trip_point_show() belongs to is problematic anyway (for instance, it doesn't cover the case when the same cooling devices is associated with multiple trip points). This is a preliminary change and the affected code will be refined by a series of subsequent modifications of thermal governors, the core and the ACPI thermal driver. The general functionality is not expected to be affected by this change. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-09-26thermal: trip: Drop redundant trips check from for_each_thermal_trip()Rafael J. Wysocki1-3/+0
It is invalid to call for_each_thermal_trip() on an unregistered thermal zone anyway, and as per thermal_zone_device_register_with_trips(), the trips[] table must be present if num_trips is greater than zero for the given thermal zone. Hence, the trips check in for_each_thermal_trip() is redundant and so it can be dropped. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-09-25thermal: core: Drop trips_disabled bitmaskRafael J. Wysocki2-15/+0
After recent changes, thermal_zone_get_trip() cannot fail, as invoked from thermal_zone_device_register_with_trips(), so the only role of the trips_disabled bitmask is struct thermal_zone_device is to make handle_thermal_trip() skip trip points whose temperature was initially zero. However, since the unit of temperature in the thermal core is millicelsius, zero may very well be a valid temperature value at least in some usage scenarios and the trip temperature may as well change later. Thus there is no reason to permanently disable trip points with initial temperature equal to zero. Accordingly, drop the trips_disabled bitmask along with the code related to it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-09-25Linux 6.6-rc3v6.6-rc3Linus Torvalds1-1/+1
2023-09-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds20-161/+209
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix EL2 Stage-1 MMIO mappings where a random address was used - Fix SMCCC function number comparison when the SVE hint is set RISC-V: - Fix KVM_GET_REG_LIST API for ISA_EXT registers - Fix reading ISA_EXT register of a missing extension - Fix ISA_EXT register handling in get-reg-list test - Fix filtering of AIA registers in get-reg-list test x86: - Fixes for TSC_AUX virtualization - Stop zapping page tables asynchronously, since we don't zap them as often as before" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX KVM: SVM: Fix TSC_AUX virtualization setup KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe() KVM: x86/mmu: Open code leaf invalidation from mmu_notifier KVM: riscv: selftests: Selectively filter-out AIA registers KVM: riscv: selftests: Fix ISA_EXT register handling in get-reg-list RISC-V: KVM: Fix riscv_vcpu_get_isa_ext_single() for missing extensions RISC-V: KVM: Fix KVM_GET_REG_LIST API for ISA_EXT registers KVM: selftests: Assert that vasprintf() is successful KVM: arm64: nvhe: Ignore SVE hint in SMCCC function ID KVM: arm64: Properly return allocated EL2 VA from hyp_alloc_private_va_range()
2023-09-24Merge tag 'trace-v6.6-rc2' of ↵Linus Torvalds2-30/+85
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix the "bytes" output of the per_cpu stat file The tracefs/per_cpu/cpu*/stats "bytes" was giving bogus values as the accounting was not accurate. It is suppose to show how many used bytes are still in the ring buffer, but even when the ring buffer was empty it would still show there were bytes used. - Fix a bug in eventfs where reading a dynamic event directory (open) and then creating a dynamic event that goes into that diretory screws up the accounting. On close, the newly created event dentry will get a "dput" without ever having a "dget" done for it. The fix is to allocate an array on dir open to save what dentries were actually "dget" on, and what ones to "dput" on close. * tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Remember what dentries were created on dir open ring-buffer: Fix bytes info in per_cpu buffer stats
2023-09-24Merge tag 'cxl-fixes-6.6-rc3' of ↵Linus Torvalds8-33/+60
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dan Williams: "A collection of regression fixes, bug fixes, and some small cleanups to the Compute Express Link code. The regressions arrived in the v6.5 dev cycle and missed the v6.6 merge window due to my personal absences this cycle. The most important fixes are for scenarios where the CXL subsystem fails to parse valid region configurations established by platform firmware. This is important because agreement between OS and BIOS on the CXL configuration is fundamental to implementing "OS native" error handling, i.e. address translation and component failure identification. Other important fixes are a driver load error when the BIOS lets the Linux PCI core handle AER events, but not CXL memory errors. The other fixex might have end user impact, but for now are only known to trigger in our test/emulation environment. Summary: - Fix multiple scenarios where platform firmware defined regions fail to be assembled by the CXL core. - Fix a spurious driver-load failure on platforms that enable OS native AER, but not OS native CXL error handling. - Fix a regression detecting "poison" commands when "security" commands are also defined. - Fix a cxl_test regression with the move to centralize CXL port register enumeration in the CXL core. - Miscellaneous small fixes and cleanups" * tag 'cxl-fixes-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/acpi: Annotate struct cxl_cxims_data with __counted_by cxl/port: Fix cxl_test register enumeration regression cxl/region: Refactor granularity select in cxl_port_setup_targets() cxl/region: Match auto-discovered region decoders by HPA range cxl/mbox: Fix CEL logic for poison and security commands cxl/pci: Replace host_bridge->native_aer with pcie_aer_is_native() PCI/AER: Export pcie_aer_is_native() cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers
2023-09-23Merge tag 'gpio-fixes-for-v6.6-rc3' of ↵Linus Torvalds3-39/+29
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix an invalid usage of __free(kfree) leading to kfreeing an ERR_PTR() - fix an irq domain leak in gpio-tb10x - MAINTAINERS update * tag 'gpio-fixes-for-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: sim: fix an invalid __free() usage gpio: tb10x: Fix an error handling path in tb10x_gpio_probe() MAINTAINERS: gpio-regmap: make myself a maintainer of it
2023-09-23Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of ↵Linus Torvalds16-91/+111
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "13 hotfixes, 10 of which pertain to post-6.5 issues. The other three are cc:stable" * tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: proc: nommu: fix empty /proc/<pid>/maps filemap: add filemap_map_order0_folio() to handle order0 folio proc: nommu: /proc/<pid>/maps: release mmap read lock mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement pidfd: prevent a kernel-doc warning argv_split: fix kernel-doc warnings scatterlist: add missing function params to kernel-doc selftests/proc: fixup proc-empty-vm test after KSM changes revert "scripts/gdb/symbols: add specific ko module load command" selftests: link libasan statically for tests with -fsanitize=address task_work: add kerneldoc annotation for 'data' argument mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning
2023-09-23Merge tag '6.6-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds11-27/+60
Pull smb client fixes from Steve French: "Six smb3 client fixes, including three for stable, from the SMB plugfest (testing event) this week: - Reparse point handling fix (found when investigating dir enumeration when fifo in dir) - Fix excessive thread creation for dir lease cleanup - UAF fix in negotiate path - remove duplicate error message mapping and fix confusing warning message - add dynamic trace point to improve debugging RDMA connection attempts" * tag '6.6-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: fix confusing debug message smb: client: handle STATUS_IO_REPARSE_TAG_NOT_HANDLED smb3: remove duplicate error mapping cifs: Fix UAF in cifs_demultiplex_thread() smb3: do not start laundromat thread when dir leases disabled smb3: Add dynamic trace points for RDMA (smbdirect) reconnect
2023-09-23Merge tag 'i2c-for-6.6-rc3' of ↵Linus Torvalds6-2/+29
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "A set of I2C driver fixes. Mostly fixing resource leaks or sanity checks" * tag 'i2c-for-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: xiic: Correct return value check for xiic_reinit() i2c: mux: gpio: Add missing fwnode_handle_put() i2c: mux: demux-pinctrl: check the return value of devm_kstrdup() i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low i2c: i801: unregister tco_pdev in i801_probe() error path
2023-09-23mfd: cs42l43: Use correct macro for new-style PM runtime opsCharles Keepax1-2/+2
The code was accidentally mixing new and old style macros, update the macros used to remove an unused function warning whilst building with no PM enabled in the config. Fixes: ace6d1448138 ("mfd: cs42l43: Add support for cs42l43 core driver") Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://lore.kernel.org/all/20230822114914.340359-1-ckeepax@opensource.cirrus.com/ Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Lee Jones <lee@kernel.org> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-09-23Merge tag 'loongarch-fixes-6.6-1' of ↵Linus Torvalds26-134/+177
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Fix lockdep, fix a boot failure, fix some build warnings, fix document links, and some cleanups" * tag 'loongarch-fixes-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: docs/zh_CN/LoongArch: Update the links of ABI docs/LoongArch: Update the links of ABI LoongArch: Don't inline kasan_mem_to_shadow()/kasan_shadow_to_mem() kasan: Cleanup the __HAVE_ARCH_SHADOW_MAP usage LoongArch: Set all reserved memblocks on Node#0 at initialization LoongArch: Remove dead code in relocate_new_kernel LoongArch: Use _UL() and _ULL() LoongArch: Fix some build warnings with W=1 LoongArch: Fix lockdep static memory detection
2023-09-23Merge tag 's390-6.6-3' of ↵Linus Torvalds4-13/+25
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix potential string buffer overflow in hypervisor user-defined certificates handling - Update defconfigs * tag 's390-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/cert_store: fix string length handling s390: update defconfigs
2023-09-23Merge tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2-23/+32
Pull iomap fixes from Darrick Wong: - Return EIO on bad inputs to iomap_to_bh instead of BUGging, to deal less poorly with block device io racing with block device resizing - Fix a stale page data exposure bug introduced in 6.6-rc1 when unsharing a file range that is not in the page cache * tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: convert iomap_unshare_iter to use large folios iomap: don't skip reading in !uptodate folios when unsharing a range iomap: handle error conditions more gracefully in iomap_to_bh
2023-09-23Merge tag 'kvm-riscv-fixes-6.6-1' of https://github.com/kvm-riscv/linux into ↵Paolo Bonzini350-1241/+2321
HEAD KVM/riscv fixes for 6.6, take #1 - Fix KVM_GET_REG_LIST API for ISA_EXT registers - Fix reading ISA_EXT register of a missing extension - Fix ISA_EXT register handling in get-reg-list test - Fix filtering of AIA registers in get-reg-list test
2023-09-23KVM: SVM: Do not use user return MSR support for virtualized TSC_AUXTom Lendacky1-1/+33
When the TSC_AUX MSR is virtualized, the TSC_AUX value is swap type "B" within the VMSA. This means that the guest value is loaded on VMRUN and the host value is restored from the host save area on #VMEXIT. Since the value is restored on #VMEXIT, the KVM user return MSR support for TSC_AUX can be replaced by populating the host save area with the current host value of TSC_AUX. And, since TSC_AUX is not changed by Linux post-boot, the host save area can be set once in svm_hardware_enable(). This eliminates the two WRMSR instructions associated with the user return MSR support. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <d381de38eb0ab6c9c93dda8503b72b72546053d7.1694811272.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23KVM: SVM: Fix TSC_AUX virtualization setupTom Lendacky3-12/+29
The checks for virtualizing TSC_AUX occur during the vCPU reset processing path. However, at the time of initial vCPU reset processing, when the vCPU is first created, not all of the guest CPUID information has been set. In this case the RDTSCP and RDPID feature support for the guest is not in place and so TSC_AUX virtualization is not established. This continues for each vCPU created for the guest. On the first boot of an AP, vCPU reset processing is executed as a result of an APIC INIT event, this time with all of the guest CPUID information set, resulting in TSC_AUX virtualization being enabled, but only for the APs. The BSP always sees a TSC_AUX value of 0 which probably went unnoticed because, at least for Linux, the BSP TSC_AUX value is 0. Move the TSC_AUX virtualization enablement out of the init_vmcb() path and into the vcpu_after_set_cpuid() path to allow for proper initialization of the support after the guest CPUID information has been set. With the TSC_AUX virtualization support now in the vcpu_set_after_cpuid() path, the intercepts must be either cleared or set based on the guest CPUID input. Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <4137fbcb9008951ab5f0befa74a0399d2cce809a.1694811272.git.thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23KVM: SVM: INTERCEPT_RDTSCP is never intercepted anywayPaolo Bonzini1-4/+1
svm_recalc_instruction_intercepts() is always called at least once before the vCPU is started, so the setting or clearing of the RDTSCP intercept can be dropped from the TSC_AUX virtualization support. Extracted from a patch by Tom Lendacky. Cc: stable@vger.kernel.org Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronouslySean Christopherson6-103/+68
Stop zapping invalidate TDP MMU roots via work queue now that KVM preserves TDP MMU roots until they are explicitly invalidated. Zapping roots asynchronously was effectively a workaround to avoid stalling a vCPU for an extended during if a vCPU unloaded a root, which at the time happened whenever the guest toggled CR0.WP (a frequent operation for some guest kernels). While a clever hack, zapping roots via an unbound worker had subtle, unintended consequences on host scheduling, especially when zapping multiple roots, e.g. as part of a memslot. Because the work of zapping a root is no longer bound to the task that initiated the zap, things like the CPU affinity and priority of the original task get lost. Losing the affinity and priority can be especially problematic if unbound workqueues aren't affined to a small number of CPUs, as zapping multiple roots can cause KVM to heavily utilize the majority of CPUs in the system, *beyond* the CPUs KVM is already using to run vCPUs. When deleting a memslot via KVM_SET_USER_MEMORY_REGION, the async root zap can result in KVM occupying all logical CPUs for ~8ms, and result in high priority tasks not being scheduled in in a timely manner. In v5.15, which doesn't preserve unloaded roots, the issues were even more noticeable as KVM would zap roots more frequently and could occupy all CPUs for 50ms+. Consuming all CPUs for an extended duration can lead to significant jitter throughout the system, e.g. on ChromeOS with virtio-gpu, deleting memslots is a semi-frequent operation as memslots are deleted and recreated with different host virtual addresses to react to host GPU drivers allocating and freeing GPU blobs. On ChromeOS, the jitter manifests as audio blips during games due to the audio server's tasks not getting scheduled in promptly, despite the tasks having a high realtime priority. Deleting memslots isn't exactly a fast path and should be avoided when possible, and ChromeOS is working towards utilizing MAP_FIXED to avoid the memslot shenanigans, but KVM is squarely in the wrong. Not to mention that removing the async zapping eliminates a non-trivial amount of complexity. Note, one of the subtle behaviors hidden behind the async zapping is that KVM would zap invalidated roots only once (ignoring partial zaps from things like mmu_notifier events). Preserve this behavior by adding a flag to identify roots that are scheduled to be zapped versus roots that have already been zapped but not yet freed. Add a comment calling out why kvm_tdp_mmu_invalidate_all_roots() can encounter invalid roots, as it's not at all obvious why zapping invalidated roots shouldn't simply zap all invalid roots. Reported-by: Pattara Teerapong <pteerapong@google.com> Cc: David Stevens <stevensd@google.com> Cc: Yiwei Zhang<zzyiwei@google.com> Cc: Paul Hsia <paulhsia@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230916003916.2545000-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe()Paolo Bonzini3-19/+14
All callers except the MMU notifier want to process all address spaces. Remove the address space ID argument of for_each_tdp_mmu_root_yield_safe() and switch the MMU notifier to use __for_each_tdp_mmu_root_yield_safe(). Extracted out of a patch by Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-23Merge tag 'hardening-v6.6-rc3' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - Fix UAPI stddef.h to avoid C++-ism (Alexey Dobriyan) - Fix harmless UAPI stddef.h header guard endif (Alexey Dobriyan) * tag 'hardening-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: uapi: stddef.h: Fix __DECLARE_FLEX_ARRAY for C++ uapi: stddef.h: Fix header guard location
2023-09-23Merge tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds28-241/+441
Pull xfs fixes from Chandan Babu: - Fix an integer overflow bug when processing an fsmap call - Fix crash due to CPU hot remove event racing with filesystem mount operation - During read-only mount, XFS does not allow the contents of the log to be recovered when there are one or more unrecognized rcompat features in the primary superblock, since the log might have intent items which the kernel does not know how to process - During recovery of log intent items, XFS now reserves log space sufficient for one cycle of a permanent transaction to execute. Otherwise, this could lead to livelocks due to non-availability of log space - On an fs which has an ondisk unlinked inode list, trying to delete a file or allocating an O_TMPFILE file can cause the fs to the shutdown if the first inode in the ondisk inode list is not present in the inode cache. The bug is solved by explicitly loading the first inode in the ondisk unlinked inode list into the inode cache if it is not already cached A similar problem arises when the uncached inode is present in the middle of the ondisk unlinked inode list. This second bug is triggered when executing operations like quotacheck and bulkstat. In this case, XFS now reads in the entire ondisk unlinked inode list - Enable LARP mode only on recent v5 filesystems - Fix a out of bounds memory access in scrub - Fix a performance bug when locating the tail of the log during mounting a filesystem * tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail xfs: only call xchk_stats_merge after validating scrub inputs xfs: require a relatively recent V5 filesystem for LARP mode xfs: make inode unlinked bucket recovery work with quotacheck xfs: load uncached unlinked inodes into memory on demand xfs: reserve less log space when recovering log intent items xfs: fix log recovery when unknown rocompat bits are set xfs: reload entire unlinked bucket lists xfs: allow inode inactivation during a ro mount log recovery xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list xfs: remove CPU hotplug infrastructure xfs: remove the all-mounts list xfs: use per-mount cpumask to track nonempty percpu inodegc lists xfs: fix an agbno overflow in __xfs_getfsmap_datadev xfs: fix per-cpu CIL structure aggregation racing with dying cpus xfs: fix select in config XFS_ONLINE_SCRUB_STATS
2023-09-23cxl/acpi: Annotate struct cxl_cxims_data with __counted_byKees Cook1-2/+2
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cxl_cxims_data. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-cxl@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230922175319.work.096-kees@kernel.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-23cxl/port: Fix cxl_test register enumeration regressionDan Williams1-4/+9
The cxl_test unit test environment models a CXL topology for sysfs/user-ABI regression testing. It uses interface mocking via the "--wrap=" linker option to redirect cxl_core routines that parse hardware registers with versions that just publish objects, like devm_cxl_enumerate_decoders(). Starting with: Commit 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") ...port register enumeration is moved into devm_cxl_add_port(). This conflicts with the "cxl_test avoids emulating registers stance" so either the port code needs to be refactored (too violent), or modified so that register enumeration is skipped on "fake" cxl_test ports (annoying, but straightforward). This conflict has happened previously and the "check for platform device" workaround to avoid instrusive refactoring was deployed in those scenarios. In general, refactoring should only benefit production code, test code needs to remain minimally instrusive to the greatest extent possible. This was missed previously because it may sometimes just cause warning messages to be emitted, but it can also cause test failures. The backport to -stable is only nice to have for clean cxl_test runs. Fixes: 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") Cc: stable@vger.kernel.org Reported-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169476525052.1013896.6235102957693675187.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-22eventfs: Remember what dentries were created on dir openSteven Rostedt (Google)1-17/+70
Using the following code with libtracefs: int dfd; // create the directory events/kprobes/kp1 tracefs_kprobe_raw(NULL, "kp1", "schedule_timeout", "time=$arg1"); // Open the kprobes directory dfd = tracefs_instance_file_open(NULL, "events/kprobes", O_RDONLY); // Do a lookup of the kprobes/kp1 directory (by looking at enable) tracefs_file_exists(NULL, "events/kprobes/kp1/enable"); // Now create a new entry in the kprobes directory tracefs_kprobe_raw(NULL, "kp2", "schedule_hrtimeout", "expires=$arg1"); // Do another lookup to create the dentries tracefs_file_exists(NULL, "events/kprobes/kp2/enable")) // Close the directory close(dfd); What happened above, the first open (dfd) will call dcache_dir_open_wrapper() that will create the dentries and up their ref counts. Now the creation of "kp2" will add another dentry within the kprobes directory. Upon the close of dfd, eventfs_release() will now do a dput for all the entries in kprobes. But this is where the problem lies. The open only upped the dentry of kp1 and not kp2. Now the close is decrementing both kp1 and kp2, which causes kp2 to get a negative count. Doing a "trace-cmd reset" which deletes all the kprobes cause the kernel to crash! (due to the messed up accounting of the ref counts). To solve this, save all the dentries that are opened in the dcache_dir_open_wrapper() into an array, and use this array to know what dentries to do a dput on in eventfs_release(). Since the dcache_dir_open_wrapper() calls dcache_dir_open() which uses the file->private_data, we need to also add a wrapper around dcache_readdir() that uses the cursor assigned to the file->private_data. This is because the dentries need to also be saved in the file->private_data. To do this create the structure: struct dentry_list { void *cursor; struct dentry **dentries; }; Which will hold both the cursor and the dentries. Some shuffling around is needed to make sure that dcache_dir_open() and dcache_readdir() only see the cursor. Link: https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.local.home Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ajay Kaher <akaher@vmware.com> Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions") Reported-by: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-22ring-buffer: Fix bytes info in per_cpu buffer statsZheng Yejian1-13/+15
The 'bytes' info in file 'per_cpu/cpu<X>/stats' means the number of bytes in cpu buffer that have not been consumed. However, currently after consuming data by reading file 'trace_pipe', the 'bytes' info was not changed as expected. # cat per_cpu/cpu0/stats entries: 0 overrun: 0 commit overrun: 0 bytes: 568 <--- 'bytes' is problematical !!! oldest event ts: 8651.371479 now ts: 8653.912224 dropped events: 0 read events: 8 The root cause is incorrect stat on cpu_buffer->read_bytes. To fix it: 1. When stat 'read_bytes', account consumed event in rb_advance_reader(); 2. When stat 'entries_bytes', exclude the discarded padding event which is smaller than minimum size because it is invisible to reader. Then use rb_page_commit() instead of BUF_PAGE_SIZE at where accounting for page-based read/remove/overrun. Also correct the comments of ring_buffer_bytes_cpu() in this patch. Link: https://lore.kernel.org/linux-trace-kernel/20230921125425.1708423-1-zhengyejian1@huawei.com Cc: stable@vger.kernel.org Fixes: c64e148a3be3 ("trace: Add ring buffer stats to measure rate of events") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-22Merge tag 'thermal-6.6-rc3' of ↵Linus Torvalds1-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "Unbreak the trip point update sysfs interface that has been broken since the 6.3 cycle (Rafael Wysocki)" * tag 'thermal-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: sysfs: Fix trip_point_hyst_store()
2023-09-22Merge tag 'acpi-6.6-rc3' of ↵Linus Torvalds2-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix a general ACPI processor driver regression and an ia64 build issue, both introduced recently. Specifics: - Fix recently introduced uninitialized memory access issue in the ACPI processor driver (Michal Wilczynski) - Fix ia64 build inadvertently broken by recent ACPI processor driver changes, which is prudent to do for 6.6 even though ia64 support is slated for removal in 6.7 (Ard Biesheuvel)" * tag 'acpi-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: processor: Fix uninitialized access of buf in acpi_set_pdc_bits() acpi: Provide ia64 dummy implementation of acpi_proc_quirk_mwait_check()
2023-09-22Merge tag 'arm64-fixes' of ↵Linus Torvalds6-4/+24
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Small crop of relatively boring arm64 fixes for -rc3. That's not to say we don't have any juicy bugs, however, it's just that fixes for those are likely to come via -mm and -tip for a hugetlb and an atomics issue respectively. I get left with the documentation... - Fix detection of "ClearBHB" and "Hinted Conditional Branch" features - Fix broken wildcarding for Arm PMU MAINTAINERS entry - Add missing documentation for userspace-visible ID register fields" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Document missing userspace visible fields in ID_AA64ISAR2_EL1 arm64/hbc: Document HWCAP2_HBC arm64/sme: Include ID_AA64PFR1_EL1.SME in cpu-feature-registers.rst arm64: cpufeature: Fix CLRBHB and BC detection MAINTAINERS: Use wildcard pattern for ARM PMU headers
2023-09-22Merge tag 'x86_urgent_for_v6.6-rc3' of ↵Linus Torvalds2-8/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 rethunk fixes from Borislav Petkov: "Fix the patching ordering between static calls and return thunks" * tag 'x86_urgent_for_v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86,static_call: Fix static-call vs return-thunk x86/alternatives: Remove faulty optimization
2023-09-22Merge tag 'x86-urgent-2023-09-22' of ↵Linus Torvalds11-55/+56
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix a kexec bug - Fix an UML build bug - Fix a handful of SRSO related bugs - Fix a shadow stacks handling bug & robustify related code * tag 'x86-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/shstk: Add warning for shadow stack double unmap x86/shstk: Remove useless clone error handling x86/shstk: Handle vfork clone failure correctly x86/srso: Fix SBPB enablement for spec_rstack_overflow=off x86/srso: Don't probe microcode in a guest x86/srso: Set CPUID feature bits independently of bug or mitigation status x86/srso: Fix srso_show_state() side effect x86/asm: Fix build of UML with KASAN x86/mm, kexec, ima: Use memblock_free_late() from ima_free_kexec_buffer()
2023-09-22Merge tag 'sched-urgent-2023-09-22' of ↵Linus Torvalds2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a PF_IDLE initialization bug that generated warnings on tiny-RCU" * tag 'sched-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kernel/sched: Modify initial boot task idle setup
2023-09-22Merge tag 'locking-urgent-2023-09-22' of ↵Linus Torvalds3-11/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Fix a include/linux/atomic/atomic-arch-fallback.h breakage that generated incorrect code, and fix a lockdep reporting race that may result in lockups" * tag 'locking-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/seqlock: Do the lockdep annotation before locking in do_write_seqcount_begin_nested() locking/atomic: scripts: fix fallback ifdeffery
2023-09-22x86,static_call: Fix static-call vs return-thunkPeter Zijlstra2-1/+3
Commit 7825451fa4dc ("static_call: Add call depth tracking support") failed to realize the problem fixed there is not specific to call depth tracking but applies to all return-thunk uses. Move the fix to the appropriate place and condition. Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding") Reported-by: David Kaplan <David.Kaplan@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org>
2023-09-22x86/alternatives: Remove faulty optimizationJosh Poimboeuf1-8/+0
The following commit 095b8303f383 ("x86/alternative: Make custom return thunk unconditional") made '__x86_return_thunk' a placeholder value. All code setting X86_FEATURE_RETHUNK also changes the value of 'x86_return_thunk'. So the optimization at the beginning of apply_returns() is dead code. Also, before the above-mentioned commit, the optimization actually had a bug It bypassed __static_call_fixup(), causing some raw returns to remain unpatched in static call trampolines. Thus the 'Fixes' tag. Fixes: d2408e043e72 ("x86/alternative: Optimize returns patching") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/16d19d2249d4485d8380fb215ffaae81e6b8119e.1693889988.git.jpoimboe@kernel.org
2023-09-22Merge branch 'acpi-processor'Rafael J. Wysocki1-0/+1
Merge a fix for recently introduced uninitialized memory access in the ACPI processor driver from Michal Wilczynski. * acpi-processor: ACPI: processor: Fix uninitialized access of buf in acpi_set_pdc_bits()
2023-09-22Merge tag 'efi-fixes-for-v6.6-2' of ↵Linus Torvalds1-3/+29
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fix from Ard Biesheuvel: "Follow-up fix for the unaccepted memory fix merged last week as part of the first EFI fixes batch. The unaccepted memory table needs to be accessible very early, even in cases (such as crashkernels) where the direct map does not cover all of DRAM, and so it is added to memblock explicitly, and subsequently memblock_reserve()'d as before" * tag 'efi-fixes-for-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/unaccepted: Make sure unaccepted table is mapped
2023-09-22Merge tag 'drm-fixes-2023-09-22-2' of git://anongit.freedesktop.org/drm/drmLinus Torvalds23-53/+61
Pull drm fixes from Dave Airlie: "Ben Skeggs is stepping away from nouveau and Red Hat for personal reasons, he'll be missed and we intend to fill the gaps in the upcoming time with Danilo and Lyude stepping in for now. Otherwise i915, nouveau, amdgpu with a few each and some misc spread around. MAINTAINERS: - drop Ben as he retired from nouveau core: - drm_mm test fixes fbdev: - Kconfig fixes ivpu: - IRQ-handling fixes meson: - Fix memory leak in HDMI EDID code nouveau: - Correct type casting - Fix memory leak in scheduler - u_memcpya() fixes i915: - Prevent error pointer dereference - Fix PMU busyness values when using GuC mode amdgpu: - MST fix - Vbios part number reporting fix - Fix a possible memory leak in an error case in the RAS code - Fix low resolution modes on eDP amdkfd: - Fix GPU address for user queue wptr when GART is not at 0" * tag 'drm-fixes-2023-09-22-2' of git://anongit.freedesktop.org/drm/drm: MAINTAINERS: remove myself as nouveau maintainer fbdev/sh7760fb: Depend on FB=y drm/amdkfd: Use gpu_offset for user queue's wptr drm/amd/display: fix the ability to use lower resolution modes on eDP drm/amdgpu: fix a memory leak in amdgpu_ras_feature_enable Revert "drm/amdgpu: Report vbios version instead of PN" drm/amd/display: Fix MST recognizes connected displays as one drm/virtio: clean out_fence on complete_submit i915/pmu: Move execlist stats initialization to execlist specific setup drm/i915/gt: Prevent error pointer dereference drm/meson: fix memory leak on ->hpd_notify callback accel/ivpu/40xx: Fix buttress interrupt handling nouveau/u_memcpya: fix NULL vs error pointer bug nouveau/u_memcpya: use vmemdup_user drm/nouveau: sched: fix leaking memory of timedout job drm/nouveau: fence: fix type cast warning in nouveau_fence_emit() drm: fix up fbdev Kconfig defaults drm/tests: Fix incorrect argument in drm_test_mm_insert_range
2023-09-22Merge tag 'v6.6-p3' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a regression in sm2" * tag 'v6.6-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: sm2 - Fix crash caused by uninitialized context