path: root/arch
2024-10-11  Merge tag 'arm-soc/for-6.12/devicetree-fixes' of https://github.com/Broadcom/stblinux into arm/fixes  (Arnd Bergmann; 1 file, -1/+1)
This pull request contains Broadcom ARM-based SoCs Device Tree fixes for 6.12, please pull the following:
- Florian fixed the HDMI gpio pin which is connected to GPIO pin 0, not 1
* tag 'arm-soc/for-6.12/devicetree-fixes' of https://github.com/Broadcom/stblinux:
  ARM: dts: bcm2837-rpi-cm3-io3: Fix HDMI hpd-gpio pin
Link: https://lore.kernel.org/r/20241008220440.23182-1-florian.fainelli@broadcom.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-11  powerpc/8xx: Fix kernel DTLB miss on dcbz  (Christophe Leroy; 1 file, -0/+1)
Following OOPS is encountered while loading test_bpf module on powerpc 8xx: [ 218.835567] BUG: Unable to handle kernel data access on write at 0xcb000000 [ 218.842473] Faulting instruction address: 0xc0017a80 [ 218.847451] Oops: Kernel access of bad area, sig: 11 [#1] [ 218.852854] BE PAGE_SIZE=16K PREEMPT CMPC885 [ 218.857207] SAF3000 DIE NOTIFICATION [ 218.860713] Modules linked in: test_bpf(+) test_module [ 218.865867] CPU: 0 UID: 0 PID: 527 Comm: insmod Not tainted 6.11.0-s3k-dev-09856-g3de3d71ae2e6-dirty #1280 [ 218.875546] Hardware name: MIAE 8xx 0x500000 CMPC885 [ 218.880521] NIP: c0017a80 LR: beab859c CTR: 000101d4 [ 218.885584] REGS: cac2bc90 TRAP: 0300 Not tainted (6.11.0-s3k-dev-09856-g3de3d71ae2e6-dirty) [ 218.894308] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 55005555 XER: a0007100 [ 218.901290] DAR: cb000000 DSISR: c2000000 [ 218.901290] GPR00: 000185d1 cac2bd50 c21b9580 caf7c030 c3883fcc 00000008 cafffffc 00000000 [ 218.901290] GPR08: 00040000 18300000 20000000 00000004 99005555 100d815e ca669d08 00000369 [ 218.901290] GPR16: ca730000 00000000 ca2c004c 00000000 00000000 0000035d 00000311 00000369 [ 218.901290] GPR24: ca732240 00000001 00030ba3 c3800000 00000000 00185d48 caf7c000 ca2c004c [ 218.941087] NIP [c0017a80] memcpy+0x88/0xec [ 218.945277] LR [beab859c] test_bpf_init+0x22c/0x3c90 [test_bpf] [ 218.951476] Call Trace: [ 218.953916] [cac2bd50] [beab8570] test_bpf_init+0x200/0x3c90 [test_bpf] (unreliable) [ 218.962034] [cac2bde0] [c0004c04] do_one_initcall+0x4c/0x1fc [ 218.967706] [cac2be40] [c00a2ec4] do_init_module+0x68/0x360 [ 218.973292] [cac2be60] [c00a5194] init_module_from_file+0x8c/0xc0 [ 218.979401] [cac2bed0] [c00a5568] sys_finit_module+0x250/0x3f0 [ 218.985248] [cac2bf20] [c000e390] system_call_exception+0x8c/0x15c [ 218.991444] [cac2bf30] [c00120a8] ret_from_syscall+0x0/0x28 This happens in the main loop of memcpy() ==> c0017a80: 7c 0b 37 ec dcbz r11,r6 c0017a84: 80 e4 00 04 lwz r7,4(r4) c0017a88: 81 04 00 08 lwz r8,8(r4) c0017a8c: 81 24 00 0c lwz r9,12(r4) c0017a90: 85 44 00 10 lwzu r10,16(r4) c0017a94: 90 e6 00 04 stw r7,4(r6) c0017a98: 91 06 00 08 stw r8,8(r6) c0017a9c: 91 26 00 0c stw r9,12(r6) c0017aa0: 95 46 00 10 stwu r10,16(r6) c0017aa4: 42 00 ff dc bdnz c0017a80 <memcpy+0x88> Commit ac9f97ff8b32 ("powerpc/8xx: Inconditionally use task PGDIR in DTLB misses") relies on re-reading DAR register to know if an error is due to a missing copy of a PMD entry in task's PGDIR, allthough DAR was already read in the exception prolog and copied into thread struct. This is because is it done very early in the exception and there are not enough registers available to keep a pointer to thread struct. However, dcbz instruction is buggy and doesn't update DAR register on fault. That is detected and generates a call to FixupDAR workaround which updates DAR copy in thread struct but doesn't fix DAR register. Let's fix DAR in addition to the update of DAR copy in thread struct. Fixes: ac9f97ff8b32 ("powerpc/8xx: Inconditionally use task PGDIR in DTLB misses") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/2b851399bd87e81c6ccb87ea3a7a6b32c7aa04d7.1728118396.git.christophe.leroy@csgroup.eu
2024-10-10  KVM: s390: Change virtual to physical address access in diag 0x258 handler  (Michael Mueller; 1 file, -1/+1)
The parameters for the diag 0x258 are real addresses, not virtual, but KVM was using them as virtual addresses. This only happened to work, since the Linux kernel as a guest used to have a 1:1 mapping for physical vs virtual addresses. Fix KVM so that it correctly uses the addresses as real addresses. Cc: stable@vger.kernel.org Fixes: 8ae04b8f500b ("KVM: s390: Guest's memory access functions get access registers") Suggested-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Michael Mueller <mimu@linux.ibm.com> Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20240917151904.74314-3-nrb@linux.ibm.com Acked-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-10  KVM: s390: gaccess: Check if guest address is in memslot  (Nico Boehr; 2 files, -6/+12)
Previously, access_guest_page() did not check whether the given guest address is inside of a memslot. This is not a problem, since kvm_write_guest_page/kvm_read_guest_page return -EFAULT in this case. However, -EFAULT is also returned when copy_to/from_user fails. When emulating a guest instruction, the address being outside a memslot usually means that an addressing exception should be injected into the guest. Failure in copy_to/from_user however indicates that something is wrong in userspace and hence should be handled there. To be able to distinguish these two cases, return PGM_ADDRESSING in access_guest_page() when the guest address is outside guest memory. In access_guest_real(), populate vcpu->arch.pgm.code such that kvm_s390_inject_prog_cond() can be used in the caller for injecting into the guest (if applicable). Since this adds a new return value to access_guest_page(), we need to make sure that other callers are not confused by the new positive return value. There are the following users of access_guest_page(): - access_guest_with_key() does the checking itself (in guest_range_to_gpas()), so this case should never happen. Even if, the handling is set up properly. - access_guest_real() just passes the return code to its callers, which are: - read_guest_real() - see below - write_guest_real() - see below There are the following users of read_guest_real(): - ar_translation() in gaccess.c which already returns PGM_* - setup_apcb10(), setup_apcb00(), setup_apcb11() in vsie.c which always return -EFAULT on read_guest_read() nonzero return - no change - shadow_crycb(), handle_stfle() always present this as validity, this could be handled better but doesn't change current behaviour - no change There are the following users of write_guest_real(): - kvm_s390_store_status_unloaded() always returns -EFAULT on write_guest_real() failure. Fixes: 2293897805c2 ("KVM: s390: add architecture compliant guest access functions") Cc: stable@vger.kernel.org Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20240917151904.74314-2-nrb@linux.ibm.com Acked-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-10  s390/pci: Handle PCI error codes other than 0x3a  (Niklas Schnelle; 1 file, -8/+9)
The Linux implementation of PCI error recovery for s390 was based on the understanding that firmware error recovery is a two-step process: an optional initial error event to indicate the cause of the error if known, followed by either error event 0x3A (Success) or 0x3B (Failure) to indicate whether firmware was able to recover. While this has been the case in testing and in the error cases seen in the wild, it turns out this is not correct. Instead, firmware only generates 0x3A for some error and service scenarios and expects the OS to perform recovery for all PCI event codes except for those indicating a permanent error (0x3B, 0x40) and those indicating errors on the function measurement block (0x2A, 0x2B, 0x2C). Align Linux behavior with these expectations.
Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery")
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-10  x86/bugs: Do not use UNTRAIN_RET with IBPB on entry  (Johannes Wikner; 1 file, -0/+17)
Since X86_FEATURE_ENTRY_IBPB will invalidate all harmful predictions with IBPB, no software-based untraining of returns is needed anymore. Currently, this change affects retbleed and SRSO mitigations so if either of the mitigations is doing IBPB and the other one does the software sequence, the latter is not needed anymore. [ bp: Massage commit message. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Johannes Wikner <kwikner@ethz.ch> Cc: <stable@kernel.org>
2024-10-10  x86/bugs: Skip RSB fill at VMEXIT  (Johannes Wikner; 1 file, -0/+15)
entry_ibpb() is designed to follow Intel's IBPB specification regardless of CPU. This includes invalidating RSB entries. Hence, if IBPB on VMEXIT has been selected, entry_ibpb() as part of the RET untraining in the VMEXIT path will take care of all BTB and RSB clearing so there's no need to explicitly fill the RSB anymore. [ bp: Massage commit message. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Johannes Wikner <kwikner@ethz.ch> Cc: <stable@kernel.org>
2024-10-10  x86/entry: Have entry_ibpb() invalidate return predictions  (Johannes Wikner; 1 file, -0/+5)
entry_ibpb() should invalidate all indirect predictions, including return target predictions. Not all IBPB implementations do this, in which case the fallback is RSB filling. Prevent SRSO-style hijacks of return predictions following IBPB, as the return target predictor can be corrupted before the IBPB completes. [ bp: Massage. ] Signed-off-by: Johannes Wikner <kwikner@ethz.ch> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org>
2024-10-10  x86/cpufeatures: Add a IBPB_NO_RET BUG flag  (Johannes Wikner; 2 files, -0/+4)
Set this flag if the CPU has an IBPB implementation that does not invalidate return target predictions. Zen generations < 4 do not flush the RSB when executing an IBPB and this bug flag denotes that. [ bp: Massage. ] Signed-off-by: Johannes Wikner <kwikner@ethz.ch> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org>
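As a rough illustration of how such a bug bit could be wired up during CPU setup (a hedged sketch, not the actual patch; the helper name and exact condition are assumptions, while the X86_* constants come from this and the following commit):

    #include <asm/cpufeature.h>

    /* Sketch: flag CPUs whose IBPB does not flush return target predictions. */
    static void check_ibpb_ret_sketch(struct cpuinfo_x86 *c)
    {
        if (cpu_has(c, X86_FEATURE_IBPB) && !cpu_has(c, X86_FEATURE_AMD_IBPB_RET))
            setup_force_cpu_bug(X86_BUG_IBPB_NO_RET);
    }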
2024-10-10  x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET  (Jim Mattson; 1 file, -1/+2)
AMD's initial implementation of IBPB did not clear the return address predictor. Beginning with Zen4, AMD's IBPB *does* clear the return address predictor. This behavior is enumerated by CPUID.80000008H:EBX.IBPB_RET[30]. Define X86_FEATURE_AMD_IBPB_RET for use in KVM_GET_SUPPORTED_CPUID, when determining cross-vendor capabilities. Suggested-by: Venkatesh Srinivas <venkateshs@chromium.org> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org>
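The enumeration described above can also be read from userspace; a minimal sketch (the bit position is taken from the commit message, everything else is ordinary cpuid.h usage):

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

        if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
            return 1;   /* extended leaf not available */
        /* CPUID.80000008H:EBX.IBPB_RET[30] */
        printf("IBPB_RET: %s\n", (ebx >> 30) & 1 ? "yes" : "no");
        return 0;
    }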
2024-10-10  riscv, bpf: Fix possible infinite tailcall when CONFIG_CFI_CLANG is enabled  (Pu Lehui; 1 file, -1/+3)
When CONFIG_CFI_CLANG is enabled, the number of prologue instructions skipped by a tailcall needs to include the kcfi instruction, otherwise the TCC will be re-initialized every time a tailcall is made, which may result in infinite tailcalls.
Fixes: e63985ecd226 ("bpf, riscv64/cfi: Support kCFI + BPF on riscv64")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20241008124544.171161-1-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-09  fs/proc/kcore.c: allow translation of physical memory addresses  (Alexander Gordeev; 1 file, -0/+2)
When /proc/kcore is read, an attempt to read the first two pages results in a HW-specific page swap on s390, and other (so-called prefix) pages are accessed instead. That leads to a wrong read. Allow architecture-specific translation of memory addresses using kc_xlate_dev_mem_ptr() and kc_unxlate_dev_mem_ptr() callbacks, similarly to the /dev/mem xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() callbacks. That way an architecture can deal with specific physical memory ranges. Re-use the existing /dev/mem callback implementation on s390, which handles the described prefix page swapping correctly. For other architectures the default callback is basically a NOP. It is expected that the condition (vaddr == __va(__pa(vaddr))) always holds true for the KCORE_RAM memory type.
Link: https://lkml.kernel.org/r/20240930122119.1651546-1-agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
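Roughly, the default (identity) callbacks could look like the sketch below, mirroring the /dev/mem xlate helpers; the hook names come from the commit, the bodies are an assumption:

    /* fallback when an architecture provides no override */
    #ifndef kc_xlate_dev_mem_ptr
    #define kc_xlate_dev_mem_ptr kc_xlate_dev_mem_ptr
    static inline void *kc_xlate_dev_mem_ptr(phys_addr_t phys)
    {
        return __va(phys);      /* identity: no arch-specific swap needed */
    }
    #endif
    #ifndef kc_unxlate_dev_mem_ptr
    #define kc_unxlate_dev_mem_ptr kc_unxlate_dev_mem_ptr
    static inline void kc_unxlate_dev_mem_ptr(phys_addr_t phys, void *virt)
    {
        /* nothing to undo for the identity translation */
    }
    #endif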
2024-10-09  arm64: probes: Fix uprobes for big-endian kernels  (Mark Rutland; 2 files, -7/+5)
The arm64 uprobes code is broken for big-endian kernels as it doesn't convert the in-memory instruction encoding (which is always little-endian) into the kernel's native endianness before analyzing and simulating instructions. This may result in a few distinct problems:
* The kernel may erroneously reject probing an instruction which can safely be probed.
* The kernel may erroneously permit stepping an instruction out-of-line when that instruction cannot be stepped out-of-line safely.
* The kernel may simulate an instruction incorrectly due to interpreting the byte-swapped encoding.
The endianness mismatch isn't caught by the compiler or sparse because:
* The arch_uprobe::{insn,ixol} fields are encoded as arrays of u8, so the compiler and sparse have no idea these contain a little-endian 32-bit value. The core uprobes code populates these with a memcpy() which similarly does not handle endianness.
* While the uprobe_opcode_t type is an alias for __le32, both arch_uprobe_analyze_insn() and arch_uprobe_skip_sstep() cast from u8[] to the similarly-named probe_opcode_t, which is an alias for u32. Hence there is no endianness conversion warning.
Fix this by changing the arch_uprobe::{insn,ixol} fields to __le32 and adding the appropriate __le32_to_cpu() conversions prior to consuming the instruction encoding. The core uprobes code copies these fields as opaque ranges of bytes, and so is unaffected by this change.
At the same time, remove MAX_UINSN_BYTES and consistently use AARCH64_INSN_SIZE for clarity.
Tested with the following:
| #include <stdio.h>
| #include <stdbool.h>
|
| #define noinline __attribute__((noinline))
|
| static noinline void *adrp_self(void)
| {
| 	void *addr;
|
| 	asm volatile(
| 	"	adrp	%x0, adrp_self\n"
| 	"	add	%x0, %x0, :lo12:adrp_self\n"
| 	: "=r" (addr));
|
| 	return addr;
| }
|
| int main(int argc, char *argv[])
| {
| 	void *ptr = adrp_self();
| 	bool equal = (ptr == adrp_self);
|
| 	printf("adrp_self   => %p\n"
| 	       "adrp_self() => %p\n"
| 	       "%s\n",
| 	       adrp_self, ptr, equal ? "EQUAL" : "NOT EQUAL");
|
| 	return 0;
| }
.... where the adrp_self() function was compiled to:
| 00000000004007e0 <adrp_self>:
|   4007e0:	90000000	adrp	x0, 400000 <__ehdr_start>
|   4007e4:	911f8000	add	x0, x0, #0x7e0
|   4007e8:	d65f03c0	ret
Before this patch, the ADRP is not recognized, and is assumed to be steppable, resulting in corruption of the result:
| # ./adrp-self
| adrp_self   => 0x4007e0
| adrp_self() => 0x4007e0
| EQUAL
| # echo 'p /root/adrp-self:0x007e0' > /sys/kernel/tracing/uprobe_events
| # echo 1 > /sys/kernel/tracing/events/uprobes/enable
| # ./adrp-self
| adrp_self   => 0x4007e0
| adrp_self() => 0xffffffffff7e0
| NOT EQUAL
After this patch, the ADRP is correctly recognized and simulated:
| # ./adrp-self
| adrp_self   => 0x4007e0
| adrp_self() => 0x4007e0
| EQUAL
| #
| # echo 'p /root/adrp-self:0x007e0' > /sys/kernel/tracing/uprobe_events
| # echo 1 > /sys/kernel/tracing/events/uprobes/enable
| # ./adrp-self
| adrp_self   => 0x4007e0
| adrp_self() => 0x4007e0
| EQUAL
Fixes: 9842ceae9fa8 ("arm64: Add uprobe support")
Cc: stable@vger.kernel.org
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241008155851.801546-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
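A minimal sketch of the conversion the fix describes (the field type is per the commit; the decode call is just an example of a consumer that now sees native-endian bits):

    /* arch_uprobe::insn is __le32 after this change */
    static bool probe_is_nop_sketch(const struct arch_uprobe *auprobe)
    {
        u32 insn = le32_to_cpu(auprobe->insn);  /* correct on both BE and LE kernels */

        return aarch64_insn_is_nop(insn);
    }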
2024-10-09  arm64: probes: Fix simulate_ldr*_literal()  (Mark Rutland; 1 file, -11/+7)
The simulate_ldr_literal() code always loads a 64-bit quantity, and when simulating a 32-bit load into a 'W' register, it discards the most significant 32 bits. For big-endian kernels this means that the relevant bits are discarded, and the value returned is the subsequent 32 bits in memory (i.e. the value at addr + 4). Additionally, simulate_ldr_literal() and simulate_ldrsw_literal() use a plain C load, which the compiler may tear or elide (e.g. if the target is the zero register). Today this doesn't happen to matter, but it may matter in the future if trampoline code uses an LDR (literal) or LDRSW (literal). Update simulate_ldr_literal() and simulate_ldrsw_literal() to use an appropriately-sized READ_ONCE() to perform the access, which avoids these problems.
Fixes: 39a67d49ba35 ("arm64: kprobes instruction simulation support")
Cc: stable@vger.kernel.org
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241008155851.801546-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
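A hedged sketch of the idea behind the fix, with sized single-copy loads (names and structure are illustrative, not the actual kernel code):

    static unsigned long literal_load_sketch(unsigned long addr, bool is64, bool sign_extend)
    {
        if (is64)
            return READ_ONCE(*(u64 *)addr);       /* LDR Xt, (literal) */
        if (sign_extend)
            return (s64)READ_ONCE(*(s32 *)addr);  /* LDRSW (literal) */
        return READ_ONCE(*(u32 *)addr);           /* LDR Wt, (literal) */
    }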
2024-10-09  arm64: probes: Remove broken LDR (literal) uprobe support  (Mark Rutland; 1 file, -5/+11)
The simulate_ldr_literal() and simulate_ldrsw_literal() functions are unsafe to use for uprobes. Both functions were originally written for use with kprobes, and access memory with plain C accesses. When uprobes was added, these were reused unmodified even though they cannot safely access user memory. There are three key problems:
1) The plain C accesses do not have corresponding extable entries, and thus if they encounter a fault the kernel will treat these as unintentional accesses to user memory, resulting in a BUG() which will kill the kernel thread, and likely lead to further issues (e.g. lockup or panic()).
2) The plain C accesses are subject to HW PAN and SW PAN, and so when either is in use, any attempt to simulate an access to user memory will fault. Thus neither simulate_ldr_literal() nor simulate_ldrsw_literal() can do anything useful when simulating a user instruction on any system with HW PAN or SW PAN.
3) The plain C accesses are privileged, as they run in kernel context, and in practice can access a small range of kernel virtual addresses. The instructions they simulate have a range of +/-1MiB, and since the simulated instruction must itself be a user instruction in the TTBR0 address range, these can address the final 1MiB of the TTBR1 address range by wrapping downwards from an address in the first 1MiB of the TTBR0 address range.
In contemporary kernels the last 8MiB of the TTBR1 address range is reserved, and accesses to this will always fault, meaning this is no worse than (1). Historically, it was theoretically possible for the linear map or vmemmap to spill into the final 8MiB of the TTBR1 address range, but in practice this is extremely unlikely to occur as this would require either:
* Having enough physical memory to fill the entire linear map all the way to the final 1MiB of the TTBR1 address range.
* Getting unlucky with KASLR randomization of the linear map such that the populated region happens to overlap with the last 1MiB of the TTBR address range.
... and in either case if we were to spill into the final page there would be larger problems as the final page would alias with error pointers.
Practically speaking, (1) and (2) are the big issues. Given there have been no reports of problems since the broken code was introduced, it appears that no-one is relying on probing these instructions with uprobes.
Avoid these issues by not allowing uprobes on LDR (literal) and LDRSW (literal), limiting the use of simulate_ldr_literal() and simulate_ldrsw_literal() to kprobes. Attempts to place uprobes on LDR (literal) and LDRSW (literal) will be rejected, as arm_probe_decode_insn() will return INSN_REJECTED. In future we can consider introducing working uprobes support for these instructions, but this will require more significant work.
Fixes: 9842ceae9fa8 ("arm64: Add uprobe support")
Cc: stable@vger.kernel.org
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241008155851.801546-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-10-09  x86/bugs: Use code segment selector for VERW operand  (Pawan Gupta; 1 file, -1/+10)
Robert Gill reported below #GP in 32-bit mode when dosemu software was executing vm86() system call: general protection fault: 0000 [#1] PREEMPT SMP CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1 Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010 EIP: restore_all_switch_stack+0xbe/0xcf EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000 ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046 CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0 Call Trace: show_regs+0x70/0x78 die_addr+0x29/0x70 exc_general_protection+0x13c/0x348 exc_bounds+0x98/0x98 handle_exception+0x14d/0x14d exc_bounds+0x98/0x98 restore_all_switch_stack+0xbe/0xcf exc_bounds+0x98/0x98 restore_all_switch_stack+0xbe/0xcf This only happens in 32-bit mode when VERW based mitigations like MDS/RFDS are enabled. This is because segment registers with an arbitrary user value can result in #GP when executing VERW. Intel SDM vol. 2C documents the following behavior for VERW instruction: #GP(0) - If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user space. Use %cs selector to reference VERW operand. This ensures VERW will not #GP for an arbitrary user %ds. [ mingo: Fixed the SOB chain. ] Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition") Reported-by: Robert Gill <rtgill82@gmail.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com Cc: stable@vger.kernel.org # 5.10+ Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707 Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.info/ Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Suggested-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
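One way to picture the change (a sketch only; the real code lives in the asm entry macros, and mds_verw_sel is the kernel's selector word referenced by CLEAR_CPU_BUFFERS):

    /* Force a %cs segment override so an arbitrary user %ds cannot #GP the VERW. */
    static __always_inline void clear_cpu_buffers_sketch(void)
    {
        asm volatile("verw %cs:mds_verw_sel" : : : "cc");
    }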
2024-10-09  x86/entry_32: Clear CPU buffers after register restore in NMI return  (Pawan Gupta; 1 file, -1/+2)
CPU buffers are currently cleared after the call to exc_nmi, but before register state is restored. This may be okay for the MDS mitigation, but not for RFDS, because the RFDS mitigation requires CPU buffers to be cleared when registers don't have any sensitive data. Move CLEAR_CPU_BUFFERS after RESTORE_ALL_NMI.
Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition")
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240925-fix-dosemu-vm86-v7-2-1de0daca2d42%40linux.intel.com
2024-10-09  x86/entry_32: Do not clobber user EFLAGS.ZF  (Pawan Gupta; 1 file, -1/+2)
Opportunistic SYSEXIT executes VERW to clear CPU buffers after user EFLAGS are restored. This can clobber user EFLAGS.ZF. Move CLEAR_CPU_BUFFERS before the user EFLAGS are restored. This ensures that the user EFLAGS.ZF is not clobbered. Closes: https://lore.kernel.org/lkml/yVXwe8gvgmPADpRB6lXlicS2fcHoV5OHHxyuFbB_MEleRPD7-KhGe5VtORejtPe-KCkT8Uhcg5d7-IBw4Ojb4H7z5LQxoZylSmJ8KNL3A8o=@protonmail.com/ Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition") Reported-by: Jari Ruusu <jariruusu@protonmail.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20240925-fix-dosemu-vm86-v7-1-1de0daca2d42%40linux.intel.com
2024-10-09  ARM: dts: bcm2837-rpi-cm3-io3: Fix HDMI hpd-gpio pin  (Florian Klink; 1 file, -1/+1)
HDMI_HPD_N_1V8 is connected to GPIO pin 0, not 1. This fixes HDMI hotplug/output detection. See https://datasheets.raspberrypi.com/cm/cm3-schematics.pdf Signed-off-by: Florian Klink <flokli@flokli.de> Reviewed-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20240715230311.685641-1-flokli@flokli.de Reviewed-by: Stefan Wahren <wahrenst@gmx.net> Fixes: a54fe8a6cf66 ("ARM: dts: add Raspberry Pi Compute Module 3 and IO board") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
2024-10-08  x86/resctrl: Annotate get_mem_config() functions as __init  (Nathan Chancellor; 1 file, -2/+2)
After a recent LLVM change [1] that deduces __cold on functions that only call cold code (such as __init functions), there is a section mismatch warning from __get_mem_config_intel(), which got moved to .text.unlikely. as a result of that optimization: WARNING: modpost: vmlinux: section mismatch in reference: \ __get_mem_config_intel+0x77 (section: .text.unlikely.) -> thread_throttle_mode_init (section: .init.text) Mark __get_mem_config_intel() as __init as well since it is only called from __init code, which clears up the warning. While __rdt_get_mem_config_amd() does not exhibit a warning because it does not call any __init code, it is a similar function that is only called from __init code like __get_mem_config_intel(), so mark it __init as well to keep the code symmetrical. CONFIG_SECTION_MISMATCH_WARN_ONLY=n would turn this into a fatal error. Fixes: 05b93417ce5b ("x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA)") Fixes: 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: <stable@kernel.org> Link: https://github.com/llvm/llvm-project/commit/6b11573b8c5e3d36beee099dbe7347c2a007bf53 [1] Link: https://lore.kernel.org/r/20240917-x86-restctrl-get_mem_config_intel-init-v3-1-10d521256284@kernel.org
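A small sketch of the rule being enforced here (function names shortened, not the actual resctrl code): once both caller and callee are __init, both land in .init.text and modpost has nothing to complain about.

    #include <linux/init.h>

    static void __init throttle_mode_init_sketch(void)  { /* lives in .init.text */ }

    static bool __init get_mem_config_sketch(void)
    {
        throttle_mode_init_sketch();    /* __init calling __init: no section mismatch */
        return true;
    }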
2024-10-08  x86/xen: mark boot CPU of PV guest in MSR_IA32_APICBASE  (Juergen Gross; 1 file, -0/+4)
Recent topology checks of the x86 boot code uncovered the need for PV guests to have the boot cpu marked in the APICBASE MSR. Fixes: 9d22c96316ac ("x86/topology: Handle bogus ACPI tables correctly") Reported-by: Niels Dettenbach <nd@syndicat.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-10-08  x86/resctrl: Avoid overflow in MB settings in bw_validate()  (Martin Kletzander; 1 file, -9/+14)
The resctrl schemata file supports specifying memory bandwidth associated with the Memory Bandwidth Allocation (MBA) feature via a percentage (this is the default) or bandwidth in MiBps (when resctrl is mounted with the "mba_MBps" option). The allowed range for the bandwidth percentage is from /sys/fs/resctrl/info/MB/min_bandwidth to 100, using a granularity of /sys/fs/resctrl/info/MB/bandwidth_gran. The supported range for the MiBps bandwidth is 0 to U32_MAX. There are two issues with parsing of MiBps memory bandwidth: * The user provided MiBps is mistakenly rounded up to the granularity that is unique to percentage input. * The user provided MiBps is parsed using unsigned long (thus accepting values up to ULONG_MAX), and then assigned to u32 that could result in overflow. Do not round up the MiBps value and parse user provided bandwidth as the u32 it is intended to be. Use the appropriate kstrtou32() that can detect out of range values. Fixes: 8205a078ba78 ("x86/intel_rdt/mba_sc: Add schemata support") Fixes: 6ce1560d35f6 ("x86/resctrl: Switch over to the resctrl mbps_val list") Co-developed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Martin Kletzander <nert.pinx@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com>
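A hedged sketch of the corrected MiBps parsing path (variable and function names are illustrative, not the actual resctrl code):

    static int parse_bw_mbps_sketch(const char *buf, u32 *bw_out)
    {
        u32 bw;

        if (kstrtou32(buf, 10, &bw))    /* rejects garbage and values above U32_MAX */
            return -EINVAL;
        *bw_out = bw;                   /* no round-up to the percentage granularity */
        return 0;
    }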
2024-10-08  KVM: arm64: Expose S1PIE to guests  (Mark Brown; 1 file, -1/+3)
Prior to commit 70ed7238297f ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1") we just exposed the sanitised view of ID_AA64MMFR3_EL1 to guests, meaning that they saw both TCRX and S1PIE if present on the host machine. That commit added VMM control over the contents of the register and exposed S1POE but removed S1PIE, meaning that the extension is no longer visible to guests. Re-enable support for S1PIE with VMM control.
Fixes: 70ed7238297f ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20241005-kvm-arm64-fix-s1pie-v1-1-5901f02de749@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-08  KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule  (Oliver Upton; 1 file, -0/+27)
There's been a decent amount of attention around unmaps of nested MMUs, and TLBI handling is no exception to this. Add a comment clarifying why it is safe to reschedule during a TLBI unmap, even without a reference on the MMU in progress. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-5-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-08  KVM: arm64: nv: Punt stage-2 recycling to a vCPU request  (Oliver Upton; 4 files, -2/+37)
Currently, when a nested MMU is repurposed for some other MMU context, KVM unmaps everything during vcpu_load() while holding the MMU lock for write. This is quite a performance bottleneck for large nested VMs, as all vCPU scheduling will spin until the unmap completes. Start punting the MMU cleanup to a vCPU request, where it is then possible to periodically release the MMU lock and CPU in the presence of contention. Ensure that no vCPU winds up using a stale MMU by tracking the pending unmap on the S2 MMU itself and requesting an unmap on every vCPU that finds it. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-4-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-08  KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed  (Oliver Upton; 5 files, -15/+17)
Right now the nested code allows unmap operations on a shadow stage-2 to block unconditionally. This is wrong in a couple places, such as a non-blocking MMU notifier or on the back of a sched_in() notifier as part of shadow MMU recycling. Carry through whether or not blocking is allowed to kvm_pgtable_stage2_unmap(). This 'fixes' an issue where stage-2 MMU reclaim would precipitate a stack overflow from a pile of kvm_sched_in() callbacks, all trying to recycle a stage-2 MMU. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-08  KVM: arm64: nv: Keep reference on stage-2 MMU when scheduled out  (Oliver Upton; 1 file, -3/+18)
If a vCPU is scheduling out and not in WFI emulation, it is highly likely it will get scheduled again soon and reuse the MMU it had before. Dropping the MMU at vcpu_put() can have some unfortunate consequences, as the MMU could get reclaimed and used in a different context, forcing another 'cold start' on an otherwise active MMU. Avoid that altogether by keeping a reference on the MMU if the vCPU is scheduling out, ensuring that another vCPU cannot reclaim it while the current vCPU is away. Since there are more MMUs than vCPUs, this does not affect the guarantee that an unused MMU is available at any time. Furthermore, this makes the vcpu->arch.hw_mmu ~stable in preemptible code, at least for where it matters in the stage-2 abort path. Yes, the MMU can change across WFI emulation, but there isn't even a use case where this would matter. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-08  KVM: arm64: Unregister redistributor for failed vCPU creation  (Oliver Upton; 1 file, -1/+21)
Alex reports that syzkaller has managed to trigger a use-after-free when tearing down a VM: BUG: KASAN: slab-use-after-free in kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 Read of size 8 at addr ffffff801c6890d0 by task syz.3.2219/10758 CPU: 3 UID: 0 PID: 10758 Comm: syz.3.2219 Not tainted 6.11.0-rc6-dirty #64 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x17c/0x1a8 arch/arm64/kernel/stacktrace.c:317 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:324 __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x94/0xc0 lib/dump_stack.c:119 print_report+0x144/0x7a4 mm/kasan/report.c:377 kasan_report+0xcc/0x128 mm/kasan/report.c:601 __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381 kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 kvm_vm_release+0x4c/0x60 virt/kvm/kvm_main.c:1409 __fput+0x198/0x71c fs/file_table.c:422 ____fput+0x20/0x30 fs/file_table.c:450 task_work_run+0x1cc/0x23c kernel/task_work.c:228 do_notify_resume+0x144/0x1a0 include/linux/resume_user_mode.h:50 el0_svc+0x64/0x68 arch/arm64/kernel/entry-common.c:169 el0t_64_sync_handler+0x90/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Upon closer inspection, it appears that we do not properly tear down the MMIO registration for a vCPU that fails creation late in the game, e.g. a vCPU w/ the same ID already exists in the VM. It is important to consider the context of commit that introduced this bug by moving the unregistration out of __kvm_vgic_vcpu_destroy(). That change correctly sought to avoid an srcu v. config_lock inversion by breaking up the vCPU teardown into two parts, one guarded by the config_lock. Fix the use-after-free while avoiding lock inversion by adding a special-cased unregistration to __kvm_vgic_vcpu_destroy(). This is safe because failed vCPUs are torn down outside of the config_lock. Cc: stable@vger.kernel.org Fixes: f616506754d3 ("KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors") Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007223909.2157336-1-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-08  Merge branch kvm-arm64/idregs-6.12 into kvmarm/fixes  (Marc Zyngier; 2 files, -8/+42)
* kvm-arm64/idregs-6.12: : . : Make some fields of ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1 : writable from userspace, so that a VMM can influence the : set of guest-visible features. : : - for ID_AA64DFR0_EL1: DoubleLock, WRPs, PMUVer and DebugVer : are writable (courtesy of Shameer Kolothum) : : - for ID_AA64PFR1_EL1: BT, SSBS, CVS2_frac are writable : (courtesy of Shaoqin Huang) : . KVM: selftests: aarch64: Add writable test for ID_AA64PFR1_EL1 KVM: arm64: Allow userspace to change ID_AA64PFR1_EL1 KVM: arm64: Use kvm_has_feat() to check if FEAT_SSBS is advertised to the guest KVM: arm64: Disable fields that KVM doesn't know how to handle in ID_AA64PFR1_EL1 KVM: arm64: Make the exposed feature bits in AA64DFR0_EL1 writable from userspace Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-07  x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h  (Richard Gong; 1 file, -0/+2)
Add new PCI ID for Device 18h and Function 4. Signed-off-by: Richard Gong <richard.gong@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20240913162903.649519-1-richard.gong@amd.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-10-07  arm64: dts: marvell: cn9130-sr-som: fix cp0 mdio pin numbers  (Josua Mayer; 1 file, -1/+1)
SolidRun CN9130 SoM actually uses CP_MPP[0:1] for mdio. CP_MPP[40] provides the reference clock for the dsa switch and ethernet phy on Clearfog Pro, whereas MPP[41] controls the efuse programming voltage "VHV". Update the cp0 mdio pinctrl node to specify mpp0, mpp1.
Fixes: 1c510c7d82e5 ("arm64: dts: add description for solidrun cn9130 som and clearfog boards")
Cc: stable@vger.kernel.org # 6.11.x
Signed-off-by: Josua Mayer <josua@solid-run.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/stable/20241002-cn9130-som-mdio-v1-1-0942be4dc550%40solid-run.com
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
2024-10-07  perf/x86/amd: Warn only on new bits set  (Breno Leitao; 1 file, -2/+8)
Warning on every leaked bit can cause a flood of messages, triggering various stall-warning mechanisms to fire, including CSD locks, which makes the machine unusable. Track the bits that are being leaked, and only warn when a new bit is set. That said, this patch will help with the following issues:
1) It will tell us which bits are being set, so it is easy to communicate them back to the vendor and to do a root-cause analysis.
2) It avoids making the machine unusable, because, worst case scenario, the user gets fewer than 60 WARNs (one per unhandled bit).
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lkml.kernel.org/r/20241001141020.2620361-1-leitao@debian.org
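A rough sketch of the "warn only on new bits" idea (names are illustrative and the real code may need atomicity; this is not the actual patch):

    static u64 overflow_bits_seen;      /* bits already reported */

    static void warn_new_bits_sketch(u64 status)
    {
        u64 new_bits = status & ~overflow_bits_seen;

        if (new_bits) {
            overflow_bits_seen |= new_bits;
            pr_warn("unhandled PMU status bits: 0x%llx\n", new_bits);
        }
    }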
2024-10-07  perf/x86/intel: Add PMU support for ArrowLake-H  (Dapeng Mi; 3 files, -3/+127)
ArrowLake-H contains 3 different uarchs: LionCove, Skymont and Crestmont. This is different from previous hybrid processors, which only contain two kinds of uarchs. This patch adds PMU support for the ArrowLake-H processor, adds ARL-H specific events which support the 3 kinds of uarchs, such as td_retiring_arl_h, and extends some existing format attributes like offcore_rsp to make them available on ARL-H as well. Although these format attributes have been extended to support ARL-H, they can still support the regular hybrid platforms with 2 kinds of uarchs, since the helper hybrid_format_is_visible() filters PMU types and only shows the format attribute for available PMUs.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lkml.kernel.org/r/20240820073853.1974746-5-dapeng1.mi@linux.intel.com
2024-10-07  perf/x86/intel: Support hybrid PMU with multiple atom uarchs  (Dapeng Mi; 2 files, -10/+36)
The upcoming ARL-H hybrid processor contains 2 different atom uarchs which have different PMU capabilities. To distinguish these atom uarchs, CPUID.1AH.EAX[23:0] defines a native model ID which can be used to uniquely identify the uarch of the core by combining with core type. Thus a 3rd hybrid pmu type "hybrid_tiny" is defined to mark the 2nd atom uarch. The helper find_hybrid_pmu_for_cpu() would compare the hybrid pmu type and dynamically read core native id from cpu to identify the corresponding hybrid pmu structure. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Yongwei Ma <yongwei.ma@intel.com> Link: https://lkml.kernel.org/r/20240820073853.1974746-4-dapeng1.mi@linux.intel.com
2024-10-07  x86/cpu/intel: Define helper to get CPU core native ID  (Dapeng Mi; 2 files, -0/+21)
Define the helper get_this_hybrid_cpu_native_id() to return the CPU core native ID. This core native ID, combined with the core type, can be used to uniquely identify the uarch of the CPU core.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lkml.kernel.org/r/20240820073853.1974746-3-dapeng1.mi@linux.intel.com
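The same enumeration can be seen from userspace; a small sketch reading the hybrid information leaf (EAX[23:0] as the native model ID per this series, EAX[31:24] being the well-known core-type field):

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

        if (!__get_cpuid_count(0x1a, 0, &eax, &ebx, &ecx, &edx))
            return 1;   /* hybrid enumeration leaf not available */
        printf("core type 0x%02x, native model id 0x%06x\n",
               eax >> 24, eax & 0xffffff);
        return 0;
    }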
2024-10-07  perf/x86: Refine hybrid_pmu_type defination  (Dapeng Mi; 1 file, -7/+5)
Use macros instead of magic numbers to define hybrid_pmu_type, and remove X86_HYBRID_NUM_PMUS since it's never used.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Link: https://lkml.kernel.org/r/20240820073853.1974746-2-dapeng1.mi@linux.intel.com
2024-10-07  uprobes: switch to RCU Tasks Trace flavor for better performance  (Andrii Nakryiko; 1 file, -0/+1)
This patch switches uprobes SRCU usage to RCU Tasks Trace flavor, which is optimized for more lightweight and quick readers (at the expense of slower writers, which for uprobes is a fine tradeof) and has better performance and scalability with number of CPUs. Similarly to baseline vs SRCU, we've benchmarked SRCU-based implementation vs RCU Tasks Trace implementation. SRCU ==== uprobe-nop ( 1 cpus): 3.276 ± 0.005M/s ( 3.276M/s/cpu) uprobe-nop ( 2 cpus): 4.125 ± 0.002M/s ( 2.063M/s/cpu) uprobe-nop ( 4 cpus): 7.713 ± 0.002M/s ( 1.928M/s/cpu) uprobe-nop ( 8 cpus): 8.097 ± 0.006M/s ( 1.012M/s/cpu) uprobe-nop (16 cpus): 6.501 ± 0.056M/s ( 0.406M/s/cpu) uprobe-nop (32 cpus): 4.398 ± 0.084M/s ( 0.137M/s/cpu) uprobe-nop (64 cpus): 6.452 ± 0.000M/s ( 0.101M/s/cpu) uretprobe-nop ( 1 cpus): 2.055 ± 0.001M/s ( 2.055M/s/cpu) uretprobe-nop ( 2 cpus): 2.677 ± 0.000M/s ( 1.339M/s/cpu) uretprobe-nop ( 4 cpus): 4.561 ± 0.003M/s ( 1.140M/s/cpu) uretprobe-nop ( 8 cpus): 5.291 ± 0.002M/s ( 0.661M/s/cpu) uretprobe-nop (16 cpus): 5.065 ± 0.019M/s ( 0.317M/s/cpu) uretprobe-nop (32 cpus): 3.622 ± 0.003M/s ( 0.113M/s/cpu) uretprobe-nop (64 cpus): 3.723 ± 0.002M/s ( 0.058M/s/cpu) RCU Tasks Trace =============== uprobe-nop ( 1 cpus): 3.396 ± 0.002M/s ( 3.396M/s/cpu) uprobe-nop ( 2 cpus): 4.271 ± 0.006M/s ( 2.135M/s/cpu) uprobe-nop ( 4 cpus): 8.499 ± 0.015M/s ( 2.125M/s/cpu) uprobe-nop ( 8 cpus): 10.355 ± 0.028M/s ( 1.294M/s/cpu) uprobe-nop (16 cpus): 7.615 ± 0.099M/s ( 0.476M/s/cpu) uprobe-nop (32 cpus): 4.430 ± 0.007M/s ( 0.138M/s/cpu) uprobe-nop (64 cpus): 6.887 ± 0.020M/s ( 0.108M/s/cpu) uretprobe-nop ( 1 cpus): 2.174 ± 0.001M/s ( 2.174M/s/cpu) uretprobe-nop ( 2 cpus): 2.853 ± 0.001M/s ( 1.426M/s/cpu) uretprobe-nop ( 4 cpus): 4.913 ± 0.002M/s ( 1.228M/s/cpu) uretprobe-nop ( 8 cpus): 5.883 ± 0.002M/s ( 0.735M/s/cpu) uretprobe-nop (16 cpus): 5.147 ± 0.001M/s ( 0.322M/s/cpu) uretprobe-nop (32 cpus): 3.738 ± 0.008M/s ( 0.117M/s/cpu) uretprobe-nop (64 cpus): 4.397 ± 0.002M/s ( 0.069M/s/cpu) Peak throughput for uprobes increases from 8 mln/s to 10.3 mln/s (+28%!), and for uretprobes from 5.3 mln/s to 5.8 mln/s (+11%), as we have more work to do on uretprobes side. Even single-thread (no contention) performance is slightly better: 3.276 mln/s to 3.396 mln/s (+3.5%) for uprobes, and 2.055 mln/s to 2.174 mln/s (+5.8%) for uretprobes. We also select TASKS_TRACE_RCU for UPROBES in Kconfig due to the new dependency. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lkml.kernel.org/r/20240910174312.3646590-1-andrii@kernel.org
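On the reader side the switch boils down to using the trace-RCU read-section primitives; a hedged sketch (the surrounding uprobe lookup is omitted and the function name is made up):

    #include <linux/rcupdate_trace.h>

    static void handle_swbp_sketch(void)
    {
        rcu_read_lock_trace();
        /* ... find the uprobe and invoke its consumers under the read section ... */
        rcu_read_unlock_trace();
    }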
2024-10-06  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds; 9 files, -42/+82)
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Fix pKVM error path on init, making sure we do not change critical system registers as we're about to fail
- Make sure that the host's vector length is capped at a value common to all CPUs
- Fix kvm_has_feat*() handling of "negative" features, as the current code is pretty broken
- Promote Joey to the status of official reviewer, while James steps down -- hopefully only temporarily
x86:
- Fix compilation with KVM_INTEL=KVM_AMD=n
- Fix disabling KVM_X86_QUIRK_SLOT_ZAP_ALL when shadow MMU is in use
Selftests:
- Fix compilation on non-x86 architectures"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  x86/reboot: emergency callbacks are now registered by common KVM code
  KVM: x86: leave kvm.ko out of the build if no vendor module is requested
  KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMU
  KVM: arm64: Fix kvm_has_feat*() handling of negative features
  KVM: selftests: Fix build on architectures other than x86_64
  KVM: arm64: Another reviewer reshuffle
  KVM: arm64: Constrain the host to the maximum shared SVE VL with pKVM
  KVM: arm64: Fix __pkvm_init_vcpu cptr_el2 error path
2024-10-06  Merge tag 'powerpc-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds; 1 file, -1/+1)
Pull powerpc fix from Michael Ellerman:
- Allow r30 to be used in vDSO code generation of getrandom
Thanks to Jason A. Donenfeld
* tag 'powerpc-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/vdso: allow r30 in vDSO code generation of getrandom
2024-10-06  Merge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini; 1400 files, -19326/+55927)
KVM/arm64 fixes for 6.12, take #1
- Fix pKVM error path on init, making sure we do not change critical system registers as we're about to fail
- Make sure that the host's vector length is capped at a value common to all CPUs
- Fix kvm_has_feat*() handling of "negative" features, as the current code is pretty broken
- Promote Joey to the status of official reviewer, while James steps down -- hopefully only temporarily
2024-10-06  x86/reboot: emergency callbacks are now registered by common KVM code  (Paolo Bonzini; 2 files, -4/+4)
Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules. In practice this has no functional change, because CONFIG_KVM_X86_COMMON is set if and only if at least one vendor-specific module is being built. However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that are used in kvm.ko. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled") Fixes: 6d55a94222db ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-06  KVM: x86: leave kvm.ko out of the build if no vendor module is requested  (Paolo Bonzini; 2 files, -4/+7)
kvm.ko is nothing but library code shared by kvm-intel.ko and kvm-amd.ko. It provides no functionality on its own and it is unnecessary unless one of the vendor-specific module is compiled. In particular, /dev/kvm is not created until one of kvm-intel.ko or kvm-amd.ko is loaded. Use CONFIG_KVM to decide if it is built-in or a module, but use the vendor-specific modules for the actual decision on whether to build it. This also fixes a build failure when CONFIG_KVM_INTEL and CONFIG_KVM_AMD are both disabled. The cpu_emergency_register_virt_callback() function is called from kvm.ko, but it is only defined if at least one of CONFIG_KVM_INTEL and CONFIG_KVM_AMD is provided. Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-04  Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  (Linus Torvalds; 4 files, -4/+10)
Pull arm64 fixes from Catalin Marinas:
"A couple of build/config issues and expanding the speculative SSBS workaround to more CPUs:
- Expand the speculative SSBS workaround to cover Cortex-A715, Neoverse-N3 and Microsoft Azure Cobalt 100
- Force position-independent veneers - in some kernel configurations, the LLD linker generates position-dependent veneers for otherwise position-independent code, resulting in early boot-time failures
- Fix Kconfig selection of HAVE_DYNAMIC_FTRACE_WITH_ARGS so that it is not enabled when not supported by the combination of clang and GNU ld"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Subscribe Microsoft Azure Cobalt 100 to erratum 3194386
  arm64: fix selection of HAVE_DYNAMIC_FTRACE_WITH_ARGS
  arm64: errata: Expand speculative SSBS workaround once more
  arm64: cputype: Add Neoverse-N3 definitions
  arm64: Force position-independent veneers
2024-10-04  Merge tag 'riscv-for-linus-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux  (Linus Torvalds; 2 files, -3/+7)
Pull RISC-V fixes from Palmer Dabbelt:
- PERF_TYPE_BREAKPOINT now returns -EOPNOTSUPP instead of -ENOENT, which aligns with other ports and is a saner value
- The KASAN-related stack size increasing logic has been moved to a C header, to avoid dependency issues
* tag 'riscv-for-linus-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix kernel stack size when KASAN is enabled
  drivers/perf: riscv: Align errno for unsupported perf event
2024-10-04  Merge tag 'trace-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace  (Linus Torvalds; 1 file, -0/+2)
Pull tracing fixes from Steven Rostedt:
- Fix tp_printk command line option crashing the kernel
  With the code that can handle a buffer from a previous boot, trace_check_vprintf() needed access to the delta of the address space used by the old buffer and the current buffer. To do so, the trace_array (tr) parameter was used. But when tp_printk is enabled on the kernel command line, no trace buffer is used and the trace event is sent directly to printk(). That meant the tr field of the iterator descriptor was NULL, and since tp_printk still uses trace_check_vprintf() it caused a NULL dereference.
- Add ptrace.h include to x86 ftrace file for completeness
- Fix rtla installation when done with out-of-tree build
- Fix the help messages in rtla that were incorrect
- Several fixes for races in the timerlat and hwlat code
  Several locking issues were discovered in the coordination between timerlat kthread creation and hotplug, as timerlat has callbacks from hotplug code to start kthreads when CPUs come online. There are also locking issues with grabbing the cpu_read_lock() and the locks within timerlat.
* tag 'trace-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/hwlat: Fix a race during cpuhp processing
  tracing/timerlat: Fix a race during cpuhp processing
  tracing/timerlat: Drop interface_lock in stop_kthread()
  tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline
  x86/ftrace: Include <asm/ptrace.h>
  rtla: Fix the help text in osnoise and timerlat top tools
  tools/rtla: Fix installation from out-of-tree build
  tracing: Fix trace_check_vprintf() when tp_printk is used
2024-10-04  arm64: Subscribe Microsoft Azure Cobalt 100 to erratum 3194386  (Easwar Hariharan; 1 file, -0/+1)
Add the Microsoft Azure Cobalt 100 CPU to the list of CPUs suffering from erratum 3194386 added in commit 75b3c43eab59 ("arm64: errata: Expand speculative SSBS workaround").
CC: Mark Rutland <mark.rutland@arm.com>
CC: James Morse <james.morse@arm.com>
CC: Will Deacon <will@kernel.org>
CC: stable@vger.kernel.org # 6.6+
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Link: https://lore.kernel.org/r/20241003225239.321774-1-eahariha@linux.microsoft.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-04  Merge tag 'rust-fixes-6.12' of https://github.com/Rust-for-Linux/linux  (Linus Torvalds; 1 file, -1/+17)
Pull Rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Fix/improve a couple 'depends on' on the newly added CFI/KASAN support to avoid build errors/warnings
- Fix ARCH_SLAB_MINALIGN multiple definition error for RISC-V under !CONFIG_MMU
- Clean upcoming (Rust 1.83.0) Clippy warnings
'kernel' crate:
- 'sync' module: fix soundness issue by requiring 'T: Sync' for 'LockedBy::access'; and fix helpers build error under PREEMPT_RT
- Fix trivial sorting issue ('rustfmtcheck') on the v6.12 Rust merge"
* tag 'rust-fixes-6.12' of https://github.com/Rust-for-Linux/linux:
  rust: kunit: use C-string literals to clean warning
  cfi: encode cfi normalized integers + kasan/gcov bug in Kconfig
  rust: KASAN+RETHUNK requires rustc 1.83.0
  rust: cfi: fix `patchable-function-entry` starting version
  rust: mutex: fix __mutex_init() usage in case of PREEMPT_RT
  rust: fix `ARCH_SLAB_MINALIGN` multiple definition error
  rust: sync: require `T: Sync` for `LockedBy::access`
  rust: kernel: sort Rust modules
2024-10-04  KVM: x86/mmu: fix KVM_X86_QUIRK_SLOT_ZAP_ALL for shadow MMU  (Paolo Bonzini; 1 file, -14/+46)
As was tried in commit 4e103134b862 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot"), all shadow pages, i.e. non-leaf SPTEs, need to be zapped. All of the accounting for a shadow page is tied to the memslot, i.e. the shadow page holds a reference to the memslot, for all intents and purposes. Deleting the memslot without removing all relevant shadow pages, as is done when KVM_X86_QUIRK_SLOT_ZAP_ALL is disabled, results in NULL pointer derefs when tearing down the VM. Reintroduce from that commit the code that walks the whole memslot when there are active shadow MMU pages. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-03  x86/ftrace: Include <asm/ptrace.h>  (Sami Tolvanen; 1 file, -0/+2)
<asm/ftrace.h> uses struct pt_regs in several places. Include <asm/ptrace.h> to ensure it's visible. This is needed to make sure object files that only include <asm/asm-prototypes.h> compile. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/20240916221557.846853-2-samitolvanen@google.com Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-03  KVM: arm64: Fix kvm_has_feat*() handling of negative features  (Marc Zyngier; 1 file, -12/+13)
Oliver reports that the kvm_has_feat() helper is not behaving as expected for negative features. On investigation, the main issue seems to be caused by the following construct:

  #define get_idreg_field(kvm, id, fld)			\
	(id##_##fld##_SIGNED ?				\
	 get_idreg_field_signed(kvm, id, fld) :		\
	 get_idreg_field_unsigned(kvm, id, fld))

where one side of the expression evaluates as something signed, and the other as something unsigned. In retrospect, this is totally braindead, as the compiler converts this into an unsigned expression. When compared to something that is 0, the test is simply elided. Epic fail. A similar issue exists in the expand_field_sign() macro. The correct way to handle this is to choose between signed and unsigned comparisons, so that both sides of the ternary expression are of the same type (bool). In order to keep the code readable (sort of), we introduce new comparison primitives taking an operator as a parameter, and rewrite the kvm_has_feat*() helpers in terms of these primitives.
Fixes: c62d7a23b947 ("KVM: arm64: Add feature checking helpers")
Reported-by: Oliver Upton <oliver.upton@linux.dev>
Tested-by: Oliver Upton <oliver.upton@linux.dev>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241002204239.2051637-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>