path: root/arch
Age | Commit message | Author | Files | Lines
2025-09-19 | sparc: fix accurate exception reporting in copy_{from_to}_user for UltraSPARC | Michael Karcher | 1 | -9/+10
The referenced commit introduced exception handlers on user-space memory references in copy_from_user and copy_to_user. These handlers return from the respective function and calculate the remaining bytes left to copy using the current register contents. This commit fixes a couple of bad calculations. This will fix the return value of copy_from_user and copy_to_user in the faulting case. The behaviour of memcpy stays unchanged. Fixes: cb736fdbb208 ("sparc64: Convert U1copy_{from,to}_user to accurate exception reporting.") Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # on QEMU 10.0.3 Tested-by: René Rebe <rene@exactcode.com> # on Ultra 5 UltraSparc IIi Tested-by: Jonathan 'theJPster' Pallant <kernel@thejpster.org.uk> # on Sun Netra T1 Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de> Reviewed-by: Andreas Larsson <andreas@gaisler.com> Link: https://lore.kernel.org/r/20250905-memcpy_series-v4-1-1ca72dda195b@mkarcher.dialup.fu-berlin.de Signed-off-by: Andreas Larsson <andreas@gaisler.com>
2025-09-19 | sparc64: fix prototypes of reads[bwl]() | Al Viro | 1 | -3/+3
Conventions for readsl() are the same as for readl() - any __iomem pointer is acceptable, both const and volatile ones being OK. Same for readsb() and readsw(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: Andreas Larsson <andreas@gaisler.com> # Making sparc64 subject prefix
2025-09-19 | sparc64: Remove redundant __GFP_NOWARN | Qianfeng Rong | 1 | -2/+2
Commit 16f5dfbc851b ("gfp: include __GFP_NOWARN in GFP_NOWAIT") made GFP_NOWAIT implicitly include __GFP_NOWARN. Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT (e.g., `GFP_NOWAIT | __GFP_NOWARN`) is now redundant. Let's clean up these redundant flags across subsystems. No functional changes. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Reviewed-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: Andreas Larsson <andreas@gaisler.com>
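The redundancy described above can be illustrated with plain bit arithmetic. This is a minimal sketch, not kernel code: the `DEMO_*` names and their bit values are hypothetical stand-ins, and the composition of the "no wait" mask is modelled on the commit message's statement that GFP_NOWAIT now includes __GFP_NOWARN.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; the real flag
 * definitions live in the kernel's gfp headers and differ from these. */
#define DEMO_GFP_KSWAPD_RECLAIM 0x1u
#define DEMO_GFP_NOWARN         0x2u

/* Modelled after the referenced change: the mask already carries NOWARN. */
#define DEMO_GFP_NOWAIT (DEMO_GFP_KSWAPD_RECLAIM | DEMO_GFP_NOWARN)

/* Returns 1 when OR-ing `extra` into `mask` changes nothing. */
static int demo_flag_is_redundant(uint32_t mask, uint32_t extra)
{
    return (mask | extra) == mask;
}
```

With these stand-ins, `demo_flag_is_redundant(DEMO_GFP_NOWAIT, DEMO_GFP_NOWARN)` is true, which is why dropping the explicit flag is a no-op.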
2025-09-19 | sparc64: fix hugetlb for sun4u | Anthony Yznaga | 1 | -0/+20
An attempt to exercise sparc hugetlb code in a sun4u-based guest running under qemu results in the guest hanging due to being stuck in a trap loop. This is due to invalid hugetlb TTEs being installed that do not have the expected _PAGE_PMD_HUGE and page size bits set. Although the breakage has gone apparently unnoticed for several years, fix it now so there is the option to exercise sparc hugetlb code under qemu. This can be useful because sun4v support in qemu does not support linux guests currently and sun4v-based hardware resources may not be readily available. Fix tested with 6.15.2 and 6.16-rc6 kernels by running libhugetlbfs tests on a qemu guest running Debian 13. Fixes: c7d9f77d33a7 ("sparc64: Multi-page size support") Cc: stable@vger.kernel.org Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Reviewed-by: Andreas Larsson <andreas@gaisler.com> Link: https://lore.kernel.org/r/20250716012446.10357-1-anthony.yznaga@oracle.com Signed-off-by: Andreas Larsson <andreas@gaisler.com>
2025-09-19 | sparc/module: Make it clear that relocation numbers are shown in hex | Koakuma | 1 | -1/+1
This is to ease debugging by removing the ambiguity of the shown number base. Signed-off-by: Koakuma <koachan@protonmail.com> Reviewed-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: Andreas Larsson <andreas@gaisler.com>
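To see why an explicit base marker helps, consider the value 16: printed with a bare `%x` it reads "10", which is indistinguishable from decimal ten. The sketch below assumes a hypothetical message text and helper name; only the `0x` prefix idea comes from the commit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats a relocation type with an explicit 0x prefix so the base is
 * unambiguous. The message wording here is hypothetical. */
static void demo_format_reloc(char *buf, size_t len, unsigned int type)
{
    snprintf(buf, len, "Unknown relocation: 0x%x", type);
}
```

Calling `demo_format_reloc(buf, sizeof(buf), 16)` yields a string ending in `0x10` rather than a bare `10`.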
2025-09-19 | sparc/module: Add R_SPARC_UA64 relocation handling | Koakuma | 2 | -0/+2
This is needed so that the kernel can handle R_SPARC_UA64 relocations, which are emitted by LLVM's IAS. Signed-off-by: Koakuma <koachan@protonmail.com> Reviewed-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: Andreas Larsson <andreas@gaisler.com>
2025-09-19 | x86/umip: Check that the instruction opcode is at least two bytes | Sean Christopherson | 1 | -2/+2
When checking for a potential UMIP violation on #GP, verify the decoder found at least two opcode bytes to avoid false positives when the kernel encounters an unknown instruction that starts with 0f. Because the array of opcode.bytes is zero-initialized by insn_init(), peeking at bytes[1] will misinterpret garbage as a potential SLDT or STR instruction, and can incorrectly trigger emulation. E.g. if a VPALIGNR instruction 62 83 c5 05 0f 08 ff vpalignr xmm17{k5},xmm23,XMMWORD PTR [r8],0xff hits a #GP, the kernel emulates it as STR and squashes the #GP (and corrupts the userspace code stream). Arguably the check should look for exactly two bytes, but no three byte opcodes use '0f 00 xx' or '0f 01 xx' as an escape, i.e. it should be impossible to get a false positive if the first two opcode bytes match '0f 00' or '0f 01'. Go with a more conservative check with respect to the existing code to minimize the chances of breaking userspace, e.g. due to decoder weirdness. Analyzed by Nick Bray <ncbray@google.com>. Fixes: 1e5db223696a ("x86/umip: Add emulation code for UMIP instructions") Reported-by: Dan Snyder <dansnyder@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org
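The false positive described above can be modelled in a few lines. This is a sketch under stated assumptions: `struct demo_opcode` is a hypothetical stand-in for the decoder's opcode field (a zero-initialized byte array plus a count of decoded bytes), and both helper functions are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the decoder state: bytes[] starts zeroed and
 * nbytes records how many opcode bytes were actually decoded. */
struct demo_opcode {
    uint8_t bytes[4];
    uint8_t nbytes;
};

/* Old-style check: peeks at bytes[1] even when only one byte was decoded. */
static int demo_looks_like_umip_old(const struct demo_opcode *op)
{
    return op->bytes[0] == 0x0f &&
           (op->bytes[1] == 0x00 || op->bytes[1] == 0x01);
}

/* Conservative check: require at least two decoded opcode bytes first. */
static int demo_looks_like_umip_new(const struct demo_opcode *op)
{
    if (op->nbytes < 2)
        return 0;
    return demo_looks_like_umip_old(op);
}
```

An opcode with a single decoded byte `0f` passes the old check, because the untouched `bytes[1]` is still zero, but is rejected by the new one.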
2025-09-19 | arm64: mm: split linear mapping if BBML2 unsupported on secondary CPUs | Ryan Roberts | 4 | -27/+187
The kernel linear mapping is painted at a very early stage of system boot. The cpufeature has not been finalized yet at this point. So the linear mapping is determined by the capability of the boot CPU only. If the boot CPU supports BBML2, large block mappings will be used for the linear mapping. But the secondary CPUs may not support BBML2, so repaint the linear mapping if large block mapping is used and the secondary CPUs don't support BBML2 once cpufeature is finalized on all CPUs. If the boot CPU doesn't support BBML2 or the secondary CPUs have the same BBML2 capability as the boot CPU, repainting the linear mapping is not needed. Repainting is implemented by the boot CPU, which we know supports BBML2, so it is safe for the live mapping size to change for this CPU. The linear map region is walked using the pagewalk API and any discovered large leaf mappings are split to pte mappings using the existing helper functions. Since the repainting is performed inside of a stop_machine(), we must use GFP_ATOMIC to allocate the extra intermediate pgtables. But since we are still early in boot, it is expected that there is plenty of memory available so we will never need to sleep for reclaim, and so GFP_ATOMIC is acceptable here. The secondary CPUs are all put into a waiting area with the idmap in TTBR0 and reserved map in TTBR1 while this is performed since they cannot be allowed to observe any size changes on the live mappings. Some of this infrastructure is reused from the kpti case. Specifically we share the same flag (was __idmap_kpti_flag, now idmap_kpti_bbml2_flag) since it means we don't have to reserve any extra pgtable memory to idmap the extra flag. Co-developed-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
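The repaint decision in this message reduces to a small predicate. The sketch below is a simplification with hypothetical names: it assumes large blocks are in use exactly when the boot CPU (index 0) supports BBML2, as the message states, and ignores every other condition the real code may check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* cpu_has_bbml2[0] describes the boot CPU; the remaining entries are
 * secondaries. Returns true when the linear map would need repainting. */
static bool demo_needs_repaint(const bool *cpu_has_bbml2, size_t ncpus)
{
    /* Without BBML2 on the boot CPU, large blocks were never used. */
    if (ncpus == 0 || !cpu_has_bbml2[0])
        return false;

    /* Any secondary lacking BBML2 forces a split to pte mappings. */
    for (size_t i = 1; i < ncpus; i++) {
        if (!cpu_has_bbml2[i])
            return true;
    }
    return false;
}
```

Only the mixed case, where the boot CPU has BBML2 and at least one secondary does not, returns true.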
2025-09-19 | Merge tag 'loongarch-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson | Linus Torvalds | 11 | -66/+120
Pull LoongArch fixes from Huacai Chen: "Fix some build warnings for RUST-enabled objtool check, align ACPI structures for ARCH_STRICT_ALIGN, fix an unreliable stack for live patching, add some NULL pointer checkings, and fix some bugs around KVM"
* tag 'loongarch-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_pch_pic_regs_access()
  LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_sw_status_access()
  LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_regs_access()
  LoongArch: KVM: Avoid copy_*_user() with lock hold in kvm_eiointc_ctrl_access()
  LoongArch: KVM: Fix VM migration failure with PTW enabled
  LoongArch: KVM: Remove unused returns and semicolons
  LoongArch: vDSO: Check kcalloc() result in init_vdso()
  LoongArch: Fix unreliable stack for live patching
  LoongArch: Replace sprintf() with sysfs_emit()
  LoongArch: Check the return value when creating kobj
  LoongArch: Align ACPI structures if ARCH_STRICT_ALIGN enabled
  LoongArch: Update help info of ARCH_STRICT_ALIGN
  LoongArch: Handle jump tables options for RUST
  LoongArch: Make LTO case independent in Makefile
  objtool/LoongArch: Mark special atomic instruction as INSN_BUG type
  objtool/LoongArch: Mark types based on break immediate code
2025-09-19 | riscv: errata: Fix the PAUSE Opcode for MIPS P8700 | Djordje Todorovic | 12 | -3/+127
Add ERRATA_MIPS and ERRATA_MIPS_P8700_PAUSE_OPCODE configs. Handle errata for the MIPS PAUSE instruction. Signed-off-by: Djordje Todorovic <djordje.todorovic@htecgroup.com> Signed-off-by: Aleksandar Rikalo <arikalo@gmail.com> Signed-off-by: Raj Vishwanathan4 <rvishwanathan@mips.com> Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-7-a6cbbe1c3412@htecgroup.com [pjw@kernel.org: updated to apply and compile; fixed a checkpatch issue] Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-19 | riscv: hwprobe: Add MIPS vendor extension probing | Aleksa Paunovic | 7 | -1/+56
Add a new hwprobe key "RISCV_HWPROBE_KEY_VENDOR_EXT_MIPS_0" which allows userspace to probe for the new xmipsexectl vendor extension. Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-4-a6cbbe1c3412@htecgroup.com [pjw@kernel.org: fixed some checkpatch issues] Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-19 | riscv: Add xmipsexectl instructions | Aleksa Paunovic | 1 | -0/+19
Add xmipsexectl instruction opcodes. This includes the MIPS.PAUSE, MIPS.EHB, and MIPS.IHB instructions. Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-3-a6cbbe1c3412@htecgroup.com Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-19 | KVM: arm64: Expose FEAT_LSFE to guests | Mark Brown | 1 | -1/+3
FEAT_LSFE (Large System Float Extension), providing atomic floating point memory operations, is optional from v9.5. Since this feature adds no new architectural state, expose the relevant ID register field to guests so they can discover it. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | arm64: Kconfig: Spell out "ARMv9.4" in menuconfig text | Will Deacon | 1 | -2/+2
The menuconfig entries to configure various architectural features are all formatted as "ARMvx.y architecture features" with the unusual exception of 9.4, which omits the "ARM" prefix. Add the "ARM" prefix to the menuconfig entry for the ARMv9.4 architectural features. Signed-off-by: Will Deacon <will@kernel.org>
2025-09-19 | KVM: arm64: Add trap configs for PMSDSFR_EL1 | James Clark | 3 | -0/+4
SPE data source filtering (SPE_FEAT_FDS) adds a new register, PMSDSFR_EL1; add the trap configs for it. PMSNEVFR_EL1 was also missing its VNCR offset, so add it along with PMSDSFR_EL1. Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: nv: Expose up to FEAT_Debugv8p8 to NV-enabled VMs | Oliver Upton | 1 | -2/+5
The changes to the debug architecture up to v8.8 are concerned with external debug, which of course has no direct impact on VMs. Raise the feature limit and document what's preventing us from raising it further. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: nv: Advertise FEAT_TIDCP1 to NV-enabled VMs | Oliver Upton | 1 | -1/+0
While KVM does not expose IMPDEF features to VMs, FEAT_TIDCP1 is an architecturally-defined EL1 trap of a particular sysreg encoding range. Furthermore, KVM already advertises this feature to non-NV VMs. As there is no interaction with EL2 traps, expose the feature. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: nv: Advertise FEAT_SpecSEI to NV-enabled VMs | Oliver Upton | 1 | -1/+0
FEAT_SpecSEI is an informational feature describing whether speculative loads may generate SErrors. Since there are already cases where KVM reinjects an SError into the VM it is already possible this may happen due to a speculative load within the VM. Stop hiding the feature from NV-enabled VMs. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: nv: Expose FEAT_TWED to NV-enabled VMs | Oliver Upton | 1 | -1/+0
KVM now handles HCR_EL2.{TWEDEn,TWEDEL} correctly when computing the effective HCR for a nested context. Advertise the feature. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: nv: Exclude guest's TWED configuration when TWE isn't set | Oliver Upton | 1 | -0/+7
Ignore the guest hypervisor's configured TWE delay if it hasn't actually requested WFE traps. Otherwise, OR'ing these fields into the effective HCR when the guest sets TWE is safe as KVM doesn't use FEAT_TWED and leaves the fields initialized to 0. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
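The rule above can be sketched as a mask applied before merging the guest's value into the effective HCR. This is an illustrative model only: the `DEMO_*` bit positions (TWE at bit 14, TWEDEn at bit 59, TWEDEL at bits 63:60) are stated here as assumptions, and the real code merges only selected guest bits rather than the whole register.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed HCR_EL2 field positions, for illustration. */
#define DEMO_HCR_TWE     (1ULL << 14)
#define DEMO_HCR_TWEDEN  (1ULL << 59)
#define DEMO_HCR_TWEDEL  (0xfULL << 60)

/* Merge the guest hypervisor's HCR into the host's, dropping the guest's
 * TWE delay configuration unless it actually asked for WFE traps. */
static uint64_t demo_effective_hcr(uint64_t host_hcr, uint64_t guest_hcr)
{
    uint64_t guest = guest_hcr;

    if (!(guest & DEMO_HCR_TWE))
        guest &= ~(DEMO_HCR_TWEDEN | DEMO_HCR_TWEDEL);

    return host_hcr | guest;
}
```

Because the host side leaves the delay fields at zero, OR-ing them in is harmless when the guest did set TWE.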
2025-09-19 | KVM: arm64: nv: Expose FEAT_AFP to NV-enabled VMs | Oliver Upton | 1 | -1/+0
FEAT_AFP doesn't intersect with any EL2 trap behavior, so expose it to NV-enabled VMs. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: nv: Expose FEAT_ECBHB to NV-enabled VMs | Oliver Upton | 1 | -2/+1
The exact wording of the restrictions on branch prediction due to FEAT_ECBHB in DDI0487L.b is as follows: When FEAT_ECBHB is implemented, the branch history information created in a context before an exception to a higher Exception level using AArch64 cannot be used by code before that exception to exploitatively control the execution of any indirect branches in code in a different context after the exception. While vEL2 and EL1 are multiplexed at EL1, they exist in different hardware-described contexts as KVM uses different stage-2 MMUs to represent the corresponding translation regimes. Additionally, exception entries into vEL2 always imply a hardware exception entry into literal EL2 for the emulated regime change. Given all of this, and the fact that FEAT_ECBHB places no limitation on the EL of the protected context after the exception, we can claim FEAT_ECBHB on supporting hardware. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: nv: Expose FEAT_RASv1p1 via RAS_frac | Oliver Upton | 1 | -1/+0
KVM already supports FEAT_RASv1p1 for NV-enabled VMs but only when advertised through the canonical field. Stop masking the silly frac field to expose the feature on systems without FEAT_DF. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: nv: Expose FEAT_DF2 to NV-enabled VMs | Oliver Upton | 1 | -1/+0
The supporting infrastructure in KVM's abort injection code was merged a while ago, but the author (me!) forgot to relax the NV limitation when FEAT_DF2 got exposed to non-NV VMs. Fix it. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMs | Oliver Upton | 2 | -1/+25
ID_AA64DFR0_EL1.DoubleLock is one of those annoying signed feature fields where a non-negative value implies that a feature is implemented and a negative value implies that it is not. While the intention of masking this field was likely to hide the feature, KVM actually advertises it, even on unsupporting hardware. Remove FEAT_DoubleLock from the mask, making the NI value visible to the VM. Take care to accept the old, incorrect values for this field as we've lied to userspace. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
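A short sketch of how a signed 4-bit ID field behaves shows why zeroing it advertises the feature instead of hiding it. The helper names and the shift value of 36 are assumptions made for this example; only the signed-field convention comes from the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the DoubleLock field, for illustration. */
#define DEMO_DOUBLELOCK_SHIFT 36

/* Extract a 4-bit field and sign-extend it: 0xf becomes -1. */
static int demo_signed_field(uint64_t reg, unsigned int shift)
{
    int v = (int)((reg >> shift) & 0xf);

    return (v & 0x8) ? v - 16 : v;
}

/* For signed fields, a non-negative value means "implemented". */
static int demo_feature_implemented(uint64_t reg, unsigned int shift)
{
    return demo_signed_field(reg, shift) >= 0;
}
```

A register whose field holds 0xf reads as -1 (not implemented), while a field masked to 0 reads as implemented, which is the mistake the commit corrects.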
2025-09-19 | KVM: arm64: nv: Convert masks to denylists in limit_nv_id_reg() | Oliver Upton | 1 | -14/+33
Consistently use denylisting of features such that the limitations of KVM's nested implementation are explicitly documented (rather than implied). Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: Make ID_AA64MMFR1_EL1.{HCX, TWED} writable from userspace | Jinqian Yang | 1 | -2/+0
Allow userspace to downgrade {HCX, TWED} in ID_AA64MMFR1_EL1. Userspace can only change the value from high to low. Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
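The "high to low only" rule can be pictured as a per-field comparison. The sketch below is a hypothetical model, not KVM's sanitization code: it treats the field as an unsigned 4-bit value at a caller-supplied shift and accepts a write only if it does not raise the field.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the requested value keeps or lowers the 4-bit field at
 * `shift` relative to the value the hardware supports, 0 otherwise. */
static int demo_write_allowed(uint64_t hw, uint64_t requested,
                              unsigned int shift)
{
    uint64_t h = (hw >> shift) & 0xf;
    uint64_t r = (requested >> shift) & 0xf;

    return r <= h;
}
```

With a supported value of 1, writing 0 is accepted and writing 2 is rejected.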
2025-09-19 | KVM: arm64: Convert MDCR_EL2 RES0 handling to compute_reg_res0_bits() | Marc Zyngier | 1 | -5/+5
While MDCR_EL2 cannot be RES0, convert it to the same infrastructure anyway, as it makes things cleaner. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: Convert SCTLR_EL1 RES0 handling to compute_reg_res0_bits() | Marc Zyngier | 1 | -5/+5
While SCTLR_EL1 cannot be RES0, convert it to the same infrastructure anyway, as it makes things cleaner. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: Enforce absence of FEAT_TCR2 on TCR2_EL2 | Marc Zyngier | 1 | -5/+5
Enforce that TCR2_EL2 is RES0 when FEAT_TCR2 isn't present. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: Enforce absence of FEAT_SCTLR2 on SCTLR2_EL{1,2} | Marc Zyngier | 1 | -5/+5
Enforce that SCTLR2_EL{1,2} are RES0 when FEAT_SCTLR2 isn't present. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: Convert HCR_EL2 RES0 handling to compute_reg_res0_bits() | Marc Zyngier | 1 | -16/+14
HCR_EL2 is unlikely to ever be RES0 (at least when NV is on), but consistency doesn't hurt, and it can be described in the same way as the other registers. Convert it over to the new RES0-computing infrastructure. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: Enforce absence of FEAT_HCX on HCRX_EL2 | Marc Zyngier | 1 | -5/+7
Add the dependency between the HCRX_EL2 register and FEAT_HCX. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: Enforce absence of FEAT_FGT2 on FGT2 registers | Marc Zyngier | 1 | -30/+37
Similarly to the FEAT_FGT registers, add the dependency between the registers and the controlling feature. While we're at it, add the missing checks for the RES0 vs valid bit overlap. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: Enforce absence of FEAT_FGT on FGT registers | Marc Zyngier | 1 | -49/+77
As we want to enforce FGT registers behaving as RES0 when FEAT_FGT is not exposed to the guest, we move a bunch of things that were so far passed as parameters into a structure that points to the bit description. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: Add reg_feat_map_desc to describe full register dependency | Marc Zyngier | 1 | -9/+72
struct reg_bits_to_feat_map is great to describe bit-to-feature dependency, but not so much to describe register-to-feature dependency. Yet both need to exist. Add a new reg_feat_map_desc structure to describe this. Extra complexity is added by the need to source the RES0 bits from the runtime-computed FGT masks, for which we need an extra flag and extra complexity. Oh well. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: Remove duplicate FEAT_{SYSREG128,MTE2} descriptions | Marc Zyngier | 1 | -2/+0
Turns out I'm rather bad at noticing that the description of features has already been added. Remove superfluous definitions for SYSREG128 and MTE2. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: nv: Allow userspace to de-feature stage-2 TGRANs | Oliver Upton | 1 | -5/+18
KVM advertises the stage-2 TGRAN fields as writable to userspace but prevents any modification for NV-enabled VMs. Update the special-cased sanitization to permit de-featuring a particular TGRAN without allowing the legacy value which refers to the stage-1 field for support. Reported-by: Itaru Kitayama <itaru.kitayama@linux.dev> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19 | KVM: arm64: nv: Treat AMO as 1 when at EL2 and {E2H,TGE} = {1, 0} | Oliver Upton | 1 | -0/+14
SErrors are not deliverable at EL2 when the effective value of HCR_EL2.{TGE,AMO} = {0, 0}. This is bothersome to deal with in nested as we need to use auxiliary pending state to track the pending vSError since HCR_EL2.VSE has no mechanism for honoring the guest HCR. On top of that, we have no way of making that auxiliary pending state visible in ISR_EL1. A defect against the architecture now allows an implementation to treat HCR_EL2.AMO as 1 when HCR_EL2.{E2H,TGE} = {1, 0}. Let's do exactly that, meaning SErrors are always deliverable at EL2 for the typical E2H=RES1 VM. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
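The relaxed behaviour can be written as a small transformation of the guest's HCR value. This sketch is illustrative only: the `DEMO_*` bit positions (AMO at bit 5, TGE at bit 27, E2H at bit 34) are assumptions stated for the example, and the condition "currently at EL2" from the commit title is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed HCR_EL2 bit positions, for illustration. */
#define DEMO_HCR_AMO (1ULL << 5)
#define DEMO_HCR_TGE (1ULL << 27)
#define DEMO_HCR_E2H (1ULL << 34)

/* Treat AMO as set whenever {E2H,TGE} = {1, 0}; otherwise leave the
 * value untouched. */
static uint64_t demo_effective_amo(uint64_t hcr)
{
    if ((hcr & DEMO_HCR_E2H) && !(hcr & DEMO_HCR_TGE))
        hcr |= DEMO_HCR_AMO;

    return hcr;
}
```

For a value with only E2H set, the result has AMO set as well, so SErrors count as deliverable in that configuration.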
2025-09-19 | arm64: acpi: Enable ACPI CCEL support | Suzuki K Poulose | 1 | -0/+10
Add support for ACPI CCEL by handling the EfiACPIMemoryNVS type memory. As per UEFI specifications NVS memory is reserved for Firmware use even after exiting boot services. Thus map the region as read-only. Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Gavin Shan <gshan@redhat.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-19 | arm64: Enable EFI secret area Securityfs support | Suzuki K Poulose | 1 | -0/+4
Enable EFI COCO secrets support. Provide the ioremap_encrypted() support required by the driver. Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-19 | arm64: realm: ioremap: Allow mapping memory as encrypted | Suzuki K Poulose | 3 | -6/+24
For ioremap(), so far we only checked if it was a device (RIPAS_DEV) to choose an encrypted vs decrypted mapping. However, we may have firmware reserved memory regions exposed to the OS (e.g., EFI Coco Secret Securityfs, ACPI CCEL). We need to make sure that anything that is RIPAS_RAM (i.e., Guest protected memory with RMM guarantees) is also mapped as encrypted. Rephrasing the above, anything that is not RIPAS_EMPTY is guaranteed to be protected by the RMM. Thus we choose encrypted mapping for anything that is not RIPAS_EMPTY. While at it, rename the helper function __arm64_is_protected_mmio => arm64_rsi_is_protected to clearly indicate that this is not an arm64 generic helper, but something to do with Realms. Cc: Sami Mujawar <sami.mujawar@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Steven Price <steven.price@arm.com> Tested-by: Sami Mujawar <sami.mujawar@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-19 | arm64: dts: allwinner: h313: Add Amediatech X96Q | J. Neuschäfer | 2 | -0/+231
The X96Q is a set-top box with an H313 SoC, AXP305 PMIC, 1 or 2 GiB RAM, 8 or 16 GiB eMMC flash, 2x USB A, Micro-SD, HDMI, Ethernet, audio/video output, and infrared input. https://x96mini.com/products/x96q-tv-box-android-10-set-top-box
Tested, works:
- debug UART
- status LED
- USB ports in host mode
- MicroSD
- eMMC
- recovery button hidden behind audio/video port
- analog audio (line out)
Does not work:
- Ethernet (requires AC200 MFD/EPHY driver)
- WLAN (requires out-of-tree XRadio driver)
- analog video output (requires AC200 driver)
- HDMI audio/video output
Untested:
- "OTG" USB port in device mode
- built-in IR receiver
- external IR receiver
Table of regulators on the downstream kernel, for reference:
vcc-5v  1  15  0  unknown  5000mV  0mA  5000mV  5000mV
dcdca   0   0  0  unknown   900mV  0mA     0mV     0mV
dcdcb   0   0  0  unknown  1350mV  0mA     0mV     0mV
dcdcc   0   0  0  unknown   900mV  0mA     0mV     0mV
dcdcd   0   0  0  unknown  1500mV  0mA     0mV     0mV
dcdce   0   0  0  unknown  3300mV  0mA     0mV     0mV
aldo1   0   0  0  unknown  3300mV  0mA     0mV     0mV
aldo2   0   0  0  unknown   700mV  0mA     0mV     0mV
aldo3   0   0  0  unknown   700mV  0mA     0mV     0mV
bldo1   0   0  0  unknown  1800mV  0mA     0mV     0mV
bldo2   0   0  0  unknown  1800mV  0mA     0mV     0mV
bldo3   0   0  0  unknown   700mV  0mA     0mV     0mV
bldo4   0   0  0  unknown   700mV  0mA     0mV     0mV
cldo1   0   0  0  unknown  2500mV  0mA     0mV     0mV
cldo2   0   0  0  unknown   700mV  0mA     0mV     0mV
cldo3   0   0  0  unknown   700mV  0mA     0mV     0mV
Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://patch.msgid.link/20250918-x96q-v2-2-51bd39928806@posteo.net Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2025-09-19 | riscv: Add xmipsexectl as a vendor extension | Aleksa Paunovic | 6 | -0/+65
Add support for MIPS vendor extensions. Add support for the xmipsexectl vendor extension. Signed-off-by: Aleksa Paunovic <aleksa.paunovic@htecgroup.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250724-p8700-pause-v5-2-a6cbbe1c3412@htecgroup.com [pjw@kernel.org: added the MIPS vendor ID from another patch to fix the build] Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-19 | riscv: cpufeature: add validation for zfa, zfh and zfhmin | Clément Léger | 2 | -11/+9
These extensions depend on the F extension. Add a validation callback checking for the F extension to be present. Now that extensions are correctly reported using the F/D presence, we can remove the has_fpu() check in hwprobe_isa_ext0(). Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250527100001.33284-1-cleger@rivosinc.com Signed-off-by: Paul Walmsley <pjw@kernel.org>
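The dependency described above can be sketched as a validation step that drops dependent extensions when their prerequisite is absent. Everything here is hypothetical: the enum, the bitmap layout, and the function names are invented for illustration and do not mirror the kernel's cpufeature structures.

```c
#include <assert.h>

/* Hypothetical extension IDs used as bit indices in a bitmap. */
enum demo_ext { DEMO_EXT_F, DEMO_EXT_ZFA, DEMO_EXT_ZFH, DEMO_EXT_ZFHMIN };

#define DEMO_BIT(x) (1UL << (x))

/* Validation callback: succeeds (0) only if F is present. */
static int demo_validate_needs_f(unsigned long isa_bitmap)
{
    return (isa_bitmap & DEMO_BIT(DEMO_EXT_F)) ? 0 : -1;
}

/* Clears zfa/zfh/zfhmin from the reported set when validation fails. */
static unsigned long demo_resolve(unsigned long reported)
{
    unsigned long dependents = DEMO_BIT(DEMO_EXT_ZFA) |
                               DEMO_BIT(DEMO_EXT_ZFH) |
                               DEMO_BIT(DEMO_EXT_ZFHMIN);

    if (demo_validate_needs_f(reported) != 0)
        return reported & ~dependents;

    return reported;
}
```

Once the reported set is filtered this way, a separate FPU check at the point of use becomes unnecessary, which matches the reasoning in the message.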
2025-09-19 | riscv: sbi: Switch to new sys-off handler API | Andrew Davis | 1 | -2/+2
Kernel now supports chained power-off handlers. Use register_platform_power_off() that registers a platform level power-off handler. Legacy pm_power_off() will be removed once all drivers and archs are converted to the new sys-off API. Signed-off-by: Andrew Davis <afd@ti.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250813151855.105237-1-afd@ti.com Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-18 | arm64: mm: support large block mapping when rodata=full | Yang Shi | 6 | -6/+277
When rodata=full is specified, the kernel linear mapping has to be mapped at PTE level since large page table can't be split due to the break-before-make rule on ARM64. This resulted in a couple of problems:
- performance degradation
- more TLB pressure
- memory waste for kernel page table
With FEAT_BBM level 2 support, splitting large block page table to smaller ones doesn't need to make the page table entry invalid anymore. This allows the kernel to split large block mappings on the fly. Add kernel page table split support and use large block mapping by default when FEAT_BBM level 2 is supported for rodata=full. When changing permissions for kernel linear mapping, the page table will be split to smaller size. Machines without FEAT_BBM level 2 will fall back to having the kernel linear mapping PTE-mapped when rodata=full.
With this we saw a significant performance boost with some benchmarks and much less memory consumption on my AmpereOne machine (192 cores, 1P) with 256GB memory.
* Memory use after boot
Before: MemTotal: 258988984 kB MemFree: 254821700 kB
After: MemTotal: 259505132 kB MemFree: 255410264 kB
Around 500MB more memory is free to use. The larger the machine, the more memory saved.
* Memcached
We saw performance degradation when running Memcached benchmark with rodata=full vs rodata=on. Our profiling pointed to kernel TLB pressure. With this patchset we saw ops/sec is increased by around 3.5%, P99 latency is reduced by around 9.6%. The gain mainly came from reduced kernel TLB misses. The kernel TLB MPKI is reduced by 28.5%. The benchmark data is now on par with rodata=on too.
* Disk encryption (dm-crypt) benchmark
Ran fio benchmark with the below command on a 128G ramdisk (ext4) with disk encryption (by dm-crypt).
    fio --directory=/data --random_generator=lfsr --norandommap \
        --randrepeat 1 --status-interval=999 --rw=write --bs=4k --loops=1 \
        --ioengine=sync --iodepth=1 --numjobs=1 --fsync_on_close=1 \
        --group_reporting --thread --name=iops-test-job --eta-newline=1 \
        --size 100G
The IOPS is increased by 90% - 150% (the variance is high, but the worst number of good case is around 90% more than the best number of bad case). The bandwidth is increased and the avg clat is reduced proportionally.
* Sequential file read
Read 100G file sequentially on XFS (xfs_io read with page cache populated). The bandwidth is increased by 150%.
Co-developed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 | arm64: Enable permission change on arm64 kernel block mappings | Dev Jain | 1 | -31/+88
This patch paves the path to enable huge mappings in vmalloc space and linear map space by default on arm64. For this we must ensure that we can handle any permission games on the kernel (init_mm) pagetable. Previously, __change_memory_common() used apply_to_page_range() which does not support changing permissions for block mappings. We move away from this by using the pagewalk API, similar to what riscv does right now. It is the responsibility of the caller to ensure that the range over which permissions are being changed falls on leaf mapping boundaries. For systems with BBML2, this will be handled in future patches by dynamically splitting the mappings when required.
Unlike apply_to_page_range(), the pagewalk API currently enforces the init_mm.mmap_lock to be held. To avoid the unnecessary bottleneck of the mmap_lock for our usecase, this patch extends this generic API to be used locklessly, so as to retain the existing behaviour for changing permissions. Apart from this reason, it is noted at [1] that KFENCE can manipulate kernel pgtable entries during softirqs. It does this by calling set_memory_valid() -> __change_memory_common(). This being a non-sleepable context, we cannot take the init_mm mmap lock. Add comments to highlight the conditions under which we can use the lockless variant - no underlying VMA, and the user having exclusive control over the range, thus guaranteeing no concurrent access.
We require that the start and end of a given range do not partially overlap block mappings, or cont mappings. Return -EINVAL in case a partial block mapping is detected in any of the PGD/P4D/PUD/PMD levels; add a corresponding comment in update_range_prot() to warn that eliminating such a condition is the responsibility of the caller. Note that the pte level callback may change permissions for a whole contpte block, and that will be done one pte at a time, as opposed to an atomic operation for the block mappings. This is fine as any access will decode either the old or the new permission until the TLBI.
apply_to_page_range() currently performs all pte level callbacks while in lazy mmu mode. Since arm64 can optimize performance by batching barriers when modifying kernel pgtables in lazy mmu mode, we would like to continue to benefit from this optimisation. Unfortunately walk_kernel_page_table_range() does not use lazy mmu mode. However, since the pagewalk framework is not allocating any memory, we can safely bracket the whole operation inside lazy mmu mode ourselves. Therefore, wrap the call to walk_kernel_page_table_range() with the lazy MMU helpers.
Link: https://lore.kernel.org/linux-arm-kernel/89d0ad18-4772-4d8f-ae8a-7c48d26a927e@arm.com/ [1] Signed-off-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Yang Shi <yshi@os.amperecomputing.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
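The boundary requirement in this message, that a range must not partially overlap a block mapping, can be sketched as a simple interval check. The function below is a hypothetical illustration of that rule using half-open ranges; it is not the kernel's pagewalk callback, and its name and parameters are invented.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Checks the half-open range [start, end) against one block mapping at
 * block_addr of block_size bytes. Returns 0 if the block is either
 * untouched or fully covered, and -EINVAL on a partial overlap. */
static int demo_check_block(uint64_t start, uint64_t end,
                            uint64_t block_addr, uint64_t block_size)
{
    uint64_t block_end = block_addr + block_size;

    /* Range does not touch the block at all. */
    if (end <= block_addr || start >= block_end)
        return 0;

    /* Range covers the whole block, so permissions change atomically. */
    if (start <= block_addr && end >= block_end)
        return 0;

    /* Only part of the block is covered: the caller must split first. */
    return -EINVAL;
}
```

For a 2 MiB block at 0x200000, a range starting at 0x201000 is rejected, while a range starting exactly at 0x200000 and covering the block is accepted.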
2025-09-18 | arm64: cpufeature: add AmpereOne to BBML2 allow list | Yang Shi | 1 | -0/+2
AmpereOne supports BBML2 without conflict abort, add to the allow list. Reviewed-by: Christoph Lameter (Ampere) <cl@gentwo.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18 | arm64: probes: Fix incorrect bl/blr address and register usage | Jeremy Linton | 1 | -2/+2
The pt_regs registers are 64-bit on arm64, and should be u64 when manipulated. Correct this so that we aren't truncating the address during br/blr sequences. Fixes: efb07ac534e2 ("arm64: probes: Add GCS support to bl/blr/ret") Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
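The truncation hazard is easy to reproduce outside the kernel. The sketch below assumes, purely for illustration, that the too-narrow temporary was 32 bits wide; the commit message only says the value should be u64, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy pattern: a 64-bit register value passes through a narrower
 * temporary, silently dropping the upper bits of the address. */
static uint64_t demo_target_truncated(uint64_t reg)
{
    uint32_t addr = (uint32_t)reg; /* assumed width, for illustration */

    return addr;
}

/* Fixed pattern: keep the full 64-bit width throughout. */
static uint64_t demo_target_full(uint64_t reg)
{
    uint64_t addr = reg;

    return addr;
}
```

For a kernel-style address such as 0xffff800012345678, the truncated variant returns 0x12345678, which would redirect the branch to the wrong location.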