summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2023-11-09arm64/syscall: Remove duplicate declarationKevin Brodsky1-1/+0
Commit 6ac19f96515e ("arm64: avoid prototype warnings for syscalls") added missing declarations to various syscall wrapper macros. It however proved a little too zealous in __SYSCALL_DEFINEx(), as a declaration for __arm64_sys##name was already present. A declaration is required before the call to ALLOW_ERROR_INJECTION(), so keep the original one and remove the new one. Fixes: 6ac19f96515e ("arm64: avoid prototype warnings for syscalls") Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20231109141153.250046-1-kevin.brodsky@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-08Revert "arm64: smp: avoid NMI IPIs with broken MediaTek FW"Douglas Anderson1-4/+1
This reverts commit a07a594152173a3dd3bdd12fc7d73dbba54cdbca. This is no longer needed after the patch ("arm64: Move MediaTek GIC quirk handling from irqchip to core). Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20231107072651.v2.2.I2c5fa192e767eb3ee233bc28eb60e2f8656c29a6@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-08arm64: Move MediaTek GIC quirk handling from irqchip to coreDouglas Anderson1-8/+38
In commit 44bd78dd2b88 ("irqchip/gic-v3: Disable pseudo NMIs on MediaTek devices w/ firmware issues") we added a method for detecting MediaTek devices with broken firmware and disabled pseudo-NMI. While that worked, it didn't address the problem at a deep enough level. The fundamental issue with this broken firmware is that it's not saving and restoring several important GICR registers. The current list is believed to be: * GICR_NUM_IPRIORITYR * GICR_CTLR * GICR_ISPENDR0 * GICR_ISACTIVER0 * GICR_NSACR Pseudo-NMI didn't work because it was the only thing (currently) in the kernel that relied on the broken registers, so forcing pseudo-NMI off was an effective fix. However, it could be observed that calling system_uses_irq_prio_masking() on these systems still returned "true". That caused confusion and led to the need for commit a07a59415217 ("arm64: smp: avoid NMI IPIs with broken MediaTek FW"). It's worried that the incorrect value returned by system_uses_irq_prio_masking() on these systems will continue to confuse future developers. Let's fix the issue a little more completely by disabling IRQ priorities at a deeper level in the kernel. Once we do this we can revert some of the other bits of code dealing with this quirk. This includes a partial revert of commit 44bd78dd2b88 ("irqchip/gic-v3: Disable pseudo NMIs on MediaTek devices w/ firmware issues"). This isn't a full revert because it leaves some of the changes to the "quirks" structure around in case future code needs it. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231107072651.v2.1.Ide945748593cffd8ff0feb9ae22b795935b944d6@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-07arm64/arm: arm_pmuv3: perf: Don't truncate 64-bit registersIlkka Koskinen2-45/+28
The driver used to truncate several 64-bit registers such as PMCEID[n] registers used to describe whether architectural and microarchitectural events in range 0x4000-0x401f exist. Due to discarding the bits, the driver made the events invisible, even if they existed. Moreover, PMCCFILTR and PMCR registers have additional bits in the upper 32 bits. This patch makes them available although they aren't currently used. Finally, functions handling PMXEVCNTR and PMXEVTYPER registers are removed as they not being used at all. Fixes: df29ddf4f04b ("arm64: perf: Abstract system register accesses away") Reported-by: Carl Worth <carl@os.amperecomputing.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Acked-by: Will Deacon <will@kernel.org> Closes: https://lore.kernel.org/.. Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20231102183012.1251410-1-ilkka@os.amperecomputing.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-26Merge branch 'for-next/cpus_have_const_cap' into for-next/coreCatalin Marinas47-271/+402
* for-next/cpus_have_const_cap: (38 commits) : cpus_have_const_cap() removal arm64: Remove cpus_have_const_cap() arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419 arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0 arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64} arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2 arm64: Avoid cpus_have_const_cap() for ARM64_SSBS arm64: Avoid cpus_have_const_cap() for ARM64_MTE arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT ...
2023-10-26Merge branch 'for-next/feat_lse128' into for-next/coreCatalin Marinas5-0/+5
* for-next/feat_lse128: : HWCAP for FEAT_LSE128 kselftest/arm64: add FEAT_LSE128 to hwcap test arm64: add FEAT_LSE128 HWCAP
2023-10-26Merge branch 'for-next/feat_lrcpc3' into for-next/coreCatalin Marinas5-0/+5
* for-next/feat_lrcpc3: : HWCAP for FEAT_LRCPC3 selftests/arm64: add HWCAP2_LRCPC3 test arm64: add FEAT_LRCPC3 HWCAP
2023-10-26Merge branch 'for-next/feat_sve_b16b16' into for-next/coreCatalin Marinas5-1/+11
* for-next/feat_sve_b16b16: : Add support for FEAT_SVE_B16B16 (BFloat16) kselftest/arm64: Verify HWCAP2_SVE_B16B16 arm64/sve: Report FEAT_SVE_B16B16 to userspace
2023-10-26Merge branches 'for-next/sve-remove-pseudo-regs', 'for-next/backtrace-ipi', ↵Catalin Marinas19-223/+206
'for-next/kselftest', 'for-next/misc' and 'for-next/cpufeat-display-cores', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf: hisi: Fix use-after-free when register pmu fails drivers/perf: hisi_pcie: Initialize event->cpu only on success drivers/perf: hisi_pcie: Check the type first in pmu::event_init() perf/arm-cmn: Enable per-DTC counter allocation perf/arm-cmn: Rework DTC counters (again) perf/arm-cmn: Fix DTC domain detection drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init() drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process drivers/perf: xgene: Use device_get_match_data() perf/amlogic: add missing MODULE_DEVICE_TABLE docs/perf: Add ampere_cspmu to toctree to fix a build warning perf: arm_cspmu: ampere_cspmu: Add support for Ampere SoC PMU perf: arm_cspmu: Support implementation specific validation perf: arm_cspmu: Support implementation specific filters perf: arm_cspmu: Split 64-bit write to 32-bit writes perf: arm_cspmu: Separate Arm and vendor module * for-next/sve-remove-pseudo-regs: : arm64/fpsimd: Remove the vector length pseudo registers arm64/sve: Remove SMCR pseudo register from cpufeature code arm64/sve: Remove ZCR pseudo register from cpufeature code * for-next/backtrace-ipi: : Add IPI for backtraces/kgdb, use NMI arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup arm64: smp: avoid NMI IPIs with broken MediaTek FW arm64: smp: Mark IPI globals as __ro_after_init arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI arm64: smp: Add arch support for backtrace using pseudo-NMI arm64: smp: Remove dedicated wakeup IPI arm64: idle: Tag the arm64 idle functions as __cpuidle irqchip/gic-v3: Enable support for SGIs to act as NMIs * for-next/kselftest: : Various arm64 kselftest updates kselftest/arm64: Validate SVCR in streaming SVE stress test * for-next/misc: : Miscellaneous patches arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper clocksource/drivers/arm_arch_timer: limit XGene-1 workaround arm64: Remove system_uses_lse_atomics() arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused arm64/mm: Hoist synchronization out of set_ptes() loop arm64: swiotlb: Reduce the default size if no ZONE_DMA bouncing needed * for-next/cpufeat-display-cores: : arm64 cpufeature display enabled cores arm64: cpufeature: Change DBM to display enabled cores arm64: cpufeature: Display the set of cores with a feature
2023-10-26arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newerNathan Chancellor1-0/+2
Prior to LLVM 15.0.0, LLVM's integrated assembler would incorrectly byte-swap NOP when compiling for big-endian, and the resulting series of bytes happened to match the encoding of FNMADD S21, S30, S0, S0. This went unnoticed until commit: 34f66c4c4d5518c1 ("arm64: Use a positive cpucap for FP/SIMD") Prior to that commit, the kernel would always enable the use of FPSIMD early in boot when __cpu_setup() initialized CPACR_EL1, and so usage of FNMADD within the kernel was not detected, but could result in the corruption of user or kernel FPSIMD state. After that commit, the instructions happen to trap during boot prior to FPSIMD being detected and enabled, e.g. | Unhandled 64-bit el1h sync exception on CPU0, ESR 0x000000001fe00000 -- ASIMD | CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c4d55 #1 | Hardware name: linux,dummy-virt (DT) | pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : __pi_strcmp+0x1c/0x150 | lr : populate_properties+0xe4/0x254 | sp : ffffd014173d3ad0 | x29: ffffd014173d3af0 x28: fffffbfffddffcb8 x27: 0000000000000000 | x26: 0000000000000058 x25: fffffbfffddfe054 x24: 0000000000000008 | x23: fffffbfffddfe000 x22: fffffbfffddfe000 x21: fffffbfffddfe044 | x20: ffffd014173d3b70 x19: 0000000000000001 x18: 0000000000000005 | x17: 0000000000000010 x16: 0000000000000000 x15: 00000000413e7000 | x14: 0000000000000000 x13: 0000000000001bcc x12: 0000000000000000 | x11: 00000000d00dfeed x10: ffffd414193f2cd0 x9 : 0000000000000000 | x8 : 0101010101010101 x7 : ffffffffffffffc0 x6 : 0000000000000000 | x5 : 0000000000000000 x4 : 0101010101010101 x3 : 000000000000002a | x2 : 0000000000000001 x1 : ffffd014171f2988 x0 : fffffbfffddffcb8 | Kernel panic - not syncing: Unhandled exception | CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c4d55 #1 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0xec/0x108 | show_stack+0x18/0x2c | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | panic+0x13c/0x340 | el1t_64_irq_handler+0x0/0x1c | el1_abort+0x0/0x5c | el1h_64_sync+0x64/0x68 | __pi_strcmp+0x1c/0x150 | unflatten_dt_nodes+0x1e8/0x2d8 | __unflatten_device_tree+0x5c/0x15c | unflatten_device_tree+0x38/0x50 | setup_arch+0x164/0x1e0 | start_kernel+0x64/0x38c | __primary_switched+0xbc/0xc4 Restrict CONFIG_CPU_BIG_ENDIAN to a known good assembler, which is either GNU as or LLVM's IAS 15.0.0 and newer, which contains the linked commit. Closes: https://github.com/ClangBuiltLinux/linux/issues/1948 Link: https://github.com/llvm/llvm-project/commit/1379b150991f70a5782e9a143c2ba5308da1161c Signed-off-by: Nathan Chancellor <nathan@kernel.org> Cc: stable@vger.kernel.org Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231025-disable-arm64-be-ias-b4-llvm-15-v1-1-b25263ed8b23@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-24arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=nMaria Yu1-6/+0
The counting of module PLTs has been broken when CONFIG_RANDOMIZE_BASE=n since commit: 3e35d303ab7d22c4 ("arm64: module: rework module VA range selection") Prior to that commit, when CONFIG_RANDOMIZE_BASE=n, the kernel image and all modules were placed within a 128M region, and no PLTs were necessary for B or BL. Hence count_plts() and partition_branch_plt_relas() skipped handling B and BL when CONFIG_RANDOMIZE_BASE=n. After that commit, modules can be placed anywhere within a 2G window regardless of CONFIG_RANDOMIZE_BASE, and hence PLTs may be necessary for B and BL even when CONFIG_RANDOMIZE_BASE=n. Unfortunately that commit failed to update count_plts() and partition_branch_plt_relas() accordingly. Due to this, module_emit_plt_entry() may fail if an insufficient number of PLT entries have been reserved, resulting in modules failing to load with -ENOEXEC. Fix this by counting PLTs regardless of CONFIG_RANDOMIZE_BASE in count_plts() and partition_branch_plt_relas(). Fixes: 3e35d303ab7d ("arm64: module: rework module VA range selection") Signed-off-by: Maria Yu <quic_aiquny@quicinc.com> Cc: <stable@vger.kernel.org> # 6.5.x Acked-by: Ard Biesheuvel <ardb@kernel.org> Fixes: 3e35d303ab7d ("arm64: module: rework module VA range selection") Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231024010954.6768-1-quic_aiquny@quicinc.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-24arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helperJames Morse1-1/+1
ACPI, irqchip and the architecture code all inspect the MADT enabled bit for a GICC entry in the MADT. The addition of an 'online capable' bit means all these sites need updating. Move the current checks behind a helper to make future updates easier. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/E1quv5D-00AeNJ-U8@rmk-PC.armlinux.org.uk Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-23arm64: cpufeature: Change DBM to display enabled coresJeremy Linton1-25/+8
Now that we have the ability to display the list of cores with a feature when its selectivly enabled, lets convert DBM to use that as well. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Link: https://lore.kernel.org/r/20231017052322.1211099-3-jeremy.linton@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-23arm64: cpufeature: Display the set of cores with a featureJeremy Linton2-9/+15
The AMU feature can be enabled on a subset of the cores in a system. Because of that, it prints a message for each core as it is detected. This becomes tedious when there are hundreds of cores. Instead, for CPU features which can be enabled on a subset of the present cores, lets wait until update_cpu_capabilities() and print the subset of cores the feature was enabled on. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com> Tested-by: Punit Agrawal <punit.agrawal@bytedance.com> Link: https://lore.kernel.org/r/20231017052322.1211099-2-jeremy.linton@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-18clocksource/drivers/arm_arch_timer: limit XGene-1 workaroundAndre Przywara2-2/+3
The AppliedMicro XGene-1 CPU has an erratum where the timer condition would only consider TVAL, not CVAL. We currently apply a workaround when seeing the PartNum field of MIDR_EL1 being 0x000, under the assumption that this would match only the XGene-1 CPU model. However even the Ampere eMAG (aka XGene-3) uses that same part number, and only differs in the "Variant" and "Revision" fields: XGene-1's MIDR is 0x500f0000, our eMAG reports 0x503f0002. Experiments show the latter doesn't show the faulty behaviour. Increase the specificity of the check to only consider partnum 0x000 and variant 0x00, to exclude the Ampere eMAG. Fixes: 012f18850452 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations") Reported-by: Ross Burton <ross.burton@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20231016153127.116101-1-andre.przywara@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-18arm64: Remove system_uses_lse_atomics()Gavin Shan1-8/+1
There are two variants of system_uses_lse_atomics(), depending on CONFIG_ARM64_LSE_ATOMICS. The function isn't called anywhere when CONFIG_ARM64_LSE_ATOMICS is disabled. It can be directly replaced by alternative_has_cap_likely(ARM64_HAS_LSE_ATOMICS) when the kernel option is enabled. No need to keep system_uses_lse_atomics() and just remove it. Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231017005036.334067-1-gshan@redhat.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-18arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unusedCatalin Marinas1-4/+5
This argument is not used by the arm64 implementation. Mark it as __always_unused and also remove the unnecessary 'addr' increment in set_ptes(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310140531.BQQwt3NQ-lkp@intel.com/ Cc: Will Deacon <will@kernel.org> Tested-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/ZS6EvMiJ0QF5INkv@arm.com
2023-10-16arm64/mm: Hoist synchronization out of set_ptes() loopRyan Roberts3-14/+21
set_ptes() sets a physically contiguous block of memory (which all belongs to the same folio) to a contiguous block of ptes. The arm64 implementation of this previously just looped, operating on each individual pte. But the __sync_icache_dcache() and mte_sync_tags() operations can both be hoisted out of the loop so that they are performed once for the contiguous set of pages (which may be less than the whole folio). This should result in minor performance gains. __sync_icache_dcache() already acts on the whole folio, and sets a flag in the folio so that it skips duplicate calls. But by hoisting the call, all the pte testing is done only once. mte_sync_tags() operates on each individual page with its own loop. But by passing the number of pages explicitly, we can rely solely on its loop and do the checks only once. This approach also makes it robust for the future, rather than assuming if a head page of a compound page is being mapped, then the whole compound page is being mapped, instead we explicitly know how many pages are being mapped. The old assumption may not continue to hold once the "anonymous large folios" feature is merged. Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20231005140730.2191134-1-ryan.roberts@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Remove cpus_have_const_cap()Mark Rutland2-37/+2
There are no longer any users of cpus_have_const_cap(), and therefore it can be removed. Remove cpus_have_const_cap(). At the same time, remove __cpus_have_const_cap(), as this is a trivial wrapper of alternative_has_cap_unlikely(), which can be used directly instead. The comment for __system_matches_cap() is updated to no longer refer to cpus_have_const_cap(). As we have a number of ways to check the cpucaps, the specific suggestions are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Kristina Martsenko <kristina.martsenko@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBIMark Rutland2-3/+4
In arch_tlbbatch_should_defer() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_REPEAT_TLBI, but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpus_have_const_cap() check in arch_tlbbatch_should_defer() is an optimization to avoid some redundant work when the ARM64_WORKAROUND_REPEAT_TLBI cpucap is detected and forces the immediate use of TLBI + DSB ISH. In the window between detecting the ARM64_WORKAROUND_REPEAT_TLBI cpucap and patching alternatives this is not a big concern and there's no need to optimize this window at the expsense of subsequent usage at runtime. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The ARM64_WORKAROUND_REPEAT_TLBI cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible without requiring ifdeffery or IS_ENABLED() checks at each usage. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNPMark Rutland2-1/+3
In has_useable_cnp() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but this is not necessary and cpus_have_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. We use has_useable_cnp() to determine whether we have the system-wide ARM64_HAS_CNP cpucap. Due to the structure of the cpufeature code, we call has_useable_cnp() in two distinct cases: 1) When finalizing system capabilities, setup_system_capabilities() will call has_useable_cnp() with SCOPE_SYSTEM to determine whether all CPUs have the feature. This is called after we've detected any local cpucaps including ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but prior to patching alternatives. If the ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not detect ARM64_HAS_CNP. 2) After finalizing system capabilties, verify_local_cpu_capabilities() will call has_useable_cnp() with SCOPE_LOCAL_CPU to verify that CPUs have CNP if we previously detected it. Note that if ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not have detected ARM64_HAS_CNP. For case 1 we must check the system_cpucaps bitmap as this occurs prior to patching the alternatives. For case 2 we'll only call has_useable_cnp() once per subsequent onlining of a CPU, and as this isn't a fast path it's not necessary to optimize for this case. This patch replaces the use of cpus_have_const_cap() with cpus_have_cap(), which will only generate the bitmap test and avoid generating an alternative sequence, resulting in slightly simpler annd smaller code being generated. The ARM64_WORKAROUND_NVIDIA_CARMEL_CNP cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154Mark Rutland2-0/+10
In gic_read_iar() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_CAVIUM_23154 but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_CAVIUM_23154 cpucap is detected and patched early on the boot CPU before the GICv3 driver is initialized and hence before gic_read_iar() is ever called. Thus it is not necessary to use cpus_have_const_cap(), and alternative_has_cap() is equivalent. In addition, arm64's gic_read_iar() lives in irq-gic-v3.c purely for historical reasons. It was originally added prior to 32-bit arm support in commit: 6d4e11c5e2e8cd54 ("irqchip/gicv3: Workaround for Cavium ThunderX erratum 23154") When support for 32-bit arm was added, 32-bit arm's gic_read_iar() implementation was placed in <asm/arch_gicv3.h>, but the arm64 version was kept within irq-gic-v3.c as it depended on a static key local to irq-gic-v3.c and it was easier to add ifdeffery, which is what we did in commit: 7936e914f7b0827c ("irqchip/gic-v3: Refactor the arm64 specific parts") Subsequently the static key was replaced with a cpucap in commit: a4023f682739439b ("arm64: Add hypervisor safe helper for checking constant capabilities") Since that commit there has been no need to keep arm64's gic_read_iar() in irq-gic-v3.c. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. For consistency, move the arm64-specific gic_read_iar() implementation over to arm64's <asm/arch_gicv3.h>. The ARM64_WORKAROUND_CAVIUM_23154 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198Mark Rutland3-4/+4
We use cpus_have_const_cap() to check for ARM64_WORKAROUND_2645198 but this is not necessary and alternative_has_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_2645198 cpucap is detected and patched before any userspace translation table exist, and the workaround is only necessary when manipulating usrspace translation tables which are in use. Thus it is not necessary to use cpus_have_const_cap(), and alternative_has_cap() is equivalent. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The ARM64_WORKAROUND_2645198 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible, and redundant IS_ENABLED() checks are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098Mark Rutland2-3/+5
In elf_hwcap_fixup() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_1742098, but this is not necessary and cpus_have_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_1742098 cpucap is detected and patched before elf_hwcap_fixup() can run, and hence it is not necessary to use cpus_have_const_cap(). We run cpus_have_const_cap() at most twice: once after finalizing system cpucaps, and potentially once more after detecting mismatched CPUs which support AArch32 at EL0. Due to this, it's not necessary to optimize for many calls to elf_hwcap_fixup(), and it's fine to use cpus_have_cap(). This patch replaces the use of cpus_have_const_cap() with cpus_have_cap(), which will only generate the bitmap test and avoid generating an alternative sequence, resulting in slightly simpler annd smaller code being generated. For consistenct with other cpucaps, the ARM64_WORKAROUND_1742098 cpucap is added to cpucap_is_possible() so that code can be elided when this is not possible. However, as we only define compat_elf_hwcap2 when CONFIG_COMPAT=y, some ifdeffery is still required within user_feature_fixup() to avoid build errors when CONFIG_COMPAT=n. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419Mark Rutland2-2/+2
We use cpus_have_const_cap() to check for ARM64_WORKAROUND_1542419 but this is not necessary and cpus_have_final_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_1542419 cpucap is detected and patched before any userspace code can run, and the both __do_compat_cache_op() and ctr_read_handler() are only reachable from exceptions taken from userspace. Thus it is not necessary for either to use cpus_have_const_cap(), and cpus_have_final_cap() is equivalent. This patch replaces the use of cpus_have_const_cap() with cpus_have_final_cap(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Using cpus_have_final_cap() clearly documents that we do not expect this code to run before cpucaps are finalized, and will make it easier to spot issues if code is changed in future to allow these functions to be reached earlier. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419Mark Rutland3-6/+6
In count_plts() and is_forbidden_offset_for_adrp() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_843419, but this is not necessary and cpus_have_final_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. It's not possible to load a module in the window between detecting the ARM64_WORKAROUND_843419 cpucap and patching alternatives. The module VA range limits are initialized much later in module_init_limits() which is a subsys_initcall, and module loading cannot happen before this. Hence it's not necessary for count_plts() or is_forbidden_offset_for_adrp() to use cpus_have_const_cap(). This patch replaces the use of cpus_have_const_cap() with cpus_have_final_cap() which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Using cpus_have_final_cap() clearly documents that we do not expect this code to run before cpucaps are finalized, and will make it easier to spot issues if code is changed in future to allow modules to be loaded earlier. The ARM64_WORKAROUND_843419 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible, and redundant IS_ENABLED() checks are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0Mark Rutland4-3/+5
In arm64_kernel_unmapped_at_el0() we use cpus_have_const_cap() to check for ARM64_UNMAP_KERNEL_AT_EL0, but this is only necessary so that arm64_get_bp_hardening_vector() and this_cpu_set_vectors() can run prior to alternatives being patched. Otherwise this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is a system-wide feature that is detected and patched before any translation tables are created for userspace. In the window between detecting the ARM64_UNMAP_KERNEL_AT_EL0 cpucap and patching alternatives, most users of arm64_kernel_unmapped_at_el0() do not need to know that the cpucap has been detected: * As KVM is initialized after cpucaps are finalized, no usaef of arm64_kernel_unmapped_at_el0() in the KVM code is reachable during this window. * The arm64_mm_context_get() function in arch/arm64/mm/context.c is only called after the SMMU driver is brought up after alternatives have been patched. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). Similarly the asids_update_limit() function is called after alternatives have been patched as an arch_initcall, and this can safely use cpus_have_final_cap() or alternative_has_cap_*(). Similarly we do not expect an ASID rollover to occur between cpucaps being detected and patching alternatives. Thus set_reserved_asid_bits() can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The __tlbi_user() and __tlbi_user_level() macros are not used during this window, and only need to invalidate additional entries once userspace translation tables have been active on a CPU. Thus these can safely use alternative_has_cap_*(). * The xen_kernel_unmapped_at_usr() function is not used during this window as it is only used in a late_initcall. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The arm64_get_meltdown_state() function is not used during this window. It only used by arm64_get_meltdown_state() and KVM code, both of which are only used after cpucaps have been finalized. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The tls_thread_switch() uses arm64_kernel_unmapped_at_el0() as an optimization to avoid zeroing tpidrro_el0 when KPTI is enabled and this will be trampled by the KPTI trampoline. It doesn't matter if this continues to zero the register during the window between detecting the cpucap and patching alternatives, so this can safely use alternative_has_cap_*(). * The sdei_arch_get_entry_point() and do_sdei_event() functions aren't reachable at this time as the SDEI driver is registered later by acpi_init() -> acpi_ghes_init() -> sdei_init(), where acpi_init is a subsys_initcall. Thus these can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The uses under drivers/ aren't reachable at this time as the drivers are registered later: - TRBE is registered via module_init() - SMMUv3 is registred via module_driver() - SPE is registred via module_init() * The arm64_get_bp_hardening_vector() and this_cpu_set_vectors() functions need to run on boot CPUs prior to patching alternatives. As these are only called during the onlining of a CPU, it's fine to perform a system_cpucaps bitmap test using cpus_have_cap(). This patch modifies this_cpu_set_vectors() to use cpus_have_cap(), and replaced all other use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}Mark Rutland2-6/+6
In system_supports_{sve,sme,sme2,fa64}() we use cpus_have_const_cap() to check for the relevant cpucaps, but this is only necessary so that sve_setup() and sme_setup() can run prior to alternatives being patched, and otherwise alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. All of system_supports_{sve,sme,sme2,fa64}() will return false prior to system cpucaps being detected. In the window between system cpucaps being detected and patching alternatives, we need system_supports_sve() and system_supports_sme() to run to initialize SVE and SME properties, but all other users of system_supports_{sve,sme,sme2,fa64}() don't depend on the relevant cpucap becoming true until alternatives are patched: * No KVM code runs until after alternatives are patched, and so this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The cpuid_cpu_online() callback in arch/arm64/kernel/cpuinfo.c is registered later from cpuinfo_regs_init() as a device_initcall, and so this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The entry, signal, and ptrace code isn't reachable until userspace has run, and so this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * Currently perf_reg_validate() will un-reserve the PERF_REG_ARM64_VG pseudo-register before alternatives are patched, and before sve_setup() has run. If a sampling event is created early enough, this would allow perf_ext_reg_value() to sample (the as-yet uninitialized) thread_struct::vl[] prior to alternatives being patched. It would be preferable to defer this until alternatives are patched, and this can safely use alternative_has_cap_*(). * The context-switch code will run during this window as part of stop_machine() used during alternatives_patch_all(), and potentially for other work if other kernel threads are created early. No threads require the use of SVE/SME/SME2/FA64 prior to alternatives being patched, and it would be preferable for the related context-switch logic to take effect after alternatives are patched so that ths is guaranteed to see a consistent system-wide state (e.g. anything initialized by sve_setup() and sme_setup(). This can safely ues alternative_has_cap_*(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The sve_setup() and sme_setup() functions are modified to use cpus_have_cap() directly so that they can observe the cpucaps being set prior to alternatives being patched. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2Mark Rutland1-1/+1
In arm64_apply_bp_hardening() we use cpus_have_const_cap() to check for ARM64_SPECTRE_V2 , but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpus_have_const_cap() check in arm64_apply_bp_hardening() is intended to avoid the overhead of looking up and invoking a per-cpu function pointer when no branch predictor hardening is required. The arm64_apply_bp_hardening() function itself is called in two distinct flows: 1) When handling certain exceptions taken from EL0, where the PC could be a TTBR1 address and hence might have trained a branch predictor. As cpucaps are detected and alternatives are patched long before it is possible to execute userspace, it is not necessary to use cpus_have_const_cap() for these cases, and cpus_have_final_cap() or alternative_has_cap() would be preferable. 2) When switching between tasks in check_and_switch_context(). This can be called before cpucaps are detected and alternatives are patched, but this is long before the kernel mounts filesystems or accepts any input. At this stage the kernel hasn't loaded any secrets and there is no potential for hostile branch predictor training. Once cpucaps have been finalized and alternatives have been patched, switching tasks will invalidate any prior predictions. Hence it is not necessary to use cpus_have_const_cap() for this case. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_SSBSMark Rutland1-1/+1
In ssbs_thread_switch() we use cpus_have_const_cap() to check for ARM64_SSBS, but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpus_have_const_cap() check in ssbs_thread_switch() is an optimization to avoid the overhead of spectre_v4_enable_task_mitigation() where all CPUs implement SSBS and naturally preserve the SSBS bit in SPSR_ELx. In the window between detecting the ARM64_SSBS system-wide and patching alternative branches it is benign to continue to call spectre_v4_enable_task_mitigation(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_MTEMark Rutland1-1/+1
In system_supports_mte() we use cpus_have_const_cap() to check for ARM64_MTE, but this is not necessary and cpus_have_final_boot_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_MTE cpucap is a boot cpu feature which is detected and patched early on the boot CPU under smp_prepare_boot_cpu(). In the window between detecting the ARM64_MTE cpucap and patching alternatives, nothing depends on the ARM64_MTE cpucap: * The kasan_hw_tags_enabled() helper depends upon the kasan_flag_enabled static key, which is initialized later in kasan_init_hw_tags() after alternatives have been applied. * No KVM code is called during this window, and KVM is not initialized until after system cpucaps have been detected and patched. KVM code can safely use cpus_have_final_cap() or alternative_has_cap_*(). * We don't context-switch prior to patching boot alternatives, and thus mte_thread_switch() is not reachable during this window. Thus, we can safely use cpus_have_final_boot_cap() or alternative_has_cap_*() in the context-switch code. * IRQ and FIQ are masked during this window, and we can only take SError and Debug exceptions. SError exceptions are fatal at this point in time, and we do not expect to take Debug exceptions, thus: - It's fine to lave TCO set for exceptions taken during this window, and mte_disable_tco_entry() doesn't need to do anything. - We don't need to detect and report asynchronous tag cehck faults during this window, and neither mte_check_tfsr_entry() nor mte_check_tfsr_exit() need to do anything. Since we want to report any SErrors taken during thiw window, these cannot safely use cpus_have_final_boot_cap() or cpus_have_final_cap(), but these can safely use alternative_has_cap_*(). * The __set_pte_at() function is not used during this window. It is possible for this to be used on kernel mappings prior to boot cpucaps being finalized, so this cannot safely use cpus_have_final_boot_cap() or cpus_have_final_cap(), but this can safely use alternative_has_cap_*(). * No userspace translation tables have been created yet, and swap has not been initialized yet. Thus swapping is not possible and none of the following are called: - arch_thp_swp_supported() - arch_prepare_to_swap() - arch_swap_invalidate_page() - arch_swap_invalidate_area() - arch_swap_restore() These can safely use system_has_final_cap() or alternative_has_cap_*(). * The elfcore functions are only reachable after userspace is brought up, which happens after system cpucaps have been detected and patched. Thus the elfcore code can safely use cpus_have_final_cap() or alternative_has_cap_*(). * Hibernation is only possible after userspace is brought up, which happens after system cpucaps have been detected and patched. Thus the hibernate code can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The set_tagged_addr_ctrl() function is only reachable after userspace is brought up, which happens after system cpucaps have been detected and patched. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The copy_user_highpage() and copy_highpage() functions are not used during this window, and can safely use alternative_has_cap_*(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGEMark Rutland1-1/+1
We use cpus_have_const_cap() to check for ARM64_HAS_TLB_RANGE, but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. In the window between detecting the ARM64_HAS_TLB_RANGE cpucap and patching alternative branches, we do not perform any TLB invalidation, and even if we were to perform TLB invalidation here it would not be functionally necessary to optimize this by using range invalidation. Hence there's no need to use cpus_have_const_cap(), and alternative_has_cap_unlikely() is sufficient. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXTMark Rutland1-1/+1
In __delay() we use cpus_have_const_cap() to check for ARM64_HAS_WFXT, but this is not necessary and alternative_has_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpus_have_const_cap() check in __delay() is an optimization to use WFIT and WFET in preference to busy-polling the counter and/or using regular WFE and relying upon the architected timer event stream. It is not necessary to apply this optimization in the window between detecting the ARM64_HAS_WFXT cpucap and patching alternatives. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNGMark Rutland1-1/+1
In __cpu_has_rng() we use cpus_have_const_cap() to check for ARM64_HAS_RNG, but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. In the window between detecting the ARM64_HAS_RNG cpucap and patching alternative branches, nothing which calls __cpu_has_rng() can run, and hence it's not necessary to use cpus_have_const_cap(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPANMark Rutland3-2/+4
We use cpus_have_const_cap() to check for ARM64_HAS_EPAN but this is not necessary and alternative_has_cap() or cpus_have_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_HAS_EPAN cpucap is used to affect two things: 1) The permision bits used for userspace executable mappings, which are chosen by adjust_protection_map(), which is an arch_initcall. This is called after the ARM64_HAS_EPAN cpucap has been detected and alternatives have been patched, and before any userspace translation tables exist. 2) The handling of faults taken from (user or kernel) accesses to userspace executable mappings in do_page_fault(). Userspace translation tables are created after adjust_protection_map() is called, and hence after the ARM64_HAS_EPAN cpucap has been detected and alternatives have been patched. Neither of these run until after ARM64_HAS_EPAN cpucap has been detected and alternatives have been patched, and hence there's no need to use cpus_have_const_cap(). Since adjust_protection_map() is only executed once at boot time it would be best for it to use cpus_have_cap(), and since do_page_fault() is executed frequently it would be best for it to use alternatives_have_cap_unlikely(). This patch replaces the uses of cpus_have_const_cap() with cpus_have_cap() and alternative_has_cap_unlikely(), which will avoid generating redundant code, and should be better for all subsequent calls at runtime. The ARM64_HAS_EPAN cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PANMark Rutland2-2/+3
In system_uses_hw_pan() we use cpus_have_const_cap() to check for ARM64_HAS_PAN, but this is only necessary so that the system_uses_ttbr0_pan() check in setup_cpu_features() can run prior to alternatives being patched, and otherwise this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_HAS_PAN cpucap is used by system_uses_hw_pan() and system_uses_ttbr0_pan() depending on whether CONFIG_ARM64_SW_TTBR0_PAN is selected, and: * We only use system_uses_hw_pan() directly in __sdei_handler(), which isn't reachable until after alternatives have been patched, and for this it is safe to use alternative_has_cap_*(). * We use system_uses_ttbr0_pan() in a few places: - In check_and_switch_context() and cpu_uninstall_idmap(), which will defer installing a translation table into TTBR0 when the ARM64_HAS_PAN cpucap is not detected. Prior to patching alternatives, all CPUs will be using init_mm with the reserved ttbr0 translation tables install in TTBR0, so these can safely use alternative_has_cap_*(). - In update_saved_ttbr0(), which will only save the active TTBR0 into a per-thread variable when the ARM64_HAS_PAN cpucap is not detected. Prior to patching alternatives, all CPUs will be using init_mm with the reserved ttbr0 translation tables install in TTBR0, so these can safely use alternative_has_cap_*(). - In efi_set_pgd(), which will handle check_and_switch_context() deferring the installation of TTBR0 when TTBR0 PAN is detected. The EFI runtime services are not initialized until after alternatives have been patched, and so this can safely use alternative_has_cap_*() or cpus_have_final_cap(). - In uaccess_ttbr0_disable() and uaccess_ttbr0_enable(), where we'll avoid installing/uninstalling a translation table in TTBR0 when ARM64_HAS_PAN is detected. Prior to patching alternatives we will not perform any uaccess and will not call uaccess_ttbr0_disable() or uaccess_ttbr0_enable(), and so these can safely use alternative_has_cap_*() or cpus_have_final_cap(). - In is_el1_permission_fault() where we will consider a translation fault on a TTBR0 address to be a permission fault when ARM64_HAS_PAN is not detected *and* we have set the PAN bit in the SPSR (which tells us that in the interrupted context, TTBR0 pointed at the reserved zero ttbr). In the window between detecting system cpucaps and patching alternatives we should not perform any accesses to TTBR0 addresses, and no userspace translation tables exist until after patching alternatives. Thus it is safe for this to use alternative_has_cap*(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. So that the check for TTBR0 PAN in setup_cpu_features() can run prior to alternatives being patched, the call to system_uses_ttbr0_pan() is replaced with an explicit check of the ARM64_HAS_PAN bit in the system_cpucaps bitmap. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKINGMark Rutland2-14/+8
In system_uses_irq_prio_masking() we use cpus_have_const_cap() to check for ARM64_HAS_GIC_PRIO_MASKING, but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. When CONFIG_ARM64_PSEUDO_NMI=y the ARM64_HAS_GIC_PRIO_MASKING cpucap is a strict boot cpu feature which is detected and patched early on the boot cpu, which both happen in smp_prepare_boot_cpu(). In the window between the ARM64_HAS_GIC_PRIO_MASKING cpucap is detected and alternatives are patched we don't run any code that depends upon the ARM64_HAS_GIC_PRIO_MASKING cpucap: * We leave DAIF.IF set until after boot alternatives are patched, and interrupts are unmasked later in init_IRQ(), so we cannot reach IRQ/FIQ entry code and will not use irqs_priority_unmasked(). * We don't call any code which uses arm_cpuidle_save_irq_context() and arm_cpuidle_restore_irq_context() during this window. * We don't call start_thread_common() during this window. * The local_irq_*() code in <asm/irqflags.h> depends solely on an alternative branch since commit: a5f61cc636f48bdf ("arm64: irqflags: use alternative branches for pseudo-NMI logic") ... and hence will use the default (DAIF-only) masking behaviour until alternatives are patched. * Secondary CPUs are brought up later after alternatives are patched, and alternatives are patched on the boot CPU immediately prior to calling init_gic_priority_masking(), so we'll correctly initialize interrupt masking regardless. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. As this makes system_uses_irq_prio_masking() equivalent to __irqflags_uses_pmr(), the latter is removed and replaced with the former for consistency. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DITMark Rutland1-1/+10
In __cpu_suspend_exit() we use cpus_have_const_cap() to check for ARM64_HAS_DIT but this is not necessary and cpus_have_final_cap() of alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_HAS_DIT cpucap is detected and patched (along with all other cpucaps) before __cpu_suspend_exit() can run. We'll only use __cpu_suspend_exit() as part of PSCI cpuidle or hibernation, and both of these are intialized after system cpucaps are detected and patched: the PSCI cpuidle driver is registered with a device_initcall, hibernation restoration occurs in a late_initcall, and hibarnation saving is driven by usrspace. Therefore it is not necessary to use cpus_have_const_cap(), and using alternative_has_cap_*() or cpus_have_final_cap() is sufficient. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. To clearly document the ordering relationship between suspend/resume and alternatives patching, an explicit check for system_capabilities_finalized() is added to cpu_suspend() along with a comment block, which will make it easier to spot issues if code is changed in future to allow these functions to be reached earlier. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_CNPMark Rutland4-14/+20
In system_supports_cnp() we use cpus_have_const_cap() to check for ARM64_HAS_CNP, but this is only necessary so that the cpu_enable_cnp() callback can run prior to alternatives being patched, and otherwise this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpu_enable_cnp() callback is run immediately after the ARM64_HAS_CNP cpucap is detected system-wide under setup_system_capabilities(), prior to alternatives being patched. During this window cpu_enable_cnp() uses cpu_replace_ttbr1() to set the CNP bit for the swapper_pg_dir in TTBR1. No other users of the ARM64_HAS_CNP cpucap need the up-to-date value during this window: * As KVM isn't initialized yet, kvm_get_vttbr() isn't reachable. * As cpuidle isn't initialized yet, __cpu_suspend_exit() isn't reachable. * At this point all CPUs are using the swapper_pg_dir with a reserved ASID in TTBR1, and the idmap_pg_dir in TTBR0, so neither check_and_switch_context() nor cpu_do_switch_mm() need to do anything special. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. To allow cpu_enable_cnp() to function prior to alternatives being patched, cpu_replace_ttbr1() is split into cpu_replace_ttbr1() and cpu_enable_swapper_cnp(), with the former only used for early TTBR1 replacement, and the latter used by both cpu_enable_cnp() and __cpu_suspend_exit(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_CACHE_DICMark Rutland1-1/+1
In icache_inval_all_pou() we use cpus_have_const_cap() to check for ARM64_HAS_CACHE_DIC, but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpus_have_const_cap() check in icache_inval_all_pou() is an optimization to skip a redundant (but benign) IC IALLUIS + DSB ISH sequence when all CPUs in the system have DIC. In the window between detecting the ARM64_HAS_CACHE_DIC cpucap and patching alternative branches there is only a single potential call to icache_inval_all_pou() (in the alternatives patching itself), which there's no need to optimize for at the expense of other callers. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. This also aligns better with the way we patch the assembly cache maintenance sequences in arch/arm64/mm/cache.S. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_BTIMark Rutland5-10/+11
In system_supports_bti() we use cpus_have_const_cap() to check for ARM64_HAS_BTI, but this is not necessary and alternative_has_cap_*() or cpus_have_final_*cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. When CONFIG_ARM64_BTI_KERNEL=y, the ARM64_HAS_BTI cpucap is a strict boot cpu feature which is detected and patched early on the boot cpu. All uses guarded by CONFIG_ARM64_BTI_KERNEL happen after the boot CPU has detected ARM64_HAS_BTI and patched boot alternatives, and hence can safely use alternative_has_cap_*() or cpus_have_final_boot_cap(). Regardless of CONFIG_ARM64_BTI_KERNEL, all other uses of ARM64_HAS_BTI happen after system capabilities have been finalized and alternatives have been patched. Hence these can safely use alternative_has_cap_*) or cpus_have_final_cap(). This patch splits system_supports_bti() into system_supports_bti() and system_supports_bti_kernel(), with the former handling where the cpucap affects userspace functionality, and ther latter handling where the cpucap affects kernel functionality. The use of cpus_have_const_cap() is replaced by cpus_have_final_cap() in cpus_have_const_cap, and cpus_have_final_boot_cap() in system_supports_bti_kernel(). This will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The use of cpus_have_final_cap() and cpus_have_final_boot_cap() will make it easier to spot if code is chaanged such that these run before the ARM64_HAS_BTI cpucap is guaranteed to have been finalized. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_ARMv8_4_TTLMark Rutland1-1/+1
In __tlbi_level() we use cpus_have_const_cap() to check for ARM64_HAS_ARMv8_4_TTL, but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. In the window between detecting the ARM64_HAS_ARMv8_4_TTL cpucap and patching alternative branches, we do not perform any TLB invalidation, and even if we were to perform TLB invalidation here it would not be functionally necessary to optimize this by using the TTL hint. Hence there's no need to use cpus_have_const_cap(), and alternative_has_cap_unlikely() is sufficient. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_{ADDRESS,GENERIC}_AUTHMark Rutland1-2/+2
In system_supports_address_auth() and system_supports_generic_auth() we use cpus_have_const_cap to check for ARM64_HAS_ADDRESS_AUTH and ARM64_HAS_GENERIC_AUTH respectively, but this is not necessary and alternative_has_cap_*() would bre preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_HAS_ADDRESS_AUTH cpucap is a boot cpu feature which is detected and patched early on the boot CPU before any pointer authentication keys are enabled via their respective SCTLR_ELx.EN* bits. Nothing which uses system_supports_address_auth() is called before the boot alternatives are patched. Thus it is safe for system_supports_address_auth() to use cpus_have_final_boot_cap() to check for ARM64_HAS_ADDRESS_AUTH. The ARM64_HAS_GENERIC_AUTH cpucap is a system feature which is detected on all CPUs, then finalized and patched under setup_system_capabilities(). We use system_supports_generic_auth() in a few places: * The pac_generic_keys_get() and pac_generic_keys_set() functions are only reachable from system calls once userspace is up and running. As cpucaps are finalzied long before userspace runs, these can safely use alternative_has_cap_*() or cpus_have_final_cap(). * The ptrauth_prctl_reset_keys() function is only reachable from system calls once userspace is up and running. As cpucaps are finalized long before userspace runs, this can safely use alternative_has_cap_*() or cpus_have_final_cap(). * The ptrauth_keys_install_user() function is used during context-switch. This is called prior to alternatives being applied, and so cannot use cpus_have_final_cap(), but as this only needs to switch the APGA key for userspace tasks, it's safe to use alternative_has_cap_*(). * The ptrauth_keys_init_user() function is used to initialize userspace keys, and is only reachable after system cpucaps have been finalized and patched. Thus this can safely use alternative_has_cap_*() or cpus_have_final_cap(). * The system_has_full_ptr_auth() helper function is only used by KVM code, which is only reachable after system cpucaps have been finalized and patched. Thus this can safely use alternative_has_cap_*() or cpus_have_final_cap(). This patch modifies system_supports_address_auth() to use cpus_have_final_boot_cap() to check ARM64_HAS_ADDRESS_AUTH, and modifies system_supports_generic_auth() to use alternative_has_cap_unlikely() to check ARM64_HAS_GENERIC_AUTH. In either case this will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The use of cpus_have_final_boot_cap() will make it easier to spot if code is chaanged such that these run before the relevant cpucap is guaranteed to have been finalized. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Use a positive cpucap for FP/SIMDMark Rutland6-26/+44
Currently we have a negative cpucap which describes the *absence* of FP/SIMD rather than *presence* of FP/SIMD. This largely works, but is somewhat awkward relative to other cpucaps that describe the presence of a feature, and it would be nicer to have a cpucap which describes the presence of FP/SIMD: * This will allow the cpucap to be treated as a standard ARM64_CPUCAP_SYSTEM_FEATURE, which can be detected with the standard has_cpuid_feature() function and ARM64_CPUID_FIELDS() description. * This ensures that the cpucap will only transition from not-present to present, reducing the risk of unintentional and/or unsafe usage of FP/SIMD before cpucaps are finalized. * This will allow using arm64_cpu_capabilities::cpu_enable() to enable the use of FP/SIMD later, with FP/SIMD being disabled at boot time otherwise. This will ensure that any unintentional and/or unsafe usage of FP/SIMD prior to this is trapped, and will ensure that FP/SIMD is never unintentionally enabled for userspace in mismatched big.LITTLE systems. This patch replaces the negative ARM64_HAS_NO_FPSIMD cpucap with a positive ARM64_HAS_FPSIMD cpucap, making changes as described above. Note that as FP/SIMD will now be trapped when not supported system-wide, do_fpsimd_acc() must handle these traps in the same way as for SVE and SME. The commentary in fpsimd_restore_current_state() is updated to describe the new scheme. No users of system_supports_fpsimd() need to know that FP/SIMD is available prior to alternatives being patched, so this is updated to use alternative_has_cap_likely() to check for the ARM64_HAS_FPSIMD cpucap, without generating code to test the system_cpucaps bitmap. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Rename SVE/SME cpu_enable functionsMark Rutland3-16/+12
The arm64_cpu_capabilities::cpu_enable() callbacks for SVE, SME, SME2, and FA64 are named with an unusual "${feature}_kernel_enable" pattern rather than the much more common "cpu_enable_${feature}". Now that we only use these as cpu_enable() callbacks, it would be nice to have them match the usual scheme. This patch renames the cpu_enable() callbacks to match this scheme. At the same time, the comment above cpu_enable_sve() is removed for consistency with the other cpu_enable() callbacks. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Use build-time assertions for cpucap orderingMark Rutland1-8/+6
Both sme2_kernel_enable() and fa64_kernel_enable() need to run after sme_kernel_enable(). This happens to be true today as ARM64_SME has a lower index than either ARM64_SME2 or ARM64_SME_FA64, and both functions have a comment to this effect. It would be nicer to have a build-time assertion like we for for can_use_gic_priorities() and has_gic_prio_relaxed_sync(), as that way it will be harder to miss any potential breakage. This patch replaces the comments with build-time assertions. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Explicitly save/restore CPACR when probing SVE and SMEMark Rutland3-7/+50
When a CPUs onlined we first probe for supported features and propetites, and then we subsequently enable features that have been detected. This is a little problematic for SVE and SME, as some properties (e.g. vector lengths) cannot be probed while they are disabled. Due to this, the code probing for SVE properties has to enable SVE for EL1 prior to proving, and the code probing for SME properties has to enable SME for EL1 prior to probing. We never disable SVE or SME for EL1 after probing. It would be a little nicer to transiently enable SVE and SME during probing, leaving them both disabled unless explicitly enabled, as this would make it much easier to catch unintentional usage (e.g. when they are not present system-wide). This patch reworks the SVE and SME feature probing code to only transiently enable support at EL1, disabling after probing is complete. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: kvm: Use cpus_have_final_cap() explicitlyMark Rutland9-15/+15
Much of the arm64 KVM code uses cpus_have_const_cap() to check for cpucaps, but this is unnecessary and it would be preferable to use cpus_have_final_cap(). For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. KVM is initialized after cpucaps have been finalized and alternatives have been patched. Since commit: d86de40decaa14e6 ("arm64: cpufeature: upgrade hyp caps to final") ... use of cpus_have_const_cap() in hyp code is automatically converted to use cpus_have_final_cap(): | static __always_inline bool cpus_have_const_cap(int num) | { | if (is_hyp_code()) | return cpus_have_final_cap(num); | else if (system_capabilities_finalized()) | return __cpus_have_const_cap(num); | else | return cpus_have_cap(num); | } Thus, converting hyp code to use cpus_have_final_cap() directly will not result in any functional change. Non-hyp KVM code is also not executed until cpucaps have been finalized, and it would be preferable to extent the same treatment to this code and use cpus_have_final_cap() directly. This patch converts instances of cpus_have_const_cap() in KVM-only code over to cpus_have_final_cap(). As all of this code runs after cpucaps have been finalized, there should be no functional change as a result of this patch, but the redundant instructions generated by cpus_have_const_cap() will be removed from the non-hyp KVM code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Split kpti_install_ng_mappings()Mark Rutland1-21/+33
The arm64_cpu_capabilities::cpu_enable callbacks are intended for cpu-local feature enablement (e.g. poking system registers). These get called for each online CPU when boot/system cpucaps get finalized and enabled, and get called whenever a CPU is subsequently onlined. For KPTI with the ARM64_UNMAP_KERNEL_AT_EL0 cpucap, we use the kpti_install_ng_mappings() function as the cpu_enable callback. This does a mixture of cpu-local configuration (setting VBAR_EL1 to the appropriate trampoline vectors) and some global configuration (rewriting the swapper page tables to sue non-glboal mappings) that must happen at most once. This patch splits kpti_install_ng_mappings() into a cpu-local cpu_enable_kpti() initialization function and a system-wide kpti_install_ng_mappings() function. The cpu_enable_kpti() function is responsible for selecting the necessary cpu-local vectors each time a CPU is onlined, and the kpti_install_ng_mappings() function performs the one-time rewrite of the translation tables too use non-global mappings. Splitting the two makes the code a bit easier to follow and also allows the page table rewriting code to be marked as __init such that it can be freed after use. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Fixup user features at boot timeMark Rutland3-17/+15
For ARM64_WORKAROUND_2658417, we use a cpu_enable() callback to hide the ID_AA64ISAR1_EL1.BF16 ID register field. This is a little awkward as CPUs may attempt to apply the workaround concurrently, requiring that we protect the bulk of the callback with a raw_spinlock, and requiring some pointless work every time a CPU is subsequently hotplugged in. This patch makes this a little simpler by handling the masking once at boot time. A new user_feature_fixup() function is called at the start of setup_user_features() to mask the feature, matching the style of elf_hwcap_fixup(). The ARM64_WORKAROUND_2658417 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Note that the ARM64_WORKAROUND_2658417 capability is matched with ERRATA_MIDR_RANGE(), which implicitly gives the capability a ARM64_CPUCAP_LOCAL_CPU_ERRATUM type, which forbids the late onlining of a CPU with the erratum if the erratum was not present at boot time. Therefore this patch doesn't change the behaviour for late onlining. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>