path: root/arch/arm64/include
Age | Commit message (Collapse) | Author | Files | Lines
2022-05-16 | arm64: mm: Make arch_faults_on_old_pte() check for migratability | Valentin Schneider | 1 | -1/+2
arch_faults_on_old_pte() relies on the calling context being non-preemptible. CONFIG_PREEMPT_RT turns the PTE lock into a sleepable spinlock, which doesn't disable preemption once acquired, triggering the warning in arch_faults_on_old_pte(). It does however disable migration, ensuring the task remains on the same CPU during the entirety of the critical section, making the read of cpu_has_hw_af() safe and stable. Make arch_faults_on_old_pte() check cant_migrate() instead of preemptible(). Cc: Valentin Schneider <vschneid@redhat.com> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lore.kernel.org/r/20220127192437.1192957-1-valentin.schneider@arm.com Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20220505163207.85751-4-bigeasy@linutronix.de Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
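A sketch of the resulting check (comment wording mine; the exact diff may differ slightly):

    static inline bool arch_faults_on_old_pte(void)
    {
            /* The CPU must stay stable for cpu_has_hw_af(); under PREEMPT_RT it is
             * migration, not preemption, that has to be excluded here. */
            cant_migrate();

            return !cpu_has_hw_af();
    }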
2022-05-16 | Merge branch kvm-arm64/misc-5.19 into kvmarm-master/next | Marc Zyngier | 2 | -1/+9
* kvm-arm64/misc-5.19: : . : Misc fixes and general improvements for KVM/arm64: : : - Better handle out of sequence sysregs in the global tables : : - Remove a couple of unnecessary loads from constant pool : : - Drop unnecessary pKVM checks : : - Add all known M1 implementations to the SEIS workaround : : - Cleanup kerneldoc warnings : . KVM: arm64: vgic-v3: List M1 Pro/Max as requiring the SEIS workaround KVM: arm64: pkvm: Don't mask already zeroed FEAT_SVE KVM: arm64: pkvm: Drop unnecessary FP/SIMD trap handler KVM: arm64: nvhe: Eliminate kernel-doc warnings KVM: arm64: Avoid unnecessary absolute addressing via literals KVM: arm64: Print emulated register table name when it is unsorted KVM: arm64: Don't BUG_ON() if emulated register table is unsorted Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 | Merge branch kvm-arm64/per-vcpu-host-pmu-data into kvmarm-master/next | Marc Zyngier | 1 | -11/+0
* kvm-arm64/per-vcpu-host-pmu-data: : . : Pass the host PMU state in the vcpu to avoid the use of additional : shared memory between EL1 and EL2 (this obviously only applies : to nVHE and Protected setups). : : Patches courtesy of Fuad Tabba. : . KVM: arm64: pmu: Restore compilation when HW_PERF_EVENTS isn't selected KVM: arm64: Reenable pmu in Protected Mode KVM: arm64: Pass pmu events to hyp via vcpu KVM: arm64: Repack struct kvm_pmu to reduce size KVM: arm64: Wrapper for getting pmu_events Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 | Merge branch kvm-arm64/psci-suspend into kvmarm-master/next | Marc Zyngier | 1 | -2/+8
* kvm-arm64/psci-suspend: : . : Add support for PSCI SYSTEM_SUSPEND and allow userspace to : filter the wake-up events. : : Patches courtesy of Oliver. : . Documentation: KVM: Fix title level for PSCI_SUSPEND selftests: KVM: Test SYSTEM_SUSPEND PSCI call selftests: KVM: Refactor psci_test to make it amenable to new tests selftests: KVM: Use KVM_SET_MP_STATE to power off vCPU in psci_test selftests: KVM: Create helper for making SMCCC calls selftests: KVM: Rename psci_cpu_on_test to psci_test KVM: arm64: Implement PSCI SYSTEM_SUSPEND KVM: arm64: Add support for userspace to suspend a vCPU KVM: arm64: Return a value from check_vcpu_requests() KVM: arm64: Rename the KVM_REQ_SLEEP handler KVM: arm64: Track vCPU power state using MP state values KVM: arm64: Dedupe vCPU power off helpers KVM: arm64: Don't depend on fallthrough to hide SYSTEM_RESET2 Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 | Merge branch kvm-arm64/hcall-selection into kvmarm-master/next | Marc Zyngier | 2 | -0/+50
* kvm-arm64/hcall-selection: : . : Introduce a new set of virtual sysregs for userspace to : select the hypercalls it wants to see exposed to the guest. : : Patches courtesy of Raghavendra and Oliver. : . KVM: arm64: Fix hypercall bitmap writeback when vcpus have already run KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace Documentation: Fix index.rst after psci.rst renaming selftests: KVM: aarch64: Add the bitmap firmware registers to get-reg-list selftests: KVM: aarch64: Introduce hypercall ABI test selftests: KVM: Create helper for making SMCCC calls selftests: KVM: Rename psci_cpu_on_test to psci_test tools: Import ARM SMCCC definitions Docs: KVM: Add doc for the bitmap firmware registers Docs: KVM: Rename psci.rst to hypercalls.rst KVM: arm64: Add vendor hypervisor firmware register KVM: arm64: Add standard hypervisor firmware register KVM: arm64: Setup a framework for hypercall bitmap firmware registers KVM: arm64: Factor out firmware register handling from psci.c Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-16 | KVM: arm64: pmu: Restore compilation when HW_PERF_EVENTS isn't selected | Marc Zyngier | 1 | -6/+0
Moving kvm_pmu_events into the vcpu (and referring to it) broke the somewhat unusual case where the kernel has no support for a PMU at all. In order to solve this, move things around a bit so that we can easily avoid referring to the pmu structure outside of PMU-aware code. As a bonus, pmu.c isn't compiled in when HW_PERF_EVENTS isn't selected. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/202205161814.KQHpOzsJ-lkp@intel.com
2022-05-15 | irqchip/gic-v3: Refactor ISB + EOIR at ack time | Mark Rutland | 1 | -6/+0
There are cases where a context synchronization event is necessary between an IRQ being raised and being handled, and there are races such that we cannot rely upon the exception entry being subsequent to the interrupt being raised. To fix this, we place an ISB between a read of IAR and the subsequent invocation of an IRQ handler. When EOI mode 1 is in use, we need to EOI an interrupt prior to invoking its handler, and we have a write to EOIR for this. As this write to EOIR requires an ISB, and this is provided by the gic_write_eoir() helper, we omit the usual ISB in this case, with the logic being: | if (static_branch_likely(&supports_deactivate_key)) | gic_write_eoir(irqnr); | else | isb(); This is somewhat opaque, and it would be a little clearer if there were an unconditional ISB, with only the write to EOIR being conditional, e.g. | if (static_branch_likely(&supports_deactivate_key)) | write_gicreg(irqnr, ICC_EOIR1_EL1); | | isb(); This patch rewrites the code that way, with this logic factored into a new helper function with comments explaining what the ISB is for, as were originally laid out in commit: 39a06b67c2c1256b ("irqchip/gic: Ensure we have an ISB between ack and ->handle_irq") Note that since then, we removed the IAR polling in commit: 342677d70ab92142 ("irqchip/gic-v3: Remove acknowledge loop") ... which removed one of the two race conditions. For consistency, other portions of the driver are made to manipulate EOIR using write_gicreg() and explicit ISBs, and the gic_write_eoir() helper function is removed. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220513133038.226182-3-mark.rutland@arm.com
2022-05-15 | KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace | Marc Zyngier | 1 | -0/+6
These constants will change over time, and userspace has no business knowing about them. Hide them behind __KERNEL__. Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-15 | KVM: arm64: Pass pmu events to hyp via vcpu | Fuad Tabba | 1 | -6/+1
Instead of the host accessing hyp data directly, pass the pmu events of the current cpu to hyp via the vcpu. This adds 64 bits (in two fields) to the vcpu that need to be synced before every vcpu run in nVHE and protected modes. However, it isolates the hypervisor from the host, which allows us to use the PMU in protected mode in a subsequent patch. No visible change in behavior is intended. Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220510095710.148178-4-tabba@google.com
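For orientation, a hedged sketch of the per-vCPU event state being synced (field names assumed from this series; two 32-bit fields account for the 64 bits mentioned above):

    struct kvm_pmu_events {
            u32 events_host;        /* events counted only while running the host */
            u32 events_guest;       /* events counted only while running the guest */
    };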
2022-05-15 | KVM: arm64: vgic-v3: List M1 Pro/Max as requiring the SEIS workaround | Marc Zyngier | 1 | -0/+8
Unsurprisingly, Apple M1 Pro/Max have the exact same defect as the original M1 and generate random SErrors in the host when a guest tickles the GICv3 CPU interface the wrong way. Add the part numbers for both the CPU types found in these two new implementations, and add them to the hall of shame. This also applies to the Ultra version, as it is composed of 2 Max SoCs. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20220514102524.3188730-1-maz@kernel.org
2022-05-14 | mm: change huge_ptep_clear_flush() to return the original pte | Baolin Wang | 1 | -2/+2
Patch series "Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating", v4. presently, migrating a hugetlb page or unmapping a poisoned hugetlb page, we'll use ptep_clear_flush() and set_pte_at() to nuke the page table entry and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb page, which will cause potential data consistent issue. This patch set will change to use hugetlb related APIs to fix this issue. Note: Mike pointed out the huge_ptep_get() will only return the one specific value, and it would not take into account the dirty or young bits of CONT-PTE/PMDs like the huge_ptep_get_and_clear() [1]. This inconsistent issue is not introduced by this patch set, and this issue will be addressed in another thread [2]. Meanwhile the uffd for hugetlb case [3] pointed out by Gerald also needs another patch to address. [1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/ [2] https://lore.kernel.org/all/cover.1651998586.git.baolin.wang@linux.alibaba.com/ [3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/ This patch (of 3): It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table when unmapping or migrating a hugetlb page, and will change to use huge_ptep_clear_flush() instead in the following patches. So this is a preparation patch, which changes the huge_ptep_clear_flush() to return the original pte to help to nuke a hugetlb page table. [baolin.wang@linux.alibaba.com: fix build in several more architectures] Link: https://lkml.kernel.org/r/0009a4cd-2826-e8be-e671-f050d4f18d5d@linux.alibaba.com [sfr@canb.auug.org.au: fixup] Link: https://lkml.kernel.org/r/20220511181531.7f27a5c1@canb.auug.org.au Link: https://lkml.kernel.org/r/cover.1652270205.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/20f77ddab90baa249bd24504c413189b82acde69.1652270205.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/cover.1652147571.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/dcf065868cce35bceaf138613ad27f17bb7c0c19.1652147571.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.osdn.me> Cc: Rich Felker <dalias@libc.org> Cc: David S. Miller <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | Merge tag 'mm-hotfixes-stable-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds | 1 | -0/+4
Pull misc fixes from Andrew Morton: "Seven MM fixes, three of which address issues added in the most recent merge window, four of which are cc:stable. Three non-MM fixes, none very serious" [ And yes, that's a real pull request from Andrew, not me creating a branch from emailed patches. Woo-hoo! ] * tag 'mm-hotfixes-stable-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: add a mailing list for DAMON development selftests: vm: Makefile: rename TARGETS to VMTARGETS mm/kfence: reset PG_slab and memcg_data before freeing __kfence_pool mailmap: add entry for martyna.szapar-mudlaw@intel.com arm[64]/memremap: don't abuse pfn_valid() to ensure presence of linear map procfs: prevent unprivileged processes accessing fdinfo dir mm: mremap: fix sign for EFAULT error return value mm/hwpoison: use pr_err() instead of dump_page() in get_any_page() mm/huge_memory: do not overkill when splitting huge_zero_page Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()"
2022-05-13 | arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK | Kefeng Wang | 1 | -6/+55
As with commit d283d422c6c4 ("x86: mm: add x86_64 support for page table check"), enable ARCH_SUPPORTS_PAGE_TABLE_CHECK on arm64. Add additional page table check stubs for page table helpers; these stubs can be used to check the existing page table entries. Link: https://lkml.kernel.org/r/20220507110114.4128854-6-tongtiangen@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm/shmem: convert shmem_swapin_page() to shmem_swapin_folio() | Matthew Wilcox (Oracle) | 1 | -3/+3
shmem_swapin_page() only brings in order-0 pages, which are folios by definition. Link: https://lkml.kernel.org/r/20220504182857.4013401-24-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 | mm: make minimum slab alignment a runtime property | Peter Collingbourne | 1 | -5/+12
When CONFIG_KASAN_HW_TAGS is enabled we currently increase the minimum slab alignment to 16. This happens even if MTE is not supported in hardware or disabled via kasan=off, which creates an unnecessary memory overhead in those cases. Eliminate this overhead by making the minimum slab alignment a runtime property and only aligning to 16 if KASAN is enabled at runtime. On a DragonBoard 845c (non-MTE hardware) with a kernel built with CONFIG_KASAN_HW_TAGS, waiting for quiescence after a full Android boot I see the following Slab measurements in /proc/meminfo (median of 3 reboots): Before: 169020 kB After: 167304 kB [akpm@linux-foundation.org: make slab alignment type `unsigned int' to avoid casting] Link: https://linux-review.googlesource.com/id/I752e725179b43b144153f4b6f584ceb646473ead Link: https://lkml.kernel.org/r/20220427195820.1716975-2-pcc@google.com Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
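A minimal sketch of the runtime hook on arm64, assuming the helper name and placement described above:

    /* asm/cache.h: only require 16-byte (MTE granule) alignment when
     * hardware tag-based KASAN is actually enabled at runtime. */
    #define ARCH_SLAB_MINALIGN      arch_slab_minalign()

    static inline unsigned int arch_slab_minalign(void)
    {
            return kasan_hw_tags_enabled() ? MTE_GRANULE_SIZE :
                                             __alignof__(unsigned long long);
    }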
2022-05-13 | printk: stop including cache.h from printk.h | Peter Collingbourne | 2 | -0/+2
An inclusion of cache.h in printk.h was added in 2014 in commit c28aa1f0a847 ("printk/cache: mark printk_once test variable __read_mostly") in order to bring in the definition of __read_mostly. The usage of __read_mostly was later removed in commit 3ec25826ae33 ("printk: Tie printk_once / printk_deferred_once into .data.once for reset") which made the inclusion of cache.h unnecessary, so remove it. We have a small amount of code that depended on the inclusion of cache.h from printk.h; fix that code to include the appropriate header. This fixes a circular inclusion on arm64 (linux/printk.h -> linux/cache.h -> asm/cache.h -> linux/kasan-enabled.h -> linux/static_key.h -> linux/jump_label.h -> linux/bug.h -> asm/bug.h -> linux/printk.h) that would otherwise be introduced by the next patch. Build tested using {allyesconfig,defconfig} x {arm64,x86_64}. Link: https://linux-review.googlesource.com/id/I8fd51f72c9ef1f2d6afd3b2cbc875aa4792c1fba Link: https://lkml.kernel.org/r/20220427195820.1716975-1-pcc@google.com Signed-off-by: Peter Collingbourne <pcc@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-11 | swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm | Christoph Hellwig | 1 | -2/+0
swiotlb-xen uses very different ways to allocate coherent memory on x86 vs arm. On the former it allocates memory from the page allocator, while on the latter it reuses the dma-direct allocator that handles the complexities of non-coherent DMA on arm platforms. Unfortunately the complexities of trying to deal with the two cases in the swiotlb-xen.c code led to a bug in the handling of DMA_ATTR_NO_KERNEL_MAPPING on arm. With the DMA_ATTR_NO_KERNEL_MAPPING flag the coherent memory allocator does not actually allocate coherent memory, but just a DMA handle for some memory that is DMA addressable by the device, but which does not have to have a kernel mapping. Thus dereferencing the return value will lead to kernel crashes and memory corruption. Fix this by using the dma-direct allocator directly for arm, which works perfectly fine because on arm swiotlb-xen is only used when the domain is 1:1 mapped, and then simplifying the remaining code to only cater for the x86 case with a DMA-coherent device. Reported-by: Rahul Singh <Rahul.Singh@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rahul Singh <rahul.singh@arm.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Tested-by: Rahul Singh <rahul.singh@arm.com>
2022-05-10 | arm64/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE | David Hildenbrand | 2 | -3/+21
Let's use one of the type bits: core-mm only supports 5, so there is no need to consume 6. Note that we might be able to reuse bit 1, but reusing bit 1 turned out problematic in the past for PROT_NONE handling; so let's play safe and use another bit. Link: https://lkml.kernel.org/r/20220329164329.208407-5-david@redhat.com Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Don Dutile <ddutile@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Liang Zhang <zhangliang5@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Nadav Amit <namit@vmware.com> Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pedro Demarchi Gomes <pedrodemargomes@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
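A sketch of the helpers this adds to arm64's pgtable.h (the chosen swap-type bit sits behind PTE_SWP_EXCLUSIVE; treat this as illustrative rather than the exact diff):

    #define __HAVE_ARCH_PTE_SWP_EXCLUSIVE

    static inline pte_t pte_swp_mkexclusive(pte_t pte)
    {
            return set_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE));
    }

    static inline int pte_swp_exclusive(pte_t pte)
    {
            return pte_val(pte) & PTE_SWP_EXCLUSIVE;
    }

    static inline pte_t pte_swp_clear_exclusive(pte_t pte)
    {
            return clear_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE));
    }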
2022-05-10 | arm[64]/memremap: don't abuse pfn_valid() to ensure presence of linear map | Mike Rapoport | 1 | -0/+4
The semantics of pfn_valid() is to check presence of the memory map for a PFN and not whether a PFN is covered by the linear map. The memory map may be present for NOMAP memory regions, but they won't be mapped in the linear mapping. Accessing such regions via __va() when they are memremap()'ed will cause a crash. On v5.4.y the crash happens on qemu-arm with UEFI [1]: <1>[ 0.084476] 8<--- cut here --- <1>[ 0.084595] Unable to handle kernel paging request at virtual address dfb76000 <1>[ 0.084938] pgd = (ptrval) <1>[ 0.085038] [dfb76000] *pgd=5f7fe801, *pte=00000000, *ppte=00000000 ... <4>[ 0.093923] [<c0ed6ce8>] (memcpy) from [<c16a06f8>] (dmi_setup+0x60/0x418) <4>[ 0.094204] [<c16a06f8>] (dmi_setup) from [<c16a38d4>] (arm_dmi_init+0x8/0x10) <4>[ 0.094408] [<c16a38d4>] (arm_dmi_init) from [<c0302e9c>] (do_one_initcall+0x50/0x228) <4>[ 0.094619] [<c0302e9c>] (do_one_initcall) from [<c16011e4>] (kernel_init_freeable+0x15c/0x1f8) <4>[ 0.094841] [<c16011e4>] (kernel_init_freeable) from [<c0f028cc>] (kernel_init+0x8/0x10c) <4>[ 0.095057] [<c0f028cc>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c) On kernels v5.10.y and newer the same crash won't reproduce on ARM because commit b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()") changed the way memory regions are registered in the resource tree, but that merely covers up the problem. On ARM64 memory resources registered in yet another way and there the issue of wrong usage of pfn_valid() to ensure availability of the linear map is also covered. Implement arch_memremap_can_ram_remap() on ARM and ARM64 to prevent access to NOMAP regions via the linear mapping in memremap(). Link: https://lore.kernel.org/all/Yl65zxGgFzF1Okac@sirena.org.uk Link: https://lkml.kernel.org/r/20220426060107.7618-1-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reported-by: "kernelci.org bot" <bot@kernelci.org> Tested-by: Mark Brown <broonie@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> [5.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
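On arm64 the new hook amounts to checking that the PFN is really in the linear map rather than merely having a memmap entry; a sketch under that assumption:

    bool arch_memremap_can_ram_remap(resource_size_t offset, size_t size,
                                     unsigned long flags)
    {
            unsigned long pfn = PHYS_PFN(offset);

            /* NOMAP regions have a memmap but no linear mapping */
            return pfn_is_map_memory(pfn);
    }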
2022-05-08 | arm64: stackleak: fix current_top_of_stack() | Mark Rutland | 1 | -6/+4
Due to some historical confusion, arm64's current_top_of_stack() isn't what the stackleak code expects. This could in theory result in a number of problems, and practically results in an unnecessary performance hit. We can avoid this by aligning the arm64 implementation with the x86 implementation. The arm64 implementation of current_top_of_stack() was added specifically for stackleak in commit: 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin") This was intended to be equivalent to the x86 implementation, but the implementation, semantics, and performance characteristics differ wildly: * On x86, current_top_of_stack() returns the top of the current task's task stack, regardless of which stack is in active use. The implementation accesses a percpu variable which the x86 entry code maintains, and returns the location immediately above the pt_regs on the task stack (above which x86 has some padding). * On arm64 current_top_of_stack() returns the top of the stack in active use (i.e. the one which is currently being used). The implementation checks the SP against a number of potentially-accessible stacks, and will BUG() if no stack is found. The core stackleak_erase() code determines the upper bound of stack to erase with: | if (on_thread_stack()) | boundary = current_stack_pointer; | else | boundary = current_top_of_stack(); On arm64 stackleak_erase() is always called on a task stack, and on_thread_stack() should always be true. On x86, stackleak_erase() is mostly called on a trampoline stack, and is sometimes called on a task stack. Currently, this results in a lot of unnecessary code being generated for arm64 for the impossible !on_thread_stack() case. Some of this is inlined, bloating stackleak_erase(), while portions of this are left out-of-line and permitted to be instrumented (which would be a functional problem if that code were reachable). As a first step towards improving this, this patch aligns arm64's implementation of current_top_of_stack() with x86's, always returning the top of the current task's stack. With GCC 11.1.0 this results in the bulk of the unnecessary code being removed, including all of the out-of-line instrumentable code. While I don't believe there's a functional problem in practice I've marked this as a fix since the semantic was clearly wrong, the fix itself is simple, and other code might rely upon this in future. Fixes: 0b3e336601b82c6a ("arm64: Add support for STACKLEAK gcc plugin") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Deacon <will@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220427173128.2603085-2-mark.rutland@arm.com
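A sketch of the aligned implementation, returning the top of the current task's task stack regardless of which stack is currently active:

    static __always_inline unsigned long current_top_of_stack(void)
    {
            return (unsigned long)current->stack + THREAD_SIZE;
    }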
2022-05-06 | arm64/sme: More sensibly define the size for the ZA register set | Mark Brown | 1 | -0/+12
Since the vector length configuration mechanism is identical between SVE and SME we share large elements of the code including the definition for the maximum vector length. Unfortunately when we were defining the ABI for SVE we included not only the actual maximum vector length of 2048 bits but also the value possible if all the bits reserved in the architecture for expansion of the LEN field were used, 16384 bits. This starts creating problems if we try to allocate anything for the ZA matrix based on the maximum possible vector length, as we do for the regset used with ptrace during the process of generating a core dump. While the maximum potential size for ZA with the current architecture is a reasonably manageable 64K, with the higher reserved limit ZA would be 64M, which leads to entirely reasonable complaints from the memory management code when we try to allocate a buffer of that size. Avoid these issues by defining the actual maximum vector length for the architecture and using it for the SME regsets. Also use the full ZA_PT_SIZE() with the header rather than just the actual register payload when specifying the size, fixing support for the largest vector lengths now that we have this new, lower define. With the SVE maximum this did not cause problems due to the extra headroom we had. While we're at it add a comment clarifying why even though ZA is a single register we tell the regset code that it is a multi-register regset. Reported-by: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Mark Brown <broonie@kernel.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://lore.kernel.org/r/20220505221517.1642014-1-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-05 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 1 | -0/+1
tools/testing/selftests/net/forwarding/Makefile f62c5acc800e ("selftests/net/forwarding: add missing tests to Makefile") 50fe062c806e ("selftests: forwarding: new test, verify host mdb entries") https://lore.kernel.org/all/20220502111539.0b7e4621@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05 | mm: Add len and flags parameters to arch_get_mmap_end() | Christophe Leroy | 1 | -2/+2
Powerpc needs flags and len to make decision on arch_get_mmap_end(). So add them as parameters to arch_get_mmap_end(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b556daabe7d2bdb2361c4d6130280da7c1ba2c14.1649523076.git.christophe.leroy@csgroup.eu
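A sketch of the shape of the change for arm64 (whose definition ignores the new parameters; other architectures may use them):

    /* Before */
    #define arch_get_mmap_end(addr) \
            ((addr) > DEFAULT_MAP_WINDOW ? TASK_SIZE : DEFAULT_MAP_WINDOW)

    /* After: len and flags are available to architectures that need them */
    #define arch_get_mmap_end(addr, len, flags) \
            ((addr) > DEFAULT_MAP_WINDOW ? TASK_SIZE : DEFAULT_MAP_WINDOW)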
2022-05-04 | arm64/sysreg: Generate definitions for SCTLR_EL1 | Mark Brown | 1 | -38/+0
Automatically generate register definitions for SCTLR_EL1. No functional change. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220503170233.507788-13-broonie@kernel.org [catalin.marinas@arm.com: fix the SCTLR_EL1 encoding] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 | KVM: arm64: Don't BUG_ON() if emulated register table is unsorted | Alexandru Elisei | 1 | -1/+1
To emulate a register access, KVM uses a table of registers sorted by register encoding to speed up queries using binary search. When Linux boots, KVM checks that the table is sorted and uses a BUG_ON() statement to let the user know if it's not. The unfortunate side effect is that an unsorted sysreg table brings down the whole kernel, not just KVM, even though the rest of the kernel can function just fine without KVM. To make matters worse, on machines which lack a serial console, the user is left pondering why the machine is taking so long to boot. Improve this situation by returning an error from kvm_arch_init() if the sysreg tables are not in the correct order. The machine is still very much usable for the user, with the exception of virtualization, who can now easily determine what went wrong. A minor typo has also been corrected in the check_sysreg_table() function. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220428103405.70884-2-alexandru.elisei@arm.com
2022-05-04 | arm64/sysreg: Generate definitions for TTBRn_EL1 | Mark Brown | 1 | -2/+0
Automatically generate definitions for accessing the TTBRn_EL1 registers, no functional change. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220503170233.507788-12-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 | arm64/sysreg: Generate definitions for ID_AA64ISAR0_EL1 | Mark Brown | 1 | -20/+0
Remove the manual definitions for ID_AA64ISAR0_EL1 in favour of automatic generation. There should be no functional change. The only notable change is that 27:24 TME is defined rather than RES0 reflecting DDI0487H.a. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220503170233.507788-11-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 | arm64/sysreg: Enable automatic generation of system register definitions | Mark Brown | 2 | -0/+9
Now that we have a script for generating system registers hook it up to the build system similarly to cpucaps. Since we don't currently have any actual register information in the input file this should produce no change in the built kernel. For ease of review the register information will be converted in separate patches. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220503170233.507788-10-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 | arm64/sysreg: Standardise ID_AA64ISAR0_EL1 macro names | Mark Brown | 2 | -18/+18
The macros for accessing fields in ID_AA64ISAR0_EL1 omit the _EL1 from the name of the register. In preparation for converting this register to be automatically generated update the names to include an _EL1, there should be no functional change. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220503170233.507788-8-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 | arm64: Update name of ID_AA64ISAR0_EL1_ATOMIC to reflect ARM | Mark Brown | 1 | -1/+1
The architecture reference manual refers to the field in bits 23:20 of ID_AA64ISAR0_EL1 with the name "atomic" but the kernel defines for this bitfield use the name "atomics". Bring the two into sync to make it easier to cross reference with the specification. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220503170233.507788-7-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 | arm64/sysreg: Define bits for previously RES1 fields in SCTLR_EL1 | Mark Brown | 1 | -21/+32
In older revisions of the architecture SCTLR_EL1 contained several RES1 fields but in DDI0487H.a these now all have assigned functions. In preparation for automatically generating sysreg.h provide explicit definitions for all these bits and use them in the INIT_SCTLR_EL1_ macros where _RES1 was previously used. There should be no functional change. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220503170233.507788-6-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 | arm64/sysreg: Rename SCTLR_EL1_NTWE/TWI to SCTLR_EL1_nTWE/TWI | Mark Brown | 1 | -3/+3
We already use lower case in some defines in sysreg.h, for consistency with the architecture definition do so for SCTLR_EL1.nTWE and SCTLR_EL1.nTWI. No functional change. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220503170233.507788-5-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 | arm64/mte: Make TCF field values and naming more standard | Mark Brown | 1 | -7/+7
In preparation for automatic generation of the defines for system registers make the values used for the enumeration in SCTLR_ELx.TCF suitable for use with the newly defined SYS_FIELD_PREP_ENUM helper, removing the shift from the define and using the helper to generate it on use instead. Since we only ever interact with this field in EL1 and in preparation for generation of the defines also rename from SCTLR_ELx to SCTLR_EL1. SCTLR_EL2 is not quite the same as SCTLR_EL1 so the conversion does not share the field definitions. There should be no functional change from this patch. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220503170233.507788-4-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 | arm64/mte: Make TCF0 naming and field values more standard | Mark Brown | 1 | -4/+4
In preparation for automatic generation of SCTLR_EL1 register definitions, make the macros used to define SCTLR_EL1.TCF0 and its enumeration values more standard so they can be used with FIELD_PREP() via the newly defined SYS_FIELD_PREP_ helpers. Since the field also exists in SCTLR_EL2 with the same values, also rename the macros to SCTLR_ELx rather than SCTLR_EL1. There should be no functional change as a result of this patch. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220503170233.507788-3-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-04 | arm64/sysreg: Introduce helpers for access to sysreg fields | Mark Brown | 1 | -0/+6
The macros we define for the bitfields within sysregs have very regular names, especially once we switch to automatic generation of those macros. Take advantage of this to define wrappers around FIELD_PREP() allowing us to simplify setting values in fields either numerically SYS_FIELD_PREP(SCTLR_EL1, TCF0, 0x0) or using the values of enumerations within the fields SYS_FIELD_PREP_ENUM(SCTLR_EL1, TCF0, ASYMM) Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220503170233.507788-2-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
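A minimal sketch of what such wrappers look like on top of FIELD_PREP(), relying on the regular <REG>_<FIELD>_MASK naming convention:

    #define SYS_FIELD_PREP(reg, field, val) \
            FIELD_PREP(reg##_##field##_MASK, val)

    #define SYS_FIELD_PREP_ENUM(reg, field, val) \
            FIELD_PREP(reg##_##field##_MASK, reg##_##field##_##val)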
2022-05-04 | arm64: cputype: Avoid overflow using MIDR_IMPLEMENTOR_MASK | Michal Orzel | 1 | -1/+1
The value of the MIDR_IMPLEMENTOR_MASK macro exceeds the range of a signed integer and can lead to overflow. Currently there is no issue, as it is used in expressions implicitly casting it to u32. To avoid possible problems, fix the macro. Signed-off-by: Michal Orzel <michal.orzel@arm.com> Link: https://lore.kernel.org/r/20220426070603.56031-1-michal.orzel@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
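The fix amounts to making the literal unsigned so that shifting the implementer byte into bits 31:24 cannot overflow a signed int; a sketch:

    /* Before: 0xff << 24 overflows a signed int */
    #define MIDR_IMPLEMENTOR_MASK   (0xff << MIDR_IMPLEMENTOR_SHIFT)

    /* After */
    #define MIDR_IMPLEMENTOR_MASK   (0xffU << MIDR_IMPLEMENTOR_SHIFT)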
2022-05-04 | Merge branch kvm-arm64/aarch32-idreg-trap into kvmarm-master/next | Marc Zyngier | 3 | -8/+3
* kvm-arm64/aarch32-idreg-trap: : . : Add trapping/sanitising infrastructure for AArch32 system registers, : allowing more control over what we actually expose (such as the PMU). : : Patches courtesy of Oliver and Alexandru. : . KVM: arm64: Fix new instances of 32bit ESRs KVM: arm64: Hide AArch32 PMU registers when not available KVM: arm64: Start trapping ID registers for 32 bit guests KVM: arm64: Plumb cp10 ID traps through the AArch64 sysreg handler KVM: arm64: Wire up CP15 feature registers to their AArch64 equivalents KVM: arm64: Don't write to Rt unless sys_reg emulation succeeds KVM: arm64: Return a bool from emulate_cp() Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-04 | Merge branch kvm-arm64/hyp-stack-guard into kvmarm-master/next | Marc Zyngier | 2 | -0/+4
* kvm-arm64/hyp-stack-guard: : . : Harden the EL2 stack by providing stack guards, courtesy of : Kalesh Singh. : . KVM: arm64: Symbolize the nVHE HYP addresses KVM: arm64: Detect and handle hypervisor stack overflows KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack KVM: arm64: Add guard pages for KVM nVHE hypervisor stack KVM: arm64: Introduce pkvm_alloc_private_va_range() KVM: arm64: Introduce hyp_alloc_private_va_range() Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-04 | Merge branch kvm-arm64/wfxt into kvmarm-master/next | Marc Zyngier | 5 | -2/+13
* kvm-arm64/wfxt: : . : Add support for the WFET/WFIT instructions that provide the same : service as WFE/WFI, only with a timeout. : . KVM: arm64: Expose the WFXT feature to guests KVM: arm64: Offer early resume for non-blocking WFxT instructions KVM: arm64: Handle blocking WFIT instruction KVM: arm64: Introduce kvm_counter_compute_delta() helper KVM: arm64: Simplify kvm_cpu_has_pending_timer() arm64: Use WFxT for __delay() when possible arm64: Add wfet()/wfit() helpers arm64: Add HWCAP advertising FEAT_WFXT arm64: Add RV and RN fields for ESR_ELx_WFx_ISS arm64: Expand ESR_ELx_WFx_ISS_TI to match its ARMv8.7 definition Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-04 | Merge remote-tracking branch 'arm64/for-next/sme' into kvmarm-master/next | Marc Zyngier | 16 | -15/+541
Merge arm64's SME branch to resolve conflicts with the WFxT branch. Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-05-04 | KVM: arm64: Implement PSCI SYSTEM_SUSPEND | Oliver Upton | 1 | -0/+2
ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows software to request that a system be placed in the deepest possible low-power state. Effectively, software can use this to suspend itself to RAM. Unfortunately, there really is no good way to implement a system-wide PSCI call in KVM. Any precondition checks done in the kernel will need to be repeated by userspace since there is no good way to protect a critical section that spans an exit to userspace. SYSTEM_RESET and SYSTEM_OFF are equally plagued by this issue, although no users have seemingly cared for the relatively long time these calls have been supported. The solution is to just make the whole implementation userspace's problem. Introduce a new system event, KVM_SYSTEM_EVENT_SUSPEND, that indicates to userspace a calling vCPU has invoked PSCI SYSTEM_SUSPEND. Additionally, add a CAP to get buy-in from userspace for this new exit type. Only advertise the SYSTEM_SUSPEND PSCI call if userspace has opted in. If a vCPU calls SYSTEM_SUSPEND, punt straight to userspace. Provide explicit documentation of userspace's responsibilities for the exit and point to the PSCI specification to describe the actual PSCI call. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-8-oupton@google.com
2022-05-04 | KVM: arm64: Add support for userspace to suspend a vCPU | Oliver Upton | 1 | -0/+1
Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU is in a suspended state. In the suspended state the vCPU will block until a wakeup event (pending interrupt) is recognized. Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to userspace that KVM has recognized one such wakeup event. It is the responsibility of userspace to then make the vCPU runnable, or leave it suspended until the next wakeup event. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-7-oupton@google.com
2022-05-04 | KVM: arm64: Track vCPU power state using MP state values | Oliver Upton | 1 | -2/+3
A subsequent change to KVM will add support for additional power states. Store the MP state by value rather than keeping track of it as a boolean. No functional change intended. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-4-oupton@google.com
2022-05-04 | KVM: arm64: Dedupe vCPU power off helpers | Oliver Upton | 1 | -0/+2
vcpu_power_off() and kvm_psci_vcpu_off() are equivalent; rename the former and replace all callsites to the latter. No functional change intended. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-3-oupton@google.com
2022-05-03 | KVM: arm64: Add vendor hypervisor firmware register | Raghavendra Rao Ananta | 2 | -0/+10
Introduce the firmware register to hold the vendor specific hypervisor service calls (owner value 6) as a bitmap. The bitmap represents the features that'll be enabled for the guest, as configured by the user-space. Currently, this includes support for KVM-vendor features along with reading the UID, represented by bit-0, and Precision Time Protocol (PTP), represented by bit-1. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> [maz: tidy-up bitmap values] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220502233853.1233742-5-rananta@google.com
2022-05-03 | KVM: arm64: Add standard hypervisor firmware register | Raghavendra Rao Ananta | 2 | -0/+9
Introduce the firmware register to hold the standard hypervisor service calls (owner value 5) as a bitmap. The bitmap represents the features that'll be enabled for the guest, as configured by the user-space. Currently, this includes support only for Paravirtualized time, represented by bit-0. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> [maz: tidy-up bitmap values] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220502233853.1233742-4-rananta@google.com
2022-05-03 | KVM: arm64: Setup a framework for hypercall bitmap firmware registers | Raghavendra Rao Ananta | 2 | -0/+25
KVM regularly introduces new hypercall services to the guests without any consent from userspace. This means the guests can observe hypercall services come and go as they migrate across various host kernel versions. This could be a major problem if the guest discovered a hypercall, started using it, and after getting migrated to an older kernel realizes that it's no longer available. Depending on how the guest handles the change, there's a potential chance that the guest would just panic. As a result, there's a need for the userspace to elect the services that it wishes the guest to discover. It can elect these services based on the kernels spread across its (migration) fleet. To remedy this, extend the existing firmware pseudo-registers, such as KVM_REG_ARM_PSCI_VERSION, by creating a new COPROC register space for all the hypercall services available. These firmware registers are categorized based on the service call owners, but unlike the existing firmware pseudo-registers, they hold the features supported in the form of a bitmap. During the VM initialization, the registers are set to the upper limit of the features supported by the corresponding registers. It's expected that the VMMs discover the features provided by each register via GET_ONE_REG, and write back the desired values using SET_ONE_REG. KVM allows this modification only until the VM has started. Some of the standard features are not mapped to any bits of the registers. But since they can recreate the original problem of making them available without userspace's consent, they need to be explicitly added to the case-list in kvm_hvc_call_default_allowed(). Any function-id that's not enabled via the bitmap, or not listed in kvm_hvc_call_default_allowed, will be returned as SMCCC_RET_NOT_SUPPORTED to the guest. Older userspace code can simply ignore the feature and the hypercall services will be exposed unconditionally to the guests, thus ensuring backward compatibility. In this patch, the framework adds the register only for ARM's standard secure services (owner value 4). Currently, this includes support only for the ARM True Random Number Generator (TRNG) service, with bit-0 of the register representing mandatory features of v1.0. Other services will be added in the upcoming patches. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> [maz: reduced the scope of some helpers, tidy-up bitmap max values, dropped error-only fast path] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220502233853.1233742-3-rananta@google.com
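A hedged sketch of the intended userspace flow for one of these bitmap registers (register and bit names from this series; ioctl plumbing and error handling omitted):

    __u64 fw_feat = 0;
    struct kvm_one_reg reg = {
            .id   = KVM_REG_ARM_STD_BMAP,   /* ARM standard secure services (owner 4) */
            .addr = (__u64)(unsigned long)&fw_feat,
    };

    ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);  /* discover what this kernel offers */
    fw_feat &= (1ULL << KVM_REG_ARM_STD_BIT_TRNG_V1_0);  /* keep only the fleet-wide subset */
    ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);  /* must happen before the VM first runs */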
2022-05-03 | KVM: arm64: Start trapping ID registers for 32 bit guests | Oliver Upton | 2 | -8/+2
To date KVM has not trapped ID register accesses from AArch32, meaning that guests get an unconstrained view of what hardware supports. This can be a serious problem because we try to base the guest's feature registers on values that are safe system-wide. Furthermore, KVM does not implement the latest ISA in the PMU and Debug architecture, so we constrain these fields to supported values. Since KVM now correctly handles CP15 and CP10 register traps, we no longer need to clear HCR_EL2.TID3 for 32 bit guests and will instead emulate reads with their safe values. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220503060205.2823727-6-oupton@google.com
2022-05-03 | KVM: arm64: Plumb cp10 ID traps through the AArch64 sysreg handler | Oliver Upton | 1 | -0/+1
In order to enable HCR_EL2.TID3 for AArch32 guests KVM needs to handle traps where ESR_EL2.EC=0x8, which corresponds to an attempted VMRS access from an ID group register. Specifically, the MVFR{0-2} registers are accessed this way from AArch32. Conveniently, these registers are architecturally mapped to MVFR{0-2}_EL1 in AArch64. Furthermore, KVM already handles reads to these aliases in AArch64. Plumb VMRS read traps through to the general AArch64 system register handler. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220503060205.2823727-5-oupton@google.com
2022-05-02 | KVM: Add max_vcpus field in common 'struct kvm' | Sean Christopherson | 1 | -3/+0
For TDX guests, the maximum number of vcpus needs to be specified when the TDX guest VM is initialized (creating the TDX data corresponding to the TDX guest) before creating vcpus. It needs to record the maximum number of vcpus on VM creation (KVM_CREATE_VM) and return an error if the number of vcpus exceeds it. Because there is already a max_vcpu member in the arm64 struct kvm_arch, move it to the common struct kvm and initialize it to KVM_MAX_VCPUS before kvm_arch_init_vm() instead of adding it to the x86 struct kvm_arch. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Message-Id: <e53234cdee6a92357d06c80c03d77c19cdefb804.1646422845.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>