path: root/arch
Age | Commit message | Author | Files | Lines
2021-08-20 | KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC | Maxim Levitsky | 1 | -0/+5

Currently it is possible to have the following scenario:

1. AVIC is disabled by svm_refresh_apicv_exec_ctrl.
2. svm_vcpu_blocking calls avic_vcpu_put, which does nothing.
3. svm_vcpu_unblocking enables AVIC (due to KVM_REQ_APICV_UPDATE) and then calls avic_vcpu_load.
4. A warning is triggered in avic_vcpu_load, since AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK was never cleared.

While it is possible to just remove the warning, it is more robust to fully disable/enable AVIC in svm_refresh_apicv_exec_ctrl by calling avic_vcpu_load/avic_vcpu_put.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-16-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 | KVM: SVM: move check for kvm_vcpu_apicv_active outside of avic_vcpu_{put|load} | Maxim Levitsky | 2 | -8/+9

No functional change intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-15-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 | KVM: SVM: avoid refreshing avic if its state didn't change | Maxim Levitsky | 1 | -1/+8

Since AVIC can be inhibited and uninhibited rapidly, it is possible that we have nothing to do by the time svm_refresh_apicv_exec_ctrl is called. Detect and avoid this, which will be useful once we start calling avic_vcpu_load/avic_vcpu_put when the AVIC inhibition state changes.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-14-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
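[Editor's note] A minimal sketch of the short-circuit described above; kvm_vcpu_apicv_active() and to_svm() are real KVM helpers, but the cached field name avic_active and the surrounding structure are illustrative, not the actual struct members:

    void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
    {
            struct vcpu_svm *svm = to_svm(vcpu);
            bool activated = kvm_vcpu_apicv_active(vcpu);

            /* Inhibition toggled back before we got here: nothing to do. */
            if (activated == svm->avic_active)
                    return;
            svm->avic_active = activated;
            /* ... reprogram int_ctl and (un)load the physical ID entry ... */
    }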
2021-08-20 | KVM: SVM: remove svm_toggle_avic_for_irq_window | Maxim Levitsky | 3 | -14/+2

Now that kvm_request_apicv_update doesn't need to drop the kvm->srcu lock, we can call kvm_request_apicv_update directly.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210810205251.424103-13-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 | KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use | Vitaly Kuznetsov | 2 | -6/+32

APICV_INHIBIT_REASON_HYPERV is currently unconditionally forced upon SynIC activation, as SynIC's AutoEOI is incompatible with APICv/AVIC. It is, however, possible to track whether the feature was actually used by the guest and only inhibit APICv/AVIC when needed.

TLFS suggests a dedicated 'HV_DEPRECATING_AEOI_RECOMMENDED' flag to let Windows know that the AutoEOI feature should be avoided. While it's up to KVM userspace to set the flag, KVM can help a bit by exposing global APICv/AVIC enablement.

Maxim:
- always set HV_DEPRECATING_AEOI_RECOMMENDED in kvm_get_hv_cpuid, since this feature can be used regardless of AVIC

Paolo:
- use arch.apicv_update_lock to protect the hv->synic_auto_eoi_used instead of atomic ops

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-12-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
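[Editor's note] A hedged sketch of the tracking described above; the counter name hv->synic_auto_eoi_used and the lock come straight from the changelog, while the lock-already-held update helper and its signature are assumptions:

    /* Called when a SINT's AutoEOI bit flips. */
    mutex_lock(&kvm->arch.apicv_update_lock);
    if (auto_eoi_enabled)
            hv->synic_auto_eoi_used++;
    else
            hv->synic_auto_eoi_used--;
    /* Inhibit APICv/AVIC only while at least one AutoEOI SINT is in use. */
    __kvm_request_apicv_update(kvm, !hv->synic_auto_eoi_used,
                               APICV_INHIBIT_REASON_HYPERV);
    mutex_unlock(&kvm->arch.apicv_update_lock);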
2021-08-20 | KVM: SVM: add warning for mismatch between AVIC vcpu state and AVIC inhibition | Maxim Levitsky | 1 | -0/+2

It is never a good idea to enter a guest on a vCPU when the AVIC inhibition state doesn't match the enablement of the AVIC on the vCPU.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-11-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 | KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM | Maxim Levitsky | 2 | -15/+30

Currently on SVM, kvm_request_apicv_update toggles the APICv memslot without doing any synchronization. If there is a mismatch between that memslot state and the AVIC state on one of the vCPUs, an APIC MMIO access can be lost. For example:

VCPU0: enable the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT
VCPU1: access an APIC MMIO register

Since AVIC is still disabled on VCPU1, the access will not be intercepted by it, and neither will it cause an MMIO fault; rather, it will just be read from/written to the dummy page mapped into the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT.

Fix that by adding a lock guarding the AVIC state changes, and carefully ordering the operations of kvm_request_apicv_update to avoid this race:

1. Take the lock
2. Send KVM_REQ_APICV_UPDATE
3. Update the APIC inhibit reason
4. Release the lock

This ensures that at (2) all vCPUs are kicked out of guest mode, but don't yet see the new AVIC state. Then only after (4) can all other vCPUs update their AVIC state and resume.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-10-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
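[Editor's note] In code form, the ordering above amounts to roughly the following sketch; kvm_make_all_cpus_request(), KVM_REQ_APICV_UPDATE and apicv_inhibit_reasons are real identifiers, but treat the exact sequence as an approximation of the patch:

    mutex_lock(&kvm->arch.apicv_update_lock);                 /* (1) */
    kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);     /* (2) */
    if (activate)                                             /* (3) */
            clear_bit(bit, &kvm->arch.apicv_inhibit_reasons);
    else
            set_bit(bit, &kvm->arch.apicv_inhibit_reasons);
    mutex_unlock(&kvm->arch.apicv_update_lock);               /* (4) */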
2021-08-20 | KVM: x86: don't disable APICv memslot when inhibited | Maxim Levitsky | 6 | -32/+14

Thanks to the previous patches, it is now possible to keep the APICv memslot always enabled; it will simply be invisible to the guest when APICv is inhibited.

This code is based on a suggestion from Sean Christopherson:
https://lkml.org/lkml/2021/7/19/2970

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-9-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 | KVM: x86/mmu: allow APICv memslot to be enabled but invisible | Maxim Levitsky | 1 | -5/+18

On AMD, APIC virtualization needs to dynamically inhibit the AVIC in response to some events, and doing this by enabling/disabling the memslot that covers the APIC's MMIO range is problematic and inefficient. Plus, due to SRCU locking, it makes requesting AVIC inhibition more complex.

Instead, the APIC memslot will always be enabled, but be invisible to the guest: while it is inhibited, the MMU code will not install an SPTE for it and will instead jump straight to emulating the access. When inhibiting the AVIC, any such SPTE will be zapped.

This code is based on a suggestion from Sean Christopherson:
https://lkml.org/lkml/2021/7/19/2970

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-8-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
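[Editor's note] A sketch of the fault-path behaviour this enables; kvm_apicv_activated(), APIC_ACCESS_PAGE_PRIVATE_MEMSLOT and RET_PF_EMULATE are the real identifiers, while the exact placement and condition are assumptions:

    /* In the page fault path, before installing an SPTE: */
    if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
        !kvm_apicv_activated(vcpu->kvm)) {
            /* Memslot exists but must stay invisible: emulate instead. */
            *r = RET_PF_EMULATE;
            return true;
    }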
2021-08-20 | KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code | Maxim Levitsky | 2 | -9/+12

This will allow it to return RET_PF_EMULATE for APIC MMIO emulation.

This code is based on a patch from Sean Christopherson:
https://lkml.org/lkml/2021/7/19/2970

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210810205251.424103-7-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 | KVM: x86/mmu: rename try_async_pf to kvm_faultin_pfn | Maxim Levitsky | 2 | -3/+3

try_async_pf is the wrong name for this function, since this code is also used when asynchronous page faults are not enabled.

This code is based on a patch from Sean Christopherson:
https://lkml.org/lkml/2021/7/19/2970

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210810205251.424103-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 | KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range | Maxim Levitsky | 1 | -0/+4

This, together with the previous patch, ensures that kvm_zap_gfn_range doesn't race with a page fault running on another vCPU, and will make that page fault code retry instead.

This is based on a patch suggested by Sean Christopherson:
https://lkml.org/lkml/2021/7/22/1025

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
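[Editor's note] The shape of the change, as a hedged sketch; the mmu_notifier_count bookkeeping fields exist in struct kvm of that era, but the exact sequence here is illustrative:

    write_lock(&kvm->mmu_lock);
    kvm->mmu_notifier_count++;   /* concurrent page faults will now retry */
    kvm->mmu_notifier_range_start = gfn_start;
    kvm->mmu_notifier_range_end = gfn_end;

    /* ... zap rmaps and TDP MMU mappings in [gfn_start, gfn_end) ... */

    kvm->mmu_notifier_count--;
    write_unlock(&kvm->mmu_lock);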
2021-08-20 | KVM: x86/mmu: add comment explaining arguments to kvm_zap_gfn_range | Maxim Levitsky | 1 | -0/+4

This comment makes it clear that the range of GFNs this function receives is not inclusive of the end, i.e. it operates on [gfn_start, gfn_end).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 | KVM: x86/mmu: fix parameters to kvm_flush_remote_tlbs_with_address | Maxim Levitsky | 1 | -1/+5

kvm_flush_remote_tlbs_with_address expects (start gfn, number of pages), not (start gfn, end gfn).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
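[Editor's note] The fix is a one-line conversion from an end GFN to a page count, along these lines:

    -    kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
    +    kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
    +                                       gfn_end - gfn_start);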
2021-08-20Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"Sean Christopherson3-29/+16
This together with the next patch will fix a future race between kvm_zap_gfn_range and the page fault handler, which will happen when AVIC memslot is going to be only partially disabled. The performance impact is minimal since kvm_zap_gfn_range is only called by users, update_mtrr() and kvm_post_set_cr0(). Both only use it if the guest has non-coherent DMA, in order to honor the guest's UC memtype. MTRR and CD setup only happens at boot, and generally in an area where the page tables should be small (for CD) or should not include the affected GFNs at all (for MTRRs). This is based on a patch suggested by Sean Christopherson: https://lkml.org/lkml/2021/7/22/1025 Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210810205251.424103-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 | KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file | Peter Xu | 3 | -0/+132

Use this file to dump rmap statistic information. The statistic is done by calculating the rmap count and the result is log-2-based.

An example output looks like this (idle 6GB guest, right after Linux boot):

Rmap_Count:  0        1      2-3    4-7   8-15  16-31  32-63  64-127  128-255  256-511  512-1023
Level=4K:    3086676  53045  12330  1272  502   121    76     2       0        0        0
Level=2M:    5947     231    0      0     0     0      0      0       0        0        0
Level=1G:    32       0      0      0     0     0      0      0       0        0        0

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210730220455.26054-5-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
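[Editor's note] The log-2 bucketing behind those columns can be sketched as follows; ilog2() is the real kernel helper, while the function name and the RMAP_BUCKETS constant are illustrative:

    #include <linux/log2.h>

    /* Map an rmap length to its histogram column: 0, 1, 2-3, 4-7, ... */
    static int rmap_bucket(unsigned int count)
    {
            if (count == 0)
                    return 0;
            /* count==1 -> bucket 1, 2-3 -> bucket 2, 4-7 -> bucket 3, ... */
            return min(ilog2(count) + 1, RMAP_BUCKETS - 1);
    }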
2021-08-20 | KVM: X86: Introduce kvm_mmu_slot_lpages() helpers | Peter Xu | 4 | -11/+24

Introduce kvm_mmu_slot_lpages() to calculate the lpage_info and rmap array size. The other helper, __kvm_mmu_slot_lpages(), can take an extra npages parameter rather than fetching it from the memslot pointer. Start to use the latter in kvm_alloc_memslot_metadata().

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210730220455.26054-4-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 | Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux | Linus Torvalds | 2 | -2/+3

Pull arm64 fixes from Will Deacon:

- Fix cleaning of vDSO directories
- Ensure CNTHCTL_EL2 is fully initialised when booting at EL2

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: initialize all of CNTHCTL_EL2
  arm64: clean vdso & vdso32 files
2021-08-20 | arm64: replace in_irq() with in_hardirq() | Changbin Du | 1 | -1/+1

Replace the obsolete and ambiguous macro in_irq() with the new macro in_hardirq().

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Link: https://lore.kernel.org/r/20210814005405.2658-1-changbin.du@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-20 | riscv: Fix a number of free'd resources in init_resources() | Petr Pavlu | 1 | -2/+2

Function init_resources() allocates a boot memory block to hold an array of resources which it adds to iomem_resource. The array is filled in from its end, and the function then attempts to free any unused memory at the beginning. The problem is that the size of the unused memory is incorrectly calculated, and this can result in releasing memory which is in use by active resources. Their data then gets corrupted later when the memory is reused by a different part of the system.

Fix the size of the released memory to correctly match the number of unused resource entries.

Fixes: ffe0e5261268 ("RISC-V: Improve init_resources()")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Nick Kossifidis <mick@ics.forth.gr>
Tested-by: Sunil V L <sunilvl@ventanamicro.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
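[Editor's note] A hedged sketch of the corrected release, with variable names following arch/riscv/kernel/setup.c of that era: since the array is consumed from its end, the unused entries after filling are those at indices [0, res_idx], so the amount to return is derived from res_idx rather than from the total count:

    /* Return the unused head of the pre-allocated array to memblock. */
    if (res_idx >= 0)
            memblock_free(__pa(mem_res), (res_idx + 1) * sizeof(*mem_res));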
2021-08-20 | arm64: dts: sparx5: Add the Sparx5 switch frame DMA support | Steen Hegelund | 1 | -2/+3

This adds the interrupt for the Sparx5 Frame DMA. If this configuration is present, the Sparx5 SwitchDev driver will use the Frame DMA feature; if not, it will use register-based injection and extraction for sending and receiving frames to the CPU.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20 | powerpc/audit: Simplify syscall_get_arch() | Christophe Leroy | 1 | -10/+5

Make use of is_32bit_task() and CONFIG_CPU_LITTLE_ENDIAN to simplify syscall_get_arch().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4be53b9187a4d8c163968f4d224267e41a7fcc33.1629451479.git.christophe.leroy@csgroup.eu
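[Editor's note] Per the changelog, the simplified helper boils down to something like this sketch (not a verbatim copy of the patch):

    static inline int syscall_get_arch(struct task_struct *task)
    {
            if (is_32bit_task())
                    return AUDIT_ARCH_PPC;
            else if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN))
                    return AUDIT_ARCH_PPC64LE;
            else
                    return AUDIT_ARCH_PPC64;
    }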
2021-08-20 | powerpc/audit: Avoid unnecessary #ifdef in syscall_get_arguments() | Christophe Leroy | 1 | -3/+2

Use is_32bit_task(), which already handles CONFIG_COMPAT.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ba49cdd574558a0363300c3f6b5b062b397cb071.1629451483.git.christophe.leroy@csgroup.eu
2021-08-20 | powerpc/64s: Fix scv implicit soft-mask table for relocated kernels | Nicholas Piggin | 1 | -3/+4

The implicit soft-mask table addresses get relocated if they use a relative symbol like a label. This is right for code that runs relocated, but not for unrelocated code. The scv interrupt vectors run unrelocated, so absolute addresses are required for their soft-mask table entries.

This fixes crashing with relocated kernels, usually an asynchronous interrupt hitting in the scv handler and then hitting the trap that checks whether r1 is in userspace.

Fixes: 325678fd0522 ("powerpc/64s: add a table of implicit soft-masked addresses")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210820103431.1701240-1-npiggin@gmail.com
2021-08-20 | KVM: PPC: Book3S PR: Remove unused variable | Cédric Le Goater | 1 | -2/+1

This fixes a compile error with W=1.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819125656.14498-5-clg@kaod.org
2021-08-20 | KVM: PPC: Book3S PR: Declare kvmppc_handle_exit_pr() | Cédric Le Goater | 1 | -1/+2

This fixes a compile error with W=1.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819125656.14498-4-clg@kaod.org
2021-08-20 | powerpc/pseries/vas: Declare pseries_vas_fault_thread_fn() as static | Cédric Le Goater | 1 | -1/+1

This fixes a compile error with W=1.

Fixes: 6d0aaf5e0de0 ("powerpc/pseries/vas: Setup IRQ and fault handling")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210819125656.14498-3-clg@kaod.org
2021-08-20 | Merge branch kvm-arm64/pkvm-fixed-features-prologue into kvmarm-master/next | Marc Zyngier | 18 | -104/+140

Rework a bunch of common infrastructure as a prologue to Fuad Tabba's protected VM fixed feature series.

* kvm-arm64/pkvm-fixed-features-prologue:
  KVM: arm64: Upgrade trace_kvm_arm_set_dreg32() to 64bit
  KVM: arm64: Add config register bit definitions
  KVM: arm64: Add feature register flag definitions
  KVM: arm64: Track value of cptr_el2 in struct kvm_vcpu_arch
  KVM: arm64: Keep mdcr_el2's value as set by __init_el2_debug
  KVM: arm64: Restore mdcr_el2 from vcpu
  KVM: arm64: Refactor sys_regs.h,c for nVHE reuse
  KVM: arm64: Fix names of config register fields
  KVM: arm64: MDCR_EL2 is a 64-bit register
  KVM: arm64: Remove trailing whitespace in comment
  KVM: arm64: placeholder to check if VM is protected

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 | Merge branch kvm-arm64/mmu/vmid-cleanups into kvmarm-master/next | Marc Zyngier | 9 | -22/+25

Cleanup the stage-2 configuration by providing a single helper, and tidy up some of the ordering requirements for the VMID allocator.

* kvm-arm64/mmu/vmid-cleanups:
  KVM: arm64: Upgrade VMID accesses to {READ,WRITE}_ONCE
  KVM: arm64: Unify stage-2 programming behind __load_stage2()
  KVM: arm64: Move kern_hyp_va() usage in __load_guest_stage2() into the callers

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 | Merge branch kvm-arm64/generic-entry into kvmarm-master/next | Marc Zyngier | 4 | -27/+47

Switch KVM/arm64 to the generic entry code, courtesy of Oliver Upton.

* kvm-arm64/generic-entry:
  KVM: arm64: Use generic KVM xfer to guest work function
  entry: KVM: Allow use of generic KVM entry w/o full generic support
  KVM: arm64: Record number of signal exits as a vCPU stat

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 | Merge branch kvm-arm64/psci/cpu_on into kvmarm-master/next | Marc Zyngier | 3 | -9/+30

PSCI fixes from Oliver Upton:

- Plug race on reset
- Ensure that a pending reset is applied before userspace accesses
- Reject PSCI requests with illegal affinity bits

* kvm-arm64/psci/cpu_on:
  selftests: KVM: Introduce psci_cpu_on_test
  KVM: arm64: Enforce reserved bits for PSCI target affinities
  KVM: arm64: Handle PSCI resets before userspace touches vCPU state
  KVM: arm64: Fix read-side race on updates to vcpu reset state

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 | Merge branch kvm-arm64/mmu/el2-tracking into kvmarm-master/next | Marc Zyngier | 15 | -306/+618

Enable tracking of page sharing between host EL1 and EL2.

* kvm-arm64/mmu/el2-tracking: (25 commits)
  KVM: arm64: Minor optimization of range_is_memory
  KVM: arm64: Make hyp_panic() more robust when protected mode is enabled
  KVM: arm64: Return -EPERM from __pkvm_host_share_hyp()
  KVM: arm64: Make __pkvm_create_mappings static
  KVM: arm64: Restrict EL2 stage-1 changes in protected mode
  KVM: arm64: Refactor protected nVHE stage-1 locking
  KVM: arm64: Remove __pkvm_mark_hyp
  KVM: arm64: Mark host bss and rodata section as shared
  KVM: arm64: Enable retrieving protections attributes of PTEs
  KVM: arm64: Introduce addr_is_memory()
  KVM: arm64: Expose pkvm_hyp_id
  KVM: arm64: Expose host stage-2 manipulation helpers
  KVM: arm64: Add helpers to tag shared pages in SW bits
  KVM: arm64: Allow populating software bits
  KVM: arm64: Enable forcing page-level stage-2 mappings
  KVM: arm64: Tolerate re-creating hyp mappings to set software bits
  KVM: arm64: Don't overwrite software bits with owner id
  KVM: arm64: Rename KVM_PTE_LEAF_ATTR_S2_IGNORED
  KVM: arm64: Optimize host memory aborts
  KVM: arm64: Expose page-table helpers
  ...

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 | Merge branch kvm-arm64/mmu/kmemleak-pkvm into kvmarm-master/next | Marc Zyngier | 2 | -2/+9

Prevent kmemleak from peeking into the HYP data, which is fatal in protected mode.

* kvm-arm64/mmu/kmemleak-pkvm:
  KVM: arm64: Unregister HYP sections from kmemleak in protected mode
  arm64: Move .hyp.rodata outside of the _sdata.._edata range

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 | Merge branch kvm-arm64/misc-5.15 into kvmarm-master/next | Marc Zyngier | 14 | -135/+112

Misc improvements for 5.15:

- Account the number of VMID-wide TLB invalidations as remote TLB flushes
- Fix comments in the VGIC code
- Cleanup the PMU IMPDEF identification
- Streamline the TGRAN2 usage
- Avoid advertising a 52bit IPA range for non-64KB configs
- Avoid spurious signalling when a HW-mapped interrupt is in the A+P state on entry and in the P state on exit, but the physical line is no longer pending
- Bunch of minor cleanups

* kvm-arm64/misc-5.15:
  KVM: arm64: vgic: Resample HW pending state on deactivation
  KVM: arm64: vgic: Drop WARN from vgic_get_irq
  KVM: arm64: Drop unused REQUIRES_VIRT
  KVM: arm64: Drop check_kvm_target_cpu() based percpu probe
  KVM: arm64: Drop init_common_resources()
  KVM: arm64: Use ARM64_MIN_PARANGE_BITS as the minimum supported IPA
  arm64/mm: Add remaining ID_AA64MMFR0_PARANGE_ macros
  KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page size
  arm64/mm: Define ID_AA64MMFR0_TGRAN_2_SHIFT
  KVM: arm64: perf: Replace '0xf' instances with ID_AA64DFR0_PMUVER_IMP_DEF
  KVM: arm64: Fix comments related to GICv2 PMR reporting
  KVM: arm64: Count VMID-wide TLB invalidations
  arm64/kexec: Test page size support with new TGRAN range values

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 | Merge branch kvm-arm64/mmu/mapping-levels into kvmarm-master/next | Marc Zyngier | 3 | -7/+97

Revamp the KVM/arm64 THP code by parsing the userspace page tables instead of relying on an infrastructure that is about to disappear (we are the last user).

* kvm-arm64/mmu/mapping-levels:
  KVM: Get rid of kvm_get_pfn()
  KVM: arm64: Use get_page() instead of kvm_get_pfn()
  KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()
  KVM: arm64: Avoid mapping size adjustment on permission fault
  KVM: arm64: Walk userspace page tables to compute the THP mapping size
  KVM: arm64: Introduce helper to retrieve a PTE and its level

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 | KVM: arm64: Minor optimization of range_is_memory | David Brazdil | 1 | -5/+8

Currently range_is_memory finds the corresponding struct memblock_region for both the lower and upper bounds of the given address range with two rounds of binary search, and then checks that the two memblocks are the same. Simplify this by only doing binary search on the lower bound and then checking that the upper bound is in the same memblock.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210728153232.1018911-3-dbrazdil@google.com
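[Editor's note] A sketch of the simplified check; the find_mem_range() helper performing the single binary search is an assumed name, not necessarily the one in the patch:

    static bool range_is_memory(u64 start, u64 end)
    {
            struct memblock_region *r = find_mem_range(start); /* one search */

            /* Memory iff the whole [start, end) range fits in this region. */
            return r && start >= r->base && end <= r->base + r->size;
    }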
2021-08-20 | Merge tag 'kvmarm-fixes-5.14-2' into kvm-arm64/mmu/el2-tracking | Marc Zyngier | 2 | -5/+9

KVM/arm64 fixes for 5.14, take #2:

- Plug race between enabling MTE and creating vcpus
- Fix off-by-one bug when checking whether an address range is RAM

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 | Merge branch arm64/for-next/sysreg into kvm-arm64/misc-5.15 | Marc Zyngier | 2 | -15/+22

Merge the arm64/for-next/sysreg branch to avoid merge conflicts in -next and upstream.

* arm64/for-next/sysreg:
  arm64/kexec: Test page size support with new TGRAN range values

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20 | ARM: 9118/1: div64: Remove always-true __div64_const32_is_OK() duplicate | Geert Uytterhoeven | 1 | -11/+0

Since commit cafa0010cd51fb71 ("Raise the minimum required gcc version to 4.6"), the kernel can no longer be compiled using gcc-3. Hence __div64_const32_is_OK() is always true.

Moreover, __div64_const32_is_OK() is defined in the same way in include/asm-generic/div64.h, so the ARM-specific definition can be removed regardless.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-08-20 | ARM: 9116/1: unified: Remove check for gcc < 4 | Geert Uytterhoeven | 1 | -4/+0

Since commit cafa0010cd51fb71 ("Raise the minimum required gcc version to 4.6"), the kernel can no longer be compiled using gcc-3. Hence this condition is never true, and the check can thus be removed.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-08-20 | ARM: 9110/1: oabi-compat: fix oabi epoll sparse warning | Arnd Bergmann | 1 | -1/+1

As my patches change the oabi epoll definition, I received a report from the kernel test robot about a pre-existing issue with a mismatched __poll_t type.

The OABI code was correct when it was initially added in linux-2.6.16, but a later (also correct) change to the generic __poll_t triggered a type mismatch warning from sparse. As __poll_t is always 32 bits wide and otherwise compatible, using it instead of __u32 in the oabi_epoll_event definition is a valid workaround.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 8ced390c2b18 ("define __poll_t, annotate constants")
Fixes: ee219b946e4b ("uapi: turn __poll_t sparse checks on by default")
Fixes: 687ad0191488 ("[ARM] 3109/1: old ABI compat: syscall wrappers for ABI impedance matching")
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
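[Editor's note] For reference, the OABI event layout with the workaround applied looks like this (the struct lives in arch/arm/kernel/sys_oabi-compat.c):

    struct oabi_epoll_event {
            __poll_t events;   /* was __u32; __poll_t is also 32 bits wide */
            __u64 data;
    } __attribute__ ((packed,aligned(4)));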
2021-08-20 | ARM: 9113/1: uaccess: remove set_fs() implementation | Arnd Bergmann | 11 | -86/+7

There are no remaining callers of set_fs(), so just remove it along with all associated code that operates on thread_info->addr_limit.

There are still further optimizations that can be done:

- In get_user(), the address check could be moved entirely into the out-of-line code, rather than passing a constant as an argument.
- I assume the DACR handling can be simplified as we now only change it during user access when CONFIG_CPU_SW_DOMAIN_PAN is set, but not during set_fs().

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-08-20 | ARM: 9112/1: uaccess: add __{get,put}_kernel_nofault | Arnd Bergmann | 1 | -40/+83

These mimic the behavior of get_user and put_user, except for domain switching, address limit checking and handling of mismatched sizes, none of which are relevant here.

To work with pre-Armv6 kernels, this has to avoid TUSER() inside of the new macros; the new approach passes the "t" string along with the opcode, which is a bit uglier but avoids duplicating more code.

As there is no __get_user_asm_dword(), I work around it by copying 32 bits at a time, which is possible because the output size is known.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-08-20 | ARM: 9111/1: oabi-compat: rework fcntl64() emulation | Arnd Bergmann | 1 | -33/+60

This is one of the last users of get_fs(), and it is fairly easy to change, since the infrastructure for it is already there. The replacement here is essentially a copy of the existing fcntl64() syscall entry function.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-08-20 | ARM: 9114/1: oabi-compat: rework sys_semtimedop emulation | Arnd Bergmann | 1 | -16/+44

sys_oabi_semtimedop() is one of the last users of set_fs() on Arm. To remove this one, expose the internal code of the actual implementation that operates on a kernel pointer and call it directly after copying. There should be no measurable impact on the normal execution of this function, and it makes the overly long function a little shorter, which may help readability.

While reworking the oabi version, make it behave a little more like the native one, using kvmalloc_array() and restructuring the code flow in a similar way.

The naming of __do_semtimedop() is not very good; I hope someone can come up with a better name.

One regression was spotted by kernel test robot <rong.a.chen@intel.com> and fixed before the first mailing list submission.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-08-20 | ARM: 9108/1: oabi-compat: rework epoll_wait/epoll_pwait emulation | Arnd Bergmann | 3 | -69/+29

The epoll_wait() system call wrapper is one of the remaining users of the set_fs() infrastructure for Arm. Changing it to not require set_fs() is rather complex, unfortunately.

The approach I'm taking here is to allow architectures to override the code that copies the output to user space, and let the oabi-compat implementation check whether it is getting called from an EABI or OABI system call based on the thread_info->syscall value. The in_oabi_syscall() check here mirrors the in_compat_syscall() and in_x32_syscall() helpers for 32-bit compat implementations on other architectures.

Overall, the amount of code goes down, at least with the newly added sys_oabi_epoll_pwait() helper getting removed again. The downside is added complexity in the source code for the native implementation. There should be no difference in runtime performance except for Arm kernels with CONFIG_OABI_COMPAT enabled, which now have to go through an external function call to check which of the two variants to use.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
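[Editor's note] The in_oabi_syscall() helper mentioned above can be sketched as follows, assuming the abi_syscall encoding described in the ARM: 9107/1 entry below (a sketch, not necessarily the verbatim helper):

    static inline bool in_oabi_syscall(void)
    {
            return IS_ENABLED(CONFIG_OABI_COMPAT) &&
                   (current_thread_info()->abi_syscall & __NR_OABI_SYSCALL_BASE);
    }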
2021-08-20 | ARM: 9107/1: syscall: always store thread_info->abi_syscall | Arnd Bergmann | 6 | -10/+21

The system call number is used in a couple of places, in particular ptrace, seccomp and /proc/<pid>/syscall. The last one apparently never worked reliably on ARM for tasks that are not currently being traced. Storing the syscall number in the normal entry path makes it work, as well as allowing us to see if the current system call is for OABI compat mode, which is the next thing I want to hook into.

Since the thread_info->syscall field is not just the number any more, it is now renamed to abi_syscall. In kernels that enable both OABI and EABI, the upper bits of this field encode 0x900000 (__NR_OABI_SYSCALL_BASE) for OABI tasks, while normal EABI tasks do not set the upper bits. This makes it possible to implement the in_oabi_syscall() helper later.

All other users of thread_info->syscall go through the syscall_get_nr() helper, which in turn filters out the ABI bits.

Note that the ABI information is lost with PTRACE_SET_SYSCALL, so one cannot set the internal number to a particular version, but this was already the case. We could change it to let gdb encode the ABI type along with the syscall in a CONFIG_OABI_COMPAT-enabled kernel, but that itself would be a (backwards-compatible) ABI change, so I don't do it here.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
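[Editor's note] A sketch of the filtering helper described above; the mask name is an approximation of the actual constant:

    static inline int syscall_get_nr(struct task_struct *task,
                                     struct pt_regs *regs)
    {
            if (IS_ENABLED(CONFIG_AEABI) && !IS_ENABLED(CONFIG_OABI_COMPAT))
                    return task_thread_info(task)->abi_syscall;

            /* Strip the OABI marker bits, leaving just the syscall number. */
            return task_thread_info(task)->abi_syscall & __NR_SYSCALL_MASK;
    }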
2021-08-20 | ARM: 9109/1: oabi-compat: add epoll_pwait handler | Arnd Bergmann | 2 | -4/+36

The epoll_wait() syscall has a special version for OABI compat mode to convert the arguments to the EABI structure layout of the kernel. However, the later epoll_pwait() syscall was added in arch/arm in linux-2.6.32 without this conversion. Use the same kind of handler for both.

Fixes: 369842658a36 ("ARM: 5677/1: ARM support for TIF_RESTORE_SIGMASK/pselect6/ppoll/epoll_pwait")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-08-20 | ARM: 9106/1: traps: use get_kernel_nofault instead of set_fs() | Arnd Bergmann | 1 | -31/+16

ARM uses set_fs() and __get_user() to allow the stack dumping code to access possibly invalid pointers carefully. These can be changed to the simpler get_kernel_nofault(), and allow the eventual removal of set_fs().

dump_instr() will print either kernel or user space pointers, depending on how it was called. For dump_mem(), I assume we are only interested in kernel pointers, and the only time that this is called with user_mode(regs)==true is when the regs themselves are unreliable as a result of the condition that caused the trap.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
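[Editor's note] A sketch of the conversion in the dumping code; get_kernel_nofault() is the real interface (returning nonzero on fault), while the surrounding lines are illustrative:

    unsigned long val;

    /* Carefully read a possibly-invalid kernel pointer, no set_fs(). */
    if (get_kernel_nofault(val, (unsigned long *)addr)) {
            pr_cont(" (bad address)");
            return;
    }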
2021-08-20 | arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores | Will Deacon | 2 | -39/+1

The scheduler now knows enough about these braindead systems to place 32-bit tasks accordingly, so throw out the safety checks and allow the ret-to-user path to avoid do_notify_resume() if there is nothing to do.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210730112443.23245-16-will@kernel.org