summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2022-06-09KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSEPaolo Bonzini2-20/+23
Commit 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") introduced passthrough support for nested pause filtering, (when the host doesn't intercept PAUSE) (either disabled with kvm module param, or disabled with '-overcommit cpu-pm=on') Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards, the feature was exposed as supported by KVM cpuid unconditionally, thus if L1 could try to use it even when the L0 KVM can't really support it. In this case the fallback caused KVM to intercept each PAUSE instruction; in some cases, such intercept can slow down the nested guest so much that it can fail to boot. Instead, before the problematic commit KVM was already setting both thresholds to 0 in vmcb02, but after the first userspace VM exit shrink_ple_window was called and would reset the pause_filter_count to the default value. To fix this, change the fallback strategy - ignore the guest threshold values, but use/update the host threshold values unless the guest specifically requests disabling PAUSE filtering (either simple or advanced). Also fix a minor bug: on nested VM exit, when PAUSE filter counter were copied back to vmcb01, a dirty bit was not set. Thanks a lot to Suravee Suthikulpanit for debugging this! Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/putMaxim Levitsky3-27/+8
Now that these functions are always called with preemption disabled, remove the preempt_disable()/preempt_enable() pair inside them. No functional change intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: disable preemption while updating apicv inhibitionMaxim Levitsky1-0/+2
Currently nothing prevents preemption in kvm_vcpu_update_apicv. On SVM, If the preemption happens after we update the vcpu->arch.apicv_active, the preemption itself will 'update' the inhibition since the AVIC will be first disabled on vCPU unload and then enabled, when the current task is loaded again. Then we will try to update it again, which will lead to a warning in __avic_vcpu_load, that the AVIC is already enabled. Fix this by disabling preemption in this code. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: fix avic_kick_target_vcpus_fastMaxim Levitsky1-36/+69
There are two issues in avic_kick_target_vcpus_fast 1. It is legal to issue an IPI request with APIC_DEST_NOSHORT and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic), which must be treated as a broadcast destination. Fix this by explicitly checking for it. Also don’t use ‘index’ in this case as it gives no new information. 2. It is legal to issue a logical IPI request to more than one target. Index field only provides index in physical id table of first such target and therefore can't be used before we are sure that only a single target was addressed. Instead, parse the ICRL/ICRH, double check that a unicast interrupt was requested, and use that info to figure out the physical id of the target vCPU. At that point there is no need to use the index field as well. In addition to fixing the above issues, also skip the call to kvm_apic_match_dest. It is possible to do this now, because now as long as AVIC is not inhibited, it is guaranteed that none of the vCPUs changed their apic id from its default value. This fixes boot of windows guest with AVIC enabled because it uses IPI with 0xFF destination and no destination shorthand. Fixes: 7223fd2d5338 ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible") Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: SVM: remove avic's broken code that updated APIC IDMaxim Levitsky1-35/+0
AVIC is now inhibited if the guest changes the apic id, and therefore this code is no longer needed. There are several ways this code was broken, including: 1. a vCPU was only allowed to change its apic id to an apic id of an existing vCPU. 2. After such change, the vCPU whose apic id entry was overwritten, could not correctly change its own apic id, because its own entry is already overwritten. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC baseMaxim Levitsky4-6/+37
Neither of these settings should be changed by the guest and it is a burden to support it in the acceleration code, so just inhibit this code instead. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86: document AVIC/APICv inhibit reasonsMaxim Levitsky1-2/+57
These days there are too many AVIC/APICv inhibit reasons, and it doesn't hurt to have some documentation for them. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRsYuan Yao1-1/+1
Assign shadow_me_value, not shadow_me_mask, to PAE root entries, a.k.a. shadow PDPTRs, when host memory encryption is supported. The "mask" is the set of all possible memory encryption bits, e.g. MKTME KeyIDs, whereas "value" holds the actual value that needs to be stuffed into host page tables. Using shadow_me_mask results in a failed VM-Entry due to setting reserved PA bits in the PDPTRs, and ultimately causes an OOPS due to physical addresses with non-zero MKTME bits sending to_shadow_page() into the weeds: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state. BUG: unable to handle page fault for address: ffd43f00063049e8 PGD 86dfd8067 P4D 0 Oops: 0000 [#1] PREEMPT SMP RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm] kvm_mmu_free_roots+0xd1/0x200 [kvm] __kvm_mmu_unload+0x29/0x70 [kvm] kvm_mmu_unload+0x13/0x20 [kvm] kvm_arch_destroy_vm+0x8a/0x190 [kvm] kvm_put_kvm+0x197/0x2d0 [kvm] kvm_vm_release+0x21/0x30 [kvm] __fput+0x8e/0x260 ____fput+0xe/0x10 task_work_run+0x6f/0xb0 do_exit+0x327/0xa90 do_group_exit+0x35/0xa0 get_signal+0x911/0x930 arch_do_signal_or_restart+0x37/0x720 exit_to_user_mode_prepare+0xb2/0x140 syscall_exit_to_user_mode+0x16/0x30 do_syscall_64+0x4e/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: e54f1ff244ac ("KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask") Signed-off-by: Yuan Yao <yuan.yao@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-Id: <20220608012015.19566-1-yuan.yao@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-09Merge tag 'kvmarm-fixes-5.19-1' of ↵Paolo Bonzini13-73/+95
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.19, take #1 - Properly reset the SVE/SME flags on vcpu load - Fix a vgic-v2 regression regarding accessing the pending state of a HW interrupt from userspace (and make the code common with vgic-v3) - Fix access to the idreg range for protected guests - Ignore 'kvm-arm.mode=protected' when using VHE - Return an error from kvm_arch_init_vm() on allocation failure - A bunch of small cleanups (comments, annotations, indentation)
2022-06-09Merge tag 'kvm-riscv-fixes-5.19-1' of https://github.com/kvm-riscv/linux ↵Paolo Bonzini2871-41232/+93627
into HEAD KVM/riscv fixes for 5.19, take #1 - Typo fix in arch/riscv/kvm/vmid.c - Remove broken reference pattern from MAINTAINERS entry
2022-06-09KVM: arm64: Drop stale commentMarc Zyngier1-5/+0
The layout of 'struct kvm_vcpu_arch' has evolved significantly since the initial port of KVM/arm64, so remove the stale comment suggesting that a prefix of the structure is used exclusively from assembly code. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-7-will@kernel.org
2022-06-09KVM: arm64: Remove redundant hyp_assert_lock_held() assertionsWill Deacon1-4/+0
host_stage2_try() asserts that the KVM host lock is held, so there's no need to duplicate the assertion in its wrappers. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-6-will@kernel.org
2022-06-09KVM: arm64: Extend comment in has_vhe()Will Deacon1-0/+3
has_vhe() expands to a compile-time constant when evaluated from the VHE or nVHE code, alternatively checking a static key when called from elsewhere in the kernel. On face value, this looks like a case of premature optimization, but in fact this allows symbol references on VHE-specific code paths to be dropped from the nVHE object. Expand the comment in has_vhe() to make this clearer, hopefully discouraging anybody from simplifying the code. Cc: David Brazdil <dbrazdil@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-5-will@kernel.org
2022-06-09KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHEWill Deacon2-10/+6
Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode() only returns KVM_MODE_PROTECTED on systems where the feature is available. Cc: David Brazdil <dbrazdil@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-4-will@kernel.org
2022-06-09KVM: arm64: Handle all ID registers trapped for a protected VMMarc Zyngier1-8/+34
A protected VM accessing ID_AA64ISAR2_EL1 gets punished with an UNDEF, while it really should only get a zero back if the register is not handled by the hypervisor emulation (as mandated by the architecture). Introduce all the missing ID registers (including the unallocated ones), and have them to return 0. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-3-will@kernel.org
2022-06-09KVM: arm64: Return error from kvm_arch_init_vm() on allocation failureWill Deacon1-1/+3
If we fail to allocate the 'supported_cpus' cpumask in kvm_arch_init_vm() then be sure to return -ENOMEM instead of success (0) on the failure path. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-2-will@kernel.org
2022-06-09RISC-V: KVM: fix typos in commentsJulia Lawall1-1/+1
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-06-08KVM: arm64: Warn if accessing timer pending state outside of vcpu contextMarc Zyngier1-0/+3
A recurrent bug in the KVM/arm64 code base consists in trying to access the timer pending state outside of the vcpu context, which makes zero sense (the pending state only exists when the vcpu is loaded). In order to avoid more embarassing crashes and catch the offenders red-handed, add a warning to kvm_arch_timer_get_input_level() and return the state as non-pending. This avoids taking the system down, and still helps tracking down silly bugs. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220607131427.1164881-4-maz@kernel.org
2022-06-08KVM: arm64: Replace vgic_v3_uaccess_read_pending with vgic_uaccess_read_pendingMarc Zyngier2-39/+22
Now that GICv2 has a proper userspace accessor for the pending state, switch GICv3 over to it, dropping the local version, moving over the specific behaviours that CGIv3 requires (such as the distinction between pending latch and line level which were never enforced with GICv2). We also gain extra locking that isn't really necessary for userspace, but that's a small price to pay for getting rid of superfluous code. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220607131427.1164881-3-maz@kernel.org
2022-06-08KVM: x86: do not report a vCPU as preempted outside instruction boundariesPaolo Bonzini4-0/+28
If a vCPU is outside guest mode and is scheduled out, it might be in the process of making a memory access. A problem occurs if another vCPU uses the PV TLB flush feature during the period when the vCPU is scheduled out, and a virtual address has already been translated but has not yet been accessed, because this is equivalent to using a stale TLB entry. To avoid this, only report a vCPU as preempted if sure that the guest is at an instruction boundary. A rescheduling request will be delivered to the host physical CPU as an external interrupt, so for simplicity consider any vmexit *not* instruction boundary except for external interrupts. It would in principle be okay to report the vCPU as preempted also if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the vmentry/vmexit overhead unnecessarily, and optimistic spinning is also unlikely to succeed. However, leave it for later because right now kvm_vcpu_check_block() is doing memory accesses. Even though the TLB flush issue only applies to virtual memory address, it's very much preferrable to be conservative. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86: do not set st->preempted when going back to user spacePaolo Bonzini2-14/+18
Similar to the Xen path, only change the vCPU's reported state if the vCPU was actually preempted. The reason for KVM's behavior is that for example optimistic spinning might not be a good idea if the guest is doing repeated exits to userspace; however, it is confusing and unlikely to make a difference, because well-tuned guests will hardly ever exit KVM_RUN in the first place. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07KVM: SVM: fix tsc scaling cache logicMaxim Levitsky3-15/+23
SVM uses a per-cpu variable to cache the current value of the tsc scaling multiplier msr on each cpu. Commit 1ab9287add5e2 ("KVM: X86: Add vendor callbacks for writing the TSC multiplier") broke this caching logic. Refactor the code so that all TSC scaling multiplier writes go through a single function which checks and updates the cache. This fixes the following scenario: 1. A CPU runs a guest with some tsc scaling ratio. 2. New guest with different tsc scaling ratio starts on this CPU and terminates almost immediately. This ensures that the short running guest had set the tsc scaling ratio just once when it was set via KVM_SET_TSC_KHZ. Due to the bug, the per-cpu cache is not updated. 3. The original guest continues to run, it doesn't restore the msr value back to its own value, because the cache matches, and thus continues to run with a wrong tsc scaling ratio. Fixes: 1ab9287add5e2 ("KVM: X86: Add vendor callbacks for writing the TSC multiplier") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606181149.103072-1-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty loggingBen Gardon3-6/+42
Currently disabling dirty logging with the TDP MMU is extremely slow. On a 96 vCPU / 96G VM backed with gigabyte pages, it takes ~200 seconds to disable dirty logging with the TDP MMU, as opposed to ~4 seconds with the shadow MMU. When disabling dirty logging, zap non-leaf parent entries to allow replacement with huge pages instead of recursing and zapping all of the child, leaf entries. This reduces the number of TLB flushes required. and reduces the disable dirty log time with the TDP MMU to ~3 seconds. Opportunistically add a WARN() to catch GFNs that are mapped at a higher level than their max level. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20220525230904.1584480-1-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()Jan Beulich1-1/+1
As noted (and fixed) a couple of times in the past, "=@cc<cond>" outputs and clobbering of "cc" don't work well together. The compiler appears to mean to reject such, but doesn't - in its upstream form - quite manage to yet for "cc". Furthermore two similar macros don't clobber "cc", and clobbering "cc" is pointless in asm()-s for x86 anyway - the compiler always assumes status flags to be clobbered there. Fixes: 989b5db215a2 ("x86/uaccess: Implement macros for CMPXCHG on user addresses") Signed-off-by: Jan Beulich <jbeulich@suse.com> Message-Id: <485c0c0b-a3a7-0b7c-5264-7d00c01de032@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots()Shaoqin Huang1-1/+1
When freeing obsolete previous roots, check prev_roots as intended, not the current root. Signed-off-by: Shaoqin Huang <shaoqin.huang@intel.com> Fixes: 527d5cd7eece ("KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped") Message-Id: <20220607005905.2933378-1-shaoqin.huang@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07KVM: arm64: Don't read a HW interrupt pending state in user contextMarc Zyngier3-5/+21
Since 5bfa685e62e9 ("KVM: arm64: vgic: Read HW interrupt pending state from the HW"), we're able to source the pending bit for an interrupt that is stored either on the physical distributor or on a device. However, this state is only available when the vcpu is loaded, and is not intended to be accessed from userspace. Unfortunately, the GICv2 emulation doesn't provide specific userspace accessors, and we fallback with the ones that are intended for the guest, with fatal consequences. Add a new vgic_uaccess_read_pending() accessor for userspace to use, build on top of the existing vgic_mmio_read_pending(). Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Fixes: 5bfa685e62e9 ("KVM: arm64: vgic: Read HW interrupt pending state from the HW") Link: https://lore.kernel.org/r/20220607131427.1164881-2-maz@kernel.org Cc: stable@vger.kernel.org
2022-06-07KVM: arm64: Fix inconsistent indentingsunliming1-1/+1
Fix the following smatch warnings: arch/arm64/kvm/vmid.c:62 flush_context() warn: inconsistent indenting Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: sunliming <sunliming@kylinos.cn> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220602024805.511457-1-sunliming@kylinos.cn
2022-06-07KVM: arm64: Always start with clearing SME flag on loadMarc Zyngier1-0/+1
On each vcpu load, we set the KVM_ARM64_HOST_SME_ENABLED flag if SME is enabled for EL0 on the host. This is used to restore the correct state on vpcu put. However, it appears that nothing ever clears this flag. Once set, it will stick until the vcpu is destroyed, which has the potential to spuriously enable SME for userspace. As it turns out, this is due to the SME code being more or less copied from SVE, and inheriting the same shortcomings. We never saw the issue because nothing uses SME, and the amount of testing is probably still pretty low. Fixes: 861262ab8627 ("KVM: arm64: Handle SME host state when running guests") Signed-off-by: Marc Zyngier <maz@kernel.org> Reviwed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220528113829.1043361-3-maz@kernel.org
2022-06-07KVM: arm64: Always start with clearing SVE flag on loadMarc Zyngier1-0/+1
On each vcpu load, we set the KVM_ARM64_HOST_SVE_ENABLED flag if SVE is enabled for EL0 on the host. This is used to restore the correct state on vpcu put. However, it appears that nothing ever clears this flag. Once set, it will stick until the vcpu is destroyed, which has the potential to spuriously enable SVE for userspace. We probably never saw the issue because no VMM uses SVE, but that's still pretty bad. Unconditionally clearing the flag on vcpu load addresses the issue. Fixes: 8383741ab2e7 ("KVM: arm64: Get rid of host SVE tracking/saving") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20220528113829.1043361-2-maz@kernel.org
2022-06-06Merge tag 'mm-hotfixes-stable-2022-06-05' of ↵Linus Torvalds1-3/+9
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm hotfixes from Andrew Morton: "Fixups for various recently-added and longer-term issues and a few minor tweaks: - fixes for material merged during this merge window - cc:stable fixes for more longstanding issues - minor mailmap and MAINTAINERS updates" * tag 'mm-hotfixes-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/oom_kill.c: fix vm_oom_kill_table[] ifdeffery x86/kexec: fix memory leak of elf header buffer mm/memremap: fix missing call to untrack_pfn() in pagemap_range() mm: page_isolation: use compound_nr() correctly in isolate_single_pageblock() mm: hugetlb_vmemmap: fix CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON MAINTAINERS: add maintainer information for z3fold mailmap: update Josh Poimboeuf's email
2022-06-05Merge tag 'x86-urgent-2022-06-05' of ↵Linus Torvalds3-6/+115
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SGX fix from Thomas Gleixner: "A single fix for x86/SGX to prevent that memory which is allocated for an SGX enclave is accounted to the wrong memory control group" * tag 'x86-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Set active memcg prior to shmem allocation
2022-06-05Merge tag 'x86-mm-2022-06-05' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm cleanup from Thomas Gleixner: "Use PAGE_ALIGNED() instead of open coding it in the x86/mm code" * tag 'x86-mm-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Use PAGE_ALIGNED(x) instead of IS_ALIGNED(x, PAGE_SIZE)
2022-06-05Merge tag 'x86-microcode-2022-06-05' of ↵Linus Torvalds3-112/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode updates from Thomas Gleixner: - Disable late microcode loading by default. Unless the HW people get their act together and provide a required minimum version in the microcode header for making a halfways informed decision its just lottery and broken. - Warn and taint the kernel when microcode is loaded late - Remove the old unused microcode loader interface - Remove a redundant perf callback from the microcode loader * tag 'x86-microcode-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Remove unnecessary perf callback x86/microcode: Taint and warn on late loading x86/microcode: Default-disable late loading x86/microcode: Rip out the OLD_INTERFACE
2022-06-05Merge tag 'x86-cleanups-2022-06-05' of ↵Linus Torvalds5-71/+66
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Thomas Gleixner: "A set of small x86 cleanups: - Remove unused headers in the IDT code - Kconfig indendation and comment fixes - Fix all 'the the' typos in one go instead of waiting for bots to fix one at a time" * tag 'x86-cleanups-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix all occurences of the "the the" typo x86/idt: Remove unused headers x86/Kconfig: Fix indentation of arch/x86/Kconfig.debug x86/Kconfig: Fix indentation and add endif comments to arch/x86/Kconfig
2022-06-05Merge tag 'x86-boot-2022-06-05' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot update from Thomas Gleixner: "Use strlcpy() instead of strscpy() in arch_setup()" * tag 'x86-boot-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/setup: Use strscpy() to replace deprecated strlcpy()
2022-06-05Merge tag 'perf-urgent-2022-06-05' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: - Make the ICL event constraints match reality - Remove a unused local variable * tag 'perf-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Remove unused local variable perf/x86/intel: Fix event constraints for ICL
2022-06-05Merge tag 'perf-core-2022-06-05' of ↵Linus Torvalds1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixlet from Thomas Gleixner: "Trivial indentation fix in Kconfig" * tag 'perf-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/Kconfig: Fix indentation in the Kconfig file
2022-06-05Merge tag 'objtool-urgent-2022-06-05' of ↵Linus Torvalds5-5/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Thomas Gleixner: - Handle __ubsan_handle_builtin_unreachable() correctly and treat it as noreturn - Allow architectures to select uaccess validation - Use the non-instrumented bit test for test_cpu_has() to prevent escape from non-instrumentable regions - Use arch_ prefixed atomics for JUMP_LABEL=n builds to prevent escape from non-instrumentable regions - Mark a few tiny inline as __always_inline to prevent GCC from bringing them out of line and instrumenting them - Mark the empty stub context_tracking_enabled() as always inline as GCC brings them out of line and instruments the empty shell - Annotate ex_handler_msr_mce() as dead end * tag 'objtool-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/extable: Annotate ex_handler_msr_mce() as a dead end context_tracking: Always inline empty stubs x86: Always inline on_thread_stack() and current_top_of_stack() jump_label,noinstr: Avoid instrumentation for JUMP_LABEL=n builds x86/cpu: Elide KCSAN for cpu_has() and friends objtool: Mark __ubsan_handle_builtin_unreachable() as noreturn objtool: Add CONFIG_HAVE_UACCESS_VALIDATION
2022-06-05Merge tag 'kbuild-v5.19-3' of ↵Linus Torvalds2-7/+0
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - Fix build regressions for parisc, csky, nios2, openrisc - Simplify module builds for CONFIG_LTO_CLANG and CONFIG_X86_KERNEL_IBT - Remove arch/parisc/nm, which was presumably a workaround for old tools - Check the odd combination of EXPORT_SYMBOL and 'static' precisely - Make external module builds robust against "too long argument error" - Support j, k keys for moving the cursor in nconfig * tag 'kbuild-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits) kbuild: Allow to select bash in a modified environment scripts: kconfig: nconf: make nconfig accept jk keybindings modpost: use fnmatch() to simplify match() modpost: simplify mod->name allocation kbuild: factor out the common objtool arguments kbuild: move vmlinux.o link to scripts/Makefile.vmlinux_o kbuild: clean .tmp_* pattern by make clean kbuild: remove redundant cleanups in scripts/link-vmlinux.sh kbuild: rebuild multi-object modules when objtool is updated kbuild: add cmd_and_savecmd macro kbuild: make *.mod rule robust against too long argument error kbuild: make built-in.a rule robust against too long argument error kbuild: check static EXPORT_SYMBOL* by script instead of modpost parisc: remove arch/parisc/nm kbuild: do not create *.prelink.o for Clang LTO or IBT kbuild: replace $(linked-object) with CONFIG options kbuild: do not try to parse *.cmd files for objects provided by compiler kbuild: replace $(if A,A,B) with $(or A,B) in scripts/Makefile.modpost modpost: squash if...else-if in find_elf_symbol2() modpost: reuse ARRAY_SIZE() macro for section_mismatch() ...
2022-06-05Merge tag 'pull-18-rc1-work.mount' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mount handling updates from Al Viro: "Cleanups (and one fix) around struct mount handling. The fix is usermode_driver.c one - once you've done kern_mount(), you must kern_unmount(); simple mntput() will end up with a leak. Several failure exits in there messed up that way... In practice you won't hit those particular failure exits without fault injection, though" * tag 'pull-18-rc1-work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: move mount-related externs from fs.h to mount.h blob_to_mnt(): kern_unmount() is needed to undo kern_mount() m->mnt_root->d_inode->i_sb is a weird way to spell m->mnt_sb... linux/mount.h: trim includes uninline may_mount() and don't opencode it in fspick(2)/fsopen(2)
2022-06-05Merge tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linuxLinus Torvalds6-23/+14
Pull bitmap updates from Yury Norov: - bitmap: optimize bitmap_weight() usage, from me - lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro Carvalho Chehab - include/linux/find: Fix documentation, from Anna-Maria Behnsen - bitmap: fix conversion from/to fix-sized arrays, from me - bitmap: Fix return values to be unsigned, from Kees Cook It has been in linux-next for at least a week with no problems. * tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits) nodemask: Fix return values to be unsigned bitmap: Fix return values to be unsigned KVM: x86: hyper-v: replace bitmap_weight() with hweight64() KVM: x86: hyper-v: fix type of valid_bank_mask ia64: cleanup remove_siblinginfo() drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate lib/bitmap: add test for bitmap_{from,to}_arr64 lib: add bitmap_{from,to}_arr64 lib/bitmap: extend comment for bitmap_(from,to)_arr32() include/linux/find: Fix documentation lib/bitmap.c make bitmap_print_bitmask_to_buf parseable MAINTAINERS: add cpumask and nodemask files to BITMAP_API arch/x86: replace nodes_weight with nodes_empty where appropriate mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate clocksource: replace cpumask_weight with cpumask_empty in clocksource.c genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate irq: mips: replace cpumask_weight with cpumask_empty where appropriate drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate arch/x86: replace cpumask_weight with cpumask_empty where appropriate ...
2022-06-04Merge tag 'for-5.19/parisc-2' of ↵Linus Torvalds3-17/+5
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull more parisc architecture updates from Helge Deller: "A fix to prevent crash at bootup if CONFIG_SCHED_MC is enabled, and add auto-detection of primary graphics card for framebuffer driver" * tag 'for-5.19/parisc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc/stifb: Keep track of hardware path of graphics card parisc/stifb: Implement fb_is_primary_device() parisc: fix a crash with multicore scheduler
2022-06-04Merge tag 'for-linus-5.19-rc1b-tag' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: "Two cleanup patches for Xen related code and (more important) an update of MAINTAINERS for Xen, as Boris Ostrovsky decided to step down" * tag 'for-linus-5.19-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: replace xen_remap() with memremap() MAINTAINERS: Update Xen maintainership xen: switch gnttab_end_foreign_access() to take a struct page pointer
2022-06-04parisc/stifb: Implement fb_is_primary_device()Helge Deller1-0/+4
Implement fb_is_primary_device() function, so that fbcon detects if this framebuffer belongs to the default graphics card which was used to start the system. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v5.10+
2022-06-04Merge tag 'ptrace_stop-cleanup-for-v5.19' of ↵Linus Torvalds10-75/+15
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ptrace_stop cleanups from Eric Biederman: "While looking at the ptrace problems with PREEMPT_RT and the problems Peter Zijlstra was encountering with ptrace in his freezer rewrite I identified some cleanups to ptrace_stop that make sense on their own and move make resolving the other problems much simpler. The biggest issue is the habit of the ptrace code to change task->__state from the tracer to suppress TASK_WAKEKILL from waking up the tracee. No other code in the kernel does that and it is straight forward to update signal_wake_up and friends to make that unnecessary. Peter's task freezer sets frozen tasks to a new state TASK_FROZEN and then it stores them by calling "wake_up_state(t, TASK_FROZEN)" relying on the fact that all stopped states except the special stop states can tolerate spurious wake up and recover their state. The state of stopped and traced tasked is changed to be stored in task->jobctl as well as in task->__state. This makes it possible for the freezer to recover tasks in these special states, as well as serving as a general cleanup. With a little more work in that direction I believe TASK_STOPPED can learn to tolerate spurious wake ups and become an ordinary stop state. The TASK_TRACED state has to remain a special state as the registers for a process are only reliably available when the process is stopped in the scheduler. Fundamentally ptrace needs acess to the saved register values of a task. There are bunch of semi-random ptrace related cleanups that were found while looking at these issues. One cleanup that deserves to be called out is from commit 57b6de08b5f6 ("ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs"). This makes a change that is technically user space visible, in the handling of what happens to a tracee when a tracer dies unexpectedly. According to our testing and our understanding of userspace nothing cares that spurious SIGTRAPs can be generated in that case" * tag 'ptrace_stop-cleanup-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state ptrace: Always take siglock in ptrace_resume ptrace: Don't change __state ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs ptrace: Document that wait_task_inactive can't fail ptrace: Reimplement PTRACE_KILL by always sending SIGKILL signal: Use lockdep_assert_held instead of assert_spin_locked ptrace: Remove arch_ptrace_attach ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP signal: Replace __group_send_sig_info with send_signal_locked signal: Rename send_signal send_signal_locked
2022-06-04Merge tag 'kthread-cleanups-for-v5.19' of ↵Linus Torvalds25-137/+173
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull kthread updates from Eric Biederman: "This updates init and user mode helper tasks to be ordinary user mode tasks. Commit 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads") caused init and the user mode helper threads that call kernel_execve to have struct kthread allocated for them. This struct kthread going away during execve in turned made a use after free of struct kthread possible. Here, commit 343f4c49f243 ("kthread: Don't allocate kthread_struct for init and umh") is enough to fix the use after free and is simple enough to be backportable. The rest of the changes pass struct kernel_clone_args to clean things up and cause the code to make sense. In making init and the user mode helpers tasks purely user mode tasks I ran into two complications. The function task_tick_numa was detecting tasks without an mm by testing for the presence of PF_KTHREAD. The initramfs code in populate_initrd_image was using flush_delayed_fput to ensuere the closing of all it's file descriptors was complete, and flush_delayed_fput does not work in a userspace thread. I have looked and looked and more complications and in my code review I have not found any, and neither has anyone else with the code sitting in linux-next" * tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: sched: Update task_tick_numa to ignore tasks without an mm fork: Stop allowing kthreads to call execve fork: Explicitly set PF_KTHREAD init: Deal with the init process being a user mode process fork: Generalize PF_IO_WORKER handling fork: Explicity test for idle tasks in copy_thread fork: Pass struct kernel_clone_args into copy_thread kthread: Don't allocate kthread_struct for init and umh
2022-06-04Merge tag 'for-linus-5.19-rc1' of ↵Linus Torvalds15-53/+85
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - Various cleanups and fixes: xterm, serial line, time travel - Set ARCH_HAS_GCOV_PROFILE_ALL * tag 'for-linus-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: Fix out-of-bounds read in LDT setup um: chan_user: Fix winch_tramp() return value um: virtio_uml: Fix broken device handling in time-travel um: line: Use separate IRQs per line um: Enable ARCH_HAS_GCOV_PROFILE_ALL um: Use asm-generic/dma-mapping.h um: daemon: Make default socket configurable um: xterm: Make default terminal emulator configurable
2022-06-04Merge tag 'arm-multiplatform-5.19-3' of ↵Linus Torvalds12-897/+584
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull yet more ARM multiplatform updates from Arnd Bergmann: "This is the third and final bit of the multiplatform conversion for ARMv5, finishing off OMAP1. One patch enables the common-clk interface, and the other ones does the Kconfig change. These were waiting on a few dependencies to trickle in for common-clk, and the last one of those was in the USB tree" * tag 'arm-multiplatform-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: ARM: omap1: enable multiplatform ARM: OMAP1: clock: Convert to CCF
2022-06-04Merge tag 'loongarch-5.19' of ↵Linus Torvalds180-0/+19909
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull initial Loongarch architecture code from Arnd Bergmann: "This is the majority of the loongarch architecture code, including the final system call interface and all core functionality. It still misses three sets of peripheral but vital patches to add support for other subsystems, which have yet to pass review: - The drivers/firmware/efi stub for booting from a standard UEFI firmware implementation. Both the original custom boot interface and a draft implementation of the EFI stub did not make it, so it is currently impossible to boot the kernel, until the loongarch specific portions get accepted into the UEFI subsystem - The drivers/irqchip/irq-loongson-*.c drivers are shared with the the MIPS port, but currently lack support for ACPI based booting, which will get merged through the irqchip subsystem. - Similarly, the drivers/pci/controller/pci-loongson.c needs to be modified for ACPI support, which will be merged through the PCI subsystem. While the port cannot actually be used before all the above are merged, having it in 5.19 helps to establish the user space ABI for the libc ports to build on, and to help any treewide changes in the mainline kernel get applied here as well. A gcc-12 based tool chains for build testing is now included in https://mirrors.edge.kernel.org/pub/tools/crosstool/" Original description from Huacai Chen: "LoongArch is a new RISC ISA, which is a bit like MIPS or RISC-V. LoongArch includes a reduced 32-bit version (LA32R), a standard 32-bit version (LA32S) and a 64-bit version (LA64). LoongArch use ACPI as its boot protocol LoongArch-specific interrupt controllers (similar to APIC) are already added in the next revision of ACPI Specification (current revision is 6.4). This patchset is adding basic LoongArch support in mainline kernel, we can see a complete snapshot here: https://github.com/loongson/linux/tree/loongarch-next https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson.git/log/?h=loongarch-next Cross-compile tool chain to build kernel: https://github.com/loongson/build-tools/releases/download/2021.12.21/loongarch64-clfs-2022-03-03-cross-tools-gcc-glibc.tar.xz A CLFS-based Linux distro: https://github.com/loongson/build-tools/releases/download/2021.12.21/loongarch64-clfs-system-2022-03-03.tar.bz2 Open-source tool chain which is under review (Binutils and Gcc are already upstream): https://github.com/loongson/binutils-gdb/tree/upstream_v3.1 https://github.com/loongson/gcc/tree/loongarch_upstream_v6.3 https://github.com/loongson/glibc/tree/loongarch_2_35_dev_v2.2 Loongson and LoongArch documentations: https://github.com/loongson/LoongArch-Documentation LoongArch-specific interrupt controllers: https://mantis.uefi.org/mantis/view.php?id=2203 https://mantis.uefi.org/mantis/view.php?id=2313" Link: https://lore.kernel.org/lkml/20220603072053.35005-1-chenhuacai@loongson.cn/ * tag 'loongarch-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (24 commits) MAINTAINERS: Add maintainer information for LoongArch LoongArch: Add Loongson-3 default config file LoongArch: Add Non-Uniform Memory Access (NUMA) support LoongArch: Add multi-processor (SMP) support LoongArch: Add VDSO and VSYSCALL support LoongArch: Add some library functions LoongArch: Add misc common routines LoongArch: Add ELF and module support LoongArch: Add signal handling support LoongArch: Add system call support LoongArch: Add memory management LoongArch: Add process management LoongArch: Add exception/interrupt handling LoongArch: Add boot and setup routines LoongArch: Add other common headers LoongArch: Add atomic/locking headers LoongArch: Add CPU definition headers LoongArch: Add build infrastructure LoongArch: Add writecombine support for drm LoongArch: Add ELF-related definitions ...
2022-06-04Merge tag 'arm64-fixes' of ↵Linus Torvalds3-5/+6
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: "Most of issues addressed were introduced during this merging window. - Initialise jump labels before setup_machine_fdt(), needed by commit f5bda35fba61 ("random: use static branch for crng_ready()"). - Sparse warnings: missing prototype, incorrect __user annotation. - Skip SVE kselftest if not sufficient vector lengths supported" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: kselftest/arm64: signal: Skip SVE signal test if not enough VLs supported arm64: Initialize jump labels before setup_machine_fdt() arm64: hibernate: Fix syntax errors in comments arm64: Remove the __user annotation for the restore_za_context() argument ftrace/fgraph: fix increased missing-prototypes warnings