2022-12-28  rxrpc: Fix a couple of potential use-after-frees  (David Howells; 2 files, -9/+11)
At the end of rxrpc_recvmsg(), if a call is found, the call is put and then a trace line is emitted referencing that call in a couple of places - but the call may have been deallocated by the time those traces happen. Fix this by stashing the call debug_id in a variable and passing that to the tracepoint rather than the call pointer. Fixes: 849979051cbc ("rxrpc: Add a tracepoint to follow what recvmsg does") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
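A minimal sketch of the fix pattern described above; the tracepoint arguments are assumed from the message rather than copied from the rxrpc hunk:

	unsigned int call_debug_id = call->debug_id;	/* stash while the ref is held */

	rxrpc_put_call(call, rxrpc_call_put);		/* may free 'call' */
	trace_rxrpc_recvmsg(call_debug_id, rxrpc_recvmsg_return, 0);	/* no deref of 'call' */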
2022-12-28  fbdev: atyfb: use strscpy() instead of strncpy()  (Xu Panda; 1 file, -2/+1)
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL-terminated strings. Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Yang Yang <yang.yang29@zte.com> Signed-off-by: Helge Deller <deller@gmx.de>
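As a hedged illustration of the conversion (the field names are assumptions, not the exact atyfb hunk), the typical -2/+1 change collapses a strncpy()-plus-manual-termination pair into a single call:

	-	strncpy(info->fix.id, name, sizeof(info->fix.id) - 1);
	-	info->fix.id[sizeof(info->fix.id) - 1] = '\0';
	+	strscpy(info->fix.id, name, sizeof(info->fix.id));

strscpy() always NUL-terminates the destination and returns -E2BIG on truncation, so the manual termination becomes unnecessary.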
2022-12-28  fbdev: omapfb: use strscpy() instead of strncpy()  (Xu Panda; 1 file, -3/+2)
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL-terminated strings. Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Yang Yang <yang.yang29@zte.com> Signed-off-by: Helge Deller <deller@gmx.de>
2022-12-28  fbdev: make offb driver tristate  (Randy Dunlap; 1 file, -2/+2)
Make the offb (Open Firmware frame buffer) driver tristate, i.e. buildable as a loadable module. It still depends on the setting of DRM_OFDRM, so that the two drivers cannot both be builtin, nor can one be builtin while the other is a loadable module. Build-tested successfully with all combinations of DRM_OFDRM and FB_OF. This fixes a build issue that Michal reported when FB_OF=y and DRM_OFDRM=m:

	powerpc64-linux-ld: drivers/video/fbdev/offb.o:(.data.rel.ro+0x58): undefined reference to `cfb_fillrect'
	powerpc64-linux-ld: drivers/video/fbdev/offb.o:(.data.rel.ro+0x60): undefined reference to `cfb_copyarea'
	powerpc64-linux-ld: drivers/video/fbdev/offb.o:(.data.rel.ro+0x68): undefined reference to `cfb_imageblit'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Arnd Bergmann <arnd@arndb.de> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Michal Suchánek <msuchanek@suse.de> Cc: linuxppc-dev@lists.ozlabs.org Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Helge Deller <deller@gmx.de> Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Helge Deller <deller@gmx.de>
2022-12-27  uapi:io_uring.h: allow linux/time_types.h to be skipped  (Stefan Metzmacher; 1 file, -0/+8)
include/uapi/linux/io_uring.h is synced 1:1 into liburing:src/include/liburing/io_uring.h. liburing has a configure check to detect the need for linux/time_types.h. It can opt out by defining UAPI_LINUX_IO_URING_H_SKIP_LINUX_TIME_TYPES_H. Fixes: 78a861b94959 ("io_uring: add sync cancelation API through io_uring_register()") Link: https://github.com/axboe/liburing/issues/708 Link: https://github.com/axboe/liburing/pull/709 Link: https://lore.kernel.org/io-uring/20221115212614.1308132-1-ammar.faizi@intel.com/T/#m9f5dd571cd4f6a5dee84452dbbca3b92ba7a4091 CC: Jens Axboe <axboe@kernel.dk> Cc: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Link: https://lore.kernel.org/r/7071a0a1d751221538b20b63f9160094fc7e06f4.1668630247.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
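The opt-out guard is a small conditional include; a sketch of how it reads in the header:

	#ifndef UAPI_LINUX_IO_URING_H_SKIP_LINUX_TIME_TYPES_H
	#include <linux/time_types.h>
	#endif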
2022-12-27  futex: Fix futex_waitv() hrtimer debug object leak on kcalloc error  (Mathieu Desnoyers; 1 file, -4/+7)
In a scenario where kcalloc() fails to allocate memory, the futex_waitv system call immediately returns -ENOMEM without invoking destroy_hrtimer_on_stack(). When CONFIG_DEBUG_OBJECTS_TIMERS=y, this results in leaking a timer debug object. Fixes: bf69bad38cf6 ("futex: Implement sys_futex_waitv()") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Cc: stable@vger.kernel.org # v5.16+ Link: https://lore.kernel.org/r/20221214222008.200393-1-mathieu.desnoyers@efficios.com
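A sketch of the corrected error path, assuming the structure implied by the message (the label name and locals are illustrative):

	futexv = kcalloc(nr_futexes, sizeof(*futexv), GFP_KERNEL);
	if (!futexv) {
		ret = -ENOMEM;
		goto destroy_timer;	/* was: return -ENOMEM, leaking the debug object */
	}
	...
	destroy_timer:
		if (timeout)
			destroy_hrtimer_on_stack(&to.timer);
		return ret;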
2022-12-27  x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK  (Masami Hiramatsu (Google); 1 file, -20/+8)
Since CONFIG_RETHUNK and CONFIG_SLS use INT3 to stop speculative execution after a function return, kprobe jump optimization always fails on functions that contain such an INT3 inside the function body. (The code already checks the INT3 padding between functions, but not inside a function.) To avoid this issue, as with regular kprobes, check whether the INT3 comes from kgdb; if so, stop decoding and make the optimization fail. Any other INT3 comes from CONFIG_RETHUNK/CONFIG_SLS and can be treated as a one-byte instruction. Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/167146051929.1374301.7419382929328081706.stgit@devnote3
2022-12-27  x86/kprobes: Fix kprobes instruction boundary check with CONFIG_RETHUNK  (Masami Hiramatsu (Google); 1 file, -3/+7)
Since CONFIG_RETHUNK and CONFIG_SLS use INT3 to stop speculative execution after a RET instruction, kprobes always fails its probed-instruction boundary check when it decodes a function body in which the probed address comes after such a sequence. (Note that the compiler may place some conditional code blocks after the function return when it decides they are not on the hot path.) This happens because kprobes assumes any INT3 is a kgdb software breakpoint that has replaced an original instruction to be recovered later; these INT3s serve no such purpose and there is no original instruction to recover. To avoid this issue, kprobes checks whether the INT3 is owned by kgdb; if so, it stops decoding and fails the check. Any other INT3 comes from CONFIG_RETHUNK/CONFIG_SLS and can be treated as a one-byte instruction. Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/167146051026.1374301.392728975473572291.stgit@devnote3
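For both kprobes fixes above, the decoder tweak boils down to the following hedged sketch (the kgdb helper and surrounding locals are assumptions):

	#ifdef CONFIG_KGDB
		/*
		 * A dynamically installed kgdb software breakpoint hides the
		 * original instruction, so the address must not be probed.
		 */
		if (insn.opcode.bytes[0] == INT3_INSN_OPCODE &&
		    kgdb_has_hit_break(addr))
			return 0;
	#endif
		/*
		 * Any other INT3 is CONFIG_RETHUNK/CONFIG_SLS padding: a real
		 * one-byte instruction, so decoding simply continues past it.
		 */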
2022-12-27  x86/calldepth: Fix incorrect init section references  (Arnd Bergmann; 1 file, -2/+2)
The addition of callthunks_translate_call_dest means that skip_addr() and patch_dest() can no longer be discarded as part of the __init section freeing:

	WARNING: modpost: vmlinux.o: section mismatch in reference: callthunks_translate_call_dest.cold (section: .text.unlikely) -> skip_addr (section: .init.text)
	WARNING: modpost: vmlinux.o: section mismatch in reference: callthunks_translate_call_dest.cold (section: .text.unlikely) -> patch_dest (section: .init.text)
	WARNING: modpost: vmlinux.o: section mismatch in reference: is_callthunk.cold (section: .text.unlikely) -> skip_addr (section: .init.text)
	ERROR: modpost: Section mismatches detected. Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them.

Fixes: b2e9dfe54be4 ("x86/bpf: Emit call depth accounting if required") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221215164334.968863-1-arnd@kernel.org
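Given the two-line diffstat, the fix plausibly just drops the __init markers; the signatures below are sketched from the warning text, not copied from the patch:

	-static __init bool skip_addr(void *dest)
	+static bool skip_addr(void *dest)

	-static __init void *patch_dest(void *dest, bool direct)
	+static void *patch_dest(void *dest, bool direct)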
2022-12-27  perf/core: Call LSM hook after copying perf_event_attr  (Namhyung Kim; 1 file, -3/+3)
perf_event_open() passes the attr struct to security_perf_event_open(), but at that point attr has not yet been copied from userspace, i.e. it is still uninitialized. Fixes: da97e18458fb ("perf_event: Add support for LSM and SELinux checks") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221220223140.4020470-1-namhyung@kernel.org
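A sketch of the reordering in perf_event_open(); the helper names follow the upstream functions referenced here, but the surrounding context is simplified:

	/* before: the LSM inspects attr while it is still uninitialized */
	err = security_perf_event_open(&attr, PERF_SECURITY_OPEN);
	...
	err = perf_copy_attr(attr_uptr, &attr);

	/* after: copy attr from userspace first, then run the LSM hook */
	err = perf_copy_attr(attr_uptr, &attr);
	if (err)
		return err;
	err = security_perf_event_open(&attr, PERF_SECURITY_OPEN);
	if (err)
		return err;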
2022-12-27  perf: Fix use-after-free in error path  (Peter Zijlstra; 1 file, -1/+3)
The syscall error path has a use-after-free; put_pmu_ctx() will reference ctx, therefore we must ensure ctx is destroyed after pmu_ctx is. Fixes: bd2756811766 ("perf: Rewrite core context handling") Reported-by: syzbot+b8e8c01c8ade4fe6e48f@syzkaller.appspotmail.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lkml.kernel.org/r/Y6B3xEgkbmFUCeni@hirez.programming.kicks-ass.net
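The ordering constraint, as a hedged sketch of the error path:

	/* put_pmu_ctx() dereferences ctx, so release the pmu_ctx first ... */
	put_pmu_ctx(event->pmu_ctx);
	/* ... and only then allow ctx itself to be destroyed */
	put_ctx(ctx);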
2022-12-27  perf/x86/amd: fix potential integer overflow on shift of an int  (Colin Ian King; 1 file, -1/+1)
The left shift of the 32-bit integer constant 1 is evaluated using 32-bit arithmetic and then passed as a 64-bit function argument. In cases where i is 32 or more, this can lead to an overflow. Avoid this by shifting with the BIT_ULL() macro instead. Fixes: 471af006a747 ("perf/x86/amd: Constrain Large Increment per Cycle events") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Kim Phillips <kim.phillips@amd.com> Link: https://lore.kernel.org/r/20221202135149.1797974-1-colin.i.king@gmail.com
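Illustrative only (the exact expression in the driver differs):

	u64 bad  = 1 << i;	/* 32-bit shift: overflows once i >= 32 */
	u64 good = BIT_ULL(i);	/* expands to (1ULL << (i)): done in 64 bits */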
2022-12-27  perf/core: Fix cgroup events tracking  (Chengming Zhou; 1 file, -32/+10)
We encounter perf warnings when using cgroup events like:

	cd /sys/fs/cgroup
	mkdir test
	perf stat -e cycles -a -G test

Which then triggers:

	WARNING: CPU: 0 PID: 690 at kernel/events/core.c:849 perf_cgroup_switch+0xb2/0xc0
	Call Trace:
	 <TASK>
	 __schedule+0x4ae/0x9f0
	 ? _raw_spin_unlock_irqrestore+0x23/0x40
	 ? __cond_resched+0x18/0x20
	 preempt_schedule_common+0x2d/0x70
	 __cond_resched+0x18/0x20
	 wait_for_completion+0x2f/0x160
	 ? cpu_stop_queue_work+0x9e/0x130
	 affine_move_task+0x18a/0x4f0

	WARNING: CPU: 0 PID: 690 at kernel/events/core.c:829 ctx_sched_in+0x1cf/0x1e0
	Call Trace:
	 <TASK>
	 ? ctx_sched_out+0xb7/0x1b0
	 perf_cgroup_switch+0x88/0xc0
	 __schedule+0x4ae/0x9f0
	 ? _raw_spin_unlock_irqrestore+0x23/0x40
	 ? __cond_resched+0x18/0x20
	 preempt_schedule_common+0x2d/0x70
	 __cond_resched+0x18/0x20
	 wait_for_completion+0x2f/0x160
	 ? cpu_stop_queue_work+0x9e/0x130
	 affine_move_task+0x18a/0x4f0

The two warnings above are truncated; unimportant parts have been removed. The problem is caused by the perf cgroup events tracking:

	CPU0						CPU1
	perf_event_open()
	  perf_event_alloc()
	    account_event()
	      account_event_cpu()
	        atomic_inc(perf_cgroup_events)
							__perf_event_task_sched_out()
							  if (atomic_read(perf_cgroup_events))
							    perf_cgroup_switch()
							      // kernel/events/core.c:849
							      WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0)
							      if (READ_ONCE(cpuctx->cgrp) == cgrp) // false
							        return
							      perf_ctx_lock()
							      ctx_sched_out()
							      cpuctx->cgrp = cgrp
							      ctx_sched_in()
							        perf_cgroup_set_timestamp()
							          // kernel/events/core.c:829
							          WARN_ON_ONCE(!ctx->nr_cgroups)
							      perf_ctx_unlock()
	perf_install_in_context()
	  cpu_function_call()
	    __perf_install_in_context()
	      add_event_to_ctx()
	        list_add_event()
	          perf_cgroup_event_enable()
	            ctx->nr_cgroups++
	            cpuctx->cgrp = X

We can see from the above that the percpu atomic perf_cgroup_events is wrongly used to decide whether perf_cgroup_switch() is needed; it should only be consulted when we know this CPU has cgroup events enabled. Since commit bd2756811766 ("perf: Rewrite core context handling") there is only one context per CPU, so we can just use cpuctx->cgrp to check whether this CPU has cgroup events enabled; the percpu atomic perf_cgroup_events is not needed. Fixes: bd2756811766 ("perf: Rewrite core context handling") Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lkml.kernel.org/r/20221207124023.66252-1-zhouchengming@bytedance.com
2022-12-27  perf core: Return error pointer if inherit_event() fails to find pmu_ctx  (Ravi Bangoria; 1 file, -1/+1)
inherit_event() returns NULL only when it finds orphaned events; otherwise it returns either a valid child_event pointer or an error pointer. Do the same when it fails to find a pmu_ctx. Fixes: bd2756811766 ("perf: Rewrite core context handling") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221118051539.820-1-ravi.bangoria@amd.com
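A hedged sketch of the one-line change; the surrounding code and call shape are assumptions:

	pmu_ctx = find_get_pmu_context(pmu, child_ctx, child_event);
	if (IS_ERR(pmu_ctx)) {
		free_event(child_event);
	-	return NULL;	/* looks like the "orphaned event" case to callers */
	+	return ERR_PTR(PTR_ERR(pmu_ctx));
	}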
2022-12-27  x86/hyperv: Add HV_EXPOSE_INVARIANT_TSC define  (Vitaly Kuznetsov; 2 files, -1/+4)
Avoid open coding BIT(0) of HV_X64_MSR_TSC_INVARIANT_CONTROL by adding a dedicated define. While there's only one user at this moment, the upcoming KVM implementation of the Hyper-V Invariant TSC feature will need to use it as well. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013095849.705943-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
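A sketch of the addition and its use; the exact call site is an assumption:

	#define HV_EXPOSE_INVARIANT_TSC		BIT(0)

	wrmsrl(HV_X64_MSR_TSC_INVARIANT_CONTROL, HV_EXPOSE_INVARIANT_TSC);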
2022-12-27KVM: x86/mmu: Pivot on "TDP MMU enabled" when handling direct page faultsSean Christopherson1-1/+1
When handling direct page faults, pivot on the TDP MMU being globally enabled instead of checking if the target MMU is a TDP MMU. Now that the TDP MMU is all-or-nothing, if the TDP MMU is enabled, KVM will reach direct_page_fault() if and only if the MMU is a TDP MMU. When TDP is enabled (obviously required for the TDP MMU), only non-nested TDP page faults reach direct_page_fault(), i.e. nonpaging MMUs are impossible, as NPT requires paging to be enabled and EPT faults use ept_page_fault(). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221012181702.3663607-8-seanjc@google.com> [Use tdp_mmu_enabled variable. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27KVM: x86/mmu: Pivot on "TDP MMU enabled" to check if active MMU is TDP MMUSean Christopherson2-21/+8
Simplify and optimize the logic for detecting if the current/active MMU is a TDP MMU. If the TDP MMU is globally enabled, then the active MMU is a TDP MMU if it is direct. When TDP is enabled, so called nonpaging MMUs are never used as the only form of shadow paging KVM uses is for nested TDP, and the active MMU can't be direct in that case. Rename the helper and take the vCPU instead of an arbitrary MMU, as nonpaging MMUs can show up in the walk_mmu if L1 is using nested TDP and L2 has paging disabled. Taking the vCPU has the added bonus of cleaning up the callers, all of which check the current MMU but wrap code that consumes the vCPU. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221012181702.3663607-9-seanjc@google.com> [Use tdp_mmu_enabled variable. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
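A minimal sketch of the simplified predicate, assuming the helper shape implied by the message (the name and field layout are assumptions):

	static bool is_tdp_mmu_active(struct kvm_vcpu *vcpu)
	{
		/* With the TDP MMU enabled, a direct active MMU is a TDP MMU. */
		return tdp_mmu_enabled && vcpu->arch.mmu->root_role.direct;
	}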
2022-12-27  KVM: x86/mmu: Replace open coded usage of tdp_mmu_page with is_tdp_mmu_page()  (Sean Christopherson; 2 files, -2/+2)
Use is_tdp_mmu_page() instead of querying sp->tdp_mmu_page directly so that all users benefit if KVM ever finds a way to optimize the logic. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221012181702.3663607-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/mmu: Rename __direct_map() to direct_map()  (David Matlack; 2 files, -8/+8)
Rename __direct_map() to direct_map() since the leading underscores are unnecessary. This also makes the page fault handler names more consistent: kvm_tdp_mmu_page_fault() calls kvm_tdp_mmu_map() and direct_page_fault() calls direct_map(). Opportunistically make some trivial cleanups to comments that had to be modified anyway since they mentioned __direct_map(). Specifically, use "()" when referring to functions, and include kvm_tdp_mmu_map() among the various callers of disallowed_hugepage_adjust(). No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-11-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/mmu: Stop needlessly making MMU pages available for TDP MMU faults  (David Matlack; 1 file, -4/+0)
Stop calling make_mmu_pages_available() when handling TDP MMU faults. The TDP MMU does not participate in the "available MMU pages" tracking and limiting so calling this function is unnecessary work when handling TDP MMU faults. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-10-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/mmu: Split out TDP MMU page fault handling  (David Matlack; 1 file, -14/+48)
Split out the page fault handling for the TDP MMU to a separate function. This creates some duplicate code, but makes the TDP MMU fault handler simpler to read by eliminating branches and will enable future cleanups by allowing the TDP MMU and non-TDP MMU fault paths to diverge. Only compile in the TDP MMU fault handler for 64-bit builds since kvm_tdp_mmu_map() does not exist in 32-bit builds. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-9-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/mmu: Initialize fault.{gfn,slot} earlier for direct MMUs  (David Matlack; 2 files, -4/+6)
Move the initialization of fault.{gfn,slot} earlier in the page fault handling code for fully direct MMUs. This will enable a future commit to split out TDP MMU page fault handling without needing to duplicate the initialization of these 2 fields. Opportunistically take advantage of the fact that fault.gfn is initialized in kvm_tdp_page_fault() rather than recomputing it from fault->addr. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-8-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/mmu: Handle no-slot faults in kvm_faultin_pfn()  (David Matlack; 2 files, -31/+31)
Handle faults on GFNs that do not have a backing memslot in kvm_faultin_pfn() and drop handle_abnormal_pfn(). This eliminates duplicate code in the various page fault handlers. Opportunistically tweak the comment about handling gfn > host.MAXPHYADDR to reflect that the effect of returning RET_PF_EMULATE at that point is to avoid creating an MMIO SPTE for such GFNs. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-7-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/mmu: Avoid memslot lookup during KVM_PFN_ERR_HWPOISON handling  (David Matlack; 1 file, -8/+10)
Pass the kvm_page_fault struct down to kvm_handle_error_pfn() to avoid a memslot lookup when handling KVM_PFN_ERR_HWPOISON. Opportunistically move the gfn_to_hva_memslot() call and @current down into kvm_send_hwpoison_signal() to cut down on line lengths. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-6-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/mmu: Handle error PFNs in kvm_faultin_pfn()  (David Matlack; 1 file, -5/+10)
Handle error PFNs in kvm_faultin_pfn() rather than relying on the caller to invoke handle_abnormal_pfn() after kvm_faultin_pfn(). Opportunistically rename kvm_handle_bad_page() to kvm_handle_error_pfn() to make it more consistent with is_error_pfn(). This commit moves KVM closer to being able to drop handle_abnormal_pfn(), which will reduce the amount of duplicate code in the various page fault handlers. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-5-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
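A sketch of the consolidated check as it would sit inside kvm_faultin_pfn(); the exact placement is an assumption:

	if (unlikely(is_error_pfn(fault->pfn)))
		return kvm_handle_error_pfn(vcpu, fault);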
2022-12-27  KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn()  (David Matlack; 3 files, -14/+14)
Grab mmu_invalidate_seq in kvm_faultin_pfn() and stash it in struct kvm_page_fault. This eliminates duplicate code and reduces the number of parameters needed for is_page_fault_stale(). Preemptively split out __kvm_faultin_pfn() to a separate function for use in subsequent commits. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/mmu: Move TDP MMU VM init/uninit behind tdp_mmu_enabled  (David Matlack; 3 files, -14/+10)
Move kvm_mmu_{init,uninit}_tdp_mmu() behind tdp_mmu_enabled. This makes these functions consistent with the rest of the calls into the TDP MMU from mmu.c, and is now possible since tdp_mmu_enabled is only modified when the x86 vendor module is loaded, i.e. it will never change during the lifetime of a VM. This change also enables removing the stub definitions for 32-bit KVM, as the compiler will just optimize the calls out like it does for all the other TDP MMU functions. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/mmu: Change tdp_mmu to a read-only parameter  (David Matlack; 4 files, -38/+41)
Change tdp_mmu to a read-only parameter and drop the per-VM tdp_mmu_enabled. For 32-bit KVM, make tdp_mmu_enabled a macro that is always false so that the compiler can continue omitting calls to the TDP MMU. The TDP MMU was introduced in 5.10 and has been enabled by default since 5.15. At this point there are no known functionality gaps between the TDP MMU and the shadow MMU, and the TDP MMU uses less memory and scales better with the number of vCPUs. In other words, there is no good reason to disable the TDP MMU on a live system. Purposely do not drop tdp_mmu=N support (i.e. do not force 64-bit KVM to always use the TDP MMU) since tdp_mmu=N is still used to get test coverage of KVM's shadow MMU TDP support, which is used in 32-bit KVM. Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220921173546.2674386-2-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
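A sketch of the 32-bit stub described above (declaration details assumed):

	#ifdef CONFIG_X86_64
	extern bool tdp_mmu_enabled;	/* set once at vendor-module load */
	#else
	#define tdp_mmu_enabled false	/* lets the compiler elide TDP MMU calls */
	#endif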
2022-12-27  KVM: selftests: x86: Use TAP interface in the tsc_msrs_test  (Thomas Huth; 1 file, -4/+12)
Let's add some output here so that the user has some feedback about what is being run. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20221004093131.40392-4-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: selftests: Use TAP interface in the kvm_binary_stats_test  (Thomas Huth; 1 file, -2/+9)
The kvm_binary_stats_test test currently does not have any output (unless one of the TEST_ASSERT statement fails), so it's hard to say for a user how far it did proceed already. Thus let's make this a little bit more user-friendly and include some TAP output via the kselftest.h interface. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Message-Id: <20221004093131.40392-2-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
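A hedged sketch of the kselftest.h TAP pattern both selftest patches adopt; the plan count and messages are illustrative:

	#include "kselftest.h"

	int main(void)
	{
		ksft_print_header();
		ksft_set_plan(2);			/* number of reported steps */
		/* ... run a stage ... */
		ksft_test_result_pass("stage 1 completed\n");
		/* ... run another stage ... */
		ksft_test_result_pass("stage 2 completed\n");
		ksft_finished();	/* exits, comparing passes against the plan */
	}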
2022-12-27  kvm: x86/mmu: Warn on linking when sp->unsync_children  (Lai Jiangshan; 1 file, -1/+10)
Since commit 65855ed8b034 ("KVM: X86: Synchronize the shadow pagetable before link it"), no sp is ever linked with sp->unsync_children = 1, so WARN if that is the case. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Message-Id: <20221212090106.378206-1-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: VMX: Resurrect vmcs_conf sanitization for KVM-on-Hyper-V  (Vitaly Kuznetsov; 3 files, -0/+40)
Commit 9bcb90650e31 ("KVM: VMX: Get rid of eVMCS specific VMX controls sanitization") dropped 'vmcs_conf' sanitization for KVM-on-Hyper-V because there is no known Hyper-V version which would expose a feature unsupported in eVMCS in VMX feature MSRs. This works well for all currently existing Hyper-V versions; however, future Hyper-V versions may add features which are supported by KVM and are currently missing from the eVMCSv1 definition (e.g. APIC virtualization, PML, ...). When this happens, existing KVMs will get broken. With the inverted 'unsupported by eVMCSv1' checks, we can resurrect vmcs_conf sanitization and make KVM future proof. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20221104144708.435865-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: nVMX: Prepare to sanitize tertiary execution controls with eVMCS  (Vitaly Kuznetsov; 2 files, -0/+6)
In preparation to restoring vmcs_conf sanitization for KVM-on-Hyper-V, (and for completeness) add tertiary VM-execution controls to 'evmcs_supported_ctrls'. No functional change intended as KVM doesn't yet expose MSR_IA32_VMX_PROCBASED_CTLS3 to its guests. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20221104144708.435865-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: nVMX: Invert 'unsupported by eVMCSv1' check  (Vitaly Kuznetsov; 2 files, -36/+96)
When a new feature gets implemented in KVM, the EVMCS1_UNSUPPORTED_* defines need to be adjusted to avoid the situation where the feature is exposed to the guest but there is no corresponding eVMCS field for it. This is non-obvious and fragile. Invert the 'unsupported by eVMCSv1' check and make it 'supported by eVMCSv1' instead; this way it's much harder to make a mistake. New features will get added to the EVMCS1_SUPPORTED_* defines when the corresponding fields are added to the eVMCS definition. No functional change intended. EVMCS1_SUPPORTED_* defines are composed by taking the KVM_{REQUIRED,OPTIONAL}_VMX_ defines and filtering out what was previously known as EVMCS1_UNSUPPORTED_*. Of all the controls, SECONDARY_EXEC_TSC_SCALING requires special handling, as it's actually present in the eVMCSv1 definition but is not currently supported for Hyper-V-on-KVM, only for KVM-on-Hyper-V. As evmcs_supported_ctrls will be used for both scenarios, just add it there instead of EVMCS1_SUPPORTED_2NDEXEC. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20221104144708.435865-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: nVMX: Sanitize primary processor-based VM-execution controls with eVMCS too  (Vitaly Kuznetsov; 1 file, -0/+12)
The only unsupported primary processor-based VM-execution control at the moment is CPU_BASED_ACTIVATE_TERTIARY_CONTROLS, and KVM doesn't expose it in nested VMX feature MSRs anyway (see nested_vmx_setup_ctls_msrs()). Still, in preparation for inverting the "unsupported with eVMCS" checks (and for completeness), it's better to sanitize MSR_IA32_VMX_PROCBASED_CTLS/MSR_IA32_VMX_TRUE_PROCBASED_CTLS too. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20221104144708.435865-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/xen: Documentation updates and clarifications  (David Woodhouse; 1 file, -15/+26)
Most notably, the KVM_XEN_EVTCHN_RESET feature had escaped documentation entirely. Document it, along with how to turn most things off on SHUTDOWN_soft_reset. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20221226120320.1125390-6-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/xen: Add KVM_XEN_INVALID_GPA and KVM_XEN_INVALID_GFN to uapi  (David Woodhouse; 2 files, -7/+10)
These (uint64_t)-1 magic values are a userspace ABI, allowing the shared info pages and other enlightenments to be disabled. This isn't a Xen ABI because Xen doesn't let the guest turn these off except with the full SHUTDOWN_soft_reset mechanism. Under KVM, the userspace VMM is expected to handle soft reset and tear down the kernel parts of the enlightenments accordingly. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20221226120320.1125390-5-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
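A sketch of the new uapi constants, using the names from the subject and the (uint64_t)-1 value from the message:

	#define KVM_XEN_INVALID_GFN	((__u64)-1)
	#define KVM_XEN_INVALID_GPA	((__u64)-1)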
2022-12-27  KVM: x86/xen: Simplify eventfd IOCTLs  (Michal Luczaj; 1 file, -7/+1)
Port number is validated in kvm_xen_setattr_evtchn(). Remove superfluous checks in kvm_xen_eventfd_assign() and kvm_xen_eventfd_update(). Signed-off-by: Michal Luczaj <mhal@rbox.co> Message-Id: <20221222203021.1944101-3-mhal@rbox.co> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20221226120320.1125390-4-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/xen: Fix SRCU/RCU usage in readers of evtchn_ports  (Paolo Bonzini; 1 file, -11/+18)
The evtchnfd structure itself must be protected by either kvm->lock or SRCU. Use the former in kvm_xen_eventfd_update(), since the lock is being taken anyway; kvm_xen_hcall_evtchn_send() instead is a reader and does not need kvm->lock, and is called in an SRCU critical section from the kvm_x86_handle_exit function. It is also important to use rcu_read_{lock,unlock}() in kvm_xen_hcall_evtchn_send(), because idr_remove() will *not* use synchronize_srcu() to wait for readers to complete. Remove a superfluous if (kvm) check before calling synchronize_srcu() in kvm_xen_eventfd_deassign(), where kvm has been dereferenced already. Co-developed-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20221226120320.1125390-3-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
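The reader side, as a hedged sketch (the idr field name comes from the subject; the lookup shape and consumer are assumptions):

	rcu_read_lock();
	evtchnfd = idr_find(&kvm->arch.xen.evtchn_ports, port);
	if (evtchnfd)
		ret = deliver_evtchn(evtchnfd);	/* illustrative consumer */
	rcu_read_unlock();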
2022-12-27  KVM: x86/xen: Use kvm_read_guest_virt() instead of open-coding it badly  (David Woodhouse; 1 file, -38/+18)
In particular, we shouldn't assume that being contiguous in guest virtual address space means being contiguous in guest *physical* address space. In dropping the manual calls to kvm_mmu_gva_to_gpa_system(), also drop the srcu_read_lock() that was around them. All call sites are reached from kvm_xen_hypercall(), which is called from the handle_exit function with the read lock already held. Fixes: 536395260 ("KVM: x86/xen: handle PV timers oneshot mode") Fixes: 1a65105a5 ("KVM: x86/xen: handle PV spinlocks slowpath") Fixes: 2fd6df2f2 ("KVM: x86/xen: intercept EVTCHNOP_send from guests") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20221226120320.1125390-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: x86/xen: Fix memory leak in kvm_xen_write_hypercall_page()  (Michal Luczaj; 1 file, -3/+4)
Release the page irrespective of the kvm_vcpu_write_guest() return value. Suggested-by: Paul Durrant <paul@xen.org> Fixes: 23200b7a30de ("KVM: x86/xen: intercept xen hypercalls if enabled") Signed-off-by: Michal Luczaj <mhal@rbox.co> Message-Id: <20221220151454.712165-1-mhal@rbox.co> Reviewed-by: Paul Durrant <paul@xen.org> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20221226120320.1125390-1-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
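A sketch of the reordering (variable names assumed):

	-	if (kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE))
	-		return 1;		/* 'page' leaked on this path */
	-	kfree(page);
	+	ret = kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE);
	+	kfree(page);			/* freed on success and failure alike */
	+	if (ret)
	+		return 1;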
2022-12-27  KVM: Delete extra block of "};" in the KVM API documentation  (Sean Christopherson; 1 file, -5/+0)
Delete an extra block of code/documentation that snuck in when KVM's documentation was converted to ReST format. Fixes: 106ee47dc633 ("docs: kvm: Convert api.txt to ReST format") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221207003637.2041211-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  kvm: x86/mmu: Remove duplicated "be split" in spte.h  (Lai Jiangshan; 1 file, -1/+1)
"be split be split" -> "be split" Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Message-Id: <20221207120505.9175-1-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  kvm: Remove the unused macro KVM_MMU_READ_{,UN}LOCK()  (Lai Jiangshan; 1 file, -4/+0)
No code is using KVM_MMU_READ_LOCK() or KVM_MMU_READ_UNLOCK(). They used to be in virt/kvm/pfncache.c:

	KVM_MMU_READ_LOCK(kvm);
	retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
	KVM_MMU_READ_UNLOCK(kvm);

However, since 58cd407ca4c6 ("KVM: Fix multiple races in gfn=>pfn cache refresh", 2022-05-25) the code relies only on the MMU notifier's invalidation count and sequence number. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Message-Id: <20221207120617.9409-1-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  MAINTAINERS: adjust entry after renaming the vmx hyperv files  (Lukas Bulwahn; 1 file, -1/+1)
Commit a789aeba4196 ("KVM: VMX: Rename "vmx/evmcs.{ch}" to "vmx/hyperv.{ch}"") renames the VMX specific Hyper-V files, but does not adjust the entry in MAINTAINERS. Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a broken reference. Repair this file reference in KVM X86 HYPER-V (KVM/hyper-v). Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Fixes: a789aeba4196 ("KVM: VMX: Rename "vmx/evmcs.{ch}" to "vmx/hyperv.{ch}"") Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221205082044.10141-1-lukas.bulwahn@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: selftests: Mark correct page as mapped in virt_map()  (Oliver Upton; 1 file, -2/+2)
The loop marks vaddr as mapped after incrementing it by page size, thereby marking the *next* page as mapped. Set the bit in vpages_mapped first instead. Fixes: 56fc7732031d ("KVM: selftests: Fill in vm->vpages_mapped bitmap in virt_map() too") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Message-Id: <20221209015307.1781352-4-oliver.upton@linux.dev> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
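A sketch of the corrected loop (helper names from the selftests library; the loop framing is simplified):

	while (npages--) {
		virt_pg_map(vm, vaddr, paddr);
		sparsebit_set(vm->vpages_mapped, vaddr >> vm->page_shift); /* this page */
		vaddr += vm->page_size;		/* advance only after marking */
		paddr += vm->page_size;
	}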
2022-12-27  KVM: arm64: selftests: Don't identity map the ucall MMIO hole  (Oliver Upton; 1 file, -2/+4)
Currently the ucall MMIO hole is placed immediately after slot0, which is a relatively safe address in the PA space. However, it is possible that the same address has already been used for something else (like the guest program image) in the VA space. At least in my own testing, building the vgic_irq test with clang leads to the MMIO hole appearing underneath gicv3_ops. Stop identity mapping the MMIO hole and instead find an unused VA to map to it. Yet another subtle detail of the KVM selftests library is that virt_pg_map() does not update vm->vpages_mapped. Switch over to virt_map() instead, to guarantee that the chosen VA isn't later handed out to something else. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Message-Id: <20221209015307.1781352-6-oliver.upton@linux.dev> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: selftests: document the default implementation of vm_vaddr_populate_bitmap  (Paolo Bonzini; 1 file, -0/+9)
Explain the meaning of the bit manipulations of vm_vaddr_populate_bitmap. These correspond to the "canonical addresses" of x86 and other architectures, but that is not obvious. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-27  KVM: selftests: Use magic value to signal ucall_alloc() failure  (Sean Christopherson; 1 file, -2/+14)
Use a magic value to signal a ucall_alloc() failure instead of simply doing GUEST_ASSERT(). GUEST_ASSERT() relies on ucall_alloc() and so a failure puts the guest into an infinite loop. Use -1 as the magic value, as a real ucall struct should never wrap. Reported-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
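A hedged sketch of the failure signalling; the macro name is an assumption:

	#define GUEST_UCALL_FAILED	((struct ucall *)(-1))

	/* in ucall_alloc(), instead of GUEST_ASSERT(), which itself needs a ucall: */
	if (!uc)
		return GUEST_UCALL_FAILED;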
2022-12-27  KVM: selftests: Disable "gnu-variable-sized-type-not-at-end" warning  (Sean Christopherson; 1 file, -0/+1)
Disable gnu-variable-sized-type-not-at-end so that tests and libraries can create overlays of variable sized arrays at the end of structs when using a fixed number of entries, e.g. to get/set a single MSR. It's possible to fudge around the warning, e.g. by defining a custom struct that hardcodes the number of entries, but that is a burden for both developers and readers of the code.

	lib/x86_64/processor.c:664:19: warning: field 'header' with variable sized type 'struct kvm_msrs' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
	        struct kvm_msrs header;
	                        ^
	lib/x86_64/processor.c:772:19: warning: field 'header' with variable sized type 'struct kvm_msrs' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
	        struct kvm_msrs header;
	                        ^
	lib/x86_64/processor.c:787:19: warning: field 'header' with variable sized type 'struct kvm_msrs' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
	        struct kvm_msrs header;
	                        ^
	3 warnings generated.

	x86_64/hyperv_tlb_flush.c:54:18: warning: field 'hv_vp_set' with variable sized type 'struct hv_vpset' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
	        struct hv_vpset hv_vp_set;
	                        ^
	1 warning generated.

	x86_64/xen_shinfo_test.c:137:25: warning: field 'info' with variable sized type 'struct kvm_irq_routing' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
	        struct kvm_irq_routing info;
	                                ^
	1 warning generated.

Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221213001653.3852042-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
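Given the one-line diffstat, the change is plausibly a single compiler flag in the selftests Makefile (the exact location is assumed):

	CFLAGS += -Wno-gnu-variable-sized-type-not-at-end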