Age | Commit message (Collapse) | Author | Files | Lines |
|
Common code needs common defines....
Fixes: f2f9e496208c ("x86: Support for lazy preemption")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Non constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency wise. That's also a preliminary for running RT in
a guest on top of a RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.
When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.
Some of the debug code (int3) causes do_trap() to send a signal.
This function calls a spin lock that has been converted to a mutex
and has the possibility to sleep. If this happens, the above issues with
the corrupted stack is possible.
Instead of calling the signal right away, for PREEMPT_RT and x86_64,
the signal information is stored on the stacks task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.
[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: also needed on 32bit as per Yang Shi <yang.shi@linaro.org>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might sleep
checks are disabled. During CPU hotplug the might sleep checks
trigger. Making the locks in random raw is a major PITA, so avoid the
call on RT is the only sensible solution. This is basically the same
randomness which we get during boot where the random pool has no
entropy and we rely on the TSC randomnness.
Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
PREEMPT_RT preempts softirqs and the current implementation avoids
do_softirq_own_stack() and only uses __do_softirq().
Disable the unused softirqs stacks on PREEMPT_RT to safe some memory and
ensure that do_softirq_own_stack() is not used which is not expected.
[bigeasy: commit description.]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20210924161245.2357247-1-bigeasy@linutronix.de
|
|
Since printing occurs either atomically or from the printing
kthread, there is no need for any deferring or tracking possible
recursion paths. Remove all printk defer functions and context
tracking.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Add a write_atomic() callback to the console. This is an optional
function for console drivers. The function must be atomic (including
NMI safe) for writing to the console.
Console drivers must still implement the write() callback. The
write_atomic() callback will only be used in special situations,
such as when the kernel panics.
Creating an NMI safe write_atomic() that must synchronize with
write() requires a careful implementation of the console driver. To
aid with the implementation, a set of console_atomic_*() functions
are provided:
void console_atomic_lock(unsigned long flags);
void console_atomic_unlock(unsigned long flags);
These functions synchronize using the printk cpulock and disable
hardware interrupts.
kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
during cpu roundup. This will conflict with the printk cpulock.
Therefore, a CPU must ensure that it is not holding the printk
cpulock when calling kgdb_cpu_enter(). If it is, it must allow its
printk context to complete first.
A new helper function kgdb_roundup_delay() is introduced for kgdb
to determine if it is holding the printk cpulock. If so, a flag is
set so that when the printk cpulock is released, kgdb will be
re-triggered for that CPU.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Pull kvm fixes from Paolo Bonzini:
- Fixes for s390 interrupt delivery
- Fixes for Xen emulator bugs showing up as debug kernel WARNs
- Fix another issue with SEV/ES string I/O VMGEXITs
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Take srcu lock in post_kvm_run_save()
KVM: SEV-ES: fix another issue with string I/O VMGEXITs
KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block()
KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock
KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpu
KVM: s390: clear kicked_mask before sleeping again
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a build-time warning in x86/sm4"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/sm4 - Fix invalid section entry size
|
|
The Xen interrupt injection for event channels relies on accessing the
guest's vcpu_info structure in __kvm_xen_has_interrupt(), through a
gfn_to_hva_cache.
This requires the srcu lock to be held, which is mostly the case except
for this code path:
[ 11.822877] WARNING: suspicious RCU usage
[ 11.822965] -----------------------------
[ 11.823013] include/linux/kvm_host.h:664 suspicious rcu_dereference_check() usage!
[ 11.823131]
[ 11.823131] other info that might help us debug this:
[ 11.823131]
[ 11.823196]
[ 11.823196] rcu_scheduler_active = 2, debug_locks = 1
[ 11.823253] 1 lock held by dom:0/90:
[ 11.823292] #0: ffff998956ec8118 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680
[ 11.823379]
[ 11.823379] stack backtrace:
[ 11.823428] CPU: 2 PID: 90 Comm: dom:0 Kdump: loaded Not tainted 5.4.34+ #5
[ 11.823496] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 11.823612] Call Trace:
[ 11.823645] dump_stack+0x7a/0xa5
[ 11.823681] lockdep_rcu_suspicious+0xc5/0x100
[ 11.823726] __kvm_xen_has_interrupt+0x179/0x190
[ 11.823773] kvm_cpu_has_extint+0x6d/0x90
[ 11.823813] kvm_cpu_accept_dm_intr+0xd/0x40
[ 11.823853] kvm_vcpu_ready_for_interrupt_injection+0x20/0x30
< post_kvm_run_save() inlined here >
[ 11.823906] kvm_arch_vcpu_ioctl_run+0x135/0x6a0
[ 11.823947] kvm_vcpu_ioctl+0x263/0x680
Fixes: 40da8ccd724f ("KVM: x86/xen: Add event channel interrupt vector upcall")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Message-Id: <606aaaf29fca3850a63aa4499826104e77a72346.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If the guest requests string I/O from the hypervisor via VMGEXIT,
SW_EXITINFO2 will contain the REP count. However, sev_es_string_io
was incorrectly treating it as the size of the GHCB buffer in
bytes.
This fixes the "outsw" test in the experimental SEV tests of
kvm-unit-tests.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reported-by: Marc Orr <marcorr@google.com>
Tested-by: Marc Orr <marcorr@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before
making a final check whether the vCPU should be woken from HLT by any
incoming interrupt.
This is a problem for the get_user() in __kvm_xen_has_interrupt(), which
really shouldn't be sleeping when the task state has already been set.
I think it's actually harmless as it would just manifest itself as a
spurious wakeup, but it's causing a debug warning:
[ 230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80
Fix the warning by turning it into an *explicit* spurious wakeup. When
invoked with !task_is_running(current) (and we might as well add
in_atomic() there while we're at it), just return 1 to indicate that
an IRQ is pending, which will cause a wakeup and then something will
call it again in a context that *can* sleep so it can fault the page
back in.
Cc: stable@vger.kernel.org
Fixes: 40da8ccd724f ("KVM: x86/xen: Add event channel interrupt vector upcall")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <168bf8c689561da904e48e2ff5ae4713eaef9e2d.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
On the preemption path when updating a Xen guest's runstate times, this
lock is taken inside the scheduler rq->lock, which is a raw spinlock.
This was shown in a lockdep warning:
[ 89.138354] =============================
[ 89.138356] [ BUG: Invalid wait context ]
[ 89.138358] 5.15.0-rc5+ #834 Tainted: G S I E
[ 89.138360] -----------------------------
[ 89.138361] xen_shinfo_test/2575 is trying to lock:
[ 89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm]
[ 89.138442] other info that might help us debug this:
[ 89.138444] context-{5:5}
[ 89.138445] 4 locks held by xen_shinfo_test/2575:
[ 89.138447] #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm]
[ 89.138483] #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm]
[ 89.138526] #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0
[ 89.138534] #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm]
...
[ 89.138695] get_kvmclock_ns+0x1f/0x130 [kvm]
[ 89.138734] kvm_xen_update_runstate+0x14/0x90 [kvm]
[ 89.138783] kvm_xen_update_runstate_guest+0x15/0xd0 [kvm]
[ 89.138830] kvm_arch_vcpu_put+0xe6/0x170 [kvm]
[ 89.138870] kvm_sched_out+0x2f/0x40 [kvm]
[ 89.138900] __schedule+0x5de/0xbd0
Cc: stable@vger.kernel.org
Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
Fixes: 30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Pull more x86 kvm fixes from Paolo Bonzini:
- Cache coherency fix for SEV live migration
- Fix for instruction emulation with PKU
- fixes for rare delaying of interrupt delivery
- fix for SEV-ES buffer overflow
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SEV-ES: go over the sev_pio_data buffer in multiple passes if needed
KVM: SEV-ES: keep INS functions together
KVM: x86: remove unnecessary arguments from complete_emulator_pio_in
KVM: x86: split the two parts of emulator_pio_in
KVM: SEV-ES: clean up kvm_sev_es_ins/outs
KVM: x86: leave vcpu->arch.pio.count alone in emulator_pio_in_out
KVM: SEV-ES: rename guest_ins_data to sev_pio_data
KVM: SEV: Flush cache on non-coherent systems before RECEIVE_UPDATE_DATA
KVM: MMU: Reset mmu->pkru_mask to avoid stale data
KVM: nVMX: promptly process interrupts delivered while in guest mode
KVM: x86: check for interrupts before deciding whether to exit the fast path
|
|
The PIO scratch buffer is larger than a single page, and therefore
it is not possible to copy it in a single step to vcpu->arch/pio_data.
Bound each call to emulator_pio_in/out to a single page; keep
track of how many I/O operations are left in vcpu->arch.sev_pio_count,
so that the operation can be restarted in the complete_userspace_io
callback.
For OUT, this means that the previous kvm_sev_es_outs implementation
becomes an iterator of the loop, and we can consume the sev_pio_data
buffer before leaving to userspace.
For IN, instead, consuming the buffer and decreasing sev_pio_count
is always done in the complete_userspace_io callback, because that
is when the memcpy is done into sev_pio_data.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reported-by: Felix Wilhelm <fwilhelm@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Make the diff a little nicer when we actually get to fixing
the bug. No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
complete_emulator_pio_in can expect that vcpu->arch.pio has been filled in,
and therefore does not need the size and count arguments. This makes things
nicer when the function is called directly from a complete_userspace_io
callback.
No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
emulator_pio_in handles both the case where the data is pending in
vcpu->arch.pio.count, and the case where I/O has to be done via either
an in-kernel device or a userspace exit. For SEV-ES we would like
to split these, to identify clearly the moment at which the
sev_pio_data is consumed. To this end, create two different
functions: __emulator_pio_in fills in vcpu->arch.pio.count, while
complete_emulator_pio_in clears it and releases vcpu->arch.pio.data.
Because this patch has to be backported, things are left a bit messy.
kernel_pio() operates on vcpu->arch.pio, which leads to emulator_pio_in()
having with two calls to complete_emulator_pio_in(). It will be fixed
in the next release.
While at it, remove the unused void* val argument of emulator_pio_in_out.
The function currently hardcodes vcpu->arch.pio_data as the
source/destination buffer, which sucks but will be fixed after the more
severe SEV-ES buffer overflow.
No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
A few very small cleanups to the functions, smushed together because
the patch is already very small like this:
- inline emulator_pio_in_emulated and emulator_pio_out_emulated,
since we already have the vCPU
- remove the data argument and pull setting vcpu->arch.sev_pio_data into
the caller
- remove unnecessary clearing of vcpu->arch.pio.count when
emulation is done by the kernel (and therefore vcpu->arch.pio.count
is already clear on exit from emulator_pio_in and emulator_pio_out).
No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently emulator_pio_in clears vcpu->arch.pio.count twice if
emulator_pio_in_out performs kernel PIO. Move the clear into
emulator_pio_out where it is actually necessary.
No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
We will be using this field for OUTS emulation as well, in case the
data that is pushed via OUTS spans more than one page. In that case,
there will be a need to save the data pointer across exits to userspace.
So, change the name to something that refers to any kind of PIO.
Also spell out what it is used for, namely SEV-ES.
No functional change intended.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This fixes the following warning:
vmlinux.o: warning: objtool: elf_update: invalid section entry size
The size of the rodata section is 164 bytes, directly using the
entry_size of 164 bytes will cause errors in some versions of the
gcc compiler, while using 16 bytes directly will cause errors in
the clang compiler. This patch correct it by filling the size of
rodata to a 16-byte boundary.
Fixes: a7ee22ee1445 ("crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation")
Fixes: 5b2efa2bb865 ("crypto: x86/sm4 - add AES-NI/AVX2/x86_64 implementation")
Reported-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Tested-by: Heyuan Shi <heyuan@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Flush the destination page before invoking RECEIVE_UPDATE_DATA, as the
PSP encrypts the data with the guest's key when writing to guest memory.
If the target memory was not previously encrypted, the cache may contain
dirty, unecrypted data that will persist on non-coherent systems.
Fixes: 15fb7de1a7f5 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Masahiro Kozuka <masa.koz@kozuka.jp>
[sean: converted bug report to changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210914210951.2994260-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When updating mmu->pkru_mask, the value can only be added but it isn't
reset in advance. This will make mmu->pkru_mask keep the stale data.
Fix this issue.
Fixes: 2d344105f57c ("KVM, pkeys: introduce pkru_mask to cache conditions")
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20211021071022.1140-1-chenyi.qiang@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Since commit c300ab9f08df ("KVM: x86: Replace late check_nested_events() hack with
more precise fix") there is no longer the certainty that check_nested_events()
tries to inject an external interrupt vmexit to L1 on every call to vcpu_enter_guest.
Therefore, even in that case we need to set KVM_REQ_EVENT. This ensures
that inject_pending_event() is called, and from there kvm_check_nested_events().
Fixes: c300ab9f08df ("KVM: x86: Replace late check_nested_events() hack with more precise fix")
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The kvm_x86_sync_pir_to_irr callback can sometimes set KVM_REQ_EVENT.
If that happens exactly at the time that an exit is handled as
EXIT_FASTPATH_REENTER_GUEST, vcpu_enter_guest will go incorrectly
through the loop that calls kvm_x86_run, instead of processing
the request promptly.
Fixes: 379a3c8ee444 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath")
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Pull kvm fixes from Paolo Bonzini:
"Tools:
- kvm_stat: do not show halt_wait_ns since it is not a cumulative statistic
x86:
- clean ups and fixes for bus lock vmexit and lazy allocation of rmaps
- two fixes for SEV-ES (one more coming as soon as I get reviews)
- fix for static_key underflow
ARM:
- Properly refcount pages used as a concatenated stage-2 PGD
- Fix missing unlock when detecting the use of MTE+VM_SHARED"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SEV-ES: reduce ghcb_sa_len to 32 bits
KVM: VMX: Remove redundant handling of bus lock vmexit
KVM: kvm_stat: do not show halt_wait_ns
KVM: x86: WARN if APIC HW/SW disable static keys are non-zero on unload
Revert "KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET"
KVM: SEV-ES: Set guest_state_protected after VMSA update
KVM: X86: fix lazy allocation of rmaps
KVM: SEV-ES: fix length of string I/O
KVM: arm64: Release mmap_lock when using VM_SHARED with MTE
KVM: arm64: Report corrupted refcount at EL2
KVM: arm64: Fix host stage-2 PGD refcount
KVM: s390: Function documentation fixes
|
|
The size of the GHCB scratch area is limited to 16 KiB (GHCB_SCRATCH_AREA_LIMIT),
so there is no need for it to be a u64. This fixes a build error on 32-bit
systems:
i686-linux-gnu-ld: arch/x86/kvm/svm/sev.o: in function `sev_es_string_io:
sev.c:(.text+0x110f): undefined reference to `__udivdi3'
Cc: stable@vger.kernel.org
Fixes: 019057bd73d1 ("KVM: SEV-ES: fix length of string I/O")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Hardware may or may not set exit_reason.bus_lock_detected on BUS_LOCK
VM-Exits. Dealing with KVM_RUN_X86_BUS_LOCK in handle_bus_lock_vmexit
could be redundant when exit_reason.basic is EXIT_REASON_BUS_LOCK.
We can remove redundant handling of bus lock vmexit. Unconditionally Set
exit_reason.bus_lock_detected in handle_bus_lock_vmexit(), and deal with
KVM_RUN_X86_BUS_LOCK only in vmx_handle_exit().
Signed-off-by: Hao Xiang <hao.xiang@linux.alibaba.com>
Message-Id: <1634299161-30101-1-git-send-email-hao.xiang@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
WARN if the static keys used to track if any vCPU has disabled its APIC
are left elevated at module exit. Unlike the underflow case, nothing in
the static key infrastructure will complain if a key is left elevated,
and because an elevated key only affects performance, nothing in KVM will
fail if either key is improperly incremented.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211013003554.47705-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
RESET"
Revert a change to open code bits of kvm_lapic_set_base() when emulating
APIC RESET to fix an apic_hw_disabled underflow bug due to arch.apic_base
and apic_hw_disabled being unsyncrhonized when the APIC is created. If
kvm_arch_vcpu_create() fails after creating the APIC, kvm_free_lapic()
will see the initialized-to-zero vcpu->arch.apic_base and decrement
apic_hw_disabled without KVM ever having incremented apic_hw_disabled.
Using kvm_lapic_set_base() in kvm_lapic_reset() is also desirable for a
potential future where KVM supports RESET outside of vCPU creation, in
which case all the side effects of kvm_lapic_set_base() are needed, e.g.
to handle the transition from x2APIC => xAPIC.
Alternatively, KVM could temporarily increment apic_hw_disabled (and call
kvm_lapic_set_base() at RESET), but that's a waste of cycles and would
impact the performance of other vCPUs and VMs. The other subtle side
effect is that updating the xAPIC ID needs to be done at RESET regardless
of whether the APIC was previously enabled, i.e. kvm_lapic_reset() needs
an explicit call to kvm_apic_set_xapic_id() regardless of whether or not
kvm_lapic_set_base() also performs the update. That makes stuffing the
enable bit at vCPU creation slightly more palatable, as doing so affects
only the apic_hw_disabled key.
Opportunistically tweak the comment to explicitly call out the connection
between vcpu->arch.apic_base and apic_hw_disabled, and add a comment to
call out the need to always do kvm_apic_set_xapic_id() at RESET.
Underflow scenario:
kvm_vm_ioctl() {
kvm_vm_ioctl_create_vcpu() {
kvm_arch_vcpu_create() {
if (something_went_wrong)
goto fail_free_lapic;
/* vcpu->arch.apic_base is initialized when something_went_wrong is false. */
kvm_vcpu_reset() {
kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) {
vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
}
}
return 0;
fail_free_lapic:
kvm_free_lapic() {
/* vcpu->arch.apic_base is not yet initialized when something_went_wrong is true. */
if (!(vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE))
static_branch_slow_dec_deferred(&apic_hw_disabled); // <= underflow bug.
}
return r;
}
}
}
This (mostly) reverts commit 421221234ada41b4a9f0beeb08e30b07388bd4bd.
Fixes: 421221234ada ("KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET")
Reported-by: syzbot+9fc046ab2b0cf295a063@syzkaller.appspotmail.com
Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211013003554.47705-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If allocation of rmaps fails, but some of the pointers have already been written,
those pointers can be cleaned up when the memslot is freed, or even reused later
for another attempt at allocating the rmaps. Therefore there is no need to
WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be
skipped lest KVM will overwrite the previous pointer and will indeed leak memory.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The refactoring in commit bb18a6777465 ("KVM: SEV: Acquire
vcpu mutex when updating VMSA") left behind the assignment to
svm->vcpu.arch.guest_state_protected; add it back.
Signed-off-by: Peter Gonda <pgonda@google.com>
[Delta between v2 and v3 of Peter's patch, which had already been
committed; the commit message is my own. - Paolo]
Fixes: bb18a6777465 ("KVM: SEV: Acquire vcpu mutex when updating VMSA")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:
- Add Sapphire Rapids to the list of CPUs supporting the SMI count MSR
* tag 'perf_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/msr: Add Sapphire Rapids CPU support
|
|
This is a fix for the fix (yeah, /facepalm).
The correct mask to use is not the negation of the MXCSR_MASK but the
actual mask which contains the supported bits in the MXCSR register.
Reported and debugged by Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: d298b03506d3 ("x86/fpu: Restore the masking out of reserved MXCSR bits")
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ser Olmy <ser.olmy@protonmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YWgYIYXLriayyezv@intel.com
|
|
SMI_COUNT MSR is supported on Sapphire Rapids CPU.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1633551137-192083-1-git-send-email-kan.liang@linux.intel.com
|
|
The size of the data in the scratch buffer is not divided by the size of
each port I/O operation, so vcpu->arch.pio.count ends up being larger
than it should be by a factor of size.
Cc: stable@vger.kernel.org
Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This Kconfig option was added initially so that memory encryption is
enabled by default on machines which support it.
However, devices which have DMA masks that are less than the bit
position of the encryption bit, aka C-bit, require the use of an IOMMU
or the use of SWIOTLB.
If the IOMMU is disabled or in passthrough mode, the kernel would switch
to SWIOTLB bounce-buffering for those transfers.
In order to avoid that,
2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active")
disables the default IOMMU passthrough mode so that devices for which the
default 256K DMA is insufficient, can use the IOMMU instead.
However 2, there are cases where the IOMMU is disabled in the BIOS, etc.
(think the usual hardware folk "oops, I dropped the ball there" cases) or a
driver doesn't properly use the DMA APIs or a device has a firmware or
hardware bug, e.g.:
ea68573d408f ("drm/amdgpu: Fail to load on RAVEN if SME is active")
However 3, in the above GPU use case, there are APIs like Vulkan and
some OpenGL/OpenCL extensions which are under the assumption that
user-allocated memory can be passed in to the kernel driver and both the
GPU and CPU can do coherent and concurrent access to the same memory.
That cannot work with SWIOTLB bounce buffers, of course.
So, in order for those devices to function, drop the "default y" for the
SME by default active option so that users who want to have SME enabled,
will need to either enable it in their config or use "mem_encrypt=on" on
the kernel command line.
[ tlendacky: Generalize commit message. ]
Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/8bbacd0e-4580-3194-19d2-a0ecad7df09c@molgen.mpg.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- A FPU fix to properly handle invalid MXCSR values: 32-bit masks them
out due to historical reasons and 64-bit kernels reject them
- A fix to clear X86_FEATURE_SMAP when support for is not
config-enabled
- Three fixes correcting misspelled Kconfig symbols used in code
- Two resctrl object cleanup fixes
- Yet another attempt at fixing the neverending saga of botched x86
timers, this time because some incredibly smart hardware decides to
turn off the HPET timer in a low power state - who cares if the OS is
relying on it...
- Check the full return value range of an SEV VMGEXIT call to determine
whether it returned an error
* tag 'x86_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Restore the masking out of reserved MXCSR bits
x86/Kconfig: Correct reference to MWINCHIP3D
x86/platform/olpc: Correct ifdef symbol to intended CONFIG_OLPC_XO15_SCI
x86/entry: Clear X86_FEATURE_SMAP when CONFIG_X86_SMAP=n
x86/entry: Correct reference to intended CONFIG_64_BIT
x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu()
x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails
x86/hpet: Use another crystalball to evaluate HPET usability
x86/sev: Return an error on a returned non-zero SW_EXITINFO1[31:0]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- fix two minor issues in the Xen privcmd driver plus a cleanup patch
for that driver
- fix multiple issues related to running as PVH guest and some related
earlyprintk fixes for other Xen guest types
- fix an issue introduced in 5.15 the Xen balloon driver
* tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: fix cancelled balloon action
xen/x86: adjust data placement
x86/PVH: adjust function/data placement
xen/x86: hook up xen_banner() also for PVH
xen/x86: generalize preferred console model from PV to PVH Dom0
xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU
xen/x86: allow "earlyprintk=xen" to work for PV Dom0
xen/x86: make "earlyprintk=xen" work better for PVH Dom0
xen/x86: allow PVH Dom0 without XEN_PV=y
xen/x86: prevent PVH type from getting clobbered
xen/privcmd: drop "pages" parameter from xen_remap_pfn()
xen/privcmd: fix error handling in mmap-resource processing
xen/privcmd: replace kcalloc() by kvcalloc() when allocating empty pages
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
"There is one build fix for Arm platforms that ended up impacting most
architectures because of the way the drivers/firmware Kconfig file is
wired up:
The CONFIG_QCOM_SCM dependency have caused a number of randconfig
regressions over time, and some still remain in v5.15-rc4. The fix we
agreed on in the end is to make this symbol selected by any driver
using it, and then building it even for non-Arm platforms with
CONFIG_COMPILE_TEST.
To make this work on all architectures, the drivers/firmware/Kconfig
file needs to be included for all architectures to make the symbol
itself visible.
In a separate discussion, we found that a sound driver patch that is
pending for v5.16 needs the same change to include this Kconfig file,
so the easiest solution seems to have my Kconfig rework included in
v5.15.
Finally, the branch also includes a small unrelated build fix for
NOMMU architectures"
Link: https://lore.kernel.org/all/20210928153508.101208f8@canb.auug.org.au/
Link: https://lore.kernel.org/all/20210928075216.4193128-1-arnd@kernel.org/
Link: https://lore.kernel.org/all/20211007151010.333516-1-arnd@kernel.org/
* tag 'asm-generic-fixes-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic/io.h: give stub iounmap() on !MMU same prototype as elsewhere
qcom_scm: hide Kconfig symbol
firmware: include drivers/firmware/Kconfig unconditionally
|
|
Ser Olmy reported a boot failure:
init[1] bad frame in sigreturn frame:(ptrval) ip:b7c9fbe6 sp:bf933310 orax:ffffffff \
in libc-2.33.so[b7bed000+156000]
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
CPU: 0 PID: 1 Comm: init Tainted: G W 5.14.9 #1
Hardware name: Hewlett-Packard HP PC/HP Board, BIOS JD.00.06 12/06/2001
Call Trace:
dump_stack_lvl
dump_stack
panic
do_exit.cold
do_group_exit
get_signal
arch_do_signal_or_restart
? force_sig_info_to_task
? force_sig
exit_to_user_mode_prepare
syscall_exit_to_user_mode
do_int80_syscall_32
entry_INT80_32
on an old 32-bit Intel CPU:
vendor_id : GenuineIntel
cpu family : 6
model : 6
model name : Celeron (Mendocino)
stepping : 5
microcode : 0x3
Ser bisected the problem to the commit in Fixes.
tglx suggested reverting the rejection of invalid MXCSR values which
this commit introduced and replacing it with what the old code did -
simply masking them out to zero.
Further debugging confirmed his suggestion:
fpu->state.fxsave.mxcsr: 0xb7be13b4, mxcsr_feature_mask: 0xffbf
WARNING: CPU: 0 PID: 1 at arch/x86/kernel/fpu/signal.c:384 __fpu_restore_sig+0x51f/0x540
so restore the original behavior only for 32-bit kernels where you have
ancient machines with buggy hardware. For 32-bit programs on 64-bit
kernels, user space which supplies wrong MXCSR values is considered
malicious so fail the sigframe restoration there.
Fixes: 6f9866a166cd ("x86/fpu/signal: Let xrstor handle the features to init")
Reported-by: Ser Olmy <ser.olmy@protonmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Ser Olmy <ser.olmy@protonmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/YVtA67jImg3KlBTw@zn.tnic
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Replace uuid.h with types.h in a header (Andy Shevchenko)
- Avoid sleeping in atomic context in PCI driver (Long Li)
- Avoid sending IPI to self when it shouldn't (Vitaly Kuznetsov)
* tag 'hyperv-fixes-signed-20211007' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Avoid erroneously sending IPI to 'self'
hyper-v: Replace uuid.h with types.h
PCI: hv: Fix sleep while in non-sleep context when removing child devices from the bus
|
|
Compile-testing drivers that require access to a firmware layer
fails when that firmware symbol is unavailable. This happened
twice this week:
- My proposed to change to rework the QCOM_SCM firmware symbol
broke on ppc64 and others.
- The cs_dsp firmware patch added device specific firmware loader
into drivers/firmware, which broke on the same set of
architectures.
We should probably do the same thing for other subsystems as well,
but fix this one first as this is a dependency for other patches
getting merged.
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Charles Keepax <ckeepax@opensource.cirrus.com>
Cc: Simon Trimmer <simont@opensource.cirrus.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Commit in Fixes intended to exclude the Winchip series and referred to
CONFIG_WINCHIP3D, but the config symbol is called CONFIG_MWINCHIP3D.
Hence, scripts/checkkconfigsymbols.py warns:
WINCHIP3D
Referencing files: arch/x86/Kconfig
Correct the reference to the intended config symbol.
Fixes: 69b8d3fcabdc ("x86/Kconfig: Exclude i586-class CPUs lacking PAE support from the HIGHMEM64G Kconfig group")
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20210803113531.30720-4-lukas.bulwahn@gmail.com
|
|
The refactoring in the commit in Fixes introduced an ifdef
CONFIG_OLPC_XO1_5_SCI, however the config symbol is actually called
"CONFIG_OLPC_XO15_SCI".
Fortunately, ./scripts/checkkconfigsymbols.py warns:
OLPC_XO1_5_SCI
Referencing files: arch/x86/platform/olpc/olpc.c
Correct this ifdef condition to the intended config symbol.
Fixes: ec9964b48033 ("Platform: OLPC: Move EC-specific functionality out from x86")
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20210803113531.30720-3-lukas.bulwahn@gmail.com
|