Age | Commit message (Collapse) | Author | Files | Lines |
|
Add an explicit __ignore_sync_check__ marker which will be used to mark
lines which are supposed to be ignored by file synchronization check
scripts, its advantage being that it explicitly denotes such lines in
the code.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lkml.kernel.org/r/20210304174237.31945-4-bp@alien8.de
|
|
Rename insn_decode() to insn_decode_from_regs() to denote that it
receives regs as param and uses registers from there during decoding.
Free the former name for a more generic version of the function.
No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-2-bp@alien8.de
|
|
Pick up dependent SEV-ES urgent changes to base new work ontop.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Retrieve the active PCID only when writing a guest CR3 value, i.e. don't
get the PCID when using EPT or NPT. The PCID is especially problematic
for EPT as the bits have different meaning, and so the PCID and must be
manually stripped, which is annoying and unnecessary. And on VMX,
getting the active PCID also involves reading the guest's CR3 and
CR4.PCIDE, i.e. may add pointless VMREADs.
Opportunistically rename the pgd/pgd_level params to root_hpa and
root_level to better reflect their new roles. Keep the function names,
as "load the guest PGD" is still accurate/correct.
Last, and probably least, pass root_hpa as a hpa_t/u64 instead of an
unsigned long. The EPTP holds a 64-bit value, even in 32-bit mode, so
in theory EPT could support HIGHMEM for 32-bit KVM. Never mind that
doing so would require changing the MMU page allocators and reworking
the MMU to use kmap().
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305183123.3978098-2-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Let the MMU deal with the SPTE masks to avoid splitting the logic and
knowledge across the MMU and VMX.
The SPTE masks that are used for EPT are very, very tightly coupled to
the MMU implementation. The use of available bits, the existence of A/D
types, the fact that shadow_x_mask even exists, and so on and so forth
are all baked into the MMU implementation. Cross referencing the params
to the masks is also a nightmare, as pretty much every param is a u64.
A future patch will make the location of the MMU_WRITABLE and
HOST_WRITABLE bits MMU specific, to free up bit 11 for a MMU_PRESENT bit.
Doing that change with the current kvm_mmu_set_mask_ptes() would be an
absolute mess.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210225204749.1512652-18-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR. Presence of this feature is indicated via CPUID
function 0x8000000A_EDX[20]: GuestSpecCtrl. Hypervisors are not
required to enable this feature since it is automatically enabled on
processors that support it.
A hypervisor may wish to impose speculation controls on guest
execution or a guest may want to impose its own speculation controls.
Therefore, the processor implements both host and guest
versions of SPEC_CTRL.
When in host mode, the host SPEC_CTRL value is in effect and writes
update only the host version of SPEC_CTRL. On a VMRUN, the processor
loads the guest version of SPEC_CTRL from the VMCB. When the guest
writes SPEC_CTRL, only the guest version is updated. On a VMEXIT,
the guest version is saved into the VMCB and the processor returns
to only using the host SPEC_CTRL for speculation control. The guest
SPEC_CTRL is located at offset 0x2E0 in the VMCB.
The effective SPEC_CTRL setting is the guest SPEC_CTRL setting or'ed
with the hypervisor SPEC_CTRL setting. This allows the hypervisor to
ensure a minimum SPEC_CTRL if desired.
This support also fixes an issue where a guest may sometimes see an
inconsistent value for the SPEC_CTRL MSR on processors that support
this feature. With the current SPEC_CTRL support, the first write to
SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
will be 0x0, instead of the actual expected value. There isn’t a
security concern here, because the host SPEC_CTRL value is or’ed with
the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
MSR just before the VMRUN, so it will always have the actual value
even though it doesn’t appear that way in the guest. The guest will
only see the proper value for the SPEC_CTRL register if the guest was
to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
support, the save area spec_ctrl is properly saved and restored.
So, the guest will always see the proper value when it is read back.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <161188100955.28787.11816849358413330720.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR. Presence of this feature is indicated via CPUID
function 0x8000000A_EDX[20]: GuestSpecCtrl. When present, the
SPEC_CTRL MSR is automatically virtualized.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Message-Id: <161188100272.28787.4097272856384825024.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the entirety of the accelerated RDPMC emulation to x86.c, and assign
the common handler directly to the exit handler array for VMX. SVM has
bizarre nrips behavior that prevents it from directly invoking the common
handler. The nrips goofiness will be addressed in a future patch.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the trivial exit handlers, e.g. for instructions that KVM
"emulates" as nops, to common x86 code. Assign the common handlers
directly to the exit handler arrays and drop the vendor trampolines.
Opportunistically use pr_warn_once() where appropriate.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move the entirety of XSETBV emulation to x86.c, and assign the
function directly to both VMX's and SVM's exit handlers, i.e. drop the
unnecessary trampolines.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210205005750.3841462-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Synthesize a nested VM-Exit if L2 triggers an emulated triple fault
instead of exiting to userspace, which likely will kill L1. Any flow
that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario
for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that
is _not_intercepted by L1. E.g. if KVM is intercepting #GPs for the
VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF
will cause KVM to emulate triple fault.
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210302174515.2812275-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Unexport the MMU load and unload helpers now that they are no longer
used (incorrectly) in vendor code.
Opportunistically move the kvm_mmu_sync_roots() declaration into mmu.h,
it should not be exposed to vendor code.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The new per-cpu stat 'nested_run' is introduced in order to track if L1 VM
is running or used to run L2 VM.
An example of the usage of 'nested_run' is to help the host administrator
to easily track if any L1 VM is used to run L2 VM. Suppose there is issue
that may happen with nested virtualization, the administrator will be able
to easily narrow down and confirm if the issue is due to nested
virtualization via 'nested_run'. For example, whether the fix like
commit 88dddc11a8d6 ("KVM: nVMX: do not use dangling shadow VMCS after
guest reset") is required.
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Thomas Gleixner:
"A single objtool fix to handle the PUSHF/POPF validation correctly for
the paravirt changes which modified arch_local_irq_restore not to use
popf"
* tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool,x86: Fix uaccess PUSHF/POPF validation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- A couple of SEV-ES fixes and robustifications: verify usermode stack
pointer in NMI is not coming from the syscall gap, correctly track
IRQ states in the #VC handler and access user insn bytes atomically
in same handler as latter cannot sleep.
- Balance 32-bit fast syscall exit path to do the proper work on exit
and thus not confuse audit and ptrace frameworks.
- Two fixes for the ORC unwinder going "off the rails" into KASAN
redzones and when ORC data is missing.
* tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Use __copy_from_user_inatomic()
x86/sev-es: Correctly track IRQ states in runtime #VC handler
x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
x86/sev-es: Introduce ip_within_syscall_gap() helper
x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
x86/unwind/orc: Silence warnings caused by missing ORC data
x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
|
|
Pull KVM fixes from Paolo Bonzini:
"More fixes for ARM and x86"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: LAPIC: Advancing the timer expiration on guest initiated write
KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
kvm: x86: annotate RCU pointers
KVM: arm64: Fix exclusive limit for IPA size
KVM: arm64: Reject VM creation when the default IPA size is unsupported
KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
KVM: arm64: Don't use cbz/adr with external symbols
KVM: arm64: Fix range alignment when walking page tables
KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
KVM: arm64: Fix nVHE hyp panic host context restore
KVM: arm64: Avoid corrupting vCPU context register in guest exit
KVM: arm64: nvhe: Save the SPE context early
kvm: x86: use NULL instead of using plain integer as pointer
KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
|
|
This patch adds the annotation to fix the following sparse errors:
arch/x86/kvm//x86.c:8147:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//x86.c:8147:15: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//x86.c:8147:15: struct kvm_apic_map *
arch/x86/kvm//x86.c:10628:16: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//x86.c:10628:16: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//x86.c:10628:16: struct kvm_apic_map *
arch/x86/kvm//x86.c:10629:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//x86.c:10629:15: struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//x86.c:10629:15: struct kvm_pmu_event_filter *
arch/x86/kvm//lapic.c:267:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:267:15: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:267:15: struct kvm_apic_map *
arch/x86/kvm//lapic.c:269:9: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:269:9: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:269:9: struct kvm_apic_map *
arch/x86/kvm//lapic.c:637:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:637:15: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:637:15: struct kvm_apic_map *
arch/x86/kvm//lapic.c:994:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:994:15: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:994:15: struct kvm_apic_map *
arch/x86/kvm//lapic.c:1036:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:1036:15: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:1036:15: struct kvm_apic_map *
arch/x86/kvm//lapic.c:1173:15: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//lapic.c:1173:15: struct kvm_apic_map [noderef] __rcu *
arch/x86/kvm//lapic.c:1173:15: struct kvm_apic_map *
arch/x86/kvm//pmu.c:190:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:190:18: struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:190:18: struct kvm_pmu_event_filter *
arch/x86/kvm//pmu.c:251:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:251:18: struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:251:18: struct kvm_pmu_event_filter *
arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter *
arch/x86/kvm//pmu.c:522:18: error: incompatible types in comparison expression (different address spaces):
arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter [noderef] __rcu *
arch/x86/kvm//pmu.c:522:18: struct kvm_pmu_event_filter *
Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com>
Message-Id: <20210305191123.GA497469@LEGION>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit ab234a260b1f ("x86/pv: Rework arch_local_irq_restore() to not
use popf") replaced "push %reg; popf" with something like: "test
$0x200, %reg; jz 1f; sti; 1:", which breaks the pushf/popf symmetry
that commit ea24213d8088 ("objtool: Add UACCESS validation") relies
on.
The result is:
drivers/gpu/drm/amd/amdgpu/si.o: warning: objtool: si_common_hw_init()+0xf36: PUSHF stack exhausted
Meanwhile, commit c9c324dc22aa ("objtool: Support stack layout changes
in alternatives") makes that we can actually use stack-ops in
alternatives, which means we can revert 1ff865e343c2 ("x86,smap: Fix
smap_{save,restore}() alternatives").
That in turn means we can limit the PUSHF/POPF handling of
ea24213d8088 to those instructions that are in alternatives.
Fixes: ab234a260b1f ("x86/pv: Rework arch_local_irq_restore() to not use popf")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/YEY4rIbQYa5fnnEp@hirez.programming.kicks-ass.net
|
|
There is no need any longer to have different paravirt patch functions
for native and Xen. Eliminate native_patch() and rename
paravirt_patch_default() to paravirt_patch().
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-15-jgross@suse.com
|
|
Instead of using paravirt patching for custom code sequences use
ALTERNATIVE for the functions with custom code replacements.
Instead of patching an ud2 instruction for unpopulated vector entries
into the caller site, use a simple function just calling BUG() as a
replacement.
Simplify the register defines for assembler paravirt calling, as there
isn't much usage left.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-14-jgross@suse.com
|
|
Instead of using paravirt patching for custom code sequences add
support for using ALTERNATIVE handling combined with paravirt call
patching.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-13-jgross@suse.com
|
|
The iret paravirt op is rather special as it is using a jmp instead
of a call instruction. Switch it to ALTERNATIVE.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-12-jgross@suse.com
|
|
The central pvops call macros ____PVOP_CALL() and ____PVOP_VCALL() are
looking very similar now.
The main differences are using PVOP_VCALL_ARGS or PVOP_CALL_ARGS, which
are identical, and the return value handling.
So drop PVOP_VCALL_ARGS and instead of ____PVOP_VCALL() just use
(void)____PVOP_CALL(long, ...).
Note that it isn't easily possible to just redefine ____PVOP_VCALL()
to use ____PVOP_CALL() instead, as this would require further hiding of
commas in macro parameters.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-11-jgross@suse.com
|
|
PVOP_VCALL4() is only used for Xen PV, while PVOP_CALL4() isn't used
at all. Keep PVOP_CALL4() for 64 bits due to symmetry reasons.
This allows to remove the 32-bit definitions of those macros leading
to a substantial simplification of the paravirt macros, as those were
the only ones needing non-empty "pre" and "post" parameters.
PVOP_CALLEE2() and PVOP_VCALLEE2() are used nowhere, so remove them.
Another no longer needed case is special handling of return types
larger than unsigned long. Replace that with a BUILD_BUG_ON().
DISABLE_INTERRUPTS() is used in 32-bit code only, so it can just be
replaced by cli.
INTERRUPT_RETURN in 32-bit code can be replaced by iret.
ENABLE_INTERRUPTS is used nowhere, so it can be removed.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-10-jgross@suse.com
|
|
For being able to switch paravirt patching from special cased custom
code sequences to ALTERNATIVE handling some X86_FEATURE_* are needed
as new features. This enables to have the standard indirect pv call
as the default code and to patch that with the non-Xen custom code
sequence via ALTERNATIVE patching later.
Make sure paravirt patching is performed before alternatives patching.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-9-jgross@suse.com
|
|
_static_cpu_has() contains a completely open coded version of
ALTERNATIVE_TERNARY(). Replace that with the macro instead.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210311142319.4723-8-jgross@suse.com
|
|
Add ALTERNATIVE_TERNARY support for replacing an initial instruction
with either of two instructions depending on a feature:
ALTERNATIVE_TERNARY "default_instr", FEATURE_NR,
"feature_on_instr", "feature_off_instr"
which will start with "default_instr" and at patch time will,
depending on FEATURE_NR being set or not, patch that with either
"feature_on_instr" or "feature_off_instr".
[ bp: Add comment ontop. ]
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-7-jgross@suse.com
|
|
Add support for alternative patching for the case a feature is not
present on the current CPU. For users of ALTERNATIVE() and friends, an
inverted feature is specified by applying the ALT_NOT() macro to it,
e.g.:
ALTERNATIVE(old, new, ALT_NOT(feature));
Committer note:
The decision to encode the NOT-bit in the feature bit itself is because
a future change which would make objtool generate such alternative
calls, would keep the code in objtool itself fairly simple.
Also, this allows for the alternative macros to support the NOT feature
without having to change them.
Finally, the u16 cpuid member encoding the X86_FEATURE_ flags is not an
ABI so if more bits are needed, cpuid itself can be enlarged or a flags
field can be added to struct alt_instr after having considered the size
growth in either cases.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210311142319.4723-6-jgross@suse.com
|
|
The time pvops functions are the only ones left which might be
used in 32-bit mode and which return a 64-bit value.
Switch them to use the static_call() mechanism instead of pvops, as
this allows quite some simplification of the pvops implementation.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-5-jgross@suse.com
|
|
Merge arch/x86/include/asm/alternative-asm.h into
arch/x86/include/asm/alternative.h in order to make it easier to use
common definitions later.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210311142319.4723-2-jgross@suse.com
|
|
Since a13f2ef168cb ("x86/xen: remove 32-bit Xen PV guest support"),
RESERVE_BRK_ARRAY() has no user anymore so drop it.
Update related comments too.
Signed-off-by: Cao jin <jojing64@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210311083919.27530-1-jojing64@gmail.com
|
|
The macro ALTINSTR_REPLACEMENT() doesn't make use of the feature
parameter, so drop it.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210309134813.23912-4-jgross@suse.com
|
|
The #VC handler must run in atomic context and cannot sleep. This is a
problem when it tries to fetch instruction bytes from user-space via
copy_from_user().
Introduce a insn_fetch_from_user_inatomic() helper which uses
__copy_from_user_inatomic() to safely copy the instruction bytes to
kernel memory in the #VC handler.
Fixes: 5e3427a7bc432 ("x86/sev-es: Handle instruction fetches from user-space")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-6-joro@8bytes.org
|
|
ELF related fields elf_headers, elf_headers_sz, and elf_load_addr
have been moved from 'struct kimage_arch' to 'struct kimage'.
Use the ELF fields defined in 'struct kimage'.
Suggested-by: Rob Herring <robh@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210221174930.27324-5-nramas@linux.microsoft.com
|
|
STIMER0 interrupts are most naturally modeled as per-cpu IRQs. But
because x86/x64 doesn't have per-cpu IRQs, the core STIMER0 interrupt
handling machinery is done in code under arch/x86 and Linux IRQs are
not used. Adding support for ARM64 means adding equivalent code
using per-cpu IRQs under arch/arm64.
A better model is to treat per-cpu IRQs as the normal path (which it is
for modern architectures), and the x86/x64 path as the exception. Do this
by incorporating standard Linux per-cpu IRQ allocation into the main
SITMER0 driver code, and bypass it in the x86/x64 exception case. For
x86/x64, special case code is retained under arch/x86, but no STIMER0
interrupt handling code is needed under arch/arm64.
No functional change.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1614721102-2241-11-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
While the Hyper-V Reference TSC code is architecture neutral, the
pv_ops.time.sched_clock() function is implemented for x86/x64, but not
for ARM64. Current code calls a utility function under arch/x86 (and
coming, under arch/arm64) to handle the difference.
Change this approach to handle the difference inline based on whether
GENERIC_SCHED_CLOCK is present. The new approach removes code under
arch/* since the difference is tied more to the specifics of the Linux
implementation than to the architecture.
No functional change.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1614721102-2241-9-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
While the driver for the Hyper-V Reference TSC and STIMERs is architecture
neutral, vDSO is implemented for x86/x64, but not for ARM64. Current code
calls into utility functions under arch/x86 (and coming, under arch/arm64)
to handle the difference.
Change this approach to handle the difference inline based on whether
VDSO_CLOCK_MODE_HVCLOCK is present. The new approach removes code under
arch/* since the difference is tied more to the specifics of the Linux
implementation than to the architecture.
No functional change.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/1614721102-2241-8-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
VMbus interrupts are most naturally modelled as per-cpu IRQs. But
because x86/x64 doesn't have per-cpu IRQs, the core VMbus interrupt
handling machinery is done in code under arch/x86 and Linux IRQs are
not used. Adding support for ARM64 means adding equivalent code
using per-cpu IRQs under arch/arm64.
A better model is to treat per-cpu IRQs as the normal path (which it is
for modern architectures), and the x86/x64 path as the exception. Do this
by incorporating standard Linux per-cpu IRQ allocation into the main VMbus
driver, and bypassing it in the x86/x64 exception case. For x86/x64,
special case code is retained under arch/x86, but no VMbus interrupt
handling code is needed under arch/arm64.
No functional change.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-7-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
On x86/x64, Hyper-V provides a flag to indicate auto EOI functionality,
but it doesn't on ARM64. Handle this quirk inline instead of calling
into code under arch/x86 (and coming, under arch/arm64).
No functional change.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-6-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Current code defines a separate get and set macro for each Hyper-V
synthetic MSR used by the VMbus driver. Furthermore, the get macro
can't be converted to a standard function because the second argument
is modified in place, which is somewhat bad form.
Redo this by providing a single get and a single set function that
take a parameter specifying the MSR to be operated on. Fixup usage
of the get function. Calling locations are no more complex than before,
but the code under arch/x86 and the upcoming code under arch/arm64
is significantly simplified.
Also standardize the names of Hyper-V synthetic MSRs that are
architecture neutral. But keep the old x86-specific names as aliases
that can be removed later when all references (particularly in KVM
code) have been cleaned up in a separate patch series.
No functional change.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-4-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
The definition of enum hv_message_type includes arch neutral and
x86/x64-specific values. Ideally there would be a way to put the
arch neutral values in an arch neutral module, and the arch
specific values in an arch specific module. But C doesn't provide
a way to extend enum types. As a compromise, move the entire
definition into an arch neutral module, to avoid duplicating the
arch neutral values for x86/x64 and for ARM64.
No functional change.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-3-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
The Hyper-V page allocator functions are implemented in an architecture
neutral way. Move them into the architecture neutral VMbus module so
a separate implementation for ARM64 is not needed.
No functional change.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/1614721102-2241-2-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Introduce a helper to check whether an exception came from the syscall
gap and use it in the SEV-ES code. Extend the check to also cover the
compatibility SYSCALL entry path.
Fixes: 315562c9af3d5 ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-2-joro@8bytes.org
|
|
On 32-bit kernels, the stackprotector canary is quite nasty -- it is
stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
percpu storage. It's even nastier because it means that whether %gs
contains userspace state or kernel state while running kernel code
depends on whether stackprotector is enabled (this is
CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
that segment selectors work. Supporting both variants is a
maintenance and testing mess.
Merely rearranging so that percpu and the stack canary
share the same segment would be messy as the 32-bit percpu address
layout isn't currently compatible with putting a variable at a fixed
offset.
Fortunately, GCC 8.1 added options that allow the stack canary to be
accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
percpu variable. This lets us get rid of all of the code to manage the
stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
(That name is special. We could use any symbol we want for the
%fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
name other than __stack_chk_guard.)
Forcibly disable stackprotector on older compilers that don't support
the new options and turn the stack canary into a percpu variable. The
"lazy GS" approach is now used for all 32-bit configurations.
Also makes load_gs_index() work on 32-bit kernels. On 64-bit kernels,
it loads the GS selector and updates the user GSBASE accordingly. (This
is unchanged.) On 32-bit kernels, it loads the GS selector and updates
GSBASE, which is now always the user base. This means that the overall
effect is the same on 32-bit and 64-bit, which avoids some ifdeffery.
[ bp: Massage commit message. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org
|
|
There are two definitions for the TSC deadline MSR in msr-index.h,
one with an underscore and one without. Axe one of them and move
all the references over to the other one.
[ bp: Fixup the MSR define in handle_fastpath_set_msr_irqoff() too. ]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200305174706.0D6B8EE4@viggo.jf.intel.com
|
|
There's a non-trivial conflict between the parallel TLB flush
framework and the IPI flush debugging code - merge them
manually.
Conflicts:
kernel/smp.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
cpu_tlbstate is mostly private and only the variable is_lazy is shared.
This causes some false-sharing when TLB flushes are performed.
Break cpu_tlbstate intro cpu_tlbstate and cpu_tlbstate_shared, and mark
each one accordingly.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-6-namit@vmware.com
|
|
To improve TLB shootdown performance, flush the remote and local TLBs
concurrently. Introduce flush_tlb_multi() that does so. Introduce
paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
and hyper-v are only compile-tested).
While the updated smp infrastructure is capable of running a function on
a single local core, it is not optimized for this case. The multiple
function calls and the indirect branch introduce some overhead, and
might make local TLB flushes slower than they were before the recent
changes.
Before calling the SMP infrastructure, check if only a local TLB flush
is needed to restore the lost performance in this common case. This
requires to check mm_cpumask() one more time, but unless this mask is
updated very frequently, this should impact performance negatively.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com> # Hyper-v parts
Reviewed-by: Juergen Gross <jgross@suse.com> # Xen and paravirt parts
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-5-namit@vmware.com
|
|
The unification of these two functions allows to use them in the updated
SMP infrastrucutre.
To do so, remove the reason argument from flush_tlb_func_local(), add
a member to struct tlb_flush_info that says which CPU initiated the
flush and act accordingly. Optimize the size of flush_tlb_info while we
are at it.
Unfortunately, this prevents us from using a constant tlb_flush_info for
arch_tlbbatch_flush(), but in a later stage we may be able to inline
tlb_flush_info into the IPI data, so it should not have an impact
eventually.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20210220231712.2475218-3-namit@vmware.com
|
|
When compiling an external kernel module with `-O0` or `-O1`, the following
compile error may be reported:
./arch/x86/include/asm/jump_label.h:25:2: error: impossible constraint in ‘asm’
25 | asm_volatile_goto("1:"
| ^~~~~~~~~~~~~~~~~
It appears that these lower optimization levels prevent GCC from detecting
that the key/branch arguments can be treated as constants and used as
immediate operands. To work around this, explicitly add the `const` label.
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20210211214848.536626-1-jason.gerecke@wacom.com
|