Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/arm64 fixes for 5.9, take #2
- Fix handling of S1 Page Table Walk permission fault at S2
on instruction fetch
- Cleanup kvm_vcpu_dabt_iswrite()
|
|
arch_scale_freq_invariant() is used by schedutil to determine whether
the scheduler's load-tracking signals are frequency invariant. Its
definition is overridable, though by default it is hardcoded to 'true'
if arch_scale_freq_capacity() is defined ('false' otherwise).
This behaviour is not overridden on arm, arm64 and other users of the
generic arch topology driver, which is somewhat precarious:
arch_scale_freq_capacity() will always be defined, yet not all cpufreq
drivers are guaranteed to drive the frequency invariance scale factor
setting. In other words, the load-tracking signals may very well *not*
be frequency invariant.
Now that cpufreq can be queried on whether the current driver is driving
the Frequency Invariance (FI) scale setting, the current situation can
be improved. This combines the query of whether cpufreq supports the
setting of the frequency scale factor, with whether all online CPUs are
counter-based FI enabled.
While cpufreq FI enablement applies at system level, for all CPUs,
counter-based FI support could also be used for only a subset of CPUs to
set the invariance scale factor. Therefore, if cpufreq-based FI support
is present, we consider the system to be invariant. If missing, we
require all online CPUs to be counter-based FI enabled in order for the
full system to be considered invariant.
If the system ends up not being invariant, a new condition is needed in
the counter initialization code that disables all scale factor setting
based on counters.
Precedence of counters over cpufreq use is not important here. The
invariant status is only given to the system if all CPUs have at least
one method of setting the frequency scale factor.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Now that kvm_vcpu_trap_is_write_fault() checks for S1PTW, there
is no need for kvm_vcpu_dabt_iswrite() to do the same thing, as
we already check for this condition on all existing paths.
Drop the check and add a comment instead.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200915104218.1284701-3-maz@kernel.org
|
|
KVM currently assumes that an instruction abort can never be a write.
This is in general true, except when the abort is triggered by
a S1PTW on instruction fetch that tries to update the S1 page tables
(to set AF, for example).
This can happen if the page tables have been paged out and brought
back in without seeing a direct write to them (they are thus marked
read only), and the fault handling code will make the PT executable(!)
instead of writable. The guest gets stuck forever.
In these conditions, the permission fault must be considered as
a write so that the Stage-1 update can take place. This is essentially
the I-side equivalent of the problem fixed by 60e21a0ef54c ("arm64: KVM:
Take S1 walks into account when determining S2 write faults").
Update kvm_is_write_fault() to return true on IABT+S1PTW, and introduce
kvm_vcpu_trap_is_exec_fault() that only return true when no faulting
on a S1 fault. Additionally, kvm_vcpu_dabt_iss1tw() is renamed to
kvm_vcpu_abt_iss1tw(), as the above makes it plain that it isn't
specific to data abort.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Will Deacon <will@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200915104218.1284701-2-maz@kernel.org
|
|
When generating instructions at runtime, for example due to kernel text
patching or the BPF JIT, we can emit a trapping BRK instruction if we
are asked to encode an invalid instruction such as an out-of-range]
branch. This is indicative of a bug in the caller, and will result in a
crash on executing the generated code. Unfortunately, the message from
the crash is really unhelpful, and mumbles something about ptrace:
| Unexpected kernel BRK exception at EL1
| Internal error: ptrace BRK handler: f2000100 [#1] SMP
We can do better than this. Install a break handler for FAULT_BRK_IMM,
which is the immediate used to encode the "I've been asked to generate
an invalid instruction" error, and triage the faulting PC to determine
whether or not the failure occurred in the BPF JIT.
Link: https://lore.kernel.org/r/20200915141707.GB26439@willie-the-truck
Reported-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As with the generic arch_stack_walk() code the arm64 stack walk code takes
a callback that is called per stack frame. Currently the arm64 code always
passes a struct stackframe to the callback and the generic code just passes
the pc, however none of the users ever reference anything in the struct
other than the pc value. The arm64 code also uses a return type of int
while the generic code uses a return type of bool though in both cases the
return value is a boolean value and the sense is inverted between the two.
In order to reduce code duplication when arm64 is converted to use
arch_stack_walk() change the signature and return sense of the arm64
specific callback to match that of the generic code.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lore.kernel.org/r/20200914153409.25097-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This change exposes write-combine mappings under sysfs for
prefetchable PCI resources on arm64.
Originally, the usage of "write combine" here was driven by the x86
definition of write combine. This definition is specific to x86 and
does not generalize to other architectures. However, the usage of WC
has mutated to "write combine" semantics, which is implemented
differently on each arch.
Generally, prefetchable BARs are accepted to allow speculative
accesses, write combining, and re-ordering-- from the PCI perspective,
this means there are no read side effects. (This contradicts the PCI
spec which allows prefetchable BARs to have read side effects, but
this definition is ill-advised as it is impossible to meet.) On x86,
prefetchable BARs are mapped as WC as originally defined (with some
conditionals on arch features). On arm64, WC is taken to mean normal
non-cacheable memory.
In practice, write combine semantics are used to minimize write
operations. A common usage of this is minimizing PCI TLPs which can
significantly improve performance with PCI devices. In order to
provide the same benefits to userspace, we need to allow userspace to
map prefetchable BARs with write combine semantics. The resourceX_wc
mapping is used today by userspace programs and libraries.
While this model is flawed as "write combine" is very ill-defined, it
is already used by multiple non-x86 archs to expose write combine
semantics to user space. We enable this on arm64 to give userspace on
arm64 an equivalent mechanism for utilizing write combining with PCI
devices.
Signed-off-by: Clint Sbisa <csbisa@amazon.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200918033312.ddfpibgfylfjpex2@amazon.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
lift the compat_s64 and compat_u64 definitions into common code using the
COMPAT_FOR_U64_ALIGNMENT symbol for the x86 special case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Let's switch the arm64 code to the core accounting, which already
does everything we need.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The old IPI registration interface is now unused on arm64, so let's
get rid of it.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The way we use the base of DRAM in the EFI stub is problematic as it
is ill defined what the base of DRAM actually means. There are some
restrictions on the placement of FDT and initrd which are defined in
terms of dram_base, but given that the placement of the kernel in
memory is what defines these boundaries (as on ARM, this is where the
linear region starts), it is better to use the image address in these
cases, and disregard dram_base altogether.
Reviewed-by: Maxim Uvarov <maxim.uvarov@linaro.org>
Tested-by: Maxim Uvarov <maxim.uvarov@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
To complete the transition to SMCCC, the hyp initialization is given a
function ID. This looks neater than comparing the hyp stub function IDs
to the page table physical address.
Some care is taken to only clobber x0-3 before the host context is saved
as only those registers can be clobbered accoring to SMCCC. Fortunately,
only a few acrobatics are needed. The possible new tpidr_el2 is moved to
the argument in x2 so that it can be stashed in tpidr_el2 early to free
up a scratch register. The page table configuration then makes use of
x0-2.
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-19-ascull@google.com
|
|
Rather than passing arbitrary function pointers to run at hyp, define
and equivalent set of SMCCC functions.
Since the SMCCC functions are strongly tied to the original function
prototypes, it is not expected for the host to ever call an invalid ID
but a warning is raised if this does ever occur.
As __kvm_vcpu_run is used for every switch between the host and a guest,
it is explicitly singled out to be identified before the other function
IDs to improve the performance of the hot path.
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-18-ascull@google.com
|
|
Restore the host context when panicking from hyp to give the best chance
of the panic being clean.
The host requires that registers be preserved such as x18 for the shadow
callstack. If the panic is caused by an exception from EL1, the host
context is still valid so the panic can return straight back to the
host. If the panic comes from EL2 then it's most likely that the hyp
context is active and the host context needs to be restored.
There are windows before and after the host context is saved and
restored that restoration is attempted incorrectly and the panic won't
be clean.
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-14-ascull@google.com
|
|
To avoid duplicating the context save and restore macros, move them into
a shareable header.
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-12-ascull@google.com
|
|
If the guest context is loaded when a panic is triggered, restore the
hyp context so e.g. the shadow call stack works when hyp_panic() is
called and SP_EL0 is valid when the host's panic() is called.
Use the hyp context's __hyp_running_vcpu field to track when hyp
transitions to and from the guest vcpu so the exception handlers know
whether the context needs to be restored.
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-11-ascull@google.com
|
|
Hyp now has its own nominal context for saving and restoring its state
when switching to and from a guest. Update the related comments and
utilities to match the new name.
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-10-ascull@google.com
|
|
During __guest_enter, save and restore from a new hyp context rather
than the host context. This is preparation for separation of the hyp and
host context in nVHE.
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-9-ascull@google.com
|
|
The host is treated differently from the guests when an exception is
taken so introduce a separate vector that is specialized for the host.
This also allows the nVHE specific code to move out of hyp-entry.S and
into nvhe/host.S.
The host is only expected to make HVC calls and anything else is
considered invalid and results in a panic.
Hyp initialization is now passed the vector that is used for the host
and it is swapped for the guest vector during the context switch.
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-7-ascull@google.com
|
|
Introduce a percpu variable to hold the address of the selected hyp
vector that will be used with guests. This avoids the selection process
each time a guest is being entered and can be used by nVHE when a
separate vector is introduced for the host.
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-6-ascull@google.com
|
|
Make CHOOSE_HYP_SYM select the symbol of the active hypervisor for the
host, the nVHE symbol for nVHE and the VHE symbol for VHE. The nVHE and
VHE hypervisors see their own symbols without prefixes and trigger a
link error when trying to use a symbol of the other hypervisor.
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20200915104643.2543892-5-ascull@google.com
|
|
The kvm_host_data_t typedef is used inconsistently and goes against the
kernel's coding style. Remove it in favour of the full struct specifier.
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-4-ascull@google.com
|
|
hyp_panic is able to find all the context it needs from within itself so
remove the argument. The __hyp_panic wrapper becomes redundant so is
also removed.
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-3-ascull@google.com
|
|
The function __{pgd, pud, pmd, pte}_error() are introduced so that
they can be called by {pgd, pud, pmd, pte}_ERROR(). However, some
of the functions could never be called when the corresponding page
table level isn't enabled. For example, __{pud, pmd}_error() are
unused when PUD and PMD are folded to PGD.
This removes __{pgd, pud, pmd, pte}_error() and call pr_err() from
{pgd, pud, pmd, pte}_ERROR() directly, similar to what x86/powerpc
are doing. With this, the code looks a bit simplified either.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20200913234730.23145-1-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Some Armv8.3 Pointer Authentication enhancements have been introduced
which are mandatory for Armv8.6 and optional for Armv8.3. These features
are,
* ARMv8.3-PAuth2 - An enhanced PAC generation logic is added which hardens
finding the correct PAC value of the authenticated pointer.
* ARMv8.3-FPAC - Fault is generated now when the ptrauth authentication
instruction fails in authenticating the PAC present in the address.
This is different from earlier case when such failures just adds an
error code in the top byte and waits for subsequent load/store to abort.
The ptrauth instructions which may cause this fault are autiasp, retaa
etc.
The above features are now represented by additional configurations
for the Address Authentication cpufeature and a new ESR exception class.
The userspace fault received in the kernel due to ARMv8.3-FPAC is treated
as Illegal instruction and hence signal SIGILL is injected with ILL_ILLOPN
as the signal code. Note that this is different from earlier ARMv8.3
ptrauth where signal SIGSEGV is issued due to Pointer authentication
failures. The in-kernel PAC fault causes kernel to crash.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20200914083656.21428-4-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Some error signal need to pass proper ARM esr error code to userspace to
better identify the cause of the signal. So the function
force_signal_inject is extended to pass this as a parameter. The
existing code is not affected by this change.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20200914083656.21428-3-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently the ARMv8.3-PAuth combined branch instructions (braa, retaa
etc.) are not simulated for out-of-line execution with a handler. Hence the
uprobe of such instructions leads to kernel warnings in a loop as they are
not explicitly checked and fall into INSN_GOOD categories. Other combined
instructions like LDRAA and LDRBB can be probed.
The issue of the combined branch instructions is fixed by adding
group definitions of all such instructions and rejecting their probes.
The instruction groups added are br_auth(braa, brab, braaz and brabz),
blr_auth(blraa, blrab, blraaz and blrabz), ret_auth(retaa and retab) and
eret_auth(eretaa and eretab).
Warning log:
WARNING: CPU: 0 PID: 156 at arch/arm64/kernel/probes/uprobes.c:182 uprobe_single_step_handler+0x34/0x50
Modules linked in:
CPU: 0 PID: 156 Comm: func Not tainted 5.9.0-rc3 #188
Hardware name: Foundation-v8A (DT)
pstate: 804003c9 (Nzcv DAIF +PAN -UAO BTYPE=--)
pc : uprobe_single_step_handler+0x34/0x50
lr : single_step_handler+0x70/0xf8
sp : ffff800012af3e30
x29: ffff800012af3e30 x28: ffff000878723b00
x27: 0000000000000000 x26: 0000000000000000
x25: 0000000000000000 x24: 0000000000000000
x23: 0000000060001000 x22: 00000000cb000022
x21: ffff800012065ce8 x20: ffff800012af3ec0
x19: ffff800012068d50 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000
x9 : ffff800010085c90 x8 : 0000000000000000
x7 : 0000000000000000 x6 : ffff80001205a9c8
x5 : ffff80001205a000 x4 : ffff80001233db80
x3 : ffff8000100a7a60 x2 : 0020000000000003
x1 : 0000fffffffff008 x0 : ffff800012af3ec0
Call trace:
uprobe_single_step_handler+0x34/0x50
single_step_handler+0x70/0xf8
do_debug_exception+0xb8/0x130
el0_sync_handler+0x138/0x1b8
el0_sync+0x158/0x180
Fixes: 74afda4016a7 ("arm64: compile the kernel with ptrauth return address signing")
Fixes: 04ca3204fa09 ("arm64: enable pointer authentication")
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20200914083656.21428-2-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The GIC's internal view of the priority mask register and the assigned
interrupt priorities are based on whether GIC security is enabled and
whether firmware routes Group 0 interrupts to EL3. At the moment, we
support priority masking when ICC_PMR_EL1 and interrupt priorities are
either both modified by the GIC, or both left unchanged.
Trusted Firmware-A's default interrupt routing model allows Group 0
interrupts to be delivered to the non-secure world (SCR_EL3.FIQ == 0).
Unfortunately, this is precisely the case that the GIC driver doesn't
support: ICC_PMR_EL1 remains unchanged, but the GIC's view of interrupt
priorities is different from the software programmed values.
Support pseudo-NMIs when SCR_EL3.FIQ == 0 by using a different value to
mask regular interrupts. All the other values remain the same.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200912153707.667731-3-alexandru.elisei@arm.com
|
|
In order to deal with IPIs as normal interrupts, let's add
a new way to register them with the architecture code.
set_smp_ipi_range() takes a range of interrupts, and allows
the arch code to request them as if the were normal interrupts.
A standard handler is then called by the core IRQ code to deal
with the IPI.
This means that we don't need to call irq_enter/irq_exit, and
that we don't need to deal with set_irq_regs either. So let's
move the dispatcher into its own function, and leave handle_IPI()
as a compatibility function.
On the sending side, let's make use of ipi_send_mask, which
already exists for this purpose.
One of the major difference is that we end up, in some cases
(such as when performing IRQ time accounting on the scheduler
IPI), end up with nested irq_enter()/irq_exit() pairs.
Other than the (relatively small) overhead, there should be
no consequences to it (these pairs are designed to nest
correctly, and the accounting shouldn't be off).
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"A bit on the bigger side, mostly due to me being on vacation, then
busy, then on parental leave, but there's nothing worrisome.
ARM:
- Multiple stolen time fixes, with a new capability to match x86
- Fix for hugetlbfs mappings when PUD and PMD are the same level
- Fix for hugetlbfs mappings when PTE mappings are enforced (dirty
logging, for example)
- Fix tracing output of 64bit values
x86:
- nSVM state restore fixes
- Async page fault fixes
- Lots of small fixes everywhere"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
KVM: emulator: more strict rsm checks.
KVM: nSVM: more strict SMM checks when returning to nested guest
SVM: nSVM: setup nested msr permission bitmap on nested state load
SVM: nSVM: correctly restore GIF on vmexit from nesting after migration
x86/kvm: don't forget to ACK async PF IRQ
x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro
KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit
KVM: SVM: avoid emulation with stale next_rip
KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN
KVM: SVM: Periodically schedule when unregistering regions on destroy
KVM: MIPS: Change the definition of kvm type
kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed
KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control
KVM: fix memory leak in kvm_io_bus_unregister_dev()
KVM: Check the allocation of pv cpu mask
KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
KVM: arm64: Update page shift if stage 2 block mapping not supported
KVM: arm64: Fix address truncation in traces
KVM: arm64: Do not try to map PUDs when they are folded into PMD
arm64/x86: KVM: Introduce steal-time cap
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for Linux 5.9, take #1
- Multiple stolen time fixes, with a new capability to match x86
- Fix for hugetlbfs mappings when PUD and PMD are the same level
- Fix for hugetlbfs mappings when PTE mappings are enforced
(dirty logging, for example)
- Fix tracing output of 64bit values
|
|
Similar to how CONT_PTE_SHIFT is determined, this introduces a new
kernel option (CONFIG_CONT_PMD_SHIFT) to determine CONT_PMD_SHIFT.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200910095936.20307-3-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
CONT_PTE_SHIFT actually depends on CONFIG_ARM64_CONT_SHIFT. It's
reasonable to reflect the dependency:
* This renames CONFIG_ARM64_CONT_SHIFT to CONFIG_ARM64_CONT_PTE_SHIFT,
so that we can introduce CONFIG_ARM64_CONT_PMD_SHIFT later.
* CONT_{SHIFT, SIZE, MASK}, defined in page-def.h are removed as they
are not used by anyone.
* CONT_PTE_SHIFT is determined by CONFIG_ARM64_CONT_PTE_SHIFT.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200910095936.20307-2-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The macro was introduced by commit <ecf35a237a85> ("arm64: PTE/PMD
contiguous bit definition") at the beginning. It's only used by
commit <348a65cdcbbf> ("arm64: Mark kernel page ranges contiguous"),
which was reverted later by commit <667c27597ca8>. This makes the
macro unused.
This removes the unused macro (CONT_RANGE_OFFSET).
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200910095936.20307-1-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
HWCAP name arrays (hwcap_str, compat_hwcap_str, compat_hwcap2_str) that are
scanned for /proc/cpuinfo are detached from their bit definitions making it
vulnerable and difficult to correlate. It is also bit problematic because
during /proc/cpuinfo dump these arrays get traversed sequentially assuming
they reflect and match actual HWCAP bit sequence, to test various features
for a given CPU. This redefines name arrays per their HWCAP bit definitions
. It also warns after detecting any feature which is not expected on arm64.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1599630535-29337-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The stage-2 page-tables are entirely encapsulated by the 'pgt' field of
'struct kvm_s2_mmu', so remove the unused 'pgd' field.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-21-will@kernel.org
|
|
Now that KVM is using the generic page-table code to manage the guest
stage-2 page-tables, we can remove a bunch of unused macros, #defines
and static inline functions from the old implementation.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-20-will@kernel.org
|
|
Add support for relaxing the permissions of a stage-2 mapping (i.e.
adding additional permissions) to the generic page-table code.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-17-will@kernel.org
|
|
Add support for cache flushing a range of the stage-2 address space to
the generic page-table code.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200911132529.19844-15-will@kernel.org
|
|
Add a stage-2 wrprotect() operation to the generic page-table code.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200911132529.19844-13-will@kernel.org
|
|
Add stage-2 mkyoung(), mkold() and is_young() operations to the generic
page-table code.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-11-will@kernel.org
|
|
Add stage-2 map() and unmap() operations to the generic page-table code.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-7-will@kernel.org
|
|
Introduce alloc() and free() functions to the generic page-table code
for guest stage-2 page-tables and plumb these into the existing KVM
page-table allocator. Subsequent patches will convert other operations
within the KVM allocator over to the generic code.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-6-will@kernel.org
|
|
Now that we have a shiny new page-table allocator, replace the hyp
page-table code with calls into the new API. This also allows us to
remove the extended idmap code, as we can now simply ensure that the
VA size is large enough to map everything we need.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-5-will@kernel.org
|
|
The generic page-table walker is pretty useless as it stands, because it
doesn't understand enough to allocate anything. Teach it about stage-1
page-tables, and hook up an API for allocating these for the hypervisor
at EL2.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-4-will@kernel.org
|
|
The KVM page-table code is intricately tied into the kernel page-table
code and re-uses the pte/pmd/pud/p4d/pgd macros directly in an attempt
to reduce code duplication. Unfortunately, the reality is that there is
an awful lot of code required to make this work, and at the end of the
day you're limited to creating page-tables with the same configuration
as the host kernel. Furthermore, lifting the page-table code to run
directly at EL2 on a non-VHE system (as we plan to to do in future
patches) is practically impossible due to the number of dependencies it
has on the core kernel.
Introduce a framework for walking Armv8 page-tables configured
independently from the host kernel.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-3-will@kernel.org
|
|
kvm_mmu_free_memory_caches() is only called by kvm_arch_vcpu_destroy(),
so inline the implementation and get rid of the extra function.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200911132529.19844-2-will@kernel.org
|