summaryrefslogtreecommitdiff
path: root/arch/powerpc/kvm
AgeCommit message (Collapse)AuthorFilesLines
2018-10-18KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page faultPaul Mackerras1-0/+10
commit 6579804c431712d56956a63b1a01509441cc6800 upstream. Commit 71d29f43b633 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size", 2018-09-11) added a call to __find_linux_pte() and a dereference of the returned PTE pointer to the radix page fault path in the common case where the page is normal system memory. Previously, __find_linux_pte() was only called for mappings to physical addresses which don't have a page struct (e.g. memory-mapped I/O) or where the page struct is marked as reserved memory. This exposes us to the possibility that the returned PTE pointer could be NULL, for example in the case of a concurrent THP collapse operation. Dereferencing the returned NULL pointer causes a host crash. To fix this, we check for NULL, and if it is NULL, we retry the operation by returning to the guest, with the expectation that it will generate the same page fault again (unless of course it has been fixed up by another CPU in the meantime). Fixes: 71d29f43b633 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size") Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping sizeNicholas Piggin1-54/+37
[ Upstream commit 71d29f43b6332badc5598c656616a62575e83342 ] THP paths can defer splitting compound pages until after the actual remap and TLB flushes to split a huge PMD/PUD. This causes radix partition scope page table mappings to get out of synch with the host qemu page table mappings. This results in random memory corruption in the guest when running with THP. The easiest way to reproduce is use KVM balloon to free up a lot of memory in the guest and then shrink the balloon to give the memory back, while some work is being done in the guest. Cc: David Gibson <david@gibson.dropbear.id.au> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: kvm-ppc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10KVM: PPC: Book3S HV: Don't truncate HPTE index in xlate functionPaul Mackerras1-1/+1
[ Upstream commit 46dec40fb741f00f1864580130779aeeaf24fb3d ] This fixes a bug which causes guest virtual addresses to get translated to guest real addresses incorrectly when the guest is using the HPT MMU and has more than 256GB of RAM, or more specifically has a HPT larger than 2GB. This has showed up in testing as a failure of the host to emulate doorbell instructions correctly on POWER9 for HPT guests with more than 256GB of RAM. The bug is that the HPTE index in kvmppc_mmu_book3s_64_hv_xlate() is stored as an int, and in forming the HPTE address, the index gets shifted left 4 bits as an int before being signed-extended to 64 bits. The simple fix is to make the variable a long int, matching the return type of kvmppc_hv_find_lock_hpte(), which is what calculates the index. Fixes: 697d3899dcb4 ("KVM: PPC: Implement MMIO emulation support for Book3S HV guests") Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26KVM: PPC: Book3S: Fix matching of hardware and emulated TCE tablesAlexey Kardashevskiy1-3/+2
[ Upstream commit 76346cd93a5eca33700f82685d56172dd65d4c0a ] When attaching a hardware table to LIOBN in KVM, we match table parameters such as page size, table offset and table size. However the tables are created via very different paths - VFIO and KVM - and the VFIO path goes through the platform code which has minimum TCE page size requirement (which is 4K but since we allocate memory by pages and cannot avoid alignment anyway, we align to 64k pages for powernv_defconfig). So when we match the tables, one might be bigger that the other which means the hardware table cannot get attached to LIOBN and DMA mapping fails. This removes the table size alignment from the guest visible table. This does not affect the memory allocation which is still aligned - kvmppc_tce_pages() takes care of this. This relaxes the check we do when attaching tables to allow the hardware table be bigger than the guest visible table. Ideally we want the KVM table to cover the same space as the hardware table does but since the hardware table may use multiple levels, and all levels must use the same table size (IODA2 design), the area it can actually cover might get very different from the window size which the guest requested, even though the guest won't map it all. Fixes: ca1fc489cf "KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages" Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26KVM: PPC: Book3S HV: Add of_node_put() in success pathNicholas Mc Guire1-0/+2
[ Upstream commit 51eaa08f029c7343df846325d7cf047be8b96e81 ] The call to of_find_compatible_node() is returning a pointer with incremented refcount so it must be explicitly decremented after the last use. As here it is only being used for checking of node presence but the result is not actually used in the success path it can be dropped immediately. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Fixes: commit f725758b899f ("KVM: PPC: Book3S HV: Use OPAL XICS emulation on POWER9") Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-19KVM: PPC: Book3S HV: Use correct pagesize in kvm_unmap_radix()Paul Mackerras1-3/+3
commit c066fafc595eef5ae3c83ae3a8305956b8c3ef15 upstream. Since commit e641a317830b ("KVM: PPC: Book3S HV: Unify dirty page map between HPT and radix", 2017-10-26), kvm_unmap_radix() computes the number of PAGE_SIZEd pages being unmapped and passes it to kvmppc_update_dirty_map(), which expects to be passed the page size instead. Consequently it will only mark one system page dirty even when a large page (for example a THP page) is being unmapped. The consequence of this is that part of the THP page might not get copied during live migration, resulting in memory corruption for the guest. This fixes it by computing and passing the page size in kvm_unmap_radix(). Cc: stable@vger.kernel.org # v4.15+ Fixes: e641a317830b (KVM: PPC: Book3S HV: Unify dirty page map between HPT and radix) Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09powerpc64/ftrace: Include ftrace.h needed for enable/disable callsLuke Dashjr1-0/+1
commit d6ee76d3d37d156c479348821574b6f99d6472a1 upstream. this_cpu_disable_ftrace and this_cpu_enable_ftrace are inlines in ftrace.h Without it included, the build fails. Fixes: a4bc64d305af ("powerpc64/ftrace: Disable ftrace during kvm entry/exit") Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Luke Dashjr <luke-jr+git@utopios.org> Acked-by: Naveen N. Rao <naveen.n.rao at linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-18KVM: PPC: Check if IOMMU page is contained in the pinned physical pageAlexey Kardashevskiy2-3/+5
A VM which has: - a DMA capable device passed through to it (eg. network card); - running a malicious kernel that ignores H_PUT_TCE failure; - capability of using IOMMU pages bigger that physical pages can create an IOMMU mapping that exposes (for example) 16MB of the host physical memory to the device when only 64K was allocated to the VM. The remaining 16MB - 64K will be some other content of host memory, possibly including pages of the VM, but also pages of host kernel memory, host programs or other VMs. The attacking VM does not control the location of the page it can map, and is only allowed to map as many pages as it has pages of RAM. We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that an IOMMU page is contained in the physical page so the PCI hardware won't get access to unassigned host memory; however this check is missing in the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and did not hit this yet as the very first time when the mapping happens we do not have tbl::it_userspace allocated yet and fall back to the userspace which in turn calls VFIO IOMMU driver, this fails and the guest does not retry, This stores the smallest preregistered page size in the preregistered region descriptor and changes the mm_iommu_xxx API to check this against the IOMMU page size. This calculates maximum page size as a minimum of the natural region alignment and compound page size. For the page shift this uses the shift returned by find_linux_pte() which indicates how the page is mapped to the current userspace - if the page is huge and this is not a zero, then it is a leaf pte and the page is mapped within the range. Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-14Merge tag 'kvm-ppc-next-4.18-2' of ↵Paolo Bonzini29-1173/+2224
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
2018-06-13Merge tag 'overflow-v4.18-rc1-part2' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull more overflow updates from Kees Cook: "The rest of the overflow changes for v4.18-rc1. This includes the explicit overflow fixes from Silvio, further struct_size() conversions from Matthew, and a bug fix from Dan. But the bulk of it is the treewide conversions to use either the 2-factor argument allocators (e.g. kmalloc(a * b, ...) into kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a * b) into vmalloc(array_size(a, b)). Coccinelle was fighting me on several fronts, so I've done a bunch of manual whitespace updates in the patches as well. Summary: - Error path bug fix for overflow tests (Dan) - Additional struct_size() conversions (Matthew, Kees) - Explicitly reported overflow fixes (Silvio, Kees) - Add missing kvcalloc() function (Kees) - Treewide conversions of allocators to use either 2-factor argument variant when available, or array_size() and array3_size() as needed (Kees)" * tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits) treewide: Use array_size in f2fs_kvzalloc() treewide: Use array_size() in f2fs_kzalloc() treewide: Use array_size() in f2fs_kmalloc() treewide: Use array_size() in sock_kmalloc() treewide: Use array_size() in kvzalloc_node() treewide: Use array_size() in vzalloc_node() treewide: Use array_size() in vzalloc() treewide: Use array_size() in vmalloc() treewide: devm_kzalloc() -> devm_kcalloc() treewide: devm_kmalloc() -> devm_kmalloc_array() treewide: kvzalloc() -> kvcalloc() treewide: kvmalloc() -> kvmalloc_array() treewide: kzalloc_node() -> kcalloc_node() treewide: kzalloc() -> kcalloc() treewide: kmalloc() -> kmalloc_array() mm: Introduce kvcalloc() video: uvesafb: Fix integer overflow in allocation UBIFS: Fix potential integer overflow in allocation leds: Use struct_size() in allocation Convert intel uncore to struct_size ...
2018-06-13KVM: PPC: Book3S PR: Fix failure status setting in tabort. emulationSimon Guo1-3/+3
tabort. will perform transaction failure recording and the recording depends on TEXASR FS bit. Currently the TEXASR FS bit is retrieved after tabort., when the TEXASR FS bit is already been updated by tabort. itself. This patch corrects this behavior by retrieving TEXASR val before tabort. tabort. will not immediately leads to transaction failure handling in suspend state. So this patch also remove the mtspr on TEXASR/TFIAR registers to avoid TM bad thing exception. Fixes: 26798f88d58d ("KVM: PPC: Book3S PR: Add emulation for tabort. in privileged state") Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-13KVM: PPC: Book3S PR: Enable use on POWER9 bare-metal hosts in HPT modePaul Mackerras1-6/+2
It turns out that PR KVM has no dependency on the format of HPTEs, because it uses functions pointed to by mmu_hash_ops which do all the formatting and interpretation of HPTEs. Thus we can allow PR KVM to load on POWER9 bare-metal hosts as long as they are running in HPT mode. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-13KVM: PPC: Book3S PR: Don't let PAPR guest set MSR hypervisor bitPaul Mackerras1-0/+4
PAPR guests run in supervisor mode and should not be able to set the MSR HV (hypervisor mode) bit or clear the ME (machine check enable) bit by mtmsrd or any other means. To enforce this, we force MSR_HV off and MSR_ME on in kvmppc_set_msr_pr. Without this, the guest can appear to be in hypervisor mode to itself and to userspace. This has been observed to cause a crash in QEMU when it tries to deliver a system reset interrupt to the guest. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-13KVM: PPC: Book3S PR: Fix failure status setting in treclaim. emulationPaul Mackerras1-9/+11
The treclaim. emulation needs to record failure status in the TEXASR register if the transaction had not previously failed. However, the current code first does kvmppc_save_tm_pr() (which does a treclaim. itself) and then checks the failure summary bit in TEXASR after that. Since treclaim. itself causes transaction failure, the FS bit is always set, so we were never updating TEXASR with the failure cause supplied by the guest as the RA parameter to the treclaim. instruction. This caused the tm-unavailable test in tools/testing/selftests/powerpc/tm to fail. To fix this, we need to read TEXASR before calling kvmppc_save_tm_pr(), and base the final value of TEXASR on that value. Fixes: 03c81682a90b ("KVM: PPC: Book3S PR: Add emulation for treclaim.") Reviewed-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-13KVM: PPC: Book3S PR: Fix MSR setting when delivering interruptsPaul Mackerras2-33/+22
This makes sure that MSR "partial-function" bits are not transferred to SRR1 when delivering an interrupt. This was causing failures in guests running kernels that include commit f3d96e698ed0 ("powerpc/mm: Overhaul handling of bad page faults", 2017-07-19), which added code to check bits of SRR1 on instruction storage interrupts (ISIs) that indicate a bad page fault. The symptom was that a guest user program that handled a signal and attempted to return from the signal handler would get a SIGBUS signal and die. The code that generated ISIs and some other interrupts would previously set bits in the guest MSR to indicate the interrupt status and then call kvmppc_book3s_queue_irqprio(). This technique no longer works now that kvmppc_inject_interrupt() is masking off those bits. Instead we make kvmppc_core_queue_data_storage() and kvmppc_core_queue_inst_storage() call kvmppc_inject_interrupt() directly, and make sure that all the places that generate ISIs or DSIs call kvmppc_core_queue_{data,inst}_storage instead of kvmppc_book3s_queue_irqprio(). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-13KVM: PPC: Book3S PR: Handle additional interrupt typesCameron Kaiser1-0/+3
This adds trivial handling for additional interrupt types that KVM-PR must support for proper virtualization on a POWER9 host in HPT mode, as a further prerequisite to enabling KVM-PR on that configuration. Signed-off-by: Cameron Kaiser <spectre@floodgap.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-13treewide: Use array_size() in vzalloc()Kees Cook1-1/+1
The vzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vzalloc(a * b) with: vzalloc(array_size(a, b)) as well as handling cases of: vzalloc(a * b * c) with: vzalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vzalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(char) * COUNT + COUNT , ...) | vzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vzalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vzalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vzalloc(C1 * C2 * C3, ...) | vzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vzalloc(C1 * C2, ...) | vzalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-13treewide: Use array_size() in vmalloc()Kees Cook1-1/+1
The vmalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vmalloc(a * b) with: vmalloc(array_size(a, b)) as well as handling cases of: vmalloc(a * b * c) with: vmalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vmalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(char) * COUNT + COUNT , ...) | vmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vmalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vmalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vmalloc(C1 * C2 * C3, ...) | vmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vmalloc(C1 * C2, ...) | vmalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-3/+2
Pull KVM updates from Paolo Bonzini: "Small update for KVM: ARM: - lazy context-switching of FPSIMD registers on arm64 - "split" regions for vGIC redistributor s390: - cleanups for nested - clock handling - crypto - storage keys - control register bits x86: - many bugfixes - implement more Hyper-V super powers - implement lapic_timer_advance_ns even when the LAPIC timer is emulated using the processor's VMX preemption timer. - two security-related bugfixes at the top of the branch" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (79 commits) kvm: fix typo in flag name kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system KVM: x86: introduce linear_{read,write}_system kvm: nVMX: Enforce cpl=0 for VMX instructions kvm: nVMX: Add support for "VMWRITE to any supported field" kvm: nVMX: Restrict VMX capability MSR changes KVM: VMX: Optimize tscdeadline timer latency KVM: docs: nVMX: Remove known limitations as they do not exist now KVM: docs: mmu: KVM support exposing SLAT to guests kvm: no need to check return value of debugfs_create functions kvm: Make VM ioctl do valloc for some archs kvm: Change return type to vm_fault_t KVM: docs: mmu: Fix link to NPT presentation from KVM Forum 2008 kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentation KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capability KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation KVM: introduce kvm_make_vcpus_request_mask() API KVM: x86: hyperv: do rep check for each hypercall separately ...
2018-06-07Merge tag 'powerpc-4.18-1' of ↵Linus Torvalds3-5/+38
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Support for split PMD page table lock on 64-bit Book3S (Power8/9). - Add support for HAVE_RELIABLE_STACKTRACE, so we properly support live patching again. - Add support for patching barrier_nospec in copy_from_user() and syscall entry. - A couple of fixes for our data breakpoints on Book3S. - A series from Nick optimising TLB/mm handling with the Radix MMU. - Numerous small cleanups to squash sparse/gcc warnings from Mathieu Malaterre. - Several series optimising various parts of the 32-bit code from Christophe Leroy. - Removal of support for two old machines, "SBC834xE" and "C2K" ("GEFanuc,C2K"), which is why the diffstat has so many deletions. And many other small improvements & fixes. There's a few out-of-area changes. Some minor ftrace changes OK'ed by Steve, and a fix to our powernv cpuidle driver. Then there's a series touching mm, x86 and fs/proc/task_mmu.c, which cleans up some details around pkey support. It was ack'ed/reviewed by Ingo & Dave and has been in next for several weeks. Thanks to: Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al Viro, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Balbir Singh, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Dave Hansen, Fabio Estevam, Finn Thain, Frederic Barrat, Gautham R. Shenoy, Haren Myneni, Hari Bathini, Ingo Molnar, Jonathan Neuschäfer, Josh Poimboeuf, Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu Malaterre, Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao, Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica Gupta, Ravi Bangoria, Russell Currey, Sam Bobroff, Samuel Mendoza-Jonas, Segher Boessenkool, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stewart Smith, Thiago Jung Bauermann, Torsten Duwe, Vaibhav Jain, Wei Yongjun, Wolfram Sang, Yisheng Xie, YueHaibing" * tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (251 commits) powerpc/64s/radix: Fix missing ptesync in flush_cache_vmap cpuidle: powernv: Fix promotion from snooze if next state disabled powerpc: fix build failure by disabling attribute-alias warning in pci_32 ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait() powerpc-opal: fix spelling mistake "Uniterrupted" -> "Uninterrupted" powerpc: fix spelling mistake: "Usupported" -> "Unsupported" powerpc/pkeys: Detach execute_only key on !PROT_EXEC powerpc/powernv: copy/paste - Mask SO bit in CR powerpc: Remove core support for Marvell mv64x60 hostbridges powerpc/boot: Remove core support for Marvell mv64x60 hostbridges powerpc/boot: Remove support for Marvell mv64x60 i2c controller powerpc/boot: Remove support for Marvell MPSC serial controller powerpc/embedded6xx: Remove C2K board support powerpc/lib: optimise PPC32 memcmp powerpc/lib: optimise 32 bits __clear_user() powerpc/time: inline arch_vtime_task_switch() powerpc/Makefile: set -mcpu=860 flag for the 8xx powerpc: Implement csum_ipv6_magic in assembly powerpc/32: Optimise __csum_partial() powerpc/lib: Adjust .balign inside string functions for PPC32 ...
2018-06-01kvm: no need to check return value of debugfs_create functionsGreg Kroah-Hartman1-2/+1
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. This cleans up the error handling a lot, as this code will never get hit. Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim KrÄmář" <rkrcmar@redhat.com> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: kvm-ppc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: kvm@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-01kvm: Change return type to vm_fault_tSouptick Joarder1-1/+1
Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. commit 1c8f422059ae ("mm: change return type to vm_fault_t") Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-01KVM: PPC: Book3S PR: Enable kvmppc_get/set_one_reg_pr() for HTM registersSimon Guo1-0/+133
We need to migrate PR KVM during transaction and userspace will use kvmppc_get_one_reg_pr()/kvmppc_set_one_reg_pr() APIs to get/set transaction checkpoint state. This patch adds support for that. So far, QEMU on PR KVM doesn't fully function for migration but the savevm/loadvm can be done against a RHEL72 guest. During savevm/ loadvm procedure, the kvm ioctls will be invoked as well. Test has been performed to savevm/loadvm for a guest running a HTM test program: https://github.com/justdoitqd/publicFiles/blob/master/test-tm-mig.c Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S: Remove load/put vcpu for KVM_GET_REGS/KVM_SET_REGSSimon Guo1-6/+0
In both HV and PR KVM, the KVM_SET_REGS/KVM_GET_REGS ioctl should be able to perform without the vcpu loaded. Since the vcpu mutex locking/unlock has been moved out of vcpu_load() /vcpu_put(), KVM_SET_REGS/KVM_GET_REGS don't need to do ioctl with the vcpu loaded anymore. This patch removes vcpu_load()/vcpu_put() from KVM_SET_REGS/KVM_GET_REGS ioctl. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Remove load/put vcpu for KVM_GET/SET_ONE_REG ioctlSimon Guo1-2/+0
Since the vcpu mutex locking/unlock has been moved out of vcpu_load() /vcpu_put(), KVM_GET_ONE_REG and KVM_SET_ONE_REG doesn't need to do ioctl with loading vcpu anymore. This patch removes vcpu_load()/vcpu_put() from KVM_GET_ONE_REG and KVM_SET_ONE_REG ioctl. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctlSimon Guo1-3/+6
Although we already have kvm_arch_vcpu_async_ioctl() which doesn't require ioctl to load vcpu, the sync ioctl code need to be cleaned up when CONFIG_HAVE_KVM_VCPU_ASYNC_IOCTL is not configured. This patch moves vcpu_load/vcpu_put down to each ioctl switch case so that each ioctl can decide to do vcpu_load/vcpu_put or not independently. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Enable HTM for PR KVM for KVM_CHECK_EXTENSION ioctlSimon Guo1-3/+2
With current patch set, PR KVM now supports HTM. So this patch turns it on for PR KVM. Tested with: https://github.com/justdoitqd/publicFiles/blob/master/test_kvm_htm_cap.c Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Support TAR handling for PR KVM HTMSimon Guo3-7/+36
Currently guest kernel doesn't handle TAR facility unavailable and it always runs with TAR bit on. PR KVM will lazily enable TAR. TAR is not a frequent-use register and it is not included in SVCPU struct. Due to the above, the checkpointed TAR val might be a bogus TAR val. To solve this issue, we will make vcpu->arch.fscr tar bit consistent with shadow_fscr when TM is enabled. At the end of emulating treclaim., the correct TAR val need to be loaded into the register if FSCR_TAR bit is on. At the beginning of emulating trechkpt., TAR needs to be flushed so that the right tar val can be copied into tar_tm. Tested with: tools/testing/selftests/powerpc/tm/tm-tar tools/testing/selftests/powerpc/ptrace/ptrace-tm-tar (remove DSCR/PPR related testing). Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Add guard code to prevent returning to guest with PR=0 ↵Simon Guo3-2/+19
and Transactional state Currently PR KVM doesn't support transaction memory in guest privileged state. This patch adds a check at setting guest msr, so that we can never return to guest with PR=0 and TS=0b10. A tabort will be emulated to indicate this and fail transaction immediately. [paulus@ozlabs.org - don't change the TM_CAUSE_MISC definition, instead use TM_CAUSE_KVM_FAC_UNAV.] Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Add emulation for tabort. in privileged stateSimon Guo1-0/+68
Currently privileged-state guest will be run with TM disabled. Although the privileged-state guest cannot initiate a new transaction, it can use tabort to terminate its problem state's transaction. So it is still necessary to emulate tabort. for privileged-state guest. Tested with: https://github.com/justdoitqd/publicFiles/blob/master/test_tabort.c Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Add emulation for trechkpt.Simon Guo2-1/+62
This patch adds host emulation when guest PR KVM executes "trechkpt.", which is a privileged instruction and will trap into host. We firstly copy vcpu ongoing content into vcpu tm checkpoint content, then perform kvmppc_restore_tm_pr() to do trechkpt. with updated vcpu tm checkpoint values. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Add emulation for treclaim.Simon Guo1-0/+76
This patch adds support for "treclaim." emulation when PR KVM guest executes treclaim. and traps to host. We will firstly do treclaim. and save TM checkpoint. Then it is necessary to update vcpu current reg content with checkpointed vals. When rfid into guest again, those vcpu current reg content (now the checkpoint vals) will be loaded into regs. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Restore NV regs after emulating mfspr from TM SPRsSimon Guo1-2/+13
Currently kvmppc_handle_fac() will not update NV GPRs and thus it can return with GUEST_RESUME. However PR KVM guest always disables MSR_TM bit in privileged state. If PR privileged-state guest is trying to read TM SPRs, it will trigger TM facility unavailable exception and fall into kvmppc_handle_fac(). Then the emulation will be done by kvmppc_core_emulate_mfspr_pr(). The mfspr instruction can include a RT with NV reg. So it is necessary to restore NV GPRs at this case, to reflect the update to NV RT. This patch make kvmppc_handle_fac() return GUEST_RESUME_NV for TM facility unavailable exceptions in guest privileged state. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Always fail transactions in guest privileged stateSimon Guo2-1/+50
Currently the kernel doesn't use transaction memory. And there is an issue for privileged state in the guest that: tbegin/tsuspend/tresume/tabort TM instructions can impact MSR TM bits without trapping into the PR host. So following code will lead to a false mfmsr result: tbegin <- MSR bits update to Transaction active. beq <- failover handler branch mfmsr <- still read MSR bits from magic page with transaction inactive. It is not an issue for non-privileged guest state since its mfmsr is not patched with magic page and will always trap into the PR host. This patch will always fail tbegin attempt for privileged state in the guest, so that the above issue is prevented. It is benign since currently (guest) kernel doesn't initiate a transaction. Test case: https://github.com/justdoitqd/publicFiles/blob/master/test_tbegin_pr.c Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Emulate mtspr/mfspr using active TM SPRsSimon Guo2-11/+49
The mfspr/mtspr on TM SPRs(TEXASR/TFIAR/TFHAR) are non-privileged instructions and can be executed by PR KVM guest in problem state without trapping into the host. We only emulate mtspr/mfspr texasr/tfiar/tfhar in guest PR=0 state. When we are emulating mtspr tm sprs in guest PR=0 state, the emulation result needs to be visible to guest PR=1 state. That is, the actual TM SPR val should be loaded into actual registers. We already flush TM SPRs into vcpu when switching out of CPU, and load TM SPRs when switching back. This patch corrects mfspr()/mtspr() emulation for TM SPRs to make the actual source/dest be the actual TM SPRs. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Add math support for PR KVM HTMSimon Guo1-0/+35
The math registers will be saved into vcpu->arch.fp/vr and corresponding vcpu->arch.fp_tm/vr_tm area. We flush or giveup the math regs into vcpu->arch.fp/vr before saving transaction. After transaction is restored, the math regs will be loaded back into regs. If there is a FP/VEC/VSX unavailable exception during transaction active state, the math checkpoint content might be incorrect and we need to do treclaim./load the correct checkpoint val/trechkpt. sequence to retry the transaction. That will make our solution complicated. To solve this issue, we always make the hardware guest MSR math bits (shadow_msr) consistent with the MSR val which guest sees (kvmppc_get_msr()) when guest msr is with tm enabled. Then all FP/VEC/VSX unavailable exception can be delivered to guest and guest handles the exception by itself. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Add transaction memory save/restore skeletonSimon Guo1-0/+27
The transaction memory checkpoint area save/restore behavior is triggered when VCPU qemu process is switching out/into CPU, i.e. at kvmppc_core_vcpu_put_pr() and kvmppc_core_vcpu_load_pr(). MSR TM active state is determined by TS bits: active: 10(transactional) or 01 (suspended) inactive: 00 (non-transactional) We don't "fake" TM functionality for guest. We "sync" guest virtual MSR TM active state(10 or 01) with shadow MSR. That is to say, we don't emulate a transactional guest with a TM inactive MSR. TM SPR support(TFIAR/TFAR/TEXASR) has already been supported by commit 9916d57e64a4 ("KVM: PPC: Book3S PR: Expose TM registers"). Math register support (FPR/VMX/VSX) will be done at subsequent patch. Whether TM context need to be saved/restored can be determined by kvmppc_get_msr() TM active state: * TM active - save/restore TM context * TM inactive - no need to do so and only save/restore TM SPRs. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Add kvmppc_save/restore_tm_sprs() APIsSimon Guo1-0/+22
This patch adds 2 new APIs, kvmppc_save_tm_sprs() and kvmppc_restore_tm_sprs(), for the purpose of TEXASR/TFIAR/TFHAR save/restore. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Add new kvmppc_copyto/from_vcpu_tm APIsSimon Guo1-0/+41
This patch adds 2 new APIs: kvmppc_copyto_vcpu_tm() and kvmppc_copyfrom_vcpu_tm(). These 2 APIs will be used to copy from/to TM data between VCPU_TM/VCPU area. PR KVM will use these APIs for treclaim. or trechkpt. emulation. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Avoid changing TS bits when exiting guestSimon Guo1-0/+13
PR KVM host usually runs with TM enabled in its host MSR value, and with non-transactional TS value. When a guest with TM active traps into PR KVM host, the rfid at the tail of kvmppc_interrupt_pr() will try to switch TS bits from S0 (Suspended & TM disabled) to N1 (Non-transactional & TM enabled). That will leads to TM Bad Thing interrupt. This patch manually sets target TS bits unchanged to avoid this exception. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Implement RFID TM behavior to suppress change from S0 to N0Simon Guo1-2/+19
According to ISA specification for RFID, in MSR TM disabled and TS suspended state (S0), if the target MSR is TM disabled and TS state is inactive (N0), rfid should suppress this update. This patch makes the RFID emulation of PR KVM consistent with this. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Sync TM bits to shadow msr for problem state guestSimon Guo1-23/+50
MSR TS bits can be modified with non-privileged instruction such as tbegin./tend. That means guest can change MSR value "silently" without notifying host. It is necessary to sync the TM bits to host so that host can calculate shadow msr correctly. Note, privileged mode in the guest will always fail transactions so we only take care of problem state mode in the guest. The logic is put into kvmppc_copy_from_svcpu() so that kvmppc_handle_exit_pr() can use correct MSR TM bits even when preemption occurs. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Pass through MSR TM and TS bits to shadow_msrSimon Guo1-0/+5
PowerPC TM functionality needs MSR TM/TS bits support in hardware level. Guest TM functionality can not be emulated with "fake" MSR (msr in magic page) TS bits. This patch syncs TM/TS bits in shadow_msr with the MSR value in magic page, so that the MSR TS value which guest sees is consistent with actual MSR bits running in guest. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Transition to Suspended state when injecting interruptSimon Guo1-1/+10
This patch simulates interrupt behavior per Power ISA while injecting interrupt in PR KVM: - When interrupt happens, transactional state should be suspended. kvmppc_mmu_book3s_64_reset_msr() will be invoked when injecting an interrupt. This patch performs this ISA logic in kvmppc_mmu_book3s_64_reset_msr(). Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Add C function wrapper for _kvmppc_save/restore_tm()Simon Guo2-5/+95
Currently __kvmppc_save/restore_tm() APIs can only be invoked from assembly function. This patch adds C function wrappers for them so that they can be safely called from C function. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Turn on FP/VSX/VMX MSR bits in kvmppc_save_tm()Simon Guo1-0/+2
kvmppc_save_tm() invokes store_fp_state/store_vr_state(). So it is mandatory to turn on FP/VSX/VMX MSR bits for its execution, just like what kvmppc_restore_tm() did. Previously HV KVM has turned the bits on outside of function kvmppc_save_tm(). Now we include this bit change in kvmppc_save_tm() so that the logic is cleaner. And PR KVM can reuse it later. Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01KVM: PPC: Book3S PR: Add guest MSR parameter for ↵Simon Guo2-47/+57
kvmppc_save_tm()/kvmppc_restore_tm() HV KVM and PR KVM need different MSR source to indicate whether treclaim. or trecheckpoint. is necessary. This patch add new parameter (guest MSR) for these kvmppc_save_tm/ kvmppc_restore_tm() APIs: - For HV KVM, it is VCPU_MSR - For PR KVM, it is current host MSR or VCPU_SHADOW_SRR1 This enhancement enables these 2 APIs to be reused by PR KVM later. And the patch keeps HV KVM logic unchanged. This patch also reworks kvmppc_save_tm()/kvmppc_restore_tm() to have a clean ABI: r3 for vcpu and r4 for guest_msr. During kvmppc_save_tm/kvmppc_restore_tm(), the R1 need to be saved or restored. Currently the R1 is saved into HSTATE_HOST_R1. In PR KVM, we are going to add a C function wrapper for kvmppc_save_tm/kvmppc_restore_tm() where the R1 will be incremented with added stackframe and save into HSTATE_HOST_R1. There are several places in HV KVM to load HSTATE_HOST_R1 as R1, and we don't want to bring risk or confusion by TM code. This patch will use HSTATE_SCRATCH2 to save/restore R1 in kvmppc_save_tm/kvmppc_restore_tm() to avoid future confusion, since the r1 is actually a temporary/scratch value to be saved/stored. [paulus@ozlabs.org - rebased on top of 7b0e827c6970 ("KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)] Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-31KVM: PPC: Book3S PR: Move kvmppc_save_tm/kvmppc_restore_tm to separate fileSimon Guo3-231/+282
It is a simple patch just for moving kvmppc_save_tm/kvmppc_restore_tm() functionalities to tm.S. There is no logic change. The reconstruct of those APIs will be done in later patches to improve readability. It is for preparation of reusing those APIs on both HV/PR PPC KVM. Some slight change during move the functions includes: - surrounds some HV KVM specific code with CONFIG_KVM_BOOK3S_HV_POSSIBLE for compilation. - use _GLOBAL() to define kvmppc_save_tm/kvmppc_restore_tm() [paulus@ozlabs.org - rebased on top of 7b0e827c6970 ("KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)] Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-31KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tmPaul Mackerras1-69/+126
This splits out the handling of "fake suspend" mode, part of the hypervisor TM assist code for POWER9, and puts almost all of it in new kvmppc_save_tm_hv and kvmppc_restore_tm_hv functions. The new functions branch to kvmppc_save/restore_tm if the CPU does not require hypervisor TM assistance. With this, it will be more straightforward to move kvmppc_save_tm and kvmppc_restore_tm to another file and use them for transactional memory support in PR KVM. Additionally, it also makes the code a bit clearer and reduces the number of feature sections. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-31KVM: PPC: Book3S PR: Allow KVM_PPC_CONFIGURE_V3_MMU to succeedPaul Mackerras1-0/+12
Currently, PR KVM does not implement the configure_mmu operation, and so the KVM_PPC_CONFIGURE_V3_MMU ioctl always fails with an EINVAL error. This causes recent kernels to fail to boot as a PR KVM guest on POWER9, since recent kernels booted in HPT mode do the H_REGISTER_PROC_TBL hypercall, which causes userspace (QEMU) to do KVM_PPC_CONFIGURE_V3_MMU, which fails. This implements a minimal configure_mmu operation for PR KVM. It succeeds only if the MMU is being configured for HPT mode and no process table is being registered. This is enough to get recent kernels to boot as a PR KVM guest. Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>