summaryrefslogtreecommitdiff
path: root/arch/powerpc/kvm
AgeCommit message (Collapse)AuthorFilesLines
2020-12-04powerpc: Fix update form addressing in inline assemblyChristophe Leroy1-2/+2
In several places, inline assembly uses the "%Un" modifier to enable the use of instruction with update form addressing, but the associated "<>" constraint is missing. As mentioned in previous patch, this fails with gcc 4.9, so "<>" can't be used directly. Use UPD_CONSTR macro everywhere %Un modifier is used. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/62eab5ca595485c192de1765bdac099f633a21d0.1603358942.git.christophe.leroy@csgroup.eu
2020-12-03KVM: PPC: BOOK3S: PR: Ignore UAMOR SPRAneesh Kumar K.V1-0/+6
With power7 and above we expect the cpu to support keys. The number of keys are firmware controlled based on device tree. PR KVM do not expose key details via device tree. Hence when running with PR KVM we do run with MMU_FTR_KEY support disabled. But we can still get updates on UAMOR. Hence ignore access to them and for mfstpr return 0 indicating no AMR/IAMR update is no allowed. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-3-aneesh.kumar@linux.ibm.com
2020-12-03KVM: PPC: Book3S HV: Ratelimit machine check messages coming from guestsNicholas Piggin1-4/+16
A number of machine check exceptions are triggerable by the guest. Ratelimit these to avoid a guest flooding the host console and logs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use dedicated ratelimit state, not printk_ratelimit()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201128070728.825934-5-npiggin@gmail.com
2020-12-03KVM: PPC: Book3S HV: Don't attempt to recover machine checks for FWNMI ↵Nicholas Piggin1-7/+16
enabled guests Guests that can deal with machine checks would actually prefer the hypervisor not to try recover for them. For example if SLB multi-hits are recovered by the hypervisor by clearing the SLB then the guest will not be able to log the contents and debug its programming error. If guests don't register for FWNMI, they may not be so capable and so the hypervisor will continue to recover for those. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201128070728.825934-4-npiggin@gmail.com
2020-11-25Merge branch 'fixes' into nextMichael Ellerman1-0/+7
Merge our fixes branch, in particular to bring in the changes for the entry/uaccess flush.
2020-11-19powerpc: Replace RFI by rfi on book3s/32 and bookeChristophe Leroy1-2/+2
For book3s/32 and for booke, RFI is just an rfi. Only 40x has a non trivial RFI. CONFIG_PPC_RTAS is never selected by 40x platforms. Make it more explicit by replacing RFI by rfi wherever possible. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b901ddfdeb8a0a3b7cb59999599cdfde1bbfe834.1604854583.git.christophe.leroy@csgroup.eu
2020-11-19powerpc/mm: Update tlbiel loop on POWER10Aneesh Kumar K.V2-2/+16
With POWER10, single tlbiel instruction invalidates all the congruence class of the TLB and hence we need to issue only one tlbiel with SET=0. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201007053305.232879-1-aneesh.kumar@linux.ibm.com
2020-11-16KVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB pageCédric Le Goater1-0/+7
When accessing the ESB page of a source interrupt, the fault handler will retrieve the page address from the XIVE interrupt 'xive_irq_data' structure. If the associated KVM XIVE interrupt is not valid, that is not allocated at the HW level for some reason, the fault handler will dereference a NULL pointer leading to the oops below : WARNING: CPU: 40 PID: 59101 at arch/powerpc/kvm/book3s_xive_native.c:259 xive_native_esb_fault+0xe4/0x240 [kvm] CPU: 40 PID: 59101 Comm: qemu-system-ppc Kdump: loaded Tainted: G W --------- - - 4.18.0-240.el8.ppc64le #1 NIP: c00800000e949fac LR: c00000000044b164 CTR: c00800000e949ec8 REGS: c000001f69617840 TRAP: 0700 Tainted: G W --------- - - (4.18.0-240.el8.ppc64le) MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44044282 XER: 00000000 CFAR: c00000000044b160 IRQMASK: 0 GPR00: c00000000044b164 c000001f69617ac0 c00800000e96e000 c000001f69617c10 GPR04: 05faa2b21e000080 0000000000000000 0000000000000005 ffffffffffffffff GPR08: 0000000000000000 0000000000000001 0000000000000000 0000000000000001 GPR12: c00800000e949ec8 c000001ffffd3400 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 c000001f5c065160 c000000001c76f90 GPR24: c000001f06f20000 c000001f5c065100 0000000000000008 c000001f0eb98c78 GPR28: c000001dcab40000 c000001dcab403d8 c000001f69617c10 0000000000000011 NIP [c00800000e949fac] xive_native_esb_fault+0xe4/0x240 [kvm] LR [c00000000044b164] __do_fault+0x64/0x220 Call Trace: [c000001f69617ac0] [0000000137a5dc20] 0x137a5dc20 (unreliable) [c000001f69617b50] [c00000000044b164] __do_fault+0x64/0x220 [c000001f69617b90] [c000000000453838] do_fault+0x218/0x930 [c000001f69617bf0] [c000000000456f50] __handle_mm_fault+0x350/0xdf0 [c000001f69617cd0] [c000000000457b1c] handle_mm_fault+0x12c/0x310 [c000001f69617d10] [c00000000007ef44] __do_page_fault+0x264/0xbb0 [c000001f69617df0] [c00000000007f8c8] do_page_fault+0x38/0xd0 [c000001f69617e30] [c00000000000a714] handle_page_fault+0x18/0x38 Instruction dump: 40c2fff0 7c2004ac 2fa90000 409e0118 73e90001 41820080 e8bd0008 7c2004ac 7ca90074 39400000 915c0000 7929d182 <0b090000> 2fa50000 419e0080 e89e0018 ---[ end trace 66c6ff034c53f64f ]--- xive-kvm: xive_native_esb_fault: accessing invalid ESB page for source 8 ! Fix that by checking the validity of the KVM XIVE interrupt structure. Fixes: 6520ca64cde7 ("KVM: PPC: Book3S HV: XIVE: Add a mapping for the source ESB pages") Cc: stable@vger.kernel.org # v5.2+ Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201105134713.656160-1-clg@kaod.org
2020-10-26treewide: Convert macro and uses of __section(foo) to __section("foo")Joe Perches1-1/+1
Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds12-49/+110
Pull KVM updates from Paolo Bonzini: "For x86, there is a new alternative and (in the future) more scalable implementation of extended page tables that does not need a reverse map from guest physical addresses to host physical addresses. For now it is disabled by default because it is still lacking a few of the existing MMU's bells and whistles. However it is a very solid piece of work and it is already available for people to hammer on it. Other updates: ARM: - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation PPC: - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes x86: - allow trapping unknown MSRs to userspace - allow userspace to force #GP on specific MSRs - INVPCID support on AMD - nested AMD cleanup, on demand allocation of nested SVM state - hide PV MSRs and hypercalls for features not enabled in CPUID - new test for MSR_IA32_TSC writes from host and guest - cleanups: MMU, CPUID, shared MSRs - LAPIC latency optimizations ad bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits) kvm: x86/mmu: NX largepage recovery for TDP MMU kvm: x86/mmu: Don't clear write flooding count for direct roots kvm: x86/mmu: Support MMIO in the TDP MMU kvm: x86/mmu: Support write protection for nesting in tdp MMU kvm: x86/mmu: Support disabling dirty logging for the tdp MMU kvm: x86/mmu: Support dirty logging for the TDP MMU kvm: x86/mmu: Support changed pte notifier in tdp MMU kvm: x86/mmu: Add access tracking for tdp_mmu kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU kvm: x86/mmu: Add TDP MMU PF handler kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg kvm: x86/mmu: Support zapping SPTEs in the TDP MMU KVM: Cache as_id in kvm_memory_slot kvm: x86/mmu: Add functions to handle changed TDP SPTEs kvm: x86/mmu: Allocate and free TDP MMU roots kvm: x86/mmu: Init / Uninit the TDP MMU kvm: x86/mmu: Introduce tdp_iter KVM: mmu: extract spte.h and spte.c KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp ...
2020-10-22KVM: PPC: Book3S HV: Make struct kernel_param_ops definition constJoe Perches1-1/+1
This should be const, so make it so. Signed-off-by: Joe Perches <joe@perches.com> Message-Id: <d130e88dd4c82a12d979da747cc0365c72c3ba15.1601770305.git.joe@perches.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-16Merge tag 'powerpc-5.10-1' of ↵Linus Torvalds2-0/+15
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A series from Nick adding ARCH_WANT_IRQS_OFF_ACTIVATE_MM & selecting it for powerpc, as well as a related fix for sparc. - Remove support for PowerPC 601. - Some fixes for watchpoints & addition of a new ptrace flag for detecting ISA v3.1 (Power10) watchpoint features. - A fix for kernels using 4K pages and the hash MMU on bare metal Power9 systems with > 16TB of RAM, or RAM on the 2nd node. - A basic idle driver for shallow stop states on Power10. - Tweaks to our sched domains code to better inform the scheduler about the hardware topology on Power9/10, where two SMT4 cores can be presented by firmware as an SMT8 core. - A series doing further reworks & cleanups of our EEH code. - Addition of a filter for RTAS (firmware) calls done via sys_rtas(), to prevent root from overwriting kernel memory. - Other smaller features, fixes & cleanups. Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Athira Rajeev, Biwen Li, Cameron Berkenpas, Cédric Le Goater, Christophe Leroy, Christoph Hellwig, Colin Ian King, Daniel Axtens, David Dai, Finn Thain, Frederic Barrat, Gautham R. Shenoy, Greg Kurz, Gustavo Romero, Ira Weiny, Jason Yan, Joel Stanley, Jordan Niethe, Kajol Jain, Konrad Rzeszutek Wilk, Laurent Dufour, Leonardo Bras, Liu Shixin, Luca Ceresoli, Madhavan Srinivasan, Mahesh Salgaonkar, Nathan Lynch, Nicholas Mc Guire, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Pedro Miraglia Franco de Carvalho, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Ravi Bangoria, Russell Currey, Satheesh Rajendran, Scott Cheloha, Segher Boessenkool, Srikar Dronamraju, Stan Johnson, Stephen Kitt, Stephen Rothwell, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wang Wensheng, Wolfram Sang, Yang Yingliang, zhengbin. * tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (228 commits) Revert "powerpc/pci: unmap legacy INTx interrupts when a PHB is removed" selftests/powerpc: Fix eeh-basic.sh exit codes cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier powerpc/time: Make get_tb() common to PPC32 and PPC64 powerpc/time: Make get_tbl() common to PPC32 and PPC64 powerpc/time: Remove get_tbu() powerpc/time: Avoid using get_tbl() and get_tbu() internally powerpc/time: Make mftb() common to PPC32 and PPC64 powerpc/time: Rename mftbl() to mftb() powerpc/32s: Remove #ifdef CONFIG_PPC_BOOK3S_32 in head_book3s_32.S powerpc/32s: Rename head_32.S to head_book3s_32.S powerpc/32s: Setup the early hash table at all time. powerpc/time: Remove ifdef in get_dec() and set_dec() powerpc: Remove get_tb_or_rtc() powerpc: Remove __USE_RTC() powerpc: Tidy up a bit after removal of PowerPC 601. powerpc: Remove support for PowerPC 601 powerpc: Remove PowerPC 601 powerpc: Drop SYNC_601() ISYNC_601() and SYNC() powerpc: Remove CONFIG_PPC601_SYNC_FIX ...
2020-10-14KVM: PPC: Book3S HV: simplify kvm_cma_reserve()Mike Rapoport1-10/+2
Patch series "memblock: seasonal cleaning^w cleanup", v3. These patches simplify several uses of memblock iterators and hide some of the memblock implementation details from the rest of the system. This patch (of 17): The memory size calculation in kvm_cma_reserve() traverses memblock.memory rather than simply call memblock_phys_mem_size(). The comment in that function suggests that at some point there should have been call to memblock_analyze() before memblock_phys_mem_size() could be used. As of now, there is no memblock_analyze() at all and memblock_phys_mem_size() can be used as soon as cold-plug memory is registered with memblock. Replace loop over memblock.memory with a call to memblock_phys_mem_size(). Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Link: https://lkml.kernel.org/r/20200818151634.14343-1-rppt@kernel.org Link: https://lkml.kernel.org/r/20200818151634.14343-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-14mm/memremap_pages: support multiple ranges per invocationDan Williams1-0/+1
In support of device-dax growing the ability to front physically dis-contiguous ranges of memory, update devm_memremap_pages() to track multiple ranges with a single reference counter and devm instance. Convert all [devm_]memremap_pages() users to specify the number of ranges they are mapping in their 'struct dev_pagemap' instance. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.co Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: kernel test robot <lkp@intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-14mm/memremap_pages: convert to 'struct range'Dan Williams1-6/+7
The 'struct resource' in 'struct dev_pagemap' is only used for holding resource span information. The other fields, 'name', 'flags', 'desc', 'parent', 'sibling', and 'child' are all unused wasted space. This is in preparation for introducing a multi-range extension of devm_memremap_pages(). The bulk of this change is unwinding all the places internal to libnvdimm that used 'struct resource' unnecessarily, and replacing instances of 'struct dev_pagemap'.res with 'struct dev_pagemap'.range. P2PDMA had a minor usage of the resource flags field, but only to report failures with "%pR". That is replaced with an open coded print of the range. [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()] Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen] Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: kernel test robot <lkp@intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-22KVM: PPC: Book3S: Fix symbol undeclared warningsWang Wensheng7-7/+7
Build the kernel with `C=2`: arch/powerpc/kvm/book3s_hv_nested.c:572:25: warning: symbol 'kvmhv_alloc_nested' was not declared. Should it be static? arch/powerpc/kvm/book3s_64_mmu_radix.c:350:6: warning: symbol 'kvmppc_radix_set_pte_at' was not declared. Should it be static? arch/powerpc/kvm/book3s_hv.c:3568:5: warning: symbol 'kvmhv_p9_guest_entry' was not declared. Should it be static? arch/powerpc/kvm/book3s_hv_rm_xics.c:767:15: warning: symbol 'eoi_rc' was not declared. Should it be static? arch/powerpc/kvm/book3s_64_vio_hv.c:240:13: warning: symbol 'iommu_tce_kill_rm' was not declared. Should it be static? arch/powerpc/kvm/book3s_64_vio.c:492:6: warning: symbol 'kvmppc_tce_iommu_do_map' was not declared. Should it be static? arch/powerpc/kvm/book3s_pr.c:572:6: warning: symbol 'kvmppc_set_pvr_pr' was not declared. Should it be static? Those symbols are used only in the files that define them so make them static to fix the warnings. Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-09-22KVM: PPC: Book3S: Remove redundant initialization of variable retJing Xiangfeng1-1/+1
The variable ret is being initialized with '-ENOMEM' that is meaningless. So remove it. Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-09-22KVM: PPC: Book3S HV: XIVE: Convert to DEFINE_SHOW_ATTRIBUTEQinglang Miao1-11/+1
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-09-17KVM: PPC: Book3S HV: Set LPCR[HDICE] before writing HDECPaul Mackerras2-5/+18
POWER8 and POWER9 machines have a hardware deviation where generation of a hypervisor decrementer exception is suppressed if the HDICE bit in the LPCR register is 0 at the time when the HDEC register decrements from 0 to -1. When entering a guest, KVM first writes the HDEC register with the time until it wants the CPU to exit the guest, and then writes the LPCR with the guest value, which includes HDICE = 1. If HDEC decrements from 0 to -1 during the interval between those two events, it is possible that we can enter the guest with HDEC already negative but no HDEC exception pending, meaning that no HDEC interrupt will occur while the CPU is in the guest, or at least not until HDEC wraps around. Thus it is possible for the CPU to keep executing in the guest for a long time; up to about 4 seconds on POWER8, or about 4.46 years on POWER9 (except that the host kernel hard lockup detector will fire first). To fix this, we set the LPCR[HDICE] bit before writing HDEC on guest entry. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-09-17KVM: PPC: Book3S HV: Do not allocate HPT for a nested guestFabiano Rosas1-0/+6
The current nested KVM code does not support HPT guests. This is informed/enforced in some ways: - Hosts < P9 will not be able to enable the nested HV feature; - The nested hypervisor MMU capabilities will not contain KVM_CAP_PPC_MMU_HASH_V3; - QEMU reflects the MMU capabilities in the 'ibm,arch-vec-5-platform-support' device-tree property; - The nested guest, at 'prom_parse_mmu_model' ignores the 'disable_radix' kernel command line option if HPT is not supported; - The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT. There is, however, still a way to start a HPT guest by using max-compat-cpu=power8 at the QEMU machine options. This leads to the guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB ioctl. With the guest set to hash, the nested hypervisor goes through the entry path that has no knowledge of nesting (kvmppc_run_vcpu) and crashes when it tries to execute an hypervisor-privileged (mtspr HDEC) instruction at __kvmppc_vcore_entry: root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ... <snip> [ 538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1 [ 538.543355] NIP: c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0 [ 538.543417] REGS: c0000013e91e33b0 TRAP: 0700 Not tainted (5.9.0-rc4) [ 538.543470] MSR: 8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 22422882 XER: 20040000 [ 538.543546] CFAR: c00800000753f4b0 IRQMASK: 3 GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000 GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72 GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0 GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008 GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8 GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000 GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001 GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001 [ 538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv] [ 538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv] [ 538.544173] Call Trace: [ 538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable) [ 538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv] [ 538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv] [ 538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm] [ 538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm] [ 538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm] [ 538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90 [ 538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0 [ 538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c [ 538.544787] Instruction dump: [ 538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002 [ 538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 f90d10a0 480104fd [ 538.544953] ---[ end trace 74423e2b948c2e0c ]--- This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in the nested hypervisor, causing QEMU to abort. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-09-17KVM: PPC: Don't return -ENOTSUPP to userspace in ioctlsGreg Kurz2-5/+5
ENOTSUPP is a linux only thingy, the value of which is unknown to userspace, not to be confused with ENOTSUP which linux maps to EOPNOTSUPP, as permitted by POSIX [1]: [EOPNOTSUPP] Operation not supported on socket. The type of socket (address family or protocol) does not support the requested operation. A conforming implementation may assign the same values for [EOPNOTSUPP] and [ENOTSUP]. Return -EOPNOTSUPP instead of -ENOTSUPP for the following ioctls: - KVM_GET_FPU for Book3s and BookE - KVM_SET_FPU for Book3s and BookE - KVM_GET_DIRTY_LOG for BookE This doesn't affect QEMU which doesn't call the KVM_GET_FPU and KVM_SET_FPU ioctls on POWER anyway since they are not supported, and _buggily_ ignores anything but -EPERM for KVM_GET_DIRTY_LOG. [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-09-08powerpc/64s: handle ISA v3.1 local copy-paste context switchesNicholas Piggin2-0/+15
The ISA v3.1 the copy-paste facility has a new memory move functionality which allows the copy buffer to be pasted to domestic memory (RAM) as opposed to foreign memory (accelerator). This means the POWER9 trick of avoiding the cp_abort on context switch if the process had not mapped foreign memory does not work on POWER10. Do the cp_abort unconditionally there. KVM must also cp_abort on guest exit to prevent copy buffer state leaking between contexts. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200825075535.224536-1-npiggin@gmail.com
2020-09-03KVM: PPC: Book3S HV: XICS: Replace the 'destroy' method by a 'release' methodGreg Kurz2-19/+71
Similarly to what was done with XICS-on-XIVE and XIVE native KVM devices with commit 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method"), convert the historical XICS KVM device to implement the 'release' method. This is needed to run nested guests with an in-kernel IRQ chip. A typical POWER9 guest can select XICS or XIVE during boot, which requires to be able to destroy and to re-create the KVM device. Only the historical XICS KVM device is available under pseries at the current time and it still uses the legacy 'destroy' method. Switching to 'release' means that vCPUs might still be running when the device is destroyed. In order to avoid potential use-after-free, the kvmppc_xics structure is allocated on first usage and kept around until the VM exits. The same pointer is used each time a KVM XICS device is being created, but this is okay since we only have one per VM. Clear the ICP of each vCPU with vcpu->mutex held. This ensures that the next time the vCPU resumes execution, it won't be going into the XICS code anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-08-22KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()Will Deacon2-2/+4
The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-09Merge tag 'kvm-ppc-next-5.9-1' of ↵Paolo Bonzini12-242/+626
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-next-5.6 PPC KVM update for 5.9 - Improvements and bug-fixes for secure VM support, giving reduced startup time and memory hotplug support. - Locking fixes in nested KVM code - Increase number of guests supported by HV KVM to 4094 - Preliminary POWER10 support
2020-08-07Merge tag 'powerpc-5.9-1' of ↵Linus Torvalds7-15/+83
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for (optionally) using queued spinlocks & rwlocks. - Support for a new faster system call ABI using the scv instruction on Power9 or later. - Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on Power10 and future processors, leaving us with no way to implement the functionality it requests. This risks breaking userspace, though we believe it is unused in practice. - A bug fix for, and then the removal of, our custom stack expansion checking. We now allow stack expansion up to the rlimit, like other architectures. - Remove the remnants of our (previously disabled) topology update code, which tried to react to NUMA layout changes on virtualised systems, but was prone to crashes and other problems. - Add PMU support for Power10 CPUs. - A change to our signal trampoline so that we don't unbalance the link stack (branch return predictor) in the signal delivery path. - Lots of other cleanups, refactorings, smaller features and so on as usual. Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong, YueHaibing. * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits) selftests/powerpc: Fix pkey syscall redefinitions powerpc: Fix circular dependency between percpu.h and mmu.h powerpc/powernv/sriov: Fix use of uninitialised variable selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs powerpc/40x: Fix assembler warning about r0 powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric powerpc/papr_scm: Fetch nvdimm performance stats from PHYP cpuidle: pseries: Fixup exit latency for CEDE(0) cpuidle: pseries: Add function to parse extended CEDE records cpuidle: pseries: Set the latency-hint before entering CEDE selftests/powerpc: Fix online CPU selection powerpc/perf: Consolidate perf_callchain_user_[64|32]() powerpc/pseries/hotplug-cpu: Remove double free in error path powerpc/pseries/mobility: Add pr_debug() for device tree changes powerpc/pseries/mobility: Set pr_fmt() powerpc/cacheinfo: Warn if cache object chain becomes unordered powerpc/cacheinfo: Improve diagnostics about malformed cache lists powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages powerpc/cacheinfo: Set pr_fmt() powerpc: fix function annotations to avoid section mismatch warnings with gcc-10 ...
2020-08-05Merge tag 'for-linus-hmm' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull hmm updates from Jason Gunthorpe: "Ralph has been working on nouveau's use of hmm_range_fault() and migrate_vma() which resulted in this small series. It adds reporting of the page table order from hmm_range_fault() and some optimization of migrate_vma(): - Report the size of the page table mapping out of hmm_range_fault(). This makes it easier to establish a large/huge/etc mapping in the device's page table. - Allow devices to ignore the invalidations during migration in cases where the migration is not going to change pages. For instance migrating pages to a device does not require the device to invalidate pages already in the device. - Update nouveau and hmm_tests to use the above" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: mm/hmm/test: use the new migration invalidation nouveau/svm: use the new migration invalidation mm/notifier: add migration invalidation type mm/migrate: add a flags parameter to migrate_vma nouveau: fix storing invalid ptes nouveau/hmm: support mapping large sysmem pages nouveau: fix mapping 2MB sysmem pages nouveau/hmm: fault one page at a time mm/hmm: add tests for hmm_pfn_to_map_order() mm/hmm: provide the page mapping order in hmm_range_fault()
2020-08-04Merge tag 'uninit-macro-v5.9-rc1' of ↵Linus Torvalds3-5/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull uninitialized_var() macro removal from Kees Cook: "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement. - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var()" * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: compiler: Remove uninitialized_var() macro treewide: Remove uninitialized_var() usage checkpatch: Remove awareness of uninitialized_var() macro mm/debug_vm_pgtable: Remove uninitialized_var() usage f2fs: Eliminate usage of uninitialized_var() macro media: sur40: Remove uninitialized_var() usage KVM: PPC: Book3S PR: Remove uninitialized_var() usage clk: spear: Remove uninitialized_var() usage clk: st: Remove uninitialized_var() usage spi: davinci: Remove uninitialized_var() usage ide: Remove uninitialized_var() usage rtlwifi: rtl8192cu: Remove uninitialized_var() usage b43: Remove uninitialized_var() usage drbd: Remove uninitialized_var() usage x86/mm/numa: Remove uninitialized_var() usage docs: deprecated.rst: Add uninitialized_var()
2020-07-29powerpc/64s: Move HMI IRQ stat from percpu variable to paca.Mahesh Salgaonkar1-1/+1
With the proposed change in percpu bootmem allocator to use page mapping [1], the percpu first chunk memory area can come from vmalloc ranges. This makes the HMI (Hypervisor Maintenance Interrupt) handler crash the kernel whenever percpu variable is accessed in real mode. This patch fixes this issue by moving the HMI IRQ stat inside paca for safe access in realmode. [1] https://lore.kernel.org/linuxppc-dev/20200608070904.387440-1-aneesh.kumar@linux.ibm.com/ Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159290806973.3642154.5244613424529764050.stgit@jupiter
2020-07-29powerpc/kvm/cma: Improve kernel log during bootAneesh Kumar K.V1-1/+1
Current kernel gives: [ 0.000000] cma: Reserved 26224 MiB at 0x0000007959000000 [ 0.000000] hugetlb_cma: reserve 65536 MiB, up to 16384 MiB per node [ 0.000000] cma: Reserved 16384 MiB at 0x0000001800000000 With the fix [ 0.000000] kvm_cma_reserve: reserving 26214 MiB for global area [ 0.000000] cma: Reserved 26224 MiB at 0x0000007959000000 [ 0.000000] hugetlb_cma: reserve 65536 MiB, up to 16384 MiB per node [ 0.000000] cma: Reserved 16384 MiB at 0x0000001800000000 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200713150749.25245-2-aneesh.kumar@linux.ibm.com
2020-07-29powerpc/kvm: Use correct CONFIG symbol in commentMichael Ellerman1-1/+1
This comment refers to the non-existent CONFIG_PPC_BOOK3S_XX, which confuses scripts/checkkconfigsymbols.py. Change it to use the correct symbol. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724131728.1643966-8-mpe@ellerman.id.au
2020-07-28mm/migrate: add a flags parameter to migrate_vmaRalph Campbell1-1/+3
The src_owner field in struct migrate_vma is being used for two purposes, it acts as a selection filter for which types of pages are to be migrated and it identifies device private pages owned by the caller. Split this into separate parameters so the src_owner field can be used just to identify device private pages owned by the caller of migrate_vma_setup(). Rename the src_owner field to pgmap_owner to reflect it is now used only to identify which device private pages to migrate. Link: https://lore.kernel.org/r/20200723223004.9586-3-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-28KVM: PPC: Book3S HV: Rework secure mem slot droppingLaurent Dufour1-17/+35
When a secure memslot is dropped, all the pages backed in the secure device (aka really backed by secure memory by the Ultravisor) should be paged out to a normal page. Previously, this was achieved by triggering the page fault mechanism which is calling kvmppc_svm_page_out() on each pages. This can't work when hot unplugging a memory slot because the memory slot is flagged as invalid and gfn_to_pfn() is then not trying to access the page, so the page fault mechanism is not triggered. Since the final goal is to make a call to kvmppc_svm_page_out() it seems simpler to call directly instead of triggering such a mechanism. This way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a memslot. Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock, the call to __kvmppc_svm_page_out() is made. As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the VMA is fetched in a lazy way, to not trigger find_vma() all the time. In addition, the mmap_sem is held in read mode during that time, not in write mode since the virual memory layout is not impacted, and kvm->arch.uvmem_lock prevents concurrent operation on the secure device. Reviewed-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> [modified check on the VMA in kvmppc_uvmem_drop_pages] Signed-off-by: Ram Pai <linuxram@us.ibm.com> [modified the changelog description] Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-07-28KVM: PPC: Book3S HV: Move kvmppc_svm_page_out upLaurent Dufour1-76/+90
kvmppc_svm_page_out() will need to be called by kvmppc_uvmem_drop_pages() so move it up earlier in this file. Furthermore it will be interesting to call this function when already holding the kvm->arch.uvmem_lock, so prefix the original function with __ and remove the locking in it, and introduce a wrapper which call that function with the lock held. There is no functional change. Reviewed-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-07-28KVM: PPC: Book3S HV: Migrate hot plugged memoryLaurent Dufour2-12/+25
When a memory slot is hot plugged to a SVM, PFNs associated with the GFNs in that slot must be migrated to the secure-PFNs, aka device-PFNs. Call kvmppc_uv_migrate_mem_slot() to accomplish this. Disable page-merge for all pages in the memory slot. Reviewed-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> [rearranged the code, and modified the commit log] Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-07-28KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to ↵Ram Pai1-22/+132
secure-GFNs The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the pages of the SVM before calling H_SVM_INIT_DONE. This causes a huge delay in tranistioning the VM to SVM. The Ultravisor is only interested in the pages that contain the kernel, initrd and other important data structures. The rest contain throw-away content. However if not all pages are requested by the Ultravisor, the Hypervisor continues to consider the GFNs corresponding to the non-requested pages as normal GFNs. This can lead to data-corruption and undefined behavior. In H_SVM_INIT_DONE handler, move all the PFNs associated with the SVM's GFNs to secure-PFNs. Skip the GFNs that are already Paged-in or Shared or Paged-in followed by a Paged-out. Reviewed-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-07-28KVM: PPC: Book3S HV: Track the state GFNs associated with secure VMsRam Pai1-19/+172
During the life of SVM, its GFNs transition through normal, secure and shared states. Since the kernel does not track GFNs that are shared, it is not possible to disambiguate a shared GFN from a GFN whose PFN has not yet been migrated to a secure-PFN. Also it is not possible to disambiguate a secure-GFN from a GFN whose GFN has been pagedout from the ultravisor. The ability to identify the state of a GFN is needed to skip migration of its PFN to secure-PFN during ESM transition. The code is re-organized to track the states of a GFN as explained below. ************************************************************************ 1. States of a GFN --------------- The GFN can be in one of the following states. (a) Secure - The GFN is secure. The GFN is associated with a Secure VM, the contents of the GFN is not accessible to the Hypervisor. This GFN can be backed by a secure-PFN, or can be backed by a normal-PFN with contents encrypted. The former is true when the GFN is paged-in into the ultravisor. The latter is true when the GFN is paged-out of the ultravisor. (b) Shared - The GFN is shared. The GFN is associated with a a secure VM. The contents of the GFN is accessible to Hypervisor. This GFN is backed by a normal-PFN and its content is un-encrypted. (c) Normal - The GFN is a normal. The GFN is associated with a normal VM. The contents of the GFN is accesible to the Hypervisor. Its content is never encrypted. 2. States of a VM. --------------- (a) Normal VM: A VM whose contents are always accessible to the hypervisor. All its GFNs are normal-GFNs. (b) Secure VM: A VM whose contents are not accessible to the hypervisor without the VM's consent. Its GFNs are either Shared-GFN or Secure-GFNs. (c) Transient VM: A Normal VM that is transitioning to secure VM. The transition starts on successful return of H_SVM_INIT_START, and ends on successful return of H_SVM_INIT_DONE. This transient VM, can have GFNs in any of the three states; i.e Secure-GFN, Shared-GFN, and Normal-GFN. The VM never executes in this state in supervisor-mode. 3. Memory slot State. ------------------ The state of a memory slot mirrors the state of the VM the memory slot is associated with. 4. VM State transition. -------------------- A VM always starts in Normal Mode. H_SVM_INIT_START moves the VM into transient state. During this time the Ultravisor may request some of its GFNs to be shared or secured. So its GFNs can be in one of the three GFN states. H_SVM_INIT_DONE moves the VM entirely from transient state to secure-state. At this point any left-over normal-GFNs are transitioned to Secure-GFN. H_SVM_INIT_ABORT moves the transient VM back to normal VM. All its GFNs are moved to Normal-GFNs. UV_TERMINATE transitions the secure-VM back to normal-VM. All the secure-GFN and shared-GFNs are tranistioned to normal-GFN Note: The contents of the normal-GFN is undefined at this point. 5. GFN state implementation: ------------------------- Secure GFN is associated with a secure-PFN; also called uvmem_pfn, when the GFN is paged-in. Its pfn[] has KVMPPC_GFN_UVMEM_PFN flag set, and contains the value of the secure-PFN. It is associated with a normal-PFN; also called mem_pfn, when the GFN is pagedout. Its pfn[] has KVMPPC_GFN_MEM_PFN flag set. The value of the normal-PFN is not tracked. Shared GFN is associated with a normal-PFN. Its pfn[] has KVMPPC_UVMEM_SHARED_PFN flag set. The value of the normal-PFN is not tracked. Normal GFN is associated with normal-PFN. Its pfn[] has no flag set. The value of the normal-PFN is not tracked. 6. Life cycle of a GFN -------------------- -------------------------------------------------------------- | | Share | Unshare | SVM |H_SVM_INIT_DONE| | |operation |operation | abort/ | | | | | | terminate | | ------------------------------------------------------------- | | | | | | | Secure | Shared | Secure |Normal |Secure | | | | | | | | Shared | Shared | Secure |Normal |Shared | | | | | | | | Normal | Shared | Secure |Normal |Secure | -------------------------------------------------------------- 7. Life cycle of a VM -------------------- -------------------------------------------------------------------- | | start | H_SVM_ |H_SVM_ |H_SVM_ |UV_SVM_ | | | VM |INIT_START|INIT_DONE|INIT_ABORT |TERMINATE | | | | | | | | --------- ---------------------------------------------------------- | | | | | | | | Normal | Normal | Transient|Error |Error |Normal | | | | | | | | | Secure | Error | Error |Error |Error |Normal | | | | | | | | |Transient| N/A | Error |Secure |Normal |Normal | -------------------------------------------------------------------- ************************************************************************ Reviewed-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-07-28KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_STARTRam Pai1-35/+88
Page-merging of pages in memory-slots associated with a Secure VM is disabled in H_SVM_PAGE_IN handler. This operation should have been done the much earlier; the moment the VM is initiated for secure-transition. Delaying this operation increases the probability for those pages to acquire new references, making it impossible to migrate those pages in H_SVM_PAGE_IN handler. Disable page-migration in H_SVM_INIT_START handling. Reviewed-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-07-28KVM: PPC: Book3S HV: Fix function definition in book3s_hv_uvmem.cRam Pai1-11/+10
Without this fix, git is confused. It generates wrong function context for code changes in subsequent patches. Weird, but true. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-07-26powerpc/watchpoint: Rename current H_SET_MODE DAWR macroRavi Bangoria1-1/+1
Current H_SET_MODE hcall macro name for setting/resetting DAWR0 is H_SET_MODE_RESOURCE_SET_DAWR. Add suffix 0 to macro name as well. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-8-ravi.bangoria@linux.ibm.com
2020-07-23KVM: PPC: Fix typo on H_DISABLE_AND_GET hcallLeonardo Bras1-1/+1
On PAPR+ the hcall() on 0x1B0 is called H_DISABLE_AND_GET, but got defined as H_DISABLE_AND_GETC instead. This define was introduced with a typo in commit <b13a96cfb055> ("[PATCH] powerpc: Extends HCALL interface for InfiniBand usage"), and was later used without having the typo noticed. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200707004812.190765-1-leobras.c@gmail.com
2020-07-23KVM: PPC: Clean up redundant kvm_run parameters in assemblyTianjia Zhang5-49/+44
In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu' structure. For historical reasons, many kvm-related function parameters retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This patch does a unified cleanup of these remaining redundant parameters. [paulus@ozlabs.org - Fixed places that were missed in book3s_interrupts.S] Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-07-22KVM: PPC: Book3S HV: Save/restore new PMU registersAthira Rajeev3-2/+52
Power ISA v3.1 has added new performance monitoring unit (PMU) special purpose registers (SPRs). They are: Monitor Mode Control Register 3 (MMCR3) Sampled Instruction Event Register A (SIER2) Sampled Instruction Event Register B (SIER3) Add support to save/restore these new SPRs while entering/exiting guest. Also include changes to support KVM_REG_PPC_MMCR3/SIER2/SIER3. Add new SPRs to KVM API documentation. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-6-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22KVM: PPC: Book3S HV: Cleanup updates for kvm vcpu MMCRAthira Rajeev2-8/+26
Currently `kvm_vcpu_arch` stores all Monitor Mode Control registers in a flat array in order: mmcr0, mmcr1, mmcra, mmcr2, mmcrs Split this to give mmcra and mmcrs its own entries in vcpu and use a flat array for mmcr0 to mmcr2. This patch implements this cleanup to make code easier to read. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Fix MMCRA/MMCR2 uapi breakage as noted by paulus] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-3-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-21KVM: PPC: Protect kvm_vcpu_read_guest with srcu locksAlexey Kardashevskiy4-12/+29
The kvm_vcpu_read_guest/kvm_vcpu_write_guest used for nested guests eventually call srcu_dereference_check to dereference a memslot and lockdep produces a warning as neither kvm->slots_lock nor kvm->srcu lock is held and kvm->users_count is above zero (>100 in fact). This wraps mentioned VCPU read/write helpers in srcu read lock/unlock as it is done in other places. This uses vcpu->srcu_idx when possible. These helpers are only used for nested KVM so this may explain why we did not see these before. Here is an example of a warning: ============================= WARNING: suspicious RCU usage 5.7.0-rc3-le_dma-bypass.3.2_a+fstn1 #897 Not tainted ----------------------------- include/linux/kvm_host.h:633 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by qemu-system-ppc/2752: #0: c000200359016be0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x144/0xd80 [kvm] stack backtrace: CPU: 80 PID: 2752 Comm: qemu-system-ppc Not tainted 5.7.0-rc3-le_dma-bypass.3.2_a+fstn1 #897 Call Trace: [c0002003591ab240] [c000000000b23ab4] dump_stack+0x190/0x25c (unreliable) [c0002003591ab2b0] [c00000000023f954] lockdep_rcu_suspicious+0x140/0x164 [c0002003591ab330] [c008000004a445f8] kvm_vcpu_gfn_to_memslot+0x4c0/0x510 [kvm] [c0002003591ab3a0] [c008000004a44c18] kvm_vcpu_read_guest+0xa0/0x180 [kvm] [c0002003591ab410] [c008000004ff9bd8] kvmhv_enter_nested_guest+0x90/0xb80 [kvm_hv] [c0002003591ab980] [c008000004fe07bc] kvmppc_pseries_do_hcall+0x7b4/0x1c30 [kvm_hv] [c0002003591aba10] [c008000004fe5d30] kvmppc_vcpu_run_hv+0x10a8/0x1a30 [kvm_hv] [c0002003591abae0] [c008000004a5d954] kvmppc_vcpu_run+0x4c/0x70 [kvm] [c0002003591abb10] [c008000004a56e54] kvm_arch_vcpu_ioctl_run+0x56c/0x7c0 [kvm] [c0002003591abba0] [c008000004a3ddc4] kvm_vcpu_ioctl+0x4ac/0xd80 [kvm] [c0002003591abd20] [c0000000006ebb58] ksys_ioctl+0x188/0x210 [c0002003591abd70] [c0000000006ebc28] sys_ioctl+0x48/0xb0 [c0002003591abdb0] [c000000000042764] system_call_exception+0x1d4/0x2e0 [c0002003591abe20] [c00000000000cce8] system_call_common+0xe8/0x214 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-07-21KVM: PPC: Book3S HV: Increase KVMPPC_NR_LPIDS on POWER8 and POWER9Cédric Le Goater1-2/+6
POWER8 and POWER9 have 12-bit LPIDs. Change LPID_RSVD to support up to (4096 - 2) guests on these processors. POWER7 is kept the same with a limitation of (1024 - 2), but it might be time to drop KVM support for POWER7. Tested with 2048 guests * 4 vCPUs on a witherspoon system with 512G RAM and a bit of swap. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-07-21KVM: PPC: Book3SHV: Enable support for ISA v3.1 guestsAlistair Popple1-2/+10
Adds support for emulating ISAv3.1 guests by adding the appropriate PCR and FSCR bits. Signed-off-by: Alistair Popple <alistair@popple.id.au> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-07-16treewide: Remove uninitialized_var() usageKees Cook2-2/+2
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16KVM: PPC: Book3S PR: Remove uninitialized_var() usageKees Cook1-3/+0
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. As a precursor to removing[2] this[3] macro[4], just remove this variable since it was actually unused: arch/powerpc/kvm/book3s_pr.c:1832:16: warning: unused variable 'vrsave' [-Wunused-variable] unsigned long vrsave; ^ [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Fixes: f05ed4d56e9c ("KVM: PPC: Split out code from book3s.c into book3s_pr.c") Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-10powerpc64: Break asm/percpu.h vs spinlock_types.h dependencyPeter Zijlstra1-0/+1
In order to use <asm/percpu.h> in lockdep.h, we need to make sure asm/percpu.h does not itself depend on lockdep. The below seems to make that so and builds powerpc64-defconfig + PROVE_LOCKING. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> https://lkml.kernel.org/r/20200623083721.336906073@infradead.org