summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)AuthorFilesLines
2020-05-08crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.hEric Biggers3-3/+0
<linux/cryptohash.h> sounds very generic and important, like it's the header to include if you're doing cryptographic hashing in the kernel. But actually it only includes the library implementation of the SHA-1 compression function (not even the full SHA-1). This should basically never be used anymore; SHA-1 is no longer considered secure, and there are much better ways to do cryptographic hashing in the kernel. Most files that include this header don't actually need it. So in preparation for removing it, remove all these unneeded includes of it. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-08crypto: powerpc/sha1 - prefix the "sha1_" functionsEric Biggers1-13/+13
Prefix the PowerPC SHA-1 functions with "powerpc_sha1_" rather than "sha1_". This allows us to rename the library function sha_init() to sha1_init() without causing a naming collision. Cc: linuxppc-dev@lists.ozlabs.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-08crypto: powerpc/sha1 - remove unused temporary workspaceEric Biggers1-5/+2
The PowerPC implementation of SHA-1 doesn't actually use the 16-word temporary array that's passed to the assembly code. This was probably meant to correspond to the 'W' array that lib/sha1.c uses. However, in sha1-powerpc-asm.S these values are actually stored in GPRs 16-31. Referencing SHA_WORKSPACE_WORDS from this code also isn't appropriate, since it's an implementation detail of lib/sha1.c. Therefore, just remove this unneeded array. Tested with: export ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- make mpc85xx_defconfig cat >> .config << EOF # CONFIG_MODULES is not set # CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set CONFIG_DEBUG_KERNEL=y CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y CONFIG_CRYPTO_SHA1_PPC=y EOF make olddefconfig make -j32 qemu-system-ppc -M mpc8544ds -cpu e500 -nographic \ -kernel arch/powerpc/boot/zImage \ -append "cryptomgr.fuzz_iterations=1000 cryptomgr.panic_on_fail=1" Cc: linuxppc-dev@lists.ozlabs.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-08powerpc/uaccess: Don't use "m<>" constraintMichael Ellerman1-1/+1
The "m<>" constraint breaks compilation with GCC 4.6.x era compilers. The use of the constraint allows the compiler to use update-form instructions, however in practice current compilers never generate those forms for any of the current uses of __put_user_asm_goto(). We anticipate that GCC 4.6 will be declared unsupported for building the kernel in the not too distant future. So for now just switch to the "m" constraint. Fixes: 334710b1496a ("powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> Link: https://lore.kernel.org/r/20200507123324.2250024-1-mpe@ellerman.id.au
2020-05-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+1
Pull kvm fixes from Paolo Bonzini: "Bugfixes, mostly for ARM and AMD, and more documentation. Slightly bigger than usual because I couldn't send out what was pending for rc4, but there is nothing worrisome going on. I have more fixes pending for guest debugging support (gdbstub) but I will send them next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly KVM: selftests: Fix build for evmcs.h kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path docs/virt/kvm: Document configuring and running nested guests KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts KVM: x86: Fixes posted interrupt check for IRQs delivery modes KVM: SVM: fill in kvm_run->debug.arch.dr[67] KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning KVM: arm64: Fix 32bit PC wrap-around KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS KVM: arm64: Save/restore sp_el0 as part of __guest_enter KVM: arm64: Delete duplicated label in invalid_vector KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi() KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests ...
2020-05-07powerpc/xive: Enforce load-after-store ordering when StoreEOI is activeCédric Le Goater5-0/+25
When an interrupt has been handled, the OS notifies the interrupt controller with a EOI sequence. On a POWER9 system using the XIVE interrupt controller, this can be done with a load or a store operation on the ESB interrupt management page of the interrupt. The StoreEOI operation has less latency and improves interrupt handling performance but it was deactivated during the POWER9 DD2.0 timeframe because of ordering issues. We use the LoadEOI today but we plan to reactivate StoreEOI in future architectures. There is usually no need to enforce ordering between ESB load and store operations as they should lead to the same result. E.g. a store trigger and a load EOI can be executed in any order. Assuming the interrupt state is PQ=10, a store trigger followed by a load EOI will return a Q bit. In the reverse order, it will create a new interrupt trigger from HW. In both cases, the handler processing interrupts is notified. In some cases, the XIVE_ESB_SET_PQ_10 load operation is used to disable temporarily the interrupt source (mask/unmask). When the source is reenabled, the OS can detect if interrupts were received while the source was disabled and reinject them. This process needs special care when StoreEOI is activated. The ESB load and store operations should be correctly ordered because a XIVE_ESB_STORE_EOI operation could leave the source enabled if it has not completed before the loads. For those cases, we enforce Load-after-Store ordering with a special load operation offset. To avoid performance impact, this ordering is only enforced when really needed, that is when interrupt sources are temporarily disabled with the XIVE_ESB_SET_PQ_10 load. It should not be needed for other loads. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200220081506.31209-1-clg@kaod.org
2020-05-07KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properlyPeter Xu1-0/+1
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505154750.126300-1-peterx@redhat.com> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07powerpc/32s: Fix build failure with CONFIG_PPC_KUAP_DEBUGChristophe Leroy1-1/+1
gpr2 is not a parametre of kuap_check(), it doesn't exist. Use gpr instead. Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/ea599546f2a7771bde551393889e44e6b2632332.1587368807.git.christophe.leroy@c-s.fr
2020-05-07powerpc/ima: Fix secure boot rules in ima arch policyNayna Jain1-3/+3
To prevent verifying the kernel module appended signature twice (finit_module), once by the module_sig_check() and again by IMA, powerpc secure boot rules define an IMA architecture specific policy rule only if CONFIG_MODULE_SIG_FORCE is not enabled. This, unfortunately, does not take into account the ability of enabling "sig_enforce" on the boot command line (module.sig_enforce=1). Including the IMA module appraise rule results in failing the finit_module syscall, unless the module signing public key is loaded onto the IMA keyring. This patch fixes secure boot policy rules to be based on CONFIG_MODULE_SIG instead. Fixes: 4238fad366a6 ("powerpc/ima: Add support to initialize ima policy rules") Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Link: https://lore.kernel.org/r/1588342612-14532-1-git-send-email-nayna@linux.ibm.com
2020-05-07powerpc/64s/kuap: Restore AMR in fast_interrupt_returnNicholas Piggin1-1/+3
Interrupts that use fast_interrupt_return actually do lock AMR, but they have been ones which tend to come from userspace (or kernel bugs) in radix mode. With kuap on hash, segment interrupts are taken in kernel often, which quickly breaks due to the missing restore. Fixes: 890274c2dc4c ("powerpc/64s: Implement KUAP for Radix MMU") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429065654.1677541-6-npiggin@gmail.com
2020-05-06KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properlyPeter Xu1-0/+1
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505154750.126300-1-peterx@redhat.com> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-06Merge the lockless page table walk rework into nextMichael Ellerman19-284/+273
This merges the lockless page table walk rework series from Aneesh. Because it touches powerpc KVM code we are sharing it with the kvm-ppc tree in our topic/ppc-kvm branch. This is the cover letter from Aneesh: Avoid IPI while updating page table entries. Problem Summary: Slow termination of KVM guest with large guest RAM config due to a large number of IPIs that were caused by clearing level 1 PTE entries (THP) entries. This is shown in the stack trace below. - qemu-system-ppc [kernel.vmlinux] [k] smp_call_function_many - smp_call_function_many - 36.09% smp_call_function_many serialize_against_pte_lookup radix__pmdp_huge_get_and_clear zap_huge_pmd unmap_page_range unmap_vmas unmap_region __do_munmap __vm_munmap sys_munmap system_call __munmap qemu_ram_munmap qemu_anon_ram_free reclaim_ramblock call_rcu_thread qemu_thread_start start_thread __clone Why we need to do IPI when clearing PMD entries: This was added as part of commit: 13bd817bb884 ("powerpc/thp: Serialize pmd clear against a linux page table walk") serialize_against_pte_lookup makes sure that all parallel lockless page table walk completes before we convert a PMD pte entry to regular pmd entry. We end up doing that conversion in the below scenarios 1) __split_huge_zero_page_pmd 2) do_huge_pmd_wp_page_fallback 3) MADV_DONTNEED running parallel to page faults. local_irq_disable and lockless page table walk: The lockless page table walk work with the assumption that we can dereference the page table contents without holding a lock. For this to work, we need to make sure we read the page table contents atomically and page table pages are not going to be freed/released while we are walking the table pages. We can achieve by using a rcu based freeing for page table pages or if the architecture implements broadcast tlbie, we can block the IPI as we walk the page table pages. To support both the above framework, lockless page table walk is done with irq disabled instead of rcu_read_lock() We do have two interface for lockless page table walk, gup fast and __find_linux_pte. This patch series makes __find_linux_pte table walk safe against the conversion of PMD PTE to regular PMD. gup fast: gup fast is already safe against THP split because kernel now differentiate between a pmd split and a compound page split. gup fast can run parallel to a pmd split and we prevent a parallel gup fast to a hugepage split, by freezing the page refcount and failing the speculative page ref increment. Similar to how gup is safe against parallel pmd split, this patch series updates the __find_linux_pte callers to be safe against a parallel pmd split. We do that by enforcing the following rules. 1) Don't reload the pte value, because that can be updated in parallel. 2) Code should be able to work with a stale PTE value and not the recent one. ie, the pte value that we are looking at may not be the latest value in the page table. 3) Before looking at pte value check for _PAGE_PTE bit. We now do this as part of pte_present() check. Performance: This speeds up Qemu guest RAM del/unplug time as below 128 core, 496GB guest: Without patch: munmap start: timer = 13162 ms, PID=7684 munmap finish: timer = 95312 ms, PID=7684 - delta = 82150 ms With patch (upto removing IPI) munmap start: timer = 196449 ms, PID=6681 munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms With patch (with adding the tlb invalidate in pmdp_huge_get_and_clear_full) munmap start: timer = 196345 ms, PID=6879 munmap finish: timer = 196714 ms, PID=6879 - delta = 369ms Link: https://lore.kernel.org/r/20200505071729.54912-1-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/spufs: simplify spufs core dumpingChristoph Hellwig3-176/+117
Replace the coredump ->read method with a ->dump method that must call dump_emit itself. That way we avoid a buffer allocation an messing with set_fs() to call into code that is intended to deal with user buffers. For the ->get case we can now use a small on-stack buffer and avoid memory allocations as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-05powerpc/spufs: stop using access_okChristoph Hellwig1-34/+8
Just use the proper non __-prefixed get/put_user variants where that is not done yet. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-05powerpc/spufs: fix copy_to_user while atomicJeremy Kerr1-38/+75
Currently, we may perform a copy_to_user (through simple_read_from_buffer()) while holding a context's register_lock, while accessing the context save area. This change uses a temporary buffer for the context save area data, which we then pass to simple_read_from_buffer. Includes changes from Christoph Hellwig <hch@lst.de>. Fixes: bf1ab978be23 ("[POWERPC] coredump: Add SPU elf notes to coredump.") Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> [hch: renamed to function to avoid ___-prefixes] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-05powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault raceAneesh Kumar K.V2-0/+23
MADV_DONTNEED holds mmap_sem in read mode and that implies a parallel page fault is possible and the kernel can end up with a level 1 PTE entry (THP entry) converted to a level 0 PTE entry without flushing the THP TLB entry. Most architectures including POWER have issues with kernel instantiating a level 0 PTE entry while holding level 1 TLB entries. The code sequence I am looking at is down_read(mmap_sem) down_read(mmap_sem) zap_pmd_range() zap_huge_pmd() pmd lock held pmd_cleared table details added to mmu_gather pmd_unlock() insert a level 0 PTE entry() tlb_finish_mmu(). Fix this by forcing a tlb flush before releasing pmd lock if this is not a fullmm invalidate. We can safely skip this invalidate for task exit case (fullmm invalidate) because in that case we are sure there can be no parallel fault handlers. This do change the Qemu guest RAM del/unplug time as below 128 core, 496GB guest: Without patch: munmap start: timer = 196449 ms, PID=6681 munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms With patch: munmap start: timer = 196345 ms, PID=6879 munmap finish: timer = 196714 ms, PID=6879 - delta = 369ms Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-23-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/mm/book3s64: Avoid sending IPI on clearing PMDAneesh Kumar K.V3-31/+7
Now that all the lockless page table walk is careful w.r.t the PTE address returned, we can now revert commit: 13bd817bb884 ("powerpc/thp: Serialize pmd clear against a linux page table walk.") We also drop the equivalent IPI from other pte updates routines. We still keep IPI in hash pmdp collapse and that is to take care of parallel hash page table insert. The radix pmdp collapse flush can possibly be removed once I am sure generic code doesn't have the any expectations around parallel gup walk. This speeds up Qemu guest RAM del/unplug time as below 128 core, 496GB guest: Without patch: munmap start: timer = 13162 ms, PID=7684 munmap finish: timer = 95312 ms, PID=7684 - delta = 82150 ms With patch: munmap start: timer = 196449 ms, PID=6681 munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-21-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/kvm/book3s: Use pte_present instead of opencoding _PAGE_PRESENT checkAneesh Kumar K.V1-1/+1
This adds _PAGE_PTE check and makes sure we validate the pte value returned via find_kvm_host_pte. NOTE: this also considers _PAGE_INVALID to the software valid bit. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-20-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/kvm/book3s: Use find_kvm_host_pte in kvmppc_get_hpaAneesh Kumar K.V1-21/+11
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-19-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/kvm/book3s: use find_kvm_host_pte in kvmppc_book3s_instantiate_pageAneesh Kumar K.V1-4/+4
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-18-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/kvm/book3s: Avoid using rmap to protect parallel page table update.Aneesh Kumar K.V1-29/+9
We now depend on kvm->mmu_lock Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-17-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/kvm/book3s: use find_kvm_host_pte in pute_tce functionsAneesh Kumar K.V1-6/+24
Current code just hold rmap lock to ensure parallel page table update is prevented. That is not sufficient. The kernel should also check whether a mmu_notifer callback was running in parallel. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-16-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/kvm/book3s: Use find_kvm_host_pte in h_enterAneesh Kumar K.V2-19/+8
Since kvmppc_do_h_enter can get called in realmode use low level arch_spin_lock which is safe to be called in realmode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-15-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/kvm/book3s: Use find_kvm_host_pte in page fault handlerAneesh Kumar K.V1-4/+4
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-14-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/kvm/book3s: Add helper for host page table walkAneesh Kumar K.V1-0/+16
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-13-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/kvm/book3s: Use kvm helpers to walk shadow or secondary tableAneesh Kumar K.V4-17/+19
update kvmppc_hv_handle_set_rc to use find_kvm_nested_guest_pte and find_kvm_secondary_pte Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-12-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/kvm/nested: Add helper to walk nested shadow linux page table.Aneesh Kumar K.V1-7/+21
The locking rules for walking nested shadow linux page table is different from process scoped table. Hence add a helper for nested page table walk and also add check whether we are holding the right locks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-11-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/kvm/book3s: Add helper to walk partition scoped linux page table.Aneesh Kumar K.V3-7/+20
The locking rules for walking partition scoped table is different from process scoped table. Hence add a helper for secondary linux page table walk and also add check whether we are holding the right locks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-10-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/kvm/book3s: switch from raw_spin_*lock to arch_spin_lock.Aneesh Kumar K.V1-4/+4
These functions can get called in realmode. Hence use low level arch_spin_lock which is safe to be called in realmode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-9-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/perf/callchain: Use __get_user_pages_fast in read_user_stack_slowAneesh Kumar K.V1-32/+14
read_user_stack_slow is called with interrupts soft disabled and it copies contents from the page which we find mapped to a specific address. To convert userspace address to pfn, the kernel now uses lockless page table walk. The kernel needs to make sure the pfn value read remains stable and is not released and reused for another process while the contents are read from the page. This can only be achieved by holding a page reference. One of the first approaches I tried was to check the pte value after the kernel copies the contents from the page. But as shown below we can still get it wrong CPU0 CPU1 pte = READ_ONCE(*ptep); pte_clear(pte); put_page(page); page = alloc_page(); memcpy(page_address(page), "secret password", nr); memcpy(buf, kaddr + offset, nb); put_page(page); handle_mm_fault() page = alloc_page(); set_pte(pte, page); if (pte_val(pte) != pte_val(*ptep)) Hence switch to __get_user_pages_fast. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-8-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/mce: Don't reload pte val in addr_to_pfnAneesh Kumar K.V1-5/+9
A lockless page table walk should be safe against parallel THP collapse, THP split and madvise(MADV_DONTNEED)/parallel fault. This patch makes sure kernel won't reload the pteval when checking for different conditions. The patch also added a check for pte_present to make sure the kernel is indeed operating on a PTE and not a pointer to level 0 table page. The pfn value we find here can be different from the actual pfn on which machine check happened. This can happen if we raced with a parallel update of the page table. In such a scenario we end up isolating a wrong pfn. But that doesn't have any other side effect. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-7-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/book3s64/hash: Use the pte_t address from the callerAneesh Kumar K.V1-22/+5
Don't fetch the pte value using lockless page table walk. Instead use the value from the caller. hash_preload is called with ptl lock held. So it is safe to use the pte_t address directly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-6-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/hash64: Restrict page table lookup using init_mm with ↵Aneesh Kumar K.V3-16/+5
__flush_hash_table_range This is only used with init_mm currently. Walking init_mm is much simpler because we don't need to handle concurrent page table like other mm_context Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-5-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/mm/hash64: use _PAGE_PTE when checking for pte_presentAneesh Kumar K.V2-7/+19
This makes the pte_present check stricter by checking for additional _PAGE_PTE bit. A level 1 pte pointer (THP pte) can be switched to a pointer to level 0 pte page table page by following two operations. 1) THP split. 2) madvise(MADV_DONTNEED) in parallel to page fault. A lockless page table walk need to make sure we can handle such changes gracefully. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-4-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/pkeys: Check vma before returning key fault error to the userAneesh Kumar K.V1-8/+0
If multiple threads in userspace keep changing the protection keys mapping a range, there can be a scenario where kernel takes a key fault but the pkey value found in the siginfo struct is a permissive one. This can confuse the userspace as shown in the below test case. /* use this to control the number of test iterations */ static void pkeyreg_set(int pkey, unsigned long rights) { unsigned long reg, shift; shift = (NR_PKEYS - pkey - 1) * PKEY_BITS_PER_PKEY; asm volatile("mfspr %0, 0xd" : "=r"(reg)); reg &= ~(((unsigned long) PKEY_BITS_MASK) << shift); reg |= (rights & PKEY_BITS_MASK) << shift; asm volatile("mtspr 0xd, %0" : : "r"(reg)); } static unsigned long pkeyreg_get(void) { unsigned long reg; asm volatile("mfspr %0, 0xd" : "=r"(reg)); return reg; } static int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey) { return syscall(SYS_pkey_mprotect, addr, len, prot, pkey); } static int sys_pkey_alloc(unsigned long flags, unsigned long access_rights) { return syscall(SYS_pkey_alloc, flags, access_rights); } static int sys_pkey_free(int pkey) { return syscall(SYS_pkey_free, pkey); } static int faulting_pkey; static int permissive_pkey; static pthread_barrier_t pkey_set_barrier; static pthread_barrier_t mprotect_barrier; static void pkey_handle_fault(int signum, siginfo_t *sinfo, void *ctx) { unsigned long pkeyreg; /* FIXME: printf is not signal-safe but for the current purpose, it gets the job done. */ printf("pkey: exp = %d, got = %d\n", faulting_pkey, sinfo->si_pkey); fflush(stdout); assert(sinfo->si_code == SEGV_PKUERR); assert(sinfo->si_pkey == faulting_pkey); /* clear pkey permissions to let the faulting instruction continue */ pkeyreg_set(faulting_pkey, 0x0); } static void *do_mprotect_fault(void *p) { unsigned long rights, pkeyreg, pgsize; unsigned int i; void *region; int pkey; srand(time(NULL)); pgsize = sysconf(_SC_PAGESIZE); rights = PKEY_DISABLE_WRITE; region = p; /* allocate key, no permissions */ assert((pkey = sys_pkey_alloc(0, PKEY_DISABLE_ACCESS)) > 0); pkeyreg_set(4, 0x0); /* cache the pkey here as the faulting pkey for future reference in the signal handler */ faulting_pkey = pkey; printf("%s: faulting pkey = %d\n", __func__, faulting_pkey); /* try to allocate, mprotect and free pkeys repeatedly */ for (i = 0; i < NUM_ITERATIONS; i++) { /* sync up with the other thread here */ pthread_barrier_wait(&pkey_set_barrier); /* make sure that the pkey used by the non-faulting thread is made permissive for this thread's context too so that no faults are triggered because it still might have been set to a restrictive value */ // pkeyreg_set(permissive_pkey, 0x0); /* sync up with the other thread here */ pthread_barrier_wait(&mprotect_barrier); /* perform mprotect */ assert(!sys_pkey_mprotect(region, pgsize, PROT_READ | PROT_WRITE, pkey)); /* choose a random byte from the protected region and attempt to write to it, this will generate a fault */ *((char *) region + (rand() % pgsize)) = rand(); /* restore pkey permissions as the signal handler may have cleared the bit out for the sake of continuing */ pkeyreg_set(pkey, PKEY_DISABLE_WRITE); } /* free pkey */ sys_pkey_free(pkey); return NULL; } static void *do_mprotect_nofault(void *p) { unsigned long pgsize; unsigned int i, j; void *region; int pkey; pgsize = sysconf(_SC_PAGESIZE); region = p; /* try to allocate, mprotect and free pkeys repeatedly */ for (i = 0; i < NUM_ITERATIONS; i++) { /* allocate pkey, all permissions */ assert((pkey = sys_pkey_alloc(0, 0)) > 0); permissive_pkey = pkey; /* sync up with the other thread here */ pthread_barrier_wait(&pkey_set_barrier); pthread_barrier_wait(&mprotect_barrier); /* perform mprotect on the common page, no faults will be triggered as this is most permissive */ assert(!sys_pkey_mprotect(region, pgsize, PROT_READ | PROT_WRITE, pkey)); /* free pkey */ assert(!sys_pkey_free(pkey)); } return NULL; } int main(int argc, char **argv) { pthread_t fault_thread, nofault_thread; unsigned long pgsize; struct sigaction act; pthread_attr_t attr; cpu_set_t fault_cpuset, nofault_cpuset; unsigned int i; void *region; /* allocate memory region to protect */ pgsize = sysconf(_SC_PAGESIZE); assert(region = memalign(pgsize, pgsize)); CPU_ZERO(&fault_cpuset); CPU_SET(0, &fault_cpuset); CPU_ZERO(&nofault_cpuset); CPU_SET(8, &nofault_cpuset); assert(!pthread_attr_init(&attr)); /* setup sigsegv signal handler */ act.sa_handler = 0; act.sa_sigaction = pkey_handle_fault; assert(!sigprocmask(SIG_SETMASK, 0, &act.sa_mask)); act.sa_flags = SA_SIGINFO; act.sa_restorer = 0; assert(!sigaction(SIGSEGV, &act, NULL)); /* setup barrier for the two threads */ pthread_barrier_init(&pkey_set_barrier, NULL, 2); pthread_barrier_init(&mprotect_barrier, NULL, 2); /* setup and start threads */ assert(!pthread_create(&fault_thread, &attr, &do_mprotect_fault, region)); assert(!pthread_setaffinity_np(fault_thread, sizeof(cpu_set_t), &fault_cpuset)); assert(!pthread_create(&nofault_thread, &attr, &do_mprotect_nofault, region)); assert(!pthread_setaffinity_np(nofault_thread, sizeof(cpu_set_t), &nofault_cpuset)); /* cleanup */ assert(!pthread_attr_destroy(&attr)); assert(!pthread_join(fault_thread, NULL)); assert(!pthread_join(nofault_thread, NULL)); assert(!pthread_barrier_destroy(&pkey_set_barrier)); assert(!pthread_barrier_destroy(&mprotect_barrier)); free(region); puts("PASS"); return EXIT_SUCCESS; } The above test can result the below failure without this patch. pkey: exp = 3, got = 3 pkey: exp = 3, got = 4 a.out: pkey-siginfo-race.c:100: pkey_handle_fault: Assertion `sinfo->si_pkey == faulting_pkey' failed. Aborted Check for vma access before considering this a key fault. If vma pkey allow access retry the acess again. Test case is written by Sandipan Das <sandipan@linux.ibm.com> hence added SOB from him. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-3-aneesh.kumar@linux.ibm.com
2020-05-05powerpc/pkeys: Avoid using lockless page table walkAneesh Kumar K.V3-56/+60
Fetch pkey from vma instead of linux page table. Also document the fact that in some cases the pkey returned in siginfo won't be the same as the one we took keyfault on. Even with linux page table walk, we can end up in a similar scenario. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200505071729.54912-2-aneesh.kumar@linux.ibm.com
2020-05-05Merge tag 'kvm-ppc-fixes-5.7-1' into topic/ppc-kvmMichael Ellerman2-8/+10
This brings in a fix from the kvm-ppc tree that was merged to mainline after rc2, and so isn't in the base of our topic branch. We'd like it in the topic branch because it interacts with patches we plan to carry in this branch.
2020-05-04powerpc/fadump: consider reserved ranges while reserving memoryHari Bathini1-9/+67
Commit 0962e8004e97 ("powerpc/prom: Scan reserved-ranges node for memory reservations") enabled support to parse reserved-ranges DT node and reserve kernel memory falling in these ranges for F/W purposes. Memory reserved for FADump should not overlap with these ranges as it could corrupt memory meant for F/W or crash'ed kernel memory to be exported as vmcore. But since commit 579ca1a27675 ("powerpc/fadump: make use of memblock's bottom up allocation mode"), memblock_find_in_range() is being used to find the appropriate area to reserve memory for FADump, which can't account for reserved-ranges as these ranges are reserved only after FADump memory reservation. With reserved-ranges now being populated during early boot, look out for these memory ranges while reserving memory for FADump. Without this change, MPIPL on PowerNV systems aborts with hostboot failure, when memory reserved for FADump is less than 4096MB. Fixes: 579ca1a27675 ("powerpc/fadump: make use of memblock's bottom up allocation mode") Cc: stable@vger.kernel.org Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/158737297693.26700.16193820746269425424.stgit@hbathini.in.ibm.com
2020-05-04powerpc/fadump: use static allocation for reserved memory rangesHari Bathini2-33/+48
At times, memory ranges have to be looked up during early boot, when kernel couldn't be initialized for dynamic memory allocation. In fact, reserved-ranges look up is needed during FADump memory reservation. Without accounting for reserved-ranges in reserving memory for FADump, MPIPL boot fails with memory corruption issues. So, extend memory ranges handling to support static allocation and populate reserved memory ranges during early boot. Fixes: dda9dbfeeb7a ("powerpc/fadump: consider reserved ranges while releasing memory") Cc: stable@vger.kernel.org Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/158737294432.26700.4830263187856221314.stgit@hbathini.in.ibm.com
2020-05-04powerpc/64s/kuap: Restore AMR in system reset exceptionNicholas Piggin1-0/+1
The system reset interrupt handler locks AMR and exits with EXCEPTION_RESTORE_REGS without restoring AMR. Similarly to the soft-NMI handler, it needs to restore. Fixes: 890274c2dc4c ("powerpc/64s: Implement KUAP for Radix MMU") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429065654.1677541-5-npiggin@gmail.com
2020-05-04powerpc/64/kuap: Move kuap checks out of MSR[RI]=0 regions of exit codeNicholas Piggin1-6/+8
Any kind of WARN causes a program check that will crash with unrecoverable exception if it occurs when RI is clear. Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429065654.1677541-2-npiggin@gmail.com
2020-05-04powerpc/64s: Fix unrecoverable SLB crashes due to preemption checkMichael Ellerman2-4/+22
Hugh reported that his trusty G5 crashed after a few hours under load with an "Unrecoverable exception 380". The crash is in interrupt_return() where we check lazy_irq_pending(), which calls get_paca() and with CONFIG_DEBUG_PREEMPT=y that goes to check_preemption_disabled() via debug_smp_processor_id(). As Nick explained on the list: Problem is MSR[RI] is cleared here, ready to do the last few things for interrupt return where we're not allowed to take any other interrupts. SLB interrupts can happen just about anywhere aside from kernel text, global variables, and stack. When that hits, it appears to be unrecoverable due to RI=0. The problematic access is in preempt_count() which is: return READ_ONCE(current_thread_info()->preempt_count); Because of THREAD_INFO_IN_TASK, current_thread_info() just points to current, so the access is to somewhere in kernel memory, but not on the stack or in .data, which means it can cause an SLB miss. If we take an SLB miss with RI=0 it is fatal. The easiest solution is to add a version of lazy_irq_pending() that doesn't do the preemption check and call it from the interrupt return path. Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200502143316.929341-1-mpe@ellerman.id.au
2020-05-03Merge KUAP fix from topic/uaccess-ppc into fixesMichael Ellerman1-14/+35
Merge a KUAP fix from Nick that we're keeping in a topic branch due to interactions with other series that are headed for next.
2020-05-01powerpc/uaccess: Implement user_read_access_begin and user_write_access_beginChristophe Leroy3-3/+37
Add support for selective read or write user access with user_read_access_begin/end and user_write_access_begin/end. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6c83af0f0809ef2a955c39ac622767f6cbede035.1585898438.git.christophe.leroy@c-s.fr
2020-04-30sched/core: Fix illegal RCU from offline CPUsPeter Zijlstra1-1/+0
In the CPU-offline process, it calls mmdrop() after idle entry and the subsequent call to cpuhp_report_idle_dead(). Once execution passes the call to rcu_report_dead(), RCU is ignoring the CPU, which results in lockdep complaining when mmdrop() uses RCU from either memcg or debugobjects below. Fix it by cleaning up the active_mm state from BP instead. Every arch which has CONFIG_HOTPLUG_CPU should have already called idle_task_exit() from AP. The only exception is parisc because it switches them to &init_mm unconditionally (see smp_boot_one_cpu() and smp_cpu_init()), but the patch will still work there because it calls mmgrab(&init_mm) in smp_cpu_init() and then should call mmdrop(&init_mm) in finish_cpu(). WARNING: suspicious RCU usage ----------------------------- kernel/workqueue.c:710 RCU or wq_pool_mutex should be held! other info that might help us debug this: RCU used illegally from offline CPU! Call Trace: dump_stack+0xf4/0x164 (unreliable) lockdep_rcu_suspicious+0x140/0x164 get_work_pool+0x110/0x150 __queue_work+0x1bc/0xca0 queue_work_on+0x114/0x120 css_release+0x9c/0xc0 percpu_ref_put_many+0x204/0x230 free_pcp_prepare+0x264/0x570 free_unref_page+0x38/0xf0 __mmdrop+0x21c/0x2c0 idle_task_exit+0x170/0x1b0 pnv_smp_cpu_kill_self+0x38/0x2e0 cpu_die+0x48/0x64 arch_cpu_idle_dead+0x30/0x50 do_idle+0x2f4/0x470 cpu_startup_entry+0x38/0x40 start_secondary+0x7a8/0xa80 start_secondary_resume+0x10/0x14 Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lkml.kernel.org/r/20200401214033.8448-1-cai@lca.pw
2020-04-30powerpc/uaccess: Implement unsafe_copy_to_user() as a simple loopChristophe Leroy1-1/+20
At the time being, unsafe_copy_to_user() is based on raw_copy_to_user() which calls __copy_tofrom_user(). __copy_tofrom_user() is a big optimised function to copy big amount of data. It aligns destinations to cache line in order to use dcbz instruction. Today unsafe_copy_to_user() is called only from filldir(). It is used to mainly copy small amount of data like filenames, so __copy_tofrom_user() is not fit. Also, unsafe_copy_to_user() is used within user_access_begin/end sections. In those section, it is preferable to not call functions. Rewrite unsafe_copy_to_user() as a macro that uses __put_user_goto(). We first perform a loop of long, then we finish with necessary complements. unsafe_copy_to_user() might be used in the near future to copy fixed-size data, like pt_regs structs during signal processing. Having it as a macro allows GCC to optimise it for instead when it knows the size in advance, it can unloop loops, drop complements when the size is a multiple of longs, etc ... Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fe952112c29bf6a0a2778c9e6bbb4f4afd2c4258.1587143308.git.christophe.leroy@c-s.fr
2020-04-30powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'Christophe Leroy1-9/+52
unsafe_put_user() is designed to take benefit of 'asm goto'. Instead of using the standard __put_user() approach and branch based on the returned error, use 'asm goto' and make the exception code branch directly to the error label. There is no code anymore in the fixup section. This change significantly simplifies functions using unsafe_put_user() Small exemple of the benefit with the following code: struct test { u32 item1; u16 item2; u8 item3; u64 item4; }; int set_test_to_user(struct test __user *test, u32 item1, u16 item2, u8 item3, u64 item4) { unsafe_put_user(item1, &test->item1, failed); unsafe_put_user(item2, &test->item2, failed); unsafe_put_user(item3, &test->item3, failed); unsafe_put_user(item4, &test->item4, failed); return 0; failed: return -EFAULT; } Before the patch: 00000be8 <set_test_to_user>: be8: 39 20 00 00 li r9,0 bec: 90 83 00 00 stw r4,0(r3) bf0: 2f 89 00 00 cmpwi cr7,r9,0 bf4: 40 9e 00 38 bne cr7,c2c <set_test_to_user+0x44> bf8: b0 a3 00 04 sth r5,4(r3) bfc: 2f 89 00 00 cmpwi cr7,r9,0 c00: 40 9e 00 2c bne cr7,c2c <set_test_to_user+0x44> c04: 98 c3 00 06 stb r6,6(r3) c08: 2f 89 00 00 cmpwi cr7,r9,0 c0c: 40 9e 00 20 bne cr7,c2c <set_test_to_user+0x44> c10: 90 e3 00 08 stw r7,8(r3) c14: 91 03 00 0c stw r8,12(r3) c18: 21 29 00 00 subfic r9,r9,0 c1c: 7d 29 49 10 subfe r9,r9,r9 c20: 38 60 ff f2 li r3,-14 c24: 7d 23 18 38 and r3,r9,r3 c28: 4e 80 00 20 blr c2c: 38 60 ff f2 li r3,-14 c30: 4e 80 00 20 blr 00000000 <.fixup>: ... b8: 39 20 ff f2 li r9,-14 bc: 48 00 00 00 b bc <.fixup+0xbc> bc: R_PPC_REL24 .text+0xbf0 c0: 39 20 ff f2 li r9,-14 c4: 48 00 00 00 b c4 <.fixup+0xc4> c4: R_PPC_REL24 .text+0xbfc c8: 39 20 ff f2 li r9,-14 cc: 48 00 00 00 b cc <.fixup+0xcc> d0: 39 20 ff f2 li r9,-14 d4: 48 00 00 00 b d4 <.fixup+0xd4> d4: R_PPC_REL24 .text+0xc18 00000000 <__ex_table>: ... a0: R_PPC_REL32 .text+0xbec a4: R_PPC_REL32 .fixup+0xb8 a8: R_PPC_REL32 .text+0xbf8 ac: R_PPC_REL32 .fixup+0xc0 b0: R_PPC_REL32 .text+0xc04 b4: R_PPC_REL32 .fixup+0xc8 b8: R_PPC_REL32 .text+0xc10 bc: R_PPC_REL32 .fixup+0xd0 c0: R_PPC_REL32 .text+0xc14 c4: R_PPC_REL32 .fixup+0xd0 After the patch: 00000be8 <set_test_to_user>: be8: 90 83 00 00 stw r4,0(r3) bec: b0 a3 00 04 sth r5,4(r3) bf0: 98 c3 00 06 stb r6,6(r3) bf4: 90 e3 00 08 stw r7,8(r3) bf8: 91 03 00 0c stw r8,12(r3) bfc: 38 60 00 00 li r3,0 c00: 4e 80 00 20 blr c04: 38 60 ff f2 li r3,-14 c08: 4e 80 00 20 blr 00000000 <__ex_table>: ... a0: R_PPC_REL32 .text+0xbe8 a4: R_PPC_REL32 .text+0xc04 a8: R_PPC_REL32 .text+0xbec ac: R_PPC_REL32 .text+0xc04 b0: R_PPC_REL32 .text+0xbf0 b4: R_PPC_REL32 .text+0xc04 b8: R_PPC_REL32 .text+0xbf4 bc: R_PPC_REL32 .text+0xc04 c0: R_PPC_REL32 .text+0xbf8 c4: R_PPC_REL32 .text+0xc04 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/23e680624680a9a5405f4b88740d2596d4b17c26.1587143308.git.christophe.leroy@c-s.fr
2020-04-30powerpc/uaccess: Evaluate macro arguments once, before user access is allowedNicholas Piggin1-14/+35
get/put_user() can be called with nontrivial arguments. fs/proc/page.c has a good example: if (put_user(stable_page_flags(ppage), out)) { stable_page_flags() is quite a lot of code, including spin locks in the page allocator. Ensure these arguments are evaluated before user access is allowed. This improves security by reducing code with access to userspace, but it also fixes a PREEMPT bug with KUAP on powerpc/64s: stable_page_flags() is currently called with AMR set to allow writes, it ends up calling spin_unlock(), which can call preempt_schedule. But the task switch code can not be called with AMR set (it relies on interrupts saving the register), so this blows up. It's fine if the code inside allow_user_access() is preemptible, because a timer or IPI will save the AMR, but it's not okay to explicitly cause a reschedule. Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200407041245.600651-1-npiggin@gmail.com
2020-04-30powerpc/64: Have MPROFILE_KERNEL depend on FUNCTION_TRACERNaveen N. Rao1-1/+1
Currently, it is possible to have CONFIG_FUNCTION_TRACER disabled, but CONFIG_MPROFILE_KERNEL enabled. Though all existing users of MPROFILE_KERNEL are doing the right thing, it is weird to have MPROFILE_KERNEL enabled when the function tracer isn't. Fix this by making MPROFILE_KERNEL depend on FUNCTION_TRACER. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200422092612.514301-1-naveen.n.rao@linux.vnet.ibm.com
2020-04-30powerpc/sysfs: Show idle_purr and idle_spurr for every CPUGautham R. Shenoy2-3/+111
On Pseries LPARs, to calculate utilization, we need to know the [S]PURR ticks when the CPUs were busy or idle. The total PURR and SPURR ticks are already exposed via the per-cpu sysfs files "purr" and "spurr". This patch adds support for exposing the idle PURR and SPURR ticks via new per-cpu sysfs files named "idle_purr" and "idle_spurr". This patch also adds helper functions to accurately read the values of idle_purr and idle_spurr especially from an interrupt context between when the interrupt has occurred between the pseries_idle_prolog() and pseries_idle_epilog(). This will ensure that the idle purr/spurr values corresponding to the latest idle period is accounted for before these values are read. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1586249263-14048-5-git-send-email-ego@linux.vnet.ibm.com