path: root/arch/powerpc/kernel
2020-07-22  KVM: PPC: Book3S HV: Save/restore new PMU registers  [Athira Rajeev, 1 file, -0/+3]
Power ISA v3.1 has added new performance monitoring unit (PMU) special purpose registers (SPRs): Monitor Mode Control Register 3 (MMCR3), Sampled Instruction Event Register 2 (SIER2) and Sampled Instruction Event Register 3 (SIER3). Add support to save/restore these new SPRs while entering/exiting the guest. Also include changes to support KVM_REG_PPC_MMCR3/SIER2/SIER3, and add the new SPRs to the KVM API documentation. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-6-git-send-email-atrajeev@linux.vnet.ibm.com
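For illustration only (not the actual patch), a guest entry/exit path could swap these SPRs roughly as below when the CPU supports ISA v3.1; the host_* and vcpu->arch field names are assumptions:

    /* Hedged sketch: save host copies and load guest copies of the
     * ISA v3.1 PMU SPRs around guest entry. Field names are illustrative. */
    if (cpu_has_feature(CPU_FTR_ARCH_31)) {
            host_mmcr3 = mfspr(SPRN_MMCR3);
            host_sier2 = mfspr(SPRN_SIER2);
            host_sier3 = mfspr(SPRN_SIER3);

            mtspr(SPRN_MMCR3, vcpu->arch.mmcr3);
            mtspr(SPRN_SIER2, vcpu->arch.sier2);
            mtspr(SPRN_SIER3, vcpu->arch.sier3);
    }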
2020-07-22  powerpc/perf: Add support for ISA3.1 PMU SPRs  [Madhavan Srinivasan, 1 file, -0/+8]
PowerISA v3.1 includes new performance monitoring unit (PMU) special purpose registers (SPRs): Monitor Mode Control Register 3 (MMCR3), Sampled Instruction Event Register 2 (SIER2) and Sampled Instruction Event Register 3 (SIER3). MMCR3 is added for further sampling-related configuration control. SIER2/SIER3 are added to provide additional information about the sampled instruction. This patch adds a new PPMU flag called "PPMU_ARCH_31" to support handling of these new SPRs, updates struct thread_struct to include them, and includes MMCR3 in struct mmcr_regs, which is needed to support programming of the MMCR3 SPR during event enable/disable. The patch also adds sysfs support for the MMCR3 SPR along with SPRN_ macros for these new PMU SPRs. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Rename to PPMU_ARCH_31 as noted by jpn] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-5-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22  KVM: PPC: Book3S HV: Cleanup updates for kvm vcpu MMCR  [Athira Rajeev, 1 file, -0/+2]
Currently `kvm_vcpu_arch` stores all Monitor Mode Control registers in a flat array in this order: mmcr0, mmcr1, mmcra, mmcr2, mmcrs. Split this to give mmcra and mmcrs their own entries in the vcpu and use a flat array only for mmcr0 to mmcr2. This patch implements this cleanup to make the code easier to read. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Fix MMCRA/MMCR2 uapi breakage as noted by paulus] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-3-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-21  powerpc/64s: Remove PROT_SAO support  [Nicholas Piggin, 1 file, -1/+1]
ISA v3.1 does not support the SAO storage control attribute required to implement PROT_SAO. PROT_SAO was used by specialised system software (Lx86) that has been discontinued for about 7 years, and is not thought to be used elsewhere, so removal should not cause problems. We remove it rather than keep support only for older processors, because live migrating guest partitions to newer processors may not be possible if SAO is in use (or, worse, might be silently allowed with races).
 - PROT_SAO stays in the uapi header so code using it would still build.
 - arch_validate_prot() is removed; the generic version rejects PROT_SAO, so applications get a failure at mmap() time.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Drop KVM change for the time being] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200703011958.1166620-3-npiggin@gmail.com
2020-07-20  powerpc/book3s64/kuap: Move UAMOR setup to key init function  [Aneesh Kumar K.V, 2 files, -6/+22]
UAMOR values are not application-specific. The kernel initializes its value based on different reserved keys. Remove the thread-specific UAMOR value and don't switch the UAMOR on context switch. Move UAMOR initialization to key initialization code and remove thread_struct.uamor because it is not used anymore. Before commit: 4a4a5e5d2aad ("powerpc/pkeys: key allocation/deallocation must not change pkey registers") we used to update uamor based on key allocation and free. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-20-aneesh.kumar@linux.ibm.com
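As a minimal sketch of the direction (not the actual patch; the default_uamor variable name is an assumption), UAMOR is written once while setting up protection keys rather than being switched per thread:

    /* Hedged sketch: program UAMOR once at pkey init instead of per thread.
     * 'default_uamor' masks out the reserved keys; the name is illustrative. */
    static u64 default_uamor;

    static void __init setup_uamor(void)
    {
            if (mmu_has_feature(MMU_FTR_PKEY))
                    mtspr(SPRN_UAMOR, default_uamor);
    }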
2020-07-20  powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexec  [Aneesh Kumar K.V, 1 file, -14/+0]
As we kexec across kernels that use AMR/IAMR for different purposes, we need to ensure that new kernels get kexec'd with a reset value of AMR/IAMR. For example, the new kernel can use key 0 for the kernel mapping, and the old AMR value would prevent access to key 0. This patch also removes the reset of IAMR and AMOR in kexec_sequence. The AMOR reset is not needed, and the IAMR reset is partial (it doesn't do the reset on secondary cpus) and is redundant with this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-19-aneesh.kumar@linux.ibm.com
2020-07-20  powerpc/book3s64/pkeys: Add MMU_FTR_PKEY  [Aneesh Kumar K.V, 1 file, -0/+5]
Parse storage keys related device tree entry in early_init_devtree and enable MMU feature MMU_FTR_PKEY if pkeys are supported. MMU feature is used instead of CPU feature because this enables us to group MMU_FTR_KUAP and MMU_FTR_PKEY in asm feature fixup code. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-14-aneesh.kumar@linux.ibm.com
2020-07-20  powerpc/book3s64/pkeys: kill cpu feature key CPU_FTR_PKEY  [Aneesh Kumar K.V, 1 file, -6/+0]
We don't use CPU_FTR_PKEY anymore. Remove the feature bit and mark it free. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-9-aneesh.kumar@linux.ibm.com
2020-07-20  powerpc/mce: Add MCE notification chain  [Santosh Sivaraj, 1 file, -0/+15]
Introduce a notification chain which lets us know about uncorrected memory errors (UE). This helps prospective users in the pmem or nvdimm subsystems track bad blocks for better handling of persistent memory allocations. Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709135142.721504-1-santosh@fossix.org
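A consumer would hook in via the standard notifier API; a minimal sketch, under the assumption that the series exposes a registration helper along the lines of mce_register_notifier() and passes the machine check event as the notifier data:

    /* Hedged sketch: subscribing to uncorrected-error (UE) notifications. */
    static int pmem_ue_notify(struct notifier_block *nb,
                              unsigned long action, void *data)
    {
            /* 'data' is assumed to describe the machine check event;
             * record the affected physical range for badblocks handling. */
            return NOTIFY_OK;
    }

    static struct notifier_block pmem_ue_nb = {
            .notifier_call = pmem_ue_notify,
    };

    /* at driver init time */
    mce_register_notifier(&pmem_ue_nb);   /* registration helper name assumed */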
2020-07-20  powerpc/prom: Enable Radix GTSE in cpu pa-features  [Nicholas Piggin, 1 file, -1/+1]
When '029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.")' made GTSE an MMU feature, it was enabled by default in powerpc-cpu-features but was missed in pa-features. This causes random memory corruption during boot of PowerNV kernels where CONFIG_PPC_DT_CPU_FTRS isn't enabled. Fixes: 029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.") Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> [mpe: Unwrap long line] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200720044258.863574-1-bharata@linux.ibm.com
2020-07-20  net: remove compat_sys_{get,set}sockopt  [Christoph Hellwig, 1 file, -2/+2]
Now that the ->compat_{get,set}sockopt proto_ops methods are gone there is no good reason left to keep the compat syscalls separate. This fixes the odd use of unsigned int for the compat_setsockopt optlen and the missing sock_use_custom_sol_socket. It would also easily allow running the eBPF hooks for the compat syscalls, but such a large change in behavior does not belong into a consolidation patch like this one. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19  powerpc: use the generic dma_ops_bypass mode  [Christoph Hellwig, 1 file, -81/+9]
Use the DMA API bypass mechanism for direct window mappings. This uses common code and speeds up the direct mapping case by avoiding indirect calls when not using dma ops at all. It also fixes a problem where the sync_* methods were using the bypass check for DMA allocations, even though those are part of the streaming ops. Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which has never been well defined, is only used by a few drivers, and IIRC never showed up in the typical Cell blade setups that are affected by the ordering workaround. Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB") Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-07-18  Merge branch 'fixes' into next  [Michael Ellerman, 2 files, -2/+2]
Merge our fixes branch, primarily to bring in the ebb selftests build fix and the pkey fix, which is a dependency for some future work.
2020-07-16  powerpc/pseries: Detect secure and trusted boot state of the system.  [Nayna Jain, 1 file, -2/+16]
The device-tree properties to check secure and trusted boot state are different for guests (pseries) compared to baremetal (powernv). This patch updates the existing is_ppc_secureboot_enabled() and is_ppc_trustedboot_enabled() functions to add support for pseries. For pseries the secureboot and trustedboot state are exposed via the device-tree properties /ibm,secure-boot and /ibm,trusted-boot. The values of ibm,secure-boot under pseries are interpreted as:
  0   - Disabled
  1   - Enabled in Log-only mode. This patch interprets this value as disabled, since audit mode is currently not supported for Linux.
  2   - Enabled and enforced.
  3-9 - Enabled and enforcing; requirements are at the discretion of the operating system.
The values of ibm,trusted-boot under pseries are interpreted as:
  0 - Disabled
  1 - Enabled
Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> [mpe: Drop machdep.h inclusion, tweak change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594813921-12425-1-git-send-email-nayna@linux.ibm.com
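A minimal sketch of how such properties can be read (assumptions: the properties sit on the root node, and the helper below is purely illustrative, not the patch's actual code):

    /* Hedged sketch: reading the pseries secure/trusted boot properties. */
    static void read_pseries_boot_state(bool *secure, bool *trusted)
    {
            struct device_node *root = of_find_node_by_path("/");
            u32 sb = 0, tb = 0;

            if (root) {
                    of_property_read_u32(root, "ibm,secure-boot", &sb);
                    of_property_read_u32(root, "ibm,trusted-boot", &tb);
                    of_node_put(root);
            }
            *secure  = sb >= 2;   /* 2..9: enabled and enforcing */
            *trusted = tb >= 1;
    }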
2020-07-16  powerpc/vdso: Fix vdso cpu truncation  [Milton Miller, 1 file, -1/+1]
The code in vdso_cpu_init that exposes the cpu and numa node to userspace via SPRG_VDSO incorrectly masks the cpu to 12 bits. This means that any kernel running on a box with more than 4096 threads (NR_CPUS advertises a limit of 8192 cpus) would expose userspace to two cpu contexts running at the same time with the same cpu number. Note: I'm not aware of any distro shipping a kernel with support for more than 4096 threads today, nor of any system image that currently exceeds 4096 threads. Found via code browsing. Fixes: 18ad51dd342a7eb09dbcd059d0b451b616d4dafc ("powerpc: Add VDSO version of getcpu") Signed-off-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Anton Blanchard <anton@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200715233704.1352257-1-anton@ozlabs.org
2020-07-16  powerpc/fadump: fix race between pstore write and fadump crash trigger  [Sourabh Jain, 1 file, -0/+24]
When we enter the fadump crash path via system reset, we fail to update the pstore. On the system reset path we first update the pstore and then go for the fadump crash. The problem is that when all the CPUs try to get the pstore lock to initiate the pstore write, only one CPU will acquire the lock and proceed with the pstore write. Since this happens in NMI context, CPUs that fail to get the lock do not wait for their turn to write to the pstore and simply proceed with the next operation, which is the fadump crash. One of the CPUs that proceeded down the fadump crash path triggers the crash and does not wait for the CPU holding the pstore lock to complete the pstore update. (The original commit message includes an ASCII timeline diagram of the CPUs racing between the pstore write and the fadump crash trigger; it is omitted here.) To avoid this race condition a barrier is added on the crash_fadump path; it prevents a CPU from triggering the crash until all the online CPUs have completed their task. The barrier makes sure all the secondary CPUs hit the crash_fadump function before we initiate the crash. A timeout is kept to ensure the primary CPU (the one that initiates the crash) does not wait for the secondary CPUs indefinitely. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200713052435.183750-1-sourabhjain@linux.ibm.com
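The synchronization idea, as a minimal sketch with illustrative names (not the actual implementation):

    /* Hedged sketch: CPUs entering the fadump crash path check in, and the
     * crashing CPU waits for the rest, with a timeout so it never hangs. */
    static atomic_t cpus_in_fadump;

    static void fadump_crash_sync(int crashing_cpu)
    {
            int timeout_ms = 500;       /* illustrative value */

            atomic_inc(&cpus_in_fadump);
            if (smp_processor_id() != crashing_cpu)
                    return;             /* secondary CPUs only report in */

            while (atomic_read(&cpus_in_fadump) < num_online_cpus() &&
                   timeout_ms-- > 0)
                    mdelay(1);
    }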
2020-07-16  powerpc/rtasd: simplify handle_rtas_event(), emit message on events  [Nathan Lynch, 1 file, -25/+3]
prrn_is_enabled() always returns false/0, so handle_rtas_event() can be simplified and some dead code can be removed. Use machine_is() instead of #ifdef to run this code only on pseries, and add an informational ratelimited message that we are ignoring the events. PRRN events are relatively rare in normal operation and usually arise from operator-initiated actions such as a DPO (Dynamic Platform Optimizer) run. Eventually we do want to consume these events and update the device tree, but that needs more care to be safe vs LPM and DLPAR. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-13-nathanl@linux.ibm.com
2020-07-16  powerpc/rtas: don't online CPUs for partition suspend  [Nathan Lynch, 1 file, -120/+2]
Partition suspension, used for hibernation and migration, requires that the OS place all but one of the LPAR's processor threads into one of two states prior to calling the ibm,suspend-me RTAS function:
  * the architected offline state (via RTAS stop-self); or
  * the H_JOIN hcall, which does not return until the partition resumes execution
Using H_CEDE as the offline mode, introduced by commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state"), means that any threads which are offline from Linux's point of view must be moved to one of those two states before a partition suspension can proceed. This was eventually addressed in commit 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation"), which added code to temporarily bring up any offline processor threads so they can call H_JOIN. Conceptually this is fine, but the implementation has had multiple races with cpu hotplug operations initiated from user space[1][2][3], the error handling is fragile, and it generates user-visible cpu hotplug events which is a lot of noise for a platform feature that's supposed to minimize disruption to workloads. With commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state") reverted, this code becomes unnecessary, so remove it. Since any offline CPUs now are truly offline from the platform's point of view, it is no longer necessary to bring up CPUs only to have them call H_JOIN and then go offline again upon resuming. Only active threads are required to call H_JOIN; stopped threads can be left alone.
[1] commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and serialization during LPM")
[2] commit 9fb603050ffd ("powerpc/rtas: retry when cpu offline races with suspend/migration")
[3] commit dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration")
Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-3-nathanl@linux.ibm.com
2020-07-16  powerpc/security: Allow for processors that flush the link stack using the special bcctr  [Nicholas Piggin, 1 file, -8/+19]
If both count cache and link stack are to be flushed, and can be flushed with the special bcctr, patch that in directly to the flush/branch nop site. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-7-npiggin@gmail.com
2020-07-16  powerpc/64s: Move branch cache flushing bcctr variant to ppc-ops.h  [Nicholas Piggin, 1 file, -4/+2]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-6-npiggin@gmail.com
2020-07-16  powerpc/security: split branch cache flush toggle from code patching  [Nicholas Piggin, 1 file, -43/+51]
Branch cache flushing code patching has inter-dependencies on both the link stack and the count cache flushing state. To make the code clearer and to separate the link stack and count cache handling, split the "toggle" (setting up variables and printing enable/disable) from the code patching. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Always print something, even if the flush is disabled] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-5-npiggin@gmail.com
2020-07-16  powerpc/security: make display of branch cache flush more consistent  [Nicholas Piggin, 1 file, -4/+4]
Make the count-cache and link-stack messages look the same Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-4-npiggin@gmail.com
2020-07-16  powerpc/security: change link stack flush state to the flush type enum  [Nicholas Piggin, 1 file, -5/+5]
Prepare to allow for hardware link stack flushing by using the none/sw/hw type, same as the count cache state. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-3-npiggin@gmail.com
2020-07-16  powerpc/security: re-name count cache flush to branch cache flush  [Nicholas Piggin, 2 files, -22/+21]
The count cache flush mostly refers to both count cache and link stack flushing. As a first step to untangling these a bit, re-name the bits that apply to both. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-2-npiggin@gmail.com
2020-07-16  powerpc: re-initialise lazy FPU/VEC counters on every fault  [Nicholas Piggin, 2 files, -6/+2]
When a FP/VEC/VSX unavailable fault loads registers and enables the facility in the MSR, re-set the lazy restore counters to 1 rather than incrementing them, so every fault gets the same number of restores before the next fault. This probably shouldn't be a practical change, because if a lazy counter was non-zero then it should have been restored and would not cause a fault when userspace tries to access it. However the code and comment imply otherwise, which is misleading and unnecessary. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200623234139.2262227-3-npiggin@gmail.com
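The change boils down to something like the following in the unavailable-fault handlers (a sketch; load_vec is treated the same way):

    /* Hedged sketch: reset the lazy-restore counter instead of bumping it. */
    current->thread.load_fp = 1;    /* previously: current->thread.load_fp++; */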
2020-07-16  powerpc/64s: Fix restore_math unnecessarily changing MSR  [Nicholas Piggin, 2 files, -40/+68]
Before returning to user, if there are missing FP/VEC/VSX bits from the user MSR then those registers had been saved and must be restored again before use. restore_math will decide whether to restore immediately, or skip the restore and let fp/vec/vsx unavailable faults demand load the registers. Each time restore_math restores one of the FP/VSX or VEC register sets, an 8-bit counter is incremented (load_fp and load_vec). When these wrap to zero, restore_math no longer restores that register set until after they are next demand faulted. It's quite usual for those counters to have different values, so if one wraps to zero and restore_math no longer restores its registers or user MSR bit, but the other is not zero yet also does not need to be restored (because the kernel is not frequently using the FPU), then restore_math will be called and will not return in the early exit check. This causes msr_check_and_set to test and set the MSR at every kernel exit despite having no work to do. This can cause workloads (e.g., a NULL syscall microbenchmark) to run fast for a time while both counters are non-zero, then slow down when one of the counters reaches zero, then speed up again after the second counter reaches zero. The cost is significant, about 10% slowdown on a NULL syscall benchmark, and the jittery behaviour is very undesirable. Fix this by having restore_math test all conditions first, and only update the MSR if we will be loading registers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200623234139.2262227-2-npiggin@gmail.com
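The shape of the fix, as a simplified sketch (not the actual function body):

    /* Hedged sketch: decide what needs restoring before touching the MSR. */
    void restore_math(struct pt_regs *regs)
    {
            unsigned long new_msr = 0;

            if (current->thread.load_fp)    /* FP registers need reloading */
                    new_msr |= MSR_FP;
            if (current->thread.load_vec)   /* VEC registers need reloading */
                    new_msr |= MSR_VEC;

            if (!new_msr)
                    return;                 /* early exit, MSR left untouched */

            msr_check_and_set(new_msr);     /* only now enable the facilities */
            /* ... load the corresponding register sets ... */
    }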
2020-07-16  powerpc/64s: restore_math remove TM test  [Nicholas Piggin, 1 file, -2/+1]
The TM test in restore_math added by commit dc16b553c949e ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") is no longer necessary after commit a8318c13e79ba ("powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts"), which removed the cases where restore_math has to restore if TM is active. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200623234139.2262227-1-npiggin@gmail.com
2020-07-16  powerpc/mm: Enable radix GTSE only if supported.  [Bharata B Rao, 2 files, -5/+9]
Make GTSE an MMU feature and enable it by default for radix. However, for a guest, conditionally enable it only if the hypervisor supports it via the OV5 vector, and let prom_init ask for radix GTSE only if the support exists. Having GTSE as an MMU feature will make it easy to enable radix without GTSE; currently radix assumes GTSE is enabled by default. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200703053608.12884-2-bharata@linux.ibm.com
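With GTSE tracked as an MMU feature, callers can simply test for it at runtime; a minimal sketch (the two flush helpers named below are purely illustrative):

    /* Hedged sketch: pick the invalidation strategy based on the feature bit. */
    if (radix_enabled() && mmu_has_feature(MMU_FTR_GTSE))
            flush_with_tlbie();       /* guest may use tlbie directly */
    else
            flush_via_hypervisor();   /* illustrative fallback path */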
2020-07-15  powerpc/vdso64: Switch from __get_datapage() to get_datapage inline macro  [Christophe Leroy, 3 files, -34/+12]
In the same way as already done on PPC32, drop the __get_datapage() function and use the get_datapage inline macro instead. See commit ec0895f08f99 ("powerpc/vdso32: inline __get_datapage()"). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e13d95312e0b9792556b19b4bb8955cc1ff19fc7.1588079622.git.christophe.leroy@c-s.fr
2020-07-15  powerpc/signal64: Don't opencode page prefaulting  [Christophe Leroy, 1 file, -4/+3]
Instead of doing a __get_user() from the first and last location into a tmp var which won't be used, use fault_in_pages_readable(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/810bd8840ef990a200f58c9dea9abe767ca02a3a.1594146723.git.christophe.leroy@csgroup.eu
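The pattern looks roughly like this (a sketch; the frame pointer and label names are illustrative, not the patch's actual code):

    /* Hedged sketch: prefault the whole user frame in one call instead of
     * issuing dummy __get_user() reads of its first and last words. */
    if (fault_in_pages_readable((const char __user *)frame, sizeof(*frame)))
            goto badframe;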
2020-07-15  powerpc/signal_32: Simplify loop in PPC64 save_general_regs()  [Christophe Leroy, 1 file, -10/+8]
save_general_regs() does special handling when i == PT_SOFTE. Rewrite it to minimise the special-cased part; in particular, the __put_user() and its associated error handling are the same in both cases, so make them common. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Use a regular if rather than ternary operator] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/47a38df46cae5a5a88a558a64d71f75e9c4d9950.1594125164.git.christophe.leroy@csgroup.eu
2020-07-15  powerpc/signal_32: Remove !FULL_REGS() special handling in PPC64 save_general_regs()  [Christophe Leroy, 1 file, -2/+0]
Since commit 1bd79336a426 ("powerpc: Fix various syscall/signal/swapcontext bugs"), getting save_general_regs() called without FULL_REGS() is very unlikely and generates a warning. The 32-bit version of save_general_regs() doesn't take care of it at all and copies all registers anyway since that commit. Moreover, commit 965dd3ad3076 ("powerpc/64/syscall: Remove non-volatile GPR save optimisation") is another reason why it would never happen. So do the same on 64-bit: don't worry about FULL_REGS() and copy all registers all the time. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/173de3b659fa3a5f126a0eb170522cccd909950f.1594125164.git.christophe.leroy@csgroup.eu
2020-07-15  powerpc/64/signal: Balance return predictor stack in signal trampoline  [Nicholas Piggin, 2 files, -18/+17]
Returning from an interrupt or syscall to a signal handler currently begins execution directly at the handler's entry point, with LR set to the address of the sigreturn trampoline. When the signal handler function returns, it runs the trampoline. It looks like this:
    # interrupt at user address xyz
    # kernel stuff... signal is raised
    rfid
    # void handler(int sig)
    addis 2,12,.TOC.-.LCF0@ha
    addi 2,2,.TOC.-.LCF0@l
    mflr 0
    std 0,16(1)
    stdu 1,-96(1)
    # handler stuff
    ld 0,16(1)
    mtlr 0
    blr
    # __kernel_sigtramp_rt64
    addi r1,r1,__SIGNAL_FRAMESIZE
    li r0,__NR_rt_sigreturn
    sc
    # kernel executes rt_sigreturn
    rfid
    # back to user address xyz
Note the blr with no matching bl. This can corrupt the return predictor. Solve this by instead resuming execution at the signal trampoline, which then calls the signal handler. The qtrace-tools link_stack checker confirms the entire user/kernel/vdso cycle is balanced after this patch, whereas it's not upstream. Alan confirms the dwarf unwind info still looks good. gdb still recognises the signal frame and can step into parent frames if it breaks inside a signal handler. Performance is pretty noisy, not a very significant change on a POWER9 here, but branch misses are consistently a lot lower on a microbenchmark:
    Performance counter stats for './signal':
        13,085.72 msec task-clock        # 1.000 CPUs utilized
    45,024,760,101 cycles                # 3.441 GHz
    65,102,895,542 instructions          # 1.45 insn per cycle
    11,271,673,787 branches              # 861.372 M/sec
        59,468,979 branch-misses         # 0.53% of all branches

        12,989.09 msec task-clock        # 1.000 CPUs utilized
    44,692,719,559 cycles                # 3.441 GHz
    65,109,984,964 instructions          # 1.46 insn per cycle
    11,282,136,057 branches              # 868.585 M/sec
        39,786,942 branch-misses         # 0.35% of all branches
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200511101952.1463138-1-npiggin@gmail.com
2020-07-15  powerpc/cacheinfo: Add per cpu per index shared_cpu_list  [Srikar Dronamraju, 1 file, -1/+10]
Unlike drivers/base/cacheinfo, powerpc cacheinfo code is not exposing shared_cpu_list under /sys/devices/system/cpu/cpu<n>/cache/index<m>. Add shared_cpu_list to the per cpu, per index directory to maintain parity with x86. Some scripts (example: mmtests https://github.com/gormanm/mmtests) seem to be looking for shared_cpu_list instead of shared_cpu_map.
Before this patch:
    # ls /sys/devices/system/cpu0/cache/index1
    coherency_line_size  number_of_sets  size  ways_of_associativity
    level  shared_cpu_map  type
    # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map
    00ff
After this patch:
    # ls /sys/devices/system/cpu0/cache/index1
    coherency_line_size  number_of_sets  shared_cpu_map  type
    level  shared_cpu_list  size  ways_of_associativity
    # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map
    00ff
    # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_list
    0-7
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200629103703.4538-4-srikar@linux.vnet.ibm.com
2020-07-15  powerpc/cacheinfo: Make cpumap_show code reusable  [Srikar Dronamraju, 1 file, -2/+8]
In anticipation of implementing shared_cpu_list, move code under shared_cpu_map_show() to a common function. No functional changes. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200629103703.4538-3-srikar@linux.vnet.ibm.com
2020-07-15  powerpc/cacheinfo: Use cpumap_print to print cpumap  [Srikar Dronamraju, 1 file, -6/+2]
Tejun Heo had modified shared_cpu_map_show() to use scnprintf instead of cpumap_print while adding support for the *pb[l] format; refer to commit 0c118b7bd09a ("powerpc: use %*pb[l] to print bitmaps including cpumasks and nodemasks"). cpumap_print_to_pagebuf() is a standard function to print a cpumap. With commit 9cf79d115f0d ("bitmap: remove explicit newline handling using scnprintf format string"), there is no need to print an explicit newline and trailing null character, and cpumap_print_to_pagebuf() internally uses scnprintf(). Hence replace scnprintf() with cpumap_print_to_pagebuf(). Note: shared_cpu_map_show() in drivers/base/cacheinfo.c already uses cpumap_print_to_pagebuf().
Before this patch (notice the extra blank line):
    # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map
    00ff

    #
After this patch:
    # cat /sys/devices/system/cpu0/cache/index1/shared_cpu_map
    00ff
    #
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200629103703.4538-2-srikar@linux.vnet.ibm.com
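The resulting show routines essentially reduce to the standard helper; a sketch of the reusable pattern (the function name below is illustrative, not the patch's actual code):

    /* Hedged sketch: one helper serves both the map and list attributes. */
    static ssize_t show_shared_cpumap(char *buf, const struct cpumask *mask,
                                      bool list)
    {
            return cpumap_print_to_pagebuf(list, buf, mask);
    }

    /* shared_cpu_map_show()  -> show_shared_cpumap(buf, mask, false)
     * shared_cpu_list_show() -> show_shared_cpumap(buf, mask, true)  */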
2020-07-14  powerpc/pseries/svm: Fix incorrect check for shared_lppaca_size  [Satheesh Rajendran, 1 file, -1/+1]
Early secure guest boot hits the below crash while booting with vcpu numbers aligned with the page boundary for a PAGE size of 64k and an LPPACA size of 1k, i.e. 64, 128, etc.:
    Partition configured for 64 cpus.
    CPU maps initialized for 1 thread per core
    ------------[ cut here ]------------
    kernel BUG at arch/powerpc/kernel/paca.c:89!
    Oops: Exception in kernel mode, sig: 5 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
This is due to the BUG_ON() for shared_lppaca_total_size equal to shared_lppaca_size. Instead the code should only BUG_ON() if we have exceeded the total_size, which indicates we've overflowed the array. Fixes: bd104e6db6f0 ("powerpc/pseries/svm: Use shared memory for LPPACA structures") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> [mpe: Reword change log to clarify we're fixing not removing the check] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200619070113.16696-1-sathnaga@linux.vnet.ibm.com
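The fix amounts to loosening the bounds check by one; a sketch using the variable names from the message (treat the exact original condition as an assumption):

    /* Hedged sketch: BUG only when the shared area is actually exceeded. */
    BUG_ON(shared_lppaca_size > shared_lppaca_total_size);  /* was ">=" */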
2020-07-10  powerpc64: Break asm/percpu.h vs spinlock_types.h dependency  [Peter Zijlstra, 1 file, -0/+2]
In order to use <asm/percpu.h> in lockdep.h, we need to make sure asm/percpu.h does not itself depend on lockdep. The below seems to make that so and builds powerpc64-defconfig + PROVE_LOCKING. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> https://lkml.kernel.org/r/20200623083721.336906073@infradead.org
2020-07-08  powerpc/64s/exception: Fix 0x1500 interrupt handler crash  [Nicholas Piggin, 1 file, -1/+1]
A typo caused the interrupt handler to branch immediately to the common "unknown interrupt" handler and skip the special case test for denormal cause. This does not affect KVM softpatch handling (e.g., for POWER9 TM assist) because the KVM test was moved to common code by commit 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code") just before this bug was introduced. Fixes: 3f7fbd97d07d ("powerpc/64s/exception: Clean up SRR specifiers") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> [mpe: Split selftest into a separate patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200708074942.1713396-1-npiggin@gmail.com
2020-07-08  PCI: Use 'pci_channel_state_t' instead of 'enum pci_channel_state'  [Luc Van Oostenryck, 1 file, -1/+1]
The method struct pci_error_handlers.error_detected() is defined and documented as taking an 'enum pci_channel_state' for the second argument, but most drivers use 'pci_channel_state_t' instead. This 'pci_channel_state_t' is not a typedef for the enum but a typedef for a bitwise type in order to have better/stricter typechecking. Consolidate everything by using 'pci_channel_state_t' in the method's definition, in the related helpers and in the drivers. Enforce use of 'pci_channel_state_t' by replacing 'enum pci_channel_state' with an anonymous 'enum'. Note: Currently, from a typechecking point of view this patch changes nothing because only the constants defined by the enum are bitwise, not the enum itself (sparse doesn't have the notion of 'bitwise enum'). This may change in some not too far future, hence the patch. [bhelgaas: squash in https://lore.kernel.org/r/20200702162651.49526-3-luc.vanoostenryck@gmail.com https://lore.kernel.org/r/20200702162651.49526-4-luc.vanoostenryck@gmail.com] Link: https://lore.kernel.org/r/20200702162651.49526-2-luc.vanoostenryck@gmail.com Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
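For driver code, the callback then looks like this (a sketch; the my_* names are illustrative):

    /* Hedged sketch: an error_detected() callback using pci_channel_state_t. */
    static pci_ers_result_t my_error_detected(struct pci_dev *pdev,
                                              pci_channel_state_t state)
    {
            if (state == pci_channel_io_perm_failure)
                    return PCI_ERS_RESULT_DISCONNECT;
            return PCI_ERS_RESULT_NEED_RESET;
    }

    static const struct pci_error_handlers my_err_handlers = {
            .error_detected = my_error_detected,
    };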
2020-07-07  kbuild: remove cc-option test of -fno-stack-protector  [Masahiro Yamada, 1 file, -1/+1]
Some Makefiles already pass -fno-stack-protector unconditionally. For example, arch/arm64/kernel/vdso/Makefile, arch/x86/xen/Makefile. No problem report so far about hard-coding this option. So, we can assume all supported compilers know -fno-stack-protector. GCC 4.8 and Clang support this option (https://godbolt.org/z/_HDGzN) Get rid of cc-option from -fno-stack-protector. Remove CONFIG_CC_HAS_STACKPROTECTOR_NONE, which is always 'y'. Note: arch/mips/vdso/Makefile adds -fno-stack-protector twice, first unconditionally, and second conditionally. I removed the second one. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
2020-07-05  arch: rename copy_thread_tls() back to copy_thread()  [Christian Brauner, 1 file, -1/+1]
Now that HAVE_COPY_THREAD_TLS has been removed, rename copy_thread_tls() back to simply copy_thread(). It's a simpler name, and doesn't imply that only tls is copied here. This finishes an outstanding chunk of internal process creation work since we've added clone3(). Cc: linux-arch@vger.kernel.org Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Acked-by: Stafford Horne <shorne@gmail.com> Acked-by: Greentime Hu <green.hu@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-06-22  powerpc/pseries/svm: Drop unused align argument in alloc_shared_lppaca() function  [Satheesh Rajendran, 1 file, -3/+10]
The "align" argument of alloc_shared_lppaca() was unused inside the function. Drop it and update the code comment about page alignment. Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> [mpe: Massage comment wording/formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612142953.135408-1-sathnaga@linux.vnet.ibm.com
2020-06-22  powerpc/dt_cpu_ftrs: Make use of macro ISA_V3_1  [Murilo Opsfelder Araujo, 1 file, -1/+1]
Macro ISA_V3_1 was defined but never used. Use it instead of literal. Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200610215114.167544-4-muriloo@linux.ibm.com
2020-06-22  powerpc/dt_cpu_ftrs: Make use of macro ISA_V3_0B  [Murilo Opsfelder Araujo, 1 file, -1/+1]
Macro ISA_V3_0B was defined but never used. Use it instead of literal. Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200610215114.167544-3-muriloo@linux.ibm.com
2020-06-22  powerpc/dt_cpu_ftrs: Remove unused macro ISA_V2_07B  [Murilo Opsfelder Araujo, 1 file, -1/+0]
Macro ISA_V2_07B is defined but not used anywhere else in the code. Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200610215114.167544-2-muriloo@linux.ibm.com
2020-06-22  powerpc/64: indirect function call use bctrl rather than blrl in ret_from_kernel_thread  [Nicholas Piggin, 1 file, -2/+2]
blrl is not recommended for use as an indirect function call, as it may corrupt the link stack predictor. This is not a performance critical path, but it should be fixed for consistency. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611121119.1015740-1-npiggin@gmail.com
2020-06-21  Merge tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  [Linus Torvalds, 2 files, -11/+13]
Pull powerpc fixes from Michael Ellerman:
 - One fix for the interrupt rework we did last release which broke KVM-PR
 - Three commits fixing some fallout from the READ_ONCE() changes interacting badly with our 8xx 16K pages support, which uses a pte_t that is a structure of 4 actual PTEs
 - A cleanup of the 8xx pte_update() to use the newly added pmd_off()
 - A fix for a crash when handling an oops if CONFIG_DEBUG_VIRTUAL is enabled
 - A minor fix for the SPU syscall generation
Thanks to Aneesh Kumar K.V, Christian Zigotzky, Christophe Leroy, Mike Rapoport, Nicholas Piggin.
* tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/8xx: Provide ptep_get() with 16k pages
  mm: Allow arches to provide ptep_get()
  mm/gup: Use huge_ptep_get() in gup_hugepte()
  powerpc/syscalls: Use the number when building SPU syscall table
  powerpc/8xx: use pmd_off() to access a PMD entry in pte_update()
  powerpc/64s: Fix KVM interrupt using wrong save area
  powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL
2020-06-18  maccess: make get_kernel_nofault() check for minimal type compatibility  [Linus Torvalds, 1 file, -1/+1]
Now that we've renamed probe_kernel_address() to get_kernel_nofault() and made it look and behave more in line with get_user(), some of the subtle type behavior differences end up being more obvious and possibly dangerous. When you do
    get_user(val, user_ptr);
the type of the access comes from the "user_ptr" part, and the above basically acts as
    val = *user_ptr;
by design (except, of course, for the fact that the actual dereference is done with a user access). Note how in the above case, the type of the end result comes from the pointer argument, and then the value is cast to the type of 'val' as part of the assignment. So the type of the pointer is ultimately the more important type for the access itself. But 'get_kernel_nofault()' may now _look_ similar, but it behaves very differently. When you do
    get_kernel_nofault(val, kernel_ptr);
it behaves like
    val = *(typeof(val) *)kernel_ptr;
except, of course, for the fact that the actual dereference is done with exception handling so that a faulting access is suppressed and returned as the error code. But note how different the casting behavior of the two superficially similar accesses are: one does the actual access in the size of the type the pointer points to, while the other does the access in the size of the target, and ignores the pointer type entirely. Actually changing get_kernel_nofault() to act like get_user() is almost certainly the right thing to do eventually, but in the meantime this patch adds logic to at least verify that the pointer type is compatible with the type of the result. In many cases, this involves just casting the pointer to 'void *' to make it obvious that the type of the pointer is not the important part. It's not how 'get_user()' acts, but at least the behavioral difference is now obvious and explicit. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
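In caller code the pattern typically looks like this (a sketch; 'addr' is an arbitrary kernel address):

    /* Hedged sketch: the access size/type comes from 'val', so many call
     * sites simply pass the pointer as 'void *'. */
    unsigned long val;

    if (get_kernel_nofault(val, (void *)addr))
            return -EFAULT;   /* a faulting access is reported, not oopsed */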
2020-06-18  maccess: rename probe_kernel_address to get_kernel_nofault  [Christoph Hellwig, 3 files, -3/+3]
Better describe what this helper does, and match the naming of copy_from_kernel_nofault. Also switch the argument order around, so that it acts and looks like get_user(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>