summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)AuthorFilesLines
2020-07-20powerpc/mce: Add MCE notification chainSantosh Sivaraj2-0/+17
Introduce notification chain which lets us know about uncorrected memory errors(UE). This would help prospective users in pmem or nvdimm subsystem to track bad blocks for better handling of persistent memory allocations. Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709135142.721504-1-santosh@fossix.org
2020-07-20powerpc/mm/radix: Create separate mappings for hot-plugged memoryAneesh Kumar K.V3-12/+94
To enable memory unplug without splitting kernel page table mapping, we force the max mapping size to the LMB size. LMB size is the unit in which hypervisor will do memory add/remove operation. Pseries systems supports max LMB size of 256MB. Hence on pseries, we now end up mapping memory with 2M page size instead of 1G. To improve that we want hypervisor to hint the kernel about the hotplug memory range. That was added that as part of commit b6eca183e23e ("powerpc/kernel: Enables memory hot-remove after reboot on pseries guests") But PowerVM doesn't provide that hint yet. Once we get PowerVM updated, we can then force the 2M mapping only to hot-pluggable memory region using memblock_is_hotpluggable(). Till then let's depend on LMB size for finding the mapping page size for linear range. With this change KVM guest will also be doing linear mapping with 2M page size. The actual TLB benefit of mapping guest page table entries with hugepage size can only be materialized if the partition scoped entries are also using the same or higher page size. A guest using 1G hugetlbfs backing guest memory can have a performance impact with the above change. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Fold in fix from Aneesh spotted by lkp@intel.com] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709131925.922266-5-aneesh.kumar@linux.ibm.com
2020-07-20powerpc/mm/radix: Remove split_kernel_mapping()Bharata B Rao1-76/+19
We split the page table mapping on memory unplug if the linear range was mapped with huge page mapping (for ex: 1G) The page table splitting code has a few issues: 1. Recursive locking -------------------- Memory unplug path takes cpu_hotplug_lock and calls stop_machine() for splitting the mappings. However stop_machine() takes cpu_hotplug_lock again causing deadlock. 2. BUG: sleeping function called from in_atomic() context --------------------------------------------------------- Memory unplug path (remove_pagetable) takes init_mm.page_table_lock spinlock and later calls stop_machine() which does wait_for_completion() 3. Bad unlock unbalance ----------------------- Memory unplug path takes init_mm.page_table_lock spinlock and calls stop_machine(). The stop_machine thread function runs in a different thread context (migration thread) which tries to release and reaquire ptl. Releasing ptl from a different thread than which acquired it causes bad unlock unbalance. These problems can be avoided if we avoid mapping hot-plugged memory with 1G mapping, thereby removing the need for splitting them during unplug. The kernel always make sure the minimum unplug request is SUBSECTION_SIZE for device memory and SECTION_SIZE for regular memory. In preparation for such a change remove page table splitting support. This essentially is a revert of commit 4dd5f8a99e791 ("powerpc/mm/radix: Split linear mapping on hot-unplug") Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709131925.922266-4-aneesh.kumar@linux.ibm.com
2020-07-20powerpc/mm/radix: Free PUD table when freeing pagetableBharata B Rao1-0/+16
remove_pagetable() isn't freeing PUD table. This causes memory leak during memory unplug. Fix this. Fixes: 4b5d62ca17a1 ("powerpc/mm: add radix__remove_section_mapping()") Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709131925.922266-3-aneesh.kumar@linux.ibm.com
2020-07-20powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappingsAneesh Kumar K.V4-6/+33
We can hit the following BUG_ON during memory unplug: kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:342! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries NIP [c000000000093308] pmd_fragment_free+0x48/0xc0 LR [c00000000147bfec] remove_pagetable+0x578/0x60c Call Trace: 0xc000008050000000 (unreliable) remove_pagetable+0x384/0x60c radix__remove_section_mapping+0x18/0x2c remove_section_mapping+0x1c/0x3c arch_remove_memory+0x11c/0x180 try_remove_memory+0x120/0x1b0 __remove_memory+0x20/0x40 dlpar_remove_lmb+0xc0/0x114 dlpar_memory+0x8b0/0xb20 handle_dlpar_errorlog+0xc0/0x190 pseries_hp_work_fn+0x2c/0x60 process_one_work+0x30c/0x810 worker_thread+0x98/0x540 kthread+0x1c4/0x1d0 ret_from_kernel_thread+0x5c/0x74 This occurs when unplug is attempted for such memory which has been mapped using memblock pages as part of early kernel page table setup. We wouldn't have initialized the PMD or PTE fragment count for those PMD or PTE pages. This can be fixed by allocating memory in PAGE_SIZE granularity during early page table allocation. This makes sure a specific page is not shared for another memblock allocation and we can free them correctly on removing page-table pages. Since we now do PAGE_SIZE allocations for both PUD table and PMD table (Note that PTE table allocation is already of PAGE_SIZE), we end up allocating more memory for the same amount of system RAM. Here is a comparision of how much more we need for a 64T and 2G system after this patch: 1. 64T system ------------- 64T RAM would need 64G for vmemmap with struct page size being 64B. 128 PUD tables for 64T memory (1G mappings) 1 PUD table and 64 PMD tables for 64G vmemmap (2M mappings) With default PUD[PMD]_TABLE_SIZE(4K), (128+1+64)*4K=772K With PAGE_SIZE(64K) table allocations, (128+1+64)*64K=12352K 2. 2G system ------------ 2G RAM would need 2M for vmemmap with struct page size being 64B. 1 PUD table for 2G memory (1G mapping) 1 PUD table and 1 PMD table for 2M vmemmap (2M mappings) With default PUD[PMD]_TABLE_SIZE(4K), (1+1+1)*4K=12K With new PAGE_SIZE(64K) table allocations, (1+1+1)*64K=192K Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709131925.922266-2-aneesh.kumar@linux.ibm.com
2020-07-20powerpc/prom: Enable Radix GTSE in cpu pa-featuresNicholas Piggin1-1/+1
When '029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.")' made GTSE an MMU feature, it was enabled by default in powerpc-cpu-features but was missed in pa-features. This causes random memory corruption during boot of PowerNV kernels where CONFIG_PPC_DT_CPU_FTRS isn't enabled. Fixes: 029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.") Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> [mpe: Unwrap long line] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200720044258.863574-1-bharata@linux.ibm.com
2020-07-20net: remove compat_sys_{get,set}sockoptChristoph Hellwig1-2/+2
Now that the ->compat_{get,set}sockopt proto_ops methods are gone there is no good reason left to keep the compat syscalls separate. This fixes the odd use of unsigned int for the compat_setsockopt optlen and the missing sock_use_custom_sol_socket. It would also easily allow running the eBPF hooks for the compat syscalls, but such a large change in behavior does not belong into a consolidation patch like this one. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19powerpc: use the generic dma_ops_bypass modeChristoph Hellwig3-86/+10
Use the DMA API bypass mechanism for direct window mappings. This uses common code and speed up the direct mapping case by avoiding indirect calls just when not using dma ops at all. It also fixes a problem where the sync_* methods were using the bypass check for DMA allocations, but those are part of the streaming ops. Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which has never been well defined, as is only used by a few drivers, which IIRC never showed up in the typical Cell blade setups that are affected by the ordering workaround. Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB") Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-07-19dma-mapping: make support for dma ops optionalChristoph Hellwig1-0/+1
Avoid the overhead of the dma ops support for tiny builds that only use the direct mapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-07-18Merge branch 'fixes' into nextMichael Ellerman7-12/+15
Merge our fixes branch, primarily to bring in the ebb selftests build fix and the pkey fix, which is a dependency for some future work.
2020-07-16treewide: Remove uninitialized_var() usageKees Cook3-3/+3
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16KVM: PPC: Book3S PR: Remove uninitialized_var() usageKees Cook1-3/+0
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. As a precursor to removing[2] this[3] macro[4], just remove this variable since it was actually unused: arch/powerpc/kvm/book3s_pr.c:1832:16: warning: unused variable 'vrsave' [-Wunused-variable] unsigned long vrsave; ^ [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Fixes: f05ed4d56e9c ("KVM: PPC: Split out code from book3s.c into book3s_pr.c") Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16powerpc/pseries: Detect secure and trusted boot state of the system.Nayna Jain1-2/+16
The device-tree properties to check secure and trusted boot state are different for guests (pseries) compared to baremetal (powernv). This patch updates the existing is_ppc_secureboot_enabled() and is_ppc_trustedboot_enabled() functions to add support for pseries. For pseries the secureboot and trustedboot state are exposed via device-tree properties /ibm,secure-boot and /ibm,trusted-boot. The values of ibm,secure-boot under pseries are interpreted as: 0 - Disabled 1 - Enabled in Log-only mode. This patch interprets this value as disabled, since audit mode is currently not supported for Linux. 2 - Enabled and enforced. 3-9 - Enabled and enforcing; requirements are at the discretion of the operating system. The values of ibm,trusted-boot under pseries are interpreted as: 0 - Disabled 1 - Enabled Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> [mpe: Drop machdep.h inclusion, tweak change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594813921-12425-1-git-send-email-nayna@linux.ibm.com
2020-07-16powerpc/vdso: Fix vdso cpu truncationMilton Miller1-1/+1
The code in vdso_cpu_init that exposes the cpu and numa node to userspace via SPRG_VDSO incorrctly masks the cpu to 12 bits. This means that any kernel running on a box with more than 4096 threads (NR_CPUS advertises a limit of of 8192 cpus) would expose userspace to two cpu contexts running at the same time with the same cpu number. Note: I'm not aware of any distro shipping a kernel with support for more than 4096 threads today, nor of any system image that currently exceeds 4096 threads. Found via code browsing. Fixes: 18ad51dd342a7eb09dbcd059d0b451b616d4dafc ("powerpc: Add VDSO version of getcpu") Signed-off-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Anton Blanchard <anton@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200715233704.1352257-1-anton@ozlabs.org
2020-07-16powerpc/perf: Add kernel support for new MSR[HV PR] bits in trace-imcAnju T Sudhakar2-5/+29
IMC trace-mode record has MSR[HV PR] bits added in the third DW. These bits can be used to set the cpumode for the instruction pointer captured in each sample. Add support in kernel to use these bits to set the cpumode for each sample. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200713144623.508695-1-maddy@linux.ibm.com
2020-07-16powerpc/Kconfig: Replace HTTP links with HTTPS onesAlexander A. Klimov1-1/+1
Rationale: Reduces attack surface on kernel devs opening the links for MITM as HTTPS traffic is much harder to manipulate. Deterministic algorithm: For each file: If not .svg: For each line: If doesn't contain `\bxmlns\b`: For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`: If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`: If both the HTTP and HTTPS versions return 200 OK and serve the same content: Replace HTTP with HTTPS. Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200713192656.37443-1-grandmaster@al2klimov.de
2020-07-16pseries: Fix 64 bit logical memory block panicAnton Blanchard1-1/+1
Booting with a 4GB LMB size causes us to panic: qemu-system-ppc64: OS terminated: OS panic: Memory block size not suitable: 0x0 Fix pseries_memory_block_size() to handle 64 bit LMBs. Cc: stable@vger.kernel.org Signed-off-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200715000820.1255764-1-anton@ozlabs.org
2020-07-16powerpc/xive: Remove unused inline function xive_kexec_teardown_cpu()YueHaibing1-1/+0
commit e27e0a94651e ("powerpc/xive: Remove xive_kexec_teardown_cpu()") left behind this, remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200715025040.33952-1-yuehaibing@huawei.com
2020-07-16powerpc/fadump: fix race between pstore write and fadump crash triggerSourabh Jain1-0/+24
When we enter into fadump crash path via system reset we fail to update the pstore. On the system reset path we first update the pstore then we go for fadump crash. But the problem here is when all the CPUs try to get the pstore lock to initiate the pstore write, only one CPUs will acquire the lock and proceed with the pstore write. Since it in NMI context CPUs that fail to get lock do not wait for their turn to write to the pstore and simply proceed with the next operation which is fadump crash. One of the CPU who proceeded with fadump crash path triggers the crash and does not wait for the CPU who gets the pstore lock to complete the pstore update. Timeline diagram to depicts the sequence of events that leads to an unsuccessful pstore update when we hit fadump crash path via system reset. 1 2 3 ... n CPU Threads | | | | | | | | Reached to -->|--->|---->| ----------->| system reset | | | | path | | | | | | | | Try to -->|--->|---->|------------>| acquire the | | | | pstore lock | | | | | | | | | | | | Got the -->| +->| | |<-+ pstore lock | | | | | |--> Didn't get the | --------------------------+ lock and moving | | | | ahead on fadump | | | | crash path | | | | Begins the -->| | | | process to | | | |<-- Got the chance to update the | | | | trigger the crash pstore | -> | | ... <- | | | | | | | | | | | | |<-- Triggers the | | | | | | crash | | | | | | ^ | | | | | | | Writing to -->| | | | | | | pstore | | | | | | | | | | ^ |__________________| | | CPU Relax | | | +-----------------------------------------+ | v Race: crash triggered before pstore update completes To avoid this race condition a barrier is added on crash_fadump path, it prevents the CPU to trigger the crash until all the online CPUs completes their task. A barrier is added to make sure all the secondary CPUs hit the crash_fadump function before we initiates the crash. A timeout is kept to ensure the primary CPU (one who initiates the crash) do not wait for secondary CPUs indefinitely. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200713052435.183750-1-sourabhjain@linux.ibm.com
2020-07-16powerpc: Add cputime_to_nsecs()Anton Blanchard1-0/+2
Generic code has a wrapper to implement cputime_to_nsecs() on top of cputime_to_usecs() but we can easily return the full nanosecond resolution directly. Signed-off-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200713083601.1103978-1-anton@ozlabs.org
2020-07-16powerpc/ppc-opcode: Fold PPC_INST_* macros into PPC_RAW_* macrosBalamuruhan S1-265/+129
Lots of PPC_INST_* macros are used only ever in PPC_* macros, fold those PPC_INST_* into PPC_RAW_* to avoid using PPC_INST_* accidentally. Signed-off-by: Balamuruhan S <bala24@linux.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Deal with PHWSYNC, PLWSYNC] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200624113038.908074-7-bala24@linux.ibm.com
2020-07-16powerpc/ppc-opcode: Reuse raw instruction macros to stringifyBalamuruhan S1-158/+75
Wrap existing stringify macros to reuse raw instruction encoding macros that are newly added. Signed-off-by: Balamuruhan S <bala24@linux.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Add DCBFPS, DCBSTPS, PHWSYNC, PLWSYNC] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200624113038.908074-6-bala24@linux.ibm.com
2020-07-16powerpc/ppc-opcode: Consolidate powerpc instructions from bpf_jit.hBalamuruhan S6-373/+324
Move macro definitions of powerpc instructions from bpf_jit.h to ppc-opcode.h and adopt the users of the macros accordingly. `PPC_MR()` is defined twice in bpf_jit.h, remove the duplicate one. Signed-off-by: Balamuruhan S <bala24@linux.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200624113038.908074-5-bala24@linux.ibm.com
2020-07-16powerpc/bpf_jit: Reuse instruction macros from ppc-opcode.hBalamuruhan S5-35/+19
Remove duplicate macro definitions from bpf_jit.h and reuse the macros from ppc-opcode.h Signed-off-by: Balamuruhan S <bala24@linux.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200624113038.908074-4-bala24@linux.ibm.com
2020-07-16powerpc/ppc-opcode: Move ppc instruction encoding from test_emulate_stepBalamuruhan S2-99/+74
Few ppc instructions are encoded in test_emulate_step.c, consolidate them and use it from ppc-opcode.h Signed-off-by: Balamuruhan S <bala24@linux.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200624113038.908074-3-bala24@linux.ibm.com
2020-07-16powerpc/ppc-opcode: Introduce PPC_RAW_* macros for base instruction encodingBalamuruhan S1-7/+86
Introduce PPC_RAW_* macros to have all the bare encoding of ppc instructions. Move `VSX_XX*()` and `TMRN()` macros up to reuse it. Signed-off-by: Balamuruhan S <bala24@linux.ibm.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Add DCBFPS, DCBSTPS, PHWSYNC, PLWSYNC] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200624113038.908074-2-bala24@linux.ibm.com
2020-07-16powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumaskKajol Jain1-0/+8
Patch here adds a cpumask attr to hv_24x7 pmu along with ABI documentation. Primary use to expose the cpumask is for the perf tool which has the capability to parse the driver sysfs folder and understand the cpumask file. Having cpumask file will reduce the number of perf command line parameters (will avoid "-C" option in the perf tool command line). It can also notify the user which is the current cpu used to retrieve the counter data. command:# cat /sys/devices/hv_24x7/interface/cpumask 0 Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709051836.723765-3-kjain@linux.ibm.com
2020-07-16powerpc/perf/hv-24x7: Add cpu hotplug supportKajol Jain1-0/+46
Patch here adds cpu hotplug functions to hv_24x7 pmu. A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum is added. The online callback function updates the cpumask only if its empty. As the primary intention of adding hotplug support is to designate a CPU to make HCALL to collect the counter data. The offline function test and clear corresponding cpu in a cpumask and update cpumask to any other active cpu. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709051836.723765-2-kjain@linux.ibm.com
2020-07-16powerpc/pseries: remove obsolete memory hotplug DT notifier codeNathan Lynch1-64/+1
pseries_update_drconf_memory() runs from a DT notifier in response to an update to the ibm,dynamic-memory property of the /ibm,dynamic-reconfiguration-memory node. This property is an older less compact format than the ibm,dynamic-memory-v2 property used in most currently supported firmwares. There has never been an equivalent function for the v2 property. pseries_update_drconf_memory() compares the 'assigned' flag for each LMB in the old vs new properties and adds or removes the block accordingly. However it appears to be of no actual utility: * Partition suspension and PRRNs are specified only to change LMBs' NUMA affinity information. This notifier should be a no-op for those scenarios since the assigned flags should not change. * The memory hotplug/DLPAR path has a hack which short-circuits execution of the notifier: dlpar_memory() ... rtas_hp_event = true; drmem_update_dt() of_update_property() pseries_memory_notifier() pseries_update_drconf_memory() if (rtas_hp_event) return; So this code only makes sense as a relic of the time when more of the DLPAR workflow took place in user space. I don't see a purpose for it now. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-19-nathanl@linux.ibm.com
2020-07-16powerpc/pseries: remove dlpar_cpu_readd()Nathan Lynch2-20/+0
dlpar_cpu_readd() is unused now. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-18-nathanl@linux.ibm.com
2020-07-16powerpc/pseries: remove memory "re-add" implementationNathan Lynch2-43/+0
dlpar_memory() no longer has any callers which pass PSERIES_HP_ELOG_ACTION_READD. Remove this case and the corresponding unreachable code. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-17-nathanl@linux.ibm.com
2020-07-16powerpc/pseries: remove prrn special case from DT update pathNathan Lynch1-27/+0
pseries_devicetree_update() is no longer called with PRRN_SCOPE. The purpose of prrn_update_node() was to remove and then add back a LMB whose NUMA assignment had changed. This has never been reliable, and this codepath has been default-disabled for several releases. Remove prrn_update_node(). Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-16-nathanl@linux.ibm.com
2020-07-16powerpc/numa: remove arch_update_cpu_topologyNathan Lynch2-16/+0
Since arch_update_cpu_topology() doesn't do anything on powerpc now, remove it and associated dead code. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-15-nathanl@linux.ibm.com
2020-07-16powerpc/numa: remove prrn_is_enabled()Nathan Lynch2-10/+0
All users of this prrn_is_enabled() are gone; remove it. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-14-nathanl@linux.ibm.com
2020-07-16powerpc/rtasd: simplify handle_rtas_event(), emit message on eventsNathan Lynch1-25/+3
prrn_is_enabled() always returns false/0, so handle_rtas_event() can be simplified and some dead code can be removed. Use machine_is() instead of #ifdef to run this code only on pseries, and add an informational ratelimited message that we are ignoring the events. PRRN events are relatively rare in normal operation and usually arise from operator-initiated actions such as a DPO (Dynamic Platform Optimizer) run. Eventually we do want to consume these events and update the device tree, but that needs more care to be safe vs LPM and DLPAR. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-13-nathanl@linux.ibm.com
2020-07-16powerpc/numa: remove start/stop_topology_update()Nathan Lynch4-38/+1
These APIs have become no-ops, so remove them and all call sites. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-12-nathanl@linux.ibm.com
2020-07-16powerpc/numa: remove timed_topology_update()Nathan Lynch3-16/+0
timed_topology_update is a no-op now, so remove it and all call sites. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-11-nathanl@linux.ibm.com
2020-07-16powerpc/numa: stub out numa_update_cpu_topology()Nathan Lynch1-192/+1
Previous changes have removed the code which sets bits in cpu_associativity_changes_mask and thus it is never modifed at runtime. From this we can reason that numa_update_cpu_topology() always returns 0 without doing anything. Remove the body of numa_update_cpu_topology() and remove all code which becomes unreachable as a result. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-10-nathanl@linux.ibm.com
2020-07-16powerpc/numa: remove vphn_enabled and prrn_enabled internal flagsNathan Lynch1-4/+2
These flags are always zero now; remove them and suitably adjust the remaining references to them. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-9-nathanl@linux.ibm.com
2020-07-16powerpc/numa: remove unreachable topology workqueue codeNathan Lynch1-14/+0
Since vphn_enabled is always 0, we can remove the call to topology_schedule_update() and remove the code which becomes unreachable as a result. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-8-nathanl@linux.ibm.com
2020-07-16powerpc/numa: remove unreachable topology timer codeNathan Lynch1-21/+0
Since vphn_enabled is always 0, we can stub out timed_topology_update() and remove the code which becomes unreachable. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-7-nathanl@linux.ibm.com
2020-07-16powerpc/numa: make vphn_enabled, prrn_enabled flags constNathan Lynch1-2/+2
Previous changes have made it so these flags are never changed; enforce this by making them const. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-6-nathanl@linux.ibm.com
2020-07-16powerpc/numa: remove unreachable topology update codeNathan Lynch1-147/+2
Since the topology_updates_enabled flag is now always false, remove it and the code which has become unreachable. This is the minimum change that prevents 'defined but unused' warnings emitted by the compiler after stubbing out the start/stop_topology_updates() functions. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-5-nathanl@linux.ibm.com
2020-07-16powerpc/numa: remove ability to enable topology updatesNathan Lynch1-70/+1
Remove the /proc/powerpc/topology_updates interface and the topology_updates=on/off command line argument. The internal topology_updates_enabled flag remains for now, but always false. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-4-nathanl@linux.ibm.com
2020-07-16powerpc/rtas: don't online CPUs for partition suspendNathan Lynch3-143/+3
Partition suspension, used for hibernation and migration, requires that the OS place all but one of the LPAR's processor threads into one of two states prior to calling the ibm,suspend-me RTAS function: * the architected offline state (via RTAS stop-self); or * the H_JOIN hcall, which does not return until the partition resumes execution Using H_CEDE as the offline mode, introduced by commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state"), means that any threads which are offline from Linux's point of view must be moved to one of those two states before a partition suspension can proceed. This was eventually addressed in commit 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation"), which added code to temporarily bring up any offline processor threads so they can call H_JOIN. Conceptually this is fine, but the implementation has had multiple races with cpu hotplug operations initiated from user space[1][2][3], the error handling is fragile, and it generates user-visible cpu hotplug events which is a lot of noise for a platform feature that's supposed to minimize disruption to workloads. With commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state") reverted, this code becomes unnecessary, so remove it. Since any offline CPUs now are truly offline from the platform's point of view, it is no longer necessary to bring up CPUs only to have them call H_JOIN and then go offline again upon resuming. Only active threads are required to call H_JOIN; stopped threads can be left alone. [1] commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and serialization during LPM") [2] commit 9fb603050ffd ("powerpc/rtas: retry when cpu offline races with suspend/migration") [3] commit dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration") Fixes: 120496ac2d2d ("powerpc: Bring all threads online prior to migration/hibernation") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-3-nathanl@linux.ibm.com
2020-07-16powerpc/pseries: remove cede offline state for CPUsNathan Lynch4-222/+15
This effectively reverts commit 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state"), which added an offline mode for CPUs which uses the H_CEDE hcall instead of the architected stop-self RTAS function in order to facilitate "folding" of dedicated mode processors on PowerVM platforms to achieve energy savings. This has been the default offline mode since its introduction. There's nothing about stop-self that would prevent the hypervisor from achieving the energy savings available via H_CEDE, so the original premise of this change appears to be flawed. I also have encountered the claim that the transition to and from ceded state is much faster than stop-self/start-cpu. Certainly we would not want to use stop-self as an *idle* mode. That is what H_CEDE is for. However, this difference is insignificant in the context of Linux CPU hotplug, where the latency of an offline or online operation on current systems is on the order of 100ms, mainly attributable to all the various subsystems' cpuhp callbacks. The cede offline mode also prevents accurate accounting, as discussed before: https://lore.kernel.org/linuxppc-dev/1571740391-3251-1-git-send-email-ego@linux.vnet.ibm.com/ Unconditionally use stop-self to offline processor threads. This is the architected method for offlining CPUs on PAPR systems. The "cede_offline" boot parameter is rendered obsolete. Removing this code enables the removal of the partition suspend code which temporarily onlines all present CPUs. Fixes: 3aa565f53c39 ("powerpc/pseries: Add hooks to put the CPU into an appropriate offline state") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200612051238.1007764-2-nathanl@linux.ibm.com
2020-07-16powerpc/security: Allow for processors that flush the link stack using the ↵Nicholas Piggin2-8/+21
special bcctr If both count cache and link stack are to be flushed, and can be flushed with the special bcctr, patch that in directly to the flush/branch nop site. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-7-npiggin@gmail.com
2020-07-16powerpc/64s: Move branch cache flushing bcctr variant to ppc-ops.hNicholas Piggin2-4/+4
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-6-npiggin@gmail.com
2020-07-16powerpc/security: split branch cache flush toggle from code patchingNicholas Piggin1-43/+51
Branch cache flushing code patching has inter-dependencies on both the link stack and the count cache flushing state. To make the code clearer and to separate the link stack and count cache handling, split the "toggle" (setting up variables and printing enable/disable) from the code patching. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Always print something, even if the flush is disabled] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-5-npiggin@gmail.com
2020-07-16powerpc/security: make display of branch cache flush more consistentNicholas Piggin1-4/+4
Make the count-cache and link-stack messages look the same Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200609070610.846703-4-npiggin@gmail.com