summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-07-27powerpc: switch to ->regset_get()Al Viro7-304/+148
Note: compat variant of REGSET_TM_CGPR is almost certainly wrong; it claims to be 48*64bit, but just as compat REGSET_GPR it stores 44*32bit of (truncated) registers + 4 32bit zeros... followed by 48 more 32bit zeroes. Might be too late to change - it's a userland ABI, after all ;-/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-07-27powerpc/fadump: Fix build error with CONFIG_PRESERVE_FA_DUMP=yMichael Ellerman1-1/+3
skiroot_defconfig fails: arch/powerpc/kernel/fadump.c:48:17: error: ‘cpus_in_fadump’ defined but not used 48 | static atomic_t cpus_in_fadump; Fix it by moving the definition into the #ifdef where it's used. Fixes: ba608c4fa12c ("powerpc/fadump: fix race between pstore write and fadump crash trigger") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727070341.595634-1-mpe@ellerman.id.au
2020-07-27powerpc/64s/hash: Fix hash_preload running with interrupts enabledNicholas Piggin1-3/+11
Commit 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") removed the local_irq_disable from hash_preload, but it was required for more than just the page table walk: the hash pte busy bit is effectively a lock which may be taken in interrupt context, and the local update flag test must not be preempted before it's used. This solves apparent lockups with perf interrupting __hash_page_64K. If get_perf_callchain then also takes a hash fault on the same page while it is already locked, it will loop forever taking hash faults, which looks like this: cpu 0x49e: Vector: 100 (System Reset) at [c00000001a4f7d70] pc: c000000000072dc8: hash_page_mm+0x8/0x800 lr: c00000000000c5a4: do_hash_page+0x24/0x38 sp: c0002ac1cc69ac70 msr: 8000000000081033 current = 0xc0002ac1cc602e00 paca = 0xc00000001de1f280 irqmask: 0x03 irq_happened: 0x01 pid = 20118, comm = pread2_processe Linux version 5.8.0-rc6-00345-g1fad14f18bc6 49e:mon> t [c0002ac1cc69ac70] c00000000000c5a4 do_hash_page+0x24/0x38 (unreliable) --- Exception: 300 (Data Access) at c00000000008fa60 __copy_tofrom_user_power7+0x20c/0x7ac [link register ] c000000000335d10 copy_from_user_nofault+0xf0/0x150 [c0002ac1cc69af70] c00032bf9fa3c880 (unreliable) [c0002ac1cc69afa0] c000000000109df0 read_user_stack_64+0x70/0xf0 [c0002ac1cc69afd0] c000000000109fcc perf_callchain_user_64+0x15c/0x410 [c0002ac1cc69b060] c000000000109c00 perf_callchain_user+0x20/0x40 [c0002ac1cc69b080] c00000000031c6cc get_perf_callchain+0x25c/0x360 [c0002ac1cc69b120] c000000000316b50 perf_callchain+0x70/0xa0 [c0002ac1cc69b140] c000000000316ddc perf_prepare_sample+0x25c/0x790 [c0002ac1cc69b1a0] c000000000317350 perf_event_output_forward+0x40/0xb0 [c0002ac1cc69b220] c000000000306138 __perf_event_overflow+0x88/0x1a0 [c0002ac1cc69b270] c00000000010cf70 record_and_restart+0x230/0x750 [c0002ac1cc69b620] c00000000010d69c perf_event_interrupt+0x20c/0x510 [c0002ac1cc69b730] c000000000027d9c performance_monitor_exception+0x4c/0x60 [c0002ac1cc69b750] c00000000000b2f8 performance_monitor_common_virt+0x1b8/0x1c0 --- Exception: f00 (Performance Monitor) at c0000000000cb5b0 pSeries_lpar_hpte_insert+0x0/0x160 [link register ] c0000000000846f0 __hash_page_64K+0x210/0x540 [c0002ac1cc69ba50] 0000000000000000 (unreliable) [c0002ac1cc69bb00] c000000000073ae0 update_mmu_cache+0x390/0x3a0 [c0002ac1cc69bb70] c00000000037f024 wp_page_copy+0x364/0xce0 [c0002ac1cc69bc20] c00000000038272c do_wp_page+0xdc/0xa60 [c0002ac1cc69bc70] c0000000003857bc handle_mm_fault+0xb9c/0x1b60 [c0002ac1cc69bd50] c00000000006c434 __do_page_fault+0x314/0xc90 [c0002ac1cc69be20] c00000000000c5c8 handle_page_fault+0x10/0x2c --- Exception: 300 (Data Access) at 00007fff8c861fe8 SP (7ffff6b19660) is in userspace Fixes: 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the caller") Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reported-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727060947.10060-1-npiggin@gmail.com
2020-07-26powerpc/32s: Kernel space starts at TASK_SIZEChristophe Leroy1-6/+6
Kernel space starts at TASK_SIZE. Select kernel page table when address is over TASK_SIZE. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/893425e32cd0a003539573b2d115e0ffa98bc26c.1593428200.git.christophe.leroy@csgroup.eu
2020-07-26powerpc: Use MODULES_VADDR if definedChristophe Leroy1-0/+11
In order to allow allocation of modules outside of vmalloc space, use MODULES_VADDR and MODULES_END when MODULES_VADDR is defined. Redefine module_alloc() when MODULES_VADDR defined. Unmap corresponding KASAN shadow memory. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7ecf5fff1eef67d450e73fc412b6ec3818483d75.1593428200.git.christophe.leroy@csgroup.eu
2020-07-26powerpc/perf: Initialize power10 PMU registers in cpu setup routineAthira Rajeev1-4/+15
Initialize Monitor Mode Control Register 3 (MMCR3) SPR which is new in power10. For PowerISA v3.1, BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This patch also initializes MMCRA BHRBRD to disable BHRB feature at boot for power10. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1595489557-2047-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-26powerpc/eeh: Move PE tree setup into the platformOliver O'Halloran1-52/+19
The EEH core has a concept of a "PE tree" to support PowerNV. The PE tree follows the PCI bus structures because a reset asserted on an upstream bridge will be propagated to the downstream bridges. On pseries there's a 1-1 correspondence between what the guest sees are a PHB and a PE so the "tree" is really just a single node. Current the EEH core is reponsible for setting up this PE tree which it does by traversing the pci_dn tree. The structure of the pci_dn tree matches the bus tree on PowerNV which leads to the PE tree being "correct" this setup method doesn't make a whole lot of sense and it's actively confusing for the pseries case where it doesn't really do anything. We want to remove the dependence on pci_dn anyway so this patch move choosing where to insert a new PE into the platform code rather than being part of the generic EEH code. For PowerNV this simplifies the tree building logic and removes the use of pci_dn. For pseries we keep the existing logic. I'm not really convinced it does anything due to the 1-1 PE-to-PHB correspondence so every device under that PHB should be in the same PE, but I'd rather not remove it entirely until we've had a chance to look at it more deeply. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-14-oohall@gmail.com
2020-07-26powerpc/eeh: Drop pdn use in eeh_pe_tree_insert()Oliver O'Halloran1-8/+7
This is mostly just to make the subsequent diffs less noisy. No functional changes. One thing that needs calling out is the removal of the "config_addr" variable and replacing it with edev->bdfn. The contents of edev->bdfn are the same, however it's worth pointing out that what RTAS calls a "config_addr" isn't the same as the bdfn. The config_addr is supposed to be: <bus><devfn><reg> with each field being an 8 bit number. Various parts of the EEH code use BDFN and "config_addr" as interchangeable quantities even though they aren't really. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-13-oohall@gmail.com
2020-07-26powerpc/eeh: Rename eeh_{add_to|remove_from}_parent_pe()Oliver O'Halloran4-8/+8
The naming of eeh_{add_to|remove_from}_parent_pe() doesn't really reflect what they actually do. If the PE referred to be edev->pe_config_addr already exists under that PHB then the edev is added to that PE. However, if the PE doesn't exist the a new one is created for the edev. The bulk of the implementation of eeh_add_to_parent_pe() covers that second case. Similarly, most of eeh_remove_from_parent_pe() is determining when it's safe to delete a PE. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-12-oohall@gmail.com
2020-07-26powerpc/eeh: Remove spurious use of pci_dn in eeh_dump_dev_logOliver O'Halloran1-10/+4
Retrieve the domain, bus, device, and function numbers from the edev. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-10-oohall@gmail.com
2020-07-26powerpc/eeh: Pass eeh_dev to eeh_ops->{read|write}_config()Oliver O'Halloran2-37/+32
Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference a specific device rather than pci_dn. No functional changes. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-9-oohall@gmail.com
2020-07-26powerpc/eeh: Pass eeh_dev to eeh_ops->resume_notify()Oliver O'Halloran2-3/+3
Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference a specific device rather than pci_dn. No functional changes. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-8-oohall@gmail.com
2020-07-26powerpc/eeh: Pass eeh_dev to eeh_ops->restore_config()Oliver O'Halloran2-7/+4
Mechanical conversion of the eeh_ops interfaces to use eeh_dev to reference a specific device rather than pci_dn. No functional changes. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-7-oohall@gmail.com
2020-07-26powerpc/eeh: Remove VF config space restorationOliver O'Halloran1-59/+0
There's a bunch of strange things about this code. First up is that none of the fields being written to are functional for a VF. The SR-IOV specification lists then as "Reserved, but OS should preserve" so writing new values to them doesn't do anything and is clearly wrong from a correctness perspective. However, since VFs are designed to be managed by the OS there is an argument to be made that we should be saving and restoring some parts of config space. We already sort of do that by saving the first 64 bytes of config space in the eeh_dev (see eeh_dev->config_space[]). This is inadequate since it doesn't even consider saving and restoring the PCI capability structures. However, this is a problem with EEH in general and that needs to be fixed for non-VF devices too. There's no real reason to keep around this around so delete it. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-6-oohall@gmail.com
2020-07-26powerpc/eeh: Move vf_index out of pci_dn and into eeh_devOliver O'Halloran2-7/+6
Drivers that do not support the PCI error handling callbacks are handled by tearing down the device and re-probing them. If the device being removed is a virtual function then we need to know the VF index so it can be removed using the pci_iov_{add|remove}_virtfn() API. Currently this is handled by looking up the pci_dn, and using the vf_index that was stashed there when the pci_dn for the VF was created in pcibios_sriov_enable(). We would like to eliminate the use of pci_dn outside of pseries though so we need to provide the generic EEH code with some other way to find the vf_index. The easiest thing to do here is move the vf_index field out of pci_dn and into eeh_dev. Currently pci_dn and eeh_dev are allocated and initialized together so this is a fairly minimal change in preparation for splitting pci_dn and eeh_dev in the future. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-3-oohall@gmail.com
2020-07-26powerpc/eeh: Remove eeh_dev.cOliver O'Halloran3-55/+21
The only thing in this file is eeh_dev_init() which is allocates and initialises an eeh_dev based on a pci_dn. This is only ever called from pci_dn.c so move it into there and remove the file. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-2-oohall@gmail.com
2020-07-26powerpc/eeh: Remove eeh_dev_phb_init_dynamic()Oliver O'Halloran3-16/+3
This function is a one line wrapper around eeh_phb_pe_create() and despite the name it doesn't create any eeh_dev structures. Replace it with direct calls to eeh_phb_pe_create() since that does what it says on the tin and removes a layer of indirection. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200725081231.39076-1-oohall@gmail.com
2020-07-26powerpc/watchpoint: Remove 512 byte boundaryRavi Bangoria1-2/+3
Power10 has removed 512 bytes boundary from match criteria i.e. the watch range can cross 512 bytes boundary. Note: ISA 3.1 Book III 9.4 match criteria includes 512 byte limit but that is a documentation mistake and hopefully will be fixed in the next version of ISA. Though, ISA 3.1 change log mentions about removal of 512B boundary: Multiple DEAW: Added a second Data Address Watchpoint. [H]DAR is set to the first byte of overlap. 512B boundary is removed. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-11-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Guest support for 2nd DAWR hcallRavi Bangoria1-1/+1
2nd DAWR can be set/unset using H_SET_MODE hcall with resource value 5. Enable powervm guest support with that. This has no effect on kvm guest because kvm will return error if guest does hcall with resource value 5. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-9-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Set CPU_FTR_DAWR1 based on pa-features bitRavi Bangoria1-0/+2
As per the PAPR, bit 0 of byte 64 in pa-features property indicates availability of 2nd DAWR registers. i.e. If this bit is set, 2nd DAWR is present, otherwise not. Host generally uses "cpu-features", which masks "pa-features". But "cpu-features" are still not used for guests and thus this change is mostly applicable for guests only. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-7-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/dt_cpu_ftrs: Add feature for 2nd DAWRRavi Bangoria1-0/+1
Add new device-tree feature for 2nd DAWR. If this feature is present, 2nd DAWR is supported, otherwise not. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-6-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Fix DAWR exception for CACHEOPRavi Bangoria1-1/+20
'ea' returned by analyse_instr() needs to be aligned down to cache block size for CACHEOP instructions. analyse_instr() does not set size for CACHEOP, thus size also needs to be calculated manually. Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly") Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-4-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Fix DAWR exception constraintRavi Bangoria1-31/+41
Pedro Miraglia Franco de Carvalho noticed that on p8/p9, DAR value is inconsistent with different type of load/store. Like for byte,word etc. load/stores, DAR is set to the address of the first byte of overlap between watch range and real access. But for quadword load/ store it's sometime set to the address of the first byte of real access whereas sometime set to the address of the first byte of overlap. This issue has been fixed in p10. In p10(ISA 3.1), DAR is always set to the address of the first byte of overlap. Commit 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly") wrongly assumes that DAR is set to the address of the first byte of overlap for all load/stores on p8/p9 as well. Fix that. With the fix, we now rely on 'ea' provided by analyse_instr(). If analyse_instr() fails, generate event unconditionally on p8/p9, and on p10 generate event only if DAR is within a DAWR range. Note: 8xx is not affected. Fixes: 27985b2a640e ("powerpc/watchpoint: Don't ignore extraneous exceptions blindly") Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint") Reported-by: Pedro Miraglia Franco de Carvalho <pedromfc@br.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-3-ravi.bangoria@linux.ibm.com
2020-07-26powerpc/watchpoint: Fix 512 byte boundary limitRavi Bangoria1-1/+1
Milton Miller reported that we are aligning start and end address to wrong size SZ_512M. It should be SZ_512. Fix that. While doing this change I also found a case where ALIGN() comparison fails. Within a given aligned range, ALIGN() of two addresses does not match when start address is pointing to the first byte and end address is pointing to any other byte except the first one. But that's not true for ALIGN_DOWN(). ALIGN_DOWN() of any two addresses within that range will always point to the first byte. So use ALIGN_DOWN() instead of ALIGN(). Fixes: e68ef121c1f4 ("powerpc/watchpoint: Use builtin ALIGN*() macros") Reported-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-2-ravi.bangoria@linux.ibm.com
2020-07-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2-2/+2
The UDP reuseport conflict was a little bit tricky. The net-next code, via bpf-next, extracted the reuseport handling into a helper so that the BPF sk lookup code could invoke it. At the same time, the logic for reuseport handling of unconnected sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace which changed the logic to carry on the reuseport result into the rest of the lookup loop if we do not return immediately. This requires moving the reuseport_has_conns() logic into the callers. While we are here, get rid of inline directives as they do not belong in foo.c files. The other changes were cases of more straightforward overlapping modifications. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25Merge tag 'v5.8-rc6' into locking/core, to pick up fixesIngo Molnar2-2/+2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-07-23Merge branch 'scv' support into nextMichael Ellerman9-27/+374
From Nick's cover letter: Linux powerpc new system call instruction and ABI System Call Vectored (scv) ABI ============================== The scv instruction is introduced with POWER9 / ISA3, it comes with an rfscv counter-part. The benefit of these instructions is performance (trading slower SRR0/1 with faster LR/CTR registers, and entering the kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR updates. The scv instruction has 128 levels (not enough to cover the Linux system call space). Assignment and advertisement ---------------------------- The proposal is to assign scv levels conservatively, and advertise them with HWCAP feature bits as we add support for more. Linux has not enabled FSCR[SCV] yet, so executing the scv instruction will cause the kernel to log a "SCV facility unavilable" message, and deliver a SIGILL with ILL_ILLOPC to the process. Linux has defined a HWCAP2 bit PPC_FEATURE2_SCV for SCV support, but does not set it. This change allocates the zero level ('scv 0'), advertised with PPC_FEATURE2_SCV, which will be used to provide normal Linux system calls (equivalent to 'sc'). Attempting to execute scv with other levels will cause a SIGILL to be delivered the same as before, but will not log a "SCV facility unavailable" message (because the processor facility is enabled). Calling convention ------------------ The proposal is for scv 0 to provide the standard Linux system call ABI with the following differences from sc convention[1]: - LR is to be volatile across scv calls. This is necessary because the scv instruction clobbers LR. From previous discussion, this should be possible to deal with in GCC clobbers and CFI. - cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow the kernel system call exit to avoid restoring the volatile cr registers (although we probably still would anyway to avoid information leaks). - Error handling: The consensus among kernel, glibc, and musl is to move to using negative return values in r3 rather than CR0[SO]=1 to indicate error, which matches most other architectures, and is closer to a function call. Notes ----- - r0,r4-r8 are documented as volatile in the ABI, but the kernel patch as submitted currently preserves them. This is to leave room for deciding which way to go with these. Some small benefit was found by preserving them[1] but I'm not convinced it's worth deviating from the C function call ABI just for this. Release code should follow the ABI. Previous discussions: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208691.html https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209268.html [1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209263.html
2020-07-23powerpc/powernv: Machine check handler for POWER10Nicholas Piggin3-0/+95
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200702233343.1128026-1-npiggin@gmail.com
2020-07-23powerpc: Select ARCH_HAS_MEMBARRIER_SYNC_CORENicholas Piggin1-0/+6
powerpc return from interrupt and return from system call sequences are context synchronising. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200716013522.338318-1-npiggin@gmail.com
2020-07-23powerpc/64: Fix an out of date comment about MMIO orderingPalmer Dabbelt1-1/+1
This primitive has been renamed, but because it was spelled incorrectly in the first place it must have escaped the fixup patch. As far as I can tell this logic is still correct: smp_mb__after_spinlock() uses the default smp_mb() implementation, which is "sync" rather than "hwsync" but those are the same (though I'm not that familiar with PowerPC). Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200716193820.1141936-1-palmer@dabbelt.com
2020-07-23powerpc: Add a ppc_inst_as_str() helperJordan Niethe2-14/+14
There are quite a few places where instructions are printed, this is done using a '%x' format specifier. With the introduction of prefixed instructions, this does not work well. Currently in these places, ppc_inst_val() is used for the value for %x so only the first word of prefixed instructions are printed. When the instructions are word instructions, only a single word should be printed. For prefixed instructions both the prefix and suffix should be printed. To accommodate both of these situations, instead of a '%x' specifier use '%s' and introduce a helper, __ppc_inst_as_str() which returns a char *. The char * __ppc_inst_as_str() returns is buffer that is passed to it by the caller. It is cumbersome to require every caller of __ppc_inst_as_str() to now declare a buffer. To make it more convenient to use __ppc_inst_as_str(), wrap it in a macro that uses a compound statement to allocate a buffer on the caller's stack before calling it. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> [mpe: Drop 0x prefix to match most existings uses, especially xmon] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200602052728.18227-1-jniethe5@gmail.com
2020-07-22powerpc/64s: system call support for scv/rfscv instructionsNicholas Piggin9-24/+350
Add support for the scv instruction on POWER9 and later CPUs. For now this implements the zeroth scv vector 'scv 0', as identical to 'sc' system calls, with the exception that LR is not preserved, nor are volatile CR registers, and error is not indicated with CR0[SO], but by returning a negative errno. rfscv is implemented to return from scv type system calls. It can not be used to return from sc system calls because those are defined to preserve LR. getpid syscall throughput on POWER9 is improved by 26% (428 to 318 cycles), largely due to reducing mtmsr and mtspr. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix ppc64e build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611081203.995112-3-npiggin@gmail.com
2020-07-22powerpc/64s/exception: treat NIA below __end_interrupts as soft-maskedNicholas Piggin1-3/+24
The scv instruction causes an interrupt which can enter the kernel with MSR[EE]=1, thus allowing interrupts to hit at any time. These must not be taken as normal interrupts, because they come from MSR[PR]=0 context, and yet the kernel stack is not yet set up and r13 is not set to the PACA). Treat this as a soft-masked interrupt regardless of the soft masked state. This does not affect behaviour yet, because currently all interrupts are taken with MSR[EE]=0. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611081203.995112-2-npiggin@gmail.com
2020-07-22powerpc/perf: Add Power10 PMU feature to DT CPU featuresMadhavan Srinivasan1-0/+26
Add Power10 feature function to DT CPU features, along with a Power10 specific init() to initialize PMU SPRs, sets the oprofile_cpu_type and cpu_features. This will enable performance monitoring unit (PMU) for Power10 in CPU features with "performance-monitor-power10". For Power ISA v3.1, BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) bit, namely "BHRB Recording Disable (BHRBRD)". This patch initializes MMCRA BHRBRD to disable BHRB feature at boot for Power10. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Move MMCRA_BHRB_DISABLE as noted by jpn, drop CPU setup changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-8-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22KVM: PPC: Book3S HV: Save/restore new PMU registersAthira Rajeev1-0/+3
Power ISA v3.1 has added new performance monitoring unit (PMU) special purpose registers (SPRs). They are: Monitor Mode Control Register 3 (MMCR3) Sampled Instruction Event Register A (SIER2) Sampled Instruction Event Register B (SIER3) Add support to save/restore these new SPRs while entering/exiting guest. Also include changes to support KVM_REG_PPC_MMCR3/SIER2/SIER3. Add new SPRs to KVM API documentation. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-6-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22powerpc/perf: Add support for ISA3.1 PMU SPRsMadhavan Srinivasan1-0/+8
PowerISA v3.1 includes new performance monitoring unit(PMU) special purpose registers (SPRs). They are Monitor Mode Control Register 3 (MMCR3) Sampled Instruction Event Register 2 (SIER2) Sampled Instruction Event Register 3 (SIER3) MMCR3 is added for further sampling related configuration control. SIER2/SIER3 are added to provide additional information about the sampled instruction. Patch adds new PPMU flag called "PPMU_ARCH_31" to support handling of these new SPRs, updates the struct thread_struct to include these new SPRs, include MMCR3 in struct mmcr_regs. This is needed to support programming of MMCR3 SPR during event_enable/disable. Patch also adds the sysfs support for the MMCR3 SPR along with SPRN_ macros for these new pmu SPRs. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Rename to PPMU_ARCH_31 as noted by jpn] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-5-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-22KVM: PPC: Book3S HV: Cleanup updates for kvm vcpu MMCRAthira Rajeev1-0/+2
Currently `kvm_vcpu_arch` stores all Monitor Mode Control registers in a flat array in order: mmcr0, mmcr1, mmcra, mmcr2, mmcrs Split this to give mmcra and mmcrs its own entries in vcpu and use a flat array for mmcr0 to mmcr2. This patch implements this cleanup to make code easier to read. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Fix MMCRA/MMCR2 uapi breakage as noted by paulus] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-3-git-send-email-atrajeev@linux.vnet.ibm.com
2020-07-21powerpc/64s: Remove PROT_SAO supportNicholas Piggin1-1/+1
ISA v3.1 does not support the SAO storage control attribute required to implement PROT_SAO. PROT_SAO was used by specialised system software (Lx86) that has been discontinued for about 7 years, and is not thought to be used elsewhere, so removal should not cause problems. We rather remove it than keep support for older processors, because live migrating guest partitions to newer processors may not be possible if SAO is in use (or worse allowed with silent races). - PROT_SAO stays in the uapi header so code using it would still build. - arch_validate_prot() is removed, the generic version rejects PROT_SAO so applications would get a failure at mmap() time. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Drop KVM change for the time being] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200703011958.1166620-3-npiggin@gmail.com
2020-07-20powerpc/book3s64/kuap: Move UAMOR setup to key init functionAneesh Kumar K.V2-6/+22
UAMOR values are not application-specific. The kernel initializes its value based on different reserved keys. Remove the thread-specific UAMOR value and don't switch the UAMOR on context switch. Move UAMOR initialization to key initialization code and remove thread_struct.uamor because it is not used anymore. Before commit: 4a4a5e5d2aad ("powerpc/pkeys: key allocation/deallocation must not change pkey registers") we used to update uamor based on key allocation and free. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-20-aneesh.kumar@linux.ibm.com
2020-07-20powerpc/book3s64/keys/kuap: Reset AMR/IAMR values on kexecAneesh Kumar K.V1-14/+0
As we kexec across kernels that use AMR/IAMR for different purposes we need to ensure that new kernels get kexec'd with a reset value of AMR/IAMR. For ex: the new kernel can use key 0 for kernel mapping and the old AMR value prevents access to key 0. This patch also removes reset if IAMR and AMOR in kexec_sequence. Reset of AMOR is not needed and the IAMR reset is partial (it doesn't do the reset on secondary cpus) and is redundant with this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-19-aneesh.kumar@linux.ibm.com
2020-07-20powerpc/book3s64/pkeys: Add MMU_FTR_PKEYAneesh Kumar K.V1-0/+5
Parse storage keys related device tree entry in early_init_devtree and enable MMU feature MMU_FTR_PKEY if pkeys are supported. MMU feature is used instead of CPU feature because this enables us to group MMU_FTR_KUAP and MMU_FTR_PKEY in asm feature fixup code. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-14-aneesh.kumar@linux.ibm.com
2020-07-20powerpc/book3s64/pkeys: kill cpu feature key CPU_FTR_PKEYAneesh Kumar K.V1-6/+0
We don't use CPU_FTR_PKEY anymore. Remove the feature bit and mark it free. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709032946.881753-9-aneesh.kumar@linux.ibm.com
2020-07-20powerpc/mce: Add MCE notification chainSantosh Sivaraj1-0/+15
Introduce notification chain which lets us know about uncorrected memory errors(UE). This would help prospective users in pmem or nvdimm subsystem to track bad blocks for better handling of persistent memory allocations. Signed-off-by: Santosh Sivaraj <santosh@fossix.org> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709135142.721504-1-santosh@fossix.org
2020-07-20powerpc/prom: Enable Radix GTSE in cpu pa-featuresNicholas Piggin1-1/+1
When '029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.")' made GTSE an MMU feature, it was enabled by default in powerpc-cpu-features but was missed in pa-features. This causes random memory corruption during boot of PowerNV kernels where CONFIG_PPC_DT_CPU_FTRS isn't enabled. Fixes: 029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.") Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> [mpe: Unwrap long line] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200720044258.863574-1-bharata@linux.ibm.com
2020-07-20net: remove compat_sys_{get,set}sockoptChristoph Hellwig1-2/+2
Now that the ->compat_{get,set}sockopt proto_ops methods are gone there is no good reason left to keep the compat syscalls separate. This fixes the odd use of unsigned int for the compat_setsockopt optlen and the missing sock_use_custom_sol_socket. It would also easily allow running the eBPF hooks for the compat syscalls, but such a large change in behavior does not belong into a consolidation patch like this one. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19powerpc: use the generic dma_ops_bypass modeChristoph Hellwig1-81/+9
Use the DMA API bypass mechanism for direct window mappings. This uses common code and speed up the direct mapping case by avoiding indirect calls just when not using dma ops at all. It also fixes a problem where the sync_* methods were using the bypass check for DMA allocations, but those are part of the streaming ops. Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which has never been well defined, as is only used by a few drivers, which IIRC never showed up in the typical Cell blade setups that are affected by the ordering workaround. Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB") Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-07-18Merge branch 'fixes' into nextMichael Ellerman2-2/+2
Merge our fixes branch, primarily to bring in the ebb selftests build fix and the pkey fix, which is a dependency for some future work.
2020-07-16powerpc/pseries: Detect secure and trusted boot state of the system.Nayna Jain1-2/+16
The device-tree properties to check secure and trusted boot state are different for guests (pseries) compared to baremetal (powernv). This patch updates the existing is_ppc_secureboot_enabled() and is_ppc_trustedboot_enabled() functions to add support for pseries. For pseries the secureboot and trustedboot state are exposed via device-tree properties /ibm,secure-boot and /ibm,trusted-boot. The values of ibm,secure-boot under pseries are interpreted as: 0 - Disabled 1 - Enabled in Log-only mode. This patch interprets this value as disabled, since audit mode is currently not supported for Linux. 2 - Enabled and enforced. 3-9 - Enabled and enforcing; requirements are at the discretion of the operating system. The values of ibm,trusted-boot under pseries are interpreted as: 0 - Disabled 1 - Enabled Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> [mpe: Drop machdep.h inclusion, tweak change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594813921-12425-1-git-send-email-nayna@linux.ibm.com
2020-07-16powerpc/vdso: Fix vdso cpu truncationMilton Miller1-1/+1
The code in vdso_cpu_init that exposes the cpu and numa node to userspace via SPRG_VDSO incorrctly masks the cpu to 12 bits. This means that any kernel running on a box with more than 4096 threads (NR_CPUS advertises a limit of of 8192 cpus) would expose userspace to two cpu contexts running at the same time with the same cpu number. Note: I'm not aware of any distro shipping a kernel with support for more than 4096 threads today, nor of any system image that currently exceeds 4096 threads. Found via code browsing. Fixes: 18ad51dd342a7eb09dbcd059d0b451b616d4dafc ("powerpc: Add VDSO version of getcpu") Signed-off-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Anton Blanchard <anton@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200715233704.1352257-1-anton@ozlabs.org
2020-07-16powerpc/fadump: fix race between pstore write and fadump crash triggerSourabh Jain1-0/+24
When we enter into fadump crash path via system reset we fail to update the pstore. On the system reset path we first update the pstore then we go for fadump crash. But the problem here is when all the CPUs try to get the pstore lock to initiate the pstore write, only one CPUs will acquire the lock and proceed with the pstore write. Since it in NMI context CPUs that fail to get lock do not wait for their turn to write to the pstore and simply proceed with the next operation which is fadump crash. One of the CPU who proceeded with fadump crash path triggers the crash and does not wait for the CPU who gets the pstore lock to complete the pstore update. Timeline diagram to depicts the sequence of events that leads to an unsuccessful pstore update when we hit fadump crash path via system reset. 1 2 3 ... n CPU Threads | | | | | | | | Reached to -->|--->|---->| ----------->| system reset | | | | path | | | | | | | | Try to -->|--->|---->|------------>| acquire the | | | | pstore lock | | | | | | | | | | | | Got the -->| +->| | |<-+ pstore lock | | | | | |--> Didn't get the | --------------------------+ lock and moving | | | | ahead on fadump | | | | crash path | | | | Begins the -->| | | | process to | | | |<-- Got the chance to update the | | | | trigger the crash pstore | -> | | ... <- | | | | | | | | | | | | |<-- Triggers the | | | | | | crash | | | | | | ^ | | | | | | | Writing to -->| | | | | | | pstore | | | | | | | | | | ^ |__________________| | | CPU Relax | | | +-----------------------------------------+ | v Race: crash triggered before pstore update completes To avoid this race condition a barrier is added on crash_fadump path, it prevents the CPU to trigger the crash until all the online CPUs completes their task. A barrier is added to make sure all the secondary CPUs hit the crash_fadump function before we initiates the crash. A timeout is kept to ensure the primary CPU (one who initiates the crash) do not wait for secondary CPUs indefinitely. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200713052435.183750-1-sourabhjain@linux.ibm.com