summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2021-06-16powerpc: Define empty_zero_page[] in CChristophe Leroy6-28/+0
At the time being, empty_zero_page[] is defined in each platform head.S. Define it in mm/mem.c instead, and put it in BSS section instead of the DATA section. Commit 5227cfa71f9e ("arm64: mm: place empty_zero_page in bss") explains why it is interesting to have it in BSS. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5838caffa269e0957c5a50cc85477876220298b0.1623063174.git.christophe.leroy@csgroup.eu
2021-06-16powerpc/nohash: Convert set_context() to CChristophe Leroy3-17/+0
ppc8xx already has set_context() in C. Other ones have it in assembly. The only thing it does is to write the context id into SPRN_PID. Do it in C. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a5d0759064f3831c6b88af49ef5d3b05ba1c4dad.1622712515.git.christophe.leroy@csgroup.eu
2021-06-16powerpc/nohash: Refactor update of BDI2000 pointers in switch_mmu_context()Christophe Leroy4-53/+0
Instead of duplicating the update of BDI2000 pointers in set_context(), do it directly from switch_mmu_context(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4c54997edd3548fa54717915e7c6ebaf60f208c0.1622712515.git.christophe.leroy@csgroup.eu
2021-06-16powerpc/32s: Rework Kernel Userspace Access ProtectionChristophe Leroy1-0/+3
On book3s/32, KUAP is provided by toggling Ks bit in segment registers. One segment register addresses 256M of virtual memory. At the time being, KUAP implements a complex logic to apply the unlock/lock on the exact number of segments covering the user range to access, with saving the boundaries of the range of segments in a member of thread struct. But most if not all user accesses are within a single segment. Rework KUAP with a different approach: - Open only one segment, the one corresponding to the starting address of the range to be accessed. - If a second segment is involved, it will generate a page fault. The segment will then be open by the page fault handler. The kuap member of thread struct will now contain: - The start address of the current on going user access, that will be used to know which segment to lock at the end of the user access. - ~0 when no user access is open - ~1 when additionnal segments are opened by a page fault. Then, at lock time - When only one segment is open, close it. - When several segments are open, close all user segments. Almost 100% of the time, only one segment will be involved. In interrupts, inline the function that unlock/lock all segments, because not inlining them implies a lot of register save/restore. With the patch, writing value 128 in userspace in perf_copy_attr() is done with 16 instructions: 3890: 93 82 04 dc stw r28,1244(r2) 3894: 7d 20 e5 26 mfsrin r9,r28 3898: 55 29 00 80 rlwinm r9,r9,0,2,0 389c: 7d 20 e1 e4 mtsrin r9,r28 38a0: 4c 00 01 2c isync 38a4: 39 20 00 80 li r9,128 38a8: 91 3c 00 00 stw r9,0(r28) 38ac: 81 42 04 dc lwz r10,1244(r2) 38b0: 39 00 ff ff li r8,-1 38b4: 91 02 04 dc stw r8,1244(r2) 38b8: 2c 0a ff fe cmpwi r10,-2 38bc: 41 82 00 88 beq 3944 <perf_copy_attr+0x36c> 38c0: 7d 20 55 26 mfsrin r9,r10 38c4: 65 29 40 00 oris r9,r9,16384 38c8: 7d 20 51 e4 mtsrin r9,r10 38cc: 4c 00 01 2c isync ... 3944: 48 00 00 01 bl 3944 <perf_copy_attr+0x36c> 3944: R_PPC_REL24 kuap_lock_all_ool Before the patch it was 118 instructions. In reality only 42 are executed in most cases, but GCC is not able to see that a properly aligned user access cannot involve more than one segment. 5060: 39 1d 00 04 addi r8,r29,4 5064: 3d 20 b0 00 lis r9,-20480 5068: 7c 08 48 40 cmplw r8,r9 506c: 40 81 00 08 ble 5074 <perf_copy_attr+0x2cc> 5070: 3d 00 b0 00 lis r8,-20480 5074: 39 28 ff ff addi r9,r8,-1 5078: 57 aa 00 06 rlwinm r10,r29,0,0,3 507c: 55 29 27 3e rlwinm r9,r9,4,28,31 5080: 39 29 00 01 addi r9,r9,1 5084: 7d 29 53 78 or r9,r9,r10 5088: 91 22 04 dc stw r9,1244(r2) 508c: 7d 20 ed 26 mfsrin r9,r29 5090: 55 29 00 80 rlwinm r9,r9,0,2,0 5094: 7c 08 50 40 cmplw r8,r10 5098: 40 81 00 c0 ble 5158 <perf_copy_attr+0x3b0> 509c: 7d 46 50 f8 not r6,r10 50a0: 7c c6 42 14 add r6,r6,r8 50a4: 54 c6 27 be rlwinm r6,r6,4,30,31 50a8: 7d 20 51 e4 mtsrin r9,r10 50ac: 3c ea 10 00 addis r7,r10,4096 50b0: 39 29 01 11 addi r9,r9,273 50b4: 7f 88 38 40 cmplw cr7,r8,r7 50b8: 55 29 02 06 rlwinm r9,r9,0,8,3 50bc: 40 9d 00 9c ble cr7,5158 <perf_copy_attr+0x3b0> 50c0: 2f 86 00 00 cmpwi cr7,r6,0 50c4: 41 9e 00 4c beq cr7,5110 <perf_copy_attr+0x368> 50c8: 2f 86 00 01 cmpwi cr7,r6,1 50cc: 41 9e 00 2c beq cr7,50f8 <perf_copy_attr+0x350> 50d0: 2f 86 00 02 cmpwi cr7,r6,2 50d4: 41 9e 00 14 beq cr7,50e8 <perf_copy_attr+0x340> 50d8: 7d 20 39 e4 mtsrin r9,r7 50dc: 39 29 01 11 addi r9,r9,273 50e0: 3c e7 10 00 addis r7,r7,4096 50e4: 55 29 02 06 rlwinm r9,r9,0,8,3 50e8: 7d 20 39 e4 mtsrin r9,r7 50ec: 39 29 01 11 addi r9,r9,273 50f0: 3c e7 10 00 addis r7,r7,4096 50f4: 55 29 02 06 rlwinm r9,r9,0,8,3 50f8: 7d 20 39 e4 mtsrin r9,r7 50fc: 3c e7 10 00 addis r7,r7,4096 5100: 39 29 01 11 addi r9,r9,273 5104: 7f 88 38 40 cmplw cr7,r8,r7 5108: 55 29 02 06 rlwinm r9,r9,0,8,3 510c: 40 9d 00 4c ble cr7,5158 <perf_copy_attr+0x3b0> 5110: 7d 20 39 e4 mtsrin r9,r7 5114: 39 29 01 11 addi r9,r9,273 5118: 3c c7 10 00 addis r6,r7,4096 511c: 55 29 02 06 rlwinm r9,r9,0,8,3 5120: 7d 20 31 e4 mtsrin r9,r6 5124: 39 29 01 11 addi r9,r9,273 5128: 3c c6 10 00 addis r6,r6,4096 512c: 55 29 02 06 rlwinm r9,r9,0,8,3 5130: 7d 20 31 e4 mtsrin r9,r6 5134: 39 29 01 11 addi r9,r9,273 5138: 3c c7 30 00 addis r6,r7,12288 513c: 55 29 02 06 rlwinm r9,r9,0,8,3 5140: 7d 20 31 e4 mtsrin r9,r6 5144: 3c e7 40 00 addis r7,r7,16384 5148: 39 29 01 11 addi r9,r9,273 514c: 7f 88 38 40 cmplw cr7,r8,r7 5150: 55 29 02 06 rlwinm r9,r9,0,8,3 5154: 41 9d ff bc bgt cr7,5110 <perf_copy_attr+0x368> 5158: 4c 00 01 2c isync 515c: 39 20 00 80 li r9,128 5160: 91 3d 00 00 stw r9,0(r29) 5164: 38 e0 00 00 li r7,0 5168: 90 e2 04 dc stw r7,1244(r2) 516c: 7d 20 ed 26 mfsrin r9,r29 5170: 65 29 40 00 oris r9,r9,16384 5174: 40 81 00 c0 ble 5234 <perf_copy_attr+0x48c> 5178: 7d 47 50 f8 not r7,r10 517c: 7c e7 42 14 add r7,r7,r8 5180: 54 e7 27 be rlwinm r7,r7,4,30,31 5184: 7d 20 51 e4 mtsrin r9,r10 5188: 3d 4a 10 00 addis r10,r10,4096 518c: 39 29 01 11 addi r9,r9,273 5190: 7c 08 50 40 cmplw r8,r10 5194: 55 29 02 06 rlwinm r9,r9,0,8,3 5198: 40 81 00 9c ble 5234 <perf_copy_attr+0x48c> 519c: 2c 07 00 00 cmpwi r7,0 51a0: 41 82 00 4c beq 51ec <perf_copy_attr+0x444> 51a4: 2c 07 00 01 cmpwi r7,1 51a8: 41 82 00 2c beq 51d4 <perf_copy_attr+0x42c> 51ac: 2c 07 00 02 cmpwi r7,2 51b0: 41 82 00 14 beq 51c4 <perf_copy_attr+0x41c> 51b4: 7d 20 51 e4 mtsrin r9,r10 51b8: 39 29 01 11 addi r9,r9,273 51bc: 3d 4a 10 00 addis r10,r10,4096 51c0: 55 29 02 06 rlwinm r9,r9,0,8,3 51c4: 7d 20 51 e4 mtsrin r9,r10 51c8: 39 29 01 11 addi r9,r9,273 51cc: 3d 4a 10 00 addis r10,r10,4096 51d0: 55 29 02 06 rlwinm r9,r9,0,8,3 51d4: 7d 20 51 e4 mtsrin r9,r10 51d8: 3d 4a 10 00 addis r10,r10,4096 51dc: 39 29 01 11 addi r9,r9,273 51e0: 7c 08 50 40 cmplw r8,r10 51e4: 55 29 02 06 rlwinm r9,r9,0,8,3 51e8: 40 81 00 4c ble 5234 <perf_copy_attr+0x48c> 51ec: 7d 20 51 e4 mtsrin r9,r10 51f0: 39 29 01 11 addi r9,r9,273 51f4: 3c ea 10 00 addis r7,r10,4096 51f8: 55 29 02 06 rlwinm r9,r9,0,8,3 51fc: 7d 20 39 e4 mtsrin r9,r7 5200: 39 29 01 11 addi r9,r9,273 5204: 3c e7 10 00 addis r7,r7,4096 5208: 55 29 02 06 rlwinm r9,r9,0,8,3 520c: 7d 20 39 e4 mtsrin r9,r7 5210: 39 29 01 11 addi r9,r9,273 5214: 3c ea 30 00 addis r7,r10,12288 5218: 55 29 02 06 rlwinm r9,r9,0,8,3 521c: 7d 20 39 e4 mtsrin r9,r7 5220: 3d 4a 40 00 addis r10,r10,16384 5224: 39 29 01 11 addi r9,r9,273 5228: 7c 08 50 40 cmplw r8,r10 522c: 55 29 02 06 rlwinm r9,r9,0,8,3 5230: 41 81 ff bc bgt 51ec <perf_copy_attr+0x444> 5234: 4c 00 01 2c isync Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Export the ool handlers to fix build errors] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d9121f96a7c4302946839a0771f5d1daeeb6968c.1622708530.git.christophe.leroy@csgroup.eu
2021-06-16powerpc/32s: Initialise KUAP and KUEP in CChristophe Leroy2-6/+4
In order to selectively activate KUAP and KUEP in a following patch, perform KUAP and KUEP initialisation in C. Unlike PPC64, PPC32 doesn't have an early_setup_secondary(), so do it in start_secondary(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/87be72023448dd4e476744ed279b8c04b8d08a1c.1622708530.git.christophe.leroy@csgroup.eu
2021-06-16powerpc/32s: Convert switch_mmu_context() to CChristophe Leroy2-63/+0
switch_mmu_context() does things that can easily be done in C. For updating user segments, we have update_user_segments(). As mentionned in commit b5efec00b671 ("powerpc/32s: Move KUEP locking/unlocking in C"), update_user_segments() has the loop unrolled which is a significant performance gain. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/05c0875ad8220c03452c3a334946e207c6ca04d6.1622708530.git.christophe.leroy@csgroup.eu
2021-06-16powerpc/44x: Implement Kernel Userspace Exec Protection (KUEP)Christophe Leroy1-0/+8
Powerpc 44x has two bits for exec protection in TLBs: one for user (UX) and one for superviser (SX). Clear SX on user pages in TLB miss handlers to provide KUEP. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/169310e08152aa1d96c979770291d165ec6896ae.1622616032.git.christophe.leroy@csgroup.eu
2021-06-16powerpc/optprobes: use PPC_RAW_ macrosChristophe Leroy1-32/+7
Use PPC_RAW_ macros to simplify the code. And use PPC_LO/PPC_HI instead of IMM_L/IMM_H which are for internal use inside ppc-opcode.h Those macros are self explanatory, comments can go as well. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5a167b8ba4d33a5c09cd504f0c862e25ffe85459.1621516826.git.christophe.leroy@csgroup.eu
2021-06-16powerpc/optprobes: Compact code source a bit.Christophe Leroy1-22/+11
Now that lines can be up to 100 chars long, minimise the amount of split lines to increase readability. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8ebbd977ea8cf8d706d82458f2a21acd44562a99.1621516826.git.christophe.leroy@csgroup.eu
2021-06-16powerpc/optprobes: Minimise castsChristophe Leroy1-12/+11
nip is already an unsigned long, no cast needed. op_callback_addr and emulate_step_addr are kprobe_opcode_t *. There value is obtained with ppc_kallsyms_lookup_name() which returns 'unsigned long', and there values are used create_branch() which expects 'unsigned long'. So change them to 'unsigned long' to avoid casting them back and forth. can_optimize() used p->addr several times as 'unsigned long'. Use a local 'unsigned long' variable and avoid casting multiple times. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e03192a6d4123242a275e71ce2ba0bb4d90700c1.1621516826.git.christophe.leroy@csgroup.eu
2021-06-16powerpc: Don't use 'struct ppc_inst' to reference instruction locationChristophe Leroy10-68/+56
'struct ppc_inst' is an internal representation of an instruction, but in-memory instructions are and will remain a table of 'u32' forever. Replace all 'struct ppc_inst *' used for locating an instruction in memory by 'u32 *'. This removes a lot of undue casts to 'struct ppc_inst *'. It also helps locating ab-use of 'struct ppc_inst' dereference. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Fix ppc_inst_next(), use u32 instead of unsigned int] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7062722b087228e42cbd896e39bfdf526d6a340a.1621516826.git.christophe.leroy@csgroup.eu
2021-06-16powerpc: Do not dereference code as 'struct ppc_inst' (uprobe, ↵Christophe Leroy1-1/+1
code-patching, feature-fixups) 'struct ppc_inst' is an internal structure to represent an instruction, it is not directly the representation of that instruction in text code. It is not meant to map and dereference code. Dereferencing code directly through 'struct ppc_inst' has two main issues: - On powerpc, structs are expected to be 8 bytes aligned while code is spread every 4 byte. - Should a non prefixed instruction lie at the end of the page and the following page not be mapped, it would generate a page fault. In-memory code must be accessed with ppc_inst_read(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c9a1201dd0a66b4a0f91f0fb46d9385cbf030feb.1621516826.git.christophe.leroy@csgroup.eu
2021-06-15powerpc: Replace PPC_INST_NOP by PPC_RAW_NOP()Christophe Leroy3-3/+3
On the road to removing all PPC_INST_xx defines in asm/ppc-opcodes.h, change PPC_INST_NOP to PPC_RAW_NOP(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ad46c195ca1b8572629ef07ba6bfe247585239a6.1621506159.git.christophe.leroy@csgroup.eu
2021-06-15powerpc/traps: Start using PPC_RAW_xx() macrosChristophe Leroy1-3/+4
Start using PPC_RAW_xx() macros where relevant. PPC_INST_SYNC is used to both represent the 'sync' instruction and the family of synchronisation instructions. Keep it for the later, maybe we'll change the name in the future to avoid confusion. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0945c155d6cb113431185fc1296ac127359fe29b.1621506159.git.christophe.leroy@csgroup.eu
2021-06-15powerpc/ftrace: Use PPC_RAW_MFLR() and PPC_RAW_NOP()Christophe Leroy1-9/+9
Use PPC_RAW_MFLR() instead of open coding with PPC_INST_MFLR. Same for PPC_INST_NOP. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/98fd4d717810b7c4032a1edf62dd6fe638e64329.1621506159.git.christophe.leroy@csgroup.eu
2021-06-15powerpc/security: Use PPC_RAW_BLR() and PPC_RAW_NOP()Christophe Leroy1-6/+6
On the road to remove all use of PPC_INST_xxx, replace PPC_INST_BLR by PPC_RAW_BLR(). Same for PPC_INST_NOP. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c04f88d0e53d2122fbbe92226892a01ebc668b6a.1621506159.git.christophe.leroy@csgroup.eu
2021-06-15powerpc/modules: Use PPC_RAW_xx() macrosChristophe Leroy2-51/+23
To improve readability, use PPC_RAW_xx() macros instead of open coding. Those macros are self-explanatory so the comments can go as well. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/99d9ee8849d3992beeadb310a665aae01c3abfb1.1621506159.git.christophe.leroy@csgroup.eu
2021-06-15powerpc/signal: Use PPC_RAW_xx() macrosChristophe Leroy2-16/+10
To improve readability, use PPC_RAW_xx() macros instead of open coding. Those macros are self-explanatory so the comments can go as well. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4ca2bfdca2f47a293d05f61eb3c4e487ee170f1f.1621506159.git.christophe.leroy@csgroup.eu
2021-06-15powerpc/lib/code-patching: Use PPC_RAW_() macrosChristophe Leroy1-1/+1
Instead of open coding with PPC_INST_ defines, use PPC_RAW_() macros. It improves readability. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8c92f1d9e825ee47c6f88fe43ad42d2a8cc2ab4a.1621506159.git.christophe.leroy@csgroup.eu
2021-06-15powerpc: Don't handle ALTIVEC/SPE in ASM in _switch(). Do it in C.Christophe Leroy3-37/+9
_switch() saves and restores ALTIVEC and SPE status. For altivec this is redundant with what __switch_to() does with save_sprs() and restore_sprs() and giveup_all() before calling _switch(). Add support for SPI in save_sprs() and restore_sprs() and remove things from _switch(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8ab21fd93d6e0047aa71e6509e5e312f14b2991b.1620998075.git.christophe.leroy@csgroup.eu
2021-06-15Merge branch 'fixes' into nextMichael Ellerman6-50/+17
Merge our fixes branch which has a number of important fixes, notably the fix for initrd corruption, as well as the fixes for scv vs ptrace.
2021-06-15powerpc/tau: Remove superfluous parameter in alloc_workqueue() callFinn Thain1-1/+1
This avoids an (optional) compiler warning: arch/powerpc/kernel/tau_6xx.c: In function 'TAU_init': arch/powerpc/kernel/tau_6xx.c:204:30: error: too many arguments for format [-Werror=format-extra-args] tau_workq = alloc_workqueue("tau", WQ_UNBOUND, 1, 0); Fixes: b1c6a0a10bfa ("powerpc/tau: Convert from timer to workqueue") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a1456e8bbd33ef702e3ff6f14b1bf3919241c62b.1623398307.git.fthain@linux-m68k.org
2021-06-15powerpc/prom_init: Move custom isspace() to its own namespaceAndy Shevchenko1-9/+8
If by some reason any of the headers will include ctype.h we will have a name collision. Avoid this by moving isspace() to the dedicate namespace. First appearance of the code is in the commit cf68787b68a2 ("powerpc/prom_init: Evaluate mem kernel parameter for early allocation"). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [mpe: Reformat prom_isxdigit() now that we allow longer lines] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210510144925.58195-1-andriy.shevchenko@linux.intel.com
2021-06-14powerpc/signal64: Copy siginfo before changing regs->nipMichael Ellerman1-5/+4
In commit 96d7a4e06fab ("powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess switches") the 64-bit signal code was rearranged to use user_write_access_begin/end(). As part of that change the call to copy_siginfo_to_user() was moved later in the function, so that it could be done after the user_write_access_end(). In particular it was moved after we modify regs->nip to point to the signal trampoline. That means if copy_siginfo_to_user() fails we exit handle_rt_signal64() with an error but with regs->nip modified, whereas previously we would not modify regs->nip until the copy succeeded. Returning an error from signal delivery but with regs->nip updated leaves the process in a sort of half-delivered state. We do immediately force a SEGV in signal_setup_done(), called from do_signal(), so the process should never run in the half-delivered state. However that SEGV is not delivered until we've gone around to do_notify_resume() again, so it's possible some tracing could observe the half-delivered state. There are other cases where we fail signal delivery with regs partly updated, eg. the write to newsp and SA_SIGINFO, but the latter at least is very unlikely to fail as it reads back from the frame we just wrote to. Looking at other arches they seem to be more careful about leaving regs unchanged until the copy operations have succeeded, and in general that seems like good hygenie. So although the current behaviour is not cleary buggy, it's also not clearly correct. So move the call to copy_siginfo_to_user() up prior to the modification of regs->nip, which is closer to the old behaviour, and easier to reason about. Fixes: 96d7a4e06fab ("powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess switches") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210608134605.2783677-1-mpe@ellerman.id.au
2021-06-10KVM: PPC: Book3S HV: Remove radix guest support from P7/8 pathNicholas Piggin1-1/+0
The P9 path now runs all supported radix guest combinations, so remove radix guest support from the P7/8 path. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-24-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in CNicholas Piggin1-1/+4
Almost all logic is moved to C, by introducing a new in_guest mode for the P9 path that branches very early in the KVM interrupt handler to P9 exit code. The main P9 entry and exit assembly is now only about 160 lines of low level stack setup and register save/restore, plus a bad-interrupt handler. There are two motivations for this, the first is just make the code more maintainable being in C. The second is to reduce the amount of code running in a special KVM mode, "realmode". In quotes because with radix it is no longer necessarily real-mode in the MMU, but it still has to be treated specially because it may be in real-mode, and has various important registers like PID, DEC, TB, etc set to guest. This is hostile to the rest of Linux and can't use arbitrary kernel functionality or be instrumented well. This initial patch is a reasonably faithful conversion of the asm code, but it does lack any loop to return quickly back into the guest without switching out of realmode in the case of unimportant or easily handled interrupts. As explained in previous changes, handling HV interrupts very quickly in this low level realmode is not so important for P9 performance, and are important to avoid for security, observability, debugability reasons. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-15-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S HV P9: Reduce irq_work vs guest decrementer racesNicholas Piggin1-10/+0
irq_work's use of the DEC SPR is racy with guest<->host switch and guest entry which flips the DEC interrupt to guest, which could lose a host work interrupt. This patch closes one race, and attempts to comment another class of races. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-11-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: Minimise hcall handler calling convention differencesNicholas Piggin1-1/+20
This sets up the same calling convention from interrupt entry to KVM interrupt handler for system calls as exists for other interrupt types. This is a better API, it uses a save area rather than SPR, and it has more registers free to use. Using a single common API helps maintain it, and it becomes easier to use in C in a later patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-8-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: Move interrupt early register setup to KVMNicholas Piggin1-108/+23
Like the earlier patch for hcalls, KVM interrupt entry requires a different calling convention than the Linux interrupt handlers set up. Move the code that converts from one to the other into KVM. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-6-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: Move hcall early register setup to KVMNicholas Piggin1-41/+1
System calls / hcalls have a different calling convention than other interrupts, so there is code in the KVMTEST to massage these into the same form as other interrupt handlers. Move this work into the KVM hcall handler. This means teaching KVM a little more about the low level interrupt handler setup, PACA save areas, etc., although that's not obviously worse than the current approach of coming up with an entirely different interrupt register / save convention. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-5-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: add hcall interrupt handlerNicholas Piggin1-3/+3
Add a separate hcall entry point. This can be used to deal with the different calling convention. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-4-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVMNicholas Piggin1-60/+0
Move the GUEST_MODE_SKIP logic into KVM code. This is quite a KVM internal detail that has no real need to be in common handlers. Add a comment explaining the what and why of KVM "skip" interrupts. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-3-npiggin@gmail.com
2021-06-10KVM: PPC: Book3S 64: move KVM interrupt entry to a common entry pointNicholas Piggin1-7/+1
Rather than bifurcate the call depending on whether or not HV is possible, and have the HV entry test for PR, just make a single common point which does the demultiplexing. This makes it simpler to add another type of exit handler. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-2-npiggin@gmail.com
2021-06-10powerpc: Add missing linux/{of.h,irqdomain.h} include directivesMarc Zyngier1-0/+1
A bunch of PPC files are missing the inclusion of linux/of.h and linux/irqdomain.h, relying on transitive inclusion from another file. As we are about to break this dependency, make sure these dependencies are explicit. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-07quota: Wire up quotactl_fd syscallJan Kara1-1/+1
Wire up the quotactl_fd syscall. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2021-06-06Merge tag 'powerpc-5.13-5' of ↵Linus Torvalds4-43/+11
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fix our KVM reverse map real-mode handling since we enabled huge vmalloc (in some configurations). Revert a recent change to our IOMMU code which broke some devices. Fix KVM handling of FSCR on P7/P8, which could have possibly let a guest crash it's Qemu. Fix kprobes validation of prefixed instructions across page boundary. Thanks to Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas, Frederic Barrat, Naveen N. Rao, and Nicholas Piggin" * tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs" KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path powerpc: Fix reverse map real-mode address lookup with huge vmalloc powerpc/kprobes: Fix validation of prefixed instructions across page boundary
2021-06-03Merge branch 'sched/urgent' into sched/core, to pick up fixesIngo Molnar6-24/+35
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-03kprobes: Do not increment probe miss count in the fault handlerNaveen N. Rao1-7/+0
Kprobes has a counter 'nmissed', that is used to count the number of times a probe handler was not called. This generally happens when we hit a kprobe while handling another kprobe. However, if one of the probe handlers causes a fault, we are currently incrementing 'nmissed'. The comment in fault handler indicates that this can be used to account faults taken by the probe handlers. But, this has never been the intention as is evident from the comment above 'nmissed' in 'struct kprobe': /*count the number of times this probe was temporarily disarmed */ unsigned long nmissed; Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20210601120150.672652-1-naveen.n.rao@linux.vnet.ibm.com
2021-06-01kprobes: Remove kprobe::fault_handlerPeter Zijlstra1-10/+0
The reason for kprobe::fault_handler(), as given by their comment: * We come here because instructions in the pre/post * handler caused the page_fault, this could happen * if handler tries to access user space by * copy_from_user(), get_user() etc. Let the * user-specified handler try to fix it first. Is just plain bad. Those other handlers are ran from non-preemptible context and had better use _nofault() functions. Also, there is no upstream usage of this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210525073213.561116662@infradead.org
2021-06-01Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs"Frederic Barrat1-6/+5
This reverts commit 3c0468d4451eb6b4f6604370639f163f9637a479. That commit was breaking alignment guarantees for the DMA address when allocating coherent mappings, as described in Documentation/core-api/dma-api-howto.rst It was also noticed by Mellanox' driver: [ 1515.763621] mlx5_core c002:01:00.0: mlx5_frag_buf_alloc_node:146:(pid 13402): unexpected map alignment: 0x0800000000c61000, page_shift=16 [ 1515.763635] mlx5_core c002:01:00.0: mlx5_cqwq_create:181:(pid 13402): mlx5_frag_buf_alloc_node() failed, -12 Fixes: 3c0468d4451e ("powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs") Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210526144540.117795-1-fbarrat@linux.ibm.com
2021-05-28powerpc: Fix reverse map real-mode address lookup with huge vmallocNicholas Piggin2-35/+4
real_vmalloc_addr() does not currently work for huge vmalloc, which is what the reverse map can be allocated with for radix host, hash guest. Extract the hugepage aware equivalent from eeh code into a helper, and convert existing sites including this one to use it. Fixes: 8abddd968a30 ("powerpc/64s/radix: Enable huge vmalloc mappings") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210526120005.3432222-1-npiggin@gmail.com
2021-05-28powerpc/kprobes: Fix validation of prefixed instructions across page boundaryNaveen N. Rao1-2/+2
When checking if the probed instruction is the suffix of a prefixed instruction, we access the instruction at the previous word. If the probed instruction is the very first word of a module, we can end up trying to access an invalid page. Fix this by skipping the check for all instructions at the beginning of a page. Prefixed instructions cannot cross a 64-byte boundary and as such, we don't expect to encounter a suffix as the very first word in a page for kernel text. Even if there are prefixed instructions crossing a page boundary (from a module, for instance), the instruction will be illegal, so preventing probing on the suffix of such prefix instructions isn't worthwhile. Fixes: b4657f7650ba ("powerpc/kprobes: Don't allow breakpoints on suffixes") Cc: stable@vger.kernel.org # v5.8+ Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0df9a032a05576a2fa8e97d1b769af2ff0eafbd6.1621416666.git.naveen.n.rao@linux.vnet.ibm.com
2021-05-23Merge tag 'powerpc-5.13-4' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix breakage of strace (and other ptracers etc.) when using the new scv ABI (Power9 or later with glibc >= 2.33). - Fix early_ioremap() on 64-bit, which broke booting on some machines. Thanks to Dmitry V. Levin, Nicholas Piggin, Alexey Kardashevskiy, and Christophe Leroy. * tag 'powerpc-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls powerpc: Fix early setup to make early_ioremap() work
2021-05-23powerpc/kprobes: Replace ppc_optinsn by common optinsnChristophe Leroy1-18/+5
Commit 51c9c0843993 ("powerpc/kprobes: Implement Optprobes") implemented a powerpc specific version of optinsn in order to workaround the 32Mb limitation for direct branches. Instead of implementing a dedicated powerpc version, use the common optinsn and override the allocation and freeing functions. This also indirectly remove the CLANG warning about is_kprobe_ppc_optinsn_slot() not being use, and the powerpc will now benefit from commit 5b485629ba0d ("kprobes, extable: Identify kprobes trampolines as kernel text area") Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ec5e85f9f9abcfecc959a03495f4a7858eb4d203.1620896780.git.christophe.leroy@csgroup.eu
2021-05-20Merge tag 'quota_for_v5.13-rc3' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota fixes from Jan Kara: "The most important part in the pull is disablement of the new syscall quotactl_path() which was added in rc1. The reason is some people at LWN discussion pointed out dirfd would be useful for this path based syscall and Christian Brauner agreed. Without dirfd it may be indeed problematic for containers. So let's just disable the syscall for now when it doesn't have users yet so that we have more time to mull over how to best specify the filesystem we want to work on" * tag 'quota_for_v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Disable quotactl_path syscall quota: Use 'hlist_for_each_entry' to simplify code
2021-05-20powerpc: Fix early setup to make early_ioremap() workAlexey Kardashevskiy1-2/+2
The immediate problem is that after commit 0bd3f9e953bd ("powerpc/legacy_serial: Use early_ioremap()") the kernel silently reboots on some systems. The reason is that early_ioremap() returns broken addresses as it uses slot_virt[] array which initialized with offsets from FIXADDR_TOP == IOREMAP_END+FIXADDR_SIZE == KERN_IO_END - FIXADDR_SIZ + FIXADDR_SIZE == __kernel_io_end which is 0 when early_ioremap_setup() is called. __kernel_io_end is initialized little bit later in early_init_mmu(). This fixes the initialization by swapping early_ioremap_setup() and early_init_mmu(). Fixes: 265c3491c4bc ("powerpc: Add support for GENERIC_EARLY_IOREMAP") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Drop unrelated cleanup & cleanup change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210520032919.358935-1-aik@ozlabs.ru
2021-05-17quota: Disable quotactl_path syscallJan Kara1-1/+1
In commit fa8b90070a80 ("quota: wire up quotactl_path") we have wired up new quotactl_path syscall. However some people in LWN discussion have objected that the path based syscall is missing dirfd and flags argument which is mostly standard for contemporary path based syscalls. Indeed they have a point and after a discussion with Christian Brauner and Sascha Hauer I've decided to disable the syscall for now and update its API. Since there is no userspace currently using that syscall and it hasn't been released in any major release, we should be fine. CC: Christian Brauner <christian.brauner@ubuntu.com> CC: Sascha Hauer <s.hauer@pengutronix.de> Link: https://lore.kernel.org/lkml/20210512153621.n5u43jsytbik4yze@wittgenstein Signed-off-by: Jan Kara <jack@suse.cz>
2021-05-17powerpc/603: Avoid a pile of NOPs when not using SW LRU in TLB exceptionsChristophe Leroy1-4/+14
The SW LRU is in an MMU feature section. When not used, that's a dozen of NOPs to fetch for nothing. Define an ALT section that does the few remaining operations. That also avoids a double read on SRR1 in the SW LRU case. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/603725297466959419628ef7964aaf3417fb647d.1620363691.git.christophe.leroy@csgroup.eu
2021-05-17powerpc/paca: Remove mm_ctx_id and mm_ctx_slb_addr_limitChristophe Leroy1-2/+0
mm_ctx_id and mm_ctx_slb_addr_limit are not used anymore. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6e1813953da38c452c131fe3e2a2761a0fddb975.1620223303.git.christophe.leroy@csgroup.eu
2021-05-17powerpc/asm-offset: Remove unused itemsChristophe Leroy1-51/+1
Following PACA related items are not used anymore by ASM code: PACA_SIZE, PACACONTEXTID, PACALOWSLICESPSIZE, PACAHIGHSLICEPSIZE, PACA_SLB_ADDR_LIMIT, MMUPSIZEDEFSIZE, PACASLBCACHE, PACASLBCACHEPTR, PACASTABRR, PACAVMALLOCSLLP, MMUPSIZESLLP, PACACONTEXTSLLP, PACALPPACAPTR, LPPACA_DTLIDX and PACA_DTL_RIDX. Following items are also not used anymore: SIGSEGV, NMI_MASK, THREAD_DBCR0, KUAP, TI_FLAGS, TI_PREEMPT, DCACHEL1BLOCKSPERPAGE, ICACHEL1BLOCKSIZE, ICACHEL1LOGBLOCKSIZE, ICACHEL1BLOCKSPERPAGE, STACK_REGS_KUAP, KVM_NEED_FLUSH, KVM_FWNMI, VCPU_DEC, VCPU_SPMC, HSTATE_XICS_PHYS, HSTATE_SAVED_XIRR and PPC_DBELL_MSGTYPE. Remove all of them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1c80981548dc0c4f145109cdd473022c1aad8d2b.1620223302.git.christophe.leroy@csgroup.eu