summaryrefslogtreecommitdiff
path: root/arch/powerpc/mm
AgeCommit message (Collapse)AuthorFilesLines
2017-08-15powerpc/mm: Simplify __set_fixmap()Christophe Leroy1-15/+0
__set_fixmap() uses __fix_to_virt() then does the boundary checks by it self. Instead, we can use fix_to_virt() which does the verification at build time. For this, we need to use it inline so that GCC can see the real value of idx at buildtime. In the meantime, we remove the 'fixmaps' variable. This variable is set but has never been used from the beginning (commit 2c419bdeca1d9 ("[POWERPC] Port fixmap from x86 and use for kmap_atomic")) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/mm: declare some local functions staticChristophe Leroy1-2/+2
get_pteptr() and __mapin_ram_chunk() are only used locally, so define them static Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32Christophe Leroy2-0/+30
This patch implements STRICT_KERNEL_RWX on PPC32. As for CONFIG_DEBUG_PAGEALLOC, it deactivates BAT and LTLB mappings in order to allow page protection setup at the level of each page. As BAT/LTLB mappings are deactivated, there might be a performance impact. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/mm: Fix kernel RAM protection after freeing unused memory on PPC32Christophe Leroy1-3/+10
As seen below, allthough the init sections have been freed, the associated memory area is still marked as executable in the page tables. ~ dmesg [ 5.860093] Freeing unused kernel memory: 592K (c0570000 - c0604000) ~ cat /sys/kernel/debug/kernel_page_tables ---[ Start of kernel VM ]--- 0xc0000000-0xc0497fff 4704K rw X present dirty accessed shared 0xc0498000-0xc056ffff 864K rw present dirty accessed shared 0xc0570000-0xc059ffff 192K rw X present dirty accessed shared 0xc05a0000-0xc7ffffff 125312K rw present dirty accessed shared ---[ vmalloc() Area ]--- This patch fixes that. The implementation is done by reusing the change_page_attr() function implemented for CONFIG_DEBUG_PAGEALLOC Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/mm: Ensure change_page_attr() doesn't invalidate pinned TLBsChristophe Leroy1-4/+6
__change_page_attr() uses flush_tlb_page(). flush_tlb_page() uses tlbie instruction, which also invalidates pinned TLBs, which is not what we expect. This patch modifies the implementation to use flush_tlb_kernel_range() instead. This will make use of tlbia which will preserve pinned TLBs. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/8xx: mark init functions with __initChristophe Leroy1-4/+4
setup_initial_memory_limit() is only called during init. mmu_patch_cmp_limit() is only called from 8xx_mmu.c Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/8xx: Make pinning of ITLBs optionalChristophe Leroy1-1/+7
As stated in a comment in head_8xx.S, today we "Always pin the first 8 MB ITLB to prevent ITLB misses while mucking around with SRR0/SRR1 in asm". This issue has just been cleared by the preceding patch, therefore we can make this pinning optional (on by default) and independent of DATA pinning. This patch also makes pinning of IMMR independent of pinning of DATA. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/8xx: Ensures RAM mapped with LTLB is seen as block mapped on 8xx.Christophe Leroy1-2/+11
On the 8xx, the RAM mapped with LTLBs must be seen as block mapped, just like areas mapped with BATs on standard PPC32. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10powerpc/mm: Fix section mismatch warning in early_check_vec5()Michael Ellerman1-1/+1
early_check_vec5() is called from and calls __init routines, so should also be __init. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10powerpc/8xx: Use symbolic names for DSISR bits in DSIChristophe Leroy1-1/+1
Use symbolic names for DSISR bits in DSI Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10powerpc/8xx: Getting rid of remaining use of CONFIG_8xxChristophe Leroy4-8/+8
Two config options exist to define powerpc MPC8xx: * CONFIG_PPC_8xx * CONFIG_8xx arch/powerpc/platforms/Kconfig.cputype has contained the following comment about CONFIG_8xx item for some years: "# this is temp to handle compat with arch=ppc" arch/powerpc is now the only place with remaining use of CONFIG_8xx: get rid of them. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10powerpc/mm: Properly invalidate when setting process table baseSuraj Jitindar Singh1-2/+6
The host process table base is stored in the partition table by calling the function native_register_process_table(). Currently this just sets the entry in memory and is missing a subsequent cache invalidation instruction. Any update to the partition table should be followed by a cache invalidation instruction specifying invalidation of the caching of any partition table entries (RIC = 2, PRS = 0). We already have a function to update the partition table with the required cache invalidation instructions - mmu_partition_table_set_entry(). Update the native_register_process_table() function to call mmu_partition_table_set_entry(), this ensures all appropriate invalidation will be performed. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Use a local for patb0 to clean it up slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-08powerpc/mm/hash64: Make vmalloc 56T on hashMichael Ellerman1-3/+15
On 64-bit book3s, with the hash MMU, we currently define the kernel virtual space (vmalloc, ioremap etc.), to be 16T in size. This is a leftover from pre v3.7 when our user VM was also 16T. Of that 16T we split it 50/50, with half used for PCI IO and ioremap and the other 8T for vmalloc. We never bothered to make it any bigger because 8T of vmalloc ought to be enough for anybody. But it turns out that's not true, the per cpu allocator wants large amounts of vmalloc space, not to make large allocations, but to allow a large stride between allocations, because we use pcpu_embed_first_chunk(). With a bit of juggling we can increase the entire kernel virtual space to 64T. The only real complication is the check of the address in the SLB miss handler, see the comment in the code. Although we could continue to split virtual space 50/50 as we do now, no one seems to be running out of PCI IO or ioremap space. So instead keep that as 8T, and use the remaining 56T for vmalloc. In future we should be able to increase the kernel virtual space to 512T, the code already supports that, it just needs testing on older hardware. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2017-08-08powerpc/mm/slb: Move comment next to the code it's referring toMichael Ellerman1-3/+4
There is a comment in slb_allocate() referring to the load of paca->vmalloc_sllp, but it's several lines prior in the assembly. We're about to change this code, and we want to add another comment, so move the comment immediately prior to the instruction it's talking about. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-08powerpc/mm/book3s64: Make KERN_IO_START a variableMichael Ellerman3-0/+4
Currently KERN_IO_START is defined as: #define KERN_IO_START (KERN_VIRT_START + (KERN_VIRT_SIZE >> 1)) Although it looks like a constant, both the components are actually variables, to allow us to have a different value between Radix and Hash with a single kernel. However that still requires both Radix and Hash to place the kernel IO region at the same location relative to the start and end of the kernel virtual region (namely 1/2 way through it), and we'd like to change that. So split KERN_IO_START out into its own variable, and initialise it for Radix and Hash. In the medium term we should be able to reconsolidate this, by doing a more involved rearrangement of the location of the regions. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc: Remove old unused icswx based coprocessor supportBenjamin Herrenschmidt6-482/+0
We have a whole pile of unused code to maintain the ACOP register, allocate coprocessor PIDs and handle ACOP faults. This mechanism was used for the HFI adapter on POWER7 which is dead and gone and whose driver never went upstream. It was used on some A2 core based stuff that also never saw the light of day. Take out all that code. There is still some POWER8 coprocessor code that uses icswx but it's kernel only and thus doesn't use any of that infrastructure. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Cleanup check for stack expansionBenjamin Herrenschmidt1-36/+48
When hitting below a VM_GROWSDOWN vma (typically growing the stack), we check whether it's a valid stack-growing instruction and we check the distance to GPR1. This is largely open coded with lots of comments, so move it out to a helper. While at it, make store_update_sp a boolean. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Don't lose "major" fault indication on retryBenjamin Herrenschmidt1-2/+3
If the first iteration returns VM_FAULT_MAJOR but the second one doesn't, we fail to account the fault as a major fault. This fixes it and brings the code in line with x86. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Move page fault VMA access checks to a helperBenjamin Herrenschmidt1-24/+33
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Set fault flags earlierBenjamin Herrenschmidt1-1/+4
Move out the code that sets FAULT_FLAG_WRITE so the block that check access permissions can be extracted. While at it also set FAULT_FLAG_INSTRUCTION which will be used for protection keys. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Add a bunch of (un)likely annotations to do_page_faultBenjamin Herrenschmidt1-10/+10
Mostly for the failure cases Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Move/simplify faulthandler_disabled() and !mm checkBenjamin Herrenschmidt1-14/+13
Do the check before we re-enable interrupts and clean the code up a bit. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Move the DSISR_PROTFAULT sanity checkBenjamin Herrenschmidt1-33/+42
This has a page of comment explaining what's going on right in the middle of do_page_fault() which makes things a bit hard to follow. Move it to a helper instead. Also do the test earlier as there's no point waiting until after we found the VMA. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Cosmetic fix to page fault accountingBenjamin Herrenschmidt1-4/+2
No need to break those lines, they aren't that long Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Move CMO accounting out of do_page_fault into a helperBenjamin Herrenschmidt1-11/+18
It makes do_page_fault() more readable. No functional change. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Rework mm_fault_error()Benjamin Herrenschmidt1-38/+28
First, handle the normal retry failure in do_page_fault itself, since it's a simple return statement. That allows us to remove the "continue" special return code from mm_fault_error(). Once that's done, we can have an implementation much closer to x86 where we only call mm_fault_error() if VM_FAULT_ERROR is set and directly return. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Make bad_area* helper functionsBenjamin Herrenschmidt1-28/+50
Instead of goto labels, instead call those functions and return. This gets us closer to x86 and allows us to shring do_page_fault() even more. The main difference with x86 is that those function return a value which we then return from do_page_fault(). That value is our return value from do_page_fault() which we use to generate kernel faults. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Fix reporting of kernel execute faultsBenjamin Herrenschmidt1-6/+15
We currently test for is_exec and DSISR_PROTFAULT but that doesn't make sense as this is the wrong error bit to test for an execute permission failure. In fact, we had code that would return early if we had an exec fault in kernel mode so I think that was just dead code anyway. Finally the location of that test is awkward and prevents further simplifications. So instead move that test into a helper along with the existing early test for kernel exec faults and out of range accesses, and put it all in a "bad_kernel_fault()" helper. While at it test the correct error bits. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Simplify returns from __do_page_faultBenjamin Herrenschmidt1-23/+16
Now that we moved the exception state handling to a wrapper, we can just directly return rather than "goto bail" Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Move debugger check to notify_page_fault()Benjamin Herrenschmidt1-13/+8
unclutters the main path Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Overhaul handling of bad page faultsBenjamin Herrenschmidt1-18/+14
A bad page fault is when the HW signals an error such as a bad copy/paste, an AMO error, or some other type of error that will not be fixed by updating the PTE. Use a helper page_fault_is_bad() to check for bad page faults thus removing the per-processor family open-coding in __do_page_fault() and trigger a SIGBUS rather than a SIGSEGV which is more appropriate. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Move error_code checks for bad faults earlierBenjamin Herrenschmidt1-15/+20
There's no point looking for the VMA etc.. when we already know we are going to fail. This adds some code to set "code" for the si_code but that will be gone in subsequent patches. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/mm: Move out definition of CPU specific is_write bitsBenjamin Herrenschmidt1-7/+11
Define a common page_fault_is_write() helper and use it Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03powerpc/6xx: Handle DABR match before calling do_page_faultBenjamin Herrenschmidt1-9/+0
On legacy 6xx 32-bit procesors, we checked for the DABR match bit in DSISR from do_page_fault(), in the middle of a pile of ifdef's because all other CPU types do it in assembly prior to calling do_page_fault. Fix that. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Add #ifdef CONFIG_6xx] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-02powerpc/mm: Pre-filter SRR1 bits before do_page_fault()Benjamin Herrenschmidt1-12/+2
By filtering the relevant SRR1 bits in the assembly rather than in do_page_fault() itself, we avoid a conditional branch (since we already come from different path for data and instruction faults). This will allow more simplifications later Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-02powerpc/mm: Move exception_enter/exit to a do_page_fault wrapperBenjamin Herrenschmidt1-3/+11
This will allow simplifying the returns from do_page_fault Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-02powerpc/mm/radix: Avoid flushing the PWC on every flush_tlb_rangeBenjamin Herrenschmidt2-6/+42
We do that because it's used by THP pmd collapsing, so use instead a dedicated flush function. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-02powerpc/mm/radix: Improve TLB/PWC flushesBenjamin Herrenschmidt1-39/+27
At the moment we have to rather sub-optimal flushing behaviours: - flush_tlb_mm() will flush the PWC which is unnecessary (for example when doing a fork) - A large unmap will call flush_tlb_pwc() multiple times causing us to perform that fairly expensive operation repeatedly. This happens often in batches of 3 on every new process. So we change flush_tlb_mm() to only flush the TLB, and we use the existing "need_flush_all" flag in struct mmu_gather to indicate that the PWC needs flushing. Unfortunately, flush_tlb_range() still needs to do a full flush for now as it's used by the THP collapsing. We will fix that later. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-02powerpc/mm/radix: Improve _tlbiel_pid to be usable for PWC flushesBenjamin Herrenschmidt1-4/+7
The PWC flush only needs a single set call, just like the full (RIC=2) flush. This will allow us to get rid of the dedicated _tlbiel_pwc() Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-31Merge tag 'v4.13-rc1' into fixesMichael Ellerman1-9/+19
The fixes branch is based off a random pre-rc1 commit, because we had some fixes that needed to go in before rc1 was released. However we now need to fix some code that went in after that point, but before rc1, so merge rc1 to get that code into fixes so we can fix it!
2017-07-31powerpc/mm: Fix check of multiple 16G pages from device treeRui Teng1-1/+1
The offset of hugepage block will not be 16G, if the expected page is more than one. Calculate the totol size instead of the hardcode value. Fixes: 4792adbac9eb ("powerpc: Don't use a 16G page if beyond mem= limits") Signed-off-by: Rui Teng <rui.teng@linux.vnet.ibm.com> Tested-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-27powerpc/mm/hash: Free the subpage_prot_table correctlyAneesh Kumar K.V1-1/+1
Fixes: dad6f37c2602e ("powerpc: subpage_protect: Increase the array size to take care of 64TB") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-26powerpc/mm/radix: Workaround prefetch issue with KVMBenjamin Herrenschmidt3-5/+79
There's a somewhat architectural issue with Radix MMU and KVM. When coming out of a guest with AIL (Alternate Interrupt Location, ie, MMU enabled), we start executing hypervisor code with the PID register still containing whatever the guest has been using. The problem is that the CPU can (and will) then start prefetching or speculatively load from whatever host context has that same PID (if any), thus bringing translations for that context into the TLB, which Linux doesn't know about. This can cause stale translations and subsequent crashes. Fixing this in a way that is neither racy nor a huge performance impact is difficult. We could just make the host invalidations always use broadcast forms but that would hurt single threaded programs for example. We chose to fix it instead by partitioning the PID space between guest and host. This is possible because today Linux only use 19 out of the 20 bits of PID space, so existing guests will work if we make the host use the top half of the 20 bits space. We additionally add support for a property to indicate to Linux the size of the PID register which will be useful if we eventually have processors with a larger PID space available. There is still an issue with malicious guests purposefully setting the PID register to a value in the hosts PID range. Hopefully future HW can prevent that, but in the meantime, we handle it with a pair of kludges: - On the way out of a guest, before we clear the current VCPU in the PACA, we check the PID and if it's outside of the permitted range we flush the TLB for that PID. - When context switching, if the mm is "new" on that CPU (the corresponding bit was set for the first time in the mm cpumask), we check if any sibling thread is in KVM (has a non-NULL VCPU pointer in the PACA). If that is the case, we also flush the PID for that CPU (core). This second part is needed to handle the case where a process is migrated (or starts a new pthread) on a sibling thread of the CPU coming out of KVM, as there's a window where stale translations can exist before we detect it and flush them out. A future optimization could be added by keeping track of whether the PID has ever been used and avoid doing that for completely fresh PIDs. We could similarily mark PIDs that have been the subject of a global invalidation as "fresh". But for now this will do. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop unneeded include of kvm_book3s_asm.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-24powerpc/mm: Build fix for non SPARSEMEM_VMEMAP configAneesh Kumar K.V1-2/+2
We can use pfn_to_page() in realmode for other configs. Hence remove the CONFIG_FLATMEM ifdef. Fixes: 8e0861fa3c4e ("powerpc: Prepare to support kernel handling of IOMMU map/unmap") Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Also fix up the #endif comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-21Merge tag 'powerpc-4.13-3' of ↵Linus Torvalds4-18/+63
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A handful of fixes, mostly for new code: - some reworking of the new STRICT_KERNEL_RWX support to make sure we also remove executable permission from __init memory before it's freed. - a fix to some recent optimisations to the hypercall entry where we were clobbering r12, this was breaking nested guests (PR KVM). - a fix for the recent patch to opal_configure_cores(). This could break booting on bare metal Power8 boxes if the kernel was built without CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG. - .. and finally a workaround for spurious PMU interrupts on Power9 DD2. Thanks to: Nicholas Piggin, Anton Blanchard, Balbir Singh" * tag 'powerpc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y powerpc/mm/hash: Refactor hash__mark_rodata_ro() powerpc/mm/radix: Refactor radix__mark_rodata_ro() powerpc/64s: Fix hypercall entry clobbering r12 input powerpc/perf: Avoid spurious PMU interrupts after idle powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()
2017-07-18powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=yMichael Ellerman4-0/+29
Currently even with STRICT_KERNEL_RWX we leave the __init text marked executable after init, which is bad. Add a hook to mark it NX (no-execute) before we free it, and implement it for radix and hash. Note that we use __init_end as the end address, not _einittext, because overlaps_kernel_text() uses __init_end, because there are additional executable sections other than .init.text between __init_begin and __init_end. Tested on radix and hash with: 0:mon> p $__init_begin *** 400 exception occurred Fixes: 1e0fc9d1eb2b ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-18powerpc/mm/hash: Refactor hash__mark_rodata_ro()Michael Ellerman1-13/+19
Move the core logic into a helper, so we can use it for changing other permissions. We also change the logic to align start down, and end up. This means calling the function with a range will expand that range to be at least 1 mmu_linear_psize page in size. We need that so we can use it on __init_begin ... __init_end which is not a full page in size. This should always work for _stext/__init_begin, because we align __init_begin to _stext + 16M in the linker script. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-18powerpc/mm/radix: Refactor radix__mark_rodata_ro()Michael Ellerman1-5/+15
Move the core logic into a helper, so we can use it for changing permissions other than _PAGE_WRITE. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-15Merge tag 'powerpc-4.13-2' of ↵Linus Torvalds1-3/+17
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Nothing that really stands out, just a bunch of fixes that have come in in the last couple of weeks. None of these are actually fixes for code that is new in 4.13. It's roughly half older bugs, with fixes going to stable, and half fixes/updates for Power9. Thanks to: Aneesh Kumar K.V, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt, Madhavan Srinivasan, Michael Neuling, Nicholas Piggin, Oliver O'Halloran" * tag 'powerpc-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64: Fix atomic64_inc_not_zero() to return an int powerpc: Fix emulation of mfocrf in emulate_step() powerpc: Fix emulation of mcrf in emulate_step() powerpc/perf: Add POWER9 alternate PM_RUN_CYC and PM_RUN_INST_CMPL events powerpc/perf: Fix SDAR_MODE value for continous sampling on Power9 powerpc/asm: Mark cr0 as clobbered in mftb() powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9 powerpc/mm/radix: Synchronize updates to the process table powerpc/mm/radix: Properly clear process table entry powerpc/powernv: Tell OPAL about our MMU mode on POWER9 powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR
2017-07-13powerpc,mmap: properly account for stack randomization in mmap_baseRik van Riel1-9/+19
When RLIMIT_STACK is, for example, 256MB, the current code results in a gap between the top of the task and mmap_base of 256MB, failing to take into account the amount by which the stack address was randomized. In other words, the stack gets less than RLIMIT_STACK space. Ensure that the gap between the stack and mmap_base always takes stack randomization and the stack guard gap into account. Inspired by Daniel Micay's linux-hardened tree. Link: http://lkml.kernel.org/r/20170622200033.25714-4-riel@redhat.com Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Florian Weimer <fweimer@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hugh Dickins <hughd@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>