summaryrefslogtreecommitdiff
path: root/arch/arm64/lib
AgeCommit message (Collapse)AuthorFilesLines
2021-05-27arm64: insn: Add SVE instruction classJulien Thierry1-1/+1
SVE has been public for some time now. Let the decoder acknowledge its existence. Signed-off-by: Julien Thierry <jthierry@redhat.com> Link: https://lore.kernel.org/r/20210303170536.1838032-6-jthierry@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-27arm64: Move instruction encoder/decoder under lib/Julien Thierry2-3/+1461
Aarch64 instruction set encoding and decoding logic can prove useful for some features/tools both part of the kernel and outside the kernel. Isolate the function dealing only with encoding/decoding instructions, with minimal dependency on kernel utilities in order to be able to reuse that code. Code was only moved, no code should have been added, removed nor modifier. Signed-off-by: Julien Thierry <jthierry@redhat.com> Link: https://lore.kernel.org/r/20210303170536.1838032-5-jthierry@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-27kasan: arm64: support specialized outlined tag mismatch checksPeter Collingbourne2-0/+78
By using outlined checks we can achieve a significant code size improvement by moving the tag-based ASAN checks into separate functions. Unlike the existing CONFIG_KASAN_OUTLINE mode these functions have a custom calling convention that preserves most registers and is specialized to the register containing the address and the type of access, and as a result we can eliminate the code size and performance overhead of a standard calling convention such as AAPCS for these functions. This change depends on a separate series of changes to Clang [1] to support outlined checks in the kernel, although the change works fine without them (we just don't get outlined checks). This is because the flag -mllvm -hwasan-inline-all-checks=0 has no effect until the Clang changes land. The flag was introduced in the Clang 9.0 timeframe as part of the support for outlined checks in userspace and because our minimum Clang version is 10.0 we can pass it unconditionally. Outlined checks require a new runtime function with a custom calling convention. Add this function to arch/arm64/lib. I measured the code size of defconfig + tag-based KASAN, as well as boot time (i.e. time to init launch) on a DragonBoard 845c with an Android arm64 GKI kernel. The results are below: code size boot time CONFIG_KASAN_INLINE=y before 92824064 6.18s CONFIG_KASAN_INLINE=y after 38822400 6.65s CONFIG_KASAN_OUTLINE=y 39215616 11.48s We can see straight away that specialized outlined checks beat the existing CONFIG_KASAN_OUTLINE=y on both code size and boot time for tag-based ASAN. As for the comparison between CONFIG_KASAN_INLINE=y before and after we saw similar performance numbers in userspace [2] and decided that since the performance overhead is minimal compared to the overhead of tag-based ASAN itself as well as compared to the code size improvements we would just replace the inlined checks with the specialized outlined checks without the option to select between them, and that is what I have implemented in this patch. Signed-off-by: Peter Collingbourne <pcc@google.com> Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76 Link: [1] https://reviews.llvm.org/D90426 Link: [2] https://reviews.llvm.org/D56954 Link: https://lore.kernel.org/r/20210526174927.2477847-3-pcc@google.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-25arm64: Rename arm64-internal cache maintenance functionsFuad Tabba1-2/+2
Although naming across the codebase isn't that consistent, it tends to follow certain patterns. Moreover, the term "flush" isn't defined in the Arm Architecture reference manual, and might be interpreted to mean clean, invalidate, or both for a cache. Rename arm64-internal functions to make the naming internally consistent, as well as making it consistent with the Arm ARM, by specifying whether it applies to the instruction, data, or both caches, whether the operation is a clean, invalidate, or both. Also specify which point the operation applies to, i.e., to the point of unification (PoU), coherency (PoC), or persistence (PoP). This commit applies the following sed transformation to all files under arch/arm64: "s/\b__flush_cache_range\b/caches_clean_inval_pou_macro/g;"\ "s/\b__flush_icache_range\b/caches_clean_inval_pou/g;"\ "s/\binvalidate_icache_range\b/icache_inval_pou/g;"\ "s/\b__flush_dcache_area\b/dcache_clean_inval_poc/g;"\ "s/\b__inval_dcache_area\b/dcache_inval_poc/g;"\ "s/__clean_dcache_area_poc\b/dcache_clean_poc/g;"\ "s/\b__clean_dcache_area_pop\b/dcache_clean_pop/g;"\ "s/\b__clean_dcache_area_pou\b/dcache_clean_pou/g;"\ "s/\b__flush_cache_user_range\b/caches_clean_inval_user_pou/g;"\ "s/\b__flush_icache_all\b/icache_inval_all_pou/g;" Note that __clean_dcache_area_poc is deliberately missing a word boundary check at the beginning in order to match the efistub symbols in image-vars.h. Also note that, despite its name, __flush_icache_range operates on both instruction and data caches. The name change here reflects that. No functional change intended. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20210524083001.2586635-19-tabba@google.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-25arm64: __clean_dcache_area_pop to take end parameter instead of sizeFuad Tabba1-2/+2
To be consistent with other functions with similar names and functionality in cacheflush.h, cache.S, and cachetlb.rst, change to specify the range in terms of start and end, as opposed to start and size. No functional change intended. Reported-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20210524083001.2586635-15-tabba@google.com Signed-off-by: Will Deacon <will@kernel.org>
2021-03-19arm64: lib: Annotate {clear, copy}_page() as position-independentWill Deacon2-4/+4
clear_page() and copy_page() are suitable for use outside of the kernel address space, so annotate them as position-independent code. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-2-qperret@google.com
2021-02-26arm64: kasan: simplify and inline MTE functionsAndrey Konovalov1-16/+0
This change provides a simpler implementation of mte_get_mem_tag(), mte_get_random_tag(), and mte_set_mem_tag_range(). Simplifications include removing system_supports_mte() checks as these functions are onlye called from KASAN runtime that had already checked system_supports_mte(). Besides that, size and address alignment checks are removed from mte_set_mem_tag_range(), as KASAN now does those. This change also moves these functions into the asm/mte-kasan.h header and implements mte_set_mem_tag_range() via inline assembly to avoid unnecessary functions calls. [vincenzo.frascino@arm.com: fix warning in mte_get_random_tag()] Link: https://lkml.kernel.org/r/20210211152208.23811-1-vincenzo.frascino@arm.com Link: https://lkml.kernel.org/r/a26121b294fdf76e369cb7a74351d1c03a908930.1612546384.git.andreyknvl@google.com Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-22arm64: mte: add in-kernel MTE helpersVincenzo Frascino1-0/+16
Provide helper functions to manipulate allocation and pointer tags for kernel addresses. Low-level helper functions (mte_assign_*, written in assembly) operate tag values from the [0x0, 0xF] range. High-level helper functions (mte_get/set_*) use the [0xF0, 0xFF] range to preserve compatibility with normal kernel pointers that have 0xFF in their top byte. MTE_GRANULE_SIZE and related definitions are moved to mte-def.h header that doesn't have any dependencies and is safe to include into any low-level header. Link: https://lkml.kernel.org/r/c31bf759b4411b2d98cdd801eb928e241584fd1f.1606161801.git.andreyknvl@google.com Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marco Elver <elver@google.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-02arm64: uaccess cleanup macro namingMark Rutland5-22/+22
Now the uaccess primitives use LDTR/STTR unconditionally, the uao_{ldp,stp,user_alternative} asm macros are misnamed, and have a redundant argument. Let's remove the redundant argument and rename these to user_{ldp,stp,ldst} respectively to clean this up. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Robin Murohy <robin.murphy@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201202131558.39270-9-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-12-02arm64: uaccess: simplify __copy_user_flushcache()Mark Rutland1-3/+1
Currently __copy_user_flushcache() open-codes raw_copy_from_user(), and doesn't use uaccess_mask_ptr() on the user address. Let's have it call raw_copy_from_user(), which is both a simplification and ensures that user pointers are masked under speculation. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201202131558.39270-6-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-10arm64: uaccess: move uao_* alternatives to asm-uaccess.hMark Rutland1-1/+1
The uao_* alternative asm macros are only used by the uaccess assembly routines in arch/arm64/lib/, where they are included indirectly via asm-uaccess.h. Since they're specific to the uaccess assembly (and will lose the alternatives in subsequent patches), let's move them into asm-uaccess.h. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> [will: update #include in mte.S to pull in uao asm macros] Signed-off-by: Will Deacon <will@kernel.org>
2020-10-30arm64: Change .weak to SYM_FUNC_START_WEAK_PI for arch/arm64/lib/mem*.SFangrui Song3-6/+3
Commit 39d114ddc682 ("arm64: add KASAN support") added .weak directives to arch/arm64/lib/mem*.S instead of changing the existing SYM_FUNC_START_PI macros. This can lead to the assembly snippet `.weak memcpy ... .globl memcpy` which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy with LLVM's integrated assembler before LLVM 12. LLVM 12 (since https://reviews.llvm.org/D90108) will error on such an overridden symbol binding. Use the appropriate SYM_FUNC_START_WEAK_PI instead. Fixes: 39d114ddc682 ("arm64: add KASAN support") Reported-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Fangrui Song <maskray@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201029181951.1866093-1-maskray@google.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-04arm64: mte: Enable swap of tagged pagesSteven Price1-0/+45
When swapping pages out to disk it is necessary to save any tags that have been set, and restore when swapping back in. Make use of the new page flag (PG_ARCH_2, locally named PG_mte_tagged) to identify pages with tags. When swapping out these pages the tags are stored in memory and later restored when the pages are brought back in. Because shmem can swap pages back in without restoring the userspace PTE it is also necessary to add a hook for shmem. Signed-off-by: Steven Price <steven.price@arm.com> [catalin.marinas@arm.com: move function prototypes to mte.h] [catalin.marinas@arm.com: drop '_tags' from arch_swap_restore_tags()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Will Deacon <will@kernel.org>
2020-09-04arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS supportCatalin Marinas1-0/+53
Add support for bulk setting/getting of the MTE tags in a tracee's address space at 'addr' in the ptrace() syscall prototype. 'data' points to a struct iovec in the tracer's address space with iov_base representing the address of a tracer's buffer of length iov_len. The tags to be copied to/from the tracer's buffer are stored as one tag per byte. On successfully copying at least one tag, ptrace() returns 0 and updates the tracer's iov_len with the number of tags copied. In case of error, either -EIO or -EFAULT is returned, trying to follow the ptrace() man page. Note that the tag copying functions are not performance critical, therefore they lack optimisations found in typical memory copy routines. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Alan Hayward <Alan.Hayward@arm.com> Cc: Luis Machado <luis.machado@linaro.org> Cc: Omair Javaid <omair.javaid@linaro.org>
2020-09-04arm64: mte: Tags-aware copy_{user_,}highpage() implementationsVincenzo Frascino1-0/+19
When the Memory Tagging Extension is enabled, the tags need to be preserved across page copy (e.g. for copy-on-write, page migration). Introduce MTE-aware copy_{user_,}highpage() functions to copy tags to the destination if the source page has the PG_mte_tagged flag set. copy_user_page() does not need to handle tag copying since, with this patch, it is only called by the DAX code where there is no source page structure (and no source tags). Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-09-04arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTECatalin Marinas2-0/+36
Pages allocated by the kernel are not guaranteed to have the tags zeroed, especially as the kernel does not (yet) use MTE itself. To ensure the user can still access such pages when mapped into its address space, clear the tags via set_pte_at(). A new page flag - PG_mte_tagged (PG_arch_2) - is used to track pages with valid allocation tags. Since the zero page is mapped as pte_special(), it won't be covered by the above set_pte_at() mechanism. Clear its tags during early MTE initialisation. Co-developed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
2020-06-11Merge branch 'rwonce/rework' of ↵Linus Torvalds1-8/+12
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux Pull READ/WRITE_ONCE rework from Will Deacon: "This the READ_ONCE rework I've been working on for a while, which bumps the minimum GCC version and improves code-gen on arm64 when stack protector is enabled" [ Side note: I'm _really_ tempted to raise the minimum gcc version to 4.9, so that we can just say that we require _Generic() support. That would allow us to more cleanly handle a lot of the cases where we depend on very complex macros with 'sizeof' or __builtin_choose_expr() with __builtin_types_compatible_p() etc. This branch has a workaround for sparse not handling _Generic(), either, but that was already fixed in the sparse development branch, so it's really just gcc-4.9 that we'd require. - Linus ] * 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux: compiler_types.h: Use unoptimized __unqual_scalar_typeof for sparse compiler_types.h: Optimize __unqual_scalar_typeof compilation time compiler.h: Enforce that READ_ONCE_NOCHECK() access size is sizeof(long) compiler-types.h: Include naked type in __pick_integer_type() match READ_ONCE: Fix comment describing 2x32-bit atomicity gcov: Remove old GCC 3.4 support arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros READ_ONCE: Drop pointer qualifiers when reading from scalar types READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE() arm64: csum: Disable KASAN for do_csum() fault_inject: Don't rely on "return value" from WRITE_ONCE() net: tls: Avoid assigning 'const' pointer to non-const pointer netfilter: Avoid assigning 'const' pointer to non-const pointer compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
2020-04-29arm64: Reorder the macro arguments in the copy routinesCatalin Marinas4-64/+64
The current argument order is obviously buggy (memcpy.S): macro strb1 ptr, regB, val strb \ptr, [\regB], \val endm However, it cancels out as the calling sites in copy_template.S pass the address as the regB argument. Mechanically reorder the arguments to match the instruction mnemonics. There is no difference in objdump before and after this patch. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200429183702.28445-1-catalin.marinas@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-04-28arm64: lib: Consistently enable crc32 extensionMark Brown1-1/+1
Currently most of the assembly files that use architecture extensions enable them using the .arch directive but crc32.S uses .cpu instead. Move that over to .arch for consistency. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20200414182843.31664-1-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-04-15arm64: csum: Disable KASAN for do_csum()Will Deacon1-8/+12
do_csum() over-reads the source buffer and therefore abuses READ_ONCE_NOCHECK() to avoid tripping up KASAN. In preparation for READ_ONCE_NOCHECK() becoming a macro, and therefore losing its '__no_sanitize_address' annotation, just annotate do_csum() explicitly and fall back to normal loads. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-03-17arm64: fix spelling mistake "ca not" -> "cannot"韩科才1-1/+1
There is a spelling mistake in the comment, Fix it. Signed-off-by: hankecai <hankecai@bbktel.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-03-09arm64: csum: Optimise IPv6 header checksumRobin Murphy1-0/+27
Throwing our __uint128_t idioms at csum_ipv6_magic() makes it about 1.3x-2x faster across a range of microarchitecture/compiler combinations. Not much in absolute terms, but every little helps. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-01-22Merge branch 'for-next/asm-annotations' into for-next/coreWill Deacon19-50/+50
* for-next/asm-annotations: (6 commits) arm64: kernel: Correct annotation of end of el0_sync ...
2020-01-22Merge branches 'for-next/acpi', 'for-next/cpufeatures', 'for-next/csum', ↵Will Deacon3-22/+148
'for-next/e0pd', 'for-next/entry', 'for-next/kbuild', 'for-next/kexec/cleanup', 'for-next/kexec/file-kdump', 'for-next/misc', 'for-next/nofpsimd', 'for-next/perf' and 'for-next/scs' into for-next/core * for-next/acpi: ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map() * for-next/cpufeatures: (2 commits) arm64: Introduce ID_ISAR6 CPU register ... * for-next/csum: (2 commits) arm64: csum: Fix pathological zero-length calls ... * for-next/e0pd: (7 commits) arm64: kconfig: Fix alignment of E0PD help text ... * for-next/entry: (5 commits) arm64: entry: cleanup sp_el0 manipulation ... * for-next/kbuild: (4 commits) arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean' ... * for-next/kexec/cleanup: (11 commits) Revert "arm64: kexec: make dtb_mem always enabled" ... * for-next/kexec/file-kdump: (2 commits) arm64: kexec_file: add crash dump support ... * for-next/misc: (12 commits) arm64: entry: Avoid empty alternatives entries ... * for-next/nofpsimd: (7 commits) arm64: nofpsmid: Handle TIF_FOREIGN_FPSTATE flag cleanly ... * for-next/perf: (2 commits) perf/imx_ddr: Fix cpu hotplug state cleanup ... * for-next/scs: (6 commits) arm64: kernel: avoid x18 in __cpu_soft_restart ...
2020-01-17arm64: csum: Fix pathological zero-length callsRobin Murphy1-0/+3
In validating the checksumming results of the new routine, I sadly neglected to test its not-checksumming results. Thus it slipped through that the one case where @buff is already dword-aligned and @len = 0 manages to defeat the tail-masking logic and behave as if @len = 8. For a zero length it doesn't make much sense to deference @buff anyway, so just add an early return (which has essentially zero impact on performance). Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-16arm64/lib: copy_page: avoid x18 register in assembler codeArd Biesheuvel1-19/+19
Register x18 will no longer be used as a caller save register in the future, so stop using it in the copy_page() code. Link: https://patchwork.kernel.org/patch/9836869/ Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [Sami: changed the offset and bias to be explicit] Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-16arm64: Implement optimised checksum routineRobin Murphy2-3/+126
Apparently there exist certain workloads which rely heavily on software checksumming, for which the generic do_csum() implementation becomes a significant bottleneck. Therefore let's give arm64 its own optimised version - for ease of maintenance this foregoes assembly or intrisics, and is thus not actually arm64-specific, but does rely heavily on C idioms that translate well to the A64 ISA and the typical load/store capabilities of most ARMv8 CPU cores. The resulting increase in checksum throughput scales nicely with buffer size, tending towards 4x for a small in-order core (Cortex-A53), and up to 6x or more for an aggressive big core (Ampere eMAG). Reported-by: Lingyan Huang <huanglingyan2@huawei.com> Tested-by: Lingyan Huang <huanglingyan2@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-08arm64: lib: Use modern annotations for assembly functionsMark Brown19-50/+50
In an effort to clarify and simplify the annotation of assembly functions in the kernel new macros have been introduced. These replace ENTRY and ENDPROC and also add a new annotation for static functions which previously had no ENTRY equivalent. Update the annotations in the library code to the new macros. Signed-off-by: Mark Brown <broonie@kernel.org> [will: Use SYM_FUNC_START_WEAK_PI] Signed-off-by: Will Deacon <will@kernel.org>
2019-11-20arm64: uaccess: Remove uaccess_*_not_uao asm macrosPavel Tatashin5-13/+5
It is safer and simpler to drop the uaccess assembly macros in favour of inline C functions. Although this bloats the Image size slightly, it aligns our user copy routines with '{get,put}_user()' and generally makes the code a lot easier to reason about. Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> [will: tweaked commit message and changed temporary variable names] Signed-off-by: Will Deacon <will@kernel.org>
2019-11-20arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess faultPavel Tatashin4-0/+4
A number of our uaccess routines ('__arch_clear_user()' and '__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they encounter an unhandled fault whilst accessing userspace. For CPUs implementing both hardware PAN and UAO, this bug has no effect when both extensions are in use by the kernel. For CPUs implementing hardware PAN but not UAO, this means that a kernel using hardware PAN may execute portions of code with PAN inadvertently disabled, opening us up to potential security vulnerabilities that rely on userspace access from within the kernel which would usually be prevented by this mechanism. In other words, parts of the kernel run the same way as they would on a CPU without PAN implemented/emulated at all. For CPUs not implementing hardware PAN and instead relying on software emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately much worse. Calling 'schedule()' with software PAN disabled means that the next task will execute in the kernel using the page-table and ASID of the previous process even after 'switch_mm()', since the actual hardware switch is deferred until return to userspace. At this point, or if there is a intermediate call to 'uaccess_enable()', the page-table and ASID of the new process are installed. Sadly, due to the changes introduced by KPTI, this is not an atomic operation and there is a very small window (two instructions) where the CPU is configured with the page-table of the old task and the ASID of the new task; a speculative access in this state is disastrous because it would corrupt the TLB entries for the new task with mappings from the previous address space. As Pavel explains: | I was able to reproduce memory corruption problem on Broadcom's SoC | ARMv8-A like this: | | Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's | stack is accessed and copied. | | The test program performed the following on every CPU and forking | many processes: | | unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, | MAP_SHARED | MAP_ANONYMOUS, -1, 0); | map[0] = getpid(); | sched_yield(); | if (map[0] != getpid()) { | fprintf(stderr, "Corruption detected!"); | } | munmap(map, PAGE_SIZE); | | From time to time I was getting map[0] to contain pid for a | different process. Ensure that PAN is re-enabled when returning after an unhandled user fault from our uaccess routines. Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> Fixes: 338d4f49d6f7 ("arm64: kernel: Add support for Privileged Access Never") Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> [will: rewrote commit message] Signed-off-by: Will Deacon <will@kernel.org>
2019-08-30Merge branch 'for-next/atomics' into for-next/coreWill Deacon2-22/+0
* for-next/atomics: (10 commits) Rework LSE instruction selection to use static keys instead of alternatives
2019-08-29arm64: atomics: Remove atomic_ll_sc compilation unitAndrew Murray2-22/+0
We no longer fall back to out-of-line atomics on systems with CONFIG_ARM64_LSE_ATOMICS where ARM64_HAS_LSE_ATOMICS is not set. Remove the unused compilation unit which provided these symbols. Signed-off-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-07arm64: Add support for function error injectionLeo Yan2-0/+20
Inspired by the commit 7cd01b08d35f ("powerpc: Add support for function error injection"), this patch supports function error injection for Arm64. This patch mainly support two functions: one is regs_set_return_value() which is used to overwrite the return value; the another function is override_function_with_return() which is to override the probed function returning and jump to its caller. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner2-8/+2
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234Thomas Gleixner20-249/+20
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-09arm64: Makefile: Replace -pg with CC_FLAGS_FTRACETorsten Duwe1-1/+1
In preparation for arm64 supporting ftrace built on other compiler options, let's have the arm64 Makefiles remove the $(CC_FLAGS_FTRACE) flags, whatever these may be, rather than assuming '-pg'. There should be no functional change as a result of this patch. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Torsten Duwe <duwe@suse.de> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10arm64: string: use asm EXPORT_SYMBOL()Mark Rutland11-0/+14
For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the string routine exports to the assembly files the functions are defined in. Routines which should only be exported for !KASAN builds are exported using the EXPORT_SYMBOL_NOKASAN() helper. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10arm64: uaccess: use asm EXPORT_SYMBOL()Mark Rutland4-3/+11
For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the uaccess exports to the assembly files the functions are defined in. As we have to include <asm/assembler.h>, the existing includes are fixed to follow the usual ordering conventions. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10arm64: page: use asm EXPORT_SYMBOL()Mark Rutland2-0/+2
For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the copy_page and clear_page exports to the assembly files the functions are defined in. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10arm64: tishift: use asm EXPORT_SYMBOL()Mark Rutland1-0/+5
For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the tishift exports to the assembly file the functions are defined in. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06arm64: crypto: add NEON accelerated XOR implementationJackie Liu2-0/+190
This is a NEON acceleration method that can improve performance by approximately 20%. I got the following data from the centos 7.5 on Huawei's HISI1616 chip: [ 93.837726] xor: measuring software checksum speed [ 93.874039] 8regs : 7123.200 MB/sec [ 93.914038] 32regs : 7180.300 MB/sec [ 93.954043] arm64_neon: 9856.000 MB/sec [ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec) I believe this code can bring some optimization for all arm64 platform. thanks for Ard Biesheuvel's suggestions. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30arm64/lib: improve CRC32 performance for deep pipelinesArd Biesheuvel1-5/+49
Improve the performance of the crc32() asm routines by getting rid of most of the branches and small sized loads on the common path. Instead, use a branchless code path involving overlapping 16 byte loads to process the first (length % 32) bytes, and process the remainder using a loop that processes 32 bytes at a time. Tested using the following test program: #include <stdlib.h> extern void crc32_le(unsigned short, char const*, int); int main(void) { static const char buf[4096]; srand(20181126); for (int i = 0; i < 100 * 1000 * 1000; i++) crc32_le(0, buf, rand() % 1024); return 0; } On Cortex-A53 and Cortex-A57, the performance regresses but only very slightly. On Cortex-A72 however, the performance improves from $ time ./crc32 real 0m10.149s user 0m10.149s sys 0m0.000s to $ time ./crc32 real 0m7.915s user 0m7.915s sys 0m0.000s Cc: Rui Sun <sunrui26@huawei.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-10-27arm64: lib: use C string functions with KASAN enabledAndrey Ryabinin8-8/+8
ARM64 has asm implementation of memchr(), memcmp(), str[r]chr(), str[n]cmp(), str[n]len(). KASAN don't see memory accesses in asm code, thus it can potentially miss many bugs. Ifdef out __HAVE_ARCH_* defines of these functions when KASAN is enabled, so the generic implementations from lib/string.c will be used. We can't just remove the asm functions because efistub uses them. And we can't have two non-weak functions either, so declare the asm functions as weak. Link: http://lkml.kernel.org/r/20180920135631.23833-2-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Kyeongdon Kim <kyeongdon.kim@lge.com> Cc: Alexander Potapenko <glider@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-09-24arm64: lse: remove -fcall-used-x0 flagTri Vo1-1/+1
x0 is not callee-saved in the PCS. So there is no need to specify -fcall-used-x0. Clang doesn't currently support -fcall-used flags. This patch will help building the kernel with clang. Tested-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Tri Vo <trong@android.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-10arm64/lib: add accelerated crc32 routinesArd Biesheuvel2-0/+62
Unlike crc32c(), which is wired up to the crypto API internally so the optimal driver is selected based on the platform's capabilities, crc32_le() is implemented as a library function using a slice-by-8 table based C implementation. Even though few of the call sites may be bottlenecks, calling a time variant implementation with a non-negligible D-cache footprint is a bit of a waste, given that ARMv8.1 and up mandates support for the CRC32 instructions that were optional in ARMv8.0, but are already widely available, even on the Cortex-A53 based Raspberry Pi. So implement routines that use these instructions if available, and fall back to the existing generic routines otherwise. The selection is based on alternatives patching. Note that this unconditionally selects CONFIG_CRC32 as a builtin. Since CRC32 is relied upon by core functionality such as CONFIG_OF_FLATTREE, this just codifies the status quo. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-06-21locking/atomics/arm64: Replace our atomic/lock bitop implementations with ↵Will Deacon2-77/+1
asm-generic The <asm-generic/bitops/{atomic,lock}.h> implementations are built around the atomic-fetch ops, which we implement efficiently for both LSE and LL/SC systems. Use that instead of our hand-rolled, out-of-line bitops.S. Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: yamada.masahiro@socionext.com Link: https://lore.kernel.org/lkml/1529412794-17720-9-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-21arm64: export tishift functions to modulesJason A. Donenfeld1-13/+2
Otherwise modules that use these arithmetic operations will fail to link. We accomplish this with the usual EXPORT_SYMBOL, which on most architectures goes in the .S file but the ARM64 maintainers prefer that insead it goes into arm64ksyms. While we're at it, we also fix this up to use SPDX, and I personally choose to relicense this as GPL2||BSD so that these symbols don't need to be export_symbol_gpl, so all modules can use the routines, since these are important general purpose compiler-generated function calls. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reported-by: PaX Team <pageexec@freemail.hu> Cc: stable@vger.kernel.org Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-27arm64: avoid instrumenting atomic_ll_sc.oMark Rutland1-0/+4
Our out-of-line atomics are built with a special calling convention, preventing pointless stack spilling, and allowing us to patch call sites with ARMv8.1 atomic instructions. Instrumentation inserted by the compiler may result in calls to functions not following this special calling convention, resulting in registers being unexpectedly clobbered, and various problems resulting from this. For example, if a kernel is built with KCOV and ARM64_LSE_ATOMICS, the compiler inserts calls to __sanitizer_cov_trace_pc in the prologues of the atomic functions. This has been observed to result in spurious cmpxchg failures, leading to a hang early on in the boot process. This patch avoids such issues by preventing instrumentation of our out-of-line atomics. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-06Merge tag 'devicetree-for-4.17' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree updates from Rob Herring: - Sync dtc to upstream version v1.4.6-9-gaadd0b65c987. This adds a bunch more warnings (hidden behind W=1). - Build dtc lexer and parser files instead of using shipped versions. - Rework overlay apply API to take an FDT as input and apply overlays in a single step. - Add a phandle lookup cache. This improves boot time by hundreds of msec on systems with large DT. - Add trivial mcp4017/18/19 potentiometers bindings. - Remove VLA stack usage in DT code. * tag 'devicetree-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (26 commits) of: unittest: fix an error code in of_unittest_apply_overlay() of: unittest: move misplaced function declaration of: unittest: Remove VLA stack usage of: overlay: Fix forgotten reference to of_overlay_apply() of: Documentation: Fix forgotten reference to of_overlay_apply() of: unittest: local return value variable related cleanups of: unittest: remove unneeded local return value variables dt-bindings: trivial: add various mcp4017/18/19 potentiometers of: unittest: fix an error test in of_unittest_overlay_8() of: cache phandle nodes to reduce cost of of_find_node_by_phandle() dt-bindings: rockchip-dw-mshc: use consistent clock names MAINTAINERS: Add linux/of_*.h headers to appropriate subsystems scripts: turn off some new dtc warnings by default scripts/dtc: Update to upstream version v1.4.6-9-gaadd0b65c987 scripts/dtc: generate lexer and parser during build instead of shipping powerpc: boot: add strrchr function of: overlay: do not include path in full_name of added nodes of: unittest: clean up changeset test arm64/efi: Make strrchr() available to the EFI namespace ARM: boot: add strrchr function ...
2018-03-06arm64: lse: Pass -fomit-frame-pointer to out-of-line ll/sc atomicsWill Deacon1-1/+2
In cases where x30 is used as a temporary in the out-of-line ll/sc atomics (e.g. atomic_fetch_add), the compiler tends to put out a full stackframe, which included pointing the x29 at the new frame. Since these things aren't traceable anyway, we can pass -fomit-frame-pointer to reduce the work when spilling. Since this is incompatible with -pg, we also remove that from the CFLAGS for this file. Signed-off-by: Will Deacon <will.deacon@arm.com>