summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-03-25kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGSAndrey Konovalov2-4/+16
Only define the ___GFP_SKIP_KASAN_POISON flag when CONFIG_KASAN_HW_TAGS is enabled. This patch it not useful by itself, but it prepares the code for additions of new KASAN-specific GFP patches. Link: https://lkml.kernel.org/r/44e5738a584c11801b2b8f1231898918efc8634a.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, vmalloc: unpoison VM_ALLOC pages after mappingAndrey Konovalov1-8/+22
Make KASAN unpoison vmalloc mappings after they have been mapped in when it's possible: for vmalloc() (indentified via VM_ALLOC) and vm_map_ram(). The reasons for this are: - For vmalloc() and vm_map_ram(): pages don't get unpoisoned in case mapping them fails. - For vmalloc(): HW_TAGS KASAN needs pages to be mapped to set tags via kasan_unpoison_vmalloc(). As a part of these changes, the return value of __vmalloc_node_range() is changed to area->addr. This is a non-functional change, as __vmalloc_area_node() returns area->addr anyway. Link: https://lkml.kernel.org/r/fcb98980e6fcd3c4be6acdcb5d6110898ef28548.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, vmalloc, arm64: mark vmalloc mappings as pgprot_taggedAndrey Konovalov3-0/+22
HW_TAGS KASAN relies on ARM Memory Tagging Extension (MTE). With MTE, a memory region must be mapped as MT_NORMAL_TAGGED to allow setting memory tags via MTE-specific instructions. Add proper protection bits to vmalloc() allocations. These allocations are always backed by page_alloc pages, so the tags will actually be getting set on the corresponding physical memory. Link: https://lkml.kernel.org/r/983fc33542db2f6b1e77b34ca23448d4640bbb9e.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, vmalloc: add vmalloc tagging for SW_TAGSAndrey Konovalov3-14/+22
Add vmalloc tagging support to SW_TAGS KASAN. - __kasan_unpoison_vmalloc() now assigns a random pointer tag, poisons the virtual mapping accordingly, and embeds the tag into the returned pointer. - __get_vm_area_node() (used by vmalloc() and vmap()) and pcpu_get_vm_areas() save the tagged pointer into vm_struct->addr (note: not into vmap_area->addr). This requires putting kasan_unpoison_vmalloc() after setup_vmalloc_vm[_locked](); otherwise the latter will overwrite the tagged pointer. The tagged pointer then is naturally propagateed to vmalloc() and vmap(). - vm_map_ram() returns the tagged pointer directly. As a result of this change, vm_struct->addr is now tagged. Enabling KASAN_VMALLOC with SW_TAGS is not yet allowed. Link: https://lkml.kernel.org/r/4a78f3c064ce905e9070c29733aca1dd254a74f1.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, arm64: reset pointer tags of vmapped stacksAndrey Konovalov2-6/+10
Once tag-based KASAN modes start tagging vmalloc() allocations, kernel stacks start getting tagged if CONFIG_VMAP_STACK is enabled. Reset the tag of kernel stack pointers after allocation in arch_alloc_vmap_stack(). For SW_TAGS KASAN, when CONFIG_KASAN_STACK is enabled, the instrumentation can't handle the SP register being tagged. For HW_TAGS KASAN, there's no instrumentation-related issues. However, the impact of having a tagged SP register needs to be properly evaluated, so keep it non-tagged for now. Note, that the memory for the stack allocation still gets tagged to catch vmalloc-into-stack out-of-bounds accesses. [andreyknvl@google.com: fix case when a stack is retrieved from cached_stacks] Link: https://lkml.kernel.org/r/f50c5f96ef896d7936192c888b0c0a7674e33184.1644943792.git.andreyknvl@google.com [dan.carpenter@oracle.com: remove unnecessary check in alloc_thread_stack_node()] Link: https://lkml.kernel.org/r/20220301080706.GB17208@kili Link: https://lkml.kernel.org/r/698c5ab21743c796d46c15d075b9481825973e34.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marco Elver <elver@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, fork: reset pointer tags of vmapped stacksAndrey Konovalov1-0/+2
Once tag-based KASAN modes start tagging vmalloc() allocations, kernel stacks start getting tagged if CONFIG_VMAP_STACK is enabled. Reset the tag of kernel stack pointers after allocation in alloc_thread_stack_node(). For SW_TAGS KASAN, when CONFIG_KASAN_STACK is enabled, the instrumentation can't handle the SP register being tagged. For HW_TAGS KASAN, there's no instrumentation-related issues. However, the impact of having a tagged SP register needs to be properly evaluated, so keep it non-tagged for now. Note, that the memory for the stack allocation still gets tagged to catch vmalloc-into-stack out-of-bounds accesses. Link: https://lkml.kernel.org/r/c6c96f012371ecd80e1936509ebcd3b07a5956f7.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, vmalloc: reset tags in vmalloc functionsAndrey Konovalov1-3/+9
In preparation for adding vmalloc support to SW/HW_TAGS KASAN, reset pointer tags in functions that use pointer values in range checks. vread() is a special case here. Despite the untagging of the addr pointer in its prologue, the accesses performed by vread() are checked. Instead of accessing the virtual mappings though addr directly, vread() recovers the physical address via page_address(vmalloc_to_page()) and acceses that. And as page_address() recovers the pointer tag, the accesses get checked. Link: https://lkml.kernel.org/r/046003c5f683cacb0ba18e1079e9688bb3dca943.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan: add wrappers for vmalloc hooksAndrey Konovalov2-5/+17
Add wrappers around functions that [un]poison memory for vmalloc allocations. These functions will be used by HW_TAGS KASAN and therefore need to be disabled when kasan=off command line argument is provided. This patch does no functional changes for software KASAN modes. Link: https://lkml.kernel.org/r/3b8728eac438c55389fb0f9a8a2145d71dd77487.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan: reorder vmalloc hooksAndrey Konovalov2-32/+31
Group functions that [de]populate shadow memory for vmalloc. Group functions that [un]poison memory for vmalloc. This patch does no functional changes but prepares KASAN code for adding vmalloc support to HW_TAGS KASAN. Link: https://lkml.kernel.org/r/aeef49eb249c206c4c9acce2437728068da74c28.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, vmalloc: drop outdated VM_KASAN commentAndrey Konovalov1-11/+0
The comment about VM_KASAN in include/linux/vmalloc.c is outdated. VM_KASAN is currently only used to mark vm_areas allocated for kernel modules when CONFIG_KASAN_VMALLOC is disabled. Drop the comment. Link: https://lkml.kernel.org/r/780395afea83a147b3b5acc36cf2e38f7f8479f9.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, x86, arm64, s390: rename functions for modules shadowAndrey Konovalov6-13/+13
Rename kasan_free_shadow to kasan_free_module_shadow and kasan_module_alloc to kasan_alloc_module_shadow. These functions are used to allocate/free shadow memory for kernel modules when KASAN_VMALLOC is not enabled. The new names better reflect their purpose. Also reword the comment next to their declaration to improve clarity. Link: https://lkml.kernel.org/r/36db32bde765d5d0b856f77d2d806e838513fe84.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan: define KASAN_VMALLOC_INVALID for SW_TAGSAndrey Konovalov1-1/+2
In preparation for adding vmalloc support to SW_TAGS KASAN, provide a KASAN_VMALLOC_INVALID definition for it. HW_TAGS KASAN won't be using this value, as it falls back onto page_alloc for poisoning freed vmalloc() memory. Link: https://lkml.kernel.org/r/1daaaafeb148a7ae8285265edc97d7ca07b6a07d.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan: clean up metadata byte definitionsAndrey Konovalov1-2/+5
Most of the metadata byte values are only used for Generic KASAN. Remove KASAN_KMALLOC_FREETRACK definition for !CONFIG_KASAN_GENERIC case, and put it along with other metadata values for the Generic mode under a corresponding ifdef. Link: https://lkml.kernel.org/r/ac11d6e9e007c95e472e8fdd22efb6074ef3c6d8.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, page_alloc: rework kasan_unpoison_pages call siteAndrey Konovalov1-7/+12
Rework the checks around kasan_unpoison_pages() call in post_alloc_hook(). The logical condition for calling this function is: - If a software KASAN mode is enabled, we need to mark shadow memory. - Otherwise, HW_TAGS KASAN is enabled, and it only makes sense to set tags if they haven't already been cleared by tag_clear_highpage(), which is indicated by init_tags. This patch concludes the changes for post_alloc_hook(). Link: https://lkml.kernel.org/r/0ecebd0d7ccd79150e3620ea4185a32d3dfe912f.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, page_alloc: move kernel_init_free_pages in post_alloc_hookAndrey Konovalov1-4/+8
Pull the kernel_init_free_pages() call in post_alloc_hook() out of the big if clause for better code readability. This also allows for more simplifications in the following patch. This patch does no functional changes. Link: https://lkml.kernel.org/r/a7a76456501eb37ddf9fca6529cee9555e59cdb1.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, page_alloc: move SetPageSkipKASanPoison in post_alloc_hookAndrey Konovalov1-3/+3
Pull the SetPageSkipKASanPoison() call in post_alloc_hook() out of the big if clause for better code readability. This also allows for more simplifications in the following patches. Also turn the kasan_has_integrated_init() check into the proper kasan_hw_tags_enabled() one. These checks evaluate to the same value, but logically skipping kasan poisoning has nothing to do with integrated init. Link: https://lkml.kernel.org/r/7214c1698b754ccfaa44a792113c95cc1f807c48.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, page_alloc: combine tag_clear_highpage calls in post_alloc_hookAndrey Konovalov1-16/+16
Move tag_clear_highpage() loops out of the kasan_has_integrated_init() clause as a code simplification. This patch does no functional changes. Link: https://lkml.kernel.org/r/587e3fc36358b88049320a89cc8dc6deaecb0cda.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, page_alloc: merge kasan_alloc_pages into post_alloc_hookAndrey Konovalov4-37/+16
Currently, the code responsible for initializing and poisoning memory in post_alloc_hook() is scattered across two locations: kasan_alloc_pages() hook for HW_TAGS KASAN and post_alloc_hook() itself. This is confusing. This and a few following patches combine the code from these two locations. Along the way, these patches do a step-by-step restructure the many performed checks to make them easier to follow. Replace the only caller of kasan_alloc_pages() with its implementation. As kasan_has_integrated_init() is only true when CONFIG_KASAN_HW_TAGS is enabled, moving the code does no functional changes. Also move init and init_tags variables definitions out of kasan_has_integrated_init() clause in post_alloc_hook(), as they have the same values regardless of what the if condition evaluates to. This patch is not useful by itself but makes the simplifications in the following patches easier to follow. Link: https://lkml.kernel.org/r/5ac7e0b30f5cbb177ec363ddd7878a3141289592.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, page_alloc: refactor init checks in post_alloc_hookAndrey Konovalov1-8/+10
Separate code for zeroing memory from the code clearing tags in post_alloc_hook(). This patch is not useful by itself but makes the simplifications in the following patches easier to follow. This patch does no functional changes. Link: https://lkml.kernel.org/r/2283fde963adfd8a2b29a92066f106cc16661a3c.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan: only apply __GFP_ZEROTAGS when memory is zeroedAndrey Konovalov1-1/+2
__GFP_ZEROTAGS should only be effective if memory is being zeroed. Currently, hardware tag-based KASAN violates this requirement. Fix by including an initialization check along with checking for __GFP_ZEROTAGS. Link: https://lkml.kernel.org/r/f4f4593f7f675262d29d07c1938db5bd0cd5e285.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25mm: clarify __GFP_ZEROTAGS commentAndrey Konovalov1-2/+4
__GFP_ZEROTAGS is intended as an optimization: if memory is zeroed during allocation, it's possible to set memory tags at the same time with little performance impact. Clarify this intention of __GFP_ZEROTAGS in the comment. Link: https://lkml.kernel.org/r/cdffde013973c5634a447513e10ec0d21e8eee29.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan: drop skip_kasan_poison variable in free_pages_prepareAndrey Konovalov1-2/+1
skip_kasan_poison is only used in a single place. Call should_skip_kasan_poison() directly for simplicity. Link: https://lkml.kernel.org/r/1d33212e79bc9ef0b4d3863f903875823e89046f.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Suggested-by: Marco Elver <elver@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, page_alloc: init memory of skipped pages on freeAndrey Konovalov1-3/+8
Since commit 7a3b83537188 ("kasan: use separate (un)poison implementation for integrated init"), when all init, kasan_has_integrated_init(), and skip_kasan_poison are true, free_pages_prepare() doesn't initialize the page. This is wrong. Fix it by remembering whether kasan_poison_pages() performed initialization, and call kernel_init_free_pages() if it didn't. Reordering kasan_poison_pages() and kernel_init_free_pages() is OK, since kernel_init_free_pages() can handle poisoned memory. Link: https://lkml.kernel.org/r/1d97df75955e52727a3dc1c4e33b3b50506fc3fd.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, page_alloc: simplify kasan_poison_pages call siteAndrey Konovalov1-13/+5
Simplify the code around calling kasan_poison_pages() in free_pages_prepare(). This patch does no functional changes. Link: https://lkml.kernel.org/r/ae4f9bcf071577258e786bcec4798c145d718c46.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, page_alloc: merge kasan_free_pages into free_pages_prepareAndrey Konovalov4-22/+5
Currently, the code responsible for initializing and poisoning memory in free_pages_prepare() is scattered across two locations: kasan_free_pages() for HW_TAGS KASAN and free_pages_prepare() itself. This is confusing. This and a few following patches combine the code from these two locations. Along the way, these patches also simplify the performed checks to make them easier to follow. Replaces the only caller of kasan_free_pages() with its implementation. As kasan_has_integrated_init() is only true when CONFIG_KASAN_HW_TAGS is enabled, moving the code does no functional changes. This patch is not useful by itself but makes the simplifications in the following patches easier to follow. Link: https://lkml.kernel.org/r/303498d15840bb71905852955c6e2390ecc87139.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, page_alloc: move tag_clear_highpage out of kernel_init_free_pagesAndrey Konovalov1-11/+13
Currently, kernel_init_free_pages() serves two purposes: it either only zeroes memory or zeroes both memory and memory tags via a different code path. As this function has only two callers, each using only one code path, this behaviour is confusing. Pull the code that zeroes both memory and tags out of kernel_init_free_pages(). As a result of this change, the code in free_pages_prepare() starts to look complicated, but this is improved in the few following patches. Those improvements are not integrated into this patch to make diffs easier to read. This patch does no functional changes. Link: https://lkml.kernel.org/r/7719874e68b23902629c7cf19f966c4fd5f57979.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25kasan, page_alloc: deduplicate should_skip_kasan_poisonAndrey Konovalov1-22/+33
Patch series "kasan, vmalloc, arm64: add vmalloc tagging support for SW/HW_TAGS", v6. This patchset adds vmalloc tagging support for SW_TAGS and HW_TAGS KASAN modes. About half of patches are cleanups I went for along the way. None of them seem to be important enough to go through stable, so I decided not to split them out into separate patches/series. The patchset is partially based on an early version of the HW_TAGS patchset by Vincenzo that had vmalloc support. Thus, I added a Co-developed-by tag into a few patches. SW_TAGS vmalloc tagging support is straightforward. It reuses all of the generic KASAN machinery, but uses shadow memory to store tags instead of magic values. Naturally, vmalloc tagging requires adding a few kasan_reset_tag() annotations to the vmalloc code. HW_TAGS vmalloc tagging support stands out. HW_TAGS KASAN is based on Arm MTE, which can only assigns tags to physical memory. As a result, HW_TAGS KASAN only tags vmalloc() allocations, which are backed by page_alloc memory. It ignores vmap() and others. This patch (of 39): Currently, should_skip_kasan_poison() has two definitions: one for when CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, one for when it's not. Instead of duplicating the checks, add a deferred_pages_enabled() helper and use it in a single should_skip_kasan_poison() definition. Also move should_skip_kasan_poison() closer to its caller and clarify all conditions in the comment. Link: https://lkml.kernel.org/r/cover.1643047180.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/658b79f5fb305edaf7dc16bc52ea870d3220d4a8.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25mm/migration: add trace events for base page and HugeTLB migrationsAnshuman Khandual4-2/+40
This adds two trace events for base page and HugeTLB page migrations. These events, closely follow the implementation details like setting and removing of PTE migration entries, which are essential operations for migration. The new CREATE_TRACE_POINTS in <mm/rmap.c> covers both <events/migration.h> and <events/tlb.h> based trace events. Hence drop redundant CREATE_TRACE_POINTS from other places which could have otherwise conflicted during build. Link: https://lkml.kernel.org/r/1643368182-9588-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25mm/migration: add trace events for THP migrationsAnshuman Khandual3-1/+32
Patch series "mm/migration: Add trace events", v3. This adds trace events for all migration scenarios including base page, THP and HugeTLB. This patch (of 3): This adds two trace events for PMD based THP migration without split. These events closely follow the implementation details like setting and removing of PMD migration entries, which are essential operations for THP migration. This moves CREATE_TRACE_POINTS into generic THP from powerpc for these new trace events to be available on other platforms as well. Link: https://lkml.kernel.org/r/1643368182-9588-1-git-send-email-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/1643368182-9588-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25mm/thp: fix NR_FILE_MAPPED accounting in page_*_file_rmap()Hugh Dickins1-17/+14
NR_FILE_MAPPED accounting in mm/rmap.c (for /proc/meminfo "Mapped" and /proc/vmstat "nr_mapped" and the memcg's memory.stat "mapped_file") is slightly flawed for file or shmem huge pages. It is well thought out, and looks convincing, but there's a racy case when the careful counting in page_remove_file_rmap() (without page lock) gets discarded. So that in a workload like two "make -j20" kernel builds under memory pressure, with cc1 on hugepage text, "Mapped" can easily grow by a spurious 5MB or more on each iteration, ending up implausibly bigger than most other numbers in /proc/meminfo. And, hypothetically, might grow to the point of seriously interfering in mm/vmscan.c's heuristics, which do take NR_FILE_MAPPED into some consideration. Fixed by moving the __mod_lruvec_page_state() down to where it will not be missed before return (and I've grown a bit tired of that oft-repeated but-not-everywhere comment on the __ness: it gets lost in the move here). Does page_add_file_rmap() need the same change? I suspect not, because page lock is held in all relevant cases, and its skipping case looks safe; but it's much easier to be sure, if we do make the same change. Link: https://lkml.kernel.org/r/e02e52a1-8550-a57c-ed29-f51191ea2375@google.com Fixes: dd78fedde4b9 ("rmap: support file thp") Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25mm: filemap_unaccount_folio() large skip mapcount fixupHugh Dickins1-13/+13
The page_mapcount_reset() when folio_mapped() while mapping_exiting() was devised long before there were huge or compound pages in the cache. It is still valid for small pages, but not at all clear what's right to check and reset on large pages. Just don't try when folio_test_large(). Link: https://lkml.kernel.org/r/879c4426-4122-da9c-1a86-697f2c9a083@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25mm: delete __ClearPageWaiters()Hugh Dickins4-22/+9
The PG_waiters bit is not included in PAGE_FLAGS_CHECK_AT_FREE, and vmscan.c's free_unref_page_list() callers rely on that not to generate bad_page() alerts. So __page_cache_release(), put_pages_list() and release_pages() (and presumably copy-and-pasted free_zone_device_page()) are redundant and misleading to make a special point of clearing it (as the "__" implies, it could only safely be used on the freeing path). Delete __ClearPageWaiters(). Remark on this in one of the "possible" comments in folio_wake_bit(), and delete the superfluous comments. Link: https://lkml.kernel.org/r/3eafa969-5b1a-accf-88fe-318784c791a@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Tested-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25selftest/vm: add helpers to detect PAGE_SIZE and PAGE_SHIFTMike Rapoport2-4/+26
PAGE_SIZE is not 4096 in many configurations, particularly ppc64 uses 64K pages in majority of cases. Add helpers to detect PAGE_SIZE and PAGE_SHIFT dynamically. Without this tests are broken w.r.t reading /proc/self/pagemap if (pread(pagemap_fd, ent, sizeof(ent), (uintptr_t)ptr >> (PAGE_SHIFT - 3)) != sizeof(ent)) err(2, "read pagemap"); Link: https://lkml.kernel.org/r/20220307054355.149820-2-aneesh.kumar@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25selftest/vm: add util.h and and move helper functions thereAneesh Kumar K.V3-75/+52
Avoid code duplication by adding util.h. No functional change in this patch. Link: https://lkml.kernel.org/r/20220307054355.149820-1-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25mm: unexport page_init_poisonChristoph Hellwig1-1/+0
page_init_poison is only used in core MM code, so unexport it. Link: https://lkml.kernel.org/r/20220207063446.1833404-1-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25tools/vm/page_owner_sort.c: support for user-defined culling rulesJiajian Ye2-22/+157
When viewing page owner information, we may want to cull blocks of information with our own rules. So it is important to enhance culling function to provide the support for customizing culling rules. Therefore, following adjustments are made: 1. Add --cull option to support the culling of blocks of information with user-defined culling rules. ./page_owner_sort <input> <output> --cull=<rules> ./page_owner_sort <input> <output> --cull <rules> <rules> is a single argument in the form of a comma-separated list to specify individual culling rules, by the sequence of keys k1,k2, .... Mixed use of abbreviated and complete-form of keys is allowed. For reference, please see the document(Documentation/vm/page_owner.rst). Now, assuming two blocks in the input file are as follows: Page allocated via order 0, mask xxxx, pid 1, tgid 1 (task_name_demo) PFN xxxx prep_new_page+0xd0/0xf8 get_page_from_freelist+0x4a0/0x1290 __alloc_pages+0x168/0x340 alloc_pages+0xb0/0x158 Page allocated via order 0, mask xxxx, pid 32, tgid 32 (task_name_demo) PFN xxxx prep_new_page+0xd0/0xf8 get_page_from_freelist+0x4a0/0x1290 __alloc_pages+0x168/0x340 alloc_pages+0xb0/0x158 If we want to cull the blocks by stacktrace and task command name, we can use this command: ./page_owner_sort <input> <output> --cull=stacktrace,name The output would be like: 2 times, 2 pages, task_comm_name: task_name_demo prep_new_page+0xd0/0xf8 get_page_from_freelist+0x4a0/0x1290 __alloc_pages+0x168/0x340 alloc_pages+0xb0/0x158 As we can see, these two blocks are culled successfully, for they share the same pid and task command name. However, if we want to cull the blocks by pid, stacktrace and task command name, we can this command: ./page_owner_sort <input> <output> --cull=stacktrace,name,pid The output would be like: 1 times, 1 pages, PID 1, task_comm_name: task_name_demo prep_new_page+0xd0/0xf8 get_page_from_freelist+0x4a0/0x1290 __alloc_pages+0x168/0x340 alloc_pages+0xb0/0x158 1 times, 1 pages, PID 32, task_comm_name: task_name_demo prep_new_page+0xd0/0xf8 get_page_from_freelist+0x4a0/0x1290 __alloc_pages+0x168/0x340 alloc_pages+0xb0/0x158 As we can see, these two blocks are failed to cull, for their PIDs are different. 2. Add explanations of --cull options to the document. This work is coauthored by Yixuan Cao Shenghong Han Yinan Zhang Chongxi Zhao Yuhong Feng Link: https://lkml.kernel.org/r/20220312145834.624-1-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Sean Anderson <seanga2@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25tools/vm/page_owner_sort.c: support for selecting by PID, TGID or task ↵Jiajian Ye2-27/+98
command name When viewing page owner information, we may also need to select the blocks by PID, TGID or task command name, which helps to get more accurate page allocation information as needed. Therefore, following adjustments are made: 1. Add three new options, including --pid, --tgid and --name, to support the selection of information blocks by a specific pid, tgid and task command name. In addtion, multiple options are allowed to be used at the same time. ./page_owner_sort [input] [output] --pid <PID> ./page_owner_sort [input] [output] --tgid <TGID> ./page_owner_sort [input] [output] --name <TASK_COMMAND_NAME> Assuming a scenario when a multi-threaded program, ./demo (PID = 5280), is running, and ./demo creates a child process (PID = 5281). $ps PID TTY TIME CMD 5215 pts/0 00:00:00 bash 5280 pts/0 00:00:00 ./demo 5281 pts/0 00:00:00 ./demo 5282 pts/0 00:00:00 ps It would be better to filter out the records with tgid=5280 and the task name "demo" when debugging the parent process, and the specific usage is ./page_owner_sort [input] [output] --tgid 5280 --name demo 2. Add explanations of three new options, including --pid, --tgid and --name, to the document. This work is coauthored by Shenghong Han <hanshenghong2019@email.szu.edu.cn>, Yixuan Cao <caoyixuan2019@email.szu.edu.cn>, Yinan Zhang <zhangyinan2019@email.szu.edu.cn>, Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>, Yuhong Feng <yuhongf@szu.edu.cn>. Link: https://lkml.kernel.org/r/1646835223-7584-1-git-send-email-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Sean Anderson <seanga2@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Zhenliang Wei <weizhenliang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25tools/vm/page_owner_sort: support for sorting by task command nameJiajian Ye2-1/+35
When viewing page owner information, we may also need to the block to be sorted by task command name. Therefore, the following adjustments are made: 1. Add a member variable to record task command name of block. 2. Add a new -n option to sort the information of blocks by task command name. 3. Add -n option explanation in the document. Link: https://lkml.kernel.org/r/20220306030640.43054-2-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Sean Anderson <seanga2@gmail.com> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Zhenliang Wei <weizhenliang@huawei.com> Cc: <zhaochongxi2019@email.szu.edu.cn> Cc: <hanshenghong2019@email.szu.edu.cn> Cc: <zhangyinan2019@email.szu.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25tools/vm/page_owner_sort: fix three trivival placesJiajian Ye1-18/+19
The following adjustments are made: 1. Instead of using another array to cull the blocks after sorting, reuse the old array. So there is no need to malloc a new array. 2. When enabling '-f' option to filter out the blocks which have been released, only add those have not been released in the list, rather than add all of blocks in the list and then do the filtering when printing the result. 3. When enabling '-c' option to cull the blocks by comparing stacktrace, print the stacetrace rather than the total block. Link: https://lkml.kernel.org/r/20220306030640.43054-1-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: <hanshenghong2019@email.szu.edu.cn> Cc: Sean Anderson <seanga2@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: <zhangyinan2019@email.szu.edu.cn> Cc: Zhenliang Wei <weizhenliang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25tools/vm/page_owner_sort.c: support sorting by tgid and update documentationJiajian Ye2-3/+38
When the "page owner" information is read, the information sorted by TGID is expected. As a result, the following adjustments have been made: 1. Add a new -P option to sort the information of blocks by TGID in ascending order. 2. Adjust the order of member variables in block_list strust to avoid one 4 byte hole. 3. Add -P option explanation in the document. Link: https://lkml.kernel.org/r/20220301151438.166118-3-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Zhenliang Wei <weizhenliang@huawei.com> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25tools/vm/page_owner_sort.c: add a security checkJiajian Ye1-0/+6
Add a security check after using malloc() to allocate memory. Link: https://lkml.kernel.org/r/20220301151438.166118-2-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Zhenliang Wei <weizhenliang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25tools/vm/page_owner_sort.c: fix commentsJiajian Ye1-2/+2
Two adjustments are made: 1. Correct a grammatical error: replace the "what" in "Do the job what you want to debug" with "that". 2. Replace "has not been" with "has been" in the description of the -f option: According to Commit b1c9ba071e7d ("tools/vm/page_owner_sort.c: fix the instructions for use"), the description of the "-f" option is "Filter out the information of blocks whose memory has been released." Link: https://lkml.kernel.org/r/20220301151438.166118-1-yejiajian2018@email.szu.edu.cn Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Zhenliang Wei <weizhenliang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25tools/vm/page_owner_sort.c: fix the instructions for useYixuan Cao1-1/+1
I noticed a discrepancy between the usage method and the code logic. If we enable the -f option, it should be "Filter out the information of blocks whose memory has been released". Link: https://lkml.kernel.org/r/20220219143106.2805-1-caoyixuan2019@email.szu.edu.cn Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Sean Anderson <seanga2@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Zhenliang Wei <weizhenliang@huawei.com> Cc: Tang Bin <tangbin@cmss.chinamobile.com> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25mm/page_owner.c: record tgidYixuan Cao1-6/+9
In a single-threaded process, the pid in kernel task_struct is the same as the tgid, which can mark the process of page allocation. But in a multithreaded process, only the task_struct of the thread leader has the same pid as tgid, and the pids of other threads are different from tgid. Therefore, tgid is recorded to provide effective information for debugging and data statistics of multithreaded programs. This can also be achieved by observing the task name (executable file name) for a specific process. However, when the same program is started multiple times, the task name is the same and the tgid is different. Therefore, in the debugging of multi-threaded programs, combined with the task name and tgid, more accurate runtime information of a certain run of the program can be obtained. Link: https://lkml.kernel.org/r/20220219180450.2399-1-caoyixuan2019@email.szu.edu.cn Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Waiman Long <longman@redhat.com> Cc: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25mm/page_owner: record task command nameWaiman Long1-4/+10
The page_owner information currently includes the pid of the calling task. That is useful as long as the task is still running. Otherwise, the number is meaningless. To have more information about the allocating tasks that had exited by the time the page_owner information is retrieved, we need to store the command name of the task. Add a new comm field into page_owner structure to store the command name and display it when the page_owner information is retrieved. Link: https://lkml.kernel.org/r/20220202203036.744010-5-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25mm/page_owner: print memcg informationWaiman Long1-0/+42
It was found that a number of offline memcgs were not freed because they were pinned by some charged pages that were present. Even "echo 1 > /proc/sys/vm/drop_caches" wasn't able to free those pages. These offline but not freed memcgs tend to increase in number over time with the side effect that percpu memory consumption as shown in /proc/meminfo also increases over time. In order to find out more information about those pages that pin offline memcgs, the page_owner feature is extended to print memory cgroup information especially whether the cgroup is offline or not. RCU read lock is taken when memcg is being accessed to make sure that it won't be freed. Link: https://lkml.kernel.org/r/20220202203036.744010-4-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25mm/page_owner: use scnprintf() to avoid excessive buffer overrun checkWaiman Long1-11/+3
The snprintf() function can return a length greater than the given input size. That will require a check for buffer overrun after each invocation of snprintf(). scnprintf(), on the other hand, will never return a greater length. By using scnprintf() in selected places, we can avoid some buffer overrun checks except after stack_depot_snprint() and after the last snprintf(). Link: https://lkml.kernel.org/r/20220202203036.744010-3-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25lib/vsprintf: avoid redundant work with 0 sizeWaiman Long1-3/+5
Patch series "mm/page_owner: Extend page_owner to show memcg information", v4. While debugging the constant increase in percpu memory consumption on a system that spawned large number of containers, it was found that a lot of offline mem_cgroup structures remained in place without being freed. Further investigation indicated that those mem_cgroup structures were pinned by some pages. In order to find out what those pages are, the existing page_owner debugging tool is extended to show memory cgroup information and whether those memcgs are offline or not. With the enhanced page_owner tool, the following is a typical page that pinned the mem_cgroup structure in my test case: Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 162970 (podman), ts 1097761405537 ns, free_ts 1097760838089 ns PFN 1925700 type Movable Block 3761 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff) prep_new_page+0xac/0xe0 get_page_from_freelist+0x1327/0x14d0 __alloc_pages+0x191/0x340 alloc_pages_vma+0x84/0x250 shmem_alloc_page+0x3f/0x90 shmem_alloc_and_acct_page+0x76/0x1c0 shmem_getpage_gfp+0x281/0x940 shmem_write_begin+0x36/0xe0 generic_perform_write+0xed/0x1d0 __generic_file_write_iter+0xdc/0x1b0 generic_file_write_iter+0x5d/0xb0 new_sync_write+0x11f/0x1b0 vfs_write+0x1ba/0x2a0 ksys_write+0x59/0xd0 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Charged to offline memcg libpod-conmon-15e4f9c758422306b73b2dd99f9d50a5ea53cbb16b4a13a2c2308a4253cc0ec8. So the page was not freed because it was part of a shmem segment. That is useful information that can help users to diagnose similar problems. With cgroup v1, /proc/cgroups can be read to find out the total number of memory cgroups (online + offline). With cgroup v2, the cgroup.stat of the root cgroup can be read to find the number of dying cgroups (most likely pinned by dying memcgs). The page_owner feature is not supposed to be enabled for production system due to its memory overhead. However, if it is suspected that dying memcgs are increasing over time, a test environment with page_owner enabled can then be set up with appropriate workload for further analysis on what may be causing the increasing number of dying memcgs. This patch (of 4): For *scnprintf(), vsnprintf() is always called even if the input size is 0. That is a waste of time, so just return 0 in this case. Note that vsnprintf() will never return -1 to indicate an error. So skipping the call to vsnprintf() when size is 0 will have no functional impact at all. Link: https://lkml.kernel.org/r/20220202203036.744010-1-longman@redhat.com Link: https://lkml.kernel.org/r/20220202203036.744010-2-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Ira Weiny <ira.weiny@intel.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25Documentation/vm/page_owner.rst: fix unexpected indentation warnsShuah Khan1-3/+3
Fix Unexpected indentation warns in page_owner: Documentation/vm/page_owner.rst:92: WARNING: Unexpected indentation. Documentation/vm/page_owner.rst:96: WARNING: Unexpected indentation. Documentation/vm/page_owner.rst:107: WARNING: Unexpected indentation. Link: https://lkml.kernel.org/r/20211215001929.47866-1-skhan@linuxfoundation.org Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-25Documentation/vm/page_owner.rst: update the documentationShenghong Han1-2/+21
Update the documentation of ``page_owner``. [akpm@linux-foundation.org: small grammatical tweaks] Link: https://lkml.kernel.org/r/20211214134736.2569-1-hanshenghong2019@email.szu.edu.cn Signed-off-by: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Georgi Djakov <georgi.djakov@linaro.org> Cc: Liam Mark <lmark@codeaurora.org> Cc: Tang Bin <tangbin@cmss.chinamobile.com> Cc: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Cc: Zhenliang Wei <weizhenliang@huawei.com> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>