summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide/mm
AgeCommit message (Collapse)AuthorFilesLines
2022-05-13Docs/admin-guide/mm/damon/reclaim: document 'commit_inputs' parameterSeongJae Park1-0/+11
This commit documents the new DAMON_RECLAIM parameter, 'commit_inputs' in its usage document. Link: https://lkml.kernel.org/r/20220429160606.127307-15-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13Docs/{ABI,admin-guide}/damon: Update for 'state' sysfs file input keyword, ↵SeongJae Park1-4/+5
'commit' This commit documents the newly added 'state' sysfs file input keyword, 'commit', which allows online tuning of DAMON contexts. Link: https://lkml.kernel.org/r/20220429160606.127307-13-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13Docs/{ABI,admin-guide}/damon: update for fixed virtual address ranges monitoringSeongJae Park1-3/+11
This commit documents the user space support of the newly added monitoring operations set for fixed virtual address ranges monitoring, namely 'fvaddr', on the ABI and usage documents for DAMON. Link: https://lkml.kernel.org/r/20220426231750.48822-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13Docs/{ABI,admin-guide}/damon: document 'avail_operations' sysfs fileSeongJae Park1-6/+12
This commit updates the DAMON ABI and usage documents for the new sysfs file, 'avail_operations'. Link: https://lkml.kernel.org/r/20220426203843.45238-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/vmstat: add events for ksm cowYang Yang1-0/+18
Users may use ksm by calling madvise(, , MADV_MERGEABLE) when they want to save memory, it's a tradeoff by suffering delay on ksm cow. Users can get to know how much memory ksm saved by reading /sys/kernel/mm/ksm/pages_sharing, but they don't know what's the costs of ksm cow, and this is important of some delay sensitive tasks. So add ksm cow events to help users evaluate whether or how to use ksm. Also update Documentation/admin-guide/mm/ksm.rst with new added events. Link: https://lkml.kernel.org/r/20220331035616.2390805-1-yang.yang29@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Saravanan D <saravanand@fb.com> Cc: Minchan Kim <minchan@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*Muchun Song1-1/+1
The word of "free" is not expressive enough to express the feature of optimizing vmemmap pages associated with each HugeTLB, rename this keywork to "optimize". In this patch , cheanup configs to make code more expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-03-23Docs/admin-guide/mm/damon/usage: document DAMON sysfs interfaceSeongJae Park1-6/+344
This commit adds detailed usage of DAMON sysfs interface in the admin-guide document for DAMON. Link: https://lkml.kernel.org/r/20220228081314.5770-13-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23Docs/damon: update outdated term 'regions update interval'SeongJae Park1-3/+3
Before DAMON is merged in the mainline, the concept of 'regions update interval' has generalized to be used as the time interval for update of any monitoring operations related data structure, but the document has not updated properly. This commit updates the document for better consistency. Link: https://lkml.kernel.org/r/20220222170100.17068-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23Docs/admin-guide/mm/damon/usage: update for changed initail_regions file inputSeongJae Park1-10/+14
A previous commit made init_regions debugfs file to use target index instead of target id for specifying the target of the init regions. This commit updates the usage document to reflect the change. Link: https://lkml.kernel.org/r/20211230100723.2238-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23mm/zswap.c: allow handling just same-value filled pagesMaciej S. Szmigiero1-3/+19
Zswap has an ability to efficiently store same-value filled pages, which can be turned on and off using the "same_filled_pages_enabled" parameter. However, there is currently no way to enable just this (lightweight) functionality, while not making use of the whole compressed page storage machinery. Add a "non_same_filled_pages_enabled" parameter which allows disabling handling of pages that aren't same-value filled. This way zswap can be run in such lightweight same-value filled pages only mode. Link: https://lkml.kernel.org/r/7dbafa963e8bab43608189abbe2067f4b9287831.1641247624.git.maciej.szmigiero@oracle.com Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-05proc: fix documentation and description of pagemapYun Zhou1-1/+1
Since bit 57 was exported for uffd-wp write-protected (commit fb8e37f35a2f: "mm/pagemap: export uffd-wp protection information"), fixing it can reduce some unnecessary confusion. Link: https://lkml.kernel.org/r/20220301044538.3042713-1-yun.zhou@windriver.com Fixes: fb8e37f35a2fe1 ("mm/pagemap: export uffd-wp protection information") Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com> Cc: Florian Schmidt <florian.schmidt@nutanix.com> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Cc: SeongJae Park <sj@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Colin Cross <ccross@google.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15Docs/admin-guide/mm/damon/usage: update for schemes statisticsSeongJae Park1-4/+5
This updates DAMON debugfs interface for statistics of schemes successfully applied regions and time/space quota limit exceeds counts. Link: https://lkml.kernel.org/r/20211210150016.35349-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15Docs/admin-guide/mm/damon/reclaim: document statistics parametersSeongJae Park1-0/+25
This adds descriptions for the DAMON_RECLAIM statistics parameters. Link: https://lkml.kernel.org/r/20211210150016.35349-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contextsSeongJae Park1-3/+49
The DAMON debugfs usage document is missing descriptions for 'kdamond_pid', 'mk_contexts', and 'rm_contexts' debugfs files. This commit adds those. Link: https://lkml.kernel.org/r/20211209131806.19317-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginningSeongJae Park1-1/+8
To get detailed monitoring results from the user space, users need to use the damon_aggregated tracepoint. This commit adds a brief mention of it at the beginning of the usage document. Link: https://lkml.kernel.org/r/20211209131806.19317-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15Docs/admin-guide/mm/damon/usage: remove redundant informationSeongJae Park1-22/+22
DAMON usage document mentions DAMON user space tool and programming interface twice. This commit integrates those and remove unnecessary part. Link: https://lkml.kernel.org/r/20211209131806.19317-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarksSeongJae Park1-25/+98
DAMOS features including time/space quota limits and watermarks are not described in the DAMON debugfs interface document. This commit updates the document for the features. Link: https://lkml.kernel.org/r/20211209131806.19317-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15mm/mempolicy: add set_mempolicy_home_node syscallAneesh Kumar K.V1-1/+15
This syscall can be used to set a home node for the MPOL_BIND and MPOL_PREFERRED_MANY memory policy. Users should use this syscall after setting up a memory policy for the specified range as shown below. mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp, new_nodes->size + 1, 0); sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size, home_node, 0); The syscall allows specifying a home node/preferred node from which kernel will fulfill memory allocation requests first. For address range with MPOL_BIND memory policy, if nodemask specifies more than one node, page allocations will come from the node in the nodemask with sufficient free memory that is closest to the home node/preferred node. For MPOL_PREFERRED_MANY if the nodemask specifies more than one node, page allocation will come from the node in the nodemask with sufficient free memory that is closest to the home node/preferred node. If there is not enough memory in all the nodes specified in the nodemask, the allocation will be attempted from the closest numa node to the home node in the system. This helps applications to hint at a memory allocation preference node and fallback to _only_ a set of nodes if the memory is not available on the preferred node. Fallback allocation is attempted from the node which is nearest to the preferred node. This helps applications to have control on memory allocation numa nodes and avoids default fallback to slow memory NUMA nodes. For example a system with NUMA nodes 1,2 and 3 with DRAM memory and 10, 11 and 12 of slow memory new_nodes = numa_bitmask_alloc(nr_nodes); numa_bitmask_setbit(new_nodes, 1); numa_bitmask_setbit(new_nodes, 2); numa_bitmask_setbit(new_nodes, 3); p = mmap(NULL, nr_pages * page_size, protflag, mapflag, -1, 0); mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp, new_nodes->size + 1, 0); sys_set_mempolicy_home_node(p, nr_pages * page_size, 2, 0); This will allocate from nodes closer to node 2 and will make sure the kernel will only allocate from nodes 1, 2, and 3. Memory will not be allocated from slow memory nodes 10, 11, and 12. This differs from default MPOL_BIND behavior in that with default MPOL_BIND the allocation will be attempted from node closer to the local node. One of the reasons to specify a home node is to allow allocations from cpu less NUMA node and its nearby NUMA nodes. With MPOL_PREFERRED_MANY on the other hand will first try to allocate from the closest node to node 2 from the node list 1, 2 and 3. If those nodes don't have enough memory, kernel will allocate from slow memory node 10, 11 and 12 which ever is closer to node 2. Link: https://lkml.kernel.org/r/20211202123810.267175-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Ben Widawsky <ben.widawsky@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: <linux-api@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/admin-guide/mm/pagemap: wordsmith page flags descriptionsSeongJae Park1-26/+27
Some descriptions of page flags in 'pagemap.rst' are written in assumption of none-rst, which respects every new line, as below: 7 - SLAB page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator When compound page is used, SLUB/SLQB will only set this flag on the head Because rst ignores the new line between the first sentence and second sentence, resulting html looks a little bit weird, as below. 7 - SLAB page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator When ^ compound page is used, SLUB/SLQB will only set this flag on the head page; SLOB will not flag it at all. This change makes it more natural and consistent with other parts in the rendered version. Link: https://lkml.kernel.org/r/20211022090311.3856-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/admin-guide/mm/damon/start: simplify the contentSeongJae Park1-53/+60
Information in 'TL; DR' section of 'Getting Started' is duplicated in other parts of the doc. It is also asking readers to visit the access pattern visualizations gallery web site to show the results of example visualization commands, while the users of the commands can use terminal output. To make the doc simple, this removes the duplicated 'TL; DR' section and replaces the visualization example commands with versions using terminal outputs. Link: https://lkml.kernel.org/r/20211022090311.3856-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/admin-guide/mm/damon/start: fix a wrong linkSeongJae Park1-1/+3
The 'Getting Started' of DAMON is providing a link to DAMON's user interface document while saying about its user space tool's detailed usages. This fixes the link. Link: https://lkml.kernel.org/r/20211022090311.3856-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/admin-guide/mm/damon/start: fix wrong example commandsSeongJae Park1-7/+7
Patch series "Fix trivial nits in Documentation/admin-guide/mm". This patchset fixes trivial nits in admin guide documents for DAMON and pagemap. This patch (of 4): Some of the example commands in DAMON getting started guide are outdated, missing sudo, or just wrong. This fixes those. Link: https://lkml.kernel.org/r/20211022090311.3856-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIMSeongJae Park2-0/+236
This adds an admin-guide document for DAMON-based Reclamation. Link: https://lkml.kernel.org/r/20211019150731.16699-16-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Greg Thelen <gthelen@google.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/DAMON: document physical memory monitoring supportSeongJae Park1-5/+20
This updates the DAMON documents for the physical memory address space monitoring support. Link: https://lkml.kernel.org/r/20211012205711.29216-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rienjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Greg Thelen <gthelen@google.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/admin-guide/mm/damon: document 'init_regions' featureSeongJae Park1-2/+39
This adds description of the 'init_regions' feature in the DAMON usage document. Link: https://lkml.kernel.org/r/20211012205711.29216-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rienjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Greg Thelen <gthelen@google.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Docs/admin-guide/mm/damon: document DAMON-based Operation SchemesSeongJae Park2-2/+60
This adds the description of DAMON-based operation schemes in the DAMON documents. Link: https://lkml.kernel.org/r/20211001125604.29660-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Rienjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Greg Thelen <gthelen@google.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Markus Boehme <markubo@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Documentation/vm: move user guides to admin-guide/mm/SeongJae Park3-0/+234
Most memory management user guide documents are in 'admin-guide/mm/', but two of those are in 'vm/'. This moves the two docs into 'admin-guide/mm' for easier documents finding. Link: https://lkml.kernel.org/r/20210917123958.3819-2-sj@kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06memory-hotplug.rst: document the "auto-movable" online policyDavid Hildenbrand1-20/+121
Commit e83a437faa62 ("mm/memory_hotplug: introduce "auto-movable" online policy") introduced a new memory online policy to automatically select a zone for memory blocks to be onlined. It added a way to set the active online policy and tunables for the auto-movable online policy. Follow-up commits tweaked the "auto-movable" policy to also consider memory device details when selecting zones for memory blocks to be onlined. Let's document the new toggles and how the two online policies we have work. [david@redhat.com: updates] Link: https://lkml.kernel.org/r/20211011082058.6076-4-david@redhat.com Link: https://lkml.kernel.org/r/20210930144117.23641-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06memory-hotplug.rst: fix wrong /sys/module/memory_hotplug/parameters/ pathDavid Hildenbrand1-1/+1
We accidentially added a superfluous "s". Link: https://lkml.kernel.org/r/20210930144117.23641-3-david@redhat.com Fixes: ac3332c44767 ("memory-hotplug.rst: complete admin-guide overhaul") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06memory-hotplug.rst: fix two instances of "movablecore" that should be ↵David Hildenbrand1-2/+2
"movable_node" Patch series "memory-hotplug.rst: document the "auto-movable" online policy". Now that the memory-hotplug.rst overhaul is upstream, proper documentation for the "auto-movable" online policy, documenting all new toggles and options. Along, two fixes for the original overhaul. This patch (of 3): We really want to refer to the "movable_node" kernel command line parameter here. Link: https://lkml.kernel.org/r/20210930144117.23641-2-david@redhat.com Fixes: ac3332c44767 ("memory-hotplug.rst: complete admin-guide overhaul") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06hugetlbfs: extend the definition of hugepages parameter to support node ↵Zhenguo Yao1-1/+11
allocation We can specify the number of hugepages to allocate at boot. But the hugepages is balanced in all nodes at present. In some scenarios, we only need hugepages in one node. For example: DPDK needs hugepages which are in the same node as NIC. If DPDK needs four hugepages of 1G size in node1 and system has 16 numa nodes we must reserve 64 hugepages on the kernel cmdline. But only four hugepages are used. The others should be free after boot. If the system memory is low(for example: 64G), it will be an impossible task. So extend the hugepages parameter to support specifying hugepages on a specific node. For example add following parameter: hugepagesz=1G hugepages=0:1,1:3 It will allocate 1 hugepage in node0 and 3 hugepages in node1. Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zhenguo Yao <yaozhenguo1@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06hugetlb: add demote hugetlb page sysfs interfacesMike Kravetz1-2/+28
Patch series "hugetlb: add demote/split page functionality", v4. The concurrent use of multiple hugetlb page sizes on a single system is becoming more common. One of the reasons is better TLB support for gigantic page sizes on x86 hardware. In addition, hugetlb pages are being used to back VMs in hosting environments. When using hugetlb pages to back VMs, it is often desirable to preallocate hugetlb pools. This avoids the delay and uncertainty of allocating hugetlb pages at VM startup. In addition, preallocating huge pages minimizes the issue of memory fragmentation that increases the longer the system is up and running. In such environments, a combination of larger and smaller hugetlb pages are preallocated in anticipation of backing VMs of various sizes. Over time, the preallocated pool of smaller hugetlb pages may become depleted while larger hugetlb pages still remain. In such situations, it is desirable to convert larger hugetlb pages to smaller hugetlb pages. Converting larger to smaller hugetlb pages can be accomplished today by first freeing the larger page to the buddy allocator and then allocating the smaller pages. For example, to convert 50 GB pages on x86: gb_pages=`cat .../hugepages-1048576kB/nr_hugepages` m2_pages=`cat .../hugepages-2048kB/nr_hugepages` echo $(($gb_pages - 50)) > .../hugepages-1048576kB/nr_hugepages echo $(($m2_pages + 25600)) > .../hugepages-2048kB/nr_hugepages On an idle system this operation is fairly reliable and results are as expected. The number of 2MB pages is increased as expected and the time of the operation is a second or two. However, when there is activity on the system the following issues arise: 1) This process can take quite some time, especially if allocation of the smaller pages is not immediate and requires migration/compaction. 2) There is no guarantee that the total size of smaller pages allocated will match the size of the larger page which was freed. This is because the area freed by the larger page could quickly be fragmented. In a test environment with a load that continually fills the page cache with clean pages, results such as the following can be observed: Unexpected number of 2MB pages allocated: Expected 25600, have 19944 real 0m42.092s user 0m0.008s sys 0m41.467s To address these issues, introduce the concept of hugetlb page demotion. Demotion provides a means of 'in place' splitting of a hugetlb page to pages of a smaller size. This avoids freeing pages to buddy and then trying to allocate from buddy. Page demotion is controlled via sysfs files that reside in the per-hugetlb page size and per node directories. - demote_size Target page size for demotion, a smaller huge page size. File can be written to chose a smaller huge page size if multiple are available. - demote Writable number of hugetlb pages to be demoted To demote 50 GB huge pages, one would: cat .../hugepages-1048576kB/free_hugepages /* optional, verify free pages */ cat .../hugepages-1048576kB/demote_size /* optional, verify target size */ echo 50 > .../hugepages-1048576kB/demote Only hugetlb pages which are free at the time of the request can be demoted. Demotion does not add to the complexity of surplus pages and honors reserved huge pages. Therefore, when a value is written to the sysfs demote file, that value is only the maximum number of pages which will be demoted. It is possible fewer will actually be demoted. The recently introduced per-hstate mutex is used to synchronize demote operations with other operations that modify hugetlb pools. Real world use cases -------------------- The above scenario describes a real world use case where hugetlb pages are used to back VMs on x86. Both issues of long allocation times and not necessarily getting the expected number of smaller huge pages after a free and allocate cycle have been experienced. The occurrence of these issues is dependent on other activity within the host and can not be predicted. This patch (of 5): Two new sysfs files are added to demote hugtlb pages. These files are both per-hugetlb page size and per node. Files are: demote_size - The size in Kb that pages are demoted to. (read-write) demote - The number of huge pages to demote. (write-only) By default, demote_size is the next smallest huge page size. Valid huge page sizes less than huge page size may be written to this file. When huge pages are demoted, they are demoted to this size. Writing a value to demote will result in an attempt to demote that number of hugetlb pages to an appropriate number of demote_size pages. NOTE: Demote interfaces are only provided for huge page sizes if there is a smaller target demote huge page size. For example, on x86 1GB huge pages will have demote interfaces. 2MB huge pages will not have demote interfaces. This patch does not provide full demote functionality. It only provides the sysfs interfaces. It also provides documentation for the new interfaces. [mike.kravetz@oracle.com: n_mask initialization does not need to be protected by the mutex] Link: https://lkml.kernel.org/r/0530e4ef-2492-5186-f919-5db68edea654@oracle.com Link: https://lkml.kernel.org/r/20211007181918.136982-2-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: David Rientjes <rientjes@google.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Nghia Le <nghialm78@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Documentation: update pagemap with shmem exceptionsTiberiu A Georgescu1-0/+22
This patch follows the discussions on previous documentation patch threads [1][2]. It presents the exception case of shared memory management from the pagemap's point of view. It briefly describes what is missing, why it is missing and alternatives to the pagemap for page info retrieval in user space. In short, the kernel does not keep track of PTEs for swapped out shared pages within the processes that references them. Thus, the proc/pid/pagemap tool cannot print the swap destination of the shared memory pages, instead setting the pagemap entry to zero for both non-allocated and swapped out pages. This can create confusion for users who need information on swapped out pages. The reasons why maintaining the PTEs of all swapped out shared pages among all processes while maintaining similar performance is not a trivial task, or a desirable change, have been discussed extensively [1][3][4][5]. There are also arguments for why this arguably missing information should eventually be exposed to the user in either a future pagemap patch, or by an alternative tool. [1]: https://marc.info/?m=162878395426774 [2]: https://lore.kernel.org/lkml/20210920164931.175411-1-tiberiu.georgescu@nutanix.com/ [3]: https://lore.kernel.org/lkml/20210730160826.63785-1-tiberiu.georgescu@nutanix.com/ [4]: https://lore.kernel.org/lkml/20210807032521.7591-1-peterx@redhat.com/ [5]: https://lore.kernel.org/lkml/20210715201651.212134-1-peterx@redhat.com/ Mention the current missing information in the pagemap and alternatives on how to retrieve it, in case someone stumbles upon unexpected behaviour. Link: https://lkml.kernel.org/r/20210923064618.157046-1-tiberiu.georgescu@nutanix.com Link: https://lkml.kernel.org/r/20210923064618.157046-2-tiberiu.georgescu@nutanix.com Signed-off-by: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com> Reviewed-by: Ivan Teterevkov <ivan.teterevkov@nutanix.com> Reviewed-by: Florian Schmidt <florian.schmidt@nutanix.com> Reviewed-by: Carl Waldspurger <carl.waldspurger@nutanix.com> Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds5-345/+697
Merge more updates from Andrew Morton: "147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f. Subsystems affected by this patch series: mm (memory-hotplug, rmap, ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan), alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib, checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig, selftests, ipc, and scripts" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits) scripts: check_extable: fix typo in user error message mm/workingset: correct kernel-doc notations ipc: replace costly bailout check in sysvipc_find_ipc() selftests/memfd: remove unused variable Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH configs: remove the obsolete CONFIG_INPUT_POLLDEV prctl: allow to setup brk for et_dyn executables pid: cleanup the stale comment mentioning pidmap_init(). kernel/fork.c: unexport get_{mm,task}_exe_file coredump: fix memleak in dump_vma_snapshot() fs/coredump.c: log if a core dump is aborted due to changed file permissions nilfs2: use refcount_dec_and_lock() to fix potential UAF nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group nilfs2: fix NULL pointer in nilfs_##name##_attr_release nilfs2: fix memory leak in nilfs_sysfs_create_device_group trap: cleanup trap_init() init: move usermodehelper_enable() to populate_rootfs() ...
2021-09-08Documentation: add documents for DAMONSeongJae Park4-0/+242
This commit adds documents for DAMON under `Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`. Link: https://lkml.kernel.org/r/20210716081449.22187-11-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Reviewed-by: Fernand Sieber <sieberf@amazon.com> Reviewed-by: Markus Boehme <markubo@amazon.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Amit Shah <amit@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: David Woodhouse <dwmw@amazon.com> Cc: Fan Du <fan.du@intel.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Greg Thelen <gthelen@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leonard Foerster <foersleo@amazon.de> Cc: Marco Elver <elver@google.com> Cc: Maximilian Heyne <mheyne@amazon.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08memory-hotplug.rst: complete admin-guide overhaulDavid Hildenbrand1-306/+455
The memory hot(un)plug documentation is outdated and incomplete. Most of the content dates back to 2007, so it's time for a major overhaul. Let's rewrite, reorganize and update most parts of the documentation. In addition to memory hot(un)plug, also add some details regarding ZONE_MOVABLE, with memory hotunplug being one of its main consumers. Drop the file history, that information can more reliably be had from the git log. The style of the document is also properly fixed that e.g., "restview" renders it cleanly now. In the future, we might add some more details about virt users like virtio-mem, the XEN balloon, the Hyper-V balloon and ppc64 dlpar. Link: https://lkml.kernel.org/r/20210707073205.3835-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08memory-hotplug.rst: remove locking details from admin-guideDavid Hildenbrand1-39/+0
Patch series "memory-hotplug.rst: complete admin-guide overhaul", v3. This patch (of 2): We have the same content at Documentation/core-api/memory-hotplug.rst and it doesn't fit into the admin-guide. The documentation was accidentially duplicated when merging. Link: https://lkml.kernel.org/r/20210707073205.3835-1-david@redhat.com Link: https://lkml.kernel.org/r/20210707073205.3835-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03mm/mempolicy: advertise new MPOL_PREFERRED_MANYBen Widawsky1-4/+11
Adds a new mode to the existing mempolicy modes, MPOL_PREFERRED_MANY. MPOL_PREFERRED_MANY will be adequately documented in the internal admin-guide with this patch. Eventually, the man pages for mbind(2), get_mempolicy(2), set_mempolicy(2) and numactl(8) will also have text about this mode. Those shall contain the canonical reference. NUMA systems continue to become more prevalent. New technologies like PMEM make finer grain control over memory access patterns increasingly desirable. MPOL_PREFERRED_MANY allows userspace to specify a set of nodes that will be tried first when performing allocations. If those allocations fail, all remaining nodes will be tried. It's a straight forward API which solves many of the presumptive needs of system administrators wanting to optimize workloads on such machines. The mode will work either per VMA, or per thread. [Michal Hocko: refine kernel doc for MPOL_PREFERRED_MANY] Link: https://lore.kernel.org/r/20200630212517.308045-13-ben.widawsky@intel.com Link: https://lkml.kernel.org/r/1627970362-61305-5-git-send-email-feng.tang@intel.com Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01userfaultfd/shmem: advertise shmem minor fault supportAxel Rasmussen1-1/+2
Now that the feature is fully implemented (the faulting path hooks exist so userspace is notified, and the ioctl to resolve such faults is available), advertise this as a supported feature. Link: https://lkml.kernel.org/r/20210503180737.2487560-6-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Brian Geffon <bgeffon@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Wang Qing <wangqing@vivo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01mm/pagemap: export uffd-wp protection informationPeter Xu1-0/+2
Export the PTE/PMD status of uffd-wp to pagemap too. Link: https://lkml.kernel.org/r/20210428225030.9708-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Wang Qing <wangqing@vivo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01mm: hugetlb: add a kernel parameter hugetlb_free_vmemmapMuchun Song1-0/+3
Add a kernel parameter hugetlb_free_vmemmap to enable the feature of freeing unused vmemmap pages associated with each hugetlb page on boot. We disable PMD mapping of vmemmap pages for x86-64 arch when this feature is enabled. Because vmemmap_remap_free() depends on vmemmap being base page mapped. Link: https://lkml.kernel.org/r/20210510030027.56044-8-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Barry Song <song.bao.hua@hisilicon.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Tested-by: Chen Huang <chenhuang5@huawei.com> Tested-by: Bodeddula Balasubramaniam <bodeddub@amazon.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Neukum <oneukum@suse.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB pageMuchun Song2-0/+21
When we free a HugeTLB page to the buddy allocator, we need to allocate the vmemmap pages associated with it. However, we may not be able to allocate the vmemmap pages when the system is under memory pressure. In this case, we just refuse to free the HugeTLB page. This changes behavior in some corner cases as listed below: 1) Failing to free a huge page triggered by the user (decrease nr_pages). User needs to try again later. 2) Failing to free a surplus huge page when freed by the application. Try again later when freeing a huge page next time. 3) Failing to dissolve a free huge page on ZONE_MOVABLE via offline_pages(). This can happen when we have plenty of ZONE_MOVABLE memory, but not enough kernel memory to allocate vmemmmap pages. We may even be able to migrate huge page contents, but will not be able to dissolve the source huge page. This will prevent an offline operation and is unfortunate as memory offlining is expected to succeed on movable zones. Users that depend on memory hotplug to succeed for movable zones should carefully consider whether the memory savings gained from this feature are worth the risk of possibly not being able to offline memory in certain situations. 4) Failing to dissolve a huge page on CMA/ZONE_MOVABLE via alloc_contig_range() - once we have that handling in place. Mainly affects CMA and virtio-mem. Similar to 3). virito-mem will handle migration errors gracefully. CMA might be able to fallback on other free areas within the CMA region. Vmemmap pages are allocated from the page freeing context. In order for those allocations to be not disruptive (e.g. trigger oom killer) __GFP_NORETRY is used. hugetlb_lock is dropped for the allocation because a non sleeping allocation would be too fragile and it could fail too easily under memory pressure. GFP_ATOMIC or other modes to access memory reserves is not used because we want to prevent consuming reserves under heavy hugetlb freeing. [mike.kravetz@oracle.com: fix dissolve_free_huge_page use of tail/head page] Link: https://lkml.kernel.org/r/20210527231225.226987-1-mike.kravetz@oracle.com [willy@infradead.org: fix alloc_vmemmap_page_list documentation warning] Link: https://lkml.kernel.org/r/20210615200242.1716568-6-willy@infradead.org Link: https://lkml.kernel.org/r/20210510030027.56044-7-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Barry Song <song.bao.hua@hisilicon.com> Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Chen Huang <chenhuang5@huawei.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: HORIGUCHI NAOYA <naoya.horiguchi@nec.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Neukum <oneukum@suse.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05memory-hotplug.rst: add a note about ZONE_MOVABLE and page pinningPavel Tatashin1-0/+9
Document the special handling of page pinning when ZONE_MOVABLE present. Link: https://lkml.kernel.org/r/20210215161349.246722-11-pasha.tatashin@soleen.com Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: James Morris <jmorris@namei.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sashal@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tyler Hicks <tyhicks@linux.microsoft.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05userfaultfd: update documentation to describe minor fault handlingAxel Rasmussen1-41/+66
Reword / reorganize things a little bit into "lists", so new features / modes / ioctls can sort of just be appended. Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used to intercept and resolve minor faults. Make it clear that COPY and ZEROPAGE are used for MISSING faults, whereas CONTINUE is used for MINOR faults. Link: https://lkml.kernel.org/r/20210301222728.176417-6-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30mm/page_alloc: combine __alloc_pages and __alloc_pages_nodemaskMatthew Wilcox (Oracle)1-1/+1
There are only two callers of __alloc_pages() so prune the thicket of alloc_page variants by combining the two functions together. Current callers of __alloc_pages() simply add an extra 'NULL' parameter and current callers of __alloc_pages_nodemask() call __alloc_pages() instead. Link: https://lkml.kernel.org/r/20210225150642.2582252-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-31Documentation: Add leading slash to some pathsMark O'Donovan1-1/+1
Change multiple sys/xyz to /sys/xyz Signed-off-by: Mark O'Donovan <shiftee@posteo.net> Link: https://lore.kernel.org/r/20210328152837.73347-1-shiftee@posteo.net Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-02-26Documentation: sysfs/memory: clarify some memory block device propertiesDavid Hildenbrand1-8/+8
In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory blocks as removable") we changed the output of the "removable" property of memory devices to return "1" if and only if the kernel supports memory offlining. Let's update documentation, stating that the interface is legacy. Also update documentation of the "state" property and "valid_zones" properties. Link: https://lkml.kernel.org/r/20210201181347.13262-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26drivers/base/memory: don't store phys_device in memory blocksDavid Hildenbrand1-2/+2
No need to store the value for each and every memory block, as we can easily query the value at runtime. Reshuffle the members to optimize the memory layout. Also, let's clarify what the interface once was used for and why it's legacy nowadays. "phys_device" was used on s390x in older versions of lsmem[2]/chmem[3], back when they were still part of s390x-tools. They were later replaced by the variants in linux-utils. For example, RHEL6 and RHEL7 contain lsmem/chmem from s390-utils. RHEL8 switched to versions from util-linux on s390x [4]. "phys_device" was added with sysfs support for memory hotplug in commit 3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove functions") in 2005. It always returned 0. s390x started returning something != 0 on some setups (if sclp.rzm is set by HW) in 2010 via commit 57b552ba0b2f ("memory hotplug/s390: set phys_device"). For s390x, it allowed for identifying which memory block devices belong to the same storage increment (RZM). Only if all memory block devices comprising a single storage increment were offline, the memory could actually be removed in the hypervisor. Since commit e5d709bb5fb7 ("s390/memory hotplug: provide memory_block_size_bytes() function") in 2013 a memory block device spans at least one storage increment - which is why the interface isn't really helpful/used anymore (except by old lsmem/chmem tools). There were once RFC patches to make use of "phys_device" in ACPI context; however, the underlying problem could be solved using different interfaces [1]. [1] https://patchwork.kernel.org/patch/2163871/ [2] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/lsmem [3] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/chmem [4] https://bugzilla.redhat.com/show_bug.cgi?id=1504134 Link: https://lkml.kernel.org/r/20210201181347.13262-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Tom Rix <trix@redhat.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-01docs/mm: concepts.rst: Correct the threshold to low watermarkLiao Pingfang1-1/+1
It should be "low watermark" where we wake up kswapd daemon. Signed-off-by: Liao Pingfang <winndows@163.com> Link: https://lore.kernel.org/r/1609053319-3112-1-git-send-email-winndows@163.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-12-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-15/+0
Merge misc updates from Andrew Morton: - a few random little subsystems - almost all of the MM patches which are staged ahead of linux-next material. I'll trickle to post-linux-next work in as the dependents get merged up. Subsystems affected by this patch series: kthread, kbuild, ide, ntfs, ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation, kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction, oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc, uaccess, zram, and cleanups). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits) mm: cleanup kstrto*() usage mm: fix fall-through warnings for Clang mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at mm: shmem: convert shmem_enabled_show to use sysfs_emit_at mm:backing-dev: use sysfs_emit in macro defining functions mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening mm: use sysfs_emit for struct kobject * uses mm: fix kernel-doc markups zram: break the strict dependency from lzo zram: add stat to gather incompressible pages since zram set up zram: support page writeback mm/process_vm_access: remove redundant initialization of iov_r mm/zsmalloc.c: rework the list_add code in insert_zspage() mm/zswap: move to use crypto_acomp API for hardware acceleration mm/zswap: fix passing zero to 'PTR_ERR' warning mm/zswap: make struct kernel_param_ops definitions const userfaultfd/selftests: hint the test runner on required privilege userfaultfd/selftests: fix retval check for userfaultfd_open() userfaultfd/selftests: always dump something in modes userfaultfd: selftests: make __{s,u}64 format specifiers portable ...