author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-05-26 22:32:41 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-05-26 22:32:41 +0300 |
commit | 98931dd95fd489fcbfa97da563505a6f071d7c77 | |
tree | 44683fc4a92efa614acdca2742a7ff19d26da1e3 /Documentation/admin-guide | |
parent | df202b452fe6c6d6f1351bad485e2367ef1e644e | |
parent | f403f22f8ccb12860b2b62fec3173c6ccd45938b | |
Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Almost all of MM here. A few things are still getting finished off,
reviewed, etc.
- Yang Shi has improved the behaviour of khugepaged collapsing of
readonly file-backed transparent hugepages.
- Johannes Weiner has arranged for zswap memory use to be tracked and
managed on a per-cgroup basis.
- Muchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
runtime enablement of the recent huge page vmemmap optimization
feature.
- Baolin Wang contributes a series to fix some issues around hugetlb
pagetable invalidation.
- Zhenwei Pi has fixed some interactions between hwpoisoned pages and
virtualization.
- Tong Tiangen has enabled the use of the presently x86-only
page_table_check debugging feature on arm64 and riscv.
- David Vernet has done some fixup work on the memcg selftests.
- Peter Xu has taught userfaultfd to handle write protection faults
against shmem- and hugetlbfs-backed files.
- More DAMON development from SeongJae Park - adding online tuning of
the feature and support for monitoring of fixed virtual address
ranges. Also easier discovery of which monitoring operations are
available.
- Nadav Amit has done some optimization of TLB flushing during
mprotect().
- Neil Brown continues to labor away at improving our swap-over-NFS
support.
- David Hildenbrand has some fixes to anon page COWing versus
get_user_pages().
- Peng Liu fixed some errors in the core hugetlb code.
- Joao Martins has reduced the amount of memory consumed by
device-dax's compound devmaps.
- Some cleanups of the arch-specific pagemap code from Anshuman
Khandual.
- Muchun Song has found and fixed some errors in the TLB flushing of
transparent hugepages.
- Roman Gushchin has done more work on the memcg selftests.
... and, of course, many smaller fixes and cleanups. Notably, the
customary million cleanup serieses from Miaohe Lin"
* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
mm: kfence: use PAGE_ALIGNED helper
selftests: vm: add the "settings" file with timeout variable
selftests: vm: add "test_hmm.sh" to TEST_FILES
selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
selftests: vm: add migration to the .gitignore
selftests/vm/pkeys: fix typo in comment
ksm: fix typo in comment
selftests: vm: add process_mrelease tests
Revert "mm/vmscan: never demote for memcg reclaim"
mm/kfence: print disabling or re-enabling message
include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
mm: fix a potential infinite loop in start_isolate_page_range()
MAINTAINERS: add Muchun as co-maintainer for HugeTLB
zram: fix Kconfig dependency warning
mm/shmem: fix shmem folio swapoff hang
cgroup: fix an error handling path in alloc_pagecache_max_30M()
mm: damon: use HPAGE_PMD_SIZE
tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
nodemask.h: fix compilation error with GCC12
...
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r-- | Documentation/admin-guide/blockdev/zram.rst | 5 |
-rw-r--r-- | Documentation/admin-guide/cgroup-v2.rst | 49 |
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 10 |
-rw-r--r-- | Documentation/admin-guide/mm/damon/reclaim.rst | 11 |
-rw-r--r-- | Documentation/admin-guide/mm/damon/usage.rst | 41 |
-rw-r--r-- | Documentation/admin-guide/mm/hugetlbpage.rst | 2 |
-rw-r--r-- | Documentation/admin-guide/mm/ksm.rst | 18 |
-rw-r--r-- | Documentation/admin-guide/sysctl/vm.rst | 48 |
8 files changed, 165 insertions, 19 deletions
diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index 54fe63745ed8..c73b16930449 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -343,6 +343,11 @@ Admin can request writeback of those idle pages at right timing via::
 
 With the command, zram will writeback idle pages from memory to the storage.
 
+Additionally, if a user choose to writeback only huge and idle pages
+this can be accomplished with::
+
+	echo huge_idle > /sys/block/zramX/writeback
+
 If an admin wants to write a specific page in zram device to the backing device,
 they could write a page index into the interface.
 
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 38aa01939e1e..176298f2f4de 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1208,6 +1208,34 @@ PAGE_SIZE multiple when read back.
 	high limit is used and monitored properly, this limit's
 	utility is limited to providing the final safety net.
 
+  memory.reclaim
+	A write-only nested-keyed file which exists for all cgroups.
+
+	This is a simple interface to trigger memory reclaim in the
+	target cgroup.
+
+	This file accepts a single key, the number of bytes to reclaim.
+	No nested keys are currently supported.
+
+	Example::
+
+	  echo "1G" > memory.reclaim
+
+	The interface can be later extended with nested keys to
+	configure the reclaim behavior. For example, specify the
+	type of memory to reclaim from (anon, file, ..).
+
+	Please note that the kernel can over or under reclaim from
+	the target cgroup. If less bytes are reclaimed than the
+	specified amount, -EAGAIN is returned.
+
+  memory.peak
+	A read-only single value file which exists on non-root
+	cgroups.
+
+	The max memory usage recorded for the cgroup and its
+	descendants since the creation of the cgroup.
+
   memory.oom.group
 	A read-write single value file which exists on non-root
 	cgroups. The default value is "0".
@@ -1326,6 +1354,12 @@ PAGE_SIZE multiple when read back.
 		Amount of cached filesystem data that is swap-backed,
 		such as tmpfs, shm segments, shared anonymous mmap()s
 
+	  zswap
+		Amount of memory consumed by the zswap compression backend.
+
+	  zswapped
+		Amount of application memory swapped out to zswap.
+
 	  file_mapped
 		Amount of cached filesystem data mapped with mmap()
 
@@ -1516,6 +1550,21 @@ PAGE_SIZE multiple when read back.
 	higher than the limit for an extended period of time. This
 	reduces the impact on the workload and memory management.
 
+  memory.zswap.current
+	A read-only single value file which exists on non-root
+	cgroups.
+
+	The total amount of memory consumed by the zswap compression
+	backend.
+
+  memory.zswap.max
+	A read-write single value file which exists on non-root
+	cgroups. The default is "max".
+
+	Zswap usage hard limit. If a cgroup's zswap pool reaches this
+	limit, it will refuse to take any more stores before existing
+	entries fault back in or are written out to disk.
+
   memory.pressure
 	A read-only nested-keyed file.
 
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a9066cfb85a0..32073f873662 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1705,16 +1705,16 @@
 			boot-time allocation of gigantic hugepages is skipped.
 
 	hugetlb_free_vmemmap=
-			[KNL] Reguires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+			[KNL] Reguires CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 			enabled.
 			Allows heavy hugetlb users to free up some more
 			memory (7 * PAGE_SIZE for each 2MB hugetlb page).
-			Format: { on | off (default) }
+			Format: { [oO][Nn]/Y/y/1 | [oO][Ff]/N/n/0 (default) }
 
-			on: enable the feature
-			off: disable the feature
+			[oO][Nn]/Y/y/1: enable the feature
+			[oO][Ff]/N/n/0: disable the feature
 
-			Built with CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y,
+			Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
 			the default is on.
 
 			This is not compatible with memory_hotplug.memmap_on_memory.
diff --git a/Documentation/admin-guide/mm/damon/reclaim.rst b/Documentation/admin-guide/mm/damon/reclaim.rst
index 0af51a9705b1..46306f1f34b1 100644
--- a/Documentation/admin-guide/mm/damon/reclaim.rst
+++ b/Documentation/admin-guide/mm/damon/reclaim.rst
@@ -66,6 +66,17 @@ Setting it as ``N`` disables DAMON_RECLAIM. Note that DAMON_RECLAIM could do
 no real monitoring and reclamation due to the watermarks-based activation
 condition. Refer to below descriptions for the watermarks parameter for this.
 
+commit_inputs
+-------------
+
+Make DAMON_RECLAIM reads the input parameters again, except ``enabled``.
+
+Input parameters that updated while DAMON_RECLAIM is running are not applied
+by default. Once this parameter is set as ``Y``, DAMON_RECLAIM reads values
+of parametrs except ``enabled`` again. Once the re-reading is done, this
+parameter is set as ``N``. If invalid parameters are found while the
+re-reading, DAMON_RECLAIM will be disabled.
+
 min_age
 -------
 
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index 592ea9a50881..1bb7b72414b2 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -68,7 +68,7 @@ comma (","). ::
     │ kdamonds/nr_kdamonds
     │ │ 0/state,pid
     │ │ │ contexts/nr_contexts
-    │ │ │ │ 0/operations
+    │ │ │ │ 0/avail_operations,operations
     │ │ │ │ │ monitoring_attrs/
     │ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
     │ │ │ │ │ │ nr_regions/min,max
@@ -121,10 +121,11 @@ In each kdamond directory, two files (``state`` and ``pid``) and one directory
 
 Reading ``state`` returns ``on`` if the kdamond is currently running, or
 ``off`` if it is not running. Writing ``on`` or ``off`` makes the kdamond be
-in the state. Writing ``update_schemes_stats`` to ``state`` file updates the
-contents of stats files for each DAMON-based operation scheme of the kdamond.
-For details of the stats, please refer to :ref:`stats section
-<sysfs_schemes_stats>`.
+in the state. Writing ``commit`` to the ``state`` file makes kdamond reads the
+user inputs in the sysfs files except ``state`` file again. Writing
+``update_schemes_stats`` to ``state`` file updates the contents of stats files
+for each DAMON-based operation scheme of the kdamond. For details of the
+stats, please refer to :ref:`stats section <sysfs_schemes_stats>`.
 
 If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.
 
@@ -143,17 +144,28 @@ be written to the file.
 contexts/<N>/
 -------------
 
-In each context directory, one file (``operations``) and three directories
-(``monitoring_attrs``, ``targets``, and ``schemes``) exist.
+In each context directory, two files (``avail_operations`` and ``operations``)
+and three directories (``monitoring_attrs``, ``targets``, and ``schemes``)
+exist.
 
 DAMON supports multiple types of monitoring operations, including those for
-virtual address space and the physical address space. You can set and get what
-type of monitoring operations DAMON will use for the context by writing one of
-below keywords to, and reading from the file.
+virtual address space and the physical address space. You can get the list of
+available monitoring operations set on the currently running kernel by reading
+``avail_operations`` file. Based on the kernel configuration, the file will
+list some or all of below keywords.
 
  - vaddr: Monitor virtual address spaces of specific processes
+ - fvaddr: Monitor fixed virtual address ranges
  - paddr: Monitor the physical address space of the system
 
+Please refer to :ref:`regions sysfs directory <sysfs_regions>` for detailed
+differences between the operations sets in terms of the monitoring target
+regions.
+
+You can set and get what type of monitoring operations DAMON will use for the
+context by writing one of the keywords listed in ``avail_operations`` file and
+reading from the ``operations`` file.
+
 contexts/<N>/monitoring_attrs/
 ------------------------------
 
@@ -192,6 +204,8 @@ If you wrote ``vaddr`` to the ``contexts/<N>/operations``, each target should
 be a process. You can specify the process to DAMON by writing the pid of the
 process to the ``pid_target`` file.
 
+.. _sysfs_regions:
+
 targets/<N>/regions
 -------------------
 
@@ -202,9 +216,10 @@ can be covered. However, users could want to set the initial monitoring region
 to specific address ranges.
 
 In contrast, DAMON do not automatically sets and updates the monitoring target
-regions when ``paddr`` monitoring operations set is being used (``paddr`` is
-written to the ``contexts/<N>/operations``). Therefore, users should set the
-monitoring target regions by themselves in the case.
+regions when ``fvaddr`` or ``paddr`` monitoring operations sets are being used
+(``fvaddr`` or ``paddr`` have written to the ``contexts/<N>/operations``).
+Therefore, users should set the monitoring target regions by themselves in the
+cases.
 
 For such cases, users can explicitly set the initial monitoring target regions
 as they want, by writing proper values to the files under this directory.
diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index 0166f9de3428..a90330d0a837 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -164,7 +164,7 @@ default_hugepagesz
 	will all result in 256 2M huge pages being allocated. Valid default
 	huge page size is architecture dependent.
 hugetlb_free_vmemmap
-	When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set, this enables freeing
+	When CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is set, this enables optimizing
 	unused vmemmap pages associated with each HugeTLB page.
 
 When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index 97d816791aca..b244f0202a03 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -184,6 +184,24 @@ The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the
 ``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must
 be increased accordingly.
 
+Monitoring KSM events
+=====================
+
+There are some counters in /proc/vmstat that may be used to monitor KSM events.
+KSM might help save memory, it's a tradeoff by may suffering delay on KSM COW
+or on swapping in copy. Those events could help users evaluate whether or how
+to use KSM. For example, if cow_ksm increases too fast, user may decrease the
+range of madvise(, , MADV_MERGEABLE).
+
+cow_ksm
+	is incremented every time a KSM page triggers copy on write (COW)
+	when users try to write to a KSM page, we have to make a copy.
+
+ksm_swpin_copy
+	is incremented every time a KSM page is copied when swapping in
+	note that KSM page might be copied when swapping in because do_swap_page()
+	cannot do all the locking needed to reconstitute a cross-anon_vma KSM page.
+
 --
 Izik Eidus,
 Hugh Dickins, 17 Nov 2009
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f4804ce37c58..5c9aa171a0d3 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -62,6 +62,7 @@ Currently, these files are in /proc/sys/vm:
 - overcommit_memory
 - overcommit_ratio
 - page-cluster
+- page_lock_unfairness
 - panic_on_oom
 - percpu_pagelist_high_fraction
 - stat_interval
@@ -561,6 +562,45 @@ Change the minimum size of the hugepage pool.
 See Documentation/admin-guide/mm/hugetlbpage.rst
 
 
+hugetlb_optimize_vmemmap
+========================
+
+This knob is not available when memory_hotplug.memmap_on_memory (kernel parameter)
+is configured or the size of 'struct page' (a structure defined in
+include/linux/mm_types.h) is not power of two (an unusual system config could
+result in this).
+
+Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages
+associated with each HugeTLB page.
+
+Once enabled, the vmemmap pages of subsequent allocation of HugeTLB pages from
+buddy allocator will be optimized (7 pages per 2MB HugeTLB page and 4095 pages
+per 1GB HugeTLB page), whereas already allocated HugeTLB pages will not be
+optimized. When those optimized HugeTLB pages are freed from the HugeTLB pool
+to the buddy allocator, the vmemmap pages representing that range needs to be
+remapped again and the vmemmap pages discarded earlier need to be rellocated
+again. If your use case is that HugeTLB pages are allocated 'on the fly' (e.g.
+never explicitly allocating HugeTLB pages with 'nr_hugepages' but only set
+'nr_overcommit_hugepages', those overcommitted HugeTLB pages are allocated 'on
+the fly') instead of being pulled from the HugeTLB pool, you should weigh the
+benefits of memory savings against the more overhead (~2x slower than before)
+of allocation or freeing HugeTLB pages between the HugeTLB pool and the buddy
+allocator. Another behavior to note is that if the system is under heavy memory
+pressure, it could prevent the user from freeing HugeTLB pages from the HugeTLB
+pool to the buddy allocator since the allocation of vmemmap pages could be
+failed, you have to retry later if your system encounter this situation.
+
+Once disabled, the vmemmap pages of subsequent allocation of HugeTLB pages from
+buddy allocator will not be optimized meaning the extra overhead at allocation
+time from buddy allocator disappears, whereas already optimized HugeTLB pages
+will not be affected. If you want to make sure there are no optimized HugeTLB
+pages, you can set "nr_hugepages" to 0 first and then disable this. Note that
+writing 0 to nr_hugepages will make any "in use" HugeTLB pages become surplus
+pages. So, those surplus pages are still optimized until they are no longer
+in use. You would need to wait for those surplus pages to be released before
+there are no optimized pages in the system.
+
+
 nr_hugepages_mempolicy
 ======================
 
@@ -754,6 +794,14 @@ extra faults and I/O delays for following faults if they would have been part of
 that consecutive pages readahead would have brought in.
 
 
+page_lock_unfairness
+====================
+
+This value determines the number of times that the page lock can be
+stolen from under a waiter. After the lock is stolen the number of times
+specified in this file (default is 5), the "fair lock handoff" semantics
+will apply, and the waiter will only be awakened if the lock can be taken.
+
 panic_on_oom
 ============
 