diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/cgroup-v1/cgroups.txt | 2 | ||||
-rw-r--r-- | Documentation/cgroup-v1/cpusets.txt | 2 | ||||
-rw-r--r-- | Documentation/cgroup-v2.txt | 25 | ||||
-rw-r--r-- | Documentation/filesystems/proc.txt | 18 | ||||
-rw-r--r-- | Documentation/sysctl/vm.txt | 18 | ||||
-rw-r--r-- | Documentation/vm/transhuge.txt | 22 |
6 files changed, 85 insertions, 2 deletions
diff --git a/Documentation/cgroup-v1/cgroups.txt b/Documentation/cgroup-v1/cgroups.txt index c6256ae9885b..947e6fe31ef9 100644 --- a/Documentation/cgroup-v1/cgroups.txt +++ b/Documentation/cgroup-v1/cgroups.txt @@ -8,7 +8,7 @@ Original copyright statements from cpusets.txt: Portions Copyright (C) 2004 BULL SA. Portions Copyright (c) 2004-2006 Silicon Graphics, Inc. Modified by Paul Jackson <pj@sgi.com> -Modified by Christoph Lameter <clameter@sgi.com> +Modified by Christoph Lameter <cl@linux.com> CONTENTS: ========= diff --git a/Documentation/cgroup-v1/cpusets.txt b/Documentation/cgroup-v1/cpusets.txt index fdf7dff3f607..e5cdcd445615 100644 --- a/Documentation/cgroup-v1/cpusets.txt +++ b/Documentation/cgroup-v1/cpusets.txt @@ -6,7 +6,7 @@ Written by Simon.Derr@bull.net Portions Copyright (c) 2004-2006 Silicon Graphics, Inc. Modified by Paul Jackson <pj@sgi.com> -Modified by Christoph Lameter <clameter@sgi.com> +Modified by Christoph Lameter <cl@linux.com> Modified by Paul Menage <menage@google.com> Modified by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt index ff49cf901148..8f1329a5f700 100644 --- a/Documentation/cgroup-v2.txt +++ b/Documentation/cgroup-v2.txt @@ -843,6 +843,15 @@ PAGE_SIZE multiple when read back. Amount of memory used to cache filesystem data, including tmpfs and shared memory. + kernel_stack + + Amount of memory allocated to kernel stacks. + + slab + + Amount of memory used for storing in-kernel data + structures. + sock Amount of memory used in network transmission buffers @@ -871,6 +880,16 @@ PAGE_SIZE multiple when read back. on the internal memory management lists used by the page reclaim algorithm + slab_reclaimable + + Part of "slab" that might be reclaimed, such as + dentries and inodes. + + slab_unreclaimable + + Part of "slab" that cannot be reclaimed on memory + pressure. + pgfault Total number of page faults incurred @@ -1368,6 +1387,12 @@ system than killing the group. Otherwise, memory.max is there to limit this type of spillover and ultimately contain buggy or even malicious applications. +Setting the original memory.limit_in_bytes below the current usage was +subject to a race condition, where concurrent charges could cause the +limit setting to fail. memory.max on the other hand will first set the +limit to prevent new charges, and then reclaim and OOM kill until the +new limit is met - or the task writing to memory.max is killed. + The combined memory+swap accounting and limiting is replaced by real control over swap space. diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 843b045b4069..7f5607a089b4 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -43,6 +43,7 @@ Table of Contents 3.7 /proc/<pid>/task/<tid>/children - Information about task children 3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file 3.9 /proc/<pid>/map_files - Information about memory mapped files + 3.10 /proc/<pid>/timerslack_ns - Task timerslack value 4 Configuring procfs 4.1 Mount options @@ -1862,6 +1863,23 @@ time one can open(2) mappings from the listings of two processes and comparing their inode numbers to figure out which anonymous memory areas are actually shared. +3.10 /proc/<pid>/timerslack_ns - Task timerslack value +--------------------------------------------------------- +This file provides the value of the task's timerslack value in nanoseconds. +This value specifies a amount of time that normal timers may be deferred +in order to coalesce timers and avoid unnecessary wakeups. + +This allows a task's interactivity vs power consumption trade off to be +adjusted. + +Writing 0 to the file will set the tasks timerslack to the default value. + +Valid values are from 0 - ULLONG_MAX + +An application setting the value must have PTRACE_MODE_ATTACH_FSCREDS level +permissions on the task specified to change its timerslack_ns value. + + ------------------------------------------------------------------------------ Configuring procfs ------------------------------------------------------------------------------ diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 89a887c76629..cb0368459da3 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -803,6 +803,24 @@ performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000, it will look for ten times more freeable objects than there are. +============================================================= + +watermark_scale_factor: + +This factor controls the aggressiveness of kswapd. It defines the +amount of memory left in a node/system before kswapd is woken up and +how much memory needs to be free before kswapd goes back to sleep. + +The unit is in fractions of 10,000. The default value of 10 means the +distances between watermarks are 0.1% of the available memory in the +node/system. The maximum value is 1000, or 10% of memory. + +A high rate of threads entering direct reclaim (allocstall) or kswapd +going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate +that the number of free pages kswapd maintains for latency reasons is +too small for the allocation bursts occurring in the system. This knob +can then be used to tune kswapd aggressiveness accordingly. + ============================================================== zone_reclaim_mode: diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt index 21cf34f3ddb2..d9cb65cf5cfd 100644 --- a/Documentation/vm/transhuge.txt +++ b/Documentation/vm/transhuge.txt @@ -113,9 +113,26 @@ guaranteed, but it may be more likely in case the allocation is for a MADV_HUGEPAGE region. echo always >/sys/kernel/mm/transparent_hugepage/defrag +echo defer >/sys/kernel/mm/transparent_hugepage/defrag echo madvise >/sys/kernel/mm/transparent_hugepage/defrag echo never >/sys/kernel/mm/transparent_hugepage/defrag +"always" means that an application requesting THP will stall on allocation +failure and directly reclaim pages and compact memory in an effort to +allocate a THP immediately. This may be desirable for virtual machines +that benefit heavily from THP use and are willing to delay the VM start +to utilise them. + +"defer" means that an application will wake kswapd in the background +to reclaim pages and wake kcompact to compact memory so that THP is +available in the near future. It's the responsibility of khugepaged +to then install the THP pages later. + +"madvise" will enter direct reclaim like "always" but only for regions +that are have used madvise(MADV_HUGEPAGE). This is the default behaviour. + +"never" should be self-explanatory. + By default kernel tries to use huge zero page on read page fault. It's possible to disable huge zero page by writing 0 or enable it back by writing 1: @@ -229,6 +246,11 @@ thp_split_page is incremented every time a huge page is split into base thp_split_page_failed is is incremented if kernel fails to split huge page. This can happen if the page was pinned by somebody. +thp_deferred_split_page is incremented when a huge page is put onto split + queue. This happens when a huge page is partially unmapped and + splitting it would free up some memory. Pages on split queue are + going to be split under memory pressure. + thp_split_pmd is incremented every time a PMD split into table of PTEs. This can happen, for instance, when application calls mprotect() or munmap() on part of huge page. It doesn't split huge page, only |