summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2018-10-27 05:33:41 +0300
committerLinus Torvalds <torvalds@linux-foundation.org>2018-10-27 05:33:41 +0300
commit345671ea0f9258f410eb057b9ced9cefbbe5dc78 (patch)
treefe97ba3d27679789e6aa34e39b002ee64ce25412 /Documentation
parent4904008165c8a1c48602b8316139691b8c735e6e (diff)
parent22146c3ce98962436e401f7b7016a6f664c9ffb5 (diff)
downloadlinux-345671ea0f9258f410eb057b9ced9cefbbe5dc78.tar.xz
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton: - a few misc things - ocfs2 updates - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits) hugetlbfs: dirty pages as they are added to pagecache mm: export add_swap_extent() mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page() mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page() mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition mm/kasan/quarantine.c: make quarantine_lock a raw_spinlock_t mm/gup: cache dev_pagemap while pinning pages Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved" mm: return zero_resv_unavail optimization mm: zero remaining unavailable struct pages tools/testing/selftests/vm/gup_benchmark.c: add MAP_HUGETLB option tools/testing/selftests/vm/gup_benchmark.c: add MAP_SHARED option tools/testing/selftests/vm/gup_benchmark.c: allow user specified file tools/testing/selftests/vm/gup_benchmark.c: fix 'write' flag usage mm/gup_benchmark.c: add additional pinning methods mm/gup_benchmark.c: time put_page() mm: don't raise MEMCG_OOM event due to failed high-order allocation mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/accounting/psi.txt73
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst22
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt12
-rw-r--r--Documentation/filesystems/proc.txt4
-rw-r--r--Documentation/vm/slub.rst12
-rw-r--r--Documentation/x86/pat.txt4
6 files changed, 122 insertions, 5 deletions
diff --git a/Documentation/accounting/psi.txt b/Documentation/accounting/psi.txt
new file mode 100644
index 000000000000..b8ca28b60215
--- /dev/null
+++ b/Documentation/accounting/psi.txt
@@ -0,0 +1,73 @@
+================================
+PSI - Pressure Stall Information
+================================
+
+:Date: April, 2018
+:Author: Johannes Weiner <hannes@cmpxchg.org>
+
+When CPU, memory or IO devices are contended, workloads experience
+latency spikes, throughput losses, and run the risk of OOM kills.
+
+Without an accurate measure of such contention, users are forced to
+either play it safe and under-utilize their hardware resources, or
+roll the dice and frequently suffer the disruptions resulting from
+excessive overcommit.
+
+The psi feature identifies and quantifies the disruptions caused by
+such resource crunches and the time impact it has on complex workloads
+or even entire systems.
+
+Having an accurate measure of productivity losses caused by resource
+scarcity aids users in sizing workloads to hardware--or provisioning
+hardware according to workload demand.
+
+As psi aggregates this information in realtime, systems can be managed
+dynamically using techniques such as load shedding, migrating jobs to
+other systems or data centers, or strategically pausing or killing low
+priority or restartable batch jobs.
+
+This allows maximizing hardware utilization without sacrificing
+workload health or risking major disruptions such as OOM kills.
+
+Pressure interface
+==================
+
+Pressure information for each resource is exported through the
+respective file in /proc/pressure/ -- cpu, memory, and io.
+
+The format for CPU is as such:
+
+some avg10=0.00 avg60=0.00 avg300=0.00 total=0
+
+and for memory and IO:
+
+some avg10=0.00 avg60=0.00 avg300=0.00 total=0
+full avg10=0.00 avg60=0.00 avg300=0.00 total=0
+
+The "some" line indicates the share of time in which at least some
+tasks are stalled on a given resource.
+
+The "full" line indicates the share of time in which all non-idle
+tasks are stalled on a given resource simultaneously. In this state
+actual CPU cycles are going to waste, and a workload that spends
+extended time in this state is considered to be thrashing. This has
+severe impact on performance, and it's useful to distinguish this
+situation from a state where some tasks are stalled but the CPU is
+still doing productive work. As such, time spent in this subset of the
+stall state is tracked separately and exported in the "full" averages.
+
+The ratios are tracked as recent trends over ten, sixty, and three
+hundred second windows, which gives insight into short term events as
+well as medium and long term trends. The total absolute stall time is
+tracked and exported as well, to allow detection of latency spikes
+which wouldn't necessarily make a dent in the time averages, or to
+average trends over custom time frames.
+
+Cgroup2 interface
+=================
+
+In a system with a CONFIG_CGROUP=y kernel and the cgroup2 filesystem
+mounted, pressure stall information is also tracked for tasks grouped
+into cgroups. Each subdirectory in the cgroupfs mountpoint contains
+cpu.pressure, memory.pressure, and io.pressure files; the format is
+the same as the /proc/pressure/ files.
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index caf36105a1c7..8384c681a4b2 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -966,6 +966,12 @@ All time durations are in microseconds.
$PERIOD duration. "max" for $MAX indicates no limit. If only
one number is written, $MAX is updated.
+ cpu.pressure
+ A read-only nested-key file which exists on non-root cgroups.
+
+ Shows pressure stall information for CPU. See
+ Documentation/accounting/psi.txt for details.
+
Memory
------
@@ -1127,6 +1133,10 @@ PAGE_SIZE multiple when read back.
disk readahead. For now OOM in memory cgroup kills
tasks iff shortage has happened inside page fault.
+ This event is not raised if the OOM killer is not
+ considered as an option, e.g. for failed high-order
+ allocations.
+
oom_kill
The number of processes belonging to this cgroup
killed by any kind of OOM killer.
@@ -1271,6 +1281,12 @@ PAGE_SIZE multiple when read back.
higher than the limit for an extended period of time. This
reduces the impact on the workload and memory management.
+ memory.pressure
+ A read-only nested-key file which exists on non-root cgroups.
+
+ Shows pressure stall information for memory. See
+ Documentation/accounting/psi.txt for details.
+
Usage Guidelines
~~~~~~~~~~~~~~~~
@@ -1408,6 +1424,12 @@ IO Interface Files
8:16 rbps=2097152 wbps=max riops=max wiops=max
+ io.pressure
+ A read-only nested-key file which exists on non-root cgroups.
+
+ Shows pressure stall information for IO. See
+ Documentation/accounting/psi.txt for details.
+
Writeback
~~~~~~~~~
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 47ca5cda0eef..b90fe3b6bc6c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4851,6 +4851,18 @@
This is actually a boot loader parameter; the value is
passed to the kernel using a special protocol.
+ vm_debug[=options] [KNL] Available with CONFIG_DEBUG_VM=y.
+ May slow down system boot speed, especially when
+ enabled on systems with a large amount of memory.
+ All options are enabled by default, and this
+ interface is meant to allow for selectively
+ enabling or disabling specific virtual memory
+ debugging features.
+
+ Available options are:
+ P Enable page structure init time poisoning
+ - Disable all of the above options
+
vmalloc=nn[KMG] [KNL,BOOT] Forces the vmalloc area to have an exact
size of <nn>. This can be used to increase the
minimum size (128MB on x86). It can also be used to
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 22b4b00dee31..12a5e6e693b6 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -858,6 +858,7 @@ Writeback: 0 kB
AnonPages: 861800 kB
Mapped: 280372 kB
Shmem: 644 kB
+KReclaimable: 168048 kB
Slab: 284364 kB
SReclaimable: 159856 kB
SUnreclaim: 124508 kB
@@ -925,6 +926,9 @@ AnonHugePages: Non-file backed huge pages mapped into userspace page tables
ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
with huge pages
ShmemPmdMapped: Shared memory mapped into userspace with huge pages
+KReclaimable: Kernel allocations that the kernel will attempt to reclaim
+ under memory pressure. Includes SReclaimable (below), and other
+ direct allocations with a shrinker.
Slab: in-kernel data structures cache
SReclaimable: Part of Slab, that might be reclaimed, such as caches
SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 3a775fd64e2d..195928808bac 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -36,9 +36,10 @@ debugging is enabled. Format:
slub_debug=<Debug-Options>
Enable options for all slabs
-slub_debug=<Debug-Options>,<slab name>
- Enable options only for select slabs
+slub_debug=<Debug-Options>,<slab name1>,<slab name2>,...
+ Enable options only for select slabs (no spaces
+ after a comma)
Possible debug options are::
@@ -62,7 +63,12 @@ Trying to find an issue in the dentry cache? Try::
slub_debug=,dentry
-to only enable debugging on the dentry cache.
+to only enable debugging on the dentry cache. You may use an asterisk at the
+end of the slab name, in order to cover all slabs with the same prefix. For
+example, here's how you can poison the dentry cache as well as all kmalloc
+slabs:
+
+ slub_debug=P,kmalloc-*,dentry
Red zoning and tracking may realign the slab. We can just apply sanity checks
to the dentry cache with::
diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
index 2a4ee6302122..481d8d8536ac 100644
--- a/Documentation/x86/pat.txt
+++ b/Documentation/x86/pat.txt
@@ -90,12 +90,12 @@ pci proc | -- | -- | WC |
Advanced APIs for drivers
-------------------------
A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range,
-vm_insert_pfn
+vmf_insert_pfn
Drivers wanting to export some pages to userspace do it by using mmap
interface and a combination of
1) pgprot_noncached()
-2) io_remap_pfn_range() or remap_pfn_range() or vm_insert_pfn()
+2) io_remap_pfn_range() or remap_pfn_range() or vmf_insert_pfn()
With PAT support, a new API pgprot_writecombine is being added. So, drivers can
continue to use the above sequence, with either pgprot_noncached() or