summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/README.rst6
-rw-r--r--Documentation/admin-guide/blockdev/zram.rst6
-rw-r--r--Documentation/admin-guide/braille-console.rst4
-rw-r--r--Documentation/admin-guide/bug-hunting.rst9
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst58
-rw-r--r--Documentation/admin-guide/index.rst162
-rw-r--r--Documentation/admin-guide/kernel-parameters.rst3
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt374
-rw-r--r--Documentation/admin-guide/media/ipu3.rst6
-rw-r--r--Documentation/admin-guide/mm/damon/start.rst67
-rw-r--r--Documentation/admin-guide/mm/damon/usage.rst392
-rw-r--r--Documentation/admin-guide/mm/memory-hotplug.rst4
-rw-r--r--Documentation/admin-guide/mm/transhuge.rst82
-rw-r--r--Documentation/admin-guide/nvme-multipath.rst72
-rw-r--r--Documentation/admin-guide/perf/dwc_pcie_pmu.rst6
-rw-r--r--Documentation/admin-guide/perf/hisi-pmu.rst5
-rw-r--r--Documentation/admin-guide/perf/index.rst2
-rw-r--r--Documentation/admin-guide/perf/mrvl-odyssey-ddr-pmu.rst80
-rw-r--r--Documentation/admin-guide/perf/mrvl-odyssey-tad-pmu.rst37
-rw-r--r--Documentation/admin-guide/perf/nvidia-pmu.rst52
-rw-r--r--Documentation/admin-guide/quickly-build-trimmed-linux.rst2
-rw-r--r--Documentation/admin-guide/sysctl/fs.rst2
-rw-r--r--Documentation/admin-guide/sysctl/kernel.rst7
-rw-r--r--Documentation/admin-guide/sysrq.rst20
-rw-r--r--Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst2
-rw-r--r--Documentation/admin-guide/workload-tracing.rst2
26 files changed, 925 insertions, 537 deletions
diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst
index f2bebff6a733..b557cf1c820d 100644
--- a/Documentation/admin-guide/README.rst
+++ b/Documentation/admin-guide/README.rst
@@ -176,7 +176,7 @@ Configuring the kernel
values without prompting.
"make defconfig" Create a ./.config file by using the default
- symbol values from either arch/$ARCH/defconfig
+ symbol values from either arch/$ARCH/configs/defconfig
or arch/$ARCH/configs/${PLATFORM}_defconfig,
depending on the architecture.
@@ -356,5 +356,5 @@ instructions at 'Documentation/admin-guide/reporting-issues.rst'.
Hints on understanding kernel bug reports are in
'Documentation/admin-guide/bug-hunting.rst'. More on debugging the kernel
-with gdb is in 'Documentation/dev-tools/gdb-kernel-debugging.rst' and
-'Documentation/dev-tools/kgdb.rst'.
+with gdb is in 'Documentation/process/debugging/gdb-kernel-debugging.rst' and
+'Documentation/process/debugging/kgdb.rst'.
diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index 714a5171bfc0..1576fb93f06c 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -121,14 +121,14 @@ compression algorithm to use external pre-trained dictionary, pass full
path to the `dict` along with other parameters::
#pass path to pre-trained zstd dictionary
- echo "algo=zstd dict=/etc/dictioary" > /sys/block/zram0/algorithm_params
+ echo "algo=zstd dict=/etc/dictionary" > /sys/block/zram0/algorithm_params
#same, but using algorithm priority
- echo "priority=1 dict=/etc/dictioary" > \
+ echo "priority=1 dict=/etc/dictionary" > \
/sys/block/zram0/algorithm_params
#pass path to pre-trained zstd dictionary and compression level
- echo "algo=zstd level=8 dict=/etc/dictioary" > \
+ echo "algo=zstd level=8 dict=/etc/dictionary" > \
/sys/block/zram0/algorithm_params
Parameters are algorithm specific: not all algorithms support pre-trained
diff --git a/Documentation/admin-guide/braille-console.rst b/Documentation/admin-guide/braille-console.rst
index 18e79337dcfd..153472e93cae 100644
--- a/Documentation/admin-guide/braille-console.rst
+++ b/Documentation/admin-guide/braille-console.rst
@@ -21,8 +21,8 @@ override the baud rate to 115200, etc.
By default, the braille device will just show the last kernel message (console
mode). To review previous messages, press the Insert key to switch to the VT
review mode. In review mode, the arrow keys permit to browse in the VT content,
-:kbd:`PAGE-UP`/:kbd:`PAGE-DOWN` keys go at the top/bottom of the screen, and
-the :kbd:`HOME` key goes back
+`PAGE-UP`/`PAGE-DOWN` keys go at the top/bottom of the screen, and
+the `HOME` key goes back
to the cursor, hence providing very basic screen reviewing facility.
Sound feedback can be obtained by adding the ``braille_console.sound=1`` kernel
diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst
index 1d0f8ceb3075..ce6f4e8ca487 100644
--- a/Documentation/admin-guide/bug-hunting.rst
+++ b/Documentation/admin-guide/bug-hunting.rst
@@ -368,12 +368,3 @@ processed by ``klogd``::
Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3
----------------------------------------------------------------------------
-
-::
-
- Dr. G.W. Wettstein Oncology Research Div. Computing Facility
- Roger Maris Cancer Center INTERNET: greg@wind.rmcc.com
- 820 4th St. N.
- Fargo, ND 58122
- Phone: 701-234-7556
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 315ede811c9d..cb1b4e759b7e 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -64,13 +64,14 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
5-6. Device
5-7. RDMA
5-7-1. RDMA Interface Files
- 5-8. HugeTLB
- 5.8-1. HugeTLB Interface Files
- 5-9. Misc
- 5.9-1 Miscellaneous cgroup Interface Files
- 5.9-2 Migration and Ownership
- 5-10. Others
- 5-10-1. perf_event
+ 5-8. DMEM
+ 5-9. HugeTLB
+ 5.9-1. HugeTLB Interface Files
+ 5-10. Misc
+ 5.10-1 Miscellaneous cgroup Interface Files
+ 5.10-2 Migration and Ownership
+ 5-11. Others
+ 5-11-1. perf_event
5-N. Non-normative information
5-N-1. CPU controller root cgroup process behaviour
5-N-2. IO controller root cgroup process behaviour
@@ -2626,6 +2627,49 @@ RDMA Interface Files
mlx4_0 hca_handle=1 hca_object=20
ocrdma1 hca_handle=1 hca_object=23
+DMEM
+----
+
+The "dmem" controller regulates the distribution and accounting of
+device memory regions. Because each memory region may have its own page size,
+which does not have to be equal to the system page size, the units are always bytes.
+
+DMEM Interface Files
+~~~~~~~~~~~~~~~~~~~~
+
+ dmem.max, dmem.min, dmem.low
+ A readwrite nested-keyed file that exists for all the cgroups
+ except root that describes current configured resource limit
+ for a region.
+
+ An example for xe follows::
+
+ drm/0000:03:00.0/vram0 1073741824
+ drm/0000:03:00.0/stolen max
+
+ The semantics are the same as for the memory cgroup controller, and are
+ calculated in the same way.
+
+ dmem.capacity
+ A read-only file that describes maximum region capacity.
+ It only exists on the root cgroup. Not all memory can be
+ allocated by cgroups, as the kernel reserves some for
+ internal use.
+
+ An example for xe follows::
+
+ drm/0000:03:00.0/vram0 8514437120
+ drm/0000:03:00.0/stolen 67108864
+
+ dmem.current
+ A read-only file that describes current resource usage.
+ It exists for all the cgroup except root.
+
+ An example for xe follows::
+
+ drm/0000:03:00.0/vram0 12550144
+ drm/0000:03:00.0/stolen 8650752
+
HugeTLB
-------
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index e85b1adf5908..c8af32a8f800 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -7,6 +7,9 @@ added to the kernel over time. There is, as yet, little overall order or
organization here — this material was not written to be a single, coherent
document! With luck things will improve quickly over time.
+General guides to kernel administration
+---------------------------------------
+
This initial section contains overall information, including the README
file describing the kernel as a whole, documentation on kernel parameters,
etc.
@@ -15,19 +18,44 @@ etc.
:maxdepth: 1
README
- kernel-parameters
devices
- sysctl/index
- abi
features
-This section describes CPU vulnerabilities and their mitigations.
+A big part of the kernel's administrative interface is the /proc and sysfs
+virtual filesystems; these documents describe how to interact with tem
+
+.. toctree::
+ :maxdepth: 1
+
+ sysfs-rules
+ sysctl/index
+ cputopology
+ abi
+
+Security-related documentation:
.. toctree::
:maxdepth: 1
hw-vuln/index
+ LSM/index
+ perf-security
+
+Booting the kernel
+------------------
+
+.. toctree::
+ :maxdepth: 1
+
+ bootconfig
+ kernel-parameters
+ efi-stub
+ initrd
+
+
+Tracking down and identifying problems
+--------------------------------------
Here is a set of documents aimed at users who are trying to track down
problems and bugs in particular.
@@ -48,94 +76,120 @@ problems and bugs in particular.
kdump/index
perf/index
pstore-blk
+ clearing-warn-once
+ kernel-per-CPU-kthreads
+ lockup-watchdogs
+ RAS/index
+ sysrq
-This is the beginning of a section with information of interest to
-application developers. Documents covering various aspects of the kernel
-ABI will be found here.
+
+Core-kernel subsystems
+----------------------
+
+These documents describe core-kernel administration interfaces that are
+likely to be of interest on almost any system.
.. toctree::
:maxdepth: 1
- sysfs-rules
+ cgroup-v2
+ cgroup-v1/index
+ cpu-load
+ mm/index
+ module-signing
+ namespaces/index
+ numastat
+ pm/index
+ syscall-user-dispatch
-This is the beginning of a section with information of interest to
-application developers and system integrators doing analysis of the
-Linux kernel for safety critical applications. Documents supporting
-analysis of kernel interactions with applications, and key kernel
-subsystems expectations will be found here.
+Support for non-native binary formats. Note that some of these
+documents are ... old ...
.. toctree::
:maxdepth: 1
- workload-tracing
+ binfmt-misc
+ java
+ mono
+
-The rest of this manual consists of various unordered guides on how to
-configure specific aspects of kernel behavior to your liking.
+Block-layer and filesystem administration
+-----------------------------------------
.. toctree::
:maxdepth: 1
- acpi/index
- aoe/index
- auxdisplay/index
bcache
binderfs
- binfmt-misc
blockdev/index
- bootconfig
- braille-console
- btmrvl
- cgroup-v1/index
- cgroup-v2
cifs/index
- clearing-warn-once
- cpu-load
- cputopology
- dell_rbu
device-mapper/index
- edid
- efi-stub
ext4
filesystem-monitoring
nfs/index
- gpio/index
- highuid
- hw_random
- initrd
iostats
- java
jfs
- kernel-per-CPU-kthreads
+ md
+ ufs
+ xfs
+
+Device-specific guides
+----------------------
+
+How to configure your hardware within your Linux system.
+
+.. toctree::
+ :maxdepth: 1
+
+ acpi/index
+ aoe/index
+ auxdisplay/index
+ braille-console
+ btmrvl
+ dell_rbu
+ edid
+ gpio/index
+ hw_random
laptops/index
lcd-panel-cgram
- ldm
- lockup-watchdogs
- LSM/index
- md
media/index
- mm/index
- module-signing
- mono
- namespaces/index
- numastat
+ nvme-multipath
parport
- perf-security
- pm/index
pnp
rapidio
- RAS/index
rtc
serial-console
svga
- syscall-user-dispatch
- sysrq
thermal/index
thunderbolt
- ufs
- unicode
vga-softcursor
video-output
- xfs
+
+Workload analysis
+-----------------
+
+This is the beginning of a section with information of interest to
+application developers and system integrators doing analysis of the
+Linux kernel for safety critical applications. Documents supporting
+analysis of kernel interactions with applications, and key kernel
+subsystems expectations will be found here.
+
+.. toctree::
+ :maxdepth: 1
+
+ workload-tracing
+
+Everything else
+---------------
+
+A few hard-to-categorize and generally obsolete documents.
+
+.. toctree::
+ :maxdepth: 1
+
+ highuid
+ ldm
+ unicode
.. only:: subproject and html
diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 59931f21c974..39d0e7ff0965 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -194,8 +194,6 @@ is applicable::
WDT Watchdog support is enabled.
X86-32 X86-32, aka i386 architecture is enabled.
X86-64 X86-64 architecture is enabled.
- More X86-64 boot options can be found in
- Documentation/arch/x86/x86_64/boot-options.rst.
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
X86_UV SGI UV support is enabled.
XEN Xen support is enabled
@@ -213,7 +211,6 @@ Do not modify the syntax of boot loader parameters without extreme
need or coordination with <Documentation/arch/x86/boot.rst>.
There are also arch-specific kernel-parameters not documented here.
-See for example <Documentation/arch/x86/x86_64/boot-options.rst>.
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
a trailing = on the name of any parameter states that that parameter will
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3872bc6ec49d..aa7447f8837c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -21,6 +21,10 @@
strictly ACPI specification compliant.
rsdt -- prefer RSDT over (default) XSDT
copy_dsdt -- copy DSDT to memory
+ nocmcff -- Disable firmware first mode for corrected
+ errors. This disables parsing the HEST CMC error
+ source to check if firmware has set the FF flag. This
+ may result in duplicate corrected error reports.
nospcr -- disable console in ACPI SPCR table as
default _serial_ console on ARM64
For ARM64, ONLY "acpi=off", "acpi=on", "acpi=force" or
@@ -405,6 +409,8 @@
not play well with APC CPU idle - disable it if you have
APC and your system crashes randomly.
+ apic [APIC,X86-64] Use IO-APIC. Default.
+
apic= [APIC,X86,EARLY] Advanced Programmable Interrupt Controller
Change the output verbosity while booting
Format: { quiet (default) | verbose | debug }
@@ -424,6 +430,10 @@
useful so that a dump capture kernel won't be
shot down by NMI
+ apicpmtimer Do APIC timer calibration using the pmtimer. Implies
+ apicmaintimer. Useful when your PIT timer is totally
+ broken.
+
autoconf= [IPV6]
See Documentation/networking/ipv6.rst.
@@ -1726,6 +1736,8 @@
off: Disable GDS mitigation.
+ gbpages [X86] Use GB pages for kernel direct mappings.
+
gcov_persist= [GCOV] When non-zero (default), profiling data for
kernel modules is saved and remains accessible via
debugfs, even when the module is unloaded/reloaded.
@@ -2008,12 +2020,21 @@
idle= [X86,EARLY]
Format: idle=poll, idle=halt, idle=nomwait
- Poll forces a polling idle loop that can slightly
- improve the performance of waking up a idle CPU, but
- will use a lot of power and make the system run hot.
- Not recommended.
+
+ idle=poll: Don't do power saving in the idle loop
+ using HLT, but poll for rescheduling event. This will
+ make the CPUs eat a lot more power, but may be useful
+ to get slightly better performance in multiprocessor
+ benchmarks. It also makes some profiling using
+ performance counters more accurate. Please note that
+ on systems with MONITOR/MWAIT support (like Intel
+ EM64T CPUs) this option has no performance advantage
+ over the normal idle loop. It may also interact badly
+ with hyperthreading.
+
idle=halt: Halt is forced to be used for CPU idle.
In such case C2/C3 won't be used again.
+
idle=nomwait: Disable mwait for CPU C-states
idxd.sva= [HW]
@@ -2311,20 +2332,73 @@
relaxed
iommu= [X86,EARLY]
+
off
+ Don't initialize and use any kind of IOMMU.
+
force
+ Force the use of the hardware IOMMU even when
+ it is not actually needed (e.g. because < 3 GB
+ memory).
+
noforce
+ Don't force hardware IOMMU usage when it is not
+ needed. (default).
+
biomerge
panic
nopanic
merge
nomerge
+
soft
- pt [X86]
- nopt [X86]
- nobypass [PPC/POWERNV]
+ Use software bounce buffering (SWIOTLB) (default for
+ Intel machines). This can be used to prevent the usage
+ of an available hardware IOMMU.
+
+ [X86]
+ pt
+ [X86]
+ nopt
+ [PPC/POWERNV]
+ nobypass
Disable IOMMU bypass, using IOMMU for PCI devices.
+ [X86]
+ AMD Gart HW IOMMU-specific options:
+
+ <size>
+ Set the size of the remapping area in bytes.
+
+ allowed
+ Overwrite iommu off workarounds for specific chipsets
+
+ fullflush
+ Flush IOMMU on each allocation (default).
+
+ nofullflush
+ Don't use IOMMU fullflush.
+
+ memaper[=<order>]
+ Allocate an own aperture over RAM with size
+ 32MB<<order. (default: order=1, i.e. 64MB)
+
+ merge
+ Do scatter-gather (SG) merging. Implies "force"
+ (experimental).
+
+ nomerge
+ Don't do scatter-gather (SG) merging.
+
+ noaperture
+ Ask the IOMMU not to touch the aperture for AGP.
+
+ noagp
+ Don't initialize the AGP driver and use full aperture.
+
+ panic
+ Always panic when IOMMU overflows.
+
iommu.forcedac= [ARM64,X86,EARLY] Control IOVA allocation for PCI devices.
Format: { "0" | "1" }
0 - Try to allocate a 32-bit DMA address first, before
@@ -2432,7 +2506,9 @@
specified in the flag list (default: domain):
nohz
- Disable the tick when a single task runs.
+ Disable the tick when a single task runs as well as
+ disabling other kernel noises like having RCU callbacks
+ offloaded. This is equivalent to the nohz_full parameter.
A residual 1Hz tick is offloaded to workqueues, which you
need to affine to housekeeping through the global
@@ -2695,7 +2771,7 @@
VMs, i.e. on the 0=>1 and 1=>0 transitions of the
number of VMs.
- Enabling virtualization at module lode avoids potential
+ Enabling virtualization at module load avoids potential
latency for creation of the 0=>1 VM, as KVM serializes
virtualization enabling across all online CPUs. The
"cost" of enabling virtualization when KVM is loaded,
@@ -2748,17 +2824,21 @@
nvhe: Standard nVHE-based mode, without support for
protected guests.
- protected: nVHE-based mode with support for guests whose
- state is kept private from the host.
+ protected: Mode with support for guests whose state is
+ kept private from the host, using VHE or
+ nVHE depending on HW support.
nested: VHE-based mode with support for nested
- virtualization. Requires at least ARMv8.3
- hardware.
+ virtualization. Requires at least ARMv8.4
+ hardware (with FEAT_NV2).
Defaults to VHE/nVHE based on hardware support. Setting
mode to "protected" will disable kexec and hibernation
- for the host. "nested" is experimental and should be
- used with extreme caution.
+ for the host. To force nVHE on VHE hardware, add
+ "arm64_sw.hvhe=0 id_aa64mmfr1.vh=0" to the
+ command-line.
+ "nested" is experimental and should be used with
+ extreme caution.
kvm-arm.vgic_v3_group0_trap=
[KVM,ARM,EARLY] Trap guest accesses to GICv3 group-0
@@ -3036,6 +3116,8 @@
* max_sec_lba48: Set or clear transfer size limit to
65535 sectors.
+ * external: Mark port as external (hotplug-capable).
+
* [no]lpm: Enable or disable link power management.
* [no]setxfer: Indicate if transfer speed mode setting
@@ -3259,9 +3341,77 @@
devices can be requested on-demand with the
/dev/loop-control interface.
- mce [X86-32] Machine Check Exception
+ mce= [X86-{32,64}]
+
+ Please see Documentation/arch/x86/x86_64/machinecheck.rst for sysfs runtime tunables.
+
+ off
+ disable machine check
+
+ no_cmci
+ disable CMCI(Corrected Machine Check Interrupt) that
+ Intel processor supports. Usually this disablement is
+ not recommended, but it might be handy if your
+ hardware is misbehaving.
+
+ Note that you'll get more problems without CMCI than
+ with due to the shared banks, i.e. you might get
+ duplicated error logs.
+
+ dont_log_ce
+ don't make logs for corrected errors. All events
+ reported as corrected are silently cleared by OS. This
+ option will be useful if you have no interest in any
+ of corrected errors.
+
+ ignore_ce
+ disable features for corrected errors, e.g.
+ polling timer and CMCI. All events reported as
+ corrected are not cleared by OS and remained in its
+ error banks.
+
+ Usually this disablement is not recommended, however
+ if there is an agent checking/clearing corrected
+ errors (e.g. BIOS or hardware monitoring
+ applications), conflicting with OS's error handling,
+ and you cannot deactivate the agent, then this option
+ will be a help.
+
+ no_lmce
+ do not opt-in to Local MCE delivery. Use legacy method
+ to broadcast MCEs.
+
+ bootlog
+ enable logging of machine checks left over from
+ booting. Disabled by default on AMD Fam10h and older
+ because some BIOS leave bogus ones.
+
+ If your BIOS doesn't do that it's a good idea to
+ enable though to make sure you log even machine check
+ events that result in a reboot. On Intel systems it is
+ enabled by default.
+
+ nobootlog
+ disable boot machine check logging.
+
+ monarchtimeout (number)
+ sets the time in us to wait for other CPUs on machine
+ checks. 0 to disable.
+
+ bios_cmci_threshold
+ don't overwrite the bios-set CMCI threshold. This boot
+ option prevents Linux from overwriting the CMCI
+ threshold set by the bios. Without this option, Linux
+ always sets the CMCI threshold to 1. Enabling this may
+ make memory predictive failure analysis less effective
+ if the bios sets thresholds for memory errors since we
+ will not see details for all errors.
+
+ recovery
+ force-enable recoverable machine check code paths
+
+ Everything else is in sysfs now.
- mce=option [X86-64] See Documentation/arch/x86/x86_64/boot-options.rst
md= [HW] RAID subsystems devices and level
See Documentation/admin-guide/md.rst.
@@ -3351,8 +3501,8 @@
[KNL] Set the initial state for the memory hotplug
onlining policy. If not specified, the default value is
set according to the
- CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config
- option.
+ CONFIG_MHP_DEFAULT_ONLINE_TYPE kernel config
+ options.
See Documentation/admin-guide/mm/memory-hotplug.rst.
memmap=exactmap [KNL,X86,EARLY] Enable setting of an exact
@@ -3887,6 +4037,8 @@
noapic [SMP,APIC,EARLY] Tells the kernel to not make use of any
IOAPICs that may be present in the system.
+ noapictimer [APIC,X86] Don't set up the APIC timer
+
noautogroup Disable scheduler automatic task group creation.
nocache [ARM,EARLY]
@@ -3934,6 +4086,8 @@
register save and restore. The kernel will only save
legacy floating-point registers on task switch.
+ nogbpages [X86] Do not use GB pages for kernel direct mappings.
+
no_hash_pointers
[KNL,EARLY]
Force pointers printed to the console or buffers to be
@@ -3960,6 +4114,8 @@
the impact of the sleep instructions. This is also
useful when using JTAG debugger.
+ nohpet [X86] Don't use the HPET timer.
+
nohugeiomap [KNL,X86,PPC,ARM64,EARLY] Disable kernel huge I/O mappings.
nohugevmalloc [KNL,X86,PPC,ARM64,EARLY] Disable kernel huge vmalloc mappings.
@@ -4111,8 +4267,10 @@
nosync [HW,M68K] Disables sync negotiation for all devices.
- no_timer_check [X86,APIC] Disables the code which tests for
- broken timer IRQ sources.
+ no_timer_check [X86,APIC] Disables the code which tests for broken
+ timer IRQ sources, i.e., the IO-APIC timer. This can
+ work around problems with incorrect timer
+ initialization on some boards.
no_uaccess_flush
[PPC,EARLY] Don't flush the L1-D cache after accessing user data.
@@ -4192,6 +4350,11 @@
If given as an integer followed by 'U', it will
divide each physical node into N emulated nodes.
+ numa=noacpi [X86] Don't parse the SRAT table for NUMA setup
+
+ numa=nohmat [X86] Don't parse the HMAT table for NUMA setup, or
+ soft-reserved memory partitioning.
+
numa_balancing= [KNL,ARM64,PPC,RISCV,S390,X86] Enable or disable automatic
NUMA balancing.
Allowed values are enable and disable
@@ -4673,7 +4836,7 @@
'1' – force enabled
'x' – unchanged
For example,
- pci=config_acs=10x
+ pci=config_acs=10x@pci:0:0
would configure all devices that support
ACS to enable P2P Request Redirect, disable
Translation Blocking, and leave Source
@@ -5367,7 +5530,42 @@
rcutorture.gp_cond= [KNL]
Use conditional/asynchronous update-side
- primitives, if available.
+ normal-grace-period primitives, if available.
+
+ rcutorture.gp_cond_exp= [KNL]
+ Use conditional/asynchronous update-side
+ expedited-grace-period primitives, if available.
+
+ rcutorture.gp_cond_full= [KNL]
+ Use conditional/asynchronous update-side
+ normal-grace-period primitives that also take
+ concurrent expedited grace periods into account,
+ if available.
+
+ rcutorture.gp_cond_exp_full= [KNL]
+ Use conditional/asynchronous update-side
+ expedited-grace-period primitives that also take
+ concurrent normal grace periods into account,
+ if available.
+
+ rcutorture.gp_cond_wi= [KNL]
+ Nominal wait interval for normal conditional
+ grace periods (specified by rcutorture's
+ gp_cond and gp_cond_full module parameters),
+ in microseconds. The actual wait interval will
+ be randomly selected to nanosecond granularity up
+ to this wait interval. Defaults to 16 jiffies,
+ for example, 16,000 microseconds on a system
+ with HZ=1000.
+
+ rcutorture.gp_cond_wi_exp= [KNL]
+ Nominal wait interval for expedited conditional
+ grace periods (specified by rcutorture's
+ gp_cond_exp and gp_cond_exp_full module
+ parameters), in microseconds. The actual wait
+ interval will be randomly selected to nanosecond
+ granularity up to this wait interval. Defaults to
+ 128 microseconds.
rcutorture.gp_exp= [KNL]
Use expedited update-side primitives, if available.
@@ -5376,6 +5574,43 @@
Use normal (non-expedited) asynchronous
update-side primitives, if available.
+ rcutorture.gp_poll= [KNL]
+ Use polled update-side normal-grace-period
+ primitives, if available.
+
+ rcutorture.gp_poll_exp= [KNL]
+ Use polled update-side expedited-grace-period
+ primitives, if available.
+
+ rcutorture.gp_poll_full= [KNL]
+ Use polled update-side normal-grace-period
+ primitives that also take concurrent expedited
+ grace periods into account, if available.
+
+ rcutorture.gp_poll_exp_full= [KNL]
+ Use polled update-side expedited-grace-period
+ primitives that also take concurrent normal
+ grace periods into account, if available.
+
+ rcutorture.gp_poll_wi= [KNL]
+ Nominal wait interval for normal conditional
+ grace periods (specified by rcutorture's
+ gp_poll and gp_poll_full module parameters),
+ in microseconds. The actual wait interval will
+ be randomly selected to nanosecond granularity up
+ to this wait interval. Defaults to 16 jiffies,
+ for example, 16,000 microseconds on a system
+ with HZ=1000.
+
+ rcutorture.gp_poll_wi_exp= [KNL]
+ Nominal wait interval for expedited conditional
+ grace periods (specified by rcutorture's
+ gp_poll_exp and gp_poll_exp_full module
+ parameters), in microseconds. The actual wait
+ interval will be randomly selected to nanosecond
+ granularity up to this wait interval. Defaults to
+ 128 microseconds.
+
rcutorture.gp_sync= [KNL]
Use normal (non-expedited) synchronous
update-side primitives, if available. If all
@@ -5429,6 +5664,22 @@
Set time (jiffies) between CPU-hotplug operations,
or zero to disable CPU-hotplug testing.
+ rcutorture.preempt_duration= [KNL]
+ Set duration (in milliseconds) of preemptions
+ by a high-priority FIFO real-time task. Set to
+ zero (the default) to disable. The CPUs to
+ preempt are selected randomly from the set that
+ are online at a given point in time. Races with
+ CPUs going offline are ignored, with that attempt
+ at preemption skipped.
+
+ rcutorture.preempt_interval= [KNL]
+ Set interval (in milliseconds, defaulting to one
+ second) between preemptions by a high-priority
+ FIFO real-time task. This delay is mediated
+ by an hrtimer and is further fuzzed to avoid
+ inadvertent synchronizations.
+
rcutorture.read_exit_burst= [KNL]
The number of times in a given read-then-exit
episode that a set of read-then-exit kthreads
@@ -5715,6 +5966,55 @@
reboot_cpu is s[mp]#### with #### being the processor
to be used for rebooting.
+ acpi
+ Use the ACPI RESET_REG in the FADT. If ACPI is not
+ configured or the ACPI reset does not work, the reboot
+ path attempts the reset using the keyboard controller.
+
+ bios
+ Use the CPU reboot vector for warm reset
+
+ cold
+ Set the cold reboot flag
+
+ default
+ There are some built-in platform specific "quirks"
+ - you may see: "reboot: <name> series board detected.
+ Selecting <type> for reboots." In the case where you
+ think the quirk is in error (e.g. you have newer BIOS,
+ or newer board) using this option will ignore the
+ built-in quirk table, and use the generic default
+ reboot actions.
+
+ efi
+ Use efi reset_system runtime service. If EFI is not
+ configured or the EFI reset does not work, the reboot
+ path attempts the reset using the keyboard controller.
+
+ force
+ Don't stop other CPUs on reboot. This can make reboot
+ more reliable in some cases.
+
+ kbd
+ Use the keyboard controller. cold reset (default)
+
+ pci
+ Use a write to the PCI config space register 0xcf9 to
+ trigger reboot.
+
+ triple
+ Force a triple fault (init)
+
+ warm
+ Don't set the cold reboot flag
+
+ Using warm reset will be much faster especially on big
+ memory systems because the BIOS will not go through
+ the memory check. Disadvantage is that not all
+ hardware will be completely reinitialized on reboot so
+ there may be boot problems on some systems.
+
+
refscale.holdoff= [KNL]
Set test-start holdoff period. The purpose of
this parameter is to delay the start of the
@@ -6106,7 +6406,16 @@
serialnumber [BUGS=X86-32]
- sev=option[,option...] [X86-64] See Documentation/arch/x86/x86_64/boot-options.rst
+ sev=option[,option...] [X86-64]
+
+ debug
+ Enable debug messages.
+
+ nosnp
+ Do not enable SEV-SNP (applies to host/hypervisor
+ only). Setting 'nosnp' avoids the RMP check overhead
+ in memory accesses when users do not want to run
+ SEV-SNP guests.
shapers= [NET]
Maximal number of shapers.
@@ -6858,6 +7167,14 @@
comma-separated list of trace events to enable. See
also Documentation/trace/events.rst
+ To enable modules, use :mod: keyword:
+
+ trace_event=:mod:<module>
+
+ The value before :mod: will only enable specific events
+ that are part of the module. See the above mentioned
+ document for more information.
+
trace_instance=[instance-info]
[FTRACE] Create a ring buffer instance early in boot up.
This will be listed in:
@@ -6992,6 +7309,13 @@
See Documentation/admin-guide/mm/transhuge.rst
for more details.
+ transparent_hugepage_tmpfs= [KNL]
+ Format: [always|within_size|advise|never]
+ Can be used to control the default hugepage allocation policy
+ for the tmpfs mount.
+ See Documentation/admin-guide/mm/transhuge.rst
+ for more details.
+
trusted.source= [KEYS]
Format: <string>
This parameter identifies the trust source as a backend
@@ -7474,7 +7798,7 @@
vt.cur_default= [VT] Default cursor shape.
Format: 0xCCBBAA, where AA, BB, and CC are the same as
the parameters of the <Esc>[?A;B;Cc escape sequence;
- see VGA-softcursor.txt. Default: 2 = underline.
+ see vga-softcursor.rst. Default: 2 = underline.
vt.default_blu= [VT]
Format: <blue0>,<blue1>,<blue2>,...,<blue15>
diff --git a/Documentation/admin-guide/media/ipu3.rst b/Documentation/admin-guide/media/ipu3.rst
index 83b3cd03b35c..9c190942932e 100644
--- a/Documentation/admin-guide/media/ipu3.rst
+++ b/Documentation/admin-guide/media/ipu3.rst
@@ -98,7 +98,7 @@ frames in packed raw Bayer format to IPU3 CSI2 receiver.
# and that ov5670 sensor is connected to i2c bus 10 with address 0x36
export SDEV=$(media-ctl -d $MDEV -e "ov5670 10-0036")
- # Establish the link for the media devices using media-ctl [#f3]_
+ # Establish the link for the media devices using media-ctl
media-ctl -d $MDEV -l "ov5670:0 -> ipu3-csi2 0:0[1]"
# Set the format for the media devices
@@ -589,12 +589,8 @@ preserved.
References
==========
-.. [#f5] drivers/staging/media/ipu3/include/uapi/intel-ipu3.h
-
.. [#f1] https://github.com/intel/nvt
.. [#f2] http://git.ideasonboard.org/yavta.git
-.. [#f3] http://git.ideasonboard.org/?p=media-ctl.git;a=summary
-
.. [#f4] ImgU limitation requires an additional 16x16 for all input resolutions
diff --git a/Documentation/admin-guide/mm/damon/start.rst b/Documentation/admin-guide/mm/damon/start.rst
index c4dddf6733cd..ede14b679d02 100644
--- a/Documentation/admin-guide/mm/damon/start.rst
+++ b/Documentation/admin-guide/mm/damon/start.rst
@@ -42,32 +42,45 @@ the execution. ::
$ git clone https://github.com/sjp38/masim; cd masim; make
$ sudo damo start "./masim ./configs/stairs.cfg --quiet"
- $ sudo ./damo show
- 0 addr [85.541 TiB , 85.541 TiB ) (57.707 MiB ) access 0 % age 10.400 s
- 1 addr [85.541 TiB , 85.542 TiB ) (413.285 MiB) access 0 % age 11.400 s
- 2 addr [127.649 TiB , 127.649 TiB) (57.500 MiB ) access 0 % age 1.600 s
- 3 addr [127.649 TiB , 127.649 TiB) (32.500 MiB ) access 0 % age 500 ms
- 4 addr [127.649 TiB , 127.649 TiB) (9.535 MiB ) access 100 % age 300 ms
- 5 addr [127.649 TiB , 127.649 TiB) (8.000 KiB ) access 60 % age 0 ns
- 6 addr [127.649 TiB , 127.649 TiB) (6.926 MiB ) access 0 % age 1 s
- 7 addr [127.998 TiB , 127.998 TiB) (120.000 KiB) access 0 % age 11.100 s
- 8 addr [127.998 TiB , 127.998 TiB) (8.000 KiB ) access 40 % age 100 ms
- 9 addr [127.998 TiB , 127.998 TiB) (4.000 KiB ) access 0 % age 11 s
- total size: 577.590 MiB
- $ sudo ./damo stop
+ $ sudo damo report access
+ heatmap: 641111111000000000000000000000000000000000000000000000[...]33333333333333335557984444[...]7
+ # min/max temperatures: -1,840,000,000, 370,010,000, column size: 3.925 MiB
+ 0 addr 86.182 TiB size 8.000 KiB access 0 % age 14.900 s
+ 1 addr 86.182 TiB size 8.000 KiB access 60 % age 0 ns
+ 2 addr 86.182 TiB size 3.422 MiB access 0 % age 4.100 s
+ 3 addr 86.182 TiB size 2.004 MiB access 95 % age 2.200 s
+ 4 addr 86.182 TiB size 29.688 MiB access 0 % age 14.100 s
+ 5 addr 86.182 TiB size 29.516 MiB access 0 % age 16.700 s
+ 6 addr 86.182 TiB size 29.633 MiB access 0 % age 17.900 s
+ 7 addr 86.182 TiB size 117.652 MiB access 0 % age 18.400 s
+ 8 addr 126.990 TiB size 62.332 MiB access 0 % age 9.500 s
+ 9 addr 126.990 TiB size 13.980 MiB access 0 % age 5.200 s
+ 10 addr 126.990 TiB size 9.539 MiB access 100 % age 3.700 s
+ 11 addr 126.990 TiB size 16.098 MiB access 0 % age 6.400 s
+ 12 addr 127.987 TiB size 132.000 KiB access 0 % age 2.900 s
+ total size: 314.008 MiB
+ $ sudo damo stop
The first command of the above example downloads and builds an artificial
memory access generator program called ``masim``. The second command asks DAMO
-to execute the artificial generator process start via the given command and
-make DAMON monitors the generator process. The third command retrieves the
-current snapshot of the monitored access pattern of the process from DAMON and
-shows the pattern in a human readable format.
-
-Each line of the output shows which virtual address range (``addr [XX, XX)``)
-of the process is how frequently (``access XX %``) accessed for how long time
-(``age XX``). For example, the fifth region of ~9 MiB size is being most
-frequently accessed for last 300 milliseconds. Finally, the fourth command
-stops DAMON.
+to start the program via the given command and make DAMON monitors the newly
+started process. The third command retrieves the current snapshot of the
+monitored access pattern of the process from DAMON and shows the pattern in a
+human readable format.
+
+The first line of the output shows the relative access temperature (hotness) of
+the regions in a single row hetmap format. Each column on the heatmap
+represents regions of same size on the monitored virtual address space. The
+position of the colun on the row and the number on the column represents the
+relative location and access temperature of the region. ``[...]`` means
+unmapped huge regions on the virtual address spaces. The second line shows
+additional information for better understanding the heatmap.
+
+Each line of the output from the third line shows which virtual address range
+(``addr XX size XX``) of the process is how frequently (``access XX %``)
+accessed for how long time (``age XX``). For example, the evelenth region of
+~9.5 MiB size is being most frequently accessed for last 3.7 seconds. Finally,
+the fourth command stops DAMON.
Note that DAMON can monitor not only virtual address spaces but multiple types
of address spaces including the physical address space.
@@ -95,7 +108,7 @@ Visualizing Recorded Patterns
You can visualize the pattern in a heatmap, showing which memory region
(x-axis) got accessed when (y-axis) and how frequently (number).::
- $ sudo damo report heats --heatmap stdout
+ $ sudo damo report heatmap
22222222222222222222222222222222222222211111111111111111111111111111111111111100
44444444444444444444444444444444444444434444444444444444444444444444444444443200
44444444444444444444444444444444444444433444444444444444444444444444444444444200
@@ -160,6 +173,6 @@ Data Access Pattern Aware Memory Management
Below command makes every memory region of size >=4K that has not accessed for
>=60 seconds in your workload to be swapped out. ::
- $ sudo damo schemes --damos_access_rate 0 0 --damos_sz_region 4K max \
- --damos_age 60s max --damos_action pageout \
- <pid of your workload>
+ $ sudo damo start --damos_access_rate 0 0 --damos_sz_region 4K max \
+ --damos_age 60s max --damos_action pageout \
+ <pid of your workload>
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index d9be9f7caa7d..47a44bd348ab 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -26,12 +26,6 @@ DAMON provides below interfaces for different users.
writing kernel space DAMON application programs for you. You can even extend
DAMON for various address spaces. For detail, please refer to the interface
:doc:`document </mm/damon/api>`.
-- *debugfs interface. (DEPRECATED!)*
- :ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface
- <sysfs_interface>`. This is deprecated, so users should move to the
- :ref:`sysfs interface <sysfs_interface>`. If you depend on this and cannot
- move, please report your usecase to damon@lists.linux.dev and
- linux-mm@kvack.org.
.. _sysfs_interface:
@@ -89,10 +83,10 @@ comma (",").
│ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value
│ │ │ │ │ │ │ :ref:`watermarks <sysfs_watermarks>`/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ :ref:`filters <sysfs_filters>`/nr_filters
- │ │ │ │ │ │ │ │ 0/type,matching,memcg_id
- │ │ │ │ │ │ │ :ref:`stats <sysfs_schemes_stats>`/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
+ │ │ │ │ │ │ │ │ 0/type,matching,allow,memcg_path,addr_start,addr_end,target_idx
+ │ │ │ │ │ │ │ :ref:`stats <sysfs_schemes_stats>`/nr_tried,sz_tried,nr_applied,sz_applied,sz_ops_filter_passed,qt_exceeds
│ │ │ │ │ │ │ :ref:`tried_regions <sysfs_schemes_tried_regions>`/total_bytes
- │ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age
+ │ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age,sz_filter_passed
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ ...
@@ -412,59 +406,62 @@ number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``. Each directory represents each filter. The filters are evaluated
in the numeric order.
-Each filter directory contains six files, namely ``type``, ``matcing``,
-``memcg_path``, ``addr_start``, ``addr_end``, and ``target_idx``. To ``type``
-file, you can write one of five special keywords: ``anon`` for anonymous pages,
-``memcg`` for specific memory cgroup, ``young`` for young pages, ``addr`` for
-specific address range (an open-ended interval), or ``target`` for specific
-DAMON monitoring target filtering. In case of the memory cgroup filtering, you
-can specify the memory cgroup of the interest by writing the path of the memory
-cgroup from the cgroups mount point to ``memcg_path`` file. In case of the
-address range filtering, you can specify the start and end address of the range
-to ``addr_start`` and ``addr_end`` files, respectively. For the DAMON
-monitoring target filtering, you can specify the index of the target between
-the list of the DAMON context's monitoring targets list to ``target_idx`` file.
-You can write ``Y`` or ``N`` to ``matching`` file to filter out pages that does
-or does not match to the type, respectively. Then, the scheme's action will
-not be applied to the pages that specified to be filtered out.
+Each filter directory contains seven files, namely ``type``, ``matching``,
+``allow``, ``memcg_path``, ``addr_start``, ``addr_end``, and ``target_idx``.
+To ``type`` file, you can write one of five special keywords: ``anon`` for
+anonymous pages, ``memcg`` for specific memory cgroup, ``young`` for young
+pages, ``addr`` for specific address range (an open-ended interval), or
+``target`` for specific DAMON monitoring target filtering. Meaning of the
+types are same to the description on the :ref:`design doc
+<damon_design_damos_filters>`.
+
+In case of the memory cgroup filtering, you can specify the memory cgroup of
+the interest by writing the path of the memory cgroup from the cgroups mount
+point to ``memcg_path`` file. In case of the address range filtering, you can
+specify the start and end address of the range to ``addr_start`` and
+``addr_end`` files, respectively. For the DAMON monitoring target filtering,
+you can specify the index of the target between the list of the DAMON context's
+monitoring targets list to ``target_idx`` file.
+
+You can write ``Y`` or ``N`` to ``matching`` file to specify whether the filter
+is for memory that matches the ``type``. You can write ``Y`` or ``N`` to
+``allow`` file to specify if applying the action to the memory that satisfies
+the ``type`` and ``matching`` should be allowed or not.
For example, below restricts a DAMOS action to be applied to only non-anonymous
pages of all memory cgroups except ``/having_care_already``.::
# echo 2 > nr_filters
- # # filter out anonymous pages
+ # # disallow anonymous pages
echo anon > 0/type
echo Y > 0/matching
+ echo N > 0/allow
# # further filter out all cgroups except one at '/having_care_already'
echo memcg > 1/type
echo /having_care_already > 1/memcg_path
echo Y > 1/matching
+ echo N > 1/allow
-Note that ``anon`` and ``memcg`` filters are currently supported only when
-``paddr`` :ref:`implementation <sysfs_context>` is being used.
-
-Also, memory regions that are filtered out by ``addr`` or ``target`` filters
-are not counted as the scheme has tried to those, while regions that filtered
-out by other type filters are counted as the scheme has tried to. The
-difference is applied to :ref:`stats <damos_stats>` and
-:ref:`tried regions <sysfs_schemes_tried_regions>`.
+Refer to the :ref:`DAMOS filters design documentation
+<damon_design_damos_filters>` for more details including how multiple filters
+of different ``allow`` works, when each of the filters are supported, and
+differences on stats.
.. _sysfs_schemes_stats:
schemes/<N>/stats/
------------------
-DAMON counts the total number and bytes of regions that each scheme is tried to
-be applied, the two numbers for the regions that each scheme is successfully
-applied, and the total number of the quota limit exceeds. This statistics can
-be used for online analysis or tuning of the schemes.
+DAMON counts statistics for each scheme. This statistics can be used for
+online analysis or tuning of the schemes. Refer to :ref:`design doc
+<damon_design_damos_stat>` for more details about the stats.
The statistics can be retrieved by reading the files under ``stats`` directory
-(``nr_tried``, ``sz_tried``, ``nr_applied``, ``sz_applied``, and
-``qt_exceeds``), respectively. The files are not updated in real time, so you
-should ask DAMON sysfs interface to update the content of the files for the
-stats by writing a special keyword, ``update_schemes_stats`` to the relevant
-``kdamonds/<N>/state`` file.
+(``nr_tried``, ``sz_tried``, ``nr_applied``, ``sz_applied``,
+``sz_ops_filter_passed``, and ``qt_exceeds``), respectively. The files are not
+updated in real time, so you should ask DAMON sysfs interface to update the
+content of the files for the stats by writing a special keyword,
+``update_schemes_stats`` to the relevant ``kdamonds/<N>/state`` file.
.. _sysfs_schemes_tried_regions:
@@ -501,10 +498,10 @@ set the ``access pattern`` as their interested pattern that they want to query.
tried_regions/<N>/
------------------
-In each region directory, you will find four files (``start``, ``end``,
-``nr_accesses``, and ``age``). Reading the files will show the start and end
-addresses, ``nr_accesses``, and ``age`` of the region that corresponding
-DAMON-based operation scheme ``action`` has tried to be applied.
+In each region directory, you will find five files (``start``, ``end``,
+``nr_accesses``, ``age``, and ``sz_filter_passed``). Reading the files will
+show the properties of the region that corresponding DAMON-based operation
+scheme ``action`` has tried to be applied.
Example
~~~~~~~
@@ -600,306 +597,3 @@ fields are as usual. It shows the index of the DAMON context (``ctx_idx=X``)
of the scheme in the list of the contexts of the context's kdamond, the index
of the scheme (``scheme_idx=X``) in the list of the schemes of the context, in
addition to the output of ``damon_aggregated`` tracepoint.
-
-
-.. _debugfs_interface:
-
-debugfs Interface (DEPRECATED!)
-===============================
-
-.. note::
-
- THIS IS DEPRECATED!
-
- DAMON debugfs interface is deprecated, so users should move to the
- :ref:`sysfs interface <sysfs_interface>`. If you depend on this and cannot
- move, please report your usecase to damon@lists.linux.dev and
- linux-mm@kvack.org.
-
-DAMON exports nine files, ``DEPRECATED``, ``attrs``, ``target_ids``,
-``init_regions``, ``schemes``, ``monitor_on_DEPRECATED``, ``kdamond_pid``,
-``mk_contexts`` and ``rm_contexts`` under its debugfs directory,
-``<debugfs>/damon/``.
-
-
-``DEPRECATED`` is a read-only file for the DAMON debugfs interface deprecation
-notice. Reading it returns the deprecation notice, as below::
-
- # cat DEPRECATED
- DAMON debugfs interface is deprecated, so users should move to DAMON_SYSFS. If you cannot, please report your usecase to damon@lists.linux.dev and linux-mm@kvack.org.
-
-
-Attributes
-----------
-
-Users can get and set the ``sampling interval``, ``aggregation interval``,
-``update interval``, and min/max number of monitoring target regions by
-reading from and writing to the ``attrs`` file. To know about the monitoring
-attributes in detail, please refer to the :doc:`/mm/damon/design`. For
-example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10 and
-1000, and then check it again::
-
- # cd <debugfs>/damon
- # echo 5000 100000 1000000 10 1000 > attrs
- # cat attrs
- 5000 100000 1000000 10 1000
-
-
-Target IDs
-----------
-
-Some types of address spaces supports multiple monitoring target. For example,
-the virtual memory address spaces monitoring can have multiple processes as the
-monitoring targets. Users can set the targets by writing relevant id values of
-the targets to, and get the ids of the current targets by reading from the
-``target_ids`` file. In case of the virtual address spaces monitoring, the
-values should be pids of the monitoring target processes. For example, below
-commands set processes having pids 42 and 4242 as the monitoring targets and
-check it again::
-
- # cd <debugfs>/damon
- # echo 42 4242 > target_ids
- # cat target_ids
- 42 4242
-
-Users can also monitor the physical memory address space of the system by
-writing a special keyword, "``paddr\n``" to the file. Because physical address
-space monitoring doesn't support multiple targets, reading the file will show a
-fake value, ``42``, as below::
-
- # cd <debugfs>/damon
- # echo paddr > target_ids
- # cat target_ids
- 42
-
-Note that setting the target ids doesn't start the monitoring.
-
-
-Initial Monitoring Target Regions
----------------------------------
-
-In case of the virtual address space monitoring, DAMON automatically sets and
-updates the monitoring target regions so that entire memory mappings of target
-processes can be covered. However, users can want to limit the monitoring
-region to specific address ranges, such as the heap, the stack, or specific
-file-mapped area. Or, some users can know the initial access pattern of their
-workloads and therefore want to set optimal initial regions for the 'adaptive
-regions adjustment'.
-
-In contrast, DAMON do not automatically sets and updates the monitoring target
-regions in case of physical memory monitoring. Therefore, users should set the
-monitoring target regions by themselves.
-
-In such cases, users can explicitly set the initial monitoring target regions
-as they want, by writing proper values to the ``init_regions`` file. The input
-should be a sequence of three integers separated by white spaces that represent
-one region in below form.::
-
- <target idx> <start address> <end address>
-
-The ``target idx`` should be the index of the target in ``target_ids`` file,
-starting from ``0``, and the regions should be passed in address order. For
-example, below commands will set a couple of address ranges, ``1-100`` and
-``100-200`` as the initial monitoring target region of pid 42, which is the
-first one (index ``0``) in ``target_ids``, and another couple of address
-ranges, ``20-40`` and ``50-100`` as that of pid 4242, which is the second one
-(index ``1``) in ``target_ids``.::
-
- # cd <debugfs>/damon
- # cat target_ids
- 42 4242
- # echo "0 1 100 \
- 0 100 200 \
- 1 20 40 \
- 1 50 100" > init_regions
-
-Note that this sets the initial monitoring target regions only. In case of
-virtual memory monitoring, DAMON will automatically updates the boundary of the
-regions after one ``update interval``. Therefore, users should set the
-``update interval`` large enough in this case, if they don't want the
-update.
-
-
-Schemes
--------
-
-Users can get and set the DAMON-based operation :ref:`schemes
-<damon_design_damos>` by reading from and writing to ``schemes`` debugfs file.
-Reading the file also shows the statistics of each scheme. To the file, each
-of the schemes should be represented in each line in below form::
-
- <target access pattern> <action> <quota> <watermarks>
-
-You can disable schemes by simply writing an empty string to the file.
-
-Target Access Pattern
-~~~~~~~~~~~~~~~~~~~~~
-
-The target access :ref:`pattern <damon_design_damos_access_pattern>` of the
-scheme. The ``<target access pattern>`` is constructed with three ranges in
-below form::
-
- min-size max-size min-acc max-acc min-age max-age
-
-Specifically, bytes for the size of regions (``min-size`` and ``max-size``),
-number of monitored accesses per aggregate interval for access frequency
-(``min-acc`` and ``max-acc``), number of aggregate intervals for the age of
-regions (``min-age`` and ``max-age``) are specified. Note that the ranges are
-closed interval.
-
-Action
-~~~~~~
-
-The ``<action>`` is a predefined integer for memory management :ref:`actions
-<damon_design_damos_action>`. The mapping between the ``<action>`` values and
-the memory management actions is as below. For the detailed meaning of the
-action and DAMON operations set supporting each action, please refer to the
-list on :ref:`design doc <damon_design_damos_action>`.
-
- - 0: ``willneed``
- - 1: ``cold``
- - 2: ``pageout``
- - 3: ``hugepage``
- - 4: ``nohugepage``
- - 5: ``stat``
-
-Quota
-~~~~~
-
-Users can set the :ref:`quotas <damon_design_damos_quotas>` of the given scheme
-via the ``<quota>`` in below form::
-
- <ms> <sz> <reset interval> <priority weights>
-
-This makes DAMON to try to use only up to ``<ms>`` milliseconds for applying
-the action to memory regions of the ``target access pattern`` within the
-``<reset interval>`` milliseconds, and to apply the action to only up to
-``<sz>`` bytes of memory regions within the ``<reset interval>``. Setting both
-``<ms>`` and ``<sz>`` zero disables the quota limits.
-
-For the :ref:`prioritization <damon_design_damos_quotas_prioritization>`, users
-can set the weights for the three properties in ``<priority weights>`` in below
-form::
-
- <size weight> <access frequency weight> <age weight>
-
-Watermarks
-~~~~~~~~~~
-
-Users can specify :ref:`watermarks <damon_design_damos_watermarks>` of the
-given scheme via ``<watermarks>`` in below form::
-
- <metric> <check interval> <high mark> <middle mark> <low mark>
-
-``<metric>`` is a predefined integer for the metric to be checked. The
-supported numbers and their meanings are as below.
-
- - 0: Ignore the watermarks
- - 1: System's free memory rate (per thousand)
-
-The value of the metric is checked every ``<check interval>`` microseconds.
-
-If the value is higher than ``<high mark>`` or lower than ``<low mark>``, the
-scheme is deactivated. If the value is lower than ``<mid mark>``, the scheme
-is activated.
-
-.. _damos_stats:
-
-Statistics
-~~~~~~~~~~
-
-It also counts the total number and bytes of regions that each scheme is tried
-to be applied, the two numbers for the regions that each scheme is successfully
-applied, and the total number of the quota limit exceeds. This statistics can
-be used for online analysis or tuning of the schemes.
-
-The statistics can be shown by reading the ``schemes`` file. Reading the file
-will show each scheme you entered in each line, and the five numbers for the
-statistics will be added at the end of each line.
-
-Example
-~~~~~~~
-
-Below commands applies a scheme saying "If a memory region of size in [4KiB,
-8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate
-interval in [10, 20], page out the region. For the paging out, use only up to
-10ms per second, and also don't page out more than 1GiB per second. Under the
-limitation, page out memory regions having longer age first. Also, check the
-free memory rate of the system every 5 seconds, start the monitoring and paging
-out when the free memory rate becomes lower than 50%, but stop it if the free
-memory rate becomes larger than 60%, or lower than 30%".::
-
- # cd <debugfs>/damon
- # scheme="4096 8192 0 5 10 20 2" # target access pattern and action
- # scheme+=" 10 $((1024*1024*1024)) 1000" # quotas
- # scheme+=" 0 0 100" # prioritization weights
- # scheme+=" 1 5000000 600 500 300" # watermarks
- # echo "$scheme" > schemes
-
-
-Turning On/Off
---------------
-
-Setting the files as described above doesn't incur effect unless you explicitly
-start the monitoring. You can start, stop, and check the current status of the
-monitoring by writing to and reading from the ``monitor_on_DEPRECATED`` file.
-Writing ``on`` to the file starts the monitoring of the targets with the
-attributes. Writing ``off`` to the file stops those. DAMON also stops if
-every target process is terminated. Below example commands turn on, off, and
-check the status of DAMON::
-
- # cd <debugfs>/damon
- # echo on > monitor_on_DEPRECATED
- # echo off > monitor_on_DEPRECATED
- # cat monitor_on_DEPRECATED
- off
-
-Please note that you cannot write to the above-mentioned debugfs files while
-the monitoring is turned on. If you write to the files while DAMON is running,
-an error code such as ``-EBUSY`` will be returned.
-
-
-Monitoring Thread PID
----------------------
-
-DAMON does requested monitoring with a kernel thread called ``kdamond``. You
-can get the pid of the thread by reading the ``kdamond_pid`` file. When the
-monitoring is turned off, reading the file returns ``none``. ::
-
- # cd <debugfs>/damon
- # cat monitor_on_DEPRECATED
- off
- # cat kdamond_pid
- none
- # echo on > monitor_on_DEPRECATED
- # cat kdamond_pid
- 18594
-
-
-Using Multiple Monitoring Threads
----------------------------------
-
-One ``kdamond`` thread is created for each monitoring context. You can create
-and remove monitoring contexts for multiple ``kdamond`` required use case using
-the ``mk_contexts`` and ``rm_contexts`` files.
-
-Writing the name of the new context to the ``mk_contexts`` file creates a
-directory of the name on the DAMON debugfs directory. The directory will have
-DAMON debugfs files for the context. ::
-
- # cd <debugfs>/damon
- # ls foo
- # ls: cannot access 'foo': No such file or directory
- # echo foo > mk_contexts
- # ls foo
- # attrs init_regions kdamond_pid schemes target_ids
-
-If the context is not needed anymore, you can remove it and the corresponding
-directory by putting the name of the context to the ``rm_contexts`` file. ::
-
- # echo foo > rm_contexts
- # ls foo
- # ls: cannot access 'foo': No such file or directory
-
-Note that ``mk_contexts``, ``rm_contexts``, and ``monitor_on_DEPRECATED`` files
-are in the root directory only.
diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index cb2c080f400c..33c886f3d198 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -280,8 +280,8 @@ The following files are currently defined:
blocks; configure auto-onlining.
The default value depends on the
- CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel configuration
- option.
+ CONFIG_MHP_DEFAULT_ONLINE_TYPE kernel configuration
+ options.
See the ``state`` property of memory blocks for details.
``block_size_bytes`` read-only: the size in bytes of a memory block.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 8872203df088..dff8d5985f0f 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -332,6 +332,12 @@ allocation policy for the internal shmem mount by using the kernel parameter
seven valid policies for shmem (``always``, ``within_size``, ``advise``,
``never``, ``deny``, and ``force``).
+Similarly to ``transparent_hugepage_shmem``, you can control the default
+hugepage allocation policy for the tmpfs mount by using the kernel parameter
+``transparent_hugepage_tmpfs=<policy>``, where ``<policy>`` is one of the
+four valid policies for tmpfs (``always``, ``within_size``, ``advise``,
+``never``). The tmpfs mount default policy is ``never``.
+
In the same manner as ``thp_anon`` controls each supported anonymous THP
size, ``thp_shmem`` controls each supported shmem THP size. ``thp_shmem``
has the same format as ``thp_anon``, but also supports the policy
@@ -352,8 +358,21 @@ default to ``never``.
Hugepages in tmpfs/shmem
========================
-You can control hugepage allocation policy in tmpfs with mount option
-``huge=``. It can have following values:
+Traditionally, tmpfs only supported a single huge page size ("PMD"). Today,
+it also supports smaller sizes just like anonymous memory, often referred
+to as "multi-size THP" (mTHP). Huge pages of any size are commonly
+represented in the kernel as "large folios".
+
+While there is fine control over the huge page sizes to use for the internal
+shmem mount (see below), ordinary tmpfs mounts will make use of all available
+huge page sizes without any control over the exact sizes, behaving more like
+other file systems.
+
+tmpfs mounts
+------------
+
+The THP allocation policy for tmpfs mounts can be adjusted using the mount
+option: ``huge=``. It can have following values:
always
Attempt to allocate huge pages every time we need a new page;
@@ -363,24 +382,24 @@ never
within_size
Only allocate huge page if it will be fully within i_size.
- Also respect fadvise()/madvise() hints;
+ Also respect madvise() hints;
advise
- Only allocate huge pages if requested with fadvise()/madvise();
+ Only allocate huge pages if requested with madvise();
+
+Remember, that the kernel may use huge pages of all available sizes, and
+that no fine control as for the internal tmpfs mount is available.
-The default policy is ``never``.
+The default policy in the past was ``never``, but it can now be adjusted
+using the kernel parameter ``transparent_hugepage_tmpfs=<policy>``.
``mount -o remount,huge= /mountpoint`` works fine after mount: remounting
``huge=never`` will not attempt to break up huge pages at all, just stop more
from being allocated.
-There's also sysfs knob to control hugepage allocation policy for internal
-shmem mount: /sys/kernel/mm/transparent_hugepage/shmem_enabled. The mount
-is used for SysV SHM, memfds, shared anonymous mmaps (of /dev/zero or
-MAP_ANONYMOUS), GPU drivers' DRM objects, Ashmem.
-
-In addition to policies listed above, shmem_enabled allows two further
-values:
+In addition to policies listed above, the sysfs knob
+/sys/kernel/mm/transparent_hugepage/shmem_enabled will affect the
+allocation policy of tmpfs mounts, when set to the following values:
deny
For use in emergencies, to force the huge option off from
@@ -388,13 +407,24 @@ deny
force
Force the huge option on for all - very useful for testing;
-Shmem can also use "multi-size THP" (mTHP) by adding a new sysfs knob to
-control mTHP allocation:
-'/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled',
-and its value for each mTHP is essentially consistent with the global
-setting. An 'inherit' option is added to ensure compatibility with these
-global settings. Conversely, the options 'force' and 'deny' are dropped,
-which are rather testing artifacts from the old ages.
+shmem / internal tmpfs
+----------------------
+The mount internal tmpfs mount is used for SysV SHM, memfds, shared anonymous
+mmaps (of /dev/zero or MAP_ANONYMOUS), GPU drivers' DRM objects, Ashmem.
+
+To control the THP allocation policy for this internal tmpfs mount, the
+sysfs knob /sys/kernel/mm/transparent_hugepage/shmem_enabled and the knobs
+per THP size in
+'/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled'
+can be used.
+
+The global knob has the same semantics as the ``huge=`` mount options
+for tmpfs mounts, except that the different huge page sizes can be controlled
+individually, and will only use the setting of the global knob when the
+per-size knob is set to 'inherit'.
+
+The options 'force' and 'deny' are dropped for the individual sizes, which
+are rather testing artifacts from the old ages.
always
Attempt to allocate <size> huge pages every time we need a new page;
@@ -408,10 +438,10 @@ never
within_size
Only allocate <size> huge page if it will be fully within i_size.
- Also respect fadvise()/madvise() hints;
+ Also respect madvise() hints;
advise
- Only allocate <size> huge pages if requested with fadvise()/madvise();
+ Only allocate <size> huge pages if requested with madvise();
Need of application restart
===========================
@@ -561,6 +591,16 @@ swpin
is incremented every time a huge page is swapped in from a non-zswap
swap device in one piece.
+swpin_fallback
+ is incremented if swapin fails to allocate or charge a huge page
+ and instead falls back to using huge pages with lower orders or
+ small pages.
+
+swpin_fallback_charge
+ is incremented if swapin fails to charge a huge page and instead
+ falls back to using huge pages with lower orders or small pages
+ even though the allocation was successful.
+
swpout
is incremented every time a huge page is swapped out to a non-zswap
swap device in one piece without splitting.
diff --git a/Documentation/admin-guide/nvme-multipath.rst b/Documentation/admin-guide/nvme-multipath.rst
new file mode 100644
index 000000000000..97ca1ccef459
--- /dev/null
+++ b/Documentation/admin-guide/nvme-multipath.rst
@@ -0,0 +1,72 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Linux NVMe multipath
+====================
+
+This document describes NVMe multipath and its path selection policies supported
+by the Linux NVMe host driver.
+
+
+Introduction
+============
+
+The NVMe multipath feature in Linux integrates namespaces with the same
+identifier into a single block device. Using multipath enhances the reliability
+and stability of I/O access while improving bandwidth performance. When a user
+sends I/O to this merged block device, the multipath mechanism selects one of
+the underlying block devices (paths) according to the configured policy.
+Different policies result in different path selections.
+
+
+Policies
+========
+
+All policies follow the ANA (Asymmetric Namespace Access) mechanism, meaning
+that when an optimized path is available, it will be chosen over a non-optimized
+one. Current the NVMe multipath policies include numa(default), round-robin and
+queue-depth.
+
+To set the desired policy (e.g., round-robin), use one of the following methods:
+ 1. echo -n "round-robin" > /sys/module/nvme_core/parameters/iopolicy
+ 2. or add the "nvme_core.iopolicy=round-robin" to cmdline.
+
+
+NUMA
+----
+
+The NUMA policy selects the path closest to the NUMA node of the current CPU for
+I/O distribution. This policy maintains the nearest paths to each NUMA node
+based on network interface connections.
+
+When to use the NUMA policy:
+ 1. Multi-core Systems: Optimizes memory access in multi-core and
+ multi-processor systems, especially under NUMA architecture.
+ 2. High Affinity Workloads: Binds I/O processing to the CPU to reduce
+ communication and data transfer delays across nodes.
+
+
+Round-Robin
+-----------
+
+The round-robin policy distributes I/O requests evenly across all paths to
+enhance throughput and resource utilization. Each I/O operation is sent to the
+next path in sequence.
+
+When to use the round-robin policy:
+ 1. Balanced Workloads: Effective for balanced and predictable workloads with
+ similar I/O size and type.
+ 2. Homogeneous Path Performance: Utilizes all paths efficiently when
+ performance characteristics (e.g., latency, bandwidth) are similar.
+
+
+Queue-Depth
+-----------
+
+The queue-depth policy manages I/O requests based on the current queue depth
+of each path, selecting the path with the least number of in-flight I/Os.
+
+When to use the queue-depth policy:
+ 1. High load with small I/Os: Effectively balances load across paths when
+ the load is high, and I/O operations consist of small, relatively
+ fixed-sized requests.
diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
index 39b8e1fdd0cd..cb376f335f40 100644
--- a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
+++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
@@ -60,7 +60,7 @@ description of available events and configuration options in sysfs, see
The "format" directory describes format of the config fields of the
perf_event_attr structure. The "events" directory provides configuration
templates for all documented events. For example,
-"Rx_PCIe_TLP_Data_Payload" is an equivalent of "eventid=0x22,type=0x1".
+"rx_pcie_tlp_data_payload" is an equivalent of "eventid=0x21,type=0x0".
The "perf list" command shall list the available events from sysfs, e.g.::
@@ -79,8 +79,8 @@ Example usage of counting PCIe RX TLP data payload (Units of bytes)::
The average RX/TX bandwidth can be calculated using the following formula:
- PCIe RX Bandwidth = Rx_PCIe_TLP_Data_Payload / Measure_Time_Window
- PCIe TX Bandwidth = Tx_PCIe_TLP_Data_Payload / Measure_Time_Window
+ PCIe RX Bandwidth = rx_pcie_tlp_data_payload / Measure_Time_Window
+ PCIe TX Bandwidth = tx_pcie_tlp_data_payload / Measure_Time_Window
Lane Event Usage
-------------------------------
diff --git a/Documentation/admin-guide/perf/hisi-pmu.rst b/Documentation/admin-guide/perf/hisi-pmu.rst
index 5cc248d18c63..48992a0b8e94 100644
--- a/Documentation/admin-guide/perf/hisi-pmu.rst
+++ b/Documentation/admin-guide/perf/hisi-pmu.rst
@@ -35,7 +35,10 @@ e.g. hisi_sccl1_hha0/rx_operations is RX_OPERATIONS event of HHA index #0 in
SCCL ID #1.
The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
-ID used to count the uncore PMU event.
+ID used to count the uncore PMU event. An "associated_cpus" sysfs attribute is
+also provided to show the CPUs associated with this PMU. The "cpumask" indicates
+the CPUs to open the events, usually as a hint for userspaces tools like perf.
+It only contains one associated CPU from the "associated_cpus".
Example usage of perf::
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index a58bd3f7e190..072b510385c4 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -14,6 +14,8 @@ Performance monitor support
qcom_l2_pmu
qcom_l3_pmu
starfive_starlink_pmu
+ mrvl-odyssey-ddr-pmu
+ mrvl-odyssey-tad-pmu
arm-ccn
arm-cmn
arm-ni
diff --git a/Documentation/admin-guide/perf/mrvl-odyssey-ddr-pmu.rst b/Documentation/admin-guide/perf/mrvl-odyssey-ddr-pmu.rst
new file mode 100644
index 000000000000..2e817593a4d9
--- /dev/null
+++ b/Documentation/admin-guide/perf/mrvl-odyssey-ddr-pmu.rst
@@ -0,0 +1,80 @@
+===================================================================
+Marvell Odyssey DDR PMU Performance Monitoring Unit (PMU UNCORE)
+===================================================================
+
+Odyssey DRAM Subsystem supports eight counters for monitoring performance
+and software can program those counters to monitor any of the defined
+performance events. Supported performance events include those counted
+at the interface between the DDR controller and the PHY, interface between
+the DDR Controller and the CHI interconnect, or within the DDR Controller.
+
+Additionally DSS also supports two fixed performance event counters, one
+for ddr reads and the other for ddr writes.
+
+The counter will be operating in either manual or auto mode.
+
+The PMU driver exposes the available events and format options under sysfs::
+
+ /sys/bus/event_source/devices/mrvl_ddr_pmu_<>/events/
+ /sys/bus/event_source/devices/mrvl_ddr_pmu_<>/format/
+
+Examples::
+
+ $ perf list | grep ddr
+ mrvl_ddr_pmu_<>/ddr_act_bypass_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_bsm_alloc/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_bsm_starvation/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_active_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_mwr/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_rd_active_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_rd_or_wr_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_read/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_wr_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_cam_write/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_capar_error/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_crit_ref/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_ddr_reads/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_ddr_writes/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dfi_cmd_is_retry/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dfi_cycles/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dfi_parity_poison/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dfi_rd_data_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dfi_wr_data_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dqsosc_mpc/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_dqsosc_mrr/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_enter_mpsm/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_enter_powerdown/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_enter_selfref/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_hif_pri_rdaccess/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_hif_rd_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_hif_rd_or_wr_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_hif_rmw_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_hif_wr_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_hpri_sched_rd_crit_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_load_mode/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_lpri_sched_rd_crit_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_precharge/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_precharge_for_other/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_precharge_for_rdwr/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_raw_hazard/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_rd_bypass_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_rd_crc_error/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_rd_uc_ecc_error/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_rdwr_transitions/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_refresh/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_retry_fifo_full/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_spec_ref/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_tcr_mrr/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_war_hazard/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_waw_hazard/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_win_limit_reached_rd/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_win_limit_reached_wr/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_wr_crc_error/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_wr_trxn_crit_access/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_write_combine/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_zqcl/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_zqlatch/ [Kernel PMU event]
+ mrvl_ddr_pmu_<>/ddr_zqstart/ [Kernel PMU event]
+
+ $ perf stat -e ddr_cam_read,ddr_cam_write,ddr_cam_active_access,ddr_cam
+ rd_or_wr_access,ddr_cam_rd_active_access,ddr_cam_mwr <workload>
diff --git a/Documentation/admin-guide/perf/mrvl-odyssey-tad-pmu.rst b/Documentation/admin-guide/perf/mrvl-odyssey-tad-pmu.rst
new file mode 100644
index 000000000000..ad1975b14087
--- /dev/null
+++ b/Documentation/admin-guide/perf/mrvl-odyssey-tad-pmu.rst
@@ -0,0 +1,37 @@
+====================================================================
+Marvell Odyssey LLC-TAD Performance Monitoring Unit (PMU UNCORE)
+====================================================================
+
+Each TAD provides eight 64-bit counters for monitoring
+cache behavior.The driver always configures the same counter for
+all the TADs. The user would end up effectively reserving one of
+eight counters in every TAD to look across all TADs.
+The occurrences of events are aggregated and presented to the user
+at the end of running the workload. The driver does not provide a
+way for the user to partition TADs so that different TADs are used for
+different applications.
+
+The performance events reflect various internal or interface activities.
+By combining the values from multiple performance counters, cache
+performance can be measured in terms such as: cache miss rate, cache
+allocations, interface retry rate, internal resource occupancy, etc.
+
+The PMU driver exposes the available events and format options under sysfs::
+
+ /sys/bus/event_source/devices/tad/events/
+ /sys/bus/event_source/devices/tad/format/
+
+Examples::
+
+ $ perf list | grep tad
+ tad/tad_alloc_any/ [Kernel PMU event]
+ tad/tad_alloc_dtg/ [Kernel PMU event]
+ tad/tad_alloc_ltg/ [Kernel PMU event]
+ tad/tad_hit_any/ [Kernel PMU event]
+ tad/tad_hit_dtg/ [Kernel PMU event]
+ tad/tad_hit_ltg/ [Kernel PMU event]
+ tad/tad_req_msh_in_exlmn/ [Kernel PMU event]
+ tad/tad_tag_rd/ [Kernel PMU event]
+ tad/tad_tot_cycle/ [Kernel PMU event]
+
+ $ perf stat -e tad_alloc_dtg,tad_alloc_ltg,tad_alloc_any,tad_hit_dtg,tad_hit_ltg,tad_hit_any,tad_tag_rd <workload>
diff --git a/Documentation/admin-guide/perf/nvidia-pmu.rst b/Documentation/admin-guide/perf/nvidia-pmu.rst
index 2e0d47cfe7ea..f538ef67e0e8 100644
--- a/Documentation/admin-guide/perf/nvidia-pmu.rst
+++ b/Documentation/admin-guide/perf/nvidia-pmu.rst
@@ -34,7 +34,7 @@ strongly-ordered (SO) PCIE write traffic to local/remote memory. Please see
traffic coverage.
The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_scf_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_scf_pmu_<socket-id>.
Example usage:
@@ -66,7 +66,7 @@ Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
the PMU traffic coverage.
The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.
Example usage:
@@ -86,6 +86,22 @@ Example usage:
perf stat -a -e nvidia_nvlink_c2c0_pmu_3/event=0x0/
+The NVLink-C2C has two ports that can be connected to one GPU (occupying both
+ports) or to two GPUs (one GPU per port). The user can use "port" bitmap
+parameter to select the port(s) to monitor. Each bit represents the port number,
+e.g. "port=0x1" corresponds to port 0 and "port=0x3" is for port 0 and 1. The
+PMU will monitor both ports by default if not specified.
+
+Example for port filtering:
+
+* Count event id 0x0 from the GPU connected with socket 0 on port 0::
+
+ perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x1/
+
+* Count event id 0x0 from the GPUs connected with socket 0 on port 0 and port 1::
+
+ perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x3/
+
NVLink-C2C1 PMU
-------------------
@@ -96,7 +112,7 @@ Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
the PMU traffic coverage.
The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.
Example usage:
@@ -116,6 +132,22 @@ Example usage:
perf stat -a -e nvidia_nvlink_c2c1_pmu_3/event=0x0/
+The NVLink-C2C has two ports that can be connected to one GPU (occupying both
+ports) or to two GPUs (one GPU per port). The user can use "port" bitmap
+parameter to select the port(s) to monitor. Each bit represents the port number,
+e.g. "port=0x1" corresponds to port 0 and "port=0x3" is for port 0 and 1. The
+PMU will monitor both ports by default if not specified.
+
+Example for port filtering:
+
+* Count event id 0x0 from the GPU connected with socket 0 on port 0::
+
+ perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0,port=0x1/
+
+* Count event id 0x0 from the GPUs connected with socket 0 on port 0 and port 1::
+
+ perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0,port=0x3/
+
CNVLink PMU
---------------
@@ -125,13 +157,14 @@ to local memory. For PCIE traffic, this PMU captures read and relaxed ordered
for more info about the PMU traffic coverage.
The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>.
Each SoC socket can be connected to one or more sockets via CNVLink. The user can
use "rem_socket" bitmap parameter to select the remote socket(s) to monitor.
Each bit represents the socket number, e.g. "rem_socket=0xE" corresponds to
-socket 1 to 3.
-/sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
+socket 1 to 3. The PMU will monitor all remote sockets by default if not
+specified.
+/sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
shows the valid bits that can be set in the "rem_socket" parameter.
The PMU can not distinguish the remote traffic initiator, therefore it does not
@@ -165,12 +198,13 @@ local/remote memory. Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section
for more info about the PMU traffic coverage.
The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>.
Each SoC socket can support multiple root ports. The user can use
"root_port" bitmap parameter to select the port(s) to monitor, i.e.
-"root_port=0xF" corresponds to root port 0 to 3.
-/sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
+"root_port=0xF" corresponds to root port 0 to 3. The PMU will monitor all root
+ports by default if not specified.
+/sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
shows the valid bits that can be set in the "root_port" parameter.
Example usage:
diff --git a/Documentation/admin-guide/quickly-build-trimmed-linux.rst b/Documentation/admin-guide/quickly-build-trimmed-linux.rst
index f08149bc53f8..07cfd8863b46 100644
--- a/Documentation/admin-guide/quickly-build-trimmed-linux.rst
+++ b/Documentation/admin-guide/quickly-build-trimmed-linux.rst
@@ -733,7 +733,7 @@ can easily happen that your self-built kernel will lack modules for tasks you
did not perform before utilizing this make target. That's because those tasks
require kernel modules that are normally autoloaded when you perform that task
for the first time; if you didn't perform that task at least once before using
-localmodonfig, the latter will thus assume these modules are superfluous and
+localmodconfig, the latter will thus assume these modules are superfluous and
disable them.
You can try to avoid this by performing typical tasks that often will autoload
diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index f5ec6c9312e1..08e89e031714 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -41,7 +41,7 @@ pre-allocation or re-sizing of any kernel data structures.
dentry-negative
----------------------------
-Policy for negative dentries. Set to 1 to to always delete the dentry when a
+Policy for negative dentries. Set to 1 to always delete the dentry when a
file is removed, and 0 to disable it. By default, this behavior is disabled.
dentry-state
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 7a85b6eb884e..dd49a89a62d3 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1555,6 +1555,13 @@ constant ``FUTEX_TID_MASK`` (0x3fffffff).
If a value outside of this range is written to ``threads-max`` an
``EINVAL`` error occurs.
+timer_migration
+===============
+
+When set to a non-zero value, attempt to migrate timers away from idle cpus to
+allow them to remain in low power states longer.
+
+Default is set (1).
traceoff_on_warning
===================
diff --git a/Documentation/admin-guide/sysrq.rst b/Documentation/admin-guide/sysrq.rst
index a85b3384d1e7..9c7aa817adc7 100644
--- a/Documentation/admin-guide/sysrq.rst
+++ b/Documentation/admin-guide/sysrq.rst
@@ -49,26 +49,26 @@ How do I use the magic SysRq key?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On x86
- You press the key combo :kbd:`ALT-SysRq-<command key>`.
+ You press the key combo `ALT-SysRq-<command key>`.
.. note::
Some
keyboards may not have a key labeled 'SysRq'. The 'SysRq' key is
also known as the 'Print Screen' key. Also some keyboards cannot
handle so many keys being pressed at the same time, so you might
- have better luck with press :kbd:`Alt`, press :kbd:`SysRq`,
- release :kbd:`SysRq`, press :kbd:`<command key>`, release everything.
+ have better luck with press `Alt`, press `SysRq`,
+ release `SysRq`, press `<command key>`, release everything.
On SPARC
- You press :kbd:`ALT-STOP-<command key>`, I believe.
+ You press `ALT-STOP-<command key>`, I believe.
On the serial console (PC style standard serial ports only)
You send a ``BREAK``, then within 5 seconds a command key. Sending
``BREAK`` twice is interpreted as a normal BREAK.
On PowerPC
- Press :kbd:`ALT - Print Screen` (or :kbd:`F13`) - :kbd:`<command key>`.
- :kbd:`Print Screen` (or :kbd:`F13`) - :kbd:`<command key>` may suffice.
+ Press `ALT - Print Screen` (or `F13`) - `<command key>`.
+ `Print Screen` (or `F13`) - `<command key>` may suffice.
On other
If you know of the key combos for other architectures, please
@@ -88,7 +88,7 @@ On all
echo _reisub > /proc/sysrq-trigger
-The :kbd:`<command key>` is case sensitive.
+The `<command key>` is case sensitive.
What are the 'command' keys?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -225,9 +225,9 @@ Sometimes SysRq seems to get 'stuck' after using it, what can I do?
When this happens, try tapping shift, alt and control on both sides of the
keyboard, and hitting an invalid sysrq sequence again. (i.e., something like
-:kbd:`alt-sysrq-z`).
+`alt-sysrq-z`).
-Switching to another virtual console (:kbd:`ALT+Fn`) and then back again
+Switching to another virtual console (`ALT+Fn`) and then back again
should also help.
I hit SysRq, but nothing seems to happen, what's wrong?
@@ -290,7 +290,7 @@ exception the header line from the sysrq command is passed to all console
consumers as if the current loglevel was maximum. If only the header
is emitted it is almost certain that the kernel loglevel is too low.
Should you require the output on the console channel then you will need
-to temporarily up the console loglevel using :kbd:`alt-sysrq-8` or::
+to temporarily up the console loglevel using `alt-sysrq-8` or::
echo 8 > /proc/sysrq-trigger
diff --git a/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst b/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
index 6281eae9e6bc..03c55151346c 100644
--- a/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
+++ b/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
@@ -1431,7 +1431,7 @@ can easily happen that your self-built kernels will lack modules for tasks you
did not perform at least once before utilizing this make target. That happens
when a task requires kernel modules which are only autoloaded when you execute
it for the first time. So when you never performed that task since starting your
-kernel the modules will not have been loaded -- and from localmodonfig's point
+kernel the modules will not have been loaded -- and from localmodconfig's point
of view look superfluous, which thus disables them to reduce the amount of code
to be compiled.
diff --git a/Documentation/admin-guide/workload-tracing.rst b/Documentation/admin-guide/workload-tracing.rst
index b2e254ec8ee8..6be38c1b9c5b 100644
--- a/Documentation/admin-guide/workload-tracing.rst
+++ b/Documentation/admin-guide/workload-tracing.rst
@@ -83,7 +83,7 @@ scripts/ver_linux is a good way to check if your system already has
the necessary tools::
sudo apt-get build-essentials flex bison yacc
- sudo apt install libelf-dev systemtap-sdt-dev libaudit-dev libslang2-dev libperl-dev libdw-dev
+ sudo apt install libelf-dev systemtap-sdt-dev libslang2-dev libperl-dev libdw-dev
cscope is a good tool to browse kernel sources. Let's install it now::