diff options
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r-- | Documentation/admin-guide/README.rst | 121 | ||||
-rw-r--r-- | Documentation/admin-guide/acpi/dsdt-override.rst | 13 | ||||
-rw-r--r-- | Documentation/admin-guide/cgroup-v2.rst | 5 | ||||
-rw-r--r-- | Documentation/admin-guide/dynamic-debug-howto.rst | 246 | ||||
-rw-r--r-- | Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst | 14 | ||||
-rw-r--r-- | Documentation/admin-guide/hw-vuln/spectre.rst | 1 | ||||
-rw-r--r-- | Documentation/admin-guide/kdump/vmcoreinfo.rst | 2 | ||||
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 31 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/damon/usage.rst | 18 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/hugetlbpage.rst | 2 | ||||
-rw-r--r-- | Documentation/admin-guide/perf/alibaba_pmu.rst | 100 | ||||
-rw-r--r-- | Documentation/admin-guide/perf/index.rst | 1 | ||||
-rw-r--r-- | Documentation/admin-guide/pm/amd-pstate.rst | 76 | ||||
-rw-r--r-- | Documentation/admin-guide/sysctl/net.rst | 24 | ||||
-rw-r--r-- | Documentation/admin-guide/tainted-kernels.rst | 6 |
15 files changed, 400 insertions, 260 deletions
diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index caa3c09a5c3f..9a969c0157f1 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -1,9 +1,9 @@ .. _readme: -Linux kernel release 5.x <http://kernel.org/> +Linux kernel release 6.x <http://kernel.org/> ============================================= -These are the release notes for Linux version 5. Read them carefully, +These are the release notes for Linux version 6. Read them carefully, as they tell you what this is all about, explain how to install the kernel, and what to do if something goes wrong. @@ -63,7 +63,7 @@ Installing the kernel source directory where you have permissions (e.g. your home directory) and unpack it:: - xz -cd linux-5.x.tar.xz | tar xvf - + xz -cd linux-6.x.tar.xz | tar xvf - Replace "X" with the version number of the latest kernel. @@ -72,12 +72,12 @@ Installing the kernel source files. They should match the library, and not get messed up by whatever the kernel-du-jour happens to be. - - You can also upgrade between 5.x releases by patching. Patches are + - You can also upgrade between 6.x releases by patching. Patches are distributed in the xz format. To install by patching, get all the newer patch files, enter the top level directory of the kernel source - (linux-5.x) and execute:: + (linux-6.x) and execute:: - xz -cd ../patch-5.x.xz | patch -p1 + xz -cd ../patch-6.x.xz | patch -p1 Replace "x" for all versions bigger than the version "x" of your current source tree, **in_order**, and you should be ok. You may want to remove @@ -85,13 +85,13 @@ Installing the kernel source that there are no failed patches (some-file-name# or some-file-name.rej). If there are, either you or I have made a mistake. - Unlike patches for the 5.x kernels, patches for the 5.x.y kernels + Unlike patches for the 6.x kernels, patches for the 6.x.y kernels (also known as the -stable kernels) are not incremental but instead apply - directly to the base 5.x kernel. For example, if your base kernel is 5.0 - and you want to apply the 5.0.3 patch, you must not first apply the 5.0.1 - and 5.0.2 patches. Similarly, if you are running kernel version 5.0.2 and - want to jump to 5.0.3, you must first reverse the 5.0.2 patch (that is, - patch -R) **before** applying the 5.0.3 patch. You can read more on this in + directly to the base 6.x kernel. For example, if your base kernel is 6.0 + and you want to apply the 6.0.3 patch, you must not first apply the 6.0.1 + and 6.0.2 patches. Similarly, if you are running kernel version 6.0.2 and + want to jump to 6.0.3, you must first reverse the 6.0.2 patch (that is, + patch -R) **before** applying the 6.0.3 patch. You can read more on this in :ref:`Documentation/process/applying-patches.rst <applying_patches>`. Alternatively, the script patch-kernel can be used to automate this @@ -114,7 +114,7 @@ Installing the kernel source Software requirements --------------------- - Compiling and running the 5.x kernels requires up-to-date + Compiling and running the 6.x kernels requires up-to-date versions of various software packages. Consult :ref:`Documentation/process/changes.rst <changes>` for the minimum version numbers required and how to get updates for these packages. Beware that using @@ -132,12 +132,12 @@ Build directory for the kernel place for the output files (including .config). Example:: - kernel source code: /usr/src/linux-5.x + kernel source code: /usr/src/linux-6.x build directory: /home/name/build/kernel To configure and build the kernel, use:: - cd /usr/src/linux-5.x + cd /usr/src/linux-6.x make O=/home/name/build/kernel menuconfig make O=/home/name/build/kernel sudo make O=/home/name/build/kernel modules_install install @@ -262,8 +262,6 @@ Compiling the kernel - Make sure you have at least gcc 5.1 available. For more information, refer to :ref:`Documentation/process/changes.rst <changes>`. - Please note that you can still run a.out user programs with this kernel. - - Do a ``make`` to create a compressed kernel image. It is also possible to do ``make install`` if you have lilo installed to suit the kernel makefiles, but you may want to check your particular lilo setup first. @@ -332,85 +330,10 @@ Compiling the kernel If something goes wrong ----------------------- - - If you have problems that seem to be due to kernel bugs, please check - the file MAINTAINERS to see if there is a particular person associated - with the part of the kernel that you are having trouble with. If there - isn't anyone listed there, then the second best thing is to mail - them to me (torvalds@linux-foundation.org), and possibly to any other - relevant mailing-list or to the newsgroup. - - - In all bug-reports, *please* tell what kernel you are talking about, - how to duplicate the problem, and what your setup is (use your common - sense). If the problem is new, tell me so, and if the problem is - old, please try to tell me when you first noticed it. - - - If the bug results in a message like:: - - unable to handle kernel paging request at address C0000010 - Oops: 0002 - EIP: 0010:XXXXXXXX - eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx - esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx - ds: xxxx es: xxxx fs: xxxx gs: xxxx - Pid: xx, process nr: xx - xx xx xx xx xx xx xx xx xx xx - - or similar kernel debugging information on your screen or in your - system log, please duplicate it *exactly*. The dump may look - incomprehensible to you, but it does contain information that may - help debugging the problem. The text above the dump is also - important: it tells something about why the kernel dumped code (in - the above example, it's due to a bad kernel pointer). More information - on making sense of the dump is in Documentation/admin-guide/bug-hunting.rst - - - If you compiled the kernel with CONFIG_KALLSYMS you can send the dump - as is, otherwise you will have to use the ``ksymoops`` program to make - sense of the dump (but compiling with CONFIG_KALLSYMS is usually preferred). - This utility can be downloaded from - https://www.kernel.org/pub/linux/utils/kernel/ksymoops/ . - Alternatively, you can do the dump lookup by hand: - - - In debugging dumps like the above, it helps enormously if you can - look up what the EIP value means. The hex value as such doesn't help - me or anybody else very much: it will depend on your particular - kernel setup. What you should do is take the hex value from the EIP - line (ignore the ``0010:``), and look it up in the kernel namelist to - see which kernel function contains the offending address. - - To find out the kernel function name, you'll need to find the system - binary associated with the kernel that exhibited the symptom. This is - the file 'linux/vmlinux'. To extract the namelist and match it against - the EIP from the kernel crash, do:: - - nm vmlinux | sort | less - - This will give you a list of kernel addresses sorted in ascending - order, from which it is simple to find the function that contains the - offending address. Note that the address given by the kernel - debugging messages will not necessarily match exactly with the - function addresses (in fact, that is very unlikely), so you can't - just 'grep' the list: the list will, however, give you the starting - point of each kernel function, so by looking for the function that - has a starting address lower than the one you are searching for but - is followed by a function with a higher address you will find the one - you want. In fact, it may be a good idea to include a bit of - "context" in your problem report, giving a few lines around the - interesting one. - - If you for some reason cannot do the above (you have a pre-compiled - kernel image or similar), telling me as much about your setup as - possible will help. Please read - 'Documentation/admin-guide/reporting-issues.rst' for details. - - - Alternatively, you can use gdb on a running kernel. (read-only; i.e. you - cannot change values or set break points.) To do this, first compile the - kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make - clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``). - - After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``. - You can now use all the usual gdb commands. The command to look up the - point where your system crashed is ``l *0xXXXXXXXX``. (Replace the XXXes - with the EIP value.) - - gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly) - disregards the starting offset for which the kernel is compiled. +If you have problems that seem to be due to kernel bugs, please follow the +instructions at 'Documentation/admin-guide/reporting-issues.rst'. + +Hints on understanding kernel bug reports are in +'Documentation/admin-guide/bug-hunting.rst'. More on debugging the kernel +with gdb is in 'Documentation/dev-tools/gdb-kernel-debugging.rst' and +'Documentation/dev-tools/kgdb.rst'. diff --git a/Documentation/admin-guide/acpi/dsdt-override.rst b/Documentation/admin-guide/acpi/dsdt-override.rst deleted file mode 100644 index 50bd7f194bf4..000000000000 --- a/Documentation/admin-guide/acpi/dsdt-override.rst +++ /dev/null @@ -1,13 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -=============== -Overriding DSDT -=============== - -Linux supports a method of overriding the BIOS DSDT: - -CONFIG_ACPI_CUSTOM_DSDT - builds the image into the kernel. - -When to use this method is described in detail on the -Linux/ACPI home page: -https://01.org/linux-acpi/documentation/overriding-dsdt diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index be4a77baf784..7ce8130a8924 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1355,6 +1355,11 @@ PAGE_SIZE multiple when read back. pagetables Amount of memory allocated for page tables. + sec_pagetables + Amount of memory allocated for secondary page tables, + this currently includes KVM mmu allocations on x86 + and arm64. + percpu (npn) Amount of memory used for storing per-cpu kernel data structures. diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst b/Documentation/admin-guide/dynamic-debug-howto.rst index a89cfa083155..faa22f77847a 100644 --- a/Documentation/admin-guide/dynamic-debug-howto.rst +++ b/Documentation/admin-guide/dynamic-debug-howto.rst @@ -5,143 +5,115 @@ Dynamic debug Introduction ============ -This document describes how to use the dynamic debug (dyndbg) feature. +Dynamic debug allows you to dynamically enable/disable kernel +debug-print code to obtain additional kernel information. -Dynamic debug is designed to allow you to dynamically enable/disable -kernel code to obtain additional kernel information. Currently, if -``CONFIG_DYNAMIC_DEBUG`` is set, then all ``pr_debug()``/``dev_dbg()`` and -``print_hex_dump_debug()``/``print_hex_dump_bytes()`` calls can be dynamically -enabled per-callsite. +If ``/proc/dynamic_debug/control`` exists, your kernel has dynamic +debug. You'll need root access (sudo su) to use this. -If you do not want to enable dynamic debug globally (i.e. in some embedded -system), you may set ``CONFIG_DYNAMIC_DEBUG_CORE`` as basic support of dynamic -debug and add ``ccflags := -DDYNAMIC_DEBUG_MODULE`` into the Makefile of any -modules which you'd like to dynamically debug later. - -If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is just -shortcut for ``print_hex_dump(KERN_DEBUG)``. - -For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, format string is -its ``prefix_str`` argument, if it is constant string; or ``hexdump`` -in case ``prefix_str`` is built dynamically. +Dynamic debug provides: -Dynamic debug has even more useful features: + * a Catalog of all *prdbgs* in your kernel. + ``cat /proc/dynamic_debug/control`` to see them. - * Simple query language allows turning on and off debugging - statements by matching any combination of 0 or 1 of: + * a Simple query/command language to alter *prdbgs* by selecting on + any combination of 0 or 1 of: - source filename - function name - line number (including ranges of line numbers) - module name - format string - - * Provides a debugfs control file: ``<debugfs>/dynamic_debug/control`` - which can be read to display the complete list of known debug - statements, to help guide you - -Controlling dynamic debug Behaviour -=================================== - -The behaviour of ``pr_debug()``/``dev_dbg()`` are controlled via writing to a -control file in the 'debugfs' filesystem. Thus, you must first mount -the debugfs filesystem, in order to make use of this feature. -Subsequently, we refer to the control file as: -``<debugfs>/dynamic_debug/control``. For example, if you want to enable -printing from source file ``svcsock.c``, line 1603 you simply do:: - - nullarbor:~ # echo 'file svcsock.c line 1603 +p' > - <debugfs>/dynamic_debug/control - -If you make a mistake with the syntax, the write will fail thus:: - - nullarbor:~ # echo 'file svcsock.c wtf 1 +p' > - <debugfs>/dynamic_debug/control - -bash: echo: write error: Invalid argument - -Note, for systems without 'debugfs' enabled, the control file can be -found in ``/proc/dynamic_debug/control``. + - class name (as known/declared by each module) Viewing Dynamic Debug Behaviour =============================== -You can view the currently configured behaviour of all the debug -statements via:: +You can view the currently configured behaviour in the *prdbg* catalog:: - nullarbor:~ # cat <debugfs>/dynamic_debug/control + :#> head -n7 /proc/dynamic_debug/control # filename:lineno [module]function flags format - net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012" - net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012" - net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012" - net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012" - ... + init/main.c:1179 [main]initcall_blacklist =_ "blacklisting initcall %s\012 + init/main.c:1218 [main]initcall_blacklisted =_ "initcall %s blacklisted\012" + init/main.c:1424 [main]run_init_process =_ " with arguments:\012" + init/main.c:1426 [main]run_init_process =_ " %s\012" + init/main.c:1427 [main]run_init_process =_ " with environment:\012" + init/main.c:1429 [main]run_init_process =_ " %s\012" +The 3rd space-delimited column shows the current flags, preceded by +a ``=`` for easy use with grep/cut. ``=p`` shows enabled callsites. -You can also apply standard Unix text manipulation filters to this -data, e.g.:: +Controlling dynamic debug Behaviour +=================================== - nullarbor:~ # grep -i rdma <debugfs>/dynamic_debug/control | wc -l - 62 +The behaviour of *prdbg* sites are controlled by writing +query/commands to the control file. Example:: - nullarbor:~ # grep -i tcp <debugfs>/dynamic_debug/control | wc -l - 42 + # grease the interface + :#> alias ddcmd='echo $* > /proc/dynamic_debug/control' -The third column shows the currently enabled flags for each debug -statement callsite (see below for definitions of the flags). The -default value, with no flags enabled, is ``=_``. So you can view all -the debug statement callsites with any non-default flags:: + :#> ddcmd '-p; module main func run* +p' + :#> grep =p /proc/dynamic_debug/control + init/main.c:1424 [main]run_init_process =p " with arguments:\012" + init/main.c:1426 [main]run_init_process =p " %s\012" + init/main.c:1427 [main]run_init_process =p " with environment:\012" + init/main.c:1429 [main]run_init_process =p " %s\012" - nullarbor:~ # awk '$3 != "=_"' <debugfs>/dynamic_debug/control - # filename:lineno [module]function flags format - net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012" +Error messages go to console/syslog:: + + :#> ddcmd mode foo +p + dyndbg: unknown keyword "mode" + dyndbg: query parse failed + bash: echo: write error: Invalid argument + +If debugfs is also enabled and mounted, ``dynamic_debug/control`` is +also under the mount-dir, typically ``/sys/kernel/debug/``. Command Language Reference ========================== -At the lexical level, a command comprises a sequence of words separated +At the basic lexical level, a command is a sequence of words separated by spaces or tabs. So these are all equivalent:: - nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > - <debugfs>/dynamic_debug/control - nullarbor:~ # echo -n ' file svcsock.c line 1603 +p ' > - <debugfs>/dynamic_debug/control - nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > - <debugfs>/dynamic_debug/control + :#> ddcmd file svcsock.c line 1603 +p + :#> ddcmd "file svcsock.c line 1603 +p" + :#> ddcmd ' file svcsock.c line 1603 +p ' Command submissions are bounded by a write() system call. Multiple commands can be written together, separated by ``;`` or ``\n``:: - ~# echo "func pnpacpi_get_resources +p; func pnp_assign_mem +p" \ - > <debugfs>/dynamic_debug/control + :#> ddcmd "func pnpacpi_get_resources +p; func pnp_assign_mem +p" + :#> ddcmd <<"EOC" + func pnpacpi_get_resources +p + func pnp_assign_mem +p + EOC + :#> cat query-batch-file > /proc/dynamic_debug/control -If your query set is big, you can batch them too:: +You can also use wildcards in each query term. The match rule supports +``*`` (matches zero or more characters) and ``?`` (matches exactly one +character). For example, you can match all usb drivers:: - ~# cat query-batch-file > <debugfs>/dynamic_debug/control + :#> ddcmd file "drivers/usb/*" +p # "" to suppress shell expansion -Another way is to use wildcards. The match rule supports ``*`` (matches -zero or more characters) and ``?`` (matches exactly one character). For -example, you can match all usb drivers:: - - ~# echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control - -At the syntactical level, a command comprises a sequence of match -specifications, followed by a flags change specification:: +Syntactically, a command is pairs of keyword values, followed by a +flags change or setting:: command ::= match-spec* flags-spec -The match-spec's are used to choose a subset of the known pr_debug() -callsites to which to apply the flags-spec. Think of them as a query -with implicit ANDs between each pair. Note that an empty list of -match-specs will select all debug statement callsites. +The match-spec's select *prdbgs* from the catalog, upon which to apply +the flags-spec, all constraints are ANDed together. An absent keyword +is the same as keyword "*". + -A match specification comprises a keyword, which controls the -attribute of the callsite to be compared, and a value to compare -against. Possible keywords are::: +A match specification is a keyword, which selects the attribute of +the callsite to be compared, and a value to compare against. Possible +keywords are::: match-spec ::= 'func' string | 'file' string | 'module' string | 'format' string | + 'class' string | 'line' line-range line-range ::= lineno | @@ -203,6 +175,16 @@ format format "nfsd: SETATTR" // a neater way to match a format with whitespace format 'nfsd: SETATTR' // yet another way to match a format with whitespace +class + The given class_name is validated against each module, which may + have declared a list of known class_names. If the class_name is + found for a module, callsite & class matching and adjustment + proceeds. Examples:: + + class DRM_UT_KMS # a DRM.debug category + class JUNK # silent non-match + // class TLD_* # NOTICE: no wildcard in class names + line The given line number or range of line numbers is compared against the line number of each ``pr_debug()`` callsite. A single @@ -228,17 +210,16 @@ of the characters:: The flags are:: p enables the pr_debug() callsite. - f Include the function name in the printed message - l Include line number in the printed message - m Include module name in the printed message - t Include thread ID in messages not generated from interrupt context - _ No flags are set. (Or'd with others on input) + _ enables no flags. -For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only ``p`` flag -have meaning, other flags ignored. + Decorator flags add to the message-prefix, in order: + t Include thread ID, or <intr> + m Include module name + f Include the function name + l Include line number -For display, the flags are preceded by ``=`` -(mnemonic: what the flags are currently equal to). +For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only +the ``p`` flag has meaning, other flags are ignored. Note the regexp ``^[-+=][flmpt_]+$`` matches a flags specification. To clear all flags at once, use ``=_`` or ``-flmpt``. @@ -313,7 +294,7 @@ For ``CONFIG_DYNAMIC_DEBUG`` kernels, any settings given at boot-time (or enabled by ``-DDEBUG`` flag during compilation) can be disabled later via the debugfs interface if the debug messages are no longer needed:: - echo "module module_name -p" > <debugfs>/dynamic_debug/control + echo "module module_name -p" > /proc/dynamic_debug/control Examples ======== @@ -321,37 +302,31 @@ Examples :: // enable the message at line 1603 of file svcsock.c - nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > - <debugfs>/dynamic_debug/control + :#> ddcmd 'file svcsock.c line 1603 +p' // enable all the messages in file svcsock.c - nullarbor:~ # echo -n 'file svcsock.c +p' > - <debugfs>/dynamic_debug/control + :#> ddcmd 'file svcsock.c +p' // enable all the messages in the NFS server module - nullarbor:~ # echo -n 'module nfsd +p' > - <debugfs>/dynamic_debug/control + :#> ddcmd 'module nfsd +p' // enable all 12 messages in the function svc_process() - nullarbor:~ # echo -n 'func svc_process +p' > - <debugfs>/dynamic_debug/control + :#> ddcmd 'func svc_process +p' // disable all 12 messages in the function svc_process() - nullarbor:~ # echo -n 'func svc_process -p' > - <debugfs>/dynamic_debug/control + :#> ddcmd 'func svc_process -p' // enable messages for NFS calls READ, READLINK, READDIR and READDIR+. - nullarbor:~ # echo -n 'format "nfsd: READ" +p' > - <debugfs>/dynamic_debug/control + :#> ddcmd 'format "nfsd: READ" +p' // enable messages in files of which the paths include string "usb" - nullarbor:~ # echo -n 'file *usb* +p' > <debugfs>/dynamic_debug/control + :#> ddcmd 'file *usb* +p' > /proc/dynamic_debug/control // enable all messages - nullarbor:~ # echo -n '+p' > <debugfs>/dynamic_debug/control + :#> ddcmd '+p' > /proc/dynamic_debug/control // add module, function to all enabled messages - nullarbor:~ # echo -n '+mf' > <debugfs>/dynamic_debug/control + :#> ddcmd '+mf' > /proc/dynamic_debug/control // boot-args example, with newlines and comments for readability Kernel command line: ... @@ -364,3 +339,38 @@ Examples dyndbg="file init/* +p #cmt ; func parse_one +p" // enable pr_debugs in 2 functions in a module loaded later pc87360.dyndbg="func pc87360_init_device +p; func pc87360_find +p" + +Kernel Configuration +==================== + +Dynamic Debug is enabled via kernel config items:: + + CONFIG_DYNAMIC_DEBUG=y # build catalog, enables CORE + CONFIG_DYNAMIC_DEBUG_CORE=y # enable mechanics only, skip catalog + +If you do not want to enable dynamic debug globally (i.e. in some embedded +system), you may set ``CONFIG_DYNAMIC_DEBUG_CORE`` as basic support of dynamic +debug and add ``ccflags := -DDYNAMIC_DEBUG_MODULE`` into the Makefile of any +modules which you'd like to dynamically debug later. + + +Kernel *prdbg* API +================== + +The following functions are cataloged and controllable when dynamic +debug is enabled:: + + pr_debug() + dev_dbg() + print_hex_dump_debug() + print_hex_dump_bytes() + +Otherwise, they are off by default; ``ccflags += -DDEBUG`` or +``#define DEBUG`` in a source file will enable them appropriately. + +If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is +just a shortcut for ``print_hex_dump(KERN_DEBUG)``. + +For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, format string is +its ``prefix_str`` argument, if it is constant string; or ``hexdump`` +in case ``prefix_str`` is built dynamically. diff --git a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst index 9393c50b5afc..c98fd11907cc 100644 --- a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst +++ b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst @@ -230,6 +230,20 @@ The possible values in this file are: * - 'Mitigation: Clear CPU buffers' - The processor is vulnerable and the CPU buffer clearing mitigation is enabled. + * - 'Unknown: No mitigations' + - The processor vulnerability status is unknown because it is + out of Servicing period. Mitigation is not attempted. + +Definitions: +------------ + +Servicing period: The process of providing functional and security updates to +Intel processors or platforms, utilizing the Intel Platform Update (IPU) +process or other similar mechanisms. + +End of Servicing Updates (ESU): ESU is the date at which Intel will no +longer provide Servicing, such as through IPU or other similar update +processes. ESU dates will typically be aligned to end of quarter. If the processor is vulnerable then the following information is appended to the above information: diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst index 2ce2a38cdd55..c4dcdb3d0d45 100644 --- a/Documentation/admin-guide/hw-vuln/spectre.rst +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -613,6 +613,7 @@ kernel command line. eibrs enhanced IBRS eibrs,retpoline enhanced IBRS + Retpolines eibrs,lfence enhanced IBRS + LFENCE + ibrs use IBRS to protect kernel Not specifying this option is equivalent to spectre_v2=auto. diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst index 8419019b6a88..6726f439958c 100644 --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst @@ -200,7 +200,7 @@ prb A pointer to the printk ringbuffer (struct printk_ringbuffer). This may be pointing to the static boot ringbuffer or the dynamically -allocated ringbuffer, depending on when the the core dump occurred. +allocated ringbuffer, depending on when the core dump occurred. Used by user-space tools to read the active kernel log buffer. printk_rb_static diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index d7f30902fda0..b2413342b309 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -966,10 +966,6 @@ debugpat [X86] Enable PAT debugging - decnet.addr= [HW,NET] - Format: <area>[,<node>] - See also Documentation/networking/decnet.rst. - default_hugepagesz= [HW] The size of the default HugeTLB page. This is the size represented by the legacy /proc/ hugepages @@ -2436,6 +2432,12 @@ 0: force disabled 1: force enabled + kunit.enable= [KUNIT] Enable executing KUnit tests. Requires + CONFIG_KUNIT to be set to be fully enabled. The + default value can be overridden via + KUNIT_DEFAULT_ENABLED. + Default is 1 (enabled) + kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs. Default is 0 (don't ignore, but inject #GP) @@ -3207,6 +3209,7 @@ spectre_v2_user=off [X86] spec_store_bypass_disable=off [X86,PPC] ssbd=force-off [ARM64] + nospectre_bhb [ARM64] l1tf=off [X86] mds=off [X86] tsx_async_abort=off [X86] @@ -3613,7 +3616,7 @@ nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings. - nohugevmalloc [PPC] Disable kernel huge vmalloc mappings. + nohugevmalloc [KNL,X86,PPC,ARM64] Disable kernel huge vmalloc mappings. nosmt [KNL,S390] Disable symmetric multithreading (SMT). Equivalent to smt=1. @@ -3626,11 +3629,15 @@ (bounds check bypass). With this option data leaks are possible in the system. - nospectre_v2 [X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for + nospectre_v2 [X86,PPC_E500,ARM64] Disable all mitigations for the Spectre variant 2 (indirect branch prediction) vulnerability. System may allow data leaks with this option. + nospectre_bhb [ARM64] Disable all mitigations for Spectre-BHB (branch + history injection) vulnerability. System may allow data leaks + with this option. + nospec_store_bypass_disable [HW] Disable all mitigations for the Speculative Store Bypass vulnerability @@ -3741,9 +3748,9 @@ [X86,PV_OPS] Disable paravirtualized VMware scheduler clock and use the default one. - no-steal-acc [X86,PV_OPS,ARM64] Disable paravirtualized steal time - accounting. steal time is computed, but won't - influence scheduler behaviour + no-steal-acc [X86,PV_OPS,ARM64,PPC/PSERIES] Disable paravirtualized + steal time accounting. steal time is computed, but + won't influence scheduler behaviour nolapic [X86-32,APIC] Do not enable or use the local APIC. @@ -3805,6 +3812,10 @@ nox2apic [X86-64,APIC] Do not enable x2APIC mode. + NOTE: this parameter will be ignored on systems with the + LEGACY_XAPIC_DISABLED bit set in the + IA32_XAPIC_DISABLE_STATUS MSR. + nps_mtm_hs_ctr= [KNL,ARC] This parameter sets the maximum duration, in cycles, each HW thread of the CTOP can run @@ -5331,6 +5342,8 @@ rodata= [KNL] on Mark read-only kernel memory as read-only (default). off Leave read-only kernel memory writable for debugging. + full Mark read-only kernel memory and aliases as read-only + [arm64] rockchip.usb_uart Enable the uart passthrough on the designated usb port diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst index d52f572a9029..ca91ecc29078 100644 --- a/Documentation/admin-guide/mm/damon/usage.rst +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -50,10 +50,10 @@ For a short example, users can monitor the virtual address space of a given workload as below. :: # cd /sys/kernel/mm/damon/admin/ - # echo 1 > kdamonds/nr && echo 1 > kdamonds/0/contexts/nr + # echo 1 > kdamonds/nr_kdamonds && echo 1 > kdamonds/0/contexts/nr_contexts # echo vaddr > kdamonds/0/contexts/0/operations - # echo 1 > kdamonds/0/contexts/0/targets/nr - # echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid + # echo 1 > kdamonds/0/contexts/0/targets/nr_targets + # echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target # echo on > kdamonds/0/state Files Hierarchy @@ -366,12 +366,12 @@ memory rate becomes larger than 60%, or lower than 30%". :: # echo 1 > kdamonds/0/contexts/0/schemes/nr_schemes # cd kdamonds/0/contexts/0/schemes/0 # # set the basic access pattern and the action - # echo 4096 > access_patterns/sz/min - # echo 8192 > access_patterns/sz/max - # echo 0 > access_patterns/nr_accesses/min - # echo 5 > access_patterns/nr_accesses/max - # echo 10 > access_patterns/age/min - # echo 20 > access_patterns/age/max + # echo 4096 > access_pattern/sz/min + # echo 8192 > access_pattern/sz/max + # echo 0 > access_pattern/nr_accesses/min + # echo 5 > access_pattern/nr_accesses/max + # echo 10 > access_pattern/age/min + # echo 20 > access_pattern/age/max # echo pageout > action # # set quotas # echo 10 > quotas/ms diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst index 8e2727dc18d4..19f27c0d92e0 100644 --- a/Documentation/admin-guide/mm/hugetlbpage.rst +++ b/Documentation/admin-guide/mm/hugetlbpage.rst @@ -65,7 +65,7 @@ HugePages_Surp may be temporarily larger than the maximum number of surplus huge pages when the system is under memory pressure. Hugepagesize - is the default hugepage size (in Kb). + is the default hugepage size (in kB). Hugetlb is the total amount of memory (in kB), consumed by huge pages of all sizes. diff --git a/Documentation/admin-guide/perf/alibaba_pmu.rst b/Documentation/admin-guide/perf/alibaba_pmu.rst new file mode 100644 index 000000000000..11de998bb480 --- /dev/null +++ b/Documentation/admin-guide/perf/alibaba_pmu.rst @@ -0,0 +1,100 @@ +============================================================= +Alibaba's T-Head SoC Uncore Performance Monitoring Unit (PMU) +============================================================= + +The Yitian 710, custom-built by Alibaba Group's chip development business, +T-Head, implements uncore PMU for performance and functional debugging to +facilitate system maintenance. + +DDR Sub-System Driveway (DRW) PMU Driver +========================================= + +Yitian 710 employs eight DDR5/4 channels, four on each die. Each DDR5 channel +is independent of others to service system memory requests. And one DDR5 +channel is split into two independent sub-channels. The DDR Sub-System Driveway +implements separate PMUs for each sub-channel to monitor various performance +metrics. + +The Driveway PMU devices are named as ali_drw_<sys_base_addr> with perf. +For example, ali_drw_21000 and ali_drw_21080 are two PMU devices for two +sub-channels of the same channel in die 0. And the PMU device of die 1 is +prefixed with ali_drw_400XXXXX, e.g. ali_drw_40021000. + +Each sub-channel has 36 PMU counters in total, which is classified into +four groups: + +- Group 0: PMU Cycle Counter. This group has one pair of counters + pmu_cycle_cnt_low and pmu_cycle_cnt_high, that is used as the cycle count + based on DDRC core clock. + +- Group 1: PMU Bandwidth Counters. This group has 8 counters that are used + to count the total access number of either the eight bank groups in a + selected rank, or four ranks separately in the first 4 counters. The base + transfer unit is 64B. + +- Group 2: PMU Retry Counters. This group has 10 counters, that intend to + count the total retry number of each type of uncorrectable error. + +- Group 3: PMU Common Counters. This group has 16 counters, that are used + to count the common events. + +For now, the Driveway PMU driver only uses counters in group 0 and group 3. + +The DDR Controller (DDRCTL) and DDR PHY combine to create a complete solution +for connecting an SoC application bus to DDR memory devices. The DDRCTL +receives transactions Host Interface (HIF) which is custom-defined by Synopsys. +These transactions are queued internally and scheduled for access while +satisfying the SDRAM protocol timing requirements, transaction priorities, and +dependencies between the transactions. The DDRCTL in turn issues commands on +the DDR PHY Interface (DFI) to the PHY module, which launches and captures data +to and from the SDRAM. The driveway PMUs have hardware logic to gather +statistics and performance logging signals on HIF, DFI, etc. + +By counting the READ, WRITE and RMW commands sent to the DDRC through the HIF +interface, we could calculate the bandwidth. Example usage of counting memory +data bandwidth:: + + perf stat \ + -e ali_drw_21000/hif_wr/ \ + -e ali_drw_21000/hif_rd/ \ + -e ali_drw_21000/hif_rmw/ \ + -e ali_drw_21000/cycle/ \ + -e ali_drw_21080/hif_wr/ \ + -e ali_drw_21080/hif_rd/ \ + -e ali_drw_21080/hif_rmw/ \ + -e ali_drw_21080/cycle/ \ + -e ali_drw_23000/hif_wr/ \ + -e ali_drw_23000/hif_rd/ \ + -e ali_drw_23000/hif_rmw/ \ + -e ali_drw_23000/cycle/ \ + -e ali_drw_23080/hif_wr/ \ + -e ali_drw_23080/hif_rd/ \ + -e ali_drw_23080/hif_rmw/ \ + -e ali_drw_23080/cycle/ \ + -e ali_drw_25000/hif_wr/ \ + -e ali_drw_25000/hif_rd/ \ + -e ali_drw_25000/hif_rmw/ \ + -e ali_drw_25000/cycle/ \ + -e ali_drw_25080/hif_wr/ \ + -e ali_drw_25080/hif_rd/ \ + -e ali_drw_25080/hif_rmw/ \ + -e ali_drw_25080/cycle/ \ + -e ali_drw_27000/hif_wr/ \ + -e ali_drw_27000/hif_rd/ \ + -e ali_drw_27000/hif_rmw/ \ + -e ali_drw_27000/cycle/ \ + -e ali_drw_27080/hif_wr/ \ + -e ali_drw_27080/hif_rd/ \ + -e ali_drw_27080/hif_rmw/ \ + -e ali_drw_27080/cycle/ -- sleep 10 + +The average DRAM bandwidth can be calculated as follows: + +- Read Bandwidth = perf_hif_rd * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle +- Write Bandwidth = (perf_hif_wr + perf_hif_rmw) * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle + +Here, DDRC_WIDTH = 64 bytes. + +The current driver does not support sampling. So "perf record" is +unsupported. Also attach to a task is unsupported as the events are all +uncore. diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst index 9c9ece88ce53..793e1970bc05 100644 --- a/Documentation/admin-guide/perf/index.rst +++ b/Documentation/admin-guide/perf/index.rst @@ -18,3 +18,4 @@ Performance monitor support xgene-pmu arm_dsu_pmu thunderx2-pmu + alibaba_pmu diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst index 83b58eb4ab4d..8f3d30c5a0d8 100644 --- a/Documentation/admin-guide/pm/amd-pstate.rst +++ b/Documentation/admin-guide/pm/amd-pstate.rst @@ -182,6 +182,7 @@ to the ``struct sugov_cpu`` that the utilization update belongs to. Then, ``amd-pstate`` updates the desired performance according to the CPU scheduler assigned. +.. _processor_support: Processor Support ======================= @@ -282,6 +283,8 @@ efficiency frequency management method on AMD processors. Kernel Module Options for ``amd-pstate`` ========================================= +.. _shared_mem: + ``shared_mem`` Use a module param (shared_mem) to enable related processors manually with **amd_pstate.shared_mem=1**. @@ -393,6 +396,76 @@ about part of the output. :: CPU_005 712 116384 39 49 166 0.7565 9645075 2214891 38431470 25.1 11.646 469 2.496 kworker/5:0-40 CPU_006 712 116408 39 49 166 0.6769 8950227 1839034 37192089 24.06 11.272 470 2.496 kworker/6:0-1264 +Unit Tests for amd-pstate +------------------------- + +``amd-pstate-ut`` is a test module for testing the ``amd-pstate`` driver. + + * It can help all users to verify their processor support (SBIOS/Firmware or Hardware). + + * Kernel can have a basic function test to avoid the kernel regression during the update. + + * We can introduce more functional or performance tests to align the result together, it will benefit power and performance scale optimization. + +1. Test case decriptions + + +---------+--------------------------------+------------------------------------------------------------------------------------+ + | Index | Functions | Description | + +=========+================================+====================================================================================+ + | 0 | amd_pstate_ut_acpi_cpc_valid || Check whether the _CPC object is present in SBIOS. | + | | || | + | | || The detail refer to `Processor Support <processor_support_>`_. | + +---------+--------------------------------+------------------------------------------------------------------------------------+ + | 1 | amd_pstate_ut_check_enabled || Check whether AMD P-State is enabled. | + | | || | + | | || AMD P-States and ACPI hardware P-States always can be supported in one processor. | + | | | But AMD P-States has the higher priority and if it is enabled with | + | | | :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond to the | + | | | request from AMD P-States. | + +---------+--------------------------------+------------------------------------------------------------------------------------+ + | 2 | amd_pstate_ut_check_perf || Check if the each performance values are reasonable. | + | | || highest_perf >= nominal_perf > lowest_nonlinear_perf > lowest_perf > 0. | + +---------+--------------------------------+------------------------------------------------------------------------------------+ + | 3 | amd_pstate_ut_check_freq || Check if the each frequency values and max freq when set support boost mode | + | | | are reasonable. | + | | || max_freq >= nominal_freq > lowest_nonlinear_freq > min_freq > 0 | + | | || If boost is not active but supported, this maximum frequency will be larger than | + | | | the one in ``cpuinfo``. | + +---------+--------------------------------+------------------------------------------------------------------------------------+ + +#. How to execute the tests + + We use test module in the kselftest frameworks to implement it. + We create amd-pstate-ut module and tie it into kselftest.(for + details refer to Linux Kernel Selftests [4]_). + + 1. Build + + + open the :c:macro:`CONFIG_X86_AMD_PSTATE` configuration option. + + set the :c:macro:`CONFIG_X86_AMD_PSTATE_UT` configuration option to M. + + make project + + make selftest :: + + $ cd linux + $ make -C tools/testing/selftests + + #. Installation & Steps :: + + $ make -C tools/testing/selftests install INSTALL_PATH=~/kselftest + $ sudo ./kselftest/run_kselftest.sh -c amd-pstate + TAP version 13 + 1..1 + # selftests: amd-pstate: amd-pstate-ut.sh + # amd-pstate-ut: ok + ok 1 selftests: amd-pstate: amd-pstate-ut.sh + + #. Results :: + + $ dmesg | grep "amd_pstate_ut" | tee log.txt + [12977.570663] amd_pstate_ut: 1 amd_pstate_ut_acpi_cpc_valid success! + [12977.570673] amd_pstate_ut: 2 amd_pstate_ut_check_enabled success! + [12977.571207] amd_pstate_ut: 3 amd_pstate_ut_check_perf success! + [12977.571212] amd_pstate_ut: 4 amd_pstate_ut_check_freq success! Reference =========== @@ -405,3 +478,6 @@ Reference .. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip + +.. [4] Linux Kernel Selftests, + https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index 805f2281e000..6394f5dc2303 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -31,17 +31,18 @@ see only some of them, depending on your kernel's configuration. Table : Subdirectories in /proc/sys/net - ========= =================== = ========== ================== + ========= =================== = ========== =================== Directory Content Directory Content - ========= =================== = ========== ================== - core General parameter appletalk Appletalk protocol - unix Unix domain sockets netrom NET/ROM - 802 E802 protocol ax25 AX25 - ethernet Ethernet protocol rose X.25 PLP layer + ========= =================== = ========== =================== + 802 E802 protocol mptcp Multipath TCP + appletalk Appletalk protocol netfilter Network Filter + ax25 AX25 netrom NET/ROM + bridge Bridging rose X.25 PLP layer + core General parameter tipc TIPC + ethernet Ethernet protocol unix Unix domain sockets ipv4 IP version 4 x25 X.25 protocol - bridge Bridging decnet DEC net - ipv6 IP version 6 tipc TIPC - ========= =================== = ========== ================== + ipv6 IP version 6 + ========= =================== = ========== =================== 1. /proc/sys/net/core - Network core options ============================================ @@ -101,6 +102,9 @@ Values: - 1 - enable JIT hardening for unprivileged users only - 2 - enable JIT hardening for all users +where "privileged user" in this context means a process having +CAP_BPF or CAP_SYS_ADMIN in the root user name space. + bpf_jit_kallsyms ---------------- @@ -271,7 +275,7 @@ poll cycle or the number of packets processed reaches netdev_budget. netdev_max_backlog ------------------ -Maximum number of packets, queued on the INPUT side, when the interface +Maximum number of packets, queued on the INPUT side, when the interface receives packets faster than kernel can process them. netdev_rss_key diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst index 7d80e8c307d1..92a8a07f5c43 100644 --- a/Documentation/admin-guide/tainted-kernels.rst +++ b/Documentation/admin-guide/tainted-kernels.rst @@ -134,6 +134,12 @@ More detailed explanation for tainting scsi/snic on something else than x86_64, scsi/ips on non x86/x86_64/itanium, have broken firmware settings for the irqchip/irq-gic on arm64 ...). + - x86/x86_64: Microcode late loading is dangerous and will result in + tainting the kernel. It requires that all CPUs rendezvous to make sure + the update happens when the system is as quiescent as possible. However, + a higher priority MCE/SMI/NMI can move control flow away from that + rendezvous and interrupt the update, which can be detrimental to the + machine. 3) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all modules were unloaded normally. |