diff options
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r-- | Documentation/admin-guide/README.rst | 91 | ||||
-rw-r--r-- | Documentation/admin-guide/acpi/dsdt-override.rst | 13 | ||||
-rw-r--r-- | Documentation/admin-guide/cgroup-v2.rst | 5 | ||||
-rw-r--r-- | Documentation/admin-guide/dynamic-debug-howto.rst | 246 | ||||
-rw-r--r-- | Documentation/admin-guide/hw-vuln/spectre.rst | 1 | ||||
-rw-r--r-- | Documentation/admin-guide/kdump/vmcoreinfo.rst | 2 | ||||
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 29 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/hugetlbpage.rst | 2 | ||||
-rw-r--r-- | Documentation/admin-guide/perf/alibaba_pmu.rst | 100 | ||||
-rw-r--r-- | Documentation/admin-guide/perf/index.rst | 1 | ||||
-rw-r--r-- | Documentation/admin-guide/pm/amd-pstate.rst | 76 | ||||
-rw-r--r-- | Documentation/admin-guide/sysctl/net.rst | 22 | ||||
-rw-r--r-- | Documentation/admin-guide/tainted-kernels.rst | 6 |
13 files changed, 359 insertions, 235 deletions
diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index 9eb6b9042f75..9a969c0157f1 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -262,8 +262,6 @@ Compiling the kernel - Make sure you have at least gcc 5.1 available. For more information, refer to :ref:`Documentation/process/changes.rst <changes>`. - Please note that you can still run a.out user programs with this kernel. - - Do a ``make`` to create a compressed kernel image. It is also possible to do ``make install`` if you have lilo installed to suit the kernel makefiles, but you may want to check your particular lilo setup first. @@ -332,85 +330,10 @@ Compiling the kernel If something goes wrong ----------------------- - - If you have problems that seem to be due to kernel bugs, please check - the file MAINTAINERS to see if there is a particular person associated - with the part of the kernel that you are having trouble with. If there - isn't anyone listed there, then the second best thing is to mail - them to me (torvalds@linux-foundation.org), and possibly to any other - relevant mailing-list or to the newsgroup. - - - In all bug-reports, *please* tell what kernel you are talking about, - how to duplicate the problem, and what your setup is (use your common - sense). If the problem is new, tell me so, and if the problem is - old, please try to tell me when you first noticed it. - - - If the bug results in a message like:: - - unable to handle kernel paging request at address C0000010 - Oops: 0002 - EIP: 0010:XXXXXXXX - eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx - esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx - ds: xxxx es: xxxx fs: xxxx gs: xxxx - Pid: xx, process nr: xx - xx xx xx xx xx xx xx xx xx xx - - or similar kernel debugging information on your screen or in your - system log, please duplicate it *exactly*. The dump may look - incomprehensible to you, but it does contain information that may - help debugging the problem. The text above the dump is also - important: it tells something about why the kernel dumped code (in - the above example, it's due to a bad kernel pointer). More information - on making sense of the dump is in Documentation/admin-guide/bug-hunting.rst - - - If you compiled the kernel with CONFIG_KALLSYMS you can send the dump - as is, otherwise you will have to use the ``ksymoops`` program to make - sense of the dump (but compiling with CONFIG_KALLSYMS is usually preferred). - This utility can be downloaded from - https://www.kernel.org/pub/linux/utils/kernel/ksymoops/ . - Alternatively, you can do the dump lookup by hand: - - - In debugging dumps like the above, it helps enormously if you can - look up what the EIP value means. The hex value as such doesn't help - me or anybody else very much: it will depend on your particular - kernel setup. What you should do is take the hex value from the EIP - line (ignore the ``0010:``), and look it up in the kernel namelist to - see which kernel function contains the offending address. - - To find out the kernel function name, you'll need to find the system - binary associated with the kernel that exhibited the symptom. This is - the file 'linux/vmlinux'. To extract the namelist and match it against - the EIP from the kernel crash, do:: - - nm vmlinux | sort | less - - This will give you a list of kernel addresses sorted in ascending - order, from which it is simple to find the function that contains the - offending address. Note that the address given by the kernel - debugging messages will not necessarily match exactly with the - function addresses (in fact, that is very unlikely), so you can't - just 'grep' the list: the list will, however, give you the starting - point of each kernel function, so by looking for the function that - has a starting address lower than the one you are searching for but - is followed by a function with a higher address you will find the one - you want. In fact, it may be a good idea to include a bit of - "context" in your problem report, giving a few lines around the - interesting one. - - If you for some reason cannot do the above (you have a pre-compiled - kernel image or similar), telling me as much about your setup as - possible will help. Please read - 'Documentation/admin-guide/reporting-issues.rst' for details. - - - Alternatively, you can use gdb on a running kernel. (read-only; i.e. you - cannot change values or set break points.) To do this, first compile the - kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make - clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``). - - After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``. - You can now use all the usual gdb commands. The command to look up the - point where your system crashed is ``l *0xXXXXXXXX``. (Replace the XXXes - with the EIP value.) - - gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly) - disregards the starting offset for which the kernel is compiled. +If you have problems that seem to be due to kernel bugs, please follow the +instructions at 'Documentation/admin-guide/reporting-issues.rst'. + +Hints on understanding kernel bug reports are in +'Documentation/admin-guide/bug-hunting.rst'. More on debugging the kernel +with gdb is in 'Documentation/dev-tools/gdb-kernel-debugging.rst' and +'Documentation/dev-tools/kgdb.rst'. diff --git a/Documentation/admin-guide/acpi/dsdt-override.rst b/Documentation/admin-guide/acpi/dsdt-override.rst deleted file mode 100644 index 50bd7f194bf4..000000000000 --- a/Documentation/admin-guide/acpi/dsdt-override.rst +++ /dev/null @@ -1,13 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -=============== -Overriding DSDT -=============== - -Linux supports a method of overriding the BIOS DSDT: - -CONFIG_ACPI_CUSTOM_DSDT - builds the image into the kernel. - -When to use this method is described in detail on the -Linux/ACPI home page: -https://01.org/linux-acpi/documentation/overriding-dsdt diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 9319c29f7eac..7bcfb38498c6 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1355,6 +1355,11 @@ PAGE_SIZE multiple when read back. pagetables Amount of memory allocated for page tables. + sec_pagetables + Amount of memory allocated for secondary page tables, + this currently includes KVM mmu allocations on x86 + and arm64. + percpu (npn) Amount of memory used for storing per-cpu kernel data structures. diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst b/Documentation/admin-guide/dynamic-debug-howto.rst index a89cfa083155..faa22f77847a 100644 --- a/Documentation/admin-guide/dynamic-debug-howto.rst +++ b/Documentation/admin-guide/dynamic-debug-howto.rst @@ -5,143 +5,115 @@ Dynamic debug Introduction ============ -This document describes how to use the dynamic debug (dyndbg) feature. +Dynamic debug allows you to dynamically enable/disable kernel +debug-print code to obtain additional kernel information. -Dynamic debug is designed to allow you to dynamically enable/disable -kernel code to obtain additional kernel information. Currently, if -``CONFIG_DYNAMIC_DEBUG`` is set, then all ``pr_debug()``/``dev_dbg()`` and -``print_hex_dump_debug()``/``print_hex_dump_bytes()`` calls can be dynamically -enabled per-callsite. +If ``/proc/dynamic_debug/control`` exists, your kernel has dynamic +debug. You'll need root access (sudo su) to use this. -If you do not want to enable dynamic debug globally (i.e. in some embedded -system), you may set ``CONFIG_DYNAMIC_DEBUG_CORE`` as basic support of dynamic -debug and add ``ccflags := -DDYNAMIC_DEBUG_MODULE`` into the Makefile of any -modules which you'd like to dynamically debug later. - -If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is just -shortcut for ``print_hex_dump(KERN_DEBUG)``. - -For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, format string is -its ``prefix_str`` argument, if it is constant string; or ``hexdump`` -in case ``prefix_str`` is built dynamically. +Dynamic debug provides: -Dynamic debug has even more useful features: + * a Catalog of all *prdbgs* in your kernel. + ``cat /proc/dynamic_debug/control`` to see them. - * Simple query language allows turning on and off debugging - statements by matching any combination of 0 or 1 of: + * a Simple query/command language to alter *prdbgs* by selecting on + any combination of 0 or 1 of: - source filename - function name - line number (including ranges of line numbers) - module name - format string - - * Provides a debugfs control file: ``<debugfs>/dynamic_debug/control`` - which can be read to display the complete list of known debug - statements, to help guide you - -Controlling dynamic debug Behaviour -=================================== - -The behaviour of ``pr_debug()``/``dev_dbg()`` are controlled via writing to a -control file in the 'debugfs' filesystem. Thus, you must first mount -the debugfs filesystem, in order to make use of this feature. -Subsequently, we refer to the control file as: -``<debugfs>/dynamic_debug/control``. For example, if you want to enable -printing from source file ``svcsock.c``, line 1603 you simply do:: - - nullarbor:~ # echo 'file svcsock.c line 1603 +p' > - <debugfs>/dynamic_debug/control - -If you make a mistake with the syntax, the write will fail thus:: - - nullarbor:~ # echo 'file svcsock.c wtf 1 +p' > - <debugfs>/dynamic_debug/control - -bash: echo: write error: Invalid argument - -Note, for systems without 'debugfs' enabled, the control file can be -found in ``/proc/dynamic_debug/control``. + - class name (as known/declared by each module) Viewing Dynamic Debug Behaviour =============================== -You can view the currently configured behaviour of all the debug -statements via:: +You can view the currently configured behaviour in the *prdbg* catalog:: - nullarbor:~ # cat <debugfs>/dynamic_debug/control + :#> head -n7 /proc/dynamic_debug/control # filename:lineno [module]function flags format - net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012" - net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012" - net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012" - net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012" - ... + init/main.c:1179 [main]initcall_blacklist =_ "blacklisting initcall %s\012 + init/main.c:1218 [main]initcall_blacklisted =_ "initcall %s blacklisted\012" + init/main.c:1424 [main]run_init_process =_ " with arguments:\012" + init/main.c:1426 [main]run_init_process =_ " %s\012" + init/main.c:1427 [main]run_init_process =_ " with environment:\012" + init/main.c:1429 [main]run_init_process =_ " %s\012" +The 3rd space-delimited column shows the current flags, preceded by +a ``=`` for easy use with grep/cut. ``=p`` shows enabled callsites. -You can also apply standard Unix text manipulation filters to this -data, e.g.:: +Controlling dynamic debug Behaviour +=================================== - nullarbor:~ # grep -i rdma <debugfs>/dynamic_debug/control | wc -l - 62 +The behaviour of *prdbg* sites are controlled by writing +query/commands to the control file. Example:: - nullarbor:~ # grep -i tcp <debugfs>/dynamic_debug/control | wc -l - 42 + # grease the interface + :#> alias ddcmd='echo $* > /proc/dynamic_debug/control' -The third column shows the currently enabled flags for each debug -statement callsite (see below for definitions of the flags). The -default value, with no flags enabled, is ``=_``. So you can view all -the debug statement callsites with any non-default flags:: + :#> ddcmd '-p; module main func run* +p' + :#> grep =p /proc/dynamic_debug/control + init/main.c:1424 [main]run_init_process =p " with arguments:\012" + init/main.c:1426 [main]run_init_process =p " %s\012" + init/main.c:1427 [main]run_init_process =p " with environment:\012" + init/main.c:1429 [main]run_init_process =p " %s\012" - nullarbor:~ # awk '$3 != "=_"' <debugfs>/dynamic_debug/control - # filename:lineno [module]function flags format - net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012" +Error messages go to console/syslog:: + + :#> ddcmd mode foo +p + dyndbg: unknown keyword "mode" + dyndbg: query parse failed + bash: echo: write error: Invalid argument + +If debugfs is also enabled and mounted, ``dynamic_debug/control`` is +also under the mount-dir, typically ``/sys/kernel/debug/``. Command Language Reference ========================== -At the lexical level, a command comprises a sequence of words separated +At the basic lexical level, a command is a sequence of words separated by spaces or tabs. So these are all equivalent:: - nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > - <debugfs>/dynamic_debug/control - nullarbor:~ # echo -n ' file svcsock.c line 1603 +p ' > - <debugfs>/dynamic_debug/control - nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > - <debugfs>/dynamic_debug/control + :#> ddcmd file svcsock.c line 1603 +p + :#> ddcmd "file svcsock.c line 1603 +p" + :#> ddcmd ' file svcsock.c line 1603 +p ' Command submissions are bounded by a write() system call. Multiple commands can be written together, separated by ``;`` or ``\n``:: - ~# echo "func pnpacpi_get_resources +p; func pnp_assign_mem +p" \ - > <debugfs>/dynamic_debug/control + :#> ddcmd "func pnpacpi_get_resources +p; func pnp_assign_mem +p" + :#> ddcmd <<"EOC" + func pnpacpi_get_resources +p + func pnp_assign_mem +p + EOC + :#> cat query-batch-file > /proc/dynamic_debug/control -If your query set is big, you can batch them too:: +You can also use wildcards in each query term. The match rule supports +``*`` (matches zero or more characters) and ``?`` (matches exactly one +character). For example, you can match all usb drivers:: - ~# cat query-batch-file > <debugfs>/dynamic_debug/control + :#> ddcmd file "drivers/usb/*" +p # "" to suppress shell expansion -Another way is to use wildcards. The match rule supports ``*`` (matches -zero or more characters) and ``?`` (matches exactly one character). For -example, you can match all usb drivers:: - - ~# echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control - -At the syntactical level, a command comprises a sequence of match -specifications, followed by a flags change specification:: +Syntactically, a command is pairs of keyword values, followed by a +flags change or setting:: command ::= match-spec* flags-spec -The match-spec's are used to choose a subset of the known pr_debug() -callsites to which to apply the flags-spec. Think of them as a query -with implicit ANDs between each pair. Note that an empty list of -match-specs will select all debug statement callsites. +The match-spec's select *prdbgs* from the catalog, upon which to apply +the flags-spec, all constraints are ANDed together. An absent keyword +is the same as keyword "*". + -A match specification comprises a keyword, which controls the -attribute of the callsite to be compared, and a value to compare -against. Possible keywords are::: +A match specification is a keyword, which selects the attribute of +the callsite to be compared, and a value to compare against. Possible +keywords are::: match-spec ::= 'func' string | 'file' string | 'module' string | 'format' string | + 'class' string | 'line' line-range line-range ::= lineno | @@ -203,6 +175,16 @@ format format "nfsd: SETATTR" // a neater way to match a format with whitespace format 'nfsd: SETATTR' // yet another way to match a format with whitespace +class + The given class_name is validated against each module, which may + have declared a list of known class_names. If the class_name is + found for a module, callsite & class matching and adjustment + proceeds. Examples:: + + class DRM_UT_KMS # a DRM.debug category + class JUNK # silent non-match + // class TLD_* # NOTICE: no wildcard in class names + line The given line number or range of line numbers is compared against the line number of each ``pr_debug()`` callsite. A single @@ -228,17 +210,16 @@ of the characters:: The flags are:: p enables the pr_debug() callsite. - f Include the function name in the printed message - l Include line number in the printed message - m Include module name in the printed message - t Include thread ID in messages not generated from interrupt context - _ No flags are set. (Or'd with others on input) + _ enables no flags. -For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only ``p`` flag -have meaning, other flags ignored. + Decorator flags add to the message-prefix, in order: + t Include thread ID, or <intr> + m Include module name + f Include the function name + l Include line number -For display, the flags are preceded by ``=`` -(mnemonic: what the flags are currently equal to). +For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only +the ``p`` flag has meaning, other flags are ignored. Note the regexp ``^[-+=][flmpt_]+$`` matches a flags specification. To clear all flags at once, use ``=_`` or ``-flmpt``. @@ -313,7 +294,7 @@ For ``CONFIG_DYNAMIC_DEBUG`` kernels, any settings given at boot-time (or enabled by ``-DDEBUG`` flag during compilation) can be disabled later via the debugfs interface if the debug messages are no longer needed:: - echo "module module_name -p" > <debugfs>/dynamic_debug/control + echo "module module_name -p" > /proc/dynamic_debug/control Examples ======== @@ -321,37 +302,31 @@ Examples :: // enable the message at line 1603 of file svcsock.c - nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > - <debugfs>/dynamic_debug/control + :#> ddcmd 'file svcsock.c line 1603 +p' // enable all the messages in file svcsock.c - nullarbor:~ # echo -n 'file svcsock.c +p' > - <debugfs>/dynamic_debug/control + :#> ddcmd 'file svcsock.c +p' // enable all the messages in the NFS server module - nullarbor:~ # echo -n 'module nfsd +p' > - <debugfs>/dynamic_debug/control + :#> ddcmd 'module nfsd +p' // enable all 12 messages in the function svc_process() - nullarbor:~ # echo -n 'func svc_process +p' > - <debugfs>/dynamic_debug/control + :#> ddcmd 'func svc_process +p' // disable all 12 messages in the function svc_process() - nullarbor:~ # echo -n 'func svc_process -p' > - <debugfs>/dynamic_debug/control + :#> ddcmd 'func svc_process -p' // enable messages for NFS calls READ, READLINK, READDIR and READDIR+. - nullarbor:~ # echo -n 'format "nfsd: READ" +p' > - <debugfs>/dynamic_debug/control + :#> ddcmd 'format "nfsd: READ" +p' // enable messages in files of which the paths include string "usb" - nullarbor:~ # echo -n 'file *usb* +p' > <debugfs>/dynamic_debug/control + :#> ddcmd 'file *usb* +p' > /proc/dynamic_debug/control // enable all messages - nullarbor:~ # echo -n '+p' > <debugfs>/dynamic_debug/control + :#> ddcmd '+p' > /proc/dynamic_debug/control // add module, function to all enabled messages - nullarbor:~ # echo -n '+mf' > <debugfs>/dynamic_debug/control + :#> ddcmd '+mf' > /proc/dynamic_debug/control // boot-args example, with newlines and comments for readability Kernel command line: ... @@ -364,3 +339,38 @@ Examples dyndbg="file init/* +p #cmt ; func parse_one +p" // enable pr_debugs in 2 functions in a module loaded later pc87360.dyndbg="func pc87360_init_device +p; func pc87360_find +p" + +Kernel Configuration +==================== + +Dynamic Debug is enabled via kernel config items:: + + CONFIG_DYNAMIC_DEBUG=y # build catalog, enables CORE + CONFIG_DYNAMIC_DEBUG_CORE=y # enable mechanics only, skip catalog + +If you do not want to enable dynamic debug globally (i.e. in some embedded +system), you may set ``CONFIG_DYNAMIC_DEBUG_CORE`` as basic support of dynamic +debug and add ``ccflags := -DDYNAMIC_DEBUG_MODULE`` into the Makefile of any +modules which you'd like to dynamically debug later. + + +Kernel *prdbg* API +================== + +The following functions are cataloged and controllable when dynamic +debug is enabled:: + + pr_debug() + dev_dbg() + print_hex_dump_debug() + print_hex_dump_bytes() + +Otherwise, they are off by default; ``ccflags += -DDEBUG`` or +``#define DEBUG`` in a source file will enable them appropriately. + +If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is +just a shortcut for ``print_hex_dump(KERN_DEBUG)``. + +For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, format string is +its ``prefix_str`` argument, if it is constant string; or ``hexdump`` +in case ``prefix_str`` is built dynamically. diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst index 2ce2a38cdd55..c4dcdb3d0d45 100644 --- a/Documentation/admin-guide/hw-vuln/spectre.rst +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -613,6 +613,7 @@ kernel command line. eibrs enhanced IBRS eibrs,retpoline enhanced IBRS + Retpolines eibrs,lfence enhanced IBRS + LFENCE + ibrs use IBRS to protect kernel Not specifying this option is equivalent to spectre_v2=auto. diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst index 8419019b6a88..6726f439958c 100644 --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst @@ -200,7 +200,7 @@ prb A pointer to the printk ringbuffer (struct printk_ringbuffer). This may be pointing to the static boot ringbuffer or the dynamically -allocated ringbuffer, depending on when the the core dump occurred. +allocated ringbuffer, depending on when the core dump occurred. Used by user-space tools to read the active kernel log buffer. printk_rb_static diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 426fa892d311..b2413342b309 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -966,10 +966,6 @@ debugpat [X86] Enable PAT debugging - decnet.addr= [HW,NET] - Format: <area>[,<node>] - See also Documentation/networking/decnet.rst. - default_hugepagesz= [HW] The size of the default HugeTLB page. This is the size represented by the legacy /proc/ hugepages @@ -2436,6 +2432,12 @@ 0: force disabled 1: force enabled + kunit.enable= [KUNIT] Enable executing KUnit tests. Requires + CONFIG_KUNIT to be set to be fully enabled. The + default value can be overridden via + KUNIT_DEFAULT_ENABLED. + Default is 1 (enabled) + kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs. Default is 0 (don't ignore, but inject #GP) @@ -3207,6 +3209,7 @@ spectre_v2_user=off [X86] spec_store_bypass_disable=off [X86,PPC] ssbd=force-off [ARM64] + nospectre_bhb [ARM64] l1tf=off [X86] mds=off [X86] tsx_async_abort=off [X86] @@ -3613,7 +3616,7 @@ nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings. - nohugevmalloc [PPC] Disable kernel huge vmalloc mappings. + nohugevmalloc [KNL,X86,PPC,ARM64] Disable kernel huge vmalloc mappings. nosmt [KNL,S390] Disable symmetric multithreading (SMT). Equivalent to smt=1. @@ -3626,11 +3629,15 @@ (bounds check bypass). With this option data leaks are possible in the system. - nospectre_v2 [X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for + nospectre_v2 [X86,PPC_E500,ARM64] Disable all mitigations for the Spectre variant 2 (indirect branch prediction) vulnerability. System may allow data leaks with this option. + nospectre_bhb [ARM64] Disable all mitigations for Spectre-BHB (branch + history injection) vulnerability. System may allow data leaks + with this option. + nospec_store_bypass_disable [HW] Disable all mitigations for the Speculative Store Bypass vulnerability @@ -3741,9 +3748,9 @@ [X86,PV_OPS] Disable paravirtualized VMware scheduler clock and use the default one. - no-steal-acc [X86,PV_OPS,ARM64] Disable paravirtualized steal time - accounting. steal time is computed, but won't - influence scheduler behaviour + no-steal-acc [X86,PV_OPS,ARM64,PPC/PSERIES] Disable paravirtualized + steal time accounting. steal time is computed, but + won't influence scheduler behaviour nolapic [X86-32,APIC] Do not enable or use the local APIC. @@ -3805,6 +3812,10 @@ nox2apic [X86-64,APIC] Do not enable x2APIC mode. + NOTE: this parameter will be ignored on systems with the + LEGACY_XAPIC_DISABLED bit set in the + IA32_XAPIC_DISABLE_STATUS MSR. + nps_mtm_hs_ctr= [KNL,ARC] This parameter sets the maximum duration, in cycles, each HW thread of the CTOP can run diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst index 8e2727dc18d4..19f27c0d92e0 100644 --- a/Documentation/admin-guide/mm/hugetlbpage.rst +++ b/Documentation/admin-guide/mm/hugetlbpage.rst @@ -65,7 +65,7 @@ HugePages_Surp may be temporarily larger than the maximum number of surplus huge pages when the system is under memory pressure. Hugepagesize - is the default hugepage size (in Kb). + is the default hugepage size (in kB). Hugetlb is the total amount of memory (in kB), consumed by huge pages of all sizes. diff --git a/Documentation/admin-guide/perf/alibaba_pmu.rst b/Documentation/admin-guide/perf/alibaba_pmu.rst new file mode 100644 index 000000000000..11de998bb480 --- /dev/null +++ b/Documentation/admin-guide/perf/alibaba_pmu.rst @@ -0,0 +1,100 @@ +============================================================= +Alibaba's T-Head SoC Uncore Performance Monitoring Unit (PMU) +============================================================= + +The Yitian 710, custom-built by Alibaba Group's chip development business, +T-Head, implements uncore PMU for performance and functional debugging to +facilitate system maintenance. + +DDR Sub-System Driveway (DRW) PMU Driver +========================================= + +Yitian 710 employs eight DDR5/4 channels, four on each die. Each DDR5 channel +is independent of others to service system memory requests. And one DDR5 +channel is split into two independent sub-channels. The DDR Sub-System Driveway +implements separate PMUs for each sub-channel to monitor various performance +metrics. + +The Driveway PMU devices are named as ali_drw_<sys_base_addr> with perf. +For example, ali_drw_21000 and ali_drw_21080 are two PMU devices for two +sub-channels of the same channel in die 0. And the PMU device of die 1 is +prefixed with ali_drw_400XXXXX, e.g. ali_drw_40021000. + +Each sub-channel has 36 PMU counters in total, which is classified into +four groups: + +- Group 0: PMU Cycle Counter. This group has one pair of counters + pmu_cycle_cnt_low and pmu_cycle_cnt_high, that is used as the cycle count + based on DDRC core clock. + +- Group 1: PMU Bandwidth Counters. This group has 8 counters that are used + to count the total access number of either the eight bank groups in a + selected rank, or four ranks separately in the first 4 counters. The base + transfer unit is 64B. + +- Group 2: PMU Retry Counters. This group has 10 counters, that intend to + count the total retry number of each type of uncorrectable error. + +- Group 3: PMU Common Counters. This group has 16 counters, that are used + to count the common events. + +For now, the Driveway PMU driver only uses counters in group 0 and group 3. + +The DDR Controller (DDRCTL) and DDR PHY combine to create a complete solution +for connecting an SoC application bus to DDR memory devices. The DDRCTL +receives transactions Host Interface (HIF) which is custom-defined by Synopsys. +These transactions are queued internally and scheduled for access while +satisfying the SDRAM protocol timing requirements, transaction priorities, and +dependencies between the transactions. The DDRCTL in turn issues commands on +the DDR PHY Interface (DFI) to the PHY module, which launches and captures data +to and from the SDRAM. The driveway PMUs have hardware logic to gather +statistics and performance logging signals on HIF, DFI, etc. + +By counting the READ, WRITE and RMW commands sent to the DDRC through the HIF +interface, we could calculate the bandwidth. Example usage of counting memory +data bandwidth:: + + perf stat \ + -e ali_drw_21000/hif_wr/ \ + -e ali_drw_21000/hif_rd/ \ + -e ali_drw_21000/hif_rmw/ \ + -e ali_drw_21000/cycle/ \ + -e ali_drw_21080/hif_wr/ \ + -e ali_drw_21080/hif_rd/ \ + -e ali_drw_21080/hif_rmw/ \ + -e ali_drw_21080/cycle/ \ + -e ali_drw_23000/hif_wr/ \ + -e ali_drw_23000/hif_rd/ \ + -e ali_drw_23000/hif_rmw/ \ + -e ali_drw_23000/cycle/ \ + -e ali_drw_23080/hif_wr/ \ + -e ali_drw_23080/hif_rd/ \ + -e ali_drw_23080/hif_rmw/ \ + -e ali_drw_23080/cycle/ \ + -e ali_drw_25000/hif_wr/ \ + -e ali_drw_25000/hif_rd/ \ + -e ali_drw_25000/hif_rmw/ \ + -e ali_drw_25000/cycle/ \ + -e ali_drw_25080/hif_wr/ \ + -e ali_drw_25080/hif_rd/ \ + -e ali_drw_25080/hif_rmw/ \ + -e ali_drw_25080/cycle/ \ + -e ali_drw_27000/hif_wr/ \ + -e ali_drw_27000/hif_rd/ \ + -e ali_drw_27000/hif_rmw/ \ + -e ali_drw_27000/cycle/ \ + -e ali_drw_27080/hif_wr/ \ + -e ali_drw_27080/hif_rd/ \ + -e ali_drw_27080/hif_rmw/ \ + -e ali_drw_27080/cycle/ -- sleep 10 + +The average DRAM bandwidth can be calculated as follows: + +- Read Bandwidth = perf_hif_rd * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle +- Write Bandwidth = (perf_hif_wr + perf_hif_rmw) * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle + +Here, DDRC_WIDTH = 64 bytes. + +The current driver does not support sampling. So "perf record" is +unsupported. Also attach to a task is unsupported as the events are all +uncore. diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst index 9c9ece88ce53..793e1970bc05 100644 --- a/Documentation/admin-guide/perf/index.rst +++ b/Documentation/admin-guide/perf/index.rst @@ -18,3 +18,4 @@ Performance monitor support xgene-pmu arm_dsu_pmu thunderx2-pmu + alibaba_pmu diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst index 83b58eb4ab4d..8f3d30c5a0d8 100644 --- a/Documentation/admin-guide/pm/amd-pstate.rst +++ b/Documentation/admin-guide/pm/amd-pstate.rst @@ -182,6 +182,7 @@ to the ``struct sugov_cpu`` that the utilization update belongs to. Then, ``amd-pstate`` updates the desired performance according to the CPU scheduler assigned. +.. _processor_support: Processor Support ======================= @@ -282,6 +283,8 @@ efficiency frequency management method on AMD processors. Kernel Module Options for ``amd-pstate`` ========================================= +.. _shared_mem: + ``shared_mem`` Use a module param (shared_mem) to enable related processors manually with **amd_pstate.shared_mem=1**. @@ -393,6 +396,76 @@ about part of the output. :: CPU_005 712 116384 39 49 166 0.7565 9645075 2214891 38431470 25.1 11.646 469 2.496 kworker/5:0-40 CPU_006 712 116408 39 49 166 0.6769 8950227 1839034 37192089 24.06 11.272 470 2.496 kworker/6:0-1264 +Unit Tests for amd-pstate +------------------------- + +``amd-pstate-ut`` is a test module for testing the ``amd-pstate`` driver. + + * It can help all users to verify their processor support (SBIOS/Firmware or Hardware). + + * Kernel can have a basic function test to avoid the kernel regression during the update. + + * We can introduce more functional or performance tests to align the result together, it will benefit power and performance scale optimization. + +1. Test case decriptions + + +---------+--------------------------------+------------------------------------------------------------------------------------+ + | Index | Functions | Description | + +=========+================================+====================================================================================+ + | 0 | amd_pstate_ut_acpi_cpc_valid || Check whether the _CPC object is present in SBIOS. | + | | || | + | | || The detail refer to `Processor Support <processor_support_>`_. | + +---------+--------------------------------+------------------------------------------------------------------------------------+ + | 1 | amd_pstate_ut_check_enabled || Check whether AMD P-State is enabled. | + | | || | + | | || AMD P-States and ACPI hardware P-States always can be supported in one processor. | + | | | But AMD P-States has the higher priority and if it is enabled with | + | | | :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond to the | + | | | request from AMD P-States. | + +---------+--------------------------------+------------------------------------------------------------------------------------+ + | 2 | amd_pstate_ut_check_perf || Check if the each performance values are reasonable. | + | | || highest_perf >= nominal_perf > lowest_nonlinear_perf > lowest_perf > 0. | + +---------+--------------------------------+------------------------------------------------------------------------------------+ + | 3 | amd_pstate_ut_check_freq || Check if the each frequency values and max freq when set support boost mode | + | | | are reasonable. | + | | || max_freq >= nominal_freq > lowest_nonlinear_freq > min_freq > 0 | + | | || If boost is not active but supported, this maximum frequency will be larger than | + | | | the one in ``cpuinfo``. | + +---------+--------------------------------+------------------------------------------------------------------------------------+ + +#. How to execute the tests + + We use test module in the kselftest frameworks to implement it. + We create amd-pstate-ut module and tie it into kselftest.(for + details refer to Linux Kernel Selftests [4]_). + + 1. Build + + + open the :c:macro:`CONFIG_X86_AMD_PSTATE` configuration option. + + set the :c:macro:`CONFIG_X86_AMD_PSTATE_UT` configuration option to M. + + make project + + make selftest :: + + $ cd linux + $ make -C tools/testing/selftests + + #. Installation & Steps :: + + $ make -C tools/testing/selftests install INSTALL_PATH=~/kselftest + $ sudo ./kselftest/run_kselftest.sh -c amd-pstate + TAP version 13 + 1..1 + # selftests: amd-pstate: amd-pstate-ut.sh + # amd-pstate-ut: ok + ok 1 selftests: amd-pstate: amd-pstate-ut.sh + + #. Results :: + + $ dmesg | grep "amd_pstate_ut" | tee log.txt + [12977.570663] amd_pstate_ut: 1 amd_pstate_ut_acpi_cpc_valid success! + [12977.570673] amd_pstate_ut: 2 amd_pstate_ut_check_enabled success! + [12977.571207] amd_pstate_ut: 3 amd_pstate_ut_check_perf success! + [12977.571212] amd_pstate_ut: 4 amd_pstate_ut_check_freq success! Reference =========== @@ -405,3 +478,6 @@ Reference .. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip + +.. [4] Linux Kernel Selftests, + https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index 60d44165fba7..6394f5dc2303 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -31,17 +31,18 @@ see only some of them, depending on your kernel's configuration. Table : Subdirectories in /proc/sys/net - ========= =================== = ========== ================== + ========= =================== = ========== =================== Directory Content Directory Content - ========= =================== = ========== ================== - core General parameter appletalk Appletalk protocol - unix Unix domain sockets netrom NET/ROM - 802 E802 protocol ax25 AX25 - ethernet Ethernet protocol rose X.25 PLP layer + ========= =================== = ========== =================== + 802 E802 protocol mptcp Multipath TCP + appletalk Appletalk protocol netfilter Network Filter + ax25 AX25 netrom NET/ROM + bridge Bridging rose X.25 PLP layer + core General parameter tipc TIPC + ethernet Ethernet protocol unix Unix domain sockets ipv4 IP version 4 x25 X.25 protocol - bridge Bridging decnet DEC net - ipv6 IP version 6 tipc TIPC - ========= =================== = ========== ================== + ipv6 IP version 6 + ========= =================== = ========== =================== 1. /proc/sys/net/core - Network core options ============================================ @@ -101,6 +102,9 @@ Values: - 1 - enable JIT hardening for unprivileged users only - 2 - enable JIT hardening for all users +where "privileged user" in this context means a process having +CAP_BPF or CAP_SYS_ADMIN in the root user name space. + bpf_jit_kallsyms ---------------- diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst index 7d80e8c307d1..92a8a07f5c43 100644 --- a/Documentation/admin-guide/tainted-kernels.rst +++ b/Documentation/admin-guide/tainted-kernels.rst @@ -134,6 +134,12 @@ More detailed explanation for tainting scsi/snic on something else than x86_64, scsi/ips on non x86/x86_64/itanium, have broken firmware settings for the irqchip/irq-gic on arm64 ...). + - x86/x86_64: Microcode late loading is dangerous and will result in + tainting the kernel. It requires that all CPUs rendezvous to make sure + the update happens when the system is as quiescent as possible. However, + a higher priority MCE/SMI/NMI can move control flow away from that + rendezvous and interrupt the update, which can be detrimental to the + machine. 3) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all modules were unloaded normally. |