diff options
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r-- | Documentation/admin-guide/bug-hunting.rst | 2 | ||||
-rw-r--r-- | Documentation/admin-guide/cgroup-v2.rst | 79 | ||||
-rw-r--r-- | Documentation/admin-guide/gpio/gpio-aggregator.rst | 107 | ||||
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 7 | ||||
-rw-r--r-- | Documentation/admin-guide/media/c3-isp.dot | 26 | ||||
-rw-r--r-- | Documentation/admin-guide/media/c3-isp.rst | 101 | ||||
-rw-r--r-- | Documentation/admin-guide/media/mgb4.rst | 9 | ||||
-rw-r--r-- | Documentation/admin-guide/media/pci-cardlist.rst | 1 | ||||
-rw-r--r-- | Documentation/admin-guide/media/v4l-drivers.rst | 1 | ||||
-rw-r--r-- | Documentation/admin-guide/namespaces/resource-control.rst | 24 | ||||
-rw-r--r-- | Documentation/admin-guide/pm/cpufreq.rst | 8 | ||||
-rw-r--r-- | Documentation/admin-guide/pm/intel_idle.rst | 21 | ||||
-rw-r--r-- | Documentation/admin-guide/pm/intel_pstate.rst | 104 | ||||
-rw-r--r-- | Documentation/admin-guide/quickly-build-trimmed-linux.rst | 4 | ||||
-rw-r--r-- | Documentation/admin-guide/reporting-issues.rst | 6 | ||||
-rw-r--r-- | Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst | 4 |
16 files changed, 455 insertions, 49 deletions
diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst index ce6f4e8ca487..30858757c9f2 100644 --- a/Documentation/admin-guide/bug-hunting.rst +++ b/Documentation/admin-guide/bug-hunting.rst @@ -196,7 +196,7 @@ will see the assembler code for the routine shown, but if your kernel has debug symbols the C code will also be available. (Debug symbols can be enabled in the kernel hacking menu of the menu configuration.) For example:: - $ objdump -r -S -l --disassemble net/dccp/ipv4.o + $ objdump -r -S -l --disassemble net/ipv4/tcp.o .. note:: diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 9e7de8e70048..1edc26622594 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1076,7 +1076,7 @@ cpufreq governor about the minimum desired frequency which should always be provided by a CPU, as well as the maximum desired frequency, which should not be exceeded by a CPU. -WARNING: cgroup2 cpu controller doesn't yet fully support the control of +WARNING: cgroup2 cpu controller doesn't yet support the (bandwidth) control of realtime processes. For a kernel built with the CONFIG_RT_GROUP_SCHED option enabled for group scheduling of realtime processes, the cpu controller can only be enabled when all RT processes are in the root cgroup. Be aware that system @@ -1095,19 +1095,34 @@ realtime processes irrespective of CONFIG_RT_GROUP_SCHED. CPU Interface Files ~~~~~~~~~~~~~~~~~~~ -All time durations are in microseconds. +The interaction of a process with the cpu controller depends on its scheduling +policy and the underlying scheduler. From the point of view of the cpu controller, +processes can be categorized as follows: + +* Processes under the fair-class scheduler +* Processes under a BPF scheduler with the ``cgroup_set_weight`` callback +* Everything else: ``SCHED_{FIFO,RR,DEADLINE}`` and processes under a BPF scheduler + without the ``cgroup_set_weight`` callback + +For details on when a process is under the fair-class scheduler or a BPF scheduler, +check out :ref:`Documentation/scheduler/sched-ext.rst <sched-ext>`. + +For each of the following interface files, the above categories +will be referred to. All time durations are in microseconds. cpu.stat A read-only flat-keyed file. This file exists whether the controller is enabled or not. - It always reports the following three stats: + It always reports the following three stats, which account for all the + processes in the cgroup: - usage_usec - user_usec - system_usec - and the following five when the controller is enabled: + and the following five when the controller is enabled, which account for + only the processes under the fair-class scheduler: - nr_periods - nr_throttled @@ -1125,6 +1140,10 @@ All time durations are in microseconds. If the cgroup has been configured to be SCHED_IDLE (cpu.idle = 1), then the weight will show as a 0. + This file affects only processes under the fair-class scheduler and a BPF + scheduler with the ``cgroup_set_weight`` callback depending on what the + callback actually does. + cpu.weight.nice A read-write single value file which exists on non-root cgroups. The default is "0". @@ -1137,6 +1156,10 @@ All time durations are in microseconds. granularity is coarser for the nice values, the read value is the closest approximation of the current weight. + This file affects only processes under the fair-class scheduler and a BPF + scheduler with the ``cgroup_set_weight`` callback depending on what the + callback actually does. + cpu.max A read-write two value file which exists on non-root cgroups. The default is "max 100000". @@ -1149,43 +1172,55 @@ All time durations are in microseconds. $PERIOD duration. "max" for $MAX indicates no limit. If only one number is written, $MAX is updated. + This file affects only processes under the fair-class scheduler. + cpu.max.burst A read-write single value file which exists on non-root cgroups. The default is "0". The burst in the range [0, $MAX]. + This file affects only processes under the fair-class scheduler. + cpu.pressure A read-write nested-keyed file. Shows pressure stall information for CPU. See :ref:`Documentation/accounting/psi.rst <psi>` for details. + This file accounts for all the processes in the cgroup. + cpu.uclamp.min - A read-write single value file which exists on non-root cgroups. - The default is "0", i.e. no utilization boosting. + A read-write single value file which exists on non-root cgroups. + The default is "0", i.e. no utilization boosting. + + The requested minimum utilization (protection) as a percentage + rational number, e.g. 12.34 for 12.34%. - The requested minimum utilization (protection) as a percentage - rational number, e.g. 12.34 for 12.34%. + This interface allows reading and setting minimum utilization clamp + values similar to the sched_setattr(2). This minimum utilization + value is used to clamp the task specific minimum utilization clamp, + including those of realtime processes. - This interface allows reading and setting minimum utilization clamp - values similar to the sched_setattr(2). This minimum utilization - value is used to clamp the task specific minimum utilization clamp. + The requested minimum utilization (protection) is always capped by + the current value for the maximum utilization (limit), i.e. + `cpu.uclamp.max`. - The requested minimum utilization (protection) is always capped by - the current value for the maximum utilization (limit), i.e. - `cpu.uclamp.max`. + This file affects all the processes in the cgroup. cpu.uclamp.max - A read-write single value file which exists on non-root cgroups. - The default is "max". i.e. no utilization capping + A read-write single value file which exists on non-root cgroups. + The default is "max". i.e. no utilization capping + + The requested maximum utilization (limit) as a percentage rational + number, e.g. 98.76 for 98.76%. - The requested maximum utilization (limit) as a percentage rational - number, e.g. 98.76 for 98.76%. + This interface allows reading and setting maximum utilization clamp + values similar to the sched_setattr(2). This maximum utilization + value is used to clamp the task specific maximum utilization clamp, + including those of realtime processes. - This interface allows reading and setting maximum utilization clamp - values similar to the sched_setattr(2). This maximum utilization - value is used to clamp the task specific maximum utilization clamp. + This file affects all the processes in the cgroup. cpu.idle A read-write single value file which exists on non-root cgroups. @@ -1197,7 +1232,7 @@ All time durations are in microseconds. own relative priorities, but the cgroup itself will be treated as very low priority relative to its peers. - + This file affects only processes under the fair-class scheduler. Memory ------ diff --git a/Documentation/admin-guide/gpio/gpio-aggregator.rst b/Documentation/admin-guide/gpio/gpio-aggregator.rst index 5cd1e7221756..8374a9df9105 100644 --- a/Documentation/admin-guide/gpio/gpio-aggregator.rst +++ b/Documentation/admin-guide/gpio/gpio-aggregator.rst @@ -69,6 +69,113 @@ write-only attribute files in sysfs. $ echo gpio-aggregator.0 > delete_device +Aggregating GPIOs using Configfs +-------------------------------- + +**Group:** ``/config/gpio-aggregator`` + + This is the root directory of the gpio-aggregator configfs tree. + +**Group:** ``/config/gpio-aggregator/<example-name>`` + + This directory represents a GPIO aggregator device. You can assign any + name to ``<example-name>`` (e.g. ``agg0``), except names starting with + ``_sysfs`` prefix, which are reserved for auto-generated configfs + entries corresponding to devices created via Sysfs. + +**Attribute:** ``/config/gpio-aggregator/<example-name>/live`` + + The ``live`` attribute allows to trigger the actual creation of the device + once it's fully configured. Accepted values are: + + * ``1``, ``yes``, ``true`` : enable the virtual device + * ``0``, ``no``, ``false`` : disable the virtual device + +**Attribute:** ``/config/gpio-aggregator/<example-name>/dev_name`` + + The read-only ``dev_name`` attribute exposes the name of the device as it + will appear in the system on the platform bus (e.g. ``gpio-aggregator.0``). + This is useful for identifying a character device for the newly created + aggregator. If it's ``gpio-aggregator.0``, + ``/sys/devices/platform/gpio-aggregator.0/gpiochipX`` path tells you that the + GPIO device id is ``X``. + +You must create subdirectories for each virtual line you want to +instantiate, named exactly as ``line0``, ``line1``, ..., ``lineY``, when +you want to instantiate ``Y+1`` (Y >= 0) lines. Configure all lines before +activating the device by setting ``live`` to 1. + +**Group:** ``/config/gpio-aggregator/<example-name>/<lineY>/`` + + This directory represents a GPIO line to include in the aggregator. + +**Attribute:** ``/config/gpio-aggregator/<example-name>/<lineY>/key`` + +**Attribute:** ``/config/gpio-aggregator/<example-name>/<lineY>/offset`` + + The default values after creating the ``<lineY>`` directory are: + + * ``key`` : <empty> + * ``offset`` : -1 + + ``key`` must always be explicitly configured, while ``offset`` depends. + Two configuration patterns exist for each ``<lineY>``: + + (a). For lookup by GPIO line name: + + * Set ``key`` to the line name. + * Ensure ``offset`` remains -1 (the default). + + (b). For lookup by GPIO chip name and the line offset within the chip: + + * Set ``key`` to the chip name. + * Set ``offset`` to the line offset (0 <= ``offset`` < 65535). + +**Attribute:** ``/config/gpio-aggregator/<example-name>/<lineY>/name`` + + The ``name`` attribute sets a custom name for lineY. If left unset, the + line will remain unnamed. + +Once the configuration is done, the ``'live'`` attribute must be set to 1 +in order to instantiate the aggregator device. It can be set back to 0 to +destroy the virtual device. The module will synchronously wait for the new +aggregator device to be successfully probed and if this doesn't happen, writing +to ``'live'`` will result in an error. This is a different behaviour from the +case when you create it using sysfs ``new_device`` interface. + +.. note:: + + For aggregators created via Sysfs, the configfs entries are + auto-generated and appear as ``/config/gpio-aggregator/_sysfs.<N>/``. You + cannot add or remove line directories with mkdir(2)/rmdir(2). To modify + lines, you must use the "delete_device" interface to tear down the + existing device and reconfigure it from scratch. However, you can still + toggle the aggregator with the ``live`` attribute and adjust the + ``key``, ``offset``, and ``name`` attributes for each line when ``live`` + is set to 0 by hand (i.e. it's not waiting for deferred probe). + +Sample configuration commands +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: sh + + # Create a directory for an aggregator device + $ mkdir /sys/kernel/config/gpio-aggregator/agg0 + + # Configure each line + $ mkdir /sys/kernel/config/gpio-aggregator/agg0/line0 + $ echo gpiochip0 > /sys/kernel/config/gpio-aggregator/agg0/line0/key + $ echo 6 > /sys/kernel/config/gpio-aggregator/agg0/line0/offset + $ echo test0 > /sys/kernel/config/gpio-aggregator/agg0/line0/name + $ mkdir /sys/kernel/config/gpio-aggregator/agg0/line1 + $ echo gpiochip0 > /sys/kernel/config/gpio-aggregator/agg0/line1/key + $ echo 7 > /sys/kernel/config/gpio-aggregator/agg0/line1/offset + $ echo test1 > /sys/kernel/config/gpio-aggregator/agg0/line1/name + + # Activate the aggregator device + $ echo 1 > /sys/kernel/config/gpio-aggregator/agg0/live + + Generic GPIO Driver ------------------- diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 1d6d39f4c68d..ea81784be981 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1828,6 +1828,13 @@ lz4: Select LZ4 compression algorithm to compress/decompress hibernation image. + hibernate.pm_test_delay= + [HIBERNATION] + Sets the number of seconds to remain in a hibernation test + mode before resuming the system (see + /sys/power/pm_test). Only available when CONFIG_PM_DEBUG + is set. Default value is 5. + highmem=nn[KMG] [KNL,BOOT,EARLY] forces the highmem zone to have an exact size of <nn>. This works even on boxes that have no highmem otherwise. This also works to reduce highmem diff --git a/Documentation/admin-guide/media/c3-isp.dot b/Documentation/admin-guide/media/c3-isp.dot new file mode 100644 index 000000000000..42dc931ee84a --- /dev/null +++ b/Documentation/admin-guide/media/c3-isp.dot @@ -0,0 +1,26 @@ +digraph board { + rankdir=TB + n00000001 [label="{{<port0> 0 | <port1> 1} | c3-isp-core\n/dev/v4l-subdev0 | {<port2> 2 | <port3> 3 | <port4> 4 | <port5> 5}}", shape=Mrecord, style=filled, fillcolor=green] + n00000001:port3 -> n00000008:port0 + n00000001:port4 -> n0000000b:port0 + n00000001:port5 -> n0000000e:port0 + n00000001:port2 -> n00000027 + n00000008 [label="{{<port0> 0} | c3-isp-resizer0\n/dev/v4l-subdev1 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green] + n00000008:port1 -> n00000016 [style=bold] + n0000000b [label="{{<port0> 0} | c3-isp-resizer1\n/dev/v4l-subdev2 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green] + n0000000b:port1 -> n0000001a [style=bold] + n0000000e [label="{{<port0> 0} | c3-isp-resizer2\n/dev/v4l-subdev3 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green] + n0000000e:port1 -> n00000023 [style=bold] + n00000011 [label="{{<port0> 0} | c3-mipi-adapter\n/dev/v4l-subdev4 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green] + n00000011:port1 -> n00000001:port0 [style=bold] + n00000016 [label="c3-isp-cap0\n/dev/video0", shape=box, style=filled, fillcolor=yellow] + n0000001a [label="c3-isp-cap1\n/dev/video1", shape=box, style=filled, fillcolor=yellow] + n0000001e [label="{{<port0> 0} | c3-mipi-csi2\n/dev/v4l-subdev5 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green] + n0000001e:port1 -> n00000011:port0 [style=bold] + n00000023 [label="c3-isp-cap2\n/dev/video2", shape=box, style=filled, fillcolor=yellow] + n00000027 [label="c3-isp-stats\n/dev/video3", shape=box, style=filled, fillcolor=yellow] + n0000002b [label="c3-isp-params\n/dev/video4", shape=box, style=filled, fillcolor=yellow] + n0000002b -> n00000001:port1 + n0000003f [label="{{} | imx290 2-001a\n/dev/v4l-subdev6 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green] + n0000003f:port0 -> n0000001e:port0 [style=bold] +} diff --git a/Documentation/admin-guide/media/c3-isp.rst b/Documentation/admin-guide/media/c3-isp.rst new file mode 100644 index 000000000000..ac508b8c6831 --- /dev/null +++ b/Documentation/admin-guide/media/c3-isp.rst @@ -0,0 +1,101 @@ +.. SPDX-License-Identifier: (GPL-2.0-only OR MIT) + +.. include:: <isonum.txt> + +================================================= +Amlogic C3 Image Signal Processing (C3ISP) driver +================================================= + +Introduction +============ + +This file documents the Amlogic C3ISP driver located under +drivers/media/platform/amlogic/c3/isp. + +The current version of the driver supports the C3ISP found on +Amlogic C308L processor. + +The driver implements V4L2, Media controller and V4L2 subdev interfaces. +Camera sensor using V4L2 subdev interface in the kernel is supported. + +The driver has been tested on AW419-C308L-Socket platform. + +Amlogic C3 ISP +============== + +The Camera hardware found on C308L processors and supported by +the driver consists of: + +- 1 MIPI-CSI-2 module: handles the physical layer of the MIPI CSI-2 receiver and + receives data from the connected camera sensor. +- 1 MIPI-ADAPTER module: organizes MIPI data to meet ISP input requirements and + send MIPI data to ISP. +- 1 ISP (Image Signal Processing) module: contains a pipeline of image processing + hardware blocks. The ISP pipeline contains three resizers at the end each of + them connected to a DMA interface which writes the output data to memory. + +A high-level functional view of the C3 ISP is presented below.:: + + +----------+ +-------+ + | Resizer |--->| WRMIF | + +---------+ +------------+ +--------------+ +-------+ |----------+ +-------+ + | Sensor |--->| MIPI CSI-2 |--->| MIPI ADAPTER |--->| ISP |---|----------+ +-------+ + +---------+ +------------+ +--------------+ +-------+ | Resizer |--->| WRMIF | + +----------+ +-------+ + |----------+ +-------+ + | Resizer |--->| WRMIF | + +----------+ +-------+ + +Driver architecture and design +============================== + +With the goal to model the hardware links between the modules and to expose a +clean, logical and usable interface, the driver registers the following V4L2 +sub-devices: + +- 1 `c3-mipi-csi2` sub-device - the MIPI CSI-2 receiver +- 1 `c3-mipi-adapter` sub-device - the MIPI adapter +- 1 `c3-isp-core` sub-device - the ISP core +- 3 `c3-isp-resizer` sub-devices - the ISP resizers + +The `c3-isp-core` sub-device is linked to 2 video device nodes for statistics +capture and parameters programming: + +- the `c3-isp-stats` capture video device node for statistics capture +- the `c3-isp-params` output video device for parameters programming + +Each `c3-isp-resizer` sub-device is linked to a capture video device node where +frames are captured from: + +- `c3-isp-resizer0` is linked to the `c3-isp-cap0` capture video device +- `c3-isp-resizer1` is linked to the `c3-isp-cap1` capture video device +- `c3-isp-resizer2` is linked to the `c3-isp-cap2` capture video device + +The media controller pipeline graph is as follows (with connected a +IMX290 camera sensor): + +.. _isp_topology_graph: + +.. kernel-figure:: c3-isp.dot + :alt: c3-isp.dot + :align: center + + Media pipeline topology + +Implementation +============== + +Runtime configuration of the ISP hardware is performed on the `c3-isp-params` +video device node using the :ref:`V4L2_META_FMT_C3ISP_PARAMS +<v4l2-meta-fmt-c3isp-params>` as data format. The buffer structure is defined by +:c:type:`c3_isp_params_cfg`. + +Statistics are captured from the `c3-isp-stats` video device node using the +:ref:`V4L2_META_FMT_C3ISP_STATS <v4l2-meta-fmt-c3isp-stats>` data format. + +The final picture size and format is configured using the V4L2 video +capture interface on the `c3-isp-cap[0, 2]` video device nodes. + +The Amlogic C3 ISP is supported by `libcamera <https://libcamera.org>`_ with a +dedicated pipeline handler and algorithms that perform run-time image correction +and enhancement. diff --git a/Documentation/admin-guide/media/mgb4.rst b/Documentation/admin-guide/media/mgb4.rst index f69d331e3cb1..5ac69b833a7a 100644 --- a/Documentation/admin-guide/media/mgb4.rst +++ b/Documentation/admin-guide/media/mgb4.rst @@ -1,8 +1,17 @@ .. SPDX-License-Identifier: GPL-2.0 +.. include:: <isonum.txt> + The mgb4 driver =============== +Copyright |copy| 2023 - 2025 Digiteq Automotive + author: Martin Tůma <martin.tuma@digiteqautomotive.com> + +This is a v4l2 device driver for the Digiteq Automotive FrameGrabber 4, a PCIe +card capable of capturing and generating FPD-Link III and GMSL2/3 video streams +as used in the automotive industry. + sysfs interface --------------- diff --git a/Documentation/admin-guide/media/pci-cardlist.rst b/Documentation/admin-guide/media/pci-cardlist.rst index 7d8e3c8987db..239879634ea5 100644 --- a/Documentation/admin-guide/media/pci-cardlist.rst +++ b/Documentation/admin-guide/media/pci-cardlist.rst @@ -86,7 +86,6 @@ saa7134 Philips SAA7134 saa7164 NXP SAA7164 smipcie SMI PCIe DVBSky cards solo6x10 Bluecherry / Softlogic 6x10 capture cards (MPEG-4/H.264) -sta2x11_vip STA2X11 VIP Video For Linux tw5864 Techwell TW5864 video/audio grabber and encoder tw686x Intersil/Techwell TW686x tw68 Techwell tw68x Video For Linux diff --git a/Documentation/admin-guide/media/v4l-drivers.rst b/Documentation/admin-guide/media/v4l-drivers.rst index e8761561b2fe..3bac5165b134 100644 --- a/Documentation/admin-guide/media/v4l-drivers.rst +++ b/Documentation/admin-guide/media/v4l-drivers.rst @@ -10,6 +10,7 @@ Video4Linux (V4L) driver-specific documentation :maxdepth: 2 bttv + c3-isp cafe_ccic cx88 fimc diff --git a/Documentation/admin-guide/namespaces/resource-control.rst b/Documentation/admin-guide/namespaces/resource-control.rst index 369556e00f0c..553a44803231 100644 --- a/Documentation/admin-guide/namespaces/resource-control.rst +++ b/Documentation/admin-guide/namespaces/resource-control.rst @@ -1,17 +1,17 @@ -=========================== -Namespaces research control -=========================== +==================================== +User namespaces and resource control +==================================== -There are a lot of kinds of objects in the kernel that don't have -individual limits or that have limits that are ineffective when a set -of processes is allowed to switch user ids. With user namespaces -enabled in a kernel for people who don't trust their users or their -users programs to play nice this problems becomes more acute. +The kernel contains many kinds of objects that either don't have +individual limits or that have limits which are ineffective when +a set of processes is allowed to switch their UID. On a system +where the admins don't trust their users or their users' programs, +user namespaces expose the system to potential misuse of resources. -Therefore it is recommended that memory control groups be enabled in -kernels that enable user namespaces, and it is further recommended -that userspace configure memory control groups to limit how much -memory user's they don't trust to play nice can use. +In order to mitigate this, we recommend that admins enable memory +control groups on any system that enables user namespaces. +Furthermore, we recommend that admins configure the memory control +groups to limit the maximum memory usable by any untrusted user. Memory control groups can be configured by installing the libcgroup package present on most distros editing /etc/cgrules.conf, diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst index 3950583f2b15..2d74af7f0efe 100644 --- a/Documentation/admin-guide/pm/cpufreq.rst +++ b/Documentation/admin-guide/pm/cpufreq.rst @@ -231,7 +231,7 @@ are the following: present). The existence of the limit may be a result of some (often unintentional) - BIOS settings, restrictions coming from a service processor or another + BIOS settings, restrictions coming from a service processor or other BIOS/HW-based mechanisms. This does not cover ACPI thermal limitations which can be discovered @@ -258,8 +258,8 @@ are the following: extension on ARM). If one cannot be determined, this attribute should not be present. - Note, that failed attempt to retrieve current frequency for a given - CPU(s) will result in an appropriate error, i.e: EAGAIN for CPU that + Note that failed attempt to retrieve current frequency for a given + CPU(s) will result in an appropriate error, i.e.: EAGAIN for CPU that remains idle (raised on ARM). ``cpuinfo_max_freq`` @@ -499,7 +499,7 @@ This governor exposes the following tunables: represented by it to be 1.5 times as high as the transition latency (the default):: - # echo `$(($(cat cpuinfo_transition_latency) * 3 / 2)) > ondemand/sampling_rate + # echo `$(($(cat cpuinfo_transition_latency) * 3 / 2))` > ondemand/sampling_rate ``up_threshold`` If the estimated CPU load is above this value (in percent), the governor diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst index 5940528146eb..ed6f055d4b14 100644 --- a/Documentation/admin-guide/pm/intel_idle.rst +++ b/Documentation/admin-guide/pm/intel_idle.rst @@ -38,6 +38,27 @@ instruction at all. only way to pass early-configuration-time parameters to it is via the kernel command line. +Sysfs Interface +=============== + +The ``intel_idle`` driver exposes the following ``sysfs`` attributes in +``/sys/devices/system/cpu/cpuidle/``: + +``intel_c1_demotion`` + Enable or disable C1 demotion for all CPUs in the system. This file is + only exposed on platforms that support the C1 demotion feature and where + it was tested. Value 0 means that C1 demotion is disabled, value 1 means + that it is enabled. Write 0 or 1 to disable or enable C1 demotion for + all CPUs. + + The C1 demotion feature involves the platform firmware demoting deep + C-state requests from the OS (e.g., C6 requests) to C1. The idea is that + firmware monitors CPU wake-up rate, and if it is higher than a + platform-specific threshold, the firmware demotes deep C-state requests + to C1. For example, Linux requests C6, but firmware noticed too many + wake-ups per second, and it keeps the CPU in C1. When the CPU stays in + C1 long enough, the platform promotes it back to C6. This may improve + some workloads' performance, but it may also increase power consumption. .. _intel-idle-enumeration-of-states: diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst index 78fc83ed2a7e..26e702c7016e 100644 --- a/Documentation/admin-guide/pm/intel_pstate.rst +++ b/Documentation/admin-guide/pm/intel_pstate.rst @@ -329,6 +329,106 @@ information listed above is the same for all of the processors supporting the HWP feature, which is why ``intel_pstate`` works with all of them.] +Support for Hybrid Processors +============================= + +Some processors supported by ``intel_pstate`` contain two or more types of CPU +cores differing by the maximum turbo P-state, performance vs power characteristics, +cache sizes, and possibly other properties. They are commonly referred to as +hybrid processors. To support them, ``intel_pstate`` requires HWP to be enabled +and it assumes the HWP performance units to be the same for all CPUs in the +system, so a given HWP performance level always represents approximately the +same physical performance regardless of the core (CPU) type. + +Hybrid Processors with SMT +-------------------------- + +On systems where SMT (Simultaneous Multithreading), also referred to as +HyperThreading (HT) in the context of Intel processors, is enabled on at least +one core, ``intel_pstate`` assigns performance-based priorities to CPUs. Namely, +the priority of a given CPU reflects its highest HWP performance level which +causes the CPU scheduler to generally prefer more performant CPUs, so the less +performant CPUs are used when the other ones are fully loaded. However, SMT +siblings (that is, logical CPUs sharing one physical core) are treated in a +special way such that if one of them is in use, the effective priority of the +other ones is lowered below the priorities of the CPUs located in the other +physical cores. + +This approach maximizes performance in the majority of cases, but unfortunately +it also leads to excessive energy usage in some important scenarios, like video +playback, which is not generally desirable. While there is no other viable +choice with SMT enabled because the effective capacity and utilization of SMT +siblings are hard to determine, hybrid processors without SMT can be handled in +more energy-efficient ways. + +.. _CAS: + +Capacity-Aware Scheduling Support +--------------------------------- + +The capacity-aware scheduling (CAS) support in the CPU scheduler is enabled by +``intel_pstate`` by default on hybrid processors without SMT. CAS generally +causes the scheduler to put tasks on a CPU so long as there is a sufficient +amount of spare capacity on it, and if the utilization of a given task is too +high for it, the task will need to go somewhere else. + +Since CAS takes CPU capacities into account, it does not require CPU +prioritization and it allows tasks to be distributed more symmetrically among +the more performant and less performant CPUs. Once placed on a CPU with enough +capacity to accommodate it, a task may just continue to run there regardless of +whether or not the other CPUs are fully loaded, so on average CAS reduces the +utilization of the more performant CPUs which causes the energy usage to be more +balanced because the more performant CPUs are generally less energy-efficient +than the less performant ones. + +In order to use CAS, the scheduler needs to know the capacity of each CPU in +the system and it needs to be able to compute scale-invariant utilization of +CPUs, so ``intel_pstate`` provides it with the requisite information. + +First of all, the capacity of each CPU is represented by the ratio of its highest +HWP performance level, multiplied by 1024, to the highest HWP performance level +of the most performant CPU in the system, which works because the HWP performance +units are the same for all CPUs. Second, the frequency-invariance computations, +carried out by the scheduler to always express CPU utilization in the same units +regardless of the frequency it is currently running at, are adjusted to take the +CPU capacity into account. All of this happens when ``intel_pstate`` has +registered itself with the ``CPUFreq`` core and it has figured out that it is +running on a hybrid processor without SMT. + +Energy-Aware Scheduling Support +------------------------------- + +If ``CONFIG_ENERGY_MODEL`` has been set during kernel configuration and +``intel_pstate`` runs on a hybrid processor without SMT, in addition to enabling +`CAS <CAS_>`_ it registers an Energy Model for the processor. This allows the +Energy-Aware Scheduling (EAS) support to be enabled in the CPU scheduler if +``schedutil`` is used as the ``CPUFreq`` governor which requires ``intel_pstate`` +to operate in the `passive mode <Passive Mode_>`_. + +The Energy Model registered by ``intel_pstate`` is artificial (that is, it is +based on abstract cost values and it does not include any real power numbers) +and it is relatively simple to avoid unnecessary computations in the scheduler. +There is a performance domain in it for every CPU in the system and the cost +values for these performance domains have been chosen so that running a task on +a less performant (small) CPU appears to be always cheaper than running that +task on a more performant (big) CPU. However, for two CPUs of the same type, +the cost difference depends on their current utilization, and the CPU whose +current utilization is higher generally appears to be a more expensive +destination for a given task. This helps to balance the load among CPUs of the +same type. + +Since EAS works on top of CAS, high-utilization tasks are always migrated to +CPUs with enough capacity to accommodate them, but thanks to EAS, low-utilization +tasks tend to be placed on the CPUs that look less expensive to the scheduler. +Effectively, this causes the less performant and less loaded CPUs to be +preferred as long as they have enough spare capacity to run the given task +which generally leads to reduced energy usage. + +The Energy Model created by ``intel_pstate`` can be inspected by looking at +the ``energy_model`` directory in ``debugfs`` (typlically mounted on +``/sys/kernel/debug/``). + + User Space Interface in ``sysfs`` ================================= @@ -697,8 +797,8 @@ of them have to be prepended with the ``intel_pstate=`` prefix. Limits`_ for details). ``no_cas`` - Do not enable capacity-aware scheduling (CAS) which is enabled by - default on hybrid systems. + Do not enable `capacity-aware scheduling <CAS_>`_ which is enabled by + default on hybrid systems without SMT. Diagnostics and Tuning ====================== diff --git a/Documentation/admin-guide/quickly-build-trimmed-linux.rst b/Documentation/admin-guide/quickly-build-trimmed-linux.rst index 07cfd8863b46..4a5ffb0996a3 100644 --- a/Documentation/admin-guide/quickly-build-trimmed-linux.rst +++ b/Documentation/admin-guide/quickly-build-trimmed-linux.rst @@ -347,7 +347,7 @@ again. [:ref:`details<uninstall>`] -.. _submit_improvements: +.. _submit_improvements_qbtl: Did you run into trouble following any of the above steps that is not cleared up by the reference section below? Or do you have ideas how to improve the text? @@ -1070,7 +1070,7 @@ complicated, and harder to follow. That being said: this of course is a balancing act. Hence, if you think an additional use-case is worth describing, suggest it to the maintainers of this -document, as :ref:`described above <submit_improvements>`. +document, as :ref:`described above <submit_improvements_qbtl>`. .. diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst index 2fd5a030235a..9a847506f6ec 100644 --- a/Documentation/admin-guide/reporting-issues.rst +++ b/Documentation/admin-guide/reporting-issues.rst @@ -41,7 +41,7 @@ If you are facing multiple issues with the Linux kernel at once, report each separately. While writing your report, include all information relevant to the issue, like the kernel and the distro used. In case of a regression, CC the regressions mailing list (regressions@lists.linux.dev) to your report. Also try -to pin-point the culprit with a bisection; if you succeed, include its +to pinpoint the culprit with a bisection; if you succeed, include its commit-id and CC everyone in the sign-off-by chain. Once the report is out, answer any questions that come up and help where you @@ -206,7 +206,7 @@ Reporting issues only occurring in older kernel version lines This subsection is for you, if you tried the latest mainline kernel as outlined above, but failed to reproduce your issue there; at the same time you want to see the issue fixed in a still supported stable or longterm series or vendor -kernels regularly rebased on those. If that the case, follow these steps: +kernels regularly rebased on those. If that is the case, follow these steps: * Prepare yourself for the possibility that going through the next few steps might not get the issue solved in older releases: the fix might be too big @@ -312,7 +312,7 @@ small modifications to a kernel based on a recent Linux version; that for example often holds true for the mainline kernels shipped by Debian GNU/Linux Sid or Fedora Rawhide. Some developers will also accept reports about issues with kernels from distributions shipping the latest stable kernel, as long as -its only slightly modified; that for example is often the case for Arch Linux, +it's only slightly modified; that for example is often the case for Arch Linux, regular Fedora releases, and openSUSE Tumbleweed. But keep in mind, you better want to use a mainline Linux and avoid using a stable kernel for this process, as outlined in the section 'Install a fresh kernel for testing' in more diff --git a/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst b/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst index 03c55151346c..d8946b084b1e 100644 --- a/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst +++ b/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst @@ -267,7 +267,7 @@ culprit might be known already. For further details on what actually qualifies as a regression check out Documentation/admin-guide/reporting-regressions.rst. If you run into any problems while following this guide or have ideas how to -improve it, :ref:`please let the kernel developers know <submit_improvements>`. +improve it, :ref:`please let the kernel developers know <submit_improvements_vbbr>`. .. _introprep_bissbs: @@ -1055,7 +1055,7 @@ follow these instructions. [:ref:`details <introoptional_bisref>`] -.. _submit_improvements: +.. _submit_improvements_vbbr: Conclusion ---------- |