diff options
Diffstat (limited to 'Documentation')
67 files changed, 1170 insertions, 194 deletions
diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst index dd7b1c0c21da..f4d2662871ab 100644 --- a/Documentation/PCI/pci.rst +++ b/Documentation/PCI/pci.rst @@ -52,7 +52,7 @@ driver generally needs to perform the following initialization: - Enable DMA/processing engines When done using the device, and perhaps the module needs to be unloaded, -the driver needs to take the follow steps: +the driver needs to take the following steps: - Disable the device from generating IRQs - Release the IRQ (free_irq()) diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst index efb7771273bb..628bf2f7a416 100644 --- a/Documentation/accel/qaic/qaic.rst +++ b/Documentation/accel/qaic/qaic.rst @@ -93,7 +93,7 @@ commands (does not impact QAIC). uAPI ==== -QAIC creates an accel device per phsyical PCIe device. This accel device exists +QAIC creates an accel device per physical PCIe device. This accel device exists for as long as the PCIe device is known to Linux. The PCIe device may not be in the state to accept requests from userspace at diff --git a/Documentation/admin-guide/bug-bisect.rst b/Documentation/admin-guide/bug-bisect.rst index 325c5d0ed34a..585630d14581 100644 --- a/Documentation/admin-guide/bug-bisect.rst +++ b/Documentation/admin-guide/bug-bisect.rst @@ -1,76 +1,144 @@ -Bisecting a bug -+++++++++++++++ +.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0) +.. [see the bottom of this file for redistribution information] -Last updated: 28 October 2016 +====================== +Bisecting a regression +====================== -Introduction -============ +This document describes how to use a ``git bisect`` to find the source code +change that broke something -- for example when some functionality stopped +working after upgrading from Linux 6.0 to 6.1. -Always try the latest kernel from kernel.org and build from source. If you are -not confident in doing that please report the bug to your distribution vendor -instead of to a kernel developer. +The text focuses on the gist of the process. If you are new to bisecting the +kernel, better follow Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst +instead: it depicts everything from start to finish while covering multiple +aspects even kernel developers occasionally forget. This includes detecting +situations early where a bisection would be a waste of time, as nobody would +care about the result -- for example, because the problem happens after the +kernel marked itself as 'tainted', occurs in an abandoned version, was already +fixed, or is caused by a .config change you or your Linux distributor performed. -Finding bugs is not always easy. Have a go though. If you can't find it don't -give up. Report as much as you have found to the relevant maintainer. See -MAINTAINERS for who that is for the subsystem you have worked on. +Finding the change causing a kernel issue using a bisection +=========================================================== -Before you submit a bug report read -'Documentation/admin-guide/reporting-issues.rst'. +*Note: the following process assumes you prepared everything for a bisection. +This includes having a Git clone with the appropriate sources, installing the +software required to build and install kernels, as well as a .config file stored +in a safe place (the following example assumes '~/prepared_kernel_.config') to +use as pristine base at each bisection step; ideally, you have also worked out +a fully reliable and straight-forward way to reproduce the regression, too.* -Devices not appearing -===================== - -Often this is caused by udev/systemd. Check that first before blaming it -on the kernel. - -Finding patch that caused a bug -=============================== - -Using the provided tools with ``git`` makes finding bugs easy provided the bug -is reproducible. - -Steps to do it: - -- build the Kernel from its git source -- start bisect with [#f1]_:: - - $ git bisect start - -- mark the broken changeset with:: - - $ git bisect bad [commit] - -- mark a changeset where the code is known to work with:: - - $ git bisect good [commit] - -- rebuild the Kernel and test -- interact with git bisect by using either:: - - $ git bisect good - - or:: - - $ git bisect bad - - depending if the bug happened on the changeset you're testing -- After some interactions, git bisect will give you the changeset that - likely caused the bug. - -- For example, if you know that the current version is bad, and version - 4.8 is good, you could do:: - - $ git bisect start - $ git bisect bad # Current version is bad - $ git bisect good v4.8 - - -.. [#f1] You can, optionally, provide both good and bad arguments at git - start with ``git bisect start [BAD] [GOOD]`` - -For further references, please read: - -- The man page for ``git-bisect`` -- `Fighting regressions with git bisect <https://www.kernel.org/pub/software/scm/git/docs/git-bisect-lk2009.html>`_ -- `Fully automated bisecting with "git bisect run" <https://lwn.net/Articles/317154>`_ -- `Using Git bisect to figure out when brokenness was introduced <http://webchick.net/node/99>`_ +* Preparation: start the bisection and tell Git about the points in the history + you consider to be working and broken, which Git calls 'good' and 'bad':: + + git bisect start + git bisect good v6.0 + git bisect bad v6.1 + + Instead of Git tags like 'v6.0' and 'v6.1' you can specify commit-ids, too. + +1. Copy your prepared .config into the build directory and adjust it to the + needs of the codebase Git checked out for testing:: + + cp ~/prepared_kernel_.config .config + make olddefconfig + +2. Now build, install, and boot a kernel. This might fail for unrelated reasons, + for example, when a compile error happens at the current stage of the + bisection a later change resolves. In such cases run ``git bisect skip`` and + go back to step 1. + +3. Check if the functionality that regressed works in the kernel you just built. + + If it works, execute:: + + git bisect good + + If it is broken, run:: + + git bisect bad + + Note, getting this wrong just once will send the rest of the bisection + totally off course. To prevent having to start anew later you thus want to + ensure what you tell Git is correct; it is thus often wise to spend a few + minutes more on testing in case your reproducer is unreliable. + + After issuing one of these two commands, Git will usually check out another + bisection point and print something like 'Bisecting: 675 revisions left to + test after this (roughly 10 steps)'. In that case go back to step 1. + + If Git instead prints something like 'cafecaca0c0dacafecaca0c0dacafecaca0c0da + is the first bad commit', then you have finished the bisection. In that case + move to the next point below. Note, right after displaying that line Git will + show some details about the culprit including its patch description; this can + easily fill your terminal, so you might need to scroll up to see the message + mentioning the culprit's commit-id. + + In case you missed Git's output, you can always run ``git bisect log`` to + print the status: it will show how many steps remain or mention the result of + the bisection. + +* Recommended complementary task: put the bisection log and the current .config + file aside for the bug report; furthermore tell Git to reset the sources to + the state before the bisection:: + + git bisect log > ~/bisection-log + cp .config ~/bisection-config-culprit + git bisect reset + +* Recommended optional task: try reverting the culprit on top of the latest + codebase and check if that fixes your bug; if that is the case, it validates + the bisection and enables developers to resolve the regression through a + revert. + + To try this, update your clone and check out latest mainline. Then tell Git + to revert the change by specifying its commit-id:: + + git revert --no-edit cafec0cacaca0 + + Git might reject this, for example when the bisection landed on a merge + commit. In that case, abandon the attempt. Do the same, if Git fails to revert + the culprit on its own because later changes depend on it -- at least unless + you bisected a stable or longterm kernel series, in which case you want to + check out its latest codebase and try a revert there. + + If a revert succeeds, build and test another kernel to check if reverting + resolved your regression. + +With that the process is complete. Now report the regression as described by +Documentation/admin-guide/reporting-issues.rst. + + +Additional reading material +--------------------------- + +* The `man page for 'git bisect' <https://git-scm.com/docs/git-bisect>`_ and + `fighting regressions with 'git bisect' <https://git-scm.com/docs/git-bisect-lk2009.html>`_ + in the Git documentation. +* `Working with git bisect <https://nathanchance.dev/posts/working-with-git-bisect/>`_ + from kernel developer Nathan Chancellor. +* `Using Git bisect to figure out when brokenness was introduced <http://webchick.net/node/99>`_. +* `Fully automated bisecting with 'git bisect run' <https://lwn.net/Articles/317154>`_. + +.. + end-of-content +.. + This document is maintained by Thorsten Leemhuis <linux@leemhuis.info>. If + you spot a typo or small mistake, feel free to let him know directly and + he'll fix it. You are free to do the same in a mostly informal way if you + want to contribute changes to the text -- but for copyright reasons please CC + linux-doc@vger.kernel.org and 'sign-off' your contribution as + Documentation/process/submitting-patches.rst explains in the section 'Sign + your work - the Developer's Certificate of Origin'. +.. + This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top + of the file. If you want to distribute this text under CC-BY-4.0 only, + please use 'The Linux kernel development community' for author attribution + and link this as source: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/bug-bisect.rst + +.. + Note: Only the content of this RST file as found in the Linux kernel sources + is available under CC-BY-4.0, as versions of this text that were processed + (for example by the kernel's build system) might contain content taken from + files which use a more restrictive license. diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst index 95299b08c405..1d0f8ceb3075 100644 --- a/Documentation/admin-guide/bug-hunting.rst +++ b/Documentation/admin-guide/bug-hunting.rst @@ -244,14 +244,14 @@ Reporting the bug Once you find where the bug happened, by inspecting its location, you could either try to fix it yourself or report it upstream. -In order to report it upstream, you should identify the mailing list -used for the development of the affected code. This can be done by using -the ``get_maintainer.pl`` script. +In order to report it upstream, you should identify the bug tracker, if any, or +mailing list used for the development of the affected code. This can be done by +using the ``get_maintainer.pl`` script. For example, if you find a bug at the gspca's sonixj.c file, you can get its maintainers with:: - $ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c + $ ./scripts/get_maintainer.pl --bug -f drivers/media/usb/gspca/sonixj.c Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%) Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%) Tejun Heo <tj@kernel.org> (commit_signer:1/1=100%) @@ -267,11 +267,12 @@ Please notice that it will point to: - The driver maintainer (Hans Verkuil); - The subsystem maintainer (Mauro Carvalho Chehab); - The driver and/or subsystem mailing list (linux-media@vger.kernel.org); -- the Linux Kernel mailing list (linux-kernel@vger.kernel.org). +- The Linux Kernel mailing list (linux-kernel@vger.kernel.org); +- The bug reporting URIs for the driver/subsystem (none in the above example). -Usually, the fastest way to have your bug fixed is to report it to mailing -list used for the development of the code (linux-media ML) copying the -driver maintainer (Hans). +If the listing contains bug reporting URIs at the end, please prefer them over +email. Otherwise, please report bugs to the mailing list used for the +development of the code (linux-media ML) copying the driver maintainer (Hans). If you are totally stumped as to whom to send the report, and ``get_maintainer.pl`` didn't provide you anything useful, send it to diff --git a/Documentation/admin-guide/device-mapper/dm-crypt.rst b/Documentation/admin-guide/device-mapper/dm-crypt.rst index 552c9155165d..48a48bd09372 100644 --- a/Documentation/admin-guide/device-mapper/dm-crypt.rst +++ b/Documentation/admin-guide/device-mapper/dm-crypt.rst @@ -162,13 +162,18 @@ iv_large_sectors Module parameters:: - max_read_size + Maximum size of read requests. When a request larger than this size + is received, dm-crypt will split the request. The splitting improves + concurrency (the split requests could be encrypted in parallel by multiple + cores), but it also causes overhead. The user should tune this parameters to + fit the actual workload. + max_write_size - Maximum size of read or write requests. When a request larger than this size + Maximum size of write requests. When a request larger than this size is received, dm-crypt will split the request. The splitting improves concurrency (the split requests could be encrypted in parallel by multiple - cores), but it also causes overhead. The user should tune these parameters to + cores), but it also causes overhead. The user should tune this parameters to fit the actual workload. diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 0b400aa28482..e5489df3ea37 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -517,6 +517,18 @@ Format: <io>,<irq>,<mode> See header of drivers/net/hamradio/baycom_ser_hdx.c. + bdev_allow_write_mounted= + Format: <bool> + Control the ability to open a mounted block device + for writing, i.e., allow / disallow writes that bypass + the FS. This was implemented as a means to prevent + fuzzers from crashing the kernel by overwriting the + metadata underneath a mounted FS without its awareness. + This also prevents destructive formatting of mounted + filesystems by naive storage tooling that don't use + O_EXCL. Default is Y and can be changed through the + Kconfig option CONFIG_BLK_DEV_WRITE_MOUNTED. + bert_disable [ACPI] Disable BERT OS support on buggy BIOSes. diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst index f92551539e8a..700aa72eecb1 100644 --- a/Documentation/admin-guide/tainted-kernels.rst +++ b/Documentation/admin-guide/tainted-kernels.rst @@ -182,3 +182,5 @@ More detailed explanation for tainting produce extremely unusual kernel structure layouts (even performance pathological ones), which is important to know when debugging. Set at build time. + + 18) ``N`` if an in-kernel test, such as a KUnit test, has been run. diff --git a/Documentation/arch/arm/stm32/stm32-dma-mdma-chaining.rst b/Documentation/arch/arm/stm32/stm32-dma-mdma-chaining.rst index 2945e0e33104..301aa30890ae 100644 --- a/Documentation/arch/arm/stm32/stm32-dma-mdma-chaining.rst +++ b/Documentation/arch/arm/stm32/stm32-dma-mdma-chaining.rst @@ -359,7 +359,7 @@ Driver updates for STM32 DMA-MDMA chaining support in foo driver descriptor you want a callback to be called at the end of the transfer (dmaengine_prep_slave_sg()) or the period (dmaengine_prep_dma_cyclic()). Depending on the direction, set the callback on the descriptor that finishes - the overal transfer: + the overall transfer: * DMA_DEV_TO_MEM: set the callback on the "MDMA" descriptor * DMA_MEM_TO_DEV: set the callback on the "DMA" descriptor @@ -371,7 +371,7 @@ Driver updates for STM32 DMA-MDMA chaining support in foo driver As STM32 MDMA channel transfer is triggered by STM32 DMA, you must issue STM32 MDMA channel before STM32 DMA channel. - If any, your callback will be called to warn you about the end of the overal + If any, your callback will be called to warn you about the end of the overall transfer or the period completion. Don't forget to terminate both channels. STM32 DMA channel is configured in diff --git a/Documentation/arch/arm64/cpu-hotplug.rst b/Documentation/arch/arm64/cpu-hotplug.rst index 76ba8d932c72..8fb438bf7781 100644 --- a/Documentation/arch/arm64/cpu-hotplug.rst +++ b/Documentation/arch/arm64/cpu-hotplug.rst @@ -26,7 +26,7 @@ There are no systems that support the physical addition (or removal) of CPUs while the system is running, and ACPI is not able to sufficiently describe them. -e.g. New CPUs come with new caches, but the platform's cache toplogy is +e.g. New CPUs come with new caches, but the platform's cache topology is described in a static table, the PPTT. How caches are shared between CPUs is not discoverable, and must be described by firmware. diff --git a/Documentation/arch/powerpc/ultravisor.rst b/Documentation/arch/powerpc/ultravisor.rst index ba6b1bf1cc44..6d0407b2f5a1 100644 --- a/Documentation/arch/powerpc/ultravisor.rst +++ b/Documentation/arch/powerpc/ultravisor.rst @@ -134,7 +134,7 @@ Hardware * PTCR and partition table entries (partition table is in secure memory). An attempt to write to PTCR will cause a Hypervisor - Emulation Assitance interrupt. + Emulation Assistance interrupt. * LDBAR (LD Base Address Register) and IMC (In-Memory Collection) non-architected registers. An attempt to write to them will cause a diff --git a/Documentation/arch/riscv/vector.rst b/Documentation/arch/riscv/vector.rst index 75dd88a62e1d..3987f5f76a9d 100644 --- a/Documentation/arch/riscv/vector.rst +++ b/Documentation/arch/riscv/vector.rst @@ -15,7 +15,7 @@ status for the use of Vector in userspace. The intended usage guideline for these interfaces is to give init systems a way to modify the availability of V for processes running under its domain. Calling these interfaces is not recommended in libraries routines because libraries should not override policies -configured from the parant process. Also, users must noted that these interfaces +configured from the parent process. Also, users must note that these interfaces are not portable to non-Linux, nor non-RISC-V environments, so it is discourage to use in a portable code. To get the availability of V in an ELF program, please read :c:macro:`COMPAT_HWCAP_ISA_V` bit of :c:macro:`ELF_HWCAP` in the diff --git a/Documentation/arch/x86/mds.rst b/Documentation/arch/x86/mds.rst index c58c72362911..5a2e6c0ef04a 100644 --- a/Documentation/arch/x86/mds.rst +++ b/Documentation/arch/x86/mds.rst @@ -162,7 +162,7 @@ Mitigation points 3. It would take a large number of these precisely-timed NMIs to mount an actual attack. There's presumably not enough bandwidth. 4. The NMI in question occurs after a VERW, i.e. when user state is - restored and most interesting data is already scrubbed. Whats left + restored and most interesting data is already scrubbed. What's left is only the data that NMI touches, and that may or may not be of any interest. diff --git a/Documentation/arch/x86/x86_64/fsgs.rst b/Documentation/arch/x86/x86_64/fsgs.rst index 50960e09e1f6..d07e445dac5c 100644 --- a/Documentation/arch/x86/x86_64/fsgs.rst +++ b/Documentation/arch/x86/x86_64/fsgs.rst @@ -125,7 +125,7 @@ FSGSBASE instructions enablement FSGSBASE instructions compiler support ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -GCC version 4.6.4 and newer provide instrinsics for the FSGSBASE +GCC version 4.6.4 and newer provide intrinsics for the FSGSBASE instructions. Clang 5 supports them as well. =================== =========================== @@ -135,7 +135,7 @@ instructions. Clang 5 supports them as well. _writegsbase_u64() Write the GS base register =================== =========================== -To utilize these instrinsics <immintrin.h> must be included in the source +To utilize these intrinsics <immintrin.h> must be included in the source code and the compiler option -mfsgsbase has to be added. Compiler support for FS/GS based addressing diff --git a/Documentation/block/bfq-iosched.rst b/Documentation/block/bfq-iosched.rst index df3a8a47f58c..a0ff0eb11e7f 100644 --- a/Documentation/block/bfq-iosched.rst +++ b/Documentation/block/bfq-iosched.rst @@ -9,7 +9,7 @@ controllers), BFQ's main features are: - BFQ guarantees a high system and application responsiveness, and a low latency for time-sensitive applications, such as audio or video players; -- BFQ distributes bandwidth, and not just time, among processes or +- BFQ distributes bandwidth, not just time, among processes or groups (switching back to time distribution when needed to keep throughput high). @@ -111,7 +111,7 @@ Higher speed for code-development tasks If some additional workload happens to be executed in parallel, then BFQ executes the I/O-related components of typical code-development -tasks (compilation, checkout, merge, ...) much more quickly than CFQ, +tasks (compilation, checkout, merge, etc.) much more quickly than CFQ, NOOP or DEADLINE. High throughput @@ -127,9 +127,9 @@ Strong fairness, bandwidth and delay guarantees ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ BFQ distributes the device throughput, and not just the device time, -among I/O-bound applications in proportion their weights, with any +among I/O-bound applications in proportion to their weights, with any workload and regardless of the device parameters. From these bandwidth -guarantees, it is possible to compute tight per-I/O-request delay +guarantees, it is possible to compute a tight per-I/O-request delay guarantees by a simple formula. If not configured for strict service guarantees, BFQ switches to time-based resource sharing (only) for applications that would otherwise cause a throughput loss. @@ -199,7 +199,7 @@ plus a lot of code, are borrowed from CFQ. - On flash-based storage with internal queueing of commands (typically NCQ), device idling happens to be always detrimental - for throughput. So, with these devices, BFQ performs idling + to throughput. So, with these devices, BFQ performs idling only when strictly needed for service guarantees, i.e., for guaranteeing low latency or fairness. In these cases, overall throughput may be sub-optimal. No solution currently exists to @@ -212,7 +212,7 @@ plus a lot of code, are borrowed from CFQ. and to reduce their latency. The most important action taken to achieve this goal is to give to the queues associated with these applications more than their fair share of the device - throughput. For brevity, we call just "weight-raising" the whole + throughput. For brevity, we call it just "weight-raising" the whole sets of actions taken by BFQ to privilege these queues. In particular, BFQ provides a milder form of weight-raising for interactive applications, and a stronger form for soft real-time @@ -231,7 +231,7 @@ plus a lot of code, are borrowed from CFQ. responsive in detecting interleaved I/O (cooperating processes), that it enables BFQ to achieve a high throughput, by queue merging, even for queues for which CFQ needs a different - mechanism, preemption, to get a high throughput. As such EQM is a + mechanism, preemption, to get a high throughput. As such, EQM is a unified mechanism to achieve a high throughput with interleaved I/O. @@ -254,7 +254,7 @@ plus a lot of code, are borrowed from CFQ. - First, with any proportional-share scheduler, the maximum deviation with respect to an ideal service is proportional to the maximum budget (slice) assigned to queues. As a consequence, - BFQ can keep this deviation tight not only because of the + BFQ can keep this deviation tight, not only because of the accurate service of B-WF2Q+, but also because BFQ *does not* need to assign a larger budget to a queue to let the queue receive a higher fraction of the device throughput. @@ -327,7 +327,7 @@ applications. Unset this tunable if you need/want to control weights. slice_idle ---------- -This parameter specifies how long BFQ should idle for next I/O +This parameter specifies how long BFQ should idle for the next I/O request, when certain sync BFQ queues become empty. By default slice_idle is a non-zero value. Idling has a double purpose: boosting throughput and making sure that the desired throughput distribution is @@ -365,7 +365,7 @@ terms of I/O-request dispatches. To guarantee that the actual service order then corresponds to the dispatch order, the strict_guarantees tunable must be set too. -There is an important flipside for idling: apart from the above cases +There is an important flip side to idling: apart from the above cases where it is beneficial also for throughput, idling can severely impact throughput. One important case is random workload. Because of this issue, BFQ tends to avoid idling as much as possible, when it is not @@ -475,7 +475,7 @@ max_budget Maximum amount of service, measured in sectors, that can be provided to a BFQ queue once it is set in service (of course within the limits -of the above timeout). According to what said in the description of +of the above timeout). According to what was said in the description of the algorithm, larger values increase the throughput in proportion to the percentage of sequential I/O requests issued. The price of larger values is that they coarsen the granularity of short-term bandwidth diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst index 8b84eb4bdae7..0f19dd524323 100644 --- a/Documentation/core-api/memory-allocation.rst +++ b/Documentation/core-api/memory-allocation.rst @@ -45,8 +45,9 @@ here we briefly outline their recommended usage: * If the allocation is performed from an atomic context, e.g interrupt handler, use ``GFP_NOWAIT``. This flag prevents direct reclaim and IO or filesystem operations. Consequently, under memory pressure - ``GFP_NOWAIT`` allocation is likely to fail. Allocations which - have a reasonable fallback should be using ``GFP_NOWARN``. + ``GFP_NOWAIT`` allocation is likely to fail. Users of this flag need + to provide a suitable fallback to cope with such failures where + appropriate. * If you think that accessing memory reserves is justified and the kernel will be stressed unless allocation succeeds, you may use ``GFP_ATOMIC``. * Untrusted allocations triggered from userspace should be a subject diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst index 02143f060b22..d81c42d1063e 100644 --- a/Documentation/dev-tools/kcsan.rst +++ b/Documentation/dev-tools/kcsan.rst @@ -361,7 +361,8 @@ Alternatives Considered ----------------------- An alternative data race detection approach for the kernel can be found in the -`Kernel Thread Sanitizer (KTSAN) <https://github.com/google/ktsan/wiki>`_. +`Kernel Thread Sanitizer (KTSAN) +<https://github.com/google/kernel-sanitizers/blob/master/KTSAN.md>`_. KTSAN is a happens-before data race detector, which explicitly establishes the happens-before order between memory operations, which can then be used to determine data races as defined in `Data Races`_. diff --git a/Documentation/doc-guide/checktransupdate.rst b/Documentation/doc-guide/checktransupdate.rst new file mode 100644 index 000000000000..dfaf9d373747 --- /dev/null +++ b/Documentation/doc-guide/checktransupdate.rst @@ -0,0 +1,54 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Checking for needed translation updates +======================================= + +This script helps track the translation status of the documentation in +different locales, i.e., whether the documentation is up-to-date with +the English counterpart. + +How it works +------------ + +It uses ``git log`` command to track the latest English commit from the +translation commit (order by author date) and the latest English commits +from HEAD. If any differences occur, the file is considered as out-of-date, +then commits that need to be updated will be collected and reported. + +Features implemented + +- check all files in a certain locale +- check a single file or a set of files +- provide options to change output format +- track the translation status of files that have no translation + +Usage +----- + +:: + + ./scripts/checktransupdate.py --help + +Please refer to the output of argument parser for usage details. + +Samples + +- ``./scripts/checktransupdate.py -l zh_CN`` + This will print all the files that need to be updated in the zh_CN locale. +- ``./scripts/checktransupdate.py Documentation/translations/zh_CN/dev-tools/testing-overview.rst`` + This will only print the status of the specified file. + +Then the output is something like: + +:: + + Documentation/dev-tools/kfence.rst + No translation in the locale of zh_CN + + Documentation/translations/zh_CN/dev-tools/testing-overview.rst + commit 42fb9cfd5b18 ("Documentation: dev-tools: Add link to RV docs") + 1 commits needs resolving in total + +Features to be implemented + +- files can be a folder instead of only a file diff --git a/Documentation/doc-guide/index.rst b/Documentation/doc-guide/index.rst index 7c7d97784626..24d058faa75c 100644 --- a/Documentation/doc-guide/index.rst +++ b/Documentation/doc-guide/index.rst @@ -12,6 +12,7 @@ How to write kernel documentation parse-headers contributing maintainer-profile + checktransupdate .. only:: subproject and html diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 3c399f132e2d..94b3492dc301 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff @@ -262,7 +262,7 @@ vsyscall_32.lds wanxlfw.inc uImage unifdef -utf8data.h +utf8data.c wakeup.bin wakeup.elf wakeup.lds diff --git a/Documentation/driver-api/driver-model/devres.rst b/Documentation/driver-api/driver-model/devres.rst index ac9ee7441887..5f2ee8d717b1 100644 --- a/Documentation/driver-api/driver-model/devres.rst +++ b/Documentation/driver-api/driver-model/devres.rst @@ -391,7 +391,7 @@ PCI devm_pci_remap_cfgspace() : ioremap PCI configuration space devm_pci_remap_cfg_resource() : ioremap PCI configuration space resource - pcim_enable_device() : after success, all PCI ops become managed + pcim_enable_device() : after success, some PCI ops become managed pcim_iomap() : do iomap() on a single BAR pcim_iomap_regions() : do request_region() and iomap() on multiple BARs pcim_iomap_regions_request_all() : do request_region() on all and iomap() on multiple BARs diff --git a/Documentation/driver-api/iio/buffers.rst b/Documentation/driver-api/iio/buffers.rst index e83026aebe97..63f364e862d1 100644 --- a/Documentation/driver-api/iio/buffers.rst +++ b/Documentation/driver-api/iio/buffers.rst @@ -15,8 +15,8 @@ trigger source. Multiple data channels can be read at once from IIO buffer sysfs interface ========================== An IIO buffer has an associated attributes directory under -:file:`/sys/bus/iio/iio:device{X}/buffer/*`. Here are some of the existing -attributes: +:file:`/sys/bus/iio/devices/iio:device{X}/buffer/*`. Here are some of the +existing attributes: * :file:`length`, the total number of data samples (capacity) that can be stored by the buffer. @@ -28,8 +28,8 @@ IIO buffer setup The meta information associated with a channel reading placed in a buffer is called a scan element. The important bits configuring scan elements are exposed to userspace applications via the -:file:`/sys/bus/iio/iio:device{X}/scan_elements/` directory. This directory contains -attributes of the following form: +:file:`/sys/bus/iio/devices/iio:device{X}/scan_elements/` directory. This +directory contains attributes of the following form: * :file:`enable`, used for enabling a channel. If and only if its attribute is non *zero*, then a triggered capture will contain data samples for this diff --git a/Documentation/driver-api/iio/core.rst b/Documentation/driver-api/iio/core.rst index 715cf29482a1..dfe438dc91a7 100644 --- a/Documentation/driver-api/iio/core.rst +++ b/Documentation/driver-api/iio/core.rst @@ -24,7 +24,7 @@ then we will show how a device driver makes use of an IIO device. There are two ways for a user space application to interact with an IIO driver. -1. :file:`/sys/bus/iio/iio:device{X}/`, this represents a hardware sensor +1. :file:`/sys/bus/iio/devices/iio:device{X}/`, this represents a hardware sensor and groups together the data channels of the same chip. 2. :file:`/dev/iio:device{X}`, character device node interface used for buffered data transfer and for events information retrieval. @@ -51,8 +51,8 @@ IIO device sysfs interface Attributes are sysfs files used to expose chip info and also allowing applications to set various configuration parameters. For device with -index X, attributes can be found under /sys/bus/iio/iio:deviceX/ directory. -Common attributes are: +index X, attributes can be found under /sys/bus/iio/devices/iio:deviceX/ +directory. Common attributes are: * :file:`name`, description of the physical chip. * :file:`dev`, shows the major:minor pair associated with @@ -140,16 +140,16 @@ Here is how we can make use of the channel's modifiers:: This channel's definition will generate two separate sysfs files for raw data retrieval: -* :file:`/sys/bus/iio/iio:device{X}/in_intensity_ir_raw` -* :file:`/sys/bus/iio/iio:device{X}/in_intensity_both_raw` +* :file:`/sys/bus/iio/devices/iio:device{X}/in_intensity_ir_raw` +* :file:`/sys/bus/iio/devices/iio:device{X}/in_intensity_both_raw` one file for processed data: -* :file:`/sys/bus/iio/iio:device{X}/in_illuminance_input` +* :file:`/sys/bus/iio/devices/iio:device{X}/in_illuminance_input` and one shared sysfs file for sampling frequency: -* :file:`/sys/bus/iio/iio:device{X}/sampling_frequency`. +* :file:`/sys/bus/iio/devices/iio:device{X}/sampling_frequency`. Here is how we can make use of the channel's indexing:: diff --git a/Documentation/fault-injection/fault-injection.rst b/Documentation/fault-injection/fault-injection.rst index 70380a2a01b4..8b8aeea71c68 100644 --- a/Documentation/fault-injection/fault-injection.rst +++ b/Documentation/fault-injection/fault-injection.rst @@ -141,6 +141,14 @@ configuration of fault-injection capabilities. default is 'Y', setting it to 'N' will also inject failures into highmem/user allocations (__GFP_HIGHMEM allocations). +- /sys/kernel/debug/failslab/cache-filter + Format: { 'Y' | 'N' } + + default is 'N', setting it to 'Y' will only inject failures when + objects are requests from certain caches. + + Select the cache by writing '1' to /sys/kernel/slab/<cache>/failslab: + - /sys/kernel/debug/failslab/ignore-gfp-wait: - /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait: @@ -283,7 +291,7 @@ kernel may crash because it may not be able to handle the error. There are 4 types of errors defined in include/asm-generic/error-injection.h EI_ETYPE_NULL - This function will return `NULL` if it fails. e.g. return an allocateed + This function will return `NULL` if it fails. e.g. return an allocated object address. EI_ETYPE_ERRNO @@ -459,6 +467,18 @@ Application Examples losetup -d $DEVICE rm testfile.img +------------------------------------------------------------------------------ + +- Inject only skbuff allocation failures :: + + # mark skbuff_head_cache as faulty + echo 1 > /sys/kernel/slab/skbuff_head_cache/failslab + # Turn on cache filter (off by default) + echo 1 > /sys/kernel/debug/failslab/cache-filter + # Turn on fault injection + echo 1 > /sys/kernel/debug/failslab/times + echo 1 > /sys/kernel/debug/failslab/probability + Tool to run command with failslab or fail_page_alloc ---------------------------------------------------- diff --git a/Documentation/filesystems/9p.rst b/Documentation/filesystems/9p.rst index 1e0e0bb6fdf9..d270a0aa8d55 100644 --- a/Documentation/filesystems/9p.rst +++ b/Documentation/filesystems/9p.rst @@ -31,7 +31,7 @@ Other applications are described in the following papers: * PROSE I/O: Using 9p to enable Application Partitions http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf * VirtFS: A Virtualization Aware File System pass-through - http://goo.gl/3WPDg + https://kernel.org/doc/ols/2010/ols2010-pages-109-120.pdf Usage ===== diff --git a/Documentation/filesystems/autofs.rst b/Documentation/filesystems/autofs.rst index 3b6e38e646cd..1ac576458c69 100644 --- a/Documentation/filesystems/autofs.rst +++ b/Documentation/filesystems/autofs.rst @@ -18,7 +18,7 @@ key advantages: 2. The names and locations of filesystems can be stored in a remote database and can change at any time. The content - in that data base at the time of access will be used to provide + in that database at the time of access will be used to provide a target for the access. The interpretation of names in the filesystem can even be programmatic rather than database-backed, allowing wildcards for example, and can vary based on the user who @@ -423,7 +423,7 @@ The available ioctl commands are: and objects are expired if the are not in use. **AUTOFS_EXP_FORCED** causes the in use status to be ignored - and objects are expired ieven if they are in use. This assumes + and objects are expired even if they are in use. This assumes that the daemon has requested this because it is capable of performing the umount. diff --git a/Documentation/filesystems/journalling.rst b/Documentation/filesystems/journalling.rst index e18f90ffc6fd..0254f7d57429 100644 --- a/Documentation/filesystems/journalling.rst +++ b/Documentation/filesystems/journalling.rst @@ -137,7 +137,7 @@ Fast commits JBD2 to also allows you to perform file-system specific delta commits known as fast commits. In order to use fast commits, you will need to set following -callbacks that perform correspodning work: +callbacks that perform corresponding work: `journal->j_fc_cleanup_cb`: Cleanup function called after every full commit and fast commit. @@ -149,7 +149,7 @@ File system is free to perform fast commits as and when it wants as long as it gets permission from JBD2 to do so by calling the function :c:func:`jbd2_fc_begin_commit()`. Once a fast commit is done, the client file system should tell JBD2 about it by calling -:c:func:`jbd2_fc_end_commit()`. If file system wants JBD2 to perform a full +:c:func:`jbd2_fc_end_commit()`. If the file system wants JBD2 to perform a full commit immediately after stopping the fast commit it can do so by calling :c:func:`jbd2_fc_end_commit_fallback()`. This is useful if fast commit operation fails for some reason and the only way to guarantee consistency is for JBD2 to @@ -199,7 +199,7 @@ Journal Level .. kernel-doc:: fs/jbd2/recovery.c :internal: -Transasction Level +Transaction Level ~~~~~~~~~~~~~~~~~~ .. kernel-doc:: fs/jbd2/transaction.c diff --git a/Documentation/gpu/komeda-kms.rst b/Documentation/gpu/komeda-kms.rst index 633a016563ae..eaea40eb725b 100644 --- a/Documentation/gpu/komeda-kms.rst +++ b/Documentation/gpu/komeda-kms.rst @@ -86,7 +86,7 @@ types of working mode: - Single display mode Two pipelines work together to drive only one display output. - On this mode, pipeline_B doesn't work indenpendently, but outputs its + On this mode, pipeline_B doesn't work independently, but outputs its composition result into pipeline_A, and its pixel timing also derived from pipeline_A.timing_ctrlr. The pipeline_B works just like a "slave" of pipeline_A(master) diff --git a/Documentation/leds/leds-mlxcpld.rst b/Documentation/leds/leds-mlxcpld.rst index 528582429e0b..c520a134d91e 100644 --- a/Documentation/leds/leds-mlxcpld.rst +++ b/Documentation/leds/leds-mlxcpld.rst @@ -115,4 +115,4 @@ Driver provides the following LEDs for the system "msn2100": - [1,1,1,1] = Blue blink 6Hz Driver supports HW blinking at 3Hz and 6Hz frequency (50% duty cycle). -For 3Hz duty cylce is about 167 msec, for 6Hz is about 83 msec. +For 3Hz duty cycle is about 167 msec, for 6Hz is about 83 msec. diff --git a/Documentation/mm/hmm.rst b/Documentation/mm/hmm.rst index 0595098a74d9..f6d53c37a2ca 100644 --- a/Documentation/mm/hmm.rst +++ b/Documentation/mm/hmm.rst @@ -66,7 +66,7 @@ combinatorial explosion in the library entry points. Finally, with the advance of high level language constructs (in C++ but in other languages too) it is now possible for the compiler to leverage GPUs and other devices without programmer knowledge. Some compiler identified patterns -are only do-able with a shared address space. It is also more reasonable to use +are only doable with a shared address space. It is also more reasonable to use a shared address space for all other patterns. @@ -267,7 +267,7 @@ functions are designed to make drivers easier to write and to centralize common code across drivers. Before migrating pages to device private memory, special device private -``struct page`` need to be created. These will be used as special "swap" +``struct page`` needs to be created. These will be used as special "swap" page table entries so that a CPU process will fault if it tries to access a page that has been migrated to device private memory. @@ -322,7 +322,7 @@ between device driver specific code and shared common code: The ``invalidate_range_start()`` callback is passed a ``struct mmu_notifier_range`` with the ``event`` field set to ``MMU_NOTIFY_MIGRATE`` and the ``owner`` field set to - the ``args->pgmap_owner`` field passed to migrate_vma_setup(). This is + the ``args->pgmap_owner`` field passed to migrate_vma_setup(). This allows the device driver to skip the invalidation callback and only invalidate device private MMU mappings that are actually migrating. This is explained more in the next section. @@ -405,7 +405,7 @@ can be used to make a memory range inaccessible from userspace. This replaces all mappings for pages in the given range with special swap entries. Any attempt to access the swap entry results in a fault which is -resovled by replacing the entry with the original mapping. A driver gets +resolved by replacing the entry with the original mapping. A driver gets notified that the mapping has been changed by MMU notifiers, after which point it will no longer have exclusive access to the page. Exclusive access is guaranteed to last until the driver drops the page lock and page reference, at @@ -431,7 +431,7 @@ Same decision was made for memory cgroup. Device memory pages are accounted against same memory cgroup a regular page would be accounted to. This does simplify migration to and from device memory. This also means that migration back from device memory to regular memory cannot fail because it would -go above memory cgroup limit. We might revisit this choice latter on once we +go above memory cgroup limit. We might revisit this choice later on once we get more experience in how device memory is used and its impact on memory resource control. diff --git a/Documentation/mm/vmalloced-kernel-stacks.rst b/Documentation/mm/vmalloced-kernel-stacks.rst index 4edca515bfd7..5bc0f7ceea89 100644 --- a/Documentation/mm/vmalloced-kernel-stacks.rst +++ b/Documentation/mm/vmalloced-kernel-stacks.rst @@ -110,7 +110,7 @@ Bulk of the code is in: `kernel/fork.c <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/fork.c>`. stack_vm_area pointer in task_struct keeps track of the virtually allocated -stack and a non-null stack_vm_area pointer serves as a indication that the +stack and a non-null stack_vm_area pointer serves as an indication that the virtually mapped kernel stacks are enabled. :: @@ -120,8 +120,8 @@ virtually mapped kernel stacks are enabled. Stack overflow handling ----------------------- -Leading and trailing guard pages help detect stack overflows. When stack -overflows into the guard pages, handlers have to be careful not overflow +Leading and trailing guard pages help detect stack overflows. When the stack +overflows into the guard pages, handlers have to be careful not to overflow the stack again. When handlers are called, it is likely that very little stack space is left. @@ -148,6 +148,6 @@ Conclusions - THREAD_INFO_IN_TASK gets rid of arch-specific thread_info entirely and simply embed the thread_info (containing only flags) and 'int cpu' into task_struct. -- The thread stack can be free'ed as soon as the task is dead (without +- The thread stack can be freed as soon as the task is dead (without waiting for RCU) and then, if vmapped stacks are in use, cache the entire stack for reuse on the same cpu. diff --git a/Documentation/nvme/feature-and-quirk-policy.rst b/Documentation/nvme/feature-and-quirk-policy.rst index c01d836d8e41..e21966bf20a8 100644 --- a/Documentation/nvme/feature-and-quirk-policy.rst +++ b/Documentation/nvme/feature-and-quirk-policy.rst @@ -1,8 +1,8 @@ .. SPDX-License-Identifier: GPL-2.0 -======================================= -Linux NVMe feature and and quirk policy -======================================= +=================================== +Linux NVMe feature and quirk policy +=================================== This file explains the policy used to decide what is supported by the Linux NVMe driver and what is not. diff --git a/Documentation/process/backporting.rst b/Documentation/process/backporting.rst index e1a6ea0a1e8a..a71480fcf3b4 100644 --- a/Documentation/process/backporting.rst +++ b/Documentation/process/backporting.rst @@ -73,7 +73,7 @@ Once you have the patch in git, you can go ahead and cherry-pick it into your source tree. Don't forget to cherry-pick with ``-x`` if you want a written record of where the patch came from! -Note that if you are submiting a patch for stable, the format is +Note that if you are submitting a patch for stable, the format is slightly different; the first line after the subject line needs tobe either:: @@ -147,7 +147,7 @@ divergence. It's important to always identify the commit or commits that caused the conflict, as otherwise you cannot be confident in the correctness of your resolution. As an added bonus, especially if the patch is in an -area you're not that famliar with, the changelogs of these commits will +area you're not that familiar with, the changelogs of these commits will often give you the context to understand the code and potential problems or pitfalls with your conflict resolution. @@ -197,7 +197,7 @@ git blame Another way to find prerequisite commits (albeit only the most recent one for a given conflict) is to run ``git blame``. In this case, you need to run it against the parent commit of the patch you are -cherry-picking and the file where the conflict appared, i.e.:: +cherry-picking and the file where the conflict appeared, i.e.:: git blame <commit>^ -- <path> diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst index 8e30c8f7697d..19d2ed47ff79 100644 --- a/Documentation/process/coding-style.rst +++ b/Documentation/process/coding-style.rst @@ -986,7 +986,7 @@ that can go into these 5 milliseconds. A reasonable rule of thumb is to not put inline at functions that have more than 3 lines of code in them. An exception to this rule are the cases where -a parameter is known to be a compiletime constant, and as a result of this +a parameter is known to be a compile time constant, and as a result of this constantness you *know* the compiler will be able to optimize most of your function away at compile time. For a good example of this later case, see the kmalloc() inline function. diff --git a/Documentation/process/email-clients.rst b/Documentation/process/email-clients.rst index dd22c46d1d02..e6b9173a1845 100644 --- a/Documentation/process/email-clients.rst +++ b/Documentation/process/email-clients.rst @@ -216,7 +216,7 @@ Mutt is highly customizable. Here is a minimum configuration to start using Mutt to send patches through Gmail:: # .muttrc - # ================ IMAP ==================== + # ================ IMAP ==================== set imap_user = 'yourusername@gmail.com' set imap_pass = 'yourpassword' set spoolfile = imaps://imap.gmail.com/INBOX diff --git a/Documentation/process/maintainer-tip.rst b/Documentation/process/maintainer-tip.rst index ba312345d030..349a27a53343 100644 --- a/Documentation/process/maintainer-tip.rst +++ b/Documentation/process/maintainer-tip.rst @@ -154,7 +154,7 @@ Examples for illustration: We modify the hot cpu handling to cancel the delayed work on the dying cpu and run the worker immediately on a different cpu in same domain. We - donot flush the worker because the MBM overflow worker reschedules the + do not flush the worker because the MBM overflow worker reschedules the worker on same CPU and scans the domain->cpu_mask to get the domain pointer. diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst index f310f2f36666..1518bd57adab 100644 --- a/Documentation/process/submitting-patches.rst +++ b/Documentation/process/submitting-patches.rst @@ -842,6 +842,14 @@ Make sure that base commit is in an official maintainer/mainline tree and not in some internal, accessible only to you tree - otherwise it would be worthless. +Tooling +------- + +Many of the technical aspects of this process can be automated using +b4, documented at <https://b4.docs.kernel.org/en/latest/>. This can +help with things like tracking dependencies, running checkpatch and +with formatting and sending mails. + References ---------- diff --git a/Documentation/scheduler/completion.rst b/Documentation/scheduler/completion.rst index f19aca2062bd..adf0c0a56d02 100644 --- a/Documentation/scheduler/completion.rst +++ b/Documentation/scheduler/completion.rst @@ -51,7 +51,7 @@ which has only two fields:: struct completion { unsigned int done; - wait_queue_head_t wait; + struct swait_queue_head wait; }; This provides the ->wait waitqueue to place tasks on for waiting (if any), and diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst index 43bd8a145b7a..1f2942c4d14b 100644 --- a/Documentation/scheduler/index.rst +++ b/Documentation/scheduler/index.rst @@ -12,6 +12,7 @@ Scheduler sched-bwc sched-deadline sched-design-CFS + sched-eevdf sched-domains sched-capacity sched-energy diff --git a/Documentation/scheduler/sched-design-CFS.rst b/Documentation/scheduler/sched-design-CFS.rst index bc1e507269c6..8786f219fc73 100644 --- a/Documentation/scheduler/sched-design-CFS.rst +++ b/Documentation/scheduler/sched-design-CFS.rst @@ -8,10 +8,12 @@ CFS Scheduler 1. OVERVIEW ============ -CFS stands for "Completely Fair Scheduler," and is the new "desktop" process -scheduler implemented by Ingo Molnar and merged in Linux 2.6.23. It is the -replacement for the previous vanilla scheduler's SCHED_OTHER interactivity -code. +CFS stands for "Completely Fair Scheduler," and is the "desktop" process +scheduler implemented by Ingo Molnar and merged in Linux 2.6.23. When +originally merged, it was the replacement for the previous vanilla +scheduler's SCHED_OTHER interactivity code. Nowadays, CFS is making room +for EEVDF, for which documentation can be found in +Documentation/scheduler/sched-eevdf.rst. 80% of CFS's design can be summed up in a single sentence: CFS basically models an "ideal, precise multi-tasking CPU" on real hardware. diff --git a/Documentation/scheduler/sched-eevdf.rst b/Documentation/scheduler/sched-eevdf.rst new file mode 100644 index 000000000000..83efe7c0a30d --- /dev/null +++ b/Documentation/scheduler/sched-eevdf.rst @@ -0,0 +1,43 @@ +=============== +EEVDF Scheduler +=============== + +The "Earliest Eligible Virtual Deadline First" (EEVDF) was first introduced +in a scientific publication in 1995 [1]. The Linux kernel began +transitioning to EEVDF in version 6.6 (as a new option in 2024), moving +away from the earlier Completely Fair Scheduler (CFS) in favor of a version +of EEVDF proposed by Peter Zijlstra in 2023 [2-4]. More information +regarding CFS can be found in +Documentation/scheduler/sched-design-CFS.rst. + +Similarly to CFS, EEVDF aims to distribute CPU time equally among all +runnable tasks with the same priority. To do so, it assigns a virtual run +time to each task, creating a "lag" value that can be used to determine +whether a task has received its fair share of CPU time. In this way, a task +with a positive lag is owed CPU time, while a negative lag means the task +has exceeded its portion. EEVDF picks tasks with lag greater or equal to +zero and calculates a virtual deadline (VD) for each, selecting the task +with the earliest VD to execute next. It's important to note that this +allows latency-sensitive tasks with shorter time slices to be prioritized, +which helps with their responsiveness. + +There are ongoing discussions on how to manage lag, especially for sleeping +tasks; but at the time of writing EEVDF uses a "decaying" mechanism based +on virtual run time (VRT). This prevents tasks from exploiting the system +by sleeping briefly to reset their negative lag: when a task sleeps, it +remains on the run queue but marked for "deferred dequeue," allowing its +lag to decay over VRT. Hence, long-sleeping tasks eventually have their lag +reset. Finally, tasks can preempt others if their VD is earlier, and tasks +can request specific time slices using the new sched_setattr() system call, +which further facilitates the job of latency-sensitive applications. + +REFERENCES +========== + +[1] https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=805acf7726282721504c8f00575d91ebfd750564 + +[2] https://lore.kernel.org/lkml/a79014e6-ea83-b316-1e12-2ae056bda6fa@linux.vnet.ibm.com/ + +[3] https://lwn.net/Articles/969062/ + +[4] https://lwn.net/Articles/925371/ diff --git a/Documentation/sphinx/kerneldoc-preamble.sty b/Documentation/sphinx/kerneldoc-preamble.sty index d479cfa73658..5d68395539fe 100644 --- a/Documentation/sphinx/kerneldoc-preamble.sty +++ b/Documentation/sphinx/kerneldoc-preamble.sty @@ -199,6 +199,8 @@ % Inactivate CJK after tableofcontents \apptocmd{\sphinxtableofcontents}{\kerneldocCJKoff}{}{} \xeCJKsetup{CJKspace = true}% For inter-phrase space of Korean TOC + % Suppress extra white space at latin .. non-latin in literal blocks + \AtBeginEnvironment{sphinxVerbatim}{\CJKsetecglue{}} }{ % Don't enable CJK % Custom macros to on/off CJK and switch CJK fonts (Dummy) \newcommand{\kerneldocCJKon}{} diff --git a/Documentation/translations/ko_KR/core-api/wrappers/memory-barriers.rst b/Documentation/translations/ko_KR/core-api/wrappers/memory-barriers.rst new file mode 100644 index 000000000000..526ae534dd86 --- /dev/null +++ b/Documentation/translations/ko_KR/core-api/wrappers/memory-barriers.rst @@ -0,0 +1,18 @@ +.. SPDX-License-Identifier: GPL-2.0 + This is a simple wrapper to bring memory-barriers.txt into the RST world + until such a time as that file can be converted directly. + +========================= +리눅스 커널 메모리 배리어 +========================= + +.. raw:: latex + + \footnotesize + +.. include:: ../../memory-barriers.txt + :literal: + +.. raw:: latex + + \normalsize diff --git a/Documentation/translations/ko_KR/index.rst b/Documentation/translations/ko_KR/index.rst index 4add6b2fe1f2..a20772f9d61c 100644 --- a/Documentation/translations/ko_KR/index.rst +++ b/Documentation/translations/ko_KR/index.rst @@ -11,19 +11,9 @@ .. toctree:: :maxdepth: 1 - howto - - -리눅스 커널 메모리 배리어 -------------------------- - -.. raw:: latex - - \footnotesize - -.. include:: ./memory-barriers.txt - :literal: + process/howto + core-api/wrappers/memory-barriers.rst .. raw:: latex - }\kerneldocEndKR + }\kerneldocEndKR diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/process/howto.rst index 34f14899c155..34f14899c155 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/process/howto.rst diff --git a/Documentation/translations/sp_SP/scheduler/index.rst b/Documentation/translations/sp_SP/scheduler/index.rst index 768488d6f001..32f9fd7517b2 100644 --- a/Documentation/translations/sp_SP/scheduler/index.rst +++ b/Documentation/translations/sp_SP/scheduler/index.rst @@ -6,3 +6,4 @@ :maxdepth: 1 sched-design-CFS + sched-eevdf diff --git a/Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst b/Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst index 731c266beb1a..dc728c739e28 100644 --- a/Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst +++ b/Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst @@ -14,10 +14,10 @@ Gestor de tareas CFS CFS viene de las siglas en inglés de "Gestor de tareas totalmente justo" ("Completely Fair Scheduler"), y es el nuevo gestor de tareas de escritorio -implementado por Ingo Molnar e integrado en Linux 2.6.23. Es el sustituto de -el previo gestor de tareas SCHED_OTHER. - -Nota: El planificador EEVDF fue incorporado más recientemente al kernel. +implementado por Ingo Molnar e integrado en Linux 2.6.23. Es el sustituto +del previo gestor de tareas SCHED_OTHER. Hoy en día se está abriendo camino +para el gestor de tareas EEVDF, cuya documentación se puede ver en +Documentation/scheduler/sched-eevdf.rst El 80% del diseño de CFS puede ser resumido en una única frase: CFS básicamente modela una "CPU ideal, precisa y multi-tarea" sobre hardware diff --git a/Documentation/translations/sp_SP/scheduler/sched-eevdf.rst b/Documentation/translations/sp_SP/scheduler/sched-eevdf.rst new file mode 100644 index 000000000000..d54736f297b8 --- /dev/null +++ b/Documentation/translations/sp_SP/scheduler/sched-eevdf.rst @@ -0,0 +1,58 @@ + +.. include:: ../disclaimer-sp.rst + +:Original: Documentation/scheduler/sched-eevdf.rst +:Translator: Sergio González Collado <sergio.collado@gmail.com> + +====================== +Gestor de tareas EEVDF +====================== + +El gestor de tareas EEVDF, del inglés: "Earliest Eligible Virtual Deadline +First", fue presentado por primera vez en una publicación científica en +1995 [1]. El kernel de Linux comenzó a transicionar hacia EEVPF en la +versión 6.6 (y como una nueva opción en 2024), alejándose del gestor +de tareas CFS, en favor de una versión de EEVDF propuesta por Peter +Zijlstra en 2023 [2-4]. Más información relativa a CFS puede encontrarse +en Documentation/scheduler/sched-design-CFS.rst. + +De forma parecida a CFS, EEVDF intenta distribuir el tiempo de ejecución +de la CPU de forma equitativa entre todas las tareas que tengan la misma +prioridad y puedan ser ejecutables. Para eso, asigna un tiempo de +ejecución virtual a cada tarea, creando un "retraso" que puede ser usado +para determinar si una tarea ha recibido su cantidad justa de tiempo +de ejecución en la CPU. De esta manera, una tarea con un "retraso" +positivo, es porque se le debe tiempo de ejecución, mientras que una +con "retraso" negativo implica que la tarea ha excedido su cuota de +tiempo. EEVDF elige las tareas con un "retraso" mayor igual a cero y +calcula un tiempo límite de ejecución virtual (VD, del inglés: virtual +deadline) para cada una, eligiendo la tarea con la VD más próxima para +ser ejecutada a continuación. Es importante darse cuenta que esto permite +que la tareas que sean sensibles a la latencia que tengan porciones de +tiempos de ejecución de CPU más cortos ser priorizadas, lo cual ayuda con +su menor tiempo de respuesta. + +Ahora mismo se está discutiendo cómo gestionar esos "retrasos", especialmente +en tareas que estén en un estado durmiente; pero en el momento en el que +se escribe este texto EEVDF usa un mecanismo de "decaimiento" basado en el +tiempo virtual de ejecución (VRT, del inglés: virtual run time). Esto previene +a las tareas de abusar del sistema simplemente durmiendo brevemente para +reajustar su retraso negativo: cuando una tarea duerme, esta permanece en +la cola de ejecución pero marcada para "desencolado diferido", permitiendo +a su retraso decaer a lo largo de VRT. Por tanto, las tareas que duerman +por más tiempo eventualmente eliminarán su retraso. Finalmente, las tareas +pueden adelantarse a otras si su VD es más próximo en el tiempo, y las +tareas podrán pedir porciones de tiempo específicas con la nueva llamada +del sistema sched_setattr(), todo esto facilitara el trabajo de las aplicaciones +que sean sensibles a las latencias. + +REFERENCIAS +=========== + +[1] https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=805acf7726282721504c8f00575d91ebfd750564 + +[2] https://lore.kernel.org/lkml/a79014e6-ea83-b316-1e12-2ae056bda6fa@linux.vnet.ibm.com/ + +[3] https://lwn.net/Articles/969062/ + +[4] https://lwn.net/Articles/925371/ diff --git a/Documentation/translations/zh_CN/admin-guide/index.rst b/Documentation/translations/zh_CN/admin-guide/index.rst index 0db80ab830a0..15d9ab5993a7 100644 --- a/Documentation/translations/zh_CN/admin-guide/index.rst +++ b/Documentation/translations/zh_CN/admin-guide/index.rst @@ -37,7 +37,6 @@ Todolist: reporting-issues reporting-regressions - security-bugs bug-hunting bug-bisect tainted-kernels diff --git a/Documentation/translations/zh_CN/admin-guide/reporting-issues.rst b/Documentation/translations/zh_CN/admin-guide/reporting-issues.rst index 59e51e3539b4..9ff4ba94391d 100644 --- a/Documentation/translations/zh_CN/admin-guide/reporting-issues.rst +++ b/Documentation/translations/zh_CN/admin-guide/reporting-issues.rst @@ -300,7 +300,7 @@ Documentation/admin-guide/reporting-regressions.rst 对此进行了更详细的 添加到回归跟踪列表中,以确保它不会被忽略。 什么是安全问题留给您自己判断。在继续之前,请考虑阅读 -Documentation/translations/zh_CN/admin-guide/security-bugs.rst , +Documentation/translations/zh_CN/process/security-bugs.rst , 因为它提供了如何最恰当地处理安全问题的额外细节。 当发生了完全无法接受的糟糕事情时,此问题就是一个“非常严重的问题”。例如, @@ -983,7 +983,7 @@ Documentation/admin-guide/reporting-regressions.rst ;它还提供了大量其 报告,请将报告的文本转发到这些地址;但请在报告的顶部加上注释,表明您提交了 报告,并附上工单链接。 -更多信息请参见 Documentation/translations/zh_CN/admin-guide/security-bugs.rst 。 +更多信息请参见 Documentation/translations/zh_CN/process/security-bugs.rst 。 发布报告后的责任 diff --git a/Documentation/translations/zh_CN/dev-tools/index.rst b/Documentation/translations/zh_CN/dev-tools/index.rst index c540e4a7d5db..6a8c637c0be1 100644 --- a/Documentation/translations/zh_CN/dev-tools/index.rst +++ b/Documentation/translations/zh_CN/dev-tools/index.rst @@ -21,6 +21,7 @@ Documentation/translations/zh_CN/dev-tools/testing-overview.rst testing-overview sparse kcov + kcsan gcov kasan ubsan @@ -32,7 +33,6 @@ Todolist: - checkpatch - coccinelle - kmsan - - kcsan - kfence - kgdb - kselftest diff --git a/Documentation/translations/zh_CN/dev-tools/kcsan.rst b/Documentation/translations/zh_CN/dev-tools/kcsan.rst new file mode 100644 index 000000000000..8c495c17f109 --- /dev/null +++ b/Documentation/translations/zh_CN/dev-tools/kcsan.rst @@ -0,0 +1,320 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/dev-tools/kcsan.rst +:Translator: 刘浩阳 Haoyang Liu <tttturtleruss@hust.edu.cn> + +内核并发消毒剂(KCSAN) +===================== + +内核并发消毒剂(KCSAN)是一个动态竞争检测器,依赖编译时插桩,并且使用基于观察 +点的采样方法来检测竞争。KCSAN 的主要目的是检测 `数据竞争`_。 + +使用 +---- + +KCSAN 受 GCC 和 Clang 支持。使用 GCC 需要版本 11 或更高,使用 Clang 也需要 +版本 11 或更高。 + +为了启用 KCSAN,用如下参数配置内核:: + + CONFIG_KCSAN = y + +KCSAN 提供了几个其他的配置选项来自定义行为(见 ``lib/Kconfig.kcsan`` 中的各自的 +帮助文档以获取更多信息)。 + +错误报告 +~~~~~~~~ + +一个典型数据竞争的报告如下所示:: + + ================================================================== + BUG: KCSAN: data-race in test_kernel_read / test_kernel_write + + write to 0xffffffffc009a628 of 8 bytes by task 487 on cpu 0: + test_kernel_write+0x1d/0x30 + access_thread+0x89/0xd0 + kthread+0x23e/0x260 + ret_from_fork+0x22/0x30 + + read to 0xffffffffc009a628 of 8 bytes by task 488 on cpu 6: + test_kernel_read+0x10/0x20 + access_thread+0x89/0xd0 + kthread+0x23e/0x260 + ret_from_fork+0x22/0x30 + + value changed: 0x00000000000009a6 -> 0x00000000000009b2 + + Reported by Kernel Concurrency Sanitizer on: + CPU: 6 PID: 488 Comm: access_thread Not tainted 5.12.0-rc2+ #1 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 + ================================================================== + +报告的头部提供了一个关于竞争中涉及到的函数的简短总结。随后是竞争中的两个线程的 +访问类型和堆栈信息。如果 KCSAN 发现了一个值的变化,那么那个值的旧值和新值会在 +“value changed”这一行单独显示。 + +另一个不太常见的数据竞争类型的报告如下所示:: + + ================================================================== + BUG: KCSAN: data-race in test_kernel_rmw_array+0x71/0xd0 + + race at unknown origin, with read to 0xffffffffc009bdb0 of 8 bytes by task 515 on cpu 2: + test_kernel_rmw_array+0x71/0xd0 + access_thread+0x89/0xd0 + kthread+0x23e/0x260 + ret_from_fork+0x22/0x30 + + value changed: 0x0000000000002328 -> 0x0000000000002329 + + Reported by Kernel Concurrency Sanitizer on: + CPU: 2 PID: 515 Comm: access_thread Not tainted 5.12.0-rc2+ #1 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 + ================================================================== + +这个报告是当另一个竞争线程不可能被发现,但是可以从观测的内存地址的值改变而推断 +出来的时候生成的。这类报告总是会带有“value changed”行。这类报告的出现通常是因 +为在竞争线程中缺少插桩,也可能是因为其他原因,比如 DMA 访问。这类报告只会在 +设置了内核参数 ``CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN=y`` 时才会出现,而这 +个参数是默认启用的。 + +选择性分析 +~~~~~~~~~~ + +对于一些特定的访问,函数,编译单元或者整个子系统,可能需要禁用数据竞争检测。 +对于静态黑名单,有如下可用的参数: + +* KCSAN 支持使用 ``data_race(expr)`` 注解,这个注解告诉 KCSAN 任何由访问 + ``expr`` 所引起的数据竞争都应该被忽略,其产生的行为后果被认为是安全的。请查阅 + `在 LKMM 中 "标记共享内存访问"`_ 获得更多信息。 + +* 与 ``data_race(...)`` 相似,可以使用类型限定符 ``__data_racy`` 来标记一个变量 + ,所有访问该变量而导致的数据竞争都是故意为之并且应该被 KCSAN 忽略:: + + struct foo { + ... + int __data_racy stats_counter; + ... + }; + +* 使用函数属性 ``__no_kcsan`` 可以对整个函数禁用数据竞争检测:: + + __no_kcsan + void foo(void) { + ... + + 为了动态限制该为哪些函数生成报告,查阅 `Debug 文件系统接口`_ 黑名单/白名单特性。 + +* 为特定的编译单元禁用数据竞争检测,将下列参数加入到 ``Makefile`` 中:: + + KCSAN_SANITIZE_file.o := n + +* 为 ``Makefile`` 中的所有编译单元禁用数据竞争检测,将下列参数添加到相应的 + ``Makefile`` 中:: + + KCSAN_SANITIZE := n + +.. _在 LKMM 中 "标记共享内存访问": https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt + +此外,KCSAN 可以根据偏好设置显示或隐藏整个类别的数据竞争。可以使用如下 +Kconfig 参数进行更改: + +* ``CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY``: 如果启用了该参数并且通过观测点 + (watchpoint) 观测到一个有冲突的写操作,但是对应的内存地址中存储的值没有改变, + 则不会报告这起数据竞争。 + +* ``CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC``: 假设默认情况下,不超过字大小的简 + 单对齐写入操作是原子的。假设这些写入操作不会受到不安全的编译器优化影响,从而导 + 致数据竞争。该选项使 KCSAN 不报告仅由不超过字大小的简单对齐写入操作引起 + 的冲突所导致的数据竞争。 + +* ``CONFIG_KCSAN_PERMISSIVE``: 启用额外的宽松规则来忽略某些常见类型的数据竞争。 + 与上面的规则不同,这条规则更加复杂,涉及到值改变模式,访问类型和地址。这个 + 选项依赖编译选项 ``CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=y``。请查看 + ``kernel/kcsan/permissive.h`` 获取更多细节。对于只侧重于特定子系统而不是整个 + 内核报告的测试者和维护者,建议禁用该选项。 + +要使用尽可能严格的规则,选择 ``CONFIG_KCSAN_STRICT=y``,这将配置 KCSAN 尽可 +能紧密地遵循 Linux 内核内存一致性模型(LKMM)。 + +Debug 文件系统接口 +~~~~~~~~~~~~~~~~~~ + +文件 ``/sys/kernel/debug/kcsan`` 提供了如下接口: + +* 读 ``/sys/kernel/debug/kcsan`` 返回不同的运行时统计数据。 + +* 将 ``on`` 或 ``off`` 写入 ``/sys/kernel/debug/kcsan`` 允许打开或关闭 KCSAN。 + +* 将 ``!some_func_name`` 写入 ``/sys/kernel/debug/kcsan`` 会将 + ``some_func_name`` 添加到报告过滤列表中,该列表(默认)会将数据竞争报告中的顶 + 层堆栈帧是列表中函数的情况列入黑名单。 + +* 将 ``blacklist`` 或 ``whitelist`` 写入 ``/sys/kernel/debug/kcsan`` 会改变报告 + 过滤行为。例如,黑名单的特性可以用来过滤掉经常发生的数据竞争。白名单特性可以帮 + 助复现和修复测试。 + +性能调优 +~~~~~~~~ + +影响 KCSAN 整体的性能和 bug 检测能力的核心参数是作为内核命令行参数公开的,其默认 +值也可以通过相应的 Kconfig 选项更改。 + +* ``kcsan.skip_watch`` (``CONFIG_KCSAN_SKIP_WATCH``): 在另一个观测点设置之前每 + 个 CPU 要跳过的内存操作次数。更加频繁的设置观测点将增加观察到竞争情况的可能性 + 。这个参数对系统整体的性能和竞争检测能力影响最显著。 + +* ``kcsan.udelay_task`` (``CONFIG_KCSAN_UDELAY_TASK``): 对于任务,观测点设置之 + 后暂停执行的微秒延迟。值越大,检测到竞争情况的可能性越高。 + +* ``kcsan.udelay_interrupt`` (``CONFIG_KCSAN_UDELAY_INTERRUPT``): 对于中断, + 观测点设置之后暂停执行的微秒延迟。中断对于延迟的要求更加严格,其延迟通常应该小 + 于为任务选择的延迟。 + +它们可以通过 ``/sys/module/kcsan/parameters/`` 在运行时进行调整。 + +数据竞争 +-------- + +在一次执行中,如果两个内存访问存在 *冲突*,在不同的线程中并发执行,并且至少 +有一个访问是 *简单访问*,则它们就形成了 *数据竞争*。如果它们访问了同一个内存地址并且 +至少有一个是写操作,则称它们存在 *冲突*。有关更详细的讨论和定义,见 +`LKMM 中的 "简单访问和数据竞争"`_。 + +.. _LKMM 中的 "简单访问和数据竞争": https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1922 + +与 Linux 内核内存一致性模型(LKMM)的关系 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +LKMM 定义了各种内存操作的传播和排序规则,让开发者可以推理并发代码。最终这允许确 +定并发代码可能的执行情况并判断这些代码是否存在数据竞争。 + +KCSAN 可以识别 *被标记的原子操作* ( ``READ_ONCE``, ``WRITE_ONCE`` , ``atomic_*`` +等),以及内存屏障所隐含的一部分顺序保证。启用 ``CONFIG_KCSAN_WEAK_MEMORY=y`` +配置,KCSAN 会对加载或存储缓冲区进行建模,并可以检测遗漏的 +``smp_mb()``, ``smp_wmb()``, ``smp_rmb()``, ``smp_store_release()``,以及所有的 +具有等效隐含内存屏障的 ``atomic_*`` 操作。 + +请注意,KCSAN 不会报告所有由于缺失内存顺序而导致的数据竞争,特别是在需要内存屏障 +来禁止后续内存操作在屏障之前重新排序的情况下。因此,开发人员应该仔细考虑那些未 +被检查的内存顺序要求。 + +数据竞争以外的竞争检测 +--------------------------- + +对于有着复杂并发设计的代码,竞争状况不总是表现为数据竞争。如果并发操作引起了意 +料之外的系统行为,则认为发生了竞争状况。另一方面,数据竞争是在 C 语言层面定义 +的。内核定义了一些宏定义用来检测非数据竞争的漏洞并发代码的属性。 + +.. note:: + 为了不引入新的文档编译警告,这里不展示宏定义的具体内容,如果想查看具体 + 宏定义可以结合原文(Documentation/dev-tools/kcsan.rst)阅读。 + +实现细节 +-------- + +KCSAN 需要观测两个并发访问。特别重要的是,我们想要(a)增加观测到竞争的机会(尤 +其是很少发生的竞争),以及(b)能够实际观测到这些竞争。我们可以通过(a)注入 +不同的延迟,以及(b)使用地址观测点(或断点)来实现。 + +如果我们在设置了地址观察点的情况下故意延迟一个内存访问,然后观察到观察点被触发 +,那么两个对同一地址的访问就发生了竞争。使用硬件观察点,这是 `DataCollider +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_ 中采用 +的方法。与 DataCollider 不同,KCSAN 不使用硬件观察点,而是依赖于编译器插桩和“软 +观测点”。 + +在 KCSAN 中,观察点是通过一种高效的编码实现的,该编码将访问类型、大小和地址存储 +在一个长整型变量中;使用“软观察点”的好处是具有可移植性和更大的灵活性。然后, +KCSAN依赖于编译器对普通访问的插桩。对于每个插桩的普通访问: + +1. 检测是否存在一个符合的观测点,如果存在,并且至少有一个操作是写操作,则我们发 + 现了一个竞争访问。 + +2. 如果不存在匹配的观察点,则定期的设置一个观测点并随机延迟一小段时间。 + +3. 在延迟前检查数据值,并在延迟后重新检查数据值;如果值不匹配,我们推测存在一个 + 未知来源的竞争状况。 + +为了检测普通访问和标记访问之间的数据竞争,KCSAN 也对标记访问进行标记,但仅用于 +检查是否存在观察点;即 KCSAN 不会在标记访问上设置观察点。通过不在标记操作上设 +置观察点,如果对一个变量的所有并发访问都被正确标记,KCSAN 将永远不会触发观察点 +,因此也不会报告这些访问。 + +弱内存建模 +~~~~~~~~~~ + +KCSAN 通过建模访问重新排序(使用 ``CONFIG_KCSAN_WEAK_MEMORY=y``)来检测由于缺少 +内存屏障而导致的数据竞争。每个设置了观察点的普通内存访问也会被选择在其函数范围 +内进行模拟重新排序(最多一个正在进行的访问)。 + +一旦某个访问被选择用于重新排序,它将在函数范围内与每个其他访问进行检查。如果遇 +到适当的内存屏障,该访问将不再被考虑进行模拟重新排序。 + +当内存操作的结果应该由屏障排序时,KCSAN 可以检测到仅由于缺失屏障而导致的冲突的 +数据竞争。考虑下面的例子:: + + int x, flag; + void T1(void) + { + x = 1; // data race! + WRITE_ONCE(flag, 1); // correct: smp_store_release(&flag, 1) + } + void T2(void) + { + while (!READ_ONCE(flag)); // correct: smp_load_acquire(&flag) + ... = x; // data race! + } + +当启用了弱内存建模,KCSAN 将考虑对 ``T1`` 中的 ``x`` 进行模拟重新排序。在写入 +``flag`` 之后,x再次被检查是否有并发访问:因为 ``T2`` 可以在写入 +``flag`` 之后继续进行,因此检测到数据竞争。如果遇到了正确的屏障, ``x`` 在正确 +释放 ``flag`` 后将不会被考虑重新排序,因此不会检测到数据竞争。 + +在复杂性上的权衡以及实际的限制意味着只能检测到一部分由于缺失内存屏障而导致的数 +据竞争。由于当前可用的编译器支持,KCSAN 的实现仅限于建模“缓冲”(延迟访问)的 +效果,因为运行时不能“预取”访问。同时要注意,观测点只设置在普通访问上,这是唯 +一一个 KCSAN 会模拟重新排序的访问类型。这意味着标记访问的重新排序不会被建模。 + +上述情况的一个后果是获取 (acquire) 操作不需要屏障插桩(不需要预取)。此外,引 +入地址或控制依赖的标记访问不需要特殊处理(标记访问不能重新排序,后续依赖的访问 +不能被预取)。 + +关键属性 +~~~~~~~~ + +1. **内存开销**:整体的内存开销只有几 MiB,取决于配置。当前的实现是使用一个小长 + 整型数组来编码观测点信息,几乎可以忽略不计。 + +2. **性能开销**:KCSAN 的运行时旨在性能开销最小化,使用一个高效的观测点编码,在 + 快速路径中不需要获取任何锁。在拥有 8 个 CPU 的系统上的内核启动来说: + + - 使用默认 KCSAN 配置时,性能下降 5 倍; + - 仅因运行时快速路径开销导致性能下降 2.8 倍(设置非常大的 + ``KCSAN_SKIP_WATCH`` 并取消设置 ``KCSAN_SKIP_WATCH_RANDOMIZE``)。 + +3. **注解开销**:KCSAN 运行时之外需要的注释很少。因此,随着内核的发展维护的开 + 销也很小。 + +4. **检测设备的竞争写入**:由于设置观测点时会检查数据值,设备的竞争写入也可以 + 被检测到。 + +5. **内存排序**:KCSAN 只了解一部分 LKMM 排序规则;这可能会导致漏报数据竞争( + 假阴性)。 + +6. **分析准确率**: 对于观察到的执行,由于使用采样策略,分析是 *不健全* 的 + (可能有假阴性),但期望得到完整的分析(没有假阳性)。 + +考虑的替代方案 +-------------- + +一个内核数据竞争检测的替代方法是 `Kernel Thread Sanitizer (KTSAN) +<https://github.com/google/kernel-sanitizers/blob/master/KTSAN.md>`_。KTSAN 是一 +个基于先行发生关系(happens-before)的数据竞争检测器,它显式建立内存操作之间的先 +后发生顺序,这可以用来确定 `数据竞争`_ 中定义的数据竞争。 + +为了建立正确的先行发生关系,KTSAN 必须了解 LKMM 的所有排序规则和同步原语。不幸 +的是,任何遗漏都会导致大量的假阳性,这在包含众多自定义同步机制的内核上下文中特 +别有害。为了跟踪前因后果关系,KTSAN 的实现需要为每个内存位置提供元数据(影子内 +存),这意味着每页内存对应 4 页影子内存,在大型系统上可能会带来数十 GiB 的开销 +。 diff --git a/Documentation/translations/zh_CN/doc-guide/checktransupdate.rst b/Documentation/translations/zh_CN/doc-guide/checktransupdate.rst new file mode 100644 index 000000000000..d20b4ce66b9f --- /dev/null +++ b/Documentation/translations/zh_CN/doc-guide/checktransupdate.rst @@ -0,0 +1,55 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/doc-guide/checktransupdate.rst + +:译者: 慕冬亮 Dongliang Mu <dzm91@hust.edu.cn> + +检查翻译更新 + +这个脚本帮助跟踪不同语言的文档翻译状态,即文档是否与对应的英文版本保持更新。 + +工作原理 +------------ + +它使用 ``git log`` 命令来跟踪翻译提交的最新英文提交(按作者日期排序)和英文文档的 +最新提交。如果有任何差异,则该文件被认为是过期的,然后需要更新的提交将被收集并报告。 + +实现的功能 + +- 检查特定语言中的所有文件 +- 检查单个文件或一组文件 +- 提供更改输出格式的选项 +- 跟踪没有翻译过的文件的翻译状态 + +用法 +----- + +:: + + ./scripts/checktransupdate.py --help + +具体用法请参考参数解析器的输出 + +示例 + +- ``./scripts/checktransupdate.py -l zh_CN`` + 这将打印 zh_CN 语言中需要更新的所有文件。 +- ``./scripts/checktransupdate.py Documentation/translations/zh_CN/dev-tools/testing-overview.rst`` + 这将只打印指定文件的状态。 + +然后输出类似如下的内容: + +:: + + Documentation/dev-tools/kfence.rst + No translation in the locale of zh_CN + + Documentation/translations/zh_CN/dev-tools/testing-overview.rst + commit 42fb9cfd5b18 ("Documentation: dev-tools: Add link to RV docs") + 1 commits needs resolving in total + +待实现的功能 + +- 文件参数可以是文件夹而不仅仅是单个文件 diff --git a/Documentation/translations/zh_CN/doc-guide/index.rst b/Documentation/translations/zh_CN/doc-guide/index.rst index 78c2e9a1697f..0ac1fc9315ea 100644 --- a/Documentation/translations/zh_CN/doc-guide/index.rst +++ b/Documentation/translations/zh_CN/doc-guide/index.rst @@ -18,6 +18,7 @@ parse-headers contributing maintainer-profile + checktransupdate .. only:: subproject and html diff --git a/Documentation/translations/zh_CN/index.rst b/Documentation/translations/zh_CN/index.rst index 20b9d4270d1f..7574e1673180 100644 --- a/Documentation/translations/zh_CN/index.rst +++ b/Documentation/translations/zh_CN/index.rst @@ -89,10 +89,10 @@ TODOList: admin-guide/index admin-guide/reporting-issues.rst userspace-api/index + 内核构建系统 <kbuild/index> TODOList: -* 内核构建系统 <kbuild/index> * 用户空间工具 <tools/index> 也可参考独立于内核文档的 `Linux 手册页 <https://www.kernel.org/doc/man-pages/>`_ 。 diff --git a/Documentation/translations/zh_CN/kbuild/gcc-plugins.rst b/Documentation/translations/zh_CN/kbuild/gcc-plugins.rst new file mode 100644 index 000000000000..67a8abbf5887 --- /dev/null +++ b/Documentation/translations/zh_CN/kbuild/gcc-plugins.rst @@ -0,0 +1,126 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/kbuild/gcc-plugins.rst +:Translator: 慕冬亮 Dongliang Mu <dzm91@hust.edu.cn> + +================ +GCC 插件基础设施 +================ + + +介绍 +==== + +GCC 插件是为编译器提供额外功能的可加载模块 [1]_。它们对于运行时插装和静态分析非常有用。 +我们可以在编译过程中通过回调 [2]_,GIMPLE [3]_,IPA [4]_ 和 RTL Passes [5]_ +(译者注:Pass 是编译器所采用的一种结构化技术,用于完成编译对象的分析、优化或转换等功能) +来分析、修改和添加更多的代码。 + +内核的 GCC 插件基础设施支持构建树外模块、交叉编译和在单独的目录中构建。插件源文件必须由 +C++ 编译器编译。 + +目前 GCC 插件基础设施只支持一些架构。搜索 "select HAVE_GCC_PLUGINS" 来查找支持 +GCC 插件的架构。 + +这个基础设施是从 grsecurity [6]_ 和 PaX [7]_ 移植过来的。 + +-- + +.. [1] https://gcc.gnu.org/onlinedocs/gccint/Plugins.html +.. [2] https://gcc.gnu.org/onlinedocs/gccint/Plugin-API.html#Plugin-API +.. [3] https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html +.. [4] https://gcc.gnu.org/onlinedocs/gccint/IPA.html +.. [5] https://gcc.gnu.org/onlinedocs/gccint/RTL.html +.. [6] https://grsecurity.net/ +.. [7] https://pax.grsecurity.net/ + + +目的 +==== + +GCC 插件的设计目的是提供一个用于试验 GCC 或 Clang 上游没有的潜在编译器功能的场所。 +一旦它们的实用性得到验证,这些功能将被添加到 GCC(和 Clang)的上游。随后,在所有 +支持的 GCC 版本都支持这些功能后,它们会被从内核中移除。 + +具体来说,新插件应该只实现上游编译器(GCC 和 Clang)不支持的功能。 + +当 Clang 中存在 GCC 中不存在的某项功能时,应努力将该功能做到 GCC 上游(而不仅仅 +是作为内核专用的 GCC 插件),以使整个生态都能从中受益。 + +类似的,如果 GCC 插件提供的功能在 Clang 中 **不** 存在,但该功能被证明是有用的,也应 +努力将该功能上传到 GCC(和 Clang)。 + +在上游 GCC 提供了某项功能后,该插件将无法在相应的 GCC 版本(以及更高版本)下编译。 +一旦所有内核支持的 GCC 版本都提供了该功能,该插件将从内核中移除。 + + +文件 +==== + +**$(src)/scripts/gcc-plugins** + + 这是 GCC 插件的目录。 + +**$(src)/scripts/gcc-plugins/gcc-common.h** + + 这是 GCC 插件的兼容性头文件。 + 应始终包含它,而不是单独的 GCC 头文件。 + +**$(src)/scripts/gcc-plugins/gcc-generate-gimple-pass.h, +$(src)/scripts/gcc-plugins/gcc-generate-ipa-pass.h, +$(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h, +$(src)/scripts/gcc-plugins/gcc-generate-rtl-pass.h** + + 这些头文件可以自动生成 GIMPLE、SIMPLE_IPA、IPA 和 RTL passes 的注册结构。 + 与手动创建结构相比,它们更受欢迎。 + + +用法 +==== + +你必须为你的 GCC 版本安装 GCC 插件头文件,以 Ubuntu 上的 gcc-10 为例:: + + apt-get install gcc-10-plugin-dev + +或者在 Fedora 上:: + + dnf install gcc-plugin-devel libmpc-devel + +或者在 Fedora 上使用包含插件的交叉编译器时:: + + dnf install libmpc-devel + +在内核配置中启用 GCC 插件基础设施与一些你想使用的插件:: + + CONFIG_GCC_PLUGINS=y + CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y + ... + +运行 gcc(本地或交叉编译器),确保能够检测到插件头文件:: + + gcc -print-file-name=plugin + CROSS_COMPILE=arm-linux-gnu- ${CROSS_COMPILE}gcc -print-file-name=plugin + +"plugin" 这个词意味着它们没有被检测到:: + + plugin + +完整的路径则表示插件已经被检测到:: + + /usr/lib/gcc/x86_64-redhat-linux/12/plugin + +编译包括插件在内的最小工具集:: + + make scripts + +或者直接在内核中运行 make,使用循环复杂性 GCC 插件编译整个内核。 + + +4. 如何添加新的 GCC 插件 +======================== + +GCC 插件位于 scripts/gcc-plugins/。你需要将插件源文件放在 scripts/gcc-plugins/ 目录下。 +子目录创建并不支持,你必须添加在 scripts/gcc-plugins/Makefile、scripts/Makefile.gcc-plugins +和相关的 Kconfig 文件中。 diff --git a/Documentation/translations/zh_CN/kbuild/headers_install.rst b/Documentation/translations/zh_CN/kbuild/headers_install.rst new file mode 100644 index 000000000000..02cb8896e555 --- /dev/null +++ b/Documentation/translations/zh_CN/kbuild/headers_install.rst @@ -0,0 +1,39 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/kbuild/headers_install.rst +:Translator: 慕冬亮 Dongliang Mu <dzm91@hust.edu.cn> + +============================ +导出内核头文件供用户空间使用 +============================ + +"make headers_install" 命令以适合于用户空间程序的形式导出内核头文件。 + +Linux 内核导出的头文件描述了用户空间程序尝试使用内核服务的 API。这些内核 +头文件被系统的 C 库(例如 glibc 和 uClibc)用于定义可用的系统调用,以及 +与这些系统调用一起使用的常量和结构。C 库的头文件包括来自 linux 子目录的 +内核头文件。系统的 libc 头文件通常被安装在默认位置 /usr/include,而内核 +头文件在该位置的子目录中(主要是 /usr/include/linux 和 /usr/include/asm)。 + +内核头文件向后兼容,但不向前兼容。这意味着使用旧内核头文件的 C 库构建的程序 +可以在新内核上运行(尽管它可能无法访问新特性),但使用新内核头文件构建的程序 +可能无法在旧内核上运行。 + +"make headers_install" 命令可以在内核源代码的顶层目录中运行(或使用标准 +的树外构建)。它接受两个可选参数:: + + make headers_install ARCH=i386 INSTALL_HDR_PATH=/usr + +ARCH 表明为其生成头文件的架构,默认为当前架构。导出内核头文件的 linux/asm +目录是基于特定平台的,要查看支持架构的完整列表,使用以下命令:: + + ls -d include/asm-* | sed 's/.*-//' + +INSTALL_HDR_PATH 表明头文件的安装位置,默认为 "./usr"。 + +该命令会在 INSTALL_HDR_PATH 中自动创建创建一个 'include' 目录,而头文件 +会被安装在 INSTALL_HDR_PATH/include 中。 + +内核头文件导出的基础设施由 David Woodhouse <dwmw2@infradead.org> 维护。 diff --git a/Documentation/translations/zh_CN/kbuild/index.rst b/Documentation/translations/zh_CN/kbuild/index.rst new file mode 100644 index 000000000000..b51655d981f6 --- /dev/null +++ b/Documentation/translations/zh_CN/kbuild/index.rst @@ -0,0 +1,35 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_CN.rst + +:Original: Documentation/kbuild/index.rst +:Translator: 慕冬亮 Dongliang Mu <dzm91@hust.edu.cn> + +============ +内核编译系统 +============ + +.. toctree:: + :maxdepth: 1 + + headers_install + gcc-plugins + +TODO: + +- kconfig-language +- kconfig-macro-language +- kbuild +- kconfig +- makefiles +- modules +- issues +- reproducible-builds +- llvm + +.. only:: subproject and html + + 目录 + ===== + + * :ref:`genindex` diff --git a/Documentation/translations/zh_CN/process/index.rst b/Documentation/translations/zh_CN/process/index.rst index 5a5cd7c01c62..3bcb3bdaf533 100644 --- a/Documentation/translations/zh_CN/process/index.rst +++ b/Documentation/translations/zh_CN/process/index.rst @@ -49,10 +49,11 @@ TODOLIST: embargoed-hardware-issues cve + security-bugs TODOLIST: -* security-bugs +* handling-regressions 其它大多数开发人员感兴趣的社区指南: diff --git a/Documentation/translations/zh_CN/admin-guide/security-bugs.rst b/Documentation/translations/zh_CN/process/security-bugs.rst index d6b8f8a4e7f6..a8f5fcbfadc9 100644 --- a/Documentation/translations/zh_CN/admin-guide/security-bugs.rst +++ b/Documentation/translations/zh_CN/process/security-bugs.rst @@ -1,3 +1,5 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + .. include:: ../disclaimer-zh_CN.rst :Original: :doc:`../../../process/security-bugs` @@ -5,6 +7,7 @@ :译者: 吴想成 Wu XiangCheng <bobwxc@email.cn> + 慕冬亮 Dongliang Mu <dzm91@hust.edu.cn> 安全缺陷 ========= @@ -17,13 +20,13 @@ Linux内核开发人员非常重视安全性。因此我们想知道何时发现 可以通过电子邮件<security@kernel.org>联系Linux内核安全团队。这是一个安全人员 的私有列表,他们将帮助验证错误报告并开发和发布修复程序。如果您已经有了一个 -修复,请将其包含在您的报告中,这样可以大大加快进程。安全团队可能会从区域维护 +修复,请将其包含在您的报告中,这样可以大大加快处理进程。安全团队可能会从区域维护 人员那里获得额外的帮助,以理解和修复安全漏洞。 与任何缺陷一样,提供的信息越多,诊断和修复就越容易。如果您不清楚哪些信息有用, 请查看“Documentation/translations/zh_CN/admin-guide/reporting-issues.rst”中 -概述的步骤。任何利用漏洞的攻击代码都非常有用,未经报告者同意不会对外发布,除 -非已经公开。 +概述的步骤。任何利用漏洞的攻击代码都非常有用,未经报告者同意不会对外发布, +除非已经公开。 请尽可能发送无附件的纯文本电子邮件。如果所有的细节都藏在附件里,那么就很难对 一个复杂的问题进行上下文引用的讨论。把它想象成一个 @@ -49,24 +52,31 @@ Linux内核开发人员非常重视安全性。因此我们想知道何时发现 换句话说,我们唯一感兴趣的是修复缺陷。提交给安全列表的所有其他资料以及对报告 的任何后续讨论,即使在解除限制之后,也将永久保密。 -协调 ------- +与其他团队协调 +-------------- + +虽然内核安全团队仅关注修复漏洞,但还有其他组织关注修复发行版上的安全问题以及协调 +操作系统厂商的漏洞披露。协调通常由 "linux-distros" 邮件列表处理,而披露则由 +公共 "oss-security" 邮件列表进行。两者紧密关联且被展示在 linux-distros 维基: +<https://oss-security.openwall.org/wiki/mailing-lists/distros> + +请注意,这三个列表的各自政策和规则是不同的,因为它们追求不同的目标。内核安全团队 +与其他团队之间的协调很困难,因为对于内核安全团队,保密期(即最大允许天数)是从补丁 +可用时开始,而 "linux-distros" 则从首次发布到列表时开始计算,无论是否存在补丁。 -对敏感缺陷(例如那些可能导致权限提升的缺陷)的修复可能需要与私有邮件列表 -<linux-distros@vs.openwall.org>进行协调,以便分发供应商做好准备,在公开披露 -上游补丁时发布一个已修复的内核。发行版将需要一些时间来测试建议的补丁,通常 -会要求至少几天的限制,而供应商更新发布更倾向于周二至周四。若合适,安全团队 -可以协助这种协调,或者报告者可以从一开始就包括linux发行版。在这种情况下,请 -记住在电子邮件主题行前面加上“[vs]”,如linux发行版wiki中所述: -<http://oss-security.openwall.org/wiki/mailing-lists/distros#how-to-use-the-lists>。 +因此,内核安全团队强烈建议,作为一位潜在安全问题的报告者,在受影响代码的维护者 +接受补丁之前,且在您阅读上述发行版维基页面并完全理解联系 "linux-distros" +邮件列表会对您和内核社区施加的要求之前,不要联系 "linux-distros" 邮件列表。 +这也意味着通常情况下不要同时抄送两个邮件列表,除非在协调时有已接受但尚未合并的补丁。 +换句话说,在补丁被接受之前,不要抄送 "linux-distros";在修复程序被合并之后, +不要抄送内核安全团队。 CVE分配 -------- -安全团队通常不分配CVE,我们也不需要它们来进行报告或修复,因为这会使过程不必 -要的复杂化,并可能耽误缺陷处理。如果报告者希望在公开披露之前分配一个CVE编号, -他们需要联系上述的私有linux-distros列表。当在提供补丁之前已有这样的CVE编号时, -如报告者愿意,最好在提交消息中提及它。 +安全团队不分配 CVE,同时我们也不需要 CVE 来报告或修复漏洞,因为这会使过程不必要 +的复杂化,并可能延误漏洞处理。如果报告者希望为确认的问题分配一个 CVE 编号, +可以联系 :doc:`内核 CVE 分配团队 <../process/cve>` 获取。 保密协议 --------- diff --git a/Documentation/translations/zh_CN/process/submitting-patches.rst b/Documentation/translations/zh_CN/process/submitting-patches.rst index 7864107e60a8..7ca16bda3709 100644 --- a/Documentation/translations/zh_CN/process/submitting-patches.rst +++ b/Documentation/translations/zh_CN/process/submitting-patches.rst @@ -208,7 +208,7 @@ torvalds@linux-foundation.org 。他收到的邮件很多,所以一般来说 如果您有修复可利用安全漏洞的补丁,请将该补丁发送到 security@kernel.org 。对于 严重的bug,可以考虑短期禁令以允许分销商(有时间)向用户发布补丁;在这种情况下, 显然不应将补丁发送到任何公共列表。 -参见 Documentation/translations/zh_CN/admin-guide/security-bugs.rst 。 +参见 Documentation/translations/zh_CN/process/security-bugs.rst 。 修复已发布内核中严重错误的补丁程序应该抄送给稳定版维护人员,方法是把以下列行 放进补丁的签准区(注意,不是电子邮件收件人):: diff --git a/Documentation/translations/zh_TW/admin-guide/reporting-issues.rst b/Documentation/translations/zh_TW/admin-guide/reporting-issues.rst index bc132b25f2ae..1d4e4c7a6750 100644 --- a/Documentation/translations/zh_TW/admin-guide/reporting-issues.rst +++ b/Documentation/translations/zh_TW/admin-guide/reporting-issues.rst @@ -301,7 +301,7 @@ Documentation/admin-guide/reporting-regressions.rst 對此進行了更詳細的 添加到迴歸跟蹤列表中,以確保它不會被忽略。 什麼是安全問題留給您自己判斷。在繼續之前,請考慮閱讀 -Documentation/translations/zh_CN/admin-guide/security-bugs.rst , +Documentation/translations/zh_CN/process/security-bugs.rst , 因爲它提供瞭如何最恰當地處理安全問題的額外細節。 當發生了完全無法接受的糟糕事情時,此問題就是一個“非常嚴重的問題”。例如, @@ -984,7 +984,7 @@ Documentation/admin-guide/reporting-regressions.rst ;它還提供了大量其 報告,請將報告的文本轉發到這些地址;但請在報告的頂部加上註釋,表明您提交了 報告,並附上工單鏈接。 -更多信息請參見 Documentation/translations/zh_CN/admin-guide/security-bugs.rst 。 +更多信息請參見 Documentation/translations/zh_CN/process/security-bugs.rst 。 發佈報告後的責任 diff --git a/Documentation/translations/zh_TW/process/submitting-patches.rst b/Documentation/translations/zh_TW/process/submitting-patches.rst index f12f2f193f85..64de92c07906 100644 --- a/Documentation/translations/zh_TW/process/submitting-patches.rst +++ b/Documentation/translations/zh_TW/process/submitting-patches.rst @@ -209,7 +209,7 @@ torvalds@linux-foundation.org 。他收到的郵件很多,所以一般來說 如果您有修復可利用安全漏洞的補丁,請將該補丁發送到 security@kernel.org 。對於 嚴重的bug,可以考慮短期禁令以允許分銷商(有時間)向用戶發佈補丁;在這種情況下, 顯然不應將補丁發送到任何公共列表。 -參見 Documentation/translations/zh_CN/admin-guide/security-bugs.rst 。 +參見 Documentation/translations/zh_CN/process/security-bugs.rst 。 修復已發佈內核中嚴重錯誤的補丁程序應該抄送給穩定版維護人員,方法是把以下列行 放進補丁的籤準區(注意,不是電子郵件收件人):: diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index e91c0376ee59..e4be1378ba26 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -78,6 +78,7 @@ Code Seq# Include File Comments 0x03 all linux/hdreg.h 0x04 D2-DC linux/umsdos_fs.h Dead since 2.6.11, but don't reuse these. 0x06 all linux/lp.h +0x07 9F-D0 linux/vmw_vmci_defs.h, uapi/linux/vm_sockets.h 0x09 all linux/raid/md_u.h 0x10 00-0F drivers/char/s390/vmcp.h 0x10 10-1F arch/s390/include/uapi/sclp_ctl.h @@ -292,6 +293,7 @@ Code Seq# Include File Comments 't' 80-8F linux/isdn_ppp.h 't' 90-91 linux/toshiba.h toshiba and toshiba_acpi SMM 'u' 00-1F linux/smb_fs.h gone +'u' 00-2F linux/ublk_cmd.h conflict! 'u' 20-3F linux/uvcvideo.h USB video class host driver 'u' 40-4f linux/udmabuf.h userspace dma-buf misc device 'v' 00-1F linux/ext2_fs.h conflict! diff --git a/Documentation/virt/kvm/index.rst b/Documentation/virt/kvm/index.rst index ad13ec55ddfe..9ca5a45c2140 100644 --- a/Documentation/virt/kvm/index.rst +++ b/Documentation/virt/kvm/index.rst @@ -14,6 +14,7 @@ KVM s390/index ppc-pv x86/index + loongarch/index locking vcpu-requests diff --git a/Documentation/virt/kvm/loongarch/hypercalls.rst b/Documentation/virt/kvm/loongarch/hypercalls.rst new file mode 100644 index 000000000000..2d6b94031f1b --- /dev/null +++ b/Documentation/virt/kvm/loongarch/hypercalls.rst @@ -0,0 +1,89 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +The LoongArch paravirtual interface +=================================== + +KVM hypercalls use the HVCL instruction with code 0x100 and the hypercall +number is put in a0. Up to five arguments may be placed in registers a1 - a5. +The return value is placed in v0 (an alias of a0). + +Source code for this interface can be found in arch/loongarch/kvm*. + +Querying for existence +====================== + +To determine if the host is running on KVM, we can utilize the cpucfg() +function at index CPUCFG_KVM_BASE (0x40000000). + +The CPUCFG_KVM_BASE range, spanning from 0x40000000 to 0x400000FF, The +CPUCFG_KVM_BASE range between 0x40000000 - 0x400000FF is marked as reserved. +Consequently, all current and future processors will not implement any +feature within this range. + +On a KVM-virtualized Linux system, a read operation on cpucfg() at index +CPUCFG_KVM_BASE (0x40000000) returns the magic string 'KVM\0'. + +Once you have determined that your host is running on a paravirtualization- +capable KVM, you may now use hypercalls as described below. + +KVM hypercall ABI +================= + +The KVM hypercall ABI is simple, with one scratch register a0 (v0) and at most +five generic registers (a1 - a5) used as input parameters. The FP (Floating- +point) and vector registers are not utilized as input registers and must +remain unmodified during a hypercall. + +Hypercall functions can be inlined as it only uses one scratch register. + +The parameters are as follows: + + ======== ================= ================ + Register IN OUT + ======== ================= ================ + a0 function number Return code + a1 1st parameter - + a2 2nd parameter - + a3 3rd parameter - + a4 4th parameter - + a5 5th parameter - + ======== ================= ================ + +The return codes may be one of the following: + + ==== ========================= + Code Meaning + ==== ========================= + 0 Success + -1 Hypercall not implemented + -2 Bad Hypercall parameter + ==== ========================= + +KVM Hypercalls Documentation +============================ + +The template for each hypercall is as follows: + +1. Hypercall name +2. Purpose + +1. KVM_HCALL_FUNC_IPI +------------------------ + +:Purpose: Send IPIs to multiple vCPUs. + +- a0: KVM_HCALL_FUNC_IPI +- a1: Lower part of the bitmap for destination physical CPUIDs +- a2: Higher part of the bitmap for destination physical CPUIDs +- a3: The lowest physical CPUID in the bitmap + +The hypercall lets a guest send multiple IPIs (Inter-Process Interrupts) with +at most 128 destinations per hypercall. The destinations are represented in a +bitmap contained in the first two input registers (a1 and a2). + +Bit 0 of a1 corresponds to the physical CPUID in the third input register (a3) +and bit 1 corresponds to the physical CPUID in a3+1, and so on. + +PV IPI on LoongArch includes both PV IPI multicast sending and PV IPI receiving, +and SWI is used for PV IPI inject since there is no VM-exits accessing SWI registers. diff --git a/Documentation/virt/kvm/loongarch/index.rst b/Documentation/virt/kvm/loongarch/index.rst new file mode 100644 index 000000000000..83387b4c5345 --- /dev/null +++ b/Documentation/virt/kvm/loongarch/index.rst @@ -0,0 +1,10 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================= +KVM for LoongArch systems +========================= + +.. toctree:: + :maxdepth: 2 + + hypercalls.rst diff --git a/Documentation/watchdog/watchdog-api.rst b/Documentation/watchdog/watchdog-api.rst index 800dcd7586f2..78e228c272cf 100644 --- a/Documentation/watchdog/watchdog-api.rst +++ b/Documentation/watchdog/watchdog-api.rst @@ -249,7 +249,7 @@ Note that not all devices support these two calls, and some only support the GETBOOTSTATUS call. Some drivers can measure the temperature using the GETTEMP ioctl. The -returned value is the temperature in degrees fahrenheit:: +returned value is the temperature in degrees Fahrenheit:: int temperature; ioctl(fd, WDIOC_GETTEMP, &temperature); |