summaryrefslogtreecommitdiff
path: root/Documentation/userspace-api
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/userspace-api')
-rw-r--r--Documentation/userspace-api/accelerators/ocxl.rst7
-rw-r--r--Documentation/userspace-api/check_exec.rst144
-rw-r--r--Documentation/userspace-api/dma-buf-heaps.rst25
-rw-r--r--Documentation/userspace-api/fwctl/fwctl-cxl.rst142
-rw-r--r--Documentation/userspace-api/fwctl/fwctl.rst286
-rw-r--r--Documentation/userspace-api/fwctl/index.rst14
-rw-r--r--Documentation/userspace-api/fwctl/pds_fwctl.rst46
-rw-r--r--Documentation/userspace-api/index.rst4
-rw-r--r--Documentation/userspace-api/ioctl/ioctl-number.rst32
-rw-r--r--Documentation/userspace-api/iommufd.rst17
-rw-r--r--Documentation/userspace-api/landlock.rst72
-rw-r--r--Documentation/userspace-api/media/drivers/uvcvideo.rst64
-rw-r--r--Documentation/userspace-api/media/rc/rc-sysfs-nodes.rst2
-rw-r--r--Documentation/userspace-api/media/v4l/meta-formats.rst1
-rw-r--r--Documentation/userspace-api/media/v4l/metafmt-c3-isp.rst86
-rw-r--r--Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst128
-rw-r--r--Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst26
-rw-r--r--Documentation/userspace-api/media/v4l/vidioc-queryctrl.rst14
-rw-r--r--Documentation/userspace-api/media/videodev2.h.rst.exceptions4
-rw-r--r--Documentation/userspace-api/mseal.rst23
-rw-r--r--Documentation/userspace-api/netlink/c-code-gen.rst4
-rw-r--r--Documentation/userspace-api/netlink/intro-specs.rst8
-rw-r--r--Documentation/userspace-api/netlink/netlink-raw.rst2
-rw-r--r--Documentation/userspace-api/ntsync.rst385
-rw-r--r--Documentation/userspace-api/perf_ring_buffer.rst4
-rw-r--r--Documentation/userspace-api/sysfs-platform_profile.rst38
26 files changed, 1524 insertions, 54 deletions
diff --git a/Documentation/userspace-api/accelerators/ocxl.rst b/Documentation/userspace-api/accelerators/ocxl.rst
index db7570d5e50d..4e213af70237 100644
--- a/Documentation/userspace-api/accelerators/ocxl.rst
+++ b/Documentation/userspace-api/accelerators/ocxl.rst
@@ -3,8 +3,11 @@ OpenCAPI (Open Coherent Accelerator Processor Interface)
========================================================
OpenCAPI is an interface between processors and accelerators. It aims
-at being low-latency and high-bandwidth. The specification is
-developed by the `OpenCAPI Consortium <http://opencapi.org/>`_.
+at being low-latency and high-bandwidth.
+
+The specification was developed by the OpenCAPI Consortium, and is now
+available from the `Compute Express Link Consortium
+<https://computeexpresslink.org/resource/opencapi-specification-archive/>`_.
It allows an accelerator (which could be an FPGA, ASICs, ...) to access
the host memory coherently, using virtual addresses. An OpenCAPI
diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
new file mode 100644
index 000000000000..05dfe3b56f71
--- /dev/null
+++ b/Documentation/userspace-api/check_exec.rst
@@ -0,0 +1,144 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. Copyright © 2024 Microsoft Corporation
+
+===================
+Executability check
+===================
+
+The ``AT_EXECVE_CHECK`` :manpage:`execveat(2)` flag, and the
+``SECBIT_EXEC_RESTRICT_FILE`` and ``SECBIT_EXEC_DENY_INTERACTIVE`` securebits
+are intended for script interpreters and dynamic linkers to enforce a
+consistent execution security policy handled by the kernel. See the
+`samples/check-exec/inc.c`_ example.
+
+Whether an interpreter should check these securebits or not depends on the
+security risk of running malicious scripts with respect to the execution
+environment, and whether the kernel can check if a script is trustworthy or
+not. For instance, Python scripts running on a server can use arbitrary
+syscalls and access arbitrary files. Such interpreters should then be
+enlighten to use these securebits and let users define their security policy.
+However, a JavaScript engine running in a web browser should already be
+sandboxed and then should not be able to harm the user's environment.
+
+Script interpreters or dynamic linkers built for tailored execution environments
+(e.g. hardened Linux distributions or hermetic container images) could use
+``AT_EXECVE_CHECK`` without checking the related securebits if backward
+compatibility is handled by something else (e.g. atomic update ensuring that
+all legitimate libraries are allowed to be executed). It is then recommended
+for script interpreters and dynamic linkers to check the securebits at run time
+by default, but also to provide the ability for custom builds to behave like if
+``SECBIT_EXEC_RESTRICT_FILE`` or ``SECBIT_EXEC_DENY_INTERACTIVE`` were always
+set to 1 (i.e. always enforce restrictions).
+
+AT_EXECVE_CHECK
+===============
+
+Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a
+check on a regular file and returns 0 if execution of this file would be
+allowed, ignoring the file format and then the related interpreter dependencies
+(e.g. ELF libraries, script's shebang).
+
+Programs should always perform this check to apply kernel-level checks against
+files that are not directly executed by the kernel but passed to a user space
+interpreter instead. All files that contain executable code, from the point of
+view of the interpreter, should be checked. However the result of this check
+should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or
+``SECBIT_EXEC_DENY_INTERACTIVE.``.
+
+The main purpose of this flag is to improve the security and consistency of an
+execution environment to ensure that direct file execution (e.g.
+``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to
+the same result. For instance, this can be used to check if a file is
+trustworthy according to the caller's environment.
+
+In a secure environment, libraries and any executable dependencies should also
+be checked. For instance, dynamic linking should make sure that all libraries
+are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``).
+For such secure execution environment to make sense, only trusted code should
+be executable, which also requires integrity guarantees.
+
+To avoid race conditions leading to time-of-check to time-of-use issues,
+``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a
+file descriptor instead of a path.
+
+SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE
+==========================================================
+
+When ``SECBIT_EXEC_RESTRICT_FILE`` is set, a process should only interpret or
+execute a file if a call to :manpage:`execveat(2)` with the related file
+descriptor and the ``AT_EXECVE_CHECK`` flag succeed.
+
+This secure bit may be set by user session managers, service managers,
+container runtimes, sandboxer tools... Except for test environments, the
+related ``SECBIT_EXEC_RESTRICT_FILE_LOCKED`` bit should also be set.
+
+Programs should only enforce consistent restrictions according to the
+securebits but without relying on any other user-controlled configuration.
+Indeed, the use case for these securebits is to only trust executable code
+vetted by the system configuration (through the kernel), so we should be
+careful to not let untrusted users control this configuration.
+
+However, script interpreters may still use user configuration such as
+environment variables as long as it is not a way to disable the securebits
+checks. For instance, the ``PATH`` and ``LD_PRELOAD`` variables can be set by
+a script's caller. Changing these variables may lead to unintended code
+executions, but only from vetted executable programs, which is OK. For this to
+make sense, the system should provide a consistent security policy to avoid
+arbitrary code execution e.g., by enforcing a write xor execute policy.
+
+When ``SECBIT_EXEC_DENY_INTERACTIVE`` is set, a process should never interpret
+interactive user commands (e.g. scripts). However, if such commands are passed
+through a file descriptor (e.g. stdin), its content should be interpreted if a
+call to :manpage:`execveat(2)` with the related file descriptor and the
+``AT_EXECVE_CHECK`` flag succeed.
+
+For instance, script interpreters called with a script snippet as argument
+should always deny such execution if ``SECBIT_EXEC_DENY_INTERACTIVE`` is set.
+
+This secure bit may be set by user session managers, service managers,
+container runtimes, sandboxer tools... Except for test environments, the
+related ``SECBIT_EXEC_DENY_INTERACTIVE_LOCKED`` bit should also be set.
+
+Here is the expected behavior for a script interpreter according to combination
+of any exec securebits:
+
+1. ``SECBIT_EXEC_RESTRICT_FILE=0`` and ``SECBIT_EXEC_DENY_INTERACTIVE=0``
+
+ Always interpret scripts, and allow arbitrary user commands (default).
+
+ No threat, everyone and everything is trusted, but we can get ahead of
+ potential issues thanks to the call to :manpage:`execveat(2)` with
+ ``AT_EXECVE_CHECK`` which should always be performed but ignored by the
+ script interpreter. Indeed, this check is still important to enable systems
+ administrators to verify requests (e.g. with audit) and prepare for
+ migration to a secure mode.
+
+2. ``SECBIT_EXEC_RESTRICT_FILE=1`` and ``SECBIT_EXEC_DENY_INTERACTIVE=0``
+
+ Deny script interpretation if they are not executable, but allow
+ arbitrary user commands.
+
+ The threat is (potential) malicious scripts run by trusted (and not fooled)
+ users. That can protect against unintended script executions (e.g. ``sh
+ /tmp/*.sh``). This makes sense for (semi-restricted) user sessions.
+
+3. ``SECBIT_EXEC_RESTRICT_FILE=0`` and ``SECBIT_EXEC_DENY_INTERACTIVE=1``
+
+ Always interpret scripts, but deny arbitrary user commands.
+
+ This use case may be useful for secure services (i.e. without interactive
+ user session) where scripts' integrity is verified (e.g. with IMA/EVM or
+ dm-verity/IPE) but where access rights might not be ready yet. Indeed,
+ arbitrary interactive commands would be much more difficult to check.
+
+4. ``SECBIT_EXEC_RESTRICT_FILE=1`` and ``SECBIT_EXEC_DENY_INTERACTIVE=1``
+
+ Deny script interpretation if they are not executable, and also deny
+ any arbitrary user commands.
+
+ The threat is malicious scripts run by untrusted users (but trusted code).
+ This makes sense for system services that may only execute trusted scripts.
+
+.. Links
+.. _samples/check-exec/inc.c:
+ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/check-exec/inc.c
diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst
new file mode 100644
index 000000000000..535f49047ce6
--- /dev/null
+++ b/Documentation/userspace-api/dma-buf-heaps.rst
@@ -0,0 +1,25 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================
+Allocating dma-buf using heaps
+==============================
+
+Dma-buf Heaps are a way for userspace to allocate dma-buf objects. They are
+typically used to allocate buffers from a specific allocation pool, or to share
+buffers across frameworks.
+
+Heaps
+=====
+
+A heap represents a specific allocator. The Linux kernel currently supports the
+following heaps:
+
+ - The ``system`` heap allocates virtually contiguous, cacheable, buffers.
+
+ - The ``cma`` heap allocates physically contiguous, cacheable,
+ buffers. Only present if a CMA region is present. Such a region is
+ usually created either through the kernel commandline through the
+ `cma` parameter, a memory region Device-Tree node with the
+ `linux,cma-default` property set, or through the `CMA_SIZE_MBYTES` or
+ `CMA_SIZE_PERCENTAGE` Kconfig options. Depending on the platform, it
+ might be called ``reserved``, ``linux,cma``, or ``default-pool``.
diff --git a/Documentation/userspace-api/fwctl/fwctl-cxl.rst b/Documentation/userspace-api/fwctl/fwctl-cxl.rst
new file mode 100644
index 000000000000..670b43b72949
--- /dev/null
+++ b/Documentation/userspace-api/fwctl/fwctl-cxl.rst
@@ -0,0 +1,142 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+fwctl cxl driver
+================
+
+:Author: Dave Jiang
+
+Overview
+========
+
+The CXL spec defines a set of commands that can be issued to the mailbox of a
+CXL device or switch. It also left room for vendor specific commands to be
+issued to the mailbox as well. fwctl provides a path to issue a set of allowed
+mailbox commands from user space to the device moderated by the kernel driver.
+
+The following 3 commands will be used to support CXL Features:
+CXL spec r3.1 8.2.9.6.1 Get Supported Features (Opcode 0500h)
+CXL spec r3.1 8.2.9.6.2 Get Feature (Opcode 0501h)
+CXL spec r3.1 8.2.9.6.3 Set Feature (Opcode 0502h)
+
+The "Get Supported Features" return data may be filtered by the kernel driver to
+drop any features that are forbidden by the kernel or being exclusively used by
+the kernel. The driver will set the "Set Feature Size" of the "Get Supported
+Features Supported Feature Entry" to 0 to indicate that the Feature cannot be
+modified. The "Get Supported Features" command and the "Get Features" falls
+under the fwctl policy of FWCTL_RPC_CONFIGURATION.
+
+For "Set Feature" command, the access policy currently is broken down into two
+categories depending on the Set Feature effects reported by the device. If the
+Set Feature will cause immediate change to the device, the fwctl access policy
+must be FWCTL_RPC_DEBUG_WRITE_FULL. The effects for this level are
+"immediate config change", "immediate data change", "immediate policy change",
+or "immediate log change" for the set effects mask. If the effects are "config
+change with cold reset" or "config change with conventional reset", then the
+fwctl access policy must be FWCTL_RPC_DEBUG_WRITE or higher.
+
+fwctl cxl User API
+==================
+
+.. kernel-doc:: include/uapi/fwctl/cxl.h
+
+1. Driver info query
+--------------------
+
+First step for the app is to issue the ioctl(FWCTL_CMD_INFO). Successful
+invocation of the ioctl implies the Features capability is operational and
+returns an all zeros 32bit payload. A ``struct fwctl_info`` needs to be filled
+out with the ``fwctl_info.out_device_type`` set to ``FWCTL_DEVICE_TYPE_CXL``.
+The return data should be ``struct fwctl_info_cxl`` that contains a reserved
+32bit field that should be all zeros.
+
+2. Send hardware commands
+-------------------------
+
+Next step is to send the 'Get Supported Features' command to the driver from
+user space via ioctl(FWCTL_RPC). A ``struct fwctl_rpc_cxl`` is pointed to
+by ``fwctl_rpc.in``. ``struct fwctl_rpc_cxl.in_payload`` points to
+the hardware input structure that is defined by the CXL spec. ``fwctl_rpc.out``
+points to the buffer that contains a ``struct fwctl_rpc_cxl_out`` that includes
+the hardware output data inlined as ``fwctl_rpc_cxl_out.payload``. This command
+is called twice. First time to retrieve the number of features supported.
+A second time to retrieve the specific feature details as the output data.
+
+After getting the specific feature details, a Get/Set Feature command can be
+appropriately programmed and sent. For a "Set Feature" command, the retrieved
+feature info contains an effects field that details the resulting
+"Set Feature" command will trigger. That will inform the user whether
+the system is configured to allowed the "Set Feature" command or not.
+
+Code example of a Get Feature
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
+ static int cxl_fwctl_rpc_get_test_feature(int fd, struct test_feature *feat_ctx,
+ const uint32_t expected_data)
+ {
+ struct cxl_mbox_get_feat_in *feat_in;
+ struct fwctl_rpc_cxl_out *out;
+ struct fwctl_rpc rpc = {0};
+ struct fwctl_rpc_cxl *in;
+ size_t out_size, in_size;
+ uint32_t val;
+ void *data;
+ int rc;
+
+ in_size = sizeof(*in) + sizeof(*feat_in);
+ rc = posix_memalign((void **)&in, 16, in_size);
+ if (rc)
+ return -ENOMEM;
+ memset(in, 0, in_size);
+ feat_in = &in->get_feat_in;
+
+ uuid_copy(feat_in->uuid, feat_ctx->uuid);
+ feat_in->count = feat_ctx->get_size;
+
+ out_size = sizeof(*out) + feat_ctx->get_size;
+ rc = posix_memalign((void **)&out, 16, out_size);
+ if (rc)
+ goto free_in;
+ memset(out, 0, out_size);
+
+ in->opcode = CXL_MBOX_OPCODE_GET_FEATURE;
+ in->op_size = sizeof(*feat_in);
+
+ rpc.size = sizeof(rpc);
+ rpc.scope = FWCTL_RPC_CONFIGURATION;
+ rpc.in_len = in_size;
+ rpc.out_len = out_size;
+ rpc.in = (uint64_t)(uint64_t *)in;
+ rpc.out = (uint64_t)(uint64_t *)out;
+
+ rc = send_command(fd, &rpc, out);
+ if (rc)
+ goto free_all;
+
+ data = out->payload;
+ val = le32toh(*(__le32 *)data);
+ if (memcmp(&val, &expected_data, sizeof(val)) != 0) {
+ rc = -ENXIO;
+ goto free_all;
+ }
+
+ free_all:
+ free(out);
+ free_in:
+ free(in);
+ return rc;
+ }
+
+Take a look at CXL CLI test directory
+<https://github.com/pmem/ndctl/tree/main/test/fwctl.c> for a detailed user code
+for examples on how to exercise this path.
+
+
+fwctl cxl Kernel API
+====================
+
+.. kernel-doc:: drivers/cxl/core/features.c
+ :export:
+.. kernel-doc:: include/cxl/features.h
diff --git a/Documentation/userspace-api/fwctl/fwctl.rst b/Documentation/userspace-api/fwctl/fwctl.rst
new file mode 100644
index 000000000000..fdcfe418a83f
--- /dev/null
+++ b/Documentation/userspace-api/fwctl/fwctl.rst
@@ -0,0 +1,286 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+fwctl subsystem
+===============
+
+:Author: Jason Gunthorpe
+
+Overview
+========
+
+Modern devices contain extensive amounts of FW, and in many cases, are largely
+software-defined pieces of hardware. The evolution of this approach is largely a
+reaction to Moore's Law where a chip tape out is now highly expensive, and the
+chip design is extremely large. Replacing fixed HW logic with a flexible and
+tightly coupled FW/HW combination is an effective risk mitigation against chip
+respin. Problems in the HW design can be counteracted in device FW. This is
+especially true for devices which present a stable and backwards compatible
+interface to the operating system driver (such as NVMe).
+
+The FW layer in devices has grown to incredible size and devices frequently
+integrate clusters of fast processors to run it. For example, mlx5 devices have
+over 30MB of FW code, and big configurations operate with over 1GB of FW managed
+runtime state.
+
+The availability of such a flexible layer has created quite a variety in the
+industry where single pieces of silicon are now configurable software-defined
+devices and can operate in substantially different ways depending on the need.
+Further, we often see cases where specific sites wish to operate devices in ways
+that are highly specialized and require applications that have been tailored to
+their unique configuration.
+
+Further, devices have become multi-functional and integrated to the point they
+no longer fit neatly into the kernel's division of subsystems. Modern
+multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
+subsystems while sharing the underlying hardware using the auxiliary device
+system.
+
+All together this creates a challenge for the operating system, where devices
+have an expansive FW environment that needs robust device-specific debugging
+support, and FW-driven functionality that is not well suited to “generic”
+interfaces. fwctl seeks to allow access to the full device functionality from
+user space in the areas of debuggability, management, and first-boot/nth-boot
+provisioning.
+
+fwctl is aimed at the common device design pattern where the OS and FW
+communicate via an RPC message layer constructed with a queue or mailbox scheme.
+In this case the driver will typically have some layer to deliver RPC messages
+and collect RPC responses from device FW. The in-kernel subsystem drivers that
+operate the device for its primary purposes will use these RPCs to build their
+drivers, but devices also usually have a set of ancillary RPCs that don't really
+fit into any specific subsystem. For example, a HW RAID controller is primarily
+operated by the block layer but also comes with a set of RPCs to administer the
+construction of drives within the HW RAID.
+
+In the past when devices were more single function, individual subsystems would
+grow different approaches to solving some of these common problems. For instance
+monitoring device health, manipulating its FLASH, debugging the FW,
+provisioning, all have various unique interfaces across the kernel.
+
+fwctl's purpose is to define a common set of limited rules, described below,
+that allow user space to securely construct and execute RPCs inside device FW.
+The rules serve as an agreement between the operating system and FW on how to
+correctly design the RPC interface. As a uAPI the subsystem provides a thin
+layer of discovery and a generic uAPI to deliver the RPCs and collect the
+response. It supports a system of user space libraries and tools which will
+use this interface to control the device using the device native protocols.
+
+Scope of Action
+---------------
+
+fwctl drivers are strictly restricted to being a way to operate the device FW.
+It is not an avenue to access random kernel internals, or other operating system
+SW states.
+
+fwctl instances must operate on a well-defined device function, and the device
+should have a well-defined security model for what scope within the physical
+device the function is permitted to access. For instance, the most complex PCIe
+device today may broadly have several function-level scopes:
+
+ 1. A privileged function with full access to the on-device global state and
+ configuration
+
+ 2. Multiple hypervisor functions with control over itself and child functions
+ used with VMs
+
+ 3. Multiple VM functions tightly scoped within the VM
+
+The device may create a logical parent/child relationship between these scopes.
+For instance a child VM's FW may be within the scope of the hypervisor FW. It is
+quite common in the VFIO world that the hypervisor environment has a complex
+provisioning/profiling/configuration responsibility for the function VFIO
+assigns to the VM.
+
+Further, within the function, devices often have RPC commands that fall within
+some general scopes of action (see enum fwctl_rpc_scope):
+
+ 1. Access to function & child configuration, FLASH, etc. that becomes live at a
+ function reset. Access to function & child runtime configuration that is
+ transparent or non-disruptive to any driver or VM.
+
+ 2. Read-only access to function debug information that may report on FW objects
+ in the function & child, including FW objects owned by other kernel
+ subsystems.
+
+ 3. Write access to function & child debug information strictly compatible with
+ the principles of kernel lockdown and kernel integrity protection. Triggers
+ a kernel Taint.
+
+ 4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.
+
+User space will provide a scope label on each RPC and the kernel must enforce the
+above CAPs and taints based on that scope. A combination of kernel and FW can
+enforce that RPCs are placed in the correct scope by user space.
+
+Denied behavior
+---------------
+
+There are many things this interface must not allow user space to do (without a
+Taint or CAP), broadly derived from the principles of kernel lockdown. Some
+examples:
+
+ 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
+ untrusted code, or otherwise compromise device or system security and
+ integrity.
+
+ 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
+ objects owned by kernel drivers.
+
+ 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
+ driver can react to the device configuration at function reset/driver load
+ time, but otherwise must not be coupled to fwctl.
+
+ 4. Operate the HW in a way that overlaps with the core purpose of another
+ primary kernel subsystem, such as read/write to LBAs, send/receive of
+ network packets, or operate an accelerator's data plane.
+
+fwctl is not a replacement for device direct access subsystems like uacce or
+VFIO.
+
+Operations exposed through fwctl's non-taining interfaces should be fully
+sharable with other users of the device. For instance exposing a RPC through
+fwctl should never prevent a kernel subsystem from also concurrently using that
+same RPC or hardware unit down the road. In such cases fwctl will be less
+important than proper kernel subsystems that eventually emerge. Mistakes in this
+area resulting in clashes will be resolved in favour of a kernel implementation.
+
+fwctl User API
+==============
+
+.. kernel-doc:: include/uapi/fwctl/fwctl.h
+.. kernel-doc:: include/uapi/fwctl/mlx5.h
+.. kernel-doc:: include/uapi/fwctl/pds.h
+
+sysfs Class
+-----------
+
+fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
+(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
+operates the iotcl uAPI described above.
+
+fwctl devices can be related to driver components in other subsystems through
+sysfs::
+
+ $ ls /sys/class/fwctl/fwctl0/device/infiniband/
+ ibp0s10f0
+
+ $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
+ fwctl0/
+
+ $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
+ dev device power subsystem uevent
+
+User space Community
+--------------------
+
+Drawing inspiration from nvme-cli, participating in the kernel side must come
+with a user space in a common TBD git tree, at a minimum to usefully operate the
+kernel driver. Providing such an implementation is a pre-condition to merging a
+kernel driver.
+
+The goal is to build user space community around some of the shared problems
+we all have, and ideally develop some common user space programs with some
+starting themes of:
+
+ - Device in-field debugging
+
+ - HW provisioning
+
+ - VFIO child device profiling before VM boot
+
+ - Confidential Compute topics (attestation, secure provisioning)
+
+that stretch across all subsystems in the kernel. fwupd is a great example of
+how an excellent user space experience can emerge out of kernel-side diversity.
+
+fwctl Kernel API
+================
+
+.. kernel-doc:: drivers/fwctl/main.c
+ :export:
+.. kernel-doc:: include/linux/fwctl.h
+
+fwctl Driver design
+-------------------
+
+In many cases a fwctl driver is going to be part of a larger cross-subsystem
+device possibly using the auxiliary_device mechanism. In that case several
+subsystems are going to be sharing the same device and FW interface layer so the
+device design must already provide for isolation and cooperation between kernel
+subsystems. fwctl should fit into that same model.
+
+Part of the driver should include a description of how its scope restrictions
+and security model work. The driver and FW together must ensure that RPCs
+provided by user space are mapped to the appropriate scope. If the validation is
+done in the driver then the validation can read a 'command effects' report from
+the device, or hardwire the enforcement. If the validation is done in the FW,
+then the driver should pass the fwctl_rpc_scope to the FW along with the command.
+
+The driver and FW must cooperate to ensure that either fwctl cannot allocate
+any FW resources, or any resources it does allocate are freed on FD closure. A
+driver primarily constructed around FW RPCs may find that its core PCI function
+and RPC layer belongs under fwctl with auxiliary devices connecting to other
+subsystems.
+
+Each device type must be mindful of Linux's philosophy for stable ABI. The FW
+RPC interface does not have to meet a strictly stable ABI, but it does need to
+meet an expectation that userspace tools that are deployed and in significant
+use don't needlessly break. FW upgrade and kernel upgrade should keep widely
+deployed tooling working.
+
+Development and debugging focused RPCs under more permissive scopes can have
+less stabilitiy if the tools using them are only run under exceptional
+circumstances and not for every day use of the device. Debugging tools may even
+require exact version matching as they may require something similar to DWARF
+debug information from the FW binary.
+
+Security Response
+=================
+
+The kernel remains the gatekeeper for this interface. If violations of the
+scopes, security or isolation principles are found, we have options to let
+devices fix them with a FW update, push a kernel patch to parse and block RPC
+commands or push a kernel patch to block entire firmware versions/devices.
+
+While the kernel can always directly parse and restrict RPCs, it is expected
+that the existing kernel pattern of allowing drivers to delegate validation to
+FW to be a useful design.
+
+Existing Similar Examples
+=========================
+
+The approach described in this document is not a new idea. Direct, or near
+direct device access has been offered by the kernel in different areas for
+decades. With more devices wanting to follow this design pattern it is becoming
+clear that it is not entirely well understood and, more importantly, the
+security considerations are not well defined or agreed upon.
+
+Some examples:
+
+ - HW RAID controllers. This includes RPCs to do things like compose drives into
+ a RAID volume, configure RAID parameters, monitor the HW and more.
+
+ - Baseboard managers. RPCs for configuring settings in the device and more
+
+ - NVMe vendor command capsules. nvme-cli provides access to some monitoring
+ functions that different products have defined, but more exist.
+
+ - CXL also has a NVMe-like vendor command system.
+
+ - DRM allows user space drivers to send commands to the device via kernel
+ mediation
+
+ - RDMA allows user space drivers to directly push commands to the device
+ without kernel involvement
+
+ - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.
+
+The first 4 are examples of areas that fwctl intends to cover. The latter three
+are examples of denied behavior as they fully overlap with the primary purpose
+of a kernel subsystem.
+
+Some key lessons learned from these past efforts are the importance of having a
+common user space project to use as a pre-condition for obtaining a kernel
+driver. Developing good community around useful software in user space is key to
+getting companies to fund participation to enable their products.
diff --git a/Documentation/userspace-api/fwctl/index.rst b/Documentation/userspace-api/fwctl/index.rst
new file mode 100644
index 000000000000..316ac456ad3b
--- /dev/null
+++ b/Documentation/userspace-api/fwctl/index.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Firmware Control (FWCTL) Userspace API
+======================================
+
+A framework that define a common set of limited rules that allows user space
+to securely construct and execute RPCs inside device firmware.
+
+.. toctree::
+ :maxdepth: 1
+
+ fwctl
+ fwctl-cxl
+ pds_fwctl
diff --git a/Documentation/userspace-api/fwctl/pds_fwctl.rst b/Documentation/userspace-api/fwctl/pds_fwctl.rst
new file mode 100644
index 000000000000..b5a31f82c883
--- /dev/null
+++ b/Documentation/userspace-api/fwctl/pds_fwctl.rst
@@ -0,0 +1,46 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+fwctl pds driver
+================
+
+:Author: Shannon Nelson
+
+Overview
+========
+
+The PDS Core device makes a fwctl service available through an
+auxiliary_device named pds_core.fwctl.N. The pds_fwctl driver binds to
+this device and registers itself with the fwctl subsystem. The resulting
+userspace interface is used by an application that is a part of the
+AMD Pensando software package for the Distributed Service Card (DSC).
+
+The pds_fwctl driver has little knowledge of the firmware's internals.
+It only knows how to send commands through pds_core's message queue to the
+firmware for fwctl requests. The set of fwctl operations available
+depends on the firmware in the DSC, and the userspace application
+version must match the firmware so that they can talk to each other.
+
+When a connection is created the pds_fwctl driver requests from the
+firmware a list of firmware object endpoints, and for each endpoint the
+driver requests a list of operations for that endpoint.
+
+Each operation description includes a firmware defined command attribute
+that maps to the FWCTL scope levels. The driver translates those firmware
+values into the FWCTL scope values which can then be used for filtering the
+scoped user requests.
+
+pds_fwctl User API
+==================
+
+Each RPC request includes the target endpoint and the operation id, and in
+and out buffer lengths and pointers. The driver verifies the existence
+of the requested endpoint and operations, then checks the request scope
+against the required scope of the operation. The request is then put
+together with the request data and sent through pds_core's message queue
+to the firmware, and the results are returned to the caller.
+
+The RPC endpoints, operations, and buffer contents are defined by the
+particular firmware package in the device, which varies across the
+available product configurations. The details are available in the
+specific product SDK documentation.
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 274cc7546efc..b8c73be4fb11 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -35,6 +35,7 @@ Security-related interfaces
mfd_noexec
spec_ctrl
tee
+ check_exec
Devices and I/O
===============
@@ -43,7 +44,9 @@ Devices and I/O
:maxdepth: 1
accelerators/ocxl
+ dma-buf-heaps
dma-buf-alloc-exchange
+ fwctl/index
gpio/index
iommufd
media/index
@@ -63,6 +66,7 @@ Everything else
vduse
futex2
perf_ring_buffer
+ ntsync
.. only:: subproject and html
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 243f1f1b554a..bc91756bde73 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -28,10 +28,10 @@ or number from the table below. Because of the large number of drivers,
many drivers share a partial letter with other drivers.
If you are writing a driver for a new device and need a letter, pick an
-unused block with enough room for expansion: 32 to 256 ioctl commands.
-You can register the block by patching this file and submitting the
-patch to Linus Torvalds. Or you can e-mail me at <mec@shout.net> and
-I'll register one for you.
+unused block with enough room for expansion: 32 to 256 ioctl commands
+should suffice. You can register the block by patching this file and
+submitting the patch through :doc:`usual patch submission process
+</process/submitting-patches>`.
The second argument to _IO, _IOW, _IOR, or _IOWR is a sequence number
to distinguish ioctls from each other. The third argument to _IOW,
@@ -62,9 +62,8 @@ Following this convention is good because:
(5) When following the convention, the driver code can use generic
code to copy the parameters between user and kernel space.
-This table lists ioctls visible from user land for Linux/x86. It contains
-most drivers up to 2.6.31, but I know I am missing some. There has been
-no attempt to list non-X86 architectures or ioctls from drivers/staging/.
+This table lists ioctls visible from userland, excluding ones from
+drivers/staging/.
==== ===== ======================================================= ================================================================
Code Seq# Include File Comments
@@ -85,6 +84,8 @@ Code Seq# Include File Comments
0x10 20-2F arch/s390/include/uapi/asm/hypfs.h
0x12 all linux/fs.h BLK* ioctls
linux/blkpg.h
+ linux/blkzoned.h
+ linux/blk-crypto.h
0x15 all linux/fs.h FS_IOC_* ioctls
0x1b all InfiniBand Subsystem
<http://infiniband.sourceforge.net/>
@@ -283,6 +284,7 @@ Code Seq# Include File Comments
'p' 80-9F linux/ppdev.h user-space parport
<mailto:tim@cyberelk.net>
'p' A1-A5 linux/pps.h LinuxPPS
+'p' B1-B3 linux/pps_gen.h LinuxPPS
<mailto:giometti@linux.it>
'q' 00-1F linux/serio.h
'q' 80-FF linux/telephony.h Internet PhoneJACK, Internet LineJACK
@@ -311,6 +313,7 @@ Code Seq# Include File Comments
<mailto:oe@port.de>
'z' 10-4F drivers/s390/crypto/zcrypt_api.h conflict!
'|' 00-7F linux/media.h
+'|' 80-9F samples/ Any sample and example drivers
0x80 00-1F linux/fb.h
0x81 00-1F linux/vduse.h
0x89 00-06 arch/x86/include/asm/sockios.h
@@ -329,6 +332,7 @@ Code Seq# Include File Comments
0x97 00-7F fs/ceph/ioctl.h Ceph file system
0x99 00-0F 537-Addinboard driver
<mailto:buk@buks.ipn.de>
+0x9A 00-0F include/uapi/fwctl/fwctl.h
0xA0 all linux/sdp/sdp.h Industrial Device Project
<mailto:kenji@bitgate.com>
0xA1 0 linux/vtpm_proxy.h TPM Emulator Proxy Driver
@@ -361,6 +365,12 @@ Code Seq# Include File Comments
<mailto:linuxppc-dev>
0xB2 01-02 arch/powerpc/include/uapi/asm/papr-sysparm.h powerpc/pseries system parameter API
<mailto:linuxppc-dev>
+0xB2 03-05 arch/powerpc/include/uapi/asm/papr-indices.h powerpc/pseries indices API
+ <mailto:linuxppc-dev>
+0xB2 06-07 arch/powerpc/include/uapi/asm/papr-platform-dump.h powerpc/pseries Platform Dump API
+ <mailto:linuxppc-dev>
+0xB2 08 powerpc/include/uapi/asm/papr-physical-attestation.h powerpc/pseries Physical Attestation API
+ <mailto:linuxppc-dev>
0xB3 00 linux/mmc/ioctl.h
0xB4 00-0F linux/gpio.h <mailto:linux-gpio@vger.kernel.org>
0xB5 00-0F uapi/linux/rpmsg.h <mailto:linux-remoteproc@vger.kernel.org>
@@ -368,10 +378,12 @@ Code Seq# Include File Comments
0xB7 all uapi/linux/remoteproc_cdev.h <mailto:linux-remoteproc@vger.kernel.org>
0xB7 all uapi/linux/nsfs.h <mailto:Andrei Vagin <avagin@openvz.org>>
0xB8 01-02 uapi/misc/mrvl_cn10k_dpi.h Marvell CN10K DPI driver
+0xB8 all uapi/linux/mshv.h Microsoft Hyper-V /dev/mshv driver
+ <mailto:linux-hyperv@vger.kernel.org>
0xC0 00-0F linux/usb/iowarrior.h
-0xCA 00-0F uapi/misc/cxl.h
+0xCA 00-0F uapi/misc/cxl.h Dead since 6.15
0xCA 10-2F uapi/misc/ocxl.h
-0xCA 80-BF uapi/scsi/cxlflash_ioctl.h
+0xCA 80-BF uapi/scsi/cxlflash_ioctl.h Dead since 6.15
0xCB 00-1F CBM serial IEC bus in development:
<mailto:michael.klein@puffin.lb.shuttle.de>
0xCC 00-0F drivers/misc/ibmvmc.h pseries VMC driver
@@ -390,6 +402,8 @@ Code Seq# Include File Comments
<mailto:mathieu.desnoyers@efficios.com>
0xF8 all arch/x86/include/uapi/asm/amd_hsmp.h AMD HSMP EPYC system management interface driver
<mailto:nchatrad@amd.com>
+0xF9 00-0F uapi/misc/amd-apml.h AMD side band system management interface driver
+ <mailto:naveenkrishna.chatradhi@amd.com>
0xFD all linux/dm-ioctl.h
0xFE all linux/isst_if.h
==== ===== ======================================================= ================================================================
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst
index 70289d6815d2..b0df15865dec 100644
--- a/Documentation/userspace-api/iommufd.rst
+++ b/Documentation/userspace-api/iommufd.rst
@@ -63,6 +63,13 @@ Following IOMMUFD objects are exposed to userspace:
space usually has mappings from guest-level I/O virtual addresses to guest-
level physical addresses.
+- IOMMUFD_FAULT, representing a software queue for an HWPT reporting IO page
+ faults using the IOMMU HW's PRI (Page Request Interface). This queue object
+ provides user space an FD to poll the page fault events and also to respond
+ to those events. A FAULT object must be created first to get a fault_id that
+ could be then used to allocate a fault-enabled HWPT via the IOMMU_HWPT_ALLOC
+ command by setting the IOMMU_HWPT_FAULT_ID_VALID bit in its flags field.
+
- IOMMUFD_OBJ_VIOMMU, representing a slice of the physical IOMMU instance,
passed to or shared with a VM. It may be some HW-accelerated virtualization
features and some SW resources used by the VM. For examples:
@@ -109,6 +116,14 @@ Following IOMMUFD objects are exposed to userspace:
vIOMMU, which is a separate ioctl call from attaching the same device to an
HWPT_PAGING that the vIOMMU holds.
+- IOMMUFD_OBJ_VEVENTQ, representing a software queue for a vIOMMU to report its
+ events such as translation faults occurred to a nested stage-1 (excluding I/O
+ page faults that should go through IOMMUFD_OBJ_FAULT) and HW-specific events.
+ This queue object provides user space an FD to poll/read the vIOMMU events. A
+ vIOMMU object must be created first to get its viommu_id, which could be then
+ used to allocate a vEVENTQ. Each vIOMMU can support multiple types of vEVENTS,
+ but is confined to one vEVENTQ per vEVENTQ type.
+
All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
The diagrams below show relationships between user-visible objects and kernel
@@ -251,8 +266,10 @@ User visible objects are backed by following datastructures:
- iommufd_device for IOMMUFD_OBJ_DEVICE.
- iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING.
- iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED.
+- iommufd_fault for IOMMUFD_OBJ_FAULT.
- iommufd_viommu for IOMMUFD_OBJ_VIOMMU.
- iommufd_vdevice for IOMMUFD_OBJ_VDEVICE.
+- iommufd_veventq for IOMMUFD_OBJ_VEVENTQ.
Several terminologies when looking at these datastructures:
diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
index d639c61cb472..1d0c2c15c22e 100644
--- a/Documentation/userspace-api/landlock.rst
+++ b/Documentation/userspace-api/landlock.rst
@@ -8,7 +8,7 @@ Landlock: unprivileged access control
=====================================
:Author: Mickaël Salaün
-:Date: October 2024
+:Date: March 2025
The goal of Landlock is to enable restriction of ambient rights (e.g. global
filesystem or network access) for a set of processes. Because Landlock
@@ -317,33 +317,32 @@ IPC scoping
-----------
Similar to the implicit `Ptrace restrictions`_, we may want to further restrict
-interactions between sandboxes. Each Landlock domain can be explicitly scoped
-for a set of actions by specifying it on a ruleset. For example, if a
-sandboxed process should not be able to :manpage:`connect(2)` to a
-non-sandboxed process through abstract :manpage:`unix(7)` sockets, we can
-specify such a restriction with ``LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET``.
-Moreover, if a sandboxed process should not be able to send a signal to a
-non-sandboxed process, we can specify this restriction with
-``LANDLOCK_SCOPE_SIGNAL``.
-
-A sandboxed process can connect to a non-sandboxed process when its domain is
-not scoped. If a process's domain is scoped, it can only connect to sockets
-created by processes in the same scope.
-Moreover, If a process is scoped to send signal to a non-scoped process, it can
-only send signals to processes in the same scope.
-
-A connected datagram socket behaves like a stream socket when its domain is
-scoped, meaning if the domain is scoped after the socket is connected , it can
-still :manpage:`send(2)` data just like a stream socket. However, in the same
-scenario, a non-connected datagram socket cannot send data (with
-:manpage:`sendto(2)`) outside its scope.
-
-A process with a scoped domain can inherit a socket created by a non-scoped
-process. The process cannot connect to this socket since it has a scoped
-domain.
-
-IPC scoping does not support exceptions, so if a domain is scoped, no rules can
-be added to allow access to resources or processes outside of the scope.
+interactions between sandboxes. Therefore, at ruleset creation time, each
+Landlock domain can restrict the scope for certain operations, so that these
+operations can only reach out to processes within the same Landlock domain or in
+a nested Landlock domain (the "scope").
+
+The operations which can be scoped are:
+
+``LANDLOCK_SCOPE_SIGNAL``
+ This limits the sending of signals to target processes which run within the
+ same or a nested Landlock domain.
+
+``LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET``
+ This limits the set of abstract :manpage:`unix(7)` sockets to which we can
+ :manpage:`connect(2)` to socket addresses which were created by a process in
+ the same or a nested Landlock domain.
+
+ A :manpage:`sendto(2)` on a non-connected datagram socket is treated as if
+ it were doing an implicit :manpage:`connect(2)` and will be blocked if the
+ remote end does not stem from the same or a nested Landlock domain.
+
+ A :manpage:`sendto(2)` on a socket which was previously connected will not
+ be restricted. This works for both datagram and stream sockets.
+
+IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`.
+If an operation is scoped within a domain, no rules can be added to allow access
+to resources or processes outside of the scope.
Truncating files
----------------
@@ -595,6 +594,16 @@ Starting with the Landlock ABI version 6, it is possible to restrict
:manpage:`signal(7)` sending by setting ``LANDLOCK_SCOPE_SIGNAL`` to the
``scoped`` ruleset attribute.
+Logging (ABI < 7)
+-----------------
+
+Starting with the Landlock ABI version 7, it is possible to control logging of
+Landlock audit events with the ``LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF``,
+``LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON``, and
+``LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF`` flags passed to
+sys_landlock_restrict_self(). See Documentation/admin-guide/LSM/landlock.rst
+for more details on audit.
+
.. _kernel_support:
Kernel support
@@ -683,9 +692,16 @@ fine-grained restrictions). Moreover, their complexity can lead to security
issues, especially when untrusted processes can manipulate them (cf.
`Controlling access to user namespaces <https://lwn.net/Articles/673597/>`_).
+How to disable Landlock audit records?
+--------------------------------------
+
+You might want to put in place filters as explained here:
+Documentation/admin-guide/LSM/landlock.rst
+
Additional documentation
========================
+* Documentation/admin-guide/LSM/landlock.rst
* Documentation/security/landlock.rst
* https://landlock.io
diff --git a/Documentation/userspace-api/media/drivers/uvcvideo.rst b/Documentation/userspace-api/media/drivers/uvcvideo.rst
index a290f9fadae9..dbb30ad389ae 100644
--- a/Documentation/userspace-api/media/drivers/uvcvideo.rst
+++ b/Documentation/userspace-api/media/drivers/uvcvideo.rst
@@ -181,6 +181,7 @@ Argument: struct uvc_xu_control_mapping
UVC_CTRL_DATA_TYPE_BOOLEAN Boolean
UVC_CTRL_DATA_TYPE_ENUM Enumeration
UVC_CTRL_DATA_TYPE_BITMASK Bitmask
+ UVC_CTRL_DATA_TYPE_RECT Rectangular area
UVCIOC_CTRL_QUERY - Query a UVC XU control
@@ -255,3 +256,66 @@ Argument: struct uvc_xu_control_query
__u8 query Request code to send to the device
__u16 size Control data size (in bytes)
__u8 *data Control value
+
+
+Driver-specific V4L2 controls
+-----------------------------
+
+The uvcvideo driver implements the following UVC-specific controls:
+
+``V4L2_CID_UVC_REGION_OF_INTEREST_RECT (struct)``
+ This control determines the region of interest (ROI). ROI is a
+ rectangular area represented by a struct :c:type:`v4l2_rect`. The
+ rectangle is in global sensor coordinates using pixel units. It is
+ independent of the field of view, not impacted by any cropping or
+ scaling.
+
+ Use ``V4L2_CTRL_WHICH_MIN_VAL`` and ``V4L2_CTRL_WHICH_MAX_VAL`` to query
+ the range of rectangle sizes.
+
+ Setting a ROI allows the camera to optimize the capture for the region.
+ The value of ``V4L2_CID_REGION_OF_INTEREST_AUTO`` control determines
+ the detailed behavior.
+
+ An example of use of this control, can be found in the:
+ `Chrome OS USB camera HAL.
+ <https://chromium.googlesource.com/chromiumos/platform2/+/refs/heads/release-R121-15699.B/camera/hal/usb/>`
+
+
+``V4L2_CID_UVC_REGION_OF_INTEREST_AUTO (bitmask)``
+ This determines which, if any, on-board features should track to the
+ Region of Interest specified by the current value of
+ ``V4L2_CID_UVD__REGION_OF_INTEREST_RECT``.
+
+ Max value is a mask indicating all supported Auto Controls.
+
+.. flat-table::
+ :header-rows: 0
+ :stub-columns: 0
+
+ * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_EXPOSURE``
+ - Setting this bit causes automatic exposure to track the region of
+ interest instead of the whole image.
+ * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_IRIS``
+ - Setting this bit causes automatic iris to track the region of interest
+ instead of the whole image.
+ * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_WHITE_BALANCE``
+ - Setting this bit causes automatic white balance to track the region
+ of interest instead of the whole image.
+ * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_FOCUS``
+ - Setting this bit causes automatic focus adjustment to track the region
+ of interest instead of the whole image.
+ * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_FACE_DETECT``
+ - Setting this bit causes automatic face detection to track the region of
+ interest instead of the whole image.
+ * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_DETECT_AND_TRACK``
+ - Setting this bit enables automatic face detection and tracking. The
+ current value of ``V4L2_CID_REGION_OF_INTEREST_RECT`` may be updated by
+ the driver.
+ * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_IMAGE_STABILIZATION``
+ - Setting this bit enables automatic image stabilization. The
+ current value of ``V4L2_CID_REGION_OF_INTEREST_RECT`` may be updated by
+ the driver.
+ * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_HIGHER_QUALITY``
+ - Setting this bit enables automatically capture the specified region
+ with higher quality if possible.
diff --git a/Documentation/userspace-api/media/rc/rc-sysfs-nodes.rst b/Documentation/userspace-api/media/rc/rc-sysfs-nodes.rst
index 34d6a0a1f4d3..70b5966aaff8 100644
--- a/Documentation/userspace-api/media/rc/rc-sysfs-nodes.rst
+++ b/Documentation/userspace-api/media/rc/rc-sysfs-nodes.rst
@@ -6,7 +6,7 @@
Remote Controller's sysfs nodes
*******************************
-As defined at ``Documentation/ABI/testing/sysfs-class-rc``, those are
+As defined at Documentation/ABI/testing/sysfs-class-rc, those are
the sysfs nodes that control the Remote Controllers:
diff --git a/Documentation/userspace-api/media/v4l/meta-formats.rst b/Documentation/userspace-api/media/v4l/meta-formats.rst
index 86ffb3bc8ade..bb6876cfc271 100644
--- a/Documentation/userspace-api/media/v4l/meta-formats.rst
+++ b/Documentation/userspace-api/media/v4l/meta-formats.rst
@@ -12,6 +12,7 @@ These formats are used for the :ref:`metadata` interface only.
.. toctree::
:maxdepth: 1
+ metafmt-c3-isp
metafmt-d4xx
metafmt-generic
metafmt-intel-ipu3
diff --git a/Documentation/userspace-api/media/v4l/metafmt-c3-isp.rst b/Documentation/userspace-api/media/v4l/metafmt-c3-isp.rst
new file mode 100644
index 000000000000..449b45c2ec24
--- /dev/null
+++ b/Documentation/userspace-api/media/v4l/metafmt-c3-isp.rst
@@ -0,0 +1,86 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR MIT)
+
+.. _v4l2-meta-fmt-c3isp-stats:
+.. _v4l2-meta-fmt-c3isp-params:
+
+***********************************************************************
+V4L2_META_FMT_C3ISP_STATS ('C3ST'), V4L2_META_FMT_C3ISP_PARAMS ('C3PM')
+***********************************************************************
+
+.. c3_isp_stats_info
+
+3A Statistics
+=============
+
+The C3 ISP can collect different statistics over an input Bayer frame.
+Those statistics are obtained from the "c3-isp-stats" metadata capture video nodes,
+using the :c:type:`v4l2_meta_format` interface.
+They are formatted as described by the :c:type:`c3_isp_stats_info` structure.
+
+The statistics collected are Auto-white balance,
+Auto-exposure and Auto-focus information.
+
+.. c3_isp_params_cfg
+
+Configuration Parameters
+========================
+
+The configuration parameters are passed to the c3-isp-params metadata output video node,
+using the :c:type:`v4l2_meta_format` interface. Rather than a single struct containing
+sub-structs for each configurable area of the ISP, parameters for the C3-ISP
+are defined as distinct structs or "blocks" which may be added to the data
+member of :c:type:`c3_isp_params_cfg`. Userspace is responsible for
+populating the data member with the blocks that need to be configured by the driver, but
+need not populate it with **all** the blocks, or indeed with any at all if there
+are no configuration changes to make. Populated blocks **must** be consecutive
+in the buffer. To assist both userspace and the driver in identifying the
+blocks each block-specific struct embeds
+:c:type:`c3_isp_params_block_header` as its first member and userspace
+must populate the type member with a value from
+:c:type:`c3_isp_params_block_type`. Once the blocks have been populated
+into the data buffer, the combined size of all populated blocks shall be set in
+the data_size member of :c:type:`c3_isp_params_cfg`. For example:
+
+.. code-block:: c
+
+ struct c3_isp_params_cfg *params =
+ (struct c3_isp_params_cfg *)buffer;
+
+ params->version = C3_ISP_PARAM_BUFFER_V0;
+ params->data_size = 0;
+
+ void *data = (void *)params->data;
+
+ struct c3_isp_params_awb_gains *gains =
+ (struct c3_isp_params_awb_gains *)data;
+
+ gains->header.type = C3_ISP_PARAMS_BLOCK_AWB_GAINS;
+ gains->header.flags = C3_ISP_PARAMS_BLOCK_FL_ENABLE;
+ gains->header.size = sizeof(struct c3_isp_params_awb_gains);
+
+ gains->gr_gain = 256;
+ gains->r_gain = 256;
+ gains->b_gain = 256;
+ gains->gb_gain = 256;
+
+ data += sizeof(struct c3_isp__params_awb_gains);
+ params->data_size += sizeof(struct c3_isp_params_awb_gains);
+
+ struct c3_isp_params_awb_config *awb_cfg =
+ (struct c3_isp_params_awb_config *)data;
+
+ awb_cfg->header.type = C3_ISP_PARAMS_BLOCK_AWB_CONFIG;
+ awb_cfg->header.flags = C3_ISP_PARAMS_BLOCK_FL_ENABLE;
+ awb_cfg->header.size = sizeof(struct c3_isp_params_awb_config);
+
+ awb_cfg->tap_point = C3_ISP_AWB_STATS_TAP_BEFORE_WB;
+ awb_cfg->satur = 1;
+ awb_cfg->horiz_zones_num = 32;
+ awb_cfg->vert_zones_num = 24;
+
+ params->data_size += sizeof(struct c3_isp_params_awb_config);
+
+Amlogic C3 ISP uAPI data types
+===============================
+
+.. kernel-doc:: include/uapi/linux/media/amlogic/c3-isp-config.h
diff --git a/Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst b/Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst
index b788f6933855..6e4f399f1f88 100644
--- a/Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst
+++ b/Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst
@@ -137,6 +137,13 @@ All components are stored with the same number of bits per component.
- Cb, Cr
- No
- Linear
+ * - V4L2_PIX_FMT_NV15
+ - 'NV15'
+ - 10
+ - 4:2:0
+ - Cb, Cr
+ - Yes
+ - Linear
* - V4L2_PIX_FMT_NV15_4L4
- 'VT15'
- 15
@@ -186,6 +193,13 @@ All components are stored with the same number of bits per component.
- Cr, Cb
- No
- Linear
+ * - V4L2_PIX_FMT_NV20
+ - 'NV20'
+ - 10
+ - 4:2:2
+ - Cb, Cr
+ - Yes
+ - Linear
* - V4L2_PIX_FMT_NV24
- 'NV24'
- 8
@@ -302,6 +316,57 @@ of the luma plane.
- Cr\ :sub:`11`
+.. _V4L2-PIX-FMT-NV15:
+
+NV15
+----
+
+Semi-planar 10-bit YUV 4:2:0 format similar to NV12, using 10-bit components
+with no padding between each component. A group of 4 components are stored over
+5 bytes in little endian order.
+
+.. flat-table:: Sample 4x4 NV15 Image (1 byte per cell)
+ :header-rows: 0
+ :stub-columns: 0
+
+ * - start + 0:
+ - Y'\ :sub:`00[7:0]`
+ - Y'\ :sub:`01[5:0]`\ Y'\ :sub:`00[9:8]`
+ - Y'\ :sub:`02[3:0]`\ Y'\ :sub:`01[9:6]`
+ - Y'\ :sub:`03[1:0]`\ Y'\ :sub:`02[9:4]`
+ - Y'\ :sub:`03[9:2]`
+ * - start + 5:
+ - Y'\ :sub:`10[7:0]`
+ - Y'\ :sub:`11[5:0]`\ Y'\ :sub:`10[9:8]`
+ - Y'\ :sub:`12[3:0]`\ Y'\ :sub:`11[9:6]`
+ - Y'\ :sub:`13[1:0]`\ Y'\ :sub:`12[9:4]`
+ - Y'\ :sub:`13[9:2]`
+ * - start + 10:
+ - Y'\ :sub:`20[7:0]`
+ - Y'\ :sub:`21[5:0]`\ Y'\ :sub:`20[9:8]`
+ - Y'\ :sub:`22[3:0]`\ Y'\ :sub:`21[9:6]`
+ - Y'\ :sub:`23[1:0]`\ Y'\ :sub:`22[9:4]`
+ - Y'\ :sub:`23[9:2]`
+ * - start + 15:
+ - Y'\ :sub:`30[7:0]`
+ - Y'\ :sub:`31[5:0]`\ Y'\ :sub:`30[9:8]`
+ - Y'\ :sub:`32[3:0]`\ Y'\ :sub:`31[9:6]`
+ - Y'\ :sub:`33[1:0]`\ Y'\ :sub:`32[9:4]`
+ - Y'\ :sub:`33[9:2]`
+ * - start + 20:
+ - Cb\ :sub:`00[7:0]`
+ - Cr\ :sub:`00[5:0]`\ Cb\ :sub:`00[9:8]`
+ - Cb\ :sub:`01[3:0]`\ Cr\ :sub:`00[9:6]`
+ - Cr\ :sub:`01[1:0]`\ Cb\ :sub:`01[9:4]`
+ - Cr\ :sub:`01[9:2]`
+ * - start + 25:
+ - Cb\ :sub:`10[7:0]`
+ - Cr\ :sub:`10[5:0]`\ Cb\ :sub:`10[9:8]`
+ - Cb\ :sub:`11[3:0]`\ Cr\ :sub:`10[9:6]`
+ - Cr\ :sub:`11[1:0]`\ Cb\ :sub:`11[9:4]`
+ - Cr\ :sub:`11[9:2]`
+
+
.. _V4L2-PIX-FMT-NV12MT:
.. _V4L2-PIX-FMT-NV12MT-16X16:
.. _V4L2-PIX-FMT-NV12-4L4:
@@ -631,6 +696,69 @@ number of lines as the luma plane.
- Cr\ :sub:`32`
+.. _V4L2-PIX-FMT-NV20:
+
+NV20
+----
+
+Semi-planar 10-bit YUV 4:2:2 format similar to NV16, using 10-bit components
+with no padding between each component. A group of 4 components are stored over
+5 bytes in little endian order.
+
+.. flat-table:: Sample 4x4 NV20 Image (1 byte per cell)
+ :header-rows: 0
+ :stub-columns: 0
+
+ * - start + 0:
+ - Y'\ :sub:`00[7:0]`
+ - Y'\ :sub:`01[5:0]`\ Y'\ :sub:`00[9:8]`
+ - Y'\ :sub:`02[3:0]`\ Y'\ :sub:`01[9:6]`
+ - Y'\ :sub:`03[1:0]`\ Y'\ :sub:`02[9:4]`
+ - Y'\ :sub:`03[9:2]`
+ * - start + 5:
+ - Y'\ :sub:`10[7:0]`
+ - Y'\ :sub:`11[5:0]`\ Y'\ :sub:`10[9:8]`
+ - Y'\ :sub:`12[3:0]`\ Y'\ :sub:`11[9:6]`
+ - Y'\ :sub:`13[1:0]`\ Y'\ :sub:`12[9:4]`
+ - Y'\ :sub:`13[9:2]`
+ * - start + 10:
+ - Y'\ :sub:`20[7:0]`
+ - Y'\ :sub:`21[5:0]`\ Y'\ :sub:`20[9:8]`
+ - Y'\ :sub:`22[3:0]`\ Y'\ :sub:`21[9:6]`
+ - Y'\ :sub:`23[1:0]`\ Y'\ :sub:`22[9:4]`
+ - Y'\ :sub:`23[9:2]`
+ * - start + 15:
+ - Y'\ :sub:`30[7:0]`
+ - Y'\ :sub:`31[5:0]`\ Y'\ :sub:`30[9:8]`
+ - Y'\ :sub:`32[3:0]`\ Y'\ :sub:`31[9:6]`
+ - Y'\ :sub:`33[1:0]`\ Y'\ :sub:`32[9:4]`
+ - Y'\ :sub:`33[9:2]`
+ * - start + 20:
+ - Cb\ :sub:`00[7:0]`
+ - Cr\ :sub:`00[5:0]`\ Cb\ :sub:`00[9:8]`
+ - Cb\ :sub:`01[3:0]`\ Cr\ :sub:`00[9:6]`
+ - Cr\ :sub:`01[1:0]`\ Cb\ :sub:`01[9:4]`
+ - Cr\ :sub:`01[9:2]`
+ * - start + 25:
+ - Cb\ :sub:`10[7:0]`
+ - Cr\ :sub:`10[5:0]`\ Cb\ :sub:`10[9:8]`
+ - Cb\ :sub:`11[3:0]`\ Cr\ :sub:`10[9:6]`
+ - Cr\ :sub:`11[1:0]`\ Cb\ :sub:`11[9:4]`
+ - Cr\ :sub:`11[9:2]`
+ * - start + 30:
+ - Cb\ :sub:`20[7:0]`
+ - Cr\ :sub:`20[5:0]`\ Cb\ :sub:`20[9:8]`
+ - Cb\ :sub:`21[3:0]`\ Cr\ :sub:`20[9:6]`
+ - Cr\ :sub:`21[1:0]`\ Cb\ :sub:`21[9:4]`
+ - Cr\ :sub:`21[9:2]`
+ * - start + 35:
+ - Cb\ :sub:`30[7:0]`
+ - Cr\ :sub:`30[5:0]`\ Cb\ :sub:`30[9:8]`
+ - Cb\ :sub:`31[3:0]`\ Cr\ :sub:`30[9:6]`
+ - Cr\ :sub:`31[1:0]`\ Cb\ :sub:`31[9:4]`
+ - Cr\ :sub:`31[9:2]`
+
+
.. _V4L2-PIX-FMT-NV24:
.. _V4L2-PIX-FMT-NV42:
diff --git a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst
index 4d56c0528ad7..b8698b85bd80 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst
@@ -199,6 +199,10 @@ still cause this situation.
- ``p_area``
- A pointer to a struct :c:type:`v4l2_area`. Valid if this control is
of type ``V4L2_CTRL_TYPE_AREA``.
+ * - struct :c:type:`v4l2_rect` *
+ - ``p_rect``
+ - A pointer to a struct :c:type:`v4l2_rect`. Valid if this control is
+ of type ``V4L2_CTRL_TYPE_RECT``.
* - struct :c:type:`v4l2_ctrl_h264_sps` *
- ``p_h264_sps``
- A pointer to a struct :c:type:`v4l2_ctrl_h264_sps`. Valid if this control is
@@ -334,14 +338,26 @@ still cause this situation.
- Which value of the control to get/set/try.
* - :cspan:`2` ``V4L2_CTRL_WHICH_CUR_VAL`` will return the current value of
the control, ``V4L2_CTRL_WHICH_DEF_VAL`` will return the default
- value of the control and ``V4L2_CTRL_WHICH_REQUEST_VAL`` indicates that
- these controls have to be retrieved from a request or tried/set for
- a request. In the latter case the ``request_fd`` field contains the
+ value of the control, ``V4L2_CTRL_WHICH_MIN_VAL`` will return the minimum
+ value of the control, and ``V4L2_CTRL_WHICH_MAX_VAL`` will return the maximum
+ value of the control. ``V4L2_CTRL_WHICH_REQUEST_VAL`` indicates that
+ the control value has to be retrieved from a request or tried/set for
+ a request. In that case the ``request_fd`` field contains the
file descriptor of the request that should be used. If the device
does not support requests, then ``EACCES`` will be returned.
- When using ``V4L2_CTRL_WHICH_DEF_VAL`` be aware that you can only
- get the default value of the control, you cannot set or try it.
+ When using ``V4L2_CTRL_WHICH_DEF_VAL``, ``V4L2_CTRL_WHICH_MIN_VAL``
+ or ``V4L2_CTRL_WHICH_MAX_VAL`` be aware that you can only get the
+ default/minimum/maximum value of the control, you cannot set or try it.
+
+ Whether a control supports querying the minimum and maximum values using
+ ``V4L2_CTRL_WHICH_MIN_VAL`` and ``V4L2_CTRL_WHICH_MAX_VAL`` is indicated
+ by the ``V4L2_CTRL_FLAG_HAS_WHICH_MIN_MAX`` flag. Most non-compound
+ control types support this. For controls with compound types, the
+ definition of minimum/maximum values are provided by
+ the control documentation. If a compound control does not document the
+ meaning of minimum/maximum value, then querying the minimum or maximum
+ value will result in the error code -EINVAL.
For backwards compatibility you can also use a control class here
(see :ref:`ctrl-class`). In that case all controls have to
diff --git a/Documentation/userspace-api/media/v4l/vidioc-queryctrl.rst b/Documentation/userspace-api/media/v4l/vidioc-queryctrl.rst
index 4d38acafe8e1..3549417c7feb 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-queryctrl.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-queryctrl.rst
@@ -441,6 +441,16 @@ See also the examples in :ref:`control`.
- n/a
- A struct :c:type:`v4l2_area`, containing the width and the height
of a rectangular area. Units depend on the use case.
+ * - ``V4L2_CTRL_TYPE_RECT``
+ - n/a
+ - n/a
+ - n/a
+ - A struct :c:type:`v4l2_rect`, containing a rectangle described by
+ the position of its top-left corner, the width and the height. Units
+ depend on the use case. Support for ``V4L2_CTRL_WHICH_MIN_VAL`` and
+ ``V4L2_CTRL_WHICH_MAX_VAL`` is optional and depends on the
+ ``V4L2_CTRL_FLAG_HAS_WHICH_MIN_MAX`` flag. See the documentation of
+ the specific control on how to interpret the minimum and maximum values.
* - ``V4L2_CTRL_TYPE_H264_SPS``
- n/a
- n/a
@@ -657,6 +667,10 @@ See also the examples in :ref:`control`.
``dims[0]``. So setting the control with a differently sized
array will change the ``elems`` field when the control is
queried afterwards.
+ * - ``V4L2_CTRL_FLAG_HAS_WHICH_MIN_MAX``
+ - 0x1000
+ - This control supports getting minimum and maximum values using
+ vidioc_g_ext_ctrls with V4L2_CTRL_WHICH_MIN/MAX_VAL.
Return Value
============
diff --git a/Documentation/userspace-api/media/videodev2.h.rst.exceptions b/Documentation/userspace-api/media/videodev2.h.rst.exceptions
index 429b5cdf05c3..35d3456cc812 100644
--- a/Documentation/userspace-api/media/videodev2.h.rst.exceptions
+++ b/Documentation/userspace-api/media/videodev2.h.rst.exceptions
@@ -150,6 +150,7 @@ replace symbol V4L2_CTRL_TYPE_HEVC_SPS :c:type:`v4l2_ctrl_type`
replace symbol V4L2_CTRL_TYPE_HEVC_PPS :c:type:`v4l2_ctrl_type`
replace symbol V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS :c:type:`v4l2_ctrl_type`
replace symbol V4L2_CTRL_TYPE_AREA :c:type:`v4l2_ctrl_type`
+replace symbol V4L2_CTRL_TYPE_RECT :c:type:`v4l2_ctrl_type`
replace symbol V4L2_CTRL_TYPE_FWHT_PARAMS :c:type:`v4l2_ctrl_type`
replace symbol V4L2_CTRL_TYPE_VP8_FRAME :c:type:`v4l2_ctrl_type`
replace symbol V4L2_CTRL_TYPE_VP9_COMPRESSED_HDR :c:type:`v4l2_ctrl_type`
@@ -395,6 +396,7 @@ replace define V4L2_CTRL_FLAG_HAS_PAYLOAD control-flags
replace define V4L2_CTRL_FLAG_EXECUTE_ON_WRITE control-flags
replace define V4L2_CTRL_FLAG_MODIFY_LAYOUT control-flags
replace define V4L2_CTRL_FLAG_DYNAMIC_ARRAY control-flags
+replace define V4L2_CTRL_FLAG_HAS_WHICH_MIN_MAX control-flags
replace define V4L2_CTRL_FLAG_NEXT_CTRL control
replace define V4L2_CTRL_FLAG_NEXT_COMPOUND control
@@ -569,6 +571,8 @@ ignore define V4L2_CTRL_DRIVER_PRIV
ignore define V4L2_CTRL_MAX_DIMS
ignore define V4L2_CTRL_WHICH_CUR_VAL
ignore define V4L2_CTRL_WHICH_DEF_VAL
+ignore define V4L2_CTRL_WHICH_MIN_VAL
+ignore define V4L2_CTRL_WHICH_MAX_VAL
ignore define V4L2_CTRL_WHICH_REQUEST_VAL
ignore define V4L2_OUT_CAP_CUSTOM_TIMINGS
ignore define V4L2_CID_MAX_CTRLS
diff --git a/Documentation/userspace-api/mseal.rst b/Documentation/userspace-api/mseal.rst
index 41102f74c5e2..ea9b11a0bd89 100644
--- a/Documentation/userspace-api/mseal.rst
+++ b/Documentation/userspace-api/mseal.rst
@@ -27,7 +27,7 @@ SYSCALL
=======
mseal syscall signature
-----------------------
- ``int mseal(void \* addr, size_t len, unsigned long flags)``
+ ``int mseal(void *addr, size_t len, unsigned long flags)``
**addr**/**len**: virtual memory address range.
The address range set by **addr**/**len** must meet:
@@ -130,6 +130,27 @@ Use cases
- Chrome browser: protect some security sensitive data structures.
+- System mappings:
+ The system mappings are created by the kernel and includes vdso, vvar,
+ vvar_vclock, vectors (arm compat-mode), sigpage (arm compat-mode), uprobes.
+
+ Those system mappings are readonly only or execute only, memory sealing can
+ protect them from ever changing to writable or unmmap/remapped as different
+ attributes. This is useful to mitigate memory corruption issues where a
+ corrupted pointer is passed to a memory management system.
+
+ If supported by an architecture (CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS),
+ the CONFIG_MSEAL_SYSTEM_MAPPINGS seals all system mappings of this
+ architecture.
+
+ The following architectures currently support this feature: x86-64, arm64,
+ loongarch and s390.
+
+ WARNING: This feature breaks programs which rely on relocating
+ or unmapping system mappings. Known broken software at the time
+ of writing includes CHECKPOINT_RESTORE, UML, gVisor, rr. Therefore
+ this config can't be enabled universally.
+
When not to use mseal
=====================
Applications can apply sealing to any virtual memory region from userspace,
diff --git a/Documentation/userspace-api/netlink/c-code-gen.rst b/Documentation/userspace-api/netlink/c-code-gen.rst
index 89de42c13350..46415e6d646d 100644
--- a/Documentation/userspace-api/netlink/c-code-gen.rst
+++ b/Documentation/userspace-api/netlink/c-code-gen.rst
@@ -56,7 +56,9 @@ If ``name-prefix`` is specified it replaces the ``$family-$enum``
portion of the entry name.
Boolean ``render-max`` controls creation of the max values
-(which are enabled by default for attribute enums).
+(which are enabled by default for attribute enums). These max
+values are named ``__$pfx-MAX`` and ``$pfx-MAX``. The name
+of the first value can be overridden via ``enum-cnt-name`` property.
Attributes
==========
diff --git a/Documentation/userspace-api/netlink/intro-specs.rst b/Documentation/userspace-api/netlink/intro-specs.rst
index bada89699455..a4435ae4628d 100644
--- a/Documentation/userspace-api/netlink/intro-specs.rst
+++ b/Documentation/userspace-api/netlink/intro-specs.rst
@@ -15,7 +15,7 @@ developing Netlink related code. The tool is implemented in Python
and can use a YAML specification to issue Netlink requests
to the kernel. Only Generic Netlink is supported.
-The tool is located at ``tools/net/ynl/cli.py``. It accepts
+The tool is located at ``tools/net/ynl/pyynl/cli.py``. It accepts
a handul of arguments, the most important ones are:
- ``--spec`` - point to the spec file
@@ -27,7 +27,7 @@ YAML specs can be found under ``Documentation/netlink/specs/``.
Example use::
- $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/ethtool.yaml \
+ $ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/ethtool.yaml \
--do rings-get \
--json '{"header":{"dev-index": 18}}'
{'header': {'dev-index': 18, 'dev-name': 'eni1np1'},
@@ -75,7 +75,7 @@ the two marker lines like above to a file, add that file to git,
and run the regeneration tool. Grep the tree for ``YNL-GEN``
to see other examples.
-The code generation itself is performed by ``tools/net/ynl/ynl-gen-c.py``
+The code generation itself is performed by ``tools/net/ynl/pyynl/ynl_gen_c.py``
but it takes a few arguments so calling it directly for each file
quickly becomes tedious.
@@ -84,7 +84,7 @@ YNL lib
``tools/net/ynl/lib/`` contains an implementation of a C library
(based on libmnl) which integrates with code generated by
-``tools/net/ynl/ynl-gen-c.py`` to create easy to use netlink wrappers.
+``tools/net/ynl/pyynl/ynl_gen_c.py`` to create easy to use netlink wrappers.
YNL basics
----------
diff --git a/Documentation/userspace-api/netlink/netlink-raw.rst b/Documentation/userspace-api/netlink/netlink-raw.rst
index 1990eea772d0..31fc91020eb3 100644
--- a/Documentation/userspace-api/netlink/netlink-raw.rst
+++ b/Documentation/userspace-api/netlink/netlink-raw.rst
@@ -62,7 +62,7 @@ Sub-messages
------------
Several raw netlink families such as
-:doc:`rt_link<../../networking/netlink_spec/rt_link>` and
+:doc:`rt-link<../../networking/netlink_spec/rt-link>` and
:doc:`tc<../../networking/netlink_spec/tc>` use attribute nesting as an
abstraction to carry module specific information.
diff --git a/Documentation/userspace-api/ntsync.rst b/Documentation/userspace-api/ntsync.rst
new file mode 100644
index 000000000000..25e7c4aef968
--- /dev/null
+++ b/Documentation/userspace-api/ntsync.rst
@@ -0,0 +1,385 @@
+===================================
+NT synchronization primitive driver
+===================================
+
+This page documents the user-space API for the ntsync driver.
+
+ntsync is a support driver for emulation of NT synchronization
+primitives by user-space NT emulators. It exists because implementation
+in user-space, using existing tools, cannot match Windows performance
+while offering accurate semantics. It is implemented entirely in
+software, and does not drive any hardware device.
+
+This interface is meant as a compatibility tool only, and should not
+be used for general synchronization. Instead use generic, versatile
+interfaces such as futex(2) and poll(2).
+
+Synchronization primitives
+==========================
+
+The ntsync driver exposes three types of synchronization primitives:
+semaphores, mutexes, and events.
+
+A semaphore holds a single volatile 32-bit counter, and a static 32-bit
+integer denoting the maximum value. It is considered signaled (that is,
+can be acquired without contention, or will wake up a waiting thread)
+when the counter is nonzero. The counter is decremented by one when a
+wait is satisfied. Both the initial and maximum count are established
+when the semaphore is created.
+
+A mutex holds a volatile 32-bit recursion count, and a volatile 32-bit
+identifier denoting its owner. A mutex is considered signaled when its
+owner is zero (indicating that it is not owned). The recursion count is
+incremented when a wait is satisfied, and ownership is set to the given
+identifier.
+
+A mutex also holds an internal flag denoting whether its previous owner
+has died; such a mutex is said to be abandoned. Owner death is not
+tracked automatically based on thread death, but rather must be
+communicated using ``NTSYNC_IOC_MUTEX_KILL``. An abandoned mutex is
+inherently considered unowned.
+
+Except for the "unowned" semantics of zero, the actual value of the
+owner identifier is not interpreted by the ntsync driver at all. The
+intended use is to store a thread identifier; however, the ntsync
+driver does not actually validate that a calling thread provides
+consistent or unique identifiers.
+
+An event is similar to a semaphore with a maximum count of one. It holds
+a volatile boolean state denoting whether it is signaled or not. There
+are two types of events, auto-reset and manual-reset. An auto-reset
+event is designaled when a wait is satisfied; a manual-reset event is
+not. The event type is specified when the event is created.
+
+Unless specified otherwise, all operations on an object are atomic and
+totally ordered with respect to other operations on the same object.
+
+Objects are represented by files. When all file descriptors to an
+object are closed, that object is deleted.
+
+Char device
+===========
+
+The ntsync driver creates a single char device /dev/ntsync. Each file
+description opened on the device represents a unique instance intended
+to back an individual NT virtual machine. Objects created by one ntsync
+instance may only be used with other objects created by the same
+instance.
+
+ioctl reference
+===============
+
+All operations on the device are done through ioctls. There are four
+structures used in ioctl calls::
+
+ struct ntsync_sem_args {
+ __u32 count;
+ __u32 max;
+ };
+
+ struct ntsync_mutex_args {
+ __u32 owner;
+ __u32 count;
+ };
+
+ struct ntsync_event_args {
+ __u32 signaled;
+ __u32 manual;
+ };
+
+ struct ntsync_wait_args {
+ __u64 timeout;
+ __u64 objs;
+ __u32 count;
+ __u32 owner;
+ __u32 index;
+ __u32 alert;
+ __u32 flags;
+ __u32 pad;
+ };
+
+Depending on the ioctl, members of the structure may be used as input,
+output, or not at all.
+
+The ioctls on the device file are as follows:
+
+.. c:macro:: NTSYNC_IOC_CREATE_SEM
+
+ Create a semaphore object. Takes a pointer to struct
+ :c:type:`ntsync_sem_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``count``
+ - Initial count of the semaphore.
+ * - ``max``
+ - Maximum count of the semaphore.
+
+ Fails with ``EINVAL`` if ``count`` is greater than ``max``.
+ On success, returns a file descriptor the created semaphore.
+
+.. c:macro:: NTSYNC_IOC_CREATE_MUTEX
+
+ Create a mutex object. Takes a pointer to struct
+ :c:type:`ntsync_mutex_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``count``
+ - Initial recursion count of the mutex.
+ * - ``owner``
+ - Initial owner of the mutex.
+
+ If ``owner`` is nonzero and ``count`` is zero, or if ``owner`` is
+ zero and ``count`` is nonzero, the function fails with ``EINVAL``.
+ On success, returns a file descriptor the created mutex.
+
+.. c:macro:: NTSYNC_IOC_CREATE_EVENT
+
+ Create an event object. Takes a pointer to struct
+ :c:type:`ntsync_event_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``signaled``
+ - If nonzero, the event is initially signaled, otherwise
+ nonsignaled.
+ * - ``manual``
+ - If nonzero, the event is a manual-reset event, otherwise
+ auto-reset.
+
+ On success, returns a file descriptor the created event.
+
+The ioctls on the individual objects are as follows:
+
+.. c:macro:: NTSYNC_IOC_SEM_POST
+
+ Post to a semaphore object. Takes a pointer to a 32-bit integer,
+ which on input holds the count to be added to the semaphore, and on
+ output contains its previous count.
+
+ If adding to the semaphore's current count would raise the latter
+ past the semaphore's maximum count, the ioctl fails with
+ ``EOVERFLOW`` and the semaphore is not affected. If raising the
+ semaphore's count causes it to become signaled, eligible threads
+ waiting on this semaphore will be woken and the semaphore's count
+ decremented appropriately.
+
+.. c:macro:: NTSYNC_IOC_MUTEX_UNLOCK
+
+ Release a mutex object. Takes a pointer to struct
+ :c:type:`ntsync_mutex_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``owner``
+ - Specifies the owner trying to release this mutex.
+ * - ``count``
+ - On output, contains the previous recursion count.
+
+ If ``owner`` is zero, the ioctl fails with ``EINVAL``. If ``owner``
+ is not the current owner of the mutex, the ioctl fails with
+ ``EPERM``.
+
+ The mutex's count will be decremented by one. If decrementing the
+ mutex's count causes it to become zero, the mutex is marked as
+ unowned and signaled, and eligible threads waiting on it will be
+ woken as appropriate.
+
+.. c:macro:: NTSYNC_IOC_SET_EVENT
+
+ Signal an event object. Takes a pointer to a 32-bit integer, which on
+ output contains the previous state of the event.
+
+ Eligible threads will be woken, and auto-reset events will be
+ designaled appropriately.
+
+.. c:macro:: NTSYNC_IOC_RESET_EVENT
+
+ Designal an event object. Takes a pointer to a 32-bit integer, which
+ on output contains the previous state of the event.
+
+.. c:macro:: NTSYNC_IOC_PULSE_EVENT
+
+ Wake threads waiting on an event object while leaving it in an
+ unsignaled state. Takes a pointer to a 32-bit integer, which on
+ output contains the previous state of the event.
+
+ A pulse operation can be thought of as a set followed by a reset,
+ performed as a single atomic operation. If two threads are waiting on
+ an auto-reset event which is pulsed, only one will be woken. If two
+ threads are waiting a manual-reset event which is pulsed, both will
+ be woken. However, in both cases, the event will be unsignaled
+ afterwards, and a simultaneous read operation will always report the
+ event as unsignaled.
+
+.. c:macro:: NTSYNC_IOC_READ_SEM
+
+ Read the current state of a semaphore object. Takes a pointer to
+ struct :c:type:`ntsync_sem_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``count``
+ - On output, contains the current count of the semaphore.
+ * - ``max``
+ - On output, contains the maximum count of the semaphore.
+
+.. c:macro:: NTSYNC_IOC_READ_MUTEX
+
+ Read the current state of a mutex object. Takes a pointer to struct
+ :c:type:`ntsync_mutex_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``owner``
+ - On output, contains the current owner of the mutex, or zero
+ if the mutex is not currently owned.
+ * - ``count``
+ - On output, contains the current recursion count of the mutex.
+
+ If the mutex is marked as abandoned, the function fails with
+ ``EOWNERDEAD``. In this case, ``count`` and ``owner`` are set to
+ zero.
+
+.. c:macro:: NTSYNC_IOC_READ_EVENT
+
+ Read the current state of an event object. Takes a pointer to struct
+ :c:type:`ntsync_event_args`, which is used as follows:
+
+ .. list-table::
+
+ * - ``signaled``
+ - On output, contains the current state of the event.
+ * - ``manual``
+ - On output, contains 1 if the event is a manual-reset event,
+ and 0 otherwise.
+
+.. c:macro:: NTSYNC_IOC_KILL_OWNER
+
+ Mark a mutex as unowned and abandoned if it is owned by the given
+ owner. Takes an input-only pointer to a 32-bit integer denoting the
+ owner. If the owner is zero, the ioctl fails with ``EINVAL``. If the
+ owner does not own the mutex, the function fails with ``EPERM``.
+
+ Eligible threads waiting on the mutex will be woken as appropriate
+ (and such waits will fail with ``EOWNERDEAD``, as described below).
+
+.. c:macro:: NTSYNC_IOC_WAIT_ANY
+
+ Poll on any of a list of objects, atomically acquiring at most one.
+ Takes a pointer to struct :c:type:`ntsync_wait_args`, which is
+ used as follows:
+
+ .. list-table::
+
+ * - ``timeout``
+ - Absolute timeout in nanoseconds. If ``NTSYNC_WAIT_REALTIME``
+ is set, the timeout is measured against the REALTIME clock;
+ otherwise it is measured against the MONOTONIC clock. If the
+ timeout is equal to or earlier than the current time, the
+ function returns immediately without sleeping. If ``timeout``
+ is U64_MAX, the function will sleep until an object is
+ signaled, and will not fail with ``ETIMEDOUT``.
+ * - ``objs``
+ - Pointer to an array of ``count`` file descriptors
+ (specified as an integer so that the structure has the same
+ size regardless of architecture). If any object is
+ invalid, the function fails with ``EINVAL``.
+ * - ``count``
+ - Number of objects specified in the ``objs`` array.
+ If greater than ``NTSYNC_MAX_WAIT_COUNT``, the function fails
+ with ``EINVAL``.
+ * - ``owner``
+ - Mutex owner identifier. If any object in ``objs`` is a mutex,
+ the ioctl will attempt to acquire that mutex on behalf of
+ ``owner``. If ``owner`` is zero, the ioctl fails with
+ ``EINVAL``.
+ * - ``index``
+ - On success, contains the index (into ``objs``) of the object
+ which was signaled. If ``alert`` was signaled instead,
+ this contains ``count``.
+ * - ``alert``
+ - Optional event object file descriptor. If nonzero, this
+ specifies an "alert" event object which, if signaled, will
+ terminate the wait. If nonzero, the identifier must point to a
+ valid event.
+ * - ``flags``
+ - Zero or more flags. Currently the only flag is
+ ``NTSYNC_WAIT_REALTIME``, which causes the timeout to be
+ measured against the REALTIME clock instead of MONOTONIC.
+ * - ``pad``
+ - Unused, must be set to zero.
+
+ This function attempts to acquire one of the given objects. If unable
+ to do so, it sleeps until an object becomes signaled, subsequently
+ acquiring it, or the timeout expires. In the latter case the ioctl
+ fails with ``ETIMEDOUT``. The function only acquires one object, even
+ if multiple objects are signaled.
+
+ A semaphore is considered to be signaled if its count is nonzero, and
+ is acquired by decrementing its count by one. A mutex is considered
+ to be signaled if it is unowned or if its owner matches the ``owner``
+ argument, and is acquired by incrementing its recursion count by one
+ and setting its owner to the ``owner`` argument. An auto-reset event
+ is acquired by designaling it; a manual-reset event is not affected
+ by acquisition.
+
+ Acquisition is atomic and totally ordered with respect to other
+ operations on the same object. If two wait operations (with different
+ ``owner`` identifiers) are queued on the same mutex, only one is
+ signaled. If two wait operations are queued on the same semaphore,
+ and a value of one is posted to it, only one is signaled.
+
+ If an abandoned mutex is acquired, the ioctl fails with
+ ``EOWNERDEAD``. Although this is a failure return, the function may
+ otherwise be considered successful. The mutex is marked as owned by
+ the given owner (with a recursion count of 1) and as no longer
+ abandoned, and ``index`` is still set to the index of the mutex.
+
+ The ``alert`` argument is an "extra" event which can terminate the
+ wait, independently of all other objects.
+
+ It is valid to pass the same object more than once, including by
+ passing the same event in the ``objs`` array and in ``alert``. If a
+ wakeup occurs due to that object being signaled, ``index`` is set to
+ the lowest index corresponding to that object.
+
+ The function may fail with ``EINTR`` if a signal is received.
+
+.. c:macro:: NTSYNC_IOC_WAIT_ALL
+
+ Poll on a list of objects, atomically acquiring all of them. Takes a
+ pointer to struct :c:type:`ntsync_wait_args`, which is used
+ identically to ``NTSYNC_IOC_WAIT_ANY``, except that ``index`` is
+ always filled with zero on success if not woken via alert.
+
+ This function attempts to simultaneously acquire all of the given
+ objects. If unable to do so, it sleeps until all objects become
+ simultaneously signaled, subsequently acquiring them, or the timeout
+ expires. In the latter case the ioctl fails with ``ETIMEDOUT`` and no
+ objects are modified.
+
+ Objects may become signaled and subsequently designaled (through
+ acquisition by other threads) while this thread is sleeping. Only
+ once all objects are simultaneously signaled does the ioctl acquire
+ them and return. The entire acquisition is atomic and totally ordered
+ with respect to other operations on any of the given objects.
+
+ If an abandoned mutex is acquired, the ioctl fails with
+ ``EOWNERDEAD``. Similarly to ``NTSYNC_IOC_WAIT_ANY``, all objects are
+ nevertheless marked as acquired. Note that if multiple mutex objects
+ are specified, there is no way to know which were marked as
+ abandoned.
+
+ As with "any" waits, the ``alert`` argument is an "extra" event which
+ can terminate the wait. Critically, however, an "all" wait will
+ succeed if all members in ``objs`` are signaled, *or* if ``alert`` is
+ signaled. In the latter case ``index`` will be set to ``count``. As
+ with "any" waits, if both conditions are filled, the former takes
+ priority, and objects in ``objs`` will be acquired.
+
+ Unlike ``NTSYNC_IOC_WAIT_ANY``, it is not valid to pass the same
+ object more than once, nor is it valid to pass the same object in
+ ``objs`` and in ``alert``. If this is attempted, the function fails
+ with ``EINVAL``.
diff --git a/Documentation/userspace-api/perf_ring_buffer.rst b/Documentation/userspace-api/perf_ring_buffer.rst
index bde9d8cbc106..dc71544532ce 100644
--- a/Documentation/userspace-api/perf_ring_buffer.rst
+++ b/Documentation/userspace-api/perf_ring_buffer.rst
@@ -627,7 +627,7 @@ regular ring buffer.
AUX events and AUX trace data are two different things. Let's see an
example::
- perf record -a -e cycles -e cs_etm/@tmc_etr0/ -- sleep 2
+ perf record -a -e cycles -e cs_etm// -- sleep 2
The above command enables two events: one is the event *cycles* from PMU
and another is the AUX event *cs_etm* from Arm CoreSight, both are saved
@@ -766,7 +766,7 @@ only record AUX trace data at a specific time point which users are
interested in. E.g. below gives an example of how to take snapshots
with 1 second interval with Arm CoreSight::
- perf record -e cs_etm/@tmc_etr0/u -S -a program &
+ perf record -e cs_etm//u -S -a program &
PERFPID=$!
while true; do
kill -USR2 $PERFPID
diff --git a/Documentation/userspace-api/sysfs-platform_profile.rst b/Documentation/userspace-api/sysfs-platform_profile.rst
index 4fccde2e4563..7f013356118a 100644
--- a/Documentation/userspace-api/sysfs-platform_profile.rst
+++ b/Documentation/userspace-api/sysfs-platform_profile.rst
@@ -40,3 +40,41 @@ added. Drivers which wish to introduce new profile names must:
1. Explain why the existing profile names cannot be used.
2. Add the new profile name, along with a clear description of the
expected behaviour, to the sysfs-platform_profile ABI documentation.
+
+"Custom" profile support
+========================
+The platform_profile class also supports profiles advertising a "custom"
+profile. This is intended to be set by drivers when the setttings in the
+driver have been modified in a way that a standard profile doesn't represent
+the current state.
+
+Multiple driver support
+=======================
+When multiple drivers on a system advertise a platform profile handler, the
+platform profile handler core will only advertise the profiles that are
+common between all drivers to the ``/sys/firmware/acpi`` interfaces.
+
+This is to ensure there is no ambiguity on what the profile names mean when
+all handlers don't support a profile.
+
+Individual drivers will register a 'platform_profile' class device that has
+similar semantics as the ``/sys/firmware/acpi/platform_profile`` interface.
+
+To discover which driver is associated with a platform profile handler the
+user can read the ``name`` attribute of the class device.
+
+To discover available profiles from the class interface the user can read the
+``choices`` attribute.
+
+If a user wants to select a profile for a specific driver, they can do so
+by writing to the ``profile`` attribute of the driver's class device.
+
+This will allow users to set different profiles for different drivers on the
+same system. If the selected profile by individual drivers differs the
+platform profile handler core will display the profile 'custom' to indicate
+that the profiles are not the same.
+
+While the ``platform_profile`` attribute has the value ``custom``, writing a
+common profile from ``platform_profile_choices`` to the platform_profile
+attribute of the platform profile handler core will set the profile for all
+drivers.