diff options
Diffstat (limited to 'Documentation/userspace-api')
26 files changed, 1524 insertions, 54 deletions
diff --git a/Documentation/userspace-api/accelerators/ocxl.rst b/Documentation/userspace-api/accelerators/ocxl.rst index db7570d5e50d..4e213af70237 100644 --- a/Documentation/userspace-api/accelerators/ocxl.rst +++ b/Documentation/userspace-api/accelerators/ocxl.rst @@ -3,8 +3,11 @@ OpenCAPI (Open Coherent Accelerator Processor Interface) ======================================================== OpenCAPI is an interface between processors and accelerators. It aims -at being low-latency and high-bandwidth. The specification is -developed by the `OpenCAPI Consortium <http://opencapi.org/>`_. +at being low-latency and high-bandwidth. + +The specification was developed by the OpenCAPI Consortium, and is now +available from the `Compute Express Link Consortium +<https://computeexpresslink.org/resource/opencapi-specification-archive/>`_. It allows an accelerator (which could be an FPGA, ASICs, ...) to access the host memory coherently, using virtual addresses. An OpenCAPI diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst new file mode 100644 index 000000000000..05dfe3b56f71 --- /dev/null +++ b/Documentation/userspace-api/check_exec.rst @@ -0,0 +1,144 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. Copyright © 2024 Microsoft Corporation + +=================== +Executability check +=================== + +The ``AT_EXECVE_CHECK`` :manpage:`execveat(2)` flag, and the +``SECBIT_EXEC_RESTRICT_FILE`` and ``SECBIT_EXEC_DENY_INTERACTIVE`` securebits +are intended for script interpreters and dynamic linkers to enforce a +consistent execution security policy handled by the kernel. See the +`samples/check-exec/inc.c`_ example. + +Whether an interpreter should check these securebits or not depends on the +security risk of running malicious scripts with respect to the execution +environment, and whether the kernel can check if a script is trustworthy or +not. For instance, Python scripts running on a server can use arbitrary +syscalls and access arbitrary files. Such interpreters should then be +enlighten to use these securebits and let users define their security policy. +However, a JavaScript engine running in a web browser should already be +sandboxed and then should not be able to harm the user's environment. + +Script interpreters or dynamic linkers built for tailored execution environments +(e.g. hardened Linux distributions or hermetic container images) could use +``AT_EXECVE_CHECK`` without checking the related securebits if backward +compatibility is handled by something else (e.g. atomic update ensuring that +all legitimate libraries are allowed to be executed). It is then recommended +for script interpreters and dynamic linkers to check the securebits at run time +by default, but also to provide the ability for custom builds to behave like if +``SECBIT_EXEC_RESTRICT_FILE`` or ``SECBIT_EXEC_DENY_INTERACTIVE`` were always +set to 1 (i.e. always enforce restrictions). + +AT_EXECVE_CHECK +=============== + +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a +check on a regular file and returns 0 if execution of this file would be +allowed, ignoring the file format and then the related interpreter dependencies +(e.g. ELF libraries, script's shebang). + +Programs should always perform this check to apply kernel-level checks against +files that are not directly executed by the kernel but passed to a user space +interpreter instead. All files that contain executable code, from the point of +view of the interpreter, should be checked. However the result of this check +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or +``SECBIT_EXEC_DENY_INTERACTIVE.``. + +The main purpose of this flag is to improve the security and consistency of an +execution environment to ensure that direct file execution (e.g. +``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to +the same result. For instance, this can be used to check if a file is +trustworthy according to the caller's environment. + +In a secure environment, libraries and any executable dependencies should also +be checked. For instance, dynamic linking should make sure that all libraries +are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``). +For such secure execution environment to make sense, only trusted code should +be executable, which also requires integrity guarantees. + +To avoid race conditions leading to time-of-check to time-of-use issues, +``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a +file descriptor instead of a path. + +SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE +========================================================== + +When ``SECBIT_EXEC_RESTRICT_FILE`` is set, a process should only interpret or +execute a file if a call to :manpage:`execveat(2)` with the related file +descriptor and the ``AT_EXECVE_CHECK`` flag succeed. + +This secure bit may be set by user session managers, service managers, +container runtimes, sandboxer tools... Except for test environments, the +related ``SECBIT_EXEC_RESTRICT_FILE_LOCKED`` bit should also be set. + +Programs should only enforce consistent restrictions according to the +securebits but without relying on any other user-controlled configuration. +Indeed, the use case for these securebits is to only trust executable code +vetted by the system configuration (through the kernel), so we should be +careful to not let untrusted users control this configuration. + +However, script interpreters may still use user configuration such as +environment variables as long as it is not a way to disable the securebits +checks. For instance, the ``PATH`` and ``LD_PRELOAD`` variables can be set by +a script's caller. Changing these variables may lead to unintended code +executions, but only from vetted executable programs, which is OK. For this to +make sense, the system should provide a consistent security policy to avoid +arbitrary code execution e.g., by enforcing a write xor execute policy. + +When ``SECBIT_EXEC_DENY_INTERACTIVE`` is set, a process should never interpret +interactive user commands (e.g. scripts). However, if such commands are passed +through a file descriptor (e.g. stdin), its content should be interpreted if a +call to :manpage:`execveat(2)` with the related file descriptor and the +``AT_EXECVE_CHECK`` flag succeed. + +For instance, script interpreters called with a script snippet as argument +should always deny such execution if ``SECBIT_EXEC_DENY_INTERACTIVE`` is set. + +This secure bit may be set by user session managers, service managers, +container runtimes, sandboxer tools... Except for test environments, the +related ``SECBIT_EXEC_DENY_INTERACTIVE_LOCKED`` bit should also be set. + +Here is the expected behavior for a script interpreter according to combination +of any exec securebits: + +1. ``SECBIT_EXEC_RESTRICT_FILE=0`` and ``SECBIT_EXEC_DENY_INTERACTIVE=0`` + + Always interpret scripts, and allow arbitrary user commands (default). + + No threat, everyone and everything is trusted, but we can get ahead of + potential issues thanks to the call to :manpage:`execveat(2)` with + ``AT_EXECVE_CHECK`` which should always be performed but ignored by the + script interpreter. Indeed, this check is still important to enable systems + administrators to verify requests (e.g. with audit) and prepare for + migration to a secure mode. + +2. ``SECBIT_EXEC_RESTRICT_FILE=1`` and ``SECBIT_EXEC_DENY_INTERACTIVE=0`` + + Deny script interpretation if they are not executable, but allow + arbitrary user commands. + + The threat is (potential) malicious scripts run by trusted (and not fooled) + users. That can protect against unintended script executions (e.g. ``sh + /tmp/*.sh``). This makes sense for (semi-restricted) user sessions. + +3. ``SECBIT_EXEC_RESTRICT_FILE=0`` and ``SECBIT_EXEC_DENY_INTERACTIVE=1`` + + Always interpret scripts, but deny arbitrary user commands. + + This use case may be useful for secure services (i.e. without interactive + user session) where scripts' integrity is verified (e.g. with IMA/EVM or + dm-verity/IPE) but where access rights might not be ready yet. Indeed, + arbitrary interactive commands would be much more difficult to check. + +4. ``SECBIT_EXEC_RESTRICT_FILE=1`` and ``SECBIT_EXEC_DENY_INTERACTIVE=1`` + + Deny script interpretation if they are not executable, and also deny + any arbitrary user commands. + + The threat is malicious scripts run by untrusted users (but trusted code). + This makes sense for system services that may only execute trusted scripts. + +.. Links +.. _samples/check-exec/inc.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/samples/check-exec/inc.c diff --git a/Documentation/userspace-api/dma-buf-heaps.rst b/Documentation/userspace-api/dma-buf-heaps.rst new file mode 100644 index 000000000000..535f49047ce6 --- /dev/null +++ b/Documentation/userspace-api/dma-buf-heaps.rst @@ -0,0 +1,25 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================== +Allocating dma-buf using heaps +============================== + +Dma-buf Heaps are a way for userspace to allocate dma-buf objects. They are +typically used to allocate buffers from a specific allocation pool, or to share +buffers across frameworks. + +Heaps +===== + +A heap represents a specific allocator. The Linux kernel currently supports the +following heaps: + + - The ``system`` heap allocates virtually contiguous, cacheable, buffers. + + - The ``cma`` heap allocates physically contiguous, cacheable, + buffers. Only present if a CMA region is present. Such a region is + usually created either through the kernel commandline through the + `cma` parameter, a memory region Device-Tree node with the + `linux,cma-default` property set, or through the `CMA_SIZE_MBYTES` or + `CMA_SIZE_PERCENTAGE` Kconfig options. Depending on the platform, it + might be called ``reserved``, ``linux,cma``, or ``default-pool``. diff --git a/Documentation/userspace-api/fwctl/fwctl-cxl.rst b/Documentation/userspace-api/fwctl/fwctl-cxl.rst new file mode 100644 index 000000000000..670b43b72949 --- /dev/null +++ b/Documentation/userspace-api/fwctl/fwctl-cxl.rst @@ -0,0 +1,142 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================ +fwctl cxl driver +================ + +:Author: Dave Jiang + +Overview +======== + +The CXL spec defines a set of commands that can be issued to the mailbox of a +CXL device or switch. It also left room for vendor specific commands to be +issued to the mailbox as well. fwctl provides a path to issue a set of allowed +mailbox commands from user space to the device moderated by the kernel driver. + +The following 3 commands will be used to support CXL Features: +CXL spec r3.1 8.2.9.6.1 Get Supported Features (Opcode 0500h) +CXL spec r3.1 8.2.9.6.2 Get Feature (Opcode 0501h) +CXL spec r3.1 8.2.9.6.3 Set Feature (Opcode 0502h) + +The "Get Supported Features" return data may be filtered by the kernel driver to +drop any features that are forbidden by the kernel or being exclusively used by +the kernel. The driver will set the "Set Feature Size" of the "Get Supported +Features Supported Feature Entry" to 0 to indicate that the Feature cannot be +modified. The "Get Supported Features" command and the "Get Features" falls +under the fwctl policy of FWCTL_RPC_CONFIGURATION. + +For "Set Feature" command, the access policy currently is broken down into two +categories depending on the Set Feature effects reported by the device. If the +Set Feature will cause immediate change to the device, the fwctl access policy +must be FWCTL_RPC_DEBUG_WRITE_FULL. The effects for this level are +"immediate config change", "immediate data change", "immediate policy change", +or "immediate log change" for the set effects mask. If the effects are "config +change with cold reset" or "config change with conventional reset", then the +fwctl access policy must be FWCTL_RPC_DEBUG_WRITE or higher. + +fwctl cxl User API +================== + +.. kernel-doc:: include/uapi/fwctl/cxl.h + +1. Driver info query +-------------------- + +First step for the app is to issue the ioctl(FWCTL_CMD_INFO). Successful +invocation of the ioctl implies the Features capability is operational and +returns an all zeros 32bit payload. A ``struct fwctl_info`` needs to be filled +out with the ``fwctl_info.out_device_type`` set to ``FWCTL_DEVICE_TYPE_CXL``. +The return data should be ``struct fwctl_info_cxl`` that contains a reserved +32bit field that should be all zeros. + +2. Send hardware commands +------------------------- + +Next step is to send the 'Get Supported Features' command to the driver from +user space via ioctl(FWCTL_RPC). A ``struct fwctl_rpc_cxl`` is pointed to +by ``fwctl_rpc.in``. ``struct fwctl_rpc_cxl.in_payload`` points to +the hardware input structure that is defined by the CXL spec. ``fwctl_rpc.out`` +points to the buffer that contains a ``struct fwctl_rpc_cxl_out`` that includes +the hardware output data inlined as ``fwctl_rpc_cxl_out.payload``. This command +is called twice. First time to retrieve the number of features supported. +A second time to retrieve the specific feature details as the output data. + +After getting the specific feature details, a Get/Set Feature command can be +appropriately programmed and sent. For a "Set Feature" command, the retrieved +feature info contains an effects field that details the resulting +"Set Feature" command will trigger. That will inform the user whether +the system is configured to allowed the "Set Feature" command or not. + +Code example of a Get Feature +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: c + + static int cxl_fwctl_rpc_get_test_feature(int fd, struct test_feature *feat_ctx, + const uint32_t expected_data) + { + struct cxl_mbox_get_feat_in *feat_in; + struct fwctl_rpc_cxl_out *out; + struct fwctl_rpc rpc = {0}; + struct fwctl_rpc_cxl *in; + size_t out_size, in_size; + uint32_t val; + void *data; + int rc; + + in_size = sizeof(*in) + sizeof(*feat_in); + rc = posix_memalign((void **)&in, 16, in_size); + if (rc) + return -ENOMEM; + memset(in, 0, in_size); + feat_in = &in->get_feat_in; + + uuid_copy(feat_in->uuid, feat_ctx->uuid); + feat_in->count = feat_ctx->get_size; + + out_size = sizeof(*out) + feat_ctx->get_size; + rc = posix_memalign((void **)&out, 16, out_size); + if (rc) + goto free_in; + memset(out, 0, out_size); + + in->opcode = CXL_MBOX_OPCODE_GET_FEATURE; + in->op_size = sizeof(*feat_in); + + rpc.size = sizeof(rpc); + rpc.scope = FWCTL_RPC_CONFIGURATION; + rpc.in_len = in_size; + rpc.out_len = out_size; + rpc.in = (uint64_t)(uint64_t *)in; + rpc.out = (uint64_t)(uint64_t *)out; + + rc = send_command(fd, &rpc, out); + if (rc) + goto free_all; + + data = out->payload; + val = le32toh(*(__le32 *)data); + if (memcmp(&val, &expected_data, sizeof(val)) != 0) { + rc = -ENXIO; + goto free_all; + } + + free_all: + free(out); + free_in: + free(in); + return rc; + } + +Take a look at CXL CLI test directory +<https://github.com/pmem/ndctl/tree/main/test/fwctl.c> for a detailed user code +for examples on how to exercise this path. + + +fwctl cxl Kernel API +==================== + +.. kernel-doc:: drivers/cxl/core/features.c + :export: +.. kernel-doc:: include/cxl/features.h diff --git a/Documentation/userspace-api/fwctl/fwctl.rst b/Documentation/userspace-api/fwctl/fwctl.rst new file mode 100644 index 000000000000..fdcfe418a83f --- /dev/null +++ b/Documentation/userspace-api/fwctl/fwctl.rst @@ -0,0 +1,286 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=============== +fwctl subsystem +=============== + +:Author: Jason Gunthorpe + +Overview +======== + +Modern devices contain extensive amounts of FW, and in many cases, are largely +software-defined pieces of hardware. The evolution of this approach is largely a +reaction to Moore's Law where a chip tape out is now highly expensive, and the +chip design is extremely large. Replacing fixed HW logic with a flexible and +tightly coupled FW/HW combination is an effective risk mitigation against chip +respin. Problems in the HW design can be counteracted in device FW. This is +especially true for devices which present a stable and backwards compatible +interface to the operating system driver (such as NVMe). + +The FW layer in devices has grown to incredible size and devices frequently +integrate clusters of fast processors to run it. For example, mlx5 devices have +over 30MB of FW code, and big configurations operate with over 1GB of FW managed +runtime state. + +The availability of such a flexible layer has created quite a variety in the +industry where single pieces of silicon are now configurable software-defined +devices and can operate in substantially different ways depending on the need. +Further, we often see cases where specific sites wish to operate devices in ways +that are highly specialized and require applications that have been tailored to +their unique configuration. + +Further, devices have become multi-functional and integrated to the point they +no longer fit neatly into the kernel's division of subsystems. Modern +multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many +subsystems while sharing the underlying hardware using the auxiliary device +system. + +All together this creates a challenge for the operating system, where devices +have an expansive FW environment that needs robust device-specific debugging +support, and FW-driven functionality that is not well suited to “generic” +interfaces. fwctl seeks to allow access to the full device functionality from +user space in the areas of debuggability, management, and first-boot/nth-boot +provisioning. + +fwctl is aimed at the common device design pattern where the OS and FW +communicate via an RPC message layer constructed with a queue or mailbox scheme. +In this case the driver will typically have some layer to deliver RPC messages +and collect RPC responses from device FW. The in-kernel subsystem drivers that +operate the device for its primary purposes will use these RPCs to build their +drivers, but devices also usually have a set of ancillary RPCs that don't really +fit into any specific subsystem. For example, a HW RAID controller is primarily +operated by the block layer but also comes with a set of RPCs to administer the +construction of drives within the HW RAID. + +In the past when devices were more single function, individual subsystems would +grow different approaches to solving some of these common problems. For instance +monitoring device health, manipulating its FLASH, debugging the FW, +provisioning, all have various unique interfaces across the kernel. + +fwctl's purpose is to define a common set of limited rules, described below, +that allow user space to securely construct and execute RPCs inside device FW. +The rules serve as an agreement between the operating system and FW on how to +correctly design the RPC interface. As a uAPI the subsystem provides a thin +layer of discovery and a generic uAPI to deliver the RPCs and collect the +response. It supports a system of user space libraries and tools which will +use this interface to control the device using the device native protocols. + +Scope of Action +--------------- + +fwctl drivers are strictly restricted to being a way to operate the device FW. +It is not an avenue to access random kernel internals, or other operating system +SW states. + +fwctl instances must operate on a well-defined device function, and the device +should have a well-defined security model for what scope within the physical +device the function is permitted to access. For instance, the most complex PCIe +device today may broadly have several function-level scopes: + + 1. A privileged function with full access to the on-device global state and + configuration + + 2. Multiple hypervisor functions with control over itself and child functions + used with VMs + + 3. Multiple VM functions tightly scoped within the VM + +The device may create a logical parent/child relationship between these scopes. +For instance a child VM's FW may be within the scope of the hypervisor FW. It is +quite common in the VFIO world that the hypervisor environment has a complex +provisioning/profiling/configuration responsibility for the function VFIO +assigns to the VM. + +Further, within the function, devices often have RPC commands that fall within +some general scopes of action (see enum fwctl_rpc_scope): + + 1. Access to function & child configuration, FLASH, etc. that becomes live at a + function reset. Access to function & child runtime configuration that is + transparent or non-disruptive to any driver or VM. + + 2. Read-only access to function debug information that may report on FW objects + in the function & child, including FW objects owned by other kernel + subsystems. + + 3. Write access to function & child debug information strictly compatible with + the principles of kernel lockdown and kernel integrity protection. Triggers + a kernel Taint. + + 4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO. + +User space will provide a scope label on each RPC and the kernel must enforce the +above CAPs and taints based on that scope. A combination of kernel and FW can +enforce that RPCs are placed in the correct scope by user space. + +Denied behavior +--------------- + +There are many things this interface must not allow user space to do (without a +Taint or CAP), broadly derived from the principles of kernel lockdown. Some +examples: + + 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with + untrusted code, or otherwise compromise device or system security and + integrity. + + 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel + objects owned by kernel drivers. + + 3. Directly configure or otherwise control kernel drivers. A subsystem kernel + driver can react to the device configuration at function reset/driver load + time, but otherwise must not be coupled to fwctl. + + 4. Operate the HW in a way that overlaps with the core purpose of another + primary kernel subsystem, such as read/write to LBAs, send/receive of + network packets, or operate an accelerator's data plane. + +fwctl is not a replacement for device direct access subsystems like uacce or +VFIO. + +Operations exposed through fwctl's non-taining interfaces should be fully +sharable with other users of the device. For instance exposing a RPC through +fwctl should never prevent a kernel subsystem from also concurrently using that +same RPC or hardware unit down the road. In such cases fwctl will be less +important than proper kernel subsystems that eventually emerge. Mistakes in this +area resulting in clashes will be resolved in favour of a kernel implementation. + +fwctl User API +============== + +.. kernel-doc:: include/uapi/fwctl/fwctl.h +.. kernel-doc:: include/uapi/fwctl/mlx5.h +.. kernel-doc:: include/uapi/fwctl/pds.h + +sysfs Class +----------- + +fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices +(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device +operates the iotcl uAPI described above. + +fwctl devices can be related to driver components in other subsystems through +sysfs:: + + $ ls /sys/class/fwctl/fwctl0/device/infiniband/ + ibp0s10f0 + + $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/ + fwctl0/ + + $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0 + dev device power subsystem uevent + +User space Community +-------------------- + +Drawing inspiration from nvme-cli, participating in the kernel side must come +with a user space in a common TBD git tree, at a minimum to usefully operate the +kernel driver. Providing such an implementation is a pre-condition to merging a +kernel driver. + +The goal is to build user space community around some of the shared problems +we all have, and ideally develop some common user space programs with some +starting themes of: + + - Device in-field debugging + + - HW provisioning + + - VFIO child device profiling before VM boot + + - Confidential Compute topics (attestation, secure provisioning) + +that stretch across all subsystems in the kernel. fwupd is a great example of +how an excellent user space experience can emerge out of kernel-side diversity. + +fwctl Kernel API +================ + +.. kernel-doc:: drivers/fwctl/main.c + :export: +.. kernel-doc:: include/linux/fwctl.h + +fwctl Driver design +------------------- + +In many cases a fwctl driver is going to be part of a larger cross-subsystem +device possibly using the auxiliary_device mechanism. In that case several +subsystems are going to be sharing the same device and FW interface layer so the +device design must already provide for isolation and cooperation between kernel +subsystems. fwctl should fit into that same model. + +Part of the driver should include a description of how its scope restrictions +and security model work. The driver and FW together must ensure that RPCs +provided by user space are mapped to the appropriate scope. If the validation is +done in the driver then the validation can read a 'command effects' report from +the device, or hardwire the enforcement. If the validation is done in the FW, +then the driver should pass the fwctl_rpc_scope to the FW along with the command. + +The driver and FW must cooperate to ensure that either fwctl cannot allocate +any FW resources, or any resources it does allocate are freed on FD closure. A +driver primarily constructed around FW RPCs may find that its core PCI function +and RPC layer belongs under fwctl with auxiliary devices connecting to other +subsystems. + +Each device type must be mindful of Linux's philosophy for stable ABI. The FW +RPC interface does not have to meet a strictly stable ABI, but it does need to +meet an expectation that userspace tools that are deployed and in significant +use don't needlessly break. FW upgrade and kernel upgrade should keep widely +deployed tooling working. + +Development and debugging focused RPCs under more permissive scopes can have +less stabilitiy if the tools using them are only run under exceptional +circumstances and not for every day use of the device. Debugging tools may even +require exact version matching as they may require something similar to DWARF +debug information from the FW binary. + +Security Response +================= + +The kernel remains the gatekeeper for this interface. If violations of the +scopes, security or isolation principles are found, we have options to let +devices fix them with a FW update, push a kernel patch to parse and block RPC +commands or push a kernel patch to block entire firmware versions/devices. + +While the kernel can always directly parse and restrict RPCs, it is expected +that the existing kernel pattern of allowing drivers to delegate validation to +FW to be a useful design. + +Existing Similar Examples +========================= + +The approach described in this document is not a new idea. Direct, or near +direct device access has been offered by the kernel in different areas for +decades. With more devices wanting to follow this design pattern it is becoming +clear that it is not entirely well understood and, more importantly, the +security considerations are not well defined or agreed upon. + +Some examples: + + - HW RAID controllers. This includes RPCs to do things like compose drives into + a RAID volume, configure RAID parameters, monitor the HW and more. + + - Baseboard managers. RPCs for configuring settings in the device and more + + - NVMe vendor command capsules. nvme-cli provides access to some monitoring + functions that different products have defined, but more exist. + + - CXL also has a NVMe-like vendor command system. + + - DRM allows user space drivers to send commands to the device via kernel + mediation + + - RDMA allows user space drivers to directly push commands to the device + without kernel involvement + + - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc. + +The first 4 are examples of areas that fwctl intends to cover. The latter three +are examples of denied behavior as they fully overlap with the primary purpose +of a kernel subsystem. + +Some key lessons learned from these past efforts are the importance of having a +common user space project to use as a pre-condition for obtaining a kernel +driver. Developing good community around useful software in user space is key to +getting companies to fund participation to enable their products. diff --git a/Documentation/userspace-api/fwctl/index.rst b/Documentation/userspace-api/fwctl/index.rst new file mode 100644 index 000000000000..316ac456ad3b --- /dev/null +++ b/Documentation/userspace-api/fwctl/index.rst @@ -0,0 +1,14 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Firmware Control (FWCTL) Userspace API +====================================== + +A framework that define a common set of limited rules that allows user space +to securely construct and execute RPCs inside device firmware. + +.. toctree:: + :maxdepth: 1 + + fwctl + fwctl-cxl + pds_fwctl diff --git a/Documentation/userspace-api/fwctl/pds_fwctl.rst b/Documentation/userspace-api/fwctl/pds_fwctl.rst new file mode 100644 index 000000000000..b5a31f82c883 --- /dev/null +++ b/Documentation/userspace-api/fwctl/pds_fwctl.rst @@ -0,0 +1,46 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================ +fwctl pds driver +================ + +:Author: Shannon Nelson + +Overview +======== + +The PDS Core device makes a fwctl service available through an +auxiliary_device named pds_core.fwctl.N. The pds_fwctl driver binds to +this device and registers itself with the fwctl subsystem. The resulting +userspace interface is used by an application that is a part of the +AMD Pensando software package for the Distributed Service Card (DSC). + +The pds_fwctl driver has little knowledge of the firmware's internals. +It only knows how to send commands through pds_core's message queue to the +firmware for fwctl requests. The set of fwctl operations available +depends on the firmware in the DSC, and the userspace application +version must match the firmware so that they can talk to each other. + +When a connection is created the pds_fwctl driver requests from the +firmware a list of firmware object endpoints, and for each endpoint the +driver requests a list of operations for that endpoint. + +Each operation description includes a firmware defined command attribute +that maps to the FWCTL scope levels. The driver translates those firmware +values into the FWCTL scope values which can then be used for filtering the +scoped user requests. + +pds_fwctl User API +================== + +Each RPC request includes the target endpoint and the operation id, and in +and out buffer lengths and pointers. The driver verifies the existence +of the requested endpoint and operations, then checks the request scope +against the required scope of the operation. The request is then put +together with the request data and sent through pds_core's message queue +to the firmware, and the results are returned to the caller. + +The RPC endpoints, operations, and buffer contents are defined by the +particular firmware package in the device, which varies across the +available product configurations. The details are available in the +specific product SDK documentation. diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst index 274cc7546efc..b8c73be4fb11 100644 --- a/Documentation/userspace-api/index.rst +++ b/Documentation/userspace-api/index.rst @@ -35,6 +35,7 @@ Security-related interfaces mfd_noexec spec_ctrl tee + check_exec Devices and I/O =============== @@ -43,7 +44,9 @@ Devices and I/O :maxdepth: 1 accelerators/ocxl + dma-buf-heaps dma-buf-alloc-exchange + fwctl/index gpio/index iommufd media/index @@ -63,6 +66,7 @@ Everything else vduse futex2 perf_ring_buffer + ntsync .. only:: subproject and html diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index 243f1f1b554a..bc91756bde73 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -28,10 +28,10 @@ or number from the table below. Because of the large number of drivers, many drivers share a partial letter with other drivers. If you are writing a driver for a new device and need a letter, pick an -unused block with enough room for expansion: 32 to 256 ioctl commands. -You can register the block by patching this file and submitting the -patch to Linus Torvalds. Or you can e-mail me at <mec@shout.net> and -I'll register one for you. +unused block with enough room for expansion: 32 to 256 ioctl commands +should suffice. You can register the block by patching this file and +submitting the patch through :doc:`usual patch submission process +</process/submitting-patches>`. The second argument to _IO, _IOW, _IOR, or _IOWR is a sequence number to distinguish ioctls from each other. The third argument to _IOW, @@ -62,9 +62,8 @@ Following this convention is good because: (5) When following the convention, the driver code can use generic code to copy the parameters between user and kernel space. -This table lists ioctls visible from user land for Linux/x86. It contains -most drivers up to 2.6.31, but I know I am missing some. There has been -no attempt to list non-X86 architectures or ioctls from drivers/staging/. +This table lists ioctls visible from userland, excluding ones from +drivers/staging/. ==== ===== ======================================================= ================================================================ Code Seq# Include File Comments @@ -85,6 +84,8 @@ Code Seq# Include File Comments 0x10 20-2F arch/s390/include/uapi/asm/hypfs.h 0x12 all linux/fs.h BLK* ioctls linux/blkpg.h + linux/blkzoned.h + linux/blk-crypto.h 0x15 all linux/fs.h FS_IOC_* ioctls 0x1b all InfiniBand Subsystem <http://infiniband.sourceforge.net/> @@ -283,6 +284,7 @@ Code Seq# Include File Comments 'p' 80-9F linux/ppdev.h user-space parport <mailto:tim@cyberelk.net> 'p' A1-A5 linux/pps.h LinuxPPS +'p' B1-B3 linux/pps_gen.h LinuxPPS <mailto:giometti@linux.it> 'q' 00-1F linux/serio.h 'q' 80-FF linux/telephony.h Internet PhoneJACK, Internet LineJACK @@ -311,6 +313,7 @@ Code Seq# Include File Comments <mailto:oe@port.de> 'z' 10-4F drivers/s390/crypto/zcrypt_api.h conflict! '|' 00-7F linux/media.h +'|' 80-9F samples/ Any sample and example drivers 0x80 00-1F linux/fb.h 0x81 00-1F linux/vduse.h 0x89 00-06 arch/x86/include/asm/sockios.h @@ -329,6 +332,7 @@ Code Seq# Include File Comments 0x97 00-7F fs/ceph/ioctl.h Ceph file system 0x99 00-0F 537-Addinboard driver <mailto:buk@buks.ipn.de> +0x9A 00-0F include/uapi/fwctl/fwctl.h 0xA0 all linux/sdp/sdp.h Industrial Device Project <mailto:kenji@bitgate.com> 0xA1 0 linux/vtpm_proxy.h TPM Emulator Proxy Driver @@ -361,6 +365,12 @@ Code Seq# Include File Comments <mailto:linuxppc-dev> 0xB2 01-02 arch/powerpc/include/uapi/asm/papr-sysparm.h powerpc/pseries system parameter API <mailto:linuxppc-dev> +0xB2 03-05 arch/powerpc/include/uapi/asm/papr-indices.h powerpc/pseries indices API + <mailto:linuxppc-dev> +0xB2 06-07 arch/powerpc/include/uapi/asm/papr-platform-dump.h powerpc/pseries Platform Dump API + <mailto:linuxppc-dev> +0xB2 08 powerpc/include/uapi/asm/papr-physical-attestation.h powerpc/pseries Physical Attestation API + <mailto:linuxppc-dev> 0xB3 00 linux/mmc/ioctl.h 0xB4 00-0F linux/gpio.h <mailto:linux-gpio@vger.kernel.org> 0xB5 00-0F uapi/linux/rpmsg.h <mailto:linux-remoteproc@vger.kernel.org> @@ -368,10 +378,12 @@ Code Seq# Include File Comments 0xB7 all uapi/linux/remoteproc_cdev.h <mailto:linux-remoteproc@vger.kernel.org> 0xB7 all uapi/linux/nsfs.h <mailto:Andrei Vagin <avagin@openvz.org>> 0xB8 01-02 uapi/misc/mrvl_cn10k_dpi.h Marvell CN10K DPI driver +0xB8 all uapi/linux/mshv.h Microsoft Hyper-V /dev/mshv driver + <mailto:linux-hyperv@vger.kernel.org> 0xC0 00-0F linux/usb/iowarrior.h -0xCA 00-0F uapi/misc/cxl.h +0xCA 00-0F uapi/misc/cxl.h Dead since 6.15 0xCA 10-2F uapi/misc/ocxl.h -0xCA 80-BF uapi/scsi/cxlflash_ioctl.h +0xCA 80-BF uapi/scsi/cxlflash_ioctl.h Dead since 6.15 0xCB 00-1F CBM serial IEC bus in development: <mailto:michael.klein@puffin.lb.shuttle.de> 0xCC 00-0F drivers/misc/ibmvmc.h pseries VMC driver @@ -390,6 +402,8 @@ Code Seq# Include File Comments <mailto:mathieu.desnoyers@efficios.com> 0xF8 all arch/x86/include/uapi/asm/amd_hsmp.h AMD HSMP EPYC system management interface driver <mailto:nchatrad@amd.com> +0xF9 00-0F uapi/misc/amd-apml.h AMD side band system management interface driver + <mailto:naveenkrishna.chatradhi@amd.com> 0xFD all linux/dm-ioctl.h 0xFE all linux/isst_if.h ==== ===== ======================================================= ================================================================ diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst index 70289d6815d2..b0df15865dec 100644 --- a/Documentation/userspace-api/iommufd.rst +++ b/Documentation/userspace-api/iommufd.rst @@ -63,6 +63,13 @@ Following IOMMUFD objects are exposed to userspace: space usually has mappings from guest-level I/O virtual addresses to guest- level physical addresses. +- IOMMUFD_FAULT, representing a software queue for an HWPT reporting IO page + faults using the IOMMU HW's PRI (Page Request Interface). This queue object + provides user space an FD to poll the page fault events and also to respond + to those events. A FAULT object must be created first to get a fault_id that + could be then used to allocate a fault-enabled HWPT via the IOMMU_HWPT_ALLOC + command by setting the IOMMU_HWPT_FAULT_ID_VALID bit in its flags field. + - IOMMUFD_OBJ_VIOMMU, representing a slice of the physical IOMMU instance, passed to or shared with a VM. It may be some HW-accelerated virtualization features and some SW resources used by the VM. For examples: @@ -109,6 +116,14 @@ Following IOMMUFD objects are exposed to userspace: vIOMMU, which is a separate ioctl call from attaching the same device to an HWPT_PAGING that the vIOMMU holds. +- IOMMUFD_OBJ_VEVENTQ, representing a software queue for a vIOMMU to report its + events such as translation faults occurred to a nested stage-1 (excluding I/O + page faults that should go through IOMMUFD_OBJ_FAULT) and HW-specific events. + This queue object provides user space an FD to poll/read the vIOMMU events. A + vIOMMU object must be created first to get its viommu_id, which could be then + used to allocate a vEVENTQ. Each vIOMMU can support multiple types of vEVENTS, + but is confined to one vEVENTQ per vEVENTQ type. + All user-visible objects are destroyed via the IOMMU_DESTROY uAPI. The diagrams below show relationships between user-visible objects and kernel @@ -251,8 +266,10 @@ User visible objects are backed by following datastructures: - iommufd_device for IOMMUFD_OBJ_DEVICE. - iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING. - iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED. +- iommufd_fault for IOMMUFD_OBJ_FAULT. - iommufd_viommu for IOMMUFD_OBJ_VIOMMU. - iommufd_vdevice for IOMMUFD_OBJ_VDEVICE. +- iommufd_veventq for IOMMUFD_OBJ_VEVENTQ. Several terminologies when looking at these datastructures: diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst index d639c61cb472..1d0c2c15c22e 100644 --- a/Documentation/userspace-api/landlock.rst +++ b/Documentation/userspace-api/landlock.rst @@ -8,7 +8,7 @@ Landlock: unprivileged access control ===================================== :Author: Mickaël Salaün -:Date: October 2024 +:Date: March 2025 The goal of Landlock is to enable restriction of ambient rights (e.g. global filesystem or network access) for a set of processes. Because Landlock @@ -317,33 +317,32 @@ IPC scoping ----------- Similar to the implicit `Ptrace restrictions`_, we may want to further restrict -interactions between sandboxes. Each Landlock domain can be explicitly scoped -for a set of actions by specifying it on a ruleset. For example, if a -sandboxed process should not be able to :manpage:`connect(2)` to a -non-sandboxed process through abstract :manpage:`unix(7)` sockets, we can -specify such a restriction with ``LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET``. -Moreover, if a sandboxed process should not be able to send a signal to a -non-sandboxed process, we can specify this restriction with -``LANDLOCK_SCOPE_SIGNAL``. - -A sandboxed process can connect to a non-sandboxed process when its domain is -not scoped. If a process's domain is scoped, it can only connect to sockets -created by processes in the same scope. -Moreover, If a process is scoped to send signal to a non-scoped process, it can -only send signals to processes in the same scope. - -A connected datagram socket behaves like a stream socket when its domain is -scoped, meaning if the domain is scoped after the socket is connected , it can -still :manpage:`send(2)` data just like a stream socket. However, in the same -scenario, a non-connected datagram socket cannot send data (with -:manpage:`sendto(2)`) outside its scope. - -A process with a scoped domain can inherit a socket created by a non-scoped -process. The process cannot connect to this socket since it has a scoped -domain. - -IPC scoping does not support exceptions, so if a domain is scoped, no rules can -be added to allow access to resources or processes outside of the scope. +interactions between sandboxes. Therefore, at ruleset creation time, each +Landlock domain can restrict the scope for certain operations, so that these +operations can only reach out to processes within the same Landlock domain or in +a nested Landlock domain (the "scope"). + +The operations which can be scoped are: + +``LANDLOCK_SCOPE_SIGNAL`` + This limits the sending of signals to target processes which run within the + same or a nested Landlock domain. + +``LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET`` + This limits the set of abstract :manpage:`unix(7)` sockets to which we can + :manpage:`connect(2)` to socket addresses which were created by a process in + the same or a nested Landlock domain. + + A :manpage:`sendto(2)` on a non-connected datagram socket is treated as if + it were doing an implicit :manpage:`connect(2)` and will be blocked if the + remote end does not stem from the same or a nested Landlock domain. + + A :manpage:`sendto(2)` on a socket which was previously connected will not + be restricted. This works for both datagram and stream sockets. + +IPC scoping does not support exceptions via :manpage:`landlock_add_rule(2)`. +If an operation is scoped within a domain, no rules can be added to allow access +to resources or processes outside of the scope. Truncating files ---------------- @@ -595,6 +594,16 @@ Starting with the Landlock ABI version 6, it is possible to restrict :manpage:`signal(7)` sending by setting ``LANDLOCK_SCOPE_SIGNAL`` to the ``scoped`` ruleset attribute. +Logging (ABI < 7) +----------------- + +Starting with the Landlock ABI version 7, it is possible to control logging of +Landlock audit events with the ``LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF``, +``LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON``, and +``LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF`` flags passed to +sys_landlock_restrict_self(). See Documentation/admin-guide/LSM/landlock.rst +for more details on audit. + .. _kernel_support: Kernel support @@ -683,9 +692,16 @@ fine-grained restrictions). Moreover, their complexity can lead to security issues, especially when untrusted processes can manipulate them (cf. `Controlling access to user namespaces <https://lwn.net/Articles/673597/>`_). +How to disable Landlock audit records? +-------------------------------------- + +You might want to put in place filters as explained here: +Documentation/admin-guide/LSM/landlock.rst + Additional documentation ======================== +* Documentation/admin-guide/LSM/landlock.rst * Documentation/security/landlock.rst * https://landlock.io diff --git a/Documentation/userspace-api/media/drivers/uvcvideo.rst b/Documentation/userspace-api/media/drivers/uvcvideo.rst index a290f9fadae9..dbb30ad389ae 100644 --- a/Documentation/userspace-api/media/drivers/uvcvideo.rst +++ b/Documentation/userspace-api/media/drivers/uvcvideo.rst @@ -181,6 +181,7 @@ Argument: struct uvc_xu_control_mapping UVC_CTRL_DATA_TYPE_BOOLEAN Boolean UVC_CTRL_DATA_TYPE_ENUM Enumeration UVC_CTRL_DATA_TYPE_BITMASK Bitmask + UVC_CTRL_DATA_TYPE_RECT Rectangular area UVCIOC_CTRL_QUERY - Query a UVC XU control @@ -255,3 +256,66 @@ Argument: struct uvc_xu_control_query __u8 query Request code to send to the device __u16 size Control data size (in bytes) __u8 *data Control value + + +Driver-specific V4L2 controls +----------------------------- + +The uvcvideo driver implements the following UVC-specific controls: + +``V4L2_CID_UVC_REGION_OF_INTEREST_RECT (struct)`` + This control determines the region of interest (ROI). ROI is a + rectangular area represented by a struct :c:type:`v4l2_rect`. The + rectangle is in global sensor coordinates using pixel units. It is + independent of the field of view, not impacted by any cropping or + scaling. + + Use ``V4L2_CTRL_WHICH_MIN_VAL`` and ``V4L2_CTRL_WHICH_MAX_VAL`` to query + the range of rectangle sizes. + + Setting a ROI allows the camera to optimize the capture for the region. + The value of ``V4L2_CID_REGION_OF_INTEREST_AUTO`` control determines + the detailed behavior. + + An example of use of this control, can be found in the: + `Chrome OS USB camera HAL. + <https://chromium.googlesource.com/chromiumos/platform2/+/refs/heads/release-R121-15699.B/camera/hal/usb/>` + + +``V4L2_CID_UVC_REGION_OF_INTEREST_AUTO (bitmask)`` + This determines which, if any, on-board features should track to the + Region of Interest specified by the current value of + ``V4L2_CID_UVD__REGION_OF_INTEREST_RECT``. + + Max value is a mask indicating all supported Auto Controls. + +.. flat-table:: + :header-rows: 0 + :stub-columns: 0 + + * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_EXPOSURE`` + - Setting this bit causes automatic exposure to track the region of + interest instead of the whole image. + * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_IRIS`` + - Setting this bit causes automatic iris to track the region of interest + instead of the whole image. + * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_WHITE_BALANCE`` + - Setting this bit causes automatic white balance to track the region + of interest instead of the whole image. + * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_FOCUS`` + - Setting this bit causes automatic focus adjustment to track the region + of interest instead of the whole image. + * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_FACE_DETECT`` + - Setting this bit causes automatic face detection to track the region of + interest instead of the whole image. + * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_DETECT_AND_TRACK`` + - Setting this bit enables automatic face detection and tracking. The + current value of ``V4L2_CID_REGION_OF_INTEREST_RECT`` may be updated by + the driver. + * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_IMAGE_STABILIZATION`` + - Setting this bit enables automatic image stabilization. The + current value of ``V4L2_CID_REGION_OF_INTEREST_RECT`` may be updated by + the driver. + * - ``V4L2_UVC_REGION_OF_INTEREST_AUTO_HIGHER_QUALITY`` + - Setting this bit enables automatically capture the specified region + with higher quality if possible. diff --git a/Documentation/userspace-api/media/rc/rc-sysfs-nodes.rst b/Documentation/userspace-api/media/rc/rc-sysfs-nodes.rst index 34d6a0a1f4d3..70b5966aaff8 100644 --- a/Documentation/userspace-api/media/rc/rc-sysfs-nodes.rst +++ b/Documentation/userspace-api/media/rc/rc-sysfs-nodes.rst @@ -6,7 +6,7 @@ Remote Controller's sysfs nodes ******************************* -As defined at ``Documentation/ABI/testing/sysfs-class-rc``, those are +As defined at Documentation/ABI/testing/sysfs-class-rc, those are the sysfs nodes that control the Remote Controllers: diff --git a/Documentation/userspace-api/media/v4l/meta-formats.rst b/Documentation/userspace-api/media/v4l/meta-formats.rst index 86ffb3bc8ade..bb6876cfc271 100644 --- a/Documentation/userspace-api/media/v4l/meta-formats.rst +++ b/Documentation/userspace-api/media/v4l/meta-formats.rst @@ -12,6 +12,7 @@ These formats are used for the :ref:`metadata` interface only. .. toctree:: :maxdepth: 1 + metafmt-c3-isp metafmt-d4xx metafmt-generic metafmt-intel-ipu3 diff --git a/Documentation/userspace-api/media/v4l/metafmt-c3-isp.rst b/Documentation/userspace-api/media/v4l/metafmt-c3-isp.rst new file mode 100644 index 000000000000..449b45c2ec24 --- /dev/null +++ b/Documentation/userspace-api/media/v4l/metafmt-c3-isp.rst @@ -0,0 +1,86 @@ +.. SPDX-License-Identifier: (GPL-2.0-only OR MIT) + +.. _v4l2-meta-fmt-c3isp-stats: +.. _v4l2-meta-fmt-c3isp-params: + +*********************************************************************** +V4L2_META_FMT_C3ISP_STATS ('C3ST'), V4L2_META_FMT_C3ISP_PARAMS ('C3PM') +*********************************************************************** + +.. c3_isp_stats_info + +3A Statistics +============= + +The C3 ISP can collect different statistics over an input Bayer frame. +Those statistics are obtained from the "c3-isp-stats" metadata capture video nodes, +using the :c:type:`v4l2_meta_format` interface. +They are formatted as described by the :c:type:`c3_isp_stats_info` structure. + +The statistics collected are Auto-white balance, +Auto-exposure and Auto-focus information. + +.. c3_isp_params_cfg + +Configuration Parameters +======================== + +The configuration parameters are passed to the c3-isp-params metadata output video node, +using the :c:type:`v4l2_meta_format` interface. Rather than a single struct containing +sub-structs for each configurable area of the ISP, parameters for the C3-ISP +are defined as distinct structs or "blocks" which may be added to the data +member of :c:type:`c3_isp_params_cfg`. Userspace is responsible for +populating the data member with the blocks that need to be configured by the driver, but +need not populate it with **all** the blocks, or indeed with any at all if there +are no configuration changes to make. Populated blocks **must** be consecutive +in the buffer. To assist both userspace and the driver in identifying the +blocks each block-specific struct embeds +:c:type:`c3_isp_params_block_header` as its first member and userspace +must populate the type member with a value from +:c:type:`c3_isp_params_block_type`. Once the blocks have been populated +into the data buffer, the combined size of all populated blocks shall be set in +the data_size member of :c:type:`c3_isp_params_cfg`. For example: + +.. code-block:: c + + struct c3_isp_params_cfg *params = + (struct c3_isp_params_cfg *)buffer; + + params->version = C3_ISP_PARAM_BUFFER_V0; + params->data_size = 0; + + void *data = (void *)params->data; + + struct c3_isp_params_awb_gains *gains = + (struct c3_isp_params_awb_gains *)data; + + gains->header.type = C3_ISP_PARAMS_BLOCK_AWB_GAINS; + gains->header.flags = C3_ISP_PARAMS_BLOCK_FL_ENABLE; + gains->header.size = sizeof(struct c3_isp_params_awb_gains); + + gains->gr_gain = 256; + gains->r_gain = 256; + gains->b_gain = 256; + gains->gb_gain = 256; + + data += sizeof(struct c3_isp__params_awb_gains); + params->data_size += sizeof(struct c3_isp_params_awb_gains); + + struct c3_isp_params_awb_config *awb_cfg = + (struct c3_isp_params_awb_config *)data; + + awb_cfg->header.type = C3_ISP_PARAMS_BLOCK_AWB_CONFIG; + awb_cfg->header.flags = C3_ISP_PARAMS_BLOCK_FL_ENABLE; + awb_cfg->header.size = sizeof(struct c3_isp_params_awb_config); + + awb_cfg->tap_point = C3_ISP_AWB_STATS_TAP_BEFORE_WB; + awb_cfg->satur = 1; + awb_cfg->horiz_zones_num = 32; + awb_cfg->vert_zones_num = 24; + + params->data_size += sizeof(struct c3_isp_params_awb_config); + +Amlogic C3 ISP uAPI data types +=============================== + +.. kernel-doc:: include/uapi/linux/media/amlogic/c3-isp-config.h diff --git a/Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst b/Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst index b788f6933855..6e4f399f1f88 100644 --- a/Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst +++ b/Documentation/userspace-api/media/v4l/pixfmt-yuv-planar.rst @@ -137,6 +137,13 @@ All components are stored with the same number of bits per component. - Cb, Cr - No - Linear + * - V4L2_PIX_FMT_NV15 + - 'NV15' + - 10 + - 4:2:0 + - Cb, Cr + - Yes + - Linear * - V4L2_PIX_FMT_NV15_4L4 - 'VT15' - 15 @@ -186,6 +193,13 @@ All components are stored with the same number of bits per component. - Cr, Cb - No - Linear + * - V4L2_PIX_FMT_NV20 + - 'NV20' + - 10 + - 4:2:2 + - Cb, Cr + - Yes + - Linear * - V4L2_PIX_FMT_NV24 - 'NV24' - 8 @@ -302,6 +316,57 @@ of the luma plane. - Cr\ :sub:`11` +.. _V4L2-PIX-FMT-NV15: + +NV15 +---- + +Semi-planar 10-bit YUV 4:2:0 format similar to NV12, using 10-bit components +with no padding between each component. A group of 4 components are stored over +5 bytes in little endian order. + +.. flat-table:: Sample 4x4 NV15 Image (1 byte per cell) + :header-rows: 0 + :stub-columns: 0 + + * - start + 0: + - Y'\ :sub:`00[7:0]` + - Y'\ :sub:`01[5:0]`\ Y'\ :sub:`00[9:8]` + - Y'\ :sub:`02[3:0]`\ Y'\ :sub:`01[9:6]` + - Y'\ :sub:`03[1:0]`\ Y'\ :sub:`02[9:4]` + - Y'\ :sub:`03[9:2]` + * - start + 5: + - Y'\ :sub:`10[7:0]` + - Y'\ :sub:`11[5:0]`\ Y'\ :sub:`10[9:8]` + - Y'\ :sub:`12[3:0]`\ Y'\ :sub:`11[9:6]` + - Y'\ :sub:`13[1:0]`\ Y'\ :sub:`12[9:4]` + - Y'\ :sub:`13[9:2]` + * - start + 10: + - Y'\ :sub:`20[7:0]` + - Y'\ :sub:`21[5:0]`\ Y'\ :sub:`20[9:8]` + - Y'\ :sub:`22[3:0]`\ Y'\ :sub:`21[9:6]` + - Y'\ :sub:`23[1:0]`\ Y'\ :sub:`22[9:4]` + - Y'\ :sub:`23[9:2]` + * - start + 15: + - Y'\ :sub:`30[7:0]` + - Y'\ :sub:`31[5:0]`\ Y'\ :sub:`30[9:8]` + - Y'\ :sub:`32[3:0]`\ Y'\ :sub:`31[9:6]` + - Y'\ :sub:`33[1:0]`\ Y'\ :sub:`32[9:4]` + - Y'\ :sub:`33[9:2]` + * - start + 20: + - Cb\ :sub:`00[7:0]` + - Cr\ :sub:`00[5:0]`\ Cb\ :sub:`00[9:8]` + - Cb\ :sub:`01[3:0]`\ Cr\ :sub:`00[9:6]` + - Cr\ :sub:`01[1:0]`\ Cb\ :sub:`01[9:4]` + - Cr\ :sub:`01[9:2]` + * - start + 25: + - Cb\ :sub:`10[7:0]` + - Cr\ :sub:`10[5:0]`\ Cb\ :sub:`10[9:8]` + - Cb\ :sub:`11[3:0]`\ Cr\ :sub:`10[9:6]` + - Cr\ :sub:`11[1:0]`\ Cb\ :sub:`11[9:4]` + - Cr\ :sub:`11[9:2]` + + .. _V4L2-PIX-FMT-NV12MT: .. _V4L2-PIX-FMT-NV12MT-16X16: .. _V4L2-PIX-FMT-NV12-4L4: @@ -631,6 +696,69 @@ number of lines as the luma plane. - Cr\ :sub:`32` +.. _V4L2-PIX-FMT-NV20: + +NV20 +---- + +Semi-planar 10-bit YUV 4:2:2 format similar to NV16, using 10-bit components +with no padding between each component. A group of 4 components are stored over +5 bytes in little endian order. + +.. flat-table:: Sample 4x4 NV20 Image (1 byte per cell) + :header-rows: 0 + :stub-columns: 0 + + * - start + 0: + - Y'\ :sub:`00[7:0]` + - Y'\ :sub:`01[5:0]`\ Y'\ :sub:`00[9:8]` + - Y'\ :sub:`02[3:0]`\ Y'\ :sub:`01[9:6]` + - Y'\ :sub:`03[1:0]`\ Y'\ :sub:`02[9:4]` + - Y'\ :sub:`03[9:2]` + * - start + 5: + - Y'\ :sub:`10[7:0]` + - Y'\ :sub:`11[5:0]`\ Y'\ :sub:`10[9:8]` + - Y'\ :sub:`12[3:0]`\ Y'\ :sub:`11[9:6]` + - Y'\ :sub:`13[1:0]`\ Y'\ :sub:`12[9:4]` + - Y'\ :sub:`13[9:2]` + * - start + 10: + - Y'\ :sub:`20[7:0]` + - Y'\ :sub:`21[5:0]`\ Y'\ :sub:`20[9:8]` + - Y'\ :sub:`22[3:0]`\ Y'\ :sub:`21[9:6]` + - Y'\ :sub:`23[1:0]`\ Y'\ :sub:`22[9:4]` + - Y'\ :sub:`23[9:2]` + * - start + 15: + - Y'\ :sub:`30[7:0]` + - Y'\ :sub:`31[5:0]`\ Y'\ :sub:`30[9:8]` + - Y'\ :sub:`32[3:0]`\ Y'\ :sub:`31[9:6]` + - Y'\ :sub:`33[1:0]`\ Y'\ :sub:`32[9:4]` + - Y'\ :sub:`33[9:2]` + * - start + 20: + - Cb\ :sub:`00[7:0]` + - Cr\ :sub:`00[5:0]`\ Cb\ :sub:`00[9:8]` + - Cb\ :sub:`01[3:0]`\ Cr\ :sub:`00[9:6]` + - Cr\ :sub:`01[1:0]`\ Cb\ :sub:`01[9:4]` + - Cr\ :sub:`01[9:2]` + * - start + 25: + - Cb\ :sub:`10[7:0]` + - Cr\ :sub:`10[5:0]`\ Cb\ :sub:`10[9:8]` + - Cb\ :sub:`11[3:0]`\ Cr\ :sub:`10[9:6]` + - Cr\ :sub:`11[1:0]`\ Cb\ :sub:`11[9:4]` + - Cr\ :sub:`11[9:2]` + * - start + 30: + - Cb\ :sub:`20[7:0]` + - Cr\ :sub:`20[5:0]`\ Cb\ :sub:`20[9:8]` + - Cb\ :sub:`21[3:0]`\ Cr\ :sub:`20[9:6]` + - Cr\ :sub:`21[1:0]`\ Cb\ :sub:`21[9:4]` + - Cr\ :sub:`21[9:2]` + * - start + 35: + - Cb\ :sub:`30[7:0]` + - Cr\ :sub:`30[5:0]`\ Cb\ :sub:`30[9:8]` + - Cb\ :sub:`31[3:0]`\ Cr\ :sub:`30[9:6]` + - Cr\ :sub:`31[1:0]`\ Cb\ :sub:`31[9:4]` + - Cr\ :sub:`31[9:2]` + + .. _V4L2-PIX-FMT-NV24: .. _V4L2-PIX-FMT-NV42: diff --git a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst index 4d56c0528ad7..b8698b85bd80 100644 --- a/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst +++ b/Documentation/userspace-api/media/v4l/vidioc-g-ext-ctrls.rst @@ -199,6 +199,10 @@ still cause this situation. - ``p_area`` - A pointer to a struct :c:type:`v4l2_area`. Valid if this control is of type ``V4L2_CTRL_TYPE_AREA``. + * - struct :c:type:`v4l2_rect` * + - ``p_rect`` + - A pointer to a struct :c:type:`v4l2_rect`. Valid if this control is + of type ``V4L2_CTRL_TYPE_RECT``. * - struct :c:type:`v4l2_ctrl_h264_sps` * - ``p_h264_sps`` - A pointer to a struct :c:type:`v4l2_ctrl_h264_sps`. Valid if this control is @@ -334,14 +338,26 @@ still cause this situation. - Which value of the control to get/set/try. * - :cspan:`2` ``V4L2_CTRL_WHICH_CUR_VAL`` will return the current value of the control, ``V4L2_CTRL_WHICH_DEF_VAL`` will return the default - value of the control and ``V4L2_CTRL_WHICH_REQUEST_VAL`` indicates that - these controls have to be retrieved from a request or tried/set for - a request. In the latter case the ``request_fd`` field contains the + value of the control, ``V4L2_CTRL_WHICH_MIN_VAL`` will return the minimum + value of the control, and ``V4L2_CTRL_WHICH_MAX_VAL`` will return the maximum + value of the control. ``V4L2_CTRL_WHICH_REQUEST_VAL`` indicates that + the control value has to be retrieved from a request or tried/set for + a request. In that case the ``request_fd`` field contains the file descriptor of the request that should be used. If the device does not support requests, then ``EACCES`` will be returned. - When using ``V4L2_CTRL_WHICH_DEF_VAL`` be aware that you can only - get the default value of the control, you cannot set or try it. + When using ``V4L2_CTRL_WHICH_DEF_VAL``, ``V4L2_CTRL_WHICH_MIN_VAL`` + or ``V4L2_CTRL_WHICH_MAX_VAL`` be aware that you can only get the + default/minimum/maximum value of the control, you cannot set or try it. + + Whether a control supports querying the minimum and maximum values using + ``V4L2_CTRL_WHICH_MIN_VAL`` and ``V4L2_CTRL_WHICH_MAX_VAL`` is indicated + by the ``V4L2_CTRL_FLAG_HAS_WHICH_MIN_MAX`` flag. Most non-compound + control types support this. For controls with compound types, the + definition of minimum/maximum values are provided by + the control documentation. If a compound control does not document the + meaning of minimum/maximum value, then querying the minimum or maximum + value will result in the error code -EINVAL. For backwards compatibility you can also use a control class here (see :ref:`ctrl-class`). In that case all controls have to diff --git a/Documentation/userspace-api/media/v4l/vidioc-queryctrl.rst b/Documentation/userspace-api/media/v4l/vidioc-queryctrl.rst index 4d38acafe8e1..3549417c7feb 100644 --- a/Documentation/userspace-api/media/v4l/vidioc-queryctrl.rst +++ b/Documentation/userspace-api/media/v4l/vidioc-queryctrl.rst @@ -441,6 +441,16 @@ See also the examples in :ref:`control`. - n/a - A struct :c:type:`v4l2_area`, containing the width and the height of a rectangular area. Units depend on the use case. + * - ``V4L2_CTRL_TYPE_RECT`` + - n/a + - n/a + - n/a + - A struct :c:type:`v4l2_rect`, containing a rectangle described by + the position of its top-left corner, the width and the height. Units + depend on the use case. Support for ``V4L2_CTRL_WHICH_MIN_VAL`` and + ``V4L2_CTRL_WHICH_MAX_VAL`` is optional and depends on the + ``V4L2_CTRL_FLAG_HAS_WHICH_MIN_MAX`` flag. See the documentation of + the specific control on how to interpret the minimum and maximum values. * - ``V4L2_CTRL_TYPE_H264_SPS`` - n/a - n/a @@ -657,6 +667,10 @@ See also the examples in :ref:`control`. ``dims[0]``. So setting the control with a differently sized array will change the ``elems`` field when the control is queried afterwards. + * - ``V4L2_CTRL_FLAG_HAS_WHICH_MIN_MAX`` + - 0x1000 + - This control supports getting minimum and maximum values using + vidioc_g_ext_ctrls with V4L2_CTRL_WHICH_MIN/MAX_VAL. Return Value ============ diff --git a/Documentation/userspace-api/media/videodev2.h.rst.exceptions b/Documentation/userspace-api/media/videodev2.h.rst.exceptions index 429b5cdf05c3..35d3456cc812 100644 --- a/Documentation/userspace-api/media/videodev2.h.rst.exceptions +++ b/Documentation/userspace-api/media/videodev2.h.rst.exceptions @@ -150,6 +150,7 @@ replace symbol V4L2_CTRL_TYPE_HEVC_SPS :c:type:`v4l2_ctrl_type` replace symbol V4L2_CTRL_TYPE_HEVC_PPS :c:type:`v4l2_ctrl_type` replace symbol V4L2_CTRL_TYPE_HEVC_SLICE_PARAMS :c:type:`v4l2_ctrl_type` replace symbol V4L2_CTRL_TYPE_AREA :c:type:`v4l2_ctrl_type` +replace symbol V4L2_CTRL_TYPE_RECT :c:type:`v4l2_ctrl_type` replace symbol V4L2_CTRL_TYPE_FWHT_PARAMS :c:type:`v4l2_ctrl_type` replace symbol V4L2_CTRL_TYPE_VP8_FRAME :c:type:`v4l2_ctrl_type` replace symbol V4L2_CTRL_TYPE_VP9_COMPRESSED_HDR :c:type:`v4l2_ctrl_type` @@ -395,6 +396,7 @@ replace define V4L2_CTRL_FLAG_HAS_PAYLOAD control-flags replace define V4L2_CTRL_FLAG_EXECUTE_ON_WRITE control-flags replace define V4L2_CTRL_FLAG_MODIFY_LAYOUT control-flags replace define V4L2_CTRL_FLAG_DYNAMIC_ARRAY control-flags +replace define V4L2_CTRL_FLAG_HAS_WHICH_MIN_MAX control-flags replace define V4L2_CTRL_FLAG_NEXT_CTRL control replace define V4L2_CTRL_FLAG_NEXT_COMPOUND control @@ -569,6 +571,8 @@ ignore define V4L2_CTRL_DRIVER_PRIV ignore define V4L2_CTRL_MAX_DIMS ignore define V4L2_CTRL_WHICH_CUR_VAL ignore define V4L2_CTRL_WHICH_DEF_VAL +ignore define V4L2_CTRL_WHICH_MIN_VAL +ignore define V4L2_CTRL_WHICH_MAX_VAL ignore define V4L2_CTRL_WHICH_REQUEST_VAL ignore define V4L2_OUT_CAP_CUSTOM_TIMINGS ignore define V4L2_CID_MAX_CTRLS diff --git a/Documentation/userspace-api/mseal.rst b/Documentation/userspace-api/mseal.rst index 41102f74c5e2..ea9b11a0bd89 100644 --- a/Documentation/userspace-api/mseal.rst +++ b/Documentation/userspace-api/mseal.rst @@ -27,7 +27,7 @@ SYSCALL ======= mseal syscall signature ----------------------- - ``int mseal(void \* addr, size_t len, unsigned long flags)`` + ``int mseal(void *addr, size_t len, unsigned long flags)`` **addr**/**len**: virtual memory address range. The address range set by **addr**/**len** must meet: @@ -130,6 +130,27 @@ Use cases - Chrome browser: protect some security sensitive data structures. +- System mappings: + The system mappings are created by the kernel and includes vdso, vvar, + vvar_vclock, vectors (arm compat-mode), sigpage (arm compat-mode), uprobes. + + Those system mappings are readonly only or execute only, memory sealing can + protect them from ever changing to writable or unmmap/remapped as different + attributes. This is useful to mitigate memory corruption issues where a + corrupted pointer is passed to a memory management system. + + If supported by an architecture (CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS), + the CONFIG_MSEAL_SYSTEM_MAPPINGS seals all system mappings of this + architecture. + + The following architectures currently support this feature: x86-64, arm64, + loongarch and s390. + + WARNING: This feature breaks programs which rely on relocating + or unmapping system mappings. Known broken software at the time + of writing includes CHECKPOINT_RESTORE, UML, gVisor, rr. Therefore + this config can't be enabled universally. + When not to use mseal ===================== Applications can apply sealing to any virtual memory region from userspace, diff --git a/Documentation/userspace-api/netlink/c-code-gen.rst b/Documentation/userspace-api/netlink/c-code-gen.rst index 89de42c13350..46415e6d646d 100644 --- a/Documentation/userspace-api/netlink/c-code-gen.rst +++ b/Documentation/userspace-api/netlink/c-code-gen.rst @@ -56,7 +56,9 @@ If ``name-prefix`` is specified it replaces the ``$family-$enum`` portion of the entry name. Boolean ``render-max`` controls creation of the max values -(which are enabled by default for attribute enums). +(which are enabled by default for attribute enums). These max +values are named ``__$pfx-MAX`` and ``$pfx-MAX``. The name +of the first value can be overridden via ``enum-cnt-name`` property. Attributes ========== diff --git a/Documentation/userspace-api/netlink/intro-specs.rst b/Documentation/userspace-api/netlink/intro-specs.rst index bada89699455..a4435ae4628d 100644 --- a/Documentation/userspace-api/netlink/intro-specs.rst +++ b/Documentation/userspace-api/netlink/intro-specs.rst @@ -15,7 +15,7 @@ developing Netlink related code. The tool is implemented in Python and can use a YAML specification to issue Netlink requests to the kernel. Only Generic Netlink is supported. -The tool is located at ``tools/net/ynl/cli.py``. It accepts +The tool is located at ``tools/net/ynl/pyynl/cli.py``. It accepts a handul of arguments, the most important ones are: - ``--spec`` - point to the spec file @@ -27,7 +27,7 @@ YAML specs can be found under ``Documentation/netlink/specs/``. Example use:: - $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/ethtool.yaml \ + $ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/ethtool.yaml \ --do rings-get \ --json '{"header":{"dev-index": 18}}' {'header': {'dev-index': 18, 'dev-name': 'eni1np1'}, @@ -75,7 +75,7 @@ the two marker lines like above to a file, add that file to git, and run the regeneration tool. Grep the tree for ``YNL-GEN`` to see other examples. -The code generation itself is performed by ``tools/net/ynl/ynl-gen-c.py`` +The code generation itself is performed by ``tools/net/ynl/pyynl/ynl_gen_c.py`` but it takes a few arguments so calling it directly for each file quickly becomes tedious. @@ -84,7 +84,7 @@ YNL lib ``tools/net/ynl/lib/`` contains an implementation of a C library (based on libmnl) which integrates with code generated by -``tools/net/ynl/ynl-gen-c.py`` to create easy to use netlink wrappers. +``tools/net/ynl/pyynl/ynl_gen_c.py`` to create easy to use netlink wrappers. YNL basics ---------- diff --git a/Documentation/userspace-api/netlink/netlink-raw.rst b/Documentation/userspace-api/netlink/netlink-raw.rst index 1990eea772d0..31fc91020eb3 100644 --- a/Documentation/userspace-api/netlink/netlink-raw.rst +++ b/Documentation/userspace-api/netlink/netlink-raw.rst @@ -62,7 +62,7 @@ Sub-messages ------------ Several raw netlink families such as -:doc:`rt_link<../../networking/netlink_spec/rt_link>` and +:doc:`rt-link<../../networking/netlink_spec/rt-link>` and :doc:`tc<../../networking/netlink_spec/tc>` use attribute nesting as an abstraction to carry module specific information. diff --git a/Documentation/userspace-api/ntsync.rst b/Documentation/userspace-api/ntsync.rst new file mode 100644 index 000000000000..25e7c4aef968 --- /dev/null +++ b/Documentation/userspace-api/ntsync.rst @@ -0,0 +1,385 @@ +=================================== +NT synchronization primitive driver +=================================== + +This page documents the user-space API for the ntsync driver. + +ntsync is a support driver for emulation of NT synchronization +primitives by user-space NT emulators. It exists because implementation +in user-space, using existing tools, cannot match Windows performance +while offering accurate semantics. It is implemented entirely in +software, and does not drive any hardware device. + +This interface is meant as a compatibility tool only, and should not +be used for general synchronization. Instead use generic, versatile +interfaces such as futex(2) and poll(2). + +Synchronization primitives +========================== + +The ntsync driver exposes three types of synchronization primitives: +semaphores, mutexes, and events. + +A semaphore holds a single volatile 32-bit counter, and a static 32-bit +integer denoting the maximum value. It is considered signaled (that is, +can be acquired without contention, or will wake up a waiting thread) +when the counter is nonzero. The counter is decremented by one when a +wait is satisfied. Both the initial and maximum count are established +when the semaphore is created. + +A mutex holds a volatile 32-bit recursion count, and a volatile 32-bit +identifier denoting its owner. A mutex is considered signaled when its +owner is zero (indicating that it is not owned). The recursion count is +incremented when a wait is satisfied, and ownership is set to the given +identifier. + +A mutex also holds an internal flag denoting whether its previous owner +has died; such a mutex is said to be abandoned. Owner death is not +tracked automatically based on thread death, but rather must be +communicated using ``NTSYNC_IOC_MUTEX_KILL``. An abandoned mutex is +inherently considered unowned. + +Except for the "unowned" semantics of zero, the actual value of the +owner identifier is not interpreted by the ntsync driver at all. The +intended use is to store a thread identifier; however, the ntsync +driver does not actually validate that a calling thread provides +consistent or unique identifiers. + +An event is similar to a semaphore with a maximum count of one. It holds +a volatile boolean state denoting whether it is signaled or not. There +are two types of events, auto-reset and manual-reset. An auto-reset +event is designaled when a wait is satisfied; a manual-reset event is +not. The event type is specified when the event is created. + +Unless specified otherwise, all operations on an object are atomic and +totally ordered with respect to other operations on the same object. + +Objects are represented by files. When all file descriptors to an +object are closed, that object is deleted. + +Char device +=========== + +The ntsync driver creates a single char device /dev/ntsync. Each file +description opened on the device represents a unique instance intended +to back an individual NT virtual machine. Objects created by one ntsync +instance may only be used with other objects created by the same +instance. + +ioctl reference +=============== + +All operations on the device are done through ioctls. There are four +structures used in ioctl calls:: + + struct ntsync_sem_args { + __u32 count; + __u32 max; + }; + + struct ntsync_mutex_args { + __u32 owner; + __u32 count; + }; + + struct ntsync_event_args { + __u32 signaled; + __u32 manual; + }; + + struct ntsync_wait_args { + __u64 timeout; + __u64 objs; + __u32 count; + __u32 owner; + __u32 index; + __u32 alert; + __u32 flags; + __u32 pad; + }; + +Depending on the ioctl, members of the structure may be used as input, +output, or not at all. + +The ioctls on the device file are as follows: + +.. c:macro:: NTSYNC_IOC_CREATE_SEM + + Create a semaphore object. Takes a pointer to struct + :c:type:`ntsync_sem_args`, which is used as follows: + + .. list-table:: + + * - ``count`` + - Initial count of the semaphore. + * - ``max`` + - Maximum count of the semaphore. + + Fails with ``EINVAL`` if ``count`` is greater than ``max``. + On success, returns a file descriptor the created semaphore. + +.. c:macro:: NTSYNC_IOC_CREATE_MUTEX + + Create a mutex object. Takes a pointer to struct + :c:type:`ntsync_mutex_args`, which is used as follows: + + .. list-table:: + + * - ``count`` + - Initial recursion count of the mutex. + * - ``owner`` + - Initial owner of the mutex. + + If ``owner`` is nonzero and ``count`` is zero, or if ``owner`` is + zero and ``count`` is nonzero, the function fails with ``EINVAL``. + On success, returns a file descriptor the created mutex. + +.. c:macro:: NTSYNC_IOC_CREATE_EVENT + + Create an event object. Takes a pointer to struct + :c:type:`ntsync_event_args`, which is used as follows: + + .. list-table:: + + * - ``signaled`` + - If nonzero, the event is initially signaled, otherwise + nonsignaled. + * - ``manual`` + - If nonzero, the event is a manual-reset event, otherwise + auto-reset. + + On success, returns a file descriptor the created event. + +The ioctls on the individual objects are as follows: + +.. c:macro:: NTSYNC_IOC_SEM_POST + + Post to a semaphore object. Takes a pointer to a 32-bit integer, + which on input holds the count to be added to the semaphore, and on + output contains its previous count. + + If adding to the semaphore's current count would raise the latter + past the semaphore's maximum count, the ioctl fails with + ``EOVERFLOW`` and the semaphore is not affected. If raising the + semaphore's count causes it to become signaled, eligible threads + waiting on this semaphore will be woken and the semaphore's count + decremented appropriately. + +.. c:macro:: NTSYNC_IOC_MUTEX_UNLOCK + + Release a mutex object. Takes a pointer to struct + :c:type:`ntsync_mutex_args`, which is used as follows: + + .. list-table:: + + * - ``owner`` + - Specifies the owner trying to release this mutex. + * - ``count`` + - On output, contains the previous recursion count. + + If ``owner`` is zero, the ioctl fails with ``EINVAL``. If ``owner`` + is not the current owner of the mutex, the ioctl fails with + ``EPERM``. + + The mutex's count will be decremented by one. If decrementing the + mutex's count causes it to become zero, the mutex is marked as + unowned and signaled, and eligible threads waiting on it will be + woken as appropriate. + +.. c:macro:: NTSYNC_IOC_SET_EVENT + + Signal an event object. Takes a pointer to a 32-bit integer, which on + output contains the previous state of the event. + + Eligible threads will be woken, and auto-reset events will be + designaled appropriately. + +.. c:macro:: NTSYNC_IOC_RESET_EVENT + + Designal an event object. Takes a pointer to a 32-bit integer, which + on output contains the previous state of the event. + +.. c:macro:: NTSYNC_IOC_PULSE_EVENT + + Wake threads waiting on an event object while leaving it in an + unsignaled state. Takes a pointer to a 32-bit integer, which on + output contains the previous state of the event. + + A pulse operation can be thought of as a set followed by a reset, + performed as a single atomic operation. If two threads are waiting on + an auto-reset event which is pulsed, only one will be woken. If two + threads are waiting a manual-reset event which is pulsed, both will + be woken. However, in both cases, the event will be unsignaled + afterwards, and a simultaneous read operation will always report the + event as unsignaled. + +.. c:macro:: NTSYNC_IOC_READ_SEM + + Read the current state of a semaphore object. Takes a pointer to + struct :c:type:`ntsync_sem_args`, which is used as follows: + + .. list-table:: + + * - ``count`` + - On output, contains the current count of the semaphore. + * - ``max`` + - On output, contains the maximum count of the semaphore. + +.. c:macro:: NTSYNC_IOC_READ_MUTEX + + Read the current state of a mutex object. Takes a pointer to struct + :c:type:`ntsync_mutex_args`, which is used as follows: + + .. list-table:: + + * - ``owner`` + - On output, contains the current owner of the mutex, or zero + if the mutex is not currently owned. + * - ``count`` + - On output, contains the current recursion count of the mutex. + + If the mutex is marked as abandoned, the function fails with + ``EOWNERDEAD``. In this case, ``count`` and ``owner`` are set to + zero. + +.. c:macro:: NTSYNC_IOC_READ_EVENT + + Read the current state of an event object. Takes a pointer to struct + :c:type:`ntsync_event_args`, which is used as follows: + + .. list-table:: + + * - ``signaled`` + - On output, contains the current state of the event. + * - ``manual`` + - On output, contains 1 if the event is a manual-reset event, + and 0 otherwise. + +.. c:macro:: NTSYNC_IOC_KILL_OWNER + + Mark a mutex as unowned and abandoned if it is owned by the given + owner. Takes an input-only pointer to a 32-bit integer denoting the + owner. If the owner is zero, the ioctl fails with ``EINVAL``. If the + owner does not own the mutex, the function fails with ``EPERM``. + + Eligible threads waiting on the mutex will be woken as appropriate + (and such waits will fail with ``EOWNERDEAD``, as described below). + +.. c:macro:: NTSYNC_IOC_WAIT_ANY + + Poll on any of a list of objects, atomically acquiring at most one. + Takes a pointer to struct :c:type:`ntsync_wait_args`, which is + used as follows: + + .. list-table:: + + * - ``timeout`` + - Absolute timeout in nanoseconds. If ``NTSYNC_WAIT_REALTIME`` + is set, the timeout is measured against the REALTIME clock; + otherwise it is measured against the MONOTONIC clock. If the + timeout is equal to or earlier than the current time, the + function returns immediately without sleeping. If ``timeout`` + is U64_MAX, the function will sleep until an object is + signaled, and will not fail with ``ETIMEDOUT``. + * - ``objs`` + - Pointer to an array of ``count`` file descriptors + (specified as an integer so that the structure has the same + size regardless of architecture). If any object is + invalid, the function fails with ``EINVAL``. + * - ``count`` + - Number of objects specified in the ``objs`` array. + If greater than ``NTSYNC_MAX_WAIT_COUNT``, the function fails + with ``EINVAL``. + * - ``owner`` + - Mutex owner identifier. If any object in ``objs`` is a mutex, + the ioctl will attempt to acquire that mutex on behalf of + ``owner``. If ``owner`` is zero, the ioctl fails with + ``EINVAL``. + * - ``index`` + - On success, contains the index (into ``objs``) of the object + which was signaled. If ``alert`` was signaled instead, + this contains ``count``. + * - ``alert`` + - Optional event object file descriptor. If nonzero, this + specifies an "alert" event object which, if signaled, will + terminate the wait. If nonzero, the identifier must point to a + valid event. + * - ``flags`` + - Zero or more flags. Currently the only flag is + ``NTSYNC_WAIT_REALTIME``, which causes the timeout to be + measured against the REALTIME clock instead of MONOTONIC. + * - ``pad`` + - Unused, must be set to zero. + + This function attempts to acquire one of the given objects. If unable + to do so, it sleeps until an object becomes signaled, subsequently + acquiring it, or the timeout expires. In the latter case the ioctl + fails with ``ETIMEDOUT``. The function only acquires one object, even + if multiple objects are signaled. + + A semaphore is considered to be signaled if its count is nonzero, and + is acquired by decrementing its count by one. A mutex is considered + to be signaled if it is unowned or if its owner matches the ``owner`` + argument, and is acquired by incrementing its recursion count by one + and setting its owner to the ``owner`` argument. An auto-reset event + is acquired by designaling it; a manual-reset event is not affected + by acquisition. + + Acquisition is atomic and totally ordered with respect to other + operations on the same object. If two wait operations (with different + ``owner`` identifiers) are queued on the same mutex, only one is + signaled. If two wait operations are queued on the same semaphore, + and a value of one is posted to it, only one is signaled. + + If an abandoned mutex is acquired, the ioctl fails with + ``EOWNERDEAD``. Although this is a failure return, the function may + otherwise be considered successful. The mutex is marked as owned by + the given owner (with a recursion count of 1) and as no longer + abandoned, and ``index`` is still set to the index of the mutex. + + The ``alert`` argument is an "extra" event which can terminate the + wait, independently of all other objects. + + It is valid to pass the same object more than once, including by + passing the same event in the ``objs`` array and in ``alert``. If a + wakeup occurs due to that object being signaled, ``index`` is set to + the lowest index corresponding to that object. + + The function may fail with ``EINTR`` if a signal is received. + +.. c:macro:: NTSYNC_IOC_WAIT_ALL + + Poll on a list of objects, atomically acquiring all of them. Takes a + pointer to struct :c:type:`ntsync_wait_args`, which is used + identically to ``NTSYNC_IOC_WAIT_ANY``, except that ``index`` is + always filled with zero on success if not woken via alert. + + This function attempts to simultaneously acquire all of the given + objects. If unable to do so, it sleeps until all objects become + simultaneously signaled, subsequently acquiring them, or the timeout + expires. In the latter case the ioctl fails with ``ETIMEDOUT`` and no + objects are modified. + + Objects may become signaled and subsequently designaled (through + acquisition by other threads) while this thread is sleeping. Only + once all objects are simultaneously signaled does the ioctl acquire + them and return. The entire acquisition is atomic and totally ordered + with respect to other operations on any of the given objects. + + If an abandoned mutex is acquired, the ioctl fails with + ``EOWNERDEAD``. Similarly to ``NTSYNC_IOC_WAIT_ANY``, all objects are + nevertheless marked as acquired. Note that if multiple mutex objects + are specified, there is no way to know which were marked as + abandoned. + + As with "any" waits, the ``alert`` argument is an "extra" event which + can terminate the wait. Critically, however, an "all" wait will + succeed if all members in ``objs`` are signaled, *or* if ``alert`` is + signaled. In the latter case ``index`` will be set to ``count``. As + with "any" waits, if both conditions are filled, the former takes + priority, and objects in ``objs`` will be acquired. + + Unlike ``NTSYNC_IOC_WAIT_ANY``, it is not valid to pass the same + object more than once, nor is it valid to pass the same object in + ``objs`` and in ``alert``. If this is attempted, the function fails + with ``EINVAL``. diff --git a/Documentation/userspace-api/perf_ring_buffer.rst b/Documentation/userspace-api/perf_ring_buffer.rst index bde9d8cbc106..dc71544532ce 100644 --- a/Documentation/userspace-api/perf_ring_buffer.rst +++ b/Documentation/userspace-api/perf_ring_buffer.rst @@ -627,7 +627,7 @@ regular ring buffer. AUX events and AUX trace data are two different things. Let's see an example:: - perf record -a -e cycles -e cs_etm/@tmc_etr0/ -- sleep 2 + perf record -a -e cycles -e cs_etm// -- sleep 2 The above command enables two events: one is the event *cycles* from PMU and another is the AUX event *cs_etm* from Arm CoreSight, both are saved @@ -766,7 +766,7 @@ only record AUX trace data at a specific time point which users are interested in. E.g. below gives an example of how to take snapshots with 1 second interval with Arm CoreSight:: - perf record -e cs_etm/@tmc_etr0/u -S -a program & + perf record -e cs_etm//u -S -a program & PERFPID=$! while true; do kill -USR2 $PERFPID diff --git a/Documentation/userspace-api/sysfs-platform_profile.rst b/Documentation/userspace-api/sysfs-platform_profile.rst index 4fccde2e4563..7f013356118a 100644 --- a/Documentation/userspace-api/sysfs-platform_profile.rst +++ b/Documentation/userspace-api/sysfs-platform_profile.rst @@ -40,3 +40,41 @@ added. Drivers which wish to introduce new profile names must: 1. Explain why the existing profile names cannot be used. 2. Add the new profile name, along with a clear description of the expected behaviour, to the sysfs-platform_profile ABI documentation. + +"Custom" profile support +======================== +The platform_profile class also supports profiles advertising a "custom" +profile. This is intended to be set by drivers when the setttings in the +driver have been modified in a way that a standard profile doesn't represent +the current state. + +Multiple driver support +======================= +When multiple drivers on a system advertise a platform profile handler, the +platform profile handler core will only advertise the profiles that are +common between all drivers to the ``/sys/firmware/acpi`` interfaces. + +This is to ensure there is no ambiguity on what the profile names mean when +all handlers don't support a profile. + +Individual drivers will register a 'platform_profile' class device that has +similar semantics as the ``/sys/firmware/acpi/platform_profile`` interface. + +To discover which driver is associated with a platform profile handler the +user can read the ``name`` attribute of the class device. + +To discover available profiles from the class interface the user can read the +``choices`` attribute. + +If a user wants to select a profile for a specific driver, they can do so +by writing to the ``profile`` attribute of the driver's class device. + +This will allow users to set different profiles for different drivers on the +same system. If the selected profile by individual drivers differs the +platform profile handler core will display the profile 'custom' to indicate +that the profiles are not the same. + +While the ``platform_profile`` attribute has the value ``custom``, writing a +common profile from ``platform_profile_choices`` to the platform_profile +attribute of the platform profile handler core will set the profile for all +drivers. |